Friday, December 28, 2012

The Secret History of Lambda

Being a bit of an origins nut (I always want to know how something came to be or why it is a certain way), one of the things that always bothered me with regard to Lisp was that no one seemed to talking about the origin of lambda in the lambda calculus. I suppose if I wasn't lazy, I'd have gone to a library and spent some time looking it up. But since I was lazy, I used Wikipedia. Sadly, I never got what I wanted: no history of lambda. [1] Well, certainly some information about the history of the lambda calculus, but not the actual character or term in that context.

Why lambda? Why not gamma or delta? Or Siddham ṇḍha?

To my great relief, this question was finally answered when I was reading one of the best Lisp books I've ever read: Peter Norvig's Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp. I'll save my discussion of that book for later; right now I'm going to focus on the paragraph at location 821 of my Kindle edition of the book. [2]

The story goes something like this:
  • Between 1910 and 1913, Alfred Whitehead and Bertrand Russell published three volumes of their Principia Mathematica, a work whose purpose was to derive all of mathematics from basic principles in logic. In these tomes, they cover two types of functions: the familiar descriptive functions (defined using relations), and then propositional functions. [3]
  • Within the context of propositional functions, the authors make a typographical distinction between free variables and bound variables or functions that have an actual name: bound variables use circumflex notation, e.g. x̂(x+x). [4]
  • Around 1928, Church (and then later, with his grad students Stephen Kleene and J. B. Rosser) started attempting to improve upon Russell and Whitehead regarding a foundation for logic. [5]
  • Reportedly, Church stated that the use of  in the Principia was for class abstractions, and he needed to distinguish that from function abstractions, so he used x [6] or ^x [7] for the latter.
  • However, these proved to be awkward for different reasons, and an uppercase lambda was used: Λx. [8].
  • More awkwardness followed, as this was too easily confused with other symbols (perhaps uppercase delta? logical and?). Therefore, he substituted the lowercase λ. [9]
  • John McCarthy was a student of Alonzo Church and, as such, had inherited Church's notation for functions. When McCarthy invented Lisp in the late 1950s, he used the lambda notation for creating functions, though unlike Church, he spelled it out. [10] 
It seems that our beloved lambda [11], then, is an accident in typography more than anything else.

Somehow, this endears lambda to me even more ;-)

[1] As you can see from the rest of the footnotes, I've done some research since then and have found other references to this history of the lambda notation.

[2] Norvig, Peter (1991-10-15). Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp (Kindle Locations 821-829). Elsevier Science - A. Kindle Edition. The paragraph in question is quoted here:
The name lambda comes from the mathematician Alonzo Church’s notation for functions (Church 1941). Lisp usually prefers expressive names over terse Greek letters, but lambda is an exception. Abetter name would be make-function. Lambda derives from the notation in Russell and Whitehead’s Principia Mathematica, which used a caret over bound variables: x̂(x + x). Church wanted a one-dimensional string, so he moved the caret in front: ^x(x + x). The caret looked funny with nothing below it, so Church switched to the closest thing, an uppercase lambda, Λx(x + x). The Λ was easily confused with other symbols, so eventually the lowercase lambda was substituted: λx(x + x). John McCarthy was a student of Church’s at Princeton, so when McCarthy invented Lisp in 1958, he adopted the lambda notation. There were no Greek letters on the keypunches of that era, so McCarthy used (lambda (x) (+ x x)), and it has survived to this day.

[4] Norvig, 1991, Location 821.

[5] History of Lambda-calculus and Combinatory Logic, page 7.

[6] Ibid.

[7] Norvig, 1991, Location 821.

[8] Ibid.

[9] Looking at Church's works online, he uses lambda notation in his 1932 paper A Set of Postulates for the Foundation of Logic. His preceding papers upon which the seminal 1932 is based On the Law of Excluded Middle (1928) and Alternatives to Zermelo's Assumption (1927), make no reference to lambda notation. As such, A Set of Postulates for the Foundation of Logic seems to be his first paper that references lambda.

[10] Norvig indicates that this is simply due to the limitations of the keypunches in the 1950s that did not have keys for Greek letters.

[11] Alex Martelli is not a fan of lambda in the context of Python, and though a good friend of Peter Norvig, I've heard Alex refer to lambda as an abomination :-) So, perhaps not beloved for everyone. In fact, Peter Norvig himself wrote (see above) that a better name would have been make-function.

Tuesday, December 18, 2012

Seeking a Twisted Maintainer

Last week we posted on the Twisted Matrix blog about the maintainer position for the Twisted project being open. We are accepting applicants for a motivated and experienced release manager and core contributor. Our core maintainers are getting busier and busier with specialized Twisted work, and don't have the time that they used to be able to dedicate to maintaining Twisted.

The post on the Twisted Matrix blog gives a quick overview of the position; if you're interested, please check out the fellowship proposal for more details and email the address on that page (at the bottom).

Also, feel free to ping glyph, exarkun, or myself (oubiwann) on #twisted-dev on IRC to chat about it more.

Wednesday, December 12, 2012

Async in Python 3

Update: Guido has been working on PEP 3156; check on it regularly for the latest! (In the last two hours I've seen it updated with three big content changes.)

The buzz has died down a bit now, but the mellowing of the roaring flames has resulted in some nice embers in which an async for Python 3 is being forged. This is an exciting time for those of us who 1) love Python and 2) can't get us enough async.

I wanted to take the time to record some of the goodness here before I forgot or got too busy working on something else. So here goes:

The latest bout of Python async fever started in September of 2012 in this message when Christian M. Amsüss emailed the Python-ideas mail list about the state of async in Python and the hopes that a roadmap could be decided upon for Python 3. Note that this is the latest (re)incarnation of conversations that have been going on for some time and for which there is even a PEP (with related work on github).

After a few tens of messages were exchanged, Guido shared his thoughts, starting with:
This is an incredibly important discussion.
This seemed to really heat things up, eventually with core Twisted and Tornado folks chiming in. I learned a tremendous amount from the discussions that took place. There's probably a book deal in all that for a motivated archivist/interviewer...

After this went on for chunks of September and October, Guido stated that he'd like to break the discussion up into various sub-topics:
  • reactors
  • protocol implementations
  • Twisted (esp. Deferred)
  • Tornado
  • yield from vs. Futures

This was done in order to prevent the original thread from going over 100 messages and to better organize the discussion... but wow, things completely exploded after that (in good ways. mostly). It was async open season, and the ringing of shots in the air seemed continuous. If you scroll to about the half-way point of the October  archive page, you will see the first of these new threads ([Python-ideas] The async API of the future: Reactors). These messages essentially dominate the rest of the October archives. It's probably not unexpected that this continued into November. A related thread was started on Python-dev and it seemed to revive an old thread this month (on the same list).

All of this got mentioned on Reddit, too. It inspired at least two blog posts of which I am aware:  one post by Steve Dower, and another by Allen Short. Even better, though, Guido started an exploratory project called Tulip to test out some of these ideas in actual running code. As he mentions in the README, a tutorial by Greg Ewing was influential in the initial implementation of Tulip and initial design notes were made in the message [Python-ideas] Async API: some code to review.

Shortly after that, some of the Twisted devs local to Guido met with him at his former office in San Francisco. This went amazingly well and revolved mostly around the pros and cons of separating the protocol and transport functionality. Guido started experimenting with that in Tulip on December 6th. Yesterday, a followup meeting took place at the Rackspace office, this time with notes.

There's a long way to go still, but I find myself compulsively checking the commit log for Tulip now :-) It's exciting to imagine a future where Twisted and Tornado could easily interoperate with async support in Python 3 with a minimum of fuss. In fact, Glyph has already sketched out two classes which might be all that's needed for 2-way interoperation between Twisted and Python 3.

Here's to the future!

Tuesday, October 30, 2012

Async in Clojure: Playing with Agents, Part II

In the last post, we took a look at basic usage of Clojure's agent function. In this post, we'll dive a little bit deeper

We glossed over the options that you can define when creating an agent; one of them is the validator which one can use to check before the agent is updated with the passed value.

If we want to make sure that our read-agent always gets a string value, this is all we have to do:

Similarly, any function that takes a single value as a parameter can be used here. As you can see, we had to change our default value for the agent from nil to "" since there is now a string validator. If we hadn't, any time we tried to use that agent, we'd get java.lang.IllegalStateException.

When Things Go Wrong
Another option you can set when defining an agent is the error handler. This will be used in the event of an error, including if a value fails to be validated by your validator function. Here's an example:

With both of these options, you don't have to set them when the agent is defined; you can do it later with a function call, if you so desire (or if needs demand it):

Watch This!
So, we've got an error handler but no event handler? Yup. However, you can actually get callback-like behavior using watches. Check this out:

Now, any time our agent's state changes, the function passed to the watch will fire. As described in the docs, the parameters are: the agent, a key of your devising (must be unique per agent), and the handler that you want to have fired upon state change. The handler takes as parameters: the key you defined, the agent, the agent's old value, and the agent's new value.

All Together Now
With all our example code in place, we can now exercise the whole thing at once. Here's the whole thing:

To simply demonstrate the async nature and the callbacks in action, let's run the following:

Eventually, our callback will render output very much like the following:

Do note, however, that if we called a series of send-offs with different times (using the same agent and watch), we wouldn't see the ones with shorter times come back first. We'd see the callback output in the same order we called send-off. This is because the watch function is called synchronously on the agent's thread before any pending send-offs (or sends) are called. In future posts, I'll cover ways around this (constructing agents on the fly as well as exploring alternative solutions with external libraries).

Regardless, with these primitives, there are all sorts of things one can do. For instance...

To close, check out this neat little bit of code that sends 1,000,000 messages in a ring. This code creates a chain of agents, and then actions are relayed through it (taken from the agents doc page):

Kicking this puppy off, our million messages finish in about 1 second :-)

Monday, October 29, 2012

Async in Clojure: Playing with Agents, Part I

Clojure has a very interesting async primitive: the agent. There is some good documentation on agents, but for those that come from a background such as mine (Python at Twisted), I thought it might be nice to present one way of using agents to mimic the familiar async + callback reactive-style programming.

Do note, however, that Clojure agents run in one of two threadpools (one intended for CPU-intensive tasks, and the other for I/O-intensive tasks). As such, this is quite different than the event-loop approach that Twisted uses (or async frameworks that utilize libraries such as libevent or libev). Twisted has the deferToThread functionality, which is ... well, not exactly close, really. Regardless, let's get started.

In the following examples, we're going to pretend we have huge files we'll be reading off a local disk.

What to Call
Clojure's agent function is very, very simple: you pass it a value (it's initial state) and some options, if needed. That's it.

To update its state, you use either the send or send-off functions. If you've got CPU-bound tasks whose state you want to manage with agents, then you should use the send function. If your tasks will be I/O-bound, then you should use the send-off function for updating agent state. (The threadpool dedicated for use by send has a fixed size, based on the number of processors on your system. The threadpool for send-off is exapandable with thread caching and keep-alives.) Since our examples are focused on disk I/O, we'll be using send-off. (they have the same signature, though, so the following usage information applies to both).

When you send-off something to an agent, you pass if a few things:
  • an agent
  • the action or update function
  • any number of additional parameters you want the action function to consume
Here's what that looks like:

What to Write
So, we know what an agent looks like when bound and we know how we're going to send an update to the agent, but how might we construct the update itself? Perhaps like this:

As you can see, the first value that an action function takes is the "old" value of the agent -- the value that the agent has prior to the action that will take place. Once this function returns, the agent's value will be set to the return value of the action function. (What's more, if we needed to access the agent itself inside the action function for any reason, we could do so using the *agent* variable -- accessible within the scope of the action function).

Before we go on, let's take a look at this in action from the REPL:

The first thing we do is switch from the default namespace to one dedicated to our examples (this makes managing scope in the REPL much cleaner). Then we load a file that has the agent and action function defined. Then we tell it to run our fake "big read" function, asking it to run for about 10 seconds. As you can see, send-off returns the agent immediately. We then get the current value of the agent by dereferencing it. Finally our big read finishes, and we see it print how long it took. We then look at the agent directly, and then dereference it again -- both showing us what we'd expect: that the value of the agent has been updated to the return value of our big read function. Finally, we shutdown the agent threads and exit the REPL.

(The start-clojure script is wrapped with rlwrap so that I have access to a command line history, persistent over different sessions. The script boils down to this: rlwrap java -cp /usr/local/clojure-1.4.0/clojure-1.4.0.jar clojure.main.)

We've seen the agent in action now, but there's a bit more we can do. We'll take a look at that in the next post.

Friday, October 19, 2012

libevent for Lisp: A Signal Example

At MindPool, there are several async I/O options we've been exploring for Lisp:
As luck would have it, I started with cl-event, and it was a fun little adventure (given the fact that it hasn't been maintained). I corresponded with the very nice original author a bit, asking if there were any updates in other locations, but sadly there weren't.

I was ready to dive in and get things current, when one last Google search turned up cl-async. This little bugger was hard to find, as at that point it had not been listed on CLiki. (But it is now :-)). Andrew Lyon has done a tremendous amount of work on cl-async, with a very complete set of bindings for libevent. This is just what I had been looking for, so I jumped in immediately.

As one might imagine from the topic of this post, there's a lot to be explored, uncovered, and developed further around async programming in Lisp. I'll start off slowly with a small example, and add more over the course of time.

I also hope to cover IOlib and SBCL's SERVE-EVENT in some future posts. Time will tell... For now, let's get started with cl-async in SBCL :-)

In a previous post, I discussed getting an environment set up with SBCL, I the rest of this post assumes that has been read and done :-)

Getting cl-async and Setting Up an SBCL Environment for Hacking
Now let's download cl-async and install the Libevent bindings :-)

With the Lisp Libevent bindings installed, we're now ready to create a Lisp image to assist us when exploring cl-async. A Lisp image saves the current state of the REPL, with all the loaded libaries, etc., allowing for rapid start-ups and script executions. Just the thing, when you're iterating on something :-)

Example: Adding a Signal Handler
Let's dive into some signal handling now! Here is some code I put together as part of an effort to beef up the examples in cl-async:

Note that the as: is a nickname for the package namespace cl-async:.

As one might expect, there is a function to start the event loop. However, what is a little different is that one doesn't initialize the event loop directly, but with a callback. As such, one cannot set up handlers, etc., except within the scope of this callback.

We've got the setup-handler function for that, which adds a callback for a SIGINT event. Let's try it out :-)

Once your script has finished loading the core, you should see output like the above, with no return to the shell prompt.

When we send a SIGINT with ^C, we can watch our callback get fired:

Next up, we'll take a look at other types of handlers in cl-async.

Thursday, October 18, 2012

Getting Started with Steel Bank Common Lisp

As some of you know, I've been a closet Lisp fan for several years. When I first joined Canonical in 2008, I was hacking on Lisp in Python, so that I could do genetic programming in Python. In fact, my first and only lightening talk at a Canonical sprint was on genetic algorithms and programming :-)  (This was the same set of lightening talks that Vincenzo Di Somma gave a wonderful presentation on his photography; completely unrelated: this is one of my favorite pics of Vincenzo :-) ).

A few years later, I talked to Jim Baker about Python's AST, and how one might be able to do genetic programming by manipulating it directly, instead of running a Lisp in Python.

Throughout all this time, I've been touching in with various community projects, hacking on various Lispy Things, reading, etc., but generally doing so quite quietly. Over the past few months, however, I've really gotten into it, and Lisp has become a real force in my life, rapidly playing just as dominant a role as Python.

Similarly, MindPool has become active in several Lisp projects; as such, there are a great many things to share now. However, before I begin all that, I'd like to take an opportunity to get folks up and running with an example Lisp environment.

Future posts will explore various areas of Common Lisp, Scheme dialects, I/O loops, etc., but this one will provide a basis for all future posts that relate to Common Lisp and specifically the Steel Bank implementation.

Installing SBCL
If you don't have SBCL (Steel Bank Common Lisp; a pun on it's source parent, CMUCL), you need to install it:
  • For Ubuntu (12.04 LTS has 1.0.55): $ sudo apt-get install sbcl
  • Or you can go to the download page for everyone else.
apt-get for Lisp
Next, you'll need to install Quicklisp (as you might have surmised, it's like Debian apt-get for Common Lisp). The instructions on this page will get you up and running with Quicklisp.

I like having quicklisp available when I run SBCL, so I did the following after installing Quicklisp (and you might want to as well) from the sbcl prompt:
* (ql:add-to-init-file)

Readline Support
The default installation of SBCL doesn't have readline support for the REPL, so using your arrow keys won't give you the expected result (your command history). To remedy that, you can use a readline wrapper. First, install rlwrap:
  • Ubuntu: $ sudo apt-get install rlwrap
  • Mac OS X: $ brew install rlwrap
Then, create the chmoded 755 script ~/bin/start-sbcl with the following content (make sure that ~/bin is in your path):
rlwrap sbcl

At which point you can run the following and have access to a command history in SBCL:
$ start-sbcl

Why Steel Bank?
CMUCL gained an excellent reputation for being a highly performant, optimized implementation of Lisp. Based on CMUCL and continuing this tradition of excellent performance, SBCL's reputation preceded it. Over a range of different types of programs, SBCL not only compares favorably to other Lisp dialects, it seriously kicks ass all over.

SBCL comes in at 8th place in that benchmark ranking, beating out Go in 9th place. In all the languages that made it into the Top 10, I've only ever touched C, C++, Java, Scala, Lisp, and Go. In my list, SBCL made the Top 5 :-) Regardless, of all of them, Lisp has the syntax a find most pleasurable. Given my background in Python, this is not surprising ;-)

What's next? 
Funny that you should ask... given my background with Twisted, I'll give you one guess ;-)

Monday, October 15, 2012

Rendering ReST with Klein and Twisted Templates

In a previous life, I spent about 25 hours a day worrying about content management systems written in Python. As a result of the battle scars built up during those days, I have developed a pretty strong aversion for a heavy CMS when a simple approach will do. Especially if the users are technologically proficient.

At MindPool, we're building out our infrastructure right now using Twisted so that we can take advantage of the super amazing numbers of protocols that Twisted supports to provide some pretty unique combined services for our customers (among the many other types of services we are providing). For our website, we're using the Bottle/Flask-inspired Klein as our micro web framework, and this uses the most excellent Twisted templating. (We are, of course, also using Twitter Bootstrap.)

Here's the rub, though: we want to manage our content in the git repo for our site with ReStructured Text files, and there's no way to tell the template rendering machinery (the flattener code) to allow raw HTML into the mix. As such, my first attempt at ReST support was rendering HTML tags all over the user-facing content.

This ended up being a blessing in disguise, though, as I was fairly unhappy with the third-party dependencies that had popped up as a result of getting this to work. After a couple false starts, I was hot on the trail of a good solution: convert the docutils-generated HTML (from the ReST source files) to Twisted Stan tags, and push those into the renderers.

This ended up working like a champ. Here's what I did:
  1. Created a couple of utility functions for easily getting HTML from ReST and Stan from ReST.
  2. Wrote a custom IRenderable for ReST content (not strictly necessary, but organizationally useful, given what else will be added in the future).
  3. Updated the base class for "content" page templates to dispatch, depending upon content type.
Afterwards I was rewarded with some nicely rendered content on the staging MindPool site :-) (once the content text has been completed, we'll be pushing it live).

Kudos to David Reid for Klein and (as usual) to the Twisted community for one hell of a framework that is the engine of my internet.

Saturday, October 13, 2012

GIMP 2.8 and the Taming of Two Decades' Graphics Habits

At long last, I find myself in a 100% comfort zone with the GIMP. As a user, it's been a long road to get here ... I can't even imagine what the development teams have experienced over the past 16 years. Regardless, I am infinitely grateful for their efforts over that time, their hard work to bring the world a completely free, open source application for incredibly sophisticated image manipulation, drawing, and even painting.

I'll provide some more details on my experiences with GIMP 2.8, but first... a romp through the past!

I've been using graphics programs since my first exposure to the Macintosh Plus and SE machines in the late 1980s, where the head of the science department at Bangor High School often found himself without access to his machine (yeah, thanks to me...). As much as a science geek as I was, it was actually the graphics programs to which I was addicted. In particular, I was obsessed with using MacPaint to draw -- pixel by pixel -- a dithered bitmap of a photograph leaning next to the monitor.

About 6 or 7 years later, I was at it again doing graphics work on a non-profit's Mac (running System 7) with Photoshop 3.0 -- my first exposure to this software. It was absolutely mind-blowingly awesome what Photoshop enabled one to do with digital images. I was consumed. If I was awake, I was at the computer, creating something whacky from scratch or morphing something old into a new form. Though the work that I had done was focused on graphics for HTML and getting into the nascent field of "web design," some of my efforts ended up getting published as illustrations for a book.

I continued doing graphics as part of my work (in some form or another) from then on, usually borrowing a friend's machine in order to use Photoshop (which I could not afford). Despairing of not having my own copy of the software,  I would often try out other drawing, painting, and image manipulations applications. Without fail, none would ever measure up to the power and even more, the usability, of Photoshop.

It wasn't too much after this that I got involved with Linux and open source software. As for many from that time, my life has not been the same since. But those early years were painful. My first distro (Slackware), I had to manually hack the ethernet drivers in C just to get connected to the network using my particular hardware. Regardless, Linux was all I could think about; it was all I wanted to use.

Desperately hungering for a Photoshop-like experience on Linux, I searched HotBot on a regular basis, and finally discovered GIMP. Based on a quick glance at the now-historic splash screens, it looks like my first use of the GIMP started after the 1.0 release, though I didn't start using it more regularly until 1.2 (which had my favorite GIMP splash screen to date).

Despite my hopes and the best efforts of an amazing development team, the GIMP infuriated me. What took one step in Photoshop usually took about 5 steps in the GIMP. The user interface was anti-intuitive, and seemed to be built by folks with an intimate knowledge of the underlying API and reflected this far too well.

With each major release of the GIMP, I would find myself downloading and installing it, but only actually using it a handful of times. Ultimately, I'd run crying back to Photoshop, borrowing a friend's computer or persuading someone to give me an unused copy. The very nature of graphic arts is visual and intuitive... I felt that the GIMP was blocking that natural flow. As a result, I couldn't get anything done in it (and what little I did looked bad).

Year later, while working at Canonical, I had several opportunities to create graphics for various fun projects, gags, or practical jokes. Given the nature of the company's mission, I made sure to do that work using the GIMP. Though the pain of the old days had faded considerably, and GIMP had, by 2008, reached a much higher degree of usability, I still found myself enduring awkward workflow moments on a regular basis and this continued throughout the 4 years of version 2.6's release. However, during that time, an impressive selection of GIMP plugins were created, some of which I came to adore so completely that I would switch from Photoshop to GIMP, just so I could use them on a project. This was a new -- and amazingly welcome -- change for me.

Then, earlier this year, the last few pieces fell into place.

With the release of GIMP 2.8 in May, I have not looked for nor pined over any other sophisticated graphics programs. At long last, the GIMP completely satisfies. The biggest single point of pain for me in 2.6 (given that I was running on Mac OS X most of the time) was the lack of single-window  support. For every UI interaction, I had to click twice -- once to focus on the window, and once to actually perform the given action. It was crazy-making. 2.8 finally released the goodness of single-window mode for GIMP users around the globe. Futhermore, since I have been a layer-using maniac since Photoshop 3.0 and my graphics files tend to reach into the 200 and 300 MB range due to the numbers of layers (and resolution, of course), I have a desperate need to organize the chaos of all those layers. Prior to 2.8, my layer pallet on large projects was almost completely unnavigable. With 2.8, sanity and cleanliness abounds.

And there's more :-) Check out this short list:
  • single-window mode
  • multi-column dock windows
  • increased screen real-estate
  • layer groups
  • Cairo is used for all rendering
  • on-canvas text editing
For more details, you'll want to visit this page.

Oddly, this walk down memory lane and ultimate endorsement of the GIMP relates to MindPool.  I used the GIMP to do all the complicated graphics work for our initial branding, logos, etc. Looking at the results, I'm sure if seems fairly silly and even trivial... but there's a LOT that went into that simplicity :-)

For instance, the shot of the moon is actually taken from a photograph of the moon. It has been passed through multiple filters in order to get just the level of abstract color and shape I was looking for. Similarly for the water, but the settings were actually quite different to achieve the result I was seeking. I think the layer count peaked around 30 in one version of the file.

Another example is the logo that we'll be using for the company's main site. This also took advantage of some plugins (new and old), though mostly it relied upon paths, brush strokes, and mask layers (the latter for the water pool reflection watermarked into the image). The new features in 2.8 made all of that work very easy to do. No longer is the tool interfering with my workflow; rather, it is facilitating it. Working with graphics using open source tools is now an absolute pleasure for me.

For giggles, I'm including a screenshot of GIMP 2.8 in action, with my single-window mode view stretched across two 27" monitors (full version is here). Graphical editing has never been so clean!

Thanks, GIMP team :-) It's nice to finally be home.

Fun link: For the historically inclined, you might enjoy this quick read about the beginnings of Photoshop and the family that made it all happen :-)

Monday, May 14, 2012

CERN, OpenStack Keep Resonance Cascades at Bay

Tim Bell preparing to get his
OpenStack on
As previously mentioned, there's a growing momentum around ops-oriented participation in the OpenStack community. DreamHost is deeply invested in DevOps, seeing how that's where we're going to be living in a few months! As Simon Anderson, CEO of DreamHost, recently said:
"When we're running a complex fabric of apps on over 5,000 servers across three data centers, we need a lean and nimble approach to software development and operational implementation. Without a DevOps approach, we wouldn't be able to push code into production as fast or as efficiently as we do, and our customers would not be happy! Today's developers demand up-to-the-hour security and performance updates to Internet infrastructure, so we aim to deliver just that with DevOps."
Though expressed in the context of our work, the import of DevOps that Simon's comment generally highlights is going to be increasingly important for nearly anyone running cloud services. 

In particular, I've been following the work of the intrepid folks at CERN. As such, this post is not about DreamHost; rather, it's a mad tale of OpenStack, DevOps, and averting alien invasion.

After countless long-distance phone conversations, a flight to Switzerland, and spending several days buying pints for a security guard in the know (referred to from now on as "Barney"), I've uncovered some profound truths -- Mulder-style -- and have confirmed that the impact of OpenStack at CERN is huge. 

Superficial examinations turn up the usual: CERN's planning slides, nice quotes, discussions of features and savings in time and money. For instance, in a recent email conversation with Tim "Gordon Freeman" Bell at CERN, I learned that 
"The CERN Agile Infrastructure project aims to develop CERN's computing resources and processes to support the expanding needs of LHC physicists and the CERN organisation."
I think these guys have been hanging out with Simon! But once you slip behind the scenes, peek at some of the whiteboards in unattended rooms, or rifle through notes lying about, you see that things are not what they appear. I've included a shot of Mr. OpenStack-at-CERN himself; this was my first clue.

Publicly, he's been working with other teams at CERN to:
  • modernise the data centre configuration tools and automating operations procedures
  • exploit wide scale use of virtualisation, improving flexibility and efficiency
  • enhance monitoring such that the usage of the infrastructure can be fully understood and tuned to maximise the resources available
But privately, it seems that he and his team have been doing much, much more. This was alluded to in a statement made by team member Jan van Eldik: "We expect the number of requests to insert non-standard specimens into the scanning beam of the Anti-Mass Spectrometer to significantly decrease, once automation is in place and everyone is using the standard infrastructure we are setting up."

That isn't to say there haven't been incidents...

Innocuously enough, the current toolchains are based around:
  • OpenStack as a single Infrastructure-as-a-Service providing physics experiment services, developer boxes, applications servers as well as the large batch farm
  • Puppet for configuration management
  • Scientific Linux CERN as the dominant operating system with sizeable chunk of Windows installs
But that second bullet caught my eye, and one of Barney's pub mates confirmed a rumor that we'd heard: the Puppet instances are actually trained headcrabs. The primary training tool? You guessed it, a crowbar. Barney said that the folks from Dell took inspiration from this and developed it further for their OpenStack deployment framework after an extended visit to CERN.

Although Barney hadn't seen any evidence of resonance cascades, there have been minor cross-dimensional disturbances as a result of some "cowboy" activity and folks not following DevOps best practices. This has been kept quiet for obvious reasons, but has led to a small pest problem in some of CERN's older tunnel complexes. As rouge elements are discovered, CERN has been educating transgressors aggressively. (Sometimes they go as far as sending employees to Xen training... or was it Xen training?)

One artist's conception of what success will
look like for OpenStack at CERN
Despite the minor hiccoughs along the way, CERN is aiming for success. (Given the lack of Combine and forced relocation programs, they're already doing better than Black Mesa's Anomalous Materials team.) Plans are in place for an initial pre-production service, OpenStack deployment this year. Following that, they will be moving towards 300,000 virtual machines on 15,000 hosts spread across two data centres by 2015.

The OpenStack community is supporting them in their efforts with fantastic new features, high-quality discussions on the mail lists, and real-time interaction on the IRC channels. In an act of reciprocity and community spirit, operators at CERN have volunteered to contribute back to the OpenStack community with regard to operations best practices, reference architecture documentation, and support on the operators' mail list.

To see how other institutions were taking this news, I spent several days waiting on hold. In particular, Aperture Science could not be reached for comment. However, Ops team member Belmiro Rodrigues Moreira did say that there's an audio file being circulated at CERN of Cave Johnson threatening to "burn down OpenStack" ... with lemons. Given Aperture Science's failure record with time machine development, it's generally assumed to be a prank audio reconstruction. CloudStack developers are considered to be the prime suspects, seeing how much time they have on their hands while waiting for ant to finish compiling the latest Java contributions.

When asked what advice he could give to shops deploying OpenStack, Tim said simply: "Remember, the cake is a lie. Don't get distracted and don't stop. Just keep hacking."

Alyx, explaining to her dad why she loves DreamHost
Couldn't have said it better myself.

In closing, and interestingly enough, one of DreamHost's employees has an uncle who works at the Black Mesa Research Facility. Though his teleportation research team was too busy for an extended interview, his daughter did mention that she is a DreamHost customer and can't wait to use OpenStack while interning at CERN next summer. After all, that's what she uses to auto-scale her WordPress blog (she's in our private beta program).

It's a small world.

And, thanks to Tim and the rest at CERN, a safer one, too.

Saturday, May 12, 2012

Twisted SSH: Rendering a Log-in Banner/MOTD in Conch

A few weeks ago, I pinged my peeps on #twisted asking why the banner for a custom SSH server wasn't rendering properly. After some digging around and some inconsistent results (well, consistently bad results for me), we weren't able to resolve anything, and I had to set the problem aside.

The Symptom
The first thing I had tried was subclassing Manhole from twisted.conch.manhole, overriding (and up-calling) connectionMade, writing the banner to the terminal upon successful connection. This didn't work, so I then tried overriding initializeScreen by subclassing twisted.conch.recvline.RecvLine. Also a no-go. And by "didn't work" here's what I mean:

In both Linux (Ubuntu 12.04 LTS, gnome-terminal) and Mac (OS X 10.6.8,, after a successful login to the Twisted SSH server, the following sequence would occur:
  1. an interactive Python prompt was rendered, e.g., ":>>"
  2. the banner was getting written to the terminal, and
  3. the terminal screen refreshed with the prompt at the top
This all happened so quickly, that I usually never even saw #1 and #2. Just the second ":>>" prompt from #3. Only by scrolling up the terminal buffer would I see that the banner had actually been rendered. Even though I was doing my terminal.write after connectionMade and initializeScreen, it didn't seem to matter.

Some time last week, I put together example Twisted plugins showing what the problem was, and the circumstances under which a banner simply didn't get rendered. The idea was that I would provide some bare-bones test cases that demonstrated where the problem was occurring, post them to IRC or the Twisted mail list, and we could finally get it resolved. 'Cause, ya know, I really want my banners ...

While tweaking the second Twisted plugin example, I finally poked my head into the right method and discovered the issue. Here's what's happening:

  • twisted.conch.recvline.RecvLine.connectionMade calls t.c.recvline.RecvLine.initializeScreen
  • t.c.recvline.RecvLine.initializeScreen does a terminal.reset, writes the prompt, and then switches to insert mode. But this is a red herring. Since something after initializeScreen is causing the problem, we really need to be asking "who's calling connectionMade?"
  • t.c.manhole_ssh.TerminalSession.openShell is what kicks it off when it calls the transportFactory (which is really TerminalSessionTransport)
  • openShell takes one parameter, proto -- this is very important :-)
  • openShell instantiates TerminalSessionTransport
  • TerminalSessionTransport does one more thing after calling the makeConnection method on an insults.ServerProtocol instance (the one I had tried overriding without success), and as such, this is the prime suspect for what was preventing the banner from being properly displayed: it calls  chainedProtocol.terminalProtocol.terminalSize
  • chainedProtocol is an insults.ServerProtocol instance, and its terminalProtocol attribute is set when ServerProtocol.connectionMade is called.
  • A quick check reveals that terminalProtocol is none other than the proto parameter passed to openShell.

But what is proto? Some debugging (and the fact that of the three terminalSize methods in all of twisted, only one is an actual implementation) reveals that proto is a RecvLine instance. Reading that method uncovers the culprit in our whodunnit:  the first thing the method does is call terminal.eraseDisplay.

Bingo! (And this is what I was referring to above when I said "poked my head" ...)

Since this was called after all of my attempts to display a banner using both connectionMade and initializeScreen, there's no way my efforts would have succeeded.

Here's What You Do
How do you get around this? Easy! Subclass :-)

The class  TerminalSessionTransport in t.c.manhole_ssh is the bad boy that calls terminalSize (which calls eraseDisplay). It's the last thing that TerminalSessionTransport does in its __init__, so if we subclass it, and render our banner at the end of our __init__, we should be golden. And we are :-)

You can see an example of this here.

Not sure if this sort of thing is better off in projects that make use of Twisted, or if it would be worth while to add this feature to Twisted itself. Time (and blog comments) will tell.

As is evident from the screenshot above (and the link), this feature is part of the DreamSSH project. There are a handful of other nifty features/shortcuts that I have implemented in DreamSSH (plus some cool ones that are coming) and I'm using them in projects that need a custom SSH server. I released the first version of DreamSSH last night, and there's a pretty clear README on the github project page.

One of the niftier things I did last night in preparation for the release was to dig into Twisted plugins and override some behaviour there. In order to make sure that the conveniences I had provided for devs with the Makefile were available for anyone who had DreamSSH installed, I added subcommands... but if the service was already running, these would fail. How to work around that (and other Twisted plugin tidbits) are probably best saved for another post, though :-)

Wednesday, April 25, 2012

New Life: The OpenStack DevOps Community

Last week's OpenStack Design Summit and Conference were pretty fantastic events. Lots of great technical discussions, some good initial planning on large tasks, incredible numbers of conversations -- all of high quality! Looking back, though, I'd have to say that the high point of the event for me was what turned into the DevOps community brainstorm session (etherpad link) at the Design Summit (all etherpads are here).

The session was approved with the title of "OpenStack and Operations: Getting Real" with a major goal of deciding whether we needed to create a new top-level project for DevOps. So it was really "New DevOps Team?" However, all that changed once we got started, and there was some amazing feedback and enthusiasm in that room. By the end of it, we were all pretty pumped up and ready to jump in with all our hands and feet!

The highest-level summary for me was this: The DevOps folks in the OpenStack community really need a point to rally around. Someplace they can not only talk to each other, share stories, get advice, etc., but also where they can have their voices heard by the predominantly developer-oriented OpenStack community. Jay Pipes suggested that a new section be added to the weekly IRC team leader meetings, and as Nova Ops subteam lead (teams list), I volunteered to collect top issues from the DevOps (sub-)community and give these some air-time in the meetings.

Some other highlights from the session:

  • needs volunteer admins -- a great opportunity for improving the dev <-> ops interface
  • CERN is deeply invested in DevOps, and wants to share data and possibly help with defining reference architectures
  • The same goes for the University of Melbourne!
  • There was a good corporate presences in the session too, rallying around a new and improved DevOps community: HP, Yahoo, AT&T, Canonical, Rackspace, DreamHost, and more (sorry if I forgot you!) 

There is a tremendous amount of information that we collected, but in written form (etherpads) as well as conversations during last week's event. As this is collected, we'll be reaching out to folks via the operators mail list, blog posts, tweets, and Google+ messages. Be sure to do the same! If you've got concerns, bring them up on the mail list, we'll get some bugs filed and blueprints put together, and come up with plans for addressing these.

Thanks again, everyone!

Wednesday, April 04, 2012

Recent Stackiness


Tomorrow is OpenStack Atlanta's second event (the first being a HackIn).  Ken Pepple is going to be talking about deploying OpenStack, something he should feel very comfortable doing, given his book as well as Internap's latest announcement :-)

There's more info about tomorrow's event at the Meetup page.

OpenStack Design Summit

In a couple weeks, a bunch of us from DreamHost are going to be heading to the OpenStack Summit and Conference in San Francisco. There's a lot of buzz about it both inside the company, in the offices of our fellow OpenStack collaborators, and in the wider open source community. With Citrix's recent announcement, Internap's deployment, Eucalyptus' approval by Amazon, there's plenty of Cloud Drama to go around. Fortunately, the focus of the Summit and Conference is on the important positives: how to improve an extraordinary piece of software and disseminating expertise. Can't wait!

GitHub Love

Last but not least, the Dev team at DreamHost has been using Github in conjunction with Launchpad in a manner similar to how the OpenStack project does it. The increased interest in open source software in our offices is starting to make its way out to our customers, and we've got a new web presence that is the first step in supporting this new direction. We're cooking up a bunch more stuff, so be sure to check in on our repos from time to time :-)

Saturday, March 24, 2012

A Conversation with Guido about Callbacks

In a previous post, I promised to share some of my PyCon conversations from this year -- this is the first in that series :-)

As I'm sure many folks noticed, during Guido van Rossum's keynote address at PyCon 2012, he mentioned that he likes the way that gevent presents asynchronous usage to developers taking advantage of that framework.

What's more, though, is that he said he's not a fan of anything that requires him to write a callback (at which point, I shed a tear). He continued with: "Whenever I see I callback, I know that I'm going to get it wrong. So I like other approaches."

As a great lover of the callback approach, I didn't quite know how to take this, even after pondering it for a while. But it really intrigued me that he didn't have the confidence in being able to get it right. This is Guido we're talking about, so there was definitely more to this than met the eye.

As such, when I saw Guido in the hall at the sprints, I took that opportunity to ask him about this. He was quite generous with his time and experiences, and was very patient as I scribbled some notes. His perspective is a valuable one, and gave me lots of food for thought throughout the sprints and well into this week. I've spent that intervening time reflecting on callbacks, why I like them, how I use them, as well as the in-line style of eventlet and gevent [1].

The Conversation

I only asked a few initial questions, and Guido was off to the races. I wanted to listen more than write, so what I'm sharing is a condensed (and hopefully correct!) version of what he said.

The essence is this: Guido developed an aesthetic for reading a series of if statements that represented async operations, as this helped him see -- at a glance -- what the overall logical flow was for that block of code. When he used the callback style, logic was distributed across a series of callback functions -- not something that one can see at a glance.

However, more than the ability to perceive the intent of what was written with a glance is something even more pragmatic: the ability to avoid bugs, and when they arise, debug them clearly. A common place for bugs is in the edge cases, and for Guido those are harder to detect in callbacks than a series of if statements. His logic is pretty sound, and probably generally true for most programmers out there.

He then proceded to give more details, using a memcache-like database as an example. With such a database, there are some basic operations possible:

  • check the cache for a value
  • get the value if present
  • add a value if not present
  • delete a value
At first approach, this is pretty straight-forward for both approaches, with in-line yielding code being more concise. However, what about the following conditions? What will the code look like in these circumstances?
  • an attempt to connect to the database failed, and we have to implement reconnecting logic
  • an attempt to get a lock, but a key is already locked
  • in the case of a failed lock, do re-trys/backoff, eventually raise an exception
  • storing to multiple database servers, but one or more might not contain updated data
  • this leaves the system in an inconsistent state and requires a all sorts of checking, etc.
I couldn't remember all of Guido's excellent points, so I made some up in that last set of bullets, but the intent should be clear: each of those cases requires code branching (if statements or callbacks). In the case of callbacks, you end up with quite a jungle [2]... a veritable net of interlacing callbacks, and the logic can be hard to follow.

One final point that Guido made was that batching/pooling is much simpler with the in-line style, a point I conceded readily.

A Tangent: Thinking Styles

As mentioned already, this caused me to evaluate closely my use of and preference for callbacks. Should I use them? Do I really like them that much? Okay, it looks like I really do -- but why?

Meditating on that question revealed some interesting insights, yet it might be difficult to convey -- please leave comments if I fail to describe this effectively!

There are many ways to describe how one thinks, stores information in memory, retrieves data and thoughts from memory, and applies these to the solutions of problems. I'm a visual thinker with a keen  spacial sense, so my metaphors tend follow those lines, and when reflecting on this in the context of using and creating callbacks, I saw why I liked them:

The code that I read is just a placeholder for me. It happens to be the same thing that the Python interpreter reads, but that's a happy accident [3]; it references the real code... the constructs that live in my brain. The chains of callbacks that conditionally execute portions of the total-possible-callbacks net are like the interconnected deer paths through a forest, like the reticulating sherpa trails tracing a high mountain side, like the twisty mazes of an underground adventure (though not all alike...). 

As I read the code, my eyes scan the green curves and lines on a black background and these trigger a highly associative memory, which then assembles a landscape before me, and it's there where I walk through the possibilities, explore new pathways, plan new architectures, and attempt to debug unexpected culs-de-sac. 

Even stranger is this: when I attempt to write "clean" in-line async code, I get stuck. My mental processes don't fire correctly. My creative juices don't flow. The "inner eye" that looks into problem spaces can't focus, or can't get binocular vision. 

The first thing I do in such a situation? Figure out how I can I turn silly in-line control structures into callback functions :-)  (see footnote [1]),

Now What?

Is Guido's astute assessment the death of callbacks? Well, of course not. Does it indicate the future of the predominant style for writing async Python code? Most likely, yes.

However, there are lots of frameworks that use callbacks and there are lots of people that still prefer that approach (including myself!). What's more, I'd bet that the callbacks vs. in-line async style comes down to a matter of 1) what one is used to, and possibly, 2) the manner in which one thinks about code and uses that code to solve problems in a concurrent, event-driven world.

But what, as Guido asked, am I going to do with this information?

Share it! And then chat with fellow members of the Twisted community. How can we better educate newcomers to Twisted? What best practices can we establish for creating APIs that use callbacks? What patterns result in the most readable code? What patterns are easiest to debug? What is the best way to debug code comprised of layers of callbacks?

What's more, we're pushing the frontiers of Twisted code right now, exploring reactors implemented on software transaction memory, digging through both early and recent research on concurrency and actor models, exploring coroutines, etc. (but don't use inlineCallbacks! Sorry, radix...). In other words, there's so much more to Twisted than what's been created; there's much more that lies ahead of us.

Regardless, Guido's perspective has highlighted the following needs within the Twisted community around the callback approach to writing asynchronous code: 
  • education
  • establishing clear best practices
  • recording and publicizing definitive design patterns
  • continued research
These provide exciting opportunities for big-picture thinkers for both those new to Twisted, as well as the more jaded old-timers. Twisted has always pushed the edge of the envelope (in more ways than one...), and I see no signs of that stopping anytime soon :-)


[1] In a rather comical twist of fate, I actually have a drafted blog post on how to write gevent code using its support for callbacks :-) The intent of that post will be to give folks who have been soaked in the callback style of Twisted a way of accepting gevent into their lives, in the event that they have such a need (we've started experimenting with gevent at DreamHost, so that need has arisen for me).

[2] There's actually a pretty well-done example of this in txzookeeper by Kapil Thangavelu. Kapil defined a series of callbacks within the scope of a method, organizing his code locally and cleanly. As much as I like this code, it is probably a better argument for Guido's point ;-)

[3] Oh, happy accident, let me count the hours, days, and weeks thy radiant presence has saved me ...

Saturday, March 17, 2012

Python for iOS

I do a lot of traveling, and I don't always like to lug my laptop around with me. Even when I do, I'd rather leave it in the bag unless I absolutely need to get it out (or if I'm setting up my mobile workspace). As such, I tend to use my iPhone for just about everything: reading, emails, calendar, etc.

So, imagine my delight, when I found out (just after PyCon this year) that I can now run Python 2.7.2 on my iPhone (and, when I get it, my iPad 3 ;-) ). This is just too cool for words... and given what pictures are worth, I'll use those instead :-)

I've put together a small Flickr set that highlights some of the functionality offered in this app, and each image in the set describes a nifty feature. For the image-challenged, here's a quick list:

  • an interactive Python prompt for entering code directly using the iPhone keyboard
  • a secondary, linear "keyboard" that one can use in conjunction with the main keyboard, extending one's ability to type faster
  • multiple options for working with/preserving one's code (email, saving to a file, viewing command history)
I can't even begin to count the number of times such an awesome Python scratchpad would have come in handy. And now we have it :-) At $2.99, this is a total steal.

Thanks Jonathan Hosmer!

(And thanks to David Mertz for pointing it out to folks on a Python mail list.)

Friday, March 16, 2012

PyCon 2012: To Be Continued

PyCon was just fabulous this year.

It's been a couple years since I was able to go, and I was quite surprised by how much I had been missing it. The Python community is not only one of the most technically astute and interesting ones to which I belong, but also the kindest. That last point is so incredibly important, and it ends up fostering a very strong familiar sense amongst its members.

There were so many good conversations with such great people: Anna, Alex, Guido, David (Mertz), Donovan, JP, Maciej, Allen, Glyph, Paul, Sean... the list goes on and on! Fortunately, I took notes and (and even have some book recommendations to share!) so there are many blog posts to come :-)

But this has brought something into focus quite strongly for me: the interaction at PyCon is one of the most fertile grounds for me all year -- and going without it since Chicago has been a genuine drought! There were some folks at DreamHost that couldn't make it, and we've already started looking around at various local, mini Python conferences that we can attend. This was initially so that those who couldn't make PyCon could receive similar benefits. But now there's something equally important that's contributing to the importance of this search: attending local conferences will mean not as much time has to pass between those fertile interactions and that recharging that we give each other at such events.

Until next time, I hope all Pythonistas everywhere are getting ready for a great weekend :-) Those who have been traveling, I hope you get lots of rest and share with everyone the treasures gathered at this year's PyCon :-)

Monday, March 12, 2012

OpenStack at PyCon 2012 Sprints!

This is just a short post to give a shout out to some folks who are sprinting for OpenStack this year at PyCon. It's a small group, since the Folsom Design Summit and Conference is coming up in a few weeks.

One big surprise came last night when I got an email about Cisco's recent work with Layer 3 (blueprint) support in Quantum, and there were two Cisco folks here this morning to chat about that. Mark McClain (DreamHost) is digging deep into their work right now.

Yahoo! is remote-sprinting today, and they hope to be in the house tomorrow, to continue working on current improvements in DevstackPy. Mike Pittaro (La Honda Research), Jonathan LaCour and Doug Hellmann (DreamHost) are working with Yahoo! on that.

Mike Perez (DreamHost) is hacking on some additional improvements in Horizon for different storage backend representations. We've also chatted a bit about the latest efforts in Horizon for Quantum support (Michael Fork's work). Perez is also helping out tracking some bugs down in DevstackPy.

Special thanks to Mike Pittaro for improving the sprinting pages on the OpenStack wiki with links to previous work and discussions!

If you're keen on OpenStack and would like to dive in with some fellow hackers into the deep ends of Nova, Quantum, or Horizon, be sure to come by or pop in at #openstack-pycon on Freenode :-)