Saturday, March 24, 2012

A Conversation with Guido about Callbacks

In a previous post, I promised to share some of my PyCon conversations from this year -- this is the first in that series :-)

As I'm sure many folks noticed, during Guido van Rossum's keynote address at PyCon 2012, he mentioned that he likes the way that gevent presents asynchronous usage to developers taking advantage of that framework.

What's more, though, is that he said he's not a fan of anything that requires him to write a callback (at which point, I shed a tear). He continued with: "Whenever I see a callback, I know that I'm going to get it wrong. So I like other approaches."

As a great lover of the callback approach, I didn't quite know how to take this, even after pondering it for a while. But it really intrigued me that he didn't have the confidence in being able to get it right. This is Guido we're talking about, so there was definitely more to this than met the eye.

As such, when I saw Guido in the hall at the sprints, I took that opportunity to ask him about this. He was quite generous with his time and experiences, and was very patient as I scribbled some notes. His perspective is a valuable one, and gave me lots of food for thought throughout the sprints and well into this week. I've spent that intervening time reflecting on callbacks, why I like them, how I use them, as well as the in-line style of eventlet and gevent [1].

The Conversation

I only asked a few initial questions, and Guido was off to the races. I wanted to listen more than write, so what I'm sharing is a condensed (and hopefully correct!) version of what he said.

The essence is this: Guido developed an aesthetic for reading a series of if statements that represented async operations, as this helped him see -- at a glance -- what the overall logical flow was for that block of code. When he used the callback style, logic was distributed across a series of callback functions -- not something that one can see at a glance.
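
To make the contrast concrete, here is a minimal sketch of the same "get the value, or add it if missing" logic in both styles. Nothing in it comes from gevent or Twisted: `FakeCache`, `get_or_add_callbacks`, `get_or_add_inline`, and the toy `run` driver are hypothetical names made up for illustration, and the fake cache fires its callbacks immediately instead of doing real I/O.

```python
class FakeCache:
    """Hypothetical in-memory stand-in for a memcache-like client."""

    def __init__(self):
        self._data = {}

    def get(self, key, callback):
        # Deliver the stored value (or None) through the callback.
        callback(self._data.get(key))

    def add(self, key, value, callback):
        # Store only if the key is absent; report success via the callback.
        added = key not in self._data
        if added:
            self._data[key] = value
        callback(added)


def get_or_add_callbacks(cache, key, compute, done):
    """Callback style: the flow is spread across nested functions."""

    def on_get(value):
        if value is not None:
            done(value)
        else:
            new_value = compute(key)
            cache.add(key, new_value,
                      lambda added: on_add(added, new_value))

    def on_add(added, new_value):
        if added:
            done(new_value)
        else:
            # Somebody else added the key first, so read their value.
            cache.get(key, done)

    cache.get(key, on_get)


def get_or_add_inline(cache, key, compute):
    """In-line style: the same flow as one series of if statements."""
    value = yield (cache.get, key)
    if value is None:
        value = compute(key)
        added = yield (cache.add, key, value)
        if not added:
            value = yield (cache.get, key)
    return value


def run(gen):
    """Toy driver: run each yielded (operation, *args) and resume the
    generator with whatever the operation passes to its callback."""
    result = []

    def step(value=None):
        try:
            op, *args = gen.send(value)
        except StopIteration as stop:
            result.append(stop.value)
            return
        op(*args, step)

    step()
    return result[0]
```

Calling `run(get_or_add_inline(FakeCache(), "a", str.upper))` walks the whole flow in one function body, while the callback version needs `on_get` and `on_add` to be read together before the branching becomes visible.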

However, more than the ability to perceive the intent of what was written with a glance is something even more pragmatic: the ability to avoid bugs, and when they arise, debug them clearly. A common place for bugs is in the edge cases, and for Guido those are harder to detect in callbacks than a series of if statements. His logic is pretty sound, and probably generally true for most programmers out there.

He then proceeded to give more details, using a memcache-like database as an example. With such a database, there are some basic operations possible:

  • check the cache for a value
  • get the value if present
  • add a value if not present
  • delete a value
At first glance, this is pretty straightforward in both styles, with the in-line yielding code being more concise. However, what about the following conditions? What will the code look like in these circumstances?
  • an attempt to connect to the database failed, and we have to implement reconnecting logic
  • an attempt to get a lock, but a key is already locked
  • in the case of a failed lock, do retries with backoff, and eventually raise an exception
  • storing to multiple database servers, but one or more might not contain updated data
  • this leaves the system in an inconsistent state and requires all sorts of checking, etc.
I couldn't remember all of Guido's excellent points, so I made some up in that last set of bullets, but the intent should be clear: each of those cases requires code branching (if statements or callbacks). In the case of callbacks, you end up with quite a jungle [2]... a veritable net of interlacing callbacks, and the logic can be hard to follow.
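
As a rough illustration of one of those cases, here is how the "failed lock, retry with backoff, eventually raise" branch might read in the in-line style. This is only a sketch under assumptions: `try_lock` is a hypothetical function that returns True when the lock was obtained, `LockError` and `acquire_with_backoff` are made-up names, and `sleep` is injectable so that a cooperative sleep could be passed in place of the blocking `time.sleep` default.

```python
import time


class LockError(Exception):
    """Raised when the lock could not be obtained after all retries."""


def acquire_with_backoff(try_lock, key, retries=5, base_delay=0.05,
                         sleep=time.sleep):
    """Try to lock `key`, doubling the wait after each failed attempt.

    Returns the number of failed attempts before success, or raises
    LockError once `retries` attempts have all failed.
    """
    delay = base_delay
    for attempt in range(retries):
        if try_lock(key):
            return attempt
        if attempt < retries - 1:
            # Back off before the next attempt; no wait after the last one.
            sleep(delay)
            delay *= 2
    raise LockError("could not lock %r after %d attempts" % (key, retries))
```

The retry loop, the backoff, and the final exception all sit in one block that reads top to bottom. A callback version would need each failed attempt to schedule the next one and to carry the attempt count and delay along with it.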

One final point that Guido made was that batching/pooling is much simpler with the in-line style, a point I conceded readily.

A Tangent: Thinking Styles

As mentioned already, this caused me to evaluate closely my use of and preference for callbacks. Should I use them? Do I really like them that much? Okay, it looks like I really do -- but why?

Meditating on that question revealed some interesting insights, yet it might be difficult to convey -- please leave comments if I fail to describe this effectively!

There are many ways to describe how one thinks, stores information in memory, retrieves data and thoughts from memory, and applies these to the solutions of problems. I'm a visual thinker with a keen spatial sense, so my metaphors tend to follow those lines, and when reflecting on this in the context of using and creating callbacks, I saw why I liked them:

The code that I read is just a placeholder for me. It happens to be the same thing that the Python interpreter reads, but that's a happy accident [3]; it references the real code... the constructs that live in my brain. The chains of callbacks that conditionally execute portions of the total-possible-callbacks net are like the interconnected deer paths through a forest, like the reticulating sherpa trails tracing a high mountain side, like the twisty mazes of an underground adventure (though not all alike...). 

As I read the code, my eyes scan the green curves and lines on a black background and these trigger a highly associative memory, which then assembles a landscape before me, and it's there where I walk through the possibilities, explore new pathways, plan new architectures, and attempt to debug unexpected culs-de-sac. 

Even stranger is this: when I attempt to write "clean" in-line async code, I get stuck. My mental processes don't fire correctly. My creative juices don't flow. The "inner eye" that looks into problem spaces can't focus, or can't get binocular vision. 

The first thing I do in such a situation? Figure out how I can turn silly in-line control structures into callback functions :-)  (see footnote [1]).

Now What?

Is Guido's astute assessment the death of callbacks? Well, of course not. Does it indicate the future of the predominant style for writing async Python code? Most likely, yes.

However, there are lots of frameworks that use callbacks and there are lots of people that still prefer that approach (including myself!). What's more, I'd bet that the callbacks vs. in-line async style comes down to a matter of 1) what one is used to, and possibly, 2) the manner in which one thinks about code and uses that code to solve problems in a concurrent, event-driven world.

But what, as Guido asked, am I going to do with this information?

Share it! And then chat with fellow members of the Twisted community. How can we better educate newcomers to Twisted? What best practices can we establish for creating APIs that use callbacks? What patterns result in the most readable code? What patterns are easiest to debug? What is the best way to debug code comprised of layers of callbacks?

What's more, we're pushing the frontiers of Twisted code right now, exploring reactors implemented on software transactional memory, digging through both early and recent research on concurrency and actor models, exploring coroutines, etc. (but don't use inlineCallbacks! Sorry, radix...). In other words, there's so much more to Twisted than what's been created; there's much more that lies ahead of us.

Regardless, Guido's perspective has highlighted the following needs within the Twisted community around the callback approach to writing asynchronous code: 
  • education
  • establishing clear best practices
  • recording and publicizing definitive design patterns
  • continued research
These provide exciting opportunities for big-picture thinkers, both those new to Twisted and the more jaded old-timers. Twisted has always pushed the edge of the envelope (in more ways than one...), and I see no signs of that stopping anytime soon :-)


[1] In a rather comical twist of fate, I actually have a drafted blog post on how to write gevent code using its support for callbacks :-) The intent of that post will be to give folks who have been soaked in the callback style of Twisted a way of accepting gevent into their lives, in the event that they have such a need (we've started experimenting with gevent at DreamHost, so that need has arisen for me).

[2] There's actually a pretty well-done example of this in txzookeeper by Kapil Thangavelu. Kapil defined a series of callbacks within the scope of a method, organizing his code locally and cleanly. As much as I like this code, it is probably a better argument for Guido's point ;-)

[3] Oh, happy accident, let me count the hours, days, and weeks thy radiant presence has saved me ...


irmen said...

An interesting read, thanks for sharing

Duncan McGreggor said...

Sure thing, irmen -- glad you enjoyed it!

Unknown said...

The thinking style aspect of this article really interests me, even though I find it somewhat bewildering. I've read your description, but I can't fathom how anyone could think about code they've not previously read like that before.

According to your description, you build an abstract understanding of a network of callbacks, and then follow trails down that network. If you lack the callbacks, your ability to build this network falters. But this requires you to establish the entire chaining mechanics of these callbacks in your head; without this knowledge, you can't follow the flow of execution.

If you consider purely synchronous code a moment, you will realize that this flow is provided for you (and the interpreter) through a few powerful visual cues: firstly, that code executes from top to bottom, and secondly, that branches in a path correspond to indentation levels in the source code. I would call the way I think and reason about code very visual and spatial as well.

This works extremely well for paths which are tree-like. I'd argue that any callback code could be made cleaner by distilling tree-like branching into implicitly async code (a la gevent) and then manually calling out to edge handlers when those cases come up. In this way, the blueprint of how errors are handled are contained in one place, rather than embedded in the execution flow of a complex network of callbacks.

As a final comment, the linked code makes me fondly remember GOTO, as it was uniquely suited to complex error handling without making you wish pain upon your fellow man. Besides that, from the standpoint of figuring out what it's actually doing, it is full of "throwaway" lines which don't actually have any bearing on what is being done to the data passing through the program flow; instead doing chores like naming functions (partially Python's fault, anonymous functions being limited to expressions) and setting up callback machinery. This is something I've noticed in every Twisted codebase I've encountered (less so with inlineCallbacks, which has its own associated problems).

Rostyslav Dzinko said...

I like Twisted and I hate callbacks :)
Yeah, possibly quite a paradox, but not.

As for me, "branching" is always harder to debug and read than straight in-line code, whether "branching" means callbacks or conditional statements; but the reality is that we can't get rid of them for obvious reasons ;).

From the other side, callback is a reasonable and logical way to do asynchronous programming because of the concept of event that always exists in that context (namely, if event happened, you must handle it, or we can say it(event) "calls" something to get the result handled). But you should know that when you're creating some complex chain of software logic in Twisted you always get deep indents with closures or spaghetti code with tons of functions that handle callbacks and errbacks (and it's really-really hard to debug that, and even if it's a way that you think about code (as you wrote above), it can be very hard for other people in your team to deal with that).

Thank god there's inlineCallbacks in Twisted. As for me I use explicit callbacks when there's only one level of callbacks/errbacks in chain. In "deeper" cases I use generators. So I receive all power of Twisted without that "callback hell".

For the conclusion I can say that Twisted really gives a choice of style, so callbacks is not quite a problem for the framework (a very basic but good example where you can find the comparison for those styles was given by Jesse Noller in his old "intro to twisted" article:

Hugh Cole-Baker said...

"but don't use inlineCallbacks!" - I found inlineCallbacks to be a useful way of translating callback-based code into an easier to read coroutine-based style, I'd be interested to hear why you'd recommend not to use it.

rohni said...

Thank you for this. I have been discovering callbacks as a programming style in the world of javascript and to a certain extent nodejs, and it has been a bit of a revelation to find that I actually like callbacks as a programming style, but could not put my finger on why. You have put that intuition and feeling into words.

Much to my embarrassment, although I have done a lot of python programming, I have not explored the twisted framework, I guess it is time that changed. :)

Duncan McGreggor said...

Hey Jason,

Great comment -- thanks!

I'll respond to just a couple points: I'm not sure if the best way to phrase what I do is to say that the entire net of callbacks and errbacks is *established* in my head, rather that what I write is simply a cue for the living, breathing code that resides in a mental space for me. Each callback is like a little mini-program, its own little node that -- in my mind -- I can turn over, examine from different sides, imagine various side effects and errors, etc. It is in this way that my experience of it is spatial.

My big problem with using the methodologies of synchronous/blocking code when writing potentially highly-concurrent code is that the latter is a completely different animal, with different mechanics, patterns, problems, solutions, etc. Using old, familiar methods with a well-established history in a new space like this leads to a great deal of incorrect assumptions, incorrect code, and concepts that are difficult to convey. The use of callbacks helps to address this in the sense that it forces you to think about your problem in a completely different way, one that is quite explicitly designed for concurrency.

Duncan McGreggor said...

grumpeeoldtroll, glad you liked it and I'm excited for your new adventures in the land of Twisted ;-)

Rostyslav and Hugh, regarding inlineCallbacks, my single biggest issue with them is conceptual: people use them and think that magically, all their blocking code has now become async, somehow making things like file I/O and non-Twisted networking non-blocking.

The truth of the matter is that you've just dressed up deferreds in a different outfit. You *really* have to know what you're doing to not misuse inlineCallbacks. I've seen lots of folks disappointed with the "performance of Twisted" when using inlineCallbacks, but what it really boiled down to was not knowing how to use deferreds in the first place.
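
The underlying point, that writing code in a coroutine style does not by itself make a blocking call non-blocking, can be shown without Twisted. The sketch below swaps in the standard library's asyncio purely as an analogy; it is not inlineCallbacks, and `ticker`, `blocking_worker`, `cooperative_worker`, and `main` are hypothetical names made up for the example.

```python
import asyncio
import time


async def ticker(log):
    # A well-behaved task that wants to run every 10 ms.
    for i in range(3):
        log.append("tick %d" % i)
        await asyncio.sleep(0.01)


async def blocking_worker(log):
    log.append("work start")
    time.sleep(0.2)  # Blocking call: the whole event loop stalls here.
    log.append("work end")


async def cooperative_worker(log):
    log.append("work start")
    await asyncio.sleep(0.2)  # Yields to the loop while waiting.
    log.append("work end")


async def main(worker):
    # Run one worker alongside the ticker and record the order of events.
    log = []
    await asyncio.gather(worker(log), ticker(log))
    return log
```

With `asyncio.run(main(blocking_worker))`, no tick is logged until the worker has finished, even though both functions look equally "async". With `cooperative_worker`, the first tick lands between "work start" and "work end".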

There are additional issues with inlineCallbacks, including a certain amount of overhead (they are syntactic sugar, after all), extreme difficulty in debugging, limitations in the degree to which you can refactor code (due to yield's own limitations; note that this changes in the next release of Python 3, so maybe once Twisted supports 3, this objection might go away), and others I can't recall off the top of my head.

JP (core Twisted maintainer and Twisted instructor) has, on multiple occasions, warned against the general/indiscriminate use of inlineCallbacks. That being said, Glyph definitely supports their use, especially for the sorts of scenarios that Guido outlined in his chat with me. Glyph makes judicious use of them in his own projects and for work... but he *really* knows what he's doing, and is not likely to fall prey to the numerous pitfalls of inlineCallbacks.

I'll ask him to write a more thorough (than these comments) blog post on the misuse, abuse, and proper approach for writing Twisted code using inlineCallbacks... and you should too! The more people that ping him about it, the more likely he'll respond to popular demand :-)

Rostyslav Dzinko said...

>>> regarding inlineCallbacks, my single biggest issue with them is conceptual: people use them and think that magically, all their blocking code has now become async, somehow making things like file I/O and non-Twisted networking non-blocking.

So true, but still inlineCallbacks is a powerful tool and we have to work on evolving it, as it makes code much more readable. And IMHO being just an outfit for callbacks is a plus rather than a minus, because this approach keeps everything compatible and following one architectural strategy.

I also support your words on Python 3, IMHO language changes are good and it would be wonderful to run Twisted written in 3k.

Thanks for the article as it's a great work and the problem is out there.

Matěj Cepl said...

I used to maintain a couple of thousand lines of Javascript in a Firefox extension. Of course, given the browser nature, it had to be as async as possible, and it was full of callbacks everywhere (given the functional foundation of Javascript it wasn't that bad). My conclusion is that callbacks' nature is very similar to goto. Of course, you can do with them the same things you can do with more structured programming, but the moment it gets more complicated you have to keep all the parts together yourself. No, I don't have a good alternative (I haven't studied known alternatives well enough), but the moment I see it, I run.

Duncan McGreggor said...

Thanks for your kind remarks :-)

Also, if you're interested in seeing inlineCallbacks evolve, you're going to really like the work that fzZzy (Donovan Preston), dreid, and I have been collaborating on since PyCon this year: adding support for the actor model in Twisted. fzZzy's done an implementation on top of greenlet, dreid's got one that uses cooperators, and I'm hoping to do one based on pypy's continuelets (which may simply boil down to making minor tweaks to fzZzy's code).

The upshot of all this is that there will be an abstraction in Twisted for send/receive similar to other programming languages that implement or are based on the actor model (e.g., Erlang's spawn/receive functions). It's very likely that Twisted code written in this style will be even cleaner than code that uses inlineCallbacks...

You can track this work here, if you're interested: ticket #5565.

Duncan McGreggor said...

Note that this post got picked up on Hacker News, and there are some pretty intense and excellent comments on the subject there:

Unknown said...

Duncan, if you have the time, I'd *really* like to hear your feedback on PEP 403. It's largely designed to make heavy use of callbacks in Python programs less painful, so I'm very interested in feedback from the Twisted crowd. (I'll note one change I have planned: the PEP currently doesn't let you combine decorators with the new in statement, but I've since decided that's too limiting. While an update is a long way down the todo list, the next version will allow decorators between the in statement and the trailing function or class definition)

Email address is in the PEP header.

Eric Snow said...