Friday, June 27, 2008

So You Want Your Code to Be Asynchronous? A Twisted Interview

Prologue

This blog post was taken from a chat on a Divmod IRC channel a couple of weeks ago. Let's start with my opening comments to JP about what I hoped we could accomplish in the interview.

[1:47pm] oubiwann:exarkun: developers/users have started to understand Twisted, see the benefits of an async paradigm, and want to start writing their code making the best possible use of twisted's event driven nature
[1:48pm] oubiwann:they know how to write code using deferreds, and they're ready to get writing...
[1:48pm] oubiwann:except they're not
[1:48pm] oubiwann:because they don't know python internals
[1:49pm] oubiwann:they don't know what python code can actually be used with deferreds because they don't know the requirements for python code to be non-blocking in the reactor
[1:50pm] oubiwann:so you're going to help us understand the pitfalls
[1:50pm] oubiwann:how to make best guesses
[1:50pm] oubiwann:and where to look to get definitive answers

Change Your Mind


Before we go any further, I want to share a few comments and answer two questions: "Who is this for?" and "What do I need to know for this to mean something to me?" This post is for anyone who wants to write async code with Twisted; the answer to the second question is open-ended.

Let me start with what is often interpreted as effrontery: read the source code. Despite how that may have sounded, it's not another RTFM quip. The Twisted source code was specifically designed to be read (well, the code from the last two years, anyway). It was designed to be read, re-read, absorbed, pondered, and turned into living memes in your brain.

Understanding tricky topics in conceptually dense fields such as mathematics, physics, and advanced programming requires immersion. When we commit to really learning something difficult in programming, when we take the big step and dive in, we are surrounded by code. At a conceptual level, I mean that literally: it is a spatial experience. This is not something that is typically taught... the lucky few are able to do this on their own; the rest have to slowly build their intuition through experience in order to get comfortable and be productive in code space.

Our school systems tend to train us along very linear lines: there's a right answer, and a wrong answer. Don't rock the boat. Don't make the teacher uncomfortable. Follow the rules, do your homework, and don't ask too many questions. We carry these habits with us into our professional lives, and it can be quite the task to overcome such a mindset.

Experience is multidimensional. Learning is experience, not rules. When you really jump into this stuff, it will surround you. You will have an experience of the code. For me, that is a mental experience akin to looking at something from the perspective of three dimensions versus two. When I've not dedicated myself to understanding a problem, the domain, or the tools of the domain, everything looks very flat to me. It's hard to muddle through. I feel like I have no depth perception and I get easily frustrated.

When I do take the time, when I make the investment of attention and interest, the problem spaces really do become spaces, ones where my mind has a much greater freedom of movement. It's not smart people who do this kind of thing, it's committed people. Your mind is your world and it's up to you to make it what you want. No one on a mail list or IRC channel can do that for you. They can help you with the rules, provide you with valuable moral support, and guide you along the way. However, a direct experience of the code as a living world of mind comes from taking many brave leaps into the unknown.

Interview in a Blender

Jean-Paul Calderone graciously set aside some time to talk with me about creating asynchronous code in Python, particularly using the Twisted framework. As has been said many times before, simply using Twisted or deferreds doesn't make your code asynchronous. As with any tricky problem, you have to put some time and thought into what you want to accomplish and how you want to accomplish it.

I'm going to post bits of our chat in different sections, but hopefully in a way that makes sense. There's some good information here and some nice reminders. More than anything, though, this should serve as an encouragement to dig deeper.

Why Would I Ever Need Async Code?

There are a couple short answers to that:
  • Your application does many long-running computations (or computations of varying/unpredictable length).
  • Your application runs in an unpredictable environment (in particular, I'm thinking of network communications).
  • Your application needs to handle lots of events.
[1:55pm] oubiwann:exarkun: so, what's the first question a developer should ask themselves as they begin writing their Twisted application/library, txFoo?
[1:55pm] dash:"would everyone be better off if I just stopped now?"
[1:55pm] exarkun:oubiwann: I'm not sure I completely understand the target audience yet
[1:56pm] exarkun:my question is kind of like dash's question
[1:56pm] exarkun:why is this person doing this?
[1:57pm] oubiwann:exarkun: the audience is the group of software developers that are new to twisted, have a basic grasp of deferreds, and want their code to be properly async (using Twisted, of course)
[1:57pm] oubiwann:they don't have anything more than a passing familiarity of the reactor
[1:57pm] oubiwann:they don't know python internals

Protocols, Servers, and Clients, Oh My!

If your application can use what's already in Twisted, you're on easy street :-) If not, you may have to write your own protocols.
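
If you do end up writing your own, the core of a protocol is smaller than you might expect. Here's a minimal sketch of an echo server (the port number is arbitrary):

from twisted.internet import protocol, reactor

class Echo(protocol.Protocol):
    def dataReceived(self, data):
        # Send whatever the client sent straight back.
        self.transport.write(data)

factory = protocol.Factory()
factory.protocol = Echo
reactor.listenTCP(8000, factory)
reactor.run()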

Let's get back to the chat:

[1:57pm] exarkun:So 'foo' is... a django-based web application?
[1:58pm] exarkun:... a unit conversion library?
[1:58pm] oubiwann:sure, that works
[1:58pm] oubiwann:unit conversion lib
[1:58pm] oubiwann:(which could be used in Django)
[1:58pm] exarkun:at a first guess, I'd say that there's probably no work to do
[1:58pm] exarkun:how could you have a unit conversion library that's not async?
[1:58pm] exarkun:that'd take some work
[1:59pm] oubiwann:let's say that the unit calculations take a really long time to run
[1:59pm] exarkun:Hm. :)
[1:59pm] idnar:you'd probably have to spawn a new process then :P
[2:00pm] exarkun:basically. probably the only other reasonable thing is for twisted-using code to use the unit conversion api with threads.
[2:00pm] exarkun:so then the question to ask "is my code threadsafe?"
[2:00pm] oubiwann:what about a messaging server
[2:00pm] oubiwann:that sends jobs out to different hosts for calcs
[2:01pm] dash:that's not going to be a tiny example
[2:01pm] exarkun:for that, the job is probably to take all the parsing and app logic and make sure it's separate from the i/o
[2:01pm] exarkun:so "am I using the socket/httplib/urllib/ftplib/XXXlib module?"
[2:03pm] exarkun:is another question for the developer to ask himself
[2:06pm] exarkun:they probably need to find the api in twisted that does what they were using a blocking api for, and switch to it
[2:07pm] exarkun:that might mean implementing a protocol, or it might mean using getPage or something
[2:07pm] exarkun:and pushing the async all the way from the bottom up to the top (maybe not in that direction)
[2:08pm] oubiwann:by "bottom" are you referring to protocol/wire-level stuff?
[2:08pm] oubiwann:exarkun: and by "top" their module's API?
[2:09pm] exarkun:yes
[2:10pm] exarkun:oubiwann: the point being, can't have a sync api implemented in terms of an async one (or at least the means by which to do so are probably beyond the scope of this post)

Processes

We didn't really talk about this one. Idnar mentioned spawning processes briefly, but the discussion never really returned there. I imagine that this is fairly well understood and may not merit as much pondering as such things as threads.

Which brings us to...

Threads

Thread safety is the number one concern when trying to provide an asynchronous API for synchronous code. In Twisted, the usual tool for keeping blocking (but thread-safe) work off the reactor thread is twisted.internet.threads.deferToThread, which runs a function in the reactor's thread pool and returns a deferred that fires with the function's result.
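
Here's a minimal sketch (the unit-conversion function is a hypothetical stand-in for any long-running, thread-safe computation):

from twisted.internet import reactor, threads

def convertUnits(inches):
    # Hypothetical long-running, thread-safe computation.
    return inches * 2.54

def printResult(centimeters):
    print centimeters
    reactor.stop()

d = threads.deferToThread(convertUnits, 100)
d.addCallback(printResult)
reactor.run()

The deferred fires in the reactor thread once the function returns, so your callbacks need not be thread-safe -- but the function you hand off does need to be. Discussing threads consumed the rest of the interview: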

[2:12pm] oubiwann:exarkun: so, back to your comment about "is it threadsafe" (if they are doing long-running python calculations)
[2:13pm] oubiwann:what are the problems we face when we don't ask ourselves this question?
[2:13pm] oubiwann:what happens when we try to run non-threadsafe code in the Twisted reactor?
[2:14pm] exarkun:The problem happens when we try to run non-threadsafe code in a thread to keep it from blocking the reactor thread.
[2:16pm] oubiwann:so non-thread safe code run in deferredToThread could...
[2:16pm] oubiwann:have data inconsistencies which cause non-deterministic bugs?
[2:16pm] dash:have the usual effects of running non-threadsafe code
[2:16pm] exarkun:have any problem that using non-thread safe code in a multithreaded way using any other threading api could have
[2:16pm] dash:like that, yeah
[2:17pm] exarkun:inconsistencies, non-determinism, failure only under load (ie, only after you deploy it), etc
[2:18pm] dash:i smell a research paper
[2:18pm] oubiwann:so, next question: how does one determine that python code is thread safe or not?
[2:19pm] glyph:a research *paper*?
[2:19pm] exarkun:heh
[2:19pm] glyph:research *industry* more like
[2:19pm] oubiwann:exarkun: or, if not determine, at least ask the right sorts of questions to get the developer thinking in the right direction
[2:20pm] dash:glyph: Heh heh.
[2:20pm] exarkun:oubiwann: well, is there shared mutable state? if you're calling 'f' in a thread, does it operate on objects not passed to it as arguments?
[2:20pm] exarkun:oubiwann: if not, then it's probably safe - although don't call it twice at the same time with the same arguments
[2:20pm] exarkun:oubiwann: if so, who knows
[2:20pm] dash:with the same mutable arguments, anyway
[2:23pm] oubiwann:exarkun: so, because python and/or the os doesn't do anything to make file operations atomic, I'm assuming that reading and writing file data is not threadsafe?
[2:24pm] exarkun:don't use the same python file object in multiple threads, yes.
[2:24pm] exarkun:but certain filesystem operations are atomic, and you can manipulate the same file from multiple threads (or processes) if you know what you're doing
[2:25pm] oubiwann:what about C extensions in Python? any general rules there?
[2:25pm] oubiwann:other than "if they're threadsafe, you can use them"
[2:25pm] exarkun:that's about all you can say with certainty
[2:26pm] exarkun:for dbapi2 modules, look at the 'threadsafety' attribute. that's about the most general rule you can express.
[2:26pm] exarkun:there's some stuff other than objects that gets shared between threads too that might be worth mentioning
[2:26pm] exarkun:at least to get people to think about non-object state
[2:27pm] oubiwann:such as?
[2:27pm] exarkun:like, process working directory, or uid/gid
[2:30pm] • oubiwann looks at deferToThread...
[2:31pm] • oubiwann looks at reactor.callInThread
[2:33pm] • oubiwann looks at ReactorBase.threadpool
[2:38pm] oubiwann:hrm
[2:38pm] oubiwann:internesting
[2:39pm] oubiwann:never took the time to trace that all the way back to (and then read) the Python threading module
[2:40pm] oubiwann:exarkun: are there any python modules well known for their lack of threadsafety?
[2:42pm] exarkun:oubiwann: I dunno about "well known"
[2:42pm] exarkun:oubiwann: urllib isn't threadsafe
[2:42pm] exarkun:neither is urllib2
[2:43pm] exarkun:apparently random.gauss is not thread-safe?
[2:43pm] exarkun:you generally start with the assumption that any particular api is not thread-safe
[2:44pm] exarkun:and then maybe you can demonstrate to your own satisfaction that it's thread-safe-enough for your purposes
[2:44pm] exarkun:or you can demonstrate that it isn't
[2:45pm] exarkun:grepping the stdlib for 'thread' and 'safe' is interesting
[2:45pm] oubiwann:I wonder if the stuff available in math is threadsafe....
[2:45pm] oubiwann:exarkun: heh, I was just getting ready to dl the source so I could do that :-)
[2:46pm] exarkun:the math module probably is threadsafe
[2:46pm] exarkun:maybe that's another generalization
[2:46pm] exarkun:stdlib C modules are probably threadsafe
[2:49pm] oubiwann:hrm, looks like part of random isn't threadsafe
[2:51pm] oubiwann:random.random() is safe, though
[2:53pm] oubiwann:exarkun: I really appreciate you taking the time to discuss this
[2:53pm] exarkun:np
[2:53pm] oubiwann:and thanks to dash, glyph, and idnar for contributing to the discussion :-)

Summary

Concurrency is hard. If you want to use threads and you want to do it right and you want to avoid pitfalls and have bug-free code, you're going to be doing some head-banging. If you want to use an asynchronous framework like Twisted, you're going to have to bend your mind in a different way.

No matter what school of thought you follow for any given project, the best results will come with full commitment and immersion. Don't fear the learnin' -- embrace the pain ;-)

Update: Special thanks to Piet Delport for sorting out my endless typos!


Wednesday, June 25, 2008

Safari 3.1.1 Installer Hosed on OS X 10.5.3

I recently tried updating my Safari to the latest version, only to discover from here and here that Apple seems to have intentionally made this a 10.5.2-only update. I looked in the "Distribution" script and confirmed that this was, in fact, the case. The obvious symptom of this was that the installer told me I couldn't install Safari on any of my drives. Nice.

On those forum posts, I also discovered this great tool: Pacifist. It's been on my backburner list for a while to find a tool that could open up and extract Mac OS X packages, so for that alone I was delighted. When combined with PackageMaker, I was able to create my own installer. Even better.

If this is useful for anyone else, I've put it up here: Safari311UpdLeo_Divmod.pkg. Do note, however, that this installer has no brains: it just puts the files where they should be. It also doesn't check for your system version, so it could potentially really screw things up. Neither I, the Divmod community, nor Divmod, Inc. are responsible in any way if this installer takes your machine to the knacker's yard. However, I am using it on 10.5.3 with no issues (so far).


Saturday, June 21, 2008

txLoadBalancer

Well, today was a flurry of activity... I pulled an all-nighter whipping a python load balancer into shape after some late-afternoon discussions on #divmod.

At Divmod, we're going to be labbing out some distributed services experiments with twistd servers, and one set of those experiments involves "developer friendly" load balancing. JP suggested that I take a look at how PyDirector works and see if we could use it, which was interesting in a full-circle kind of way: I worked on PyDirector ages ago when I was at PBS, where I wrote a weighted load-balancing algorithm for it.

Jumping into the code again after a 5-year hiatus was like seeing an old friend :-)

I worked on several branches through the night; txLoadBalancer 0.9.1 and 1.0.1 are up on PyPI in the usual place.

I did lots of manual functional testing for each branch tonight, but I didn't do any TDD. While I'm still playing with it, I'll probably start adding tests as bugs crop up (BDT), and as it gets more serious I'll go fully into TDD and fill in what's missing at that point.

Tonight's mad rush was actually a great deal of fun. It's been a while since I've had the opportunity to plow through a bunch of code like that, and I enjoyed myself to near exhaustion :-) I don't think I'll be able to get to sleep tonight (er, this morning), due to the endless thinking about all the ways in which I want to use this code, mutate it, and... well, I better leave some surprises for later!

Update: I've edited the links for the latest micro-releases that fixed some issues with setup.py.

Update 2: Thanks to the heads-up in the comments from Kapil, I've patched txLoadBalancer trunk with the changes from Apple (David Reid and Wilfredo Sanchez).


Friday, June 20, 2008

Async Batching with Twisted: A Walkthrough

While drafting a Divmod announcement last week, I had a quick chat with a dot-bomb-era colleague of mine. Turns out, his team wants to do some cool asynchronous batching jobs, so he's taking a look at Twisted. Because he's a good guy and I like Twisted, I drew up some examples for him that should get him jump-started. Each example covers something in more depth than its predecessor, so the set is probably generally useful. Thus this blog post :-)

I didn't get a chance to show him a DeferredSemaphore example nor one for the Cooperator, so I will take this opportunity to do so. For each of the examples below, you can save the code as a text file and call it with "python filename.py", and the output will be displayed.

These examples don't attempt to give any sort of introduction to the complexities of asynchronous programming nor the problem domain of highly concurrent applications. Deferreds are covered in more depth here and here. However, hopefully this mini-howto will inspire curiosity about those :-)


Example 1: Just a DeferredList

This is one of the simplest examples you'll ever see for a deferred list in action. Get two deferreds (the getPage function returns a deferred) and use them to create a deferred list. Add callbacks to the list, garnish with a lemon.
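
Since the code itself is short, here's a minimal sketch of the idea (the URLs are placeholders, not necessarily those from the original example):

from twisted.internet import defer, reactor
from twisted.web.client import getPage

def printResults(results):
    print results
    reactor.stop()

# getPage returns a deferred that fires with the page's HTML.
d1 = getPage('http://www.example.com/')
d2 = getPage('http://www.example.net/')
dl = defer.DeferredList([d1, d2])
dl.addCallback(printResults)
reactor.run()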


Example 2: Simple Result Manipulation

We make things a little more interesting in this example by doing some processing on the results. For this to make sense, just remember that a callback gets passed the result when the deferred action completes. If we look up the API documentation for DeferredList, we see that it returns a list of (success, result) tuples, where success is a Boolean and result is the result of a deferred that was put in the list (remember, we've got two layers of deferreds here!).
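
A sketch of that result processing might look like this (placeholder URLs again):

from twisted.internet import defer, reactor
from twisted.web.client import getPage

def printLengths(results):
    # The deferred list fires with (success, result) tuples; each
    # result is the content returned by one getPage deferred.
    for success, content in results:
        if success:
            print 'got %d bytes' % len(content)
    reactor.stop()

deferreds = [getPage('http://www.example.com/'),
             getPage('http://www.example.net/')]
dl = defer.DeferredList(deferreds)
dl.addCallback(printLengths)
reactor.run()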


Example 3: Page Callbacks Too

Here, we mix things up a little bit. Instead of doing processing on all the results at once (in the deferred list callback), we're processing them when the page callbacks fire. Our processing here is just a simple example of getting the length of the getPage deferred result: the HTML content of the page at the given URL.
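
A sketch of that arrangement, again with placeholder URLs:

from twisted.internet import defer, reactor
from twisted.web.client import getPage

def pageCallback(content):
    # Fires as each page arrives; whatever we return becomes the
    # result the deferred list sees for this page.
    return len(content)

def listCallback(results):
    print results  # e.g. [(True, 12345), (True, 6789)]
    reactor.stop()

d1 = getPage('http://www.example.com/').addCallback(pageCallback)
d2 = getPage('http://www.example.net/').addCallback(pageCallback)
defer.DeferredList([d1, d2]).addCallback(listCallback)
reactor.run()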


Example 4: Results with More Structure

A follow-up to the last example, here we put the data in which we are interested into a dictionary. We don't end up pulling any of the data out of the dictionary; we just stringify it and print it to stdout.
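
In sketch form, only the page callback really changes from the previous example:

from twisted.internet import defer, reactor
from twisted.web.client import getPage

def pageCallback(content):
    # Put the interesting data into a dictionary; the deferred
    # list then fires with (success, dict) tuples.
    return {'length': len(content), 'preview': content[:20]}

def listCallback(results):
    print str(results)
    reactor.stop()

d1 = getPage('http://www.example.com/').addCallback(pageCallback)
d2 = getPage('http://www.example.net/').addCallback(pageCallback)
defer.DeferredList([d1, d2]).addCallback(listCallback)
reactor.run()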


Example 5: Passing Values to Callbacks

After all this playing, we start asking ourselves more serious questions, like: "I want to decide which values show up in my callbacks" or "Some information that is available here isn't available there. How do I get it there?" This is how :-) Just pass the parameters you want to your callback. They'll be tacked on after the result (as you can see from the function signatures).

In this example, we needed to create our own deferred-returning function, one that wraps the getPage function so that we can also pass the URL on to the callback.
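
A sketch of that wrapper; getPageData and the URLs are my stand-in names, not necessarily the original's:

from twisted.internet import defer, reactor
from twisted.web.client import getPage

def pageCallback(content, url):
    # Extra arguments to addCallback are tacked on after the result.
    return (url, len(content))

def getPageData(url):
    # Wrap getPage so the URL travels along with the result.
    d = getPage(url)
    d.addCallback(pageCallback, url)
    return d

def listCallback(results):
    for success, data in results:
        print data
    reactor.stop()

urls = ['http://www.example.com/', 'http://www.example.net/']
dl = defer.DeferredList([getPageData(url) for url in urls])
dl.addCallback(listCallback)
reactor.run()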


Example 6: Adding Some Error Checking

As we get closer to building real applications, we start getting concerned about things like catching/anticipating errors. We haven't added any errbacks to the deferred list, but we have added one to our page callback. We've added more URLs and put them in a list to ease the pains of duplicate code. As you can see, two of the URLs should return errors: one a 404, and the other should be a domain not resolving (we'll see this as a timeout).
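
A sketch of the errback version; the URLs, including the failing ones, are made up for illustration:

from twisted.internet import defer, reactor
from twisted.web.client import getPage

urls = [
    'http://www.example.com/',
    'http://www.example.com/no-such-page',  # should 404
    'http://no-such-host.invalid/',         # should fail to resolve
]

def pageCallback(content, url):
    return (url, len(content))

def pageErrback(failure, url):
    # Called instead of pageCallback when the fetch fails.
    return (url, failure.getErrorMessage())

def getPageData(url):
    d = getPage(url)
    d.addCallback(pageCallback, url)
    d.addErrback(pageErrback, url)
    return d

def listCallback(results):
    for success, data in results:
        print data
    reactor.stop()

dl = defer.DeferredList([getPageData(url) for url in urls])
dl.addCallback(listCallback)
reactor.run()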


Example 7: Batching with DeferredSemaphore

These last two examples are for more advanced use cases. As soon as the reactor starts, deferreds that are ready start "firing" -- their "jobs" start running. What if we've got 500 deferreds in a list? Well, they all start processing. As you can imagine, this is an easy way to run an accidental DoS against a friendly service. Not cool.

For situations like this, what we want is a way to run only so many deferreds at a time. This is a great use for the deferred semaphore. On repeated runs of the example above, the content lengths of the four pages returned after about 2.5 seconds. With the example rewritten to use just the deferred list (no deferred semaphore), the content lengths were returned after about 1.2 seconds. The extra time is due to the fact that I (for the sake of the example) forced only one deferred to run at a time, obviously not what you're going to want to do for a highly concurrent task ;-)

Note that without changing the code and only setting maxRun to 4, the timings for getting the content lengths are about the same, averaging 1.3 seconds for me (there's a little more overhead involved when using the deferred semaphore).

One last subtle note (in anticipation of the next example): the for loop creates all the deferreds at once; the deferred semaphore simply limits how many get run at a time.
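
A sketch of the semaphore version, with placeholder URLs and maxRun as the concurrency knob (the original's names may differ):

from twisted.internet import defer, reactor
from twisted.web.client import getPage

urls = ['http://www.example.com/', 'http://www.example.net/',
        'http://www.example.org/', 'http://www.example.edu/']

maxRun = 1  # how many deferreds may run at once

def pageCallback(content, url):
    return (url, len(content))

def listCallback(results):
    print results
    reactor.stop()

sem = defer.DeferredSemaphore(maxRun)
deferreds = []
for url in urls:
    # run() acquires a token, calls getPage, and releases the
    # token when that page's deferred fires.
    d = sem.run(getPage, url)
    d.addCallback(pageCallback, url)
    deferreds.append(d)

defer.DeferredList(deferreds).addCallback(listCallback)
reactor.run()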


Example 8: Throttling with Cooperator

This is the last example for this post, and it's probably the most arcane :-) This example is taken from JP's blog post from a couple years ago. Our observation in the previous example -- that the for loop creates all the deferreds at once while the semaphore merely limits how many run -- now becomes our counter-example. What if we want to limit when the deferreds are created? What if we're using a deferred semaphore to create 1000 deferreds (but only running them 50 at a time), and we're running out of file descriptors? Cooperator to the rescue.

This one is going to require a little more explanation :-) Let's see if we can move through the justifications for the strangeness clearly:
  1. We need the deferreds to be yielded so that each deferred is not created until it's actually needed (as opposed to the situation in the deferred semaphore example, where all the deferreds were created at once).
  2. We need to call doWork before the for loop so that a single generator is created outside the loop and shared, letting the coiterate calls jointly make their way through the URLs (calling it inside the loop would give us all four URLs every iteration).
  3. We removed the result-processing callback on the deferred list because coop.coiterate swallows our results; if we need to process, we have to do it with pageCallback.
  4. We still use a deferred list as the means to determine when all the batches have finished.
This example could have been written much more concisely: the doWork function could have been left in test as a generator expression and test's for loop could have been a list comprehension. However, the point is to show very clearly what is going on.
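
Here's a self-contained sketch in that verbose spirit, following JP's pattern (placeholder URLs; maxRun is an assumed knob for the number of parallel workers):

from twisted.internet import defer, reactor, task
from twisted.web.client import getPage

urls = ['http://www.example.com/', 'http://www.example.net/',
        'http://www.example.org/', 'http://www.example.edu/']

maxRun = 2  # how many fetches may be in flight at once

def pageCallback(content, url):
    # coiterate swallows results, so any processing happens here.
    print url, len(content)

def doWork():
    # Yield the deferreds one at a time, so each one is created
    # only when a worker is actually ready for it.
    for url in urls:
        yield getPage(url).addCallback(pageCallback, url)

def finish(ignored):
    reactor.stop()

def test():
    coop = task.Cooperator()
    work = doWork()  # one shared generator, created outside the loop
    deferreds = []
    for i in xrange(maxRun):
        # Each coiterate call pulls from the shared generator,
        # waiting on each yielded deferred before taking the next.
        deferreds.append(coop.coiterate(work))
    # The deferred list tells us when all the batches have finished.
    defer.DeferredList(deferreds).addCallback(finish)

test()
reactor.run()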

I hope these examples were informative and provide some practical insight on working with deferreds in your Twisted projects :-)

Monday, June 16, 2008

The Future of Personal Data

In a recent post about ULS systems, I said this:
The balance of power, from individuals all the way to the top of whatever organizations exist in the future, will rest in information. Not like it is today, however. The "information economy" of today (+/- 10 years) will look like kids' games and playgrounds. The information economies this will evolve into will be so completely integrated into human existence that they will resemble the basic necessities like water and food.
I'm not going to focus on the ULS systems topic in this post, but there is a very deep connection between privacy, personal data and all things ULS. Any thoughts of a ULS system should be coupled with how this will impact the system's users and their data. Any thought of our personal data's future existence should include the anticipated future of computing: ULS systems.


Inside and Out

In a nutshell, here's how things look:
  • Yesterday: Paid Services - You want something, you buy it. Demographic research is expensive and mostly outsourced.
  • Today: Free Services - You want something, companies give it to you for free... in exchange for your demographic data.
  • Tomorrow: Information Economy - You want something, you leverage the value of your information in brokering the service deals that mean the most to you.
What do we have right now? Companies are fighting each other over who gets to have our data for free. Yay, free stuff! We used to have to pay for that sort of thing! But paying for people to hold your data was the old, old world. Having them do it for free is the old world. Here's the new world: They pay you.

Why would they do that? Why would things shift from the current status quo? The value of personal information.

There are many ways to assess the value of personal information, but let's look at a few from the perspective of large organizations (entailing everything from government to business). Simplistically, we can assign value to a single individual's data based on the value of a large collection of many individuals' data. The more participants, the greater the value of the whole, and therefore the greater the value of each individual's data. This perspective is limited because it treats data very statically. The data may change, but in relation to the system it's "acquired" and inside as opposed to "for sale" and outside.


We Are the Markets

But the value of our data is not defined simply by the presence of bits or membership in a valued data conglomerate. Our data is not just our emails, our medical records, our purchasing trends, nor our opinions about local and national politics. Like an organism moving through an ecosystem, our data is dynamic and living; it is the very trace we leave in the world around us, be it digital or otherwise.

Any part of our lives that is ever recorded in "the system" provides data and comprises part of our movements through this system. Our traces through this digital ecosystem impact it, change it, shape its future direction. The collective behaviours (not just collective data) are immensely valuable to organizations. Their value is on-going and growing, with accrued, compounded interest.

Static data bits seem like property to us: you can buy them, you can sell them, you can store them somewhere. But moving, living data... that's a different story. That's not a buy-once commodity; ownership of that might be tantamount to slavery in a future, information-based economy. However, organizations might opt to lease it, or individuals might turn the past back on the future and offer license agreements to organizations.

More likely, though, individuals will form co-ops or communities (we have already seen this happen extensively in today's Internet) with shared mutual interest. Seeing how a group entity with shared values has a larger effect on the system than single individuals, data from such groups would likely be much more interesting and number-crunch-worthy. The greater power a group has to perturb systems' economic or political trends, the more valuable that group's data will be to other groups.

In addition, I'm sure there'd be all sorts of tiered "offerings" from individuals and groups: the juicier/more detailed the data, the higher the premium offered. The changes this will introduce to markets (global and local), legal systems, and political organizations are probably barely imaginable right now. But what would it take to get us there? What would it take for my data and your data to be valuable enough to transform the world and make Wall Street look like an old-time, irrelevant boys club?


Privacy

One thing: a fanatical devotion to privacy, pure and simple. Security and a fanatical devotion to privacy. Two things! Okay, reliability, security, and a fanatical devotion to privacy. Three things!

Monty Python references aside, an economy that values the data of individuals and groups can only arise if that data is secure. If we live in a topsy-turvy world where the Government, MPAA, RIAA, the Russian Mafia, and Big Hosting Company are pirating our data, then we're hosed. However, if our data is secure and contracts are effective, then we will have a world where data is the currency. There are an incredible number of hurdles to overcome in order for this to happen, however.
  • The System - we need a system where user data can be tracked, recorded, and analyzed, and there's enough of it to matter
  • Storage - we need our own, personal banks for our data (irrefutable ownership rights and complete power over that data)
  • Transactions - we need a mechanism for engaging in secure data transactions
  • Identity - when making a transaction, we need to be able to prove unequivocally that we are who we say we are
  • Anonymity - we need to decouple activity in the system and identity, thus requiring organizations to come to us (or our groups) to get the definitive data they need
  • Recourse - we need a legal system and effective laws that protect the individuals and groups against the crimes of data-hungry organizations; fortunately, we will have had years of established precedent protecting the sellers from the buyers... oh my, how the tables turn!
And that's just off the top of my head. There's got to be tons of stuff which hasn't even occurred to me.


Closing Thoughts

Information will be as essential for us as water, yet there is a very interesting divergence from the example of a hydraulic empire: each individual is the producer of some of that metaphorical water. By virtue of this difference, we hold the keys of the empire. We will be more a part of the economic and political powerbases than we have ever been at any time in human history.

Of course, that means that we've got to get ready :-) This is already being done in many different ways. Everything from community housing cooperatives to small, co-op banks; from capabilities-based programming models to secure online transactions. Like the next 20 years of research needed for ULS systems to become a reality, we've got just as much work to do in order to guarantee our place in the economies of the future.


Thursday, June 12, 2008

Ultra Large-Scale Systems: An Example

The ULS Series

Background

My interest in this topic is as old as my love for science fiction. As a child who had not only just started teaching himself to program but had fallen deeply in love with I, Robot, I consumed everything I could by the Master of the Art himself, Isaac Asimov. Inevitably, an endless stream of science fiction began flowing into my brain: the harder the science, the more cherished it was.

Then came the discovery that computers could actually talk to each other. Holy network, Batman, that changed everything! Oh, how I lamented my Kaypro II's inability to dial out. Science fiction novels began touching on this aspect of technology more and more frequently, while the Internet began taking shape in the "real" world around us. Now, look at it. Regardless of the mess and chaos, it's really quite amazing: beowulf clusters, distributed computing, cloud services, and of course the Internet in general. These advances are actually quite mind-blowing when we take the time to examine them from a historical perspective.

A lot has changed since those early days of the network. The past 10 years or so have seen the beginnings of a trend with regard to large systems, and my views on the future of networks (and the services that utilize them) have been pretty consistent.
As I indicated in the more recent ULS blog post, I have been exposed to some excellent resources for ultra large-scale systems. For some of those I recently provided links, and others I will be referencing in future posts.

Due to their nature, ULS systems pose interesting open source collaboration as well as business opportunities. They entail a massive collection of excellent problems to solve that cannot possibly be completely addressed in the next 6-12 months (where so many projects and businesses tend to put their focus, for obvious practical reasons). As such, there are a great number of research and development areas -- plenty for everyone, in fact. In this series of blog posts, my goal is to expose a wider audience to the topics and encourage folks to start thinking about both interim solutions as well as potential long-term ones.


Characteristics of a ULS

Let's start off with some semblance of a definition :-) What constitutes a ULS system? Here are some characteristics given by Scale Changes Everything:
  • an unbelievable amount of code (on the order of trillions of lines of code)
  • immense storage needs, network connections, processing
  • lots of hardware, lots of people, lots of purposes
  • decentralized components
  • created by aggregation, not design
  • unreliable components, reliable whole
  • ongoing and real-time upgrades, changes, and deployments
  • lots of functionality, likely in a focused area of concern
Here's an illuminating quote from Richard Gabriel's Design Beyond Human Abilities presentation:
The components that make up a ULS system are diverse as well as many, ranging from servers and clusters down to small sensors, perhaps the size of motes of dust. Some of the components will self-organize like swarms, and others will be carefully designed. The components will not only be computationally active but also physically active, including sensors, actuators...
Sounds like pure science fiction, doesn't it? Think about it, though. Is it really? Divmod's friend Raffi Krikorian co-wrote this paper at MIT. Check out the cheap network node that's smaller than a fingertip. At that size, hundreds of them would be innocuous. In a few years, we could have thousands of them in a room without even knowing it. Within a single home we could have the equivalents of what today are campus or regional networks. We probably can't even wrap our heads around how big these systems will be. But there is plenty of precedent for such natural short-sightedness. From Raffi's (et al.) 2004 paper:
The ARPAnet was ambitiously designed to handle up to 64 sites with up to 4 computers per site, far exceeding any perceived future requirement. Today there are more than 200 million registered hosts on the Internet, with still more computers connected to those.
Here are some other choice quotes:
[Internet 0] is not a replacement for the current Internet (call that Internet 1); it is a set of principles for extending the Internet down to individual devices...

An [Internet 0] network cannot be distinguished from the computers that it connects; it really is the computer. Because it allows devices for communications, computation, storage, sensing, and display to exchange information in exactly the same representation, around the corner or around the world, the components of a system can be dynamically assembled based on the needs of a problem, rather than fixed by the boundaries of a box.
We're already building this stuff. It's not science fiction. We may not have swarming, self-replicating nano machines... yet. But we're already heading in a direction where that's not just a possibility; it's a likelihood.

So, we've got lots of code, machines, storage, sensing and people; much of it decentralized. What else do we need? Failure tolerance and maintenance on-the-fly. Check. Finally, a ULS system will have to actually be useful, or it will never get built. Who would want such a thing besides militaries, big governments, and Dr. Evils? Now we start getting to our example: Health Care. But let's not get ahead of ourselves. First, let's examine why the biggest system of networked devices that we know of isn't a ULS system.


Why the Internet is not a ULS System

Most obviously, the Internet is not focused on a single goal or related set of goals; it's used for everything. However, it does meet many of the other criteria listed above. From the Carnegie Mellon report:
The Web foreshadows the characteristics of ULS systems. Its scale is much larger than that of any of today’s systems of systems. Its development, oversight, and operational control are decentralized. Its stakeholders have diverse, conflicting, complex, and changing requirements. The services it provides undergo continuous evolution. The actions of the people making use of the Web influence what services are provided, and the services provided influence the actions of people. It has been designed to avoid the worst problems deriving from the heterogeneity of its elements and to be insensitive to connection failures.

But ... Security was not given much attention in its original design, and its use for purposes for which it was not initially intended ... has revealed exploitable vulnerabilities ... And although the Web is an important element of people’s work lives, it is not as critical as a ULS ... system would be.
Now I think we're in a good place to talk about the health care system of the future...


Health Services as a ULS

Let's start this section with a quote from the presentation that inspired it. Richard Gabriel says:
An example of a ULS system (that doesn’t yet exist) would be a healthcare system that integrates not only all medical records, procedures, and institutions, but also gathers information about individual people continuously, monitoring their health and making recommendations about how to improve it or keep it good. Medical researchers would be hooked into the system, so that whenever new knowledge appeared, it could be applied to all who might need it. Imagining this system, though, requires also imagining that it is protected from the adversaries of governmental and commercial spying / abuse.
Modern hospitals are packed with countless computing devices: everything from charting PDAs to physiological monitors for patients; from mainframes and patient record data warehouses, to terminals and desktops. Wireless medical sensors have already been developed by a research project at Harvard. What's more, despite the concerns over associated health risks, implant research at Johns Hopkins and the University of Maryland is on-going and may produce results that are one day standard practice in hospitals.

As versions of these devices are developed that produce no ill effects for humans, they will make their way into out-patient clinics, assisted living facilities, and ultimately HMOs, private practices, and our homes. The devices will grow in numbers, shrink in size, and provide more functionality at greater efficiency than their predecessors.

The volumes of information that will be exchanged between devices, analyzed and correlated by other devices, and consumed by end-users, doctors, and researchers will be mind-boggling. It will bring new insights on everything from personal health to epidemiology.

With this, though, will come the obvious need for security and privacy, for defense against information attack and denial of service. These devices will all have to dedicate computational and storage resources for use by the whole system. Part of the system will have to monitor itself, properly escalate problems, and observe and anticipate trends. Protection and defense capabilities will have to exist the likes of which barely exist in our everyday lives at the macroscopic level.

All of this will take time. These systems will truly be modern wonders of the world. Given that such systems are anticipated to exist sometime in the next 20 years, and will have accreted their component systems over time, where might such a thing start?


Google and a Health Care ULS

If you read my last post (which I think was posted to blogger before the official announcement by Google), you already knew what I was going to say :-) Google Health. Though Google Health is obviously nowhere near a ULS system in and of itself, why might we suggest Google is moving in this direction?

Here are some interesting bullet points from google.org:
  • InSTEDD: $5,000,000 multi-year grant to establish this nonprofit organization focused on improving early detection, preparedness, and response capabilities for global health threats and humanitarian crises
  • Global Health and Security Initiative: $2,500,000 multi-year grant to strengthen national and sub-regional disease surveillance systems in the Mekong Basin area (Thailand, Vietnam, Cambodia, Lao PDR, Myanmar, and China-Yunnan province)
  • Clark University for Clark Labs: $617,457 to Clark University, with equal funding from the Gordon and Betty Moore Foundation, to support the development of a system to improve monitoring, analysis and prediction of the impacts of climate variability and change on ecosystems, food, and health in Africa and the Amazon
  • HealthMap: $450,000 multi-year grant to conduct in-depth research into the use of online data sources for disease surveillance
Does that sound familiar to anyone besides me? All paranoia-induced sinister thoughts and Google Ads jokes aside, it makes sense that this is where we're going with health. In fact, it makes sense this is where we're going with all of our lives. If data privacy, personal ownership of that data, and security concerns can all be addressed, our lives' information will be better served by moving through systems specially designed to provide maximal use of that information with the least work. It won't just be nice to have, it will be essential.

The balance of power, from individuals all the way to the top of whatever organizations exist in the future, will rest in information. Not like it is today, however. The "information economy" of today (+/- 10 years) will look like kids' games and playgrounds. The information economies this will evolve into will be so completely integrated into human existence that they will resemble the basic necessities like water and food.

If you could find yourself a corner of that market, 20 years before everyone else got there, wouldn't that be a smart business move?


Summary

Our world is changing much more than we realize. We're too tied up in our jobs and gas prices to see the larger picture... to see that our future is already being made, that even in our unconscious actions we are propagating it no less than the cells in our bodies conspire to propagate what will become our children.

In the same way that hominid nomadic/migratory patterns begat the distribution of villages and tribal communities, which in turn gave birth to civilization, our silly little Internet will one day have descendants that dwarf it in size, utility, complexity, and computational power. The amazing thing is that we are the ones that actually get to build them!

There is a lot to research, and just as much to prototype. There is a project for everyone, and by starting now, we can make sure that feudal lords of tomorrow don't have absolute control over our food and water. If you have ideas for collaboration, start talking! Get involved! If you have money, fund some research, sponsor some conferences. In simply writing this blog post, I have uncovered gobs of new research I didn't know was out there. We should all be reading more, catching up, and coding. The projects near and dear to our hearts can get a whole new life within the context of ULS systems.


Holden Web/Divmod Seminar

For folks in the D.C. area (or those who can get there in July), we've got a special half-day seminar planned. Check out our news item about it.

Note that there are only 12 seats, some of which have already been reserved. Get while the getting's good...


Tuesday, June 10, 2008

Bazaar with Subversion and Combinator

For the past couple days, I've been experimenting with using Bazaar and Combinator more or less simultaneously. As you may know by now, Combinator is a tool that wraps some of Subversion's ugliness (mostly merging), helps manage branches, and sets Python paths for development environments. We use it extensively (almost exclusively) at Divmod.

One of my recent side projects has evolved into useful code more quickly than I had anticipated, so I thought I'd put it up on Launchpad in the Twisted Community Code. This, of course, led to questions about one-time imports, mirroring, and dual bzr/svn management. I eventually opted for the last, using the bzr plugin bzr-svn. Not having a lot of experience with Bazaar, I was at a bit of a loss, at first: there don't seem to be any dummy docs to get us beginners up to speed.

Through some painful, time-consuming trial and error and a couple dead ends, I arrived at a process that works for me, and codified it in a script. The comments in that script seemed generally useful, and given the dearth of docs, I thought I'd turn the comments into a blog post.

The Plugin

Once I figured out the right way to use bzr-svn, it was actually much easier than I thought it would be. Here are the basics: you need to have bzr installed and then you need to install bzr-svn, which is actually a bzr plugin and not a separate tool. When you have bzr-svn installed, you will have additional bzr commands at your disposal which, as you might guess, let you interoperate with an svn repository.

Two Become One

So here's how you get started: create your Subversion branch (we use Combinator) and get your working dir ready to code. You can either add dirs and files now, or do that later; it doesn't matter.

Then, in this working directory, perform a bzr checkout:
bzr co . bzrtest
cd bzrtest

This will create a Bazaar branch from your Subversion (Combinator) branch. 'bzrtest' (or whatever you name it) is your new bzr+svn branch and it is here where you'll be doing all of your work, committing, pushing to Subversion, and (in my case) pushing to Launchpad.

If your Subversion repository has a long history, you probably don't want to perform a 'bzr update' -- that'll just end in tears (it could take days to finish, use up lots of memory, require multiple restarts, and consume disk space by the gigaliter).

Launchpad

For my project, I had already registered a branch on Launchpad via the web interface, so I was ready to push the new Bazaar branch just created with the checkout command above:
bzr push lp:~oubiwann/txevolver/dev --use-existing-dir

I then logged into the web interface again, and set this newly pushed branch as the main development effort for the project. All future pushes (during this development phase) will now be done with the following command:
bzr push lp:txevolver

Future commit-push cycles just look like this:
bzr commit --local -m "My message"
bzr push lp:txevolver
Keep in mind that you can do multiple commits with Bazaar before you push to a server.

The Divmod Repo

Once you've done a local commit (or many local commits), you're ready to start pushing changes to your Subversion repository. This is where you use one of the commands that is provided by the bzr-svn plugin:
bzr svn-push svn+ssh://myRepo

And in my case, that's the following:
bzr svn-push \
svn+ssh://divmod.org/svn/Divmod/branches/genetic-programming-2620/Evolver

If you have done more than one local commit since your last push, you'll see a series of commits made to your svn repo after you issue the 'svn-push' command.

All Together Now

The script I mentioned at the beginning of this post is here. With it, I run a single command which extracts my commit message from the ChangeLog diff, commits locally, pushes to the Divmod svn repo and then pushes to Launchpad. A single command does everything I need, now: maintaining changes in both a bzr repo that can be easily branched by others on Launchpad as well as in my Subversion branch at work.

Once this project is ready to merge to trunk (if, in fact, its final home is to be the Divmod svn repo), I'll do an svn up in the Combinator-created branch, unbranch, and commit to trunk. Upon the suggestion of JP, I'll probably also clean up the bzr-svn-created svn props, but other than that, overhead seems to be zero.



Subversion Update: I've been playing with this more, and here's another tidbit I didn't find documented anywhere: if you make a fresh bzr branch of one that had been associated with an svn repo in another working directory, you will need to rebind it to the svn repo you were working with before. You do that with the following command:
bzr bind svn+ssh://svn.yourhost.com/repo/YourProject/trunk

Google Code Update: If you are sync'ing a bzr branch with googlecode's subversion, you will need to prefix your initial push with svn:
bzr push svn+https://yourproject.googlecode.com/svn/trunk

Likewise, if you need to rebind, you'll use the following:
bzr bind svn+https://yourproject.googlecode.com/svn/trunk

Wednesday, June 04, 2008

TX-Theory

Twisted Community Projects is brand-spankin' new and out on Launchpad!

When Glyph named TX-theory, he did not specify what the "TX" stood for, presumably because he did not feel he had the right to name a theory which he had not been able to fully describe. According to Glyph himself:

"TX" stands for "TwistedmatriX", "Transmit", or "Twisted multipleXed", according to taste.

However, as presented in the upcoming docudrama by producer Chris "radix" Armstrong and director Duncan "oubiwann" McGreggor, "TX" stands for "Twisted Extensions," though this is also contended, sometimes by Chris himself who has already pitched a counter-docudrama (to an un-named Hollywood backer) focusing entirely on the non-extensionness of TX-theory.

Cynics have noted that the "TX" could be rot-13 of Glyph Lefkowitz, in an alternate English alphabet with an additional letter inserted between those of "T" and "X." Even more insidious are the rumors that "TX" stands for "Prophecy Blade Epic Destiny Quest Adventure."

From the definitive history (as yet unwritten, but leaked by the time-traveling nanobots whom we all will one day serve) on the matter:

The name TX-theory is slightly ambiguous. It can be used to refer either to the particular seventeen-dimensional Twisted construct that Glyph originally proposed, or to a kind of theory which looks -- in various limits -- like the growing number of asynchronous, event-driven networking frameworks implemented in conventional four-dimensional space-time.

Apologies to the author(s) of the M-Theory article on Wikipedia. For a slightly more real take, read the Labs' announcement.

Update: There are now 14 projects registered as belonging to TX on Launchpad :-)


Monday, June 02, 2008

Divmod/Holden Web Partnership

As mentioned before in a few tweets and blog posts, Divmod's been working with Holden Web a lot lately. After lots of brainstorming, sweet jam sessions, and planning, we're finally ready to talk about it: our latest news item says it all :-)

We've got information up on the site about the topics covered in our joint training courses and workshops. You can contact either Divmod or Holden Web with any questions.