Thursday, April 23, 2009

Generators and Coroutines


This came up in the blog comments yesterday, but it really deserves a post of its own. A few days back, I was googling for code and articles that have been written regarding the use of Python 2.5 generators to build coroutines. All I got were many dead ends. Nothing really seemed to have any substance nor did materials dive into the depths in the manner I wanted and was hoping to find. I was at a loss... until I came across a blog post by Jeremy Hylton.

This was heaven-sent. Until then, I'd never looked at David Beazley's instructional materials, but I was immediately dumbstruck by the beautiful directness, clarity, and simplicity of his style. He is lucid on the topics while conveying great enthusiasm for them. With Python 2.5 generator-based coroutines, this is something that few have been able to do a fraction as well as David has. I cannot recommend the content of these two presentations highly enough:
Even though I've never been to one of his classes, after reading his fascinating background, I'd love the chance to pick his brain... for about a year or two (class or no!).

I've done a little bit of prototyping using greenlets before, and will soon need to do much more than that with Python 2.5 generators. My constant companion in this work will be David's Curious Course. Also, don't give the other slides a pass, simply because you already understand generators. David's not just about conveying information: he's gifted at sharing and shifting perspective.
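For anyone who hasn't seen generator-based coroutines in action, here is a minimal sketch of the general idea: since Python 2.5, a generator can receive values through send(), which lets you chain generators into a push-style pipeline. The names coroutine, grep, and collect are made up for illustration and this is not code taken from the presentations; it is written to run on current Python (it uses the built-in next()).

```python
def coroutine(func):
    """Prime a generator so it is ready to receive values via send()."""
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first yield
        return gen
    return start


@coroutine
def grep(pattern, target):
    """Forward only the lines containing pattern to the target coroutine."""
    while True:
        line = yield
        if pattern in line:
            target.send(line)


@coroutine
def collect(results):
    """Sink: append every received item to the given list."""
    while True:
        item = yield
        results.append(item)


matches = []
pipeline = grep("python", collect(matches))
for line in ["python rocks", "hello world", "generators in python"]:
    pipeline.send(line)
pipeline.close()
```

Data is pushed into the front of the pipeline with send() rather than pulled out of the end with a for loop, which is the shift in perspective that makes coroutines feel different from plain generators.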



Wednesday, April 22, 2009

Functional Programming in Python


Over the past couple years or so I've toyed with functional programming, dabbling in Lisp, Scheme, Erlang, and most recently, Haskell. I've really enjoyed the little bit I've done and have broadened my experience and understanding in the process.

Curious as to what folks have done with Python and functional programming, I recently did a google search I should have run years ago and discovered some community classics. I'm posting them here, in the event that I might spare others such an error in oversight :-)
I've always enjoyed David's writing style, though I had never read his FP articles until now. They were quite enjoyable and have aged well, despite referencing older versions of Python. Andrew's HOWTO provides a wonderful, modern summary.

I make fairly regular use of itertools but have never used the operator module -- though I now look forward to some FP idiomatic Python playtime with it :-) I've never used functools, either.
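As a taste of what that playtime might look like, here is a small sketch that combines the three modules. It assumes a current Python 3 (itertools.accumulate and functools.reduce as an import are not available in the Python versions the older articles discuss), and the variable names are only for illustration.

```python
from functools import partial, reduce
from itertools import accumulate, takewhile
import operator

numbers = [1, 2, 3, 4, 5]

# reduce + operator.mul replaces an explicit loop or a lambda.
product = reduce(operator.mul, numbers, 1)

# partial fixes one argument, producing a new single-argument function.
double = partial(operator.mul, 2)
doubled = list(map(double, numbers))

# itertools composes lazily: running totals, cut off once they reach 10.
running = list(takewhile(lambda total: total < 10, accumulate(numbers)))
```

The operator module mostly saves you from writing trivial lambdas, which is why it pairs so naturally with reduce and partial.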

Enjoy!



Tuesday, April 21, 2009

After the Cloud: To Atomic Computation and Beyond


After the Cloud:
  1. Prelude
  2. So Far
  3. The New Big
  4. To Atomic Computation and Beyond
  5. Open Heaps
  6. Heaps of Cash
  7. Epilogue

To restate the problem: we've got cloud for systems and we've got cloud for a large number of applications. We don't have cloud for processes (e.g., custom, light-weight applications/long-running daemons).

Personally, I don't want a whole virtual machine to myself, I just need a tiny process space for my daemon. When my daemon starts getting slammed, I want new instances of it started in a cloud (and then killed when they're not needed).

What's more, over time, I want to be writing my daemon better and better... using less of everything (memory, CPU, disk) in subsequent iterations. I want this process cloud to be able to handle potentially significant changes in my software.

Dream Cloud

So, after all that stumbling around, thinking about servers in the data center as the horsepower behind distributed services, and then user PCs/laptops as a more power-friendly alternative, the obvious hit me: phones. They are almost ubiquitous. People leave them on, plugged in, and only use them for a fraction of that time. What if we were able to construct a cloud from cell phones? Hell, let's throw in laptops and netbooks, too. And Xboxes, Wiis, and TiVos. Theoretically, anything that could support (or be hacked to support) a virtual process space could become part of this cloud.

This could be just the platform for running small processes in a distributed environment. And making it a reality could prove to be quite lucrative. A forthcoming blog post will explore more about the possibilities involved with phone clouds... but for now, let's push things even further.

When I mentioned this idea to Chris Armstrong at the Ubuntu Developer Conference last December, he immediately asked me if I'd read Charles Stross' book Halting State. I had started it, but hadn't gotten to the part about the phones. A portion of Stross' future vision in that book dealt with the ability of users to legally run programs on others' phones. I really enjoyed the tale, but afterwards I was ready to explore other possibilities.

Horse-buggy Virtualization


So I sat down and pondered other possibilities over the course of several weeks. I kept trying to think like business visionaries, given a new resource to exploit. But finally I stopped that and tried just imagining the possibilities based on examples from computing and business history.

What's the natural thing for businesses to do when someone invents something or improves something? Put new improvements to old uses, potentially reinventing old markets in the process. That's just the sort of thing that could happen with the cloudification of mobile devices.

For example, imagine this:
  • Phone cloud becomes a reality.
  • Someone in a garage in Silicon Valley buys a bunch of cheap phones, gumstix, or other small ARM components, rips off the cases, and sells them in rack-mountable enclosures.
  • Data centers start supplementing their old hardware offering with this new one that lets them use phone cloud tech (originally built for remote, hand-held devices) to sell tiny fractions of resources to users (on new, consolidated hardware... like having hundreds of phone users in a single room with full bars, 24/7).
  • With the changing hardware and continuing improvements in virtualization software, more abstraction takes place.
  • Virtualization slowly goes from tool to prima materia, allowing designers not to focus on old-style, horse-drawn "machines" like your grandpa used to rack, but rather abstract process spaces that provide just what is needed, for example, to enable a daemon to run.
Once you've gotten that far, you're just inches from producing a meta operating system: process spaces (and other abstracted bits) can be built up to form a traditional user space. Or they can be used to build something entirely different and new. The computing universe suddenly gets a lot more flexible and dynamic.

Democritus Meets Modern Software

So, let's say that my dream comes true: I can now push all my tiny apps into a cloud service and turn off the big machines I've got colocated throughout the US. But once this is in place, how can we improve our applications to take even better advantage of such a system, one so capable of massively distributing our running code?

This leads us to an almost metaphysical software engineering question: how small can you divide an application until you reach the limits of functionality, where any further division would be senseless bytes and syntax errors? In terms of running processes, what is your code atom?

Prior to a few years ago, the most common answer would likely have been "my script" or "my application". Unless, of course, you asked a Scheme programmer. Programming languages like Scheme, Haskell, and Erlang are finding rapidly increasing acceptance as solutions for distributed programming problems because functional programming languages lend themselves easily to the problem of concurrency and parallelism.

If we had a massive computing cloud (atmosphere, more likely!) where we could run code in virtual process spaces, we could theoretically go even further than running a daemon: we could split our daemon up into async functions. These distributed functions could be available as continuously running microthreads/greenlets/whatever. They could accept an input and produce an output. Composing distributed functions could result in a program. Programs could change, failover, improve, etc., just by adding or removing distributed functions or by changing their order.
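To make the idea of composing functions into a program a little more concrete, here is a purely local sketch. Ordinary in-process Python functions stand in for what would be remote, continuously running functions; there is no networking, scheduling, or failover here, and every name (compose, parse, square_all, total) is hypothetical.

```python
from functools import reduce


def compose(*stages):
    """Chain stages left to right into a single callable 'program'."""
    def program(value):
        return reduce(lambda acc, stage: stage(acc), stages, value)
    return program


# Hypothetical stand-ins for functions that would live somewhere in the cloud.
def parse(text):
    return [int(tok) for tok in text.split()]


def square_all(nums):
    return [n * n for n in nums]


def total(nums):
    return sum(nums)


# A program is just an ordered collection of functions...
program_v1 = compose(parse, square_all, total)
# ...so changing the program means adding, removing, or reordering them.
program_v2 = compose(parse, total)

result_v1 = program_v1("1 2 3")
result_v2 = program_v2("1 2 3")
```

Each stage accepts an input and produces an output, so the program is nothing more than the order in which stages are listed. That is the property that would let a distributed version swap, drop, or reroute stages without touching the others.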

From Atoms to Dynamic Programs

Once we've broken down our programs into distributed functions and have broken our concept of an "Operating System" down into virtual process spaces, we can start building a whole new world of software:
  • Software becomes very dynamic, very distributed.
  • The particulars of hardware become irrelevant (it just needs to be present, somewhere).
  • We see an even more marked correlation between power consumption and code, where functions themselves could be measured in joules consumed per second.
  • Just for fun, let's throw in dynamic selection of functions or even genetic algorithms, and we have ourselves one of the core branches of the predicted Ultra-large Scale Systems :-)
I mention this not for cheap thrills, but rather because of the importance of having a vision. Even if we don't get to where we think we're going, by looking ahead and forward, we have the opportunity to influence our journey such that we increase the chances of getting to a place equal to or better than where we'd originally intended.

From a more practical perspective: today, I'm concerned about running daemons in the cloud. Tomorrow I could very well be concerned about finer granularity than that. Why not explore the potential results of such technology? Yes, it may prove infeasible now; but even so, it could render insights... and maybe more.

A Parting Message

Before I wind this blog post down, I'd like to paste a couple really excellent quotes. They are good not so much for their immediate content, but for the pregnant potentials they contain; for the directions they can point our musings... and engineerings. These are two similar thoughts about messaging from two radically different contexts. I leave you with these moments of Zen:

On the Erlang mail list, four years ago, Erlang creator Joe Armstrong posted this:
In Concurrency Oriented (CO) programming you concentrate on the concurrency and the messages between the processes. There is no sharing of data.

[A program] should be thought of thousands of little black boxes all doing things in parallel - these black boxes can send and receive messages. Black boxes can detect errors in other black boxes - that's all.
...
Erlang uses a simple functional language inside the [black boxes] - this is not particularly interesting - *any* language that does the job would do - the important bit is the concurrency.
On the Squeak mail list in 1998, Alan Kay had this to say:
...Smalltalk is not only NOT its syntax or the class library, it is not even about classes. I'm sorry that I long ago coined the term "objects" for this topic because it gets many people to focus on the lesser idea.

The big idea is "messaging" -- that is what the kernal of Smalltalk/Squeak is all about... The key in making great and growable systems is much more to design how its modules communicate rather than what their internal properties and behaviors should be. Think of the internet -- to live, it (a) has to allow many different kinds of ideas and realizations that are beyond any single standard and (b) to allow varying degrees of safe interoperability between these ideas.

If you focus on just messaging -- and realize that a good metasystem can late bind the various 2nd level architectures used in objects -- then much of the language-, UI-, and OS based discussions on this thread are really quite moot.

Resources

Next up: The Business of Computing Atmospheres



Monday, April 20, 2009

After the Cloud: The New Big


After the Cloud:
  1. Prelude
  2. So Far
  3. The New Big
  4. To Atomic Computation and Beyond
  5. Open Heaps
  6. Heaps of Cash
  7. Epilogue

Intermission

I've made a few hints so far about what cloud service I'd like to see come into being, and at the end of this post, we'll get closer to discussing that. Hang in there: the post after this one will describe that in more detail. Then, after that, there will be at least one post which will take a peek at some of the many business opportunities that could come from this.

A Passing Comment

At PyCon 2006 in Dallas, TX, an after-hours event was held in a local bookstore. At one point during that evening, Itamar, Moshe and I got into a discussion about miniaturization and Moshe went off on a hilarious rant that Itamar and I just sat back and enjoyed. His whole tirade was based on the beauty and perfection of gumstix. This was the first I'd heard of them; I had no idea a product like that was on the market, and it hit me like a ton of bricks.

For the next day or so all I could think about was buying a boxload of gumstix computers and doing something with them -- anything! And not just because they were the coolest toys ever, but because there was something about them that I could just feel was a part of the future of computing (see my 2004 post on Dinosaurs and Mammals). It seemed that these miniature devices could help prototype what was destined to be one of the most exciting fields in the coming years for both systems and application engineers.

Sadly, I never did get that box :-) But neither did I stop thinking about them. Confronted with the problem of small distributed services sitting on big, barely-used iron, gumstix haunted my musings.

Tiny Apps in the Cloud?

When I was at Divmod, one of the strategies that Glyph and I were working on concerned Twisted adoption in web hosting and cloud environments. The differences between CGI and Twisted applications are magnified when one considers a cloud environment like Mosso and one that would suitably support Twisted design principles. I spent a lot of time pondering the ramifications of that one, let me tell you. A potential merger permanently postponed those business possibilities, but a nice side benefit was the forking of Python Director into a pure-Twisted conversion, txLoadBalancer (with the beginnings of native, in-app load-balancing support).

Thoughts of adjusting tiny apps to be able to run on big cloud hardware still grated, though. It felt dangerously close to pounding round pegs into square holes. What I really wanted was something closer to the future hinted at by Ultra Large-Scale Systems research: massively distributed, fault-tolerant services running on everything :-) Until then, though, I would have been satisfied with tiny apps on tiny hardware, consuming only the resources they need in order to provide the service they were designed for.

This brought up ideas of distributed storage, memory, and processing as well as the need for redundancy and failover. But tiny. All I could see was tiny hardware, tiny apps, tiny protocols, tiny power consumption. For me, tiny was big. The easiest "tiny" problem to address with small devices was storage. And I already knew the guys that were working on the problem.

Distributed Storage Done Right

There's an odd, rather abstract parallel between EC2 and Tahoe (a secure, decentralized, fault-tolerant filesystem). EC2 arose in part from a corporation acting out of its best interests: turn a liability into an asset. For Tahoe, the "body" in question isn't a corporation, but rather a community. And the commodity is not bottom lines, but rather data owned and treasured by members of a data-consuming community.

Here's a quick description of Tahoe from a 2008 paper:
Tahoe is a storage grid designed to provide secure, long-term storage, such as for backup applications. It consists of userspace processes running on commodity PC hardware and communicating with [other Tahoe nodes] over TCP/IP.
Tahoe is written in Python using Twisted and a capabilities system inspired by those defined by E. But what does this mean to a user? It means that anyone can set up and run a storage grid on their personal computers. All data is encrypted and redundant, so you don't need to trust members of the community (your data grid), you just need to set aside some disk space on your machines for them.

In a message to the Tahoe mail list, I responded to an associate who was exploring Tahoe for in-memory use by Python mapreduce applications. I wanted in-memory distributed storage for a different use case (tiny apps on tiny devices!) but our interests were similar. It turned out one of the primary Tahoe developers was working on related code; something that could be used as the basis for future support for distributed, solid-state devices.

Here's some nice dessert: Twisted coder David Reid was reported to have gotten Tahoe running on his iPhone. Now we're talking ;-) (Update: David has informed me that Allmydata has a Tahoe client that runs on his iPhone).

Processing in the Right Direction

But what about the CPU? Running daemons? Can we do something similar with processing power? If a whole virtual machine is too much for users, can we get a virtual processing space? I want to be able to run my process (e.g., a Twisted daemon) on someone else's machine, but in such a way that they feel perfectly safe running it. I want Tahoe for processes :-)

As part of some recent experiments in setting up a virtual lab of running gumstix ARM images, I needed to be able to connect multiple gumstix instances in a virtual network for testing purposes. In a search for such a solution, I discovered VDE. Then, unexpectedly, I ran across a couple fascinating wiki pages on the site of related super-project Virtual Square Networking. Their domain is currently not resolving for me, so I can't pull the exact text, but here's a blurb from a sister project on SourceForge:
View OS is a user configurable, modular process virtual machine, or system call hypervisor. For each process the user is able to define a "view of the world" in terms of file system, networking, devices, permissions, users, time and so on.
Man, that's so close, I can almost taste it!

Where is all this techno-rambling going? Well, I'm sure some of you have long since guessed by now :-) Regardless, I will save that for the next post.

Oh, and yes: tiny is the new big.

Next Up:
A Passing Message



Sunday, April 19, 2009

After the Cloud: So Far


After the Cloud:
  1. Prelude
  2. So Far
  3. The New Big
  4. To Atomic Computation and Beyond
  5. Open Heaps
  6. Heaps of Cash
  7. Epilogue

Systems Engineering in a Box

The recent redefinition of "the cloud" as a service and commodity is a brilliant bit of frugal resource management (making use of idle resources in an expensive data center) coupled with flawless marketing. Yes, from a business perspective, that's an amazing coup. But it's the 30,000 foot technical perspective that really impresses me:

In the same way that software frameworks, their libraries, and best practices have, through the trials of the last 40 years, productized application engineering, the cloud has started to experience something similar. What everyone is now calling the cloud is really the productization of systems engineering.

Systems engineering (and the management of related resources) has proven to be an expensive, time-consuming endeavor best left to the experts. Sadly, those that need it are often in the unenviable position of having to determine who the experts are without having the proper background to do so effectively. When the planning, building, and management of large systems works well, it's a labor of sweat and blood. When it doesn't, it's the same thing, with a nightmare tinge about the whole thing coupled with an odd time-dilation effect.

It seems that in applicable circumstances, some businesses are spared that nightmare by using a cloud service or product.

Bionic CGI

As someone with a long history and interest in application development, I was particularly keen on Google App Engine when it came out. This was a different take on the cloud, one that Mosso also seems to be embracing: upload an application that is capable of having its data access and views distributed/load balanced across multiple systems (virtual or otherwise).

This is essentially CGI's grandchild. You have an application that needs to be started up by any number of machines in response to demand. A CGI app in Mosso will probably need very few (if any) adjustments in order to run "in the cloud." Google is a special case, since developers are using custom, black-box infrastructure built by Google (for insights into this, check out these papers), but I'd be willing to bet someone lunch that there is room for a CGI analogy at some level of Google App Engine. I guess with Google, we kind of have both application and systems engineering in a box, insofar as the systems support your application.

At any rate, it's CGI better than it was before. Better, stronger, faster.

The Rub

However fascinating these cloud offerings may be, I find myself not getting what I need. As a developer of Twisted applications, I'm interested in small apps. Hell, I don't even like running databases and full-blown web servers. A while ago, I spent a couple years working on some Twisted-based application components that could be run as independent services (thus load-balanceable) and completely replace the standard web server + database + lots of code routine for application development.

So what about developers out there like me, who want to run tiny apps? We don't need "classic" web hosting, nor CGI in the cloud, nor cloud-virtualized versions of large machines.

As a segment of the population, business consideration for developers such as myself might seem like a waste of time. But before dismissing us, consider this:
  1. Exploring small niches like this one often leads to interesting revelations.
  2. Market segments that have proven quite vibrant may be able to expand into even greater territories (e.g., the iPhone apps phenomenon).
Next up: Tiny > *



Saturday, April 18, 2009

After the Cloud: Prelude


After the Cloud:
  1. Prelude
  2. So Far
  3. The New Big
  4. To Atomic Computation and Beyond
  5. Open Heaps
  6. Heaps of Cash
  7. Epilogue

These days, it seems that no matter where we go, we hear something about "the cloud." It's not really buzz anymore... it has become far too accepted and widely discussed to be that. For some organizations it's actually part of their current, everyday infrastructure. For others, it soon will be. As far as I'm concerned, now's the perfect time to start discussing what's next :-)

If you've spent any time reading some of the blog content I've managed to post over the past several years, you've probably noted that I like to explore the long view (if rather informally). Well, that's what I've got in store for you now: a series of blog posts that explore the long view of a post-cloud industry. Hopefully, with some new twists and turns along the way.

First off, I want to cover some basic ground, so the first couple posts might be a little less interesting than those that follow. Fortunately, I've been pondering these particular ideas since my month sabbatical last August -- this means I've already got most of the material written and ready to go!

These posts are going to take a peek at practical, hands-on ideas regarding the ways in which one might make use of current nascent tech to build prototypes for tomorrow's infrastructure, what that infrastructure might be, business ideas about what to do with that tech, and even future possibilities for information-based markets.

Hope you enjoy it as much as I've enjoyed thinking about it :-)


Friday, April 17, 2009

ULS-SIG: New Python Special Interest Group


After several discussions at the end of last summer and an incubation in Meta-SIG, Steve Holden, Jim Baker and I are pleased to announce the Python special interest group for ultra large-scale systems. For more about ULS systems, you might want to read this post or this one. The SIGs page has been updated, so you can find us there with the rest of them (subscribe and archive links, too), and we've even got our own page :-)

The initial group of interested parties (and thus the first members of the list) represent an interesting cross-section of the Python community, including the following:
  • Jython hackers
  • Twisted hackers
  • Stackless hackers
  • XMPP experts
  • MMORPG developers
  • SOA and Business Process consultants
  • General technology and software companies
The technological umbrella of ULS systems covers a vast array of topics and interests, but the basic principle unifying all of these is their potential contribution to making massive, highly distributed systems a functional reality.

One overview of ULS systems research states that we currently don't have an effective understanding of software (its nature, development, and management) at the scale anticipated for ULS systems. These "fundamental gaps" will hinder the development of such systems until they can be crossed. Doing so will require breakthroughs in many fields with insights and experience gained over time.

Python programmers represent an extraordinary segment of the population: creative, curious, motivated, communicative, and deeply intelligent individuals. Our community is filled with minds that continuously produce solutions for a vast array of problems across a great many disciplines.
If there's any one group out there that could pull this off, I think it's ours :-)

A future post will provide a sketch of areas of interest in Python that are already sneaking up on the gaps outlined in ULS systems reports. There is a lot of software (and supporting libraries) with an obvious connection to ULS systems, but even more fun are the ones without one... and isn't it always the darkest corners that yield the most unlooked-for surprises?

While you're waiting for that, though, feel free to join us on the mail list!


Thursday, April 16, 2009

Newest Members of the PSF


Since it's now official, I can blog about it: I was delighted to discover that JP, radix, glyph, and I were voted into the Python Software Foundation this year at PyCon :-) Not only is that four for Twisted, but it's two for Divmod and two more for Canonical.

There was another Canonical employee -- Matthias Klose -- voted in as well, and with Barry Warsaw and Gustavo Niemeyer, that brings a tally for Canonical/Ubuntu to at least 5, and maybe more (let me know if I've missed you!).

It gets even better, though: check out the rest of the new members list:
  • Jim Baker
  • Ben Bangert
  • James Bennett
  • Graham Dumpleton
  • Martijn Faassen
  • Michael Fletcher
  • Michael Foord
  • Doug Hellmann
  • Adrian Holovaty
  • Jacob Kaplan-Moss
  • Jesse Noller
  • Benjamin Peterson
  • Ted Pollari
  • Mark Ramm
  • Malcolm Tredinnick
  • Kirby Urner
  • Robert Dino Viehland
  • Thomas Waldmann
  • Frank Wierzbicki
I am thrilled to be a PSF member and look forward to deepening my support of Python through this new level of involvement. Even more, though, I'm honored and delighted to be working with both the esteemed veteran members and these amazing new additions.