Thursday, July 31, 2008

New ULS Systems Blog

I'm currently drafting two new ultra large-scale systems blog posts, with one in particular being almost ready to go. While writing more on one of them today, a very cool thing happened: I received an email from the ULS systems team at the Software Engineering Institute of Carnegie Mellon University letting me know that they've got a new ULS blog site up. You can check it out here:

Be sure to read all the articles and go back often! As you can imagine, I'll be spending a lot of time there :-) I have a feeling this is the beginning of an emerging ULS community...

As for my forthcoming ULS systems blog posts, one concerns SOA and the other discusses currently extant code bases in the Python and Twisted Python communities that can be used for building ultra large-scale systems (or prototypes thereof) quickly and efficiently.

Tuesday, July 29, 2008

New Directions

Yesterday I submitted my resignation as COO to the Divmod officers, and today I forwarded it to the rest of the team. Divmod is headed in a new and wonderful direction, and I'm happy to have contributed to raising public awareness about our team, community, and the tech we use, thus increasing its value in the market. I am taking this opportunity to rest and then pursue interests of my own.

Even more than that contribution, I'm delighted to have worked with these guys for the past year. I've long been a supporter and fan of Divmod (since shortly after its inception, in fact). I was a community contributor before I was an employee and I will remain so for the foreseeable future. But I gotta say, that team is incomparable. The combination of technical excellence, creativity, pragmatic problem solving, quality engineering, humor and insight has made my time there a rich experience. They have made my time at Divmod one for the personal record books.

I'm ready for a break, though; the past year has been a long, hard pull...

I was originally courted by them for management, due to my community work. I deferred, and worked as a coder instead. This ended up being invaluable, as far as the insight it provided. After some early successes with a product release, I was offered the position of CEO, but deferred there too, with the mutual agreement that COO might be a better match for my skills. After a couple months as COO, I was put in charge of managing the direction of the company and raising funds, so I ended up being acting CEO anyway. I poured my heart and soul into Divmod, and it looks like that has paid off: the team is happy and they're headed for some good success. What's more, that leaves me in the enviable position of finally being able to surrender a massive workload :-)

For the next week or two, I'll be camping in the Rocky Mountains catching up on some rest and enjoying nature at her best :-) I've also got some fun sci-fi reading to catch up on (Stross and MacLeod).

When I get back, I'll be exploring 6-12 month consulting contracts... so if anyone hears anything interesting, do let me know!

Wednesday, July 23, 2008

In Memoria: The Great Work

The OSCON Tuesday Night Extravaganza was just fabulous: awards, laughter, brain-bending, and affirmation. The primary speakers were Mark Shuttleworth, r0ml, and Damian Conway; but I'm going to be focusing on r0ml's talk right now :-) Well, in part, anyway.

Let's back up to Monday night, first: Alex Martelli and I had a chance to wax philosophical about programming and software. It was wonderful, both because it revealed Alex's code-spirit and because of the simpatico I felt as his passionate idealism resonated with mine. While Alex talked of the holy architecture of mosques and cathedrals, of the contributions that such artisans as stonecutters, masons, sculptors, and calligraphers made, he emphasized how each individual played an essential role in bringing these wondrous works into being, that each act was an offering to the ideals that formed the basis of the respective belief system.

What's more, though, Alex extended the analogy from religion to mysticism, saying that even more than builders of such great structures, coders are alchemists engaged in the magnum opus. We are the transmutators. In our crucibles, the opposites of function and beauty unite; performance and elegance are commingled to produce the perfection of our art. Alex was careful to point out that he intended perfection in both an abstract and practical sense. On one hand, being able to create and actually deliver code that others found useful, regardless of the sex appeal (or lack thereof), can be viewed as a form of perfection. It is accomplishment; attainment of the goal. On the other hand, it's just something that someone wanted us to write; it's not a proof of Fermat's Last Theorem. It's useful; it serves a specific function.

Before I get to r0ml's talk, I want to mention UQDS as employed by the Twisted and Divmod communities. I think it's phenomenal and I enjoy working with that system. It's a well thought-out and proven process that tends to produce code of an extremely high quality. However, it's not my natural tendency. I like quick and dirty prototypes; a little messy code goes a long way. I like to throw something out there and then fix it up and apply polish incrementally, as dictated by need.

This is why I've been enjoying the Twisted Community Code project/group on Launchpad. Not only do you have the benefits of using a tool like Bazaar that lets one branch other projects on a whim, but you've got a community space to put these explorations, where others can easily see what you're doing, check it out, and try something of their own. (There's a whole 'nother blog post I have coming about that.) However, this finally brings me to r0ml's talk: a new spin on the development process.

For those of you that have seen his phenomenal rhetoric talks, you'd be delighted to see what he did :-) He established a nice mapping from both Microsoft's development process as well as the one defined by Rational. He used the five canons of classical rhetoric: inventio, dispositio, elocutio, memoria, and pronuntiatio. However, the really brilliant thing was where he started the process: smack in the middle, right where I like to do it :-) And he justified this beautifully. His mapping was the following:
  • Memoria = Commit / Update
  • Pronuntiatio = Run / Use
  • Inventio = Bug Reporting / Patch Submission
  • Dispositio = Triage
  • Elocutio = Integration

The idea here being this: get what you've got done out there and in front of people's eyes. Everyone knows it's crap; don't worry about it. Get it running and get others running it. Work on what matters most and integrate the changes. Repeat and continue.

I like to tease other Twisted devs that I tend not to do test-driven development, but bug-driven testing. What's interesting is that we both start with a requirements doc: for them, it's a development plan; for me, it's a bug/TODO list. The difference is that they then engage in Inventio whereas I start with Memoria. As r0ml said, with this model there is no development, there is only maintenance.

One of the other great things that r0ml mentioned about this process is that it not only gets you the developer started more quickly, it gets others started at the same time. Each programmer is engaged in a macroscopic genetic programming effort: everyone takes the source, mutates it, evolves it, reviews it, and the best implementations (or parts thereof) survive to become the basis for the next generation. Everyone gets to write at the same time; no one is blocked.

This development approach evokes images of philosophers from the Middle Ages sending letters to each other in cryptic alchemical symbols and diagrams, with all the implicit and explicit layers of meaning. I see this methodology as establishing the true foundation of the open source art: a gnostic, spirit-(of-open-source)-ual transformation that brings us to improved states of mind and clarity.

The perfection of our art, whether sublime or mundane, can be merged in the mind of the developer as one... this union being our philosopher's stone. With each release of software engaged in this manner, we iterate the Great Work.

Wednesday, July 16, 2008

OSCON 2008

Hey all, thanks to a friend's amazingly generous offer, I'll be attending OSCON this year :-) I only have to pay for my airfare and food! I've contacted several people already who I know are going to be there (including Van Lindberg of Haynes and Boone and Bradley Kuhn of the SFC and the SFLC), and look forward to meeting up with others. Leave a comment or email me if you're going to be there!

Saturday, July 05, 2008

Native LoadBalancing for Twisted Apps

Yesterday, right before midnight, I tagged the 1.1.0 release of txLoadBalancer on Launchpad after completing the last of the planned features. There are some pretty radical changes that have been developed for this release... and the coolest part is this is just the beginning :-) (See the TODO if you don't believe me!)

You can check out from lp:~oubiwann/txloadbalancer/1.1.0 or download from PyPI. If you're a PyPI expert, I've got some questions for you at the end of this post... Been having some sucky experiences with PyPI lately :-(

So here's what's going on with txLoadBalancer:

Improved API

The biggest thing you'll notice if you're switching from PythonDirector is the massive overhaul the API has undergone. Things are cleaner and generally more modern, with a concise and well-defined module layout.

New Load Balancing Algorithm

I've added support for a weighted host scheduler. Given a weight that represents the frequency with which a host should be used, a host will be randomly selected, based on its weight. For example, with two hosts, one having a weight of 1 and the other having a weight of 3, host 2 will be chosen about 75% of the time and host 1 will get about 25% of the requests.

Right now, this algorithm has to make several calls to other parts of the code in order to get all the data it needs (it also builds some crazy iterators). As such, it's rather slow and performs poorly when compared to the very light-weight least-connections algorithm. That being said, the next release will include optimizations for the weighted scheduler that make use of a Twisted timer and caching.
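To make the weighted selection described above concrete, here is a minimal sketch in plain Python. It is not the txLoadBalancer implementation: the function name `pick_weighted_host`, the `(name, weight)` pair layout, and the host names are all assumptions made for illustration. It only shows the general idea of choosing a host with probability proportional to its weight.

```python
import random
from collections import Counter


def pick_weighted_host(hosts, rng=random):
    """Return one host name, chosen with probability proportional to its weight.

    `hosts` is a list of (name, weight) pairs with positive weights.
    This helper is hypothetical; it is not part of the txLB API.
    """
    total = sum(weight for _, weight in hosts)
    # Pick a point on the line [0, total] and find the host whose
    # segment of that line contains it.
    threshold = rng.uniform(0, total)
    running = 0
    for name, weight in hosts:
        running += weight
        if threshold <= running:
            return name
    # Fallback for floating-point edge cases.
    return hosts[-1][0]


# The two-host example from the text: weights 1 and 3.
hosts = [("host1", 1), ("host2", 3)]
rng = random.Random(42)  # seeded so repeated runs behave the same
counts = Counter(pick_weighted_host(hosts, rng) for _ in range(10000))
share = counts["host2"] / 10000
print("host2 share of requests: %.3f" % share)
```

Over many requests the share for host2 should settle near 0.75. A production scheduler would likely precompute the cumulative weights rather than re-summing them on every request.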

Native Twisted Load-Balancing

Here's the sexiest part: you can now load-balance your Twisted application by using the txLB API; you don't even need to run the load-balancer as a separate app! This evolved as a feature after a conversation with an as-yet unnamed cloud hosting provider, a follow-up discussion with the Divmod team, and then some quiet pondering about ways in which Twisted applications could be supported in cloud/grid/massively-multi-core architectures.

The "self load-balancing" API in txLB is not a complete solution for grid-hosting, but it is a first step in one direction (we've been discussing lots of others, too, including the use of our deployment tool).

Before I show you how to use the self load-balancing API, let's take a quick look at a normal Twisted application service:

You start that with the command twistd -noy myweb.tac. For use with the next example, you can also start two more, one on port 7002 and the other on port 7003.

Now here's what you do to make a self load-balanced app:

As you would expect, you need to indicate the proxy host:port, the algorithm to use, and the hosts that are to be balanced. The host setup assumes that you have three services running on localhost ports 7001, 7002, and 7003. All that's needed now is to just run that code with the usual twistd -noy myapp.tac. Also, for demonstration purposes, this is a somewhat simplified example of what is possible.

This may seem like a lot of extra work when compared to the simple web host above, but think about it: we're load-balancing here :-) This saves you from having to manage yet another application. With a few extra lines of code, you can keep it all in one place and have it manage itself.

Note that this API is in development and continuing to improve. The example above is from code running in trunk. For the more verbose configuration that is in the 1.1.0 release, be sure to see ./bin/txlbWeb.tac from the source tarball. To play with the latest and greatest, you'll want to check out the code here: lp:txloadbalancer.

Other Goodies

Here is some other good stuff in the release:
  • You can now ssh into a txLB instance and manipulate the load-balancer in real time from an interactive Python interpreter.
  • You can change the proxy to listen on a different port while the application is running (no restart required!).
  • Changes made to the configuration while running are no longer volatile; they are saved to disk (and your old config gets backed up).
  • Work from Apple, Inc. was included in this release, too (they use the old PythonDirector in their Calendaring server). This includes a bug fix and management socket feature.
  • There is a significant jump in performance between this release and the previous one. I believe this to be due to the separation of concerns in the API, but haven't yet confirmed that.
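For the item above about configuration changes being saved to disk with the old config backed up, the general pattern can be sketched as follows. This is an assumption-laden illustration, not txLB's actual code: the function name `save_config`, the `.bak` suffix, and the file name `config.xml` are all hypothetical.

```python
import os
import shutil
import tempfile


def save_config(path, text):
    """Write `text` to `path`, keeping the previous contents in `path + '.bak'`.

    Hypothetical helper for illustration; not part of the txLB API.
    """
    # Preserve the old configuration before overwriting it.
    if os.path.exists(path):
        shutil.copy2(path, path + ".bak")
    # Write to a temporary file in the same directory, then swap it into
    # place so a crash mid-write cannot leave a half-written config.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    os.replace(tmp_path, path)


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    config_path = os.path.join(workdir, "config.xml")
    save_config(config_path, "listen 8080")
    save_config(config_path, "listen 9090")
    with open(config_path) as current, open(config_path + ".bak") as backup:
        print(current.read(), "|", backup.read())
```

Writing to a temporary file first is a design choice of this sketch; the post does not say how txLB performs the save.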

Coming Work

There are a lot of exciting features coming for txLB. Just to name a few:
  • improved weighted algorithm
  • resources-based algorithm (a scheduler that determines the weight of a proxied host by memory, CPU, etc., utilization)
  • smarter proxied host failover and recovery
  • a heartbeat manager
  • txLB-powered application cloning (when started, an app will determine if it needs to run the clone as the managing load-balancer or simply as a proxied host)
  • auto-discovery of balanced hosts
  • proxy fail-over (a balanced host taking over as manager in the event that the manager goes down)
  • ActiveMQ/Stomp integration
  • LDAP/RADIUS authentication

Additionally, I'll be putting together some basic performance metrics contrasting Apache and load-balanced Twisted apps. I will also be comparing previous versions of txLB/PythonDirector with the latest release(s).

Problems with PyPI

I will close this post on a sad note: PyPI used to be an amazing experience for me (a couple years ago, when it was still being called "cheeseshop"). Everything worked as it was supposed to. This hasn't been the case when I've used it recently (over the past few months).

For all that I say about PyPI, I allow for the fact that I may just be missing something, and it may be entirely my fault. That being said, I spent about 3 hours online last night combing through the SIG mail list, the bug list on SourceForge, and blog posts about setuptools and PyPI, and could find no answers to my questions. Well, with the possible exception of a bug report, but it doesn't look like it was confirmed by a PyPI team member, so I'm not sure if it's valid or not.

Here are my issues:
  • When I upload my project using python setup.py [sdist|bdist_egg] upload, no metadata defined in my setup() function is presented on my package's PyPI page. When I click the metadata link, it's only got three sparse lines.
  • When I manually upload from the package's PKG-INFO itself, all the metadata is presented on the page as it should be, with the exception of the long description. It is in plain text instead of ReST (I am checking that it is valid ReST using docutils settings of reporter.halt_level = 5, reporter.report_level = 1, settings.pep_references = False, and settings.trim_footnote_reference_space = None; these are the same settings that Zope Corp uses to verify the ReST that it uploads to PyPI).
  • When I manually edit the long description in the form, I get the same thing: plain text, no ReST.
  • When I upload a package that is displayed properly on PyPI (such as zc.twist; uploaded as one of my projects by changing the name), I get the same problem (this is why I think it might be something that I'm doing wrong...): no metadata, and when I upload the PKG-INFO manually, no ReST.
Why, oh why, cruel fates, does this not work any more? I used to be able to upload to PyPI without any of these issues...

Thursday, July 03, 2008

Divmod Tech: Making the "Next Gen" Grade

Last night, after I had already posted the latest Twisted in the News, I came across another post that would have made the list had I found it sooner. However, this is a good opportunity to give it a little extra attention.

The title of the post is "Next Gen Web Dev: Playing with Python Twisted/Nevow/Athena" and I gotta say, that made my day :-) Between that post and Colin Alston's post that I mentioned in the News, Nevow had a good week. And people are appreciating it for the right reasons. It may not be the easiest web framework to use and certainly not the best documented, but when you need the flexibility to interact with your (Twisted) web server in particular ways as well as benefit from the functionality that COMET provides, Nevow comes out shining.

It's also refreshing to see new developers entering the community who not only see the potential of these tools (designed with that potential in mind) but are capable of taking advantage of it immediately. If nothing else, the author of that post has motivated me to finally merge the Athena tutorial to trunk in order to bring the publicly available and published content in sync with the new code that's in the branch.

Update: Along similar lines, but with more details, Tristan has provided an excellent write-up on the motivation to use Twisted/Nevow/Axiom/Mantissa. Be sure to check it out!