Electric Duncan

Thursday, June 12, 2008

Holden Web/Divmod Seminar

For folks in the D.C. area (or those who can get there in July), we've got a special half-day seminar planned. Check out our news item about it.

Note that there are only 12 seats, some of which have already been reserved. Get, while the getting's good...

Tuesday, June 10, 2008

Bazaar with Subversion and Combinator

For the past couple days, I've been experimenting with using Bazaar and Combinator more or less simultaneously. As you may know by now, Combinator is a tool that wraps some of Subversion's ugliness (mostly merging), helps manage branches, and sets Python paths for development environments. We use it extensively (almost exclusively) at Divmod.

One of my recent side projects has evolved into useful code more quickly than I had anticipated, so I thought I'd put it up on Launchpad in the Twisted Community Code. This, of course, led to questions about one-time imports, mirroring, and dual bzr/svn management. I eventually opted for the last, using the bzr plugin bzr-svn. Not having a lot of experience with Bazaar, I was at a bit of a loss, at first: there don't seem to be any dummy docs to get us beginners up to speed.

Through some painful, time-consuming trial and error and a couple dead ends, I arrived at a process that works for me, and codified it in a script. The comments in that script seemed generally useful, and given the dearth of docs, I thought I'd turn the comments into a blog post.

The Plugin

Once I figured out the right way to use bzr-svn, it was actually much easier than I thought it would be. Here are the basics: you need to have bzr installed and then you need to install bzr-svn, which is actually a bzr plugin and not a separate tool. When you have bzr-svn installed, you will have additional bzr commands at your disposal which, as you might guess, let you interoperate with an svn repository.

Two Become One

So here's how you get started: create your Subversion branch (we use Combinator) and get your working dir ready to code. You can either add dirs and files now, or do that later; it doesn't matter.

Then, in this working directory, perform a bzr checkout:

bzr co . bzrtest
cd bzrtest

This will create a Bazaar branch from your Subversion (Combinator) branch. 'bzrtest' (or whatever you name it) is your new bzr+svn branch and it is here where you'll be doing all of your work, committing, pushing to Subversion, and (in my case) pushing to Launchpad.

If your Subversion repository has a long history, you probably don't want to perform a 'bzr update' -- that'll just end in tears (it could take days to finish, use up lots of memory, require multiple restarts, and consume disk space by the gigaliter).

Launchpad

For my project, I had already registered a branch on Launchpad via the web interface, so I was ready to push the new Bazaar branch just created with the checkout command above:

bzr push lp:~oubiwann/txevolver/dev --use-existing-dir

I then logged into the web interface again, and set this newly pushed branch as the main development effort for the project. All future pushes (during this development phase) will now be done with the following command:

bzr push lp:txevolver

Future commit-push cycles just look like this:

bzr commit --local -m "My message"
bzr push lp:txevolver

Keep in mind that you can do multiple commits with Bazaar before you push to a server.

The Divmod Repo

Once you've done a local commit (or many local commits), you're ready to start pushing changes to your Subversion repository. This is where you use one of the commands that is provided by the bzr-svn plugin:

bzr svn-push svn+ssh://myRepo

And in my case, that's the following:

bzr svn-push \
svn+ssh://divmod.org/svn/Divmod/branches/genetic-programming-2620/Evolver

If you have done more than one local commit since your last push, you'll see a series of commits made to your svn repo after you issue the 'svn-push' command.

All Together Now

The script I mentioned at the beginning of this post is here. With it, I run a single command which extracts my commit message from the ChangeLog diff, commits locally, pushes to the Divmod svn repo and then pushes to Launchpad. A single command does everything I need, now: maintaining changes in both a bzr repo that can be easily branched by others on Launchpad as well as in my Subversion branch at work.
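The linked script isn't reproduced in this post, but its steps can be sketched roughly like this. Treat it as a minimal sketch, not the actual script: the function names changelog_message and sync_all are hypothetical, pulling the commit message from the added lines of the ChangeLog diff is an assumption about how the extraction works, and the two URLs are simply the ones used earlier in this post.

```shell
#!/bin/sh
# Sketch of a commit-and-sync helper. The URLs come from the post above;
# the function names and the ChangeLog parsing are illustrative assumptions.

SVN_URL="svn+ssh://divmod.org/svn/Divmod/branches/genetic-programming-2620/Evolver"
LP_URL="lp:txevolver"

# Read a unified diff on stdin and print the text of the added lines,
# dropping the "+++ filename" header and the leading "+" markers.
changelog_message() {
    sed -n -e '/^+++/d' -e 's/^+//p'
}

# Commit locally, then push to Subversion and to Launchpad.
# Stops at the first step that fails.
sync_all() {
    msg=$(bzr diff ChangeLog | changelog_message)
    if [ -z "$msg" ]; then
        echo "No new ChangeLog entry; nothing to commit." >&2
        return 1
    fi
    bzr commit --local -m "$msg" &&
        bzr svn-push "$SVN_URL" &&
        bzr push "$LP_URL"
}
```

With that file sourced into your shell, running sync_all from inside the bzr+svn branch does the whole cycle in one go.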

Once this project is ready to merge to trunk (if, in fact, its final home is to be the Divmod svn repo), I'll do an svn up in the Combinator-created branch, unbranch, and commit to trunk. Upon the suggestion of JP, I'll probably also clean up the bzr-svn-created svn props, but other than that, overhead seems to be zero.

Subversion Update: I've been playing with this more, and here's another tidbit I didn't find documented anywhere: If you do a fresh bzr branch that had been associated with a svn repo in another working directory, you will need to rebind it to the svn repo you were working with before. You do that with the following command:

bzr bind svn+ssh://svn.yourhost.com/repo/YourProject/trunk

Google Code Update: If you are sync'ing a bzr branch with googlecode's subversion, you will need to prefix the URL of your initial push with svn+:

bzr push svn+https://yourproject.googlecode.com/svn/trunk

Likewise, if you need to rebind, you'll use the following:

bzr bind svn+https://yourproject.googlecode.com/svn/trunk
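If you keep forgetting the prefix, a tiny helper can add it for you. This is only a sketch: the function name to_bzr_svn_url is made up for illustration, and it assumes that any plain http or https URL you hand it really does point at a Subversion repository.

```shell
# Print a URL in the svn+ form used in the commands above.
# URLs that already carry an svn scheme are passed through untouched.
to_bzr_svn_url() {
    case "$1" in
        svn+*|svn://*)      printf '%s\n' "$1" ;;
        http://*|https://*) printf 'svn+%s\n' "$1" ;;
        *)                  printf '%s\n' "$1" ;;
    esac
}
```

You could then write something like bzr bind "$(to_bzr_svn_url https://yourproject.googlecode.com/svn/trunk)".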

Wednesday, June 04, 2008

TX-Theory

Twisted Community Projects is brand-spankin' new and out on Launchpad!

When Glyph named TX-theory, he did not specify what the "TX" stood for, presumably because he did not feel he had the right to name a theory which he had not been able to fully describe. According to Glyph himself:

"TX" stands for "TwistedmatriX", "Transmit", or "Twisted multipleXed", according to taste.

However, as presented in the upcoming docudrama by producer Chris "radix" Armstrong and director Duncan "oubiwann" McGreggor, "TX" stands for "Twisted Extensions," though this is also contended, sometimes by Chris himself who has already pitched a counter-docudrama (to an un-named Hollywood backer) focusing entirely on the non-extensionness of TX-theory.

Cynics have noted that the "TX" could be rot-13 of Glyph Lefkowitz, in an alternate English alphabet with an additional letter inserted between those of "T" and "X." Even more insidious are the rumors that "TX" stands for "Prophecy Blade Epic Destiny Quest Adventure."

From the definitive history (as yet unwritten, but leaked by the time-traveling nanobots whom we all will one day serve) on the matter:

The name TX-theory is slightly ambiguous. It can be used to refer to both the particular seventeen-dimensional Twisted construct that Glyph originally proposed, or it can be used to refer to a kind of theory which looks -- in various limits -- like the growing number of asynchronous, event-driven networking frameworks implemented in conventional four-dimensional space-time.

Apologies to the author(s) of the M-theory article on Wikipedia. For a slightly more real take, read the Labs' announcement.

Update: There are now 14 projects registered as belonging to TX on Launchpad :-)

Monday, June 02, 2008

Divmod/Holden Web Partnership

As mentioned before in a few tweets and blog posts, Divmod's been working with Holden Web a lot lately. After lots of brainstorming, sweet jam sessions, and planning, we're finally ready to talk about it: our latest news item says it all :-)

We've got information up on the site about the topics covered in our joint training courses and workshops. You can contact either Divmod or Holden Web with any questions.

Tuesday, May 27, 2008

Twisted and Divmod: A Cheater's Setup Guide

I've been helping a few folks out on IRC lately. They've wanted to know how to set up Twisted and Divmod without doing any installs, running directly from SVN. They've been in luck, because that's actually how we develop at Divmod :-)

Here are the Cliff Notes (this stuff is available on the wikis, but it's spread out):

Install the dependencies:

pycrypto 2.0
SQLite 3.2.1
PySQLite 2.0
PyTZ 2005m-1
PIL 1.1.6

Get the Divmod code first (we'll get Twisted next):

mkdir ~/lab
cd ~/lab
svn co http://divmod.org/svn/Divmod/trunk Divmod/trunk

Set the Combinator env vars (if you want to persist this, then you'll need to put it in your .profile or shell .rc file):

eval `python ~/lab/Divmod/trunk/Combinator/environment.py`
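If you do want to persist it, one way is to append that same line to your startup file, but only when it isn't there already. This is a sketch, not part of Combinator: the function name persist_combinator_env is hypothetical, and ~/.profile is just the default I picked, so pass a different file if your shell reads something else.

```shell
# Append the Combinator eval line to a startup file unless it is already there.
# The function name and the default file are illustrative choices.
persist_combinator_env() {
    rc_file="${1:-$HOME/.profile}"
    line='eval `python ~/lab/Divmod/trunk/Combinator/environment.py`'
    # grep -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
}
```

Running persist_combinator_env more than once is harmless, since the line is only added the first time.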

Have Combinator start "tracking" Divmod and Twisted, thus managing PYTHONPATH for them (note that chbranch will detect that Twisted has not been checked out and will do so automatically):

chbranch Divmod trunk
chbranch Twisted trunk svn://svn.twistedmatrix.com/svn/Twisted/trunk

Get the new project dirs into the env:

eval `python ~/lab/Divmod/trunk/Combinator/environment.py`

Executing the whbranch command should give you the following:

Divmod: trunk
Twisted: trunk

If you start up a Python interpreter, you'll be able to import from twisted, mantissa, axiom, etc.

Update: the instructions have been edited and shortened, thanks to insight from Glyph.

Mantissa: An Alternative to LAMP

I first started drafting this post a few months ago, out of excitement for the work that JP and Glyph have been doing in the Divmod open source stack codebase. I was planning on entering the acronym fray with a title like "*MAP: An Alternative to LAMP" where *MAP (pronounced "starmap") would be "Any OS, Mantissa, Axiom, and Python." A good friend of mine whose opinion I value said that *MAP was a terrible name, and after chatting about it with Glyph, he commented "Why not keep it really simple? Just say 'Mantissa.'"

And so it is :-)

For those that don't know, Mantissa is the Twisted application server and Axiom is a Twisted-based object database. By virtue of what are called "deferreds," Twisted allows you to write highly concurrent applications. Mantissa -- the Divmod stack (Mantissa entails Python, Twisted, and Axiom because it requires them) -- provides developers a means of scaling their Twisted-based, asynchronous applications. This means that you can go from experiments or prototypes to multi-node production deployments with the same set of tools and code.

As such, this is a direct competitor for LAMP. Here are some questions about that: What is the value of a full stack? Why is an alternative to LAMP good or needed? What is a good alternative?

Stacked Development Value

What does a full stack give us, as developers? From a practical perspective, it:

eliminates the overhead involved in setting up a system in preparation for development
provides a development toolset
provides a context within which design patterns have been established and utilized

In other words, we can do things like pop in a CD, install an OS, have it meet all the software dependencies for our development tasks (since we're talking about LAMP, we mean development for the web), and either know how to build what we need or who to ask that can point us in the right direction. LAMP gives us this and, thanks to OS distributions like Ubuntu, gives it to us cheaply through simple button-pushing.

Do notice, however, that I said nothing about "going live" or "pushing to production"...

An Engineer's Perspective

In a recent conversation, Sean Reifschneider of tummy.com had this to say about LAMP:

"The problem with the LAMP stack is that it's not a solution for the worst case scenarios. It's great for development: you throw it all together and start writing code. It's fairly okay for low-volume production use. But you need to plan for DoS attacks, search engine bot crawls, and spammer email address harvesting. Default LAMP installs fall over under such conditions."

This is a point that bears repeated belaboring: the network is violent and unpredictable. Connectivity can go away at any moment due to causes at pretty much all layers of the OSI model. The best practices for deploying applications in a production environment that keep this in mind are vast and varied. This is the domain of systems experts.

Sean made further comments concerning Google, that App Engine is so great because you write your code and then just throw the whole thing in their grid, and bam! instant scalability, protected by the (hopefully) same mechanisms that protect all of Google's publicly-facing web assets.

LAMP distributions productized and made freely available the otherwise painstaking process of compiling and installing a Linux kernel, Apache, a database, and your preferred programming language. The painstaking process was one that developers engaged in for software development. But what about the ones that systems engineers engage in for production deployments?

Google has addressed this in a "small way": massive in infrastructure support, but small in features. Knowing Google's penchant for incremental and steady service improvements, they've got plans for additional features. But I think everyone can agree that they're not going to try to meet everyone's needs all the time. Regardless, they are moving in the right direction: innovating a new platform.

Something for Both

On just this topic (innovating or finding a new platform), Albert Wenger of Union Square Ventures said the following:

"What then is needed? A platform that is created from the ground up ... What would such a platform look like? It would be hosted and (nearly) infinitely scaleable. It would provide object storage that’s as simple as saying 'here’s an object, store it' ... user authentication, authorization and access control. Flexible processing of pretty URLs. Easy creation and maintenance of page templates. Ability to send emails and process bounces. Handling of RSS feeds (inbound and outbound). Support for mobile access and possibly even voice capabilities."

Anyone that knows the Divmod software will know why this tickled us so: we have an object database (Axiom) with built-in user authentication, we've got object publishing (even with pretty URLs) and templating with Nevow, we've got mail services, feed support, mobile access and SIP. However! This isn't an advertisement; it's an illustration. The platform is part of the network, and in a sense, it is the network. Considerations for rapid application development need to be regarded very highly; I think it's fairly uncontested common knowledge that LAMP has proved this. Just as highly, though, we need to consider the needs of systems and of the engineers that are integrating them.

Google is making parts of its infrastructure available to developers now. With the dual ease of development and deployment, they are innovating engineering for us. They are only one of many, however. We need to be asking ourselves what our applications are, what the network is, what services are, and what our dev teams and engineers need.

Epilogue

This brings me to what I want for my birthday :-) Hey IBM! Sun! I want access to a Blue Gene (a la Project Kittyhawk) or a Sun Grid. I want to prove the efficacy of LAMP alternatives in the changing internet, of Python's continued pertinence, Twisted's developmental power and Mantissa's deployment capabilities.

Tuesday, May 20, 2008

darkstat, For the Win

This is a quick self-response to my tweet to the lazyweb (is it still a tweet when it's Pownce and not Twitter?) today. I couldn't remember the name of a really handy network monitoring tool I used to use. It was similar to ntop but used a fraction of the resources and had a very limited yet perfectly satisfactory feature set. I've been having some crazy network utilization weirdness at the office lately, and I've wanted to peek at some trends without setting up NetFlow for my router or messing with ntop.

The answer was darkstat. It was my own memory that eventually came to the rescue, not Keyword Roulette on Google. Version 3.x is out and available for Mac OS X 10.5 via the latest MacPorts version (1.6).

This is all I needed to get it running:

sudo port selfupdate
sudo port install darkstat
sudo /opt/local/sbin/darkstat --debug -i en0 -l 192.168.4.0/255.255.255.0

Then, I just had to hit localhost:667...

I don't know what's up with the Google Juice for this guy's page, but it took me forever to find! I was searching for all the keywords like "ntop" (which was mentioned on his site at one point, I think), "network", "dark", "lightweight", "monitoring", etc. You get the picture. Hopefully this blog post will help when others are looking for it, too.

Monday, May 19, 2008

Required Reading: Ultra Large-Scale Systems

At Divmod, we're always talking about the future of computing, software, and the network. This usually focuses on our work with Twisted or the Divmod platform. But we have also spent considerable time assessing research in the area of what is called "ultra large-scale systems." Our primary business interest with this revolves around development, deployment, and management. However, there is a great deal of work that needs to be done to make ULS systems a reality.

I have a series of blog posts planned to discuss ULS systems in the areas where I have a vested interest. Despite the fact that a popular study on the matter was funded by the U.S. Department of Defense (conducted by the Carnegie Mellon Software Engineering Institute), I will not be discussing this technology in the context of war efforts or national defense. Instead, I will be engaging in this discussion within the context of the medical/health services field, per the example given by Richard Gabriel in his presentation (PDF) to the Chinese University of Hong Kong.

This is especially pertinent today, as Google prepares for its Google Health announcement. Even though there are by some estimates (optimistic ones, in my view) 25 years of research ahead of us, it is generally agreed that ULS systems will consist of aggregate sub-systems, built incrementally over time. I don't believe anyone is under the illusion that Google does not want to produce the first ULS system in history, and they are making rapid progress towards this goal. Google Health brings this point home very clearly, in the context of the research that has been done in this area.

The world is rapidly changing. The most important issues in technology and business are not who is going to create the next catchy "Web 2.0" application, or what mega corp games are being played in the Great Silicon Valley Soap Opera. The important issues are how the systems of the next 100-200 years are being built now, who they are being built by, and who has access to that technology.

We are a powerful and creative community. We are concerned about distributing power to the people, education, privacy, and freedom -- open source is built upon these principles. If we want those who are building ULS systems to build them fairly and with our concerns in mind, then we must get involved now. We must start building the necessary tools.

As for the reading, a couple of these were linked to in this blog post. Here is a list that should get your brain revved up and ready to roll (these are all PDF files):

I will have more specifics about the sorts of things we're exploring in future blog posts, but until then -- don't get left behind!

Update: Allen Short just pointed out some additional material that predates the Carnegie Mellon research: the Agoric Papers, co-authored by Mark Miller who is a co-creator of the E programming language.

Update 2: Glyph Lefkowitz just sent me a link to the Big Ball of Mud that discusses similar concepts.

Technorati Tags: technology, business, community, open source, uls

Friday, May 16, 2008

The Twisted Think Tank

I'm pleased to announce that we've got a new Divmod site up! We're still making tweaks, but it's ready for public viewing, and open for business.

This change currently doesn't affect our subscriber services... but it will, very shortly :-) JP's working on that now.

Anyone who knows us, knows that we know Twisted. We really know it. And how could we not, with Twisted superheroes like Glyph and JP? We've been solving very interesting problems for the past couple years, and other companies have availed themselves of this expertise. We're no longer trying to hide from our destiny as "the Twisted company."

We've found that providing specialized consulting services has not detracted from our core competency as software developers, but has rather done quite the reverse: provided a great deal of insight and clarity. The two activities have established a complementary feedback mechanism for growth and invention.

Thursday, May 15, 2008

App Engine Haiku

I've been playing with Google App Engine a bit more tonight, and I've come to some additional conclusions:

For the best results, you gotta drink the whole pitcher of Kool-Aid; and
It's really quite tasty Kool-Aid.

One of the things I really wanted to test was imaging. I use Flickr far more than Picasa, but Flickr's authentication API is kind of a pain in the ass (especially when used with App Engine). Google auth is (for obvious reasons) easier to work with. This, of course, led me to explore Picasa. To my surprise, it was delightful to use. I chopped up gdata for only the parts needed to support Picasa and was uploading images within mere minutes. And not only was the effort minimal, but the performance was outstanding.

One of the things that makes application development a more efficient process is having a unified platform on which to work. This is why working with the Divmod platform is so nice when I do extensive Twisted application development: all the infrastructure is already built and ready for me to use. Today, I got to really taste that same experience with the Google platform. And I liked it.

I did notice one interesting psychological side effect of working with the file count limitation: I inadvertently treated it like a game. Not unlike the rules in poetry, it provided a structure and bounds within which I was forced to operate, adjust for, and test my creativity against. I rather enjoyed this microgame, and that was a surprise :-)