Monday, May 14, 2012

CERN, OpenStack Keep Resonance Cascades at Bay

Tim Bell preparing to get his
OpenStack on
As previously mentioned, there's a growing momentum around ops-oriented participation in the OpenStack community. DreamHost is deeply invested in DevOps, seeing how that's where we're going to be living in a few months! As Simon Anderson, CEO of DreamHost, recently said:
"When we're running a complex fabric of apps on over 5,000 servers across three data centers, we need a lean and nimble approach to software development and operational implementation. Without a DevOps approach, we wouldn't be able to push code into production as fast or as efficiently as we do, and our customers would not be happy! Today's developers demand up-to-the-hour security and performance updates to Internet infrastructure, so we aim to deliver just that with DevOps."
Though expressed in the context of our work, the import of DevOps that Simon's comment generally highlights is going to be increasingly important for nearly anyone running cloud services. 

In particular, I've been following the work of the intrepid folks at CERN. As such, this post is not about DreamHost; rather, it's a mad tale of OpenStack, DevOps, and averting alien invasion.

After countless long-distance phone conversations, a flight to Switzerland, and spending several days buying pints for a security guard in the know (referred to from now on as "Barney"), I've uncovered some profound truths -- Mulder-style -- and have confirmed that the impact of OpenStack at CERN is huge. 

Superficial examinations turn up the usual: CERN's planning slides, nice quotes, discussions of features and savings in time and money. For instance, in a recent email conversation with Tim "Gordon Freeman" Bell at CERN, I learned that 
"The CERN Agile Infrastructure project aims to develop CERN's computing resources and processes to support the expanding needs of LHC physicists and the CERN organisation."
I think these guys have been hanging out with Simon! But once you slip behind the scenes, peek at some of the whiteboards in unattended rooms, or rifle through notes lying about, you see that things are not what they appear. I've included a shot of Mr. OpenStack-at-CERN himself; this was my first clue.

Publicly, he's been working with other teams at CERN to:
  • modernise the data centre configuration tools and automating operations procedures
  • exploit wide scale use of virtualisation, improving flexibility and efficiency
  • enhance monitoring such that the usage of the infrastructure can be fully understood and tuned to maximise the resources available
But privately, it seems that he and his team have been doing much, much more. This was alluded to in a statement made by team member Jan van Eldik: "We expect the number of requests to insert non-standard specimens into the scanning beam of the Anti-Mass Spectrometer to significantly decrease, once automation is in place and everyone is using the standard infrastructure we are setting up."

That isn't to say there haven't been incidents...

Innocuously enough, the current toolchains are based around:
  • OpenStack as a single Infrastructure-as-a-Service providing physics experiment services, developer boxes, applications servers as well as the large batch farm
  • Puppet for configuration management
  • Scientific Linux CERN as the dominant operating system with sizeable chunk of Windows installs
But that second bullet caught my eye, and one of Barney's pub mates confirmed a rumor that we'd heard: the Puppet instances are actually trained headcrabs. The primary training tool? You guessed it, a crowbar. Barney said that the folks from Dell took inspiration from this and developed it further for their OpenStack deployment framework after an extended visit to CERN.

Although Barney hadn't seen any evidence of resonance cascades, there have been minor cross-dimensional disturbances as a result of some "cowboy" activity and folks not following DevOps best practices. This has been kept quiet for obvious reasons, but has led to a small pest problem in some of CERN's older tunnel complexes. As rouge elements are discovered, CERN has been educating transgressors aggressively. (Sometimes they go as far as sending employees to Xen training... or was it Xen training?)

One artist's conception of what success will
look like for OpenStack at CERN
Despite the minor hiccoughs along the way, CERN is aiming for success. (Given the lack of Combine and forced relocation programs, they're already doing better than Black Mesa's Anomalous Materials team.) Plans are in place for an initial pre-production service, OpenStack deployment this year. Following that, they will be moving towards 300,000 virtual machines on 15,000 hosts spread across two data centres by 2015.

The OpenStack community is supporting them in their efforts with fantastic new features, high-quality discussions on the mail lists, and real-time interaction on the IRC channels. In an act of reciprocity and community spirit, operators at CERN have volunteered to contribute back to the OpenStack community with regard to operations best practices, reference architecture documentation, and support on the operators' mail list.

To see how other institutions were taking this news, I spent several days waiting on hold. In particular, Aperture Science could not be reached for comment. However, Ops team member Belmiro Rodrigues Moreira did say that there's an audio file being circulated at CERN of Cave Johnson threatening to "burn down OpenStack" ... with lemons. Given Aperture Science's failure record with time machine development, it's generally assumed to be a prank audio reconstruction. CloudStack developers are considered to be the prime suspects, seeing how much time they have on their hands while waiting for ant to finish compiling the latest Java contributions.

When asked what advice he could give to shops deploying OpenStack, Tim said simply: "Remember, the cake is a lie. Don't get distracted and don't stop. Just keep hacking."

Alyx, explaining to her dad why she loves DreamHost
Couldn't have said it better myself.

In closing, and interestingly enough, one of DreamHost's employees has an uncle who works at the Black Mesa Research Facility. Though his teleportation research team was too busy for an extended interview, his daughter did mention that she is a DreamHost customer and can't wait to use OpenStack while interning at CERN next summer. After all, that's what she uses to auto-scale her WordPress blog (she's in our private beta program).

It's a small world.

And, thanks to Tim and the rest at CERN, a safer one, too.


Saturday, May 12, 2012

Twisted SSH: Rendering a Log-in Banner/MOTD in Conch

A few weeks ago, I pinged my peeps on #twisted asking why the banner for a custom SSH server wasn't rendering properly. After some digging around and some inconsistent results (well, consistently bad results for me), we weren't able to resolve anything, and I had to set the problem aside.

The Symptom
The first thing I had tried was subclassing Manhole from twisted.conch.manhole, overriding (and up-calling) connectionMade, writing the banner to the terminal upon successful connection. This didn't work, so I then tried overriding initializeScreen by subclassing twisted.conch.recvline.RecvLine. Also a no-go. And by "didn't work" here's what I mean:

In both Linux (Ubuntu 12.04 LTS, gnome-terminal) and Mac (OS X 10.6.8, Terminal.app), after a successful login to the Twisted SSH server, the following sequence would occur:
  1. an interactive Python prompt was rendered, e.g., ":>>"
  2. the banner was getting written to the terminal, and
  3. the terminal screen refreshed with the prompt at the top
This all happened so quickly, that I usually never even saw #1 and #2. Just the second ":>>" prompt from #3. Only by scrolling up the terminal buffer would I see that the banner had actually been rendered. Even though I was doing my terminal.write after connectionMade and initializeScreen, it didn't seem to matter.

Discovery!
Some time last week, I put together example Twisted plugins showing what the problem was, and the circumstances under which a banner simply didn't get rendered. The idea was that I would provide some bare-bones test cases that demonstrated where the problem was occurring, post them to IRC or the Twisted mail list, and we could finally get it resolved. 'Cause, ya know, I really want my banners ...

While tweaking the second Twisted plugin example, I finally poked my head into the right method and discovered the issue. Here's what's happening:

  • twisted.conch.recvline.RecvLine.connectionMade calls t.c.recvline.RecvLine.initializeScreen
  • t.c.recvline.RecvLine.initializeScreen does a terminal.reset, writes the prompt, and then switches to insert mode. But this is a red herring. Since something after initializeScreen is causing the problem, we really need to be asking "who's calling connectionMade?"
  • t.c.manhole_ssh.TerminalSession.openShell is what kicks it off when it calls the transportFactory (which is really TerminalSessionTransport)
  • openShell takes one parameter, proto -- this is very important :-)
  • openShell instantiates TerminalSessionTransport
  • TerminalSessionTransport does one more thing after calling the makeConnection method on an insults.ServerProtocol instance (the one I had tried overriding without success), and as such, this is the prime suspect for what was preventing the banner from being properly displayed: it calls  chainedProtocol.terminalProtocol.terminalSize
  • chainedProtocol is an insults.ServerProtocol instance, and its terminalProtocol attribute is set when ServerProtocol.connectionMade is called.
  • A quick check reveals that terminalProtocol is none other than the proto parameter passed to openShell.

But what is proto? Some debugging (and the fact that of the three terminalSize methods in all of twisted, only one is an actual implementation) reveals that proto is a RecvLine instance. Reading that method uncovers the culprit in our whodunnit:  the first thing the method does is call terminal.eraseDisplay.

Bingo! (And this is what I was referring to above when I said "poked my head" ...)

Since this was called after all of my attempts to display a banner using both connectionMade and initializeScreen, there's no way my efforts would have succeeded.

Here's What You Do
How do you get around this? Easy! Subclass :-)

The class  TerminalSessionTransport in t.c.manhole_ssh is the bad boy that calls terminalSize (which calls eraseDisplay). It's the last thing that TerminalSessionTransport does in its __init__, so if we subclass it, and render our banner at the end of our __init__, we should be golden. And we are :-)

You can see an example of this here.

Not sure if this sort of thing is better off in projects that make use of Twisted, or if it would be worth while to add this feature to Twisted itself. Time (and blog comments) will tell.

Epilogue
As is evident from the screenshot above (and the link), this feature is part of the DreamSSH project. There are a handful of other nifty features/shortcuts that I have implemented in DreamSSH (plus some cool ones that are coming) and I'm using them in projects that need a custom SSH server. I released the first version of DreamSSH last night, and there's a pretty clear README on the github project page.

One of the niftier things I did last night in preparation for the release was to dig into Twisted plugins and override some behaviour there. In order to make sure that the conveniences I had provided for devs with the Makefile were available for anyone who had DreamSSH installed, I added subcommands... but if the service was already running, these would fail. How to work around that (and other Twisted plugin tidbits) are probably best saved for another post, though :-)