Monday, May 14, 2012

CERN, OpenStack Keep Resonance Cascades at Bay

Tim Bell preparing to get his OpenStack on
As previously mentioned, there's a growing momentum around ops-oriented participation in the OpenStack community. DreamHost is deeply invested in DevOps, seeing how that's where we're going to be living in a few months! As Simon Anderson, CEO of DreamHost, recently said:
"When we're running a complex fabric of apps on over 5,000 servers across three data centers, we need a lean and nimble approach to software development and operational implementation. Without a DevOps approach, we wouldn't be able to push code into production as fast or as efficiently as we do, and our customers would not be happy! Today's developers demand up-to-the-hour security and performance updates to Internet infrastructure, so we aim to deliver just that with DevOps."
Though expressed in the context of our work, Simon's comment highlights something broader: DevOps is going to be increasingly important for nearly anyone running cloud services.

In particular, I've been following the work of the intrepid folks at CERN. As such, this post is not about DreamHost; rather, it's a mad tale of OpenStack, DevOps, and averting alien invasion.

After countless long-distance phone conversations, a flight to Switzerland, and spending several days buying pints for a security guard in the know (referred to from now on as "Barney"), I've uncovered some profound truths -- Mulder-style -- and have confirmed that the impact of OpenStack at CERN is huge. 

Superficial examinations turn up the usual: CERN's planning slides, nice quotes, discussions of features and savings in time and money. For instance, in a recent email conversation with Tim "Gordon Freeman" Bell at CERN, I learned that 
"The CERN Agile Infrastructure project aims to develop CERN's computing resources and processes to support the expanding needs of LHC physicists and the CERN organisation."
I think these guys have been hanging out with Simon! But once you slip behind the scenes, peek at some of the whiteboards in unattended rooms, or rifle through notes lying about, you see that things are not what they appear. I've included a shot of Mr. OpenStack-at-CERN himself; this was my first clue.

Publicly, he's been working with other teams at CERN to:
  • modernise the data centre configuration tools and automate operations procedures
  • exploit wide scale use of virtualisation, improving flexibility and efficiency
  • enhance monitoring such that the usage of the infrastructure can be fully understood and tuned to maximise the resources available
But privately, it seems that he and his team have been doing much, much more. This was alluded to in a statement made by team member Jan van Eldik: "We expect the number of requests to insert non-standard specimens into the scanning beam of the Anti-Mass Spectrometer to significantly decrease, once automation is in place and everyone is using the standard infrastructure we are setting up."

That isn't to say there haven't been incidents...

Innocuously enough, the current toolchains are based around:
  • OpenStack as a single Infrastructure-as-a-Service providing physics experiment services, developer boxes, applications servers as well as the large batch farm
  • Puppet for configuration management
  • Scientific Linux CERN as the dominant operating system with a sizeable chunk of Windows installs
But that second bullet caught my eye, and one of Barney's pub mates confirmed a rumor that we'd heard: the Puppet instances are actually trained headcrabs. The primary training tool? You guessed it, a crowbar. Barney said that the folks from Dell took inspiration from this and developed it further for their OpenStack deployment framework after an extended visit to CERN.

Although Barney hadn't seen any evidence of resonance cascades, there have been minor cross-dimensional disturbances as a result of some "cowboy" activity and folks not following DevOps best practices. This has been kept quiet for obvious reasons, but has led to a small pest problem in some of CERN's older tunnel complexes. As rogue elements are discovered, CERN has been educating transgressors aggressively. (Sometimes they go as far as sending employees to Xen training... or was it Xen training?)

One artist's conception of what success will look like for OpenStack at CERN
Despite the minor hiccoughs along the way, CERN is aiming for success. (Given the lack of Combine and forced relocation programs, they're already doing better than Black Mesa's Anomalous Materials team.) Plans are in place for an initial pre-production OpenStack service this year. Following that, they will be moving towards 300,000 virtual machines on 15,000 hosts spread across two data centres by 2015.

The OpenStack community is supporting them in their efforts with fantastic new features, high-quality discussions on the mail lists, and real-time interaction on the IRC channels. In an act of reciprocity and community spirit, operators at CERN have volunteered to contribute back to the OpenStack community with regard to operations best practices, reference architecture documentation, and support on the operators' mail list.

To see how other institutions were taking this news, I spent several days waiting on hold. In particular, Aperture Science could not be reached for comment. However, Ops team member Belmiro Rodrigues Moreira did say that there's an audio file being circulated at CERN of Cave Johnson threatening to "burn down OpenStack" ... with lemons. Given Aperture Science's failure record with time machine development, it's generally assumed to be a prank audio reconstruction. CloudStack developers are considered to be the prime suspects, seeing how much time they have on their hands while waiting for ant to finish compiling the latest Java contributions.

When asked what advice he could give to shops deploying OpenStack, Tim said simply: "Remember, the cake is a lie. Don't get distracted and don't stop. Just keep hacking."

Alyx, explaining to her dad why she loves DreamHost
Couldn't have said it better myself.

In closing, and interestingly enough, one of DreamHost's employees has an uncle who works at the Black Mesa Research Facility. Though his teleportation research team was too busy for an extended interview, his daughter did mention that she is a DreamHost customer and can't wait to use OpenStack while interning at CERN next summer. After all, that's what she uses to auto-scale her WordPress blog (she's in our private beta program).

It's a small world.

And, thanks to Tim and the rest at CERN, a safer one, too.


6 comments:

Duncan McGreggor said...

Fact and fiction are pretty tightly woven in that post, so I thought I'd help folks out here:

1) Simon really did say that ;-) and

2) Tim's summary to me of CERN's plans around DevOps/OpenStack was as follows:

The CERN Agile Infrastructure project aims to develop CERN's computing resources and processes to support the expanding needs of LHC physicists and the CERN organisation. This requires actions to

- modernise the data centre configuration tools and automating operations procedures
- exploit wide scale use of virtualisation, improving flexibility and efficiency
- enhance monitoring such that the usage of the infrastructure can be fully understood and tuned to maximise the resources available

From the current mode of operations based on local tools developed over the past 10 years towards data driven automated operations based on tools developed elsewhere and following modern installation and deployment techniques.

The current toolchains are based around

- OpenStack as a single Infrastructure-as-a-Service providing physics experiment services, developer boxes, applications servers as well as the large batch farm
- Puppet for configuration management
- Scientific Linux CERN as the dominant operating system with sizeable chunk of Windows

The target for a pre-production service this year and moving towards 300,000 virtual machines on 15,000 hosts spread across two data centres by 2015.

GregP said...

Ahhhh yes. Heartily comforting to see that my old friend hasn't yet forgotten his gaming days. :)

Duncan McGreggor said...

@GregP Heheh :-) Indeed I have not! Brent Scotten (from DreamHost) and I were talking about it last night and then again today. He's just finished playing BioShock, and really enjoyed it. We chatted briefly about the greats: System Shock 2, Deus Ex, HL & HL2 :-)

Unknown said...

Nice post - to think that OpenStack might lead to the invention of a Gravity Gun.

Duncan, did you learn about how CERN is using Swift? I'm very interested in knowing how scientific and research organizations are using Object Storage, and was curious, if CERN does use it extensively, are they using things like metatagging to assist with their data analysis?

Duncan McGreggor said...

Trey, I don't know... but you could email Gor^H^H^H Tim Bell and ask! I'll point him at this comment, and see if he has some insights to share ...

Thanks!

Unknown said...

We're investigating both Compute and Storage cloud technologies. The compute side is covered in http://cern.ch/go/FwG7.

Data is MUCH more complicated (as if 300K VMs is 'easy'). CERN's role is to record the 25PB or so each year for 20 years along with supplying the 200 remote sites which want to analyse the LHC results. We are looking at Swift, S3 appliances and storage federations based on xrootd technologies.

Changing direction in compute is relatively easy (since you just start the new work elsewhere). Moving data around, especially with networking costs, is much more difficult (and so more investigations are required).

BTW, Duncan's portrayal of me contains some fact and some fiction....

Tim Bell
CERN