Today at the OpenStack summit here in Paris, we continued SUSE’s winning streak at the “Ruler of the Stack” competition 🙂
This time, the challenge from the OpenStack Foundation was to deploy an OpenStack cloud as quickly as possible which could withstand “attacks” such as one of the controller nodes being killed, and still keep the OpenStack services and VM instances functional.
It was considerably more complex than the challenge posed by Intel at the Atlanta summit in May which my teammate Dirk Müller (pictured right) won, since now we needed a minimum of two controller nodes clustered together, two compute nodes, and some shared or replicated storage. So this time an approach deploying to a single node via a “quickstart” script was not an option, and we knew it would take a lot longer than 3 minutes 14 seconds.
However as soon as I saw the rules I thought we had a good chance to win with SUSE Cloud, for four reasons:
- We already released the capability to automatically deploy highly available OpenStack clouds back in May, as an update to SUSE Cloud 3.
- With openSUSE’s KIWI Image System and the Open Build Service, our ability to build custom appliance images of product packages for rapid deployment is excellent, as already proved by Dirk’s victory in Atlanta. KIWI avoids costly reboots by using
- I have been working hard the last couple of months on reducing the amount of manual interaction during deployment to the absolute minimum, and this challenge was a perfect use case for harnessing these new capabilities.
- The solution had to work within a prescribed set of VLANs and IP subnets, and Crowbar (the deployment technology embedded in SUSE Cloud) is unique amongst current OpenStack deployment tools for its very flexible approach to network configuration.
I worked with Dirk late in our hotel the night before, preparing and remotely testing two bootable USB images which were custom-built especially for the challenge: one for the Crowbar admin server, and one for the controller and compute nodes.
The clock was started when we booted the first node for installation, and stopped when we declared the nodes ready for stress-testing by the judges. It took us 53 minutes to prepare 6 servers: two admin servers (one as standby), two controllers, and two compute nodes running the KVM hypervisor. However we lost ~15 minutes simply through not realising that plugging a USB device directly into the server is far more performant than presenting virtual media through the remote console! And there were several other optimizations we didn’t have time to make, so I think in future we could manage the whole thing under 30 minutes.
However the exercise was a lot of fun and also highlighted several areas where we can make the product better.
At least three other well-known combinations of OpenStack distribution and deployment tools were used by other competitors. No-one else managed to finish deploying a cloud, so we don’t know how they would have fared against the HA tests. Perhaps everyone was too distracted by all the awesome sessions going on at the same time 😉
I have to send a huge thank-you to Dirk whose expertise in several areas, especially building custom appliances for rapid deployment, was a massive help. Also to Bernhard, Tom, and several other members of our kick-ass SUSE Cloud engineering team 🙂 And of course to Intel for arranging a really fun challenge. I’m sure next time they’ll think of some great new ways to stretch our brains!