Structured Procrastination cloud Archives – Structured Procrastination

Currently showing posts tagged: cloud

Improving trust in the cloud with OpenStack and AMD SEV

By Adam, September 13, 2019 1:00 pm

This post contains an exciting announcement, but first I need to provide some context!

Ever heard that joke “the cloud is just someone else’s computer”?

Coffee mug saying "There is no cloud. It's just someone else's computer"

Of course it’s a gross over-simplification, but there’s more than a grain of truth in it. And that raises the question: if your applications are running in someone else’s data-centre, how can you trust that they’re not being snooped upon, or worse, invasively tampered with?

Until recently, the answer was “you can’t”. Well, that’s another over-simplification. You could design your workload to be tamperproof; for example, even if individual mining nodes in Bitcoin or Ethereum are compromised, the blockchain as a whole will resist the attack just fine. But there’s still the snooping problem.

Hardware to the rescue?

However, there’s some good news on this front. Intel and AMD realised this was a problem, and have both introduced new hardware capabilities to help improve the level to which cloud users can trust the environment in which their workloads are executed, e.g.:

- AMD SEV (Secure Encrypted Virtualization), which can encrypt the memory of a running VM with a key which is only accessible to the owner of that VM. This is done on-chip, so that even someone with physical access to the machine will find it a lot harder to snoop on the running VM¹. It can also provide the guest owner with an attestation which cryptographically proves that the memory was encrypted correctly and can only be decrypted by the owner.
- Intel MKTME (Multi-Key Total Memory Encryption), which is a similar approach.

But even with that hardware support, there remains the question of how far anyone can trust public clouds run on proprietary technology. There is a growing awareness that Free (Libre) / Open Source Software tends to be inherently more secure and trustworthy, since its transparency enables unlimited peer review, and its openness allows anyone to contribute improvements.

And these days, OpenStack is pretty much the undisputed king of the Open Source cloud infrastructure world.

An exciting announcement

So I’m delighted to be able to announce a significant step forward in trustworthy cloud computing: as of this week, OpenStack is now able to launch VMs with SEV enabled! (Given the appropriate AMD hardware, of course.)

The core functionality is all merged and will be in the imminent Train release. You can read the documentation, and you will also find it mentioned in the Nova Release Notes.
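
To give a rough flavour of what requesting SEV looks like from the user's side, here is a minimal Python sketch. It is not Nova's code. It assumes, based on my reading of the Nova documentation, that memory encryption is requested through the hw:mem_encryption flavor extra spec or the hw_mem_encryption image property, and that the image needs UEFI firmware (hw_firmware_type=uefi). The function names are hypothetical, and real deployments have further requirements which this sketch ignores.

```python
from typing import List, Mapping

# Strings treated as "true"; an assumption made for this sketch only.
TRUTHY = {"1", "true", "yes", "on"}


def _is_true(value: object) -> bool:
    """Interpret a property value as a boolean, case-insensitively."""
    return str(value).strip().lower() in TRUTHY


def sev_requested(flavor_extra_specs: Mapping[str, str],
                  image_properties: Mapping[str, str]) -> bool:
    """Return True if either the flavor or the image asks for memory encryption."""
    return (_is_true(flavor_extra_specs.get("hw:mem_encryption", ""))
            or _is_true(image_properties.get("hw_mem_encryption", "")))


def check_sev_request(flavor_extra_specs: Mapping[str, str],
                      image_properties: Mapping[str, str]) -> List[str]:
    """Return a list of problems; an empty list means nothing looks wrong."""
    if not sev_requested(flavor_extra_specs, image_properties):
        return []
    problems = []
    # Assumption: SEV guests must boot via UEFI firmware.
    if str(image_properties.get("hw_firmware_type", "")).lower() != "uefi":
        problems.append("SEV guests need hw_firmware_type=uefi on the image")
    return problems
```

Calling check_sev_request with a flavor containing hw:mem_encryption=True and an image lacking UEFI firmware returns a single problem string, which is the sort of early validation a cloud would want to do before scheduling the VM onto SEV-capable hardware.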

While this is “only” an MVP and far from the end of the journey (see below), it’s an important milestone in a strong partnership between my employer SUSE and AMD. We started work on adding SEV support to OpenStack around a year ago.

This resulted in one of the most in-depth technical specification documents I’ve ever had to write, plus many months of intense collaboration on the code and several changes in design along the way.

I’d like to thank not only my colleagues at SUSE and AMD for all their work so far, but also many members of the upstream OpenStack community, especially the Nova team. In particular I enjoyed fantastic support from the PTL (Project Technical Lead) Eric Fried, and several developers at Red Hat, which I think speaks volumes to how well the “coopetition” model works in the Open Source world.

The rest of this post gives a quick tour of the implementation via screenshots and brief explanations, and then concludes with what’s planned next.

Continue reading 'Improving trust in the cloud with OpenStack and AMD SEV'»

front page, geek, work | AMD, architecture, cloud, coopetition, OpenStack, SEV

Report from the OpenStack PTG in Dublin

By Adam, March 9, 2018 7:30 pm

Last week I attended OpenStack’s PTG (Project Teams Gathering) in Dublin. This event happens every 6 months in a different city, and is a fantastic opportunity for OpenStack developers and upstream contributors to get together and turbo-charge the next phase of collaboration.

I wrote a private report for my SUSE colleagues summarising my experience, but then Colleen posted her report publicly, which made me realise that it would be far more in keeping with OpenStack’s Four Opens to publish mine online. So here it is!

Continue reading 'Report from the OpenStack PTG in Dublin'»

front page, geek, travel, work | architecture, cloud, development, OpenStack

Abstraction As A Service

By Adam, December 19, 2017 7:55 pm

The birth of abstraction layers

The last five decades of computing have seen a gradual progression of architectural abstraction layers. Around 50 years ago, IBM mainframes gained virtualization capabilities. Despite explosive progress in the sophistication of hardware following Moore’s Law, there wasn’t too much further innovation in abstraction layers in server computing until well after the dawn of the microcomputer era, in the early 2000s, when virtualization suddenly became all the rage again. (I heard a rumour that this was due to certain IBM patents expiring, but maybe that’s an urban myth.) Different types of hypervisors emerged, including early forms of containers.

Then we started to realise that a hypervisor wasn’t enough, and we needed a whole management layer to keep control of the new “VM sprawl” problem which had arisen. A whole bunch of solutions appeared, including the concept of “cloud”, but many were proprietary, and so after a few years OpenStack came along to the rescue!

The cloud era

But then we realised that managing OpenStack itself was a pain, and someone had the idea that rather than building a separate management layer for managing OpenStack, we could just use OpenStack to manage itself! And so OpenStack on OpenStack, or Triple-O as it’s now known, was born.

Within and alongside OpenStack, several other new exciting trends emerged: Software-Defined Networking (SDN), Software-Defined Storage (e.g. Ceph), etc. So the umbrella term Software-Defined Infrastructure was coined to refer to this group of abstraction layers.

Continue reading 'Abstraction As A Service'»

front page, geek | architecture, cloud, fake news, humour, OpenStack

Announcing OpenStack’s Self-healing SIG

By Adam, November 24, 2017 4:15 pm

One of the biggest promises of the cloud vision was the idea that all infrastructure could be managed in a policy-driven fashion, reacting to failures and other events by automatically healing and optimising services.

In OpenStack, most of the components required to implement such an architecture already exist, and are nicely scoped, for the most part without too much overlap:

Monasca: monitoring
Aodh: alarming
Congress: policy-based governance
Mistral: workflow
Senlin: clustering service
Vitrage: root cause analysis
Watcher: optimization
Masakari: compute plane HA
Freezer-dr: compute plane HA
Heat: orchestration (normally used for cloud applications, but could also potentially auto-heal cloud infrastructure via TripleO)
Doctor: fault management and maintenance for OPNFV
Fault Genes Working Group: fault classification and recovery strategy
Craton: fleet management (currently stalled)

However, there is not yet a clear strategy within the community for how these should all tie together. (The OPNFV community is arguably further ahead in this respect, but hopefully some of their work could be applied outside NFV-specific environments.)
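
To make the policy-driven idea a little more concrete, here is a toy Python sketch of the monitoring, policy and workflow steps chained together. Everything in it is hypothetical: the event types, handler names and policy table are invented for illustration and do not correspond to the APIs of Monasca, Aodh, Mistral or any of the other projects listed above.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]             # e.g. {"type": "host_failed", "resource": "node1"}
Action = Callable[[Event], str]    # a handler returns a description of what it would do


def restart_service(event: Event) -> str:
    """Hypothetical remediation: restart the failed service."""
    return f"restart {event['resource']}"


def evacuate_host(event: Event) -> str:
    """Hypothetical remediation: move VMs off the failed host."""
    return f"evacuate {event['resource']}"


# The "policy": a mapping from event type to remediation handler.
POLICY: Dict[str, Action] = {
    "service_down": restart_service,
    "host_failed": evacuate_host,
}


def heal(events: List[Event], policy: Dict[str, Action] = POLICY) -> List[str]:
    """Apply the policy to each event; escalate anything it does not cover."""
    actions = []
    for event in events:
        handler = policy.get(event["type"])
        if handler is None:
            actions.append(f"escalate {event['type']} on {event['resource']}")
        else:
            actions.append(handler(event))
    return actions
```

The point of the sketch is only that the monitoring, decision and remediation steps are separable, which is why the question of how the existing projects should tie together matters.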

Designing a new SIG

To address this, I organised an unofficial kick-off meeting at the PTG in Denver, at which it became clear that there was sufficient interest from many of the above projects to justify creating a new “Self-healing” SIG. However, there were still open questions:

What exactly should be the scope of the SIG? Should it be for developers and operators, or also end users?
What should the name be? Is “self-healing” good enough, or should it also include, say, non-failure scenarios like optimization?

Continue reading 'Announcing OpenStack’s Self-healing SIG'»

front page, geek | architecture, cloud, OpenStack

Cloud rearrangement for fun and profit

By Adam, May 17, 2015 4:42 am

In a populated compute cloud, there are several scenarios in which it’s beneficial to be able to rearrange VM guest instances into a different placement across the hypervisor hosts via migration (live or otherwise). These use cases typically fall into three categories:

Rebalancing – spread the VMs evenly across as many physical VM host machines as possible (conceptually similar to vSphere DRS). Example use cases:
- Optimise workloads for performance, by reducing CPU / I/O hotspots.
- Maximise headroom on each physical machine.
- Reduce thermal hotspots in order to reduce power consumption; for example, back in 2003, HP showed that intelligent workload placement can reduce energy consumption by more than 14%. (Here’s an old slidedeck I made years ago when researching this use case.)
Consolidation – condense VMs onto fewer physical VM host machines (conceptually similar to vSphere DPM). Typically involves some degree of defragmentation. Example use cases:
- Increase CPU / RAM / I/O utilization. CERN blogged last summer about this Tetris-like challenge, and it can be taken even further by over-committing CPU/RAM and/or using memory page sharing.
- Free up physical servers to reduce power consumption.
Evacuation – free up physical servers:
- for repurposing or decommissioning
- for maintenance (BIOS upgrades, re-cabling etc.) or repair
- to protect SLAs, e.g. when monitors indicate potential imminent hardware failure, or when HVAC failures are likely to cause servers to shut down due to over-heating. (This is different to the somewhat confusingly named nova evacuate functionality which moves VMs to a new host after the server has already failed.)

Whilst one-shot manual or semi-automatic rearrangement can bring immediate benefits, the biggest wins often come when continual rearrangement is automated. The approaches can also be combined, e.g. first evacuate and/or consolidate, then rebalance on the remaining physical servers.
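
As a rough illustration of how consolidation and rebalancing differ, here is a minimal Python sketch using two textbook greedy heuristics: first-fit decreasing for consolidation, and largest-first onto the least-loaded host for rebalancing. It assumes a single resource dimension and identical hosts, and it ignores the cost of the migrations themselves, so it is a simplification for illustration rather than a description of any real scheduler. The function names are hypothetical.

```python
from typing import Dict, List


def _largest_first(vm_sizes: Dict[str, int]) -> List[tuple]:
    """Sort VMs by descending size, breaking ties by name for determinism."""
    return sorted(vm_sizes.items(), key=lambda kv: (-kv[1], kv[0]))


def consolidate(vm_sizes: Dict[str, int], host_capacity: int) -> List[List[str]]:
    """First-fit decreasing: pack VMs onto few hosts of equal capacity."""
    hosts: List[List[str]] = []   # VM names placed on each host
    free: List[int] = []          # remaining capacity of each host
    for vm, size in _largest_first(vm_sizes):
        if size > host_capacity:
            raise ValueError(f"{vm} does not fit on any host")
        for i, room in enumerate(free):
            if size <= room:      # first host with enough room wins
                hosts[i].append(vm)
                free[i] -= size
                break
        else:                     # no existing host had room: use a new one
            hosts.append([vm])
            free.append(host_capacity - size)
    return hosts


def rebalance(vm_sizes: Dict[str, int], host_count: int) -> List[List[str]]:
    """Greedy spread: place each VM on whichever host is currently least loaded."""
    if host_count < 1:
        raise ValueError("need at least one host")
    hosts: List[List[str]] = [[] for _ in range(host_count)]
    load = [0] * host_count
    for vm, size in _largest_first(vm_sizes):
        i = load.index(min(load))  # lowest-indexed host among the least loaded
        hosts[i].append(vm)
        load[i] += size
    return hosts
```

With VMs of sizes a=4, b=3, c=2 and d=1, consolidate with a host capacity of 5 fills two hosts completely, whereas rebalance across three hosts leaves headroom on each. In this simplified model, evacuation can be treated as re-running either function with the affected host excluded.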

Other custom rearrangements may be required according to other IT- or business-driven policies, e.g. only rearrange VM instances relating to a specific workload, in order to increase locality of reference, reduce latency, respect availability zones, or facilitate other out-of-band workflows or policies (such as data privacy or other legalities).

In the rest of this post I will expand this topic in the context of OpenStack, talk about the computer science behind it, propose a possible way forward, and offer a working prototype in Python.

If you’re in Vancouver for the OpenStack summit which starts this Monday and you find this post interesting, ping me for a face-to-face chat!

Continue reading 'Cloud rearrangement for fun and profit'»

front page, geek, hacks, work | cloud, DPM, DRS, Free Software, migration, OpenStack, placement, software, virtualization, VMware, vSphere, workloads