Friday, September 12, 2014

What is OpenStack?

OpenStack is a collection of open source projects designed for creating and managing cloud infrastructure.

Imagine owning a data center. You have a large open space with plenty of power and cooling capacity. You have negotiated a deal with your Internet Service Provider for a few high bandwidth connections to the Internet. You have gone through the trouble of filling that data center with rows upon rows of general purpose servers and networking equipment to interconnect them. Now what?

The data center could become your personal supercomputer for graphics rendering, calculating pi, mining bitcoin, and so on. Or you could rent it out and bring in some revenue. But who would rent the whole data center, with all those servers, all at once? Instead of wooing a single tenant for your space, why not rent out the data center in sections to multiple tenants?

Renting out physical infrastructure is how some hosting companies - like Rackspace - got their start. Tenants could work closely with operators to configure their section of the data center according to the tenant's needs. In return, the tenant would pay for the infrastructure and support.

For our data center, renting physical infrastructure is a good starting business model. However, we can do better. First, consider all the manual labor involved in hand-holding each tenant and helping to configure the infrastructure according to their needs. Can't it be automated? Second, think of all the wasted capacity - a tenant may only need their server for 30 hours a month to run a nightly report, even though the tenant is paying for a full month of 24/7 access. Outside of those 30 hours, your server (your investment) is dormant. If we know our tenant is not using the full capacity of a server, why not double-book the server and charge another tenant for access to the same resource?
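To put a number on that wasted capacity, here is a minimal sketch of the arithmetic. It assumes a 30-day month, and the function name is made up for this illustration.

```python
# Assumption: a 30-day month, so 720 hours are paid for.
HOURS_PER_MONTH = 30 * 24


def utilization(hours_used, hours_available=HOURS_PER_MONTH):
    """Return the fraction of the paid-for time the server is actually busy."""
    return hours_used / hours_available


used = 30                        # the nightly report from the example above
idle = HOURS_PER_MONTH - used    # hours the server sits dormant

print(f"utilization: {utilization(used):.1%}")
print(f"idle hours:  {idle}")
```

Under that assumption the server is busy for roughly 4% of the month and dormant for the other 690 hours, which is the capacity a second tenant could be using.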

Consider a new model with the following characteristics:
  1. On-demand Self-service - no more hand-holding tenants through the configuration process. The whole thing is automated and managed via a web interface, available 24/7.
  2. Resource Pooling - multiple tenants can rent the same infrastructure with the help of virtualization. Virtual Machines (VMs) give the illusion of a single machine (server) for each tenant. In reality, multiple VMs share a single physical machine and a special operating system (called a hypervisor) quickly switches between VMs to maintain the illusion.
  3. Rapid Elasticity - allow tenants to add or remove servers on a whim, as needed. It's all automated and virtualized anyway, so why not?
  4. Broad Network Access - something we were already providing in our rental model, but important enough to carry over to the new model. 
  5. Measured Service - with the above characteristics, accurately billing tenants becomes a nightmare without automated metering of resource consumption. Instead of charging per server per month, we'll switch to more granular pricing: per hour for CPU consumption, per GB for storage, and per Gb for network bandwidth.

The new model is attractive for all kinds of businesses and use cases, big or small. Some tenants will swoop in, rent 1000 machines for an hour for heavy number crunching on demand, and then leave. Others will set up low-traffic web sites with low levels of sporadic resource consumption. A few will have big operations with spikes in usage, where their applications spread from tens to hundreds of servers and then fall back again - elastically.
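The granular pricing from the Measured Service item can be sketched in a few lines. This is only an illustration: the prices, the `Usage` record, and the `monthly_bill` function are all invented here and do not come from any real billing system.

```python
from dataclasses import dataclass

# Hypothetical prices, chosen only to make the arithmetic easy to follow.
PRICE_PER_CPU_HOUR = 0.05   # dollars per server-hour of CPU
PRICE_PER_GB_STORED = 0.10  # dollars per GB of storage
PRICE_PER_GBIT_SENT = 0.02  # dollars per Gb of network traffic


@dataclass
class Usage:
    """Metered consumption for one tenant over one billing period."""
    cpu_hours: float
    storage_gb: float
    network_gbit: float


def monthly_bill(usage: Usage) -> float:
    """Charge for what was actually consumed, rounded to whole cents."""
    total = (
        usage.cpu_hours * PRICE_PER_CPU_HOUR
        + usage.storage_gb * PRICE_PER_GB_STORED
        + usage.network_gbit * PRICE_PER_GBIT_SENT
    )
    return round(total, 2)


# Tenant A: 1000 machines for one hour of number crunching.
burst = Usage(cpu_hours=1000 * 1, storage_gb=0, network_gbit=0)

# Tenant B: one small web server running all month (720 hours).
small_site = Usage(cpu_hours=720, storage_gb=20, network_gbit=50)

print(monthly_bill(burst))       # 50.0
print(monthly_bill(small_site))  # 39.0
```

With these made-up prices, the tenant who rents 1000 machines for an hour pays about the same as the tenant who keeps a single small server running all month. Neither pays for capacity they did not use.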

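The five characteristics listed above are the essential characteristics in the NIST definition of cloud computing, so according to NIST, this service is by definition a "cloud".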

The business (or service) model is "Infrastructure as a Service" (IaaS). Infrastructure refers to the (virtual) servers and networking equipment which the data center operator is offering to tenants. It is a service since the tenants are not purchasing this equipment - they are renting.

The deployment model is a "public cloud" - anyone in the world can access our website and rent out virtual machines - it is publicly available. In another scenario, imagine you are the CTO of a Fortune 500 company and are responsible for the many data centers owned by that company. It may still be beneficial to use the Cloud Computing model to manage your data center resources among the many groups within the company. However, these resources are not available to the general public or even to other companies. In such a scenario, the deployment model is a "private cloud".

What about OpenStack?

As a highly intelligent data center owner and operator, you may be savvy enough to write your own software to expose a web interface to your tenants which allows them to create user accounts, upload virtual machine images, spawn virtual machines, set up network connectivity for each new machine, handle security between tenants, provide resource usage reporting, and so on. If not, or if you are too busy to undertake such a task, then OpenStack is the answer.

OpenStack is a collection of open source projects designed for creating and managing cloud infrastructure. It includes projects for handling the virtual machine life cycle (nova), storing and managing virtual machine images (glance), managing tenant, user, and admin accounts (keystone), exposing a web interface for end users (horizon), handling virtual networking (neutron), handling block storage devices (cinder), and several more.

As a data center operator, you may install OpenStack software on each server in your data center. Some servers should be designated for "compute" or "storage" - meaning that these servers will be available for tenants to start their virtual machines and create virtual storage devices. Other servers may be designated for networking, for running tenant network services like DHCP and DNS. Finally, a few servers should be earmarked for the Cloud Controller. The Cloud Controller receives requests for creating and destroying virtual machines and other virtual resources. It delegates the real work to one of the many compute, storage, or network nodes in the data center.
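To make the delegation idea concrete, here is a toy sketch of how a controller might choose a compute node for a new virtual machine. It is not OpenStack's actual scheduler. The class names, the `schedule` function, and the "most free RAM wins" rule are all assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ComputeNode:
    """A physical server designated for running tenant VMs."""
    name: str
    free_ram_mb: int
    free_vcpus: int


@dataclass
class VmRequest:
    """What a tenant asks the controller for."""
    name: str
    ram_mb: int
    vcpus: int


def schedule(request, nodes):
    """Pick a node that can fit the request, reserve capacity, return its name."""
    # Step 1: filter out nodes that cannot fit the request at all.
    candidates = [
        n for n in nodes
        if n.free_ram_mb >= request.ram_mb and n.free_vcpus >= request.vcpus
    ]
    if not candidates:
        raise RuntimeError(f"no compute node can host {request.name}")

    # Step 2: among the rest, prefer the node with the most free RAM.
    chosen = max(candidates, key=lambda n: n.free_ram_mb)

    # Step 3: record the reservation so later requests see less capacity.
    chosen.free_ram_mb -= request.ram_mb
    chosen.free_vcpus -= request.vcpus
    return chosen.name


nodes = [
    ComputeNode("compute-1", free_ram_mb=4096, free_vcpus=4),
    ComputeNode("compute-2", free_ram_mb=8192, free_vcpus=8),
]
print(schedule(VmRequest("web", ram_mb=2048, vcpus=2), nodes))  # compute-2
```

The controller itself never runs the virtual machine. It only decides where the machine should go and hands the request to that node.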

The OpenStack projects are all open source and rely heavily on other existing open source projects. The majority of the OpenStack code base is written in the Python programming language. Most OpenStack projects expose their own RESTful HTTP API, which allows users to programmatically access the cloud and invoke its capabilities.
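As a rough illustration of what "programmatic access" looks like, the sketch below builds two HTTP requests without sending them: one asking keystone for a token, and one asking nova for the list of servers. The host name, ports, API versions, credentials, and helper function names are assumptions made for this example. A real deployment publishes its own endpoints, and the request format depends on the API version it runs.

```python
import json
import urllib.request

# Hypothetical endpoints; a real cloud advertises its own.
IDENTITY_URL = "http://controller.example.com:5000/v3"
COMPUTE_URL = "http://controller.example.com:8774/v2.1"


def build_token_request(username, password, project):
    """Build (but do not send) a keystone v3 password-authentication request."""
    body = {
        "auth": {
            "identity": {
                "methods": ["password"],
                "password": {
                    "user": {
                        "name": username,
                        "domain": {"id": "default"},
                        "password": password,
                    }
                },
            },
            "scope": {
                "project": {"name": project, "domain": {"id": "default"}}
            },
        }
    }
    return urllib.request.Request(
        IDENTITY_URL + "/auth/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_list_servers_request(token):
    """Build (but do not send) a request for the tenant's servers from nova."""
    return urllib.request.Request(
        COMPUTE_URL + "/servers",
        headers={"X-Auth-Token": token, "Accept": "application/json"},
        method="GET",
    )


# Placeholder credentials and token, for illustration only.
token_req = build_token_request("demo", "secret", "demo-project")
list_req = build_list_servers_request("token-from-keystone")
print(token_req.get_method(), token_req.full_url)
print(list_req.get_method(), list_req.full_url)
```

Against a running cloud, each request would be sent with `urllib.request.urlopen`. Keystone returns the token in the `X-Subject-Token` response header, and that value goes into the `X-Auth-Token` header of every later call.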

OpenStack can be used in both public cloud and private cloud deployment models. It enables the Infrastructure as a Service (IaaS) service model.

All in all, OpenStack allows an individual (or a group) to convert a data center of any size into a cloud, as defined above. With OpenStack in place, the infrastructure in the data center can then be offered to tenants either publicly or privately as a service. Tenants can use the OpenStack API or Web UI to create virtual servers, establish virtual networks, manage virtual storage volumes, and much, much more.