Cloud Computing and Open Source


With the advent of Web 2.0 and Software as a Service, cloud computing has come into vogue. The term has become synonymous with providing services anywhere, anytime, with the only basic requirement being access to the Internet. As a new model, cloud computing promises to make any online service available without a large upfront investment in infrastructure. The economics of running a full infrastructure change dramatically because you pay only for what you use, and you provision only the capacity you require at any given time.

Many companies that have supported their own needs for large-scale, distributed computing are beginning to export their technologies and services to support others. Companies with extensive experience in large-scale computing using open source software, such as Amazon and Google, have developed offerings in the cloud computing space. Other players are beginning to make an impact as well, including Hadoop, IBM Blue Cloud (which is based on Hadoop), 10gen, and Eucalyptus.

What is Cloud Computing?

Cloud computing blends Internet-based IT infrastructure with the applications and services that can be delivered over its resources. It consists of the computing, networking, and storage resources used to power services, as well as combinations or mashups of services, that previously had been expensive, impractical, or even impossible to provide.

There are three types of cloud computing options available today:

  • Virtual infrastructure provisioning
  • Application development and delivery
  • Building your own cloud from scratch, using your own storage, processing, and networking resources

Infrastructure provisioning is the most flexible option because it provides pure computing resources such as CPU, bandwidth, and storage. A good example of such a service is Amazon's Elastic Compute Cloud (EC2). Users have complete control over these resources and what they do with them.

Application development and delivery, the second option, is a little less flexible for the user but is much less complex to set up and start using right away. A good example of this kind of cloud computing service is Google's App Engine (GAE), which provides CPUs, limited bandwidth and limited storage along with a pre-defined web application framework. The user doesn't have any control over the security and physical infrastructure but can run their web applications while the service provider takes care of scaling, performance and management of the infrastructure.

The third option involves building and managing your own cloud using open source software and tools such as Hadoop. You have complete control over what you provision, but you must supply the knowledge and skills required to optimize your resources yourself.

Open source is important in all aspects of cloud computing. It is used to build the core of the "cloud" and its services. Linux is the operating system of choice for both physical and virtual machines in the cloud. Furthermore, open APIs and open source toolkits are available to interface and interact with cloud computing at all levels. APIs for Python, PHP, Ruby, and Java give your web applications access to the management services needed to control your resources in the cloud.

Building Your Own Virtual Infrastructure: Amazon Elastic Compute Cloud

The Amazon Elastic Compute Cloud (EC2) provides infrastructure and compute resources for web applications. EC2 functionality is accessible through web service interfaces, allowing you to configure and monitor your computing resources and to provision capacity almost instantaneously. EC2 is built to fully support open source software and web applications. It is based on Amazon's Xen-enabled Linux kernel, and any operating system that can run on top of Xen is supported.

EC2 users can load custom application images on as many virtual systems as needed. Security and network access are set up and configured by the user as needed for their application. EC2 provides APIs for programmatic control of configuration and provisioning of resources via REST and SOAP protocols. Advanced users can create their own Amazon Machine Images (AMIs), which package custom environments appropriate for the application to be deployed. This is especially useful for developers during integration and testing. EC2 is complemented by other Amazon services such as the Simple Queue Service (SQS), the Simple Storage Service (S3), and SimpleDB. All these services are fully usable by open source software because they implement standard open interfaces.

There are four basic steps to creating applications on EC2. First, you create an Amazon Machine Image (AMI), which packages the operating system, configuration settings, libraries, and applications into one image: everything you need to boot instances of your application. The AMI can be selected from a library of existing public AMIs or created from scratch. Second, you upload your image for storage in the Amazon Simple Storage Service (S3). Third, you register your AMI with Amazon EC2. Finally, you use the Amazon EC2 web service APIs to start, stop, or monitor one or more instances of this AMI.
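The last two steps come down to web service calls. As a minimal sketch, assuming the REST-style EC2 Query interface, the Python below only builds request URLs for the RegisterImage and RunInstances actions. It does not sign or send them, so the service would reject them as written. The bucket name, manifest path, and AMI ID are hypothetical placeholders, and build_ec2_query is an illustrative helper name, not part of any Amazon toolkit.

```python
from urllib.parse import urlencode

# Public EC2 endpoint; regional endpoints also exist.
EC2_ENDPOINT = "https://ec2.amazonaws.com/"


def build_ec2_query(action, **params):
    """Return an UNSIGNED EC2 Query API URL for the given action.

    A real request also needs authentication parameters (access key,
    signature, timestamp, API version), which are deliberately omitted.
    """
    query = {"Action": action}
    query.update(params)
    # Sort the parameters so the resulting URL is deterministic.
    return EC2_ENDPOINT + "?" + urlencode(sorted(query.items()))


# Step 3: register an image whose manifest was uploaded to S3 in step 2.
# "my-bucket/image.manifest.xml" is a hypothetical location.
register_url = build_ec2_query(
    "RegisterImage", ImageLocation="my-bucket/image.manifest.xml"
)

# Step 4: start between one and two instances of the registered image.
# "ami-12345678" is a hypothetical image ID.
run_url = build_ec2_query(
    "RunInstances", ImageId="ami-12345678", MinCount=1, MaxCount=2
)

print(register_url)
print(run_url)
```

In practice an open source client library would handle signing and transport; the sketch only shows that provisioning is an ordinary HTTP request carrying an action name and a few parameters.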

Focusing on Your Application: Google App Engine

Google's App Engine provides a powerful tool for open source developers to build web applications based on Python. App Engine restricts applications to a secure sandbox with limited access to the underlying operating system.

App Engine can automatically ramp up compute resources within predefined quotas to handle spikes in traffic. Each App Engine user account can run up to three applications with 500 MB of persistent storage and enough CPU horsepower and network bandwidth to support about five million page views a month.
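To put that quota in perspective, a rough calculation converts five million monthly page views into an average request rate. It assumes a 30-day month, evenly spread traffic, and one request per page view, none of which the quota itself specifies.

```python
def average_requests_per_second(page_views_per_month, days_in_month=30):
    """Average request rate, assuming traffic is spread evenly over the month."""
    seconds_in_month = days_in_month * 24 * 60 * 60
    return page_views_per_month / seconds_in_month


# Five million page views a month, as quoted for the free quota.
rate = average_requests_per_second(5_000_000)
print(f"{rate:.2f} requests per second on average")  # prints 1.93
```

Real traffic is bursty, so peak load can be well above this average; that is where automatic scaling within the quota matters.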

App Engine's Python runtime environment provides API access to an object database, Google Accounts infrastructure, outbound HTTP requests (URL fetch API) and email services. Developers can also take advantage of frameworks such as webapp and Django to quickly build web applications running on App Engine.

To create an application for App Engine, you first need to download the App Engine software development kit (SDK). The SDK provides a web server environment that emulates all of the App Engine services locally on your computer and enforces the restrictions placed on your application by App Engine's secure sandbox. Next, you create the application code as well as the configuration files and static files your application needs. Using the upload tool included in the SDK, you log in with your Google account to upload your application. You can then manage your application, browse the datastore, and view log files using the web-based Administration Console provided by App Engine.
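The application code written in the second step is, at its simplest, a request handler. The sketch below is not App Engine's own API: it assumes the application is exposed through Python's standard WSGI interface and uses only the standard library, so the names application and run_locally are illustrative.

```python
def application(environ, start_response):
    """A minimal WSGI application that answers every request with plain text."""
    body = b"Hello from a sandboxed web application\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def run_locally(port=8080):
    """Serve the application on localhost, loosely like an SDK dev server.

    This uses the standard-library reference server, not the App Engine SDK.
    """
    from wsgiref.simple_server import make_server

    make_server("localhost", port, application).serve_forever()
```

Calling run_locally() and visiting http://localhost:8080/ would show the response. A hosted platform replaces that local server with its own infrastructure while the handler code stays the same.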

Virtualizing Your Own Infrastructure: Open Source Cloud Computing Projects

Several open source initiatives are exploring various implementations of cloud computing. Hadoop, Eucalyptus, and 10gen look the most promising.

Hadoop: Hadoop is an open source Java software framework for running data-intensive distributed applications on large clusters of commodity computers. Hadoop was inspired by Google's MapReduce and the Google File System. IBM's cloud computing project, Blue Cloud, uses Hadoop technologies. The framework's major components include a distributed file system (HDFS) and the map/reduce engine (JobTracker, TaskTracker). Hadoop is optimized for highly parallel, data-intensive batch operations.
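The map/reduce programming model can be illustrated without a cluster. The following is a single-process Python sketch of the pattern (map, group by key, reduce) applied to word counting. It is not Hadoop's Java API, and the function names are illustrative; Hadoop's contribution is running these phases in parallel across many machines with data stored in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the cloud", "the open cloud"])
print(counts)
```

Because each map call sees only one document and each reduce call sees only one key, both phases can be split across machines, which is what makes the model suit highly parallel batch jobs.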

Hadoop is a top-level Apache Software Foundation project supported by Yahoo. HP, Intel, and Yahoo recently announced a project that will create a global cloud computing research testbed leveraging Hadoop.

Eucalyptus: Eucalyptus (Elastic Utility Computing Architecture for Linking Your Program To Useful Systems) is an open source cloud computing infrastructure based on Xen, implemented using commonly available Linux tools and web services technologies. It was developed at the University of California, Santa Barbara, to simulate a cloud computing platform for research and testing. Interfaces to popular commercial clouds such as Amazon EC2 are being developed to support research.

10gen: 10gen is an open source web application Platform-as-a-Service (PaaS) technology that helps developers focus on building application functionality instead of being sidetracked by scalability, management, and infrastructure concerns. 10gen provides an application server (Appsrv) supporting JavaScript and Ruby, an object database (MongoDB), a virtual file system (GridFS), JavaScript libraries (CoreJS), and an application and resource management system.

10gen is a new player that is remixing ideas from Google App Engine. It promises to broaden the appeal of cloud computing application development to a wider range of developer communities. At the same time it provides the tools to craft your own cloud service.

Conclusion

Cloud computing is a resource-sharing model of development and deployment for web applications. A recent market study by Merrill Lynch predicts a big shift to cloud computing in the next five years, with the global market for cloud computing growing to $95 billion and representing 12% of worldwide software deployment. In this article we have provided an overview of some major cloud computing solutions available today and their relationships to open source. The next article in this series will take a detailed look at how to use open source to deploy web applications using each of the major cloud computing solutions.

still some confusion in definition
Submitted by b_bprimal on Wed, 11/26/2008 - 19:37.

Bhaskar Prasad Rimal

This article is nice and useful for beginners in cloud computing. There is still little discussion of cloud computing. People are still confused about the difference between grid computing and cloud computing because we cannot find any books or proper documentation on the subject. Let's start talking about Open Cloudware Architecture and the promising areas of cloud computing so that new researchers can contribute.