FLASH INFORMATIQUE FI

High Performance Computing


Economic Aspects of Cloud Computing




Anastasia AILAMAKI

Debabrata DASH

Verena KANTERE


The new paradigm of cloud computing is taking over individual and business computation tasks. Although it may recall the super-computers that offered computational services in the early information age, the cloud differs from them in several key ways. Cloud computing provides a solid framework for resource planning, one of the most crucial problems of the IT industry, as well as for the cheap administration of thousands of computers. Moreover, cloud computing exploits recent advances in multiprocessor technology and the current spread of high-speed Internet technology, offering efficient computation services to millions of users.
In DIAS, we envision a cloud-based database service that enables data providers and consumers to meet in a transparent way. The data provider stores parts of the database in the cloud so that consumers can access the data; the former is charged for storage and the latter for query services. Beyond hardware and software planning, the cloud database needs to plan for the building and maintenance of auxiliary database structures that expedite user query execution. This planning stage is facilitated by combining expert database design algorithms with economic modules. The cloud maps database operations to actual money cost and decides to build a database structure when the building cost is covered by the estimated benefit of this structure. Query arrival patterns are employed to predict the average number of queries that may use a newly built structure. Based on this prediction, the building and maintenance cost of the structure is amortized over the prospective queries.

[Figure: Cloud users benefit from data management services]

What is Cloud Computing?

Hardware and software computing platforms have undergone many changes over time. In the 60s and 70s, the world's large-scale computations were performed on big and bulky mainframe supercomputers. Later on, in the 80s, personal computers and workstations took over and continued their dominance well into the new millennium. Nowadays, the new paradigm of cloud computing is beginning to take over individual and business computation tasks. Experts postulate that, in the near future, most of the world's computing requirements will be satisfied by cloud computing platforms. A handful of cloud platforms throughout the world will undertake computation tasks posed by average computer users, who will access the cloud services through very lightweight clients, such as smartphones or netbooks.
Opinions, however, differ on the exact definition of a cloud. Some define it as the services provided by a data center, and some as any subscription-based service on the Internet. The most common definition is that a computing cloud is a large cluster of off-the-shelf machines, located at one or more data centers and interconnected with high-speed network technology, which allows outside clients to use the computing and storage resources of these machines on demand. While this may seem like a step backwards to the days when only a handful of super-computers offered computational services, there are many key differences that make the cloud computing paradigm far more appealing to the current technological setup than the super-computers of the early information age.

Why Cloud Computing?

First, cloud computing provides a solid framework for solving one of the most crucial problems of the IT industry: resource planning. Every startup company dreams of getting publicity overnight and a flood of new clients making use of its service. Yet this dream is a nightmare for the IT department of the company, as such a flood of new clients requires much more computing power than is needed on any other day (the so-called flash crowd). On one hand, if the department buys many machines to handle sudden increases in demand, then on an average day it will keep most of the machines idle. On the other hand, if the company's infrastructure does not account for such a load, then on the day the flash crowd hits the service will crash, leaving many potential new users disappointed and unhappy - good news for the competition. Cloud computing enables the company to automatically use as much infrastructure as it needs, when it needs it, and thus handle flash crowd effects without excessive provisioning costs. On an average day, the company keeps the operating cost low by using only a few machines.
Second, cloud computing addresses the recent change in the IT cost model. Initially, computers were expensive and accounted for most of the IT budget. Nowadays, computing power is cheap - thanks to Moore's law - but the administration of computers is very expensive. Since cloud computing gathers a large number of low-cost computers in a central location, system management and maintenance procedures can be automated, and therefore a few expert administrators can manage thousands or tens of thousands of such machines in a straightforward manner.
Third, advances in parallel programming and virtualization techniques allow cloud computing to fully benefit from multiprocessor technology. Despite Moore's law, power and design limitations prevent uniprocessor speed from growing commensurately with the exponentially increasing number of transistors in a given chip area. The hardware designers' response, multicore chips, integrates several processors (cores) on a single chip, and the number of cores per chip is predicted to keep increasing exponentially in the future. Modern software needs to be highly parallel in order to exploit the available computing power. Cloud virtualization techniques allow processing power to be used efficiently and flexibly by simulating multiple virtual systems on a single chip. Recently developed programming paradigms, such as map-reduce, allow these virtual systems to be used efficiently.
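To make the map-reduce idea concrete, here is a minimal single-machine sketch in plain Python (not a cloud framework): each document is mapped independently to (word, 1) pairs - the step that a cloud can run in parallel across machines - and a reduce step then sums the counts per word.

```python
from collections import Counter
from itertools import chain

def map_phase(document):
    # map: emit a (word, 1) pair for every word in one document
    return [(word, 1) for word in document.split()]

def reduce_phase(pairs):
    # reduce: sum the counts emitted for each distinct word
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

docs = ["the cloud scales out", "the cloud is elastic"]
# each map_phase(d) call is independent, so it could run on a different machine
mapped = chain.from_iterable(map_phase(d) for d in docs)
word_counts = reduce_phase(mapped)
```

Because the map calls share no state, a framework can distribute them over thousands of virtual machines and only the reduce step needs to gather results.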
Finally, the current spread of high-speed Internet technology allows users to access computing services provided from a central location. In the fiber-to-the-home age, latency is constantly decreasing and the bandwidth of the Internet user's connection is increasing, removing the final obstacle to a network-based computational service.
While the last three technological shifts made cloud computing feasible, it is the simplification of resource planning and the cheaper economics that drive its growth. Cloud computing is therefore attracting considerable interest from both economics and computer science researchers.
Let us consider an instance of how cloud computing solved a large-scale IT problem. When the New York Times wanted to scan all their publications from 1851 to 1989, instead of acquiring new hardware or using some of their over-used computing resources, they transferred the raw image files in TIFF format to Amazon's EC2 cloud computing infrastructure. The dataset was 3 terabytes in size; developing the code took a few days, and processing the data took 24 hours on 100 cloud computers. The final cost to the IT department was only 240 dollars. Without cloud computing, it would have taken months just to acquire the hardware and set it up for the processing.
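As a back-of-the-envelope check, the reported 240-dollar bill is consistent with EC2's early on-demand rate of roughly 0.10 USD per instance-hour (the rate is an assumption here; actual prices vary by instance type):

```python
# Sanity check of the New York Times example: 100 machines for 24 hours
# at an assumed 0.10 USD per instance-hour.
machines = 100
hours = 24
rate_usd_per_hour = 0.10  # assumed early EC2 on-demand rate

total_cost = machines * hours * rate_usd_per_hour  # 240 USD
```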

Data Management as a Cloud Service

Database management systems are one of the major drivers of the IT industry, accounting for 19 billion dollars in revenue in 2008. Ideally, database systems will be installed on cloud infrastructures to provide on-demand data management services to the users so that database sizing and resource planning can be offloaded to the cloud provider. In DIAS we envision a cloud-based database service, where data providers and consumers can meet in a transparent way. The data provider stores the entire database or parts of it in the cloud and is charged for storage services. Data consumers are also charged in exchange for data processing, i.e. for querying the cloud database. Data services can therefore be mutually beneficial to the provider and to the consumer, as the provider can gather larger amounts of data at a central location to reduce management costs while maximizing the consumers’ benefit from high data availability and low query response time.
While providing database services in the cloud alleviates resource planning on the user's side, a cloud database needs more planning than just deciding on hardware infrastructure. The most common interface for asking queries of a database system is SQL (Structured Query Language); queries are translated into an optimal combination of filtering operations on the stored data. Therefore, data acquisition and query performance are directly affected by the physical design of the database (i.e., the way the data is organized on the storage media, and the auxiliary structures created for faster data access). For example, if the data is ordered by a particular attribute, queries involving that attribute typically execute faster. Custom data orders, auxiliary indexing structures, and data replication for increased availability come at a high storage and maintenance cost, however, so cloud database administrators must carefully choose which of the vast space of possible auxiliary structures to build. Planning, building, and updating such structures requires significant effort.
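The effect of an auxiliary structure on query execution can be seen even on a desktop database. The sketch below uses SQLite (via Python's standard sqlite3 module; table and index names are illustrative) to show the query plan switching from a full table scan to an index lookup once an index on the filtered attribute exists:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (day TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2009-01-%02d" % d, float(d)) for d in range(1, 31)],
)

query = "SELECT * FROM sales WHERE day = '2009-01-15'"

# Without an index, SQLite must scan the whole table.
plan_before = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# After building an index on the filtered attribute, the plan uses it.
con.execute("CREATE INDEX idx_day ON sales(day)")
plan_after = con.execute("EXPLAIN QUERY PLAN " + query).fetchall()
```

The plan text before the index contains a scan of the table, while afterwards it reports a search using idx_day - the same trade-off, at cloud scale, that the administrator must weigh against the storage and maintenance cost of each candidate structure.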

Predicting Costs in Cloudy Databases

At DIAS, we aim to shorten and facilitate this planning stage by combining expert database design algorithms with widely-used economic modules. We first build a cost model that maps database operations to actual money cost. For example, if a data ordering operation takes a day to complete on a uniprocessor system, then using Amazon EC2’s pricing structure, we estimate that the operation would cost 2.40 USD. Since the cloud wants to recover the cost of building and maintaining such structures, it charges a small fraction of the cost to all the queries that use them. Consequently, the cloud has an incentive to build structures which are frequently used in the execution of user queries, and the data consumers have an incentive to pay a small amount of money to the cloud for each query.
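Such a cost model is easy to sketch: runtime on rented machines maps linearly to money. The hourly rate below is an assumption based on EC2's early on-demand pricing, matching the 2.40 USD example above:

```python
# Illustrative cost model mapping a database operation's runtime to money.
# The 0.10 USD/hour rate is an assumed early EC2 on-demand price.
HOURLY_RATE_USD = 0.10

def operation_cost(runtime_hours, machines=1, rate=HOURLY_RATE_USD):
    """Money cost of running an operation for runtime_hours on `machines`."""
    return runtime_hours * machines * rate

# A day-long data ordering operation on a uniprocessor system: 2.40 USD.
sort_cost = operation_cost(24)
```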
There are still two questions left to be answered: 1) How does the cloud know which structures to build and remove? 2) How much should it charge the user, so that it can recover the cost of building a structure?
To answer the first question, the cloud does not build any structure initially; instead, it keeps a list of possible structures that could be built on the database. For each incoming query, the cloud determines the benefit of using each structure; if the presence of structure A, for example, reduces the cost of the query from 10 cents to only 3, the benefit of the structure is 7 cents for that query. The cloud keeps the cumulative sum of the benefits for each structure over time and builds a structure when its past benefit exceeds its building cost. Similarly, a structure is removed when it has gone unused by queries for so long that the cost of maintaining it exceeds the cost of rebuilding it.
The second question is more challenging. If the cloud charges a large fraction of the building cost of a structure to the queries that use it, then users may avoid the structure because of the excessive cost, and the cloud not only fails to profit but risks losing the building cost of the structure. If the cloud charges too small a fraction of the building cost to each query, then the structure may be removed from the cloud before the entire building cost has been recovered. In DIAS, we use the arrival pattern of the queries to predict how many queries on average may use a newly built structure. Using this prediction, the cloud can amortize the building and maintenance cost of a structure over the prospective queries.
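A minimal amortization sketch, assuming the simplest prediction - a constant query arrival rate over the structure's expected lifetime (the function and its parameters are illustrative):

```python
def per_query_charge(build_cost, maintenance_per_day,
                     arrival_rate_per_day, horizon_days):
    """Charge per query that spreads the total cost of a structure over
    the number of queries predicted to use it during its lifetime."""
    expected_queries = arrival_rate_per_day * horizon_days
    total_cost = build_cost + maintenance_per_day * horizon_days
    return total_cost / expected_queries

# A 2.40 USD structure maintained for 30 days at 0.10 USD/day, with a
# predicted 50 queries/day using it: each query is charged 0.36 cents.
charge = per_query_charge(2.40, 0.10, 50, 30)
```

Overestimating the arrival rate reproduces exactly the risk described above: the charge is set too low, and the structure may be dropped before its building cost is recovered.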
Our experiments confirm that using the above solution to manage database structures provides a viable cloud database economy. The cloud's and the users' interests are balanced, and, at the same time, the cloud provides the expected benefits of a cheaper and faster data service.


