1. Introduction to Big Data and Cloud Computing
Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet). It is built largely on virtualization.
The cloud provides resources on demand, whether storage, computing power or something else. It follows a pay-per-use model: you are charged only for the amount of resources you actually use.
For example, if you want to give a client a demo on a cluster of more than 100 machines and you do not have that many machines available, you can rent them from a cloud provider for the duration of the demo instead of buying them.
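The 100-machine demo can be worked through as simple arithmetic. The sketch below is only an illustration of pay-per-use pricing; the hourly rate, the purchase price and the function names are hypothetical assumptions, not figures from any real provider.

```python
def on_demand_cost(machines: int, hours: float, hourly_rate: float) -> float:
    """Cost of renting `machines` instances for `hours` at `hourly_rate` each."""
    return machines * hours * hourly_rate


def purchase_cost(machines: int, unit_price: float) -> float:
    """Up-front cost of buying the same number of machines outright."""
    return machines * unit_price


# Hypothetical figures, for illustration only.
demo_cost = on_demand_cost(machines=100, hours=4, hourly_rate=0.50)
buy_cost = purchase_cost(machines=100, unit_price=1000.0)

print(f"Rent 100 machines for a 4-hour demo: ${demo_cost:.2f}")
print(f"Buy 100 machines outright:           ${buy_cost:.2f}")
```

Under these assumed numbers, renting for the demo costs a small fraction of the purchase price, which is the point of the pay-per-use model for short-lived workloads.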
The cloud plays an important role in the Big Data world by providing horizontally scalable, optimized infrastructure that supports practical implementations of Big Data.
2. Cloud Computing and Big Data
In cloud computing, data is gathered in data centers and then distributed to end users. Automatic backup and recovery of data are also provided for business continuity; all such resources are available in the cloud. Users typically do not know the exact physical location of the resources provided to them. All you need is a client device such as a desktop, laptop or phone, and an Internet connection.
There are multiple ways to access the cloud:
1) Applications or software as a service (SaaS), e.g. Salesforce.com, Dropbox, Google Drive etc.
2) Platform as a service (PaaS)
3) Infrastructure as a service (IaaS)
3. Features of Cloud Computing
Let us look at a few features of cloud computing:
a. Scalability
Scalability is provided by using distributed computing.
b. Elasticity
Customers use, and pay for, only as much of a resource as they need. In cloud computing, elasticity is defined as the degree to which a system is able to adapt to workload changes in an autonomic manner, so that at any time the available resources match the current demand as closely as possible.
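One rough way to picture elasticity is as a rule that recomputes how many instances are needed whenever the workload changes. The following is a minimal sketch, assuming each instance handles a fixed amount of load; the function name, the load units and the limits are hypothetical, and real autoscalers use richer signals than this.

```python
import math


def instances_needed(current_load: float,
                     capacity_per_instance: float,
                     min_instances: int = 1,
                     max_instances: int = 100) -> int:
    """Return the instance count that covers `current_load`,
    clamped to the range [min_instances, max_instances]."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    wanted = math.ceil(current_load / capacity_per_instance)
    return max(min_instances, min(max_instances, wanted))


# Hypothetical workload, in requests per second, sampled over time.
for load in (50, 950, 4200, 120):
    print(load, "->", instances_needed(load, capacity_per_instance=100))
```

The provisioned capacity follows demand up and back down, so the customer is not paying for idle machines once the peak has passed.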
c. Resource Pooling
The same resources can be used by multiple organizations. Computing resources are pooled to serve multiple consumers through a multi-tenant model, with different resources dynamically assigned and reassigned according to consumer demand.
d. Self-service
Customers are given an easy-to-use interface through which they can choose the services they want. A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed without requiring human interaction.
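Dynamic assignment and reassignment from a shared pool can be sketched as follows. This is a toy model under stated assumptions: the `ResourcePool` class, the tenant names and the notion of interchangeable "units" are all hypothetical and stand in for whatever a real provider actually pools.

```python
class ResourcePool:
    """A shared pool of interchangeable resource units used by many tenants."""

    def __init__(self, total_units: int):
        self.total_units = total_units
        self.allocations = {}  # tenant name -> units currently held

    @property
    def free_units(self) -> int:
        return self.total_units - sum(self.allocations.values())

    def allocate(self, tenant: str, units: int) -> bool:
        """Give `units` to `tenant` if the pool has enough free capacity."""
        if units <= 0 or units > self.free_units:
            return False
        self.allocations[tenant] = self.allocations.get(tenant, 0) + units
        return True

    def release(self, tenant: str, units: int) -> None:
        """Return up to `units` from `tenant` to the pool."""
        remaining = max(0, self.allocations.get(tenant, 0) - units)
        if remaining:
            self.allocations[tenant] = remaining
        else:
            self.allocations.pop(tenant, None)


pool = ResourcePool(total_units=10)
pool.allocate("org_a", 6)   # org_a takes most of the pool
pool.release("org_a", 3)    # its demand drops, so capacity is returned
pool.allocate("org_b", 5)   # the freed units are reassigned to org_b
print(pool.allocations, "free:", pool.free_units)
```

The same physical capacity serves both organizations at different times, which is what lets a provider run fewer machines than the sum of every tenant's peak demand.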
e. Low Costs
You are charged only for the computing resources you use, so you need not buy expensive infrastructure. Utility-style pricing is usage-based, and fewer in-house IT skills are required for implementation.
f. Fault Tolerance
The cloud allows recovery when a part of the system fails to respond.
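A common way to recover from an unresponsive part is to fail over to another replica. The sketch below illustrates that idea only; the replicas are modelled as plain Python callables, and `call_with_failover`, `failed_node` and `healthy_node` are hypothetical names, not part of any provider's API.

```python
def call_with_failover(replicas, request):
    """Try each replica in order and return the first successful response."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as exc:
            last_error = exc  # this replica is down, so try the next one
    raise RuntimeError(f"all {len(replicas)} replicas failed") from last_error


# Hypothetical replicas: one that does not respond and one that does.
def failed_node(request):
    raise ConnectionError("node did not respond")


def healthy_node(request):
    return f"handled {request}"


print(call_with_failover([failed_node, healthy_node], "job-1"))
```

The caller still receives a response even though the first node failed; an error is raised only when every replica is unavailable.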
4. Cloud Deployment Models
There are two main cloud deployment models:
- Public cloud – A cloud is called a “public cloud” when the services are open over a network for public use.
- Private Cloud – Private cloud is operated solely for a single organization, whether managed internally or by a third-party, and hosted either internally or externally.
5. Cloud Delivery Models
Cloud services are categorized as below:
1) Infrastructure as a service (IaaS): The complete infrastructure is provided to you. Maintenance-related tasks are handled by the cloud provider, and you use the infrastructure as your requirements dictate. It can be deployed in both public and private clouds.
Examples of IaaS are virtual machines, load balancers, and network attached storage.
2) Platform as a service (PaaS): Here the cloud provider directly supplies object storage, queuing, databases, runtimes etc. It is our responsibility to configure and use them: the provider gives us the resources, but connecting our application to the database and similar tasks remain our job. Examples of PaaS are Windows Azure and Google App Engine (GAE).
3) Applications or software as a service (SaaS), e.g. Salesforce.com, Dropbox, Google Drive etc.: Here we have no responsibility for setup. We simply use an application that is running in the cloud, and all infrastructure setup is the responsibility of the service provider. For SaaS to work, the infrastructure (IaaS) and the platform (PaaS) must be in place.
6. Cloud for Big Data
Below are some examples of how cloud applications are used for Big Data:
IaaS in a public cloud: Using a cloud provider’s infrastructure for Big Data services gives access to almost limitless storage and compute power. Enterprise customers can use IaaS to create cost-effective and easily scalable IT solutions in which the cloud provider bears the complexity and expense of managing the underlying hardware. If the scale of a business customer’s operations fluctuates, or the business is looking to expand, it can tap into cloud resources as and when it needs them rather than purchase, install and integrate hardware itself.
PaaS in a private cloud: PaaS vendors are beginning to incorporate Big Data technologies such as Hadoop and MapReduce into their PaaS offerings, which removes the need to deal with the complexities of managing individual software and hardware elements. For example, web developers can use separate PaaS environments at every stage of developing, testing and ultimately hosting their websites. Businesses that develop their own internal software can also use PaaS, particularly to create distinct, ring-fenced development and testing environments.
SaaS in a hybrid cloud: Many organizations feel the need to analyse the voice of the customer, especially on social media. SaaS vendors provide the platform for the analysis as well as the social media data. Office software is the most common example of businesses using SaaS. Tasks related to accounting, sales, invoicing and planning can all be performed through SaaS. A business may use one piece of software that performs all of these tasks or several that each perform a different task. The software is subscribed to over the Internet and then accessed online from any computer in the office using a username and password. If needed, the business can switch to software that better fulfils its requirements. Everyone who needs access to a particular piece of software can be set up as a user, whether that is one or two people or every employee of a corporation that employs hundreds.
7. Providers in the Big Data Cloud Market
Cloud computing companies come in all shapes and sizes. All large software vendors either already have offerings in the cloud space or are in the process of launching one. In addition, there are many startups with interesting cloud products. Cloud providers include Google, Citrix, Netmagic, Red Hat and Rackspace, among others. Amazon (AWS) is the leading cloud provider. Microsoft also provides cloud services, under the name Azure. Below is a list of major cloud computing vendors.
Infrastructure as a Service cloud computing companies:
- Amazon’s offerings include S3 (Data storage/file system), SimpleDB (non-relational database) and EC2 (computing servers).
- Rackspace’s offerings include Cloud Drive (Data storage/file system), Cloud Sites (web site hosting on cloud) and Cloud Servers (computing servers).
- IBM’s offerings include Smart Business Storage Cloud and Computing on Demand (CoD).
- AT&T provides Synaptic Storage and Synaptic Compute as a service.
Platform as a Service cloud computing companies:
- Google’s App Engine is a development platform that is built upon Python and Java.
- Force.com provides a development platform that is based upon Apex.
- Microsoft Azure provides a development platform based upon .NET.
Software as a Service companies:
- Google provides a suite that includes Google Docs, Gmail, Google Calendar and Picasa.
- IBM provides LotusLive iNotes, a web-based email service for messaging and calendaring capabilities to business users.
- Zoho provides online products similar to the Microsoft Office suite.
8. Issues in Using Cloud Services
Some important issues with cloud services are listed below:
a. Data Security
Organizations must ensure that their agreement with the cloud service provider guarantees data security. Handing over private data to others worries some people. Corporate executives might hesitate to take advantage of a cloud computing system because they can’t keep their company’s information under lock and key.
b. Performance
Parameters of cloud performance must be specified in the agreement and quantified wherever possible. Exceptions must be clearly noted. The Service-Level Agreement (SLA) should clearly state all the terms and conditions between the service user and the service provider to ensure proper performance.
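As an example of quantifying a performance parameter, an availability percentage can be translated into the downtime it actually permits. The sketch below assumes a 30-day billing period and a hypothetical 99.9% availability target; neither figure comes from a specific provider's agreement.

```python
def allowed_downtime_minutes(availability_percent: float,
                             period_days: int = 30) -> float:
    """Minutes of downtime permitted in `period_days` at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


# Hypothetical SLA targets, for illustration only.
for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per 30 days")
```

Expressing the target this way makes it easier to judge whether the agreed level of service is acceptable for the business before the agreement is signed.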
c. Compliance
Cloud services must be compatible with the compliance needs of the business. Some companies are also concerned about regulatory issues. Market observers say that around 50 percent of people worry that they will be tied to one provider of cloud storage.
d. Legal Issues
Organizations must ensure that the location of the cloud’s physical resources does not raise any legal issues. The cloud presents a number of legal challenges around the privacy of data stored in multiple locations, which also increases the risk of confidentiality and privacy breaches.
e. Costs
Organizations should be aware of all the costs involved in using the cloud and should use the services in a controlled manner, because under the pay-per-use model the cost incurred by the company grows with its consumption.