QUESTION BANK
UNIT-I
PART-A
2. What is Virtualization?
9. What do you mean by operating system level and library support level of virtualization?
PART-B
UNIT-II
PART-A
7. What are some examples of large cloud providers and their databases?
8. Define GPU?
PART-B
3. How does cloud architecture overcome the difficulties faced by traditional architecture? What are
the three differences that separate cloud architecture from the traditional one?
5. Explain the cloud ecosystem and the architecture of a P2P system?
UNIT-III
PART-A
2. Define BigTable?
9. Demonstrate how the NameNode chooses which DataNodes to store replicas on?
PART-B
5. Explain the case study of the Amazon Web Services reference architecture and GoGrid?
UNIT-IV
PART-A
1. What is Cloud Dataflow?
The Dataflow model provides a number of useful abstractions that insulate you from low-
level details of distributed processing, such as coordinating individual workers, sharding data
sets, and other such tasks. These low-level details are fully managed for you by Cloud
Dataflow's runner services.
When you think about data processing with Dataflow, you can think in terms of four major
concepts:
1. Pipelines
2. PCollections
3. Transforms
4. I/O Sources and Sinks
Once you're familiar with these principles, you can learn about pipeline design principles to help
determine how best to use the Dataflow programming model to accomplish your data processing
tasks.
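As an illustration of these four concepts, the following is a minimal word-count sketch using the Apache Beam Python SDK (the SDK on which Dataflow pipelines are written); the input and output file names are placeholders, and the example assumes the apache-beam package is installed.

import apache_beam as beam

# 1. Pipeline: the container for the whole job.
with beam.Pipeline() as pipeline:
    lines = (pipeline
             | 'ReadLines' >> beam.io.ReadFromText('input.txt'))         # 4. I/O source -> 2. PCollection
    counts = (lines
              | 'SplitWords' >> beam.FlatMap(lambda line: line.split())  # 3. Transform
              | 'PairWithOne' >> beam.Map(lambda word: (word, 1))
              | 'CountPerWord' >> beam.CombinePerKey(sum))
    counts | 'WriteCounts' >> beam.io.WriteToText('counts')              # 4. I/O sink

Run locally this uses the DirectRunner; pointing the same pipeline at the DataflowRunner hands the low-level coordination and sharding over to Cloud Dataflow's runner service.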
2. What is Java Cloud service?
One of the common problems faced by anyone building large-scale distributed systems is how
to ensure that only one process (or worker, server, or agent) across a fleet of servers acts on a
resource. Before jumping into how to solve this, let us take a look at the problem in more detail
and see where it is applicable:
What are some examples of where we need a single entity to act on a resource?
3. Ensure that there is a single master that processes all writes (e.g., Google's use of
Chubby for leader election in GFS, MapReduce, etc.).
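The following is a minimal, single-machine illustration (not the Chubby or GFS mechanism itself) of the "only one process acts" idea, using a Unix file lock; a real fleet would use a distributed lock service such as Chubby or ZooKeeper, and the lock file path here is just a placeholder.

import fcntl, time

def run_if_leader():
    # Unix-only illustration: whichever process acquires the exclusive lock acts as leader.
    with open('/tmp/leader.lock', 'w') as lock_file:
        try:
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)   # non-blocking acquire
        except OSError:
            print('another process is already the leader')
            return
        print('acquired leadership, acting on the shared resource...')
        time.sleep(5)                                               # the exclusive work

run_if_leader()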
Amazon Simple Storage Service (Amazon S3) is a scalable, high-speed, low-cost, web-based
cloud storage service designed for online backup and archiving of data and application
programs. S3 was designed with a minimal feature set and created to make web-scale
computing easier for developers.
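A minimal backup-and-restore sketch against S3, assuming the boto3 library is installed and AWS credentials are configured; the bucket name, object key and file names below are illustrative.

import boto3

s3 = boto3.client('s3')

# Upload a local archive for online backup; bucket and key are placeholders.
s3.upload_file('backup.tar.gz', 'my-backup-bucket', 'archives/backup.tar.gz')

# Retrieve the archive later.
s3.download_file('my-backup-bucket', 'archives/backup.tar.gz', 'restored.tar.gz')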
Amazon Simple Queue Service (SQS) is a fully managed message queuing service that
enables you to decouple and scale microservices, distributed systems, and serverless
applications. SQS eliminates the complexity and overhead associated with managing and
operating message-oriented middleware, and empowers developers to focus on differentiating
work.
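A minimal sketch of the decoupling SQS provides, again assuming boto3 and configured AWS credentials; the queue name, region and message body are illustrative.

import boto3

sqs = boto3.client('sqs', region_name='us-east-1')
queue_url = sqs.create_queue(QueueName='demo-queue')['QueueUrl']

# Producer: enqueue work without knowing which service will process it.
sqs.send_message(QueueUrl=queue_url, MessageBody='process-order-42')

# Consumer: poll for work at its own pace, then delete the handled message.
resp = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=5)
for msg in resp.get('Messages', []):
    print('handling', msg['Body'])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg['ReceiptHandle'])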
A buffer is used to create equilibrium between various components and to even out the rate
differences between fast and slow services. A buffer makes the system more resilient to bursts
of traffic or load, and it also synchronizes the different components.
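A small illustrative sketch of this buffering idea in Python: a fast producer bursts requests into a thread-safe queue (the buffer) and a slower consumer drains it at its own rate; the sizes and delays are arbitrary.

import queue, threading, time

buffer = queue.Queue(maxsize=100)          # the buffer that absorbs the burst

def consumer():
    while True:
        item = buffer.get()
        time.sleep(0.1)                    # slower processing rate
        print('processed', item)
        buffer.task_done()

threading.Thread(target=consumer, daemon=True).start()

for i in range(20):                        # a sudden burst of 20 requests
    buffer.put(i)

buffer.join()                              # wait until the buffer is drained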
Firstly, download Serial to Ethernet Connector. Then install the software on the host
machine with the connected COM port device and on the guest OS.
Once installed, start the Serial to Ethernet program on the host PC and choose “Server
connection”. Configure the connection settings and click “Create connection” to open the
port with the device attached.
On a virtual machine, choose “Client connection”. Enter the server’s IP address as
well as the name of the open serial port and click “Connect”.
PART-B
1. Explain a user view of Google App Engine with suitable block schematic?
The Memcache and Task Queue services are integrated in the App Engine standard
environment. Memcache is an in-memory cache shared across the App Engine instances.
This provides extremely high speed access to information cached by the web server (e.g.
authentication or account information).
Task Queues provide a mechanism to offload longer running tasks to backend servers,
freeing the front end servers to service new user requests. Finally, App Engine features a
built-in load balancer (provided by the Google Load Balancer) which provides transparent
Layer 3 and Layer 7 load balancing to applications.
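A minimal sketch using the legacy App Engine standard environment APIs for Python (google.appengine.api); the cache key, queue URL and the load_account_from_datastore helper are hypothetical.

from google.appengine.api import memcache, taskqueue

def get_account(user_id):
    # Try the shared in-memory cache first, then fall back to the datastore.
    account = memcache.get('account:%s' % user_id)
    if account is None:
        account = load_account_from_datastore(user_id)   # hypothetical helper
        memcache.set('account:%s' % user_id, account, time=300)
    return account

# Offload a long-running job to a push task queue so the front end can return quickly.
taskqueue.add(url='/tasks/send-report', params={'user_id': '42'})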
AWS EC2 Tutorial
It is very difficult to predict how much computing power one might require for an application which
you might have just launched. There can be two scenarios: you may over-estimate the requirement
and buy stacks of servers which will not be of any use, or you may under-estimate the usage, which
will lead to the crashing of your application. In this AWS EC2 Tutorial we will understand all the
key concepts using examples and also perform a hands-on exercise to launch an Ubuntu instance.
What is AWS EC2?
Amazon Elastic Compute Cloud (EC2) is a web service from Amazon that provides resizable
compute services in the cloud.
They are resizable because you can quickly scale up or scale down the number of server instances
you are using as your computing requirements change.
Suppose that instead of using AWS EC2 we consider buying a dedicated set of servers; here is what
we might have to face:
To use these servers we have to hire an IT team which can manage them.
Also, faults in the system are unavoidable, so we have to bear the cost of getting them fixed,
and if you don't want to compromise on your up-time you have to keep redundant servers,
which becomes even more expensive.
Your own purchased assets will depreciate over time; by contrast, the cost of an EC2 instance
has dropped by more than 50% over a 3-year period while processor type and speed have
improved, so moving to the Cloud is all the more advisable.
For scaling up we have to add more servers, and if your application is new and experiences a
sudden surge in traffic, scaling up that quickly might become a problem.
These are just a few problems, and there are many other scenarios which make the case for EC2
stronger!
Let’s understand the types of EC2 Computing Instances:
Computing is a very broad term; the nature of your task decides what kind of computing you need.
General Instances
o For applications that require a balance of performance and cost.
E.g. email responding systems, where you need a prompt response and it should
also be cost-effective, since it doesn't require much processing.
Compute Instances
o For applications that require a lot of processing power from the CPU.
Memory Instances
o For applications that are memory-intensive and therefore require a lot of RAM.
E.g. when your system needs to run many applications in the background,
i.e. multitasking.
Storage Instances
o For applications that are huge in size or have a data set that occupies a lot of space.
GPU Instances
o For applications that require heavy graphics processing or GPU-accelerated workloads,
e.g. machine learning or video rendering.
Now, every instance type has a set of instances which are optimized for different workloads:
General Instances
o t2
o m4
o m3
Compute Instances
o c4
o c3
Memory Instances
o r3
o x1
Storage Instances
o i2
o d2
GPU Instances
o g2
Now let’s understand the kind of work that each instance is optimized for, in this AWS EC2 Tutorial:
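For the hands-on part of this tutorial, the following is a minimal sketch of launching a single Ubuntu instance with boto3, assuming the library is installed and AWS credentials are configured; the region, AMI ID and key pair name below are placeholders.

import boto3

ec2 = boto3.resource('ec2', region_name='us-east-1')

# Launch one t2.micro (general purpose) instance from a placeholder Ubuntu AMI.
instances = ec2.create_instances(
    ImageId='ami-xxxxxxxx',        # placeholder: pick an Ubuntu AMI for your region
    InstanceType='t2.micro',
    KeyName='my-key',              # an existing key pair name
    MinCount=1,
    MaxCount=1,
)
print('launched', instances[0].id)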
4. Explain the architecture of MapReduce in Hadoop?
MapReduce
MapReduce is a parallel programming model for writing distributed applications, devised at
Google for efficient processing of large amounts of data (multi-terabyte data sets) on large
clusters of commodity hardware in a reliable, fault-tolerant manner.
Why MapReduce?
Traditional Enterprise Systems normally have a centralized server to store and process data. The
following illustration depicts a schematic view of a traditional enterprise system. The traditional
model is certainly not suitable for processing huge volumes of scalable data, which cannot be
accommodated by standard database servers. Moreover, the centralized system creates too
much of a bottleneck while processing multiple files simultaneously.
Google solved this bottleneck issue using an algorithm called MapReduce. MapReduce divides a
task into small parts and assigns them to many computers. Later, the results are collected at one
place and integrated to form the result dataset.
MapReduce contains two important tasks, namely Map and Reduce. The Map task takes a set of
data and converts it into another set of data, where individual elements are broken down into
tuples (key-value pairs).
The Reduce task takes the output from the Map as an input and combines those data
tuples (key-value pairs) into a smaller set of tuples.
The Reduce task is always performed after the Map job.
Let us now take a close look at each of the phases and try to understand their significance.
Input Phase − Here we have a Record Reader that translates each record in
an input file and sends the parsed data to the mapper in the form of key-value pairs.
Map − Map is a user-defined function, which takes a series of key-value pairs and processes
each one of them to generate zero or more key-value pairs.
Intermediate Keys − The key-value pairs generated by the mapper are known as
intermediate keys.
Combiner − A combiner is a type of local Reducer that groups similar data from the map
phase into identifiable sets. It takes the intermediate keys from the mapper as input and
applies a user-defined code to aggregate the values in a small scope of one mapper. It is not a
part of the main MapReduce algorithm; it is optional.
Shuffle and Sort − The Reducer task starts with the Shuffle and Sort step. It downloads the
grouped key-value pairs onto the local machine, where the Reducer is running. The
individual key-value pairs are sorted by key into a larger data list. The data list groups the
equivalent keys together so that their values can be iterated easily in the Reducer task.
Reducer − The Reducer takes the grouped key-value paired data as input and runs a Reducer
function on each one of them. Here, the data can be aggregated, filtered, and combined in a
number of ways, and it requires a wide range of processing. Once the execution is over, it
gives zero or more key-value pairs to the final step.
Output Phase − In the output phase, we have an output formatter that translates the
final key-value pairs from the Reducer function and writes them onto a file using a record
writer.
Let us try to understand the two tasks Map and Reduce with the help of a small diagram.
MapReduce Example
Let us take a real-world example to comprehend the power of MapReduce. Twitter receives
around 500 million tweets per day, which is nearly 6,000 tweets per second. The following
illustration shows how Twitter manages its tweets with the help of MapReduce.
As shown in the illustration, the MapReduce algorithm performs the following actions
−
Tokenize − Tokenizes the tweets into maps of tokens and writes them as key- value pairs.
Filter − Filters unwanted words from the maps of tokens and writes the
filtered maps as key-value pairs.
Count − Generates a token counter per word.
Aggregate Counters − Prepares an aggregate of similar counter values into
small manageable units.
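To make the phases concrete, here is a minimal word-count sketch in the style of Hadoop Streaming (one way to run the Map and Reduce phases as Python scripts); the file names mapper.py and reducer.py, and the simplistic whitespace tokenization, are illustrative.

# mapper.py - the Map phase: Hadoop feeds input records on stdin and
# collects intermediate key-value pairs from stdout.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print('%s\t%d' % (word.lower(), 1))

# reducer.py - the Reduce phase: the shuffle/sort step delivers the pairs
# sorted by key, so equal keys arrive together and can be summed in one pass.
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rsplit('\t', 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print('%s\t%d' % (current_word, current_count))
        current_word, current_count = word, int(count)
if current_word is not None:
    print('%s\t%d' % (current_word, current_count))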
UNIT-V
PART-A
3. Which security mechanism provides an effective control for data confidentiality and
integrity?
Data integrity in cloud storage: Cloud computing is a promising computing model
that enables convenient and on-demand network access to a shared pool of configurable
computing resources. ... Therefore, an independent auditing service is required to make sure
that the data is correctly hosted in the Cloud.
There are tools like KeePass and LastPass that offer to save passwords on the desktop or in the
browser. Such tools can then be used to automate browser login. Tools like these require a
single master password, and using that single password you can automate the login across
multiple websites. So testing such password fields using automation tools is going to be
different.
7. What is VM-theft?
VM Theft: This is the ability to steal a virtual machine file electronically, which can then be
mounted and run elsewhere. It is an attack that is the equivalent of stealing a complete physical
server without having to enter a secure data center and remove a piece of computing equipment.
• the client presents the goals and the target of the analysis,
[1] Provide access to users from supplier, distributor, and partner networks.
[2] Provide access to new users outside the traditional organization perimeter after mergers
and acquisitions.
10. How are access controls used in identity management?
Provisioning
De-provisioning
Identity Lifecycle
Digital identities have a lifecycle, just like the real-world entities they represent.
PART-B
Cloud Provider
Cloud Auditor
Cloud Broker
Cloud Carrier
An intermediary that provides connectivity and transport of cloud services
While all cloud Actors involved in orchestrating a cloud Ecosystem are responsible
for addressing operational, security and privacy concerns, cloud Consumers retain the data
ownership, and therefore remain fully responsible for:
• assessing the risk from any exposure or misuse of the data and the impact to their
business,
• Business Continuity
o Disaster recovery plans
o Restoration plan incorporating and quantifying the Recovery Point Objective (RPO) and
Recovery Time Objective (RTO) for services
2. Describe the challenges that apply to cloud computing security?
Modern Cloud Challenges
Cloud Security
Cloud security threats are the main concern today. Malicious or unintended actions can cause
damage at many levels in a company. This is a fact in cloud as well as non-cloud
environments, but as the sophistication of applications and services increases, the security
risks also grow. The attack surface of cloud services is larger than that of traditional service models,
as the related components have many endpoints and different protocols to manage in different
ways. A variety of approaches to identify and address both known and new threats is required.
Distributed-Denial-of-Service (DDoS) attacks are just one common threat to security in cloud
computing. Providers usually offer several architectural options to configure a safe
environment to prevent these kinds of threats, like traffic isolation (the capability to isolate
groups of virtual machines in groups of virtual separate networks) or access control lists to
create rules that define the permissions of a component with several levels of granularity.
Researchers are now also proposing automated systems to detect intrusions in cloud
deployments.
The second interesting challenge to traditional cloud paradigms is Data Management. Not so
long ago, data used to be thought of as a homogeneous entity that could be gathered and
archived in silos, usually relational databases (groups of tables that contain part of the data
and link each row to other rows in different tables).
But a whole new set of needs appeared with the SaaS business model. Now we have document
databases, big table databases, and graph and columnar databases for an almost innumerable
set of cases. This new reality where the data can drive the way a company measures its future
opportunities imposes a number of challenges. For example, data consistency must be
maintained, and data will typically need to be aggregated, extended, transformed and analyzed
in several contexts.
One aspect of data management to be addressed is the consistency of the data. This means that
all the users of a cloud application should see the same data at the same time. This is not a
trivial task, as the data resources (hardware and software) are pooled according to the physical
restrictions of networks and the transactional status of the operations, and the client's
connection could have an impact on consistency. A good analysis of this issue was presented
recently by Rick Hoskinson. Unifying the way that time is measured in all the millions of
simultaneous clients, regardless of their hardware or connection, is an incredible engineering
effort.
No system is free of failures. No matter how good the process or the measures taken to
address the risks, we live in an uncertain world. Resiliency is the ability to handle failures
gracefully and recover the whole system. This is a huge challenge for services and
applications where the components compete for resources and depend on other internal or
external components/services that may fail, or may rely on defective software. Planning the way
that those failures will be detected, logged, fixed and recovered involves not only developers
but all the teams as part of a cloud strategy. To help, tools are available to simulate random
failures, from hardware issues to massive external attacks—including failed deployments or
unusual software behavior.
The three basic techniques that are used to increase the resiliency of a cloud system are:
Checking and monitoring: An independent process continuously reviews that the system
meets the minimum specifications of behavior. This technique, albeit simple, is key to detecting
failures and reconfiguring resources.
Checkpoint and restart: The status of the entire system, or the key status variables, is saved
under certain circumstances (e.g. every x time steps or before a release). After a failure, the
system is restored to the latest correct checkpoint and recovers from there (a minimal sketch of
this technique follows after this list).
Replication: The critical components of a system are replicated using additional resources
(hardware and software), ensuring that they are available at any time. The additional challenge
with this technique is the status synchronization task between the replicas and the main system.
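As mentioned above, here is a minimal sketch of the checkpoint-and-restart technique; the file name, the checkpoint interval and the toy workload are arbitrary assumptions.

import json, os

CHECKPOINT_FILE = 'checkpoint.json'

def save_checkpoint(state):
    # Write to a temporary file and rename, so a crash mid-write cannot
    # corrupt the latest correct checkpoint.
    tmp = CHECKPOINT_FILE + '.tmp'
    with open(tmp, 'w') as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT_FILE)

def load_checkpoint():
    if os.path.exists(CHECKPOINT_FILE):
        with open(CHECKPOINT_FILE) as f:
            return json.load(f)
    return {'step': 0, 'result': 0}         # initial state if no checkpoint exists

state = load_checkpoint()                    # restart from the latest correct checkpoint
for step in range(state['step'], 1000):
    state['result'] += step                  # the actual work being protected
    state['step'] = step + 1
    if step % 100 == 0:                      # checkpoint every 100 steps
        save_checkpoint(state)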
Note, too, that there are many different levels of fault tolerance in cloud computing. The
balance between resources, costs and acceptable resilience level should achieve the best
results possible. A scenario with multiple machines used as replicas in the same cluster
assumes that the tolerable resilience level is in the cluster. A more reliable approach is cluster
replication in the same data center which separates the replicas in independent clusters. With
this configuration, the data center is the single point of failure. A more complicated but
reliable scenario is the replication of systems in different data centers. This way, even with
large outages, the resilience of a system could be guaranteed.
Dynamic resource allocation is a very popular research area in the cloud environment due to its
live application in data centers. Because of the dynamic and heterogeneous nature of the cloud, the
allocation of virtual machines is affected by various parameters like QoS, time consumption, cost,
carbon footprint, etc. Virtual machines that communicate with each other to execute one large
request fall into an affinity group. Here we will study the details of the allocation method and the
affinity of virtual machines, show how it gives better performance than a non-affinity group, and
give some idea about a new technique to improve performance which we will implement in future
work.
Resource Allocation Strategy (RAS) is all about integrating cloud provider activities for utilizing
and allocating scarce resources within the limits of the cloud environment so as to meet the needs of
the cloud application. It requires the type and amount
of resources needed by each application in order to complete a user job. The order and time of
allocation of resources are also an input for an optimal RAS. An optimal RAS should avoid the
following situations:
a) Resource contention situation arises when two applications try to access the same resource
at the same time.
b) Scarcity of resources arises when there are limited resources.
c) Resource fragmentation situation arises when the resources are isolated.
d) Over-provisioning of resources arises when the application gets more resources than it
demanded.
e) Under-provisioning of resources occurs when the application is assigned fewer resources
than it demands.
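To make these situations concrete, here is a toy first-fit allocation sketch (the host capacities and VM requests are purely illustrative): each request is granted exactly the capacity it demanded, avoiding over-provisioning, and a request is rejected when resources are scarce.

hosts = [{'id': 'h1', 'free_cpu': 8, 'free_ram': 32},
         {'id': 'h2', 'free_cpu': 4, 'free_ram': 16}]

def allocate(vm):
    # First-fit: place the VM on the first host with enough spare capacity.
    for host in hosts:
        if host['free_cpu'] >= vm['cpu'] and host['free_ram'] >= vm['ram']:
            host['free_cpu'] -= vm['cpu']      # grant only the demanded amount
            host['free_ram'] -= vm['ram']
            return host['id']
    return None                                # scarcity: no host can satisfy the request

print(allocate({'cpu': 4, 'ram': 8}))    # -> 'h1'
print(allocate({'cpu': 16, 'ram': 64}))  # -> None (request rejected)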
From the perspective of a cloud provider, predicting the dynamic nature of users, user demands, and
application demands is impractical. For the cloud users, the job should be completed on time with
minimal cost. Hence, due to limited resources, resource heterogeneity, locality restrictions,
environmental necessities and the dynamic nature of resource demand, we need an
efficient resource allocation system that suits cloud environments. Cloud resources consist of
physical and virtual resources. The physical resources are shared across multiple compute requests
through virtualization and provisioning. The request for virtualized resources is described through a
set of parameters detailing the processing, memory and disk needs. Provisioning satisfies the request
by mapping virtualized resources to physical ones. The hardware and software resources are
allocated to the cloud applications on an on-demand basis. For scalable computing, Virtual Machines
are rented. The complexity of finding an optimum resource allocation is exponential in huge
systems like big clusters, data centres or Grids, since resource demand and supply can be dynamic
and uncertain.
4.Explain the two fundamental functions, identity management and access control,
which are required for secure cloud computing.
Cloud computing offers a rich set of services on a pay-per-use basis. The features and
technology offered by various providers have created a highly competitive market for the business.
Various security issues are attracting attention, one of which is the identity and privacy of
the cloud user. Users are worried about the privacy of the information which they have given to
the provider at the time of registration. We present an analysis of various identity
management systems and propose a simple trust-based scheme for a cloud computing
application and service.
In a cloud computing system, the concept of federated identity management is essential, along with
a strong and trusted identity management system itself. The identity management systems
discussed in the previous section are not sufficient for the cloud environment. A trust-based
solution can be proposed as follows.
In this direction, this paper presents our early steps towards innovative autonomic
resource provisioning and management techniques for supporting SaaS applications hosted on
Clouds. Steps towards this goal include (i) development of an autonomic management system and
algorithms for dynamic provisioning of resources based on users’ QoS requirements to maximize
efficiency while minimizing the cost of services for users and (ii) creation of secure mechanisms to
ensure that the resource provisioning system is able to allocate resources only for requests from
legitimate users. We present a conceptual model able to achieve the aforementioned goals and
present initial results that evidence the advantages of autonomic management of Cloud
infrastructures.
Cloud services offered at the infrastructure, platform, and application levels are referred to as
Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a
Service (SaaS), respectively. To support end-user applications, service providers such as Amazon [3],
HP [4], and IBM [5] have deployed Cloud data centers worldwide. These applications range from
generic text processing software to online healthcare. Once applications are hosted on Cloud
platforms, users are able to access them from anywhere at any time, with any networked device,
from desktops to smartphones. The Cloud system taps into the processing power of virtualized
computers on the back end, thus significantly speeding up the application for the users, who pay for
the actually used services. However, management of large-scale and elastic Cloud infrastructure
offering reliable, secure, and cost-efficient services is a challenging task. It requires co-optimization
at multiple layers (infrastructure, platform, and application) exhibiting autonomic properties. Some
key open challenges are:
Quality of Service (QoS). Cloud service providers (CSPs) need to ensure that a sufficient amount of
resources is provisioned so that the QoS requirements of Cloud service consumers (CSCs), such
as deadline, response time, and budget constraints, are met. These QoS requirements form the basis
for SLAs (Service Level Agreements) and any violation will lead to penalty. Therefore, CSPs need to
ensure that these violations are avoided or minimized by dynamically provisioning the right amount
of resources in a timely manner.
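An illustrative threshold-based provisioning rule of the kind a CSP might apply; the thresholds and the 200 ms SLA target are arbitrary assumptions, not values from the text.

def scale_decision(current_instances, avg_response_ms, sla_ms=200):
    # Scale out when the SLA is at risk, scale in when resources are clearly idle.
    if avg_response_ms > sla_ms:
        return current_instances + 1
    if avg_response_ms < sla_ms * 0.5 and current_instances > 1:
        return current_instances - 1
    return current_instances

print(scale_decision(3, 350))   # SLA at risk -> 4 instances
print(scale_decision(3, 80))    # over-provisioned -> 2 instances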
Energy efficiency. It includes having efficient usage of energy in the infrastructure, avoiding
utilization of more resources than actually required by the application, and minimizing the carbon
footprint of the Cloud application.
Security. Achieving security features such as confidentiality (protecting data from unauthorized
access), availability (avoiding malicious users making the application unavailable to legitimate users),
and reliability against Denial of Service (DoS) attacks. The DoS is critical because, in a dynamic
resource provisioning scenario, increase in the number of users causes automatic increase in the
resources allocated to the application. If a coordinated attack is launched against the SaaS provider,
the sudden increase in traffic might be wrongly assumed to be legitimate requests and resources
would be scaled up to handle them. This would result in an increase in the cost of running the
application (because the provider will be charged for these extra resources) as well as a waste of energy.