1 Overview

From Microsoft Azure: cloud computing is the delivery of computing services—including servers, storage, databases, networking, software, analytics, and intelligence—over the Internet (‘the cloud’) to offer faster innovation, flexible resources, and economies of scale.

1.1 Cloud computing services

Notes taken from the LinkedIn Learning Course, Learning Cloud Computing: Core Concepts, by David Linthicum, and various other online resources.

Infrastructure as a Service (IaaS)

  • IaaS provider manages hardware, virtualization software, servers, and storage
  • Customer can add and manage any software (operating systems, applications, etc.)
  • Examples: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform

Software as a Service (SaaS)

  • SaaS provider manages infrastructure, operating systems, middleware, and data
  • Applications typically run in a web browser; users may need a subscription for access
  • Examples: Salesforce (customer relationship management solution), Google Apps (provides storage, word processing, spreadsheets, etc.), Microsoft Office 365

Platform as a Service (PaaS)

  • PaaS provides the framework to build, test, deploy, manage, and update software products
  • Typically builds on an IaaS by adding the operating system, development tools, database management systems, etc.
  • Examples: AWS Elastic Beanstalk (a PaaS that runs within an IaaS, AWS), Apache Stratos, Google App Engine, Microsoft Azure

2 BigQuery

Google’s serverless data warehouse

Conclusion: useful for storing data, or pulling in data from one or more data sources, and running SQL queries against it. Most useful when working with big data (more than a gigabyte); for smaller data, it is simpler to run queries locally. It seems challenging to use languages other than SQL, so it does not look well suited to deploying rigorous machine learning models (written in, say, Python or R).
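For small data, the "run it locally" option can be sketched with Python's built-in sqlite3 module. This is only a local stand-in, not BigQuery: the trips table and its columns are hypothetical, and SQLite's SQL dialect differs from BigQuery's in places.

```python
import sqlite3

# In-memory database as a local stand-in for a warehouse table.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for illustration only.
conn.execute("CREATE TABLE trips (city TEXT, distance_km REAL)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Austin", 3.5), ("Austin", 6.5), ("Boston", 2.0)],
)

# A typical aggregate query: trip count and total distance per city.
query = """
SELECT city, COUNT(*) AS n_trips, SUM(distance_km) AS total_km
FROM trips
GROUP BY city
ORDER BY city
"""
rows = conn.execute(query).fetchall()
conn.close()

for city, n_trips, total_km in rows:
    print(city, n_trips, total_km)
```

The aggregate query itself is standard SQL, so a query of this shape should carry over to a warehouse with little change; only the connection and loading steps are specific to the local setup.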

3 Hadoop

3.1 MapReduce

  • Typically written in Java, C, or Python (some basic Java required).
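The MapReduce model itself can be illustrated without a cluster. The following is a minimal single-process sketch of the classic word-count job in plain Python; it does not use the Hadoop API, and the function names (map_phase, shuffle, reduce_phase) are chosen here for illustration. On a real cluster the framework distributes the map and reduce calls across machines and performs the shuffle itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group emitted values by key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


lines = ["the cloud stores data", "the cluster processes the data"]
counts = word_count(lines)
print(counts)
```

Because each map call sees only one line and each reduce call sees only one key, both phases can run in parallel, which is what makes the model suited to large clusters.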

Alternatives:

  • Pig: “Pig’s programming language referred to as Pig Latin is a coding approach that provides high degree of abstraction for MapReduce programming but is a procedural code not declarative”
    • Typically fewer lines and easier to write than MapReduce, but less efficient
  • Hive: Uses a “SQL like query language known as HQL (Hive Query Language). Hive queries are converted to MapReduce programs in the background by the hive compiler for the jobs to be executed parallel across the Hadoop cluster”
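To make the idea of a query being converted into a MapReduce program concrete, here is a conceptual sketch in plain Python of how a query such as SELECT region, SUM(amount) FROM sales GROUP BY region could be expressed as a map step and a reduce step. The sales table, its rows, and the function names are hypothetical, and this is not the plan the Hive compiler actually generates.

```python
from collections import defaultdict

# Hypothetical rows of a table sales(region, amount).
sales = [("east", 100), ("west", 40), ("east", 25), ("west", 60)]


def mapper(row):
    """Map: emit the GROUP BY column as the key and the summed column as the value."""
    region, amount = row
    return region, amount


def reducer(region, amounts):
    """Reduce: apply the aggregate (SUM) to all values that share a key."""
    return region, sum(amounts)


def run_group_by_sum(rows):
    """Group mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in map(mapper, rows):
        groups[key].append(value)
    return dict(reducer(k, v) for k, v in groups.items())


totals = run_group_by_sum(sales)
print(totals)
```

The GROUP BY column becomes the map output key and the aggregate function becomes the reduce step, which is why a declarative query of this kind fits the model without the user writing any MapReduce code.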