Overview
From Microsoft Azure: cloud computing is the delivery of computing services—including servers, storage, databases, networking, software, analytics, and intelligence—over the Internet (‘the cloud’) to offer faster innovation, flexible resources, and economies of scale.
- Types of cloud deployment: private (own hardware, can provide best security), public (using hardware/software you don’t own, can be less secure), hybrid (a combination of private and public)
Cloud computing services
Notes taken from the LinkedIn Learning Course, Learning Cloud Computing: Core Concepts, by David Linthicum, and various other online resources.
Infrastructure as a Service (IaaS)
- IaaS provider manages the underlying hardware, virtualization software, servers, and storage
- Customer can add and manage any software (operating systems, applications, etc.)
- Examples: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform
Software as a Service (SaaS)
- SaaS provider manages infrastructure, operating systems, middleware, and data
- Applications typically run through web browsers; users may need a subscription for access
- Examples: Salesforce (customer relationship management solution), Google Apps (provides storage, word processing, spreadsheets, etc.), Microsoft Office 365
Platform as a Service (PaaS)
- PaaS provides the framework to build, test, deploy, manage, and update software products
- Typically combines with an IaaS by including operating system, development tools, database management systems, etc.
- Examples: AWS Elastic Beanstalk (a PaaS that runs on top of an IaaS, AWS), Apache Stratos, Google App Engine, Microsoft Azure
BigQuery
Google’s serverless data warehouse
- Run SQL queries on gigabytes to petabytes of data
- Mostly SQL, but supports user-defined functions (functions written as SQL expressions or in other languages such as JavaScript)
- Transfer data from external sources like YouTube, Teradata, Amazon S3
- Can integrate with tools like Tableau
- Joins can feel unusual: related rows are often stored as nested and repeated rows inside a single table instead of being joined from separate tables
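To make the note about nested and repeated rows more concrete, here is a minimal sketch in plain Python (not BigQuery code). It assumes two hypothetical tables, `customers` and `orders`, and shows how the result of a join can be stored as one row per customer with a repeated `orders` field. All table names, column names, and values are invented for illustration.

```python
from collections import defaultdict

# Hypothetical normalized tables (names and values are made up for illustration).
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
    {"order_id": 12, "customer_id": 2, "total": 15.0},
]


def nest_orders(customers, orders):
    """Return one row per customer with its orders embedded as a repeated field."""
    orders_by_customer = defaultdict(list)
    for order in orders:
        # Each nested record keeps only the order's own columns.
        orders_by_customer[order["customer_id"]].append(
            {"order_id": order["order_id"], "total": order["total"]}
        )
    return [
        {**customer, "orders": orders_by_customer.get(customer["customer_id"], [])}
        for customer in customers
    ]


nested = nest_orders(customers, orders)
for row in nested:
    print(row["name"], [o["order_id"] for o in row["orders"]])
```

A flat join would return one row per (customer, order) pair; the nested form keeps one row per customer, which is roughly the shape the note describes.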
Conclusion: useful for storing data, or pulling in data from one or more sources, and running SQL queries on it. Most useful when working with big data (greater than a gigabyte); smaller data sets are easier to handle locally. Languages other than SQL seem challenging to use, so it is not well suited to deploying rigorous machine learning models (written in, say, Python or R).
Hadoop
MapReduce
- Jobs are typically written in Java, C, or Python (some basic Java knowledge is required).
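As a rough illustration of the MapReduce programming model, the sketch below runs the classic word count in a single Python process. It does not use Hadoop itself: the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical, and the `shuffle` step stands in for the grouping that the framework would perform between the map and reduce phases on a real cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key (done by the framework on a real cluster)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the cloud stores data", "the cluster processes data"])
print(counts)
```

On a cluster, the mappers and reducers would run in parallel on different nodes over separate chunks of the input; here everything runs sequentially to keep the data flow visible.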
Alternatives:
- Pig: “Pig’s programming language referred to as Pig Latin is a coding approach that provides high degree of abstraction for MapReduce programming but is a procedural code not declarative”
- Typically takes fewer lines and is easier to write than MapReduce, but is less efficient
- Hive: Uses a “SQL like query language known as HQL (Hive Query Language). Hive queries are converted to MapReduce programs in the background by the hive compiler for the jobs to be executed parallel across the Hadoop cluster”