About Me
Hi! I’m Elizabeth Hance and I love data. After getting my Master’s degree in computational mathematics, I started this page as a way to document what I’ve learned about data science. This site is a work in progress and the topics explored here do not display the full depth and breadth of my knowledge. The explanations of the concepts are meant to be brief with simple examples. Feel free to reach out to me at my contact info below!
Topics Covered
Content for these topics was gathered from a variety of sources, including DataCamp’s Data Scientist with R Career Track, the Springer book An Introduction to Statistical Learning with Applications in R, Wikipedia, various Medium articles, other online blogs, and my graduate coursework.
- Modeling:
- Regression
- Classification
- Clustering
- Gradient Boosting
- Validation/Deployment:
- Model Validation
- Model Deployment
- Additional Topics:
- Databases
- Cloud Computing
- Shell/Git
Also see Additional_Topics for resources not covered on this page, including:
- Causal inference notes
- Notes from a deep learning Coursera course
- Exploring interactions in a GAM
- Brief notes from a pandas tutorial
R Packages
Most of the examples are written in R, so here are some packages that are frequently used:
All cheat sheets
- tidyverse
- dplyr - data manipulation
- ggplot2 - data visualization
- tidyr - to tidy/clean data
- readr - read rectangular data
- stringr - working with strings
- haven - for SPSS, Stata, and SAS data
- lubridate - working with dates
- readxl - read Excel files
- purrr - functional programming tools
- tibble - modern data frames
- forcats - working with factors
- data.table - fread: to read rectangular data
- broom - convert model output into tidy data frames
- DBI - database connections
- sqldf - use SQL to manipulate a dataframe
- RMarkdown