Data Analytics Tools Used

YouTube Analytics on AWS

Built storage and pipeline infrastructure with the AWS CLI, then loaded structured and semi-structured YouTube data into an S3 data lake. Transformed .csv and .json data into Apache Parquet with Python scripts on AWS Lambda and Glue ETL. Built data catalogs with Glue Crawler to define the data lake's schema, and built a Glue ETL job to produce an analytics table. Visualized the results in an AWS QuickSight dashboard.
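
A minimal sketch of the semi-structured-to-flat step this pipeline performs before cataloging (the field names here are illustrative, not the actual YouTube API schema):

```python
import json

def flatten_video_record(raw: str) -> dict:
    """Flatten one semi-structured video record (hypothetical field
    names) into the flat row shape a Glue crawler can catalog."""
    obj = json.loads(raw)
    snippet = obj.get("snippet", {})
    stats = obj.get("statistics", {})
    return {
        "video_id": obj.get("id"),
        "title": snippet.get("title"),
        "category_id": snippet.get("categoryId"),
        "view_count": int(stats.get("viewCount", 0)),
        "like_count": int(stats.get("likeCount", 0)),
    }

raw = json.dumps({
    "id": "abc123",
    "snippet": {"title": "Demo", "categoryId": "10"},
    "statistics": {"viewCount": "42", "likeCount": "7"},
})
row = flatten_video_record(raw)
```

In the real pipeline this shape would then be written out as Parquet (e.g. via `pandas`/`pyarrow` inside the Lambda).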

View GitHub

Google Analytics Dashboard

Built an extract-transform-load (ETL) pipeline: Google Analytics data (1.4 million records) ➜ BigQuery ➜ Google Cloud Storage ➜ Tableau Server connection.

BigQuery was used to calculate key metrics (page views, time on page, bounces, sessions) for use as KPIs in Tableau. Filters and parameters were created for segmentation and comparison analysis.
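
The KPI calculations can be sketched in plain Python over per-session records (the record shape below is a made-up stand-in for rows pulled from BigQuery):

```python
from statistics import mean

# Hypothetical per-session records standing in for BigQuery rows.
sessions = [
    {"pageviews": 1, "time_on_page": 0,   "bounced": True},
    {"pageviews": 4, "time_on_page": 180, "bounced": False},
    {"pageviews": 2, "time_on_page": 95,  "bounced": False},
]

kpis = {
    "sessions": len(sessions),
    "pageviews": sum(s["pageviews"] for s in sessions),
    # Share of single-page sessions.
    "bounce_rate": sum(s["bounced"] for s in sessions) / len(sessions),
    # Average engaged time, excluding bounced sessions.
    "avg_time_on_page": mean(
        s["time_on_page"] for s in sessions if not s["bounced"]
    ),
}
```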

View GitHub | View App

IMDB Ratings Visualization

This application takes a chosen film as input and creates a plot and table describing the film's rank within each genre it is categorized in.

Two datasets, two APIs, and a web crawler make up the backend data handling for this project, while the data visualization framework `Dash` was used for the application frontend.
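
The core per-genre ranking can be sketched as follows (film list and ratings are invented for illustration):

```python
def genre_ranks(films, title):
    """Return the chosen film's rank (1 = highest rated) within each
    genre it belongs to. `films` is a list of (title, rating, genres)."""
    target = next(f for f in films if f[0] == title)
    ranks = {}
    for genre in target[2]:
        in_genre = sorted(
            (f for f in films if genre in f[2]),
            key=lambda f: f[1],
            reverse=True,
        )
        ranks[genre] = 1 + [f[0] for f in in_genre].index(title)
    return ranks

films = [
    ("Alien", 8.5, ["Horror", "Sci-Fi"]),
    ("The Thing", 8.2, ["Horror", "Sci-Fi"]),
    ("Blade Runner", 8.1, ["Sci-Fi"]),
]
ranks = genre_ranks(films, "The Thing")
```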

View GitHub | View App

Spotify Recently Played Songs

Extracted personal data from the Spotify API using the OAuth 2.0 framework and the authorization code flow for long-running apps. Transformed the raw data into a data frame using `pandas`, then loaded the resulting data frame into a SQL Server database using `pyodbc` and `SQLAlchemy`, appending only new records. Scheduled this batch ETL process with Apache Airflow.
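
The append-only load step can be sketched with the standard library (`sqlite3` stands in for the SQL Server target here; the real project uses `pyodbc`/`SQLAlchemy`, but the dedupe-on-timestamp idea is the same):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recently_played ("
    "played_at TEXT PRIMARY KEY, track TEXT, artist TEXT)"
)

def append_plays(conn, rows):
    """Insert new plays; skip rows whose played_at timestamp exists."""
    conn.executemany(
        "INSERT OR IGNORE INTO recently_played VALUES (?, ?, ?)", rows
    )
    conn.commit()

batch = [
    ("2021-05-01T10:00:00Z", "Song A", "Artist A"),
    ("2021-05-01T10:03:30Z", "Song B", "Artist B"),
]
append_plays(conn, batch)
append_plays(conn, batch)  # re-running the batch adds no duplicates
count = conn.execute("SELECT COUNT(*) FROM recently_played").fetchone()[0]
```

Making the load idempotent like this is what lets Airflow safely retry or re-run the batch.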

View GitHub

Word Prediction Model

Developed an application in `shiny` which accepts a string of words and returns a table of recommendations sorted by probability. The `tidytext` package in R was used to process, clean, and analyze the data, while `ggplot2` was used for visualizations.
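
A toy Python sketch of the idea behind the prediction table (a simple bigram model; the actual app builds its n-gram tables in R with `tidytext`):

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count next-word frequencies for each word in the corpus."""
    follows = defaultdict(Counter)
    words = corpus.lower().split()
    for w1, w2 in zip(words, words[1:]):
        follows[w1][w2] += 1
    return follows

def predict(follows, word, k=3):
    """Return up to k candidate next words sorted by probability."""
    counts = follows[word.lower()]
    total = sum(counts.values())
    return [(w, c / total) for w, c in counts.most_common(k)]

model = train_bigrams("the cat sat on the mat the cat ran")
suggestions = predict(model, "the")
```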

View Report | View GitHub | View App

Canada Population Dashboard

An interactive map showing division-level population changes for user-chosen dates. The packages `dplyr` and `data.table` in R were used to process and clean the raw population and shape data, and the `shiny` package was used to build the application UI. Data was sourced from Statistics Canada.
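
The change a map tooltip would display reduces to a percent-change lookup per division; a sketch with invented figures (the real data comes from Statistics Canada):

```python
# Hypothetical division populations keyed by (division, year).
population = {
    ("Division A", 2011): 100_000,
    ("Division A", 2016): 108_000,
    ("Division B", 2011): 50_000,
    ("Division B", 2016): 47_500,
}

def pct_change(division, start, end):
    """Population % change between two user-chosen dates."""
    before = population[(division, start)]
    after = population[(division, end)]
    return round(100 * (after - before) / before, 1)

change = pct_change("Division A", 2011, 2016)
```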

View Report | View GitHub | View App

Covid-19 Data Dashboard

Statistics and research data were sourced from Our World in Data to visualize how COVID-19 has spread over time and between countries. The data was explored and cleaned with Microsoft SQL Server and visualized in Tableau.
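
One transformation a dashboard like this typically needs is smoothing noisy daily counts; a Python sketch of a 7-day rolling average (an assumption about the cleaning step, shown here instead of the SQL):

```python
def rolling_avg(cases, window=7):
    """Rolling average of daily new cases; shorter windows at the
    start of the series use however many days are available."""
    out = []
    for i in range(len(cases)):
        lo = max(0, i - window + 1)
        out.append(sum(cases[lo:i + 1]) / (i - lo + 1))
    return out

smoothed = rolling_avg([0, 7, 14])
```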

View GitHub | View App

Human Activity Prediction Using Machine Learning

This project employs the `caret` package in R to build machine learning models that predict which exercise an individual is performing from wearable sensor data. The data was sourced from the UCI Machine Learning Repository.

View Report | View GitHub

Weather Data Cleaning and Visualization in R

Storm data spanning 1950–2011 was acquired from the National Weather Service. The data was processed and analyzed to determine which weather-related events were the most costly in terms of human life, property, and crops.
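
The core aggregation can be sketched as grouping by event type and summing the casualty and damage columns (the rows below are invented, but match the shape of the cleaned dataset):

```python
from collections import defaultdict

# Hypothetical rows: (event type, fatalities, property $, crop $).
events = [
    ("TORNADO", 5, 2_000_000, 0),
    ("FLOOD", 1, 5_000_000, 500_000),
    ("TORNADO", 2, 1_000_000, 0),
]

totals = defaultdict(lambda: {"fatalities": 0, "damage": 0})
for etype, deaths, prop, crop in events:
    totals[etype]["fatalities"] += deaths
    totals[etype]["damage"] += prop + crop

worst_for_life = max(totals, key=lambda t: totals[t]["fatalities"])
worst_for_damage = max(totals, key=lambda t: totals[t]["damage"])
```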

View Report | View GitHub


Personal Projects

Nvidia Graphics Card Scraper

A Python web scraper that checks a number of different retailers for stock. The `twilio` package is used to send a text when stock is found. This app was used to acquire a graphics card during the "Great Graphics Card Shortage of 2020".
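
The stock check reduces to scanning each product page for out-of-stock markers; a sketch of that detection logic (the marker phrases are assumptions, and fetching pages plus texting via `twilio` is omitted):

```python
# Hypothetical retailer phrases indicating no stock.
OUT_OF_STOCK_MARKERS = ("out of stock", "sold out", "notify me")

def in_stock(page_html: str) -> bool:
    """Heuristically check a product page for availability."""
    text = page_html.lower()
    return not any(marker in text for marker in OUT_OF_STOCK_MARKERS)

sold_out = in_stock("<span>Sold Out</span>")
available = in_stock("<button>Add to cart</button>")
```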

View GitHub

TIFF Data Scraping with Organization in Excel

It can be difficult to tell which movies are worth seeing at a film festival, since most have never been screened before. In 2019, TIFF let attendees "heart" the films they wished to see. I used this as a popularity metric to better decide which films were worth attending.

View GitHub

Auto Backup & File Management

An interactive Python tool for tracking which videos, pictures, documents, etc. have been backed up. The `tqdm` package is used to monitor file transfer progress, and records of previous backups are stored in a .json file.
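
The backup record keeping can be sketched as merging newly copied paths into the JSON log (the log shape and paths here are illustrative, not the tool's actual format):

```python
import json

def record_backup(log_json: str, files: list) -> str:
    """Merge newly backed-up paths into the JSON backup log,
    keeping entries unique and sorted."""
    log = json.loads(log_json) if log_json else {"backed_up": []}
    log["backed_up"] = sorted(set(log["backed_up"]) | set(files))
    return json.dumps(log)

log = record_backup("", ["photos/2020/img1.jpg"])
# Re-recording an already-logged file does not duplicate it.
log = record_backup(log, ["docs/notes.txt", "photos/2020/img1.jpg"])
```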

View GitHub