Bret Young

I’m an aspiring Data Scientist with over 7 years of experience in variety of industries. I am a highly motivated individual devoted to solving problems and continued learning. I utilize a variety of analytical techniques to develop creative solutions to challenging problems and make intuitive decisions and effectively communicating to all audience types. I am experienced in Python, R, Data Cleaning, Data Visualization, and Machine Learning.

I started my journey by obtaining my undergraduate degree in biochemistry from Northern Illinois University. After graduating, I began working in health, safety, and environmental for a worldwide engineering, procurement, and construction company where I collected data in relation to incidents, employee exposures and waste. The data was analyzed to identify trends which led to campaigns to enhance employee awareness and updated policies to protect workers as well as reduce project costs. I also aided in generating and presenting safety training and presented data trends in weekly project meetings.

I then moved into the hospitality industry working for another worldwide company where I took a role as a scheduler ensuring employees were scheduled in accordance with labor laws and agreements. This role allowed me to analyze shift data and identify areas of improvement reducing expenditures. Seeing all the data that was available in this role, I decided to continue my education and started my master’s degree from Bellevue University in data science.

When I am not working on projects or studying, I spend time with my family or tying flies for my next fly-fishing adventure.

My Skills

Languages: Python, R, Visual Basic

I am most proficient with Python and most of my projects start here. I am also familiar with R and Visual Basic.

Data Cleaning: Pandas, NumPy, Dplyr, Tidyr

Most of my time during a data science project is spent cleaning and exploring data. I constantly use these to uncover patterns and test hypothesis through summary statistics.

Data Visualization: Tableau, Power BI, Matplotlib, Seaborn, Plotly, GGplot2

Being able to visualize data is my favorite part of a project. I am experienced with Matplot lib and Ggplot2. I have also used Seaborn and Plotly in my projects depending on visualization needs.

Machine Learning: sklearn, Tensorflow

I am most comfortable working with Scikit-learn for building machine learning models. I have used this for many projects and am familiar with many algorithms. When working with neural networks, TensorFlow is my preference.

Data Bases: SQL, MySQL, PostgreSQL

For databases, I am trained in SQL and PostgreSQL. I have also worked in MySQL in the past.

Version Control: Git, GitHub

Organization and version control are important in any project. I always build a project using Git or GitHub for this reason.

IDES: Jupyter, Atom, Pycharm, RStudio

Utilizing an integrated development environment (IDE) is an important aspect to improve code maintaniability. I prefer to use JupyterLab or ATOM for my Pyton IDE and Rstudio for R. I have also used Pycharm for past projects.

Data Projects

Sales Prediction

Time Series Analysis

Environment: Python & Tableau

Sales forecasting is an important aspect of determining if the company is meeting their goals, setting budgets and determine if sales promotions are needed. The goal of this project was to forecast the next 7 days after the series data ended. A naive baseline and an ARIMA model were compared with the ARIMA model performing better.

Documentation

Loan Value Prediction

Model Evaluation

Environment: R & Python

Various models were used to predict loan amount for online loan applications. The random forest model with 7 trees performed the best with the lowest mean squared error and highest r-squared values.

Documentation

Condition Classification from Drug Review

Text Analysis & Neural Networks

Environment: Python

Drug reviews from Drugs.com were converted into a vector representation and feed into a bidirectional long-shhort term memory recurrent neural network. The model accuracy was 60% compared to 26% for the baseline randomly predicting the highest occurring condition.

Documentation

Airline Safety

Data Visualization

Environment: Tableau

Airline and automobile fatality datasets were used to show that airline travel is still safer than automobiles despite an increase in unfortunate events. This project is still in progress.

Documentation

Animal Classification

Data Exploration & Modeling

Environment: Python

Traits of animals were used to create a decission tree used to classify the animal into mammal, bird, reptile, fish, invertibrate, bug or amphibian.

Documentation

Movie Recommender System

Data Modeling

Environment: Python

In the era of streaming services, having the ability to recommend movies thhe client may be highly interested in is vital. This project is designed to present the top 10 movies based on distance to the selected movie using a k-Nearest Neighbors model on multiple user ratings.

Documentation

Sentiment Analysis

Data Exploration & Analysis

Environment: Python

Product reviews of women's clothing were analyzed to determine if the review was positive or negative based on a list of predefined words. The final decision was made by summing the positive and negative values, which may have also ended with a neutral review classification.

Documentation

API Usage

Data Collection & Cleaning

Environment: Python

Application programming interfaces, API's, for OpenWeatherMap and Twitter were used to obtain data and used to build databases for future analysis.

Documentation

Salmon Catch Location Predictor

Data Modeling

Environment: Python

Historical North Atlantic salmon catch data from the International Counsil for the Exploration of the Sea was analyzed to determine the optimal location to commercially fish for Atlantic salmon.

Documentation

Data Wrangling

Data Cleaning

Environment: Python

Data comes in many different formats and very rarely is ready to be used in a model without processing. This project performed cleaning and transformation to a dataset to prepare it for machine learning and additional analysis.

Documentation

Get In Touch

If you would like to get in touch, click below and I will try to respond as soon as possible.

CONTACT ME

I'm Bret,

a data scientist.

My Skills

Languages: Python, R, Visual Basic

Data Cleaning: Pandas, NumPy, Dplyr, Tidyr

Data Visualization: Tableau, Power BI, Matplotlib, Seaborn, Plotly, GGplot2

Machine Learning: sklearn, Tensorflow

Data Bases: SQL, MySQL, PostgreSQL

Version Control: Git, GitHub

IDES: Jupyter, Atom, Pycharm, RStudio

Data Projects

Sales Prediction

Time Series Analysis

Loan Value Prediction

Model Evaluation

Condition Classification from Drug Review

Text Analysis & Neural Networks

Airline Safety

Data Visualization

Animal Classification

Data Exploration & Modeling

Movie Recommender System

Data Modeling

Sentiment Analysis

Data Exploration & Analysis

API Usage

Data Collection & Cleaning

Salmon Catch Location Predictor

Data Modeling

Data Wrangling

Data Cleaning

Get In Touch