
Data Science

In addition to my journalism studies, I also completed a minor in data science. This portfolio provides some examples of my work.

Minoring in data science at a liberal arts college like Washington and Lee University has not only given me basic skills in data science, but it has also allowed me to learn data science applications across several different fields. My data science journey has taken me through sports, programming, business and cognitive behavioral science. Through the minor, I have gained skills in programming, statistical analysis, web design and data gathering. It has also given me a peek into what data scientists do at the professional level and how the skills I’ve learned can apply to the workforce. I’ve learned that people like me, who won’t pursue a full-time data science career, can use data science skills and techniques to improve performance and efficiency in whatever field they work in.

SOAN 220: A World of Baseball and Statistics – Prof. Eastwood and Prof. Kosky – Spring Term 2022

FinalProject

For our last project in this class, I used multilevel modeling to try to find appropriate predictors for the ability to throw a first-pitch strike. For this project, I scraped Fangraphs and Statcast data, filtered it, and created new variables, including a dummy variable that tracks whether a strike or ball was thrown on the first pitch. First, I created a visualization showing the random effects of throwing a first-pitch strike.
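The original work for this project was done in R, so the following is only a minimal Python sketch of the dummy-variable step described above. The records, field names, result labels, and pitch-type codes are all illustrative assumptions, not the actual Statcast columns or the code used in the project.

```python
from collections import defaultdict

# Hypothetical pitch records; field names and values are illustrative only.
pitches = [
    {"pitcher": "A", "pitch_number": 1, "pitch_type": "FF", "result": "called_strike"},
    {"pitcher": "A", "pitch_number": 2, "pitch_type": "SL", "result": "ball"},
    {"pitcher": "A", "pitch_number": 1, "pitch_type": "CU", "result": "ball"},
    {"pitcher": "B", "pitch_number": 1, "pitch_type": "SL", "result": "swinging_strike"},
    {"pitcher": "B", "pitch_number": 1, "pitch_type": "FF", "result": "foul"},
]

# Assumption: these results count as a strike, and these codes as breaking balls.
STRIKE_RESULTS = {"called_strike", "swinging_strike", "foul", "in_play"}
BREAKING_TYPES = {"SL", "CU"}


def add_dummies(rows):
    """Keep first pitches only and attach 0/1 dummy variables."""
    first = []
    for row in rows:
        if row["pitch_number"] != 1:
            continue
        first.append({
            **row,
            "first_pitch_strike": int(row["result"] in STRIKE_RESULTS),
            "breaking_ball": int(row["pitch_type"] in BREAKING_TYPES),
        })
    return first


def strike_rate_by_pitcher(rows):
    """Share of first pitches that were strikes, grouped by pitcher."""
    totals = defaultdict(lambda: [0, 0])  # pitcher -> [strikes, first pitches]
    for row in rows:
        totals[row["pitcher"]][0] += row["first_pitch_strike"]
        totals[row["pitcher"]][1] += 1
    return {pitcher: strikes / n for pitcher, (strikes, n) in totals.items()}


first_pitches = add_dummies(pitches)
rates = strike_rate_by_pitcher(first_pitches)
print(rates)
```

The per-pitcher rates are the raw version of what the random effects capture: how much each pitcher differs from the overall first-pitch strike rate.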

Then, I controlled for the effect of throwing a breaking ball. I created a graph with all of the individual effects of throwing a breaking ball, and then a graph that shows each pitcher’s probability of throwing a first-pitch strike when controlling for a breaking ball being thrown. I then added a second control variable, FIP (Fielding Independent Pitching), which is a good indicator of pitcher dominance.

My goal was to have all of my plotted intercepts move closer to the middle (0), which would’ve signified that both FIP and breaking ball were good predictors of whether a first-pitch strike was coming. This kind of multilevel modeling would be useful in any scenario where you are trying to predict an outcome using multiple variables. The objective of this project was to expand our data analysis skills into modeling and predicting future outcomes, and to learn new R techniques. The project was strong, but I found that throwing a first-pitch strike is hard to predict from other metrics. To do it effectively, you would need to control for more than just two variables.
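One way to write down a model like this is as a random-intercept logistic regression. This is a sketch under my own assumptions about the model's form, not the exact specification fit in R: here i indexes first pitches, j indexes pitchers, and the symbols are introduced only for illustration.

```latex
\operatorname{logit}\Pr(y_{ij} = 1)
  = \beta_0 + \beta_1\,\text{breaking}_{ij} + \beta_2\,\text{FIP}_{j} + u_j,
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2)
```

In this form, y_ij is the first-pitch strike dummy and u_j is pitcher j's deviation from the overall intercept. If breaking ball and FIP explained most of the differences between pitchers, the variance of u_j would shrink and the plotted intercepts would cluster near 0.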

We spent a lot of time in this class learning the fundamentals of programming in R along with scraping, filtering, cleaning, visualizing, and modeling data. We also learned specifically how to scrape Statcast data, which is official pitch and batted ball data collected by MLB. Statcast data is widely used at the professional level because of how specific it is and because its metrics can be better predictors and analyzers of play. A hit can travel as far as 500 feet or as little as three feet; batted ball data gives you a better idea of what actually occurred.

We also spent a lot of time in this course discussing the ethics of new-school thinking in the game of baseball. The discussion goes beyond what the rules are to what we want to see going forward. Do we want to witness a sport where everything can be predicted? Is uncertainty the reason behind our love for the game? These are questions I often think about in my baseball broadcasting.

DCI 110: Web Programming for Non-Programmers – Prof. Barry – Fall 2022

Link to website: https://hunterj24.wludci.info/

For my final project in DCI 110, I created a one-page website, writing the content and designing it from scratch. I used the HTML skills I picked up to create the text and format it with headers and body paragraphs. I used CSS to add color and better align some of my text. I used Bootstrap and its grid to organize my content on the screen. I created a responsive menu, a header, a picture carousel, and a video section with embedded YouTube videos. I then used a web server to get my website up and running, which it still is. This project taught me handy skills that I could use to make another website, whether for a job, a personal project, or a personal portfolio that I can send to employers. It also gave me the skills to troubleshoot websites that have styling errors.

Throughout the course, I learned about most of the components that go into developing a website. In our increasingly digital world, websites carry greater importance, and being able to understand design choices and how they are made is a valuable skill. We spent a week each on backend, fullstack, and mobile app development, but most of our time went to frontend web development and our final project, which was largely frontend work. Professor Barry also taught a good bit about the specifics and history of the web to give us a strong background on how it works.

BUS 315: Database Management – Prof. Larson – Fall 2023

315-Deliverable-1

Any legitimate business has a lot of data to track: every sale, inventory order, shift worked, pay date, and much more. It has to be tracked in order to have records, but also so it can be analyzed. Data is a big part of how companies make decisions for success. In Database Management, we learned hard skills in SQL coding, but we also learned how to organize data and how to map the relationships that connect data in databases.

For our final project, we were tasked with creating a database with one-to-many and many-to-many relationships. For each of our entities, we had to create a list of attributes and make sure to have primary and foreign keys that matched our relationships. I worked in a group of three for this project. I created hundreds of rows of data for each of the entities we created. The database we created was for a hypothetical food truck company that operated a few different food trucks in the region. We created entities that tracked shifts, individual orders, location of food trucks, customer information, and a few other categories. After we created the database, put in the data and forward engineered it in SQL, we used our SQL coding knowledge to create queries that solved real-life business problems.

Examples of the questions included finding a specific employee’s contact information, tracking sales for a specific food truck in a week, and identifying which customers were repeat customers. The goal of this project was to apply our knowledge from throughout the semester to a real-world scenario. We got a sense of what it is like to create a database for a specific company. We were tasked with coming up with real business problems and then querying our database to answer them. Creating a database like this would be useful for any entrepreneurial or business venture. As I mentioned earlier, data can be an asset in decision making, and SQL queries will give good results if you build good databases.
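To give a feel for what this kind of database and its queries look like, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns, and sample rows are hypothetical and much smaller than our actual project, which was built with different tooling. It shows one-to-many relationships (a truck or a customer has many orders), a many-to-many relationship (orders and menu items, joined through a junction table), and two of the example business questions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: names and columns are illustrative assumptions.
conn.executescript("""
CREATE TABLE truck (truck_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE menu_item (
    item_id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL
);
-- One-to-many: each truck and each customer can have many orders.
CREATE TABLE customer_order (
    order_id INTEGER PRIMARY KEY,
    truck_id INTEGER NOT NULL REFERENCES truck(truck_id),
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date TEXT NOT NULL
);
-- Many-to-many: junction table between orders and menu items.
CREATE TABLE order_item (
    order_id INTEGER NOT NULL REFERENCES customer_order(order_id),
    item_id INTEGER NOT NULL REFERENCES menu_item(item_id),
    quantity INTEGER NOT NULL,
    PRIMARY KEY (order_id, item_id)
);
""")

# Made-up sample data.
conn.executemany("INSERT INTO truck VALUES (?, ?)",
                 [(1, "Taco Truck"), (2, "Burger Truck")])
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(1, "Ana"), (2, "Ben"), (3, "Cy")])
conn.executemany("INSERT INTO menu_item VALUES (?, ?, ?)",
                 [(1, "Taco", 4.0), (2, "Burger", 8.0), (3, "Soda", 2.0)])
conn.executemany("INSERT INTO customer_order VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, "2023-11-06"), (2, 1, 2, "2023-11-07"),
                  (3, 2, 1, "2023-11-08"), (4, 1, 1, "2023-11-15")])
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?)",
                 [(1, 1, 2), (1, 3, 1), (2, 1, 1), (3, 2, 1), (4, 1, 3)])

# Business question 1: which customers are repeat customers?
repeat_customers = conn.execute("""
    SELECT c.name, COUNT(*) AS n_orders
    FROM customer AS c
    JOIN customer_order AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    HAVING COUNT(*) > 1
    ORDER BY c.name
""").fetchall()

# Business question 2: total sales for one truck in a given week.
weekly_sales = conn.execute("""
    SELECT SUM(m.price * oi.quantity)
    FROM customer_order AS o
    JOIN order_item AS oi ON oi.order_id = o.order_id
    JOIN menu_item AS m ON m.item_id = oi.item_id
    WHERE o.truck_id = ? AND o.order_date BETWEEN ? AND ?
""", (1, "2023-11-06", "2023-11-12")).fetchone()[0]

print(repeat_customers, weekly_sales)
```

Keeping the order lines in their own junction table is what lets one order contain many menu items and one menu item appear on many orders without duplicating data.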

This project was strong in that the code worked and our database relationships made sense, but it was limited in what we could query. We had to generate fake data, which was time consuming, so some of the data had no logical pattern. We didn’t have the time or the resources to make a realistic database, so our project didn’t include all of the possible relationships or data.

The Minor:

The data science minor at W&L has six core competencies:

  • Collect and analyze data in a reproducible and ethically responsible manner
  • Obtain data through searching, scraping, mining or experimental methods
  • Parse, transform and generate wide-ranging data sets for analysis
  • Statistically analyze data to summarize, draw inferences and make predictions
  • Identify patterns and relationships in datasets using visualization and algorithms
  • Communicate data methods and conclusions to diverse audiences

The classes I’ve taken for my data science minor include:

SOAN 220: A World of Baseball and Statistics, DCI 110: Web Programming for Non-Programmers, BUS 315: Database Management, INTR 202: Applied Statistics, BUS 306A: Applied AI and Machine Learning, CBSC 185: Intro to Data Science: Trends over Time, and DS 401: Directed Individual Study.

These classes have taught me how to think ethically about data, which is not just thinking about what the rules are, but about what is right and what we should strive for when it comes to data science in our world. They have taught me to visualize and explain my data and results for diverse audiences so that many people can understand what I am doing. I have also learned hard skills in R, SQL, and Python.

Other Projects:

Assignment2 CBSC185FinalPoster-3 Final-Project-Report-BUS306