Class of 2023

Yiyang (Jade) Cao

Graduate Student

During my undergraduate studies at the University of California, San Diego, I built a solid foundation in probability and statistics. I also took data science courses, including Data Analysis and Inference and Machine Learning, as well as programming courses such as Python. In addition, I completed several research projects on data science topics. One of them was a group project in which I collaborated with three other members to develop PCA and K-means algorithms from scratch to classify six human activities based on how people interact with their smartphones. We then ran a side-by-side comparison against supervised learning models such as random forest and logistic regression. These projects were my first step in putting theory into practice: applying statistical techniques to real datasets with a programming language and drawing conclusions from the analysis. I saw that I could one day solve a real-world problem independently by analyzing data. I therefore decided to pursue a master's degree to dive deeper into data science, especially its algorithms.
To explore a real-world environment, I became a data scientist assistant at Deloitte. There, I built a predictive modeling pipeline based on 2019 domestic aviation data with 294,798 observations from Dallas Airport. I performed preprocessing and created visualizations to showcase aviation industry trends for airlines and airports, and I applied logistic regression, decision tree, random forest, and support vector machine models with cross-validation, parametrization, and evaluation. The comparison showed the decision tree to be the best-fitting model, interpreting 14.7% of the data correctly. The result indicated whether a flight would be delayed, but it was not precise enough to predict the delay within a 15-minute range.
When I noticed the inadequate results after fitting the models, I taught myself how to parametrize models and applied what I learned in the project. This experience made me realize how different school projects are from projects in industry. For example, data cleaning can consume a large amount of time in real-world problems, and it raises the question of when the data should be considered ready for building a model. I realized that, beyond technical skills, it is essential to understand the business value behind the data and how a business problem can be turned into a data problem. While the outcomes were not perfect, they reflected my motivation for independent thinking and learning. My immediate goal after graduation is to work as a data scientist, helping companies draw better business insights using data-driven techniques. I look forward to being a lifelong learner and a professional with transferable skills who explores cutting-edge data science techniques and builds a comprehensive understanding of the field by contributing to different industries.