
Zeyu Li

I am currently seeking internship opportunities or full-time positions. I am a Master's student in Spatial Data Science at the University of Southern California, and my undergraduate major was Software Engineering. I have a solid software engineering background with strong programming, analytical, and mathematical skills, and I am highly proficient in Android and iOS mobile development as well as web development using Java, Go, React, Spring, Hibernate, Vue, MongoDB, and Firebase.

Social Media Platform for USC Recreational Sports Club.

Key Technologies: MongoDB, pymongo, Flask, Firebase, RESTful API, SocketIO, HTML/CSS

2022

Video Introduction

  • Curated and stored USC Recreational Sports Club Instagram posts in MongoDB with pymongo.
  • Built a Flask web app emulating Firebase, exposing RESTful API actions through command-style URLs.
  • Enhanced user engagement with real-time post and comment interactions using SocketIO (illustrated in the sketch below).
  • Crafted responsive UI with HTML/CSS for key pages (Home, Account, Post Detail).
  • Enabled users to create, view, update, and delete posts and to manage comments for dynamic social interaction.
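
A minimal sketch of how the post and comment flow could be wired together with Flask, Flask-SocketIO, and pymongo. The MongoDB URI, database and collection names, and event names below are illustrative assumptions, not the project's actual configuration.

    # Sketch: a Flask REST endpoint plus a SocketIO handler backed by MongoDB.
    # URI, database/collection names, and event names are assumptions.
    from flask import Flask, jsonify
    from flask_socketio import SocketIO, emit
    from pymongo import MongoClient

    app = Flask(__name__)
    socketio = SocketIO(app)
    posts = MongoClient("mongodb://localhost:27017")["club"]["posts"]

    @app.route("/posts", methods=["GET"])
    def list_posts():
        # RESTful read: return all posts without Mongo's internal _id field
        return jsonify(list(posts.find({}, {"_id": 0})))

    @socketio.on("new_comment")
    def handle_comment(data):
        # Append the comment to its post, then broadcast to all connected clients
        posts.update_one({"post_id": data["post_id"]},
                         {"$push": {"comments": data["comment"]}})
        emit("comment_added", data, broadcast=True)

    if __name__ == "__main__":
        socketio.run(app, debug=True)

Broadcasting from the SocketIO handler is what lets every connected client see a new comment immediately, without polling the REST endpoints.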

Analysis and Prediction of Energy Output from Ambient Variables in a Power Plant.

Key Technologies: Python, Scikit-learn, linear regression, polynomial regression models, k-nearest neighbor (KNN) regression

2022

  • Utilized Python and Scikit-learn to process and analyze a six-year dataset from a Combined Cycle Power Plant, emphasizing hourly ambient variables.
  • Applied linear regression techniques, both simple and multiple, to identify statistically significant predictors of energy output.
  • Explored nonlinear dynamics using polynomial regression models, and enhanced model robustness by integrating quadratic nonlinearities and interaction terms.
  • Implemented and tuned k-nearest neighbor (KNN) regression using raw and normalized features, determining optimal 'k' values and assessing model performance.
  • Conducted a comparative evaluation of KNN regression and linear regression models, drawing insights on prediction accuracy and model efficacy (illustrated in the sketch below).
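
A minimal sketch of the model comparison, assuming the Combined Cycle Power Plant data is read from a local CSV (hypothetical file name) with the energy output in a column named PE, and using test-set MSE as the comparison metric.

    # Sketch: linear, quadratic-polynomial, and cross-validated KNN regression
    # compared on held-out data; file and column names are assumptions.
    import pandas as pd
    from sklearn.model_selection import train_test_split, GridSearchCV
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler, PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.metrics import mean_squared_error

    df = pd.read_csv("ccpp.csv")                      # hypothetical file name
    X, y = df.drop(columns=["PE"]), df["PE"]          # PE = net energy output
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    models = {
        "linear": LinearRegression(),
        "poly (deg 2)": make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                                      LinearRegression()),
        "knn": GridSearchCV(                          # tunes k on normalized features
            make_pipeline(StandardScaler(), KNeighborsRegressor()),
            {"kneighborsregressor__n_neighbors": range(1, 31)}, cv=5),
    }

    for name, model in models.items():
        model.fit(X_tr, y_tr)
        print(f"{name}: test MSE = {mean_squared_error(y_te, model.predict(X_te)):.3f}")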

Time Series Classification for Human Activity Recognition

Key Technologies: time series data, logistic regression, L1-penalized logistic regression, multi-class classification

2022

  • Leveraged time series data from a Wireless Sensor Network to classify human activities, using datasets from the AReM repository, encompassing seven diverse activity types.
  • Conducted feature extraction from the time series data, researching and implementing time-domain features like minimum, maximum, mean, median, standard deviation, and quartiles.
  • Utilized logistic regression for binary classification, applying techniques like recursive feature elimination and assessing model performance through scatter plots, confusion matrices, ROC curves, and AUC metrics.
  • Enhanced model robustness by implementing L1-penalized logistic regression, optimizing for both the time series split parameter (l) and the regularization weight (λ), and comparing against traditional variable selection using p-values (illustrated in the sketch below).
  • Executed multi-class classification using L1-penalized multinomial regression and Naïve Bayes classifiers (with Gaussian and Multinomial priors), evaluating and contrasting their performance for comprehensive activity classification.
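
A minimal sketch of the time-domain feature extraction and the L1-penalized logistic regression fit. The synthetic series below merely stand in for loading the AReM data, and the hyperparameter choices are illustrative.

    # Sketch: summarize each series with time-domain statistics, then fit an
    # L1-penalized logistic regression whose strength is chosen by cross-validation.
    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.metrics import confusion_matrix, roc_auc_score

    def time_domain_features(ts):
        """Minimum, maximum, mean, median, std, and quartiles per channel."""
        return np.concatenate([
            ts.min(axis=0), ts.max(axis=0), ts.mean(axis=0), np.median(ts, axis=0),
            ts.std(axis=0), np.percentile(ts, 25, axis=0), np.percentile(ts, 75, axis=0),
        ])

    # Synthetic stand-in for the AReM series (480 samples x 6 channels each);
    # replace with the real data-loading step.
    rng = np.random.default_rng(0)
    labels = rng.integers(0, 2, size=120)
    series = [rng.normal(loc=y, size=(480, 6)) for y in labels]

    X = np.array([time_domain_features(ts) for ts in series])
    X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)

    clf = LogisticRegressionCV(Cs=10, cv=5, penalty="l1", solver="liblinear")
    clf.fit(X_tr, y_tr)
    print(confusion_matrix(y_te, clf.predict(X_te)))
    print("AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))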

Tree-Based Methods for Time Series Classification and Fault Detection.

Key Technologies: data imputation techniques, random forests, confusion matrix, ROC, AUC, L1-penalized logistic regression, XGBoost, SMOTE

2022

  • Addressed missing values in the APS Failure dataset by investigating data imputation techniques and applying a selected method to the dataset.
  • Classified the dataset with random forests without compensating for class imbalance, reporting the confusion matrix, ROC, AUC, and misclassification rates for the training and test sets.
  • Researched how class imbalance is handled in random forests, redid the classification with random forests on the compensated data, and compared the results with the uncompensated approach.
  • Implemented L1-penalized logistic regression at each node and used XGBoost for model tree fitting, determining the regularization term with cross-validation and training the model on the APS dataset.
  • Pre-processed the data with SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance, retrained the XGBoost model, and compared the results with the uncompensated scenario (illustrated in the sketch below).
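
A minimal sketch of the imbalanced-classification pipeline: median imputation, a random-forest baseline on the uncompensated data, then SMOTE oversampling followed by XGBoost. The CSV file name, the "na" missing-value marker, and the "class" column handling are assumptions about the APS data layout.

    # Sketch: impute, fit an uncompensated random forest, then oversample with
    # SMOTE and fit XGBoost; file and column handling are assumptions.
    import pandas as pd
    from sklearn.impute import SimpleImputer
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import confusion_matrix, roc_auc_score
    from imblearn.over_sampling import SMOTE
    from xgboost import XGBClassifier

    df = pd.read_csv("aps_failure_training_set.csv", na_values="na")
    y = (df["class"] == "pos").astype(int)
    X = SimpleImputer(strategy="median").fit_transform(df.drop(columns=["class"]))
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y,
                                              random_state=0)

    # Baseline: random forest without compensating for class imbalance
    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    print(confusion_matrix(y_te, rf.predict(X_te)))
    print("RF AUC:", roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1]))

    # Compensated: oversample the minority class with SMOTE, then fit XGBoost
    X_res, y_res = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
    xgb = XGBClassifier(eval_metric="logloss").fit(X_res, y_res)
    print(confusion_matrix(y_te, xgb.predict(X_te)))
    print("XGB AUC:", roc_auc_score(y_te, xgb.predict_proba(X_te)[:, 1]))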


Multi-Class and Multi-Label Classification Using Support Vector Machines and K-Means Clustering.

Key Technologies: Anuran Calls (MFCCs), Gaussian kernels, hamming score/loss, L1-penalized SVMs, cross-validation, k-means clustering, Hamming distance

2022

  • Downloaded the Anuran Calls (MFCCs) Data Set and allocated 70% of it for training.
  • Trained separate SVMs with Gaussian kernels for each label (Family, Genus, Species), evaluating with exact match and hamming score/loss.
  • Implemented L1-penalized SVMs with standardized attributes, determining the SVM penalty through cross-validation.
  • Applied SMOTE (or a comparable oversampling method) to address class imbalance and re-evaluated the classifiers.
  • Implemented k-means clustering on the full dataset, determined the majority labels for each cluster, and computed the Hamming distance, score, and loss (illustrated in the sketch below).
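
A minimal sketch of the per-label Gaussian-kernel SVMs with exact-match and Hamming-style evaluation, followed by k-means with majority labels per cluster. The file name, label column names, and hyperparameters below are illustrative assumptions.

    # Sketch: one RBF-kernel SVM per label, then k-means with majority labels;
    # file/column names and hyperparameters are assumed for illustration.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC
    from sklearn.cluster import KMeans

    df = pd.read_csv("Frogs_MFCCs.csv")
    label_cols = ["Family", "Genus", "Species"]
    X, Y = df.drop(columns=label_cols + ["RecordID"]), df[label_cols]
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, train_size=0.7, random_state=0)

    # One Gaussian-kernel SVM per label; hamming loss = mean per-label error
    preds = {c: SVC(kernel="rbf", C=10, gamma="scale").fit(X_tr, Y_tr[c]).predict(X_te)
             for c in label_cols}
    pred_df = pd.DataFrame(preds, index=Y_te.index)
    print("exact match:", (pred_df == Y_te).all(axis=1).mean())
    print("hamming loss:", (pred_df != Y_te).to_numpy().mean())

    # k-means on the full dataset; each cluster is assigned its majority labels
    k = df["Species"].nunique()
    clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    majority = (df.assign(cluster=clusters)
                  .groupby("cluster")[label_cols]
                  .agg(lambda s: s.mode().iloc[0]))
    cluster_pred = majority.loc[clusters].to_numpy()
    print("avg Hamming distance:", (cluster_pred != Y.to_numpy()).sum(axis=1).mean())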