Flight Delay Prediction
I designed a two-stage predictive machine learning engine to forecast the on-time performance of flights at 15 different USA airports using 2016-2017 data. This project involved data cleaning, pre-processing, and merging flight and weather data. To address flight delays, the engine first classifies whether a flight will be delayed and then predicts the arrival delay in minutes for delayed flights. Due to class imbalance favoring on-time flights, SMOTE sampling was used before classification. The Random Forest classifier achieved the best performance with an F1 score of 0.78 and a Recall of 0.74 for delayed flights. For regression, the Random Forest regressor yielded an MAE of 7.178 minutes, an RMSE of 11.283 minutes, and an R-squared score of 0.977.