The following are mainly descriptions of data science projects that I did for employers or contract clients and whose details I cannot publish. For more projects with fuller write-ups, see the main page.
Economic Impact Model for Forestry Industry (2024)
• Developed a Python-based model to measure the economic impact of increased demand in forestry and wood processing industries.
• Utilized Jupyter Notebooks for interactive data exploration and model development.
• Created dynamic visualizations using matplotlib and seaborn to illustrate marginal increases in demand across economic sectors.
• Uncovered errors in previous estimates of the industries' economic importance, leading to revised economic projections.
• Technologies used: Python, Jupyter Notebooks, pandas, numpy, matplotlib, seaborn.
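The project's actual method and data are not public. As a rough illustration of how a demand increase in one sector can be traced through the others, here is a minimal sketch assuming a standard Leontief input-output setup. The sector names, the coefficient matrix `A`, and the function `output_change` are all hypothetical and are not taken from the project.

```python
# Hypothetical three-sector economy (illustrative numbers, not project data).
SECTORS = ["forestry", "wood_processing", "other"]

# A[i][j]: dollars of input from sector i needed per dollar of output of sector j.
A = [
    [0.10, 0.30, 0.01],
    [0.05, 0.10, 0.02],
    [0.20, 0.15, 0.25],
]


def output_change(A, delta_demand, tol=1e-12, max_rounds=10_000):
    """Total output change per sector for a change in final demand.

    Sums the rounds d + A d + A^2 d + ..., which converges to
    (I - A)^-1 d when every column of A sums to less than 1.
    """
    n = len(A)
    total = list(delta_demand)          # round 0: the direct demand change
    round_effect = list(delta_demand)
    for _ in range(max_rounds):
        # Inputs required to produce the previous round's extra output.
        round_effect = [
            sum(A[i][j] * round_effect[j] for j in range(n)) for i in range(n)
        ]
        total = [t + r for t, r in zip(total, round_effect)]
        if max(abs(r) for r in round_effect) < tol:
            break
    return total


# One extra dollar of final demand for wood processing.
delta = [0.0, 1.0, 0.0]
for name, change in zip(SECTORS, output_change(A, delta)):
    print(f"{name}: {change:.4f}")
```

Each pass of the loop adds one further round of indirect purchases, so the printed totals combine the direct demand change with the supplier effects it triggers.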
Insurance Fraud Detection Approaches (2018)
• Built machine learning models in R to identify potential insurance fraud cases.
• Conducted qualitative interviews across departments to gain insights into fraud patterns.
• Integrated qualitative insights with quantitative data.
• Developed classifier models to predict fraudulent activity on the website.
• Created interactive dashboards using R Shiny.
• Result: Improved fraud detection efficiency, leading to significant cost savings.
• Technologies used: SQL, R, R Shiny, Plotly, SAS, Hadoop.
Customer Segmentation for Targeted Marketing (2018)
• Performed advanced customer segmentation analysis using K-means clustering and hierarchical clustering algorithms.
• Analyzed customer data, including demographics, region, and purchase history.
• Identified distinct customer segments with unique characteristics and purchasing patterns.
• Result: An interactive dashboard that supports more data-driven sales and marketing campaigns.
• Technologies used: SQL, R Shiny, leaflet, ggplot2.
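The project itself used the tooling listed above, and its data is not public. To illustrate the K-means step only, here is a minimal sketch in plain Python. The customer features (age and annual spend), the sample values, and the helpers `standardize` and `kmeans` are hypothetical.

```python
import random


def standardize(rows):
    """Z-score each column so no single feature dominates the distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [
        (sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
        for c, m in zip(cols, means)
    ]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in rows
    ]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break                        # assignments have stabilized
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (age, annual spend in dollars).
customers = [
    (22, 150.0), (25, 180.0), (24, 160.0),
    (58, 900.0), (61, 950.0), (60, 880.0),
]
labels, _ = kmeans(standardize(customers), k=2)
print(labels)
```

Standardizing first matters here because annual spend is on a much larger scale than age and would otherwise dominate the distance calculation.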
Political Party Ideological Analysis - Academic Research Project (2013-2015)
• Applied machine learning methods to analyze ideological rifts within the US Republican Party.
• Collected and preprocessed large volumes of ratings data from many independent interest groups.
• Used a network segmentation algorithm to classify politicians based on their ideological stances.
• Used regression models to test hypotheses about groups.
• Results: Published findings in a top-tier political science journal. Published an op-ed in The Globe and Mail.
• Technologies used: R, Python, scikit-learn, matplotlib.
Regional Employment Pattern Analysis (2008)
• Developed a regression model to analyze employment patterns across Ontario regions and industries.
• Collected and cleaned data from multiple government sources.
• Performed time series analysis to identify trends and seasonal patterns in employment data.
• Created geographical visualizations using R and QGIS to illustrate spatial employment patterns.
• Results: Presented findings to senior leadership, informing regional economic development strategies.
• Technologies used: R, QGIS.