
Topic Modeling with Google Cloud Run

This project uses BERTopic, Docker, Flask, and Google Cloud services like Cloud Run to read text data from Gmail or Google Drive using the Google Workspace APIs.

Introduction - Text analysis

Organizations often have access to vast amounts of text data, but converting that raw data into coherent insights is difficult. In this example, I use text analysis to explore a year's worth of privacy-related news summaries. From the results, I create dynamic visualizations and outline the main topics that emerged over the year. The project is built with BERTopic, Docker, Flask, and Google Cloud services such as Cloud Run, and it reads text data from Gmail or Google Drive.

The web application's interface asks the user to tag the emails they are interested in. The user then authenticates and confirms that my application is allowed to process their documents.

Flexible content

The Cloud Run application asks for access to Gmail or Google Drive. It then uploads the contents for processing with BERTopic.
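The processing step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names `prepare_corpus` and `fit_topics` are hypothetical, the input is assumed to be (ISO date string, text) pairs already fetched from Gmail or Google Drive, and the `topics_over_time(docs, timestamps)` call assumes a recent BERTopic release.

```python
def prepare_corpus(messages):
    """Turn (date_string, text) pairs into parallel lists of documents and
    timestamps, collapsing whitespace and skipping empty texts.

    The (date_string, text) input shape is an assumption for this sketch.
    """
    docs, timestamps = [], []
    for sent_on, text in messages:
        cleaned = " ".join(text.split())
        if cleaned:
            docs.append(cleaned)
            timestamps.append(sent_on)
    return docs, timestamps


def fit_topics(docs, timestamps):
    """Fit a BERTopic model and compute how topics appear over time."""
    # Imported lazily because bertopic is a heavy third-party dependency.
    from bertopic import BERTopic

    topic_model = BERTopic()
    topics, _ = topic_model.fit_transform(docs)
    # Assumes the (docs, timestamps) argument order of recent BERTopic versions.
    over_time = topic_model.topics_over_time(docs, timestamps)
    return topic_model, topics, over_time
```

BERTopic needs a reasonably large corpus to form clusters, so `fit_topics` is only meaningful once many documents have been collected.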

Custom parsers

In my example, I parse data from the IAPP (International Association of Privacy Professionals). To extract the data correctly, I wrote a custom parser that can accurately separate the multiple news stories published in each daily newsletter.
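As a rough illustration of what such a parser has to do, here is a minimal sketch. It assumes a hypothetical plain-text layout in which stories are separated by blank lines and each story starts with a one-line headline; the real IAPP newsletter format may differ, and the names `Story` and `split_stories` are invented for this example.

```python
import re
from typing import List, NamedTuple


class Story(NamedTuple):
    headline: str
    summary: str


def split_stories(newsletter_body: str) -> List[Story]:
    """Split one newsletter body into separate stories.

    Assumed layout (hypothetical): blank-line-separated blocks, where the
    first line of a block is the headline and the rest is the summary.
    """
    blocks = re.split(r"\n\s*\n", newsletter_body.strip())
    stories = []
    for block in blocks:
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) < 2:
            continue  # a lone line cannot be a headline plus a summary
        stories.append(Story(lines[0], " ".join(lines[1:])))
    return stories
```

Treating each story as its own document, rather than the whole newsletter, gives the topic model one subject per document to cluster.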

Text blocks

My custom IAPP text filter ignores ads and examines each paragraph separately. The resulting topic clusters show how documents relate to one another and how the topics appear over time. See the video below for more details!
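A filter of this kind could look something like the sketch below. The ad markers, the minimum word count, and the function name `keep_paragraphs` are all assumptions made for illustration; the actual filter may use different rules.

```python
# Hypothetical markers that flag a paragraph as advertising.
AD_MARKERS = ("advertisement", "sponsored", "sponsor content")


def keep_paragraphs(text, min_words=5):
    """Return the paragraphs of `text` that look like real content.

    Paragraphs are assumed to be separated by blank lines. A paragraph is
    dropped if it contains an ad marker or has fewer than `min_words` words.
    """
    kept = []
    for paragraph in text.split("\n\n"):
        normalized = " ".join(paragraph.split())
        lowered = normalized.lower()
        if any(marker in lowered for marker in AD_MARKERS):
            continue  # skip advertising
        if len(normalized.split()) < min_words:
            continue  # skip fragments such as "Read more"
        kept.append(normalized)
    return kept
```

Each surviving paragraph can then be passed to the topic model as a separate document.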

See the final data visualizations!

Contact: Jon MacKay