This project uses BERTopic, Docker, Flask, and Google Cloud services like Cloud Run to read text data from Gmail or Google Drive using the Google Workspace APIs.
Organizations often have access to vast amounts of text data, but converting that raw data into coherent insights is difficult. In this example, I use text analysis to explore a year's worth of privacy-related news summaries, build dynamic visualizations, and outline the main topics that emerged over the year.
The web application's interface asks the user to tag the emails they want analyzed. The user then authenticates with Google and grants my application permission to process their documents.
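The authorization step is typically an OAuth 2.0 authorization-code flow. The sketch below builds the Google consent URL by hand using only the standard library. It assumes read-only Gmail and Drive scopes. The client ID, redirect URI, and the helper names `build_consent_url` and `gmail_label_query` are hypothetical placeholders, not the application's actual code. In practice, a library such as `google-auth-oauthlib` usually handles this step.

```python
from urllib.parse import urlencode

# Read-only scopes for the two data sources the app reads from (assumed).
GMAIL_READONLY = "https://www.googleapis.com/auth/gmail.readonly"
DRIVE_READONLY = "https://www.googleapis.com/auth/drive.readonly"
AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_consent_url(client_id: str, redirect_uri: str, scopes: list, state: str) -> str:
    """Return the Google OAuth 2.0 consent URL the user is redirected to (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",  # authorization-code flow for a server-side app
        "scope": " ".join(scopes),  # multiple scopes are space-delimited
        "access_type": "offline",  # ask for a refresh token as well
        "state": state,  # opaque value checked on the callback to prevent CSRF
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"


def gmail_label_query(label: str) -> str:
    """Build a Gmail search query for one user-applied label (assumes no spaces in the name)."""
    return f"label:{label}"
```

The `state` value should be random for each request and checked when Google redirects back with the authorization code.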
The Cloud Run application requests access to Gmail or Google Drive. It then uploads the selected contents for processing with BERTopic.
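When the Gmail API's `users.messages.get` endpoint returns a message in `full` format, the body text sits inside a tree of MIME parts. Each part carries a base64url-encoded `body.data` field. The sketch below pulls out the plain-text parts. It assumes the messages were fetched that way, and the function names are hypothetical.

```python
import base64


def decode_body(data: str) -> str:
    """Decode a base64url-encoded Gmail body, restoring any missing '=' padding."""
    padded = data + "=" * (-len(data) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8", errors="replace")


def extract_plain_text(payload: dict) -> str:
    """Recursively join the text/plain parts of a Gmail API message payload."""
    if payload.get("mimeType") == "text/plain":
        data = payload.get("body", {}).get("data")
        return decode_body(data) if data else ""
    # multipart/* parts nest further parts; other types (e.g. text/html) yield nothing
    texts = [extract_plain_text(part) for part in payload.get("parts", [])]
    return "\n".join(t for t in texts if t)
```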
In my example, I parse data from the IAPP (International Association of Privacy Professionals). To extract the data correctly, I wrote a custom parser that can accurately separate the multiple news stories published in each daily newsletter.
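The IAPP newsletter layout and the author's parser rules are not shown here, so the following is only a hypothetical sketch. It assumes stories are separated by blank lines and treats blocks that start with made-up markers such as "Sponsored" as ads. It also drops very short blocks such as headers. `AD_MARKERS`, `MIN_WORDS`, and `split_stories` are all illustrative names.

```python
import re

# Hypothetical markers and threshold; the real IAPP rules are not documented here.
AD_MARKERS = ("sponsored", "advertisement")
MIN_WORDS = 8  # blocks shorter than this are treated as headers, footers, or link lines


def split_stories(newsletter: str) -> list:
    """Split one newsletter into candidate stories at blank lines, skipping ads."""
    blocks = re.split(r"\n\s*\n", newsletter.strip())
    stories = []
    for block in blocks:
        text = " ".join(block.split())  # collapse line breaks and repeated whitespace
        if not text:
            continue
        if text.lower().startswith(AD_MARKERS):
            continue  # block announces itself as an ad
        if len(text.split()) < MIN_WORDS:
            continue  # too short to be a news summary
        stories.append(text)
    return stories
```

Each returned string then becomes one document for BERTopic, so a single newsletter contributes several documents instead of one.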
My custom IAPP text filter ignores ads and examines only the individual story paragraphs. The results show topic clusters that reveal how documents are related and how often each topic appears over time. See the video below for more details!
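BERTopic provides a `topics_over_time` method, with `visualize_topics_over_time` for plotting, that bins documents by timestamp. The hypothetical helper below illustrates the idea behind that view. It counts documents per topic per calendar month, given each document's date and the topic ID the model assigned to it. It skips topic `-1`, which BERTopic uses for outlier documents. Monthly bins are an assumption.

```python
from collections import Counter, defaultdict
from datetime import date


def topic_counts_by_month(dates: list, topics: list) -> dict:
    """Count documents per topic for each calendar month (hypothetical helper)."""
    if len(dates) != len(topics):
        raise ValueError("dates and topics must have the same length")
    by_month = defaultdict(Counter)
    for day, topic in zip(dates, topics):
        if topic == -1:
            continue  # BERTopic assigns outlier documents to topic -1
        by_month[day.strftime("%Y-%m")][topic] += 1
    return dict(sorted(by_month.items()))  # months in chronological order
```

Unlike BERTopic's own method, this sketch does not recompute topic keywords per time bin. It only tracks how frequently each topic appears.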
Introduction to the application
This video gives an overview of the Google Cloud application, which uses a BERTopic model to cluster documents stored in Google Drive or Gmail.
The videos that follow show the application in action and the results of the BERTopic clustering.
The application user interface
Output from the application