TL;DR

This page describes some exploratory data analysis I did using Twitter data. I detail the approach and code I used to interpret large numbers of tweets to answer a social science problem. In this example, I focus on Canadian data, but I also analyzed Twitter data from the USA, the UK, and New Zealand.

In the next couple of sections, I briefly outline the background to the problem our paper addresses. You can read the academic overview here.

Skip to the methodology section to get a sense of my code.

I used Natural Language Processing to find the major topics in the discussions. I interpreted each of the detected topics and then verified my findings by prompting OpenAI's ChatGPT.

With the substantive topics confirmed, I then inferred the higher-level themes these topics belong to. I used sentence embeddings to infer which thematic area each tweet was closest to. The results allowed me to visualize how conversations about Huawei changed over time.

Problem

This section details how I used Twitter data to understand better how the Five Eyes security alliance worked to deny Huawei access to their respective markets. The Five Eyes is short-hand for the United States, United Kingdom, Australia, Canada, and New Zealand – five nations with strong historical and cultural bonds that led to a national security information-sharing agreement.

In 2018, Australia’s security services became concerned that the Chinese government could use Huawei’s 5G telecommunications equipment to intercept national security information sent over their network equipment. This led to an Australian ban on Huawei equipment throughout the country. Australian leaders informed their US counterparts of these concerns.

US President Donald Trump followed with an American ban on Huawei and ZTE equipment in 2019. The American leadership encouraged the UK, Canada, and New Zealand to follow suit, along with their Western European allies. Despite the requests of the US, the remaining members of the Five Eyes took varying amounts of time before enacting their bans on Huawei’s 5G equipment.

Theory

We argue that smaller countries have strong incentives to ignore the national security requests of their larger allies because of the economic benefits that open trade brings. Such incentives encourage smaller allies to delay and hedge by finding alternatives that appease the larger countries they are involved with (i.e., the USA and China).

This research is relevant to Weaponized Interdependence (WI) theory. My co-author, Bill Bendix, and I argue that WI theory is important, but it tends to oversimplify how a major power like the USA can harness the network of relationships between countries to extract compliance from its adversaries. To show this, we use a series of case studies to examine how complicated it is for the US to gain compliance from even its closest allies. The cases center on the USA's request that those allies ban Huawei's 5G equipment from their networks.

The US government has made several public announcements to its alliance members saying that they should ban Huawei from their national networks. We hypothesize that these public announcements were necessary so governments and businesses in allied nations could prepare for a ban. For example, businesses might want to prepare their supply chains for a trade war with China; governments may want to consider the many domestic impacts such a ban could bring.

Methodology

Typically, academic papers would outline just the relevant work that was done for the paper. Good papers are like icebergs in the sense that you only get to see about 10% of the real work that was done. The vast majority of work will never see the light of day!

In this section, I will include much detail I wouldn’t usually put in a paper. In fact, our first paper on Huawei will not even focus directly on Twitter data. (But a second paper will!) Instead, I used Twitter conversations to help me understand:

(a) What are the major topics under discussion in each of the Five Eyes countries regarding Huawei?
(b) How do the events between the US and the allied country eventually affect the decision to ban Huawei?
(c) How do domestic events affect an ally's decision to ban Huawei?

Exploratory Analysis of Tweets

The social media site Twitter is a natural place for people to discuss news. Twitter is heavily used by politicians, journalists, and the broader public as a place to discuss current events. We expected that Twitter discussions about Huawei would cover the significant events concerning the lead-up to any nation’s ban.

My first step was identifying Twitter accounts in one of the Five Eyes countries. I used the optional location field in each user's profile to determine their country. Users who had not filled out the location field were excluded from the analysis. I wanted to be sure that a person was from a given country before I included their tweets in that country's collection.
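The location filter described above can be sketched as follows. This is a minimal illustration, not the matching logic actually used: the keyword lists, the `infer_country` helper, and the toy user records are all hypothetical. It assumes that a user should be kept only when the free-text location matches exactly one country.

```python
import re

# Hypothetical keyword lists; the real matching rules are not given in the text.
COUNTRY_KEYWORDS = {
    "Canada": ["canada", "ontario", "toronto", "vancouver"],
    "UK": ["uk", "united kingdom", "england", "scotland"],
    "New Zealand": ["new zealand", "auckland", "wellington"],
}


def infer_country(location):
    """Return a country name for a free-text location, or None if unsure."""
    if not location or not location.strip():
        return None  # empty location field: the user is excluded
    # Lowercase and keep only letter runs, so "UK" matches but "Ukraine" does not.
    normalized = " " + " ".join(re.findall(r"[a-z]+", location.lower())) + " "
    matches = {
        country
        for country, keywords in COUNTRY_KEYWORDS.items()
        if any(f" {kw} " in normalized for kw in keywords)
    }
    # Keep the user only when exactly one country matches.
    return matches.pop() if len(matches) == 1 else None


# Toy user records (made up for illustration).
users = [
    {"handle": "a", "location": "Toronto, Ontario"},
    {"handle": "b", "location": ""},
    {"handle": "c", "location": "Kyiv, Ukraine"},
    {"handle": "d", "location": "Wellington, New Zealand"},
]
by_country = {u["handle"]: infer_country(u["location"]) for u in users}
```

Matching on whole words trades recall for precision, which fits the goal of only keeping users whose country is reasonably certain.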

BERTopic results

I created a Bidirectional Encoder Representations from Transformers (BERT) topic model for each country. I used the BERTopic Python package developed by Maarten Grootendorst (https://maartengr.github.io/BERTopic/). This package clusters documents by their embeddings and then uses c-TF-IDF (a class-based variant of TF-IDF) to find the terms that best describe each cluster. To simplify the analysis, I used the tweets from the six months leading up to the country's ban on Huawei, or the most recent six months if the country did not ban the company.
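To give a feel for the c-TF-IDF step, here is a simplified pure-Python sketch of class-based TF-IDF as I understand it from the BERTopic documentation: each term's within-cluster frequency is multiplied by log(1 + A / f), where A is the average number of words per cluster and f is the term's total frequency across clusters. It is not the library's own code, and the function names and toy token lists are invented for illustration.

```python
import math
from collections import Counter


def c_tf_idf(docs_by_cluster):
    """docs_by_cluster maps a cluster id to a list of tokenized documents."""
    # Term counts per cluster (each cluster is treated as one big document).
    counts = {
        cluster: Counter(tok for doc in docs for tok in doc)
        for cluster, docs in docs_by_cluster.items()
    }
    # Total frequency of each term across all clusters.
    total_per_term = Counter()
    for cluster_counts in counts.values():
        total_per_term.update(cluster_counts)
    # Average number of words per cluster.
    avg_words = sum(sum(c.values()) for c in counts.values()) / len(counts)

    weights = {}
    for cluster, cluster_counts in counts.items():
        n_words = sum(cluster_counts.values())
        weights[cluster] = {
            term: (freq / n_words) * math.log(1 + avg_words / total_per_term[term])
            for term, freq in cluster_counts.items()
        }
    return weights


def top_terms(weights, cluster, k=3):
    """Return the k highest-weighted terms for one cluster."""
    ranked = sorted(weights[cluster].items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:k]]


# Toy clusters (made-up tokens, not real tweets).
toy = {
    0: [["huawei", "charest", "leadership"], ["charest", "conservative", "huawei"]],
    1: [["huawei", "telecom", "bailout"], ["telecom", "ban", "huawei"]],
}
weights = c_tf_idf(toy)
```

In the toy data, "huawei" appears in both clusters, so it is down-weighted relative to terms that are distinctive to one cluster.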

You can see the Jupyter notebook of this early analysis below. (For some reason I got an error if I tried to save as html, so I show PDF below.) For simplicity, I will focus on Canadian events.

The analysis tools provided with BERTopic were handy to understand the basic contents of the discussions. However, I wanted to better understand the substantive topics being discussed in each cluster.

Moving beyond BERTopic - Verifying results with OpenAI

BERTopic showed almost 20 topic clusters in the Canadian tweets. I saved these clusters and summarized the ten most retweeted tweets in each topic cluster. I also extracted the URLs that these tweets linked to.
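A minimal sketch of this step is shown here, assuming each tweet is a dictionary with hypothetical `topic`, `text`, and `retweets` fields (the real column names are not given). Resolving shortened URLs needs network access and is left out.

```python
import re
from collections import defaultdict

URL_PATTERN = re.compile(r"https?://\S+")


def top_retweeted(tweets, n=10):
    """Group tweets by topic and keep the n most retweeted in each group."""
    by_topic = defaultdict(list)
    for tweet in tweets:
        by_topic[tweet["topic"]].append(tweet)
    return {
        topic: sorted(items, key=lambda t: t["retweets"], reverse=True)[:n]
        for topic, items in by_topic.items()
    }


def extract_urls(text):
    """Find http(s) links in a tweet and strip trailing punctuation."""
    return [url.rstrip(".,;:!?)") for url in URL_PATTERN.findall(text)]


# Toy tweets (made up; example.com stands in for real article links).
tweets = [
    {"topic": 0, "text": "Charest and Huawei https://example.com/a.", "retweets": 12},
    {"topic": 0, "text": "More on this https://example.com/b", "retweets": 40},
    {"topic": 7, "text": "Telecoms ask for help, no link", "retweets": 5},
]
top = top_retweeted(tweets, n=1)
urls = {
    topic: [url for t in items for url in extract_urls(t["text"])]
    for topic, items in top.items()
}
```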

In almost all cases, a link to a newspaper article was being commented upon in the tweet. I resolved these URLs and read the source article that was being discussed. This process gave me a good idea of the substantive topic being discussed.

I also verified my assessments using the OpenAI API and got summaries of each topic.

Here are examples of the resulting Excel file I generated along with the OpenAI analysis of the topics discussed.

Inferring broader themes from the topic clusters

When I completed the initial analysis of tweets for the USA, UK, Canada, and New Zealand, several themes emerged. I found that I could describe the clusters of topics as part of a few broad themes: Business concerns, Political topics, Security topics, Technology topics, discussions about other Allies, and more general discussions about Huawei PR (often press releases by the company). I developed a set of keywords to define each of these themes.

Figure 1- Simplified representation of words as embeddings. From https://docs.cohere.com/docs/text-embeddings

Aside: Word embeddings

One of the most fascinating aspects of language models is word embeddings. Language models don’t deal with text but instead represent words as a vector of numbers. You can think of these numbers as representing a coordinate in some n-dimensional space. Language models will assign words a vector of numbers based on the context in which a word appears relative to other words. The numeric values aren’t interpretable by humans directly. The model will determine these values using its hidden layer weights.

But to give you the gist of the idea, consider Figure 1. Where would we put a cow in the two-dimensional embedding space: would we choose placement (a), (b), or (c)? Well, a cow is similar to a calf, so it makes sense for it to be close to that word. It is also an older version of the same animal, so its relationship to a calf is analogous to a dog's relationship to a puppy. In this case, point (c) makes sense.

Again, the numeric values of the language models assigned to words (or sentences) are not interpretable by humans. But these values are still beneficial! We can use embeddings to see how similar two words are (i.e., to see if they occurred in similar contexts in the training data). We can also apply operations to word embeddings to find what words are similar. I use this technique to determine which tweets are similar to my inferred themes.
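The two ideas in this aside, measuring similarity and doing arithmetic on embeddings, can be illustrated with a toy example. The three-dimensional vectors below are made up by hand so that the axes are readable (roughly "bovine", "canine", "age"). Real embeddings have hundreds of dimensions and no such tidy interpretation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hand-made toy embeddings: (bovine, canine, age). Not from any real model.
toy = {
    "puppy": (0.0, 1.0, 0.25),
    "dog": (0.0, 1.0, 1.0),
    "calf": (1.0, 0.0, 0.25),
}

# Analogy by vector arithmetic: calf + (dog - puppy) should land near "cow".
cow_guess = tuple(
    c + d - p for c, d, p in zip(toy["calf"], toy["dog"], toy["puppy"])
)

similarities = {word: cosine_similarity(cow_guess, vec) for word, vec in toy.items()}
```

With these toy values, the guessed "cow" vector is most similar to "calf", then "dog", then "puppy", which matches the intuition behind Figure 1.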

Using embeddings to detect themes

I extracted embeddings based on keywords related to each theme and compared these theme embeddings to the embeddings of all Tweets. This comparison allowed me to categorize whether any given tweet belonged to a general theme.
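As a rough sketch of this comparison, the code below assigns each tweet to the theme whose embedding has the highest cosine similarity. The vectors are tiny made-up stand-ins. In practice they would come from a sentence embedding model, which the text does not name, and the `closest_theme` helper is hypothetical.

```python
import math


def normalize(vec):
    """Scale a non-zero vector to unit length."""
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec]


def closest_theme(tweet_vec, theme_vecs):
    """Return (theme, score) for the theme most similar to the tweet."""
    tweet_unit = normalize(tweet_vec)
    # The dot product of unit vectors equals their cosine similarity.
    scores = {
        theme: sum(a * b for a, b in zip(tweet_unit, normalize(vec)))
        for theme, vec in theme_vecs.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up 3-D stand-ins for theme and tweet embeddings.
theme_vecs = {
    "Security": [0.9, 0.1, 0.0],
    "Business": [0.1, 0.9, 0.1],
    "Politics": [0.0, 0.2, 0.9],
}
tweet_vecs = {"t1": [0.8, 0.2, 0.1], "t2": [0.1, 0.3, 0.8]}

assigned = {tid: closest_theme(vec, theme_vecs)[0] for tid, vec in tweet_vecs.items()}
```

Returning the score alongside the theme makes it easy to add a minimum-similarity cutoff later, so that tweets far from every theme can be left unassigned.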

I verified my choice of themes by seeing how well these themes aligned with the topic clusters. Table 1 shows Canadian tweets and how they align with the themes. (Note that in this early version of the table.) Table 1 serves as a sanity check on the themes I have outlined. I found the correspondence between the topic subjects and themes reassuring.

The topics show a broad alignment with the themes I described. For example, topics 0, 1, and 2 all deal with Jean Charest, a Conservative leadership hopeful who previously consulted for Huawei. These topics discuss his politics and his business ties to the company. Ultimately, Charest's relationship to Huawei was an important reason why he lost the race for the leadership of the Canadian Conservative Party. Similarly, topic 7 details Canadian telecommunications companies asking the government for bailouts if a ban is enacted. (The topic labeled -1 can be ignored; it contains the tweets that were not assigned to any cluster.)

Changing themes over time

Below (Figure 2) is a stacked area chart of these themes over time in Canada. The vertical axis is the number of tweets, and the horizontal axis is the date. Overlaid on the chart are major events that occurred in the world or in an allied nation and that may be relevant.
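The data behind a chart like this is a count of tweets per theme per time period. Here is a minimal sketch of that aggregation, assuming monthly bins and hypothetical `date` (ISO format) and `theme` fields on each tweet. The binning actually used for Figure 2 is not stated.

```python
from collections import Counter


def monthly_theme_counts(tweets):
    """Return sorted months and, per theme, a list of counts aligned to them."""
    # "2018-12-05"[:7] -> "2018-12", so each tweet falls into a monthly bin.
    counts = Counter((t["date"][:7], t["theme"]) for t in tweets)
    months = sorted({month for month, _ in counts})
    themes = sorted({theme for _, theme in counts})
    # Missing (month, theme) pairs count as zero.
    series = {theme: [counts[(m, theme)] for m in months] for theme in themes}
    return months, series


# Toy tweets (made up for illustration).
tweets = [
    {"date": "2018-11-20", "theme": "Huawei PR"},
    {"date": "2018-12-05", "theme": "Politics"},
    {"date": "2018-12-07", "theme": "Politics"},
    {"date": "2018-12-09", "theme": "Security"},
]
months, series = monthly_theme_counts(tweets)
```

Each list in `series` is one layer of the stacked area chart, so the result can be handed to a plotting function such as matplotlib's `stackplot`.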

Figure 2 shows some interesting patterns. Until the arrest of Huawei CFO Meng Wanzhou in December 2018, most discussion about Huawei on Twitter was related to Huawei PR or general business discussion. Just before the arrest and afterward, there was extensive discussion about other topics concerning Huawei: politics, security, technology, and allies. This event marks a broad shift in the types of topics discussed. In essence, the arrest of Meng Wanzhou and the subsequent arrest of two innocent Canadians by Chinese authorities changed the way people thought about Huawei. It may have been the beginning of the end for the company in terms of the way the public thought about it.

Findings and Future Work

This approach has been very helpful in understanding the main debates that each of the allied countries (the UK, Canada, and New Zealand) had before instituting a ban on Huawei 5G telecommunications equipment.

Each of these countries has engaged in some degree of hedging. For example, the UK initially argued that it would allow Huawei's equipment outside the 'core' of its network, only to change its position later. Canada's Prime Minister, Justin Trudeau, avoided making a firm decision until it was clear that delaying was becoming a domestic political liability. New Zealand rejected a 5G proposal from Huawei but still has not formally banned Huawei from its telecommunications infrastructure.

The full details of these cases inform a detailed theory about a shortcoming in Weaponized Interdependence that is the subject of a forthcoming paper. The thematic analysis will, I hope, become the basis for a different paper.

Technically Speaking:
How the Five Eyes Alliance Tried to Ban Huawei

TL;DR

Problem

Theory

Methodology

Exploratory Analysis of Tweets

BERTopic results

Jupyter notebook 1 - Search for topics

Moving beyond BERTopic - Verifying results with OpenAI

Inferring broader themes from the topic clusters

Aside: Word embeddings

Using embeddings to detect themes

Changing themes over time

Jupyter notebook 2 - Searching for topics

Findings and Future Work
