Which country is leading in wireless technology?

Mobile communications have allowed people to work and buy online from anywhere, and wireless technology is critical for national security. But which country is home to the leading companies that develop this critical technology? In this R notebook, I use U.S. patent data to walk through an example of exploratory data visualization. For simplicity, I skip summary statistics and stick to the high-level outline of the story.

Data source

The U.S. Patent Office has released all its patent data to the public (source: PatentsView API). This notebook focuses on a few exploratory visualizations that examine how patenting in the wireless telecommunications industry has evolved.

This notebook downloads the data from the U.S. patent database. A complete listing of the patent classifications is available. For this brief overview, I’m looking at the “H04W” patent category, which is “Wireless communications networks.” The code below queries the U.S. Patent Office database for the classifications under this category and downloads the results.

library(patentsview) # data source
library(tidyverse) # data manipulation
library(highcharter) # interactive visualizations
setwd("D:/jgmackay.com/code/articles/patents/code") # adjust to your local project directory

# Normally we would build a query object. However, the API wants the text representation. I've already generated that elsewhere so I just show the text query.

query_custom = '{"_and":[{"_gte":{"patent_date":"2000-1-1"}},{"_or":[{"_eq":{"cpc_subgroup_id":"H04W4/00"}},{"_eq":{"cpc_subgroup_id":"H04W8/00"}},{"_eq":{"cpc_subgroup_id":"H04W12/00"}},{"_eq":{"cpc_subgroup_id":"H04W16/00"}},{"_eq":{"cpc_subgroup_id":"H04W24/00"}},{"_eq":{"cpc_subgroup_id":"H04W28/00"}},{"_eq":{"cpc_subgroup_id":"H04W36/00"}},{"_eq":{"cpc_subgroup_id":"H04W40/00"}},{"_eq":{"cpc_subgroup_id":"H04W48/00"}},{"_eq":{"cpc_subgroup_id":"H04W60/00"}},{"_eq":{"cpc_subgroup_id":"H04W64/00"}},{"_eq":{"cpc_subgroup_id":"H04W68/00"}},{"_eq":{"cpc_subgroup_id":"H04W72/00"}},{"_eq":{"cpc_subgroup_id":"H04W74/00"}},{"_eq":{"cpc_subgroup_id":"H04W76/00"}},{"_eq":{"cpc_subgroup_id":"H04W80/00"}},{"_eq":{"cpc_subgroup_id":"H04W84/00"}},{"_eq":{"cpc_subgroup_id":"H04W88/00"}},{"_eq":{"cpc_subgroup_id":"H04W92/00"}},{"_eq":{"cpc_subgroup_id":"H04W99/00"}}]}]}'

fields <- c("patent_number", "assignee_organization",
            "patent_num_cited_by_us_patents", "app_date", "patent_date",
            "assignee_total_num_patents", 
            #"forprior_country", 
            "assignee_id", "assignee_longitude", "assignee_latitude"
            #,"citedby_patent_number","cited_patent_number","patent_title"
            )

# Comment out if you don't need to re-fetch the data!
#pv_out <- search_pv(query = query_custom, fields = fields, all_pages = TRUE) 

#save(pv_out, file = "data/pv_out.rda")

# Load the pre-fetched data
load("data/pv_out.rda")

# we have to unnest the data frames that are stored in the assignee list column:
dl <- unnest_pv_data(data = pv_out$data, pk = "patent_number")
save(dl, file = "data/dl.rda", compress = "xz")
#load("data/dl.rda")
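
The query text above does not have to be typed by hand. Here is a minimal sketch, using only base R, that assembles the same JSON string from the H04W group numbers appearing in the query. The names `subgroups`, `cpc_ids`, `eq_clauses`, and `query_built` are introduced here for illustration and are not part of the patentsview package.

```r
# Group numbers under H04W, copied from the query text above
subgroups <- c(4, 8, 12, 16, 24, 28, 36, 40, 48, 60,
               64, 68, 72, 74, 76, 80, 84, 88, 92, 99)

# One {"_eq": ...} clause per CPC subgroup id, e.g. "H04W4/00"
cpc_ids <- paste0("H04W", subgroups, "/00")
eq_clauses <- paste0('{"_eq":{"cpc_subgroup_id":"', cpc_ids, '"}}')

# Patents dated on or after 2000-1-1 AND belonging to any listed subgroup
query_built <- paste0(
  '{"_and":[{"_gte":{"patent_date":"2000-1-1"}},{"_or":[',
  paste(eq_clauses, collapse = ","),
  "]}]}"
)
```

If the sketch is right, `identical(query_built, query_custom)` should hold in the notebook, which makes it easy to add or drop a subgroup without editing the long string.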

Visualizations for exploratory data analysis

For the visualizations, companies that own patents are colored according to their country of origin. The only exception is the U.S. Big Tech companies (Google, Apple, Amazon, Meta, Microsoft), which get their own color because they tend to disrupt established markets. The colors help highlight the dynamics of the corporations that act as national champions.

set_colors <- function(df) {
  df$color <- "#18BC9C" # default
  # Canada / RIM / BB
  df$color[get_org_loc( "research in motion", df)] = "#eee600"
  df$color[get_org_loc( "blackberry", df)] = "#eee600"
  df$color[get_org_loc( "nortel", df)] = "#eee600"

  # China
  df$color[get_org_loc( "huawei", df)] = "#00ffff"
  df$color[get_org_loc( "zte", df)] = "#00ffff"
  
  # Europe
  df$color[get_org_loc( "ericsson", df)] = "#0000cd"
  df$color[get_org_loc( "nokia", df)] = "#0000cd"
  df$color[get_org_loc( "alcatel", df)] = "#0000cd"
  df$color[get_org_loc( "philips", df)] = "#0000cd"
  df$color[get_org_loc( "siemens", df)] = "#0000cd"
  df$color[get_org_loc( "vodafone", df)] = "#0000cd"

  # USA
  df$color[get_org_loc( "qualcomm", df)] = "#69359c"
  df$color[get_org_loc( "AT ?& ?T", df)] = "#69359c" # matches both "AT&T" and "AT & T"
  df$color[get_org_loc( "international business machines", df)] = "#69359c"
  df$color[get_org_loc( "westell", df)] = "#69359c"
  df$color[get_org_loc( "sprint", df)] = "#69359c"
  df$color[get_org_loc( "cisco", df)] = "#69359c"
  df$color[get_org_loc( "broadcom", df)] = "#69359c"
  df$color[get_org_loc( "intel", df)] = "#69359c"
  df$color[get_org_loc( "hewlett", df)] = "#69359c"
  df$color[get_org_loc( "motorola", df)] = "#69359c"
  df$color[get_org_loc( "interdigital", df)] = "#69359c"
  # American Big Tech
  df$color[get_org_loc( "google", df)] = "#fc0fc0"
  df$color[get_org_loc( "facebook", df)] = "#fc0fc0"
  df$color[get_org_loc( "meta", df)] = "#fc0fc0"
  df$color[get_org_loc( "amazon", df)] = "#fc0fc0"
  df$color[get_org_loc( "apple", df)] = "#fc0fc0"
  df$color[get_org_loc( "microsoft", df)] = "#fc0fc0"

  # South Korea
  df$color[get_org_loc( "samsung", df)] = "#ffa500"
  df$color[get_org_loc( "ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE", df)] = "#ffa500"
  df$color[get_org_loc( "LG electronics", df)] = "#ffa500"

  # Japan
  df$color[get_org_loc( "sony", df)] = "#ff0000"
  df$color[get_org_loc( "fujitsu", df)] = "#ff0000"
  df$color[get_org_loc( "konica", df)] = "#ff0000"
  df$color[get_org_loc( "NEC corp", df)] = "#ff0000"
  df$color[get_org_loc( "sharp", df)] = "#ff0000"
  df$color[get_org_loc( "sanyo", df)] = "#ff0000"
  df$color[get_org_loc( "Sumitomo", df)] = "#ff0000"
  df$color[get_org_loc( "HITACHI", df)] = "#ff0000"
  df$color[get_org_loc( "KYOCERA", df)] = "#ff0000"
  df$color[get_org_loc( "CANON", df)] = "#ff0000"
  df$color[get_org_loc( "AGOOP", df)] = "#ff0000"
  df$color[get_org_loc( "mitsubishi", df)] = "#ff0000"
  df$color[get_org_loc( "panasonic", df)] = "#ff0000"
  return(df)
}

get_org_loc <- function(org_name,df){
  loc <- grepl( org_name, df$assignee_organization, ignore.case = TRUE)
  return(loc)
}
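
Because `get_org_loc()` does a case-insensitive substring match, a short pattern can catch more names than intended. The check below uses three assignee names that appear later in this notebook. The `orgs` vector and the word-boundary and optional-space patterns are illustrative assumptions, shown as one possible way to tighten the match.

```r
orgs <- c("Intel Corporation",
          "AT&T Intellectual Property I, L.P.",
          "AT & T Mobility II LLC")

# Plain substring match: "intel" also hits "Intellectual"
loose <- grepl("intel", orgs, ignore.case = TRUE)
# loose is TRUE TRUE FALSE

# Word boundaries restrict the match to the standalone word "Intel"
strict <- grepl("\\bintel\\b", orgs, ignore.case = TRUE)
# strict is TRUE FALSE FALSE

# A literal "AT&T" misses the spaced spelling; optional spaces catch both
att_literal <- grepl("AT&T", orgs, ignore.case = TRUE)
# att_literal is FALSE TRUE FALSE
att_flexible <- grepl("AT ?& ?T", orgs, ignore.case = TRUE)
# att_flexible is FALSE TRUE TRUE
```

In this dataset the "intel" overlap happens to be harmless, since both companies share the USA color, but the same effect could miscolor an assignee from another country.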

Identifying top assignees

Now that we have the data, let’s identify the top assignees. Assignees are the companies that ultimately own the rights to a patent. The code below filters the patent data to isolate the top patent-holding assignees.

The summary table below contains four columns:

- the name of the company (assignee),
- the number of patents reported for the categories we extracted from the patent office,
- the total number of patents held by that assignee, and
- the fraction of the company's patents that belong to the categories we are examining. In other words, this represents how specialized the company is in this narrow field of research.

For example, Qualcomm has the highest number of patents in the “Wireless communications networks” category. Despite this, these patents represent only 5.6% of its total patenting activity. AT&T Mobility has over 12% of its patents in this area. By contrast, only 1% of Google's patents fall in this category, and only 0.2% of IBM's. The low percentages indicate that these corporations are R&D focused, with patents in many areas. Of course, the percentage is an imperfect measurement: it penalizes large R&D companies with many patents, while a small firm with a single patent in this category scores 100%.
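
As a quick check on the arithmetic, the percentages quoted above follow directly from the two count columns of the summary table. The sketch below copies the counts for four assignees from that table; the short labels are shorthand introduced here.

```r
# db_pats and ttl_pats copied from the summary table for four assignees
db_pats  <- c(Qualcomm = 1984, "AT&T Mobility" = 350, Google = 268, IBM = 366)
ttl_pats <- c(Qualcomm = 35540, "AT&T Mobility" = 2768, Google = 27846, IBM = 160159)

# Specialization: share of each assignee's patents that are in this category
frac_db_pats <- round(db_pats / ttl_pats, 3)
frac_db_pats
# Qualcomm 0.056, AT&T Mobility 0.126, Google 0.010, IBM 0.002
```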

library(tidyverse)
# We create a data frame with the top 75 assignees:
top_asgns <-
  dl$assignees %>%
  filter(!is.na(assignee_organization)) %>% # we filter out those patents that are assigned to an inventor without an organization (we want only organizations)
  mutate(ttl_pats = as.numeric(assignee_total_num_patents)) %>% #we create a numeric column (ttl_pats) with total number of patents of assignee
  group_by(assignee_organization, ttl_pats) %>% # we group assignees by total number of patents (ttl_pats)
  summarise(db_pats = n()) %>% # count the number of patents in the subset of patents we selected for
  mutate(frac_db_pats = round(db_pats / ttl_pats, 3)) %>% #the fraction of patents the assignee has in this category (specialization)
  ungroup() %>%
  select(c(1, 3, 2, 4)) %>%
  arrange(desc(db_pats)) %>%
  slice(1:75) # change this number to adjust the top N companies

library(formattable)
formattable( top_asgns %>% slice(1:20) )

| assignee_organization | db_pats | ttl_pats | frac_db_pats |
|---|---|---|---|
| QUALCOMM Incorporated | 1984 | 35540 | 0.056 |
| Telefonaktiebolaget LM Ericsson (publ) | 1295 | 23101 | 0.056 |
| Samsung Display Co., Ltd. | 1227 | 155778 | 0.008 |
| LG ELECTRONICS INC. | 942 | 44975 | 0.021 |
| HUAWEI TECHNOLOGIES CO. LTD. | 936 | 22092 | 0.042 |
| Intel Corporation | 564 | 49717 | 0.011 |
| AT&T Intellectual Property I, L.P. | 514 | 15777 | 0.033 |
| Apple Inc. | 482 | 33004 | 0.015 |
| Cisco Technology, Inc. | 475 | 18824 | 0.025 |
| SONY CORPORATION | 467 | 60545 | 0.008 |
| Nokia Corporation | 450 | 8318 | 0.054 |
| NEC CORPORATION | 417 | 34293 | 0.012 |
| INTERNATIONAL BUSINESS MACHINES CORPORATION | 366 | 160159 | 0.002 |
| AT & T Mobility II LLC | 350 | 2768 | 0.126 |
| Motorola, Inc. | 288 | 20387 | 0.014 |
| Broadcom Corporation | 283 | 11171 | 0.025 |
| GOOGLE LLC | 268 | 27846 | 0.010 |
| NTT DOCOMO, INC. | 264 | 4284 | 0.062 |
| BlackBerry Limited | 259 | 5786 | 0.045 |
| Verizon Patent and Licensing Inc. | 237 | 6017 | 0.039 |

# This code creates a data frame with patent counts by application year for each assignee.

data <-
  top_asgns %>% slice(1:15) %>% # keep the top 15 assignees for plotting
  select(-contains("pats")) %>%
  inner_join(dl$assignees) %>%
  inner_join(dl$applications) %>%
  mutate(app_yr = as.numeric(substr(app_date, 1, 4))) %>% # extract the year from the application date
  group_by(assignee_organization, app_yr) %>%
  count()

Data reliability

So far, the analysis shows some inconsistencies. For example, Research In Motion (RIM) and BlackBerry are names for the same company but are listed as separate assignees. The same thing happens with other related companies. As a result, these visualizations make some companies appear less productive than they are. Despite these limitations, we’ll continue in order to get a sense of the data.
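
One way to reduce this problem would be to map known aliases to a single canonical name before counting. The sketch below is only an illustration in base R: the toy data frame, its counts, the exact name spellings, the `aliases` table, and the `normalize_org()` helper are all hypothetical and are not applied anywhere else in this notebook.

```r
# Toy stand-in for the assignee data; names and counts are made up for illustration
assignees <- data.frame(
  assignee_organization = c("Research In Motion Limited",
                            "BlackBerry Limited",
                            "QUALCOMM Incorporated"),
  n = c(10, 5, 20),
  stringsAsFactors = FALSE
)

# Hypothetical alias table: regex pattern -> canonical name
aliases <- c("research in motion|blackberry" = "BlackBerry (RIM)")

# Replace every name matching a pattern with its canonical name
normalize_org <- function(x, aliases) {
  for (pattern in names(aliases)) {
    x[grepl(pattern, x, ignore.case = TRUE)] <- aliases[[pattern]]
  }
  x
}

assignees$org_clean <- normalize_org(assignees$assignee_organization, aliases)

# Counts are now summed under the canonical name
merged <- aggregate(n ~ org_clean, data = assignees, FUN = sum)
```

With the toy counts above, the two BlackBerry rows collapse into a single row. A real alias table would have to be curated by hand, because corporate renames and subsidiaries cannot be inferred reliably from the names alone.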

Which companies are most productive?

The plot below shows the number of patents each company (assignee) produced each year. The number of patents is on the vertical axis, and the application year is on the horizontal axis. We can see that Qualcomm (green line) is one of the most productive companies across time. The most striking change over time is in Huawei’s patenting rate (red line). After 2012, Huawei’s patenting rate rose steeply until 2019. (In 2019, the USA formally banned Huawei’s telecommunications equipment.)

library(highcharter)

# Optional: change the date range
df <- data #%>% dplyr::filter(app_yr >= 2010) %>% filter( app_yr <= 2015)

# Optional: Create the cumulative sum of patents by organization. This highlights
# total organizational output rather than annual productivity
df <- df %>% group_by(assignee_organization) %>% mutate(cs=cumsum(n))


df %>% filter(app_yr>=2000) %>% 
  hchart(
         type = "line", 
         hcaes(x = app_yr, 
               y = n, # n or use 'cs' to see cumulative
               group = assignee_organization)) %>%
  hc_plotOptions(series = list(marker = list(enabled = FALSE))) %>%
  hc_xAxis(title = list(text = "Application year")) %>%
  hc_yAxis(title = list(text = "Patents")) %>%
  hc_title(text = "Top assignees patenting") %>%
  hc_subtitle(text = "Annual patent applications through time")

Which companies have the most cited patents?

We can treat each patent as a unit of knowledge a company gains, but it is not clear in advance how valuable any given patent will be. Citation counts, which measure how often other patents cite a given patent, offer one proxy. To find the top-cited assignees, we rank patents by citation count within each grant year (so that older patents, which have had more time to accumulate citations, are not favored) and then average the resulting percentile for each assignee.

percent_rank2 <- function(x) {
  (rank(x, ties.method = "average", na.last = "keep") - 1) / (sum(!is.na(x)) - 1)
}

# Create a data frame with normalized citation rates and stats
asng_p_dat <-
  dl$patents %>% 
  mutate(patent_yr = substr(patent_date, 1, 4)) %>%
  group_by(patent_yr) %>%
  mutate(perc_cite = percent_rank2(patent_num_cited_by_us_patents)) %>%
  inner_join(dl$assignees) %>%
  group_by(assignee_organization) %>%
  summarise(mean_perc = mean(perc_cite), sd_perc=sd(perc_cite)) %>%
  inner_join(top_asgns) %>%
  arrange(desc(ttl_pats)) %>%
  filter(!is.na(assignee_organization)) %>%
  slice(1:50) %>%
  as.data.frame()

asng_p_dat <- set_colors(asng_p_dat)
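
To see what `percent_rank2()` returns, here is a small worked example. The function is repeated so the snippet runs on its own, and the citation counts are made up for illustration.

```r
percent_rank2 <- function(x) {
  (rank(x, ties.method = "average", na.last = "keep") - 1) / (sum(!is.na(x)) - 1)
}

# Made-up citation counts for five patents granted in the same year
cites <- c(0, 5, 5, 20, NA)

# Ranks are 1, 2.5, 2.5, 4 (ties share the average rank, NA is kept as NA);
# subtracting 1 and dividing by (4 - 1) rescales them to the range [0, 1]
pr <- percent_rank2(cites)
pr
# 0.0 0.5 0.5 1.0 NA
```

The least-cited patent of the year scores 0, the most-cited scores 1, and tied patents share the same percentile, which makes scores comparable across grant years.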

The bubble chart shows how highly cited a company’s patents are on average. The vertical axis shows the mean citation percentile (highly cited organizations sit higher in the chart), and the horizontal axis shows the number of wireless patents the company holds in this dataset, on a log scale. In the code below, bubble size is mapped to frac_db_pats, the share of the company's patents that fall in this category.

Perhaps the most exciting finding from this plot is that the American Big Tech companies (Amazon, Microsoft, Google, Apple) have the highest citation rates on average despite being companies with relatively low numbers of wireless patents.

# Adapted from http://jkunst.com/highcharter/showcase.html
# I'm not totally sold on this representation, but we'll go with it for now.
hchart(asng_p_dat, "scatter", hcaes(x = db_pats, y = mean_perc, size = frac_db_pats,
##hchart(asng_p_dat, "scatter", hcaes(x = db_pats, y = frac_db_pats, size = mean_perc,
                                    group = assignee_organization, color = color)) %>%
  #hc_size(height=1800,width=2200) %>% 
  hc_xAxis(title = list(text = "Patents (log scale)"), type = "logarithmic",
           allowDecimals = FALSE, endOnTick = TRUE, min=40) %>%
  hc_yAxis(title = list(text = "Mean percentile of citation")) %>%
  ##hc_yAxis(title = list(text = "Fraction of patents")) %>%
  hc_subtitle(text = "Most cited assignees", align = "center") %>%
  ##hc_subtitle(text = "Most cited assignees on average", align = "center") %>%
  hc_add_theme(hc_theme_538()) %>%
  hc_legend(enabled = FALSE)

Location of inventions

The final visualization shows the locations of the assignees. The bubble size reflects the number of patents owned by each assignee.

This map visualization illustrates which countries are the most engaged in wireless technologies. Traditional US phone carriers hold the largest number of patents, but Big Tech companies also produce patents. Although Canada has RIM/BlackBerry and Nortel, these companies eventually left wireless research.

In Europe, Ericsson and Nokia dominate. In East Asia, South Korea (LG, Samsung) and Japan both have companies with wireless patents. Finally, Huawei dominates China's activity through the sheer volume of its wireless research.

library(leaflet)
library(htmltools)
library(dplyr)
library(tidyr)

datad <-
  pv_out$data$patents %>%
    unnest(assignees) %>%
    select(assignee_id, assignee_organization, patent_number,
           assignee_longitude, assignee_latitude) %>%
    group_by_at(vars(-matches("pat"))) %>%
    mutate(num_pats = n()) %>%
    ungroup() %>%
    select(-patent_number) %>%
    distinct() %>%
    mutate(popup = paste0("<font color='Black'>",
                          htmlEscape(assignee_organization), "<br>Patents:",
                          num_pats, "</font>")) %>%
    mutate_at(vars(matches("_l")), as.numeric) %>%
    filter(!is.na(assignee_id))

datad <- set_colors(datad)

pd <- leaflet(datad) %>%
  addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
  addCircleMarkers(lng = ~assignee_longitude, lat = ~assignee_latitude,
                   popup = ~popup, radius = ~sqrt(num_pats), color = ~color) # radius scales with sqrt(num_pats)
pd
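
A short note on why the map code passes `sqrt(num_pats)` as the marker size: a circle's area grows with the square of its radius, so taking the square root keeps the marker area proportional to the patent count. The numbers below are made up for illustration.

```r
# Two hypothetical assignees, one with four times as many patents as the other
num_pats <- c(100, 400)

# Radius as used in the map: 10 and 20
radius <- sqrt(num_pats)

# Area is proportional to radius squared, so the area ratio equals the count ratio
area_ratio  <- radius[2]^2 / radius[1]^2
count_ratio <- num_pats[2] / num_pats[1]
```

Using `num_pats` directly as the radius would make the larger marker sixteen times the area of the smaller one, which overstates the difference visually.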

Conclusions

This has been a visual exploration of wireless patenting activity from 2000 onwards. From this exploration we’ve discovered that:

Regional powers.

The USA still leads in both the number of companies and the number of patents registered. As one might expect, much of this patenting activity is due to wireless phone companies such as Verizon, Sprint, and AT&T Wireless. US hardware and software manufacturers like Motorola, Qualcomm (a wireless modem manufacturer), Intel, and Internet routing giant Cisco also maintain large patent portfolios in wireless communications.

Canada’s telecommunications companies BlackBerry (RIM) and Nortel Networks were also once significant players with large patent portfolios. However, Nortel is defunct today, and BlackBerry is no longer competing in wireless.

Europe also maintains large patent portfolios owned by Nokia, Ericsson, and, to a lesser extent, Philips, Alcatel-Lucent, and Siemens.

In East Asia, Japanese companies are also active in patenting: Kyocera, Sharp, Fujitsu, Sony, and Canon all maintain significant patent portfolios. South Korean manufacturers LG and Samsung also have strong wireless portfolios. However, the rise of the Chinese company Huawei in wireless technology is the most striking example of growth.

American Big Tech firms also stand out in the data visualizations. Even though these companies are not focused on wireless communications, they have impacted the space. Amazon’s wireless patents are among the most cited of the 50 largest assignees that we looked at in this analysis. This pattern suggests that the Big Tech firms are building innovative intellectual property that other companies in the industry cite because of its relevance.