Mobile communications let people work and shop online from anywhere, and wireless technology is also critical for national security. But which country is home to the leading companies that develop this technology? In this R notebook, I’ll use US patent data to show an example of exploratory data visualization. For simplicity, we won’t get into summary statistics and will instead stick to the high-level outlines of the story.
The U.S. Patent Office has made its patent data available to the public (source: the PatentsView API). This notebook focuses on a few basic exploratory visualizations that examine how patenting in the wireless telecommunications industry has evolved.
The code below downloads the data from the U.S. patent database. A complete listing of the patent classifications is available. For this brief overview, I’m looking at the “H04W” patent category, “Wireless communications networks,” in the Cooperative Patent Classification (CPC). The code queries the PatentsView database for patents granted since 2000 in the main groups of this category and downloads the results.
library(patentsview) # data source
library(tidyverse) # data manipulation
library(highcharter) # interactive visualizations
# setwd("D:/jgmackay.com/code/articles/patents/code") # adjust to your own project folder; the code below reads and writes files in its data/ subfolder
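# Normally we would build a query object with the package's query functions, but search_pv() also accepts the query as JSON text. I've already generated that elsewhere, so I just show the text query.
# Note: "_eq" matches cpc_subgroup_id exactly, so this returns patents tagged with these main-group codes (e.g. H04W4/00), not patents tagged only with their subgroups (e.g. H04W4/02).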
query_custom = '{"_and":[{"_gte":{"patent_date":"2000-1-1"}},{"_or":[{"_eq":{"cpc_subgroup_id":"H04W4/00"}},{"_eq":{"cpc_subgroup_id":"H04W8/00"}},{"_eq":{"cpc_subgroup_id":"H04W12/00"}},{"_eq":{"cpc_subgroup_id":"H04W16/00"}},{"_eq":{"cpc_subgroup_id":"H04W24/00"}},{"_eq":{"cpc_subgroup_id":"H04W28/00"}},{"_eq":{"cpc_subgroup_id":"H04W36/00"}},{"_eq":{"cpc_subgroup_id":"H04W40/00"}},{"_eq":{"cpc_subgroup_id":"H04W48/00"}},{"_eq":{"cpc_subgroup_id":"H04W60/00"}},{"_eq":{"cpc_subgroup_id":"H04W64/00"}},{"_eq":{"cpc_subgroup_id":"H04W68/00"}},{"_eq":{"cpc_subgroup_id":"H04W72/00"}},{"_eq":{"cpc_subgroup_id":"H04W74/00"}},{"_eq":{"cpc_subgroup_id":"H04W76/00"}},{"_eq":{"cpc_subgroup_id":"H04W80/00"}},{"_eq":{"cpc_subgroup_id":"H04W84/00"}},{"_eq":{"cpc_subgroup_id":"H04W88/00"}},{"_eq":{"cpc_subgroup_id":"H04W92/00"}},{"_eq":{"cpc_subgroup_id":"H04W99/00"}}]}]}'
fields <- c("patent_number", "assignee_organization",
"patent_num_cited_by_us_patents", "app_date", "patent_date",
"assignee_total_num_patents",
#"forprior_country",
"assignee_id", "assignee_longitude", "assignee_latitude"
#,"citedby_patent_number","cited_patent_number","patent_title"
)
# Uncomment the next two lines to (re-)fetch the data from the API; otherwise the saved copy is loaded below.
#pv_out <- search_pv(query = query_custom, fields = fields, all_pages = TRUE)
#save(pv_out, file = "data/pv_out.rda")
# Load the pre-fetched data
load("data/pv_out.rda")
# we have to unnest the data frames that are stored in the assignee list column:
dl <- unnest_pv_data(data = pv_out$data, pk = "patent_number")
save(dl, file = "data/dl.rda", compress = "xz")
#load("data/dl.rda")
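The long query_custom string above repeats one pattern twenty times. As a sketch, it could instead be generated from the list of H04W main-group numbers, which makes it easier to add or drop groups. This uses plain base R string building, not a patentsview helper, and build_h04w_query() is a hypothetical name.

```r
# Sketch: rebuild the PatentsView JSON query text from the H04W main-group numbers.
# build_h04w_query() is a hypothetical helper, not part of the patentsview package.
build_h04w_query <- function(groups, start_date = "2000-1-1") {
  ids <- sprintf("H04W%d/00", groups)  # e.g. "H04W4/00"
  or_clauses <- paste0('{"_eq":{"cpc_subgroup_id":"', ids, '"}}', collapse = ",")
  paste0('{"_and":[{"_gte":{"patent_date":"', start_date, '"}},',
         '{"_or":[', or_clauses, ']}]}')
}

h04w_groups <- c(4, 8, 12, 16, 24, 28, 36, 40, 48, 60, 64, 68,
                 72, 74, 76, 80, 84, 88, 92, 99)
query_built <- build_h04w_query(h04w_groups)
```

With these twenty groups the result should be the same text as query_custom, so it could be passed to search_pv(query = query_built, ...) in the same way.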
For the visualizations, companies that own patents are colored according to their country of origin. The one exception is the U.S. Big Tech companies (Google, Apple, Amazon, Meta, Microsoft), which get their own color because they tend to disrupt established markets. The colors help highlight the dynamics among corporations that act as national champions.
set_colors <- function(df) {
df$color <- "#18BC9C" # default
# Canada / RIM / BB
df$color[get_org_loc( "research in motion", df)] = "#eee600"
df$color[get_org_loc( "blackberry", df)] = "#eee600"
df$color[get_org_loc( "nortel", df)] = "#eee600"
# China
df$color[get_org_loc( "huawei", df)] = "#00ffff"
df$color[get_org_loc( "zte", df)] = "#00ffff"
# Europe
df$color[get_org_loc( "ericsson", df)] = "#0000cd"
df$color[get_org_loc( "nokia", df)] = "#0000cd"
df$color[get_org_loc( "alcatel", df)] = "#0000cd"
df$color[get_org_loc( "philips", df)] = "#0000cd"
df$color[get_org_loc( "siemens", df)] = "#0000cd"
df$color[get_org_loc( "vodafone", df)] = "#0000cd"
# USA
df$color[get_org_loc( "qualcomm", df)] = "#69359c"
df$color[get_org_loc( "AT ?& ?T", df)] = "#69359c" # matches both "AT&T" and "AT & T"
df$color[get_org_loc( "international business machines", df)] = "#69359c"
df$color[get_org_loc( "westell", df)] = "#69359c"
df$color[get_org_loc( "sprint", df)] = "#69359c"
df$color[get_org_loc( "cisco", df)] = "#69359c"
df$color[get_org_loc( "broadcom", df)] = "#69359c"
df$color[get_org_loc( "\\bintel\\b", df)] = "#69359c" # whole word, so names like "Intellectual ..." are not caught
df$color[get_org_loc( "hewlett", df)] = "#69359c"
df$color[get_org_loc( "motorola", df)] = "#69359c"
df$color[get_org_loc( "interdigital", df)] = "#69359c"
# American Big Tech
df$color[get_org_loc( "google", df)] = "#fc0fc0"
df$color[get_org_loc( "facebook", df)] = "#fc0fc0"
df$color[get_org_loc( "meta platforms", df)] = "#fc0fc0" # a bare "meta" would also match unrelated names containing "meta"
df$color[get_org_loc( "amazon", df)] = "#fc0fc0"
df$color[get_org_loc( "apple", df)] = "#fc0fc0"
df$color[get_org_loc( "microsoft", df)] = "#fc0fc0"
# South Korea
df$color[get_org_loc( "samsung", df)] = "#ffa500"
df$color[get_org_loc( "ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE", df)] = "#ffa500"
df$color[get_org_loc( "LG electronics", df)] = "#ffa500"
# Japan
df$color[get_org_loc( "sony", df)] = "#ff0000"
df$color[get_org_loc( "fujitsu", df)] = "#ff0000"
df$color[get_org_loc( "konica", df)] = "#ff0000"
df$color[get_org_loc( "NEC corp", df)] = "#ff0000"
df$color[get_org_loc( "sharp", df)] = "#ff0000"
df$color[get_org_loc( "sanyo", df)] = "#ff0000"
df$color[get_org_loc( "Sumitomo", df)] = "#ff0000"
df$color[get_org_loc( "HITACHI", df)] = "#ff0000"
df$color[get_org_loc( "KYOCERA", df)] = "#ff0000"
df$color[get_org_loc( "CANON", df)] = "#ff0000"
df$color[get_org_loc( "AGOOP", df)] = "#ff0000"
df$color[get_org_loc( "mitsubishi", df)] = "#ff0000"
df$color[get_org_loc( "panasonic", df)] = "#ff0000"
return(df)
}
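# Returns a logical vector: TRUE where assignee_organization matches the regex org_name (case-insensitive; NA names give FALSE)
get_org_loc <- function(org_name,df){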
loc <- grepl( org_name, df$assignee_organization, ignore.case = TRUE)
return(loc)
}
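The long run of assignments in set_colors() can also be expressed as a lookup table, which makes adding a company a one-row change. The sketch below is an alternative, not the notebook's code: org_colors and set_colors_tbl() are hypothetical names, and only a few companies are listed. As in set_colors(), later rows override earlier ones when several patterns match.

```r
# Hypothetical lookup table: regex pattern -> color (a subset of the companies above)
org_colors <- data.frame(
  pattern = c("blackberry", "huawei", "ericsson", "qualcomm", "google", "samsung", "sony"),
  color   = c("#eee600", "#00ffff", "#0000cd", "#69359c", "#fc0fc0", "#ffa500", "#ff0000"),
  stringsAsFactors = FALSE
)

# Hypothetical table-driven version of set_colors()
set_colors_tbl <- function(df, lookup = org_colors, default = "#18BC9C") {
  df$color <- default
  for (i in seq_len(nrow(lookup))) {
    # grepl() returns FALSE for NA names, so those keep the default color
    hit <- grepl(lookup$pattern[i], df$assignee_organization, ignore.case = TRUE)
    df$color[hit] <- lookup$color[i]
  }
  df
}

demo <- data.frame(
  assignee_organization = c("HUAWEI TECHNOLOGIES CO. LTD.", "GOOGLE LLC", "Acme Corp", NA),
  stringsAsFactors = FALSE
)
set_colors_tbl(demo)$color
```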
Now that we have the data, let’s identify the top assignees: the companies that ultimately own the rights to the patents. The code below filters the patent data to isolate the assignees holding the most patents in this category.
The summary table below contains four columns:
- assignee_organization: the name of the company (assignee);
- db_pats: the number of its patents in the categories we extracted from the patent office;
- ttl_pats: the total number of patents held by that assignee;
- frac_db_pats: the fraction of the company's patents that belong to the categories we are examining (db_pats / ttl_pats). In other words, this represents how specialized the company is in this narrow field of research.
For example, Qualcomm has the most patents in the “Wireless communications networks” category examined here, yet these patents represent only 5.6% of its total patenting activity. By contrast, AT&T Mobility has over 12% of its patents in this area. Only 1% of Google’s patents fall in this category, and only 0.2% of IBM’s. Such low percentages indicate that these corporations are broad R&D organizations with patents in many areas. Of course, the fraction is an imperfect measure: it penalizes large R&D companies with many patents, while a small firm whose single patent falls in this category scores 100%.
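The specialization fraction is just db_pats / ttl_pats rounded to three decimals. The sketch below recomputes it for four rows copied from the summary table, plus a hypothetical one-patent firm (not in the data) to show the 100% effect described above.

```r
# Four rows copied from the summary table, plus one hypothetical single-patent firm
spec <- data.frame(
  assignee = c("QUALCOMM", "AT&T Mobility II", "GOOGLE", "IBM", "Hypothetical one-patent firm"),
  db_pats  = c(1984, 350, 268, 366, 1),
  ttl_pats = c(35540, 2768, 27846, 160159, 1)
)
# Same formula as frac_db_pats in the pipeline below
spec$frac_db_pats <- round(spec$db_pats / spec$ttl_pats, 3)
spec
```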
library(tidyverse)
# We create a data frame with the top 75 assignees:
top_asgns <-
dl$assignees %>%
filter(!is.na(assignee_organization)) %>% # drop patents whose assignee is not an organization (e.g. an individual inventor)
mutate(ttl_pats = as.numeric(assignee_total_num_patents)) %>% # numeric column (ttl_pats) with the assignee's total number of patents
group_by(assignee_organization, ttl_pats) %>% # one group per assignee; grouping by ttl_pats too keeps that column in the summary
summarise(db_pats = n(), .groups = "drop") %>% # count the assignee's patents in the subset we downloaded
mutate(frac_db_pats = round(db_pats / ttl_pats, 3)) %>% # fraction of the assignee's patents in this category (specialization)
select(assignee_organization, db_pats, ttl_pats, frac_db_pats) %>%
arrange(desc(db_pats)) %>%
slice(1:75) # change this number to adjust the top N companies
library(formattable)
formattable( top_asgns %>% slice(1:20) )
assignee_organization | db_pats | ttl_pats | frac_db_pats |
---|---|---|---|
QUALCOMM Incorporated | 1984 | 35540 | 0.056 |
Telefonaktiebolaget LM Ericsson (publ) | 1295 | 23101 | 0.056 |
Samsung Display Co., Ltd. | 1227 | 155778 | 0.008 |
LG ELECTRONICS INC. | 942 | 44975 | 0.021 |
HUAWEI TECHNOLOGIES CO. LTD. | 936 | 22092 | 0.042 |
Intel Corporation | 564 | 49717 | 0.011 |
AT&T Intellectual Property I, L.P. | 514 | 15777 | 0.033 |
Apple Inc. | 482 | 33004 | 0.015 |
Cisco Technology, Inc. | 475 | 18824 | 0.025 |
SONY CORPORATION | 467 | 60545 | 0.008 |
Nokia Corporation | 450 | 8318 | 0.054 |
NEC CORPORATION | 417 | 34293 | 0.012 |
INTERNATIONAL BUSINESS MACHINES CORPORATION | 366 | 160159 | 0.002 |
AT & T Mobility II LLC | 350 | 2768 | 0.126 |
Motorola, Inc. | 288 | 20387 | 0.014 |
Broadcom Corporation | 283 | 11171 | 0.025 |
GOOGLE LLC | 268 | 27846 | 0.010 |
NTT DOCOMO, INC. | 264 | 4284 | 0.062 |
BlackBerry Limited | 259 | 5786 | 0.045 |
Verizon Patent and Licensing Inc. | 237 | 6017 | 0.039 |
# This code creates a data frame with patent counts by application year for each assignee.
data <-
top_asgns %>%
slice(1:15) %>% # keep the top 15 assignees so the chart stays readable
select(assignee_organization) %>% # drop the count columns before joining
inner_join(dl$assignees, by = "assignee_organization") %>%
inner_join(dl$applications, by = "patent_number") %>%
mutate(app_yr = as.numeric(substr(app_date, 1, 4))) %>% # take only the year from the application date
group_by(assignee_organization, app_yr) %>%
count()
So far, the analysis has some inconsistencies. For example, Research In Motion (RIM) and BlackBerry are names for the same company but are listed as separate assignees. The same happens with other related entities, such as AT&T Intellectual Property I and AT&T Mobility II in the table above. As a result, these visualizations make some companies appear less productive than they are. Despite these limitations, we’ll continue in order to get a sense of the data.
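One way to reduce this problem is to map known aliases to a single canonical name before counting. The sketch below is illustrative only: harmonize_assignee() and the alias patterns are hypothetical, and a real alias list would need careful curation.

```r
# Hypothetical alias map: regex pattern (names) -> canonical assignee name (values)
aliases <- c(
  "research in motion|blackberry" = "BlackBerry",
  "^at ?& ?t"                     = "AT&T"
)

# Replace every name matching a pattern with its canonical name; others pass through
harmonize_assignee <- function(org, alias_map = aliases) {
  out <- org
  for (pattern in names(alias_map)) {
    hit <- grepl(pattern, org, ignore.case = TRUE)
    out[hit] <- alias_map[[pattern]]
  }
  out
}

orgs <- c("Research In Motion Limited", "BlackBerry Limited",
          "AT&T Intellectual Property I, L.P.", "AT & T Mobility II LLC",
          "Nokia Corporation")
harmonize_assignee(orgs)
```

If applied to dl$assignees before building top_asgns, ttl_pats would also need to be summed across the merged entities, since each original assignee reports its own total.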
The plot below shows the number of patents each company (assignee) produced each year: the number of patents is on the vertical axis, and the application year is on the horizontal axis. We can see that Qualcomm (green line) is one of the most productive companies across time. The most striking change over time is Huawei’s patenting rate (red line), which rose steeply after 2012 until 2019. (In 2019, the USA formally banned Huawei’s telecommunications equipment.) Because the dataset contains only granted patents, counts for the most recent application years are likely understated: many of those applications had probably not been granted yet when the data were downloaded.
library(highcharter)
# Optional: change the date range
df <- data #%>% dplyr::filter(app_yr >= 2010) %>% filter( app_yr <= 2015)
# Optional: Create the cumulative sum of patents by organization. This highlights
# total organizational output rather than annual productivity
df <- df %>% group_by(assignee_organization) %>% arrange(app_yr, .by_group = TRUE) %>% mutate(cs = cumsum(n)) # sort by year so the running total moves forward in time
df %>% filter(app_yr>=2000) %>%
hchart(
type = "line",
hcaes(x = app_yr,
y = n, # n or use 'cs' to see cumulative
group = assignee_organization)) %>%
hc_plotOptions(series = list(marker = list(enabled = FALSE))) %>%
hc_xAxis(title = list(text = "Application year")) %>%
hc_yAxis(title = list(text = "Patents")) %>%
hc_title(text = "Top assignees patenting") %>%
hc_subtitle(text = "Granted patents by application year")
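The optional cumulative view relies on cumsum() running over each assignee's years in order. The toy sketch below (made-up counts for two hypothetical assignees "A" and "B") shows why the rows are sorted by year within each group before summing.

```r
suppressPackageStartupMessages(library(dplyr))

# Toy counts with the years deliberately out of order for assignee A
toy <- data.frame(
  assignee_organization = c("A", "A", "A", "B", "B"),
  app_yr = c(2012, 2010, 2011, 2010, 2011),
  n      = c(5, 1, 2, 3, 3)
)

toy_cs <- toy %>%
  group_by(assignee_organization) %>%
  arrange(app_yr, .by_group = TRUE) %>%  # sort by assignee, then by year
  mutate(cs = cumsum(n)) %>%             # running total within each assignee
  ungroup()
toy_cs
```

Without the sort, assignee A's running total would start from its 2012 count rather than its 2010 count.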
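As a simplification, we can treat each patent as a unit of knowledge a company gains, but patents vary widely in value. Patent citations, which count how often later patents cite a given patent, are a common proxy for that value. Because older patents have had more time to accumulate citations, raw counts are not comparable across years. To get the top-cited assignees, we therefore rank each patent's citation count against all patents granted in the same year (a percentile between 0 and 1) and then average those percentiles for each assignee.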
# Percentile rank scaled to [0, 1]; ties share their average rank and NAs stay NA
percent_rank2 <- function(x) {
(rank(x, ties.method = "average", na.last = "keep") - 1) / (sum(!is.na(x)) - 1)
}
# Create a data frame with normalized citation rates and stats
asng_p_dat <-
dl$patents %>%
mutate(patent_yr = substr(patent_date, 1, 4)) %>%
group_by(patent_yr) %>%
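mutate(perc_cite = percent_rank2(as.numeric(patent_num_cited_by_us_patents))) %>% # the API returns fields as strings; convert so counts rank numerically
inner_join(dl$assignees, by = "patent_number") %>%
group_by(assignee_organization) %>%
summarise(mean_perc = mean(perc_cite, na.rm = TRUE), sd_perc = sd(perc_cite, na.rm = TRUE)) %>%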
inner_join(top_asgns) %>%
arrange(desc(ttl_pats)) %>%
filter(!is.na(assignee_organization)) %>%
slice(1:50) %>%
as.data.frame()
asng_p_dat <- set_colors(asng_p_dat)
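To see what percent_rank2() does, here is a toy check on made-up citation counts. The function body is copied from above; the last line illustrates why the citation counts are converted with as.numeric() before ranking.

```r
# Same definition as above: percentile rank in [0, 1], ties averaged, NAs kept
percent_rank2 <- function(x) {
  (rank(x, ties.method = "average", na.last = "keep") - 1) / (sum(!is.na(x)) - 1)
}

cites <- c(0, 5, 5, 10)     # toy citation counts for four patents
percent_rank2(cites)        # 0.0 0.5 0.5 1.0
percent_rank2(c(3, NA, 1))  # 1 NA 0

# Ranking text sorts alphabetically, so "10" ranks below "9"
rank(c("9", "10"))          # 2 1
```

dplyr::percent_rank() computes a similar score but gives ties the minimum rank, so the two tied patents above would each get 1/3 instead of 0.5. Note that with either version a grant year containing a single patent would produce NaN (0/0).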
The bubble chart shows how highly cited each company’s patents are on average. The vertical position is the company's mean citation percentile (highly cited organizations sit higher in the chart), the horizontal axis (log scale) shows the number of patents the company owns in this category, and the bubble size reflects the fraction of its patents that fall in this category.
Perhaps the most striking finding from this plot is that the American Big Tech companies (Amazon, Microsoft, Google, Apple) have the highest mean citation percentiles despite having relatively few wireless patents.
# Adapted from http://jkunst.com/highcharter/showcase.html
# I'm not totally sold on this representation, but we'll go with it for now.
hchart(asng_p_dat, "scatter", hcaes(x = db_pats, y = mean_perc, size = frac_db_pats,
##hchart(asng_p_dat, "scatter", hcaes(x = db_pats, y = frac_db_pats, size = mean_perc,
group = assignee_organization, color = color)) %>%
#hc_size(height=1800,width=2200) %>%
hc_xAxis(title = list(text = "Patents (log scale)"), type = "logarithmic",
allowDecimals = FALSE, endOnTick = TRUE, min=40) %>%
hc_yAxis(title = list(text = "Mean percentile of citation")) %>%
##hc_yAxis(title = list(text = "Fraction of patents")) %>%
hc_subtitle(text = "Most cited assignees", align = "center") %>%
##hc_subtitle(text = "Most cited assignees on average", align = "center") %>%
hc_add_theme(hc_theme_538()) %>%
hc_legend(enabled = FALSE)
The final visualization maps the locations of the assignees. Circle area reflects the number of patents each assignee owns in this dataset (the radius is the square root of the patent count).
This map illustrates which countries are most engaged in wireless technologies. In the US, traditional phone carriers hold large numbers of patents, and Big Tech companies also patent in this area. Although Canada has RIM/BlackBerry and Nortel, these companies eventually left wireless research.
In Europe, Ericsson and Nokia dominate. In East Asia, both South Korea (LG, Samsung) and Japan have companies with substantial wireless patent portfolios. Finally, in China, Huawei dominates through the sheer volume of its wireless research.
library(leaflet)
library(htmltools)
library(dplyr)
library(tidyr)
datad <-
pv_out$data$patents %>%
unnest(cols = c(assignees)) %>%
select(assignee_id, assignee_organization, patent_number,
assignee_longitude, assignee_latitude) %>%
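group_by_at(vars(-matches("pat"))) %>% # group by every column except patent_number, i.e. one group per assignee location
mutate(num_pats = n()) %>% # number of patents per assignee location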
ungroup() %>%
select(-patent_number) %>%
distinct() %>%
mutate(popup = paste0("<font color='Black'>",
htmlEscape(assignee_organization), "<br>Patents:",
num_pats, "</font>")) %>%
mutate_at(vars(matches("_l")), as.numeric) %>% # longitude/latitude arrive as strings; convert to numbers
filter(!is.na(assignee_id))
datad <- set_colors(datad)
pd <- leaflet(datad) %>%
addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
addCircleMarkers(lng = ~assignee_longitude, lat = ~assignee_latitude,
popup = ~popup, radius = ~sqrt(num_pats), color = ~color) # radius = sqrt(num_pats), so circle area is proportional to num_pats
pd
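The square-root radius is what makes circle area, rather than radius, track the patent count. The toy arithmetic below (made-up counts) shows the effect; with a linear radius, a portfolio 16 times larger would look 256 times bigger on the map.

```r
# Toy patent counts for three assignees
num_pats <- c(25, 100, 400)
radius <- sqrt(num_pats)    # what the map passes as the marker radius
area   <- pi * radius^2     # on-screen circle area, up to a constant
radius / radius[1]          # 1 2 4: radius grows with the square root
area / area[1]              # 1 4 16: area grows in proportion to the count
```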
This has been a visual exploration of wireless patenting activity from 2000 onwards. From this exploration we’ve discovered that:
The USA still dominates in both the number of companies and the number of patents. As one might expect, much of this patenting activity comes from wireless phone companies such as Verizon, Sprint, and AT&T Wireless. US hardware and software makers such as Motorola, Qualcomm (a wireless modem maker), Intel, and Internet routing giant Cisco also maintain large patent portfolios in wireless communications.
Canada’s telecommunications companies BlackBerry (RIM) and Nortel Networks were also once significant players with large patent portfolios. However, Nortel is defunct today, and BlackBerry is no longer competing in wireless.
Europe also has large patent portfolios owned by Nokia, Ericsson and, to a lesser extent, Philips, Alcatel-Lucent, and Siemens.
In East Asia, Japanese companies are also active in patenting (Kyocera, Sharp, Fujitsu, Sony, and Canon all maintain significant patent portfolios.) South Korean manufacturers LG and Samsung also have strong wireless portfolios. However, the rise of the Chinese company Huawei in wireless technology is the most striking example of growth.
American Big Tech firms also stand out in the visualizations. Even though these companies are not focused on wireless communications, they have had an impact on the space. Amazon’s wireless patents are among the most cited of the 50 largest assignees in this analysis. This pattern suggests that Big Tech firms are building innovative intellectual property that other companies in the industry cite because of its relevance.