Retrieve patents linked to a set of grants
Prerequisites
In order to run this tutorial, please ensure that:
- You have a valid Dimensions on Google BigQuery account and have configured a Google Cloud project.
- You have a valid Dimensions API account.
- You have some basic familiarity with Python and Jupyter notebooks.
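If you are running the notebook locally rather than in Google Colab, the libraries used below need to be installed first. This is a minimal sketch and the exact package list is an assumption (in particular db-dtypes, which recent versions of the BigQuery client library use when converting query results to pandas dataframes):
# install the Python dependencies used in this tutorial (skip if they are already available, e.g. in Colab)
!pip install pandas numpy plotly google-cloud-bigquery db-dtypes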
This example uses the Dimensions database to determine the number of patents generated by a set of grants, providing another perspective on the impact of funding from different organizations. There are two types of linkages considered here:
- Patents that directly acknowledge grants. Since the enactment of the Bayh-Dole Act in 1980, all recipients of U.S. federal funding are legally obligated to disclose government support that led to any inventions they produce. This is useful when funders are most interested in the patents directly supported by the grants.
- Patents that cite publications supported by grants. This can be considered by funders who are interested in understanding patents that build less directly on research that they supported.
Note
Non-patent literature in the patent references section is not only provided by the inventors, but may also be added by patent examiners during the patent examination process.
This tutorial is based on a Jupyter notebook that is available directly via GitHub.
# general import
import pandas as pd
import numpy as np
import sys, time, json
import plotly.express as px
# authentication happens via your browser
from google.colab import auth
auth.authenticate_user()
print('Authenticated')
Authenticated
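The cell above assumes you are working in Google Colab. Outside Colab, one alternative (an assumption, not part of the original notebook) is to rely on application-default credentials, for example after running `gcloud auth application-default login` once in a terminal:
# outside Colab: pick up application-default credentials instead of the Colab auth flow
import google.auth
credentials, default_project = google.auth.default()
print('Authenticated')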
# GBQ import
from google.cloud import bigquery
BQ_PROJECT_ID = "ds-gov-funder-gbq" # remember to change this to your project ID
client = bigquery.Client(project=BQ_PROJECT_ID)
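Before running the main queries, a trivial query can confirm that the client and billing project are configured correctly; this sanity check is an addition to the original notebook:
# quick sanity check: run a minimal query and convert the result to a dataframe
test_df = client.query("SELECT 1 AS ok").to_dataframe()
print(test_df)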
An example
For this example, we will be using the Dimensions database to extract patents that are linked to grants funded by the NSF Directorate for Engineering that started in 2015.
The patents, publications, and grants datasets will be used in the example to retrieve patents through two types of linkages.
Retrieve patents that directly acknowledged grants
First, let's get all patents that directly acknowledged the NSF Directorate for Engineering grants that started in 2015.
Tips
- Unnest the `funding_details` field in the `dimensions-ai.data_analytics.patents` table to get the funder's GRID ID and Dimensions grant ID
- Join with the `dimensions-ai.data_analytics.grants` table and limit to the grants that started in 2015
- The GRID ID of the NSF Directorate for Engineering can be found in the Global Research Identifier Database
# build the search string
search_string = """
SELECT
DISTINCT pat.id AS patent_id,
f.grant_id,
pat.family_id,
CAST(pat.priority_year AS string) priority_year,
'supported' AS link_type
FROM
`dimensions-ai.data_analytics.patents` pat
CROSS JOIN
UNNEST(funding_details) f -- unnest the field to get the funder and grant number acknowledged by the patent
JOIN
`dimensions-ai.data_analytics.grants` g
ON
g.id = f.grant_id
WHERE
f.grid_id = 'grid.457810.f' -- specify funder's GRID ID
AND g.start_year = 2015 -- specify the grant start year
"""
# retrieve from BigQuery and make it a pandas dataframe
nsf_grant_patents = client.query(search_string).to_dataframe()
# get a quick preview of the patents directly linked to the grants
nsf_grant_patents.head()
Row | patent_id | grant_id | family_id | priority_year | link_type |
---|---|---|---|---|---|
0 | US-20200303900-A1 | grant.4179692 | 60479104 | 2016 | supported |
1 | US-20190224370-A1 | grant.4178213 | 61562333 | 2016 | supported |
2 | US-20160236141-A1 | grant.3852639 | 56620694 | 2014 | supported |
3 | US-20190193116-A1 | grant.3982138 | 60663733 | 2016 | supported |
4 | US-20200061618-A1 | grant.4318677 | 69584146 | 2018 | supported |
# get a quick count of how many patents were retrieved
print(nsf_grant_patents['patent_id'].nunique())
581
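Since the analysis further down deduplicates patent documents by patent family, it can also be useful to count distinct families at this stage; a small optional sketch (the same call works for the citation-linked set retrieved next):
# count distinct patent families among the directly linked patent documents
print(nsf_grant_patents['family_id'].nunique())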
Retrieve patents that cited the publications funded by the same set of grants
Then, we can get all patents that cited publications funded by the same set of grants (i.e., grants funded by the NSF Directorate for Engineering that started in 2015).
Tips
- Unnest the `resulting_publication_ids` field in the `dimensions-ai.data_analytics.grants` table to get the publication IDs funded by a set of grants, setting the funder's GRID ID and start year the same as in the above query
- Unnest the `publication_ids` field in the `dimensions-ai.data_analytics.patents` table, which contains the publication IDs cited by patents
- The publication IDs unnested from the above are used as an intermediate link between patents and grants
# build the search string
search_string = """
WITH
grant_pubs AS (
SELECT
DISTINCT pub_id,
g.id AS grant_id
FROM
`dimensions-ai.data_analytics.grants` g
CROSS JOIN
UNNEST(resulting_publication_ids) pub_id -- unnest the publication ids resulting from grants
WHERE
g.funder_org = 'grid.457810.f' -- specify funder grid id
AND g.start_year = 2015) -- specify grant start year
SELECT
DISTINCT pat.id AS patent_id,
gp.grant_id,
pat.family_id,
CAST(pat.priority_year AS string) priority_year,
'pub_reference' AS link_type
FROM
`dimensions-ai.data_analytics.patents` pat
CROSS JOIN
UNNEST(publication_ids) pub_ref -- unnest the publication ids cited by patents
JOIN
grant_pubs gp
ON
gp.pub_id = pub_ref -- join on publication id
"""
# retrieve from BigQuery and make it a pandas dataframe
nsf_pub_ref_patents = client.query(search_string).to_dataframe()
# get a quick preview of the patents that cited publications which were funded by NSF grants
nsf_pub_ref_patents.head()
Row | patent_id | grant_id | family_id | priority_year | link_type |
---|---|---|---|---|---|
0 | WO-2019202933-A1 | grant.4179511 | 68239624 | 2018 | pub_reference |
1 | US-10528687-B2 | grant.3861071 | 60158395 | 2016 | pub_reference |
2 | DE-102017002874-A1 | grant.3981846 | 61731654 | 2017 | pub_reference |
3 | US-10725209-B2 | grant.4312170 | 62908810 | 2017 | pub_reference |
4 | US-10196708-B2 | grant.4312419 | 62782292 | 2017 | pub_reference |
# get a quick count of how many patents were retrieved
print(nsf_pub_ref_patents['patent_id'].nunique())
224
Merge results
Now we will merge the two data frames to obtain a complete set of patents that are directly and indirectly linked to the set of grants.
nsf_patents = pd.concat([nsf_grant_patents, nsf_pub_ref_patents], ignore_index=True)
# get a quick count of how many patents in total
print(nsf_patents['patent_id'].nunique())
799
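Because a patent can both acknowledge a grant directly and cite a grant-funded publication, the two subsets may overlap; the distinct count above already accounts for this. As an optional sketch (not part of the original notebook), you can check how many patents are linked through both routes:
# patents that appear in both the directly supported and the publication-citing sets
both_routes = set(nsf_grant_patents['patent_id']) & set(nsf_pub_ref_patents['patent_id'])
print(len(both_routes))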
Quick overview of patents
Lastly, we can examine the trends in the patents by priority year.
Tips
- `family_id` was used to deduplicate the patent documents; one patent family is a collection of patent documents that are considered to cover a single invention (see the definition of a DOCDB simple patent family)
- `priority_year` was used to aggregate the patents, since it indicates the time when the invention was established. All patent documents in one patent family share the same priority date.
nsf_patents.groupby(['link_type', 'priority_year'], as_index=False).agg({'family_id': 'nunique'})\
    .rename(columns={'family_id': 'n_pat_families'})\
    .pivot(values='n_pat_families', index='priority_year', columns='link_type')
priority_year | pub_reference | supported
---|---|---
2004 | 3.0 | NaN |
2005 | 1.0 | NaN |
2009 | 1.0 | NaN |
2010 | NaN | 1.0 |
2011 | 1.0 | NaN |
2012 | 1.0 | 1.0 |
2013 | 4.0 | 5.0 |
2014 | 8.0 | 15.0 |
2015 | 10.0 | 42.0 |
2016 | 41.0 | 90.0 |
2017 | 57.0 | 123.0 |
2018 | 57.0 | 63.0 |
2019 | 15.0 | 42.0 |
2020 | 1.0 | 1.0 |
In addition, we can create a quick visualization of the trends.
plot_data = nsf_patents.groupby(['link_type','priority_year'], as_index = False)\
.agg({'family_id':'nunique'})\
.rename(columns={'family_id':'n_patent_families'})
# create line plot by using plotly express
fig = px.line(plot_data, x="priority_year", y="n_patent_families", color = "link_type", title='Trends in patents supported by NSF Directorate for Engineering grants starting 2015')
fig.show()
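If you want to keep the chart outside the notebook session, the Plotly figure can be exported to a standalone HTML file (the file name here is just an illustration):
# save the interactive chart as a standalone HTML file
fig.write_html("nsf_patent_trends.html")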