
Retrieve patents linked to a set of grants

Prerequisites

In order to run this tutorial, please ensure that:

This example uses the Dimensions database to determine the number of patents generated by a set of grants, providing another perspective on the impact of funding from different organizations. There are two types of linkages considered here:

  • Patents that directly acknowledge grants. Since the enactment of the Bayh-Dole Act in 1980, all recipients of U.S. federal funding are legally obligated to disclose government support that led to any inventions they produce. This is useful when funders are most interested in the patents directly supported by the grants.
  • Patents that cite publications supported by grants. This is useful for funders who want to understand patents that build less directly on the research they supported.

Note

Non-patent literature in a patent's references section is not only provided by the inventors, but may also be added by patent examiners during the patent examination process.
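The two linkage paths can be sketched with toy pandas tables that mimic the relevant fields. All IDs below are hypothetical, and the list columns only stand in for the repeated fields in the real BigQuery tables:

```python
import pandas as pd

# toy tables sketching the two linkage paths (all IDs hypothetical)
grants = pd.DataFrame({"grant_id": ["grant.1"],
                       "resulting_publication_ids": [["pub.10"]]})

patents = pd.DataFrame({
    "patent_id": ["US-A", "US-B"],
    "funded_grant_ids": [["grant.1"], []],   # direct acknowledgements
    "publication_ids": [[], ["pub.10"]],     # cited publications
})

# path 1: patent -> acknowledged grant
direct = patents.explode("funded_grant_ids").merge(
    grants, left_on="funded_grant_ids", right_on="grant_id")

# path 2: patent -> cited publication -> grant that produced it
pubs = grants.explode("resulting_publication_ids")
indirect = patents.explode("publication_ids").merge(
    pubs, left_on="publication_ids", right_on="resulting_publication_ids")

print(direct["patent_id"].tolist())    # patents linked directly
print(indirect["patent_id"].tolist())  # patents linked via publications
```

Here US-A is picked up through the direct acknowledgement path and US-B through the publication-citation path; the SQL queries below express the same two joins against the real tables.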

This tutorial is based on a Jupyter notebook that is available directly via GitHub.

# general import
import pandas as pd
import numpy as np
import sys, time, json
import plotly.express as px

# authentication happens via your browser
from google.colab import auth
auth.authenticate_user()
print('Authenticated')
Authenticated
# GBQ import
from google.cloud import bigquery

BQ_PROJECT_ID = "ds-gov-funder-gbq"  # remember to change this to your project ID
client = bigquery.Client(project=BQ_PROJECT_ID)

An example

For this example, we will be using the Dimensions database to extract patents that are linked to the grants funded by NSF Directorate for Engineering and started in 2015.

The patents, publications, and grants datasets will be used in the example to retrieve patents through two types of linkages.

Retrieve patents that directly acknowledged grants

First, let's get all patents that directly acknowledge NSF Directorate for Engineering grants that started in 2015.

Tips

  • Unnest the funding_details in the dimensions-ai.data_analytics.patents table to get the funder's GRID ID and the Dimensions grant ID
  • Join with the dimensions-ai.data_analytics.grants table and limit to grants that started in 2015
  • The GRID ID of the NSF Directorate for Engineering can be found in the Global Research Identifier Database
# build the search string
search_string = """
SELECT
  DISTINCT pat.id AS patent_id,
  f.grant_id,
  pat.family_id,
  CAST(pat.priority_year AS string) priority_year,
  'supported' AS link_type
FROM
  `dimensions-ai.data_analytics.patents` pat
CROSS JOIN
  UNNEST(funding_details) f     -- unnest the field to get the funder and grant number acknowledged by the patent
JOIN
  `dimensions-ai.data_analytics.grants` g
ON
  g.id = f.grant_id
WHERE
  f.grid_id = 'grid.457810.f'    -- specify funder's GRID ID
  AND g.start_year = 2015        -- specify the grant start year
"""

# retrieve from BigQuery and make it a pandas dataframe
nsf_grant_patents = client.query(search_string).to_dataframe()
# get a quick preview of the patents directly linked to the grants
nsf_grant_patents.head()
Row patent_id grant_id family_id priority_year link_type
0 US-20200303900-A1 grant.4179692 60479104 2016 supported
1 US-20190224370-A1 grant.4178213 61562333 2016 supported
2 US-20160236141-A1 grant.3852639 56620694 2014 supported
3 US-20190193116-A1 grant.3982138 60663733 2016 supported
4 US-20200061618-A1 grant.4318677 69584146 2018 supported
# get a quick count of how many patents were retrieved
print(nsf_grant_patents['patent_id'].nunique())
581

Retrieve patents that cited the publications funded by the same set of grants

Then, we can get all patents that cited publications funded by the same set of grants (i.e. grants funded by the NSF Directorate for Engineering that started in 2015).

Tips

  • Unnest the resulting_publication_ids in the dimensions-ai.data_analytics.grants table to get the IDs of publications funded by the set of grants, using the same funder GRID ID and start year as in the query above
  • Unnest the publication_ids in the dimensions-ai.data_analytics.patents table, which contains the publication IDs cited by each patent
  • The unnested publication IDs serve as the intermediate link between patents and grants
# build the search string
search_string = """
WITH
  grant_pubs AS (
  SELECT
    DISTINCT pub_id,
    g.id AS grant_id
  FROM
    `dimensions-ai.data_analytics.grants` g
  CROSS JOIN
    UNNEST(resulting_publication_ids) pub_id   -- unnest the publication ids resulting from grants
  WHERE
    g.funder_org = 'grid.457810.f'     -- specify funder grid id
    AND g.start_year = 2015)           -- specify grant start year
SELECT
  DISTINCT pat.id AS patent_id,
  gp.grant_id,
  pat.family_id,
  CAST(pat.priority_year AS string) priority_year,
  'pub_reference' AS link_type
FROM
  `dimensions-ai.data_analytics.patents` pat
CROSS JOIN
  UNNEST(publication_ids) pub_ref     -- unnest the publication ids cited by patents
JOIN
  grant_pubs gp
ON
  gp.pub_id = pub_ref                 -- join on publication id
"""

# retrieve from BigQuery and make it a pandas dataframe
nsf_pub_ref_patents = client.query(search_string).to_dataframe()

# get a quick preview of the patents that cited publications which were funded by NSF grants
nsf_pub_ref_patents.head()
Row patent_id grant_id family_id priority_year link_type
0 WO-2019202933-A1 grant.4179511 68239624 2018 pub_reference
1 US-10528687-B2 grant.3861071 60158395 2016 pub_reference
2 DE-102017002874-A1 grant.3981846 61731654 2017 pub_reference
3 US-10725209-B2 grant.4312170 62908810 2017 pub_reference
4 US-10196708-B2 grant.4312419 62782292 2017 pub_reference
# get a quick count of how many patents were retrieved
print(nsf_pub_ref_patents['patent_id'].nunique())
224

Merge results

Now we will merge the two data frames to obtain the complete set of patents that are directly and indirectly linked to the set of grants.

nsf_patents = pd.concat([nsf_grant_patents, nsf_pub_ref_patents], ignore_index=True)

# get a quick count of how many patents in total
print(nsf_patents['patent_id'].nunique())
799
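Note that the combined unique count (799) is smaller than 581 + 224: a patent can both acknowledge a grant directly and cite one of its resulting publications. A minimal sketch with toy data (hypothetical patent IDs) showing how to flag such overlaps after concatenating:

```python
import pandas as pd

# toy frames mimicking the two result sets (patent IDs are hypothetical)
supported = pd.DataFrame({
    "patent_id": ["US-1-A1", "US-2-A1", "US-3-A1"],
    "link_type": "supported",
})
pub_reference = pd.DataFrame({
    "patent_id": ["US-3-A1", "US-4-A1"],
    "link_type": "pub_reference",
})

combined = pd.concat([supported, pub_reference], ignore_index=True)

# patents reached through both linkage types
both = set(supported["patent_id"]) & set(pub_reference["patent_id"])
print(sorted(both))                      # → ['US-3-A1']
print(combined["patent_id"].nunique())   # → 4 (not 3 + 2)
```

The same intersection on the real result frames would reveal which NSF-linked patents are found via both routes.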

Quick overview of patents

Lastly, we can examine the trends in the patents by priority year.

Tips

  • family_id is used to deduplicate the patent documents: one patent family is a collection of patent documents considered to cover a single invention (see the definition of a DOCDB simple patent family)
  • priority_year is used to aggregate the patents, since it indicates when the invention was established; all patent documents in one patent family share the same priority date
nsf_patents.groupby(['link_type', 'priority_year'], as_index=False).agg({'family_id': 'nunique'})\
           .rename(columns={'family_id': 'n_pat_families'})\
           .pivot(values='n_pat_families', index='priority_year', columns='link_type')
link_type pub_reference supported
priority_year
2004 3.0 NaN
2005 1.0 NaN
2009 1.0 NaN
2010 NaN 1.0
2011 1.0 NaN
2012 1.0 1.0
2013 4.0 5.0
2014 8.0 15.0
2015 10.0 42.0
2016 41.0 90.0
2017 57.0 123.0
2018 57.0 63.0
2019 15.0 42.0
2020 1.0 1.0
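The family-level deduplication above can be illustrated on toy data (hypothetical IDs): two patent documents sharing a family_id count as a single family, and year/link-type combinations with no families show up as NaN after the pivot:

```python
import pandas as pd

# toy data: the two documents in family 100 cover one invention (IDs hypothetical)
pats = pd.DataFrame({
    "link_type":     ["supported", "supported", "pub_reference"],
    "priority_year": ["2016", "2016", "2017"],
    "family_id":     [100, 100, 200],
})

counts = (
    pats.groupby(["link_type", "priority_year"], as_index=False)
        .agg({"family_id": "nunique"})
        .rename(columns={"family_id": "n_pat_families"})
        .pivot(values="n_pat_families", index="priority_year", columns="link_type")
)
print(counts)   # family 100 counts once under supported/2016
```

This is the same groupby–pivot chain used on the real nsf_patents frame above, just on three rows.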

We can also create a quick visualization of the trends.

plot_data = nsf_patents.groupby(['link_type', 'priority_year'], as_index=False)\
           .agg({'family_id': 'nunique'})\
           .rename(columns={'family_id': 'n_patent_families'})

# create a line plot using plotly express
fig = px.line(plot_data, x="priority_year", y="n_patent_families", color="link_type",
              title='Trends in patents supported by NSF Directorate for Engineering grants starting in 2015')
fig.show()