Retrieve patents linked to a set of grants
Prerequisites
In order to run this tutorial, please ensure that:
- You have a valid Dimensions on Google BigQuery account and have configured a Google Cloud project.
- You have a valid Dimensions API account.
- You have some basic familiarity with Python and Jupyter notebooks.
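If you are running the notebook locally rather than in Google Colab, the libraries used below need to be installed first. This is a minimal sketch and the exact package list is an assumption (in particular db-dtypes, which recent versions of the BigQuery client library use when converting query results to pandas dataframes):
# install the Python dependencies used in this tutorial (skip if they are already available, e.g. in Colab)
!pip install pandas numpy plotly google-cloud-bigquery db-dtypes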
This example uses the Dimensions database to determine the number of patents generated by a set of grants, providing another perspective on the impact of funding from different organizations. There are two types of linkages considered here:
- Patents that directly acknowledge grants. Since the enactment of the Bayh-Dole Act in 1980, all recipients of U.S. federal funding are legally obligated to disclose government support that led to any inventions they produce. This is useful when funders are most interested in the patents directly supported by the grants.
- Patents that cite publications supported by grants. This can be considered by funders who are interested in understanding patents that build less directly on research that they supported.
Note
Non-patent literature in the patent references section is not only provided by the inventors, but may also be added by patent examiners during the patent examination process.
This tutorial is based on a Jupyter notebook that is available directly via GitHub.
# general import
import pandas as pd
import numpy as np
import sys, time, json
import plotly.express as px
# authentication happens via your browser
from google.colab import auth
auth.authenticate_user()
print('Authenticated')
Authenticated
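The cell above assumes you are working in Google Colab. Outside Colab, one alternative (an assumption, not part of the original notebook) is to rely on application-default credentials, for example after running `gcloud auth application-default login` once in a terminal:
# outside Colab: pick up application-default credentials instead of the Colab auth flow
import google.auth
credentials, default_project = google.auth.default()
print('Authenticated')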
# GBQ import
from google.cloud import bigquery
BQ_PROJECT_ID = "ds-gov-funder-gbq" # remember to change this to your project ID
client = bigquery.Client(project=BQ_PROJECT_ID)
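Before running the main queries, a trivial query can confirm that the client and billing project are configured correctly; this sanity check is an addition to the original notebook:
# quick sanity check: run a minimal query and convert the result to a dataframe
test_df = client.query("SELECT 1 AS ok").to_dataframe()
print(test_df)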
An example
For this example, we will be using the Dimensions database to extract patents that are linked to grants funded by the NSF Directorate for Engineering that started in 2015.
The patents, publications, and grants datasets will be used in the example to retrieve patents through two types of linkages.
Retrieve patents that directly acknowledged grants
First, let's get all patents that directly acknowledged the NSF Directorate for Engineering grants that started in 2015.
Tips
- Unnest the `funding_details` field in the `dimensions-ai.data_analytics.patents` table to get the funder's GRID ID and Dimensions grant ID
- Join with the `dimensions-ai.data_analytics.grants` table and limit to the grants that started in 2015
- The GRID ID of the NSF Directorate for Engineering can be found in the Global Research Identifier Database
# build the search string
search_string = """
SELECT
DISTINCT pat.id AS patent_id,
f.grant_id,
pat.family_id,
CAST(pat.priority_year AS string) priority_year,
'supported' AS link_type
FROM
`dimensions-ai.data_analytics.patents` pat
CROSS JOIN
UNNEST(funding_details) f -- unnest the field to get the funder and grant number acknowledged by the patent
JOIN
`dimensions-ai.data_analytics.grants` g
ON
g.id = f.grant_id
WHERE
f.grid_id = 'grid.457810.f' -- specify funder's GRID ID
AND g.start_year = 2015 -- specify the grant start year
"""
# retrieve from BigQuery and make it a pandas dataframe
nsf_grant_patents = client.query(search_string).to_dataframe()
# get a quick preview of the patents directly linked to the grants
nsf_grant_patents.head()
Row | patent_id | grant_id | family_id | priority_year | link_type |
---|---|---|---|---|---|
0 | US-20200303900-A1 | grant.4179692 | 60479104 | 2016 | supported |
1 | US-20190224370-A1 | grant.4178213 | 61562333 | 2016 | supported |
2 | US-20160236141-A1 | grant.3852639 | 56620694 | 2014 | supported |
3 | US-20190193116-A1 | grant.3982138 | 60663733 | 2016 | supported |
4 | US-20200061618-A1 | grant.4318677 | 69584146 | 2018 | supported |
# get a quick count of how many patents were retrieved
print(nsf_grant_patents['patent_id'].nunique())
581
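Since the analysis further down deduplicates patent documents by patent family, it can also be useful to count distinct families at this stage; a small optional sketch (the same call works for the citation-linked set retrieved next):
# count distinct patent families among the directly linked patent documents
print(nsf_grant_patents['family_id'].nunique())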
Retrieve patents that cited the publications funded by the same set of grants
Then, we can get all patents that cited publications funded by the same set of grants (i.e., grants funded by the NSF Directorate for Engineering that started in 2015).
Tips
- Unnest the `resulting_publication_ids` field in the `dimensions-ai.data_analytics.grants` table to get the publication IDs funded by a set of grants, setting the funder's GRID ID and start year the same as in the above query
- Unnest the `publication_ids` field in the `dimensions-ai.data_analytics.patents` table, which contains the publication IDs cited by patents
- The publication IDs unnested from the above are used as an intermediate link between patents and grants
# build the search string
search_string = """
WITH
grant_pubs AS (
SELECT
DISTINCT pub_id,
g.id AS grant_id
FROM
`dimensions-ai.data_analytics.grants` g
CROSS JOIN
UNNEST(resulting_publication_ids) pub_id -- unnest the publication ids resulting from grants
WHERE
g.funder_org = 'grid.457810.f' -- specify funder grid id
AND g.start_year = 2015) -- specify grant start year
SELECT
DISTINCT pat.id AS patent_id,
gp.grant_id,
pat.family_id,
CAST(pat.priority_year AS string) priority_year,
'pub_reference' AS link_type
FROM
`dimensions-ai.data_analytics.patents` pat
CROSS JOIN
UNNEST(publication_ids) pub_ref -- unnest the publication ids cited by patents
JOIN
grant_pubs gp
ON
gp.pub_id = pub_ref -- join on publication id
"""
# retrieve from BigQuery and make it a pandas dataframe
nsf_pub_ref_patents = client.query(search_string).to_dataframe()
# get a quick preview of the patents that cited publications which were funded by NSF grants
nsf_pub_ref_patents.head()
Row | patent_id | grant_id | family_id | priority_year | link_type |
---|---|---|---|---|---|
0 | WO-2019202933-A1 | grant.4179511 | 68239624 | 2018 | pub_reference |
1 | US-10528687-B2 | grant.3861071 | 60158395 | 2016 | pub_reference |
2 | DE-102017002874-A1 | grant.3981846 | 61731654 | 2017 | pub_reference |
3 | US-10725209-B2 | grant.4312170 | 62908810 | 2017 | pub_reference |
4 | US-10196708-B2 | grant.4312419 | 62782292 | 2017 | pub_reference |
# get a quick count of how many patents were retrieved
print(nsf_pub_ref_patents['patent_id'].nunique())
224
Merge results
Now we will merge the two data frames to obtain a complete set of patents that are directly and indirectly linked to the set of grants.
nsf_patents = pd.concat([nsf_grant_patents, nsf_pub_ref_patents], ignore_index=True)
# get a quick count of how many patents in total
print(nsf_patents['patent_id'].nunique())
799
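Because a patent can both acknowledge a grant directly and cite a grant-funded publication, the two subsets may overlap; the distinct count above already accounts for this. As an optional sketch (not part of the original notebook), you can check how many patents are linked through both routes:
# patents that appear in both the directly supported and the publication-citing sets
both_routes = set(nsf_grant_patents['patent_id']) & set(nsf_pub_ref_patents['patent_id'])
print(len(both_routes))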
Quick overview of patents
Lastly, we can examine the trends in the patents by priority year.
Tips
- `family_id` was used to deduplicate the patent documents; one patent family is a collection of patent documents that are considered to cover a single invention (see the definition of a DOCDB simple patent family)
- `priority_year` was used to aggregate the patents, since it indicates the time when the invention was established. All patent documents in one patent family share the same priority date.
nsf_patents.groupby(['link_type', 'priority_year'], as_index=False).agg({'family_id': 'nunique'})\
    .rename(columns={'family_id': 'n_pat_families'})\
    .pivot(values='n_pat_families', index='priority_year', columns='link_type')
priority_year | pub_reference | supported
---|---|---
2004 | 3.0 | NaN |
2005 | 1.0 | NaN |
2009 | 1.0 | NaN |
2010 | NaN | 1.0 |
2011 | 1.0 | NaN |
2012 | 1.0 | 1.0 |
2013 | 4.0 | 5.0 |
2014 | 8.0 | 15.0 |
2015 | 10.0 | 42.0 |
2016 | 41.0 | 90.0 |
2017 | 57.0 | 123.0 |
2018 | 57.0 | 63.0 |
2019 | 15.0 | 42.0 |
2020 | 1.0 | 1.0 |
In addition, we can create a quick visualization of the trends.
plot_data = nsf_patents.groupby(['link_type','priority_year'], as_index = False)\
.agg({'family_id':'nunique'})\
.rename(columns={'family_id':'n_patent_families'})
# create line plot by using plotly express
fig = px.line(plot_data, x="priority_year", y="n_patent_families", color = "link_type", title='Trends in patents supported by NSF Directorate for Engineering grants starting 2015')
fig.show()
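If you want to keep the chart outside the notebook session, the Plotly figure can be exported to a standalone HTML file (the file name here is just an illustration):
# save the interactive chart as a standalone HTML file
fig.write_html("nsf_patent_trends.html")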