Skip to content

Identify a funder's grants and publications

Use case

How many grants have resulted in publications? This notebook serves two main purposes:

i. Enables identification of a funder's grants and resulting publications.

ii. Prepares a summary table of grants and publications (per year and in total), and a bar graph showing number of grants by year with and without publications.

Prerequisites

In order to run this tutorial, please ensure that:

This tutorial is based on a Jupyter notebook that is available directly via GitHub.

Method

This notebook retrieves grants and publication data from Dimensions the world's largest linked research information dataset. For more about Dimensions visit - https://www.dimensions.ai/

To complete this analysis the following steps are taken:

  1. Connect to the Dimensions database.
  2. Find funder's unique GRID ID. The Global Research Identifier Database (GRID) is a database of research-related organisations each with a unique and persistent identifier. For more about GRID visit - https://grid.ac/.
  3. Generate list of grants and resulting publications.
  4. (OPTIONAL) Export list of grants and resulting publications to excel.
  5. Summary table showing number of grants and publications in total and by year.
  6. Graph showing number of grants that have yielded one or more publications by year.

Using this notebook

Before running some cells you will have to fill in information about the funder and year range. When required instructions are set out in info boxes.

This notebook assumes installation of python packages pandas and plotly.

1. Connect to Dimensions

You will need to be authenticated to run these queries—see the "Verifying your connection tutorial for options.

Run this cell if you're connecting via a Google Colab notebook:

from google.colab import auth
auth.authenticate_user()
print('Authenticated')
Authenticated
import pandas as pd
from google.cloud import bigquery

GBQ_PROJECT_ID = 'ds-gov-funder-gbq' # <----- Input your project ID here
client = bigquery.Client(project=GBQ_PROJECT_ID)

2. Find funder's unique GRID ID

To identify funder's grants we need the funder's unique GRID ID. The following cell searches on the funders names and returns the associated GRID ID(s).

Instructions

  1. Before running the code in the following cell input funder's name
  2. Funder's name might return multiple GRID IDs, note all relevant GRID IDs as they are needed for the next step.

If no GRID IDs are returned, try searching different name variants for the funder. Avoid acronyms.

funder_name = 'Wellcome Trust' # Input funder name here

funder_name = funder_name.lower()
query = f"""
    select
        id as grid_id
        , name
    from `dimensions-ai.data_analytics.grid`
    where lower(name) like '%{funder_name}%'
"""

funder = pd.read_gbq(query, project_id=GBQ_PROJECT_ID)
funder.head(10)
Row grid_id name
0 grid.52788.30 Wellcome Trust
1 grid.478079.5 NIHR Wellcome Trust Southampton Clinical Resea...
2 grid.484745.e Wellcome Trust/DBT India Alliance
3 grid.482844.6 Wellcome Trust Centre for the History of Medicine
4 grid.419393.5 Malawi-Liverpool-Wellcome Trust Clinical Resea...

3. Generate list of grants and publications identifiers

The following cell generates a table of funders grants and any resulting publications for grants started in a defined time period. Only the first five rows of the table will be shown in this notebook, however the whole table can be exported to excel in the following step.

Table columns:

  • dimensions_grant_id - Unique identifier for grant in Dimensions
  • grant_start_year - Year the grant started
  • grant_number - Grant number used by funder
  • dimensions_publication_id - Unique identifier for publications in Dimensions
  • doi - Digital Object Identifier

Instructions

Before running the code in the following cell, where indicated:

  1. input funders GRID ID between ('') e.g. ('grid.52788.30'). If funder has two or more GRID IDs then list within brackets separated by commas e.g. ('grid.453157.1', 'grid.453157.1').
  2. Update year range, note that we can only look at full years and cannot be filtered by month or day.
query = f"""
with pubs as
  (select id, doi from `dimensions-ai.data_analytics.publications`)
, grants as
  (
  select
    id as dimensions_grant_id
    , start_year as grant_start_year
    , grant_number
    , pub_id as dimensions_publication_id
  from `dimensions-ai.data_analytics.grants` g left join unnest(g.resulting_publication_ids) as pub_id
   where funder_org in ('grid.52788.30')  -- ADD GRID ID HERE
      and start_year between 2011 and 2015 -- UPDATE YEAR RANGE
    )
Select
  grants.*
  , pubs.doi
from grants
  left join pubs
    on pubs.id = grants.dimensions_publication_id
"""
grants = pd.read_gbq(query)
grants.head()
Row dimensions_grant_id grant_start_year grant_number dimensions_publication_id doi
0 grant.3627143 2012 082265/Z/07/A pub.1105902779 10.1016/j.jaci.2018.07.011
1 grant.3635772 2012 098274/Z/12/Z pub.1124346329 10.1016/j.bpj.2020.01.019
2 grant.3636003 2012 097899/Z/11/Z pub.1062809971 10.1016/j.jaci.2018.07.011
3 grant.3642390 2012 098649/Z/12/Z None None
4 grant.3638982 2012 099618/Z/12/Z pub.1023307051 10.1007/s00234-016-1648-3

4. (OPTIONAL) Export list of grants and resulting publications to excel

The following cell creates new excel file of grants and resulting publications as output from previous cell. The new file will appear in the same location as this notebook is saved.

Instructions

Before running code the code in the following cell replace FILE_NAME with new file name.

# Creates new excel file.
writer = pd.ExcelWriter('FILE_NAME.xlsx',  engine='openpyxl')

grants.to_excel(writer, sheet_name='grants_pubs')

writer.save()

5. Summary table of number of grants and publications in total and by year

The following cell creates a table with columns, for all years and per year:

  • grant_start_year - first year of the grant
  • grants - number of grants awarded by funder
  • publications - number of publications resulting from awards by funder
  • grants_without_publications - number of grants that have not resulted in a publication
  • perc_without_publications - percentage of grants that have not resulted in a publication
  • avg_publications_per_grants - average (mean) number of publications per grant

Instructions

Before running code the code in the following cell, where indicated:

  1. input funders GRID ID between ('') e.g. ('grid.52788.30'). If funder has two or more GRID IDs then list within brackets separated by commas e.g. ('grid.453157.1', 'grid.453157.1').
  2. Update year range
query = f"""
with grant_pub_map as
  (
  select
    id
    , start_year
   , pub_id  -- is this the best way of finding publications? or search for grant or
  from `dimensions-ai.data_analytics.grants` g left join unnest(g.resulting_publication_ids) as pub_id
    where funder_org = 'grid.52788.30'  -- ADD GRID IDs HERE
      and start_year between 2011 and 2015 -- UPDATE YEAR RANGE
  )
select
  'All years' as grant_start_year
  , count(distinct id) as grants
  , count(pub_id) as publications
  , countif(pub_id is null) as grants_without_publications
  , round(((countif(pub_id is null)/count(distinct id))*100), 2) as perc_without_publications
  , round((count(pub_id)/(count(distinct id) )),2) as avg_publications_per_grants
from grant_pub_map
 union all
select
  cast(start_year as string) as grant_start_year
  , count(distinct id) as grants
  , count(pub_id) as publications
  , countif(pub_id is null) as grants_without_publications
  , round(((countif(pub_id is null)/count(distinct id))*100), 2) as perc_without_publications
  , round((count(pub_id)/(count(distinct id) )),2) as avg_publications_per_grants
from grant_pub_map
  group by start_year
order by 1 desc ;
"""

grants1 = pd.read_gbq(query)
grants1.head(5)
Row grant_start_year grants publications grants_without_publications perc_without_publications avg_publications_per_grants
0 All years 6423 45840 3731 58.09 7.14
1 2015 1417 6901 833 58.79 4.87
2 2014 1327 8746 791 59.61 6.59
3 2013 1200 8217 690 57.50 6.85
4 2012 1159 8515 689 59.45 7.35

6. Bar graph showing number of grants that have yielded one or more publications by year

Using the table above the following cell generates a bar graph showing the number of grants with and without publications per year.

# Remove first row that counts all grants over all years
df = grants1.iloc[1:]

# Reverse rows so earlist year shown first on bar graph
df = df[::-1].reset_index()

# Calculate number of grants with publications
df.insert(5, 'grant_pubs', df["grants"] - df["grants_without_publications"])


import plotly.graph_objs as go


x = df.grant_start_year

trace1 = {
    'x' : x,
    'y' : df.grant_pubs,
    'name' : 'Grants with publications',
    'type' : 'bar'
};
trace2 = {
    'x' : x,
    'y' : df.grants_without_publications,
    'name' : 'Grants without publications',
    'type' : 'bar'
};
data = [trace1, trace2];
layout = {
    'xaxis' : {'title' : 'Year'},
    'yaxis' : {'title' : 'Number of grants'},
    'barmode' : 'relative',
    'title': 'Grants with and without publications'
};
fig = go.Figure(data = data, layout = layout)
fig.show()

grant_pubs_barchart