Skip to content

31. Country-level publication activity over time

Level: Advanced

This query requires a good understanding of SQL and the Dimensions data model

Description

This query returns the number of publications in a selected field over time, broken down by the countries with which the authors are affiliated. Both the annual and cumulative totals are available for each country for each year. The example below examines publications in archaeology starting in 1970, which means the row for "Australia" in 1981 will include two numbers:

  1. pubs, which is the number of publications published in that year that include at least one author affiliated with an institution in Australia. (If a publication has authors from multiple countries, that publication is counted once for each country.)
  2. running_total, which indicates the total publications attributed to Australia from 1970 through the current year, in this case 1981.

This example returns yearly numbers only for the top eight countries by publication count—the top_countries subquery holds the logic for this step. The rest of the country totals are combined as "Others" using the everybody_else subquery.

Query

WITH results AS (
  SELECT DISTINCT
    id, year, country AS country
  FROM `dimensions-ai.data_analytics.publications` p
  CROSS JOIN UNNEST(p.research_org_country_names) country
  WHERE
    ---- HERE is where to define your publications
    ---- of interest
    p.year >= 1970
    AND '2101' IN UNNEST(p.category_for.second_level.codes) -- Archaeology
),
counts AS (
  SELECT year, country, COUNT(DISTINCT id) AS pubs
  FROM results
  GROUP BY 1,2
),
top_countries AS (
  SELECT country, SUM(pubs) AS total_pubs
  FROM counts
  GROUP BY 1
  ORDER BY 2 DESC
  LIMIT 8
),
everybody_else AS (
  SELECT 'Others' AS country, year, SUM(pubs) AS pubs
  FROM counts
  WHERE counts.country NOT IN (SELECT country FROM top_countries)
  GROUP BY 2
),
everybody_else_all_years AS (
  -- We add a CROSS JOIN with the years from the results to make
  -- sure there's an entry for "Others" even in years where that
  -- number is zero.
  SELECT base.country, year.year AS year, COALESCE(everybody_else.pubs,0) AS pubs
  FROM (SELECT 'Others' AS country) base
  CROSS JOIN (SELECT DISTINCT year FROM results) year
  LEFT JOIN everybody_else
    ON year.year=everybody_else.year
),
counts_consolidated AS (
  -- This query makes sure we have an entry for each country
  -- in each year
  SELECT top_countries.country, year.year AS year, COALESCE(counts.pubs,0) AS pubs
  FROM top_countries
  CROSS JOIN (SELECT DISTINCT year FROM results) year
  LEFT JOIN counts
    ON top_countries.country=counts.country
    AND year.year=counts.year
  --Then we add everybody else:
  UNION ALL
  SELECT * FROM everybody_else_all_years
)
SELECT year, country, pubs,
  SUM (pubs) OVER (PARTITION BY country ORDER BY year) AS running_total
FROM counts_consolidated
ORDER BY year, country

Results

Row year country pubs running_total
0 Australia 1970 15 15
1 Canada 1970 14 14
2 France 1970 6 6
more entries here...
101 Australia 1981 65 360
102 Canada 1981 26 247