31. Country-level publication activity over time
Level: Advanced
This query requires a good understanding of SQL and the Dimensions data model
Description
This query returns the number of publications in a selected field over time, broken down by the countries with which the authors are affiliated. Both the annual and cumulative totals are available for each country for each year. The example below examines publications in archaeology starting in 1970, which means the row for "Australia" in 1981 will include two numbers:
pubs
, which is the number of publications published in that year that include at least one author affiliated with an institution in Australia. (If a publication has authors from multiple countries, that publication is counted once for each country.)running_total
, which indicates the total publications attributed to Australia from 1970 through the current year, in this case 1981.
This example returns yearly numbers only for the top eight countries by publication count—the top_countries
subquery holds the logic for this step. The rest of the country totals are combined as "Others" using the everybody_else
subquery.
Query
WITH results AS (
SELECT DISTINCT
id, year, country AS country
FROM `dimensions-ai.data_analytics.publications` p
CROSS JOIN UNNEST(p.research_org_country_names) country
WHERE
---- HERE is where to define your publications
---- of interest
p.year >= 1970
AND '2101' IN UNNEST(p.category_for.second_level.codes) -- Archaeology
),
counts AS (
SELECT year, country, COUNT(DISTINCT id) AS pubs
FROM results
GROUP BY 1,2
),
top_countries AS (
SELECT country, SUM(pubs) AS total_pubs
FROM counts
GROUP BY 1
ORDER BY 2 DESC
LIMIT 8
),
everybody_else AS (
SELECT 'Others' AS country, year, SUM(pubs) AS pubs
FROM counts
WHERE counts.country NOT IN (SELECT country FROM top_countries)
GROUP BY 2
),
everybody_else_all_years AS (
-- We add a CROSS JOIN with the years from the results to make
-- sure there's an entry for "Others" even in years where that
-- number is zero.
SELECT base.country, year.year AS year, COALESCE(everybody_else.pubs,0) AS pubs
FROM (SELECT 'Others' AS country) base
CROSS JOIN (SELECT DISTINCT year FROM results) year
LEFT JOIN everybody_else
ON year.year=everybody_else.year
),
counts_consolidated AS (
-- This query makes sure we have an entry for each country
-- in each year
SELECT top_countries.country, year.year AS year, COALESCE(counts.pubs,0) AS pubs
FROM top_countries
CROSS JOIN (SELECT DISTINCT year FROM results) year
LEFT JOIN counts
ON top_countries.country=counts.country
AND year.year=counts.year
--Then we add everybody else:
UNION ALL
SELECT * FROM everybody_else_all_years
)
SELECT year, country, pubs,
SUM (pubs) OVER (PARTITION BY country ORDER BY year) AS running_total
FROM counts_consolidated
ORDER BY year, country
Results
Row | year | country | pubs | running_total |
---|---|---|---|---|
0 | Australia | 1970 | 15 | 15 |
1 | Canada | 1970 | 14 | 14 |
2 | France | 1970 | 6 | 6 |
more entries here... | ||||
101 | Australia | 1981 | 65 | 360 |
102 | Canada | 1981 | 26 | 247 |