33. Calculating disruption indices (CD index)
Level: Advanced
This query requires a good understanding of SQL and the Dimensions data model
Description
Evaluating the disruptive nature of academic ideas is a new area of research evaluation that moves beyond standard citation-based metrics by taking into account the broader citation context of publications or patents. The "CD index" and a number of related indicators have been proposed in order to characterise mathematically the disruptiveness of scientific publications or patents.
This research area has generated a lot of attention in recent years, yet there is no general consensus on the significance and reliability of disruption indices. More experimentation and evaluation would be desirable, however is hampered by the fact that these indicators are expensive and time-consuming to calculate, especially if done at scale on large citation networks.
The query below takes advantage of Dimensions on BigQuery scalable architecture and reduces the computational time taken to produce such indices by an order of magnitude. This approach makes it possible to calculate e.g. CD5 index for all journal articles with references in Dimensions in less than 5 hours.
For more details, see also:
- Dimensions: Calculating Disruption Indices at Scale, Arxiv, 2023.
- The paper accompanying source code in Github
Note
Please make sure you are familiar with BigQuery billing methods and quota before running large queries like the ones below.
Query
CREATE OR REPLACE TABLE `{your-gbq-project}.{you-gbq-dataset}.publications_cd_index_all`
CLUSTER BY id
AS
(
WITH publications AS
(
SELECT id, year, citations, reference_ids
FROM `dimensions-ai.data_analytics.publications`
WHERE year IS NOT NULL
)
SELECT focal_id AS id,
(SUM(score)/COUNT(DISTINCT citation_id))+2 AS cd_5,
COUNTIF(score = -1)*((SUM(score)/COUNT(DISTINCT citation_id))+2) AS mcd_5
FROM
(
(
SELECT DISTINCT publications.id AS focal_id,
citation.id AS citation_id,
-1 AS score
FROM publications
LEFT JOIN UNNEST(publications.citations) AS citation
WHERE citation.year - publications.year BETWEEN 1 AND 5
)
UNION ALL
(
SELECT DISTINCT publications.id AS focal_id,
reference_citation.id as citation_id,
-2 as score
FROM publications
LEFT JOIN UNNEST(publications.reference_ids) AS reference_id
INNER JOIN publications AS references
ON references.id = reference_id
LEFT JOIN UNNEST(references.citations) AS reference_citation
WHERE reference_citation.year - publications.year BETWEEN 1 AND 5
)
)
GROUP BY 1
)
NOTE: it is easy to create variations of the query above using a more restrictive set of input documents.
For example, if we want to generate the CD index only for journal publications:
CREATE OR REPLACE TABLE `{your-gbq-project}.{you-gbq-dataset}.publications_cd_index_journals`
CLUSTER BY id
AS
(
WITH publications AS
(
SELECT id, year, citations, reference_ids
FROM `dimensions-ai.data_analytics.publications`
WHERE year IS NOT NULL AND type ="article" AND ARRAY_LENGTH(reference_ids)>=10 AND journal.id IS NOT NULL
)
SELECT focal_id AS id,
(SUM(score)/COUNT(DISTINCT citation_id))+2 AS cd_5,
COUNTIF(score = -1)*((SUM(score)/COUNT(DISTINCT citation_id))+2) AS mcd_5
FROM
(
(
SELECT DISTINCT publications.id AS focal_id,
citation.id AS citation_id,
-1 AS score
FROM publications
LEFT JOIN UNNEST(publications.citations) AS citation
WHERE citation.year - publications.year BETWEEN 1 AND 5
)
UNION ALL
(
SELECT DISTINCT publications.id AS focal_id,
reference_citation.id as citation_id,
-2 as score
FROM publications
LEFT JOIN UNNEST(publications.reference_ids) AS reference_id
INNER JOIN publications AS references
ON references.id = reference_id
LEFT JOIN UNNEST(references.citations) AS reference_citation
WHERE reference_citation.year - publications.year BETWEEN 1 AND 5
)
)
GROUP BY 1
)
Results
The query results have the following format:
[{
"id": "pub.1065228952",
"cd_5": "1.0",
"mcd_5": "18.0"
}, {
"id": "pub.1065228398",
"cd_5": "1.0",
"mcd_5": "27.0"
}, {
"id": "pub.1065228573",
"cd_5": "1.0",
"mcd_5": "29.0"
}, {
"id": "pub.1065125068",
"cd_5": "1.0",
"mcd_5": "26.0"
}, {
"id": "pub.1113010424",
"cd_5": "0.125",
"mcd_5": "1.125"
}, {
"id": "pub.1083797867",
"cd_5": "1.0",
"mcd_5": "194.0"
}, {
"id": "pub.1078318645",
"cd_5": "1.0",
"mcd_5": "108.0"
}, {
"id": "pub.1078221791",
"cd_5": "1.0",
"mcd_5": "28.0"
}, {
"id": "pub.1078052782",
"cd_5": "1.0",
"mcd_5": "19.0"
}, {
"id": "pub.1071156881",
"cd_5": "1.0",
"mcd_5": "21.0"
}, {
"id": "pub.1071156878",
"cd_5": "1.0",
"mcd_5": "56.0"
}, {
"id": "pub.1071156873",
"cd_5": "1.0",
"mcd_5": "56.0"
}, {
"id": "pub.1071156882",
"cd_5": "1.0",
"mcd_5": "27.0"
}, {
"id": "pub.1070917761",
"cd_5": "1.0",
"mcd_5": "17.0"
}, {
"id": "pub.1071163762",
"cd_5": "1.0",
"mcd_5": "144.0"
}, {
"id": "pub.1071053457",
"cd_5": "1.0",
"mcd_5": "44.0"
},
.......