Skip to content

33. Calculating disruption indices (CD index)

Level: Advanced

This query requires a good understanding of SQL and the Dimensions data model

Description

Evaluating the disruptive nature of academic ideas is a new area of research evaluation that moves beyond standard citation-based metrics by taking into account the broader citation context of publications or patents. The "CD index" and a number of related indicators have been proposed in order to characterise mathematically the disruptiveness of scientific publications or patents.

This research area has generated a lot of attention in recent years, yet there is no general consensus on the significance and reliability of disruption indices. More experimentation and evaluation would be desirable, however is hampered by the fact that these indicators are expensive and time-consuming to calculate, especially if done at scale on large citation networks.

The query below takes advantage of Dimensions on BigQuery scalable architecture and reduces the computational time taken to produce such indices by an order of magnitude. This approach makes it possible to calculate e.g. CD5 index for all journal articles with references in Dimensions in less than 5 hours.

For more details, see also:

Note

Please make sure you are familiar with BigQuery billing methods and quota before running large queries like the ones below.

Query

CREATE OR REPLACE TABLE `{your-gbq-project}.{you-gbq-dataset}.publications_cd_index_all`
CLUSTER BY id
AS
(
  WITH publications AS
  (
    SELECT id, year, citations, reference_ids
    FROM `dimensions-ai.data_analytics.publications`
    WHERE year IS NOT NULL 
  )

  SELECT focal_id AS id,
  (SUM(score)/COUNT(DISTINCT citation_id))+2 AS cd_5,
  COUNTIF(score = -1)*((SUM(score)/COUNT(DISTINCT citation_id))+2) AS mcd_5
  FROM
  (
    (
      SELECT DISTINCT publications.id AS focal_id,
      citation.id AS citation_id,
      -1 AS score
      FROM publications
      LEFT JOIN UNNEST(publications.citations) AS citation
      WHERE citation.year - publications.year BETWEEN 1 AND 5
    )
    UNION ALL
    (
      SELECT DISTINCT publications.id AS focal_id,
      reference_citation.id as citation_id,
      -2 as score
      FROM publications
      LEFT JOIN UNNEST(publications.reference_ids) AS reference_id
      INNER JOIN publications AS references
      ON references.id = reference_id
      LEFT JOIN UNNEST(references.citations) AS reference_citation
      WHERE reference_citation.year - publications.year BETWEEN 1 AND 5
    )
  )
  GROUP BY 1
)

NOTE: it is easy to create variations of the query above using a more restrictive set of input documents.

For example, if we want to generate the CD index only for journal publications:

CREATE OR REPLACE TABLE `{your-gbq-project}.{you-gbq-dataset}.publications_cd_index_journals` 
CLUSTER BY id 
AS
(
  WITH publications AS
  (
    SELECT id, year, citations, reference_ids
    FROM `dimensions-ai.data_analytics.publications`
    WHERE year IS NOT NULL AND type ="article" AND ARRAY_LENGTH(reference_ids)>=10 AND journal.id IS NOT NULL
  )

  SELECT focal_id AS id,
  (SUM(score)/COUNT(DISTINCT citation_id))+2 AS cd_5,
  COUNTIF(score = -1)*((SUM(score)/COUNT(DISTINCT citation_id))+2) AS mcd_5
  FROM
  (
    (
      SELECT DISTINCT publications.id AS focal_id, 
      citation.id AS citation_id,
      -1 AS score
      FROM publications
      LEFT JOIN UNNEST(publications.citations) AS citation
      WHERE citation.year - publications.year BETWEEN 1 AND 5
    )
    UNION ALL
    (
      SELECT DISTINCT publications.id AS focal_id, 
      reference_citation.id as citation_id,
      -2 as score
      FROM publications
      LEFT JOIN UNNEST(publications.reference_ids) AS reference_id
      INNER JOIN publications AS references 
      ON references.id = reference_id
      LEFT JOIN UNNEST(references.citations) AS reference_citation
      WHERE reference_citation.year - publications.year BETWEEN 1 AND 5
    )
  )
  GROUP BY 1
)

Results

The query results have the following format:

[{
  "id": "pub.1065228952",
  "cd_5": "1.0",
  "mcd_5": "18.0"
}, {
  "id": "pub.1065228398",
  "cd_5": "1.0",
  "mcd_5": "27.0"
}, {
  "id": "pub.1065228573",
  "cd_5": "1.0",
  "mcd_5": "29.0"
}, {
  "id": "pub.1065125068",
  "cd_5": "1.0",
  "mcd_5": "26.0"
}, {
  "id": "pub.1113010424",
  "cd_5": "0.125",
  "mcd_5": "1.125"
}, {
  "id": "pub.1083797867",
  "cd_5": "1.0",
  "mcd_5": "194.0"
}, {
  "id": "pub.1078318645",
  "cd_5": "1.0",
  "mcd_5": "108.0"
}, {
  "id": "pub.1078221791",
  "cd_5": "1.0",
  "mcd_5": "28.0"
}, {
  "id": "pub.1078052782",
  "cd_5": "1.0",
  "mcd_5": "19.0"
}, {
  "id": "pub.1071156881",
  "cd_5": "1.0",
  "mcd_5": "21.0"
}, {
  "id": "pub.1071156878",
  "cd_5": "1.0",
  "mcd_5": "56.0"
}, {
  "id": "pub.1071156873",
  "cd_5": "1.0",
  "mcd_5": "56.0"
}, {
  "id": "pub.1071156882",
  "cd_5": "1.0",
  "mcd_5": "27.0"
}, {
  "id": "pub.1070917761",
  "cd_5": "1.0",
  "mcd_5": "17.0"
}, {
  "id": "pub.1071163762",
  "cd_5": "1.0",
  "mcd_5": "144.0"
}, {
  "id": "pub.1071053457",
  "cd_5": "1.0",
  "mcd_5": "44.0"
},
.......