32. Creating a concepts network

Level: Advanced

This query requires a good understanding of SQL and the Dimensions data model

Description

This query generates two-concept pairs and counts how many publications are shared between these concepts (note: concepts in Dimensions are publication-level keywords normalised and weighted based on a relevancy score).

The query includes a subquery (user-provided-subquery) that can be customised so to focus on arbitrary subsets of the Dimensions database. Parameters values for max number of nodes and min weight of edges to be included in the result (@max_nodes, @min_edge_weight) can be fine-tuned as needed.

The gist of the query lies in the double CROSS JOIN UNNEST. This mechanism allows to traverse a potentially very large number of relationships in seconds and to expose all relevant combinations of co-authoring organisations within the same data structure.

For more details, see also:

Generating large-scale network analyses of scientific landscapes in seconds using Dimensions on Google BigQuery, International Conference on Science, Technology and Innovation Indicators (STI 2022) Granada September 2022
Python tool to generate network visualizations: website and source code in Github

Query

As an example, we generate a concepts cooccurence network for all publications from the last 30 days that have an Altmetric Attention Score greater than 10:

WITH subset AS (

          -- USER-PROVIDED-SUBQUERY
          -- add any query in this section, as long as it returns publication IDs
          -- EG here we get all papers from the last 30 days with Altmetric attention > 10

          SELECT
              id
          FROM
              `dimensions-ai.data_analytics.publications`
          WHERE
              EXTRACT(
                  DATE
                  FROM
                      date_inserted
              ) >= DATE_ADD(CURRENT_DATE(), INTERVAL -30 DAY)
              AND altmetrics.score > 10

        ),
        papercount AS (
            SELECT concept.concept, COUNT(p.id) AS papers,
            FROM `dimensions-ai.data_analytics.publications` p
            INNER JOIN subset ON p.id=subset.id
            CROSS JOIN UNNEST(p.concepts) concept
            WHERE
                year >= 1965
                AND concept.relevance >= 0.5 -- @min_link_relevance
            GROUP BY 1
        ),
        filtered AS (
            SELECT *
            FROM papercount
            WHERE papers >= 5 --@min_concept_frequency
            ORDER BY papers DESC
            LIMIT 500 -- @max_nodes
        ),
        results AS (
        SELECT concept1.concept AS concept_a, concept2.concept AS concept_b,
            COUNT(p.id) AS overlap,
        FROM `dimensions-ai.data_analytics.publications` p
        INNER JOIN subset ON p.id=subset.id
        CROSS JOIN UNNEST(p.concepts) concept1
        CROSS JOIN UNNEST(p.concepts) concept2
        INNER JOIN filtered pcount1 ON concept1.concept=pcount1.concept
        INNER JOIN filtered pcount2 ON concept2.concept=pcount2.concept
        WHERE year >= 1965
            AND concept1.relevance >= 0.5 --@min_link_relevance
            AND concept2.relevance >= 0.5 --@min_link_relevance
            AND concept1.concept <> concept2.concept
        GROUP BY 1,2
        )
        SELECT *
        FROM results
        WHERE overlap >= 3 --@min_edge_weight

Results

A sample of the results

concept_a	concept_b	overlap
concept_a	concept_b	overlap
odds ratio	subgroup analysis	5
randomized clinical trials	patients	18
population-based study	Main Outcomes	3
systematic review	Web of Science	24
intervention	care	18
depression	participants	20
exercise	muscle	5
heart failure	observational study	3
non-small cell lung cancer	therapy	3
type 2 diabetes	cohort study	7
quality of life	primary outcome	8
Cox proportional hazards models	women	5
plants	cells	3
nurses	outcomes	6
COVID-19	intervention	3
COVID-19 pandemic	well-being	3
clinical trials	randomized clinical trials	46
subgroup analysis	meta-analysis	17
odds ratio	logistic regression models	3
more entries here...

VOSViewer

A VOSViewer visualization using the results of this query can be explored online (note: the visualization uses data from the COVID-19 dataset)