Skip to content

32. Creating a concepts network

Level: Advanced

This query requires a good understanding of SQL and the Dimensions data model

Description

This query generates two-concept pairs and counts how many publications are shared between these concepts (note: concepts in Dimensions are publication-level keywords normalised and weighted based on a relevancy score).

The query includes a subquery (user-provided-subquery) that can be customised so to focus on arbitrary subsets of the Dimensions database. Parameters values for max number of nodes and min weight of edges to be included in the result (@max_nodes, @min_edge_weight) can be fine-tuned as needed.

The gist of the query lies in the double CROSS JOIN UNNEST. This mechanism allows to traverse a potentially very large number of relationships in seconds and to expose all relevant combinations of co-authoring organisations within the same data structure.

For more details, see also:

Query

As an example, we generate a concepts cooccurence network for all publications from the last 30 days that have an Altmetric Attention Score greater than 10:

WITH subset AS (

          -- USER-PROVIDED-SUBQUERY
          -- add any query in this section, as long as it returns publication IDs
          -- EG here we get all papers from the last 30 days with Altmetric attention > 10

          SELECT
              id
          FROM
              `dimensions-ai.data_analytics.publications`
          WHERE
              EXTRACT(
                  DATE
                  FROM
                      date_inserted
              ) >= DATE_ADD(CURRENT_DATE(), INTERVAL -30 DAY)
              AND altmetrics.score > 10

        ),
        papercount AS (
            SELECT concept.concept, COUNT(p.id) AS papers,
            FROM `dimensions-ai.data_analytics.publications` p
            INNER JOIN subset ON p.id=subset.id
            CROSS JOIN UNNEST(p.concepts) concept
            WHERE
                year >= 1965
                AND concept.relevance >= 0.5 -- @min_link_relevance
            GROUP BY 1
        ),
        filtered AS (
            SELECT *
            FROM papercount
            WHERE papers >= 5 --@min_concept_frequency
            ORDER BY papers DESC
            LIMIT 500 -- @max_nodes
        ),
        results AS (
        SELECT concept1.concept AS concept_a, concept2.concept AS concept_b,
            COUNT(p.id) AS overlap,
        FROM `dimensions-ai.data_analytics.publications` p
        INNER JOIN subset ON p.id=subset.id
        CROSS JOIN UNNEST(p.concepts) concept1
        CROSS JOIN UNNEST(p.concepts) concept2
        INNER JOIN filtered pcount1 ON concept1.concept=pcount1.concept
        INNER JOIN filtered pcount2 ON concept2.concept=pcount2.concept
        WHERE year >= 1965
            AND concept1.relevance >= 0.5 --@min_link_relevance
            AND concept2.relevance >= 0.5 --@min_link_relevance
            AND concept1.concept <> concept2.concept
        GROUP BY 1,2
        )
        SELECT *
        FROM results
        WHERE overlap >= 3 --@min_edge_weight

Results

A sample of the results

concept_a concept_b overlap
concept_a concept_b overlap
odds ratio subgroup analysis 5
randomized clinical trials patients 18
population-based study Main Outcomes 3
systematic review Web of Science 24
intervention care 18
depression participants 20
exercise muscle 5
heart failure observational study 3
non-small cell lung cancer therapy 3
type 2 diabetes cohort study 7
quality of life primary outcome 8
Cox proportional hazards models women 5
plants cells 3
nurses outcomes 6
COVID-19 intervention 3
COVID-19 pandemic well-being 3
clinical trials randomized clinical trials 46
subgroup analysis meta-analysis 17
odds ratio logistic regression models 3
more entries here...

VOSViewer

A VOSViewer visualization using the results of this query can be explored online (note: the visualization uses data from the COVID-19 dataset)