Source and Concept Vocabularies – Output Generation — scv

Using the tabular output generated by scv_process, this function will build a graph to visualize the results. Each function configuration will output a bespoke ggplot. Theming can be adjusted by the user after the graph has been output using + theme(). Most graphs can also be made interactive using make_interactive_squba()

Usage

scv_output(
  process_output,
  code_type,
  filter_concept = NULL,
  filter_mapped = NULL,
  num_codes = 10,
  num_mappings = 25,
  vocab_tbl = NULL,
  large_n = FALSE,
  large_n_sites = NULL
)

Arguments

process_output

tabular input || required

The tabular output produced by scv_process

code_type

string || required

A string identifying the type of concept that was been provided in the concept_set. This should match whatever was provided when executing the scv_process function.

Acceptable values are cdm (the standard, mapped code that is included in the CDM) or source (the "raw" concept from the source system)

filter_concept

(integer or string) or vector || defaults to NULL

An integer/string (depending on the data model) or vector of concepts (as provided in the concept_set in scv_process) that should appear in the output. This parameter is required for the following check types:

Single Site, Anomaly Detection, Cross-Sectional
Multi-Site, Anomaly Detection, Cross-Sectional
Multi-Site, Exploratory, Longitudinal
Single Site, Anomaly Detection, Longitudinal
Multi-Site, Anomaly Detection, Longitudinal

filter_mapped

integer or string || defaults to NULL

An integer/string (depending on the data model) representing a mapping associated with filter_concept, as it appears in the output. This parameter is required for the following check types:

Multi-Site, Anomaly Detection, Longitudinal

num_codes

integer || defaults to 10

An integer indicating the top N of concepts (of code_type) that should be displayed in the plot. This parameter is required for the following check types:

Single Site, Exploratory, Cross-Sectional
Multi-Site, Exploratory, Cross-Sectional
Single Site, Anomaly Detection, Cross-Sectional

num_mappings

integer || defaults to 25

An integer indicating the top N of mappings associated with each concept of code_type that should be displayed on the plot. This parameter is required for the following check types:

Single Site, Exploratory, Cross-Sectional
Single Site, Anomaly Detection, Cross-Sectional
Multi-Site, Anomaly Detection, Cross-Sectional
Single Site, Exploratory, Longitudinal
Multi-Site, Exploratory, Longitudinal
Multi-Site, Anomaly Detection, Longitudinal

vocab_tbl

tabular input || optional

A vocabulary table containing concept names for the provided codes (ex: the OMOP concept table)

large_n

boolean || defaults to FALSE

For Multi-Site analyses, a boolean indicating whether the large N visualization, intended for a high volume of sites, should be used. This visualization will produce high level summaries across all sites, with an option to add specific site comparators via the large_n_sites parameter.

large_n_sites

vector || defaults to NULL

When large_n = TRUE, a vector of site names that can add site-level information to the plot for comparison across the high level summary information.

Value

This function will produce a graph to visualize the results from scv_process based on the parameters provided. The default output is typically a static ggplot or gt object, but interactive elements can be activated by passing the plot through make_interactive_squba. For a more detailed description of output specific to each check type, see the PEDSpace metadata repository

Examples


#' Source setup file
source(system.file('setup.R', package = 'sourceconceptvocabularies'))

#' Create in-memory RSQLite database using data in extdata directory
conn <- mk_testdb_omop()

#' Establish connection to database and generate internal configurations
initialize_dq_session(session_name = 'scv_process_test',
                      working_directory = my_directory,
                      db_conn = conn,
                      is_json = FALSE,
                      file_subdirectory = my_file_folder,
                      cdm_schema = NA)
#> Connected to: :memory:@NA

#' Build mock study cohort
cohort <- cdm_tbl('person') %>% dplyr::distinct(person_id) %>%
  dplyr::mutate(start_date = as.Date(-5000),
                #RSQLite does not store date objects,
                #hence the numerics
                end_date = as.Date(15000),
                site = ifelse(person_id %in% c(1:6), 'synth1', 'synth2'))

#' Prepare input tables
scv_domain_tbl <- dplyr::tibble(domain = 'condition_occurrence',
                                concept_field = 'condition_concept_id',
                                source_concept_field =
                                  'condition_source_concept_id',
                                date_field = 'condition_start_date',
                                vocabulary_field = NA)

scv_concept_set <- read_codeset('dx_hypertension')

#' Execute `scv_process` function
#' This example will use the single site, exploratory, cross sectional
#' configuration
scv_process_example <- scv_process(cohort = cohort,
                                   multi_or_single_site = 'single',
                                   anomaly_or_exploratory = 'exploratory',
                                   time = FALSE,
                                   omop_or_pcornet = 'omop',
                                   code_type = 'cdm',
                                   code_domain = 'condition_occurrence',
                                   domain_tbl = scv_domain_tbl,
                                   concept_set = scv_concept_set) %>%
  suppressMessages()
#> ┌ Output Function Details ──────────────────────────────────────┐
#> │ You can optionally use this dataframe in the accompanying     │
#> │ `scv_output` function. Here are the parameters you will need: │
#> │                                                               │
#> │ Always Required: process_output, code_type                    │
#> │ Required for Check: num_codes, num_mappings                   │
#> │ Optional: vocab_tbl                                           │
#> │                                                               │
#> │ See ?scv_output for more details.                             │
#> └───────────────────────────────────────────────────────────────┘

scv_process_example
#> # A tibble: 1 × 10
#>   site     domain            concept_id source_concept_id    ct denom_concept_ct
#>   <chr>    <chr>                  <int>             <int> <int>            <int>
#> 1 combined condition_occurr…     320128            320128     5                5
#> # ℹ 4 more variables: denom_source_ct <int>, concept_prop <dbl>,
#> #   source_prop <dbl>, output_function <chr>

#' Execute `scv_output` function
scv_output_example <- scv_output(process_output = scv_process_example,
                                 code_type = 'cdm',
                                 vocab_tbl = NULL) %>%
  suppressMessages()

scv_output_example[[1]]


#' Easily convert the graph into an interactive ggiraph or plotly object with
#' `make_interactive_squba()`

make_interactive_squba(scv_output_example[[1]])