Concept Set Distribution – Output Generation — csd

Using the tabular output generated by csd_process, this function builds a graph to visualize the results. Each configuration outputs a bespoke ggplot. Theming can be adjusted after the graph is produced by adding + theme() to it. Most graphs can also be made interactive using make_interactive_squba().

Usage

csd_output(
  process_output,
  concept_set = NULL,
  vocab_tbl = NULL,
  num_variables = 10,
  num_mappings = 10,
  filter_variable = NULL,
  filter_concept = NULL,
  text_wrapping_char = 80,
  output_value = "prop_concept",
  large_n = FALSE,
  large_n_sites = NULL
)

Arguments

process_output

tabular input || required

The tabular output produced by csd_process

concept_set

tabular input || optional

The concept set originally passed to csd_process. Recommended when no vocab_tbl is provided and the concept names are available in the concept set.

vocab_tbl

tabular input || optional

A vocabulary table containing concept names for the provided codes (e.g., the OMOP concept table).

num_variables

integer || defaults to 10

An integer N indicating how many variables to include in the exploratory analyses. The function selects the N most commonly occurring variables for the plot.

num_mappings

integer || defaults to 10

An integer N indicating how many concepts per variable to include in the exploratory analyses. The function selects the N most commonly occurring concepts within each variable for the plot.
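The top N selection described for num_variables and num_mappings can be approximated in base R. In this sketch top_n_csd() is a hypothetical helper, ranking by summed ct_concept counts is an assumption, and the column names follow the csd_process output shown in the Examples.

```r
# Hypothetical helper (not part of the package): keeps the most common
# variables and, within each, the most common concepts. Ranking by summed
# ct_concept is an assumption made for illustration.
top_n_csd <- function(process_output, num_variables = 10, num_mappings = 10) {
  # Total concept count per variable, ranked from most to least common
  var_totals <- tapply(process_output$ct_concept, process_output$variable, sum)
  ranked <- names(var_totals)[order(var_totals, decreasing = TRUE)]
  keep_vars <- head(ranked, num_variables)
  kept <- process_output[process_output$variable %in% keep_vars, , drop = FALSE]

  # Within each kept variable, retain the num_mappings most frequent concepts
  per_var <- lapply(split(kept, kept$variable), function(d) {
    head(d[order(d$ct_concept, decreasing = TRUE), , drop = FALSE], num_mappings)
  })
  out <- do.call(rbind, per_var)
  rownames(out) <- NULL
  out
}

# Usage with a small mock table shaped like csd_process output
mock <- data.frame(
  variable   = c("htn", "htn", "htn", "dm", "dm", "asthma"),
  concept_id = c("1", "2", "3", "4", "5", "6"),
  ct_concept = c(50, 30, 5, 20, 10, 1)
)
top_n_csd(mock, num_variables = 2, num_mappings = 2)
```

With num_variables = 2, the rarest variable ("asthma") is dropped, and with num_mappings = 2 only the two most frequent concepts of each remaining variable survive.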

filter_variable

string or vector || defaults to NULL

The specific variable(s) to display in the output. This parameter is required for the following check types:

Single Site, Anomaly Detection, Cross-Sectional
Multi Site, Anomaly Detection, Cross-Sectional
Single Site, Exploratory, Longitudinal
Single Site, Anomaly Detection, Longitudinal
Multi Site, Exploratory, Longitudinal

filter_concept

numeric/string or vector || defaults to NULL

The specific code(s) to display in the output. This parameter is required for the following check types:

Single Site, Anomaly Detection, Longitudinal
Multi Site, Exploratory, Longitudinal
Multi Site, Anomaly Detection, Longitudinal
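To illustrate what filter_variable and filter_concept do to the data before plotting, here is a minimal base R sketch. filter_csd() is a hypothetical helper written for this page, and it assumes the variable and concept_id column names shown in the csd_process output in the Examples.

```r
# Hypothetical helper (not part of the package): narrows a process_output-like
# table to the requested variable(s) and code(s). Either filter may be NULL
# (no filtering), a single value, or a vector.
filter_csd <- function(process_output, filter_variable = NULL, filter_concept = NULL) {
  out <- process_output
  if (!is.null(filter_variable)) {
    out <- out[out$variable %in% filter_variable, , drop = FALSE]
  }
  if (!is.null(filter_concept)) {
    # concept_id is a character column in the example csd_process output, so
    # numeric codes are converted without scientific notation before matching
    codes <- if (is.numeric(filter_concept)) {
      format(filter_concept, scientific = FALSE, trim = TRUE)
    } else {
      as.character(filter_concept)
    }
    out <- out[as.character(out$concept_id) %in% codes, , drop = FALSE]
  }
  out
}

mock <- data.frame(
  variable   = c("htn", "htn", "htn", "dm"),
  concept_id = c("320128", "1", "3", "320128"),
  ct_concept = c(5, 4, 3, 2)
)
filter_csd(mock, filter_variable = "htn", filter_concept = c(320128, 3))
```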

text_wrapping_char

integer || defaults to 80

An integer indicating the maximum number of characters in axis text before it is wrapped onto a new line. This is only used for the Multi Site, Anomaly Detection, Cross-Sectional check type.
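A rough base R illustration of the kind of wrapping text_wrapping_char controls follows. wrap_labels() is hypothetical and relies on strwrap(); csd_output may wrap labels with a different function, and in this sketch a single word longer than the limit is left unbroken.

```r
# Hypothetical helper (not part of the package): breaks long axis labels at
# word boundaries near a character limit, similar in spirit to
# text_wrapping_char. Uses base R strwrap().
wrap_labels <- function(labels, width = 80) {
  vapply(
    labels,
    function(s) paste(strwrap(s, width = width), collapse = "\n"),
    character(1),
    USE.NAMES = FALSE
  )
}

wrap_labels(c("hypertension", "essential hypertension disorder"), width = 15)
```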

output_value

string || defaults to prop_concept

The name of the numerical column in process_output that should be used in the output. This parameter is required for the following check types:

Multi Site, Anomaly Detection, Cross-Sectional
Single Site, Exploratory, Longitudinal
Multi Site, Exploratory, Longitudinal
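Because output_value is simply the name of a numeric column, its role can be sketched as a column lookup with type checks. pull_output_value() is a hypothetical helper, and ct_concept and prop_concept are the columns from the example csd_process output.

```r
# Hypothetical helper (not part of the package): output_value names a numeric
# column in process_output; this validates the name and returns its values.
pull_output_value <- function(process_output, output_value = "prop_concept") {
  if (!output_value %in% names(process_output)) {
    stop("Column '", output_value, "' not found in process_output")
  }
  vals <- process_output[[output_value]]
  if (!is.numeric(vals)) {
    stop("Column '", output_value, "' must be numeric")
  }
  vals
}

mock <- data.frame(variable = "hypertension", ct_concept = 5L, prop_concept = 1)
pull_output_value(mock)                # default: prop_concept
pull_output_value(mock, "ct_concept")  # a count column instead
```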

large_n

boolean || defaults to FALSE

For Multi Site analyses, a boolean indicating whether to use the large N visualization, which is intended for a high volume of sites. This visualization produces high-level summaries across all sites, with the option to add specific site comparators via the large_n_sites parameter.

large_n_sites

vector || defaults to NULL

When large_n = TRUE, a vector of site names whose site-level results are added to the plot for comparison against the high-level summary.
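The large N idea, summarising across all sites while keeping a few named sites as comparators, can be approximated as below. This is a sketch under assumptions: large_n_summary() is hypothetical, and the choice of median and quartiles as the summary statistics is made for illustration rather than taken from csd_output.

```r
# Hypothetical helper (not part of the package): summarises output_value per
# variable across all sites (median and quartiles, an assumed choice) and
# keeps the rows for large_n_sites separately as comparators.
large_n_summary <- function(process_output, output_value = "prop_concept",
                            large_n_sites = NULL) {
  vals <- process_output[[output_value]]
  grp  <- process_output$variable
  med  <- tapply(vals, grp, median)
  summary_tbl <- data.frame(
    variable  = names(med),
    q1        = as.vector(tapply(vals, grp, quantile, probs = 0.25, names = FALSE)),
    median    = as.vector(med),
    q3        = as.vector(tapply(vals, grp, quantile, probs = 0.75, names = FALSE)),
    row.names = NULL
  )
  # With large_n_sites = NULL no comparator rows are kept
  comparators <- process_output[process_output$site %in% large_n_sites,
                                c("site", "variable", output_value), drop = FALSE]
  list(summary = summary_tbl, sites = comparators)
}

mock <- data.frame(
  site         = rep(c("synth1", "synth2", "synth3"), times = 2),
  variable     = rep(c("htn", "dm"), each = 3),
  prop_concept = c(0.2, 0.4, 0.6, 0.1, 0.3, 0.5)
)
large_n_summary(mock, large_n_sites = "synth2")
```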

Value

This function produces a graph visualizing the results from csd_process, based on the parameters provided. The default output is typically a static ggplot or gt object; interactive elements can be activated by passing the plot through make_interactive_squba(). For a more detailed description of the output specific to each check type, see the PEDSpace metadata repository.

Examples


#' Source setup file
source(system.file('setup.R', package = 'conceptsetdistribution'))

#' Create in-memory RSQLite database using data in extdata directory
conn <- mk_testdb_omop()

#' Establish connection to database and generate internal configurations
initialize_dq_session(session_name = 'csd_process_test',
                      working_directory = my_directory,
                      db_conn = conn,
                      is_json = FALSE,
                      file_subdirectory = my_file_folder,
                      cdm_schema = NA)
#> Connected to: :memory:@NA

#' Build mock study cohort
cohort <- cdm_tbl('person') %>% dplyr::distinct(person_id) %>%
  dplyr::mutate(start_date = as.Date(-5000),
                #RSQLite does not store date objects,
                #hence the numerics
                end_date = as.Date(15000),
                site = ifelse(person_id %in% c(1:6), 'synth1', 'synth2'))

#' Prepare input tables
csd_domain_tbl <- dplyr::tibble(domain = 'condition_occurrence',
                                concept_field = 'condition_concept_id',
                                date_field = 'condition_start_date',
                                vocabulary_field = NA)

csd_concept_tbl <- read_codeset('dx_hypertension') %>%
  dplyr::mutate(domain = 'condition_occurrence',
                variable = 'hypertension')

#' Execute `csd_process` function
#' This example will use the single site, exploratory, cross-sectional
#' configuration
csd_process_example <- csd_process(cohort = cohort,
                                   multi_or_single_site = 'single',
                                   anomaly_or_exploratory = 'exploratory',
                                   time = FALSE,
                                   omop_or_pcornet = 'omop',
                                   domain_tbl = csd_domain_tbl,
                                   concept_set = csd_concept_tbl) %>%
  suppressMessages()
#> ┌ Output Function Details ──────────────────────────────────────┐
#> │ You can optionally use this dataframe in the accompanying     │
#> │ `csd_output` function. Here are the parameters you will need: │
#> │                                                               │
#> │ Always Required: process_output                               │
#> │ Required for Check: num_variables, num_mappings               │
#> │ Optional: concept_set, vocab_tbl                              │
#> │                                                               │
#> │ See ?csd_output for more details.                             │
#> └───────────────────────────────────────────────────────────────┘

csd_process_example
#> # A tibble: 1 × 7
#>   variable     ct_denom concept_id ct_concept prop_concept site  output_function
#>   <chr>           <int> <chr>           <int>        <dbl> <chr> <chr>          
#> 1 hypertension        5 320128              5            1 comb… csd_ss_exp_cs  

#' Execute `csd_output` function
csd_output_example <- csd_output(process_output = csd_process_example,
                                 concept_set = csd_concept_tbl,
                                 vocab_tbl = NULL) %>%
  suppressMessages()

csd_output_example[[1]]


#' Easily convert the graph into an interactive ggiraph or plotly object with
#' `make_interactive_squba()`

make_interactive_squba(csd_output_example[[1]])