Using the tabular output generated by dr_process, this function builds a graph to
visualize the results. Each function configuration outputs a bespoke ggplot. Theming can
be adjusted after the graph has been output using + theme(). Most graphs can
also be made interactive using make_interactive_squba().
Usage
dr_output(
  process_output,
  output_col,
  filter_definition = NULL,
  text_wrapping_char = 60L,
  large_n = FALSE,
  large_n_sites = NULL
)
Arguments
- process_output
tabular input || required
The tabular output produced by dr_process
- output_col
string || defaults to NULL
The name of the numerical variable from process_output that should be used to generate the plot. This input is required for all checks EXCEPT Multi Site, Anomaly Detection, Longitudinal
- filter_definition
string or vector || defaults to NULL
A string or vector with the names of duplicate definitions that should be the focus of the analysis. Only a single value is accepted for the following checks:
Multi Site, Anomaly Detection, Longitudinal
Single Site, Anomaly Detection, Longitudinal
Multiple values are accepted for:
Multi Site, Exploratory, Cross-Sectional
Multi Site, Exploratory, Longitudinal
- text_wrapping_char
integer || defaults to 60
An integer indicating the length limit for text wrapping on long labels in the plots. Generally applies to the position of the duplicate_definition on the plot, as these labels can be lengthy.
- large_n
boolean || defaults to FALSE
For Multi-Site analyses, a boolean indicating whether the large N visualization, intended for a high volume of sites, should be used. This visualization will produce high level summaries across all sites, with an option to add specific site comparators via the large_n_sites parameter.
- large_n_sites
vector || defaults to NULL
When large_n = TRUE, a vector of site names that can add site-level information to the plot for comparison across the high level summary information.
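To get a feel for what a character limit such as text_wrapping_char does to a long duplicate_definition label, the sketch below wraps a label with base R's strwrap(). This is only an illustration: the label text and the width of 30 are made up, and the wrapping that dr_output applies internally may use a different function and break lines differently.

```r
# Hypothetical long label, similar in spirit to a duplicate_definition value
label <- "duplicate visits per day including visit_concept_id, visit_start_date, and person_id"

# Illustrative width; dr_output's own default is text_wrapping_char = 60
wrap_width <- 30

# strwrap() breaks at whitespace so that each piece is shorter than `width`
wrapped <- paste(strwrap(label, width = wrap_width), collapse = "\n")

cat(wrapped, "\n")
```

A smaller limit produces more, shorter lines per label, which can help when many definitions share one axis.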
Value
This function will produce a graph to visualize the results
from dr_process based on the parameters provided. The default
output is typically a static ggplot or gt object, but interactive
elements can be activated by passing the plot through make_interactive_squba.
For a more detailed description of output specific to each check type,
see the PEDSpace metadata repository.
Examples
#' Source setup file
source(system.file('setup.R', package = 'duplicaterecords'))
#' Create in-memory RSQLite database using data in extdata directory
conn <- mk_testdb_omop()
#' Establish connection to database and generate internal configurations
initialize_dq_session(session_name = 'dr_process_test',
                      working_directory = my_directory,
                      db_conn = conn,
                      is_json = FALSE,
                      file_subdirectory = my_file_folder,
                      cdm_schema = NA)
#> Connected to: :memory:@NA
#' Build mock study cohort
cohort <- cdm_tbl('person') %>% dplyr::distinct(person_id) %>%
  dplyr::mutate(start_date = as.Date(-15000), # RSQLite does not store date objects,
                                              # hence the numerics
                end_date = as.Date(20000),
                site = ifelse(person_id %in% c(1:6), 'synth1', 'synth2'))
#' Execute `dr_process` function
#' This example will use the single site, exploratory, cross sectional
#' configuration
dr_process_example <- dr_process(cohort = cohort,
                                 multi_or_single_site = 'single',
                                 anomaly_or_exploratory = 'exploratory',
                                 time = FALSE,
                                 omop_or_pcornet = 'omop',
                                 dr_input_file = dr_input_file_omop) %>%
  suppressMessages()
#> Warning: Missing values are always removed in SQL aggregation functions.
#> Use `na.rm = TRUE` to silence this warning
#> This warning is displayed once every 8 hours.
#> ┌ Output Function Details ─────────────────────────────────────┐
#> │ You can optionally use this dataframe in the accompanying │
#> │ `dr_output` function. Here are the parameters you will need: │
#> │ │
#> │ │
#> │ See ?dr_output for more details. │
#> └──────────────────────────────────────────────────────────────┘
dr_process_example
#> # A tibble: 2 × 14
#> site duplicate_definition duplicate_columns total_rows total_pt
#> <chr> <chr> <chr> <int> <int>
#> 1 combined duplicate visits per day include visit_concept_i… 1627 12
#> 2 combined conditions without PK exclude condition_occur… 58 12
#> # ℹ 9 more variables: duplicate_rows <int>, duplicate_pt <int>,
#> # median_all_with0s <dbl>, median_all_without0s <dbl>,
#> # median_site_with0s <dbl>, median_site_without0s <dbl>,
#> # duplicate_row_prop <dbl>, duplicate_pt_prop <dbl>, output_function <chr>
#' Execute `dr_output` function
dr_output_example <- dr_output(process_output = dr_process_example,
                               output_col = 'duplicate_row_prop')
dr_output_example
#' Easily convert the graph into an interactive ggiraph or plotly object with
#' `make_interactive_squba()`
make_interactive_squba(dr_output_example)
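
The description notes that theming can be adjusted after the fact with + theme(). The sketch below shows that pattern on a stand-in ggplot so it can run without a database connection. The mock data frame, its values, and the choice of geom_col() are assumptions for illustration and are not what dr_output actually draws; in practice the same + theme() calls would be added to the object returned by dr_output, provided it is a ggplot rather than a gt object.

```r
library(ggplot2)

# Stand-in for a dr_output() result; column names mirror the dr_process
# output shown above, but the proportions are made-up illustrative values
mock_output <- data.frame(
  duplicate_definition = c("duplicate visits per day", "conditions without PK"),
  duplicate_row_prop = c(0.12, 0.03)
)

mock_plot <- ggplot(mock_output,
                    aes(x = duplicate_definition, y = duplicate_row_prop)) +
  geom_col()

# Adjust theming after the plot has been created
themed_plot <- mock_plot +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom")

themed_plot
```

Theme adjustments like these are best applied before converting the plot to an interactive object, since they operate on the ggplot itself.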