The Basics
If you are interested in using squba but are not sure which modules to start with, look no further! This kit serves as a basic "starter pack" for squba analyses. All you need to do is configure the input files to tailor the analysis to your study aims, and the function will take care of the rest. This analysis includes 3 modules and executes 5 checks in total:
Cohort Attrition || Exploratory, Cross-Sectional
Patient Facts || Exploratory, Cross-Sectional
Expected Variables Present || Exploratory, Cross-Sectional
Expected Variables Present || Anomaly Detection, Cross-Sectional
Expected Variables Present || Anomaly Detection, Longitudinal
Usage
squba_basics(
cohort,
omop_or_pcornet,
multi_or_single_site,
ca_input,
pf_input,
pf_visits,
evp_input,
evp_variable_filter,
time_period = "year",
time_span = c("2015-01-01", "2020-01-01")
)
Arguments
- cohort
tabular input || required
The cohort to be used for data quality testing. This table should contain, at minimum:
site | character | the name(s) of the institutions included in your cohort
person_id / patid | integer / character | the patient identifier
start_date | date | the start of the cohort period
end_date | date | the end of the cohort period
Note that the start and end dates included in this table will be used to limit the search window for the analyses in this kit.
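Before calling the kit, it can help to confirm that the cohort table has the minimum columns listed above and that the dates are real Date values rather than character strings. The sketch below uses a hypothetical helper, check_cohort(), which is not part of squba; it accepts either person_id (OMOP) or patid (PCORnet) as the identifier and uses made-up cohort rows.

```r
# Hypothetical helper (not part of squba): checks the minimum cohort columns
# described above, accepting either person_id (OMOP) or patid (PCORnet).
check_cohort <- function(cohort) {
  id_col <- intersect(c("person_id", "patid"), names(cohort))
  missing <- setdiff(c("site", "start_date", "end_date"), names(cohort))
  if (length(id_col) == 0) missing <- c(missing, "person_id/patid")
  if (length(missing) > 0) {
    stop("Cohort is missing column(s): ", paste(missing, collapse = ", "))
  }
  # The cohort window is defined by dates, so both bounds should be Date objects
  if (!inherits(cohort$start_date, "Date") || !inherits(cohort$end_date, "Date")) {
    stop("start_date and end_date should be Date columns")
  }
  invisible(TRUE)
}

# Illustrative cohort rows
cohort <- data.frame(
  site = c("Site A", "Site A", "Site B"),
  person_id = c(1L, 2L, 3L),
  start_date = as.Date(c("2012-01-01", "2015-07-04", "2014-12-31")),
  end_date = as.Date(c("2018-01-01", "2020-07-04", "2025-12-31"))
)
check_cohort(cohort)
```

Dropping any required column (for example cohort[, -1], which removes site) makes the helper stop with a message naming the missing column.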
- omop_or_pcornet
string || required
A string, either 'omop' or 'pcornet', indicating the CDM format of the data
- multi_or_single_site
string || defaults to 'single'
A string, either 'single' or 'multi', indicating whether a single-site or multi-site analysis should be executed. This selection will be applied across all analyses executed in this kit.
- ca_input
tabular input || required
A table or CSV file with attrition information for each site included in the cohort. This table should minimally contain:
site | character | the name of the institution
step_number | integer | a numeric identifier for the attrition step
attrition_step | character | a description of the attrition step
num_pts | integer | the patient count for the attrition step
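The attrition input is one row per site and step. The sketch below builds a small table with the four required columns (the counts and step descriptions are made up) and shows that writing it to a CSV file and reading it back with base R preserves the same columns. Whether squba_basics() accepts a file path directly, or expects the table already read into R, is not stated on this page, so the sketch reads the file back explicitly.

```r
# Illustrative attrition counts for one site (values are made up)
attrition <- data.frame(
  site = c("Site A", "Site A", "Site A"),
  step_number = c(0L, 1L, 2L),
  attrition_step = c("All Patients",
                     "Patients with at least 2 visits",
                     "Patients with a diagnosis of interest"),
  num_pts = c(1000L, 800L, 250L)
)

# Round-trip through a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
write.csv(attrition, csv_path, row.names = FALSE)
attrition_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
str(attrition_csv)
```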
- pf_input
tabular input || required
A table that defines the fact domains to be investigated in the analysis. This input should contain:
domain | character | a string label for the domain being examined (e.g. prescription drugs)
domain_tbl | character | the CDM table where information for this domain can be found (e.g. drug_exposure)
filter_logic | character | logic to be applied to the domain_tbl in order to achieve the definition of interest; should be written as if you were applying it in a dplyr::filter command in R
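To make the filter_logic convention concrete, the sketch below shows one way a filter string could be evaluated against a table with base R (str2lang() requires R 3.6.0 or later). It is an illustration, not squba's internal code: apply_filter_logic() is a hypothetical helper, the drug_exposure rows are made up, and an NA filter_logic is treated as "no filter". Like dplyr::filter(), it drops rows where the condition evaluates to NA.

```r
# Hypothetical helper: evaluate a filter_logic string inside a table
apply_filter_logic <- function(tbl, filter_logic) {
  # NA or empty logic means the whole domain table is used
  if (is.na(filter_logic) || !nzchar(filter_logic)) return(tbl)
  # Parse the string and evaluate it with the table's columns in scope
  keep <- eval(str2lang(filter_logic), envir = tbl, enclos = baseenv())
  # Mirror dplyr::filter(): rows where the condition is NA are dropped
  tbl[!is.na(keep) & keep, , drop = FALSE]
}

# Made-up drug_exposure rows
drug_exposure <- data.frame(
  person_id = c(1L, 1L, 2L),
  drug_type_concept_id = c(38000177L, 32817L, 38000177L)
)
apply_filter_logic(drug_exposure, "drug_type_concept_id == 38000177")
```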
- pf_visits
tabular input || required
A table that defines the visit types of interest called in visit_types. This input should contain:
visit_concept_id or enc_type | integer or character | the visit_concept_id (OMOP) or enc_type (PCORnet) that represents the visit type of interest (e.g. 9201 or IP)
visit_type | character | the string label used to describe the visit type
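A single visit code can appear under more than one label. The sample file patientfacts::pf_visit_file_omop in the Examples maps each code both to a specific label and to "all". The sketch below builds a table with that same shape in base R; the codes are the ones shown in that sample, and the construction itself is only one convenient way to produce it.

```r
# One row per visit code with its specific label
specific <- data.frame(
  visit_concept_id = c(9201L, 9202L, 9203L, 581399L),
  visit_type = c("inpatient", "outpatient", "emergency", "outpatient")
)

# The same codes again, grouped under a combined "all" label
all_visits <- data.frame(visit_concept_id = specific$visit_concept_id,
                         visit_type = "all")

pf_visits <- rbind(specific, all_visits)
table(pf_visits$visit_type)
```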
This information will be extracted from either the visit_occurrence (OMOP) or encounter (PCORnet) CDM table. If you wish to extract visit information from other tables, please run the Patient Facts module on its own.
- evp_input
tabular input || required
A table with information about each of the variables that should be examined in the analysis. This table should contain the following columns:
variable | character | a string label for the variable captured by the associated codeset
domain_tbl | character | the CDM table where the variable is found
concept_field | character | the string name of the field in the domain table where the concepts are located
date_field | character | the name of the field in the domain table with the date that should be used for temporal filtering
vocabulary_field | character | for PCORnet applications, the name of the field in the domain table with a vocabulary identifier to differentiate concepts from one another (e.g. dx_type); can be set to NA for OMOP applications
codeset_name | character | the name of the codeset that defines the variable of interest
filter_logic | character | logic to be applied to the domain_tbl in order to achieve the definition of interest; should be written as if you were applying it in a dplyr::filter command in R
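As an illustration of the column list above, the sketch below writes one OMOP row and one PCORnet row by hand. The variable labels and the codeset name my_htn_codeset are hypothetical placeholders, and the OMOP and PCORnet field names are standard CDM columns chosen for the example; only the PCORnet row uses vocabulary_field, which is left as NA for OMOP as described above.

```r
required_cols <- c("variable", "domain_tbl", "concept_field", "date_field",
                   "vocabulary_field", "codeset_name", "filter_logic")

# OMOP: no vocabulary field is needed, so it is set to NA
evp_omop <- data.frame(
  variable = "hypertension",              # hypothetical label
  domain_tbl = "condition_occurrence",
  concept_field = "condition_concept_id",
  date_field = "condition_start_date",
  vocabulary_field = NA,
  codeset_name = "my_htn_codeset",        # hypothetical codeset name
  filter_logic = NA                       # no extra filtering
)

# PCORnet: dx_type distinguishes ICD-9 from ICD-10 codes in the dx field
evp_pcornet <- data.frame(
  variable = "hypertension",
  domain_tbl = "diagnosis",
  concept_field = "dx",
  date_field = "admit_date",
  vocabulary_field = "dx_type",
  codeset_name = "my_htn_codeset",
  filter_logic = NA
)

stopifnot(all(required_cols %in% names(evp_omop)),
          all(required_cols %in% names(evp_pcornet)))
```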
To see an example of the structure of this file, please see ?expectedvariablespresent::evp_variable_file_omop or ?expectedvariablespresent::evp_variable_file_pcornet
- evp_variable_filter
string or vector || required
The variable(s) from evp_input for which the longitudinal analysis should be executed. To keep runtime low, we HIGHLY RECOMMEND choosing no more than 3 variables. The function will not error if you select more than 3, but processing will take longer.
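A small pre-flight check can catch typos and over-long selections before a slow run. The sketch assumes that the values passed to evp_variable_filter match entries in the variable column of evp_input; pick_evp_filter() is a hypothetical helper, and the variable labels are made up. It warns rather than errors above 3 variables, matching the behaviour described above.

```r
# Hypothetical helper: validate a longitudinal variable selection
pick_evp_filter <- function(evp_input, variables) {
  unknown <- setdiff(variables, evp_input$variable)
  if (length(unknown) > 0) {
    stop("Not found in evp_input$variable: ", paste(unknown, collapse = ", "))
  }
  if (length(variables) > 3) {
    warning("More than 3 variables selected; the longitudinal check may take longer to run.")
  }
  variables
}

evp_input <- data.frame(variable = c("htn", "t2dm", "ckd", "asthma"))
pick_evp_filter(evp_input, c("htn", "t2dm"))
```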
- time_period
string || defaults to 'year'
A string indicating the distance between dates within the specified time_span. Defaults to 'year', but other time periods such as 'month' or 'week' are also acceptable.
- time_span
vector - length 2 || defaults to c('2015-01-01', '2020-01-01')
A vector indicating the lower and upper bounds of the time series for longitudinal analyses.
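To get a feel for how time_period and time_span interact, the sketch below lays out the date points that a step of one year or one month produces across the default span, using base R's seq() for Date objects. It only illustrates the size of the resulting time series; how squba bins dates internally (for example whether the upper bound is inclusive) may differ.

```r
time_span <- c("2015-01-01", "2020-01-01")

yearly  <- seq(as.Date(time_span[1]), as.Date(time_span[2]), by = "year")
monthly <- seq(as.Date(time_span[1]), as.Date(time_span[2]), by = "month")

length(yearly)   # 6  (2015-01-01 through 2020-01-01)
length(monthly)  # 61 (5 years x 12 months, plus the end point)
```

Shorter periods multiply the number of time points, which is one reason longitudinal runs grow with finer granularity.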
Value
This function returns a named list, with parent lists for each module and child lists for each check. It will contain 5 data frames, one for each check executed in the kit. Each of these tables can be passed into the respective module's *_output function to produce visualizations.
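Because the result is nested two levels deep (module, then check), it can be convenient to flatten it into one list of tables. The sketch below uses a mock object with the same shape; every module and check name in it is invented for illustration, so inspect the real names on your own output (for example with names() or str(x, max.level = 2)).

```r
# Mock result with the module -> check nesting described above (names invented)
mock_results <- list(
  cohort_attrition = list(exploratory_cs = data.frame(x = 1)),
  patient_facts    = list(exploratory_cs = data.frame(x = 1)),
  evp              = list(exploratory_cs = data.frame(x = 1),
                          anomaly_cs     = data.frame(x = 1),
                          anomaly_la     = data.frame(x = 1))
)

# Flatten one level: each element is now a check table named "module.check"
leaf_tables <- unlist(mock_results, recursive = FALSE)
names(leaf_tables)
length(leaf_tables)  # 5
```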
Details
Note that the single- or multi-site selection is global: it is set once through the multi_or_single_site parameter and applied to every check executed in the kit.
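The idea of a single global setting can be sketched as a wrapper that validates the choice once with match.arg() and hands the same value to every check. This is a hypothetical illustration, not squba's implementation: run_kit() and the check identifiers are invented, and here the first listed choice ('single') acts as the default.

```r
# Hypothetical wrapper: validate the global settings once, reuse them everywhere
run_kit <- function(omop_or_pcornet = c("omop", "pcornet"),
                    multi_or_single_site = c("single", "multi")) {
  omop_or_pcornet <- match.arg(omop_or_pcornet)
  multi_or_single_site <- match.arg(multi_or_single_site)
  checks <- c("ca_exp_cs", "pf_exp_cs", "evp_exp_cs", "evp_anom_cs", "evp_anom_la")
  # Every check receives the same site setting
  setNames(rep(multi_or_single_site, length(checks)), checks)
}

run_kit("omop", "multi")
```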
Examples
# Cohort Input
my_cohort <- dplyr::tibble('site' = c('Site A', 'Site A', 'Site B'),
                           'person_id' = c(1, 2, 3),
                           # dates are stored as Date objects, as required by the cohort input
                           'start_date' = as.Date(c('2012-01-01', '2015-07-04',
                                                    '2014-12-31')),
                           'end_date' = as.Date(c('2018-01-01', '2020-07-04',
                                                  '2025-12-31')))
# Attrition Input
cohortattrition::sample_attrition
#> # A tibble: 10 × 4
#> site step_number attrition_step num_pts
#> <chr> <dbl> <chr> <dbl>
#> 1 site 1 0 All Patients 10900
#> 2 site 1 1 Patients with at least 2 visits since 2015 7800
#> 3 site 1 2 Patients with a T2DM diagnosis 2200
#> 4 site 1 3 Patients with an Hba1c lab 2000
#> 5 site 1 4 Patients with an Hba1c result > 6.5% 1900
#> 6 site 2 0 All Patients 17000
#> 7 site 2 1 Patients with at least 2 visits since 2015 10500
#> 8 site 2 2 Patients with a T2DM diagnosis 5500
#> 9 site 2 3 Patients with an Hba1c lab 5000
#> 10 site 2 4 Patients with an Hba1c result > 6.5% 4800
# Expected Variables Present Input
expectedvariablespresent::evp_variable_file_omop
#> # A tibble: 1 × 6
#> variable domain_tbl concept_field date_field codeset_name filter_logic
#> <chr> <chr> <chr> <chr> <chr> <lgl>
#> 1 Sample OMOP Var… condition… condition_co… condition… dx_hyperten… NA
expectedvariablespresent::evp_variable_file_pcornet
#> # A tibble: 1 × 7
#> variable domain_tbl concept_field date_field vocabulary_field codeset_name
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 Sample PCOR… diagnosis dx admit_date dx_type sample_code…
#> # ℹ 1 more variable: filter_logic <lgl>
# Patient Facts Input
patientfacts::pf_domain_file
#> # A tibble: 2 × 3
#> domain domain_tbl filter_logic
#> <chr> <chr> <chr>
#> 1 diagnoses condition_occurrence NA
#> 2 prescription drugs drug_exposure drug_type_concept_id == 38000177
patientfacts::pf_visit_file_omop
#> # A tibble: 8 × 2
#> visit_concept_id visit_type
#> <dbl> <chr>
#> 1 9201 inpatient
#> 2 9202 outpatient
#> 3 9203 emergency
#> 4 581399 outpatient
#> 5 9201 all
#> 6 9202 all
#> 7 9203 all
#> 8 581399 all
patientfacts::pf_visit_file_pcornet
#> # A tibble: 8 × 2
#> enc_type visit_type
#> <chr> <chr>
#> 1 IP inpatient
#> 2 AV outpatient
#> 3 ED emergency
#> 4 TH outpatient
#> 5 IP all
#> 6 AV all
#> 7 ED all
#> 8 TH all
if (FALSE) { # \dontrun{
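# Execute Function (sample inputs shown above; substitute your own files)
squba_basics(cohort = my_cohort,
             omop_or_pcornet = 'omop',          # or 'pcornet'
             multi_or_single_site = 'multi',    # or 'single'
             ca_input = cohortattrition::sample_attrition,
             pf_input = patientfacts::pf_domain_file,
             pf_visits = patientfacts::pf_visit_file_omop,
             evp_input = expectedvariablespresent::evp_variable_file_omop,
             evp_variable_filter = expectedvariablespresent::evp_variable_file_omop$variable[1],
             time_period = 'year',
             time_span = c('2015-01-01', '2020-01-01'))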
} # }