Concept Set Distribution - OMOP Version
Usage
csd_process_omop(
cohort,
domain_tbl = conceptsetdistribution::csd_domain_file,
concept_set = conceptsetdistribution::csd_concept_set,
multi_or_single_site = "single",
anomaly_or_exploratory = "exploratory",
num_concept_combined = FALSE,
num_concept_1 = 30,
num_concept_2 = 30,
p_value = 0.9,
age_groups = FALSE,
time = TRUE,
time_span = c("2012-01-01", "2020-01-01"),
time_period = "year"
)Arguments
- cohort
cohort for SQUBA testing; required fields:
siteperson_idstart_dateend_date
- domain_tbl
input table defining the domains listed in the annotated concept set four columns: -
domainthe name of the CDM table associated with the concept; should match what is listed in the annotated concept set -concept_fieldthe name of the field in the domain table where the concepts are located -date_fieldthe name of the field in the domain table with the date that should be used for time-based filtering -vocabulary_fieldPCORnet only; set to NA- concept_set
an annotated concept set CSV file with the following columns:
concept_idrequired for OMOP; the concept_id of interestconcept_coderequired for PCORnet; the code of interestconcept_nameoptional; the descriptive name of the conceptvocabulary_idrequired for PCORnet; the vocabulary of the code - should match what is listed in the domain table's vocabulary_fieldvariablerequired; a string label grouping one concept code into a larger variable definitiondomainrequired; the name of the CDM table where the concept is stored - multiple domains can be included in the file, but only one domain should be listed per row
- multi_or_single_site
Option to run the function on a single vs multiple sites
single: run the function for a single sitemulti: run the function for multiple sites
- anomaly_or_exploratory
Option to conduct an exploratory or anomaly detection analysis. Exploratory analyses give a high level summary of the data to examine the fact representation within the cohort. Anomaly detection analyses are specialized to identify outliers within the cohort.
- num_concept_combined
when
multi_or_single_site=singleandanomaly_or_exploratory=anomaly, this argument is a boolean that will ensure thatconcept1andconcept2meet some minimal threshold for including in the jaccard index ifTRUE, then both conditions fornum_concept_1andnum_concept_2should be met; ifFALSEthen just one condition needs to be met.- num_concept_1
when
multi_or_single_site=singleandanomaly_or_exploratory=anomaly, this argument is an integer and requires a minimum number of times that the first concept appears in the dataset- num_concept_2
when
multi_or_single_site=singleandanomaly_or_exploratory=anomaly, this argument is an integer and requires a minimum number of times that the second concept appears in the dataset- p_value
the p value to be used as a threshold in the multi-site anomaly detection analysis
- age_groups
If you would like to stratify the results by age group, create a table or CSV file with the following columns and include it as the
age_groupsfunction parameter:min_age: the minimum age for the group (i.e. 10)max_age: the maximum age for the group (i.e. 20)group: a string label for the group (i.e. 10-20, Young Adult, etc.)
If you would not like to stratify by age group, leave the argument as NULL
- time
logical to determine whether to output the check across time
- time_span
when time = TRUE, a vector of two dates for the observation period of the study
- time_period
when time = TRUE, this argument defines the distance between dates within the specified time period. defaults to
year, but other time periods such asmonthorweekare also acceptable