Expected Variables Present – PCORnet
Usage
evp_process_pcornet(
cohort,
evp_variable_file = expectedvariablespresent::evp_variable_file_pcornet,
multi_or_single_site = "single",
anomaly_or_exploratory = "exploratory",
output_level = "row",
age_groups = NULL,
p_value = 0.9,
time = FALSE,
time_span = c("2012-01-01", "2020-01-01"),
time_period = "year"
)Arguments
- cohort
A dataframe with the cohort of patients for your study. Should include the columns:
person_idstart_dateend_datesite
- evp_variable_file
CSV file with information about each of the variables that should be examined in the function. contains the following columns:
variablea label for the variable captured by the associated codesetdefault_tblCDM table where data related to the codeset is foundconcept_fieldconcept_id field with codes from the associated codesetdate_fielda date field in thedefault_tblthat should be used for over time analysesvocabulary_fieldPCORNET ONLY; field in thedefault_tblthat defines the vocabulary type of the concept (i.e. dx_type) if this field is used, the codeset should have avocabulary_idcolumn that defines the appropriate vocabularies for each conceptcodeset_namethe name of the codeset file; DO NOT include the file extensionfilter_logica string indicating filter logic that should be applied to achieve the desired variable; optional
- multi_or_single_site
Option to run the function on a single vs multiple sites
single: run the function for a single sitemulti: run the function for multiple sites
- anomaly_or_exploratory
Option to conduct an exploratory or anomaly detection analysis. Exploratory analyses give a high level summary of the data to examine the fact representation within the cohort. Anomaly detection analyses are specialized to identify outliers within the cohort.
- output_level
the level of output to use for an AUC computation, exclusive to
ms_anom_at; eitherpatientorrow– defaults torow- age_groups
If you would like to stratify the results by age group, create a table or CSV file with the following columns and include it as the
age_groupsfunction parameter:min_age: the minimum age for the group (i.e. 10)max_age: the maximum age for the group (i.e. 20)group: a string label for the group (i.e. 10-20, Young Adult, etc.)
If you would not like to stratify by age group, leave the argument as NULL
- p_value
the p value to be used as a threshold in the multi-site anomaly detection analysis
- time
a logical that tells the function whether you would like to look at the output over time
- time_span
when time = TRUE, this argument defines the start and end dates for the time period of interest. should be formatted as c(start date, end date) in yyyy-mm-dd date format
- time_period
when time = TRUE, this argument defines the distance between dates within the specified time period. defaults to
year, but other time periods such asmonthorweekare also acceptable