Multi-Site Analysis for Independent Data Sources

The multi-site analyses included in this suite are intended to be executed against data that are all stored in the same place. However, in some cases each site's data are stored in separate, independent locations. This vignette outlines how the multi-site analysis can be executed in those cases.

Multi-Site Exploratory Analysis

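First, run the Single Site, Exploratory analysis, configured appropriately for your study, against each data source separately. Run the call below once per data source and keep each result as its own table (referred to as my_table1, my_table2, through my_table_n in the next step).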

library(cohortattrition)

my_table <- ca_process(attrition_tbl = my_attrition_counts,
                       multi_or_single_site = 'single',
                       anomaly_or_exploratory = 'exploratory',
                       ...)

Then, combine these results into a single table, with each site identified in the site column. You will also need to update the output_function column so that the combined table is treated as a Multi Site, Exploratory output.

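my_final_results <- my_table1 %>%
  dplyr::union(my_table2) %>%
  # ... add one dplyr::union() call per remaining site table ...
  dplyr::union(my_table_n) %>%
  # relabel the combined table as Multi Site, Exploratory output
  dplyr::mutate(output_function = 'ca_ms_exp_cs')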
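
When there are many sites, chaining union calls by hand gets tedious. The following is a minimal sketch of the same combination step using a list of tables, assuming every per-site table has identical columns. The toy tables, their column names other than site and output_function, and the placeholder single-site label are all hypothetical and stand in for real ca_process output.

```r
library(dplyr)

# Hypothetical per-site results; real tables come from ca_process().
# 'placeholder_single_site' is an illustrative label, not a package value.
site_results <- list(
  tibble(site = "site_a", step_number = c(0, 1), num_pts = c(100, 80),
         output_function = "placeholder_single_site"),
  tibble(site = "site_b", step_number = c(0, 1), num_pts = c(250, 190),
         output_function = "placeholder_single_site")
)

# Reduce() applies dplyr::union pairwise across any number of tables
combined <- Reduce(dplyr::union, site_results) %>%
  mutate(output_function = "ca_ms_exp_cs")

# Sanity check: every site should be present in the combined table
stopifnot(setequal(unique(combined$site), c("site_a", "site_b")))
```

Because dplyr::union drops duplicate rows, a quick check that each site appears with the expected number of rows is worthwhile before moving on.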

Multi-Site Anomaly Detection Analysis

For anomaly detection analysis, start by following the same steps as the exploratory analysis to produce a combined multi-site table. Then, run the compute_dist_anomalies and detect_outliers functions, both available through the squba.gen package, against that table. You will also need to update the output_function column so that the table is treated as a Multi Site, Anomaly Detection output. Copy the code below, supplying the table you generated.

var_col should be the numerical column from the exploratory output that you want to use as the target for anomaly detection. The p_value is chosen by the user.
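
To see which columns are eligible as var_col, it can help to list the numeric columns of the combined table first. This is a sketch under stated assumptions: toy_results and its column names other than site are hypothetical, and the p_value shown is an arbitrary illustration rather than a recommended threshold.

```r
library(dplyr)

# Toy stand-in for the combined exploratory output (hypothetical columns)
toy_results <- tibble(
  site = c("site_a", "site_b"),
  step_number = c(1, 1),
  num_pts = c(80, 190),
  prop_retained = c(0.80, 0.76)
)

# List the numeric columns that could serve as the anomaly detection target
candidate_cols <- toy_results %>%
  select(where(is.numeric)) %>%
  names()

var_col <- "prop_retained"  # hypothetical choice from candidate_cols
p_value <- 0.9              # illustrative value only

# Guard against typos in the chosen column name
stopifnot(var_col %in% candidate_cols)
```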

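library(squba.gen)

# First execute the compute_dist_anomalies function
# (my_table here is the combined multi-site table; var_col is your chosen target column)
df_start <- compute_dist_anomalies(df_tbl = my_table,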
                                   grp_vars = c('step_number', 'attrition_step'),
                                   var_col = var_col,
                                   denom_cols = c('step_number', 'attrition_step'))

# Then, use that output as input for the detect_outliers function
df_final <- detect_outliers(df_tbl = df_start,
                            p_input = p_value,
                            column_analysis = var_col,
                            column_variable = c('step_number', 'attrition_step')) %>%
  dplyr::mutate(output_function = 'ca_ms_anom_cs')