Connecting to Your Data

To use any of the packages in the SQUBA ecosystem, you must first connect to your CDM-formatted data in the database where it is stored. We have included a helper function that makes it easier to set up this connection and gives the package the information it needs to execute the analyses.

initialize_dq_session

initialize_dq_session is the helper function included in the squba.gen package and accessible through all other SQUBA ecosystem packages. This function takes user-specific database connection information and uses it to set up the internal configurations that allow the data quality analyses to run.

We use a framework called argos to facilitate access to the data, so the connection must be established via this function. This ensures that all internal functions and configurations are set up appropriately without requiring the user to interact with the framework directly. More information can be found in the argos documentation.

A standard setup of initialize_dq_session may look something like this:

library(conceptsetdistribution)

initialize_dq_session(session_name = 'vignette_demo',
                      db_conn = dbi_connection, # or 'path/to/json/config'
                      is_json = FALSE, # set to TRUE if db_conn is a path to a JSON config
                      working_directory = getwd(),
                      file_subdirectory = 'my_files',
                      cdm_schema = 'my_cdm_schema')

Parameter Walkthrough

session_name

This is an arbitrary string value used to give a name to your session. It is used for internal organization, so that multiple sessions can more easily be differentiated.

db_conn & is_json

These parameters are where the actual database connection information is read into the function. There are two methods that can be used to input the connection details.

Option 1: DBI (or similar)

One option is to create a connection object inside your R session using DBI or a similar database connection package. Instructions for how to use DBI::dbConnect to establish a connection can be found in the DBI package documentation.

If this option is used, is_json should be set to FALSE.
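
As a rough sketch of Option 1, the snippet below wraps DBI::dbConnect in a small helper. It assumes a PostgreSQL backend accessed through the RPostgres driver. The helper name open_cdm_connection, the CDM_DB_PASSWORD environment variable, and all connection values are hypothetical placeholders. Other backends will need a different driver and different arguments.

```r
# Hypothetical helper: opens a DBI connection to a PostgreSQL database.
# Assumes the DBI and RPostgres packages are installed.
open_cdm_connection <- function(host,
                                dbname,
                                user,
                                password = Sys.getenv("CDM_DB_PASSWORD"),
                                port = 5432) {
  DBI::dbConnect(
    RPostgres::Postgres(),
    host     = host,
    port     = port,
    dbname   = dbname,
    user     = user,
    password = password
  )
}

# Example usage (placeholder values; requires a reachable database server):
# dbi_connection <- open_cdm_connection(host = "my.database.server",
#                                       dbname = "project_db",
#                                       user = "my_username")
# initialize_dq_session(session_name = 'vignette_demo',
#                       db_conn = dbi_connection,
#                       is_json = FALSE,
#                       working_directory = getwd(),
#                       file_subdirectory = 'my_files',
#                       cdm_schema = 'my_cdm_schema')
```

Reading the password from an environment variable keeps credentials out of the script itself; any other secure mechanism would work equally well.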

Option 2: External JSON Configuration

Another option is to store your configuration details in a local JSON file and provide the helper function with the path to that file. We have provided a simple example below. As with Option 1, the information required in the file depends on your database backend.

{
    "src_name" : "Postgres",
    "src_args" : {
        "host"     : "my.database.server",
        "port"     : 5432,
        "dbname"   : "project_db",
        "username" : "my_username",
        "password" : "my_password",
        "options"  : "-c search_path=my_cdm_schema"
    },
    "post_connect_sql" : [
        "set role project_staff;"
    ]
}

If this option is used, is_json should be set to TRUE.
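
Before handing the path to initialize_dq_session, it can help to confirm that the file parses as valid JSON. The sketch below does this with the jsonlite package, which is our own choice for illustration and not something squba requires you to call yourself. The list of expected fields is inferred only from the example above and is not an official schema, and the file is written to a temporary directory purely for demonstration.

```r
library(jsonlite)

# Write the example configuration to a temporary file (illustration only).
config_path <- file.path(tempdir(), "db_config.json")
writeLines('{
  "src_name" : "Postgres",
  "src_args" : {
    "host"     : "my.database.server",
    "port"     : 5432,
    "dbname"   : "project_db",
    "username" : "my_username",
    "password" : "my_password",
    "options"  : "-c search_path=my_cdm_schema"
  },
  "post_connect_sql" : [
    "set role project_staff;"
  ]
}', config_path)

# fromJSON() raises an error if the file is not valid JSON.
config <- jsonlite::fromJSON(config_path)

# Assumed field names, based only on the example above.
expected_fields <- c("src_name", "src_args")
missing_fields <- setdiff(expected_fields, names(config))
if (length(missing_fields) > 0) {
  stop("Missing fields in config: ", paste(missing_fields, collapse = ", "))
}
```

If the check passes, config_path is the value you would supply as db_conn, with is_json = TRUE.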

working_directory & file_subdirectory

These parameters describe your local directory structure so the analyses know where to find local files such as concept sets or input configuration files. When using a project-oriented workflow in R, getwd() returns your project directory by default. Otherwise, you will need to point to the directory of interest.

file_subdirectory should be a directory nested inside of the provided working_directory where local files are stored. This ensures that all concept sets are accessible to the code without the user having to explicitly define file paths multiple times throughout the analysis process.
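
A quick check along these lines can catch path mistakes before a session is initialized. This is a base R sketch with placeholder variable names; the only assumption is that file_subdirectory is resolved relative to working_directory, as described above.

```r
working_directory <- getwd()
file_subdirectory <- "my_files"

# Full path to the directory where local files (e.g. concept sets) are expected.
files_path <- file.path(working_directory, file_subdirectory)

if (dir.exists(files_path)) {
  # List the local files that the analyses would be able to find.
  print(list.files(files_path))
} else {
  message("Directory not found: ", files_path)
}
```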

cdm_schema

This should be the name of the schema (or database, depending on your local setup) where the data in the appropriate CDM format (either OMOP or PCORnet) is stored. Any supplemental tables that you may want to access for the analyses should also be stored here.

Optional Parameters

The function also accepts several optional parameters, shown in the call below alongside the required ones.


initialize_dq_session(session_name = 'vignette_demo',
                      db_conn = dbi_connection, # or 'path/to/json/config'
                      is_json = FALSE, # set to TRUE if db_conn is a path to a JSON config
                      working_directory = getwd(),
                      file_subdirectory = 'my_files',
                      cdm_schema = 'my_cdm_schema',
                      # optional parameters
                      results_schema = 'my_results_schema',
                      results_tag = '_project_id',
                      vocabulary_schema = 'vocabulary',
                      cache_enabled = FALSE,
                      retain_intermediates = FALSE,
                      db_trace = TRUE,
                      default_file_output = FALSE,
                      results_subdirectory = NULL)

results_schema

If you would like to output any of your analysis results back to the database, this string will define the destination schema. The output_tbl function included as part of the argos framework can be used to achieve this.

results_tag

This is an optional string that is appended to the name of any table output back to the database in the defined results_schema. It is intended to help uniquely identify tables for each project or analysis and to avoid accidental overwrites.

vocabulary_schema

If you have the OMOP vocabulary tables available on your database and would like to access them as part of the analysis, this is where you would define the schema in which those tables are stored.

This parameter will primarily be used for OMOP database backends, but can be used for PCORnet backends if the same or similar tables are available.

cache_enabled

This parameter allows you to cache concept sets that are loaded via argos::load_codeset, which is the default method used by squba. This can be beneficial if you are repeatedly using the same codeset across multiple analyses.

retain_intermediates

This parameter controls whether intermediate/temporary tables should be retained on the database. It should be set to TRUE if your system does not permit temporary tables.

If using this feature, make sure to set results_schema to give the program a destination for the retained tables.

db_trace

This parameter, if set to TRUE, will print the SQL query log in the R console for any relevant computations. This is essentially a “verbose” option, controlling how much information you want to see about certain queries being executed.

results_subdirectory

If you use argos::output_tbl to write a result table to a file, this is the destination directory that will be used.

default_file_output

If you use argos::output_tbl to output results to your database, this parameter globally controls whether results are also written as CSV files to the user-specified results directory. This behavior can also be set at the function call level if that is preferred.