| Title: | Accessing the Peekbank Database and working with Peekbank data |
|---|---|
| Description: | Collection of tools for working with peekbank, an open repository for developmental eye-tracking data. |
| Authors: | Mika Braginsky [aut, cre], Kyle MacDonald [aut], Michael Frank [aut], Linger Xu [aut], Adrian Steffan [aut] |
| Maintainer: | Mika Braginsky |
| License: | GPL-3 |
| Version: | 0.3.6.2 |
| Built: | 2026-06-03 18:53:20 UTC |
| Source: | https://github.com/langcog/peekbankr |
Add a relative CDI score indicating the percentage of the total achievable points that the subject obtained on each measure
append_relative_cdi_scores(subjects_table)
subjects_table |
a subjects table with unnested cdi data, needs columns "subject_id", "language", "instrument_type", "measure", "rawscore" |
the input table with an added "cdi_relative" column that contains the percentage of total points gained in the given administrations
## Not run:
library(dplyr)
library(tidyr)
cdi_data <- all_subjects %>%
  unnest(subject_aux_data) %>%
  filter(!is.na(cdi_responses)) %>%
  unnest(cdi_responses) %>%
  append_relative_cdi_scores()
## End(Not run)
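The relative score is a ratio of the raw score to the maximum achievable score for the same language, instrument_type, and measure. The base R sketch below shows that ratio under some assumptions:

* The maxima come from a hypothetical lookup table, `max_scores`. Its values are placeholders, not the norms peekbankr uses.
* `add_relative_scores()` is a hypothetical helper.
* The result is scaled to 0 to 100 because "cdi_relative" is described as a percentage.

```r
# Hypothetical maximum achievable scores per language / instrument_type /
# measure. Placeholder values for illustration only, not real CDI norms.
max_scores <- data.frame(
  language = c("lang_a", "lang_a"),
  instrument_type = c("inst_x", "inst_x"),
  measure = c("prod", "comp"),
  max_score = c(200, 400),
  stringsAsFactors = FALSE
)

# Hypothetical helper: adds cdi_relative as a percentage of the maximum score
add_relative_scores <- function(subjects_table, max_scores) {
  keys <- c("language", "instrument_type", "measure")
  out <- merge(subjects_table, max_scores, by = keys, all.x = TRUE, sort = FALSE)
  out$cdi_relative <- 100 * out$rawscore / out$max_score
  out$max_score <- NULL  # drop the lookup column again
  out
}

subjects <- data.frame(
  subject_id = c(1, 2),
  language = "lang_a",
  instrument_type = "inst_x",
  measure = c("prod", "comp"),
  rawscore = c(50, 300),
  stringsAsFactors = FALSE
)
out <- add_relative_scores(subjects, max_scores)
out[, c("subject_id", "measure", "cdi_relative")]
```

Rows without a matching maximum keep an NA relative score, because the join uses `all.x = TRUE`.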
Checks cdi data for inconsistencies, warns about them, and fixes them
cleanup_cdi_data(cdi_data)
cdi_data |
a subjects table with unnested cdi data, needs columns "subject_id", "language", "instrument_type", "age", "sex", "measure", "rawscore" |
a cleaned up version of the cdi data
## Not run:
library(dplyr)
library(tidyr)
clean_cdi_data <- all_subjects %>%
  unnest(subject_aux_data) %>%
  filter(!is.na(cdi_responses)) %>%
  unnest(cdi_responses) %>%
  peekbankr::cleanup_cdi_data()
## End(Not run)
Connect to Peekbank
connect_to_peekbank(db_version = "current", db_args = NULL, compress = TRUE, host = NULL, port = NULL, ssl = "auto")
db_version |
String giving the name of the database version to use (default: "current") |
db_args |
List with host, user, and password defined |
compress |
Flag to use compression protocol (defaults to TRUE) |
host |
Hostname of the Peekbank server to connect to (defaults to the hosted Peekbank server) |
port |
Port of the Peekbank DB to connect to (defaults to 3307) |
ssl |
How to handle TLS. One of:
* "auto" (default): verify against the shipped CA for the hosted Peekbank instance; skip TLS for connections to 127.0.0.1, localhost, or ::1; otherwise, leave it to the connector defaults.
* "disabled": do not require TLS (useful for plaintext local servers when running on RMariaDB versions that would otherwise refuse the connection).
* A path to a PEM file: use that file as the CA to verify the server certificate (useful for self-hosted deployments with their own self-signed certificate). |
con: A DBIConnection object for the Peekbank database
## Not run:
con <- connect_to_peekbank(db_version = "current", db_args = NULL)
DBI::dbDisconnect(con)
## End(Not run)
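The `ssl` options for connect_to_peekbank() amount to a small decision procedure. The sketch below encodes that procedure as a hypothetical helper, `resolve_tls_mode()`. A few things to note:

* It only returns a label describing the TLS behavior. It never opens a connection.
* `"hosted.example"` is a placeholder, not the real Peekbank host name.
* This is not how peekbankr implements the option internally.

```r
# Hypothetical helper mirroring the documented ssl options. It returns a label
# describing the TLS behavior and never opens a connection.
# "hosted.example" is a placeholder, not the real Peekbank host name.
resolve_tls_mode <- function(host, ssl = "auto", hosted_host = "hosted.example") {
  local_hosts <- c("127.0.0.1", "localhost", "::1")
  if (identical(ssl, "disabled")) {
    return("no_tls")                        # plaintext connection allowed
  }
  if (!identical(ssl, "auto")) {
    return(paste0("verify_with_ca:", ssl))  # any other string is a PEM path
  }
  if (host %in% local_hosts) {
    return("no_tls")                        # local servers skip TLS
  }
  if (identical(host, hosted_host)) {
    return("verify_with_shipped_ca")        # hosted instance: shipped CA
  }
  "connector_defaults"                      # anything else: connector decides
}

resolve_tls_mode("localhost")
resolve_tls_mode("db.my-lab.org", ssl = "certs/my_ca.pem")
```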
Download a list of files from OSF and recreate folder structure locally
download_osf_files(file_paths, osf_node_id = "pr6wu", local_base_dir = "data", debug = F, skip_existing = TRUE, max_retries = 3, retry_delay = 5)
file_paths |
A character vector of file paths on OSF to download |
osf_node_id |
The OSF node ID where the files are stored (default: "pr6wu") |
local_base_dir |
Base directory to save files locally (default: here::here("data")) |
debug |
Logical, whether to print debugging information (default: FALSE) |
skip_existing |
Logical, skip downloading a file if a file with that name already exists in that path locally |
max_retries |
Maximum number of retry attempts for server errors (default: 3) |
retry_delay |
Delay in seconds between retry attempts (default: 5) |
The paths to the downloaded files
## Not run:
# Download multiple files from OSF
download_osf_files(
  file_paths = c(
    "lab1/raw_data/file1.csv",
    "lab2/processed_data/file2.csv"
  ),
  osf_node_id = "pr6wu"
)
## End(Not run)
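The `max_retries` and `retry_delay` arguments suggest a retry loop around each download. Below is a minimal sketch of such a loop. It makes these assumptions:

* `max_retries` counts retries after the first attempt.
* `with_retries()` and the simulated `flaky()` request are hypothetical.
* Unlike the documented behavior, which retries on server errors, the sketch retries on any error.

```r
# Hypothetical retry wrapper. `fetch` stands in for a single download request.
# max_retries counts retries after the first attempt (an assumption), so
# fetch() is called at most max_retries + 1 times.
with_retries <- function(fetch, max_retries = 3, retry_delay = 5) {
  attempt <- 0
  repeat {
    attempt <- attempt + 1
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt > max_retries) {
      stop(result)  # give up and re-signal the last error
    }
    Sys.sleep(retry_delay)  # wait before the next attempt
  }
}

# Simulated request that fails twice before succeeding
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("HTTP 503")
  "data/lab1/raw_data/file1.csv"
}
path <- with_retries(flaky, max_retries = 3, retry_delay = 0)
```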
This function downloads stimulus images for selected Peekbank datasets from OSF. It retrieves stimulus metadata from a Peekbank database connection, constructs the full paths to the stimulus images on OSF, and downloads them to a local directory.
download_stimuli(con, local_base_dir = "stimulus_data", datasets = c(), skip_existing = T, debug = F, max_retries = 3, retry_delay = 5)
con |
A database connection object created by connect_to_peekbank() |
local_base_dir |
Local directory path where stimulus images will be saved (default: "stimulus_data") |
datasets |
Character vector of dataset names to download stimuli for. If empty (default), downloads stimuli for all datasets. |
skip_existing |
Logical, skip downloading a file if a file with that name already exists at that path locally (default: TRUE) |
debug |
Logical, whether to print debugging information (default: FALSE) |
max_retries |
Maximum number of retry attempts for server errors (default: 3) |
retry_delay |
Delay in seconds between retry attempts (default: 5) |
The stimulus data frame with an additional column containing the paths of the downloaded stimuli
## Not run:
con <- connect_to_peekbank("2025.1")

# Download stimuli for all datasets
download_stimuli(con, local_base_dir = "stimulus_data")

# Download stimuli for specific datasets
download_stimuli(con, local_base_dir = "stimulus_data",
                 datasets = c("reflook_v4", "reflook_socword"))
## End(Not run)
Add AOIs to an xy dataframe
ds.add_aois(xy_joined)
xy_joined |
dataframe containing processed xy timepoints with aoi region sets information |
A dataframe with two added columns, 'side' and 'aoi'. 'side' contains only the values "left" or "right"; 'aoi' indicates whether the xy timepoint is a look to the "target" or to the "distractor".
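One way to derive `side` and `aoi` is to test each gaze point against the left and right AOI bounding boxes, then compare the side that was hit with the trial's target side. This is a sketch, not the package's code. It assumes these columns on the joined data frame:

* `x` and `y` for the gaze point.
* `l_x_min`, `l_x_max`, `l_y_min`, `l_y_max` and the matching `r_` columns for the two boxes.
* `target_side`.

The helper `add_aois_sketch()` is hypothetical. Giving NA to points outside both boxes is also an assumption.

```r
# Hypothetical helper: assign side and aoi from AOI bounding boxes.
# Assumed columns: x, y, l_x_min, l_x_max, l_y_min, l_y_max,
# r_x_min, r_x_max, r_y_min, r_y_max, target_side ("left" or "right").
add_aois_sketch <- function(xy) {
  in_left <- with(xy, x >= l_x_min & x <= l_x_max & y >= l_y_min & y <= l_y_max)
  in_right <- with(xy, x >= r_x_min & x <= r_x_max & y >= r_y_min & y <= r_y_max)
  # Points in neither box (or with missing coordinates) get NA
  xy$side <- ifelse(in_left, "left", ifelse(in_right, "right", NA))
  xy$aoi <- ifelse(is.na(xy$side), NA,
                   ifelse(xy$side == xy$target_side, "target", "distractor"))
  xy
}

xy <- data.frame(
  x = c(100, 700, 400),
  y = c(300, 300, 300),
  l_x_min = 0, l_x_max = 300, l_y_min = 0, l_y_max = 600,
  r_x_min = 500, r_x_max = 800, r_y_min = 0, r_y_max = 600,
  target_side = "left",
  stringsAsFactors = FALSE
)
out <- add_aois_sketch(xy)
out[, c("x", "side", "aoi")]
```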
Fetch the list of field names and requirements for each table according to the schema JSON file
ds.get_json_fields(table_type)
table_type |
the type of dataframe; for the most up-to-date table types specified by the schema, use the function ds.list_ds_tables() |
the list of field names
## Not run:
fields_json <- ds.get_json_fields(table_type = "aoi_timepoints")
## End(Not run)
Parse the schema JSON file from the Peekbank GitHub repository into a dataframe
ds.get_peekjson()
the organized dataframe from schema json file
## Not run:
peekjson <- ds.get_peekjson()
## End(Not run)
Download a processed Peekbank dataset from OSF
ds.get_processed_data(lab_dataset_id, path = ".", osf_address = "pr6wu")
lab_dataset_id |
Specific ID occurring in the file hierarchy of the relevant OSF repo. |
path |
Where you want it on your own machine. Will error if directory doesn't exist. |
osf_address |
OSF node ID ("pr6wu" for Peekbank). |
Download raw data for a specific Peekbank dataset from OSF
ds.get_raw_data(lab_dataset_id, path = ".", osf_address = "pr6wu")
lab_dataset_id |
Specific ID occurring in the file hierarchy of the relevant OSF repo. |
path |
Where you want it on your own machine. Will error if directory doesn't exist. |
osf_address |
OSF node ID ("pr6wu" for Peekbank). |
Check if a certain table is required according to schema
ds.is_table_required(table_type, coding_methods)
table_type |
the type of dataframe; for the most up-to-date table types specified by the schema, use the function ds.list_ds_tables() |
coding_methods |
methods used in the experiment for coding gaze data; to get the list of current coding methods, use the function ds.list_coding_methods() |
A boolean value
## Not run:
is_required <- ds.is_table_required(table_type = "xy_timepoints", coding_methods = "manual gaze coding")
## End(Not run)
Get the list of coding methods from the JSON schema file
ds.list_coding_methods()
a list of strings indicating allowed coding methods
## Not run:
coding_methods <- ds.list_coding_methods()
## End(Not run)
List the tables required based on coding method
ds.list_ds_tables(coding_methods = c("eyetracking"))
coding_methods |
a list of strings indicating the methods used in the experiment for coding gaze data; to get the list of current coding methods, use the function ds.list_coding_methods() |
a list of table types that are required based on input coding method
## Not run:
table_list <- ds.list_ds_tables(coding_methods = "manual gaze coding")
## End(Not run)
List the currently allowed language choices for database import
ds.list_language_choices()
a list of strings containing all the allowed language codes based on json schema file
## Not run:
language_list <- ds.list_language_choices()
## End(Not run)
Function for mapping raw data columns to processed table columns
ds.map_columns(raw_data, raw_format, table_type)
raw_data |
raw data frame |
raw_format |
source of the eye-tracking data, e.g. "tobii" |
table_type |
type of processed table, e.g. "xy_data" or "aoi_table" |
processed data frame with specified column names
## Not run:
df_xy_data <- ds.map_columns(raw_data = raw_data, raw_format = "tobii", table_type = "xy_data")
df_aoi_data <- ds.map_columns(raw_data = raw_data, raw_format = "tobii", table_type = "aoi_data")
## End(Not run)
Set the starting point of a given trial to zero
ds.normalize_times(df_table)
df_table |
to-be-resampled dataframe with t, aoi/xy values, trial_id and administration_id |
df_out with resampled time, xy or aoi value rows
Upload processed data for a specific Peekbank dataset to OSF
ds.put_processed_data(token, dataset_name, path = ".", osf_address = "pr6wu")
token |
Personal access token for uploading to OSF |
dataset_name |
Specific dataset name occurring in the file hierarchy of the relevant OSF repo. |
path |
Where the data live on your own machine. |
osf_address |
OSF node ID ("pr6wu" for Peekbank). |
Resample AOI or xy timepoints onto an equally spaced time sequence
ds.resample_times(df_table, table_type)
df_table |
to-be-resampled dataframe with t, aoi/xy values, trial_id and administration_id |
table_type |
table name; can only be "aoi_timepoints" or "xy_timepoints" |
Resampling is done by the following steps:
1. Iterate through every trial for every administration.
2. Create the desired timepoint sequence with equal spacing according to the pre-specified SAMPLE_RATE parameter.
3. Use approxfun to interpolate the given data points onto the desired timepoint sequence. The "constant" interpolation method is used for AOI timepoints and the "linear" method for xy timepoints; for more details on approxfun, see https://stat.ethz.ch/R-manual/R-devel/library/stats/html/approxfun.html
4. After resampling, bind the resampled dataframes back together and re-assign aoi_timepoint_id.
df_out with resampled time, xy or aoi value rows
## Not run:
dir_datasets <- "testdataset" # local datasets dir
lab_dataset_id <- "pomper_saffran_2016"
dir_csv <- file.path(dir_datasets, lab_dataset_id, "processed_data")
table_type <- "aoi_timepoints"
file_csv <- file.path(dir_csv, paste0(table_type, ".csv"))
df_table <- utils::read.csv(file_csv)
df_resampled <- ds.resample_times(df_table, table_type = "aoi_timepoints")
## End(Not run)
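The resampling steps described for ds.resample_times() can be sketched in base R with `approxfun()`. Splitting by trial covers steps 1 and 4. This illustrates the technique under assumptions and is not the package code:

* `SAMPLE_RATE` is set to 40 Hz purely for the example.
* AOI labels are interpolated through integer codes, because `approxfun()` needs numeric input.
* `resample_trial()` is hypothetical and does not re-assign ids.

```r
# Assumed sample rate for illustration only (40 Hz = one sample every 25 ms)
SAMPLE_RATE <- 40

# Hypothetical helper: resample one trial onto an equally spaced time grid
resample_trial <- function(t, value, type = c("aoi", "xy")) {
  type <- match.arg(type)
  t_new <- seq(min(t), max(t), by = 1000 / SAMPLE_RATE)  # step 2
  if (type == "aoi") {
    # Step 3, AOI: interpolate integer codes with the "constant" method, so each
    # new timepoint takes the most recent observed label
    aoi_levels <- unique(value)
    f <- approxfun(t, match(value, aoi_levels), method = "constant", rule = 2)
    data.frame(t = t_new, aoi = aoi_levels[f(t_new)], stringsAsFactors = FALSE)
  } else {
    # Step 3, xy: linear interpolation of a numeric coordinate
    f <- approxfun(t, value, method = "linear")
    data.frame(t = t_new, value = f(t_new))
  }
}

df <- data.frame(
  trial_id = c(1, 1, 1, 1, 2, 2),
  t = c(0, 30, 60, 100, 0, 50),
  aoi = c("target", "target", "distractor", "distractor", "distractor", "target"),
  stringsAsFactors = FALSE
)
# Steps 1 and 4: resample each trial separately, then bind the results together
resampled <- do.call(rbind, lapply(split(df, df$trial_id), function(d) {
  cbind(trial_id = d$trial_id[1], resample_trial(d$t, d$aoi, "aoi"))
}))
```

Interpolating codes with "constant" avoids inventing in-between labels that linear interpolation of codes would produce.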
Set the starting point of a given trial to zero
ds.rezero_times(df_table)
df_table |
to-be-resampled dataframe with t, aoi/xy values, trial_id and administration_id |
df_out with resampled time, xy or aoi value rows
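Setting a trial's starting point to zero amounts to subtracting the trial's first timestamp from every row of that trial. The minimal sketch below makes these assumptions:

* The data frame has `t`, `trial_id`, and `administration_id` columns.
* The result goes into a hypothetical `t_zeroed` column.
* `rezero_times_sketch()` is hypothetical and is not the package function.

```r
# Hypothetical helper: shift each trial so that its first timepoint is 0.
# Assumes columns t (ms), trial_id, and administration_id; the output
# column name t_zeroed is illustrative.
rezero_times_sketch <- function(df) {
  # Minimum t within each administration_id x trial_id group
  trial_start <- ave(df$t, df$administration_id, df$trial_id, FUN = min)
  df$t_zeroed <- df$t - trial_start
  df
}

df <- data.frame(
  administration_id = c(1, 1, 1, 1, 1),
  trial_id = c(10, 10, 10, 11, 11),
  t = c(500, 525, 550, 1200, 1225)
)
rezeroed <- rezero_times_sketch(df)
rezeroed
```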
Check all CSV files against the database schema before database import
ds.validate_for_db_import(dir_csv, cdi_expected, file_ext = ".csv", is_null_field_required = TRUE, suppress_warnings = c())
dir_csv |
the folder directory containing all the csv files, the path should end in "processed_data" |
cdi_expected |
specifies whether cdi_data is to be expected to be present in the imported data |
file_ext |
file extension of the data files (default: ".csv") |
is_null_field_required |
by default is set to TRUE which means that all the columns in the json file are required; when set to FALSE, fields that are allowed null values are not required |
suppress_warnings |
character vector of warning IDs to silence.
Currently supported: |
A list with two elements:
errors: character vector of validation errors (blocking), or NULL if none.
warnings: character vector of validation warnings (suppressible), or NULL if none. Warnings matching suppress_warnings are excluded.
## Not run:
result <- ds.validate_for_db_import(dir_csv = "./processed_data", cdi_expected = TRUE)
result$errors    # blocking issues
result$warnings  # warnings that can be opted out of on a case-by-case basis

# suppress known warnings for a specific dataset
result <- ds.validate_for_db_import(dir_csv = "./processed_data", cdi_expected = TRUE,
                                    suppress_warnings = c("cdi_collision"))
## End(Not run)
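The suppression mechanism can be pictured as dropping the warnings whose IDs appear in `suppress_warnings`, while errors are always kept. The sketch below makes these assumptions:

* Warnings are held in a named list keyed by warning ID, as ds.validate_table() describes.
* `filter_warnings()` is hypothetical.
* The `other_id` entry and both message texts are made up for the example.

```r
# Hypothetical helper: drop warnings whose IDs are listed in suppress_warnings.
# Warnings are assumed to be a named list keyed by warning ID.
filter_warnings <- function(warnings, suppress_warnings = c()) {
  kept <- warnings[!(names(warnings) %in% suppress_warnings)]
  if (length(kept) == 0) NULL else unlist(kept, use.names = FALSE)
}

warnings_found <- list(
  cdi_collision = "example message about a CDI collision",
  other_id = "example message for a second, hypothetical warning ID"
)
result <- list(
  errors = NULL,  # errors are never suppressed
  warnings = filter_warnings(warnings_found, suppress_warnings = c("cdi_collision"))
)
result$warnings
```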
Check whether a dataframe/table complies with the peekbank JSON schema before database import
ds.validate_table(df_table, table_type, cdi_expected, dir_csv, is_null_field_required = TRUE)
df_table |
the dataframe to be saved |
table_type |
the type of dataframe; for the most up-to-date table types specified by the schema, use the function ds.list_ds_tables() |
cdi_expected |
specifies whether cdi_data is to be expected to be present in the imported data; only relevant for subjects table |
dir_csv |
the folder directory containing all the csv files, used for stimulus image path validation |
is_null_field_required |
by default is set to TRUE which means that all the columns in the json file are required; when user specifically sets this to FALSE, then the fields that are allowed null values are not required. |
A list with two elements:
errors: character vector of validation errors (blocking), or NULL if none.
warnings: named list of validation warnings (suppressible). Names are warning IDs (e.g. "cdi_collision").
## Not run:
result <- ds.validate_table(df_table = df_table, table_type = "xy_data", cdi_expected = FALSE, dir_csv = "../processed_data")
result$errors    # blocking issues
result$warnings  # warnings that can be opted out of on a case-by-case basis
## End(Not run)
Check that, within the aoi_timepoints table, the administration_ids associated with each individual trial_id are not duplicated
ds.validate_trial_uniqueness_constraint(df_aoi_timepoints)
df_aoi_timepoints |
the aoi_timepoints dataframe |
An empty string when all the administration_ids are unique within each trial_id; otherwise, the error message.
## Not run:
is_valid <- ds.validate_trial_uniqueness_constraint(df_aoi_timepoints = aoi_timepoints)
## End(Not run)
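This sketch reads the constraint as "each trial_id is linked to exactly one administration_id". Under that reading, a base R version of the check might look like the code below. `check_trial_uniqueness()` and its message text are hypothetical, not the package's.

```r
# Hypothetical check: every trial_id should map to exactly one administration_id.
# Returns "" when the constraint holds, otherwise an error message.
check_trial_uniqueness <- function(df_aoi_timepoints) {
  # Number of distinct administration_ids per trial_id
  n_admins <- tapply(df_aoi_timepoints$administration_id,
                     df_aoi_timepoints$trial_id,
                     function(ids) length(unique(ids)))
  offending <- names(n_admins)[n_admins > 1]
  if (length(offending) == 0) {
    return("")
  }
  paste("trial_ids linked to more than one administration_id:",
        paste(offending, collapse = ", "))
}

ok <- data.frame(trial_id = c(1, 1, 2), administration_id = c(7, 7, 8))
bad <- data.frame(trial_id = c(1, 1, 2), administration_id = c(7, 9, 8))
check_trial_uniqueness(ok)
check_trial_uniqueness(bad)
```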
Check if a file exists with exact case sensitivity
file.exists.case.sensitive(...)
... |
character vectors, containing file paths |
logical value: TRUE if the file exists with the exact same case, FALSE otherwise
## Not run:
exists <- file.exists.case.sensitive("path/to/image.jpg")
## End(Not run)
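A case-sensitive check matters on file systems that ignore case, such as the macOS and Windows defaults. On those systems `file.exists()` reports a match even when the case differs. A minimal sketch compares the final path component against the directory listing. Its limits:

* `file_exists_case_sensitive_sketch()` is hypothetical.
* It checks only the case of the file name, not the case of parent directories.

```r
# Hypothetical helper: TRUE only if the file exists and its name matches the
# directory listing exactly, including case. Parent directories are not checked.
file_exists_case_sensitive_sketch <- function(paths) {
  vapply(paths, function(p) {
    file.exists(p) &&
      basename(p) %in% list.files(dirname(p), all.files = TRUE)
  }, logical(1), USE.NAMES = FALSE)
}

stim_dir <- tempfile("stimuli_")
dir.create(stim_dir)
invisible(file.create(file.path(stim_dir, "Image.JPG")))
file_exists_case_sensitive_sketch(file.path(stim_dir, c("Image.JPG", "image.jpg")))
```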
Get administrations
get_administrations(age = NULL, dataset_id = NULL, dataset_name = NULL, connection = NULL)
age |
A numeric vector of a single age or a min age and max age (inclusive), in months |
dataset_id |
An integer vector of one or more dataset ids |
dataset_name |
A character vector of one or more dataset names |
connection |
A connection to the peekbank database |
A 'tbl' of Administrations data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.
## Not run:
get_administrations()
get_administrations(age = c())
get_administrations(dataset_name = "pomper_saffran_2016")
## End(Not run)
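The `age` argument accepts either one value or an inclusive range. The hypothetical helper below sketches that filtering logic on a local data frame. Two details are assumptions: a single age is treated as an exact match, and rows with a missing age are dropped. A NULL (or empty) `age` applies no filter.

```r
# Hypothetical helper: filter a local data frame by a single age or an
# inclusive c(min, max) range, in months. Exact matching for a single age is
# an assumption; rows with a missing age are dropped.
filter_by_age <- function(df, age = NULL) {
  if (is.null(age)) {
    return(df)
  }
  keep <- if (length(age) == 1) {
    df$age == age
  } else if (length(age) == 2) {
    df$age >= age[1] & df$age <= age[2]
  } else {
    stop("`age` must be a single value or c(min, max)")
  }
  df[which(keep), , drop = FALSE]  # which() drops NA comparisons
}

administrations <- data.frame(administration_id = 1:5, age = c(10, 12, 18, 24, NA))
filter_by_age(administrations, c(12, 18))
```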
Get AOI region sets
get_aoi_region_sets(connection = NULL)
connection |
A connection to the peekbank database |
A 'tbl' of AOI Region Sets data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.
## Not run:
get_aoi_region_sets()
## End(Not run)
Get AOI timepoints
get_aoi_timepoints(dataset_id = NULL, dataset_name = NULL, age = NULL, rle = TRUE, connection = NULL)
dataset_id |
An integer vector of one or more dataset ids |
dataset_name |
A character vector of one or more dataset names |
age |
A numeric vector of a single age or a min age and max age (inclusive), in months |
rle |
Logical indicating whether to use the run-length encoded (RLE) data representation (default: TRUE) |
connection |
A connection to the peekbank database |
A 'tbl' of AOI Timepoints data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.
## Not run:
get_aoi_timepoints(dataset_name = "pomper_saffran_2016")
## End(Not run)
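With `rle = TRUE`, AOI timepoints use a run-length encoded representation. Base R's `rle()` and `inverse.rle()` show what that encoding does to a sequence of AOI labels. How peekbankr lays out the runs in its returned table is not described here, so this only illustrates the encoding itself.

```r
# A sequence of AOI labels, one per timepoint
aoi <- c("target", "target", "target", "distractor", "distractor", "target")

encoded <- rle(aoi)   # run lengths plus the value of each run
encoded$lengths       # 3 2 1
encoded$values        # "target" "distractor" "target"

decoded <- inverse.rle(encoded)  # expands the runs back to one entry per timepoint
identical(decoded, aoi)          # TRUE
```

Long stretches of identical AOI values are common in looking-while-listening data, which is why storing runs instead of individual rows can be much more compact.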
Get datasets
get_datasets(connection = NULL)
connection |
A connection to the peekbank database |
A 'tbl' of Datasets data. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.
## Not run:
get_datasets()
## End(Not run)
Get information on database connection options
get_db_info()
List of database info: host name, current version, supported versions, historical versions, username, password
## Not run:
get_db_info()
## End(Not run)
Downloads README files for Peekbank datasets from OSF. Note that READMEs always reflect the latest version of the dataset on OSF.
get_readmes(datasets = c(), local_base_dir = "dataset_readmes")
datasets |
Character vector of dataset names. If empty (default), downloads READMEs for all datasets. |
local_base_dir |
Directory to save README files to (default: "dataset_readmes") |
No return value, called for side effects. README files are saved to the specified directory and the path is printed via message.
## Not run:
get_readmes()
get_readmes(datasets = c("pomper_saffran_2016"))
## End(Not run)
Run a SQL query on the Peekbank database
get_sql_query(sql_query_string, connection = NULL)
sql_query_string |
A character string containing a valid SQL query |
connection |
A connection to the Peekbank database |
The result of running the supplied SQL query against the database
## Not run:
get_sql_query("SELECT * FROM datasets")
## End(Not run)
Get stimuli
get_stimuli(dataset_id = NULL, dataset_name = NULL, connection = NULL)
dataset_id |
An integer vector of one or more dataset ids |
dataset_name |
A character vector of one or more dataset names |
connection |
A connection to the peekbank database |
A 'tbl' of Stimuli data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.
## Not run:
get_stimuli()
get_stimuli(dataset_name = "pomper_saffran_2016")
## End(Not run)
Get subjects
get_subjects(connection = NULL)
connection |
A connection to the peekbank database |
A 'tbl' of Subjects data. Note that Subjects is a table used to link longitudinal Administrations, which is the primary table you probably want. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.
## Not run:
get_subjects()
## End(Not run)
Get trial types
get_trial_types(dataset_id = NULL, dataset_name = NULL, connection = NULL)
dataset_id |
An integer vector of one or more dataset ids |
dataset_name |
A character vector of one or more dataset names |
connection |
A connection to the peekbank database |
A 'tbl' of Trial Types data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.
## Not run:
get_trial_types()
get_trial_types(dataset_name = "pomper_saffran_2016")
## End(Not run)
Get trials
get_trials(dataset_id = NULL, dataset_name = NULL, connection = NULL)
dataset_id |
An integer vector of one or more dataset ids |
dataset_name |
A character vector of one or more dataset names |
connection |
A connection to the peekbank database |
A 'tbl' of Trials data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.
## Not run:
get_trials()
get_trials(dataset_name = "pomper_saffran_2016")
## End(Not run)
Get XY timepoints
get_xy_timepoints(dataset_id = NULL, dataset_name = NULL, age = NULL, connection = NULL)
dataset_id |
An integer vector of one or more dataset ids |
dataset_name |
A character vector of one or more dataset names |
age |
A numeric vector of a single age or a min age and max age (inclusive), in months |
connection |
A connection to the peekbank database |
A 'tbl' of XY timepoints data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.
## Not run:
get_xy_timepoints(dataset_name = "reflook_v4")
## End(Not run)
List of peekbank tables
list_peekbank_tables(connection)
connection |
A connection to the peekbank database |
A vector of the names of tables in peekbank
## Not run:
con <- connect_to_peekbank()
list_peekbank_tables(con)
## End(Not run)
Populate the provided cdi data with percentile values for that specific age, instrument_type, measure and language. Loosely based on the work from this repo https://github.com/kachergis/cdi-percentiles/tree/main by George Kachergis and Jess Mankewitz with advice from Virginia Marchman.
populate_cdi_percentiles(subjects_table)
subjects_table |
a subjects table with unnested cdi data, needs columns "subject_id", "language", "instrument_type", "age", "sex", "measure", "rawscore" |
The input table with added columns containing the reference age used, the reference year used, and both gender-specific and general percentile values for the CDI score
## Not run:
library(dplyr)
library(tidyr)
full_cdi_data <- all_subjects %>%
  unnest(subject_aux_data) %>%
  filter(!is.na(cdi_responses)) %>%
  unnest(cdi_responses) %>%
  peekbankr::cleanup_cdi_data() %>%
  peekbankr::populate_cdi_percentiles()
## End(Not run)
Unpack the JSON string in the *_aux_data column and turn it into a nested R list
unpack_aux_data(df)
df |
a dataframe in the peekbank format that has an aux data column |
the input dataframe, with the *_aux_data column unpacked
## Not run:
subjects_table <- unpack_aux_data(df = subjects_table)
## End(Not run)