Package 'peekbankr'

Title: Accessing the Peekbank Database and working with Peekbank data
Description: Collection of tools for working with Peekbank, an open repository for developmental eye-tracking data.
Authors: Mika Braginsky [aut, cre], Kyle MacDonald [aut], Michael Frank [aut], Linger Xu [aut], Adrian Steffan [aut]
Maintainer: Mika Braginsky <[email protected]>
License: GPL-3
Version: 0.3.6.2
Built: 2026-06-03 18:53:20 UTC
Source: https://github.com/langcog/peekbankr


Adds a relative CDI score: the percentage of the total achievable points that the subject obtained on each measure

Description

Adds a relative CDI score: the percentage of the total achievable points that the subject obtained on each measure

Usage

append_relative_cdi_scores(subjects_table)

Arguments

subjects_table

a subjects table with unnested cdi data, needs columns "subject_id", "language", "instrument_type", "measure", "rawscore"

Value

the input table with an added "cdi_relative" column that contains the percentage of total points gained in the given administrations

Examples

## Not run: 
cdi_data <- all_subjects %>%
  unnest(subject_aux_data) %>%
  filter(!is.na(cdi_responses)) %>%
  unnest(cdi_responses) %>%
  append_relative_cdi_scores()

## End(Not run)
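
As a rough illustration of what a relative score means here, the following base R sketch divides each raw score by the maximum achievable score for its instrument and measure. The `max_points` lookup table, its values, and the 0 to 100 scale are assumptions made for this sketch; they are not the reference values or the scale used by peekbankr.

```r
# Toy CDI rows; column names follow the argument description above
cdi <- data.frame(
  subject_id = c(1, 1, 2),
  language = "English (American)",
  instrument_type = c("ws", "ws", "wg"),
  measure = c("prod", "prod", "comp"),
  rawscore = c(340, 680, 99)
)

# HYPOTHETICAL maxima per instrument and measure (placeholder values only)
max_points <- data.frame(
  instrument_type = c("ws", "wg"),
  measure = c("prod", "comp"),
  max_score = c(680, 396)
)

# Attach the maximum to each row, then express the raw score as a percentage
scored <- merge(cdi, max_points, by = c("instrument_type", "measure"))
scored$cdi_relative <- 100 * scored$rawscore / scored$max_score
```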

Checks cdi data for inconsistencies, warns about them, and fixes them

Description

Checks cdi data for inconsistencies, warns about them, and fixes them

Usage

cleanup_cdi_data(cdi_data)

Arguments

cdi_data

a subjects table with unnested cdi data, needs columns "subject_id", "language", "instrument_type", "age", "sex", "measure", "rawscore"

Value

a cleaned up version of the cdi data

Examples

## Not run: 
clean_cdi_data <- all_subjects %>%
  unnest(subject_aux_data) %>%
  filter(!is.na(cdi_responses)) %>%
  unnest(cdi_responses) %>%
  peekbankr::cleanup_cdi_data()

## End(Not run)

Connect to Peekbank

Description

Connect to Peekbank

Usage

connect_to_peekbank(
  db_version = "current",
  db_args = NULL,
  compress = TRUE,
  host = NULL,
  port = NULL,
  ssl = "auto"
)

Arguments

db_version

Name of the database version to use, given as a string

db_args

List with host, user, and password defined

compress

Flag to use compression protocol (defaults to TRUE)

host

Hostname of the Peekbank server to connect to (defaults to the hosted Peekbank instance)

port

Port of the Peekbank DB to connect to (defaults to 3307)

ssl

How to handle TLS. One of:

* "auto" (default): verify against the shipped CA for the hosted Peekbank instance; skip TLS for connections to '127.0.0.1', 'localhost', or '::1'; otherwise leave it to the connector defaults.

* "disabled": do not require TLS (useful for plaintext local servers when running on RMariaDB versions that would otherwise refuse the connection).

* Path to a PEM file: use that file as the CA to verify the server certificate (useful for self-hosted deployments with their own self-signed certificate).

Value

A DBIConnection object for the Peekbank database

Examples

## Not run: 
con <- connect_to_peekbank(db_version = "current", db_args = NULL)
DBI::dbDisconnect(con)

## End(Not run)
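
The three documented `ssl` options can be read as a small decision table. The sketch below spells that table out in base R. `resolve_ssl()`, its return labels, and the placeholder host name `hosted.example.org` are all hypothetical and exist only for this illustration; the sketch does not show how peekbankr configures the connector.

```r
# Hypothetical helper mirroring the documented `ssl` options; not peekbankr code
resolve_ssl <- function(host, ssl = "auto", hosted_host = "hosted.example.org") {
  loopback <- c("127.0.0.1", "localhost", "::1")
  if (identical(ssl, "disabled")) {
    return("no-tls")
  }
  if (identical(ssl, "auto")) {
    if (host %in% loopback) return("no-tls")                    # local server
    if (identical(host, hosted_host)) return("verify-shipped-ca")
    return("connector-default")                                 # any other host
  }
  # Anything else is treated as a path to a PEM file holding the CA
  if (!file.exists(ssl)) stop("CA file not found: ", ssl)
  paste0("verify-ca:", ssl)
}

local_mode <- resolve_ssl("localhost")
hosted_mode <- resolve_ssl("hosted.example.org")
other_mode <- resolve_ssl("db.lab.example")
```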

Download a list of files from OSF and recreate folder structure locally

Description

Download a list of files from OSF and recreate folder structure locally

Usage

download_osf_files(
  file_paths,
  osf_node_id = "pr6wu",
  local_base_dir = "data",
  debug = F,
  skip_existing = TRUE,
  max_retries = 3,
  retry_delay = 5
)

Arguments

file_paths

A character vector of file paths on OSF to download

osf_node_id

The OSF node ID where the files are stored (default: "pr6wu")

local_base_dir

Base directory to save files locally (default: "data")

debug

Logical, whether to print debugging information (default: FALSE)

skip_existing

Logical, skip downloading a file if a file with that name already exists in that path locally

max_retries

Maximum number of retry attempts for server errors (default: 3)

retry_delay

Delay in seconds between retry attempts (default: 5)

Value

returns paths to downloaded files

Examples

## Not run: 
# Download multiple files from OSF
download_osf_files(
  file_paths = c(
    "lab1/raw_data/file1.csv",
    "lab2/processed_data/file2.csv"
  ),
  osf_node_id = "pr6wu"
)

## End(Not run)
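
To make the roles of `max_retries` and `retry_delay` concrete, here is a minimal retry loop in base R. `with_retries()` and the `fetch` callback are hypothetical stand-ins for whatever performs a single download. The sketch also assumes that `max_retries` counts retries after the first attempt, which the documentation above does not state.

```r
# Hypothetical retry wrapper; `fetch` stands in for one download attempt
with_retries <- function(fetch, max_retries = 3, retry_delay = 5) {
  attempt <- 0
  repeat {
    attempt <- attempt + 1
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) return(result)   # success
    if (attempt > max_retries) stop(result)          # out of retries: re-raise
    message("Attempt ", attempt, " failed; retrying in ", retry_delay, " s")
    Sys.sleep(retry_delay)
  }
}

# A fake download that fails twice before succeeding
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("server error")
  "data/lab1/raw_data/file1.csv"
}
path <- with_retries(flaky, max_retries = 3, retry_delay = 0)
```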

Download stimulus images from OSF for Peekbank repository

Description

This function downloads stimulus images for selected Peekbank datasets from OSF. It retrieves stimulus metadata from a Peekbank database connection, constructs the full paths to the stimulus images on OSF, and downloads them to a local directory.

Usage

download_stimuli(
  con,
  local_base_dir = "stimulus_data",
  datasets = c(),
  skip_existing = T,
  debug = F,
  max_retries = 3,
  retry_delay = 5
)

Arguments

con

A database connection object created by connect_to_peekbank()

local_base_dir

Local directory path where stimulus images will be saved (default: "stimulus_data")

datasets

Character vector of dataset names to download stimuli for. If empty (default), downloads stimuli for all datasets.

skip_existing

Logical, skip downloading a file if a file with that name already exists in that path locally

debug

Logical, whether to print debugging information

max_retries

Maximum number of retry attempts for server errors (default: 3)

retry_delay

Delay in seconds between retry attempts (default: 5)

Value

Returns the stimulus data frame with an additional column for the paths of the downloaded stimuli

Examples

## Not run: 
con <- connect_to_peekbank("2025.1")

# Download stimuli for all datasets
download_stimuli(con, local_base_dir = "stimulus_data")

# Download stimuli for specific datasets
download_stimuli(con, local_base_dir = "stimulus_data", datasets = c("reflook_v4", "reflook_socword"))

## End(Not run)

Add AOIs to an xy dataframe

Description

Add AOIs to an xy dataframe

Usage

ds.add_aois(xy_joined)

Arguments

xy_joined

dataframe containing processed xy timepoints with aoi region sets information

Value

dataframe with two added columns, 'side' and 'aoi'. 'side' only contains the values "left" or "right"; 'aoi' indicates whether this xy timepoint is a look to the "target" or the "distractor"
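
The following base R sketch shows one way such an assignment could work: a gaze sample is tested against rectangular left and right regions, and the matching side is compared with the target side. All column names (`l_x_min`, `target_side`, and so on), the toy coordinates, and the use of NA for samples outside both regions are assumptions made for illustration; the actual function may differ.

```r
# Toy gaze samples joined with AOI bounds; all column names are assumptions
xy_joined <- data.frame(
  x = c(200, 900, 560),
  y = c(400, 420, 50),
  l_x_min = 100, l_x_max = 400, l_y_min = 300, l_y_max = 600,
  r_x_min = 800, r_x_max = 1100, r_y_min = 300, r_y_max = 600,
  target_side = c("left", "left", "right")
)

# Point-in-rectangle tests for the left and right regions
in_left <- with(xy_joined, x >= l_x_min & x <= l_x_max & y >= l_y_min & y <= l_y_max)
in_right <- with(xy_joined, x >= r_x_min & x <= r_x_max & y >= r_y_min & y <= r_y_max)

xy_joined$side <- ifelse(in_left, "left", ifelse(in_right, "right", NA_character_))

# A look counts as "target" when the fixated side matches the target side
xy_joined$aoi <- ifelse(
  is.na(xy_joined$side), NA_character_,
  ifelse(xy_joined$side == xy_joined$target_side, "target", "distractor")
)
```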


Fetching the list of field names and requirements in each table according to the schema json file

Description

Fetching the list of field names and requirements in each table according to the schema json file

Usage

ds.get_json_fields(table_type)

Arguments

table_type

the type of dataframe; for the most up-to-date table types specified by the schema, use the function ds.list_ds_tables()

Value

the list of field names

Examples

## Not run: 
fields_json <- ds.get_json_fields(table_type = "aoi_timepoints")

## End(Not run)

Parse the JSON schema file from the Peekbank GitHub repository into a dataframe

Description

Parse the JSON schema file from the Peekbank GitHub repository into a dataframe

Usage

ds.get_peekjson()

Value

the organized dataframe from schema json file

Examples

## Not run: 
peekjson <- ds.get_peekjson()

## End(Not run)

Download peekbank processed dataset from OSF

Description

Download peekbank processed dataset from OSF

Usage

ds.get_processed_data(lab_dataset_id, path = ".", osf_address = "pr6wu")

Arguments

lab_dataset_id

Specific ID occurring in the file hierarchy of the relevant OSF repo.

path

Where you want it on your own machine. Will error if directory doesn't exist.

osf_address

pr6wu for peekbank.


Download specific peekbank dataset from OSF

Description

Download specific peekbank dataset from OSF

Usage

ds.get_raw_data(lab_dataset_id, path = ".", osf_address = "pr6wu")

Arguments

lab_dataset_id

Specific ID occurring in the file hierarchy of the relevant OSF repo.

path

Where you want it on your own machine. Will error if directory doesn't exist.

osf_address

pr6wu for peekbank.


Check if a certain table is required according to schema

Description

Check if a certain table is required according to schema

Usage

ds.is_table_required(table_type, coding_methods)

Arguments

table_type

the type of dataframe; for the most up-to-date table types specified by the schema, use the function ds.list_ds_tables()

coding_methods

methods used in the experiment for coding gaze data, to get the list of current coding methods, please use function ds.list_coding_methods()

Value

A boolean value

Examples

## Not run: 
is_required <- ds.is_table_required(table_type = "xy_timepoints",
                                    coding_methods = "manual gaze coding")

## End(Not run)

Get the coding method list from json schema file

Description

Get the coding method list from json schema file

Usage

ds.list_coding_methods()

Value

a list of strings indicating allowed coding methods

Examples

## Not run: 
coding_methods <- ds.list_coding_methods()

## End(Not run)

List the tables required based on coding method

Description

List the tables required based on coding method

Usage

ds.list_ds_tables(coding_methods = c("eyetracking"))

Arguments

coding_methods

a list of strings indicating the methods used in the experiment for coding gaze data; to get the list of current coding methods, use the function ds.list_coding_methods()

Value

a list of table types that are required based on input coding method

Examples

## Not run: 
table_list <- ds.list_ds_tables(coding_methods = "manual gaze coding")

## End(Not run)

List current allowed language choices for db import

Description

List current allowed language choices for db import

Usage

ds.list_language_choices()

Value

a list of strings containing all the allowed language codes based on json schema file

Examples

## Not run: 
language_list <- ds.list_language_choices()

## End(Not run)

Function for mapping raw data columns to processed table columns

Description

Function for mapping raw data columns to processed table columns

Usage

ds.map_columns(raw_data, raw_format, table_type)

Arguments

raw_data

raw data frame

raw_format

source of the eye-tracking data, e.g. "tobii"

table_type

type of processed table, e.g. "xy_data" | "aoi_table"

Value

processed data frame with specified column names

Examples

## Not run: 
df_xy_data <- ds.map_columns(raw_data = raw_data, raw_format = "tobii",
                             table_type = "xy_data")
df_aoi_data <- ds.map_columns(raw_data = raw_data, raw_format = "tobii",
                              table_type = "aoi_data")

## End(Not run)

sets the starting point of a given trial to be zero

Description

sets the starting point of a given trial to be zero

Usage

ds.normalize_times(df_table)

Arguments

df_table

to-be-resampled dataframe with t, aoi/xy values, trial_id and administration_id

Value

df_out with resampled time, xy or aoi value rows


Put processed data for specific peekbank dataset on OSF

Description

Put processed data for specific peekbank dataset on OSF

Usage

ds.put_processed_data(token, dataset_name, path = ".", osf_address = "pr6wu")

Arguments

token

personal access token for uploading to OSF

dataset_name

Specific dataset name occurring in the file hierarchy of the relevant OSF repo.

path

Where the data live on your own machine.

osf_address

pr6wu for peekbank.


Resample times to be consistent across labs

Description

Resamples times so that they are consistent across labs. The individual steps are listed under Details.

Usage

ds.resample_times(df_table, table_type)

Arguments

df_table

to-be-resampled dataframe with t, aoi/xy values, trial_id and administration_id

table_type

table name, can only be "aoi_timepoints" or "xy_timepoints"

Details

1. iterate through every trial for every administration

2. create desired timepoint sequence with equal spacing according to pre-specified SAMPLE_RATE parameter

3. use approxfun to interpolate the given data points to align with the desired timepoint sequence. The "constant" interpolation method is used for AOI timepoints; the "linear" interpolation method is used for xy timepoints. For more details on approxfun, see: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/approxfun.html

4. after resampling, bind resampled dataframes back together and re-assign aoi_timepoint_id

Value

df_out with resampled time, xy or aoi value rows

Examples

## Not run: 
dir_datasets <- "testdataset" # local datasets dir
lab_dataset_id <- "pomper_saffran_2016"
dir_csv <- file.path(dir_datasets, lab_dataset_id, "processed_data")
table_type <- "aoi_timepoints"
file_csv <- file.path(dir_csv, paste0(table_type, '.csv'))
df_table <- utils::read.csv(file_csv)
df_resampled <- ds.resample_times(df_table, table_type = "aoi_timepoints")

## End(Not run)
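
The interpolation step described under Details can be tried on a single toy trial with stats::approxfun alone. In the sketch below, the 40 Hz target rate, the toy timestamps, and all variable names are assumptions made for illustration; the value of SAMPLE_RATE used by peekbankr is not stated above. AOI labels are mapped to integer codes first because approxfun only interpolates numeric values.

```r
# ASSUMED target sample rate in Hz (not read from peekbankr)
sample_rate <- 40
step_ms <- 1000 / sample_rate  # spacing of the desired timepoint sequence

# One toy trial with unevenly spaced observations
t_obs <- c(0, 17, 33, 50, 67, 83, 100)
x_obs <- c(100, 110, 120, 130, 140, 150, 160)
aoi_obs <- c("target", "target", "target",
             "distractor", "distractor", "distractor", "target")

# Desired, equally spaced timepoints
t_new <- seq(from = min(t_obs), to = max(t_obs), by = step_ms)

# xy values: linear interpolation between neighbouring samples
x_fun <- stats::approxfun(t_obs, x_obs, method = "linear")
x_new <- x_fun(t_new)

# AOI values: constant interpolation carries the last observed label forward
aoi_levels <- unique(aoi_obs)
aoi_codes <- match(aoi_obs, aoi_levels)
aoi_fun <- stats::approxfun(t_obs, aoi_codes, method = "constant", f = 0)
aoi_new <- aoi_levels[aoi_fun(t_new)]
```

Constant interpolation avoids inventing in-between categories for AOI labels, while linear interpolation gives smoother estimates for continuous gaze coordinates.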

sets the starting point of a given trial to be zero

Description

sets the starting point of a given trial to be zero

Usage

ds.rezero_times(df_table)

Arguments

df_table

to-be-resampled dataframe with t, aoi/xy values, trial_id and administration_id

Value

df_out with resampled time, xy or aoi value rows
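
Setting the start of each trial to zero amounts to subtracting the trial's first timestamp from every sample in that trial. The base R sketch below does this per administration and trial. The output column name `t_zeroed` and the toy timestamps are assumptions made for illustration.

```r
# Toy timepoints for two trials of one administration
df_table <- data.frame(
  administration_id = c(1, 1, 1, 1, 1, 1),
  trial_id = c(10, 10, 10, 11, 11, 11),
  t = c(5200, 5225, 5250, 9000, 9025, 9050)
)

# Earliest timestamp within each administration/trial combination
trial_start <- ave(df_table$t, df_table$administration_id, df_table$trial_id,
                   FUN = min)

# Shift every trial so that it starts at zero (column name is an assumption)
df_table$t_zeroed <- df_table$t - trial_start
```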


check all csv files against database schema for database import

Description

check all csv files against database schema for database import

Usage

ds.validate_for_db_import(
  dir_csv,
  cdi_expected,
  file_ext = ".csv",
  is_null_field_required = TRUE,
  suppress_warnings = c()
)

Arguments

dir_csv

the folder directory containing all the csv files, the path should end in "processed_data"

cdi_expected

specifies whether cdi_data is to be expected to be present in the imported data

file_ext

the default is ".csv"

is_null_field_required

by default is set to TRUE which means that all the columns in the json file are required; when set to FALSE, fields that are allowed null values are not required

suppress_warnings

character vector of warning IDs to silence. Currently supported: "cdi_collision".

Value

A list with two elements:

errors

Character vector of validation errors (blocking), or NULL if none.

warnings

Character vector of validation warnings (suppressible), or NULL if none. Warnings matching suppress_warnings are excluded.

Examples

## Not run: 
result <- ds.validate_for_db_import(dir_csv = "./processed_data", cdi_expected = TRUE)
result$errors    # blocking issues
result$warnings  # warnings that can be opted out of on a case by case basis

# suppress known warnings for a specific dataset
result <- ds.validate_for_db_import(dir_csv = "./processed_data", cdi_expected = TRUE,
                                    suppress_warnings = c("cdi_collision"))

## End(Not run)

Check if a dataframe/table is compliant to peekbank json before database import

Description

Check if a dataframe/table is compliant to peekbank json before database import

Usage

ds.validate_table(
  df_table,
  table_type,
  cdi_expected,
  dir_csv,
  is_null_field_required = TRUE
)

Arguments

df_table

the dataframe to be saved

table_type

the type of dataframe, for the most updated table types specified by schema, please use functionds.list_ds_tables()

cdi_expected

specifies whether cdi_data is to be expected to be present in the imported data; only relevant for subjects table

dir_csv

the folder directory containing all the csv files, used for stimulus image path validation

is_null_field_required

by default is set to TRUE which means that all the columns in the json file are required; when user specifically sets this to FALSE, then the fields that are allowed null values are not required.

Value

A list with two elements:

errors

Character vector of validation errors (blocking), or NULL if none.

warnings

Named list of validation warnings (suppressible). Names are warning IDs (e.g. "cdi_collision").

Examples

## Not run: 
result <- ds.validate_table(df_table = df_table, table_type = "xy_data", cdi_expected = F, dir_csv = "../processed_data")
result$errors    # blocking issues
result$warnings  # warnings that can be opted out of on a case by case basis

## End(Not run)
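
Because warnings are keyed by ID, a caller can stop on errors and drop the warning IDs it has decided to accept. The base R sketch below works on a hand-built result list shaped like the documented return value. The warning texts and the ID `other_check` are made up for illustration, and the filtering shown is an assumption about how suppression could be done, not the package's own code.

```r
# Hypothetical validation result shaped like the documented return value
result <- list(
  errors = NULL,
  warnings = list(
    cdi_collision = "placeholder text for a CDI collision warning",
    other_check = "placeholder warning with a made-up ID"
  )
)
suppress_warnings <- c("cdi_collision")

# Errors are blocking: stop before looking at warnings
if (length(result$errors) > 0) {
  stop("Validation failed:\n", paste(result$errors, collapse = "\n"))
}

# Keep only the warnings whose ID has not been suppressed
remaining <- result$warnings[!names(result$warnings) %in% suppress_warnings]
for (id in names(remaining)) {
  message("[", id, "] ", remaining[[id]])
}
```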

Check that, within the aoi_timepoints table, the administration_ids associated with each individual trial_id contain no duplicates

Description

Check that, within the aoi_timepoints table, the administration_ids associated with each individual trial_id contain no duplicates

Usage

ds.validate_trial_uniqueness_constraint(df_aoi_timepoints)

Arguments

df_aoi_timepoints

the aoi_timepoints dataframe

Value

an empty string when all the administration_ids are unique within each trial_id; otherwise, an error message

Examples

## Not run: 
is_valid <- ds.validate_trial_uniqueness_constraint(df_aoi_timepoints = aoi_timepoints)

## End(Not run)
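
One possible reading of this constraint is that every trial_id should belong to exactly one administration_id. The base R sketch below checks that reading and follows the documented convention of returning an empty string on success. `check_trial_uniqueness()` is a hypothetical name, and the interpretation of the constraint is an assumption.

```r
# Hypothetical check: each trial_id may map to only one administration_id
check_trial_uniqueness <- function(df_aoi_timepoints) {
  n_admins <- tapply(
    df_aoi_timepoints$administration_id,
    df_aoi_timepoints$trial_id,
    function(ids) length(unique(ids))
  )
  offending <- names(n_admins)[n_admins > 1]
  if (length(offending) == 0) return("")
  paste0("trial_id(s) linked to more than one administration_id: ",
         paste(offending, collapse = ", "))
}

ok <- data.frame(trial_id = c(10, 10, 11), administration_id = c(1, 1, 2))
bad <- data.frame(trial_id = c(10, 11, 11), administration_id = c(1, 1, 2))

msg_ok <- check_trial_uniqueness(ok)
msg_bad <- check_trial_uniqueness(bad)
```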

Check if a file exists with exact case sensitivity

Description

Check if a file exists with exact case sensitivity

Usage

file.exists.case.sensitive(...)

Arguments

...

character vectors, containing file paths

Value

logical value: TRUE if the file exists with the exact same case, FALSE otherwise

Examples

## Not run: 
exists <- file.exists.case.sensitive("path/to/image.jpg")

## End(Not run)
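
On case-insensitive file systems, file.exists() alone cannot tell "Image.JPG" from "image.jpg". A minimal base R sketch of an exact-case check compares the requested file name against the directory listing. `file_exists_exact_case()` is a hypothetical helper that handles one path and checks only the final path component; it is not the package's implementation.

```r
# Hypothetical helper: TRUE only if the file name matches the listing exactly
file_exists_exact_case <- function(path) {
  if (!file.exists(path)) return(FALSE)
  basename(path) %in% list.files(dirname(path), all.files = TRUE)
}

# Demonstration in a temporary directory
demo_dir <- tempfile("stimuli_")
dir.create(demo_dir)
file.create(file.path(demo_dir, "Image.JPG"))

exact <- file_exists_exact_case(file.path(demo_dir, "Image.JPG"))
wrong_case <- file_exists_exact_case(file.path(demo_dir, "image.jpg"))
```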

Get administrations

Description

Get administrations

Usage

get_administrations(
  age = NULL,
  dataset_id = NULL,
  dataset_name = NULL,
  connection = NULL
)

Arguments

age

A numeric vector of a single age or a min age and max age (inclusive), in months

dataset_id

An integer vector of one or more dataset ids

dataset_name

A character vector of one or more dataset names

connection

A connection to the peekbank database

Value

A 'tbl' of Administrations data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_administrations()
get_administrations(age = c())
get_administrations(dataset_name = "pomper_saffran_2016")

## End(Not run)
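
The `age` argument accepts either a single age or an inclusive minimum and maximum. The base R sketch below applies that rule to a local toy table. `filter_by_age()` is a hypothetical helper; treating a single age as an exact match is an assumption about the intended semantics.

```r
# Hypothetical local equivalent of the documented `age` filter
filter_by_age <- function(administrations, age = NULL) {
  if (is.null(age)) return(administrations)      # no filter requested
  if (length(age) == 1) age <- c(age, age)       # single age: min equals max
  if (length(age) != 2) stop("`age` must have length 1 or 2")
  keep <- administrations$age >= age[1] & administrations$age <= age[2]
  administrations[keep, , drop = FALSE]
}

admins <- data.frame(administration_id = 1:5, age = c(12, 18, 24, 30, 36))

one_age <- filter_by_age(admins, age = 24)
age_range <- filter_by_age(admins, age = c(18, 30))
```

Since `c()` evaluates to NULL in R, passing an empty vector leaves the table unfiltered in this sketch.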

Get AOI region sets

Description

Get AOI region sets

Usage

get_aoi_region_sets(connection = NULL)

Arguments

connection

A connection to the peekbank database

Value

A 'tbl' of AOI Region Sets data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_aoi_region_sets()

## End(Not run)

Get AOI timepoints

Description

Get AOI timepoints

Usage

get_aoi_timepoints(
  dataset_id = NULL,
  dataset_name = NULL,
  age = NULL,
  rle = TRUE,
  connection = NULL
)

Arguments

dataset_id

An integer vector of one or more dataset ids

dataset_name

A character vector of one or more dataset names

age

A numeric vector of a single age or a min age and max age (inclusive), in months

rle

Logical indicating whether to use RLE data representation or not

connection

A connection to the peekbank database

Value

A 'tbl' of AOI Timepoints data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_aoi_timepoints(dataset_name = "pomper_saffran_2016")

## End(Not run)
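
For readers unfamiliar with run-length encoding (RLE), the sketch below shows the idea with base R's rle() on a toy AOI sequence: consecutive identical values are stored once, together with the length of the run. How Peekbank lays out its RLE table is not described above, so the two-column table here is only an assumption made for illustration.

```r
# Toy AOI sequence for one trial, one value per timepoint
aoi <- c("target", "target", "target", "distractor", "distractor",
         "missing", "target")

# Compress consecutive repeats into (value, run length) pairs
runs <- rle(aoi)
rle_table <- data.frame(aoi = runs$values, length = runs$lengths)

# Expanding the runs recovers the original per-timepoint sequence
restored <- inverse.rle(runs)
```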

Get datasets

Description

Get datasets

Usage

get_datasets(connection = NULL)

Arguments

connection

A connection to the peekbank database

Value

A 'tbl' of Datasets data. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_datasets()

## End(Not run)

Get information on database connection options

Description

Get information on database connection options

Usage

get_db_info()

Value

List of database info: host name, current version, supported versions, historical versions, username, password

Examples

## Not run: 
get_db_info()

## End(Not run)

Download dataset README files from OSF to a temporary folder

Description

Downloads README files for Peekbank datasets from OSF. Note that READMEs always reflect the latest version of the dataset on OSF.

Usage

get_readmes(datasets = c(), local_base_dir = "dataset_readmes")

Arguments

datasets

Character vector of dataset names. If empty (default), downloads READMEs for all datasets.

local_base_dir

Directory to save README files to (default: "dataset_readmes")

Value

No return value, called for side effects. README files are saved to the specified directory and the path is printed via message.

Examples

## Not run: 
get_readmes()
get_readmes(datasets = c("pomper_saffran_2016"))

## End(Not run)

Run a SQL Query script on the Peekbank database

Description

Run a SQL Query script on the Peekbank database

Usage

get_sql_query(sql_query_string, connection = NULL)

Arguments

sql_query_string

A character string containing a valid SQL query

connection

A connection to the Peekbank database

Value

The result of running the supplied SQL query against the database

Examples

## Not run: 
get_sql_query("SELECT * FROM datasets")

## End(Not run)

Get stimuli

Description

Get stimuli

Usage

get_stimuli(dataset_id = NULL, dataset_name = NULL, connection = NULL)

Arguments

dataset_id

An integer vector of one or more dataset ids

dataset_name

A character vector of one or more dataset names

connection

A connection to the peekbank database

Value

A 'tbl' of Stimuli data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_stimuli()
get_stimuli(dataset_name = "pomper_saffran_2016")

## End(Not run)

Get subjects

Description

Get subjects

Usage

get_subjects(connection = NULL)

Arguments

connection

A connection to the peekbank database

Value

A 'tbl' of Subjects data. Note that Subjects is a table used to link longitudinal Administrations, which is the primary table you probably want. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_subjects()

## End(Not run)

Get trial types

Description

Get trial types

Usage

get_trial_types(dataset_id = NULL, dataset_name = NULL, connection = NULL)

Arguments

dataset_id

An integer vector of one or more dataset ids

dataset_name

A character vector of one or more dataset names

connection

A connection to the peekbank database

Value

A 'tbl' of Trial Types data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_trial_types()
get_trial_types(dataset_name = "pomper_saffran_2016")

## End(Not run)

Get trials

Description

Get trials

Usage

get_trials(dataset_id = NULL, dataset_name = NULL, connection = NULL)

Arguments

dataset_id

An integer vector of one or more dataset ids

dataset_name

A character vector of one or more dataset names

connection

A connection to the peekbank database

Value

A 'tbl' of Trials data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_trials()
get_trials(dataset_name = "pomper_saffran_2016")

## End(Not run)

Get XY timepoints

Description

Get XY timepoints

Usage

get_xy_timepoints(
  dataset_id = NULL,
  dataset_name = NULL,
  age = NULL,
  connection = NULL
)

Arguments

dataset_id

An integer vector of one or more dataset ids

dataset_name

A character vector of one or more dataset names

age

A numeric vector of a single age or a min age and max age (inclusive), in months

connection

A connection to the peekbank database

Value

A 'tbl' of XY timepoints data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_xy_timepoints(dataset_name = "reflook_v4")

## End(Not run)

List of peekbank tables

Description

List of peekbank tables

Usage

list_peekbank_tables(connection)

Arguments

connection

A connection to the peekbank database

Value

A vector of the names of tables in peekbank

Examples

## Not run: 
con <- connect_to_peekbank()
list_peekbank_tables(con)

## End(Not run)

Populate the provided cdi data with percentile values for that specific age, instrument_type, measure and language. Loosely based on the work from this repo https://github.com/kachergis/cdi-percentiles/tree/main by George Kachergis and Jess Mankewitz with advice from Virginia Marchman.

Description

Populate the provided cdi data with percentile values for that specific age, instrument_type, measure and language. Loosely based on the work from this repo https://github.com/kachergis/cdi-percentiles/tree/main by George Kachergis and Jess Mankewitz with advice from Virginia Marchman.

Usage

populate_cdi_percentiles(subjects_table)

Arguments

subjects_table

a subjects table with unnested cdi data, needs columns "subject_id", "language", "instrument_type", "age", "sex", "measure", "rawscore"

Value

the input table with added columns containing the reference age used, the reference year used, and both gender specific and general percentile values for the cdi score

Examples

## Not run: 
full_cdi_data <- all_subjects %>%
  unnest(subject_aux_data) %>%
  filter(!is.na(cdi_responses)) %>%
  unnest(cdi_responses) %>%
  peekbankr::cleanup_cdi_data() %>%
  peekbankr::populate_cdi_percentiles()

## End(Not run)

Unpacks the JSON string in the *_aux_data column and turns it into a nested R list

Description

Unpacks the JSON string in the *_aux_data column and turns it into a nested R list

Usage

unpack_aux_data(df)

Arguments

df

a dataframe in the peekbank format that has an aux data column

Value

the input dataframe, with the *_aux_data column unpacked

Examples

## Not run: 
subjects_table <- unpack_aux_data(df = subjects_table)

## End(Not run)