Package 'peekbankr'

Title: Accessing the Peekbank Database and working with Peekbank data
Description: Collection of tools for working with peekbank, an open repository for developmental eye-tracking data.
Authors: Mika Braginsky [aut, cre], Kyle MacDonald [aut], Michael Frank [aut]
Maintainer: Mika Braginsky <[email protected]>
License: GPL-3
Version: 0.2.3.3
Built: 2025-03-04 06:19:38 UTC
Source: https://github.com/langcog/peekbankr

Help Index


Adds a relative cdi score indicating the percentage of total achievable points the subject got on each given measure

Description

Adds a relative cdi score indicating the percentage of total achievable points the subject got on each given measure

Usage

append_relative_cdi_scores(subjects_table)

Arguments

subjects_table

a subjects table with unnested cdi data, needs columns "subject_id", "language", "instrument_type", "measure", "rawscore"

Value

the input table with an added "cdi_relative" column that contains the percentage of total points gained in the given administrations

Examples

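library(dplyr)  # provides %>% and filter()
library(tidyr)  # provides unnest()
# all_subjects is assumed to be a subjects table with nested aux data
cdi_data <- all_subjects %>%
  unnest(subject_aux_data) %>%
  filter(!is.na(cdi_responses)) %>%
  unnest(cdi_responses) %>%
  append_relative_cdi_scores()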
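
As a rough illustration of what a relative score means, the sketch below divides each raw score by the maximum achievable score for its instrument and measure. The max_points lookup table and its values are made up for this example; they are not the totals peekbankr uses, and the real function may compute or scale the column differently.

```r
# Toy unnested CDI rows with the columns the function expects.
cdi <- data.frame(
  subject_id = c(1, 1, 2),
  language = "English (American)",
  instrument_type = c("wg", "wg", "ws"),
  measure = c("comp", "prod", "prod"),
  rawscore = c(50, 10, 150)
)

# Hypothetical maximum scores per instrument and measure (illustrative only).
max_points <- data.frame(
  instrument_type = c("wg", "wg", "ws"),
  measure = c("comp", "prod", "prod"),
  max_score = c(200, 200, 600)
)

# Attach the maximum to each row, then express the raw score as a percentage.
scored <- merge(cdi, max_points, by = c("instrument_type", "measure"))
scored$cdi_relative <- 100 * scored$rawscore / scored$max_score
```

Expressing scores this way makes administrations of instruments with different lengths comparable on one scale.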

Checks cdi data for inconsistencies, warns about them, and fixes them

Description

Checks cdi data for inconsistencies, warns about them, and fixes them

Usage

cleanup_cdi_data(cdi_data)

Arguments

cdi_data

a subjects table with unnested cdi data, needs columns "subject_id", "language", "instrument_type", "age", "sex", "measure", "rawscore"

Value

a cleaned up version of the cdi data

Examples

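library(dplyr)  # provides %>% and filter()
library(tidyr)  # provides unnest()
# all_subjects is assumed to be a subjects table with nested aux data
clean_cdi_data <- all_subjects %>%
  unnest(subject_aux_data) %>%
  filter(!is.na(cdi_responses)) %>%
  unnest(cdi_responses) %>%
  peekbankr::cleanup_cdi_data()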

Connect to Peekbank

Description

Connect to Peekbank

Usage

connect_to_peekbank(db_version = "current", db_args = NULL, compress = TRUE)

Arguments

db_version

Name of the database version to use, as a string

db_args

List with host, user, and password defined

compress

Flag to use compression protocol (defaults to TRUE)

Value

A DBIConnection object for the peekbank database

Examples

con <- connect_to_peekbank(db_version = "current", db_args = NULL)
DBI::dbDisconnect(con)

Add AOIs to an xy dataframe

Description

Add AOIs to an xy dataframe

Usage

ds.add_aois(xy_joined)

Arguments

xy_joined

dataframe containing processed xy timepoints with aoi region sets information

Value

dataframe with two added columns, 'side' and 'aoi'. 'side' only contains the values "left" or "right"; 'aoi' indicates whether this xy timepoint is a look to "target" or "distractor"


Fetching the list of field names and requirements in each table according to the schema json file

Description

Fetching the list of field names and requirements in each table according to the schema json file

Usage

ds.get_json_fields(table_type)

Arguments

table_type

the type of dataframe; for the most up-to-date table types specified by the schema, use the function ds.list_ds_tables()

Value

the list of field names

Examples

## Not run: 
fields_json <- ds.get_json_fields(table_type = "aoi_timepoints")

## End(Not run)

Parse the json schema file from the peekbank github into a dataframe

Description

Parse the json schema file from the peekbank github into a dataframe

Usage

ds.get_peekjson()

Value

the organized dataframe from schema json file

Examples

## Not run: 
peekjson <- ds.get_peekjson()

## End(Not run)

Download peekbank processed dataset from OSF

Description

Download peekbank processed dataset from OSF

Usage

ds.get_processed_data(lab_dataset_id, path = ".", osf_address = "pr6wu")

Arguments

lab_dataset_id

Specific ID occurring in the file hierarchy of the relevant OSF repo.

path

Where you want it on your own machine. Will error if directory doesn't exist.

osf_address

pr6wu for peekbank.


Download specific peekbank dataset from OSF

Description

Download specific peekbank dataset from OSF

Usage

ds.get_raw_data(lab_dataset_id, path = ".", osf_address = "pr6wu")

Arguments

lab_dataset_id

Specific ID occurring in the file hierarchy of the relevant OSF repo.

path

Where you want it on your own machine. Will error if directory doesn't exist.

osf_address

pr6wu for peekbank.


Check if a certain table is required according to schema

Description

Check if a certain table is required according to schema

Usage

ds.is_table_required(table_type, coding_methods)

Arguments

table_type

the type of dataframe; for the most up-to-date table types specified by the schema, use the function ds.list_ds_tables()

coding_methods

methods used in the experiment for coding gaze data; to get the list of current coding methods, use the function ds.list_coding_methods()

Value

A boolean value

Examples

## Not run: 
is_required <- ds.is_table_required(table_type = "xy_timepoints",
                                    coding_methods = "manual gaze coding")

## End(Not run)

Get the coding method list from json schema file

Description

Get the coding method list from json schema file

Usage

ds.list_coding_methods()

Value

a list of strings indicating allowed coding methods

Examples

## Not run: 
coding_methods <- ds.list_coding_methods()

## End(Not run)

List the tables required based on coding method

Description

List the tables required based on coding method

Usage

ds.list_ds_tables(coding_methods = c("eyetracking"))

Arguments

coding_methods

a list of strings indicating the methods used in the experiment for coding gaze data; to get the list of current coding methods, use the function ds.list_coding_methods()

Value

a list of table types that are required based on input coding method

Examples

## Not run: 
table_list <- ds.list_ds_tables(coding_methods = "manual gaze coding")

## End(Not run)

List current allowed language choices for db import

Description

List current allowed language choices for db import

Usage

ds.list_language_choices()

Value

a list of strings containing all the allowed language codes based on json schema file

Examples

## Not run: 
language_list <- ds.list_language_choices()

## End(Not run)

Function for mapping raw data columns to processed table columns

Description

Function for mapping raw data columns to processed table columns

Usage

ds.map_columns(raw_data, raw_format, table_type)

Arguments

raw_data

raw data frame

raw_format

source of the eye-tracking data, e.g. "tobii"

table_type

type of processed table, e.g. "xy_data" | "aoi_table"

Value

processed data frame with specified column names

Examples

## Not run: 
df_xy_data <- ds.map_columns(raw_data = raw_data, raw_format = "tobii",
                             table_type = "xy_data")
df_aoi_data <- ds.map_columns(raw_data = raw_data, raw_format = "tobii",
                              table_type = "aoi_data")

## End(Not run)

sets the starting point of a given trial to be zero

Description

sets the starting point of a given trial to be zero

Usage

ds.normalize_times(df_table)

Arguments

df_table

to-be-resampled dataframe with t, aoi/xy values, trial_id and administration_id

Value

df_out with resampled time, xy or aoi value rows


Put processed data for specific peekbank dataset on OSF

Description

Put processed data for specific peekbank dataset on OSF

Usage

ds.put_processed_data(token, dataset_name, path = ".", osf_address = "pr6wu")

Arguments

token

Personal access token for uploading to OSF

dataset_name

Specific dataset name occurring in the file hierarchy of the relevant OSF repo.

path

Where the data live on your own machine.

osf_address

pr6wu for peekbank.


Resample times to be consistent across labs

Description

Resamples times to be consistent across labs. Resampling is done in the steps listed under Details.

Usage

ds.resample_times(df_table, table_type)

Arguments

df_table

to-be-resampled dataframe with t, aoi/xy values, trial_id and administration_id

table_type

table name, can only be "aoi_timepoints" or "xy_timepoints"

Details

1. iterate through every trial for every administration

2. create desired timepoint sequence with equal spacing according to pre-specified SAMPLE_RATE parameter

3. use approxfun to interpolate the given data points so that they align with the desired timepoint sequence. The "constant" interpolation method is used for AOI timepoints; the "linear" interpolation method is used for xy timepoints. For more details on approxfun, see: https://stat.ethz.ch/R-manual/R-devel/library/stats/html/approxfun.html

4. after resampling, bind resampled dataframes back together and re-assign aoi_timepoint_id
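
The interpolation in step 3 can be sketched for a single toy trial as follows. This is a minimal illustration, not the package's code: the 40 Hz target rate, the toy values, and the integer coding of AOI labels are assumptions made here, and peekbankr uses its own SAMPLE_RATE parameter.

```r
# Hypothetical target rate for this sketch (the package defines SAMPLE_RATE).
sample_rate_hz <- 40
step_ms <- 1000 / sample_rate_hz  # 25 ms between resampled timepoints

# One irregularly sampled toy trial.
t_raw <- c(0, 30, 65, 90, 130)
x_raw <- c(100, 130, 165, 190, 230)
aoi_levels <- c("target", "distractor", "missing")
aoi_raw <- c("target", "target", "distractor", "distractor", "target")

# Step 2: equally spaced timepoints covering the trial.
t_new <- seq(from = min(t_raw), to = max(t_raw), by = step_ms)

# Step 3, xy data: linear interpolation between neighbouring samples.
x_fun <- stats::approxfun(t_raw, x_raw, method = "linear")
x_new <- x_fun(t_new)

# Step 3, AOI data: approxfun needs numeric y, so labels are coded as integers,
# then a step function carries the last observed label forward.
aoi_codes <- match(aoi_raw, aoi_levels)
aoi_fun <- stats::approxfun(t_raw, aoi_codes, method = "constant", f = 0)
aoi_new <- aoi_levels[aoi_fun(t_new)]
```

Constant interpolation keeps AOI values categorical, whereas linear interpolation would produce meaningless in-between codes; for continuous xy coordinates, linear interpolation is the natural choice.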

Value

df_out with resampled time, xy or aoi value rows

Examples

## Not run: 
dir_datasets <- "testdataset" # local datasets dir
lab_dataset_id <- "pomper_saffran_2016"
dir_csv <- file.path(dir_datasets, lab_dataset_id, "processed_data")
table_type <- "aoi_timepoints"
file_csv <- file.path(dir_csv, paste0(table_type, '.csv'))
df_table <- utils::read.csv(file_csv)
df_resampled <- ds.resample_times(df_table, table_type = "aoi_timepoints")

## End(Not run)

sets the starting point of a given trial to be zero

Description

sets the starting point of a given trial to be zero

Usage

ds.rezero_times(df_table)

Arguments

df_table

to-be-resampled dataframe with t, aoi/xy values, trial_id and administration_id

Value

df_out with resampled time, xy or aoi value rows
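
One way to picture re-zeroing is to subtract each trial's earliest timestamp from all of its timepoints. The sketch below does this in base R on toy data; the output column name t_zeroed is an assumption for illustration, and the package's own implementation may differ.

```r
# Toy timepoints for one trial seen in two administrations.
df_table <- data.frame(
  administration_id = c(1, 1, 1, 2, 2),
  trial_id = c(10, 10, 10, 10, 10),
  t = c(500, 525, 550, 1200, 1240)
)

# Earliest timestamp within each administration/trial combination.
t_start <- ave(df_table$t, df_table$administration_id, df_table$trial_id,
               FUN = min)

# Shift times so that every trial starts at zero.
df_table$t_zeroed <- df_table$t - t_start
```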


check all csv files against database schema for database import

Description

check all csv files against database schema for database import

Usage

ds.validate_for_db_import(
  dir_csv,
  cdi_expected,
  file_ext = ".csv",
  is_null_field_required = TRUE
)

Arguments

dir_csv

the folder directory containing all the csv files, the path should end in "processed_data"

cdi_expected

specifies whether cdi_data is to be expected to be present in the imported data

file_ext

the default is ".csv"

Value

an empty string if all tables pass the validator; otherwise, a list of messages describing the issues that need to be fixed

Examples

## Not run: 
msg_error_all <- ds.validate_for_db_import(dir_csv = "./processed_data",
                                           cdi_expected = FALSE)

## End(Not run)

Check if a dataframe/table is compliant to peekbank json before database import

Description

Check if a dataframe/table is compliant to peekbank json before database import

Usage

ds.validate_table(
  df_table,
  table_type,
  cdi_expected,
  dir_csv,
  is_null_field_required = TRUE
)

Arguments

df_table

the dataframe to be saved

table_type

the type of dataframe; for the most up-to-date table types specified by the schema, use the function ds.list_ds_tables()

is_null_field_required

defaults to TRUE, which means that all columns in the json file are required; when set to FALSE, fields that allow null values are not required.

Value

an empty string when the input data frame complies with the json specification, for example by having all the required columns and unique values in the primary key field. Otherwise, the function returns a list of messages describing the issues that need to be fixed

Examples

## Not run: 
is_valid <- ds.validate_table(df_table = df_table, table_type = "xy_data",
                              cdi_expected = FALSE,
                              dir_csv = "../processed_data")

## End(Not run)
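
The return convention described under Value (an empty string on success, otherwise a list of messages) can be sketched with a small stand-in validator. The function check_table, its arguments, and the example column names are hypothetical; the real function reads its requirements from the peekbank json schema and performs more checks than the two shown here.

```r
# Hypothetical validator: "" when the table passes, else a list of messages.
check_table <- function(df, required_cols, primary_key) {
  msgs <- list()

  # Every required column must be present.
  missing_cols <- setdiff(required_cols, names(df))
  if (length(missing_cols) > 0) {
    msgs <- c(msgs, paste("Missing required column(s):",
                          paste(missing_cols, collapse = ", ")))
  }

  # The primary key column must not contain duplicated values.
  if (primary_key %in% names(df) && anyDuplicated(df[[primary_key]]) > 0) {
    msgs <- c(msgs, paste("Primary key", primary_key, "has duplicated values"))
  }

  if (length(msgs) == 0) "" else msgs
}

good <- data.frame(xy_timepoint_id = 0:2, x = c(1, 2, 3), y = c(4, 5, 6))
bad <- data.frame(xy_timepoint_id = c(0, 0, 1), x = c(1, 2, 3))

res_good <- check_table(good, c("xy_timepoint_id", "x", "y"), "xy_timepoint_id")
res_bad <- check_table(bad, c("xy_timepoint_id", "x", "y"), "xy_timepoint_id")
```

Collecting all messages before returning lets a caller fix every problem in one pass instead of rerunning the validator after each error.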

Check if within aoi_timepoints table, there is no duplication in all the administration_ids associated with each individual trial_id

Description

Check if within aoi_timepoints table, there is no duplication in all the administration_ids associated with each individual trial_id

Usage

ds.validate_trial_uniqueness_constraint(df_aoi_timepoints)

Arguments

df_aoi_timepoints

the aoi_timepoints dataframe

cdi_expected

specifies whether cdi_data is expected to be present in the imported data; only relevant for the subjects table. Invalid combinations of table_type and cdi_expected are currently possible but do not break anything; a special table type could rule them out, but this is low priority

Value

an empty string when all the administration_ids are unique within each trial_id; otherwise, an error message is returned.

Examples

## Not run: 
is_valid <- ds.validate_trial_uniqueness_constraint(df_aoi_timepoints = df_aoi_timepoints)

## End(Not run)
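
Under one reading of this constraint, namely that each trial_id should be linked to exactly one administration_id, the check can be sketched in base R as below. The helper check_trial_uniqueness and its message text are hypothetical and only illustrate the idea.

```r
# Hypothetical check: "" if every trial_id maps to a single administration_id.
check_trial_uniqueness <- function(df_aoi_timepoints) {
  # Count distinct administration_ids per trial_id.
  n_admins <- tapply(
    df_aoi_timepoints$administration_id,
    df_aoi_timepoints$trial_id,
    function(ids) length(unique(ids))
  )
  offending <- names(n_admins)[n_admins > 1]
  if (length(offending) == 0) {
    return("")
  }
  paste("trial_id(s) linked to more than one administration_id:",
        paste(offending, collapse = ", "))
}

ok <- data.frame(trial_id = c(1, 1, 2), administration_id = c(5, 5, 6))
bad <- data.frame(trial_id = c(1, 1, 2), administration_id = c(5, 6, 6))

msg_ok <- check_trial_uniqueness(ok)
msg_bad <- check_trial_uniqueness(bad)
```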

Get administrations

Description

Get administrations

Usage

get_administrations(
  age = NULL,
  dataset_id = NULL,
  dataset_name = NULL,
  connection = NULL
)

Arguments

age

A numeric vector of a single age or a min age and max age (inclusive), in months

dataset_id

An integer vector of one or more dataset ids

dataset_name

A character vector of one or more dataset names

connection

A connection to the peekbank database

Value

A 'tbl' of Administrations data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_administrations()
get_administrations(age = c())
get_administrations(dataset_name = "pomper_saffran_2016")

## End(Not run)
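
The age argument accepts either one age or a min/max pair. A minimal sketch of that behaviour on a local data frame is shown below; filter_by_age is a hypothetical helper, and treating a single age as an exact match is an assumption about how the package interprets it.

```r
# Hypothetical helper mirroring the documented age semantics (in months).
filter_by_age <- function(df, age = NULL) {
  if (is.null(age)) {
    return(df)  # no age filter requested
  }
  stopifnot(is.numeric(age), length(age) %in% c(1, 2))
  # A single age is treated as a range whose bounds coincide.
  age_range <- if (length(age) == 1) c(age, age) else range(age)
  df[df$age >= age_range[1] & df$age <= age_range[2], , drop = FALSE]
}

admins <- data.frame(administration_id = 1:5, age = c(12, 18, 24, 30, 36))

in_range <- filter_by_age(admins, age = c(18, 30))  # bounds are inclusive
exact <- filter_by_age(admins, age = 24)
```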

Get AOI region sets

Description

Get AOI region sets

Usage

get_aoi_region_sets(connection = NULL)

Arguments

connection

A connection to the peekbank database

Value

A 'tbl' of AOI Region Sets data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_aoi_region_sets()

## End(Not run)

Get AOI timepoints

Description

Get AOI timepoints

Usage

get_aoi_timepoints(
  dataset_id = NULL,
  dataset_name = NULL,
  age = NULL,
  rle = TRUE,
  connection = NULL
)

Arguments

dataset_id

An integer vector of one or more dataset ids

dataset_name

A character vector of one or more dataset names

age

A numeric vector of a single age or a min age and max age (inclusive), in months

rle

Logical indicating whether to use RLE data representation or not

connection

A connection to the peekbank database

Value

A 'tbl' of AOI Timepoints data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_aoi_timepoints(dataset_name = "pomper_saffran_2016")

## End(Not run)
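
For readers unfamiliar with RLE (run-length encoding), the sketch below shows the general idea with base R's rle() on a toy AOI sequence. It illustrates the encoding only; how peekbank stores or expands RLE data internally is not shown here.

```r
# A toy sequence of AOI labels at consecutive timepoints.
aoi <- c("target", "target", "target", "distractor", "distractor", "target")

# Run-length encoding stores each run once, with its length.
runs <- rle(aoi)
run_values <- runs$values    # one label per run
run_lengths <- runs$lengths  # number of consecutive timepoints per run

# Decoding recovers the original per-timepoint sequence.
restored <- inverse.rle(runs)
```

Because looks tend to stay on one AOI for many consecutive samples, the encoded form is usually much smaller than one row per timepoint.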

Get datasets

Description

Get datasets

Usage

get_datasets(connection = NULL)

Arguments

connection

A connection to the peekbank database

Value

A 'tbl' of Datasets data. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_datasets()

## End(Not run)

Get information on database connection options

Description

Get information on database connection options

Usage

get_db_info()

Value

List of database info: host name, current version, supported versions, historical versions, username, password

Examples

get_db_info()

Run a SQL Query script on the Peekbank database

Description

Run a SQL Query script on the Peekbank database

Usage

get_sql_query(sql_query_string, connection = NULL)

Arguments

sql_query_string

A character string containing a valid SQL query

connection

A connection to the Peekbank database

Value

The result of running the supplied SQL query on the database

Examples

## Not run: 
get_sql_query("SELECT * FROM datasets")

## End(Not run)

Get stimuli

Description

Get stimuli

Usage

get_stimuli(dataset_id = NULL, dataset_name = NULL, connection = NULL)

Arguments

dataset_id

An integer vector of one or more dataset ids

dataset_name

A character vector of one or more dataset names

connection

A connection to the peekbank database

Value

A 'tbl' of Stimuli data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_stimuli()
get_stimuli(dataset_name = "pomper_saffran_2016")

## End(Not run)

Get subjects

Description

Get subjects

Usage

get_subjects(connection = NULL)

Arguments

connection

A connection to the peekbank database

Value

A 'tbl' of Subjects data. Note that Subjects is a table used to link longitudinal Administrations, which is the primary table you probably want. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_subjects()

## End(Not run)

Get trial types

Description

Get trial types

Usage

get_trial_types(dataset_id = NULL, dataset_name = NULL, connection = NULL)

Arguments

dataset_id

An integer vector of one or more dataset ids

dataset_name

A character vector of one or more dataset names

connection

A connection to the peekbank database

Value

A 'tbl' of Trial Types data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_trial_types()
get_trial_types(dataset_name = "pomper_saffran_2016")

## End(Not run)

Get trials

Description

Get trials

Usage

get_trials(dataset_id = NULL, dataset_name = NULL, connection = NULL)

Arguments

dataset_id

An integer vector of one or more dataset ids

dataset_name

A character vector of one or more dataset names

connection

A connection to the peekbank database

Value

A 'tbl' of Trials data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_trials()
get_trials(dataset_name = "pomper_saffran_2016")

## End(Not run)

Get XY timepoints

Description

Get XY timepoints

Usage

get_xy_timepoints(
  dataset_id = NULL,
  dataset_name = NULL,
  age = NULL,
  connection = NULL
)

Arguments

dataset_id

An integer vector of one or more dataset ids

dataset_name

A character vector of one or more dataset names

age

A numeric vector of a single age or a min age and max age (inclusive), in months

connection

A connection to the peekbank database

Value

A 'tbl' of XY timepoints data, filtered down by supplied arguments. If 'connection' is supplied, the result remains a remote query, otherwise it is retrieved into a local tibble.

Examples

## Not run: 
get_xy_timepoints(dataset_name = "pomper_saffran_2016")

## End(Not run)

List of peekbank tables

Description

List of peekbank tables

Usage

list_peekbank_tables(connection)

Arguments

connection

A connection to the peekbank database

Value

A vector of the names of tables in peekbank

Examples

## Not run: 
con <- connect_to_peekbank()
list_peekbank_tables(con)

## End(Not run)

Populate the provided cdi data with percentile values for that specific age, instrument_type, measure and language. Loosely based on the work from this repo https://github.com/kachergis/cdi-percentiles/tree/main by George Kachergis and Jess Mankewitz with advice from Virginia Marchman.

Description

Populate the provided cdi data with percentile values for that specific age, instrument_type, measure and language. Loosely based on the work from this repo https://github.com/kachergis/cdi-percentiles/tree/main by George Kachergis and Jess Mankewitz with advice from Virginia Marchman.

Usage

populate_cdi_percentiles(subjects_table)

Arguments

subjects_table

a subjects table with unnested cdi data, needs columns "subject_id", "language", "instrument_type", "age", "sex", "measure", "rawscore"

Value

the input table with added columns containing the reference age used, the reference year used, and both gender specific and general percentile values for the cdi score

Examples

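library(dplyr)  # provides %>% and filter()
library(tidyr)  # provides unnest()
# all_subjects is assumed to be a subjects table with nested aux data
full_cdi_data <- all_subjects %>%
  unnest(subject_aux_data) %>%
  filter(!is.na(cdi_responses)) %>%
  unnest(cdi_responses) %>%
  peekbankr::cleanup_cdi_data() %>%
  peekbankr::populate_cdi_percentiles()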

Unpacks the json string in the *_aux_data column and turns it into a nested R list

Description

Unpacks the json string in the *_aux_data column and turns it into a nested R list

Usage

unpack_aux_data(df)

Arguments

df

a dataframe in the peekbank format that has an aux data column

Value

the input dataframe, with the *_aux_data column unpacked

Examples

## Not run: 
subjects_table <- unpack_aux_data(df = subjects_table)

## End(Not run)