This section covers everything you need to know about the openESM datasets and metadata, their structure, and how to work with them effectively.

Datasets

Initial Data Search Strategy

Our inclusion criteria for datasets were the following:

At least 20 individuals (to allow for group-based modeling later on)
At least 20 time points possible (to estimate person-specific models)
At least 2 self-report variables collected over time (to fit the overarching topic of the project, and to allow for multivariate modeling strategies)
Not synthetic, that is, not simulated on the basis of real data
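
The four criteria above can be expressed as a simple check. The following is only a sketch: the `CandidateDataset` class and its field names are hypothetical and are not part of openESM. Only the thresholds come from the list above.

```python
from dataclasses import dataclass


@dataclass
class CandidateDataset:
    # Hypothetical summary of a candidate dataset; field names are illustrative.
    n_participants: int
    n_time_points: int
    n_self_report_variables: int
    is_synthetic: bool


def meets_inclusion_criteria(ds: CandidateDataset) -> bool:
    """Return True only if all four inclusion criteria are satisfied."""
    return (
        ds.n_participants >= 20            # enough individuals for group-based models
        and ds.n_time_points >= 20         # enough possible time points per person
        and ds.n_self_report_variables >= 2  # allows multivariate modeling
        and not ds.is_synthetic            # simulated data are excluded
    )
```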

Dataset Structure

All datasets are stored in TSV format, with standardized metadata in JSON format. Each dataset is deposited on Zenodo with a DOI for citation purposes. Dataset files follow the naming convention ID_datasetname_type.tsv, where ID is the unique dataset identifier, datasetname is the name of the dataset (i.e., the name of the first author), and type indicates the type of data: ts for time series data, static for cross-sectional/baseline data, or sensor for sensor data.
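
As a minimal sketch of how the naming convention could be parsed, the function below splits a file name into its three parts. The function name, the `DatasetFile` tuple, and the example file names are hypothetical. The sketch also assumes that the ID itself contains no underscore, while the dataset name may.

```python
import re
from typing import NamedTuple


class DatasetFile(NamedTuple):
    dataset_id: str
    dataset_name: str
    data_type: str  # one of "ts", "static", "sensor"


# ID, then dataset name, then one of the three documented types.
_FILENAME_PATTERN = re.compile(
    r"^(?P<id>[^_]+)_(?P<name>.+)_(?P<type>ts|static|sensor)\.tsv$"
)


def parse_dataset_filename(filename: str) -> DatasetFile:
    """Split 'ID_datasetname_type.tsv' into its components."""
    match = _FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a valid dataset file name: {filename!r}")
    return DatasetFile(match["id"], match["name"], match["type"])
```

For example, `parse_dataset_filename("0001_smith_ts.tsv")` would yield the ID `0001`, the name `smith`, and the type `ts`, where `smith` is an invented author name.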

Metadata

Dataset Metadata

The dataset metadata are stored in the metadata .json file of the dataset. It includes the following fields:
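
A minimal sketch of reading such a file with Python's standard library is shown below. It assumes that the top level of the JSON document is an object. Apart from `dataset_version`, which is named in this documentation, any key you look up should be checked against the actual file, because the exact JSON keys are not listed here.

```python
import json


def load_dataset_metadata(json_text: str) -> dict:
    """Parse the text of a metadata .json file into a dictionary."""
    metadata = json.loads(json_text)
    if not isinstance(metadata, dict):
        # Assumption: the metadata file holds a single JSON object.
        raise ValueError("expected a JSON object at the top level")
    return metadata


# Usage with a file on disk (the path is a placeholder):
# from pathlib import Path
# metadata = load_dataset_metadata(Path("metadata.json").read_text(encoding="utf-8"))
# print(metadata.get("dataset_version"))
```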

General Information

Author

Primary author(s) of the dataset. This typically corresponds to the first or corresponding author of the associated publication. Data collection usually involves multiple individuals, but we list only the first author of either the publication or the dataset; all other contributors are listed in the relevant publication. If you think we missed someone who should be listed here, please let us know.

Dataset ID

Unique dataset identifier used throughout the openESM database (e.g., “0001”, “0002”). This ID is used in folder names and file references. IDs are not fully consecutive, because we did not include some of the datasets that we found in our initial search.

Year

Year when the data collection took place. This may differ from the publication year.

Reference A

Primary publication reference associated with the dataset. This is typically the main paper that describes the study and findings.

Reference B

Secondary publication reference, if applicable. Some datasets may be associated with multiple publications or follow-up studies.

Dataset Version

The current version of the data file, following semantic versioning. The version is incremented only when the cleaned data file changes. It corresponds directly to the version of the individual Zenodo record for this dataset.

Changelog

A record of all changes made to this dataset, ordered newest first. Each entry includes a date, description, and type: data for changes that produced a new data file version, or metadata for annotation and label corrections that left the data file unchanged.
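
The following sketch illustrates how a changelog of this shape could be handled. It assumes that each entry is a dictionary with the keys `date`, `description`, and `type`, and that dates are ISO-formatted strings (YYYY-MM-DD). The key names and the date format are assumptions, not a confirmed schema.

```python
from datetime import date


def sort_newest_first(changelog: list[dict]) -> list[dict]:
    """Return the entries ordered from most recent to oldest."""
    return sorted(
        changelog,
        key=lambda entry: date.fromisoformat(entry["date"]),
        reverse=True,
    )


def data_changes(changelog: list[dict]) -> list[dict]:
    """Keep only entries that produced a new data file version."""
    return [entry for entry in changelog if entry["type"] == "data"]
```

Filtering on the `data` type is useful when you only need to know whether an earlier analysis has to be rerun, since `metadata` entries leave the data file unchanged.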

Code & Data Access

Zenodo DOI

Digital Object Identifier (DOI) of the dataset deposited on Zenodo. Use the version-specific DOI listed on the dataset page when citing a specific version of the data.

Paper DOI

Digital Object Identifier (DOI) of the associated publication, enabling direct access to the published research.

Link to Data

Direct URL to dataset files. This typically points to the TSV files containing the actual data.

Link to Code

URL to analysis code, when available. This may include preprocessing scripts, analysis code, or supplementary materials.

Link to Codebook

URL to variable documentation or codebook that provides detailed information about variables and their coding.

License

Data usage license specifying terms and conditions for using the dataset. Common licenses include Creative Commons variants and custom research licenses.

Design & Participants

N Participants

Total number of study participants who contributed data to the dataset.

N Time Points

Maximum number of possible ESM observations per participant. This represents the theoretical maximum if all prompts were answered. In some datasets, this may vary by participant.

N Days

Total number of days over which ESM data was collected.

N Beeps/Day

Maximum number of ESM prompts sent to participants per day. This indicates the sampling frequency of the study. In some studies, this may vary by day or participant, so we provide the maximum observed value.

Passive Data

Indicates whether passive data (e.g., GPS, accelerometer, app usage) was collected and is available.

Which Passive Data?

Specifies the types of passive data collected, such as location data, physical activity, phone usage, or sensor data.

Cross-sectional

Indicates availability of baseline or trait measures collected at the beginning or end of the study period. These measures can be found in the static file for the dataset.

Topics

Primary research topics or psychological constructs examined in the study (e.g., mood, stress, social interactions). This helps categorize datasets based on their focus areas, but is not exhaustive.

Implicit Missingness

Specifies whether missing observations are implicitly missing (i.e., non-response can be inferred) or whether only completed responses are included. In some datasets this is unclear: most individuals have the same number of responses (including missing ones), but some individuals differ.

Raw Time Stamp

Indicates availability of timestamp information for when responses were provided.

Sampling Scheme

Describes the ESM prompt schedule (e.g., fixed intervals, random within time windows, event-contingent).

Participants

Describes participant characteristics such as age range, demographic information, or clinical status. This is typically just a brief summary, and more detailed information can be found in the associated publication.

Variable-Level Metadata

Variable-level metadata provides detailed information about each variable in the dataset, enabling researchers to understand and appropriately use the data.

Core Variable Information

Name

The exact variable name as it appears in the dataset files.

Description

Brief, clear description of what the variable measures or represents.

Variable Type

Data type of the variable (e.g., rating_scale, Date, binary).

Details

Comprehensive information, including the exact question wording or the exact meaning of a passive sensor variable.

Labels

Scale labels and response options. For scales, this typically includes anchor points (e.g., “1 = Not at all, 7 = Extremely”).

Transformation

Documents any data transformations applied to the original responses, such as reverse coding, standardization, or aggregation.

Data Collection Context

Source

Origin of the variable, such as the specific questionnaire, scale, or measurement instrument used.

Assessment Type

Method of data collection, categorized as:

ESM: Experience Sampling Method (momentary assessments)
Daily: Daily assessments
Passive: Automatically collected sensor or behavioral data
Other: Any other type of assessment not fitting the above categories

Construct

The psychological or behavioral construct(s) that the variable is intended to measure (e.g., “positive affect,” “relationship,” “context”). Assigning a variable to a construct is inherently subjective; we aim to be consistent, but this is not always possible. We added this field mainly to make it easier to filter variables, and therefore sometimes used relatively broad constructs.
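
Filtering by construct and assessment type could look like the sketch below. It assumes that variable-level metadata is available as a list of dictionaries with the keys `name`, `construct`, and `assessment_type`. These key names are hypothetical and should be checked against the actual metadata files.

```python
from typing import Optional


def filter_variables(
    variables: list[dict],
    construct: Optional[str] = None,
    assessment_type: Optional[str] = None,
) -> list[str]:
    """Return the names of variables matching the given filters.

    A filter that is left as None is not applied. Matching is
    case-insensitive and requires the whole string to be equal.
    """
    selected = []
    for var in variables:
        if construct is not None and var.get("construct", "").lower() != construct.lower():
            continue
        if (
            assessment_type is not None
            and var.get("assessment_type", "").lower() != assessment_type.lower()
        ):
            continue
        selected.append(var["name"])
    return selected
```

Because constructs are sometimes broad, an exact match such as this one may return more variables than expected; inspecting the Description and Details fields of the results is advisable.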

Comments

Additional notes, caveats, or important information about the variable that doesn’t fit in other categories.

Versioning and Updates

openESM distinguishes between two types of versioning.

1. Dataset versions

Each dataset has a dataset_version field in its metadata, following semantic versioning:

Major: Structural modifications to the data file (e.g., adding or removing variables)
Minor: Corrections to existing data (e.g., fixing errors in values, revising missing data codes)
Patch: Small fixes that do not affect data values (e.g., correcting a column type)
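
To illustrate how the three levels map onto version strings, the sketch below compares two versions and reports which level changed. It assumes plain MAJOR.MINOR.PATCH strings without pre-release suffixes. The function names are illustrative and are not part of any openESM tooling.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def change_level(old: str, new: str) -> str:
    """Return 'major', 'minor', or 'patch' for an upgrade from old to new."""
    old_v, new_v = parse_version(old), parse_version(new)
    if new_v <= old_v:
        raise ValueError("new version must be greater than old version")
    if new_v[0] != old_v[0]:
        return "major"  # structural change, e.g. variables added or removed
    if new_v[1] != old_v[1]:
        return "minor"  # corrections to existing data values
    return "patch"      # fixes that do not affect data values
```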

Each new dataset version is deposited as a new version on Zenodo with its own DOI. If you use a specific dataset in your work, cite the version-specific DOI listed on the dataset page. All changes to a dataset, whether to the data file or to its metadata, are recorded in the dataset’s changelog, visible on the dataset page.

2. Metadata database versions

The metadata database tracks the state of all openESM datasets combined. Releases are versioned snapshots archived on Zenodo, following the openesm-metadata repository releases. A single metadata database release may include changes to multiple datasets simultaneously, such as updated construct annotations, label corrections, or new dataset additions.

The metadata database follows semantic versioning:

Major: Breaking changes to the JSON schema that require updates to downstream code (e.g., renaming or removing existing fields)
Minor: Additive schema changes (new fields added) or batches of new dataset additions
Patch: Metadata corrections and typo fixes, with no schema changes
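
Since only major releases break the JSON schema, downstream code can guard against incompatible releases by comparing major versions. The sketch below shows one way to do this. The function name and the idea of pinning a supported major version are illustrative assumptions, not a documented openESM mechanism.

```python
def is_schema_compatible(supported_major: int, release_version: str) -> bool:
    """Check whether a metadata database release shares the supported major version.

    Minor and patch releases only add fields or correct metadata, so code
    written for a given major version should keep working within it.
    """
    parts = release_version.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {release_version!r}")
    return int(parts[0]) == supported_major
```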
