This section describes the openESM datasets and their metadata: how they are structured and how to work with them effectively.

Datasets

Initial Data Search Strategy

Our inclusion criteria for datasets were the following:

At least 20 individuals (to allow for group-based modeling later on)
At least 20 time points possible (to estimate person-specific models)
At least 2 self-report variables collected over time (to fit the overarching topic of the project, and to allow for multivariate modeling strategies)
Not being synthetic data, in other words, not data simulated based on real data
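
The criteria above can be read as a simple screening rule. The following is a minimal sketch of that rule, not the procedure actually used during the search; the class and field names (CandidateDataset, n_participants, and so on) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CandidateDataset:
    """Hypothetical summary of a dataset found during the search."""
    n_participants: int
    n_time_points: int            # maximum possible time points per person
    n_self_report_variables: int  # self-report variables collected over time
    is_synthetic: bool


def meets_inclusion_criteria(dataset: CandidateDataset) -> bool:
    """Return True if the dataset satisfies all four inclusion criteria."""
    return (
        dataset.n_participants >= 20
        and dataset.n_time_points >= 20
        and dataset.n_self_report_variables >= 2
        and not dataset.is_synthetic
    )


# Illustrative values only, not taken from any real dataset.
included = meets_inclusion_criteria(CandidateDataset(25, 70, 5, False))
excluded = meets_inclusion_criteria(CandidateDataset(19, 70, 5, False))
```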

Dataset Structure

All datasets are stored in TSV format, with standardized metadata in JSON format. Each dataset is hosted on Zenodo and has a DOI for citation purposes. Dataset files follow the naming convention ID_datasetname_type.tsv, where ID is the unique dataset identifier, datasetname is the name of the dataset (i.e., the name of the first author), and type indicates the type of data: ts for time series data, static for cross-sectional/baseline data, or sensor for sensor data.
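
As an illustration of the naming convention, the sketch below splits a file name into its three parts. It assumes that the ID itself contains no underscore; the function name parse_dataset_filename and the example file name 0001_smith_ts.tsv are hypothetical and not part of openESM.

```python
import re
from typing import NamedTuple

# ID, dataset name, and one of the three documented types, separated by underscores.
FILENAME_PATTERN = re.compile(
    r"^(?P<dataset_id>[^_]+)_(?P<name>.+)_(?P<data_type>ts|static|sensor)\.tsv$"
)


class DatasetFile(NamedTuple):
    dataset_id: str
    name: str
    data_type: str


def parse_dataset_filename(filename: str) -> DatasetFile:
    """Split 'ID_datasetname_type.tsv' into its components."""
    match = FILENAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"File name does not follow the convention: {filename!r}")
    return DatasetFile(**match.groupdict())


# Hypothetical file name used only to demonstrate the parsing.
parsed = parse_dataset_filename("0001_smith_ts.tsv")
```

File names with an unknown type suffix raise a ValueError rather than being silently accepted, which makes deviations from the convention easy to spot.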

Versioning and Updates

The datasets are versioned, and updates are made as new data becomes available or corrections are needed. Each dataset has a version number in its metadata, and the Zenodo DOI will point to the latest version. If you use a specific version of a dataset, please cite it using the DOI provided in the metadata.

We follow semantic versioning principles, where:

Major version changes indicate significant updates or changes in the dataset structure, such as correcting errors in the data or adding new variables that change the overall structure of the dataset.
Minor version changes indicate minor corrections, such as correcting typos or adding new variables without changing the overall structure.
Patch version changes indicate small fixes or updates that do not affect the overall dataset structure, such as a change to a column type that does not affect the data itself.
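
To check how much a dataset has changed between two versions, the version numbers can be compared component by component. This is a minimal sketch that assumes version numbers are written as MAJOR.MINOR.PATCH strings (for example "1.0.0"); the exact format of the version field in the metadata is an assumption here, and the function names are hypothetical.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    parts = version.split(".")
    if len(parts) != 3:
        raise ValueError(f"Expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(part) for part in parts)
    return major, minor, patch


def change_level(old: str, new: str) -> str:
    """Return which version component differs first: major, minor, patch, or none."""
    old_parts = parse_version(old)
    new_parts = parse_version(new)
    for label, old_part, new_part in zip(("major", "minor", "patch"), old_parts, new_parts):
        if old_part != new_part:
            return label
    return "none"


# A major change suggests re-checking analysis code against the new structure.
level = change_level("1.2.0", "2.0.0")
```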

Metadata

Dataset Metadata

Dataset-level metadata are stored in the dataset's metadata .json file and include the following fields:

General Information

Author

Primary author(s) of the dataset. This typically corresponds to the first or corresponding author of the associated publication. Data collection usually involves multiple individuals, but we list only the first author of either the publication or the dataset. All individuals who contributed to the reference are listed in the relevant publication. If you think that we missed someone who should be listed here, please let us know.

Dataset ID

Unique dataset identifier used throughout the openESM database (e.g., “0001”, “0002”). This ID is used in folder names and file references. IDs are not fully consecutive, as we did not include some datasets that we found in our initial search.

Year

Year when the data collection took place. This may differ from the publication year.

Reference A

Primary publication reference associated with the dataset. This is typically the main paper that describes the study and findings.

Reference B

Secondary publication reference, if applicable. Some datasets may be associated with multiple publications or follow-up studies.

Code & Data Access

Link to Zenodo

Link to the dataset stored on Zenodo. This provides persistent access to the actual data files.

Paper DOI

Digital Object Identifier (DOI) of the associated publication, enabling direct access to the published research.

Link to Data

Direct URL to dataset files. This typically points to the TSV files containing the actual data.

Link to Code

URL to analysis code, when available. This may include preprocessing scripts, analysis code, or supplementary materials.

Link to Codebook

URL to variable documentation or codebook that provides detailed information about variables and their coding.

License

Data usage license specifying terms and conditions for using the dataset. Common licenses include Creative Commons variants and custom research licenses.

Design & Participants

N Participants

Total number of study participants who contributed data to the dataset.

N Time Points

Maximum number of possible ESM observations per participant. This represents the theoretical maximum if all prompts were answered. In some datasets, this may vary by participant.

N Days

Total number of days over which ESM data was collected.

N Beeps/Day

Maximum number of ESM prompts sent to participants per day. This indicates the sampling frequency of the study. In some studies, this may vary by day or participant, so we provide the maximum observed value.

Passive Data

Indicates whether passive data (e.g., GPS, accelerometer, app usage) was collected and is available.

Which Passive Data?

Specifies the types of passive data collected, such as location data, physical activity, phone usage, or sensor data.

Cross-sectional

Indicates availability of baseline or trait measures collected at the beginning or end of the study period. These measures can be found in the static file for the dataset.

Topics

Primary research topics or psychological constructs examined in the study (e.g., mood, stress, social interactions). This helps categorize datasets based on their focus areas, but is not exhaustive.

Implicit Missingness

Specifies whether missing observations are implicitly missing (i.e., non-response can be inferred) or if only completed responses are included. In some datasets, this is unclear, as most individuals have the same number of responses (including missing ones), but some individuals differ.

Raw Time Stamp

Indicates availability of timestamp information for when responses were provided.

Sampling Scheme

Describes the ESM prompt schedule (e.g., fixed intervals, random within time windows, event-contingent).

Participants

Describes participant characteristics such as age range, demographic information, or clinical status. This is typically just a brief summary, and more detailed information can be found in the associated publication.

Variable-Level Metadata

Variable-level metadata provides detailed information about each variable in the dataset, enabling researchers to understand and appropriately use the data.

Core Variable Information

Name

The exact variable name as it appears in the dataset files.

Description

Brief, clear description of what the variable measures or represents.

Variable Type

Data type of the variable (e.g., rating_scale, Date, binary).

Details

Comprehensive information, including the exact question wording or the exact meaning of a passive sensor variable.

Labels

Scale labels and response options. For scales, this typically includes anchor points (e.g., “1 = Not at all, 7 = Extremely”).

Transformation

Documents any data transformations applied to the original responses, such as reverse coding, standardization, or aggregation.

Data Collection Context

Source

Origin of the variable, such as the specific questionnaire, scale, or measurement instrument used.

Assessment Type

Method of data collection, categorized as:

ESM: Experience Sampling Method (momentary assessments)
Daily: Daily assessments
Passive: Automatically collected sensor or behavioral data
Other: Any other type of assessment not fitting the above categories

Construct

The psychological or behavioral construct(s) that the variable is intended to measure (e.g., “positive affect,” “relationship,” “context”). Assigning a variable to a construct is inherently subjective; we aim for consistency, but it cannot always be achieved. We added this field mainly to allow for easier filtering of variables, and thus sometimes used relatively broad constructs.
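
Because the construct and assessment type fields are intended for filtering, a selection step might look like the sketch below. The dictionary keys (name, assessment_type, construct) and the example records are hypothetical and serve only as illustration; the actual key names in the metadata files may differ.

```python
from typing import Dict, List, Optional

# Hypothetical variable-level metadata records, for illustration only.
variables: List[Dict[str, str]] = [
    {"name": "happy", "assessment_type": "ESM", "construct": "positive affect"},
    {"name": "sleep_quality", "assessment_type": "Daily", "construct": "sleep"},
    {"name": "step_count", "assessment_type": "Passive", "construct": "physical activity"},
    {"name": "cheerful", "assessment_type": "ESM", "construct": "positive affect"},
]


def filter_variables(
    records: List[Dict[str, str]],
    assessment_type: Optional[str] = None,
    construct: Optional[str] = None,
) -> List[str]:
    """Return names of variables matching the given assessment type and/or construct."""
    selected = []
    for record in records:
        if assessment_type is not None and record["assessment_type"] != assessment_type:
            continue
        # Compare constructs case-insensitively, since labels are assigned by hand.
        if construct is not None and record["construct"].lower() != construct.lower():
            continue
        selected.append(record["name"])
    return selected


esm_positive_affect = filter_variables(variables, assessment_type="ESM", construct="Positive affect")
```

Since constructs are deliberately broad, a filter like this is best treated as a first pass, followed by a check of the Description and Details fields for each selected variable.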

Comments

Additional notes, caveats, or important information about the variable that doesn’t fit in other categories.

Versioning and Updates

We follow semantic versioning principles, where:

Major version changes indicate significant updates or changes in the metadata structure, such as introducing a new column or correcting major errors.
Minor version changes indicate minor corrections, such as adding new information or extending existing metadata without changing the overall structure.
Patch version changes indicate small fixes or updates that do not affect the overall metadata, such as fixing a typo.
