We welcome contributions to the OpenESM database, including new datasets, improvements to existing metadata, and improvements to the software infrastructure.

Adding a New Dataset

If you have a dataset that you would like to add to the OpenESM database, you can simply contact us. In the early stages of the project, we are happy to add datasets that meet our inclusion criteria ourselves. Over time, we aim to streamline the submission process so that contributors can add datasets themselves.

Alternatively, you can add a new dataset yourself by following these steps:

  1. Open an issue on the metadata GitHub repository to discuss the dataset you want to add. Provide a brief description of the dataset, its relevance, and any preliminary metadata you have.
  2. If the dataset meets our inclusion criteria (see our Data Documentation for more information), we will guide you through the process of adding it to the OpenESM database.
  3. Prepare your dataset in TSV format, following our formatting guidelines (see below).
  4. Create a dataset metadata file following our standard format. An example metadata file is provided below.
    • Include all relevant information about the dataset, such as authors, year, number of participants, time points, and topics covered.
  5. Create a variable metadata file for each variable in the dataset, detailing the variable name, description, type, and coding information.
  6. Upload the dataset files to our Zenodo community and obtain a DOI for citation purposes.
  7. Add the resulting DOI to the dataset metadata file (the link_to_zenodo field).
  8. Submit a pull request to the metadata GitHub repository.

Dataset Formatting Guidelines

Create a folder structure for your dataset using the following naming convention:

Where:

Data Cleaning Requirements

Column Names

Required Standard Columns

Every time series dataset must include these four columns:

id

day

beep
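
A check for the standard columns can be scripted before submission. The following is a minimal Python sketch, not part of the OpenESM tooling: the function name `missing_required_columns` and the sample data are hypothetical, and it only covers the columns named in this section (id, day, beep).

```python
import csv
import io

# Columns named in this guide; extend the tuple if the guidelines list more.
REQUIRED_COLUMNS = ("id", "day", "beep")


def missing_required_columns(tsv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the TSV header row."""
    reader = csv.reader(tsv_file, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in required if name not in header]


# Hypothetical sample: a time series file that lacks the beep column.
sample = io.StringIO("id\tday\tmood\n1\t1\t4\n")
print(missing_required_columns(sample))
```

For a file on disk, the same function can be called with an open file handle, for example `open(path, newline="", encoding="utf-8")`.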

Data Splitting

Organize your data into separate files based on data type:

Time Series Data (_ts.tsv)

Static Data (_static.tsv)

Weekly Data (_weekly.tsv)

Passive Data (_passive.tsv)
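
The split into separate files can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's own processing script: the helper names `split_rows` and `to_tsv`, the `example` file prefix, and the sample columns `mood` and `age` are hypothetical, and the contributor is assumed to know which columns are static. Only the time series and static files are shown; weekly and passive data could be handled analogously.

```python
import csv
import io


def split_rows(rows, static_columns, id_column="id"):
    """Split long-format rows into time series rows and one static row per participant."""
    ts_rows, static_by_id = [], {}
    for row in rows:
        # Time series rows keep everything except the static columns.
        ts_rows.append({k: v for k, v in row.items() if k not in static_columns})
        # Keep the first static record seen for each participant.
        static_by_id.setdefault(
            row[id_column],
            {id_column: row[id_column], **{k: row[k] for k in static_columns}},
        )
    return ts_rows, list(static_by_id.values())


def to_tsv(rows):
    """Serialize a non-empty list of dicts as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"id": "1", "day": "1", "beep": "1", "mood": "4", "age": "30"},
    {"id": "1", "day": "1", "beep": "2", "mood": "5", "age": "30"},
    {"id": "2", "day": "1", "beep": "1", "mood": "3", "age": "41"},
]
ts_rows, static_rows = split_rows(rows, static_columns=["age"])
outputs = {
    "example_ts.tsv": to_tsv(ts_rows),
    "example_static.tsv": to_tsv(static_rows),
}
```

The sketch keeps the id column in both outputs so that the static file can be joined back to the time series file.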

Data Quality Checks

Variable Metadata

For each variable in your dataset, please provide the following information in a structured format:

Core Variable Information

name

The exact column name as it appears in your cleaned dataset

description

Brief, clear description of what the variable measures

variable_type

Select the most appropriate type:

details

labels

Response scale labels and anchors

Additional Metadata

transformations

Document any data transformations applied:

source

Source of the question or scale:

assessment_type

Method of data collection:

answer_categories

Number of response options (for categorical variables)

comments

Any additional relevant information about the variable

Metadata Formatting Guidelines

Each dataset should have a .json metadata file in the following format:

{
  "first_author": "LastName",
  "dataset": "000x",
  "year": 2024,
  "reference_a": "@article{lastname2024,\n  title = {Title of your paper},\n  author = {Last, First and Second, Author},\n  date = {2024-01-01},\n  journaltitle = {Journal Name},\n  volume = {X},\n  number = {X},\n  pages = {X--X},\n  doi = {10.xxxx/xxxxxxx}\n}",
  "reference_b": null,
  "paper_doi": "https://doi.org/10.xxxx/xxxxxxx",
  "link_to_zenodo": "https://doi.org/10.5281/zenodo.xxxxxxx",
  "link_to_data": "https://osf.io/xxxxx/ or https://zenodo.org/record/xxxxx",
  "link_to_codebook": "https://osf.io/xxxxx or URL to codebook",
  "link_to_code": "https://osf.io/xxxxx or https://github.com/username/repo",
  "license": "CC BY 4.0",
  "n_participants": 100,
  "n_time_points": 70,
  "n_days": 14,
  "n_beeps_per_day": "5",
  "passive_data_available": "yes",
  "which_passive_data": "GPS, accelerometer, app usage",
  "cross_sectional_available": "yes",
  "topics": "mood, stress, social interaction",
  "implicit_missingness": "no",
  "raw_time_stamp": "yes",
  "sampling_scheme": "5x/day random within windows",
  "participants": "community sample, ages 18-65",
  "additional_comments": "Any additional notes about the dataset",
  "coder_data": "Your Name",
  "coder_metadata": "Your Name",
  "features": [
    {
      "name": "id",
      "description": "Participant ID",
      "variable_type": "categorical",
      "details": "",
      "labels": "",
      "transformation": "",
      "source": "",
      "assessment_type": "ESM",
      "construct": "",
      "answer_categories": "",
      "comments": "Standard participant identifier"
    },
    {
      "name": "day",
      "description": "Day of data collection",
      "variable_type": "numeric",
      "details": "",
      "labels": "",
      "transformation": "",
      "source": "",
      "assessment_type": "ESM",
      "construct": "",
      "answer_categories": "",
      "comments": "Numeric day counter starting from 1"
    },
    {
      "name": "beep",
      "description": "Beep number per day",
      "variable_type": "numeric",
      "details": "",
      "labels": "",
      "transformation": "",
      "source": "",
      "assessment_type": "ESM",
      "construct": "",
      "answer_categories": "",
      "comments": "Beep counter within each day, starting from 1"
    },
    {
      "name": "example_variable",
      "description": "Brief description of the variable",
      "variable_type": "Likert",
      "details": "Wording of the question or statement",
      "labels": "1 = Not at all, 7 = Extremely",
      "transformation": "",
      "source": "Custom question developed for the study",
      "assessment_type": "ESM",
      "construct": "",
      "answer_categories": 7,
      "comments": ""
    }
  ]
}
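
A metadata file in this format can be sanity-checked before opening a pull request. The following Python sketch is an assumption about what such a check might look like, not an official validator: the function name `check_metadata` and the specific rules (valid JSON, a features list containing entries for id, day, and beep, and a non-empty description per feature) are illustrative choices based on the template above.

```python
import json

# Feature entries that the template above always includes.
REQUIRED_FEATURES = ("id", "day", "beep")


def check_metadata(text):
    """Return a list of problems found in a dataset metadata JSON string."""
    try:
        metadata = json.loads(text)
    except json.JSONDecodeError as error:
        return [f"invalid JSON: {error.msg}"]
    if not isinstance(metadata, dict):
        return ["top-level value must be a JSON object"]
    features = metadata.get("features")
    if not isinstance(features, list):
        return ["'features' must be a list"]

    problems = []
    entries = [feature for feature in features if isinstance(feature, dict)]
    names = [feature.get("name") for feature in entries]
    for required in REQUIRED_FEATURES:
        if required not in names:
            problems.append(f"missing feature entry: {required}")
    for feature in entries:
        if not feature.get("description"):
            problems.append(f"feature {feature.get('name')!r} has no description")
    return problems


# Hypothetical minimal example with only one of the standard feature entries.
example = json.dumps({"features": [{"name": "id", "description": "Participant ID"}]})
print(check_metadata(example))
```

An empty list means that none of these checks failed; it does not guarantee that the file meets all inclusion criteria.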

Contributing to Existing Datasets

If you find errors or have suggestions for existing datasets, please follow these steps:

  1. Open an issue on the metadata GitHub repository describing the problem or suggestion.
  2. If you can, fork the repository and make the changes directly.
  3. Submit a pull request with your changes.

Improving the Software Infrastructure

If you have ideas for improving the OpenESM software infrastructure, such as the website or data processing scripts, we welcome your contributions. Please follow these steps:

  1. Open an issue in the relevant repository: the openESM GitHub repository for the website or metadata, or the respective software package repository for changes to a software package.
  2. Fork the repository and make your changes.
  3. Submit a pull request with a clear description of your changes and their benefits.

Contact

If you have any questions or need help with contributions, feel free to reach out.