----

# MNE-BIDS pipeline
>
#### This document will serve as a step-by-step guide to walk you through transforming your EEG dataset into one compliant with the BIDS format, using MNE-BIDS.

##### It will walk you through:
-> General information
>
-> Downloading/formatting your dataset
>
-> Setting up your coding environment
>
-> Transforming your EEG data to BIDS
>
-> Adapting your BIDS dataset to include relevant metadata
>
-> Validating your BIDS dataset
>
-> Citing MNE-BIDS
>
-> Adapting the code to iterate through all participants 

The pipeline focuses on two main sections: 'Transforming your EEG data to BIDS' and 'Adapting your BIDS dataset to include relevant metadata'. The first produces a valid BIDS dataset containing all BIDS-required fields. The second is needed to produce a dataset that includes _all_ of the meaningful metadata collected in your study, because MNE-BIDS cannot fill in much of the information that may be important for your dataset.

Thus, to generate a valid _and_ complete BIDS dataset, you should run through both main sections.

----

## What is MNE?
> MNE (MNE-Python) is an open-source Python package for working with EEG and MEG data, which facilitates the exploration, visualisation and analysis of neuroimaging data.

## What is BIDS?
> BIDS (Brain Imaging Data Structure) is a simple method of organising neuroimaging data that is relatively easy to adopt and promotes standardisation across neuroimaging experiments. 

This standardisation is important because it allows other researchers to easily understand and work with your data, fostering collaboration, openness, and better adherence to the [FAIR principles](https://www.go-fair.org/fair-principles/), which aim to make data Findable, Accessible, Interoperable and Reusable. Additionally, many software packages and data archives (such as [OpenNeuro.org](https://openneuro.org/)) prefer or require BIDS-formatted datasets, so formatting your data in this way makes publication and curation much simpler!

>
> BIDS involves a hierarchical folder organisation structure, with four main levels:
>

-> project

---> subject

----->  session

-------> datatype

## SO, MNE-BIDS...?
> Is a Python package that builds on MNE-Python to read and write BIDS-compliant datasets!
>
If you don't currently have MNE-BIDS installed, please refer to the official [MNE-BIDS installation instructions](https://mne.tools/mne-bids/stable/install.html) before beginning this walkthrough.

----

#### To some, the contents of this document may appear incredibly complex. We understand that without prior experience, reaching BIDS compliance can be difficult, but we also know that its advantages are striking! As such, we have tried to make this tutorial in-depth and beginner-friendly!

----

# What versions will this document use?

#### - MNE version: 1.9.0
#### - BIDS version: 1.10.0
#### - MNE-BIDS version: 0.16.0

Note: This document is tailored towards Windows operating systems (for example, the terminal commands use PowerShell and Windows file paths), but the main Python code sections should also work on other operating systems.

# Expected Proficiencies
> #### Prior to using this pipeline, a certain level of understanding/skill is expected. 

This entails:
- Some knowledge of Python (to understand and implement the code), although the code is explained throughout.
- An understanding of what a BIDS formatted dataset should include and how it should look (for checking the dataset has converted correctly).
  > This information can be found on the [BIDS website](https://bids.neuroimaging.io/getting_started/index.html).
- Familiarity with your EEG dataset and its associated metadata (to ensure all important information is present post-conversion and add any that is missing).

-----

# 1. Downloading data
> #### Collecting the EEG dataset necessary to run through this pipeline

In order to complete this pipeline, you will first need some EEG data. If you intend to run this pipeline on your pre-existing dataset, you can simply move on to the next step. If you don't have any EEG data to test this process on, we suggest downloading the [EEG Motor Movement/Imagery Dataset](https://physionet.org/content/eegmmidb/1.0.0/) from the [Physiobank Database](https://physionet.org/data/). This document uses it as its example data.

----

# 2. Data formatting
> #### This pipeline's data format expectations

This pipeline is curated to work with EDF (European Data Format) datasets; however, MNE is capable of handling a variety of formats. 
>  If your data is currently in a different format, you will need to use a slightly different line of code when reading in your data (step 6). For guidance, refer to MNE's documentation on [importing data from EEG devices](https://mne.tools/stable/auto_tutorials/io/20_reading_eeg_data.html#sphx-glr-auto-tutorials-io-20-reading-eeg-data-py). 
>
The pipeline will also write the dataset in the EDF format in step 7, as recommended by BIDS. If you require a different output format, you may edit the `format` parameter of `write_raw_bids` following the [MNE-BIDS documentation](https://mne.tools/mne-bids/stable/generated/mne_bids.write_raw_bids.html).


------

## Creating a virtual environment
Before you begin your conversion, it is useful to create a virtual environment: a self-contained space for installing packages that won't affect any of your other coding projects.
It also lets you install exactly the same packages used to write this document, with the same versions and dependencies, from the files stored in this document's repository.

Thus, this next section will walk you through two methods for creating your virtual environment and downloading the necessary requirements, using different package managers. You should only complete the steps for one of these. The first of these will use [pip](https://pypi.org/project/pip/) and native python tools to complete these steps, while the second will use [uv](https://docs.astral.sh/uv/). While pip is slightly simpler to use in the short term, uv is a more powerful and complete tool, with more long term benefits.

## pip and native python tools
#### The following steps __must__ be completed using your computer's terminal; on Windows, this will be 'Windows PowerShell'. 

##### You must also first [install python](https://www.python.org/downloads/windows/).

Creating your virtual environment:

First, let's create a directory for our project (using the first line of code), then move into that (using the second line of code).

In [1]:
# Making a directory for the current project
mkdir mne-bids-pipeline
# Moving into the newly created directory
cd mne-bids-pipeline


Then, using the venv module, we will create a virtual environment, which will be contained in the sub-directory '.venv'.

In [None]:
# Creating a virtual environment using the venv module (with the sub-directory name '.venv')
python -m venv .venv

Now, let's activate the virtual environment. 

In [None]:
.venv\Scripts\activate
# On macOS/Linux, use instead: source .venv/bin/activate

Using pip:
- You should have already installed Python, which also installs pip.
- Next, locate the 'requirements.txt' file within this document's repository, ['RosettaState'](https://github.com/ubdbra001/RosettaState/tree/MNE-BIDS-pipeline-post-feedback).
- Download it and save it to your newly created 'mne-bids-pipeline' directory.
- From here, install the dependencies listed in 'requirements.txt' by typing the code below into the PowerShell terminal. 

In [None]:
# Install the dependencies from requirements.txt: 
python -m pip install --requirement requirements.txt

## uv


Using uv:
- You must first [install uv](https://docs.astral.sh/uv/getting-started/installation/).
- You should also [install git](https://github.com/git-guides/install-git).
- Then, you must [clone the repository](https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository) this document is stored in (['RosettaState'](https://github.com/ubdbra001/RosettaState)) to your local machine. 
- From here, navigate to the cloned repository's directory. In __Windows PowerShell__, type `cd` followed by the folder path (e.g. `cd C:\Users\Ariana\github\RosettaState`).

In [None]:
cd <your_cloned_repo's_folder_path>

- In the PowerShell terminal, create a virtual environment using `uv venv`, then activate it with `.venv\Scripts\activate`.

In [None]:
uv venv

.venv\Scripts\activate

- Finally, you can type `uv sync` into the same terminal to install all required dependencies.

In [None]:
uv sync
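Whichever route you took, you can confirm the setup from Python, for example in the first cell of the notebook run with the environment's interpreter. The sketch below uses only the standard library: inside a virtual environment `sys.prefix` differs from `sys.base_prefix`, and `importlib.metadata.version` reports installed package versions, which you can compare with the versions listed near the top of this document. The package list is an assumption; edit it to match your requirements.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def in_virtual_environment():
    """True when the running interpreter belongs to a virtual environment (venv or uv)."""
    # Inside a venv, sys.prefix points at the .venv folder while
    # sys.base_prefix still points at the base Python installation
    return sys.prefix != sys.base_prefix


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


print("Virtual environment active:", in_virtual_environment())
# Distribution names as published on PyPI (assumed); compare with the versions listed earlier
for name, ver in installed_versions(["mne", "mne-bids", "pandas"]).items():
    print(f"{name}: {ver if ver is not None else 'not installed'}")
```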

# 3. Importing the necessary tools
>### To begin, we will need to import all the tools necessary for converting the data.

This first section of code imports `Path` from Python's built-in `pathlib` module, which makes file paths easier to build and handle.

In [2]:
from pathlib import Path

Next, we need to `import mne` (MNE-Python, a package for working with EEG and MEG data) along with some associated tools from MNE-BIDS. 
>
From `mne_bids`, we are importing:
- `BIDSPath`:
A tool for creating a BIDS formatted file path
- `print_dir_tree`:
A tool for presenting the contents of a folder in a 'tree' view
- `write_raw_bids`:
A tool for saving EEG data into BIDS format
- `make_dataset_description`:
A tool for writing the dataset_description.json file
- `update_sidecar_json`:
A tool for editing entries in an existing JSON sidecar file

In [3]:
import mne

from mne_bids import BIDSPath, print_dir_tree, write_raw_bids, make_dataset_description, update_sidecar_json

------

# 4. Finding the data
>### After completing our imports, we need to find the EEG data files.

In the code below, the first line specifies the folder where your data and its sub-folders can be found. This should include your EEG data and any additional information (metadata). 
>
You should modify this to your own folder path: `data_dir = Path(r"___your file pathway____")`. This should be the folder containing your task files, i.e. the highest-level folder that contains your dataset and no unrelated files.
> Here, the `r` prefix makes this a raw string, so the backslashes in the Windows path are read literally rather than as escape characters (for example, `\n` would otherwise become a line break).
>
The second line prints a 'tree' view of the folder's contents, down to two levels deep (using the `print_dir_tree` tool!). 
> You may have one or more sub-folders, depending on how much EEG data you wish to make BIDS compliant. In this guide, each is expected to contain EEG data from one specific task type, including data from each participant and any associated metadata. (In the example dataset, each sub-folder instead holds the recordings of one participant, e.g. S001.)

In [4]:
# Change the file path to your data's location
data_dir = Path(r"C:\N8_internship_code\Motor_Imaging_Dataset")
print_dir_tree(data_dir, max_depth=2)

|Motor_Imaging_Dataset\
|--- 64_channel_sharbrough-old.png
|--- 64_channel_sharbrough.pdf
|--- 64_channel_sharbrough.png
|--- ANNOTATORS
|--- MNE_BIDS_sheet.xlsx
|--- RECORDS
|--- SHA256SUMS.txt
|--- wfdbcal
|--- ~$MNE_BIDS_sheet.xlsx
|--- S001\
|------ Motor_Imaging_Dataset - Shortcut.lnk
|------ S001R01.edf
|------ S001R01.edf.event
|------ S001R02.edf
|------ S001R02.edf.event
|------ S001R03.edf
|------ S001R03.edf.event
|------ S001R04.edf
|------ S001R04.edf.event
|------ S001R05.edf
|------ S001R05.edf.event
|------ S001R06.edf
|------ S001R06.edf.event
|------ S001R07.edf
|------ S001R07.edf.event
|------ S001R08.edf
|------ S001R08.edf.event
|------ S001R09.edf
|------ S001R09.edf.event
|------ S001R10.edf
|------ S001R10.edf.event
|------ S001R11.edf
|------ S001R11.edf.event
|------ S001R12.edf
|------ S001R12.edf.event
|------ S001R13.edf
|------ S001R13.edf.event
|------ S001R14.edf
|------ S001R14.edf.event
|--- S002\
|------ S002R01.edf
|------ S002R01.edf.event
|------ 
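The `r` prefix in `data_dir` is easy to forget, so here is a small standard-library illustration of what it changes. It relies only on `pathlib`; the folder name is simply the example path from above.

```python
from pathlib import Path

# Without the r prefix, "\n" (backslash + n) is read as a single newline character
plain = "C:\new_data"
raw_string = r"C:\new_data"

print(len(plain), len(raw_string))  # the plain string is one character shorter
print("\n" in plain, "\n" in raw_string)

# pathlib also accepts forward slashes on Windows, which avoids the issue entirely
data_dir = Path("C:/N8_internship_code/Motor_Imaging_Dataset")
print(data_dir.name)
```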

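This next section lists everything directly inside `data_dir` (both files and sub-folders) and stores the paths, sorted by name, in the list 'children'. Sorting keeps the order predictable, which matters because step 5 picks a sub-folder by its position in this list.

In [5]:
children = sorted(data_dir.iterdir())
children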

[WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/64_channel_sharbrough-old.png'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/64_channel_sharbrough.pdf'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/64_channel_sharbrough.png'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/ANNOTATORS'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/MNE_BIDS_sheet.xlsx'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/RECORDS'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/S001'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/S002'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/S003'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/S004'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/S005'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/S006'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_Dataset/S007'),
 WindowsPath('C:/N8_internship_code/Motor_Imaging_D
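Because `children` mixes loose files (such as 'RECORDS' or the spreadsheet) with participant folders, picking folders by position is fragile. A minimal sketch, assuming participant folders are the only sub-folders whose names start with 'S' (true for the example dataset, but check yours), filters them out first. `list_subject_dirs` is a hypothetical helper, not part of MNE or MNE-BIDS; the demonstration builds a throwaway folder that mimics the PhysioNet layout.

```python
import tempfile
from pathlib import Path


def list_subject_dirs(data_dir, prefix="S"):
    """Return the sub-folders of data_dir whose names start with prefix, sorted by name."""
    return sorted(
        child for child in Path(data_dir).iterdir()
        if child.is_dir() and child.name.startswith(prefix)
    )


# Demonstration on a temporary folder: two participant folders plus two loose files
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp)
    for name in ["S002", "S001"]:
        (demo / name).mkdir()
    for name in ["RECORDS", "SHA256SUMS.txt"]:
        (demo / name).write_text("")
    demo_names = [p.name for p in list_subject_dirs(demo)]

print(demo_names)  # ['S001', 'S002']
```

In the notebook, `list_subject_dirs(data_dir)[0]` would then be the 'S001' folder regardless of how many loose files sit in the dataset root.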

------

# 5. Selecting specific files
>### Let's specify the files we want to use

Here, the first line selects which entry of `children` we want to write into BIDS format (note: in Python, indexing starts at 0). In the listing above, index 6 is the 'S001' folder. If you have multiple folders, each time you run through this you should change this number to match the folder you want to convert. 
>
The second line lists the EDF recordings (files ending in '.edf') in the selected sub-folder, sorted by name. Filtering by extension skips the accompanying '.edf.event' files and any unrelated files, such as the Windows shortcut stored in 'S001'.

In [None]:
# Change this to match the position of the folder in 'children'
dir_number = 6
files = sorted(children[dir_number].glob("*.edf"))

This sets the first EDF file in the folder to the variable `file_path`, then prints it. 
>
Even when completing multiple iterations (for more than one folder), this index can stay at 0; it is `dir_number` above that selects the folder.

In [None]:
file_path = files[0]
file_path

-------

# 6. Reading/specifying the data
>### Now we've completed our preparations, let's read in our data

Here, we are reading the EEG data from the previously selected file path into the `raw` variable.
>
As previously mentioned, the current code is tailored to EDF-formatted datasets, and `mne.io.read_raw_edf` won't work with other formats. If your data is in a different format, swap in the matching reader (for example `mne.io.read_raw_bdf`, `mne.io.read_raw_brainvision` or `mne.io.read_raw_eeglab`), or use `mne.io.read_raw`, which chooses a reader based on the file extension. 

In [None]:
raw = mne.io.read_raw_edf(file_path)

In [None]:
raw

### In this section we will also specify some important metadata.

This process will involve writing information to the 'info' dictionary, which holds all of the metadata for the dataset. Not all aspects of this can be written to, but a few can.

So first, we need to import a few extra tools to help with this process.

In [None]:
from datetime import datetime, timezone
from dateutil.relativedelta import relativedelta

Next, fill in the value section of the `key:value` pairs below to match the dataset's information.

In [None]:
# Information about the EEG headset
raw.info["device_info"] = {
    "type": "eeg",
    "model": "EasyCap 64-channel cap",
    "serial": "1234567",
    }

# The power line frequency of the recording (in hertz): 50 in the UK and Europe, 60 in North America
raw.info["line_freq"] = 50

# A description of the recording
raw.info["description"] = "A motor imaging dataset..."

# The name of the experimenter
raw.info["experimenter"] = "John Doe"


If you know your dataset has any broken or noisy channels, you may enter these into the list below (each within a set of double quotation marks) to flag them within the dataset.

In [None]:
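# A list of 'bad' (noisy or broken) channels, spelled exactly as in raw.ch_names,
# e.g. ["Fp1", "O2"]; names that are not in the recording will likely raise an error
raw.info["bads"] = []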

Then, enter the required information for the subject info section using the same method. 

If the participant's birthdate is unknown, it can be approximated: enter the date of measurement into the first line of code as `datetime(year, month, day)`, then enter the participant's age in years in the third line (`years=___`). The resulting `birthdate` variable is then used as the value of the "birthday" entry. Note that `set_meas_date` also overwrites the measurement date stored with the recording.

In [None]:
# Generates the (approximate) birthdate of the participant based on the measurement date and age
raw.set_meas_date(datetime(2015, 6, 7, tzinfo=timezone.utc))
recording_date = raw.info["meas_date"]
birthdate = (recording_date - relativedelta(years=30)).date()

raw.info["subject_info"] = {
    "id": 1,
    "his_id": "sub-001",
    "last_name": "Doe",
    "first_name": "John",
    "middle_name": "A",
    "birthday": birthdate,  # a datetime.date object
    "sex": 2,  # 0 = unknown, 1 = male, 2 = female
    "hand": 1,  # 1 = right, 2 = left, 3 = ambidextrous
    "weight": 70.0,
    "height": 175.0,
}
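The `relativedelta` step above only shifts the year. If you want to sanity-check the age implied by the birthday you enter, or prefer to avoid the extra `dateutil` dependency, a standard-library sketch might look like this. Both helper names are hypothetical, and the 29 February handling mirrors what `relativedelta` does (moving to 28 February).

```python
from datetime import date


def approximate_birthdate(measurement_date, age_years):
    """Subtract whole years from a date, moving 29 February to 28 February when needed."""
    try:
        return measurement_date.replace(year=measurement_date.year - age_years)
    except ValueError:
        # 29 February does not exist in the target year
        return measurement_date.replace(year=measurement_date.year - age_years, day=28)


def age_on(measurement_date, birthdate):
    """Whole years completed between birthdate and measurement_date."""
    had_birthday = (measurement_date.month, measurement_date.day) >= (birthdate.month, birthdate.day)
    return measurement_date.year - birthdate.year - (0 if had_birthday else 1)


birth = approximate_birthdate(date(2015, 6, 7), 30)
print(birth)                            # 1985-06-07
print(age_on(date(2015, 6, 7), birth))  # 30
```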

Finally, we will set the montage (the electrode positions) for your dataset.

The code below lists the standard montages built into MNE. From these, you should select the montage that applies to your dataset.

MNE also supports the creation of your own montage. If you need one, follow [MNE's guidelines](https://mne.tools/stable/generated/mne.channels.read_custom_montage.html#mne.channels.read_custom_montage).

In [None]:
builtin_montages = mne.channels.get_builtin_montages(descriptions=True)
for montage_name, montage_description in builtin_montages:
    print(f"{montage_name}: {montage_description}")

Next, enter your montage's name within the double quotation marks below to display its information and visualise it as both a 2D and a 3D plot.

In [None]:
my_montage = mne.channels.make_standard_montage("biosemi64")

# Printing montage information
print(my_montage)

# Visualising montage in 2D
my_montage.plot()

# Visualising montage in 3D
fig = my_montage.plot(kind="3d", show=False)  # 3D
fig.gca().view_init(azim=70, elev=15)  # set the viewing angle of the 3D axes

If you are happy with these visualisations, the montage can be written to the data using the code below.

In [None]:
raw.set_montage(my_montage, match_case=True, match_alias=False, on_missing='ignore', verbose=None)
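Because `on_missing='ignore'` silently skips any channel whose name is not found in the montage, it is worth checking which channels received no position. The helper below is a hypothetical, plain-Python sketch of that comparison; the channel names are illustrative, written in the dotted style of the PhysioNet recordings versus the 10-20 style used by montages such as 'biosemi64'.

```python
def unmatched_channels(data_channels, montage_channels, match_case=True):
    """Return data channel names that have no counterpart in the montage (hypothetical helper)."""
    if match_case:
        available = set(montage_channels)
        return [ch for ch in data_channels if ch not in available]
    available = {name.lower() for name in montage_channels}
    return [ch for ch in data_channels if ch.lower() not in available]


# Illustrative names: PhysioNet-style labels vs. 10-20 style montage labels
data_channels = ["Fc5.", "C3..", "Cz..", "Fp1."]
montage_channels = ["FC5", "C3", "Cz", "Fp1"]

print(unmatched_channels(data_channels, montage_channels))
# After stripping trailing dots, only the case difference (Fc5 vs FC5) remains
stripped = [ch.rstrip(".") for ch in data_channels]
print(unmatched_channels(stripped, montage_channels))                    # ['Fc5']
print(unmatched_channels(stripped, montage_channels, match_case=False))  # []
```

With the real objects, `unmatched_channels(raw.ch_names, my_montage.ch_names)` lists the channels left without positions. For the PhysioNet example specifically, MNE also provides `mne.datasets.eegbci.standardize(raw)`, which renames those channels to standard 10-20 labels; it would need to run before `set_montage`.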

#### Events
It is also important that you specify any events in the EEG recording. The lists below are matched by position: the first onset, duration, description and channel entry all describe the first event, and so on. For each event, enter its onset time (in seconds), its duration (in seconds), its description, its associated channels (an empty list means no specific channel), and its event id (in the dictionary of key:value pairs). `orig_time` can also be edited; it is left as `None` here, so onsets are counted from the start of the recording.

Note: with MNE's default settings, annotations whose descriptions begin with 'bad' or 'edge' are not converted into events, and the keys of `event_id` must match the event descriptions exactly.

The code writes these inputs to the raw data as 'annotations', then converts the annotations into events to be written to the BIDS dataset.

In [None]:
# The starting time of annotations (in seconds) after 'orig time'
onset = [0, 10.0, 30.5]  

# Durations of the annotations (in seconds)
duration = [0, 0.5, 1.0]

# Descriptions for each annotation
description = ['start', 'bad blink', 'stimulus']

# List of channel names associated with the annotations (empty entries = no specific channel)
ch_names = [[], ['O1', 'C3'], ['Pz']]

# Reference time for the onsets; None means they are relative to the start of the recording
orig_time = None

# Writing the inputs to 'annotations'
annotations = mne.Annotations(onset, duration, description, orig_time=orig_time, ch_names=ch_names)
raw.set_annotations(annotations)

# The id associated with each event (via its description)
event_id = {
        'start': 1,
        'bad blink': 2,
        'stimulus': 3
    }

# Generates events from 'annotations'
events, _ = mne.events_from_annotations(raw, event_id=event_id)
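For orientation, MNE represents events as rows of three integers: the sample index of the onset, a 'previous value' column that is normally 0, and the event code from `event_id`. The plain-Python sketch below approximates what `mne.events_from_annotations` does with the inputs above. It is not MNE's implementation, it assumes a sampling rate of 160 Hz (check `raw.info["sfreq"]` for your data), and its skip rule for descriptions starting with 'bad' or 'edge' is assumed to match MNE's default.

```python
import re

# Assumed to mirror MNE's default filter: skip descriptions starting with "bad" or "edge"
DEFAULT_SKIP = re.compile(r"^(bad|edge)", re.IGNORECASE)


def annotations_to_events(onsets, descriptions, event_id, sfreq, first_samp=0):
    """Build [sample, previous_value, event_code] rows from annotation onsets in seconds."""
    events = []
    for onset, desc in zip(onsets, descriptions):
        if DEFAULT_SKIP.match(desc) or desc not in event_id:
            continue
        sample = int(round(onset * sfreq)) + first_samp
        events.append([sample, 0, event_id[desc]])
    return events


# Same inputs as the cell above; 160 Hz is an assumed sampling rate
events_sketch = annotations_to_events(
    onsets=[0, 10.0, 30.5],
    descriptions=["start", "bad blink", "stimulus"],
    event_id={"start": 1, "bad blink": 2, "stimulus": 3},
    sfreq=160.0,
)
print(events_sketch)  # [[0, 0, 1], [4880, 0, 3]]
```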

This next section of code creates a new folder path for storing the EEG data in BIDS format, then prints it out. 
>
We recommend renaming the folder to something specific to your dataset by replacing the text in the quotation marks. Try to avoid using spaces in the name to prevent possible later complications.

In [None]:
bids_root = data_dir.parent / "Motor_Imaging_Example"
bids_root

-------

# 7. Writing the data
> #### Let's write our selected data into BIDS format!

First, you should manually define the participant number/subject id and task name for this dataset, setting each to a variable as seen in the first two lines of code.
>
Then, using the `BIDSPath` tool we imported earlier, we assign the subject, task and the folder path we just created to `bids_path`. 
>
We then use another imported tool, `write_raw_bids`, to write the data (the `raw` object prepared in step 6) into the new folder, linking it to the subject id and task we outlined. The desired output format is also set here: `format="EDF"`.

The same line also writes the previously generated `events` and `event_id` variables to the dataset.

In [None]:
# Edit this information to match your data
subject_id = "S001"
task = "task1"

bids_path = BIDSPath(subject=subject_id, task=task, root=bids_root)
write_raw_bids(raw, bids_path, events=events, event_id=event_id, overwrite=True, allow_preload=True, format="EDF")
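`write_raw_bids` names files from the BIDS entities in `bids_path`. The hypothetical helper below sketches that pattern for the simple case used here (subject, optional session, task) so you can predict where the recording should appear; it is not MNE-BIDS code, and MNE-BIDS may add further entities (such as `run`) and sidecar files. It also checks that labels contain only letters and digits, as BIDS requires, which is why a task name like 'motor task' or 'task_1' would need changing.

```python
import re
from pathlib import PurePosixPath

# BIDS entity labels may contain letters and digits only
LABEL = re.compile(r"[A-Za-z0-9]+")


def expected_eeg_file(root, subject, task, session=None, extension=".edf"):
    """Sketch of where a raw EEG recording lands in a BIDS dataset (hypothetical helper)."""
    for value in (subject, task, session):
        if value is not None and not LABEL.fullmatch(value):
            raise ValueError(f"{value!r} is not a valid BIDS label (letters and digits only)")
    entities = [f"sub-{subject}"]
    folder = PurePosixPath(root, f"sub-{subject}")
    if session is not None:
        entities.append(f"ses-{session}")
        folder = folder / f"ses-{session}"
    entities.append(f"task-{task}")
    return folder / "eeg" / ("_".join(entities) + "_eeg" + extension)


print(expected_eeg_file("Motor_Imaging_Example", "S001", "task1"))
# Motor_Imaging_Example/sub-S001/eeg/sub-S001_task-task1_eeg.edf
```

After running the cell above, `print_dir_tree(bids_root)` shows what MNE-BIDS actually wrote, which you can compare against this prediction.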

_______

### Now you have formatted your dataset to BIDS standards! 
#### Don't forget to repeat steps 5 to 7 for all of the folders we found in step 4
>
## But hold on!
#### Your BIDS formatted dataset isn't quite complete yet...

----

# Editing and checking your BIDS formatted dataset
### The steps below will walk you through finding and editing some of the files in your new dataset, in order to make them BIDS-compliant. 
Each of these files should automatically include a large amount of information derived from your dataset and stored in BIDS format, however this may not always be completely accurate.
>
As such, the next steps will walk you through checking that your BIDS dataset is accurate, and how to adapt these files if necessary. Some of each file's items are required for a BIDS-compliant dataset, while others are recommended or merely optional. You __MUST__ ensure that the required elements are present and correct; although not necessary, it will also be beneficial to include as much additional data as possible, especially information that is important for your dataset.

> Don't forget to do these checks for all task types (all of the file pathways we found).

You can do this by navigating to the folder path we assigned to the variable `bids_root` in step 6, then working through all of the files and checking what is present and correct.

----

<a id="edits"></a>
# Editing different file formats
Some of the following files will follow the .json format (Sidecar, Coordinate System, Dataset Description), others (Channels Description, Electrodes description) will be in the .tsv format, and a few will have a file in each format (Events, Participants).

These file types are each edited via slightly different methods, so while .json files require no extra imports, to edit our .tsv files we must import the [pandas](https://pandas.pydata.org/pandas-docs/version/1.4/index.html) library.

Due to the differences in their display formats (text vs tabular), while .json files can be edited using a simple dictionary of key:value pairs, editing .tsv files requires a few different code functions. 

Those outlined in this document will walk you through:
- Adding/ Editing a column
- Editing the value of just one row
- Removing a row
- Adding a row

In [None]:
# Importing the pandas library to edit .tsv files
import pandas as pd
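BIDS .tsv files are tab-separated and mark missing values with 'n/a', which pandas must be told about when reading and writing. The sketch below runs the four operations listed above on a small in-memory participants table (illustrative values only). With real files you would pass the file path, e.g. `root / "participants.tsv"`, instead of `io.StringIO(...)`, and give the same path to `to_csv`. `keep_default_na=False` stops pandas from also treating strings such as 'NA' or 'null' as missing.

```python
import io

import pandas as pd

# A small participants.tsv-style table held in memory (illustrative values only)
tsv_text = (
    "participant_id\tsex\thand\n"
    "sub-S001\tF\tR\n"
    "sub-S002\tM\tn/a\n"
    "sub-S003\tF\tR\n"
)

# Tab-separated, with only "n/a" treated as a missing value
df = pd.read_csv(io.StringIO(tsv_text), sep="\t", na_values="n/a", keep_default_na=False)

# Adding a column (same value for every row)
df["group"] = "control"

# Editing the value of just one row
df.loc[df["participant_id"] == "sub-S001", "group"] = "patient"

# Removing a row
df = df[df["participant_id"] != "sub-S003"]

# Adding a row
new_row = pd.DataFrame([{"participant_id": "sub-S004", "sex": "M", "hand": "L", "group": "control"}])
df = pd.concat([df, new_row], ignore_index=True)

# Writing back: keep tabs, drop the pandas index, turn missing values back into "n/a"
out_text = df.to_csv(sep="\t", index=False, na_rep="n/a")
print(out_text)
```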

# Assigning file pathways
To ensure our next sections of code are as clean and easy to use as possible, we will be assigning key file pathway roots to variables. Later in the document we will use these to create file pathways to specific file locations.

Here, you should:
- Set the variable 'root' to the top-level folder of the BIDS dataset 
- Set the variable 'eeg_root' to the folder containing eeg data for the current subject


In [None]:
# Set 'root' to the top-level folder of the BIDS dataset 
root = Path(r'c:\N8_internship_code\Motor_Imaging_Example')

# Set 'eeg_root' to the folder containing eeg data for the current subject
eeg_root = Path(r'C:\N8_internship_code\Motor_Imaging_Example\sub-S001\eeg')
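Rather than typing `eeg_root` by hand for every participant, it can be built from `root` and the subject label. This hypothetical helper assumes the `sub-<label>/eeg` layout written in step 7, with an extra `ses-<label>` folder only when a session was given.

```python
from pathlib import Path


def subject_eeg_dir(bids_root, subject, session=None):
    """Return <bids_root>/sub-<subject>[/ses-<session>]/eeg (hypothetical helper)."""
    folder = Path(bids_root) / f"sub-{subject}"
    if session is not None:
        folder = folder / f"ses-{session}"
    return folder / "eeg"


root = Path(r"C:\N8_internship_code\Motor_Imaging_Example")
eeg_root = subject_eeg_dir(root, "S001")
print(eeg_root)
```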

Contents:
- [Dataset description](#dataset-description)
- [JSON files](#json)
- [Phenotype Files](#phenotype)
- [TSV files](#tsv)

----


#### Despite also being a json file, the dataset description can't be edited using the same code. As such, any edits we hope to make to it must be done by following the next steps.

<a id="dataset-description"></a>
# Dataset Description
> Edits to this file are incredibly important, as this outlines all of the general information about your dataset.

The code below will re-write the ENTIRE dataset description, overwriting any previous dataset description files.
This file should describe the dataset in as much detail as possible, so you should attempt to include as much of the data outlined below as possible although BIDS only requires the presence of the 'necessary' information.

#### BIDS components:
>
Necessary:
>
    1. Name
    2. BIDSVersion
>
Recommended:
>
    3. HEDVersion
    4. DatasetType
    5. License
    6. Authors
    7. GeneratedBy
        - Name
        - Version
        - Description
        - CodeURL
        - Container (Type, Tag, URI)
    8. SourceDatasets
>
Optional:
>
    9. Acknowledgements
    10. HowToAcknowledge
    11. Funding
    12. EthicsApprovals
    13. ReferencesAndLinks
    14. DatasetDOI
>
Note: `BIDSVersion` will be automatically included in the file once the code is run; `make_dataset_description` writes `License` from its `data_license` argument and `DatasetDOI` from its `doi` argument.


Once you have decided on the information you wish to include, you can adapt the code below, replacing the information in quotation marks with your dataset's information.
>
Any item you don't intend to include should be written as `<item>=None`, just as `acknowledgements` is below. This will skip that item, preventing its inclusion in the file. 
> This code will overwrite any 'dataset description' file previously generated. To prevent this, change `overwrite=True` to `overwrite=False`. 
>
- Note: the `doi` value must be written in the format `doi:<insert_doi>` (for example `doi:10.1016/j.tins.2017.02.004`), not as a full URL.
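A common slip is pasting the full DOI URL after 'doi:'. A small, hypothetical helper like the one below (plain Python, not part of MNE-BIDS) can turn a DOI written as a URL, or as a bare identifier, into the expected form before you pass it to `make_dataset_description`.

```python
PREFIXES = ("doi:", "https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "http://dx.doi.org/")


def to_bids_doi(doi_text):
    """Normalise a DOI given as a URL, 'doi:' string or bare identifier to 'doi:<identifier>'."""
    text = doi_text.strip()
    changed = True
    while changed:  # repeat so that mixed forms such as "doi:https://doi.org/..." are fully stripped
        changed = False
        for prefix in PREFIXES:
            if text.lower().startswith(prefix):
                text = text[len(prefix):]
                changed = True
    return "doi:" + text


print(to_bids_doi("https://doi.org/10.1016/j.tins.2017.02.004"))      # doi:10.1016/j.tins.2017.02.004
print(to_bids_doi("doi:https://doi.org/10.1016/j.tins.2017.02.004"))  # doi:10.1016/j.tins.2017.02.004
```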

> An example output file can be found within the [BIDS documentation](https://bids-specification.readthedocs.io/en/stable/modality-agnostic-files.html).

In [None]:
# Creating a dataset description JSON file
# Will overwrite any existing dataset_description.json file in the root of the BIDS directory
make_dataset_description(
    path=bids_root,
    name="EEG Motor Movement/Imagery Dataset",
    hed_version=None,  # e.g. "8.2.0" if your events are annotated with HED tags
    dataset_type="raw",
    data_license="CC0",  # CC0 is written with the digit zero
    authors=["John Doe", "Jane Doe"],
    generated_by=[
        {
            "Name": "MNE-BIDS",
            "Version": "0.16.0",
            "Description": "Used to convert EEG data into BIDS format."
        },
        {
            "Name": "MNE-Python",
            "Version": "1.9.0",
            "Description": "Used to read and annotate the EEG data."
        }
    ],
    source_datasets=[
        {
            "URL": "https://example.com/source_dataset",
            "DOI": "doi:10.1234/example.doi",
        }
    ],
    acknowledgements=None,
    how_to_acknowledge="Cite (Doe et al., 2025) when using this dataset",
    funding=["The NHS", "The UK government"],
    ethics_approvals=["Ethical approval was granted by the University of ___ School of Psychology Ethics committee"],
    references_and_links=["https://mne.tools/mne-bids/stable/whats_new_previous_releases.html"],
    doi="doi:10.1016/j.tins.2017.02.004",
    overwrite=True,
    verbose=True,
)

----

<a id="json"></a>
## Manually updating an element in a JSON file:

Now that our dataset description file is up to date, we can work on editing the rest of our json files (if necessary). Each file uses similar code, with just a few key sections requiring adaptation.

Below the code, you will find information on each of these files and its list of BIDS items, organised by priority. Use these lists to check your files. 

To begin, change the `subject=`, `task=`, `suffix=` and `extension=` arguments of `bids_path1` to match your chosen file's name (for example, the code below refers to the file 'sub-S001_task-task1_eeg.json').
> Because of differences in file location, the participants.json file requires `None` for the `subject=`, `task=` and `datatype=` arguments, and the coordsystem.json file requires an extra argument, `space=`, while `task=` must be `None`. `space=` refers to the coordinate system being used, e.g. 'CapTrak' (see the examples below).

Once these are set, you can update one or more entries of the sidecar using the `entries = {}` dictionary. This accepts `key:value` pairs, with each key separated from its value by a colon (:) and each pair separated by a comma. In Python, single ('') and double ("") quotation marks are interchangeable; the code below uses single quotes for keys and double quotes for text values purely as a readability convention.
>
##### The code below shows a few formats the key-value pairs can take, such as:
__Numerical__
    - A key-value pair where the value is a number (int/float).
>
__Written__
    - A key-value pair where the value is a string (text).
>
__Nested dictionary (1 level)__
    - A key-value pair where the value is a dictionary containing key-value pairs.
>
__Nested dictionary (2+ levels)__
    - A key-value pair where the value is a dictionary that contains one or more dictionaries.
>

#### Example output files can be found within the [BIDS documentation](https://bids-specification.readthedocs.io/en/stable/modality-specific-files/electroencephalography.html).
> Note: this code uses the sidecar.json file as an example.

In [None]:
bids_path1 = BIDSPath(subject='S001', task='task1',
                      suffix='eeg', extension='.json', datatype='eeg',
                      root=root)

entries = {
    # Simple key-value pair for head circumference (numerical, in cm)
    'HeadCircumference': 58.0,
    # Simple key-value pair for the amplifier model name (written)
    'ManufacturersModelName': "Brain Products actiCHamp",
    # Nested dictionary for software versions (1 level)
    # The BIDS schema may define SoftwareVersions as a single string, so the
    # validator could flag a dictionary here; it is used to show the syntax
    'SoftwareVersions': {
        'MNE': "1.9.0",
        'BIDS': "1.10.0",
        'MNE-BIDS': "0.16.0",
    },
    # Nested dictionary for software filters (2 levels)
    'SoftwareFilters': {
        "Anti-aliasing filter": {
            "half-amplitude cutoff (Hz)": 500,
            "Roll-off": "6dB/Octave",
        },
    },
}

# Update the JSON file with your new entries
update_sidecar_json(bids_path1, entries, verbose=True)
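It can help to know roughly what an update like this does to the file. The sketch below is a standard-library stand-in, not MNE-BIDS's implementation; it assumes the update is a shallow, top-level merge, which is worth confirming in the `update_sidecar_json` documentation. The practical consequence: to change one entry inside a nested dictionary such as `SoftwareFilters`, supply the whole nested dictionary again. `merge_into_json` is a hypothetical name.

```python
import json
import tempfile
from pathlib import Path


def merge_into_json(json_path, entries):
    """Load a JSON sidecar, overwrite/add top-level keys from entries, and save it again."""
    json_path = Path(json_path)
    data = json.loads(json_path.read_text(encoding="utf-8"))
    data.update(entries)  # shallow: a nested dictionary replaces the old one as a whole
    json_path.write_text(json.dumps(data, indent=4), encoding="utf-8")
    return data


# Demonstration on a temporary sidecar file
with tempfile.TemporaryDirectory() as tmp:
    sidecar = Path(tmp) / "sub-S001_task-task1_eeg.json"
    sidecar.write_text(json.dumps({
        "TaskName": "task1",
        "SoftwareFilters": {"Low-pass": {"cutoff (Hz)": 80}},
    }), encoding="utf-8")
    updated = merge_into_json(sidecar, {
        "HeadCircumference": 58.0,
        "SoftwareFilters": {"Anti-aliasing filter": {"half-amplitude cutoff (Hz)": 500}},
    })

print(sorted(updated))                   # ['HeadCircumference', 'SoftwareFilters', 'TaskName']
print(list(updated["SoftwareFilters"]))  # ['Anti-aliasing filter']
```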

----

# Sidecar JSON
>
This file should have a naming format similar to *_eeg.json in your 'eeg' subfolder. 

MNE-BIDS should automatically generate most of this information, but there may be missing information that's necessary for your dataset.
> Take note of any elements that are missing/incorrect; these can be updated using the above section of code.
>
#### BIDS components:
>
Necessary:
>
    1. EEGReference
    2. SamplingFrequency
    3. PowerLineFrequency
    4. SoftwareFilters
    5. TaskName
>
Recommended:
>
    6. TaskDescription
    7. Instructions
    8. CogAtlasID
    9. CogPOID
    10. CapManufacturer
    11. CapManufacturerModelName
    12. SoftwareVersions
    13. DeviceSerialNumber
    14. EEGChannelCount
    15. ECGChannelCount
    16. EMGChannelCount
    17. EOGChannelCount
    18. MISCChannelCount
    19. TriggerChannelCount
    20. RecordingDuration
    21. RecordingType
    22. EpochLength
    23. EEGGround
    24. HeadCircumference
    25. EEGPlacementScheme
    26. HardwareFilters
    27. SubjectArtefactDescription
    28. InstitutionName
    29. InstitutionAddress
    30. InstitutionalDepartmentName
>
Optional:
>
    31. ElectricalStimulation
    32. ElectricalStimulationParameters
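To compare a sidecar against the required items above without opening it by hand, a small standard-library check can help. This is a hypothetical sketch: it only flags keys that are absent or empty, the key list is copied from the 'Necessary' items above, and the BIDS validator remains the authority on compliance. The demonstration uses a temporary file.

```python
import json
import tempfile
from pathlib import Path

# Required keys listed above for *_eeg.json
REQUIRED_EEG_SIDECAR_KEYS = [
    "EEGReference", "SamplingFrequency", "PowerLineFrequency", "SoftwareFilters", "TaskName",
]


def missing_or_empty(json_path, required_keys):
    """Return the required keys that are absent from a JSON file or set to null/empty text."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return [key for key in required_keys if data.get(key) in (None, "")]


with tempfile.TemporaryDirectory() as tmp:
    sidecar = Path(tmp) / "sub-S001_task-task1_eeg.json"
    sidecar.write_text(json.dumps(
        {"TaskName": "task1", "SamplingFrequency": 160.0, "EEGReference": ""}
    ), encoding="utf-8")
    problems = missing_or_empty(sidecar, REQUIRED_EEG_SIDECAR_KEYS)

print(problems)  # ['EEGReference', 'PowerLineFrequency', 'SoftwareFilters']
```

On your own data, `missing_or_empty(eeg_root / "sub-S001_task-task1_eeg.json", REQUIRED_EEG_SIDECAR_KEYS)` would run the same check, and the key list can be swapped for the coordinate system items below.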



# Coordinate System JSON
This file should have a naming format similar to *_coordsystem.json in your 'eeg' subfolder. 

MNE-BIDS should automatically generate most of this information, but there may be missing information that's necessary for your dataset.
>Take note of any elements that are missing/incorrect; these can be updated using the JSON update code above.

#### BIDS components:
>
Necessary:
>
    1. EEGCoordinateSystem
    2. EEGCoordinateUnits
    3. EEGCoordinateSystemDescription
>
Recommended:
>
    4. FiducialsDescription
    5. FiducialsCoordinates
    6. FiducialsCoordinateSystem
    7. FiducialsCoordinateUnits
    8. FiducialsCoordinateSystemDescription
    9. AnatomicalLandmarkCoordinates
    10. AnatomicalLandmarkCoordinateSystem
    11. AnatomicalLandmarkCoordinateUnits
    12. AnatomicalLandmarkCoordinateSystemDescription
>
Optional:
>
    13. IntendedFor

> Note: This file requires an extra argument, `space=`, which refers to the coordinate system being used (e.g. 'CapTrak'), while `task=` must be set to `None`.

Example:

In [None]:
bids_path1 = BIDSPath(subject='001', task=None,
                     suffix='coordsystem', extension='.json', datatype='eeg',
                     root=root, space='CapTrak')
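The coordinate system file takes no `task` entity because BIDS does not allow one in `*_coordsystem.json` names, whereas it does allow `space`. To illustrate how entities turn into a file name, here is a simplified sketch. MNE-BIDS builds these names for you through `BIDSPath`. `build_bids_name` is a hypothetical helper that only handles the `sub`, `ses`, `task` and `space` entities.

```python
def build_bids_name(suffix, extension, subject, session=None, task=None, space=None):
    """Assemble a simplified BIDS file name, skipping any entity that is None."""
    # Entities must appear in a fixed order; sub, ses and task come before space
    entities = [("sub", subject), ("ses", session), ("task", task), ("space", space)]
    parts = [f"{key}-{value}" for key, value in entities if value is not None]
    return "_".join(parts + [suffix]) + extension


# The EEG sidecar includes a task entity...
print(build_bids_name("eeg", ".json", subject="001", task="task1"))
# ...while the coordinate system file uses space instead
print(build_bids_name("coordsystem", ".json", subject="001", space="CapTrak"))
```

The second call produces `sub-001_space-CapTrak_coordsystem.json`, the same pattern that MNE-BIDS reports when it writes this file.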

# Events JSON
This file should have a naming format similar to `*_events.json` in your 'eeg' subfolder.

This will serve as the explanatory counterpart to the events.tsv file. Any edits made to the contents of the tsv file should be mirrored here, with a description.
MNE-BIDS will automatically input the majority of this information, but you may wish to edit these descriptions to be more accurate, or add a description for a new entry.
> Take note of any elements that are missing/incorrect; these can be updated using the above section of code.
>
#### BIDS components:
>
Necessary:
>
    1. Onset
    2. Duration
>
Recommended:
>
    n/a
>
Optional:
>
    3. TrialType
    4. ResponseTime
    5. HED
    6. StimFile
    7. Channel
>
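As an illustration of what an edited events.json might contain, the sketch below writes descriptions for the `onset`, `duration` and `trial_type` columns using Python's `json` module instead of `update_sidecar_json`. The trial types and their descriptions are made-up placeholders. `LongName`, `Description`, `Levels` and `Units` are BIDS keys for describing TSV columns. In the TSV itself, the column names are lower case (`onset`, `duration`, `trial_type`, `response_time`, `stim_file`, `channel`), apart from `HED`.

```python
import json
import tempfile
from pathlib import Path

# Descriptions for the columns of an events.tsv file (placeholder values)
events_description = {
    "onset": {
        "Description": "Onset of the event, relative to the start of the recording",
        "Units": "s",
    },
    "duration": {
        "Description": "Duration of the event",
        "Units": "s",
    },
    "trial_type": {
        "LongName": "Trial type",
        "Description": "The condition presented on each trial",
        # 'Levels' explains each possible value of a categorical column
        "Levels": {
            "rest": "The participant rests with eyes open",
            "left_hand": "The participant imagines moving their left hand",
        },
    },
}

with tempfile.TemporaryDirectory() as tmp:
    events_json_path = Path(tmp) / "sub-001_task-task1_events.json"
    with open(events_json_path, "w", encoding="utf-8") as outfile:
        json.dump(events_description, outfile, indent=4)
    # Reading the file back to confirm that it was written correctly
    written = json.loads(events_json_path.read_text(encoding="utf-8"))

print(sorted(written["trial_type"]["Levels"]))  # ['left_hand', 'rest']
```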

# Participants JSON
The participants.json file exists as a counterpart to the participants.tsv file and is used to describe the TSV column names and the properties of their values, making interpretation easier, especially in the case of dataset-specific columns. 
>
MNE-BIDS will automatically input the majority of this information, but you may wish to edit these descriptions to be more accurate, and should add additional descriptions for each new entry added to the participants.tsv file (e.g. education level) using the above code.
> Take note of any elements that are missing/incorrect; these can be updated using the above section of code.

#### BIDS Components:
>
Necessary:
>
    1. Participant ID 
>
Recommended:
>
    2. Species
    3. Age
    4. Sex
    5. Handedness
    6. Strain
    7. Strain RRID
>
Optional:
>
    - Additional participant information may be included to further bolster your metadata.

> Note: This file requires `None` as the input for the `subject=`, `task=` and `datatype=` arguments.

Example:

In [None]:
bids_path1 = BIDSPath(subject=None, task=None,
                     suffix='participants', extension='.json', datatype=None,
                     root=root)
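When you add a column to participants.tsv (such as the 'Education' column used in the TSV editing examples later in this document), its description belongs in participants.json, and the JSON keys must match the TSV column names exactly. Below is a standard-library sketch that merges one new description into an existing file. It mirrors the idea behind the `entries` dictionary passed to `update_sidecar_json`, but it is not that function's implementation. `add_column_description` and the levels shown are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def add_column_description(json_path, column, description):
    """Add (or replace) the description of one column in a participants.json file."""
    json_path = Path(json_path)
    # Load the existing descriptions, or start from an empty dictionary if there is no file yet
    contents = json.loads(json_path.read_text(encoding="utf-8")) if json_path.exists() else {}
    contents[column] = description
    json_path.write_text(json.dumps(contents, indent=4), encoding="utf-8")
    return contents


with tempfile.TemporaryDirectory() as tmp:
    participants_json = Path(tmp) / "participants.json"
    # A small example file to start from
    participants_json.write_text(
        json.dumps({"participant_id": {"Description": "Unique participant identifier"}}),
        encoding="utf-8",
    )
    updated = add_column_description(
        participants_json,
        "Education",
        {
            "Description": "Highest level of education completed",
            "Levels": {"High School": "Secondary school", "PhD": "Doctoral degree"},
        },
    )

print(list(updated))  # ['participant_id', 'Education']
```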

----

#### As it requires creating a whole new folder and file path, this next file type, although it still consists of JSON and TSV files, requires slightly different code.

<a id="phenotype"></a>
# Phenotype files
> Optional
> Datasets with multiple sets of participant-level measurements (such as responses from multiple questionnaires) may benefit from storing them in files separate from the participants files.

The only requirements for these files are that their first column is participant_id, that their rows correspond directly with the subjects in the BIDS dataset, and that they have a descriptive name.

First, we must create the folder we want to write the file to. 

In the file path section, we will be adding "phenotype" to the end of our 'root' file pathway to create a new folder named 'phenotype' there. You must then define the file name you wish to create by editing `"descriptive_file_name.tsv"` to include a file name that accurately represents the contents of the file.

In [None]:
# Setting the file path for the new folder
phenotype_1_folder = Path(root / "phenotype")

# Creating the new folder
phenotype_1_folder.mkdir(parents=True, exist_ok=True)

# Assigning the new tsv file path to the new folder location
phenotype_tsv_path = phenotype_1_folder / "descriptive_file_name.tsv"

Now, you can add your data to the key:value pairs in the dictionary below to set the contents of the tsv file. 

The 'keys' will become the column headers, while the 'values' will be the data assigned to each row for that header.

In [None]:
# Assigning the file's contents (column: row entries) using a dictionary
# participant_id values must use the lower-case 'sub-' prefix to match the subject folders
phenotype_1 = {
        'participant_id': ["sub-001", "sub-002"],
        'Related_Key': ["Related_Value_1", "Related_Value_2"]
}

# Converting the dictionary into a data frame
descriptive_file_name_phenotype = pd.DataFrame(phenotype_1)

# Writing the data frame to the new tsv file
descriptive_file_name_phenotype.to_csv(phenotype_tsv_path, sep="\t", index=False, na_rep="n/a")

#### It is recommended that this file is accompanied by a descriptive json file, explaining each of its columns.
First, let's create the file path for this.

In [None]:
# Assigning the new json file path to the new folder location
phenotype_json_path = phenotype_1_folder / "descriptive_file_name.json"

Next, edit the 'entries' dictionary to include the description information for every variable entered into the tsv file. This will accept key:value pair formats, including nested dictionaries.

We will also import Python's built-in `json` module, which allows us to create the new JSON file.

Then, we will write this to the json file.

In [None]:
# Importing the json tool
import json

# Creating the data entries for the json file
entries = { 
        'participant_id': {"Description": "The participant's unique identifier code"}, 
        'Related_Key': {"Description": "An input related to the phenotype file you are creating"}
        }   

# Writing the changes to the json file    
with open(phenotype_json_path, "w") as outfile:
    json.dump(entries, outfile, indent=4)
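Since the rows of a phenotype file should correspond to the subjects in the dataset, it can help to compare the `participant_id` column with the subject folders before moving on. Below is a standard-library sketch, assuming the subject folders are named `sub-<label>` and sit directly inside the dataset root. `check_phenotype_ids` and the example files are hypothetical. The example also shows why capitalisation matters: `Sub-002` does not match `sub-002`.

```python
import csv
import tempfile
from pathlib import Path


def check_phenotype_ids(bids_root, phenotype_tsv):
    """Compare the participant_id values of a phenotype TSV with the sub-* folders in bids_root."""
    subject_folders = {
        p.name for p in Path(bids_root).iterdir() if p.is_dir() and p.name.startswith("sub-")
    }
    with open(phenotype_tsv, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        # As described above, the first column must be participant_id
        if not reader.fieldnames or reader.fieldnames[0] != "participant_id":
            raise ValueError("The first column must be 'participant_id'")
        phenotype_ids = {row["participant_id"] for row in reader}
    return {
        "missing_from_phenotype": sorted(subject_folders - phenotype_ids),
        "unknown_in_phenotype": sorted(phenotype_ids - subject_folders),
    }


with tempfile.TemporaryDirectory() as tmp:
    example_root = Path(tmp)
    for subject in ["sub-001", "sub-002"]:
        (example_root / subject).mkdir()
    (example_root / "phenotype").mkdir()
    example_tsv = example_root / "phenotype" / "questionnaire.tsv"
    # 'Sub-002' has the wrong capitalisation, so it will not match the sub-002 folder
    example_tsv.write_text("participant_id\tscore\nsub-001\t12\nSub-002\t15\n", encoding="utf-8")
    report = check_phenotype_ids(example_root, example_tsv)

print(report)  # {'missing_from_phenotype': ['sub-002'], 'unknown_in_phenotype': ['Sub-002']}
```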

-----

<a id="tsv"></a>
## Manually updating a TSV file:

This next section will walk you through editing your TSV files. 

#### Example output files can be found within the [BIDS documentation](https://bids-specification.readthedocs.io/en/stable/modality-specific-files/electroencephalography.html).
> Note: this code uses the participants.tsv file as an example.

To edit a file, first change the text in double quotation marks ("") to the full name of your selected TSV file. For files in the 'eeg' subfolder, this is usually combined with the `eeg_root` variable to create the file path; because the participants file sits at the top level of the dataset, we use the `root` variable instead.

This ensures that the variable `file_path` points to the file we're working with, and that `file_tsv` holds its contents as a data frame.

In [None]:
# Setting the file path to the selected .tsv file [edit]
file_path = root / "participants.tsv"
# Reading the .tsv file into a data frame
file_tsv = pd.read_csv(file_path, sep='\t')

#### Adding/ Editing a column:

Both of these functions can be managed using the same section of code!

First, edit the 'Inputs' list to include the values you wish to add to your new or pre-existing column. List them in row order, beginning with the entry for the first row in the file, and provide exactly one entry for each row (pandas raises an error if the length of the list does not match the number of rows).

Then, change the text in double quotation marks ("") within `file_tsv["__"]` to either the title of the pre-existing column you wish to overwrite, or the title of the new column you wish to create.

In [None]:
# Listing the desired inputs for the chosen column
Inputs = ["High School", "A-level", "Bachelors", "PhD"]  

# Setting the rows in the chosen column to the inputs listed above
file_tsv["Education"] = Inputs

#### Editing one row:
 
To edit a single row, use the `.loc` indexer, which selects the rows that match a condition. Here, the condition is that the value in the `participant_id` column (the first set of double quotation marks) equals the ID of the row you want to edit (the second set). These column names and labels will change from file to file.

Next, set the name of the column whose value you want to change (the third set of double quotation marks; this can differ from the column used in the condition), followed by the value you want to assign to that cell (the fourth set).

In [None]:
# Editing just one row in the .tsv file
file_tsv.loc[file_tsv["participant_id"] == "sub-001", "Education"] = "none"

#### Removing a row:

To remove a row, use the `.drop` method and pass it the index of the row you wish to remove.
> Note: Indexes begin from 0, so the 'first' row will be #0, the 'second' row will be #1 and so on. `.drop` works on index labels, which are not renumbered after a row is removed, so check `file_tsv` before dropping a second row.

In [None]:
# Removing a row from the .tsv file using the row's index
file_tsv = file_tsv.drop(index=1)

#### Adding a row:

To add a row, first create a new data frame containing the columns and values (as key:value pairs) for the new row, by editing the text in double quotation marks and adding key:value pairs where necessary. Each 'key' should match the name of an existing column, while each 'value' is the new entry for that column. Any column you leave out is filled with a missing value, which is written to the file as 'n/a'.

This new row is then combined with the current data frame (`file_tsv`).

In [None]:
# Creating a new data frame for the new row
new_row = pd.DataFrame([{"participant_id": "sub-002", "age": 30, "sex":"M"}])

# Combining the new row with the existing file_tsv data frame
file_tsv = pd.concat([file_tsv, new_row], ignore_index=True)

Finally, after completing any necessary changes, we __must__ write them back to the file with the code below:

In [None]:
# Writing the change to the file
file_tsv.to_csv(file_path, sep= '\t', index=False, na_rep='n/a')
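`.drop(index=1)` relies on knowing the row's position. In pandas, you could instead select the row by participant ID with `file_tsv = file_tsv[file_tsv["participant_id"] != "sub-002"]`. For comparison, here is the same read-edit-write cycle as a standard-library sketch using the `csv` module. `remove_participant` is a hypothetical helper. Because `csv` keeps every cell as text, existing `n/a` values are written back unchanged, and `lineterminator="\n"` keeps Unix line endings on every operating system.

```python
import csv
import tempfile
from pathlib import Path


def remove_participant(tsv_path, participant_id):
    """Delete the row(s) for one participant from a TSV file, keeping every column."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        columns = reader.fieldnames
        rows = [row for row in reader if row["participant_id"] != participant_id]
    with open(tsv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    example_tsv = Path(tmp) / "participants.tsv"
    example_tsv.write_text(
        "participant_id\tage\tsex\nsub-001\t24\tF\nsub-002\tn/a\tM\nsub-003\t31\tn/a\n",
        encoding="utf-8",
    )
    remove_participant(example_tsv, "sub-002")
    result_text = example_tsv.read_text(encoding="utf-8")

print(result_text)
```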

----

# Channels Description TSV
>
This should have a naming format similar to `*_channels.tsv` in your 'eeg' subfolder.

Once you have located the file, you should open it and look through the components it lists. Below is a list of information BIDS needs/suggests for this file. Take note of which elements are missing or incorrect.
>
#### BIDS components:
>
Necessary:
>
    1. Name
    2. Type
    3. Units
>
Recommended:
>
    n/a
>
Optional:
>
    4. Description
    5. SamplingFrequency
    6. Reference
    7. LowCutoff
    8. HighCutoff
    9. Notch
    10. Status
    11. StatusDescription
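The optional `status` column is where BIDS records whether a channel is `good` or `bad`, and `status_description` can explain why. When writing with MNE-BIDS, channels listed in `raw.info["bads"]` are normally marked as bad for you. If you need to change this afterwards, here is a standard-library sketch that marks named channels as bad. `mark_bad_channels`, the channel names and the reason text are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def mark_bad_channels(tsv_path, bad_channels, reason):
    """Set status to 'bad' for the named channels; fill in 'good' for the rest if the column is new."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        columns = list(reader.fieldnames)
        rows = list(reader)
    # Add the optional status columns if the file does not have them yet
    for column in ("status", "status_description"):
        if column not in columns:
            columns.append(column)
    for row in rows:
        if row["name"] in bad_channels:
            row["status"] = "bad"
            row["status_description"] = reason
        else:
            # setdefault only fills in values for columns that were just added
            row.setdefault("status", "good")
            row.setdefault("status_description", "n/a")
    with open(tsv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
    return rows


with tempfile.TemporaryDirectory() as tmp:
    example_tsv = Path(tmp) / "sub-001_task-task1_channels.tsv"
    example_tsv.write_text("name\ttype\tunits\nFp1\tEEG\tµV\nT8\tEEG\tµV\n", encoding="utf-8")
    updated_rows = mark_bad_channels(example_tsv, ["T8"], "High impedance throughout the recording")
    written_text = example_tsv.read_text(encoding="utf-8")

print([(row["name"], row["status"]) for row in updated_rows])  # [('Fp1', 'good'), ('T8', 'bad')]
```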

# Electrodes Description TSV
This should have a naming format similar to `*_electrodes.tsv` in your 'eeg' subfolder.

Once you have located the file, you should open it and look through the components it lists. Below is a list of information BIDS needs/suggests for this file. Take note of which elements are missing/incorrect.
>
#### BIDS components:
>
Necessary:
>
    1. Name
    2. X
    3. Y
    4. Z
>
Recommended:
>
    5. Type
    6. Material
    7. Impedance
>
Optional:
>
    n/a
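Channels whose names do not match the chosen montage may end up with `n/a` coordinates, for example when `on_missing='ignore'` is used with `set_montage`, as in the loop later in this document. Below is a standard-library sketch that lists such electrodes. `electrodes_without_positions` and the example rows are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def electrodes_without_positions(tsv_path):
    """Return the names of electrodes whose x, y or z value is 'n/a'."""
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        return [row["name"] for row in reader if "n/a" in (row["x"], row["y"], row["z"])]


with tempfile.TemporaryDirectory() as tmp:
    example_tsv = Path(tmp) / "sub-001_space-CapTrak_electrodes.tsv"
    example_tsv.write_text(
        "name\tx\ty\tz\nFp1\t-0.03\t0.08\t0.02\nT8..\tn/a\tn/a\tn/a\n",
        encoding="utf-8",
    )
    unplaced = electrodes_without_positions(example_tsv)

print(unplaced)  # ['T8..']
```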

# Events TSV
This file's name should have a format similar to `*_events.tsv` in your 'eeg' subfolder.

This is the main dataset file for 'events', containing the table of variables and their values, which are defined in the json file.

Once you have located this file, you should open it and look through its variables. Below is a list of information BIDS needs/suggests for this file.
> Take note of any elements that are missing/incorrect; these can be updated using the TSV editing code above.
>
#### BIDS components:
>
Necessary:
>
    1. Onset
    2. Duration
>
Recommended:
>
    n/a
>
Optional:
>
    3. TrialType
    4. ResponseTime
    5. HED
    6. StimFile
    7. Channel
>
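BIDS expects `onset` to be a number (in seconds) and `duration` to be zero, positive, or `n/a`. Below is a standard-library sketch that flags rows breaking those two rules. `check_events` and `is_number` are hypothetical helpers, and these checks are deliberately much narrower than the BIDS validator's.

```python
import csv
import tempfile
from pathlib import Path


def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
    except (TypeError, ValueError):
        return False
    return True


def check_events(tsv_path):
    """List problems in the onset and duration columns of an events.tsv file."""
    problems = []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        # Line 1 is the header, so data rows start at line 2
        for line_number, row in enumerate(reader, start=2):
            if not is_number(row["onset"]):
                problems.append(f"line {line_number}: onset {row['onset']!r} is not a number")
            duration = row["duration"]
            if duration != "n/a" and (not is_number(duration) or float(duration) < 0):
                problems.append(f"line {line_number}: duration {duration!r} must be zero, positive or n/a")
    return problems


with tempfile.TemporaryDirectory() as tmp:
    example_tsv = Path(tmp) / "sub-001_task-task1_events.tsv"
    example_tsv.write_text(
        "onset\tduration\ttrial_type\n0.0\t4.0\trest\nabc\tn/a\tleft_hand\n8.0\t-1\trest\n",
        encoding="utf-8",
    )
    problems_found = check_events(example_tsv)

for problem in problems_found:
    print(problem)
```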

# Participants TSV
The participants.tsv file includes a table containing participant information relevant to the dataset. It is accompanied by the participants.json file, which provides more in-depth explanations for this information.
>
MNE-BIDS will automatically input the majority of this information, but you may wish to edit the file in order to add more columns to include further participant information.

#### BIDS Components:
>
Necessary:
>
    1. Participant ID 
>
Recommended:
>
    2. Species
    3. Age
    4. Sex
    5. Handedness
    6. Strain
    7. Strain RRID
>
Optional:
>
    - Additional participant information may be included to further bolster your metadata.

---

<a id="validator"></a>
# Validating your BIDS dataset
Now that you have completed your edits, we suggest running your dataset through the [BIDS validator](https://bids-standard.github.io/bids-validator/) to check how closely it complies with BIDS formatting, and whether anything needs further adapting.

<a id="cite"></a>
# How to cite MNE-BIDS

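#### As we used their tools to generate our BIDS formatted dataset, we must cite MNE-BIDS somewhere within it!
MNE-BIDS adds the relevant references to the dataset's README file when it writes the data. The following code prints the README so that you can check they are there: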


In [None]:
readme = bids_root / "README"
with open(readme, encoding="utf-8-sig") as fid:
    text = fid.read()
print(text)

----

# Our Citations
MNE-BIDS
> Appelhoff, S., Sanderson, M., Brooks, T., Vliet, M., Quentin, R., Holdgraf, C., Chaumon, M., Mikulan, E., Tavabi, K., Höchenberger, R., Welke, D., Brunner, C., Rockhill, A., Larson, E., Gramfort, A., & Jas, M. (2019). MNE-BIDS: Organizing electrophysiological data into the BIDS format and facilitating their analysis. Journal of Open Source Software, 4:1896. DOI: 10.21105/joss.01896
>
>  Pernet, C.R., Appelhoff, S., Gorgolewski, K.J. et al. EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Sci Data 6, 103 (2019). https://doi.org/10.1038/s41597-019-0104-8
>
MNE-Python
> Alexandre Gramfort, Martin Luessi, Eric Larson, Denis A. Engemann, Daniel Strohmeier, Christian Brodbeck, Roman Goj, Mainak Jas, Teon Brooks, Lauri Parkkonen, and Matti S. Hämäläinen. MEG and EEG data analysis with MNE-Python. Frontiers in Neuroscience, 7(267):1–13, 2013. doi:10.3389/fnins.2013.00267.
>
BIDS
> Gorgolewski, K.J., Auer, T., Calhoun, V.D., Craddock, R.C., Das, S., Duff, E.P., Flandin, G., Ghosh, S.S., Glatard, T., Halchenko, Y.O., Handwerker, D.A., Hanke, M., Keator, D., Li, X., Michael, Z., Maumet, C., Nichols, B.N., Nichols, T.E., Pellman, J., Poline, J.-B., Rokem, A., Schaefer, G., Sochat, V., Triplett, W., Turner, J.A., Varoquaux, G., Poldrack, R.A. (2016). The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data, 3 (160044). doi:10.1038/sdata.2016.44
>
>  Pernet, C. R., Appelhoff, S., Gorgolewski, K.J., Flandin, G., Phillips, C., Delorme, A., Oostenveld, R. (2019). EEG-BIDS, an extension to the brain imaging data structure for electroencephalography. Scientific data, 6 (103). doi:10.1038/s41597-019-0104-8
>
EEG Motor Movement/Imagery Dataset
> Schalk, G., McFarland, D.J., Hinterberger, T., Birbaumer, N., Wolpaw, J.R. BCI2000: A General-Purpose Brain-Computer Interface (BCI) System. IEEE Transactions on Biomedical Engineering 51(6):1034-1043, 2004.
>
PhysioNet
> Goldberger, A., Amaral, L., Glass, L., Hausdorff, J., Ivanov, P. C., Mark, R., ... & Stanley, H. E. (2000). PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals. Circulation [Online]. 101 (23), pp. e215–e220. RRID:SCR_007345.


-----

### Adapting code to iterate through all participants


The following block of code is intended to be run in one go. It uses the same code as explained earlier in the pipeline, but has no visualisations and includes some extra code that formats all of your participants' data into BIDS automatically, with minimal input.

Each participant's data are stored in their own folder within the root data directory. We're only going to be "BIDS-ifying" the first file/condition for each participant, so we need to iterate over all the folders (skipping any other files in the root directory) and then access the first EEG data file (e.g. '.edf') in each of them.
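`Path.iterdir()` returns entries in an arbitrary order, so "the first file" is only well defined once the list is sorted. Sorting the participant folders also keeps them in the same order as the rows of a spreadsheet listed by participant ID. Below is a standard-library sketch of this folder-and-file selection step. The helper `first_data_files` and the example layout are hypothetical.

```python
import tempfile
from pathlib import Path


def first_data_files(data_dir, file_suffix):
    """Map each participant folder name to its first (alphabetical) file with the given suffix."""
    first_files = {}
    # Only folders are considered, so files such as the metadata spreadsheet are skipped
    for participant_dir in sorted(p for p in Path(data_dir).iterdir() if p.is_dir()):
        files = sorted(f for f in participant_dir.iterdir() if f.suffix == file_suffix)
        if files:  # skip folders that contain no data files of this type
            first_files[participant_dir.name] = files[0]
    return first_files


with tempfile.TemporaryDirectory() as tmp:
    example_dir = Path(tmp)
    (example_dir / "MNE_BIDS_sheet.xlsx").touch()
    for relative in ["S001/S001R02.edf", "S001/S001R01.edf", "S002/S002R01.edf", "S002/notes.txt"]:
        file_path = example_dir / relative
        file_path.parent.mkdir(exist_ok=True)
        file_path.touch()
    found = {subject: path.name for subject, path in first_data_files(example_dir, ".edf").items()}

print(found)  # {'S001': 'S001R01.edf', 'S002': 'S002R01.edf'}
```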

To input your participants' information into this code, locate and edit the MNE_BIDS_sheet Excel document, stored within the same ['RosettaState'](https://github.com/ubdbra001/RosettaState/tree/MNE-BIDS-pipeline-post-feedback) repository as the current document, and save it in the root folder of your dataset (the code expects to find it at `data_dir / "MNE_BIDS_sheet.xlsx"`). First delete the example inputs, then insert the correct information for each participant under the related column name. This should be done for every sheet (subject_info, events, age). The spreadsheet includes formatting examples for each of the sheets.

There are some minor edits that you will need to conduct within the code to allow it to run with your dataset. Within the code, you should edit:
- The file path for the 'root_data_str' variable -> change the text in single quotation marks to the file path for the folder containing your entire dataset.
- The reading function assigned to `read_data` -> set the `mne.io.read_raw_` function to match your data type (e.g. `read_raw_eeglab`)
- The file suffix for your data type -> set the text in double quotation marks to your dataset's file type (e.g. ".edf")
- The name of the output folder for your BIDS dataset -> edit the text in double quotation marks, avoiding spaces
- The variable `orig_time` -> the origin time for the events; set this to `None` to use meas_date
- The inputs for each of the `raw.info` items (excluding subject_info) -> edit the second set of double quotation marks or swap out the integer (after either the colon or the equals sign):
    - "device_info"
    - "line_freq"
    - "description"
    - "dev_head_t"
    - "experimenter"
    - "bads"
- Your dataset's montage -> change the text in double quotation marks to match the name of the dataset's montage
- The task name -> set the text in double quotation marks to match your task name
- The code for updating JSON files for every participant -> change the suffix to match the JSON file name, and change 'entries' to the inputs you wish to edit
- The code for updating TSV files for every participant -> change the suffix to match the TSV file name, and add the code for your chosen edit type (add/remove/edit) from the explanation section above
- The dataset description entries -> edit the values for each key:value pair

> Note: each of the sections you must edit should be labelled [edit] in the in-code comments

#### Subject_info
Due to between-participant differences in inputs for the 'subject_info' entry, in order to automatically input the data, we must use a different method. 
We will use a spreadsheet, in which you must input entries for all of the subject_info variables (in columns) for every participant (one per row) in order, starting with the lowest participant id. The process also requires a 'for' loop, which will input information to the 'info' variable and create a BIDS formatted dataset for each participant's files.

The required variables are:
- id - The integer subject identifier
- his_id - The string subject identifier
- last_name - The participant's last name
- first_name - The participant's first name
- middle_name - The participant's middle name
- sex - The biological sex of the participant (0 = unknown, 1 = male, 2 = female)
- hand - Whether the participant is right handed (1), left handed (2) or ambidextrous (3)
- weight - Weight in kilograms
- height - Height in meters
- age - Age in years
- meas_date - The measurement date for the participant's data

#### Ages
To input participant age into the dataset, you should edit the age column of the 'subject_info' sheet to match the ages of all of the participants present, in subject order.

We have chosen to use age instead of birthdate, as it is more commonly collected. 
So, if your dataset uses birthdates instead of ages, you may find the code below useful for converting birthdates into ages at the date of measurement. Enter the measurement date, then enter your birthdates into the variable 'dob_list' in participant order, in the format `date(YYYY, M, D)`. This will output a list of ages that you can insert into the age column of the 'subject_info' sheet.

#### Code for changing birthdate to age

In [None]:
# Importing necessary tools
from datetime import date

# Setting the measurement (recording) date [edit]
# If your data are already loaded, you could use raw.info["meas_date"].date() instead
recording_date = date(2025, 6, 7)

# Function for calculating age, using date of birth and measurement date
def calculate_age(dob, recording_date):
    # Subtracts one year if the birthday had not yet been reached on the recording date
    return recording_date.year - dob.year - ((recording_date.month, recording_date.day) < (dob.month, dob.day))

# Create a list of the participants' birthdates by inputting them into the below brackets [edit]
dob_list = [date(2005, 3, 3), date(2000, 6, 8), date(1985, 2, 3)]

# Calculates and prints the age of each participant
for dob in dob_list:
    print(calculate_age(dob, recording_date))

#### Events 

To input event information into your dataset, you should edit the 'events' sheet within the excel spreadsheet. 

The information that will need inputting includes:
- Participant -> should match the participant id of the individual
- Description -> A short description of the event
- Duration -> the duration of the event in seconds
- Onset -> the onset time of the event in seconds (relative to origin time)
- Event_id -> the event's unique identifier number (to be defined by you)

> Note: The sheet may also include a Ch_names column (the exact name of the channel associated with the event). However, channel-specific events cannot currently be entered; all events are assigned to all channels.
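To show how rows of the 'events' sheet relate to each participant, here is a sketch that groups made-up rows by the Participant column into the onset, duration and description lists accepted by `mne.Annotations(onset=..., duration=..., description=...)`. The column names follow the list above. `group_events` and the example rows are hypothetical, and this plain-Python version is not the code used in the loop below.

```python
from collections import defaultdict

# Made-up rows, shaped like the 'events' sheet described above
events_rows = [
    {"Participant": "S001", "Description": "rest", "Duration": 4.0, "Onset": 0.0, "Event_id": 1},
    {"Participant": "S001", "Description": "left_hand", "Duration": 4.0, "Onset": 4.0, "Event_id": 2},
    {"Participant": "S002", "Description": "rest", "Duration": 4.0, "Onset": 1.5, "Event_id": 1},
]


def group_events(rows):
    """Group event rows by participant into onset, duration and description lists."""
    grouped = defaultdict(lambda: {"onset": [], "duration": [], "description": []})
    for row in rows:
        participant_events = grouped[row["Participant"]]
        participant_events["onset"].append(row["Onset"])
        participant_events["duration"].append(row["Duration"])
        participant_events["description"].append(row["Description"])
    return dict(grouped)


grouped_events = group_events(events_rows)
# One shared mapping from event description to event id
example_event_id = {row["Description"]: row["Event_id"] for row in events_rows}

print(grouped_events["S001"])
print(example_event_id)  # {'rest': 1, 'left_hand': 2}
```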


#### Setting missing values
In the spreadsheet, all columns must be filled for each participant. To indicate a missing value, any integer inputs (e.g. sex) can be changed to '0', while string (written) inputs (e.g. middle_name) can be changed to a consistent identifier, such as 'missing'. Note: id and his_id cannot be set as missing.

In [None]:
# Importing tools
import mne

from mne_bids import BIDSPath, write_raw_bids
from pathlib import Path
from mne_bids import make_dataset_description, update_sidecar_json

import pandas as pd

from datetime import datetime, timezone
from dateutil.relativedelta import relativedelta
from mne.transforms import Transform

# Sets the root directory of the dataset [edit]
root_data_str = r'C:\N8_internship_code\Motor_Imaging_Dataset'

# Setting overall file location
data_dir = Path(root_data_str)

# Set the right reading function for your specific data type [edit]
read_data = mne.io.read_raw_edf
# Set the suffix for your correct file format [edit]
file_suffix = ".edf"

# Making a sorted list of (only) participant folders from the 'data_dir' file path,
# so that the folder order matches the row order of the spreadsheet
folders = sorted(p_folder for p_folder in data_dir.iterdir() if p_folder.is_dir())

# Setting the file path to the excel file containing your metadata
meta_data_path = data_dir / "MNE_BIDS_sheet.xlsx"

# Opens the excel file and loads the 'subject_info' and 'events' sheets into data frames
with pd.ExcelFile(meta_data_path) as xls:
    subject_sheet = pd.read_excel(xls, 'subject_info')
    events_sheet = pd.read_excel(xls, 'events')

# Turns the 'subject_sheet' data frame into a list of dictionaries (one per participant), containing key:value pairs
# Keys = Excel columns, Values = The column's data input
subject_dict = subject_sheet.to_dict(orient='records')

# Creating the output folder location for the dataset [edit]
bids_root = data_dir.parent / "New_BIDS"

# Sets the origin time of the events [to use meas_date, change this to equal None] [edit]
orig_time = datetime(1975, 3, 4, tzinfo=timezone.utc)

# The id associated with each event (via its description), collected from the whole 'events' sheet
# The column names used below must match the headers of your 'events' sheet [edit]
event_id = dict(zip(events_sheet["Description"].astype(str), events_sheet["Event_id"].astype(int)))

# Looping through every participant folder and its index in the folders list, using enumerate
for index, participant_dir in enumerate(folders):

    # Selecting the first (alphabetical) data file in the participant's folder
    files = sorted(file for file in participant_dir.iterdir() if file.suffix == file_suffix)
    file = files[0]

    # Reading the current participant's data
    raw = read_data(file)
    # Collecting the folder name (also the participant id)
    file_name = participant_dir.name

    # Sets the measurement date of the participant's data using inputs from the subject_info sheet
    # (pd.Timestamp accepts both date strings and dates that Excel has already parsed)
    meas_date = pd.Timestamp(subject_sheet["meas_date"][index]).to_pydatetime().replace(tzinfo=timezone.utc)
    raw.set_meas_date(meas_date)

    # Creating annotations from only the current participant's events
    # (the 'Participant' column must match the participant folder names, e.g. S001)
    participant_events = events_sheet[events_sheet["Participant"].astype(str) == file_name]
    annotations = mne.Annotations(
        onset=participant_events["Onset"].tolist(),
        duration=participant_events["Duration"].tolist(),
        description=participant_events["Description"].astype(str).tolist(),
        orig_time=orig_time,
    )
    # Setting the annotations after the measurement date, so they are aligned to this participant's recording
    raw.set_annotations(annotations)
    # Generates events from the annotations
    events, _ = mne.events_from_annotations(raw, event_id=event_id)

    # Generates the (approximate) birthdate of the participant based on the measurement date and age
    age = int(subject_sheet["age"][index])
    birthday = (meas_date - relativedelta(years=age)).date()

    # Copies the current participant's spreadsheet entries without 'age' and 'meas_date', which are not subject_info fields
    subject_info = {key: value for key, value in subject_dict[index].items() if key not in ("age", "meas_date")}
    # Sets the 'subject_info' metadata for the current participant, including the calculated birthday
    raw.info["subject_info"] = {**subject_info, "birthday": birthday}

    # Setting the dataset's metadata manually (inputs will often be the same for all participants) [edit]
    raw.info["device_info"] = {
        "type": "EEG",
        "model": "12-channel EEG",
        "serial": "33456423",
    }
    raw.info["line_freq"] = 50
    raw.info["description"] = "a resting state dataset"
    raw.info["dev_head_t"] = Transform("meg", "head")
    raw.info["experimenter"] = "John Doe"
    raw.info["bads"] = ["T8..", "Oz.."]
    # Setting the montage [edit]
    my_montage = mne.channels.make_standard_montage("biosemi64")
    raw.set_montage(my_montage, match_case=True, match_alias=False, on_missing='ignore')

    # Setting participant id, task name should be inputted manually [edit]
    subject_id = file_name
    task = "task1"

    # Creating the BIDS path for the current participant
    bids_path = BIDSPath(subject=subject_id, task=task, datatype='eeg', root=bids_root)
    # Writes the raw data to the BIDS dataset, overwriting any existing files, using the EDF format [edit]
    write_raw_bids(raw, bids_path, events, event_id, overwrite=True, allow_preload=True, format="EDF")

    # Any global edits to a specific json file (e.g. sidecar file) can be done with the code below [edit the suffix and extension]
    # (.copy() leaves 'bids_path' itself unchanged)
    json_path = bids_path.copy().update(extension=".json", suffix='eeg')
    # Input any changes to the inputs [edit]
    entries = {'HeadCircumference': 58.0}
    # Write the changes to the file
    update_sidecar_json(json_path, entries, verbose=True)

    # Any global edits to a specific tsv file (e.g. channels file) can be done with the code below [edit the suffix and extension]
    tsv_path = bids_path.copy().update(extension=".tsv", suffix='channels').fpath
    # Open the tsv file as a data frame
    file_tsv = pd.read_csv(tsv_path, sep='\t')
    # Input the code for the desired change to the file (add/remove/edit) [edit]
    # e.g. file_tsv = file_tsv.drop(index=1)
    # Write the changes to the file
    file_tsv.to_csv(tsv_path, sep='\t', index=False, na_rep='n/a')

# You can also add a dataset description file [edit]
make_dataset_description(
    path=bids_root,
    name="EEG Motor Movement/Imagery Dataset",
    hed_version="1",
    dataset_type='raw',
    data_license="CC0",
    authors=["John Doe", "Jane Doe"],
    generated_by=[
        {
            "Name": "MNE-BIDS",
            "Version": "0.14",
            "Description": "Used to convert EEG data into BIDS format."
        },
        {
            "Name": "MNE-Python",
            "Version": "1.6.1",
            "Description": "Used for EEG preprocessing and analysis."
        }
    ],
    source_datasets=[
        {
            "URL": "https://example.com/source_dataset",
            "DOI": "10.1234/example.doi",
        }],
    acknowledgements=None,
    how_to_acknowledge="Cite (Doe et al., 2025) when using this dataset",
    funding=["The NHS", "The UK government"],
    ethics_approvals=["Ethical approval was granted by the University of ___ School of Psychology Ethics committee"],
    references_and_links=["https://mne.tools/mne-bids/stable/whats_new_previous_releases.html"],
    doi="doi:10.1016/j.tins.2017.02.004",
    overwrite=True,
    verbose=True)

Extracting EDF parameters from C:\N8_internship_code\Motor_Imaging_Dataset\S001\S001R01.edf...
EDF file detected


Setting channel info structure...
Creating raw.info structure...
Writing 'C:\N8_internship_code\New_BIDS\participants.tsv'...
Writing 'C:\N8_internship_code\New_BIDS\participants.json'...
Writing 'C:/N8_internship_code/New_BIDS/sub-S001/eeg/sub-S001_space-CapTrak_electrodes.tsv'...
Writing 'C:/N8_internship_code/New_BIDS/sub-S001/eeg/sub-S001_space-CapTrak_coordsystem.json'...
Writing 'C:\N8_internship_code\New_BIDS\dataset_description.json'...
Writing 'C:\N8_internship_code\New_BIDS\sub-S001\eeg\sub-S001_task-task1_eeg.json'...
Writing 'C:\N8_internship_code\New_BIDS\sub-S001\eeg\sub-S001_task-task1_channels.tsv'...
Copying data files to sub-S001_task-task1_eeg.edf
Reading 0 ... 9759  =      0.000 ...    60.994 secs...
Writing 'C:\N8_internship_code\New_BIDS\sub-S001\sub-S001_scans.tsv'...
Wrote C:\N8_internship_code\New_BIDS\sub-S001\sub-S001_scans.tsv entry with eeg\sub-S001_task-task1_eeg.edf.
Writing 'C:\N8_internship_code\New_BIDS\sub-S001\eeg\sub-S001_task-task1_eeg.json'...

This dataset should also cite [MNE-BIDS](#cite), and would benefit from being checked using the [BIDS validator](#validator).