## Import `fptools`
The first thing to do is import `fptools` as well as some other utilities.

- `fptools.io` contains data loading functionality via `load_data()`, and common data classes such as `Session`, `SessionCollection`, `Signal`, etc.
- `fptools.io.test` contains functions to download a test data set.
- `fptools.preprocess.pipelines` contains some prebuilt preprocessing pipelines.


In [1]:
import os

from fptools.io import load_data, Session, SessionCollection
from fptools.preprocess.pipelines import LowpassDFFPipeline
from fptools.io.test import download_test_data, list_datasets

## Let's download some test/demo data
The following code will download a demo dataset to your local computer, if it does not already exist. The variable `test_data_path` will contain the path to the root of the test dataset folder. If the parameter `dest` is `None`, the data will be downloaded to a location within the current user's home directory.

Use the function `list_datasets()` to list the names of the available datasets within the demo data.

In [2]:
test_data_path = download_test_data(dest=None)
list_datasets()

Test data appears to already be in place at "C:\Users\thackray\fptools_test_data".


['MA-PR4-4Day', 'TDT-DLS-GRABDA2m-Male-PR4-2Day']

## Time to load the data

The `load_data()` function from `fptools.io` is your main gateway to loading data in a variety of formats. Shown below is an example of loading data from a TDT tank. See the comments below for a brief explanation of the parameters for `load_data()`.

The `load_data()` function returns a `SessionCollection`, which is really a `list` with extra functionality built in. This means you can largely use it any way you would use a normal Python list.

For this example, we will load fiber photometry data collected using a TDT system. Data was collected from the DLS using the GRABDA2m sensor in male mice (~2mo) in a progressive ratio task across two days. For brevity, only the first 10 minutes of each session is included in this dataset.

In [6]:
# `tank_path` is the path to the dataset we want to load
tank_path = os.path.join(test_data_path, "TDT-DLS-GRABDA2m-Male-PR4-2Day")

rename_map = {
    "signals": {
        '_465A': 'Dopamine',
        '_415A': 'Isosbestic',
    },
    "epocs": {
        "RNP_": "RNP",
        "RMG_": "RMG",
        "URM_": "URM",
    }
}

sessions = load_data(tank_path,

                     # path to a manifest file with additional metadata to inject into sessions
                     # `manifest_index` tells the system how to key the manifest and associate it with the data
                     # for TDT data, sessions are named after the block name.
                     manifest_path=os.path.join(tank_path, 'manifest.xlsx'),
                     manifest_index='blockname',

                     # use multiple worker processes to load data in parallel
                     max_workers=4,

                     # The locator tells the system what type of data we are looking for.
                     # more complex functionality is available, but most folks could use the string "tdt" for TDT tank files
                     # or the string "ma" for med-associates style files.
                     locator="tdt",

                     # a function to run any preprocessing needed (see library of pre-made pipelines in `fptools.preprocess.pipelines`)
                     preprocess=LowpassDFFPipeline(
                         signals=['_465A', '_415A'],
                         rename_map=rename_map,
                         trim_extent="auto",
                         downsample=10,
                         plot=True,
                         plot_dir=os.path.join(tank_path, 'cache_lowpass_dff'),
                     ),

                     # control if we want to cache files, and where that cache should live
                     cache=True,
                     cache_dir=os.path.join(tank_path, 'cache_lowpass_dff'),
                     )

  0%|          | 0/16 [00:00<?, ?it/s]
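
Because a `SessionCollection` behaves largely like a list, the usual list idioms should carry over. The sketch below uses a plain Python list of stand-in objects so that it runs without any data; `DemoSession` is a hypothetical placeholder, not an `fptools` class. With real data, the same indexing, slicing, and iteration would be applied to `sessions`. Whether slicing returns another `SessionCollection` or a plain list is not shown here.

```python
from dataclasses import dataclass, field


@dataclass
class DemoSession:
    """Hypothetical stand-in for an fptools Session (illustration only)."""
    name: str
    metadata: dict = field(default_factory=dict)


# a plain list standing in for the SessionCollection returned by load_data()
demo_sessions = [
    DemoSession("EN140_PRD1-240813-133324", {"mouseID": "EN140", "paradigm_day": 1}),
    DemoSession("EN157_PRD2-240815-122504", {"mouseID": "EN157", "paradigm_day": 2}),
    DemoSession("EN140_PRD2-240814-140637", {"mouseID": "EN140", "paradigm_day": 2}),
]

n_sessions = len(demo_sessions)   # number of sessions
first = demo_sessions[0]          # integer indexing
first_two = demo_sessions[:2]     # slicing
mouse_ids = [s.metadata["mouseID"] for s in demo_sessions]  # iteration

print(n_sessions, first.name, mouse_ids)
```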

## Get an overview of `SessionCollection` contents

The `SessionCollection.describe()` method prints a summary of the datasets contained in the collection. The number in parentheses next to each dataset name is the number of sessions that contain a dataset with that name.

In [7]:
sessions.describe()

Number of sessions: 16

Signals present in data with counts:
(16) "Fi1d"
(16) "Fi1r"
(16) "Dopamine"
(16) "Isosbestic"
(16) "Dopamine_lowpass"
(16) "Isosbestic_lowpass"

Epocs present in data with counts:
(16) "Cam1"
(16) "P1SC"
(16) "UnNP"
(16) "Nose"
(16) "Tick"
(16) "RNP"
(16) "RMG"
(16) "URM"

Scalars present in data with counts:
(16) "Fi1i"
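
The counts in parentheses can be read as a tally of how many sessions contain each name. A minimal sketch of that idea follows, using sets of names as stand-ins for sessions; it illustrates the tally only and is not how `fptools` implements `describe()`.

```python
from collections import Counter

# each stand-in "session" is just the set of signal names it contains
demo_signal_names = [
    {"Dopamine", "Isosbestic"},
    {"Dopamine", "Isosbestic"},
    {"Dopamine"},  # hypothetical session missing the isosbestic channel
]

# count, for every name, the number of sessions in which it appears
counts = Counter(name for names in demo_signal_names for name in names)

for name, n in sorted(counts.items()):
    print(f'({n}) "{name}"')
```

A count lower than the total number of sessions is a quick way to spot sessions that are missing a dataset.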




## Get an overview of `Session` contents

Similar to `SessionCollection.describe()`, there is also a `Session.describe()`. The information here is a bit more detailed than at the collection level. Where possible, we provide summary statistics for each dataset.

In [8]:
sessions[0].describe()

Session with name "EN140_PRD1-240813-133324"

Metadata:
    tankpath: C:\Users\thackray\fptools_test_data\TDT-DLS-GRABDA2m-Male-PR4-2Day
    blockname: EN140_PRD1-240813-133324
    start_date: 2024-08-13 13:33:28.999999
    utc_start_time: 13:33:28
    stop_date: 2024-08-13 14:33:28.101190
    utc_stop_time: 14:33:28
    duration: 0:59:59.101191
    stream_channel: 0
    snip_channel: 0
    mouseID: EN140
    sex: M
    genotype: HET
    sensor: GRABDA2M
    hemisphere: L
    paradigm: PR4
    paradigm_day: 1
    cable in?: True
    cube: TDT1
    exclude: False
    notes: nan

Epocs:
    Cam1:
        num_events = (11999,)
        avg_rate = 0:00:00.077988
        earliest = 0:00:00.086999
        latest = 0:09:59.986012
    P1SC:
        num_events = (20,)
        avg_rate = 0:00:03.934781
        earliest = 0:00:01.126072
        latest = 0:08:28.837396
    UnNP:
        num_events = (97,)
        avg_rate = 0:00:10.214932
        earliest = 0:00:32.466371
        latest = 0:09:58.7
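
Some of these per-epoc values are straightforward to derive from a list of event timestamps. The sketch below assumes event times are stored in seconds from session start and formats them as `timedelta` values, as in the output above. The timestamps are made up for illustration, and `avg_rate` is deliberately left out because its exact definition is not given here.

```python
from datetime import timedelta

# hypothetical event onset times, in seconds from session start
event_times = [1.126072, 32.5, 250.0, 508.837396]

num_events = len(event_times)
earliest = timedelta(seconds=min(event_times))
latest = timedelta(seconds=max(event_times))

print(f"num_events = {num_events}")
print(f"earliest = {earliest}")
print(f"latest = {latest}")
```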

## Fetch metadata across sessions as a DataFrame

Use the `SessionCollection.metadata` property to retrieve a `pandas.DataFrame` containing metadata across sessions. *Note*: changing values in this dataframe will not have any effect on the underlying data. Instead, make these changes on the underlying `Session.metadata` property.

In [9]:
sessions.metadata

| | tankpath | blockname | start_date | utc_start_time | stop_date | utc_stop_time | duration | stream_channel | snip_channel | mouseID | sex | genotype | sensor | hemisphere | paradigm | paradigm_day | cable in? | cube | exclude | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | C:\Users\thackray\fptools_test_data\TDT-DLS-GR... | EN140_PRD1-240813-133324 | 2024-08-13 13:33:28.999999 | 13:33:28 | 2024-08-13 14:33:28.101190 | 14:33:28 | 0 days 00:59:59.101191 | 0 | 0 | EN140 | M | HET | GRABDA2M | L | PR4 | 1 | True | TDT1 | False | |
| 1 | C:\Users\thackray\fptools_test_data\TDT-DLS-GR... | EN157_PRD2-240815-122504 | 2024-08-15 12:25:08.999999 | 12:25:08 | 2024-08-15 13:25:08.098733 | 13:25:08 | 0 days 00:59:59.098734 | 0 | 0 | EN157 | M | WTY | GRABDA2M | R | PR4 | 2 | True | TDT1 | False | |
| 2 | C:\Users\thackray\fptools_test_data\TDT-DLS-GR... | EN140_PRD2-240814-140637 | 2024-08-14 14:06:42.999999 | 14:06:42 | 2024-08-14 15:06:42.194252 | 15:06:42 | 0 days 00:59:59.194253 | 0 | 0 | EN140 | M | HET | GRABDA2M | L | PR4 | 2 | True | TDT1 | False | |
| 3 | C:\Users\thackray\fptools_test_data\TDT-DLS-GR... | EN157_PRD1-240814-125543 | 2024-08-14 12:55:48.999999 | 12:55:48 | 2024-08-14 13:55:48.199822 | 13:55:48 | 0 days 00:59:59.199823 | 0 | 0 | EN157 | M | WTY | GRABDA2M | R | PR4 | 1 | True | TDT1 | False | |
| 4 | C:\Users\thackray\fptools_test_data\TDT-DLS-GR... | EN165_PRD1-240818-150154 | 2024-08-18 15:01:59.999999 | 15:01:59 | 2024-08-18 16:01:59.078417 | 16:01:59 | 0 days 00:59:59.078418 | 0 | 0 | EN165 | M | WTY | GRABDA2M | R | PR4 | 1 | True | TDT2 | False | |
| 5 | C:\Users\thackray\fptools_test_data\TDT-DLS-GR... | EN171_PRD1-240815-144154 | 2024-08-15 14:41:59.999999 | 14:41:59 | 2024-08-15 15:41:59.085298 | 15:41:59 | 0 days 00:59:59.085299 | 0 | 0 | EN171 | M | HET | GRABDA2M | L | PR4 | 1 | True | TDT2 | False | |
| 6 | C:\Users\thackray\fptools_test_data\TDT-DLS-GR... | EN165_PRD2-240819-144305 | 2024-08-19 14:43:10.999999 | 14:43:10 | 2024-08-19 15:43:10.081366 | 15:43:10 | 0 days 00:59:59.081367 | 0 | 0 | EN165 | M | WTY | GRABDA2M | R | PR4 | 2 | True | TDT2 | False | |
| 7 | C:\Users\thackray\fptools_test_data\TDT-DLS-GR... | EN171_PRD2-240816-125119 | 2024-08-16 12:51:24.999999 | 12:51:24 | 2024-08-16 13:51:24.078089 | 13:51:24 | 0 days 00:59:59.078090 | 0 | 0 | EN171 | M | HET | GRABDA2M | L | PR4 | 2 | True | TDT2 | False | |
| 8 | C:\Users\thackray\fptools_test_data\TDT-DLS-GR... | EN172_PRD1-240814-104140 | 2024-08-14 10:41:45.999999 | 10:41:45 | 2024-08-14 11:41:45.094145 | 11:41:45 | 0 days 00:59:59.094146 | 0 | 0 | EN172 | M | WTY | GRABDA2M | R | PR4 | 1 | True | TDT2 | False | |
| 9 | C:\Users\thackray\fptools_test_data\TDT-DLS-GR... | EN172_PRD2-240815-111039 | 2024-08-15 11:10:44.999999 | 11:10:44 | 2024-08-15 12:10:44.231935 | 12:10:44 | 0 days 00:59:59.231936 | 0 | 0 | EN172 | M | WTY | GRABDA2M | R | PR4 | 2 | True | TDT2 | False | |
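
The note about edits not propagating is worth illustrating: a table assembled from per-session metadata is a copy, so changes to it do not flow back to the sessions. The sketch below uses plain dictionaries as stand-ins for `Session.metadata` and a list of row dictionaries in place of the DataFrame. It mirrors the behavior described, not the `fptools` internals.

```python
# stand-ins for the `metadata` dict carried by each Session
session_metadata = [
    {"mouseID": "EN140", "genotype": "HET", "exclude": False},
    {"mouseID": "EN157", "genotype": "WTY", "exclude": False},
]

# building a table copies the values out of each session
table = [dict(meta) for meta in session_metadata]

# editing the table leaves the per-session metadata untouched
table[0]["exclude"] = True
after_table_edit = session_metadata[0]["exclude"]

# to make a change stick, edit the session's own metadata, then rebuild the table
session_metadata[0]["exclude"] = True
table = [dict(meta) for meta in session_metadata]
after_session_edit = table[0]["exclude"]

print(after_table_edit, after_session_edit)
```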


## Renaming
Renaming is useful because these names are often used by plotting functions for legends, labels, etc., so well-chosen names lead to more readable plots. It may also make your code easier to reason about, since the names are often passed as parameters to several functions.

- Rename epocs across sessions using the `rename_epoc()` method
- Rename signals across sessions using the `rename_signal()` method
- Rename scalars across sessions using the `rename_scalar()` method


In [10]:
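# Here, we rename a few epocs from their TDT names, which contain underscores,
# to cleaner versions of the epoc names.
# Note: the `rename_map` passed to the pipeline above appears to have already applied
# these renames, so running this cell afterwards raises the KeyError shown below.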
sessions.rename_epoc('RNP_', 'RNP')
sessions.rename_epoc('RMG_', 'RMG')
sessions.rename_epoc('URM_', 'URM')

KeyError: 'Key `RNP` already exists in data!'
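
To make a renaming cell safe to re-run, one option is to rename only when the old name is present and the new name is not. The helper below, `rename_if_needed`, is hypothetical and operates on a plain dictionary standing in for a session's epocs; it is a sketch of the guard, not part of `fptools`.

```python
def rename_if_needed(data: dict, old: str, new: str) -> bool:
    """Rename key `old` to `new`; return True if a rename happened.

    Hypothetical helper for illustration; `data` stands in for a
    session's epocs, keyed by name.
    """
    if old not in data:
        return False  # nothing to rename (perhaps already renamed)
    if new in data:
        raise KeyError(f"Key `{new}` already exists in data!")
    data[new] = data.pop(old)
    return True


# made-up event times keyed by the raw TDT epoc names
epocs = {"RNP_": [12.5, 48.0], "RMG_": [13.1, 49.2]}

first_pass = rename_if_needed(epocs, "RNP_", "RNP")   # performs the rename
second_pass = rename_if_needed(epocs, "RNP_", "RNP")  # no-op when re-run
print(sorted(epocs), first_pass, second_pass)
```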

## Filtering
Filter session collections using the `filter()` method. This takes a predicate, i.e. a callable that accepts a `Session` and returns a boolean: `True` to include the session, `False` to exclude it.

In [11]:
# here we define a suitable predicate function
def is_day1(session: Session) -> bool:
    return session.metadata['paradigm_day'] == 1

# and then apply the predicate
day1_sessions = sessions.filter(is_day1)
print(f"# day 1 sessions: {len(day1_sessions)}")


# another way to specify the predicate is via a lambda function
day2_sessions = sessions.filter(lambda session: session.metadata['paradigm_day'] == 2)
print(f"# day 2 sessions: {len(day2_sessions)}")


# day 1 sessions: 8
# day 2 sessions: 8
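
Predicates can also be built from reusable pieces, for example to combine several metadata criteria. In the sketch below, `metadata_equals` is a hypothetical helper and `SimpleNamespace` objects stand in for `Session`s. Python's built-in `filter()` is used so the example runs on its own; with real data, the resulting predicate would be passed to `sessions.filter(...)` instead.

```python
from types import SimpleNamespace
from typing import Any, Callable


def metadata_equals(**criteria: Any) -> Callable[[Any], bool]:
    """Build a predicate that is True when every given metadata key matches."""
    def predicate(session: Any) -> bool:
        return all(session.metadata.get(k) == v for k, v in criteria.items())
    return predicate


# stand-in sessions carrying only a metadata dict
demo_sessions = [
    SimpleNamespace(metadata={"mouseID": "EN140", "genotype": "HET", "paradigm_day": 1}),
    SimpleNamespace(metadata={"mouseID": "EN157", "genotype": "WTY", "paradigm_day": 1}),
    SimpleNamespace(metadata={"mouseID": "EN140", "genotype": "HET", "paradigm_day": 2}),
]

is_het_day1 = metadata_equals(genotype="HET", paradigm_day=1)
het_day1 = list(filter(is_het_day1, demo_sessions))
print(len(het_day1))
```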


## Apply a function across all sessions
Often you may want to apply an arbitrary function across Sessions. For this, use the `apply()` method on `SessionCollection`.

In [12]:
# define a function that accepts a `Session` parameter, and returns `None`
# within, do what you please...
def add_rewards_earned_to_metadata(session: Session) -> None:
    '''Count the number of rewards earned and add this data to the session metadata

    Args:
        session: the session to interrogate and decorate with additional metadata
    '''
    rewards_earned = session.epocs['RNP'].shape[0]
    session.metadata['rewards_earned'] = rewards_earned
    print(f'Animal {session.metadata["mouseID"]} earned {rewards_earned} rewards')

# now apply the function
sessions.apply(add_rewards_earned_to_metadata)

# we can check for a side effect of this function,
# in this case this would be a new column in metadata
print("metadata columns: ", sessions.metadata_keys)

Animal EN140 earned 7 rewards
Animal EN157 earned 5 rewards
Animal EN140 earned 8 rewards
Animal EN157 earned 6 rewards
Animal EN165 earned 6 rewards
Animal EN171 earned 5 rewards
Animal EN165 earned 7 rewards
Animal EN171 earned 6 rewards
Animal EN172 earned 5 rewards
Animal EN172 earned 8 rewards
Animal EN175 earned 6 rewards
Animal EN175 earned 7 rewards
Animal EN181 earned 5 rewards
Animal EN181 earned 8 rewards
Animal EN182 earned 7 rewards
Animal EN182 earned 6 rewards
metadata columns:  ['sex', 'snip_channel', 'hemisphere', 'paradigm', 'duration', 'start_date', 'cube', 'utc_start_time', 'genotype', 'stream_channel', 'notes', 'blockname', 'tankpath', 'rewards_earned', 'paradigm_day', 'exclude', 'mouseID', 'sensor', 'stop_date', 'utc_stop_time', 'cable in?']
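
Once a value such as `rewards_earned` lives in the metadata, it can be aggregated like any other field. The sketch below totals rewards per animal, using plain dictionaries as stand-ins for session metadata and the first four values printed above.

```python
from collections import defaultdict

# stand-ins for Session.metadata after the apply() call
demo_metadata = [
    {"mouseID": "EN140", "rewards_earned": 7},
    {"mouseID": "EN157", "rewards_earned": 5},
    {"mouseID": "EN140", "rewards_earned": 8},
    {"mouseID": "EN157", "rewards_earned": 6},
]

# sum rewards across sessions for each animal
rewards_per_animal = defaultdict(int)
for meta in demo_metadata:
    rewards_per_animal[meta["mouseID"]] += meta["rewards_earned"]

print(dict(rewards_per_animal))
```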
