# FlowKit Tutorial - Part 5 - The Session Class

https://flowkit.readthedocs.io/en/latest/?badge=latest

In this last part of the tutorial series we will cover the `Session` class. The `Session` class is conceptually similar to a FlowJo workspace, combining multiple `Sample` instances and multiple `GatingStrategy` instances. Samples can be assigned to one or more sample groups, and each sample group contains a `GatingStrategy` template that can be customized per `Sample`. In this notebook we will use everything we've learned from the previous tutorials. The good news is there are no new classes or modules to learn!

If you have any questions about FlowKit, find any bugs, or feel something is missing from these tutorials [please submit an issue to the GitHub repository here](https://github.com/whitews/FlowKit/issues/new/).

In [1]:
import os
import bokeh
from bokeh.plotting import show
import matplotlib.pyplot as plt

import flowkit as fk

bokeh.io.output_notebook()
%matplotlib inline

_ = plt.ioff()

In [2]:
# check version so users can verify they have the same version/API
fk.__version__

'0.9.0'

## `Session` Class

The `Session` class is intended as the main interface in FlowKit for complex flow cytometry analysis. A `Session` represents a collection of gating strategies and FCS samples. FCS samples are added and assigned to sample groups, and each sample group has a single gating strategy template. The gates in a template can be customized per sample; however, the gate hierarchy must remain the same. Unlike the `GatingStrategy` class, which does not retain any `Sample` instances, the `Session` class stores the `Sample` instances that have been loaded. It also stores the results of applying the gating strategy to those samples.

Let's have a look at the constructor:

    Session(
        fcs_samples=None, 
        subsample=10000
    )

The argument `fcs_samples` may be a string or a list. If given a string, it can be a directory path or a file path. If a directory, any .fcs files in the directory will be loaded. If a list, then it must be a list of file paths or a list of `Sample` instances. Lists of mixed types are not supported.
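
The rules above for the path-based inputs can be sketched in plain Python. This is only an illustration of the described behaviour, not FlowKit's loader: `resolve_fcs_paths` is a hypothetical helper, the case-insensitive `.fcs` match is an assumption, and lists of `Sample` instances (which the real class also accepts) are left out.

```python
import os
import tempfile


def resolve_fcs_paths(fcs_samples):
    """Hypothetical helper: expand a directory, a file path, or a list of paths."""
    if isinstance(fcs_samples, str):
        if os.path.isdir(fcs_samples):
            # a directory: collect every .fcs file it contains
            # (assumption: the extension match is case-insensitive)
            return sorted(
                os.path.join(fcs_samples, name)
                for name in os.listdir(fcs_samples)
                if name.lower().endswith(".fcs")
            )
        # otherwise treat the string as a single file path
        return [fcs_samples]
    if isinstance(fcs_samples, list):
        if not all(isinstance(item, str) for item in fcs_samples):
            # this sketch only handles lists made up entirely of paths
            raise TypeError("expected a list containing only file paths")
        return list(fcs_samples)
    raise TypeError("fcs_samples must be a string or a list")


# demonstrate with a throwaway directory holding two FCS files and one other file
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("a.fcs", "b.fcs", "notes.txt"):
        open(os.path.join(tmp_dir, name), "w").close()
    found = [os.path.basename(p) for p in resolve_fcs_paths(tmp_dir)]

print(found)
```

Only the two `.fcs` files are picked up from the directory; the text file is ignored.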

The `subsample` argument sets the sub-sample count for loaded `Sample` instances. The sub-sample is only used by the plotting functions; any `Session` methods performing pre-processing and/or gating will always use all the events within each `Sample`.
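
To make the distinction concrete, here is a minimal sketch of the idea, not FlowKit's implementation. `pick_subsample_indices` is a hypothetical helper, the event count is made up, and capping the request at the number of available events is an assumption.

```python
import random


def pick_subsample_indices(event_count, subsample=10000, seed=1):
    """Hypothetical helper: choose which event indices a plot would draw."""
    # assumption: never request more events than the sample contains
    n = min(subsample, event_count)
    rng = random.Random(seed)
    return sorted(rng.sample(range(event_count), n))


event_count = 50_000  # made-up event count for illustration
plot_indices = pick_subsample_indices(event_count)

# a plot would draw only the sub-sampled events...
print(len(plot_indices))
# ...while gating statistics would still be computed over every event
print(event_count)
```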

Many of the methods in the `Session` class are similar to those found in the `GatingStrategy` class, with the addition of an extra `group_name` argument to specify the sample group. Other methods will either add or retrieve objects, and there are a few methods for plotting gated events.

Finally, a `Session` supports importing GatingML-2.0 documents to create a new sample group, as well as importing FlowJo 10 workspace files. For FlowJo WSP files, any sample group within the FlowJo file will be added as a sample group to the `Session`. However, custom FCS sample gates in FlowJo are not currently supported; support is planned for a future release of FlowKit.

Let's jump in and load some files and a FlowJo workspace. We'll then review the imported data and analyze the files.

In [3]:
help(fk.Session)

Help on class Session in module flowkit._models.session:

class Session(builtins.object)
 |  Session(fcs_samples=None, subsample=10000)
 |  
 |  The Session class is intended as the main interface in FlowKit for complex flow cytometry analysis.
 |  A Session represents a collection of gating strategies and FCS samples. FCS samples are added and assigned to sample
 |  groups, and each sample group has a single gating strategy template. The gates in a template can be customized
 |  per sample.
 |  
 |  :param fcs_samples: str or list. If given a string, it can be a directory path or a file path.
 |          If a directory, any .fcs files in the directory will be loaded. If a list, then it must
 |          be a list of file paths or a list of Sample instances. Lists of mixed types are not
 |          supported.
 |  :param subsample: Number of events to use as a sub-sample. If the number of
 |      events in the Sample is less than the requested sub-sample count, then the
 |      maximum n

In [4]:
# setup some file paths for our data
base_dir = "data/8_color_data_set"

sample_path = os.path.join(base_dir, "fcs_files")
wsp_path = os.path.join(base_dir, "8_color_ICS.wsp")

In [5]:
# Create a Session with the path to our FCS files. 
# Alternatively, FCS files can be added using the 'add_samples' method
session = fk.Session(sample_path)

# import a FlowJo 10 workspace file
session.import_flowjo_workspace(wsp_path)

In [6]:
# look at a summary of the session
session.summary()

group_name,samples,loaded_samples,gates,max_gate_depth
default,3,3,0,0
All Samples,3,3,0,0
DEN,3,3,14,6


In [7]:
# get a list of sample groups
session.get_sample_groups()

['default', 'All Samples', 'DEN']

In [8]:
# From the summary, we can see all the "real" analysis is within the "DEN" group
sample_group = 'DEN'

In [9]:
# get the sample IDs that are included in the group
sample_list = session.get_group_sample_ids(sample_group)

In [10]:
sample_list

['101_DEN084Y5_15_E01_008_clean.fcs',
 '101_DEN084Y5_15_E05_010_clean.fcs',
 '101_DEN084Y5_15_E03_009_clean.fcs']

In [11]:
# so what is the gating hierarchy for this group?
print(session.get_gate_hierarchy(sample_group))

root
╰── Time
    ╰── Singlets
        ╰── aAmine-
            ╰── CD3+
                ├── CD4+
                │   ├── CD107a+
                │   ├── IFNg+
                │   ├── IL2+
                │   ╰── TNFa+
                ╰── CD8+
                    ├── CD107a+
                    ├── IFNg+
                    ├── IL2+
                    ╰── TNFa+


In [12]:
# looks good, let's go ahead and analyze them (using verbose mode to see each gate as it's processed)
session.analyze_samples(sample_group, verbose=True)

#### Processing gates for 3 samples (multiprocessing is enabled - 3 cpus) ####

101_DEN084Y5_15_E01_008_clean.fcs: processing gate Time
101_DEN084Y5_15_E01_008_clean.fcs: processing gate Singlets
101_DEN084Y5_15_E01_008_clean.fcs: processing gate aAmine-
101_DEN084Y5_15_E05_010_clean.fcs: processing gate Time
101_DEN084Y5_15_E05_010_clean.fcs: processing gate Singlets
101_DEN084Y5_15_E05_010_clean.fcs: processing gate aAmine-
101_DEN084Y5_15_E03_009_clean.fcs: processing gate Time
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD3+
101_DEN084Y5_15_E03_009_clean.fcs: processing gate Singlets
101_DEN084Y5_15_E03_009_clean.fcs: processing gate aAmine-
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD4+
101_DEN084Y5_15_E05_010_clean.fcs: processing gate CD3+
101_DEN084Y5_15_E03_009_clean.fcs: processing gate CD3+
101_DEN084Y5_15_E05_010_clean.fcs: processing gate CD4+
101_DEN084Y5_15_E01_008_clean.fcs: processing gate CD8+
101_DEN084Y5_15_E03_009_clean.fcs: processing gate CD4+
101_

In [13]:
# and a look at the results
session.get_group_report(sample_group)

,sample,gate_path,gate_name,gate_type,quadrant_parent,parent,count,absolute_percent,relative_percent,level
0,101_DEN084Y5_15_E01_008_clean.fcs,"(root,)",Time,RectangleGate,,,290166,99.997932,99.997932,1
1,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time)",Singlets,PolygonGate,,Time,239001,82.365287,82.36699,2
2,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets)",aAmine-,PolygonGate,,Singlets,164655,56.743931,68.893017,3
3,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-)",CD3+,PolygonGate,,aAmine-,133670,46.065782,81.181865,4
4,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-, CD3+)",CD4+,PolygonGate,,CD3+,82484,28.425899,61.707189,5
5,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-, CD3+)",CD8+,PolygonGate,,CD3+,47165,16.254153,35.284656,5
6,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-, CD3+, CD4+)",CD107a+,RectangleGate,,CD4+,68,0.023434,0.08244,6
10,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-, CD3+, CD8+)",CD107a+,RectangleGate,,CD8+,73,0.025157,0.154776,6
7,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-, CD3+, CD4+)",IFNg+,RectangleGate,,CD4+,4,0.001378,0.004849,6
11,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-, CD3+, CD8+)",IFNg+,RectangleGate,,CD8+,2,0.000689,0.00424,6
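
The two percentage columns can be checked by hand. The numbers in the report are consistent with `absolute_percent` being the gate count relative to all events in the sample and `relative_percent` being the count relative to the parent gate. The sketch below recomputes both for the Singlets row; the total event count is not shown in the report, so it is inferred from the Time row, which is an assumption.

```python
def percent(count, denominator):
    """Return count as a percentage of denominator."""
    return 100.0 * count / denominator


# counts copied from the report rows above
time_count = 290166
singlets_count = 239001

# assumption: total event count inferred from the Time row (count / absolute_percent)
total_events = round(time_count / (99.997932 / 100))

absolute_percent = percent(singlets_count, total_events)  # relative to all events
relative_percent = percent(singlets_count, time_count)    # relative to the parent gate

print(total_events)
print(round(absolute_percent, 6), round(relative_percent, 6))
```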


In [14]:
# what if we want to review the gates for a sample
sample_id = '101_DEN084Y5_15_E01_008_clean.fcs'
sample_results = session.get_gating_results(sample_group, sample_id)
sample_results.report

,sample,gate_path,gate_name,gate_type,quadrant_parent,parent,count,absolute_percent,relative_percent,level
0,101_DEN084Y5_15_E01_008_clean.fcs,"(root,)",Time,RectangleGate,,,290166,99.997932,99.997932,1
1,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time)",Singlets,PolygonGate,,Time,239001,82.365287,82.36699,2
2,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets)",aAmine-,PolygonGate,,Singlets,164655,56.743931,68.893017,3
3,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-)",CD3+,PolygonGate,,aAmine-,133670,46.065782,81.181865,4
4,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-, CD3+)",CD4+,PolygonGate,,CD3+,82484,28.425899,61.707189,5
5,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-, CD3+)",CD8+,PolygonGate,,CD3+,47165,16.254153,35.284656,5
6,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-, CD3+, CD4+)",CD107a+,RectangleGate,,CD4+,68,0.023434,0.08244,6
10,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-, CD3+, CD8+)",CD107a+,RectangleGate,,CD8+,73,0.025157,0.154776,6
7,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-, CD3+, CD4+)",IFNg+,RectangleGate,,CD4+,4,0.001378,0.004849,6
11,101_DEN084Y5_15_E01_008_clean.fcs,"(root, Time, Singlets, aAmine-, CD3+, CD8+)",IFNg+,RectangleGate,,CD8+,2,0.000689,0.00424,6


In [15]:
# plot the gates for a sample
for i, row in sample_results.report.iterrows():    
    p = session.plot_gate(
        sample_group, 
        row['sample'],  # 'sample' is also a pandas method name, so use bracket lookup
        gate_name=row.gate_name,
        gate_path=row.gate_path,
        x_min=0, 
        x_max=1.2, 
        y_min=0, 
        y_max=1.2
    )
    show(p)

### Extract gated event data

In [16]:
results = session.get_wsp_gated_events(group_name=sample_group, sample_ids=sample_list)

In [17]:
len(results)

3

In [18]:
# The gated event results are a list of DataFrames (in the order of the given sample_list)
# Rows are the individual events
# Columns are the channels (plus a sample_group & sample_id column)
results[0]

,sample_group,sample_id,FSC-A,FSC-H,FSC-W,SSC-A,SSC-H,SSC-W,TNFa FITC FLR-A,CD8 PerCP-Cy55 FLR-A,IL2 BV421 FLR-A,Aqua Amine FLR-A,IFNg APC FLR-A,CD3 APC-H7 FLR-A,CD107a PE FLR-A,CD4 PE-Cy7 FLR-A,Time
0,DEN,101_DEN084Y5_15_E01_008_clean.fcs,0.669193,0.550243,0.304044,0.145722,0.136929,0.266054,0.246065,0.297479,0.280577,0.248555,0.255891,0.532903,0.300733,0.566413,0.035940
1,DEN,101_DEN084Y5_15_E01_008_clean.fcs,0.470615,0.405136,0.290405,0.200456,0.187286,0.267580,0.243213,0.402105,0.281486,0.245255,0.249686,0.212227,0.278771,0.244356,0.035983
2,DEN,101_DEN084Y5_15_E01_008_clean.fcs,0.618339,0.518814,0.297958,0.165320,0.157120,0.263048,0.236940,0.638401,0.275555,0.247613,0.247531,0.466408,0.281839,0.268400,0.036026
3,DEN,101_DEN084Y5_15_E01_008_clean.fcs,0.466790,0.384033,0.303874,0.136375,0.127686,0.267014,0.241100,0.646098,0.272339,0.249497,0.249413,0.433891,0.275977,0.238305,0.036040
4,DEN,101_DEN084Y5_15_E01_008_clean.fcs,0.321537,0.268814,0.299033,0.119671,0.111221,0.268994,0.248873,0.478198,0.328231,0.258179,0.231071,0.487078,0.328163,0.288373,0.036068
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
290167,DEN,101_DEN084Y5_15_E01_008_clean.fcs,0.425627,0.361679,0.294202,0.195806,0.181198,0.270155,0.239368,0.660101,0.303709,0.260982,0.223855,0.450421,0.297519,0.310292,0.999530
290168,DEN,101_DEN084Y5_15_E01_008_clean.fcs,0.163615,0.140152,0.291853,0.114236,0.110733,0.257908,0.237720,0.253556,0.322732,0.298471,0.229956,0.232781,0.366494,0.231273,0.999530
290169,DEN,101_DEN084Y5_15_E01_008_clean.fcs,0.173885,0.145378,0.299023,0.062462,0.062111,0.251414,0.243496,0.244216,0.279439,0.311866,0.237088,0.211083,0.482445,0.209655,0.999545
290170,DEN,101_DEN084Y5_15_E01_008_clean.fcs,0.751028,0.538620,0.348589,0.485911,0.422043,0.287833,0.305464,0.340727,0.387555,0.324167,0.245175,0.265755,0.634425,0.236266,0.999559


In [19]:
# Retrieve all the gate IDs for a sample group
# Note a gate ID is a combination of the gate name plus its gate path
session.get_gate_ids(group_name=sample_group)

[('Time', ('root',)),
 ('Singlets', ('root', 'Time')),
 ('aAmine-', ('root', 'Time', 'Singlets')),
 ('CD3+', ('root', 'Time', 'Singlets', 'aAmine-')),
 ('CD4+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+')),
 ('CD107a+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')),
 ('IFNg+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')),
 ('IL2+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')),
 ('TNFa+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')),
 ('CD8+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+')),
 ('CD107a+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD8+')),
 ('IFNg+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD8+')),
 ('IL2+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD8+')),
 ('TNFa+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD8+'))]
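
The listing above shows why a gate name alone is not always enough. As a small plain-Python illustration (using a subset of the gate IDs copied from that output), we can find which names repeat and confirm that the full `(name, path)` pair is still unique. The path length also happens to match the `level` column in the group report for these gates.

```python
from collections import Counter

cd3_path = ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+')

# a subset of the gate IDs listed above: (gate name, gate path)
gate_ids = [
    ('Time', ('root',)),
    ('CD4+', cd3_path),
    ('CD8+', cd3_path),
    ('IFNg+', cd3_path + ('CD4+',)),
    ('IFNg+', cd3_path + ('CD8+',)),
    ('TNFa+', cd3_path + ('CD4+',)),
    ('TNFa+', cd3_path + ('CD8+',)),
]

# gate names that occur under more than one parent need a gate path
name_counts = Counter(name for name, _ in gate_ids)
ambiguous = sorted(name for name, n in name_counts.items() if n > 1)

# the full ID stays unique even when the name repeats
ids_are_unique = len(set(gate_ids)) == len(gate_ids)

# depth of each gate in the hierarchy, taken as the length of its path
levels = {gate_id: len(gate_id[1]) for gate_id in gate_ids}

print(ambiguous)
print(ids_are_unique)
print(max(levels.values()))
```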

In [20]:
# Instead of getting the gated events, you can also
# retrieve the gate membership for all events.
# This is a boolean array (True value means the event is in the gate)
# Note: If the gate name is ambiguous, you must specify the gate path
session.get_gate_membership(group_name=sample_group, sample_id=sample_id, gate_name='Singlets')

array([False, False, False, ..., False, False,  True])
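
A membership mask like this is typically used to count or select events. The sketch below shows the idea with plain Python lists and made-up values, so it runs without any data files. With the NumPy array that FlowKit returns, the equivalents would be `membership.sum()` and boolean indexing such as `values[membership]`.

```python
from itertools import compress

# made-up membership flags for six events (True = event is inside the gate)
membership = [False, True, True, False, True, False]

# made-up values for one channel of the same six events
channel_values = [0.67, 0.47, 0.62, 0.47, 0.32, 0.43]

# True counts as 1, so summing the mask gives the number of gated events
in_gate_count = sum(membership)
in_gate_percent = 100.0 * in_gate_count / len(membership)

# keep only the values whose membership flag is True
gated_values = list(compress(channel_values, membership))

print(in_gate_count, in_gate_percent)
print(gated_values)
```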

In [21]:
# Here we'll collect the gate membership arrays for all gates for a sample
results = {}

for gate_name, gate_path in session.get_gate_ids(group_name=sample_group):
    result = session.get_gate_membership(
        group_name=sample_group, 
        sample_id=sample_list[0], 
        gate_name=gate_name, 
        gate_path=gate_path
    )
    results[(gate_name, gate_path)] = result

In [22]:
results.keys()

dict_keys([('Time', ('root',)), ('Singlets', ('root', 'Time')), ('aAmine-', ('root', 'Time', 'Singlets')), ('CD3+', ('root', 'Time', 'Singlets', 'aAmine-')), ('CD4+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+')), ('CD107a+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')), ('IFNg+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')), ('IL2+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')), ('TNFa+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD4+')), ('CD8+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+')), ('CD107a+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD8+')), ('IFNg+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD8+')), ('IL2+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD8+')), ('TNFa+', ('root', 'Time', 'Singlets', 'aAmine-', 'CD3+', 'CD8+'))])

In [23]:
results[('aAmine-', ('root', 'Time', 'Singlets'))]

array([False, False, False, ..., False, False,  True])

### This concludes the tutorial series. From here, I recommend looking at the other notebooks in the examples directory for more advanced workflows using FlowKit.