<a href="https://colab.research.google.com/github/tb-harris/neuroscience-2024/blob/main/10_Feature_Extractor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Feature Extractor
We can use the feature extractor to compute morphological features of interest from neuron reconstructions. You can find the collection of available features in the Allen Institute's `neuron_morphology` documentation [here](https://neuron-morphology.readthedocs.io/en/latest/autoapi/neuron_morphology/features/index.html).

You can find the steps used to load the .swc files used in this notebook [here](https://github.com/tb-harris/neuroscience-2024/blob/main/tools/Feature_Extractor_and_Reconstructions_Setup.ipynb).

This is based on code adapted from Curt and the Allen Institute.

## Setup

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Install the feature extractor library:

In [None]:
!pip install neuron_morphology

Load the manifest file containing the reconstruction file paths:

In [76]:
import pandas as pd

# Get the paths to the neuron reconstruction files (.swc files)
swc_paths = pd.read_hdf("/content/drive/Shareddrives/Lisman Laboratory/Lisman 2024/Neuro/Data/raw_data/reconstruction_paths.hdf")
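For orientation, each .swc file is plain text in which every non-comment line describes one node of the reconstruction: an id, a type code (1 = soma, 2 = axon, 3 = basal dendrite, 4 = apical dendrite), x/y/z coordinates, a radius, and the id of the parent node (-1 for the root). The sketch below parses a tiny made-up snippet with the standard library. It only illustrates the file layout and is not how `neuron_morphology` reads files; the sample text and the `parse_swc` helper are hypothetical.

```python
# Hypothetical three-node reconstruction in SWC layout:
# id type x y z radius parent
SAMPLE_SWC = """\
# comment lines start with '#'
1 1 0.0 0.0 0.0 5.0 -1
2 3 3.0 4.0 0.0 1.0 1
3 3 3.0 4.0 12.0 0.8 2
"""


def parse_swc(text):
    """Parse SWC-formatted text into a list of node dicts (illustrative helper)."""
    nodes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        nodes.append({
            "id": int(parts[0]),
            "type": int(parts[1]),
            "x": float(parts[2]),
            "y": float(parts[3]),
            "z": float(parts[4]),
            "radius": float(parts[5]),
            "parent": int(parts[6]),
        })
    return nodes


nodes = parse_swc(SAMPLE_SWC)
root_ids = [node["id"] for node in nodes if node["parent"] == -1]
print(len(nodes), "nodes; root ids:", root_ids)
```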

Get the morphology of each cell (takes some time):

In [77]:
from neuron_morphology.swc_io import morphology_from_swc

# Apply the morphology_from_swc() function to each file path
morphologies = swc_paths.apply(morphology_from_swc)

## Step 1: Import our features

Find the relevant feature(s) in the [documentation](https://neuron-morphology.readthedocs.io/en/latest/autoapi/neuron_morphology/features/index.html) and import each one from the module that defines it.

Let's say, for example, that our features of interest are `max_path_distance` from the [neuron_morphology.features.path](https://neuron-morphology.readthedocs.io/en/latest/autoapi/neuron_morphology/features/path/index.html) module and `mean_bifurcation_angle_local` from the [neuron_morphology.features.branching.bifurcations](https://neuron-morphology.readthedocs.io/en/latest/autoapi/neuron_morphology/features/branching/bifurcations/index.html) module:

In [80]:
from neuron_morphology.features.path import max_path_distance
from neuron_morphology.features.branching.bifurcations import mean_bifurcation_angle_local
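To build intuition for what a path-distance feature measures, the sketch below computes the longest along-the-neurite distance from the root to any node of a tiny tree by summing the Euclidean length of each parent-child segment. It assumes `max_path_distance` follows this general idea; the library's exact definition (for example, how the soma is handled) may differ. The node table and the `longest_path_from_root` helper are hypothetical.

```python
import math

# Hypothetical nodes: id -> (x, y, z, parent_id); parent -1 marks the root
NODES = {
    1: (0.0, 0.0, 0.0, -1),
    2: (3.0, 4.0, 0.0, 1),   # 5 units from node 1
    3: (3.0, 4.0, 12.0, 2),  # 12 units from node 2
    4: (0.0, 6.0, 0.0, 1),   # 6 units from node 1 (a shorter branch)
}


def longest_path_from_root(nodes):
    """Return the largest root-to-node distance measured along the tree."""
    cache = {}

    def path_length(node_id):
        if node_id in cache:
            return cache[node_id]
        x, y, z, parent = nodes[node_id]
        if parent == -1:
            length = 0.0  # the root is at distance 0 from itself
        else:
            px, py, pz, _ = nodes[parent]
            # distance to the parent plus the parent's own path length
            length = path_length(parent) + math.dist((x, y, z), (px, py, pz))
        cache[node_id] = length
        return length

    return max(path_length(node_id) for node_id in nodes)


longest = longest_path_from_root(NODES)
print(longest)  # 17.0 (5 + 12 along the branch 1 -> 2 -> 3)
```

Note that the straight-line distance from node 1 to node 3 is 13 units, shorter than the 17-unit path along the branch, which is why path distance and Euclidean distance are separate measurements.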

## Step 2: Register our features
Register the features we want to use. For each new feature, add another *specialize()* call with the feature function and the constant `NEURITE_SPECIALIZATIONS`, which indicates that we want the feature computed for every neurite type (you can narrow this down by passing a different constant).

In [81]:
from neuron_morphology.constants import AXON, BASAL_DENDRITE, SOMA, APICAL_DENDRITE

from neuron_morphology.feature_extractor.marked_feature import specialize
from neuron_morphology.feature_extractor.feature_specialization import NEURITE_SPECIALIZATIONS
from neuron_morphology.feature_extractor.feature_extractor import FeatureExtractor

# Create a new feature extractor
fe = FeatureExtractor()
# Register our target features
fe.register_features([
    specialize(max_path_distance, NEURITE_SPECIALIZATIONS),
    specialize(mean_bifurcation_angle_local, NEURITE_SPECIALIZATIONS)
])

<neuron_morphology.feature_extractor.feature_extractor.FeatureExtractor at 0x7bf0002f4d60>

## Step 3: Extract features

Run the two cells below to create a dataframe with our features of interest.

In [82]:
from neuron_morphology.feature_extractor.utilities import unnest
from neuron_morphology.feature_extractor.data import Data

# Extract the features from a single neuron morphology object
def extract_features(neuron_morphology):
  data = Data(neuron_morphology)

  feature_extraction_run = fe.extract(data)
  results = feature_extraction_run.results

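  # unnest() returns a flattened copy (nested keys joined with "."),
  # so return its result rather than the original dictionary
  return unnest(results)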

Create a data frame by running the *extract_features()* function on each neuron morphology (takes some time).

In [None]:
features = pd.DataFrame(
    (extract_features(neuron) for neuron in morphologies.values),
    index=morphologies.index
)
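The cell above builds one row per cell from a dictionary of results. As a small illustration of how pandas handles this, the sketch below uses made-up dictionaries and made-up cell ids: when one dictionary lacks a key that another has, pandas still creates the column and fills the gap with NaN. Empty entries in the real features dataframe can arise the same way, although that is an assumption about how the extractor reports features it does not produce for a cell.

```python
import pandas as pd

# Made-up per-cell results; only the second cell has an apical dendrite entry
rows = [
    {"axon.max_path_distance": 794.4, "basal_dendrite.max_path_distance": 523.7},
    {
        "axon.max_path_distance": 1061.5,
        "basal_dendrite.max_path_distance": 325.8,
        "apical_dendrite.max_path_distance": 410.2,
    },
]
cell_ids = pd.Index([101, 102], name="cell_specimen_id")

# Same pattern as above: a generator of dictionaries plus an explicit index
toy = pd.DataFrame((row for row in rows), index=cell_ids)
print(toy)
```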

Double-check that your features dataframe has values for each of your target features across all the cells. Some columns may be empty (NaN), as with the `apical_dendrite` columns in the rows shown below:

In [88]:
features

cell_specimen_id,basal_dendrite.max_path_distance,axon.max_path_distance,all_neurites.max_path_distance,dendrite.max_path_distance,basal_dendrite.mean_bifurcation_angle_local,axon.mean_bifurcation_angle_local,all_neurites.mean_bifurcation_angle_local,dendrite.mean_bifurcation_angle_local,apical_dendrite.max_path_distance,apical_dendrite.mean_bifurcation_angle_local
601506507,523.700528,794.445695,794.445695,523.700528,1.312409,1.485740,1.453871,1.312409,,
601790961,325.760216,1061.473816,1061.473816,325.760216,1.223044,1.480208,1.450232,1.223044,,
601803754,281.814132,861.607466,861.607466,281.814132,1.675803,1.634313,1.637428,1.675803,,
601808698,262.570371,864.646669,864.646669,262.570371,1.399844,1.493160,1.486658,1.399844,,
601810307,273.631704,923.251790,923.251790,273.631704,1.178422,1.487785,1.463036,1.178422,,
...,...,...,...,...,...,...,...,...,...,...
992386952,322.624669,764.231479,764.231479,322.624669,1.378766,1.407236,1.404152,1.378766,,
992830261,307.323853,826.281806,826.281806,307.323853,1.335673,1.492918,1.462526,1.335673,,
993243528,311.198809,618.332210,618.332210,311.198809,1.334276,1.457965,1.422864,1.334276,,
993245688,289.194411,1155.124808,1155.124808,289.194411,1.425500,1.405364,1.407533,1.425500,,
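Scanning the table by eye only covers the rows that are displayed. A quick way to check every cell is to count missing values per column. The sketch below does this on a small made-up dataframe (the values and cell ids are invented); with the real data you would run the same two expressions on `features`.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for the real features dataframe
toy_features = pd.DataFrame(
    {
        "axon.max_path_distance": [794.4, 1061.5, 861.6],
        "basal_dendrite.max_path_distance": [523.7, np.nan, 281.8],
        "apical_dendrite.max_path_distance": [np.nan, np.nan, np.nan],
    },
    index=pd.Index([101, 102, 103], name="cell_specimen_id"),
)

# Fraction of cells with a missing value, per column
missing_fraction = toy_features.isna().mean()
print(missing_fraction)

# Columns that have no values at all
empty_columns = missing_fraction[missing_fraction == 1.0].index.tolist()
print(empty_columns)  # ['apical_dendrite.max_path_distance']
```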


## Step 4: Join with other dataframes

We can also combine our features dataframe with existing dataframes, for example the dataframe that holds the gene data and other morphological features.

Let's load our existing gene+morph dataframe and drop the genes whose value is 0 in every cell:

In [89]:
# Read in our morph + genetic data
cell_data = pd.read_hdf('/content/drive/Shareddrives/Lisman Laboratory/Lisman 2024/Neuro/Data/processed data/genes_morph.hdf5.lz4')

# Get all gene columns where the value for every cell is 0
zero_genes = cell_data.columns[(cell_data == 0).all()]
# cell_data[zero_genes].sum().sum() # Double check these columns are 0

# Remove all of the columns in zero_genes from our dataframe
cell_data = cell_data.drop(zero_genes, axis="columns")

Let's combine this with our features dataframe:

In [92]:
# Create a new dataframe that combines cell_data with our newly calculated features.
# how='inner' keeps only the cells that appear in both dataframes.
cell_data_combined = cell_data.join(features, how='inner')

Confirm that your new dataframe has your target columns:

In [96]:
# Show the last 10 columns
cell_data_combined.columns[-10:]

Index(['basal_dendrite.max_path_distance', 'axon.max_path_distance',
       'all_neurites.max_path_distance', 'dendrite.max_path_distance',
       'basal_dendrite.mean_bifurcation_angle_local',
       'axon.mean_bifurcation_angle_local',
       'all_neurites.mean_bifurcation_angle_local',
       'dendrite.mean_bifurcation_angle_local',
       'apical_dendrite.max_path_distance',
       'apical_dendrite.mean_bifurcation_angle_local'],
      dtype='object')

## Step 5: Save data

Save your features data to the shared drive. **Change the file name from _Mr\_Harris\_path\_bifurcation.hdf5_ to match your own name and data.**

In [97]:
features.to_hdf(
    '/content/drive/Shareddrives/Lisman Laboratory/Lisman 2024/Neuro/Data/student data/Mr_Harris_path_bifurcation.hdf5',
    key='features'
)

Save your combined dataframe as well. As before, **change the file name (the part after the last `/`) to include your name and a description of the data, and to note that it is combined with the original cell data:**

In [98]:
cell_data_combined.to_hdf(
    '/content/drive/Shareddrives/Lisman Laboratory/Lisman 2024/Neuro/Data/student data/Mr_Harris_gene_morph_with_path.hdf5',
    key='cell_data'
)
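If you would rather not edit the long path by hand, one option is to build it from your name and a short description. This is only a convenience sketch: the `student_file` helper and its arguments are hypothetical, and the folder is the shared-drive location used in the cells above.

```python
from pathlib import Path

# Shared-drive folder used in the cells above
STUDENT_DATA_DIR = Path(
    "/content/drive/Shareddrives/Lisman Laboratory/Lisman 2024/Neuro/Data/student data"
)


def student_file(name, description, folder=STUDENT_DATA_DIR):
    """Build '<folder>/<name>_<description>.hdf5' with spaces replaced by underscores."""
    stem = f"{name}_{description}".replace(" ", "_")
    return folder / f"{stem}.hdf5"


features_path = student_file("Mr Harris", "path bifurcation")
combined_path = student_file("Mr Harris", "gene morph with path")
print(features_path.name)  # Mr_Harris_path_bifurcation.hdf5
print(combined_path.name)  # Mr_Harris_gene_morph_with_path.hdf5
```

You could then pass `str(features_path)` as the first argument to `features.to_hdf(...)` in place of the hard-coded string.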

## Step 6: Loading in your data

Now that we've saved our data to Drive, we can load it directly in any future notebook, so we won't need to run the extraction code again.

**Change the file name at the end of the path to match the file you saved:**

In [100]:
# This file holds the features dataframe we saved above
features = pd.read_hdf('/content/drive/Shareddrives/Lisman Laboratory/Lisman 2024/Neuro/Data/student data/Mr_Harris_path_bifurcation.hdf5')

In [102]:
features

cell_specimen_id,basal_dendrite.max_path_distance,axon.max_path_distance,all_neurites.max_path_distance,dendrite.max_path_distance,basal_dendrite.mean_bifurcation_angle_local,axon.mean_bifurcation_angle_local,all_neurites.mean_bifurcation_angle_local,dendrite.mean_bifurcation_angle_local,apical_dendrite.max_path_distance,apical_dendrite.mean_bifurcation_angle_local
601506507,523.700528,794.445695,794.445695,523.700528,1.312409,1.485740,1.453871,1.312409,,
601790961,325.760216,1061.473816,1061.473816,325.760216,1.223044,1.480208,1.450232,1.223044,,
601803754,281.814132,861.607466,861.607466,281.814132,1.675803,1.634313,1.637428,1.675803,,
601808698,262.570371,864.646669,864.646669,262.570371,1.399844,1.493160,1.486658,1.399844,,
601810307,273.631704,923.251790,923.251790,273.631704,1.178422,1.487785,1.463036,1.178422,,
...,...,...,...,...,...,...,...,...,...,...
992386952,322.624669,764.231479,764.231479,322.624669,1.378766,1.407236,1.404152,1.378766,,
992830261,307.323853,826.281806,826.281806,307.323853,1.335673,1.492918,1.462526,1.335673,,
993243528,311.198809,618.332210,618.332210,311.198809,1.334276,1.457965,1.422864,1.334276,,
993245688,289.194411,1155.124808,1155.124808,289.194411,1.425500,1.405364,1.407533,1.425500,,
