# **Predicting Subcellular Localization of Proteins**

Proteins are localized to different compartments and sub-compartments inside the cell.
Each subcellular compartment has a distinct, well-defined function and a characteristic
physicochemical environment, which drives the proper functioning of the proteins it contains.

After synthesis, a protein must be targeted to
the right compartment to perform its function. Mis-localization of proteins leads
to functional loss or disorder, which contributes to many human diseases, including cardiovascular
and neurodegenerative diseases and cancers. Assigning a subcellular localization to a protein is a
significant step towards elucidating its interaction partners and predicting its function or potential role in
the cellular machinery.

There is a need for accurate computational methods to predict subcellular localization. Such predictions are of great significance for understanding
the cellular proteome and are also helpful in designing drugs and identifying drug targets.

For this project, we will use *Pfeature* to generate a set of features for building a machine learning model that predicts subcellular localization.

# Install Pfeature

In [None]:
!echo $PYTHONPATH

/env/python


In [None]:
%env PYTHONPATH=

env: PYTHONPATH=


In [None]:
! wget https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
! chmod +x Miniconda3-py37_4.8.2-Linux-x86_64.sh
! bash ./Miniconda3-py37_4.8.2-Linux-x86_64.sh -b -f -p /usr/local
import sys
sys.path.append('/usr/local/lib/python3.7/site-packages/')

--2021-11-10 13:13:07--  https://repo.anaconda.com/miniconda/Miniconda3-py37_4.8.2-Linux-x86_64.sh
Resolving repo.anaconda.com (repo.anaconda.com)... 104.16.130.3, 104.16.131.3, 2606:4700::6810:8303, ...
Connecting to repo.anaconda.com (repo.anaconda.com)|104.16.130.3|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 85055499 (81M) [application/x-sh]
Saving to: ‘Miniconda3-py37_4.8.2-Linux-x86_64.sh’


2021-11-10 13:13:08 (89.5 MB/s) - ‘Miniconda3-py37_4.8.2-Linux-x86_64.sh’ saved [85055499/85055499]

PREFIX=/usr/local
Unpacking payload ...
Collecting package metadata (current_repodata.json): - \ | / done
Solving environment: \ | done

## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - _libgcc_mutex==0.1=main
    - asn1crypto==1.3.0=py37_0
    - ca-certificates==2020.1.1=0
    - certifi==2019.11.28=py37_0
    - cffi==1.14.0=py37h2e261b9_0
    - chardet==3.0.4=py37_1003
    - conda-package-handling==1.6.0=py37h7b644

In [None]:
! wget https://github.com/raghavagps/Pfeature/raw/master/PyLib/Pfeature.zip

--2021-11-10 13:13:43--  https://github.com/raghavagps/Pfeature/raw/master/PyLib/Pfeature.zip
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/raghavagps/Pfeature/master/PyLib/Pfeature.zip [following]
--2021-11-10 13:13:43--  https://raw.githubusercontent.com/raghavagps/Pfeature/master/PyLib/Pfeature.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 511222 (499K) [application/zip]
Saving to: ‘Pfeature.zip’


2021-11-10 13:13:43 (11.6 MB/s) - ‘Pfeature.zip’ saved [511222/511222]



In [None]:
! unzip Pfeature.zip

Archive:  Pfeature.zip
   creating: Pfeature/
  inflating: __MACOSX/._Pfeature     
  inflating: Pfeature/PKG-INFO       
  inflating: __MACOSX/Pfeature/._PKG-INFO  
  inflating: Pfeature/.DS_Store      
  inflating: __MACOSX/Pfeature/._.DS_Store  
  inflating: Pfeature/README         
  inflating: __MACOSX/Pfeature/._README  
  inflating: Pfeature/setup.py       
  inflating: __MACOSX/Pfeature/._setup.py  
  inflating: Pfeature/Functions_Tables.pdf  
  inflating: __MACOSX/Pfeature/._Functions_Tables.pdf  
   creating: Pfeature/build/
  inflating: __MACOSX/Pfeature/._build  
  inflating: Pfeature/Pfeature_Descriptors.pdf  
  inflating: __MACOSX/Pfeature/._Pfeature_Descriptors.pdf  
   creating: Pfeature/Pfeature/
  inflating: __MACOSX/Pfeature/._Pfeature  
   creating: Pfeature/build/lib/
  inflating: __MACOSX/Pfeature/build/._lib  
  inflating: Pfeature/Pfeature/bonds.csv  
  inflating: Pfeature/Pfeature/pfeature.py  
  inflating: Pfeature/Pfeature/AAIndexNames.csv  
  inflating: Pfea

In [None]:
%cd Pfeature/

/content/Pfeature


In [None]:
! python setup.py install

running install
running build
running build_py
copying Pfeature/pfeature.py -> build/lib/Pfeature
copying Pfeature/__init__.py -> build/lib/Pfeature
running install_lib
creating /usr/local/lib/python3.7/site-packages/Pfeature
copying build/lib/Pfeature/data -> /usr/local/lib/python3.7/site-packages/Pfeature
copying build/lib/Pfeature/aa_attr_group.csv -> /usr/local/lib/python3.7/site-packages/Pfeature
copying build/lib/Pfeature/can_pat.csv -> /usr/local/lib/python3.7/site-packages/Pfeature
copying build/lib/Pfeature/atom.csv -> /usr/local/lib/python3.7/site-packages/Pfeature
copying build/lib/Pfeature/aaindices.csv -> /usr/local/lib/python3.7/site-packages/Pfeature
copying build/lib/Pfeature/aaind.txt -> /usr/local/lib/python3.7/site-packages/Pfeature
copying build/lib/Pfeature/AAIndexNames.csv -> /usr/local/lib/python3.7/site-packages/Pfeature
copying build/lib/Pfeature/aaindex.csv -> /usr/local/lib/python3.7/site-packages/Pfeature
copying build/lib/Pfeature/Schneider-Wrede.csv -> /us

# Install CD-HIT

The CD-HIT package clusters similar sequences and is useful for removing redundant sequences from a dataset.

In [None]:
! conda install -c bioconda/label/cf201901 cd-hit -y

Collecting package metadata (current_repodata.json): - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ | / - \ done
Solving environment: / - \ done

## Package Plan ##

  environment location: /usr/local

  added / updated specs:
    - cd-hit


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2021.10.26 |       h06a4308_2         115 KB
    cd-hit-4.6.8               |       hfc679d8_2         232 KB  bioconda/label/cf201901
    certifi-2021.10.8          |   py37h06a4308_0         151 KB
    conda-4.10.3               |   py37h06a4308_0         2.9 MB
    openssl-1.1.1l             |       h7f8727e_0         2.5 MB
    ------------------------------------------------------------
                                           Total:         5.9 MB

# Copying files to Google Drive

First, we need to mount Google Drive in Colab so that we can access our Drive files from within the notebook.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive/', force_remount=True)

Mounted at /content/gdrive/


Next, we create a data folder in our Colab Notebooks folder on Google Drive.

In [None]:
! mkdir -p "/content/gdrive/My Drive/Colab Notebooks/data"

In [None]:
! cp /content/Cytosol.fa "/content/gdrive/My Drive/Colab Notebooks/data"
! cp /content/Nucleus.fa "/content/gdrive/My Drive/Colab Notebooks/data"
! cp /content/Extracellular.fa "/content/gdrive/My Drive/Colab Notebooks/data"
! cp /content/Mitochondrion.fa "/content/gdrive/My Drive/Colab Notebooks/data"
! cp /content/Unknown.fa "/content/gdrive/My Drive/Colab Notebooks/data"


cp: cannot stat '/content/Cytosol.fa': No such file or directory
cp: cannot stat '/content/Nucleus.fa': No such file or directory
cp: cannot stat '/content/Extracellular.fa': No such file or directory
cp: cannot stat '/content/Mitochodrion.fa': No such file or directory
cp: cannot stat '/content/Unknown.fa': No such file or directory
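
The `cp: cannot stat` messages above suggest that the FASTA files were not present in `/content` when the cell ran. As a minimal sketch, the same copy step can be written in Python so that missing files are reported instead of producing errors. The helper name `copy_existing` is hypothetical, and the source and destination folders are passed in rather than hard-coded.

```python
import shutil
from pathlib import Path

def copy_existing(filenames, src_dir, dst_dir):
    """Copy each named file from src_dir to dst_dir; return the names that were missing."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    missing = []
    for name in filenames:
        src = src_dir / name
        if src.is_file():
            shutil.copy(src, dst_dir / name)
        else:
            missing.append(name)  # report instead of failing
    return missing
```

In this notebook it could be called as `copy_existing(['Cytosol.fa', 'Nucleus.fa'], '/content', '/content/gdrive/My Drive/Colab Notebooks/data')`; an empty return value means every file was copied.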


In [None]:
cd /content/gdrive/MyDrive/Colab Notebooks/data/

/content/gdrive/MyDrive/Colab Notebooks/data


In [None]:
! head -n 30 Nucleus.fa


>ESTR_HUMAN 
MTMTLHTKASGMALLHQIQGNELEPLNRPQLKIPLERPLGEVYLDSSKPAVYNYPEGAAYEFNAAAAANA
QVYGQTGLPYGPGSEAAAFGSNGLGGFPPLNSVSPSPLMLLHPPPQLSPFLQPHGQQVPYYLENEPSGYT
VREAGPPAFYRPNSDNRRQGGRERLASTNDKGSMAMESAKETRYCAVCNDYASGYHYGVWSCEGCKAFFK
RSIQGHNDYMCPATNQCTIDKNRRKSCQACRLRKCYEVGMMKGGIRKDRRGGRMLKHKRQRDDGEGRGEV
GSAGDMRAANLWPSPLMIKRSKKNSLALSLTADQMVSALLDAEPPILYSEYDPTRPFSEASMMGLLTNLA
DRELVHMINWAKRVPGFVDLTLHDQVHLLECAWLEILMIGLVWRSMEHPGKLLFAPNLLLDRNQGKCVEG
MVEIFDMLLATSSRFRMMNLQGEEFVCLKSIILLNSGVYTFLSSTLKSLEEKDHIHRVLDKITDTLIHLM
AKAGLTLQQQHQRLAQLLLILSHIRHMSNKGMEHLYSMKCKNVVPLYDLLLEMLDAHRLHAPTSRGGASV
EETDQSHLATAGSTSSHSLQKYYITGEAEGFPATV
>H2B1_PARAN 
PSQKSPTKRSPTKRSPTKRSPQKGGKGGKGAKRGGKAGKRRRGVQVKRRRRRRESYGIYIYKVLKQVHPD
TGISSRAMSVMNSFVNDVFERIAAEAGRLTTYNRRSTVSSREVQTAVRLLLPGELAKHAVSEGTKAVTKY
TTSR
>ID1_RAT 
MKVASSSAAATAGPSCSLKAGRTAGEVVLGLSEQSVAISRCAGTRLPALLDEQQVNVLLYDMNGCYSRLK
ELVPTLPQNRKVSKVEILQHVIDYIRDLQLELNSESEVATAGGRGLPVRAPLSTLNGEISALAAEVRSES
EYYIILLWETKATGGGCPPYFSGA
>RI14_HUMAN 
MTHGEELGSDVHQD

## Scan the sequences for non-standard amino acid characters

In [None]:
# Load the rpy2 IPython extension (R magic),
# which lets us run R code inside Colab with %%R
%load_ext rpy2.ipython

In [None]:
%%R
getwd()
setwd('/content/gdrive/My Drive/Colab Notebooks/data')

In [None]:
%%R
install.packages('protr')
library(protr)

R[write to console]: Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)

R[write to console]: trying URL 'https://cran.rstudio.com/src/contrib/protr_1.6-2.tar.gz'

R[write to console]: Content type 'application/x-gzip'
R[write to console]:  length 1847590 bytes (1.8 MB)


In [None]:
%%R
c <- readFASTA("/content/gdrive/My Drive/Colab Notebooks/data/Cytosol.fa")
length(c)

C <- c[(sapply(c, protcheck))]
length(C)
# 341 of 342 sequences pass protcheck, so 1 sequence contains non-standard residues

[1] 341


In [None]:
%%R
e <-readFASTA("/content/gdrive/My Drive/Colab Notebooks/data/Extracellular.fa")
length(e)

E <- e[(sapply(e, protcheck))]
length(E)
#Clean

[1] 159


In [None]:
%%R
m <-readFASTA("/content/gdrive/My Drive/Colab Notebooks/data/Mitochondrion.fa")
length(m)

M <- m[(sapply(m, protcheck))]
length(M)
# 235 of 236 sequences pass protcheck, so 1 sequence contains non-standard residues

[1] 235


In [None]:
%%R
n <-readFASTA("/content/gdrive/My Drive/Colab Notebooks/data/Nucleus.fa")
length(n)

N <- n[(sapply(n, protcheck))]
length(N)
#Clean

[1] 1478


In [None]:
%%R
u <-readFASTA("/content/gdrive/My Drive/Colab Notebooks/data/Unknown.fa")
length(u)

U <- u[(sapply(u, protcheck))]
length(U)
#Clean

[1] 20
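
For readers who prefer to stay in Python, the check performed by `protcheck` can be approximated as follows. This is a sketch, not protr's implementation: it assumes a sequence is "clean" when it contains only the 20 standard one-letter amino acid codes, and the helper names `read_fasta` and `is_standard` are hypothetical.

```python
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes

def read_fasta(text):
    """Parse FASTA-formatted text into a {header: sequence} dict."""
    records, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            records[header] += line  # sequences may span several lines
    return records

def is_standard(seq):
    """True if every residue is one of the 20 standard amino acids."""
    return set(seq.upper()) <= STANDARD_AA

# Toy input: the second record contains 'X', which is not a standard residue
example = ">ok\nMKVASS\nSAAATA\n>bad\nMTHXGEEL\n"
records = read_fasta(example)
clean = {name: seq for name, seq in records.items() if is_standard(seq)}
print(len(records), len(clean))
```

Comparing the two lengths mirrors the `length(c)` versus `length(C)` comparison in the R cells above.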


In [None]:
! head -n 10 Mitochondrion.fa

>AATM_BOVIN 
MALLHSGRFLSGVAAAFHPGLAAAASARASSWWAHVEMGPPDPILGVTEAFKRDTNSKKMNLGVGAYRDD
NGKPYVLPSVRKAEAQIAAKNLDKEYLPIAGLAEFCKASAELALGENNEVLKSGRYVTVQTISGTGALRI
GASFLQRFFKFSRDVFLPKPTWGNHTPIFRDAGMQLQSYRYYDPKTCGFDFTGAIEDISKIPAQSVILLH
ACAHNPTGVDPRPEQWKEMATVVKKNNLFAFFDMAYQGFASGDGNKDAWAVRHFIEQGINVCLCQSYAKN
MGLYGERVGAFTVVCKDAEEAKRVESQLKILIRPMYSNPPINGARIASTILTSPDLRKQWLHEVKGMADR
IISMRTQLVSNLKKEGSSHNWQHIIDQIGMFCYTGLKPEQVERLTKEFSIYMTKDGRISVAGVTSGNVAY
LAHAIHQVTK
>ODO1_YEAST 
MLRFVSSQTCRYSSRGLLKTSLLKNASTVKIVGRGLATTGTDNFLSTSNATYIDEMYQAWQKDPSSVHVS


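## Remove Redundant Sequences using CD-HIT

Let's remove redundant sequences, that is, sequences whose identity to another sequence in the same file exceeds 0.99 (the `-c 0.99` option). The commands below read the original FASTA files, so sequences flagged by `protcheck` above are not excluded at this step.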

In another paper, *CD-HIT was used to eliminate redundant sequences by setting the threshold to 0.7, whereas other parameters were set as the default. Note that if the threshold of similarity is set too high, the learning model will easily lead to overfitting. On the other hand, lower threshold value will be resulting in poor learning effect.*

(Tung C-H, Chen C-W, Sun H-H, Chu Y-W (2017) Predicting human protein subcellular localization by heterogeneous and comprehensive approaches. PLoS ONE 12(6): e0178832. https://doi.org/10.1371/journal.pone.0178832)
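
To make the effect of the threshold concrete, here is a toy sketch of a sequence identity score. It is an illustration under strong assumptions: the two sequences are taken to be already aligned, of equal length, and gap-free, whereas CD-HIT computes identity from its own alignment, so its values can differ. The function name `identity` is hypothetical.

```python
def identity(aligned_a, aligned_b):
    """Fraction of positions with identical residues in two equal-length, gap-free strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must have the same length")
    if not aligned_a:
        return 0.0
    matches = sum(a == b for a, b in zip(aligned_a, aligned_b))
    return matches / len(aligned_a)

# Toy sequences that differ at one of ten positions
pair = ("MKVASSSAAA", "MKVASSSAAT")
score = identity(*pair)

# The same pair is treated differently depending on the chosen threshold
for threshold in (0.99, 0.7):
    print(threshold, "redundant" if score >= threshold else "kept")
```

With a strict threshold such as 0.99 only near-identical sequences are collapsed, while a threshold of 0.7 would also merge the toy pair above.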

In [None]:
! cd-hit -i Cytosol.fa -o Cytosol_cd-hit.txt -c 0.99
! cd-hit -i Extracellular.fa -o Extracellular_cd-hit.txt -c 0.99
! cd-hit -i Mitochondrion.fa -o Mitochondrion_cd-hit.txt -c 0.99
! cd-hit -i Nucleus.fa -o Nucleus_cd-hit.txt -c 0.99

Program: CD-HIT, V4.7 (+OpenMP), Jul 13 2018, 17:17:44
Command: cd-hit -i Cytosol.fa -o Cytosol_cd-hit.txt -c 0.99

Started: Wed Nov 10 13:32:02 2021
                            Output                              
----------------------------------------------------------------
total seq: 342
longest and shortest : 4725 and 72
Total letters: 179413
Sequences have been sorted

Approximated minimal memory consumption:
Sequence        : 0M
Buffer          : 1 X 11M = 11M
Table           : 1 X 65M = 65M
Miscellaneous   : 0M
Total           : 77M

Table limit with the given memory limit:
Max number of representatives: 959133
Max number of word counting entries: 90365330

comparing sequences from          0  to        342

      342  finished        342  clusters

Apprixmated maximum memory consumption: 78M
writing new database
writing clustering information
program completed !

Total CPU time 0.13
Program: CD-HIT, V4.7 (+OpenMP), Jul 13 2018, 17:17:44
Command: cd-hit -i Extracellular.fa -

Let's compare the size of the datasets before and after performing the cd-hit operation

In [None]:
! grep '>' Cytosol.fa | wc -l
! grep '>' Cytosol_cd-hit.txt | wc -l

342
342


In [None]:
! grep '>' Extracellular.fa | wc -l
! grep '>' Extracellular_cd-hit.txt | wc -l

159
159


In [None]:
! grep '>' Mitochondrion.fa | wc -l
! grep '>' Mitochondrion_cd-hit.txt | wc -l

236
236


In [None]:
! grep '>' Nucleus.fa | wc -l
! grep '>' Nucleus_cd-hit.txt | wc -l

1478
739
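
The same before/after comparison can be done in Python. This is a minimal sketch with hypothetical helper names; unlike `grep '>'`, it only counts lines that start with `>`, which is where FASTA headers are expected.

```python
def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))

def redundancy_summary(before_path, after_path):
    """Return (records before, records after, records removed)."""
    before = count_fasta_records(before_path)
    after = count_fasta_records(after_path)
    return before, after, before - after
```

Given the counts printed above, `redundancy_summary('Nucleus.fa', 'Nucleus_cd-hit.txt')` should report 739 removed sequences for the nucleus set and 0 for the other three sets.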


# Calculate features using the Pfeature Library

Pfeature provides several classes of features; the manual lists them all.

[Pfeature Manual](https://github.com/raghavagps/Pfeature/blob/master/Pfeature_Manual.pdf)

## Define functions for computing the features

In [None]:
import pandas as pd

In [None]:
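# Amino acid composition (AAC)

import os
from Pfeature.pfeature import aac_wp

def aac(fasta_path):
  # Name the output after the input, e.g. 'Cytosol_cd-hit.aac.csv'
  # (splitext removes the extension; rstrip('txt') strips characters, not a suffix)
  output = os.path.splitext(fasta_path)[0] + '.aac.csv'
  aac_wp(fasta_path, output)  # writes the descriptors to the CSV file
  return pd.read_csv(output)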

In [None]:
#aac('Cytosol_cd-hit.txt')
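
To show what an amino acid composition descriptor looks like, here is a plain-Python sketch. It is not Pfeature's code: it assumes that AAC is the percentage of each of the 20 standard residues in a non-empty sequence, rounded to two decimals, and the function name `aac_percent` is hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac_percent(seq):
    """Percentage of each standard amino acid in seq, keyed like the AAC_* columns."""
    seq = seq.upper()
    return {
        f"AAC_{aa}": round(100 * seq.count(aa) / len(seq), 2)
        for aa in AMINO_ACIDS
    }

# Toy sequence; real inputs would be full protein sequences
print(aac_percent("MKVASSSAAA"))
```

Each sequence yields 20 values that sum to roughly 100, one per residue type.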

In [None]:
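# Dipeptide composition (DPC)

import os
from Pfeature.pfeature import dpc_wp

def dpc(fasta_path):
  # Name the output after the input, e.g. 'Cytosol_cd-hit.dpc.csv'
  # (splitext removes the extension; rstrip('txt') strips characters, not a suffix)
  output = os.path.splitext(fasta_path)[0] + '.dpc.csv'
  dpc_wp(fasta_path, output, 1)  # writes the descriptors to the CSV file
  return pd.read_csv(output)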

In [None]:
#dpc('Cytosol_cd-hit.txt')
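
The dipeptide composition can be sketched in the same way. This is an illustration rather than Pfeature's implementation: it assumes the descriptor is the percentage of each ordered residue pair among all pairs separated by `lag` positions, and that the `1` passed to `dpc_wp` corresponds to this lag (both are assumptions). The function name `dpc_percent` is hypothetical, and the sequence is assumed to be longer than `lag`.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def dpc_percent(seq, lag=1):
    """Percentage of each ordered residue pair separated by `lag` positions."""
    seq = seq.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}  # 400 pairs
    total = len(seq) - lag  # number of pairs in the sequence
    for i in range(total):
        pair = seq[i] + seq[i + lag]
        if pair in counts:  # ignore pairs with non-standard residues
            counts[pair] += 1
    return {f"DPC{lag}_{pair}": round(100 * n / total, 2) for pair, n in counts.items()}

print(len(dpc_percent("MKVASSSAAA")))
```

The 20 x 20 = 400 ordered pairs explain why this descriptor set is so much wider than AAC.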

In [None]:
# Tripeptide composition

# Skipped: tripeptide composition yields 8000 descriptors (20^3) and takes too long to compute here

In [None]:
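# Pseudo amino acid composition (PAAC)

import os
from Pfeature.pfeature import paac_wp

def paac(fasta_path):
  # Name the output after the input, e.g. 'Cytosol_cd-hit.paac.csv'
  # (splitext removes the extension; rstrip('txt') strips characters, not a suffix)
  output = os.path.splitext(fasta_path)[0] + '.paac.csv'
  paac_wp(fasta_path, output, 30, 0.05)  # default parameters: lambda=30, weight=0.05
  return pd.read_csv(output)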

In [None]:
#paac('Mitochondrion_cd-hit.txt')

In [None]:
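# Physicochemical properties (PCP)

import os
from Pfeature.pfeature import pcp_wp

def pcp(fasta_path):
  # Name the output after the input, e.g. 'Cytosol_cd-hit.pcp.csv'
  # (splitext removes the extension; rstrip('txt') strips characters, not a suffix)
  output = os.path.splitext(fasta_path)[0] + '.pcp.csv'
  pcp_wp(fasta_path, output)  # writes the descriptors to the CSV file
  return pd.read_csv(output)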

In [None]:
#pcp('Mitochondrion_cd-hit.txt')
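
A physicochemical property descriptor can be read as the fraction of residues that belong to a property group. The sketch below assumes that the positively charged group is K, R, H and the negatively charged group is D, E; these groupings are assumptions, not taken from Pfeature's source, and `group_fraction` is a hypothetical name. They are at least consistent with the feature tables shown later in this notebook: in the first cytosol row, K + R + H add up to 7.38 + 5.00 + 1.67 = 14.05 % in the AAC columns, matching `PCP_PC` = 0.140, and D + E add up to 13.45 %, matching `PCP_NC` = 0.135.

```python
POSITIVE = set("KRH")  # assumed positively charged group (PCP_PC)
NEGATIVE = set("DE")   # assumed negatively charged group (PCP_NC)

def group_fraction(seq, group):
    """Fraction of residues in seq that belong to the given residue group."""
    seq = seq.upper()
    return round(sum(res in group for res in seq) / len(seq), 3)

toy = "KRHDEAAAAA"  # toy sequence: 3 positive, 2 negative, 5 neutral residues
print(group_fraction(toy, POSITIVE), group_fraction(toy, NEGATIVE))
```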

In [None]:
pip install isoelectric

Collecting isoelectric
  Downloading isoelectric-1.0-py3-none-any.whl (9.4 kB)
Installing collected packages: isoelectric
Successfully installed isoelectric-1.0


In [None]:
# Let's also generate the isoelectric points of these protein sequences as a feature
#from isoelectric import *
#ipc.scale.keys()
#help(ipc)

### Define a function that returns a single dataframe with all the descriptors

In [None]:
cyt = 'Cytosol_cd-hit.txt'
mit = 'Mitochondrion_cd-hit.txt'
nuc = 'Nucleus_cd-hit.txt'
ext = 'Extracellular_cd-hit.txt'

def feature_calc(cyto, mito, nucl, extr, feat1):
  # Compute the chosen descriptor (feat1) for each compartment's FASTA file
  # and tag every row with its class label
  labelled = []
  for path, label in [(cyto, 'cytosol'), (mito, 'mitochondrion'),
                      (nucl, 'nucleus'), (extr, 'extracellular')]:
    feat = feat1(path)
    feat['class'] = label
    labelled.append(feat)
  # Stack the four labelled frames row-wise (each keeps its own 0..n-1 index)
  return pd.concat(labelled, axis=0)

In [None]:
aac_feature = feature_calc(cyt,mit,nuc,ext,aac)
aac_feature

Unnamed: 0,AAC_A,AAC_C,AAC_D,AAC_E,AAC_F,AAC_G,AAC_H,AAC_I,AAC_K,AAC_L,AAC_M,AAC_N,AAC_P,AAC_Q,AAC_R,AAC_S,AAC_T,AAC_V,AAC_W,AAC_Y,class
0,6.31,3.10,6.43,7.02,3.81,7.86,1.67,6.19,7.38,8.10,3.10,4.05,4.64,2.62,5.00,5.60,5.24,8.21,1.07,2.62,cytosol
1,8.57,0.82,5.71,11.84,2.45,4.08,0.82,4.08,8.16,9.80,2.86,5.71,2.04,6.53,4.08,7.76,4.90,4.49,0.82,4.49,cytosol
2,8.50,3.27,7.19,6.54,4.58,4.58,5.88,1.96,9.80,13.07,1.31,3.92,3.92,2.61,2.61,5.88,4.58,4.58,2.61,2.61,cytosol
3,9.67,1.51,3.93,6.34,3.32,11.18,3.02,6.34,6.95,8.46,1.81,2.72,4.23,3.32,3.93,6.95,3.93,9.37,0.30,2.72,cytosol
4,3.69,1.38,5.99,6.91,6.91,5.07,1.38,5.99,8.29,12.90,4.15,5.53,5.53,3.23,5.07,5.07,3.69,1.84,1.84,5.53,cytosol
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154,6.53,4.31,6.69,6.05,2.58,8.38,2.22,3.99,3.71,6.01,0.44,4.80,8.79,5.16,6.01,6.37,6.77,7.26,1.13,2.82,extracellular
155,5.50,0.23,5.05,5.96,5.50,4.36,3.44,5.50,8.94,10.55,3.44,6.88,3.21,5.73,2.29,8.03,5.73,6.19,0.46,2.98,extracellular
156,7.09,6.56,5.25,4.99,1.84,5.77,2.62,3.41,2.89,15.22,3.15,1.57,8.66,8.40,4.20,5.25,4.20,5.77,2.10,1.05,extracellular
157,6.44,7.20,7.20,12.88,2.65,4.55,3.03,4.17,12.12,9.85,2.27,3.79,4.55,1.89,3.03,4.55,2.27,4.17,1.89,1.52,extracellular


In [None]:
dpc_feature = feature_calc(cyt,mit,nuc,ext,dpc)
dpc_feature

Unnamed: 0,DPC1_AA,DPC1_AC,DPC1_AD,DPC1_AE,DPC1_AF,DPC1_AG,DPC1_AH,DPC1_AI,DPC1_AK,DPC1_AL,DPC1_AM,DPC1_AN,DPC1_AP,DPC1_AQ,DPC1_AR,DPC1_AS,DPC1_AT,DPC1_AV,DPC1_AW,DPC1_AY,DPC1_CA,DPC1_CC,DPC1_CD,DPC1_CE,DPC1_CF,DPC1_CG,DPC1_CH,DPC1_CI,DPC1_CK,DPC1_CL,DPC1_CM,DPC1_CN,DPC1_CP,DPC1_CQ,DPC1_CR,DPC1_CS,DPC1_CT,DPC1_CV,DPC1_CW,DPC1_CY,...,DPC1_WC,DPC1_WD,DPC1_WE,DPC1_WF,DPC1_WG,DPC1_WH,DPC1_WI,DPC1_WK,DPC1_WL,DPC1_WM,DPC1_WN,DPC1_WP,DPC1_WQ,DPC1_WR,DPC1_WS,DPC1_WT,DPC1_WV,DPC1_WW,DPC1_WY,DPC1_YA,DPC1_YC,DPC1_YD,DPC1_YE,DPC1_YF,DPC1_YG,DPC1_YH,DPC1_YI,DPC1_YK,DPC1_YL,DPC1_YM,DPC1_YN,DPC1_YP,DPC1_YQ,DPC1_YR,DPC1_YS,DPC1_YT,DPC1_YV,DPC1_YW,DPC1_YY,class
0,0.36,0.12,0.60,0.24,0.60,0.72,0.24,0.48,0.72,0.36,0.24,0.36,0.12,0.24,0.48,0.00,0.12,0.12,0.00,0.24,0.12,0.12,0.12,0.12,0.36,0.36,0.00,0.12,0.00,0.36,0.00,0.12,0.24,0.00,0.12,0.12,0.00,0.72,0.00,0.12,...,0.12,0.24,0.00,0.00,0.12,0.00,0.00,0.12,0.12,0.0,0.00,0.00,0.12,0.00,0.12,0.00,0.00,0.0,0.00,0.12,0.12,0.12,0.24,0.00,0.12,0.0,0.00,0.12,0.48,0.00,0.00,0.00,0.12,0.12,0.00,0.60,0.24,0.12,0.12,cytosol
1,0.82,0.41,0.00,1.64,0.82,0.41,0.00,0.41,0.82,0.41,0.41,0.00,0.00,0.00,0.41,0.41,0.41,0.41,0.00,0.82,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.41,0.00,0.00,0.00,0.41,0.00,0.00,0.00,0.00,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.41,0.00,0.41,0.00,0.0,0.00,0.00,0.00,0.41,0.41,0.41,0.00,0.0,0.00,0.82,1.23,0.00,0.00,0.00,0.41,0.41,0.00,0.00,0.00,0.00,0.41,cytosol
2,0.00,1.32,0.66,0.00,0.00,0.00,0.66,0.00,0.00,1.32,0.00,0.66,0.00,0.00,0.66,0.66,1.32,0.00,0.66,0.66,0.00,0.00,0.00,0.66,0.00,0.66,0.66,0.00,0.00,0.66,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.66,0.00,0.00,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,1.32,0.0,0.00,0.66,0.00,0.00,0.66,0.00,0.00,0.0,0.00,1.32,0.00,0.00,0.00,0.66,0.00,0.0,0.00,0.00,0.66,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,cytosol
3,0.91,0.61,0.30,0.30,0.30,1.52,0.91,0.30,0.30,1.21,0.00,0.61,0.00,0.30,0.91,0.30,0.61,0.00,0.00,0.30,0.00,0.00,0.00,0.00,0.00,0.30,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.30,0.61,0.00,0.00,0.00,0.00,0.30,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.30,0.0,0.00,0.30,0.00,0.00,0.00,0.30,0.00,0.0,0.61,0.00,0.00,0.00,0.00,0.61,0.00,0.00,0.61,0.30,0.00,0.00,0.00,cytosol
4,0.00,0.00,0.00,0.46,0.93,0.46,0.00,0.46,0.00,0.00,0.00,0.00,0.46,0.00,0.46,0.00,0.00,0.00,0.00,0.46,0.00,0.00,0.00,0.00,0.00,0.46,0.00,0.00,0.00,0.46,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.46,...,0.00,0.00,0.00,0.46,0.00,0.00,0.00,0.00,0.46,0.0,0.93,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.93,0.46,0.00,0.00,0.0,0.46,0.00,0.46,0.46,0.46,0.00,0.00,0.46,0.46,0.46,0.46,0.46,0.00,cytosol
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154,0.48,0.08,0.36,0.32,0.20,0.60,0.20,0.16,0.16,0.32,0.00,0.16,0.65,0.36,0.65,0.44,0.77,0.48,0.04,0.08,0.16,0.00,0.32,0.52,0.12,0.12,0.12,0.20,0.40,0.20,0.04,0.28,0.28,0.20,0.16,0.20,0.36,0.36,0.16,0.08,...,0.00,0.00,0.08,0.12,0.08,0.04,0.04,0.04,0.08,0.0,0.04,0.00,0.04,0.08,0.08,0.16,0.16,0.0,0.00,0.16,0.04,0.08,0.20,0.00,0.40,0.0,0.24,0.04,0.24,0.00,0.08,0.08,0.08,0.32,0.16,0.28,0.28,0.08,0.04,extracellular
155,0.23,0.00,0.23,0.00,0.46,0.23,0.23,0.00,0.69,0.92,0.23,0.23,0.00,0.46,0.00,0.69,0.46,0.46,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.23,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,...,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.23,0.00,0.00,0.00,0.23,0.00,0.00,0.0,0.00,0.46,0.23,0.46,0.00,0.00,0.46,0.23,0.23,0.00,0.46,0.00,0.23,extracellular
156,0.26,0.53,0.26,0.53,0.00,0.79,0.26,0.53,0.26,1.58,0.53,0.00,0.00,0.53,0.26,0.00,0.00,0.53,0.26,0.00,0.26,0.00,0.00,0.26,0.00,0.53,0.26,0.26,0.53,0.53,0.53,0.53,0.00,1.05,1.05,0.26,0.00,0.26,0.26,0.00,...,0.26,0.26,0.00,0.00,0.26,0.00,0.00,0.00,1.05,0.0,0.00,0.00,0.00,0.00,0.00,0.26,0.00,0.0,0.00,0.00,0.26,0.00,0.00,0.53,0.00,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.26,0.00,0.00,0.00,0.00,extracellular
157,0.76,0.38,1.14,0.38,0.38,0.00,0.00,0.00,1.14,0.38,0.00,0.76,0.38,0.38,0.00,0.38,0.00,0.00,0.00,0.00,0.38,0.00,0.76,1.14,0.00,0.38,0.00,0.76,0.76,1.14,0.00,0.00,0.38,0.00,0.00,0.38,0.38,0.38,0.00,0.00,...,0.00,0.00,0.38,0.00,0.76,0.00,0.00,0.38,0.38,0.0,0.00,0.00,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.38,0.00,0.00,0.00,0.00,0.00,0.0,0.00,0.00,0.38,0.00,0.00,0.38,0.00,0.38,0.00,0.00,0.00,0.00,0.00,extracellular


In [None]:
paac_feature = feature_calc(cyt,mit,nuc,ext,paac)
paac_feature

Unnamed: 0,PAAC30_A,PAAC30_C,PAAC30_D,PAAC30_E,PAAC30_F,PAAC30_G,PAAC30_H,PAAC30_I,PAAC30_K,PAAC30_L,PAAC30_M,PAAC30_N,PAAC30_P,PAAC30_Q,PAAC30_R,PAAC30_S,PAAC30_T,PAAC30_V,PAAC30_W,PAAC30_Y,PAAC30_lam1,PAAC30_lam2,PAAC30_lam3,PAAC30_lam4,PAAC30_lam5,PAAC30_lam6,PAAC30_lam7,PAAC30_lam8,PAAC30_lam9,PAAC30_lam10,PAAC30_lam11,PAAC30_lam12,PAAC30_lam13,PAAC30_lam14,PAAC30_lam15,PAAC30_lam16,PAAC30_lam17,PAAC30_lam18,PAAC30_lam19,PAAC30_lam20,PAAC30_lam21,PAAC30_lam22,PAAC30_lam23,PAAC30_lam24,PAAC30_lam25,PAAC30_lam26,PAAC30_lam27,PAAC30_lam28,PAAC30_lam29,PAAC30_lam30,class
0,6.31,3.10,6.43,7.02,3.81,7.86,1.67,6.19,7.38,8.10,3.10,4.05,4.64,2.62,5.00,5.60,5.24,8.21,1.07,2.62,0.0247,0.0254,0.0250,0.0240,0.0266,0.0252,0.0239,0.0253,0.0257,0.0239,0.0236,0.0252,0.0252,0.0253,0.0257,0.0250,0.0252,0.0246,0.0241,0.0248,0.0246,0.0244,0.0251,0.0247,0.0259,0.0248,0.0257,0.0237,0.0261,0.0253,cytosol
1,8.57,0.82,5.71,11.84,2.45,4.08,0.82,4.08,8.16,9.80,2.86,5.71,2.04,6.53,4.08,7.76,4.90,4.49,0.82,4.49,0.0238,0.0261,0.0243,0.0235,0.0265,0.0258,0.0220,0.0254,0.0252,0.0240,0.0228,0.0251,0.0252,0.0246,0.0246,0.0253,0.0250,0.0252,0.0240,0.0259,0.0236,0.0252,0.0246,0.0243,0.0247,0.0246,0.0269,0.0230,0.0226,0.0262,cytosol
2,8.50,3.27,7.19,6.54,4.58,4.58,5.88,1.96,9.80,13.07,1.31,3.92,3.92,2.61,2.61,5.88,4.58,4.58,2.61,2.61,0.0246,0.0272,0.0237,0.0208,0.0267,0.0245,0.0234,0.0233,0.0266,0.0255,0.0246,0.0260,0.0257,0.0269,0.0244,0.0250,0.0252,0.0240,0.0234,0.0241,0.0249,0.0255,0.0248,0.0245,0.0263,0.0241,0.0234,0.0253,0.0257,0.0281,cytosol
3,9.67,1.51,3.93,6.34,3.32,11.18,3.02,6.34,6.95,8.46,1.81,2.72,4.23,3.32,3.93,6.95,3.93,9.37,0.30,2.72,0.0249,0.0265,0.0251,0.0251,0.0256,0.0258,0.0245,0.0250,0.0242,0.0251,0.0243,0.0260,0.0254,0.0223,0.0250,0.0243,0.0249,0.0250,0.0249,0.0231,0.0234,0.0252,0.0238,0.0264,0.0238,0.0244,0.0238,0.0232,0.0235,0.0227,cytosol
4,3.69,1.38,5.99,6.91,6.91,5.07,1.38,5.99,8.29,12.90,4.15,5.53,5.53,3.23,5.07,5.07,3.69,1.84,1.84,5.53,0.0272,0.0275,0.0256,0.0268,0.0249,0.0246,0.0238,0.0278,0.0259,0.0227,0.0234,0.0268,0.0230,0.0244,0.0258,0.0247,0.0255,0.0237,0.0250,0.0251,0.0257,0.0272,0.0256,0.0261,0.0250,0.0250,0.0255,0.0234,0.0269,0.0252,cytosol
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154,6.53,4.31,6.69,6.05,2.58,8.38,2.22,3.99,3.71,6.01,0.44,4.80,8.79,5.16,6.01,6.37,6.77,7.26,1.13,2.82,0.0256,0.0237,0.0250,0.0248,0.0256,0.0251,0.0242,0.0239,0.0249,0.0248,0.0238,0.0236,0.0242,0.0235,0.0244,0.0247,0.0243,0.0238,0.0241,0.0244,0.0243,0.0241,0.0243,0.0238,0.0245,0.0252,0.0243,0.0243,0.0237,0.0237,extracellular
155,5.50,0.23,5.05,5.96,5.50,4.36,3.44,5.50,8.94,10.55,3.44,6.88,3.21,5.73,2.29,8.03,5.73,6.19,0.46,2.98,0.0227,0.0247,0.0234,0.0217,0.0237,0.0246,0.0247,0.0233,0.0243,0.0238,0.0232,0.0236,0.0235,0.0235,0.0248,0.0240,0.0256,0.0239,0.0237,0.0245,0.0260,0.0243,0.0227,0.0240,0.0247,0.0240,0.0230,0.0239,0.0260,0.0234,extracellular
156,7.09,6.56,5.25,4.99,1.84,5.77,2.62,3.41,2.89,15.22,3.15,1.57,8.66,8.40,4.20,5.25,4.20,5.77,2.10,1.05,0.0225,0.0260,0.0237,0.0203,0.0227,0.0264,0.0233,0.0217,0.0232,0.0264,0.0261,0.0216,0.0220,0.0246,0.0262,0.0223,0.0225,0.0244,0.0225,0.0229,0.0241,0.0252,0.0239,0.0232,0.0257,0.0248,0.0249,0.0229,0.0243,0.0250,extracellular
157,6.44,7.20,7.20,12.88,2.65,4.55,3.03,4.17,12.12,9.85,2.27,3.79,4.55,1.89,3.03,4.55,2.27,4.17,1.89,1.52,0.0242,0.0255,0.0233,0.0227,0.0257,0.0239,0.0236,0.0247,0.0261,0.0279,0.0249,0.0256,0.0259,0.0252,0.0250,0.0260,0.0231,0.0254,0.0230,0.0238,0.0231,0.0220,0.0246,0.0261,0.0242,0.0231,0.0249,0.0240,0.0259,0.0261,extracellular


In [None]:
pcp_feature = feature_calc(cyt,mit,nuc,ext,pcp)
pcp_feature

Unnamed: 0,PCP_PC,PCP_NC,PCP_NE,PCP_PO,PCP_NP,PCP_AL,PCP_CY,PCP_AR,PCP_AC,PCP_BS,PCP_NE_pH,PCP_HB,PCP_HL,PCP_NT,PCP_HX,PCP_SC,PCP_SS_HE,PCP_SS_ST,PCP_SS_CO,PCP_SA_BU,PCP_SA_EX,PCP_SA_IN,PCP_TN,PCP_SM,PCP_LR,PCP_Z1,PCP_Z2,PCP_Z3,PCP_Z4,PCP_Z5,class
0,0.140,0.135,0.725,0.192,0.493,0.413,0.046,0.075,0.135,0.140,0.725,0.498,0.227,0.348,0.108,0.062,0.412,0.302,0.286,0.446,0.321,0.229,0.229,0.514,0.486,0.166,-0.555,-0.242,-0.430,0.051,cytosol
1,0.131,0.176,0.694,0.245,0.392,0.331,0.020,0.078,0.176,0.131,0.694,0.408,0.208,0.408,0.127,0.037,0.527,0.220,0.253,0.351,0.400,0.229,0.212,0.441,0.559,0.505,-0.296,-0.334,-0.672,0.101,cytosol
2,0.183,0.137,0.680,0.190,0.451,0.366,0.039,0.098,0.137,0.183,0.680,0.484,0.261,0.314,0.105,0.046,0.503,0.242,0.255,0.431,0.340,0.242,0.222,0.464,0.536,0.203,-0.234,-0.126,-0.201,0.122,cytosol
3,0.139,0.103,0.758,0.184,0.547,0.492,0.042,0.063,0.103,0.139,0.758,0.489,0.208,0.356,0.109,0.033,0.435,0.275,0.290,0.502,0.275,0.227,0.293,0.535,0.465,0.132,-0.848,-0.307,-0.422,0.170,cytosol
4,0.147,0.129,0.724,0.189,0.479,0.350,0.055,0.143,0.129,0.147,0.724,0.479,0.258,0.300,0.088,0.055,0.456,0.272,0.272,0.396,0.355,0.253,0.152,0.378,0.622,-0.201,-0.015,-0.247,-0.291,0.030,cytosol
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154,0.119,0.127,0.753,0.254,0.451,0.410,0.088,0.065,0.127,0.119,0.753,0.478,0.255,0.394,0.131,0.048,0.361,0.289,0.350,0.402,0.352,0.274,0.256,0.599,0.401,0.480,-0.524,-0.031,-0.483,0.105,extracellular
155,0.147,0.110,0.743,0.227,0.447,0.353,0.032,0.089,0.110,0.147,0.743,0.463,0.248,0.349,0.138,0.037,0.459,0.266,0.275,0.383,0.358,0.268,0.181,0.452,0.548,0.101,-0.310,-0.326,-0.426,0.104,extracellular
156,0.097,0.102,0.801,0.255,0.530,0.459,0.087,0.050,0.102,0.097,0.801,0.580,0.199,0.339,0.094,0.097,0.486,0.249,0.265,0.478,0.318,0.249,0.247,0.501,0.499,-0.178,-0.594,-0.064,-0.345,0.194,extracellular
157,0.182,0.201,0.617,0.174,0.405,0.337,0.045,0.061,0.201,0.182,0.617,0.455,0.265,0.333,0.068,0.095,0.515,0.239,0.246,0.409,0.424,0.182,0.227,0.447,0.553,0.525,-0.278,-0.056,-0.393,-0.002,extracellular


In [None]:
# Concatenate the 4 dataframes column-wise into 1, keeping a single 'class'
# column (the one in pcp_feature)
aac_feature = aac_feature.drop(columns='class')
dpc_feature = dpc_feature.drop(columns='class')
paac_feature = paac_feature.drop(columns='class')
frames = [aac_feature, dpc_feature, paac_feature, pcp_feature]
df = pd.concat(frames, axis=1)

In [None]:
df

Unnamed: 0,AAC_A,AAC_C,AAC_D,AAC_E,AAC_F,AAC_G,AAC_H,AAC_I,AAC_K,AAC_L,AAC_M,AAC_N,AAC_P,AAC_Q,AAC_R,AAC_S,AAC_T,AAC_V,AAC_W,AAC_Y,DPC1_AA,DPC1_AC,DPC1_AD,DPC1_AE,DPC1_AF,DPC1_AG,DPC1_AH,DPC1_AI,DPC1_AK,DPC1_AL,DPC1_AM,DPC1_AN,DPC1_AP,DPC1_AQ,DPC1_AR,DPC1_AS,DPC1_AT,DPC1_AV,DPC1_AW,DPC1_AY,...,PAAC30_lam22,PAAC30_lam23,PAAC30_lam24,PAAC30_lam25,PAAC30_lam26,PAAC30_lam27,PAAC30_lam28,PAAC30_lam29,PAAC30_lam30,PCP_PC,PCP_NC,PCP_NE,PCP_PO,PCP_NP,PCP_AL,PCP_CY,PCP_AR,PCP_AC,PCP_BS,PCP_NE_pH,PCP_HB,PCP_HL,PCP_NT,PCP_HX,PCP_SC,PCP_SS_HE,PCP_SS_ST,PCP_SS_CO,PCP_SA_BU,PCP_SA_EX,PCP_SA_IN,PCP_TN,PCP_SM,PCP_LR,PCP_Z1,PCP_Z2,PCP_Z3,PCP_Z4,PCP_Z5,class
0,6.31,3.10,6.43,7.02,3.81,7.86,1.67,6.19,7.38,8.10,3.10,4.05,4.64,2.62,5.00,5.60,5.24,8.21,1.07,2.62,0.36,0.12,0.60,0.24,0.60,0.72,0.24,0.48,0.72,0.36,0.24,0.36,0.12,0.24,0.48,0.00,0.12,0.12,0.00,0.24,...,0.0244,0.0251,0.0247,0.0259,0.0248,0.0257,0.0237,0.0261,0.0253,0.140,0.135,0.725,0.192,0.493,0.413,0.046,0.075,0.135,0.140,0.725,0.498,0.227,0.348,0.108,0.062,0.412,0.302,0.286,0.446,0.321,0.229,0.229,0.514,0.486,0.166,-0.555,-0.242,-0.430,0.051,cytosol
1,8.57,0.82,5.71,11.84,2.45,4.08,0.82,4.08,8.16,9.80,2.86,5.71,2.04,6.53,4.08,7.76,4.90,4.49,0.82,4.49,0.82,0.41,0.00,1.64,0.82,0.41,0.00,0.41,0.82,0.41,0.41,0.00,0.00,0.00,0.41,0.41,0.41,0.41,0.00,0.82,...,0.0252,0.0246,0.0243,0.0247,0.0246,0.0269,0.0230,0.0226,0.0262,0.131,0.176,0.694,0.245,0.392,0.331,0.020,0.078,0.176,0.131,0.694,0.408,0.208,0.408,0.127,0.037,0.527,0.220,0.253,0.351,0.400,0.229,0.212,0.441,0.559,0.505,-0.296,-0.334,-0.672,0.101,cytosol
2,8.50,3.27,7.19,6.54,4.58,4.58,5.88,1.96,9.80,13.07,1.31,3.92,3.92,2.61,2.61,5.88,4.58,4.58,2.61,2.61,0.00,1.32,0.66,0.00,0.00,0.00,0.66,0.00,0.00,1.32,0.00,0.66,0.00,0.00,0.66,0.66,1.32,0.00,0.66,0.66,...,0.0255,0.0248,0.0245,0.0263,0.0241,0.0234,0.0253,0.0257,0.0281,0.183,0.137,0.680,0.190,0.451,0.366,0.039,0.098,0.137,0.183,0.680,0.484,0.261,0.314,0.105,0.046,0.503,0.242,0.255,0.431,0.340,0.242,0.222,0.464,0.536,0.203,-0.234,-0.126,-0.201,0.122,cytosol
3,9.67,1.51,3.93,6.34,3.32,11.18,3.02,6.34,6.95,8.46,1.81,2.72,4.23,3.32,3.93,6.95,3.93,9.37,0.30,2.72,0.91,0.61,0.30,0.30,0.30,1.52,0.91,0.30,0.30,1.21,0.00,0.61,0.00,0.30,0.91,0.30,0.61,0.00,0.00,0.30,...,0.0252,0.0238,0.0264,0.0238,0.0244,0.0238,0.0232,0.0235,0.0227,0.139,0.103,0.758,0.184,0.547,0.492,0.042,0.063,0.103,0.139,0.758,0.489,0.208,0.356,0.109,0.033,0.435,0.275,0.290,0.502,0.275,0.227,0.293,0.535,0.465,0.132,-0.848,-0.307,-0.422,0.170,cytosol
4,3.69,1.38,5.99,6.91,6.91,5.07,1.38,5.99,8.29,12.90,4.15,5.53,5.53,3.23,5.07,5.07,3.69,1.84,1.84,5.53,0.00,0.00,0.00,0.46,0.93,0.46,0.00,0.46,0.00,0.00,0.00,0.00,0.46,0.00,0.46,0.00,0.00,0.00,0.00,0.46,...,0.0272,0.0256,0.0261,0.0250,0.0250,0.0255,0.0234,0.0269,0.0252,0.147,0.129,0.724,0.189,0.479,0.350,0.055,0.143,0.129,0.147,0.724,0.479,0.258,0.300,0.088,0.055,0.456,0.272,0.272,0.396,0.355,0.253,0.152,0.378,0.622,-0.201,-0.015,-0.247,-0.291,0.030,cytosol
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
154,6.53,4.31,6.69,6.05,2.58,8.38,2.22,3.99,3.71,6.01,0.44,4.80,8.79,5.16,6.01,6.37,6.77,7.26,1.13,2.82,0.48,0.08,0.36,0.32,0.20,0.60,0.20,0.16,0.16,0.32,0.00,0.16,0.65,0.36,0.65,0.44,0.77,0.48,0.04,0.08,...,0.0241,0.0243,0.0238,0.0245,0.0252,0.0243,0.0243,0.0237,0.0237,0.119,0.127,0.753,0.254,0.451,0.410,0.088,0.065,0.127,0.119,0.753,0.478,0.255,0.394,0.131,0.048,0.361,0.289,0.350,0.402,0.352,0.274,0.256,0.599,0.401,0.480,-0.524,-0.031,-0.483,0.105,extracellular
155,5.50,0.23,5.05,5.96,5.50,4.36,3.44,5.50,8.94,10.55,3.44,6.88,3.21,5.73,2.29,8.03,5.73,6.19,0.46,2.98,0.23,0.00,0.23,0.00,0.46,0.23,0.23,0.00,0.69,0.92,0.23,0.23,0.00,0.46,0.00,0.69,0.46,0.46,0.00,0.00,...,0.0243,0.0227,0.0240,0.0247,0.0240,0.0230,0.0239,0.0260,0.0234,0.147,0.110,0.743,0.227,0.447,0.353,0.032,0.089,0.110,0.147,0.743,0.463,0.248,0.349,0.138,0.037,0.459,0.266,0.275,0.383,0.358,0.268,0.181,0.452,0.548,0.101,-0.310,-0.326,-0.426,0.104,extracellular
156,7.09,6.56,5.25,4.99,1.84,5.77,2.62,3.41,2.89,15.22,3.15,1.57,8.66,8.40,4.20,5.25,4.20,5.77,2.10,1.05,0.26,0.53,0.26,0.53,0.00,0.79,0.26,0.53,0.26,1.58,0.53,0.00,0.00,0.53,0.26,0.00,0.00,0.53,0.26,0.00,...,0.0252,0.0239,0.0232,0.0257,0.0248,0.0249,0.0229,0.0243,0.0250,0.097,0.102,0.801,0.255,0.530,0.459,0.087,0.050,0.102,0.097,0.801,0.580,0.199,0.339,0.094,0.097,0.486,0.249,0.265,0.478,0.318,0.249,0.247,0.501,0.499,-0.178,-0.594,-0.064,-0.345,0.194,extracellular
157,6.44,7.20,7.20,12.88,2.65,4.55,3.03,4.17,12.12,9.85,2.27,3.79,4.55,1.89,3.03,4.55,2.27,4.17,1.89,1.52,0.76,0.38,1.14,0.38,0.38,0.00,0.00,0.00,1.14,0.38,0.00,0.76,0.38,0.38,0.00,0.38,0.00,0.00,0.00,0.00,...,0.0220,0.0246,0.0261,0.0242,0.0231,0.0249,0.0240,0.0259,0.0261,0.182,0.201,0.617,0.174,0.405,0.337,0.045,0.061,0.201,0.182,0.617,0.455,0.265,0.333,0.068,0.095,0.515,0.239,0.246,0.409,0.424,0.182,0.227,0.447,0.553,0.525,-0.278,-0.056,-0.393,-0.002,extracellular


In [None]:
df.to_csv('aac_dpc_paac_pcp.csv')

In [None]:
#! cp aac_dpc_paac_pcp.csv '/content/gdrive/MyDrive/Colab Notebooks/data'

# Not needed: the working directory is already the Drive data folder

In [None]:
! ls "/content/gdrive/My Drive/Colab Notebooks/data"

Now that we have generated some key features, we can use this dataframe for the machine learning step.
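
Before modelling, the saved table has to be split into features and labels, and the class balance is worth a look, since the nucleus set is much larger than the others. The sketch below uses only the standard library and a three-row toy table copied from the AAC output above. The function name `split_features_labels` is hypothetical, and it assumes the CSV carries an unnamed index column, as `to_csv` writes by default.

```python
import csv
import io
from collections import Counter

def split_features_labels(csv_text, label_column="class"):
    """Return (feature_rows, labels) from CSV text with one label column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(label_column))
        # Skip the unnamed index column; convert the descriptors to floats
        features.append({name: float(value) for name, value in row.items() if name})
    return features, labels

# Toy table: first column is the unnamed index written by to_csv
toy = (
    ",AAC_A,AAC_C,class\n"
    "0,6.31,3.10,cytosol\n"
    "1,8.57,0.82,cytosol\n"
    "154,6.53,4.31,extracellular\n"
)
X, y = split_features_labels(toy)
print(Counter(y))
```

On the real file, a strongly skewed `Counter` would be a reason to consider stratified splitting or class weighting in the modelling step.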