# Ranking annotations from METASPACE datasets

In this workflow we first download METASPACE data, either with the [python client](https://metaspace2020.readthedocs.io/en/latest/) or with
the [metaspace-converter](https://metaspace2020.github.io/metaspace-converter/index.html) package,
and then use `linex2metaspace` to rank the lipid candidate annotations with lipid metabolic networks.

First, we import the required packages:

```python
import metaspace
import metaspace_converter
import linex2metaspace as lx2m
import pandas as pd
```



We will run this example analysis on a high-quality METASPACE dataset with many annotations: [2016-09-21_16h06m56s](https://metaspace2020.eu/annotations?db_id=6&ds=2016-09-21_16h06m56s).

```python
dataset_id = '2016-09-21_16h06m56s'
database = ('HMDB', 'v4')
fdr_cutoff = 0.1
```
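The `fdr` argument restricts the download to annotations at or below the chosen false discovery rate; the tables below contain FDR levels of 0.05 and 0.10 for a cutoff of 0.1. A minimal sketch of that filtering on plain Python records, assuming a row is kept when its reported FDR is less than or equal to the cutoff (the records, including the third ion, are hypothetical):

```python
# Minimal sketch of an FDR cutoff, assuming rows are kept when their reported
# FDR level is <= the cutoff. The records are hypothetical stand-ins for rows
# of the METASPACE annotation table.
fdr_cutoff = 0.1

records = [
    {"ion": "C44H86NO8P+H+", "fdr": 0.05},
    {"ion": "C55H96O6+H+", "fdr": 0.10},
    {"ion": "C20H30O2+H+", "fdr": 0.20},  # hypothetical ion above the cutoff
]

# Keep only annotations whose FDR does not exceed the cutoff.
kept = [r for r in records if r["fdr"] <= fdr_cutoff]
print([r["ion"] for r in kept])  # ['C44H86NO8P+H+', 'C55H96O6+H+']
```

Lowering the cutoff to 0.05 in this sketch keeps only the first record: fewer false annotations, but also fewer ions available for the network.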

## Download through the python client

We can download the annotation table through the python client package:

```python
sm = metaspace.SMInstance()
ds = sm.dataset(id=dataset_id)

# Download the annotations for the selected database and FDR cutoff
annotations = ds.results(database=database, fdr=fdr_cutoff)
```

The resulting table contains the information required for the network-based lipid ranking:
* Ion information (`formula` & `adduct`)
* Names of lipid candidates that are parsed and then used for the ranking (`moleculeNames`)
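To illustrate what "parsing" a candidate name involves, here is a hypothetical helper (not part of `linex2metaspace`, whose parser may differ) that splits a LIPID MAPS-style name into its class and sum composition and returns `None` for non-lipid names such as `Ubiquinol 8`:

```python
import re
from typing import Optional, Tuple

# Hypothetical helper, NOT the linex2metaspace parser: split a name such as
# "PC(14:0/22:1(13Z))" into (class, total carbons, total double bonds).
_NAME = re.compile(r"^([A-Za-z]+)\((.*)\)$")
_CHAIN = re.compile(r"(\d+):(\d+)")


def parse_lipid_name(name: str) -> Optional[Tuple[str, int, int]]:
    """Return (class, carbons, double bonds), or None if the name does not parse."""
    match = _NAME.match(name.strip())
    if match is None:
        return None
    lipid_class, chains = match.group(1), match.group(2)
    # Each "C:DB" pair is one chain or sphingoid base; double-bond positions
    # such as "(13Z)" contain no colon and are therefore ignored.
    pairs = _CHAIN.findall(chains)
    if not pairs:
        return None
    carbons = sum(int(c) for c, _ in pairs)
    double_bonds = sum(int(db) for _, db in pairs)
    return lipid_class, carbons, double_bonds


for name in ["PC(14:0/22:1(13Z))", "SM(d18:0/18:1(11Z))",
             "TG(16:0/16:1(9Z)/20:4(5Z,8Z,11Z,14Z))", "Ubiquinol 8"]:
    print(name, "->", parse_lipid_name(name))
```

Note that `PC(14:0/22:1(13Z))` and `PC(14:1(9Z)/22:0)`, both candidates for the same ion in the table below, collapse to the same sum composition (PC 36:1).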

```python
annotations
```

| formula | adduct | ionFormula | ion | mz | msm | rhoSpatial | rhoSpectral | moc | fdr | offSample | isotopeImages | colocCoeff | moleculeNames | moleculeIds | intensity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C44H86NO8P | +H | C44H87NO8P | C44H86NO8P+H+ | 788.616362 | 0.975502 | 0.985168 | 0.991898 | 0.998277 | 0.05 | False | `[{'mz': 788.6163620428745, 'url': 'https://s3....` | | `[PC(14:0/22:1(13Z)), PC(14:1(9Z)/22:0), PC(16:...` | `[HMDB0007887, HMDB0007919, HMDB0007978, HMDB00...` | 8.213491e+06 |
| C40H80NO8P | +H | C40H81NO8P | C40H80NO8P+H+ | 734.569412 | 0.952178 | 0.958258 | 0.994579 | 0.999071 | 0.05 | False | `[{'mz': 734.5694118508743, 'url': 'https://s3....` | | `[PC(16:0/16:0), PC(14:0/18:0), PC(18:0/14:0), ...` | `[HMDB0000564, HMDB0007871, HMDB0008031, HMDB00...` | 1.067504e+07 |
| C41H83N2O6P | +K | C41H83N2O6PK | C41H83N2O6P+K+ | 769.562014 | 0.947626 | 0.959462 | 0.988501 | 0.999153 | 0.05 | False | `[{'mz': 769.5620135848744, 'url': 'https://s3....` | | `[SM(d18:0/18:1(11Z)), SM(d18:0/18:1(9Z)), stea...` | `[HMDB0012088, HMDB0012089, HMDB0062559]` | 7.441223e+06 |
| C41H83N2O6P | +H | C41H84N2O6P | C41H83N2O6P+H+ | 731.606132 | 0.946311 | 0.964511 | 0.982116 | 0.998996 | 0.05 | False | `[{'mz': 731.6061317168744, 'url': 'https://s3....` | | `[SM(d18:0/18:1(11Z)), SM(d18:0/18:1(9Z)), stea...` | `[HMDB0012088, HMDB0012089, HMDB0062559]` | 2.758408e+06 |
| C26H42O4 | +H | C26H43O4 | C26H42O4+H+ | 419.315567 | 0.937039 | 0.960247 | 0.976837 | 0.998970 | 0.05 | False | `[{'mz': 419.3155666548801, 'url': 'https://s3....` | | `[11'-Carboxy-alpha-chromanol, Hexadecyl ferulate]` | `[HMDB0012515, HMDB0039317]` | 1.019578e+06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| C55H96O6 | +H | C55H97O6 | C55H96O6+H+ | 853.727948 | 0.522666 | 0.574491 | 0.931053 | 0.977160 | 0.10 | False | `[{'mz': 853.7279476228744, 'url': 'https://s3....` | | `[TG(16:0/16:1(9Z)/20:4(5Z,8Z,11Z,14Z)), TG(16:...` | `[HMDB0005380, HMDB0005446, HMDB0010450, HMDB00...` | 1.517550e+05 |
| C24H26O6 | +H | C24H27O6 | C24H26O6+H+ | 411.180195 | 0.517545 | 0.543960 | 0.957293 | 0.993886 | 0.10 | False | `[{'mz': 411.18019538288013, 'url': 'https://s3...` | | `[1-Isomangostin, Dulxanthone B, alpha-Mangosti...` | `[HMDB0029981, HMDB0032729, HMDB0035796, HMDB00...` | 2.976548e+04 |
| C46H88NO10P | +H | C46H89NO10P | C46H88NO10P+H+ | 846.621841 | 0.517033 | 0.638335 | 0.812527 | 0.996856 | 0.10 | False | `[{'mz': 846.6218413468742, 'url': 'https://s3....` | | `[PS(16:0/24:1(15Z)), PS(16:1(9Z)/24:0), PS(18:...` | `[HMDB0112356, HMDB0112370, HMDB0112380, HMDB01...` | 5.447059e+04 |
| C49H78O4 | +H | C49H79O4 | C49H78O4+H+ | 731.597268 | 0.515719 | 0.596372 | 0.866240 | 0.998291 | 0.10 | False | `[{'mz': 731.5972678068741, 'url': 'https://s3....` | | `[Ubiquinol 8]` | `[HMDB0001060]` | 3.436138e+05 |


## Download through the `metaspace-converter` package

Another way to download METASPACE datasets is the [metaspace-converter](https://metaspace2020.github.io/metaspace-converter/index.html) package, which loads them directly into an `AnnData` object containing all the information, including ion images.
These data can also be used to rank lipid annotations.

```python
adata = metaspace_converter.metaspace_to_anndata(dataset_id=dataset_id, database=database, fdr=fdr_cutoff)
adata
```



```
AnnData object with n_obs × n_vars = 13365 × 202
    obs: 'ion_image_pixel_x', 'ion_image_pixel_y', 'ion_image_shape_y', 'ion_image_shape_x'
    var: 'formula', 'adduct', 'ionFormula', 'ion', 'mz', 'msm', 'rhoSpatial', 'rhoSpectral', 'moc', 'fdr', 'offSample', 'isotopeImages', 'colocCoeff', 'moleculeNames', 'moleculeIds', 'intensity'
    uns: 'metaspace'
    obsm: 'spatial'
```
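Judging from the `obs` columns above, each observation appears to be one pixel, stored with its position (`ion_image_pixel_x`, `ion_image_pixel_y`) and the image shape. A minimal sketch of rebuilding a 2D ion image from one variable's per-pixel values, assuming zero-based x (column) and y (row) indices; the toy values are made up for illustration:

```python
from typing import List, Sequence


def to_image(values: Sequence[float], xs: Sequence[int], ys: Sequence[int],
             shape_y: int, shape_x: int) -> List[List[float]]:
    """Place per-pixel values into a shape_y x shape_x grid (rows = y, columns = x)."""
    image = [[0.0] * shape_x for _ in range(shape_y)]
    for value, x, y in zip(values, xs, ys):
        image[y][x] = value
    return image


# Toy 2 x 3 image stored as six observations in row-major order.
xs = [0, 1, 2, 0, 1, 2]
ys = [0, 0, 0, 1, 1, 1]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
image = to_image(values, xs, ys, shape_y=2, shape_x=3)
print(image)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

With the real object, `values` would be one column of `adata.X` and `xs`/`ys` the corresponding `adata.obs` columns; check the converter documentation for the exact coordinate convention before relying on this.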

The `adata.var` table contains the same information as the annotation table downloaded with the python client, only with a different index: a single `formula_adduct` key instead of the (`formula`, `adduct`) pair:

```python
adata.var
```

| formula_adduct | formula | adduct | ionFormula | ion | mz | msm | rhoSpatial | rhoSpectral | moc | fdr | offSample | isotopeImages | colocCoeff | moleculeNames | moleculeIds | intensity |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C44H86NO8P+H | C44H86NO8P | +H | C44H87NO8P | C44H86NO8P+H+ | 788.616362 | 0.975502 | 0.985168 | 0.991898 | 0.998277 | 0.05 | False | `[{"mz": 788.6163620428745, "url": "https://s3....` | | `["PC(14:0/22:1(13Z))", "PC(14:1(9Z)/22:0)", "P...` | `["HMDB0007887", "HMDB0007919", "HMDB0007978", ...` | 8.213491e+06 |
| C40H80NO8P+H | C40H80NO8P | +H | C40H81NO8P | C40H80NO8P+H+ | 734.569412 | 0.952178 | 0.958258 | 0.994579 | 0.999071 | 0.05 | False | `[{"mz": 734.5694118508743, "url": "https://s3....` | | `["PC(16:0/16:0)", "PC(14:0/18:0)", "PC(18:0/14...` | `["HMDB0000564", "HMDB0007871", "HMDB0008031", ...` | 1.067504e+07 |
| C41H83N2O6P+K | C41H83N2O6P | +K | C41H83N2O6PK | C41H83N2O6P+K+ | 769.562014 | 0.947626 | 0.959462 | 0.988501 | 0.999153 | 0.05 | False | `[{"mz": 769.5620135848744, "url": "https://s3....` | | `["SM(d18:0/18:1(11Z))", "SM(d18:0/18:1(9Z))", ...` | `["HMDB0012088", "HMDB0012089", "HMDB0062559"]` | 7.441223e+06 |
| C41H83N2O6P+H | C41H83N2O6P | +H | C41H84N2O6P | C41H83N2O6P+H+ | 731.606132 | 0.946311 | 0.964511 | 0.982116 | 0.998996 | 0.05 | False | `[{"mz": 731.6061317168744, "url": "https://s3....` | | `["SM(d18:0/18:1(11Z))", "SM(d18:0/18:1(9Z))", ...` | `["HMDB0012088", "HMDB0012089", "HMDB0062559"]` | 2.758408e+06 |
| C26H42O4+H | C26H42O4 | +H | C26H43O4 | C26H42O4+H+ | 419.315567 | 0.937039 | 0.960247 | 0.976837 | 0.998970 | 0.05 | False | `[{"mz": 419.3155666548801, "url": "https://s3....` | | `["11'-Carboxy-alpha-chromanol", "Hexadecyl fer...` | `["HMDB0012515", "HMDB0039317"]` | 1.019578e+06 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| C55H96O6+H | C55H96O6 | +H | C55H97O6 | C55H96O6+H+ | 853.727948 | 0.522666 | 0.574491 | 0.931053 | 0.977160 | 0.10 | False | `[{"mz": 853.7279476228744, "url": "https://s3....` | | `["TG(16:0/16:1(9Z)/20:4(5Z,8Z,11Z,14Z))", "TG(...` | `["HMDB0005380", "HMDB0005446", "HMDB0010450", ...` | 1.517550e+05 |
| C24H26O6+H | C24H26O6 | +H | C24H27O6 | C24H26O6+H+ | 411.180195 | 0.517545 | 0.543960 | 0.957293 | 0.993886 | 0.10 | False | `[{"mz": 411.18019538288013, "url": "https://s3...` | | `["1-Isomangostin", "Dulxanthone B", "alpha-Man...` | `["HMDB0029981", "HMDB0032729", "HMDB0035796", ...` | 2.976548e+04 |
| C46H88NO10P+H | C46H88NO10P | +H | C46H89NO10P | C46H88NO10P+H+ | 846.621841 | 0.517033 | 0.638335 | 0.812527 | 0.996856 | 0.10 | False | `[{"mz": 846.6218413468742, "url": "https://s3....` | | `["PS(16:0/24:1(15Z))", "PS(16:1(9Z)/24:0)", "P...` | `["HMDB0112356", "HMDB0112370", "HMDB0112380", ...` | 5.447059e+04 |
| C49H78O4+H | C49H78O4 | +H | C49H79O4 | C49H78O4+H+ | 731.597268 | 0.515719 | 0.596372 | 0.866240 | 0.998291 | 0.10 | False | `[{"mz": 731.5972678068741, "url": "https://s3....` | | `["Ubiquinol 8"]` | `["HMDB0001060"]` | 3.436138e+05 |
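Two small differences between the tables are worth handling when switching between them: the index (`formula_adduct` versus the `formula`/`adduct` pair), and `moleculeNames`, which in `adata.var` appears to be a JSON-encoded string (double quotes) rather than a Python list (single quotes). A minimal sketch of both conversions, assuming formulas never contain `+`/`-` and adducts always start with a sign; the helper names are hypothetical:

```python
import json
from typing import List, Tuple


def to_var_key(formula: str, adduct: str) -> str:
    """Build an adata.var-style key such as 'C44H86NO8P+H' from formula and adduct."""
    return f"{formula}{adduct}"


def from_var_key(key: str) -> Tuple[str, str]:
    """Split a key back into (formula, adduct) at the first '+' or '-' sign."""
    for i, char in enumerate(key):
        if char in "+-":
            return key[:i], key[i:]
    raise ValueError(f"no adduct sign in {key!r}")


def decode_names(cell) -> List[str]:
    """Return molecule names as a list, decoding a JSON string if necessary."""
    return json.loads(cell) if isinstance(cell, str) else list(cell)


print(to_var_key("C41H83N2O6P", "+K"))   # C41H83N2O6P+K
print(from_var_key("C41H83N2O6P+K"))     # ('C41H83N2O6P', '+K')
print(decode_names('["Ubiquinol 8"]'))   # ['Ubiquinol 8']
```

Because `adata.var` already carries `formula` and `adduct` columns, pandas' `adata.var.set_index(["formula", "adduct"])` is an alternative way to recover the python-client style index.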


## Lipid ranking

We can now use these annotations to run the lipid ranking.

Before the networks can be created, we need to load reference information on lipid classes and the reactions between them:

```python
ref_dict = lx2m.get_lx2_ref_lip_dict()
class_reactions = lx2m.get_organism_combined_class_reactions(ref_dict)
```
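Conceptually, class reactions form a graph whose nodes are lipid classes and whose edges are reactions converting one class into another. The sketch below uses a hand-written toy list of well-known headgroup and acyl conversions purely for illustration; it does not reproduce the content or the data structure returned by `get_organism_combined_class_reactions`:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy, hand-written class reactions as (substrate class, product class).
# NOT the data returned by lx2m.get_organism_combined_class_reactions.
TOY_CLASS_REACTIONS: List[Tuple[str, str]] = [
    ("PS", "PE"),   # phosphatidylserine decarboxylation
    ("PE", "PC"),   # phosphatidylethanolamine methylation
    ("PC", "LPC"),  # phospholipase A2 removes one acyl chain
    ("PA", "DG"),   # phosphatidic acid dephosphorylation
    ("DG", "TG"),   # diacylglycerol acylation
]


def class_neighbours(reactions: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected adjacency map: class -> classes one reaction away."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for substrate, product in reactions:
        graph[substrate].add(product)
        graph[product].add(substrate)
    return dict(graph)


graph = class_neighbours(TOY_CLASS_REACTIONS)
print(sorted(graph["PE"]))  # ['PC', 'PS']
```

Treating the edges as undirected is a simplification for neighbourhood lookups; real reaction data also carries direction.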

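Next, we perform the ranking. The whole pipeline runs through a single function, `make_lipid_networks`, to which we pass the python-client annotation table and the name of the column holding the candidate names (`moleculeNames`):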

```python
(new_annotations, lipid_graph, annotation_graph) = lx2m.make_lipid_networks(
    ann=annotations, class_reacs=class_reactions,
    lipid_col='moleculeNames', bootstraps=30, verbose=False)
```

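To build intuition for why network connectivity can rank candidates, here is a deliberately simplified stand-in, not the algorithm implemented in `make_lipid_networks`. In each of `bootstraps` rounds one candidate is drawn at random per ion; two drawn species are linked when their classes share a (toy) class reaction and they have the same sum composition; a candidate's score is its average number of links over the rounds in which it was drawn. The species, reactions and scoring rule are all assumptions for illustration:

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Species = Tuple[str, int, int]  # (class, total carbons, total double bonds)

# Toy, undirected class reactions assumed for illustration only.
TOY_REACTIONS: Set[frozenset] = {frozenset(p) for p in [("PS", "PE"), ("PE", "PC")]}


def linked(a: Species, b: Species) -> bool:
    """Hypothetical rule: classes related by a reaction and identical sum composition."""
    return frozenset((a[0], b[0])) in TOY_REACTIONS and a[1:] == b[1:]


def bootstrap_scores(candidates: Dict[str, List[Species]], bootstraps: int = 30,
                     seed: int = 0) -> Dict[Species, float]:
    """Average number of links per candidate over the rounds in which it was drawn."""
    rng = random.Random(seed)
    degree_sum: Dict[Species, int] = defaultdict(int)
    times_drawn: Dict[Species, int] = defaultdict(int)
    for _ in range(bootstraps):
        # Draw one candidate per ion for this round.
        drawn = [rng.choice(options) for options in candidates.values()]
        for i, species in enumerate(drawn):
            degree = sum(linked(species, other) for j, other in enumerate(drawn) if j != i)
            degree_sum[species] += degree
            times_drawn[species] += 1
    return {s: degree_sum[s] / times_drawn[s] for s in times_drawn}


candidates = {
    "ion_1": [("PE", 34, 1)],
    "ion_2": [("PC", 34, 1), ("PE", 37, 1)],  # isomeric candidates for one ion
}
scores = bootstrap_scores(candidates, bootstraps=30)
for species, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(species, round(score, 2))
```

In this toy case PC 34:1 is always linked to PE 34:1 via the PE → PC reaction, while its isomer PE 37:1 never is, so PC 34:1 ranks higher.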
