# Basic Pickaxe Examples
This notebook walks through basic examples of running Pickaxe:

1. Basic Expansion
2. Expansion with Filters
3. Save to MongoDB and get UniProt ID

In [18]:
from minedatabase import pickaxe
from minedatabase import rules
from minedatabase import filters
from minedatabase import databases
import pymongo

# 1. Basic Expansion

## Initialize the Pickaxe Object

In [12]:
# Specify rules by the fraction of MetaCyc covered
rule_list, correactants, _ = rules.metacyc_generalized(fraction_coverage=0.1)
# or by the number of rules (this second call replaces the selection above)
rule_list, correactants, _ = rules.metacyc_generalized(n_rules=10)

# These are the base requirements;
# more options are available at initialization
pk = pickaxe.Pickaxe(rule_list=rule_list, coreactant_list=correactants, errors=False)

----------------------------------------
Intializing pickaxe object

Done intializing pickaxe object
----------------------------------------





## Read the Compounds in and Expand

In [13]:
# Read in compounds
pk.load_compound_set("inputs/YMBD_30.csv")

# Expand with n processors for m generations
pk.transform_all(processes=10, generations=2)

30 compounds loaded...
(30 after removing stereochemistry)
----------------------------------------
Expanding Generation 1

Generation 1: 0 percent complete
Generation 1: 10 percent complete
Generation 1: 20 percent complete
Generation 1: 30 percent complete
Generation 1: 40 percent complete
Generation 1: 50 percent complete
Generation 1: 60 percent complete
Generation 1: 70 percent complete
Generation 1: 80 percent complete
Generation 1: 90 percent complete
Generation 1 finished in 3.0716092586517334 s and contains:
		432 new compounds
		432 new reactions

Done expanding Generation: 1.
----------------------------------------

----------------------------------------
Expanding Generation 2

Generation 2: 0 percent complete
Generation 2: 10 percent complete
Generation 2: 20 percent complete
Generation 2: 30 percent complete
Generation 2: 40 percent complete
Generation 2: 50 percent complete
Generation 2: 60 percent complete
Generation 2: 70 percent complete
Generation 2: 80 percent com

## Save The Data

In [14]:
pk.assign_ids()
pk.write_compound_output_file("data/pk_basic/compounds.tsv")
pk.write_reaction_output_file("data/pk_basic/reactions.tsv")

RDKit ERROR: [15:10:03] Invalid InChI prefix in generating InChI Key
[15:10:03] Invalid InChI prefix in generating InChI Key
RDKit ERROR: [15:10:03] Invalid InChI prefix in generating InChI Key
RDKit ERROR: [15:10:03] Invalid InChI prefix in generating InChI Key
RDKit ERROR: [15:10:03] Invalid InChI prefix in generating InChI Key
RDKit ERROR: [15:10:03] Can't kekulize mol.  Unkekulized atoms: 19 23 26 27 28 29 30 31 33 34
[15:10:03] Invalid InChI prefix in generating InChI Key
[15:10:03] Invalid InChI prefix in generating InChI Key
[15:10:03] Invalid InChI prefix in generating InChI Key
[15:10:03] Can't kekulize mol.  Unkekulized atoms: 19 23 26 27 28 29 30 31 33 34

RDKit ERROR: 
RDKit ERROR: [15:10:03] Invalid InChI prefix in generating InChI Key
[15:10:03] Invalid InChI prefix in generating InChI Key


# 2. Apply Filters
By applying filters, a Pickaxe expansion can be narrowed to a more targeted one. In the following example, two filters (an atomic composition filter and a molecular weight filter) are applied to the previous expansion so that only compounds with a molecular weight below 300 g/mol, 5 or fewer carbons, and at least one oxygen are retained.
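
To make the filter criteria concrete, here is a minimal sketch in plain Python that does not use `minedatabase` at all. The compound records and the `passes_filters` helper are hypothetical, and the inclusive handling of the atom-count bounds is an assumption; the real filters operate on Pickaxe's own compound data.

```python
# Illustrative only: hypothetical compound records, not Pickaxe's data model.
compounds = [
    {"name": "ethanol", "mw": 46.07, "atoms": {"C": 2, "H": 6, "O": 1}},
    {"name": "benzene", "mw": 78.11, "atoms": {"C": 6, "H": 6}},
    {"name": "glucose", "mw": 180.16, "atoms": {"C": 6, "H": 12, "O": 6}},
    {"name": "pyruvic acid", "mw": 88.06, "atoms": {"C": 3, "H": 4, "O": 3}},
    {"name": "methane", "mw": 16.04, "atoms": {"C": 1, "H": 4}},
]

MW_RANGE = (0, 300)  # g/mol
ATOM_RANGES = {"C": (0, 5), "O": (1, None)}  # None means no upper bound


def passes_filters(compound):
    """Return True if the compound meets both the MW and composition criteria."""
    low, high = MW_RANGE
    if not (low < compound["mw"] < high):
        return False
    for element, (min_count, max_count) in ATOM_RANGES.items():
        count = compound["atoms"].get(element, 0)  # absent element counts as 0
        if count < min_count:
            return False
        if max_count is not None and count > max_count:
            return False
    return True


kept = [c["name"] for c in compounds if passes_filters(c)]
print(kept)  # ['ethanol', 'pyruvic acid']
```

Benzene and glucose fail on carbon count, and methane fails because it has no oxygen.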

In [15]:
rule_list, correactants, _ = rules.metacyc_generalized(n_rules=10)
pk_filter = pickaxe.Pickaxe(rule_list=rule_list, coreactant_list=correactants)

# Retain only compounds with a molecular weight below 300 g/mol
mw_filter = filters.MWFilter(min_MW=0, max_MW=300)
# Retain only compounds with 5 or fewer carbons and at least 1 oxygen
ac_filter = filters.AtomicCompositionFilter({"C":[0, 5], "O": [1, None]})

# Append the filters to the Pickaxe object's list of filters
pk_filter.filters.extend([mw_filter, ac_filter])

----------------------------------------
Intializing pickaxe object

Done intializing pickaxe object
----------------------------------------





In [16]:
# Read in compounds
pk_filter.load_compound_set("inputs/YMBD_30.csv")

# Expand with n processors for m generations
pk_filter.transform_all(processes=10, generations=2)

30 compounds loaded...
(30 after removing stereochemistry)
----------------------------------------
Filtering Generation 0

Applying filter: Molecular Weight
Filtering Generation 0 with 0 < MW < 300.
23 of 30 compounds remain after applying filter: Molecular Weight--took 0.0s.

Done filtering Generation 0
----------------------------------------

----------------------------------------
Filtering Generation 0

Applying filter: Atomic Composition
Filtering Generation 0 with atomic composition {'C': [0, 5], 'O': [1, None]}.
6 of 23 compounds remain after applying filter: Atomic Composition--took 0.0s.

Done filtering Generation 0
----------------------------------------

----------------------------------------
Expanding Generation 1

Generation 1: 0 percent complete
Generation 1: 17 percent complete
Generation 1: 33 percent complete
Generation 1: 50 percent complete
Generation 1: 67 percent complete
Generation 1: 83 percent complete
Generation 1 finished in 1.155043125152588 s and conta

## Filter Results
Without filters, the Pickaxe expansion yields roughly 9k compounds and 12.5k reactions. Applying the filters reduces this to 151 compounds and 167 reactions, a roughly 60- to 75-fold reduction.

In [8]:
print(f"No Filter\n\tTotal Compounds:  {len(pk.compounds)}\n\tTotal Reactions: {len(pk.reactions)}")
print(f"Filter\n\tTotal Compounds:  {len(pk_filter.compounds)}\n\tTotal Reactions: {len(pk_filter.reactions)}")

No Filter
	Total Compounds:  8929
	Total Reactions: 12526
Filter
	Total Compounds:  151
	Total Reactions: 167
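
The size of the reduction can be computed directly from the counts printed above. This is a minimal sketch with the numbers hard-coded from that output; in the notebook itself they would come from `len(pk.compounds)`, `len(pk_filter.compounds)`, and so on.

```python
# Counts copied from the printed output above.
unfiltered = {"compounds": 8929, "reactions": 12526}
filtered = {"compounds": 151, "reactions": 167}

# Ratio of unfiltered to filtered counts for each category.
fold_reduction = {key: unfiltered[key] / filtered[key] for key in unfiltered}

for key, fold in fold_reduction.items():
    print(f"{key}: {fold:.0f}-fold fewer with filters")
# compounds: 59-fold fewer with filters
# reactions: 75-fold fewer with filters
```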


# 3. Using MongoDB
Results can be saved to a MongoDB database and read back from it later.


## Save to Mongo
First, generate a new Pickaxe object.

In [17]:
rule_list, correactants, _ = rules.metacyc_generalized(n_rules=10)
pk = pickaxe.Pickaxe(rule_list=rule_list, coreactant_list=correactants, database="pickaxe_basic", database_overwrite=True)
pk.load_compound_set("./inputs/YMBD_30.csv")
pk.transform_all(processes=10, generations=2)
# Save to the MINE database. Writing the core is unnecessary for Pickaxe
pk.save_to_mine(processes=1, indexing=True, write_core=False)

----------------------------------------
Intializing pickaxe object

Done intializing pickaxe object
----------------------------------------

30 compounds loaded...
(30 after removing stereochemistry)
----------------------------------------
Expanding Generation 1





Generation 1: 0 percent complete
Generation 1: 10 percent complete
Generation 1: 20 percent complete
Generation 1: 30 percent complete
Generation 1: 40 percent complete
Generation 1: 50 percent complete
Generation 1: 60 percent complete
Generation 1: 70 percent complete
Generation 1: 80 percent complete
Generation 1: 90 percent complete
Generation 1 finished in 2.1425869464874268 s and contains:
		432 new compounds
		432 new reactions

Done expanding Generation: 1.
----------------------------------------

----------------------------------------
Expanding Generation 2

Generation 2: 0 percent complete
Generation 2: 10 percent complete
Generation 2: 20 percent complete
Generation 2: 30 percent complete
Generation 2: 40 percent complete
Generation 2: 50 percent complete
Generation 2: 60 percent complete
Generation 2: 70 percent complete
Generation 2: 80 percent complete
Generation 2: 90 percent complete
Generation 2: 100 percent complete
Generation 2 finished in 40.83272409439087 s and 

## Read in DB

In [20]:
db = databases.MINE("pickaxe_basic")

## Get info

In [27]:
rxns = {val["_id"]: val for val in db.reactions.find({})}
cpds = {val["_id"]: val for val in db.compounds.find({})}
ops = {val["_id"]: val for val in db.operators.find({})}
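
Keying each collection by `_id` gives constant-time lookup of any document. The same dictionaries can be inverted, for example to list the reactions each operator produced. The sketch below uses made-up stand-in documents rather than a live database, and relies only on the `_id` and `Operators` fields that appear in this notebook.

```python
from collections import defaultdict

# Stand-in for the documents a query such as db.reactions.find({}) returns;
# the reaction IDs here are invented for illustration.
reaction_docs = [
    {"_id": "R_example_1", "Operators": ["rule0004"]},
    {"_id": "R_example_2", "Operators": ["rule0001", "rule0004"]},
    {"_id": "R_example_3", "Operators": ["rule0001"]},
]

# Index the documents by ID, as in the cell above.
rxns = {doc["_id"]: doc for doc in reaction_docs}

# Invert the mapping: operator ID -> IDs of the reactions it generated.
rxns_by_operator = defaultdict(list)
for rxn_id, doc in rxns.items():
    for op_id in doc["Operators"]:
        rxns_by_operator[op_id].append(rxn_id)

print(dict(rxns_by_operator))
```

For large expansions, loading every document into memory this way may be expensive; querying the collection for only the needed IDs is an alternative.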

## Get UniProt IDs for a Reaction

In [43]:
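sample_rxn = "Rcf57b322f34bf94027264a82bf010a137ecaa7f75c71d409e301e848644cafe2"
# Operators (reaction rules) that generated this reaction
rxn_ops = rxns[sample_rxn]["Operators"]
# The operator's Comments field holds its UniProt IDs, separated by semicolons
uniprot_ids = ops['rule0004']['Comments']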


print(f"Operators: {rxn_ops}")
print(f"UniProt IDs for rule0004: {uniprot_ids}")

Operators: ['rule0004']
UniProt IDs for rule0004: A0A1P8W705;A0A250DUW2;A1Z745;A5HMH6;A5HMH7;A5HMH8;A5HMH9;A5HMI0;A5HMI1;A5LGH2;A5YUW2;A5YUW3;A5YUW6;A5YUY2;A5YUY5;A5YUY6;A5YUY7;A5YUZ3;A5YUZ5;A5YUZ6;A5YUZ7;A5YUZ8;A5YUZ9;A5YV00;A5YV01;A5YV02;A5YV03;A5YV04;A5YV05;A5YV06;A5YV08;A5YV10;A5YV11;A5YV12;A5YV13;A5YV14;A5YV15;A5YV16;A5YV18;A5YV19;A5YV20;A5Z0R4;A5Z0R5;A7L9S7;A7L9S8;A7L9S9;A7L9T0;A7LCL0;A7LCL1;A7YVV2;AKR1C2;B2ZFP6;B5TYS8;C0KYN4;C4B644;E0WMN6;E0WMN7;E0WMN8;E1CBX4;E2EB14;ECU0066;G1UBD1;G8FRC5;K4BZH9;K4CEE8;M1JEK6;NCED52;O15229;O15528;O35084;O49814;O68977;O75881;O88867;P00191;P00438;P08683;P08686;P20586;P22869;P27353;P32009;P38169;P38992;P42535;P48635;P51589;P51590;P71875;P72495;Q00456;Q00G65;Q078T0;Q08KD8;Q08KE2;Q0SJK9;Q11PP7;Q25BV9;Q2EMR3;Q2LI72;Q38IC3;Q3LFR2;Q54530;Q57160;Q59971;Q607G3;Q60991;Q6Q8Q7;Q6SSJ6;Q6V9W5;Q6VVW9;Q6VVX0;Q6WG30;Q768T5;Q84HF5;Q84KI1;Q86PM2;Q8ISJ5;Q8KQF0;Q8KQH9;Q8KQI0;Q8PDI2;Q91WN4;Q95NK3;Q95NP6;Q9F131;Q9I0Q0;Q9JKJ9;Q9LTG0;Q9MZS9;Q9R9T1;Q9SZZ8;Q9XS57;Q9ZAU3
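
The cell above hard-codes `rule0004` and leaves the IDs as a single semicolon-separated string. A minimal sketch of a more general lookup follows; the helper `uniprot_ids_for_reaction` is hypothetical, the reaction ID is invented, and the `rxns` and `ops` dictionaries are small stand-ins with the ID string shortened from the output above.

```python
# Stand-ins for the `rxns` and `ops` dictionaries built from the database.
rxns = {"R_example": {"_id": "R_example", "Operators": ["rule0004"]}}
ops = {
    "rule0004": {"_id": "rule0004", "Comments": "A0A1P8W705;A0A250DUW2;A1Z745"}
}


def uniprot_ids_for_reaction(rxn_id, rxns, ops):
    """Map each operator of a reaction to the list of IDs in its Comments field."""
    ids_by_operator = {}
    for op_id in rxns[rxn_id]["Operators"]:
        comments = ops[op_id].get("Comments", "")
        # Split on semicolons and drop empty entries.
        ids_by_operator[op_id] = [i for i in comments.split(";") if i]
    return ids_by_operator


result = uniprot_ids_for_reaction("R_example", rxns, ops)
print(result)
# {'rule0004': ['A0A1P8W705', 'A0A250DUW2', 'A1Z745']}
```

Looping over every entry in `Operators` matters when a reaction is produced by more than one rule.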
