### Python Exercise: Converting a Script to a Package

In this exercise, you'll learn how to convert a Python script into a Python package. Packaging your code is an essential skill in software development: it makes your code easier to reuse, maintain, and distribute.

The goal of the exercise is to take the existing `amremover_script.py` file, which contains functions for removing atom mapping numbers from SMILES (Simplified Molecular Input Line Entry System) strings and canonicalizing them, and restructure it into a Python package called `amremover_package`.

Sometimes you need SMILES without atom mapping. Although you could remove the mapping numbers by hand for a few reactions, it is far more convenient to automate the removal with a Python tool.
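
To make the idea concrete, here is a minimal sketch of how mapping numbers could be stripped with a regular expression. It is not the code from `amremover_script.py`: the helper name `strip_atom_maps` is hypothetical, and the sketch assumes that every map number appears as `:<digits>` directly before a closing bracket. Canonicalization, for which the script imports RDKit, is deliberately left out.

```python
import re

# Assumption: an atom-map number is ":<digits>" right before a closing "]".
_ATOM_MAP = re.compile(r":\d+(?=\])")


def strip_atom_maps(smiles: str) -> str:
    """Remove atom-map numbers, so '[CH3:17]' becomes '[CH3]'."""
    return _ATOM_MAP.sub("", smiles)


mapped = "[CH3:1][OH:2]>>[CH3:1][Cl:3]"
print(strip_atom_maps(mapped))  # prints [CH3][OH]>>[CH3][Cl]
```

The result still contains bracketed atoms such as `[CH3]`, which is why a canonicalization step is needed afterwards to obtain a normalized string.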

The `amremover_script.py` file contains the following code (open the file itself to view the full function bodies):

```python
# amremover_script.py
import re
from rdkit import Chem

def remove_atom_mapping(smiles: str) -> str:
    ...

def canonicalize_smiles(smiles: str) -> str:
    ...

def remove_atom_mapping_and_canonicalize_rxn_smiles(smiles: str) -> str:
    ...

rxn_smiles_with_atom_mapping = '[CH3:17][S:14](=[O:15])(=[O:16])[N:11]1[CH2:10][CH2:9][N:8](Cc2ccccc2)[CH2:13][CH2:12]1.C1CCCCC1>[OH-].[OH-].[Pd+2].CCO>[CH3:17][S:14](=[O:15])(=[O:16])[N:11]1[CH2:10][CH2:9][NH:8][CH2:13][CH2:12]1'

print(f"RXN SMILES with atom mapping: {rxn_smiles_with_atom_mapping}")
print("*** Remove atom mapping ***")
rxn_smiles_without_atom_mapping = remove_atom_mapping_and_canonicalize_rxn_smiles(rxn_smiles_with_atom_mapping)
print(f"RXN SMILES without atom mapping: {rxn_smiles_without_atom_mapping}")
```

We can even run this script from a Jupyter notebook, using the `!` prefix to execute a command in the shell.


In [1]:
!pip install rdkit

Collecting rdkit
  Downloading rdkit-2023.9.5-cp310-cp310-macosx_11_0_arm64.whl.metadata (3.9 kB)
Collecting numpy (from rdkit)
  Using cached numpy-1.26.4-cp310-cp310-macosx_11_0_arm64.whl.metadata (61 kB)
Collecting Pillow (from rdkit)
  Using cached pillow-10.3.0-cp310-cp310-macosx_11_0_arm64.whl.metadata (9.2 kB)
Downloading rdkit-2023.9.5-cp310-cp310-macosx_11_0_arm64.whl (27.0 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 27.0/27.0 MB 4.8 MB/s eta 0:00:00
Using cached numpy-1.26.4-cp310-cp310-macosx_11_0_arm64.whl (14.0 MB)
Using cached pillow-10.3.0-cp310-cp310-macosx_11_0_arm64.whl (3.4 MB)
Installing collected packages: Pillow, numpy, rdkit
Successfully installed Pillow-10.3.0 numpy-1.26.4 rdkit-2023.9.5


In [2]:
!python amremover_script.py

RXN SMILES with atom mapping: [CH3:17][S:14](=[O:15])(=[O:16])[N:11]1[CH2:10][CH2:9][N:8](Cc2ccccc2)[CH2:13][CH2:12]1.C1CCCCC1>[OH-].[OH-].[Pd+2].CCO>[CH3:17][S:14](=[O:15])(=[O:16])[N:11]1[CH2:10][CH2:9][NH:8][CH2:13][CH2:12]1
*** Remove atom mapping ***
RXN SMILES without atom mapping: C1CCCCC1.CS(=O)(=O)N1CCN(Cc2ccccc2)CC1>CCO.[OH-].[OH-].[Pd+2]>CS(=O)(=O)N1CCNCC1


Or import the function and execute it here:

In [None]:
from amremover_script import remove_atom_mapping_and_canonicalize_rxn_smiles

rxn_smiles_with_atom_mapping = '[CH3:17][S:14](=[O:15])(=[O:16])[N:11]1[CH2:10][CH2:9][N:8](Cc2ccccc2)[CH2:13][CH2:12]1.C1CCCCC1>[OH-].[OH-].[Pd+2].CCO>[CH3:17][S:14](=[O:15])(=[O:16])[N:11]1[CH2:10][CH2:9][NH:8][CH2:13][CH2:12]1'

print(f"RXN SMILES with atom mapping: {rxn_smiles_with_atom_mapping}")
print("*** Remove atom mapping ***")
rxn_smiles_without_atom_mapping = remove_atom_mapping_and_canonicalize_rxn_smiles(rxn_smiles_with_atom_mapping)
print(f"RXN SMILES without atom mapping: {rxn_smiles_without_atom_mapping}")



While this script works as intended, it lacks the structure and organization that a Python package provides. By converting it into a package, you'll learn how to organize your code into modules, set up package metadata, and create a distributable version of your software.

To get started, you'll be provided with a minimal template for the `amremover_package` folder. Your task is to:

1. Analyze the existing code in `amremover_script.py`.
2. Determine the appropriate file structure and module organization for the package.
3. Move the code from `amremover_script.py` into the corresponding files within the package.
4. Specify `rdkit` as a dependency of the package.
5. Ensure that the package is properly configured and can be installed using `pip install -e .`.
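
One possible target layout can be sketched with a short script that builds an empty skeleton in a temporary directory. The `src/amremover_package/amremover_module.py` path, the distribution name, and the version `0.0.1` are taken from the check cells and pip output in this notebook; the choice of `setuptools` as build backend is an assumption, and the template you receive may use a different one.

```python
import tempfile
from pathlib import Path

# Assumed pyproject.toml content; the build backend is a guess, not the template's.
PYPROJECT = """\
[build-system]
requires = ["setuptools>=61"]
build-backend = "setuptools.build_meta"

[project]
name = "amremover_package"
version = "0.0.1"
dependencies = ["rdkit"]
"""


def scaffold(root: Path) -> list:
    """Create an empty src-layout skeleton under root and return its files."""
    pkg = root / "src" / "amremover_package"
    pkg.mkdir(parents=True)
    (root / "pyproject.toml").write_text(PYPROJECT)
    (pkg / "__init__.py").write_text("")
    (pkg / "amremover_module.py").write_text(
        "# functions from amremover_script.py go here\n"
    )
    # Relative POSIX-style paths, sorted for a stable listing.
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


with tempfile.TemporaryDirectory() as tmp:
    files = scaffold(Path(tmp))
    print("\n".join(files))
```

With a layout like this, running `pip install -e .` in the folder that contains `pyproject.toml` installs the package in editable mode.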

Throughout the exercise, you'll learn about essential package components like `pyproject.toml`, `__init__.py`, `if __name__ == '__main__':`, and module organization. By the end, you'll have a better understanding of how to structure and distribute your Python code as a reusable package.
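
The `if __name__ == '__main__':` guard matters because importing a module executes all of its top-level statements. The sketch below uses a hypothetical function, `count_molecules`, that is not part of the exercise; it only illustrates where demo code should live so that it runs when the file is executed as a script but not when the file is imported.

```python
def count_molecules(rxn_side: str) -> int:
    """Count the dot-separated molecules on one side of a reaction SMILES."""
    return len(rxn_side.split("."))


def main() -> None:
    # Demo code: should run only when the file is executed directly.
    print(count_molecules("C1CCCCC1.CCO"))


if __name__ == "__main__":
    # True for `python this_file.py`, False for `import this_file`.
    main()
```

Applied to `amremover_script.py`, this would mean moving the example reaction and the `print` calls under such a guard, so that importing the functions has no side effects.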

You can always come back here and check whether you have managed to create the package successfully:

In [9]:

!pip install ./amremover_package2

from amremover_package2.src.amremover_package.amremover_module import remove_atom_mapping_and_canonicalize_rxn_smiles

rxn_smiles_with_atom_mapping = 'CCN(C(C)C)C(C)C.[O:16]=[C:15]([O:17][CH2:18][c:19]1[cH:20][cH:21][cH:22][cH:23][cH:24]1)[N:9]1[CH2:10][CH2:11][NH:12][CH2:13][CH2:14]1.[O:8]=[c:4]1[cH:3][c:2](Cl)[n:7][cH:6][nH:5]1>CCC(C)O>[O:16]=[C:15]([O:17][CH2:18][c:19]1[cH:20][cH:21][cH:22][cH:23][cH:24]1)[N:9]1[CH2:10][CH2:11][N:12]([c:2]2[cH:3][c:4](=[O:8])[nH:5][cH:6][n:7]2)[CH2:13][CH2:14]1'

rxn_smiles_without_atom_mapping = remove_atom_mapping_and_canonicalize_rxn_smiles(rxn_smiles_with_atom_mapping)

print(rxn_smiles_without_atom_mapping)

assert rxn_smiles_without_atom_mapping == 'CCN(C(C)C)C(C)C.O=C(OCc1ccccc1)N1CCNCC1.O=c1cc(Cl)nc[nH]1>CCC(C)O>O=C(OCc1ccccc1)N1CCN(c2cc(=O)[nH]cn2)CC1'

Processing ./amremover_package2
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: amremover_package
  Building wheel for amremover_package (pyproject.toml) ... done
  Created wheel for amremover_package: filename=amremover_package-0.0.1-py2.py3-none-any.whl size=1993 sha256=c0ae82efa710393ebcdfca9f2b09d298495e25848842888451ae5a273ae92d09
  Stored in directory: /private/var/folders/s7/8xqqc8_x7g38j5x2ddblrgx00000gn/T/pip-ephem-wheel-cache-jtet4vj9/wheels/95/ab/af/89772fe5808101c525e93d28dbddd7695389184138d8ac8192
Successfully built amremover_package
Installing collected packages: amremover_package
  Attempting uninstall: amremover_package
    Found existing installation: amremover_package 0.0.1
    Uninstalling amremover_package-0.0.1:
      Successfully uninstalled amremover_package-0.0.1
Successfully installed amremover_

In [11]:
!pip install ./amremover_package3

from amremover_package3.src.amremover_package.amremover_module import remove_atom_mapping_and_canonicalize_rxn_smiles

rxn_smiles_with_atom_mapping = 'CCN(C(C)C)C(C)C.[O:16]=[C:15]([O:17][CH2:18][c:19]1[cH:20][cH:21][cH:22][cH:23][cH:24]1)[N:9]1[CH2:10][CH2:11][NH:12][CH2:13][CH2:14]1.[O:8]=[c:4]1[cH:3][c:2](Cl)[n:7][cH:6][nH:5]1>CCC(C)O>[O:16]=[C:15]([O:17][CH2:18][c:19]1[cH:20][cH:21][cH:22][cH:23][cH:24]1)[N:9]1[CH2:10][CH2:11][N:12]([c:2]2[cH:3][c:4](=[O:8])[nH:5][cH:6][n:7]2)[CH2:13][CH2:14]1'

rxn_smiles_without_atom_mapping = remove_atom_mapping_and_canonicalize_rxn_smiles(rxn_smiles_with_atom_mapping)

print(rxn_smiles_without_atom_mapping)

assert rxn_smiles_without_atom_mapping == 'CCN(C(C)C)C(C)C.O=C(OCc1ccccc1)N1CCNCC1.O=c1cc(Cl)nc[nH]1>CCC(C)O>O=C(OCc1ccccc1)N1CCN(c2cc(=O)[nH]cn2)CC1'

Processing ./amremover_package3
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting typer (from amremover_package==0.0.1)
  Downloading typer-0.12.3-py3-none-any.whl.metadata (15 kB)
Collecting click>=8.0.0 (from typer->amremover_package==0.0.1)
  Using cached click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting shellingham>=1.3.0 (from typer->amremover_package==0.0.1)
  Downloading shellingham-1.5.4-py2.py3-none-any.whl.metadata (3.5 kB)
Collecting rich>=10.11.0 (from typer->amremover_package==0.0.1)
  Downloading rich-13.7.1-py3-none-any.whl.metadata (18 kB)
Collecting markdown-it-py>=2.2.0 (from rich>=10.11.0->typer->amremover_package==0.0.1)
  Downloading markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich>=10.11.0->typer->amremover_package==0.0.1)
  Downloading mdurl-0.1.2-py3-none-any.whl.metad
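
In addition to the import checks, you can ask Python whether pip registered the distribution at all. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `amremover_package` is the one shown in the pip output.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version("amremover_package"))
```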

### Possible next step (optional, advanced)

Explore [Typer](https://typer.tiangolo.com) to add a command-line interface to `amremover_package`.

The goal of this part is to be able to run:

!amremover "CCN(C(C)C)C(C)C.[O:16]=[C:15]([O:17][CH2:18][c:19]1[cH:20][cH:21][cH:22][cH:23][cH:24]1)[N:9]1[CH2:10][CH2:11][NH:12][CH2:13][CH2:14]1.[O:8]=[c:4]1[cH:3][c:2](Cl)[n:7][cH:6][nH:5]1>CCC(C)O>[O:16]=[C:15]([O:17][CH2:18][c:19]1[cH:20][cH:21][cH:22][cH:23][cH:24]1)[N:9]1[CH2:10][CH2:11][N:12]([c:2]2[cH:3][c:4](=[O:8])[nH:5][cH:6][n:7]2)[CH2:13][CH2:14]1" 

and have it print the canonicalized reaction SMILES without atom mapping:
```
CCN(C(C)C)C(C)C.O=C(OCc1ccccc1)N1CCNCC1.O=c1cc(Cl)nc[nH]1>CCC(C)O>O=C(OCc1ccccc1)N1CCN(c2cc(=O)[nH]cn2)CC1
```
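
To show the general shape of such a command before you reach for Typer, here is a sketch built on the standard library's `argparse` instead, so that it runs without extra dependencies. It is not a Typer example. The `strip_atom_maps` placeholder only removes map numbers and does not canonicalize, and the function names are hypothetical. In the real package, `main` would call `remove_atom_mapping_and_canonicalize_rxn_smiles`.

```python
import argparse
import re
from typing import Optional, Sequence


def strip_atom_maps(smiles: str) -> str:
    """Placeholder for the package function: removes ':<n>' map numbers only."""
    return re.sub(r":\d+(?=\])", "", smiles)


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="amremover",
        description="Remove atom mapping from a reaction SMILES.",
    )
    parser.add_argument("rxn_smiles", help="reaction SMILES (quote it in the shell)")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse the arguments, print the processed SMILES, and return an exit code."""
    args = build_parser().parse_args(argv)
    print(strip_atom_maps(args.rxn_smiles))
    return 0


# Passing the arguments explicitly simulates: amremover "[CH3:1][OH:2]>>[CH3:1][Cl:3]"
exit_code = main(["[CH3:1][OH:2]>>[CH3:1][Cl:3]"])
```

To expose a function like `main` as the `amremover` command, the package metadata needs a console-script entry, for example a `[project.scripts]` table in `pyproject.toml` that points at it. With Typer, the hand-written parser is replaced by a decorated function.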