# Visualizing Molecules Within a Jupyter Notebook

This notebook visualizes molecules inside Jupyter. It reads xyz files, converts their contents to SMILES codes with the `ChemML` library, builds 2D molecular structures from those SMILES codes with the `RDKit` library, and displays the molecules in a grid. There are two visualization sections, distinguished by the libraries each one uses:

- Section I - Visualization With RDKit
- Section II - Visualization With RDKit and mols2grid

Regardless of which section you'd like to access, there are some preliminary things that need to be set up, which are covered in the *Prerequisites* section.

# Prerequisites

In this section, you import the libraries the notebook requires and initialize the variables it uses. Start with the imports:

In [5]:
# import required libraries
#import iodata 
#from iodata import iodata

import rdkit
from rdkit import Chem
from rdkit.Chem import Draw

from chemml.chem import Molecule

import mols2grid

import pandas as pd

import glob
import os

### Relevant Documentation:
 - [RDKit](http://www.rdkit.org/docs/)
 - [ChemML](https://hachmannlab.github.io/chemml)
 - [mols2grid](https://github.com/cbouy/mols2grid)
 - [pandas](https://pandas.pydata.org/docs/)
 - [glob](https://docs.python.org/3/library/glob.html)
 - [os](https://docs.python.org/3/library/os.html)

According to the `ChemML` documentation, the `Molecule` class from the `ChemML` library currently supports only the SMILES, SMARTS, InChI, and xyz input formats. This notebook uses only the xyz format. For further reading, see the ChemML documentation in the list above.
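If you have no xyz files at hand, the sketch below writes a small one you can use to try the notebook. It assumes the common plain xyz layout (atom count on the first line, a free-text comment on the second, then one `element x y z` line per atom). The water geometry is approximate and purely illustrative, and `read_xyz` is a hypothetical helper, not part of `ChemML` or `RDKit`.

```python
import os
import tempfile

# Approximate, illustrative geometry for water (coordinates in angstroms).
WATER_XYZ = """3
water (illustrative geometry)
O  0.000  0.000  0.117
H  0.000  0.757 -0.467
H  0.000 -0.757 -0.467
"""

def read_xyz(path):
    """Return (comment, [(element, x, y, z), ...]) from a plain xyz file."""
    with open(path) as handle:
        lines = handle.read().splitlines()
    n_atoms = int(lines[0])          # line 1: number of atoms
    comment = lines[1]               # line 2: free-text comment
    atoms = []
    for line in lines[2:2 + n_atoms]:
        element, x, y, z = line.split()[:4]
        atoms.append((element, float(x), float(y), float(z)))
    return comment, atoms

# Write the sample into a temporary directory and read it back.
sample_dir = tempfile.mkdtemp()
sample_path = os.path.join(sample_dir, "water.xyz")
with open(sample_path, "w") as handle:
    handle.write(WATER_XYZ)

comment, atoms = read_xyz(sample_path)
print(len(atoms), [atom[0] for atom in atoms])
```

Reading the file back is only a sanity check on the layout; the notebook itself hands xyz files to `ChemML` directly.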

Next, you need to specify the path to the xyz files you'd like to visualize. This should be done in the input below in place of `FILE_PATH`. An example of this input looks as follows:

`xyz_files = glob.glob('/Users/johnsmith/Desktop/structures/*.xyz')`

**Note**: Your molecules will be easiest to identify in the grid if your xyz files have descriptive names, because each molecule is named after its file with the ".xyz" extension removed. For example, "101_IndolebenzeneTshapecomplex.xyz" will be named "101_IndolebenzeneTshapecomplex" in the generated grids. Avoid names such as "C46H65N15O12S2.xyz" that mean little to you or to a third-party reader.
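The naming rule described above can be sketched with the standard library alone. `molecule_name` is a hypothetical helper used only for illustration; it relies on `os.path.splitext`, which removes the final extension whatever its length.

```python
import os

def molecule_name(path):
    """Return the file name of `path` without its directory or final extension."""
    base = os.path.basename(path)         # e.g. "101_IndolebenzeneTshapecomplex.xyz"
    stem, _extension = os.path.splitext(base)
    return stem

example = "/Users/johnsmith/Desktop/structures/101_IndolebenzeneTshapecomplex.xyz"
print(molecule_name(example))  # prints 101_IndolebenzeneTshapecomplex
```

Only the last extension is removed, so a file called "dimer.v2.xyz" would be named "dimer.v2".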

In [6]:
# get the paths of all .xyz files in the desired directory
xyz_files = glob.glob('FILE_PATH')
if not xyz_files:
    raise FileNotFoundError("Couldn't find any xyz files via the specified path")

FileNotFoundError: Couldn't find any xyz files via the specified path
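`glob.glob` returns paths in arbitrary order, so the grid order can differ between machines. The sketch below wraps the lookup in a hypothetical helper, `find_xyz_files`, that takes a directory (an assumption; the notebook cell takes a full pattern), sorts the matches, and fails loudly when nothing is found. The demo directory and file names are made up for illustration.

```python
import glob
import os
import tempfile

def find_xyz_files(directory):
    """Return the sorted paths of all .xyz files directly inside `directory`."""
    pattern = os.path.join(directory, "*.xyz")
    paths = sorted(glob.glob(pattern))    # sort for a reproducible grid order
    if not paths:
        raise FileNotFoundError(f"No .xyz files match {pattern!r}")
    return paths

# Demonstration on a throwaway directory with two xyz files and one other file.
demo_dir = tempfile.mkdtemp()
for filename in ("b_second.xyz", "a_first.xyz", "notes.txt"):
    open(os.path.join(demo_dir, filename), "w").close()

found = [os.path.basename(path) for path in find_xyz_files(demo_dir)]
print(found)  # prints ['a_first.xyz', 'b_second.xyz']
```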

For the last part of this section, you're going to initialize all the variables you need to use Section I and/or Section II for visualizing your molecules.

In [None]:
# create names for the molecules from their file names in xyz_files (extension removed)
xyz_names = [os.path.splitext(os.path.basename(item))[0] for item in xyz_files]

# make a list of ChemML Molecule object instances from the xyz files listed in xyz_files
molecule_list = [Molecule(file, input_type='xyz') for file in xyz_files]

# generate a kekulized SMILES code for each ChemML Molecule instance so that RDKit can recognize it
for molecule in molecule_list:
    molecule.to_smiles(kekuleSmiles=True)

# make a list of RDKit molecule objects (used for drawing) from the SMILES codes in molecule_list
drawing_list = [Chem.MolFromSmiles(molecule.smiles) for molecule in molecule_list]

# create a pandas dataframe containing molecule names in one column ("Name") and SMILES codes ("SMILES") in the other
table = pd.DataFrame(data={'Name':xyz_names, 
                         'SMILES':[molecule.smiles for molecule in molecule_list]}, 
                   columns=['Name','SMILES'])
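One thing to watch for: `Chem.MolFromSmiles` returns `None` instead of raising an error when it cannot parse a SMILES code, so `drawing_list` may contain `None` entries. The sketch below shows one way to separate the molecules that parsed from those that did not. `split_parsed` is a hypothetical helper, and the demo uses plain stand-in objects rather than real RDKit molecules so that it runs without RDKit installed.

```python
def split_parsed(names, mols):
    """Split parallel lists into entries that parsed and names that failed.

    `mols` holds parsed molecule objects, with None marking a failed parse.
    """
    kept_names, kept_mols, failed_names = [], [], []
    for name, mol in zip(names, mols):
        if mol is None:
            failed_names.append(name)
        else:
            kept_names.append(name)
            kept_mols.append(mol)
    return kept_names, kept_mols, failed_names

# Stand-in data: object() plays the role of a parsed molecule.
demo_names = ["water", "broken", "ethanol"]
demo_mols = [object(), None, object()]

kept_names, kept_mols, failed_names = split_parsed(demo_names, demo_mols)
print(kept_names, failed_names)  # prints ['water', 'ethanol'] ['broken']
```

In the notebook you could call it as `split_parsed(xyz_names, drawing_list)` and inspect the third return value to see which files need attention.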

At this point, you're set to move on to Section I, Section II, or both.

# Section I - Visualization With RDKit

This section can be used for two things:

1. Visualizing a relatively rudimentary grid of 2D molecules based on the contents of your xyz files.
2. Saving this grid as an image.

In [None]:
# define the grid of molecules with drawing_list, the number of molecules per row, the image sizes, and the legend 
# of names of the molecules
grid = Draw.MolsToGridImage(drawing_list,
                          molsPerRow=4, 
                          subImgSize=(290,425), 
                          legends=xyz_names, 
                          returnPNG=False)

# display the grid
grid

**Note**: If the structures and/or their names in the grid above aren't spaced as you'd like, you can likely resolve this by changing the dimensions passed to `subImgSize=( , )` when initializing the `grid` variable.
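When choosing these dimensions, it can help to estimate how large the whole grid will be. The sketch below assumes the grid is a plain rectangle of equally sized cells, with `molsPerRow` cells per row and as many rows as needed. `grid_image_size` is a hypothetical helper, not an RDKit function, and the actual image size may differ if RDKit lays the grid out differently.

```python
import math

def grid_image_size(n_molecules, mols_per_row=4, sub_img_size=(290, 425)):
    """Estimate the (width, height) in pixels of a grid of equally sized cells."""
    n_rows = math.ceil(n_molecules / mols_per_row)   # last row may be partly empty
    width = mols_per_row * sub_img_size[0]
    height = n_rows * sub_img_size[1]
    return width, height

# Ten molecules with this section's settings fill three rows of four cells.
print(grid_image_size(10))  # prints (1160, 1275)
```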

In the input below, you can save the grid above as a .png file in your current working directory. To do this, replace `IMG_NAME` with a name of your choice (if you don't, the file will be saved as "IMG_NAME.png"). **Note that the file name must end with the `.png` extension**. An example input could look like:

`grid.save('grid_of_molecules.png')`

If you don't wish to save this grid as an image, you can simply skip over the following input.

In [None]:
# save the grid as an image
grid.save('IMG_NAME.png')
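If you build the file name programmatically, a small guard can make sure it ends in `.png` before you pass it to `grid.save`. `png_filename` is a hypothetical helper, and appending the extension (rather than replacing an existing one) is just one possible design choice.

```python
import os

def png_filename(name):
    """Return `name` unchanged if it already ends in .png, else append .png."""
    _stem, extension = os.path.splitext(name)
    if extension.lower() == ".png":
        return name
    return name + ".png"

print(png_filename("grid_of_molecules"))      # prints grid_of_molecules.png
print(png_filename("grid_of_molecules.png"))  # prints grid_of_molecules.png
```

In the cell above this would be used as `grid.save(png_filename('IMG_NAME'))`.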

# Section II - Visualization With RDKit and mols2grid

This section is similar to Section I, though its function differs in two main ways:

1. More chemical information can be made accessible through the grid. When you hover your cursor over a molecule in the generated grid, an additional window is displayed, which can be customized to include more information about your molecule, such as its pH or relative density. The information displayed is controlled by the `tooltip` parameter in the input below. In this version of the notebook, however, only the name of the molecule can be displayed, so don't alter this parameter for now.
2. The grid generated in this section cannot currently be saved as an image.

In [None]:
# display the grid of molecules based on the "table" pandas dataframe using the mols2grid library
mols2grid.display(table, 
                  smiles_col='SMILES', 
                  size=(200, 200), 
                  subset=['Name', 'img'], 
                  n_cols=4, 
                  tooltip=['Name'], 
                  template='table')