# AMASE Example Notebook: Complete run_assignment Guide

This notebook demonstrates how to use the `run_assignment` function with all available parameters.

For more information about AMASE, see: https://pubs.acs.org/doi/10.1021/acs.jpca.4c03580

## Import AMASE

In [None]:
import amase
import pandas as pd

## Example 1: Basic Usage (Required Parameters Only)

This is the simplest way to run AMASE with just the required parameters.

**Important:** `directory_path` must contain the files downloaded from the Dropbox repository linked in the README.

In [None]:
amase.run_assignment(
    spectrum_path="/path/to/your/spectrum.txt",
    directory_path="/path/to/file/directory", 
    sigma_threshold=5.0,
    temperature=5.0  # rotational temperature of the experiment, in kelvin
)

## Example 2: Using Local Catalogs

If you have local .cat files (not in CDMS or JPL), you can include them in the analysis.

**Requirements:**
- Place all `.cat` files in a single directory
- Catalogs should be generated at T = 300 K
- Create a CSV with columns: `name` (without .cat extension), `smiles`, `iso` (number of isotopically substituted atoms)

In [None]:
amase.run_assignment(
    spectrum_path="/path/to/your/spectrum.txt",
    directory_path="/path/to/file/directory",
    sigma_threshold=5.0,
    temperature=5.0,
    local_catalogs_enabled=True,
    local_directory="/path/to/local/catalogs",
    local_df="/path/to/local_metadata.csv"
)
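The metadata CSV described in the requirements above could be assembled with a short script. This is a minimal sketch, not part of AMASE: `write_local_metadata` and its arguments are hypothetical names introduced here, and defaulting `iso` to 0 for catalogs with no isotopic substitution is an assumption based on the column description.

```python
import csv
from pathlib import Path


def write_local_metadata(catalog_dir, csv_path, smiles_by_name, iso_by_name=None):
    """Write a name/smiles/iso CSV with one row per .cat file in catalog_dir.

    Hypothetical helper, not part of AMASE.
    """
    iso_by_name = iso_by_name or {}
    rows = []
    for cat_file in sorted(Path(catalog_dir).glob("*.cat")):
        name = cat_file.stem  # file name without the .cat extension
        if name not in smiles_by_name:
            raise KeyError(f"No SMILES string supplied for catalog '{name}'")
        rows.append({
            "name": name,
            "smiles": smiles_by_name[name],
            # Assumption: 0 means no isotopically substituted atoms.
            "iso": iso_by_name.get(name, 0),
        })

    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["name", "smiles", "iso"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

The resulting file path can then be passed as `local_df`, with `catalog_dir` passed as `local_directory`.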

## Example 3: Restricting Valid Atoms and Setting Assignment Strictness

Use `valid_atoms` to specify which atoms are allowed in the molecular candidates. The `stricter` parameter controls how strictly the algorithm filters molecules during the fitting stage. If set to `True` (the default), an additional check is performed to filter out false-positive assignments. Keep it at `True` if you are getting too many false-positive assignments, and set it to `False` if too few molecules are being assigned.

In [None]:
amase.run_assignment(
    spectrum_path="/path/to/your/spectrum.txt",
    directory_path="/path/to/file/directory",
    sigma_threshold=5.0,
    temperature=300.0,
    valid_atoms=['C', 'H', 'N', 'O'],  # Only molecules with C, H, N, O
    stricter=True  # strictness of assignments (default: True); set to False if too few molecules are assigned
)

## Example 4: Using Structural Information

Enable structural consideration with `consider_structure=True`. Known starting molecules (chemical priors) can be supplied through `starting_molecules` to initialize the graph calculation, but this is optional.

Starting molecules should be provided as SMILES strings. For example:
- `'CCO'` = ethanol
- `'CO'` = methanol
- `'CC(C)=O'` = acetone

In [None]:
amase.run_assignment(
    spectrum_path="/path/to/your/spectrum.txt",
    directory_path="/path/to/file/directory",
    sigma_threshold=5.0,
    temperature=300.0,
    consider_structure=True,
    starting_molecules=['CCO', 'CO', 'CC(C)=O'],  # SMILES strings
)

## Example 5: Manual SMILES Input

Enable interactive prompts to manually input SMILES strings for molecules that lack stored SMILES data.

If `manual_add_smiles=False`, molecules without stored SMILES will be ignored.

In [None]:
amase.run_assignment(
    spectrum_path="/path/to/your/spectrum.txt",
    directory_path="/path/to/file/directory",
    sigma_threshold=5.0,
    temperature=300.0,
    manual_add_smiles=True,  # Prompt for SMILES input when needed
    stricter=False  # less strict molecule filtering
)

## Example 6: Forcing Molecule Exclusion/Inclusion

Exclude specific molecules from consideration (useful for removing false positives), or force the code to include a certain molecule in the fit to test for its presence.

Molecule names should match those in the CDMS/JPL CSV files or your local catalog directory. Separate the molecule names with commas.

In [None]:
amase.run_assignment(
    spectrum_path="/path/to/your/spectrum.txt",
    directory_path="/path/to/file/directory",
    sigma_threshold=5.0,
    temperature=300.0,
    # Names must match the CDMS and JPL .csv files downloaded from the Dropbox repository
    force_ignore_molecules=['(CH3)2CO v=0'],  # exclude this molecule
    force_include_molecules=['c-C6H5CN', 'HCOOCH2D']  # force these molecules into the fit
)
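Because the names must match exactly, it can help to confirm a name appears in a CSV file before passing it in. The sketch below scans every cell of a CSV for an exact match. `find_molecule_name` is a hypothetical helper, and since the file names and column layout of the downloaded CDMS/JPL files are not described here, it makes no assumption about which column holds the names.

```python
import csv


def find_molecule_name(csv_path, molecule_name):
    """Return (row_number, column_index) pairs for cells equal to molecule_name.

    Hypothetical helper, not part of AMASE. Row numbers are 1-based and
    column indices are 0-based. The csv module handles quoted cells, so
    names containing commas (e.g. 'CH3OCHO vt=0,1') are compared whole.
    """
    matches = []
    with open(csv_path, newline="") as handle:
        for row_number, row in enumerate(csv.reader(handle), start=1):
            for column_index, cell in enumerate(row):
                if cell.strip() == molecule_name:
                    matches.append((row_number, column_index))
    return matches
```

An empty list means the name was not found as written, in which case the spelling, spacing, and vibrational-state label are worth checking against the file.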

## Example 7: Complete Example with All Parameters

This example demonstrates using all available parameters together.

In [None]:
amase.run_assignment(
    # Required parameters
    spectrum_path="/path/to/your/spectrum.txt",
    directory_path="/path/to/file/directory",
    sigma_threshold=5.0,
    temperature=300.0,
    
    # Local catalogs (optional)
    local_catalogs_enabled=True,
    local_directory="/path/to/local/catalogs",
    local_df="/path/to/local_metadata.csv",
    
    # Atom filtering (optional)
    valid_atoms=['C', 'O', 'H', 'N', 'S'],
    
    # Structural analysis (optional)
    consider_structure=True,
    starting_molecules=['CCO', 'CC(=O)O'],  # SMILES strings
    
    # Interactive SMILES input (optional)
    manual_add_smiles=False,
    
    # Force ignore or include specific molecules (optional)
    force_ignore_molecules=['CH3OCHO vt=0,1'],
    force_include_molecules=['(CH3)2CO v=0', '34SO2'],

    # Stricter molecule filtering (optional, default is True)
    stricter=True
)

## Parameter Descriptions

### Required Parameters

- **`spectrum_path`** (str): Path to the spectrum .txt file with two columns (frequency in MHz, intensity) and no header
- **`directory_path`** (str): Directory path for output and data files. Must include the files downloaded from the Dropbox repository
- **`sigma_threshold`** (float): Sigma threshold for peak detection.
- **`temperature`** (float): Rotational temperature of the experiment, in kelvin
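
Before a run, the spectrum file can be checked against the two-column, no-header format described above. This is a minimal sketch: `check_spectrum_file` is a hypothetical helper, and it assumes the columns are whitespace-delimited, which is not stated here.

```python
def check_spectrum_file(path):
    """Validate a two-column numeric spectrum file.

    Hypothetical helper, not part of AMASE. Assumes whitespace-delimited
    columns. Returns (n_rows, min_frequency, max_frequency).
    """
    frequencies = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            fields = line.split()
            if len(fields) != 2:
                raise ValueError(
                    f"line {line_number}: expected 2 columns, found {len(fields)}"
                )
            try:
                frequency = float(fields[0])
                float(fields[1])  # intensity must also be numeric
            except ValueError:
                raise ValueError(
                    f"line {line_number}: non-numeric value (is there a header row?)"
                ) from None
            frequencies.append(frequency)
    if not frequencies:
        raise ValueError("spectrum file contains no data rows")
    return len(frequencies), min(frequencies), max(frequencies)
```

The returned frequency range is also a quick way to confirm the values are in MHz rather than GHz.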

### Optional Parameters

- **`local_catalogs_enabled`** (bool): Whether to use local catalogs. Default: `False`
- **`local_directory`** (str): Directory containing local .cat files. Default: `None`
- **`local_df`** (str): Path to CSV file with local catalog metadata (columns: name, smiles, iso). Default: `None`
- **`valid_atoms`** (list): List of valid atoms for molecules. Default: `['C', 'O', 'H', 'N', 'S']`
- **`consider_structure`** (bool): Whether to consider molecular structure. Useful if you suspect mixture components should be chemically related (e.g., discharge experiments). Default: `False`
- **`starting_molecules`** (list): List of starting molecules (SMILES strings) to initialize the structural relevance graph. Default: `None`
- **`manual_add_smiles`** (bool): Enable interactive prompts to manually input SMILES strings for molecules lacking stored SMILES. Default: `False`
- **`force_ignore_molecules`** (list): Molecule names to force the algorithm to ignore. Name must match the downloaded CDMS and JPL .csv files or local directory of catalogs. Default: `[]`
- **`force_include_molecules`** (list): Molecule names to force the algorithm to include in the fit. Name must match the downloaded CDMS and JPL .csv files or local directory of catalogs. Default: `[]`
- **`stricter`** (bool): If `True`, applies extra-strict molecule filtering during the fitting stage to minimize false-positive assignments. If too few molecules are being assigned, set to `False`. Default: `True`



## Output Files

After running, AMASE generates several output files in `directory_path`:

1. **`dataset_final.csv`** - Full dataset of all peak frequencies and intensities with molecular candidates
2. **`fit_spectrum.html`** - Interactive plot of all assigned molecules overlaid on observational data
3. **`output_report.txt`** - Detailed description of each line assignment
4. **`final_peak_results.csv`** - Summary table of all line assignments
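
After a run, a quick check that the four files listed above exist, plus a peek at one of the CSVs, might look like the following sketch. `find_outputs` and `peek_csv` are hypothetical helpers. The column names of the CSV outputs are not documented here, so the code reads them generically and treats the first row as a header, which is an assumption.

```python
import csv
from pathlib import Path

# File names taken from the list of outputs above.
EXPECTED_OUTPUTS = (
    "dataset_final.csv",
    "fit_spectrum.html",
    "output_report.txt",
    "final_peak_results.csv",
)


def find_outputs(directory_path):
    """Map each expected output file name to whether it exists."""
    directory = Path(directory_path)
    return {name: (directory / name).is_file() for name in EXPECTED_OUTPUTS}


def peek_csv(csv_path, n_rows=5):
    """Return (header, first n_rows data rows) from a CSV file.

    Assumption: the first row of the file is a header row.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

For example, `peek_csv(Path(directory_path) / "final_peak_results.csv")` shows the available columns before the file is loaded into pandas for further analysis.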

## Notes

- **Required Data Files**: Download from [Dropbox](https://www.dropbox.com/scl/fo/ycr5qe4mueemtuyoffp9d/ACd8engNRUgVtEERkm_0JSU?rlkey=1tiop6c30zefloyny8ntzelwg&dl=0) and place in `directory_path`
- **Spectrum Format**: Two-column .txt file (frequency in MHz, intensity) with no header
- **Local Catalogs**: Must be generated at T = 300 K to interface properly with molsim
- **SMILES Strings**: Use standard SMILES notation for starting molecules

For questions or issues, contact: **zfried@mit.edu**