## **Chapter 6 – Generative Models for Molecule and Materials Design**



Generative artificial intelligence lets chemists move beyond analyzing existing data and begin proposing new molecules and materials. This chapter introduces the fundamentals of generative models and their significance for chemical discovery. We discuss how large language models (LLMs) can be used to generate novel molecular structures, and we examine modern workflows that integrate LLMs with specialized chemistry models. Key topics include how generative models are trained and used, strategies for validating and filtering the molecules they propose, and methods for steering generation toward desired properties. By the end, the reader will understand how LLM-driven generative models can suggest creative solutions in molecule design, which challenges arise (such as ensuring chemical validity and synthesizability), and how property optimization is achieved through iterative feedback.

### 6.3 Property Prediction and Optimization
Regardless of the method, a key component of generative design is having a **property predictor** in the loop. This predictor could be:  

- A simple calculator (e.g., computing logP or molecular weight)  
- A machine learning model (e.g., predicting activity or an experimental property)  
- A physics-based model (e.g., quantum chemistry for excitation energy)  
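
As a minimal sketch of the first option in the list above, the snippet below wraps three RDKit descriptor calls in a small helper. The helper name `simple_properties` is hypothetical (it is not part of RDKit), and returning `None` for SMILES that fail to parse is an assumption made for illustration.

```python
from rdkit import Chem
from rdkit.Chem import Crippen, Descriptors, QED


def simple_properties(smiles):
    """Return cheap, rule-based properties for a SMILES string, or None if it does not parse."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # RDKit returns None for invalid SMILES
        return None
    return {
        "mol_wt": Descriptors.MolWt(mol),  # average molecular weight
        "logp": Crippen.MolLogP(mol),      # Wildman-Crippen logP estimate
        "qed": QED.qed(mol),               # drug-likeness score between 0 and 1
    }


props = simple_properties("CCO")  # ethanol
print(props)
```

Because each call is a fast, deterministic calculation, a helper like this can be evaluated on every generated molecule without slowing the loop noticeably.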

The trade-off is accuracy vs. speed: simple calculators and ML predictors are fast but approximate, while physics-based methods are slower but usually more accurate. A common strategy is to use the fast predictors for **guiding generation**, where many candidates must be scored, and then validate only the top candidates with the more accurate methods.  
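
The guide-then-validate strategy described above can be sketched as a two-stage funnel. This is an illustrative sketch only: the function `screen_then_validate` and both toy scoring functions are hypothetical stand-ins, chosen so the block runs without any chemistry library. In practice the fast score would come from an ML model and the slow score from a physics-based calculation.

```python
def screen_then_validate(candidates, fast_score, slow_score, top_k=3):
    """Rank all candidates with a cheap predictor, then re-score only the top_k with an expensive one."""
    # Stage 1: the fast predictor sees every candidate.
    shortlisted = sorted(candidates, key=fast_score, reverse=True)[:top_k]
    # Stage 2: the slow, more accurate method sees only the shortlist.
    validated = [(c, slow_score(c)) for c in shortlisted]
    return sorted(validated, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins (hypothetical, not real chemistry models).
def toy_fast_score(smiles):
    return smiles.count("N") + smiles.count("O")  # crude heteroatom count


def toy_slow_score(smiles):
    return (smiles.count("N") + smiles.count("O")) / len(smiles)  # heteroatom fraction


candidates = ["CCCC", "CCO", "CCN", "NCCO", "OCCCCCO"]
ranked = screen_then_validate(candidates, toy_fast_score, toy_slow_score, top_k=3)
for smiles, score in ranked:
    print(f"{smiles}: {score:.2f}")
```

The expensive function is called only `top_k` times regardless of how many candidates were generated. Note that the final ranking can differ from the fast ranking, which is the reason for the second stage.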

To illustrate, here’s a toy **iterative optimization** example: we try to maximize a molecule’s QED score (the quantitative estimate of drug-likeness, a common proxy for drug-likeness) using hill-climbing. We start from a molecule, propose a random modification (here, appending one atom to the SMILES string), and accept the change only if the QED improves.  


```python
import random

from rdkit import Chem
from rdkit.Chem import QED

# Start from ethane and record its QED as the score to beat
best_smiles = "CC"
best_score = QED.qed(Chem.MolFromSmiles(best_smiles))

# Run 5 mutation steps
for step in range(5):
    # Propose a mutation: append one randomly chosen atom (C, N, or O) to the SMILES string
    new_smiles = best_smiles + random.choice(["C", "N", "O"])
    mol = Chem.MolFromSmiles(new_smiles)

    if mol is not None:  # MolFromSmiles returns None for invalid SMILES
        score = QED.qed(mol)
        if score > best_score:  # accept only strict improvements
            best_smiles, best_score = new_smiles, score

    # Report the best molecule so far (unchanged if the proposal was rejected)
    print(f"Step {step}: new best = {best_smiles} (QED = {best_score:.2f})")
```


```
Step 0: new best = CCN (QED = 0.41)
Step 1: new best = CCNO (QED = 0.42)
Step 2: new best = CCNON (QED = 0.43)
Step 3: new best = CCNON (QED = 0.43)
Step 4: new best = CCNON (QED = 0.43)
```
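
Because the proposals are drawn at random, the output above will differ from run to run. The same loop can be sketched as a reusable function with a seeded random generator so that runs are repeatable. The function name `hill_climb`, its parameters, and the toy scoring function are all hypothetical; the toy score is not a chemistry model and is there only to keep the block free of dependencies.

```python
import random


def hill_climb(start, score_fn, n_steps=5, atoms=("C", "N", "O"), seed=0):
    """Greedy hill-climbing over SMILES strings: append one atom, keep it only if the score improves."""
    rng = random.Random(seed)  # private, seeded generator so runs are repeatable
    best, best_score = start, score_fn(start)
    history = [(best, best_score)]
    for _ in range(n_steps):
        candidate = best + rng.choice(atoms)
        score = score_fn(candidate)
        # A score of None marks an invalid candidate and is always rejected.
        if score is not None and score > best_score:
            best, best_score = candidate, score
        history.append((best, best_score))
    return best, best_score, history


# Toy score (an assumption for illustration): reward heteroatoms, penalize length.
def toy_score(smiles):
    return smiles.count("N") + smiles.count("O") - 0.1 * len(smiles)


best, best_score, history = hill_climb("CC", toy_score, n_steps=5, seed=0)
print(best, round(best_score, 2))
```

To recover the QED example, `score_fn` could be a small wrapper that parses the SMILES with `Chem.MolFromSmiles` and returns `QED.qed(mol)`, or `None` when parsing fails. Hill-climbing of this kind only ever accepts improvements, so it can stall at a local optimum, as the repeated best molecule in the last steps of the output above suggests.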
