# Reskilling for the AI Age: Preparing Scientists for an Evolving R&D Landscape
This notebook walks through a small machine learning workflow in Python (data preparation, model training, evaluation, and visualization) as a starting point for scientists adapting to AI-driven research environments.

## Setup and Required Libraries
First, let's import the necessary Python packages we'll use throughout this notebook.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import StandardScaler

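# Set style for visualizations
# (the 'seaborn' matplotlib style name was removed in recent matplotlib
# releases, so the theme is set through seaborn itself)
sns.set_theme(style='darkgrid', palette='husl')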

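# Configure warnings: silence FutureWarning noise only, so that
# convergence or data-quality warnings remain visible
import warnings
warnings.filterwarnings('ignore', category=FutureWarning)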

## 1. Example: AI in Genomics Research
Let's demonstrate a simple machine learning workflow using a synthetic dataset that stands in for genomic measurements.

In [None]:
# Generate synthetic genomic data for demonstration
np.random.seed(42)

# Create synthetic features and target
X = np.random.randn(1000, 10)  # 1000 samples, 10 standard-normal "gene" features
# Binary target that depends only on the first two features (Gene_0, Gene_1)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Convert to DataFrame with meaningful column names
feature_names = [f'Gene_{i}' for i in range(10)]
df = pd.DataFrame(X, columns=feature_names)
df['Target'] = y

# Display first few rows
print("Sample of genomic data:")
print(df.head())

# Basic statistics
print("\nBasic statistics:")
print(df.describe().round(2))
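
Before training, it can help to check that the two classes are roughly balanced and to see which features carry signal, since accuracy is hard to interpret on skewed labels. The following is a minimal sketch that regenerates the same synthetic data so it runs on its own; the names `df_check`, `class_share`, and `corr_with_target` are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Regenerate the synthetic data with the same seed as the cell above
np.random.seed(42)
X_raw = np.random.randn(1000, 10)
y_raw = (X_raw[:, 0] + X_raw[:, 1] > 0).astype(int)

df_check = pd.DataFrame(X_raw, columns=[f'Gene_{i}' for i in range(10)])
df_check['Target'] = y_raw

# Fraction of samples in each class
class_share = df_check['Target'].value_counts(normalize=True).sort_index()
print("Class share:")
print(class_share.round(3))

# Pearson correlation of each feature with the binary target
corr_with_target = df_check.drop(columns='Target').corrwith(df_check['Target'])
print("\nCorrelation with target:")
print(corr_with_target.round(2))
```

Because the target is built from `Gene_0` and `Gene_1` only, those two columns should stand out in the correlation listing, while the remaining features should sit near zero.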

## 2. Model Training and Evaluation
Now we'll train a decision tree classifier and evaluate its performance.

In [None]:
# Prepare data
X = df.drop('Target', axis=1)
y = df['Target']

# Split dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

try:
    # Train model
    model = DecisionTreeClassifier(random_state=42)
    model.fit(X_train, y_train)
    
    # Make predictions
    y_pred = model.predict(X_test)
    
    # Report overall accuracy and per-class metrics
    print("Model Performance:")
    print(f"Accuracy: {accuracy_score(y_test, y_pred):.3f}")
    print(classification_report(y_test, y_pred))

    # Feature importance visualization, most important feature first
    plt.figure(figsize=(10, 6))
    importance = pd.DataFrame({
        'feature': feature_names,
        'importance': model.feature_importances_
    }).sort_values('importance', ascending=False)
    sns.barplot(data=importance, x='importance', y='feature')
    plt.title('Feature Importance in Genomic Classification')
    plt.tight_layout()
    plt.show()

except Exception as e:
    print(f"An error occurred: {e}")
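
A single train/test split gives one estimate of performance, and that estimate can shift with the random split. One common way to get a more stable picture is k-fold cross-validation. The sketch below is an optional addition rather than part of the workflow above; it assumes the same synthetic data and uses scikit-learn's `cross_val_score` with 5 folds, and the names `X_cv`, `y_cv`, and `cv_scores` are introduced here for illustration.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Regenerate the synthetic data so this cell is self-contained
np.random.seed(42)
X_cv = np.random.randn(1000, 10)
y_cv = (X_cv[:, 0] + X_cv[:, 1] > 0).astype(int)

# Evaluate the same kind of model on 5 different train/validation splits
cv_model = DecisionTreeClassifier(random_state=42)
cv_scores = cross_val_score(cv_model, X_cv, y_cv, cv=5, scoring='accuracy')

print("Fold accuracies:", np.round(cv_scores, 3))
print(f"Mean accuracy: {cv_scores.mean():.3f} (std {cv_scores.std():.3f})")
```

The spread across folds gives a rough sense of how much the reported accuracy depends on which samples land in the test set.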

## Best Practices and Tips
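1. Always split your data into training and testing sets, and evaluate only on data the model has not seen
2. Handle errors explicitly so that a failed step reports what went wrong instead of stopping silently
3. Visualize results for better interpretation
4. Document your code and methodology
5. Apply consistent, reproducible preprocessing, fitting any scaler on the training set only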
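
Practices 1 and 5 can be combined by wrapping preprocessing and the model in a single scikit-learn pipeline, so the scaler is fitted on the training split only. The sketch below is one possible arrangement, not the method used earlier in this notebook: it swaps in logistic regression because decision trees are largely insensitive to feature scaling, and the names `X_demo`, `y_demo`, `pipe`, and `test_accuracy` are introduced here for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Regenerate the synthetic data so this cell is self-contained
np.random.seed(42)
X_demo = np.random.randn(1000, 10)
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

# Hold out 20% for testing, keeping class proportions similar in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# The pipeline fits the scaler on the training data only, then reuses
# those training statistics when transforming the test data
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)

test_accuracy = accuracy_score(y_te, pipe.predict(X_te))
print(f"Test accuracy: {test_accuracy:.3f}")
```

Keeping the scaler inside the pipeline avoids leaking test-set statistics into training, which is easy to do by accident when scaling the full dataset before splitting.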

## Conclusion
This notebook demonstrated a basic machine learning workflow for scientific research on synthetic data, focusing on:
- Data preparation and preprocessing
- Model training and evaluation
- Visualization of results
- Error handling and best practices

These skills form the foundation for scientists adapting to AI-driven research environments.