# AI-Powered Drug Checking Services Analysis

## Comprehensive Tutorial: Fixed-Site vs Festival Services in Australia

This notebook walks through the full AI-driven research pipeline, including:
- Quantitative analysis: service comparisons and diversity indices
- AI-powered qualitative analysis (NLP)
- Machine learning: trend forecasting, anomaly detection and clustering
- Network analysis
- Policy recommendations

**Author:** AI Research Team  
**Date:** 2024-2025  
**Version:** 1.0

## Setup and Imports

In [None]:
# Standard libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sys
import os
from datetime import datetime

# Add src to path
sys.path.insert(0, '../src')

# Import our modules
from analysis import DrugCheckingAnalyzer
from qualitative_analysis import QualitativeAnalyzer
from ai_nlp_analysis import AIQualitativeAnalyzer
from ml_predictive_models import NPSTrendPredictor, AnomalyDetector, SubstanceClusterAnalyzer
from ai_research_assistant import ResearchAssistant
from network_analysis import SubstanceNetworkAnalyzer

# Configure display
pd.set_option('display.max_columns', None)
plt.style.use('seaborn-v0_8-darkgrid')
%matplotlib inline

print("✓ All modules imported successfully!")

## 1. Data Generation

First, let's generate our datasets (or load existing data).

In [None]:
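# Generate data only if the CSV files are not already present
from generate_data import main as generate_quant_data
from generate_qualitative_data import main as generate_qual_data

if not os.path.exists('../data/combined_data.csv'):
    print("Generating quantitative dataset...")
    generate_quant_data()
if not os.path.exists('../data/all_interviews.csv'):
    print("Generating qualitative dataset...")
    generate_qual_data()
print("✓ Data ready!")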

## 2. Load and Explore Data

In [None]:
# Load quantitative data
quant_df = pd.read_csv('../data/combined_data.csv')
quant_df['date'] = pd.to_datetime(quant_df['date'])

print(f"Quantitative Data: {len(quant_df)} samples")
print(f"Service Types: {quant_df['service_type'].unique()}")
print(f"Date Range: {quant_df['date'].min()} to {quant_df['date'].max()}")
print(f"Unique Substances: {quant_df['substance_detected'].nunique()}")

quant_df.head()

In [None]:
# Load qualitative data
qual_df = pd.read_csv('../data/all_interviews.csv')
qual_df['interview_date'] = pd.to_datetime(qual_df['interview_date'])

print(f"Qualitative Data: {len(qual_df)} interviews")
print(f"Participant Types: {qual_df['participant_type'].unique()}")
print(f"Service Types: {qual_df['service_type'].unique()}")

qual_df.head()

## 3. Quantitative Analysis

### 3.1 Basic Comparison

In [None]:
analyzer = DrugCheckingAnalyzer(dataframe=quant_df)

# Get service comparison
comparison = analyzer.get_service_comparison()

# Display as DataFrame for better visualization
comparison_df = pd.DataFrame(comparison).T
comparison_df
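
`get_service_comparison()` is defined in `src/analysis.py`, which this notebook does not show. To give a rough idea of what a per-service summary can contain, here is a minimal pandas sketch. It assumes a boolean `is_nps` column (a guess about the schema), and the toy rows are illustrative only, not study data.

```python
import pandas as pd

def summarize_services(df: pd.DataFrame) -> pd.DataFrame:
    """Per-service summary: sample count, unique substances and NPS detection rate.

    Assumes columns 'service_type', 'substance_detected' and a boolean 'is_nps'
    (the last one is a hypothetical schema choice).
    """
    return df.groupby("service_type").agg(
        n_samples=("substance_detected", "size"),
        unique_substances=("substance_detected", "nunique"),
        nps_rate=("is_nps", "mean"),  # mean of a boolean column = share of NPS samples
    )

# Toy rows for illustration only, not study data
toy = pd.DataFrame({
    "service_type": ["Fixed-site", "Fixed-site", "Festival", "Festival"],
    "substance_detected": ["MDMA", "N-ethylpentylone", "MDMA", "MDMA"],
    "is_nps": [False, True, False, False],
})
toy_summary = summarize_services(toy)
print(toy_summary)
```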

### 3.2 Diversity Analysis

In [None]:
# Calculate diversity indices
diversity_results = {}
for service_type in quant_df['service_type'].unique():
    diversity_results[service_type] = analyzer.calculate_diversity_index(service_type)

diversity_df = pd.DataFrame(diversity_results).T
print("Diversity Indices by Service Type:")
print(diversity_df)

# Visualize
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Shannon Diversity
diversity_df['shannon_diversity'].plot(kind='bar', ax=axes[0], color=['#1f77b4', '#ff7f0e'])
axes[0].set_title('Shannon Diversity Index', fontsize=12, fontweight='bold')
axes[0].set_ylabel('Index Value')

# Simpson Diversity
diversity_df['simpson_diversity'].plot(kind='bar', ax=axes[1], color=['#1f77b4', '#ff7f0e'])
axes[1].set_title('Simpson Diversity Index', fontsize=12, fontweight='bold')
axes[1].set_ylabel('Index Value')

# Species Richness
diversity_df['species_richness'].plot(kind='bar', ax=axes[2], color=['#1f77b4', '#ff7f0e'])
axes[2].set_title('Species Richness', fontsize=12, fontweight='bold')
axes[2].set_ylabel('Unique Substances')

plt.tight_layout()
plt.show()
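
For reference, here is a minimal sketch of the three indices plotted above, computed from a list of detected substance names. It assumes the Shannon index uses the natural log and that "Simpson diversity" means the Gini-Simpson form `1 - sum(p_i^2)`. `calculate_diversity_index()` may use a different variant, such as the inverse Simpson index `1 / sum(p_i^2)`.

```python
import math
from collections import Counter

def diversity_indices(detections):
    """Shannon, Gini-Simpson and richness for a list of detected substance names."""
    counts = Counter(detections)
    total = sum(counts.values())
    proportions = [c / total for c in counts.values()]
    shannon = -sum(p * math.log(p) for p in proportions)  # natural-log Shannon entropy
    simpson = 1 - sum(p * p for p in proportions)         # Gini-Simpson form (assumed)
    return {
        "shannon_diversity": shannon,
        "simpson_diversity": simpson,
        "species_richness": len(counts),                  # number of distinct substances
    }

toy_diversity = diversity_indices(["MDMA", "MDMA", "ketamine", "cocaine"])
print(toy_diversity)
```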

## 4. AI-Powered NLP Analysis

### 4.1 Sentiment Analysis

In [None]:
ai_analyzer = AIQualitativeAnalyzer(dataframe=qual_df)

# Perform sentiment analysis
sentiment_results = ai_analyzer.perform_sentiment_analysis()

print("Sentiment Analysis Results:")
for service_type, data in sentiment_results.get('by_service_type', {}).items():
    print(f"\n{service_type}:")
    print(f"  Average Sentiment: {data.get('avg_sentiment', 0):.3f}")
    print(f"  Positive Responses: {data.get('positive_responses', 0)}")
    print(f"  Negative Responses: {data.get('negative_responses', 0)}")
    print(f"  Neutral Responses: {data.get('neutral_responses', 0)}")
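
`perform_sentiment_analysis()` is not shown here, so its scoring model is unknown. Assuming each response already carries a polarity score in [-1, 1], the per-service figures printed above could be aggregated as in this sketch. The ±0.05 neutral band is borrowed from the VADER compound-score convention, and both that threshold and the `sentiment` column name are assumptions.

```python
import pandas as pd

def summarize_sentiment(df, score_col="sentiment", threshold=0.05):
    """Aggregate per-response polarity scores into per-service means and counts."""
    by_service = {}
    for service, group in df.groupby("service_type"):
        scores = group[score_col]
        by_service[service] = {
            "avg_sentiment": float(scores.mean()),
            "positive_responses": int((scores > threshold).sum()),
            "negative_responses": int((scores < -threshold).sum()),
            # The inclusive band [-threshold, threshold] is treated as neutral
            "neutral_responses": int(scores.between(-threshold, threshold).sum()),
        }
    return {"by_service_type": by_service}

# Illustrative scores, not study data
toy = pd.DataFrame({
    "service_type": ["Festival", "Festival", "Fixed-site"],
    "sentiment": [0.6, -0.02, -0.4],
})
toy_sentiment = summarize_sentiment(toy)
```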

### 4.2 Topic Modeling

In [None]:
# Perform topic modeling
topic_results = ai_analyzer.perform_topic_modeling(n_topics=5, method='lda')

if 'topics' in topic_results and 'error' not in topic_results:
    print("Discovered Topics:\n")
    for topic_name, topic_data in topic_results['topics'].items():
        print(f"{topic_name}:")
        print(f"  Keywords: {', '.join(topic_data['keywords'][:8])}\n")
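
Whatever fitting code `perform_topic_modeling()` uses internally, LDA produces a topic-word weight matrix, and keyword lists like the ones printed above are usually read off by sorting each row. The sketch below does this with NumPy on a hand-made matrix shaped like scikit-learn's `LatentDirichletAllocation.components_` (topics × vocabulary). The vocabulary, weights and `Topic N` labels are hypothetical.

```python
import numpy as np

def top_keywords(topic_word, vocabulary, n_top=8):
    """Return the n_top highest-weighted vocabulary terms for each topic (one row per topic)."""
    topics = {}
    for idx, weights in enumerate(np.asarray(topic_word, dtype=float)):
        top_idx = np.argsort(weights)[::-1][:n_top]  # column indices by descending weight
        topics[f"Topic {idx + 1}"] = {"keywords": [vocabulary[i] for i in top_idx]}
    return topics

# Hypothetical vocabulary and topic-word weights (2 topics x 6 terms)
vocab = ["access", "testing", "queue", "police", "trust", "results"]
weights = [
    [5.0, 1.0, 0.5, 0.1, 3.0, 2.0],
    [0.2, 4.0, 6.0, 1.0, 0.3, 2.5],
]
topics = top_keywords(weights, vocab, n_top=3)
```

With a fitted scikit-learn model, the equivalent call would be along the lines of `top_keywords(lda.components_, vectorizer.get_feature_names_out())`, where `lda` and `vectorizer` are hypothetical fitted objects.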

### 4.3 Named Entity Recognition

In [None]:
entities = ai_analyzer.extract_named_entities()

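print("Most Mentioned Substances:")
# Sort by count so the first ten really are the most mentioned
top_substances = sorted(entities['substances'].items(), key=lambda kv: kv[1], reverse=True)
for substance, count in top_substances[:10]:
    print(f"  {substance}: {count}")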

print("\nService Type Mentions:")
for service, count in entities['service_types'].items():
    print(f"  {service}: {count}")
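
The extraction method behind `extract_named_entities()` is not shown. A simple approach for a closed domain like this is dictionary (gazetteer) matching. The sketch below counts case-insensitive whole-word matches and returns them most frequent first. `SUBSTANCE_TERMS` and its slang variants are hypothetical examples, not the module's list.

```python
import re
from collections import Counter

# Hypothetical gazetteer: canonical name -> surface forms to look for
SUBSTANCE_TERMS = {
    "MDMA": ["mdma", "ecstasy", "molly"],
    "ketamine": ["ketamine", "ket"],
    "cocaine": ["cocaine", "coke"],
}

def count_substance_mentions(texts, terms=SUBSTANCE_TERMS):
    """Count case-insensitive whole-word mentions per substance, most frequent first."""
    patterns = {
        name: re.compile(r"\b(?:" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE)
        for name, forms in terms.items()
    }
    counts = Counter()
    for text in texts:
        for name, pattern in patterns.items():
            hits = len(pattern.findall(text))
            if hits:
                counts[name] += hits
    return dict(counts.most_common())  # ties keep first-seen order

mentions = count_substance_mentions([
    "Tested my molly and the MDMA came back clean",
    "Someone said the ket was cut with something",
    "No cocaine this time, just ecstasy",
])
```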

## 5. Machine Learning Predictions

### 5.1 NPS Trend Forecasting

In [None]:
predictor = NPSTrendPredictor(dataframe=quant_df)

# Forecast for both service types
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

for idx, service_type in enumerate(['Fixed-site', 'Festival']):
    if service_type in quant_df['service_type'].unique():
        forecast = predictor.forecast_nps_trend(service_type, periods_ahead=6)
        
        if 'error' not in forecast:
            # Plot predictions
            periods = list(range(1, len(forecast['predictions']) + 1))
            axes[idx].plot(periods, forecast['predictions'], 'o-', label='Forecast', linewidth=2)
            
            if 'lower_bound' in forecast:
                axes[idx].fill_between(periods, forecast['lower_bound'], 
                                     forecast['upper_bound'], alpha=0.3, label='95% CI')
            
            axes[idx].axhline(y=forecast['current_rate'], color='r', 
                            linestyle='--', label='Current Rate')
            axes[idx].set_title(f'{service_type} NPS Forecast', fontsize=12, fontweight='bold')
            axes[idx].set_xlabel('Periods Ahead')
            axes[idx].set_ylabel('NPS Detection Rate')
            axes[idx].legend()
            axes[idx].grid(True, alpha=0.3)
            
            print(f"{service_type} Forecast:")
            print(f"  Current Rate: {forecast['current_rate']:.1%}")
            print(f"  Predicted Trend: {forecast['trend'].upper()}")
            print(f"  6-Period Forecast: {forecast['predictions'][-1]:.1%}\n")

plt.tight_layout()
plt.show()
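
The model inside `forecast_nps_trend()` is not shown. One minimal baseline, assuming a series of monthly NPS detection rates, is to fit a straight line and extrapolate it, as in this NumPy sketch. The band is a crude ±1.96 × residual-SD interval that ignores parameter uncertainty, so it is narrower than a true 95% prediction interval. The `flat_tol` threshold for calling a trend "stable" is also an assumption.

```python
import numpy as np

def linear_nps_forecast(monthly_rates, periods_ahead=6, z=1.96, flat_tol=1e-3):
    """Extrapolate a least-squares straight line through monthly NPS detection rates."""
    y = np.asarray(monthly_rates, dtype=float)
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, deg=1)
    # Spread of the fit residuals (2 parameters estimated, hence ddof=2)
    residual_sd = float(np.std(y - (slope * x + intercept), ddof=2)) if len(y) > 2 else 0.0
    future_x = np.arange(len(y), len(y) + periods_ahead)
    preds = np.clip(slope * future_x + intercept, 0.0, 1.0)  # rates must stay in [0, 1]
    if slope > flat_tol:
        trend = "increasing"
    elif slope < -flat_tol:
        trend = "decreasing"
    else:
        trend = "stable"
    return {
        "predictions": preds.tolist(),
        "lower_bound": np.clip(preds - z * residual_sd, 0.0, 1.0).tolist(),
        "upper_bound": np.clip(preds + z * residual_sd, 0.0, 1.0).tolist(),
        "current_rate": float(y[-1]),
        "trend": trend,
    }

# Illustrative monthly rates, not study data
toy_forecast = linear_nps_forecast([0.10, 0.12, 0.14, 0.16], periods_ahead=2)
```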

### 5.2 Anomaly Detection

In [None]:
anomaly_detector = AnomalyDetector(dataframe=quant_df)

# Detect emerging substances
emerging = anomaly_detector.detect_emerging_substances(threshold_days=30)

print("🚨 Emerging Substances (Last 30 Days):\n")
for service_type, data in emerging.items():
    print(f"{service_type}: {data['emerging_count']} new substances")
    for sub in data['substances'][:5]:
        print(f"  - {sub['substance']} (NPS: {sub['is_nps']}) - First detected: {sub['first_detected']}")
    print()
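
A substance can be treated as "emerging" when its first detection at a service falls inside the last `threshold_days`. The pandas sketch below assumes the window is measured back from each service's most recent sample date (the module may use today's date instead) and leaves out the NPS flag for brevity. The toy rows are illustrative.

```python
import pandas as pd

def emerging_substances(df, threshold_days=30):
    """Per service, list substances first detected within threshold_days of the latest sample."""
    results = {}
    for service, group in df.groupby("service_type"):
        first_seen = group.groupby("substance_detected")["date"].min()
        cutoff = group["date"].max() - pd.Timedelta(days=threshold_days)
        recent = first_seen[first_seen > cutoff].sort_values()  # oldest new arrival first
        results[service] = {
            "emerging_count": len(recent),
            "substances": [
                {"substance": name, "first_detected": ts.date()}
                for name, ts in recent.items()
            ],
        }
    return results

toy = pd.DataFrame({
    "service_type": ["Festival"] * 4,
    "substance_detected": ["MDMA", "MDMA", "ketamine", "2C-B"],
    "date": pd.to_datetime(["2024-01-01", "2024-03-01", "2024-02-20", "2024-02-25"]),
})
emerging_toy = emerging_substances(toy, threshold_days=30)
```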

### 5.3 Substance Clustering

In [None]:
cluster_analyzer = SubstanceClusterAnalyzer(dataframe=quant_df)
clusters = cluster_analyzer.cluster_by_detection_patterns(n_clusters=4)

if 'error' not in clusters:
    print("Substance Clusters:\n")
    for cluster_name, data in clusters.items():
        print(f"{cluster_name} ({data['count']} substances):")
        print(f"  NPS %: {data['characteristics']['nps_percentage']:.1f}%")
        print(f"  Fixed-site %: {data['characteristics']['fixed_site_percentage']:.1f}%")
        print(f"  Substances: {', '.join(data['substances'][:5])}")
        if len(data['substances']) > 5:
            print(f"  ... and {len(data['substances']) - 5} more")
        print()
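
`cluster_by_detection_patterns()` may rely on a library such as scikit-learn; that is not visible here. To illustrate the technique itself, here is plain Lloyd's k-means in NumPy on standardised per-substance features. The feature set (NPS flag, share of detections at fixed sites, total detections) is a hypothetical choice suggested by the characteristics printed above.

```python
import numpy as np

def kmeans(features, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's k-means on standardised features; returns one label per row."""
    X = np.asarray(features, dtype=float)
    # Standardise each column so no single feature dominates the distance
    std = X.std(axis=0)
    X = (X - X.mean(axis=0)) / np.where(std > 0, std, 1.0)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        # Assign each row to its nearest centroid (squared Euclidean distance)
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels

# Hypothetical per-substance features: [is_nps, fixed_site_share, total_detections]
features = [
    [1, 0.9, 3],
    [1, 0.8, 2],
    [0, 0.2, 40],
    [0, 0.1, 50],
]
labels = kmeans(features, n_clusters=2)
```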

## 6. Network Analysis

### 6.1 Substance Co-occurrence Networks

In [None]:
network_analyzer = SubstanceNetworkAnalyzer(dataframe=quant_df)
networks = network_analyzer.build_temporal_cooccurrence_network(time_window='W', min_cooccurrence=2)

print("Co-occurrence Network Analysis:\n")
for service_type, data in networks.items():
    if 'error' not in data:
        print(f"{service_type}:")
        if 'nodes' in data:
            print(f"  Network Size: {data['nodes']} substances, {data['edges']} connections")
            print(f"  Network Density: {data['density']:.3f}")
            print(f"  Average Clustering: {data['avg_clustering']:.3f}")
            
            if data['top_central_substances']:
                print("  Most Central Substances:")
                for substance, centrality in data['top_central_substances'][:5]:
                    print(f"    - {substance}: {centrality:.3f}")
        else:
            print(f"  Unique Substances: {data['unique_substances']}")
            print(f"  Co-occurrence Pairs: {data['cooccurrence_pairs']}")
        print()
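
Here is a minimal sketch of a windowed co-occurrence network that does not need networkx. Two substances are linked when both are detected in the same weekly window, pairs seen in fewer than `min_cooccurrence` windows are dropped, and density (`2E / (N(N-1))`) and degree centrality (degree / (N - 1), the networkx convention) are computed directly. Run it on one service type at a time to mirror the per-service output above. Defining co-occurrence by a shared time window is an assumption about what `build_temporal_cooccurrence_network()` does.

```python
from collections import Counter
from itertools import combinations

import pandas as pd

def cooccurrence_network(df, time_window="W", min_cooccurrence=2):
    """Link substances detected in the same time window; keep pairs seen in >= min_cooccurrence windows."""
    pair_counts = Counter()
    windows = df.groupby(df["date"].dt.to_period(time_window))["substance_detected"]
    for _, substances in windows:
        # Each unordered pair counts once per window
        for a, b in combinations(sorted(set(substances)), 2):
            pair_counts[(a, b)] += 1
    edges = {pair: count for pair, count in pair_counts.items() if count >= min_cooccurrence}
    nodes = {s for pair in edges for s in pair}
    n = len(nodes)
    # Undirected density: actual edges / possible edges
    density = 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0
    degree = Counter(s for pair in edges for s in pair)
    centrality = {s: d / (n - 1) for s, d in degree.items()} if n > 1 else {}
    return {
        "nodes": n,
        "edges": len(edges),
        "density": density,
        "top_central_substances": sorted(centrality.items(), key=lambda kv: kv[1], reverse=True),
    }

# Two illustrative weeks of detections at one service
toy = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-03", "2024-01-08", "2024-01-09"]),
    "substance_detected": ["MDMA", "ketamine", "cocaine", "MDMA", "ketamine"],
})
net = cooccurrence_network(toy)
```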

### 6.2 NPS Diffusion Analysis

In [None]:
diffusion = network_analyzer.analyze_nps_diffusion()

print("NPS Diffusion Patterns:\n")
for service_type, data in diffusion.items():
    print(f"{service_type}:")
    print(f"  Total NPS Types: {data['total_nps_types']}")
    print(f"  Avg New NPS per Month: {data['avg_new_nps_per_month']:.2f}")
    if data['first_nps_detected']:
        print(f"  First Detection: {data['first_nps_detected']}")
    if data['latest_nps_detected']:
        print(f"  Latest Detection: {data['latest_nps_detected']}")
    print()
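
The diffusion metrics printed above can be approximated from first-detection dates. This pandas sketch assumes a boolean `is_nps` column and averages new NPS types over every calendar month in each service's observation window, so months with no new NPS count as zero. It also reports the most recent *first* detection as `latest_nps_detected`. The module's own definitions may differ.

```python
import pandas as pd

def nps_diffusion(df):
    """Summarise when new NPS types first appear at each service (assumes a boolean 'is_nps' column)."""
    results = {}
    for service, group in df.groupby("service_type"):
        nps_rows = group[group["is_nps"]]
        if nps_rows.empty:
            continue
        # First date each NPS type was seen at this service
        first_seen = nps_rows.groupby("substance_detected")["date"].min()
        # Every calendar month in the service's observation window, including quiet months
        months = pd.period_range(group["date"].min(), group["date"].max(), freq="M")
        results[service] = {
            "total_nps_types": len(first_seen),
            "avg_new_nps_per_month": len(first_seen) / len(months),
            "first_nps_detected": first_seen.min().date(),
            "latest_nps_detected": first_seen.max().date(),
        }
    return results

toy = pd.DataFrame({
    "service_type": ["Fixed-site"] * 4,
    "substance_detected": ["MDMA", "N-ethylpentylone", "N-ethylpentylone", "protonitazene"],
    "is_nps": [False, True, True, True],
    "date": pd.to_datetime(["2024-01-10", "2024-01-15", "2024-03-05", "2024-03-20"]),
})
diffusion_toy = nps_diffusion(toy)
```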

## 7. AI Research Assistant

### 7.1 Generate Research Questions

In [None]:
assistant = ResearchAssistant(quantitative_data=quant_df, qualitative_data=qual_df)

research_questions = assistant.generate_research_questions()

print("AI-Generated Research Questions:\n")
for category, questions in research_questions.items():
    if questions:
        print(f"{category.replace('_', ' ').title()}:")
        for i, q in enumerate(questions[:3], 1):
            print(f"  {i}. {q}")
        print()

### 7.2 Generate Hypotheses

In [None]:
hypotheses = assistant.generate_hypotheses()

print("Testable Hypotheses:\n")
for h in hypotheses[:3]:
    print(f"{h['id']}: {h['hypothesis']}")
    print(f"  Type: {h['type']}")
    print(f"  Suggested tests: {', '.join(h['suggested_tests'])}")
    print(f"  Implications: {h['implications']}")
    print()
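
The `suggested_tests` list comes from the module, so its contents are not known here. Assuming a chi-square test of independence is among the suggestions (for example, service type × whether NPS was detected), here is a dependency-free sketch for a 2×2 table without continuity correction. With one degree of freedom the p-value is `erfc(sqrt(chi2 / 2))`. The counts below are hypothetical.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the table [[a, b], [c, d]], no Yates correction.

    Assumes every row and column total is positive.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts: rows = Fixed-site, Festival; columns = NPS detected, no NPS
chi2, p = chi_square_2x2(30, 70, 10, 90)
```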

### 7.3 Key Insights

In [None]:
insights = assistant.extract_key_insights()

print("Key Insights:\n")
for category, insights_list in insights.items():
    if insights_list:
        print(f"{category.replace('_', ' ').title()}:")
        for insight in insights_list[:2]:
            print(f"  • {insight['insight']}")
            if 'implication' in insight:
                print(f"    → {insight['implication']}")
        print()

### 7.4 Policy Recommendations

In [None]:
recommendations = assistant.generate_policy_recommendations()

print("Policy Recommendations:\n")
for rec in recommendations[:3]:
    print(f"{rec['id']} [{rec['priority'].upper()}]: {rec['recommendation']}")
    print("\n  Rationale:")
    for r in rec['rationale'][:2]:
        print(f"    • {r}")
    print("\n  Expected Outcomes:")
    for outcome in rec['expected_outcomes'][:2]:
        print(f"    ✓ {outcome}")
    print("\n" + "="*80 + "\n")

## 8. Comprehensive Report Generation

In [None]:
# Generate and save comprehensive research report
report_path = '../outputs/ai_research_report.txt'
os.makedirs(os.path.dirname(report_path), exist_ok=True)  # make sure the output folder exists
assistant.export_research_report(report_path)

print(f"✓ Comprehensive research report saved to: {report_path}")
print("\nReport includes:")
print("  - Research questions")
print("  - Testable hypotheses")
print("  - Key insights")
print("  - Policy recommendations")

## Summary

This tutorial demonstrated the complete AI-driven research pipeline for analyzing drug checking services:

1. **Quantitative Analysis**: Statistical comparisons, diversity indices, early warning indicators
2. **AI-Powered NLP**: Sentiment analysis, topic modeling, named entity recognition
3. **Machine Learning**: Trend forecasting, anomaly detection, substance clustering
4. **Network Analysis**: Co-occurrence patterns, NPS diffusion, temporal evolution
5. **Research Assistant**: Automated hypothesis generation, insight extraction, policy recommendations

### Key Findings:

- In the generated datasets, fixed-site services show higher substance diversity and NPS detection rates than festival services
- The two service models appear to play complementary roles in harm reduction
- NLP methods surface sentiment and thematic patterns in the interview data that are hard to see by reading alone
- Trend forecasts and anomaly alerts could support more proactive public health responses
- Co-occurrence networks help describe drug market dynamics

### Next Steps:

- Integrate with real-world data
- Implement real-time monitoring dashboards
- Develop automated alert systems
- Expand predictive modeling capabilities
- Conduct longitudinal studies