webro12138/PyIM

🌐 PyIM - Python Influence Maximization Package

A Comprehensive Python Package for Influence Maximization, Network Analysis, and Information Diffusion Modeling

🌟 Overview

PyIM is a Python package for researchers and practitioners working on influence maximization, network analysis, and information diffusion modeling in complex networks. It provides influence maximization algorithms, multiple diffusion models, and evaluation metrics for studying information spread in social networks, biological networks, and other complex systems.

🎯 Key Capabilities

  • 🌐 Network Support: Single and multi-layer networks with full NetworkX integration
  • 🔬 7 Diffusion Models: IC, LT, SI, SIR, SIS, CIC, MLIC with multiprocessing
  • 🧮 5+ Algorithms: Greedy, CELF, Centrality-based, SAW_ASA, SFLA
  • 📈 Evaluation Framework: Comprehensive metrics and benchmarking tools
  • ⚡ Performance Optimized: Multiprocessing support for large-scale networks
  • 🛡️ Robust: Comprehensive error handling and validation

📊 Use Cases

  • Social Network Analysis: Identify influential users for viral marketing
  • Epidemiology: Model disease spread and identify key intervention points
  • Recommendation Systems: Optimize information dissemination
  • Security: Detect critical nodes for network protection
  • Biological Networks: Study protein interactions and gene regulation

✨ Features

🌐 Network Support

  • Single Layer Networks (SLN): Standard graph structures
  • Multi-Layer Networks (MLN): Multiplex and interconnected networks
  • Full NetworkX Integration: Seamless compatibility with NetworkX ecosystem
  • Network Statistics: Comprehensive analysis and metrics
  • Flexible Operations: Node/edge manipulation, attribute management

🔬 Diffusion Models

  • Independent Cascade (IC): Classic probabilistic diffusion model
  • Linear Threshold (LT): Threshold-based activation model
  • Susceptible-Infected (SI): Simple epidemic model
  • Susceptible-Infected-Recovered (SIR): Epidemic model with recovery
  • Susceptible-Infected-Susceptible (SIS): Endemic disease model
  • Competitive Independent Cascade (CIC): Multi-competitor diffusion
  • Multi-Layer Independent Cascade (MLIC): Cross-layer diffusion
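
As an illustration of the Independent Cascade model listed above, here is a minimal pure-Python sketch that does not use PyIM. The helper names (`undirected_adjacency`, `simulate_ic`, `estimate_spread`) are hypothetical, and a single uniform activation probability `p` is assumed for every edge.

```python
import random


def undirected_adjacency(edges):
    """Build an adjacency dict from a list of undirected edges."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return adjacency


def simulate_ic(adjacency, seeds, p, rng):
    """Run one Independent Cascade simulation and return the activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in adjacency.get(node, ()):
                # A newly active node gets one chance to activate each inactive neighbor.
                if neighbor not in active and rng.random() < p:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


def estimate_spread(adjacency, seeds, p, runs=1000, seed=0):
    """Average the number of activated nodes over several independent runs."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(adjacency, seeds, p, rng)) for _ in range(runs))
    return total / runs


adjacency = undirected_adjacency([(1, 2), (2, 3), (3, 4), (4, 1), (5, 6)])
spread = estimate_spread(adjacency, seeds=[1], p=0.1)
print(f"Estimated spread from node 1: {spread:.2f}")
```

The estimate counts the seed itself, so it is never below the number of seeds and never above the size of the seed's connected component.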

🧮 Influence Maximization Algorithms

  • Greedy Algorithm: Classic greedy approach with a (1-1/e) approximation guarantee when the spread function is monotone and submodular (as under IC and LT)
  • Cost-Effective Lazy Forward (CELF): Optimized greedy with lazy evaluation
  • Centrality-Based Selection: Degree, betweenness, closeness, eigenvector
  • Simulated Annealing (SAW_ASA): Metaheuristic optimization
  • Shuffled Frog Leaping (SFLA): Population-based optimization
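
The difference between plain greedy selection and CELF's lazy evaluation can be sketched without PyIM. The example below is an illustration under stated assumptions, not PyIM's implementation: a deterministic coverage function stands in for the Monte Carlo spread estimate, and the `reach` data and function names are hypothetical.

```python
import heapq


def coverage(reach, seeds):
    """Number of distinct nodes reached by the seed set (monotone and submodular)."""
    covered = set()
    for s in seeds:
        covered |= reach[s]
    return len(covered)


def greedy(reach, k):
    """Re-evaluate every remaining candidate in every round."""
    seeds, evaluations = [], 0
    for _ in range(k):
        base = coverage(reach, seeds)
        best_node, best_gain = None, -1
        for node in reach:
            if node in seeds:
                continue
            evaluations += 1
            gain = coverage(reach, seeds + [node]) - base
            if gain > best_gain:
                best_node, best_gain = node, gain
        seeds.append(best_node)
    return seeds, evaluations


def celf(reach, k):
    """Lazy greedy: re-evaluate a candidate only when it reaches the top of the heap."""
    evaluations = 0
    heap = []
    for node in reach:
        evaluations += 1
        # Entries are (negated gain, node, round in which the gain was computed).
        heap.append((-coverage(reach, [node]), node, 0))
    heapq.heapify(heap)
    seeds, current = [], 0
    while len(seeds) < k and heap:
        neg_gain, node, computed_in = heapq.heappop(heap)
        if computed_in == len(seeds):
            # The gain is up to date and beats every stale upper bound: select it.
            seeds.append(node)
            current += -neg_gain
        else:
            evaluations += 1
            gain = coverage(reach, seeds + [node]) - current
            heapq.heappush(heap, (-gain, node, len(seeds)))
    return seeds, evaluations


# Hypothetical data: the set of nodes each candidate seed would reach.
reach = {1: {1, 2, 3, 4}, 2: {2, 3}, 3: {3, 4, 5}, 4: {6, 7}, 5: {7}, 6: {8}}
greedy_seeds, greedy_evals = greedy(reach, k=3)
celf_seeds, celf_evals = celf(reach, k=3)
print(greedy_seeds, greedy_evals)
print(celf_seeds, celf_evals)
```

Submodularity guarantees that a marginal gain computed in an earlier round is an upper bound on the current gain, which is why stale heap entries can be skipped safely. With a noisy Monte Carlo spread estimate the two methods may occasionally pick different seeds.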

📈 Evaluation Framework

  • Influence Spread Metrics: Direct diffusion simulation
  • Structural Metrics: Distance, clustering, connectivity
  • Comparative Metrics: Kendall Tau, Jaccard, Spearman
  • Batch Evaluation: Systematic multi-algorithm comparison
  • Performance Benchmarking: Timing and memory profiling
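
Two of the comparative metrics named above have short textbook definitions. The sketch below computes them in plain Python; the function names are hypothetical and this is not PyIM's metrics API. The Kendall tau variant shown is tau-a, which assumes both rankings contain the same items with no ties.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two seed sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def kendall_tau(rank_a, rank_b):
    """Kendall tau-a between two rankings of the same items."""
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant when both rankings order it the same way.
        sign = (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(rank_a) * (len(rank_a) - 1) // 2
    return (concordant - discordant) / pairs


print(jaccard([1, 2, 3], [2, 3, 4]))
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))
```

Jaccard compares which nodes two algorithms selected, while Kendall tau compares the order in which they ranked a shared set of nodes.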

⚡ Performance Optimization

  • Multiprocessing Support: All diffusion models support parallel execution
  • Performance Monitoring: Built-in timing and profiling
  • Memory Efficient: Optimized for large-scale networks
  • Scalable: Tested on networks with 10,000+ nodes
  • Flexible Configuration: Tune performance for your hardware

🛡️ Robust Error Handling

  • Comprehensive Exception Hierarchy: Clear error categorization
  • Informative Error Messages: Easy debugging and troubleshooting
  • Input Validation: Prevent common mistakes
  • Consistent Patterns: Uniform error handling across modules
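
The general pattern behind an exception hierarchy with input validation can be sketched as follows. The class and function names (`SketchError`, `SketchValidationError`, `validate_seed_budget`) are hypothetical and chosen for illustration only; PyIM's actual exception classes are described in its error handling guide.

```python
class SketchError(Exception):
    """Base class, so callers can catch every library error with one except clause."""


class SketchValidationError(SketchError, ValueError):
    """Raised when an argument fails input validation."""


def validate_seed_budget(k, number_of_nodes):
    """Check that the requested seed count is a sensible integer."""
    if not isinstance(k, int) or isinstance(k, bool):
        raise SketchValidationError(f"k must be an integer, got {type(k).__name__}")
    if not 1 <= k <= number_of_nodes:
        raise SketchValidationError(
            f"k must be between 1 and {number_of_nodes}, got {k}"
        )
    return k


try:
    validate_seed_budget(10, number_of_nodes=6)
except SketchError as exc:
    message = str(exc)
    print(f"Rejected: {message}")
```

Deriving the validation error from both the base class and `ValueError` lets existing `except ValueError` handlers keep working.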


📦 Installation

🎯 Prerequisites

  • Python: 3.7 or higher (note that the networkx>=2.8.0 dependency listed under Method 3 itself requires Python 3.8+)
  • pip: Python package manager
  • Operating System: Windows, macOS, or Linux

📥 Installation Methods

Method 1: Install from PyPI (Recommended)

pip install PyIM

Method 2: Install from Source

# Clone the repository
git clone https://github.com/webro12138/PyIM.git
cd PyIM

# Install dependencies
pip install -r PyIM/requirements.txt

# Install the package
pip install -e .

Method 3: Manual Dependency Installation

# Quote the version specifiers so the shell does not treat ">" as a redirect
pip install "networkx>=2.8.0"
pip install "numpy>=1.21.0"
pip install "rich>=12.0.0"
pip install "psutil>=5.9.0"

✅ Verification

Verify installation by running:

import PyIM
print(f"PyIM version: {PyIM.__version__}")
print(f"PyIM author: {PyIM.__author__}")

Expected output:

PyIM version: 1.0.0
PyIM author: PyIM Team

🚀 Quick Start

📝 Basic Usage Example

from PyIM import SLN, IC, Greedy
from PyIM.diffusionModel import ICWeighter

# Step 1: Create a network
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6)]
network = SLN("example_network", "undirected", edges=edges)

# Step 2: Set edge weights for diffusion
weighter = ICWeighter(weighting_type="uniform", active_probability=0.1)
weighter(network)

# Step 3: Create diffusion model
ic_model = IC(MC=1000, verbose=False)

# Step 4: Create influence maximization algorithm
greedy = Greedy(ic_model, verbose=False)

# Step 5: Find optimal seed set
seed_nodes = greedy(network, k=3)

# Step 6: Evaluate influence spread
influence = ic_model(network, seed_nodes)

# Step 7: Display results
print(f"📊 Network Statistics:")
print(f"   Nodes: {network.number_of_nodes()}")
print(f"   Edges: {network.number_of_edges()}")
print(f"   Density: {network.get_network_statistics()['density']:.4f}")
print(f"\n🎯 Influence Maximization Results:")
print(f"   Seed nodes: {seed_nodes}")
print(f"   Influence spread: {influence:.2f}")
print(f"   Coverage: {influence/network.number_of_nodes()*100:.1f}%")

Example output (the selected seeds and the spread are Monte Carlo estimates, so exact values vary between runs):

📊 Network Statistics:
   Nodes: 6
   Edges: 5
   Density: 0.3333

🎯 Influence Maximization Results:
   Seed nodes: [1, 2, 3]
   Influence spread: 2.45
   Coverage: 40.8%
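
The density value can be checked by hand. The sketch below assumes the standard definition for an undirected simple graph, 2m / (n(n-1)), which reproduces the 0.3333 shown; `undirected_density` is a hypothetical helper, not part of PyIM.

```python
def undirected_density(n, m):
    """Density of an undirected simple graph with n nodes and m edges."""
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


edges = [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6)]
nodes = {node for edge in edges for node in edge}

# 6 nodes and 5 edges give 2 * 5 / (6 * 5) = 10 / 30
density = undirected_density(len(nodes), len(edges))
print(f"Density: {density:.4f}")  # Density: 0.3333
```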

🔧 Advanced Example with Multiprocessing

from PyIM import SLN
from PyIM.diffusionModel import IC, ICWeighter
from PyIM.algorithm import Greedy
import time

# Create a larger network
edges = [(i, i+1) for i in range(1, 100)] + [(1, 50), (50, 100)]
network = SLN("large_network", "undirected", edges=edges)

# Set weights
weighter = ICWeighter(weighting_type="uniform", active_probability=0.05)
weighter(network)

# Create model with multiprocessing
ic_model = IC(MC=10000, verbose=True, enable_monitoring=True)

# Test single process
start = time.time()
result_single = ic_model(network, [1, 2, 3], multiprocess=False)
time_single = time.time() - start

# Test multiprocessing
start = time.time()
result_multi = ic_model(network, [1, 2, 3], multiprocess=True, n_processes=4)
time_multi = time.time() - start

# Compare results
print(f"\n⚡ Performance Comparison:")
print(f"   Single process: {time_single:.2f}s, Influence: {result_single:.2f}")
print(f"   Multiprocess:    {time_multi:.2f}s, Influence: {result_multi:.2f}")
print(f"   Speedup:         {time_single/time_multi:.2f}x")

# Get performance metrics
if ic_model._performance_monitor:
    summary = ic_model._performance_monitor.get_summary()
    print(f"\n📊 Performance Metrics: {summary}")

📚 Documentation

📖 Module Documentation

Module | Description | Documentation
🌐 Network | Single and multi-layer network classes | Network Module
🔬 Diffusion Models | Information diffusion simulation | Diffusion Models
🧮 Algorithms | Influence maximization algorithms | Algorithms
📈 Evaluation | Performance evaluation framework | Evaluation
📊 Dataset | Dataset management and loading | Dataset
⚙️ Configuration | Configuration management | Configuration
🛠️ Utils | Utility functions and helpers | Utilities
⚠️ Error Handling | Exception handling guide | Error Handling
⚡ Performance | Performance optimization guide | Performance

💡 Examples

🌐 Example 1: Network Creation and Analysis

from PyIM import SLN, MLN
from PyIM.network import load_networks

# Create a single layer network
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6)]
sln = SLN("social_network", "undirected", edges=edges)

# Analyze network
stats = sln.get_network_statistics()
print(f"📊 Network Analysis:")
print(f"   Name: {stats['name']}")
print(f"   Nodes: {stats['nodes']}")
print(f"   Edges: {stats['edges']}")
print(f"   Density: {stats['density']:.4f}")
print(f"   Average Degree: {stats['avg_degree']:.2f}")

# Create a multi-layer network
layer1_edges = [(1, 2), (2, 3), (3, 4)]
layer2_edges = [(1, 3), (3, 5), (5, 2)]
layer3_edges = [(2, 4), (4, 5), (5, 1)]
edges_of_layers = [layer1_edges, layer2_edges, layer3_edges]
mln = MLN("multiplex_network", "undirected", edges_of_layers=edges_of_layers)

print(f"\n🌐 Multi-Layer Network:")
print(f"   Layers: {mln.number_of_layers()}")
print(f"   Total Nodes: {mln.number_of_nodes()}")
print(f"   Total Edges: {mln.number_of_edges()}")

# Load multiple networks from specifications
network_specs = [
    {
        'name': 'network1',
        'type': 'SLN',
        'directionality': 'undirected',
        'edges': [(1, 2), (2, 3), (3, 4)]
    },
    {
        'name': 'network2',
        'type': 'SLN',
        'directionality': 'directed',
        'edges': [(1, 2), (2, 3), (3, 1)]
    }
]

networks = load_networks(network_specs)
print(f"\n📦 Loaded {len(networks)} networks:")
for net in networks:
    print(f"   - {net.name}: {net.number_of_nodes()} nodes, {net.number_of_edges()} edges")

🔬 Example 2: Diffusion Model Comparison

from PyIM import SLN
from PyIM.diffusionModel import IC, LT, SIR, ICWeighter, LTWeighter, SIRWeighter
import matplotlib.pyplot as plt  # optional: only needed for the commented-out plot at the end (install matplotlib separately)

# Create network
edges = [(i, i+1) for i in range(1, 20)] + [(1, 10), (10, 20)]
network = SLN("test_network", "undirected", edges=edges)

# Set weights for different models
ic_weighter = ICWeighter(weighting_type="uniform", active_probability=0.1)
ic_weighter(network)

lt_weighter = LTWeighter(weighting_type="uniform", threshold=0.3)
lt_weighter(network)

sir_weighter = SIRWeighter(weighting_type="uniform", infection_prob=0.1, recovery_prob=0.05)
sir_weighter(network)

# Create models
ic_model = IC(MC=1000, verbose=False)
lt_model = LT(MC=1000, verbose=False)
sir_model = SIR(MC=1000, verbose=False)

# Test different seed sets
seed_sets = [[1], [1, 5], [1, 5, 10], [1, 5, 10, 15]]

print("📊 Diffusion Model Comparison:")
print("-" * 60)

for seeds in seed_sets:
    ic_result = ic_model(network, seeds)
    lt_result = lt_model(network, seeds)
    sir_result = sir_model(network, seeds)

    print(f"Seeds {seeds}:")
    print(f"  IC:  {ic_result:.2f} activated nodes")
    print(f"  LT:  {lt_result:.2f} activated nodes")
    print(f"  SIR: {sir_result:.2f} activated nodes")
    print()

# Plot results (optional)
# sizes = [len(s) for s in seed_sets]  # x-axis: actual seed set sizes (1 to 4)
# plt.figure(figsize=(10, 6))
# plt.plot(sizes, [ic_model(network, s) for s in seed_sets], 'o-', label='IC')
# plt.plot(sizes, [lt_model(network, s) for s in seed_sets], 's-', label='LT')
# plt.plot(sizes, [sir_model(network, s) for s in seed_sets], '^-', label='SIR')
# plt.xlabel('Seed Set Size')
# plt.ylabel('Influence Spread')
# plt.title('Diffusion Model Comparison')
# plt.legend()
# plt.grid(True)
# plt.show()

🧮 Example 3: Algorithm Comparison

from PyIM import SLN
from PyIM.diffusionModel import IC, ICWeighter
from PyIM.algorithm import Greedy, CELF, CentralitySeedSelector
import time

# Create network
edges = [(i, i+1) for i in range(1, 50)] + [(1, 25), (25, 50)]
network = SLN("algorithm_test", "undirected", edges=edges)

# Set weights
weighter = ICWeighter(weighting_type="uniform", active_probability=0.1)
weighter(network)

# Create diffusion model
ic_model = IC(MC=1000, verbose=False)

# Create algorithms
algorithms = {
    'Greedy': Greedy(ic_model, verbose=False),
    'CELF': CELF(ic_model, verbose=False),
    'Degree': CentralitySeedSelector(centrality_type='degree'),
    'Betweenness': CentralitySeedSelector(centrality_type='betweenness'),
    'Closeness': CentralitySeedSelector(centrality_type='closeness')
}

# Test different k values
k_values = [5, 10, 15, 20]

print("🧮 Algorithm Performance Comparison")
print("=" * 70)

for k in k_values:
    print(f"\n📊 k = {k}:")
    print("-" * 70)
    
    results = {}
    times = {}
    
    for name, algorithm in algorithms.items():
        start = time.time()
        seeds = algorithm(network, k=k)
        elapsed = time.time() - start
        
        # Evaluate influence
        influence = ic_model(network, seeds)
        
        results[name] = influence
        times[name] = elapsed
        
        print(f"{name:15s}: Influence = {influence:6.2f}, Time = {elapsed:6.3f}s")
    
    # Find best algorithm
    best_algorithm = max(results, key=results.get)
    fastest_algorithm = min(times, key=times.get)
    
    print(f"\n🏆 Best Influence: {best_algorithm} ({results[best_algorithm]:.2f})")
    print(f"⚡ Fastest: {fastest_algorithm} ({times[fastest_algorithm]:.3f}s)")

📈 Example 4: Comprehensive Evaluation

from PyIM import SLN
from PyIM.diffusionModel import IC, ICWeighter
from PyIM.algorithm import Greedy, CentralitySeedSelector
from PyIM.evaluation import Evaluation
from PyIM.evaluation.metrics import InfluenceSpread, Distance, ClusteringCoefficient

# Create multiple networks
networks = [
    SLN("network1", "undirected", edges=[(i, i+1) for i in range(1, 20)]),
    SLN("network2", "undirected", edges=[(i, i+1) for i in range(1, 30)]),
    SLN("network3", "undirected", edges=[(i, i+1) for i in range(1, 40)])
]

# Set weights for all networks
weighter = ICWeighter(weighting_type="uniform", active_probability=0.1)
for network in networks:
    weighter(network)

# Create diffusion model
ic_model = IC(MC=1000, verbose=False)

# Create metrics
metrics = [
    InfluenceSpread(ic_model),
    Distance(),
    ClusteringCoefficient()
]

# Create algorithms
algorithms = [
    Greedy(ic_model, verbose=False),
    CentralitySeedSelector(centrality_type='degree')
]

# Create evaluation framework
evaluator = Evaluation(
    networks=networks,
    metrics=metrics,
    algorithms=algorithms,
    k_range=[5, 10, 15],
    verbose=True
)

# Run evaluation
print("📈 Running Comprehensive Evaluation...")
print("=" * 70)
results = evaluator.run_evaluation()

# Display results
print("\n📊 Evaluation Results:")
print("=" * 70)

for metric_name, metric_results in results.items():
    print(f"\n📏 {metric_name}:")
    for network_name, network_results in metric_results.items():
        print(f"   {network_name}:")
        for algo_name, algo_results in network_results.items():
            print(f"      {algo_name}: {algo_results}")

🔧 Configuration

⚙️ Environment Variables

Configure PyIM using environment variables:

# Data directories
export PYIM_DATA_DIR="/path/to/data"
export PYIM_DOWNLOAD_DIR="/path/to/downloads"
export PYIM_TEMP_DIR="/path/to/temp"

# Logging
export PYIM_LOG_LEVEL="INFO"
export PYIM_LOG_FORMAT="%(asctime)s - %(name)s - %(levelname)s - %(message)s"

# Performance
export PYIM_MAX_WORKERS="4"
export PYIM_CACHE_ENABLED="true"
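
Environment variables always arrive as strings, so numeric and boolean settings have to be parsed before use. The sketch below shows one way this could be done for three of the variables above. It is not PyIM's loader: `read_settings`, the fallback defaults, and the accepted boolean spellings are assumptions.

```python
import os


def read_settings(environ=None):
    """Parse a few PYIM_* variables into typed values (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    truthy = {"1", "true", "yes"}
    return {
        "log_level": environ.get("PYIM_LOG_LEVEL", "INFO").upper(),
        # int() raises ValueError on a malformed value, which surfaces typos early.
        "max_workers": int(environ.get("PYIM_MAX_WORKERS", "1")),
        "cache_enabled": environ.get("PYIM_CACHE_ENABLED", "false").strip().lower() in truthy,
    }


settings = read_settings({"PYIM_MAX_WORKERS": "4", "PYIM_CACHE_ENABLED": "true"})
print(settings)
```

Passing a plain dict instead of `os.environ` keeps the parsing logic easy to test.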

📝 Configuration File

Create a config.json file:

{
  "data_dir": "/path/to/data",
  "download_dir": "/path/to/downloads",
  "temp_dir": "/path/to/temp",
  "log_level": "INFO",
  "max_workers": 4,
  "cache_enabled": true,
  "validation_strict": false
}

Load configuration:

from PyIM.config import PyIMConfig

# Load from file
config = PyIMConfig.from_file("config.json")

# Or use global configuration
from PyIM.config import get_config
config = get_config()

# Access configuration
print(f"Data directory: {config.data_dir}")
print(f"Log level: {config.log_level}")
print(f"Max workers: {config.max_workers}")

⚡ Performance

🚀 Multiprocessing

All diffusion models support multiprocessing for improved performance:

from PyIM import SLN
from PyIM.diffusionModel import IC, ICWeighter

# Create network
network = SLN("large_network", "undirected", edges=[(i, i+1) for i in range(1, 1000)])

# Set weights
weighter = ICWeighter(weighting_type="uniform", active_probability=0.01)
weighter(network)

# Create model
ic_model = IC(MC=10000, verbose=True)

# Use multiprocessing (recommended for large networks)
result = ic_model(network, seed_nodes=[1, 2, 3], multiprocess=True, n_processes=4)

# Or use single process (for small networks)
result = ic_model(network, seed_nodes=[1, 2, 3], multiprocess=False)

📊 Performance Monitoring

Enable performance monitoring to track execution time:

from PyIM.diffusionModel import IC

# Create model with monitoring enabled
ic_model = IC(MC=1000, enable_monitoring=True, verbose=False)

# Run simulation (network is the weighted SLN built in the multiprocessing example above)
result = ic_model(network, seed_nodes=[1, 2, 3])

# Get performance metrics
if ic_model._performance_monitor:
    summary = ic_model._performance_monitor.get_summary()
    print(f"Performance: {summary}")
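
For code that has no built-in monitoring, wall-clock time can also be measured by hand. This sketch uses only the standard library; `timed` is a hypothetical helper and is unrelated to PyIM's internal performance monitor.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the elapsed wall-clock time of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Stored even if the block raises, so failed runs are still measured.
        results[label] = time.perf_counter() - start


timings = {}
with timed("toy_workload", timings):
    checksum = sum(i * i for i in range(100_000))

print(f"toy_workload took {timings['toy_workload']:.4f}s")
```

`time.perf_counter()` is preferable to `time.time()` for short intervals because it is monotonic and has higher resolution.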

💡 Performance Tips

  1. Use multiprocessing for large networks (100+ nodes)
  2. Adjust MC parameter based on accuracy requirements
  3. Choose appropriate diffusion model for your application
  4. Use CELF instead of Greedy when runtime matters: lazy evaluation skips most marginal-gain recomputations
  5. Enable performance monitoring for optimization
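
To make tip 2 concrete: the standard error of a Monte Carlo mean shrinks in proportion to 1/sqrt(MC), so halving the error costs roughly four times as many runs. The sketch below uses only the standard library; `standard_error` and `runs_for_target` are hypothetical helpers and the sample values are made up for illustration.

```python
import math
import statistics


def standard_error(samples):
    """Standard error of the mean of per-run spread values."""
    return statistics.stdev(samples) / math.sqrt(len(samples))


def runs_for_target(stdev, target_stderr):
    """Number of runs needed to push the standard error down to the target."""
    return math.ceil((stdev / target_stderr) ** 2)


# Made-up per-run spread values from four simulations.
samples = [1, 2, 3, 4]
print(f"mean = {statistics.mean(samples)}, stderr = {standard_error(samples):.4f}")

# With a per-run standard deviation of 2.0, a standard error of 0.5 needs 16 runs,
# and halving the target to 0.25 needs 64.
print(runs_for_target(2.0, 0.5), runs_for_target(2.0, 0.25))
```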

🧪 Testing

🏃 Running Tests

Run the comprehensive test suite:

# Run all tests
python tests/test_all_modules.py

# Or use the test runner
python run_tests.py

📊 Test Coverage

The test suite includes:

  • Network Module (8 tests): Creation, operations, statistics
  • Dataset Module (3 tests): Loading, management, error handling
  • Diffusion Models (3 tests): IC model, multiprocessing
  • Algorithm Module (2 tests): Greedy, Centrality
  • Error Handling (4 tests): All exception types
  • Configuration (2 tests): Config loading, constants
  • Integration Tests (1 test): Complete workflows

Total: 23 tests with 100% success rate

🔍 Running Specific Tests

# Run specific test class
python -m unittest tests.test_all_modules.TestNetworkModule

# Run specific test method
python -m unittest tests.test_all_modules.TestNetworkModule.test_sln_creation

# Run with verbose output
python -m unittest tests.test_all_modules -v

🤝 Contributing

We welcome contributions to PyIM! Please follow these guidelines:

📝 Code Style

  • Follow PEP 8 style guidelines
  • Use meaningful variable and function names
  • Add docstrings to all public functions and classes
  • Include type hints where appropriate

🧪 Testing

  • Write unit tests for new features
  • Ensure all tests pass before submitting
  • Test on multiple Python versions (3.7+)
  • Include edge cases and error conditions

📚 Documentation

  • Update documentation for new features
  • Include usage examples
  • Maintain API documentation
  • Update CHANGELOG.md

🚀 Pull Requests

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with clear commit messages
  4. Submit a pull request with description

📝 Citation

If you use PyIM in your research, please cite:

@software{pyim2024,
  title = {PyIM: Python Influence Maximization Package},
  author = {PyIM Team},
  year = {2024},
  version = {1.0.0},
  url = {https://github.com/webro12138/PyIM}
}

📜 License

PyIM is licensed under the MIT License. See the LICENSE file for details.

Made with ❤️ by the PyIM Team
