A Comprehensive Python Package for Influence Maximization, Network Analysis, and Information Diffusion Modeling
Installation • Quick Start • Documentation • Examples • API Reference
- 🌟 Overview
- ✨ Features
- 📦 Installation
- 🚀 Quick Start
- 📚 Documentation
- 💡 Examples
- 🔧 Configuration
- ⚡ Performance
- 🧪 Testing
- 🤝 Contributing
- 📝 Citation
- 📞 Support
PyIM is a comprehensive Python package designed for researchers and practitioners working on influence maximization, network analysis, and information diffusion modeling in complex networks. It provides state-of-the-art algorithms, multiple diffusion models, and comprehensive evaluation metrics for studying information spread in social networks, biological networks, and other complex systems.
- 🌐 Network Support: Single and multi-layer networks with full NetworkX integration
- 🔬 7 Diffusion Models: IC, LT, SI, SIR, SIS, CIC, MLIC with multiprocessing
- 🧮 5+ Algorithms: Greedy, CELF, Centrality-based, SAW_ASA, SFLA
- 📈 Evaluation Framework: Comprehensive metrics and benchmarking tools
- ⚡ Performance Optimized: Multiprocessing support for large-scale networks
- 🛡️ Robust: Comprehensive error handling and validation
- Social Network Analysis: Identify influential users for viral marketing
- Epidemiology: Model disease spread and identify key intervention points
- Recommendation Systems: Optimize information dissemination
- Security: Detect critical nodes for network protection
- Biological Networks: Study protein interactions and gene regulation
- Single Layer Networks (SLN): Standard graph structures
- Multi-Layer Networks (MLN): Multiplex and interconnected networks
- Full NetworkX Integration: Seamless compatibility with NetworkX ecosystem
- Network Statistics: Comprehensive analysis and metrics
- Flexible Operations: Node/edge manipulation, attribute management
📖 Network Module Documentation
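To make the single-layer versus multi-layer distinction concrete, the sketch below represents a multiplex network as plain edge lists, one per layer, without using PyIM at all. The helper names (`layer_nodes`, `multiplex_summary`) are hypothetical. Counting nodes as the union across layers and edges as the per-layer sum is only one plausible convention, and PyIM's `MLN` class may count differently.

```python
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]


def layer_nodes(edges: List[Edge]) -> Set[Hashable]:
    """Collect every node that appears as an endpoint in one layer."""
    nodes: Set[Hashable] = set()
    for u, v in edges:
        nodes.update((u, v))
    return nodes


def multiplex_summary(layers: List[List[Edge]]) -> Dict[str, int]:
    """Summarize a multiplex network given one edge list per layer."""
    all_nodes: Set[Hashable] = set()
    for edges in layers:
        all_nodes |= layer_nodes(edges)
    return {
        "layers": len(layers),
        "nodes": len(all_nodes),                       # union across layers
        "edges": sum(len(edges) for edges in layers),  # per-layer sum
    }


layers = [
    [(1, 2), (2, 3), (3, 4)],
    [(1, 3), (3, 5), (5, 2)],
    [(2, 4), (4, 5), (5, 1)],
]
print(multiplex_summary(layers))  # {'layers': 3, 'nodes': 5, 'edges': 9}
```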
- Independent Cascade (IC): Classic probabilistic diffusion model
- Linear Threshold (LT): Threshold-based activation model
- Susceptible-Infected (SI): Simple epidemic model
- Susceptible-Infected-Recovered (SIR): Epidemic model with recovery
- Susceptible-Infected-Susceptible (SIS): Endemic disease model
- Competitive Independent Cascade (CIC): Multi-competitor diffusion
- Multi-Layer Independent Cascade (MLIC): Cross-layer diffusion
📖 Diffusion Models Documentation
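As an illustration of what the Independent Cascade model computes, here is a minimal standard-library sketch of a Monte Carlo spread estimate with a uniform activation probability. It is not PyIM's implementation: the function names (`build_adjacency`, `ic_cascade`, `estimate_spread`) and their parameters are assumptions made for this example.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def build_adjacency(
    edges: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, List[Hashable]]:
    """Build an undirected adjacency list from an edge list."""
    adj: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def ic_cascade(adj, seeds, p: float, rng: random.Random) -> Set[Hashable]:
    """Run one cascade: each newly activated node gets a single chance to
    activate each inactive neighbor, succeeding with probability p."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in adj.get(node, ()):
                if neighbor not in active and rng.random() < p:
                    active.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return active


def estimate_spread(adj, seeds, p: float, runs: int = 1000, seed: int = 0) -> float:
    """Average the number of active nodes over `runs` independent cascades."""
    rng = random.Random(seed)
    total = sum(len(ic_cascade(adj, seeds, p, rng)) for _ in range(runs))
    return total / runs


adj = build_adjacency([(1, 2), (2, 3), (3, 4), (4, 1), (5, 6)])
print(f"Estimated spread from seed 1: {estimate_spread(adj, [1], p=0.1):.2f}")
```

Because the seeds are counted as active from the start, the estimate in this sketch always lies between the number of seeds and the number of nodes reachable from them.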
- Greedy Algorithm: Classic greedy approach with (1-1/e) approximation
- Cost-Effective Lazy Forward (CELF): Optimized greedy with lazy evaluation
- Centrality-Based Selection: Degree, betweenness, closeness, eigenvector
- Simulated Annealing (SAW_ASA): Metaheuristic optimization
- Shuffled Frog Leaping (SFLA): Population-based optimization
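The difference between plain greedy and CELF can be sketched as follows. To keep the example deterministic, a simple set-coverage function stands in for the Monte Carlo spread estimate; it is monotone and submodular, which is the property CELF's lazy evaluation relies on. The function names and the `coverage` data are hypothetical and do not reflect PyIM's API.

```python
import heapq
from typing import Callable, Hashable, List, Sequence


def greedy(candidates: Sequence[Hashable], spread: Callable, k: int) -> List[Hashable]:
    """Plain greedy: re-evaluate every remaining candidate in each round."""
    seeds: List[Hashable] = []
    current = spread(seeds)
    for _ in range(k):
        best, best_gain = None, float("-inf")
        for c in candidates:
            if c in seeds:
                continue
            gain = spread(seeds + [c]) - current
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:  # fewer than k candidates available
            break
        seeds.append(best)
        current += best_gain
    return seeds


def celf(candidates: Sequence[Hashable], spread: Callable, k: int) -> List[Hashable]:
    """Lazy greedy: only re-evaluate the candidate at the top of the heap."""
    seeds: List[Hashable] = []
    current = spread(seeds)
    # Heap entries: (-gain, tie-break index, candidate, round the gain was computed in)
    heap = [(-(spread([c]) - current), i, c, 0) for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    while heap and len(seeds) < k:
        neg_gain, i, c, computed_round = heapq.heappop(heap)
        if computed_round == len(seeds):
            # Gain is up to date; by submodularity no stale entry can beat it.
            seeds.append(c)
            current += -neg_gain
        else:
            gain = spread(seeds + [c]) - current
            heapq.heappush(heap, (-gain, i, c, len(seeds)))
    return seeds


# Hypothetical stand-in objective: number of items covered by the chosen seeds.
coverage = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6, 7}, "d": {1, 7}}
calls = {"n": 0}


def spread(seeds):
    calls["n"] += 1
    covered = set()
    for s in seeds:
        covered |= coverage[s]
    return float(len(covered))


candidates = list(coverage)
print(greedy(candidates, spread, k=2))  # ['c', 'a']
print(celf(candidates, spread, k=2))    # ['c', 'a']
```

Both functions return the same seeds here, but CELF calls `spread` fewer times because it skips candidates whose stale gain is already too small to win.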
- Influence Spread Metrics: Direct diffusion simulation
- Structural Metrics: Distance, clustering, connectivity
- Comparative Metrics: Kendall Tau, Jaccard, Spearman
- Batch Evaluation: Systematic multi-algorithm comparison
- Performance Benchmarking: Timing and memory profiling
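Two of the comparative metrics named above can be illustrated with a few lines of standard-library Python. This is a sketch of the textbook definitions (Jaccard similarity of two seed sets, and Kendall tau-a for two tie-free rankings of the same items), not PyIM's metric classes; the function names are assumptions.

```python
from itertools import combinations
from typing import Hashable, Iterable, Sequence


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Size of the intersection divided by size of the union."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # convention: two empty sets are identical
    return len(set_a & set_b) / len(set_a | set_b)


def kendall_tau(rank_x: Sequence[Hashable], rank_y: Sequence[Hashable]) -> float:
    """Kendall tau-a for two rankings of the same items, without ties."""
    pos_x = {item: i for i, item in enumerate(rank_x)}
    pos_y = {item: i for i, item in enumerate(rank_y)}
    if set(pos_x) != set(pos_y):
        raise ValueError("both rankings must contain the same items")
    if len(pos_x) < 2:
        raise ValueError("need at least two items to compare rankings")
    concordant = discordant = 0
    for u, v in combinations(pos_x, 2):
        sign = (pos_x[u] - pos_x[v]) * (pos_y[u] - pos_y[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(pos_x) * (len(pos_x) - 1) // 2
    return (concordant - discordant) / n_pairs


print(jaccard([1, 2, 3], [2, 3, 4]))                      # 0.5
print(kendall_tau([1, 2, 3, 4], [4, 3, 2, 1]))            # -1.0
print(round(kendall_tau("abc", ["a", "c", "b"]), 3))      # 0.333
```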
- Multiprocessing Support: All diffusion models support parallel execution
- Performance Monitoring: Built-in timing and profiling
- Memory Efficient: Optimized for large-scale networks
- Scalable: Tested on networks with 10,000+ nodes
- Flexible Configuration: Tune performance for your hardware
- Comprehensive Exception Hierarchy: Clear error categorization
- Informative Error Messages: Easy debugging and troubleshooting
- Input Validation: Prevent common mistakes
- Consistent Patterns: Uniform error handling across modules
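A package-level exception hierarchy combined with input validation might look like the sketch below. Every name in it (`SketchError`, `InvalidSeedCountError`, `validate_seed_count`) is hypothetical and chosen only for illustration; PyIM's actual exception classes are described in its error handling documentation and may be organized differently.

```python
class SketchError(Exception):
    """Hypothetical base class so callers can catch all package errors at once."""


class InvalidSeedCountError(SketchError, ValueError):
    """Hypothetical error raised when the requested seed count is unusable."""


def validate_seed_count(k, n_nodes: int) -> int:
    """Check that k is an integer between 1 and the number of nodes."""
    if isinstance(k, bool) or not isinstance(k, int):
        raise InvalidSeedCountError(
            f"k must be an integer, got {type(k).__name__}"
        )
    if not 1 <= k <= n_nodes:
        raise InvalidSeedCountError(
            f"k must be between 1 and {n_nodes}, got {k}"
        )
    return k


try:
    validate_seed_count(10, n_nodes=6)
except SketchError as exc:  # the base class catches every subclass
    print(f"Validation failed: {exc}")
```

Deriving the specific error from both the package base class and `ValueError` lets existing `except ValueError` handlers keep working while still allowing a single package-wide catch.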
- Python: 3.7 or higher
- pip: Python package manager
- Operating System: Windows, macOS, or Linux
# Install from PyPI
pip install PyIM
# Or install from source: clone the repository
git clone https://github.com/yourusername/PyIM.git
cd PyIM
# Install dependencies
pip install -r PyIM/requirements.txt
# Install the package
pip install -e .
# Or install the core dependencies manually (quoted so the shell does not treat ">" as a redirect)
pip install "networkx>=2.8.0"
pip install "numpy>=1.21.0"
pip install "rich>=12.0.0"
pip install "psutil>=5.9.0"

Verify installation by running:
import PyIM
print(f"PyIM version: {PyIM.__version__}")
print(f"PyIM author: {PyIM.__author__}")

Expected output:
PyIM version: 1.0.0
PyIM author: PyIM Team
from PyIM import SLN, IC, Greedy
from PyIM.diffusionModel import ICWeighter
# Step 1: Create a network
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6)]
network = SLN("example_network", "undirected", edges=edges)
# Step 2: Set edge weights for diffusion
weighter = ICWeighter(weighting_type="uniform", active_probability=0.1)
weighter(network)
# Step 3: Create diffusion model
ic_model = IC(MC=1000, verbose=False)
# Step 4: Create influence maximization algorithm
greedy = Greedy(ic_model, verbose=False)
# Step 5: Find optimal seed set
seed_nodes = greedy(network, k=3)
# Step 6: Evaluate influence spread
influence = ic_model(network, seed_nodes)
# Step 7: Display results
print(f"📊 Network Statistics:")
print(f" Nodes: {network.number_of_nodes()}")
print(f" Edges: {network.number_of_edges()}")
print(f" Density: {network.get_network_statistics()['density']:.4f}")
print(f"\n🎯 Influence Maximization Results:")
print(f" Seed nodes: {seed_nodes}")
print(f" Influence spread: {influence:.2f}")
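print(f" Coverage: {influence/network.number_of_nodes()*100:.1f}%")

Expected output (seed selection and influence values come from Monte Carlo simulation, so exact numbers vary between runs):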
📊 Network Statistics:
Nodes: 6
Edges: 5
Density: 0.3333
🎯 Influence Maximization Results:
Seed nodes: [1, 2, 3]
Influence spread: 2.45
Coverage: 40.8%
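The density and coverage figures in the output above follow from simple arithmetic. The sketch below uses the standard density definition for an undirected simple graph, 2E / (N(N - 1)); that PyIM computes density the same way is an assumption, although it agrees with the value shown. The function names are hypothetical.

```python
def undirected_density(n_nodes: int, n_edges: int) -> float:
    """Fraction of possible undirected edges that are present."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def coverage_percent(influence: float, n_nodes: int) -> float:
    """Influence spread expressed as a percentage of all nodes."""
    return influence / n_nodes * 100


print(f"{undirected_density(6, 5):.4f}")    # 0.3333
print(f"{coverage_percent(2.45, 6):.1f}%")  # 40.8%
```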
from PyIM import SLN
from PyIM.diffusionModel import IC, ICWeighter
from PyIM.algorithm import Greedy
import time
# Create a larger network
edges = [(i, i+1) for i in range(1, 100)] + [(1, 50), (50, 100)]
network = SLN("large_network", "undirected", edges=edges)
# Set weights
weighter = ICWeighter(weighting_type="uniform", active_probability=0.05)
weighter(network)
# Create model with multiprocessing
ic_model = IC(MC=10000, verbose=True, enable_monitoring=True)
# Test single process
start = time.time()
result_single = ic_model(network, [1, 2, 3], multiprocess=False)
time_single = time.time() - start
# Test multiprocessing
start = time.time()
result_multi = ic_model(network, [1, 2, 3], multiprocess=True, n_processes=4)
time_multi = time.time() - start
# Compare results
print(f"\n⚡ Performance Comparison:")
print(f" Single process: {time_single:.2f}s, Influence: {result_single:.2f}")
print(f" Multiprocess: {time_multi:.2f}s, Influence: {result_multi:.2f}")
print(f" Speedup: {time_single/time_multi:.2f}x")
# Get performance metrics
if ic_model._performance_monitor:
    summary = ic_model._performance_monitor.get_summary()
    print(f"\n📊 Performance Metrics: {summary}")

| Module | Description | Documentation |
|---|---|---|
| 🌐 Network | Single and multi-layer network classes | Network Module |
| 🔬 Diffusion Models | Information diffusion simulation | Diffusion Models |
| 🧮 Algorithms | Influence maximization algorithms | Algorithms |
| 📈 Evaluation | Performance evaluation framework | Evaluation |
| 📊 Dataset | Dataset management and loading | Dataset |
| ⚙️ Configuration | Configuration management | Configuration |
| 🛠️ Utils | Utility functions and helpers | Utilities |
| Error Handling | Exception handling guide | Error Handling |
| ⚡ Performance | Performance optimization guide | Performance |
- 🚀 Quick Start Guide - Get up and running in 5 minutes
- 📖 Tutorial Series - Step-by-step tutorials
- 💡 Best Practices - Recommended patterns and approaches
- 🔧 Advanced Usage - Advanced features and techniques
- 📋 API Reference - Complete API documentation
- 📝 Examples Gallery - Collection of usage examples
- 🧪 Testing Guide - How to run and write tests
- 🤝 Contributing Guide - Contribution guidelines
from PyIM import SLN, MLN
from PyIM.network import load_networks
# Create a single layer network
edges = [(1, 2), (2, 3), (3, 4), (4, 1), (5, 6)]
sln = SLN("social_network", "undirected", edges=edges)
# Analyze network
stats = sln.get_network_statistics()
print(f"📊 Network Analysis:")
print(f" Name: {stats['name']}")
print(f" Nodes: {stats['nodes']}")
print(f" Edges: {stats['edges']}")
print(f" Density: {stats['density']:.4f}")
print(f" Average Degree: {stats['avg_degree']:.2f}")
# Create a multi-layer network
layer1_edges = [(1, 2), (2, 3), (3, 4)]
layer2_edges = [(1, 3), (3, 5), (5, 2)]
layer3_edges = [(2, 4), (4, 5), (5, 1)]
edges_of_layers = [layer1_edges, layer2_edges, layer3_edges]
mln = MLN("multiplex_network", "undirected", edges_of_layers=edges_of_layers)
print(f"\n🌐 Multi-Layer Network:")
print(f" Layers: {mln.number_of_layers()}")
print(f" Total Nodes: {mln.number_of_nodes()}")
print(f" Total Edges: {mln.number_of_edges()}")
# Load multiple networks from specifications
network_specs = [
    {
        'name': 'network1',
        'type': 'SLN',
        'directionality': 'undirected',
        'edges': [(1, 2), (2, 3), (3, 4)]
    },
    {
        'name': 'network2',
        'type': 'SLN',
        'directionality': 'directed',
        'edges': [(1, 2), (2, 3), (3, 1)]
    }
]
networks = load_networks(network_specs)
print(f"\n📦 Loaded {len(networks)} networks:")
for net in networks:
    print(f" - {net.name}: {net.number_of_nodes()} nodes, {net.number_of_edges()} edges")

from PyIM import SLN
from PyIM.diffusionModel import IC, LT, SIR, ICWeighter, LTWeighter, SIRWeighter
# import matplotlib.pyplot as plt  # optional: only needed if you uncomment the plotting code below
# Create network
edges = [(i, i+1) for i in range(1, 20)] + [(1, 10), (10, 20)]
network = SLN("test_network", "undirected", edges=edges)
# Set weights for different models
ic_weighter = ICWeighter(weighting_type="uniform", active_probability=0.1)
ic_weighter(network)
lt_weighter = LTWeighter(weighting_type="uniform", threshold=0.3)
lt_weighter(network)
sir_weighter = SIRWeighter(weighting_type="uniform", infection_prob=0.1, recovery_prob=0.05)
sir_weighter(network)
# Create models
ic_model = IC(MC=1000, verbose=False)
lt_model = LT(MC=1000, verbose=False)
sir_model = SIR(MC=1000, verbose=False)
# Test different seed sets
seed_sets = [[1], [1, 5], [1, 5, 10], [1, 5, 10, 15]]
print("📊 Diffusion Model Comparison:")
print("-" * 60)
for seeds in seed_sets:
    ic_result = ic_model(network, seeds)
    lt_result = lt_model(network, seeds)
    sir_result = sir_model(network, seeds)
    print(f"Seeds {seeds}:")
    print(f" IC: {ic_result:.2f} activated nodes")
    print(f" LT: {lt_result:.2f} activated nodes")
    print(f" SIR: {sir_result:.2f} activated nodes")
    print()
# Plot results (optional)
# sizes = [len(s) for s in seed_sets]  # x-axis: actual seed set sizes
# plt.figure(figsize=(10, 6))
# plt.plot(sizes, [ic_model(network, s) for s in seed_sets], 'o-', label='IC')
# plt.plot(sizes, [lt_model(network, s) for s in seed_sets], 's-', label='LT')
# plt.plot(sizes, [sir_model(network, s) for s in seed_sets], '^-', label='SIR')
# plt.xlabel('Seed Set Size')
# plt.ylabel('Influence Spread')
# plt.title('Diffusion Model Comparison')
# plt.legend()
# plt.grid(True)
# plt.show()

from PyIM import SLN
from PyIM.diffusionModel import IC, ICWeighter
from PyIM.algorithm import Greedy, CELF, CentralitySeedSelector
import time
# Create network
edges = [(i, i+1) for i in range(1, 50)] + [(1, 25), (25, 50)]
network = SLN("algorithm_test", "undirected", edges=edges)
# Set weights
weighter = ICWeighter(weighting_type="uniform", active_probability=0.1)
weighter(network)
# Create diffusion model
ic_model = IC(MC=1000, verbose=False)
# Create algorithms
algorithms = {
    'Greedy': Greedy(ic_model, verbose=False),
    'CELF': CELF(ic_model, verbose=False),
    'Degree': CentralitySeedSelector(centrality_type='degree'),
    'Betweenness': CentralitySeedSelector(centrality_type='betweenness'),
    'Closeness': CentralitySeedSelector(centrality_type='closeness')
}
# Test different k values
k_values = [5, 10, 15, 20]
print("🧮 Algorithm Performance Comparison")
print("=" * 70)
for k in k_values:
    print(f"\n📊 k = {k}:")
    print("-" * 70)
    results = {}
    times = {}
    for name, algorithm in algorithms.items():
        start = time.time()
        seeds = algorithm(network, k=k)
        elapsed = time.time() - start
        # Evaluate influence
        influence = ic_model(network, seeds)
        results[name] = influence
        times[name] = elapsed
        print(f"{name:15s}: Influence = {influence:6.2f}, Time = {elapsed:6.3f}s")
    # Find best algorithm
    best_algorithm = max(results, key=results.get)
    fastest_algorithm = min(times, key=times.get)
    print(f"\n🏆 Best Influence: {best_algorithm} ({results[best_algorithm]:.2f})")
    print(f"⚡ Fastest: {fastest_algorithm} ({times[fastest_algorithm]:.3f}s)")

from PyIM import SLN
from PyIM.diffusionModel import IC, ICWeighter
from PyIM.algorithm import Greedy, CentralitySeedSelector
from PyIM.evaluation import Evaluation
from PyIM.evaluation.metrics import InfluenceSpread, Distance, ClusteringCoefficient
# Create multiple networks
networks = [
    SLN("network1", "undirected", edges=[(i, i+1) for i in range(1, 20)]),
    SLN("network2", "undirected", edges=[(i, i+1) for i in range(1, 30)]),
    SLN("network3", "undirected", edges=[(i, i+1) for i in range(1, 40)])
]
# Set weights for all networks
weighter = ICWeighter(weighting_type="uniform", active_probability=0.1)
for network in networks:
    weighter(network)
# Create diffusion model
ic_model = IC(MC=1000, verbose=False)
# Create metrics
metrics = [
    InfluenceSpread(ic_model),
    Distance(),
    ClusteringCoefficient()
]
# Create algorithms
algorithms = [
    Greedy(ic_model, verbose=False),
    CentralitySeedSelector(centrality_type='degree')
]
# Create evaluation framework
evaluator = Evaluation(
    networks=networks,
    metrics=metrics,
    algorithms=algorithms,
    k_range=[5, 10, 15],
    verbose=True
)
# Run evaluation
print("📈 Running Comprehensive Evaluation...")
print("=" * 70)
results = evaluator.run_evaluation()
# Display results
print("\n📊 Evaluation Results:")
print("=" * 70)
for metric_name, metric_results in results.items():
    print(f"\n📏 {metric_name}:")
    for network_name, network_results in metric_results.items():
        print(f" {network_name}:")
        for algo_name, algo_results in network_results.items():
            print(f" {algo_name}: {algo_results}")

Configure PyIM using environment variables:
# Data directories
export PYIM_DATA_DIR="/path/to/data"
export PYIM_DOWNLOAD_DIR="/path/to/downloads"
export PYIM_TEMP_DIR="/path/to/temp"
# Logging
export PYIM_LOG_LEVEL="INFO"
export PYIM_LOG_FORMAT="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
# Performance
export PYIM_MAX_WORKERS="4"
export PYIM_CACHE_ENABLED="true"

Create a config.json file:
{
    "data_dir": "/path/to/data",
    "download_dir": "/path/to/downloads",
    "temp_dir": "/path/to/temp",
    "log_level": "INFO",
    "max_workers": 4,
    "cache_enabled": true,
    "validation_strict": false
}

Load configuration:
from PyIM.config import PyIMConfig
# Load from file
config = PyIMConfig.from_file("config.json")
# Or use global configuration
from PyIM.config import get_config
config = get_config()
# Access configuration
print(f"Data directory: {config.data_dir}")
print(f"Log level: {config.log_level}")
print(f"Max workers: {config.max_workers}")

All diffusion models support multiprocessing for improved performance:
from PyIM import SLN
from PyIM.diffusionModel import IC, ICWeighter
# Create network
network = SLN("large_network", "undirected", edges=[(i, i+1) for i in range(1, 1000)])
# Set weights
weighter = ICWeighter(weighting_type="uniform", active_probability=0.01)
weighter(network)
# Create model
ic_model = IC(MC=10000, verbose=True)
# Use multiprocessing (recommended for large networks)
result = ic_model(network, seed_nodes=[1, 2, 3], multiprocess=True, n_processes=4)
# Or use single process (for small networks)
result = ic_model(network, seed_nodes=[1, 2, 3], multiprocess=False)

Enable performance monitoring to track execution time:
from PyIM.diffusionModel import IC
# Create model with monitoring enabled
ic_model = IC(MC=1000, enable_monitoring=True, verbose=False)
# Run simulation
result = ic_model(network, seed_nodes=[1, 2, 3])
# Get performance metrics
if ic_model._performance_monitor:
    summary = ic_model._performance_monitor.get_summary()
    print(f"Performance: {summary}")

- Use multiprocessing for large networks (100+ nodes)
- Adjust MC parameter based on accuracy requirements
- Choose appropriate diffusion model for your application
- Use CELF for faster greedy approximation
- Enable performance monitoring for optimization
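The advice on tuning `MC` reflects a general property of Monte Carlo estimation: the standard error of the mean shrinks roughly in proportion to 1/sqrt(MC), so a tenfold gain in precision costs about a hundredfold increase in runs. The sketch below demonstrates this with a stand-in random "cascade" rather than a PyIM model; `simulate_once` and `mc_estimate` are hypothetical names, and the simulated distribution is invented purely for illustration.

```python
import math
import random
import statistics
from typing import Tuple


def simulate_once(rng: random.Random) -> int:
    """Stand-in for one cascade: 3 seeds plus up to 20 random activations."""
    return 3 + sum(rng.random() < 0.1 for _ in range(20))


def mc_estimate(runs: int, seed: int = 0) -> Tuple[float, float]:
    """Return the mean spread and its standard error over `runs` simulations."""
    rng = random.Random(seed)
    samples = [simulate_once(rng) for _ in range(runs)]
    mean = statistics.mean(samples)
    std_err = statistics.stdev(samples) / math.sqrt(runs)
    return mean, std_err


for runs in (100, 1000, 10000):
    mean, std_err = mc_estimate(runs)
    # 1.96 standard errors gives an approximate 95% confidence half-width
    print(f"MC={runs:>5}: spread = {mean:.3f} +/- {1.96 * std_err:.3f}")
```

Running a quick check like this on your own model at a small `MC` gives a rough idea of how many runs are needed before differences between seed sets exceed the noise.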
Run the comprehensive test suite:
# Run all tests
python tests/test_all_modules.py
# Or use the test runner
python run_tests.py

The test suite includes:
- ✅ Network Module (8 tests): Creation, operations, statistics
- ✅ Dataset Module (3 tests): Loading, management, error handling
- ✅ Diffusion Models (3 tests): IC model, multiprocessing
- ✅ Algorithm Module (2 tests): Greedy, Centrality
- ✅ Error Handling (4 tests): All exception types
- ✅ Configuration (2 tests): Config loading, constants
- ✅ Integration Tests (1 test): Complete workflows
Total: 23 tests with 100% success rate
# Run specific test class
python -m unittest tests.test_all_modules.TestNetworkModule
# Run specific test method
python -m unittest tests.test_all_modules.TestNetworkModule.test_sln_creation
# Run with verbose output
python -m unittest tests.test_all_modules -v

We welcome contributions to PyIM! Please follow these guidelines:
- Follow PEP 8 style guidelines
- Use meaningful variable and function names
- Add docstrings to all public functions and classes
- Include type hints where appropriate
- Write unit tests for new features
- Ensure all tests pass before submitting
- Test on multiple Python versions (3.7+)
- Include edge cases and error conditions
- Update documentation for new features
- Include usage examples
- Maintain API documentation
- Update CHANGELOG.md
- Fork the repository
- Create a feature branch
- Make your changes with clear commit messages
- Submit a pull request with description
If you use PyIM in your research, please cite:
@software{pyim2024,
    title = {PyIM: Python Influence Maximization Package},
    author = {PyIM Team},
    year = {2024},
    version = {1.0.0},
    url = {https://github.com/yourusername/PyIM}
}

- Main Documentation: See individual module documentation
- API Reference: API Reference
- Examples: Examples Gallery
- Tutorials: Tutorial Series
- Issues: Report bugs and request features via GitHub Issues
- Discussions: Join our GitHub Discussions
- Wiki: Check our Wiki for additional resources
- Email: support@pyim.org
- Website: https://pyim.org
- GitHub: https://github.com/yourusername/PyIM
PyIM is licensed under the MIT License. See LICENSE file for details.