# TOPSIS: Technique for Order Preference by Similarity to Ideal Solution

**Author:** Vani Goyal  
**Roll Number:** 102303078  
**Email:** vgoyal_be23@thapar.edu

---

## What is TOPSIS?

TOPSIS is a multi-criteria decision analysis method that ranks a finite set of alternatives by:
1. Calculating the Euclidean distance from the **ideal solution** (the best value on every criterion)
2. Calculating the Euclidean distance from the **negative-ideal solution** (the worst value on every criterion)
3. Combining the two distances into a relative closeness score, so that the preferred alternative is close to the ideal and far from the negative-ideal

## Algorithm Steps

1. **Normalization**: Convert the decision matrix to dimensionless, comparable scales
2. **Weighted Normalization**: Multiply each criterion by its weight
3. **Ideal Solutions**: Identify the ideal best and ideal worst value for each criterion
4. **Distance Calculation**: Compute each alternative's Euclidean distance from both ideal solutions
5. **TOPSIS Score**: Compute the relative closeness coefficient
6. **Ranking**: Rank alternatives by descending score

In [None]:
# Install required packages
!pip install pandas numpy matplotlib seaborn openpyxl -q

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set styling
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (12, 6)

## Step 1: Load and Explore Data

In [None]:
# Upload your data.xlsx or data.csv file here
# For Colab, use:
# from google.colab import files
# uploaded = files.upload()

# Load data
df = pd.read_csv('data.csv')  # or pd.read_excel('data.xlsx')
print("Original Data:")
print(df)
print(f"\nShape: {df.shape}")
print(f"\nData Types:\n{df.dtypes}")
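
The cell above mentions both `data.csv` and `data.xlsx`. A minimal sketch of a loader that picks the reader from the file extension; `load_decision_matrix` is a hypothetical helper, and requiring at least three columns (a name column plus two criteria) is an assumption, not a TOPSIS rule. Reading `.xlsx` relies on `openpyxl`, which the install cell includes.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_decision_matrix(path: str) -> pd.DataFrame:
    """Read a .csv or .xlsx decision matrix (hypothetical helper)."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        df = pd.read_csv(path)
    elif suffix == ".xlsx":
        df = pd.read_excel(path)  # uses the openpyxl engine
    else:
        raise ValueError(f"unsupported file type: {suffix!r}")
    # Assumed minimum: one name column plus at least two criteria columns
    if df.shape[1] < 3:
        raise ValueError("expected a name column plus at least two criteria columns")
    return df


# Usage with a throwaway CSV written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    csv_path = str(Path(tmp) / "demo.csv")
    pd.DataFrame({"Name": ["A", "B"], "C1": [1, 2], "C2": [3, 4]}).to_csv(csv_path, index=False)
    loaded = load_decision_matrix(csv_path)
    print(loaded.shape)
```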

## Step 2: TOPSIS Implementation

### Mathematical Formulation

Given:
- Decision matrix $X$ with $m$ alternatives and $n$ criteria
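- Weight vector $W = [w_1, w_2, \ldots, w_n]$ with $w_j \ge 0$
- Impact vector $I = [i_1, i_2, \ldots, i_n]$ where $i_j = +$ marks a benefit criterion (higher is better) and $i_j = -$ a cost criterion (lower is better)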

#### Step 2.1: Normalization

$$r_{ij} = \frac{x_{ij}}{\sqrt{\sum_{k=1}^{m} x_{kj}^2}}$$
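
A minimal NumPy sketch of this vector normalization, assuming a purely numeric matrix `X` with alternatives as rows and criteria as columns; the helper name `vector_normalize` and the values are illustrative. It raises on an all-zero column, where the denominator would be zero.

```python
import numpy as np


def vector_normalize(X) -> np.ndarray:
    """r_ij = x_ij / sqrt(sum_k x_kj^2), computed column by column."""
    X = np.asarray(X, dtype=float)
    norms = np.sqrt((X ** 2).sum(axis=0))  # one Euclidean norm per criterion
    if np.any(norms == 0):
        raise ValueError("a criterion column is all zeros and cannot be normalized")
    return X / norms  # broadcasting divides every row by the column norms


# Made-up 2 x 2 matrix whose column norms are both 5
X = np.array([[3.0, 4.0],
              [4.0, 3.0]])
R = vector_normalize(X)
print(R)
print(np.linalg.norm(R, axis=0))  # each column now has unit length
```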

#### Step 2.2: Weighted Normalized Matrix

$$v_{ij} = w_j \times r_{ij}$$
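
A sketch of the weighting step with a hypothetical helper `weight_matrix`. Rescaling the weights to sum to 1 is a common convention assumed here; `topsis_detailed` below uses the raw weights instead. The choice does not affect the final $C_i$, because multiplying every weight by the same positive constant scales both $S_i^+$ and $S_i^-$ by that constant.

```python
import numpy as np


def weight_matrix(R, weights) -> np.ndarray:
    """v_ij = w_j * r_ij, with the weights first rescaled to sum to 1 (assumed convention)."""
    R = np.asarray(R, dtype=float)
    w = np.asarray(weights, dtype=float)
    if w.shape[0] != R.shape[1]:
        raise ValueError("need exactly one weight per criterion")
    if np.any(w < 0) or w.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    w = w / w.sum()
    return R * w  # broadcasting multiplies column j by w_j


R = np.array([[0.6, 0.8],
              [0.8, 0.6]])
V = weight_matrix(R, [1, 3])  # the second criterion counts three times as much
print(V)
```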

#### Step 2.3: Ideal Solutions

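$$V^+ = \{v_1^+, v_2^+, \ldots, v_n^+\}$$
$$V^- = \{v_1^-, v_2^-, \ldots, v_n^-\}$$

where, for each criterion $j$ (the maximum and minimum are taken over the alternatives $i = 1, \ldots, m$):
- $v_j^+ = \max_i v_{ij}$ for a benefit criterion, $\min_i v_{ij}$ for a cost criterion
- $v_j^- = \min_i v_{ij}$ for a benefit criterion, $\max_i v_{ij}$ for a cost criterion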
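
The case distinction above vectorizes with `np.where`. A sketch with an illustrative helper `ideal_solutions`, applied to a made-up weighted matrix in which criterion 1 is a benefit and criterion 2 a cost:

```python
import numpy as np


def ideal_solutions(V, impacts):
    """Return (v_plus, v_minus) for a weighted normalized matrix V."""
    V = np.asarray(V, dtype=float)
    if len(impacts) != V.shape[1] or any(s not in ("+", "-") for s in impacts):
        raise ValueError("need one '+' or '-' impact per criterion")
    benefit = np.array([s == "+" for s in impacts])  # True for benefit criteria
    col_max, col_min = V.max(axis=0), V.min(axis=0)  # extremes over alternatives i
    v_plus = np.where(benefit, col_max, col_min)
    v_minus = np.where(benefit, col_min, col_max)
    return v_plus, v_minus


V = np.array([[0.15, 0.60],
              [0.20, 0.45]])
v_plus, v_minus = ideal_solutions(V, ["+", "-"])
print(v_plus, v_minus)
```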

#### Step 2.4: Euclidean Distance

$$S_i^+ = \sqrt{\sum_{j=1}^{n}(v_{ij} - v_j^+)^2}$$
$$S_i^- = \sqrt{\sum_{j=1}^{n}(v_{ij} - v_j^-)^2}$$
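
Both separation measures are row-wise Euclidean norms, so `np.linalg.norm(..., axis=1)` computes them directly. A sketch with a hypothetical helper `separation`, on made-up values where the second alternative happens to coincide with $V^+$:

```python
import numpy as np


def separation(V, v_plus, v_minus):
    """Distances S_i^+ and S_i^- of each row of V from the two ideal points."""
    V = np.asarray(V, dtype=float)
    s_plus = np.linalg.norm(V - v_plus, axis=1)    # broadcasting subtracts from every row
    s_minus = np.linalg.norm(V - v_minus, axis=1)
    return s_plus, s_minus


V = np.array([[0.15, 0.60],
              [0.20, 0.45]])
s_plus, s_minus = separation(V, np.array([0.20, 0.45]), np.array([0.15, 0.60]))
print(s_plus, s_minus)
```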

#### Step 2.5: TOPSIS Score

$$C_i = \frac{S_i^-}{S_i^+ + S_i^-}$$

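Because both distances are non-negative, $0 \le C_i \le 1$. A higher $C_i$ indicates a better alternative, and $C_i = 1$ only when the alternative coincides with the ideal solution.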
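
A small sketch of the closeness coefficient. For instance, $S_i^+ = 0.1$ and $S_i^- = 0.3$ give $C_i = 0.3 / 0.4 = 0.75$. The helper `closeness` is hypothetical, and returning 0.5 when both distances are zero (every alternative identical on every criterion) is an assumed convention, not part of the formula.

```python
import numpy as np


def closeness(s_plus, s_minus):
    """C_i = S_i^- / (S_i^+ + S_i^-), which always lies in [0, 1]."""
    s_plus = np.asarray(s_plus, dtype=float)
    s_minus = np.asarray(s_minus, dtype=float)
    total = s_plus + s_minus
    # Assumed convention: if both distances are 0, report 0.5 instead of dividing by zero
    safe_total = np.where(total == 0, 1.0, total)
    return np.where(total == 0, 0.5, s_minus / safe_total)


c = closeness([0.1, 0.0, 0.0], [0.3, 0.2, 0.0])
print(c)
```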

In [None]:
def topsis_detailed(df, weights, impacts):
    """
    Perform TOPSIS with detailed step-by-step output
    
    Parameters:
    -----------
    df : DataFrame with first column as names, rest as criteria
    weights : list of weights for each criterion
    impacts : list of '+' or '-' for each criterion
    
    Returns:
    --------
    dict with all intermediate steps and final results
    """
    results = {}
    
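    # Extract data: first column holds names, remaining columns are numeric criteria
    names = df.iloc[:, 0].values
    data = df.iloc[:, 1:].values.astype(float)
    
    # Validate parameters against the number of criteria
    n_criteria = data.shape[1]
    if len(weights) != n_criteria or len(impacts) != n_criteria:
        raise ValueError(f"Expected {n_criteria} weights and impacts, got {len(weights)} and {len(impacts)}")
    if any(i not in ('+', '-') for i in impacts):
        raise ValueError("Impacts must be '+' or '-'")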
    results['original_data'] = pd.DataFrame(data, columns=df.columns[1:], index=names)
    
    # Step 1: Normalization
    normalized = data / np.sqrt((data ** 2).sum(axis=0))
    results['normalized'] = pd.DataFrame(normalized, columns=df.columns[1:], index=names)
    
    # Step 2: Weighted normalized matrix
    weighted = normalized * np.array(weights)
    results['weighted'] = pd.DataFrame(weighted, columns=df.columns[1:], index=names)
    
    # Step 3: Ideal best and worst
    ideal_best = []
    ideal_worst = []
    
    for j, impact in enumerate(impacts):
        if impact == '+':
            ideal_best.append(weighted[:, j].max())
            ideal_worst.append(weighted[:, j].min())
        else:
            ideal_best.append(weighted[:, j].min())
            ideal_worst.append(weighted[:, j].max())
    
    results['ideal_best'] = pd.Series(ideal_best, index=df.columns[1:])
    results['ideal_worst'] = pd.Series(ideal_worst, index=df.columns[1:])
    
    # Step 4: Distances
    dist_best = np.sqrt(((weighted - ideal_best) ** 2).sum(axis=1))
    dist_worst = np.sqrt(((weighted - ideal_worst) ** 2).sum(axis=1))
    
    results['distances'] = pd.DataFrame({
        'Distance from Best': dist_best,
        'Distance from Worst': dist_worst
    }, index=names)
    
    # Step 5: TOPSIS Score
    scores = dist_worst / (dist_best + dist_worst)
    results['scores'] = pd.Series(scores, index=names)
    
    # Step 6: Ranking (1 = highest score; tied scores still receive distinct ranks)
    ranks = scores.argsort()[::-1].argsort() + 1
    results['ranks'] = pd.Series(ranks, index=names)
    
    # Final result (TOPSIS score reported as a percentage, 0-100)
    final = df.copy()
    final['Topsis Score'] = (scores * 100).round(2)
    final['Rank'] = ranks
    results['final'] = final
    
    return results
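
The ranking line `scores.argsort()[::-1].argsort() + 1` always produces distinct ranks, so alternatives with equal scores are split in an order that depends on the sort. If tied alternatives should share a rank, one option (an assumption about the desired tie rule) is pandas `Series.rank` with `method="min"`, sketched here on made-up scores:

```python
import numpy as np
import pandas as pd

scores = np.array([0.42, 0.81, 0.42, 0.10])

# Rank 1 = highest score; tied scores share the smallest rank they would occupy
ranks = pd.Series(scores).rank(method="min", ascending=False).astype(int)
print(ranks.tolist())

# The argsort trick from topsis_detailed: always a permutation of 1..m
argsort_ranks = scores.argsort()[::-1].argsort() + 1
print(argsort_ranks)
```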

In [None]:
# Define parameters
weights = [1, 1, 1, 1, 1]  # Equal weights
impacts = ['+', '+', '-', '+', '+']  # + means maximize, - means minimize

print("Parameters:")
print(f"Weights: {weights}")
print(f"Impacts: {impacts}")
print("\nInterpretation:")
for i, col in enumerate(df.columns[1:]):
    impact_text = "maximize (higher is better)" if impacts[i] == '+' else "minimize (lower is better)"
    print(f"  {col}: Weight={weights[i]}, {impact_text}")
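
Mismatched lengths between `weights`, `impacts`, and the criteria columns are the most likely input error at this point. A sketch of a pre-check, `check_parameters`, which is a hypothetical helper assuming that column 0 of the frame holds names and the rest are criteria:

```python
import pandas as pd


def check_parameters(df: pd.DataFrame, weights, impacts) -> None:
    """Raise ValueError if weights/impacts do not fit the criteria columns of df.

    Hypothetical helper; assumes column 0 holds names and the rest are criteria.
    """
    n_criteria = df.shape[1] - 1
    if n_criteria < 1:
        raise ValueError("df needs a name column plus at least one criterion column")
    if len(weights) != n_criteria or len(impacts) != n_criteria:
        raise ValueError(f"expected {n_criteria} weights and impacts, "
                         f"got {len(weights)} and {len(impacts)}")
    if any(i not in ("+", "-") for i in impacts):
        raise ValueError("impacts must be '+' or '-'")
    if any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("weights must be non-negative and not all zero")
    non_numeric = [c for c in df.columns[1:] if not pd.api.types.is_numeric_dtype(df[c])]
    if non_numeric:
        raise ValueError(f"non-numeric criteria columns: {non_numeric}")


# Made-up frame: one name column plus three criteria
demo = pd.DataFrame({"Name": ["A", "B"], "C1": [1, 2], "C2": [3.5, 1.0], "C3": [7, 9]})
check_parameters(demo, [1, 1, 2], ["+", "-", "+"])  # passes silently
try:
    check_parameters(demo, [1, 1], ["+", "-"])
except ValueError as err:
    print(err)
```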

In [None]:
# Run TOPSIS
results = topsis_detailed(df, weights, impacts)

print("="*80)
print("TOPSIS ANALYSIS - DETAILED RESULTS")
print("="*80)

## Step 3: View Intermediate Results

In [None]:
print("\n1. NORMALIZED MATRIX")
print("="*80)
print(results['normalized'].round(4))

In [None]:
print("\n2. WEIGHTED NORMALIZED MATRIX")
print("="*80)
print(results['weighted'].round(4))

In [None]:
print("\n3. IDEAL SOLUTIONS")
print("="*80)
print("\nIdeal Best (V+):")
print(results['ideal_best'].round(4))
print("\nIdeal Worst (V-):")
print(results['ideal_worst'].round(4))

In [None]:
print("\n4. EUCLIDEAN DISTANCES")
print("="*80)
print(results['distances'].round(4))

## Step 4: Final Results

In [None]:
print("\n5. FINAL TOPSIS SCORES AND RANKINGS")
print("="*80)
print(results['final'].sort_values('Rank'))

## Step 5: Visualizations

In [None]:
# Visualization 1: TOPSIS Scores Bar Chart
fig, ax = plt.subplots(figsize=(12, 6))

sorted_data = results['final'].sort_values('Rank')
colors = plt.cm.RdYlGn(sorted_data['Topsis Score'] / 100)

bars = ax.barh(sorted_data.iloc[:, 0], sorted_data['Topsis Score'], color=colors)
ax.set_xlabel('TOPSIS Score', fontsize=12, fontweight='bold')
ax.set_ylabel('Alternatives', fontsize=12, fontweight='bold')
ax.set_title('TOPSIS Scores Comparison\n(Higher is Better)', fontsize=14, fontweight='bold')
ax.grid(axis='x', alpha=0.3)
ax.invert_yaxis()  # list rank 1 at the top instead of the bottom

# Add value labels
for i, (idx, row) in enumerate(sorted_data.iterrows()):
    ax.text(row['Topsis Score'] + 1, i, f"{row['Topsis Score']:.2f} (Rank {int(row['Rank'])})", 
            va='center', fontsize=10)

plt.tight_layout()
plt.show()

In [None]:
# Visualization 2: Distance Analysis
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: Distance comparison
results['distances'].plot(kind='bar', ax=axes[0], color=['#e74c3c', '#3498db'])
axes[0].set_title('Distance from Ideal Solutions', fontsize=12, fontweight='bold')
axes[0].set_xlabel('Alternatives', fontsize=10)
axes[0].set_ylabel('Euclidean Distance', fontsize=10)
axes[0].legend(['Distance from Best', 'Distance from Worst'])
axes[0].grid(axis='y', alpha=0.3)
axes[0].tick_params(axis='x', rotation=45)

# Right: alternatives in rank order (equal wedges; the labels carry the rank,
# since autopct would print each wedge's percentage, not its rank)
ranked = results['final'].sort_values('Rank')
pie_labels = [f"{name} (Rank {int(rank)})" for name, rank in zip(ranked.iloc[:, 0], ranked['Rank'])]
axes[1].pie([1] * len(ranked), labels=pie_labels, startangle=90, colors=plt.cm.Set3.colors)
axes[1].set_title('Ranking Order', fontsize=12, fontweight='bold')

plt.tight_layout()
plt.show()

In [None]:
# Visualization 3: Criteria Performance Heatmap
fig, ax = plt.subplots(figsize=(10, 6))

sns.heatmap(results['weighted'], annot=True, fmt='.3f', cmap='YlOrRd', 
            cbar_kws={'label': 'Weighted Score'}, ax=ax)
ax.set_title('Weighted Performance Heatmap\n(Higher values are better for + criteria, lower for - criteria)', 
            fontsize=12, fontweight='bold')
ax.set_xlabel('Criteria', fontsize=10)
ax.set_ylabel('Alternatives', fontsize=10)

plt.tight_layout()
plt.show()

In [None]:
# Visualization 4: Radar Chart for Top 3 Alternatives
from math import pi

top3 = results['final'].nsmallest(3, 'Rank')
categories = list(df.columns[1:])
N = len(categories)

angles = [n / float(N) * 2 * pi for n in range(N)]
angles += angles[:1]

fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(projection='polar'))

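for idx, row in top3.iterrows():
    # Plot normalized values so criteria measured in different units share one radial scale
    pos = results['final'].index.get_loc(idx)
    values = results['normalized'].iloc[pos].tolist()
    values += values[:1]  # repeat the first value to close the polygon
    ax.plot(angles, values, 'o-', linewidth=2, label=f"{row.iloc[0]} (Rank {int(row['Rank'])})")
    ax.fill(angles, values, alpha=0.15)

ax.set_xticks(angles[:-1])
ax.set_xticklabels(categories)
ax.set_title('Performance Comparison: Top 3 Alternatives\n(normalized values; lower is better for - criteria)',
             size=14, fontweight='bold', pad=20)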
ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
ax.grid(True)

plt.tight_layout()
plt.show()

## Step 6: Statistical Summary

In [None]:
print("\n" + "="*80)
print("STATISTICAL SUMMARY")
print("="*80)

print("\nTOPSIS Score Statistics:")
print(results['final']['Topsis Score'].describe())

print("\n\nBest Alternative:")
best = results['final'].loc[results['final']['Rank'] == 1].iloc[0]
print(f"  Name: {best.iloc[0]}")
print(f"  Score: {best['Topsis Score']:.2f}")
print(f"  Rank: {int(best['Rank'])}")

print("\n\nWorst Alternative:")
worst = results['final'].loc[results['final']['Rank'] == results['final']['Rank'].max()].iloc[0]
print(f"  Name: {worst.iloc[0]}")
print(f"  Score: {worst['Topsis Score']:.2f}")
print(f"  Rank: {int(worst['Rank'])}")

print("\n\nScore Range:")
print(f"  Maximum: {results['final']['Topsis Score'].max():.2f}")
print(f"  Minimum: {results['final']['Topsis Score'].min():.2f}")
print(f"  Range: {results['final']['Topsis Score'].max() - results['final']['Topsis Score'].min():.2f}")
print(f"  Mean: {results['final']['Topsis Score'].mean():.2f}")
print(f"  Std Dev: {results['final']['Topsis Score'].std():.2f}")

## Step 7: Save Results

In [None]:
# Save to CSV
results['final'].to_csv('result.csv', index=False)
print("Results saved to 'result.csv'")

# Download file (for Colab)
# from google.colab import files
# files.download('result.csv')

## Conclusion

### Key Findings:

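1. **Best Alternative**: The alternative with the highest TOPSIS score has the best relative closeness, balancing nearness to the ideal solution against distance from the negative-ideal solution
2. **Worst Alternative**: The alternative with the lowest TOPSIS score has the poorest relative closeness to the ideal solution
3. **Score Distribution**: The spread of scores (range and standard deviation above) shows how clearly the alternatives are separated; closely spaced scores suggest the ranking may change under small changes to the weights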

### Methodology Summary:

- **Normalization** ensures all criteria are on comparable scales
- **Weighted scores** reflect the relative importance of each criterion
- **Distance measures** quantify similarity to ideal solutions
- **TOPSIS score** condenses both distances into a single relative-closeness value in [0, 1] (reported here as a percentage) used for ranking

### Advantages of TOPSIS:

1. Simple and intuitive methodology
2. Considers both ideal and anti-ideal solutions
3. Suitable for any number of alternatives and criteria
4. Provides clear ranking of alternatives
5. Computationally efficient: each step is a single pass over the $m \times n$ decision matrix

---

**Note**: This analysis can be customized by changing the weights and impacts according to your specific requirements.
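
A compact sketch of the same steps as `topsis_detailed` (hypothetical helper `topsis_scores`, no input validation, made-up data with two benefit criteria) illustrates what changing the weights does: scaling all weights by one constant leaves every score unchanged, while changing their ratio can reorder the alternatives.

```python
import numpy as np


def topsis_scores(X, weights, impacts):
    """Closeness scores in [0, 1]; a sketch without validation or zero-division guards."""
    X = np.asarray(X, dtype=float)
    V = X / np.sqrt((X ** 2).sum(axis=0)) * np.asarray(weights, dtype=float)
    benefit = np.array([s == "+" for s in impacts])
    v_plus = np.where(benefit, V.max(axis=0), V.min(axis=0))
    v_minus = np.where(benefit, V.min(axis=0), V.max(axis=0))
    s_plus = np.linalg.norm(V - v_plus, axis=1)
    s_minus = np.linalg.norm(V - v_minus, axis=1)
    return s_minus / (s_plus + s_minus)


# Made-up data: alternative 0 is strong on criterion 1, alternative 1 on criterion 2
X = [[4.0, 1.0],
     [1.0, 3.0]]
s_equal = topsis_scores(X, [1, 1], ["+", "+"])
s_doubled = topsis_scores(X, [2, 2], ["+", "+"])  # same ratio, same scores
s_tilted = topsis_scores(X, [1, 3], ["+", "+"])   # criterion 2 favoured
print(s_equal, s_doubled, s_tilted)
```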