# 3D Model Classification using PCA + Plane Fitting Method

This notebook implements and evaluates a geometric classification method for 3D models, distinguishing between **FLAT** and **FREE-FORM** models using:

- **Principal Component Analysis (PCA)** - Eigenvalue analysis for shape distribution
- **Plane Fitting Analysis** - SVD-based best-fit plane computation  
- **Geometric Features** - Z-dimension ratios and normalized distances

## Method Overview

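The classifier computes three geometric features and combines them with OR logic:
1. **Eigenvalue Ratio** (smallest PCA eigenvalue divided by the sum of all three) < 0.01 → Nearly planar vertex distribution
2. **Z-Ratio** (Z extent divided by the larger of the X and Y extents) < 0.03 → Flat in Z-direction
3. **Normalized RMS** (RMS vertex distance to the best-fit plane divided by the bounding-box diagonal) < 0.02 → Close to a plane

If any condition is met, the model is classified as **FLAT**; otherwise it is classified as **FREE-FORM**.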
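
As a minimal sketch of the decision rule above, the OR logic over the three thresholds can be written as a single boolean expression. The helper name `is_flat` is hypothetical and not part of the notebook's code. The first call uses the feature values that the evaluation output in this notebook reports for `35269.stl`; the second uses made-up values.

```python
def is_flat(eigen_ratio, z_ratio, normalized_rms,
            eigen_thr=0.01, z_thr=0.03, rms_thr=0.02):
    """Return True if any one of the three flatness conditions holds."""
    return (eigen_ratio < eigen_thr
            or z_ratio < z_thr
            or normalized_rms < rms_thr)


# Feature values reported for 35269.stl in the evaluation output:
# none is below its threshold, so the model is labelled FREE-FORM.
print(is_flat(0.2571, 0.360, 0.0622))   # False

# Made-up values: only the Z-ratio is below its threshold,
# which is already enough for a FLAT label under OR logic.
print(is_flat(0.20, 0.02, 0.10))        # True
```

Because the conditions are ORed, raising any single threshold can only move models from FREE-FORM to FLAT, never the other way.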

## Implementation: Classification Functions

The following cell contains the complete implementation of the PCA + Plane Fitting classification method:

### Key Functions:
- `classify_models_to_csv()` - Main classification pipeline (the helper functions below are nested inside it)
- `load_mesh()` - STL file loading and validation
- `compute_basic_properties()` - Extract geometric properties (size, volume, area)
- `compute_pca_eigenvalues()` - Principal component analysis
- `compute_plane_fit_rms()` - Best-fit plane computation using SVD
- `classify_model()` - Apply classification logic with thresholds
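
`compute_pca_eigenvalues()` and `compute_plane_fit_rms()` are closely related. For centered data, the covariance eigenvalues equal the squared singular values divided by n - 1, and the RMS distance to the best-fit plane equals the smallest singular value divided by √n. The sketch below checks both identities on a synthetic, nearly planar point cloud; the data and variable names are illustrative assumptions, not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical test data: a noisy point cloud that is thin along Z
points = rng.normal(size=(500, 3)) * np.array([10.0, 5.0, 0.1])

centered = points - points.mean(axis=0)
n = len(centered)

# PCA route: eigenvalues of the sample covariance matrix, sorted descending
eigenvalues = np.sort(np.linalg.eigvalsh(np.cov(centered.T)))[::-1]

# SVD route: singular values and right singular vectors of the centered data
_, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
plane_normal = vt[-1]  # direction of least variance
rms = np.sqrt(np.mean((centered @ plane_normal) ** 2))

# np.cov normalizes by n - 1, so eigenvalue_i == singular_value_i**2 / (n - 1)
assert np.allclose(eigenvalues, singular_values ** 2 / (n - 1))

# The RMS distance to the best-fit plane is the smallest singular value / sqrt(n)
assert np.isclose(rms, singular_values[-1] / np.sqrt(n))
```

One consequence is that the eigenvalue ratio and the normalized RMS both measure the spread along the plane normal and differ only in how they are normalized (total variance versus bounding-box diagonal).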

### Processing Pipeline:
1. Load STL mesh files from directory
2. Extract geometric features for each model
3. Apply classification thresholds
4. Save results to CSV and organize files into folders
5. Generate summary statistics
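
Steps 2 and 3 of the pipeline can be tried without any STL files. The sketch below is a toy under stated assumptions, not the notebook's implementation: `extract_features` and `box_corners` are hypothetical helpers, the two shapes are synthetic sets of box corners, and NumPy stands in for `trimesh`.

```python
import numpy as np


def extract_features(vertices):
    """Compute the three flatness features from an (N, 3) vertex array."""
    vertices = np.asarray(vertices, dtype=float)
    extents = vertices.max(axis=0) - vertices.min(axis=0)  # axis-aligned bounding box
    centered = vertices - vertices.mean(axis=0)

    # PCA: eigenvalues of the covariance matrix, returned in ascending order
    eigenvalues = np.linalg.eigvalsh(np.cov(centered.T))
    total = eigenvalues.sum()
    eigen_ratio = eigenvalues[0] / total if total > 0 else 0.0

    # Plane fit: the last right singular vector is the best-fit plane normal
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    rms = np.sqrt(np.mean((centered @ vt[-1]) ** 2))

    z_ratio = extents[2] / max(extents[0], extents[1], 1e-9)
    normalized_rms = rms / (np.linalg.norm(extents) + 1e-9)
    return eigen_ratio, z_ratio, normalized_rms


def box_corners(sx, sy, sz):
    """Eight corner vertices of an origin-centred box (synthetic test shape)."""
    return np.array([[x, y, z]
                     for x in (-sx / 2, sx / 2)
                     for y in (-sy / 2, sy / 2)
                     for z in (-sz / 2, sz / 2)])


thresholds = (0.01, 0.03, 0.02)  # eigenvalue ratio, Z-ratio, normalized RMS
shapes = {"plate_100x100x2": box_corners(100, 100, 2),
          "cube_10": box_corners(10, 10, 10)}

labels = {}
for name, verts in shapes.items():
    features = extract_features(verts)
    flat = any(f < t for f, t in zip(features, thresholds))
    labels[name] = "FLAT" if flat else "FREE"
    print(name, labels[name], [round(float(f), 4) for f in features])

# The thin plate comes out FLAT and the cube comes out FREE.
```

Real meshes have many more vertices than these eight-corner boxes, so feature values on actual STL files will depend on how the surface is tessellated.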

In [1]:
import os
import numpy as np
import trimesh
import csv
import shutil
from pathlib import Path


def classify_models_to_csv(input_dir, 
                          output_csv="classification_results.csv",
                          out_flat="flat_models",
                          out_free="free_models",
                          eigen_ratio_threshold=0.01,
                          z_ratio_threshold=0.03,
                          normalized_rms_threshold=0.02,
                          limit=None,
                          copy_files=True):
    """
    Classify 3D models as FLAT or FREE using geometric analysis.
    
    Parameters:
    -----------
    input_dir : str
        Directory containing STL files
    output_csv : str
        Output CSV filename
    out_flat, out_free : str
        Output directories for classified models
    eigen_ratio_threshold : float
        Threshold for eigenvalue ratio (default: 0.01)
    z_ratio_threshold : float
        Threshold for Z-dimension ratio (default: 0.03)
    normalized_rms_threshold : float
        Threshold for normalized RMS distance (default: 0.02)
    limit : int or None
        Maximum number of files to process
    copy_files : bool
        Whether to copy files to output directories
    """
    
    def load_mesh(path):
        """Load and validate STL mesh"""
        mesh = trimesh.load(path, force='mesh', process=False)
        if mesh.is_empty:
            raise ValueError("Empty mesh")
        return mesh

    def compute_basic_properties(mesh):
        """Extract basic geometric properties from mesh"""
        extents = mesh.extents
        size_x, size_y, size_z = extents.tolist()
        bbox_diagonal = np.linalg.norm(extents)
        area = mesh.area
        
        try:
            volume = mesh.volume if mesh.is_watertight else mesh.convex_hull.volume
        except Exception:
            volume = 0.0
            
        return {
            'size_x': size_x, 
            'size_y': size_y, 
            'size_z': size_z,
            'bbox_diagonal': bbox_diagonal, 
            'area': area, 
            'volume': volume
        }

    def compute_pca_eigenvalues(vertices):
        """Compute PCA eigenvalues for shape analysis"""
        centroid = vertices.mean(axis=0)
        centered_vertices = vertices - centroid
        covariance_matrix = np.cov(centered_vertices.T)
        eigenvalues, _ = np.linalg.eigh(covariance_matrix)
        return np.sort(eigenvalues)[::-1]

    def compute_plane_fit_rms(vertices):
        """Compute RMS distance to best-fit plane using SVD"""
        centroid = vertices.mean(axis=0)
        centered_vertices = vertices - centroid
        _, _, v_transpose = np.linalg.svd(centered_vertices, full_matrices=False)
        plane_normal = v_transpose[-1]
        distances = np.abs(np.dot(centered_vertices, plane_normal))
        rms_distance = float(np.sqrt(np.mean(distances ** 2)))
        return rms_distance

    def classify_model(properties, eigenvalues, rms_plane_distance):
        """
        Classify model as FLAT or FREE based on geometric features
        
        Classification Logic:
        - FLAT if any of these conditions are met:
          1. Small eigenvalue ratio (nearly planar distribution)
          2. Small Z-dimension ratio (flat in Z direction)
          3. Small normalized RMS distance (close to a plane)
        """
        s1, s2, s3 = eigenvalues
        eigenvalue_sum = s1 + s2 + s3
        
        eigen_ratio = (s3 / eigenvalue_sum) if eigenvalue_sum > 0 else 0.0
        max_xy = max(properties['size_x'], properties['size_y'], 1e-9)
        z_ratio = properties['size_z'] / max_xy
        normalized_rms = rms_plane_distance / (properties['bbox_diagonal'] + 1e-9)
        
        is_flat = (eigen_ratio < eigen_ratio_threshold) or \
                  (z_ratio < z_ratio_threshold) or \
                  (normalized_rms < normalized_rms_threshold)
        
        label = 'flat' if is_flat else 'free'
        return label, eigen_ratio, z_ratio, normalized_rms

    
    os.makedirs(out_flat, exist_ok=True)
    os.makedirs(out_free, exist_ok=True)

    # Sort so that `limit` selects a reproducible subset (os.listdir order is arbitrary)
    stl_files = sorted(os.path.join(input_dir, f)
                       for f in os.listdir(input_dir)
                       if f.lower().endswith(".stl"))
    
    if limit:
        stl_files = stl_files[:limit]
    
    print(f"Processing {len(stl_files)} STL files...")

    results = []
    processed_count = 0
    error_count = 0
    
    for file_path in stl_files:
        try:
            mesh = load_mesh(file_path)
            properties = compute_basic_properties(mesh)
            eigenvalues = compute_pca_eigenvalues(mesh.vertices)
            rms_distance = compute_plane_fit_rms(mesh.vertices)
            
            label, eigen_ratio, z_ratio, normalized_rms = classify_model(
                properties, eigenvalues, rms_distance
            )

            results.append({
                "filename": os.path.basename(file_path),
                "path": file_path,
                "classification": label.upper(),
                "size_x": properties['size_x'],
                "size_y": properties['size_y'], 
                "size_z": properties['size_z'],
                "volume": properties['volume'],
                "area": properties['area'],
                "eigenvalue_1": eigenvalues[0],
                "eigenvalue_2": eigenvalues[1],
                "eigenvalue_3": eigenvalues[2],
                "plane_rms": rms_distance,
                "eigen_ratio": eigen_ratio,
                "z_ratio": z_ratio,
                "normalized_rms": normalized_rms
            })

            if copy_files:
                destination_dir = out_flat if label == "flat" else out_free
                shutil.copy2(file_path, os.path.join(destination_dir, os.path.basename(file_path)))

            processed_count += 1
            
            if processed_count % 10 == 0:
                print(f"Processed {processed_count}/{len(stl_files)} files")

        except Exception as e:
            error_count += 1
            results.append({
                "filename": os.path.basename(file_path),
                "path": file_path,
                "classification": "ERROR",
                "error_message": str(e)
            })

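    if results:
        # Use one fixed header so that successful rows and ERROR rows can share
        # the same file; fields missing from a row are written as empty cells.
        # (Taking the header from results[0] makes DictWriter raise ValueError
        # as soon as a later row carries a key the first row lacks.)
        fieldnames = ["filename", "path", "classification",
                      "size_x", "size_y", "size_z", "volume", "area",
                      "eigenvalue_1", "eigenvalue_2", "eigenvalue_3",
                      "plane_rms", "eigen_ratio", "z_ratio", "normalized_rms",
                      "error_message"]
        with open(output_csv, "w", newline="", encoding="utf-8") as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(results)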

    flat_count = sum(1 for r in results if r.get("classification") == "FLAT")
    free_count = sum(1 for r in results if r.get("classification") == "FREE")
    
    print("\nClassification Complete!")
    print(f"Successfully processed: {processed_count} files")
    print(f"Errors encountered: {error_count} files")
    print(f"FLAT models: {flat_count}")
    print(f"FREE models: {free_count}")
    print(f"Results saved to: {output_csv}")
    # Only report the copy destinations when files were actually copied
    if copy_files:
        print(f"Files copied to: {out_flat} | {out_free}")
    
    return results


if __name__ == "__main__":
    stl_folder = r"C:\Users\Anuhas\Documents\Research\Thingiverse\data"
    
    results = classify_models_to_csv(
        input_dir=stl_folder,
        output_csv="pca_classification_results.csv",
        out_flat="flat_models",
        out_free="free_models",
        limit=None
    )

Processing 248 STL files...
Processed 10/248 files
Processed 20/248 files
Processed 30/248 files
Processed 40/248 files
Processed 50/248 files
Processed 60/248 files
Processed 70/248 files
Processed 80/248 files
Processed 90/248 files
Processed 100/248 files
Processed 110/248 files
Processed 120/248 files
Processed 130/248 files
Processed 140/248 files
Processed 150/248 files
Processed 160/248 files
Processed 170/248 files
Processed 180/248 files
Processed 190/248 files
Processed 200/248 files
Processed 210/248 files
Processed 220/248 files
Processed 230/248 files
Processed 240/248 files

Classification Complete!
Successfully processed: 248 files
Errors encountered: 0 files
FLAT models: 39
FREE models: 209
Results saved to: pca_classification_results.csv
Files copied to: flat_models | free_models


## Accuracy Evaluation Against Manual Labels

This section evaluates the PCA + Plane Fitting method against manually labeled data to measure its performance.

### Evaluation Process:
1. **Load Manual Labels** - Import manual classifications from `AllModels.csv`
2. **Apply Method** - Run the PCA + Plane Fitting classification on each manually labeled file found in the STL directory
3. **Calculate Metrics** - Compute accuracy, precision, recall, and F1-score
4. **Analyze Results** - Show confusion matrix and misclassification details

### Performance Metrics:
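- **Accuracy**: Fraction of all evaluated models whose predicted label matches the manual label
- **Precision**: Fraction of models predicted FREE that are manually labeled FREE
- **Recall**: Fraction of manually labeled FREE models that are predicted FREE
- **F1-Score**: Harmonic mean of precision and recall
- **Confusion Matrix**: Counts of each manual label (rows) against each predicted label (columns)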
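
All four scalar metrics can be derived from the confusion-matrix counts. The sketch below assumes FREE is the positive class (matching `pos_label='free'` in the evaluation code) and plugs in the counts from the confusion matrix printed in this notebook's evaluation output. `binary_metrics` is a hypothetical helper, not a scikit-learn function.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1


# Counts from the reported confusion matrix (rows = manual, columns = predicted):
#   FLAT row: 26 predicted FLAT, 61 predicted FREE
#   FREE row: 13 predicted FLAT, 147 predicted FREE

# FREE as the positive class: TP=147, FP=61, FN=13, TN=26
free_metrics = binary_metrics(tp=147, fp=61, fn=13, tn=26)
print([round(v, 3) for v in free_metrics])   # [0.7, 0.707, 0.919, 0.799]

# FLAT as the positive class: TP=26, FP=13, FN=61, TN=147
flat_metrics = binary_metrics(tp=26, fp=13, fn=61, tn=147)
print([round(v, 3) for v in flat_metrics])   # [0.7, 0.667, 0.299, 0.413]
```

Reading the same matrix with FLAT as the positive class gives a much lower recall (26 of 87 FLAT models recovered), which the FREE-centric metrics alone do not show.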

In [2]:
import pandas as pd
import numpy as np
import trimesh
from pathlib import Path
from sklearn.metrics import (accuracy_score, precision_score, recall_score, 
                           f1_score, confusion_matrix, classification_report)


def load_manual_resultset():
    """
    Load manual classifications from AllModels.csv
    
    Returns:
    --------
    dict : Dictionary mapping filename to manual result label
    """
    csv_path = r"C:\Users\Anuhas\Documents\Research\Thingiverse\AllModels.csv"
    
    encodings_to_try = ['utf-8', 'latin-1', 'cp1252', 'iso-8859-1']
    df = None
    
    for encoding in encodings_to_try:
        try:
            df = pd.read_csv(csv_path, encoding=encoding)
            print(f"Successfully loaded manual result CSV with {encoding} encoding")
            break
        except UnicodeDecodeError:
            continue
    
    if df is None:
        raise ValueError("Could not read CSV file with any standard encoding")
    
    manual_resultset_labels = {}
    
    for _, row in df.iterrows():
        filename = str(row.iloc[0]).strip()
        
        if filename and filename != 'nan' and '.stl' in filename:
            label_text = str(row.iloc[1]).strip().upper()
            
            if 'FLAT' in label_text:
                normalized_label = 'flat'
            elif 'FREE' in label_text:
                normalized_label = 'free'
            else:
                continue
                
            manual_resultset_labels[filename] = normalized_label
    
    print(f"Loaded {len(manual_resultset_labels)} manual result labels")
    return manual_resultset_labels


def pca_classification_method(mesh):
    """
    PCA + Plane Fitting Classification Method
    
    Parameters:
    -----------
    mesh : trimesh.Trimesh
        Input 3D mesh
        
    Returns:
    --------
    str : Classification result ('flat' or 'free')
    dict : Feature values used for classification
    """
    vertices = mesh.vertices
    extents = mesh.extents
    size_x, size_y, size_z = extents.tolist()
    bbox_diagonal = np.linalg.norm(extents)
    
    centroid = vertices.mean(axis=0)
    centered_vertices = vertices - centroid
    covariance_matrix = np.cov(centered_vertices.T)
    eigenvalues, _ = np.linalg.eigh(covariance_matrix)
    eigenvalues = np.sort(eigenvalues)[::-1]
    
    _, _, v_transpose = np.linalg.svd(centered_vertices, full_matrices=False)
    plane_normal = v_transpose[-1]
    distances_to_plane = np.abs(np.dot(centered_vertices, plane_normal))
    rms_plane_distance = float(np.sqrt(np.mean(distances_to_plane ** 2)))
    
    s1, s2, s3 = eigenvalues
    eigenvalue_sum = s1 + s2 + s3
    eigen_ratio = (s3 / eigenvalue_sum) if eigenvalue_sum > 0 else 0.0
    
    max_xy_dimension = max(size_x, size_y, 1e-9)
    z_ratio = size_z / max_xy_dimension
    normalized_rms = rms_plane_distance / (bbox_diagonal + 1e-9)
    
    EIGEN_THRESHOLD = 0.01
    Z_RATIO_THRESHOLD = 0.03
    RMS_THRESHOLD = 0.02
    
    is_flat_model = (eigen_ratio < EIGEN_THRESHOLD) or \
                    (z_ratio < Z_RATIO_THRESHOLD) or \
                    (normalized_rms < RMS_THRESHOLD)
    
    classification = 'flat' if is_flat_model else 'free'
    
    features = {
        'eigen_ratio': eigen_ratio,
        'z_ratio': z_ratio,
        'normalized_rms': normalized_rms,
        'eigenvalues': eigenvalues,
        'rms_distance': rms_plane_distance
    }
    
    return classification, features


def evaluate_classification_accuracy(ground_truth_dict, stl_directory):
    """
    Evaluate classification accuracy against manual ground truth
    
    Parameters:
    -----------
    ground_truth_dict : dict
        Manual classifications {filename: label}
    stl_directory : str
        Directory containing STL files
        
    Returns:
    --------
    dict : Comprehensive evaluation results
    """
    stl_folder = Path(stl_directory)
    
    predictions = []
    true_labels = []
    filenames = []
    feature_data = []
    
    print(f"Evaluating PCA Classification Method...")
    print(f"STL Directory: {stl_directory}")
    
    successful_predictions = 0
    failed_predictions = 0
    
    for filename, true_label in ground_truth_dict.items():
        stl_file_path = stl_folder / filename
        
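        # Files listed in the CSV but missing from stl_directory are skipped
        # without being counted as failures.
        if stl_file_path.exists():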
            try:
                mesh = trimesh.load(stl_file_path, force='mesh', process=False)
                if mesh.is_empty:
                    continue
                
                predicted_label, features = pca_classification_method(mesh)
                
                predictions.append(predicted_label)
                true_labels.append(true_label)
                filenames.append(filename)
                feature_data.append(features)
                
                successful_predictions += 1
                
            except Exception as e:
                failed_predictions += 1
                print(f"Error processing {filename}: {str(e)[:50]}")
                continue
    
    if len(predictions) == 0:
        print("No valid predictions generated!")
        return None
    
    accuracy = accuracy_score(true_labels, predictions)
    precision = precision_score(true_labels, predictions, pos_label='free', average='binary')
    recall = recall_score(true_labels, predictions, pos_label='free', average='binary')
    f1 = f1_score(true_labels, predictions, pos_label='free', average='binary')
    
    cm = confusion_matrix(true_labels, predictions, labels=['flat', 'free'])
    
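    # Pass labels explicitly so target_names always line up with ['flat', 'free'],
    # even if one class is absent from the evaluated files.
    class_report = classification_report(true_labels, predictions,
                                         labels=['flat', 'free'],
                                         target_names=['FLAT', 'FREE'],
                                         output_dict=True)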
    
    misclassifications = []
    for i, (pred, true, fname) in enumerate(zip(predictions, true_labels, filenames)):
        if pred != true:
            misclassifications.append({
                'filename': fname,
                'true_label': true,
                'predicted_label': pred,
                'features': feature_data[i]
            })
    
    print(f"\n**EVALUATION RESULTS**")
    print(f"{'='*50}")
    print(f"Files Successfully Processed: {successful_predictions}")
    print(f"Files Failed: {failed_predictions}")
    print(f"Overall Accuracy: {accuracy:.3f} ({accuracy*100:.1f}%)")
    print(f"Precision (FREE): {precision:.3f}")
    print(f"Recall (FREE): {recall:.3f}")
    print(f"F1-Score (FREE): {f1:.3f}")
    
    print(f"\n**CONFUSION MATRIX**")
    print(f"                 Predicted")
    print(f"           FLAT    FREE")
    print(f"   FLAT  | {cm[0,0]:4d} | {cm[0,1]:4d} |")
    print(f"   FREE  | {cm[1,0]:4d} | {cm[1,1]:4d} |")
    
    flat_predictions = sum(1 for p in predictions if p == 'flat')
    free_predictions = sum(1 for p in predictions if p == 'free')
    flat_actual = sum(1 for t in true_labels if t == 'flat')
    free_actual = sum(1 for t in true_labels if t == 'free')
    
    print(f"\n**CLASSIFICATION DISTRIBUTION**")
    print(f"Ground Truth: {flat_actual} FLAT, {free_actual} FREE")
    print(f"Predictions:  {flat_predictions} FLAT, {free_predictions} FREE")
    
    if misclassifications:
        print(f"\n**MISCLASSIFICATIONS** ({len(misclassifications)} total)")
        for i, error in enumerate(misclassifications[:10]):
            print(f"  {i+1:2d}. {error['filename']}")
            print(f"      True: {error['true_label'].upper()}, Predicted: {error['predicted_label'].upper()}")
            features = error['features']
            print(f"      Features: eigen_ratio={features['eigen_ratio']:.4f}, "
                  f"z_ratio={features['z_ratio']:.3f}, rms={features['normalized_rms']:.4f}")
        
        if len(misclassifications) > 10:
            print(f"      ... and {len(misclassifications)-10} more misclassifications")
    
    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1_score': f1,
        'confusion_matrix': cm,
        'classification_report': class_report,
        'predictions': predictions,
        'true_labels': true_labels,
        'filenames': filenames,
        'misclassifications': misclassifications,
        'feature_data': feature_data
    }

## Execute Evaluation

Run the accuracy evaluation and display comprehensive results including performance metrics and misclassification analysis.

In [3]:
print("**PCA + PLANE FITTING METHOD ACCURACY EVALUATION**")
print("="*60)

try:
    ground_truth = load_manual_resultset()
    
    if len(ground_truth) > 0:
        stl_directory = r"C:\Users\Anuhas\Documents\Research\Thingiverse\data"
        
        evaluation_results = evaluate_classification_accuracy(ground_truth, stl_directory)
        
        if evaluation_results:
            print(f"\n**FINAL SUMMARY**")
            print(f"The PCA + Plane Fitting method achieved:")
            print(f"{evaluation_results['accuracy']*100:.1f}% accuracy")
            print(f"{evaluation_results['f1_score']:.3f} F1-score")
            print(f"{evaluation_results['precision']:.3f} precision")
            print(f"{evaluation_results['recall']:.3f} recall")
            
            results_df = pd.DataFrame({
                'filename': evaluation_results['filenames'],
                'true_label': evaluation_results['true_labels'],
                'predicted_label': evaluation_results['predictions'],
                'correct': [t == p for t, p in zip(evaluation_results['true_labels'], 
                                                 evaluation_results['predictions'])]
            })
            
            print(f"\nEvaluation complete! Results stored in memory.")
            print(f"Total files evaluated: {len(results_df)}")
            
        else:
            print("Evaluation failed - no valid results generated")
    
    else:
        print("No manual result labels loaded")

except Exception as e:
    print(f"Error during evaluation: {e}")
    print("Please check file paths and ensure all required files are accessible")

**PCA + PLANE FITTING METHOD ACCURACY EVALUATION**
Successfully loaded manual result CSV with latin-1 encoding
Loaded 249 manual manual result labels
Evaluating PCA Classification Method...
STL Directory: C:\Users\Anuhas\Documents\Research\Thingiverse\data

**EVALUATION RESULTS**
Files Successfully Processed: 247
Files Failed: 0
Overall Accuracy: 0.700 (70.0%)
Precision (FREE): 0.707
Recall (FREE): 0.919
F1-Score (FREE): 0.799

**CONFUSION MATRIX**
                 Predicted
           FLAT    FREE
   FLAT  |   26 |   61 |
   FREE  |   13 |  147 |

**CLASSIFICATION DISTRIBUTION**
Ground Truth: 87 FLAT, 160 FREE
Predictions:  39 FLAT, 208 FREE

**MISCLASSIFICATIONS** (74 total)
   1. 35269.stl
      True: FLAT, Predicted: FREE
      Features: eigen_ratio=0.2571, z_ratio=0.360, rms=0.0622
   2. 36069.stl
      True: FLAT, Predicted: FREE
      Features: eigen_ratio=0.0375, z_ratio=3.192, rms=0.0813
   3. 36075.stl
      True: FLAT, Predicted: FREE
      Features: eigen_ratio=0.0323, z_ra