# Detecting a rotated chest X-ray (CXR) image is crucial for proper medical analysis and automated diagnostics. There are several techniques that can be used to identify and correct rotated X-ray images:
1. Deep Learning-Based Approaches

✅ Best for large datasets & high accuracy

    Convolutional Neural Networks (CNNs):
        Train a CNN to classify images as "rotated" or "correctly oriented."
        Example architectures: ResNet, VGG, EfficientNet.
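    Autoencoders / Self-Supervised Learning:
        An autoencoder trained only on correctly oriented images tends to reconstruct rotated images poorly, so a high reconstruction error can flag a misaligned image.
        Representations learned with contrastive methods (e.g., SimCLR, MoCo) can serve as input features for a small rotation classifier.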

🔹 Example:

    Train a CNN classifier on CXR images labeled with different rotation angles (e.g., 0°, 90°, 180°, 270°).
    The model predicts whether an image is rotated and suggests corrections.
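
A minimal sketch of how such a labeled training set could be generated from upright images, and how a predicted class could be undone. It assumes square grayscale images stored as NumPy arrays; the names `ANGLES`, `make_rotation_dataset` and `correct_orientation` are hypothetical, and the angles follow `np.rot90`, which rotates counter-clockwise.

```python
import numpy as np

# Class index -> rotation angle in degrees (counter-clockwise, as np.rot90 rotates).
ANGLES = (0, 90, 180, 270)


def make_rotation_dataset(images):
    """Return (rotated_images, labels) with every image in all four orientations.

    Assumes all images are square and share one shape, so they can be stacked.
    """
    rotated, labels = [], []
    for img in images:
        for label, angle in enumerate(ANGLES):
            rotated.append(np.rot90(img, k=angle // 90))
            labels.append(label)
    return np.stack(rotated), np.array(labels)


def correct_orientation(img, predicted_label):
    """Undo the rotation that a classifier predicted for img."""
    return np.rot90(img, k=-(ANGLES[predicted_label] // 90))
```

Any four-class image classifier can then be trained on the returned arrays; the correction step only needs the predicted class index.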

2. Classical Computer Vision Techniques

✅ Good for small datasets & fast processing
A. Edge Detection & Keypoint Detection

    Hough Line Transform:
        Detects vertical and horizontal structures in X-rays.
        If the lung boundaries or clavicles are tilted, the image is likely rotated.
    Harris Corner Detector / SIFT / ORB Features:
        Extract keypoints (e.g., rib cage edges) and compare them with known orientations.

B. Symmetry-Based Methods

    The lungs and spine are roughly left-right symmetric in a frontal CXR (the heart shadow breaks the symmetry slightly).
    Use symmetry detection:
        Histogram of oriented gradients (HOG)
        Radon transform (detects dominant line orientations)
        Fourier transform-based alignment (checks for dominant vertical structures)
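
As a simple illustration of the symmetry idea (not one of the transforms listed above), the sketch below correlates an image with its own mirror image. The function names are hypothetical, and the heuristic is an assumption: it can only hint that an image lies on its side, and it cannot tell an upright image from one rotated by 180°.

```python
import numpy as np


def mirror_symmetry_score(img):
    """Pearson correlation between an image and its left-right mirror (1.0 = symmetric)."""
    a = np.asarray(img, dtype=np.float64)
    b = a[:, ::-1]
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def likely_sideways(img):
    """Heuristic: True if the image is more symmetric top-bottom than left-right."""
    left_right = mirror_symmetry_score(img)
    # Transposing turns a top-bottom mirror into a left-right mirror.
    top_bottom = mirror_symmetry_score(np.asarray(img).T)
    return top_bottom > left_right
```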

C. PCA (Principal Component Analysis)

    Compute the principal axis of the lung region.
    If the major axis is significantly tilted, the image is rotated.
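
The principal-axis test can be sketched as follows, assuming a binary lung mask is already available from some segmentation step (the mask and the function name `principal_axis_tilt` are assumptions, not part of this notebook).

```python
import numpy as np


def principal_axis_tilt(mask):
    """Angle in degrees (0..90) between the mask's major axis and the vertical image axis."""
    ys, xs = np.nonzero(mask)
    coords = np.stack([xs, ys]).astype(np.float64)   # shape (2, n_pixels)
    coords -= coords.mean(axis=1, keepdims=True)     # center the point cloud
    eigvals, eigvecs = np.linalg.eigh(np.cov(coords))
    dx, dy = eigvecs[:, np.argmax(eigvals)]          # direction of largest variance
    return float(np.degrees(np.arctan2(abs(dx), abs(dy))))
```

A tilt near 0° means the region is elongated vertically and a tilt near 90° means it is elongated horizontally; the threshold for "significantly tilted" has to be chosen for the data at hand.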

3. Template Matching

✅ Useful when reference images are available

    Compare an input X-ray with a set of correctly aligned reference images.
    Use cross-correlation or structural similarity index (SSIM) to measure alignment.
    If misalignment is detected, rotate the image until it best matches the reference.
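
A minimal sketch of this search restricted to the four right-angle rotations, using normalized cross-correlation as the similarity measure. It assumes a square input with the same shape as the reference; `ncc` and `best_rotation` are hypothetical names.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross-correlation of two equally sized images (range -1..1)."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0


def best_rotation(img, reference):
    """Return (angle, scores): the counter-clockwise angle that best aligns img to reference."""
    scores = {k * 90: ncc(np.rot90(img, k), reference) for k in range(4)}
    best_angle = max(scores, key=scores.get)
    return best_angle, scores
```

In practice the reference could be an average of several correctly aligned images, which makes the score less sensitive to individual anatomy.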

4. Statistical Methods

    Compute the intensity profile along the vertical axis:
        Normally, pixel intensity distribution should be symmetrical.
        If the distribution is skewed, the image might be rotated.
    Gradient orientation histograms:
        In a correctly oriented image, most gradients should be vertical.
        Rotation causes shifts in the gradient histogram.
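
The gradient-histogram idea can be sketched with plain NumPy. The function name and the choice of 18 bins are assumptions; orientations are folded into the range 0° to 180° and weighted by gradient magnitude.

```python
import numpy as np


def gradient_orientation_histogram(img, bins=18):
    """Magnitude-weighted histogram of gradient orientations, folded into [0, 180) degrees."""
    img = np.asarray(img, dtype=np.float64)
    gy, gx = np.gradient(img)                  # derivatives along rows (y) and columns (x)
    magnitude = np.hypot(gx, gy)
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    hist, edges = np.histogram(
        orientation, bins=bins, range=(0.0, 180.0), weights=magnitude
    )
    total = hist.sum()
    return (hist / total if total > 0 else hist), edges
```

Rotating an image by 90° moves the dominant peak of this histogram by 90°, so comparing the peak position against that of known upright images gives a cheap rotation cue.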

Which Method to Use?
<table>
<thead>
<tr><th>Technique</th><th>Pros</th><th>Cons</th></tr>
</thead>
<tbody>
<tr><td>CNN-based detection</td><td>High accuracy, generalizable</td><td>Needs labeled data &amp; training</td></tr>
<tr><td>Hough Line Transform</td><td>Fast, detects rotations well</td><td>Struggles with noisy images</td></tr>
<tr><td>PCA-based</td><td>No training needed, works on symmetry</td><td>Fails if lungs are asymmetric</td></tr>
<tr><td>Radon Transform</td><td>Effective for detecting skewed images</td><td>Computationally expensive</td></tr>
<tr><td>Template Matching</td><td>Easy to implement</td><td>Requires reference images</td></tr>
</tbody>
</table>

# ================================================
# CNN-based detection with downscaled X-ray images
# ================================================

* ## Downscaling the image
Best Interpolation for Downscaling X-ray Images

For medical images, preserving details is crucial. The best interpolation method is:

✅ cv2.INTER_AREA → Best for downscaling (avoids aliasing & preserves details).

This method resamples using the pixel-area relation: each output pixel is effectively an average of the source pixels it covers, so fine structures are smoothed rather than aliased when the image shrinks.
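
To make the averaging concrete, the sketch below downscales by a whole-number factor with plain block means. It is an illustration of the idea only, not OpenCV's implementation, and `block_mean_downscale` is a hypothetical name; in OpenCV the corresponding call is `cv2.resize(img, (width, height), interpolation=cv2.INTER_AREA)`.

```python
import numpy as np


def block_mean_downscale(img, factor):
    """Downscale a 2-D image by an integer factor by averaging factor x factor blocks."""
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    h2, w2 = h // factor, w // factor
    # Drop trailing rows/columns that do not fill a complete block.
    cropped = img[: h2 * factor, : w2 * factor]
    return cropped.reshape(h2, factor, w2, factor).mean(axis=(1, 3))
```

Because every source pixel contributes to exactly one output pixel, thin structures fade gradually instead of appearing and disappearing as they would with nearest-neighbor sampling.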

In [1]:
%load_ext autoreload
%autoreload 2
#import sys
#sys.path.append(r"C:\Users\User\DataScience\area51")
#sys.path

## Prepare downscaled data to train the CNN model
1. Downscaling the original images
2. Removing duplicates
3. Creating test and training data

In [None]:
# 1. Downscaling
from src.utils.img_processing import ImageProcessor
from src.defs import IMAGE_DIRECTORIES as imdir, DiseaseCategory as dc

ip = ImageProcessor()

new_size = (75, 75)  # 224x224 for ResNet50 / 75x75 for MobileNet


for disease in dc:

    inFolder = fr"C:\Users\User\DataScience\area51\data\COVID-19_Radiography_Dataset\{disease.value}\images"
    outFolder = fr"C:\Users\User\DataScience\area51\data\COVID-19_Radiography_Dataset\{disease.value}\downscaled"
    
    # prepends the target resolution to each file name, e.g. 224x224_COVID-3143.png for new_size=(224, 224)
    ip.downscaleToFolder(inputFolder=inFolder, outputFolder=outFolder, new_size=new_size)


1345 images have been downscaled and stored to C:\Users\User\DataScience\area51\data\COVID-19_Radiography_Dataset\Viral Pneumonia\downscaled\224x224
3616 images have been downscaled and stored to C:\Users\User\DataScience\area51\data\COVID-19_Radiography_Dataset\COVID\downscaled\224x224
6012 images have been downscaled and stored to C:\Users\User\DataScience\area51\data\COVID-19_Radiography_Dataset\Lung_Opacity\downscaled\224x224
10192 images have been downscaled and stored to C:\Users\User\DataScience\area51\data\COVID-19_Radiography_Dataset\Normal\downscaled\224x224


In [4]:
# 2. removing duplicates

do_remove_duplicates = False

if do_remove_duplicates:

    # remove duplicated images for each category in downscaled folders
    import os, pandas as pd
    from src.defs import DiseaseCategory as dc

    # prepare a list having images to be removed
    df_duplicates_only = pd.read_csv(r"C:\Users\User\DataScience\area51\data\COVID-19_Radiography_Dataset\3_image_duplicates_only.csv")
    df_duplicates_to_remove = df_duplicates_only[df_duplicates_only['mean intensity'].duplicated(keep='first')]
    duplicated_images_to_remove = df_duplicates_to_remove["file name"] + '.png'
    duplicated_images_to_remove = duplicated_images_to_remove.to_list()

    # remove the duplicated images from every category folder
    img_removed = []  # names of the files that were actually deleted
    img_resolution_prefix = "224x224_"

    for cat in dc:
        # set the base directory
        img_dir = fr"C:\Users\User\DataScience\area51\data\COVID-19_Radiography_Dataset\{cat.value}\downscaled\224x224"
        #img_dir = r"C:\Users\User\DataScience\area51\data_224x224\224x224_rotated_0"

        # delete every listed duplicate that exists in this folder
        for filename in duplicated_images_to_remove:
            file_path = os.path.join(img_dir, img_resolution_prefix + filename)
            if os.path.exists(file_path):
                os.remove(file_path)
                img_removed.append(filename)

    print("duplicates to remove:", len(duplicated_images_to_remove), duplicated_images_to_remove)
    print("removed duplicates:", len(img_removed), img_removed)

else:

    print("Removal of duplicates is disabled!")


Removal of duplicates is disabled!
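
The keep-first rule used above (`duplicated(keep='first')`) can be illustrated without pandas. This is a sketch with made-up records and a hypothetical function name; it assumes, like the cell above, that an equal mean intensity identifies a duplicate, which can in principle also match distinct images.

```python
def files_to_remove(records):
    """Return the file names whose key was already seen earlier in the list.

    records: iterable of (file_name, mean_intensity) pairs. The first file
    with a given mean intensity is kept; every later one is marked for removal.
    """
    seen = set()
    to_remove = []
    for name, key in records:
        if key in seen:
            to_remove.append(name)
        else:
            seen.add(key)
    return to_remove


# Illustrative records only, not taken from the dataset.
example = [("A-1", 120.5), ("B-7", 98.0), ("A-9", 120.5), ("C-2", 98.0)]
print(files_to_remove(example))
```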


In [None]:
# create 75x75 train/valid dataset
from src.utils.img_processing import ImageProcessor
ip = ImageProcessor()

# in order to compare both models use the same train/val/test images as for ResNet50

strip_prefix = "224x224_"
from_dir = rf"C:\Users\User\DataScience\area51\data_224x224\train_val_224x224\224x224_rotated_0"

prefix_0 = "75x75_"
prefix_90 = "75x75_rotated_90_"
prefix_180 = "75x75_rotated_180_"
prefix_minus_90 = "75x75_rotated_minus_90_"

to_dir_0 = rf"C:\Users\User\DataScience\area51\data_75x75\train_val_75x75\{prefix_0[:-1]}"
to_dir_90 = rf"C:\Users\User\DataScience\area51\data_75x75\train_val_75x75\{prefix_90[:-1]}"
to_dir_180 = rf"C:\Users\User\DataScience\area51\data_75x75\train_val_75x75\{prefix_180[:-1]}"
to_dir_minus_90 = rf"C:\Users\User\DataScience\area51\data_75x75\train_val_75x75\{prefix_minus_90[:-1]}"

imgNames = ip.listFiles(from_dir, extensions=(".png",))  # one-element tuple; (".png") without the comma is just a string
imgs, imgNames = ip.loadImgs(imgNames, from_dir)

angles = [0, 90, 180, -90]
new_resolution = (75, 75)

# copy, rotate, rename and save the images (one target folder per rotation angle)
to_dirs = (to_dir_0, to_dir_90, to_dir_180, to_dir_minus_90)
prefixes = (prefix_0, prefix_90, prefix_180, prefix_minus_90)
for to_dir, prx, angle in zip(to_dirs, prefixes, angles):
    ip.copyRenameDownscaleRotateSave(imgs, imgNames, to_dir, new_resolution, rotAngle=angle, strip_prefix=strip_prefix, add_prefix=prx)


6012 images have been copied and rotated to C:\Users\User\DataScience\area51\data_75x75\train_val_75x75\75x75
6012 images have been copied and rotated to C:\Users\User\DataScience\area51\data_75x75\train_val_75x75\75x75_rotated_90
6012 images have been copied and rotated to C:\Users\User\DataScience\area51\data_75x75\train_val_75x75\75x75_rotated_180
6012 images have been copied and rotated to C:\Users\User\DataScience\area51\data_75x75\train_val_75x75\75x75_rotated_minus_90
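
`copyRenameDownscaleRotateSave` is project code that is not shown here, so the following is only a guess at how its renaming step might behave, based on the `strip_prefix` and `add_prefix` arguments. `retarget_name` is a hypothetical helper.

```python
def retarget_name(name, strip_prefix, add_prefix):
    """Drop strip_prefix from the start of name (if present) and prepend add_prefix."""
    if name.startswith(strip_prefix):
        name = name[len(strip_prefix):]
    return add_prefix + name


print(retarget_name("224x224_COVID-3143.png", "224x224_", "75x75_rotated_90_"))
```

Slicing off the leading prefix, rather than calling `str.replace`, guarantees that only the start of the file name changes.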


In [None]:
# 3. create test dataset
from src.utils.img_processing import ImageProcessor
import cv2, os
ip = ImageProcessor()

# in order to compare both models the same train/val/test images as for ResNet50 are used
from_dir = rf"C:\Users\User\DataScience\area51\data_224x224\test_224x224"

prefix_0 = "75x75_"
prefix_90 = "75x75_rotated_90_"
prefix_180 = "75x75_rotated_180_"
prefix_minus_90 = "75x75_rotated_minus_90_"

base_prfx = "224x224_"
# (old prefix, new prefix) pairs
replacePrefixes = [
    (f"rotated_90_{base_prfx}", prefix_90),
    (f"rotated_180_{base_prfx}", prefix_180),
    (f"rotated_-90_{base_prfx}", prefix_minus_90),
    (base_prfx, prefix_0),
]

to_dir = rf"C:\Users\User\DataScience\area51\data_75x75\test_75x75"
imgNames = ip.listFiles(from_dir, extensions=(".png",))  # one-element tuple; (".png") without the comma is just a string
imgs, imgNames = ip.loadImgs(imgNames, from_dir)

new_imgNames = []

# rename and store
for img, name in zip(imgs, imgNames):
    for old, new in replacePrefixes:
        if not name.startswith(old):
            continue
        # swap only the leading prefix; str.replace would also touch later occurrences
        name_new = new + name[len(old):]
        img_path = os.path.join(to_dir, name_new)
        if os.path.exists(img_path):
            print(f"Skipped: {img_path} already exists.")
        else:
            cv2.imwrite(img_path, img)
            new_imgNames.append(name_new)
        break

len(new_imgNames), new_imgNames






Skipped: C:\Users\User\DataScience\area51\data_75x75\test_75x75\75x75_Lung_Opacity-13.png already exists.
Skipped: C:\Users\User\DataScience\area51\data_75x75\test_75x75\75x75_Lung_Opacity-24.png already exists.
Skipped: C:\Users\User\DataScience\area51\data_75x75\test_75x75\75x75_Lung_Opacity-28.png already exists.
Skipped: C:\Users\User\DataScience\area51\data_75x75\test_75x75\75x75_Lung_Opacity-29.png already exists.
Skipped: C:\Users\User\DataScience\area51\data_75x75\test_75x75\75x75_Lung_Opacity-3.png already exists.
Skipped: C:\Users\User\DataScience\area51\data_75x75\test_75x75\75x75_Lung_Opacity-34.png already exists.
Skipped: C:\Users\User\DataScience\area51\data_75x75\test_75x75\75x75_Lung_Opacity-41.png already exists.
Skipped: C:\Users\User\DataScience\area51\data_75x75\test_75x75\75x75_Lung_Opacity-44.png already exists.
Skipped: C:\Users\User\DataScience\area51\data_75x75\test_75x75\75x75_Lung_Opacity-70.png already exists.
Skipped: C:\Users\User\DataScience\area51\data_

(0, [])
