# Compare Fold-Change and Chi-Squared Results
### Group 33, Florida Atlantic University
- Compare top-ranked features from fold-change and chi-squared feature selection methods.
- Identify overlapping and unique features.
- Save results for biomarker ranking and downstream analysis.


#### Load and Inspect Data

In [26]:
# Load necessary libraries
import pandas as pd

# Load fold-change results
fold_change_df = pd.read_csv('../processed_data/fold_change_results.csv')

# Load chi-squared results
chi2_ranked_df = pd.read_csv('../processed_data/chi2_ranked_features.csv')

# Inspect the datasets
print("Fold-change dataset columns:", fold_change_df.columns)
print(fold_change_df.head())

print("Chi-squared dataset columns:", chi2_ranked_df.columns)
print(chi2_ranked_df.head())

Fold-change dataset columns: Index(['Unnamed: 0', '0'], dtype='object')
       Unnamed: 0         0
0    hsa-mir-520a  8.776809
1    hsa-mir-520f  7.479981
2    hsa-mir-518c  7.341205
3  hsa-mir-516b-1  6.974440
4   hsa-mir-512-1  6.891411
Chi-squared dataset columns: Index(['Score', 'Feature'], dtype='object')
       Score         Feature
0        NaN  hsa-mir-103b-1
1        NaN  hsa-mir-103b-2
2        NaN    hsa-mir-1183
3  38.356704    hsa-mir-4663
4   5.234286  hsa-mir-1972-1
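
The chi-squared preview above opens with three `NaN` scores, followed by 38.36 and then 5.23, which suggests the file may not be ordered by score. A quick check along the following lines could confirm that before any ranking step. This is a minimal sketch on made-up toy data; `summarize_scores` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

def summarize_scores(df, score_col="Score"):
    """Report row count, missing scores, and whether the non-missing scores descend."""
    scores = df[score_col]
    non_missing = scores.dropna()
    return {
        "n_rows": len(df),
        "n_missing": int(scores.isna().sum()),
        "sorted_descending": bool(non_missing.is_monotonic_decreasing),
    }

# Toy data (made-up values) mimicking the pattern in the preview:
# missing scores first, then scores that are not in descending order.
toy = pd.DataFrame({
    "Score": [float("nan"), float("nan"), 38.4, 5.2, 12.0],
    "Feature": ["f1", "f2", "f3", "f4", "f5"],
})
summary = summarize_scores(toy)
print(summary)
```

Calling `summarize_scores(chi2_ranked_df)` on the real table would show how many features lack a score and whether `head()` can be trusted to return the highest-scoring rows.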


#### Clean and Align Datasets

In [27]:
# Ensure miRNA names live in a 'Feature' column of the fold-change table
if 'Unnamed: 0' in fold_change_df.columns:
    # The CSV was written with its index, so the names arrive as 'Unnamed: 0'
    fold_change_df.rename(columns={"Unnamed: 0": "Feature"}, inplace=True)
else:
    # Otherwise move the index into a column and name it 'Feature'
    fold_change_df = fold_change_df.rename_axis('Feature').reset_index()

# Ensure feature names are strings and standardized
fold_change_df['Feature'] = fold_change_df['Feature'].astype(str).str.strip().str.lower()
chi2_ranked_df['Feature'] = chi2_ranked_df['Feature'].astype(str).str.strip().str.lower()

# Check cleaned datasets
print("Cleaned Fold-change dataset:")
print(fold_change_df.head())

print("Cleaned Chi-squared dataset:")
print(chi2_ranked_df.head())

Cleaned Fold-change dataset:
          Feature         0
0    hsa-mir-520a  8.776809
1    hsa-mir-520f  7.479981
2    hsa-mir-518c  7.341205
3  hsa-mir-516b-1  6.974440
4   hsa-mir-512-1  6.891411
Cleaned Chi-squared dataset:
       Score         Feature
0        NaN  hsa-mir-103b-1
1        NaN  hsa-mir-103b-2
2        NaN    hsa-mir-1183
3  38.356704    hsa-mir-4663
4   5.234286  hsa-mir-1972-1
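
The cleaning step standardizes feature names but leaves row order and missing scores untouched. If a true score-based ranking is wanted, one option is to drop rows without a score and sort in descending order. The sketch below assumes that higher scores mean stronger features. `rank_by_score` is a hypothetical helper, and the toy values are made up.

```python
import pandas as pd

def rank_by_score(df, score_col):
    """Drop rows with a missing score and order the rest from highest to lowest."""
    return (
        df.dropna(subset=[score_col])
          # mergesort is stable, so tied scores keep their original relative order
          .sort_values(score_col, ascending=False, kind="mergesort")
          .reset_index(drop=True)
    )

# Toy data (made-up values)
toy = pd.DataFrame({
    "Score": [float("nan"), 38.4, 5.2, 12.0],
    "Feature": ["f1", "f2", "f3", "f4"],
})
ranked = rank_by_score(toy, "Score")
print(ranked)
```

For the notebook's tables this would be `rank_by_score(chi2_ranked_df, 'Score')` and `rank_by_score(fold_change_df, '0')`. Whether fold-change should be ranked by raw value or by absolute value depends on how that column was computed, which the notebook does not show.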


#### Extract Top Features

If there are no overlaps, expand the comparison scope (e.g., to the top 200 features).

In [28]:
# Take the first 100 rows of each table as its "top" features.
# head() does not sort, so this relies on each file already being ordered best-first.
top_fold_change_features = fold_change_df.head(100)['Feature']
top_chi2_features = chi2_ranked_df.head(100)['Feature']

# Display top features for verification
print("Top features from Fold-change results:")
print(top_fold_change_features)

print("Top features from Chi-squared results:")
print(top_chi2_features)

Top features from Fold-change results:
0       hsa-mir-520a
1       hsa-mir-520f
2       hsa-mir-518c
3     hsa-mir-516b-1
4      hsa-mir-512-1
           ...      
95       hsa-mir-506
96       hsa-mir-597
97      hsa-mir-6505
98      hsa-mir-3131
99      hsa-mir-4454
Name: Feature, Length: 100, dtype: object
Top features from Chi-squared results:
0     hsa-mir-103b-1
1     hsa-mir-103b-2
2       hsa-mir-1183
3       hsa-mir-4663
4     hsa-mir-1972-1
           ...      
95      hsa-mir-1250
96      hsa-mir-1252
97      hsa-mir-1263
98      hsa-mir-1231
99      hsa-mir-1227
Name: Feature, Length: 100, dtype: object
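
Rather than picking a single cutoff, the overlap can be tabulated for several cutoffs at once, which makes it easier to decide how far the comparison scope needs to expand. The following is a minimal sketch that assumes both inputs are already ordered best-first. `overlap_by_top_n` is a hypothetical helper and the feature names are placeholders.

```python
def overlap_by_top_n(ranked_a, ranked_b, sizes=(100, 200, 500)):
    """Count shared features between the first n entries of two ranked lists, for each n."""
    ranked_a, ranked_b = list(ranked_a), list(ranked_b)
    counts = {}
    for n in sizes:
        counts[n] = len(set(ranked_a[:n]) & set(ranked_b[:n]))
    return counts

# Toy ranked lists (placeholder names)
a = ["f1", "f2", "f3", "f4"]
b = ["f4", "f3", "f9", "f1"]
counts = overlap_by_top_n(a, b, sizes=(1, 2, 4))
print(counts)
```

With the notebook's data, the inputs would be the `Feature` columns of the two tables, for example `overlap_by_top_n(fold_change_df['Feature'], chi2_ranked_df['Feature'])`.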


#### Find Overlapping Features

In [29]:
# Find overlaps between top features
overlaps = set(top_fold_change_features).intersection(set(top_chi2_features))

# Display overlaps
if overlaps:
    print("Overlapping features:")
    print(overlaps)
else:
    print("No overlapping features found.")

Overlapping features:
{'hsa-mir-2113', 'hsa-mir-1269b'}
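
The objectives also mention unique features, which the intersection alone does not provide. Set differences give the features found by only one method, and the Jaccard index summarizes the agreement in one number. This is a sketch on placeholder names; `compare_feature_sets` is a hypothetical helper.

```python
def compare_feature_sets(a, b):
    """Split two feature collections into shared and method-specific parts."""
    a, b = set(a), set(b)
    shared = a & b
    union = a | b
    return {
        "shared": sorted(shared),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
        # Jaccard index: size of the intersection divided by size of the union
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Toy inputs (placeholder names)
result = compare_feature_sets(["f1", "f2", "f3"], ["f2", "f3", "f4", "f5"])
print(result)
```

In the notebook this would be `compare_feature_sets(top_fold_change_features, top_chi2_features)`. With two shared features and, assuming 100 distinct names per list, a union of 198, the Jaccard index would be about 0.01.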


#### Save Results

In [30]:
# Save overlapping features to a text file, one per line
if overlaps:
    with open('../processed_data/overlapping_features.txt', 'w') as f:
        # Sets are unordered; sort so the file is identical across runs
        for feature in sorted(overlaps):
            f.write(f"{feature}\n")
    print("Overlapping features saved to ../processed_data/overlapping_features.txt")
else:
    print("No overlaps to save.")

# Save the first 10 rows from each method for documentation (the overlap above used the first 100)
fold_change_df.head(10).to_csv('../processed_data/top_fold_change_features.csv', index=False)
chi2_ranked_df.head(10).to_csv('../processed_data/top_chi2_features.csv', index=False)
print("Top features from each method saved for review.")


Overlapping features saved to ../processed_data/overlapping_features.txt
Top features from each method saved for review.
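
For downstream biomarker ranking, a single table holding both scores per feature may be more convenient than separate files. One way to build it is an outer merge on `Feature` with pandas' `indicator` option, sketched below. `build_comparison_table` and the `FoldChange` column name are hypothetical (the notebook's fold-change column is named `0`), and all values are made up.

```python
import io
import pandas as pd

def build_comparison_table(fc_df, chi2_df):
    """Outer-join two feature tables on 'Feature' and label where each row came from."""
    merged = fc_df.merge(chi2_df, on="Feature", how="outer", indicator=True)
    labels = {"both": "both", "left_only": "fold_change_only", "right_only": "chi2_only"}
    merged["_merge"] = merged["_merge"].astype(str).map(labels)
    return merged.rename(columns={"_merge": "source"})

# Toy tables (made-up values)
fc = pd.DataFrame({"Feature": ["f1", "f2"], "FoldChange": [3.0, 2.0]})
chi2 = pd.DataFrame({"Feature": ["f2", "f3"], "Score": [10.0, 4.0]})
table = build_comparison_table(fc, chi2)

# Write to an in-memory buffer here; a file path would work the same way
buffer = io.StringIO()
table.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Features present in only one table get `NaN` for the other method's score, so the `source` column makes the distinction explicit without a separate lookup.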
