### Steps to Correct Labeling Issues

1. **Identify and Correct Obvious Errors**: Correct labels that clearly contradict the command encoded in the filename.
2. **Automate the Process for Larger Datasets**: If the dataset is large, automate the correction process where possible.
3. **Verify the Correctness of Labels**: Cross-check corrected labels with known information or manually verify a subset of the data.

### Step 1: Identify and Correct Obvious Errors

Where the `command` column does not match the command encoded in the filename, you can overwrite it with one rule per known command:


In [2]:
import pandas as pd

# Load the annotations CSV
scene_annotations_df = pd.read_csv('../dataset/development_scene_annotations.csv')

# Print the first few rows to identify issues
print(scene_annotations_df.head())

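# Define a function to correct labels based on filenames.
# Caveat: the first matching pattern wins, so a filename that contains
# several commands (e.g. "..._Alarm_an_Ofen_aus") always gets one label.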
def correct_label(row):
    filename = row['filename']
    command = row['command']
    if "Ofen_aus" in filename and command != "Ofen aus":
        return "Ofen aus"
    elif "Radio_an" in filename and command != "Radio an":
        return "Radio an"
    elif "Alarm_an" in filename and command != "Alarm an":
        return "Alarm an"
    elif "Radio_aus" in filename and command != "Radio aus":
        return "Radio aus"
    elif "Fernseher_aus" in filename and command != "Fernseher aus":
        return "Fernseher aus"
    elif "Staubsauger_an" in filename and command != "Staubsauger an":
        return "Staubsauger an"
    elif "Staubsauger_aus" in filename and command != "Staubsauger aus":
        return "Staubsauger aus"
    # Add more conditions as necessary
    else:
        return command

# Apply the correction function
scene_annotations_df['command'] = scene_annotations_df.apply(correct_label, axis=1)

# Verify the corrections
print(scene_annotations_df.head())

                        filename         command     start       end
0         2_speech_true_Ofen_aus        Ofen aus  11.25230  12.07747
1         3_speech_true_Radio_an  Staubsauger an  21.48040  23.18083
2         4_speech_true_Alarm_an        Alarm an  14.45720  16.08301
3        9_speech_true_Radio_aus  Staubsauger an   3.67909   5.63126
4  11_speech_false_Fernseher_aus  Staubsauger an  10.57850  11.67886
                        filename        command     start       end
0         2_speech_true_Ofen_aus       Ofen aus  11.25230  12.07747
1         3_speech_true_Radio_an       Radio an  21.48040  23.18083
2         4_speech_true_Alarm_an       Alarm an  14.45720  16.08301
3        9_speech_true_Radio_aus      Radio aus   3.67909   5.63126
4  11_speech_false_Fernseher_aus  Fernseher aus  10.57850  11.67886
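Before overwriting anything, it can help to list the rows that actually disagree with their filename, so you can see how widespread the problem is. The following is a minimal sketch, not part of the original notebook: `find_mismatches` is a hypothetical helper, and it assumes that a command such as `Radio an` appears in the filename as `Radio_an`. The sample rows are copied from the uncorrected output above.

```python
import pandas as pd

# Sample rows copied from the uncorrected output (start/end columns omitted).
sample_df = pd.DataFrame({
    "filename": [
        "2_speech_true_Ofen_aus",
        "3_speech_true_Radio_an",
        "4_speech_true_Alarm_an",
        "9_speech_true_Radio_aus",
        "11_speech_false_Fernseher_aus",
    ],
    "command": [
        "Ofen aus",
        "Staubsauger an",
        "Alarm an",
        "Staubsauger an",
        "Staubsauger an",
    ],
})

def find_mismatches(df):
    """Return rows whose command (spaces -> underscores) is absent from the filename.

    Hypothetical helper; assumes 'Radio an' is encoded as 'Radio_an'.
    """
    tokens = df["command"].str.replace(" ", "_", regex=False)
    matches = pd.Series(
        [token in name for token, name in zip(tokens, df["filename"])],
        index=df.index,
    )
    return df[~matches]

mismatches = find_mismatches(sample_df)
print(mismatches)
```

Only the rows returned here need attention, which also gives you a list to review by hand if the dataset is small.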


### Step 2: Automate the Process for Larger Datasets

For a larger dataset, an `if`/`elif` chain becomes hard to maintain. A lookup table from filename pattern to command applies the same rule and is easier to extend:

In [3]:
# Define a mapping based on filename patterns
command_mapping = {
    'Ofen_aus': 'Ofen aus',
    'Radio_an': 'Radio an',
    'Alarm_an': 'Alarm an',
    'Radio_aus': 'Radio aus',
    'Fernseher_aus': 'Fernseher aus',
    'Staubsauger_an': 'Staubsauger an',
    'Staubsauger_aus': 'Staubsauger aus',
    # Add more mappings as necessary
}

# Function to automatically correct labels
def auto_correct_label(row):
    filename = row['filename']
    for key, value in command_mapping.items():
        if key in filename:
            return value
    return row['command']

# Apply the automatic correction
scene_annotations_df['command'] = scene_annotations_df.apply(auto_correct_label, axis=1)

# Verify the automatic corrections
print(scene_annotations_df.head())

                        filename        command     start       end
0         2_speech_true_Ofen_aus       Ofen aus  11.25230  12.07747
1         3_speech_true_Radio_an       Radio an  21.48040  23.18083
2         4_speech_true_Alarm_an       Alarm an  14.45720  16.08301
3        9_speech_true_Radio_aus      Radio aus   3.67909   5.63126
4  11_speech_false_Fernseher_aus  Fernseher aus  10.57850  11.67886
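One thing to watch with a substring lookup: a filename that contains several commands gets the label of whichever pattern matches first. A more cautious variant, sketched below, extracts every command from the filename and only overwrites the label when exactly one is present. This is an illustration under assumptions, not the original method: the filename layout `<id>_speech_<true|false>_<Device>_<an|aus>[_<Device>_<an|aus>...]` is inferred from the samples, `commands_in_filename`, `correct_single_command_rows` and the `needs_review` column are hypothetical names, and the wrong label in the last demo row is invented for the example.

```python
import re
import pandas as pd

# Assumed layout: <id>_speech_<true|false>_<Device>_<an|aus>[_<Device>_<an|aus>...]
COMMAND_PATTERN = re.compile(r"([^\W\d_]+)_(an|aus)(?=_|$)")

def commands_in_filename(filename):
    """Return every '<Device> <an|aus>' command encoded in the filename, in order."""
    return [f"{device} {state}" for device, state in COMMAND_PATTERN.findall(filename)]

def correct_single_command_rows(df):
    """Overwrite 'command' only where the filename encodes exactly one command.

    All other rows keep their label. They are flagged in a hypothetical
    'needs_review' column when that label is not among the filename's commands.
    """
    out = df.copy()
    found = out["filename"].map(commands_in_filename)
    single = found.map(len) == 1
    out.loc[single, "command"] = found[single].map(lambda cmds: cmds[0])
    out["needs_review"] = [
        (not is_single) and (cmd not in cmds)
        for is_single, cmd, cmds in zip(single, out["command"], found)
    ]
    return out

demo = pd.DataFrame({
    "filename": [
        "3_speech_true_Radio_an",
        "365_speech_false_Fernseher_an_Lüftung_aus",
        "1656_speech_false_Alarm_an_Ofen_aus",
    ],
    # The last label is deliberately wrong (invented for this example).
    "command": ["Staubsauger an", "Lüftung aus", "Radio an"],
})

result = correct_single_command_rows(demo)
print(result)
```

Rows flagged with `needs_review` cannot be resolved from the filename alone, so those are the ones worth checking against the audio or the `start`/`end` times.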


### Step 3: Verify the Correctness of Labels

Cross-check a subset of the corrected data manually to ensure the correctness of labels:


In [4]:
# Randomly sample a subset for manual verification
subset = scene_annotations_df.sample(10)
print(subset)

                                              filename         command   
667                          1390_speech_false_Ofen_an         Ofen an  \
218              421_speech_true_Fernseher_aus_Ofen_an   Fernseher aus   
577  1237_speech_false_Alarm_aus_Radio_an_Fernseher...        Radio an   
189          365_speech_false_Fernseher_an_Lüftung_aus     Lüftung aus   
841                1656_speech_false_Alarm_an_Ofen_aus        Ofen aus   
825           1628_speech_false_Licht_aus_Fernseher_an    Fernseher an   
532          1162_speech_false_Ofen_aus_Staubsauger_an        Ofen aus   
714                    1469_speech_true_Staubsauger_an  Staubsauger an   
948                       1797_speech_true_Heizung_aus     Heizung aus   
754                         1507_speech_true_Radio_aus       Radio aus   

        start        end  
667  14.39520  15.694770  
218   8.57969   9.905410  
577  12.85350  14.754020  
189  11.18770  12.464150  
841  16.76040  18.686600  
825   8.30332   9.05362
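A plain random sample can miss rare commands entirely and changes on every run. If you want each label represented and the sample to be repeatable, you could draw a fixed number of rows per command with a seed. This is a sketch only: `sample_per_command`, `n_per_class` and `seed` are hypothetical names, and the demo rows reuse filename/command pairs from the outputs above.

```python
import pandas as pd

def sample_per_command(df, n_per_class=2, seed=0):
    """Draw up to n_per_class rows for each command label, reproducibly."""
    parts = [
        group.sample(n=min(len(group), n_per_class), random_state=seed)
        for _, group in df.groupby("command")
    ]
    return pd.concat(parts).sort_index()

demo = pd.DataFrame({
    "filename": [
        "1390_speech_false_Ofen_an",
        "1469_speech_true_Staubsauger_an",
        "1797_speech_true_Heizung_aus",
        "1507_speech_true_Radio_aus",
        "9_speech_true_Radio_aus",
        "3_speech_true_Radio_an",
    ],
    "command": [
        "Ofen an",
        "Staubsauger an",
        "Heizung aus",
        "Radio aus",
        "Radio aus",
        "Radio an",
    ],
})

picked = sample_per_command(demo, n_per_class=1, seed=42)
print(picked)
```

Using the same `seed` returns the same rows each time, which makes it easier to share the subset with someone else for a second check.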

### Step 4: Re-Check Class Distribution

After correcting the labels, re-check the class distribution:


In [None]:
# Re-check the distribution of labels in the annotations CSV after correction
label_distribution_annotations = scene_annotations_df['command'].value_counts()
print("Revised Label Distribution in development_scene_annotations.csv:")
print(label_distribution_annotations)
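The distribution is easier to judge next to the counts from before the correction. A minimal sketch, assuming you kept a copy of the original `command` column before overwriting it: `compare_distributions` is a hypothetical helper, and the two demo series are the `command` values from the five-row outputs in Step 1.

```python
import pandas as pd

def compare_distributions(before, after):
    """Tabulate label counts before and after correction, side by side."""
    table = pd.concat(
        [before.value_counts(), after.value_counts()],
        axis=1,
        keys=["before", "after"],
    ).fillna(0).astype(int)
    table["change"] = table["after"] - table["before"]
    return table.sort_index()

# 'command' values from the first five rows, before and after correction.
before = pd.Series(
    ["Ofen aus", "Staubsauger an", "Alarm an", "Staubsauger an", "Staubsauger an"],
    name="command",
)
after = pd.Series(
    ["Ofen aus", "Radio an", "Alarm an", "Radio aus", "Fernseher aus"],
    name="command",
)

table = compare_distributions(before, after)
print(table)
```

On the full dataset, a large negative `change` for one label (here `Staubsauger an`) shows which class had absorbed the mislabeled rows.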

### Summary

1. **Identify and Correct Errors**: Manually and/or automatically correct obvious labeling issues.
2. **Automate Where Possible**: Use a mapping or pattern-matching approach to correct labels for larger datasets.
3. **Verify Labels**: Manually verify a subset of the corrected data.
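4. **Re-Check Distribution**: Confirm that the class distribution now reflects the corrected labels. Correcting labels makes the counts accurate, but does not by itself make the classes balanced.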

Implement these steps to correct the labeling issues in your dataset. Once the labels are corrected, re-evaluate the model performance. Let me know if you need further assistance!