<a href="https://colab.research.google.com/github/wilkiamieva/SSS_tools/blob/main/SSS_post_processing_v1_0_0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# SSS post-processing v1.0.0

**Warning:** Audio files should be pre-processed before SSS with a separate notebook (SSS pre-processing).

This notebook truncates the SSS output files to the length of the original files. Run it after SSS processing.

## Dependencies management

The Pandas, SoundFile and PyDub packages are installed. Then os, Pandas, SoundFile, PyDub's AudioSegment and Colab's Google Drive helper are imported.

In [1]:
# Install necessary dependencies
!pip install pandas soundfile pydub

# Import required libraries
import os
import pandas as pd
import soundfile as sf
from pydub import AudioSegment
from google.colab import drive

Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Installing collected packages: pydub
Successfully installed pydub-0.25.1


## Configuration

The notebook processes every audio file in the input folder that has a matching CSV file and saves the results in the output folder. Adjust the paths as needed. By default, Google Drive is mounted.

In this cell, specify the input and output folders. The code looks for CSV files (.csv) containing the required fields ('Original Samples' and 'Sample Rate') and for audio files in Wave (.wav), AIFF (.aiff or .aif), FLAC (.flac), MP3 (.mp3) or AAC (.aac) format whose names start with the CSV file's base name.

In [2]:
# Mount Google Drive; delete this line if not needed.
drive.mount('/content/drive')

# Configure input and output folders
input_folder = '/content/drive/MyDrive/SSS_tools/post_input'  # Replace with your input folder
output_folder = '/content/drive/MyDrive/SSS_tools/post_output'  # Replace with your output folder

# Ensure the output folder exists
os.makedirs(output_folder, exist_ok=True)

# Function to find CSV and corresponding audio files
def find_audio_files(input_folder):
    audio_files = []

    for file_name in os.listdir(input_folder):
        if file_name.endswith(".csv"):
            csv_path = os.path.join(input_folder, file_name)
            df = pd.read_csv(csv_path)

            # Check for required columns
            if 'Original Samples' in df.columns and 'Sample Rate' in df.columns:
                base_name = os.path.splitext(file_name)[0]

                # Find audio files that start with the CSV base name
                for audio_file in os.listdir(input_folder):
                    if audio_file.startswith(base_name) and audio_file.endswith((".wav", ".aiff", ".aif", ".flac", ".mp3", ".aac")):
                        audio_files.append((audio_file, csv_path))

    return audio_files

# Get the list of audio files and corresponding CSV files
audio_file_list = find_audio_files(input_folder)
print(f"Found audio files: {audio_file_list}")

Mounted at /content/drive
Found audio files: [('19. Pequeña Desviación En La Conducta De Los Reyes Magos_000.flac', '/content/drive/MyDrive/SSS_processing/pre_output/19. Pequeña Desviación En La Conducta De Los Reyes Magos_000.csv')]
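
As an illustration of the layout the cell above expects, the following sketch builds a throwaway folder with one CSV and a few audio placeholders, then applies the same prefix-and-extension matching. It uses only the standard library instead of Pandas so it runs outside Colab. The file names (track_000.csv and so on), the numeric values and the helper match_audio_to_csv are hypothetical and not part of the notebook.

```python
import csv
import os
import tempfile

REQUIRED_COLUMNS = {"Original Samples", "Sample Rate"}
AUDIO_EXTENSIONS = (".wav", ".aiff", ".aif", ".flac", ".mp3", ".aac")


def match_audio_to_csv(folder):
    """Return (audio_file, csv_path) pairs, mirroring find_audio_files."""
    matches = []
    for file_name in sorted(os.listdir(folder)):
        if not file_name.endswith(".csv"):
            continue
        csv_path = os.path.join(folder, file_name)
        with open(csv_path, newline="", encoding="utf-8") as handle:
            header = next(csv.reader(handle), [])
        # Skip CSV files that lack the required columns
        if not REQUIRED_COLUMNS.issubset(header):
            continue
        base_name = os.path.splitext(file_name)[0]
        for audio_file in sorted(os.listdir(folder)):
            if audio_file.startswith(base_name) and audio_file.endswith(AUDIO_EXTENSIONS):
                matches.append((audio_file, csv_path))
    return matches


with tempfile.TemporaryDirectory() as demo_folder:
    # Hypothetical CSV: one row with the original length and sample rate
    with open(os.path.join(demo_folder, "track_000.csv"), "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Original Samples", "Sample Rate"])
        writer.writerow([441000, 44100])

    # Empty placeholder files; only their names matter for the matching step
    for name in ("track_000.flac", "track_000_vocals.wav", "other.flac"):
        open(os.path.join(demo_folder, name), "w").close()

    demo_matches = [audio for audio, _ in match_audio_to_csv(demo_folder)]

print(demo_matches)
```

Because the match is by prefix, a CSV named track_000.csv pairs with every audio file whose name starts with track_000, including files that carry an extra suffix. A file such as other.flac, which has no CSV with a matching base name, is skipped.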


## Audio truncation

Defines a function that truncates an audio file to the original number of samples recorded in its CSV file.

In [3]:
def truncate_audio_file(audio_file, csv_path):
    # Read the CSV to get required values
    df = pd.read_csv(csv_path)
    num_samples = df['Original Samples'].iloc[0]

    # Determine the file path
    audio_path = os.path.join(input_folder, audio_file)

    # Process the audio files based on format
    if audio_file.endswith((".wav", ".flac", ".aiff", ".aif")):
        data, samplerate = sf.read(audio_path)
        truncated_data = data[:num_samples]
        output_path = os.path.join(output_folder, audio_file)
        sf.write(output_path, truncated_data, samplerate)

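    elif audio_file.endswith((".mp3", ".aac")):
        audio = AudioSegment.from_file(audio_path)
        # pydub slices in milliseconds, so convert the sample count using the file's own frame rate
        duration_ms = int(num_samples / audio.frame_rate * 1000)
        truncated_audio = audio[:duration_ms]
        # The export is FLAC, so give the output file a matching extension
        output_path = os.path.join(output_folder, os.path.splitext(audio_file)[0] + ".flac")
        truncated_audio.export(output_path, format="flac")  # Save as FLAC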
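
To make the truncation step concrete, here is a minimal sketch that performs the same operation on a synthetic WAV file using Python's standard wave module rather than SoundFile or PyDub. The file names, the 8000 Hz rate and the helpers truncate_wav and samples_to_ms are assumptions made for this illustration only.

```python
import os
import tempfile
import wave


def truncate_wav(src_path, dst_path, num_samples):
    """Copy src_path to dst_path, keeping only the first num_samples frames."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(num_samples)  # Reads at most num_samples frames
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(frames)  # The header frame count is updated on write


def samples_to_ms(num_samples, sample_rate):
    """Convert a sample count to whole milliseconds, the unit PyDub slices in."""
    return int(num_samples / sample_rate * 1000)


with tempfile.TemporaryDirectory() as demo_folder:
    padded_path = os.path.join(demo_folder, "padded.wav")
    trimmed_path = os.path.join(demo_folder, "trimmed.wav")

    # Hypothetical processed file: 12000 frames of 16-bit mono silence at 8000 Hz
    with wave.open(padded_path, "wb") as padded:
        padded.setnchannels(1)
        padded.setsampwidth(2)
        padded.setframerate(8000)
        padded.writeframes(b"\x00\x00" * 12000)

    # Pretend the CSV reported 10000 original samples
    truncate_wav(padded_path, trimmed_path, 10000)

    with wave.open(trimmed_path, "rb") as trimmed:
        trimmed_frames = trimmed.getnframes()
        trimmed_rate = trimmed.getframerate()

print(trimmed_frames, trimmed_rate)
print(samples_to_ms(10000, 8000))
```

Slicing by sample count is exact, whereas a millisecond slice can only land on a whole-millisecond boundary. The lossless branch and the MP3/AAC branch may therefore differ by a fraction of a millisecond of audio.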

## Sample rate checking

Defines a function that returns the sample rate of the processed file together with the sample rate recorded in the CSV file, so that the two can be compared.

In [4]:
def check_sample_rate(audio_file, csv_path):
    # Read the CSV to get the defined sample rate
    df = pd.read_csv(csv_path)
    defined_sample_rate = df['Sample Rate'].iloc[0]

    # Determine the file path
    audio_path = os.path.join(input_folder, audio_file)

    # Check the sample rate of the audio file
    if audio_file.endswith((".wav", ".flac", ".aiff", ".aif")):
        samplerate = sf.info(audio_path).samplerate  # Reads only the header, not the audio data
    else:
        audio = AudioSegment.from_file(audio_path)
        samplerate = audio.frame_rate

    return samplerate, defined_sample_rate
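
The check matters because 'Original Samples' is only meaningful at the sample rate recorded in the CSV file. The arithmetic sketch below uses hypothetical values and helper names that are not part of the notebook. It shows what truncation would keep if the processed file came back at a different rate, and one possible way to rescale the sample count. The notebook itself only reports the mismatch.

```python
def duration_seconds(num_samples, sample_rate):
    """Duration represented by num_samples at sample_rate."""
    return num_samples / sample_rate


def rescale_sample_count(num_samples, defined_rate, found_rate):
    """Sample count at found_rate that covers the same duration (hypothetical helper)."""
    return round(num_samples * found_rate / defined_rate)


# Hypothetical values: the CSV describes 10 s at 44100 Hz, the processed file is at 48000 Hz
original_samples = 441000
defined_rate = 44100
found_rate = 48000

# Truncating the 48000 Hz file to 441000 samples keeps less than the original 10 s
kept_seconds = duration_seconds(original_samples, found_rate)

# Sample count that would preserve the full 10 s at the new rate
rescaled_samples = rescale_sample_count(original_samples, defined_rate, found_rate)

print(kept_seconds, rescaled_samples)
```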

## Execution

Execution of the above functions requires the CSV files and corresponding audio files in the input folder.

Results are saved in the output folder. Wave, AIFF and FLAC files keep their original format; MP3 and AAC files are exported as FLAC.

In [5]:
# Iterate through the list of audio files and apply truncation and sample rate check
for audio_file, csv_path in audio_file_list:
    truncate_audio_file(audio_file, csv_path)

    # Check sample rate and print message if it doesn't match
    found_sample_rate, defined_sample_rate = check_sample_rate(audio_file, csv_path)
    if found_sample_rate != defined_sample_rate:
        print(f"Sample rate mismatch for {audio_file}: expected {defined_sample_rate}, found {found_sample_rate}")
    else:
        print(f"Processed {audio_file} successfully.")


Processed 19. Pequeña Desviación En La Conducta De Los Reyes Magos_000.flac successfully.


## Version history

v1.0.0 Initial release on 2024-09-21.

## To do

-

## Author information

Wilki Amieva

wilki@hgprods.com.ar