# Lesson 3: Error Handling and Cleanup in FFmpeg Transcription


Welcome back! Let's continue our path to implementing the Audio/Video Transcriber using the OpenAI Whisper API!

In this lesson, we will wrap up the main functionality by putting together:

- The media file split functionality (from the previous lesson)
- The Whisper API call on a small media chunk (from the previous course)

On top of that, we will make sure to:

- Properly handle all potential errors  
- Avoid leaving redundant garbage files on disk (to save disk space)

Ensuring robust error handling and cleanup is key to avoiding data loss and maintaining efficiency, even in unexpected scenarios.

> Let's step in to see how exciting this all is!

---

## Building the Transcription Process

Let's examine our main transcription function and understand how it handles errors and cleanup:

```python
def transcribe(file_path):
    """ Transcribe a large media file by splitting it into chunks """
    chunks = []
    try:
        # split_media is implemented in the previous lesson
        # it splits a large media file into smaller chunks
        chunks = split_media(file_path, 1)
        transcriptions = []
        
        for chunk in chunks:
            # transcribe_small_media uses OpenAI Whisper API
            # to transcribe chunks under 25MB
            text = transcribe_small_media(chunk)
            if text:
                transcriptions.append(text)
        return ' '.join(transcriptions)
    except Exception as e:
        print(f"Error processing large file: {e}")
        return None
    finally:
        # Clean up all chunks in the finally block
        for chunk in chunks:
            cleanup_temp_files(chunk)
```

---

### The function works in several steps:

1. Initialize an empty `chunks` list outside the `try` block to ensure it's accessible in the `finally` block.
2. Use `split_media` to split the large media file into smaller chunks.
3. For each chunk, use `transcribe_small_media` (which wraps the OpenAI Whisper API) to get the text transcription.
4. Join all transcriptions into a single text.
5. Clean up any temporary files in the `finally` block — even if an error occurs.

> Initializing `chunks` outside the `try` block ensures that any created chunks can be cleaned up properly — even if the process fails midway.
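
To see why the placement of `chunks = []` matters, here is a minimal, self-contained sketch. `fake_split` and `fake_transcribe` are hypothetical stand-ins for `split_media` and `transcribe_small_media`: they create real temporary files but never call FFmpeg or the Whisper API.

```python
import os
import tempfile


def fake_split(count):
    """Hypothetical stand-in for split_media: creates `count` empty temp files."""
    paths = []
    for _ in range(count):
        handle = tempfile.NamedTemporaryFile(delete=False, suffix=".chunk")
        handle.close()
        paths.append(handle.name)
    return paths


def fake_transcribe(path, fail):
    """Hypothetical stand-in for transcribe_small_media."""
    if fail:
        raise RuntimeError("Simulated API failure")
    return f"text from {os.path.basename(path)}"


def run(fail_on_second_chunk):
    chunks = []  # defined before `try`, so `finally` can always iterate over it
    try:
        chunks = fake_split(3)
        texts = []
        for index, chunk in enumerate(chunks):
            should_fail = fail_on_second_chunk and index == 1
            texts.append(fake_transcribe(chunk, should_fail))
        return texts, chunks
    except Exception as e:
        print(f"Error: {e}")
        return None, chunks
    finally:
        # Runs on success and on failure alike
        for chunk in chunks:
            if os.path.exists(chunk):
                os.unlink(chunk)


ok_texts, ok_chunks = run(fail_on_second_chunk=False)
failed_texts, failed_chunks = run(fail_on_second_chunk=True)
```

In both runs the three temporary files are gone afterwards, although the second run stops at the second chunk and returns `None` instead of text.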

---

## Cleanup Process Implementation

The cleanup process is handled by the `cleanup_temp_files` function:

```python
def cleanup_temp_files(file_path):
    """Clean up temporary files and directories"""
    try:
        if os.path.isfile(file_path):
            os.unlink(file_path)
        elif os.path.isdir(file_path):
            for root, dirs, files in os.walk(file_path, topdown=False):
                for name in files:
                    os.unlink(os.path.join(root, name))
                for name in dirs:
                    os.rmdir(os.path.join(root, name))
            os.rmdir(file_path)
    except Exception as e:
        print(f"Warning: Could not clean up {file_path}: {e}")
```

---

### Why is this cleanup process reliable?

- The `chunks` list is initialized before any operations.
- The `finally` block ensures cleanup happens regardless of success or failure.
- Each cleanup operation is wrapped in a `try-except` block.
- Both files and directories are handled systematically.
- Cleanup failures are logged — but they don't stop the cleanup of other files.
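
As a point of comparison, the same behavior can be sketched more compactly with the standard library's `shutil.rmtree`, which removes a whole directory tree in one call. This is an alternative to the lesson's `os.walk` loop, not the lesson's own implementation; the name `cleanup_path` and its boolean return value are assumptions made for illustration.

```python
import os
import shutil
import tempfile


def cleanup_path(path):
    """Remove a file or a directory tree; report failures instead of raising."""
    try:
        if os.path.isfile(path):
            os.unlink(path)
        elif os.path.isdir(path):
            shutil.rmtree(path)
        return True
    except OSError as e:
        print(f"Warning: Could not clean up {path}: {e}")
        return False


# Build a nested temp directory and a standalone temp file to clean up
temp_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(temp_dir, "nested"))
with open(os.path.join(temp_dir, "nested", "chunk.txt"), "w") as f:
    f.write("data")

handle = tempfile.NamedTemporaryFile(delete=False)
handle.close()
temp_file = handle.name

# A path that is neither a file nor a directory is silently skipped
missing = os.path.join(temp_dir, "does-not-exist")

results = [cleanup_path(p) for p in (temp_file, missing, temp_dir)]
```

Because each path is handled in its own call, a failure on one path would only produce a warning and the remaining paths would still be processed.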

---

## Lesson Summary

In this lesson, we've learned how to implement a robust error handling and cleanup system for our media file transcription process.

### Key Takeaways:

- Building a comprehensive transcription function that gracefully handles errors.
- Implementing systematic cleanup of temporary files and directories.
- Using `try-except-finally` blocks to guarantee cleanup.
- Proper initialization of variables to ensure accessibility in cleanup blocks.
- Logging errors without disrupting the cleanup process.

---

> These practices are fundamental for creating reliable applications that efficiently manage system resources and provide clear feedback when issues occur.

---

As you move to the practice section, you'll have the opportunity to implement these concepts in real-world scenarios.



## Observe the Large Media Files Transcribing Process

In this practice, let's see our transcription mechanism in action. There is a catch, though: the code calls OpenAI's Whisper API but intentionally raises simulated errors at random, roughly one call in three, much as real API calls occasionally fail. Take some time to go through the code, reviewing how these errors are handled and how the temporary media chunks still get deleted even when errors occur.

The goal is to provide a robust solution that efficiently handles these errors and ensures temporary files are cleaned up at the end of the transcription process.

Try it yourself! In the preview tab, select a longer media file, expand the terminal tab in the IDE, and click "Transcribe". Watch the logs to see how the transcription steps run with FFmpeg and the Whisper API and to make sense of the process.


```python
import math
import os
import subprocess
import tempfile
import random

from openai import OpenAI

client = OpenAI()


def run_command_with_output(cmd, desc=None):
    """Run a command and stream its output in real-time"""
    if desc:
        print(f"\n{desc}")
    
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True
    )
    
    for line in iter(process.stdout.readline, ''):
        print(line, end='')
    
    process.stdout.close()
    return_code = process.wait()

    if return_code != 0:
        raise subprocess.CalledProcessError(return_code, cmd)


def get_audio_duration(file_path):
    """Get the duration of an audio file using ffprobe"""
    cmd = [
        'ffprobe', 
        '-v', 'quiet',
        '-show_entries', 'format=duration',
        '-of', 'default=noprint_wrappers=1:nokey=1',
        file_path
    ]
    try:
        output = subprocess.check_output(cmd)
        return float(output)
    except Exception:
        return None


def split_media(file_path, chunk_size_mb=20):
    """Split media file into chunks smaller than the API limit"""
    duration = get_audio_duration(file_path)
    
    if not duration:
        raise Exception("Could not determine media duration")
    
    file_size = os.path.getsize(file_path)
    chunk_duration = duration * (chunk_size_mb * 1024 * 1024) / file_size
    num_chunks = math.ceil(duration / chunk_duration)
    
    chunks = []
    for i in range(num_chunks):
        start_time = i * chunk_duration
        temp_file = tempfile.NamedTemporaryFile(
            delete=False,
            suffix=os.path.splitext(file_path)[1]
        )
        
        cmd = [
            'ffmpeg',
            '-i', file_path,
            '-ss', str(start_time),
            '-t', str(chunk_duration),
            '-c', 'copy',
            '-y',
            temp_file.name
        ]
        
        run_command_with_output(
            cmd, 
            f"Extracting chunk {i + 1}/{num_chunks}"
        )
        chunks.append(temp_file.name)
    print(f"Split media into {len(chunks)} chunk(s): {chunks}")
    return chunks


def cleanup_temp_files(file_path):
    """Clean up temporary files and directories"""
    try:
        if os.path.isfile(file_path):
            os.unlink(file_path)
        elif os.path.isdir(file_path):
            for root, dirs, files in os.walk(file_path, topdown=False):
                for name in files:
                    os.unlink(os.path.join(root, name))
                for name in dirs:
                    os.rmdir(os.path.join(root, name))
            os.rmdir(file_path)
    except Exception as e:
        print(f"Warning: Could not clean up {file_path}: {e}")


def transcribe_small_media(file_path):
    """
    Transcribe a media file using OpenAI's Whisper API.
    Simulate occasional errors to mimic real-world API failures.
    """
    try:
        # Randomly throw an exception to simulate API failure
        if random.randint(1, 3) == 1: # 33% chance
            raise Exception("Simulated API failure")
        
        with open(file_path, 'rb') as media_file:
            transcript = client.audio.transcriptions.create(
                model="whisper-1",
                file=media_file,
                timeout=60
            )
            return transcript.text
    except Exception as e:
        raise Exception(f"Transcription failed: {str(e)}")


def transcribe(file_path):
    """ Transcribe a large media file by splitting it into chunks """
    chunks = []
    try:
        chunks = split_media(file_path)  # default chunk size: 20 MB
        transcriptions = []
        
        for chunk_id, chunk in enumerate(chunks):
            try:
                print(f"Transcribing chunk {chunk_id + 1}/{len(chunks)} via Whisper API...")
                text = transcribe_small_media(chunk)
            except Exception as e:
                print(e)
                text = None
            if text:
                transcriptions.append(text)
        return ' '.join(transcriptions)
    except Exception as e:
        print(f"Error processing large file: {e}")
        return None
    finally:
        # Clean up all chunks in finally block
        for chunk in chunks:
            cleanup_temp_files(chunk)


```
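
The nested `try`/`except` inside the loop is what lets one failed chunk be skipped while the rest are still transcribed. The sketch below isolates that idea. `flaky_transcribe` is a hypothetical stand-in that fails for chosen chunks instead of randomly, so the result is reproducible, and the `failed` list is an addition that the lesson code does not have.

```python
def flaky_transcribe(chunk, failing_chunks):
    """Hypothetical stand-in for transcribe_small_media."""
    if chunk in failing_chunks:
        raise Exception(f"Transcription failed: simulated API failure on {chunk}")
    return f"[{chunk}]"


def transcribe_all(chunks, failing_chunks):
    """Collect text from every chunk that succeeds; record the ones that fail."""
    transcriptions = []
    failed = []
    for chunk_id, chunk in enumerate(chunks):
        try:
            text = flaky_transcribe(chunk, failing_chunks)
        except Exception as e:
            # Log and move on: one bad chunk should not abort the whole file
            print(f"Chunk {chunk_id + 1}/{len(chunks)} skipped: {e}")
            failed.append(chunk)
            continue
        transcriptions.append(text)
    return " ".join(transcriptions), failed


text, failed = transcribe_all(["a", "b", "c"], failing_chunks={"b"})
```

Keeping track of the failed chunks makes gaps in the final transcript visible to the caller, rather than leaving them silent.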

## Improve Transcribing Mechanism for Shorter Media Files

In this practice, let's improve our transcribing mechanism to handle files more efficiently. Currently, even if we have a small file that doesn't require splitting, our code still processes it through FFmpeg. This is unnecessary overhead that we can optimize.

Your task is to modify the transcription process to handle single-chunk scenarios more efficiently. When the calculated number of chunks is 1, we should skip the FFmpeg splitting process entirely and proceed directly to transcription. This optimization will make our solution more efficient for smaller files while maintaining the chunk-based approach for larger ones.

Try it yourself! In the preview tab, select media files of different sizes, expand the terminal tab in the IDE, and click "Transcribe". Observe how the process differs between small files (direct transcription) and larger files (split into chunks, then transcribed).

Note: Make sure the cleanup never removes the original file when the media file doesn't require splitting with FFmpeg.

```python
import math
import os
import subprocess
import tempfile

from openai import OpenAI

client = OpenAI()


def run_command_with_output(cmd, desc=None):
    """Run a command and stream its output in real-time"""
    if desc:
        print(f"\n{desc}")
    
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True
    )
    
    for line in iter(process.stdout.readline, ''):
        print(line, end='')
    
    process.stdout.close()
    return_code = process.wait()

    if return_code != 0:
        raise subprocess.CalledProcessError(return_code, cmd)


def get_audio_duration(file_path):
    """Get the duration of an audio file using ffprobe"""
    cmd = [
        'ffprobe', 
        '-v', 'quiet',
        '-show_entries', 'format=duration',
        '-of', 'default=noprint_wrappers=1:nokey=1',
        file_path
    ]
    try:
        output = subprocess.check_output(cmd)
        return float(output)
    except Exception:
        return None


def split_media(file_path, chunk_size_mb=20):
    """Split media file into chunks smaller than the API limit"""
    duration = get_audio_duration(file_path)
    
    if not duration:
        raise Exception("Could not determine media duration")
    
    file_size = os.path.getsize(file_path)
    chunk_duration = duration * (chunk_size_mb * 1024 * 1024) / file_size
    num_chunks = math.ceil(duration / chunk_duration)
    
    chunks = []
    for i in range(num_chunks):
        start_time = i * chunk_duration
        temp_file = tempfile.NamedTemporaryFile(
            delete=False,
            suffix=os.path.splitext(file_path)[1]
        )
        
        cmd = [
            'ffmpeg',
            '-i', file_path,
            '-ss', str(start_time),
            '-t', str(chunk_duration),
            '-c', 'copy',
            '-y',
            temp_file.name
        ]
        
        run_command_with_output(
            cmd, 
            f"Extracting chunk {i + 1}/{num_chunks}"
        )
        chunks.append(temp_file.name)
    print(f"Split media into {len(chunks)} chunk(s): {chunks}")
    return chunks


def cleanup_temp_files(file_path):
    """Clean up temporary files and directories"""
    try:
        if os.path.isfile(file_path):
            os.unlink(file_path)
        elif os.path.isdir(file_path):
            for root, dirs, files in os.walk(file_path, topdown=False):
                for name in files:
                    os.unlink(os.path.join(root, name))
                for name in dirs:
                    os.rmdir(os.path.join(root, name))
            os.rmdir(file_path)
    except Exception as e:
        print(f"Warning: Could not clean up {file_path}: {e}")


def transcribe_small_media(file_path):
    """Transcribe a media file using OpenAI's Whisper API"""
    try:
        with open(file_path, 'rb') as media_file:
            transcript = client.audio.transcriptions.create(
                model="whisper-1",
                file=media_file,
                timeout=60
            )
            return transcript.text
    except Exception as e:
        raise Exception(f"Transcription failed: {str(e)}")


def transcribe(file_path):
    """ Transcribe a large media file by splitting it into chunks """
    chunks = []
    try:
        chunks = split_media(file_path)  # default chunk size: 20 MB
        transcriptions = []
        
        for chunk_id, chunk in enumerate(chunks):
            try:
                print(f"Transcribing chunk {chunk_id + 1}/{len(chunks)} via Whisper API...")
                text = transcribe_small_media(chunk)
            except Exception as e:
                print(e)
                text = None
            if text:
                transcriptions.append(text)
        return ' '.join(transcriptions)
    except Exception as e:
        print(f"Error processing file: {e}")
        return None
    finally:
        # Clean up all chunks in finally block
        for chunk in chunks:
            cleanup_temp_files(chunk)

```

To make the transcription process smarter for **shorter media files**, we modify the logic to **skip FFmpeg splitting if only one chunk is needed** and transcribe the original file directly. Here are the updated `split_media` and `transcribe` functions with the optimization:

---

## ✅ Updated `split_media()` (to detect single-chunk scenarios)

```python
def split_media(file_path, chunk_size_mb=20):
    """Split media file into chunks smaller than the API limit, unless small enough to skip"""
    duration = get_audio_duration(file_path)
    
    if not duration:
        raise Exception("Could not determine media duration")
    
    file_size = os.path.getsize(file_path)
    chunk_duration = duration * (chunk_size_mb * 1024 * 1024) / file_size
    num_chunks = math.ceil(duration / chunk_duration)

    # Optimization: If only one chunk is needed, skip splitting
    if num_chunks <= 1:
        print("Media file is small. Skipping splitting.")
        return [file_path]  # Return original path directly

    chunks = []
    for i in range(num_chunks):
        start_time = i * chunk_duration
        temp_file = tempfile.NamedTemporaryFile(
            delete=False,
            suffix=os.path.splitext(file_path)[1]
        )

        cmd = [
            'ffmpeg',
            '-i', file_path,
            '-ss', str(start_time),
            '-t', str(chunk_duration),
            '-c', 'copy',
            '-y',
            temp_file.name
        ]

        run_command_with_output(cmd, f"Extracting chunk {i + 1}/{num_chunks}")
        chunks.append(temp_file.name)

    print(f"Split media into {len(chunks)} chunk(s): {chunks}")
    return chunks
```

---

## ✅ Updated `transcribe()` (with smarter cleanup)

```python
def transcribe(file_path):
    """Transcribe a media file efficiently with error handling and cleanup"""
    chunks = []
    cleanup_needed = False  # Flag to avoid deleting original file

    try:
        chunks = split_media(file_path)
        cleanup_needed = not (len(chunks) == 1 and chunks[0] == file_path)

        transcriptions = []

        for chunk_id, chunk in enumerate(chunks):
            try:
                print(f"Transcribing chunk {chunk_id + 1}/{len(chunks)} via Whisper API...")
                text = transcribe_small_media(chunk)
            except Exception as e:
                print(e)
                text = None
            if text:
                transcriptions.append(text)

        return ' '.join(transcriptions)

    except Exception as e:
        print(f"Error processing file: {e}")
        return None

    finally:
        if cleanup_needed:
            for chunk in chunks:
                cleanup_temp_files(chunk)
```

---

## 🔍 What Changed?

- **In `split_media()`**:  
  - If only 1 chunk is needed, we skip FFmpeg and return the original file path directly.

- **In `transcribe()`**:  
  - We added a flag (`cleanup_needed`) to track whether the chunk was created by us.
  - If it's the original file, we don’t delete it!
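
The single-chunk check depends only on the arithmetic at the top of `split_media`, so it can be explored without touching any files. The sketch below mirrors that arithmetic; `estimate_num_chunks` is a hypothetical helper, and the file sizes and durations are made-up example values.

```python
import math


def estimate_num_chunks(file_size_bytes, duration_seconds, chunk_size_mb=20):
    """Mirror the chunk-count arithmetic in split_media without reading a file."""
    chunk_bytes = chunk_size_mb * 1024 * 1024
    # Seconds of media that fit into one chunk, assuming size grows with duration
    chunk_duration = duration_seconds * chunk_bytes / file_size_bytes
    return math.ceil(duration_seconds / chunk_duration)


# A 5 MB, 5-minute file fits into a single 20 MB chunk
small = estimate_num_chunks(5 * 1024 * 1024, duration_seconds=300)

# A 45 MB, 45-minute file needs three 20 MB chunks
large = estimate_num_chunks(45 * 1024 * 1024, duration_seconds=2700)
```

Note that the duration cancels out of the formula: the chunk count is effectively the file size divided by the chunk size, rounded up. The duration is still needed to compute where each chunk starts.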

---

## 🧪 Ready to Test?

In the IDE:

- Try a **small file** — should skip FFmpeg.
- Try a **large file** — should split and process in chunks.
- Check the **terminal** to see the adaptive behavior.
- Verify no original files are deleted in short media cases.


## Implement Error-Resistant Temporary Chunks Cleanup

Now, let's check how well you've learned the safe cleanup process. The given code transcribes correctly but never removes the temporary chunk files created along the way; they stay on disk and will eventually fill the storage. Implement a mechanism that cleans up these temporary chunks whether or not API errors occur: files should always be removed after the media file has been processed.

Note: While you can't directly see these temporary chunks in the application logs or on the preview tab, they are stored on the filesystem in the `/tmp` directory. You can inspect them by opening a new terminal tab and running a command such as `ls /tmp`.
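
If you prefer to check from Python, a small helper can list leftover files in the system temporary directory. This is a minimal sketch: `list_temp_files` is a hypothetical helper, and the `.demo-chunk` suffix is used only for the demonstration (in the lesson code, chunks carry the original file's extension).

```python
import os
import tempfile


def list_temp_files(suffix):
    """Return paths in the system temp directory whose names end with `suffix`."""
    temp_dir = tempfile.gettempdir()
    return sorted(
        os.path.join(temp_dir, name)
        for name in os.listdir(temp_dir)
        if name.endswith(suffix)
    )


def is_listed(path, suffix):
    """Check whether a file with the same name shows up in the listing."""
    name = os.path.basename(path)
    return any(os.path.basename(p) == name for p in list_temp_files(suffix))


# Demonstration: create a temp file the same way split_media does, then find it
handle = tempfile.NamedTemporaryFile(delete=False, suffix=".demo-chunk")
handle.close()

found_before = is_listed(handle.name, ".demo-chunk")
os.unlink(handle.name)
found_after = is_listed(handle.name, ".demo-chunk")
```

Running such a check before and after a transcription is one way to confirm that no chunk files are left behind.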

```python
import math
import os
import subprocess
import tempfile

from openai import OpenAI

client = OpenAI()


def run_command_with_output(cmd, desc=None):
    """Run a command and stream its output in real-time"""
    if desc:
        print(f"\n{desc}")
    
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True
    )
    
    for line in iter(process.stdout.readline, ''):
        print(line, end='')
    
    process.stdout.close()
    return_code = process.wait()

    if return_code != 0:
        raise subprocess.CalledProcessError(return_code, cmd)


def get_audio_duration(file_path):
    """Get the duration of an audio file using ffprobe"""
    cmd = [
        'ffprobe', 
        '-v', 'quiet',
        '-show_entries', 'format=duration',
        '-of', 'default=noprint_wrappers=1:nokey=1',
        file_path
    ]
    try:
        output = subprocess.check_output(cmd)
        return float(output)
    except Exception:
        return None


def split_media(file_path, chunk_size_mb=20):
    """Split media file into chunks smaller than the API limit"""
    duration = get_audio_duration(file_path)
    
    if not duration:
        raise Exception("Could not determine media duration")
    
    file_size = os.path.getsize(file_path)
    chunk_duration = duration * (chunk_size_mb * 1024 * 1024) / file_size
    num_chunks = math.ceil(duration / chunk_duration)

    # If file is small enough, return it as-is
    if num_chunks == 1:
        return [file_path]
    
    chunks = []
    for i in range(num_chunks):
        start_time = i * chunk_duration
        temp_file = tempfile.NamedTemporaryFile(
            delete=False,
            suffix=os.path.splitext(file_path)[1]
        )
        
        cmd = [
            'ffmpeg',
            '-i', file_path,
            '-ss', str(start_time),
            '-t', str(chunk_duration),
            '-c', 'copy',
            '-y',
            temp_file.name
        ]
        
        run_command_with_output(
            cmd, 
            f"Extracting chunk {i + 1}/{num_chunks}"
        )
        chunks.append(temp_file.name)
    print(f"Split media into {len(chunks)} chunk(s): {chunks}")
    return chunks


def transcribe_small_media(file_path):
    """Transcribe a media file using OpenAI's Whisper API"""
    try:
        with open(file_path, 'rb') as media_file:
            transcript = client.audio.transcriptions.create(
                model="whisper-1",
                file=media_file,
                timeout=60
            )
            return transcript.text
    except Exception as e:
        raise Exception(f"Transcription failed: {str(e)}")


def transcribe(file_path):
    """ Transcribe a large media file by splitting it into chunks """
    chunks = []
    try:
        chunks = split_media(file_path)  # default chunk size: 20 MB
        transcriptions = []
        
        for chunk_id, chunk in enumerate(chunks):
            try:
                print(f"Transcribing chunk {chunk_id + 1}/{len(chunks)} via Whisper API...")
                text = transcribe_small_media(chunk)
            except Exception as e:
                print(e)
                text = None
            if text:
                transcriptions.append(text)
        return ' '.join(transcriptions)
    except Exception as e:
        print(f"Error processing file: {e}")
        return None




```

To implement error-resistant cleanup of temporary chunks in the provided code, we need to ensure that the temporary chunk files are always removed, even if an error occurs during processing (e.g., API failures or issues in transcription). This can be achieved using a **`try...finally`** block to guarantee cleanup after the chunks are processed, whether the transcription succeeds or fails.

Here's how to modify the code:

1. **Ensure chunks are cleaned up** after the transcription process is complete, regardless of whether errors occur during transcription.
2. Use a `finally` block to delete the temporary chunk files after they have been processed.

Let's implement this:

```python
import math
import os
import subprocess
import tempfile
from openai import OpenAI

client = OpenAI()

def run_command_with_output(cmd, desc=None):
    """Run a command and stream its output in real-time"""
    if desc:
        print(f"\n{desc}")
    
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True
    )
    
    for line in iter(process.stdout.readline, ''):
        print(line, end='')
    
    process.stdout.close()
    return_code = process.wait()

    if return_code != 0:
        raise subprocess.CalledProcessError(return_code, cmd)


def get_audio_duration(file_path):
    """Get the duration of an audio file using ffprobe"""
    cmd = [
        'ffprobe', 
        '-v', 'quiet',
        '-show_entries', 'format=duration',
        '-of', 'default=noprint_wrappers=1:nokey=1',
        file_path
    ]
    try:
        output = subprocess.check_output(cmd)
        return float(output)
    except Exception:
        return None


def split_media(file_path, chunk_size_mb=20):
    """Split media file into chunks smaller than the API limit"""
    duration = get_audio_duration(file_path)
    
    if not duration:
        raise Exception("Could not determine media duration")
    
    file_size = os.path.getsize(file_path)
    chunk_duration = duration * (chunk_size_mb * 1024 * 1024) / file_size
    num_chunks = math.ceil(duration / chunk_duration)

    # If file is small enough, return it as-is
    if num_chunks == 1:
        return [file_path]
    
    chunks = []
    for i in range(num_chunks):
        start_time = i * chunk_duration
        temp_file = tempfile.NamedTemporaryFile(
            delete=False,
            suffix=os.path.splitext(file_path)[1]
        )
        
        cmd = [
            'ffmpeg',
            '-i', file_path,
            '-ss', str(start_time),
            '-t', str(chunk_duration),
            '-c', 'copy',
            '-y',
            temp_file.name
        ]
        
        run_command_with_output(
            cmd, 
            f"Extracting chunk {i + 1}/{num_chunks}"
        )
        chunks.append(temp_file.name)
    print(f"Split media into {len(chunks)} chunk(s): {chunks}")
    return chunks


def transcribe_small_media(file_path):
    """Transcribe a media file using OpenAI's Whisper API"""
    try:
        with open(file_path, 'rb') as media_file:
            transcript = client.audio.transcriptions.create(
                model="whisper-1",
                file=media_file,
                timeout=60
            )
            return transcript.text
    except Exception as e:
        raise Exception(f"Transcription failed: {str(e)}")


def transcribe(file_path):
    """ Transcribe a large media file by splitting it into chunks """
    chunks = []
    transcriptions = []
    
    try:
        chunks = split_media(file_path)  # default chunk size: 20 MB
        
        for chunk_id, chunk in enumerate(chunks):
            try:
                print(f"Transcribing chunk {chunk_id + 1}/{len(chunks)} via Whisper API...")
                text = transcribe_small_media(chunk)
                if text:
                    transcriptions.append(text)
            except Exception as e:
                print(f"Error in transcribing chunk {chunk_id + 1}: {e}")
                
        return ' '.join(transcriptions)
    
    except Exception as e:
        print(f"Error processing file: {e}")
        return None
    
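    finally:
        # Cleanup: delete temporary chunk files, but never the original input.
        # split_media returns [file_path] when no splitting was needed.
        for chunk in chunks:
            if chunk == file_path:
                continue
            try:
                os.remove(chunk)
                print(f"Deleted temporary chunk file: {chunk}")
            except Exception as cleanup_error:
                print(f"Error deleting temporary chunk file {chunk}: {cleanup_error}")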
```

### Explanation of Changes:

1. **`finally` block for cleanup**:
   - The `finally` block ensures that the chunk files are always cleaned up, even if there's an exception during the transcription process.
   - It attempts to delete each chunk file by calling `os.remove(chunk)` and prints a message confirming the deletion. If the deletion fails (e.g., due to permissions or file being in use), it catches and prints the cleanup error.

2. **Error handling during transcription**:
   - If an error occurs while transcribing a chunk, it catches the exception, prints the error message, and continues with the next chunk.
   - Even if transcription for one or more chunks fails, the remaining chunks will still be processed, and the cleanup will still occur.

This modification ensures that temporary chunk files are always cleaned up, preventing them from filling up the disk storage, regardless of whether transcription was successful for all chunks.
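
One detail deserves care here: for small files, `split_media` returns `[file_path]`, so the list of chunks can contain the original input. A cleanup loop should skip that path, as the note in the previous practice warned. The sketch below shows one way to do this; `remove_temp_chunks` and `make_temp_file` are hypothetical helpers, and temporary files stand in for real media.

```python
import os
import tempfile


def remove_temp_chunks(chunks, original_path):
    """Delete every chunk except the original input; return what was removed."""
    removed = []
    for chunk in chunks:
        if os.path.abspath(chunk) == os.path.abspath(original_path):
            continue  # split_media returned the input itself; leave it alone
        try:
            os.remove(chunk)
            removed.append(chunk)
        except OSError as e:
            print(f"Warning: could not delete {chunk}: {e}")
    return removed


def make_temp_file(suffix):
    """Create an empty temp file and return its path."""
    handle = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    handle.close()
    return handle.name


original = make_temp_file(".mp3")  # stands in for the user's media file
chunk_a = make_temp_file(".mp3")
chunk_b = make_temp_file(".mp3")

# Small file: the only "chunk" is the original, so nothing is deleted
removed_small = remove_temp_chunks([original], original)

# Large file: both temporary chunks are deleted
removed_large = remove_temp_chunks([chunk_a, chunk_b], original)

original_survived = os.path.exists(original)
os.unlink(original)  # tidy up the demonstration file itself
```

Comparing absolute paths guards against the same file being referred to by a relative path in one place and an absolute path in another.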

## Implement the Full Transcription Method

In this final practice, let's check how well you've mastered today's lesson. All the pieces are provided as helper methods: splitting the media file into chunks, transcribing a single chunk with the Whisper API, and cleaning up the chunks produced during splitting. The last piece is still unimplemented: the `transcribe` method itself, which uses all these helpers to process a potentially large media file.

Let's go! Onward and upward!


```python
import math
import os
import subprocess
import tempfile

from openai import OpenAI

client = OpenAI()


def run_command_with_output(cmd, desc=None):
    """Run a command and stream its output in real-time"""
    if desc:
        print(f"\n{desc}")
    
    process = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True
    )
    
    for line in iter(process.stdout.readline, ''):
        print(line, end='')
    
    process.stdout.close()
    return_code = process.wait()

    if return_code != 0:
        raise subprocess.CalledProcessError(return_code, cmd)


def get_audio_duration(file_path):
    """Get the duration of an audio file using ffprobe"""
    cmd = [
        'ffprobe', 
        '-v', 'quiet',
        '-show_entries', 'format=duration',
        '-of', 'default=noprint_wrappers=1:nokey=1',
        file_path
    ]
    try:
        output = subprocess.check_output(cmd)
        return float(output)
    except Exception:
        return None


def split_media(file_path, chunk_size_mb=20):
    """Split media file into chunks smaller than the API limit"""
    duration = get_audio_duration(file_path)
    
    if not duration:
        raise Exception("Could not determine media duration")
    
    file_size = os.path.getsize(file_path)
    chunk_duration = duration * (chunk_size_mb * 1024 * 1024) / file_size
    num_chunks = math.ceil(duration / chunk_duration)
    
    # If file is small enough, return it as-is
    if num_chunks == 1:
        return [file_path]
    
    chunks = []
    for i in range(num_chunks):
        start_time = i * chunk_duration
        temp_file = tempfile.NamedTemporaryFile(
            delete=False,
            suffix=os.path.splitext(file_path)[1]
        )
        
        cmd = [
            'ffmpeg',
            '-i', file_path,
            '-ss', str(start_time),
            '-t', str(chunk_duration),
            '-c', 'copy',
            '-y',
            temp_file.name
        ]
        
        run_command_with_output(
            cmd, 
            f"Extracting chunk {i + 1}/{num_chunks}"
        )
        chunks.append(temp_file.name)
    print(f"Split media into {len(chunks)} chunk(s): {chunks}")
    return chunks


def cleanup_temp_files(file_path, is_original=False):
    """Clean up temporary files and directories"""
    if is_original:
        return
        
    try:
        if os.path.isfile(file_path):
            os.unlink(file_path)
        elif os.path.isdir(file_path):
            for root, dirs, files in os.walk(file_path, topdown=False):
                for name in files:
                    os.unlink(os.path.join(root, name))
                for name in dirs:
                    os.rmdir(os.path.join(root, name))
            os.rmdir(file_path)
    except Exception as e:
        print(f"Warning: Could not clean up {file_path}: {e}")


def transcribe_small_media(file_path):
    """Transcribe a media file using OpenAI's Whisper API"""
    try:
        with open(file_path, 'rb') as media_file:
            transcript = client.audio.transcriptions.create(
                model="whisper-1",
                file=media_file,
                timeout=60
            )
            return transcript.text
    except Exception as e:
        raise Exception(f"Transcription failed: {str(e)}")


def transcribe(file_path):
    # TODO: Implement transcribe method for a potentially large media file
    return None

```