# Fan On/Off Detection using CLAP

Let's put some of the things we learned into practice! In this notebook we are going to use CLAP zero-shot classification to detect when a fan is on and when it is off. Let's get into it!

In [None]:
import librosa
import numpy as np
import matplotlib.pyplot as plt
from transformers import pipeline
from IPython.display import Audio

# Only needed on the Udacity workspace. Comment this out if running on another system.
import os
os.environ['HF_HOME'] = '/voc/data/huggingface'
os.environ['OLLAMA_MODELS'] = '/voc/data/ollama/cache'
os.environ['HF_HUB_OFFLINE'] = '1'
os.environ['PATH'] = f"/voc/data/ollama/bin:/voc/data/ffmpeg/bin:{os.environ.get('PATH', '')}"
os.environ['LD_LIBRARY_PATH'] = f"/voc/data/ollama/lib:/voc/data/ffmpeg/lib:{os.environ.get('LD_LIBRARY_PATH', '')}"

## Load Audio File

In [None]:
# Load the MP3 file
audio_path = "fan.mp3"

# TODO: load the audio file with librosa. Use a sample rate of 48000 Hz
# HINT: use librosa.load, and set sr=48000
audio, sr = ... # complete

print(f"Audio duration: {len(audio)/sr:.2f} seconds")
print(f"Sample rate: {sr} Hz")
Audio(
    data=audio,
    rate=sr
)

## Setup CLAP Pipeline

In [None]:
# Initialize zero-shot audio classification pipeline
# TODO: setup the zero-shot-audio-classification pipeline.
# Use the "laion/larger_clap_general" model
# HINT: use pipeline(). Provide the pipeline kind (zero-shot-audio-classification)
# and the model (laion/larger_clap_general)
pipe = ... # complete

# TODO: Define candidate labels for the states of the fan (on and off)
# You can use whatever you want, but in general it's better to be specific
# and short.
# You need one label for "silence or background noise" and one for "the sound of a fan"
candidate_labels = ... #complete

# TODO: whatever you chose as your label for the ON state in the previous line, set it here
# (for example, "the sound of a fan")
POSITIVE_LABEL = ... #complete

## Split Audio into Chunks

Here we divide the audio into 1-second chunks, each overlapping the previous one by 25%:

In [None]:
# Parameters
chunk_duration = 1.0  # seconds
overlap = 0.25  # 25% overlap between consecutive chunks

# Convert to samples
chunk_samples = int(chunk_duration * sr)
hop_samples = int(chunk_samples * (1 - overlap))

# Create chunks and keep track of their start times
chunks = []
timestamps = []

for start in range(0, len(audio) - chunk_samples + 1, hop_samples):
    end = start + chunk_samples
    chunk = audio[start:end]
    timestamp = start / sr
    
    chunks.append(chunk)
    timestamps.append(timestamp)

print(f"Created {len(chunks)} chunks of {chunk_duration}s each")
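
As a quick sanity check on the chunking logic above, the number of full-length chunks should equal `1 + (n_samples - chunk_samples) // hop_samples`. The sketch below verifies this on a hypothetical 10-second silent signal; the helper `chunk_starts` and all `*_demo` names are illustrative and not part of the notebook.

```python
import numpy as np

def chunk_starts(n_samples, chunk_samples, hop_samples):
    """Return the start index of every full-length chunk."""
    return list(range(0, n_samples - chunk_samples + 1, hop_samples))

# Hypothetical 10-second signal at 48 kHz (silence; only its length matters here)
sr_demo = 48000
audio_demo = np.zeros(10 * sr_demo, dtype=np.float32)

chunk_samples_demo = int(1.0 * sr_demo)                   # 1-second chunks
hop_samples_demo = int(chunk_samples_demo * (1 - 0.25))   # 25% overlap -> 0.75 s hop

starts_demo = chunk_starts(len(audio_demo), chunk_samples_demo, hop_samples_demo)
expected_demo = 1 + (len(audio_demo) - chunk_samples_demo) // hop_samples_demo

print(f"{len(starts_demo)} chunks, expected {expected_demo}")
print("First start times (s):", [s / sr_demo for s in starts_demo[:3]])
```

Note that any trailing audio shorter than one full chunk is dropped by this scheme, which is usually acceptable for a recording much longer than the chunk duration.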

Let's listen to one to make sure we did not do anything wrong:

In [None]:
Audio(
    data=chunks[2],
    rate=sr
)

## Classify Each Chunk

Now we classify each chunk to understand whether the fan is on or off:

In [None]:
# Classify each chunk
from tqdm import tqdm


fan_scores = []

for i, chunk in tqdm(enumerate(chunks), total=len(chunks)):

    # TODO: use the pipeline we defined above to classify the chunk
    # HINT: provide the chunk, and then set candidate_labels to (well)
    # candidate_labels
    result = ... #complete

    # From the result we extract the probability of the POSITIVE_LABEL
    prob_on = [x["score"] for x in result if x["label"] == POSITIVE_LABEL][0]

    # Let's append it to our list
    fan_scores.append(prob_on)

print("Classification complete")
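
The extraction step in the loop relies on the result being a list of dictionaries, each with a `"label"` and a `"score"` key. The sketch below illustrates this on a made-up result (the scores are invented, and the labels are just the examples suggested earlier), using a hypothetical helper `score_for` that fails with a clear message if the label is missing, for instance because `POSITIVE_LABEL` does not exactly match one of the candidate labels.

```python
# Hypothetical pipeline output for one chunk: one dict per candidate label.
# The scores are invented for illustration only.
mock_result = [
    {"score": 0.91, "label": "the sound of a fan"},
    {"score": 0.09, "label": "silence or background noise"},
]

def score_for(result, label):
    """Return the score assigned to `label`, or raise KeyError if it is absent."""
    scores = {item["label"]: item["score"] for item in result}
    if label not in scores:
        raise KeyError(f"Label {label!r} not found among {sorted(scores)}")
    return scores[label]

demo_prob_on = score_for(mock_result, "the sound of a fan")
print(f"Probability that the fan is on: {demo_prob_on}")
```

Compared with indexing `[0]` on a filtered list, this variant turns a typo in the label into an explicit error instead of an `IndexError`.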

## Detect State Changes

Here we detect the transitions (ON to OFF or OFF to ON) and the times at which they occur. The idea is simple: a chunk is labeled ON when the probability of the positive class is above 0.5, and OFF otherwise. Whenever the label differs from that of the previous chunk, the fan has changed state, and we record the start time of the chunk in which the change is first seen.

In [None]:
# Convert fan scores to binary states (fan on/off)
threshold = 0.5
fan_states = [score > threshold for score in fan_scores]

# Find state change points
state_changes = []
for i in range(1, len(fan_states)):
    if fan_states[i] != fan_states[i-1]:
        timestamp = timestamps[i]
        new_state = "ON" if fan_states[i] else "OFF"
        state_changes.append((timestamp, new_state))

print("\nDetected state changes:")
for timestamp, state in state_changes:
    print(f"Time: {timestamp:.2f}s - Fan: {state}")
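
The same transition logic can also be written without an explicit loop. The following is a minimal NumPy sketch on made-up scores, assuming a 0.75 s hop between chunks; all `demo_*` names and values are hypothetical.

```python
import numpy as np

# Made-up scores for 8 consecutive chunks, spaced 0.75 s apart
demo_timestamps = np.arange(8) * 0.75
demo_scores = np.array([0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.2, 0.6])

# Threshold the scores into boolean ON/OFF states
demo_states = demo_scores > 0.5

# Indices where the state differs from the previous chunk
change_idx = np.flatnonzero(demo_states[1:] != demo_states[:-1]) + 1

demo_changes = [
    (float(demo_timestamps[i]), "ON" if demo_states[i] else "OFF")
    for i in change_idx
]
print(demo_changes)
```

With these scores the fan turns on at 1.5 s, off at 3.75 s, and on again at 5.25 s, which matches what the loop above would report for the same input.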

## Visualize Results

Let's put everything together in a nice plot:

In [None]:
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(12, 10), sharex=True)

# Waveform
time_axis = np.arange(len(audio)) / sr  # time of each sample, in seconds
ax1.plot(time_axis, audio)
ax1.set_ylabel('Amplitude')
ax1.set_title('Audio Waveform')

# Fan scores
ax2.plot(timestamps, fan_scores, 'b-', linewidth=2)
ax2.axhline(y=threshold, color='r', linestyle='--', label=f'Threshold ({threshold})')
ax2.set_ylabel('Fan Score')
ax2.set_title('Fan Detection Score')
ax2.legend()

# Binary states
ax3.plot(timestamps, fan_states, 'g-', linewidth=2)
ax3.set_ylabel('Fan State')
ax3.set_title('Fan On/Off State')
ax3.set_ylim(-0.1, 1.1)

# Mel spectrogram
mel_spec = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=128, fmax=8000)
mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max)
# Set extent to match the time axis of other plots
extent = [0, len(audio)/sr, 0, 128]
img = ax4.imshow(mel_spec_db, aspect='auto', origin='lower', extent=extent, cmap='viridis')
ax4.set_ylabel('Mel Frequency Bins')
ax4.set_xlabel('Time (seconds)')
ax4.set_title('Mel Spectrogram')

# Mark state changes
for timestamp, state in state_changes:
    color = 'green' if state == 'ON' else 'red'
    for ax in [ax1, ax2, ax3, ax4]:
        ax.axvline(x=timestamp, color=color, linestyle=':', alpha=0.7)

plt.tight_layout()
plt.show()

## Summary

In [None]:
# Print summary
total_on_time = sum(fan_states) * chunk_duration * (1 - overlap)
total_duration = len(audio) / sr

print("\nSummary:")
print(f"Total audio duration: {total_duration:.2f} seconds")
print(f"Estimated fan on time: {total_on_time:.2f} seconds")
print(f"Fan duty cycle: {total_on_time/total_duration*100:.1f}%")
print(f"Number of state changes: {len(state_changes)}")
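
The on-time estimate multiplies the number of ON chunks by the hop duration rather than by the chunk duration, so that the overlapping part of each chunk is not counted twice. The sketch below works through the arithmetic on made-up states; all `demo_*` names and values are hypothetical, and the result is only an approximation because each chunk gets a single label.

```python
import numpy as np

# Made-up per-chunk states (True = fan ON), one entry per hop
demo_states = np.array([False, False, True, True, True, False, False, True])
demo_chunk_duration = 1.0                          # seconds
demo_hop = demo_chunk_duration * (1 - 0.25)        # 0.75 s of new audio per chunk

# Each ON chunk contributes one hop of new audio
demo_on_time = float(demo_states.sum() * demo_hop)

# The last chunk starts at (N - 1) * hop and lasts one full chunk
demo_covered = (len(demo_states) - 1) * demo_hop + demo_chunk_duration

demo_duty = demo_on_time / demo_covered
print(f"On time: {demo_on_time:.2f} s of {demo_covered:.2f} s ({demo_duty * 100:.1f}%)")
```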

This is not the only way to achieve this result, and probably not the most efficient one. However, by exploiting the zero-shot capabilities of CLAP we were able to do it very quickly, without any training or optimization!

Note, however, that in a real-world scenario we would need to test this solution thoroughly before deploying it in the field.