# Week 04: User Input and Audio

This notebook covers the key concepts and practical applications for Week 4, including:
- Streamlit user input handling (chat inputs, text inputs, buttons)
- Audio recording and playback
- Text-to-speech (TTS) and voice cloning
- Real-time audio processing with PyAudio
- Interactive voice applications

## Learning Objectives
By the end of this notebook, you will be able to:
1. Create interactive Streamlit applications with various user input methods
2. Record and play audio in web applications
3. Implement text-to-speech functionality
4. Work with real-time audio streams
5. Build voice-enabled chatbots and interactive applications

## 1. Environment Setup and Dependencies

First, let's install the required libraries for this week's examples. Note that some examples require external services or specific hardware configurations.

In [None]:
# Install required packages
!pip install streamlit pandas numpy streamlit-audiorec torch pyaudio matplotlib

# For TTS functionality (optional, requires external API)
# !pip install coqui-tts requests

In [None]:
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import uuid
import time
from pathlib import Path

# Audio-related imports (may require additional setup)
try:
    import pyaudio
    PYAUDIO_AVAILABLE = True
except ImportError:
    print("PyAudio not available. Audio examples will be limited.")
    PYAUDIO_AVAILABLE = False

try:
    from st_audiorec import st_audiorec
    AUDIOREC_AVAILABLE = True
except ImportError:
    print("streamlit-audiorec not available. Recording examples will be limited.")
    AUDIOREC_AVAILABLE = False

print("Environment setup complete!")

## 2. Streamlit User Input Fundamentals

Streamlit provides various ways to capture user input. Let's explore the most common methods used in interactive applications.

### 2.1 Basic Chat Input

The simplest form of user input in Streamlit is the chat input widget. Here's how it works:

In [None]:
# Example 1: Basic User Input (from 1_user_input.py)
'''
import streamlit as st

if prompt := st.chat_input():
    st.chat_message("user").write(prompt)
'''

print("This example shows the simplest chat input.")
print("When run in Streamlit, it displays an input field at the bottom of the page.")
print("When the user types and presses Enter, the message is displayed as a chat bubble.")

### 2.2 Chat Input with Message History

For more sophisticated applications, we want to maintain conversation history:

In [None]:
# Example 2: User Input with History (from 2_user_input_with_history.py)
'''
import streamlit as st

if "messages" not in st.session_state:
    st.session_state["messages"] = [{"role": "assistant", "content": "How can I help you?"}]

for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input():
    st.session_state.messages.append({"role": "user", "content": prompt})
    st.chat_message("user").write(prompt)
'''

print("This example demonstrates:")
print("1. Using st.session_state to maintain conversation history")
print("2. Redisplaying all previous messages each time the script reruns")
print("3. Adding new messages to the conversation history")
print("4. Using message roles ('user' and 'assistant') for different styling")

### 2.3 Other Input Methods

Streamlit offers many other input widgets for different use cases:

In [None]:
# Demonstration of various Streamlit input methods
print("Common Streamlit input widgets:")
print("")

# Text inputs
print("1. Text Inputs:")
print('   st.text_input("Enter text")  # Single line text')
print('   st.text_area("Enter text")   # Multi-line text')
print('   st.chat_input()             # Chat-style input')
print("")

# Buttons and selections
print("2. Buttons and Selections:")
print('   st.button("Click me")       # Basic button')
print('   st.selectbox("Choose", options)  # Dropdown')
print('   st.multiselect("Choose multiple", options)  # Multi-select')
print('   st.radio("Pick one", options)  # Radio buttons')
print('   st.checkbox("Check me")     # Checkbox')
print("")

# Numeric inputs
print("3. Numeric Inputs:")
print('   st.number_input("Number")   # Number input')
print('   st.slider("Value", 0, 100)  # Slider')
print("")

# File and media inputs
print("4. File and Media Inputs:")
print('   st.file_uploader("Upload")  # File upload')
print('   st.camera_input("Photo")    # Camera input')
print('   st_audiorec()              # Audio recording (requires streamlit-audiorec)')

## 3. Audio Recording and Playback

Working with audio in web applications involves recording, processing, and playing back audio data.

### 3.1 Audio Recording with Streamlit

The `st_audiorec` component allows users to record audio directly in the browser:

In [None]:
# Audio recording demonstration
print("Audio Recording with streamlit-audiorec:")
print("")

if AUDIOREC_AVAILABLE:
    print("✓ streamlit-audiorec is available")
    print("")
    print("Basic usage:")
    print('from st_audiorec import st_audiorec')
    print('recording = st_audiorec()')
    print('if recording:')
    print('    # recording contains the audio data as bytes')
    print('    with open("recorded_audio.wav", "wb") as f:')
    print('        f.write(recording)')
else:
    print("⚠ streamlit-audiorec not available")
    print("Install with: pip install streamlit-audiorec")

print("")
print("Features of st_audiorec:")
print("- Browser-based recording (no external software needed)")
print("- Returns audio data as bytes")
print("- Output is WAV data, playable with st.audio(recording, format='audio/wav')")
print("- Works on most modern browsers")

### 3.2 Audio Playback in Streamlit

Streamlit provides built-in audio playback capabilities:

In [None]:
# Audio playback demonstration
print("Audio Playback in Streamlit:")
print("")
print("Basic usage:")
print('st.audio(audio_file, format="audio/wav")')
print("")
print("Supported inputs:")
print("- File path (string): st.audio('path/to/audio.wav')")
print("- Bytes data: st.audio(audio_bytes)")
print("- NumPy array: st.audio(audio_array, sample_rate=44100)")
print("")
print("Supported formats:")
print("- WAV: format='audio/wav'")
print("- MP3: format='audio/mpeg'")
print("- OGG: format='audio/ogg'")
print("")

# Create a simple sine wave for demonstration
sample_rate = 44100
duration = 2  # seconds
frequency = 440  # A4 note

t = np.linspace(0, duration, int(sample_rate * duration), False)
sine_wave = 0.3 * np.sin(2 * np.pi * frequency * t)

print(f"Generated a {duration}s sine wave at {frequency}Hz")
print(f"Audio array shape: {sine_wave.shape}")
print(f"Sample rate: {sample_rate}Hz")
print("")
print("In Streamlit, you would play this with:")
print(f"st.audio(sine_wave, sample_rate={sample_rate})")
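
When raw bytes are more convenient than a NumPy array (for example, to save the tone to disk or send it to an API), the same tone can be encoded as 16-bit PCM WAV in memory. The following is a minimal sketch using only the standard library. The helper name `sine_wav_bytes` is hypothetical and not part of the course files.

```python
import io
import math
import struct
import wave


def sine_wav_bytes(frequency=440.0, duration=2.0, sample_rate=44100, amplitude=0.3):
    """Return a mono 16-bit PCM WAV file (as bytes) containing a sine tone."""
    n_samples = int(sample_rate * duration)
    # Scale the float amplitude (0.0 to 1.0) to the signed 16-bit range.
    peak = int(amplitude * 32767)
    samples = (
        int(peak * math.sin(2 * math.pi * frequency * i / sample_rate))
        for i in range(n_samples)
    )
    # "<h" packs each sample as a little-endian signed 16-bit integer.
    frames = b"".join(struct.pack("<h", s) for s in samples)

    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(frames)
    return buffer.getvalue()
```

In a Streamlit script the result could be played with `st.audio(sine_wav_bytes(), format="audio/wav")`.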

## 4. Text-to-Speech and Voice Cloning

The repository includes examples of text-to-speech (TTS) functionality and voice cloning using external APIs.

### 4.1 Basic Text-to-Speech

The `tts.py` example shows how to convert text to speech using an external API:

In [None]:
# Basic TTS example (from week05/tts.py)
print("Basic Text-to-Speech Example:")
print("")
print("Code structure:")
print('import streamlit as st')
print('import uuid')
print('import requests')
print("")
print('if text := st.text_input("Enter text to convert to speech", "Hello, how are you?"):')
print('    tmp_file = f"samples/tmp{uuid.uuid1()}.wav"')
print('    response = requests.post(')
print('        "http://localhost:8000/tts",')
print('        params={"text": text},')
print('        stream=True,')
print('    )')
print('    ')
print('    with open(tmp_file, "wb") as f:')
print('        for chunk in response.iter_content(chunk_size=1024):')
print('            f.write(chunk)')
print('    ')
print('    st.audio(tmp_file, format="audio/wav")')
print("")
print("Key concepts:")
print("1. User enters text via st.text_input()")
print("2. Text is sent to TTS API endpoint")
print("3. Audio response is saved to temporary file")
print("4. Audio is played back using st.audio()")
print("5. UUID ensures unique filenames for concurrent users")
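
The file-writing step above assumes that the `samples` directory already exists. The sketch below creates the directory on demand and writes the streamed chunks to a uniquely named file. `save_audio_chunks` is a hypothetical helper, and it uses `uuid.uuid4()` (random) rather than the `uuid.uuid1()` call in the original example; either gives distinct filenames.

```python
import uuid
from pathlib import Path


def save_audio_chunks(chunks, directory="samples"):
    """Write an iterable of byte chunks to a uniquely named .wav file."""
    out_dir = Path(directory)
    # Create the output directory if needed so open() does not fail.
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"tmp-{uuid.uuid4()}.wav"
    with open(out_path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
    return out_path
```

With the request from the example, this would be called as `save_audio_chunks(response.iter_content(chunk_size=1024))`, and the returned path converted with `str(...)` before passing it to `st.audio`.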

### 4.2 Voice Cloning with Audio Input

The `tts_wav.py` example demonstrates voice cloning by combining text input with voice recording:

In [None]:
# Voice cloning example (from week04/tts_wav.py)
print("Voice Cloning Example:")
print("")
print("Code structure:")
print('import streamlit as st')
print('from st_audiorec import st_audiorec')
print('import torch')
print('import uuid')
print('import requests')
print("")
print('device = "cuda" if torch.cuda.is_available() else "cpu"')
print("")
print('"Record your voice to clone it"')
print('recording = st_audiorec()')
print("")
print('"Synthesize voice"')
print('if text := st.text_input("Enter text to convert to speech"):')
print('    options = {"text": text, "language": "en"}')
print('    ')
print('    if recording:')
print('        voice_file = f"samples/voice-{uuid.uuid1()}.wav"')
print('        with open(voice_file, "wb") as f:')
print('            f.write(recording)')
print('        options["speaker_wav"] = voice_file')
print('    ')
print('    response = requests.post(')
print('        "http://localhost:8000/generate_audio",')
print('        json=options,')
print('    )')
print('    ')
print('    st.audio(response.json().get("file_path"), format="audio/wav")')
print("")
print("Voice cloning workflow:")
print("1. User records their voice using st_audiorec()")
print("2. User enters text to be synthesized")
print("3. Both audio sample and text are sent to the API")
print("4. API clones the voice and generates speech")
print("5. Result is played back to the user")

## 5. Real-time Audio Processing with PyAudio

PyAudio enables real-time audio processing, including recording, playback, and audio effects.

### 5.1 Audio Loopback Example

The simplest PyAudio example is an audio loopback that captures input and immediately plays it back:

In [None]:
# PyAudio loopback example (from week06/4_pyaudio_loopback.py)
print("PyAudio Loopback Example:")
print("")
if PYAUDIO_AVAILABLE:
    print("✓ PyAudio is available")
else:
    print("⚠ PyAudio not available")
    print("Install with: pip install pyaudio")
    print("Note: May require additional system dependencies")

print("")
print("Code structure:")
print('import pyaudio')
print("")
print('# Audio parameters')
print('CHUNK = 256        # Buffer size')
print('FORMAT = pyaudio.paInt16  # 16-bit audio')
print('CHANNELS = 1       # Mono audio')
print('RATE = 44100       # Sample rate (Hz)')
print("")
print('# Initialize PyAudio')
print('p = pyaudio.PyAudio()')
print("")
print('# Open input and output streams')
print('input_stream = p.open(format=FORMAT, channels=CHANNELS,')
print('                      rate=RATE, input=True,')
print('                      frames_per_buffer=CHUNK)')
print('output_stream = p.open(format=FORMAT, channels=CHANNELS,')
print('                       rate=RATE, output=True,')
print('                       frames_per_buffer=CHUNK)')
print("")
print('# Main loop')
print('while True:')
print('    data = input_stream.read(CHUNK)')
print('    output_stream.write(data)')
print("")
print("Key concepts:")
print("1. CHUNK size determines latency vs. CPU usage")
print("2. FORMAT specifies bit depth and encoding")
print("3. RATE is the sampling frequency")
print("4. Separate streams for input and output")
print("5. Real-time processing in main loop")
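
The `while True` loop above runs until the process is interrupted and never closes its streams. One way to add cleanup is sketched here. `run_loopback` and its `max_chunks` parameter are hypothetical names, and the two stream arguments are assumed to be the objects returned by `p.open(...)`.

```python
def run_loopback(input_stream, output_stream, chunk=256, max_chunks=None):
    """Copy audio from input to output, then stop and close both streams.

    Runs until Ctrl+C, or until max_chunks buffers have been copied.
    Returns the number of buffers copied.
    """
    copied = 0
    try:
        while max_chunks is None or copied < max_chunks:
            data = input_stream.read(chunk)   # blocks until a buffer is ready
            output_stream.write(data)
            copied += 1
    except KeyboardInterrupt:
        pass  # Ctrl+C is the normal way to end the loop
    finally:
        # Always release the audio devices, even if an error occurred.
        for stream in (input_stream, output_stream):
            stream.stop_stream()
            stream.close()
    return copied
```

After the function returns, the caller would still call `p.terminate()` to release the PyAudio instance itself.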

### 5.2 Understanding Audio Parameters

Let's explore the key parameters used in audio processing:

In [None]:
# Audio parameters explanation
print("Audio Parameters in Digital Audio:")
print("")

# Sample rate
print("1. Sample Rate (Hz):")
print("   - Determines audio quality and frequency range")
print("   - 44100 Hz: CD quality, captures up to ~22kHz")
print("   - 48000 Hz: Professional audio standard")
print("   - 16000 Hz: Speech applications (saves bandwidth)")
print("")

# Bit depth
print("2. Bit Depth (Format):")
print("   - paInt16: 16-bit integers (-32768 to 32767)")
print("   - paInt32: 32-bit integers (higher dynamic range)")
print("   - paFloat32: 32-bit floating point (-1.0 to 1.0)")
print("")

# Channels
print("3. Channels:")
print("   - 1: Mono audio")
print("   - 2: Stereo audio (left/right)")
print("   - More: Surround sound systems")
print("")

# Buffer size
print("4. Buffer Size (CHUNK):")
print("   - Smaller: Lower latency, higher CPU usage")
print("   - Larger: Higher latency, lower CPU usage")
print("   - Typical values: 128, 256, 512, 1024 samples")
print("")

# Calculate some examples
sample_rates = [16000, 44100, 48000]
chunk_sizes = [128, 256, 512]

print("Latency calculations (buffer_size / sample_rate):")
for rate in sample_rates:
    for chunk in chunk_sizes:
        latency_ms = (chunk / rate) * 1000
        print(f"   {chunk} samples @ {rate}Hz = {latency_ms:.1f}ms latency")
    print()
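
The same parameters also fix the uncompressed data rate, which matters when sizing buffers or estimating bandwidth. A small sketch, with hypothetical helper names:

```python
def bytes_per_second(sample_rate, channels, bytes_per_sample):
    """Uncompressed PCM data rate in bytes per second."""
    return sample_rate * channels * bytes_per_sample


def buffer_bytes(chunk, channels, bytes_per_sample):
    """Size in bytes of one buffer holding `chunk` frames."""
    return chunk * channels * bytes_per_sample


# 16-bit (2-byte) mono at 44100 Hz with CHUNK = 256, as in the loopback example
print(bytes_per_second(44100, 1, 2))  # 88200
print(buffer_bytes(256, 1, 2))        # 512
```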

### 5.3 Real-time Waveform Visualization

The repository includes an advanced example that visualizes audio waveforms in real-time:

In [None]:
# Real-time waveform visualization (from week06/6_waveform.py)
print("Real-time Waveform Visualization:")
print("")
print("This example combines:")
print("1. PyAudio for real-time audio capture")
print("2. NumPy for audio data processing")
print("3. Matplotlib for real-time plotting")
print("4. AsyncIO for concurrent processing")
print("")
print("Key components:")
print("")
print("Audio callback function:")
print('def input_callback(in_data, frame_count, time_info, status):')
print('    audio_queue.put_nowait(in_data)')
print('    return (None, pyaudio.paContinue)')
print("")
print("Rolling buffer for waveform display:")
print('ROLLING_WINDOW = 4 * RATE  # 4 seconds of audio')
print('buffer = np.zeros(ROLLING_WINDOW, dtype=np.int16)')
print("")
print("Processing audio data:")
print('data = audio_queue.get_nowait()')
print('waveform = np.frombuffer(data, dtype=np.int16)')
print('buffer = np.roll(buffer, -len(waveform))')
print('buffer[-len(waveform):] = waveform')
print("")
print("Real-time plotting:")
print('def update_frame(frame):')
print('    line.set_ydata(buffer)')
print('    return line,')
print('anim = animation.FuncAnimation(fig, update_frame, interval=50)')
print("")
print("Technical concepts:")
print("- Stream callbacks for low-latency audio")
print("- Circular/rolling buffers for continuous data")
print("- Real-time plotting with matplotlib animation")
print("- Asynchronous processing with asyncio")
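
PyAudio runs a stream callback on its own thread, so the hand-off to the plotting code needs a thread-safe queue. The sketch below shows the queue-to-rolling-buffer step using only the standard library. It swaps in `queue.Queue` and a fixed-length `deque` where the original uses a NumPy array with `np.roll`, and `drain_into_buffer` is a hypothetical name.

```python
import array
import queue
from collections import deque

RATE = 44100
ROLLING_WINDOW = 4 * RATE  # keep the most recent 4 seconds of samples

audio_queue = queue.Queue()  # filled by the stream callback
rolling = deque([0] * ROLLING_WINDOW, maxlen=ROLLING_WINDOW)


def drain_into_buffer(q, buffer):
    """Move every queued chunk of 16-bit samples into the rolling buffer."""
    drained = 0
    while True:
        try:
            data = q.get_nowait()
        except queue.Empty:
            break  # nothing left to process for now
        samples = array.array("h")  # "h" = signed 16-bit integers
        samples.frombytes(data)
        buffer.extend(samples)      # the deque discards the oldest samples
        drained += len(samples)
    return drained
```

Calling `drain_into_buffer(audio_queue, rolling)` at the start of each animation frame would keep `rolling` current without the buffer ever growing.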

## 6. Practical Exercises and Activities

Here are some hands-on exercises to practice the concepts covered in this notebook.

### Exercise 1: Enhanced Chat Application

Create a Streamlit chat application with the following features:
- Message history persistence
- User name input
- Timestamp for each message
- Message export functionality

In [None]:
# Exercise 1 Template
print("Exercise 1: Enhanced Chat Application")
print("")
print("Create a file 'enhanced_chat.py' with the following structure:")
print("")
print('import streamlit as st')
print('import datetime')
print('import json')
print("")
print('st.title("Enhanced Chat Application")')
print("")
print('# TODO: Add user name input')
print('# username = st.text_input("Enter your name:", "Anonymous")')
print("")
print('# TODO: Initialize message history with timestamps')
print('# if "messages" not in st.session_state:')
print('#     st.session_state.messages = []')
print("")
print('# TODO: Display messages with timestamps and usernames')
print("")
print('# TODO: Add new messages with metadata')
print('# if prompt := st.chat_input():')
print('#     message = {')
print('#         "user": username,')
print('#         "content": prompt,')
print('#         "timestamp": datetime.datetime.now().isoformat()')
print('#     }')
print('#     st.session_state.messages.append(message)')
print("")
print('# TODO: Add export functionality')
print('# chat_json = json.dumps(st.session_state.messages, indent=2)')
print('# st.download_button("Download Chat", chat_json, "chat_export.json")')
print("")
print("Run with: streamlit run enhanced_chat.py")
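
As a possible starting point for the message metadata and export steps, the logic can be kept in plain functions that are easy to test outside Streamlit. `make_message` and `export_chat` are hypothetical names, and the `now` parameter exists only to make the timestamp controllable.

```python
import datetime
import json


def make_message(user, content, now=None):
    """Build one chat record with an ISO 8601 timestamp."""
    now = now or datetime.datetime.now()
    return {"user": user, "content": content, "timestamp": now.isoformat()}


def export_chat(messages):
    """Serialize the chat history to a JSON string for download."""
    return json.dumps(messages, indent=2, ensure_ascii=False)
```

In the app, `export_chat(st.session_state.messages)` could be passed as the data argument of `st.download_button`.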

### Exercise 2: Audio Recorder and Analyzer

Build an application that records audio and displays basic analysis:

In [None]:
# Exercise 2 Template
print("Exercise 2: Audio Recorder and Analyzer")
print("")
print("Create a file 'audio_analyzer.py' that:")
print("1. Records audio using st_audiorec")
print("2. Converts audio to numpy array")
print("3. Displays waveform plot")
print("4. Shows basic statistics (duration, peak amplitude, etc.)")
print("")
print("Key functions to implement:")
print("")
print('def bytes_to_numpy(audio_bytes):')
print('    """Convert audio bytes to numpy array"""')
print('    # TODO: Use wave module or numpy.frombuffer')
print('    pass')
print("")
print('def plot_waveform(audio_array, sample_rate=44100):')
print('    """Plot audio waveform"""')
print('    # TODO: Create time axis and plot with matplotlib')
print('    pass')
print("")
print('def analyze_audio(audio_array, sample_rate=44100):')
print('    """Calculate audio statistics"""')
print('    # TODO: Calculate duration, peak, RMS, etc.')
print('    pass')
print("")
print("Bonus features:")
print("- Frequency spectrum analysis")
print("- Audio filtering (low-pass, high-pass)")
print("- Export processed audio")
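
One way to approach the analysis step is sketched below with the standard library only. It assumes the recording is 16-bit PCM WAV, which should be checked against what `st_audiorec` returns in your browser. `analyze_wav_bytes` is a hypothetical name, and a NumPy version would replace the `array` step with `np.frombuffer(raw, dtype=np.int16)`.

```python
import array
import io
import math
import sys
import wave


def analyze_wav_bytes(audio_bytes):
    """Return sample rate, duration, peak and RMS for 16-bit PCM WAV data."""
    with wave.open(io.BytesIO(audio_bytes), "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        sample_rate = wav_file.getframerate()
        n_frames = wav_file.getnframes()
        raw = wav_file.readframes(n_frames)

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()      # WAV data is little-endian

    peak = max((abs(s) for s in samples), default=0)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0
    return {
        "sample_rate": sample_rate,
        "duration_s": n_frames / sample_rate,
        "peak": peak,
        "rms": rms,
    }
```

For stereo input the peak and RMS are computed over both channels together, since the samples are interleaved.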

### Exercise 3: Voice-Controlled Calculator

Combine speech recognition with text-to-speech for a voice-controlled calculator:

In [None]:
# Exercise 3 Template
print("Exercise 3: Voice-Controlled Calculator")
print("")
print("Create a calculator that:")
print("1. Accepts voice input for mathematical expressions")
print("2. Processes the speech to extract numbers and operations")
print("3. Calculates the result")
print("4. Speaks the result back to the user")
print("")
print("Required components:")
print("")
print('def speech_to_text(audio_bytes):')
print('    """Convert speech to text (requires speech recognition API)"""')
print('    # TODO: Implement using external service')
print('    pass')
print("")
print('def parse_math_expression(text):')
print('    """Extract mathematical expression from text"""')
print('    # TODO: Handle "two plus three", "5 times 7", etc.')
print('    pass')
print("")
print('def calculate(expression):')
print('    """Safely evaluate mathematical expression"""')
print('    # TODO: Parse with ast.parse and allow only numbers and + - * /')
print('    # (ast.literal_eval does not evaluate arithmetic; never use eval)')
print('    pass')
print("")
print('def text_to_speech(text):')
print('    """Convert result to speech"""')
print('    # TODO: Use TTS API from earlier examples')
print('    pass')
print("")
print("Example workflow:")
print('User says: "What is five plus three?"')
print('System responds: "Five plus three equals eight"')
print("")
print("Challenges:")
print("- Handling different ways to express numbers")
print("- Robust parsing of mathematical operations")
print("- Error handling for invalid expressions")
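
For the `calculate` step, one safe approach is to parse the expression with `ast.parse` and evaluate only an explicit whitelist of node types. The sketch below is one possible implementation, not the only one. It supports `+`, `-`, `*`, `/` and parentheses, and deliberately leaves out exponentiation.

```python
import ast
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression):
    """Evaluate +, -, *, / on numeric literals and reject everything else."""

    def _eval(node):
        # Plain int/float literals only (bool is excluded on purpose).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attribute access, ** and so on all end up here.
        raise ValueError("unsupported expression")

    tree = ast.parse(expression, mode="eval")
    return _eval(tree.body)
```

Malformed input raises `SyntaxError` and division by zero raises `ZeroDivisionError`, so the caller should catch those alongside `ValueError`.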

## 7. Integration Patterns and Best Practices

When building applications that combine user input and audio, consider these patterns and practices.

### 7.1 State Management in Streamlit

Managing application state is crucial for interactive applications:

In [None]:
# State management best practices
print("Streamlit State Management Best Practices:")
print("")
print("1. Initialize state with default values:")
print('if "key" not in st.session_state:')
print('    st.session_state.key = default_value')
print("")
print("2. Use descriptive keys:")
print('st.session_state.chat_messages  # Good')
print('st.session_state.msgs          # Avoid')
print("")
print("3. Group related state:")
print('if "app_state" not in st.session_state:')
print('    st.session_state.app_state = {')
print('        "user_name": "",')
print('        "messages": [],')
print('        "settings": {}')
print('    }')
print("")
print("4. Reset state when needed:")
print('if st.button("Reset Chat"):')
print('    st.session_state.messages = []')
print('    st.rerun()  # Refresh the app')
print("")
print("5. Avoid storing large objects:")
print('# Store file paths, not file contents')
print('st.session_state.audio_file_path = "temp/audio.wav"')
print('# Not: st.session_state.audio_data = large_audio_array')

### 7.2 Error Handling and User Feedback

Robust error handling improves user experience:

In [None]:
# Error handling patterns
print("Error Handling and User Feedback Patterns:")
print("")
print("1. API call error handling:")
print('try:')
print('    response = requests.post(api_url, json=data, timeout=30)')
print('    response.raise_for_status()')
print('    return response.json()')
print('except requests.exceptions.Timeout:')
print('    st.error("Request timed out. Please try again.")')
print('except requests.exceptions.ConnectionError:')
print('    st.error("Cannot connect to service. Check your connection.")')
print('except requests.exceptions.HTTPError as e:')
print('    st.error(f"Service error: {e}")')
print("")
print("2. Audio processing error handling:")
print('try:')
print('    if not PYAUDIO_AVAILABLE:')
print('        st.warning("PyAudio not available. Install with: pip install pyaudio")')
print('        return')
print('    ')
print('    # Audio processing code')
print('except OSError as e:')
print('    st.error(f"Audio device error: {e}")')
print('    st.info("Check that your microphone is connected and working.")')
print("")
print("3. User input validation:")
print('if not text.strip():')
print('    st.warning("Please enter some text.")')
print('    return')
print('if len(text) > 1000:')
print('    st.error("Text too long. Maximum 1000 characters.")')
print('    return')
print("")
print("4. Progress indicators:")
print('with st.spinner("Processing audio..."):')
print('    result = process_audio(audio_data)')
print('st.success("Audio processed successfully!")')
print("")
print("5. Graceful degradation:")
print('if AUDIOREC_AVAILABLE:')
print('    recording = st_audiorec()')
print('else:')
print('    st.info("Audio recording not available.")')
print('    uploaded_file = st.file_uploader("Upload audio file", type=["wav", "mp3"])')
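
The validation pattern in item 3 uses `return`, which only works inside a function. A minimal sketch of that check as a reusable helper follows; `validate_text` and `MAX_TEXT_LENGTH` are hypothetical names.

```python
MAX_TEXT_LENGTH = 1000  # same limit as the validation pattern above


def validate_text(text, max_length=MAX_TEXT_LENGTH):
    """Return an error message for invalid input, or None if the text is usable."""
    if not text or not text.strip():
        return "Please enter some text."
    if len(text) > max_length:
        return f"Text too long. Maximum {max_length} characters."
    return None
```

At the top level of a Streamlit script, the result could be shown with `st.warning(...)` followed by `st.stop()` to end the current run.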

### 7.3 Performance Considerations

Audio applications can be resource-intensive. Here are optimization strategies:

In [None]:
# Performance optimization tips
print("Performance Optimization for Audio Applications:")
print("")
print("1. Cache expensive operations:")
print('@st.cache_resource')
print('def load_audio_model():')
print('    # Load the model once; cache_resource shares it across reruns and sessions')
print('    return model')
print("")
print("2. Use appropriate audio parameters:")
print('# For speech: lower sample rate saves bandwidth')
print('SPEECH_RATE = 16000')
print('# For music: higher sample rate for quality')
print('MUSIC_RATE = 44100')
print("")
print("3. Limit audio duration:")
print('MAX_RECORDING_SECONDS = 30')
print('if audio_duration > MAX_RECORDING_SECONDS:')
print('    st.warning(f"Audio too long. Maximum {MAX_RECORDING_SECONDS} seconds.")')
print("")
print("4. Use streaming for large files:")
print('def stream_audio_to_api(audio_data):')
print('    with requests.post(url, data=audio_data, stream=True) as response:')
print('        for chunk in response.iter_content(chunk_size=8192):')
print('            yield chunk')
print("")
print("5. Clean up temporary files:")
print('import tempfile')
print('import os')
print("")
print('with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_file:')
print('    tmp_file.write(audio_data)')
print('    tmp_path = tmp_file.name')
print("")
print('try:')
print('    # Process audio file')
print('    result = process_audio_file(tmp_path)')
print('finally:')
print('    os.unlink(tmp_path)  # Clean up')
print("")
print("6. Memory management for real-time audio:")
print('# Use circular buffers instead of growing lists')
print('from collections import deque')
print('audio_buffer = deque(maxlen=sample_rate * 10)  # 10 seconds')
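
The temporary-file pattern in item 5 can be wrapped in a context manager so that cleanup cannot be forgotten. This is a sketch with a hypothetical helper name, `temporary_wav`.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def temporary_wav(audio_data):
    """Write audio bytes to a temporary .wav file and delete it afterwards."""
    with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_file:
        tmp_file.write(audio_data)
        tmp_path = tmp_file.name
    try:
        yield tmp_path  # the file is closed here, so other code can open it
    finally:
        os.unlink(tmp_path)  # runs even if the body raises
```

Usage would look like `with temporary_wav(recording) as path: result = process_audio_file(path)`, where `process_audio_file` is the placeholder from the pattern above.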

## 8. Conclusion and Next Steps

This notebook covered the essential concepts for building interactive applications with user input and audio processing.

### Summary of Key Concepts

1. **User Input Methods**:
   - `st.chat_input()` for conversational interfaces
   - `st.text_input()`, `st.text_area()` for text entry
   - Various widgets for different input types
   - Session state management for persistent data

2. **Audio Recording and Playback**:
   - `st_audiorec()` for browser-based recording
   - `st.audio()` for playback in web apps
   - Audio format considerations and conversions

3. **Text-to-Speech and Voice Processing**:
   - Integration with external TTS APIs
   - Voice cloning with sample audio
   - Real-time voice synthesis

4. **Real-time Audio Processing**:
   - PyAudio for low-level audio operations
   - Stream callbacks and buffer management
   - Real-time visualization and analysis

5. **Best Practices**:
   - Error handling and user feedback
   - Performance optimization
   - State management patterns

### Next Steps and Advanced Topics

To continue developing your skills in user input and audio processing:

1. **Explore Advanced Audio Processing**:
   - Digital signal processing (DSP) techniques
   - Audio effects and filters
   - Fourier transforms for frequency analysis

2. **Machine Learning Integration**:
   - Speech recognition with models like Whisper
   - Audio classification and analysis
   - Real-time audio generation with AI models

3. **Production Deployment**:
   - Scaling audio applications
   - Cloud audio processing services
   - WebRTC for real-time communication

4. **Mobile and Cross-Platform**:
   - PWA (Progressive Web App) audio features
   - Mobile-specific audio considerations
   - Cross-browser compatibility

5. **Security and Privacy**:
   - Audio data encryption
   - User consent and privacy policies
   - Secure API communication

Keep experimenting with the code examples and try building your own audio-enabled applications!

### Additional Resources

- **Streamlit Documentation**: https://docs.streamlit.io/
- **PyAudio Documentation**: https://people.csail.mit.edu/hubert/pyaudio/
- **streamlit-audiorec**: https://github.com/stefanrmmr/streamlit-audio-recorder
- **Coqui TTS**: https://github.com/coqui-ai/TTS
- **Digital Signal Processing**: https://www.dspguide.com/
- **Web Audio API**: https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API