# Algorithm: Chinese Speech Recognition and Waveform Visualization

## Input:
- AudioFile: Input audio file path
- APPID, APIKey, APISecret: API credentials
- Sample rate: 16000 Hz
- Frame size: 8000 bytes
- Interval: 0.04 seconds

## Output:
- Word timings: list of (timestamp, word) pairs
- Visualized waveform with Pinyin annotations

## Main Procedure:

1. Initialize:
   - word_timings = empty list
   - Create WebSocket connection parameters
   - Generate authentication URL using HMAC-SHA256

2. ProcessAudioStream(AudioFile):
   status = FIRST_FRAME
   WHILE true:  # the loop exits via break after the final frame is sent
       buffer = read frameSize bytes from AudioFile
       IF buffer is empty:  # end of file reached
           status = LAST_FRAME
       
       encoded_audio = base64_encode(buffer)
       
       IF status == FIRST_FRAME:
           send initial frame with config
           status = CONTINUE_FRAME
       ELSE IF status == CONTINUE_FRAME:
           send continuation frame
       ELSE:
           send final frame
           break
       
       wait(interval)

3. HandleServerResponse(message):
   IF message is valid:
       FOR each word in message.result:
           time = word.begin_time × 10  # milliseconds, assuming begin_time is reported in 10 ms units
           word_timings.append((time, word.text))

4. VisualizeResults:
   - Load audio file as numpy array
   - Create time axis
   - Plot waveform
   FOR each (timestamp, word) in word_timings:
       - Convert word to Pinyin
       - Draw vertical line at timestamp
       - Annotate with Pinyin text
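
The frame state machine in step 2 can be sketched in Python as a generator, independent of any particular WebSocket client. This is a minimal sketch under stated assumptions: the numeric status codes, the message field names (`status`, `audio`, `config`), and the config keys are hypothetical placeholders, not the service's actual schema.

```python
import base64
import io
from typing import BinaryIO, Iterator

# Assumed status codes for this sketch; the real service defines its own values.
FIRST_FRAME, CONTINUE_FRAME, LAST_FRAME = 0, 1, 2


def iter_frames(audio: BinaryIO, frame_size: int = 8000) -> Iterator[dict]:
    """Yield one message per frame, ending with an empty LAST_FRAME message."""
    status = FIRST_FRAME
    while True:
        buffer = audio.read(frame_size)
        if not buffer:
            status = LAST_FRAME  # end of file: emit the closing frame
        message = {
            "status": status,
            "audio": base64.b64encode(buffer).decode("ascii"),
        }
        if status == FIRST_FRAME:
            # Only the first frame carries the session configuration (hypothetical keys).
            message["config"] = {"format": "audio/L16", "sample_rate": 16000}
        yield message
        if status == LAST_FRAME:
            return
        status = CONTINUE_FRAME


# 20000 bytes of silence split into 8000-byte frames, plus the closing frame.
frames = list(iter_frames(io.BytesIO(bytes(20_000))))
print([f["status"] for f in frames])  # [0, 1, 1, 2]
```

A caller would serialize each message, send it over the socket, and sleep for the configured interval before requesting the next one.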

## Error Handling:
- WebSocket connection errors
- Invalid message format
- Audio file reading errors
- Authentication failures

## Constraints:
- Audio format: audio/L16
- Sample rate: 16000 Hz
- Language: Mandarin Chinese
- VAD end silence: 10000ms
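
The input parameters and the format constraint together determine how fast the file is uploaded relative to playback. The arithmetic below works this out, assuming mono audio (the channel count is not stated above).

```python
SAMPLE_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2     # audio/L16 is 16-bit linear PCM
CHANNELS = 1             # assumption: mono input
FRAME_SIZE_BYTES = 8_000
INTERVAL_MS = 40         # 0.04 seconds between frames

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS
audio_per_frame_ms = FRAME_SIZE_BYTES * 1000 // bytes_per_second
realtime_factor = audio_per_frame_ms / INTERVAL_MS

print(bytes_per_second)    # 32000
print(audio_per_frame_ms)  # 250
print(realtime_factor)     # 6.25
```

Under these assumptions each frame holds 250 ms of audio but is sent every 40 ms, so the file is streamed roughly six times faster than real time.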

# Algorithm: Audio Pause Detection and Analysis

## Input:
- audio_file: Input audio signal
- window_len: Smoothing window length in samples (default: 1000)
- min_silence_len: Minimum silence duration (ms) (default: 500)
- debounce_time: Minimum time between pauses (ms) (default: 300)

## Output:
- List of pause segments (start_time, duration)
- Visualization of audio waveform with highlighted pauses

## Auxiliary Functions:

1. SmoothSignal(signal, window_len):
   window = ones(window_len) / window_len
   RETURN convolve(signal, window, 'same')

2. CalculateVarianceThreshold(samples, window_size):
   FOR i in range(len(samples)):
       start = max(0, i - window_size)
       end = min(len(samples), i + window_size)
       local_variance[i] = VAR(samples[start:end])
   RETURN SmoothSignal(local_variance, window_size)
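
The two helpers can be sketched with NumPy as follows. The smoothing function mirrors the pseudocode directly. The variance function swaps the per-sample loop for prefix sums, a different technique that produces the same windowed variance in O(n) regardless of window size. The function names are illustrative, and the final smoothing pass from the pseudocode is left to the caller.

```python
import numpy as np


def smooth_signal(signal: np.ndarray, window_len: int) -> np.ndarray:
    """Moving average with a box window (same length as input if window_len <= len(signal))."""
    window = np.ones(window_len) / window_len
    return np.convolve(signal, window, mode="same")


def sliding_variance(samples: np.ndarray, window_size: int) -> np.ndarray:
    """Variance of samples[i - window_size : i + window_size], clipped at the edges."""
    samples = np.asarray(samples, dtype=float)
    # Prefix sums of x and x^2 give each window's sum in O(1).
    csum = np.concatenate(([0.0], np.cumsum(samples)))
    csum_sq = np.concatenate(([0.0], np.cumsum(samples ** 2)))
    idx = np.arange(len(samples))
    start = np.maximum(0, idx - window_size)
    end = np.minimum(len(samples), idx + window_size)
    count = end - start
    mean = (csum[end] - csum[start]) / count
    mean_sq = (csum_sq[end] - csum_sq[start]) / count
    # var = E[x^2] - E[x]^2; clamp tiny negative values caused by rounding.
    return np.maximum(mean_sq - mean ** 2, 0.0)
```

The `E[x^2] - E[x]^2` form can lose precision on signals with a large offset; for samples normalized to a small range this is usually acceptable.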

## Main Algorithm:

1. Initialize:
   samples = normalize(audio_to_array(audio_file))
   IF stereo:
       samples = convert_to_mono(samples)

2. Calculate Envelope:
   envelope = absolute_value(samples)
   smoothed_envelope = SmoothSignal(envelope, window_len)
   local_variance = CalculateVarianceThreshold(smoothed_envelope, window_len)
   smoothed_variance = SmoothSignal(local_variance, window_len)

3. Define Thresholds:
   amplitude_threshold = mean(smoothed_envelope) × 0.1
   variance_threshold = mean(smoothed_variance) × 0.1

4. Detect Silent Regions:
   silent_mask = (smoothed_envelope < amplitude_threshold) AND
                (smoothed_variance < variance_threshold)

5. Extract Pause Segments:
   silent_segments = []
   current_start = NULL
   last_end = 0
   
   FOR i, is_silent in enumerate(silent_mask + [FALSE]):  # sentinel closes a trailing silent run
       IF is_silent AND current_start is NULL:
           current_start = i
       ELSE IF not is_silent AND current_start exists:
           duration = (i - current_start) × 1000 / sample_rate
           IF duration ≥ min_silence_len:
               start_time = current_start × 1000 / sample_rate
               IF silent_segments not empty AND
                  start_time ≤ last_end + debounce_time:
                   Extend previous segment to end at start_time + duration
               ELSE:
                   Add new segment (start_time, duration)
               last_end = start_time + duration
           current_start = NULL

6. Visualization:
   Plot waveform
   FOR each pause in silent_segments:
       Draw vertical line at pause start
       Highlight pause duration region
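
The segment extraction in step 5 can be sketched in plain Python. Two choices here are assumptions of this sketch rather than statements about the original implementation: "merge" is taken to mean extending the previous segment to the end of the new run, and a silent run that reaches the end of the signal is closed by appending a non-silent sentinel.

```python
from typing import Iterable


def extract_pauses(
    silent_mask: Iterable[bool],
    sample_rate: int,
    min_silence_len: float = 500.0,
    debounce_time: float = 300.0,
) -> list[tuple[float, float]]:
    """Return (start_ms, duration_ms) for each silent run of at least min_silence_len."""
    segments: list[tuple[float, float]] = []
    current_start = None
    # The trailing False closes a silent run that lasts until the end of the mask.
    for i, is_silent in enumerate(list(silent_mask) + [False]):
        if is_silent and current_start is None:
            current_start = i
        elif not is_silent and current_start is not None:
            start_ms = current_start * 1000 / sample_rate
            duration_ms = (i - current_start) * 1000 / sample_rate
            current_start = None
            if duration_ms < min_silence_len:
                continue
            if segments:
                prev_start, prev_duration = segments[-1]
                if start_ms <= prev_start + prev_duration + debounce_time:
                    # Too close to the previous pause: extend it to cover this run.
                    segments[-1] = (prev_start, start_ms + duration_ms - prev_start)
                    continue
            segments.append((start_ms, duration_ms))
    return segments


# With sample_rate=1000, one sample equals one millisecond.
mask = [False] * 100 + [True] * 600 + [False] * 200 + [True] * 700 + [False] * 1000 + [True] * 500
print(extract_pauses(mask, sample_rate=1000))  # [(100.0, 1500.0), (2600.0, 500.0)]
```

In the example the first two runs are only 200 ms apart, which is within the debounce time, so they merge into one 1500 ms pause.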

## Complexity:
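- Time: O(n × w) as written, where n is the number of samples and w is the window length (direct convolution and per-sample windowed variance); the segment scan itself is O(n)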
- Space: O(n) for storing smoothed signals and results

## Constraints:
- Minimum silence length: 500ms
- Debounce time: 300ms
- Amplitude threshold: 10% of mean envelope
- Variance threshold: 10% of mean variance

# Algorithm: Dominant Frequency Analysis of Audio Signal

## Input:
- audio_file: Input audio signal
- sample_rate: Sampling frequency (Hz)
- window_size: Size of analysis window (default: 1024 samples)

## Output:
- Time series of dominant frequencies
- Visualization of frequency variation over time

## Main Algorithm:

1. Initialize:
   step_size = floor(window_size / 2)
   frequencies = RFFT_FREQUENCIES(window_size, sample_rate)  # non-negative bins only
   dominant_frequencies = []

2. PreprocessAudio:
   data = READ_AUDIO(audio_file)
   IF data is stereo:
       data = CONVERT_TO_MONO(data)

3. DominantFrequencyAnalysis:
   FOR i = 0 to length(data) - window_size STEP step_size:
       # Extract and window the frame
       frame = data[i : i + window_size]
       windowed_frame = frame × HANNING_WINDOW(window_size)
       
       # Perform a one-sided FFT (the input is real-valued)
       magnitude_spectrum = |RFFT(windowed_frame)|
       
       # Find peak frequency
       peak_index = ARGMAX(magnitude_spectrum)
       dominant_freq = frequencies[peak_index]
       
       dominant_frequencies.APPEND(dominant_freq)

4. TimeAxisGeneration:
   time_values = [i × (window_size/2)/sample_rate
                 for i in range(length(dominant_frequencies))]

5. Visualization:
   PLOT(time_values, dominant_frequencies)
   SET_XLABEL("Time (seconds)")
   SET_YLABEL("Frequency (Hz)")
   SET_TITLE("Dominant Frequency Over Time")
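
The analysis loop can be sketched with NumPy as below. It uses the one-sided `rfft` together with `rfftfreq`, so every reported peak is a non-negative frequency. The function name and return shape are choices of this sketch, and plotting is left out.

```python
import numpy as np


def dominant_frequencies(data: np.ndarray, sample_rate: int, window_size: int = 1024):
    """Return (times_s, peak_hz) for 50%-overlapping Hann-windowed frames."""
    step = window_size // 2
    window = np.hanning(window_size)
    # rfftfreq lists the bin frequencies that match rfft's output.
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    peaks = []
    for start in range(0, len(data) - window_size + 1, step):
        frame = data[start:start + window_size] * window
        magnitude = np.abs(np.fft.rfft(frame))
        peaks.append(freqs[np.argmax(magnitude)])
    times = np.arange(len(peaks)) * step / sample_rate
    return times, np.array(peaks)


# One second of a 1000 Hz tone sampled at 16000 Hz.
sample_rate = 16_000
tone = np.sin(2 * np.pi * 1000 * np.arange(sample_rate) / sample_rate)
times, peaks = dominant_frequencies(tone, sample_rate)
```

For this test tone, 1000 Hz falls exactly on bin 64 (64 × 15.625 Hz), so each frame reports the tone's frequency. A frequency between bins would be reported as the nearest bin.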

## Parameters:
- Window size: 1024 samples
- Step size: 512 samples (50% overlap)
- Window function: Hanning window

## Complexity:
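- Time: O(N log W), where N is the signal length and W is the window size (about 2N/W frames, each costing O(W log W))
- Space: O(N) for the signal, plus O(N/W) for the result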

## Mathematical Foundation:
1. Hanning Window:
   w(n) = 0.5(1 - cos(2πn/(N-1))), for n = 0, ..., N-1
   where N is the window length

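2. FFT Bin Frequency and Resolution:
   f_k = k × (sample_rate/window_size)
   where k is the frequency bin index; the spacing between adjacent bins,
   sample_rate/window_size, is the frequency resolution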

3. Frequency Range:
   f_max = sample_rate/2 (Nyquist frequency)
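
As a quick numerical check of these formulas, the snippet below evaluates them for a 1024-sample window, assuming the 16000 Hz sample rate used in the speech recognition section. It also compares the window formula against NumPy's built-in `np.hanning`.

```python
import numpy as np

sample_rate = 16_000   # Hz (assumed; taken from the speech recognition section)
window_size = 1024     # samples

n = np.arange(window_size)
hann = 0.5 * (1 - np.cos(2 * np.pi * n / (window_size - 1)))

bin_spacing_hz = sample_rate / window_size   # 15.625
nyquist_hz = sample_rate / 2                 # 8000.0

print(np.allclose(hann, np.hanning(window_size)))  # True
print(bin_spacing_hz, nyquist_hz)                  # 15.625 8000.0
```

With these values, a dominant-frequency estimate is quantized to steps of 15.625 Hz, which is relevant when comparing pitch between short syllables.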

# Algorithm: Speech Characteristics Analysis and Classification

## Input:
- audio_file: Input audio file
- word_timings: List of (timestamp, word) pairs
- detected_pauses: List of (start_time, duration) for pauses

## Output:
- List of categorized speech characteristics per word

## Auxiliary Functions:

1. FindPauseDuration(start, end, pauses):
    total_pause = 0
    FOR each (pause_start, pause_length) in pauses:
        pause_end = pause_start + pause_length
        # Count only the part of the pause that lies inside [start, end)
        overlap = min(end, pause_end) - max(start, pause_start)
        IF overlap > 0:
            total_pause += overlap
    RETURN total_pause
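
A minimal Python sketch of this helper is shown below. It uses interval overlap, which also counts a pause that begins before `start` but extends into the window. All times are assumed to share one unit (milliseconds in the rest of this document).

```python
def find_pause_duration(
    start: float, end: float, pauses: list[tuple[float, float]]
) -> float:
    """Total time within [start, end) that is covered by the given pauses."""
    total = 0.0
    for pause_start, pause_length in pauses:
        pause_end = pause_start + pause_length
        # Length of the intersection of [start, end) and [pause_start, pause_end).
        overlap = min(end, pause_end) - max(start, pause_start)
        if overlap > 0:
            total += overlap
    return total


pauses = [(900, 300), (1500, 200), (1900, 400), (3000, 100)]
print(find_pause_duration(1000, 2000, pauses))  # 500.0
```

In the example, the first pause contributes 200 ms (1000 to 1200), the second its full 200 ms, the third 100 ms (1900 to 2000), and the fourth lies outside the window.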

## Main Algorithm:

1. Initialize:
   analysis_results = []
   previous_end_time = 0

2. Feature Extraction:
   FOR i = 0 to length(word_timings) - 1:
       current_word = word_timings[i]
       
       # Time boundaries (the last word runs to the end of the audio)
       start_time = max(current_word.time, previous_end_time)
       IF i < length(word_timings) - 1:
           end_time = word_timings[i + 1].time
       ELSE:
           end_time = audio_duration
       
       # Duration calculations
       word_duration = end_time - start_time
       pause_duration = FindPauseDuration(start_time, end_time, detected_pauses)
       actual_duration = word_duration - pause_duration
       
       # Spectral analysis
       audio_segment = ExtractAudioSegment(start_time, end_time)
       IF ValidSegment(audio_segment):
           samples = ConvertToArray(audio_segment)
           spectrum = FFT(samples)
           peak_frequency = FindPeakFrequency(spectrum)
           peak_amplitude = max(|samples|)
           
           analysis_results.APPEND({
               word: current_word.text,
               amplitude: peak_amplitude,
               duration: actual_duration,
               pause: pause_duration,
               frequency: peak_frequency
           })
       
       previous_end_time = end_time

3. Threshold Calculation:
   amplitude_thresholds = PERCENTILE(amplitudes, [33, 66])
   duration_thresholds = PERCENTILE(durations, [33, 66])
   pause_threshold = MEDIAN(pauses where pause > 100ms)
   frequency_thresholds = PERCENTILE(frequencies, [33, 66])

4. Classification:
   categorized_results = []
   FOR each entry in analysis_results:
       categorized_entry = {
           word: entry.word,
           amplitude_category: CLASSIFY_AMPLITUDE(entry.amplitude, amplitude_thresholds),
           duration_category: CLASSIFY_DURATION(entry.duration, duration_thresholds),
           pause_category: CLASSIFY_PAUSE(entry.pause, pause_threshold),
           frequency_category: CLASSIFY_FREQUENCY(entry.frequency, frequency_thresholds)
       }
       categorized_results.APPEND(categorized_entry)
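
The time-boundary logic of the feature extraction step can be sketched on its own. Each word ends where the next one begins. For the last word this sketch uses the total audio length, passed in as `audio_duration_ms`, which is an assumed extra input.

```python
from typing import Iterator


def word_segments(
    word_timings: list[tuple[float, str]], audio_duration_ms: float
) -> Iterator[tuple[str, float, float]]:
    """Yield (word, start_ms, end_ms) for each word with a non-empty segment."""
    previous_end = 0
    for i, (time_ms, word) in enumerate(word_timings):
        start = max(time_ms, previous_end)
        is_last = i == len(word_timings) - 1
        end = audio_duration_ms if is_last else word_timings[i + 1][0]
        if end > start:  # skip empty or inverted segments
            yield word, start, end
        previous_end = max(previous_end, end)


timings = [(0, "床"), (400, "前"), (900, "明")]
print(list(word_segments(timings, audio_duration_ms=1500)))
# [('床', 0, 400), ('前', 400, 900), ('明', 900, 1500)]
```

Each yielded segment can then be passed to the duration, pause, and spectral calculations described above.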

## Classification Rules:
1. Amplitude Categories:
   - Low: < 33rd percentile
   - Medium: 33rd-66th percentile
   - High: > 66th percentile

2. Duration Categories:
   - Short: < 33rd percentile
   - Medium: 33rd-66th percentile
   - Long: > 66th percentile

3. Pause Categories:
   - None: < 100ms
   - Short: from 100ms up to the median of pauses longer than 100ms
   - Long: > that median

4. Frequency Categories:
   - Low: < 33rd percentile
   - Medium: 33rd-66th percentile
   - High: > 66th percentile
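
These rules can be sketched with the standard library alone. `statistics.quantiles` with `method="inclusive"` interpolates linearly between data points and needs at least two values. The function names and the fallback when no pause exceeds 100 ms are assumptions of this sketch.

```python
from statistics import median, quantiles


def tercile_thresholds(values: list[float]) -> tuple[float, float]:
    """Return the 33rd and 66th percentiles of values."""
    cuts = quantiles(values, n=100, method="inclusive")  # cuts[k - 1] is the k-th percentile
    return cuts[32], cuts[65]


def classify_three_way(value, thresholds, labels=("Low", "Medium", "High")) -> str:
    """Below the lower threshold, above the upper threshold, or in between."""
    low, high = thresholds
    if value < low:
        return labels[0]
    if value > high:
        return labels[2]
    return labels[1]


def classify_pause(pause_ms: float, all_pauses_ms: list[float], min_pause_ms: float = 100) -> str:
    """'None' below min_pause_ms; otherwise 'Short' or 'Long' relative to the median pause."""
    if pause_ms < min_pause_ms:
        return "None"
    real_pauses = [p for p in all_pauses_ms if p > min_pause_ms]
    if not real_pauses:  # assumed fallback: no pause is long enough to define a median
        return "Short"
    return "Long" if pause_ms > median(real_pauses) else "Short"


thresholds = tercile_thresholds([0, 100])
print(thresholds)                                                        # (33.0, 66.0)
print(classify_three_way(50, thresholds))                                # Medium
print(classify_three_way(20, thresholds, ("Short", "Medium", "Long")))   # Short
print(classify_pause(800, [50, 200, 400, 800]))                          # Long
```

The same `classify_three_way` helper covers amplitude, duration, and frequency by passing different labels.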

## Complexity:
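- Time: O(n log n) for the percentile thresholds, where n is the number of words, plus the per-word FFTs, which total at most O(S log S) for S audio samples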
- Space: O(n)

# Algorithm: Poetry Recitation Analysis System Using LLM

## Input:
- poem_content: Original poetry text
- student_results: Student's recitation characteristics
  {word, volume_level, duration_level, pause_level, pitch_level}
- api_key: Authentication key for LLM service

## Output:
- Expert analysis of poetry recitation techniques
- Personalized feedback on student's performance

## Data Structures:
1. ConversationHistory: Queue
   - Elements: {role: String, content: String}
2. Cache: LRU Cache
   - Key: Message tuple
   - Value: API response
   - Size: 128 entries

## Main Algorithm:

1. Initialize:
   conversation_history = EMPTY_QUEUE()
   cache = LRU_CACHE(size=128)
   llm_client = INITIALIZE_LLM_CLIENT(api_key)

2. GetPoetryAnalysis(poem_content):
    prompt = CONSTRUCT_PROMPT(
        template: "Analyze the recitation techniques for this classical poem:
                  - Vocal emphasis
                  - Pause placement
                  - Rhythm control
                  - Intonation changes",
        content: poem_content
    )
    
    AddToHistory("user", prompt)
    response = GetCachedResponse(conversation_history)
    RETURN response

3. GetStudentFeedback(student_results):
    prompt = CONSTRUCT_PROMPT(
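        template: "Analyze the student's recitation performance:
                  - Volume (high/medium/low)
                  - Vocalization duration (long/medium/short)
                  - Pause control (long/short/none)
                  - Pitch variation (high/medium/low)",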
        content: student_results
    )
    
    AddToHistory("user", prompt)
    response = GetCachedResponse(conversation_history)
    RETURN response

4. GetCachedResponse(messages):
    messages_tuple = ConvertToTuple(messages)
    IF messages_tuple IN cache:
        RETURN cache[messages_tuple]
    ELSE:
        response = llm_client.REQUEST(
            model="glm-4",
            messages=messages
        )
        cache[messages_tuple] = response
        RETURN response

## Auxiliary Functions:

1. AddToHistory(role, content):
    conversation_history.APPEND({
        role: role,
        content: content
    })

2. ConvertToTuple(messages):
    RETURN TUPLE(
        FOR each message IN messages:
            TUPLE(message.items())
    )

## System Features:
1. Caching Mechanism:
   - LRU cache implementation
   - Prevents redundant API calls
   - Optimizes response time

2. Analysis Components:
   - Poetry recitation techniques
   - Volume analysis
   - Timing and rhythm
   - Pause placement
   - Pitch variation

## Complexity:
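- Time (cache hit): O(m) to build and hash the key, where m is the total length of the conversation history; the lookup itself is O(1) on average
- Time (cache miss): dominated by the LLM API call, which grows with prompt and response length
- Space: O(k) cached responses, where k is the cache size (128 entries)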