alice 35b0242a 2e325c3a

+298 -13
+122 -5
CLAUDE.md
···
8 8
9 9 ## Build Commands
10 10
11 + **IMPORTANT: Never run `./debug-macos.sh` or any build commands unless explicitly instructed by the user. The user will handle building and testing.**
12 +
11 13 ### macOS
12 14 ```bash
13 15 # Recommended: Use the debug script
···
371 373 - 20 Hz start preserved for electronic music sub-bass
372 374 - Created test_cqt_spectrum_v2.lua with correct note display
373 375
374 - #### Phase 3: Next Steps
375 - 1. Add remaining API functions: `cqts()`, `cqto()`, `cqtos()`
376 - 2. Create separate audio buffer for CQT (restore FFT_SIZE to 1024)
377 - 3. Create FFT vs CQT comparison demo
378 - 4. Performance optimization if needed
376 + **NEW ISSUE - Excessive Frequency Spreading**:
377 + Testing with pure sine waves shows excessive frequency spreading:
378 + - 20 Hz sine wave: Energy spreads across ~6-7 bins
379 + - 50 Hz sine wave: Energy spreads across ~8-9 bins
380 + - 100 Hz sine wave: Energy spreads across ~4-5 bins
381 +
382 + ## Root Cause Analysis:
383 + The issue is **window truncation** in our current implementation:
384 + - At 20 Hz: Window should be 37,485 samples but is clamped to 4,096 (line 90 in cqt_kernel.c)
385 + - At 50 Hz: Window should be 14,994 samples but is clamped to 4,096
386 + - At 100 Hz: Window should be 7,497 samples but is clamped to 4,096
387 +
388 + This truncation reduces the effective Q factor:
389 + - At 20 Hz: Effective Q ≈ 1.86 instead of 17
390 + - At 50 Hz: Effective Q ≈ 4.64 instead of 17
391 + - At 100 Hz: Effective Q ≈ 9.29 instead of 17
392 +
393 + ## Comparison: Our Approach vs ESP32 Approach
394 +
395 + ### Our Current Approach (Constant Q):
396 + ```c
397 + windowLength = Q * sampleRate / centerFreq; // Q ≈ 17
398 + if (windowLength > fftSize) windowLength = fftSize; // TRUNCATION!
399 + ```
400 + - **Pros**: Excellent frequency resolution at high frequencies
401 + - **Cons**: Severe truncation at low frequencies causing spreading
402 +
403 + ### ESP32 Approach (Variable Window):
404 + ```c
405 + windowLength = fftSize / (centerFreq / minFreq); // scales with 1/f
406 + ```
407 + - **Pros**: All windows fit within FFT size, no truncation
408 + - **Cons**: Very low Q (≈1.86) giving ~9.56 semitone bandwidth
409 +
410 + ### Analysis of ESP32 Approach:
411 + - Constant effective Q ≈ 1.86 for all frequencies
412 + - Bandwidth: ~956 cents (9.56 semitones) - can't distinguish adjacent notes
413 + - At 20 Hz: 4096 samples (92.9ms)
414 + - At 20 kHz: 4 samples (0.1ms) - too short for analysis
415 +
416 + ## Alternative Approaches Considered:
417 +
418 + ### 1. Larger FFT Size (16K or 32K):
419 + - **16K FFT**: Handles windows down to ~45 Hz properly
420 + - **Computational cost**: ~5-10ms (still acceptable for 60 FPS)
421 + - **Memory**: ~2.4MB for 16K, ~4.9MB for 32K
422 + - **Pros**: Full constant-Q accuracy
423 + - **Cons**: Higher resource usage
424 +
425 + ### 2. Multi-Resolution CQT:
426 + - Use 16K FFT for lowest 2 octaves (20-80 Hz)
427 + - Use 4K FFT for everything else
428 + - **Pros**: Best accuracy at all frequencies
429 + - **Cons**: Complex implementation, multiple FFTs per frame
430 +
431 + ### 3. Adaptive Q Factor:
432 + - Reduce Q for low frequencies to fit within FFT size
433 + - **Pros**: Single FFT, reasonable accuracy
434 + - **Cons**: Variable frequency resolution
435 +
436 + ## SELECTED SOLUTION: Hybrid Approach
437 +
438 + Combine the best of both methods:
439 + - **Below 100 Hz**: Use ESP32-style window scaling
440 + - **Above 100 Hz**: Use constant-Q approach
441 + - **Transition**: Smooth crossfade between methods
442 +
443 + ### Implementation Plan:
444 +
445 + #### Step 1: Modify Window Length Calculation
446 + In `cqt_kernel.c`, function `generateSingleKernel()` around line 86:
447 + ```c
448 + // Current code to replace:
449 + float Q = CQT_CalculateQ(CQT_BINS_PER_OCTAVE);
450 + int windowLength = (int)(Q * sampleRate / centerFreq);
451 +
452 + // New hybrid approach:
453 + float Q = CQT_CalculateQ(CQT_BINS_PER_OCTAVE);
454 + int windowLength;
455 +
456 + if (centerFreq < 100.0f) {
457 +     // ESP32-style for low frequencies
458 +     float factor = centerFreq / minFreq;
459 +     windowLength = (int)(fftSize / factor);
460 + } else {
461 +     // Constant-Q for higher frequencies
462 +     windowLength = (int)(Q * sampleRate / centerFreq);
463 +
464 +     // Ensure it fits in FFT size with some margin
465 +     if (windowLength > fftSize * 0.9) {
466 +         windowLength = (int)(fftSize * 0.9);
467 +     }
468 + }
469 + ```
470 +
471 + #### Step 2: Adjust Normalization
472 + The normalization may need adjustment based on actual window length used.
473 +
474 + #### Step 3: Test and Validate
475 + - Test with pure tones at 20, 50, 100, 200, 440, 1000 Hz
476 + - Verify smooth transition at 100 Hz boundary
477 + - Check musical accuracy across spectrum
478 +
479 + ### Expected Results:
480 + - **20 Hz**: ~1-2 bin spread (using ESP32 method)
481 + - **100 Hz**: ~1-2 bin spread (transition point)
482 + - **440 Hz**: <1 bin spread (constant-Q)
483 + - **Electronic music**: Good sub-bass and treble accuracy
484 +
485 + ### Future Improvements:
486 + 1. Smooth transition zone (80-120 Hz) instead of hard cutoff
487 + 2. Configurable transition frequency
488 + 3. Optional multi-resolution mode for maximum accuracy
489 +
490 + ## Phase 3: Next Steps
491 + 1. Implement hybrid windowing approach (immediate priority)
492 + 2. Add remaining API functions: `cqts()`, `cqto()`, `cqtos()`
493 + 3. Create separate audio buffer for CQT (restore FFT_SIZE to 1024)
494 + 4. Create FFT vs CQT comparison demo
495 + 5. Performance optimization if needed
379 496
380 497 ### Test Script Example
381 498 ```lua
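The truncation figures quoted in the CLAUDE.md notes can be reproduced with a short script. This is a minimal sketch, assuming a 44.1 kHz sample rate (the `fs` value used by the analysis scripts in this commit) and Q = 17; `ideal_window`, `effective_q`, and `planned_hybrid_window` are illustrative names, not functions from the repository. The last function ports the Step 1 plan so the window length on either side of the 100 Hz boundary can be compared.

```python
SAMPLE_RATE = 44100   # assumed; matches fs in the analysis scripts
Q_IDEAL = 17.0        # approximate Q quoted in the notes for 12 bins/octave


def ideal_window(center_freq, q=Q_IDEAL, fs=SAMPLE_RATE):
    """Constant-Q window length in samples: Q * fs / f."""
    return int(q * fs / center_freq)


def effective_q(center_freq, fft_size, q=Q_IDEAL, fs=SAMPLE_RATE):
    """Q actually achieved once the window is clamped to fft_size."""
    window = min(ideal_window(center_freq, q, fs), fft_size)
    return window * center_freq / fs


def planned_hybrid_window(center_freq, fft_size=4096, min_freq=20.0,
                          q=Q_IDEAL, fs=SAMPLE_RATE):
    """Port of the Step 1 plan: ESP32-style below 100 Hz, constant-Q above."""
    if center_freq < 100.0:
        return int(fft_size / (center_freq / min_freq))
    window = int(q * fs / center_freq)
    return min(window, int(fft_size * 0.9))


if __name__ == "__main__":
    for f in (20, 50, 100):
        print(f, ideal_window(f), round(effective_q(f, 4096), 2))
    # Window lengths just below and at the 100 Hz boundary
    print(planned_hybrid_window(99.0), planned_hybrid_window(100.0))
```

Under these assumptions the planned window is several times longer at 100 Hz than at 99 Hz, which is the hard cutoff that the "smooth transition zone" item in Future Improvements is meant to address.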
+44
analyze_esp32_q.py
···
1 + #!/usr/bin/env python3
2 + # Analyze effective Q factor in ESP32 approach
3 +
4 + import math
5 +
6 + fs = 44100 # Sample rate
7 + N = 4096 # FFT size
8 + fmin = 20 # Minimum frequency
9 +
10 + print("ESP32 CQT Analysis")
11 + print("==================")
12 + print(f"FFT size: {N}, Sample rate: {fs} Hz, Min freq: {fmin} Hz")
13 + print()
14 +
15 + # Test frequencies
16 + freqs = [20, 27.5, 50, 100, 200, 440, 1000, 2000, 5000, 10000, 20000]
17 +
18 + print("Freq(Hz) | Window | Eff. Q | BW(Hz) | BW(semitones)")
19 + print("---------|--------|--------|--------|---------------")
20 +
21 + for f in freqs:
22 +     # ESP32 window calculation: N_window = N / (f/fmin)
23 +     factor = f / fmin
24 +     window_length = int(N / factor)
25 +
26 +     # Effective Q = window_length * f / fs
27 +     eff_q = window_length * f / fs
28 +
29 +     # Bandwidth = f / Q
30 +     bandwidth = f / eff_q if eff_q > 0 else float('inf')
31 +
32 +     # Bandwidth in semitones = 12 * log2(1 + bandwidth/f)
33 +     bw_semitones = 12 * math.log2(1 + bandwidth/f) if bandwidth < float('inf') else float('inf')
34 +
35 +     print(f"{f:7.1f} | {window_length:6d} | {eff_q:6.1f} | {bandwidth:6.1f} | {bw_semitones:13.1f}")
36 +
37 + print()
38 + print("Observations:")
39 + print("1. Q is nearly constant across frequency (about 1.8 to 1.9)")
40 + print("2. Low frequencies: Very low Q (wide bandwidth)")
41 + print("3. High frequencies: Equally low Q; truncating short windows to whole samples lowers it slightly")
42 + print()
43 + print("For 12 bins/octave, ideal constant Q ≈ 17")
44 + print("ESP32 stays far below this at every test frequency")
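The effective-Q expression used in the loop simplifies algebraically. As a worked step, ignoring the integer truncation of the window length and writing $N_w$ for the window length (a symbol introduced here only for illustration):

```latex
N_w(f) = \frac{N}{f / f_{\min}}, \qquad
Q_{\text{eff}}(f) = \frac{N_w(f)\, f}{f_s} = \frac{N f_{\min}}{f_s}
= \frac{4096 \times 20}{44100} \approx 1.86
```

Under that simplification the result does not depend on $f$. Truncating the shortest windows to whole samples lowers it slightly: 4 samples at 20 kHz gives about 1.81.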
+58
analyze_esp32_q_fixed.py
···
1 + #!/usr/bin/env python3
2 + # Analyze effective Q factor in ESP32 approach - FIXED
3 +
4 + import math
5 +
6 + fs = 44100 # Sample rate
7 + N = 4096 # FFT size
8 + fmin = 20 # Minimum frequency
9 +
10 + print("ESP32 CQT Analysis (Corrected)")
11 + print("==============================")
12 + print(f"FFT size: {N}, Sample rate: {fs} Hz, Min freq: {fmin} Hz")
13 + print()
14 +
15 + # Test frequencies spanning 10 octaves
16 + freqs = [20, 27.5, 40, 55, 80, 110, 160, 220, 440, 880, 1760, 3520, 7040, 14080, 20000]
17 +
18 + print("Freq(Hz) | Factor | Window | Duration(ms) | Eff. Q | BW(Hz) | BW(cents)")
19 + print("---------|--------|--------|--------------|--------|--------|----------")
20 +
21 + for f in freqs:
22 +     # ESP32 window calculation: N_window = N / (f/fmin)
23 +     factor = f / fmin
24 +     window_length = N / factor # Keep as float for accurate Q
25 +     window_length_int = int(window_length)
26 +
27 +     # Window duration in milliseconds
28 +     duration_ms = window_length / fs * 1000
29 +
30 +     # Effective Q = window_length * f / fs
31 +     eff_q = window_length * f / fs
32 +
33 +     # Bandwidth = f / Q
34 +     bandwidth = f / eff_q
35 +
36 +     # Bandwidth in cents (1 semitone = 100 cents)
37 +     # cents = 1200 * log2(f2/f1) where f2 = f + bandwidth/2, f1 = f - bandwidth/2
38 +     bw_cents = 1200 * math.log2((f + bandwidth/2) / (f - bandwidth/2))
39 +
40 +     print(f"{f:7.1f} | {factor:6.1f} | {window_length_int:6d} | {duration_ms:12.1f} | "
41 +           f"{eff_q:6.1f} | {bandwidth:6.1f} | {bw_cents:8.0f}")
42 +
43 + print()
44 + print("Key Findings:")
45 + print("-------------")
46 + print("1. ESP32 method gives CONSTANT Q ≈ 1.86 for all frequencies!")
47 + print("2. This is about 9x lower than ideal Q ≈ 17 for 12 bins/octave")
48 + print("3. Bandwidth is constant at ~9.6 semitones (should be ~1 semitone)")
49 + print()
50 + print("Trade-offs:")
51 + print("- PRO: All windows fit in FFT size, simple implementation")
52 + print("- PRO: No truncation artifacts")
53 + print("- CON: Poor frequency resolution (9x worse than ideal)")
54 + print("- CON: Can't distinguish adjacent notes (bandwidth > 1 semitone)")
55 + print()
56 + print("For electronic music with 20Hz start:")
57 + print(f"- Window at 20Hz: {N/1:.0f} samples = {N/1/fs*1000:.1f}ms")
58 + print(f"- Window at 20kHz: {N/1000:.0f} samples = {N/1000/fs*1000:.1f}ms")
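The bandwidth figures can be checked in isolation from the table. The sketch below applies the same cents formula as the script to two Q values: the ESP32 value `N * fmin / fs`, and the textbook constant-Q factor `1 / (2**(1/b) - 1)`. That textbook formula is an assumption about what `CQT_CalculateQ` returns, since its definition is not part of this diff; `constant_q` and `bandwidth_cents` are illustrative names.

```python
import math


def constant_q(bins_per_octave):
    """Textbook constant-Q factor, 1 / (2**(1/b) - 1); assumed, not taken from the repo."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def bandwidth_cents(q):
    """Width in cents of a band of width f/Q centred on f (same formula as the script)."""
    half = 1.0 / (2.0 * q)  # half-bandwidth as a fraction of the centre frequency
    return 1200.0 * math.log2((1.0 + half) / (1.0 - half))


if __name__ == "__main__":
    esp32_q = 4096 * 20 / 44100  # N * fmin / fs
    print("ESP32 Q:", round(esp32_q, 2), "cents:", round(bandwidth_cents(esp32_q)))
    ideal_q = constant_q(12)
    print("Ideal Q:", round(ideal_q, 2), "cents:", round(bandwidth_cents(ideal_q)))
```

With these inputs the ESP32 bandwidth comes out a little over nine and a half semitones and the ideal one close to a single semitone, which is the gap the "Key Findings" describe.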
+46
fft_compute_analysis.py
···
1 + #!/usr/bin/env python3
2 + # Analyze computational cost of different FFT sizes
3 +
4 + import math
5 +
6 + print("FFT Computational Cost Analysis")
7 + print("===============================")
8 + print()
9 +
10 + # FFT complexity is O(N log N)
11 + sizes = [4096, 8192, 16384, 32768, 65536]
12 + base_size = 4096
13 +
14 + print("FFT Size | Ops (N log N) | Relative Cost | Time @ 1GHz")
15 + print("---------|---------------|---------------|-------------")
16 +
17 + for N in sizes:
18 +     ops = N * math.log2(N)
19 +     relative = ops / (base_size * math.log2(base_size))
20 +     # Assume 10 clock cycles per complex multiply-add
21 +     # Modern CPUs can do ~1 operation per clock with SIMD
22 +     time_ms = (ops * 10) / (1e9) * 1000 # milliseconds at 1GHz
23 +
24 +     print(f"{N:7d} | {ops:13.0f} | {relative:13.1f}x | {time_ms:10.2f}ms")
25 +
26 + print()
27 + print("Real-world estimates (with KISS FFT, no SIMD):")
28 + print("- 4096-point: ~1-2ms on modern CPU")
29 + print("- 16384-point: ~5-10ms")
30 + print("- 32768-point: ~12-24ms")
31 + print()
32 + print("For 60 FPS: 16.7ms per frame total")
33 + print("For 30 FPS: 33.3ms per frame total")
34 + print()
35 +
36 + # Memory usage
37 + print("Memory Requirements:")
38 + print("Size | Samples | Real FFT Output | Kernels (120 bins)")
39 + print("------|---------|-----------------|-------------------")
40 + for N in sizes:
41 +     samples = N * 4 # float32
42 +     fft_out = (N//2 + 1) * 8 # complex float32
43 +     # Assume 30% sparsity for kernels
44 +     kernels = 120 * (N//2 + 1) * 8 * 0.3
45 +     total = (samples + fft_out + kernels) / 1024
46 +     print(f"{N:5d} | {samples/1024:7.0f}K | {fft_out/1024:15.0f}K | {kernels/1024:17.0f}K (Total: {total:.0f}K)")
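The size list in this script does not include 6144, the value `CQT_FFT_SIZE` is set to elsewhere in this commit. A minimal sketch that applies the script's own cost and memory model to that size follows; `relative_cost` and `total_memory_kib` are illustrative names, and the N·log2(N) estimate is only a rough guide for a size that is not a power of two.

```python
import math

BASE_SIZE = 4096


def relative_cost(n, base=BASE_SIZE):
    """N*log2(N) cost of an n-point FFT relative to the base size."""
    return (n * math.log2(n)) / (base * math.log2(base))


def total_memory_kib(n, bins=120, kept_fraction=0.3):
    """Same memory model as the script: samples + real FFT output + sparse kernels."""
    samples = n * 4                                     # float32 input buffer
    fft_out = (n // 2 + 1) * 8                          # complex float32 output
    kernels = bins * (n // 2 + 1) * 8 * kept_fraction   # 30% of kernel entries kept
    return (samples + fft_out + kernels) / 1024


if __name__ == "__main__":
    for n in (4096, 6144, 16384):
        print(n, round(relative_cost(n), 2), round(total_memory_kib(n)))
```

By this model a 6144-point transform costs roughly one and a half times the 4096-point one, well short of the 16K option's cost.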
+1 -1
src/cqtdata.h
···
4 4 #define CQT_BINS 120
5 5 #define CQT_OCTAVES 10
6 6 #define CQT_BINS_PER_OCTAVE 12
7 - #define CQT_FFT_SIZE 4096 // Larger FFT for better sub-bass resolution
7 + #define CQT_FFT_SIZE 6144 // 6K FFT - balance between quality and performance
8 8
9 9 // CQT frequency range
10 10 #define CQT_MIN_FREQ 20.0f // Sub-bass for electronic music
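Unlike 4096, the new size is not a power of two, so it only suits an FFT routine that handles other radices. Whether the project's FFT build accepts it is not shown in this diff. A small sketch to inspect the factorisation, with `prime_factors` as an illustrative helper:

```python
def prime_factors(n):
    """Return the prime factors of n in ascending order, with repetition."""
    factors = []
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors


if __name__ == "__main__":
    for size in (4096, 6144):
        print(size, prime_factors(size))
```

6144 factors into eleven 2s and a single 3, so a mixed-radix transform needs only radix-2 (or radix-4) and radix-3 stages to cover it.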
+25 -5
src/ext/cqt_kernel.c
···
79 79     CqtWindowType windowType,
80 80     float sparsityThreshold)
81 81 {
82 -     // Calculate window length based on Q factor
83 -     // windowLength = Q * sampleRate / centerFreq
84 -     // This gives us the proper frequency resolution
82 +     // Hybrid approach: ESP32-style for low frequencies, constant-Q for higher
85 83     float Q = CQT_CalculateQ(CQT_BINS_PER_OCTAVE);
86 -     int windowLength = (int)(Q * sampleRate / centerFreq);
84 +     int windowLength;
85 +
86 +     if (centerFreq < 100.0f) {
87 +         // With 6K FFT, we can use higher Q for better resolution
88 +         // 20Hz: Q=2.8 gives ~6144 samples (full FFT)
89 +         // 50Hz: Q=7 gives ~6174 samples (slightly truncated)
90 +         // 100Hz: Q=14 gives ~6174 samples (slightly truncated)
91 +         float targetQ = Q; // Start with ideal Q
92 +         windowLength = (int)(targetQ * sampleRate / centerFreq);
93 +
94 +         // If window doesn't fit, reduce Q to fit exactly
95 +         if (windowLength > fftSize) {
96 +             targetQ = (float)fftSize * centerFreq / sampleRate;
97 +             windowLength = fftSize;
98 +         }
99 +     } else {
100 +         // Constant-Q for higher frequencies
101 +         windowLength = (int)(Q * sampleRate / centerFreq);
102 +
103 +         // Ensure it fits in FFT size with some margin
104 +         if (windowLength > fftSize * 0.9) {
105 +             windowLength = (int)(fftSize * 0.9);
106 +         }
107 +     }
87 108
88 109     // Ensure window length is reasonable
89 110     if (windowLength < 32) windowLength = 32; // Minimum window size
90 -     if (windowLength > fftSize) windowLength = fftSize;
91 111
92 112     // Allocate temporary arrays
93 113     float* timeKernel = (float*)calloc(fftSize, sizeof(float));
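To see what window lengths this hunk produces, the branch logic can be ported to a few lines of Python. This is a sketch under stated assumptions: a 44.1 kHz sample rate, `fftSize` equal to the new `CQT_FFT_SIZE` of 6144, and `calculate_q` as a hypothetical stand-in for `CQT_CalculateQ` using the textbook formula `1 / (2**(1/b) - 1)`, whose real definition is outside this diff.

```python
SAMPLE_RATE = 44100.0  # assumed
FFT_SIZE = 6144        # CQT_FFT_SIZE after this commit


def calculate_q(bins_per_octave=12):
    """Hypothetical stand-in for CQT_CalculateQ: 1 / (2**(1/b) - 1)."""
    return 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)


def window_length(center_freq, fft_size=FFT_SIZE, fs=SAMPLE_RATE):
    """Mirror of the new branch logic; int() truncates like the C cast."""
    window = int(calculate_q() * fs / center_freq)
    if center_freq < 100.0:
        # Low branch: clamp to the full FFT size
        if window > fft_size:
            window = fft_size
    else:
        # High branch: clamp to 90% of the FFT size
        if window > fft_size * 0.9:
            window = int(fft_size * 0.9)
    return max(window, 32)  # minimum window size


if __name__ == "__main__":
    for f in (20.0, 50.0, 99.0, 100.0, 200.0, 440.0):
        print(f, window_length(f))
```

Under these assumptions every bin below 100 Hz is clamped to the full 6144 samples, while the bin at 100 Hz drops to the 90% margin, so the window shortens by about 10% across the boundary before the constant-Q length takes over at higher frequencies.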
+2 -2
src/fftdata.h
···
1 1 #pragma once
2 2 #include <stdbool.h>
3 - // TEMPORARY: Changed from 1024 to 2048 to support CQT's 4096-point FFT
3 + // TEMPORARY: Changed from 1024 to 3072 to support CQT's 6144-point FFT
4 4 // This breaks FFT bin resolution but enables CQT to work properly
5 5 // TODO: Restore to 1024 and implement separate buffer for CQT
6 - #define FFT_SIZE 2048
6 + #define FFT_SIZE 3072
7 7 extern float fPeakMinValue;
8 8 extern float fPeakSmoothing;
9 9 extern float fPeakSmoothValue;