···

 ## Build Commands

+**IMPORTANT: Never run `./debug-macos.sh` or any build commands unless explicitly instructed by the user. The user will handle building and testing.**
+
 ### macOS
1214```bash
1315# Recommended: Use the debug script
···
 - 20 Hz start preserved for electronic music sub-bass
 - Created test_cqt_spectrum_v2.lua with correct note display

-#### Phase 3: Next Steps
-1. Add remaining API functions: `cqts()`, `cqto()`, `cqtos()`
-2. Create separate audio buffer for CQT (restore FFT_SIZE to 1024)
-3. Create FFT vs CQT comparison demo
-4. Performance optimization if needed
+**NEW ISSUE - Excessive Frequency Spreading**:
+Testing with pure sine waves shows excessive frequency spreading:
+- 20 Hz sine wave: Energy spreads across ~6-7 bins
+- 50 Hz sine wave: Energy spreads across ~8-9 bins
+- 100 Hz sine wave: Energy spreads across ~4-5 bins
+
+## Root Cause Analysis:
+The issue is **window truncation** in our current implementation:
+- At 20 Hz: Window should be 37,485 samples but is clamped to 4,096 (line 90 in cqt_kernel.c)
+- At 50 Hz: Window should be 14,994 samples but is clamped to 4,096
+- At 100 Hz: Window should be 7,497 samples but is clamped to 4,096
+
+This truncation reduces the effective Q factor:
+- At 20 Hz: Effective Q ≈ 1.86 instead of 17
+- At 50 Hz: Effective Q ≈ 4.64 instead of 17
+- At 100 Hz: Effective Q ≈ 9.29 instead of 17
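The truncation figures above follow directly from the window-length formula. A minimal sketch that reproduces them, assuming Q = 17, a 44.1 kHz sample rate, and the 4096-sample clamp quoted in this section (the function name `effective_q` is illustrative and not part of the codebase):

```python
# Sketch: effective Q when a constant-Q window is clamped to the FFT size.
# Assumed values (taken from the analysis above): Q = 17, fs = 44100, FFT = 4096.

def effective_q(center_freq, q=17.0, sample_rate=44100, fft_size=4096):
    """Return (ideal_window, used_window, effective_q) for one CQT bin."""
    ideal_window = int(q * sample_rate / center_freq)
    used_window = min(ideal_window, fft_size)  # the truncation step
    # Q actually achieved by the (possibly shortened) window
    return ideal_window, used_window, used_window * center_freq / sample_rate


for freq in (20, 50, 100):
    ideal, used, q_eff = effective_q(freq)
    print(f"{freq:4d} Hz: ideal {ideal:6d}, used {used:5d}, effective Q {q_eff:.2f}")
```

Passing a different `fft_size` shows how much of the ideal Q a larger FFT would recover at each frequency.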
+
+## Comparison: Our Approach vs ESP32 Approach
+
+### Our Current Approach (Constant Q):
+```c
+windowLength = Q * sampleRate / centerFreq;  // Q ≈ 17
+if (windowLength > fftSize) windowLength = fftSize;  // TRUNCATION!
+```
+- **Pros**: Excellent frequency resolution at high frequencies
+- **Cons**: Severe truncation at low frequencies causing spreading
+
+### ESP32 Approach (Variable Window):
+```c
+windowLength = fftSize / (centerFreq / minFreq);  // scales with 1/f
+```
+- **Pros**: All windows fit within FFT size, no truncation
+- **Cons**: Very low Q (≈1.86) giving ~9.56 semitone bandwidth
+
+### Analysis of ESP32 Approach:
+- Constant effective Q ≈ 1.86 for all frequencies
+- Bandwidth: ~956 cents (9.56 semitones) - can't distinguish adjacent notes
+- At 20 Hz: 4096 samples (92.9ms)
+- At 20 kHz: 4 samples (0.1ms) - too short for analysis
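Why the ESP32 scaling yields the same Q at every frequency can be seen by substituting its window length into the definition of effective Q. The symbols are shorthand introduced here for illustration: $N$ is the FFT size, $f_{\min}$ the lowest bin frequency, $f_s$ the sample rate, $N_w$ the window length, and $B = f / Q_{\text{eff}}$ the bandwidth. The cents formula is the symmetric one used in `analyze_esp32_q_fixed.py`.

```latex
\begin{aligned}
N_w(f) &= \frac{N}{f / f_{\min}} = \frac{N f_{\min}}{f} \\[4pt]
Q_{\text{eff}} &= \frac{N_w(f)\, f}{f_s} = \frac{N f_{\min}}{f_s}
  = \frac{4096 \times 20}{44100} \approx 1.86 \\[4pt]
\text{BW}_{\text{cents}} &= 1200 \log_2 \frac{f + B/2}{f - B/2}
  = 1200 \log_2 \frac{1 + \frac{1}{2 Q_{\text{eff}}}}{1 - \frac{1}{2 Q_{\text{eff}}}}
  \approx 956
\end{aligned}
```

The center frequency $f$ cancels in the second line, so only a larger FFT or a higher minimum frequency would raise Q under this scheme.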
+
+## Alternative Approaches Considered:
+
+### 1. Larger FFT Size (16K or 32K):
+- **16K FFT**: Handles windows down to ~45 Hz properly
+- **Computational cost**: ~5-10ms (still acceptable for 60 FPS)
+- **Memory**: ~2.4MB for 16K, ~4.9MB for 32K
+- **Pros**: Full constant-Q accuracy
+- **Cons**: Higher resource usage
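The "~45 Hz" figure for a 16K FFT can be checked by solving the window-length formula for the frequency at which the ideal window exactly fills the FFT. This is a rough sketch: it assumes a 44.1 kHz sample rate and the textbook value Q = 1 / (2^(1/12) - 1) ≈ 16.8 for 12 bins per octave, which may differ slightly from what `CQT_CalculateQ()` returns. The helper name is hypothetical.

```python
# Sketch: lowest center frequency whose full constant-Q window fits in an FFT.
# Assumptions: fs = 44100 Hz; Q for 12 bins/octave taken as 1 / (2**(1/12) - 1).

def min_untruncated_freq(fft_size, sample_rate=44100, bins_per_octave=12):
    """Frequency at which the ideal window length equals fft_size."""
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)  # ~16.8, i.e. "Q ≈ 17"
    return q * sample_rate / fft_size


for size in (4096, 6144, 16384, 32768):
    cutoff = min_untruncated_freq(size)
    print(f"{size:6d}-point FFT: untruncated down to {cutoff:6.1f} Hz")
```

Under these assumptions even the 32K size stops a little above 20 Hz, so the lowest bins would still be mildly truncated.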
+
+### 2. Multi-Resolution CQT:
+- Use 16K FFT for lowest 2 octaves (20-80 Hz)
+- Use 4K FFT for everything else
+- **Pros**: Best accuracy at all frequencies
+- **Cons**: Complex implementation, multiple FFTs per frame
+
+### 3. Adaptive Q Factor:
+- Reduce Q for low frequencies to fit within FFT size
+- **Pros**: Single FFT, reasonable accuracy
+- **Cons**: Variable frequency resolution
+
+## SELECTED SOLUTION: Hybrid Approach
+
+Combine the best of both methods:
+- **Below 100 Hz**: Use ESP32-style window scaling
+- **100 Hz and above**: Use constant-Q approach
+- **Transition**: Hard cutoff at 100 Hz initially; a smooth crossfade is listed under Future Improvements
+
+### Implementation Plan:
+
+#### Step 1: Modify Window Length Calculation
+In `cqt_kernel.c`, function `generateSingleKernel()` around line 86:
+```c
+// Current code to replace:
+float Q = CQT_CalculateQ(CQT_BINS_PER_OCTAVE);
+int windowLength = (int)(Q * sampleRate / centerFreq);
+
+// New hybrid approach:
+float Q = CQT_CalculateQ(CQT_BINS_PER_OCTAVE);
+int windowLength;
+
+if (centerFreq < 100.0f) {
+    // ESP32-style for low frequencies
+    float factor = centerFreq / minFreq;
+    windowLength = (int)(fftSize / factor);
+} else {
+    // Constant-Q for higher frequencies
+    windowLength = (int)(Q * sampleRate / centerFreq);
+
+    // Ensure it fits in FFT size with some margin
+    if (windowLength > fftSize * 0.9) {
+        windowLength = (int)(fftSize * 0.9);
+    }
+}
+```
+
+#### Step 2: Adjust Normalization
+The normalization may need adjustment based on actual window length used.
+
+#### Step 3: Test and Validate
+- Test with pure tones at 20, 50, 100, 200, 440, 1000 Hz
+- Verify smooth transition at 100 Hz boundary
+- Check musical accuracy across spectrum
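Before testing with audio, the boundary check in Step 3 can be previewed numerically. The sketch below is a Python port of the Step 1 snippet, not the shipped C code, and assumes Q = 17, a 44.1 kHz sample rate, a 4096-point FFT, and a 20 Hz minimum frequency. `hybrid_window` and `effective_q` are illustrative names.

```python
# Sketch: window length and effective Q of the planned hybrid rule (Step 1),
# evaluated on both sides of the 100 Hz switch.
# Assumptions: Q = 17, fs = 44100, FFT = 4096, min_freq = 20 (values from this plan).

def hybrid_window(center_freq, q=17.0, sample_rate=44100, fft_size=4096,
                  min_freq=20.0, switch_freq=100.0):
    """Python port of the Step 1 snippet; returns the window length in samples."""
    if center_freq < switch_freq:
        # ESP32-style: window shrinks as 1/f, starting from the full FFT at min_freq
        return int(fft_size / (center_freq / min_freq))
    # Constant-Q, limited to 90% of the FFT size
    return min(int(q * sample_rate / center_freq), int(fft_size * 0.9))


def effective_q(center_freq, window, sample_rate=44100):
    """Q actually achieved by a window of the given length."""
    return window * center_freq / sample_rate


for freq in (20.0, 50.0, 99.0, 100.0, 200.0, 440.0, 1000.0):
    w = hybrid_window(freq)
    print(f"{freq:7.1f} Hz: window {w:5d}, effective Q {effective_q(freq, w):5.2f}")
```

Under these assumptions the window jumps from 827 samples at 99 Hz to 3686 samples at 100 Hz, which is the kind of discontinuity the boundary test is meant to catch.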
+
+### Expected Results:
+- **20 Hz**: ~1-2 bin spread (using ESP32 method)
+- **100 Hz**: ~1-2 bin spread (transition point)
+- **440 Hz**: <1 bin spread (constant-Q)
+- **Electronic music**: Good sub-bass and treble accuracy
+
+### Future Improvements:
+1. Smooth transition zone (80-120 Hz) instead of hard cutoff
+2. Configurable transition frequency
+3. Optional multi-resolution mode for maximum accuracy
+
+## Phase 3: Next Steps
+1. Implement hybrid windowing approach (immediate priority)
+2. Add remaining API functions: `cqts()`, `cqto()`, `cqtos()`
+3. Create separate audio buffer for CQT (restore FFT_SIZE to 1024)
+4. Create FFT vs CQT comparison demo
+5. Performance optimization if needed

 ### Test Script Example
381498```lua
+44
analyze_esp32_q.py
···
+#!/usr/bin/env python3
+# Analyze effective Q factor in ESP32 approach
+
+import math
+
+fs = 44100  # Sample rate
+N = 4096  # FFT size
+fmin = 20  # Minimum frequency
+
+print("ESP32 CQT Analysis")
+print("==================")
+print(f"FFT size: {N}, Sample rate: {fs} Hz, Min freq: {fmin} Hz")
+print()
+
+# Test frequencies
+freqs = [20, 27.5, 50, 100, 200, 440, 1000, 2000, 5000, 10000, 20000]
+
+print("Freq(Hz) | Window | Eff. Q | BW(Hz) | BW(semitones)")
+print("---------|--------|--------|--------|---------------")
+
+for f in freqs:
+    # ESP32 window calculation: N_window = N / (f/fmin)
+    factor = f / fmin
+    window_length = int(N / factor)
+
+    # Effective Q = window_length * f / fs
+    eff_q = window_length * f / fs
+
+    # Bandwidth = f / Q
+    bandwidth = f / eff_q if eff_q > 0 else float('inf')
+
+    # Bandwidth in semitones = 12 * log2(1 + bandwidth/f)
+    bw_semitones = 12 * math.log2(1 + bandwidth/f) if bandwidth < float('inf') else float('inf')
+
+    print(f"{f:7.1f} | {window_length:6d} | {eff_q:6.1f} | {bandwidth:6.1f} | {bw_semitones:13.1f}")
+
+print()
+print("Observations:")
+print("1. Q is nearly constant across frequency (about 1.8 to 1.9)")
+print("2. Integer truncation of the window length causes the small variation")
+print("3. Bandwidth stays around 7.5 semitones at every frequency")
+print()
+print("For 12 bins/octave, ideal constant Q ≈ 17")
+print("ESP32 approach never reaches this with a 4096-point FFT and 20 Hz minimum")
+58
analyze_esp32_q_fixed.py
···
+#!/usr/bin/env python3
+# Analyze effective Q factor in ESP32 approach - FIXED
+
+import math
+
+fs = 44100  # Sample rate
+N = 4096  # FFT size
+fmin = 20  # Minimum frequency
+
+print("ESP32 CQT Analysis (Corrected)")
+print("==============================")
+print(f"FFT size: {N}, Sample rate: {fs} Hz, Min freq: {fmin} Hz")
+print()
+
+# Test frequencies spanning 10 octaves
+freqs = [20, 27.5, 40, 55, 80, 110, 160, 220, 440, 880, 1760, 3520, 7040, 14080, 20000]
+
+print("Freq(Hz) | Factor | Window | Duration(ms) | Eff. Q | BW(Hz) | BW(cents)")
+print("---------|--------|--------|--------------|--------|--------|----------")
+
+for f in freqs:
+    # ESP32 window calculation: N_window = N / (f/fmin)
+    factor = f / fmin
+    window_length = N / factor  # Keep as float for accurate Q
+    window_length_int = int(window_length)
+
+    # Window duration in milliseconds
+    duration_ms = window_length / fs * 1000
+
+    # Effective Q = window_length * f / fs
+    eff_q = window_length * f / fs
+
+    # Bandwidth = f / Q
+    bandwidth = f / eff_q
+
+    # Bandwidth in cents (1 semitone = 100 cents)
+    # cents = 1200 * log2(f2/f1) where f2 = f + bandwidth/2, f1 = f - bandwidth/2
+    bw_cents = 1200 * math.log2((f + bandwidth/2) / (f - bandwidth/2))
+
+    print(f"{f:7.1f} | {factor:6.1f} | {window_length_int:6d} | {duration_ms:12.1f} | "
+          f"{eff_q:6.1f} | {bandwidth:6.1f} | {bw_cents:8.0f}")
+
+print()
+print("Key Findings:")
+print("-------------")
+print("1. ESP32 method gives CONSTANT Q ≈ 1.86 for all frequencies!")
+print("2. This is about 9x lower than ideal Q ≈ 17 for 12 bins/octave")
+print("3. Bandwidth is constant at ~9.6 semitones (should be ~1 semitone)")
+print()
+print("Trade-offs:")
+print("- PRO: All windows fit in FFT size, simple implementation")
+print("- PRO: No truncation artifacts")
+print("- CON: Poor frequency resolution (9x worse than ideal)")
+print("- CON: Can't distinguish adjacent notes (bandwidth > 1 semitone)")
+print()
+print("For electronic music with 20Hz start:")
+print(f"- Window at 20Hz: {N/1:.0f} samples = {N/1/fs*1000:.1f}ms")
+print(f"- Window at 20kHz: {N/1000:.0f} samples = {N/1000/fs*1000:.1f}ms")
+46
fft_compute_analysis.py
···
+#!/usr/bin/env python3
+# Analyze computational cost of different FFT sizes
+
+import math
+
+print("FFT Computational Cost Analysis")
+print("===============================")
+print()
+
+# FFT complexity is O(N log N)
+sizes = [4096, 8192, 16384, 32768, 65536]
+base_size = 4096
+
+print("FFT Size | Ops (N log N) | Relative Cost | Time @ 1GHz")
+print("---------|---------------|---------------|-------------")
+
+for N in sizes:
+    ops = N * math.log2(N)
+    relative = ops / (base_size * math.log2(base_size))
+    # Assume 10 clock cycles per complex multiply-add
+    # Modern CPUs can do ~1 operation per clock with SIMD
+    time_ms = (ops * 10) / (1e9) * 1000  # milliseconds at 1GHz
+
+    print(f"{N:7d} | {ops:13.0f} | {relative:13.1f}x | {time_ms:10.2f}ms")
+
+print()
+print("Real-world estimates (with KISS FFT, no SIMD):")
+print("- 4096-point: ~1-2ms on modern CPU")
+print("- 16384-point: ~5-10ms")
+print("- 32768-point: ~12-24ms")
+print()
+print("For 60 FPS: 16.7ms per frame total")
+print("For 30 FPS: 33.3ms per frame total")
+print()
+
+# Memory usage
+print("Memory Requirements:")
+print("Size | Samples | Real FFT Output | Kernels (120 bins)")
+print("------|---------|-----------------|-------------------")
+for N in sizes:
+    samples = N * 4  # float32
+    fft_out = (N//2 + 1) * 8  # complex float32
+    # Assume 30% sparsity for kernels
+    kernels = 120 * (N//2 + 1) * 8 * 0.3
+    total = (samples + fft_out + kernels) / 1024
+    print(f"{N:5d} | {samples/1024:7.0f}K | {fft_out/1024:15.0f}K | {kernels/1024:17.0f}K (Total: {total:.0f}K)")
+1-1
src/cqtdata.h
···
 #define CQT_BINS 120
 #define CQT_OCTAVES 10
 #define CQT_BINS_PER_OCTAVE 12
-#define CQT_FFT_SIZE 4096  // Larger FFT for better sub-bass resolution
+#define CQT_FFT_SIZE 6144  // 6K FFT - balance between quality and performance

 // CQT frequency range
 #define CQT_MIN_FREQ 20.0f  // Sub-bass for electronic music
+25-5
src/ext/cqt_kernel.c
···
     CqtWindowType windowType,
     float sparsityThreshold)
 {
-    // Calculate window length based on Q factor
-    // windowLength = Q * sampleRate / centerFreq
-    // This gives us the proper frequency resolution
+    // Hybrid approach: Q reduced to fit the FFT below 100 Hz, constant-Q with margin above
     float Q = CQT_CalculateQ(CQT_BINS_PER_OCTAVE);
-    int windowLength = (int)(Q * sampleRate / centerFreq);
+    int windowLength;
+
+    if (centerFreq < 100.0f) {
+        // With 6K FFT, the clamp below sets the effective Q in this range:
+        // 20Hz: Q≈2.8 (6144 samples, full FFT)
+        // 50Hz: Q≈7.0 (ideal window of 14,994 samples clamped to 6144)
+        // just under 100Hz: Q≈13.9 (ideal window of ~7,500 samples clamped to 6144)
+        float targetQ = Q;  // Start with ideal Q
+        windowLength = (int)(targetQ * sampleRate / centerFreq);
+
+        // If window doesn't fit, reduce Q to fit exactly
+        if (windowLength > fftSize) {
+            targetQ = (float)fftSize * centerFreq / sampleRate;
+            windowLength = fftSize;
+        }
+    } else {
+        // Constant-Q for higher frequencies
+        windowLength = (int)(Q * sampleRate / centerFreq);
+
+        // Ensure it fits in FFT size with some margin
+        if (windowLength > fftSize * 0.9) {
+            windowLength = (int)(fftSize * 0.9);
+        }
+    }

     // Ensure window length is reasonable
     if (windowLength < 32) windowLength = 32;  // Minimum window size
-    if (windowLength > fftSize) windowLength = fftSize;

     // Allocate temporary arrays
     float* timeKernel = (float*)calloc(fftSize, sizeof(float));
+2-2
src/fftdata.h
···
 #pragma once
 #include <stdbool.h>
-// TEMPORARY: Changed from 1024 to 2048 to support CQT's 4096-point FFT
+// TEMPORARY: Changed from 1024 to 3072 to support CQT's 6144-point FFT
 // This breaks FFT bin resolution but enables CQT to work properly
 // TODO: Restore to 1024 and implement separate buffer for CQT
-#define FFT_SIZE 2048
+#define FFT_SIZE 3072
 extern float fPeakMinValue;
 extern float fPeakSmoothing;
 extern float fPeakSmoothValue;