CLAUDE.md cleanup, HPS plan · alice.mosphere.at/TIC-80@a833576

+258 -1062

1 changed file


CLAUDE.md


···

  Performance benchmarks are in demos/bunny/ for comparing language implementations.

- ## FFT Implementation
+ ## Audio Processing Features

- TIC-80 includes FFT support for audio visualization and livecoding:
+ ### FFT Implementation
+
+ TIC-80 includes FFT support for audio visualization and livecoding.

- ### FFT Architecture
+ #### Architecture
  - **src/ext/fft.c**: Audio capture and FFT processing using KISS FFT
  - **src/fftdata.h/c**: FFT data storage (1024 bins, smoothing, normalization)
- - **Lua API**: `fft(freq)` and `ffts(freq)` functions for raw/smoothed data
+ - **Processing**: In `tic_core_tick` before script execution

- ### Key Details
- - Uses miniaudio for audio capture (mic or loopback on Windows)
- - 44100 Hz sample rate, 2048 sample buffer
- - 1024 frequency bins (0-22050 Hz, ~21.5 Hz per bin)
+ #### Specifications
+ - Sample rate: 44100 Hz
+ - Window size: 2048 samples (46ms)
+ - Frequency bins: 1024 (0-22050 Hz, ~21.5 Hz per bin)
+ - Update rate: ~21 fps
+ - Smoothing factor: 0.6
  - Auto-gain control with peak detection
- - Smoothing factor: 0.6 for visual stability
- - Processing in `tic_core_tick` before script execution

- ### Audio Flow
- 1. Audio capture via miniaudio callbacks
- 2. Stereo to mono conversion
- 3. KISS FFT real-to-complex transform
- 4. Magnitude calculation and normalization
- 5. Smoothing and peak tracking
- 6. Data available to scripts via API
-
- ## CQT Implementation Plan
-
- ### Overview
- Add Constant-Q Transform (CQT) support alongside existing FFT for better musical frequency analysis with logarithmic frequency spacing and constant Q factor across octaves.
-
- ### Design Specifications
-
- #### Parameters
- - **Octaves**: 10 (20 Hz - 20 kHz full audio spectrum)
- - **Bins per octave**: 12 (chromatic scale)
- - **Total bins**: 120
- - **Q factor**: ~17 (typical for 12 bins/octave)
- - **Minimum frequency**: 20 Hz (below piano's lowest A)
- - **Maximum frequency**: 20480 Hz (nearest note to 20 kHz)
-
- #### API Design
+ #### API Functions
  ```lua
- -- Direct bin access (0-119)
- value = cqt(bin) -- Get raw CQT magnitude for bin
- value = cqts(bin) -- Get smoothed CQT magnitude for bin
-
- -- Musical note access (octave 0-9, note 0-11)
- value = cqto(octave, note) -- Get raw CQT for specific note
- value = cqtos(octave, note) -- Get smoothed CQT for specific note
-
- -- Note mapping: C=0, C#=1, D=2, D#=3, E=4, F=5, F#=6, G=7, G#=8, A=9, A#=10, B=11
- ```
-
- ### Implementation Architecture
-
- #### File Structure
- ```
- src/ext/
- ├── fft.c (existing FFT implementation)
- ├── cqt.c (new CQT implementation)
- └── cqt_kernel.c (CQT kernel generation from ESP32 reference)
-
- src/
- ├── fftdata.h/c (existing FFT data storage)
- ├── cqtdata.h/c (new CQT data storage)
-
- src/api/
- └── luaapi.c (add cqt(), cqts(), cqto(), cqtos() functions)
- ```
-
- #### Data Structures (cqtdata.h)
- ```c
- #define CQT_BINS 120
- #define CQT_OCTAVES 10
- #define CQT_BINS_PER_OCTAVE 12
-
- typedef struct {
-     float cqtData[CQT_BINS]; // Raw CQT magnitudes
-     float cqtSmoothingData[CQT_BINS]; // Smoothed CQT data
-     float cqtNormalizedData[CQT_BINS]; // Normalized (0-1) data
-     bool cqtEnabled; // Enable flag (tied to fftEnabled initially)
-
-     // Sparse kernel storage from ESP32 approach
-     float* kernelReal[CQT_BINS]; // Real parts of kernels
-     float* kernelImag[CQT_BINS]; // Imaginary parts
-     int* kernelIndices[CQT_BINS]; // Non-zero indices
-     int kernelLengths[CQT_BINS]; // Number of non-zero elements
- } CqtData;
- ```
-
- ### Updated Design (Electronic Music Focus)
-
- #### FFT Size Change
- - **Use 4096-point FFT** instead of 2048
- - Bin spacing: 44100 / 4096 = 10.76 Hz (better for sub-bass)
- - Critical for electronic music's 20-60 Hz range
- - Extra ~0.5ms overhead acceptable
-
- ### Processing Pipeline
-
- 1. **Initialization** (startup):
-    - Pre-compute CQT kernels using Hamming window
-    - Apply sparsity threshold (0.01) to reduce memory
-    - Store only non-zero kernel values
-
- 2. **Runtime Processing** (per frame):
-    - Share audio buffer from existing capture
-    - Perform separate 4096-point FFT for CQT
-    - Apply sparse kernels to FFT output
-    - Calculate magnitudes and normalize
-    - Apply smoothing (factor 0.7 for stability)
-
- 3. **Memory Optimization**:
-    - Sparse kernels: ~30-40% of full matrix size
-    - Estimated memory: ~200KB for kernels + 2KB for data
-    - Acceptable for fantasy computer constraints
-
- ### Performance Considerations
-
- #### Computational Cost
- - One 4096-point FFT: ~1ms on modern CPU
- - Sparse kernel application: ~0.5-1ms
- - Total overhead: 1.5-2ms per frame (acceptable)
-
- #### Optimization Strategies
- 1. Use SIMD where available (via compiler)
- 2. Process every other frame if needed
- 3. Cache kernel computations at startup
- 4. Use integer math for index calculations
-
- ### Integration Points
-
- 1. **Core Integration** (tic80.c):
-    - Process CQT in `tic_core_tick` after FFT
-    - Share `fftEnabled` flag initially
-    - Add separate `cqtEnabled` flag later
-
- 2. **Build System** (CMakeLists.txt):
-    - Add cqt.c and cqt_kernel.c to source list
-    - No new dependencies (uses existing KISS FFT)
-
- 3. **Testing Strategy**:
-    - Create demo showing CQT vs FFT comparison
-    - Verify musical note accuracy with test tones
-    - Benchmark on various platforms
-
- ### Rapid Implementation Plan (Test Early)
-
- #### Phase 1: Minimal CQT (Test in 1-2 hours)
- 1. **Create cqtdata.h/c** with basic data arrays
- 2. **Copy/adapt ESP32 kernel generation** to cqt_kernel.c
- 3. **Create minimal cqt.c** that:
-    - Generates kernels on init
-    - Reuses existing audio buffer
-    - Performs 4096-point FFT
-    - Applies kernels to get CQT
- 4. **Add single test function** in luaapi.c: `cqt(bin)`
- 5. **Hook into tic_core_tick** after FFT processing
- 6. **Test with simple Lua visualization**
-
- #### Phase 1 Status: COMPLETE ✓
- - [x] cqtdata.h - Basic structure and constants
- - [x] cqtdata.c - Global data instance
- - [x] cqt_kernel.h - Kernel generation header
- - [x] cqt_kernel.c - Adapt ESP32 kernel generation
- - [x] cqt.h - Main CQT header
- - [x] cqt.c - Basic CQT processing (with placeholder FFT mapping)
- - [x] luaapi.c - Add cqt() function
- - [x] tic80.c - Hook into core tick
- - [x] CMakeLists.txt - Add new files to build
-
- **Test Results**: Basic API working, visualization shows 10 octaves with test data mapping
-
- #### Phase 2: Real CQT Processing (COMPLETE)
- 1. ✓ **Access raw audio buffer** from FFT capture system
- 2. ✓ **Implement 4096-point FFT** for CQT (separate from main FFT)
- 3. ✓ **Apply CQT kernels** to FFT output
- 4. ✓ **Fix kernel initialization** - kernels are properly generated on startup
- 5. ✓ **Debug and fix frequency mapping** - CQT now correctly detects frequencies
- 6. **Add remaining API functions**: `cqts()`, `cqto()`, `cqtos()` (TODO)
- 7. **Create comparison demo** showing FFT vs CQT side-by-side (TODO)
-
- **TEMPORARY CHANGE**: FFT_SIZE has been changed from 1024 to 2048 in fftdata.h to support CQT's 4096-point FFT requirement. This temporarily breaks FFT resolution (now 2048 bins instead of 1024) but allows CQT to function properly. This should be reverted once a separate audio buffer is implemented for CQT.
-
- **CURRENT ISSUE**: CQT frequency mapping is incorrect. Test shows:
- - Playing 110 Hz (A2) appears at bin 0-4 instead of expected bin 30
- - Error of ~26-30 bins (2-3 octaves too low)
- - CQT is detecting signals but at wrong frequency bins
-
- **Debugging Status**:
- 1. Changed smoothing from 0.7 to 0.3 - improved peak sharpness ✓
- 2. Fixed window length calculation to use Q factor ✓
- 3. Fixed kernel generation to use equal temperament spacing ✓
- 4. Added gain boost (4x) for better visibility ✓
- 5. Created test scripts: test_cqt_tone.lua, test_cqt_debug.lua ✓
- 6. **FIXED kernel phase calculation** - Critical fix! ✓
-    - Problem: Was using `(i - windowLength/2) / sampleRate` for phase
-    - Solution: Use `(idx - fftSize/2) * (centerFreq / sampleRate)` like ESP32
-    - The modulation must be based on position in full FFT buffer, not window
-
- **Key Fix Applied**:
- ```c
- // OLD (incorrect):
- float t = (i - windowLength/2.0f) / sampleRate;
- float phase = 2.0f * M_PI * centerFreq * t;
-
- // NEW (correct - matches ESP32):
- float phase = 2.0f * M_PI * (centerFreq / sampleRate) * (idx - fftSize/2);
- ```
-
- This should fix the frequency mapping issue. Ready for testing!
-
- **Update after testing**:
- - The phase fix helped but frequencies are still mapping to wrong bins
- - 110 Hz appears at bin 3 instead of 30 (27-bin error)
- - Created test_cqt_stable.lua for controlled testing
- - Created test_cqt_a4.lua for constant 440Hz tone
- - Added debug output to show kernel FFT bin ranges
- - Issue appears to be in kernel generation or application
-
- **Current hypothesis**:
- The kernels might be centered at the wrong FFT bins. Need to verify:
- 1. The FFT bin calculation for each center frequency
- 2. The kernel's actual FFT bin coverage
- 3. Whether the complex multiplication is being done correctly
-
- **Issues found**:
- 1. Audio generation in test cart was wrong - fixed in test_cqt_a4.lua
- 2. Added debug output to show:
-    - FFT bin magnitudes around 440 Hz (should peak at bin 41)
-    - Which CQT bins have high values
-    - This will help identify if the issue is in kernel generation or application
-
- **To test**:
- Build and run with test_cqt_a4.lua. The console should show:
- - Every second: FFT bins around 440 Hz and active CQT bins
- - This will reveal if 440 Hz is detected at FFT bin 41 but mapped to wrong CQT bin
-
- **Debug results**:
- - FFT correctly shows 440 Hz peak at bin 41 (magnitude 730) ✓
- - CQT bin 54 has high value (27595) which is correct for 440 Hz ✓
- - BUT: Almost ALL CQT bins (0-105) have significant energy
- - This means kernels are not frequency-selective enough
-
- **Fix applied**:
- - Changed normalization to match ESP32: divide by windowLength before FFT
- - Added scaling factor after FFT: multiply by centerFreq/minFreq
- - This should make kernels more frequency-selective
-
- **Test results after fix**:
- - 440 Hz correctly peaks at CQT bin 54 ✓
- - BUT: ALL bins (0-119) have significant values
- - High frequency bins (90-119) have values in thousands!
- - This means kernels are not properly bandpass filtered
-
- **New hypothesis**:
- The issue is that our kernels are "seeing" all frequencies. Possible causes:
- 1. The modulation in time domain might be creating aliases
- 2. The window might be too wide or too narrow
- 3. The FFT of the kernel might need different normalization
-
- **Next debugging steps**:
- - Removed scaling factor (made it worse)
- - Added debug to show kernel 54's FFT bin range
- - Need to verify kernels are properly bandpass filtered around their center frequencies
-
- **SOLUTION FOUND**:
- - Removed the scaling factor - this was the key fix!
- - Kernel 54 now correctly uses only 9 FFT bins (38-46) centered on bin 41
- - 440 Hz correctly peaks at CQT bin 54 with magnitude 681
- - Only ~50 bins have values > 0.1 (vs all 120 before)
- - CQT is now properly frequency-selective!
-
- **Working implementation**:
- - Kernels use Q-factor based window length
- - Normalize by windowLength before FFT
- - NO scaling factor after FFT
- - Proper phase calculation: `2π * (f/fs) * (idx - N/2)`
-
- **Final Status**:
- - CQT correctly detects frequencies (440 Hz → bin 54)
- - Good frequency selectivity (minimal spreading)
- - Kernels properly bandpass filtered
- - 20 Hz start preserved for electronic music sub-bass
- - Created test_cqt_spectrum_v2.lua with correct note display
-
- **NEW ISSUE - Excessive Frequency Spreading**:
- Testing with pure sine waves shows excessive frequency spreading:
- - 20 Hz sine wave: Energy spreads across ~6-7 bins
- - 50 Hz sine wave: Energy spreads across ~8-9 bins
- - 100 Hz sine wave: Energy spreads across ~4-5 bins
-
- ## Root Cause Analysis:
- The issue is **window truncation** in our current implementation:
- - At 20 Hz: Window should be 37,485 samples but is clamped to 4,096 (line 90 in cqt_kernel.c)
- - At 50 Hz: Window should be 14,994 samples but is clamped to 4,096
- - At 100 Hz: Window should be 7,497 samples but is clamped to 4,096
-
- This truncation reduces the effective Q factor:
- - At 20 Hz: Effective Q ≈ 1.86 instead of 17
- - At 50 Hz: Effective Q ≈ 4.64 instead of 17
- - At 100 Hz: Effective Q ≈ 9.29 instead of 17
-
- ## Comparison: Our Approach vs ESP32 Approach
-
- ### Our Current Approach (Constant Q):
- ```c
- windowLength = Q * sampleRate / centerFreq; // Q ≈ 17
- if (windowLength > fftSize) windowLength = fftSize; // TRUNCATION!
- ```
- - **Pros**: Excellent frequency resolution at high frequencies
- - **Cons**: Severe truncation at low frequencies causing spreading
-
- ### ESP32 Approach (Variable Window):
- ```c
- windowLength = fftSize / (centerFreq / minFreq); // scales with 1/f
- ```
- - **Pros**: All windows fit within FFT size, no truncation
- - **Cons**: Very low Q (≈1.86) giving ~9.56 semitone bandwidth
-
- ### Analysis of ESP32 Approach:
- - Constant effective Q ≈ 1.86 for all frequencies
- - Bandwidth: ~956 cents (9.56 semitones) - can't distinguish adjacent notes
- - At 20 Hz: 4096 samples (92.9ms)
- - At 20 kHz: 4 samples (0.1ms) - too short for analysis
-
- ## Alternative Approaches Considered:
-
- ### 1. Larger FFT Size (16K or 32K):
- - **16K FFT**: Handles windows down to ~45 Hz properly
- - **Computational cost**: ~5-10ms (still acceptable for 60 FPS)
- - **Memory**: ~2.4MB for 16K, ~4.9MB for 32K
- - **Pros**: Full constant-Q accuracy
- - **Cons**: Higher resource usage
-
- ### 2. Multi-Resolution CQT:
- - Use 16K FFT for lowest 2 octaves (20-80 Hz)
- - Use 4K FFT for everything else
- - **Pros**: Best accuracy at all frequencies
- - **Cons**: Complex implementation, multiple FFTs per frame
-
- ### 3. Adaptive Q Factor:
- - Reduce Q for low frequencies to fit within FFT size
- - **Pros**: Single FFT, reasonable accuracy
- - **Cons**: Variable frequency resolution
-
- ## SELECTED SOLUTION: Hybrid Approach
-
- Combine the best of both methods:
- - **Below 100 Hz**: Use ESP32-style window scaling
- - **Above 100 Hz**: Use constant-Q approach
- - **Transition**: Smooth crossfade between methods
-
- ### Implementation Plan:
-
- #### Step 1: Modify Window Length Calculation
- In `cqt_kernel.c`, function `generateSingleKernel()` around line 86:
- ```c
- // Current code to replace:
- float Q = CQT_CalculateQ(CQT_BINS_PER_OCTAVE);
- int windowLength = (int)(Q * sampleRate / centerFreq);
-
- // New hybrid approach:
- float Q = CQT_CalculateQ(CQT_BINS_PER_OCTAVE);
- int windowLength;
-
- if (centerFreq < 100.0f) {
-     // ESP32-style for low frequencies
-     float factor = centerFreq / minFreq;
-     windowLength = (int)(fftSize / factor);
- } else {
-     // Constant-Q for higher frequencies
-     windowLength = (int)(Q * sampleRate / centerFreq);
-
-     // Ensure it fits in FFT size with some margin
-     if (windowLength > fftSize * 0.9) {
-         windowLength = (int)(fftSize * 0.9);
-     }
- }
+ value = fft(bin) -- Get raw FFT magnitude for bin (0-1023)
+ value = ffts(bin) -- Get smoothed FFT magnitude for bin (0-1023)
  ```

- #### Step 2: Adjust Normalization
- The normalization may need adjustment based on actual window length used.
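The hybrid rule from Step 1 can be isolated into a self-contained function for experimentation. This is a sketch under stated assumptions: Q is passed in as a parameter because `CQT_CalculateQ()` is not shown in this diff, and the name `hybrid_window_length` is hypothetical.

```c
/* Sketch of the hybrid window-length rule from Step 1.
 * Q is a parameter here; 17 matches the "Q ≈ 17" quoted in the notes. */
static int hybrid_window_length(float centerFreq, float minFreq,
                                float sampleRate, int fftSize, float Q)
{
    int windowLength;

    if (centerFreq < 100.0f) {
        /* ESP32-style: window shrinks in proportion to 1/f. */
        windowLength = (int)((float)fftSize / (centerFreq / minFreq));
    } else {
        /* Constant-Q: window spans Q periods of the centre frequency. */
        windowLength = (int)(Q * sampleRate / centerFreq);

        /* Keep a 10% margin inside the FFT buffer. */
        int limit = (int)((float)fftSize * 0.9f);
        if (windowLength > limit) windowLength = limit;
    }
    return windowLength;
}
```

With `fftSize = 4096`, `minFreq = 20` and `Q = 17`, this yields 4096 samples at 20 Hz, 1638 at 50 Hz, and 3686 (clamped) at 100 Hz. The jump between the two branches at the 100 Hz boundary is what the planned transition testing would need to examine.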
-
- #### Step 3: Test and Validate
- - Test with pure tones at 20, 50, 100, 200, 440, 1000 Hz
- - Verify smooth transition at 100 Hz boundary
- - Check musical accuracy across spectrum
-
- ### Expected Results:
- - **20 Hz**: ~1-2 bin spread (using ESP32 method)
- - **100 Hz**: ~1-2 bin spread (transition point)
- - **440 Hz**: <1 bin spread (constant-Q)
- - **Electronic music**: Good sub-bass and treble accuracy
-
- ### Future Improvements:
- 1. Smooth transition zone (80-120 Hz) instead of hard cutoff
- 2. Configurable transition frequency
- 3. Optional multi-resolution mode for maximum accuracy
-
- ## FFT Performance Measurements (CRITICAL UPDATE)
-
- ### M1 Pro Benchmark Results:
- Actual measurements completely contradict initial estimates:
- ```
- CQT FFT Benchmark on this CPU:
- ================================
- 4096-point FFT: 0.014 ms
- 6144-point FFT: 0.021 ms (current implementation)
- 8192-point FFT: 0.025 ms
- 12288-point FFT: 0.047 ms
- 16384-point FFT: 0.066 ms (!!)
- ================================
- ```
-
- ### Key Findings:
- - **75-150x faster than conservative estimates**
- - 16K FFT uses only 0.066ms (0.4% of 16.67ms frame budget at 60fps)
- - Even 32K FFT would likely be ~0.15ms (still under 1% frame budget)
- - Apple Silicon (or auto-vectorization) provides exceptional FFT performance
-
- ### Performance Comparison Results:
- - **M1 Pro**: 16K FFT = 0.066ms
- - **Intel i5-1130G7 (ThinkPad X1 Nano)**:
-   - Performance mode: 16K FFT = 0.112ms (0.67% of frame budget)
-   - Power save mode: 16K FFT = 0.335ms (2% of frame budget)
- - Even in power save mode, 16K FFT is completely viable!
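The frame-budget percentages quoted above follow directly from the measured times and a 60 fps target. A minimal sketch, with a generic `clock()`-based timing helper added for illustration; both function names are hypothetical and this is not the profiling code the notes refer to.

```c
#include <time.h>

/* Share of one frame's time budget used by a task, in percent. */
static double frame_budget_percent(double task_ms, double fps) {
    double frame_ms = 1000.0 / fps;   /* about 16.67 ms at 60 fps */
    return 100.0 * task_ms / frame_ms;
}

/* Average CPU time per call of fn, in milliseconds.
 * Returns 0 when iterations is not positive. */
static double average_call_ms(void (*fn)(void), int iterations) {
    if (iterations <= 0) return 0.0;
    clock_t start = clock();
    for (int i = 0; i < iterations; i++) fn();
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC / iterations;
}
```

For the figures in this section, 0.066 ms, 0.112 ms and 0.335 ms correspond to roughly 0.4%, 0.67% and 2% of a 60 fps frame.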
-
- ### Implications:
- **16K FFT is now implemented** and provides:
- - 20Hz: Q≈7.4 (truncated from ideal 17, but much better than previous 1.86)
- - 30Hz: Q≈11.2 (near ideal)
- - 45Hz+: Full Q≈17 (ideal resolution - no truncation!)
- - Dramatically improved low-frequency resolution for electronic music
-
- The profiling code has been added to measure actual performance on each platform.
-
- ## Current CQT Implementation Status (December 2024)
-
- ### What's Working:
- - **FFT fully restored** - Returns 1024 bins with exact original behavior preserved
- - **CQT with 8K FFT** - Optimized for responsive visualization (~5.4 fps)
- - **Variable-Q Implementation** ✨ NEW! - Optimized Q values for 8K FFT constraint
- - **Shared audio buffer** - Automatically sized to max(FFT needs, CQT needs)
- - **cqt(bin)** function working - Returns raw CQT magnitude for bin 0-119
- - **Frequency detection accurate** - 440Hz correctly maps to bin 54, etc.
- - **Kernels properly generated** - Sparse storage, good frequency selectivity
- - **Smoothing calculated** - But not exposed via API yet
-
- ### API Status:
- | Function | Status | Description |
- |----------|---------|-------------|
- | `fft(bin)` | ✅ Complete | Raw FFT data (0-1023) |
- | `ffts(bin)` | ✅ Complete | Smoothed FFT data |
- | `cqt(bin)` | ✅ Complete | Raw CQT data (0-119) |
- | `cqts(bin)` | ❌ TODO | Smoothed CQT data |
- | `cqto(octave, note)` | ❌ TODO | Raw CQT by musical note |
- | `cqtos(octave, note)` | ❌ TODO | Smoothed CQT by musical note |
-
- ### Variable-Q Implementation (8K FFT Optimized):
- The Variable-Q implementation provides frequency-dependent Q factors optimized to fit within 8K FFT window size:
-
- | Frequency Range | Design Q | Effective Q | Resolution | Notes |
- |----------------|----------|-------------|------------|-------|
- | 20-25 Hz | 7.4 | 3.7 | ~5.4 Hz | Limited by 8K FFT |
- | 25-30 Hz | 9.2 | 4.6 | ~5.4 Hz | Better than fixed Q |
- | 30-40 Hz | 11.5 | 5.6-7.4 | ~5.1 Hz | Good for bass |
- | 40-50 Hz | 14.5 | 7.4-9.3 | ~4.3 Hz | Near ideal |
- | 50-65 Hz | 16.0 | 9.3-12.1 | ~4.1 Hz | Almost full Q |
- | 65-80 Hz | 17.0 | 12.1-17.0 | ~4.7 Hz | Full standard Q |
- | 80+ Hz | 17.0 | 17.0 | Standard CQT | No truncation |
-
- ### Performance with 8K FFT:
- - **Update rate**: ~5.4 fps (good for responsive visualization)
- - **M1 Pro**: ~0.2ms total (1.2% of frame budget)
- - **Bass resolution**: Much improved over fixed Q=17
- - **All windows fit** within 8K samples above 80 Hz
-
- ### Key Benefits of 8K-Optimized Variable-Q:
- - **Smooth Q transition**: Gradually increases from 7.4 to 17 across frequency range
- - **No harsh cutoffs**: All windows designed to fit within 8K constraint
- - **Better than fixed Q**: ~5 Hz resolution at 20 Hz (vs ~11 Hz with fixed Q=17)
- - **Responsive updates**: Maintains 5.4 fps for livecoding applications
- - **Electronic music optimized**: Good sub-bass resolution where it matters most
-
- ### Implementation Architecture:
- ```
- Audio Capture (44.1kHz)
- ↓
- Shared Buffer (8192 samples currently)
- ├─→ FFT: Uses first 2048 samples → 1024 bins
- └─→ CQT: Uses first 8192 samples → 120 bins
- ```
+ ### CQT Implementation

- ### Key Implementation Details:
- 1. **Buffer management**:
-    - `AUDIO_BUFFER_SIZE` in fft.c automatically adjusts
-    - FFT always reads samples 0-2047
-    - CQT reads samples 0-(CQT_FFT_SIZE-1)
+ Constant-Q Transform provides logarithmic frequency spacing for better musical analysis.

- 2. **Smoothing implementation**:
-    - FFT: 0.6 factor (60% old, 40% new)
-    - CQT: 0.3 factor (30% old, 70% new) - calculated but not exposed
-    - Both use peak tracking with 0.99 decay
+ #### Architecture
+ - **src/ext/cqt.c**: CQT processing with separate FFT
+ - **src/ext/cqt_kernel.c**: Kernel generation with sparse storage
+ - **src/cqtdata.h/c**: CQT data storage and configuration
+ - **Processing**: In `tic_core_tick` after FFT processing

- 3. **Test scripts**:
-    - `demo_fft_cqt_hybrid.lua` - Combined FFT/CQT visualization
-    - `test_cqt_spectrum_v2.lua` - CQT spectrum analyzer
-    - `test_fft_restored.lua` - FFT verification
-    - `test_cqt_variable_q.lua` - Variable-Q demonstration (NEW!)
+ #### Specifications
+ - Frequency range: 20 Hz - 20480 Hz (10 octaves × 12 notes = 120 bins)
+ - FFT size: 8192 samples (configurable)
+ - Update rate: ~5.4 fps with 8K FFT
+ - Variable-Q implementation optimized for 8K FFT constraint
+ - Smoothing factor: 0.3
+ - Spectral whitening: Enabled by default (toggle via `CQT_SPECTRAL_WHITENING_ENABLED`)

- ## Next Steps
- 1. ✅ ~~Implement configurable FFT for CQT~~ (COMPLETE - using 8K default)
- 2. ✅ ~~Create separate audio buffer for CQT~~ (COMPLETE - shared buffer)
- 3. ✅ ~~Restore FFT_SIZE to 1024~~ (COMPLETE)
- 4. ✅ ~~Implement Variable-Q for better bass resolution~~ (COMPLETE - 8K optimized)
- 5. ❌ Add remaining API functions: `cqts()`, `cqto()`, `cqtos()`
- 6. ❌ Add CQT to other language bindings (currently Lua only)
- 7. ❌ Create comprehensive FFT vs CQT comparison demo
- 8. ❌ Add configuration options for CQT parameters
- 9. ❌ Implement CQT enhancements (spectral whitening, HPS, etc.)
+ #### Variable-Q Design (8K FFT Optimized)
+ | Frequency Range | Design Q | Effective Q | Resolution |
+ |----------------|----------|-------------|------------|
+ | 20-25 Hz | 7.4 | 3.7 | ~5.4 Hz |
+ | 25-30 Hz | 9.2 | 4.6 | ~5.4 Hz |
+ | 30-40 Hz | 11.5 | 5.6-7.4 | ~5.1 Hz |
+ | 40-50 Hz | 14.5 | 7.4-9.3 | ~4.3 Hz |
+ | 50-65 Hz | 16.0 | 9.3-12.1 | ~4.1 Hz |
+ | 65-80 Hz | 17.0 | 12.1-17.0 | ~4.7 Hz |
+ | 80+ Hz | 17.0 | 17.0 | Standard CQT |

- ### Test Script Example
+ #### API Functions
  ```lua
- -- Quick CQT test visualization
- function TIC()
-   cls(0)
-   -- Draw CQT bins as bars
-   for i=0,119 do
-     local val = cqt(i) * 100 -- scale up for visibility
-     rect(i*2, 136-val, 2, val, 12)
-   end
-
-   -- Draw octave markers
-   for oct=0,9 do
-     local x = oct * 12 * 2
-     line(x, 0, x, 136, 8)
-     print(oct, x+2, 2, 12, false, 1, true)
-   end
-
-   -- Show info
-   print("CQT TEST - 10 octaves x 12 notes", 2, 120, 12, false, 1, true)
- end
+ value = cqt(bin) -- Get raw CQT magnitude for bin (0-119)
+ -- Note mapping: Bin = octave * 12 + note
+ -- Note: C=0, C#=1, D=2, D#=3, E=4, F=5, F#=6, G=7, G#=8, A=9, A#=10, B=11
  ```

- ### Future Enhancements
- 1. Configurable octave range (6-10 octaves)
- 2. Variable bins per octave (24 for quarter-tones)
- 3. Separate enable flag for CQT
- 4. Phase information access
- 5. Inverse CQT for resynthesis
+ ### FFT vs CQT Comparison

- ## FFT vs CQT: Understanding the Tradeoffs
-
- ### Fundamental Difference
- - **FFT**: Linear frequency spacing (each bin = 21.5 Hz)
- - **CQT**: Logarithmic frequency spacing (constant Q = ~17)
-
- ### Time-Frequency Resolution Tradeoff (Uncertainty Principle)
- This is fundamental physics - to distinguish frequencies, you must observe for sufficient time:
- - To tell 20 Hz from 21 Hz: Need 1 second of observation
- - To tell 1000 Hz from 1001 Hz: Still need 1 second
- - Shorter window = better time resolution, worse frequency resolution
- - Longer window = better frequency resolution, worse time resolution
-
- ### FFT Characteristics
- - **Window**: 2048 samples (46ms)
- - **Update rate**: ~21 fps
- - **Frequency resolution**: 21.5 Hz per bin (constant)
- - **Best for**: Beat detection, rhythm visualization, transients
- - **API**: `fft(bin)` raw, `ffts(bin)` smoothed (0.6 factor)
-
- ### CQT Characteristics
- - **Window**: Variable per frequency (up to 16384 samples)
- - **Update rate**: ~5.4 fps (8K FFT) or ~2.7 fps (16K FFT)
- - **Frequency resolution**: Constant Q≈17 (logarithmic spacing)
- - **Best for**: Note detection, chord analysis, harmonic content
- - **API**: `cqt(bin)` raw, `cqts(bin)` smoothed (0.3 factor) - smoothed not yet implemented
-
- ### Resolution Comparison
-
- | Method | 40 Hz Resolution | 440 Hz Resolution | 4400 Hz Resolution |
- |--------|------------------|-------------------|-------------------|
- | FFT | ±10.75 Hz (25%) | ±10.75 Hz (2.4%) | ±10.75 Hz (0.24%) |
- | CQT | ±1.4 Hz (3.5%) | ±15 Hz (3.5%) | ±150 Hz (3.5%) |
-
- ### Configurable CQT FFT Sizes
-
- | FFT Size | Update Rate | Low Freq Quality | Use Case |
- |----------|-------------|------------------|----------|
- | 4K | 10.8 fps | Poor (Q≈1.9 @ 20Hz) | Rhythm only, not musical |
- | 8K | 5.4 fps | Decent (Q≈3.7 @ 20Hz) | Good balance for livecoding |
- | 16K | 2.7 fps | Good (Q≈7.4 @ 20Hz) | Best accuracy, slow update |
+ | Aspect | FFT | CQT |
+ |--------|-----|-----|
+ | Frequency spacing | Linear (21.5 Hz/bin) | Logarithmic (musical) |
+ | Update rate | ~21 fps | ~5.4 fps |
+ | Window size | 2048 samples (46ms) | Variable (up to 8192) |
+ | Best for | Beat detection, rhythm | Note detection, harmony |
+ | Low freq resolution | ±10.75 Hz | ~5 Hz @ 20 Hz |
+ | High freq resolution | ±10.75 Hz | Proportional to frequency |

- #### 4K FFT Deep Dive (Not Recommended)
- With only 4096 samples, the maximum achievable Q at each frequency is severely limited:
+ ### Shared Audio Buffer Architecture

- | Frequency | Max Q (4K) | Bandwidth | Musical Impact |
- |-----------|------------|-----------|----------------|
- | 20 Hz | 1.86 | 10.8 Hz | Can't distinguish notes in same octave |
- | 40 Hz | 3.72 | 10.8 Hz | E1 and F1 merge together |
- | 80 Hz | 7.43 | 10.8 Hz | ~1.5 semitone resolution |
- | 160 Hz | 14.87 | 10.8 Hz | Approaching usable |
- | 200+ Hz | 17+ | Standard | Full CQT resolution |
-
- **Verdict**: 4K FFT turns CQT into a "bass energy detector" rather than note detector. The 10+ fps is tempting but the musical accuracy is too poor for practical use.
-
- Current implementation uses 8K by default (configurable via `CQT_FFT_SIZE` in cqtdata.h), providing the best balance between update rate (5.4 fps) and frequency resolution.
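The Max Q and update-rate columns in the FFT-size tables above can be derived from the window size alone. A minimal sketch, assuming a 44100 Hz sample rate and one update per non-overlapping window (an assumption that matches the quoted rates); the function names are hypothetical.

```c
/* Largest Q that a window of fft_size samples can support at freq_hz. */
static double max_q(double freq_hz, int fft_size, double sample_rate) {
    return freq_hz * fft_size / sample_rate;
}

/* Effective Q after truncation: never more than the design Q. */
static double effective_q(double design_q, double freq_hz,
                          int fft_size, double sample_rate) {
    double limit = max_q(freq_hz, fft_size, sample_rate);
    return design_q < limit ? design_q : limit;
}

/* Updates per second when each update consumes a fresh window. */
static double update_rate_fps(int fft_size, double sample_rate) {
    return sample_rate / fft_size;
}
```

For example, `max_q(20, 4096, 44100)` is about 1.86 and `update_rate_fps(8192, 44100)` is about 5.4, in line with the tables; doubling the FFT size doubles the achievable Q at a given frequency and halves the update rate.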
-
- ### Smoothing Factors
- - **FFT**: 0.6 (60% old, 40% new) - more stable
- - **CQT**: 0.3 (30% old, 70% new) - more responsive
- - Peak tracking uses 0.99 factor for slow decay
-
- ### Shared Audio Buffer Architecture
  Both FFT and CQT share the same audio capture buffer:
  - Buffer size: Maximum of (2048, CQT_FFT_SIZE) samples
- - FFT reads samples 0-2047 (always the same)
+ - FFT reads samples 0-2047
  - CQT reads samples 0-(CQT_FFT_SIZE-1)
- - This preserves exact FFT behavior while allowing CQT flexibility
-
- ### Frame Rate vs BPM Limitations
-
- #### CQT Update Rate Analysis (5.4 fps = 185ms per frame)
-
- | BPM | Beat Duration | Beats per Frame | 16th Notes per Frame | Suitability |
- |-----|---------------|-----------------|---------------------|-------------|
- | 120 | 500ms | 0.37 beats | 1.5 sixteenths | ✅ Good |
- | 128 | 469ms | 0.39 beats | 1.6 sixteenths | ✅ Good |
- | 140 | 429ms | 0.43 beats | 1.7 sixteenths | ✅ Good |
- | 150 | 400ms | 0.46 beats | 1.9 sixteenths | ✅ OK |
- | 160 | 375ms | 0.49 beats | 2.0 sixteenths | ⚠️ Borderline |
- | **174** | **345ms** | **0.54 beats** | **2.1 sixteenths** | **❌ Critical point** |
- | 180 | 333ms | 0.56 beats | 2.2 sixteenths | ❌ Too fast |
- | 200 | 300ms | 0.62 beats | 2.5 sixteenths | ❌ Unusable |
-
- **At 174 BPM, CQT misses every other beat!**
-
- #### Genre Suitability
-
- - **House/Techno (120-130 BPM)**: ✅ Excellent - 2.5+ updates per beat
- - **Dubstep (140 BPM)**: ✅ Good - 2.3 updates per beat
- - **Trance (130-150 BPM)**: ✅ Mostly fine - 2+ updates per beat
- - **Drum & Bass (160-180 BPM)**: ⚠️ Problematic - Fast breaks blur
- - **Hardcore/Gabber (180-200+ BPM)**: ❌ Unusable - Complete desync
-
- #### What Actually Breaks
- - **Kick synchronization** fails above 170 BPM
- - **Hi-hat patterns** (32nd notes) blur into continuous energy
- - **Amen breaks** become unrecognizable smears
- - **Bass wobbles** look stepped instead of smooth
+ - Uses miniaudio for audio capture (mic or loopback on Windows)

- ### Practical Usage for Electronic Music Visualization
+ ### Practical Usage Guidelines

  **Use FFT for:**
  - Kick drum detection (bins 2-4, ~40-80 Hz)
  - Beat synchronization
  - Energy meters by frequency band
- - Reactive elements needing >10 fps update
- - **Any rhythm visualization above 150 BPM**
+ - Any rhythm visualization above 150 BPM

  **Use CQT for:**
  - Bass note identification
- - Chord/key detection
+ - Chord/key detection
  - Color mapping from musical content
  - Melodic visualization
- - **Harmonic content (not rhythm)**

- **Hybrid Approach (Recommended):**
+ **Hybrid Approach Example:**
  ```lua
  -- Rhythm from FFT (21 fps)
  local kick = fft(2) + fft(3) + fft(4)
···
  -- Combine for visuals
  local pulse = kick * 2 -- Size from rhythm
  local color = (bassNote % 12) + 1 -- Color from note
-
- -- For fast music, predict beats
- local bpm = 175 -- D&B tempo
- local beatPhase = (time() * bpm / 60) % 1
- local onBeat = beatPhase < 0.1
- ```
-
- ### Sample Rate and Frequency Ranges
- - TIC-80 sample rate: 44100 Hz
- - Nyquist frequency: 22050 Hz
- - Both FFT and CQT analyze 0-22050 Hz
- - CQT specifically tuned for 20 Hz - 20480 Hz (musical range)
-
- ## CQT Enhancement Plan: Making Notes "Pop" for Electronic Music
-
- ### Overview
- Enhance the existing CQT implementation with signal processing techniques specifically designed to make musical notes stand out clearly in electronic music visualizations. This addresses the current issue where drums, noise, and overlapping harmonics can obscure the melodic content.
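The "Frame Rate vs BPM Limitations" table removed earlier in this hunk can be reproduced from the tempo and the CQT frame period. A minimal sketch, assuming the 185 ms frame period quoted for the 8K FFT; the function names are hypothetical.

```c
/* Duration of one beat in milliseconds at a given tempo. */
static double beat_ms(double bpm) {
    return 60000.0 / bpm;
}

/* Fraction of a beat that elapses during one analysis frame. */
static double beats_per_frame(double bpm, double frame_ms) {
    return frame_ms / beat_ms(bpm);
}

/* Number of sixteenth notes that elapse during one analysis frame. */
static double sixteenths_per_frame(double bpm, double frame_ms) {
    return 4.0 * beats_per_frame(bpm, frame_ms);
}
```

At 120 BPM a 185 ms frame covers 0.37 beats; at 174 BPM it covers a little over half a beat, which is the threshold the table marks as the critical point.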
793 - 794 - ### Core Enhancements 795 - 796 - #### 1. Harmonic-Percussive Separation (HPS) 797 - - **Purpose**: Isolate harmonic (tonal) content from percussive (drums, transients) 798 - - **Method**: Median filtering on magnitude spectrogram 799 - - Horizontal median filter → Enhances harmonic (stable over time) 800 - - Vertical median filter → Enhances percussive (stable over frequency) 801 - - **Implementation**: 802 - - Store 5-7 frames of CQT history (circular buffer) 803 - - Apply median filters to create harmonic/percussive masks 804 - - Process only harmonic component through CQT display 805 - 806 - #### 2. Spectral Whitening 807 - - **Purpose**: Normalize the natural 1/f spectral tilt in music 808 - - **Method**: Per-bin normalization based on long-term average 809 - - **Implementation**: 810 - - Track running average per CQT bin (slow adaptation ~1-2 seconds) 811 - - Divide current magnitude by average (with floor to prevent divide-by-zero) 812 - - Optional: Use equal-loudness curves for perceptual weighting 813 - 814 - #### 3. Variable-Q Transform with Aggressive Bass Q 815 - - **Purpose**: Increase frequency resolution in bass region where electronic music needs it most 816 - - **Method**: Adaptive Q factor that's much higher for low frequencies 817 - - **Implementation**: 818 - - Q = 34 for 20-80 Hz (double the standard Q) 819 - - Q = 17 for 80-200 Hz (standard) 820 - - Q = 12 for 200+ Hz (slightly reduced for smoother visuals) 821 - - Regenerate kernels with frequency-dependent Q 822 - - May need 16K FFT to accommodate longer windows 823 - 824 - #### 4. Adaptive Thresholding 825 - - **Purpose**: Remove noise floor that varies across spectrum 826 - - **Method**: Dynamic threshold per bin based on recent minimum 827 - - **Implementation**: 828 - - Track minimum value per bin over ~1 second window 829 - - Set threshold at minimum + margin (e.g., 3-6 dB) 830 - - Zero out values below threshold 831 - 832 - #### 5. 
Note Onset Enhancement 833 - - **Purpose**: Make note attacks more visible 834 - - **Method**: Detect rapid energy increases per bin 835 - - **Implementation**: 836 - - Track rate of change per bin 837 - - Boost bins with positive derivatives (onset) 838 - - Quick attack, slow decay envelope 839 - 840 - ### Implementation Architecture 841 - 842 - #### New Data Structures (cqtdata.h additions) 843 - ```c 844 - // Enhancement data structures 845 - typedef struct { 846 - // Variable-Q Transform 847 - float variableQ[CQT_BINS]; // Q factor per bin 848 - bool kernelsNeedRegeneration; // Flag for kernel update 849 - 850 - // Harmonic-Percussive Separation 851 - float cqtHistory[CQT_HISTORY_SIZE][CQT_BINS]; // Circular buffer 852 - int historyIndex; // Current position 853 - float harmonicMask[CQT_BINS]; // Harmonic component 854 - float percussiveMask[CQT_BINS]; // Percussive component 855 - 856 - // Spectral Whitening 857 - float binAverages[CQT_BINS]; // Long-term averages 858 - float averageDecay; // Averaging factor (0.99) 859 - 860 - // Adaptive Thresholding 861 - float noiseFloor[CQT_BINS]; // Per-bin noise estimates 862 - float thresholdMargin; // dB above noise (3-6) 863 - 864 - // Onset Detection 865 - float previousMagnitudes[CQT_BINS]; // For derivative 866 - float onsetStrength[CQT_BINS]; // Onset envelope 867 - float onsetDecay; // Envelope decay (0.9) 868 - 869 - // Enhanced output 870 - float cqtEnhanced[CQT_BINS]; // Final enhanced data 871 - } CqtEnhancementData; 872 - ``` 873 - 874 - #### Processing Pipeline (cqt.c modifications) 875 - 1. **Variable-Q CQT computation**: 876 - - Regenerate kernels if Q values changed 877 - - Use frequency-dependent Q factors 878 - 2. **Store in history buffer** (for HPS) 879 - 3. **Harmonic-Percussive Separation**: 880 - - Compute median filters on history 881 - - Extract harmonic component 882 - 4. **Spectral Whitening**: 883 - - Update running averages 884 - - Apply normalization 885 - 5. 
**Adaptive Thresholding**: 886 - - Update noise floor estimates 887 - - Apply thresholding 888 - 6. **Onset Detection**: 889 - - Calculate derivatives 890 - - Update onset envelopes 891 - 7. **Combine and output** 892 - 893 - ### API Extensions 894 - 895 - #### New Functions 896 - ```lua 897 - -- Enhanced CQT functions 898 - value = cqte(bin) -- Get enhanced CQT (with all processing) 899 - value = cqtes(bin) -- Get enhanced + smoothed CQT 900 - 901 - -- Configuration functions 902 - cqt_enhance(enable) -- Enable/disable enhancement (default: true) 903 - cqt_variable_q(enable) -- Toggle variable-Q mode 904 - cqt_bass_q(q_factor) -- Set Q for bass region (20-80 Hz, default: 34) 905 - cqt_hps(enable) -- Toggle harmonic-percussive separation 906 - cqt_whitening(enable) -- Toggle spectral whitening 907 - cqt_threshold(margin) -- Set noise threshold margin (0-10 dB) 908 - cqt_onset_boost(factor) -- Set onset enhancement (0-2.0) 909 202 ``` 910 203 911 - ### Performance Considerations 204 + ## Current Implementation Status 912 205 913 - #### Computational Cost 914 - - **HPS**: ~0.5ms for median filtering (acceptable) 915 - - **Whitening**: Negligible (simple division) 916 - - **Thresholding**: Negligible (comparison) 917 - - **Onset**: Negligible (subtraction + envelope) 918 - - **Total overhead**: ~0.5-1ms additional 919 - 920 - #### Memory Usage 921 - - History buffer: 5 frames × 120 bins × 4 bytes = 2.4KB 922 - - Enhancement data: ~3KB total 923 - - Still well within fantasy computer constraints 206 + ### Completed Features 207 + - **FFT**: 1024 bins with exact original behavior preserved 208 + - **CQT**: 120 bins with Variable-Q implementation 209 + - **Spectral Whitening**: Per-bin normalization for CQT 210 + - **Shared Audio Buffer**: Automatic sizing for both FFT and CQT 211 + - **Lua API**: `fft()`, `ffts()`, `cqt()` functions implemented 924 212 925 213 ### Configuration Options 214 + - `CQT_FFT_SIZE`: Default 8192 (configurable in cqtdata.h) 215 + - 
`CQT_SPECTRAL_WHITENING_ENABLED`: Toggle spectral whitening (0/1) 216 + - `CQT_WHITENING_DECAY`: Running average decay factor (default 0.99) 217 + - `CQT_SMOOTHING_FACTOR`: CQT smoothing factor (default 0.3) 926 218 927 - #### Tunable Parameters 928 - - **HPS window**: 5-7 frames (time) × 3-5 bins (frequency) 929 - - **Whitening time constant**: 0.98-0.995 (1-2 second adaptation) 930 - - **Threshold margin**: 3-6 dB above noise floor 931 - - **Onset boost**: 1.5-3.0x multiplier 932 - - **Enhancement mix**: 0-100% enhanced vs raw 219 + ### Test Scripts 220 + - `demo_fft_cqt_hybrid.lua`: Combined FFT/CQT visualization 221 + - `test_cqt_spectrum_v2.lua`: CQT spectrum analyzer 222 + - `test_fft_restored.lua`: FFT verification 223 + - `test_cqt_variable_q.lua`: Variable-Q demonstration 224 + - `test_cqt_whitening.lua`: Spectral whitening comparison 933 225 934 - ### Testing Strategy 226 + ## Future Enhancements 935 227 936 - #### Test Scenarios 937 - 1. **Electronic track with heavy bass**: Should isolate bass notes from kick 938 - 2. **Chord progressions**: Should show clear note changes 939 - 3. **Melody over drums**: Should suppress drum interference 940 - 4. **Ambient/noise**: Should adapt to varying noise floor 941 - 5. **Fast arpeggios**: Should highlight note onsets 228 + ### Additional API Functions 229 + - `cqts(bin)`: Smoothed CQT data 230 + - `cqto(octave, note)`: Raw CQT by musical note 231 + - `cqtos(octave, note)`: Smoothed CQT by musical note 942 232 943 - #### Visualization Modes 944 - - Side-by-side comparison (raw vs enhanced) 945 - - Individual enhancement layers (harmonic, percussive, etc.) 
946 - - Onset detection visualization 947 - - Noise floor tracking display 233 + ### Signal Processing Enhancements 234 + - **Harmonic-Percussive Separation**: Isolate tonal content from drums 235 + - **Adaptive Thresholding**: Dynamic noise floor removal 236 + - **Note Onset Enhancement**: Make note attacks more visible 237 + - **Enhanced Variable-Q**: Support for 16K FFT for better bass resolution 948 238 949 - ### Implementation Phases 239 + ### Platform Support 240 + - Add CQT to other language bindings (currently Lua only) 241 + - GPU acceleration for kernel application 242 + - Configurable FFT sizes at runtime 950 243 951 - #### Phase 1: Variable-Q Transform (Foundation) 952 - - Modify kernel generation to accept per-bin Q values 953 - - Implement aggressive Q for bass frequencies (20-80 Hz) 954 - - Test frequency resolution improvements 955 - - May require switching to 16K FFT for bass accuracy 244 + ## Important Notes 956 245 957 - #### Phase 2: Spectral Whitening (Simplest enhancement) 958 - - Add running average tracking 959 - - Implement normalization 960 - - Test with electronic music 246 + - The project supports multiple scripting languages, all exposed through the same API 247 + - Platform-specific code should be isolated in src/system/ 248 + - The studio uses immediate mode GUI principles 249 + - Cartridge format is documented in wiki and src/studio/project.c 250 + - PRO version enables additional features like extra memory banks 961 251 962 - #### Phase 3: Adaptive Thresholding 963 - - Add noise floor estimation 964 - - Implement thresholding 965 - - Combine with whitening 966 - 967 - #### Phase 4: Harmonic-Percussive Separation 968 - - Add history buffer 969 - - Implement median filters 970 - - Test separation quality 971 - 972 - #### Phase 5: Onset Enhancement 973 - - Add derivative calculation 974 - - Implement onset envelopes 975 - - Fine-tune parameters 976 - 977 - #### Phase 6: API and Integration 978 - - Add Lua API functions 979 - - Create 
demo visualizations 980 - - Document usage 981 - 982 - ### Expected Results 983 - 984 - For electronic music visualization: 985 - - **Before**: All CQT bins active, drums obscure notes 986 - - **After**: Only active musical notes visible, clean separation 987 - - **Visual impact**: Notes "pop" with clear onset and sustain 988 - - **Frame rate**: Minimal impact (still ~5 fps with 8K FFT) 989 - 990 - ### Future Extensions 991 - 992 - 1. **Genre-specific presets**: EDM, ambient, classical settings 993 - 2. **MIDI output**: Convert enhanced CQT to MIDI notes 994 - 3. **Key detection**: Analyze enhanced harmonic content 995 - 4. **Chord recognition**: Pattern matching on clean note data 996 - 5. **Multi-resolution**: Different processing for bass/treble 997 - 998 - ## Detailed Variable-Q Implementation Plan 252 + ## Harmonic-Percussive Separation (HPS) Plan 999 253 1000 254 ### Overview 1001 - Variable-Q CQT allows different frequency resolution across the spectrum, optimized for electronic music where bass note separation is critical. With 16K FFT, we can achieve excellent bass resolution while maintaining computational efficiency. 1002 255 1003 - ### Q Factor Design 1004 - 1005 - #### Frequency-Dependent Q Values 1006 - ``` 1007 - 20-40 Hz: Q = 34 (2.9% bandwidth, ~1 semitone) 1008 - 40-80 Hz: Q = 28 (3.6% bandwidth, ~0.6 semitones) 1009 - 80-160 Hz: Q = 20 (5% bandwidth, ~0.8 semitones) 1010 - 160-320 Hz: Q = 17 (5.9% bandwidth, standard CQT) 1011 - 320-640 Hz: Q = 14 (7.1% bandwidth, slightly wider) 1012 - 640+ Hz: Q = 12 (8.3% bandwidth, smoother visualization) 1013 - ``` 256 + Harmonic-Percussive Separation isolates tonal (harmonic) content from rhythmic (percussive) content in audio signals. This enhancement will apply HPS exclusively to CQT data, providing better separation of musical elements like sustained notes from drum hits. 
1014 257 1015 - #### Rationale 1016 - - **Ultra-high Q in sub-bass** (20-40 Hz): Electronic music often has closely spaced bass notes 1017 - - **High Q in bass** (40-80 Hz): Critical for distinguishing kick from bass 1018 - - **Gradual reduction**: Smoother visualization at higher frequencies where exact pitch less critical 258 + ### Algorithm Design 1019 259 1020 - ### Implementation Details 260 + **Median Filtering Approach (Fitzgerald 2010):** 261 + - Apply median filters on CQT magnitude spectrogram 262 + - Horizontal (time) median filter → captures harmonic content (sustained tones) 263 + - Vertical (frequency) median filter → captures percussive content (transients) 264 + - Generate masks by comparing filtered outputs 265 + - Apply masks to separate harmonic and percussive components 1021 266 1022 - #### 1. Kernel Generation Modifications (cqt_kernel.c) 267 + ### Architecture 1023 268 1024 - ##### Calculate Variable Q 1025 - ```c 1026 - float calculateVariableQ(float centerFreq) { 1027 - if (centerFreq < 40.0f) return 34.0f; 1028 - else if (centerFreq < 80.0f) return 28.0f; 1029 - else if (centerFreq < 160.0f) return 20.0f; 1030 - else if (centerFreq < 320.0f) return 17.0f; 1031 - else if (centerFreq < 640.0f) return 14.0f; 1032 - else return 12.0f; 1033 - } 1034 - ``` 1035 - 1036 - ##### Window Length Calculation 1037 - ```c 1038 - // In generateSingleKernel() 1039 - float Q = calculateVariableQ(centerFreq); 1040 - int windowLength = (int)(Q * sampleRate / centerFreq); 269 + #### New Files 270 + - **src/ext/hps.c**: HPS processing implementation 271 + - **src/ext/hps.h**: HPS interface and function declarations 272 + - **src/hpsdata.h/c**: HPS data structures and storage 1041 273 1042 - // With 16K FFT, we can accommodate much longer windows 1043 - if (windowLength > fftSize) { 1044 - // For very low frequencies, apply gentle tapering instead of hard truncation 1045 - windowLength = fftSize; 1046 - // Adjust Q to match actual window: Q_effective = 
windowLength * centerFreq / sampleRate 1047 - } 274 + #### Data Flow 1048 275 ``` 1049 - 1050 - ##### Expected Window Lengths (16K FFT) 1051 - - 20 Hz: Q=34 → 74,970 samples → clamped to 16,384 (Q_eff ≈ 7.4) 1052 - - 30 Hz: Q=34 → 49,980 samples → clamped to 16,384 (Q_eff ≈ 11.2) 1053 - - 40 Hz: Q=28 → 30,975 samples → clamped to 16,384 (Q_eff ≈ 14.9) 1054 - - 60 Hz: Q=28 → 20,650 samples → clamped to 16,384 (Q_eff ≈ 22.3) 1055 - - 80 Hz: Q=20 → 11,025 samples → fits! (Q = 20) 1056 - - 100+ Hz: All fit within 16K samples 1057 - 1058 - #### 2. Memory Management 1059 - 1060 - ##### Sparse Kernel Storage Adaptation 1061 - With variable Q, kernel sparsity varies by frequency: 1062 - - Low frequencies (high Q): More non-zero values 1063 - - High frequencies (low Q): Fewer non-zero values 1064 - 1065 - ```c 1066 - // Adaptive sparsity threshold 1067 - float getSparsityThreshold(float centerFreq) { 1068 - float Q = calculateVariableQ(centerFreq); 1069 - // Higher Q needs lower threshold to preserve frequency selectivity 1070 - if (Q > 30) return 0.005f; 1071 - else if (Q > 20) return 0.01f; 1072 - else return 0.02f; 1073 - } 276 + Audio Buffer → CQT Processing → CQT Magnitude → HPS Processing → Separated Components 277 + ↓ 278 + Harmonic & Percussive CQT 1074 279 ``` 1075 280 1076 - #### 3. 
Kernel Normalization 1077 - 1078 - Variable-Q requires careful normalization to ensure consistent output levels: 1079 - 281 + #### Integration in tic_core_tick 1080 282 ```c 1081 - // Energy normalization per kernel 1082 - float kernelEnergy = 0.0f; 1083 - for (int i = 0; i < windowLength; i++) { 1084 - kernelEnergy += window[i] * window[i]; 1085 - } 1086 - float normFactor = sqrtf(windowLength / kernelEnergy); 1087 - 1088 - // Apply to kernel after windowing 1089 - for (int i = 0; i < windowLength; i++) { 1090 - kernel[i] *= normFactor; 283 + if (fftEnabled) { 284 + FFT_GetFFT(fftData); // Regular FFT processing (unchanged) 285 + 286 + if (cqtEnabled) { 287 + CQT_ProcessAudio(); 288 + 289 + if (hpsEnabled) { 290 + HPS_ProcessCQT(cqtData); // Apply HPS to CQT only 291 + } 292 + } 1091 293 } 1092 294 ``` 1093 295 1094 - ### 16K FFT Configuration 296 + ### Data Structures 1095 297 1096 - #### Update cqtdata.h 1097 298 ```c 1098 - // Change from 8K to 16K 1099 - #define CQT_FFT_SIZE 16384 299 + // In hpsdata.h 300 + #define HPS_HISTORY_SIZE 32 // Frames of history for median filtering 301 + #define HPS_MEDIAN_SIZE_H 17 // Horizontal filter size (0.8-1.5s @ 5.4fps) 302 + #define HPS_MEDIAN_SIZE_P 17 // Vertical filter size (1.4 octaves) 1100 303 1101 - // Add variable-Q configuration 1102 - #define CQT_VARIABLE_Q_ENABLED 1 1103 - #define CQT_BASS_Q_FACTOR 34.0f 1104 - #define CQT_MID_Q_FACTOR 17.0f 1105 - #define CQT_TREBLE_Q_FACTOR 12.0f 304 + typedef struct { 305 + // CQT magnitude history buffer 306 + float cqtHistory[HPS_HISTORY_SIZE][120]; 307 + int historyIndex; 308 + 309 + // Separated CQT components 310 + float harmonicCQT[120]; // Sustained tonal content 311 + float percussiveCQT[120]; // Transient/rhythmic content 312 + 313 + // Smoothed outputs 314 + float harmonicSmoothing[120]; 315 + float percussiveSmoothing[120]; 316 + 317 + // Normalization data 318 + float harmonicNormalized[120]; 319 + float percussiveNormalized[120]; 320 + 321 + // Configuration 322 
+ float separationStrength; // 0.0-1.0, controls mask hardness 323 + float harmonicGain; // Gain for harmonic component 324 + float percussiveGain; // Gain for percussive component 325 + bool enabled; 326 + } HpsData; 1106 327 ``` 1107 328 1108 - #### Update Buffer Management (fft.c) 1109 - ```c 1110 - // AUDIO_BUFFER_SIZE will automatically adjust to 16384 1111 - // This provides 371ms of audio at 44.1kHz 1112 - // Update rate: 44100/16384 = 2.69 fps 1113 - ``` 329 + ### API Functions 1114 330 1115 - ### Performance Optimization 331 + ```lua 332 + -- CQT-based HPS functions (update rate ~5.4 fps) 333 + hpsh(bin) -- Get harmonic component at CQT bin (0-119) 334 + hpsp(bin) -- Get percussive component at CQT bin (0-119) 335 + hpshs(bin) -- Get smoothed harmonic component 336 + hpsps(bin) -- Get smoothed percussive component 1116 337 1117 - #### 1. Kernel Caching Strategy 1118 - Since kernels are larger with variable-Q: 1119 - - Pre-compute all kernels at startup 1120 - - Store in optimized sparse format 1121 - - Total memory: ~400-500KB (acceptable) 338 + -- Musical note access (octave 0-9, note 0-11) 339 + hpsho(octave, note) -- Harmonic by musical note 340 + hpspo(octave, note) -- Percussive by musical note 1122 341 1123 - #### 2. Processing Optimization 1124 - ```c 1125 - // Process CQT every N frames if needed 1126 - static int frameCounter = 0; 1127 - if (++frameCounter >= CQT_PROCESS_INTERVAL) { 1128 - frameCounter = 0; 1129 - CQT_Process(audioBuffer, fftBuffer, cqtData); 1130 - } 342 + -- Configuration 343 + hpscfg(param, value) -- Configure HPS parameters 344 + -- param: "strength" (0.0-1.0), "hgain" (0.0-2.0), "pgain" (0.0-2.0) 1131 345 ``` 1132 346 1133 - #### 3. 
SIMD Considerations 1134 - - Ensure kernel application loops are vectorization-friendly 1135 - - Keep data aligned for SIMD operations 1136 - - Profile on target platforms 347 + ### Implementation Details 1137 348 1138 - ### Testing and Validation 349 + #### Median Filtering 350 + - Use efficient sliding window median algorithm 351 + - Handle CQT's logarithmic frequency spacing 352 + - Circular buffer for time history 353 + - Optimize for 120 bins and 32 frame history 1139 354 1140 - #### Test Signal Generation 1141 - Create test signals for each Q region: 1142 - ```lua 1143 - -- Test script for variable-Q validation 1144 - function generateTestTone(freq, duration) 1145 - -- Generate pure tone at specific frequency 1146 - -- Measure CQT bin spread 1147 - -- Verify Q factor matches design 1148 - end 1149 - 1150 - -- Test cases: 1151 - -- 25 Hz: Should show narrow peak (Q=34) 1152 - -- 50 Hz: Should show narrow peak (Q=28) 1153 - -- 100 Hz: Should show moderate peak (Q=20) 1154 - -- 440 Hz: Should show standard CQT peak (Q=14) 1155 - ``` 1156 - 1157 - #### Measurement Metrics 1158 - 1. **3dB Bandwidth**: Measure actual vs theoretical 1159 - 2. **Sidelobe Suppression**: Should be >40dB 1160 - 3. **Cross-talk**: Adjacent bins should have <-20dB leakage 1161 - 1162 - ### Integration with Enhancement Pipeline 355 + #### Masking Strategy 356 + 1. Compute harmonic emphasis: `H = median_h(|CQT|²)` 357 + 2. Compute percussive emphasis: `P = median_p(|CQT|²)` 358 + 3. Generate binary masks: 359 + - Harmonic mask: `M_h = H >= P` 360 + - Percussive mask: `M_p = P > H` 361 + 4. Optional soft masking using Wiener filtering 1163 362 1164 - #### Variable-Q as Foundation 1165 - Variable-Q must be implemented first because: 1166 - 1. Kernel generation is fundamental to CQT 1167 - 2. Other enhancements depend on accurate frequency detection 1168 - 3. 
Memory layout changes affect all subsequent processing 363 + #### Memory Usage 364 + - History buffer: 32 × 120 × 4 bytes = 15KB 365 + - Processing buffers: ~10KB 366 + - Total additional memory: ~25KB 1169 367 1170 - #### Interaction with Other Enhancements 1171 - - **HPS**: Benefits from better frequency resolution 1172 - - **Whitening**: May need per-Q normalization 1173 - - **Thresholding**: Noise floor varies with Q 1174 - - **Onset**: Higher Q means better temporal smearing 368 + ### Performance Considerations 1175 369 1176 - ### API Implementation 370 + - Process only when CQT is updated (~5.4 fps) 371 + - Reuse existing CQT magnitude calculations 372 + - Optimize median filters for small kernel sizes 373 + - Cache-friendly memory access patterns 374 + - Optional SIMD for median operations 1177 375 1178 - #### Configuration Functions 1179 - ```c 1180 - // In cqt.c 1181 - static float bassQFactor = 34.0f; 1182 - static float midQFactor = 17.0f; 1183 - static float trebleQFactor = 12.0f; 1184 - static bool variableQEnabled = true; 376 + ### Example Usage 1185 377 1186 - void CQT_SetVariableQ(bool enabled) { 1187 - if (variableQEnabled != enabled) { 1188 - variableQEnabled = enabled; 1189 - CQT_RegenerateKernels(); 1190 - } 1191 - } 378 + ```lua 379 + -- Visualize harmonic vs percussive content 380 + function TIC() 381 + cls(0) 382 + 383 + -- Draw harmonic content (sustained notes) 384 + for oct=2,6 do 385 + for note=0,11 do 386 + local h = hpsh(oct*12+note) * 50 387 + local x = note * 20 388 + local y = 100 - oct * 15 389 + rect(x, y-h, 18, h, note+1) 390 + end 391 + end 392 + 393 + -- Draw percussive hits 394 + for i=0,119 do 395 + local p = hpsp(i) 396 + if p > 0.1 then 397 + local x = (i % 12) * 20 398 + local y = 120 - (i // 12) * 10 399 + circ(x+9, y, p*20, 15) 400 + end 401 + end 402 + end 1192 403 1193 - void CQT_SetBassQ(float q) { 1194 - if (bassQFactor != q) { 1195 - bassQFactor = q; 1196 - if (variableQEnabled) CQT_RegenerateKernels(); 1197 - } 1198 
- } 404 + -- React to bass drum vs bass note 405 + function TIC() 406 + local bassDrum = 0 407 + local bassNote = 0 408 + 409 + -- Sum percussive energy in bass range (20-80 Hz) 410 + for i=0,35 do -- First 3 octaves 411 + bassDrum = bassDrum + hpsp(i) 412 + end 413 + 414 + -- Find strongest harmonic in bass range 415 + for i=24,35 do -- 2nd octave 416 + if hpsh(i) > hpsh(bassNote) then 417 + bassNote = i 418 + end 419 + end 420 + 421 + -- Visualize separately 422 + cls(0) 423 + circ(120, 68, bassDrum * 30, 8) -- Drum pulse 424 + print("Note: " .. (bassNote % 12), 100, 60, (bassNote % 12) + 1) 425 + end 1199 426 ``` 1200 427 1201 - ### Expected Results with 16K FFT 1202 - 1203 - #### Bass Region (20-80 Hz) 1204 - - **20 Hz**: Q_eff ≈ 7.4 (limited by FFT size, but much better than current 3.7) 1205 - - **30 Hz**: Q_eff ≈ 11.2 (good separation) 1206 - - **40 Hz**: Q_eff ≈ 14.9 (excellent) 1207 - - **60 Hz**: Q_eff = 22.3 (better than designed!) 1208 - - **80 Hz**: Q = 20 (perfect) 1209 - 1210 - #### Electronic Music Benefits 1211 - 1. **Sub-bass**: Can distinguish notes 1-2 semitones apart 1212 - 2. **Bass**: Clear separation between kick and bassline 1213 - 3. **Midrange**: Standard CQT resolution 1214 - 4. **Treble**: Smooth visualization without artifacts 1215 - 1216 - ### Migration Path 1217 - 1218 - #### From 8K to 16K FFT 1219 - 1. Update `CQT_FFT_SIZE` in cqtdata.h 1220 - 2. Verify buffer allocation in fft.c 1221 - 3. Test performance on target platforms 1222 - 4. 
Adjust frame processing if needed 1223 - 1224 - #### Backwards Compatibility 1225 - - Keep 8K as compile-time option 1226 - - Allow runtime FFT size selection (future) 1227 - - Maintain existing API behavior 1228 - 1229 - ### Debugging and Profiling 1230 - 1231 - #### Debug Output 1232 - ```c 1233 - // Add debug info for each kernel 1234 - printf("Bin %d: Freq %.1f Hz, Q=%.1f, Window=%d samples, Q_eff=%.1f\n", 1235 - bin, centerFreq, Q, windowLength, effectiveQ); 1236 - ``` 428 + ### Implementation Phases 1237 429 1238 - #### Performance Profiling 1239 - - Measure kernel generation time 1240 - - Track per-frame CQT processing time 1241 - - Monitor memory usage 1242 - - Test on various CPUs 430 + **Phase 1: Core HPS Algorithm** 431 + - Implement circular buffer for CQT history 432 + - Create median filtering functions 433 + - Implement basic binary masking 434 + - Add core API functions 1243 435 1244 - ### Future Enhancements 436 + **Phase 2: Enhancements** 437 + - Add Wiener soft masking option 438 + - Implement smoothing and normalization 439 + - Add gain controls 440 + - Optimize performance 1245 441 1246 - 1. **Smooth Q Transitions**: Interpolate Q between frequency bands 1247 - 2. **Adaptive Q**: Adjust based on signal content 1248 - 3. **Multi-resolution**: Different FFT sizes for different octaves 1249 - 4. 
**GPU Acceleration**: Parallel kernel application 442 + **Phase 3: Extended API** 443 + - Add musical note access functions 444 + - Implement configuration system 445 + - Create demo scripts 446 + - Documentation 1250 447 1251 - ## Important Notes 448 + ### Benefits 1252 449 1253 - - The project supports multiple scripting languages, all exposed through the same API 1254 - - Platform-specific code should be isolated in src/system/ 1255 - - The studio uses immediate mode GUI principles 1256 - - Cartridge format is documented in wiki and src/studio/project.c 1257 - - PRO version enables additional features like extra memory banks 450 + - **Better Bass Separation**: Distinguish bass drums from bass notes 451 + - **Cleaner Visualizations**: Separate melody from rhythm 452 + - **Musical Analysis**: Identify chord progressions without drum interference 453 + - **Creative Effects**: Process harmonic and percussive content differently
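The binary masking strategy in the HPS plan (median over time for harmonic emphasis, median over frequency for percussive emphasis, then `H >= P`) can be sketched roughly as follows. This is only an illustrative sketch, not the planned `hps.c`: the `SKETCH_*` constants and `sketch_*` functions are hypothetical names, the history is assumed to be already unrolled from the circular buffer (oldest frame first), the time median is assumed to be a trailing window because the plan does not say whether it is centered, and a plain `qsort` median stands in for the sliding-window median the plan calls for.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical constants mirroring the plan: 120 CQT bins, 32 frames, 17-tap medians. */
#define SKETCH_BINS    120
#define SKETCH_HISTORY 32
#define SKETCH_MEDIAN  17

static int sketch_cmp_float(const void *a, const void *b)
{
    float x = *(const float *)a, y = *(const float *)b;
    return (x > y) - (x < y);
}

/* Median of n values (1 <= n <= SKETCH_MEDIAN); works on a copy of the input. */
static float sketch_median(const float *values, int n)
{
    float tmp[SKETCH_MEDIAN];
    assert(n > 0 && n <= SKETCH_MEDIAN);
    memcpy(tmp, values, (size_t)n * sizeof(float));
    qsort(tmp, (size_t)n, sizeof(float), sketch_cmp_float);
    return tmp[n / 2];
}

/*
 * history[t][b] holds CQT magnitudes, t = 0 oldest .. SKETCH_HISTORY-1 newest.
 * Splits the newest frame into harmonic and percussive parts with a binary mask.
 */
static void sketch_hps_separate(float history[][SKETCH_BINS],
                                float harmonic[SKETCH_BINS],
                                float percussive[SKETCH_BINS])
{
    const float *cur = history[SKETCH_HISTORY - 1];

    for (int b = 0; b < SKETCH_BINS; b++) {
        float win[SKETCH_MEDIAN];
        int n = 0;

        /* H: median of |CQT|^2 over the last SKETCH_MEDIAN frames at this bin. */
        for (int t = SKETCH_HISTORY - SKETCH_MEDIAN; t < SKETCH_HISTORY; t++) {
            win[n++] = history[t][b] * history[t][b];
        }
        float h = sketch_median(win, n);

        /* P: median of |CQT|^2 over neighbouring bins in the newest frame,
           with the window clipped at the spectrum edges. */
        n = 0;
        for (int k = b - SKETCH_MEDIAN / 2; k <= b + SKETCH_MEDIAN / 2; k++) {
            if (k >= 0 && k < SKETCH_BINS) {
                win[n++] = cur[k] * cur[k];
            }
        }
        float p = sketch_median(win, n);

        /* Binary masks: M_h = (H >= P), M_p = (P > H). */
        int is_harmonic = (h >= p);
        harmonic[b]   = is_harmonic ? cur[b] : 0.0f;
        percussive[b] = is_harmonic ? 0.0f : cur[b];
    }
}
```

With this arrangement, a tone held across the whole history stays in `harmonic`, while a broadband click that appears only in the newest frame lands in `percussive`. The soft (Wiener) masking from Phase 2 would replace the binary decision with a ratio of `h` and `p`.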
