16k

alice fe803460 924b9be5

5 files changed, +58 -30
CLAUDE.md (+39 -2)

````diff
@@ -487,12 +487,49 @@
 2. Configurable transition frequency
 3. Optional multi-resolution mode for maximum accuracy
 
+## FFT Performance Measurements (CRITICAL UPDATE)
+
+### M1 Pro Benchmark Results:
+Actual measurements completely contradict initial estimates:
+```
+CQT FFT Benchmark on this CPU:
+================================
+4096-point FFT: 0.014 ms
+6144-point FFT: 0.021 ms (current implementation)
+8192-point FFT: 0.025 ms
+12288-point FFT: 0.047 ms
+16384-point FFT: 0.066 ms (!!)
+================================
+```
+
+### Key Findings:
+- **75-150x faster than conservative estimates**
+- 16K FFT uses only 0.066ms (0.4% of 16.67ms frame budget at 60fps)
+- Even 32K FFT would likely be ~0.15ms (still under 1% frame budget)
+- Apple Silicon (or auto-vectorization) provides exceptional FFT performance
+
+### Performance Comparison Results:
+- **M1 Pro**: 16K FFT = 0.066ms
+- **Intel i5-1130G7 (ThinkPad X1 Nano)**:
+  - Performance mode: 16K FFT = 0.112ms (0.67% of frame budget)
+  - Power save mode: 16K FFT = 0.335ms (2% of frame budget)
+- Even in power save mode, 16K FFT is completely viable!
+
+### Implications:
+**16K FFT is now implemented** and provides:
+- 20Hz: Q≈7.4 (truncated from ideal 17, but much better than previous 1.86)
+- 30Hz: Q≈11.2 (near ideal)
+- 45Hz+: Full Q≈17 (ideal resolution - no truncation!)
+- Dramatically improved low-frequency resolution for electronic music
+
+The profiling code has been added to measure actual performance on each platform.
+
 ## Phase 3: Next Steps
-1. Implement hybrid windowing approach (immediate priority)
+1. ~~Implement 16K FFT based on benchmark results~~ (COMPLETE)
 2. Add remaining API functions: `cqts()`, `cqto()`, `cqtos()`
 3. Create separate audio buffer for CQT (restore FFT_SIZE to 1024)
 4. Create FFT vs CQT comparison demo
-5. Performance optimization if needed
+5. Test and verify improved frequency resolution at 20Hz, 50Hz, 100Hz
 
 ### Test Script Example
 ```lua
````
src/cqtdata.h (+1 -1)

```diff
@@ -4,7 +4,7 @@
 #define CQT_BINS 120
 #define CQT_OCTAVES 10
 #define CQT_BINS_PER_OCTAVE 12
-#define CQT_FFT_SIZE 6144 // 6K FFT - balance between quality and performance
+#define CQT_FFT_SIZE 16384 // 16K FFT - excellent low-frequency resolution with minimal performance impact
 
 // CQT frequency range
 #define CQT_MIN_FREQ 20.0f // Sub-bass for electronic music
```
src/ext/cqt.c (+4 -4)

```diff
@@ -23,8 +23,8 @@
     printf("\nCQT FFT Benchmark on this CPU:\n");
     printf("================================\n");
 
-    int sizes[] = {4096, 6144, 8192, 12288, 16384};
-    int numSizes = 5;
+    int sizes[] = {4096, 6144, 8192, 12288, 16384, 24576, 32768};
+    int numSizes = 7;
 
     for (int s = 0; s < numSizes; s++)
     {
@@ -242,7 +242,7 @@
     static double totalKernelTime = 0.0;
     static int profileCount = 0;
 
-    // Perform 6144-point FFT with timing
+    // Perform 16384-point FFT with timing
     clock_t fftStart = clock();
     kiss_fftr(cqtFftCfg, cqtAudioBuffer, cqtFftOutput);
     clock_t fftEnd = clock();
@@ -273,7 +273,7 @@
     // Print profiling info every 60 frames (~1 second)
     if (profileCount % 60 == 0)
     {
-        printf("CQT Performance (6K FFT):\n");
+        printf("CQT Performance (16K FFT):\n");
         printf(" FFT avg: %.3fms\n", totalFftTime / profileCount);
         printf(" Kernels avg: %.3fms\n", totalKernelTime / profileCount);
         printf(" Total avg: %.3fms\n", (totalFftTime + totalKernelTime) / profileCount);
```
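The profiling hunk times `kiss_fftr` with `clock()` and prints running averages every 60 frames. A self-contained sketch of that bookkeeping is below; `CqtProfile`, `clock_to_ms` and `cqt_profile_add` are hypothetical names. Note that `clock()` reports processor time rather than wall-clock time and its granularity is platform dependent, so sub-0.1 ms readings are best averaged over many frames, as done here.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical accumulator for per-frame CQT timings. */
typedef struct {
    double totalFftMs;
    double totalKernelMs;
    int count;
} CqtProfile;

/* Convert a clock() interval to milliseconds of processor time. */
static double clock_to_ms(clock_t start, clock_t end)
{
    return (double)(end - start) * 1000.0 / CLOCKS_PER_SEC;
}

/* Add one frame's timings; print cumulative averages every reportEvery
 * frames. Returns 1 if a report was printed, else 0. */
static int cqt_profile_add(CqtProfile *p, double fftMs, double kernelMs, int reportEvery)
{
    assert(p != NULL && reportEvery > 0);
    p->totalFftMs += fftMs;
    p->totalKernelMs += kernelMs;
    p->count++;
    if (p->count % reportEvery != 0)
        return 0;
    printf("CQT Performance (16K FFT):\n");
    printf("  FFT avg: %.3fms\n", p->totalFftMs / p->count);
    printf("  Kernels avg: %.3fms\n", p->totalKernelMs / p->count);
    printf("  Total avg: %.3fms\n", (p->totalFftMs + p->totalKernelMs) / p->count);
    return 1;
}
```

In a render loop the FFT call would be bracketed by two `clock()` reads, with `clock_to_ms(fftStart, fftEnd)` passed as `fftMs`. Like the visible part of the committed code, these averages are cumulative since start-up rather than covering only the last 60 frames.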
src/ext/cqt_kernel.c (+12 -21)

```diff
@@ -83,27 +83,18 @@
     float Q = CQT_CalculateQ(CQT_BINS_PER_OCTAVE);
     int windowLength;
 
-    if (centerFreq < 100.0f) {
-        // With 6K FFT, we can use higher Q for better resolution
-        // 20Hz: Q=2.8 gives ~6144 samples (full FFT)
-        // 50Hz: Q=7 gives ~6174 samples (slightly truncated)
-        // 100Hz: Q=14 gives ~6174 samples (slightly truncated)
-        float targetQ = Q; // Start with ideal Q
-        windowLength = (int)(targetQ * sampleRate / centerFreq);
-
-        // If window doesn't fit, reduce Q to fit exactly
-        if (windowLength > fftSize) {
-            targetQ = (float)fftSize * centerFreq / sampleRate;
-            windowLength = fftSize;
-        }
-    } else {
-        // Constant-Q for higher frequencies
-        windowLength = (int)(Q * sampleRate / centerFreq);
-
-        // Ensure it fits in FFT size with some margin
-        if (windowLength > fftSize * 0.9) {
-            windowLength = (int)(fftSize * 0.9);
-        }
+    // With 16K FFT, we can use full constant-Q across the entire spectrum!
+    windowLength = (int)(Q * sampleRate / centerFreq);
+
+    // At 20Hz: windowLength = 17 * 44100 / 20 = 37,485 samples
+    // 16K FFT can handle up to frequencies down to ~45 Hz without truncation
+    // For lower frequencies, we'll still get better Q than before
+
+    // Ensure it fits in FFT size
+    if (windowLength > fftSize) {
+        windowLength = fftSize;
+        // Even at 20Hz with truncation to 16384 samples:
+        // Effective Q = 16384 * 20 / 44100 = 7.4 (much better than 1.86!)
     }
 
     // Ensure window length is reasonable
```
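The new kernel logic reduces to one rule: compute the ideal constant-Q window Q·fs/f, then truncate it to the FFT size. The sketch below restates that rule and adds two derived quantities, the Q actually achieved after truncation (N·f/fs) and the lowest frequency whose full window still fits (Q·fs/N). It assumes Q = 17 and fs = 44.1 kHz, the values used in the diff's comments; the function names are hypothetical.

```c
#include <assert.h>

/* Ideal constant-Q window length Q * fs / f, truncated to the FFT size,
 * mirroring the rule in the committed kernel code. */
static int cqt_window_length(float Q, float sampleRate, float centerFreq, int fftSize)
{
    assert(centerFreq > 0.0f && fftSize > 0);
    int windowLength = (int)(Q * sampleRate / centerFreq);
    if (windowLength > fftSize)
        windowLength = fftSize;
    return windowLength;
}

/* Q actually achieved by a window of windowLength samples: N * f / fs. */
static float cqt_effective_q(int windowLength, float sampleRate, float centerFreq)
{
    assert(sampleRate > 0.0f);
    return (float)windowLength * centerFreq / sampleRate;
}

/* Lowest center frequency whose full window fits in fftSize: Q * fs / N. */
static float cqt_min_untruncated_freq(float Q, float sampleRate, int fftSize)
{
    assert(fftSize > 0);
    return Q * sampleRate / (float)fftSize;
}
```

With N = 16384 these give a truncated 16384-sample window and an effective Q of about 7.43 at 20 Hz, about 11.1 at 30 Hz, a full 14994-sample window at 50 Hz, and an untruncated cutoff of about 45.8 Hz. The previous 6144-point size gave Q ≈ 2.8 at 20 Hz.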
src/fftdata.h (+2 -2)

```diff
@@ -1,9 +1,9 @@
 #pragma once
 #include <stdbool.h>
-// TEMPORARY: Changed from 1024 to 3072 to support CQT's 6144-point FFT
+// TEMPORARY: Changed from 1024 to 8192 to support CQT's 16384-point FFT
 // This breaks FFT bin resolution but enables CQT to work properly
 // TODO: Restore to 1024 and implement separate buffer for CQT
-#define FFT_SIZE 3072
+#define FFT_SIZE 8192
 extern float fPeakMinValue;
 extern float fPeakSmoothing;
 extern float fPeakSmoothValue;
```
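The `fftdata.h` TODO and Phase 3 step 3 both point at giving the CQT its own audio history so `FFT_SIZE` can return to 1024. One way to do that, sketched here as an assumption rather than the project's plan, is a fixed-size ring buffer of `CQT_FFT_SIZE` samples that is unrolled oldest-first before each transform. `CqtRingBuffer`, `cqt_ring_push` and `cqt_ring_linearize` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Mirrors src/cqtdata.h so the example is self-contained. */
#define CQT_FFT_SIZE 16384

/* Hypothetical dedicated history buffer for the CQT, independent of FFT_SIZE. */
typedef struct {
    float samples[CQT_FFT_SIZE];
    int writePos; /* index of the next slot to overwrite (= oldest sample) */
} CqtRingBuffer;

/* Append count new samples, overwriting the oldest ones. */
static void cqt_ring_push(CqtRingBuffer *rb, const float *in, int count)
{
    assert(rb != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        rb->samples[rb->writePos] = in[i];
        rb->writePos = (rb->writePos + 1) % CQT_FFT_SIZE;
    }
}

/* Copy the history oldest-first into out (CQT_FFT_SIZE floats) for the FFT. */
static void cqt_ring_linearize(const CqtRingBuffer *rb, float *out)
{
    assert(rb != NULL && out != NULL);
    int tail = CQT_FFT_SIZE - rb->writePos;
    memcpy(out, rb->samples + rb->writePos, (size_t)tail * sizeof(float));
    memcpy(out + tail, rb->samples, (size_t)rb->writePos * sizeof(float));
}
```

Feeding the linearized copy to the real FFT keeps the CQT window length independent of the main visualizer's `FFT_SIZE`. The cost is one extra 64 KiB buffer (16384 floats) plus a copy per frame.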