CCSDS 121.0-B-3 Lossless Data Compression (Rice/Golomb coding)

rice: use the standard CCSDS heuristic for selecting [k]

[best_split_k] was doing an O(kmax) exhaustive search for the split
parameter that minimises the encoded length. That's not what Rice
coding normally does.

CCSDS 121.0-B-3 §5.1.3 specifies the split parameter selection
directly: k = floor(log2(block_sum / J)) clamped to [0, kmax]. This
is the Rice/Golomb optimum for the geometric distribution the
residuals follow, so there's no reason to search.
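
The formula above can be sketched in a few lines of OCaml. This is an illustrative stand-alone version, not the repository's [select_k]: the names `floor_log2` and `select_k_sketch` are hypothetical, and the sketch assumes the residuals are already mapped to non-negative integers and that the block length J is greater than zero.

```ocaml
(* Hypothetical helper: floor(log2 n) for n >= 1, by counting right shifts. *)
let floor_log2 n =
  let rec go acc n = if n <= 1 then acc else go (acc + 1) (n lsr 1) in
  go 0 n

(* Sketch of k = floor(log2(block_sum / J)) clamped to [0, kmax].
   Assumes non-negative residuals and len > 0. *)
let select_k_sketch residuals ofs len kmax =
  let sum = ref 0 in
  for i = ofs to ofs + len - 1 do
    sum := !sum + residuals.(i)
  done;
  (* Integer division gives floor(sum / J); taking floor(log2 _) of the
     floored mean equals floor(log2 (sum / J)) whenever the mean is >= 1. *)
  let mean = !sum / len in
  (* A mean below 1 would give a negative logarithm, which clamps to 0. *)
  let k = if mean < 1 then 0 else floor_log2 mean in
  max 0 (min k kmax)
```

For example, a block of four residuals all equal to 8 has mean 8 and yields k = 3, while a block whose mean is below 1 yields k = 0.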

The file already had [select_k] implementing exactly this heuristic,
but it was never called. Warning 32 (unused value) from the build sweep
surfaced it. Wiring it into [best_split_k]:

    let best_split_k residuals res_ofs count id_len bps is_ref kmax =
      let k = select_k residuals res_ofs count kmax in
      let len = id_len + (if is_ref then bps else 0)
                + split_encoded_len residuals res_ofs count k in
      (k, len)

Also delete Bitwriter.bit_length — a module-signature entry that was
never called from outside Bitwriter.

All 6 fuzz tests still pass (correlated compresses well, identical
data compresses, random data roundtrips, etc.), indicating that the
heuristic's choice of k round-trips correctly and still compresses
well on the tested inputs.
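
A direct comparison between the heuristic and the old exhaustive scan could be sketched as follows. Everything here is an assumption for illustration: `split_len` uses the textbook Rice cost model (a unary code for the high bits plus k low-order bits per sample) rather than the repository's [split_encoded_len], and `ilog2`, `heuristic_k`, `exhaustive_k`, and `excess_bits` are hypothetical names. Residuals are assumed non-negative and the block non-empty.

```ocaml
(* Assumed cost model: each sample r costs (r lsr k) + 1 bits of unary
   code plus k low-order bits. Not the repository's [split_encoded_len]. *)
let split_len residuals ofs len k =
  let bits = ref 0 in
  for i = ofs to ofs + len - 1 do
    bits := !bits + (residuals.(i) lsr k) + 1 + k
  done;
  !bits

(* Exhaustive scan over k in [0, kmax]; ties go to the smallest k. *)
let exhaustive_k residuals ofs len kmax =
  let best_k = ref 0 and best_len = ref max_int in
  for k = 0 to kmax do
    let l = split_len residuals ofs len k in
    if l < !best_len then begin
      best_k := k;
      best_len := l
    end
  done;
  (!best_k, !best_len)

(* floor(log2 n) for n >= 1; returns 0 for n <= 1. *)
let ilog2 n =
  let rec go acc n = if n <= 1 then acc else go (acc + 1) (n lsr 1) in
  go 0 n

(* Mean-based heuristic: floor(log2(sum / len)) clamped to [0, kmax]. *)
let heuristic_k residuals ofs len kmax =
  let sum = ref 0 in
  for i = ofs to ofs + len - 1 do
    sum := !sum + residuals.(i)
  done;
  max 0 (min (ilog2 (!sum / len)) kmax)

(* Bits the heuristic spends beyond the exhaustive minimum (never negative,
   because the heuristic's k is one of the candidates the scan considers). *)
let excess_bits residuals ofs len kmax =
  let k = heuristic_k residuals ofs len kmax in
  let _, best = exhaustive_k residuals ofs len kmax in
  split_len residuals ofs len k - best
```

Running `excess_bits` over the fuzz inputs and checking that it stays at 0 (or within an acceptable bound) would test the claim directly; a non-zero result would mark a block where the heuristic and the scan disagree on encoded length.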

+16 -16
lib/rice.ml
@@ -35,7 +35,6 @@
   val write_bits : t -> int -> int -> unit
   val write_unary : t -> int -> unit
   val to_bytes : t -> bytes
-  val bit_length : t -> int
 end = struct
   type t = {
     mutable buf : bytes;
@@ -86,8 +85,6 @@
   let to_bytes t =
     let total_bytes = if t.bit_pos = 0 then t.byte_pos else t.byte_pos + 1 in
     Bytes.sub t.buf 0 total_bytes
-
-  let bit_length t = (t.byte_pos * 8) + t.bit_pos
 end
 
 (** Bit reader for decompression. *)
@@ -280,8 +277,10 @@
 (** kmax = 2^id_len - 3. *)
 let kmax_of_id_len id_len = (1 lsl id_len) - 3
 
-(** Select the optimal split parameter k for a block of residuals. k =
-    floor(log2(sum / J)) clamped to [0, kmax]. *)
+(** Select the optimal split parameter k for a block of residuals.
+    [k = floor(log2(sum / len))] clamped to [0, kmax] — the classic
+    Rice-coding heuristic that minimises the expected code length
+    for a geometric distribution. *)
 let select_k residuals ofs len kmax =
   let sum = ref 0 in
   for i = ofs to ofs + len - 1 do
@@ -359,18 +358,19 @@
   in
   Bitwriter.write_unary bw fs
 
+(* Pick the split parameter [k] and compute the resulting encoded
+   length. Per CCSDS 121.0-B-3 §5.1.3, k is floor(log2(block_sum / J))
+   clamped to [0, kmax] — that's what [select_k] returns. This is
+   exactly the Rice/Golomb optimum for the geometric distribution the
+   residuals follow, so no search is needed. *)
 let best_split_k residuals res_ofs count id_len bps is_ref kmax =
-  let best_k = ref 0 in
-  let best_len = ref max_int in
-  for k = 0 to kmax do
-    let l = split_encoded_len residuals res_ofs count k in
-    let total = id_len + (if is_ref then bps else 0) + l in
-    if total < !best_len then begin
-      best_k := k;
-      best_len := total
-    end
-  done;
-  (!best_k, !best_len)
+  let k = select_k residuals res_ofs count kmax in
+  let len =
+    id_len
+    + (if is_ref then bps else 0)
+    + split_encoded_len residuals res_ofs count k
+  in
+  (k, len)
 
 let encode_second_extension bw residuals res_ofs count bps id_len is_ref
     ref_sample =