HTTP types: headers, status codes, methods, bodies, MIME types

http, s3, requests: align canonicalize_value with boto3 + SDKs, add adversarial + interop tests

Researched what real SigV4 implementations do with quoted strings in
header values. boto3 (_header_value = ' '.join(v.split())), AWS Java
SDK v2 (SignerUtils.trimAll), and AWS Go SDK v2 (v4.stripExcessSpaces)
all treat the value as an opaque byte string and collapse every run
of ASCII whitespace to a single space -- no exception for double-
quoted segments. The AWS SigV4 spec itself says the same.

Our previous implementation followed RFC 7230 §3.2.3 and preserved
interior whitespace inside quoted strings. That silently produces
signatures that disagree with every major SDK the moment a header
like x-amz-meta-note carries a quoted value. Switch to the collapse-
all rule so we match the ecosystem.

- http: canonicalize_value uses a single pass matching Python's
[' '.join(v.split())] without any pre-trim so ASCII whitespace
(SP, HTAB, LF, CR, VT, FF) is handled consistently. mli cites
boto3 / SDK equivalents and documents the intentional deviation
from RFC 7230.

- ocaml-http adversarial tests: 22 cases covering empty, quoted
empties, unmatched quotes, backslashes in/out of quotes,
adjacent/multiple quoted segments, high-byte UTF-8, idempotence,
large inputs, many quoted segments, case preservation,
comma-list preservation.

- ocaml-http boto3 interop (new): scripts/generate.py dumps 36
(input, expected) pairs via botocore.auth.SigV4Auth._header_value
as hex-encoded CSV. test.ml checks Headers.canonicalize_value
byte-for-byte against them. Regenerate via dune build @regen-traces.

- ocaml-requests RFC 9421 signature module: 14 adversarial tests
covering whitespace irrelevance at outer bounds, significance of
interior whitespace, case sensitivity, content tampering,
multi-line combining, ;bs byte-sequence handling, cross-component
isolation, and end-to-end verify-side roundtrips.

- ocaml-s3: SigV4 test updated from 'quoted string preserved' to
'quoted string collapsed' to reflect the corrected behaviour.
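
The RFC 9421 bullet above treats interior whitespace as significant, which is the opposite of the SigV4 rule. The following is a minimal Python sketch of that contrast, assuming the field-value steps of RFC 9421 section 2.1 (strip each instance's outer whitespace, join instances with ", "); obsolete line folding is left out, and both function names are hypothetical, not part of ocaml-requests.

```python
def rfc9421_field_value(instances: list[str]) -> str:
    """Hypothetical helper: strip outer SP/HTAB per instance, join with ', '.

    Interior whitespace is left untouched, so it stays part of the signed bytes.
    """
    return ", ".join(v.strip(" \t") for v in instances)


def sigv4_header_value(value: str) -> str:
    """Collapse-all rule from this commit: every whitespace run becomes one SP."""
    return " ".join(value.split())


print(repr(rfc9421_field_value(["  a   b  "])))     # 'a   b'
print(repr(sigv4_header_value("  a   b  ")))        # 'a b'
print(repr(rfc9421_field_value([" gzip ", "br"])))  # 'gzip, br'
```

The same raw value therefore canonicalises differently depending on which signature scheme consumes it, which is why the two modules carry separate test suites.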

All 36 boto3 interop fixtures match byte-for-byte. All 16 RFC 9421
Appendix B vectors still pass.
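
To make the divergence described in this message concrete, here is a small Python sketch. `collapse_all` is the `' '.join(v.split())` rule quoted above; `preserve_quoted` is only a rough model of the previous quoted-string-preserving behaviour, written for illustration and not taken from the repository.

```python
def collapse_all(value: str) -> str:
    """Collapse every whitespace run to one SP, quotes or not."""
    return " ".join(value.split())


def preserve_quoted(value: str) -> str:
    """Rough model of the old rule: collapse SP/HTAB only outside quotes."""
    chars = value.strip()
    out = []
    in_quotes = False
    prev_space = False
    i = 0
    while i < len(chars):
        c = chars[i]
        if c == '"':
            out.append(c)
            in_quotes = not in_quotes
            prev_space = False
        elif c == "\\" and in_quotes and i + 1 < len(chars):
            # Keep the backslash and the escaped character verbatim.
            out.append(c)
            out.append(chars[i + 1])
            i += 1
            prev_space = False
        elif c in " \t" and not in_quotes:
            if not prev_space:
                out.append(" ")
            prev_space = True
        else:
            out.append(c)
            prev_space = False
        i += 1
    return "".join(out)


note = '  "hello   world"  '
print(repr(preserve_quoted(note)))  # '"hello   world"'
print(repr(collapse_all(note)))     # '"hello world"'
```

The two outputs differ by two bytes, which is enough to change the canonical request and therefore the signature.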

+277 -52
+22 -25
lib/headers.ml
···
   let count = List.length headers in
   Fmt.pf ppf "Headers(%d entries)" count

-(* RFC 7230 §3.2.3. *)
+(* Matches boto3's [' '.join(value.split())]: split on any ASCII
+   whitespace (SP, HTAB, LF, CR, VT, FF), rejoin with single SP.
+   AWS SDK for Java v2 ([SignerUtils.trimAll]) and AWS SDK for Go
+   v2 ([v4.stripExcessSpaces]) use the same rule. *)
+let is_ws c =
+  c = ' ' || c = '\t' || c = '\n' || c = '\r' || c = '\x0b' || c = '\x0c'
+
 let canonicalize_value v =
-  let trimmed = String.trim v in
-  let n = String.length trimmed in
+  (* [' '.join(v.split())] without allocating the intermediate list.
+     Never emit leading or trailing whitespace; between non-whitespace
+     bytes, emit one SP if any whitespace separated them. *)
+  let n = String.length v in
   let buf = Buffer.create n in
-  let in_quotes = ref false in
-  let prev_space = ref false in
-  let i = ref 0 in
-  while !i < n do
-    let c = trimmed.[!i] in
-    (match c with
-    | '"' ->
-        Buffer.add_char buf c;
-        in_quotes := not !in_quotes;
-        prev_space := false
-    | '\\' when !in_quotes && !i + 1 < n ->
-        Buffer.add_char buf c;
-        Buffer.add_char buf trimmed.[!i + 1];
-        incr i;
-        prev_space := false
-    | (' ' | '\t') when not !in_quotes ->
-        if not !prev_space then Buffer.add_char buf ' ';
-        prev_space := true
-    | c ->
-        Buffer.add_char buf c;
-        prev_space := false);
-    incr i
+  let pending_space = ref false in
+  let seen_non_ws = ref false in
+  for i = 0 to n - 1 do
+    let c = v.[i] in
+    if is_ws c then pending_space := !seen_non_ws
+    else begin
+      if !pending_space then Buffer.add_char buf ' ';
+      Buffer.add_char buf c;
+      pending_space := false;
+      seen_non_ws := true
+    end
   done;
   Buffer.contents buf

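
For readers who want to compare the new loop with the Python one-liner it mirrors, the following is a hedged Python port of the single-pass logic. It works on `bytes` rather than `str` so that the whitespace set is exactly the six ASCII bytes named in the comment (`str.split()` recognises a wider set). The constant name `ASCII_WS` is an assumption made for this sketch.

```python
ASCII_WS = b" \t\n\r\x0b\x0c"  # SP, HTAB, LF, CR, VT, FF


def canonicalize_value(value: bytes) -> bytes:
    """Drop leading/trailing whitespace; emit one SP per interior run."""
    out = bytearray()
    pending_space = False  # whitespace seen since the last emitted byte
    seen_non_ws = False    # stays False while we are still in leading whitespace
    for b in value:
        if b in ASCII_WS:
            pending_space = seen_non_ws
        else:
            if pending_space:
                out.append(0x20)
            out.append(b)
            pending_space = False
            seen_non_ws = True
    return bytes(out)


samples = [b"", b"   ", b"  foo \t\n bar  ", b'"a\tb"', b"\xc3\xa9\t\xc3\xa9"]
for s in samples:
    # bytes.split() with no argument splits on the same six ASCII bytes.
    assert canonicalize_value(s) == b" ".join(s.split())
```

Trailing whitespace sets `pending_space` but never reaches an emit, so no separate trim step is needed.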
+15 -11
lib/headers.mli
···
 (** {1 Value canonicalisation} *)

 val canonicalize_value : string -> string
-(** [canonicalize_value v] applies the RFC 7230 "conditional whitespace"
-    normalisation used by HTTP header canonicalisation and consumers such as AWS
-    SigV4 signing:
+(** [canonicalize_value v] produces the whitespace-normalised header value used
+    when two implementations must agree on an exact byte representation (SigV4
+    canonical headers, Vary-keyed cache lookups, …):

-    - Leading and trailing whitespace is trimmed.
-    - Outside a double-quoted string, runs of SP / HTAB collapse to a single SP.
-    - Inside a double-quoted string, whitespace is preserved verbatim;
-      backslash-escaped characters are kept as-is.
+    - Leading and trailing SP / HTAB is trimmed.
+    - Every interior run of SP / HTAB collapses to a single SP.

-    Used when two implementations must agree on a header value's exact byte
-    representation (e.g. comparing the headers they included in a signature).
+    Matches the behaviour of {{:https://github.com/boto/botocore}botocore}'s
+    [_canonical_header_values], AWS SDK for Java v2's [SignerUtils.trimAll], and
+    AWS SDK for Go v2's [v4.stripExcessSpaces]: ["' '.join(v.split())"]
+    semantics with no exception for double-quoted strings. The AWS SigV4
+    specification itself — "Convert sequential spaces in header values to a
+    single space" — does not carve out quoted strings either, so honouring the
+    RFC 7230 §3.2.3 quoted-string exception would produce signatures that
+    disagree with every major SDK.

     {[
-      canonicalize_value " \"hello world\" " = "\"hello world\"";
-      canonicalize_value " foo bar " = "foo bar"
+      canonicalize_value " foo bar " = "foo bar";
+      canonicalize_value " \"hello world\" " = "\"hello world\""
     ]} *)

 (** {1 HTTP/2 Pseudo-Header Support}
+3
test/interop/boto3/.gitignore
···
+.venv/
+__pycache__/
+*.pyc
+17
test/interop/boto3/dune
···
+(test
+ (name test)
+ (libraries http csvt alcotest fmt)
+ (deps
+  (source_tree traces)
+  (source_tree scripts)))
+
+; Regenerate traces against botocore: dune build @regen-traces
+
+(rule
+ (alias regen-traces)
+ (deps
+  (source_tree scripts))
+ (action
+  (chdir
+   scripts
+   (run ./generate.sh))))
+84
test/interop/boto3/scripts/generate.py
···
+#!/usr/bin/env python3
+"""Generate canonicalisation fixtures using botocore.
+
+The oracle is ``botocore.auth.SigV4Auth._header_value``, which AWS
+SigV4 and RFC 7230-compatible consumers use to turn a raw header
+value into its canonical comparison form. AWS Java SDK v2
+(``SignerUtils.trimAll``) and AWS Go SDK v2 (``v4.stripExcessSpaces``)
+apply the same rule.
+
+For each input string we record the exact byte output that boto3
+produces; our ``Http.Headers.canonicalize_value`` must match it
+byte-for-byte.
+"""
+
+import csv
+import os
+import sys
+
+from botocore.auth import SigV4Auth
+
+
+def canonicalize(value: str) -> str:
+    return SigV4Auth._header_value(None, value)
+
+
+FIXTURES = [
+    ("empty", ""),
+    ("single-space", " "),
+    ("only-spaces", "   "),
+    ("only-tabs", "\t\t\t"),
+    ("mixed-whitespace", " \t \t "),
+    ("plain", "foo"),
+    ("leading-space", "  foo"),
+    ("trailing-space", "foo   "),
+    ("leading-trailing", "  foo  "),
+    ("interior-runs", "foo   bar"),
+    ("tabs-collapse", "foo\t\tbar"),
+    ("mixed-ws-runs", "foo \t \t bar"),
+    ("comma-list", "gzip,  deflate,    br"),
+    ("quoted-plain", "\"hello world\""),
+    ("quoted-interior-ws", "\"hello  world\""),
+    ("quoted-leading-ws-inside", "\"  hello\""),
+    ("quoted-tabs-inside", "\"a\tb\""),
+    ("around-quoted", "  \"a b\"  "),
+    ("unmatched-open-quote", "\"foo bar"),
+    ("unmatched-close-quote", "foo\"   bar"),
+    ("adjacent-quoted", "\"a\"\"b\""),
+    ("multi-quoted-spaced", "\"a\"   \"b\""),
+    ("backslash-in-quotes", "\"a\\\"b\""),
+    ("double-backslash", "\"\\\\\""),
+    ("utf8-plain", "\u00e9"),  # é
+    ("utf8-with-space", "  \u00e9\t\u00e9  "),
+    ("newline-interior", "foo\nbar"),
+    ("cr-interior", "foo\rbar"),
+    ("vt-interior", "foo\x0bbar"),
+    ("ff-interior", "foo\x0cbar"),
+    ("mixed-control-ws", "foo \t\n\r\x0b\x0c bar"),
+    ("case-preserved", "HeLLo WoRLd"),
+    ("commas-preserved", "a,b,c"),
+    ("padded-commas", "a , b , c"),
+    ("single-char", "x"),
+    ("large-run", " " * 200 + "x" + " " * 200),
+]
+
+
+def main():
+    if len(sys.argv) != 2:
+        print("usage: generate.py <trace-dir>", file=sys.stderr)
+        sys.exit(2)
+    trace_dir = sys.argv[1]
+    os.makedirs(trace_dir, exist_ok=True)
+    out_path = os.path.join(trace_dir, "canonicalize.csv")
+    with open(out_path, "w", newline="") as f:
+        w = csv.writer(f)
+        w.writerow(["name", "input_hex", "expected_hex"])
+        for name, raw in FIXTURES:
+            raw_b = raw.encode("utf-8")
+            out_b = canonicalize(raw).encode("utf-8")
+            w.writerow([name, raw_b.hex(), out_b.hex()])
+    print(f"Wrote {len(FIXTURES)} fixtures to {out_path}")
+
+
+if __name__ == "__main__":
+    main()
+14
test/interop/boto3/scripts/generate.sh
···
+#!/bin/bash
+# Regenerate canonicalize.csv against botocore: dune build @regen-traces
+
+set -euo pipefail
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+TRACE_DIR="$(cd "$SCRIPT_DIR/../traces" && pwd)"
+
+cd "$SCRIPT_DIR"
+if [ ! -d .venv ]; then
+  python3 -m venv .venv
+  .venv/bin/pip install --quiet --upgrade pip
+  .venv/bin/pip install --quiet -r requirements.txt
+fi
+.venv/bin/python3 generate.py "$TRACE_DIR"
+1
test/interop/boto3/scripts/requirements.txt
···
+botocore==1.35.0
+73
test/interop/boto3/test.ml
···
+(** boto3 interop tests for {!Http.Headers.canonicalize_value}.
+
+    Fixtures are raw (input, expected) pairs produced by
+    [botocore.auth.SigV4Auth._header_value], which is the single function every
+    AWS SDK uses to turn a raw header value into its canonical comparison form.
+    Our implementation must match it byte-for-byte; divergence here is a
+    signature-validation bug waiting to happen.
+
+    Regenerate: [dune build @regen-traces]. *)
+
+module Headers = Http.Headers
+
+let trace_path = Filename.concat "traces" "canonicalize.csv"
+
+type fixture = { name : string; input : string; expected : string }
+
+let hex_to_bytes hex =
+  let n = String.length hex in
+  if n mod 2 <> 0 then invalid_arg (Fmt.str "odd-length hex: %S" hex);
+  let b = Bytes.create (n / 2) in
+  for i = 0 to (n / 2) - 1 do
+    let h = int_of_string (Fmt.str "0x%s" (String.sub hex (2 * i) 2)) in
+    Bytes.set_uint8 b i h
+  done;
+  Bytes.unsafe_to_string b
+
+let fixture_codec =
+  Csvt.Row.(
+    obj (fun name input_hex expected_hex ->
+        {
+          name;
+          input = hex_to_bytes input_hex;
+          expected = hex_to_bytes expected_hex;
+        })
+    |> col "name" Csvt.string ~enc:(fun r -> r.name)
+    |> col "input_hex" Csvt.string ~enc:(fun _ -> "")
+    |> col "expected_hex" Csvt.string ~enc:(fun _ -> "")
+    |> finish)
+
+let load_fixtures () =
+  match Csvt.decode_file fixture_codec trace_path with
+  | Ok rows -> rows
+  | Error e -> Alcotest.failf "CSV decode: %a" Csvt.pp_error e
+
+let quote_bytes s =
+  let buf = Buffer.create (String.length s + 2) in
+  Buffer.add_char buf '|';
+  String.iter
+    (fun c ->
+      if c >= ' ' && c <= '~' && c <> '\\' && c <> '|' then
+        Buffer.add_char buf c
+      else Buffer.add_string buf (Fmt.str "\\x%02x" (Char.code c)))
+    s;
+  Buffer.add_char buf '|';
+  Buffer.contents buf
+
+let check_fixture (f : fixture) () =
+  let got = Headers.canonicalize_value f.input in
+  Alcotest.(check string)
+    (Fmt.str "%s: input=%s" f.name (quote_bytes f.input))
+    f.expected got
+
+let suite =
+  let fixtures = load_fixtures () in
+  ( "canonicalize_value",
+    List.map
+      (fun f ->
+        Alcotest.test_case
+          (Fmt.str "fixture %s" f.name)
+          `Quick (check_fixture f))
+      fixtures )
+
+let () = Alcotest.run "http-boto3-interop" [ suite ]
+37
test/interop/boto3/traces/canonicalize.csv
···
+name,input_hex,expected_hex
+empty,,
+single-space,20,
+only-spaces,202020,
+only-tabs,090909,
+mixed-whitespace,2009200920,
+plain,666f6f,666f6f
+leading-space,2020666f6f,666f6f
+trailing-space,666f6f202020,666f6f
+leading-trailing,2020666f6f2020,666f6f
+interior-runs,666f6f202020626172,666f6f20626172
+tabs-collapse,666f6f0909626172,666f6f20626172
+mixed-ws-runs,666f6f2009200920626172,666f6f20626172
+comma-list,677a69702c20206465666c6174652c202020206272,677a69702c206465666c6174652c206272
+quoted-plain,2268656c6c6f20776f726c6422,2268656c6c6f20776f726c6422
+quoted-interior-ws,2268656c6c6f2020776f726c6422,2268656c6c6f20776f726c6422
+quoted-leading-ws-inside,22202068656c6c6f22,222068656c6c6f22
+quoted-tabs-inside,2261096222,2261206222
+around-quoted,202022612062222020,2261206222
+unmatched-open-quote,22666f6f20626172,22666f6f20626172
+unmatched-close-quote,666f6f22202020626172,666f6f2220626172
+adjacent-quoted,226122226222,226122226222
+multi-quoted-spaced,226122202020226222,22612220226222
+backslash-in-quotes,22615c226222,22615c226222
+double-backslash,225c5c22,225c5c22
+utf8-plain,c3a9,c3a9
+utf8-with-space,2020c3a909c3a92020,c3a920c3a9
+newline-interior,666f6f0a626172,666f6f20626172
+cr-interior,666f6f0d626172,666f6f20626172
+vt-interior,666f6f0b626172,666f6f20626172
+ff-interior,666f6f0c626172,666f6f20626172
+mixed-control-ws,666f6f20090a0d0b0c20626172,666f6f20626172
+case-preserved,48654c4c6f20576f524c64,48654c4c6f20576f524c64
+commas-preserved,612c622c63,612c622c63
+padded-commas,61202c2062202c2063,61202c2062202c2063
+single-char,78,78
large-run,2020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020782020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020,78
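
The hex columns can be spot-checked without botocore. The sketch below is a hypothetical standalone check, not part of the test suite: it embeds three rows copied from canonicalize.csv, decodes them with `bytes.fromhex`, and compares each expected value against the collapse-all rule applied to the raw bytes.

```python
import csv
import io

# Three rows copied from canonicalize.csv (header line included).
TRACE = """name,input_hex,expected_hex
interior-runs,666f6f202020626172,666f6f20626172
quoted-tabs-inside,2261096222,2261206222
utf8-with-space,2020c3a909c3a92020,c3a920c3a9
"""


def check_rows(text: str) -> int:
    """Decode each hex pair and compare it against b' '.join(raw.split())."""
    checked = 0
    for row in csv.DictReader(io.StringIO(text)):
        raw = bytes.fromhex(row["input_hex"])
        expected = bytes.fromhex(row["expected_hex"])
        assert b" ".join(raw.split()) == expected, row["name"]
        checked += 1
    return checked


print(check_rows(TRACE))  # 3
```

Hex encoding keeps tabs, control bytes, and multi-byte UTF-8 unambiguous in the CSV, which plain quoting would not.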
+11 -16
test/test_headers.ml
···
   Alcotest.(check string) "empty quoted string survives" "\"\"" (c "\"\"")

 let test_canonicalize_quoted_only_whitespace () =
-  Alcotest.(check string)
-    "whitespace inside quotes is preserved verbatim" "\" \"" (c "\" \"");
-  Alcotest.(check string)
-    "tab inside quotes is preserved verbatim" "\"\t\"" (c "\"\t\"")
+  (* Quote characters are opaque bytes; interior whitespace collapses
+     like any other, matching boto3 [' '.join(v.split())]. *)
+  Alcotest.(check string) "quoted whitespace collapses" "\" \"" (c "\" \"");
+  Alcotest.(check string) "quoted tab collapses" "\" \"" (c "\"\t\"")

 let test_canonicalize_unmatched_open_quote () =
-  (* Invalid per RFC 7230 but must not crash or produce garbage; the
-     open quote leaves the state machine in "in-quotes" mode so the
-     tail is preserved verbatim. *)
   Alcotest.(check string)
-    "unmatched open quote preserves tail" "\"foo bar" (c "\"foo bar")
+    "unmatched open quote does not affect collapse" "\"foo bar" (c "\"foo bar")

 let test_canonicalize_unmatched_close_quote () =
-  (* The state machine treats a double-quote as a toggle regardless of
-     position, so a lone quote mid-value opens a pseudo-quoted region that
-     runs to the end of input. Undefined per RFC 7230 for malformed values;
-     lock in the deterministic behaviour so regressions show up. *)
   Alcotest.(check string)
-    "stray open quote freezes state to end of value" "foo\" bar"
-    (c "foo\" bar")
+    "stray quote does not affect collapse" "foo\" bar" (c "foo\" bar")

 let test_canonicalize_escaped_backslash () =
   Alcotest.(check string)
-    "backslash-backslash survives inside quotes" "\"\\\\\"" (c "\"\\\\\"")
+    "backslash-backslash is literal" "\"\\\\\"" (c "\"\\\\\"")

 let test_canonicalize_escaped_tab_in_quotes () =
+  (* Backslash is an opaque byte; the tab after it is still
+     whitespace and still collapses. *)
   Alcotest.(check string)
-    "backslash-escaped tab survives" "\"a\\\tb\"" (c "\"a\\\tb\"")
+    "tab collapses regardless of preceding backslash" "\"a\\ b\""
+    (c "\"a\\\tb\"")

 let test_canonicalize_trailing_backslash_in_quotes () =
   (* Backslash at end of input while still in quotes: the branch