working sha256

+1401 -845
+18
LICENSE.md
```text
(*
 * ISC License
 *
 * Copyright (c) 2025 Anil Madhavapeddy <anil@recoil.org>
 *
 * Permission to use, copy, modify, and distribute this software for any
 * purpose with or without fee is hereby granted, provided that the above
 * copyright notice and this permission notice appear in all copies.
 *
 * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 *
 *)
```
+163 -1
README.md
```diff
-A sha256 experiment
+# Oxsha - Fast SHA256 Hashing for OCaml
+
+A high-performance SHA256 hashing library for OCaml with zero-copy C bindings using bigarrays.
+
+## Features
+
+- **Zero-copy performance**: Uses bigarrays for efficient data transfer to C (the bytes and string variants copy into a bigarray first)
+- **Hardware acceleration**: Builds with CPU SHA extensions (Intel SHA-NI, ARM Crypto) enabled on x86 and ARM64 targets, selected at build time
+- **Streaming API**: Incremental hashing with the create/update/final pattern
+- **Multiple interfaces**: Support for bigarrays, bytes, and strings
+- **Memory-mapped files**: sha256sum example uses `Unix.map_file` for true zero-copy file hashing
+- **Minimal dependencies**: No runtime dependencies beyond the OCaml standard library (`dune-configurator` is used at build time)
+- **Well-documented**: odoc comments for every function in `lib/oxsha.mli`
+
+## Installation
+
+```bash
+opam install oxsha
+```
+
+Or build from source:
+
+```bash
+dune build
+dune install
+```
+
+## Quick Start
+
+```ocaml
+(* One-shot hashing; hex_of_bytes is the helper defined in bin/osha256sum.ml *)
+let digest = Oxsha.hash_string "Hello, World!"
+let () = Printf.printf "SHA256: %s\n" (hex_of_bytes digest)
+
+(* Streaming API for large data *)
+let ctx = Oxsha.create ()
+let () = Oxsha.update_string ctx "Hello, "
+let () = Oxsha.update_string ctx "World!"
+let digest = Oxsha.final ctx
+let () = Printf.printf "SHA256: %s\n" (hex_of_bytes digest)
+```
+
+## API Overview
+
+### Low-Level API (`Oxsha.Raw`)
+
+The Raw module provides direct access to the C implementation:
+
+- `create : unit -> t` - Create a new SHA256 context
+- `update : t -> bigarray -> unit` - Update with bigarray data (zero-copy)
+- `update_bytes : t -> bytes -> unit` - Update with bytes
+- `update_string : t -> string -> unit` - Update with string
+- `final : t -> bytes` - Finalize and get 32-byte digest
+- `hash : bigarray -> bytes` - One-shot hash for bigarrays
+- `hash_bytes : bytes -> bytes` - One-shot hash for bytes
+- `hash_string : string -> bytes` - One-shot hash for strings
+
+### High-Level API
+
+All Raw module functions are re-exported at the top level for convenience:
+
+```ocaml
+let ctx = Oxsha.create ()
+let () = Oxsha.update_string ctx "data"
+let digest = Oxsha.final ctx
+```
+
+## Performance Considerations
+
+For maximum performance:
+1. Use the bigarray API directly when possible to avoid copying
+2. Use the streaming API for large files to avoid loading everything in memory
+3. Use `Unix.map_file` for hashing large files (see sha256sum example)
+4. The C implementation is optimized and allocation-free
+5. Hardware SHA extensions are enabled at build time on x86 and ARM64 targets
+
+### Hardware Acceleration
+
+At build time, the target architecture is detected and the matching compiler flags for hardware SHA acceleration are enabled:
+
+- **x86/x86_64**: Uses Intel SHA Extensions (`-msse4.1 -msha`)
+- **ARM64/AArch64**: Uses ARM Crypto Extensions (`-march=armv8-a+crypto`)
+
+This is handled transparently by a dune configurator script in `lib/discover/`.
+
+## Examples
+
+### Basic Usage
+
+See `examples/basic_usage.ml` for complete examples:
+
+```bash
+dune exec examples/basic_usage.exe
+```
+
+### SHA256sum Utility
+
+A drop-in replacement for the `sha256sum` command that uses memory-mapped files for zero-copy hashing:
+
+```bash
+# Hash one or more files
+dune exec examples/sha256sum.exe -- README.md lib/oxsha.mli
+
+# Output format is identical to sha256sum
+5663d62d52903366546603da52d18ccbf36ef7265653b641b980ec36891c7afe  README.md
+b4cbb3d0d18b90cc63c0e3e8c95f4e933d1a361a5eae142e64caf17724a1447f  lib/oxsha.mli
+```
+
+The sha256sum example demonstrates true zero-copy file hashing by memory-mapping files directly into bigarrays.
+
+## Building
+
+```bash
+# Clean build
+opam exec -- dune clean
+
+# Build library
+opam exec -- dune build @check
+
+# Build documentation
+opam exec -- dune build @doc
+
+# Build ignoring warnings (release mode)
+opam exec -- dune build @check --profile=release
+```
+
+## Project Structure
+
+```
+oxsha/
+├── lib/
+│   ├── oxsha.ml          # OCaml implementation
+│   ├── oxsha.mli         # Public interface
+│   ├── oxsha_stubs.c     # C FFI bindings
+│   ├── sha256.c          # SHA256 C implementation
+│   ├── sha256.h          # C header
+│   ├── dune              # Build rules
+│   └── discover/
+│       ├── discover.ml   # Architecture detection for C flags
+│       └── dune          # Configurator build rules
+├── examples/
+│   ├── basic_usage.ml    # API usage examples
+│   ├── sha256sum.ml      # sha256sum utility with mmap
+│   └── dune
+├── dune-project          # Project metadata
+└── README.md
+```
+
+## C Implementation
+
+The library uses a SHA256 implementation based on Brad Conte's public domain code, extended with Intel SHA and ARMv8 intrinsics paths. The C context is stored inline in an OCaml custom block, whose finalizer zeroes it when the block is collected.
+
+### Build-Time Configuration
+
+The build system uses `dune-configurator` to detect the CPU architecture and automatically add the appropriate compiler flags for hardware SHA acceleration. The configurator script (`lib/discover/discover.ml`) runs during the build and generates a `c_flags.sexp` file that dune includes in the C compilation flags.
+
+## License
+
+ISC License
+
+## Contributing
+
+Contributions welcome! Please ensure all tests pass before submitting PRs.
```
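The README recommends the streaming API for large files, so that memory use stays bounded. Below is a minimal sketch of that pattern. It assumes only the `update_bytes : t -> bytes -> unit` signature listed in the API overview. `feed_channel` is a hypothetical helper, not part of Oxsha: it reads an `in_channel` in fixed-size chunks and passes each chunk to an update callback.

```ocaml
(* Hypothetical helper, not part of Oxsha: read [ic] in chunks of at most
   [chunk_size] bytes and pass each chunk to [update]. The buffer is reused
   between calls, so [update] must consume the chunk before returning (the
   update_bytes in lib/oxsha.ml copies its input, so it qualifies). A short
   read is trimmed with Bytes.sub because update_bytes hashes the whole value. *)
let feed_channel ?(chunk_size = 65536) (update : bytes -> unit) ic =
  if chunk_size <= 0 then invalid_arg "feed_channel: chunk_size must be positive";
  let buf = Bytes.create chunk_size in
  let rec loop () =
    (* input returns 0 only at end of file *)
    let n = input ic buf 0 chunk_size in
    if n > 0 then begin
      update (if n = chunk_size then buf else Bytes.sub buf 0 n);
      loop ()
    end
  in
  loop ()

(* Intended use with the library (not compiled here):
     let ctx = Oxsha.create () in
     In_channel.with_open_bin path (feed_channel (Oxsha.update_bytes ctx));
     let digest = Oxsha.final ctx in ... *)
```

Passing the full buffer without copying when a read fills it avoids one allocation per chunk. This is only safe because the callback does not keep a reference to the buffer.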
-173
bench/bench_sha256.ml
```diff
-open Sha256
-
-(* Memory allocation tracking *)
-let measure_allocations f =
-  let before = Gc.allocated_bytes () in
-  let result = f () in
-  let after = Gc.allocated_bytes () in
-  (result, after -. before)
-
-(* Benchmark different scenarios *)
-let bench_sizes () =
-  print_endline "Benchmarking various input sizes:";
-  print_endline "Size (B) | Iterations | Time (s) | Throughput (MB/s) | Allocations (B)";
-  print_endline "---------|------------|----------|-------------------|----------------";
-
-  let sizes = [
-    (16, 100000);
-    (64, 100000);
-    (256, 50000);
-    (1024, 20000);
-    (4096, 5000);
-    (16384, 1000);
-    (65536, 250);
-    (262144, 60);
-    (1048576, 15);
-  ] in
-
-  List.iter (fun (size, iterations) ->
-    let data = String.make size 'x' in
-
-    (* Warmup *)
-    for _ = 1 to 10 do
-      ignore (hash_string data)
-    done;
-
-    (* Benchmark *)
-    let start = Unix.gettimeofday () in
-    let _, allocs = measure_allocations (fun () ->
-      for _ = 1 to iterations do
-        ignore (hash_string data)
-      done
-    ) in
-    let elapsed = Unix.gettimeofday () -. start in
-
-    let throughput = (float_of_int (size * iterations)) /. elapsed /. 1_000_000.0 in
-    let allocs_per_op = allocs /. float_of_int iterations in
-
-    Printf.printf "%8d | %10d | %8.3f | %17.1f | %14.0f\n"
-      size iterations elapsed throughput allocs_per_op
-  ) sizes
-
-let bench_parallel_scaling () =
-  print_endline "\nParallel scaling benchmark:";
-  print_endline "Threads | Hashes | Time (s) | Hashes/sec | Speedup";
-  print_endline "--------|--------|----------|------------|--------";
-
-  let num_hashes = 10000 in
-  let data_size = 1024 in
-  let inputs = List.init num_hashes (fun i ->
-    Bytes.of_string (String.make data_size (Char.chr (65 + (i mod 26))))
-  ) in
-
-  (* Sequential baseline *)
-  let start_seq = Unix.gettimeofday () in
-  let _ = List.map hash_bytes inputs in
-  let time_seq = Unix.gettimeofday () -. start_seq in
-  let hashes_per_sec_seq = float_of_int num_hashes /. time_seq in
-
-  Printf.printf "%7d | %6d | %8.3f | %10.0f | %7.2fx\n"
-    1 num_hashes time_seq hashes_per_sec_seq 1.0;
-
-  (* Parallel with different thread counts *)
-  let thread_counts = [2; 4; 8] in
-  List.iter (fun threads ->
-    (* Simulate parallel execution with multiple Parallel.fork_join2 calls *)
-    let par = Parallel.create () in
-    let chunk_size = num_hashes / threads in
-
-    let start_par = Unix.gettimeofday () in
-
-    (* Process in parallel chunks *)
-    let rec process_chunks remaining acc =
-      match remaining with
-      | [] -> acc
-      | chunk :: [] -> (List.map hash_bytes chunk) :: acc
-      | chunk1 :: chunk2 :: rest ->
-        let r1, r2 = Parallel.fork_join2 par
-          (fun _ -> List.map hash_bytes chunk1)
-          (fun _ -> List.map hash_bytes chunk2)
-        in
-        process_chunks rest (r2 :: r1 :: acc)
-    in
-
-    (* Split inputs into chunks *)
-    let rec split_into_chunks lst n acc =
-      if n <= 0 || lst = [] then List.rev acc
-      else
-        let rec take k lst acc =
-          if k = 0 || lst = [] then (List.rev acc, lst)
-          else match lst with
-            | h::t -> take (k-1) t (h::acc)
-            | [] -> (List.rev acc, [])
-        in
-        let (chunk, rest) = take chunk_size lst [] in
-        split_into_chunks rest (n-1) (chunk :: acc)
-    in
-
-    let chunks = split_into_chunks inputs threads [] in
-    let _ = process_chunks chunks [] in
-
-    let time_par = Unix.gettimeofday () -. start_par in
-    let hashes_per_sec_par = float_of_int num_hashes /. time_par in
-    let speedup = time_seq /. time_par in
-
-    Printf.printf "%7d | %6d | %8.3f | %10.0f | %7.2fx\n"
-      threads num_hashes time_par hashes_per_sec_par speedup
-  ) thread_counts
-
-let bench_zero_allocation () =
-  print_endline "\nZero-allocation verification:";
-
-  (* Create aligned buffer *)
-  let size = 1024 in
-  let buffer = Bigarray.Array1.create Bigarray.int8_unsigned Bigarray.c_layout size in
-  for i = 0 to size - 1 do
-    Bigarray.Array1.set buffer i (65 + (i mod 26))
-  done;
-
-  (* Measure allocations for direct oneshot call *)
-  Gc.full_major ();
-  let before = Gc.allocated_bytes () in
-
-  for _ = 1 to 1000 do
-    ignore (oneshot buffer (Int64.of_int size))
-  done;
-
-  let after = Gc.allocated_bytes () in
-  let allocs_per_hash = (after -. before) /. 1000.0 in
-
-  Printf.printf " Direct oneshot (bigarray): %.1f bytes/hash\n" allocs_per_hash;
-
-  (* Compare with string version *)
-  let str = String.make size 'x' in
-  Gc.full_major ();
-  let before_str = Gc.allocated_bytes () in
-
-  for _ = 1 to 1000 do
-    ignore (hash_string str)
-  done;
-
-  let after_str = Gc.allocated_bytes () in
-  let allocs_per_hash_str = (after_str -. before_str) /. 1000.0 in
-
-  Printf.printf " String wrapper: %.1f bytes/hash\n" allocs_per_hash_str;
-
-  if allocs_per_hash < 100.0 then
-    print_endline " ✓ Near-zero allocation achieved!"
-  else
-    print_endline " ⚠ Higher than expected allocations"
-
-let () =
-  print_endline "SHA256 Performance Benchmark Suite";
-  print_endline "===================================\n";
-
-  (* Check CPU support *)
-  print_endline "System Information:";
-  Printf.printf " OCaml version: %s\n" Sys.ocaml_version;
-  Printf.printf " Word size: %d bits\n" Sys.word_size;
-  Printf.printf " OS: %s\n\n" Sys.os_type;
-
-  bench_sizes ();
-  bench_parallel_scaling ();
-  bench_zero_allocation ()
```
-4
bench/dune
```diff
-(executable
- (name bench_sha256)
- (libraries sha256 unix)
- (modes native))
```
+3
bin/dune
```dune
(executable
 (name osha256sum)
 (libraries oxsha unix))
```
+53
bin/osha256sum.ml
```ocaml
let hex_of_bytes bytes =
  let buf = Buffer.create (Bytes.length bytes * 2) in
  Bytes.iter
    (fun c -> Buffer.add_string buf (Printf.sprintf "%02x" (Char.code c)))
    bytes;
  Buffer.contents buf

let hash_file filename =
  try
    let fd = Unix.openfile filename [ Unix.O_RDONLY ] 0 in
    let stats = Unix.fstat fd in
    let file_size = stats.Unix.st_size in

    if file_size = 0 then (
      (* Handle empty files *)
      Unix.close fd;
      let digest = Oxsha.hash_string "" in
      Ok (hex_of_bytes digest)
    ) else (
      let mapped =
        Unix.map_file fd Bigarray.char Bigarray.c_layout false [| file_size |]
      in
      let ba = Bigarray.array1_of_genarray mapped in
      Unix.close fd;

      let digest = Oxsha.hash ba in
      Ok (hex_of_bytes digest)
    )
  with e -> Error e

let () =
  if Array.length Sys.argv < 2 then (
    Printf.eprintf "Usage: %s FILE [FILE...]\n" Sys.argv.(0);
    Printf.eprintf "Print SHA256 (256-bit) checksums.\n";
    exit 1
  );

  let exit_code = ref 0 in

  for i = 1 to Array.length Sys.argv - 1 do
    let filename = Sys.argv.(i) in
    match hash_file filename with
    | Ok hash -> Printf.printf "%s  %s\n" hash filename
    | Error (Sys_error msg) ->
      Printf.eprintf "%s: %s\n" Sys.argv.(0) msg;
      exit_code := 1
    | Error e ->
      Printf.eprintf "%s: %s: %s\n" Sys.argv.(0) filename
        (Printexc.to_string e);
      exit_code := 1
  done;

  exit !exit_code
```
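`hex_of_bytes` above calls `Printf.sprintf` once per byte. That is fine for a 32-byte digest, but it allocates a short string for every byte. The table-lookup variant below is a sketch with the same `bytes -> string` type and the same lowercase output. It is an illustration, not code from this commit.

```ocaml
(* Sketch of an alternative hex_of_bytes: split each byte into two nibbles
   and look each one up in a constant table, building the result with a
   single String.init instead of one Printf.sprintf call per byte. *)
let hex_digits = "0123456789abcdef"

let hex_of_bytes (b : bytes) : string =
  String.init (2 * Bytes.length b) (fun i ->
      let byte = Char.code (Bytes.get b (i / 2)) in
      (* Even output positions take the high nibble, odd ones the low nibble. *)
      let nibble = if i land 1 = 0 then byte lsr 4 else byte land 0x0f in
      hex_digits.[nibble])
```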
+12 -9
dune-project
```diff
-(lang dune 3.0)
+(lang dune 3.20)
 (name oxsha)
-(version 0.1.0)
+(version 0.1)
+
+(generate_opam_files true)
+
+(source (github avsm/oxsha))
+(license ISC)
+(authors "Anil Madhavapeddy")
+(maintainers "Anil Madhavapeddy")
 
 (package
  (name oxsha)
- (synopsis "Blazingly fast SHA256 using AMD SHA-NI instructions")
- (description "Hardware-accelerated SHA256 implementation for OxCaml using AMD SHA-NI instructions with zero-allocation design")
- (depends
-  ocaml
-  (dune (>= 3.0))
-  bigarray
-  parallel))
+ (synopsis "Fast SHA256 hashing library")
+ (description "OCaml bindings to a C SHA256 implementation using bigarrays for efficient, zero-copy hashing")
+ (depends (ocaml (>= 5.3))))
```
+20
lib/discover/discover.ml
```ocaml
(** Dune configurator to detect architecture and set C compiler flags for SHA256 *)

module C = Configurator.V1

let get_arch_flags c =
  let arch = C.ocaml_config_var_exn c "architecture" in
  let base_flags = ["-O3"] in
  let arch_flags =
    match arch with
    | "arm64" | "aarch64" ->
      ["-march=armv8-a+crypto"]
    | "amd64" | "i386" ->
      ["-msse4.1"; "-msha"]
    | _ -> []
  in
  base_flags @ arch_flags

let () =
  C.main ~name:"oxsha_discover"
    (fun c -> C.Flags.write_sexp "c_flags.sexp" (get_arch_flags c))
```
+3
lib/discover/dune
```dune
(executable
 (name discover)
 (libraries dune-configurator))
```
+13 -5
lib/dune
```diff
+(rule
+ (targets c_flags.sexp)
+ (deps discover/discover.exe)
+ (action
+  (run %{deps})))
+
 (library
- (name sha256)
+ (name oxsha)
  (public_name oxsha)
- (libraries bigarray parallel)
+ (modules oxsha)
  (foreign_stubs
   (language c)
-  (names sha256_stubs)
-  (flags :standard -msha -msse4.1 -O3 -march=native))
- (modes native))
+  (names oxsha_stubs sha256)
+  (flags
+   (:standard
+    (:include c_flags.sexp))))
+ (c_library_flags :standard))
```
+56
lib/oxsha.ml
```ocaml
(** Fast SHA256 hashing library with zero-copy C bindings. *)

module Raw = struct
  (** The SHA256 context type wrapping the C SHA256_CTX structure. *)
  type t

  (** External C functions *)
  external create : unit -> t = "oxsha_create"

  external update :
    t ->
    (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t ->
    unit
    = "oxsha_update"

  external final : t -> bytes = "oxsha_final"

  (** Convenience function: update with bytes *)
  let update_bytes ctx data =
    let len = Bytes.length data in
    let ba = Bigarray.Array1.create Bigarray.char Bigarray.c_layout len in
    for i = 0 to len - 1 do
      Bigarray.Array1.unsafe_set ba i (Bytes.unsafe_get data i)
    done;
    update ctx ba

  (** Convenience function: update with string *)
  let update_string ctx data =
    let len = String.length data in
    let ba = Bigarray.Array1.create Bigarray.char Bigarray.c_layout len in
    for i = 0 to len - 1 do
      Bigarray.Array1.unsafe_set ba i (String.unsafe_get data i)
    done;
    update ctx ba

  (** One-shot hash function for bigarrays *)
  let hash data =
    let ctx = create () in
    update ctx data;
    final ctx

  (** One-shot hash function for bytes *)
  let hash_bytes data =
    let ctx = create () in
    update_bytes ctx data;
    final ctx

  (** One-shot hash function for strings *)
  let hash_string data =
    let ctx = create () in
    update_string ctx data;
    final ctx
end

(** Re-export Raw module contents at top level *)
include Raw
```
+89
lib/oxsha.mli
```ocaml
(** Fast SHA256 hashing library with zero-copy C bindings.

    This library provides OCaml bindings to a C SHA256 implementation
    using bigarrays for efficient, zero-copy hashing. *)

(** {1 Raw C Bindings} *)

module Raw : sig
  (** Low-level bindings to the C SHA256 implementation.

      This module provides direct access to the C functions with minimal
      overhead. The core operations work with bigarrays for zero-copy performance. *)

  (** The SHA256 context type. This is an abstract type wrapping the C
      SHA256_CTX structure. *)
  type t

  (** [create ()] allocates and initializes a new SHA256 context.

      @return A fresh context ready for hashing. *)
  val create : unit -> t

  (** [update ctx data] updates the hash state with new data.

      This function processes the input data incrementally. It can be called
      multiple times to hash data in chunks.

      @param ctx The SHA256 context to update
      @param data A bigarray containing the data to hash. Uses bigarrays for
      zero-copy access from the C side. *)
  val update :
    t ->
    (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t ->
    unit

  (** [update_bytes ctx data] updates the hash state with bytes data.

      This is a convenience function that copies the bytes into a fresh bigarray.

      @param ctx The SHA256 context to update
      @param data Bytes to hash *)
  val update_bytes : t -> bytes -> unit

  (** [update_string ctx data] updates the hash state with string data.

      This is a convenience function that copies the string into a fresh bigarray.

      @param ctx The SHA256 context to update
      @param data String to hash *)
  val update_string : t -> string -> unit

  (** [final ctx] finalizes the hash computation and returns the digest.

      After calling this function, the context should not be used again.

      @param ctx The SHA256 context to finalize
      @return A 32-byte digest as a bytes value *)
  val final : t -> bytes

  (** [hash data] is a convenience function that performs a complete hash
      in one operation: create, update, and final.

      @param data The bigarray data to hash
      @return A 32-byte digest *)
  val hash :
    (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t ->
    bytes

  (** [hash_bytes data] hashes bytes data in one operation.

      @param data The bytes to hash
      @return A 32-byte digest *)
  val hash_bytes : bytes -> bytes

  (** [hash_string data] hashes string data in one operation.

      @param data The string to hash
      @return A 32-byte digest *)
  val hash_string : string -> bytes
end

(** {1 High-Level Interface} *)

(** Re-export the Raw module as the main interface.

    The Raw module provides the most efficient interface using bigarrays.
    Higher-level abstractions can be added in the future if needed. *)

include module type of Raw
```
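`update` is documented as callable repeatedly to hash data in chunks. With bigarrays, those chunks can be taken with `Bigarray.Array1.sub`, which returns a view that shares the parent's storage rather than a copy. Below is a sketch of a hypothetical `iter_slices` helper (not part of Oxsha). It feeds a large buffer, such as a memory-mapped file, to any function of the `update` shape, one slice at a time.

```ocaml
open Bigarray

(* Same element type as the bigarray accepted by Oxsha's update *)
type buf = (char, int8_unsigned_elt, c_layout) Array1.t

(* Hypothetical helper: call [f] on consecutive views of [ba], each at most
   [chunk] elements long. Array1.sub shares memory with [ba], so no data is
   copied. With Oxsha this might be used as
   [iter_slices ~chunk:(1 lsl 20) (Oxsha.update ctx) ba]. *)
let iter_slices ~chunk (f : buf -> unit) (ba : buf) =
  if chunk <= 0 then invalid_arg "iter_slices: chunk must be positive";
  let len = Array1.dim ba in
  let rec go off =
    if off < len then begin
      f (Array1.sub ba off (min chunk (len - off)));
      go (off + chunk)
    end
  in
  go 0
```

For an empty buffer, `f` is never called. A context that is created and then finalized without any update still yields the digest of the empty input.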
+92
lib/oxsha_stubs.c
```c
/*
 * OCaml bindings for SHA256 C implementation.
 * Uses custom blocks and bigarrays for zero-copy performance.
 */

#include <string.h>
#include <caml/mlvalues.h>
#include <caml/memory.h>
#include <caml/alloc.h>
#include <caml/custom.h>
#include <caml/fail.h>
#include <caml/bigarray.h>

#include "sha256.h"

/* Custom block operations for SHA256_CTX */

static void oxsha_ctx_finalize(value v_ctx)
{
    SHA256_CTX *ctx = (SHA256_CTX *)Data_custom_val(v_ctx);
    /* Just clear the memory for security */
    memset(ctx, 0, sizeof(SHA256_CTX));
}

static struct custom_operations oxsha_ctx_ops = {
    "com.oxsha.sha256_ctx",
    oxsha_ctx_finalize,
    custom_compare_default,
    custom_hash_default,
    custom_serialize_default,
    custom_deserialize_default,
    custom_compare_ext_default,
    custom_fixed_length_default
};

/* Allocate and wrap a SHA256_CTX in an OCaml custom block */
static value alloc_oxsha_ctx(void)
{
    value v_ctx = caml_alloc_custom(&oxsha_ctx_ops, sizeof(SHA256_CTX), 0, 1);
    return v_ctx;
}

/* Extract SHA256_CTX pointer from OCaml value */
#define Oxsha_ctx_val(v) ((SHA256_CTX *)Data_custom_val(v))

/* FFI Functions */

/* oxsha_create : unit -> t */
CAMLprim value oxsha_create(value unit)
{
    CAMLparam1(unit);
    CAMLlocal1(v_ctx);

    v_ctx = alloc_oxsha_ctx();
    SHA256_CTX *ctx = Oxsha_ctx_val(v_ctx);
    sha256_init(ctx);

    CAMLreturn(v_ctx);
}

/* oxsha_update : t -> bigarray -> unit */
CAMLprim value oxsha_update(value v_ctx, value v_data)
{
    CAMLparam2(v_ctx, v_data);

    SHA256_CTX *ctx = Oxsha_ctx_val(v_ctx);

    /* Extract bigarray data pointer and length */
    unsigned char *data = (unsigned char *)Caml_ba_data_val(v_data);
    size_t len = Caml_ba_array_val(v_data)->dim[0];

    sha256_update(ctx, data, len);

    CAMLreturn(Val_unit);
}

/* oxsha_final : t -> bytes */
CAMLprim value oxsha_final(value v_ctx)
{
    CAMLparam1(v_ctx);
    CAMLlocal1(v_digest);

    /* Allocate bytes for the 32-byte digest */
    v_digest = caml_alloc_string(SHA256_BLOCK_SIZE);
    unsigned char *digest = (unsigned char *)String_val(v_digest);

    /* Read the context pointer only now: the allocation may move v_ctx */
    SHA256_CTX *ctx = Oxsha_ctx_val(v_ctx);
    sha256_final(ctx, digest);

    CAMLreturn(v_digest);
}
```
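This commit contains no test suite. One way to exercise the stubs and the C code they bind is differential testing against an independent implementation. Below is a sketch of such an oracle: a compact SHA-256 (FIPS 180-4) in pure OCaml, using the same K round constants that `lib/sha256.c` declares. It is slow, assumes 64-bit OCaml, and is not part of Oxsha.

```ocaml
(* Hypothetical test oracle, not part of Oxsha: a straightforward SHA-256
   (FIPS 180-4) in pure OCaml. It assumes 64-bit OCaml, where native ints
   hold 32-bit words; every addition is masked back to 32 bits. *)
let mask = 0xFFFF_FFFF
let ( +% ) a b = (a + b) land mask
let rotr x n = ((x lsr n) lor (x lsl (32 - n))) land mask

(* Round constants, identical to the K table in lib/sha256.c *)
let k = [|
  0x428A2F98; 0x71374491; 0xB5C0FBCF; 0xE9B5DBA5; 0x3956C25B; 0x59F111F1; 0x923F82A4; 0xAB1C5ED5;
  0xD807AA98; 0x12835B01; 0x243185BE; 0x550C7DC3; 0x72BE5D74; 0x80DEB1FE; 0x9BDC06A7; 0xC19BF174;
  0xE49B69C1; 0xEFBE4786; 0x0FC19DC6; 0x240CA1CC; 0x2DE92C6F; 0x4A7484AA; 0x5CB0A9DC; 0x76F988DA;
  0x983E5152; 0xA831C66D; 0xB00327C8; 0xBF597FC7; 0xC6E00BF3; 0xD5A79147; 0x06CA6351; 0x14292967;
  0x27B70A85; 0x2E1B2138; 0x4D2C6DFC; 0x53380D13; 0x650A7354; 0x766A0ABB; 0x81C2C92E; 0x92722C85;
  0xA2BFE8A1; 0xA81A664B; 0xC24B8B70; 0xC76C51A3; 0xD192E819; 0xD6990624; 0xF40E3585; 0x106AA070;
  0x19A4C116; 0x1E376C08; 0x2748774C; 0x34B0BCB5; 0x391C0CB3; 0x4ED8AA4A; 0x5B9CCA4F; 0x682E6FF3;
  0x748F82EE; 0x78A5636F; 0x84C87814; 0x8CC70208; 0x90BEFFFA; 0xA4506CEB; 0xBEF9A3F7; 0xC67178F2;
|]

let sha256 (msg : string) : string =
  let len = String.length msg in
  (* Pad with 0x80, zeros and the 64-bit big-endian bit length, giving the
     smallest multiple of 64 bytes that holds len + 9 bytes. *)
  let padded_len = ((len + 8) / 64 + 1) * 64 in
  let p = Bytes.make padded_len '\x00' in
  Bytes.blit_string msg 0 p 0 len;
  Bytes.set p len '\x80';
  Bytes.set_int64_be p (padded_len - 8) (Int64.mul (Int64.of_int len) 8L);
  (* Initial hash value H(0) *)
  let h = [| 0x6a09e667; 0xbb67ae85; 0x3c6ef372; 0xa54ff53a;
             0x510e527f; 0x9b05688c; 0x1f83d9ab; 0x5be0cd19 |] in
  let w = Array.make 64 0 in
  for blk = 0 to (padded_len / 64) - 1 do
    (* Message schedule: 16 big-endian words, expanded to 64 *)
    for t = 0 to 15 do
      w.(t) <- Int32.to_int (Bytes.get_int32_be p ((blk * 64) + (4 * t))) land mask
    done;
    for t = 16 to 63 do
      let x = w.(t - 15) and y = w.(t - 2) in
      let s0 = rotr x 7 lxor rotr x 18 lxor (x lsr 3) in
      let s1 = rotr y 17 lxor rotr y 19 lxor (y lsr 10) in
      w.(t) <- w.(t - 16) +% s0 +% w.(t - 7) +% s1
    done;
    (* 64 compression rounds over the working variables a..h *)
    let a = ref h.(0) and b = ref h.(1) and c = ref h.(2) and d = ref h.(3) in
    let e = ref h.(4) and f = ref h.(5) and g = ref h.(6) and hh = ref h.(7) in
    for t = 0 to 63 do
      let s1 = rotr !e 6 lxor rotr !e 11 lxor rotr !e 25 in
      let ch = (!e land !f) lxor (lnot !e land !g) in
      let t1 = !hh +% s1 +% ch +% k.(t) +% w.(t) in
      let s0 = rotr !a 2 lxor rotr !a 13 lxor rotr !a 22 in
      let maj = (!a land !b) lxor (!a land !c) lxor (!b land !c) in
      let t2 = s0 +% maj in
      hh := !g; g := !f; f := !e; e := !d +% t1;
      d := !c; c := !b; b := !a; a := t1 +% t2
    done;
    List.iteri (fun i v -> h.(i) <- h.(i) +% v)
      [ !a; !b; !c; !d; !e; !f; !g; !hh ]
  done;
  (* Digest: the eight state words, each written big-endian *)
  let out = Bytes.create 32 in
  Array.iteri (fun i v -> Bytes.set_int32_be out (4 * i) (Int32.of_int v)) h;
  Bytes.to_string out
```

For example, the hex encoding of `sha256 "abc"` should be the FIPS 180 test vector `ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad`. A differential test could then compare `Bytes.to_string (Oxsha.hash_string s)` with `sha256 s` over random strings. It is worth including lengths of 55 and 56 bytes, because at 56 bytes the padding spills into a second block.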
+638
lib/sha256.c
```c
/*********************************************************************
* Filename:   sha256.c
* Author:     Brad Conte (brad AT bradconte.com)
* Copyright:
* Disclaimer: This code is presented "as is" without any guarantees.
* Details:    Implementation of the SHA-256 hashing algorithm.
              SHA-256 is one of the three algorithms in the SHA2
              specification. The others, SHA-384 and SHA-512, are not
              offered in this implementation.
              Algorithm specification can be found here:
               * http://csrc.nist.gov/publications/fips/fips180-2/fips180-2withchangenotice.pdf
              This implementation uses little endian byte order.
*********************************************************************/

/*************************** HEADER FILES ***************************/
#include <stdlib.h>
#include <stdio.h>
#include <memory.h>
#include "sha256.h"

static const uint32_t K[] =
{
    0x428A2F98, 0x71374491, 0xB5C0FBCF, 0xE9B5DBA5,
    0x3956C25B, 0x59F111F1, 0x923F82A4, 0xAB1C5ED5,
    0xD807AA98, 0x12835B01, 0x243185BE, 0x550C7DC3,
    0x72BE5D74, 0x80DEB1FE, 0x9BDC06A7, 0xC19BF174,
    0xE49B69C1, 0xEFBE4786, 0x0FC19DC6, 0x240CA1CC,
    0x2DE92C6F, 0x4A7484AA, 0x5CB0A9DC, 0x76F988DA,
    0x983E5152, 0xA831C66D, 0xB00327C8, 0xBF597FC7,
    0xC6E00BF3, 0xD5A79147, 0x06CA6351, 0x14292967,
    0x27B70A85, 0x2E1B2138, 0x4D2C6DFC, 0x53380D13,
    0x650A7354, 0x766A0ABB, 0x81C2C92E, 0x92722C85,
    0xA2BFE8A1, 0xA81A664B, 0xC24B8B70, 0xC76C51A3,
    0xD192E819, 0xD6990624, 0xF40E3585, 0x106AA070,
    0x19A4C116, 0x1E376C08, 0x2748774C, 0x34B0BCB5,
    0x391C0CB3, 0x4ED8AA4A, 0x5B9CCA4F, 0x682E6FF3,
    0x748F82EE, 0x78A5636F, 0x84C87814, 0x8CC70208,
    0x90BEFFFA, 0xA4506CEB, 0xBEF9A3F7, 0xC67178F2
};

#if defined(__arm__) || defined(__aarch32__) || defined(__arm64__) || defined(__aarch64__) || defined(_M_ARM)
// ============== ARM64 begin =======================
// All the ARM servers supports SHA256 instructions
# if defined(__GNUC__)
# include <stdint.h>
# endif
# if defined(__ARM_NEON) || defined(_MSC_VER) || defined(__GNUC__)
# include <arm_neon.h>
# endif
/* GCC and LLVM Clang, but not Apple Clang */
# if defined(__GNUC__) && !defined(__apple_build_version__)
# if defined(__ARM_ACLE) || defined(__ARM_FEATURE_CRYPTO)
# include <arm_acle.h>
# endif
# endif
void sha256_process(uint32_t state[8], const uint8_t data[], uint32_t length)
{
    uint32x4_t STATE0, STATE1, ABEF_SAVE, CDGH_SAVE;
    uint32x4_t MSG0, MSG1, MSG2, MSG3;
    uint32x4_t TMP0, TMP1, TMP2;

    /* Load state */
    STATE0 = vld1q_u32(&state[0]);
    STATE1 = vld1q_u32(&state[4]);

    while (length >= 64)
    {
        /* Save state */
        ABEF_SAVE = STATE0;
        CDGH_SAVE = STATE1;

        /* Load message */
        MSG0 = vld1q_u32((const uint32_t *)(data + 0));
        MSG1 = vld1q_u32((const uint32_t *)(data + 16));
        MSG2 = vld1q_u32((const uint32_t *)(data + 32));
        MSG3 = vld1q_u32((const uint32_t *)(data + 48));

        /* Reverse for little endian */
        MSG0 = vreinterpretq_u32_u8(vrev32q_u8(vreinterpretq_u8_u32(MSG0)));
        MSG1 = vreinterpretq_u32_u8(vrev32q_u8(vreinterpretq_u8_u32(MSG1)));
        MSG2 = vreinterpretq_u32_u8(vrev32q_u8(vreinterpretq_u8_u32(MSG2)));
        MSG3 = vreinterpretq_u32_u8(vrev32q_u8(vreinterpretq_u8_u32(MSG3)));

        TMP0 = vaddq_u32(MSG0, vld1q_u32(&K[0x00]));

        /* Rounds 0-3 */
        MSG0 = vsha256su0q_u32(MSG0, MSG1);
        TMP2 = STATE0;
        TMP1 = vaddq_u32(MSG1, vld1q_u32(&K[0x04]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
        MSG0 = vsha256su1q_u32(MSG0, MSG2, MSG3);

        /* Rounds 4-7 */
        MSG1 = vsha256su0q_u32(MSG1, MSG2);
        TMP2 = STATE0;
        TMP0 = vaddq_u32(MSG2, vld1q_u32(&K[0x08]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
        MSG1 = vsha256su1q_u32(MSG1, MSG3, MSG0);

        /* Rounds 8-11 */
        MSG2 = vsha256su0q_u32(MSG2, MSG3);
        TMP2 = STATE0;
        TMP1 = vaddq_u32(MSG3, vld1q_u32(&K[0x0c]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
        MSG2 = vsha256su1q_u32(MSG2, MSG0, MSG1);

        /* Rounds 12-15 */
        MSG3 = vsha256su0q_u32(MSG3, MSG0);
        TMP2 = STATE0;
        TMP0 = vaddq_u32(MSG0, vld1q_u32(&K[0x10]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
        MSG3 = vsha256su1q_u32(MSG3, MSG1, MSG2);

        /* Rounds 16-19 */
        MSG0 = vsha256su0q_u32(MSG0, MSG1);
        TMP2 = STATE0;
        TMP1 = vaddq_u32(MSG1, vld1q_u32(&K[0x14]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
        MSG0 = vsha256su1q_u32(MSG0, MSG2, MSG3);

        /* Rounds 20-23 */
        MSG1 = vsha256su0q_u32(MSG1, MSG2);
        TMP2 = STATE0;
        TMP0 = vaddq_u32(MSG2, vld1q_u32(&K[0x18]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
        MSG1 = vsha256su1q_u32(MSG1, MSG3, MSG0);

        /* Rounds 24-27 */
        MSG2 = vsha256su0q_u32(MSG2, MSG3);
        TMP2 = STATE0;
        TMP1 = vaddq_u32(MSG3, vld1q_u32(&K[0x1c]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
        MSG2 = vsha256su1q_u32(MSG2, MSG0, MSG1);

        /* Rounds 28-31 */
        MSG3 = vsha256su0q_u32(MSG3, MSG0);
        TMP2 = STATE0;
        TMP0 = vaddq_u32(MSG0, vld1q_u32(&K[0x20]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
        MSG3 = vsha256su1q_u32(MSG3, MSG1, MSG2);

        /* Rounds 32-35 */
        MSG0 = vsha256su0q_u32(MSG0, MSG1);
        TMP2 = STATE0;
        TMP1 = vaddq_u32(MSG1, vld1q_u32(&K[0x24]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
        MSG0 = vsha256su1q_u32(MSG0, MSG2, MSG3);

        /* Rounds 36-39 */
        MSG1 = vsha256su0q_u32(MSG1, MSG2);
        TMP2 = STATE0;
        TMP0 = vaddq_u32(MSG2, vld1q_u32(&K[0x28]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
        MSG1 = vsha256su1q_u32(MSG1, MSG3, MSG0);

        /* Rounds 40-43 */
        MSG2 = vsha256su0q_u32(MSG2, MSG3);
        TMP2 = STATE0;
        TMP1 = vaddq_u32(MSG3, vld1q_u32(&K[0x2c]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);
        MSG2 = vsha256su1q_u32(MSG2, MSG0, MSG1);

        /* Rounds 44-47 */
        MSG3 = vsha256su0q_u32(MSG3, MSG0);
        TMP2 = STATE0;
        TMP0 = vaddq_u32(MSG0, vld1q_u32(&K[0x30]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);
        MSG3 = vsha256su1q_u32(MSG3, MSG1, MSG2);

        /* Rounds 48-51 */
        TMP2 = STATE0;
        TMP1 = vaddq_u32(MSG1, vld1q_u32(&K[0x34]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);

        /* Rounds 52-55 */
        TMP2 = STATE0;
        TMP0 = vaddq_u32(MSG2, vld1q_u32(&K[0x38]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);

        /* Rounds 56-59 */
        TMP2 = STATE0;
        TMP1 = vaddq_u32(MSG3, vld1q_u32(&K[0x3c]));
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP0);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP0);

        /* Rounds 60-63 */
        TMP2 = STATE0;
        STATE0 = vsha256hq_u32(STATE0, STATE1, TMP1);
        STATE1 = vsha256h2q_u32(STATE1, TMP2, TMP1);

        /* Combine state */
        STATE0 = vaddq_u32(STATE0, ABEF_SAVE);
        STATE1 = vaddq_u32(STATE1, CDGH_SAVE);

        data += 64;
        length -= 64;
    }

    /* Save state */
    vst1q_u32(&state[0], STATE0);
    vst1q_u32(&state[4], STATE1);
}

// ============== ARM64 end =======================
#else
// ============== x86-64 begin =======================
/* Include the GCC super header */
#if defined(__GNUC__)
# include <stdint.h>
# include <x86intrin.h>
#endif

/* Microsoft supports Intel SHA ACLE extensions as of Visual Studio 2015 */
#if defined(_MSC_VER)
# include <immintrin.h>
# define WIN32_LEAN_AND_MEAN
# include <Windows.h>
#endif
#define ROTATE(x,y)  (((x)>>(y)) | ((x)<<(32-(y))))
#define Sigma0(x)    (ROTATE((x), 2) ^ ROTATE((x),13) ^ ROTATE((x),22))
#define Sigma1(x)    (ROTATE((x), 6) ^ ROTATE((x),11) ^ ROTATE((x),25))
#define sigma0(x)    (ROTATE((x), 7) ^ ROTATE((x),18) ^ ((x)>> 3))
#define sigma1(x)    (ROTATE((x),17) ^ ROTATE((x),19) ^ ((x)>>10))

#define Ch(x,y,z)    (((x) & (y)) ^ ((~(x)) & (z)))
#define Maj(x,y,z)   (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))

/* Avoid undefined behavior */
/* https://stackoverflow.com/q/29538935/608639 */
uint32_t B2U32(uint8_t val, uint8_t sh)
{
    return ((uint32_t)val) << sh;
}

void sha256_process_c(uint32_t state[8], const uint8_t data[], size_t length)
{
    uint32_t a, b, c, d, e, f, g, h, s0, s1, T1, T2;
    uint32_t X[16], i;

    size_t blocks = length / 64;
    while (blocks--)
    {
        a = state[0];
        b = state[1];
        c = state[2];
        d = state[3];
        e = state[4];
        f = state[5];
        g = state[6];
        h = state[7];

        for (i = 0; i < 16; i++)
        {
            X[i] = B2U32(data[0], 24) | B2U32(data[1], 16) | B2U32(data[2], 8) | B2U32(data[3], 0);
            data += 4;

            T1 = h;
            T1 += Sigma1(e);
            T1 += Ch(e, f, g);
            T1 += K[i];
            T1 += X[i];

            T2 = Sigma0(a);
            T2 += Maj(a, b, c);

            h = g;
            g = f;
            f = e;
            e = d + T1;
            d = c;
            c = b;
            b = a;
            a = T1 + T2;
        }

        for (; i < 64; i++)
        {
            s0 = X[(i + 1) & 0x0f];
            s0 = sigma0(s0);
            s1 = X[(i + 14) & 0x0f];
            s1 = sigma1(s1);

            T1 = X[i & 0xf] += s0 + s1 + X[(i + 9) & 0xf];
            T1 += h + Sigma1(e) + Ch(e, f, g) + K[i];
            T2 = Sigma0(a) + Maj(a, b, c);
            h = g;
            g = f;
            f = e;
            e = d + T1;
            d = c;
            c = b;
            b = a;
            a = T1 + T2;
        }

        state[0] += a;
        state[1] += b;
        state[2] += c;
        state[3] += d;
        state[4] += e;
        state[5] += f;
        state[6] += g;
        state[7] += h;
    }
}

/* Process multiple blocks. The caller is responsible for setting the initial */
/* state, and the caller is responsible for padding the final block.        */
void sha256_process_asm(uint32_t state[8], const uint8_t data[], size_t length)
{
    __m128i STATE0, STATE1;
    __m128i MSG, TMP;
    __m128i MSG0, MSG1, MSG2, MSG3;
    __m128i ABEF_SAVE, CDGH_SAVE;
    const __m128i MASK = _mm_set_epi64x(0x0c0d0e0f08090a0bULL, 0x0405060700010203ULL);

    /* Load initial values */
    TMP = _mm_loadu_si128((const __m128i*) &state[0]);
    STATE1 = _mm_loadu_si128((const __m128i*) &state[4]);


    TMP = _mm_shuffle_epi32(TMP, 0xB1);          /* CDAB */
    STATE1 = _mm_shuffle_epi32(STATE1, 0x1B);    /* EFGH */
    STATE0 = _mm_alignr_epi8(TMP, STATE1, 8);    /* ABEF */
    STATE1 = _mm_blend_epi16(STATE1, TMP, 0xF0); /* CDGH */

    while (length >= 64)
    {
        /* Save current state */
        ABEF_SAVE = STATE0;
        CDGH_SAVE = STATE1;

        /* Rounds 0-3 */
        MSG = _mm_loadu_si128((const __m128i*) (data+0));
        MSG0 = _mm_shuffle_epi8(MSG, MASK);
        MSG = _mm_add_epi32(MSG0, _mm_set_epi64x(0xE9B5DBA5B5C0FBCFULL, 0x71374491428A2F98ULL));
        STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
        MSG = _mm_shuffle_epi32(MSG, 0x0E);
        STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);

        /* Rounds 4-7 */
        MSG1 = _mm_loadu_si128((const __m128i*) (data+16));
        MSG1 = _mm_shuffle_epi8(MSG1, MASK);
        MSG = _mm_add_epi32(MSG1, _mm_set_epi64x(0xAB1C5ED5923F82A4ULL, 0x59F111F13956C25BULL));
        STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
        MSG = _mm_shuffle_epi32(MSG, 0x0E);
        STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
        MSG0 = _mm_sha256msg1_epu32(MSG0, MSG1);

        /* Rounds 8-11 */
        MSG2 = _mm_loadu_si128((const __m128i*) (data+32));
        MSG2 = _mm_shuffle_epi8(MSG2, MASK);
        MSG = _mm_add_epi32(MSG2, _mm_set_epi64x(0x550C7DC3243185BEULL, 0x12835B01D807AA98ULL));
        STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
        MSG = _mm_shuffle_epi32(MSG, 0x0E);
        STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
        MSG1 = _mm_sha256msg1_epu32(MSG1, MSG2);

        /* Rounds 12-15 */
        MSG3 = _mm_loadu_si128((const __m128i*) (data+48));
        MSG3 = _mm_shuffle_epi8(MSG3, MASK);
        MSG = _mm_add_epi32(MSG3, _mm_set_epi64x(0xC19BF1749BDC06A7ULL, 0x80DEB1FE72BE5D74ULL));
        STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
        TMP = _mm_alignr_epi8(MSG3, MSG2, 4);
        MSG0 = _mm_add_epi32(MSG0, TMP);
        MSG0 = _mm_sha256msg2_epu32(MSG0, MSG3);
        MSG = _mm_shuffle_epi32(MSG, 0x0E);
        STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
        MSG2 = _mm_sha256msg1_epu32(MSG2, MSG3);

        /* Rounds 16-19 */
        MSG = _mm_add_epi32(MSG0, _mm_set_epi64x(0x240CA1CC0FC19DC6ULL, 0xEFBE4786E49B69C1ULL));
        STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG);
        TMP = _mm_alignr_epi8(MSG0, MSG3, 4);
        MSG1 = _mm_add_epi32(MSG1, TMP);
        MSG1 = _mm_sha256msg2_epu32(MSG1, MSG0);
        MSG = _mm_shuffle_epi32(MSG, 0x0E);
        STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG);
```
MSG3 = _mm_sha256msg1_epu32(MSG3, MSG0); 394 + 395 + /* Rounds 20-23 */ 396 + MSG = _mm_add_epi32(MSG1, _mm_set_epi64x(0x76F988DA5CB0A9DCULL, 0x4A7484AA2DE92C6FULL)); 397 + STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG); 398 + TMP = _mm_alignr_epi8(MSG1, MSG0, 4); 399 + MSG2 = _mm_add_epi32(MSG2, TMP); 400 + MSG2 = _mm_sha256msg2_epu32(MSG2, MSG1); 401 + MSG = _mm_shuffle_epi32(MSG, 0x0E); 402 + STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG); 403 + MSG0 = _mm_sha256msg1_epu32(MSG0, MSG1); 404 + 405 + /* Rounds 24-27 */ 406 + MSG = _mm_add_epi32(MSG2, _mm_set_epi64x(0xBF597FC7B00327C8ULL, 0xA831C66D983E5152ULL)); 407 + STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG); 408 + TMP = _mm_alignr_epi8(MSG2, MSG1, 4); 409 + MSG3 = _mm_add_epi32(MSG3, TMP); 410 + MSG3 = _mm_sha256msg2_epu32(MSG3, MSG2); 411 + MSG = _mm_shuffle_epi32(MSG, 0x0E); 412 + STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG); 413 + MSG1 = _mm_sha256msg1_epu32(MSG1, MSG2); 414 + 415 + /* Rounds 28-31 */ 416 + MSG = _mm_add_epi32(MSG3, _mm_set_epi64x(0x1429296706CA6351ULL, 0xD5A79147C6E00BF3ULL)); 417 + STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG); 418 + TMP = _mm_alignr_epi8(MSG3, MSG2, 4); 419 + MSG0 = _mm_add_epi32(MSG0, TMP); 420 + MSG0 = _mm_sha256msg2_epu32(MSG0, MSG3); 421 + MSG = _mm_shuffle_epi32(MSG, 0x0E); 422 + STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG); 423 + MSG2 = _mm_sha256msg1_epu32(MSG2, MSG3); 424 + 425 + /* Rounds 32-35 */ 426 + MSG = _mm_add_epi32(MSG0, _mm_set_epi64x(0x53380D134D2C6DFCULL, 0x2E1B213827B70A85ULL)); 427 + STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG); 428 + TMP = _mm_alignr_epi8(MSG0, MSG3, 4); 429 + MSG1 = _mm_add_epi32(MSG1, TMP); 430 + MSG1 = _mm_sha256msg2_epu32(MSG1, MSG0); 431 + MSG = _mm_shuffle_epi32(MSG, 0x0E); 432 + STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG); 433 + MSG3 = _mm_sha256msg1_epu32(MSG3, MSG0); 434 + 435 + /* Rounds 36-39 */ 436 + MSG = _mm_add_epi32(MSG1, _mm_set_epi64x(0x92722C8581C2C92EULL, 
0x766A0ABB650A7354ULL)); 437 + STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG); 438 + TMP = _mm_alignr_epi8(MSG1, MSG0, 4); 439 + MSG2 = _mm_add_epi32(MSG2, TMP); 440 + MSG2 = _mm_sha256msg2_epu32(MSG2, MSG1); 441 + MSG = _mm_shuffle_epi32(MSG, 0x0E); 442 + STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG); 443 + MSG0 = _mm_sha256msg1_epu32(MSG0, MSG1); 444 + 445 + /* Rounds 40-43 */ 446 + MSG = _mm_add_epi32(MSG2, _mm_set_epi64x(0xC76C51A3C24B8B70ULL, 0xA81A664BA2BFE8A1ULL)); 447 + STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG); 448 + TMP = _mm_alignr_epi8(MSG2, MSG1, 4); 449 + MSG3 = _mm_add_epi32(MSG3, TMP); 450 + MSG3 = _mm_sha256msg2_epu32(MSG3, MSG2); 451 + MSG = _mm_shuffle_epi32(MSG, 0x0E); 452 + STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG); 453 + MSG1 = _mm_sha256msg1_epu32(MSG1, MSG2); 454 + 455 + /* Rounds 44-47 */ 456 + MSG = _mm_add_epi32(MSG3, _mm_set_epi64x(0x106AA070F40E3585ULL, 0xD6990624D192E819ULL)); 457 + STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG); 458 + TMP = _mm_alignr_epi8(MSG3, MSG2, 4); 459 + MSG0 = _mm_add_epi32(MSG0, TMP); 460 + MSG0 = _mm_sha256msg2_epu32(MSG0, MSG3); 461 + MSG = _mm_shuffle_epi32(MSG, 0x0E); 462 + STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG); 463 + MSG2 = _mm_sha256msg1_epu32(MSG2, MSG3); 464 + 465 + /* Rounds 48-51 */ 466 + MSG = _mm_add_epi32(MSG0, _mm_set_epi64x(0x34B0BCB52748774CULL, 0x1E376C0819A4C116ULL)); 467 + STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG); 468 + TMP = _mm_alignr_epi8(MSG0, MSG3, 4); 469 + MSG1 = _mm_add_epi32(MSG1, TMP); 470 + MSG1 = _mm_sha256msg2_epu32(MSG1, MSG0); 471 + MSG = _mm_shuffle_epi32(MSG, 0x0E); 472 + STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG); 473 + MSG3 = _mm_sha256msg1_epu32(MSG3, MSG0); 474 + 475 + /* Rounds 52-55 */ 476 + MSG = _mm_add_epi32(MSG1, _mm_set_epi64x(0x682E6FF35B9CCA4FULL, 0x4ED8AA4A391C0CB3ULL)); 477 + STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG); 478 + TMP = _mm_alignr_epi8(MSG1, MSG0, 4); 479 + MSG2 = 
_mm_add_epi32(MSG2, TMP); 480 + MSG2 = _mm_sha256msg2_epu32(MSG2, MSG1); 481 + MSG = _mm_shuffle_epi32(MSG, 0x0E); 482 + STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG); 483 + 484 + /* Rounds 56-59 */ 485 + MSG = _mm_add_epi32(MSG2, _mm_set_epi64x(0x8CC7020884C87814ULL, 0x78A5636F748F82EEULL)); 486 + STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG); 487 + TMP = _mm_alignr_epi8(MSG2, MSG1, 4); 488 + MSG3 = _mm_add_epi32(MSG3, TMP); 489 + MSG3 = _mm_sha256msg2_epu32(MSG3, MSG2); 490 + MSG = _mm_shuffle_epi32(MSG, 0x0E); 491 + STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG); 492 + 493 + /* Rounds 60-63 */ 494 + MSG = _mm_add_epi32(MSG3, _mm_set_epi64x(0xC67178F2BEF9A3F7ULL, 0xA4506CEB90BEFFFAULL)); 495 + STATE1 = _mm_sha256rnds2_epu32(STATE1, STATE0, MSG); 496 + MSG = _mm_shuffle_epi32(MSG, 0x0E); 497 + STATE0 = _mm_sha256rnds2_epu32(STATE0, STATE1, MSG); 498 + 499 + /* Combine state */ 500 + STATE0 = _mm_add_epi32(STATE0, ABEF_SAVE); 501 + STATE1 = _mm_add_epi32(STATE1, CDGH_SAVE); 502 + 503 + data += 64; 504 + length -= 64; 505 + } 506 + 507 + TMP = _mm_shuffle_epi32(STATE0, 0x1B); /* FEBA */ 508 + STATE1 = _mm_shuffle_epi32(STATE1, 0xB1); /* DCHG */ 509 + STATE0 = _mm_blend_epi16(TMP, STATE1, 0xF0); /* DCBA */ 510 + STATE1 = _mm_alignr_epi8(STATE1, TMP, 8); /* ABEF */ 511 + 512 + /* Save state */ 513 + _mm_storeu_si128((__m128i*) &state[0], STATE0); 514 + _mm_storeu_si128((__m128i*) &state[4], STATE1); 515 + } 516 + 517 + #if defined(__clang__) || defined(__GNUC__) || defined(__INTEL_COMPILER) 518 + 519 + #include <cpuid.h> 520 + int supports_sha_ni(void) 521 + { 522 + unsigned int CPUInfo[4]; 523 + __cpuid(0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]); 524 + if (CPUInfo[0] < 7) 525 + return 0; 526 + 527 + __cpuid_count(7, 0, CPUInfo[0], CPUInfo[1], CPUInfo[2], CPUInfo[3]); 528 + return CPUInfo[1] & (1 << 29); /* SHA */ 529 + } 530 + 531 + #else /* defined(__clang__) || defined(__GNUC__) */ 532 + 533 + int supports_sha_ni(void) 534 + { 535 + 
    int CPUInfo[4];  /* MSVC's __cpuid/__cpuidex take int[4] */
    __cpuid(CPUInfo, 0);
    if (CPUInfo[0] < 7)
        return 0;

    __cpuidex(CPUInfo, 7, 0);
    return CPUInfo[1] & (1 << 29); /* Check SHA */
}

#endif /* defined(__clang__) || defined(__GNUC__) */

void sha256_process(uint32_t state[8], const uint8_t data[], size_t length) {
    static int has_sha_ni = -1;
    if (has_sha_ni == -1) {
        has_sha_ni = supports_sha_ni();
    }

    if (has_sha_ni) {
        sha256_process_asm(state, data, length);
    } else {
        sha256_process_c(state, data, length);
    }
}
// ============== x86-64 end =======================
#endif

void sha256_init(SHA256_CTX *ctx)
{
    ctx->datalen = 0;
    ctx->bitlen = 0;
    ctx->state[0] = 0x6a09e667;
    ctx->state[1] = 0xbb67ae85;
    ctx->state[2] = 0x3c6ef372;
    ctx->state[3] = 0xa54ff53a;
    ctx->state[4] = 0x510e527f;
    ctx->state[5] = 0x9b05688c;
    ctx->state[6] = 0x1f83d9ab;
    ctx->state[7] = 0x5be0cd19;
}

void sha256_update(SHA256_CTX *ctx, const BYTE data[], size_t len)
{
    size_t i = 0;

    // Complete a partial block buffered by an earlier call before using new input.
    if (ctx->datalen > 0) {
        while (ctx->datalen < 64 && i < len)
            ctx->data[ctx->datalen++] = data[i++];
        if (ctx->datalen < 64)
            return;
        sha256_process(ctx->state, ctx->data, 64);
        ctx->bitlen += 512;
        ctx->datalen = 0;
    }

    // Hash the remaining whole blocks directly from the caller's buffer.
    size_t rounded = 64 * ((len - i) / 64);
    if (rounded != 0) {
        sha256_process(ctx->state, data + i, rounded);
        ctx->bitlen += (unsigned long long)rounded * 8;
        i += rounded;
    }

    // Buffer the trailing partial block (< 64 bytes) for the next update or final.
    while (i < len)
        ctx->data[ctx->datalen++] = data[i++];
}

void sha256_final(SHA256_CTX *ctx, BYTE hash[])
{
    WORD i;

    i = ctx->datalen;

    // Pad whatever data is left in the buffer.
    if (ctx->datalen < 56) {
        ctx->data[i++] = 0x80;
        while (i < 56)
            ctx->data[i++] = 0x00;
    }
    else {
        ctx->data[i++] = 0x80;
        while (i < 64)
            ctx->data[i++] = 0x00;
        sha256_process(ctx->state, ctx->data, 64);
        memset(ctx->data, 0, 56);
    }

    // Append the total message length in bits (big-endian) and compress the last block.
    ctx->bitlen += ctx->datalen * 8;
    ctx->data[63] = ctx->bitlen;
    ctx->data[62] = ctx->bitlen >> 8;
    ctx->data[61] = ctx->bitlen >> 16;
    ctx->data[60] = ctx->bitlen >> 24;
    ctx->data[59] = ctx->bitlen >> 32;
    ctx->data[58] = ctx->bitlen >> 40;
    ctx->data[57] = ctx->bitlen >> 48;
    ctx->data[56] = ctx->bitlen >> 56;
    sha256_process(ctx->state, ctx->data, 64);

    // The state words are native integers, but SHA-256 defines the digest as
    // big-endian bytes, so emit each word most-significant byte first.
    for (i = 0; i < 4; ++i) {
        hash[i]      = (ctx->state[0] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 4]  = (ctx->state[1] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 8]  = (ctx->state[2] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 12] = (ctx->state[3] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 16] = (ctx->state[4] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 20] = (ctx->state[5] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 24] = (ctx->state[6] >> (24 - i * 8)) & 0x000000ff;
        hash[i + 28] = (ctx->state[7] >> (24 - i * 8)) & 0x000000ff;
    }
}
+35
lib/sha256.h
/*********************************************************************
* Filename:   sha256.h
* Author:     Brad Conte (brad AT bradconte.com)
* Copyright:
* Disclaimer: This code is presented "as is" without any guarantees.
* Details:    Defines the API for the corresponding SHA256 implementation.
*********************************************************************/

#ifndef SHA256_H
#define SHA256_H

/*************************** HEADER FILES ***************************/
#include <stddef.h>
#include <stdint.h>

/****************************** MACROS ******************************/
#define SHA256_BLOCK_SIZE 32            // SHA256 outputs a 32 byte digest

/**************************** DATA TYPES ****************************/
typedef uint8_t BYTE;             // 8-bit byte
typedef uint32_t WORD;            // 32-bit word (uint32_t is exactly 32 bits everywhere)

typedef struct {
    BYTE data[64];
    WORD datalen;
    unsigned long long bitlen;
    WORD state[8];
} SHA256_CTX;

/*********************** FUNCTION DECLARATIONS **********************/
void sha256_init(SHA256_CTX *ctx);
void sha256_update(SHA256_CTX *ctx, const BYTE data[], size_t len);
void sha256_final(SHA256_CTX *ctx, BYTE hash[]);

#endif   // SHA256_H
-96
lib/sha256.ml
··· 1 - open Bigarray 2 - 3 - type state = (int32, int32_elt, c_layout) Array1.t 4 - type digest = (int, int8_unsigned_elt, c_layout) Array1.t 5 - type buffer = (int, int8_unsigned_elt, c_layout) Array1.t 6 - 7 - (* External C functions *) 8 - external init : unit -> state = "oxcaml_sha256_init" 9 - external process_block : state -> buffer -> unit = "oxcaml_sha256_process_block" [@@noalloc] 10 - external finalize : state -> buffer -> int64 -> digest = "oxcaml_sha256_finalize" 11 - external oneshot : buffer -> int64 -> digest = "oxcaml_sha256_oneshot" 12 - 13 - (* High-level interface *) 14 - 15 - let hash_bytes bytes = 16 - let len = Bytes.length bytes in 17 - let buffer = Array1.create int8_unsigned c_layout len in 18 - for i = 0 to len - 1 do 19 - Array1.set buffer i (Char.code (Bytes.get bytes i)) 20 - done; 21 - oneshot buffer (Int64.of_int len) 22 - 23 - let hash_string str = 24 - let len = String.length str in 25 - let buffer = Array1.create int8_unsigned c_layout len in 26 - for i = 0 to len - 1 do 27 - Array1.set buffer i (Char.code str.[i]) 28 - done; 29 - oneshot buffer (Int64.of_int len) 30 - 31 - (* Utilities *) 32 - 33 - let digest_to_hex digest = 34 - let hex_of_byte b = 35 - Printf.sprintf "%02x" b 36 - in 37 - let buf = Buffer.create 64 in 38 - for i = 0 to 31 do 39 - Buffer.add_string buf (hex_of_byte (Array1.get digest i)) 40 - done; 41 - Buffer.contents buf 42 - 43 - let digest_to_bytes digest = 44 - let bytes = Bytes.create 32 in 45 - for i = 0 to 31 do 46 - Bytes.set bytes i (Char.chr (Array1.get digest i)) 47 - done; 48 - bytes 49 - 50 - let digest_equal d1 d2 = 51 - let rec compare i = 52 - if i >= 32 then true 53 - else if Array1.get d1 i <> Array1.get d2 i then false 54 - else compare (i + 1) 55 - in 56 - compare 0 57 - 58 - (* Zero-allocation variants using OxCaml features *) 59 - 60 - module Fast = struct 61 - (* Stack-allocated processing for temporary computations *) 62 - let[@inline] [@zero_alloc assume] process_block_local state block 
= 63 - process_block state block 64 - 65 - (* Process multiple blocks efficiently *) 66 - let[@zero_alloc assume] process_blocks state blocks num_blocks = 67 - for i = 0 to num_blocks - 1 do 68 - let offset = i * 64 in 69 - let block = Array1.sub blocks offset 64 in 70 - process_block state block 71 - done 72 - 73 - (* Parallel hashing for multiple inputs *) 74 - let parallel_hash_many par inputs = 75 - match inputs with 76 - | [] -> [] 77 - | [x] -> [hash_bytes x] 78 - | _ -> 79 - let process_batch batch = 80 - List.map hash_bytes batch 81 - in 82 - let mid = List.length inputs / 2 in 83 - let rec split n lst = 84 - if n = 0 then ([], lst) 85 - else match lst with 86 - | [] -> ([], []) 87 - | h::t -> let (l1, l2) = split (n-1) t in (h::l1, l2) 88 - in 89 - let (left, right) = split mid inputs in 90 - let left_results, right_results = 91 - Parallel.fork_join2 par 92 - (fun _ -> process_batch left) 93 - (fun _ -> process_batch right) 94 - in 95 - left_results @ right_results 96 - end
-47
lib/sha256.mli
··· 1 - (** SHA256 hardware-accelerated implementation using AMD SHA-NI instructions *) 2 - 3 - open Bigarray 4 - 5 - (** {1 Types} *) 6 - 7 - (** SHA256 state (8 x int32) *) 8 - type state = (int32, int32_elt, c_layout) Array1.t 9 - 10 - (** SHA256 digest (32 bytes) *) 11 - type digest = (int, int8_unsigned_elt, c_layout) Array1.t 12 - 13 - (** Input data buffer *) 14 - type buffer = (int, int8_unsigned_elt, c_layout) Array1.t 15 - 16 - (** {1 Low-level interface} *) 17 - 18 - (** Initialize a new SHA256 state *) 19 - val init : unit -> state 20 - 21 - (** Process a single 512-bit (64 byte) block. Buffer must be exactly 64 bytes. *) 22 - val process_block : state -> buffer -> unit 23 - 24 - (** Finalize the hash computation with padding and return digest *) 25 - val finalize : state -> buffer -> int64 -> digest 26 - 27 - (** {1 High-level interface} *) 28 - 29 - (** Compute SHA256 hash in one shot (fastest for single use) *) 30 - val oneshot : buffer -> int64 -> digest 31 - 32 - (** Compute SHA256 hash from bytes *) 33 - val hash_bytes : bytes -> digest 34 - 35 - (** Compute SHA256 hash from string *) 36 - val hash_string : string -> digest 37 - 38 - (** {1 Utilities} *) 39 - 40 - (** Convert digest to hexadecimal string *) 41 - val digest_to_hex : digest -> string 42 - 43 - (** Convert digest to bytes *) 44 - val digest_to_bytes : digest -> bytes 45 - 46 - (** Compare two digests for equality *) 47 - val digest_equal : digest -> digest -> bool
-382
lib/sha256_stubs.c
··· 1 - #include <immintrin.h> 2 - #include <stdint.h> 3 - #include <string.h> 4 - #include <caml/mlvalues.h> 5 - #include <caml/memory.h> 6 - #include <caml/alloc.h> 7 - #include <caml/bigarray.h> 8 - 9 - // Aligned storage for round constants 10 - alignas(64) static const uint32_t K256[64] = { 11 - 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 12 - 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5, 13 - 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 14 - 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174, 15 - 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 16 - 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da, 17 - 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 18 - 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967, 19 - 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 20 - 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85, 21 - 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 22 - 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070, 23 - 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 24 - 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3, 25 - 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 26 - 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 27 - }; 28 - 29 - // Initial SHA256 state values 30 - alignas(16) static const uint32_t H256_INIT[8] = { 31 - 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 32 - 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 33 - }; 34 - 35 - // Byte swap for endianness 36 - static const __m128i BSWAP_MASK = {0x0001020304050607ULL, 0x08090a0b0c0d0e0fULL}; 37 - 38 - // Process a single 512-bit block using SHA-NI instructions 39 - static void sha256_process_block_shani(uint32_t state[8], const uint8_t block[64]) { 40 - __m128i msg0, msg1, msg2, msg3; 41 - __m128i tmp; 42 - __m128i state0, state1; 43 - __m128i msg; 44 - __m128i abef_save, cdgh_save; 45 - 46 - // Load initial state 47 - tmp = _mm_loadu_si128((const __m128i*)&state[0]); 48 - state1 = _mm_loadu_si128((const __m128i*)&state[4]); 49 - 50 - // Swap byte order for initial state 51 - tmp = 
_mm_shuffle_epi32(tmp, 0xB1); // CDAB 52 - state1 = _mm_shuffle_epi32(state1, 0x1B); // EFGH 53 - state0 = _mm_alignr_epi8(tmp, state1, 8); // ABEF 54 - state1 = _mm_blend_epi16(state1, tmp, 0xF0); // CDGH 55 - 56 - // Save initial state 57 - abef_save = state0; 58 - cdgh_save = state1; 59 - 60 - // Load message blocks with byte swap 61 - msg0 = _mm_loadu_si128((const __m128i*)(block + 0)); 62 - msg1 = _mm_loadu_si128((const __m128i*)(block + 16)); 63 - msg2 = _mm_loadu_si128((const __m128i*)(block + 32)); 64 - msg3 = _mm_loadu_si128((const __m128i*)(block + 48)); 65 - 66 - msg0 = _mm_shuffle_epi8(msg0, BSWAP_MASK); 67 - msg1 = _mm_shuffle_epi8(msg1, BSWAP_MASK); 68 - msg2 = _mm_shuffle_epi8(msg2, BSWAP_MASK); 69 - msg3 = _mm_shuffle_epi8(msg3, BSWAP_MASK); 70 - 71 - // Rounds 0-3 72 - msg = _mm_add_epi32(msg0, _mm_load_si128((const __m128i*)&K256[0])); 73 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 74 - msg = _mm_shuffle_epi32(msg, 0x0E); 75 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 76 - 77 - // Rounds 4-7 78 - msg = _mm_add_epi32(msg1, _mm_load_si128((const __m128i*)&K256[4])); 79 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 80 - msg = _mm_shuffle_epi32(msg, 0x0E); 81 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 82 - msg0 = _mm_sha256msg1_epu32(msg0, msg1); 83 - 84 - // Rounds 8-11 85 - msg = _mm_add_epi32(msg2, _mm_load_si128((const __m128i*)&K256[8])); 86 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 87 - msg = _mm_shuffle_epi32(msg, 0x0E); 88 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 89 - msg1 = _mm_sha256msg1_epu32(msg1, msg2); 90 - 91 - // Rounds 12-15 92 - msg = _mm_add_epi32(msg3, _mm_load_si128((const __m128i*)&K256[12])); 93 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 94 - tmp = _mm_alignr_epi8(msg3, msg2, 4); 95 - msg0 = _mm_add_epi32(msg0, tmp); 96 - msg0 = _mm_sha256msg2_epu32(msg0, msg3); 97 - msg = _mm_shuffle_epi32(msg, 0x0E); 98 - state0 = _mm_sha256rnds2_epu32(state0, 
state1, msg); 99 - msg2 = _mm_sha256msg1_epu32(msg2, msg3); 100 - 101 - // Rounds 16-19 102 - msg = _mm_add_epi32(msg0, _mm_load_si128((const __m128i*)&K256[16])); 103 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 104 - tmp = _mm_alignr_epi8(msg0, msg3, 4); 105 - msg1 = _mm_add_epi32(msg1, tmp); 106 - msg1 = _mm_sha256msg2_epu32(msg1, msg0); 107 - msg = _mm_shuffle_epi32(msg, 0x0E); 108 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 109 - msg3 = _mm_sha256msg1_epu32(msg3, msg0); 110 - 111 - // Rounds 20-23 112 - msg = _mm_add_epi32(msg1, _mm_load_si128((const __m128i*)&K256[20])); 113 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 114 - tmp = _mm_alignr_epi8(msg1, msg0, 4); 115 - msg2 = _mm_add_epi32(msg2, tmp); 116 - msg2 = _mm_sha256msg2_epu32(msg2, msg1); 117 - msg = _mm_shuffle_epi32(msg, 0x0E); 118 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 119 - msg0 = _mm_sha256msg1_epu32(msg0, msg1); 120 - 121 - // Rounds 24-27 122 - msg = _mm_add_epi32(msg2, _mm_load_si128((const __m128i*)&K256[24])); 123 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 124 - tmp = _mm_alignr_epi8(msg2, msg1, 4); 125 - msg3 = _mm_add_epi32(msg3, tmp); 126 - msg3 = _mm_sha256msg2_epu32(msg3, msg2); 127 - msg = _mm_shuffle_epi32(msg, 0x0E); 128 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 129 - msg1 = _mm_sha256msg1_epu32(msg1, msg2); 130 - 131 - // Rounds 28-31 132 - msg = _mm_add_epi32(msg3, _mm_load_si128((const __m128i*)&K256[28])); 133 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 134 - tmp = _mm_alignr_epi8(msg3, msg2, 4); 135 - msg0 = _mm_add_epi32(msg0, tmp); 136 - msg0 = _mm_sha256msg2_epu32(msg0, msg3); 137 - msg = _mm_shuffle_epi32(msg, 0x0E); 138 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 139 - msg2 = _mm_sha256msg1_epu32(msg2, msg3); 140 - 141 - // Rounds 32-35 142 - msg = _mm_add_epi32(msg0, _mm_load_si128((const __m128i*)&K256[32])); 143 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 144 - tmp = 
_mm_alignr_epi8(msg0, msg3, 4); 145 - msg1 = _mm_add_epi32(msg1, tmp); 146 - msg1 = _mm_sha256msg2_epu32(msg1, msg0); 147 - msg = _mm_shuffle_epi32(msg, 0x0E); 148 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 149 - msg3 = _mm_sha256msg1_epu32(msg3, msg0); 150 - 151 - // Rounds 36-39 152 - msg = _mm_add_epi32(msg1, _mm_load_si128((const __m128i*)&K256[36])); 153 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 154 - tmp = _mm_alignr_epi8(msg1, msg0, 4); 155 - msg2 = _mm_add_epi32(msg2, tmp); 156 - msg2 = _mm_sha256msg2_epu32(msg2, msg1); 157 - msg = _mm_shuffle_epi32(msg, 0x0E); 158 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 159 - msg0 = _mm_sha256msg1_epu32(msg0, msg1); 160 - 161 - // Rounds 40-43 162 - msg = _mm_add_epi32(msg2, _mm_load_si128((const __m128i*)&K256[40])); 163 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 164 - tmp = _mm_alignr_epi8(msg2, msg1, 4); 165 - msg3 = _mm_add_epi32(msg3, tmp); 166 - msg3 = _mm_sha256msg2_epu32(msg3, msg2); 167 - msg = _mm_shuffle_epi32(msg, 0x0E); 168 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 169 - msg1 = _mm_sha256msg1_epu32(msg1, msg2); 170 - 171 - // Rounds 44-47 172 - msg = _mm_add_epi32(msg3, _mm_load_si128((const __m128i*)&K256[44])); 173 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 174 - tmp = _mm_alignr_epi8(msg3, msg2, 4); 175 - msg0 = _mm_add_epi32(msg0, tmp); 176 - msg0 = _mm_sha256msg2_epu32(msg0, msg3); 177 - msg = _mm_shuffle_epi32(msg, 0x0E); 178 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 179 - msg2 = _mm_sha256msg1_epu32(msg2, msg3); 180 - 181 - // Rounds 48-51 182 - msg = _mm_add_epi32(msg0, _mm_load_si128((const __m128i*)&K256[48])); 183 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 184 - tmp = _mm_alignr_epi8(msg0, msg3, 4); 185 - msg1 = _mm_add_epi32(msg1, tmp); 186 - msg1 = _mm_sha256msg2_epu32(msg1, msg0); 187 - msg = _mm_shuffle_epi32(msg, 0x0E); 188 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 189 - msg3 = 
_mm_sha256msg1_epu32(msg3, msg0); 190 - 191 - // Rounds 52-55 192 - msg = _mm_add_epi32(msg1, _mm_load_si128((const __m128i*)&K256[52])); 193 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 194 - tmp = _mm_alignr_epi8(msg1, msg0, 4); 195 - msg2 = _mm_add_epi32(msg2, tmp); 196 - msg2 = _mm_sha256msg2_epu32(msg2, msg1); 197 - msg = _mm_shuffle_epi32(msg, 0x0E); 198 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 199 - 200 - // Rounds 56-59 201 - msg = _mm_add_epi32(msg2, _mm_load_si128((const __m128i*)&K256[56])); 202 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 203 - tmp = _mm_alignr_epi8(msg2, msg1, 4); 204 - msg3 = _mm_add_epi32(msg3, tmp); 205 - msg3 = _mm_sha256msg2_epu32(msg3, msg2); 206 - msg = _mm_shuffle_epi32(msg, 0x0E); 207 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 208 - 209 - // Rounds 60-63 210 - msg = _mm_add_epi32(msg3, _mm_load_si128((const __m128i*)&K256[60])); 211 - state1 = _mm_sha256rnds2_epu32(state1, state0, msg); 212 - msg = _mm_shuffle_epi32(msg, 0x0E); 213 - state0 = _mm_sha256rnds2_epu32(state0, state1, msg); 214 - 215 - // Add initial state 216 - state0 = _mm_add_epi32(state0, abef_save); 217 - state1 = _mm_add_epi32(state1, cdgh_save); 218 - 219 - // Swap byte order back and store 220 - tmp = _mm_shuffle_epi32(state0, 0x1B); // FEBA 221 - state1 = _mm_shuffle_epi32(state1, 0xB1); // DCHG 222 - state0 = _mm_blend_epi16(tmp, state1, 0xF0); // DCBA 223 - state1 = _mm_alignr_epi8(state1, tmp, 8); // HGFE 224 - 225 - _mm_storeu_si128((__m128i*)&state[0], state0); 226 - _mm_storeu_si128((__m128i*)&state[4], state1); 227 - } 228 - 229 - // OCaml interface functions 230 - 231 - // Initialize SHA256 state 232 - value oxcaml_sha256_init(value unit) { 233 - CAMLparam1(unit); 234 - CAMLlocal1(state); 235 - 236 - // Allocate bigarray for state (8 x int32) 237 - long dims[1] = {8}; 238 - state = caml_ba_alloc_dims(CAML_BA_INT32 | CAML_BA_C_LAYOUT, 1, NULL, dims); 239 - uint32_t* s = 
(uint32_t*)Caml_ba_data_val(state);
-
-     // Copy initial values
-     memcpy(s, H256_INIT, 32);
-
-     CAMLreturn(state);
- }
-
- // Process a single 512-bit block
- value oxcaml_sha256_process_block(value state, value block) {
-     CAMLparam2(state, block);
-
-     uint32_t* s = (uint32_t*)Caml_ba_data_val(state);
-     uint8_t* b = (uint8_t*)Caml_ba_data_val(block);
-
-     sha256_process_block_shani(s, b);
-
-     CAMLreturn(Val_unit);
- }
-
- // Finalize hash with padding and return digest
- value oxcaml_sha256_finalize(value state, value data, value len_v) {
-     CAMLparam3(state, data, len_v);
-     CAMLlocal1(result);
-
-     uint32_t* s = (uint32_t*)Caml_ba_data_val(state);
-     uint8_t* input = (uint8_t*)Caml_ba_data_val(data);
-     uint64_t len = Int64_val(len_v);
-
-     // Process full blocks
-     uint64_t full_blocks = len / 64;
-     for (uint64_t i = 0; i < full_blocks; i++) {
-         sha256_process_block_shani(s, input + i * 64);
-     }
-
-     // Handle final block with padding
-     uint8_t final_block[128] = {0}; // Max 2 blocks for padding
-     uint64_t remaining = len % 64;
-
-     // Copy remaining bytes
-     if (remaining > 0) {
-         memcpy(final_block, input + full_blocks * 64, remaining);
-     }
-
-     // Add padding
-     final_block[remaining] = 0x80;
-
-     // Add length in bits at the end
-     uint64_t bit_len = len * 8;
-     if (remaining >= 56) {
-         // Need two blocks
-         sha256_process_block_shani(s, final_block);
-         memset(final_block, 0, 64);
-     }
-
-     // Add bit length (big-endian)
-     final_block[56] = (bit_len >> 56) & 0xFF;
-     final_block[57] = (bit_len >> 48) & 0xFF;
-     final_block[58] = (bit_len >> 40) & 0xFF;
-     final_block[59] = (bit_len >> 32) & 0xFF;
-     final_block[60] = (bit_len >> 24) & 0xFF;
-     final_block[61] = (bit_len >> 16) & 0xFF;
-     final_block[62] = (bit_len >> 8) & 0xFF;
-     final_block[63] = bit_len & 0xFF;
-
-     sha256_process_block_shani(s, final_block);
-
-     // Create result bigarray (32 bytes)
-     long dims[1] = {32};
-     result = caml_ba_alloc_dims(CAML_BA_UINT8 | CAML_BA_C_LAYOUT, 1, NULL, dims);
-     uint8_t* res = (uint8_t*)Caml_ba_data_val(result);
-
-     // Convert to big-endian bytes
-     for (int i = 0; i < 8; i++) {
-         res[i*4 + 0] = (s[i] >> 24) & 0xFF;
-         res[i*4 + 1] = (s[i] >> 16) & 0xFF;
-         res[i*4 + 2] = (s[i] >> 8) & 0xFF;
-         res[i*4 + 3] = s[i] & 0xFF;
-     }
-
-     CAMLreturn(result);
- }
-
- // Fast one-shot SHA256
- value oxcaml_sha256_oneshot(value data, value len_v) {
-     CAMLparam2(data, len_v);
-     CAMLlocal1(result);
-
-     uint8_t* input = (uint8_t*)Caml_ba_data_val(data);
-     uint64_t len = Int64_val(len_v);
-
-     // Local state
-     alignas(16) uint32_t state[8];
-     memcpy(state, H256_INIT, 32);
-
-     // Process full blocks
-     uint64_t full_blocks = len / 64;
-     for (uint64_t i = 0; i < full_blocks; i++) {
-         sha256_process_block_shani(state, input + i * 64);
-     }
-
-     // Handle final block with padding
-     alignas(64) uint8_t final_block[128] = {0};
-     uint64_t remaining = len % 64;
-
-     if (remaining > 0) {
-         memcpy(final_block, input + full_blocks * 64, remaining);
-     }
-
-     final_block[remaining] = 0x80;
-
-     uint64_t bit_len = len * 8;
-     if (remaining >= 56) {
-         sha256_process_block_shani(state, final_block);
-         memset(final_block, 0, 64);
-     }
-
-     // Add bit length (big-endian)
-     final_block[56] = (bit_len >> 56) & 0xFF;
-     final_block[57] = (bit_len >> 48) & 0xFF;
-     final_block[58] = (bit_len >> 40) & 0xFF;
-     final_block[59] = (bit_len >> 32) & 0xFF;
-     final_block[60] = (bit_len >> 24) & 0xFF;
-     final_block[61] = (bit_len >> 16) & 0xFF;
-     final_block[62] = (bit_len >> 8) & 0xFF;
-     final_block[63] = bit_len & 0xFF;
-
-     sha256_process_block_shani(state, final_block);
-
-     // Create result bigarray
-     long dims[1] = {32};
-     result = caml_ba_alloc_dims(CAML_BA_UINT8 | CAML_BA_C_LAYOUT, 1, NULL, dims);
-     uint8_t* res = (uint8_t*)Caml_ba_data_val(result);
-
-     // Convert to big-endian bytes
-     for (int i = 0; i < 8; i++) {
-         res[i*4 + 0] = (state[i] >> 24) & 0xFF;
-         res[i*4 + 1] = (state[i] >> 16) & 0xFF;
-         res[i*4 + 2] = (state[i] >> 8) & 0xFF;
-         res[i*4 + 3] = state[i] & 0xFF;
-     }
-
-     CAMLreturn(result);
- }
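The removed stubs above send every 64-byte block through `sha256_process_block_shani`, whose body is not shown in this hunk. As a hedged sketch of what a portable scalar counterpart could look like, the following plain-C version of the FIPS 180-4 compression function uses the same `(uint32_t state[8], const uint8_t *block)` shape and repeats the stubs' padding steps: append `0x80`, flush an extra block when `remaining >= 56`, then write the 64-bit big-endian bit length into bytes 56 to 63. The names `sha256_process_block_scalar`, `sha256_oneshot_scalar`, `sha256_hex` and `SHA256_H0` are hypothetical and are not part of oxsha.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical portable fallback; not part of oxsha. */

/* FIPS 180-4 initial hash values (H0) and round constants (K). */
static const uint32_t SHA256_H0[8] = {
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
    0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19};

static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2};

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))

/* Update the 8-word state with one 64-byte block (same shape as the SHA-NI routine). */
static void sha256_process_block_scalar(uint32_t s[8], const uint8_t *b) {
    uint32_t w[64];
    for (int i = 0; i < 16; i++) /* load message words big-endian */
        w[i] = (uint32_t)b[4 * i] << 24 | (uint32_t)b[4 * i + 1] << 16 |
               (uint32_t)b[4 * i + 2] << 8 | (uint32_t)b[4 * i + 3];
    for (int i = 16; i < 64; i++) { /* expand the message schedule */
        uint32_t s0 = ROTR(w[i - 15], 7) ^ ROTR(w[i - 15], 18) ^ (w[i - 15] >> 3);
        uint32_t s1 = ROTR(w[i - 2], 17) ^ ROTR(w[i - 2], 19) ^ (w[i - 2] >> 10);
        w[i] = w[i - 16] + s0 + w[i - 7] + s1;
    }
    uint32_t a = s[0], bb = s[1], c = s[2], d = s[3];
    uint32_t e = s[4], f = s[5], g = s[6], h = s[7];
    for (int i = 0; i < 64; i++) { /* 64 compression rounds */
        uint32_t t1 = h + (ROTR(e, 6) ^ ROTR(e, 11) ^ ROTR(e, 25)) +
                      ((e & f) ^ (~e & g)) + K[i] + w[i];
        uint32_t t2 = (ROTR(a, 2) ^ ROTR(a, 13) ^ ROTR(a, 22)) +
                      ((a & bb) ^ (a & c) ^ (bb & c));
        h = g; g = f; f = e; e = d + t1;
        d = c; c = bb; bb = a; a = t1 + t2;
    }
    s[0] += a; s[1] += bb; s[2] += c; s[3] += d;
    s[4] += e; s[5] += f; s[6] += g; s[7] += h;
}

/* One-shot hash following the same padding steps as oxcaml_sha256_oneshot. */
static void sha256_oneshot_scalar(const uint8_t *input, uint64_t len, uint8_t out[32]) {
    assert(input != NULL || len == 0);
    uint32_t state[8];
    memcpy(state, SHA256_H0, sizeof state);
    uint64_t full_blocks = len / 64;
    for (uint64_t i = 0; i < full_blocks; i++)
        sha256_process_block_scalar(state, input + i * 64);

    uint8_t final_block[64] = {0};
    uint64_t remaining = len % 64;
    if (remaining > 0)
        memcpy(final_block, input + full_blocks * 64, remaining);
    final_block[remaining] = 0x80;  /* the single 1 bit that starts padding */
    if (remaining >= 56) {          /* no room left for the 8-byte length */
        sha256_process_block_scalar(state, final_block);
        memset(final_block, 0, 64);
    }
    uint64_t bit_len = len * 8;
    for (int i = 0; i < 8; i++)     /* message length in bits, big-endian */
        final_block[56 + i] = (uint8_t)(bit_len >> (56 - 8 * i));
    sha256_process_block_scalar(state, final_block);

    for (int i = 0; i < 8; i++) {   /* serialise the state words big-endian */
        out[4 * i] = (uint8_t)(state[i] >> 24);
        out[4 * i + 1] = (uint8_t)(state[i] >> 16);
        out[4 * i + 2] = (uint8_t)(state[i] >> 8);
        out[4 * i + 3] = (uint8_t)state[i];
    }
}

/* Lowercase hex digest, as printed by sha256sum; hex must hold 65 bytes. */
static void sha256_hex(const uint8_t digest[32], char hex[65]) {
    for (int i = 0; i < 32; i++)
        snprintf(hex + 2 * i, 3, "%02x", (unsigned)digest[i]);
}
```

Checking a sketch like this against the NIST vectors from the removed `test/test_sha256.ml` is a cheap way to cover both padding paths: the 56-byte `"abcdbcdecdef..."` vector leaves `remaining == 56`, so it takes the two-block branch.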
+32
oxsha.opam
+ # This file is generated by dune, edit dune-project instead
+ opam-version: "2.0"
+ version: "0.1"
+ synopsis: "Fast SHA256 hashing library"
+ description:
+   "OCaml bindings to a C SHA256 implementation using bigarrays for efficient, zero-copy hashing"
+ maintainer: ["Anil Madhavapeddy"]
+ authors: ["Anil Madhavapeddy"]
+ license: "ISC"
+ homepage: "https://github.com/avsm/oxsha"
+ bug-reports: "https://github.com/avsm/oxsha/issues"
+ depends: [
+   "dune" {>= "3.20"}
+   "ocaml" {>= "5.3"}
+   "odoc" {with-doc}
+ ]
+ build: [
+   ["dune" "subst"] {dev}
+   [
+     "dune"
+     "build"
+     "-p"
+     name
+     "-j"
+     jobs
+     "@install"
+     "@runtest" {with-test}
+     "@doc" {with-doc}
+   ]
+ ]
+ dev-repo: "git+https://github.com/avsm/oxsha.git"
+ x-maintenance-intent: ["(latest)"]
+3 -4
test/dune
- (executable
-  (name test_sha256)
-  (libraries sha256 unix)
-  (modes native))
+ (test
+  (name speed_test)
+  (libraries cryptokit oxsha unix))
+171
test/speed_test.ml
+ (* Speed test comparing system sha256sum with Cryptokit and oxsha implementations *)
+
+ let deadbeef_pattern = "\xde\xad\xbe\xef"
+
+ (* Convert bytes to hex string *)
+ let hex_of_bytes bytes =
+   let buf = Buffer.create (Bytes.length bytes * 2) in
+   Bytes.iter
+     (fun c -> Buffer.add_string buf (Printf.sprintf "%02x" (Char.code c)))
+     bytes;
+   Buffer.contents buf
+
+ (* Create a 2GB file filled with 0xdeadbeef pattern *)
+ let create_test_file filename size =
+   Printf.printf "Creating %s (%d bytes = %.2f GB)...\n%!"
+     filename size (float_of_int size /. (1024.0 *. 1024.0 *. 1024.0));
+
+   let oc = open_out_bin filename in
+   let chunk_size = 1024 * 1024 in (* 1 MB chunks *)
+   let chunk = Bytes.make chunk_size '\x00' in
+
+   (* Fill chunk with 0xdeadbeef pattern *)
+   for i = 0 to chunk_size - 1 do
+     Bytes.set chunk i deadbeef_pattern.[i mod 4]
+   done;
+
+   let chunks = size / chunk_size in
+   let remainder = size mod chunk_size in
+
+   for i = 0 to chunks - 1 do
+     output_bytes oc chunk;
+     if i mod 100 = 0 then (
+       Printf.printf "\rProgress: %.1f%%..."
+         (float_of_int i *. 100.0 /. float_of_int chunks);
+       flush stdout
+     )
+   done;
+
+   if remainder > 0 then
+     output oc chunk 0 remainder;
+
+   close_out oc;
+   Printf.printf "\rProgress: 100.0%%... Done!\n%!"
+
+ (* SHA-256 using Cryptokit *)
+ let sha256sum_cryptokit filename =
+   let hash = Cryptokit.Hash.sha256 () in
+   let digest =
+     In_channel.with_open_bin filename
+       (Cryptokit.hash_channel hash)
+   in
+   let hex_digest =
+     Cryptokit.transform_string
+       (Cryptokit.Hexa.encode ()) digest
+   in
+   hex_digest
+
+ (* SHA-256 using system command *)
+ let sha256sum_system filename =
+   let cmd = Printf.sprintf "sha256sum %s" (Filename.quote filename) in
+   let ic = Unix.open_process_in cmd in
+   let line = input_line ic in
+   let _ = Unix.close_process_in ic in
+   (* sha256sum outputs: "<hash> <filename>" *)
+   let hash = String.sub line 0 64 in
+   hash
+
+ (* SHA-256 using oxsha with Unix.map_file *)
+ let sha256sum_oxsha filename =
+   let fd = Unix.openfile filename [ Unix.O_RDONLY ] 0 in
+   let stats = Unix.fstat fd in
+   let file_size = stats.Unix.st_size in
+
+   if file_size = 0 then (
+     (* Handle empty files *)
+     Unix.close fd;
+     let digest = Oxsha.hash_string "" in
+     hex_of_bytes digest
+   ) else (
+     let mapped =
+       Unix.map_file fd Bigarray.char Bigarray.c_layout false [| file_size |]
+     in
+     let ba = Bigarray.array1_of_genarray mapped in
+     Unix.close fd;
+
+     let digest = Oxsha.hash ba in
+     hex_of_bytes digest
+   )
+
+ (* Time a function execution *)
+ let time_function name f =
+   Printf.printf "\nRunning %s...\n%!" name;
+   let start = Unix.gettimeofday () in
+   let result = f () in
+   let elapsed = Unix.gettimeofday () -. start in
+   Printf.printf "%s completed in %.3f seconds\n%!" name elapsed;
+   (result, elapsed)
+
+ let () =
+   let test_file = "test_2gb.bin" in
+   let file_size = 2 * 1024 * 1024 * 1024 in (* 2 GB *)
+
+   Printf.printf "=== SHA-256 Speed Test ===\n\n";
+
+   (* Create test file if it doesn't exist *)
+   if not (Sys.file_exists test_file) then
+     create_test_file test_file file_size
+   else
+     Printf.printf "Test file %s already exists, using existing file.\n%!" test_file;
+
+   (* Test system sha256sum *)
+   let (hash_system, time_system) =
+     time_function "system sha256sum" (fun () -> sha256sum_system test_file) in
+   Printf.printf "Hash: %s\n" hash_system;
+
+   (* Test Cryptokit implementation *)
+   let (hash_cryptokit, time_cryptokit) =
+     time_function "Cryptokit sha256sum" (fun () -> sha256sum_cryptokit test_file) in
+   Printf.printf "Hash: %s\n" hash_cryptokit;
+
+   (* Test oxsha implementation *)
+   let (hash_oxsha, time_oxsha) =
+     time_function "oxsha (mmap)" (fun () -> sha256sum_oxsha test_file) in
+   Printf.printf "Hash: %s\n" hash_oxsha;
+
+   (* Compare results *)
+   Printf.printf "\n=== Results ===\n";
+   Printf.printf "System sha256sum: %.3f seconds (%.2f MB/s)\n"
+     time_system
+     (float_of_int file_size /. time_system /. (1024.0 *. 1024.0));
+   Printf.printf "Cryptokit sha256sum: %.3f seconds (%.2f MB/s)\n"
+     time_cryptokit
+     (float_of_int file_size /. time_cryptokit /. (1024.0 *. 1024.0));
+   Printf.printf "oxsha (mmap): %.3f seconds (%.2f MB/s)\n"
+     time_oxsha
+     (float_of_int file_size /. time_oxsha /. (1024.0 *. 1024.0));
+
+   (* Find fastest *)
+   let times = [
+     ("System sha256sum", time_system);
+     ("Cryptokit", time_cryptokit);
+     ("oxsha (mmap)", time_oxsha)
+   ] in
+   let fastest_name, fastest_time =
+     List.fold_left (fun (n, t) (n', t') -> if t' < t then (n', t') else (n, t))
+       (List.hd times) (List.tl times)
+   in
+   Printf.printf "\nFastest: %s\n" fastest_name;
+   List.iter (fun (name, time) ->
+     if name <> fastest_name then
+       Printf.printf " %s is %.2fx faster than %s\n"
+         fastest_name (time /. fastest_time) name
+   ) times;
+
+   (* Verify hashes match *)
+   let hash_system_lower = String.lowercase_ascii hash_system in
+   let hash_cryptokit_lower = String.lowercase_ascii hash_cryptokit in
+   let hash_oxsha_lower = String.lowercase_ascii hash_oxsha in
+
+   if hash_system_lower = hash_cryptokit_lower && hash_system_lower = hash_oxsha_lower then
+     Printf.printf "\n✓ All hashes match!\n"
+   else (
+     Printf.printf "\n✗ ERROR: Hashes do not match!\n";
+     Printf.printf " System: %s\n" hash_system;
+     Printf.printf " Cryptokit: %s\n" hash_cryptokit;
+     Printf.printf " oxsha: %s\n" hash_oxsha;
+     exit 1
+   );
+
+   Printf.printf "\nNote: Test file %s has been preserved for future runs.\n" test_file;
+   Printf.printf " Delete it manually if you want to recreate it.\n"
-124
test/test_sha256.ml
- open Sha256
-
- (* Test vectors from NIST *)
- let test_vectors = [
-   ("", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855");
-   ("abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad");
-   ("abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq",
-    "248d6a61d20638b8e5c026930c3e6039a33ce45964ff2167f6ecedd419db06c1");
-   ("The quick brown fox jumps over the lazy dog",
-    "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592");
-   (String.make 1000000 'a',
-    "cdc76e5c9914fb9281a1c7e284d73e67f1809a48a497200e046d39ccc7112cd0");
- ]
-
- let test_basic () =
-   print_endline "Testing basic SHA256 functionality...";
-   List.iter (fun (input, expected) ->
-     let digest = hash_string input in
-     let hex = digest_to_hex digest in
-     if hex = expected then
-       Printf.printf " ✓ Test passed for input length %d\n" (String.length input)
-     else begin
-       Printf.printf " ✗ Test FAILED for input: %S\n"
-         (if String.length input > 50 then
-            String.sub input 0 50 ^ "..."
-          else input);
-       Printf.printf " Expected: %s\n" expected;
-       Printf.printf " Got: %s\n" hex
-     end
-   ) test_vectors
-
- let benchmark () =
-   print_endline "\nBenchmarking SHA256 performance...";
-
-   (* Test different input sizes *)
-   let sizes = [64; 256; 1024; 4096; 16384; 65536; 1048576] in
-
-   List.iter (fun size ->
-     let data = String.make size 'x' in
-     let start = Unix.gettimeofday () in
-     let iterations = if size > 10000 then 1000 else 10000 in
-
-     for _ = 1 to iterations do
-       ignore (hash_string data)
-     done;
-
-     let elapsed = Unix.gettimeofday () -. start in
-     let throughput = (float_of_int (size * iterations)) /. elapsed /. 1_000_000.0 in
-     Printf.printf " Size: %7d bytes | Iterations: %6d | Time: %.3fs | Throughput: %.1f MB/s\n"
-       size iterations elapsed throughput
-   ) sizes
-
- let test_incremental () =
-   print_endline "\nTesting incremental hashing...";
-
-   (* Create test data *)
-   let data = "The quick brown fox jumps over the lazy dog" in
-   let expected = "d7a8fbb307d7809469ca9abcb0082e4f8d5651e46d3cdb762d02d0bf37c9e592" in
-
-   (* Hash using oneshot *)
-   let digest1 = hash_string data in
-   let hex1 = digest_to_hex digest1 in
-
-   (* Hash using incremental API *)
-   let state = init () in
-   let bytes = Bytes.of_string data in
-   let buffer = Bigarray.Array1.create Bigarray.int8_unsigned Bigarray.c_layout (String.length data) in
-   for i = 0 to String.length data - 1 do
-     Bigarray.Array1.set buffer i (Char.code data.[i])
-   done;
-
-   let digest2 = finalize state buffer (Int64.of_int (String.length data)) in
-   let hex2 = digest_to_hex digest2 in
-
-   if hex1 = expected && hex2 = expected then
-     print_endline " ✓ Incremental hashing works correctly"
-   else begin
-     print_endline " ✗ Incremental hashing FAILED";
-     Printf.printf " Expected: %s\n" expected;
-     Printf.printf " Oneshot: %s\n" hex1;
-     Printf.printf " Incremental: %s\n" hex2
-   end
-
- let test_parallel () =
-   print_endline "\nTesting parallel hashing...";
-
-   (* Create test data *)
-   let num_hashes = 100 in
-   let inputs = List.init num_hashes (fun i ->
-     Printf.sprintf "Test string number %d with some padding to make it longer" i
-     |> Bytes.of_string
-   ) in
-
-   (* Sequential hashing *)
-   let start_seq = Unix.gettimeofday () in
-   let results_seq = List.map hash_bytes inputs in
-   let time_seq = Unix.gettimeofday () -. start_seq in
-
-   (* Parallel hashing *)
-   let par = Parallel.create () in
-   let start_par = Unix.gettimeofday () in
-   let results_par = Fast.parallel_hash_many par inputs in
-   let time_par = Unix.gettimeofday () -. start_par in
-
-   (* Verify results match *)
-   let results_match =
-     List.for_all2 (fun d1 d2 -> digest_equal d1 d2) results_seq results_par
-   in
-
-   if results_match then begin
-     Printf.printf " ✓ Parallel hashing produces correct results\n";
-     Printf.printf " Sequential: %.3fs\n" time_seq;
-     Printf.printf " Parallel: %.3fs\n" time_par;
-     Printf.printf " Speedup: %.2fx\n" (time_seq /. time_par)
-   end else
-     print_endline " ✗ Parallel hashing produced different results!"
-
- let () =
-   print_endline "SHA256 Hardware Accelerated Test Suite";
-   print_endline "======================================";
-   test_basic ();
-   test_incremental ();
-   test_parallel ();
-   benchmark ()