an efficient binary archive format
init

zach 13a55bb6 90fa97c3

+116 -1
+1
Cargo.toml
···
 zerocopy = { version = "0.8.38", features = ["std", "derive"] }
 zstd = "0.13.3"
 clap = { version = "4.5", features = ["derive"], optional = true }
+fs2 = "0.4.3"

 [features]
 default = ["cli"]
+15
LICENSE.md
+ISC License
+
+Copyright (c) 2026, Zach Shipko
+
+Permission to use, copy, modify, and/or distribute this software for any
+purpose with or without fee is hereby granted, provided that the above
+copyright notice and this permission notice appear in all copies.
+
+THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+4
README.md
+# bindle-file
+
+`bindle` is a general purpose, append only archive file format
+
+83
SPEC.md
+# Bindle File Format (.bdnl)
+
+Bindle is a simple append-only binary archive format. It features a trailing index to support efficient writes and memory-mapped reads.
+
+---
+
+## 1. High-Level Layout
+
+The file contains an 8 byte signature, followed by the data and then the metadata map at the end of the file.
+
+| Offset | Component | Description |
+| :--- | :--- | :--- |
+| `0x00` | **Header** | 8-byte magic identification string. |
+| `0x08` | **Data Payload** | Sequential blobs of raw or compressed data. |
+| `Variable` | **Index** | A sequence of metadata entries and filenames. |
+| `EOF - 20` | **Footer** | Pointer to the index and file count. |
+
+---
+
+## 2. Components
+
+### 2.1 Header
+Every Bindle file MUST begin with the following 8 bytes:
+`42 49 4e 44 4c 30 30 31` (ASCII: `BINDL001`)
+
+### 2.2 Data Segment
+Data blobs are stored starting at offset `0x08`.
+- Each blob SHOULD be aligned to an **8-byte boundary** to ensure optimal performance when memory-mapping the file.
+- Data can be stored as-is (Raw) or compressed using **Zstandard (zstd)**.
+
+### 2.3 Index Entry (`Entry`)
+The index consists of a series of entries. Each entry is a fixed-size header followed immediately by a variable-length UTF-8 filename.
+
+| Field | Size | Type | Description |
+| :--- | :--- | :--- | :--- |
+| `offset` | 8 bytes | u64 | Absolute file offset to start of data. |
+| `c_size` | 8 bytes | u64 | Compressed size on disk. |
+| `u_size` | 8 bytes | u64 | Original uncompressed size. |
+| `crc32` | 4 bytes | u32 | Checksum of the stored data. |
+| `name_len` | 2 bytes | u16 | Length of the following filename string. |
+| `comp_type` | 1 byte | u8 | `0` = Raw, `1` = Zstd. |
+| `reserved` | 1 byte | u8 | Alignment padding. |
+
+**Padding:** After the filename string, the file MUST be padded with null bytes until the next 8-byte boundary is reached.
+
+### 2.4 Footer
+The last 20 bytes of the file contain the lookup information required to parse the archive.
+
+| Field | Size | Type | Description |
+| :--- | :--- | :--- | :--- |
+| `index_offset` | 8 bytes | u64 | Absolute offset to the start of the Index. |
+| `entry_count` | 4 bytes | u32 | Total number of entries in the file. |
+
+---
+
+## 3. Implementation Guidelines
+
+### 3.1 Reading Logic
+To read a Bindle file:
+1. Validate the file size (must be at least 28 bytes).
+2. Read the first 8 bytes and the last 8 bytes to verify the `BINDL001` magic.
+3. Read the `index_offset` from the footer (EOF - 20).
+4. Seek to `index_offset` and iterate `entry_count` times to populate an in-memory map of files.
+
+### 3.2 Writing Logic (Atomic Updates)
+To maintain an append-only structure:
+1. Seek to the `index_offset` found in the current footer (effectively overwriting the old index).
+2. Append new data blobs.
+3. Write a new Index containing all previous entries plus the new ones.
+4. Write a new Footer.
+5. Flush/Sync the file to disk.
+
+### 3.3 Constraints
+- **Unique Keys:** Duplicate filenames are not permitted.
+- **Null Bytes:** Filenames MUST NOT contain internal null bytes (`\0`).
+- **Maximum Size:** File offsets are 64-bit, supporting archives up to 16 Exabytes.
+
+---
+
+## 4. Design Rationale
+- **Trailing Index:** Allows files to be "updated" or added by simply appending to the end of the file and writing a new index.
+- **Alignment:** 8-byte alignment ensures that `u64` fields can be read directly from a memory-mapped pointer without unaligned access penalties on modern CPUs.
+- **Zero-Copy:** Raw entries can be used directly as slices from memory without decompression or copying.
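The reading steps in section 3.1 of SPEC.md can be sketched with the standard library alone. This is a rough illustration and not the crate's actual reader. It makes two assumptions the spec does not state: all integers are little-endian, and the 20-byte footer is `index_offset` (8 bytes), `entry_count` (4 bytes), then a trailing 8-byte copy of the magic. The footer table lists only 12 bytes, and the trailing magic is inferred from step 2. The `Entry` struct, `read_index`, and the helper names are hypothetical.

```rust
const MAGIC: &[u8; 8] = b"BINDL001";
const FOOTER_LEN: usize = 20; // assumed: index_offset + entry_count + trailing magic
const ENTRY_HEADER_LEN: usize = 32; // 8 + 8 + 8 + 4 + 2 + 1 + 1

/// Hypothetical in-memory view of one index entry (fields from section 2.3).
#[derive(Debug, PartialEq)]
struct Entry {
    name: String,
    offset: u64,
    c_size: u64,
    u_size: u64,
    crc32: u32,
    comp_type: u8,
}

// Little-endian decoding is an assumption; the spec does not name a byte order.
fn le_u64(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[..8]);
    u64::from_le_bytes(a)
}

fn le_u32(b: &[u8]) -> u32 {
    let mut a = [0u8; 4];
    a.copy_from_slice(&b[..4]);
    u32::from_le_bytes(a)
}

fn le_u16(b: &[u8]) -> u16 {
    let mut a = [0u8; 2];
    a.copy_from_slice(&b[..2]);
    u16::from_le_bytes(a)
}

/// Bounds-checked sub-slice so a corrupt index cannot cause a panic.
fn take(buf: &[u8], pos: usize, len: usize) -> Result<&[u8], String> {
    let end = pos.checked_add(len).ok_or("offset overflow")?;
    buf.get(pos..end)
        .ok_or_else(|| "index runs past the footer".to_string())
}

/// Parse the index of an archive held entirely in `buf`.
fn read_index(buf: &[u8]) -> Result<Vec<Entry>, String> {
    // Step 1: header (8 bytes) plus footer (20 bytes) is the minimum size.
    if buf.len() < 8 + FOOTER_LEN {
        return Err("file shorter than 28 bytes".into());
    }
    // Step 2: the magic must appear at both ends of the file.
    if buf[..8] != MAGIC[..] || buf[buf.len() - 8..] != MAGIC[..] {
        return Err("bad magic".into());
    }
    // Step 3: read the footer fields.
    let (body, footer) = buf.split_at(buf.len() - FOOTER_LEN);
    let index_offset = le_u64(&footer[0..8]) as usize; // truncates on 32-bit targets
    let entry_count = le_u32(&footer[8..12]);
    if index_offset < 8 {
        return Err("index overlaps the header".into());
    }
    // Step 4: walk the entries; each one is padded to an 8-byte boundary.
    let mut entries = Vec::new();
    let mut pos = index_offset;
    for _ in 0..entry_count {
        let h = take(body, pos, ENTRY_HEADER_LEN)?;
        let name_len = le_u16(&h[28..30]) as usize;
        let name_start = pos + ENTRY_HEADER_LEN;
        let name = std::str::from_utf8(take(body, name_start, name_len)?)
            .map_err(|e| e.to_string())?
            .to_owned();
        entries.push(Entry {
            name,
            offset: le_u64(&h[0..8]),
            c_size: le_u64(&h[8..16]),
            u_size: le_u64(&h[16..24]),
            crc32: le_u32(&h[24..28]),
            comp_type: h[30], // h[31] is the reserved byte
        });
        pos = (name_start + name_len + 7) & !7;
    }
    Ok(entries)
}
```

Calling `read_index` on the bytes of a whole file (or on a memory map dereferenced to `&[u8]`) would return the entries in index order. A real reader would also reject duplicate names and verify `crc32`, both of which are left out here.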
+13 -1
src/lib.rs
+use fs2::FileExt;
 use memmap2::Mmap;
 use std::borrow::Cow;
 use std::fs::{File, OpenOptions};
···
             .write(true)
             .create(true)
             .open(path)?;
+
+        file.lock_shared()?;

         let len = file.metadata()?.len();

···
     }

     pub fn save(&mut self) -> io::Result<()> {
+        self.file.lock_exclusive()?;
+
         self.file.seek(SeekFrom::Start(self.data_end))?;
         let index_start = self.data_end;

···

         self.file.write_all(footer.as_bytes())?;
         self.file.flush()?;
-
         self.mmap = Some(unsafe { Mmap::map(&self.file)? });
+        self.file.lock_shared()?;
+
         Ok(())
     }

···

     pub fn is_empty(&self) -> bool {
         self.entries.is_empty()
+    }
+}
+
+impl Drop for Bindle {
+    fn drop(&mut self) {
+        let _ = self.file.unlock();
     }
 }

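The `save` hunk above seeks to `data_end`, writes the index, then the footer. As a rough illustration of the bytes that sequence would produce, here is an in-memory sketch that builds the layout into a `Vec<u8>`. It is not the crate's code: it skips file locking and memory-mapping entirely, and it assumes little-endian integers and a footer that ends with a copy of the magic, neither of which SPEC.md states outright. `IndexEntry`, `pad8`, `append_raw`, and `write_index_and_footer` are hypothetical names.

```rust
const MAGIC: &[u8; 8] = b"BINDL001";

/// Hypothetical in-memory entry; the field names follow SPEC.md section 2.3.
struct IndexEntry {
    name: String,
    offset: u64,
    c_size: u64,
    u_size: u64,
    crc32: u32,
    comp_type: u8,
}

/// Pad `out` with null bytes up to the next 8-byte boundary.
fn pad8(out: &mut Vec<u8>) {
    while out.len() % 8 != 0 {
        out.push(0);
    }
}

/// Append an uncompressed blob to the data segment and describe it.
/// The checksum is supplied by the caller; computing it is out of scope here.
fn append_raw(out: &mut Vec<u8>, name: &str, data: &[u8], crc32: u32) -> IndexEntry {
    pad8(out);
    let offset = out.len() as u64;
    out.extend_from_slice(data);
    pad8(out);
    IndexEntry {
        name: name.to_owned(),
        offset,
        c_size: data.len() as u64,
        u_size: data.len() as u64,
        crc32,
        comp_type: 0, // 0 = Raw
    }
}

/// Write the index and the 20-byte footer after the data segment.
/// Byte order (little-endian) and the trailing magic are assumptions.
fn write_index_and_footer(out: &mut Vec<u8>, entries: &[IndexEntry]) {
    pad8(out);
    let index_offset = out.len() as u64;
    for e in entries {
        assert!(e.name.len() <= u16::MAX as usize, "name_len is a u16");
        out.extend_from_slice(&e.offset.to_le_bytes());
        out.extend_from_slice(&e.c_size.to_le_bytes());
        out.extend_from_slice(&e.u_size.to_le_bytes());
        out.extend_from_slice(&e.crc32.to_le_bytes());
        out.extend_from_slice(&(e.name.len() as u16).to_le_bytes());
        out.push(e.comp_type);
        out.push(0); // reserved
        out.extend_from_slice(e.name.as_bytes());
        pad8(out); // null padding after the filename
    }
    out.extend_from_slice(&index_offset.to_le_bytes());
    out.extend_from_slice(&(entries.len() as u32).to_le_bytes());
    out.extend_from_slice(MAGIC);
}
```

To append to an existing archive under this sketch, a caller would truncate the buffer at the old `index_offset`, call `append_raw` for each new blob, and then call `write_index_and_footer` with the old and new entries together, mirroring steps 1 to 4 of section 3.2.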