Restore a rocks database from object storage

implementation, format stuff, etc.

in general when there's a question of how to handle something: try to match the rocksdb behaviour.

backup files

see the rocksdb wiki and the authoritative rocks backup implementation: utilities/backup/backup_engine.cc.

The format is line-oriented text. yay.

The incremental backup files are stored like this:

meta/
  1           # meta file for backup 1
  2           # meta file for backup 2
private/
  1/          # per-backup files (CURRENT, MANIFEST, OPTIONS, WALs)
  2/
shared_checksum/
  000007_2894567812_590.sst   # SSTs shared across backups, funny names

The meta files contain everything you need to restore one backup: namely, a list of files to copy. It's not too complicated.

Schema v2 added a schema_version 2.N on the first line but is otherwise backwards compatible with v1 (which starts with the timestamp line), so both are supported:

[schema_version 2.N]        # absent for v1
<timestamp>
<sequence_number>
[metadata <hex>]             # optional app metadata
[<field> <value>]*           # unknown fields, skippable unless ni:: prefixed
<file_count>
<path> [field value]*        # many lines like this (<file_count> of them)
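For concreteness, a hypothetical v2 meta file might look like this (every value here is made up for illustration):

```
schema_version 2.1
1702310400
4487
3
shared_checksum/000007_2894567812_590.sst crc32 2894567812
private/1/MANIFEST-000008 crc32 1412897808
private/1/CURRENT crc32 1382722262
```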

File fields include crc32 (actually crc32c), size, temp, and ni::excluded. Fields starting with ni:: (non-ignorable) are meant to fail to parse unless they are specifically recognized by the parser. yay forward compat.
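A sketch of header parsing under the layout above. The names and the error handling here are illustrative, not eat-rocks' actual API:

```rust
#[derive(Debug, Default)]
struct MetaHeader {
    schema_version: Option<String>,
    timestamp: i64,
    sequence_number: u64,
    app_metadata: Option<String>, // hex blob, left undecoded here
    file_count: usize,
}

fn parse_header<'a, I: Iterator<Item = &'a str>>(lines: &mut I) -> Result<MetaHeader, String> {
    let mut h = MetaHeader::default();
    let mut line = lines.next().ok_or("empty meta file")?;

    // v2 starts with "schema_version 2.N"; v1 starts straight at the timestamp.
    if let Some(v) = line.strip_prefix("schema_version ") {
        h.schema_version = Some(v.to_string());
        line = lines.next().ok_or("missing timestamp")?;
    }
    h.timestamp = line.trim().parse().map_err(|_| "bad timestamp")?;

    line = lines.next().ok_or("missing sequence number")?;
    h.sequence_number = line.trim().parse().map_err(|_| "bad sequence number")?;

    // Optional metadata line, then unknown fields, until the bare file count.
    loop {
        line = lines.next().ok_or("missing file count")?;
        if let Some(hex) = line.strip_prefix("metadata ") {
            h.app_metadata = Some(hex.to_string());
        } else if let Ok(n) = line.trim().parse::<usize>() {
            h.file_count = n;
            return Ok(h);
        } else if line.starts_with("ni::") {
            // non-ignorable field we don't recognize: hard failure
            return Err(format!("unrecognized non-ignorable field: {line}"));
        }
        // anything else is an unknown ignorable field: skip it
    }
}
```

After the header, the next `file_count` lines are the per-file entries.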

To avoid file name collisions, rocks puts _<checksum>_<size> suffixes on files in the shared_checksum/ folder. These are just for uniqueness. During restore you strip them off and write to <name before first underscore>.<ext>, without interpreting the checksum or size at all.
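The suffix stripping is mechanical — something like this illustrative helper:

```rust
// Strip the _<checksum>_<size> uniqueness suffix from a shared_checksum/
// file name: "000007_2894567812_590.sst" becomes "000007.sst".
// The suffix carries no meaning for the restore itself.
fn restore_name(shared: &str) -> String {
    match (shared.split_once('_'), shared.rsplit_once('.')) {
        (Some((stem, _)), Some((_, ext))) => format!("{stem}.{ext}"),
        _ => shared.to_string(), // no suffix to strip
    }
}
```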

exclusion zone

rocks has fancy multi-backup stuff where one backup can reference files living in another backup, to avoid redundant copies, and those referenced files get marked excluded.

these are outside the scope of eat-rocks and we error if any files in the meta are excluded, since we wouldn't know where to look for them.

get CURRENT

rocks itself uses a file called CURRENT as its entrypoint to the db. when restoring, we write all other files first, then atomically rename the new CURRENT into place, so a partial restore won't corrupt things. (just following rocks here)
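The rename trick can be sketched like this (a minimal std-only version; the temp file name and function are hypothetical, not eat-rocks' actual code):

```rust
use std::fs;
use std::io::Write;
use std::path::Path;

// Write CURRENT to a temp name first, then rename it into place once
// everything else is on disk. rename() within one filesystem replaces the
// target atomically, so a crashed restore never leaves a CURRENT that
// points at files which aren't there yet.
fn finalize_current(db_dir: &Path, contents: &[u8]) -> std::io::Result<()> {
    let tmp = db_dir.join("CURRENT.restore-tmp"); // hypothetical temp name
    let mut f = fs::File::create(&tmp)?;
    f.write_all(contents)?;
    f.sync_all()?; // make sure the bytes hit disk before the rename
    fs::rename(&tmp, db_dir.join("CURRENT"))
}
```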

with integrity

rocksdb (accidentally?) doesn't emit a size value in the meta for files, for... some reason. so even though we implement an unconditional validation check for size, in practice there's usually nothing there to check.

we do reliably get the crc32c from rocks though, and eat-rocks will check it by default. passing --no-verify (or setting RestoreOptions::verify to false) disables the check. i'm not sure why you'd want to though -- unlike a restore from the local filesystem, downloading from object storage means the full contents get streamed through us, so checking the crc is basically free.

The meta file field is called crc32 but the values are actually crc32c (grep for "WART" in the rocksdb source for the story). The crc32c crate with crc32c_append works for our streaming needs.
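The streaming property this relies on can be shown with a dependency-free bitwise crc32c (slow, purely illustrative — the real thing would use a proper crc32c implementation): feeding chunks through an append-style API gives the same result as hashing everything at once.

```rust
// Bitwise CRC-32C (Castagnoli). The !crc at entry undoes the final xor,
// so appending more data continues the running state.
fn crc32c_append(crc: u32, data: &[u8]) -> u32 {
    let mut state = !crc;
    for &byte in data {
        state ^= byte as u32;
        for _ in 0..8 {
            // 0x82F63B78 is the reflected Castagnoli polynomial
            state = if state & 1 != 0 { (state >> 1) ^ 0x82F63B78 } else { state >> 1 };
        }
    }
    !state
}
```

Because of that property, the verifier just threads the running crc through each downloaded chunk and compares against the meta value at the end.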

all together (concurrency)

the concurrency limit is applied in two places -- idk if this was a great idea but hey. futures::stream::buffer_unordered limits how many files we're asking object_store to work on at any given time, and object_store::limit::LimitStore wraps the actual ObjectStore backend to apply the same limit at a lower level.

plz don't ignore

non-ignorable fields (ni:: prefix) cause hard failures in both header and file parsing if they're not recognized. Unknown ignorable fields are silently skipped. ni::excluded is the only recognized ni:: field currently.
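The skip-vs-fail rule boils down to something like this (illustrative names, not the actual parser):

```rust
#[derive(Debug, PartialEq)]
enum FieldAction {
    Recognized,
    Skipped,
}

// Recognized fields are handled; unknown plain fields are silently skipped;
// unknown ni:: fields are a hard parse error, which is what gives the
// format its forward compatibility.
fn classify_field(name: &str) -> Result<FieldAction, String> {
    match name {
        "crc32" | "size" | "temp" | "ni::excluded" => Ok(FieldAction::Recognized),
        other if other.starts_with("ni::") => {
            Err(format!("non-ignorable field not understood: {other}"))
        }
        _ => Ok(FieldAction::Skipped),
    }
}
```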

don't test me

unit tests go in the modules of things they test (normal rust)

object_store::memory::InMemory is great for stubbing out object storage space with arbitrary contents.

there are some neat end-to-end tests in tests/e2e.rs which hopefully validate the whole thing going on here, down to actually generating real backups from rocksdb (via the rocksdb rust crate) and restoring them with both our implementation and rocks'.

the meta file parser is pretty simple/small but hey why not fuzz it -- try:

rustup run nightly cargo fuzz run parse_meta