.. _cartool:

cartool
=======

``cartool`` aims to be a CLI Swiss Army knife for analysing and modifying atproto repos stored inside CAR files.

.. code-block:: text

    USAGE: cartool COMMAND [args...]

    Available commands:
    info <car_path> : print CAR header and repo info
    list <car_path> : list all records in the CAR (values as CIDs)
    dump <car_path> : dump all records in the CAR (values as JSON)
    dump_record <car_path> <key> : dump a single record, keyed on ('collection/rkey')
    compact <car_in> <car_out> : rewrite the whole CAR, dropping any duplicated or unreferenced blocks
    diff <car_a> <car_b> : list the record diff between two CAR files
docs/overview.rst
.. _overview:

Library Overview
================

If you have some `atproto repository <https://atproto.com/specs/repository>`_ data, and you want to operate on it with Python, you've come to the right place [1]_. The APIs offered here are rather low-level, but I'm planning on adding higher-level helper utilities in the future.

.. [1] Maybe also check out `arroba <https://github.com/snarfed/arroba>`_!

=============
Block Storage
=============

The foundations of repos are content-addressed Blocks of data, as in the `IPLD <https://ipld.io/docs/motivation/benefits-of-content-addressing/>`_ data model. The abstract :class:`~atmst.blockstore.BlockStore` interface provides access to blocks, agnostic of the underlying storage medium. The following implementations are available:

* :class:`~atmst.blockstore.MemoryBlockStore` - stores blocks in memory only (inside a dict).

* :class:`~atmst.blockstore.car_file.ReadOnlyCARBlockStore` - accesses the contents of a CAR file.

* :class:`~atmst.blockstore.SqliteBlockStore` - accesses blocks stored in a table of an SQLite database.

Finally, the :class:`~atmst.blockstore.OverlayBlockStore` class lets you layer one BlockStore over another, with writes going to the top layer only. This is useful in several scenarios: for example, reading blocks from two CAR files at once so that you can diff them, or staging modifications in memory before committing them to persistent storage.
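
To make the layering idea concrete, here is a minimal, self-contained sketch. It deliberately does not use :py:mod:`atmst`: the ``ToyMemoryStore`` and ``ToyOverlayStore`` classes and their ``put``/``get`` methods are hypothetical names invented for illustration, and a bare SHA-256 digest stands in for a real CID.

.. code-block:: python

    import hashlib


    class ToyMemoryStore:
        """Hypothetical stand-in for an in-memory block store (not atmst's API)."""

        def __init__(self):
            self.blocks = {}

        def put(self, data: bytes) -> bytes:
            key = hashlib.sha256(data).digest()  # the key is derived from the content
            self.blocks[key] = data
            return key

        def get(self, key: bytes) -> bytes:
            return self.blocks[key]  # raises KeyError if the block is absent


    class ToyOverlayStore:
        """Reads try the upper layer first; writes only ever touch the upper layer."""

        def __init__(self, upper, lower):
            self.upper = upper
            self.lower = lower

        def put(self, data: bytes) -> bytes:
            return self.upper.put(data)

        def get(self, key: bytes) -> bytes:
            try:
                return self.upper.get(key)
            except KeyError:
                return self.lower.get(key)


    persistent = ToyMemoryStore()
    old_key = persistent.put(b"existing block")

    # Stage a modification in memory, layered over the "persistent" store.
    staging = ToyOverlayStore(upper=ToyMemoryStore(), lower=persistent)
    new_key = staging.put(b"staged block")

    assert staging.get(old_key) == b"existing block"  # falls through to the lower layer
    assert new_key not in persistent.blocks  # the write stayed in the upper layer

In this sketch, committing the staged changes would amount to copying the upper layer's blocks into the lower store.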

===================
Merkle Search Trees
===================

With a BlockStore, we can read and write content-addressed blocks of data. Content-addressing is cool, but sometimes you want mutability. The `Merkle Search Tree <https://inria.hal.science/hal-02303490/document>`_ data structure builds on top of content-addressed Block storage, providing a mutable map of keys onto values. In atproto, the keys are arbitrary strings (under certain constraints), and the values are "records".

Everything is still immutable under the hood, so modifying an MST results in a new root hash.
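
The "new root hash" behaviour can be illustrated without a real MST. The sketch below is a toy: the whole map lives in a single JSON block rather than a tree of nodes, and ``put_block`` and ``map_put`` are hypothetical helpers rather than part of :py:mod:`atmst`. It only shows how a "mutable" map can sit on top of immutable, content-addressed blocks.

.. code-block:: python

    import hashlib
    import json

    blocks = {}  # toy content-addressed storage: hex digest -> bytes


    def put_block(data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        blocks[key] = data
        return key


    def map_put(root: str, key: str, value: str) -> str:
        """Return the root hash of a new map; the block behind `root` is untouched."""
        entries = json.loads(blocks[root])
        entries[key] = value
        return put_block(json.dumps(entries, sort_keys=True).encode())


    empty_root = put_block(b"{}")
    root_a = map_put(empty_root, "app.bsky.feed.post/abc", "hello")
    root_b = map_put(root_a, "app.bsky.feed.post/abc", "edited")

    assert root_a != root_b  # a modification yields a new root hash
    # The old version is still readable through its old root.
    assert json.loads(blocks[root_a]) == {"app.bsky.feed.post/abc": "hello"}
    # Identical contents always hash to the same root.
    assert map_put(root_b, "app.bsky.feed.post/abc", "hello") == root_a

A real MST spreads its entries over many nodes, so an update only needs to rewrite the nodes on the path from the changed entry up to the root.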

:py:mod:`atmst` doesn't have a dedicated class to represent an MST (yet?); instead, we just reference the root node by CID.

=====
Nodes
=====

An MST is composed of one or more Nodes. :py:mod:`atmst` represents Nodes using :class:`~atmst.mst.node.MSTNode`, an immutable dataclass.

Nodes are ultimately stored in a BlockStore, serialised as `DAG-CBOR <https://ipld.io/docs/codecs/known/dag-cbor/>`_, and the :class:`~atmst.mst.node_store.NodeStore` class facilitates this. A NodeStore also maintains an LRU cache, mapping CIDs to MSTNode objects, to reduce the impact of BlockStore read latency, hash verification, and deserialisation overheads.
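
As a rough picture of what such a cache buys you, here is a minimal LRU sketch. The ``ToyNodeCache`` class is hypothetical and is not how NodeStore is actually implemented; JSON stands in for DAG-CBOR and a hex SHA-256 digest stands in for a CID.

.. code-block:: python

    import hashlib
    import json
    from collections import OrderedDict


    class ToyNodeCache:
        """Hypothetical LRU cache in front of a dict of serialised blocks."""

        def __init__(self, blocks: dict, capacity: int = 2):
            self.blocks = blocks
            self.capacity = capacity
            self.cache = OrderedDict()  # key -> decoded node, least recently used first
            self.misses = 0

        def get_node(self, key: str):
            if key in self.cache:
                self.cache.move_to_end(key)  # mark as most recently used
                return self.cache[key]
            self.misses += 1
            raw = self.blocks[key]  # the (potentially slow) block read
            if hashlib.sha256(raw).hexdigest() != key:  # hash verification
                raise ValueError("block does not match its key")
            node = json.loads(raw)  # deserialisation (JSON instead of DAG-CBOR)
            self.cache[key] = node
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict the least recently used entry
            return node


    raw = json.dumps({"keys": ["a", "b"]}).encode()
    key = hashlib.sha256(raw).hexdigest()
    store = ToyNodeCache({key: raw})

    first = store.get_node(key)
    second = store.get_node(key)
    assert first is second  # the second read is served from the cache
    assert store.misses == 1  # so the block was read, verified and decoded only once

Handing out the same cached object to multiple callers is only safe when nodes are immutable, which fits the immutable node representation described above.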

The :class:`~atmst.mst.node_wrangler.NodeWrangler` class facilitates modifications to MSTs. The :class:`~atmst.mst.node_walker.NodeWalker` class facilitates read access to MSTs, and is used by the :func:`~atmst.mst.diff.mst_diff` function.
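
To give a feel for what a record diff reports, here is a toy comparison of two flat ``key -> CID`` mappings. The ``toy_record_diff`` function and the placeholder CID strings are made up for illustration. This sketch ignores the tree structure entirely, so it says nothing about how the real diff walks two trees.

.. code-block:: python

    def toy_record_diff(a: dict, b: dict):
        """Compare two key -> CID mappings; return (created, updated, deleted) keys."""
        created = sorted(b.keys() - a.keys())
        deleted = sorted(a.keys() - b.keys())
        updated = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
        return created, updated, deleted


    before = {
        "app.bsky.actor.profile/self": "cid-c",
        "app.bsky.feed.post/1": "cid-a",
        "app.bsky.feed.post/2": "cid-b",
    }
    after = {
        "app.bsky.feed.like/9": "cid-d",
        "app.bsky.feed.post/1": "cid-a",  # unchanged, so not reported
        "app.bsky.feed.post/2": "cid-x",  # same key, different value
    }

    created, updated, deleted = toy_record_diff(before, after)
    assert created == ["app.bsky.feed.like/9"]
    assert updated == ["app.bsky.feed.post/2"]
    assert deleted == ["app.bsky.actor.profile/self"]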