docs: standardize docstrings with Google-style formatting for quartodoc

.chainlink/issues.db

This is a binary file and will not be displayed.

+6

CHANGELOG.md

···
  - **Comprehensive integration test suite**: 593 tests covering E2E flows, error handling, edge cases

  ### Changed
+ - Make docs script work from any directory (#371)
+ - Add uv script shortcut 'docs' for documentation build (#370)
+ - Update docstrings in local.py (#367)
+ - Update docstrings in _protocols.py (#366)
+ - Update docstrings in lens.py (#365)
+ - Update docstrings in dataset.py (#364)
  - Review and address human-review.md feedback (#344)
  - Fix load_dataset overloads and AbstractIndex compatibility (#348)
  - Connect load_dataset to index data_store for S3 credentials (#361)

+91

CLAUDE.md

···
  ...
  ```

+ ## Docstring Formatting
+
+ This project uses **Google-style docstrings** with quartodoc for API documentation generation. The most important formatting requirement is for **Example sections**.
+
+ ### Example Section Format
+
+ Example sections must use reStructuredText literal block syntax (`::`) to render correctly in quartodoc-generated documentation:
+
+ ```python
+ def my_function():
+     """Short description.
+
+     Longer description if needed.
+
+     Args:
+         param: Description of parameter.
+
+     Returns:
+         Description of return value.
+
+     Example:
+         ::
+
+             >>> result = my_function()
+             >>> print(result)
+             'output'
+     """
+ ```
+
+ **Key formatting rules:**
+
+ 1. `Example:` with a colon, 4-space indented from the docstring margin
+ 2. `::` on its own line, 8-space indented (4 more than `Example:`)
+ 3. Blank line after `::`
+ 4. Code examples indented 12 spaces (4 more than `::`)
+ 5. Use `>>>` for Python prompts and `...` for continuation lines
+
+ **Incorrect format (will not render properly):**
+ ```python
+     Example:
+         >>> code_here()  # Wrong - missing :: and extra indentation
+ ```
+
+ **Correct format:**
+ ```python
+     Example:
+         ::
+
+             >>> code_here()  # Correct - has :: and proper indentation
+ ```
+
+ ### Multiple Examples
+
+ For multiple examples, use the same pattern:
+
+ ```python
+     Example:
+         ::
+
+             >>> # First example
+             >>> x = create_thing()
+
+             >>> # Second example
+             >>> y = other_thing()
+ ```
+
+ ### Class and Method Docstrings
+
+ Apply the same format to class docstrings and method docstrings:
+
+ ```python
+ class MyClass:
+     """Class description.
+
+     Example:
+         ::
+
+             >>> obj = MyClass()
+             >>> obj.do_something()
+     """
+
+     def method(self):
+         """Method description.
+
+         Example:
+             ::
+
+                 >>> self.method()
+         """
+ ```
+
  ## Issue Tracking

  This project uses **chainlink** for issue tracking. Chainlink commands do NOT need to be prefixed with `uv run`:
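The key formatting rules in the CLAUDE.md hunk above lend themselves to a mechanical check. The following is a minimal sketch of one, not part of this commit or of quartodoc: `check_example_section`, `GOOD`, and `BAD` are hypothetical names, and the sketch assumes indentation is measured relative to the docstring margin (so after `inspect.cleandoc`, `Example:` sits at column 0, `::` at 4, and code at 8).

```python
import inspect


def check_example_section(docstring: str) -> list[str]:
    """Return layout problems found in the Example section (empty if none).

    Hypothetical helper: it checks only the '::' literal-block layout.
    """
    # cleandoc removes the common margin, so offsets below are relative to it.
    lines = inspect.cleandoc(docstring).splitlines()
    if "Example:" not in lines:
        return ["no 'Example:' header at the docstring margin"]

    body = lines[lines.index("Example:") + 1:]
    if not body or body[0] != "    ::":
        return ["'::' must follow 'Example:' on its own line, indented 4 more spaces"]

    problems = []
    if len(body) < 2 or body[1].strip():
        problems.append("a blank line must follow '::'")
    for line in body[2:]:
        if not line.strip():
            continue  # blank lines may separate multiple examples
        if not line.startswith(" "):
            break  # a new section header at the margin ends the example
        if not line.startswith(" " * 8):
            problems.append(f"code line needs 8 spaces past the margin: {line.strip()!r}")
    return problems


GOOD = '''Short description.

    Example:
        ::

            >>> result = my_function()
    '''

BAD = '''Short description.

    Example:
        >>> code_here()
    '''

print(check_example_section(GOOD))
print(check_example_section(BAD))
```

A check along these lines could run in CI before the docs build, though whether that is worth the maintenance cost is a judgment call for the project.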

+58

dev/build_docs.py

···
+ #!/usr/bin/env python3
+ """Build documentation using quartodoc and quarto.
+
+ This script can be run from any directory within the project.
+ It finds the project root by locating pyproject.toml.
+ """
+
+ import subprocess
+ import sys
+ from pathlib import Path
+
+
+ def find_project_root() -> Path:
+     """Find project root by searching for pyproject.toml."""
+     current = Path(__file__).resolve().parent
+     while current != current.parent:
+         if (current / "pyproject.toml").exists():
+             return current
+         current = current.parent
+     raise RuntimeError("Could not find project root (no pyproject.toml found)")
+
+
+ def main() -> int:
+     """Build documentation."""
+     project_root = find_project_root()
+     docs_src = project_root / "docs_src"
+     docs_out = project_root / "docs"
+
+     if not docs_src.exists():
+         print(f"Error: docs_src directory not found at {docs_src}", file=sys.stderr)
+         return 1
+
+     print(f"Building docs from {docs_src}")
+
+     # Run quartodoc build
+     result = subprocess.run(
+         ["quartodoc", "build"],
+         cwd=docs_src,
+     )
+     if result.returncode != 0:
+         print("Error: quartodoc build failed", file=sys.stderr)
+         return result.returncode
+
+     # Run quarto render
+     result = subprocess.run(
+         ["quarto", "render", "--output-dir", str(docs_out)],
+         cwd=docs_src,
+     )
+     if result.returncode != 0:
+         print("Error: quarto render failed", file=sys.stderr)
+         return result.returncode
+
+     print(f"Documentation built successfully in {docs_out}")
+     return 0
+
+
+ if __name__ == "__main__":
+     sys.exit(main())
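To illustrate the marker-file search that `find_project_root` performs, here is a sketch of a variant that takes the starting directory as an argument so it can be exercised against a temporary tree. The name `find_root_from` and its `start` and `marker` parameters are assumptions for illustration only; the committed script always starts from its own location.

```python
import tempfile
from pathlib import Path


def find_root_from(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upward from `start` until a directory containing `marker` is found."""
    current = start.resolve()
    # Check the starting directory first, then each ancestor up to the root.
    for candidate in (current, *current.parents):
        if (candidate / marker).exists():
            return candidate
    raise RuntimeError(f"Could not find project root (no {marker} found)")


# Build a throwaway project layout and search from a nested directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "pyproject.toml").touch()
    nested = root / "dev" / "tools"
    nested.mkdir(parents=True)
    found = find_root_from(nested)
    print(found == root)
```

One difference from the loop in the diff: this variant also inspects the filesystem root itself, whereas `while current != current.parent` stops before checking it. For a normal checkout that distinction should not matter.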

+12

dev/docs

···
+ #!/usr/bin/env bash
+ # Build documentation using quartodoc and quarto.
+ # Run from any directory in the project.
+
+ set -e
+
+ # Find script directory, then project root
+ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+ PROJECT_ROOT="$(dirname "$SCRIPT_DIR")"
+
+ cd "$PROJECT_ROOT"
+ uv run python dev/build_docs.py "$@"

+133 -9

docs/api/AbstractDataStore.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.AbstractDataStore" id="toc-atdata.AbstractDataStore" class="nav-link active" data-scroll-target="#atdata.AbstractDataStore">AbstractDataStore</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 402 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 402 403 <ul class="collapse"> 403 404 <li><a href="#atdata.AbstractDataStore.read_url" id="toc-atdata.AbstractDataStore.read_url" class="nav-link" data-scroll-target="#atdata.AbstractDataStore.read_url">read_url</a></li> ··· 421 422 <p>Protocol for data storage operations.</p> 422 423 <p>This protocol abstracts over different storage backends for dataset data: - S3DataStore: S3-compatible object storage - PDSBlobStore: ATProto PDS blob storage (future)</p> 423 424 <p>The separation of index (metadata) from data store (actual files) allows flexible deployment: local index with S3 storage, atmosphere index with S3 storage, or atmosphere index with PDS blobs.</p> 424 - <p>Example: >>> store = S3DataStore(credentials, bucket=“my-bucket”) >>> urls = store.write_shards(dataset, prefix=“training/v1”) >>> print(urls) [‘s3://my-bucket/training/v1/shard-000000.tar’, …]</p> 425 + <section id="example" class="level2 doc-section doc-section-example"> 426 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 427 + <p>::</p> 428 + <pre><code>>>> store = S3DataStore(credentials, bucket="my-bucket") 429 + >>> urls = store.write_shards(dataset, prefix="training/v1") 430 + >>> print(urls) 431 + ['s3://my-bucket/training/v1/shard-000000.tar', ...]</code></pre> 432 + </section> 425 433 <section id="methods" class="level2"> 426 434 <h2 class="anchored" data-anchor-id="methods">Methods</h2> 427 435 <table class="caption-top table"> ··· 448 456 </table> 449 457 <section id="atdata.AbstractDataStore.read_url" 
class="level3"> 450 458 <h3 class="anchored" data-anchor-id="atdata.AbstractDataStore.read_url">read_url</h3> 451 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>AbstractDataStore.read_url(url)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 459 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>AbstractDataStore.read_url(url)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 452 460 <p>Resolve a storage URL for reading.</p> 453 461 <p>Some storage backends may need to transform URLs (e.g., signing S3 URLs or resolving blob references). This method returns a URL that can be used directly with WebDataset.</p> 454 - <p>Args: url: Storage URL to resolve.</p> 455 - <p>Returns: WebDataset-compatible URL for reading.</p> 462 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 463 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 464 + <table class="caption-top table"> 465 + <thead> 466 + <tr class="header"> 467 + <th>Name</th> 468 + <th>Type</th> 469 + <th>Description</th> 470 + <th>Default</th> 471 + </tr> 472 + </thead> 473 + <tbody> 474 + <tr class="odd"> 475 + <td>url</td> 476 + <td><a href="`str`">str</a></td> 477 + <td>Storage URL to resolve.</td> 478 + <td><em>required</em></td> 479 + </tr> 480 + </tbody> 481 + </table> 482 + </section> 483 + <section id="returns" class="level4 doc-section doc-section-returns"> 484 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 485 + <table class="caption-top table"> 486 + <thead> 487 + <tr class="header"> 488 + 
<th>Name</th> 489 + <th>Type</th> 490 + <th>Description</th> 491 + </tr> 492 + </thead> 493 + <tbody> 494 + <tr class="odd"> 495 + <td></td> 496 + <td><a href="`str`">str</a></td> 497 + <td>WebDataset-compatible URL for reading.</td> 498 + </tr> 499 + </tbody> 500 + </table> 501 + </section> 456 502 </section> 457 503 <section id="atdata.AbstractDataStore.supports_streaming" class="level3"> 458 504 <h3 class="anchored" data-anchor-id="atdata.AbstractDataStore.supports_streaming">supports_streaming</h3> 459 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>AbstractDataStore.supports_streaming()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 505 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>AbstractDataStore.supports_streaming()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 460 506 <p>Whether this store supports streaming reads.</p> 461 - <p>Returns: True if the store supports efficient streaming (like S3), False if data must be fully downloaded first.</p> 507 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 508 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 509 + <table class="caption-top table"> 510 + <thead> 511 + <tr class="header"> 512 + <th>Name</th> 513 + <th>Type</th> 514 + <th>Description</th> 515 + </tr> 516 + </thead> 517 + <tbody> 518 + <tr class="odd"> 519 + <td></td> 520 + <td><a href="`bool`">bool</a></td> 521 + <td>True if the store supports efficient streaming (like S3),</td> 522 + </tr> 523 + <tr class="even"> 524 + <td></td> 525 + <td><a href="`bool`">bool</a></td> 526 + <td>False 
if data must be fully downloaded first.</td> 527 + </tr> 528 + </tbody> 529 + </table> 530 + </section> 462 531 </section> 463 532 <section id="atdata.AbstractDataStore.write_shards" class="level3"> 464 533 <h3 class="anchored" data-anchor-id="atdata.AbstractDataStore.write_shards">write_shards</h3> 465 - <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>AbstractDataStore.write_shards(ds, <span class="op">*</span>, prefix, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 534 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>AbstractDataStore.write_shards(ds, <span class="op">*</span>, prefix, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 466 535 <p>Write dataset shards to storage.</p> 467 - <p>Args: ds: The Dataset to write. prefix: Path prefix for the shards (e.g., ‘datasets/mnist/v1’). 
**kwargs: Backend-specific options (e.g., maxcount for shard size).</p> 468 - <p>Returns: List of URLs for the written shards, suitable for use with WebDataset or atdata.Dataset().</p> 536 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 537 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 538 + <table class="caption-top table"> 539 + <thead> 540 + <tr class="header"> 541 + <th>Name</th> 542 + <th>Type</th> 543 + <th>Description</th> 544 + <th>Default</th> 545 + </tr> 546 + </thead> 547 + <tbody> 548 + <tr class="odd"> 549 + <td>ds</td> 550 + <td><a href="`atdata.dataset.Dataset`">Dataset</a></td> 551 + <td>The Dataset to write.</td> 552 + <td><em>required</em></td> 553 + </tr> 554 + <tr class="even"> 555 + <td>prefix</td> 556 + <td><a href="`str`">str</a></td> 557 + <td>Path prefix for the shards (e.g., ‘datasets/mnist/v1’).</td> 558 + <td><em>required</em></td> 559 + </tr> 560 + <tr class="odd"> 561 + <td>**kwargs</td> 562 + <td></td> 563 + <td>Backend-specific options (e.g., maxcount for shard size).</td> 564 + <td><code>{}</code></td> 565 + </tr> 566 + </tbody> 567 + </table> 568 + </section> 569 + <section id="returns-2" class="level4 doc-section doc-section-returns"> 570 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 571 + <table class="caption-top table"> 572 + <thead> 573 + <tr class="header"> 574 + <th>Name</th> 575 + <th>Type</th> 576 + <th>Description</th> 577 + </tr> 578 + </thead> 579 + <tbody> 580 + <tr class="odd"> 581 + <td></td> 582 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 583 + <td>List of URLs for the written shards, suitable for use with</td> 584 + </tr> 585 + <tr class="even"> 586 + <td></td> 587 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 588 + <td>WebDataset or atdata.Dataset().</td> 589 + </tr> 590 + </tbody> 591 + </table> 469 592 470 593 594 + </section> 471 595 
</section> 472 596 </section> 473 597 </section>

+393 -25

docs/api/AbstractIndex.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.AbstractIndex" id="toc-atdata.AbstractIndex" class="nav-link active" data-scroll-target="#atdata.AbstractIndex">AbstractIndex</a> 400 400 <ul class="collapse"> 401 + <li><a href="#optional-extensions" id="toc-optional-extensions" class="nav-link" data-scroll-target="#optional-extensions">Optional Extensions</a></li> 402 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 403 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 404 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 403 405 <ul class="collapse"> ··· 426 428 <p>Protocol for index operations - implemented by LocalIndex and AtmosphereIndex.</p> 427 429 <p>This protocol defines the common interface for managing dataset metadata: - Publishing and retrieving schemas - Inserting and listing datasets - (Future) Publishing and retrieving lenses</p> 428 430 <p>A single index can hold datasets of many different sample types. The sample type is tracked via schema references, not as a generic parameter on the index.</p> 429 - <p>Optional Extensions: Some index implementations support additional features: - <code>data_store</code>: An AbstractDataStore for reading/writing dataset shards. 
If present, <code>load_dataset</code> will use it for S3 credential resolution.</p> 430 - <p>Example: >>> def publish_and_list(index: AbstractIndex) -> None: … # Publish schemas for different types … schema1 = index.publish_schema(ImageSample, version=“1.0.0”) … schema2 = index.publish_schema(TextSample, version=“1.0.0”) … … # Insert datasets of different types … index.insert_dataset(image_ds, name=“images”) … index.insert_dataset(text_ds, name=“texts”) … … # List all datasets (mixed types) … for entry in index.list_datasets(): … print(f”{entry.name} -> {entry.schema_ref}“)</p> 431 + <section id="optional-extensions" class="level2 doc-section doc-section-optional-extensions"> 432 + <h2 class="doc-section doc-section-optional-extensions anchored" data-anchor-id="optional-extensions">Optional Extensions</h2> 433 + <p>Some index implementations support additional features: - <code>data_store</code>: An AbstractDataStore for reading/writing dataset shards. If present, <code>load_dataset</code> will use it for S3 credential resolution.</p> 434 + </section> 435 + <section id="example" class="level2 doc-section doc-section-example"> 436 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 437 + <p>::</p> 438 + <pre><code>>>> def publish_and_list(index: AbstractIndex) -> None: 439 + ... # Publish schemas for different types 440 + ... schema1 = index.publish_schema(ImageSample, version="1.0.0") 441 + ... schema2 = index.publish_schema(TextSample, version="1.0.0") 442 + ... 443 + ... # Insert datasets of different types 444 + ... index.insert_dataset(image_ds, name="images") 445 + ... index.insert_dataset(text_ds, name="texts") 446 + ... 447 + ... # List all datasets (mixed types) 448 + ... for entry in index.list_datasets(): 449 + ... 
print(f"{entry.name} -> {entry.schema_ref}")</code></pre> 450 + </section> 431 451 <section id="attributes" class="level2"> 432 452 <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 433 453 <table class="caption-top table"> ··· 491 511 </table> 492 512 <section id="atdata.AbstractIndex.decode_schema" class="level3"> 493 513 <h3 class="anchored" data-anchor-id="atdata.AbstractIndex.decode_schema">decode_schema</h3> 494 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.decode_schema(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 514 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.decode_schema(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 495 515 <p>Reconstruct a Python Packable type from a stored schema.</p> 496 516 <p>This method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a Packable class matching the schema definition.</p> 497 - <p>Args: ref: Schema reference string (local:// or at://).</p> 498 - <p>Returns: A dynamically generated Packable class with fields matching the schema definition. The class can be used with <code>Dataset[T]</code> to load and iterate over samples.</p> 499 - <p>Raises: KeyError: If schema not found. 
ValueError: If schema cannot be decoded (unsupported field types).</p> 500 - <p>Example: >>> entry = index.get_dataset(“my-dataset”) >>> SampleType = index.decode_schema(entry.schema_ref) >>> ds = Dataset<a href="entry.data_urls[0]">SampleType</a> >>> for sample in ds.ordered(): … print(sample) # sample is instance of SampleType</p> 517 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 518 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 519 + <table class="caption-top table"> 520 + <thead> 521 + <tr class="header"> 522 + <th>Name</th> 523 + <th>Type</th> 524 + <th>Description</th> 525 + <th>Default</th> 526 + </tr> 527 + </thead> 528 + <tbody> 529 + <tr class="odd"> 530 + <td>ref</td> 531 + <td><a href="`str`">str</a></td> 532 + <td>Schema reference string (local:// or at://).</td> 533 + <td><em>required</em></td> 534 + </tr> 535 + </tbody> 536 + </table> 537 + </section> 538 + <section id="returns" class="level4 doc-section doc-section-returns"> 539 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 540 + <table class="caption-top table"> 541 + <thead> 542 + <tr class="header"> 543 + <th>Name</th> 544 + <th>Type</th> 545 + <th>Description</th> 546 + </tr> 547 + </thead> 548 + <tbody> 549 + <tr class="odd"> 550 + <td></td> 551 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td> 552 + <td>A dynamically generated Packable class with fields matching</td> 553 + </tr> 554 + <tr class="even"> 555 + <td></td> 556 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td> 557 + <td>the schema definition. 
The class can be used with</td> 558 + </tr> 559 + <tr class="odd"> 560 + <td></td> 561 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td> 562 + <td><code>Dataset[T]</code> to load and iterate over samples.</td> 563 + </tr> 564 + </tbody> 565 + </table> 566 + </section> 567 + <section id="raises" class="level4 doc-section doc-section-raises"> 568 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 569 + <table class="caption-top table"> 570 + <thead> 571 + <tr class="header"> 572 + <th>Name</th> 573 + <th>Type</th> 574 + <th>Description</th> 575 + </tr> 576 + </thead> 577 + <tbody> 578 + <tr class="odd"> 579 + <td></td> 580 + <td><a href="`KeyError`">KeyError</a></td> 581 + <td>If schema not found.</td> 582 + </tr> 583 + <tr class="even"> 584 + <td></td> 585 + <td><a href="`ValueError`">ValueError</a></td> 586 + <td>If schema cannot be decoded (unsupported field types).</td> 587 + </tr> 588 + </tbody> 589 + </table> 590 + </section> 591 + <section id="example-1" class="level4 doc-section doc-section-example"> 592 + <h4 class="doc-section doc-section-example anchored" data-anchor-id="example-1">Example</h4> 593 + <p>::</p> 594 + <pre><code>>>> entry = index.get_dataset("my-dataset") 595 + >>> SampleType = index.decode_schema(entry.schema_ref) 596 + >>> ds = Dataset[SampleType](entry.data_urls[0]) 597 + >>> for sample in ds.ordered(): 598 + ... 
print(sample) # sample is instance of SampleType</code></pre> 599 + </section> 501 600 </section> 502 601 <section id="atdata.AbstractIndex.get_dataset" class="level3"> 503 602 <h3 class="anchored" data-anchor-id="atdata.AbstractIndex.get_dataset">get_dataset</h3> 504 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.get_dataset(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 603 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.get_dataset(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 505 604 <p>Get a dataset entry by name or reference.</p> 506 - <p>Args: ref: Dataset name, path, or full reference string.</p> 507 - <p>Returns: IndexEntry for the dataset.</p> 508 - <p>Raises: KeyError: If dataset not found.</p> 605 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 606 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 607 + <table class="caption-top table"> 608 + <thead> 609 + <tr class="header"> 610 + <th>Name</th> 611 + <th>Type</th> 612 + <th>Description</th> 613 + <th>Default</th> 614 + </tr> 615 + </thead> 616 + <tbody> 617 + <tr class="odd"> 618 + <td>ref</td> 619 + <td><a href="`str`">str</a></td> 620 + <td>Dataset name, path, or full reference string.</td> 621 + <td><em>required</em></td> 622 + </tr> 623 + </tbody> 624 + </table> 625 + </section> 626 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 627 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 628 + <table 
class="caption-top table"> 629 + <thead> 630 + <tr class="header"> 631 + <th>Name</th> 632 + <th>Type</th> 633 + <th>Description</th> 634 + </tr> 635 + </thead> 636 + <tbody> 637 + <tr class="odd"> 638 + <td></td> 639 + <td><a href="`atdata._protocols.IndexEntry`">IndexEntry</a></td> 640 + <td>IndexEntry for the dataset.</td> 641 + </tr> 642 + </tbody> 643 + </table> 644 + </section> 645 + <section id="raises-1" class="level4 doc-section doc-section-raises"> 646 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-1">Raises</h4> 647 + <table class="caption-top table"> 648 + <thead> 649 + <tr class="header"> 650 + <th>Name</th> 651 + <th>Type</th> 652 + <th>Description</th> 653 + </tr> 654 + </thead> 655 + <tbody> 656 + <tr class="odd"> 657 + <td></td> 658 + <td><a href="`KeyError`">KeyError</a></td> 659 + <td>If dataset not found.</td> 660 + </tr> 661 + </tbody> 662 + </table> 663 + </section> 509 664 </section> 510 665 <section id="atdata.AbstractIndex.get_schema" class="level3"> 511 666 <h3 class="anchored" data-anchor-id="atdata.AbstractIndex.get_schema">get_schema</h3> 512 - <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.get_schema(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 667 + <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.get_schema(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 513 668 <p>Get a schema record by reference.</p> 514 - <p>Args: ref: Schema reference string (local:// or at://).</p> 515 - <p>Returns: Schema record as a dictionary with fields like ‘name’, ‘version’, ‘fields’, etc.</p> 516 
- <p>Raises: KeyError: If schema not found.</p> 669 + <section id="parameters-2" class="level4 doc-section doc-section-parameters"> 670 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h4> 671 + <table class="caption-top table"> 672 + <thead> 673 + <tr class="header"> 674 + <th>Name</th> 675 + <th>Type</th> 676 + <th>Description</th> 677 + <th>Default</th> 678 + </tr> 679 + </thead> 680 + <tbody> 681 + <tr class="odd"> 682 + <td>ref</td> 683 + <td><a href="`str`">str</a></td> 684 + <td>Schema reference string (local:// or at://).</td> 685 + <td><em>required</em></td> 686 + </tr> 687 + </tbody> 688 + </table> 689 + </section> 690 + <section id="returns-2" class="level4 doc-section doc-section-returns"> 691 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 692 + <table class="caption-top table"> 693 + <thead> 694 + <tr class="header"> 695 + <th>Name</th> 696 + <th>Type</th> 697 + <th>Description</th> 698 + </tr> 699 + </thead> 700 + <tbody> 701 + <tr class="odd"> 702 + <td></td> 703 + <td><a href="`dict`">dict</a></td> 704 + <td>Schema record as a dictionary with fields like ‘name’, ‘version’,</td> 705 + </tr> 706 + <tr class="even"> 707 + <td></td> 708 + <td><a href="`dict`">dict</a></td> 709 + <td>‘fields’, etc.</td> 710 + </tr> 711 + </tbody> 712 + </table> 713 + </section> 714 + <section id="raises-2" class="level4 doc-section doc-section-raises"> 715 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-2">Raises</h4> 716 + <table class="caption-top table"> 717 + <thead> 718 + <tr class="header"> 719 + <th>Name</th> 720 + <th>Type</th> 721 + <th>Description</th> 722 + </tr> 723 + </thead> 724 + <tbody> 725 + <tr class="odd"> 726 + <td></td> 727 + <td><a href="`KeyError`">KeyError</a></td> 728 + <td>If schema not found.</td> 729 + </tr> 730 + </tbody> 731 + </table> 732 + </section> 517 733 </section> 518 734 <section 
id="atdata.AbstractIndex.insert_dataset" class="level3"> 519 735 <h3 class="anchored" data-anchor-id="atdata.AbstractIndex.insert_dataset">insert_dataset</h3> 520 - <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.insert_dataset(ds, <span class="op">*</span>, name, schema_ref<span class="op">=</span><span class="va">None</span>, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 736 + <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.insert_dataset(ds, <span class="op">*</span>, name, schema_ref<span class="op">=</span><span class="va">None</span>, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 521 737 <p>Insert a dataset into the index.</p> 522 738 <p>The sample type is inferred from <code>ds.sample_type</code>. If schema_ref is not provided, the schema may be auto-published based on the sample type.</p> 523 - <p>Args: ds: The Dataset to register in the index (any sample type). name: Human-readable name for the dataset. schema_ref: Optional explicit schema reference. If not provided, the schema may be auto-published or inferred from ds.sample_type. 
**kwargs: Additional backend-specific options.</p> 524 - <p>Returns: IndexEntry for the inserted dataset.</p> 739 + <section id="parameters-3" class="level4 doc-section doc-section-parameters"> 740 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-3">Parameters</h4> 741 + <table class="caption-top table"> 742 + <thead> 743 + <tr class="header"> 744 + <th>Name</th> 745 + <th>Type</th> 746 + <th>Description</th> 747 + <th>Default</th> 748 + </tr> 749 + </thead> 750 + <tbody> 751 + <tr class="odd"> 752 + <td>ds</td> 753 + <td><a href="`atdata.dataset.Dataset`">Dataset</a></td> 754 + <td>The Dataset to register in the index (any sample type).</td> 755 + <td><em>required</em></td> 756 + </tr> 757 + <tr class="even"> 758 + <td>name</td> 759 + <td><a href="`str`">str</a></td> 760 + <td>Human-readable name for the dataset.</td> 761 + <td><em>required</em></td> 762 + </tr> 763 + <tr class="odd"> 764 + <td>schema_ref</td> 765 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 766 + <td>Optional explicit schema reference. 
If not provided, the schema may be auto-published or inferred from ds.sample_type.</td> 767 + <td><code>None</code></td> 768 + </tr> 769 + <tr class="even"> 770 + <td>**kwargs</td> 771 + <td></td> 772 + <td>Additional backend-specific options.</td> 773 + <td><code>{}</code></td> 774 + </tr> 775 + </tbody> 776 + </table> 777 + </section> 778 + <section id="returns-3" class="level4 doc-section doc-section-returns"> 779 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-3">Returns</h4> 780 + <table class="caption-top table"> 781 + <thead> 782 + <tr class="header"> 783 + <th>Name</th> 784 + <th>Type</th> 785 + <th>Description</th> 786 + </tr> 787 + </thead> 788 + <tbody> 789 + <tr class="odd"> 790 + <td></td> 791 + <td><a href="`atdata._protocols.IndexEntry`">IndexEntry</a></td> 792 + <td>IndexEntry for the inserted dataset.</td> 793 + </tr> 794 + </tbody> 795 + </table> 796 + </section> 525 797 </section> 526 798 <section id="atdata.AbstractIndex.list_datasets" class="level3"> 527 799 <h3 class="anchored" data-anchor-id="atdata.AbstractIndex.list_datasets">list_datasets</h3> 528 - <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.list_datasets()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 800 + <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.list_datasets()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 529 801 <p>Get all dataset entries as a materialized list.</p> 530 - <p>Returns: List of IndexEntry for each dataset.</p> 802 + <section id="returns-4" class="level4 doc-section doc-section-returns"> 803 + <h4 
class="doc-section doc-section-returns anchored" data-anchor-id="returns-4">Returns</h4> 804 + <table class="caption-top table"> 805 + <thead> 806 + <tr class="header"> 807 + <th>Name</th> 808 + <th>Type</th> 809 + <th>Description</th> 810 + </tr> 811 + </thead> 812 + <tbody> 813 + <tr class="odd"> 814 + <td></td> 815 + <td><a href="`list`">list</a>[<a href="`atdata._protocols.IndexEntry`">IndexEntry</a>]</td> 816 + <td>List of IndexEntry for each dataset.</td> 817 + </tr> 818 + </tbody> 819 + </table> 820 + </section> 531 821 </section> 532 822 <section id="atdata.AbstractIndex.list_schemas" class="level3"> 533 823 <h3 class="anchored" data-anchor-id="atdata.AbstractIndex.list_schemas">list_schemas</h3> 534 - <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.list_schemas()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 824 + <div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.list_schemas()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 535 825 <p>Get all schema records as a materialized list.</p> 536 - <p>Returns: List of schema records as dictionaries.</p> 826 + <section id="returns-5" class="level4 doc-section doc-section-returns"> 827 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-5">Returns</h4> 828 + <table class="caption-top table"> 829 + <thead> 830 + <tr class="header"> 831 + <th>Name</th> 832 + <th>Type</th> 833 + <th>Description</th> 834 + </tr> 835 + </thead> 836 + <tbody> 837 + <tr class="odd"> 838 + <td></td> 839 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 840 + <td>List of schema 
records as dictionaries.</td> 841 + </tr> 842 + </tbody> 843 + </table> 844 + </section> 537 845 </section> 538 846 <section id="atdata.AbstractIndex.publish_schema" class="level3"> 539 847 <h3 class="anchored" data-anchor-id="atdata.AbstractIndex.publish_schema">publish_schema</h3> 540 - <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.publish_schema(sample_type, <span class="op">*</span>, version<span class="op">=</span><span class="st">'1.0.0'</span>, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 848 + <div class="sourceCode" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.publish_schema(sample_type, <span class="op">*</span>, version<span class="op">=</span><span class="st">'1.0.0'</span>, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 541 849 <p>Publish a schema for a sample type.</p> 542 - <p>Args: sample_type: A Packable type (PackableSample subclass or <span class="citation" data-cites="packable-decorated">@packable-decorated</span>). version: Semantic version string for the schema. 
**kwargs: Additional backend-specific options.</p> 543 - <p>Returns: Schema reference string: - Local: ‘local://schemas/{module.Class}<span class="citation" data-cites="version">@version</span>’ - Atmosphere: ‘at://did:plc:…/ac.foundation.dataset.sampleSchema/…’</p> 850 + <section id="parameters-4" class="level4 doc-section doc-section-parameters"> 851 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-4">Parameters</h4> 852 + <table class="caption-top table"> 853 + <thead> 854 + <tr class="header"> 855 + <th>Name</th> 856 + <th>Type</th> 857 + <th>Description</th> 858 + <th>Default</th> 859 + </tr> 860 + </thead> 861 + <tbody> 862 + <tr class="odd"> 863 + <td>sample_type</td> 864 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td> 865 + <td>A Packable type (PackableSample subclass or <span class="citation" data-cites="packable-decorated">@packable-decorated</span>).</td> 866 + <td><em>required</em></td> 867 + </tr> 868 + <tr class="even"> 869 + <td>version</td> 870 + <td><a href="`str`">str</a></td> 871 + <td>Semantic version string for the schema.</td> 872 + <td><code>'1.0.0'</code></td> 873 + </tr> 874 + <tr class="odd"> 875 + <td>**kwargs</td> 876 + <td></td> 877 + <td>Additional backend-specific options.</td> 878 + <td><code>{}</code></td> 879 + </tr> 880 + </tbody> 881 + </table> 882 + </section> 883 + <section id="returns-6" class="level4 doc-section doc-section-returns"> 884 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-6">Returns</h4> 885 + <table class="caption-top table"> 886 + <thead> 887 + <tr class="header"> 888 + <th>Name</th> 889 + <th>Type</th> 890 + <th>Description</th> 891 + </tr> 892 + </thead> 893 + <tbody> 894 + <tr class="odd"> 895 + <td></td> 896 + <td><a href="`str`">str</a></td> 897 + <td>Schema reference string:</td> 898 + </tr> 899 + <tr class="even"> 900 + <td></td> 901 + <td><a href="`str`">str</a></td> 902 + <td>- Local: 
‘local://schemas/{module.Class}<span class="citation" data-cites="version">@version</span>’</td> 903 + </tr> 904 + <tr class="odd"> 905 + <td></td> 906 + <td><a href="`str`">str</a></td> 907 + <td>- Atmosphere: ‘at://did:plc:…/ac.foundation.dataset.sampleSchema/…’</td> 908 + </tr> 909 + </tbody> 910 + </table> 544 911 545 912 913 + </section> 546 914 </section> 547 915 </section> 548 916 </section>

+72 -5

docs/api/AtUri.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.atmosphere.AtUri" id="toc-atdata.atmosphere.AtUri" class="nav-link active" data-scroll-target="#atdata.atmosphere.AtUri">AtUri</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 402 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 403 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 403 404 <ul class="collapse"> ··· 419 420 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtUri(authority, collection, rkey)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 420 421 <p>Parsed AT Protocol URI.</p> 421 422 <p>AT URIs follow the format: at://<authority>/<collection>/<rkey></rkey></collection></authority></p> 422 - <p>Example: >>> uri = AtUri.parse(“at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz”) >>> uri.authority ‘did:plc:abc123’ >>> uri.collection ‘ac.foundation.dataset.sampleSchema’ >>> uri.rkey ‘xyz’</p> 423 + <section id="example" class="level2 doc-section doc-section-example"> 424 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 425 + <p>::</p> 426 + <pre><code>>>> uri = AtUri.parse("at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz") 427 + >>> uri.authority 428 + 'did:plc:abc123' 429 + >>> uri.collection 430 + 'ac.foundation.dataset.sampleSchema' 431 + >>> uri.rkey 432 + 'xyz'</code></pre> 433 + </section> 423 434 <section id="attributes" class="level2"> 424 435 <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 425 436 <table class="caption-top table"> ··· 463 474 </table> 464 475 <section id="atdata.atmosphere.AtUri.parse" 
class="level3"> 465 476 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtUri.parse">parse</h3> 466 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtUri.parse(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 477 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtUri.parse(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 467 478 <p>Parse an AT URI string into components.</p> 468 - <p>Args: uri: AT URI string in format <code>at://<authority>/<collection>/<rkey></code></p> 469 - <p>Returns: Parsed AtUri instance.</p> 470 - <p>Raises: ValueError: If the URI format is invalid.</p> 479 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 480 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 481 + <table class="caption-top table"> 482 + <thead> 483 + <tr class="header"> 484 + <th>Name</th> 485 + <th>Type</th> 486 + <th>Description</th> 487 + <th>Default</th> 488 + </tr> 489 + </thead> 490 + <tbody> 491 + <tr class="odd"> 492 + <td>uri</td> 493 + <td><a href="`str`">str</a></td> 494 + <td>AT URI string in format <code>at://<authority>/<collection>/<rkey></code></td> 495 + <td><em>required</em></td> 496 + </tr> 497 + </tbody> 498 + </table> 499 + </section> 500 + <section id="returns" class="level4 doc-section doc-section-returns"> 501 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 502 + <table class="caption-top table"> 503 + <thead> 504 + <tr class="header"> 505 + <th>Name</th> 506 + <th>Type</th> 507 + 
<th>Description</th> 508 + </tr> 509 + </thead> 510 + <tbody> 511 + <tr class="odd"> 512 + <td></td> 513 + <td><a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 514 + <td>Parsed AtUri instance.</td> 515 + </tr> 516 + </tbody> 517 + </table> 518 + </section> 519 + <section id="raises" class="level4 doc-section doc-section-raises"> 520 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 521 + <table class="caption-top table"> 522 + <thead> 523 + <tr class="header"> 524 + <th>Name</th> 525 + <th>Type</th> 526 + <th>Description</th> 527 + </tr> 528 + </thead> 529 + <tbody> 530 + <tr class="odd"> 531 + <td></td> 532 + <td><a href="`ValueError`">ValueError</a></td> 533 + <td>If the URI format is invalid.</td> 534 + </tr> 535 + </tbody> 536 + </table> 471 537 472 538 539 + </section> 473 540 </section> 474 541 </section> 475 542 </section>
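
Note on the `AtUri.parse` entry above: the rendered docs describe the input format (`at://<authority>/<collection>/<rkey>`), the three attributes, and a `ValueError` on malformed input. A minimal sketch of that behavior follows, for readers reviewing the generated page. `AtUriSketch` is a hypothetical stand-in written for illustration; it is not the code in `atdata.atmosphere._types`, and the exact validation rules it applies (exactly three non-empty path segments) are an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtUriSketch:
    """Hypothetical stand-in for the documented AtUri (not the library's code).

    Attributes:
        authority: DID or handle that owns the record.
        collection: NSID of the record collection.
        rkey: Record key within the collection.
    """

    authority: str
    collection: str
    rkey: str

    @classmethod
    def parse(cls, uri: str) -> "AtUriSketch":
        """Split an AT URI string into its three components.

        Raises:
            ValueError: If the URI does not match at://<authority>/<collection>/<rkey>.
        """
        prefix = "at://"
        if not uri.startswith(prefix):
            raise ValueError(f"Not an AT URI: {uri!r}")
        # Assumption: exactly three non-empty, slash-separated segments.
        parts = uri[len(prefix):].split("/")
        if len(parts) != 3 or not all(parts):
            raise ValueError(
                f"Expected at://<authority>/<collection>/<rkey>, got {uri!r}"
            )
        return cls(*parts)


uri = AtUriSketch.parse("at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz")
print(uri.authority, uri.collection, uri.rkey)
```

The sample URI is the one used in the docstring's Example section, so the three printed fields can be compared against the values shown there.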

+865 -72

docs/api/AtmosphereClient.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.atmosphere.AtmosphereClient" id="toc-atdata.atmosphere.AtmosphereClient" class="nav-link active" data-scroll-target="#atdata.atmosphere.AtmosphereClient">AtmosphereClient</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 402 + <li><a href="#note" id="toc-note" class="nav-link" data-scroll-target="#note">Note</a></li> 401 403 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 404 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 403 405 <ul class="collapse"> ··· 432 434 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient(base_url<span class="op">=</span><span class="va">None</span>, <span class="op">*</span>, _client<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 433 435 <p>ATProto client wrapper for atdata operations.</p> 434 436 <p>This class wraps the atproto SDK client and provides higher-level methods for working with atdata records (schemas, datasets, lenses).</p> 435 - <p>Example: >>> client = AtmosphereClient() >>> client.login(“alice.bsky.social”, “app-password”) >>> print(client.did) ‘did:plc:…’</p> 436 - <p>Note: The password should be an app-specific password, not your main account password. 
Create app passwords in your Bluesky account settings.</p> 437 + <section id="example" class="level2 doc-section doc-section-example"> 438 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 439 + <p>::</p> 440 + <pre><code>>>> client = AtmosphereClient() 441 + >>> client.login("alice.bsky.social", "app-password") 442 + >>> print(client.did) 443 + 'did:plc:...'</code></pre> 444 + </section> 445 + <section id="note" class="level2 doc-section doc-section-note"> 446 + <h2 class="doc-section doc-section-note anchored" data-anchor-id="note">Note</h2> 447 + <p>The password should be an app-specific password, not your main account password. Create app passwords in your Bluesky account settings.</p> 448 + </section> 437 449 <section id="attributes" class="level2"> 438 450 <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 439 451 <table class="caption-top table"> ··· 529 541 </table> 530 542 <section id="atdata.atmosphere.AtmosphereClient.create_record" class="level3"> 531 543 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.create_record">create_record</h3> 532 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.create_record(</span> 533 - <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> collection,</span> 534 - <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> record,</span> 535 - <span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 536 - <span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 537 - <span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> validate<span class="op">=</span><span class="va">False</span>,</span> 538 - <span 
id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 544 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.create_record(</span> 545 + <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> collection,</span> 546 + <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> record,</span> 547 + <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 548 + <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 549 + <span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> validate<span class="op">=</span><span class="va">False</span>,</span> 550 + <span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 539 551 <p>Create a record in the user’s repository.</p> 540 - <p>Args: collection: The NSID of the record collection (e.g., ‘ac.foundation.dataset.sampleSchema’). record: The record data. Must include a ‘$type’ field. rkey: Optional explicit record key. If not provided, a TID is generated. validate: Whether to validate against the Lexicon schema. Set to False for custom lexicons that the PDS doesn’t know about.</p> 541 - <p>Returns: The AT URI of the created record.</p> 542 - <p>Raises: ValueError: If not authenticated. 
atproto.exceptions.AtProtocolError: If record creation fails.</p> 552 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 553 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 554 + <table class="caption-top table"> 555 + <thead> 556 + <tr class="header"> 557 + <th>Name</th> 558 + <th>Type</th> 559 + <th>Description</th> 560 + <th>Default</th> 561 + </tr> 562 + </thead> 563 + <tbody> 564 + <tr class="odd"> 565 + <td>collection</td> 566 + <td><a href="`str`">str</a></td> 567 + <td>The NSID of the record collection (e.g., ‘ac.foundation.dataset.sampleSchema’).</td> 568 + <td><em>required</em></td> 569 + </tr> 570 + <tr class="even"> 571 + <td>record</td> 572 + <td><a href="`dict`">dict</a></td> 573 + <td>The record data. Must include a ‘$type’ field.</td> 574 + <td><em>required</em></td> 575 + </tr> 576 + <tr class="odd"> 577 + <td>rkey</td> 578 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 579 + <td>Optional explicit record key. If not provided, a TID is generated.</td> 580 + <td><code>None</code></td> 581 + </tr> 582 + <tr class="even"> 583 + <td>validate</td> 584 + <td><a href="`bool`">bool</a></td> 585 + <td>Whether to validate against the Lexicon schema. 
Set to False for custom lexicons that the PDS doesn’t know about.</td> 586 + <td><code>False</code></td> 587 + </tr> 588 + </tbody> 589 + </table> 590 + </section> 591 + <section id="returns" class="level4 doc-section doc-section-returns"> 592 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 593 + <table class="caption-top table"> 594 + <thead> 595 + <tr class="header"> 596 + <th>Name</th> 597 + <th>Type</th> 598 + <th>Description</th> 599 + </tr> 600 + </thead> 601 + <tbody> 602 + <tr class="odd"> 603 + <td></td> 604 + <td><a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 605 + <td>The AT URI of the created record.</td> 606 + </tr> 607 + </tbody> 608 + </table> 609 + </section> 610 + <section id="raises" class="level4 doc-section doc-section-raises"> 611 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 612 + <table class="caption-top table"> 613 + <thead> 614 + <tr class="header"> 615 + <th>Name</th> 616 + <th>Type</th> 617 + <th>Description</th> 618 + </tr> 619 + </thead> 620 + <tbody> 621 + <tr class="odd"> 622 + <td></td> 623 + <td><a href="`ValueError`">ValueError</a></td> 624 + <td>If not authenticated.</td> 625 + </tr> 626 + <tr class="even"> 627 + <td></td> 628 + <td><a href="`atproto`">atproto</a>.<a href="`atproto.exceptions`">exceptions</a>.<a href="`atproto.exceptions.AtProtocolError`">AtProtocolError</a></td> 629 + <td>If record creation fails.</td> 630 + </tr> 631 + </tbody> 632 + </table> 633 + </section> 543 634 </section> 544 635 <section id="atdata.atmosphere.AtmosphereClient.delete_record" class="level3"> 545 636 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.delete_record">delete_record</h3> 546 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.delete_record(uri, <span 
class="op">*</span>, swap_commit<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 637 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.delete_record(uri, <span class="op">*</span>, swap_commit<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 547 638 <p>Delete a record.</p> 548 - <p>Args: uri: The AT URI of the record to delete. swap_commit: Optional CID for compare-and-swap delete.</p> 549 - <p>Raises: ValueError: If not authenticated. atproto.exceptions.AtProtocolError: If deletion fails.</p> 639 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 640 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 641 + <table class="caption-top table"> 642 + <thead> 643 + <tr class="header"> 644 + <th>Name</th> 645 + <th>Type</th> 646 + <th>Description</th> 647 + <th>Default</th> 648 + </tr> 649 + </thead> 650 + <tbody> 651 + <tr class="odd"> 652 + <td>uri</td> 653 + <td><a href="`str`">str</a> | <a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 654 + <td>The AT URI of the record to delete.</td> 655 + <td><em>required</em></td> 656 + </tr> 657 + <tr class="even"> 658 + <td>swap_commit</td> 659 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 660 + <td>Optional CID for compare-and-swap delete.</td> 661 + <td><code>None</code></td> 662 + </tr> 663 + </tbody> 664 + </table> 665 + </section> 666 + <section id="raises-1" class="level4 doc-section doc-section-raises"> 667 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-1">Raises</h4> 668 + <table 
class="caption-top table"> 669 + <thead> 670 + <tr class="header"> 671 + <th>Name</th> 672 + <th>Type</th> 673 + <th>Description</th> 674 + </tr> 675 + </thead> 676 + <tbody> 677 + <tr class="odd"> 678 + <td></td> 679 + <td><a href="`ValueError`">ValueError</a></td> 680 + <td>If not authenticated.</td> 681 + </tr> 682 + <tr class="even"> 683 + <td></td> 684 + <td><a href="`atproto`">atproto</a>.<a href="`atproto.exceptions`">exceptions</a>.<a href="`atproto.exceptions.AtProtocolError`">AtProtocolError</a></td> 685 + <td>If deletion fails.</td> 686 + </tr> 687 + </tbody> 688 + </table> 689 + </section> 550 690 </section> 551 691 <section id="atdata.atmosphere.AtmosphereClient.export_session" class="level3"> 552 692 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.export_session">export_session</h3> 553 - <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.export_session()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 693 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.export_session()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 554 694 <p>Export the current session for later reuse.</p> 555 - <p>Returns: Session string that can be passed to <code>login_with_session()</code>.</p> 556 - <p>Raises: ValueError: If not authenticated.</p> 695 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 696 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 697 + <table class="caption-top table"> 698 + <thead> 699 + <tr class="header"> 700 + 
<th>Name</th> 701 + <th>Type</th> 702 + <th>Description</th> 703 + </tr> 704 + </thead> 705 + <tbody> 706 + <tr class="odd"> 707 + <td></td> 708 + <td><a href="`str`">str</a></td> 709 + <td>Session string that can be passed to <code>login_with_session()</code>.</td> 710 + </tr> 711 + </tbody> 712 + </table> 713 + </section> 714 + <section id="raises-2" class="level4 doc-section doc-section-raises"> 715 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-2">Raises</h4> 716 + <table class="caption-top table"> 717 + <thead> 718 + <tr class="header"> 719 + <th>Name</th> 720 + <th>Type</th> 721 + <th>Description</th> 722 + </tr> 723 + </thead> 724 + <tbody> 725 + <tr class="odd"> 726 + <td></td> 727 + <td><a href="`ValueError`">ValueError</a></td> 728 + <td>If not authenticated.</td> 729 + </tr> 730 + </tbody> 731 + </table> 732 + </section> 557 733 </section> 558 734 <section id="atdata.atmosphere.AtmosphereClient.get_blob" class="level3"> 559 735 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.get_blob">get_blob</h3> 560 - <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.get_blob(did, cid)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 736 + <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.get_blob(did, cid)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 561 737 <p>Download a blob from a PDS.</p> 562 738 <p>This resolves the PDS endpoint from the DID document and fetches the blob directly from the PDS.</p> 563 - <p>Args: did: The DID of the repository containing the blob. 
cid: The CID of the blob.</p> 564 - <p>Returns: The blob data as bytes.</p> 565 - <p>Raises: ValueError: If PDS endpoint cannot be resolved. requests.HTTPError: If blob fetch fails.</p> 739 + <section id="parameters-2" class="level4 doc-section doc-section-parameters"> 740 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h4> 741 + <table class="caption-top table"> 742 + <thead> 743 + <tr class="header"> 744 + <th>Name</th> 745 + <th>Type</th> 746 + <th>Description</th> 747 + <th>Default</th> 748 + </tr> 749 + </thead> 750 + <tbody> 751 + <tr class="odd"> 752 + <td>did</td> 753 + <td><a href="`str`">str</a></td> 754 + <td>The DID of the repository containing the blob.</td> 755 + <td><em>required</em></td> 756 + </tr> 757 + <tr class="even"> 758 + <td>cid</td> 759 + <td><a href="`str`">str</a></td> 760 + <td>The CID of the blob.</td> 761 + <td><em>required</em></td> 762 + </tr> 763 + </tbody> 764 + </table> 765 + </section> 766 + <section id="returns-2" class="level4 doc-section doc-section-returns"> 767 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 768 + <table class="caption-top table"> 769 + <thead> 770 + <tr class="header"> 771 + <th>Name</th> 772 + <th>Type</th> 773 + <th>Description</th> 774 + </tr> 775 + </thead> 776 + <tbody> 777 + <tr class="odd"> 778 + <td></td> 779 + <td><a href="`bytes`">bytes</a></td> 780 + <td>The blob data as bytes.</td> 781 + </tr> 782 + </tbody> 783 + </table> 784 + </section> 785 + <section id="raises-3" class="level4 doc-section doc-section-raises"> 786 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-3">Raises</h4> 787 + <table class="caption-top table"> 788 + <thead> 789 + <tr class="header"> 790 + <th>Name</th> 791 + <th>Type</th> 792 + <th>Description</th> 793 + </tr> 794 + </thead> 795 + <tbody> 796 + <tr class="odd"> 797 + <td></td> 798 + <td><a href="`ValueError`">ValueError</a></td> 799 + <td>If 
PDS endpoint cannot be resolved.</td> 800 + </tr> 801 + <tr class="even"> 802 + <td></td> 803 + <td><a href="`requests`">requests</a>.<a href="`requests.HTTPError`">HTTPError</a></td> 804 + <td>If blob fetch fails.</td> 805 + </tr> 806 + </tbody> 807 + </table> 808 + </section> 566 809 </section> 567 810 <section id="atdata.atmosphere.AtmosphereClient.get_blob_url" class="level3"> 568 811 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.get_blob_url">get_blob_url</h3> 569 - <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.get_blob_url(did, cid)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 812 + <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.get_blob_url(did, cid)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 570 813 <p>Get the direct URL for fetching a blob.</p> 571 814 <p>This is useful for passing to WebDataset or other HTTP clients.</p> 572 - <p>Args: did: The DID of the repository containing the blob. 
cid: The CID of the blob.</p> 573 - <p>Returns: The full URL for fetching the blob.</p> 574 - <p>Raises: ValueError: If PDS endpoint cannot be resolved.</p> 815 + <section id="parameters-3" class="level4 doc-section doc-section-parameters"> 816 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-3">Parameters</h4> 817 + <table class="caption-top table"> 818 + <thead> 819 + <tr class="header"> 820 + <th>Name</th> 821 + <th>Type</th> 822 + <th>Description</th> 823 + <th>Default</th> 824 + </tr> 825 + </thead> 826 + <tbody> 827 + <tr class="odd"> 828 + <td>did</td> 829 + <td><a href="`str`">str</a></td> 830 + <td>The DID of the repository containing the blob.</td> 831 + <td><em>required</em></td> 832 + </tr> 833 + <tr class="even"> 834 + <td>cid</td> 835 + <td><a href="`str`">str</a></td> 836 + <td>The CID of the blob.</td> 837 + <td><em>required</em></td> 838 + </tr> 839 + </tbody> 840 + </table> 841 + </section> 842 + <section id="returns-3" class="level4 doc-section doc-section-returns"> 843 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-3">Returns</h4> 844 + <table class="caption-top table"> 845 + <thead> 846 + <tr class="header"> 847 + <th>Name</th> 848 + <th>Type</th> 849 + <th>Description</th> 850 + </tr> 851 + </thead> 852 + <tbody> 853 + <tr class="odd"> 854 + <td></td> 855 + <td><a href="`str`">str</a></td> 856 + <td>The full URL for fetching the blob.</td> 857 + </tr> 858 + </tbody> 859 + </table> 860 + </section> 861 + <section id="raises-4" class="level4 doc-section doc-section-raises"> 862 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-4">Raises</h4> 863 + <table class="caption-top table"> 864 + <thead> 865 + <tr class="header"> 866 + <th>Name</th> 867 + <th>Type</th> 868 + <th>Description</th> 869 + </tr> 870 + </thead> 871 + <tbody> 872 + <tr class="odd"> 873 + <td></td> 874 + <td><a href="`ValueError`">ValueError</a></td> 875 + <td>If PDS endpoint cannot be 
resolved.</td> 876 + </tr> 877 + </tbody> 878 + </table> 879 + </section> 575 880 </section> 576 881 <section id="atdata.atmosphere.AtmosphereClient.get_record" class="level3"> 577 882 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.get_record">get_record</h3> 578 - <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.get_record(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 883 + <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.get_record(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 579 884 <p>Fetch a record by AT URI.</p> 580 - <p>Args: uri: The AT URI of the record.</p> 581 - <p>Returns: The record data as a dictionary.</p> 582 - <p>Raises: atproto.exceptions.AtProtocolError: If record not found.</p> 885 + <section id="parameters-4" class="level4 doc-section doc-section-parameters"> 886 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-4">Parameters</h4> 887 + <table class="caption-top table"> 888 + <thead> 889 + <tr class="header"> 890 + <th>Name</th> 891 + <th>Type</th> 892 + <th>Description</th> 893 + <th>Default</th> 894 + </tr> 895 + </thead> 896 + <tbody> 897 + <tr class="odd"> 898 + <td>uri</td> 899 + <td><a href="`str`">str</a> | <a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 900 + <td>The AT URI of the record.</td> 901 + <td><em>required</em></td> 902 + </tr> 903 + </tbody> 904 + </table> 905 + </section> 906 + <section id="returns-4" class="level4 doc-section doc-section-returns"> 907 + <h4 class="doc-section doc-section-returns 
anchored" data-anchor-id="returns-4">Returns</h4> 908 + <table class="caption-top table"> 909 + <thead> 910 + <tr class="header"> 911 + <th>Name</th> 912 + <th>Type</th> 913 + <th>Description</th> 914 + </tr> 915 + </thead> 916 + <tbody> 917 + <tr class="odd"> 918 + <td></td> 919 + <td><a href="`dict`">dict</a></td> 920 + <td>The record data as a dictionary.</td> 921 + </tr> 922 + </tbody> 923 + </table> 924 + </section> 925 + <section id="raises-5" class="level4 doc-section doc-section-raises"> 926 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-5">Raises</h4> 927 + <table class="caption-top table"> 928 + <thead> 929 + <tr class="header"> 930 + <th>Name</th> 931 + <th>Type</th> 932 + <th>Description</th> 933 + </tr> 934 + </thead> 935 + <tbody> 936 + <tr class="odd"> 937 + <td></td> 938 + <td><a href="`atproto`">atproto</a>.<a href="`atproto.exceptions`">exceptions</a>.<a href="`atproto.exceptions.AtProtocolError`">AtProtocolError</a></td> 939 + <td>If record not found.</td> 940 + </tr> 941 + </tbody> 942 + </table> 943 + </section> 583 944 </section> 584 945 <section id="atdata.atmosphere.AtmosphereClient.list_datasets" class="level3"> 585 946 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.list_datasets">list_datasets</h3> 586 - <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.list_datasets(repo<span class="op">=</span><span class="va">None</span>, limit<span class="op">=</span><span class="dv">100</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 947 + <div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.list_datasets(repo<span 
class="op">=</span><span class="va">None</span>, limit<span class="op">=</span><span class="dv">100</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 587 948 <p>List dataset records.</p> 588 - <p>Args: repo: The DID to query. Defaults to authenticated user. limit: Maximum number to return.</p> 589 - <p>Returns: List of dataset records.</p> 949 + <section id="parameters-5" class="level4 doc-section doc-section-parameters"> 950 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-5">Parameters</h4> 951 + <table class="caption-top table"> 952 + <thead> 953 + <tr class="header"> 954 + <th>Name</th> 955 + <th>Type</th> 956 + <th>Description</th> 957 + <th>Default</th> 958 + </tr> 959 + </thead> 960 + <tbody> 961 + <tr class="odd"> 962 + <td>repo</td> 963 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 964 + <td>The DID to query. Defaults to authenticated user.</td> 965 + <td><code>None</code></td> 966 + </tr> 967 + <tr class="even"> 968 + <td>limit</td> 969 + <td><a href="`int`">int</a></td> 970 + <td>Maximum number to return.</td> 971 + <td><code>100</code></td> 972 + </tr> 973 + </tbody> 974 + </table> 975 + </section> 976 + <section id="returns-5" class="level4 doc-section doc-section-returns"> 977 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-5">Returns</h4> 978 + <table class="caption-top table"> 979 + <thead> 980 + <tr class="header"> 981 + <th>Name</th> 982 + <th>Type</th> 983 + <th>Description</th> 984 + </tr> 985 + </thead> 986 + <tbody> 987 + <tr class="odd"> 988 + <td></td> 989 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 990 + <td>List of dataset records.</td> 991 + </tr> 992 + </tbody> 993 + </table> 994 + </section> 590 995 </section> 591 996 <section id="atdata.atmosphere.AtmosphereClient.list_lenses" class="level3"> 592 997 <h3 class="anchored" 
data-anchor-id="atdata.atmosphere.AtmosphereClient.list_lenses">list_lenses</h3> 593 - <div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.list_lenses(repo<span class="op">=</span><span class="va">None</span>, limit<span class="op">=</span><span class="dv">100</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 998 + <div class="sourceCode" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.list_lenses(repo<span class="op">=</span><span class="va">None</span>, limit<span class="op">=</span><span class="dv">100</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 594 999 <p>List lens records.</p> 595 - <p>Args: repo: The DID to query. Defaults to authenticated user. limit: Maximum number to return.</p> 596 - <p>Returns: List of lens records.</p> 1000 + <section id="parameters-6" class="level4 doc-section doc-section-parameters"> 1001 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-6">Parameters</h4> 1002 + <table class="caption-top table"> 1003 + <thead> 1004 + <tr class="header"> 1005 + <th>Name</th> 1006 + <th>Type</th> 1007 + <th>Description</th> 1008 + <th>Default</th> 1009 + </tr> 1010 + </thead> 1011 + <tbody> 1012 + <tr class="odd"> 1013 + <td>repo</td> 1014 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 1015 + <td>The DID to query. 
Defaults to authenticated user.</td> 1016 + <td><code>None</code></td> 1017 + </tr> 1018 + <tr class="even"> 1019 + <td>limit</td> 1020 + <td><a href="`int`">int</a></td> 1021 + <td>Maximum number to return.</td> 1022 + <td><code>100</code></td> 1023 + </tr> 1024 + </tbody> 1025 + </table> 1026 + </section> 1027 + <section id="returns-6" class="level4 doc-section doc-section-returns"> 1028 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-6">Returns</h4> 1029 + <table class="caption-top table"> 1030 + <thead> 1031 + <tr class="header"> 1032 + <th>Name</th> 1033 + <th>Type</th> 1034 + <th>Description</th> 1035 + </tr> 1036 + </thead> 1037 + <tbody> 1038 + <tr class="odd"> 1039 + <td></td> 1040 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 1041 + <td>List of lens records.</td> 1042 + </tr> 1043 + </tbody> 1044 + </table> 1045 + </section> 597 1046 </section> 598 1047 <section id="atdata.atmosphere.AtmosphereClient.list_records" class="level3"> 599 1048 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.list_records">list_records</h3> 600 - <div class="sourceCode" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.list_records(</span> 601 - <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a> collection,</span> 602 - <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 603 - <span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> repo<span class="op">=</span><span class="va">None</span>,</span> 604 - <span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> limit<span class="op">=</span><span class="dv">100</span>,</span> 605 - <span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> cursor<span class="op">=</span><span 
class="va">None</span>,</span> 606 - <span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1049 + <div class="sourceCode" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.list_records(</span> 1050 + <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> collection,</span> 1051 + <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 1052 + <span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> repo<span class="op">=</span><span class="va">None</span>,</span> 1053 + <span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> limit<span class="op">=</span><span class="dv">100</span>,</span> 1054 + <span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a> cursor<span class="op">=</span><span class="va">None</span>,</span> 1055 + <span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 607 1056 <p>List records in a collection.</p> 608 - <p>Args: collection: The NSID of the record collection. repo: The DID of the repository to query. Defaults to the authenticated user’s repository. limit: Maximum number of records to return (default 100). cursor: Pagination cursor from a previous call.</p> 609 - <p>Returns: A tuple of (records, next_cursor). 
The cursor is None if there are no more records.</p> 610 - <p>Raises: ValueError: If repo is None and not authenticated.</p> 1057 + <section id="parameters-7" class="level4 doc-section doc-section-parameters"> 1058 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-7">Parameters</h4> 1059 + <table class="caption-top table"> 1060 + <thead> 1061 + <tr class="header"> 1062 + <th>Name</th> 1063 + <th>Type</th> 1064 + <th>Description</th> 1065 + <th>Default</th> 1066 + </tr> 1067 + </thead> 1068 + <tbody> 1069 + <tr class="odd"> 1070 + <td>collection</td> 1071 + <td><a href="`str`">str</a></td> 1072 + <td>The NSID of the record collection.</td> 1073 + <td><em>required</em></td> 1074 + </tr> 1075 + <tr class="even"> 1076 + <td>repo</td> 1077 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 1078 + <td>The DID of the repository to query. Defaults to the authenticated user’s repository.</td> 1079 + <td><code>None</code></td> 1080 + </tr> 1081 + <tr class="odd"> 1082 + <td>limit</td> 1083 + <td><a href="`int`">int</a></td> 1084 + <td>Maximum number of records to return (default 100).</td> 1085 + <td><code>100</code></td> 1086 + </tr> 1087 + <tr class="even"> 1088 + <td>cursor</td> 1089 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 1090 + <td>Pagination cursor from a previous call.</td> 1091 + <td><code>None</code></td> 1092 + </tr> 1093 + </tbody> 1094 + </table> 1095 + </section> 1096 + <section id="returns-7" class="level4 doc-section doc-section-returns"> 1097 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-7">Returns</h4> 1098 + <table class="caption-top table"> 1099 + <thead> 1100 + <tr class="header"> 1101 + <th>Name</th> 1102 + <th>Type</th> 1103 + <th>Description</th> 1104 + </tr> 1105 + </thead> 1106 + <tbody> 1107 + <tr class="odd"> 1108 + <td></td> 1109 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 1110 + <td>A tuple of 
(records, next_cursor). The cursor is None if there</td> 1111 + </tr> 1112 + <tr class="even"> 1113 + <td></td> 1114 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 1115 + <td>are no more records.</td> 1116 + </tr> 1117 + </tbody> 1118 + </table> 1119 + </section> 1120 + <section id="raises-6" class="level4 doc-section doc-section-raises"> 1121 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-6">Raises</h4> 1122 + <table class="caption-top table"> 1123 + <thead> 1124 + <tr class="header"> 1125 + <th>Name</th> 1126 + <th>Type</th> 1127 + <th>Description</th> 1128 + </tr> 1129 + </thead> 1130 + <tbody> 1131 + <tr class="odd"> 1132 + <td></td> 1133 + <td><a href="`ValueError`">ValueError</a></td> 1134 + <td>If repo is None and not authenticated.</td> 1135 + </tr> 1136 + </tbody> 1137 + </table> 1138 + </section> 611 1139 </section> 612 1140 <section id="atdata.atmosphere.AtmosphereClient.list_schemas" class="level3"> 613 1141 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.list_schemas">list_schemas</h3> 614 - <div class="sourceCode" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.list_schemas(repo<span class="op">=</span><span class="va">None</span>, limit<span class="op">=</span><span class="dv">100</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1142 + <div class="sourceCode" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.list_schemas(repo<span class="op">=</span><span class="va">None</span>, limit<span class="op">=</span><span class="dv">100</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i 
class="bi"></i></button></pre></div> 615 1143 <p>List schema records.</p> 616 - <p>Args: repo: The DID to query. Defaults to authenticated user. limit: Maximum number to return.</p> 617 - <p>Returns: List of schema records.</p> 1144 + <section id="parameters-8" class="level4 doc-section doc-section-parameters"> 1145 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-8">Parameters</h4> 1146 + <table class="caption-top table"> 1147 + <thead> 1148 + <tr class="header"> 1149 + <th>Name</th> 1150 + <th>Type</th> 1151 + <th>Description</th> 1152 + <th>Default</th> 1153 + </tr> 1154 + </thead> 1155 + <tbody> 1156 + <tr class="odd"> 1157 + <td>repo</td> 1158 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 1159 + <td>The DID to query. Defaults to authenticated user.</td> 1160 + <td><code>None</code></td> 1161 + </tr> 1162 + <tr class="even"> 1163 + <td>limit</td> 1164 + <td><a href="`int`">int</a></td> 1165 + <td>Maximum number to return.</td> 1166 + <td><code>100</code></td> 1167 + </tr> 1168 + </tbody> 1169 + </table> 1170 + </section> 1171 + <section id="returns-8" class="level4 doc-section doc-section-returns"> 1172 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-8">Returns</h4> 1173 + <table class="caption-top table"> 1174 + <thead> 1175 + <tr class="header"> 1176 + <th>Name</th> 1177 + <th>Type</th> 1178 + <th>Description</th> 1179 + </tr> 1180 + </thead> 1181 + <tbody> 1182 + <tr class="odd"> 1183 + <td></td> 1184 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 1185 + <td>List of schema records.</td> 1186 + </tr> 1187 + </tbody> 1188 + </table> 1189 + </section> 618 1190 </section> 619 1191 <section id="atdata.atmosphere.AtmosphereClient.login" class="level3"> 620 1192 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.login">login</h3> 621 - <div class="sourceCode" id="cb12"><pre class="sourceCode python code-with-copy"><code 
class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.login(handle, password)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1193 + <div class="sourceCode" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.login(handle, password)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 622 1194 <p>Authenticate with the ATProto PDS.</p> 623 - <p>Args: handle: Your Bluesky handle (e.g., ‘alice.bsky.social’). password: App-specific password (not your main password).</p> 624 - <p>Raises: atproto.exceptions.AtProtocolError: If authentication fails.</p> 1195 + <section id="parameters-9" class="level4 doc-section doc-section-parameters"> 1196 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-9">Parameters</h4> 1197 + <table class="caption-top table"> 1198 + <thead> 1199 + <tr class="header"> 1200 + <th>Name</th> 1201 + <th>Type</th> 1202 + <th>Description</th> 1203 + <th>Default</th> 1204 + </tr> 1205 + </thead> 1206 + <tbody> 1207 + <tr class="odd"> 1208 + <td>handle</td> 1209 + <td><a href="`str`">str</a></td> 1210 + <td>Your Bluesky handle (e.g., ‘alice.bsky.social’).</td> 1211 + <td><em>required</em></td> 1212 + </tr> 1213 + <tr class="even"> 1214 + <td>password</td> 1215 + <td><a href="`str`">str</a></td> 1216 + <td>App-specific password (not your main password).</td> 1217 + <td><em>required</em></td> 1218 + </tr> 1219 + </tbody> 1220 + </table> 1221 + </section> 1222 + <section id="raises-7" class="level4 doc-section doc-section-raises"> 1223 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-7">Raises</h4> 1224 + <table class="caption-top table"> 1225 + <thead> 1226 + <tr 
class="header"> 1227 + <th>Name</th> 1228 + <th>Type</th> 1229 + <th>Description</th> 1230 + </tr> 1231 + </thead> 1232 + <tbody> 1233 + <tr class="odd"> 1234 + <td></td> 1235 + <td><a href="`atproto`">atproto</a>.<a href="`atproto.exceptions`">exceptions</a>.<a href="`atproto.exceptions.AtProtocolError`">AtProtocolError</a></td> 1236 + <td>If authentication fails.</td> 1237 + </tr> 1238 + </tbody> 1239 + </table> 1240 + </section> 625 1241 </section> 626 1242 <section id="atdata.atmosphere.AtmosphereClient.login_with_session" class="level3"> 627 1243 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.login_with_session">login_with_session</h3> 628 - <div class="sourceCode" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.login_with_session(session_string)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1244 + <div class="sourceCode" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.login_with_session(session_string)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 629 1245 <p>Authenticate using an exported session string.</p> 630 1246 <p>This allows reusing a session without re-authenticating, which helps avoid rate limits on session creation.</p> 631 - <p>Args: session_string: Session string from <code>export_session()</code>.</p> 1247 + <section id="parameters-10" class="level4 doc-section doc-section-parameters"> 1248 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-10">Parameters</h4> 1249 + <table class="caption-top table"> 1250 + <thead> 1251 + <tr class="header"> 1252 + <th>Name</th> 1253 + 
<th>Type</th> 1254 + <th>Description</th> 1255 + <th>Default</th> 1256 + </tr> 1257 + </thead> 1258 + <tbody> 1259 + <tr class="odd"> 1260 + <td>session_string</td> 1261 + <td><a href="`str`">str</a></td> 1262 + <td>Session string from <code>export_session()</code>.</td> 1263 + <td><em>required</em></td> 1264 + </tr> 1265 + </tbody> 1266 + </table> 1267 + </section> 632 1268 </section> 633 1269 <section id="atdata.atmosphere.AtmosphereClient.put_record" class="level3"> 634 1270 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.put_record">put_record</h3> 635 - <div class="sourceCode" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.put_record(</span> 636 - <span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a> collection,</span> 637 - <span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> rkey,</span> 638 - <span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> record,</span> 639 - <span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 640 - <span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a> validate<span class="op">=</span><span class="va">False</span>,</span> 641 - <span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a> swap_commit<span class="op">=</span><span class="va">None</span>,</span> 642 - <span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1271 + <div class="sourceCode" id="cb15"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.put_record(</span> 1272 + <span id="cb15-2"><a 
href="#cb15-2" aria-hidden="true" tabindex="-1"></a> collection,</span> 1273 + <span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a> rkey,</span> 1274 + <span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a> record,</span> 1275 + <span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 1276 + <span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a> validate<span class="op">=</span><span class="va">False</span>,</span> 1277 + <span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a> swap_commit<span class="op">=</span><span class="va">None</span>,</span> 1278 + <span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 643 1279 <p>Create or update a record at a specific key.</p> 644 - <p>Args: collection: The NSID of the record collection. rkey: The record key. record: The record data. Must include a ‘$type’ field. validate: Whether to validate against the Lexicon schema. swap_commit: Optional CID for compare-and-swap update.</p> 645 - <p>Returns: The AT URI of the record.</p> 646 - <p>Raises: ValueError: If not authenticated. 
atproto.exceptions.AtProtocolError: If operation fails.</p> 1280 + <section id="parameters-11" class="level4 doc-section doc-section-parameters"> 1281 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-11">Parameters</h4> 1282 + <table class="caption-top table"> 1283 + <thead> 1284 + <tr class="header"> 1285 + <th>Name</th> 1286 + <th>Type</th> 1287 + <th>Description</th> 1288 + <th>Default</th> 1289 + </tr> 1290 + </thead> 1291 + <tbody> 1292 + <tr class="odd"> 1293 + <td>collection</td> 1294 + <td><a href="`str`">str</a></td> 1295 + <td>The NSID of the record collection.</td> 1296 + <td><em>required</em></td> 1297 + </tr> 1298 + <tr class="even"> 1299 + <td>rkey</td> 1300 + <td><a href="`str`">str</a></td> 1301 + <td>The record key.</td> 1302 + <td><em>required</em></td> 1303 + </tr> 1304 + <tr class="odd"> 1305 + <td>record</td> 1306 + <td><a href="`dict`">dict</a></td> 1307 + <td>The record data. Must include a ‘$type’ field.</td> 1308 + <td><em>required</em></td> 1309 + </tr> 1310 + <tr class="even"> 1311 + <td>validate</td> 1312 + <td><a href="`bool`">bool</a></td> 1313 + <td>Whether to validate against the Lexicon schema.</td> 1314 + <td><code>False</code></td> 1315 + </tr> 1316 + <tr class="odd"> 1317 + <td>swap_commit</td> 1318 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 1319 + <td>Optional CID for compare-and-swap update.</td> 1320 + <td><code>None</code></td> 1321 + </tr> 1322 + </tbody> 1323 + </table> 1324 + </section> 1325 + <section id="returns-9" class="level4 doc-section doc-section-returns"> 1326 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-9">Returns</h4> 1327 + <table class="caption-top table"> 1328 + <thead> 1329 + <tr class="header"> 1330 + <th>Name</th> 1331 + <th>Type</th> 1332 + <th>Description</th> 1333 + </tr> 1334 + </thead> 1335 + <tbody> 1336 + <tr class="odd"> 1337 + <td></td> 1338 + <td><a 
href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 1339 + <td>The AT URI of the record.</td> 1340 + </tr> 1341 + </tbody> 1342 + </table> 1343 + </section> 1344 + <section id="raises-8" class="level4 doc-section doc-section-raises"> 1345 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-8">Raises</h4> 1346 + <table class="caption-top table"> 1347 + <thead> 1348 + <tr class="header"> 1349 + <th>Name</th> 1350 + <th>Type</th> 1351 + <th>Description</th> 1352 + </tr> 1353 + </thead> 1354 + <tbody> 1355 + <tr class="odd"> 1356 + <td></td> 1357 + <td><a href="`ValueError`">ValueError</a></td> 1358 + <td>If not authenticated.</td> 1359 + </tr> 1360 + <tr class="even"> 1361 + <td></td> 1362 + <td><a href="`atproto`">atproto</a>.<a href="`atproto.exceptions`">exceptions</a>.<a href="`atproto.exceptions.AtProtocolError`">AtProtocolError</a></td> 1363 + <td>If operation fails.</td> 1364 + </tr> 1365 + </tbody> 1366 + </table> 1367 + </section> 647 1368 </section> 648 1369 <section id="atdata.atmosphere.AtmosphereClient.upload_blob" class="level3"> 649 1370 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereClient.upload_blob">upload_blob</h3> 650 - <div class="sourceCode" id="cb15"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.upload_blob(</span> 651 - <span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a> data,</span> 652 - <span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a> mime_type<span class="op">=</span><span class="st">'application/octet-stream'</span>,</span> 653 - <span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1371 + <div class="sourceCode" id="cb16"><pre class="sourceCode python code-with-copy"><code 
class="sourceCode python"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereClient.upload_blob(</span> 1372 + <span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a> data,</span> 1373 + <span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a> mime_type<span class="op">=</span><span class="st">'application/octet-stream'</span>,</span> 1374 + <span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 654 1375 <p>Upload binary data as a blob to the PDS.</p> 655 - <p>Args: data: Binary data to upload. mime_type: MIME type of the data (for reference, not enforced by PDS).</p> 656 - <p>Returns: A blob reference dict with keys: ‘$type’, ‘ref’, ‘mimeType’, ‘size’. This can be embedded directly in record fields.</p> 657 - <p>Raises: ValueError: If not authenticated. atproto.exceptions.AtProtocolError: If upload fails.</p> 1376 + <section id="parameters-12" class="level4 doc-section doc-section-parameters"> 1377 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-12">Parameters</h4> 1378 + <table class="caption-top table"> 1379 + <thead> 1380 + <tr class="header"> 1381 + <th>Name</th> 1382 + <th>Type</th> 1383 + <th>Description</th> 1384 + <th>Default</th> 1385 + </tr> 1386 + </thead> 1387 + <tbody> 1388 + <tr class="odd"> 1389 + <td>data</td> 1390 + <td><a href="`bytes`">bytes</a></td> 1391 + <td>Binary data to upload.</td> 1392 + <td><em>required</em></td> 1393 + </tr> 1394 + <tr class="even"> 1395 + <td>mime_type</td> 1396 + <td><a href="`str`">str</a></td> 1397 + <td>MIME type of the data (for reference, not enforced by PDS).</td> 1398 + <td><code>'application/octet-stream'</code></td> 1399 + </tr> 1400 + </tbody> 1401 + </table> 1402 + </section> 1403 + <section id="returns-10" class="level4 doc-section doc-section-returns"> 1404 + 
<h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-10">Returns</h4> 1405 + <table class="caption-top table"> 1406 + <thead> 1407 + <tr class="header"> 1408 + <th>Name</th> 1409 + <th>Type</th> 1410 + <th>Description</th> 1411 + </tr> 1412 + </thead> 1413 + <tbody> 1414 + <tr class="odd"> 1415 + <td></td> 1416 + <td><a href="`dict`">dict</a></td> 1417 + <td>A blob reference dict with keys: ‘$type’, ‘ref’, ‘mimeType’, ‘size’.</td> 1418 + </tr> 1419 + <tr class="even"> 1420 + <td></td> 1421 + <td><a href="`dict`">dict</a></td> 1422 + <td>This can be embedded directly in record fields.</td> 1423 + </tr> 1424 + </tbody> 1425 + </table> 1426 + </section> 1427 + <section id="raises-9" class="level4 doc-section doc-section-raises"> 1428 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-9">Raises</h4> 1429 + <table class="caption-top table"> 1430 + <thead> 1431 + <tr class="header"> 1432 + <th>Name</th> 1433 + <th>Type</th> 1434 + <th>Description</th> 1435 + </tr> 1436 + </thead> 1437 + <tbody> 1438 + <tr class="odd"> 1439 + <td></td> 1440 + <td><a href="`ValueError`">ValueError</a></td> 1441 + <td>If not authenticated.</td> 1442 + </tr> 1443 + <tr class="even"> 1444 + <td></td> 1445 + <td><a href="`atproto`">atproto</a>.<a href="`atproto.exceptions`">exceptions</a>.<a href="`atproto.exceptions.AtProtocolError`">AtProtocolError</a></td> 1446 + <td>If upload fails.</td> 1447 + </tr> 1448 + </tbody> 1449 + </table> 658 1450 659 1451 1452 + </section> 660 1453 </section> 661 1454 </section> 662 1455 </section>
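Reviewer note on the `list_records` entry in the file above: the rendered docs describe a `(records, next_cursor)` return value, where the cursor is `None` once there are no more records. A minimal sketch of draining every page under that contract follows. `FakeClient`, `iter_all_records`, and the NSID `app.example.record` are hypothetical names used only for illustration. The stub mirrors the documented keyword-only signature and is not the real `AtmosphereClient`.

```python
from typing import Iterator, Optional


class FakeClient:
    """Hypothetical stand-in mimicking the documented list_records signature."""

    def __init__(self, records: list[dict]):
        self._records = records

    def list_records(self, collection: str, *, repo: Optional[str] = None,
                     limit: int = 100, cursor: Optional[str] = None):
        # Assumption: the cursor is treated here as a stringified offset.
        # A real PDS cursor is opaque and should not be parsed.
        start = int(cursor) if cursor is not None else 0
        page = self._records[start:start + limit]
        end = start + len(page)
        next_cursor = str(end) if end < len(self._records) else None
        return page, next_cursor


def iter_all_records(client, collection: str, *, repo: Optional[str] = None,
                     page_size: int = 100) -> Iterator[dict]:
    """Yield records page by page until the returned cursor is None."""
    cursor = None
    while True:
        records, cursor = client.list_records(
            collection, repo=repo, limit=page_size, cursor=cursor)
        yield from records
        if cursor is None:
            break


# Usage with the stub: five records fetched two at a time.
client = FakeClient([{"n": i} for i in range(5)])
all_records = list(iter_all_records(client, "app.example.record", page_size=2))
```

The generator only ever passes back the cursor it was handed, so the same loop should work against any client that follows the documented `(records, next_cursor)` contract.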

+396 -36

docs/api/AtmosphereIndex.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.atmosphere.AtmosphereIndex" id="toc-atdata.atmosphere.AtmosphereIndex" class="nav-link active" data-scroll-target="#atdata.atmosphere.AtmosphereIndex">AtmosphereIndex</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 402 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 403 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 403 404 <ul class="collapse"> ··· 425 426 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex(client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 426 427 <p>ATProto index implementing AbstractIndex protocol.</p> 427 428 <p>Wraps SchemaPublisher/Loader and DatasetPublisher/Loader to provide a unified interface compatible with LocalIndex.</p> 428 - <p>Example: >>> client = AtmosphereClient() >>> client.login(“handle.bsky.social”, “app-password”) >>> >>> index = AtmosphereIndex(client) >>> schema_ref = index.publish_schema(MySample, version=“1.0.0”) >>> entry = index.insert_dataset(dataset, name=“my-data”)</p> 429 + <section id="example" class="level2 doc-section doc-section-example"> 430 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 431 + <p>::</p> 432 + <pre><code>>>> client = AtmosphereClient() 433 + >>> client.login("handle.bsky.social", "app-password") 434 + >>> 435 + >>> index = AtmosphereIndex(client) 436 + >>> schema_ref = index.publish_schema(MySample, version="1.0.0") 437 + >>> entry = index.insert_dataset(dataset, name="my-data")</code></pre> 438 + </section> 429 439 <section id="attributes" class="level2"> 430 
440 <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 431 441 <table class="caption-top table"> ··· 489 499 </table> 490 500 <section id="atdata.atmosphere.AtmosphereIndex.decode_schema" class="level3"> 491 501 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereIndex.decode_schema">decode_schema</h3> 492 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.decode_schema(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 502 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.decode_schema(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 493 503 <p>Reconstruct a Python type from a schema record.</p> 494 - <p>Args: ref: AT URI of the schema record.</p> 495 - <p>Returns: Dynamically generated Packable type.</p> 496 - <p>Raises: ValueError: If schema cannot be decoded.</p> 504 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 505 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 506 + <table class="caption-top table"> 507 + <thead> 508 + <tr class="header"> 509 + <th>Name</th> 510 + <th>Type</th> 511 + <th>Description</th> 512 + <th>Default</th> 513 + </tr> 514 + </thead> 515 + <tbody> 516 + <tr class="odd"> 517 + <td>ref</td> 518 + <td><a href="`str`">str</a></td> 519 + <td>AT URI of the schema record.</td> 520 + <td><em>required</em></td> 521 + </tr> 522 + </tbody> 523 + </table> 524 + </section> 525 + <section id="returns" class="level4 doc-section doc-section-returns"> 526 + <h4 class="doc-section 
doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 527 + <table class="caption-top table"> 528 + <thead> 529 + <tr class="header"> 530 + <th>Name</th> 531 + <th>Type</th> 532 + <th>Description</th> 533 + </tr> 534 + </thead> 535 + <tbody> 536 + <tr class="odd"> 537 + <td></td> 538 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td> 539 + <td>Dynamically generated Packable type.</td> 540 + </tr> 541 + </tbody> 542 + </table> 543 + </section> 544 + <section id="raises" class="level4 doc-section doc-section-raises"> 545 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 546 + <table class="caption-top table"> 547 + <thead> 548 + <tr class="header"> 549 + <th>Name</th> 550 + <th>Type</th> 551 + <th>Description</th> 552 + </tr> 553 + </thead> 554 + <tbody> 555 + <tr class="odd"> 556 + <td></td> 557 + <td><a href="`ValueError`">ValueError</a></td> 558 + <td>If schema cannot be decoded.</td> 559 + </tr> 560 + </tbody> 561 + </table> 562 + </section> 497 563 </section> 498 564 <section id="atdata.atmosphere.AtmosphereIndex.get_dataset" class="level3"> 499 565 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereIndex.get_dataset">get_dataset</h3> 500 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.get_dataset(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 566 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.get_dataset(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 501 567 <p>Get a dataset by AT URI.</p> 
502 - <p>Args: ref: AT URI of the dataset record.</p> 503 - <p>Returns: AtmosphereIndexEntry for the dataset.</p> 504 - <p>Raises: ValueError: If record is not a dataset.</p> 568 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 569 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 570 + <table class="caption-top table"> 571 + <thead> 572 + <tr class="header"> 573 + <th>Name</th> 574 + <th>Type</th> 575 + <th>Description</th> 576 + <th>Default</th> 577 + </tr> 578 + </thead> 579 + <tbody> 580 + <tr class="odd"> 581 + <td>ref</td> 582 + <td><a href="`str`">str</a></td> 583 + <td>AT URI of the dataset record.</td> 584 + <td><em>required</em></td> 585 + </tr> 586 + </tbody> 587 + </table> 588 + </section> 589 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 590 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 591 + <table class="caption-top table"> 592 + <thead> 593 + <tr class="header"> 594 + <th>Name</th> 595 + <th>Type</th> 596 + <th>Description</th> 597 + </tr> 598 + </thead> 599 + <tbody> 600 + <tr class="odd"> 601 + <td></td> 602 + <td><a href="`atdata.atmosphere.AtmosphereIndexEntry`">AtmosphereIndexEntry</a></td> 603 + <td>AtmosphereIndexEntry for the dataset.</td> 604 + </tr> 605 + </tbody> 606 + </table> 607 + </section> 608 + <section id="raises-1" class="level4 doc-section doc-section-raises"> 609 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-1">Raises</h4> 610 + <table class="caption-top table"> 611 + <thead> 612 + <tr class="header"> 613 + <th>Name</th> 614 + <th>Type</th> 615 + <th>Description</th> 616 + </tr> 617 + </thead> 618 + <tbody> 619 + <tr class="odd"> 620 + <td></td> 621 + <td><a href="`ValueError`">ValueError</a></td> 622 + <td>If record is not a dataset.</td> 623 + </tr> 624 + </tbody> 625 + </table> 626 + </section> 505 627 </section> 506 628 <section 
id="atdata.atmosphere.AtmosphereIndex.get_schema" class="level3"> 507 629 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereIndex.get_schema">get_schema</h3> 508 - <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.get_schema(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 630 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.get_schema(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 509 631 <p>Get a schema record by AT URI.</p> 510 - <p>Args: ref: AT URI of the schema record.</p> 511 - <p>Returns: Schema record dictionary.</p> 512 - <p>Raises: ValueError: If record is not a schema.</p> 632 + <section id="parameters-2" class="level4 doc-section doc-section-parameters"> 633 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h4> 634 + <table class="caption-top table"> 635 + <thead> 636 + <tr class="header"> 637 + <th>Name</th> 638 + <th>Type</th> 639 + <th>Description</th> 640 + <th>Default</th> 641 + </tr> 642 + </thead> 643 + <tbody> 644 + <tr class="odd"> 645 + <td>ref</td> 646 + <td><a href="`str`">str</a></td> 647 + <td>AT URI of the schema record.</td> 648 + <td><em>required</em></td> 649 + </tr> 650 + </tbody> 651 + </table> 652 + </section> 653 + <section id="returns-2" class="level4 doc-section doc-section-returns"> 654 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 655 + <table class="caption-top table"> 656 + <thead> 657 + <tr class="header"> 658 + <th>Name</th> 659 + <th>Type</th> 660 + 
<th>Description</th> 661 + </tr> 662 + </thead> 663 + <tbody> 664 + <tr class="odd"> 665 + <td></td> 666 + <td><a href="`dict`">dict</a></td> 667 + <td>Schema record dictionary.</td> 668 + </tr> 669 + </tbody> 670 + </table> 671 + </section> 672 + <section id="raises-2" class="level4 doc-section doc-section-raises"> 673 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-2">Raises</h4> 674 + <table class="caption-top table"> 675 + <thead> 676 + <tr class="header"> 677 + <th>Name</th> 678 + <th>Type</th> 679 + <th>Description</th> 680 + </tr> 681 + </thead> 682 + <tbody> 683 + <tr class="odd"> 684 + <td></td> 685 + <td><a href="`ValueError`">ValueError</a></td> 686 + <td>If record is not a schema.</td> 687 + </tr> 688 + </tbody> 689 + </table> 690 + </section> 513 691 </section> 514 692 <section id="atdata.atmosphere.AtmosphereIndex.insert_dataset" class="level3"> 515 693 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereIndex.insert_dataset">insert_dataset</h3> 516 - <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.insert_dataset(</span> 517 - <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> ds,</span> 518 - <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 519 - <span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> name,</span> 520 - <span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span><span class="va">None</span>,</span> 521 - <span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> <span class="op">**</span>kwargs,</span> 522 - <span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i 
class="bi"></i></button></pre></div> 694 + <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.insert_dataset(</span> 695 + <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> ds,</span> 696 + <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 697 + <span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> name,</span> 698 + <span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span><span class="va">None</span>,</span> 699 + <span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> <span class="op">**</span>kwargs,</span> 700 + <span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 523 701 <p>Insert a dataset into ATProto.</p> 524 - <p>Args: ds: The Dataset to publish. name: Human-readable name. schema_ref: Optional schema AT URI. If None, auto-publishes schema. 
**kwargs: Additional options (description, tags, license).</p> 525 - <p>Returns: AtmosphereIndexEntry for the inserted dataset.</p> 702 + <section id="parameters-3" class="level4 doc-section doc-section-parameters"> 703 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-3">Parameters</h4> 704 + <table class="caption-top table"> 705 + <thead> 706 + <tr class="header"> 707 + <th>Name</th> 708 + <th>Type</th> 709 + <th>Description</th> 710 + <th>Default</th> 711 + </tr> 712 + </thead> 713 + <tbody> 714 + <tr class="odd"> 715 + <td>ds</td> 716 + <td><a href="`atdata.dataset.Dataset`">Dataset</a></td> 717 + <td>The Dataset to publish.</td> 718 + <td><em>required</em></td> 719 + </tr> 720 + <tr class="even"> 721 + <td>name</td> 722 + <td><a href="`str`">str</a></td> 723 + <td>Human-readable name.</td> 724 + <td><em>required</em></td> 725 + </tr> 726 + <tr class="odd"> 727 + <td>schema_ref</td> 728 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 729 + <td>Optional schema AT URI. 
If None, auto-publishes schema.</td> 730 + <td><code>None</code></td> 731 + </tr> 732 + <tr class="even"> 733 + <td>**kwargs</td> 734 + <td></td> 735 + <td>Additional options (description, tags, license).</td> 736 + <td><code>{}</code></td> 737 + </tr> 738 + </tbody> 739 + </table> 740 + </section> 741 + <section id="returns-3" class="level4 doc-section doc-section-returns"> 742 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-3">Returns</h4> 743 + <table class="caption-top table"> 744 + <thead> 745 + <tr class="header"> 746 + <th>Name</th> 747 + <th>Type</th> 748 + <th>Description</th> 749 + </tr> 750 + </thead> 751 + <tbody> 752 + <tr class="odd"> 753 + <td></td> 754 + <td><a href="`atdata.atmosphere.AtmosphereIndexEntry`">AtmosphereIndexEntry</a></td> 755 + <td>AtmosphereIndexEntry for the inserted dataset.</td> 756 + </tr> 757 + </tbody> 758 + </table> 759 + </section> 526 760 </section> 527 761 <section id="atdata.atmosphere.AtmosphereIndex.list_datasets" class="level3"> 528 762 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereIndex.list_datasets">list_datasets</h3> 529 - <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.list_datasets(repo<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 763 + <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.list_datasets(repo<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 530 764 <p>Get all dataset entries as a materialized list 
(AbstractIndex protocol).</p> 531 - <p>Args: repo: DID of repository. Defaults to authenticated user.</p> 532 - <p>Returns: List of AtmosphereIndexEntry for each dataset.</p> 765 + <section id="parameters-4" class="level4 doc-section doc-section-parameters"> 766 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-4">Parameters</h4> 767 + <table class="caption-top table"> 768 + <thead> 769 + <tr class="header"> 770 + <th>Name</th> 771 + <th>Type</th> 772 + <th>Description</th> 773 + <th>Default</th> 774 + </tr> 775 + </thead> 776 + <tbody> 777 + <tr class="odd"> 778 + <td>repo</td> 779 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 780 + <td>DID of repository. Defaults to authenticated user.</td> 781 + <td><code>None</code></td> 782 + </tr> 783 + </tbody> 784 + </table> 785 + </section> 786 + <section id="returns-4" class="level4 doc-section doc-section-returns"> 787 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-4">Returns</h4> 788 + <table class="caption-top table"> 789 + <thead> 790 + <tr class="header"> 791 + <th>Name</th> 792 + <th>Type</th> 793 + <th>Description</th> 794 + </tr> 795 + </thead> 796 + <tbody> 797 + <tr class="odd"> 798 + <td></td> 799 + <td><a href="`list`">list</a>[<a href="`atdata.atmosphere.AtmosphereIndexEntry`">AtmosphereIndexEntry</a>]</td> 800 + <td>List of AtmosphereIndexEntry for each dataset.</td> 801 + </tr> 802 + </tbody> 803 + </table> 804 + </section> 533 805 </section> 534 806 <section id="atdata.atmosphere.AtmosphereIndex.list_schemas" class="level3"> 535 807 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereIndex.list_schemas">list_schemas</h3> 536 - <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.list_schemas(repo<span class="op">=</span><span 
class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 808 + <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.list_schemas(repo<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 537 809 <p>Get all schema records as a materialized list (AbstractIndex protocol).</p> 538 - <p>Args: repo: DID of repository. Defaults to authenticated user.</p> 539 - <p>Returns: List of schema records as dictionaries.</p> 810 + <section id="parameters-5" class="level4 doc-section doc-section-parameters"> 811 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-5">Parameters</h4> 812 + <table class="caption-top table"> 813 + <thead> 814 + <tr class="header"> 815 + <th>Name</th> 816 + <th>Type</th> 817 + <th>Description</th> 818 + <th>Default</th> 819 + </tr> 820 + </thead> 821 + <tbody> 822 + <tr class="odd"> 823 + <td>repo</td> 824 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 825 + <td>DID of repository. 
Defaults to authenticated user.</td> 826 + <td><code>None</code></td> 827 + </tr> 828 + </tbody> 829 + </table> 830 + </section> 831 + <section id="returns-5" class="level4 doc-section doc-section-returns"> 832 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-5">Returns</h4> 833 + <table class="caption-top table"> 834 + <thead> 835 + <tr class="header"> 836 + <th>Name</th> 837 + <th>Type</th> 838 + <th>Description</th> 839 + </tr> 840 + </thead> 841 + <tbody> 842 + <tr class="odd"> 843 + <td></td> 844 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 845 + <td>List of schema records as dictionaries.</td> 846 + </tr> 847 + </tbody> 848 + </table> 849 + </section> 540 850 </section> 541 851 <section id="atdata.atmosphere.AtmosphereIndex.publish_schema" class="level3"> 542 852 <h3 class="anchored" data-anchor-id="atdata.atmosphere.AtmosphereIndex.publish_schema">publish_schema</h3> 543 - <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.publish_schema(</span> 544 - <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> sample_type,</span> 545 - <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 546 - <span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">'1.0.0'</span>,</span> 547 - <span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a> <span class="op">**</span>kwargs,</span> 548 - <span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 853 + <div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" 
aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex.publish_schema(</span> 854 + <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a> sample_type,</span> 855 + <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 856 + <span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">'1.0.0'</span>,</span> 857 + <span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a> <span class="op">**</span>kwargs,</span> 858 + <span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 549 859 <p>Publish a schema to ATProto.</p> 550 - <p>Args: sample_type: A Packable type (PackableSample subclass or <span class="citation" data-cites="packable-decorated">@packable-decorated</span>). version: Semantic version string. **kwargs: Additional options (description, metadata).</p> 551 - <p>Returns: AT URI of the schema record.</p> 860 + <section id="parameters-6" class="level4 doc-section doc-section-parameters"> 861 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-6">Parameters</h4> 862 + <table class="caption-top table"> 863 + <thead> 864 + <tr class="header"> 865 + <th>Name</th> 866 + <th>Type</th> 867 + <th>Description</th> 868 + <th>Default</th> 869 + </tr> 870 + </thead> 871 + <tbody> 872 + <tr class="odd"> 873 + <td>sample_type</td> 874 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td> 875 + <td>A Packable type (PackableSample subclass or <span class="citation" data-cites="packable-decorated">@packable-decorated</span>).</td> 876 + <td><em>required</em></td> 877 + </tr> 878 + <tr class="even"> 879 + <td>version</td> 880 + <td><a href="`str`">str</a></td> 881 + <td>Semantic version string.</td> 882 + 
<td><code>'1.0.0'</code></td> 883 + </tr> 884 + <tr class="odd"> 885 + <td>**kwargs</td> 886 + <td></td> 887 + <td>Additional options (description, metadata).</td> 888 + <td><code>{}</code></td> 889 + </tr> 890 + </tbody> 891 + </table> 892 + </section> 893 + <section id="returns-6" class="level4 doc-section doc-section-returns"> 894 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-6">Returns</h4> 895 + <table class="caption-top table"> 896 + <thead> 897 + <tr class="header"> 898 + <th>Name</th> 899 + <th>Type</th> 900 + <th>Description</th> 901 + </tr> 902 + </thead> 903 + <tbody> 904 + <tr class="odd"> 905 + <td></td> 906 + <td><a href="`str`">str</a></td> 907 + <td>AT URI of the schema record.</td> 908 + </tr> 909 + </tbody> 910 + </table> 552 911 553 912 913 + </section> 554 914 </section> 555 915 </section> 556 916 </section>
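For reference, the Parameters / Returns / Raises tables and the Example section rendered in this file come from Google-style docstring sections. The following is a minimal sketch of a docstring in that shape, not the project's actual source: `decode_schema_stub` is a hypothetical stand-in, its body is a placeholder, and the section text is copied from the rendered output above.

```python
import inspect


def decode_schema_stub(ref: str) -> type:
    """Reconstruct a Python type from a schema record.

    Args:
        ref: AT URI of the schema record.

    Returns:
        Dynamically generated Packable type.

    Raises:
        ValueError: If schema cannot be decoded.

    Example:
        ::

            >>> sample_type = index.decode_schema(ref)
    """
    # Placeholder body: this stub only exists to carry the docstring.
    raise ValueError("illustrative stub; no schema decoding is implemented")


# inspect.getdoc() strips the common indentation, so section headers
# end up at column 0 and the literal-block marker at 4 spaces.
doc = inspect.getdoc(decode_schema_stub)
section_headers = [
    line
    for line in doc.splitlines()
    if line in ("Args:", "Returns:", "Raises:", "Example:")
]
print(section_headers)
```

The `::` marker sits one indent level below `Example:` with a blank line after it, matching the formatting rules added to CLAUDE.md in this change.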

+9 -19

docs/api/AtmosphereIndexEntry.html

··· 414 414 <h1>AtmosphereIndexEntry</h1> 415 415 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndexEntry(uri, record)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 416 416 <p>Entry wrapper for ATProto dataset records implementing IndexEntry protocol.</p> 417 - <p>Attributes: _uri: AT URI of the record. _record: Raw record dictionary.</p> 418 - <section id="attributes" class="level2"> 419 - <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 417 + <section id="attributes" class="level2 doc-section doc-section-attributes"> 418 + <h2 class="doc-section doc-section-attributes anchored" data-anchor-id="attributes">Attributes</h2> 420 419 <table class="caption-top table"> 421 420 <thead> 422 421 <tr class="header"> 423 422 <th>Name</th> 423 + <th>Type</th> 424 424 <th>Description</th> 425 425 </tr> 426 426 </thead> 427 427 <tbody> 428 428 <tr class="odd"> 429 - <td><a href="#atdata.atmosphere.AtmosphereIndexEntry.data_urls">data_urls</a></td> 430 - <td>WebDataset URLs from external storage.</td> 431 - </tr> 432 - <tr class="even"> 433 - <td><a href="#atdata.atmosphere.AtmosphereIndexEntry.metadata">metadata</a></td> 434 - <td>Metadata from the record, if any.</td> 435 - </tr> 436 - <tr class="odd"> 437 - <td><a href="#atdata.atmosphere.AtmosphereIndexEntry.name">name</a></td> 438 - <td>Human-readable dataset name.</td> 429 + <td>_uri</td> 430 + <td></td> 431 + <td>AT URI of the record.</td> 439 432 </tr> 440 433 <tr class="even"> 441 - <td><a href="#atdata.atmosphere.AtmosphereIndexEntry.schema_ref">schema_ref</a></td> 442 - <td>AT URI of the schema record.</td> 443 - </tr> 444 - <tr class="odd"> 445 - <td><a href="#atdata.atmosphere.AtmosphereIndexEntry.uri">uri</a></td> 446 - <td>AT URI of this record.</td> 434 + <td>_record</td> 435 + 
<td></td> 436 + <td>Raw record dictionary.</td> 447 437 </tr> 448 438 </tbody> 449 439 </table>

+94 -7

docs/api/DataSource.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.DataSource" id="toc-atdata.DataSource" class="nav-link active" data-scroll-target="#atdata.DataSource">DataSource</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 402 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 403 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 403 404 <ul class="collapse"> ··· 421 422 <p>Protocol for data sources that provide streams to Dataset.</p> 422 423 <p>A DataSource abstracts over different ways of accessing dataset shards: - URLSource: Standard WebDataset-compatible URLs (http, https, pipe, gs, etc.) - S3Source: S3-compatible storage with explicit credentials - BlobSource: ATProto blob references (future)</p> 423 424 <p>The key method is <code>shards()</code>, which yields (identifier, stream) pairs. These are fed directly to WebDataset’s tar_file_expander, bypassing URL resolution entirely. This enables: - Private S3 repos with credentials - Custom endpoints (Cloudflare R2, MinIO) - ATProto blob streaming - Any other source that can provide file-like objects</p> 424 - <p>Example: >>> source = S3Source( … bucket=“my-bucket”, … keys=[“data-000.tar”, “data-001.tar”], … endpoint=“https://r2.example.com”, … credentials=creds, … ) >>> ds = Dataset<a href="source">MySample</a> >>> for sample in ds.ordered(): … print(sample)</p> 425 + <section id="example" class="level2 doc-section doc-section-example"> 426 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 427 + <p>::</p> 428 + <pre><code>>>> source = S3Source( 429 + ... bucket="my-bucket", 430 + ... keys=["data-000.tar", "data-001.tar"], 431 + ... endpoint="https://r2.example.com", 432 + ... credentials=creds, 433 + ... 
) 434 + >>> ds = Dataset[MySample](source) 435 + >>> for sample in ds.ordered(): 436 + ... print(sample)</code></pre> 437 + </section> 425 438 <section id="attributes" class="level2"> 426 439 <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 427 440 <table class="caption-top table"> ··· 461 474 </table> 462 475 <section id="atdata.DataSource.list_shards" class="level3"> 463 476 <h3 class="anchored" data-anchor-id="atdata.DataSource.list_shards">list_shards</h3> 464 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>DataSource.list_shards()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 477 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>DataSource.list_shards()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 465 478 <p>Get list of shard identifiers without opening streams.</p> 466 479 <p>Used for metadata queries like counting shards without actually streaming data. 
Implementations should return identifiers that match what shards would yield.</p> 467 - <p>Returns: List of shard identifier strings.</p> 480 + <section id="returns" class="level4 doc-section doc-section-returns"> 481 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 482 + <table class="caption-top table"> 483 + <thead> 484 + <tr class="header"> 485 + <th>Name</th> 486 + <th>Type</th> 487 + <th>Description</th> 488 + </tr> 489 + </thead> 490 + <tbody> 491 + <tr class="odd"> 492 + <td></td> 493 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 494 + <td>List of shard identifier strings.</td> 495 + </tr> 496 + </tbody> 497 + </table> 498 + </section> 468 499 </section> 469 500 <section id="atdata.DataSource.open_shard" class="level3"> 470 501 <h3 class="anchored" data-anchor-id="atdata.DataSource.open_shard">open_shard</h3> 471 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>DataSource.open_shard(shard_id)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 502 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>DataSource.open_shard(shard_id)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 472 503 <p>Open a single shard by its identifier.</p> 473 504 <p>This method enables random access to individual shards, which is required for PyTorch DataLoader worker splitting. 
Each worker opens only its assigned shards rather than iterating all shards.</p> 474 - <p>Args: shard_id: Shard identifier from shard_list.</p> 475 - <p>Returns: File-like stream for reading the shard.</p> 476 - <p>Raises: KeyError: If shard_id is not in shard_list.</p> 505 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 506 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 507 + <table class="caption-top table"> 508 + <thead> 509 + <tr class="header"> 510 + <th>Name</th> 511 + <th>Type</th> 512 + <th>Description</th> 513 + <th>Default</th> 514 + </tr> 515 + </thead> 516 + <tbody> 517 + <tr class="odd"> 518 + <td>shard_id</td> 519 + <td><a href="`str`">str</a></td> 520 + <td>Shard identifier from shard_list.</td> 521 + <td><em>required</em></td> 522 + </tr> 523 + </tbody> 524 + </table> 525 + </section> 526 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 527 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 528 + <table class="caption-top table"> 529 + <thead> 530 + <tr class="header"> 531 + <th>Name</th> 532 + <th>Type</th> 533 + <th>Description</th> 534 + </tr> 535 + </thead> 536 + <tbody> 537 + <tr class="odd"> 538 + <td></td> 539 + <td><a href="`typing.IO`">IO</a>[<a href="`bytes`">bytes</a>]</td> 540 + <td>File-like stream for reading the shard.</td> 541 + </tr> 542 + </tbody> 543 + </table> 544 + </section> 545 + <section id="raises" class="level4 doc-section doc-section-raises"> 546 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 547 + <table class="caption-top table"> 548 + <thead> 549 + <tr class="header"> 550 + <th>Name</th> 551 + <th>Type</th> 552 + <th>Description</th> 553 + </tr> 554 + </thead> 555 + <tbody> 556 + <tr class="odd"> 557 + <td></td> 558 + <td><a href="`KeyError`">KeyError</a></td> 559 + <td>If shard_id is not in shard_list.</td> 560 + </tr> 561 + 
</tbody> 562 + </table> 477 563 478 564 565 + </section> 479 566 </section> 480 567 </section> 481 568 </section>
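The `DataSource` protocol documented above can be illustrated with a small in-memory implementation. This is a sketch under stated assumptions rather than code from the library: `InMemorySource` is a hypothetical class, `shards` is assumed to be a plain method (the page lists the protocol's attributes separately, so it may be a property in the real code), and only the documented behaviours are modelled: `list_shards()` returns identifiers without opening streams, `open_shard()` returns a file-like stream and raises `KeyError` for unknown identifiers.

```python
import io
from typing import IO, Dict, Iterator, List, Tuple


class InMemorySource:
    """Hypothetical DataSource-like class backed by a dict of bytes."""

    def __init__(self, blobs: Dict[str, bytes]) -> None:
        # Copy so later changes to the caller's dict do not affect us.
        self._blobs = dict(blobs)

    def list_shards(self) -> List[str]:
        """Return shard identifiers without opening any stream."""
        return list(self._blobs)

    def open_shard(self, shard_id: str) -> IO[bytes]:
        """Open one shard by identifier; KeyError if it is unknown."""
        if shard_id not in self._blobs:
            raise KeyError(shard_id)
        return io.BytesIO(self._blobs[shard_id])

    def shards(self) -> Iterator[Tuple[str, IO[bytes]]]:
        """Yield (identifier, stream) pairs in list_shards() order."""
        for shard_id in self.list_shards():
            yield shard_id, self.open_shard(shard_id)


source = InMemorySource({"data-000.tar": b"abc", "data-001.tar": b"defg"})
shard_sizes = {name: len(stream.read()) for name, stream in source.shards()}
print(shard_sizes)
```

Keeping `list_shards()` free of I/O is what makes the per-worker splitting described for `open_shard` cheap: each worker can pick its identifiers first and open only those.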

+417 -55

docs/api/Dataset.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.Dataset" id="toc-atdata.Dataset" class="nav-link active" data-scroll-target="#atdata.Dataset">Dataset</a> 400 400 <ul class="collapse"> 401 + <li><a href="#parameters" id="toc-parameters" class="nav-link" data-scroll-target="#parameters">Parameters</a></li> 401 402 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 403 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 404 + <li><a href="#note" id="toc-note" class="nav-link" data-scroll-target="#note">Note</a></li> 402 405 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 403 406 <ul class="collapse"> 404 407 <li><a href="#atdata.Dataset.as_type" id="toc-atdata.Dataset.as_type" class="nav-link" data-scroll-target="#atdata.Dataset.as_type">as_type</a></li> ··· 426 429 <p>A typed dataset built on WebDataset with lens transformations.</p> 427 430 <p>This class wraps WebDataset tar archives and provides type-safe iteration over samples of a specific <code>PackableSample</code> type. 
Samples are stored as msgpack-serialized data within WebDataset shards.</p> 428 431 <p>The dataset supports: - Ordered and shuffled iteration - Automatic batching with <code>SampleBatch</code> - Type transformations via the lens system (<code>as_type()</code>) - Export to parquet format</p> 429 - <p>Type Parameters: ST: The sample type for this dataset, must derive from <code>PackableSample</code>.</p> 430 - <p>Attributes: url: WebDataset brace-notation URL for the tar file(s).</p> 431 - <p>Example: >>> ds = Dataset<a href=""path/to/data-{000000..000009}.tar"">MyData</a> >>> for sample in ds.ordered(batch_size=32): … # sample is SampleBatch[MyData] with batch_size samples … embeddings = sample.embeddings # shape: (32, …) … >>> # Transform to a different view >>> ds_view = ds.as_type(MyDataView)</p> 432 - <p>Note: This class uses Python’s <code>__orig_class__</code> mechanism to extract the type parameter at runtime. Instances must be created using the subscripted syntax <code>Dataset[MyType](url)</code> rather than calling the constructor directly with an unsubscripted class.</p> 433 - <section id="attributes" class="level2"> 434 - <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 432 + <section id="parameters" class="level2 doc-section doc-section-parameters"> 433 + <h2 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h2> 435 434 <table class="caption-top table"> 435 + <colgroup> 436 + <col style="width: 8%"> 437 + <col style="width: 8%"> 438 + <col style="width: 72%"> 439 + <col style="width: 12%"> 440 + </colgroup> 436 441 <thead> 437 442 <tr class="header"> 438 443 <th>Name</th> 444 + <th>Type</th> 439 445 <th>Description</th> 446 + <th>Default</th> 440 447 </tr> 441 448 </thead> 442 449 <tbody> 443 450 <tr class="odd"> 444 - <td><a href="#atdata.Dataset.batch_type">batch_type</a></td> 445 - <td>The type of batches produced by this dataset.</td> 451 + <td>ST</td> 452 + <td></td> 453 + <td>The sample type 
for this dataset, must derive from <code>PackableSample</code>.</td> 454 + <td><em>required</em></td> 446 455 </tr> 447 - <tr class="even"> 448 - <td><a href="#atdata.Dataset.metadata">metadata</a></td> 449 - <td>Fetch and cache metadata from metadata_url.</td> 456 + </tbody> 457 + </table> 458 + </section> 459 + <section id="attributes" class="level2 doc-section doc-section-attributes"> 460 + <h2 class="doc-section doc-section-attributes anchored" data-anchor-id="attributes">Attributes</h2> 461 + <table class="caption-top table"> 462 + <thead> 463 + <tr class="header"> 464 + <th>Name</th> 465 + <th>Type</th> 466 + <th>Description</th> 450 467 </tr> 468 + </thead> 469 + <tbody> 451 470 <tr class="odd"> 452 - <td><a href="#atdata.Dataset.metadata_url">metadata_url</a></td> 453 - <td>Optional URL to msgpack-encoded metadata for this dataset.</td> 454 - </tr> 455 - <tr class="even"> 456 - <td><a href="#atdata.Dataset.sample_type">sample_type</a></td> 457 - <td>The type of each returned sample from this dataset’s iterator.</td> 458 - </tr> 459 - <tr class="odd"> 460 - <td><a href="#atdata.Dataset.shard_list">shard_list</a></td> 461 - <td>List of individual dataset shards (deprecated, use list_shards()).</td> 462 - </tr> 463 - <tr class="even"> 464 - <td><a href="#atdata.Dataset.source">source</a></td> 465 - <td>The underlying data source for this dataset.</td> 471 + <td>url</td> 472 + <td></td> 473 + <td>WebDataset brace-notation URL for the tar file(s).</td> 466 474 </tr> 467 475 </tbody> 468 476 </table> 469 477 </section> 478 + <section id="example" class="level2 doc-section doc-section-example"> 479 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 480 + <p>::</p> 481 + <pre><code>>>> ds = Dataset[MyData]("path/to/data-{000000..000009}.tar") 482 + >>> for sample in ds.ordered(batch_size=32): 483 + ... # sample is SampleBatch[MyData] with batch_size samples 484 + ... embeddings = sample.embeddings # shape: (32, ...) 
485 + ... 486 + >>> # Transform to a different view 487 + >>> ds_view = ds.as_type(MyDataView)</code></pre> 488 + </section> 489 + <section id="note" class="level2 doc-section doc-section-note"> 490 + <h2 class="doc-section doc-section-note anchored" data-anchor-id="note">Note</h2> 491 + <p>This class uses Python’s <code>__orig_class__</code> mechanism to extract the type parameter at runtime. Instances must be created using the subscripted syntax <code>Dataset[MyType](url)</code> rather than calling the constructor directly with an unsubscripted class.</p> 492 + </section> 470 493 <section id="methods" class="level2"> 471 494 <h2 class="anchored" data-anchor-id="methods">Methods</h2> 472 495 <table class="caption-top table"> ··· 509 532 </table> 510 533 <section id="atdata.Dataset.as_type" class="level3"> 511 534 <h3 class="anchored" data-anchor-id="atdata.Dataset.as_type">as_type</h3> 512 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>Dataset.as_type(other)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 535 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>Dataset.as_type(other)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 513 536 <p>View this dataset through a different sample type using a registered lens.</p> 514 - <p>Args: other: The target sample type to transform into. 
Must be a type derived from <code>PackableSample</code>.</p> 515 - <p>Returns: A new <code>Dataset</code> instance that yields samples of type <code>other</code> by applying the appropriate lens transformation from the global <code>LensNetwork</code> registry.</p> 516 - <p>Raises: ValueError: If no registered lens exists between the current sample type and the target type.</p> 537 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 538 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 539 + <table class="caption-top table"> 540 + <thead> 541 + <tr class="header"> 542 + <th>Name</th> 543 + <th>Type</th> 544 + <th>Description</th> 545 + <th>Default</th> 546 + </tr> 547 + </thead> 548 + <tbody> 549 + <tr class="odd"> 550 + <td>other</td> 551 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata.dataset.RT`">RT</a>]</td> 552 + <td>The target sample type to transform into. Must be a type derived from <code>PackableSample</code>.</td> 553 + <td><em>required</em></td> 554 + </tr> 555 + </tbody> 556 + </table> 557 + </section> 558 + <section id="returns" class="level4 doc-section doc-section-returns"> 559 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 560 + <table class="caption-top table"> 561 + <thead> 562 + <tr class="header"> 563 + <th>Name</th> 564 + <th>Type</th> 565 + <th>Description</th> 566 + </tr> 567 + </thead> 568 + <tbody> 569 + <tr class="odd"> 570 + <td></td> 571 + <td><a href="`atdata.dataset.Dataset`">Dataset</a>[<a href="`atdata.dataset.RT`">RT</a>]</td> 572 + <td>A new <code>Dataset</code> instance that yields samples of type <code>other</code></td> 573 + </tr> 574 + <tr class="even"> 575 + <td></td> 576 + <td><a href="`atdata.dataset.Dataset`">Dataset</a>[<a href="`atdata.dataset.RT`">RT</a>]</td> 577 + <td>by applying the appropriate lens transformation from the global</td> 578 + </tr> 579 + <tr class="odd"> 580 + <td></td> 
581 + <td><a href="`atdata.dataset.Dataset`">Dataset</a>[<a href="`atdata.dataset.RT`">RT</a>]</td> 582 + <td><code>LensNetwork</code> registry.</td> 583 + </tr> 584 + </tbody> 585 + </table> 586 + </section> 587 + <section id="raises" class="level4 doc-section doc-section-raises"> 588 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 589 + <table class="caption-top table"> 590 + <thead> 591 + <tr class="header"> 592 + <th>Name</th> 593 + <th>Type</th> 594 + <th>Description</th> 595 + </tr> 596 + </thead> 597 + <tbody> 598 + <tr class="odd"> 599 + <td></td> 600 + <td><a href="`ValueError`">ValueError</a></td> 601 + <td>If no registered lens exists between the current sample type and the target type.</td> 602 + </tr> 603 + </tbody> 604 + </table> 605 + </section> 517 606 </section> 518 607 <section id="atdata.Dataset.list_shards" class="level3"> 519 608 <h3 class="anchored" data-anchor-id="atdata.Dataset.list_shards">list_shards</h3> 520 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>Dataset.list_shards()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 609 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>Dataset.list_shards()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 521 610 <p>Get list of individual dataset shards.</p> 522 - <p>Returns: A full (non-lazy) list of the individual <code>tar</code> files within the source WebDataset.</p> 611 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 612 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 613 + <table 
class="caption-top table"> 614 + <thead> 615 + <tr class="header"> 616 + <th>Name</th> 617 + <th>Type</th> 618 + <th>Description</th> 619 + </tr> 620 + </thead> 621 + <tbody> 622 + <tr class="odd"> 623 + <td></td> 624 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 625 + <td>A full (non-lazy) list of the individual <code>tar</code> files within the</td> 626 + </tr> 627 + <tr class="even"> 628 + <td></td> 629 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 630 + <td>source WebDataset.</td> 631 + </tr> 632 + </tbody> 633 + </table> 634 + </section> 523 635 </section> 524 636 <section id="atdata.Dataset.ordered" class="level3"> 525 637 <h3 class="anchored" data-anchor-id="atdata.Dataset.ordered">ordered</h3> 526 - <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>Dataset.ordered(batch_size<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 638 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>Dataset.ordered(batch_size<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 527 639 <p>Iterate over the dataset in order</p> 528 - <p>Args: batch_size (:obj:<code>int</code>, optional): The size of iterated batches. Default: None (unbatched). 
If <code>None</code>, iterates over one sample at a time with no batch dimension.</p> 529 - <p>Returns: :obj:<code>webdataset.DataPipeline</code> A data pipeline that iterates over the dataset in its original sample order</p> 640 + <section id="parameters-2" class="level4 doc-section doc-section-parameters"> 641 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h4> 642 + <table class="caption-top table"> 643 + <colgroup> 644 + <col style="width: 7%"> 645 + <col style="width: 4%"> 646 + <col style="width: 81%"> 647 + <col style="width: 6%"> 648 + </colgroup> 649 + <thead> 650 + <tr class="header"> 651 + <th>Name</th> 652 + <th>Type</th> 653 + <th>Description</th> 654 + <th>Default</th> 655 + </tr> 656 + </thead> 657 + <tbody> 658 + <tr class="odd"> 659 + <td>batch_size (</td> 660 + <td></td> 661 + <td>obj:<code>int</code>, optional): The size of iterated batches. Default: None (unbatched). If <code>None</code>, iterates over one sample at a time with no batch dimension.</td> 662 + <td><em>required</em></td> 663 + </tr> 664 + </tbody> 665 + </table> 666 + </section> 667 + <section id="returns-2" class="level4 doc-section doc-section-returns"> 668 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 669 + <table class="caption-top table"> 670 + <thead> 671 + <tr class="header"> 672 + <th>Name</th> 673 + <th>Type</th> 674 + <th>Description</th> 675 + </tr> 676 + </thead> 677 + <tbody> 678 + <tr class="odd"> 679 + <td></td> 680 + <td><a href="`typing.Iterable`">Iterable</a>[<a href="`atdata.dataset.ST`">ST</a>]</td> 681 + <td>obj:<code>webdataset.DataPipeline</code> A data pipeline that iterates over</td> 682 + </tr> 683 + <tr class="even"> 684 + <td></td> 685 + <td><a href="`typing.Iterable`">Iterable</a>[<a href="`atdata.dataset.ST`">ST</a>]</td> 686 + <td>the dataset in its original sample order</td> 687 + </tr> 688 + </tbody> 689 + </table> 690 + </section> 530 691 </section> 
531 692 <section id="atdata.Dataset.shuffled" class="level3"> 532 693 <h3 class="anchored" data-anchor-id="atdata.Dataset.shuffled">shuffled</h3> 533 - <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>Dataset.shuffled(buffer_shards<span class="op">=</span><span class="dv">100</span>, buffer_samples<span class="op">=</span><span class="dv">10000</span>, batch_size<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 694 + <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>Dataset.shuffled(buffer_shards<span class="op">=</span><span class="dv">100</span>, buffer_samples<span class="op">=</span><span class="dv">10000</span>, batch_size<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 534 695 <p>Iterate over the dataset in random order.</p> 535 - <p>Args: buffer_shards: Number of shards to buffer for shuffling at the shard level. Larger values increase randomness but use more memory. Default: 100. buffer_samples: Number of samples to buffer for shuffling within shards. Larger values increase randomness but use more memory. Default: 10,000. batch_size: The size of iterated batches. Default: None (unbatched). If <code>None</code>, iterates over one sample at a time with no batch dimension.</p> 536 - <p>Returns: A WebDataset data pipeline that iterates over the dataset in randomized order. 
If <code>batch_size</code> is not <code>None</code>, yields <code>SampleBatch[ST]</code> instances; otherwise yields individual <code>ST</code> samples.</p> 696 + <section id="parameters-3" class="level4 doc-section doc-section-parameters"> 697 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-3">Parameters</h4> 698 + <table class="caption-top table"> 699 + <thead> 700 + <tr class="header"> 701 + <th>Name</th> 702 + <th>Type</th> 703 + <th>Description</th> 704 + <th>Default</th> 705 + </tr> 706 + </thead> 707 + <tbody> 708 + <tr class="odd"> 709 + <td>buffer_shards</td> 710 + <td><a href="`int`">int</a></td> 711 + <td>Number of shards to buffer for shuffling at the shard level. Larger values increase randomness but use more memory. Default: 100.</td> 712 + <td><code>100</code></td> 713 + </tr> 714 + <tr class="even"> 715 + <td>buffer_samples</td> 716 + <td><a href="`int`">int</a></td> 717 + <td>Number of samples to buffer for shuffling within shards. Larger values increase randomness but use more memory. Default: 10,000.</td> 718 + <td><code>10000</code></td> 719 + </tr> 720 + <tr class="odd"> 721 + <td>batch_size</td> 722 + <td><a href="`int`">int</a> | None</td> 723 + <td>The size of iterated batches. Default: None (unbatched). 
If <code>None</code>, iterates over one sample at a time with no batch dimension.</td> 724 + <td><code>None</code></td> 725 + </tr> 726 + </tbody> 727 + </table> 728 + </section> 729 + <section id="returns-3" class="level4 doc-section doc-section-returns"> 730 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-3">Returns</h4> 731 + <table class="caption-top table"> 732 + <thead> 733 + <tr class="header"> 734 + <th>Name</th> 735 + <th>Type</th> 736 + <th>Description</th> 737 + </tr> 738 + </thead> 739 + <tbody> 740 + <tr class="odd"> 741 + <td></td> 742 + <td><a href="`typing.Iterable`">Iterable</a>[<a href="`atdata.dataset.ST`">ST</a>]</td> 743 + <td>A WebDataset data pipeline that iterates over the dataset in</td> 744 + </tr> 745 + <tr class="even"> 746 + <td></td> 747 + <td><a href="`typing.Iterable`">Iterable</a>[<a href="`atdata.dataset.ST`">ST</a>]</td> 748 + <td>randomized order. If <code>batch_size</code> is not <code>None</code>, yields</td> 749 + </tr> 750 + <tr class="odd"> 751 + <td></td> 752 + <td><a href="`typing.Iterable`">Iterable</a>[<a href="`atdata.dataset.ST`">ST</a>]</td> 753 + <td><code>SampleBatch[ST]</code> instances; otherwise yields individual <code>ST</code></td> 754 + </tr> 755 + <tr class="even"> 756 + <td></td> 757 + <td><a href="`typing.Iterable`">Iterable</a>[<a href="`atdata.dataset.ST`">ST</a>]</td> 758 + <td>samples.</td> 759 + </tr> 760 + </tbody> 761 + </table> 762 + </section> 537 763 </section> 538 764 <section id="atdata.Dataset.to_parquet" class="level3"> 539 765 <h3 class="anchored" data-anchor-id="atdata.Dataset.to_parquet">to_parquet</h3> 540 - <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>Dataset.to_parquet(path, sample_map<span class="op">=</span><span class="va">None</span>, maxcount<span class="op">=</span><span class="va">None</span>, <span 
class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 766 + <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>Dataset.to_parquet(path, sample_map<span class="op">=</span><span class="va">None</span>, maxcount<span class="op">=</span><span class="va">None</span>, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 541 767 <p>Export dataset contents to parquet format.</p> 542 768 <p>Converts all samples to a pandas DataFrame and saves to parquet file(s). Useful for interoperability with data analysis tools.</p> 543 - <p>Args: path: Output path for the parquet file. If <code>maxcount</code> is specified, files are named <code>{stem}-{segment:06d}.parquet</code>. sample_map: Optional function to convert samples to dictionaries. Defaults to <code>dataclasses.asdict</code>. maxcount: If specified, split output into multiple files with at most this many samples each. Recommended for large datasets. **kwargs: Additional arguments passed to <code>pandas.DataFrame.to_parquet()</code>. Common options include <code>compression</code>, <code>index</code>, <code>engine</code>.</p> 544 - <p>Warning: <strong>Memory Usage</strong>: When <code>maxcount=None</code> (default), this method loads the <strong>entire dataset into memory</strong> as a pandas DataFrame before writing. 
For large datasets, this can cause memory exhaustion.</p> 545 - <pre><code>For datasets larger than available RAM, always specify ``maxcount``:: 546 - 547 - # Safe for large datasets - processes in chunks 548 - ds.to_parquet("output.parquet", maxcount=10000) 549 - 550 - This creates multiple parquet files: ``output-000000.parquet``, 551 - ``output-000001.parquet``, etc.</code></pre> 552 - <p>Example: >>> ds = Dataset<a href=""data.tar"">MySample</a> >>> # Small dataset - load all at once >>> ds.to_parquet(“output.parquet”) >>> >>> # Large dataset - process in chunks >>> ds.to_parquet(“output.parquet”, maxcount=50000)</p> 769 + <section id="parameters-4" class="level4 doc-section doc-section-parameters"> 770 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-4">Parameters</h4> 771 + <table class="caption-top table"> 772 + <thead> 773 + <tr class="header"> 774 + <th>Name</th> 775 + <th>Type</th> 776 + <th>Description</th> 777 + <th>Default</th> 778 + </tr> 779 + </thead> 780 + <tbody> 781 + <tr class="odd"> 782 + <td>path</td> 783 + <td><a href="`atdata.dataset.Pathlike`">Pathlike</a></td> 784 + <td>Output path for the parquet file. If <code>maxcount</code> is specified, files are named <code>{stem}-{segment:06d}.parquet</code>.</td> 785 + <td><em>required</em></td> 786 + </tr> 787 + <tr class="even"> 788 + <td>sample_map</td> 789 + <td><a href="`typing.Optional`">Optional</a>[<a href="`atdata.dataset.SampleExportMap`">SampleExportMap</a>]</td> 790 + <td>Optional function to convert samples to dictionaries. Defaults to <code>dataclasses.asdict</code>.</td> 791 + <td><code>None</code></td> 792 + </tr> 793 + <tr class="odd"> 794 + <td>maxcount</td> 795 + <td><a href="`typing.Optional`">Optional</a>[<a href="`int`">int</a>]</td> 796 + <td>If specified, split output into multiple files with at most this many samples each. 
Recommended for large datasets.</td> 797 + <td><code>None</code></td> 798 + </tr> 799 + <tr class="even"> 800 + <td>**kwargs</td> 801 + <td></td> 802 + <td>Additional arguments passed to <code>pandas.DataFrame.to_parquet()</code>. Common options include <code>compression</code>, <code>index</code>, <code>engine</code>.</td> 803 + <td><code>{}</code></td> 804 + </tr> 805 + </tbody> 806 + </table> 807 + </section> 808 + <section id="warning" class="level4 doc-section doc-section-warning"> 809 + <h4 class="doc-section doc-section-warning anchored" data-anchor-id="warning">Warning</h4> 810 + <p><strong>Memory Usage</strong>: When <code>maxcount=None</code> (default), this method loads the <strong>entire dataset into memory</strong> as a pandas DataFrame before writing. For large datasets, this can cause memory exhaustion.</p> 811 + <p>For datasets larger than available RAM, always specify <code>maxcount</code>::</p> 812 + <pre><code># Safe for large datasets - processes in chunks 813 + ds.to_parquet("output.parquet", maxcount=10000)</code></pre> 814 + <p>This creates multiple parquet files: <code>output-000000.parquet</code>, <code>output-000001.parquet</code>, etc.</p> 815 + </section> 816 + <section id="example-1" class="level4 doc-section doc-section-example"> 817 + <h4 class="doc-section doc-section-example anchored" data-anchor-id="example-1">Example</h4> 818 + <p>::</p> 819 + <pre><code>>>> ds = Dataset[MySample]("data.tar") 820 + >>> # Small dataset - load all at once 821 + >>> ds.to_parquet("output.parquet") 822 + >>> 823 + >>> # Large dataset - process in chunks 824 + >>> ds.to_parquet("output.parquet", maxcount=50000)</code></pre> 825 + </section> 553 826 </section> 554 827 <section id="atdata.Dataset.wrap" class="level3"> 555 828 <h3 class="anchored" data-anchor-id="atdata.Dataset.wrap">wrap</h3> 556 - <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" 
aria-hidden="true" tabindex="-1"></a>Dataset.wrap(sample)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 829 + <div class="sourceCode" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>Dataset.wrap(sample)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 557 830 <p>Wrap a raw msgpack sample into the appropriate dataset-specific type.</p> 558 - <p>Args: sample: A dictionary containing at minimum a <code>'msgpack'</code> key with serialized sample bytes.</p> 559 - <p>Returns: A deserialized sample of type <code>ST</code>, optionally transformed through a lens if <code>as_type()</code> was called.</p> 831 + <section id="parameters-5" class="level4 doc-section doc-section-parameters"> 832 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-5">Parameters</h4> 833 + <table class="caption-top table"> 834 + <thead> 835 + <tr class="header"> 836 + <th>Name</th> 837 + <th>Type</th> 838 + <th>Description</th> 839 + <th>Default</th> 840 + </tr> 841 + </thead> 842 + <tbody> 843 + <tr class="odd"> 844 + <td>sample</td> 845 + <td><a href="`atdata.dataset.WDSRawSample`">WDSRawSample</a></td> 846 + <td>A dictionary containing at minimum a <code>'msgpack'</code> key with serialized sample bytes.</td> 847 + <td><em>required</em></td> 848 + </tr> 849 + </tbody> 850 + </table> 851 + </section> 852 + <section id="returns-4" class="level4 doc-section doc-section-returns"> 853 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-4">Returns</h4> 854 + <table class="caption-top table"> 855 + <thead> 856 + <tr class="header"> 857 + <th>Name</th> 858 + <th>Type</th> 859 + <th>Description</th> 860 + </tr> 861 + </thead> 862 + <tbody> 863 + <tr class="odd"> 864 + <td></td> 865 + <td><a 
href="`atdata.dataset.ST`">ST</a></td> 866 + <td>A deserialized sample of type <code>ST</code>, optionally transformed through</td> 867 + </tr> 868 + <tr class="even"> 869 + <td></td> 870 + <td><a href="`atdata.dataset.ST`">ST</a></td> 871 + <td>a lens if <code>as_type()</code> was called.</td> 872 + </tr> 873 + </tbody> 874 + </table> 875 + </section> 560 876 </section> 561 877 <section id="atdata.Dataset.wrap_batch" class="level3"> 562 878 <h3 class="anchored" data-anchor-id="atdata.Dataset.wrap_batch">wrap_batch</h3> 563 - <div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>Dataset.wrap_batch(batch)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 879 + <div class="sourceCode" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>Dataset.wrap_batch(batch)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 564 880 <p>Wrap a batch of raw msgpack samples into a typed SampleBatch.</p> 565 - <p>Args: batch: A dictionary containing a <code>'msgpack'</code> key with a list of serialized sample bytes.</p> 566 - <p>Returns: A <code>SampleBatch[ST]</code> containing deserialized samples, optionally transformed through a lens if <code>as_type()</code> was called.</p> 567 - <p>Note: This implementation deserializes samples one at a time, then aggregates them into a batch.</p> 881 + <section id="parameters-6" class="level4 doc-section doc-section-parameters"> 882 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-6">Parameters</h4> 883 + <table class="caption-top table"> 884 + <thead> 885 + <tr class="header"> 886 + <th>Name</th> 887 + <th>Type</th> 888 + <th>Description</th> 889 + 
<th>Default</th> 890 + </tr> 891 + </thead> 892 + <tbody> 893 + <tr class="odd"> 894 + <td>batch</td> 895 + <td><a href="`atdata.dataset.WDSRawBatch`">WDSRawBatch</a></td> 896 + <td>A dictionary containing a <code>'msgpack'</code> key with a list of serialized sample bytes.</td> 897 + <td><em>required</em></td> 898 + </tr> 899 + </tbody> 900 + </table> 901 + </section> 902 + <section id="returns-5" class="level4 doc-section doc-section-returns"> 903 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-5">Returns</h4> 904 + <table class="caption-top table"> 905 + <thead> 906 + <tr class="header"> 907 + <th>Name</th> 908 + <th>Type</th> 909 + <th>Description</th> 910 + </tr> 911 + </thead> 912 + <tbody> 913 + <tr class="odd"> 914 + <td></td> 915 + <td><a href="`atdata.dataset.SampleBatch`">SampleBatch</a>[<a href="`atdata.dataset.ST`">ST</a>]</td> 916 + <td>A <code>SampleBatch[ST]</code> containing deserialized samples, optionally</td> 917 + </tr> 918 + <tr class="even"> 919 + <td></td> 920 + <td><a href="`atdata.dataset.SampleBatch`">SampleBatch</a>[<a href="`atdata.dataset.ST`">ST</a>]</td> 921 + <td>transformed through a lens if <code>as_type()</code> was called.</td> 922 + </tr> 923 + </tbody> 924 + </table> 925 + </section> 926 + <section id="note-1" class="level4 doc-section doc-section-note"> 927 + <h4 class="doc-section doc-section-note anchored" data-anchor-id="note-1">Note</h4> 928 + <p>This implementation deserializes samples one at a time, then aggregates them into a batch.</p> 568 929 569 930 931 + </section> 570 932 </section> 571 933 </section> 572 934 </section>
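
Reviewer note on the `Dataset.ordered` rows in this hunk: the Parameters table renders `batch_size (` in the Name column with `obj:int, optional): ...` pushed into the Description, and several Returns tables split a single sentence across multiple rows. Both look like docstring-parsing artifacts rather than intended output. Below is a minimal sketch of a docstring shape that would likely avoid them. It assumes the Google-style parser behind quartodoc takes the type from the signature annotation and treats further-indented lines as continuations of the same entry. The function body is a hypothetical stand-in, not the project's implementation.

```python
from typing import Iterator, Optional


def ordered(batch_size: Optional[int] = None) -> Iterator[int]:
    """Iterate over the dataset in order.

    Args:
        batch_size: The size of iterated batches. If ``None``, iterates
            over one sample at a time with no batch dimension.

    Returns:
        A data pipeline that iterates over the dataset in its original
            sample order.
    """
    # Hypothetical stand-in body: yields nothing; only the docstring matters here.
    return iter(())
```

The parameter name is followed directly by a colon, with no Sphinx-style `(:obj:...)` annotation, and the second line of the Returns description is indented one level deeper than the first. Whether this fully resolves the row splitting depends on the parser version in use, so it is worth rebuilding the docs to confirm.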

+40 -2

docs/api/DatasetDict.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.DatasetDict" id="toc-atdata.DatasetDict" class="nav-link active" data-scroll-target="#atdata.DatasetDict">DatasetDict</a> 400 400 <ul class="collapse"> 401 + <li><a href="#parameters" id="toc-parameters" class="nav-link" data-scroll-target="#parameters">Parameters</a></li> 402 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 403 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 404 </ul></li> 403 405 </ul> ··· 415 417 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>DatasetDict(splits<span class="op">=</span><span class="va">None</span>, sample_type<span class="op">=</span><span class="va">None</span>, streaming<span class="op">=</span><span class="va">False</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 416 418 <p>A dictionary of split names to Dataset instances.</p> 417 419 <p>Similar to HuggingFace’s DatasetDict, this provides a container for multiple dataset splits (train, test, validation, etc.) 
with convenience methods that operate across all splits.</p> 418 - <p>Type Parameters: ST: The sample type for all datasets in this dict.</p> 419 - <p>Example: >>> ds_dict = load_dataset(“path/to/data”, MyData) >>> train = ds_dict[“train”] >>> test = ds_dict[“test”] >>> >>> # Iterate over all splits >>> for split_name, dataset in ds_dict.items(): … print(f”{split_name}: {len(dataset.shard_list)} shards”)</p> 420 + <section id="parameters" class="level2 doc-section doc-section-parameters"> 421 + <h2 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h2> 422 + <table class="caption-top table"> 423 + <colgroup> 424 + <col style="width: 10%"> 425 + <col style="width: 10%"> 426 + <col style="width: 63%"> 427 + <col style="width: 15%"> 428 + </colgroup> 429 + <thead> 430 + <tr class="header"> 431 + <th>Name</th> 432 + <th>Type</th> 433 + <th>Description</th> 434 + <th>Default</th> 435 + </tr> 436 + </thead> 437 + <tbody> 438 + <tr class="odd"> 439 + <td>ST</td> 440 + <td></td> 441 + <td>The sample type for all datasets in this dict.</td> 442 + <td><em>required</em></td> 443 + </tr> 444 + </tbody> 445 + </table> 446 + </section> 447 + <section id="example" class="level2 doc-section doc-section-example"> 448 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 449 + <p>::</p> 450 + <pre><code>>>> ds_dict = load_dataset("path/to/data", MyData) 451 + >>> train = ds_dict["train"] 452 + >>> test = ds_dict["test"] 453 + >>> 454 + >>> # Iterate over all splits 455 + >>> for split_name, dataset in ds_dict.items(): 456 + ... print(f"{split_name}: {len(dataset.shard_list)} shards")</code></pre> 457 + </section> 420 458 <section id="attributes" class="level2"> 421 459 <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 422 460 <table class="caption-top table">
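
Reviewer note on the `DatasetDict` example in this hunk: it reads `dataset.shard_list`, while the attribute summary removed in the `Dataset.html` hunk describes `shard_list` as deprecated in favor of `list_shards()`. A minimal sketch of the same loop using `list_shards()` follows. `StubDataset` and the shard file names are hypothetical stand-ins so the snippet runs on its own; they are not part of atdata.

```python
class StubDataset:
    """Hypothetical stand-in that exposes only list_shards()."""

    def __init__(self, shards: list[str]) -> None:
        self._shards = list(shards)

    def list_shards(self) -> list[str]:
        # Return a full (non-lazy) list of shard names, as the docs describe.
        return list(self._shards)


# A plain dict stands in for DatasetDict; only .items() is needed here.
ds_dict = {
    "train": StubDataset(["train-000000.tar", "train-000001.tar"]),
    "test": StubDataset(["test-000000.tar"]),
}

# Count shards per split without touching the deprecated attribute.
shard_counts = {name: len(ds.list_shards()) for name, ds in ds_dict.items()}
for split_name, count in shard_counts.items():
    print(f"{split_name}: {count} shards")
```

If the docstring example is updated along these lines, the rendered page would stop advertising the deprecated attribute to new users.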

+477 -32

docs/api/DatasetLoader.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.atmosphere.DatasetLoader" id="toc-atdata.atmosphere.DatasetLoader" class="nav-link active" data-scroll-target="#atdata.atmosphere.DatasetLoader">DatasetLoader</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 402 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 402 403 <ul class="collapse"> 403 404 <li><a href="#atdata.atmosphere.DatasetLoader.get" id="toc-atdata.atmosphere.DatasetLoader.get" class="nav-link" data-scroll-target="#atdata.atmosphere.DatasetLoader.get">get</a></li> ··· 425 426 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader(client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 426 427 <p>Loads dataset records from ATProto.</p> 427 428 <p>This class fetches dataset index records and can create Dataset objects from them. Note that loading a dataset requires having the corresponding Python class for the sample type.</p> 428 - <p>Example: >>> client = AtmosphereClient() >>> loader = DatasetLoader(client) >>> >>> # List available datasets >>> datasets = loader.list() >>> for ds in datasets: … print(ds[“name”], ds[“schemaRef”]) >>> >>> # Get a specific dataset record >>> record = loader.get(“at://did:plc:abc/ac.foundation.dataset.record/xyz”)</p> 429 + <section id="example" class="level2 doc-section doc-section-example"> 430 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 431 + <p>::</p> 432 + <pre><code>>>> client = AtmosphereClient() 433 + >>> loader = DatasetLoader(client) 434 + >>> 435 + >>> # List available datasets 436 + >>> datasets = loader.list() 437 + >>> for ds in datasets: 438 + ... 
print(ds["name"], ds["schemaRef"]) 439 + >>> 440 + >>> # Get a specific dataset record 441 + >>> record = loader.get("at://did:plc:abc/ac.foundation.dataset.record/xyz")</code></pre> 442 + </section> 429 443 <section id="methods" class="level2"> 430 444 <h2 class="anchored" data-anchor-id="methods">Methods</h2> 431 445 <table class="caption-top table"> ··· 472 486 </table> 473 487 <section id="atdata.atmosphere.DatasetLoader.get" class="level3"> 474 488 <h3 class="anchored" data-anchor-id="atdata.atmosphere.DatasetLoader.get">get</h3> 475 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.get(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 489 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.get(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 476 490 <p>Fetch a dataset record by AT URI.</p> 477 - <p>Args: uri: The AT URI of the dataset record.</p> 478 - <p>Returns: The dataset record as a dictionary.</p> 479 - <p>Raises: ValueError: If the record is not a dataset record.</p> 491 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 492 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 493 + <table class="caption-top table"> 494 + <thead> 495 + <tr class="header"> 496 + <th>Name</th> 497 + <th>Type</th> 498 + <th>Description</th> 499 + <th>Default</th> 500 + </tr> 501 + </thead> 502 + <tbody> 503 + <tr class="odd"> 504 + <td>uri</td> 505 + <td><a href="`str`">str</a> | <a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 506 + <td>The AT 
URI of the dataset record.</td> 507 + <td><em>required</em></td> 508 + </tr> 509 + </tbody> 510 + </table> 511 + </section> 512 + <section id="returns" class="level4 doc-section doc-section-returns"> 513 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 514 + <table class="caption-top table"> 515 + <thead> 516 + <tr class="header"> 517 + <th>Name</th> 518 + <th>Type</th> 519 + <th>Description</th> 520 + </tr> 521 + </thead> 522 + <tbody> 523 + <tr class="odd"> 524 + <td></td> 525 + <td><a href="`dict`">dict</a></td> 526 + <td>The dataset record as a dictionary.</td> 527 + </tr> 528 + </tbody> 529 + </table> 530 + </section> 531 + <section id="raises" class="level4 doc-section doc-section-raises"> 532 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 533 + <table class="caption-top table"> 534 + <thead> 535 + <tr class="header"> 536 + <th>Name</th> 537 + <th>Type</th> 538 + <th>Description</th> 539 + </tr> 540 + </thead> 541 + <tbody> 542 + <tr class="odd"> 543 + <td></td> 544 + <td><a href="`ValueError`">ValueError</a></td> 545 + <td>If the record is not a dataset record.</td> 546 + </tr> 547 + </tbody> 548 + </table> 549 + </section> 480 550 </section> 481 551 <section id="atdata.atmosphere.DatasetLoader.get_blob_urls" class="level3"> 482 552 <h3 class="anchored" data-anchor-id="atdata.atmosphere.DatasetLoader.get_blob_urls">get_blob_urls</h3> 483 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.get_blob_urls(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 553 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" 
tabindex="-1"></a>atmosphere.DatasetLoader.get_blob_urls(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 484 554 <p>Get fetchable URLs for blob-stored dataset shards.</p> 485 555 <p>This resolves the PDS endpoint and constructs URLs that can be used to fetch the blob data directly.</p> 486 - <p>Args: uri: The AT URI of the dataset record.</p> 487 - <p>Returns: List of URLs for fetching the blob data.</p> 488 - <p>Raises: ValueError: If storage type is not blobs or PDS cannot be resolved.</p> 556 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 557 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 558 + <table class="caption-top table"> 559 + <thead> 560 + <tr class="header"> 561 + <th>Name</th> 562 + <th>Type</th> 563 + <th>Description</th> 564 + <th>Default</th> 565 + </tr> 566 + </thead> 567 + <tbody> 568 + <tr class="odd"> 569 + <td>uri</td> 570 + <td><a href="`str`">str</a> | <a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 571 + <td>The AT URI of the dataset record.</td> 572 + <td><em>required</em></td> 573 + </tr> 574 + </tbody> 575 + </table> 576 + </section> 577 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 578 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 579 + <table class="caption-top table"> 580 + <thead> 581 + <tr class="header"> 582 + <th>Name</th> 583 + <th>Type</th> 584 + <th>Description</th> 585 + </tr> 586 + </thead> 587 + <tbody> 588 + <tr class="odd"> 589 + <td></td> 590 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 591 + <td>List of URLs for fetching the blob data.</td> 592 + </tr> 593 + </tbody> 594 + </table> 595 + </section> 596 + <section id="raises-1" class="level4 doc-section doc-section-raises"> 597 + <h4 class="doc-section doc-section-raises anchored" 
data-anchor-id="raises-1">Raises</h4> 598 + <table class="caption-top table"> 599 + <thead> 600 + <tr class="header"> 601 + <th>Name</th> 602 + <th>Type</th> 603 + <th>Description</th> 604 + </tr> 605 + </thead> 606 + <tbody> 607 + <tr class="odd"> 608 + <td></td> 609 + <td><a href="`ValueError`">ValueError</a></td> 610 + <td>If storage type is not blobs or PDS cannot be resolved.</td> 611 + </tr> 612 + </tbody> 613 + </table> 614 + </section> 489 615 </section> 490 616 <section id="atdata.atmosphere.DatasetLoader.get_blobs" class="level3"> 491 617 <h3 class="anchored" data-anchor-id="atdata.atmosphere.DatasetLoader.get_blobs">get_blobs</h3> 492 - <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.get_blobs(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 618 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.get_blobs(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 493 619 <p>Get the blob references from a dataset record.</p> 494 - <p>Args: uri: The AT URI of the dataset record.</p> 495 - <p>Returns: List of blob reference dicts with keys: $type, ref, mimeType, size.</p> 496 - <p>Raises: ValueError: If the storage type is not blobs.</p> 620 + <section id="parameters-2" class="level4 doc-section doc-section-parameters"> 621 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h4> 622 + <table class="caption-top table"> 623 + <thead> 624 + <tr class="header"> 625 + <th>Name</th> 626 + <th>Type</th> 627 + <th>Description</th> 628 + <th>Default</th> 629 + </tr> 630 + </thead> 631 
+ <tbody> 632 + <tr class="odd"> 633 + <td>uri</td> 634 + <td><a href="`str`">str</a> | <a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 635 + <td>The AT URI of the dataset record.</td> 636 + <td><em>required</em></td> 637 + </tr> 638 + </tbody> 639 + </table> 640 + </section> 641 + <section id="returns-2" class="level4 doc-section doc-section-returns"> 642 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 643 + <table class="caption-top table"> 644 + <thead> 645 + <tr class="header"> 646 + <th>Name</th> 647 + <th>Type</th> 648 + <th>Description</th> 649 + </tr> 650 + </thead> 651 + <tbody> 652 + <tr class="odd"> 653 + <td></td> 654 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 655 + <td>List of blob reference dicts with keys: $type, ref, mimeType, size.</td> 656 + </tr> 657 + </tbody> 658 + </table> 659 + </section> 660 + <section id="raises-2" class="level4 doc-section doc-section-raises"> 661 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-2">Raises</h4> 662 + <table class="caption-top table"> 663 + <thead> 664 + <tr class="header"> 665 + <th>Name</th> 666 + <th>Type</th> 667 + <th>Description</th> 668 + </tr> 669 + </thead> 670 + <tbody> 671 + <tr class="odd"> 672 + <td></td> 673 + <td><a href="`ValueError`">ValueError</a></td> 674 + <td>If the storage type is not blobs.</td> 675 + </tr> 676 + </tbody> 677 + </table> 678 + </section> 497 679 </section> 498 680 <section id="atdata.atmosphere.DatasetLoader.get_metadata" class="level3"> 499 681 <h3 class="anchored" data-anchor-id="atdata.atmosphere.DatasetLoader.get_metadata">get_metadata</h3> 500 - <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.get_metadata(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i 
class="bi"></i></button></pre></div> 682 + <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.get_metadata(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 501 683 <p>Get the metadata from a dataset record.</p> 502 - <p>Args: uri: The AT URI of the dataset record.</p> 503 - <p>Returns: The metadata dictionary, or None if no metadata.</p> 684 + <section id="parameters-3" class="level4 doc-section doc-section-parameters"> 685 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-3">Parameters</h4> 686 + <table class="caption-top table"> 687 + <thead> 688 + <tr class="header"> 689 + <th>Name</th> 690 + <th>Type</th> 691 + <th>Description</th> 692 + <th>Default</th> 693 + </tr> 694 + </thead> 695 + <tbody> 696 + <tr class="odd"> 697 + <td>uri</td> 698 + <td><a href="`str`">str</a> | <a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 699 + <td>The AT URI of the dataset record.</td> 700 + <td><em>required</em></td> 701 + </tr> 702 + </tbody> 703 + </table> 704 + </section> 705 + <section id="returns-3" class="level4 doc-section doc-section-returns"> 706 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-3">Returns</h4> 707 + <table class="caption-top table"> 708 + <thead> 709 + <tr class="header"> 710 + <th>Name</th> 711 + <th>Type</th> 712 + <th>Description</th> 713 + </tr> 714 + </thead> 715 + <tbody> 716 + <tr class="odd"> 717 + <td></td> 718 + <td><a href="`typing.Optional`">Optional</a>[<a href="`dict`">dict</a>]</td> 719 + <td>The metadata dictionary, or None if no metadata.</td> 720 + </tr> 721 + </tbody> 722 + </table> 723 + </section> 504 724 </section> 505 725 <section id="atdata.atmosphere.DatasetLoader.get_storage_type" class="level3"> 506 726 <h3 class="anchored" 
data-anchor-id="atdata.atmosphere.DatasetLoader.get_storage_type">get_storage_type</h3> 507 - <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.get_storage_type(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 727 + <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.get_storage_type(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 508 728 <p>Get the storage type of a dataset record.</p> 509 - <p>Args: uri: The AT URI of the dataset record.</p> 510 - <p>Returns: Either “external” or “blobs”.</p> 511 - <p>Raises: ValueError: If storage type is unknown.</p> 729 + <section id="parameters-4" class="level4 doc-section doc-section-parameters"> 730 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-4">Parameters</h4> 731 + <table class="caption-top table"> 732 + <thead> 733 + <tr class="header"> 734 + <th>Name</th> 735 + <th>Type</th> 736 + <th>Description</th> 737 + <th>Default</th> 738 + </tr> 739 + </thead> 740 + <tbody> 741 + <tr class="odd"> 742 + <td>uri</td> 743 + <td><a href="`str`">str</a> | <a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 744 + <td>The AT URI of the dataset record.</td> 745 + <td><em>required</em></td> 746 + </tr> 747 + </tbody> 748 + </table> 749 + </section> 750 + <section id="returns-4" class="level4 doc-section doc-section-returns"> 751 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-4">Returns</h4> 752 + <table class="caption-top table"> 753 + <thead> 754 + <tr class="header"> 755 + <th>Name</th> 756 + <th>Type</th> 757 + 
<th>Description</th> 758 + </tr> 759 + </thead> 760 + <tbody> 761 + <tr class="odd"> 762 + <td></td> 763 + <td><a href="`str`">str</a></td> 764 + <td>Either “external” or “blobs”.</td> 765 + </tr> 766 + </tbody> 767 + </table> 768 + </section> 769 + <section id="raises-3" class="level4 doc-section doc-section-raises"> 770 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-3">Raises</h4> 771 + <table class="caption-top table"> 772 + <thead> 773 + <tr class="header"> 774 + <th>Name</th> 775 + <th>Type</th> 776 + <th>Description</th> 777 + </tr> 778 + </thead> 779 + <tbody> 780 + <tr class="odd"> 781 + <td></td> 782 + <td><a href="`ValueError`">ValueError</a></td> 783 + <td>If storage type is unknown.</td> 784 + </tr> 785 + </tbody> 786 + </table> 787 + </section> 512 788 </section> 513 789 <section id="atdata.atmosphere.DatasetLoader.get_urls" class="level3"> 514 790 <h3 class="anchored" data-anchor-id="atdata.atmosphere.DatasetLoader.get_urls">get_urls</h3> 515 - <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.get_urls(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 791 + <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.get_urls(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 516 792 <p>Get the WebDataset URLs from a dataset record.</p> 517 - <p>Args: uri: The AT URI of the dataset record.</p> 518 - <p>Returns: List of WebDataset URLs.</p> 519 - <p>Raises: ValueError: If the storage type is not external URLs.</p> 793 + <section id="parameters-5" class="level4 doc-section 
doc-section-parameters"> 794 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-5">Parameters</h4> 795 + <table class="caption-top table"> 796 + <thead> 797 + <tr class="header"> 798 + <th>Name</th> 799 + <th>Type</th> 800 + <th>Description</th> 801 + <th>Default</th> 802 + </tr> 803 + </thead> 804 + <tbody> 805 + <tr class="odd"> 806 + <td>uri</td> 807 + <td><a href="`str`">str</a> | <a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 808 + <td>The AT URI of the dataset record.</td> 809 + <td><em>required</em></td> 810 + </tr> 811 + </tbody> 812 + </table> 813 + </section> 814 + <section id="returns-5" class="level4 doc-section doc-section-returns"> 815 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-5">Returns</h4> 816 + <table class="caption-top table"> 817 + <thead> 818 + <tr class="header"> 819 + <th>Name</th> 820 + <th>Type</th> 821 + <th>Description</th> 822 + </tr> 823 + </thead> 824 + <tbody> 825 + <tr class="odd"> 826 + <td></td> 827 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 828 + <td>List of WebDataset URLs.</td> 829 + </tr> 830 + </tbody> 831 + </table> 832 + </section> 833 + <section id="raises-4" class="level4 doc-section doc-section-raises"> 834 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-4">Raises</h4> 835 + <table class="caption-top table"> 836 + <thead> 837 + <tr class="header"> 838 + <th>Name</th> 839 + <th>Type</th> 840 + <th>Description</th> 841 + </tr> 842 + </thead> 843 + <tbody> 844 + <tr class="odd"> 845 + <td></td> 846 + <td><a href="`ValueError`">ValueError</a></td> 847 + <td>If the storage type is not external URLs.</td> 848 + </tr> 849 + </tbody> 850 + </table> 851 + </section> 520 852 </section> 521 853 <section id="atdata.atmosphere.DatasetLoader.list_all" class="level3"> 522 854 <h3 class="anchored" data-anchor-id="atdata.atmosphere.DatasetLoader.list_all">list_all</h3> 523 - <div class="sourceCode" 
id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.list_all(repo<span class="op">=</span><span class="va">None</span>, limit<span class="op">=</span><span class="dv">100</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 855 + <div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.list_all(repo<span class="op">=</span><span class="va">None</span>, limit<span class="op">=</span><span class="dv">100</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 524 856 <p>List dataset records from a repository.</p> 525 - <p>Args: repo: The DID of the repository. Defaults to authenticated user. limit: Maximum number of records to return.</p> 526 - <p>Returns: List of dataset records.</p> 857 + <section id="parameters-6" class="level4 doc-section doc-section-parameters"> 858 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-6">Parameters</h4> 859 + <table class="caption-top table"> 860 + <thead> 861 + <tr class="header"> 862 + <th>Name</th> 863 + <th>Type</th> 864 + <th>Description</th> 865 + <th>Default</th> 866 + </tr> 867 + </thead> 868 + <tbody> 869 + <tr class="odd"> 870 + <td>repo</td> 871 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 872 + <td>The DID of the repository. 
Defaults to authenticated user.</td> 873 + <td><code>None</code></td> 874 + </tr> 875 + <tr class="even"> 876 + <td>limit</td> 877 + <td><a href="`int`">int</a></td> 878 + <td>Maximum number of records to return.</td> 879 + <td><code>100</code></td> 880 + </tr> 881 + </tbody> 882 + </table> 883 + </section> 884 + <section id="returns-6" class="level4 doc-section doc-section-returns"> 885 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-6">Returns</h4> 886 + <table class="caption-top table"> 887 + <thead> 888 + <tr class="header"> 889 + <th>Name</th> 890 + <th>Type</th> 891 + <th>Description</th> 892 + </tr> 893 + </thead> 894 + <tbody> 895 + <tr class="odd"> 896 + <td></td> 897 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 898 + <td>List of dataset records.</td> 899 + </tr> 900 + </tbody> 901 + </table> 902 + </section> 527 903 </section> 528 904 <section id="atdata.atmosphere.DatasetLoader.to_dataset" class="level3"> 529 905 <h3 class="anchored" data-anchor-id="atdata.atmosphere.DatasetLoader.to_dataset">to_dataset</h3> 530 - <div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.to_dataset(uri, sample_type)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 906 + <div class="sourceCode" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetLoader.to_dataset(uri, sample_type)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 531 907 <p>Create a Dataset object from an ATProto record.</p> 532 908 <p>This method creates a Dataset instance from a published record. 
You must provide the sample type class, which should match the schema referenced by the record.</p> 533 909 <p>Supports both external URL storage and ATProto blob storage.</p> 534 - <p>Args: uri: The AT URI of the dataset record. sample_type: The Python class for the sample type.</p> 535 - <p>Returns: A Dataset instance configured from the record.</p> 536 - <p>Raises: ValueError: If no storage URLs can be resolved.</p> 537 - <p>Example: >>> loader = DatasetLoader(client) >>> dataset = loader.to_dataset(uri, MySampleType) >>> for batch in dataset.shuffled(batch_size=32): … process(batch)</p> 910 + <section id="parameters-7" class="level4 doc-section doc-section-parameters"> 911 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-7">Parameters</h4> 912 + <table class="caption-top table"> 913 + <thead> 914 + <tr class="header"> 915 + <th>Name</th> 916 + <th>Type</th> 917 + <th>Description</th> 918 + <th>Default</th> 919 + </tr> 920 + </thead> 921 + <tbody> 922 + <tr class="odd"> 923 + <td>uri</td> 924 + <td><a href="`str`">str</a> | <a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 925 + <td>The AT URI of the dataset record.</td> 926 + <td><em>required</em></td> 927 + </tr> 928 + <tr class="even"> 929 + <td>sample_type</td> 930 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata.atmosphere.records.ST`">ST</a>]</td> 931 + <td>The Python class for the sample type.</td> 932 + <td><em>required</em></td> 933 + </tr> 934 + </tbody> 935 + </table> 936 + </section> 937 + <section id="returns-7" class="level4 doc-section doc-section-returns"> 938 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-7">Returns</h4> 939 + <table class="caption-top table"> 940 + <thead> 941 + <tr class="header"> 942 + <th>Name</th> 943 + <th>Type</th> 944 + <th>Description</th> 945 + </tr> 946 + </thead> 947 + <tbody> 948 + <tr class="odd"> 949 + <td></td> 950 + <td><a href="`atdata.dataset.Dataset`">Dataset</a>[<a 
href="`atdata.atmosphere.records.ST`">ST</a>]</td> 951 + <td>A Dataset instance configured from the record.</td> 952 + </tr> 953 + </tbody> 954 + </table> 955 + </section> 956 + <section id="raises-5" class="level4 doc-section doc-section-raises"> 957 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-5">Raises</h4> 958 + <table class="caption-top table"> 959 + <thead> 960 + <tr class="header"> 961 + <th>Name</th> 962 + <th>Type</th> 963 + <th>Description</th> 964 + </tr> 965 + </thead> 966 + <tbody> 967 + <tr class="odd"> 968 + <td></td> 969 + <td><a href="`ValueError`">ValueError</a></td> 970 + <td>If no storage URLs can be resolved.</td> 971 + </tr> 972 + </tbody> 973 + </table> 974 + </section> 975 + <section id="example-1" class="level4 doc-section doc-section-example"> 976 + <h4 class="doc-section doc-section-example anchored" data-anchor-id="example-1">Example</h4> 977 + <p>::</p> 978 + <pre><code>>>> loader = DatasetLoader(client) 979 + >>> dataset = loader.to_dataset(uri, MySampleType) 980 + >>> for batch in dataset.shuffled(batch_size=32): 981 + ... process(batch)</code></pre> 538 982 539 983 984 + </section> 540 985 </section> 541 986 </section> 542 987 </section>
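
For reference, the Parameters, Returns, and Raises tables in the hunk above are what quartodoc produces from Google-style sections. The sketch below shows a docstring of that shape, reusing the wording from the removed `Args:` / `Returns:` / `Raises:` paragraphs. The function name `get_dataset_record` and its body are assumptions made for illustration; the real `DatasetLoader.get` implementation is not shown in this diff.

```python
import inspect


def get_dataset_record(uri: str) -> dict:
    """Fetch a dataset record by AT URI.

    Args:
        uri: The AT URI of the dataset record.

    Returns:
        The dataset record as a dictionary.

    Raises:
        ValueError: If the record is not a dataset record.
    """
    # Hypothetical check: treat the collection name in the URI as the marker
    # of a dataset record. The real loader may validate this differently.
    if "ac.foundation.dataset.record" not in uri:
        raise ValueError(f"not a dataset record: {uri}")
    # Placeholder payload; a real loader would fetch the record from ATProto.
    return {"uri": uri}


doc = inspect.getdoc(get_dataset_record)
print([name for name in ("Args:", "Returns:", "Raises:") if name in doc])
```

Each section header sits at the docstring margin with its entries indented four spaces beneath it, which appears to be the layout the section parser needs in order to emit tables instead of plain paragraphs.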

+329 -40

docs/api/DatasetPublisher.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.atmosphere.DatasetPublisher" id="toc-atdata.atmosphere.DatasetPublisher" class="nav-link active" data-scroll-target="#atdata.atmosphere.DatasetPublisher">DatasetPublisher</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 402 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 402 403 <ul class="collapse"> 403 404 <li><a href="#atdata.atmosphere.DatasetPublisher.publish" id="toc-atdata.atmosphere.DatasetPublisher.publish" class="nav-link" data-scroll-target="#atdata.atmosphere.DatasetPublisher.publish">publish</a></li> ··· 420 421 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetPublisher(client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 421 422 <p>Publishes dataset index records to ATProto.</p> 422 423 <p>This class creates dataset records that reference a schema and point to external storage (WebDataset URLs) or ATProto blobs.</p> 423 - <p>Example: >>> dataset = atdata.Dataset<a href=""s3://bucket/data-{000000..000009}.tar"">MySample</a> >>> >>> client = AtmosphereClient() >>> client.login(“handle”, “password”) >>> >>> publisher = DatasetPublisher(client) >>> uri = publisher.publish( … dataset, … name=“My Training Data”, … description=“Training data for my model”, … tags=[“computer-vision”, “training”], … )</p> 424 + <section id="example" class="level2 doc-section doc-section-example"> 425 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 426 + <p>::</p> 427 + <pre><code>>>> dataset = atdata.Dataset[MySample]("s3://bucket/data-{000000..000009}.tar") 428 + >>> 429 + >>> client = AtmosphereClient() 430 + >>> 
client.login("handle", "password") 431 + >>> 432 + >>> publisher = DatasetPublisher(client) 433 + >>> uri = publisher.publish( 434 + ... dataset, 435 + ... name="My Training Data", 436 + ... description="Training data for my model", 437 + ... tags=["computer-vision", "training"], 438 + ... )</code></pre> 439 + </section> 424 440 <section id="methods" class="level2"> 425 441 <h2 class="anchored" data-anchor-id="methods">Methods</h2> 426 442 <table class="caption-top table"> ··· 447 463 </table> 448 464 <section id="atdata.atmosphere.DatasetPublisher.publish" class="level3"> 449 465 <h3 class="anchored" data-anchor-id="atdata.atmosphere.DatasetPublisher.publish">publish</h3> 450 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetPublisher.publish(</span> 451 - <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> dataset,</span> 452 - <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 453 - <span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> name,</span> 454 - <span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> schema_uri<span class="op">=</span><span class="va">None</span>,</span> 455 - <span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="va">None</span>,</span> 456 - <span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span><span class="va">None</span>,</span> 457 - <span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a> license<span class="op">=</span><span class="va">None</span>,</span> 458 - <span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a> auto_publish_schema<span class="op">=</span><span class="va">True</span>,</span> 459 - <span id="cb2-10"><a 
href="#cb2-10" aria-hidden="true" tabindex="-1"></a> schema_version<span class="op">=</span><span class="st">'1.0.0'</span>,</span> 460 - <span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 461 - <span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 462 - <p>Publish a dataset index record to ATProto.</p> 463 - <p>Args: dataset: The Dataset to publish. name: Human-readable dataset name. schema_uri: AT URI of the schema record. If not provided and auto_publish_schema is True, the schema will be published. description: Human-readable description. tags: Searchable tags for discovery. license: SPDX license identifier (e.g., ‘MIT’, ‘Apache-2.0’). auto_publish_schema: If True and schema_uri not provided, automatically publish the schema first. schema_version: Version for auto-published schema. 
rkey: Optional explicit record key.</p> 464 - <p>Returns: The AT URI of the created dataset record.</p> 465 - <p>Raises: ValueError: If schema_uri is not provided and auto_publish_schema is False.</p> 466 - </section> 467 - <section id="atdata.atmosphere.DatasetPublisher.publish_with_blobs" class="level3"> 468 - <h3 class="anchored" data-anchor-id="atdata.atmosphere.DatasetPublisher.publish_with_blobs">publish_with_blobs</h3> 469 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetPublisher.publish_with_blobs(</span> 470 - <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> blobs,</span> 471 - <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> schema_uri,</span> 472 - <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 473 - <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> name,</span> 466 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetPublisher.publish(</span> 467 + <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> dataset,</span> 468 + <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 469 + <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> name,</span> 470 + <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> schema_uri<span class="op">=</span><span class="va">None</span>,</span> 474 471 <span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="va">None</span>,</span> 475 472 <span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> tags<span 
class="op">=</span><span class="va">None</span>,</span> 476 473 <span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> license<span class="op">=</span><span class="va">None</span>,</span> 477 - <span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> metadata<span class="op">=</span><span class="va">None</span>,</span> 478 - <span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> mime_type<span class="op">=</span><span class="st">'application/x-tar'</span>,</span> 474 + <span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> auto_publish_schema<span class="op">=</span><span class="va">True</span>,</span> 475 + <span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> schema_version<span class="op">=</span><span class="st">'1.0.0'</span>,</span> 479 476 <span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 480 477 <span id="cb3-12"><a href="#cb3-12" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 481 - <p>Publish a dataset with data stored as ATProto blobs.</p> 482 - <p>This method uploads the provided data as blobs to the PDS and creates a dataset record referencing them. Suitable for smaller datasets that fit within blob size limits (typically 50MB per blob, configurable).</p> 483 - <p>Args: blobs: List of binary data (e.g., tar shards) to upload as blobs. schema_uri: AT URI of the schema record. name: Human-readable dataset name. description: Human-readable description. tags: Searchable tags for discovery. license: SPDX license identifier. metadata: Arbitrary metadata dictionary. mime_type: MIME type for the blobs (default: application/x-tar). 
rkey: Optional explicit record key.</p> 484 - <p>Returns: The AT URI of the created dataset record.</p> 485 - <p>Note: Blobs are only retained by the PDS when referenced in a committed record. This method handles that automatically.</p> 478 + <p>Publish a dataset index record to ATProto.</p> 479 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 480 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 481 + <table class="caption-top table"> 482 + <thead> 483 + <tr class="header"> 484 + <th>Name</th> 485 + <th>Type</th> 486 + <th>Description</th> 487 + <th>Default</th> 488 + </tr> 489 + </thead> 490 + <tbody> 491 + <tr class="odd"> 492 + <td>dataset</td> 493 + <td><a href="`atdata.dataset.Dataset`">Dataset</a>[<a href="`atdata.atmosphere.records.ST`">ST</a>]</td> 494 + <td>The Dataset to publish.</td> 495 + <td><em>required</em></td> 496 + </tr> 497 + <tr class="even"> 498 + <td>name</td> 499 + <td><a href="`str`">str</a></td> 500 + <td>Human-readable dataset name.</td> 501 + <td><em>required</em></td> 502 + </tr> 503 + <tr class="odd"> 504 + <td>schema_uri</td> 505 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 506 + <td>AT URI of the schema record. 
If not provided and auto_publish_schema is True, the schema will be published.</td> 507 + <td><code>None</code></td> 508 + </tr> 509 + <tr class="even"> 510 + <td>description</td> 511 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 512 + <td>Human-readable description.</td> 513 + <td><code>None</code></td> 514 + </tr> 515 + <tr class="odd"> 516 + <td>tags</td> 517 + <td><a href="`typing.Optional`">Optional</a>[<a href="`list`">list</a>[<a href="`str`">str</a>]]</td> 518 + <td>Searchable tags for discovery.</td> 519 + <td><code>None</code></td> 520 + </tr> 521 + <tr class="even"> 522 + <td>license</td> 523 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 524 + <td>SPDX license identifier (e.g., ‘MIT’, ‘Apache-2.0’).</td> 525 + <td><code>None</code></td> 526 + </tr> 527 + <tr class="odd"> 528 + <td>auto_publish_schema</td> 529 + <td><a href="`bool`">bool</a></td> 530 + <td>If True and schema_uri not provided, automatically publish the schema first.</td> 531 + <td><code>True</code></td> 532 + </tr> 533 + <tr class="even"> 534 + <td>schema_version</td> 535 + <td><a href="`str`">str</a></td> 536 + <td>Version for auto-published schema.</td> 537 + <td><code>'1.0.0'</code></td> 538 + </tr> 539 + <tr class="odd"> 540 + <td>rkey</td> 541 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 542 + <td>Optional explicit record key.</td> 543 + <td><code>None</code></td> 544 + </tr> 545 + </tbody> 546 + </table> 486 547 </section> 487 - <section id="atdata.atmosphere.DatasetPublisher.publish_with_urls" class="level3"> 488 - <h3 class="anchored" data-anchor-id="atdata.atmosphere.DatasetPublisher.publish_with_urls">publish_with_urls</h3> 489 - <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetPublisher.publish_with_urls(</span> 490 - <span id="cb4-2"><a 
href="#cb4-2" aria-hidden="true" tabindex="-1"></a> urls,</span> 548 + <section id="returns" class="level4 doc-section doc-section-returns"> 549 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 550 + <table class="caption-top table"> 551 + <thead> 552 + <tr class="header"> 553 + <th>Name</th> 554 + <th>Type</th> 555 + <th>Description</th> 556 + </tr> 557 + </thead> 558 + <tbody> 559 + <tr class="odd"> 560 + <td></td> 561 + <td><a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 562 + <td>The AT URI of the created dataset record.</td> 563 + </tr> 564 + </tbody> 565 + </table> 566 + </section> 567 + <section id="raises" class="level4 doc-section doc-section-raises"> 568 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 569 + <table class="caption-top table"> 570 + <thead> 571 + <tr class="header"> 572 + <th>Name</th> 573 + <th>Type</th> 574 + <th>Description</th> 575 + </tr> 576 + </thead> 577 + <tbody> 578 + <tr class="odd"> 579 + <td></td> 580 + <td><a href="`ValueError`">ValueError</a></td> 581 + <td>If schema_uri is not provided and auto_publish_schema is False.</td> 582 + </tr> 583 + </tbody> 584 + </table> 585 + </section> 586 + </section> 587 + <section id="atdata.atmosphere.DatasetPublisher.publish_with_blobs" class="level3"> 588 + <h3 class="anchored" data-anchor-id="atdata.atmosphere.DatasetPublisher.publish_with_blobs">publish_with_blobs</h3> 589 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetPublisher.publish_with_blobs(</span> 590 + <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> blobs,</span> 491 591 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> schema_uri,</span> 492 592 <span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a> <span 
class="op">*</span>,</span> 493 593 <span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a> name,</span> ··· 495 595 <span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span><span class="va">None</span>,</span> 496 596 <span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a> license<span class="op">=</span><span class="va">None</span>,</span> 497 597 <span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a> metadata<span class="op">=</span><span class="va">None</span>,</span> 498 - <span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 499 - <span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 598 + <span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a> mime_type<span class="op">=</span><span class="st">'application/x-tar'</span>,</span> 599 + <span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 600 + <span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 601 + <p>Publish a dataset with data stored as ATProto blobs.</p> 602 + <p>This method uploads the provided data as blobs to the PDS and creates a dataset record referencing them. 
Suitable for smaller datasets that fit within blob size limits (typically 50MB per blob, configurable).</p> 603 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 604 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 605 + <table class="caption-top table"> 606 + <thead> 607 + <tr class="header"> 608 + <th>Name</th> 609 + <th>Type</th> 610 + <th>Description</th> 611 + <th>Default</th> 612 + </tr> 613 + </thead> 614 + <tbody> 615 + <tr class="odd"> 616 + <td>blobs</td> 617 + <td><a href="`list`">list</a>[<a href="`bytes`">bytes</a>]</td> 618 + <td>List of binary data (e.g., tar shards) to upload as blobs.</td> 619 + <td><em>required</em></td> 620 + </tr> 621 + <tr class="even"> 622 + <td>schema_uri</td> 623 + <td><a href="`str`">str</a></td> 624 + <td>AT URI of the schema record.</td> 625 + <td><em>required</em></td> 626 + </tr> 627 + <tr class="odd"> 628 + <td>name</td> 629 + <td><a href="`str`">str</a></td> 630 + <td>Human-readable dataset name.</td> 631 + <td><em>required</em></td> 632 + </tr> 633 + <tr class="even"> 634 + <td>description</td> 635 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 636 + <td>Human-readable description.</td> 637 + <td><code>None</code></td> 638 + </tr> 639 + <tr class="odd"> 640 + <td>tags</td> 641 + <td><a href="`typing.Optional`">Optional</a>[<a href="`list`">list</a>[<a href="`str`">str</a>]]</td> 642 + <td>Searchable tags for discovery.</td> 643 + <td><code>None</code></td> 644 + </tr> 645 + <tr class="even"> 646 + <td>license</td> 647 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 648 + <td>SPDX license identifier.</td> 649 + <td><code>None</code></td> 650 + </tr> 651 + <tr class="odd"> 652 + <td>metadata</td> 653 + <td><a href="`typing.Optional`">Optional</a>[<a href="`dict`">dict</a>]</td> 654 + <td>Arbitrary metadata dictionary.</td> 655 + <td><code>None</code></td> 656 + </tr> 657 + <tr 
class="even"> 658 + <td>mime_type</td> 659 + <td><a href="`str`">str</a></td> 660 + <td>MIME type for the blobs (default: application/x-tar).</td> 661 + <td><code>'application/x-tar'</code></td> 662 + </tr> 663 + <tr class="odd"> 664 + <td>rkey</td> 665 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 666 + <td>Optional explicit record key.</td> 667 + <td><code>None</code></td> 668 + </tr> 669 + </tbody> 670 + </table> 671 + </section> 672 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 673 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 674 + <table class="caption-top table"> 675 + <thead> 676 + <tr class="header"> 677 + <th>Name</th> 678 + <th>Type</th> 679 + <th>Description</th> 680 + </tr> 681 + </thead> 682 + <tbody> 683 + <tr class="odd"> 684 + <td></td> 685 + <td><a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 686 + <td>The AT URI of the created dataset record.</td> 687 + </tr> 688 + </tbody> 689 + </table> 690 + </section> 691 + <section id="note" class="level4 doc-section doc-section-note"> 692 + <h4 class="doc-section doc-section-note anchored" data-anchor-id="note">Note</h4> 693 + <p>Blobs are only retained by the PDS when referenced in a committed record. 
This method handles that automatically.</p> 694 + </section> 695 + </section> 696 + <section id="atdata.atmosphere.DatasetPublisher.publish_with_urls" class="level3"> 697 + <h3 class="anchored" data-anchor-id="atdata.atmosphere.DatasetPublisher.publish_with_urls">publish_with_urls</h3> 698 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>atmosphere.DatasetPublisher.publish_with_urls(</span> 699 + <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> urls,</span> 700 + <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> schema_uri,</span> 701 + <span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 702 + <span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> name,</span> 703 + <span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="va">None</span>,</span> 704 + <span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span><span class="va">None</span>,</span> 705 + <span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a> license<span class="op">=</span><span class="va">None</span>,</span> 706 + <span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a> metadata<span class="op">=</span><span class="va">None</span>,</span> 707 + <span id="cb5-10"><a href="#cb5-10" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 708 + <span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 500 709 <p>Publish a dataset record with explicit URLs.</p> 501 710 <p>This method allows publishing a dataset record without having a Dataset object, 
useful for registering existing WebDataset files.</p> 502 - <p>Args: urls: List of WebDataset URLs with brace notation. schema_uri: AT URI of the schema record. name: Human-readable dataset name. description: Human-readable description. tags: Searchable tags for discovery. license: SPDX license identifier. metadata: Arbitrary metadata dictionary. rkey: Optional explicit record key.</p> 503 - <p>Returns: The AT URI of the created dataset record.</p> 711 + <section id="parameters-2" class="level4 doc-section doc-section-parameters"> 712 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h4> 713 + <table class="caption-top table"> 714 + <thead> 715 + <tr class="header"> 716 + <th>Name</th> 717 + <th>Type</th> 718 + <th>Description</th> 719 + <th>Default</th> 720 + </tr> 721 + </thead> 722 + <tbody> 723 + <tr class="odd"> 724 + <td>urls</td> 725 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 726 + <td>List of WebDataset URLs with brace notation.</td> 727 + <td><em>required</em></td> 728 + </tr> 729 + <tr class="even"> 730 + <td>schema_uri</td> 731 + <td><a href="`str`">str</a></td> 732 + <td>AT URI of the schema record.</td> 733 + <td><em>required</em></td> 734 + </tr> 735 + <tr class="odd"> 736 + <td>name</td> 737 + <td><a href="`str`">str</a></td> 738 + <td>Human-readable dataset name.</td> 739 + <td><em>required</em></td> 740 + </tr> 741 + <tr class="even"> 742 + <td>description</td> 743 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 744 + <td>Human-readable description.</td> 745 + <td><code>None</code></td> 746 + </tr> 747 + <tr class="odd"> 748 + <td>tags</td> 749 + <td><a href="`typing.Optional`">Optional</a>[<a href="`list`">list</a>[<a href="`str`">str</a>]]</td> 750 + <td>Searchable tags for discovery.</td> 751 + <td><code>None</code></td> 752 + </tr> 753 + <tr class="even"> 754 + <td>license</td> 755 + <td><a href="`typing.Optional`">Optional</a>[<a 
href="`str`">str</a>]</td> 756 + <td>SPDX license identifier.</td> 757 + <td><code>None</code></td> 758 + </tr> 759 + <tr class="odd"> 760 + <td>metadata</td> 761 + <td><a href="`typing.Optional`">Optional</a>[<a href="`dict`">dict</a>]</td> 762 + <td>Arbitrary metadata dictionary.</td> 763 + <td><code>None</code></td> 764 + </tr> 765 + <tr class="even"> 766 + <td>rkey</td> 767 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 768 + <td>Optional explicit record key.</td> 769 + <td><code>None</code></td> 770 + </tr> 771 + </tbody> 772 + </table> 773 + </section> 774 + <section id="returns-2" class="level4 doc-section doc-section-returns"> 775 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 776 + <table class="caption-top table"> 777 + <thead> 778 + <tr class="header"> 779 + <th>Name</th> 780 + <th>Type</th> 781 + <th>Description</th> 782 + </tr> 783 + </thead> 784 + <tbody> 785 + <tr class="odd"> 786 + <td></td> 787 + <td><a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 788 + <td>The AT URI of the created dataset record.</td> 789 + </tr> 790 + </tbody> 791 + </table> 504 792 505 793 794 + </section> 506 795 </section> 507 796 </section> 508 797 </section>
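Note on the `publish` page above: the documented interaction between `schema_uri` and `auto_publish_schema` can be sketched as below. This is a minimal illustration, not atdata's implementation. The helper name `resolve_schema_uri` and the placeholder URI are assumptions made for the sketch; only the `ValueError` condition comes from the rendered Raises table.

```python
from typing import Optional


def resolve_schema_uri(
    schema_uri: Optional[str],
    auto_publish_schema: bool,
    published_uri: str = "at://did:example/schema/hypothetical",
) -> str:
    """Hypothetical helper mirroring the rule documented for publish()."""
    if schema_uri is not None:
        # An explicit schema URI always wins.
        return schema_uri
    if not auto_publish_schema:
        # Documented failure mode: no URI given and auto-publish disabled.
        raise ValueError(
            "schema_uri is required when auto_publish_schema is False"
        )
    # Stand-in for "publish the schema first, then use its URI";
    # published_uri is a placeholder value, not a real record.
    return published_uri
```

Calling `resolve_schema_uri(None, False)` raises `ValueError`, matching the Raises entry; either of the other two paths returns a URI string.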

+151 -15

docs/api/DictSample.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.DictSample" id="toc-atdata.DictSample" class="nav-link active" data-scroll-target="#atdata.DictSample">DictSample</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 402 + <li><a href="#note" id="toc-note" class="nav-link" data-scroll-target="#note">Note</a></li> 401 403 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 404 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 403 405 <ul class="collapse"> ··· 427 429 <p>This class is the default sample type for datasets when no explicit type is specified. It stores the raw unpacked msgpack data and provides both attribute-style (<code>sample.field</code>) and dict-style (<code>sample["field"]</code>) access to fields.</p> 428 430 <p><code>DictSample</code> is useful for: - Exploring datasets without defining a schema first - Working with datasets that have variable schemas - Prototyping before committing to a typed schema</p> 429 431 <p>To convert to a typed schema, use <code>Dataset.as_type()</code> with a <code>@packable</code>-decorated class. Every <code>@packable</code> class automatically registers a lens from <code>DictSample</code>, making this conversion seamless.</p> 430 - <p>Example: >>> ds = load_dataset(“path/to/data.tar”) # Returns Dataset<a href="#atdata.DictSample">DictSample</a> >>> for sample in ds.ordered(): … print(sample.some_field) # Attribute access … print(sample[“other_field”]) # Dict access … print(sample.keys()) # Inspect available fields … >>> # Convert to typed schema >>> typed_ds = ds.as_type(MyTypedSample)</p> 431 - <p>Note: NDArray fields are stored as raw bytes in DictSample. 
They are only converted to numpy arrays when accessed through a typed sample class.</p> 432 + <section id="example" class="level2 doc-section doc-section-example"> 433 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 434 + <p>::</p> 435 + <pre><code>>>> ds = load_dataset("path/to/data.tar") # Returns Dataset[DictSample] 436 + >>> for sample in ds.ordered(): 437 + ... print(sample.some_field) # Attribute access 438 + ... print(sample["other_field"]) # Dict access 439 + ... print(sample.keys()) # Inspect available fields 440 + ... 441 + >>> # Convert to typed schema 442 + >>> typed_ds = ds.as_type(MyTypedSample)</code></pre> 443 + </section> 444 + <section id="note" class="level2 doc-section doc-section-note"> 445 + <h2 class="doc-section doc-section-note anchored" data-anchor-id="note">Note</h2> 446 + <p>NDArray fields are stored as raw bytes in DictSample. They are only converted to numpy arrays when accessed through a typed sample class.</p> 447 + </section> 432 448 <section id="attributes" class="level2"> 433 449 <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 434 450 <table class="caption-top table"> ··· 492 508 </table> 493 509 <section id="atdata.DictSample.from_bytes" class="level3"> 494 510 <h3 class="anchored" data-anchor-id="atdata.DictSample.from_bytes">from_bytes</h3> 495 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>DictSample.from_bytes(bs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 511 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>DictSample.from_bytes(bs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i 
class="bi"></i></button></pre></div> 496 512 <p>Create a DictSample from raw msgpack bytes.</p> 497 - <p>Args: bs: Raw bytes from a msgpack-serialized sample.</p> 498 - <p>Returns: New DictSample instance with the unpacked data.</p> 513 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 514 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 515 + <table class="caption-top table"> 516 + <thead> 517 + <tr class="header"> 518 + <th>Name</th> 519 + <th>Type</th> 520 + <th>Description</th> 521 + <th>Default</th> 522 + </tr> 523 + </thead> 524 + <tbody> 525 + <tr class="odd"> 526 + <td>bs</td> 527 + <td><a href="`bytes`">bytes</a></td> 528 + <td>Raw bytes from a msgpack-serialized sample.</td> 529 + <td><em>required</em></td> 530 + </tr> 531 + </tbody> 532 + </table> 533 + </section> 534 + <section id="returns" class="level4 doc-section doc-section-returns"> 535 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 536 + <table class="caption-top table"> 537 + <thead> 538 + <tr class="header"> 539 + <th>Name</th> 540 + <th>Type</th> 541 + <th>Description</th> 542 + </tr> 543 + </thead> 544 + <tbody> 545 + <tr class="odd"> 546 + <td></td> 547 + <td><a href="`atdata.dataset.DictSample`">DictSample</a></td> 548 + <td>New DictSample instance with the unpacked data.</td> 549 + </tr> 550 + </tbody> 551 + </table> 552 + </section> 499 553 </section> 500 554 <section id="atdata.DictSample.from_data" class="level3"> 501 555 <h3 class="anchored" data-anchor-id="atdata.DictSample.from_data">from_data</h3> 502 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>DictSample.from_data(data)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 556 + <div class="sourceCode" 
id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>DictSample.from_data(data)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 503 557 <p>Create a DictSample from unpacked msgpack data.</p> 504 - <p>Args: data: Dictionary with field names as keys.</p> 505 - <p>Returns: New DictSample instance wrapping the data.</p> 558 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 559 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 560 + <table class="caption-top table"> 561 + <thead> 562 + <tr class="header"> 563 + <th>Name</th> 564 + <th>Type</th> 565 + <th>Description</th> 566 + <th>Default</th> 567 + </tr> 568 + </thead> 569 + <tbody> 570 + <tr class="odd"> 571 + <td>data</td> 572 + <td><a href="`dict`">dict</a>[<a href="`str`">str</a>, <a href="`typing.Any`">Any</a>]</td> 573 + <td>Dictionary with field names as keys.</td> 574 + <td><em>required</em></td> 575 + </tr> 576 + </tbody> 577 + </table> 578 + </section> 579 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 580 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 581 + <table class="caption-top table"> 582 + <thead> 583 + <tr class="header"> 584 + <th>Name</th> 585 + <th>Type</th> 586 + <th>Description</th> 587 + </tr> 588 + </thead> 589 + <tbody> 590 + <tr class="odd"> 591 + <td></td> 592 + <td><a href="`atdata.dataset.DictSample`">DictSample</a></td> 593 + <td>New DictSample instance wrapping the data.</td> 594 + </tr> 595 + </tbody> 596 + </table> 597 + </section> 506 598 </section> 507 599 <section id="atdata.DictSample.get" class="level3"> 508 600 <h3 class="anchored" data-anchor-id="atdata.DictSample.get">get</h3> 509 - <div class="sourceCode" id="cb4"><pre class="sourceCode python 
code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>DictSample.get(key, default<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 601 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>DictSample.get(key, default<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 510 602 <p>Get a field value with optional default.</p> 511 - <p>Args: key: Field name to access. default: Value to return if field doesn’t exist.</p> 512 - <p>Returns: The field value or default.</p> 603 + <section id="parameters-2" class="level4 doc-section doc-section-parameters"> 604 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h4> 605 + <table class="caption-top table"> 606 + <thead> 607 + <tr class="header"> 608 + <th>Name</th> 609 + <th>Type</th> 610 + <th>Description</th> 611 + <th>Default</th> 612 + </tr> 613 + </thead> 614 + <tbody> 615 + <tr class="odd"> 616 + <td>key</td> 617 + <td><a href="`str`">str</a></td> 618 + <td>Field name to access.</td> 619 + <td><em>required</em></td> 620 + </tr> 621 + <tr class="even"> 622 + <td>default</td> 623 + <td><a href="`typing.Any`">Any</a></td> 624 + <td>Value to return if field doesn’t exist.</td> 625 + <td><code>None</code></td> 626 + </tr> 627 + </tbody> 628 + </table> 629 + </section> 630 + <section id="returns-2" class="level4 doc-section doc-section-returns"> 631 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 632 + <table class="caption-top table"> 633 + <thead> 634 + <tr class="header"> 635 + <th>Name</th> 636 + <th>Type</th> 
637 + <th>Description</th> 638 + </tr> 639 + </thead> 640 + <tbody> 641 + <tr class="odd"> 642 + <td></td> 643 + <td><a href="`typing.Any`">Any</a></td> 644 + <td>The field value or default.</td> 645 + </tr> 646 + </tbody> 647 + </table> 648 + </section> 513 649 </section> 514 650 <section id="atdata.DictSample.items" class="level3"> 515 651 <h3 class="anchored" data-anchor-id="atdata.DictSample.items">items</h3> 516 - <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>DictSample.items()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 652 + <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>DictSample.items()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 517 653 <p>Return list of (field_name, value) tuples.</p> 518 654 </section> 519 655 <section id="atdata.DictSample.keys" class="level3"> 520 656 <h3 class="anchored" data-anchor-id="atdata.DictSample.keys">keys</h3> 521 - <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>DictSample.keys()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 657 + <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>DictSample.keys()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 522 658 <p>Return list of field names.</p> 523 659 </section> 524 660 <section 
id="atdata.DictSample.to_dict" class="level3"> 525 661 <h3 class="anchored" data-anchor-id="atdata.DictSample.to_dict">to_dict</h3> 526 - <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>DictSample.to_dict()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 662 + <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>DictSample.to_dict()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 527 663 <p>Return a copy of the underlying data dictionary.</p> 528 664 </section> 529 665 <section id="atdata.DictSample.values" class="level3"> 530 666 <h3 class="anchored" data-anchor-id="atdata.DictSample.values">values</h3> 531 - <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>DictSample.values()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 667 + <div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>DictSample.values()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 532 668 <p>Return list of field values.</p> 533 669 534 670
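Note on the `DictSample` page above: the attribute-style and dict-style access it describes can be sketched with a small stand-in class. `MiniDictSample` is hypothetical and written only for illustration; it does not reproduce atdata's class, msgpack handling, or NDArray behavior.

```python
from typing import Any


class MiniDictSample:
    """Hypothetical stand-in showing the documented access pattern."""

    def __init__(self, data: dict[str, Any]) -> None:
        # Copy so later changes to the caller's dict do not leak in.
        self._data = dict(data)

    def __getattr__(self, name: str) -> Any:
        # Only invoked when normal attribute lookup fails.
        try:
            return self.__dict__["_data"][name]
        except KeyError:
            raise AttributeError(name) from None

    def __getitem__(self, key: str) -> Any:
        return self._data[key]

    def get(self, key: str, default: Any = None) -> Any:
        # Field value, or the default if the field does not exist.
        return self._data.get(key, default)

    def keys(self) -> list[str]:
        return list(self._data)

    def to_dict(self) -> dict[str, Any]:
        # Return a copy, as the to_dict description states.
        return dict(self._data)


sample = MiniDictSample({"label": "cat", "score": 0.9})
```

Here `sample.label` and `sample["label"]` read the same field, and `sample.get("missing")` falls back to `None` instead of raising.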

+5 -1

docs/api/IndexEntry.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.IndexEntry" id="toc-atdata.IndexEntry" class="nav-link active" data-scroll-target="#atdata.IndexEntry">IndexEntry</a> 400 400 <ul class="collapse"> 401 + <li><a href="#properties" id="toc-properties" class="nav-link" data-scroll-target="#properties">Properties</a></li> 401 402 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 403 </ul></li> 403 404 </ul> ··· 415 416 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>IndexEntry()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 416 417 <p>Common interface for index entries (local or atmosphere).</p> 417 418 <p>Both LocalDatasetEntry and atmosphere DatasetRecord-based entries should satisfy this protocol, enabling code that works with either.</p> 418 - <p>Properties: name: Human-readable dataset name schema_ref: Reference to schema (local:// path or AT URI) data_urls: WebDataset URLs for the data metadata: Arbitrary metadata dict, or None</p> 419 + <section id="properties" class="level2 doc-section doc-section-properties"> 420 + <h2 class="doc-section doc-section-properties anchored" data-anchor-id="properties">Properties</h2> 421 + <p>name: Human-readable dataset name schema_ref: Reference to schema (local:// path or AT URI) data_urls: WebDataset URLs for the data metadata: Arbitrary metadata dict, or None</p> 422 + </section> 419 423 <section id="attributes" class="level2"> 420 424 <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 421 425 <table class="caption-top table">

+385 -28

docs/api/Lens.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.lens" id="toc-atdata.lens" class="nav-link active" data-scroll-target="#atdata.lens">lens</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 402 <li><a href="#classes" id="toc-classes" class="nav-link" data-scroll-target="#classes">Classes</a> 402 403 <ul class="collapse"> 403 404 <li><a href="#atdata.lens.Lens" id="toc-atdata.lens.Lens" class="nav-link" data-scroll-target="#atdata.lens.Lens">Lens</a></li> ··· 430 431 <li><code>@lens</code>: Decorator to create and register lens transformations</li> 431 432 </ul> 432 433 <p>Lenses support the functional programming concept of composable, well-behaved transformations that satisfy lens laws (GetPut and PutGet).</p> 433 - <p>Example: >>> <span class="citation" data-cites="packable">@packable</span> … class FullData: … name: str … age: int … embedding: NDArray … >>> <span class="citation" data-cites="packable">@packable</span> … class NameOnly: … name: str … >>> <span class="citation" data-cites="lens">@lens</span> … def name_view(full: FullData) -> NameOnly: … return NameOnly(name=full.name) … >>> <span class="citation" data-cites="name_view.putter">@name_view.putter</span> … def name_view_put(view: NameOnly, source: FullData) -> FullData: … return FullData(name=view.name, age=source.age, … embedding=source.embedding) … >>> ds = Dataset<a href=""data.tar"">FullData</a> >>> ds_names = ds.as_type(NameOnly) # Uses registered lens</p> 434 + <section id="example" class="level2 doc-section doc-section-example"> 435 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 436 + <p>::</p> 437 + <pre><code>>>> @packable 438 + ... class FullData: 439 + ... name: str 440 + ... age: int 441 + ... embedding: NDArray 442 + ... 443 + >>> @packable 444 + ... class NameOnly: 445 + ... name: str 446 + ... 447 + >>> @lens 448 + ... 
def name_view(full: FullData) -> NameOnly: 449 + ... return NameOnly(name=full.name) 450 + ... 451 + >>> @name_view.putter 452 + ... def name_view_put(view: NameOnly, source: FullData) -> FullData: 453 + ... return FullData(name=view.name, age=source.age, 454 + ... embedding=source.embedding) 455 + ... 456 + >>> ds = Dataset[FullData]("data.tar") 457 + >>> ds_names = ds.as_type(NameOnly) # Uses registered lens</code></pre> 458 + </section> 434 459 <section id="classes" class="level2"> 435 460 <h2 class="anchored" data-anchor-id="classes">Classes</h2> 436 461 <table class="caption-top table"> ··· 453 478 </table> 454 479 <section id="atdata.lens.Lens" class="level3"> 455 480 <h3 class="anchored" data-anchor-id="atdata.lens.Lens">Lens</h3> 456 - <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>lens.Lens(get, put<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 481 + <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>lens.Lens(get, put<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 457 482 <p>A bidirectional transformation between two sample types.</p> 458 483 <p>A lens provides a way to view and update data of type <code>S</code> (source) as if it were type <code>V</code> (view). It consists of a getter that transforms <code>S -> V</code> and an optional putter that transforms <code>(V, S) -> S</code>, enabling updates to the view to be reflected back in the source.</p> 459 - <p>Type Parameters: S: The source type, must derive from <code>PackableSample</code>. 
V: The view type, must derive from <code>PackableSample</code>.</p> 460 - <p>Example: >>> <span class="citation" data-cites="lens">@lens</span> … def name_lens(full: FullData) -> NameOnly: … return NameOnly(name=full.name) … >>> <span class="citation" data-cites="name_lens.putter">@name_lens.putter</span> … def name_lens_put(view: NameOnly, source: FullData) -> FullData: … return FullData(name=view.name, age=source.age)</p> 484 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 485 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 486 + <table class="caption-top table"> 487 + <colgroup> 488 + <col style="width: 9%"> 489 + <col style="width: 9%"> 490 + <col style="width: 66%"> 491 + <col style="width: 14%"> 492 + </colgroup> 493 + <thead> 494 + <tr class="header"> 495 + <th>Name</th> 496 + <th>Type</th> 497 + <th>Description</th> 498 + <th>Default</th> 499 + </tr> 500 + </thead> 501 + <tbody> 502 + <tr class="odd"> 503 + <td>S</td> 504 + <td></td> 505 + <td>The source type, must derive from <code>PackableSample</code>.</td> 506 + <td><em>required</em></td> 507 + </tr> 508 + <tr class="even"> 509 + <td>V</td> 510 + <td></td> 511 + <td>The view type, must derive from <code>PackableSample</code>.</td> 512 + <td><em>required</em></td> 513 + </tr> 514 + </tbody> 515 + </table> 516 + </section> 517 + <section id="example-1" class="level4 doc-section doc-section-example"> 518 + <h4 class="doc-section doc-section-example anchored" data-anchor-id="example-1">Example</h4> 519 + <p>::</p> 520 + <pre><code>>>> @lens 521 + ... def name_lens(full: FullData) -> NameOnly: 522 + ... return NameOnly(name=full.name) 523 + ... 524 + >>> @name_lens.putter 525 + ... def name_lens_put(view: NameOnly, source: FullData) -> FullData: 526 + ... 
return FullData(name=view.name, age=source.age)</code></pre> 527 + </section> 461 528 <section id="methods" class="level4"> 462 529 <h4 class="anchored" data-anchor-id="methods">Methods</h4> 463 530 <table class="caption-top table"> ··· 484 551 </table> 485 552 <section id="atdata.lens.Lens.get" class="level5"> 486 553 <h5 class="anchored" data-anchor-id="atdata.lens.Lens.get">get</h5> 487 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>lens.Lens.get(s)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 554 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>lens.Lens.get(s)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 488 555 <p>Transform the source into the view type.</p> 489 - <p>Args: s: The source sample of type <code>S</code>.</p> 490 - <p>Returns: A view of the source as type <code>V</code>.</p> 556 + <section id="parameters-1" class="level6 doc-section doc-section-parameters"> 557 + <h6 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h6> 558 + <table class="caption-top table"> 559 + <thead> 560 + <tr class="header"> 561 + <th>Name</th> 562 + <th>Type</th> 563 + <th>Description</th> 564 + <th>Default</th> 565 + </tr> 566 + </thead> 567 + <tbody> 568 + <tr class="odd"> 569 + <td>s</td> 570 + <td><a href="`atdata.lens.S`">S</a></td> 571 + <td>The source sample of type <code>S</code>.</td> 572 + <td><em>required</em></td> 573 + </tr> 574 + </tbody> 575 + </table> 576 + </section> 577 + <section id="returns" class="level6 doc-section doc-section-returns"> 578 + <h6 class="doc-section doc-section-returns anchored" 
data-anchor-id="returns">Returns</h6> 579 + <table class="caption-top table"> 580 + <thead> 581 + <tr class="header"> 582 + <th>Name</th> 583 + <th>Type</th> 584 + <th>Description</th> 585 + </tr> 586 + </thead> 587 + <tbody> 588 + <tr class="odd"> 589 + <td></td> 590 + <td><a href="`atdata.lens.V`">V</a></td> 591 + <td>A view of the source as type <code>V</code>.</td> 592 + </tr> 593 + </tbody> 594 + </table> 595 + </section> 491 596 </section> 492 597 <section id="atdata.lens.Lens.put" class="level5"> 493 598 <h5 class="anchored" data-anchor-id="atdata.lens.Lens.put">put</h5> 494 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>lens.Lens.put(v, s)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 599 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>lens.Lens.put(v, s)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 495 600 <p>Update the source based on a modified view.</p> 496 - <p>Args: v: The modified view of type <code>V</code>. 
s: The original source of type <code>S</code>.</p> 497 - <p>Returns: An updated source of type <code>S</code> that reflects changes from the view.</p> 601 + <section id="parameters-2" class="level6 doc-section doc-section-parameters"> 602 + <h6 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h6> 603 + <table class="caption-top table"> 604 + <thead> 605 + <tr class="header"> 606 + <th>Name</th> 607 + <th>Type</th> 608 + <th>Description</th> 609 + <th>Default</th> 610 + </tr> 611 + </thead> 612 + <tbody> 613 + <tr class="odd"> 614 + <td>v</td> 615 + <td><a href="`atdata.lens.V`">V</a></td> 616 + <td>The modified view of type <code>V</code>.</td> 617 + <td><em>required</em></td> 618 + </tr> 619 + <tr class="even"> 620 + <td>s</td> 621 + <td><a href="`atdata.lens.S`">S</a></td> 622 + <td>The original source of type <code>S</code>.</td> 623 + <td><em>required</em></td> 624 + </tr> 625 + </tbody> 626 + </table> 627 + </section> 628 + <section id="returns-1" class="level6 doc-section doc-section-returns"> 629 + <h6 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h6> 630 + <table class="caption-top table"> 631 + <thead> 632 + <tr class="header"> 633 + <th>Name</th> 634 + <th>Type</th> 635 + <th>Description</th> 636 + </tr> 637 + </thead> 638 + <tbody> 639 + <tr class="odd"> 640 + <td></td> 641 + <td><a href="`atdata.lens.S`">S</a></td> 642 + <td>An updated source of type <code>S</code> that reflects changes from the view.</td> 643 + </tr> 644 + </tbody> 645 + </table> 646 + </section> 498 647 </section> 499 648 <section id="atdata.lens.Lens.putter" class="level5"> 500 649 <h5 class="anchored" data-anchor-id="atdata.lens.Lens.putter">putter</h5> 501 - <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>lens.Lens.putter(put)</span></code><button title="Copy to 
Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 650 + <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>lens.Lens.putter(put)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 502 651 <p>Decorator to register a putter function for this lens.</p> 503 - <p>Args: put: A function that takes a view of type <code>V</code> and source of type <code>S</code>, and returns an updated source of type <code>S</code>.</p> 504 - <p>Returns: The putter function, allowing this to be used as a decorator.</p> 505 - <p>Example: >>> <span class="citation" data-cites="my_lens.putter">@my_lens.putter</span> … def my_lens_put(view: ViewType, source: SourceType) -> SourceType: … return SourceType(…)</p> 652 + <section id="parameters-3" class="level6 doc-section doc-section-parameters"> 653 + <h6 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-3">Parameters</h6> 654 + <table class="caption-top table"> 655 + <thead> 656 + <tr class="header"> 657 + <th>Name</th> 658 + <th>Type</th> 659 + <th>Description</th> 660 + <th>Default</th> 661 + </tr> 662 + </thead> 663 + <tbody> 664 + <tr class="odd"> 665 + <td>put</td> 666 + <td><a href="`atdata.lens.LensPutter`">LensPutter</a>[<a href="`atdata.lens.S`">S</a>, <a href="`atdata.lens.V`">V</a>]</td> 667 + <td>A function that takes a view of type <code>V</code> and source of type <code>S</code>, and returns an updated source of type <code>S</code>.</td> 668 + <td><em>required</em></td> 669 + </tr> 670 + </tbody> 671 + </table> 672 + </section> 673 + <section id="returns-2" class="level6 doc-section doc-section-returns"> 674 + <h6 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h6> 675 + <table class="caption-top table"> 676 + <thead> 677 + <tr class="header"> 678 + 
<th>Name</th> 679 + <th>Type</th> 680 + <th>Description</th> 681 + </tr> 682 + </thead> 683 + <tbody> 684 + <tr class="odd"> 685 + <td></td> 686 + <td><a href="`atdata.lens.LensPutter`">LensPutter</a>[<a href="`atdata.lens.S`">S</a>, <a href="`atdata.lens.V`">V</a>]</td> 687 + <td>The putter function, allowing this to be used as a decorator.</td> 688 + </tr> 689 + </tbody> 690 + </table> 691 + </section> 692 + <section id="example-2" class="level6 doc-section doc-section-example"> 693 + <h6 class="doc-section doc-section-example anchored" data-anchor-id="example-2">Example</h6> 694 + <p>::</p> 695 + <pre><code>>>> @my_lens.putter 696 + ... def my_lens_put(view: ViewType, source: SourceType) -> SourceType: 697 + ... return SourceType(...)</code></pre> 506 698 </section> 507 699 </section> 508 700 </section> 701 + </section> 509 702 <section id="atdata.lens.LensNetwork" class="level3"> 510 703 <h3 class="anchored" data-anchor-id="atdata.lens.LensNetwork">LensNetwork</h3> 511 - <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>lens.LensNetwork()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 704 + <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>lens.LensNetwork()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 512 705 <p>Global registry for lens transformations between sample types.</p> 513 706 <p>This class implements a singleton pattern to maintain a global registry of all lenses decorated with <code>@lens</code>. It enables looking up transformations between different <code>PackableSample</code> types.</p> 514 - <p>Attributes: _instance: The singleton instance of this class. 
_registry: Dictionary mapping <code>(source_type, view_type)</code> tuples to their corresponding <code>Lens</code> objects.</p> 707 + <section id="attributes" class="level4 doc-section doc-section-attributes"> 708 + <h4 class="doc-section doc-section-attributes anchored" data-anchor-id="attributes">Attributes</h4> 709 + <table class="caption-top table"> 710 + <thead> 711 + <tr class="header"> 712 + <th>Name</th> 713 + <th>Type</th> 714 + <th>Description</th> 715 + </tr> 716 + </thead> 717 + <tbody> 718 + <tr class="odd"> 719 + <td>_instance</td> 720 + <td></td> 721 + <td>The singleton instance of this class.</td> 722 + </tr> 723 + <tr class="even"> 724 + <td>_registry</td> 725 + <td><a href="`typing.Dict`">Dict</a>[<a href="`atdata.lens.LensSignature`">LensSignature</a>, <a href="`atdata.lens.Lens`">Lens</a>]</td> 726 + <td>Dictionary mapping <code>(source_type, view_type)</code> tuples to their corresponding <code>Lens</code> objects.</td> 727 + </tr> 728 + </tbody> 729 + </table> 730 + </section> 515 731 <section id="methods-1" class="level4"> 516 732 <h4 class="anchored" data-anchor-id="methods-1">Methods</h4> 517 733 <table class="caption-top table"> ··· 534 750 </table> 535 751 <section id="atdata.lens.LensNetwork.register" class="level5"> 536 752 <h5 class="anchored" data-anchor-id="atdata.lens.LensNetwork.register">register</h5> 537 - <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>lens.LensNetwork.register(_lens)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 753 + <div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>lens.LensNetwork.register(_lens)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i 
class="bi"></i></button></pre></div> 538 754 <p>Register a lens as the canonical transformation between two types.</p> 539 - <p>Args: _lens: The lens to register. Will be stored in the registry under the key <code>(_lens.source_type, _lens.view_type)</code>.</p> 540 - <p>Note: If a lens already exists for the same type pair, it will be overwritten.</p> 755 + <section id="parameters-4" class="level6 doc-section doc-section-parameters"> 756 + <h6 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-4">Parameters</h6> 757 + <table class="caption-top table"> 758 + <thead> 759 + <tr class="header"> 760 + <th>Name</th> 761 + <th>Type</th> 762 + <th>Description</th> 763 + <th>Default</th> 764 + </tr> 765 + </thead> 766 + <tbody> 767 + <tr class="odd"> 768 + <td>_lens</td> 769 + <td><a href="`atdata.lens.Lens`">Lens</a></td> 770 + <td>The lens to register. Will be stored in the registry under the key <code>(_lens.source_type, _lens.view_type)</code>.</td> 771 + <td><em>required</em></td> 772 + </tr> 773 + </tbody> 774 + </table> 775 + </section> 776 + <section id="note" class="level6 doc-section doc-section-note"> 777 + <h6 class="doc-section doc-section-note anchored" data-anchor-id="note">Note</h6> 778 + <p>If a lens already exists for the same type pair, it will be overwritten.</p> 779 + </section> 541 780 </section> 542 781 <section id="atdata.lens.LensNetwork.transform" class="level5"> 543 782 <h5 class="anchored" data-anchor-id="atdata.lens.LensNetwork.transform">transform</h5> 544 - <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>lens.LensNetwork.transform(source, view)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 783 + <div class="sourceCode" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span 
id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>lens.LensNetwork.transform(source, view)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 545 784 <p>Look up the lens transformation between two sample types.</p> 546 - <p>Args: source: The source sample type (must derive from <code>PackableSample</code>). view: The target view type (must derive from <code>PackableSample</code>).</p> 547 - <p>Returns: The registered <code>Lens</code> that transforms from <code>source</code> to <code>view</code>.</p> 548 - <p>Raises: ValueError: If no lens has been registered for the given type pair.</p> 549 - <p>Note: Currently only supports direct transformations. Compositional transformations (chaining multiple lenses) are not yet implemented.</p> 785 + <section id="parameters-5" class="level6 doc-section doc-section-parameters"> 786 + <h6 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-5">Parameters</h6> 787 + <table class="caption-top table"> 788 + <thead> 789 + <tr class="header"> 790 + <th>Name</th> 791 + <th>Type</th> 792 + <th>Description</th> 793 + <th>Default</th> 794 + </tr> 795 + </thead> 796 + <tbody> 797 + <tr class="odd"> 798 + <td>source</td> 799 + <td><a href="`atdata.lens.DatasetType`">DatasetType</a></td> 800 + <td>The source sample type (must derive from <code>PackableSample</code>).</td> 801 + <td><em>required</em></td> 802 + </tr> 803 + <tr class="even"> 804 + <td>view</td> 805 + <td><a href="`atdata.lens.DatasetType`">DatasetType</a></td> 806 + <td>The target view type (must derive from <code>PackableSample</code>).</td> 807 + <td><em>required</em></td> 808 + </tr> 809 + </tbody> 810 + </table> 811 + </section> 812 + <section id="returns-3" class="level6 doc-section doc-section-returns"> 813 + <h6 class="doc-section doc-section-returns anchored" data-anchor-id="returns-3">Returns</h6> 814 + <table class="caption-top table"> 815 + <thead> 816 + <tr 
class="header"> 817 + <th>Name</th> 818 + <th>Type</th> 819 + <th>Description</th> 820 + </tr> 821 + </thead> 822 + <tbody> 823 + <tr class="odd"> 824 + <td></td> 825 + <td><a href="`atdata.lens.Lens`">Lens</a></td> 826 + <td>The registered <code>Lens</code> that transforms from <code>source</code> to <code>view</code>.</td> 827 + </tr> 828 + </tbody> 829 + </table> 830 + </section> 831 + <section id="raises" class="level6 doc-section doc-section-raises"> 832 + <h6 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h6> 833 + <table class="caption-top table"> 834 + <thead> 835 + <tr class="header"> 836 + <th>Name</th> 837 + <th>Type</th> 838 + <th>Description</th> 839 + </tr> 840 + </thead> 841 + <tbody> 842 + <tr class="odd"> 843 + <td></td> 844 + <td><a href="`ValueError`">ValueError</a></td> 845 + <td>If no lens has been registered for the given type pair.</td> 846 + </tr> 847 + </tbody> 848 + </table> 849 + </section> 850 + <section id="note-1" class="level6 doc-section doc-section-note"> 851 + <h6 class="doc-section doc-section-note anchored" data-anchor-id="note-1">Note</h6> 852 + <p>Currently only supports direct transformations. 
Compositional transformations (chaining multiple lenses) are not yet implemented.</p> 853 + </section> 550 854 </section> 551 855 </section> 552 856 </section> ··· 569 873 </table> 570 874 <section id="atdata.lens.lens" class="level3"> 571 875 <h3 class="anchored" data-anchor-id="atdata.lens.lens">lens</h3> 572 - <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>lens.lens(f)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 876 + <div class="sourceCode" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>lens.lens(f)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 573 877 <p>Decorator to create and register a lens transformation.</p> 574 878 <p>This decorator converts a getter function into a <code>Lens</code> object and automatically registers it in the global <code>LensNetwork</code> registry.</p> 575 - <p>Args: f: A getter function that transforms from source type <code>S</code> to view type <code>V</code>. 
Must have exactly one parameter with a type annotation.</p> 576 - <p>Returns: A <code>Lens[S, V]</code> object that can be called to apply the transformation or decorated with <code>@lens_name.putter</code> to add a putter function.</p> 577 - <p>Example: >>> <span class="citation" data-cites="lens">@lens</span> … def extract_name(full: FullData) -> NameOnly: … return NameOnly(name=full.name) … >>> <span class="citation" data-cites="extract_name.putter">@extract_name.putter</span> … def extract_name_put(view: NameOnly, source: FullData) -> FullData: … return FullData(name=view.name, age=source.age)</p> 879 + <section id="parameters-6" class="level4 doc-section doc-section-parameters"> 880 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-6">Parameters</h4> 881 + <table class="caption-top table"> 882 + <thead> 883 + <tr class="header"> 884 + <th>Name</th> 885 + <th>Type</th> 886 + <th>Description</th> 887 + <th>Default</th> 888 + </tr> 889 + </thead> 890 + <tbody> 891 + <tr class="odd"> 892 + <td>f</td> 893 + <td><a href="`atdata.lens.LensGetter`">LensGetter</a>[<a href="`atdata.lens.S`">S</a>, <a href="`atdata.lens.V`">V</a>]</td> 894 + <td>A getter function that transforms from source type <code>S</code> to view type <code>V</code>. 
Must have exactly one parameter with a type annotation.</td> 895 + <td><em>required</em></td> 896 + </tr> 897 + </tbody> 898 + </table> 899 + </section> 900 + <section id="returns-4" class="level4 doc-section doc-section-returns"> 901 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-4">Returns</h4> 902 + <table class="caption-top table"> 903 + <thead> 904 + <tr class="header"> 905 + <th>Name</th> 906 + <th>Type</th> 907 + <th>Description</th> 908 + </tr> 909 + </thead> 910 + <tbody> 911 + <tr class="odd"> 912 + <td></td> 913 + <td><a href="`atdata.lens.Lens`">Lens</a>[<a href="`atdata.lens.S`">S</a>, <a href="`atdata.lens.V`">V</a>]</td> 914 + <td>A <code>Lens[S, V]</code> object that can be called to apply the transformation</td> 915 + </tr> 916 + <tr class="even"> 917 + <td></td> 918 + <td><a href="`atdata.lens.Lens`">Lens</a>[<a href="`atdata.lens.S`">S</a>, <a href="`atdata.lens.V`">V</a>]</td> 919 + <td>or decorated with <code>@lens_name.putter</code> to add a putter function.</td> 920 + </tr> 921 + </tbody> 922 + </table> 923 + </section> 924 + <section id="example-3" class="level4 doc-section doc-section-example"> 925 + <h4 class="doc-section doc-section-example anchored" data-anchor-id="example-3">Example</h4> 926 + <p>::</p> 927 + <pre><code>>>> @lens 928 + ... def extract_name(full: FullData) -> NameOnly: 929 + ... return NameOnly(name=full.name) 930 + ... 931 + >>> @extract_name.putter 932 + ... def extract_name_put(view: NameOnly, source: FullData) -> FullData: 933 + ... return FullData(name=view.name, age=source.age)</code></pre> 578 934 579 935 936 + </section> 580 937 </section> 581 938 </section> 582 939 </section>
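Reviewer note: the getter/putter relationship documented for `Lens` in this file (`S -> V` and `(V, S) -> S`, satisfying GetPut and PutGet) can be illustrated with a small standalone sketch. It does not import atdata. `SimpleLens`, `FullData`, and `NameOnly` here are hypothetical plain-dataclass stand-ins, assumed for illustration only, and are not the library's implementation.

```python
from dataclasses import dataclass, replace
from typing import Callable, Generic, Optional, TypeVar

S = TypeVar("S")
V = TypeVar("V")


@dataclass(frozen=True)
class FullData:
    name: str
    age: int


@dataclass(frozen=True)
class NameOnly:
    name: str


class SimpleLens(Generic[S, V]):
    """Hypothetical stand-in for a lens: a getter plus an optional putter."""

    def __init__(
        self,
        get: Callable[[S], V],
        put: Optional[Callable[[V, S], S]] = None,
    ) -> None:
        self._get = get
        self._put = put

    def get(self, s: S) -> V:
        # Project the source into the view type.
        return self._get(s)

    def put(self, v: V, s: S) -> S:
        # Write a (possibly modified) view back into the source.
        if self._put is None:
            raise ValueError("no putter registered for this lens")
        return self._put(v, s)


name_lens = SimpleLens(
    get=lambda full: NameOnly(name=full.name),
    put=lambda view, source: replace(source, name=view.name),
)

source = FullData(name="Ada", age=36)
view = name_lens.get(source)

# GetPut: putting back an unmodified view leaves the source unchanged.
assert name_lens.put(view, source) == source

# PutGet: after putting a modified view, get returns exactly that view.
updated = name_lens.put(NameOnly(name="Grace"), source)
assert name_lens.get(updated) == NameOnly(name="Grace")
assert updated.age == 36  # fields outside the view are preserved
```

The putter takes the original source as its second argument so that fields absent from the view (`age` here) can be carried over unchanged.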

+176 -15

docs/api/LensLoader.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.atmosphere.LensLoader" id="toc-atdata.atmosphere.LensLoader" class="nav-link active" data-scroll-target="#atdata.atmosphere.LensLoader">LensLoader</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 402 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 402 403 <ul class="collapse"> 403 404 <li><a href="#atdata.atmosphere.LensLoader.find_by_schemas" id="toc-atdata.atmosphere.LensLoader.find_by_schemas" class="nav-link" data-scroll-target="#atdata.atmosphere.LensLoader.find_by_schemas">find_by_schemas</a></li> ··· 420 421 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.LensLoader(client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 421 422 <p>Loads lens records from ATProto.</p> 422 423 <p>This class fetches lens transformation records. 
Note that actually using a lens requires installing the referenced code and importing it manually.</p> 423 - <p>Example: >>> client = AtmosphereClient() >>> loader = LensLoader(client) >>> >>> record = loader.get(“at://did:plc:abc/ac.foundation.dataset.lens/xyz”) >>> print(record[“name”]) >>> print(record[“sourceSchema”]) >>> print(record.get(“getterCode”, {}).get(“repository”))</p> 424 + <section id="example" class="level2 doc-section doc-section-example"> 425 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 426 + <p>::</p> 427 + <pre><code>>>> client = AtmosphereClient() 428 + >>> loader = LensLoader(client) 429 + >>> 430 + >>> record = loader.get("at://did:plc:abc/ac.foundation.dataset.lens/xyz") 431 + >>> print(record["name"]) 432 + >>> print(record["sourceSchema"]) 433 + >>> print(record.get("getterCode", {}).get("repository"))</code></pre> 434 + </section> 424 435 <section id="methods" class="level2"> 425 436 <h2 class="anchored" data-anchor-id="methods">Methods</h2> 426 437 <table class="caption-top table"> ··· 447 458 </table> 448 459 <section id="atdata.atmosphere.LensLoader.find_by_schemas" class="level3"> 449 460 <h3 class="anchored" data-anchor-id="atdata.atmosphere.LensLoader.find_by_schemas">find_by_schemas</h3> 450 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>atmosphere.LensLoader.find_by_schemas(</span> 451 - <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> source_schema_uri,</span> 452 - <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> target_schema_uri<span class="op">=</span><span class="va">None</span>,</span> 453 - <span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> repo<span class="op">=</span><span class="va">None</span>,</span> 454 - <span id="cb2-5"><a href="#cb2-5" aria-hidden="true" 
tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 461 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.LensLoader.find_by_schemas(</span> 462 + <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> source_schema_uri,</span> 463 + <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> target_schema_uri<span class="op">=</span><span class="va">None</span>,</span> 464 + <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> repo<span class="op">=</span><span class="va">None</span>,</span> 465 + <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 455 466 <p>Find lenses that transform between specific schemas.</p> 456 - <p>Args: source_schema_uri: AT URI of the source schema. target_schema_uri: Optional AT URI of the target schema. If not provided, returns all lenses from the source. 
repo: The DID of the repository to search.</p> 457 - <p>Returns: List of matching lens records.</p> 467 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 468 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 469 + <table class="caption-top table"> 470 + <thead> 471 + <tr class="header"> 472 + <th>Name</th> 473 + <th>Type</th> 474 + <th>Description</th> 475 + <th>Default</th> 476 + </tr> 477 + </thead> 478 + <tbody> 479 + <tr class="odd"> 480 + <td>source_schema_uri</td> 481 + <td><a href="`str`">str</a></td> 482 + <td>AT URI of the source schema.</td> 483 + <td><em>required</em></td> 484 + </tr> 485 + <tr class="even"> 486 + <td>target_schema_uri</td> 487 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 488 + <td>Optional AT URI of the target schema. If not provided, returns all lenses from the source.</td> 489 + <td><code>None</code></td> 490 + </tr> 491 + <tr class="odd"> 492 + <td>repo</td> 493 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 494 + <td>The DID of the repository to search.</td> 495 + <td><code>None</code></td> 496 + </tr> 497 + </tbody> 498 + </table> 499 + </section> 500 + <section id="returns" class="level4 doc-section doc-section-returns"> 501 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 502 + <table class="caption-top table"> 503 + <thead> 504 + <tr class="header"> 505 + <th>Name</th> 506 + <th>Type</th> 507 + <th>Description</th> 508 + </tr> 509 + </thead> 510 + <tbody> 511 + <tr class="odd"> 512 + <td></td> 513 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 514 + <td>List of matching lens records.</td> 515 + </tr> 516 + </tbody> 517 + </table> 518 + </section> 458 519 </section> 459 520 <section id="atdata.atmosphere.LensLoader.get" class="level3"> 460 521 <h3 class="anchored" data-anchor-id="atdata.atmosphere.LensLoader.get">get</h3> 461 - 
<div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.LensLoader.get(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 522 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>atmosphere.LensLoader.get(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 462 523 <p>Fetch a lens record by AT URI.</p> 463 - <p>Args: uri: The AT URI of the lens record.</p> 464 - <p>Returns: The lens record as a dictionary.</p> 465 - <p>Raises: ValueError: If the record is not a lens record.</p> 524 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 525 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 526 + <table class="caption-top table"> 527 + <thead> 528 + <tr class="header"> 529 + <th>Name</th> 530 + <th>Type</th> 531 + <th>Description</th> 532 + <th>Default</th> 533 + </tr> 534 + </thead> 535 + <tbody> 536 + <tr class="odd"> 537 + <td>uri</td> 538 + <td><a href="`str`">str</a> | <a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 539 + <td>The AT URI of the lens record.</td> 540 + <td><em>required</em></td> 541 + </tr> 542 + </tbody> 543 + </table> 544 + </section> 545 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 546 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 547 + <table class="caption-top table"> 548 + <thead> 549 + <tr class="header"> 550 + <th>Name</th> 551 + <th>Type</th> 552 + <th>Description</th> 553 + </tr> 554 + </thead> 555 + <tbody> 556 + <tr class="odd"> 557 + <td></td> 558 + <td><a 
href="`dict`">dict</a></td> 559 + <td>The lens record as a dictionary.</td> 560 + </tr> 561 + </tbody> 562 + </table> 563 + </section> 564 + <section id="raises" class="level4 doc-section doc-section-raises"> 565 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 566 + <table class="caption-top table"> 567 + <thead> 568 + <tr class="header"> 569 + <th>Name</th> 570 + <th>Type</th> 571 + <th>Description</th> 572 + </tr> 573 + </thead> 574 + <tbody> 575 + <tr class="odd"> 576 + <td></td> 577 + <td><a href="`ValueError`">ValueError</a></td> 578 + <td>If the record is not a lens record.</td> 579 + </tr> 580 + </tbody> 581 + </table> 582 + </section> 466 583 </section> 467 584 <section id="atdata.atmosphere.LensLoader.list_all" class="level3"> 468 585 <h3 class="anchored" data-anchor-id="atdata.atmosphere.LensLoader.list_all">list_all</h3> 469 - <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>atmosphere.LensLoader.list_all(repo<span class="op">=</span><span class="va">None</span>, limit<span class="op">=</span><span class="dv">100</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 586 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>atmosphere.LensLoader.list_all(repo<span class="op">=</span><span class="va">None</span>, limit<span class="op">=</span><span class="dv">100</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 470 587 <p>List lens records from a repository.</p> 471 - <p>Args: repo: The DID of the repository. Defaults to authenticated user. 
limit: Maximum number of records to return.</p> 472 - <p>Returns: List of lens records.</p> 588 + <section id="parameters-2" class="level4 doc-section doc-section-parameters"> 589 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h4> 590 + <table class="caption-top table"> 591 + <thead> 592 + <tr class="header"> 593 + <th>Name</th> 594 + <th>Type</th> 595 + <th>Description</th> 596 + <th>Default</th> 597 + </tr> 598 + </thead> 599 + <tbody> 600 + <tr class="odd"> 601 + <td>repo</td> 602 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 603 + <td>The DID of the repository. Defaults to authenticated user.</td> 604 + <td><code>None</code></td> 605 + </tr> 606 + <tr class="even"> 607 + <td>limit</td> 608 + <td><a href="`int`">int</a></td> 609 + <td>Maximum number of records to return.</td> 610 + <td><code>100</code></td> 611 + </tr> 612 + </tbody> 613 + </table> 614 + </section> 615 + <section id="returns-2" class="level4 doc-section doc-section-returns"> 616 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 617 + <table class="caption-top table"> 618 + <thead> 619 + <tr class="header"> 620 + <th>Name</th> 621 + <th>Type</th> 622 + <th>Description</th> 623 + </tr> 624 + </thead> 625 + <tbody> 626 + <tr class="odd"> 627 + <td></td> 628 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 629 + <td>List of lens records.</td> 630 + </tr> 631 + </tbody> 632 + </table> 473 633 474 634 635 + </section> 475 636 </section> 476 637 </section> 477 638 </section>
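
Reviewer note on `find_by_schemas`: the parameter table above says the target schema is optional and that omitting it returns all lenses from the source. The sketch below shows only that filtering rule on an in-memory list. It is not the library's implementation: the record keys `sourceSchema` and `targetSchema`, the sample URIs, and the helper name `filter_lenses` are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical lens records; the key names are assumed for this sketch only.
RECORDS = [
    {"name": "a_to_b", "sourceSchema": "at://did:plc:abc/schema/a", "targetSchema": "at://did:plc:abc/schema/b"},
    {"name": "a_to_c", "sourceSchema": "at://did:plc:abc/schema/a", "targetSchema": "at://did:plc:abc/schema/c"},
    {"name": "b_to_c", "sourceSchema": "at://did:plc:abc/schema/b", "targetSchema": "at://did:plc:abc/schema/c"},
]


def filter_lenses(
    records: list[dict],
    source_schema_uri: str,
    target_schema_uri: Optional[str] = None,
) -> list[dict]:
    """Return records matching the source schema and, if given, the target."""
    matches = []
    for record in records:
        if record.get("sourceSchema") != source_schema_uri:
            continue
        # A missing target means "any target", mirroring the documented default.
        if target_schema_uri is not None and record.get("targetSchema") != target_schema_uri:
            continue
        matches.append(record)
    return matches


from_a = filter_lenses(RECORDS, "at://did:plc:abc/schema/a")
a_to_c = filter_lenses(RECORDS, "at://did:plc:abc/schema/a", "at://did:plc:abc/schema/c")
```

With the sample data, `from_a` holds both lenses whose source is schema `a`, and `a_to_c` narrows that to the single record with the matching target.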

+238 -29

docs/api/LensPublisher.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.atmosphere.LensPublisher" id="toc-atdata.atmosphere.LensPublisher" class="nav-link active" data-scroll-target="#atdata.atmosphere.LensPublisher">LensPublisher</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 402 + <li><a href="#security-note" id="toc-security-note" class="nav-link" data-scroll-target="#security-note">Security Note</a></li> 401 403 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 402 404 <ul class="collapse"> 403 405 <li><a href="#atdata.atmosphere.LensPublisher.publish" id="toc-atdata.atmosphere.LensPublisher.publish" class="nav-link" data-scroll-target="#atdata.atmosphere.LensPublisher.publish">publish</a></li> ··· 419 421 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.LensPublisher(client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 420 422 <p>Publishes Lens transformation records to ATProto.</p> 421 423 <p>This class creates lens records that reference source and target schemas and point to the transformation code in a git repository.</p> 422 - <p>Example: >>> <span class="citation" data-cites="atdata.lens">@atdata.lens</span> … def my_lens(source: SourceType) -> TargetType: … return TargetType(field=source.other_field) >>> >>> client = AtmosphereClient() >>> client.login(“handle”, “password”) >>> >>> publisher = LensPublisher(client) >>> uri = publisher.publish( … name=“my_lens”, … source_schema_uri=“at://did:plc:abc/ac.foundation.dataset.sampleSchema/source”, … target_schema_uri=“at://did:plc:abc/ac.foundation.dataset.sampleSchema/target”, … code_repository=“https://github.com/user/repo”, … code_commit=“abc123def456”, … 
getter_path=“mymodule.lenses:my_lens”, … putter_path=“mymodule.lenses:my_lens_putter”, … )</p> 423 - <p>Security Note: Lens code is stored as references to git repositories rather than inline code. This prevents arbitrary code execution from ATProto records. Users must manually install and trust lens implementations.</p> 424 + <section id="example" class="level2 doc-section doc-section-example"> 425 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 426 + <p>::</p> 427 + <pre><code>>>> @atdata.lens 428 + ... def my_lens(source: SourceType) -> TargetType: 429 + ... return TargetType(field=source.other_field) 430 + >>> 431 + >>> client = AtmosphereClient() 432 + >>> client.login("handle", "password") 433 + >>> 434 + >>> publisher = LensPublisher(client) 435 + >>> uri = publisher.publish( 436 + ... name="my_lens", 437 + ... source_schema_uri="at://did:plc:abc/ac.foundation.dataset.sampleSchema/source", 438 + ... target_schema_uri="at://did:plc:abc/ac.foundation.dataset.sampleSchema/target", 439 + ... code_repository="https://github.com/user/repo", 440 + ... code_commit="abc123def456", 441 + ... getter_path="mymodule.lenses:my_lens", 442 + ... putter_path="mymodule.lenses:my_lens_putter", 443 + ... )</code></pre> 444 + </section> 445 + <section id="security-note" class="level2 doc-section doc-section-security-note"> 446 + <h2 class="doc-section doc-section-security-note anchored" data-anchor-id="security-note">Security Note</h2> 447 + <p>Lens code is stored as references to git repositories rather than inline code. This prevents arbitrary code execution from ATProto records. 
Users must manually install and trust lens implementations.</p> 448 + </section> 424 449 <section id="methods" class="level2"> 425 450 <h2 class="anchored" data-anchor-id="methods">Methods</h2> 426 451 <table class="caption-top table"> ··· 443 468 </table> 444 469 <section id="atdata.atmosphere.LensPublisher.publish" class="level3"> 445 470 <h3 class="anchored" data-anchor-id="atdata.atmosphere.LensPublisher.publish">publish</h3> 446 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>atmosphere.LensPublisher.publish(</span> 447 - <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> name,</span> 448 - <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> source_schema_uri,</span> 449 - <span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> target_schema_uri,</span> 450 - <span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="va">None</span>,</span> 451 - <span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> code_repository<span class="op">=</span><span class="va">None</span>,</span> 452 - <span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a> code_commit<span class="op">=</span><span class="va">None</span>,</span> 453 - <span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a> getter_path<span class="op">=</span><span class="va">None</span>,</span> 454 - <span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a> putter_path<span class="op">=</span><span class="va">None</span>,</span> 455 - <span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 456 - <span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" 
class="code-copy-button"><i class="bi"></i></button></pre></div> 471 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.LensPublisher.publish(</span> 472 + <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> name,</span> 473 + <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> source_schema_uri,</span> 474 + <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> target_schema_uri,</span> 475 + <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="va">None</span>,</span> 476 + <span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> code_repository<span class="op">=</span><span class="va">None</span>,</span> 477 + <span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> code_commit<span class="op">=</span><span class="va">None</span>,</span> 478 + <span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> getter_path<span class="op">=</span><span class="va">None</span>,</span> 479 + <span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> putter_path<span class="op">=</span><span class="va">None</span>,</span> 480 + <span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 481 + <span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 457 482 <p>Publish a lens transformation record to ATProto.</p> 458 - <p>Args: name: Human-readable lens name. source_schema_uri: AT URI of the source schema. target_schema_uri: AT URI of the target schema. description: What this transformation does. 
code_repository: Git repository URL containing the lens code. code_commit: Git commit hash for reproducibility. getter_path: Module path to the getter function (e.g., ‘mymodule.lenses:my_getter’). putter_path: Module path to the putter function (e.g., ‘mymodule.lenses:my_putter’). rkey: Optional explicit record key.</p> 459 - <p>Returns: The AT URI of the created lens record.</p> 460 - <p>Raises: ValueError: If code references are incomplete.</p> 483 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 484 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 485 + <table class="caption-top table"> 486 + <thead> 487 + <tr class="header"> 488 + <th>Name</th> 489 + <th>Type</th> 490 + <th>Description</th> 491 + <th>Default</th> 492 + </tr> 493 + </thead> 494 + <tbody> 495 + <tr class="odd"> 496 + <td>name</td> 497 + <td><a href="`str`">str</a></td> 498 + <td>Human-readable lens name.</td> 499 + <td><em>required</em></td> 500 + </tr> 501 + <tr class="even"> 502 + <td>source_schema_uri</td> 503 + <td><a href="`str`">str</a></td> 504 + <td>AT URI of the source schema.</td> 505 + <td><em>required</em></td> 506 + </tr> 507 + <tr class="odd"> 508 + <td>target_schema_uri</td> 509 + <td><a href="`str`">str</a></td> 510 + <td>AT URI of the target schema.</td> 511 + <td><em>required</em></td> 512 + </tr> 513 + <tr class="even"> 514 + <td>description</td> 515 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 516 + <td>What this transformation does.</td> 517 + <td><code>None</code></td> 518 + </tr> 519 + <tr class="odd"> 520 + <td>code_repository</td> 521 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 522 + <td>Git repository URL containing the lens code.</td> 523 + <td><code>None</code></td> 524 + </tr> 525 + <tr class="even"> 526 + <td>code_commit</td> 527 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 528 + <td>Git 
commit hash for reproducibility.</td> 529 + <td><code>None</code></td> 530 + </tr> 531 + <tr class="odd"> 532 + <td>getter_path</td> 533 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 534 + <td>Module path to the getter function (e.g., ‘mymodule.lenses:my_getter’).</td> 535 + <td><code>None</code></td> 536 + </tr> 537 + <tr class="even"> 538 + <td>putter_path</td> 539 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 540 + <td>Module path to the putter function (e.g., ‘mymodule.lenses:my_putter’).</td> 541 + <td><code>None</code></td> 542 + </tr> 543 + <tr class="odd"> 544 + <td>rkey</td> 545 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 546 + <td>Optional explicit record key.</td> 547 + <td><code>None</code></td> 548 + </tr> 549 + </tbody> 550 + </table> 551 + </section> 552 + <section id="returns" class="level4 doc-section doc-section-returns"> 553 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 554 + <table class="caption-top table"> 555 + <thead> 556 + <tr class="header"> 557 + <th>Name</th> 558 + <th>Type</th> 559 + <th>Description</th> 560 + </tr> 561 + </thead> 562 + <tbody> 563 + <tr class="odd"> 564 + <td></td> 565 + <td><a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 566 + <td>The AT URI of the created lens record.</td> 567 + </tr> 568 + </tbody> 569 + </table> 570 + </section> 571 + <section id="raises" class="level4 doc-section doc-section-raises"> 572 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 573 + <table class="caption-top table"> 574 + <thead> 575 + <tr class="header"> 576 + <th>Name</th> 577 + <th>Type</th> 578 + <th>Description</th> 579 + </tr> 580 + </thead> 581 + <tbody> 582 + <tr class="odd"> 583 + <td></td> 584 + <td><a href="`ValueError`">ValueError</a></td> 585 + <td>If code references are incomplete.</td> 586 + </tr> 587 + </tbody> 588 + </table> 589 + 
</section> 461 590 </section> 462 591 <section id="atdata.atmosphere.LensPublisher.publish_from_lens" class="level3"> 463 592 <h3 class="anchored" data-anchor-id="atdata.atmosphere.LensPublisher.publish_from_lens">publish_from_lens</h3> 464 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.LensPublisher.publish_from_lens(</span> 465 - <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> lens_obj,</span> 466 - <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 467 - <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> name,</span> 468 - <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> source_schema_uri,</span> 469 - <span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> target_schema_uri,</span> 470 - <span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> code_repository,</span> 471 - <span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> code_commit,</span> 472 - <span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="va">None</span>,</span> 473 - <span id="cb3-10"><a href="#cb3-10" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 474 - <span id="cb3-11"><a href="#cb3-11" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 593 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>atmosphere.LensPublisher.publish_from_lens(</span> 594 + <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> 
lens_obj,</span> 595 + <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 596 + <span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a> name,</span> 597 + <span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a> source_schema_uri,</span> 598 + <span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a> target_schema_uri,</span> 599 + <span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a> code_repository,</span> 600 + <span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a> code_commit,</span> 601 + <span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="va">None</span>,</span> 602 + <span id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 603 + <span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 475 604 <p>Publish a lens record from an existing Lens object.</p> 476 605 <p>This method extracts the getter and putter function names from the Lens object and publishes a record referencing them.</p> 477 - <p>Args: lens_obj: The Lens object to publish. name: Human-readable lens name. source_schema_uri: AT URI of the source schema. target_schema_uri: AT URI of the target schema. code_repository: Git repository URL. code_commit: Git commit hash. description: What this transformation does. 
rkey: Optional explicit record key.</p> 478 - <p>Returns: The AT URI of the created lens record.</p> 606 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 607 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 608 + <table class="caption-top table"> 609 + <thead> 610 + <tr class="header"> 611 + <th>Name</th> 612 + <th>Type</th> 613 + <th>Description</th> 614 + <th>Default</th> 615 + </tr> 616 + </thead> 617 + <tbody> 618 + <tr class="odd"> 619 + <td>lens_obj</td> 620 + <td><a href="`atdata.lens.Lens`">Lens</a></td> 621 + <td>The Lens object to publish.</td> 622 + <td><em>required</em></td> 623 + </tr> 624 + <tr class="even"> 625 + <td>name</td> 626 + <td><a href="`str`">str</a></td> 627 + <td>Human-readable lens name.</td> 628 + <td><em>required</em></td> 629 + </tr> 630 + <tr class="odd"> 631 + <td>source_schema_uri</td> 632 + <td><a href="`str`">str</a></td> 633 + <td>AT URI of the source schema.</td> 634 + <td><em>required</em></td> 635 + </tr> 636 + <tr class="even"> 637 + <td>target_schema_uri</td> 638 + <td><a href="`str`">str</a></td> 639 + <td>AT URI of the target schema.</td> 640 + <td><em>required</em></td> 641 + </tr> 642 + <tr class="odd"> 643 + <td>code_repository</td> 644 + <td><a href="`str`">str</a></td> 645 + <td>Git repository URL.</td> 646 + <td><em>required</em></td> 647 + </tr> 648 + <tr class="even"> 649 + <td>code_commit</td> 650 + <td><a href="`str`">str</a></td> 651 + <td>Git commit hash.</td> 652 + <td><em>required</em></td> 653 + </tr> 654 + <tr class="odd"> 655 + <td>description</td> 656 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 657 + <td>What this transformation does.</td> 658 + <td><code>None</code></td> 659 + </tr> 660 + <tr class="even"> 661 + <td>rkey</td> 662 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 663 + <td>Optional explicit record key.</td> 664 + <td><code>None</code></td> 665 
+ </tr> 666 + </tbody> 667 + </table> 668 + </section> 669 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 670 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 671 + <table class="caption-top table"> 672 + <thead> 673 + <tr class="header"> 674 + <th>Name</th> 675 + <th>Type</th> 676 + <th>Description</th> 677 + </tr> 678 + </thead> 679 + <tbody> 680 + <tr class="odd"> 681 + <td></td> 682 + <td><a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 683 + <td>The AT URI of the created lens record.</td> 684 + </tr> 685 + </tbody> 686 + </table> 479 687 480 688 689 + </section> 481 690 </section> 482 691 </section> 483 692 </section>
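
Reviewer note on `getter_path` / `putter_path`: the docs give these as strings such as 'mymodule.lenses:my_getter', and the Security Note says records carry references rather than inline code. The sketch below shows one common way a `module:attribute` reference can be resolved with the standard library. Whether atdata resolves paths this way is not shown in this diff; `resolve_path` is a hypothetical helper, and `json:dumps` stands in for a lens function so the example runs without extra packages.

```python
import importlib
from typing import Any


def resolve_path(path: str) -> Any:
    """Resolve a 'module:attribute' reference to the object it names."""
    module_name, sep, attr_name = path.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"expected 'module:attribute', got {path!r}")
    # Import succeeds only for modules already installed locally, so the
    # reference by itself never pulls in or executes remote code.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Stand-in reference: a stdlib function instead of a real lens getter.
dumps = resolve_path("json:dumps")
encoded = dumps({"ok": True})
```

Because the reference only names code, the user still has to install the package that contains it, which matches the trust model described in the Security Note.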

+16 -3

docs/api/Packable-protocol.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.Packable" id="toc-atdata.Packable" class="nav-link active" data-scroll-target="#atdata.Packable">Packable</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 402 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 403 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 403 404 <ul class="collapse"> ··· 422 423 <p>This protocol allows classes decorated with <code>@packable</code> to be recognized as valid types for lens transformations and schema operations, even though the decorator doesn’t change the class’s nominal type at static analysis time.</p> 423 424 <p>Both <code>PackableSample</code> subclasses and <code>@packable</code>-decorated classes satisfy this protocol structurally.</p> 424 425 <p>The protocol captures the full interface needed for: - Lens type transformations (as_wds, from_data) - Schema publishing (class introspection via dataclass fields) - Serialization/deserialization (packed, from_bytes)</p> 425 - <p>Example: >>> <span class="citation" data-cites="packable">@packable</span> … class MySample: … name: str … value: int … >>> def process(sample_type: Type<a href="#atdata.Packable">Packable</a>) -> None: … # Type checker knows sample_type has from_bytes, packed, etc. … instance = sample_type.from_bytes(data) … print(instance.packed)</p> 426 + <section id="example" class="level2 doc-section doc-section-example"> 427 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 428 + <p>::</p> 429 + <pre><code>>>> @packable 430 + ... class MySample: 431 + ... name: str 432 + ... value: int 433 + ... 434 + >>> def process(sample_type: Type[Packable]) -> None: 435 + ... # Type checker knows sample_type has from_bytes, packed, etc. 436 + ... 
instance = sample_type.from_bytes(data) 437 + ... print(instance.packed)</code></pre> 438 + </section> 426 439 <section id="attributes" class="level2"> 427 440 <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 428 441 <table class="caption-top table"> ··· 466 479 </table> 467 480 <section id="atdata.Packable.from_bytes" class="level3"> 468 481 <h3 class="anchored" data-anchor-id="atdata.Packable.from_bytes">from_bytes</h3> 469 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>Packable.from_bytes(bs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 482 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>Packable.from_bytes(bs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 470 483 <p>Create instance from raw msgpack bytes.</p> 471 484 </section> 472 485 <section id="atdata.Packable.from_data" class="level3"> 473 486 <h3 class="anchored" data-anchor-id="atdata.Packable.from_data">from_data</h3> 474 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>Packable.from_data(data)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 487 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>Packable.from_data(data)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 475 488 <p>Create instance from 
unpacked msgpack data dictionary.</p> 476 489 477 490
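
Reviewer note on the `Packable` protocol: the description says decorated classes satisfy the protocol structurally even though their nominal type is unchanged. The sketch below illustrates structural typing with `typing.Protocol` from the standard library. It is a simplified stand-in, not the real `atdata.Packable`: the name `PackableLike` is hypothetical, only `packed` and `from_bytes` are modeled, and JSON replaces msgpack so the example has no third-party dependencies.

```python
import json
from dataclasses import asdict, dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class PackableLike(Protocol):
    """Hypothetical, reduced version of the interface described above."""

    @property
    def packed(self) -> bytes: ...

    @classmethod
    def from_bytes(cls, bs: bytes) -> "PackableLike": ...


@dataclass
class MySample:
    # Note: no inheritance from PackableLike; the match is purely structural.
    name: str
    value: int

    @property
    def packed(self) -> bytes:
        # JSON stands in for msgpack in this sketch.
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, bs: bytes) -> "MySample":
        return cls(**json.loads(bs.decode("utf-8")))


def round_trip(sample: PackableLike) -> PackableLike:
    """Serialize and deserialize using only the protocol's members."""
    return type(sample).from_bytes(sample.packed)


sample = MySample(name="test", value=3)
restored = round_trip(sample)
```

`round_trip` accepts `MySample` because the class has the required members, which is the property the protocol documentation relies on for `@packable`-decorated classes.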

+95 -7

docs/api/PackableSample.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.PackableSample" id="toc-atdata.PackableSample" class="nav-link active" data-scroll-target="#atdata.PackableSample">PackableSample</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 402 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 403 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 403 404 <ul class="collapse"> ··· 421 422 <p>Base class for samples that can be serialized with msgpack.</p> 422 423 <p>This abstract base class provides automatic serialization/deserialization for dataclass-based samples. Fields annotated as <code>NDArray</code> or <code>NDArray | None</code> are automatically converted between numpy arrays and bytes during packing/unpacking.</p> 423 424 <p>Subclasses should be defined either by: 1. Direct inheritance with the <code>@dataclass</code> decorator 2. Using the <code>@packable</code> decorator (recommended)</p> 424 - <p>Example: >>> <span class="citation" data-cites="packable">@packable</span> … class MyData: … name: str … embeddings: NDArray … >>> sample = MyData(name=“test”, embeddings=np.array([1.0, 2.0])) >>> packed = sample.packed # Serialize to bytes >>> restored = MyData.from_bytes(packed) # Deserialize</p> 425 + <section id="example" class="level2 doc-section doc-section-example"> 426 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 427 + <p>::</p> 428 + <pre><code>>>> @packable 429 + ... class MyData: 430 + ... name: str 431 + ... embeddings: NDArray 432 + ... 
433 + >>> sample = MyData(name="test", embeddings=np.array([1.0, 2.0])) 434 + >>> packed = sample.packed # Serialize to bytes 435 + >>> restored = MyData.from_bytes(packed) # Deserialize</code></pre> 436 + </section> 425 437 <section id="attributes" class="level2"> 426 438 <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 427 439 <table class="caption-top table"> ··· 465 477 </table> 466 478 <section id="atdata.PackableSample.from_bytes" class="level3"> 467 479 <h3 class="anchored" data-anchor-id="atdata.PackableSample.from_bytes">from_bytes</h3> 468 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>PackableSample.from_bytes(bs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 480 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>PackableSample.from_bytes(bs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 469 481 <p>Create a sample instance from raw msgpack bytes.</p> 470 - <p>Args: bs: Raw bytes from a msgpack-serialized sample.</p> 471 - <p>Returns: A new instance of this sample class deserialized from the bytes.</p> 482 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 483 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 484 + <table class="caption-top table"> 485 + <thead> 486 + <tr class="header"> 487 + <th>Name</th> 488 + <th>Type</th> 489 + <th>Description</th> 490 + <th>Default</th> 491 + </tr> 492 + </thead> 493 + <tbody> 494 + <tr class="odd"> 495 + <td>bs</td> 496 + <td><a href="`bytes`">bytes</a></td> 497 + <td>Raw bytes from a msgpack-serialized sample.</td> 498 + 
<td><em>required</em></td> 499 + </tr> 500 + </tbody> 501 + </table> 502 + </section> 503 + <section id="returns" class="level4 doc-section doc-section-returns"> 504 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 505 + <table class="caption-top table"> 506 + <thead> 507 + <tr class="header"> 508 + <th>Name</th> 509 + <th>Type</th> 510 + <th>Description</th> 511 + </tr> 512 + </thead> 513 + <tbody> 514 + <tr class="odd"> 515 + <td></td> 516 + <td><a href="`typing.Self`">Self</a></td> 517 + <td>A new instance of this sample class deserialized from the bytes.</td> 518 + </tr> 519 + </tbody> 520 + </table> 521 + </section> 472 522 </section> 473 523 <section id="atdata.PackableSample.from_data" class="level3"> 474 524 <h3 class="anchored" data-anchor-id="atdata.PackableSample.from_data">from_data</h3> 475 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>PackableSample.from_data(data)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 525 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>PackableSample.from_data(data)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 476 526 <p>Create a sample instance from unpacked msgpack data.</p> 477 - <p>Args: data: Dictionary with keys matching the sample’s field names.</p> 478 - <p>Returns: New instance with NDArray fields auto-converted from bytes.</p> 527 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 528 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 529 + <table class="caption-top table"> 530 + <thead> 531 + 
<tr class="header"> 532 + <th>Name</th> 533 + <th>Type</th> 534 + <th>Description</th> 535 + <th>Default</th> 536 + </tr> 537 + </thead> 538 + <tbody> 539 + <tr class="odd"> 540 + <td>data</td> 541 + <td><a href="`atdata.dataset.WDSRawSample`">WDSRawSample</a></td> 542 + <td>Dictionary with keys matching the sample’s field names.</td> 543 + <td><em>required</em></td> 544 + </tr> 545 + </tbody> 546 + </table> 547 + </section> 548 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 549 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 550 + <table class="caption-top table"> 551 + <thead> 552 + <tr class="header"> 553 + <th>Name</th> 554 + <th>Type</th> 555 + <th>Description</th> 556 + </tr> 557 + </thead> 558 + <tbody> 559 + <tr class="odd"> 560 + <td></td> 561 + <td><a href="`typing.Self`">Self</a></td> 562 + <td>New instance with NDArray fields auto-converted from bytes.</td> 563 + </tr> 564 + </tbody> 565 + </table> 479 566 480 567 568 + </section> 481 569 </section> 482 570 </section> 483 571 </section>

+266 -29

docs/api/S3Source.html

··· 399 399 <li><a href="#atdata.S3Source" id="toc-atdata.S3Source" class="nav-link active" data-scroll-target="#atdata.S3Source">S3Source</a> 400 400 <ul class="collapse"> 401 401 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 402 403 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 403 404 <ul class="collapse"> 404 405 <li><a href="#atdata.S3Source.from_credentials" id="toc-atdata.S3Source.from_credentials" class="nav-link" data-scroll-target="#atdata.S3Source.from_credentials">from_credentials</a></li> ··· 431 432 <p>Data source for S3-compatible storage with explicit credentials.</p> 432 433 <p>Uses boto3 to stream directly from S3, supporting: - Standard AWS S3 - S3-compatible endpoints (Cloudflare R2, MinIO, etc.) - Private buckets with credentials - IAM role authentication (when keys not provided)</p> 433 434 <p>Unlike URL-based approaches, this doesn’t require URL transformation or global gopen_schemes registration. Credentials are scoped to the source instance.</p> 434 - <p>Attributes: bucket: S3 bucket name. keys: List of object keys (paths within bucket). endpoint: Optional custom endpoint URL for S3-compatible services. access_key: Optional AWS access key ID. secret_key: Optional AWS secret access key. 
region: Optional AWS region (defaults to us-east-1).</p> 435 - <p>Example: >>> source = S3Source( … bucket=“my-datasets”, … keys=[“train/shard-000.tar”, “train/shard-001.tar”], … endpoint=“https://abc123.r2.cloudflarestorage.com”, … access_key=“AKIAIOSFODNN7EXAMPLE”, … secret_key=“wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY”, … ) >>> for shard_id, stream in source.shards: … process(stream)</p> 436 - <section id="attributes" class="level2"> 437 - <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 435 + <section id="attributes" class="level2 doc-section doc-section-attributes"> 436 + <h2 class="doc-section doc-section-attributes anchored" data-anchor-id="attributes">Attributes</h2> 438 437 <table class="caption-top table"> 439 438 <thead> 440 439 <tr class="header"> 441 440 <th>Name</th> 441 + <th>Type</th> 442 442 <th>Description</th> 443 443 </tr> 444 444 </thead> 445 445 <tbody> 446 446 <tr class="odd"> 447 - <td><a href="#atdata.S3Source.shard_list">shard_list</a></td> 448 - <td>Return list of S3 URIs for the shards (deprecated, use list_shards()).</td> 447 + <td>bucket</td> 448 + <td><a href="`str`">str</a></td> 449 + <td>S3 bucket name.</td> 450 + </tr> 451 + <tr class="even"> 452 + <td>keys</td> 453 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 454 + <td>List of object keys (paths within bucket).</td> 455 + </tr> 456 + <tr class="odd"> 457 + <td>endpoint</td> 458 + <td><a href="`str`">str</a> | None</td> 459 + <td>Optional custom endpoint URL for S3-compatible services.</td> 449 460 </tr> 450 461 <tr class="even"> 451 - <td><a href="#atdata.S3Source.shards">shards</a></td> 452 - <td>Lazily yield (s3_uri, stream) pairs for each shard.</td> 462 + <td>access_key</td> 463 + <td><a href="`str`">str</a> | None</td> 464 + <td>Optional AWS access key ID.</td> 465 + </tr> 466 + <tr class="odd"> 467 + <td>secret_key</td> 468 + <td><a href="`str`">str</a> | None</td> 469 + <td>Optional AWS secret access key.</td> 470 + </tr> 471 + <tr 
class="even"> 472 + <td>region</td> 473 + <td><a href="`str`">str</a> | None</td> 474 + <td>Optional AWS region (defaults to us-east-1).</td> 453 475 </tr> 454 476 </tbody> 455 477 </table> 478 + </section> 479 + <section id="example" class="level2 doc-section doc-section-example"> 480 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 481 + <p>::</p> 482 + <pre><code>>>> source = S3Source( 483 + ... bucket="my-datasets", 484 + ... keys=["train/shard-000.tar", "train/shard-001.tar"], 485 + ... endpoint="https://abc123.r2.cloudflarestorage.com", 486 + ... access_key="AKIAIOSFODNN7EXAMPLE", 487 + ... secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY", 488 + ... ) 489 + >>> for shard_id, stream in source.shards: 490 + ... process(stream)</code></pre> 456 491 </section> 457 492 <section id="methods" class="level2"> 458 493 <h2 class="anchored" data-anchor-id="methods">Methods</h2> ··· 484 519 </table> 485 520 <section id="atdata.S3Source.from_credentials" class="level3"> 486 521 <h3 class="anchored" data-anchor-id="atdata.S3Source.from_credentials">from_credentials</h3> 487 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>S3Source.from_credentials(credentials, bucket, keys)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 522 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>S3Source.from_credentials(credentials, bucket, keys)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 488 523 <p>Create S3Source from a credentials dictionary.</p> 489 524 <p>Accepts the same credential format used by S3DataStore.</p> 490 - <p>Args: credentials: Dict 
with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_ENDPOINT. bucket: S3 bucket name. keys: List of object keys.</p> 491 - <p>Returns: Configured S3Source.</p> 492 - <p>Example: >>> creds = { … “AWS_ACCESS_KEY_ID”: “…”, … “AWS_SECRET_ACCESS_KEY”: “…”, … “AWS_ENDPOINT”: “https://r2.example.com”, … } >>> source = S3Source.from_credentials(creds, “my-bucket”, [“data.tar”])</p> 525 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 526 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 527 + <table class="caption-top table"> 528 + <thead> 529 + <tr class="header"> 530 + <th>Name</th> 531 + <th>Type</th> 532 + <th>Description</th> 533 + <th>Default</th> 534 + </tr> 535 + </thead> 536 + <tbody> 537 + <tr class="odd"> 538 + <td>credentials</td> 539 + <td><a href="`dict`">dict</a>[<a href="`str`">str</a>, <a href="`str`">str</a>]</td> 540 + <td>Dict with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_ENDPOINT.</td> 541 + <td><em>required</em></td> 542 + </tr> 543 + <tr class="even"> 544 + <td>bucket</td> 545 + <td><a href="`str`">str</a></td> 546 + <td>S3 bucket name.</td> 547 + <td><em>required</em></td> 548 + </tr> 549 + <tr class="odd"> 550 + <td>keys</td> 551 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 552 + <td>List of object keys.</td> 553 + <td><em>required</em></td> 554 + </tr> 555 + </tbody> 556 + </table> 557 + </section> 558 + <section id="returns" class="level4 doc-section doc-section-returns"> 559 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 560 + <table class="caption-top table"> 561 + <thead> 562 + <tr class="header"> 563 + <th>Name</th> 564 + <th>Type</th> 565 + <th>Description</th> 566 + </tr> 567 + </thead> 568 + <tbody> 569 + <tr class="odd"> 570 + <td></td> 571 + <td>'S3Source'</td> 572 + <td>Configured S3Source.</td> 573 + </tr> 574 + </tbody> 575 + </table> 576 + </section> 577 + 
<section id="example-1" class="level4 doc-section doc-section-example"> 578 + <h4 class="doc-section doc-section-example anchored" data-anchor-id="example-1">Example</h4> 579 + <p>::</p> 580 + <pre><code>>>> creds = { 581 + ... "AWS_ACCESS_KEY_ID": "...", 582 + ... "AWS_SECRET_ACCESS_KEY": "...", 583 + ... "AWS_ENDPOINT": "https://r2.example.com", 584 + ... } 585 + >>> source = S3Source.from_credentials(creds, "my-bucket", ["data.tar"])</code></pre> 586 + </section> 493 587 </section> 494 588 <section id="atdata.S3Source.from_urls" class="level3"> 495 589 <h3 class="anchored" data-anchor-id="atdata.S3Source.from_urls">from_urls</h3> 496 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>S3Source.from_urls(</span> 497 - <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> urls,</span> 498 - <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 499 - <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> endpoint<span class="op">=</span><span class="va">None</span>,</span> 500 - <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> access_key<span class="op">=</span><span class="va">None</span>,</span> 501 - <span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> secret_key<span class="op">=</span><span class="va">None</span>,</span> 502 - <span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> region<span class="op">=</span><span class="va">None</span>,</span> 503 - <span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 590 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" 
aria-hidden="true" tabindex="-1"></a>S3Source.from_urls(</span> 591 + <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> urls,</span> 592 + <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 593 + <span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> endpoint<span class="op">=</span><span class="va">None</span>,</span> 594 + <span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> access_key<span class="op">=</span><span class="va">None</span>,</span> 595 + <span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> secret_key<span class="op">=</span><span class="va">None</span>,</span> 596 + <span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> region<span class="op">=</span><span class="va">None</span>,</span> 597 + <span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 504 598 <p>Create S3Source from s3:// URLs.</p> 505 599 <p>Parses s3://bucket/key URLs and extracts bucket and keys. All URLs must be in the same bucket.</p> 506 - <p>Args: urls: List of s3:// URLs. endpoint: Optional custom endpoint. access_key: Optional access key. secret_key: Optional secret key. 
region: Optional region.</p> 507 - <p>Returns: S3Source configured for the given URLs.</p> 508 - <p>Raises: ValueError: If URLs are not valid s3:// URLs or span multiple buckets.</p> 509 - <p>Example: >>> source = S3Source.from_urls( … [“s3://my-bucket/train-000.tar”, “s3://my-bucket/train-001.tar”], … endpoint=“https://r2.example.com”, … )</p> 600 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 601 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 602 + <table class="caption-top table"> 603 + <thead> 604 + <tr class="header"> 605 + <th>Name</th> 606 + <th>Type</th> 607 + <th>Description</th> 608 + <th>Default</th> 609 + </tr> 610 + </thead> 611 + <tbody> 612 + <tr class="odd"> 613 + <td>urls</td> 614 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 615 + <td>List of s3:// URLs.</td> 616 + <td><em>required</em></td> 617 + </tr> 618 + <tr class="even"> 619 + <td>endpoint</td> 620 + <td><a href="`str`">str</a> | None</td> 621 + <td>Optional custom endpoint.</td> 622 + <td><code>None</code></td> 623 + </tr> 624 + <tr class="odd"> 625 + <td>access_key</td> 626 + <td><a href="`str`">str</a> | None</td> 627 + <td>Optional access key.</td> 628 + <td><code>None</code></td> 629 + </tr> 630 + <tr class="even"> 631 + <td>secret_key</td> 632 + <td><a href="`str`">str</a> | None</td> 633 + <td>Optional secret key.</td> 634 + <td><code>None</code></td> 635 + </tr> 636 + <tr class="odd"> 637 + <td>region</td> 638 + <td><a href="`str`">str</a> | None</td> 639 + <td>Optional region.</td> 640 + <td><code>None</code></td> 641 + </tr> 642 + </tbody> 643 + </table> 644 + </section> 645 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 646 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 647 + <table class="caption-top table"> 648 + <thead> 649 + <tr class="header"> 650 + <th>Name</th> 651 + <th>Type</th> 652 + 
<th>Description</th> 653 + </tr> 654 + </thead> 655 + <tbody> 656 + <tr class="odd"> 657 + <td></td> 658 + <td>'S3Source'</td> 659 + <td>S3Source configured for the given URLs.</td> 660 + </tr> 661 + </tbody> 662 + </table> 663 + </section> 664 + <section id="raises" class="level4 doc-section doc-section-raises"> 665 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 666 + <table class="caption-top table"> 667 + <thead> 668 + <tr class="header"> 669 + <th>Name</th> 670 + <th>Type</th> 671 + <th>Description</th> 672 + </tr> 673 + </thead> 674 + <tbody> 675 + <tr class="odd"> 676 + <td></td> 677 + <td><a href="`ValueError`">ValueError</a></td> 678 + <td>If URLs are not valid s3:// URLs or span multiple buckets.</td> 679 + </tr> 680 + </tbody> 681 + </table> 682 + </section> 683 + <section id="example-2" class="level4 doc-section doc-section-example"> 684 + <h4 class="doc-section doc-section-example anchored" data-anchor-id="example-2">Example</h4> 685 + <p>::</p> 686 + <pre><code>>>> source = S3Source.from_urls( 687 + ... ["s3://my-bucket/train-000.tar", "s3://my-bucket/train-001.tar"], 688 + ... endpoint="https://r2.example.com", 689 + ... 
)</code></pre> 690 + </section> 510 691 </section> 511 692 <section id="atdata.S3Source.list_shards" class="level3"> 512 693 <h3 class="anchored" data-anchor-id="atdata.S3Source.list_shards">list_shards</h3> 513 - <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>S3Source.list_shards()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 694 + <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>S3Source.list_shards()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 514 695 <p>Return list of S3 URIs for the shards.</p> 515 696 </section> 516 697 <section id="atdata.S3Source.open_shard" class="level3"> 517 698 <h3 class="anchored" data-anchor-id="atdata.S3Source.open_shard">open_shard</h3> 518 - <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>S3Source.open_shard(shard_id)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 699 + <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>S3Source.open_shard(shard_id)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 519 700 <p>Open a single shard by S3 URI.</p> 520 - <p>Args: shard_id: S3 URI of the shard (s3://bucket/key).</p> 521 - <p>Returns: StreamingBody for reading the object.</p> 522 - <p>Raises: KeyError: If shard_id is not in list_shards().</p> 701 + <section 
id="parameters-2" class="level4 doc-section doc-section-parameters"> 702 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h4> 703 + <table class="caption-top table"> 704 + <thead> 705 + <tr class="header"> 706 + <th>Name</th> 707 + <th>Type</th> 708 + <th>Description</th> 709 + <th>Default</th> 710 + </tr> 711 + </thead> 712 + <tbody> 713 + <tr class="odd"> 714 + <td>shard_id</td> 715 + <td><a href="`str`">str</a></td> 716 + <td>S3 URI of the shard (s3://bucket/key).</td> 717 + <td><em>required</em></td> 718 + </tr> 719 + </tbody> 720 + </table> 721 + </section> 722 + <section id="returns-2" class="level4 doc-section doc-section-returns"> 723 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 724 + <table class="caption-top table"> 725 + <thead> 726 + <tr class="header"> 727 + <th>Name</th> 728 + <th>Type</th> 729 + <th>Description</th> 730 + </tr> 731 + </thead> 732 + <tbody> 733 + <tr class="odd"> 734 + <td></td> 735 + <td><a href="`typing.IO`">IO</a>[<a href="`bytes`">bytes</a>]</td> 736 + <td>StreamingBody for reading the object.</td> 737 + </tr> 738 + </tbody> 739 + </table> 740 + </section> 741 + <section id="raises-1" class="level4 doc-section doc-section-raises"> 742 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-1">Raises</h4> 743 + <table class="caption-top table"> 744 + <thead> 745 + <tr class="header"> 746 + <th>Name</th> 747 + <th>Type</th> 748 + <th>Description</th> 749 + </tr> 750 + </thead> 751 + <tbody> 752 + <tr class="odd"> 753 + <td></td> 754 + <td><a href="`KeyError`">KeyError</a></td> 755 + <td>If shard_id is not in list_shards().</td> 756 + </tr> 757 + </tbody> 758 + </table> 523 759 524 760 761 + </section> 525 762 </section> 526 763 </section> 527 764 </section>
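
Reviewer note on `S3Source.from_urls`: the rendered docs say it parses `s3://bucket/key` URLs, requires all URLs to share one bucket, and raises `ValueError` otherwise. The sketch below illustrates that contract only. `split_s3_urls` is a hypothetical helper written for this note, not code from atdata, and the real implementation may differ.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def split_s3_urls(urls: List[str]) -> Tuple[str, List[str]]:
    """Hypothetical helper: split s3:// URLs into (bucket, keys).

    Mirrors the documented contract of from_urls (same bucket required,
    ValueError on bad input); it is not the atdata implementation.
    """
    if not urls:
        raise ValueError("no URLs given")

    buckets = set()
    keys = []
    for url in urls:
        parsed = urlparse(url)
        key = parsed.path.lstrip("/")
        # A usable URL needs the s3 scheme, a bucket, and a non-empty key.
        if parsed.scheme != "s3" or not parsed.netloc or not key:
            raise ValueError(f"not a valid s3:// URL: {url!r}")
        buckets.add(parsed.netloc)
        keys.append(key)

    if len(buckets) != 1:
        raise ValueError(f"URLs span multiple buckets: {sorted(buckets)}")
    return buckets.pop(), keys


bucket, keys = split_s3_urls(
    ["s3://my-bucket/train-000.tar", "s3://my-bucket/train-001.tar"]
)
```

Keys are kept in input order so that shard order matches the order of the URLs passed in.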

+47 -8

docs/api/SampleBatch.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.SampleBatch" id="toc-atdata.SampleBatch" class="nav-link active" data-scroll-target="#atdata.SampleBatch">SampleBatch</a> 400 400 <ul class="collapse"> 401 + <li><a href="#parameters" id="toc-parameters" class="nav-link" data-scroll-target="#parameters">Parameters</a></li> 401 402 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 403 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 404 + <li><a href="#note" id="toc-note" class="nav-link" data-scroll-target="#note">Note</a></li> 402 405 </ul></li> 403 406 </ul> 404 407 <div class="toc-actions"><ul><li><a href="https://github.com/your-org/atdata/edit/main/api/SampleBatch.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li><li><a href="https://github.com/your-org/atdata/issues/new" class="toc-action"><i class="bi empty"></i>Report an issue</a></li></ul></div></nav> ··· 416 419 <p>A batch of samples with automatic attribute aggregation.</p> 417 420 <p>This class wraps a sequence of samples and provides magic <code>__getattr__</code> access to aggregate sample attributes. When you access an attribute that exists on the sample type, it automatically aggregates values across all samples in the batch.</p> 418 421 <p>NDArray fields are stacked into a numpy array with a batch dimension. Other fields are aggregated into a list.</p> 419 - <p>Type Parameters: DT: The sample type, must derive from <code>PackableSample</code>.</p> 420 - <p>Attributes: samples: The list of sample instances in this batch.</p> 421 - <p>Example: >>> batch = SampleBatch<a href="[sample1, sample2, sample3]">MyData</a> >>> batch.embeddings # Returns stacked numpy array of shape (3, …) >>> batch.names # Returns list of names</p> 422 - <p>Note: This class uses Python’s <code>__orig_class__</code> mechanism to extract the type parameter at runtime. 
Instances must be created using the subscripted syntax <code>SampleBatch[MyType](samples)</code> rather than calling the constructor directly with an unsubscripted class.</p> 423 - <section id="attributes" class="level2"> 424 - <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 422 + <section id="parameters" class="level2 doc-section doc-section-parameters"> 423 + <h2 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h2> 424 + <table class="caption-top table"> 425 + <colgroup> 426 + <col style="width: 9%"> 427 + <col style="width: 9%"> 428 + <col style="width: 66%"> 429 + <col style="width: 14%"> 430 + </colgroup> 431 + <thead> 432 + <tr class="header"> 433 + <th>Name</th> 434 + <th>Type</th> 435 + <th>Description</th> 436 + <th>Default</th> 437 + </tr> 438 + </thead> 439 + <tbody> 440 + <tr class="odd"> 441 + <td>DT</td> 442 + <td></td> 443 + <td>The sample type, must derive from <code>PackableSample</code>.</td> 444 + <td><em>required</em></td> 445 + </tr> 446 + </tbody> 447 + </table> 448 + </section> 449 + <section id="attributes" class="level2 doc-section doc-section-attributes"> 450 + <h2 class="doc-section doc-section-attributes anchored" data-anchor-id="attributes">Attributes</h2> 425 451 <table class="caption-top table"> 426 452 <thead> 427 453 <tr class="header"> 428 454 <th>Name</th> 455 + <th>Type</th> 429 456 <th>Description</th> 430 457 </tr> 431 458 </thead> 432 459 <tbody> 433 460 <tr class="odd"> 434 - <td><a href="#atdata.SampleBatch.sample_type">sample_type</a></td> 435 - <td>The type of each sample in this batch.</td> 461 + <td>samples</td> 462 + <td></td> 463 + <td>The list of sample instances in this batch.</td> 436 464 </tr> 437 465 </tbody> 438 466 </table> 467 + </section> 468 + <section id="example" class="level2 doc-section doc-section-example"> 469 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 470 + <p>::</p> 471 + <pre><code>>>> batch 
= SampleBatch[MyData]([sample1, sample2, sample3]) 472 + >>> batch.embeddings # Returns stacked numpy array of shape (3, ...) 473 + >>> batch.names # Returns list of names</code></pre> 474 + </section> 475 + <section id="note" class="level2 doc-section doc-section-note"> 476 + <h2 class="doc-section doc-section-note anchored" data-anchor-id="note">Note</h2> 477 + <p>This class uses Python’s <code>__orig_class__</code> mechanism to extract the type parameter at runtime. Instances must be created using the subscripted syntax <code>SampleBatch[MyType](samples)</code> rather than calling the constructor directly with an unsubscripted class.</p> 439 478 440 479 441 480 </section>
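
Reviewer note on the `SampleBatch` Note section: the `__orig_class__` requirement is easier to see with a small stand-alone example. The sketch below uses only the standard library. `MiniBatch` and `Point` are hypothetical names for this note, and NDArray stacking is omitted, so every field is aggregated into a plain list.

```python
from dataclasses import dataclass
from typing import Any, Generic, List, TypeVar, get_args

DT = TypeVar("DT")


class MiniBatch(Generic[DT]):
    """Hypothetical stand-in for SampleBatch; not the atdata implementation."""

    def __init__(self, samples: List[DT]) -> None:
        self.samples = list(samples)

    @property
    def sample_type(self) -> type:
        # typing attaches __orig_class__ only after __init__ returns, and
        # only when the instance is created through the subscripted alias.
        orig = self.__dict__.get("__orig_class__")
        if orig is None:
            raise TypeError("create instances as MiniBatch[SomeType](samples)")
        return get_args(orig)[0]

    def __getattr__(self, name: str) -> List[Any]:
        # Reached only when normal attribute lookup fails. Dunder names and
        # 'samples' are excluded to avoid recursion before __init__ finishes.
        if name.startswith("__") or name == "samples":
            raise AttributeError(name)
        return [getattr(sample, name) for sample in self.samples]


@dataclass
class Point:
    name: str
    x: int


batch = MiniBatch[Point]([Point("a", 1), Point("b", 2)])
names = batch.name            # aggregated across the batch
point_type = batch.sample_type
```

Calling `MiniBatch([...])` without the subscript still constructs an object, but `sample_type` then raises `TypeError`, which is the failure mode the Note warns about.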

+124 -8

docs/api/SchemaLoader.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.atmosphere.SchemaLoader" id="toc-atdata.atmosphere.SchemaLoader" class="nav-link active" data-scroll-target="#atdata.atmosphere.SchemaLoader">SchemaLoader</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 402 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 402 403 <ul class="collapse"> 403 404 <li><a href="#atdata.atmosphere.SchemaLoader.get" id="toc-atdata.atmosphere.SchemaLoader.get" class="nav-link" data-scroll-target="#atdata.atmosphere.SchemaLoader.get">get</a></li> ··· 419 420 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.SchemaLoader(client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 420 421 <p>Loads PackableSample schemas from ATProto.</p> 421 422 <p>This class fetches schema records from ATProto and can list available schemas from a repository.</p> 422 - <p>Example: >>> client = AtmosphereClient() >>> client.login(“handle”, “password”) >>> >>> loader = SchemaLoader(client) >>> schema = loader.get(“at://did:plc:…/ac.foundation.dataset.sampleSchema/…”) >>> print(schema[“name”]) ‘MySample’</p> 423 + <section id="example" class="level2 doc-section doc-section-example"> 424 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 425 + <p>::</p> 426 + <pre><code>>>> client = AtmosphereClient() 427 + >>> client.login("handle", "password") 428 + >>> 429 + >>> loader = SchemaLoader(client) 430 + >>> schema = loader.get("at://did:plc:.../ac.foundation.dataset.sampleSchema/...") 431 + >>> print(schema["name"]) 432 + 'MySample'</code></pre> 433 + </section> 423 434 <section id="methods" class="level2"> 424 435 <h2 
class="anchored" data-anchor-id="methods">Methods</h2> 425 436 <table class="caption-top table"> ··· 442 453 </table> 443 454 <section id="atdata.atmosphere.SchemaLoader.get" class="level3"> 444 455 <h3 class="anchored" data-anchor-id="atdata.atmosphere.SchemaLoader.get">get</h3> 445 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>atmosphere.SchemaLoader.get(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 456 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.SchemaLoader.get(uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 446 457 <p>Fetch a schema record by AT URI.</p> 447 - <p>Args: uri: The AT URI of the schema record.</p> 448 - <p>Returns: The schema record as a dictionary.</p> 449 - <p>Raises: ValueError: If the record is not a schema record. 
atproto.exceptions.AtProtocolError: If record not found.</p> 458 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 459 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 460 + <table class="caption-top table"> 461 + <thead> 462 + <tr class="header"> 463 + <th>Name</th> 464 + <th>Type</th> 465 + <th>Description</th> 466 + <th>Default</th> 467 + </tr> 468 + </thead> 469 + <tbody> 470 + <tr class="odd"> 471 + <td>uri</td> 472 + <td><a href="`str`">str</a> | <a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 473 + <td>The AT URI of the schema record.</td> 474 + <td><em>required</em></td> 475 + </tr> 476 + </tbody> 477 + </table> 478 + </section> 479 + <section id="returns" class="level4 doc-section doc-section-returns"> 480 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 481 + <table class="caption-top table"> 482 + <thead> 483 + <tr class="header"> 484 + <th>Name</th> 485 + <th>Type</th> 486 + <th>Description</th> 487 + </tr> 488 + </thead> 489 + <tbody> 490 + <tr class="odd"> 491 + <td></td> 492 + <td><a href="`dict`">dict</a></td> 493 + <td>The schema record as a dictionary.</td> 494 + </tr> 495 + </tbody> 496 + </table> 497 + </section> 498 + <section id="raises" class="level4 doc-section doc-section-raises"> 499 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 500 + <table class="caption-top table"> 501 + <thead> 502 + <tr class="header"> 503 + <th>Name</th> 504 + <th>Type</th> 505 + <th>Description</th> 506 + </tr> 507 + </thead> 508 + <tbody> 509 + <tr class="odd"> 510 + <td></td> 511 + <td><a href="`ValueError`">ValueError</a></td> 512 + <td>If the record is not a schema record.</td> 513 + </tr> 514 + <tr class="even"> 515 + <td></td> 516 + <td><a href="`atproto`">atproto</a>.<a href="`atproto.exceptions`">exceptions</a>.<a href="`atproto.exceptions.AtProtocolError`">AtProtocolError</a></td> 
517 + <td>If record not found.</td> 518 + </tr> 519 + </tbody> 520 + </table> 521 + </section> 450 522 </section> 451 523 <section id="atdata.atmosphere.SchemaLoader.list_all" class="level3"> 452 524 <h3 class="anchored" data-anchor-id="atdata.atmosphere.SchemaLoader.list_all">list_all</h3> 453 - <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.SchemaLoader.list_all(repo<span class="op">=</span><span class="va">None</span>, limit<span class="op">=</span><span class="dv">100</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 525 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>atmosphere.SchemaLoader.list_all(repo<span class="op">=</span><span class="va">None</span>, limit<span class="op">=</span><span class="dv">100</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 454 526 <p>List schema records from a repository.</p> 455 - <p>Args: repo: The DID of the repository. Defaults to authenticated user. limit: Maximum number of records to return.</p> 456 - <p>Returns: List of schema records.</p> 527 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 528 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 529 + <table class="caption-top table"> 530 + <thead> 531 + <tr class="header"> 532 + <th>Name</th> 533 + <th>Type</th> 534 + <th>Description</th> 535 + <th>Default</th> 536 + </tr> 537 + </thead> 538 + <tbody> 539 + <tr class="odd"> 540 + <td>repo</td> 541 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 542 + <td>The DID of the repository. 
Defaults to authenticated user.</td> 543 + <td><code>None</code></td> 544 + </tr> 545 + <tr class="even"> 546 + <td>limit</td> 547 + <td><a href="`int`">int</a></td> 548 + <td>Maximum number of records to return.</td> 549 + <td><code>100</code></td> 550 + </tr> 551 + </tbody> 552 + </table> 553 + </section> 554 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 555 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 556 + <table class="caption-top table"> 557 + <thead> 558 + <tr class="header"> 559 + <th>Name</th> 560 + <th>Type</th> 561 + <th>Description</th> 562 + </tr> 563 + </thead> 564 + <tbody> 565 + <tr class="odd"> 566 + <td></td> 567 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 568 + <td>List of schema records.</td> 569 + </tr> 570 + </tbody> 571 + </table> 457 572 458 573 574 + </section> 459 575 </section> 460 576 </section> 461 577 </section>
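
Reviewer note on `SchemaLoader.get`: the docs state that `get` takes an AT URI and raises `ValueError` when the record is not a schema record. How atdata performs that check is not shown in this diff. The sketch below assumes the check can be made on the collection segment of the URI. `parse_at_uri`, `AtUriParts`, and `require_schema_uri` are hypothetical names, and the DID and record key in the usage line are placeholders.

```python
from typing import NamedTuple

# Collection name as it appears in the example URI in the docs.
SCHEMA_COLLECTION = "ac.foundation.dataset.sampleSchema"


class AtUriParts(NamedTuple):
    repo: str        # DID (or handle) of the repository
    collection: str  # record collection (NSID)
    rkey: str        # record key


def parse_at_uri(uri: str) -> AtUriParts:
    """Hypothetical parser for at://<repo>/<collection>/<rkey> URIs."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://repo/collection/rkey, got {uri!r}")
    return AtUriParts(*parts)


def require_schema_uri(uri: str) -> AtUriParts:
    """Assumed check: reject URIs that point outside the schema collection."""
    parts = parse_at_uri(uri)
    if parts.collection != SCHEMA_COLLECTION:
        raise ValueError(f"not a schema record: {uri!r}")
    return parts


parts = require_schema_uri(
    "at://did:plc:example123/ac.foundation.dataset.sampleSchema/3kabc"
)
```

A URI-level check like this can fail fast before any network call, though the library may instead inspect the fetched record itself.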

+120 -13

docs/api/SchemaPublisher.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.atmosphere.SchemaPublisher" id="toc-atdata.atmosphere.SchemaPublisher" class="nav-link active" data-scroll-target="#atdata.atmosphere.SchemaPublisher">SchemaPublisher</a> 400 400 <ul class="collapse"> 401 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 401 402 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 402 403 <ul class="collapse"> 403 404 <li><a href="#atdata.atmosphere.SchemaPublisher.publish" id="toc-atdata.atmosphere.SchemaPublisher.publish" class="nav-link" data-scroll-target="#atdata.atmosphere.SchemaPublisher.publish">publish</a></li> ··· 418 419 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.SchemaPublisher(client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 419 420 <p>Publishes PackableSample schemas to ATProto.</p> 420 421 <p>This class introspects a PackableSample class to extract its field definitions and publishes them as an ATProto schema record.</p> 421 - <p>Example: >>> <span class="citation" data-cites="atdata.packable">@atdata.packable</span> … class MySample: … image: NDArray … label: str … >>> client = AtmosphereClient() >>> client.login(“handle”, “password”) >>> >>> publisher = SchemaPublisher(client) >>> uri = publisher.publish(MySample, version=“1.0.0”) >>> print(uri) at://did:plc:…/ac.foundation.dataset.sampleSchema/…</p> 422 + <section id="example" class="level2 doc-section doc-section-example"> 423 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 424 + <p>::</p> 425 + <pre><code>>>> @atdata.packable 426 + ... class MySample: 427 + ... image: NDArray 428 + ... label: str 429 + ... 
430 + >>> client = AtmosphereClient() 431 + >>> client.login("handle", "password") 432 + >>> 433 + >>> publisher = SchemaPublisher(client) 434 + >>> uri = publisher.publish(MySample, version="1.0.0") 435 + >>> print(uri) 436 + at://did:plc:.../ac.foundation.dataset.sampleSchema/...</code></pre> 437 + </section> 422 438 <section id="methods" class="level2"> 423 439 <h2 class="anchored" data-anchor-id="methods">Methods</h2> 424 440 <table class="caption-top table"> ··· 437 453 </table> 438 454 <section id="atdata.atmosphere.SchemaPublisher.publish" class="level3"> 439 455 <h3 class="anchored" data-anchor-id="atdata.atmosphere.SchemaPublisher.publish">publish</h3> 440 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>atmosphere.SchemaPublisher.publish(</span> 441 - <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> sample_type,</span> 442 - <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 443 - <span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="va">None</span>,</span> 444 - <span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">'1.0.0'</span>,</span> 445 - <span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="va">None</span>,</span> 446 - <span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a> metadata<span class="op">=</span><span class="va">None</span>,</span> 447 - <span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 448 - <span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" 
class="code-copy-button"><i class="bi"></i></button></pre></div> 456 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.SchemaPublisher.publish(</span> 457 + <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a> sample_type,</span> 458 + <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 459 + <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="va">None</span>,</span> 460 + <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">'1.0.0'</span>,</span> 461 + <span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="va">None</span>,</span> 462 + <span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> metadata<span class="op">=</span><span class="va">None</span>,</span> 463 + <span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a> rkey<span class="op">=</span><span class="va">None</span>,</span> 464 + <span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 449 465 <p>Publish a PackableSample schema to ATProto.</p> 450 - <p>Args: sample_type: The PackableSample class to publish. name: Human-readable name. Defaults to the class name. version: Semantic version string (e.g., ‘1.0.0’). description: Human-readable description. metadata: Arbitrary metadata dictionary. rkey: Optional explicit record key. If not provided, a TID is generated.</p> 451 - <p>Returns: The AT URI of the created schema record.</p> 452 - <p>Raises: ValueError: If sample_type is not a dataclass or client is not authenticated. 
TypeError: If a field type is not supported.</p> 466 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 467 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 468 + <table class="caption-top table"> 469 + <thead> 470 + <tr class="header"> 471 + <th>Name</th> 472 + <th>Type</th> 473 + <th>Description</th> 474 + <th>Default</th> 475 + </tr> 476 + </thead> 477 + <tbody> 478 + <tr class="odd"> 479 + <td>sample_type</td> 480 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata.atmosphere.schema.ST`">ST</a>]</td> 481 + <td>The PackableSample class to publish.</td> 482 + <td><em>required</em></td> 483 + </tr> 484 + <tr class="even"> 485 + <td>name</td> 486 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 487 + <td>Human-readable name. Defaults to the class name.</td> 488 + <td><code>None</code></td> 489 + </tr> 490 + <tr class="odd"> 491 + <td>version</td> 492 + <td><a href="`str`">str</a></td> 493 + <td>Semantic version string (e.g., ‘1.0.0’).</td> 494 + <td><code>'1.0.0'</code></td> 495 + </tr> 496 + <tr class="even"> 497 + <td>description</td> 498 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 499 + <td>Human-readable description.</td> 500 + <td><code>None</code></td> 501 + </tr> 502 + <tr class="odd"> 503 + <td>metadata</td> 504 + <td><a href="`typing.Optional`">Optional</a>[<a href="`dict`">dict</a>]</td> 505 + <td>Arbitrary metadata dictionary.</td> 506 + <td><code>None</code></td> 507 + </tr> 508 + <tr class="even"> 509 + <td>rkey</td> 510 + <td><a href="`typing.Optional`">Optional</a>[<a href="`str`">str</a>]</td> 511 + <td>Optional explicit record key. 
If not provided, a TID is generated.</td> 512 + <td><code>None</code></td> 513 + </tr> 514 + </tbody> 515 + </table> 516 + </section> 517 + <section id="returns" class="level4 doc-section doc-section-returns"> 518 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 519 + <table class="caption-top table"> 520 + <thead> 521 + <tr class="header"> 522 + <th>Name</th> 523 + <th>Type</th> 524 + <th>Description</th> 525 + </tr> 526 + </thead> 527 + <tbody> 528 + <tr class="odd"> 529 + <td></td> 530 + <td><a href="`atdata.atmosphere._types.AtUri`">AtUri</a></td> 531 + <td>The AT URI of the created schema record.</td> 532 + </tr> 533 + </tbody> 534 + </table> 535 + </section> 536 + <section id="raises" class="level4 doc-section doc-section-raises"> 537 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 538 + <table class="caption-top table"> 539 + <thead> 540 + <tr class="header"> 541 + <th>Name</th> 542 + <th>Type</th> 543 + <th>Description</th> 544 + </tr> 545 + </thead> 546 + <tbody> 547 + <tr class="odd"> 548 + <td></td> 549 + <td><a href="`ValueError`">ValueError</a></td> 550 + <td>If sample_type is not a dataclass or client is not authenticated.</td> 551 + </tr> 552 + <tr class="even"> 553 + <td></td> 554 + <td><a href="`TypeError`">TypeError</a></td> 555 + <td>If a field type is not supported.</td> 556 + </tr> 557 + </tbody> 558 + </table> 453 559 454 560 561 + </section> 455 562 </section> 456 563 </section> 457 564 </section>
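Review note: the Parameters, Returns, and Raises tables and the separate Example section in this hunk come from Google-style docstring sections. The sketch below shows a docstring of that shape and a small check for the `Example:` / `::` layout described in CLAUDE.md. `publish_stub` and `has_literal_block_example` are hypothetical names for illustration; this is not the real `SchemaPublisher.publish`, and the `stub://` URI is made up.

```python
import inspect
import re


def publish_stub(sample_type, *, name=None, version="1.0.0"):
    """Publish a schema (hypothetical stand-in, not the real method).

    Args:
        sample_type: The class to publish.
        name: Human-readable name. Defaults to the class name.
        version: Semantic version string (e.g., '1.0.0').

    Returns:
        A placeholder URI string.

    Raises:
        ValueError: If sample_type is not a class.

    Example:
        ::

            >>> uri = publish_stub(dict, version="1.0.0")
            >>> print(uri)
            stub://dict/1.0.0
    """
    if not isinstance(sample_type, type):
        raise ValueError("sample_type must be a class")
    return f"stub://{name or sample_type.__name__}/{version}"


def has_literal_block_example(doc):
    """Return True if 'Example:' is followed by '::', a blank line, then a prompt."""
    # cleandoc strips the common indentation, so 'Example:' sits at column 0,
    # '::' at 4 spaces, and the example code at 8 spaces.
    cleaned = inspect.cleandoc(doc or "")
    pattern = r"^Example:\n {4}::\n\n {8}>>> "
    return re.search(pattern, cleaned, re.MULTILINE) is not None
```

A check like this could run in CI over public docstrings, assuming the project wants to enforce the convention mechanically rather than by review.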

+75 -15

docs/api/URLSource.html

··· 399 399 <li><a href="#atdata.URLSource" id="toc-atdata.URLSource" class="nav-link active" data-scroll-target="#atdata.URLSource">URLSource</a> 400 400 <ul class="collapse"> 401 401 <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 402 403 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 403 404 <ul class="collapse"> 404 405 <li><a href="#atdata.URLSource.list_shards" id="toc-atdata.URLSource.list_shards" class="nav-link" data-scroll-target="#atdata.URLSource.list_shards">list_shards</a></li> ··· 421 422 <p>Data source for WebDataset-compatible URLs.</p> 422 423 <p>Wraps WebDataset’s gopen to open URLs using built-in handlers for http, https, pipe, gs, hf, sftp, etc. Supports brace expansion for shard patterns like “data-{000..099}.tar”.</p> 423 424 <p>This is the default source type when a string URL is passed to Dataset.</p> 424 - <p>Attributes: url: URL or brace pattern for the shards.</p> 425 - <p>Example: >>> source = URLSource(“https://example.com/train-{000..009}.tar”) >>> for shard_id, stream in source.shards: … print(f”Streaming {shard_id}“)</p> 426 - <section id="attributes" class="level2"> 427 - <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 425 + <section id="attributes" class="level2 doc-section doc-section-attributes"> 426 + <h2 class="doc-section doc-section-attributes anchored" data-anchor-id="attributes">Attributes</h2> 428 427 <table class="caption-top table"> 429 428 <thead> 430 429 <tr class="header"> 431 430 <th>Name</th> 431 + <th>Type</th> 432 432 <th>Description</th> 433 433 </tr> 434 434 </thead> 435 435 <tbody> 436 436 <tr class="odd"> 437 - <td><a href="#atdata.URLSource.shard_list">shard_list</a></td> 438 - <td>Expand brace pattern and return list of shard URLs (deprecated, use 
list_shards()).</td> 439 - </tr> 440 - <tr class="even"> 441 - <td><a href="#atdata.URLSource.shards">shards</a></td> 442 - <td>Lazily yield (url, stream) pairs for each shard.</td> 437 + <td>url</td> 438 + <td><a href="`str`">str</a></td> 439 + <td>URL or brace pattern for the shards.</td> 443 440 </tr> 444 441 </tbody> 445 442 </table> 446 443 </section> 444 + <section id="example" class="level2 doc-section doc-section-example"> 445 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 446 + <p>::</p> 447 + <pre><code>>>> source = URLSource("https://example.com/train-{000..009}.tar") 448 + >>> for shard_id, stream in source.shards: 449 + ... print(f"Streaming {shard_id}")</code></pre> 450 + </section> 447 451 <section id="methods" class="level2"> 448 452 <h2 class="anchored" data-anchor-id="methods">Methods</h2> 449 453 <table class="caption-top table"> ··· 466 470 </table> 467 471 <section id="atdata.URLSource.list_shards" class="level3"> 468 472 <h3 class="anchored" data-anchor-id="atdata.URLSource.list_shards">list_shards</h3> 469 - <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>URLSource.list_shards()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 473 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>URLSource.list_shards()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 470 474 <p>Expand brace pattern and return list of shard URLs.</p> 471 475 </section> 472 476 <section id="atdata.URLSource.open_shard" class="level3"> 473 477 <h3 class="anchored" data-anchor-id="atdata.URLSource.open_shard">open_shard</h3> 474 - <div 
class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>URLSource.open_shard(shard_id)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 478 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>URLSource.open_shard(shard_id)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 475 479 <p>Open a single shard by URL.</p> 476 - <p>Args: shard_id: URL of the shard to open.</p> 477 - <p>Returns: File-like stream from gopen.</p> 478 - <p>Raises: KeyError: If shard_id is not in list_shards().</p> 480 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 481 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 482 + <table class="caption-top table"> 483 + <thead> 484 + <tr class="header"> 485 + <th>Name</th> 486 + <th>Type</th> 487 + <th>Description</th> 488 + <th>Default</th> 489 + </tr> 490 + </thead> 491 + <tbody> 492 + <tr class="odd"> 493 + <td>shard_id</td> 494 + <td><a href="`str`">str</a></td> 495 + <td>URL of the shard to open.</td> 496 + <td><em>required</em></td> 497 + </tr> 498 + </tbody> 499 + </table> 500 + </section> 501 + <section id="returns" class="level4 doc-section doc-section-returns"> 502 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 503 + <table class="caption-top table"> 504 + <thead> 505 + <tr class="header"> 506 + <th>Name</th> 507 + <th>Type</th> 508 + <th>Description</th> 509 + </tr> 510 + </thead> 511 + <tbody> 512 + <tr class="odd"> 513 + <td></td> 514 + <td><a href="`typing.IO`">IO</a>[<a href="`bytes`">bytes</a>]</td> 515 + <td>File-like stream from 
gopen.</td> 516 + </tr> 517 + </tbody> 518 + </table> 519 + </section> 520 + <section id="raises" class="level4 doc-section doc-section-raises"> 521 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 522 + <table class="caption-top table"> 523 + <thead> 524 + <tr class="header"> 525 + <th>Name</th> 526 + <th>Type</th> 527 + <th>Description</th> 528 + </tr> 529 + </thead> 530 + <tbody> 531 + <tr class="odd"> 532 + <td></td> 533 + <td><a href="`KeyError`">KeyError</a></td> 534 + <td>If shard_id is not in list_shards().</td> 535 + </tr> 536 + </tbody> 537 + </table> 479 538 480 539 540 + </section> 481 541 </section> 482 542 </section> 483 543 </section>
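Review note: the URLSource page mentions brace expansion for shard patterns such as "data-{000..099}.tar" and says `list_shards()` returns the expanded URLs. As a rough illustration of what that expansion means, here is a minimal sketch that handles only ascending numeric ranges. `expand_numeric_range` is a hypothetical helper, not the code atdata or WebDataset uses, and it does not cover comma lists like `{train,test}`.

```python
import re

# Matches a numeric range such as {000..099}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_numeric_range(pattern):
    """Expand every {start..end} numeric range in pattern, left to right."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = match.group(1), match.group(2)
    # Zero-pad only when an endpoint is written with a leading zero.
    padded = (len(start) > 1 and start[0] == "0") or (len(end) > 1 and end[0] == "0")
    width = max(len(start), len(end)) if padded else 0
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded = []
    # Ascending ranges only; a descending range yields an empty list here.
    for number in range(int(start), int(end) + 1):
        for rest in expand_numeric_range(tail):
            expanded.append(f"{head}{str(number).zfill(width)}{rest}")
    return expanded
```

For example, `expand_numeric_range("https://example.com/train-{000..009}.tar")` yields ten URLs, which matches the shard pattern used in the class Example above.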

+136 -25

docs/api/load_dataset.html

··· 396 396 <h2 id="toc-title">On this page</h2> 397 397 398 398 <ul> 399 - <li><a href="#atdata.load_dataset" id="toc-atdata.load_dataset" class="nav-link active" data-scroll-target="#atdata.load_dataset">load_dataset</a></li> 399 + <li><a href="#atdata.load_dataset" id="toc-atdata.load_dataset" class="nav-link active" data-scroll-target="#atdata.load_dataset">load_dataset</a> 400 + <ul class="collapse"> 401 + <li><a href="#parameters" id="toc-parameters" class="nav-link" data-scroll-target="#parameters">Parameters</a></li> 402 + <li><a href="#returns" id="toc-returns" class="nav-link" data-scroll-target="#returns">Returns</a></li> 403 + <li><a href="#raises" id="toc-raises" class="nav-link" data-scroll-target="#raises">Raises</a></li> 404 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 405 + </ul></li> 400 406 </ul> 401 407 <div class="toc-actions"><ul><li><a href="https://github.com/your-org/atdata/edit/main/api/load_dataset.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li><li><a href="https://github.com/your-org/atdata/issues/new" class="toc-action"><i class="bi empty"></i>Report an issue</a></li></ul></div></nav> 402 408 </div> ··· 421 427 <p>Load a dataset from local files, remote URLs, or an index.</p> 422 428 <p>This function provides a HuggingFace Datasets-style interface for loading atdata typed datasets. It handles path resolution, split detection, and returns either a single Dataset or a DatasetDict depending on the split parameter.</p> 423 429 <p>When no <code>sample_type</code> is provided, returns a <code>Dataset[DictSample]</code> that provides dynamic dict-like access to fields. Use <code>.as_type(MyType)</code> to convert to a typed schema.</p> 424 - <p>Args: path: Path to dataset. 
Can be: - Index lookup: “<span class="citation" data-cites="handle/dataset-name">@handle/dataset-name</span>” or “<span class="citation" data-cites="local/dataset-name">@local/dataset-name</span>” - WebDataset brace notation: “path/to/{train,test}-{000..099}.tar” - Local directory: “./data/” (scans for .tar files) - Glob pattern: “path/to/<em>.tar” - Remote URL: ”s3://bucket/path/data-</em>.tar” - Single file: “path/to/data.tar”</p> 425 - <pre><code>sample_type: The PackableSample subclass defining the schema. If None, 426 - returns ``Dataset[DictSample]`` with dynamic field access. Can also 427 - be resolved from an index when using @handle/dataset syntax. 428 - 429 - split: Which split to load. If None, returns a DatasetDict with all 430 - detected splits. If specified (e.g., "train", "test"), returns 431 - a single Dataset for that split. 432 - 433 - data_files: Optional explicit mapping of data files. Can be: 434 - - str: Single file pattern 435 - - list[str]: List of file patterns (assigned to "train") 436 - - dict[str, str | list[str]]: Explicit split -> files mapping 437 - 438 - streaming: If True, explicitly marks the dataset for streaming mode. 439 - Note: atdata Datasets are already lazy/streaming via WebDataset 440 - pipelines, so this parameter primarily signals intent. 441 - 442 - index: Optional AbstractIndex for dataset lookup. Required when using 443 - @handle/dataset syntax. When provided with an indexed path, the 444 - schema can be auto-resolved from the index.</code></pre> 445 - <p>Returns: If split is None: DatasetDict with all detected splits. If split is specified: Dataset for that split. Type is <code>ST</code> if sample_type provided, otherwise <code>DictSample</code>.</p> 446 - <p>Raises: ValueError: If the specified split is not found. FileNotFoundError: If no data files are found at the path. 
KeyError: If dataset not found in index.</p> 447 - <p>Example: >>> # Load without type - get DictSample for exploration >>> ds = load_dataset(“./data/train.tar”, split=“train”) >>> for sample in ds.ordered(): … print(sample.keys()) # Explore fields … print(sample[“text”]) # Dict-style access … print(sample.label) # Attribute access >>> >>> # Convert to typed schema >>> typed_ds = ds.as_type(TextData) >>> >>> # Or load with explicit type directly >>> train_ds = load_dataset(“./data/train-*.tar”, TextData, split=“train”) >>> >>> # Load from index with auto-type resolution >>> index = LocalIndex() >>> ds = load_dataset(“<span class="citation" data-cites="local/my-dataset">@local/my-dataset</span>”, index=index, split=“train”)</p> 430 + <section id="parameters" class="level2 doc-section doc-section-parameters"> 431 + <h2 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h2> 432 + <table class="caption-top table"> 433 + <thead> 434 + <tr class="header"> 435 + <th>Name</th> 436 + <th>Type</th> 437 + <th>Description</th> 438 + <th>Default</th> 439 + </tr> 440 + </thead> 441 + <tbody> 442 + <tr class="odd"> 443 + <td>path</td> 444 + <td><a href="`str`">str</a></td> 445 + <td>Path to dataset. Can be: - Index lookup: “<span class="citation" data-cites="handle/dataset-name">@handle/dataset-name</span>” or “<span class="citation" data-cites="local/dataset-name">@local/dataset-name</span>” - WebDataset brace notation: “path/to/{train,test}-{000..099}.tar” - Local directory: “./data/” (scans for .tar files) - Glob pattern: “path/to/<em>.tar” - Remote URL: ”s3://bucket/path/data-</em>.tar” - Single file: “path/to/data.tar”</td> 446 + <td><em>required</em></td> 447 + </tr> 448 + <tr class="even"> 449 + <td>sample_type</td> 450 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata._hf_api.ST`">ST</a>] | None</td> 451 + <td>The PackableSample subclass defining the schema. 
If None, returns <code>Dataset[DictSample]</code> with dynamic field access. Can also be resolved from an index when using <span class="citation" data-cites="handle/dataset">@handle/dataset</span> syntax.</td> 452 + <td><code>None</code></td> 453 + </tr> 454 + <tr class="odd"> 455 + <td>split</td> 456 + <td><a href="`str`">str</a> | None</td> 457 + <td>Which split to load. If None, returns a DatasetDict with all detected splits. If specified (e.g., “train”, “test”), returns a single Dataset for that split.</td> 458 + <td><code>None</code></td> 459 + </tr> 460 + <tr class="even"> 461 + <td>data_files</td> 462 + <td><a href="`str`">str</a> | <a href="`list`">list</a>[<a href="`str`">str</a>] | <a href="`dict`">dict</a>[<a href="`str`">str</a>, <a href="`str`">str</a> | <a href="`list`">list</a>[<a href="`str`">str</a>]] | None</td> 463 + <td>Optional explicit mapping of data files. Can be: - str: Single file pattern - list[str]: List of file patterns (assigned to “train”) - dict[str, str | list[str]]: Explicit split -> files mapping</td> 464 + <td><code>None</code></td> 465 + </tr> 466 + <tr class="odd"> 467 + <td>streaming</td> 468 + <td><a href="`bool`">bool</a></td> 469 + <td>If True, explicitly marks the dataset for streaming mode. Note: atdata Datasets are already lazy/streaming via WebDataset pipelines, so this parameter primarily signals intent.</td> 470 + <td><code>False</code></td> 471 + </tr> 472 + <tr class="even"> 473 + <td>index</td> 474 + <td><a href="`typing.Optional`">Optional</a>['AbstractIndex']</td> 475 + <td>Optional AbstractIndex for dataset lookup. Required when using <span class="citation" data-cites="handle/dataset">@handle/dataset</span> syntax. 
When provided with an indexed path, the schema can be auto-resolved from the index.</td> 476 + <td><code>None</code></td> 477 + </tr> 478 + </tbody> 479 + </table> 480 + </section> 481 + <section id="returns" class="level2 doc-section doc-section-returns"> 482 + <h2 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h2> 483 + <table class="caption-top table"> 484 + <thead> 485 + <tr class="header"> 486 + <th>Name</th> 487 + <th>Type</th> 488 + <th>Description</th> 489 + </tr> 490 + </thead> 491 + <tbody> 492 + <tr class="odd"> 493 + <td></td> 494 + <td><a href="`atdata.dataset.Dataset`">Dataset</a>[<a href="`atdata._hf_api.ST`">ST</a>] | <a href="`atdata._hf_api.DatasetDict`">DatasetDict</a>[<a href="`atdata._hf_api.ST`">ST</a>]</td> 495 + <td>If split is None: DatasetDict with all detected splits.</td> 496 + </tr> 497 + <tr class="even"> 498 + <td></td> 499 + <td><a href="`atdata.dataset.Dataset`">Dataset</a>[<a href="`atdata._hf_api.ST`">ST</a>] | <a href="`atdata._hf_api.DatasetDict`">DatasetDict</a>[<a href="`atdata._hf_api.ST`">ST</a>]</td> 500 + <td>If split is specified: Dataset for that split.</td> 501 + </tr> 502 + <tr class="odd"> 503 + <td></td> 504 + <td><a href="`atdata.dataset.Dataset`">Dataset</a>[<a href="`atdata._hf_api.ST`">ST</a>] | <a href="`atdata._hf_api.DatasetDict`">DatasetDict</a>[<a href="`atdata._hf_api.ST`">ST</a>]</td> 505 + <td>Type is <code>ST</code> if sample_type provided, otherwise <code>DictSample</code>.</td> 506 + </tr> 507 + </tbody> 508 + </table> 509 + </section> 510 + <section id="raises" class="level2 doc-section doc-section-raises"> 511 + <h2 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h2> 512 + <table class="caption-top table"> 513 + <thead> 514 + <tr class="header"> 515 + <th>Name</th> 516 + <th>Type</th> 517 + <th>Description</th> 518 + </tr> 519 + </thead> 520 + <tbody> 521 + <tr class="odd"> 522 + <td></td> 523 + <td><a 
href="`ValueError`">ValueError</a></td> 524 + <td>If the specified split is not found.</td> 525 + </tr> 526 + <tr class="even"> 527 + <td></td> 528 + <td><a href="`FileNotFoundError`">FileNotFoundError</a></td> 529 + <td>If no data files are found at the path.</td> 530 + </tr> 531 + <tr class="odd"> 532 + <td></td> 533 + <td><a href="`KeyError`">KeyError</a></td> 534 + <td>If dataset not found in index.</td> 535 + </tr> 536 + </tbody> 537 + </table> 538 + </section> 539 + <section id="example" class="level2 doc-section doc-section-example"> 540 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 541 + <p>::</p> 542 + <pre><code>>>> # Load without type - get DictSample for exploration 543 + >>> ds = load_dataset("./data/train.tar", split="train") 544 + >>> for sample in ds.ordered(): 545 + ... print(sample.keys()) # Explore fields 546 + ... print(sample["text"]) # Dict-style access 547 + ... print(sample.label) # Attribute access 548 + >>> 549 + >>> # Convert to typed schema 550 + >>> typed_ds = ds.as_type(TextData) 551 + >>> 552 + >>> # Or load with explicit type directly 553 + >>> train_ds = load_dataset("./data/train-*.tar", TextData, split="train") 554 + >>> 555 + >>> # Load from index with auto-type resolution 556 + >>> index = LocalIndex() 557 + >>> ds = load_dataset("@local/my-dataset", index=index, split="train")</code></pre> 448 558 449 559 560 + </section> 450 561 </section> 451 562 452 563 </main>
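Review note: the `data_files` row lists three accepted shapes (a single pattern, a list assigned to "train", or an explicit split-to-files mapping). One way to read that is as a normalization to a single `dict[str, list[str]]`, sketched below. `normalize_data_files` is a hypothetical helper and not atdata's implementation; placing a bare string under "train" is an assumption, since the docstring only states that default for lists.

```python
def normalize_data_files(data_files):
    """Normalize the documented data_files shapes to {split: [patterns]}."""
    if data_files is None:
        return {}
    if isinstance(data_files, str):
        # Assumption: a single pattern goes to "train", mirroring the list case.
        return {"train": [data_files]}
    if isinstance(data_files, list):
        # Documented: a list of patterns is assigned to "train".
        return {"train": list(data_files)}
    if isinstance(data_files, dict):
        # Explicit split -> files mapping; each value may be a str or a list.
        return {
            split: [files] if isinstance(files, str) else list(files)
            for split, files in data_files.items()
        }
    raise TypeError(f"unsupported data_files type: {type(data_files).__name__}")
```

With a normalized mapping, the `split` behaviour in the Returns table becomes a lookup: a named split selects one entry, and `split=None` corresponds to all of them.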

+869 -79

docs/api/local.Index.html

··· 441 441 <p>Redis-backed index for tracking datasets in a repository.</p> 442 442 <p>Implements the AbstractIndex protocol. Maintains a registry of LocalDatasetEntry objects in Redis, allowing enumeration and lookup of stored datasets.</p> 443 443 <p>When initialized with a data_store, insert_dataset() will write dataset shards to storage before indexing. Without a data_store, insert_dataset() only indexes existing URLs.</p> 444 - <p>Attributes: _redis: Redis connection for index storage. _data_store: Optional AbstractDataStore for writing dataset shards.</p> 445 - <section id="attributes" class="level2"> 446 - <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 444 + <section id="attributes" class="level2 doc-section doc-section-attributes"> 445 + <h2 class="doc-section doc-section-attributes anchored" data-anchor-id="attributes">Attributes</h2> 447 446 <table class="caption-top table"> 447 + <colgroup> 448 + <col style="width: 16%"> 449 + <col style="width: 10%"> 450 + <col style="width: 72%"> 451 + </colgroup> 448 452 <thead> 449 453 <tr class="header"> 450 454 <th>Name</th> 455 + <th>Type</th> 451 456 <th>Description</th> 452 457 </tr> 453 458 </thead> 454 459 <tbody> 455 460 <tr class="odd"> 456 - <td><a href="#atdata.local.Index.all_entries">all_entries</a></td> 457 - <td>Get all index entries as a list (deprecated, use list_entries()).</td> 461 + <td>_redis</td> 462 + <td></td> 463 + <td>Redis connection for index storage.</td> 458 464 </tr> 459 465 <tr class="even"> 460 - <td><a href="#atdata.local.Index.data_store">data_store</a></td> 461 - <td>The data store for writing shards, or None if index-only.</td> 462 - </tr> 463 - <tr class="odd"> 464 - <td><a href="#atdata.local.Index.datasets">datasets</a></td> 465 - <td>Lazily iterate over all dataset entries (AbstractIndex protocol).</td> 466 - </tr> 467 - <tr class="even"> 468 - <td><a href="#atdata.local.Index.entries">entries</a></td> 469 - <td>Iterate over all index entries.</td> 470 - 
</tr> 471 - <tr class="odd"> 472 - <td><a href="#atdata.local.Index.schemas">schemas</a></td> 473 - <td>Iterate over all schema records in this index.</td> 474 - </tr> 475 - <tr class="even"> 476 - <td><a href="#atdata.local.Index.stub_dir">stub_dir</a></td> 477 - <td>Directory where stub files are written, or None if auto-stubs disabled.</td> 478 - </tr> 479 - <tr class="odd"> 480 - <td><a href="#atdata.local.Index.types">types</a></td> 481 - <td>Namespace for accessing loaded schema types.</td> 466 + <td>_data_store</td> 467 + <td></td> 468 + <td>Optional AbstractDataStore for writing dataset shards.</td> 482 469 </tr> 483 470 </tbody> 484 471 </table> ··· 564 551 <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>local.Index.add_entry(ds, <span class="op">*</span>, name, schema_ref<span class="op">=</span><span class="va">None</span>, metadata<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 565 552 <p>Add a dataset to the index.</p> 566 553 <p>Creates a LocalDatasetEntry for the dataset and persists it to Redis.</p> 567 - <p>Args: ds: The dataset to add to the index. name: Human-readable name for the dataset. schema_ref: Optional schema reference. If None, generates from sample type. metadata: Optional metadata dictionary. 
If None, uses ds._metadata if available.</p> 568 - <p>Returns: The created LocalDatasetEntry object.</p> 554 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 555 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 556 + <table class="caption-top table"> 557 + <thead> 558 + <tr class="header"> 559 + <th>Name</th> 560 + <th>Type</th> 561 + <th>Description</th> 562 + <th>Default</th> 563 + </tr> 564 + </thead> 565 + <tbody> 566 + <tr class="odd"> 567 + <td>ds</td> 568 + <td><a href="`atdata.Dataset`">Dataset</a></td> 569 + <td>The dataset to add to the index.</td> 570 + <td><em>required</em></td> 571 + </tr> 572 + <tr class="even"> 573 + <td>name</td> 574 + <td><a href="`str`">str</a></td> 575 + <td>Human-readable name for the dataset.</td> 576 + <td><em>required</em></td> 577 + </tr> 578 + <tr class="odd"> 579 + <td>schema_ref</td> 580 + <td><a href="`str`">str</a> | None</td> 581 + <td>Optional schema reference. If None, generates from sample type.</td> 582 + <td><code>None</code></td> 583 + </tr> 584 + <tr class="even"> 585 + <td>metadata</td> 586 + <td><a href="`dict`">dict</a> | None</td> 587 + <td>Optional metadata dictionary. 
If None, uses ds._metadata if available.</td> 588 + <td><code>None</code></td> 589 + </tr> 590 + </tbody> 591 + </table> 592 + </section> 593 + <section id="returns" class="level4 doc-section doc-section-returns"> 594 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 595 + <table class="caption-top table"> 596 + <thead> 597 + <tr class="header"> 598 + <th>Name</th> 599 + <th>Type</th> 600 + <th>Description</th> 601 + </tr> 602 + </thead> 603 + <tbody> 604 + <tr class="odd"> 605 + <td></td> 606 + <td><a href="`atdata.local.LocalDatasetEntry`">LocalDatasetEntry</a></td> 607 + <td>The created LocalDatasetEntry object.</td> 608 + </tr> 609 + </tbody> 610 + </table> 611 + </section> 569 612 </section> 570 613 <section id="atdata.local.Index.clear_stubs" class="level3"> 571 614 <h3 class="anchored" data-anchor-id="atdata.local.Index.clear_stubs">clear_stubs</h3> 572 615 <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>local.Index.clear_stubs()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 573 616 <p>Remove all auto-generated stub files.</p> 574 617 <p>Only works if auto_stubs was enabled when creating the Index.</p> 575 - <p>Returns: Number of stub files removed, or 0 if auto_stubs is disabled.</p> 618 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 619 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 620 + <table class="caption-top table"> 621 + <thead> 622 + <tr class="header"> 623 + <th>Name</th> 624 + <th>Type</th> 625 + <th>Description</th> 626 + </tr> 627 + </thead> 628 + <tbody> 629 + <tr class="odd"> 630 + <td></td> 631 + <td><a href="`int`">int</a></td> 632 + <td>Number of stub files removed, or 0 if auto_stubs is disabled.</td> 633 + </tr> 634 + 
</tbody> 635 + </table> 636 + </section> 576 637 </section> 577 638 <section id="atdata.local.Index.decode_schema" class="level3"> 578 639 <h3 class="anchored" data-anchor-id="atdata.local.Index.decode_schema">decode_schema</h3> ··· 580 641 <p>Reconstruct a Python PackableSample type from a stored schema.</p> 581 642 <p>This method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a PackableSample subclass matching the schema definition.</p> 582 643 <p>If auto_stubs is enabled, a Python module will be generated and the class will be imported from it, providing full IDE autocomplete support. The returned class has proper type information that IDEs can understand.</p> 583 - <p>Args: ref: Schema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).</p> 584 - <p>Returns: A PackableSample subclass - either imported from a generated module (if auto_stubs is enabled) or dynamically created.</p> 585 - <p>Raises: KeyError: If schema not found. 
ValueError: If schema cannot be decoded.</p> 644 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 645 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 646 + <table class="caption-top table"> 647 + <thead> 648 + <tr class="header"> 649 + <th>Name</th> 650 + <th>Type</th> 651 + <th>Description</th> 652 + <th>Default</th> 653 + </tr> 654 + </thead> 655 + <tbody> 656 + <tr class="odd"> 657 + <td>ref</td> 658 + <td><a href="`str`">str</a></td> 659 + <td>Schema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).</td> 660 + <td><em>required</em></td> 661 + </tr> 662 + </tbody> 663 + </table> 664 + </section> 665 + <section id="returns-2" class="level4 doc-section doc-section-returns"> 666 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 667 + <table class="caption-top table"> 668 + <thead> 669 + <tr class="header"> 670 + <th>Name</th> 671 + <th>Type</th> 672 + <th>Description</th> 673 + </tr> 674 + </thead> 675 + <tbody> 676 + <tr class="odd"> 677 + <td></td> 678 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td> 679 + <td>A PackableSample subclass - either imported from a generated module</td> 680 + </tr> 681 + <tr class="even"> 682 + <td></td> 683 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td> 684 + <td>(if auto_stubs is enabled) or dynamically created.</td> 685 + </tr> 686 + </tbody> 687 + </table> 688 + </section> 689 + <section id="raises" class="level4 doc-section doc-section-raises"> 690 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 691 + <table class="caption-top table"> 692 + <thead> 693 + <tr class="header"> 694 + <th>Name</th> 695 + <th>Type</th> 696 + <th>Description</th> 697 + </tr> 698 + </thead> 699 + <tbody> 700 + <tr class="odd"> 701 + <td></td> 702 + <td><a 
href="`KeyError`">KeyError</a></td> 703 + <td>If schema not found.</td> 704 + </tr> 705 + <tr class="even"> 706 + <td></td> 707 + <td><a href="`ValueError`">ValueError</a></td> 708 + <td>If schema cannot be decoded.</td> 709 + </tr> 710 + </tbody> 711 + </table> 712 + </section> 586 713 </section> 587 714 <section id="atdata.local.Index.decode_schema_as" class="level3"> 588 715 <h3 class="anchored" data-anchor-id="atdata.local.Index.decode_schema_as">decode_schema_as</h3> 589 716 <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>local.Index.decode_schema_as(ref, type_hint)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 590 717 <p>Decode a schema with explicit type hint for IDE support.</p> 591 718 <p>This is a typed wrapper around decode_schema() that preserves the type information for IDE autocomplete. Use this when you have a stub file for the schema and want full IDE support.</p> 592 - <p>Args: ref: Schema reference string. type_hint: The stub type to use for type hints. Import this from the generated stub file.</p> 593 - <p>Returns: The decoded type, cast to match the type_hint for IDE support.</p> 594 - <p>Example: >>> # After enabling auto_stubs and configuring IDE extraPaths: >>> from local.MySample_1_0_0 import MySample >>> >>> # This gives full IDE autocomplete: >>> DecodedType = index.decode_schema_as(ref, MySample) >>> sample = DecodedType(text=“hello”, value=42) # IDE knows signature!</p> 595 - <p>Note: The type_hint is only used for static type checking - at runtime, the actual decoded type from the schema is returned. 
Ensure the stub matches the schema to avoid runtime surprises.</p> 719 + <section id="parameters-2" class="level4 doc-section doc-section-parameters"> 720 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h4> 721 + <table class="caption-top table"> 722 + <thead> 723 + <tr class="header"> 724 + <th>Name</th> 725 + <th>Type</th> 726 + <th>Description</th> 727 + <th>Default</th> 728 + </tr> 729 + </thead> 730 + <tbody> 731 + <tr class="odd"> 732 + <td>ref</td> 733 + <td><a href="`str`">str</a></td> 734 + <td>Schema reference string.</td> 735 + <td><em>required</em></td> 736 + </tr> 737 + <tr class="even"> 738 + <td>type_hint</td> 739 + <td><a href="`type`">type</a>[<a href="`atdata.local.T`">T</a>]</td> 740 + <td>The stub type to use for type hints. Import this from the generated stub file.</td> 741 + <td><em>required</em></td> 742 + </tr> 743 + </tbody> 744 + </table> 745 + </section> 746 + <section id="returns-3" class="level4 doc-section doc-section-returns"> 747 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-3">Returns</h4> 748 + <table class="caption-top table"> 749 + <thead> 750 + <tr class="header"> 751 + <th>Name</th> 752 + <th>Type</th> 753 + <th>Description</th> 754 + </tr> 755 + </thead> 756 + <tbody> 757 + <tr class="odd"> 758 + <td></td> 759 + <td><a href="`type`">type</a>[<a href="`atdata.local.T`">T</a>]</td> 760 + <td>The decoded type, cast to match the type_hint for IDE support.</td> 761 + </tr> 762 + </tbody> 763 + </table> 764 + </section> 765 + <section id="example" class="level4 doc-section doc-section-example"> 766 + <h4 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h4> 767 + <p>::</p> 768 + <pre><code>>>> # After enabling auto_stubs and configuring IDE extraPaths: 769 + >>> from local.MySample_1_0_0 import MySample 770 + >>> 771 + >>> # This gives full IDE autocomplete: 772 + >>> DecodedType = index.decode_schema_as(ref, 
MySample) 773 + >>> sample = DecodedType(text="hello", value=42) # IDE knows signature!</code></pre> 774 + </section> 775 + <section id="note" class="level4 doc-section doc-section-note"> 776 + <h4 class="doc-section doc-section-note anchored" data-anchor-id="note">Note</h4> 777 + <p>The type_hint is only used for static type checking - at runtime, the actual decoded type from the schema is returned. Ensure the stub matches the schema to avoid runtime surprises.</p> 778 + </section> 596 779 </section> 597 780 <section id="atdata.local.Index.get_dataset" class="level3"> 598 781 <h3 class="anchored" data-anchor-id="atdata.local.Index.get_dataset">get_dataset</h3> 599 - <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>local.Index.get_dataset(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 782 + <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>local.Index.get_dataset(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 600 783 <p>Get a dataset entry by name (AbstractIndex protocol).</p> 601 - <p>Args: ref: Dataset name.</p> 602 - <p>Returns: IndexEntry for the dataset.</p> 603 - <p>Raises: KeyError: If dataset not found.</p> 784 + <section id="parameters-3" class="level4 doc-section doc-section-parameters"> 785 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-3">Parameters</h4> 786 + <table class="caption-top table"> 787 + <thead> 788 + <tr class="header"> 789 + <th>Name</th> 790 + <th>Type</th> 791 + <th>Description</th> 792 + <th>Default</th> 793 + </tr> 794 + </thead> 795 + <tbody> 796 + <tr class="odd"> 797 + <td>ref</td> 798 + <td><a 
href="`str`">str</a></td> 799 + <td>Dataset name.</td> 800 + <td><em>required</em></td> 801 + </tr> 802 + </tbody> 803 + </table> 804 + </section> 805 + <section id="returns-4" class="level4 doc-section doc-section-returns"> 806 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-4">Returns</h4> 807 + <table class="caption-top table"> 808 + <thead> 809 + <tr class="header"> 810 + <th>Name</th> 811 + <th>Type</th> 812 + <th>Description</th> 813 + </tr> 814 + </thead> 815 + <tbody> 816 + <tr class="odd"> 817 + <td></td> 818 + <td><a href="`atdata.local.LocalDatasetEntry`">LocalDatasetEntry</a></td> 819 + <td>IndexEntry for the dataset.</td> 820 + </tr> 821 + </tbody> 822 + </table> 823 + </section> 824 + <section id="raises-1" class="level4 doc-section doc-section-raises"> 825 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-1">Raises</h4> 826 + <table class="caption-top table"> 827 + <thead> 828 + <tr class="header"> 829 + <th>Name</th> 830 + <th>Type</th> 831 + <th>Description</th> 832 + </tr> 833 + </thead> 834 + <tbody> 835 + <tr class="odd"> 836 + <td></td> 837 + <td><a href="`KeyError`">KeyError</a></td> 838 + <td>If dataset not found.</td> 839 + </tr> 840 + </tbody> 841 + </table> 842 + </section> 604 843 </section> 605 844 <section id="atdata.local.Index.get_entry" class="level3"> 606 845 <h3 class="anchored" data-anchor-id="atdata.local.Index.get_entry">get_entry</h3> 607 - <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>local.Index.get_entry(cid)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 846 + <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" 
tabindex="-1"></a>local.Index.get_entry(cid)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 608 847 <p>Get an entry by its CID.</p> 609 - <p>Args: cid: Content identifier of the entry.</p> 610 - <p>Returns: LocalDatasetEntry for the given CID.</p> 611 - <p>Raises: KeyError: If entry not found.</p> 848 + <section id="parameters-4" class="level4 doc-section doc-section-parameters"> 849 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-4">Parameters</h4> 850 + <table class="caption-top table"> 851 + <thead> 852 + <tr class="header"> 853 + <th>Name</th> 854 + <th>Type</th> 855 + <th>Description</th> 856 + <th>Default</th> 857 + </tr> 858 + </thead> 859 + <tbody> 860 + <tr class="odd"> 861 + <td>cid</td> 862 + <td><a href="`str`">str</a></td> 863 + <td>Content identifier of the entry.</td> 864 + <td><em>required</em></td> 865 + </tr> 866 + </tbody> 867 + </table> 868 + </section> 869 + <section id="returns-5" class="level4 doc-section doc-section-returns"> 870 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-5">Returns</h4> 871 + <table class="caption-top table"> 872 + <thead> 873 + <tr class="header"> 874 + <th>Name</th> 875 + <th>Type</th> 876 + <th>Description</th> 877 + </tr> 878 + </thead> 879 + <tbody> 880 + <tr class="odd"> 881 + <td></td> 882 + <td><a href="`atdata.local.LocalDatasetEntry`">LocalDatasetEntry</a></td> 883 + <td>LocalDatasetEntry for the given CID.</td> 884 + </tr> 885 + </tbody> 886 + </table> 887 + </section> 888 + <section id="raises-2" class="level4 doc-section doc-section-raises"> 889 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-2">Raises</h4> 890 + <table class="caption-top table"> 891 + <thead> 892 + <tr class="header"> 893 + <th>Name</th> 894 + <th>Type</th> 895 + <th>Description</th> 896 + </tr> 897 + </thead> 898 + <tbody> 899 + <tr class="odd"> 900 + <td></td> 901 + <td><a 
href="`KeyError`">KeyError</a></td> 902 + <td>If entry not found.</td> 903 + </tr> 904 + </tbody> 905 + </table> 906 + </section> 612 907 </section> 613 908 <section id="atdata.local.Index.get_entry_by_name" class="level3"> 614 909 <h3 class="anchored" data-anchor-id="atdata.local.Index.get_entry_by_name">get_entry_by_name</h3> 615 - <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>local.Index.get_entry_by_name(name)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 910 + <div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>local.Index.get_entry_by_name(name)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 616 911 <p>Get an entry by its human-readable name.</p> 617 - <p>Args: name: Human-readable name of the entry.</p> 618 - <p>Returns: LocalDatasetEntry with the given name.</p> 619 - <p>Raises: KeyError: If no entry with that name exists.</p> 912 + <section id="parameters-5" class="level4 doc-section doc-section-parameters"> 913 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-5">Parameters</h4> 914 + <table class="caption-top table"> 915 + <thead> 916 + <tr class="header"> 917 + <th>Name</th> 918 + <th>Type</th> 919 + <th>Description</th> 920 + <th>Default</th> 921 + </tr> 922 + </thead> 923 + <tbody> 924 + <tr class="odd"> 925 + <td>name</td> 926 + <td><a href="`str`">str</a></td> 927 + <td>Human-readable name of the entry.</td> 928 + <td><em>required</em></td> 929 + </tr> 930 + </tbody> 931 + </table> 932 + </section> 933 + <section id="returns-6" class="level4 doc-section doc-section-returns"> 934 + <h4 class="doc-section doc-section-returns 
anchored" data-anchor-id="returns-6">Returns</h4> 935 + <table class="caption-top table"> 936 + <thead> 937 + <tr class="header"> 938 + <th>Name</th> 939 + <th>Type</th> 940 + <th>Description</th> 941 + </tr> 942 + </thead> 943 + <tbody> 944 + <tr class="odd"> 945 + <td></td> 946 + <td><a href="`atdata.local.LocalDatasetEntry`">LocalDatasetEntry</a></td> 947 + <td>LocalDatasetEntry with the given name.</td> 948 + </tr> 949 + </tbody> 950 + </table> 951 + </section> 952 + <section id="raises-3" class="level4 doc-section doc-section-raises"> 953 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-3">Raises</h4> 954 + <table class="caption-top table"> 955 + <thead> 956 + <tr class="header"> 957 + <th>Name</th> 958 + <th>Type</th> 959 + <th>Description</th> 960 + </tr> 961 + </thead> 962 + <tbody> 963 + <tr class="odd"> 964 + <td></td> 965 + <td><a href="`KeyError`">KeyError</a></td> 966 + <td>If no entry with that name exists.</td> 967 + </tr> 968 + </tbody> 969 + </table> 970 + </section> 620 971 </section> 621 972 <section id="atdata.local.Index.get_import_path" class="level3"> 622 973 <h3 class="anchored" data-anchor-id="atdata.local.Index.get_import_path">get_import_path</h3> 623 - <div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>local.Index.get_import_path(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 974 + <div class="sourceCode" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>local.Index.get_import_path(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 624 975 <p>Get the import path for a schema’s generated module.</p> 625 976 <p>When auto_stubs is enabled, 
this returns the import path that can be used to import the schema type with full IDE support.</p> 626 - <p>Args: ref: Schema reference string.</p> 627 - <p>Returns: Import path like “local.MySample_1_0_0”, or None if auto_stubs is disabled.</p> 628 - <p>Example: >>> index = LocalIndex(auto_stubs=True) >>> ref = index.publish_schema(MySample, version=“1.0.0”) >>> index.load_schema(ref) >>> print(index.get_import_path(ref)) local.MySample_1_0_0 >>> # Then in your code: >>> # from local.MySample_1_0_0 import MySample</p> 977 + <section id="parameters-6" class="level4 doc-section doc-section-parameters"> 978 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-6">Parameters</h4> 979 + <table class="caption-top table"> 980 + <thead> 981 + <tr class="header"> 982 + <th>Name</th> 983 + <th>Type</th> 984 + <th>Description</th> 985 + <th>Default</th> 986 + </tr> 987 + </thead> 988 + <tbody> 989 + <tr class="odd"> 990 + <td>ref</td> 991 + <td><a href="`str`">str</a></td> 992 + <td>Schema reference string.</td> 993 + <td><em>required</em></td> 994 + </tr> 995 + </tbody> 996 + </table> 997 + </section> 998 + <section id="returns-7" class="level4 doc-section doc-section-returns"> 999 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-7">Returns</h4> 1000 + <table class="caption-top table"> 1001 + <thead> 1002 + <tr class="header"> 1003 + <th>Name</th> 1004 + <th>Type</th> 1005 + <th>Description</th> 1006 + </tr> 1007 + </thead> 1008 + <tbody> 1009 + <tr class="odd"> 1010 + <td></td> 1011 + <td><a href="`str`">str</a> | None</td> 1012 + <td>Import path like “local.MySample_1_0_0”, or None if auto_stubs</td> 1013 + </tr> 1014 + <tr class="even"> 1015 + <td></td> 1016 + <td><a href="`str`">str</a> | None</td> 1017 + <td>is disabled.</td> 1018 + </tr> 1019 + </tbody> 1020 + </table> 1021 + </section> 1022 + <section id="example-1" class="level4 doc-section doc-section-example"> 1023 + <h4 class="doc-section 
doc-section-example anchored" data-anchor-id="example-1">Example</h4> 1024 + <p>::</p> 1025 + <pre><code>>>> index = LocalIndex(auto_stubs=True) 1026 + >>> ref = index.publish_schema(MySample, version="1.0.0") 1027 + >>> index.load_schema(ref) 1028 + >>> print(index.get_import_path(ref)) 1029 + local.MySample_1_0_0 1030 + >>> # Then in your code: 1031 + >>> # from local.MySample_1_0_0 import MySample</code></pre> 1032 + </section> 629 1033 </section> 630 1034 <section id="atdata.local.Index.get_schema" class="level3"> 631 1035 <h3 class="anchored" data-anchor-id="atdata.local.Index.get_schema">get_schema</h3> 632 - <div class="sourceCode" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>local.Index.get_schema(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1036 + <div class="sourceCode" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>local.Index.get_schema(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 633 1037 <p>Get a schema record by reference (AbstractIndex protocol).</p> 634 - <p>Args: ref: Schema reference string. Supports both new format (atdata://local/sampleSchema/{name}<span class="citation" data-cites="version">@version</span>) and legacy format (local://schemas/{module.Class}<span class="citation" data-cites="version">@version</span>).</p> 635 - <p>Returns: Schema record as a dictionary with keys ‘name’, ‘version’, ‘fields’, ‘$ref’, etc.</p> 636 - <p>Raises: KeyError: If schema not found. 
ValueError: If reference format is invalid.</p> 1038 + <section id="parameters-7" class="level4 doc-section doc-section-parameters"> 1039 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-7">Parameters</h4> 1040 + <table class="caption-top table"> 1041 + <thead> 1042 + <tr class="header"> 1043 + <th>Name</th> 1044 + <th>Type</th> 1045 + <th>Description</th> 1046 + <th>Default</th> 1047 + </tr> 1048 + </thead> 1049 + <tbody> 1050 + <tr class="odd"> 1051 + <td>ref</td> 1052 + <td><a href="`str`">str</a></td> 1053 + <td>Schema reference string. Supports both new format (atdata://local/sampleSchema/{name}<span class="citation" data-cites="version">@version</span>) and legacy format (local://schemas/{module.Class}<span class="citation" data-cites="version">@version</span>).</td> 1054 + <td><em>required</em></td> 1055 + </tr> 1056 + </tbody> 1057 + </table> 1058 + </section> 1059 + <section id="returns-8" class="level4 doc-section doc-section-returns"> 1060 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-8">Returns</h4> 1061 + <table class="caption-top table"> 1062 + <thead> 1063 + <tr class="header"> 1064 + <th>Name</th> 1065 + <th>Type</th> 1066 + <th>Description</th> 1067 + </tr> 1068 + </thead> 1069 + <tbody> 1070 + <tr class="odd"> 1071 + <td></td> 1072 + <td><a href="`dict`">dict</a></td> 1073 + <td>Schema record as a dictionary with keys ‘name’, ‘version’,</td> 1074 + </tr> 1075 + <tr class="even"> 1076 + <td></td> 1077 + <td><a href="`dict`">dict</a></td> 1078 + <td>‘fields’, ‘$ref’, etc.</td> 1079 + </tr> 1080 + </tbody> 1081 + </table> 1082 + </section> 1083 + <section id="raises-4" class="level4 doc-section doc-section-raises"> 1084 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-4">Raises</h4> 1085 + <table class="caption-top table"> 1086 + <thead> 1087 + <tr class="header"> 1088 + <th>Name</th> 1089 + <th>Type</th> 1090 + <th>Description</th> 1091 + </tr> 1092 + 
</thead> 1093 + <tbody> 1094 + <tr class="odd"> 1095 + <td></td> 1096 + <td><a href="`KeyError`">KeyError</a></td> 1097 + <td>If schema not found.</td> 1098 + </tr> 1099 + <tr class="even"> 1100 + <td></td> 1101 + <td><a href="`ValueError`">ValueError</a></td> 1102 + <td>If reference format is invalid.</td> 1103 + </tr> 1104 + </tbody> 1105 + </table> 1106 + </section> 637 1107 </section> 638 1108 <section id="atdata.local.Index.get_schema_record" class="level3"> 639 1109 <h3 class="anchored" data-anchor-id="atdata.local.Index.get_schema_record">get_schema_record</h3> 640 - <div class="sourceCode" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>local.Index.get_schema_record(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1110 + <div class="sourceCode" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>local.Index.get_schema_record(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 641 1111 <p>Get a schema record as LocalSchemaRecord object.</p> 642 1112 <p>Use this when you need the full LocalSchemaRecord with typed properties. For Protocol-compliant dict access, use get_schema() instead.</p> 643 - <p>Args: ref: Schema reference string.</p> 644 - <p>Returns: LocalSchemaRecord with schema details.</p> 645 - <p>Raises: KeyError: If schema not found. 
ValueError: If reference format is invalid.</p> 1113 + <section id="parameters-8" class="level4 doc-section doc-section-parameters"> 1114 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-8">Parameters</h4> 1115 + <table class="caption-top table"> 1116 + <thead> 1117 + <tr class="header"> 1118 + <th>Name</th> 1119 + <th>Type</th> 1120 + <th>Description</th> 1121 + <th>Default</th> 1122 + </tr> 1123 + </thead> 1124 + <tbody> 1125 + <tr class="odd"> 1126 + <td>ref</td> 1127 + <td><a href="`str`">str</a></td> 1128 + <td>Schema reference string.</td> 1129 + <td><em>required</em></td> 1130 + </tr> 1131 + </tbody> 1132 + </table> 1133 + </section> 1134 + <section id="returns-9" class="level4 doc-section doc-section-returns"> 1135 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-9">Returns</h4> 1136 + <table class="caption-top table"> 1137 + <thead> 1138 + <tr class="header"> 1139 + <th>Name</th> 1140 + <th>Type</th> 1141 + <th>Description</th> 1142 + </tr> 1143 + </thead> 1144 + <tbody> 1145 + <tr class="odd"> 1146 + <td></td> 1147 + <td><a href="`atdata.local.LocalSchemaRecord`">LocalSchemaRecord</a></td> 1148 + <td>LocalSchemaRecord with schema details.</td> 1149 + </tr> 1150 + </tbody> 1151 + </table> 1152 + </section> 1153 + <section id="raises-5" class="level4 doc-section doc-section-raises"> 1154 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-5">Raises</h4> 1155 + <table class="caption-top table"> 1156 + <thead> 1157 + <tr class="header"> 1158 + <th>Name</th> 1159 + <th>Type</th> 1160 + <th>Description</th> 1161 + </tr> 1162 + </thead> 1163 + <tbody> 1164 + <tr class="odd"> 1165 + <td></td> 1166 + <td><a href="`KeyError`">KeyError</a></td> 1167 + <td>If schema not found.</td> 1168 + </tr> 1169 + <tr class="even"> 1170 + <td></td> 1171 + <td><a href="`ValueError`">ValueError</a></td> 1172 + <td>If reference format is invalid.</td> 1173 + </tr> 1174 + </tbody> 1175 + 
</table> 1176 + </section> 646 1177 </section> 647 1178 <section id="atdata.local.Index.insert_dataset" class="level3"> 648 1179 <h3 class="anchored" data-anchor-id="atdata.local.Index.insert_dataset">insert_dataset</h3> 649 - <div class="sourceCode" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>local.Index.insert_dataset(ds, <span class="op">*</span>, name, schema_ref<span class="op">=</span><span class="va">None</span>, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1180 + <div class="sourceCode" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>local.Index.insert_dataset(ds, <span class="op">*</span>, name, schema_ref<span class="op">=</span><span class="va">None</span>, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 650 1181 <p>Insert a dataset into the index (AbstractIndex protocol).</p> 651 1182 <p>If a data_store was provided at initialization, writes dataset shards to storage first, then indexes the new URLs. Otherwise, indexes the dataset’s existing URL.</p> 652 - <p>Args: ds: The Dataset to register. name: Human-readable name for the dataset. schema_ref: Optional schema reference. 
**kwargs: Additional options: - metadata: Optional metadata dict - prefix: Storage prefix (default: dataset name) - cache_local: If True, cache writes locally first</p> 653 - <p>Returns: IndexEntry for the inserted dataset.</p> 1183 + <section id="parameters-9" class="level4 doc-section doc-section-parameters"> 1184 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-9">Parameters</h4> 1185 + <table class="caption-top table"> 1186 + <thead> 1187 + <tr class="header"> 1188 + <th>Name</th> 1189 + <th>Type</th> 1190 + <th>Description</th> 1191 + <th>Default</th> 1192 + </tr> 1193 + </thead> 1194 + <tbody> 1195 + <tr class="odd"> 1196 + <td>ds</td> 1197 + <td><a href="`atdata.Dataset`">Dataset</a></td> 1198 + <td>The Dataset to register.</td> 1199 + <td><em>required</em></td> 1200 + </tr> 1201 + <tr class="even"> 1202 + <td>name</td> 1203 + <td><a href="`str`">str</a></td> 1204 + <td>Human-readable name for the dataset.</td> 1205 + <td><em>required</em></td> 1206 + </tr> 1207 + <tr class="odd"> 1208 + <td>schema_ref</td> 1209 + <td><a href="`str`">str</a> | None</td> 1210 + <td>Optional schema reference.</td> 1211 + <td><code>None</code></td> 1212 + </tr> 1213 + <tr class="even"> 1214 + <td>**kwargs</td> 1215 + <td></td> 1216 + <td>Additional options: - metadata: Optional metadata dict - prefix: Storage prefix (default: dataset name) - cache_local: If True, cache writes locally first</td> 1217 + <td><code>{}</code></td> 1218 + </tr> 1219 + </tbody> 1220 + </table> 1221 + </section> 1222 + <section id="returns-10" class="level4 doc-section doc-section-returns"> 1223 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-10">Returns</h4> 1224 + <table class="caption-top table"> 1225 + <thead> 1226 + <tr class="header"> 1227 + <th>Name</th> 1228 + <th>Type</th> 1229 + <th>Description</th> 1230 + </tr> 1231 + </thead> 1232 + <tbody> 1233 + <tr class="odd"> 1234 + <td></td> 1235 + <td><a 
href="`atdata.local.LocalDatasetEntry`">LocalDatasetEntry</a></td> 1236 + <td>IndexEntry for the inserted dataset.</td> 1237 + </tr> 1238 + </tbody> 1239 + </table> 1240 + </section> 654 1241 </section> 655 1242 <section id="atdata.local.Index.list_datasets" class="level3"> 656 1243 <h3 class="anchored" data-anchor-id="atdata.local.Index.list_datasets">list_datasets</h3> 657 - <div class="sourceCode" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>local.Index.list_datasets()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1244 + <div class="sourceCode" id="cb15"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a>local.Index.list_datasets()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 658 1245 <p>Get all dataset entries as a materialized list (AbstractIndex protocol).</p> 659 - <p>Returns: List of IndexEntry for each dataset.</p> 1246 + <section id="returns-11" class="level4 doc-section doc-section-returns"> 1247 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-11">Returns</h4> 1248 + <table class="caption-top table"> 1249 + <thead> 1250 + <tr class="header"> 1251 + <th>Name</th> 1252 + <th>Type</th> 1253 + <th>Description</th> 1254 + </tr> 1255 + </thead> 1256 + <tbody> 1257 + <tr class="odd"> 1258 + <td></td> 1259 + <td><a href="`list`">list</a>[<a href="`atdata.local.LocalDatasetEntry`">LocalDatasetEntry</a>]</td> 1260 + <td>List of IndexEntry for each dataset.</td> 1261 + </tr> 1262 + </tbody> 1263 + </table> 1264 + </section> 660 1265 </section> 661 1266 <section id="atdata.local.Index.list_entries" class="level3"> 662 1267 <h3 class="anchored" 
data-anchor-id="atdata.local.Index.list_entries">list_entries</h3> 663 - <div class="sourceCode" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>local.Index.list_entries()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1268 + <div class="sourceCode" id="cb16"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a>local.Index.list_entries()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 664 1269 <p>Get all index entries as a materialized list.</p> 665 - <p>Returns: List of all LocalDatasetEntry objects in the index.</p> 1270 + <section id="returns-12" class="level4 doc-section doc-section-returns"> 1271 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-12">Returns</h4> 1272 + <table class="caption-top table"> 1273 + <thead> 1274 + <tr class="header"> 1275 + <th>Name</th> 1276 + <th>Type</th> 1277 + <th>Description</th> 1278 + </tr> 1279 + </thead> 1280 + <tbody> 1281 + <tr class="odd"> 1282 + <td></td> 1283 + <td><a href="`list`">list</a>[<a href="`atdata.local.LocalDatasetEntry`">LocalDatasetEntry</a>]</td> 1284 + <td>List of all LocalDatasetEntry objects in the index.</td> 1285 + </tr> 1286 + </tbody> 1287 + </table> 1288 + </section> 666 1289 </section> 667 1290 <section id="atdata.local.Index.list_schemas" class="level3"> 668 1291 <h3 class="anchored" data-anchor-id="atdata.local.Index.list_schemas">list_schemas</h3> 669 - <div class="sourceCode" id="cb15"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a>local.Index.list_schemas()</span></code><button title="Copy to Clipboard" 
class="code-copy-button"><i class="bi"></i></button></pre></div> 1292 + <div class="sourceCode" id="cb17"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a>local.Index.list_schemas()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 670 1293 <p>Get all schema records as a materialized list (AbstractIndex protocol).</p> 671 - <p>Returns: List of schema records as dictionaries.</p> 1294 + <section id="returns-13" class="level4 doc-section doc-section-returns"> 1295 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-13">Returns</h4> 1296 + <table class="caption-top table"> 1297 + <thead> 1298 + <tr class="header"> 1299 + <th>Name</th> 1300 + <th>Type</th> 1301 + <th>Description</th> 1302 + </tr> 1303 + </thead> 1304 + <tbody> 1305 + <tr class="odd"> 1306 + <td></td> 1307 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 1308 + <td>List of schema records as dictionaries.</td> 1309 + </tr> 1310 + </tbody> 1311 + </table> 1312 + </section> 672 1313 </section> 673 1314 <section id="atdata.local.Index.load_schema" class="level3"> 674 1315 <h3 class="anchored" data-anchor-id="atdata.local.Index.load_schema">load_schema</h3> 675 - <div class="sourceCode" id="cb16"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a>local.Index.load_schema(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1316 + <div class="sourceCode" id="cb18"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a>local.Index.load_schema(ref)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i 
class="bi"></i></button></pre></div> 676 1317 <p>Load a schema and make it available in the types namespace.</p> 677 1318 <p>This method decodes the schema, optionally generates a Python module for IDE support (if auto_stubs is enabled), and registers the type in the :attr:<code>types</code> namespace for easy access.</p> 678 - <p>Args: ref: Schema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).</p> 679 - <p>Returns: The decoded PackableSample subclass. Also available via <code>index.types.<ClassName></code> after this call.</p> 680 - <p>Raises: KeyError: If schema not found. ValueError: If schema cannot be decoded.</p> 681 - <p>Example: >>> # Load and use immediately >>> MyType = index.load_schema(“atdata://local/sampleSchema/MySample@1.0.0”) >>> sample = MyType(name=“hello”, value=42) >>> >>> # Or access later via namespace >>> index.load_schema(“atdata://local/sampleSchema/OtherType@1.0.0”) >>> other = index.types.OtherType(data=“test”)</p> 1319 + <section id="parameters-10" class="level4 doc-section doc-section-parameters"> 1320 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-10">Parameters</h4> 1321 + <table class="caption-top table"> 1322 + <thead> 1323 + <tr class="header"> 1324 + <th>Name</th> 1325 + <th>Type</th> 1326 + <th>Description</th> 1327 + <th>Default</th> 1328 + </tr> 1329 + </thead> 1330 + <tbody> 1331 + <tr class="odd"> 1332 + <td>ref</td> 1333 + <td><a href="`str`">str</a></td> 1334 + <td>Schema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).</td> 1335 + <td><em>required</em></td> 1336 + </tr> 1337 + </tbody> 1338 + </table> 1339 + </section> 1340 + <section id="returns-14" class="level4 doc-section doc-section-returns"> 1341 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-14">Returns</h4> 1342 + <table class="caption-top table"> 1343 + <thead> 1344 + <tr class="header"> 1345 + <th>Name</th> 1346 + <th>Type</th> 1347 + 
<th>Description</th> 1348 + </tr> 1349 + </thead> 1350 + <tbody> 1351 + <tr class="odd"> 1352 + <td></td> 1353 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td> 1354 + <td>The decoded PackableSample subclass. Also available via</td> 1355 + </tr> 1356 + <tr class="even"> 1357 + <td></td> 1358 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td> 1359 + <td><code>index.types.<ClassName></code> after this call.</td> 1360 + </tr> 1361 + </tbody> 1362 + </table> 1363 + </section> 1364 + <section id="raises-6" class="level4 doc-section doc-section-raises"> 1365 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-6">Raises</h4> 1366 + <table class="caption-top table"> 1367 + <thead> 1368 + <tr class="header"> 1369 + <th>Name</th> 1370 + <th>Type</th> 1371 + <th>Description</th> 1372 + </tr> 1373 + </thead> 1374 + <tbody> 1375 + <tr class="odd"> 1376 + <td></td> 1377 + <td><a href="`KeyError`">KeyError</a></td> 1378 + <td>If schema not found.</td> 1379 + </tr> 1380 + <tr class="even"> 1381 + <td></td> 1382 + <td><a href="`ValueError`">ValueError</a></td> 1383 + <td>If schema cannot be decoded.</td> 1384 + </tr> 1385 + </tbody> 1386 + </table> 1387 + </section> 1388 + <section id="example-2" class="level4 doc-section doc-section-example"> 1389 + <h4 class="doc-section doc-section-example anchored" data-anchor-id="example-2">Example</h4> 1390 + <p>::</p> 1391 + <pre><code>>>> # Load and use immediately 1392 + >>> MyType = index.load_schema("atdata://local/sampleSchema/MySample@1.0.0") 1393 + >>> sample = MyType(name="hello", value=42) 1394 + >>> 1395 + >>> # Or access later via namespace 1396 + >>> index.load_schema("atdata://local/sampleSchema/OtherType@1.0.0") 1397 + >>> other = index.types.OtherType(data="test")</code></pre> 1398 + </section> 682 1399 </section> 683 1400 <section id="atdata.local.Index.publish_schema" class="level3"> 684 1401 <h3 
class="anchored" data-anchor-id="atdata.local.Index.publish_schema">publish_schema</h3> 685 - <div class="sourceCode" id="cb17"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a>local.Index.publish_schema(sample_type, <span class="op">*</span>, version<span class="op">=</span><span class="va">None</span>, description<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1402 + <div class="sourceCode" id="cb20"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a>local.Index.publish_schema(sample_type, <span class="op">*</span>, version<span class="op">=</span><span class="va">None</span>, description<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 686 1403 <p>Publish a schema for a sample type to Redis.</p> 687 - <p>Args: sample_type: The PackableSample subclass to publish. version: Semantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists. description: Optional human-readable description. If None, uses the class docstring.</p> 688 - <p>Returns: Schema reference string: ‘atdata://local/sampleSchema/{name}<span class="citation" data-cites="version">@version</span>’.</p> 689 - <p>Raises: ValueError: If sample_type is not a dataclass. 
TypeError: If a field type is not supported.</p> 1404 + <section id="parameters-11" class="level4 doc-section doc-section-parameters"> 1405 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-11">Parameters</h4> 1406 + <table class="caption-top table"> 1407 + <thead> 1408 + <tr class="header"> 1409 + <th>Name</th> 1410 + <th>Type</th> 1411 + <th>Description</th> 1412 + <th>Default</th> 1413 + </tr> 1414 + </thead> 1415 + <tbody> 1416 + <tr class="odd"> 1417 + <td>sample_type</td> 1418 + <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td> 1419 + <td>The PackableSample subclass to publish.</td> 1420 + <td><em>required</em></td> 1421 + </tr> 1422 + <tr class="even"> 1423 + <td>version</td> 1424 + <td><a href="`str`">str</a> | None</td> 1425 + <td>Semantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists.</td> 1426 + <td><code>None</code></td> 1427 + </tr> 1428 + <tr class="odd"> 1429 + <td>description</td> 1430 + <td><a href="`str`">str</a> | None</td> 1431 + <td>Optional human-readable description. 
If None, uses the class docstring.</td> 1432 + <td><code>None</code></td> 1433 + </tr> 1434 + </tbody> 1435 + </table> 1436 + </section> 1437 + <section id="returns-15" class="level4 doc-section doc-section-returns"> 1438 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-15">Returns</h4> 1439 + <table class="caption-top table"> 1440 + <thead> 1441 + <tr class="header"> 1442 + <th>Name</th> 1443 + <th>Type</th> 1444 + <th>Description</th> 1445 + </tr> 1446 + </thead> 1447 + <tbody> 1448 + <tr class="odd"> 1449 + <td></td> 1450 + <td><a href="`str`">str</a></td> 1451 + <td>Schema reference string: ‘atdata://local/sampleSchema/{name}<span class="citation" data-cites="version">@version</span>’.</td> 1452 + </tr> 1453 + </tbody> 1454 + </table> 1455 + </section> 1456 + <section id="raises-7" class="level4 doc-section doc-section-raises"> 1457 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-7">Raises</h4> 1458 + <table class="caption-top table"> 1459 + <thead> 1460 + <tr class="header"> 1461 + <th>Name</th> 1462 + <th>Type</th> 1463 + <th>Description</th> 1464 + </tr> 1465 + </thead> 1466 + <tbody> 1467 + <tr class="odd"> 1468 + <td></td> 1469 + <td><a href="`ValueError`">ValueError</a></td> 1470 + <td>If sample_type is not a dataclass.</td> 1471 + </tr> 1472 + <tr class="even"> 1473 + <td></td> 1474 + <td><a href="`TypeError`">TypeError</a></td> 1475 + <td>If a field type is not supported.</td> 1476 + </tr> 1477 + </tbody> 1478 + </table> 690 1479 691 1480 1481 + </section> 692 1482 </section> 693 1483 </section> 694 1484 </section>
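
The `publish_schema` parameter table above describes the version rule: when `version` is None, bump the patch component of the latest published version, or start at '1.0.0' if nothing has been published. The rule can be sketched as below. This is an illustration only: `next_patch_version` and `build_schema_ref` are hypothetical helper names and are not part of atdata; only the reference format `atdata://local/sampleSchema/{name}@{version}` comes from the rendered docs.

```python
from typing import Optional


def next_patch_version(latest: Optional[str]) -> str:
    """Hypothetical helper: pick the version to publish when none is given."""
    if latest is None:
        # No previous version exists, so start the series.
        return "1.0.0"
    major, minor, patch = (int(part) for part in latest.split("."))
    # Patch bump: only the last component changes.
    return f"{major}.{minor}.{patch + 1}"


def build_schema_ref(name: str, version: str) -> str:
    """Hypothetical helper: format a reference in the documented shape."""
    return f"atdata://local/sampleSchema/{name}@{version}"
```

Parsing the components as integers matters here: a string-based bump would turn '1.0.9' into something other than '1.0.10'.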

+101 -27

docs/api/local.LocalDatasetEntry.html

··· 428 428 <p>Index entry for a dataset stored in the local repository.</p> 429 429 <p>Implements the IndexEntry protocol for compatibility with AbstractIndex. Uses dual identity: a content-addressable CID (ATProto-compatible) and a human-readable name.</p> 430 430 <p>The CID is generated from the entry’s content (schema_ref + data_urls), ensuring the same data produces the same CID whether stored locally or in the atmosphere. This enables seamless promotion from local to ATProto.</p> 431 - <p>Attributes: name: Human-readable name for this dataset. schema_ref: Reference to the schema for this dataset. data_urls: WebDataset URLs for the data. metadata: Arbitrary metadata dictionary, or None if not set.</p> 432 - <section id="attributes" class="level2"> 433 - <h2 class="anchored" data-anchor-id="attributes">Attributes</h2> 431 + <section id="attributes" class="level2 doc-section doc-section-attributes"> 432 + <h2 class="doc-section doc-section-attributes anchored" data-anchor-id="attributes">Attributes</h2> 434 433 <table class="caption-top table"> 435 434 <thead> 436 435 <tr class="header"> 437 436 <th>Name</th> 437 + <th>Type</th> 438 438 <th>Description</th> 439 439 </tr> 440 440 </thead> 441 441 <tbody> 442 442 <tr class="odd"> 443 - <td><a href="#atdata.local.LocalDatasetEntry.cid">cid</a></td> 444 - <td>Content identifier (ATProto-compatible CID).</td> 445 - </tr> 446 - <tr class="even"> 447 - <td><a href="#atdata.local.LocalDatasetEntry.data_urls">data_urls</a></td> 448 - <td>WebDataset URLs for the data.</td> 449 - </tr> 450 - <tr class="odd"> 451 - <td><a href="#atdata.local.LocalDatasetEntry.metadata">metadata</a></td> 452 - <td>Arbitrary metadata dictionary, or None if not set.</td> 453 - </tr> 454 - <tr class="even"> 455 - <td><a href="#atdata.local.LocalDatasetEntry.name">name</a></td> 443 + <td>name</td> 444 + <td><a href="`str`">str</a></td> 456 445 <td>Human-readable name for this dataset.</td> 457 446 </tr> 458 - <tr class="odd"> 459 - <td><a 
href="#atdata.local.LocalDatasetEntry.sample_kind">sample_kind</a></td> 460 - <td>Legacy property: returns schema_ref for backwards compatibility.</td> 461 - </tr> 462 447 <tr class="even"> 463 - <td><a href="#atdata.local.LocalDatasetEntry.schema_ref">schema_ref</a></td> 448 + <td>schema_ref</td> 449 + <td><a href="`str`">str</a></td> 464 450 <td>Reference to the schema for this dataset.</td> 465 451 </tr> 466 452 <tr class="odd"> 467 - <td><a href="#atdata.local.LocalDatasetEntry.wds_url">wds_url</a></td> 468 - <td>Legacy property: returns first data URL for backwards compatibility.</td> 453 + <td>data_urls</td> 454 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 455 + <td>WebDataset URLs for the data.</td> 456 + </tr> 457 + <tr class="even"> 458 + <td>metadata</td> 459 + <td><a href="`dict`">dict</a> | None</td> 460 + <td>Arbitrary metadata dictionary, or None if not set.</td> 469 461 </tr> 470 462 </tbody> 471 463 </table> ··· 494 486 <h3 class="anchored" data-anchor-id="atdata.local.LocalDatasetEntry.from_redis">from_redis</h3> 495 487 <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>local.LocalDatasetEntry.from_redis(redis, cid)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 496 488 <p>Load an entry from Redis by CID.</p> 497 - <p>Args: redis: Redis connection to read from. 
cid: Content identifier of the entry to load.</p> 498 - <p>Returns: LocalDatasetEntry loaded from Redis.</p> 499 - <p>Raises: KeyError: If entry not found.</p> 489 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 490 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 491 + <table class="caption-top table"> 492 + <thead> 493 + <tr class="header"> 494 + <th>Name</th> 495 + <th>Type</th> 496 + <th>Description</th> 497 + <th>Default</th> 498 + </tr> 499 + </thead> 500 + <tbody> 501 + <tr class="odd"> 502 + <td>redis</td> 503 + <td><a href="`redis.Redis`">Redis</a></td> 504 + <td>Redis connection to read from.</td> 505 + <td><em>required</em></td> 506 + </tr> 507 + <tr class="even"> 508 + <td>cid</td> 509 + <td><a href="`str`">str</a></td> 510 + <td>Content identifier of the entry to load.</td> 511 + <td><em>required</em></td> 512 + </tr> 513 + </tbody> 514 + </table> 515 + </section> 516 + <section id="returns" class="level4 doc-section doc-section-returns"> 517 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 518 + <table class="caption-top table"> 519 + <thead> 520 + <tr class="header"> 521 + <th>Name</th> 522 + <th>Type</th> 523 + <th>Description</th> 524 + </tr> 525 + </thead> 526 + <tbody> 527 + <tr class="odd"> 528 + <td></td> 529 + <td><a href="`atdata.local.LocalDatasetEntry`">LocalDatasetEntry</a></td> 530 + <td>LocalDatasetEntry loaded from Redis.</td> 531 + </tr> 532 + </tbody> 533 + </table> 534 + </section> 535 + <section id="raises" class="level4 doc-section doc-section-raises"> 536 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 537 + <table class="caption-top table"> 538 + <thead> 539 + <tr class="header"> 540 + <th>Name</th> 541 + <th>Type</th> 542 + <th>Description</th> 543 + </tr> 544 + </thead> 545 + <tbody> 546 + <tr class="odd"> 547 + <td></td> 548 + <td><a 
href="`KeyError`">KeyError</a></td> 549 + <td>If entry not found.</td> 550 + </tr> 551 + </tbody> 552 + </table> 553 + </section> 500 554 </section> 501 555 <section id="atdata.local.LocalDatasetEntry.write_to" class="level3"> 502 556 <h3 class="anchored" data-anchor-id="atdata.local.LocalDatasetEntry.write_to">write_to</h3> 503 557 <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>local.LocalDatasetEntry.write_to(redis)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 504 558 <p>Persist this index entry to Redis.</p> 505 559 <p>Stores the entry as a Redis hash with key ‘{REDIS_KEY_DATASET_ENTRY}:{cid}’.</p> 506 - <p>Args: redis: Redis connection to write to.</p> 560 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 561 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 562 + <table class="caption-top table"> 563 + <thead> 564 + <tr class="header"> 565 + <th>Name</th> 566 + <th>Type</th> 567 + <th>Description</th> 568 + <th>Default</th> 569 + </tr> 570 + </thead> 571 + <tbody> 572 + <tr class="odd"> 573 + <td>redis</td> 574 + <td><a href="`redis.Redis`">Redis</a></td> 575 + <td>Redis connection to write to.</td> 576 + <td><em>required</em></td> 577 + </tr> 578 + </tbody> 579 + </table> 507 580 508 581 582 + </section> 509 583 </section> 510 584 </section> 511 585 </section>
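
The `LocalDatasetEntry` description above says the CID is derived from the entry's content (schema_ref + data_urls), so identical content yields an identical identifier wherever it is stored. The following sketch shows that property with a plain SHA-256 digest over canonical JSON. It is an assumption-laden stand-in: the digest is not a real ATProto CID, and `sketch_content_id` is a hypothetical name, not the library's implementation.

```python
import hashlib
import json
from typing import List


def sketch_content_id(schema_ref: str, data_urls: List[str]) -> str:
    """Hypothetical stand-in for a CID: a SHA-256 hex digest of the content."""
    # Canonical encoding (sorted keys, fixed separators) keeps the digest
    # stable across runs and machines.
    payload = json.dumps(
        {"schema_ref": schema_ref, "data_urls": data_urls},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

The human-readable name is deliberately left out of the hashed payload, which matches the dual-identity description: renaming an entry would not change its content identifier.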

+171 -7

docs/api/local.S3DataStore.html

··· 398 398 <ul> 399 399 <li><a href="#atdata.local.S3DataStore" id="toc-atdata.local.S3DataStore" class="nav-link active" data-scroll-target="#atdata.local.S3DataStore">local.S3DataStore</a> 400 400 <ul class="collapse"> 401 + <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 401 402 <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 402 403 <ul class="collapse"> 403 404 <li><a href="#atdata.local.S3DataStore.read_url" id="toc-atdata.local.S3DataStore.read_url" class="nav-link" data-scroll-target="#atdata.local.S3DataStore.read_url">read_url</a></li> ··· 420 421 <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>local.S3DataStore(credentials, <span class="op">*</span>, bucket)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 421 422 <p>S3-compatible data store implementing AbstractDataStore protocol.</p> 422 423 <p>Handles writing dataset shards to S3-compatible object storage and resolving URLs for reading.</p> 423 - <p>Attributes: credentials: S3 credentials dictionary. bucket: Target bucket name. 
_fs: S3FileSystem instance.</p> 424 + <section id="attributes" class="level2 doc-section doc-section-attributes"> 425 + <h2 class="doc-section doc-section-attributes anchored" data-anchor-id="attributes">Attributes</h2> 426 + <table class="caption-top table"> 427 + <thead> 428 + <tr class="header"> 429 + <th>Name</th> 430 + <th>Type</th> 431 + <th>Description</th> 432 + </tr> 433 + </thead> 434 + <tbody> 435 + <tr class="odd"> 436 + <td>credentials</td> 437 + <td></td> 438 + <td>S3 credentials dictionary.</td> 439 + </tr> 440 + <tr class="even"> 441 + <td>bucket</td> 442 + <td></td> 443 + <td>Target bucket name.</td> 444 + </tr> 445 + <tr class="odd"> 446 + <td>_fs</td> 447 + <td></td> 448 + <td>S3FileSystem instance.</td> 449 + </tr> 450 + </tbody> 451 + </table> 452 + </section> 424 453 <section id="methods" class="level2"> 425 454 <h2 class="anchored" data-anchor-id="methods">Methods</h2> 426 455 <table class="caption-top table"> ··· 451 480 <p>Resolve an S3 URL for reading/streaming.</p> 452 481 <p>For S3-compatible stores with custom endpoints (like Cloudflare R2, MinIO, etc.), converts s3:// URLs to HTTPS URLs that WebDataset can stream directly.</p> 453 482 <p>For standard AWS S3 (no custom endpoint), URLs are returned unchanged since WebDataset’s built-in s3fs integration handles them.</p> 454 - <p>Args: url: S3 URL to resolve (e.g., ‘s3://bucket/path/file.tar’).</p> 455 - <p>Returns: HTTPS URL if custom endpoint is configured, otherwise unchanged. 
Example: ‘s3://bucket/path’ -> ‘https://endpoint.com/bucket/path’</p> 483 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 484 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 485 + <table class="caption-top table"> 486 + <thead> 487 + <tr class="header"> 488 + <th>Name</th> 489 + <th>Type</th> 490 + <th>Description</th> 491 + <th>Default</th> 492 + </tr> 493 + </thead> 494 + <tbody> 495 + <tr class="odd"> 496 + <td>url</td> 497 + <td><a href="`str`">str</a></td> 498 + <td>S3 URL to resolve (e.g., ‘s3://bucket/path/file.tar’).</td> 499 + <td><em>required</em></td> 500 + </tr> 501 + </tbody> 502 + </table> 503 + </section> 504 + <section id="returns" class="level4 doc-section doc-section-returns"> 505 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 506 + <table class="caption-top table"> 507 + <thead> 508 + <tr class="header"> 509 + <th>Name</th> 510 + <th>Type</th> 511 + <th>Description</th> 512 + </tr> 513 + </thead> 514 + <tbody> 515 + <tr class="odd"> 516 + <td></td> 517 + <td><a href="`str`">str</a></td> 518 + <td>HTTPS URL if custom endpoint is configured, otherwise unchanged.</td> 519 + </tr> 520 + <tr class="even"> 521 + <td>Example</td> 522 + <td><a href="`str`">str</a></td> 523 + <td>‘s3://bucket/path’ -> ‘https://endpoint.com/bucket/path’</td> 524 + </tr> 525 + </tbody> 526 + </table> 527 + </section> 456 528 </section> 457 529 <section id="atdata.local.S3DataStore.supports_streaming" class="level3"> 458 530 <h3 class="anchored" data-anchor-id="atdata.local.S3DataStore.supports_streaming">supports_streaming</h3> 459 531 <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>local.S3DataStore.supports_streaming()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i 
class="bi"></i></button></pre></div> 460 532 <p>S3 supports streaming reads.</p> 461 - <p>Returns: True.</p> 533 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 534 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 535 + <table class="caption-top table"> 536 + <thead> 537 + <tr class="header"> 538 + <th>Name</th> 539 + <th>Type</th> 540 + <th>Description</th> 541 + </tr> 542 + </thead> 543 + <tbody> 544 + <tr class="odd"> 545 + <td></td> 546 + <td><a href="`bool`">bool</a></td> 547 + <td>True.</td> 548 + </tr> 549 + </tbody> 550 + </table> 551 + </section> 462 552 </section> 463 553 <section id="atdata.local.S3DataStore.write_shards" class="level3"> 464 554 <h3 class="anchored" data-anchor-id="atdata.local.S3DataStore.write_shards">write_shards</h3> 465 555 <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>local.S3DataStore.write_shards(ds, <span class="op">*</span>, prefix, cache_local<span class="op">=</span><span class="va">False</span>, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 466 556 <p>Write dataset shards to S3.</p> 467 - <p>Args: ds: The Dataset to write. prefix: Path prefix within bucket (e.g., ‘datasets/mnist/v1’). cache_local: If True, write locally first then copy to S3. 
**kwargs: Additional args passed to wds.ShardWriter (e.g., maxcount).</p> 468 - <p>Returns: List of S3 URLs for the written shards.</p> 469 - <p>Raises: RuntimeError: If no shards were written.</p> 557 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 558 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 559 + <table class="caption-top table"> 560 + <thead> 561 + <tr class="header"> 562 + <th>Name</th> 563 + <th>Type</th> 564 + <th>Description</th> 565 + <th>Default</th> 566 + </tr> 567 + </thead> 568 + <tbody> 569 + <tr class="odd"> 570 + <td>ds</td> 571 + <td><a href="`atdata.Dataset`">Dataset</a></td> 572 + <td>The Dataset to write.</td> 573 + <td><em>required</em></td> 574 + </tr> 575 + <tr class="even"> 576 + <td>prefix</td> 577 + <td><a href="`str`">str</a></td> 578 + <td>Path prefix within bucket (e.g., ‘datasets/mnist/v1’).</td> 579 + <td><em>required</em></td> 580 + </tr> 581 + <tr class="odd"> 582 + <td>cache_local</td> 583 + <td><a href="`bool`">bool</a></td> 584 + <td>If True, write locally first then copy to S3.</td> 585 + <td><code>False</code></td> 586 + </tr> 587 + <tr class="even"> 588 + <td>**kwargs</td> 589 + <td></td> 590 + <td>Additional args passed to wds.ShardWriter (e.g., maxcount).</td> 591 + <td><code>{}</code></td> 592 + </tr> 593 + </tbody> 594 + </table> 595 + </section> 596 + <section id="returns-2" class="level4 doc-section doc-section-returns"> 597 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 598 + <table class="caption-top table"> 599 + <thead> 600 + <tr class="header"> 601 + <th>Name</th> 602 + <th>Type</th> 603 + <th>Description</th> 604 + </tr> 605 + </thead> 606 + <tbody> 607 + <tr class="odd"> 608 + <td></td> 609 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 610 + <td>List of S3 URLs for the written shards.</td> 611 + </tr> 612 + </tbody> 613 + </table> 614 + </section> 615 + 
<section id="raises" class="level4 doc-section doc-section-raises"> 616 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 617 + <table class="caption-top table"> 618 + <thead> 619 + <tr class="header"> 620 + <th>Name</th> 621 + <th>Type</th> 622 + <th>Description</th> 623 + </tr> 624 + </thead> 625 + <tbody> 626 + <tr class="odd"> 627 + <td></td> 628 + <td><a href="`RuntimeError`">RuntimeError</a></td> 629 + <td>If no shards were written.</td> 630 + </tr> 631 + </tbody> 632 + </table> 470 633 471 634 635 + </section> 472 636 </section> 473 637 </section> 474 638 </section>
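
The `read_url` description above states that s3:// URLs are rewritten to HTTPS only when a custom endpoint is configured, and are returned unchanged for standard AWS S3. A minimal sketch of that rule follows, assuming the endpoint is available as a plain string; `resolve_s3_url` and its `endpoint` parameter are hypothetical and do not reflect the actual `S3DataStore` internals.

```python
from typing import Optional


def resolve_s3_url(url: str, endpoint: Optional[str]) -> str:
    """Hypothetical helper: map an s3:// URL onto a custom HTTPS endpoint."""
    scheme = "s3://"
    if endpoint is None or not url.startswith(scheme):
        # Standard AWS S3 (no custom endpoint): leave the URL untouched.
        return url
    bucket_and_key = url[len(scheme):]
    # Path-style addressing: the bucket becomes the first path segment.
    return endpoint.rstrip("/") + "/" + bucket_and_key
```

This reproduces the mapping given in the docstring, 's3://bucket/path' to 'https://endpoint.com/bucket/path', when called with `endpoint="https://endpoint.com"`.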

+71 -4

docs/api/packable.html

··· 396 396 <h2 id="toc-title">On this page</h2> 397 397 398 398 <ul> 399 - <li><a href="#atdata.packable" id="toc-atdata.packable" class="nav-link active" data-scroll-target="#atdata.packable">packable</a></li> 399 + <li><a href="#atdata.packable" id="toc-atdata.packable" class="nav-link active" data-scroll-target="#atdata.packable">packable</a> 400 + <ul class="collapse"> 401 + <li><a href="#parameters" id="toc-parameters" class="nav-link" data-scroll-target="#parameters">Parameters</a></li> 402 + <li><a href="#returns" id="toc-returns" class="nav-link" data-scroll-target="#returns">Returns</a></li> 403 + <li><a href="#examples" id="toc-examples" class="nav-link" data-scroll-target="#examples">Examples</a></li> 404 + </ul></li> 400 405 </ul> 401 406 <div class="toc-actions"><ul><li><a href="https://github.com/your-org/atdata/edit/main/api/packable.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li><li><a href="https://github.com/your-org/atdata/issues/new" class="toc-action"><i class="bi empty"></i>Report an issue</a></li></ul></div></nav> 402 407 </div> ··· 413 418 <p>Decorator to convert a regular class into a <code>PackableSample</code>.</p> 414 419 <p>This decorator transforms a class into a dataclass that inherits from <code>PackableSample</code>, enabling automatic msgpack serialization/deserialization with special handling for NDArray fields.</p> 415 420 <p>The resulting class satisfies the <code>Packable</code> protocol, making it compatible with all atdata APIs that accept packable types (e.g., <code>publish_schema</code>, lens transformations, etc.).</p> 416 - <p>Args: cls: The class to convert. Should have type annotations for its fields.</p> 417 - <p>Returns: A new dataclass that inherits from <code>PackableSample</code> with the same name and annotations as the original class. 
The class satisfies the <code>Packable</code> protocol and can be used with <code>Type[Packable]</code> signatures.</p> 418 - <p>Example: >>> <span class="citation" data-cites="packable">@packable</span> … class MyData: … name: str … values: NDArray … >>> sample = MyData(name=“test”, values=np.array([1, 2, 3])) >>> bytes_data = sample.packed >>> restored = MyData.from_bytes(bytes_data) >>> >>> # Works with Packable-typed APIs >>> index.publish_schema(MyData, version=“1.0.0”) # Type-safe</p> 421 + <section id="parameters" class="level2 doc-section doc-section-parameters"> 422 + <h2 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h2> 423 + <table class="caption-top table"> 424 + <thead> 425 + <tr class="header"> 426 + <th>Name</th> 427 + <th>Type</th> 428 + <th>Description</th> 429 + <th>Default</th> 430 + </tr> 431 + </thead> 432 + <tbody> 433 + <tr class="odd"> 434 + <td>cls</td> 435 + <td><a href="`type`">type</a>[<a href="`atdata.dataset._T`">_T</a>]</td> 436 + <td>The class to convert. Should have type annotations for its fields.</td> 437 + <td><em>required</em></td> 438 + </tr> 439 + </tbody> 440 + </table> 441 + </section> 442 + <section id="returns" class="level2 doc-section doc-section-returns"> 443 + <h2 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h2> 444 + <table class="caption-top table"> 445 + <thead> 446 + <tr class="header"> 447 + <th>Name</th> 448 + <th>Type</th> 449 + <th>Description</th> 450 + </tr> 451 + </thead> 452 + <tbody> 453 + <tr class="odd"> 454 + <td></td> 455 + <td><a href="`type`">type</a>[<a href="`atdata.dataset._T`">_T</a>]</td> 456 + <td>A new dataclass that inherits from <code>PackableSample</code> with the same</td> 457 + </tr> 458 + <tr class="even"> 459 + <td></td> 460 + <td><a href="`type`">type</a>[<a href="`atdata.dataset._T`">_T</a>]</td> 461 + <td>name and annotations as the original class. 
The class satisfies the</td> 462 + </tr> 463 + <tr class="odd"> 464 + <td></td> 465 + <td><a href="`type`">type</a>[<a href="`atdata.dataset._T`">_T</a>]</td> 466 + <td><code>Packable</code> protocol and can be used with <code>Type[Packable]</code> signatures.</td> 467 + </tr> 468 + </tbody> 469 + </table> 470 + </section> 471 + <section id="examples" class="level2 doc-section doc-section-examples"> 472 + <h2 class="doc-section doc-section-examples anchored" data-anchor-id="examples">Examples</h2> 473 + <p>This is a test of the functionality::</p> 474 + <pre><code>@packable 475 + class MyData: 476 + name: str 477 + values: NDArray 478 + 479 + sample = MyData(name="test", values=np.array([1, 2, 3])) 480 + bytes_data = sample.packed 481 + restored = MyData.from_bytes(bytes_data) 482 + 483 + # Works with Packable-typed APIs 484 + index.publish_schema(MyData, version="1.0.0") # Type-safe</code></pre> 419 485 420 486 487 + </section> 421 488 </section> 422 489 423 490 </main>
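
The `packable` description above says the decorator turns an annotated class into a dataclass that inherits from `PackableSample` and gains byte serialization. The general shape of such a decorator can be sketched with the standard library alone. Everything named here is hypothetical: `SketchSample` and `sketch_packable` are stand-ins, JSON replaces msgpack, and NDArray handling and field defaults are omitted.

```python
import dataclasses
import json


class SketchSample:
    """Hypothetical base class standing in for PackableSample."""

    @property
    def packed(self) -> bytes:
        # JSON is used here in place of msgpack to stay stdlib-only.
        return json.dumps(dataclasses.asdict(self)).encode("utf-8")

    @classmethod
    def from_bytes(cls, data: bytes):
        return cls(**json.loads(data.decode("utf-8")))


def sketch_packable(cls):
    """Hypothetical decorator: rebuild cls as a dataclass on SketchSample."""
    # Carry over the annotated fields; class-level defaults are ignored
    # in this sketch.
    fields = list(cls.__annotations__.items())
    return dataclasses.make_dataclass(cls.__name__, fields, bases=(SketchSample,))


@sketch_packable
class MyData:
    name: str
    values: list


sample = MyData(name="test", values=[1, 2, 3])
restored = MyData.from_bytes(sample.packed)
```

Building a new class rather than mutating the original is one way to get "the same name and annotations as the original class" while changing the base; whether atdata does it this way is not confirmed by the rendered docs.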

+121 -5

docs/api/promote_to_atmosphere.html

··· 396 396 <h2 id="toc-title">On this page</h2> 397 397 398 398 <ul> 399 - <li><a href="#atdata.promote.promote_to_atmosphere" id="toc-atdata.promote.promote_to_atmosphere" class="nav-link active" data-scroll-target="#atdata.promote.promote_to_atmosphere">promote_to_atmosphere</a></li> 399 + <li><a href="#atdata.promote.promote_to_atmosphere" id="toc-atdata.promote.promote_to_atmosphere" class="nav-link active" data-scroll-target="#atdata.promote.promote_to_atmosphere">promote_to_atmosphere</a> 400 + <ul class="collapse"> 401 + <li><a href="#parameters" id="toc-parameters" class="nav-link" data-scroll-target="#parameters">Parameters</a></li> 402 + <li><a href="#returns" id="toc-returns" class="nav-link" data-scroll-target="#returns">Returns</a></li> 403 + <li><a href="#raises" id="toc-raises" class="nav-link" data-scroll-target="#raises">Raises</a></li> 404 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 405 + </ul></li> 400 406 </ul> 401 407 <div class="toc-actions"><ul><li><a href="https://github.com/your-org/atdata/edit/main/api/promote_to_atmosphere.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li><li><a href="https://github.com/your-org/atdata/issues/new" class="toc-action"><i class="bi empty"></i>Report an issue</a></li></ul></div></nav> 402 408 </div> ··· 422 428 <span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 423 429 <p>Promote a local dataset to the atmosphere network.</p> 424 430 <p>This function takes a locally-indexed dataset and publishes it to ATProto, making it discoverable on the federated atmosphere network.</p> 425 - <p>Args: local_entry: The LocalDatasetEntry to promote. local_index: Local index containing the schema for this entry. atmosphere_client: Authenticated AtmosphereClient. 
data_store: Optional data store for copying data to new location. If None, the existing data_urls are used as-is. name: Override name for the atmosphere record. Defaults to local name. description: Optional description for the dataset. tags: Optional tags for discovery. license: Optional license identifier.</p> 426 - <p>Returns: AT URI of the created atmosphere dataset record.</p> 427 - <p>Raises: KeyError: If schema not found in local index. ValueError: If local entry has no data URLs.</p> 428 - <p>Example: >>> entry = local_index.get_dataset(“mnist-train”) >>> uri = promote_to_atmosphere(entry, local_index, client) >>> print(uri) at://did:plc:abc123/ac.foundation.dataset.datasetIndex/…</p> 431 + <section id="parameters" class="level2 doc-section doc-section-parameters"> 432 + <h2 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h2> 433 + <table class="caption-top table"> 434 + <thead> 435 + <tr class="header"> 436 + <th>Name</th> 437 + <th>Type</th> 438 + <th>Description</th> 439 + <th>Default</th> 440 + </tr> 441 + </thead> 442 + <tbody> 443 + <tr class="odd"> 444 + <td>local_entry</td> 445 + <td><a href="`atdata.local.LocalDatasetEntry`">LocalDatasetEntry</a></td> 446 + <td>The LocalDatasetEntry to promote.</td> 447 + <td><em>required</em></td> 448 + </tr> 449 + <tr class="even"> 450 + <td>local_index</td> 451 + <td><a href="`atdata.local.Index`">LocalIndex</a></td> 452 + <td>Local index containing the schema for this entry.</td> 453 + <td><em>required</em></td> 454 + </tr> 455 + <tr class="odd"> 456 + <td>atmosphere_client</td> 457 + <td><a href="`atdata.atmosphere.AtmosphereClient`">AtmosphereClient</a></td> 458 + <td>Authenticated AtmosphereClient.</td> 459 + <td><em>required</em></td> 460 + </tr> 461 + <tr class="even"> 462 + <td>data_store</td> 463 + <td><a href="`atdata._protocols.AbstractDataStore`">AbstractDataStore</a> | None</td> 464 + <td>Optional data store for copying data to new location. 
If None, the existing data_urls are used as-is.</td> 465 + <td><code>None</code></td> 466 + </tr> 467 + <tr class="odd"> 468 + <td>name</td> 469 + <td><a href="`str`">str</a> | None</td> 470 + <td>Override name for the atmosphere record. Defaults to local name.</td> 471 + <td><code>None</code></td> 472 + </tr> 473 + <tr class="even"> 474 + <td>description</td> 475 + <td><a href="`str`">str</a> | None</td> 476 + <td>Optional description for the dataset.</td> 477 + <td><code>None</code></td> 478 + </tr> 479 + <tr class="odd"> 480 + <td>tags</td> 481 + <td><a href="`list`">list</a>[<a href="`str`">str</a>] | None</td> 482 + <td>Optional tags for discovery.</td> 483 + <td><code>None</code></td> 484 + </tr> 485 + <tr class="even"> 486 + <td>license</td> 487 + <td><a href="`str`">str</a> | None</td> 488 + <td>Optional license identifier.</td> 489 + <td><code>None</code></td> 490 + </tr> 491 + </tbody> 492 + </table> 493 + </section> 494 + <section id="returns" class="level2 doc-section doc-section-returns"> 495 + <h2 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h2> 496 + <table class="caption-top table"> 497 + <thead> 498 + <tr class="header"> 499 + <th>Name</th> 500 + <th>Type</th> 501 + <th>Description</th> 502 + </tr> 503 + </thead> 504 + <tbody> 505 + <tr class="odd"> 506 + <td></td> 507 + <td><a href="`str`">str</a></td> 508 + <td>AT URI of the created atmosphere dataset record.</td> 509 + </tr> 510 + </tbody> 511 + </table> 512 + </section> 513 + <section id="raises" class="level2 doc-section doc-section-raises"> 514 + <h2 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h2> 515 + <table class="caption-top table"> 516 + <thead> 517 + <tr class="header"> 518 + <th>Name</th> 519 + <th>Type</th> 520 + <th>Description</th> 521 + </tr> 522 + </thead> 523 + <tbody> 524 + <tr class="odd"> 525 + <td></td> 526 + <td><a href="`KeyError`">KeyError</a></td> 527 + <td>If schema not found in local index.</td> 
528 + </tr> 529 + <tr class="even"> 530 + <td></td> 531 + <td><a href="`ValueError`">ValueError</a></td> 532 + <td>If local entry has no data URLs.</td> 533 + </tr> 534 + </tbody> 535 + </table> 536 + </section> 537 + <section id="example" class="level2 doc-section doc-section-example"> 538 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 539 + <p>::</p> 540 + <pre><code>>>> entry = local_index.get_dataset("mnist-train") 541 + >>> uri = promote_to_atmosphere(entry, local_index, client) 542 + >>> print(uri) 543 + at://did:plc:abc123/ac.foundation.dataset.datasetIndex/...</code></pre> 429 544 430 545 546 + </section> 431 547 </section> 432 548 433 549 </main>

+6 -6

docs/index.html

··· 619 619 <h2 class="anchored" data-anchor-id="quick-example">Quick Example</h2> 620 620 <section id="define-a-sample-type" class="level3"> 621 621 <h3 class="anchored" data-anchor-id="define-a-sample-type">Define a Sample Type</h3> 622 - <div id="79809729" class="cell"> 622 + <div id="5e6ae3e6" class="cell"> 623 623 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 624 624 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 625 625 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 633 633 </section> 634 634 <section id="create-and-write-samples" class="level3"> 635 635 <h3 class="anchored" data-anchor-id="create-and-write-samples">Create and Write Samples</h3> 636 - <div id="e00777c6" class="cell"> 636 + <div id="0eba47de" class="cell"> 637 637 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 638 638 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 639 639 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> ··· 652 652 </section> 653 653 <section id="load-and-iterate" class="level3"> 654 654 <h3 class="anchored" data-anchor-id="load-and-iterate">Load and Iterate</h3> 655 - <div id="c980da75" class="cell"> 655 + <div id="4c9f9c26" class="cell"> 656 656 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode 
python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data-000000.tar"</span>)</span> 657 657 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 658 658 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Iterate with batching</span></span> ··· 665 665 </section> 666 666 <section id="huggingface-style-loading" class="level2"> 667 667 <h2 class="anchored" data-anchor-id="huggingface-style-loading">HuggingFace-Style Loading</h2> 668 - <div id="1146ac5b" class="cell"> 668 + <div id="0e22b49e" class="cell"> 669 669 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Load from local path</span></span> 670 670 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> atdata.load_dataset(<span class="st">"path/to/data-{000000..000009}.tar"</span>, split<span class="op">=</span><span class="st">"train"</span>)</span> 671 671 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span> ··· 677 677 </section> 678 678 <section id="local-storage-with-redis-s3" class="level2"> 679 679 <h2 class="anchored" data-anchor-id="local-storage-with-redis-s3">Local Storage with Redis + S3</h2> 680 - <div id="8667ef7f" class="cell"> 680 + <div id="5bff0047" class="cell"> 681 681 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex, S3DataStore</span> 682 682 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span 
class="im">as</span> wds</span> 683 683 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span> ··· 701 701 </section> 702 702 <section id="publish-to-atproto-federation" class="level2"> 703 703 <h2 class="anchored" data-anchor-id="publish-to-atproto-federation">Publish to ATProto Federation</h2> 704 - <div id="cca46e22" class="cell"> 704 + <div id="afceac3d" class="cell"> 705 705 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient</span> 706 706 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.promote <span class="im">import</span> promote_to_atmosphere</span> 707 707 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span>

+17 -17

docs/reference/atmosphere.html

··· 611 611 <section id="atmosphereclient" class="level2"> 612 612 <h2 class="anchored" data-anchor-id="atmosphereclient">AtmosphereClient</h2> 613 613 <p>The client handles authentication and record operations:</p> 614 - <div id="989b75fb" class="cell"> 614 + <div id="4c64e271" class="cell"> 615 615 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient</span> 616 616 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span> 617 617 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> ··· 638 638 <section id="session-management" class="level3"> 639 639 <h3 class="anchored" data-anchor-id="session-management">Session Management</h3> 640 640 <p>Save and restore sessions to avoid re-authentication:</p> 641 - <div id="646c68b2" class="cell"> 641 + <div id="d219859d" class="cell"> 642 642 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Export session for later</span></span> 643 643 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>session_string <span class="op">=</span> client.export_session()</span> 644 644 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span> ··· 650 650 <section id="custom-pds" class="level3"> 651 651 <h3 class="anchored" data-anchor-id="custom-pds">Custom PDS</h3> 652 652 <p>Connect to a custom PDS instead of bsky.social:</p> 653 - <div id="6ef0e288" class="cell"> 653 + <div id="7d355909" class="cell"> 654 654 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code 
class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient(base_url<span class="op">=</span><span class="st">"https://pds.example.com"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 655 655 </div> 656 656 </section> ··· 658 658 <section id="atmosphereindex" class="level2"> 659 659 <h2 class="anchored" data-anchor-id="atmosphereindex">AtmosphereIndex</h2> 660 660 <p>The unified interface for ATProto operations, implementing the AbstractIndex protocol:</p> 661 - <div id="96f270b3" class="cell"> 661 + <div id="c54e0fd7" class="cell"> 662 662 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient, AtmosphereIndex</span> 663 663 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 664 664 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> ··· 668 668 </div> 669 669 <section id="publishing-schemas" class="level3"> 670 670 <h3 class="anchored" data-anchor-id="publishing-schemas">Publishing Schemas</h3> 671 - <div id="eb6b576f" class="cell"> 671 + <div id="03e3bdc7" class="cell"> 672 672 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 673 673 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 674 674 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span> ··· 689 689 
</section> 690 690 <section id="publishing-datasets" class="level3"> 691 691 <h3 class="anchored" data-anchor-id="publishing-datasets">Publishing Datasets</h3> 692 - <div id="cc9f6d3f" class="cell"> 692 + <div id="0034f50d" class="cell"> 693 693 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 694 694 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span> 695 695 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> ··· 707 707 </section> 708 708 <section id="listing-and-retrieving" class="level3"> 709 709 <h3 class="anchored" data-anchor-id="listing-and-retrieving">Listing and Retrieving</h3> 710 - <div id="06b6f91b" class="cell"> 710 + <div id="636979ca" class="cell"> 711 711 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># List your datasets</span></span> 712 712 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> entry <span class="kw">in</span> index.list_datasets():</span> 713 713 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"</span><span class="sc">{</span>entry<span class="sc">.</span>name<span class="sc">}</span><span class="ss">: </span><span class="sc">{</span>entry<span class="sc">.</span>schema_ref<span class="sc">}</span><span class="ss">"</span>)</span> ··· 733 733 <p>For more control, use the individual publisher classes:</p> 734 734 <section id="schemapublisher" class="level3"> 735 735 <h3 class="anchored" 
data-anchor-id="schemapublisher">SchemaPublisher</h3> 736 - <div id="2bc57530" class="cell"> 736 + <div id="690e6ad9" class="cell"> 737 737 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> SchemaPublisher</span> 738 738 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 739 739 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>publisher <span class="op">=</span> SchemaPublisher(client)</span> ··· 749 749 </section> 750 750 <section id="datasetpublisher" class="level3"> 751 751 <h3 class="anchored" data-anchor-id="datasetpublisher">DatasetPublisher</h3> 752 - <div id="ce2208c7" class="cell"> 752 + <div id="b4abaae7" class="cell"> 753 753 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> DatasetPublisher</span> 754 754 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span> 755 755 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>publisher <span class="op">=</span> DatasetPublisher(client)</span> ··· 767 767 <section id="blob-storage" class="level4"> 768 768 <h4 class="anchored" data-anchor-id="blob-storage">Blob Storage</h4> 769 769 <p>For smaller datasets (up to ~50MB per shard), you can store data directly in ATProto blobs instead of external URLs:</p> 770 - <div id="b9ea7102" class="cell"> 770 + <div id="17cf3e11" class="cell"> 771 771 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" 
tabindex="-1"></a><span class="im">import</span> io</span> 772 772 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 773 773 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a></span> ··· 787 787 <span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 788 788 </div> 789 789 <p>To load datasets with blob storage:</p> 790 - <div id="d420ed30" class="cell"> 790 + <div id="411daf78" class="cell"> 791 791 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> DatasetLoader</span> 792 792 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a></span> 793 793 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> DatasetLoader(client)</span> ··· 808 808 </section> 809 809 <section id="lenspublisher" class="level3"> 810 810 <h3 class="anchored" data-anchor-id="lenspublisher">LensPublisher</h3> 811 - <div id="af52b23d" class="cell"> 811 + <div id="aac3596a" class="cell"> 812 812 <div class="sourceCode cell-code" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> LensPublisher</span> 813 813 <span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a></span> 814 814 <span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>publisher <span class="op">=</span> LensPublisher(client)</span> ··· 851 851 <p>For direct access to 
records, use the loader classes:</p> 852 852 <section id="schemaloader" class="level3"> 853 853 <h3 class="anchored" data-anchor-id="schemaloader">SchemaLoader</h3> 854 - <div id="3b15532c" class="cell"> 854 + <div id="5c5becfb" class="cell"> 855 855 <div class="sourceCode cell-code" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> SchemaLoader</span> 856 856 <span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a></span> 857 857 <span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> SchemaLoader(client)</span> ··· 867 867 </section> 868 868 <section id="datasetloader" class="level3"> 869 869 <h3 class="anchored" data-anchor-id="datasetloader">DatasetLoader</h3> 870 - <div id="4cb996c8" class="cell"> 870 + <div id="c0a3e839" class="cell"> 871 871 <div class="sourceCode cell-code" id="cb15"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> DatasetLoader</span> 872 872 <span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a></span> 873 873 <span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> DatasetLoader(client)</span> ··· 895 895 </section> 896 896 <section id="lensloader" class="level3"> 897 897 <h3 class="anchored" data-anchor-id="lensloader">LensLoader</h3> 898 - <div id="89d9f79a" class="cell"> 898 + <div id="282a71b9" class="cell"> 899 899 <div class="sourceCode cell-code" id="cb16"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> 
atdata.atmosphere <span class="im">import</span> LensLoader</span> 900 900 <span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a></span> 901 901 <span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> LensLoader(client)</span> ··· 920 920 <section id="at-uris" class="level2"> 921 921 <h2 class="anchored" data-anchor-id="at-uris">AT URIs</h2> 922 922 <p>ATProto records are identified by AT URIs:</p> 923 - <div id="c79de45e" class="cell"> 923 + <div id="c3a85cc4" class="cell"> 924 924 <div class="sourceCode cell-code" id="cb17"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtUri</span> 925 925 <span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a></span> 926 926 <span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Parse an AT URI</span></span> ··· 986 986 </section> 987 987 <section id="complete-example" class="level2"> 988 988 <h2 class="anchored" data-anchor-id="complete-example">Complete Example</h2> 989 - <div id="46b43b4f" class="cell"> 989 + <div id="d3ec5a19" class="cell"> 990 990 <div class="sourceCode cell-code" id="cb18"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 991 991 <span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 992 992 <span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span>

+13 -13

docs/reference/datasets.html

··· 593 593 <p>The <code>Dataset</code> class provides typed iteration over WebDataset tar files with automatic batching and lens transformations.</p> 594 594 <section id="creating-a-dataset" class="level2"> 595 595 <h2 class="anchored" data-anchor-id="creating-a-dataset">Creating a Dataset</h2> 596 - <div id="51fa5997" class="cell"> 596 + <div id="f29acbac" class="cell"> 597 597 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 598 598 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 599 599 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span> ··· 616 616 <section id="url-source-default" class="level3"> 617 617 <h3 class="anchored" data-anchor-id="url-source-default">URL Source (default)</h3> 618 618 <p>When you pass a string to <code>Dataset</code>, it automatically wraps it in a <code>URLSource</code>:</p> 619 - <div id="0263f709" class="cell"> 619 + <div id="613ca0b1" class="cell"> 620 620 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="co"># These are equivalent:</span></span> 621 621 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 622 622 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](atdata.URLSource(<span class="st">"data-{000000..000009}.tar"</span>))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i 
class="bi"></i></button></pre></div> ··· 625 625 <section id="s3-source" class="level3"> 626 626 <h3 class="anchored" data-anchor-id="s3-source">S3 Source</h3> 627 627 <p>For private S3 buckets or S3-compatible storage (Cloudflare R2, MinIO), use <code>S3Source</code>:</p> 628 - <div id="0d5b4a74" class="cell"> 628 + <div id="7024789e" class="cell"> 629 629 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># From explicit credentials</span></span> 630 630 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> atdata.S3Source(</span> 631 631 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> bucket<span class="op">=</span><span class="st">"my-bucket"</span>,</span> ··· 663 663 <section id="ordered-iteration" class="level3"> 664 664 <h3 class="anchored" data-anchor-id="ordered-iteration">Ordered Iteration</h3> 665 665 <p>Iterate through samples in their original order:</p> 666 - <div id="0b54f5da" class="cell"> 666 + <div id="caf78bf2" class="cell"> 667 667 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># With batching (default batch_size=1)</span></span> 668 668 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 669 669 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> images <span class="op">=</span> batch.image <span class="co"># numpy array (32, H, W, C)</span></span> ··· 677 677 <section id="shuffled-iteration" class="level3"> 678 678 <h3 class="anchored" 
data-anchor-id="shuffled-iteration">Shuffled Iteration</h3> 679 679 <p>Iterate with randomized order at both shard and sample levels:</p> 680 - <div id="f15460c3" class="cell"> 680 + <div id="ab24a75c" class="cell"> 681 681 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.shuffled(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 682 682 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> <span class="co"># Samples are shuffled</span></span> 683 683 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> process(batch)</span> ··· 708 708 <section id="samplebatch" class="level2"> 709 709 <h2 class="anchored" data-anchor-id="samplebatch">SampleBatch</h2> 710 710 <p>When iterating with a <code>batch_size</code>, each iteration yields a <code>SampleBatch</code> with automatic attribute aggregation.</p> 711 - <div id="04f9af45" class="cell"> 711 + <div id="46b42eb2" class="cell"> 712 712 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 713 713 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> Sample:</span> 714 714 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> features: NDArray <span class="co"># shape (256,)</span></span> ··· 728 728 <section id="type-transformations-with-lenses" class="level2"> 729 729 <h2 class="anchored" data-anchor-id="type-transformations-with-lenses">Type Transformations with Lenses</h2> 730 730 <p>View a dataset through a different sample type using registered lenses:</p> 731 - <div id="e178d059" class="cell"> 731 + 
<div id="a15c6b46" class="cell"> 732 732 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 733 733 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> SimplifiedSample:</span> 734 734 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> label: <span class="bu">str</span></span> ··· 750 750 <section id="shard-list" class="level3"> 751 751 <h3 class="anchored" data-anchor-id="shard-list">Shard List</h3> 752 752 <p>Get the list of individual tar files:</p> 753 - <div id="a83b495c" class="cell"> 753 + <div id="96651430" class="cell"> 754 754 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[Sample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 755 755 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>shards <span class="op">=</span> dataset.shard_list</span> 756 756 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="co"># ['data-000000.tar', 'data-000001.tar', ..., 'data-000009.tar']</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> ··· 759 759 <section id="metadata" class="level3"> 760 760 <h3 class="anchored" data-anchor-id="metadata">Metadata</h3> 761 761 <p>Datasets can have associated metadata from a URL:</p> 762 - <div id="b93a6baf" class="cell"> 762 + <div id="0fe36b3d" class="cell"> 763 763 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>dataset 
<span class="op">=</span> atdata.Dataset[Sample](</span> 764 764 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"data-{000000..000009}.tar"</span>,</span> 765 765 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a> metadata_url<span class="op">=</span><span class="st">"https://example.com/metadata.msgpack"</span></span> ··· 773 773 <section id="writing-datasets" class="level2"> 774 774 <h2 class="anchored" data-anchor-id="writing-datasets">Writing Datasets</h2> 775 775 <p>Use WebDataset’s <code>TarWriter</code> or <code>ShardWriter</code> to create datasets:</p> 776 - <div id="6915fff3" class="cell"> 776 + <div id="5b8843dc" class="cell"> 777 777 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 778 778 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 779 779 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a></span> ··· 796 796 <section id="parquet-export" class="level2"> 797 797 <h2 class="anchored" data-anchor-id="parquet-export">Parquet Export</h2> 798 798 <p>Export dataset contents to parquet format:</p> 799 - <div id="293e4903" class="cell"> 799 + <div id="c8523229" class="cell"> 800 800 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Export entire dataset</span></span> 801 801 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>dataset.to_parquet(<span class="st">"output.parquet"</span>)</span> 802 802 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" 
tabindex="-1"></a></span> ··· 847 847 <section id="source" class="level3"> 848 848 <h3 class="anchored" data-anchor-id="source">Source</h3> 849 849 <p>Access the underlying <code>DataSource</code>:</p> 850 - <div id="2f5c0957" class="cell"> 850 + <div id="c088cf54" class="cell"> 851 851 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[Sample](<span class="st">"data.tar"</span>)</span> 852 852 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> dataset.source <span class="co"># URLSource instance</span></span> 853 853 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(source.shard_list) <span class="co"># ['data.tar']</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> ··· 856 856 <section id="sample-type" class="level3"> 857 857 <h3 class="anchored" data-anchor-id="sample-type">Sample Type</h3> 858 858 <p>Get the type parameter used to create the dataset:</p> 859 - <div id="92f39647" class="cell"> 859 + <div id="237b62eb" class="cell"> 860 860 <div class="sourceCode cell-code" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data.tar"</span>)</span> 861 861 <span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(dataset.sample_type) <span class="co"># <class 'ImageSample'></span></span> 862 862 <span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(dataset.batch_type) <span class="co"># 
SampleBatch[ImageSample]</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
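The Shard List hunk above shows a brace pattern such as `data-{000000..000009}.tar` resolving to ten individual tar files. As a rough illustration of that expansion only, here is a minimal sketch; `expand_shards` is a hypothetical helper written for this note, not part of atdata or WebDataset, and it assumes purely numeric `{start..end}` ranges whose zero padding follows the lower bound.

```python
import re

# Hypothetical helper for illustration; not the atdata or WebDataset implementation.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_shards(pattern: str) -> list[str]:
    """Expand every numeric `{start..end}` range in a shard pattern."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]  # a plain path names a single shard
    start, end = match.groups()
    width = len(start)  # keep the zero padding of the lower bound
    shards: list[str] = []
    for i in range(int(start), int(end) + 1):
        expanded = pattern[: match.start()] + str(i).zfill(width) + pattern[match.end():]
        shards.extend(expand_shards(expanded))  # handle any further ranges
    return shards


shards = expand_shards("data-{000000..000009}.tar")
print(len(shards), shards[0], shards[-1])
# 10 data-000000.tar data-000009.tar
```

A path without braces comes back as a one-element list, which matches the `['data.tar']` shard list shown in the Source hunk.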

+10 -10

docs/reference/lenses.html

··· 585 585 <section id="creating-a-lens" class="level2"> 586 586 <h2 class="anchored" data-anchor-id="creating-a-lens">Creating a Lens</h2> 587 587 <p>Use the <code>@lens</code> decorator to define a getter:</p> 588 - <div id="33ea6879" class="cell"> 588 + <div id="d8be4d9e" class="cell"> 589 589 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 590 590 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 591 591 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span> ··· 615 615 <section id="adding-a-putter" class="level2"> 616 616 <h2 class="anchored" data-anchor-id="adding-a-putter">Adding a Putter</h2> 617 617 <p>To enable bidirectional updates, add a putter:</p> 618 - <div id="7bdbe3d5" class="cell"> 618 + <div id="c7c0b1d2" class="cell"> 619 619 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="at">@simplify.putter</span></span> 620 620 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> simplify_put(view: SimpleSample, source: FullSample) <span class="op">-></span> FullSample:</span> 621 621 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> FullSample(</span> ··· 635 635 <section id="using-lenses-with-datasets" class="level2"> 636 636 <h2 class="anchored" data-anchor-id="using-lenses-with-datasets">Using Lenses with Datasets</h2> 637 637 <p>Lenses integrate with <code>Dataset.as_type()</code>:</p> 638 - <div id="e87c44ae" class="cell"> 638 + <div id="2f033a35" class="cell"> 639 639 <div 
class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[FullSample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 640 640 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 641 641 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="co"># View through a different type</span></span> ··· 650 650 <section id="direct-lens-usage" class="level2"> 651 651 <h2 class="anchored" data-anchor-id="direct-lens-usage">Direct Lens Usage</h2> 652 652 <p>Lenses can also be called directly:</p> 653 - <div id="c0864fed" class="cell"> 653 + <div id="59469c09" class="cell"> 654 654 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 655 655 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 656 656 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>full <span class="op">=</span> FullSample(</span> ··· 679 679 <div class="tab-content"> 680 680 <div id="tabset-1-1" class="tab-pane active" role="tabpanel" aria-labelledby="tabset-1-1-tab"> 681 681 <p>If you get a view and immediately put it back, the source is unchanged:</p> 682 - <div id="cb89e22c" class="cell"> 682 + <div id="634c1b5b" class="cell"> 683 683 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>view <span class="op">=</span> lens.get(source)</span> 684 684 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="cf">assert</span> lens.put(view, source) 
<span class="op">==</span> source</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 685 685 </div> 686 686 </div> 687 687 <div id="tabset-1-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-1-2-tab"> 688 688 <p>If you put a view, getting it back yields that view:</p> 689 - <div id="4982d795" class="cell"> 689 + <div id="d9273642" class="cell"> 690 690 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>updated <span class="op">=</span> lens.put(view, source)</span> 691 691 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="cf">assert</span> lens.get(updated) <span class="op">==</span> view</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 692 692 </div> 693 693 </div> 694 694 <div id="tabset-1-3" class="tab-pane" role="tabpanel" aria-labelledby="tabset-1-3-tab"> 695 695 <p>Putting twice is equivalent to putting once with the final value:</p> 696 - <div id="9eb6fcc6" class="cell"> 696 + <div id="4936f018" class="cell"> 697 697 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>result1 <span class="op">=</span> lens.put(v2, lens.put(v1, source))</span> 698 698 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>result2 <span class="op">=</span> lens.put(v2, source)</span> 699 699 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="cf">assert</span> result1 <span class="op">==</span> result2</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> ··· 705 705 <section id="trivial-putter" class="level2"> 706 706 <h2 
class="anchored" data-anchor-id="trivial-putter">Trivial Putter</h2> 707 707 <p>If no putter is defined, a trivial putter is used that ignores view updates:</p> 708 - <div id="437ffe9e" class="cell"> 708 + <div id="cef605de" class="cell"> 709 709 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.lens</span></span> 710 710 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> extract_label(src: FullSample) <span class="op">-></span> SimpleSample:</span> 711 711 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> SimpleSample(label<span class="op">=</span>src.label, confidence<span class="op">=</span>src.confidence)</span> ··· 719 719 <section id="lensnetwork-registry" class="level2"> 720 720 <h2 class="anchored" data-anchor-id="lensnetwork-registry">LensNetwork Registry</h2> 721 721 <p>The <code>LensNetwork</code> is a singleton that stores all registered lenses:</p> 722 - <div id="25019e3a" class="cell"> 722 + <div id="7b7548a8" class="cell"> 723 723 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.lens <span class="im">import</span> LensNetwork</span> 724 724 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 725 725 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>network <span class="op">=</span> LensNetwork()</span> ··· 736 736 </section> 737 737 <section id="example-feature-extraction" class="level2"> 738 738 <h2 class="anchored" data-anchor-id="example-feature-extraction">Example: Feature Extraction</h2> 739 - <div id="9f113564" class="cell"> 739 + <div id="72811776" 
class="cell"> 740 740 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 741 741 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> RawSample:</span> 742 742 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> audio: NDArray</span>
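The three tabbed assertions in the lenses.html hunks above (get then put leaves the source unchanged, put then get yields the view, putting twice equals putting the final value) can be checked end to end without atdata. The sketch below uses plain frozen dataclasses as stand-ins; `Full`, `Simple`, `get`, `put`, and `trivial_put` are hypothetical names chosen for illustration, and the `@atdata.lens` decorator itself is deliberately not used here.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Full:  # stand-in for a richer source sample
    label: str
    confidence: float
    note: str


@dataclass(frozen=True)
class Simple:  # stand-in for the simplified view
    label: str
    confidence: float


def get(source: Full) -> Simple:
    """Getter: project the source down to the view."""
    return Simple(label=source.label, confidence=source.confidence)


def put(view: Simple, source: Full) -> Full:
    """Putter: write the view back, keeping fields the view does not cover."""
    return replace(source, label=view.label, confidence=view.confidence)


def trivial_put(view: Simple, source: Full) -> Full:
    """A putter that ignores view updates, as described for the trivial case."""
    return source


source = Full(label="cat", confidence=0.9, note="kept")
v1 = Simple(label="dog", confidence=0.5)
v2 = Simple(label="bird", confidence=0.7)

assert put(get(source), source) == source            # get then put: source unchanged
assert get(put(v1, source)) == v1                    # put then get: view comes back
assert put(v2, put(v1, source)) == put(v2, source)   # put twice: last value wins

# A putter that ignores the view still satisfies the first law but not the second.
assert trivial_put(get(source), source) == source
assert get(trivial_put(v1, source)) != v1
```

The last two assertions show why a lens with only a getter behaves as read-only in this sketch: updates made through the view are dropped rather than written back.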

+12 -12

docs/reference/load-dataset.html

··· 594 594 </section> 595 595 <section id="basic-usage" class="level2"> 596 596 <h2 class="anchored" data-anchor-id="basic-usage">Basic Usage</h2> 597 - <div id="f1266ba6" class="cell"> 597 + <div id="4d4d1559" class="cell"> 598 598 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 599 599 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata <span class="im">import</span> load_dataset</span> 600 600 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> ··· 617 617 <h2 class="anchored" data-anchor-id="path-formats">Path Formats</h2> 618 618 <section id="webdataset-brace-notation" class="level3"> 619 619 <h3 class="anchored" data-anchor-id="webdataset-brace-notation">WebDataset Brace Notation</h3> 620 - <div id="27909507" class="cell"> 620 + <div id="f383e82d" class="cell"> 621 621 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Range notation</span></span> 622 622 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(<span class="st">"data-{000000..000099}.tar"</span>, MySample, split<span class="op">=</span><span class="st">"train"</span>)</span> 623 623 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span> ··· 627 627 </section> 628 628 <section id="glob-patterns" class="level3"> 629 629 <h3 class="anchored" data-anchor-id="glob-patterns">Glob Patterns</h3> 630 - <div id="d5dbbe76" class="cell"> 630 + <div id="66bf9de6" class="cell"> 631 631 <div class="sourceCode cell-code" 
id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Match all tar files</span></span> 632 632 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(<span class="st">"path/to/*.tar"</span>, MySample)</span> 633 633 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span> ··· 637 637 </section> 638 638 <section id="local-directory" class="level3"> 639 639 <h3 class="anchored" data-anchor-id="local-directory">Local Directory</h3> 640 - <div id="944279f4" class="cell"> 640 + <div id="98f56181" class="cell"> 641 641 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Scans for .tar files</span></span> 642 642 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(<span class="st">"./my-dataset/"</span>, MySample)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 643 643 </div> 644 644 </section> 645 645 <section id="remote-urls" class="level3"> 646 646 <h3 class="anchored" data-anchor-id="remote-urls">Remote URLs</h3> 647 - <div id="7a335878" class="cell"> 647 + <div id="8c1287e4" class="cell"> 648 648 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># S3 (public buckets)</span></span> 649 649 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(<span class="st">"s3://bucket/data-{000..099}.tar"</span>, MySample, split<span class="op">=</span><span 
class="st">"train"</span>)</span> 650 650 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span> ··· 670 670 </section> 671 671 <section id="index-lookup" class="level3"> 672 672 <h3 class="anchored" data-anchor-id="index-lookup">Index Lookup</h3> 673 - <div id="4af74f7c" class="cell"> 673 + <div id="e8e3bd9a" class="cell"> 674 674 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 675 675 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a></span> 676 676 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> LocalIndex()</span> ··· 737 737 <section id="datasetdict" class="level2"> 738 738 <h2 class="anchored" data-anchor-id="datasetdict">DatasetDict</h2> 739 739 <p>When loading without <code>split=</code>, returns a <code>DatasetDict</code>:</p> 740 - <div id="fb9261b7" class="cell"> 740 + <div id="45c2fb32" class="cell"> 741 741 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>ds_dict <span class="op">=</span> load_dataset(<span class="st">"path/to/data/"</span>, MySample)</span> 742 742 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span> 743 743 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Access splits</span></span> ··· 757 757 <section id="explicit-data-files" class="level2"> 758 758 <h2 class="anchored" data-anchor-id="explicit-data-files">Explicit Data Files</h2> 759 759 <p>Override automatic detection with <code>data_files</code>:</p> 760 - <div id="36a08da1" class="cell"> 760 + <div id="fdfc9d5b" class="cell"> 761 761 <div 
class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Single pattern</span></span> 762 762 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(</span> 763 763 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"path/to/"</span>,</span> ··· 786 786 <section id="streaming-mode" class="level2"> 787 787 <h2 class="anchored" data-anchor-id="streaming-mode">Streaming Mode</h2> 788 788 <p>The <code>streaming</code> parameter signals intent for streaming mode:</p> 789 - <div id="f2d4367e" class="cell"> 789 + <div id="973b8c68" class="cell"> 790 790 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Mark as streaming</span></span> 791 791 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>ds_dict <span class="op">=</span> load_dataset(<span class="st">"path/to/data.tar"</span>, MySample, streaming<span class="op">=</span><span class="va">True</span>)</span> 792 792 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a></span> ··· 811 811 <section id="auto-type-resolution" class="level2"> 812 812 <h2 class="anchored" data-anchor-id="auto-type-resolution">Auto Type Resolution</h2> 813 813 <p>When using index lookup, the sample type can be resolved automatically:</p> 814 - <div id="a0ce944d" class="cell"> 814 + <div id="73592d21" class="cell"> 815 815 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 816 
816 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span> 817 817 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> LocalIndex()</span> ··· 825 825 </section> 826 826 <section id="error-handling" class="level2"> 827 827 <h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2> 828 - <div id="e19c3b8b" class="cell"> 828 + <div id="a95e0c73" class="cell"> 829 829 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="cf">try</span>:</span> 830 830 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> ds <span class="op">=</span> load_dataset(<span class="st">"path/to/data.tar"</span>, MySample, split<span class="op">=</span><span class="st">"train"</span>)</span> 831 831 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="cf">except</span> <span class="pp">FileNotFoundError</span>:</span> ··· 841 841 </section> 842 842 <section id="complete-example" class="level2"> 843 843 <h2 class="anchored" data-anchor-id="complete-example">Complete Example</h2> 844 - <div id="86a0c23d" class="cell"> 844 + <div id="e5dd31bb" class="cell"> 845 845 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 846 846 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 847 847 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span>

+11 -11

docs/reference/local-storage.html

··· 593 593 <section id="localindex" class="level2"> 594 594 <h2 class="anchored" data-anchor-id="localindex">LocalIndex</h2> 595 595 <p>The index tracks datasets in Redis:</p> 596 - <div id="e4892e76" class="cell"> 596 + <div id="9eda0b05" class="cell"> 597 597 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 598 598 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span> 599 599 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Default connection (localhost:6379)</span></span> ··· 609 609 </div> 610 610 <section id="adding-entries" class="level3"> 611 611 <h3 class="anchored" data-anchor-id="adding-entries">Adding Entries</h3> 612 - <div id="32b365cd" class="cell"> 612 + <div id="06fc3264" class="cell"> 613 613 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 614 614 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 615 615 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span> ··· 634 634 </section> 635 635 <section id="listing-and-retrieving" class="level3"> 636 636 <h3 class="anchored" data-anchor-id="listing-and-retrieving">Listing and Retrieving</h3> 637 - <div id="ee6440d2" class="cell"> 637 + <div id="ad6bfc60" class="cell"> 638 638 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Iterate 
all entries</span></span> 639 639 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> entry <span class="kw">in</span> index.entries:</span> 640 640 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"</span><span class="sc">{</span>entry<span class="sc">.</span>name<span class="sc">}</span><span class="ss">: </span><span class="sc">{</span>entry<span class="sc">.</span>cid<span class="sc">}</span><span class="ss">"</span>)</span> ··· 666 666 </div> 667 667 </div> 668 668 <p>The Repo class combines S3 storage with Redis indexing:</p> 669 - <div id="7a621962" class="cell"> 669 + <div id="29ea9596" class="cell"> 670 670 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> Repo</span> 671 671 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 672 672 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co"># From credentials file</span></span> ··· 686 686 <span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 687 687 </div> 688 688 <p><strong>Preferred approach</strong> - Use <code>LocalIndex</code> with <code>S3DataStore</code>:</p> 689 - <div id="9d875541" class="cell"> 689 + <div id="6c613634" class="cell"> 690 690 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex, S3DataStore</span> 691 691 <span id="cb5-2"><a href="#cb5-2" 
aria-hidden="true" tabindex="-1"></a></span> 692 692 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> S3DataStore(</span> ··· 724 724 </section> 725 725 <section id="inserting-datasets" class="level3"> 726 726 <h3 class="anchored" data-anchor-id="inserting-datasets">Inserting Datasets</h3> 727 - <div id="903bddd9" class="cell"> 727 + <div id="7f3d98eb" class="cell"> 728 728 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 729 729 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 730 730 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span> ··· 754 754 </section> 755 755 <section id="insert-options" class="level3"> 756 756 <h3 class="anchored" data-anchor-id="insert-options">Insert Options</h3> 757 - <div id="19bb1cb2" class="cell"> 757 + <div id="7983708f" class="cell"> 758 758 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>entry, ds <span class="op">=</span> repo.insert(</span> 759 759 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> dataset,</span> 760 760 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"my-dataset"</span>,</span> ··· 768 768 <section id="localdatasetentry" class="level2"> 769 769 <h2 class="anchored" data-anchor-id="localdatasetentry">LocalDatasetEntry</h2> 770 770 <p>Index entries provide content-addressable identification:</p> 771 - <div id="44c581ea" class="cell"> 771 + <div id="9cc14a25" class="cell"> 772 772 <div 
class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.get_entry_by_name(<span class="st">"my-dataset"</span>)</span> 773 773 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 774 774 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Core properties (IndexEntry protocol)</span></span> ··· 801 801 <section id="schema-storage" class="level2"> 802 802 <h2 class="anchored" data-anchor-id="schema-storage">Schema Storage</h2> 803 803 <p>Schemas can be stored and retrieved from the index:</p> 804 - <div id="d7acfc3b" class="cell"> 804 + <div id="10fa70b6" class="cell"> 805 805 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish a schema</span></span> 806 806 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>schema_ref <span class="op">=</span> index.publish_schema(</span> 807 807 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> ImageSample,</span> ··· 832 832 <section id="s3datastore" class="level2"> 833 833 <h2 class="anchored" data-anchor-id="s3datastore">S3DataStore</h2> 834 834 <p>For direct S3 operations without Redis indexing:</p> 835 - <div id="ca987136" class="cell"> 835 + <div id="e7ebf388" class="cell"> 836 836 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> S3DataStore</span> 837 837 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a></span> 838 838 <span id="cb11-3"><a 
href="#cb11-3" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> S3DataStore(</span> ··· 854 854 </section> 855 855 <section id="complete-workflow-example" class="level2"> 856 856 <h2 class="anchored" data-anchor-id="complete-workflow-example">Complete Workflow Example</h2> 857 - <div id="405d0248" class="cell"> 857 + <div id="315c6d71" class="cell"> 858 858 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 859 859 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 860 860 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span>

+12 -12

docs/reference/packable-samples.html

··· 588 588 <section id="the-packable-decorator" class="level2"> 589 589 <h2 class="anchored" data-anchor-id="the-packable-decorator">The <code>@packable</code> Decorator</h2> 590 590 <p>The recommended way to define a sample type is with the <code>@packable</code> decorator:</p> 591 - <div id="72c5b921" class="cell"> 591 + <div id="c7fe1d4d" class="cell"> 592 592 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 593 593 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 594 594 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 610 610 <h2 class="anchored" data-anchor-id="supported-field-types">Supported Field Types</h2> 611 611 <section id="primitives" class="level3"> 612 612 <h3 class="anchored" data-anchor-id="primitives">Primitives</h3> 613 - <div id="54f1d59c" class="cell"> 613 + <div id="726a3050" class="cell"> 614 614 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 615 615 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> PrimitiveSample:</span> 616 616 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> name: <span class="bu">str</span></span> ··· 623 623 <section id="numpy-arrays" class="level3"> 624 624 <h3 class="anchored" data-anchor-id="numpy-arrays">NumPy Arrays</h3> 625 625 <p>Fields annotated as <code>NDArray</code> are automatically converted:</p> 626 - <div id="c4830c19" class="cell"> 626 + <div 
id="5d1ff141" class="cell"> 627 627 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 628 628 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ArraySample:</span> 629 629 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> features: NDArray <span class="co"># Required array</span></span> ··· 645 645 </section> 646 646 <section id="lists" class="level3"> 647 647 <h3 class="anchored" data-anchor-id="lists">Lists</h3> 648 - <div id="6e0f1b5a" class="cell"> 648 + <div id="b6955e98" class="cell"> 649 649 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 650 650 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ListSample:</span> 651 651 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> tags: <span class="bu">list</span>[<span class="bu">str</span>]</span> ··· 657 657 <h2 class="anchored" data-anchor-id="serialization">Serialization</h2> 658 658 <section id="packing-to-bytes" class="level3"> 659 659 <h3 class="anchored" data-anchor-id="packing-to-bytes">Packing to Bytes</h3> 660 - <div id="79959f17" class="cell"> 660 + <div id="f3a59094" class="cell"> 661 661 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>sample <span class="op">=</span> ImageSample(</span> 662 662 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> image<span class="op">=</span>np.random.rand(<span class="dv">224</span>, <span 
class="dv">224</span>, <span class="dv">3</span>).astype(np.float32),</span> 663 663 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> label<span class="op">=</span><span class="st">"cat"</span>,</span> ··· 671 671 </section> 672 672 <section id="unpacking-from-bytes" class="level3"> 673 673 <h3 class="anchored" data-anchor-id="unpacking-from-bytes">Unpacking from Bytes</h3> 674 - <div id="8f5716b7" class="cell"> 674 + <div id="3cef8193" class="cell"> 675 675 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Deserialize from bytes</span></span> 676 676 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>restored <span class="op">=</span> ImageSample.from_bytes(packed_bytes)</span> 677 677 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span> ··· 683 683 <section id="webdataset-format" class="level3"> 684 684 <h3 class="anchored" data-anchor-id="webdataset-format">WebDataset Format</h3> 685 685 <p>The <code>as_wds</code> property returns a dict ready for WebDataset:</p> 686 - <div id="7204f700" class="cell"> 686 + <div id="6a42126d" class="cell"> 687 687 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>wds_dict <span class="op">=</span> sample.as_wds</span> 688 688 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="co"># {'__key__': '1234...', 'msgpack': b'...'}</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 689 689 </div> 690 690 <p>Write samples to a tar file:</p> 691 - <div id="4d6fce32" class="cell"> 691 + <div id="f3b5feb9" class="cell"> 692 692 <div class="sourceCode cell-code" id="cb8"><pre 
class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 693 693 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a></span> 694 694 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(<span class="st">"data-000000.tar"</span>) <span class="im">as</span> sink:</span> ··· 701 701 <section id="direct-inheritance-alternative" class="level2"> 702 702 <h2 class="anchored" data-anchor-id="direct-inheritance-alternative">Direct Inheritance (Alternative)</h2> 703 703 <p>You can also inherit directly from <code>PackableSample</code>:</p> 704 - <div id="292b25b4" class="cell"> 704 + <div id="d7828cb0" class="cell"> 705 705 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> dataclasses <span class="im">import</span> dataclass</span> 706 706 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 707 707 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="at">@dataclass</span></span> ··· 739 739 <section id="the-_ensure_good-method" class="level3"> 740 740 <h3 class="anchored" data-anchor-id="the-_ensure_good-method">The <code>_ensure_good()</code> Method</h3> 741 741 <p>This method runs automatically after construction and handles NDArray conversion:</p> 742 - <div id="cb200293" class="cell"> 742 + <div id="821f434d" class="cell"> 743 743 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> _ensure_good(<span 
class="va">self</span>):</span> 744 744 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> field <span class="kw">in</span> dataclasses.fields(<span class="va">self</span>):</span> 745 745 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> _is_possibly_ndarray_type(field.<span class="bu">type</span>):</span> ··· 755 755 <ul class="nav nav-tabs" role="tablist"><li class="nav-item" role="presentation"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" role="tab" aria-controls="tabset-2-1" aria-selected="true">Do</a></li><li class="nav-item" role="presentation"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" role="tab" aria-controls="tabset-2-2" aria-selected="false">Don’t</a></li></ul> 756 756 <div class="tab-content"> 757 757 <div id="tabset-2-1" class="tab-pane active" role="tabpanel" aria-labelledby="tabset-2-1-tab"> 758 - <div id="d6fc0558" class="cell"> 758 + <div id="55dd65fe" class="cell"> 759 759 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 760 760 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> GoodSample:</span> 761 761 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> features: NDArray <span class="co"># Clear type annotation</span></span> ··· 765 765 </div> 766 766 </div> 767 767 <div id="tabset-2-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-2-2-tab"> 768 - <div id="c2e9f475" class="cell"> 768 + <div id="726bdb4d" class="cell"> 769 769 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a 
href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 770 770 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> BadSample:</span> 771 771 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> <span class="co"># DON'T: Nested dataclasses not supported</span></span>

+7 -7

docs/reference/promotion.html

··· 584 584 </section> 585 585 <section id="basic-usage" class="level2"> 586 586 <h2 class="anchored" data-anchor-id="basic-usage">Basic Usage</h2> 587 - <div id="7d14b429" class="cell"> 587 + <div id="4a508603" class="cell"> 588 588 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 589 589 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient</span> 590 590 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.promote <span class="im">import</span> promote_to_atmosphere</span> ··· 604 604 </section> 605 605 <section id="with-metadata" class="level2"> 606 606 <h2 class="anchored" data-anchor-id="with-metadata">With Metadata</h2> 607 - <div id="65df81fd" class="cell"> 607 + <div id="5f1b68d6" class="cell"> 608 608 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>at_uri <span class="op">=</span> promote_to_atmosphere(</span> 609 609 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> entry,</span> 610 610 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> local_index,</span> ··· 619 619 <section id="schema-deduplication" class="level2"> 620 620 <h2 class="anchored" data-anchor-id="schema-deduplication">Schema Deduplication</h2> 621 621 <p>The promotion workflow automatically checks for existing schemas:</p> 622 - <div id="ee6fb5c5" class="cell"> 622 + <div id="5ebcd712" class="cell"> 623 623 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code 
class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># First promotion: publishes schema</span></span> 624 624 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>uri1 <span class="op">=</span> promote_to_atmosphere(entry1, local_index, client)</span> 625 625 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span> ··· 639 639 <div class="tab-content"> 640 640 <div id="tabset-1-1" class="tab-pane active" role="tabpanel" aria-labelledby="tabset-1-1-tab"> 641 641 <p>By default, promotion keeps the original data URLs:</p> 642 - <div id="c90baa61" class="cell"> 642 + <div id="c2e3746b" class="cell"> 643 643 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Data stays in original S3 location</span></span> 644 644 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>at_uri <span class="op">=</span> promote_to_atmosphere(entry, local_index, client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 645 645 </div> ··· 652 652 </div> 653 653 <div id="tabset-1-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-1-2-tab"> 654 654 <p>To copy data to a different storage location:</p> 655 - <div id="f038936e" class="cell"> 655 + <div id="560f6131" class="cell"> 656 656 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> S3DataStore</span> 657 657 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 658 658 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Create 
new data store</span></span> ··· 680 680 </section> 681 681 <section id="complete-workflow-example" class="level2"> 682 682 <h2 class="anchored" data-anchor-id="complete-workflow-example">Complete Workflow Example</h2> 683 - <div id="50096160" class="cell"> 683 + <div id="298a5d1e" class="cell"> 684 684 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 685 685 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 686 686 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 751 751 </section> 752 752 <section id="error-handling" class="level2"> 753 753 <h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2> 754 - <div id="9ab9d034" class="cell"> 754 + <div id="d6f7527e" class="cell"> 755 755 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="cf">try</span>:</span> 756 756 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> at_uri <span class="op">=</span> promote_to_atmosphere(entry, local_index, client)</span> 757 757 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="cf">except</span> <span class="pp">KeyError</span> <span class="im">as</span> e:</span>

+12 -12

docs/reference/protocols.html

··· 605 605 <section id="indexentry-protocol" class="level2"> 606 606 <h2 class="anchored" data-anchor-id="indexentry-protocol">IndexEntry Protocol</h2> 607 607 <p>Represents a dataset entry in any index:</p> 608 - <div id="9cab6a83" class="cell"> 608 + <div id="fd34618d" class="cell"> 609 609 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> IndexEntry</span> 610 610 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span> 611 611 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> process_entry(entry: IndexEntry) <span class="op">-></span> <span class="va">None</span>:</span> ··· 659 659 <section id="abstractindex-protocol" class="level2"> 660 660 <h2 class="anchored" data-anchor-id="abstractindex-protocol">AbstractIndex Protocol</h2> 661 661 <p>Defines operations for managing schemas and datasets:</p> 662 - <div id="3e25c48c" class="cell"> 662 + <div id="787a9642" class="cell"> 663 663 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> AbstractIndex</span> 664 664 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span> 665 665 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> list_all_datasets(index: AbstractIndex) <span class="op">-></span> <span class="va">None</span>:</span> ··· 669 669 </div> 670 670 <section id="dataset-operations" class="level3"> 671 671 <h3 class="anchored" data-anchor-id="dataset-operations">Dataset Operations</h3> 672 - <div id="791a6bd5" class="cell"> 672 + <div 
id="60dc2e88" class="cell"> 673 673 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Insert a dataset</span></span> 674 674 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 675 675 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> dataset,</span> ··· 687 687 </section> 688 688 <section id="schema-operations" class="level3"> 689 689 <h3 class="anchored" data-anchor-id="schema-operations">Schema Operations</h3> 690 - <div id="9054849a" class="cell"> 690 + <div id="d74fb8cf" class="cell"> 691 691 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish a schema</span></span> 692 692 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>schema_ref <span class="op">=</span> index.publish_schema(</span> 693 693 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> MySample,</span> ··· 718 718 <section id="abstractdatastore-protocol" class="level2"> 719 719 <h2 class="anchored" data-anchor-id="abstractdatastore-protocol">AbstractDataStore Protocol</h2> 720 720 <p>Abstracts over different storage backends:</p> 721 - <div id="fabf7644" class="cell"> 721 + <div id="681d1f3b" class="cell"> 722 722 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> AbstractDataStore</span> 723 723 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 724 724 <span id="cb5-3"><a href="#cb5-3" 
aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> write_dataset(store: AbstractDataStore, dataset) <span class="op">-></span> <span class="bu">list</span>[<span class="bu">str</span>]:</span> ··· 728 728 </div> 729 729 <section id="methods" class="level3"> 730 730 <h3 class="anchored" data-anchor-id="methods">Methods</h3> 731 - <div id="4a9f329b" class="cell"> 731 + <div id="5556c4b7" class="cell"> 732 732 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Write dataset shards</span></span> 733 733 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>urls <span class="op">=</span> store.write_shards(</span> 734 734 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> dataset,</span> ··· 755 755 <section id="datasource-protocol" class="level2"> 756 756 <h2 class="anchored" data-anchor-id="datasource-protocol">DataSource Protocol</h2> 757 757 <p>Abstracts over different data source backends for streaming dataset shards:</p> 758 - <div id="cbc7c002" class="cell"> 758 + <div id="fd9a5c6c" class="cell"> 759 759 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> DataSource</span> 760 760 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span> 761 761 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> load_from_source(source: DataSource) <span class="op">-></span> <span class="va">None</span>:</span> ··· 768 768 </div> 769 769 <section id="methods-1" class="level3"> 770 770 <h3 class="anchored" data-anchor-id="methods-1">Methods</h3> 771 - <div id="9316c6c5" class="cell"> 771 + 
<div id="9585dc37" class="cell"> 772 772 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Get list of shard identifiers</span></span> 773 773 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>shard_ids <span class="op">=</span> source.shard_list <span class="co"># ['data-000000.tar', 'data-000001.tar', ...]</span></span> 774 774 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a></span> ··· 791 791 <section id="creating-custom-data-sources" class="level3"> 792 792 <h3 class="anchored" data-anchor-id="creating-custom-data-sources">Creating Custom Data Sources</h3> 793 793 <p>Implement the <code>DataSource</code> protocol for custom backends:</p> 794 - <div id="d217fc30" class="cell"> 794 + <div id="d872ea12" class="cell"> 795 795 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> typing <span class="im">import</span> Iterator, IO</span> 796 796 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> DataSource</span> 797 797 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a></span> ··· 829 829 <section id="using-protocols-for-polymorphism" class="level2"> 830 830 <h2 class="anchored" data-anchor-id="using-protocols-for-polymorphism">Using Protocols for Polymorphism</h2> 831 831 <p>Write code that works with any backend:</p> 832 - <div id="ad7b8caa" class="cell"> 832 + <div id="a0cb9067" class="cell"> 833 833 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" 
tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> AbstractIndex, IndexEntry</span> 834 834 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata <span class="im">import</span> Dataset</span> 835 835 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a></span> ··· 900 900 <section id="type-checking" class="level2"> 901 901 <h2 class="anchored" data-anchor-id="type-checking">Type Checking</h2> 902 902 <p>Protocols are runtime-checkable:</p> 903 - <div id="a1a0abd3" class="cell"> 903 + <div id="915ac335" class="cell"> 904 904 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> IndexEntry, AbstractIndex</span> 905 905 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a></span> 906 906 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Check if object implements protocol</span></span> ··· 914 914 </section> 915 915 <section id="complete-example" class="level2"> 916 916 <h2 class="anchored" data-anchor-id="complete-example">Complete Example</h2> 917 - <div id="3dc654c8" class="cell"> 917 + <div id="3773658b" class="cell"> 918 918 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 919 919 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex, S3DataStore</span> 920 920 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere 
<span class="im">import</span> AtmosphereClient, AtmosphereIndex</span>

+2 -2

docs/reference/uri-spec.html

··· 675 675 <h2 class="anchored" data-anchor-id="examples">Examples</h2> 676 676 <section id="local-development" class="level3"> 677 677 <h3 class="anchored" data-anchor-id="local-development">Local Development</h3> 678 - <div id="b57d1bc3" class="cell"> 678 + <div id="f4a7e7c1" class="cell"> 679 679 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> Index</span> 680 680 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span> 681 681 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> Index()</span> ··· 694 694 </section> 695 695 <section id="atmosphere-atproto-federation" class="level3"> 696 696 <h3 class="anchored" data-anchor-id="atmosphere-atproto-federation">Atmosphere (ATProto Federation)</h3> 697 - <div id="0af8bcbf" class="cell"> 697 + <div id="85edc00e" class="cell"> 698 698 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> Client</span> 699 699 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 700 700 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> Client()</span>

+359 -58

docs/search.json

··· 1048 1048 "href": "api/DatasetDict.html", 1049 1049 "title": "DatasetDict", 1050 1050 "section": "", 1051 - "text": "DatasetDict(splits=None, sample_type=None, streaming=False)\nA dictionary of split names to Dataset instances.\nSimilar to HuggingFace’s DatasetDict, this provides a container for multiple dataset splits (train, test, validation, etc.) with convenience methods that operate across all splits.\nType Parameters: ST: The sample type for all datasets in this dict.\nExample: >>> ds_dict = load_dataset(“path/to/data”, MyData) >>> train = ds_dict[“train”] >>> test = ds_dict[“test”] >>> >>> # Iterate over all splits >>> for split_name, dataset in ds_dict.items(): … print(f”{split_name}: {len(dataset.shard_list)} shards”)\n\n\n\n\n\nName\nDescription\n\n\n\n\nnum_shards\nNumber of shards in each split.\n\n\nsample_type\nThe sample type for datasets in this dict.\n\n\nstreaming\nWhether this DatasetDict was loaded in streaming mode." 1051 + "text": "DatasetDict(splits=None, sample_type=None, streaming=False)\nA dictionary of split names to Dataset instances.\nSimilar to HuggingFace’s DatasetDict, this provides a container for multiple dataset splits (train, test, validation, etc.) with convenience methods that operate across all splits.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nST\n\nThe sample type for all datasets in this dict.\nrequired\n\n\n\n\n\n\n::\n>>> ds_dict = load_dataset(\"path/to/data\", MyData)\n>>> train = ds_dict[\"train\"]\n>>> test = ds_dict[\"test\"]\n>>>\n>>> # Iterate over all splits\n>>> for split_name, dataset in ds_dict.items():\n... print(f\"{split_name}: {len(dataset.shard_list)} shards\")\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nnum_shards\nNumber of shards in each split.\n\n\nsample_type\nThe sample type for datasets in this dict.\n\n\nstreaming\nWhether this DatasetDict was loaded in streaming mode." 
1052 + }, 1053 + { 1054 + "objectID": "api/DatasetDict.html#parameters", 1055 + "href": "api/DatasetDict.html#parameters", 1056 + "title": "DatasetDict", 1057 + "section": "", 1058 + "text": "Name\nType\nDescription\nDefault\n\n\n\n\nST\n\nThe sample type for all datasets in this dict.\nrequired" 1059 + }, 1060 + { 1061 + "objectID": "api/DatasetDict.html#example", 1062 + "href": "api/DatasetDict.html#example", 1063 + "title": "DatasetDict", 1064 + "section": "", 1065 + "text": "::\n>>> ds_dict = load_dataset(\"path/to/data\", MyData)\n>>> train = ds_dict[\"train\"]\n>>> test = ds_dict[\"test\"]\n>>>\n>>> # Iterate over all splits\n>>> for split_name, dataset in ds_dict.items():\n... print(f\"{split_name}: {len(dataset.shard_list)} shards\")" 1052 1066 }, 1053 1067 { 1054 1068 "objectID": "api/DatasetDict.html#attributes", ··· 1062 1076 "href": "api/AtmosphereClient.html", 1063 1077 "title": "AtmosphereClient", 1064 1078 "section": "", 1065 - "text": "atmosphere.AtmosphereClient(base_url=None, *, _client=None)\nATProto client wrapper for atdata operations.\nThis class wraps the atproto SDK client and provides higher-level methods for working with atdata records (schemas, datasets, lenses).\nExample: >>> client = AtmosphereClient() >>> client.login(“alice.bsky.social”, “app-password”) >>> print(client.did) ‘did:plc:…’\nNote: The password should be an app-specific password, not your main account password. 
Create app passwords in your Bluesky account settings.\n\n\n\n\n\nName\nDescription\n\n\n\n\ndid\nGet the DID of the authenticated user.\n\n\nhandle\nGet the handle of the authenticated user.\n\n\nis_authenticated\nCheck if the client has a valid session.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ncreate_record\nCreate a record in the user’s repository.\n\n\ndelete_record\nDelete a record.\n\n\nexport_session\nExport the current session for later reuse.\n\n\nget_blob\nDownload a blob from a PDS.\n\n\nget_blob_url\nGet the direct URL for fetching a blob.\n\n\nget_record\nFetch a record by AT URI.\n\n\nlist_datasets\nList dataset records.\n\n\nlist_lenses\nList lens records.\n\n\nlist_records\nList records in a collection.\n\n\nlist_schemas\nList schema records.\n\n\nlogin\nAuthenticate with the ATProto PDS.\n\n\nlogin_with_session\nAuthenticate using an exported session string.\n\n\nput_record\nCreate or update a record at a specific key.\n\n\nupload_blob\nUpload binary data as a blob to the PDS.\n\n\n\n\n\natmosphere.AtmosphereClient.create_record(\n collection,\n record,\n *,\n rkey=None,\n validate=False,\n)\nCreate a record in the user’s repository.\nArgs: collection: The NSID of the record collection (e.g., ‘ac.foundation.dataset.sampleSchema’). record: The record data. Must include a ‘$type’ field. rkey: Optional explicit record key. If not provided, a TID is generated. validate: Whether to validate against the Lexicon schema. Set to False for custom lexicons that the PDS doesn’t know about.\nReturns: The AT URI of the created record.\nRaises: ValueError: If not authenticated. atproto.exceptions.AtProtocolError: If record creation fails.\n\n\n\natmosphere.AtmosphereClient.delete_record(uri, *, swap_commit=None)\nDelete a record.\nArgs: uri: The AT URI of the record to delete. swap_commit: Optional CID for compare-and-swap delete.\nRaises: ValueError: If not authenticated. 
atproto.exceptions.AtProtocolError: If deletion fails.\n\n\n\natmosphere.AtmosphereClient.export_session()\nExport the current session for later reuse.\nReturns: Session string that can be passed to login_with_session().\nRaises: ValueError: If not authenticated.\n\n\n\natmosphere.AtmosphereClient.get_blob(did, cid)\nDownload a blob from a PDS.\nThis resolves the PDS endpoint from the DID document and fetches the blob directly from the PDS.\nArgs: did: The DID of the repository containing the blob. cid: The CID of the blob.\nReturns: The blob data as bytes.\nRaises: ValueError: If PDS endpoint cannot be resolved. requests.HTTPError: If blob fetch fails.\n\n\n\natmosphere.AtmosphereClient.get_blob_url(did, cid)\nGet the direct URL for fetching a blob.\nThis is useful for passing to WebDataset or other HTTP clients.\nArgs: did: The DID of the repository containing the blob. cid: The CID of the blob.\nReturns: The full URL for fetching the blob.\nRaises: ValueError: If PDS endpoint cannot be resolved.\n\n\n\natmosphere.AtmosphereClient.get_record(uri)\nFetch a record by AT URI.\nArgs: uri: The AT URI of the record.\nReturns: The record data as a dictionary.\nRaises: atproto.exceptions.AtProtocolError: If record not found.\n\n\n\natmosphere.AtmosphereClient.list_datasets(repo=None, limit=100)\nList dataset records.\nArgs: repo: The DID to query. Defaults to authenticated user. limit: Maximum number to return.\nReturns: List of dataset records.\n\n\n\natmosphere.AtmosphereClient.list_lenses(repo=None, limit=100)\nList lens records.\nArgs: repo: The DID to query. Defaults to authenticated user. limit: Maximum number to return.\nReturns: List of lens records.\n\n\n\natmosphere.AtmosphereClient.list_records(\n collection,\n *,\n repo=None,\n limit=100,\n cursor=None,\n)\nList records in a collection.\nArgs: collection: The NSID of the record collection. repo: The DID of the repository to query. Defaults to the authenticated user’s repository. 
limit: Maximum number of records to return (default 100). cursor: Pagination cursor from a previous call.\nReturns: A tuple of (records, next_cursor). The cursor is None if there are no more records.\nRaises: ValueError: If repo is None and not authenticated.\n\n\n\natmosphere.AtmosphereClient.list_schemas(repo=None, limit=100)\nList schema records.\nArgs: repo: The DID to query. Defaults to authenticated user. limit: Maximum number to return.\nReturns: List of schema records.\n\n\n\natmosphere.AtmosphereClient.login(handle, password)\nAuthenticate with the ATProto PDS.\nArgs: handle: Your Bluesky handle (e.g., ‘alice.bsky.social’). password: App-specific password (not your main password).\nRaises: atproto.exceptions.AtProtocolError: If authentication fails.\n\n\n\natmosphere.AtmosphereClient.login_with_session(session_string)\nAuthenticate using an exported session string.\nThis allows reusing a session without re-authenticating, which helps avoid rate limits on session creation.\nArgs: session_string: Session string from export_session().\n\n\n\natmosphere.AtmosphereClient.put_record(\n collection,\n rkey,\n record,\n *,\n validate=False,\n swap_commit=None,\n)\nCreate or update a record at a specific key.\nArgs: collection: The NSID of the record collection. rkey: The record key. record: The record data. Must include a ‘$type’ field. validate: Whether to validate against the Lexicon schema. swap_commit: Optional CID for compare-and-swap update.\nReturns: The AT URI of the record.\nRaises: ValueError: If not authenticated. atproto.exceptions.AtProtocolError: If operation fails.\n\n\n\natmosphere.AtmosphereClient.upload_blob(\n data,\n mime_type='application/octet-stream',\n)\nUpload binary data as a blob to the PDS.\nArgs: data: Binary data to upload. mime_type: MIME type of the data (for reference, not enforced by PDS).\nReturns: A blob reference dict with keys: ‘$type’, ‘ref’, ‘mimeType’, ‘size’. 
This can be embedded directly in record fields.\nRaises: ValueError: If not authenticated. atproto.exceptions.AtProtocolError: If upload fails." 1079 + "text": "atmosphere.AtmosphereClient(base_url=None, *, _client=None)\nATProto client wrapper for atdata operations.\nThis class wraps the atproto SDK client and provides higher-level methods for working with atdata records (schemas, datasets, lenses).\n\n\n::\n>>> client = AtmosphereClient()\n>>> client.login(\"alice.bsky.social\", \"app-password\")\n>>> print(client.did)\n'did:plc:...'\n\n\n\nThe password should be an app-specific password, not your main account password. Create app passwords in your Bluesky account settings.\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndid\nGet the DID of the authenticated user.\n\n\nhandle\nGet the handle of the authenticated user.\n\n\nis_authenticated\nCheck if the client has a valid session.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ncreate_record\nCreate a record in the user’s repository.\n\n\ndelete_record\nDelete a record.\n\n\nexport_session\nExport the current session for later reuse.\n\n\nget_blob\nDownload a blob from a PDS.\n\n\nget_blob_url\nGet the direct URL for fetching a blob.\n\n\nget_record\nFetch a record by AT URI.\n\n\nlist_datasets\nList dataset records.\n\n\nlist_lenses\nList lens records.\n\n\nlist_records\nList records in a collection.\n\n\nlist_schemas\nList schema records.\n\n\nlogin\nAuthenticate with the ATProto PDS.\n\n\nlogin_with_session\nAuthenticate using an exported session string.\n\n\nput_record\nCreate or update a record at a specific key.\n\n\nupload_blob\nUpload binary data as a blob to the PDS.\n\n\n\n\n\natmosphere.AtmosphereClient.create_record(\n collection,\n record,\n *,\n rkey=None,\n validate=False,\n)\nCreate a record in the user’s repository.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection (e.g., ‘ac.foundation.dataset.sampleSchema’).\nrequired\n\n\nrecord\ndict\nThe record 
data. Must include a ‘$type’ field.\nrequired\n\n\nrkey\nOptional[str]\nOptional explicit record key. If not provided, a TID is generated.\nNone\n\n\nvalidate\nbool\nWhether to validate against the Lexicon schema. Set to False for custom lexicons that the PDS doesn’t know about.\nFalse\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf record creation fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.delete_record(uri, *, swap_commit=None)\nDelete a record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the record to delete.\nrequired\n\n\nswap_commit\nOptional[str]\nOptional CID for compare-and-swap delete.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf deletion fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.export_session()\nExport the current session for later reuse.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSession string that can be passed to login_with_session().\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_blob(did, cid)\nDownload a blob from a PDS.\nThis resolves the PDS endpoint from the DID document and fetches the blob directly from the PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndid\nstr\nThe DID of the repository containing the blob.\nrequired\n\n\ncid\nstr\nThe CID of the blob.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbytes\nThe blob data as bytes.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf PDS endpoint cannot be resolved.\n\n\n\nrequests.HTTPError\nIf blob fetch fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_blob_url(did, cid)\nGet the direct URL for fetching a blob.\nThis is 
useful for passing to WebDataset or other HTTP clients.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndid\nstr\nThe DID of the repository containing the blob.\nrequired\n\n\ncid\nstr\nThe CID of the blob.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nThe full URL for fetching the blob.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf PDS endpoint cannot be resolved.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_record(uri)\nFetch a record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nThe record data as a dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\natproto.exceptions.AtProtocolError\nIf record not found.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_datasets(repo=None, limit=100)\nList dataset records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of dataset records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_lenses(repo=None, limit=100)\nList lens records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of lens records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_records(\n collection,\n *,\n repo=None,\n limit=100,\n cursor=None,\n)\nList records in a collection.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection.\nrequired\n\n\nrepo\nOptional[str]\nThe DID of the repository to query. 
Defaults to the authenticated user’s repository.\nNone\n\n\nlimit\nint\nMaximum number of records to return (default 100).\n100\n\n\ncursor\nOptional[str]\nPagination cursor from a previous call.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nA tuple of (records, next_cursor). The cursor is None if there\n\n\n\nOptional[str]\nare no more records.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf repo is None and not authenticated.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_schemas(repo=None, limit=100)\nList schema records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.login(handle, password)\nAuthenticate with the ATProto PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nhandle\nstr\nYour Bluesky handle (e.g., ‘alice.bsky.social’).\nrequired\n\n\npassword\nstr\nApp-specific password (not your main password).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\natproto.exceptions.AtProtocolError\nIf authentication fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.login_with_session(session_string)\nAuthenticate using an exported session string.\nThis allows reusing a session without re-authenticating, which helps avoid rate limits on session creation.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsession_string\nstr\nSession string from export_session().\nrequired\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.put_record(\n collection,\n rkey,\n record,\n *,\n validate=False,\n swap_commit=None,\n)\nCreate or update a record at a specific key.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection.\nrequired\n\n\nrkey\nstr\nThe record key.\nrequired\n\n\nrecord\ndict\nThe record data. 
Must include a ‘$type’ field.\nrequired\n\n\nvalidate\nbool\nWhether to validate against the Lexicon schema.\nFalse\n\n\nswap_commit\nOptional[str]\nOptional CID for compare-and-swap update.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf operation fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.upload_blob(\n data,\n mime_type='application/octet-stream',\n)\nUpload binary data as a blob to the PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\nbytes\nBinary data to upload.\nrequired\n\n\nmime_type\nstr\nMIME type of the data (for reference, not enforced by PDS).\n'application/octet-stream'\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nA blob reference dict with keys: ‘$type’, ‘ref’, ‘mimeType’, ‘size’.\n\n\n\ndict\nThis can be embedded directly in record fields.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf upload fails." 1080 + }, 1081 + { 1082 + "objectID": "api/AtmosphereClient.html#example", 1083 + "href": "api/AtmosphereClient.html#example", 1084 + "title": "AtmosphereClient", 1085 + "section": "", 1086 + "text": "::\n>>> client = AtmosphereClient()\n>>> client.login(\"alice.bsky.social\", \"app-password\")\n>>> print(client.did)\n'did:plc:...'" 1087 + }, 1088 + { 1089 + "objectID": "api/AtmosphereClient.html#note", 1090 + "href": "api/AtmosphereClient.html#note", 1091 + "title": "AtmosphereClient", 1092 + "section": "", 1093 + "text": "The password should be an app-specific password, not your main account password. Create app passwords in your Bluesky account settings." 
1066 1094 }, 1067 1095 { 1068 1096 "objectID": "api/AtmosphereClient.html#attributes", ··· 1076 1104 "href": "api/AtmosphereClient.html#methods", 1077 1105 "title": "AtmosphereClient", 1078 1106 "section": "", 1079 - "text": "Name\nDescription\n\n\n\n\ncreate_record\nCreate a record in the user’s repository.\n\n\ndelete_record\nDelete a record.\n\n\nexport_session\nExport the current session for later reuse.\n\n\nget_blob\nDownload a blob from a PDS.\n\n\nget_blob_url\nGet the direct URL for fetching a blob.\n\n\nget_record\nFetch a record by AT URI.\n\n\nlist_datasets\nList dataset records.\n\n\nlist_lenses\nList lens records.\n\n\nlist_records\nList records in a collection.\n\n\nlist_schemas\nList schema records.\n\n\nlogin\nAuthenticate with the ATProto PDS.\n\n\nlogin_with_session\nAuthenticate using an exported session string.\n\n\nput_record\nCreate or update a record at a specific key.\n\n\nupload_blob\nUpload binary data as a blob to the PDS.\n\n\n\n\n\natmosphere.AtmosphereClient.create_record(\n collection,\n record,\n *,\n rkey=None,\n validate=False,\n)\nCreate a record in the user’s repository.\nArgs: collection: The NSID of the record collection (e.g., ‘ac.foundation.dataset.sampleSchema’). record: The record data. Must include a ‘$type’ field. rkey: Optional explicit record key. If not provided, a TID is generated. validate: Whether to validate against the Lexicon schema. Set to False for custom lexicons that the PDS doesn’t know about.\nReturns: The AT URI of the created record.\nRaises: ValueError: If not authenticated. atproto.exceptions.AtProtocolError: If record creation fails.\n\n\n\natmosphere.AtmosphereClient.delete_record(uri, *, swap_commit=None)\nDelete a record.\nArgs: uri: The AT URI of the record to delete. swap_commit: Optional CID for compare-and-swap delete.\nRaises: ValueError: If not authenticated. 
atproto.exceptions.AtProtocolError: If deletion fails.\n\n\n\natmosphere.AtmosphereClient.export_session()\nExport the current session for later reuse.\nReturns: Session string that can be passed to login_with_session().\nRaises: ValueError: If not authenticated.\n\n\n\natmosphere.AtmosphereClient.get_blob(did, cid)\nDownload a blob from a PDS.\nThis resolves the PDS endpoint from the DID document and fetches the blob directly from the PDS.\nArgs: did: The DID of the repository containing the blob. cid: The CID of the blob.\nReturns: The blob data as bytes.\nRaises: ValueError: If PDS endpoint cannot be resolved. requests.HTTPError: If blob fetch fails.\n\n\n\natmosphere.AtmosphereClient.get_blob_url(did, cid)\nGet the direct URL for fetching a blob.\nThis is useful for passing to WebDataset or other HTTP clients.\nArgs: did: The DID of the repository containing the blob. cid: The CID of the blob.\nReturns: The full URL for fetching the blob.\nRaises: ValueError: If PDS endpoint cannot be resolved.\n\n\n\natmosphere.AtmosphereClient.get_record(uri)\nFetch a record by AT URI.\nArgs: uri: The AT URI of the record.\nReturns: The record data as a dictionary.\nRaises: atproto.exceptions.AtProtocolError: If record not found.\n\n\n\natmosphere.AtmosphereClient.list_datasets(repo=None, limit=100)\nList dataset records.\nArgs: repo: The DID to query. Defaults to authenticated user. limit: Maximum number to return.\nReturns: List of dataset records.\n\n\n\natmosphere.AtmosphereClient.list_lenses(repo=None, limit=100)\nList lens records.\nArgs: repo: The DID to query. Defaults to authenticated user. limit: Maximum number to return.\nReturns: List of lens records.\n\n\n\natmosphere.AtmosphereClient.list_records(\n collection,\n *,\n repo=None,\n limit=100,\n cursor=None,\n)\nList records in a collection.\nArgs: collection: The NSID of the record collection. repo: The DID of the repository to query. Defaults to the authenticated user’s repository. 
limit: Maximum number of records to return (default 100). cursor: Pagination cursor from a previous call.\nReturns: A tuple of (records, next_cursor). The cursor is None if there are no more records.\nRaises: ValueError: If repo is None and not authenticated.\n\n\n\natmosphere.AtmosphereClient.list_schemas(repo=None, limit=100)\nList schema records.\nArgs: repo: The DID to query. Defaults to authenticated user. limit: Maximum number to return.\nReturns: List of schema records.\n\n\n\natmosphere.AtmosphereClient.login(handle, password)\nAuthenticate with the ATProto PDS.\nArgs: handle: Your Bluesky handle (e.g., ‘alice.bsky.social’). password: App-specific password (not your main password).\nRaises: atproto.exceptions.AtProtocolError: If authentication fails.\n\n\n\natmosphere.AtmosphereClient.login_with_session(session_string)\nAuthenticate using an exported session string.\nThis allows reusing a session without re-authenticating, which helps avoid rate limits on session creation.\nArgs: session_string: Session string from export_session().\n\n\n\natmosphere.AtmosphereClient.put_record(\n collection,\n rkey,\n record,\n *,\n validate=False,\n swap_commit=None,\n)\nCreate or update a record at a specific key.\nArgs: collection: The NSID of the record collection. rkey: The record key. record: The record data. Must include a ‘$type’ field. validate: Whether to validate against the Lexicon schema. swap_commit: Optional CID for compare-and-swap update.\nReturns: The AT URI of the record.\nRaises: ValueError: If not authenticated. atproto.exceptions.AtProtocolError: If operation fails.\n\n\n\natmosphere.AtmosphereClient.upload_blob(\n data,\n mime_type='application/octet-stream',\n)\nUpload binary data as a blob to the PDS.\nArgs: data: Binary data to upload. mime_type: MIME type of the data (for reference, not enforced by PDS).\nReturns: A blob reference dict with keys: ‘$type’, ‘ref’, ‘mimeType’, ‘size’. 
This can be embedded directly in record fields.\nRaises: ValueError: If not authenticated. atproto.exceptions.AtProtocolError: If upload fails." 1107 + "text": "Name\nDescription\n\n\n\n\ncreate_record\nCreate a record in the user’s repository.\n\n\ndelete_record\nDelete a record.\n\n\nexport_session\nExport the current session for later reuse.\n\n\nget_blob\nDownload a blob from a PDS.\n\n\nget_blob_url\nGet the direct URL for fetching a blob.\n\n\nget_record\nFetch a record by AT URI.\n\n\nlist_datasets\nList dataset records.\n\n\nlist_lenses\nList lens records.\n\n\nlist_records\nList records in a collection.\n\n\nlist_schemas\nList schema records.\n\n\nlogin\nAuthenticate with the ATProto PDS.\n\n\nlogin_with_session\nAuthenticate using an exported session string.\n\n\nput_record\nCreate or update a record at a specific key.\n\n\nupload_blob\nUpload binary data as a blob to the PDS.\n\n\n\n\n\natmosphere.AtmosphereClient.create_record(\n collection,\n record,\n *,\n rkey=None,\n validate=False,\n)\nCreate a record in the user’s repository.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection (e.g., ‘ac.foundation.dataset.sampleSchema’).\nrequired\n\n\nrecord\ndict\nThe record data. Must include a ‘$type’ field.\nrequired\n\n\nrkey\nOptional[str]\nOptional explicit record key. If not provided, a TID is generated.\nNone\n\n\nvalidate\nbool\nWhether to validate against the Lexicon schema. 
Set to False for custom lexicons that the PDS doesn’t know about.\nFalse\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf record creation fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.delete_record(uri, *, swap_commit=None)\nDelete a record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the record to delete.\nrequired\n\n\nswap_commit\nOptional[str]\nOptional CID for compare-and-swap delete.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf deletion fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.export_session()\nExport the current session for later reuse.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSession string that can be passed to login_with_session().\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_blob(did, cid)\nDownload a blob from a PDS.\nThis resolves the PDS endpoint from the DID document and fetches the blob directly from the PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndid\nstr\nThe DID of the repository containing the blob.\nrequired\n\n\ncid\nstr\nThe CID of the blob.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbytes\nThe blob data as bytes.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf PDS endpoint cannot be resolved.\n\n\n\nrequests.HTTPError\nIf blob fetch fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_blob_url(did, cid)\nGet the direct URL for fetching a blob.\nThis is useful for passing to WebDataset or other HTTP clients.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndid\nstr\nThe DID of the repository containing the blob.\nrequired\n\n\ncid\nstr\nThe CID of the 
blob.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nThe full URL for fetching the blob.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf PDS endpoint cannot be resolved.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_record(uri)\nFetch a record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nThe record data as a dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\natproto.exceptions.AtProtocolError\nIf record not found.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_datasets(repo=None, limit=100)\nList dataset records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of dataset records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_lenses(repo=None, limit=100)\nList lens records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of lens records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_records(\n collection,\n *,\n repo=None,\n limit=100,\n cursor=None,\n)\nList records in a collection.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection.\nrequired\n\n\nrepo\nOptional[str]\nThe DID of the repository to query. Defaults to the authenticated user’s repository.\nNone\n\n\nlimit\nint\nMaximum number of records to return (default 100).\n100\n\n\ncursor\nOptional[str]\nPagination cursor from a previous call.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nA tuple of (records, next_cursor). 
The cursor is None if there\n\n\n\nOptional[str]\nare no more records.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf repo is None and not authenticated.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_schemas(repo=None, limit=100)\nList schema records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.login(handle, password)\nAuthenticate with the ATProto PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nhandle\nstr\nYour Bluesky handle (e.g., ‘alice.bsky.social’).\nrequired\n\n\npassword\nstr\nApp-specific password (not your main password).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\natproto.exceptions.AtProtocolError\nIf authentication fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.login_with_session(session_string)\nAuthenticate using an exported session string.\nThis allows reusing a session without re-authenticating, which helps avoid rate limits on session creation.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsession_string\nstr\nSession string from export_session().\nrequired\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.put_record(\n collection,\n rkey,\n record,\n *,\n validate=False,\n swap_commit=None,\n)\nCreate or update a record at a specific key.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection.\nrequired\n\n\nrkey\nstr\nThe record key.\nrequired\n\n\nrecord\ndict\nThe record data. 
Must include a ‘$type’ field.\nrequired\n\n\nvalidate\nbool\nWhether to validate against the Lexicon schema.\nFalse\n\n\nswap_commit\nOptional[str]\nOptional CID for compare-and-swap update.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf operation fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.upload_blob(\n data,\n mime_type='application/octet-stream',\n)\nUpload binary data as a blob to the PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\nbytes\nBinary data to upload.\nrequired\n\n\nmime_type\nstr\nMIME type of the data (for reference, not enforced by PDS).\n'application/octet-stream'\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nA blob reference dict with keys: ‘$type’, ‘ref’, ‘mimeType’, ‘size’.\n\n\n\ndict\nThis can be embedded directly in record fields.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf upload fails." 1080 1108 }, 1081 1109 { 1082 1110 "objectID": "api/DictSample.html", 1083 1111 "href": "api/DictSample.html", 1084 1112 "title": "DictSample", 1085 1113 "section": "", 1086 - "text": "DictSample(_data=None, **kwargs)\nDynamic sample type providing dict-like access to raw msgpack data.\nThis class is the default sample type for datasets when no explicit type is specified. It stores the raw unpacked msgpack data and provides both attribute-style (sample.field) and dict-style (sample[\"field\"]) access to fields.\nDictSample is useful for: - Exploring datasets without defining a schema first - Working with datasets that have variable schemas - Prototyping before committing to a typed schema\nTo convert to a typed schema, use Dataset.as_type() with a @packable-decorated class. 
Every @packable class automatically registers a lens from DictSample, making this conversion seamless.\nExample: >>> ds = load_dataset(“path/to/data.tar”) # Returns DatasetDictSample >>> for sample in ds.ordered(): … print(sample.some_field) # Attribute access … print(sample[“other_field”]) # Dict access … print(sample.keys()) # Inspect available fields … >>> # Convert to typed schema >>> typed_ds = ds.as_type(MyTypedSample)\nNote: NDArray fields are stored as raw bytes in DictSample. They are only converted to numpy arrays when accessed through a typed sample class.\n\n\n\n\n\nName\nDescription\n\n\n\n\nas_wds\nPack this sample’s data for writing to WebDataset.\n\n\npacked\nPack this sample’s data into msgpack bytes.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_bytes\nCreate a DictSample from raw msgpack bytes.\n\n\nfrom_data\nCreate a DictSample from unpacked msgpack data.\n\n\nget\nGet a field value with optional default.\n\n\nitems\nReturn list of (field_name, value) tuples.\n\n\nkeys\nReturn list of field names.\n\n\nto_dict\nReturn a copy of the underlying data dictionary.\n\n\nvalues\nReturn list of field values.\n\n\n\n\n\nDictSample.from_bytes(bs)\nCreate a DictSample from raw msgpack bytes.\nArgs: bs: Raw bytes from a msgpack-serialized sample.\nReturns: New DictSample instance with the unpacked data.\n\n\n\nDictSample.from_data(data)\nCreate a DictSample from unpacked msgpack data.\nArgs: data: Dictionary with field names as keys.\nReturns: New DictSample instance wrapping the data.\n\n\n\nDictSample.get(key, default=None)\nGet a field value with optional default.\nArgs: key: Field name to access. default: Value to return if field doesn’t exist.\nReturns: The field value or default.\n\n\n\nDictSample.items()\nReturn list of (field_name, value) tuples.\n\n\n\nDictSample.keys()\nReturn list of field names.\n\n\n\nDictSample.to_dict()\nReturn a copy of the underlying data dictionary.\n\n\n\nDictSample.values()\nReturn list of field values." 
1114 + "text": "DictSample(_data=None, **kwargs)\nDynamic sample type providing dict-like access to raw msgpack data.\nThis class is the default sample type for datasets when no explicit type is specified. It stores the raw unpacked msgpack data and provides both attribute-style (sample.field) and dict-style (sample[\"field\"]) access to fields.\nDictSample is useful for: - Exploring datasets without defining a schema first - Working with datasets that have variable schemas - Prototyping before committing to a typed schema\nTo convert to a typed schema, use Dataset.as_type() with a @packable-decorated class. Every @packable class automatically registers a lens from DictSample, making this conversion seamless.\n\n\n::\n>>> ds = load_dataset(\"path/to/data.tar\") # Returns Dataset[DictSample]\n>>> for sample in ds.ordered():\n... print(sample.some_field) # Attribute access\n... print(sample[\"other_field\"]) # Dict access\n... print(sample.keys()) # Inspect available fields\n...\n>>> # Convert to typed schema\n>>> typed_ds = ds.as_type(MyTypedSample)\n\n\n\nNDArray fields are stored as raw bytes in DictSample. 
They are only converted to numpy arrays when accessed through a typed sample class.\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nas_wds\nPack this sample’s data for writing to WebDataset.\n\n\npacked\nPack this sample’s data into msgpack bytes.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_bytes\nCreate a DictSample from raw msgpack bytes.\n\n\nfrom_data\nCreate a DictSample from unpacked msgpack data.\n\n\nget\nGet a field value with optional default.\n\n\nitems\nReturn list of (field_name, value) tuples.\n\n\nkeys\nReturn list of field names.\n\n\nto_dict\nReturn a copy of the underlying data dictionary.\n\n\nvalues\nReturn list of field values.\n\n\n\n\n\nDictSample.from_bytes(bs)\nCreate a DictSample from raw msgpack bytes.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbs\nbytes\nRaw bytes from a msgpack-serialized sample.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nDictSample\nNew DictSample instance with the unpacked data.\n\n\n\n\n\n\n\nDictSample.from_data(data)\nCreate a DictSample from unpacked msgpack data.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\ndict[str, Any]\nDictionary with field names as keys.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nDictSample\nNew DictSample instance wrapping the data.\n\n\n\n\n\n\n\nDictSample.get(key, default=None)\nGet a field value with optional default.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nkey\nstr\nField name to access.\nrequired\n\n\ndefault\nAny\nValue to return if field doesn’t exist.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAny\nThe field value or default.\n\n\n\n\n\n\n\nDictSample.items()\nReturn list of (field_name, value) tuples.\n\n\n\nDictSample.keys()\nReturn list of field names.\n\n\n\nDictSample.to_dict()\nReturn a copy of the underlying data dictionary.\n\n\n\nDictSample.values()\nReturn list of field values." 
1115 + }, 1116 + { 1117 + "objectID": "api/DictSample.html#example", 1118 + "href": "api/DictSample.html#example", 1119 + "title": "DictSample", 1120 + "section": "", 1121 + "text": "::\n>>> ds = load_dataset(\"path/to/data.tar\") # Returns Dataset[DictSample]\n>>> for sample in ds.ordered():\n... print(sample.some_field) # Attribute access\n... print(sample[\"other_field\"]) # Dict access\n... print(sample.keys()) # Inspect available fields\n...\n>>> # Convert to typed schema\n>>> typed_ds = ds.as_type(MyTypedSample)" 1122 + }, 1123 + { 1124 + "objectID": "api/DictSample.html#note", 1125 + "href": "api/DictSample.html#note", 1126 + "title": "DictSample", 1127 + "section": "", 1128 + "text": "NDArray fields are stored as raw bytes in DictSample. They are only converted to numpy arrays when accessed through a typed sample class." 1087 1129 }, 1088 1130 { 1089 1131 "objectID": "api/DictSample.html#attributes", ··· 1097 1139 "href": "api/DictSample.html#methods", 1098 1140 "title": "DictSample", 1099 1141 "section": "", 1100 - "text": "Name\nDescription\n\n\n\n\nfrom_bytes\nCreate a DictSample from raw msgpack bytes.\n\n\nfrom_data\nCreate a DictSample from unpacked msgpack data.\n\n\nget\nGet a field value with optional default.\n\n\nitems\nReturn list of (field_name, value) tuples.\n\n\nkeys\nReturn list of field names.\n\n\nto_dict\nReturn a copy of the underlying data dictionary.\n\n\nvalues\nReturn list of field values.\n\n\n\n\n\nDictSample.from_bytes(bs)\nCreate a DictSample from raw msgpack bytes.\nArgs: bs: Raw bytes from a msgpack-serialized sample.\nReturns: New DictSample instance with the unpacked data.\n\n\n\nDictSample.from_data(data)\nCreate a DictSample from unpacked msgpack data.\nArgs: data: Dictionary with field names as keys.\nReturns: New DictSample instance wrapping the data.\n\n\n\nDictSample.get(key, default=None)\nGet a field value with optional default.\nArgs: key: Field name to access. 
default: Value to return if field doesn’t exist.\nReturns: The field value or default.\n\n\n\nDictSample.items()\nReturn list of (field_name, value) tuples.\n\n\n\nDictSample.keys()\nReturn list of field names.\n\n\n\nDictSample.to_dict()\nReturn a copy of the underlying data dictionary.\n\n\n\nDictSample.values()\nReturn list of field values." 1142 + "text": "Name\nDescription\n\n\n\n\nfrom_bytes\nCreate a DictSample from raw msgpack bytes.\n\n\nfrom_data\nCreate a DictSample from unpacked msgpack data.\n\n\nget\nGet a field value with optional default.\n\n\nitems\nReturn list of (field_name, value) tuples.\n\n\nkeys\nReturn list of field names.\n\n\nto_dict\nReturn a copy of the underlying data dictionary.\n\n\nvalues\nReturn list of field values.\n\n\n\n\n\nDictSample.from_bytes(bs)\nCreate a DictSample from raw msgpack bytes.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbs\nbytes\nRaw bytes from a msgpack-serialized sample.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nDictSample\nNew DictSample instance with the unpacked data.\n\n\n\n\n\n\n\nDictSample.from_data(data)\nCreate a DictSample from unpacked msgpack data.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\ndict[str, Any]\nDictionary with field names as keys.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nDictSample\nNew DictSample instance wrapping the data.\n\n\n\n\n\n\n\nDictSample.get(key, default=None)\nGet a field value with optional default.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nkey\nstr\nField name to access.\nrequired\n\n\ndefault\nAny\nValue to return if field doesn’t exist.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAny\nThe field value or default.\n\n\n\n\n\n\n\nDictSample.items()\nReturn list of (field_name, value) tuples.\n\n\n\nDictSample.keys()\nReturn list of field names.\n\n\n\nDictSample.to_dict()\nReturn a copy of the underlying data dictionary.\n\n\n\nDictSample.values()\nReturn list of field 
values." 1101 1143 }, 1102 1144 { 1103 1145 "objectID": "api/LensLoader.html", 1104 1146 "href": "api/LensLoader.html", 1105 1147 "title": "LensLoader", 1106 1148 "section": "", 1107 - "text": "atmosphere.LensLoader(client)\nLoads lens records from ATProto.\nThis class fetches lens transformation records. Note that actually using a lens requires installing the referenced code and importing it manually.\nExample: >>> client = AtmosphereClient() >>> loader = LensLoader(client) >>> >>> record = loader.get(“at://did:plc:abc/ac.foundation.dataset.lens/xyz”) >>> print(record[“name”]) >>> print(record[“sourceSchema”]) >>> print(record.get(“getterCode”, {}).get(“repository”))\n\n\n\n\n\nName\nDescription\n\n\n\n\nfind_by_schemas\nFind lenses that transform between specific schemas.\n\n\nget\nFetch a lens record by AT URI.\n\n\nlist_all\nList lens records from a repository.\n\n\n\n\n\natmosphere.LensLoader.find_by_schemas(\n source_schema_uri,\n target_schema_uri=None,\n repo=None,\n)\nFind lenses that transform between specific schemas.\nArgs: source_schema_uri: AT URI of the source schema. target_schema_uri: Optional AT URI of the target schema. If not provided, returns all lenses from the source. repo: The DID of the repository to search.\nReturns: List of matching lens records.\n\n\n\natmosphere.LensLoader.get(uri)\nFetch a lens record by AT URI.\nArgs: uri: The AT URI of the lens record.\nReturns: The lens record as a dictionary.\nRaises: ValueError: If the record is not a lens record.\n\n\n\natmosphere.LensLoader.list_all(repo=None, limit=100)\nList lens records from a repository.\nArgs: repo: The DID of the repository. Defaults to authenticated user. limit: Maximum number of records to return.\nReturns: List of lens records." 1149 + "text": "atmosphere.LensLoader(client)\nLoads lens records from ATProto.\nThis class fetches lens transformation records. 
Note that actually using a lens requires installing the referenced code and importing it manually.\n\n\n::\n>>> client = AtmosphereClient()\n>>> loader = LensLoader(client)\n>>>\n>>> record = loader.get(\"at://did:plc:abc/ac.foundation.dataset.lens/xyz\")\n>>> print(record[\"name\"])\n>>> print(record[\"sourceSchema\"])\n>>> print(record.get(\"getterCode\", {}).get(\"repository\"))\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfind_by_schemas\nFind lenses that transform between specific schemas.\n\n\nget\nFetch a lens record by AT URI.\n\n\nlist_all\nList lens records from a repository.\n\n\n\n\n\natmosphere.LensLoader.find_by_schemas(\n source_schema_uri,\n target_schema_uri=None,\n repo=None,\n)\nFind lenses that transform between specific schemas.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsource_schema_uri\nstr\nAT URI of the source schema.\nrequired\n\n\ntarget_schema_uri\nOptional[str]\nOptional AT URI of the target schema. If not provided, returns all lenses from the source.\nNone\n\n\nrepo\nOptional[str]\nThe DID of the repository to search.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of matching lens records.\n\n\n\n\n\n\n\natmosphere.LensLoader.get(uri)\nFetch a lens record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the lens record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nThe lens record as a dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the record is not a lens record.\n\n\n\n\n\n\n\natmosphere.LensLoader.list_all(repo=None, limit=100)\nList lens records from a repository.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID of the repository. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number of records to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of lens records." 
1150 + }, 1151 + { 1152 + "objectID": "api/LensLoader.html#example", 1153 + "href": "api/LensLoader.html#example", 1154 + "title": "LensLoader", 1155 + "section": "", 1156 + "text": "::\n>>> client = AtmosphereClient()\n>>> loader = LensLoader(client)\n>>>\n>>> record = loader.get(\"at://did:plc:abc/ac.foundation.dataset.lens/xyz\")\n>>> print(record[\"name\"])\n>>> print(record[\"sourceSchema\"])\n>>> print(record.get(\"getterCode\", {}).get(\"repository\"))" 1108 1157 }, 1109 1158 { 1110 1159 "objectID": "api/LensLoader.html#methods", 1111 1160 "href": "api/LensLoader.html#methods", 1112 1161 "title": "LensLoader", 1113 1162 "section": "", 1114 - "text": "Name\nDescription\n\n\n\n\nfind_by_schemas\nFind lenses that transform between specific schemas.\n\n\nget\nFetch a lens record by AT URI.\n\n\nlist_all\nList lens records from a repository.\n\n\n\n\n\natmosphere.LensLoader.find_by_schemas(\n source_schema_uri,\n target_schema_uri=None,\n repo=None,\n)\nFind lenses that transform between specific schemas.\nArgs: source_schema_uri: AT URI of the source schema. target_schema_uri: Optional AT URI of the target schema. If not provided, returns all lenses from the source. repo: The DID of the repository to search.\nReturns: List of matching lens records.\n\n\n\natmosphere.LensLoader.get(uri)\nFetch a lens record by AT URI.\nArgs: uri: The AT URI of the lens record.\nReturns: The lens record as a dictionary.\nRaises: ValueError: If the record is not a lens record.\n\n\n\natmosphere.LensLoader.list_all(repo=None, limit=100)\nList lens records from a repository.\nArgs: repo: The DID of the repository. Defaults to authenticated user. limit: Maximum number of records to return.\nReturns: List of lens records." 
1163 + "text": "Name\nDescription\n\n\n\n\nfind_by_schemas\nFind lenses that transform between specific schemas.\n\n\nget\nFetch a lens record by AT URI.\n\n\nlist_all\nList lens records from a repository.\n\n\n\n\n\natmosphere.LensLoader.find_by_schemas(\n source_schema_uri,\n target_schema_uri=None,\n repo=None,\n)\nFind lenses that transform between specific schemas.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsource_schema_uri\nstr\nAT URI of the source schema.\nrequired\n\n\ntarget_schema_uri\nOptional[str]\nOptional AT URI of the target schema. If not provided, returns all lenses from the source.\nNone\n\n\nrepo\nOptional[str]\nThe DID of the repository to search.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of matching lens records.\n\n\n\n\n\n\n\natmosphere.LensLoader.get(uri)\nFetch a lens record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the lens record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nThe lens record as a dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the record is not a lens record.\n\n\n\n\n\n\n\natmosphere.LensLoader.list_all(repo=None, limit=100)\nList lens records from a repository.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID of the repository. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number of records to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of lens records." 
1115 1164 }, 1116 1165 { 1117 1166 "objectID": "api/AtmosphereIndex.html", 1118 1167 "href": "api/AtmosphereIndex.html", 1119 1168 "title": "AtmosphereIndex", 1120 1169 "section": "", 1121 - "text": "atmosphere.AtmosphereIndex(client)\nATProto index implementing AbstractIndex protocol.\nWraps SchemaPublisher/Loader and DatasetPublisher/Loader to provide a unified interface compatible with LocalIndex.\nExample: >>> client = AtmosphereClient() >>> client.login(“handle.bsky.social”, “app-password”) >>> >>> index = AtmosphereIndex(client) >>> schema_ref = index.publish_schema(MySample, version=“1.0.0”) >>> entry = index.insert_dataset(dataset, name=“my-data”)\n\n\n\n\n\nName\nDescription\n\n\n\n\ndatasets\nLazily iterate over all dataset entries (AbstractIndex protocol).\n\n\nschemas\nLazily iterate over all schema records (AbstractIndex protocol).\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python type from a schema record.\n\n\nget_dataset\nGet a dataset by AT URI.\n\n\nget_schema\nGet a schema record by AT URI.\n\n\ninsert_dataset\nInsert a dataset into ATProto.\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\npublish_schema\nPublish a schema to ATProto.\n\n\n\n\n\natmosphere.AtmosphereIndex.decode_schema(ref)\nReconstruct a Python type from a schema record.\nArgs: ref: AT URI of the schema record.\nReturns: Dynamically generated Packable type.\nRaises: ValueError: If schema cannot be decoded.\n\n\n\natmosphere.AtmosphereIndex.get_dataset(ref)\nGet a dataset by AT URI.\nArgs: ref: AT URI of the dataset record.\nReturns: AtmosphereIndexEntry for the dataset.\nRaises: ValueError: If record is not a dataset.\n\n\n\natmosphere.AtmosphereIndex.get_schema(ref)\nGet a schema record by AT URI.\nArgs: ref: AT URI of the schema record.\nReturns: Schema record dictionary.\nRaises: ValueError: If record is not 
a schema.\n\n\n\natmosphere.AtmosphereIndex.insert_dataset(\n ds,\n *,\n name,\n schema_ref=None,\n **kwargs,\n)\nInsert a dataset into ATProto.\nArgs: ds: The Dataset to publish. name: Human-readable name. schema_ref: Optional schema AT URI. If None, auto-publishes schema. **kwargs: Additional options (description, tags, license).\nReturns: AtmosphereIndexEntry for the inserted dataset.\n\n\n\natmosphere.AtmosphereIndex.list_datasets(repo=None)\nGet all dataset entries as a materialized list (AbstractIndex protocol).\nArgs: repo: DID of repository. Defaults to authenticated user.\nReturns: List of AtmosphereIndexEntry for each dataset.\n\n\n\natmosphere.AtmosphereIndex.list_schemas(repo=None)\nGet all schema records as a materialized list (AbstractIndex protocol).\nArgs: repo: DID of repository. Defaults to authenticated user.\nReturns: List of schema records as dictionaries.\n\n\n\natmosphere.AtmosphereIndex.publish_schema(\n sample_type,\n *,\n version='1.0.0',\n **kwargs,\n)\nPublish a schema to ATProto.\nArgs: sample_type: A Packable type (PackableSample subclass or @packable-decorated). version: Semantic version string. **kwargs: Additional options (description, metadata).\nReturns: AT URI of the schema record." 
1170 + "text": "atmosphere.AtmosphereIndex(client)\nATProto index implementing AbstractIndex protocol.\nWraps SchemaPublisher/Loader and DatasetPublisher/Loader to provide a unified interface compatible with LocalIndex.\n\n\n::\n>>> client = AtmosphereClient()\n>>> client.login(\"handle.bsky.social\", \"app-password\")\n>>>\n>>> index = AtmosphereIndex(client)\n>>> schema_ref = index.publish_schema(MySample, version=\"1.0.0\")\n>>> entry = index.insert_dataset(dataset, name=\"my-data\")\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndatasets\nLazily iterate over all dataset entries (AbstractIndex protocol).\n\n\nschemas\nLazily iterate over all schema records (AbstractIndex protocol).\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python type from a schema record.\n\n\nget_dataset\nGet a dataset by AT URI.\n\n\nget_schema\nGet a schema record by AT URI.\n\n\ninsert_dataset\nInsert a dataset into ATProto.\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\npublish_schema\nPublish a schema to ATProto.\n\n\n\n\n\natmosphere.AtmosphereIndex.decode_schema(ref)\nReconstruct a Python type from a schema record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nAT URI of the schema record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nDynamically generated Packable type.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.get_dataset(ref)\nGet a dataset by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nAT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtmosphereIndexEntry\nAtmosphereIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf record is not a 
dataset.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.get_schema(ref)\nGet a schema record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nAT URI of the schema record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf record is not a schema.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.insert_dataset(\n ds,\n *,\n name,\n schema_ref=None,\n **kwargs,\n)\nInsert a dataset into ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to publish.\nrequired\n\n\nname\nstr\nHuman-readable name.\nrequired\n\n\nschema_ref\nOptional[str]\nOptional schema AT URI. If None, auto-publishes schema.\nNone\n\n\n**kwargs\n\nAdditional options (description, tags, license).\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtmosphereIndexEntry\nAtmosphereIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.list_datasets(repo=None)\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nDID of repository. Defaults to authenticated user.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[AtmosphereIndexEntry]\nList of AtmosphereIndexEntry for each dataset.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.list_schemas(repo=None)\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nDID of repository. 
Defaults to authenticated user.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.publish_schema(\n sample_type,\n *,\n version='1.0.0',\n **kwargs,\n)\nPublish a schema to ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[Packable]\nA Packable type (PackableSample subclass or @packable-decorated).\nrequired\n\n\nversion\nstr\nSemantic version string.\n'1.0.0'\n\n\n**kwargs\n\nAdditional options (description, metadata).\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nAT URI of the schema record." 1171 + }, 1172 + { 1173 + "objectID": "api/AtmosphereIndex.html#example", 1174 + "href": "api/AtmosphereIndex.html#example", 1175 + "title": "AtmosphereIndex", 1176 + "section": "", 1177 + "text": "::\n>>> client = AtmosphereClient()\n>>> client.login(\"handle.bsky.social\", \"app-password\")\n>>>\n>>> index = AtmosphereIndex(client)\n>>> schema_ref = index.publish_schema(MySample, version=\"1.0.0\")\n>>> entry = index.insert_dataset(dataset, name=\"my-data\")" 1122 1178 }, 1123 1179 { 1124 1180 "objectID": "api/AtmosphereIndex.html#attributes", ··· 1132 1188 "href": "api/AtmosphereIndex.html#methods", 1133 1189 "title": "AtmosphereIndex", 1134 1190 "section": "", 1135 - "text": "Name\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python type from a schema record.\n\n\nget_dataset\nGet a dataset by AT URI.\n\n\nget_schema\nGet a schema record by AT URI.\n\n\ninsert_dataset\nInsert a dataset into ATProto.\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\npublish_schema\nPublish a schema to ATProto.\n\n\n\n\n\natmosphere.AtmosphereIndex.decode_schema(ref)\nReconstruct a Python type from a schema record.\nArgs: ref: AT URI of the schema record.\nReturns: Dynamically generated Packable 
type.\nRaises: ValueError: If schema cannot be decoded.\n\n\n\natmosphere.AtmosphereIndex.get_dataset(ref)\nGet a dataset by AT URI.\nArgs: ref: AT URI of the dataset record.\nReturns: AtmosphereIndexEntry for the dataset.\nRaises: ValueError: If record is not a dataset.\n\n\n\natmosphere.AtmosphereIndex.get_schema(ref)\nGet a schema record by AT URI.\nArgs: ref: AT URI of the schema record.\nReturns: Schema record dictionary.\nRaises: ValueError: If record is not a schema.\n\n\n\natmosphere.AtmosphereIndex.insert_dataset(\n ds,\n *,\n name,\n schema_ref=None,\n **kwargs,\n)\nInsert a dataset into ATProto.\nArgs: ds: The Dataset to publish. name: Human-readable name. schema_ref: Optional schema AT URI. If None, auto-publishes schema. **kwargs: Additional options (description, tags, license).\nReturns: AtmosphereIndexEntry for the inserted dataset.\n\n\n\natmosphere.AtmosphereIndex.list_datasets(repo=None)\nGet all dataset entries as a materialized list (AbstractIndex protocol).\nArgs: repo: DID of repository. Defaults to authenticated user.\nReturns: List of AtmosphereIndexEntry for each dataset.\n\n\n\natmosphere.AtmosphereIndex.list_schemas(repo=None)\nGet all schema records as a materialized list (AbstractIndex protocol).\nArgs: repo: DID of repository. Defaults to authenticated user.\nReturns: List of schema records as dictionaries.\n\n\n\natmosphere.AtmosphereIndex.publish_schema(\n sample_type,\n *,\n version='1.0.0',\n **kwargs,\n)\nPublish a schema to ATProto.\nArgs: sample_type: A Packable type (PackableSample subclass or @packable-decorated). version: Semantic version string. **kwargs: Additional options (description, metadata).\nReturns: AT URI of the schema record." 
1191 + "text": "Name\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python type from a schema record.\n\n\nget_dataset\nGet a dataset by AT URI.\n\n\nget_schema\nGet a schema record by AT URI.\n\n\ninsert_dataset\nInsert a dataset into ATProto.\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\npublish_schema\nPublish a schema to ATProto.\n\n\n\n\n\natmosphere.AtmosphereIndex.decode_schema(ref)\nReconstruct a Python type from a schema record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nAT URI of the schema record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nDynamically generated Packable type.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.get_dataset(ref)\nGet a dataset by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nAT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtmosphereIndexEntry\nAtmosphereIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf record is not a dataset.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.get_schema(ref)\nGet a schema record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nAT URI of the schema record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf record is not a schema.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.insert_dataset(\n ds,\n *,\n name,\n schema_ref=None,\n **kwargs,\n)\nInsert a dataset into ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to publish.\nrequired\n\n\nname\nstr\nHuman-readable name.\nrequired\n\n\nschema_ref\nOptional[str]\nOptional schema 
AT URI. If None, auto-publishes schema.\nNone\n\n\n**kwargs\n\nAdditional options (description, tags, license).\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtmosphereIndexEntry\nAtmosphereIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.list_datasets(repo=None)\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nDID of repository. Defaults to authenticated user.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[AtmosphereIndexEntry]\nList of AtmosphereIndexEntry for each dataset.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.list_schemas(repo=None)\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nDID of repository. Defaults to authenticated user.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.publish_schema(\n sample_type,\n *,\n version='1.0.0',\n **kwargs,\n)\nPublish a schema to ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[Packable]\nA Packable type (PackableSample subclass or @packable-decorated).\nrequired\n\n\nversion\nstr\nSemantic version string.\n'1.0.0'\n\n\n**kwargs\n\nAdditional options (description, metadata).\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nAT URI of the schema record." 1136 1192 }, 1137 1193 { 1138 1194 "objectID": "api/DataSource.html", 1139 1195 "href": "api/DataSource.html", 1140 1196 "title": "DataSource", 1141 1197 "section": "", 1142 - "text": "DataSource()\nProtocol for data sources that provide streams to Dataset.\nA DataSource abstracts over different ways of accessing dataset shards: - URLSource: Standard WebDataset-compatible URLs (http, https, pipe, gs, etc.) 
- S3Source: S3-compatible storage with explicit credentials - BlobSource: ATProto blob references (future)\nThe key method is shards(), which yields (identifier, stream) pairs. These are fed directly to WebDataset’s tar_file_expander, bypassing URL resolution entirely. This enables: - Private S3 repos with credentials - Custom endpoints (Cloudflare R2, MinIO) - ATProto blob streaming - Any other source that can provide file-like objects\nExample: >>> source = S3Source( … bucket=“my-bucket”, … keys=[“data-000.tar”, “data-001.tar”], … endpoint=“https://r2.example.com”, … credentials=creds, … ) >>> ds = DatasetMySample >>> for sample in ds.ordered(): … print(sample)\n\n\n\n\n\nName\nDescription\n\n\n\n\nshards\nLazily yield (identifier, stream) pairs for each shard.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nlist_shards\nGet list of shard identifiers without opening streams.\n\n\nopen_shard\nOpen a single shard by its identifier.\n\n\n\n\n\nDataSource.list_shards()\nGet list of shard identifiers without opening streams.\nUsed for metadata queries like counting shards without actually streaming data. Implementations should return identifiers that match what shards would yield.\nReturns: List of shard identifier strings.\n\n\n\nDataSource.open_shard(shard_id)\nOpen a single shard by its identifier.\nThis method enables random access to individual shards, which is required for PyTorch DataLoader worker splitting. Each worker opens only its assigned shards rather than iterating all shards.\nArgs: shard_id: Shard identifier from shard_list.\nReturns: File-like stream for reading the shard.\nRaises: KeyError: If shard_id is not in shard_list." 1198 + "text": "DataSource()\nProtocol for data sources that provide streams to Dataset.\nA DataSource abstracts over different ways of accessing dataset shards: - URLSource: Standard WebDataset-compatible URLs (http, https, pipe, gs, etc.) 
- S3Source: S3-compatible storage with explicit credentials - BlobSource: ATProto blob references (future)\nThe key method is shards(), which yields (identifier, stream) pairs. These are fed directly to WebDataset’s tar_file_expander, bypassing URL resolution entirely. This enables: - Private S3 repos with credentials - Custom endpoints (Cloudflare R2, MinIO) - ATProto blob streaming - Any other source that can provide file-like objects\n\n\n::\n>>> source = S3Source(\n... bucket=\"my-bucket\",\n... keys=[\"data-000.tar\", \"data-001.tar\"],\n... endpoint=\"https://r2.example.com\",\n... credentials=creds,\n... )\n>>> ds = Dataset[MySample](source)\n>>> for sample in ds.ordered():\n... print(sample)\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nshards\nLazily yield (identifier, stream) pairs for each shard.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nlist_shards\nGet list of shard identifiers without opening streams.\n\n\nopen_shard\nOpen a single shard by its identifier.\n\n\n\n\n\nDataSource.list_shards()\nGet list of shard identifiers without opening streams.\nUsed for metadata queries like counting shards without actually streaming data. Implementations should return identifiers that match what shards would yield.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nList of shard identifier strings.\n\n\n\n\n\n\n\nDataSource.open_shard(shard_id)\nOpen a single shard by its identifier.\nThis method enables random access to individual shards, which is required for PyTorch DataLoader worker splitting. Each worker opens only its assigned shards rather than iterating all shards.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nshard_id\nstr\nShard identifier from shard_list.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIO[bytes]\nFile-like stream for reading the shard.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf shard_id is not in shard_list." 
1199 + }, 1200 + { 1201 + "objectID": "api/DataSource.html#example", 1202 + "href": "api/DataSource.html#example", 1203 + "title": "DataSource", 1204 + "section": "", 1205 + "text": "::\n>>> source = S3Source(\n... bucket=\"my-bucket\",\n... keys=[\"data-000.tar\", \"data-001.tar\"],\n... endpoint=\"https://r2.example.com\",\n... credentials=creds,\n... )\n>>> ds = Dataset[MySample](source)\n>>> for sample in ds.ordered():\n... print(sample)" 1143 1206 }, 1144 1207 { 1145 1208 "objectID": "api/DataSource.html#attributes", ··· 1153 1216 "href": "api/DataSource.html#methods", 1154 1217 "title": "DataSource", 1155 1218 "section": "", 1156 - "text": "Name\nDescription\n\n\n\n\nlist_shards\nGet list of shard identifiers without opening streams.\n\n\nopen_shard\nOpen a single shard by its identifier.\n\n\n\n\n\nDataSource.list_shards()\nGet list of shard identifiers without opening streams.\nUsed for metadata queries like counting shards without actually streaming data. Implementations should return identifiers that match what shards would yield.\nReturns: List of shard identifier strings.\n\n\n\nDataSource.open_shard(shard_id)\nOpen a single shard by its identifier.\nThis method enables random access to individual shards, which is required for PyTorch DataLoader worker splitting. Each worker opens only its assigned shards rather than iterating all shards.\nArgs: shard_id: Shard identifier from shard_list.\nReturns: File-like stream for reading the shard.\nRaises: KeyError: If shard_id is not in shard_list." 1219 + "text": "Name\nDescription\n\n\n\n\nlist_shards\nGet list of shard identifiers without opening streams.\n\n\nopen_shard\nOpen a single shard by its identifier.\n\n\n\n\n\nDataSource.list_shards()\nGet list of shard identifiers without opening streams.\nUsed for metadata queries like counting shards without actually streaming data. 
Implementations should return identifiers that match what shards would yield.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nList of shard identifier strings.\n\n\n\n\n\n\n\nDataSource.open_shard(shard_id)\nOpen a single shard by its identifier.\nThis method enables random access to individual shards, which is required for PyTorch DataLoader worker splitting. Each worker opens only its assigned shards rather than iterating all shards.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nshard_id\nstr\nShard identifier from shard_list.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIO[bytes]\nFile-like stream for reading the shard.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf shard_id is not in shard_list." 1157 1220 }, 1158 1221 { 1159 1222 "objectID": "api/DatasetLoader.html", 1160 1223 "href": "api/DatasetLoader.html", 1161 1224 "title": "DatasetLoader", 1162 1225 "section": "", 1163 - "text": "atmosphere.DatasetLoader(client)\nLoads dataset records from ATProto.\nThis class fetches dataset index records and can create Dataset objects from them. 
Note that loading a dataset requires having the corresponding Python class for the sample type.\nExample: >>> client = AtmosphereClient() >>> loader = DatasetLoader(client) >>> >>> # List available datasets >>> datasets = loader.list() >>> for ds in datasets: … print(ds[“name”], ds[“schemaRef”]) >>> >>> # Get a specific dataset record >>> record = loader.get(“at://did:plc:abc/ac.foundation.dataset.record/xyz”)\n\n\n\n\n\nName\nDescription\n\n\n\n\nget\nFetch a dataset record by AT URI.\n\n\nget_blob_urls\nGet fetchable URLs for blob-stored dataset shards.\n\n\nget_blobs\nGet the blob references from a dataset record.\n\n\nget_metadata\nGet the metadata from a dataset record.\n\n\nget_storage_type\nGet the storage type of a dataset record.\n\n\nget_urls\nGet the WebDataset URLs from a dataset record.\n\n\nlist_all\nList dataset records from a repository.\n\n\nto_dataset\nCreate a Dataset object from an ATProto record.\n\n\n\n\n\natmosphere.DatasetLoader.get(uri)\nFetch a dataset record by AT URI.\nArgs: uri: The AT URI of the dataset record.\nReturns: The dataset record as a dictionary.\nRaises: ValueError: If the record is not a dataset record.\n\n\n\natmosphere.DatasetLoader.get_blob_urls(uri)\nGet fetchable URLs for blob-stored dataset shards.\nThis resolves the PDS endpoint and constructs URLs that can be used to fetch the blob data directly.\nArgs: uri: The AT URI of the dataset record.\nReturns: List of URLs for fetching the blob data.\nRaises: ValueError: If storage type is not blobs or PDS cannot be resolved.\n\n\n\natmosphere.DatasetLoader.get_blobs(uri)\nGet the blob references from a dataset record.\nArgs: uri: The AT URI of the dataset record.\nReturns: List of blob reference dicts with keys: $type, ref, mimeType, size.\nRaises: ValueError: If the storage type is not blobs.\n\n\n\natmosphere.DatasetLoader.get_metadata(uri)\nGet the metadata from a dataset record.\nArgs: uri: The AT URI of the dataset record.\nReturns: The metadata dictionary, or None if 
no metadata.\n\n\n\natmosphere.DatasetLoader.get_storage_type(uri)\nGet the storage type of a dataset record.\nArgs: uri: The AT URI of the dataset record.\nReturns: Either “external” or “blobs”.\nRaises: ValueError: If storage type is unknown.\n\n\n\natmosphere.DatasetLoader.get_urls(uri)\nGet the WebDataset URLs from a dataset record.\nArgs: uri: The AT URI of the dataset record.\nReturns: List of WebDataset URLs.\nRaises: ValueError: If the storage type is not external URLs.\n\n\n\natmosphere.DatasetLoader.list_all(repo=None, limit=100)\nList dataset records from a repository.\nArgs: repo: The DID of the repository. Defaults to authenticated user. limit: Maximum number of records to return.\nReturns: List of dataset records.\n\n\n\natmosphere.DatasetLoader.to_dataset(uri, sample_type)\nCreate a Dataset object from an ATProto record.\nThis method creates a Dataset instance from a published record. You must provide the sample type class, which should match the schema referenced by the record.\nSupports both external URL storage and ATProto blob storage.\nArgs: uri: The AT URI of the dataset record. sample_type: The Python class for the sample type.\nReturns: A Dataset instance configured from the record.\nRaises: ValueError: If no storage URLs can be resolved.\nExample: >>> loader = DatasetLoader(client) >>> dataset = loader.to_dataset(uri, MySampleType) >>> for batch in dataset.shuffled(batch_size=32): … process(batch)" 1226 + "text": "atmosphere.DatasetLoader(client)\nLoads dataset records from ATProto.\nThis class fetches dataset index records and can create Dataset objects from them. Note that loading a dataset requires having the corresponding Python class for the sample type.\n\n\n::\n>>> client = AtmosphereClient()\n>>> loader = DatasetLoader(client)\n>>>\n>>> # List available datasets\n>>> datasets = loader.list()\n>>> for ds in datasets:\n... 
print(ds[\"name\"], ds[\"schemaRef\"])\n>>>\n>>> # Get a specific dataset record\n>>> record = loader.get(\"at://did:plc:abc/ac.foundation.dataset.record/xyz\")\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nget\nFetch a dataset record by AT URI.\n\n\nget_blob_urls\nGet fetchable URLs for blob-stored dataset shards.\n\n\nget_blobs\nGet the blob references from a dataset record.\n\n\nget_metadata\nGet the metadata from a dataset record.\n\n\nget_storage_type\nGet the storage type of a dataset record.\n\n\nget_urls\nGet the WebDataset URLs from a dataset record.\n\n\nlist_all\nList dataset records from a repository.\n\n\nto_dataset\nCreate a Dataset object from an ATProto record.\n\n\n\n\n\natmosphere.DatasetLoader.get(uri)\nFetch a dataset record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nThe dataset record as a dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the record is not a dataset record.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.get_blob_urls(uri)\nGet fetchable URLs for blob-stored dataset shards.\nThis resolves the PDS endpoint and constructs URLs that can be used to fetch the blob data directly.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nList of URLs for fetching the blob data.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf storage type is not blobs or PDS cannot be resolved.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.get_blobs(uri)\nGet the blob references from a dataset record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of blob reference dicts with keys: $type, ref, mimeType, 
size.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the storage type is not blobs.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.get_metadata(uri)\nGet the metadata from a dataset record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nOptional[dict]\nThe metadata dictionary, or None if no metadata.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.get_storage_type(uri)\nGet the storage type of a dataset record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nEither “external” or “blobs”.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf storage type is unknown.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.get_urls(uri)\nGet the WebDataset URLs from a dataset record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nList of WebDataset URLs.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the storage type is not external URLs.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.list_all(repo=None, limit=100)\nList dataset records from a repository.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID of the repository. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number of records to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of dataset records.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.to_dataset(uri, sample_type)\nCreate a Dataset object from an ATProto record.\nThis method creates a Dataset instance from a published record. 
You must provide the sample type class, which should match the schema referenced by the record.\nSupports both external URL storage and ATProto blob storage.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\nsample_type\nType[ST]\nThe Python class for the sample type.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nDataset[ST]\nA Dataset instance configured from the record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf no storage URLs can be resolved.\n\n\n\n\n\n\n::\n>>> loader = DatasetLoader(client)\n>>> dataset = loader.to_dataset(uri, MySampleType)\n>>> for batch in dataset.shuffled(batch_size=32):\n... process(batch)" 1227 + }, 1228 + { 1229 + "objectID": "api/DatasetLoader.html#example", 1230 + "href": "api/DatasetLoader.html#example", 1231 + "title": "DatasetLoader", 1232 + "section": "", 1233 + "text": "::\n>>> client = AtmosphereClient()\n>>> loader = DatasetLoader(client)\n>>>\n>>> # List available datasets\n>>> datasets = loader.list()\n>>> for ds in datasets:\n... 
print(ds[\"name\"], ds[\"schemaRef\"])\n>>>\n>>> # Get a specific dataset record\n>>> record = loader.get(\"at://did:plc:abc/ac.foundation.dataset.record/xyz\")" 1164 1234 }, 1165 1235 { 1166 1236 "objectID": "api/DatasetLoader.html#methods", 1167 1237 "href": "api/DatasetLoader.html#methods", 1168 1238 "title": "DatasetLoader", 1169 1239 "section": "", 1170 - "text": "Name\nDescription\n\n\n\n\nget\nFetch a dataset record by AT URI.\n\n\nget_blob_urls\nGet fetchable URLs for blob-stored dataset shards.\n\n\nget_blobs\nGet the blob references from a dataset record.\n\n\nget_metadata\nGet the metadata from a dataset record.\n\n\nget_storage_type\nGet the storage type of a dataset record.\n\n\nget_urls\nGet the WebDataset URLs from a dataset record.\n\n\nlist_all\nList dataset records from a repository.\n\n\nto_dataset\nCreate a Dataset object from an ATProto record.\n\n\n\n\n\natmosphere.DatasetLoader.get(uri)\nFetch a dataset record by AT URI.\nArgs: uri: The AT URI of the dataset record.\nReturns: The dataset record as a dictionary.\nRaises: ValueError: If the record is not a dataset record.\n\n\n\natmosphere.DatasetLoader.get_blob_urls(uri)\nGet fetchable URLs for blob-stored dataset shards.\nThis resolves the PDS endpoint and constructs URLs that can be used to fetch the blob data directly.\nArgs: uri: The AT URI of the dataset record.\nReturns: List of URLs for fetching the blob data.\nRaises: ValueError: If storage type is not blobs or PDS cannot be resolved.\n\n\n\natmosphere.DatasetLoader.get_blobs(uri)\nGet the blob references from a dataset record.\nArgs: uri: The AT URI of the dataset record.\nReturns: List of blob reference dicts with keys: $type, ref, mimeType, size.\nRaises: ValueError: If the storage type is not blobs.\n\n\n\natmosphere.DatasetLoader.get_metadata(uri)\nGet the metadata from a dataset record.\nArgs: uri: The AT URI of the dataset record.\nReturns: The metadata dictionary, or None if no 
metadata.\n\n\n\natmosphere.DatasetLoader.get_storage_type(uri)\nGet the storage type of a dataset record.\nArgs: uri: The AT URI of the dataset record.\nReturns: Either “external” or “blobs”.\nRaises: ValueError: If storage type is unknown.\n\n\n\natmosphere.DatasetLoader.get_urls(uri)\nGet the WebDataset URLs from a dataset record.\nArgs: uri: The AT URI of the dataset record.\nReturns: List of WebDataset URLs.\nRaises: ValueError: If the storage type is not external URLs.\n\n\n\natmosphere.DatasetLoader.list_all(repo=None, limit=100)\nList dataset records from a repository.\nArgs: repo: The DID of the repository. Defaults to authenticated user. limit: Maximum number of records to return.\nReturns: List of dataset records.\n\n\n\natmosphere.DatasetLoader.to_dataset(uri, sample_type)\nCreate a Dataset object from an ATProto record.\nThis method creates a Dataset instance from a published record. You must provide the sample type class, which should match the schema referenced by the record.\nSupports both external URL storage and ATProto blob storage.\nArgs: uri: The AT URI of the dataset record. 
sample_type: The Python class for the sample type.\nReturns: A Dataset instance configured from the record.\nRaises: ValueError: If no storage URLs can be resolved.\nExample: >>> loader = DatasetLoader(client) >>> dataset = loader.to_dataset(uri, MySampleType) >>> for batch in dataset.shuffled(batch_size=32): … process(batch)" 1240 + "text": "Name\nDescription\n\n\n\n\nget\nFetch a dataset record by AT URI.\n\n\nget_blob_urls\nGet fetchable URLs for blob-stored dataset shards.\n\n\nget_blobs\nGet the blob references from a dataset record.\n\n\nget_metadata\nGet the metadata from a dataset record.\n\n\nget_storage_type\nGet the storage type of a dataset record.\n\n\nget_urls\nGet the WebDataset URLs from a dataset record.\n\n\nlist_all\nList dataset records from a repository.\n\n\nto_dataset\nCreate a Dataset object from an ATProto record.\n\n\n\n\n\natmosphere.DatasetLoader.get(uri)\nFetch a dataset record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nThe dataset record as a dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the record is not a dataset record.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.get_blob_urls(uri)\nGet fetchable URLs for blob-stored dataset shards.\nThis resolves the PDS endpoint and constructs URLs that can be used to fetch the blob data directly.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nList of URLs for fetching the blob data.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf storage type is not blobs or PDS cannot be resolved.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.get_blobs(uri)\nGet the blob references from a dataset record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the 
dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of blob reference dicts with keys: $type, ref, mimeType, size.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the storage type is not blobs.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.get_metadata(uri)\nGet the metadata from a dataset record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nOptional[dict]\nThe metadata dictionary, or None if no metadata.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.get_storage_type(uri)\nGet the storage type of a dataset record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nEither “external” or “blobs”.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf storage type is unknown.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.get_urls(uri)\nGet the WebDataset URLs from a dataset record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nList of WebDataset URLs.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the storage type is not external URLs.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.list_all(repo=None, limit=100)\nList dataset records from a repository.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID of the repository. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number of records to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of dataset records.\n\n\n\n\n\n\n\natmosphere.DatasetLoader.to_dataset(uri, sample_type)\nCreate a Dataset object from an ATProto record.\nThis method creates a Dataset instance from a published record. 
You must provide the sample type class, which should match the schema referenced by the record.\nSupports both external URL storage and ATProto blob storage.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the dataset record.\nrequired\n\n\nsample_type\nType[ST]\nThe Python class for the sample type.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nDataset[ST]\nA Dataset instance configured from the record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf no storage URLs can be resolved.\n\n\n\n\n\n\n::\n>>> loader = DatasetLoader(client)\n>>> dataset = loader.to_dataset(uri, MySampleType)\n>>> for batch in dataset.shuffled(batch_size=32):\n... process(batch)" 1171 1241 }, 1172 1242 { 1173 1243 "objectID": "api/Lens.html", 1174 1244 "href": "api/Lens.html", 1175 1245 "title": "lens", 1176 1246 "section": "", 1177 - "text": "lens\nLens-based type transformations for datasets.\nThis module implements a lens system for bidirectional transformations between different sample types. 
Lenses enable viewing a dataset through different type schemas without duplicating the underlying data.\nKey components:\n\nLens: Bidirectional transformation with getter (S -> V) and optional putter (V, S -> S)\nLensNetwork: Global singleton registry for lens transformations\n@lens: Decorator to create and register lens transformations\n\nLenses support the functional programming concept of composable, well-behaved transformations that satisfy lens laws (GetPut and PutGet).\nExample: >>> @packable … class FullData: … name: str … age: int … embedding: NDArray … >>> @packable … class NameOnly: … name: str … >>> @lens … def name_view(full: FullData) -> NameOnly: … return NameOnly(name=full.name) … >>> @name_view.putter … def name_view_put(view: NameOnly, source: FullData) -> FullData: … return FullData(name=view.name, age=source.age, … embedding=source.embedding) … >>> ds = DatasetFullData >>> ds_names = ds.as_type(NameOnly) # Uses registered lens\n\n\n\n\n\nName\nDescription\n\n\n\n\nLens\nA bidirectional transformation between two sample types.\n\n\nLensNetwork\nGlobal registry for lens transformations between sample types.\n\n\n\n\n\nlens.Lens(get, put=None)\nA bidirectional transformation between two sample types.\nA lens provides a way to view and update data of type S (source) as if it were type V (view). It consists of a getter that transforms S -> V and an optional putter that transforms (V, S) -> S, enabling updates to the view to be reflected back in the source.\nType Parameters: S: The source type, must derive from PackableSample. 
V: The view type, must derive from PackableSample.\nExample: >>> @lens … def name_lens(full: FullData) -> NameOnly: … return NameOnly(name=full.name) … >>> @name_lens.putter … def name_lens_put(view: NameOnly, source: FullData) -> FullData: … return FullData(name=view.name, age=source.age)\n\n\n\n\n\nName\nDescription\n\n\n\n\nget\nTransform the source into the view type.\n\n\nput\nUpdate the source based on a modified view.\n\n\nputter\nDecorator to register a putter function for this lens.\n\n\n\n\n\nlens.Lens.get(s)\nTransform the source into the view type.\nArgs: s: The source sample of type S.\nReturns: A view of the source as type V.\n\n\n\nlens.Lens.put(v, s)\nUpdate the source based on a modified view.\nArgs: v: The modified view of type V. s: The original source of type S.\nReturns: An updated source of type S that reflects changes from the view.\n\n\n\nlens.Lens.putter(put)\nDecorator to register a putter function for this lens.\nArgs: put: A function that takes a view of type V and source of type S, and returns an updated source of type S.\nReturns: The putter function, allowing this to be used as a decorator.\nExample: >>> @my_lens.putter … def my_lens_put(view: ViewType, source: SourceType) -> SourceType: … return SourceType(…)\n\n\n\n\n\nlens.LensNetwork()\nGlobal registry for lens transformations between sample types.\nThis class implements a singleton pattern to maintain a global registry of all lenses decorated with @lens. It enables looking up transformations between different PackableSample types.\nAttributes: _instance: The singleton instance of this class. 
_registry: Dictionary mapping (source_type, view_type) tuples to their corresponding Lens objects.\n\n\n\n\n\nName\nDescription\n\n\n\n\nregister\nRegister a lens as the canonical transformation between two types.\n\n\ntransform\nLook up the lens transformation between two sample types.\n\n\n\n\n\nlens.LensNetwork.register(_lens)\nRegister a lens as the canonical transformation between two types.\nArgs: _lens: The lens to register. Will be stored in the registry under the key (_lens.source_type, _lens.view_type).\nNote: If a lens already exists for the same type pair, it will be overwritten.\n\n\n\nlens.LensNetwork.transform(source, view)\nLook up the lens transformation between two sample types.\nArgs: source: The source sample type (must derive from PackableSample). view: The target view type (must derive from PackableSample).\nReturns: The registered Lens that transforms from source to view.\nRaises: ValueError: If no lens has been registered for the given type pair.\nNote: Currently only supports direct transformations. Compositional transformations (chaining multiple lenses) are not yet implemented.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nlens\nDecorator to create and register a lens transformation.\n\n\n\n\n\nlens.lens(f)\nDecorator to create and register a lens transformation.\nThis decorator converts a getter function into a Lens object and automatically registers it in the global LensNetwork registry.\nArgs: f: A getter function that transforms from source type S to view type V. 
Must have exactly one parameter with a type annotation.\nReturns: A Lens[S, V] object that can be called to apply the transformation or decorated with @lens_name.putter to add a putter function.\nExample: >>> @lens … def extract_name(full: FullData) -> NameOnly: … return NameOnly(name=full.name) … >>> @extract_name.putter … def extract_name_put(view: NameOnly, source: FullData) -> FullData: … return FullData(name=view.name, age=source.age)" 1247 + "text": "lens\nLens-based type transformations for datasets.\nThis module implements a lens system for bidirectional transformations between different sample types. Lenses enable viewing a dataset through different type schemas without duplicating the underlying data.\nKey components:\n\nLens: Bidirectional transformation with getter (S -> V) and optional putter (V, S -> S)\nLensNetwork: Global singleton registry for lens transformations\n@lens: Decorator to create and register lens transformations\n\nLenses support the functional programming concept of composable, well-behaved transformations that satisfy lens laws (GetPut and PutGet).\n\n\n::\n>>> @packable\n... class FullData:\n... name: str\n... age: int\n... embedding: NDArray\n...\n>>> @packable\n... class NameOnly:\n... name: str\n...\n>>> @lens\n... def name_view(full: FullData) -> NameOnly:\n... return NameOnly(name=full.name)\n...\n>>> @name_view.putter\n... def name_view_put(view: NameOnly, source: FullData) -> FullData:\n... return FullData(name=view.name, age=source.age,\n... 
embedding=source.embedding)\n...\n>>> ds = Dataset[FullData](\"data.tar\")\n>>> ds_names = ds.as_type(NameOnly) # Uses registered lens\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nLens\nA bidirectional transformation between two sample types.\n\n\nLensNetwork\nGlobal registry for lens transformations between sample types.\n\n\n\n\n\nlens.Lens(get, put=None)\nA bidirectional transformation between two sample types.\nA lens provides a way to view and update data of type S (source) as if it were type V (view). It consists of a getter that transforms S -> V and an optional putter that transforms (V, S) -> S, enabling updates to the view to be reflected back in the source.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nS\n\nThe source type, must derive from PackableSample.\nrequired\n\n\nV\n\nThe view type, must derive from PackableSample.\nrequired\n\n\n\n\n\n\n::\n>>> @lens\n... def name_lens(full: FullData) -> NameOnly:\n... return NameOnly(name=full.name)\n...\n>>> @name_lens.putter\n... def name_lens_put(view: NameOnly, source: FullData) -> FullData:\n... 
return FullData(name=view.name, age=source.age)\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nget\nTransform the source into the view type.\n\n\nput\nUpdate the source based on a modified view.\n\n\nputter\nDecorator to register a putter function for this lens.\n\n\n\n\n\nlens.Lens.get(s)\nTransform the source into the view type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ns\nS\nThe source sample of type S.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nV\nA view of the source as type V.\n\n\n\n\n\n\n\nlens.Lens.put(v, s)\nUpdate the source based on a modified view.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nv\nV\nThe modified view of type V.\nrequired\n\n\ns\nS\nThe original source of type S.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nS\nAn updated source of type S that reflects changes from the view.\n\n\n\n\n\n\n\nlens.Lens.putter(put)\nDecorator to register a putter function for this lens.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nput\nLensPutter[S, V]\nA function that takes a view of type V and source of type S, and returns an updated source of type S.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLensPutter[S, V]\nThe putter function, allowing this to be used as a decorator.\n\n\n\n\n\n\n::\n>>> @my_lens.putter\n... def my_lens_put(view: ViewType, source: SourceType) -> SourceType:\n... return SourceType(...)\n\n\n\n\n\n\nlens.LensNetwork()\nGlobal registry for lens transformations between sample types.\nThis class implements a singleton pattern to maintain a global registry of all lenses decorated with @lens. 
It enables looking up transformations between different PackableSample types.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n_instance\n\nThe singleton instance of this class.\n\n\n_registry\nDict[LensSignature, Lens]\nDictionary mapping (source_type, view_type) tuples to their corresponding Lens objects.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nregister\nRegister a lens as the canonical transformation between two types.\n\n\ntransform\nLook up the lens transformation between two sample types.\n\n\n\n\n\nlens.LensNetwork.register(_lens)\nRegister a lens as the canonical transformation between two types.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\n_lens\nLens\nThe lens to register. Will be stored in the registry under the key (_lens.source_type, _lens.view_type).\nrequired\n\n\n\n\n\n\nIf a lens already exists for the same type pair, it will be overwritten.\n\n\n\n\nlens.LensNetwork.transform(source, view)\nLook up the lens transformation between two sample types.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsource\nDatasetType\nThe source sample type (must derive from PackableSample).\nrequired\n\n\nview\nDatasetType\nThe target view type (must derive from PackableSample).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLens\nThe registered Lens that transforms from source to view.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf no lens has been registered for the given type pair.\n\n\n\n\n\n\nCurrently only supports direct transformations. 
Compositional transformations (chaining multiple lenses) are not yet implemented.\n\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nlens\nDecorator to create and register a lens transformation.\n\n\n\n\n\nlens.lens(f)\nDecorator to create and register a lens transformation.\nThis decorator converts a getter function into a Lens object and automatically registers it in the global LensNetwork registry.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nf\nLensGetter[S, V]\nA getter function that transforms from source type S to view type V. Must have exactly one parameter with a type annotation.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLens[S, V]\nA Lens[S, V] object that can be called to apply the transformation\n\n\n\nLens[S, V]\nor decorated with @lens_name.putter to add a putter function.\n\n\n\n\n\n\n::\n>>> @lens\n... def extract_name(full: FullData) -> NameOnly:\n... return NameOnly(name=full.name)\n...\n>>> @extract_name.putter\n... def extract_name_put(view: NameOnly, source: FullData) -> FullData:\n... return FullData(name=view.name, age=source.age)" 1248 + }, 1249 + { 1250 + "objectID": "api/Lens.html#example", 1251 + "href": "api/Lens.html#example", 1252 + "title": "lens", 1253 + "section": "", 1254 + "text": "::\n>>> @packable\n... class FullData:\n... name: str\n... age: int\n... embedding: NDArray\n...\n>>> @packable\n... class NameOnly:\n... name: str\n...\n>>> @lens\n... def name_view(full: FullData) -> NameOnly:\n... return NameOnly(name=full.name)\n...\n>>> @name_view.putter\n... def name_view_put(view: NameOnly, source: FullData) -> FullData:\n... return FullData(name=view.name, age=source.age,\n... 
embedding=source.embedding)\n...\n>>> ds = Dataset[FullData](\"data.tar\")\n>>> ds_names = ds.as_type(NameOnly) # Uses registered lens" 1178 1255 }, 1179 1256 { 1180 1257 "objectID": "api/Lens.html#classes", 1181 1258 "href": "api/Lens.html#classes", 1182 1259 "title": "lens", 1183 1260 "section": "", 1184 - "text": "Name\nDescription\n\n\n\n\nLens\nA bidirectional transformation between two sample types.\n\n\nLensNetwork\nGlobal registry for lens transformations between sample types.\n\n\n\n\n\nlens.Lens(get, put=None)\nA bidirectional transformation between two sample types.\nA lens provides a way to view and update data of type S (source) as if it were type V (view). It consists of a getter that transforms S -> V and an optional putter that transforms (V, S) -> S, enabling updates to the view to be reflected back in the source.\nType Parameters: S: The source type, must derive from PackableSample. V: The view type, must derive from PackableSample.\nExample: >>> @lens … def name_lens(full: FullData) -> NameOnly: … return NameOnly(name=full.name) … >>> @name_lens.putter … def name_lens_put(view: NameOnly, source: FullData) -> FullData: … return FullData(name=view.name, age=source.age)\n\n\n\n\n\nName\nDescription\n\n\n\n\nget\nTransform the source into the view type.\n\n\nput\nUpdate the source based on a modified view.\n\n\nputter\nDecorator to register a putter function for this lens.\n\n\n\n\n\nlens.Lens.get(s)\nTransform the source into the view type.\nArgs: s: The source sample of type S.\nReturns: A view of the source as type V.\n\n\n\nlens.Lens.put(v, s)\nUpdate the source based on a modified view.\nArgs: v: The modified view of type V. 
s: The original source of type S.\nReturns: An updated source of type S that reflects changes from the view.\n\n\n\nlens.Lens.putter(put)\nDecorator to register a putter function for this lens.\nArgs: put: A function that takes a view of type V and source of type S, and returns an updated source of type S.\nReturns: The putter function, allowing this to be used as a decorator.\nExample: >>> @my_lens.putter … def my_lens_put(view: ViewType, source: SourceType) -> SourceType: … return SourceType(…)\n\n\n\n\n\nlens.LensNetwork()\nGlobal registry for lens transformations between sample types.\nThis class implements a singleton pattern to maintain a global registry of all lenses decorated with @lens. It enables looking up transformations between different PackableSample types.\nAttributes: _instance: The singleton instance of this class. _registry: Dictionary mapping (source_type, view_type) tuples to their corresponding Lens objects.\n\n\n\n\n\nName\nDescription\n\n\n\n\nregister\nRegister a lens as the canonical transformation between two types.\n\n\ntransform\nLook up the lens transformation between two sample types.\n\n\n\n\n\nlens.LensNetwork.register(_lens)\nRegister a lens as the canonical transformation between two types.\nArgs: _lens: The lens to register. Will be stored in the registry under the key (_lens.source_type, _lens.view_type).\nNote: If a lens already exists for the same type pair, it will be overwritten.\n\n\n\nlens.LensNetwork.transform(source, view)\nLook up the lens transformation between two sample types.\nArgs: source: The source sample type (must derive from PackableSample). view: The target view type (must derive from PackableSample).\nReturns: The registered Lens that transforms from source to view.\nRaises: ValueError: If no lens has been registered for the given type pair.\nNote: Currently only supports direct transformations. Compositional transformations (chaining multiple lenses) are not yet implemented." 
1261 + "text": "Name\nDescription\n\n\n\n\nLens\nA bidirectional transformation between two sample types.\n\n\nLensNetwork\nGlobal registry for lens transformations between sample types.\n\n\n\n\n\nlens.Lens(get, put=None)\nA bidirectional transformation between two sample types.\nA lens provides a way to view and update data of type S (source) as if it were type V (view). It consists of a getter that transforms S -> V and an optional putter that transforms (V, S) -> S, enabling updates to the view to be reflected back in the source.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nS\n\nThe source type, must derive from PackableSample.\nrequired\n\n\nV\n\nThe view type, must derive from PackableSample.\nrequired\n\n\n\n\n\n\n::\n>>> @lens\n... def name_lens(full: FullData) -> NameOnly:\n... return NameOnly(name=full.name)\n...\n>>> @name_lens.putter\n... def name_lens_put(view: NameOnly, source: FullData) -> FullData:\n... return FullData(name=view.name, age=source.age)\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nget\nTransform the source into the view type.\n\n\nput\nUpdate the source based on a modified view.\n\n\nputter\nDecorator to register a putter function for this lens.\n\n\n\n\n\nlens.Lens.get(s)\nTransform the source into the view type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ns\nS\nThe source sample of type S.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nV\nA view of the source as type V.\n\n\n\n\n\n\n\nlens.Lens.put(v, s)\nUpdate the source based on a modified view.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nv\nV\nThe modified view of type V.\nrequired\n\n\ns\nS\nThe original source of type S.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nS\nAn updated source of type S that reflects changes from the view.\n\n\n\n\n\n\n\nlens.Lens.putter(put)\nDecorator to register a putter function for this lens.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nput\nLensPutter[S, V]\nA function 
that takes a view of type V and source of type S, and returns an updated source of type S.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLensPutter[S, V]\nThe putter function, allowing this to be used as a decorator.\n\n\n\n\n\n\n::\n>>> @my_lens.putter\n... def my_lens_put(view: ViewType, source: SourceType) -> SourceType:\n... return SourceType(...)\n\n\n\n\n\n\nlens.LensNetwork()\nGlobal registry for lens transformations between sample types.\nThis class implements a singleton pattern to maintain a global registry of all lenses decorated with @lens. It enables looking up transformations between different PackableSample types.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n_instance\n\nThe singleton instance of this class.\n\n\n_registry\nDict[LensSignature, Lens]\nDictionary mapping (source_type, view_type) tuples to their corresponding Lens objects.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nregister\nRegister a lens as the canonical transformation between two types.\n\n\ntransform\nLook up the lens transformation between two sample types.\n\n\n\n\n\nlens.LensNetwork.register(_lens)\nRegister a lens as the canonical transformation between two types.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\n_lens\nLens\nThe lens to register. 
Will be stored in the registry under the key (_lens.source_type, _lens.view_type).\nrequired\n\n\n\n\n\n\nIf a lens already exists for the same type pair, it will be overwritten.\n\n\n\n\nlens.LensNetwork.transform(source, view)\nLook up the lens transformation between two sample types.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsource\nDatasetType\nThe source sample type (must derive from PackableSample).\nrequired\n\n\nview\nDatasetType\nThe target view type (must derive from PackableSample).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLens\nThe registered Lens that transforms from source to view.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf no lens has been registered for the given type pair.\n\n\n\n\n\n\nCurrently only supports direct transformations. Compositional transformations (chaining multiple lenses) are not yet implemented." 1185 1262 }, 1186 1263 { 1187 1264 "objectID": "api/Lens.html#functions", 1188 1265 "href": "api/Lens.html#functions", 1189 1266 "title": "lens", 1190 1267 "section": "", 1191 - "text": "Name\nDescription\n\n\n\n\nlens\nDecorator to create and register a lens transformation.\n\n\n\n\n\nlens.lens(f)\nDecorator to create and register a lens transformation.\nThis decorator converts a getter function into a Lens object and automatically registers it in the global LensNetwork registry.\nArgs: f: A getter function that transforms from source type S to view type V. 
Must have exactly one parameter with a type annotation.\nReturns: A Lens[S, V] object that can be called to apply the transformation or decorated with @lens_name.putter to add a putter function.\nExample: >>> @lens … def extract_name(full: FullData) -> NameOnly: … return NameOnly(name=full.name) … >>> @extract_name.putter … def extract_name_put(view: NameOnly, source: FullData) -> FullData: … return FullData(name=view.name, age=source.age)" 1268 + "text": "Name\nDescription\n\n\n\n\nlens\nDecorator to create and register a lens transformation.\n\n\n\n\n\nlens.lens(f)\nDecorator to create and register a lens transformation.\nThis decorator converts a getter function into a Lens object and automatically registers it in the global LensNetwork registry.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nf\nLensGetter[S, V]\nA getter function that transforms from source type S to view type V. Must have exactly one parameter with a type annotation.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLens[S, V]\nA Lens[S, V] object that can be called to apply the transformation\n\n\n\nLens[S, V]\nor decorated with @lens_name.putter to add a putter function.\n\n\n\n\n\n\n::\n>>> @lens\n... def extract_name(full: FullData) -> NameOnly:\n... return NameOnly(name=full.name)\n...\n>>> @extract_name.putter\n... def extract_name_put(view: NameOnly, source: FullData) -> FullData:\n... return FullData(name=view.name, age=source.age)" 1192 1269 }, 1193 1270 { 1194 1271 "objectID": "api/local.Index.html", 1195 1272 "href": "api/local.Index.html", 1196 1273 "title": "local.Index", 1197 1274 "section": "", 1198 - "text": "local.Index(\n redis=None,\n data_store=None,\n auto_stubs=False,\n stub_dir=None,\n **kwargs,\n)\nRedis-backed index for tracking datasets in a repository.\nImplements the AbstractIndex protocol. 
Maintains a registry of LocalDatasetEntry objects in Redis, allowing enumeration and lookup of stored datasets.\nWhen initialized with a data_store, insert_dataset() will write dataset shards to storage before indexing. Without a data_store, insert_dataset() only indexes existing URLs.\nAttributes: _redis: Redis connection for index storage. _data_store: Optional AbstractDataStore for writing dataset shards.\n\n\n\n\n\nName\nDescription\n\n\n\n\nall_entries\nGet all index entries as a list (deprecated, use list_entries()).\n\n\ndata_store\nThe data store for writing shards, or None if index-only.\n\n\ndatasets\nLazily iterate over all dataset entries (AbstractIndex protocol).\n\n\nentries\nIterate over all index entries.\n\n\nschemas\nIterate over all schema records in this index.\n\n\nstub_dir\nDirectory where stub files are written, or None if auto-stubs disabled.\n\n\ntypes\nNamespace for accessing loaded schema types.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nadd_entry\nAdd a dataset to the index.\n\n\nclear_stubs\nRemove all auto-generated stub files.\n\n\ndecode_schema\nReconstruct a Python PackableSample type from a stored schema.\n\n\ndecode_schema_as\nDecode a schema with explicit type hint for IDE support.\n\n\nget_dataset\nGet a dataset entry by name (AbstractIndex protocol).\n\n\nget_entry\nGet an entry by its CID.\n\n\nget_entry_by_name\nGet an entry by its human-readable name.\n\n\nget_import_path\nGet the import path for a schema’s generated module.\n\n\nget_schema\nGet a schema record by reference (AbstractIndex protocol).\n\n\nget_schema_record\nGet a schema record as LocalSchemaRecord object.\n\n\ninsert_dataset\nInsert a dataset into the index (AbstractIndex protocol).\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_entries\nGet all index entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\nload_schema\nLoad a 
schema and make it available in the types namespace.\n\n\npublish_schema\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nlocal.Index.add_entry(ds, *, name, schema_ref=None, metadata=None)\nAdd a dataset to the index.\nCreates a LocalDatasetEntry for the dataset and persists it to Redis.\nArgs: ds: The dataset to add to the index. name: Human-readable name for the dataset. schema_ref: Optional schema reference. If None, generates from sample type. metadata: Optional metadata dictionary. If None, uses ds._metadata if available.\nReturns: The created LocalDatasetEntry object.\n\n\n\nlocal.Index.clear_stubs()\nRemove all auto-generated stub files.\nOnly works if auto_stubs was enabled when creating the Index.\nReturns: Number of stub files removed, or 0 if auto_stubs is disabled.\n\n\n\nlocal.Index.decode_schema(ref)\nReconstruct a Python PackableSample type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a PackableSample subclass matching the schema definition.\nIf auto_stubs is enabled, a Python module will be generated and the class will be imported from it, providing full IDE autocomplete support. The returned class has proper type information that IDEs can understand.\nArgs: ref: Schema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nReturns: A PackableSample subclass - either imported from a generated module (if auto_stubs is enabled) or dynamically created.\nRaises: KeyError: If schema not found. ValueError: If schema cannot be decoded.\n\n\n\nlocal.Index.decode_schema_as(ref, type_hint)\nDecode a schema with explicit type hint for IDE support.\nThis is a typed wrapper around decode_schema() that preserves the type information for IDE autocomplete. Use this when you have a stub file for the schema and want full IDE support.\nArgs: ref: Schema reference string. 
type_hint: The stub type to use for type hints. Import this from the generated stub file.\nReturns: The decoded type, cast to match the type_hint for IDE support.\nExample: >>> # After enabling auto_stubs and configuring IDE extraPaths: >>> from local.MySample_1_0_0 import MySample >>> >>> # This gives full IDE autocomplete: >>> DecodedType = index.decode_schema_as(ref, MySample) >>> sample = DecodedType(text=“hello”, value=42) # IDE knows signature!\nNote: The type_hint is only used for static type checking - at runtime, the actual decoded type from the schema is returned. Ensure the stub matches the schema to avoid runtime surprises.\n\n\n\nlocal.Index.get_dataset(ref)\nGet a dataset entry by name (AbstractIndex protocol).\nArgs: ref: Dataset name.\nReturns: IndexEntry for the dataset.\nRaises: KeyError: If dataset not found.\n\n\n\nlocal.Index.get_entry(cid)\nGet an entry by its CID.\nArgs: cid: Content identifier of the entry.\nReturns: LocalDatasetEntry for the given CID.\nRaises: KeyError: If entry not found.\n\n\n\nlocal.Index.get_entry_by_name(name)\nGet an entry by its human-readable name.\nArgs: name: Human-readable name of the entry.\nReturns: LocalDatasetEntry with the given name.\nRaises: KeyError: If no entry with that name exists.\n\n\n\nlocal.Index.get_import_path(ref)\nGet the import path for a schema’s generated module.\nWhen auto_stubs is enabled, this returns the import path that can be used to import the schema type with full IDE support.\nArgs: ref: Schema reference string.\nReturns: Import path like “local.MySample_1_0_0”, or None if auto_stubs is disabled.\nExample: >>> index = LocalIndex(auto_stubs=True) >>> ref = index.publish_schema(MySample, version=“1.0.0”) >>> index.load_schema(ref) >>> print(index.get_import_path(ref)) local.MySample_1_0_0 >>> # Then in your code: >>> # from local.MySample_1_0_0 import MySample\n\n\n\nlocal.Index.get_schema(ref)\nGet a schema record by reference (AbstractIndex protocol).\nArgs: ref: Schema reference 
string. Supports both new format (atdata://local/sampleSchema/{name}@version) and legacy format (local://schemas/{module.Class}@version).\nReturns: Schema record as a dictionary with keys ‘name’, ‘version’, ‘fields’, ‘$ref’, etc.\nRaises: KeyError: If schema not found. ValueError: If reference format is invalid.\n\n\n\nlocal.Index.get_schema_record(ref)\nGet a schema record as LocalSchemaRecord object.\nUse this when you need the full LocalSchemaRecord with typed properties. For Protocol-compliant dict access, use get_schema() instead.\nArgs: ref: Schema reference string.\nReturns: LocalSchemaRecord with schema details.\nRaises: KeyError: If schema not found. ValueError: If reference format is invalid.\n\n\n\nlocal.Index.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index (AbstractIndex protocol).\nIf a data_store was provided at initialization, writes dataset shards to storage first, then indexes the new URLs. Otherwise, indexes the dataset’s existing URL.\nArgs: ds: The Dataset to register. name: Human-readable name for the dataset. schema_ref: Optional schema reference. 
**kwargs: Additional options: - metadata: Optional metadata dict - prefix: Storage prefix (default: dataset name) - cache_local: If True, cache writes locally first\nReturns: IndexEntry for the inserted dataset.\n\n\n\nlocal.Index.list_datasets()\nGet all dataset entries as a materialized list (AbstractIndex protocol).\nReturns: List of IndexEntry for each dataset.\n\n\n\nlocal.Index.list_entries()\nGet all index entries as a materialized list.\nReturns: List of all LocalDatasetEntry objects in the index.\n\n\n\nlocal.Index.list_schemas()\nGet all schema records as a materialized list (AbstractIndex protocol).\nReturns: List of schema records as dictionaries.\n\n\n\nlocal.Index.load_schema(ref)\nLoad a schema and make it available in the types namespace.\nThis method decodes the schema, optionally generates a Python module for IDE support (if auto_stubs is enabled), and registers the type in the :attr:types namespace for easy access.\nArgs: ref: Schema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nReturns: The decoded PackableSample subclass. Also available via index.types.<ClassName> after this call.\nRaises: KeyError: If schema not found. ValueError: If schema cannot be decoded.\nExample: >>> # Load and use immediately >>> MyType = index.load_schema(“atdata://local/sampleSchema/MySample@1.0.0”) >>> sample = MyType(name=“hello”, value=42) >>> >>> # Or access later via namespace >>> index.load_schema(“atdata://local/sampleSchema/OtherType@1.0.0”) >>> other = index.types.OtherType(data=“test”)\n\n\n\nlocal.Index.publish_schema(sample_type, *, version=None, description=None)\nPublish a schema for a sample type to Redis.\nArgs: sample_type: The PackableSample subclass to publish. version: Semantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists. description: Optional human-readable description. 
If None, uses the class docstring.\nReturns: Schema reference string: ‘atdata://local/sampleSchema/{name}@version’.\nRaises: ValueError: If sample_type is not a dataclass. TypeError: If a field type is not supported." 1275 + "text": "local.Index(\n redis=None,\n data_store=None,\n auto_stubs=False,\n stub_dir=None,\n **kwargs,\n)\nRedis-backed index for tracking datasets in a repository.\nImplements the AbstractIndex protocol. Maintains a registry of LocalDatasetEntry objects in Redis, allowing enumeration and lookup of stored datasets.\nWhen initialized with a data_store, insert_dataset() will write dataset shards to storage before indexing. Without a data_store, insert_dataset() only indexes existing URLs.\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n_redis\n\nRedis connection for index storage.\n\n\n_data_store\n\nOptional AbstractDataStore for writing dataset shards.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nadd_entry\nAdd a dataset to the index.\n\n\nclear_stubs\nRemove all auto-generated stub files.\n\n\ndecode_schema\nReconstruct a Python PackableSample type from a stored schema.\n\n\ndecode_schema_as\nDecode a schema with explicit type hint for IDE support.\n\n\nget_dataset\nGet a dataset entry by name (AbstractIndex protocol).\n\n\nget_entry\nGet an entry by its CID.\n\n\nget_entry_by_name\nGet an entry by its human-readable name.\n\n\nget_import_path\nGet the import path for a schema’s generated module.\n\n\nget_schema\nGet a schema record by reference (AbstractIndex protocol).\n\n\nget_schema_record\nGet a schema record as LocalSchemaRecord object.\n\n\ninsert_dataset\nInsert a dataset into the index (AbstractIndex protocol).\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_entries\nGet all index entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\nload_schema\nLoad a schema and make it available in the types 
namespace.\n\n\npublish_schema\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nlocal.Index.add_entry(ds, *, name, schema_ref=None, metadata=None)\nAdd a dataset to the index.\nCreates a LocalDatasetEntry for the dataset and persists it to Redis.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe dataset to add to the index.\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nstr | None\nOptional schema reference. If None, generates from sample type.\nNone\n\n\nmetadata\ndict | None\nOptional metadata dictionary. If None, uses ds._metadata if available.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nThe created LocalDatasetEntry object.\n\n\n\n\n\n\n\nlocal.Index.clear_stubs()\nRemove all auto-generated stub files.\nOnly works if auto_stubs was enabled when creating the Index.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nint\nNumber of stub files removed, or 0 if auto_stubs is disabled.\n\n\n\n\n\n\n\nlocal.Index.decode_schema(ref)\nReconstruct a Python PackableSample type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a PackableSample subclass matching the schema definition.\nIf auto_stubs is enabled, a Python module will be generated and the class will be imported from it, providing full IDE autocomplete support. 
The returned class has proper type information that IDEs can understand.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nA PackableSample subclass - either imported from a generated module\n\n\n\nType[Packable]\n(if auto_stubs is enabled) or dynamically created.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n\nlocal.Index.decode_schema_as(ref, type_hint)\nDecode a schema with explicit type hint for IDE support.\nThis is a typed wrapper around decode_schema() that preserves the type information for IDE autocomplete. Use this when you have a stub file for the schema and want full IDE support.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\ntype_hint\ntype[T]\nThe stub type to use for type hints. Import this from the generated stub file.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntype[T]\nThe decoded type, cast to match the type_hint for IDE support.\n\n\n\n\n\n\n::\n>>> # After enabling auto_stubs and configuring IDE extraPaths:\n>>> from local.MySample_1_0_0 import MySample\n>>>\n>>> # This gives full IDE autocomplete:\n>>> DecodedType = index.decode_schema_as(ref, MySample)\n>>> sample = DecodedType(text=\"hello\", value=42) # IDE knows signature!\n\n\n\nThe type_hint is only used for static type checking - at runtime, the actual decoded type from the schema is returned. 
Ensure the stub matches the schema to avoid runtime surprises.\n\n\n\n\nlocal.Index.get_dataset(ref)\nGet a dataset entry by name (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nDataset name.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf dataset not found.\n\n\n\n\n\n\n\nlocal.Index.get_entry(cid)\nGet an entry by its CID.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncid\nstr\nContent identifier of the entry.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry for the given CID.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf entry not found.\n\n\n\n\n\n\n\nlocal.Index.get_entry_by_name(name)\nGet an entry by its human-readable name.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nname\nstr\nHuman-readable name of the entry.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry with the given name.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf no entry with that name exists.\n\n\n\n\n\n\n\nlocal.Index.get_import_path(ref)\nGet the import path for a schema’s generated module.\nWhen auto_stubs is enabled, this returns the import path that can be used to import the schema type with full IDE support.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr | None\nImport path like “local.MySample_1_0_0”, or None if auto_stubs\n\n\n\nstr | None\nis disabled.\n\n\n\n\n\n\n::\n>>> index = LocalIndex(auto_stubs=True)\n>>> ref = index.publish_schema(MySample, version=\"1.0.0\")\n>>> index.load_schema(ref)\n>>> print(index.get_import_path(ref))\nlocal.MySample_1_0_0\n>>> # Then in your code:\n>>> # from local.MySample_1_0_0 import 
MySample\n\n\n\n\nlocal.Index.get_schema(ref)\nGet a schema record by reference (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string. Supports both new format (atdata://local/sampleSchema/{name}@version) and legacy format (local://schemas/{module.Class}@version).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record as a dictionary with keys ‘name’, ‘version’,\n\n\n\ndict\n‘fields’, ‘$ref’, etc.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf reference format is invalid.\n\n\n\n\n\n\n\nlocal.Index.get_schema_record(ref)\nGet a schema record as LocalSchemaRecord object.\nUse this when you need the full LocalSchemaRecord with typed properties. For Protocol-compliant dict access, use get_schema() instead.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalSchemaRecord\nLocalSchemaRecord with schema details.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf reference format is invalid.\n\n\n\n\n\n\n\nlocal.Index.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index (AbstractIndex protocol).\nIf a data_store was provided at initialization, writes dataset shards to storage first, then indexes the new URLs. 
Otherwise, indexes the dataset’s existing URL.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to register.\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nstr | None\nOptional schema reference.\nNone\n\n\n**kwargs\n\nAdditional options: - metadata: Optional metadata dict - prefix: Storage prefix (default: dataset name) - cache_local: If True, cache writes locally first\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\nlocal.Index.list_datasets()\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[LocalDatasetEntry]\nList of IndexEntry for each dataset.\n\n\n\n\n\n\n\nlocal.Index.list_entries()\nGet all index entries as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[LocalDatasetEntry]\nList of all LocalDatasetEntry objects in the index.\n\n\n\n\n\n\n\nlocal.Index.list_schemas()\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\nlocal.Index.load_schema(ref)\nLoad a schema and make it available in the types namespace.\nThis method decodes the schema, optionally generates a Python module for IDE support (if auto_stubs is enabled), and registers the type in the :attr:types namespace for easy access.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nThe decoded PackableSample subclass. 
Also available via\n\n\n\nType[Packable]\nindex.types.<ClassName> after this call.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n::\n>>> # Load and use immediately\n>>> MyType = index.load_schema(\"atdata://local/sampleSchema/MySample@1.0.0\")\n>>> sample = MyType(name=\"hello\", value=42)\n>>>\n>>> # Or access later via namespace\n>>> index.load_schema(\"atdata://local/sampleSchema/OtherType@1.0.0\")\n>>> other = index.types.OtherType(data=\"test\")\n\n\n\n\nlocal.Index.publish_schema(sample_type, *, version=None, description=None)\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[Packable]\nThe PackableSample subclass to publish.\nrequired\n\n\nversion\nstr | None\nSemantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists.\nNone\n\n\ndescription\nstr | None\nOptional human-readable description. If None, uses the class docstring.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSchema reference string: ‘atdata://local/sampleSchema/{name}@version’.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf sample_type is not a dataclass.\n\n\n\nTypeError\nIf a field type is not supported." 
1199 1276 }, 1200 1277 { 1201 1278 "objectID": "api/local.Index.html#attributes", 1202 1279 "href": "api/local.Index.html#attributes", 1203 1280 "title": "local.Index", 1204 1281 "section": "", 1205 - "text": "Name\nDescription\n\n\n\n\nall_entries\nGet all index entries as a list (deprecated, use list_entries()).\n\n\ndata_store\nThe data store for writing shards, or None if index-only.\n\n\ndatasets\nLazily iterate over all dataset entries (AbstractIndex protocol).\n\n\nentries\nIterate over all index entries.\n\n\nschemas\nIterate over all schema records in this index.\n\n\nstub_dir\nDirectory where stub files are written, or None if auto-stubs disabled.\n\n\ntypes\nNamespace for accessing loaded schema types." 1282 + "text": "Name\nType\nDescription\n\n\n\n\n_redis\n\nRedis connection for index storage.\n\n\n_data_store\n\nOptional AbstractDataStore for writing dataset shards." 1206 1283 }, 1207 1284 { 1208 1285 "objectID": "api/local.Index.html#methods", 1209 1286 "href": "api/local.Index.html#methods", 1210 1287 "title": "local.Index", 1211 1288 "section": "", 1212 - "text": "Name\nDescription\n\n\n\n\nadd_entry\nAdd a dataset to the index.\n\n\nclear_stubs\nRemove all auto-generated stub files.\n\n\ndecode_schema\nReconstruct a Python PackableSample type from a stored schema.\n\n\ndecode_schema_as\nDecode a schema with explicit type hint for IDE support.\n\n\nget_dataset\nGet a dataset entry by name (AbstractIndex protocol).\n\n\nget_entry\nGet an entry by its CID.\n\n\nget_entry_by_name\nGet an entry by its human-readable name.\n\n\nget_import_path\nGet the import path for a schema’s generated module.\n\n\nget_schema\nGet a schema record by reference (AbstractIndex protocol).\n\n\nget_schema_record\nGet a schema record as LocalSchemaRecord object.\n\n\ninsert_dataset\nInsert a dataset into the index (AbstractIndex protocol).\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_entries\nGet all index entries 
as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\nload_schema\nLoad a schema and make it available in the types namespace.\n\n\npublish_schema\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nlocal.Index.add_entry(ds, *, name, schema_ref=None, metadata=None)\nAdd a dataset to the index.\nCreates a LocalDatasetEntry for the dataset and persists it to Redis.\nArgs: ds: The dataset to add to the index. name: Human-readable name for the dataset. schema_ref: Optional schema reference. If None, generates from sample type. metadata: Optional metadata dictionary. If None, uses ds._metadata if available.\nReturns: The created LocalDatasetEntry object.\n\n\n\nlocal.Index.clear_stubs()\nRemove all auto-generated stub files.\nOnly works if auto_stubs was enabled when creating the Index.\nReturns: Number of stub files removed, or 0 if auto_stubs is disabled.\n\n\n\nlocal.Index.decode_schema(ref)\nReconstruct a Python PackableSample type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a PackableSample subclass matching the schema definition.\nIf auto_stubs is enabled, a Python module will be generated and the class will be imported from it, providing full IDE autocomplete support. The returned class has proper type information that IDEs can understand.\nArgs: ref: Schema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nReturns: A PackableSample subclass - either imported from a generated module (if auto_stubs is enabled) or dynamically created.\nRaises: KeyError: If schema not found. ValueError: If schema cannot be decoded.\n\n\n\nlocal.Index.decode_schema_as(ref, type_hint)\nDecode a schema with explicit type hint for IDE support.\nThis is a typed wrapper around decode_schema() that preserves the type information for IDE autocomplete. 
Use this when you have a stub file for the schema and want full IDE support.\nArgs: ref: Schema reference string. type_hint: The stub type to use for type hints. Import this from the generated stub file.\nReturns: The decoded type, cast to match the type_hint for IDE support.\nExample: >>> # After enabling auto_stubs and configuring IDE extraPaths: >>> from local.MySample_1_0_0 import MySample >>> >>> # This gives full IDE autocomplete: >>> DecodedType = index.decode_schema_as(ref, MySample) >>> sample = DecodedType(text=“hello”, value=42) # IDE knows signature!\nNote: The type_hint is only used for static type checking - at runtime, the actual decoded type from the schema is returned. Ensure the stub matches the schema to avoid runtime surprises.\n\n\n\nlocal.Index.get_dataset(ref)\nGet a dataset entry by name (AbstractIndex protocol).\nArgs: ref: Dataset name.\nReturns: IndexEntry for the dataset.\nRaises: KeyError: If dataset not found.\n\n\n\nlocal.Index.get_entry(cid)\nGet an entry by its CID.\nArgs: cid: Content identifier of the entry.\nReturns: LocalDatasetEntry for the given CID.\nRaises: KeyError: If entry not found.\n\n\n\nlocal.Index.get_entry_by_name(name)\nGet an entry by its human-readable name.\nArgs: name: Human-readable name of the entry.\nReturns: LocalDatasetEntry with the given name.\nRaises: KeyError: If no entry with that name exists.\n\n\n\nlocal.Index.get_import_path(ref)\nGet the import path for a schema’s generated module.\nWhen auto_stubs is enabled, this returns the import path that can be used to import the schema type with full IDE support.\nArgs: ref: Schema reference string.\nReturns: Import path like “local.MySample_1_0_0”, or None if auto_stubs is disabled.\nExample: >>> index = LocalIndex(auto_stubs=True) >>> ref = index.publish_schema(MySample, version=“1.0.0”) >>> index.load_schema(ref) >>> print(index.get_import_path(ref)) local.MySample_1_0_0 >>> # Then in your code: >>> # from local.MySample_1_0_0 import 
MySample\n\n\n\nlocal.Index.get_schema(ref)\nGet a schema record by reference (AbstractIndex protocol).\nArgs: ref: Schema reference string. Supports both new format (atdata://local/sampleSchema/{name}@version) and legacy format (local://schemas/{module.Class}@version).\nReturns: Schema record as a dictionary with keys ‘name’, ‘version’, ‘fields’, ‘$ref’, etc.\nRaises: KeyError: If schema not found. ValueError: If reference format is invalid.\n\n\n\nlocal.Index.get_schema_record(ref)\nGet a schema record as LocalSchemaRecord object.\nUse this when you need the full LocalSchemaRecord with typed properties. For Protocol-compliant dict access, use get_schema() instead.\nArgs: ref: Schema reference string.\nReturns: LocalSchemaRecord with schema details.\nRaises: KeyError: If schema not found. ValueError: If reference format is invalid.\n\n\n\nlocal.Index.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index (AbstractIndex protocol).\nIf a data_store was provided at initialization, writes dataset shards to storage first, then indexes the new URLs. Otherwise, indexes the dataset’s existing URL.\nArgs: ds: The Dataset to register. name: Human-readable name for the dataset. schema_ref: Optional schema reference. 
**kwargs: Additional options: - metadata: Optional metadata dict - prefix: Storage prefix (default: dataset name) - cache_local: If True, cache writes locally first\nReturns: IndexEntry for the inserted dataset.\n\n\n\nlocal.Index.list_datasets()\nGet all dataset entries as a materialized list (AbstractIndex protocol).\nReturns: List of IndexEntry for each dataset.\n\n\n\nlocal.Index.list_entries()\nGet all index entries as a materialized list.\nReturns: List of all LocalDatasetEntry objects in the index.\n\n\n\nlocal.Index.list_schemas()\nGet all schema records as a materialized list (AbstractIndex protocol).\nReturns: List of schema records as dictionaries.\n\n\n\nlocal.Index.load_schema(ref)\nLoad a schema and make it available in the types namespace.\nThis method decodes the schema, optionally generates a Python module for IDE support (if auto_stubs is enabled), and registers the type in the :attr:types namespace for easy access.\nArgs: ref: Schema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nReturns: The decoded PackableSample subclass. Also available via index.types.<ClassName> after this call.\nRaises: KeyError: If schema not found. ValueError: If schema cannot be decoded.\nExample: >>> # Load and use immediately >>> MyType = index.load_schema(“atdata://local/sampleSchema/MySample@1.0.0”) >>> sample = MyType(name=“hello”, value=42) >>> >>> # Or access later via namespace >>> index.load_schema(“atdata://local/sampleSchema/OtherType@1.0.0”) >>> other = index.types.OtherType(data=“test”)\n\n\n\nlocal.Index.publish_schema(sample_type, *, version=None, description=None)\nPublish a schema for a sample type to Redis.\nArgs: sample_type: The PackableSample subclass to publish. version: Semantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists. description: Optional human-readable description. 
If None, uses the class docstring.\nReturns: Schema reference string: ‘atdata://local/sampleSchema/{name}@version’.\nRaises: ValueError: If sample_type is not a dataclass. TypeError: If a field type is not supported." 1289 + "text": "Name\nDescription\n\n\n\n\nadd_entry\nAdd a dataset to the index.\n\n\nclear_stubs\nRemove all auto-generated stub files.\n\n\ndecode_schema\nReconstruct a Python PackableSample type from a stored schema.\n\n\ndecode_schema_as\nDecode a schema with explicit type hint for IDE support.\n\n\nget_dataset\nGet a dataset entry by name (AbstractIndex protocol).\n\n\nget_entry\nGet an entry by its CID.\n\n\nget_entry_by_name\nGet an entry by its human-readable name.\n\n\nget_import_path\nGet the import path for a schema’s generated module.\n\n\nget_schema\nGet a schema record by reference (AbstractIndex protocol).\n\n\nget_schema_record\nGet a schema record as LocalSchemaRecord object.\n\n\ninsert_dataset\nInsert a dataset into the index (AbstractIndex protocol).\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_entries\nGet all index entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\nload_schema\nLoad a schema and make it available in the types namespace.\n\n\npublish_schema\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nlocal.Index.add_entry(ds, *, name, schema_ref=None, metadata=None)\nAdd a dataset to the index.\nCreates a LocalDatasetEntry for the dataset and persists it to Redis.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe dataset to add to the index.\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nstr | None\nOptional schema reference. If None, generates from sample type.\nNone\n\n\nmetadata\ndict | None\nOptional metadata dictionary. 
If None, uses ds._metadata if available.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nThe created LocalDatasetEntry object.\n\n\n\n\n\n\n\nlocal.Index.clear_stubs()\nRemove all auto-generated stub files.\nOnly works if auto_stubs was enabled when creating the Index.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nint\nNumber of stub files removed, or 0 if auto_stubs is disabled.\n\n\n\n\n\n\n\nlocal.Index.decode_schema(ref)\nReconstruct a Python PackableSample type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a PackableSample subclass matching the schema definition.\nIf auto_stubs is enabled, a Python module will be generated and the class will be imported from it, providing full IDE autocomplete support. The returned class has proper type information that IDEs can understand.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nA PackableSample subclass - either imported from a generated module\n\n\n\nType[Packable]\n(if auto_stubs is enabled) or dynamically created.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n\nlocal.Index.decode_schema_as(ref, type_hint)\nDecode a schema with explicit type hint for IDE support.\nThis is a typed wrapper around decode_schema() that preserves the type information for IDE autocomplete. Use this when you have a stub file for the schema and want full IDE support.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\ntype_hint\ntype[T]\nThe stub type to use for type hints. 
Import this from the generated stub file.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntype[T]\nThe decoded type, cast to match the type_hint for IDE support.\n\n\n\n\n\n\n::\n>>> # After enabling auto_stubs and configuring IDE extraPaths:\n>>> from local.MySample_1_0_0 import MySample\n>>>\n>>> # This gives full IDE autocomplete:\n>>> DecodedType = index.decode_schema_as(ref, MySample)\n>>> sample = DecodedType(text=\"hello\", value=42) # IDE knows signature!\n\n\n\nThe type_hint is only used for static type checking - at runtime, the actual decoded type from the schema is returned. Ensure the stub matches the schema to avoid runtime surprises.\n\n\n\n\nlocal.Index.get_dataset(ref)\nGet a dataset entry by name (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nDataset name.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf dataset not found.\n\n\n\n\n\n\n\nlocal.Index.get_entry(cid)\nGet an entry by its CID.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncid\nstr\nContent identifier of the entry.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry for the given CID.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf entry not found.\n\n\n\n\n\n\n\nlocal.Index.get_entry_by_name(name)\nGet an entry by its human-readable name.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nname\nstr\nHuman-readable name of the entry.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry with the given name.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf no entry with that name exists.\n\n\n\n\n\n\n\nlocal.Index.get_import_path(ref)\nGet the import path for a schema’s generated module.\nWhen auto_stubs is enabled, this returns the import path that can be used to 
import the schema type with full IDE support.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr | None\nImport path like “local.MySample_1_0_0”, or None if auto_stubs\n\n\n\nstr | None\nis disabled.\n\n\n\n\n\n\n::\n>>> index = LocalIndex(auto_stubs=True)\n>>> ref = index.publish_schema(MySample, version=\"1.0.0\")\n>>> index.load_schema(ref)\n>>> print(index.get_import_path(ref))\nlocal.MySample_1_0_0\n>>> # Then in your code:\n>>> # from local.MySample_1_0_0 import MySample\n\n\n\n\nlocal.Index.get_schema(ref)\nGet a schema record by reference (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string. Supports both new format (atdata://local/sampleSchema/{name}@version) and legacy format (local://schemas/{module.Class}@version).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record as a dictionary with keys ‘name’, ‘version’,\n\n\n\ndict\n‘fields’, ‘$ref’, etc.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf reference format is invalid.\n\n\n\n\n\n\n\nlocal.Index.get_schema_record(ref)\nGet a schema record as LocalSchemaRecord object.\nUse this when you need the full LocalSchemaRecord with typed properties. 
For Protocol-compliant dict access, use get_schema() instead.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalSchemaRecord\nLocalSchemaRecord with schema details.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf reference format is invalid.\n\n\n\n\n\n\n\nlocal.Index.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index (AbstractIndex protocol).\nIf a data_store was provided at initialization, writes dataset shards to storage first, then indexes the new URLs. Otherwise, indexes the dataset’s existing URL.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to register.\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nstr | None\nOptional schema reference.\nNone\n\n\n**kwargs\n\nAdditional options: - metadata: Optional metadata dict - prefix: Storage prefix (default: dataset name) - cache_local: If True, cache writes locally first\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\nlocal.Index.list_datasets()\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[LocalDatasetEntry]\nList of IndexEntry for each dataset.\n\n\n\n\n\n\n\nlocal.Index.list_entries()\nGet all index entries as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[LocalDatasetEntry]\nList of all LocalDatasetEntry objects in the index.\n\n\n\n\n\n\n\nlocal.Index.list_schemas()\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\nlocal.Index.load_schema(ref)\nLoad a schema and make it available in the types namespace.\nThis method decodes 
the schema, optionally generates a Python module for IDE support (if auto_stubs is enabled), and registers the type in the :attr:types namespace for easy access.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nThe decoded PackableSample subclass. Also available via\n\n\n\nType[Packable]\nindex.types.<ClassName> after this call.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n::\n>>> # Load and use immediately\n>>> MyType = index.load_schema(\"atdata://local/sampleSchema/MySample@1.0.0\")\n>>> sample = MyType(name=\"hello\", value=42)\n>>>\n>>> # Or access later via namespace\n>>> index.load_schema(\"atdata://local/sampleSchema/OtherType@1.0.0\")\n>>> other = index.types.OtherType(data=\"test\")\n\n\n\n\nlocal.Index.publish_schema(sample_type, *, version=None, description=None)\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[Packable]\nThe PackableSample subclass to publish.\nrequired\n\n\nversion\nstr | None\nSemantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists.\nNone\n\n\ndescription\nstr | None\nOptional human-readable description. If None, uses the class docstring.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSchema reference string: ‘atdata://local/sampleSchema/{name}@version’.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf sample_type is not a dataclass.\n\n\n\nTypeError\nIf a field type is not supported." 
1213 1290 }, 1214 1291 { 1215 1292 "objectID": "api/Dataset.html", 1216 1293 "href": "api/Dataset.html", 1217 1294 "title": "Dataset", 1218 1295 "section": "", 1219 - "text": "Dataset(source=None, metadata_url=None, *, url=None)\nA typed dataset built on WebDataset with lens transformations.\nThis class wraps WebDataset tar archives and provides type-safe iteration over samples of a specific PackableSample type. Samples are stored as msgpack-serialized data within WebDataset shards.\nThe dataset supports: - Ordered and shuffled iteration - Automatic batching with SampleBatch - Type transformations via the lens system (as_type()) - Export to parquet format\nType Parameters: ST: The sample type for this dataset, must derive from PackableSample.\nAttributes: url: WebDataset brace-notation URL for the tar file(s).\nExample: >>> ds = DatasetMyData >>> for sample in ds.ordered(batch_size=32): … # sample is SampleBatch[MyData] with batch_size samples … embeddings = sample.embeddings # shape: (32, …) … >>> # Transform to a different view >>> ds_view = ds.as_type(MyDataView)\nNote: This class uses Python’s __orig_class__ mechanism to extract the type parameter at runtime. 
Instances must be created using the subscripted syntax Dataset[MyType](url) rather than calling the constructor directly with an unsubscripted class.\n\n\n\n\n\nName\nDescription\n\n\n\n\nbatch_type\nThe type of batches produced by this dataset.\n\n\nmetadata\nFetch and cache metadata from metadata_url.\n\n\nmetadata_url\nOptional URL to msgpack-encoded metadata for this dataset.\n\n\nsample_type\nThe type of each returned sample from this dataset’s iterator.\n\n\nshard_list\nList of individual dataset shards (deprecated, use list_shards()).\n\n\nsource\nThe underlying data source for this dataset.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nas_type\nView this dataset through a different sample type using a registered lens.\n\n\nlist_shards\nGet list of individual dataset shards.\n\n\nordered\nIterate over the dataset in order\n\n\nshuffled\nIterate over the dataset in random order.\n\n\nto_parquet\nExport dataset contents to parquet format.\n\n\nwrap\nWrap a raw msgpack sample into the appropriate dataset-specific type.\n\n\nwrap_batch\nWrap a batch of raw msgpack samples into a typed SampleBatch.\n\n\n\n\n\nDataset.as_type(other)\nView this dataset through a different sample type using a registered lens.\nArgs: other: The target sample type to transform into. Must be a type derived from PackableSample.\nReturns: A new Dataset instance that yields samples of type other by applying the appropriate lens transformation from the global LensNetwork registry.\nRaises: ValueError: If no registered lens exists between the current sample type and the target type.\n\n\n\nDataset.list_shards()\nGet list of individual dataset shards.\nReturns: A full (non-lazy) list of the individual tar files within the source WebDataset.\n\n\n\nDataset.ordered(batch_size=None)\nIterate over the dataset in order\nArgs: batch_size (:obj:int, optional): The size of iterated batches. Default: None (unbatched). 
If None, iterates over one sample at a time with no batch dimension.\nReturns: :obj:webdataset.DataPipeline A data pipeline that iterates over the dataset in its original sample order\n\n\n\nDataset.shuffled(buffer_shards=100, buffer_samples=10000, batch_size=None)\nIterate over the dataset in random order.\nArgs: buffer_shards: Number of shards to buffer for shuffling at the shard level. Larger values increase randomness but use more memory. Default: 100. buffer_samples: Number of samples to buffer for shuffling within shards. Larger values increase randomness but use more memory. Default: 10,000. batch_size: The size of iterated batches. Default: None (unbatched). If None, iterates over one sample at a time with no batch dimension.\nReturns: A WebDataset data pipeline that iterates over the dataset in randomized order. If batch_size is not None, yields SampleBatch[ST] instances; otherwise yields individual ST samples.\n\n\n\nDataset.to_parquet(path, sample_map=None, maxcount=None, **kwargs)\nExport dataset contents to parquet format.\nConverts all samples to a pandas DataFrame and saves to parquet file(s). Useful for interoperability with data analysis tools.\nArgs: path: Output path for the parquet file. If maxcount is specified, files are named {stem}-{segment:06d}.parquet. sample_map: Optional function to convert samples to dictionaries. Defaults to dataclasses.asdict. maxcount: If specified, split output into multiple files with at most this many samples each. Recommended for large datasets. **kwargs: Additional arguments passed to pandas.DataFrame.to_parquet(). Common options include compression, index, engine.\nWarning: Memory Usage: When maxcount=None (default), this method loads the entire dataset into memory as a pandas DataFrame before writing. 
For large datasets, this can cause memory exhaustion.\nFor datasets larger than available RAM, always specify ``maxcount``::\n\n # Safe for large datasets - processes in chunks\n ds.to_parquet(\"output.parquet\", maxcount=10000)\n\nThis creates multiple parquet files: ``output-000000.parquet``,\n``output-000001.parquet``, etc.\nExample: >>> ds = DatasetMySample >>> # Small dataset - load all at once >>> ds.to_parquet(“output.parquet”) >>> >>> # Large dataset - process in chunks >>> ds.to_parquet(“output.parquet”, maxcount=50000)\n\n\n\nDataset.wrap(sample)\nWrap a raw msgpack sample into the appropriate dataset-specific type.\nArgs: sample: A dictionary containing at minimum a 'msgpack' key with serialized sample bytes.\nReturns: A deserialized sample of type ST, optionally transformed through a lens if as_type() was called.\n\n\n\nDataset.wrap_batch(batch)\nWrap a batch of raw msgpack samples into a typed SampleBatch.\nArgs: batch: A dictionary containing a 'msgpack' key with a list of serialized sample bytes.\nReturns: A SampleBatch[ST] containing deserialized samples, optionally transformed through a lens if as_type() was called.\nNote: This implementation deserializes samples one at a time, then aggregates them into a batch." 1296 + "text": "Dataset(source=None, metadata_url=None, *, url=None)\nA typed dataset built on WebDataset with lens transformations.\nThis class wraps WebDataset tar archives and provides type-safe iteration over samples of a specific PackableSample type. 
Samples are stored as msgpack-serialized data within WebDataset shards.\nThe dataset supports: - Ordered and shuffled iteration - Automatic batching with SampleBatch - Type transformations via the lens system (as_type()) - Export to parquet format\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nST\n\nThe sample type for this dataset, must derive from PackableSample.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\nurl\n\nWebDataset brace-notation URL for the tar file(s).\n\n\n\n\n\n\n::\n>>> ds = Dataset[MyData](\"path/to/data-{000000..000009}.tar\")\n>>> for sample in ds.ordered(batch_size=32):\n... # sample is SampleBatch[MyData] with batch_size samples\n... embeddings = sample.embeddings # shape: (32, ...)\n...\n>>> # Transform to a different view\n>>> ds_view = ds.as_type(MyDataView)\n\n\n\nThis class uses Python’s __orig_class__ mechanism to extract the type parameter at runtime. Instances must be created using the subscripted syntax Dataset[MyType](url) rather than calling the constructor directly with an unsubscripted class.\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nas_type\nView this dataset through a different sample type using a registered lens.\n\n\nlist_shards\nGet list of individual dataset shards.\n\n\nordered\nIterate over the dataset in order\n\n\nshuffled\nIterate over the dataset in random order.\n\n\nto_parquet\nExport dataset contents to parquet format.\n\n\nwrap\nWrap a raw msgpack sample into the appropriate dataset-specific type.\n\n\nwrap_batch\nWrap a batch of raw msgpack samples into a typed SampleBatch.\n\n\n\n\n\nDataset.as_type(other)\nView this dataset through a different sample type using a registered lens.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nother\nType[RT]\nThe target sample type to transform into. 
Must be a type derived from PackableSample.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nDataset[RT]\nA new Dataset instance that yields samples of type other\n\n\n\nDataset[RT]\nby applying the appropriate lens transformation from the global\n\n\n\nDataset[RT]\nLensNetwork registry.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf no registered lens exists between the current sample type and the target type.\n\n\n\n\n\n\n\nDataset.list_shards()\nGet list of individual dataset shards.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nA full (non-lazy) list of the individual tar files within the\n\n\n\nlist[str]\nsource WebDataset.\n\n\n\n\n\n\n\nDataset.ordered(batch_size=None)\nIterate over the dataset in order\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbatch_size (\n\nobj:int, optional): The size of iterated batches. Default: None (unbatched). If None, iterates over one sample at a time with no batch dimension.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIterable[ST]\nobj:webdataset.DataPipeline A data pipeline that iterates over\n\n\n\nIterable[ST]\nthe dataset in its original sample order\n\n\n\n\n\n\n\nDataset.shuffled(buffer_shards=100, buffer_samples=10000, batch_size=None)\nIterate over the dataset in random order.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbuffer_shards\nint\nNumber of shards to buffer for shuffling at the shard level. Larger values increase randomness but use more memory. Default: 100.\n100\n\n\nbuffer_samples\nint\nNumber of samples to buffer for shuffling within shards. Larger values increase randomness but use more memory. Default: 10,000.\n10000\n\n\nbatch_size\nint | None\nThe size of iterated batches. Default: None (unbatched). 
If None, iterates over one sample at a time with no batch dimension.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIterable[ST]\nA WebDataset data pipeline that iterates over the dataset in\n\n\n\nIterable[ST]\nrandomized order. If batch_size is not None, yields\n\n\n\nIterable[ST]\nSampleBatch[ST] instances; otherwise yields individual ST\n\n\n\nIterable[ST]\nsamples.\n\n\n\n\n\n\n\nDataset.to_parquet(path, sample_map=None, maxcount=None, **kwargs)\nExport dataset contents to parquet format.\nConverts all samples to a pandas DataFrame and saves to parquet file(s). Useful for interoperability with data analysis tools.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\npath\nPathlike\nOutput path for the parquet file. If maxcount is specified, files are named {stem}-{segment:06d}.parquet.\nrequired\n\n\nsample_map\nOptional[SampleExportMap]\nOptional function to convert samples to dictionaries. Defaults to dataclasses.asdict.\nNone\n\n\nmaxcount\nOptional[int]\nIf specified, split output into multiple files with at most this many samples each. Recommended for large datasets.\nNone\n\n\n**kwargs\n\nAdditional arguments passed to pandas.DataFrame.to_parquet(). Common options include compression, index, engine.\n{}\n\n\n\n\n\n\nMemory Usage: When maxcount=None (default), this method loads the entire dataset into memory as a pandas DataFrame before writing. 
For large datasets, this can cause memory exhaustion.\nFor datasets larger than available RAM, always specify maxcount::\n# Safe for large datasets - processes in chunks\nds.to_parquet(\"output.parquet\", maxcount=10000)\nThis creates multiple parquet files: output-000000.parquet, output-000001.parquet, etc.\n\n\n\n::\n>>> ds = Dataset[MySample](\"data.tar\")\n>>> # Small dataset - load all at once\n>>> ds.to_parquet(\"output.parquet\")\n>>>\n>>> # Large dataset - process in chunks\n>>> ds.to_parquet(\"output.parquet\", maxcount=50000)\n\n\n\n\nDataset.wrap(sample)\nWrap a raw msgpack sample into the appropriate dataset-specific type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample\nWDSRawSample\nA dictionary containing at minimum a 'msgpack' key with serialized sample bytes.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nST\nA deserialized sample of type ST, optionally transformed through\n\n\n\nST\na lens if as_type() was called.\n\n\n\n\n\n\n\nDataset.wrap_batch(batch)\nWrap a batch of raw msgpack samples into a typed SampleBatch.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbatch\nWDSRawBatch\nA dictionary containing a 'msgpack' key with a list of serialized sample bytes.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSampleBatch[ST]\nA SampleBatch[ST] containing deserialized samples, optionally\n\n\n\nSampleBatch[ST]\ntransformed through a lens if as_type() was called.\n\n\n\n\n\n\nThis implementation deserializes samples one at a time, then aggregates them into a batch." 
1297 + }, 1298 + { 1299 + "objectID": "api/Dataset.html#parameters", 1300 + "href": "api/Dataset.html#parameters", 1301 + "title": "Dataset", 1302 + "section": "", 1303 + "text": "Name\nType\nDescription\nDefault\n\n\n\n\nST\n\nThe sample type for this dataset, must derive from PackableSample.\nrequired" 1220 1304 }, 1221 1305 { 1222 1306 "objectID": "api/Dataset.html#attributes", 1223 1307 "href": "api/Dataset.html#attributes", 1224 1308 "title": "Dataset", 1225 1309 "section": "", 1226 - "text": "Name\nDescription\n\n\n\n\nbatch_type\nThe type of batches produced by this dataset.\n\n\nmetadata\nFetch and cache metadata from metadata_url.\n\n\nmetadata_url\nOptional URL to msgpack-encoded metadata for this dataset.\n\n\nsample_type\nThe type of each returned sample from this dataset’s iterator.\n\n\nshard_list\nList of individual dataset shards (deprecated, use list_shards()).\n\n\nsource\nThe underlying data source for this dataset." 1310 + "text": "Name\nType\nDescription\n\n\n\n\nurl\n\nWebDataset brace-notation URL for the tar file(s)." 1311 + }, 1312 + { 1313 + "objectID": "api/Dataset.html#example", 1314 + "href": "api/Dataset.html#example", 1315 + "title": "Dataset", 1316 + "section": "", 1317 + "text": "::\n>>> ds = Dataset[MyData](\"path/to/data-{000000..000009}.tar\")\n>>> for sample in ds.ordered(batch_size=32):\n... # sample is SampleBatch[MyData] with batch_size samples\n... embeddings = sample.embeddings # shape: (32, ...)\n...\n>>> # Transform to a different view\n>>> ds_view = ds.as_type(MyDataView)" 1318 + }, 1319 + { 1320 + "objectID": "api/Dataset.html#note", 1321 + "href": "api/Dataset.html#note", 1322 + "title": "Dataset", 1323 + "section": "", 1324 + "text": "This class uses Python’s __orig_class__ mechanism to extract the type parameter at runtime. Instances must be created using the subscripted syntax Dataset[MyType](url) rather than calling the constructor directly with an unsubscripted class." 
1227 1325 }, 1228 1326 { 1229 1327 "objectID": "api/Dataset.html#methods", 1230 1328 "href": "api/Dataset.html#methods", 1231 1329 "title": "Dataset", 1232 1330 "section": "", 1233 - "text": "Name\nDescription\n\n\n\n\nas_type\nView this dataset through a different sample type using a registered lens.\n\n\nlist_shards\nGet list of individual dataset shards.\n\n\nordered\nIterate over the dataset in order\n\n\nshuffled\nIterate over the dataset in random order.\n\n\nto_parquet\nExport dataset contents to parquet format.\n\n\nwrap\nWrap a raw msgpack sample into the appropriate dataset-specific type.\n\n\nwrap_batch\nWrap a batch of raw msgpack samples into a typed SampleBatch.\n\n\n\n\n\nDataset.as_type(other)\nView this dataset through a different sample type using a registered lens.\nArgs: other: The target sample type to transform into. Must be a type derived from PackableSample.\nReturns: A new Dataset instance that yields samples of type other by applying the appropriate lens transformation from the global LensNetwork registry.\nRaises: ValueError: If no registered lens exists between the current sample type and the target type.\n\n\n\nDataset.list_shards()\nGet list of individual dataset shards.\nReturns: A full (non-lazy) list of the individual tar files within the source WebDataset.\n\n\n\nDataset.ordered(batch_size=None)\nIterate over the dataset in order\nArgs: batch_size (:obj:int, optional): The size of iterated batches. Default: None (unbatched). If None, iterates over one sample at a time with no batch dimension.\nReturns: :obj:webdataset.DataPipeline A data pipeline that iterates over the dataset in its original sample order\n\n\n\nDataset.shuffled(buffer_shards=100, buffer_samples=10000, batch_size=None)\nIterate over the dataset in random order.\nArgs: buffer_shards: Number of shards to buffer for shuffling at the shard level. Larger values increase randomness but use more memory. Default: 100. 
buffer_samples: Number of samples to buffer for shuffling within shards. Larger values increase randomness but use more memory. Default: 10,000. batch_size: The size of iterated batches. Default: None (unbatched). If None, iterates over one sample at a time with no batch dimension.\nReturns: A WebDataset data pipeline that iterates over the dataset in randomized order. If batch_size is not None, yields SampleBatch[ST] instances; otherwise yields individual ST samples.\n\n\n\nDataset.to_parquet(path, sample_map=None, maxcount=None, **kwargs)\nExport dataset contents to parquet format.\nConverts all samples to a pandas DataFrame and saves to parquet file(s). Useful for interoperability with data analysis tools.\nArgs: path: Output path for the parquet file. If maxcount is specified, files are named {stem}-{segment:06d}.parquet. sample_map: Optional function to convert samples to dictionaries. Defaults to dataclasses.asdict. maxcount: If specified, split output into multiple files with at most this many samples each. Recommended for large datasets. **kwargs: Additional arguments passed to pandas.DataFrame.to_parquet(). Common options include compression, index, engine.\nWarning: Memory Usage: When maxcount=None (default), this method loads the entire dataset into memory as a pandas DataFrame before writing. 
For large datasets, this can cause memory exhaustion.\nFor datasets larger than available RAM, always specify ``maxcount``::\n\n # Safe for large datasets - processes in chunks\n ds.to_parquet(\"output.parquet\", maxcount=10000)\n\nThis creates multiple parquet files: ``output-000000.parquet``,\n``output-000001.parquet``, etc.\nExample: >>> ds = DatasetMySample >>> # Small dataset - load all at once >>> ds.to_parquet(“output.parquet”) >>> >>> # Large dataset - process in chunks >>> ds.to_parquet(“output.parquet”, maxcount=50000)\n\n\n\nDataset.wrap(sample)\nWrap a raw msgpack sample into the appropriate dataset-specific type.\nArgs: sample: A dictionary containing at minimum a 'msgpack' key with serialized sample bytes.\nReturns: A deserialized sample of type ST, optionally transformed through a lens if as_type() was called.\n\n\n\nDataset.wrap_batch(batch)\nWrap a batch of raw msgpack samples into a typed SampleBatch.\nArgs: batch: A dictionary containing a 'msgpack' key with a list of serialized sample bytes.\nReturns: A SampleBatch[ST] containing deserialized samples, optionally transformed through a lens if as_type() was called.\nNote: This implementation deserializes samples one at a time, then aggregates them into a batch." 1331 + "text": "Name\nDescription\n\n\n\n\nas_type\nView this dataset through a different sample type using a registered lens.\n\n\nlist_shards\nGet list of individual dataset shards.\n\n\nordered\nIterate over the dataset in order\n\n\nshuffled\nIterate over the dataset in random order.\n\n\nto_parquet\nExport dataset contents to parquet format.\n\n\nwrap\nWrap a raw msgpack sample into the appropriate dataset-specific type.\n\n\nwrap_batch\nWrap a batch of raw msgpack samples into a typed SampleBatch.\n\n\n\n\n\nDataset.as_type(other)\nView this dataset through a different sample type using a registered lens.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nother\nType[RT]\nThe target sample type to transform into. 
Must be a type derived from PackableSample.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nDataset[RT]\nA new Dataset instance that yields samples of type other\n\n\n\nDataset[RT]\nby applying the appropriate lens transformation from the global\n\n\n\nDataset[RT]\nLensNetwork registry.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf no registered lens exists between the current sample type and the target type.\n\n\n\n\n\n\n\nDataset.list_shards()\nGet list of individual dataset shards.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nA full (non-lazy) list of the individual tar files within the\n\n\n\nlist[str]\nsource WebDataset.\n\n\n\n\n\n\n\nDataset.ordered(batch_size=None)\nIterate over the dataset in order\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbatch_size (\n\nobj:int, optional): The size of iterated batches. Default: None (unbatched). If None, iterates over one sample at a time with no batch dimension.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIterable[ST]\nobj:webdataset.DataPipeline A data pipeline that iterates over\n\n\n\nIterable[ST]\nthe dataset in its original sample order\n\n\n\n\n\n\n\nDataset.shuffled(buffer_shards=100, buffer_samples=10000, batch_size=None)\nIterate over the dataset in random order.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbuffer_shards\nint\nNumber of shards to buffer for shuffling at the shard level. Larger values increase randomness but use more memory. Default: 100.\n100\n\n\nbuffer_samples\nint\nNumber of samples to buffer for shuffling within shards. Larger values increase randomness but use more memory. Default: 10,000.\n10000\n\n\nbatch_size\nint | None\nThe size of iterated batches. Default: None (unbatched). 
If None, iterates over one sample at a time with no batch dimension.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIterable[ST]\nA WebDataset data pipeline that iterates over the dataset in\n\n\n\nIterable[ST]\nrandomized order. If batch_size is not None, yields\n\n\n\nIterable[ST]\nSampleBatch[ST] instances; otherwise yields individual ST\n\n\n\nIterable[ST]\nsamples.\n\n\n\n\n\n\n\nDataset.to_parquet(path, sample_map=None, maxcount=None, **kwargs)\nExport dataset contents to parquet format.\nConverts all samples to a pandas DataFrame and saves to parquet file(s). Useful for interoperability with data analysis tools.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\npath\nPathlike\nOutput path for the parquet file. If maxcount is specified, files are named {stem}-{segment:06d}.parquet.\nrequired\n\n\nsample_map\nOptional[SampleExportMap]\nOptional function to convert samples to dictionaries. Defaults to dataclasses.asdict.\nNone\n\n\nmaxcount\nOptional[int]\nIf specified, split output into multiple files with at most this many samples each. Recommended for large datasets.\nNone\n\n\n**kwargs\n\nAdditional arguments passed to pandas.DataFrame.to_parquet(). Common options include compression, index, engine.\n{}\n\n\n\n\n\n\nMemory Usage: When maxcount=None (default), this method loads the entire dataset into memory as a pandas DataFrame before writing. 
For large datasets, this can cause memory exhaustion.\nFor datasets larger than available RAM, always specify maxcount::\n# Safe for large datasets - processes in chunks\nds.to_parquet(\"output.parquet\", maxcount=10000)\nThis creates multiple parquet files: output-000000.parquet, output-000001.parquet, etc.\n\n\n\n::\n>>> ds = Dataset[MySample](\"data.tar\")\n>>> # Small dataset - load all at once\n>>> ds.to_parquet(\"output.parquet\")\n>>>\n>>> # Large dataset - process in chunks\n>>> ds.to_parquet(\"output.parquet\", maxcount=50000)\n\n\n\n\nDataset.wrap(sample)\nWrap a raw msgpack sample into the appropriate dataset-specific type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample\nWDSRawSample\nA dictionary containing at minimum a 'msgpack' key with serialized sample bytes.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nST\nA deserialized sample of type ST, optionally transformed through\n\n\n\nST\na lens if as_type() was called.\n\n\n\n\n\n\n\nDataset.wrap_batch(batch)\nWrap a batch of raw msgpack samples into a typed SampleBatch.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbatch\nWDSRawBatch\nA dictionary containing a 'msgpack' key with a list of serialized sample bytes.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSampleBatch[ST]\nA SampleBatch[ST] containing deserialized samples, optionally\n\n\n\nSampleBatch[ST]\ntransformed through a lens if as_type() was called.\n\n\n\n\n\n\nThis implementation deserializes samples one at a time, then aggregates them into a batch." 
1234 1332 }, 1235 1333 { 1236 1334 "objectID": "api/AbstractDataStore.html", 1237 1335 "href": "api/AbstractDataStore.html", 1238 1336 "title": "AbstractDataStore", 1239 1337 "section": "", 1240 - "text": "AbstractDataStore()\nProtocol for data storage operations.\nThis protocol abstracts over different storage backends for dataset data: - S3DataStore: S3-compatible object storage - PDSBlobStore: ATProto PDS blob storage (future)\nThe separation of index (metadata) from data store (actual files) allows flexible deployment: local index with S3 storage, atmosphere index with S3 storage, or atmosphere index with PDS blobs.\nExample: >>> store = S3DataStore(credentials, bucket=“my-bucket”) >>> urls = store.write_shards(dataset, prefix=“training/v1”) >>> print(urls) [‘s3://my-bucket/training/v1/shard-000000.tar’, …]\n\n\n\n\n\nName\nDescription\n\n\n\n\nread_url\nResolve a storage URL for reading.\n\n\nsupports_streaming\nWhether this store supports streaming reads.\n\n\nwrite_shards\nWrite dataset shards to storage.\n\n\n\n\n\nAbstractDataStore.read_url(url)\nResolve a storage URL for reading.\nSome storage backends may need to transform URLs (e.g., signing S3 URLs or resolving blob references). This method returns a URL that can be used directly with WebDataset.\nArgs: url: Storage URL to resolve.\nReturns: WebDataset-compatible URL for reading.\n\n\n\nAbstractDataStore.supports_streaming()\nWhether this store supports streaming reads.\nReturns: True if the store supports efficient streaming (like S3), False if data must be fully downloaded first.\n\n\n\nAbstractDataStore.write_shards(ds, *, prefix, **kwargs)\nWrite dataset shards to storage.\nArgs: ds: The Dataset to write. prefix: Path prefix for the shards (e.g., ‘datasets/mnist/v1’). **kwargs: Backend-specific options (e.g., maxcount for shard size).\nReturns: List of URLs for the written shards, suitable for use with WebDataset or atdata.Dataset()." 
1338 + "text": "AbstractDataStore()\nProtocol for data storage operations.\nThis protocol abstracts over different storage backends for dataset data: - S3DataStore: S3-compatible object storage - PDSBlobStore: ATProto PDS blob storage (future)\nThe separation of index (metadata) from data store (actual files) allows flexible deployment: local index with S3 storage, atmosphere index with S3 storage, or atmosphere index with PDS blobs.\n\n\n::\n>>> store = S3DataStore(credentials, bucket=\"my-bucket\")\n>>> urls = store.write_shards(dataset, prefix=\"training/v1\")\n>>> print(urls)\n['s3://my-bucket/training/v1/shard-000000.tar', ...]\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nread_url\nResolve a storage URL for reading.\n\n\nsupports_streaming\nWhether this store supports streaming reads.\n\n\nwrite_shards\nWrite dataset shards to storage.\n\n\n\n\n\nAbstractDataStore.read_url(url)\nResolve a storage URL for reading.\nSome storage backends may need to transform URLs (e.g., signing S3 URLs or resolving blob references). 
This method returns a URL that can be used directly with WebDataset.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nurl\nstr\nStorage URL to resolve.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nWebDataset-compatible URL for reading.\n\n\n\n\n\n\n\nAbstractDataStore.supports_streaming()\nWhether this store supports streaming reads.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbool\nTrue if the store supports efficient streaming (like S3),\n\n\n\nbool\nFalse if data must be fully downloaded first.\n\n\n\n\n\n\n\nAbstractDataStore.write_shards(ds, *, prefix, **kwargs)\nWrite dataset shards to storage.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to write.\nrequired\n\n\nprefix\nstr\nPath prefix for the shards (e.g., ‘datasets/mnist/v1’).\nrequired\n\n\n**kwargs\n\nBackend-specific options (e.g., maxcount for shard size).\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nList of URLs for the written shards, suitable for use with\n\n\n\nlist[str]\nWebDataset or atdata.Dataset()." 
1339 + }, 1340 + { 1341 + "objectID": "api/AbstractDataStore.html#example", 1342 + "href": "api/AbstractDataStore.html#example", 1343 + "title": "AbstractDataStore", 1344 + "section": "", 1345 + "text": "::\n>>> store = S3DataStore(credentials, bucket=\"my-bucket\")\n>>> urls = store.write_shards(dataset, prefix=\"training/v1\")\n>>> print(urls)\n['s3://my-bucket/training/v1/shard-000000.tar', ...]" 1241 1346 }, 1242 1347 { 1243 1348 "objectID": "api/AbstractDataStore.html#methods", 1244 1349 "href": "api/AbstractDataStore.html#methods", 1245 1350 "title": "AbstractDataStore", 1246 1351 "section": "", 1247 - "text": "Name\nDescription\n\n\n\n\nread_url\nResolve a storage URL for reading.\n\n\nsupports_streaming\nWhether this store supports streaming reads.\n\n\nwrite_shards\nWrite dataset shards to storage.\n\n\n\n\n\nAbstractDataStore.read_url(url)\nResolve a storage URL for reading.\nSome storage backends may need to transform URLs (e.g., signing S3 URLs or resolving blob references). This method returns a URL that can be used directly with WebDataset.\nArgs: url: Storage URL to resolve.\nReturns: WebDataset-compatible URL for reading.\n\n\n\nAbstractDataStore.supports_streaming()\nWhether this store supports streaming reads.\nReturns: True if the store supports efficient streaming (like S3), False if data must be fully downloaded first.\n\n\n\nAbstractDataStore.write_shards(ds, *, prefix, **kwargs)\nWrite dataset shards to storage.\nArgs: ds: The Dataset to write. prefix: Path prefix for the shards (e.g., ‘datasets/mnist/v1’). **kwargs: Backend-specific options (e.g., maxcount for shard size).\nReturns: List of URLs for the written shards, suitable for use with WebDataset or atdata.Dataset()." 
1352 + "text": "Name\nDescription\n\n\n\n\nread_url\nResolve a storage URL for reading.\n\n\nsupports_streaming\nWhether this store supports streaming reads.\n\n\nwrite_shards\nWrite dataset shards to storage.\n\n\n\n\n\nAbstractDataStore.read_url(url)\nResolve a storage URL for reading.\nSome storage backends may need to transform URLs (e.g., signing S3 URLs or resolving blob references). This method returns a URL that can be used directly with WebDataset.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nurl\nstr\nStorage URL to resolve.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nWebDataset-compatible URL for reading.\n\n\n\n\n\n\n\nAbstractDataStore.supports_streaming()\nWhether this store supports streaming reads.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbool\nTrue if the store supports efficient streaming (like S3),\n\n\n\nbool\nFalse if data must be fully downloaded first.\n\n\n\n\n\n\n\nAbstractDataStore.write_shards(ds, *, prefix, **kwargs)\nWrite dataset shards to storage.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to write.\nrequired\n\n\nprefix\nstr\nPath prefix for the shards (e.g., ‘datasets/mnist/v1’).\nrequired\n\n\n**kwargs\n\nBackend-specific options (e.g., maxcount for shard size).\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nList of URLs for the written shards, suitable for use with\n\n\n\nlist[str]\nWebDataset or atdata.Dataset()." 1248 1353 }, 1249 1354 { 1250 1355 "objectID": "api/local.S3DataStore.html", 1251 1356 "href": "api/local.S3DataStore.html", 1252 1357 "title": "local.S3DataStore", 1253 1358 "section": "", 1254 - "text": "local.S3DataStore(credentials, *, bucket)\nS3-compatible data store implementing AbstractDataStore protocol.\nHandles writing dataset shards to S3-compatible object storage and resolving URLs for reading.\nAttributes: credentials: S3 credentials dictionary. bucket: Target bucket name. 
_fs: S3FileSystem instance.\n\n\n\n\n\nName\nDescription\n\n\n\n\nread_url\nResolve an S3 URL for reading/streaming.\n\n\nsupports_streaming\nS3 supports streaming reads.\n\n\nwrite_shards\nWrite dataset shards to S3.\n\n\n\n\n\nlocal.S3DataStore.read_url(url)\nResolve an S3 URL for reading/streaming.\nFor S3-compatible stores with custom endpoints (like Cloudflare R2, MinIO, etc.), converts s3:// URLs to HTTPS URLs that WebDataset can stream directly.\nFor standard AWS S3 (no custom endpoint), URLs are returned unchanged since WebDataset’s built-in s3fs integration handles them.\nArgs: url: S3 URL to resolve (e.g., ‘s3://bucket/path/file.tar’).\nReturns: HTTPS URL if custom endpoint is configured, otherwise unchanged. Example: ‘s3://bucket/path’ -> ‘https://endpoint.com/bucket/path’\n\n\n\nlocal.S3DataStore.supports_streaming()\nS3 supports streaming reads.\nReturns: True.\n\n\n\nlocal.S3DataStore.write_shards(ds, *, prefix, cache_local=False, **kwargs)\nWrite dataset shards to S3.\nArgs: ds: The Dataset to write. prefix: Path prefix within bucket (e.g., ‘datasets/mnist/v1’). cache_local: If True, write locally first then copy to S3. **kwargs: Additional args passed to wds.ShardWriter (e.g., maxcount).\nReturns: List of S3 URLs for the written shards.\nRaises: RuntimeError: If no shards were written." 
1359 + "text": "local.S3DataStore(credentials, *, bucket)\nS3-compatible data store implementing AbstractDataStore protocol.\nHandles writing dataset shards to S3-compatible object storage and resolving URLs for reading.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\ncredentials\n\nS3 credentials dictionary.\n\n\nbucket\n\nTarget bucket name.\n\n\n_fs\n\nS3FileSystem instance.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nread_url\nResolve an S3 URL for reading/streaming.\n\n\nsupports_streaming\nS3 supports streaming reads.\n\n\nwrite_shards\nWrite dataset shards to S3.\n\n\n\n\n\nlocal.S3DataStore.read_url(url)\nResolve an S3 URL for reading/streaming.\nFor S3-compatible stores with custom endpoints (like Cloudflare R2, MinIO, etc.), converts s3:// URLs to HTTPS URLs that WebDataset can stream directly.\nFor standard AWS S3 (no custom endpoint), URLs are returned unchanged since WebDataset’s built-in s3fs integration handles them.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nurl\nstr\nS3 URL to resolve (e.g., ‘s3://bucket/path/file.tar’).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nHTTPS URL if custom endpoint is configured, otherwise unchanged.\n\n\nExample\nstr\n‘s3://bucket/path’ -> ‘https://endpoint.com/bucket/path’\n\n\n\n\n\n\n\nlocal.S3DataStore.supports_streaming()\nS3 supports streaming reads.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbool\nTrue.\n\n\n\n\n\n\n\nlocal.S3DataStore.write_shards(ds, *, prefix, cache_local=False, **kwargs)\nWrite dataset shards to S3.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to write.\nrequired\n\n\nprefix\nstr\nPath prefix within bucket (e.g., ‘datasets/mnist/v1’).\nrequired\n\n\ncache_local\nbool\nIf True, write locally first then copy to S3.\nFalse\n\n\n**kwargs\n\nAdditional args passed to wds.ShardWriter (e.g., maxcount).\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nList of S3 URLs for the written 
shards.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nRuntimeError\nIf no shards were written." 1360 + }, 1361 + { 1362 + "objectID": "api/local.S3DataStore.html#attributes", 1363 + "href": "api/local.S3DataStore.html#attributes", 1364 + "title": "local.S3DataStore", 1365 + "section": "", 1366 + "text": "Name\nType\nDescription\n\n\n\n\ncredentials\n\nS3 credentials dictionary.\n\n\nbucket\n\nTarget bucket name.\n\n\n_fs\n\nS3FileSystem instance." 1255 1367 }, 1256 1368 { 1257 1369 "objectID": "api/local.S3DataStore.html#methods", 1258 1370 "href": "api/local.S3DataStore.html#methods", 1259 1371 "title": "local.S3DataStore", 1260 1372 "section": "", 1261 - "text": "Name\nDescription\n\n\n\n\nread_url\nResolve an S3 URL for reading/streaming.\n\n\nsupports_streaming\nS3 supports streaming reads.\n\n\nwrite_shards\nWrite dataset shards to S3.\n\n\n\n\n\nlocal.S3DataStore.read_url(url)\nResolve an S3 URL for reading/streaming.\nFor S3-compatible stores with custom endpoints (like Cloudflare R2, MinIO, etc.), converts s3:// URLs to HTTPS URLs that WebDataset can stream directly.\nFor standard AWS S3 (no custom endpoint), URLs are returned unchanged since WebDataset’s built-in s3fs integration handles them.\nArgs: url: S3 URL to resolve (e.g., ‘s3://bucket/path/file.tar’).\nReturns: HTTPS URL if custom endpoint is configured, otherwise unchanged. Example: ‘s3://bucket/path’ -> ‘https://endpoint.com/bucket/path’\n\n\n\nlocal.S3DataStore.supports_streaming()\nS3 supports streaming reads.\nReturns: True.\n\n\n\nlocal.S3DataStore.write_shards(ds, *, prefix, cache_local=False, **kwargs)\nWrite dataset shards to S3.\nArgs: ds: The Dataset to write. prefix: Path prefix within bucket (e.g., ‘datasets/mnist/v1’). cache_local: If True, write locally first then copy to S3. **kwargs: Additional args passed to wds.ShardWriter (e.g., maxcount).\nReturns: List of S3 URLs for the written shards.\nRaises: RuntimeError: If no shards were written." 
1373 + "text": "Name\nDescription\n\n\n\n\nread_url\nResolve an S3 URL for reading/streaming.\n\n\nsupports_streaming\nS3 supports streaming reads.\n\n\nwrite_shards\nWrite dataset shards to S3.\n\n\n\n\n\nlocal.S3DataStore.read_url(url)\nResolve an S3 URL for reading/streaming.\nFor S3-compatible stores with custom endpoints (like Cloudflare R2, MinIO, etc.), converts s3:// URLs to HTTPS URLs that WebDataset can stream directly.\nFor standard AWS S3 (no custom endpoint), URLs are returned unchanged since WebDataset’s built-in s3fs integration handles them.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nurl\nstr\nS3 URL to resolve (e.g., ‘s3://bucket/path/file.tar’).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nHTTPS URL if custom endpoint is configured, otherwise unchanged.\n\n\nExample\nstr\n‘s3://bucket/path’ -> ‘https://endpoint.com/bucket/path’\n\n\n\n\n\n\n\nlocal.S3DataStore.supports_streaming()\nS3 supports streaming reads.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbool\nTrue.\n\n\n\n\n\n\n\nlocal.S3DataStore.write_shards(ds, *, prefix, cache_local=False, **kwargs)\nWrite dataset shards to S3.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to write.\nrequired\n\n\nprefix\nstr\nPath prefix within bucket (e.g., ‘datasets/mnist/v1’).\nrequired\n\n\ncache_local\nbool\nIf True, write locally first then copy to S3.\nFalse\n\n\n**kwargs\n\nAdditional args passed to wds.ShardWriter (e.g., maxcount).\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nList of S3 URLs for the written shards.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nRuntimeError\nIf no shards were written." 
1262 1374 }, 1263 1375 { 1264 1376 "objectID": "api/AtUri.html", 1265 1377 "href": "api/AtUri.html", 1266 1378 "title": "AtUri", 1267 1379 "section": "", 1268 - "text": "atmosphere.AtUri(authority, collection, rkey)\nParsed AT Protocol URI.\nAT URIs follow the format: at:////\nExample: >>> uri = AtUri.parse(“at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz”) >>> uri.authority ‘did:plc:abc123’ >>> uri.collection ‘ac.foundation.dataset.sampleSchema’ >>> uri.rkey ‘xyz’\n\n\n\n\n\nName\nDescription\n\n\n\n\nauthority\nThe DID or handle of the repository owner.\n\n\ncollection\nThe NSID of the record collection.\n\n\nrkey\nThe record key within the collection.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nparse\nParse an AT URI string into components.\n\n\n\n\n\natmosphere.AtUri.parse(uri)\nParse an AT URI string into components.\nArgs: uri: AT URI string in format at://<authority>/<collection>/<rkey>\nReturns: Parsed AtUri instance.\nRaises: ValueError: If the URI format is invalid." 
1380 + "text": "atmosphere.AtUri(authority, collection, rkey)\nParsed AT Protocol URI.\nAT URIs follow the format: at:////\n\n\n::\n>>> uri = AtUri.parse(\"at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz\")\n>>> uri.authority\n'did:plc:abc123'\n>>> uri.collection\n'ac.foundation.dataset.sampleSchema'\n>>> uri.rkey\n'xyz'\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nauthority\nThe DID or handle of the repository owner.\n\n\ncollection\nThe NSID of the record collection.\n\n\nrkey\nThe record key within the collection.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nparse\nParse an AT URI string into components.\n\n\n\n\n\natmosphere.AtUri.parse(uri)\nParse an AT URI string into components.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr\nAT URI string in format at://<authority>/<collection>/<rkey>\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nParsed AtUri instance.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the URI format is invalid." 1381 + }, 1382 + { 1383 + "objectID": "api/AtUri.html#example", 1384 + "href": "api/AtUri.html#example", 1385 + "title": "AtUri", 1386 + "section": "", 1387 + "text": "::\n>>> uri = AtUri.parse(\"at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz\")\n>>> uri.authority\n'did:plc:abc123'\n>>> uri.collection\n'ac.foundation.dataset.sampleSchema'\n>>> uri.rkey\n'xyz'" 1269 1388 }, 1270 1389 { 1271 1390 "objectID": "api/AtUri.html#attributes", ··· 1279 1398 "href": "api/AtUri.html#methods", 1280 1399 "title": "AtUri", 1281 1400 "section": "", 1282 - "text": "Name\nDescription\n\n\n\n\nparse\nParse an AT URI string into components.\n\n\n\n\n\natmosphere.AtUri.parse(uri)\nParse an AT URI string into components.\nArgs: uri: AT URI string in format at://<authority>/<collection>/<rkey>\nReturns: Parsed AtUri instance.\nRaises: ValueError: If the URI format is invalid." 
1401 + "text": "Name\nDescription\n\n\n\n\nparse\nParse an AT URI string into components.\n\n\n\n\n\natmosphere.AtUri.parse(uri)\nParse an AT URI string into components.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr\nAT URI string in format at://<authority>/<collection>/<rkey>\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nParsed AtUri instance.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the URI format is invalid." 1283 1402 }, 1284 1403 { 1285 1404 "objectID": "api/Packable-protocol.html", 1286 1405 "href": "api/Packable-protocol.html", 1287 1406 "title": "Packable", 1288 1407 "section": "", 1289 - "text": "Packable()\nStructural protocol for packable sample types.\nThis protocol allows classes decorated with @packable to be recognized as valid types for lens transformations and schema operations, even though the decorator doesn’t change the class’s nominal type at static analysis time.\nBoth PackableSample subclasses and @packable-decorated classes satisfy this protocol structurally.\nThe protocol captures the full interface needed for: - Lens type transformations (as_wds, from_data) - Schema publishing (class introspection via dataclass fields) - Serialization/deserialization (packed, from_bytes)\nExample: >>> @packable … class MySample: … name: str … value: int … >>> def process(sample_type: TypePackable) -> None: … # Type checker knows sample_type has from_bytes, packed, etc. 
… instance = sample_type.from_bytes(data) … print(instance.packed)\n\n\n\n\n\nName\nDescription\n\n\n\n\nas_wds\nWebDataset-compatible representation with key and msgpack.\n\n\npacked\nPack this sample’s data into msgpack bytes.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_bytes\nCreate instance from raw msgpack bytes.\n\n\nfrom_data\nCreate instance from unpacked msgpack data dictionary.\n\n\n\n\n\nPackable.from_bytes(bs)\nCreate instance from raw msgpack bytes.\n\n\n\nPackable.from_data(data)\nCreate instance from unpacked msgpack data dictionary." 1408 + "text": "Packable()\nStructural protocol for packable sample types.\nThis protocol allows classes decorated with @packable to be recognized as valid types for lens transformations and schema operations, even though the decorator doesn’t change the class’s nominal type at static analysis time.\nBoth PackableSample subclasses and @packable-decorated classes satisfy this protocol structurally.\nThe protocol captures the full interface needed for: - Lens type transformations (as_wds, from_data) - Schema publishing (class introspection via dataclass fields) - Serialization/deserialization (packed, from_bytes)\n\n\n::\n>>> @packable\n... class MySample:\n... name: str\n... value: int\n...\n>>> def process(sample_type: Type[Packable]) -> None:\n... # Type checker knows sample_type has from_bytes, packed, etc.\n... instance = sample_type.from_bytes(data)\n... print(instance.packed)\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nas_wds\nWebDataset-compatible representation with key and msgpack.\n\n\npacked\nPack this sample’s data into msgpack bytes.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_bytes\nCreate instance from raw msgpack bytes.\n\n\nfrom_data\nCreate instance from unpacked msgpack data dictionary.\n\n\n\n\n\nPackable.from_bytes(bs)\nCreate instance from raw msgpack bytes.\n\n\n\nPackable.from_data(data)\nCreate instance from unpacked msgpack data dictionary." 
1409 + }, 1410 + { 1411 + "objectID": "api/Packable-protocol.html#example", 1412 + "href": "api/Packable-protocol.html#example", 1413 + "title": "Packable", 1414 + "section": "", 1415 + "text": "::\n>>> @packable\n... class MySample:\n... name: str\n... value: int\n...\n>>> def process(sample_type: Type[Packable]) -> None:\n... # Type checker knows sample_type has from_bytes, packed, etc.\n... instance = sample_type.from_bytes(data)\n... print(instance.packed)" 1290 1416 }, 1291 1417 { 1292 1418 "objectID": "api/Packable-protocol.html#attributes", ··· 1307 1433 "href": "api/packable.html", 1308 1434 "title": "packable", 1309 1435 "section": "", 1310 - "text": "packable\npackable(cls)\nDecorator to convert a regular class into a PackableSample.\nThis decorator transforms a class into a dataclass that inherits from PackableSample, enabling automatic msgpack serialization/deserialization with special handling for NDArray fields.\nThe resulting class satisfies the Packable protocol, making it compatible with all atdata APIs that accept packable types (e.g., publish_schema, lens transformations, etc.).\nArgs: cls: The class to convert. Should have type annotations for its fields.\nReturns: A new dataclass that inherits from PackableSample with the same name and annotations as the original class. 
The class satisfies the Packable protocol and can be used with Type[Packable] signatures.\nExample: >>> @packable … class MyData: … name: str … values: NDArray … >>> sample = MyData(name=“test”, values=np.array([1, 2, 3])) >>> bytes_data = sample.packed >>> restored = MyData.from_bytes(bytes_data) >>> >>> # Works with Packable-typed APIs >>> index.publish_schema(MyData, version=“1.0.0”) # Type-safe" 1436 + "text": "packable(cls)\nDecorator to convert a regular class into a PackableSample.\nThis decorator transforms a class into a dataclass that inherits from PackableSample, enabling automatic msgpack serialization/deserialization with special handling for NDArray fields.\nThe resulting class satisfies the Packable protocol, making it compatible with all atdata APIs that accept packable types (e.g., publish_schema, lens transformations, etc.).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncls\ntype[_T]\nThe class to convert. Should have type annotations for its fields.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntype[_T]\nA new dataclass that inherits from PackableSample with the same\n\n\n\ntype[_T]\nname and annotations as the original class. The class satisfies the\n\n\n\ntype[_T]\nPackable protocol and can be used with Type[Packable] signatures.\n\n\n\n\n\n\nThis is a test of the functionality::\n@packable\nclass MyData:\n name: str\n values: NDArray\n\nsample = MyData(name=\"test\", values=np.array([1, 2, 3]))\nbytes_data = sample.packed\nrestored = MyData.from_bytes(bytes_data)\n\n# Works with Packable-typed APIs\nindex.publish_schema(MyData, version=\"1.0.0\") # Type-safe" 1437 + }, 1438 + { 1439 + "objectID": "api/packable.html#parameters", 1440 + "href": "api/packable.html#parameters", 1441 + "title": "packable", 1442 + "section": "", 1443 + "text": "Name\nType\nDescription\nDefault\n\n\n\n\ncls\ntype[_T]\nThe class to convert. 
Should have type annotations for its fields.\nrequired" 1444 + }, 1445 + { 1446 + "objectID": "api/packable.html#returns", 1447 + "href": "api/packable.html#returns", 1448 + "title": "packable", 1449 + "section": "", 1450 + "text": "Name\nType\nDescription\n\n\n\n\n\ntype[_T]\nA new dataclass that inherits from PackableSample with the same\n\n\n\ntype[_T]\nname and annotations as the original class. The class satisfies the\n\n\n\ntype[_T]\nPackable protocol and can be used with Type[Packable] signatures." 1451 + }, 1452 + { 1453 + "objectID": "api/packable.html#examples", 1454 + "href": "api/packable.html#examples", 1455 + "title": "packable", 1456 + "section": "", 1457 + "text": "This is a test of the functionality::\n@packable\nclass MyData:\n name: str\n values: NDArray\n\nsample = MyData(name=\"test\", values=np.array([1, 2, 3]))\nbytes_data = sample.packed\nrestored = MyData.from_bytes(bytes_data)\n\n# Works with Packable-typed APIs\nindex.publish_schema(MyData, version=\"1.0.0\") # Type-safe" 1311 1458 }, 1312 1459 { 1313 1460 "objectID": "index.html", ··· 1402 1549 "href": "api/SampleBatch.html", 1403 1550 "title": "SampleBatch", 1404 1551 "section": "", 1405 - "text": "SampleBatch(samples)\nA batch of samples with automatic attribute aggregation.\nThis class wraps a sequence of samples and provides magic __getattr__ access to aggregate sample attributes. When you access an attribute that exists on the sample type, it automatically aggregates values across all samples in the batch.\nNDArray fields are stacked into a numpy array with a batch dimension. 
Other fields are aggregated into a list.\nType Parameters: DT: The sample type, must derive from PackableSample.\nAttributes: samples: The list of sample instances in this batch.\nExample: >>> batch = SampleBatchMyData >>> batch.embeddings # Returns stacked numpy array of shape (3, …) >>> batch.names # Returns list of names\nNote: This class uses Python’s __orig_class__ mechanism to extract the type parameter at runtime. Instances must be created using the subscripted syntax SampleBatch[MyType](samples) rather than calling the constructor directly with an unsubscripted class.\n\n\n\n\n\nName\nDescription\n\n\n\n\nsample_type\nThe type of each sample in this batch." 1552 + "text": "SampleBatch(samples)\nA batch of samples with automatic attribute aggregation.\nThis class wraps a sequence of samples and provides magic __getattr__ access to aggregate sample attributes. When you access an attribute that exists on the sample type, it automatically aggregates values across all samples in the batch.\nNDArray fields are stacked into a numpy array with a batch dimension. Other fields are aggregated into a list.\n\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nDT\n\nThe sample type, must derive from PackableSample.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\nsamples\n\nThe list of sample instances in this batch.\n\n\n\n\n\n\n::\n>>> batch = SampleBatch[MyData]([sample1, sample2, sample3])\n>>> batch.embeddings # Returns stacked numpy array of shape (3, ...)\n>>> batch.names # Returns list of names\n\n\n\nThis class uses Python’s __orig_class__ mechanism to extract the type parameter at runtime. Instances must be created using the subscripted syntax SampleBatch[MyType](samples) rather than calling the constructor directly with an unsubscripted class." 
1553 + }, 1554 + { 1555 + "objectID": "api/SampleBatch.html#parameters", 1556 + "href": "api/SampleBatch.html#parameters", 1557 + "title": "SampleBatch", 1558 + "section": "", 1559 + "text": "Name\nType\nDescription\nDefault\n\n\n\n\nDT\n\nThe sample type, must derive from PackableSample.\nrequired" 1406 1560 }, 1407 1561 { 1408 1562 "objectID": "api/SampleBatch.html#attributes", 1409 1563 "href": "api/SampleBatch.html#attributes", 1410 1564 "title": "SampleBatch", 1411 1565 "section": "", 1412 - "text": "Name\nDescription\n\n\n\n\nsample_type\nThe type of each sample in this batch." 1566 + "text": "Name\nType\nDescription\n\n\n\n\nsamples\n\nThe list of sample instances in this batch." 1567 + }, 1568 + { 1569 + "objectID": "api/SampleBatch.html#example", 1570 + "href": "api/SampleBatch.html#example", 1571 + "title": "SampleBatch", 1572 + "section": "", 1573 + "text": "::\n>>> batch = SampleBatch[MyData]([sample1, sample2, sample3])\n>>> batch.embeddings # Returns stacked numpy array of shape (3, ...)\n>>> batch.names # Returns list of names" 1574 + }, 1575 + { 1576 + "objectID": "api/SampleBatch.html#note", 1577 + "href": "api/SampleBatch.html#note", 1578 + "title": "SampleBatch", 1579 + "section": "", 1580 + "text": "This class uses Python’s __orig_class__ mechanism to extract the type parameter at runtime. Instances must be created using the subscripted syntax SampleBatch[MyType](samples) rather than calling the constructor directly with an unsubscripted class." 
1413 1581 }, 1414 1582 { 1415 1583 "objectID": "api/LensPublisher.html", 1416 1584 "href": "api/LensPublisher.html", 1417 1585 "title": "LensPublisher", 1418 1586 "section": "", 1419 - "text": "atmosphere.LensPublisher(client)\nPublishes Lens transformation records to ATProto.\nThis class creates lens records that reference source and target schemas and point to the transformation code in a git repository.\nExample: >>> @atdata.lens … def my_lens(source: SourceType) -> TargetType: … return TargetType(field=source.other_field) >>> >>> client = AtmosphereClient() >>> client.login(“handle”, “password”) >>> >>> publisher = LensPublisher(client) >>> uri = publisher.publish( … name=“my_lens”, … source_schema_uri=“at://did:plc:abc/ac.foundation.dataset.sampleSchema/source”, … target_schema_uri=“at://did:plc:abc/ac.foundation.dataset.sampleSchema/target”, … code_repository=“https://github.com/user/repo”, … code_commit=“abc123def456”, … getter_path=“mymodule.lenses:my_lens”, … putter_path=“mymodule.lenses:my_lens_putter”, … )\nSecurity Note: Lens code is stored as references to git repositories rather than inline code. This prevents arbitrary code execution from ATProto records. Users must manually install and trust lens implementations.\n\n\n\n\n\nName\nDescription\n\n\n\n\npublish\nPublish a lens transformation record to ATProto.\n\n\npublish_from_lens\nPublish a lens record from an existing Lens object.\n\n\n\n\n\natmosphere.LensPublisher.publish(\n name,\n source_schema_uri,\n target_schema_uri,\n description=None,\n code_repository=None,\n code_commit=None,\n getter_path=None,\n putter_path=None,\n rkey=None,\n)\nPublish a lens transformation record to ATProto.\nArgs: name: Human-readable lens name. source_schema_uri: AT URI of the source schema. target_schema_uri: AT URI of the target schema. description: What this transformation does. code_repository: Git repository URL containing the lens code. code_commit: Git commit hash for reproducibility. 
getter_path: Module path to the getter function (e.g., ‘mymodule.lenses:my_getter’). putter_path: Module path to the putter function (e.g., ‘mymodule.lenses:my_putter’). rkey: Optional explicit record key.\nReturns: The AT URI of the created lens record.\nRaises: ValueError: If code references are incomplete.\n\n\n\natmosphere.LensPublisher.publish_from_lens(\n lens_obj,\n *,\n name,\n source_schema_uri,\n target_schema_uri,\n code_repository,\n code_commit,\n description=None,\n rkey=None,\n)\nPublish a lens record from an existing Lens object.\nThis method extracts the getter and putter function names from the Lens object and publishes a record referencing them.\nArgs: lens_obj: The Lens object to publish. name: Human-readable lens name. source_schema_uri: AT URI of the source schema. target_schema_uri: AT URI of the target schema. code_repository: Git repository URL. code_commit: Git commit hash. description: What this transformation does. rkey: Optional explicit record key.\nReturns: The AT URI of the created lens record." 1587 + "text": "atmosphere.LensPublisher(client)\nPublishes Lens transformation records to ATProto.\nThis class creates lens records that reference source and target schemas and point to the transformation code in a git repository.\n\n\n::\n>>> @atdata.lens\n... def my_lens(source: SourceType) -> TargetType:\n... return TargetType(field=source.other_field)\n>>>\n>>> client = AtmosphereClient()\n>>> client.login(\"handle\", \"password\")\n>>>\n>>> publisher = LensPublisher(client)\n>>> uri = publisher.publish(\n... name=\"my_lens\",\n... source_schema_uri=\"at://did:plc:abc/ac.foundation.dataset.sampleSchema/source\",\n... target_schema_uri=\"at://did:plc:abc/ac.foundation.dataset.sampleSchema/target\",\n... code_repository=\"https://github.com/user/repo\",\n... code_commit=\"abc123def456\",\n... getter_path=\"mymodule.lenses:my_lens\",\n... putter_path=\"mymodule.lenses:my_lens_putter\",\n... 
)\n\n\n\nLens code is stored as references to git repositories rather than inline code. This prevents arbitrary code execution from ATProto records. Users must manually install and trust lens implementations.\n\n\n\n\n\n\nName\nDescription\n\n\n\n\npublish\nPublish a lens transformation record to ATProto.\n\n\npublish_from_lens\nPublish a lens record from an existing Lens object.\n\n\n\n\n\natmosphere.LensPublisher.publish(\n name,\n source_schema_uri,\n target_schema_uri,\n description=None,\n code_repository=None,\n code_commit=None,\n getter_path=None,\n putter_path=None,\n rkey=None,\n)\nPublish a lens transformation record to ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nname\nstr\nHuman-readable lens name.\nrequired\n\n\nsource_schema_uri\nstr\nAT URI of the source schema.\nrequired\n\n\ntarget_schema_uri\nstr\nAT URI of the target schema.\nrequired\n\n\ndescription\nOptional[str]\nWhat this transformation does.\nNone\n\n\ncode_repository\nOptional[str]\nGit repository URL containing the lens code.\nNone\n\n\ncode_commit\nOptional[str]\nGit commit hash for reproducibility.\nNone\n\n\ngetter_path\nOptional[str]\nModule path to the getter function (e.g., ‘mymodule.lenses:my_getter’).\nNone\n\n\nputter_path\nOptional[str]\nModule path to the putter function (e.g., ‘mymodule.lenses:my_putter’).\nNone\n\n\nrkey\nOptional[str]\nOptional explicit record key.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created lens record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf code references are incomplete.\n\n\n\n\n\n\n\natmosphere.LensPublisher.publish_from_lens(\n lens_obj,\n *,\n name,\n source_schema_uri,\n target_schema_uri,\n code_repository,\n code_commit,\n description=None,\n rkey=None,\n)\nPublish a lens record from an existing Lens object.\nThis method extracts the getter and putter function names from the Lens object and publishes a record referencing 
them.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nlens_obj\nLens\nThe Lens object to publish.\nrequired\n\n\nname\nstr\nHuman-readable lens name.\nrequired\n\n\nsource_schema_uri\nstr\nAT URI of the source schema.\nrequired\n\n\ntarget_schema_uri\nstr\nAT URI of the target schema.\nrequired\n\n\ncode_repository\nstr\nGit repository URL.\nrequired\n\n\ncode_commit\nstr\nGit commit hash.\nrequired\n\n\ndescription\nOptional[str]\nWhat this transformation does.\nNone\n\n\nrkey\nOptional[str]\nOptional explicit record key.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created lens record." 1588 + }, 1589 + { 1590 + "objectID": "api/LensPublisher.html#example", 1591 + "href": "api/LensPublisher.html#example", 1592 + "title": "LensPublisher", 1593 + "section": "", 1594 + "text": "::\n>>> @atdata.lens\n... def my_lens(source: SourceType) -> TargetType:\n... return TargetType(field=source.other_field)\n>>>\n>>> client = AtmosphereClient()\n>>> client.login(\"handle\", \"password\")\n>>>\n>>> publisher = LensPublisher(client)\n>>> uri = publisher.publish(\n... name=\"my_lens\",\n... source_schema_uri=\"at://did:plc:abc/ac.foundation.dataset.sampleSchema/source\",\n... target_schema_uri=\"at://did:plc:abc/ac.foundation.dataset.sampleSchema/target\",\n... code_repository=\"https://github.com/user/repo\",\n... code_commit=\"abc123def456\",\n... getter_path=\"mymodule.lenses:my_lens\",\n... putter_path=\"mymodule.lenses:my_lens_putter\",\n... )" 1595 + }, 1596 + { 1597 + "objectID": "api/LensPublisher.html#security-note", 1598 + "href": "api/LensPublisher.html#security-note", 1599 + "title": "LensPublisher", 1600 + "section": "", 1601 + "text": "Lens code is stored as references to git repositories rather than inline code. This prevents arbitrary code execution from ATProto records. Users must manually install and trust lens implementations." 
1420 1602 }, 1421 1603 { 1422 1604 "objectID": "api/LensPublisher.html#methods", 1423 1605 "href": "api/LensPublisher.html#methods", 1424 1606 "title": "LensPublisher", 1425 1607 "section": "", 1426 - "text": "Name\nDescription\n\n\n\n\npublish\nPublish a lens transformation record to ATProto.\n\n\npublish_from_lens\nPublish a lens record from an existing Lens object.\n\n\n\n\n\natmosphere.LensPublisher.publish(\n name,\n source_schema_uri,\n target_schema_uri,\n description=None,\n code_repository=None,\n code_commit=None,\n getter_path=None,\n putter_path=None,\n rkey=None,\n)\nPublish a lens transformation record to ATProto.\nArgs: name: Human-readable lens name. source_schema_uri: AT URI of the source schema. target_schema_uri: AT URI of the target schema. description: What this transformation does. code_repository: Git repository URL containing the lens code. code_commit: Git commit hash for reproducibility. getter_path: Module path to the getter function (e.g., ‘mymodule.lenses:my_getter’). putter_path: Module path to the putter function (e.g., ‘mymodule.lenses:my_putter’). rkey: Optional explicit record key.\nReturns: The AT URI of the created lens record.\nRaises: ValueError: If code references are incomplete.\n\n\n\natmosphere.LensPublisher.publish_from_lens(\n lens_obj,\n *,\n name,\n source_schema_uri,\n target_schema_uri,\n code_repository,\n code_commit,\n description=None,\n rkey=None,\n)\nPublish a lens record from an existing Lens object.\nThis method extracts the getter and putter function names from the Lens object and publishes a record referencing them.\nArgs: lens_obj: The Lens object to publish. name: Human-readable lens name. source_schema_uri: AT URI of the source schema. target_schema_uri: AT URI of the target schema. code_repository: Git repository URL. code_commit: Git commit hash. description: What this transformation does. rkey: Optional explicit record key.\nReturns: The AT URI of the created lens record." 
1608 + "text": "Name\nDescription\n\n\n\n\npublish\nPublish a lens transformation record to ATProto.\n\n\npublish_from_lens\nPublish a lens record from an existing Lens object.\n\n\n\n\n\natmosphere.LensPublisher.publish(\n name,\n source_schema_uri,\n target_schema_uri,\n description=None,\n code_repository=None,\n code_commit=None,\n getter_path=None,\n putter_path=None,\n rkey=None,\n)\nPublish a lens transformation record to ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nname\nstr\nHuman-readable lens name.\nrequired\n\n\nsource_schema_uri\nstr\nAT URI of the source schema.\nrequired\n\n\ntarget_schema_uri\nstr\nAT URI of the target schema.\nrequired\n\n\ndescription\nOptional[str]\nWhat this transformation does.\nNone\n\n\ncode_repository\nOptional[str]\nGit repository URL containing the lens code.\nNone\n\n\ncode_commit\nOptional[str]\nGit commit hash for reproducibility.\nNone\n\n\ngetter_path\nOptional[str]\nModule path to the getter function (e.g., ‘mymodule.lenses:my_getter’).\nNone\n\n\nputter_path\nOptional[str]\nModule path to the putter function (e.g., ‘mymodule.lenses:my_putter’).\nNone\n\n\nrkey\nOptional[str]\nOptional explicit record key.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created lens record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf code references are incomplete.\n\n\n\n\n\n\n\natmosphere.LensPublisher.publish_from_lens(\n lens_obj,\n *,\n name,\n source_schema_uri,\n target_schema_uri,\n code_repository,\n code_commit,\n description=None,\n rkey=None,\n)\nPublish a lens record from an existing Lens object.\nThis method extracts the getter and putter function names from the Lens object and publishes a record referencing them.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nlens_obj\nLens\nThe Lens object to publish.\nrequired\n\n\nname\nstr\nHuman-readable lens name.\nrequired\n\n\nsource_schema_uri\nstr\nAT URI of the source 
schema.\nrequired\n\n\ntarget_schema_uri\nstr\nAT URI of the target schema.\nrequired\n\n\ncode_repository\nstr\nGit repository URL.\nrequired\n\n\ncode_commit\nstr\nGit commit hash.\nrequired\n\n\ndescription\nOptional[str]\nWhat this transformation does.\nNone\n\n\nrkey\nOptional[str]\nOptional explicit record key.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created lens record." 1427 1609 }, 1428 1610 { 1429 1611 "objectID": "api/AtmosphereIndexEntry.html", 1430 1612 "href": "api/AtmosphereIndexEntry.html", 1431 1613 "title": "AtmosphereIndexEntry", 1432 1614 "section": "", 1433 - "text": "atmosphere.AtmosphereIndexEntry(uri, record)\nEntry wrapper for ATProto dataset records implementing IndexEntry protocol.\nAttributes: _uri: AT URI of the record. _record: Raw record dictionary.\n\n\n\n\n\nName\nDescription\n\n\n\n\ndata_urls\nWebDataset URLs from external storage.\n\n\nmetadata\nMetadata from the record, if any.\n\n\nname\nHuman-readable dataset name.\n\n\nschema_ref\nAT URI of the schema record.\n\n\nuri\nAT URI of this record." 1615 + "text": "atmosphere.AtmosphereIndexEntry(uri, record)\nEntry wrapper for ATProto dataset records implementing IndexEntry protocol.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n_uri\n\nAT URI of the record.\n\n\n_record\n\nRaw record dictionary." 1434 1616 }, 1435 1617 { 1436 1618 "objectID": "api/AtmosphereIndexEntry.html#attributes", 1437 1619 "href": "api/AtmosphereIndexEntry.html#attributes", 1438 1620 "title": "AtmosphereIndexEntry", 1439 1621 "section": "", 1440 - "text": "Name\nDescription\n\n\n\n\ndata_urls\nWebDataset URLs from external storage.\n\n\nmetadata\nMetadata from the record, if any.\n\n\nname\nHuman-readable dataset name.\n\n\nschema_ref\nAT URI of the schema record.\n\n\nuri\nAT URI of this record." 1622 + "text": "Name\nType\nDescription\n\n\n\n\n_uri\n\nAT URI of the record.\n\n\n_record\n\nRaw record dictionary." 
1441 1623 }, 1442 1624 { 1443 1625 "objectID": "api/AbstractIndex.html", 1444 1626 "href": "api/AbstractIndex.html", 1445 1627 "title": "AbstractIndex", 1446 1628 "section": "", 1447 - "text": "AbstractIndex()\nProtocol for index operations - implemented by LocalIndex and AtmosphereIndex.\nThis protocol defines the common interface for managing dataset metadata: - Publishing and retrieving schemas - Inserting and listing datasets - (Future) Publishing and retrieving lenses\nA single index can hold datasets of many different sample types. The sample type is tracked via schema references, not as a generic parameter on the index.\nOptional Extensions: Some index implementations support additional features: - data_store: An AbstractDataStore for reading/writing dataset shards. If present, load_dataset will use it for S3 credential resolution.\nExample: >>> def publish_and_list(index: AbstractIndex) -> None: … # Publish schemas for different types … schema1 = index.publish_schema(ImageSample, version=“1.0.0”) … schema2 = index.publish_schema(TextSample, version=“1.0.0”) … … # Insert datasets of different types … index.insert_dataset(image_ds, name=“images”) … index.insert_dataset(text_ds, name=“texts”) … … # List all datasets (mixed types) … for entry in index.list_datasets(): … print(f”{entry.name} -> {entry.schema_ref}“)\n\n\n\n\n\nName\nDescription\n\n\n\n\ndatasets\nLazily iterate over all dataset entries in this index.\n\n\nschemas\nLazily iterate over all schema records in this index.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python Packable type from a stored schema.\n\n\nget_dataset\nGet a dataset entry by name or reference.\n\n\nget_schema\nGet a schema record by reference.\n\n\ninsert_dataset\nInsert a dataset into the index.\n\n\nlist_datasets\nGet all dataset entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list.\n\n\npublish_schema\nPublish a schema for a sample 
type.\n\n\n\n\n\nAbstractIndex.decode_schema(ref)\nReconstruct a Python Packable type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a Packable class matching the schema definition.\nArgs: ref: Schema reference string (local:// or at://).\nReturns: A dynamically generated Packable class with fields matching the schema definition. The class can be used with Dataset[T] to load and iterate over samples.\nRaises: KeyError: If schema not found. ValueError: If schema cannot be decoded (unsupported field types).\nExample: >>> entry = index.get_dataset(“my-dataset”) >>> SampleType = index.decode_schema(entry.schema_ref) >>> ds = DatasetSampleType >>> for sample in ds.ordered(): … print(sample) # sample is instance of SampleType\n\n\n\nAbstractIndex.get_dataset(ref)\nGet a dataset entry by name or reference.\nArgs: ref: Dataset name, path, or full reference string.\nReturns: IndexEntry for the dataset.\nRaises: KeyError: If dataset not found.\n\n\n\nAbstractIndex.get_schema(ref)\nGet a schema record by reference.\nArgs: ref: Schema reference string (local:// or at://).\nReturns: Schema record as a dictionary with fields like ‘name’, ‘version’, ‘fields’, etc.\nRaises: KeyError: If schema not found.\n\n\n\nAbstractIndex.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index.\nThe sample type is inferred from ds.sample_type. If schema_ref is not provided, the schema may be auto-published based on the sample type.\nArgs: ds: The Dataset to register in the index (any sample type). name: Human-readable name for the dataset. schema_ref: Optional explicit schema reference. If not provided, the schema may be auto-published or inferred from ds.sample_type. 
**kwargs: Additional backend-specific options.\nReturns: IndexEntry for the inserted dataset.\n\n\n\nAbstractIndex.list_datasets()\nGet all dataset entries as a materialized list.\nReturns: List of IndexEntry for each dataset.\n\n\n\nAbstractIndex.list_schemas()\nGet all schema records as a materialized list.\nReturns: List of schema records as dictionaries.\n\n\n\nAbstractIndex.publish_schema(sample_type, *, version='1.0.0', **kwargs)\nPublish a schema for a sample type.\nArgs: sample_type: A Packable type (PackableSample subclass or @packable-decorated). version: Semantic version string for the schema. **kwargs: Additional backend-specific options.\nReturns: Schema reference string: - Local: ‘local://schemas/{module.Class}@version’ - Atmosphere: ‘at://did:plc:…/ac.foundation.dataset.sampleSchema/…’" 1629 + "text": "AbstractIndex()\nProtocol for index operations - implemented by LocalIndex and AtmosphereIndex.\nThis protocol defines the common interface for managing dataset metadata: - Publishing and retrieving schemas - Inserting and listing datasets - (Future) Publishing and retrieving lenses\nA single index can hold datasets of many different sample types. The sample type is tracked via schema references, not as a generic parameter on the index.\n\n\nSome index implementations support additional features: - data_store: An AbstractDataStore for reading/writing dataset shards. If present, load_dataset will use it for S3 credential resolution.\n\n\n\n::\n>>> def publish_and_list(index: AbstractIndex) -> None:\n... # Publish schemas for different types\n... schema1 = index.publish_schema(ImageSample, version=\"1.0.0\")\n... schema2 = index.publish_schema(TextSample, version=\"1.0.0\")\n...\n... # Insert datasets of different types\n... index.insert_dataset(image_ds, name=\"images\")\n... index.insert_dataset(text_ds, name=\"texts\")\n...\n... # List all datasets (mixed types)\n... for entry in index.list_datasets():\n... 
print(f\"{entry.name} -> {entry.schema_ref}\")\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndatasets\nLazily iterate over all dataset entries in this index.\n\n\nschemas\nLazily iterate over all schema records in this index.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python Packable type from a stored schema.\n\n\nget_dataset\nGet a dataset entry by name or reference.\n\n\nget_schema\nGet a schema record by reference.\n\n\ninsert_dataset\nInsert a dataset into the index.\n\n\nlist_datasets\nGet all dataset entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list.\n\n\npublish_schema\nPublish a schema for a sample type.\n\n\n\n\n\nAbstractIndex.decode_schema(ref)\nReconstruct a Python Packable type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a Packable class matching the schema definition.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (local:// or at://).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nA dynamically generated Packable class with fields matching\n\n\n\nType[Packable]\nthe schema definition. The class can be used with\n\n\n\nType[Packable]\nDataset[T] to load and iterate over samples.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded (unsupported field types).\n\n\n\n\n\n\n::\n>>> entry = index.get_dataset(\"my-dataset\")\n>>> SampleType = index.decode_schema(entry.schema_ref)\n>>> ds = Dataset[SampleType](entry.data_urls[0])\n>>> for sample in ds.ordered():\n... 
print(sample) # sample is instance of SampleType\n\n\n\n\nAbstractIndex.get_dataset(ref)\nGet a dataset entry by name or reference.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nDataset name, path, or full reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIndexEntry\nIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf dataset not found.\n\n\n\n\n\n\n\nAbstractIndex.get_schema(ref)\nGet a schema record by reference.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (local:// or at://).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record as a dictionary with fields like ‘name’, ‘version’,\n\n\n\ndict\n‘fields’, etc.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\n\n\n\n\nAbstractIndex.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index.\nThe sample type is inferred from ds.sample_type. If schema_ref is not provided, the schema may be auto-published based on the sample type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to register in the index (any sample type).\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nOptional[str]\nOptional explicit schema reference. 
If not provided, the schema may be auto-published or inferred from ds.sample_type.\nNone\n\n\n**kwargs\n\nAdditional backend-specific options.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIndexEntry\nIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\nAbstractIndex.list_datasets()\nGet all dataset entries as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[IndexEntry]\nList of IndexEntry for each dataset.\n\n\n\n\n\n\n\nAbstractIndex.list_schemas()\nGet all schema records as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\nAbstractIndex.publish_schema(sample_type, *, version='1.0.0', **kwargs)\nPublish a schema for a sample type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[Packable]\nA Packable type (PackableSample subclass or @packable-decorated).\nrequired\n\n\nversion\nstr\nSemantic version string for the schema.\n'1.0.0'\n\n\n**kwargs\n\nAdditional backend-specific options.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSchema reference string:\n\n\n\nstr\n- Local: ‘local://schemas/{module.Class}@version’\n\n\n\nstr\n- Atmosphere: ‘at://did:plc:…/ac.foundation.dataset.sampleSchema/…’" 1630 + }, 1631 + { 1632 + "objectID": "api/AbstractIndex.html#optional-extensions", 1633 + "href": "api/AbstractIndex.html#optional-extensions", 1634 + "title": "AbstractIndex", 1635 + "section": "", 1636 + "text": "Some index implementations support additional features: - data_store: An AbstractDataStore for reading/writing dataset shards. If present, load_dataset will use it for S3 credential resolution." 1637 + }, 1638 + { 1639 + "objectID": "api/AbstractIndex.html#example", 1640 + "href": "api/AbstractIndex.html#example", 1641 + "title": "AbstractIndex", 1642 + "section": "", 1643 + "text": "::\n>>> def publish_and_list(index: AbstractIndex) -> None:\n... # Publish schemas for different types\n... 
schema1 = index.publish_schema(ImageSample, version=\"1.0.0\")\n... schema2 = index.publish_schema(TextSample, version=\"1.0.0\")\n...\n... # Insert datasets of different types\n... index.insert_dataset(image_ds, name=\"images\")\n... index.insert_dataset(text_ds, name=\"texts\")\n...\n... # List all datasets (mixed types)\n... for entry in index.list_datasets():\n... print(f\"{entry.name} -> {entry.schema_ref}\")" 1448 1644 }, 1449 1645 { 1450 1646 "objectID": "api/AbstractIndex.html#attributes", ··· 1458 1654 "href": "api/AbstractIndex.html#methods", 1459 1655 "title": "AbstractIndex", 1460 1656 "section": "", 1461 - "text": "Name\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python Packable type from a stored schema.\n\n\nget_dataset\nGet a dataset entry by name or reference.\n\n\nget_schema\nGet a schema record by reference.\n\n\ninsert_dataset\nInsert a dataset into the index.\n\n\nlist_datasets\nGet all dataset entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list.\n\n\npublish_schema\nPublish a schema for a sample type.\n\n\n\n\n\nAbstractIndex.decode_schema(ref)\nReconstruct a Python Packable type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a Packable class matching the schema definition.\nArgs: ref: Schema reference string (local:// or at://).\nReturns: A dynamically generated Packable class with fields matching the schema definition. The class can be used with Dataset[T] to load and iterate over samples.\nRaises: KeyError: If schema not found. 
ValueError: If schema cannot be decoded (unsupported field types).\nExample: >>> entry = index.get_dataset(“my-dataset”) >>> SampleType = index.decode_schema(entry.schema_ref) >>> ds = DatasetSampleType >>> for sample in ds.ordered(): … print(sample) # sample is instance of SampleType\n\n\n\nAbstractIndex.get_dataset(ref)\nGet a dataset entry by name or reference.\nArgs: ref: Dataset name, path, or full reference string.\nReturns: IndexEntry for the dataset.\nRaises: KeyError: If dataset not found.\n\n\n\nAbstractIndex.get_schema(ref)\nGet a schema record by reference.\nArgs: ref: Schema reference string (local:// or at://).\nReturns: Schema record as a dictionary with fields like ‘name’, ‘version’, ‘fields’, etc.\nRaises: KeyError: If schema not found.\n\n\n\nAbstractIndex.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index.\nThe sample type is inferred from ds.sample_type. If schema_ref is not provided, the schema may be auto-published based on the sample type.\nArgs: ds: The Dataset to register in the index (any sample type). name: Human-readable name for the dataset. schema_ref: Optional explicit schema reference. If not provided, the schema may be auto-published or inferred from ds.sample_type. **kwargs: Additional backend-specific options.\nReturns: IndexEntry for the inserted dataset.\n\n\n\nAbstractIndex.list_datasets()\nGet all dataset entries as a materialized list.\nReturns: List of IndexEntry for each dataset.\n\n\n\nAbstractIndex.list_schemas()\nGet all schema records as a materialized list.\nReturns: List of schema records as dictionaries.\n\n\n\nAbstractIndex.publish_schema(sample_type, *, version='1.0.0', **kwargs)\nPublish a schema for a sample type.\nArgs: sample_type: A Packable type (PackableSample subclass or @packable-decorated). version: Semantic version string for the schema. 
**kwargs: Additional backend-specific options.\nReturns: Schema reference string: - Local: ‘local://schemas/{module.Class}@version’ - Atmosphere: ‘at://did:plc:…/ac.foundation.dataset.sampleSchema/…’" 1657 + "text": "Name\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python Packable type from a stored schema.\n\n\nget_dataset\nGet a dataset entry by name or reference.\n\n\nget_schema\nGet a schema record by reference.\n\n\ninsert_dataset\nInsert a dataset into the index.\n\n\nlist_datasets\nGet all dataset entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list.\n\n\npublish_schema\nPublish a schema for a sample type.\n\n\n\n\n\nAbstractIndex.decode_schema(ref)\nReconstruct a Python Packable type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a Packable class matching the schema definition.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (local:// or at://).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nA dynamically generated Packable class with fields matching\n\n\n\nType[Packable]\nthe schema definition. The class can be used with\n\n\n\nType[Packable]\nDataset[T] to load and iterate over samples.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded (unsupported field types).\n\n\n\n\n\n\n::\n>>> entry = index.get_dataset(\"my-dataset\")\n>>> SampleType = index.decode_schema(entry.schema_ref)\n>>> ds = Dataset[SampleType](entry.data_urls[0])\n>>> for sample in ds.ordered():\n... 
print(sample) # sample is instance of SampleType\n\n\n\n\nAbstractIndex.get_dataset(ref)\nGet a dataset entry by name or reference.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nDataset name, path, or full reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIndexEntry\nIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf dataset not found.\n\n\n\n\n\n\n\nAbstractIndex.get_schema(ref)\nGet a schema record by reference.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (local:// or at://).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record as a dictionary with fields like ‘name’, ‘version’,\n\n\n\ndict\n‘fields’, etc.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\n\n\n\n\nAbstractIndex.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index.\nThe sample type is inferred from ds.sample_type. If schema_ref is not provided, the schema may be auto-published based on the sample type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to register in the index (any sample type).\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nOptional[str]\nOptional explicit schema reference. 
If not provided, the schema may be auto-published or inferred from ds.sample_type.\nNone\n\n\n**kwargs\n\nAdditional backend-specific options.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIndexEntry\nIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\nAbstractIndex.list_datasets()\nGet all dataset entries as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[IndexEntry]\nList of IndexEntry for each dataset.\n\n\n\n\n\n\n\nAbstractIndex.list_schemas()\nGet all schema records as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\nAbstractIndex.publish_schema(sample_type, *, version='1.0.0', **kwargs)\nPublish a schema for a sample type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[Packable]\nA Packable type (PackableSample subclass or @packable-decorated).\nrequired\n\n\nversion\nstr\nSemantic version string for the schema.\n'1.0.0'\n\n\n**kwargs\n\nAdditional backend-specific options.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSchema reference string:\n\n\n\nstr\n- Local: ‘local://schemas/{module.Class}@version’\n\n\n\nstr\n- Atmosphere: ‘at://did:plc:…/ac.foundation.dataset.sampleSchema/…’" 1462 1658 }, 1463 1659 { 1464 1660 "objectID": "api/local.LocalDatasetEntry.html", 1465 1661 "href": "api/local.LocalDatasetEntry.html", 1466 1662 "title": "local.LocalDatasetEntry", 1467 1663 "section": "", 1468 - "text": "local.LocalDatasetEntry(\n name,\n schema_ref,\n data_urls,\n metadata=None,\n _cid=None,\n _legacy_uuid=None,\n)\nIndex entry for a dataset stored in the local repository.\nImplements the IndexEntry protocol for compatibility with AbstractIndex. Uses dual identity: a content-addressable CID (ATProto-compatible) and a human-readable name.\nThe CID is generated from the entry’s content (schema_ref + data_urls), ensuring the same data produces the same CID whether stored locally or in the atmosphere. 
This enables seamless promotion from local to ATProto.\nAttributes: name: Human-readable name for this dataset. schema_ref: Reference to the schema for this dataset. data_urls: WebDataset URLs for the data. metadata: Arbitrary metadata dictionary, or None if not set.\n\n\n\n\n\nName\nDescription\n\n\n\n\ncid\nContent identifier (ATProto-compatible CID).\n\n\ndata_urls\nWebDataset URLs for the data.\n\n\nmetadata\nArbitrary metadata dictionary, or None if not set.\n\n\nname\nHuman-readable name for this dataset.\n\n\nsample_kind\nLegacy property: returns schema_ref for backwards compatibility.\n\n\nschema_ref\nReference to the schema for this dataset.\n\n\nwds_url\nLegacy property: returns first data URL for backwards compatibility.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_redis\nLoad an entry from Redis by CID.\n\n\nwrite_to\nPersist this index entry to Redis.\n\n\n\n\n\nlocal.LocalDatasetEntry.from_redis(redis, cid)\nLoad an entry from Redis by CID.\nArgs: redis: Redis connection to read from. cid: Content identifier of the entry to load.\nReturns: LocalDatasetEntry loaded from Redis.\nRaises: KeyError: If entry not found.\n\n\n\nlocal.LocalDatasetEntry.write_to(redis)\nPersist this index entry to Redis.\nStores the entry as a Redis hash with key ‘{REDIS_KEY_DATASET_ENTRY}:{cid}’.\nArgs: redis: Redis connection to write to." 1664 + "text": "local.LocalDatasetEntry(\n name,\n schema_ref,\n data_urls,\n metadata=None,\n _cid=None,\n _legacy_uuid=None,\n)\nIndex entry for a dataset stored in the local repository.\nImplements the IndexEntry protocol for compatibility with AbstractIndex. Uses dual identity: a content-addressable CID (ATProto-compatible) and a human-readable name.\nThe CID is generated from the entry’s content (schema_ref + data_urls), ensuring the same data produces the same CID whether stored locally or in the atmosphere. 
This enables seamless promotion from local to ATProto.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\nname\nstr\nHuman-readable name for this dataset.\n\n\nschema_ref\nstr\nReference to the schema for this dataset.\n\n\ndata_urls\nlist[str]\nWebDataset URLs for the data.\n\n\nmetadata\ndict | None\nArbitrary metadata dictionary, or None if not set.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_redis\nLoad an entry from Redis by CID.\n\n\nwrite_to\nPersist this index entry to Redis.\n\n\n\n\n\nlocal.LocalDatasetEntry.from_redis(redis, cid)\nLoad an entry from Redis by CID.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nredis\nRedis\nRedis connection to read from.\nrequired\n\n\ncid\nstr\nContent identifier of the entry to load.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry loaded from Redis.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf entry not found.\n\n\n\n\n\n\n\nlocal.LocalDatasetEntry.write_to(redis)\nPersist this index entry to Redis.\nStores the entry as a Redis hash with key ‘{REDIS_KEY_DATASET_ENTRY}:{cid}’.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nredis\nRedis\nRedis connection to write to.\nrequired" 1469 1665 }, 1470 1666 { 1471 1667 "objectID": "api/local.LocalDatasetEntry.html#attributes", 1472 1668 "href": "api/local.LocalDatasetEntry.html#attributes", 1473 1669 "title": "local.LocalDatasetEntry", 1474 1670 "section": "", 1475 - "text": "Name\nDescription\n\n\n\n\ncid\nContent identifier (ATProto-compatible CID).\n\n\ndata_urls\nWebDataset URLs for the data.\n\n\nmetadata\nArbitrary metadata dictionary, or None if not set.\n\n\nname\nHuman-readable name for this dataset.\n\n\nsample_kind\nLegacy property: returns schema_ref for backwards compatibility.\n\n\nschema_ref\nReference to the schema for this dataset.\n\n\nwds_url\nLegacy property: returns first data URL for backwards compatibility." 
1671 + "text": "Name\nType\nDescription\n\n\n\n\nname\nstr\nHuman-readable name for this dataset.\n\n\nschema_ref\nstr\nReference to the schema for this dataset.\n\n\ndata_urls\nlist[str]\nWebDataset URLs for the data.\n\n\nmetadata\ndict | None\nArbitrary metadata dictionary, or None if not set." 1476 1672 }, 1477 1673 { 1478 1674 "objectID": "api/local.LocalDatasetEntry.html#methods", 1479 1675 "href": "api/local.LocalDatasetEntry.html#methods", 1480 1676 "title": "local.LocalDatasetEntry", 1481 1677 "section": "", 1482 - "text": "Name\nDescription\n\n\n\n\nfrom_redis\nLoad an entry from Redis by CID.\n\n\nwrite_to\nPersist this index entry to Redis.\n\n\n\n\n\nlocal.LocalDatasetEntry.from_redis(redis, cid)\nLoad an entry from Redis by CID.\nArgs: redis: Redis connection to read from. cid: Content identifier of the entry to load.\nReturns: LocalDatasetEntry loaded from Redis.\nRaises: KeyError: If entry not found.\n\n\n\nlocal.LocalDatasetEntry.write_to(redis)\nPersist this index entry to Redis.\nStores the entry as a Redis hash with key ‘{REDIS_KEY_DATASET_ENTRY}:{cid}’.\nArgs: redis: Redis connection to write to." 
1678 + "text": "Name\nDescription\n\n\n\n\nfrom_redis\nLoad an entry from Redis by CID.\n\n\nwrite_to\nPersist this index entry to Redis.\n\n\n\n\n\nlocal.LocalDatasetEntry.from_redis(redis, cid)\nLoad an entry from Redis by CID.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nredis\nRedis\nRedis connection to read from.\nrequired\n\n\ncid\nstr\nContent identifier of the entry to load.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry loaded from Redis.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf entry not found.\n\n\n\n\n\n\n\nlocal.LocalDatasetEntry.write_to(redis)\nPersist this index entry to Redis.\nStores the entry as a Redis hash with key ‘{REDIS_KEY_DATASET_ENTRY}:{cid}’.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nredis\nRedis\nRedis connection to write to.\nrequired" 1483 1679 }, 1484 1680 { 1485 1681 "objectID": "api/S3Source.html", 1486 1682 "href": "api/S3Source.html", 1487 1683 "title": "S3Source", 1488 1684 "section": "", 1489 - "text": "S3Source(\n bucket,\n keys,\n endpoint=None,\n access_key=None,\n secret_key=None,\n region=None,\n _client=None,\n)\nData source for S3-compatible storage with explicit credentials.\nUses boto3 to stream directly from S3, supporting: - Standard AWS S3 - S3-compatible endpoints (Cloudflare R2, MinIO, etc.) - Private buckets with credentials - IAM role authentication (when keys not provided)\nUnlike URL-based approaches, this doesn’t require URL transformation or global gopen_schemes registration. Credentials are scoped to the source instance.\nAttributes: bucket: S3 bucket name. keys: List of object keys (paths within bucket). endpoint: Optional custom endpoint URL for S3-compatible services. access_key: Optional AWS access key ID. secret_key: Optional AWS secret access key. 
region: Optional AWS region (defaults to us-east-1).\nExample: >>> source = S3Source( … bucket=“my-datasets”, … keys=[“train/shard-000.tar”, “train/shard-001.tar”], … endpoint=“https://abc123.r2.cloudflarestorage.com”, … access_key=“AKIAIOSFODNN7EXAMPLE”, … secret_key=“wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY”, … ) >>> for shard_id, stream in source.shards: … process(stream)\n\n\n\n\n\nName\nDescription\n\n\n\n\nshard_list\nReturn list of S3 URIs for the shards (deprecated, use list_shards()).\n\n\nshards\nLazily yield (s3_uri, stream) pairs for each shard.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_credentials\nCreate S3Source from a credentials dictionary.\n\n\nfrom_urls\nCreate S3Source from s3:// URLs.\n\n\nlist_shards\nReturn list of S3 URIs for the shards.\n\n\nopen_shard\nOpen a single shard by S3 URI.\n\n\n\n\n\nS3Source.from_credentials(credentials, bucket, keys)\nCreate S3Source from a credentials dictionary.\nAccepts the same credential format used by S3DataStore.\nArgs: credentials: Dict with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_ENDPOINT. bucket: S3 bucket name. keys: List of object keys.\nReturns: Configured S3Source.\nExample: >>> creds = { … “AWS_ACCESS_KEY_ID”: “…”, … “AWS_SECRET_ACCESS_KEY”: “…”, … “AWS_ENDPOINT”: “https://r2.example.com”, … } >>> source = S3Source.from_credentials(creds, “my-bucket”, [“data.tar”])\n\n\n\nS3Source.from_urls(\n urls,\n *,\n endpoint=None,\n access_key=None,\n secret_key=None,\n region=None,\n)\nCreate S3Source from s3:// URLs.\nParses s3://bucket/key URLs and extracts bucket and keys. All URLs must be in the same bucket.\nArgs: urls: List of s3:// URLs. endpoint: Optional custom endpoint. access_key: Optional access key. secret_key: Optional secret key. 
region: Optional region.\nReturns: S3Source configured for the given URLs.\nRaises: ValueError: If URLs are not valid s3:// URLs or span multiple buckets.\nExample: >>> source = S3Source.from_urls( … [“s3://my-bucket/train-000.tar”, “s3://my-bucket/train-001.tar”], … endpoint=“https://r2.example.com”, … )\n\n\n\nS3Source.list_shards()\nReturn list of S3 URIs for the shards.\n\n\n\nS3Source.open_shard(shard_id)\nOpen a single shard by S3 URI.\nArgs: shard_id: S3 URI of the shard (s3://bucket/key).\nReturns: StreamingBody for reading the object.\nRaises: KeyError: If shard_id is not in list_shards()." 1685 + "text": "S3Source(\n bucket,\n keys,\n endpoint=None,\n access_key=None,\n secret_key=None,\n region=None,\n _client=None,\n)\nData source for S3-compatible storage with explicit credentials.\nUses boto3 to stream directly from S3, supporting: - Standard AWS S3 - S3-compatible endpoints (Cloudflare R2, MinIO, etc.) - Private buckets with credentials - IAM role authentication (when keys not provided)\nUnlike URL-based approaches, this doesn’t require URL transformation or global gopen_schemes registration. Credentials are scoped to the source instance.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\nbucket\nstr\nS3 bucket name.\n\n\nkeys\nlist[str]\nList of object keys (paths within bucket).\n\n\nendpoint\nstr | None\nOptional custom endpoint URL for S3-compatible services.\n\n\naccess_key\nstr | None\nOptional AWS access key ID.\n\n\nsecret_key\nstr | None\nOptional AWS secret access key.\n\n\nregion\nstr | None\nOptional AWS region (defaults to us-east-1).\n\n\n\n\n\n\n::\n>>> source = S3Source(\n... bucket=\"my-datasets\",\n... keys=[\"train/shard-000.tar\", \"train/shard-001.tar\"],\n... endpoint=\"https://abc123.r2.cloudflarestorage.com\",\n... access_key=\"AKIAIOSFODNN7EXAMPLE\",\n... secret_key=\"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\",\n... )\n>>> for shard_id, stream in source.shards:\n... 
process(stream)\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_credentials\nCreate S3Source from a credentials dictionary.\n\n\nfrom_urls\nCreate S3Source from s3:// URLs.\n\n\nlist_shards\nReturn list of S3 URIs for the shards.\n\n\nopen_shard\nOpen a single shard by S3 URI.\n\n\n\n\n\nS3Source.from_credentials(credentials, bucket, keys)\nCreate S3Source from a credentials dictionary.\nAccepts the same credential format used by S3DataStore.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncredentials\ndict[str, str]\nDict with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_ENDPOINT.\nrequired\n\n\nbucket\nstr\nS3 bucket name.\nrequired\n\n\nkeys\nlist[str]\nList of object keys.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\n'S3Source'\nConfigured S3Source.\n\n\n\n\n\n\n::\n>>> creds = {\n... \"AWS_ACCESS_KEY_ID\": \"...\",\n... \"AWS_SECRET_ACCESS_KEY\": \"...\",\n... \"AWS_ENDPOINT\": \"https://r2.example.com\",\n... }\n>>> source = S3Source.from_credentials(creds, \"my-bucket\", [\"data.tar\"])\n\n\n\n\nS3Source.from_urls(\n urls,\n *,\n endpoint=None,\n access_key=None,\n secret_key=None,\n region=None,\n)\nCreate S3Source from s3:// URLs.\nParses s3://bucket/key URLs and extracts bucket and keys. All URLs must be in the same bucket.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nurls\nlist[str]\nList of s3:// URLs.\nrequired\n\n\nendpoint\nstr | None\nOptional custom endpoint.\nNone\n\n\naccess_key\nstr | None\nOptional access key.\nNone\n\n\nsecret_key\nstr | None\nOptional secret key.\nNone\n\n\nregion\nstr | None\nOptional region.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\n'S3Source'\nS3Source configured for the given URLs.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf URLs are not valid s3:// URLs or span multiple buckets.\n\n\n\n\n\n\n::\n>>> source = S3Source.from_urls(\n... [\"s3://my-bucket/train-000.tar\", \"s3://my-bucket/train-001.tar\"],\n... 
endpoint=\"https://r2.example.com\",\n... )\n\n\n\n\nS3Source.list_shards()\nReturn list of S3 URIs for the shards.\n\n\n\nS3Source.open_shard(shard_id)\nOpen a single shard by S3 URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nshard_id\nstr\nS3 URI of the shard (s3://bucket/key).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIO[bytes]\nStreamingBody for reading the object.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf shard_id is not in list_shards()." 1490 1686 }, 1491 1687 { 1492 1688 "objectID": "api/S3Source.html#attributes", 1493 1689 "href": "api/S3Source.html#attributes", 1494 1690 "title": "S3Source", 1495 1691 "section": "", 1496 - "text": "Name\nDescription\n\n\n\n\nshard_list\nReturn list of S3 URIs for the shards (deprecated, use list_shards()).\n\n\nshards\nLazily yield (s3_uri, stream) pairs for each shard." 1692 + "text": "Name\nType\nDescription\n\n\n\n\nbucket\nstr\nS3 bucket name.\n\n\nkeys\nlist[str]\nList of object keys (paths within bucket).\n\n\nendpoint\nstr | None\nOptional custom endpoint URL for S3-compatible services.\n\n\naccess_key\nstr | None\nOptional AWS access key ID.\n\n\nsecret_key\nstr | None\nOptional AWS secret access key.\n\n\nregion\nstr | None\nOptional AWS region (defaults to us-east-1)." 1693 + }, 1694 + { 1695 + "objectID": "api/S3Source.html#example", 1696 + "href": "api/S3Source.html#example", 1697 + "title": "S3Source", 1698 + "section": "", 1699 + "text": "::\n>>> source = S3Source(\n... bucket=\"my-datasets\",\n... keys=[\"train/shard-000.tar\", \"train/shard-001.tar\"],\n... endpoint=\"https://abc123.r2.cloudflarestorage.com\",\n... access_key=\"AKIAIOSFODNN7EXAMPLE\",\n... secret_key=\"wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY\",\n... )\n>>> for shard_id, stream in source.shards:\n... 
process(stream)" 1497 1700 }, 1498 1701 { 1499 1702 "objectID": "api/S3Source.html#methods", 1500 1703 "href": "api/S3Source.html#methods", 1501 1704 "title": "S3Source", 1502 1705 "section": "", 1503 - "text": "Name\nDescription\n\n\n\n\nfrom_credentials\nCreate S3Source from a credentials dictionary.\n\n\nfrom_urls\nCreate S3Source from s3:// URLs.\n\n\nlist_shards\nReturn list of S3 URIs for the shards.\n\n\nopen_shard\nOpen a single shard by S3 URI.\n\n\n\n\n\nS3Source.from_credentials(credentials, bucket, keys)\nCreate S3Source from a credentials dictionary.\nAccepts the same credential format used by S3DataStore.\nArgs: credentials: Dict with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_ENDPOINT. bucket: S3 bucket name. keys: List of object keys.\nReturns: Configured S3Source.\nExample: >>> creds = { … “AWS_ACCESS_KEY_ID”: “…”, … “AWS_SECRET_ACCESS_KEY”: “…”, … “AWS_ENDPOINT”: “https://r2.example.com”, … } >>> source = S3Source.from_credentials(creds, “my-bucket”, [“data.tar”])\n\n\n\nS3Source.from_urls(\n urls,\n *,\n endpoint=None,\n access_key=None,\n secret_key=None,\n region=None,\n)\nCreate S3Source from s3:// URLs.\nParses s3://bucket/key URLs and extracts bucket and keys. All URLs must be in the same bucket.\nArgs: urls: List of s3:// URLs. endpoint: Optional custom endpoint. access_key: Optional access key. secret_key: Optional secret key. region: Optional region.\nReturns: S3Source configured for the given URLs.\nRaises: ValueError: If URLs are not valid s3:// URLs or span multiple buckets.\nExample: >>> source = S3Source.from_urls( … [“s3://my-bucket/train-000.tar”, “s3://my-bucket/train-001.tar”], … endpoint=“https://r2.example.com”, … )\n\n\n\nS3Source.list_shards()\nReturn list of S3 URIs for the shards.\n\n\n\nS3Source.open_shard(shard_id)\nOpen a single shard by S3 URI.\nArgs: shard_id: S3 URI of the shard (s3://bucket/key).\nReturns: StreamingBody for reading the object.\nRaises: KeyError: If shard_id is not in list_shards()." 
1706 + "text": "Name\nDescription\n\n\n\n\nfrom_credentials\nCreate S3Source from a credentials dictionary.\n\n\nfrom_urls\nCreate S3Source from s3:// URLs.\n\n\nlist_shards\nReturn list of S3 URIs for the shards.\n\n\nopen_shard\nOpen a single shard by S3 URI.\n\n\n\n\n\nS3Source.from_credentials(credentials, bucket, keys)\nCreate S3Source from a credentials dictionary.\nAccepts the same credential format used by S3DataStore.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncredentials\ndict[str, str]\nDict with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_ENDPOINT.\nrequired\n\n\nbucket\nstr\nS3 bucket name.\nrequired\n\n\nkeys\nlist[str]\nList of object keys.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\n'S3Source'\nConfigured S3Source.\n\n\n\n\n\n\n::\n>>> creds = {\n... \"AWS_ACCESS_KEY_ID\": \"...\",\n... \"AWS_SECRET_ACCESS_KEY\": \"...\",\n... \"AWS_ENDPOINT\": \"https://r2.example.com\",\n... }\n>>> source = S3Source.from_credentials(creds, \"my-bucket\", [\"data.tar\"])\n\n\n\n\nS3Source.from_urls(\n urls,\n *,\n endpoint=None,\n access_key=None,\n secret_key=None,\n region=None,\n)\nCreate S3Source from s3:// URLs.\nParses s3://bucket/key URLs and extracts bucket and keys. All URLs must be in the same bucket.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nurls\nlist[str]\nList of s3:// URLs.\nrequired\n\n\nendpoint\nstr | None\nOptional custom endpoint.\nNone\n\n\naccess_key\nstr | None\nOptional access key.\nNone\n\n\nsecret_key\nstr | None\nOptional secret key.\nNone\n\n\nregion\nstr | None\nOptional region.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\n'S3Source'\nS3Source configured for the given URLs.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf URLs are not valid s3:// URLs or span multiple buckets.\n\n\n\n\n\n\n::\n>>> source = S3Source.from_urls(\n... [\"s3://my-bucket/train-000.tar\", \"s3://my-bucket/train-001.tar\"],\n... endpoint=\"https://r2.example.com\",\n... 
)\n\n\n\n\nS3Source.list_shards()\nReturn list of S3 URIs for the shards.\n\n\n\nS3Source.open_shard(shard_id)\nOpen a single shard by S3 URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nshard_id\nstr\nS3 URI of the shard (s3://bucket/key).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIO[bytes]\nStreamingBody for reading the object.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf shard_id is not in list_shards()." 1504 1707 }, 1505 1708 { 1506 1709 "objectID": "api/IndexEntry.html", 1507 1710 "href": "api/IndexEntry.html", 1508 1711 "title": "IndexEntry", 1509 1712 "section": "", 1510 - "text": "IndexEntry()\nCommon interface for index entries (local or atmosphere).\nBoth LocalDatasetEntry and atmosphere DatasetRecord-based entries should satisfy this protocol, enabling code that works with either.\nProperties: name: Human-readable dataset name schema_ref: Reference to schema (local:// path or AT URI) data_urls: WebDataset URLs for the data metadata: Arbitrary metadata dict, or None\n\n\n\n\n\nName\nDescription\n\n\n\n\ndata_urls\nWebDataset URLs for the data.\n\n\nmetadata\nArbitrary metadata dictionary, or None if not set.\n\n\nname\nHuman-readable dataset name.\n\n\nschema_ref\nReference to the schema for this dataset." 1713 + "text": "IndexEntry()\nCommon interface for index entries (local or atmosphere).\nBoth LocalDatasetEntry and atmosphere DatasetRecord-based entries should satisfy this protocol, enabling code that works with either.\n\n\nname: Human-readable dataset name schema_ref: Reference to schema (local:// path or AT URI) data_urls: WebDataset URLs for the data metadata: Arbitrary metadata dict, or None\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndata_urls\nWebDataset URLs for the data.\n\n\nmetadata\nArbitrary metadata dictionary, or None if not set.\n\n\nname\nHuman-readable dataset name.\n\n\nschema_ref\nReference to the schema for this dataset." 
1714 + }, 1715 + { 1716 + "objectID": "api/IndexEntry.html#properties", 1717 + "href": "api/IndexEntry.html#properties", 1718 + "title": "IndexEntry", 1719 + "section": "", 1720 + "text": "name: Human-readable dataset name schema_ref: Reference to schema (local:// path or AT URI) data_urls: WebDataset URLs for the data metadata: Arbitrary metadata dict, or None" 1511 1721 }, 1512 1722 { 1513 1723 "objectID": "api/IndexEntry.html#attributes", ··· 1570 1780 "href": "api/URLSource.html", 1571 1781 "title": "URLSource", 1572 1782 "section": "", 1573 - "text": "URLSource(url)\nData source for WebDataset-compatible URLs.\nWraps WebDataset’s gopen to open URLs using built-in handlers for http, https, pipe, gs, hf, sftp, etc. Supports brace expansion for shard patterns like “data-{000..099}.tar”.\nThis is the default source type when a string URL is passed to Dataset.\nAttributes: url: URL or brace pattern for the shards.\nExample: >>> source = URLSource(“https://example.com/train-{000..009}.tar”) >>> for shard_id, stream in source.shards: … print(f”Streaming {shard_id}“)\n\n\n\n\n\nName\nDescription\n\n\n\n\nshard_list\nExpand brace pattern and return list of shard URLs (deprecated, use list_shards()).\n\n\nshards\nLazily yield (url, stream) pairs for each shard.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nlist_shards\nExpand brace pattern and return list of shard URLs.\n\n\nopen_shard\nOpen a single shard by URL.\n\n\n\n\n\nURLSource.list_shards()\nExpand brace pattern and return list of shard URLs.\n\n\n\nURLSource.open_shard(shard_id)\nOpen a single shard by URL.\nArgs: shard_id: URL of the shard to open.\nReturns: File-like stream from gopen.\nRaises: KeyError: If shard_id is not in list_shards()." 1783 + "text": "URLSource(url)\nData source for WebDataset-compatible URLs.\nWraps WebDataset’s gopen to open URLs using built-in handlers for http, https, pipe, gs, hf, sftp, etc. 
Supports brace expansion for shard patterns like “data-{000..099}.tar”.\nThis is the default source type when a string URL is passed to Dataset.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\nurl\nstr\nURL or brace pattern for the shards.\n\n\n\n\n\n\n::\n>>> source = URLSource(\"https://example.com/train-{000..009}.tar\")\n>>> for shard_id, stream in source.shards:\n... print(f\"Streaming {shard_id}\")\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nlist_shards\nExpand brace pattern and return list of shard URLs.\n\n\nopen_shard\nOpen a single shard by URL.\n\n\n\n\n\nURLSource.list_shards()\nExpand brace pattern and return list of shard URLs.\n\n\n\nURLSource.open_shard(shard_id)\nOpen a single shard by URL.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nshard_id\nstr\nURL of the shard to open.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIO[bytes]\nFile-like stream from gopen.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf shard_id is not in list_shards()." 1574 1784 }, 1575 1785 { 1576 1786 "objectID": "api/URLSource.html#attributes", 1577 1787 "href": "api/URLSource.html#attributes", 1578 1788 "title": "URLSource", 1579 1789 "section": "", 1580 - "text": "Name\nDescription\n\n\n\n\nshard_list\nExpand brace pattern and return list of shard URLs (deprecated, use list_shards()).\n\n\nshards\nLazily yield (url, stream) pairs for each shard." 1790 + "text": "Name\nType\nDescription\n\n\n\n\nurl\nstr\nURL or brace pattern for the shards." 1791 + }, 1792 + { 1793 + "objectID": "api/URLSource.html#example", 1794 + "href": "api/URLSource.html#example", 1795 + "title": "URLSource", 1796 + "section": "", 1797 + "text": "::\n>>> source = URLSource(\"https://example.com/train-{000..009}.tar\")\n>>> for shard_id, stream in source.shards:\n... 
print(f\"Streaming {shard_id}\")" 1581 1798 }, 1582 1799 { 1583 1800 "objectID": "api/URLSource.html#methods", 1584 1801 "href": "api/URLSource.html#methods", 1585 1802 "title": "URLSource", 1586 1803 "section": "", 1587 - "text": "Name\nDescription\n\n\n\n\nlist_shards\nExpand brace pattern and return list of shard URLs.\n\n\nopen_shard\nOpen a single shard by URL.\n\n\n\n\n\nURLSource.list_shards()\nExpand brace pattern and return list of shard URLs.\n\n\n\nURLSource.open_shard(shard_id)\nOpen a single shard by URL.\nArgs: shard_id: URL of the shard to open.\nReturns: File-like stream from gopen.\nRaises: KeyError: If shard_id is not in list_shards()." 1804 + "text": "Name\nDescription\n\n\n\n\nlist_shards\nExpand brace pattern and return list of shard URLs.\n\n\nopen_shard\nOpen a single shard by URL.\n\n\n\n\n\nURLSource.list_shards()\nExpand brace pattern and return list of shard URLs.\n\n\n\nURLSource.open_shard(shard_id)\nOpen a single shard by URL.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nshard_id\nstr\nURL of the shard to open.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIO[bytes]\nFile-like stream from gopen.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf shard_id is not in list_shards()." 
1588 1805 }, 1589 1806 { 1590 1807 "objectID": "api/DatasetPublisher.html", 1591 1808 "href": "api/DatasetPublisher.html", 1592 1809 "title": "DatasetPublisher", 1593 1810 "section": "", 1594 - "text": "atmosphere.DatasetPublisher(client)\nPublishes dataset index records to ATProto.\nThis class creates dataset records that reference a schema and point to external storage (WebDataset URLs) or ATProto blobs.\nExample: >>> dataset = atdata.DatasetMySample >>> >>> client = AtmosphereClient() >>> client.login(“handle”, “password”) >>> >>> publisher = DatasetPublisher(client) >>> uri = publisher.publish( … dataset, … name=“My Training Data”, … description=“Training data for my model”, … tags=[“computer-vision”, “training”], … )\n\n\n\n\n\nName\nDescription\n\n\n\n\npublish\nPublish a dataset index record to ATProto.\n\n\npublish_with_blobs\nPublish a dataset with data stored as ATProto blobs.\n\n\npublish_with_urls\nPublish a dataset record with explicit URLs.\n\n\n\n\n\natmosphere.DatasetPublisher.publish(\n dataset,\n *,\n name,\n schema_uri=None,\n description=None,\n tags=None,\n license=None,\n auto_publish_schema=True,\n schema_version='1.0.0',\n rkey=None,\n)\nPublish a dataset index record to ATProto.\nArgs: dataset: The Dataset to publish. name: Human-readable dataset name. schema_uri: AT URI of the schema record. If not provided and auto_publish_schema is True, the schema will be published. description: Human-readable description. tags: Searchable tags for discovery. license: SPDX license identifier (e.g., ‘MIT’, ‘Apache-2.0’). auto_publish_schema: If True and schema_uri not provided, automatically publish the schema first. schema_version: Version for auto-published schema. 
rkey: Optional explicit record key.\nReturns: The AT URI of the created dataset record.\nRaises: ValueError: If schema_uri is not provided and auto_publish_schema is False.\n\n\n\natmosphere.DatasetPublisher.publish_with_blobs(\n blobs,\n schema_uri,\n *,\n name,\n description=None,\n tags=None,\n license=None,\n metadata=None,\n mime_type='application/x-tar',\n rkey=None,\n)\nPublish a dataset with data stored as ATProto blobs.\nThis method uploads the provided data as blobs to the PDS and creates a dataset record referencing them. Suitable for smaller datasets that fit within blob size limits (typically 50MB per blob, configurable).\nArgs: blobs: List of binary data (e.g., tar shards) to upload as blobs. schema_uri: AT URI of the schema record. name: Human-readable dataset name. description: Human-readable description. tags: Searchable tags for discovery. license: SPDX license identifier. metadata: Arbitrary metadata dictionary. mime_type: MIME type for the blobs (default: application/x-tar). rkey: Optional explicit record key.\nReturns: The AT URI of the created dataset record.\nNote: Blobs are only retained by the PDS when referenced in a committed record. This method handles that automatically.\n\n\n\natmosphere.DatasetPublisher.publish_with_urls(\n urls,\n schema_uri,\n *,\n name,\n description=None,\n tags=None,\n license=None,\n metadata=None,\n rkey=None,\n)\nPublish a dataset record with explicit URLs.\nThis method allows publishing a dataset record without having a Dataset object, useful for registering existing WebDataset files.\nArgs: urls: List of WebDataset URLs with brace notation. schema_uri: AT URI of the schema record. name: Human-readable dataset name. description: Human-readable description. tags: Searchable tags for discovery. license: SPDX license identifier. metadata: Arbitrary metadata dictionary. rkey: Optional explicit record key.\nReturns: The AT URI of the created dataset record." 
1811 + "text": "atmosphere.DatasetPublisher(client)\nPublishes dataset index records to ATProto.\nThis class creates dataset records that reference a schema and point to external storage (WebDataset URLs) or ATProto blobs.\n\n\n::\n>>> dataset = atdata.Dataset[MySample](\"s3://bucket/data-{000000..000009}.tar\")\n>>>\n>>> client = AtmosphereClient()\n>>> client.login(\"handle\", \"password\")\n>>>\n>>> publisher = DatasetPublisher(client)\n>>> uri = publisher.publish(\n... dataset,\n... name=\"My Training Data\",\n... description=\"Training data for my model\",\n... tags=[\"computer-vision\", \"training\"],\n... )\n\n\n\n\n\n\nName\nDescription\n\n\n\n\npublish\nPublish a dataset index record to ATProto.\n\n\npublish_with_blobs\nPublish a dataset with data stored as ATProto blobs.\n\n\npublish_with_urls\nPublish a dataset record with explicit URLs.\n\n\n\n\n\natmosphere.DatasetPublisher.publish(\n dataset,\n *,\n name,\n schema_uri=None,\n description=None,\n tags=None,\n license=None,\n auto_publish_schema=True,\n schema_version='1.0.0',\n rkey=None,\n)\nPublish a dataset index record to ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndataset\nDataset[ST]\nThe Dataset to publish.\nrequired\n\n\nname\nstr\nHuman-readable dataset name.\nrequired\n\n\nschema_uri\nOptional[str]\nAT URI of the schema record. 
If not provided and auto_publish_schema is True, the schema will be published.\nNone\n\n\ndescription\nOptional[str]\nHuman-readable description.\nNone\n\n\ntags\nOptional[list[str]]\nSearchable tags for discovery.\nNone\n\n\nlicense\nOptional[str]\nSPDX license identifier (e.g., ‘MIT’, ‘Apache-2.0’).\nNone\n\n\nauto_publish_schema\nbool\nIf True and schema_uri not provided, automatically publish the schema first.\nTrue\n\n\nschema_version\nstr\nVersion for auto-published schema.\n'1.0.0'\n\n\nrkey\nOptional[str]\nOptional explicit record key.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created dataset record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf schema_uri is not provided and auto_publish_schema is False.\n\n\n\n\n\n\n\natmosphere.DatasetPublisher.publish_with_blobs(\n blobs,\n schema_uri,\n *,\n name,\n description=None,\n tags=None,\n license=None,\n metadata=None,\n mime_type='application/x-tar',\n rkey=None,\n)\nPublish a dataset with data stored as ATProto blobs.\nThis method uploads the provided data as blobs to the PDS and creates a dataset record referencing them. 
Suitable for smaller datasets that fit within blob size limits (typically 50MB per blob, configurable).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nblobs\nlist[bytes]\nList of binary data (e.g., tar shards) to upload as blobs.\nrequired\n\n\nschema_uri\nstr\nAT URI of the schema record.\nrequired\n\n\nname\nstr\nHuman-readable dataset name.\nrequired\n\n\ndescription\nOptional[str]\nHuman-readable description.\nNone\n\n\ntags\nOptional[list[str]]\nSearchable tags for discovery.\nNone\n\n\nlicense\nOptional[str]\nSPDX license identifier.\nNone\n\n\nmetadata\nOptional[dict]\nArbitrary metadata dictionary.\nNone\n\n\nmime_type\nstr\nMIME type for the blobs (default: application/x-tar).\n'application/x-tar'\n\n\nrkey\nOptional[str]\nOptional explicit record key.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created dataset record.\n\n\n\n\n\n\nBlobs are only retained by the PDS when referenced in a committed record. This method handles that automatically.\n\n\n\n\natmosphere.DatasetPublisher.publish_with_urls(\n urls,\n schema_uri,\n *,\n name,\n description=None,\n tags=None,\n license=None,\n metadata=None,\n rkey=None,\n)\nPublish a dataset record with explicit URLs.\nThis method allows publishing a dataset record without having a Dataset object, useful for registering existing WebDataset files.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nurls\nlist[str]\nList of WebDataset URLs with brace notation.\nrequired\n\n\nschema_uri\nstr\nAT URI of the schema record.\nrequired\n\n\nname\nstr\nHuman-readable dataset name.\nrequired\n\n\ndescription\nOptional[str]\nHuman-readable description.\nNone\n\n\ntags\nOptional[list[str]]\nSearchable tags for discovery.\nNone\n\n\nlicense\nOptional[str]\nSPDX license identifier.\nNone\n\n\nmetadata\nOptional[dict]\nArbitrary metadata dictionary.\nNone\n\n\nrkey\nOptional[str]\nOptional explicit record key.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT 
URI of the created dataset record." 1812 + }, 1813 + { 1814 + "objectID": "api/DatasetPublisher.html#example", 1815 + "href": "api/DatasetPublisher.html#example", 1816 + "title": "DatasetPublisher", 1817 + "section": "", 1818 + "text": "::\n>>> dataset = atdata.Dataset[MySample](\"s3://bucket/data-{000000..000009}.tar\")\n>>>\n>>> client = AtmosphereClient()\n>>> client.login(\"handle\", \"password\")\n>>>\n>>> publisher = DatasetPublisher(client)\n>>> uri = publisher.publish(\n... dataset,\n... name=\"My Training Data\",\n... description=\"Training data for my model\",\n... tags=[\"computer-vision\", \"training\"],\n... )" 1595 1819 }, 1596 1820 { 1597 1821 "objectID": "api/DatasetPublisher.html#methods", 1598 1822 "href": "api/DatasetPublisher.html#methods", 1599 1823 "title": "DatasetPublisher", 1600 1824 "section": "", 1601 - "text": "Name\nDescription\n\n\n\n\npublish\nPublish a dataset index record to ATProto.\n\n\npublish_with_blobs\nPublish a dataset with data stored as ATProto blobs.\n\n\npublish_with_urls\nPublish a dataset record with explicit URLs.\n\n\n\n\n\natmosphere.DatasetPublisher.publish(\n dataset,\n *,\n name,\n schema_uri=None,\n description=None,\n tags=None,\n license=None,\n auto_publish_schema=True,\n schema_version='1.0.0',\n rkey=None,\n)\nPublish a dataset index record to ATProto.\nArgs: dataset: The Dataset to publish. name: Human-readable dataset name. schema_uri: AT URI of the schema record. If not provided and auto_publish_schema is True, the schema will be published. description: Human-readable description. tags: Searchable tags for discovery. license: SPDX license identifier (e.g., ‘MIT’, ‘Apache-2.0’). auto_publish_schema: If True and schema_uri not provided, automatically publish the schema first. schema_version: Version for auto-published schema. 
rkey: Optional explicit record key.\nReturns: The AT URI of the created dataset record.\nRaises: ValueError: If schema_uri is not provided and auto_publish_schema is False.\n\n\n\natmosphere.DatasetPublisher.publish_with_blobs(\n blobs,\n schema_uri,\n *,\n name,\n description=None,\n tags=None,\n license=None,\n metadata=None,\n mime_type='application/x-tar',\n rkey=None,\n)\nPublish a dataset with data stored as ATProto blobs.\nThis method uploads the provided data as blobs to the PDS and creates a dataset record referencing them. Suitable for smaller datasets that fit within blob size limits (typically 50MB per blob, configurable).\nArgs: blobs: List of binary data (e.g., tar shards) to upload as blobs. schema_uri: AT URI of the schema record. name: Human-readable dataset name. description: Human-readable description. tags: Searchable tags for discovery. license: SPDX license identifier. metadata: Arbitrary metadata dictionary. mime_type: MIME type for the blobs (default: application/x-tar). rkey: Optional explicit record key.\nReturns: The AT URI of the created dataset record.\nNote: Blobs are only retained by the PDS when referenced in a committed record. This method handles that automatically.\n\n\n\natmosphere.DatasetPublisher.publish_with_urls(\n urls,\n schema_uri,\n *,\n name,\n description=None,\n tags=None,\n license=None,\n metadata=None,\n rkey=None,\n)\nPublish a dataset record with explicit URLs.\nThis method allows publishing a dataset record without having a Dataset object, useful for registering existing WebDataset files.\nArgs: urls: List of WebDataset URLs with brace notation. schema_uri: AT URI of the schema record. name: Human-readable dataset name. description: Human-readable description. tags: Searchable tags for discovery. license: SPDX license identifier. metadata: Arbitrary metadata dictionary. rkey: Optional explicit record key.\nReturns: The AT URI of the created dataset record." 
1825 + "text": "Name\nDescription\n\n\n\n\npublish\nPublish a dataset index record to ATProto.\n\n\npublish_with_blobs\nPublish a dataset with data stored as ATProto blobs.\n\n\npublish_with_urls\nPublish a dataset record with explicit URLs.\n\n\n\n\n\natmosphere.DatasetPublisher.publish(\n dataset,\n *,\n name,\n schema_uri=None,\n description=None,\n tags=None,\n license=None,\n auto_publish_schema=True,\n schema_version='1.0.0',\n rkey=None,\n)\nPublish a dataset index record to ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndataset\nDataset[ST]\nThe Dataset to publish.\nrequired\n\n\nname\nstr\nHuman-readable dataset name.\nrequired\n\n\nschema_uri\nOptional[str]\nAT URI of the schema record. If not provided and auto_publish_schema is True, the schema will be published.\nNone\n\n\ndescription\nOptional[str]\nHuman-readable description.\nNone\n\n\ntags\nOptional[list[str]]\nSearchable tags for discovery.\nNone\n\n\nlicense\nOptional[str]\nSPDX license identifier (e.g., ‘MIT’, ‘Apache-2.0’).\nNone\n\n\nauto_publish_schema\nbool\nIf True and schema_uri not provided, automatically publish the schema first.\nTrue\n\n\nschema_version\nstr\nVersion for auto-published schema.\n'1.0.0'\n\n\nrkey\nOptional[str]\nOptional explicit record key.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created dataset record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf schema_uri is not provided and auto_publish_schema is False.\n\n\n\n\n\n\n\natmosphere.DatasetPublisher.publish_with_blobs(\n blobs,\n schema_uri,\n *,\n name,\n description=None,\n tags=None,\n license=None,\n metadata=None,\n mime_type='application/x-tar',\n rkey=None,\n)\nPublish a dataset with data stored as ATProto blobs.\nThis method uploads the provided data as blobs to the PDS and creates a dataset record referencing them. 
Suitable for smaller datasets that fit within blob size limits (typically 50MB per blob, configurable).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nblobs\nlist[bytes]\nList of binary data (e.g., tar shards) to upload as blobs.\nrequired\n\n\nschema_uri\nstr\nAT URI of the schema record.\nrequired\n\n\nname\nstr\nHuman-readable dataset name.\nrequired\n\n\ndescription\nOptional[str]\nHuman-readable description.\nNone\n\n\ntags\nOptional[list[str]]\nSearchable tags for discovery.\nNone\n\n\nlicense\nOptional[str]\nSPDX license identifier.\nNone\n\n\nmetadata\nOptional[dict]\nArbitrary metadata dictionary.\nNone\n\n\nmime_type\nstr\nMIME type for the blobs (default: application/x-tar).\n'application/x-tar'\n\n\nrkey\nOptional[str]\nOptional explicit record key.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created dataset record.\n\n\n\n\n\n\nBlobs are only retained by the PDS when referenced in a committed record. This method handles that automatically.\n\n\n\n\natmosphere.DatasetPublisher.publish_with_urls(\n urls,\n schema_uri,\n *,\n name,\n description=None,\n tags=None,\n license=None,\n metadata=None,\n rkey=None,\n)\nPublish a dataset record with explicit URLs.\nThis method allows publishing a dataset record without having a Dataset object, useful for registering existing WebDataset files.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nurls\nlist[str]\nList of WebDataset URLs with brace notation.\nrequired\n\n\nschema_uri\nstr\nAT URI of the schema record.\nrequired\n\n\nname\nstr\nHuman-readable dataset name.\nrequired\n\n\ndescription\nOptional[str]\nHuman-readable description.\nNone\n\n\ntags\nOptional[list[str]]\nSearchable tags for discovery.\nNone\n\n\nlicense\nOptional[str]\nSPDX license identifier.\nNone\n\n\nmetadata\nOptional[dict]\nArbitrary metadata dictionary.\nNone\n\n\nrkey\nOptional[str]\nOptional explicit record key.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT 
URI of the created dataset record." 1602 1826 }, 1603 1827 { 1604 1828 "objectID": "api/SchemaPublisher.html", 1605 1829 "href": "api/SchemaPublisher.html", 1606 1830 "title": "SchemaPublisher", 1607 1831 "section": "", 1608 - "text": "atmosphere.SchemaPublisher(client)\nPublishes PackableSample schemas to ATProto.\nThis class introspects a PackableSample class to extract its field definitions and publishes them as an ATProto schema record.\nExample: >>> @atdata.packable … class MySample: … image: NDArray … label: str … >>> client = AtmosphereClient() >>> client.login(“handle”, “password”) >>> >>> publisher = SchemaPublisher(client) >>> uri = publisher.publish(MySample, version=“1.0.0”) >>> print(uri) at://did:plc:…/ac.foundation.dataset.sampleSchema/…\n\n\n\n\n\nName\nDescription\n\n\n\n\npublish\nPublish a PackableSample schema to ATProto.\n\n\n\n\n\natmosphere.SchemaPublisher.publish(\n sample_type,\n *,\n name=None,\n version='1.0.0',\n description=None,\n metadata=None,\n rkey=None,\n)\nPublish a PackableSample schema to ATProto.\nArgs: sample_type: The PackableSample class to publish. name: Human-readable name. Defaults to the class name. version: Semantic version string (e.g., ‘1.0.0’). description: Human-readable description. metadata: Arbitrary metadata dictionary. rkey: Optional explicit record key. If not provided, a TID is generated.\nReturns: The AT URI of the created schema record.\nRaises: ValueError: If sample_type is not a dataclass or client is not authenticated. TypeError: If a field type is not supported." 1832 + "text": "atmosphere.SchemaPublisher(client)\nPublishes PackableSample schemas to ATProto.\nThis class introspects a PackableSample class to extract its field definitions and publishes them as an ATProto schema record.\n\n\n::\n>>> @atdata.packable\n... class MySample:\n... image: NDArray\n... 
label: str\n...\n>>> client = AtmosphereClient()\n>>> client.login(\"handle\", \"password\")\n>>>\n>>> publisher = SchemaPublisher(client)\n>>> uri = publisher.publish(MySample, version=\"1.0.0\")\n>>> print(uri)\nat://did:plc:.../ac.foundation.dataset.sampleSchema/...\n\n\n\n\n\n\nName\nDescription\n\n\n\n\npublish\nPublish a PackableSample schema to ATProto.\n\n\n\n\n\natmosphere.SchemaPublisher.publish(\n sample_type,\n *,\n name=None,\n version='1.0.0',\n description=None,\n metadata=None,\n rkey=None,\n)\nPublish a PackableSample schema to ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[ST]\nThe PackableSample class to publish.\nrequired\n\n\nname\nOptional[str]\nHuman-readable name. Defaults to the class name.\nNone\n\n\nversion\nstr\nSemantic version string (e.g., ‘1.0.0’).\n'1.0.0'\n\n\ndescription\nOptional[str]\nHuman-readable description.\nNone\n\n\nmetadata\nOptional[dict]\nArbitrary metadata dictionary.\nNone\n\n\nrkey\nOptional[str]\nOptional explicit record key. If not provided, a TID is generated.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created schema record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf sample_type is not a dataclass or client is not authenticated.\n\n\n\nTypeError\nIf a field type is not supported." 1833 + }, 1834 + { 1835 + "objectID": "api/SchemaPublisher.html#example", 1836 + "href": "api/SchemaPublisher.html#example", 1837 + "title": "SchemaPublisher", 1838 + "section": "", 1839 + "text": "::\n>>> @atdata.packable\n... class MySample:\n... image: NDArray\n... label: str\n...\n>>> client = AtmosphereClient()\n>>> client.login(\"handle\", \"password\")\n>>>\n>>> publisher = SchemaPublisher(client)\n>>> uri = publisher.publish(MySample, version=\"1.0.0\")\n>>> print(uri)\nat://did:plc:.../ac.foundation.dataset.sampleSchema/..." 
1609 1840 }, 1610 1841 { 1611 1842 "objectID": "api/SchemaPublisher.html#methods", 1612 1843 "href": "api/SchemaPublisher.html#methods", 1613 1844 "title": "SchemaPublisher", 1614 1845 "section": "", 1615 - "text": "Name\nDescription\n\n\n\n\npublish\nPublish a PackableSample schema to ATProto.\n\n\n\n\n\natmosphere.SchemaPublisher.publish(\n sample_type,\n *,\n name=None,\n version='1.0.0',\n description=None,\n metadata=None,\n rkey=None,\n)\nPublish a PackableSample schema to ATProto.\nArgs: sample_type: The PackableSample class to publish. name: Human-readable name. Defaults to the class name. version: Semantic version string (e.g., ‘1.0.0’). description: Human-readable description. metadata: Arbitrary metadata dictionary. rkey: Optional explicit record key. If not provided, a TID is generated.\nReturns: The AT URI of the created schema record.\nRaises: ValueError: If sample_type is not a dataclass or client is not authenticated. TypeError: If a field type is not supported." 1846 + "text": "Name\nDescription\n\n\n\n\npublish\nPublish a PackableSample schema to ATProto.\n\n\n\n\n\natmosphere.SchemaPublisher.publish(\n sample_type,\n *,\n name=None,\n version='1.0.0',\n description=None,\n metadata=None,\n rkey=None,\n)\nPublish a PackableSample schema to ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[ST]\nThe PackableSample class to publish.\nrequired\n\n\nname\nOptional[str]\nHuman-readable name. Defaults to the class name.\nNone\n\n\nversion\nstr\nSemantic version string (e.g., ‘1.0.0’).\n'1.0.0'\n\n\ndescription\nOptional[str]\nHuman-readable description.\nNone\n\n\nmetadata\nOptional[dict]\nArbitrary metadata dictionary.\nNone\n\n\nrkey\nOptional[str]\nOptional explicit record key. 
If not provided, a TID is generated.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created schema record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf sample_type is not a dataclass or client is not authenticated.\n\n\n\nTypeError\nIf a field type is not supported." 1616 1847 }, 1617 1848 { 1618 1849 "objectID": "api/promote_to_atmosphere.html", 1619 1850 "href": "api/promote_to_atmosphere.html", 1620 1851 "title": "promote_to_atmosphere", 1621 1852 "section": "", 1622 - "text": "promote_to_atmosphere\npromote.promote_to_atmosphere(\n local_entry,\n local_index,\n atmosphere_client,\n *,\n data_store=None,\n name=None,\n description=None,\n tags=None,\n license=None,\n)\nPromote a local dataset to the atmosphere network.\nThis function takes a locally-indexed dataset and publishes it to ATProto, making it discoverable on the federated atmosphere network.\nArgs: local_entry: The LocalDatasetEntry to promote. local_index: Local index containing the schema for this entry. atmosphere_client: Authenticated AtmosphereClient. data_store: Optional data store for copying data to new location. If None, the existing data_urls are used as-is. name: Override name for the atmosphere record. Defaults to local name. description: Optional description for the dataset. tags: Optional tags for discovery. license: Optional license identifier.\nReturns: AT URI of the created atmosphere dataset record.\nRaises: KeyError: If schema not found in local index. 
ValueError: If local entry has no data URLs.\nExample: >>> entry = local_index.get_dataset(“mnist-train”) >>> uri = promote_to_atmosphere(entry, local_index, client) >>> print(uri) at://did:plc:abc123/ac.foundation.dataset.datasetIndex/…" 1853 + "text": "promote.promote_to_atmosphere(\n local_entry,\n local_index,\n atmosphere_client,\n *,\n data_store=None,\n name=None,\n description=None,\n tags=None,\n license=None,\n)\nPromote a local dataset to the atmosphere network.\nThis function takes a locally-indexed dataset and publishes it to ATProto, making it discoverable on the federated atmosphere network.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nlocal_entry\nLocalDatasetEntry\nThe LocalDatasetEntry to promote.\nrequired\n\n\nlocal_index\nLocalIndex\nLocal index containing the schema for this entry.\nrequired\n\n\natmosphere_client\nAtmosphereClient\nAuthenticated AtmosphereClient.\nrequired\n\n\ndata_store\nAbstractDataStore | None\nOptional data store for copying data to new location. If None, the existing data_urls are used as-is.\nNone\n\n\nname\nstr | None\nOverride name for the atmosphere record. Defaults to local name.\nNone\n\n\ndescription\nstr | None\nOptional description for the dataset.\nNone\n\n\ntags\nlist[str] | None\nOptional tags for discovery.\nNone\n\n\nlicense\nstr | None\nOptional license identifier.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nAT URI of the created atmosphere dataset record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found in local index.\n\n\n\nValueError\nIf local entry has no data URLs.\n\n\n\n\n\n\n::\n>>> entry = local_index.get_dataset(\"mnist-train\")\n>>> uri = promote_to_atmosphere(entry, local_index, client)\n>>> print(uri)\nat://did:plc:abc123/ac.foundation.dataset.datasetIndex/..." 
1854 + }, 1855 + { 1856 + "objectID": "api/promote_to_atmosphere.html#parameters", 1857 + "href": "api/promote_to_atmosphere.html#parameters", 1858 + "title": "promote_to_atmosphere", 1859 + "section": "", 1860 + "text": "Name\nType\nDescription\nDefault\n\n\n\n\nlocal_entry\nLocalDatasetEntry\nThe LocalDatasetEntry to promote.\nrequired\n\n\nlocal_index\nLocalIndex\nLocal index containing the schema for this entry.\nrequired\n\n\natmosphere_client\nAtmosphereClient\nAuthenticated AtmosphereClient.\nrequired\n\n\ndata_store\nAbstractDataStore | None\nOptional data store for copying data to new location. If None, the existing data_urls are used as-is.\nNone\n\n\nname\nstr | None\nOverride name for the atmosphere record. Defaults to local name.\nNone\n\n\ndescription\nstr | None\nOptional description for the dataset.\nNone\n\n\ntags\nlist[str] | None\nOptional tags for discovery.\nNone\n\n\nlicense\nstr | None\nOptional license identifier.\nNone" 1861 + }, 1862 + { 1863 + "objectID": "api/promote_to_atmosphere.html#returns", 1864 + "href": "api/promote_to_atmosphere.html#returns", 1865 + "title": "promote_to_atmosphere", 1866 + "section": "", 1867 + "text": "Name\nType\nDescription\n\n\n\n\n\nstr\nAT URI of the created atmosphere dataset record." 1868 + }, 1869 + { 1870 + "objectID": "api/promote_to_atmosphere.html#raises", 1871 + "href": "api/promote_to_atmosphere.html#raises", 1872 + "title": "promote_to_atmosphere", 1873 + "section": "", 1874 + "text": "Name\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found in local index.\n\n\n\nValueError\nIf local entry has no data URLs." 
1875 + }, 1876 + { 1877 + "objectID": "api/promote_to_atmosphere.html#example", 1878 + "href": "api/promote_to_atmosphere.html#example", 1879 + "title": "promote_to_atmosphere", 1880 + "section": "", 1881 + "text": "::\n>>> entry = local_index.get_dataset(\"mnist-train\")\n>>> uri = promote_to_atmosphere(entry, local_index, client)\n>>> print(uri)\nat://did:plc:abc123/ac.foundation.dataset.datasetIndex/..." 1623 1882 }, 1624 1883 { 1625 1884 "objectID": "api/load_dataset.html", 1626 1885 "href": "api/load_dataset.html", 1627 1886 "title": "load_dataset", 1628 1887 "section": "", 1629 - "text": "load_dataset\nload_dataset(\n path,\n sample_type=None,\n *,\n split=None,\n data_files=None,\n streaming=False,\n index=None,\n)\nLoad a dataset from local files, remote URLs, or an index.\nThis function provides a HuggingFace Datasets-style interface for loading atdata typed datasets. It handles path resolution, split detection, and returns either a single Dataset or a DatasetDict depending on the split parameter.\nWhen no sample_type is provided, returns a Dataset[DictSample] that provides dynamic dict-like access to fields. Use .as_type(MyType) to convert to a typed schema.\nArgs: path: Path to dataset. Can be: - Index lookup: “@handle/dataset-name” or “@local/dataset-name” - WebDataset brace notation: “path/to/{train,test}-{000..099}.tar” - Local directory: “./data/” (scans for .tar files) - Glob pattern: “path/to/.tar” - Remote URL: ”s3://bucket/path/data-.tar” - Single file: “path/to/data.tar”\nsample_type: The PackableSample subclass defining the schema. If None,\n returns ``Dataset[DictSample]`` with dynamic field access. Can also\n be resolved from an index when using @handle/dataset syntax.\n\nsplit: Which split to load. If None, returns a DatasetDict with all\n detected splits. If specified (e.g., \"train\", \"test\"), returns\n a single Dataset for that split.\n\ndata_files: Optional explicit mapping of data files. 
Can be:\n - str: Single file pattern\n - list[str]: List of file patterns (assigned to \"train\")\n - dict[str, str | list[str]]: Explicit split -> files mapping\n\nstreaming: If True, explicitly marks the dataset for streaming mode.\n Note: atdata Datasets are already lazy/streaming via WebDataset\n pipelines, so this parameter primarily signals intent.\n\nindex: Optional AbstractIndex for dataset lookup. Required when using\n @handle/dataset syntax. When provided with an indexed path, the\n schema can be auto-resolved from the index.\nReturns: If split is None: DatasetDict with all detected splits. If split is specified: Dataset for that split. Type is ST if sample_type provided, otherwise DictSample.\nRaises: ValueError: If the specified split is not found. FileNotFoundError: If no data files are found at the path. KeyError: If dataset not found in index.\nExample: >>> # Load without type - get DictSample for exploration >>> ds = load_dataset(“./data/train.tar”, split=“train”) >>> for sample in ds.ordered(): … print(sample.keys()) # Explore fields … print(sample[“text”]) # Dict-style access … print(sample.label) # Attribute access >>> >>> # Convert to typed schema >>> typed_ds = ds.as_type(TextData) >>> >>> # Or load with explicit type directly >>> train_ds = load_dataset(“./data/train-*.tar”, TextData, split=“train”) >>> >>> # Load from index with auto-type resolution >>> index = LocalIndex() >>> ds = load_dataset(“@local/my-dataset”, index=index, split=“train”)" 1888 + "text": "load_dataset(\n path,\n sample_type=None,\n *,\n split=None,\n data_files=None,\n streaming=False,\n index=None,\n)\nLoad a dataset from local files, remote URLs, or an index.\nThis function provides a HuggingFace Datasets-style interface for loading atdata typed datasets. 
It handles path resolution, split detection, and returns either a single Dataset or a DatasetDict depending on the split parameter.\nWhen no sample_type is provided, returns a Dataset[DictSample] that provides dynamic dict-like access to fields. Use .as_type(MyType) to convert to a typed schema.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\npath\nstr\nPath to dataset. Can be: - Index lookup: “@handle/dataset-name” or “@local/dataset-name” - WebDataset brace notation: “path/to/{train,test}-{000..099}.tar” - Local directory: “./data/” (scans for .tar files) - Glob pattern: “path/to/.tar” - Remote URL: ”s3://bucket/path/data-.tar” - Single file: “path/to/data.tar”\nrequired\n\n\nsample_type\nType[ST] | None\nThe PackableSample subclass defining the schema. If None, returns Dataset[DictSample] with dynamic field access. Can also be resolved from an index when using @handle/dataset syntax.\nNone\n\n\nsplit\nstr | None\nWhich split to load. If None, returns a DatasetDict with all detected splits. If specified (e.g., “train”, “test”), returns a single Dataset for that split.\nNone\n\n\ndata_files\nstr | list[str] | dict[str, str | list[str]] | None\nOptional explicit mapping of data files. Can be: - str: Single file pattern - list[str]: List of file patterns (assigned to “train”) - dict[str, str | list[str]]: Explicit split -> files mapping\nNone\n\n\nstreaming\nbool\nIf True, explicitly marks the dataset for streaming mode. Note: atdata Datasets are already lazy/streaming via WebDataset pipelines, so this parameter primarily signals intent.\nFalse\n\n\nindex\nOptional['AbstractIndex']\nOptional AbstractIndex for dataset lookup. Required when using @handle/dataset syntax. 
When provided with an indexed path, the schema can be auto-resolved from the index.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nDataset[ST] | DatasetDict[ST]\nIf split is None: DatasetDict with all detected splits.\n\n\n\nDataset[ST] | DatasetDict[ST]\nIf split is specified: Dataset for that split.\n\n\n\nDataset[ST] | DatasetDict[ST]\nType is ST if sample_type provided, otherwise DictSample.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the specified split is not found.\n\n\n\nFileNotFoundError\nIf no data files are found at the path.\n\n\n\nKeyError\nIf dataset not found in index.\n\n\n\n\n\n\n::\n>>> # Load without type - get DictSample for exploration\n>>> ds = load_dataset(\"./data/train.tar\", split=\"train\")\n>>> for sample in ds.ordered():\n... print(sample.keys()) # Explore fields\n... print(sample[\"text\"]) # Dict-style access\n... print(sample.label) # Attribute access\n>>>\n>>> # Convert to typed schema\n>>> typed_ds = ds.as_type(TextData)\n>>>\n>>> # Or load with explicit type directly\n>>> train_ds = load_dataset(\"./data/train-*.tar\", TextData, split=\"train\")\n>>>\n>>> # Load from index with auto-type resolution\n>>> index = LocalIndex()\n>>> ds = load_dataset(\"@local/my-dataset\", index=index, split=\"train\")" 1889 + }, 1890 + { 1891 + "objectID": "api/load_dataset.html#parameters", 1892 + "href": "api/load_dataset.html#parameters", 1893 + "title": "load_dataset", 1894 + "section": "", 1895 + "text": "Name\nType\nDescription\nDefault\n\n\n\n\npath\nstr\nPath to dataset. Can be: - Index lookup: “@handle/dataset-name” or “@local/dataset-name” - WebDataset brace notation: “path/to/{train,test}-{000..099}.tar” - Local directory: “./data/” (scans for .tar files) - Glob pattern: “path/to/.tar” - Remote URL: ”s3://bucket/path/data-.tar” - Single file: “path/to/data.tar”\nrequired\n\n\nsample_type\nType[ST] | None\nThe PackableSample subclass defining the schema. 
If None, returns Dataset[DictSample] with dynamic field access. Can also be resolved from an index when using @handle/dataset syntax.\nNone\n\n\nsplit\nstr | None\nWhich split to load. If None, returns a DatasetDict with all detected splits. If specified (e.g., “train”, “test”), returns a single Dataset for that split.\nNone\n\n\ndata_files\nstr | list[str] | dict[str, str | list[str]] | None\nOptional explicit mapping of data files. Can be: - str: Single file pattern - list[str]: List of file patterns (assigned to “train”) - dict[str, str | list[str]]: Explicit split -> files mapping\nNone\n\n\nstreaming\nbool\nIf True, explicitly marks the dataset for streaming mode. Note: atdata Datasets are already lazy/streaming via WebDataset pipelines, so this parameter primarily signals intent.\nFalse\n\n\nindex\nOptional['AbstractIndex']\nOptional AbstractIndex for dataset lookup. Required when using @handle/dataset syntax. When provided with an indexed path, the schema can be auto-resolved from the index.\nNone" 1896 + }, 1897 + { 1898 + "objectID": "api/load_dataset.html#returns", 1899 + "href": "api/load_dataset.html#returns", 1900 + "title": "load_dataset", 1901 + "section": "", 1902 + "text": "Name\nType\nDescription\n\n\n\n\n\nDataset[ST] | DatasetDict[ST]\nIf split is None: DatasetDict with all detected splits.\n\n\n\nDataset[ST] | DatasetDict[ST]\nIf split is specified: Dataset for that split.\n\n\n\nDataset[ST] | DatasetDict[ST]\nType is ST if sample_type provided, otherwise DictSample." 1903 + }, 1904 + { 1905 + "objectID": "api/load_dataset.html#raises", 1906 + "href": "api/load_dataset.html#raises", 1907 + "title": "load_dataset", 1908 + "section": "", 1909 + "text": "Name\nType\nDescription\n\n\n\n\n\nValueError\nIf the specified split is not found.\n\n\n\nFileNotFoundError\nIf no data files are found at the path.\n\n\n\nKeyError\nIf dataset not found in index." 
1910 + }, 1911 + { 1912 + "objectID": "api/load_dataset.html#example", 1913 + "href": "api/load_dataset.html#example", 1914 + "title": "load_dataset", 1915 + "section": "", 1916 + "text": "::\n>>> # Load without type - get DictSample for exploration\n>>> ds = load_dataset(\"./data/train.tar\", split=\"train\")\n>>> for sample in ds.ordered():\n... print(sample.keys()) # Explore fields\n... print(sample[\"text\"]) # Dict-style access\n... print(sample.label) # Attribute access\n>>>\n>>> # Convert to typed schema\n>>> typed_ds = ds.as_type(TextData)\n>>>\n>>> # Or load with explicit type directly\n>>> train_ds = load_dataset(\"./data/train-*.tar\", TextData, split=\"train\")\n>>>\n>>> # Load from index with auto-type resolution\n>>> index = LocalIndex()\n>>> ds = load_dataset(\"@local/my-dataset\", index=index, split=\"train\")" 1630 1917 }, 1631 1918 { 1632 1919 "objectID": "api/PackableSample.html", 1633 1920 "href": "api/PackableSample.html", 1634 1921 "title": "PackableSample", 1635 1922 "section": "", 1636 - "text": "PackableSample()\nBase class for samples that can be serialized with msgpack.\nThis abstract base class provides automatic serialization/deserialization for dataclass-based samples. Fields annotated as NDArray or NDArray | None are automatically converted between numpy arrays and bytes during packing/unpacking.\nSubclasses should be defined either by: 1. Direct inheritance with the @dataclass decorator 2. 
Using the @packable decorator (recommended)\nExample: >>> @packable … class MyData: … name: str … embeddings: NDArray … >>> sample = MyData(name=“test”, embeddings=np.array([1.0, 2.0])) >>> packed = sample.packed # Serialize to bytes >>> restored = MyData.from_bytes(packed) # Deserialize\n\n\n\n\n\nName\nDescription\n\n\n\n\nas_wds\nPack this sample’s data for writing to WebDataset.\n\n\npacked\nPack this sample’s data into msgpack bytes.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_bytes\nCreate a sample instance from raw msgpack bytes.\n\n\nfrom_data\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nPackableSample.from_bytes(bs)\nCreate a sample instance from raw msgpack bytes.\nArgs: bs: Raw bytes from a msgpack-serialized sample.\nReturns: A new instance of this sample class deserialized from the bytes.\n\n\n\nPackableSample.from_data(data)\nCreate a sample instance from unpacked msgpack data.\nArgs: data: Dictionary with keys matching the sample’s field names.\nReturns: New instance with NDArray fields auto-converted from bytes." 1923 + "text": "PackableSample()\nBase class for samples that can be serialized with msgpack.\nThis abstract base class provides automatic serialization/deserialization for dataclass-based samples. Fields annotated as NDArray or NDArray | None are automatically converted between numpy arrays and bytes during packing/unpacking.\nSubclasses should be defined either by: 1. Direct inheritance with the @dataclass decorator 2. Using the @packable decorator (recommended)\n\n\n::\n>>> @packable\n... class MyData:\n... name: str\n... 
embeddings: NDArray\n...\n>>> sample = MyData(name=\"test\", embeddings=np.array([1.0, 2.0]))\n>>> packed = sample.packed # Serialize to bytes\n>>> restored = MyData.from_bytes(packed) # Deserialize\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nas_wds\nPack this sample’s data for writing to WebDataset.\n\n\npacked\nPack this sample’s data into msgpack bytes.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_bytes\nCreate a sample instance from raw msgpack bytes.\n\n\nfrom_data\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nPackableSample.from_bytes(bs)\nCreate a sample instance from raw msgpack bytes.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbs\nbytes\nRaw bytes from a msgpack-serialized sample.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSelf\nA new instance of this sample class deserialized from the bytes.\n\n\n\n\n\n\n\nPackableSample.from_data(data)\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\nWDSRawSample\nDictionary with keys matching the sample’s field names.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSelf\nNew instance with NDArray fields auto-converted from bytes." 1924 + }, 1925 + { 1926 + "objectID": "api/PackableSample.html#example", 1927 + "href": "api/PackableSample.html#example", 1928 + "title": "PackableSample", 1929 + "section": "", 1930 + "text": "::\n>>> @packable\n... class MyData:\n... name: str\n... 
embeddings: NDArray\n...\n>>> sample = MyData(name=\"test\", embeddings=np.array([1.0, 2.0]))\n>>> packed = sample.packed # Serialize to bytes\n>>> restored = MyData.from_bytes(packed) # Deserialize" 1637 1931 }, 1638 1932 { 1639 1933 "objectID": "api/PackableSample.html#attributes", ··· 1647 1941 "href": "api/PackableSample.html#methods", 1648 1942 "title": "PackableSample", 1649 1943 "section": "", 1650 - "text": "Name\nDescription\n\n\n\n\nfrom_bytes\nCreate a sample instance from raw msgpack bytes.\n\n\nfrom_data\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nPackableSample.from_bytes(bs)\nCreate a sample instance from raw msgpack bytes.\nArgs: bs: Raw bytes from a msgpack-serialized sample.\nReturns: A new instance of this sample class deserialized from the bytes.\n\n\n\nPackableSample.from_data(data)\nCreate a sample instance from unpacked msgpack data.\nArgs: data: Dictionary with keys matching the sample’s field names.\nReturns: New instance with NDArray fields auto-converted from bytes." 1944 + "text": "Name\nDescription\n\n\n\n\nfrom_bytes\nCreate a sample instance from raw msgpack bytes.\n\n\nfrom_data\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nPackableSample.from_bytes(bs)\nCreate a sample instance from raw msgpack bytes.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbs\nbytes\nRaw bytes from a msgpack-serialized sample.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSelf\nA new instance of this sample class deserialized from the bytes.\n\n\n\n\n\n\n\nPackableSample.from_data(data)\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\nWDSRawSample\nDictionary with keys matching the sample’s field names.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSelf\nNew instance with NDArray fields auto-converted from bytes." 
1651 1945 }, 1652 1946 { 1653 1947 "objectID": "api/SchemaLoader.html", 1654 1948 "href": "api/SchemaLoader.html", 1655 1949 "title": "SchemaLoader", 1656 1950 "section": "", 1657 - "text": "atmosphere.SchemaLoader(client)\nLoads PackableSample schemas from ATProto.\nThis class fetches schema records from ATProto and can list available schemas from a repository.\nExample: >>> client = AtmosphereClient() >>> client.login(“handle”, “password”) >>> >>> loader = SchemaLoader(client) >>> schema = loader.get(“at://did:plc:…/ac.foundation.dataset.sampleSchema/…”) >>> print(schema[“name”]) ‘MySample’\n\n\n\n\n\nName\nDescription\n\n\n\n\nget\nFetch a schema record by AT URI.\n\n\nlist_all\nList schema records from a repository.\n\n\n\n\n\natmosphere.SchemaLoader.get(uri)\nFetch a schema record by AT URI.\nArgs: uri: The AT URI of the schema record.\nReturns: The schema record as a dictionary.\nRaises: ValueError: If the record is not a schema record. atproto.exceptions.AtProtocolError: If record not found.\n\n\n\natmosphere.SchemaLoader.list_all(repo=None, limit=100)\nList schema records from a repository.\nArgs: repo: The DID of the repository. Defaults to authenticated user. limit: Maximum number of records to return.\nReturns: List of schema records." 
1951 + "text": "atmosphere.SchemaLoader(client)\nLoads PackableSample schemas from ATProto.\nThis class fetches schema records from ATProto and can list available schemas from a repository.\n\n\n::\n>>> client = AtmosphereClient()\n>>> client.login(\"handle\", \"password\")\n>>>\n>>> loader = SchemaLoader(client)\n>>> schema = loader.get(\"at://did:plc:.../ac.foundation.dataset.sampleSchema/...\")\n>>> print(schema[\"name\"])\n'MySample'\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nget\nFetch a schema record by AT URI.\n\n\nlist_all\nList schema records from a repository.\n\n\n\n\n\natmosphere.SchemaLoader.get(uri)\nFetch a schema record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the schema record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nThe schema record as a dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the record is not a schema record.\n\n\n\natproto.exceptions.AtProtocolError\nIf record not found.\n\n\n\n\n\n\n\natmosphere.SchemaLoader.list_all(repo=None, limit=100)\nList schema records from a repository.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID of the repository. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number of records to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records." 
1952 + }, 1953 + { 1954 + "objectID": "api/SchemaLoader.html#example", 1955 + "href": "api/SchemaLoader.html#example", 1956 + "title": "SchemaLoader", 1957 + "section": "", 1958 + "text": "::\n>>> client = AtmosphereClient()\n>>> client.login(\"handle\", \"password\")\n>>>\n>>> loader = SchemaLoader(client)\n>>> schema = loader.get(\"at://did:plc:.../ac.foundation.dataset.sampleSchema/...\")\n>>> print(schema[\"name\"])\n'MySample'" 1658 1959 }, 1659 1960 { 1660 1961 "objectID": "api/SchemaLoader.html#methods", 1661 1962 "href": "api/SchemaLoader.html#methods", 1662 1963 "title": "SchemaLoader", 1663 1964 "section": "", 1664 - "text": "Name\nDescription\n\n\n\n\nget\nFetch a schema record by AT URI.\n\n\nlist_all\nList schema records from a repository.\n\n\n\n\n\natmosphere.SchemaLoader.get(uri)\nFetch a schema record by AT URI.\nArgs: uri: The AT URI of the schema record.\nReturns: The schema record as a dictionary.\nRaises: ValueError: If the record is not a schema record. atproto.exceptions.AtProtocolError: If record not found.\n\n\n\natmosphere.SchemaLoader.list_all(repo=None, limit=100)\nList schema records from a repository.\nArgs: repo: The DID of the repository. Defaults to authenticated user. limit: Maximum number of records to return.\nReturns: List of schema records." 
1965 + "text": "Name\nDescription\n\n\n\n\nget\nFetch a schema record by AT URI.\n\n\nlist_all\nList schema records from a repository.\n\n\n\n\n\natmosphere.SchemaLoader.get(uri)\nFetch a schema record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the schema record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nThe schema record as a dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf the record is not a schema record.\n\n\n\natproto.exceptions.AtProtocolError\nIf record not found.\n\n\n\n\n\n\n\natmosphere.SchemaLoader.list_all(repo=None, limit=100)\nList schema records from a repository.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID of the repository. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number of records to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records." 1665 1966 }, 1666 1967 { 1667 1968 "objectID": "tutorials/atmosphere.html",

+30 -30

docs/sitemap.xml

··· 34 34 </url> 35 35 <url> 36 36 <loc>https://github.com/your-org/atdata/api/DatasetDict.html</loc> 37 - <lastmod>2026-01-23T03:31:26.249Z</lastmod> 37 + <lastmod>2026-01-24T19:19:45.336Z</lastmod> 38 38 </url> 39 39 <url> 40 40 <loc>https://github.com/your-org/atdata/api/AtmosphereClient.html</loc> 41 - <lastmod>2026-01-23T03:31:26.313Z</lastmod> 41 + <lastmod>2026-01-23T23:20:15.723Z</lastmod> 42 42 </url> 43 43 <url> 44 44 <loc>https://github.com/your-org/atdata/api/DictSample.html</loc> 45 - <lastmod>2026-01-23T03:31:26.226Z</lastmod> 45 + <lastmod>2026-01-23T23:20:15.573Z</lastmod> 46 46 </url> 47 47 <url> 48 48 <loc>https://github.com/your-org/atdata/api/LensLoader.html</loc> 49 - <lastmod>2026-01-23T03:31:26.348Z</lastmod> 49 + <lastmod>2026-01-23T23:20:15.788Z</lastmod> 50 50 </url> 51 51 <url> 52 52 <loc>https://github.com/your-org/atdata/api/AtmosphereIndex.html</loc> 53 - <lastmod>2026-01-23T03:31:26.320Z</lastmod> 53 + <lastmod>2026-01-23T23:20:15.736Z</lastmod> 54 54 </url> 55 55 <url> 56 56 <loc>https://github.com/your-org/atdata/api/DataSource.html</loc> 57 - <lastmod>2026-01-23T22:28:15.705Z</lastmod> 57 + <lastmod>2026-01-23T23:20:15.642Z</lastmod> 58 58 </url> 59 59 <url> 60 60 <loc>https://github.com/your-org/atdata/api/DatasetLoader.html</loc> 61 - <lastmod>2026-01-23T03:31:26.340Z</lastmod> 61 + <lastmod>2026-01-23T23:20:15.773Z</lastmod> 62 62 </url> 63 63 <url> 64 64 <loc>https://github.com/your-org/atdata/api/Lens.html</loc> 65 - <lastmod>2026-01-23T22:28:15.684Z</lastmod> 65 + <lastmod>2026-01-24T19:29:16.065Z</lastmod> 66 66 </url> 67 67 <url> 68 68 <loc>https://github.com/your-org/atdata/api/local.Index.html</loc> 69 - <lastmod>2026-01-23T03:31:26.290Z</lastmod> 69 + <lastmod>2026-01-23T23:20:15.683Z</lastmod> 70 70 </url> 71 71 <url> 72 72 <loc>https://github.com/your-org/atdata/api/Dataset.html</loc> 73 - <lastmod>2026-01-23T03:31:26.234Z</lastmod> 73 + <lastmod>2026-01-23T23:20:15.588Z</lastmod> 74 74 </url> 75 75 <url> 76 76 
<loc>https://github.com/your-org/atdata/api/AbstractDataStore.html</loc> 77 - <lastmod>2026-01-23T22:28:15.703Z</lastmod> 77 + <lastmod>2026-01-23T23:20:15.638Z</lastmod> 78 78 </url> 79 79 <url> 80 80 <loc>https://github.com/your-org/atdata/api/local.S3DataStore.html</loc> 81 - <lastmod>2026-01-23T03:31:26.298Z</lastmod> 81 + <lastmod>2026-01-23T23:03:53.869Z</lastmod> 82 82 </url> 83 83 <url> 84 84 <loc>https://github.com/your-org/atdata/api/AtUri.html</loc> 85 - <lastmod>2026-01-23T03:31:26.350Z</lastmod> 85 + <lastmod>2026-01-23T23:20:15.791Z</lastmod> 86 86 </url> 87 87 <url> 88 88 <loc>https://github.com/your-org/atdata/api/Packable-protocol.html</loc> 89 - <lastmod>2026-01-23T22:28:15.690Z</lastmod> 89 + <lastmod>2026-01-23T23:20:15.617Z</lastmod> 90 90 </url> 91 91 <url> 92 92 <loc>https://github.com/your-org/atdata/api/packable.html</loc> 93 - <lastmod>2026-01-23T22:28:15.652Z</lastmod> 93 + <lastmod>2026-01-23T23:21:24.522Z</lastmod> 94 94 </url> 95 95 <url> 96 96 <loc>https://github.com/your-org/atdata/index.html</loc> ··· 98 98 </url> 99 99 <url> 100 100 <loc>https://github.com/your-org/atdata/api/SampleBatch.html</loc> 101 - <lastmod>2026-01-23T03:31:26.235Z</lastmod> 101 + <lastmod>2026-01-23T23:20:15.589Z</lastmod> 102 102 </url> 103 103 <url> 104 104 <loc>https://github.com/your-org/atdata/api/LensPublisher.html</loc> 105 - <lastmod>2026-01-23T03:31:26.344Z</lastmod> 105 + <lastmod>2026-01-23T23:20:15.781Z</lastmod> 106 106 </url> 107 107 <url> 108 108 <loc>https://github.com/your-org/atdata/api/AtmosphereIndexEntry.html</loc> 109 - <lastmod>2026-01-23T03:31:26.322Z</lastmod> 109 + <lastmod>2026-01-23T23:03:53.910Z</lastmod> 110 110 </url> 111 111 <url> 112 112 <loc>https://github.com/your-org/atdata/api/AbstractIndex.html</loc> 113 - <lastmod>2026-01-23T22:28:15.699Z</lastmod> 113 + <lastmod>2026-01-23T23:20:15.632Z</lastmod> 114 114 </url> 115 115 <url> 116 116 <loc>https://github.com/your-org/atdata/api/local.LocalDatasetEntry.html</loc> 117 - 
<lastmod>2026-01-23T03:31:26.295Z</lastmod> 117 + <lastmod>2026-01-23T23:03:53.862Z</lastmod> 118 118 </url> 119 119 <url> 120 120 <loc>https://github.com/your-org/atdata/api/S3Source.html</loc> 121 - <lastmod>2026-01-23T03:40:16.524Z</lastmod> 121 + <lastmod>2026-01-24T19:19:45.376Z</lastmod> 122 122 </url> 123 123 <url> 124 124 <loc>https://github.com/your-org/atdata/api/IndexEntry.html</loc> 125 - <lastmod>2026-01-23T22:28:15.692Z</lastmod> 125 + <lastmod>2026-01-23T23:03:53.795Z</lastmod> 126 126 </url> 127 127 <url> 128 128 <loc>https://github.com/your-org/atdata/api/index.html</loc> 129 - <lastmod>2026-01-23T22:28:15.643Z</lastmod> 129 + <lastmod>2026-01-24T19:29:16.007Z</lastmod> 130 130 </url> 131 131 <url> 132 132 <loc>https://github.com/your-org/atdata/api/URLSource.html</loc> 133 - <lastmod>2026-01-23T03:40:16.518Z</lastmod> 133 + <lastmod>2026-01-24T19:19:45.367Z</lastmod> 134 134 </url> 135 135 <url> 136 136 <loc>https://github.com/your-org/atdata/api/DatasetPublisher.html</loc> 137 - <lastmod>2026-01-23T03:31:26.332Z</lastmod> 137 + <lastmod>2026-01-23T23:20:15.757Z</lastmod> 138 138 </url> 139 139 <url> 140 140 <loc>https://github.com/your-org/atdata/api/SchemaPublisher.html</loc> 141 - <lastmod>2026-01-23T03:31:26.324Z</lastmod> 141 + <lastmod>2026-01-23T23:20:15.742Z</lastmod> 142 142 </url> 143 143 <url> 144 144 <loc>https://github.com/your-org/atdata/api/promote_to_atmosphere.html</loc> 145 - <lastmod>2026-01-23T03:31:26.352Z</lastmod> 145 + <lastmod>2026-01-24T19:19:45.514Z</lastmod> 146 146 </url> 147 147 <url> 148 148 <loc>https://github.com/your-org/atdata/api/load_dataset.html</loc> 149 - <lastmod>2026-01-23T03:31:26.247Z</lastmod> 149 + <lastmod>2026-01-24T19:19:45.334Z</lastmod> 150 150 </url> 151 151 <url> 152 152 <loc>https://github.com/your-org/atdata/api/PackableSample.html</loc> 153 - <lastmod>2026-01-23T03:31:26.219Z</lastmod> 153 + <lastmod>2026-01-23T23:20:15.564Z</lastmod> 154 154 </url> 155 155 <url> 156 156 
<loc>https://github.com/your-org/atdata/api/SchemaLoader.html</loc> 157 - <lastmod>2026-01-23T03:31:26.327Z</lastmod> 157 + <lastmod>2026-01-23T23:20:15.746Z</lastmod> 158 158 </url> 159 159 <url> 160 160 <loc>https://github.com/your-org/atdata/tutorials/atmosphere.html</loc>

+12 -12

docs/tutorials/atmosphere.html

··· 603 603 </section> 604 604 <section id="setup" class="level2"> 605 605 <h2 class="anchored" data-anchor-id="setup">Setup</h2> 606 - <div id="42091cc7" class="cell"> 606 + <div id="38b7697f" class="cell"> 607 607 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 608 608 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 609 609 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 620 620 </section> 621 621 <section id="define-sample-types" class="level2"> 622 622 <h2 class="anchored" data-anchor-id="define-sample-types">Define Sample Types</h2> 623 - <div id="ff718eaf" class="cell"> 623 + <div id="96ff9fa1" class="cell"> 624 624 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 625 625 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ImageSample:</span> 626 626 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="co">"""A sample containing image data with metadata."""</span></span> ··· 639 639 <section id="type-introspection" class="level2"> 640 640 <h2 class="anchored" data-anchor-id="type-introspection">Type Introspection</h2> 641 641 <p>See what information is available from a PackableSample type:</p> 642 - <div id="5cf4ce28" class="cell"> 642 + <div id="bcc4ef1b" class="cell"> 643 643 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode 
python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> dataclasses <span class="im">import</span> fields, is_dataclass</span> 644 644 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 645 645 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Sample type: </span><span class="sc">{</span>ImageSample<span class="sc">.</span><span class="va">__name__</span><span class="sc">}</span><span class="ss">"</span>)</span> ··· 667 667 <section id="at-uri-parsing" class="level2"> 668 668 <h2 class="anchored" data-anchor-id="at-uri-parsing">AT URI Parsing</h2> 669 669 <p>ATProto records are identified by AT URIs:</p> 670 - <div id="3d758573" class="cell"> 670 + <div id="0586bdec" class="cell"> 671 671 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>uris <span class="op">=</span> [</span> 672 672 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz789"</span>,</span> 673 673 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"at://alice.bsky.social/ac.foundation.dataset.record/my-dataset"</span>,</span> ··· 684 684 <section id="authentication" class="level2"> 685 685 <h2 class="anchored" data-anchor-id="authentication">Authentication</h2> 686 686 <p>Connect to ATProto:</p> 687 - <div id="f298476b" class="cell"> 687 + <div id="e5f8657b" class="cell"> 688 688 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 689 689 <span id="cb5-2"><a href="#cb5-2" 
aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"your.handle.social"</span>, <span class="st">"your-app-password"</span>)</span> 690 690 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span> ··· 694 694 </section> 695 695 <section id="publish-a-schema" class="level2"> 696 696 <h2 class="anchored" data-anchor-id="publish-a-schema">Publish a Schema</h2> 697 - <div id="6bd6e999" class="cell"> 697 + <div id="e4252ae0" class="cell"> 698 698 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>schema_publisher <span class="op">=</span> SchemaPublisher(client)</span> 699 699 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> schema_publisher.publish(</span> 700 700 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> ImageSample,</span> ··· 707 707 </section> 708 708 <section id="list-your-schemas" class="level2"> 709 709 <h2 class="anchored" data-anchor-id="list-your-schemas">List Your Schemas</h2> 710 - <div id="8dd66d61" class="cell"> 710 + <div id="d67ee978" class="cell"> 711 711 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>schema_loader <span class="op">=</span> SchemaLoader(client)</span> 712 712 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>schemas <span class="op">=</span> schema_loader.list_all(limit<span class="op">=</span><span class="dv">10</span>)</span> 713 713 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Found </span><span class="sc">{</span><span class="bu">len</span>(schemas)<span class="sc">}</span><span class="ss"> schema(s)"</span>)</span> ··· 720 720 <h2 
class="anchored" data-anchor-id="publish-a-dataset">Publish a Dataset</h2> 721 721 <section id="with-external-urls" class="level3"> 722 722 <h3 class="anchored" data-anchor-id="with-external-urls">With External URLs</h3> 723 - <div id="01e06939" class="cell"> 723 + <div id="10827a3b" class="cell"> 724 724 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>dataset_publisher <span class="op">=</span> DatasetPublisher(client)</span> 725 725 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>dataset_uri <span class="op">=</span> dataset_publisher.publish_with_urls(</span> 726 726 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> urls<span class="op">=</span>[<span class="st">"s3://example-bucket/demo-data-{000000..000009}.tar"</span>],</span> ··· 736 736 <section id="with-blob-storage" class="level3"> 737 737 <h3 class="anchored" data-anchor-id="with-blob-storage">With Blob Storage</h3> 738 738 <p>For smaller datasets, store data directly in ATProto blobs:</p> 739 - <div id="7f52243b" class="cell"> 739 + <div id="02cdc5b0" class="cell"> 740 740 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> io</span> 741 741 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 742 742 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> ··· 777 777 </section> 778 778 <section id="list-and-load-datasets" class="level2"> 779 779 <h2 class="anchored" data-anchor-id="list-and-load-datasets">List and Load Datasets</h2> 780 - <div id="a1ca081c" class="cell"> 780 + <div id="13e2d797" class="cell"> 781 781 <div class="sourceCode cell-code" 
id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>dataset_loader <span class="op">=</span> DatasetLoader(client)</span> 782 782 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>datasets <span class="op">=</span> dataset_loader.list_all(limit<span class="op">=</span><span class="dv">10</span>)</span> 783 783 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Found </span><span class="sc">{</span><span class="bu">len</span>(datasets)<span class="sc">}</span><span class="ss"> dataset(s)"</span>)</span> ··· 792 792 </section> 793 793 <section id="load-a-dataset" class="level2"> 794 794 <h2 class="anchored" data-anchor-id="load-a-dataset">Load a Dataset</h2> 795 - <div id="68c36a8c" class="cell"> 795 + <div id="3fd0dcc4" class="cell"> 796 796 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Check storage type</span></span> 797 797 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>storage_type <span class="op">=</span> dataset_loader.get_storage_type(<span class="bu">str</span>(blob_dataset_uri))</span> 798 798 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Storage type: </span><span class="sc">{</span>storage_type<span class="sc">}</span><span class="ss">"</span>)</span> ··· 809 809 </section> 810 810 <section id="complete-publishing-workflow" class="level2"> 811 811 <h2 class="anchored" data-anchor-id="complete-publishing-workflow">Complete Publishing Workflow</h2> 812 - <div id="a7412936" class="cell"> 812 + <div id="86e1598b" class="cell"> 813 813 <div class="sourceCode cell-code" id="cb12"><pre 
class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. Define and create samples</span></span> 814 814 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 815 815 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> FeatureSample:</span>

+8 -8

docs/tutorials/local-workflow.html

··· 599 599 </section> 600 600 <section id="setup" class="level2"> 601 601 <h2 class="anchored" data-anchor-id="setup">Setup</h2> 602 - <div id="26c13c51" class="cell"> 602 + <div id="bf308a3b" class="cell"> 603 603 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 604 604 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 605 605 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 609 609 </section> 610 610 <section id="define-sample-types" class="level2"> 611 611 <h2 class="anchored" data-anchor-id="define-sample-types">Define Sample Types</h2> 612 - <div id="ff37eb62" class="cell"> 612 + <div id="b92ec5f1" class="cell"> 613 613 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 614 614 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> TrainingSample:</span> 615 615 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="co">"""A sample containing features and label for training."""</span></span> ··· 626 626 <section id="localdatasetentry" class="level2"> 627 627 <h2 class="anchored" data-anchor-id="localdatasetentry">LocalDatasetEntry</h2> 628 628 <p>Create entries with content-addressable CIDs:</p> 629 - <div id="556be281" class="cell"> 629 + <div id="f75b391d" class="cell"> 630 630 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span 
id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Create an entry manually</span></span> 631 631 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> LocalDatasetEntry(</span> 632 632 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> _name<span class="op">=</span><span class="st">"my-dataset"</span>,</span> ··· 658 658 <section id="localindex" class="level2"> 659 659 <h2 class="anchored" data-anchor-id="localindex">LocalIndex</h2> 660 660 <p>The index tracks datasets in Redis:</p> 661 - <div id="236e1cae" class="cell"> 661 + <div id="162ca9e1" class="cell"> 662 662 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> redis <span class="im">import</span> Redis</span> 663 663 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 664 664 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Connect to Redis</span></span> ··· 669 669 </div> 670 670 <section id="schema-management" class="level3"> 671 671 <h3 class="anchored" data-anchor-id="schema-management">Schema Management</h3> 672 - <div id="09cfff1f" class="cell"> 672 + <div id="9c358257" class="cell"> 673 673 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish a schema</span></span> 674 674 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>schema_ref <span class="op">=</span> index.publish_schema(TrainingSample, version<span class="op">=</span><span class="st">"1.0.0"</span>)</span> 675 675 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span 
class="ss">f"Published schema: </span><span class="sc">{</span>schema_ref<span class="sc">}</span><span class="ss">"</span>)</span> ··· 691 691 <section id="s3datastore" class="level2"> 692 692 <h2 class="anchored" data-anchor-id="s3datastore">S3DataStore</h2> 693 693 <p>For direct S3 operations:</p> 694 - <div id="335e8653" class="cell"> 694 + <div id="9b5e4a31" class="cell"> 695 695 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>creds <span class="op">=</span> {</span> 696 696 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_ENDPOINT"</span>: <span class="st">"http://localhost:9000"</span>,</span> 697 697 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_ACCESS_KEY_ID"</span>: <span class="st">"minioadmin"</span>,</span> ··· 707 707 <section id="complete-index-workflow" class="level2"> 708 708 <h2 class="anchored" data-anchor-id="complete-index-workflow">Complete Index Workflow</h2> 709 709 <p>Use <code>LocalIndex</code> with <code>S3DataStore</code> to store datasets with S3 storage and Redis indexing:</p> 710 - <div id="db2d9cf4" class="cell"> 710 + <div id="37996513" class="cell"> 711 711 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. 
Create sample data</span></span> 712 712 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> 713 713 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> TrainingSample(</span> ··· 756 756 <section id="using-load_dataset-with-index" class="level2"> 757 757 <h2 class="anchored" data-anchor-id="using-load_dataset-with-index">Using load_dataset with Index</h2> 758 758 <p>The <code>load_dataset()</code> function supports index lookup:</p> 759 - <div id="c15c4f5e" class="cell"> 759 + <div id="633bb1d8" class="cell"> 760 760 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata <span class="im">import</span> load_dataset</span> 761 761 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 762 762 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Load from local index</span></span>

+11 -11

docs/tutorials/promotion.html

··· 593 593 </section> 594 594 <section id="setup" class="level2"> 595 595 <h2 class="anchored" data-anchor-id="setup">Setup</h2> 596 - <div id="c53852d7" class="cell"> 596 + <div id="103f946d" class="cell"> 597 597 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 598 598 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 599 599 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 606 606 <section id="prepare-a-local-dataset" class="level2"> 607 607 <h2 class="anchored" data-anchor-id="prepare-a-local-dataset">Prepare a Local Dataset</h2> 608 608 <p>First, set up a dataset in local storage:</p> 609 - <div id="4c1ed06c" class="cell"> 609 + <div id="1d38a39c" class="cell"> 610 610 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. 
Define sample type</span></span> 611 611 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 612 612 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ExperimentSample:</span> ··· 656 656 <section id="basic-promotion" class="level2"> 657 657 <h2 class="anchored" data-anchor-id="basic-promotion">Basic Promotion</h2> 658 658 <p>Promote the dataset to ATProto:</p> 659 - <div id="0aaa5713" class="cell"> 659 + <div id="e42328f0" class="cell"> 660 660 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Connect to atmosphere</span></span> 661 661 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 662 662 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"myhandle.bsky.social"</span>, <span class="st">"app-password"</span>)</span> ··· 669 669 <section id="promotion-with-metadata" class="level2"> 670 670 <h2 class="anchored" data-anchor-id="promotion-with-metadata">Promotion with Metadata</h2> 671 671 <p>Add description, tags, and license:</p> 672 - <div id="2722dded" class="cell"> 672 + <div id="7d374aac" class="cell"> 673 673 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>at_uri <span class="op">=</span> promote_to_atmosphere(</span> 674 674 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> local_entry,</span> 675 675 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> local_index,</span> ··· 685 685 <section id="schema-deduplication" class="level2"> 686 686 <h2 class="anchored" 
data-anchor-id="schema-deduplication">Schema Deduplication</h2> 687 687 <p>The promotion workflow automatically checks for existing schemas:</p> 688 - <div id="f726c241" class="cell"> 688 + <div id="128c6d02" class="cell"> 689 689 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.promote <span class="im">import</span> _find_existing_schema</span> 690 690 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a></span> 691 691 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Check if schema already exists</span></span> ··· 697 697 <span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="st">"No existing schema found, will publish new one"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 698 698 </div> 699 699 <p>When you promote multiple datasets with the same sample type:</p> 700 - <div id="6c840cfa" class="cell"> 700 + <div id="55249656" class="cell"> 701 701 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="co"># First promotion: publishes schema</span></span> 702 702 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>uri1 <span class="op">=</span> promote_to_atmosphere(entry1, local_index, client)</span> 703 703 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span> ··· 712 712 <div class="tab-content"> 713 713 <div id="tabset-1-1" class="tab-pane active" role="tabpanel" aria-labelledby="tabset-1-1-tab"> 714 714 <p>By default, promotion keeps the original data URLs:</p> 715 - <div id="19fb0aeb" class="cell"> 715 
+ <div id="341b0658" class="cell"> 716 716 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Data stays in original S3 location</span></span> 717 717 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>at_uri <span class="op">=</span> promote_to_atmosphere(local_entry, local_index, client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 718 718 </div> ··· 725 725 </div> 726 726 <div id="tabset-1-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-1-2-tab"> 727 727 <p>To copy data to a different storage location:</p> 728 - <div id="a0c75fc7" class="cell"> 728 + <div id="7078a3dd" class="cell"> 729 729 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> S3DataStore</span> 730 730 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 731 731 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Create new data store</span></span> ··· 755 755 <section id="verify-on-atmosphere" class="level2"> 756 756 <h2 class="anchored" data-anchor-id="verify-on-atmosphere">Verify on Atmosphere</h2> 757 757 <p>After promotion, verify the dataset is accessible:</p> 758 - <div id="55e29732" class="cell"> 758 + <div id="891b451d" class="cell"> 759 759 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereIndex</span> 760 760 <span id="cb10-2"><a 
href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span> 761 761 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>atm_index <span class="op">=</span> AtmosphereIndex(client)</span> ··· 776 776 </section> 777 777 <section id="error-handling" class="level2"> 778 778 <h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2> 779 - <div id="fb917dd1" class="cell"> 779 + <div id="31b5f22a" class="cell"> 780 780 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="cf">try</span>:</span> 781 781 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> at_uri <span class="op">=</span> promote_to_atmosphere(local_entry, local_index, client)</span> 782 782 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="cf">except</span> <span class="pp">KeyError</span> <span class="im">as</span> e:</span> ··· 800 800 </section> 801 801 <section id="complete-workflow" class="level2"> 802 802 <h2 class="anchored" data-anchor-id="complete-workflow">Complete Workflow</h2> 803 - <div id="9910ed97" class="cell"> 803 + <div id="8e76bf26" class="cell"> 804 804 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Complete local-to-atmosphere workflow</span></span> 805 805 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 806 806 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span>

+6 -6

docs/tutorials/quickstart.html

··· 582 582 <section id="define-a-sample-type" class="level2"> 583 583 <h2 class="anchored" data-anchor-id="define-a-sample-type">Define a Sample Type</h2> 584 584 <p>Use the <code>@packable</code> decorator to create a typed sample:</p> 585 - <div id="63c15979" class="cell"> 585 + <div id="2184c2cd" class="cell"> 586 586 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 587 587 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 588 588 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 603 603 </section> 604 604 <section id="create-sample-instances" class="level2"> 605 605 <h2 class="anchored" data-anchor-id="create-sample-instances">Create Sample Instances</h2> 606 - <div id="bf1cb844" class="cell"> 606 + <div id="edf75cb3" class="cell"> 607 607 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a single sample</span></span> 608 608 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>sample <span class="op">=</span> ImageSample(</span> 609 609 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> image<span class="op">=</span>np.random.rand(<span class="dv">224</span>, <span class="dv">224</span>, <span class="dv">3</span>).astype(np.float32),</span> ··· 624 624 <section id="write-a-dataset" class="level2"> 625 625 <h2 class="anchored" data-anchor-id="write-a-dataset">Write a Dataset</h2> 626 626 <p>Use WebDataset’s <code>TarWriter</code> to create dataset files:</p> 627 - <div 
id="d2143ab5" class="cell"> 627 + <div id="2cf451ef" class="cell"> 628 628 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 629 629 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 630 630 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Create 100 samples</span></span> ··· 648 648 <section id="load-and-iterate" class="level2"> 649 649 <h2 class="anchored" data-anchor-id="load-and-iterate">Load and Iterate</h2> 650 650 <p>Create a typed <code>Dataset</code> and iterate with batching:</p> 651 - <div id="49ef5046" class="cell"> 651 + <div id="f95261a1" class="cell"> 652 652 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Load dataset with type</span></span> 653 653 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"my-dataset-000000.tar"</span>)</span> 654 654 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span> ··· 669 669 <section id="shuffled-iteration" class="level2"> 670 670 <h2 class="anchored" data-anchor-id="shuffled-iteration">Shuffled Iteration</h2> 671 671 <p>For training, use shuffled iteration:</p> 672 - <div id="3eaaf99d" class="cell"> 672 + <div id="8b4f6c9a" class="cell"> 673 673 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.shuffled(batch_size<span 
class="op">=</span><span class="dv">32</span>):</span> 674 674 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> <span class="co"># Samples are shuffled at shard and sample level</span></span> 675 675 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> images <span class="op">=</span> batch.image</span> ··· 683 683 <section id="use-lenses-for-type-transformations" class="level2"> 684 684 <h2 class="anchored" data-anchor-id="use-lenses-for-type-transformations">Use Lenses for Type Transformations</h2> 685 685 <p>View datasets through different schemas:</p> 686 - <div id="e660fe92" class="cell"> 686 + <div id="c2cdb8c9" class="cell"> 687 687 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Define a simplified view type</span></span> 688 688 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 689 689 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> SimplifiedSample:</span>

+1

docs_src/_quarto.yml

··· 7 7 dir: api 8 8 title: "API Reference" 9 9 style: pkgdown 10 + parser: google 10 11 render_interlinks: true 11 12 12 13 sections:
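For context on the `parser: google` line: the `.qmd` hunks further down suggest that, with this setting, `Args:`, `Returns:`, and `Raises:` blocks are rendered as tables rather than literal text. Below is a minimal sketch of a docstring in that style. `normalize_prefix` is a hypothetical helper written only for illustration and is not part of atdata.

```python
def normalize_prefix(prefix: str) -> str:
    """Strip surrounding slashes from a shard path prefix.

    Hypothetical helper, used only to illustrate the docstring layout.

    Args:
        prefix: Path prefix for the shards (e.g., 'datasets/mnist/v1').

    Returns:
        The prefix without leading or trailing slashes.

    Raises:
        ValueError: If the prefix is empty after stripping.

    Example:
        ::

            >>> normalize_prefix("/datasets/mnist/v1/")
            'datasets/mnist/v1'
    """
    cleaned = prefix.strip("/")
    if not cleaned:
        raise ValueError("prefix must not be empty")
    return cleaned
```

Each section header ends with a colon, and the example sits under a `::` literal block, as the project's docstring guidelines require.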

+33 -15

docs_src/api/AbstractDataStore.qmd

··· 14 14 flexible deployment: local index with S3 storage, atmosphere index with 15 15 S3 storage, or atmosphere index with PDS blobs. 16 16 17 - Example: 17 + ## Example {.doc-section .doc-section-example} 18 + 19 + :: 20 + 18 21 >>> store = S3DataStore(credentials, bucket="my-bucket") 19 22 >>> urls = store.write_shards(dataset, prefix="training/v1") 20 23 >>> print(urls) ··· 40 43 or resolving blob references). This method returns a URL that can be 41 44 used directly with WebDataset. 42 45 43 - Args: 44 - url: Storage URL to resolve. 46 + #### Parameters {.doc-section .doc-section-parameters} 45 47 46 - Returns: 47 - WebDataset-compatible URL for reading. 48 + | Name | Type | Description | Default | 49 + |--------|--------------|-------------------------|------------| 50 + | url | [str](`str`) | Storage URL to resolve. | _required_ | 51 + 52 + #### Returns {.doc-section .doc-section-returns} 53 + 54 + | Name | Type | Description | 55 + |--------|--------------|----------------------------------------| 56 + | | [str](`str`) | WebDataset-compatible URL for reading. | 48 57 49 58 ### supports_streaming { #atdata.AbstractDataStore.supports_streaming } 50 59 ··· 54 63 55 64 Whether this store supports streaming reads. 56 65 57 - Returns: 58 - True if the store supports efficient streaming (like S3), 59 - False if data must be fully downloaded first. 66 + #### Returns {.doc-section .doc-section-returns} 67 + 68 + | Name | Type | Description | 69 + |--------|----------------|-----------------------------------------------------------| 70 + | | [bool](`bool`) | True if the store supports efficient streaming (like S3), | 71 + | | [bool](`bool`) | False if data must be fully downloaded first. | 60 72 61 73 ### write_shards { #atdata.AbstractDataStore.write_shards } 62 74 ··· 66 78 67 79 Write dataset shards to storage. 68 80 69 - Args: 70 - ds: The Dataset to write. 71 - prefix: Path prefix for the shards (e.g., 'datasets/mnist/v1'). 
72 - **kwargs: Backend-specific options (e.g., maxcount for shard size). 81 + #### Parameters {.doc-section .doc-section-parameters} 73 82 74 - Returns: 75 - List of URLs for the written shards, suitable for use with 76 - WebDataset or atdata.Dataset(). 83 + | Name | Type | Description | Default | 84 + |----------|-------------------------------------|-----------------------------------------------------------|------------| 85 + | ds | [Dataset](`atdata.dataset.Dataset`) | The Dataset to write. | _required_ | 86 + | prefix | [str](`str`) | Path prefix for the shards (e.g., 'datasets/mnist/v1'). | _required_ | 87 + | **kwargs | | Backend-specific options (e.g., maxcount for shard size). | `{}` | 88 + 89 + #### Returns {.doc-section .doc-section-returns} 90 + 91 + | Name | Type | Description | 92 + |--------|--------------------------------|------------------------------------------------------------| 93 + | | [list](`list`)\[[str](`str`)\] | List of URLs for the written shards, suitable for use with | 94 + | | [list](`list`)\[[str](`str`)\] | WebDataset or atdata.Dataset(). |
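The data store contract documented in this hunk can be pictured with a small in-memory sketch. Everything below is hypothetical: `DataStoreSketch`, `InMemoryStore`, the `mem://` scheme, and the method name `resolve_url` are assumptions (the hunk does not show the name of the URL-resolving method), and the real `write_shards` takes a `Dataset` rather than a list of byte strings.

```python
from typing import Protocol


class DataStoreSketch(Protocol):
    """Hypothetical protocol mirroring the three documented operations."""

    def write_shards(self, samples: list, *, prefix: str, **kwargs) -> list: ...

    def resolve_url(self, url: str) -> str: ...

    def supports_streaming(self) -> bool: ...


class InMemoryStore:
    """Toy store that keeps shards in a dict keyed by a made-up mem:// URL."""

    def __init__(self) -> None:
        self.blobs = {}

    def write_shards(self, samples, *, prefix, maxcount=2, **kwargs):
        # Group samples into shards of at most `maxcount` items each.
        urls = []
        for start in range(0, len(samples), maxcount):
            url = f"mem://{prefix}/shard-{start // maxcount:06d}.tar"
            self.blobs[url] = samples[start:start + maxcount]
            urls.append(url)
        return urls

    def resolve_url(self, url):
        # Nothing to sign or rewrite in this toy store.
        return url

    def supports_streaming(self):
        # The toy store has no streaming read path.
        return False
```

The point of the sketch is the shape of the contract: writing returns a list of URLs, and those URLs can be handed back to the store for resolution before reading.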

+99 -48

docs_src/api/AbstractIndex.qmd

··· 14 14 A single index can hold datasets of many different sample types. The sample 15 15 type is tracked via schema references, not as a generic parameter on the index. 16 16 17 - Optional Extensions: 18 - Some index implementations support additional features: 19 - - ``data_store``: An AbstractDataStore for reading/writing dataset shards. 20 - If present, ``load_dataset`` will use it for S3 credential resolution. 17 + ## Optional Extensions {.doc-section .doc-section-optional-extensions} 21 18 22 - Example: 19 + Some index implementations support additional features: 20 + - ``data_store``: An AbstractDataStore for reading/writing dataset shards. 21 + If present, ``load_dataset`` will use it for S3 credential resolution. 22 + 23 + ## Example {.doc-section .doc-section-example} 24 + 25 + :: 26 + 23 27 >>> def publish_and_list(index: AbstractIndex) -> None: 24 28 ... # Publish schemas for different types 25 29 ... schema1 = index.publish_schema(ImageSample, version="1.0.0") ··· 64 68 ahead of time. The index retrieves the schema record and dynamically 65 69 generates a Packable class matching the schema definition. 66 70 67 - Args: 68 - ref: Schema reference string (local:// or at://). 71 + #### Parameters {.doc-section .doc-section-parameters} 69 72 70 - Returns: 71 - A dynamically generated Packable class with fields matching 72 - the schema definition. The class can be used with 73 - ``Dataset[T]`` to load and iterate over samples. 73 + | Name | Type | Description | Default | 74 + |--------|--------------|----------------------------------------------|------------| 75 + | ref | [str](`str`) | Schema reference string (local:// or at://). 
| _required_ | 76 + 77 + #### Returns {.doc-section .doc-section-returns} 78 + 79 + | Name | Type | Description | 80 + |--------|-------------------------------------------------------------------|-------------------------------------------------------------| 81 + | | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | A dynamically generated Packable class with fields matching | 82 + | | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | the schema definition. The class can be used with | 83 + | | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | ``Dataset[T]`` to load and iterate over samples. | 74 84 75 - Raises: 76 - KeyError: If schema not found. 77 - ValueError: If schema cannot be decoded (unsupported field types). 85 + #### Raises {.doc-section .doc-section-raises} 78 86 79 - Example: 87 + | Name | Type | Description | 88 + |--------|----------------------------|--------------------------------------------------------| 89 + | | [KeyError](`KeyError`) | If schema not found. | 90 + | | [ValueError](`ValueError`) | If schema cannot be decoded (unsupported field types). | 91 + 92 + #### Example {.doc-section .doc-section-example} 93 + 94 + :: 95 + 80 96 >>> entry = index.get_dataset("my-dataset") 81 97 >>> SampleType = index.decode_schema(entry.schema_ref) 82 98 >>> ds = Dataset[SampleType](entry.data_urls[0]) ··· 91 107 92 108 Get a dataset entry by name or reference. 93 109 94 - Args: 95 - ref: Dataset name, path, or full reference string. 110 + #### Parameters {.doc-section .doc-section-parameters} 111 + 112 + | Name | Type | Description | Default | 113 + |--------|--------------|-----------------------------------------------|------------| 114 + | ref | [str](`str`) | Dataset name, path, or full reference string. 
| _required_ | 115 + 116 + #### Returns {.doc-section .doc-section-returns} 117 + 118 + | Name | Type | Description | 119 + |--------|----------------------------------------------|-----------------------------| 120 + | | [IndexEntry](`atdata._protocols.IndexEntry`) | IndexEntry for the dataset. | 96 121 97 - Returns: 98 - IndexEntry for the dataset. 122 + #### Raises {.doc-section .doc-section-raises} 99 123 100 - Raises: 101 - KeyError: If dataset not found. 124 + | Name | Type | Description | 125 + |--------|------------------------|-----------------------| 126 + | | [KeyError](`KeyError`) | If dataset not found. | 102 127 103 128 ### get_schema { #atdata.AbstractIndex.get_schema } 104 129 ··· 108 133 109 134 Get a schema record by reference. 110 135 111 - Args: 112 - ref: Schema reference string (local:// or at://). 136 + #### Parameters {.doc-section .doc-section-parameters} 137 + 138 + | Name | Type | Description | Default | 139 + |--------|--------------|----------------------------------------------|------------| 140 + | ref | [str](`str`) | Schema reference string (local:// or at://). | _required_ | 141 + 142 + #### Returns {.doc-section .doc-section-returns} 113 143 114 - Returns: 115 - Schema record as a dictionary with fields like 'name', 'version', 116 - 'fields', etc. 144 + | Name | Type | Description | 145 + |--------|----------------|-------------------------------------------------------------------| 146 + | | [dict](`dict`) | Schema record as a dictionary with fields like 'name', 'version', | 147 + | | [dict](`dict`) | 'fields', etc. | 148 + 149 + #### Raises {.doc-section .doc-section-raises} 117 150 118 - Raises: 119 - KeyError: If schema not found. 151 + | Name | Type | Description | 152 + |--------|------------------------|----------------------| 153 + | | [KeyError](`KeyError`) | If schema not found. 
| 120 154 121 155 ### insert_dataset { #atdata.AbstractIndex.insert_dataset } 122 156 ··· 129 163 The sample type is inferred from ``ds.sample_type``. If schema_ref is not 130 164 provided, the schema may be auto-published based on the sample type. 131 165 132 - Args: 133 - ds: The Dataset to register in the index (any sample type). 134 - name: Human-readable name for the dataset. 135 - schema_ref: Optional explicit schema reference. If not provided, 136 - the schema may be auto-published or inferred from ds.sample_type. 137 - **kwargs: Additional backend-specific options. 166 + #### Parameters {.doc-section .doc-section-parameters} 167 + 168 + | Name | Type | Description | Default | 169 + |------------|-----------------------------------------------|------------------------------------------------------------------------------------------------------------------------|------------| 170 + | ds | [Dataset](`atdata.dataset.Dataset`) | The Dataset to register in the index (any sample type). | _required_ | 171 + | name | [str](`str`) | Human-readable name for the dataset. | _required_ | 172 + | schema_ref | [Optional](`typing.Optional`)\[[str](`str`)\] | Optional explicit schema reference. If not provided, the schema may be auto-published or inferred from ds.sample_type. | `None` | 173 + | **kwargs | | Additional backend-specific options. | `{}` | 138 174 139 - Returns: 140 - IndexEntry for the inserted dataset. 175 + #### Returns {.doc-section .doc-section-returns} 176 + 177 + | Name | Type | Description | 178 + |--------|----------------------------------------------|--------------------------------------| 179 + | | [IndexEntry](`atdata._protocols.IndexEntry`) | IndexEntry for the inserted dataset. | 141 180 142 181 ### list_datasets { #atdata.AbstractIndex.list_datasets } 143 182 ··· 147 186 148 187 Get all dataset entries as a materialized list. 149 188 150 - Returns: 151 - List of IndexEntry for each dataset. 
189 + #### Returns {.doc-section .doc-section-returns} 190 + 191 + | Name | Type | Description | 192 + |--------|----------------------------------------------------------------|--------------------------------------| 193 + | | [list](`list`)\[[IndexEntry](`atdata._protocols.IndexEntry`)\] | List of IndexEntry for each dataset. | 152 194 153 195 ### list_schemas { #atdata.AbstractIndex.list_schemas } 154 196 ··· 158 200 159 201 Get all schema records as a materialized list. 160 202 161 - Returns: 162 - List of schema records as dictionaries. 203 + #### Returns {.doc-section .doc-section-returns} 204 + 205 + | Name | Type | Description | 206 + |--------|----------------------------------|-----------------------------------------| 207 + | | [list](`list`)\[[dict](`dict`)\] | List of schema records as dictionaries. | 163 208 164 209 ### publish_schema { #atdata.AbstractIndex.publish_schema } 165 210 ··· 169 214 170 215 Publish a schema for a sample type. 171 216 172 - Args: 173 - sample_type: A Packable type (PackableSample subclass or @packable-decorated). 174 - version: Semantic version string for the schema. 175 - **kwargs: Additional backend-specific options. 217 + #### Parameters {.doc-section .doc-section-parameters} 176 218 177 - Returns: 178 - Schema reference string: 179 - - Local: 'local://schemas/{module.Class}@{version}' 180 - - Atmosphere: 'at://did:plc:.../ac.foundation.dataset.sampleSchema/...' 219 + | Name | Type | Description | Default | 220 + |-------------|-------------------------------------------------------------------|-------------------------------------------------------------------|------------| 221 + | sample_type | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | A Packable type (PackableSample subclass or @packable-decorated). | _required_ | 222 + | version | [str](`str`) | Semantic version string for the schema. | `'1.0.0'` | 223 + | **kwargs | | Additional backend-specific options. 
| `{}` | 224 + 225 + #### Returns {.doc-section .doc-section-returns} 226 + 227 + | Name | Type | Description | 228 + |--------|--------------|-------------------------------------------------------------------------| 229 + | | [str](`str`) | Schema reference string: | 230 + | | [str](`str`) | - Local: 'local://schemas/{module.Class}@{version}' | 231 + | | [str](`str`) | - Atmosphere: 'at://did:plc:.../ac.foundation.dataset.sampleSchema/...' |
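The publish and lookup behaviour described in this hunk can be sketched with a dictionary-backed toy index. `ToyIndex` and `local_schema_ref` are hypothetical names. Reading `{module.Class}` as the module name plus the class name is an assumption, as is the shape of the stored record beyond the 'name', 'version', and 'fields' keys mentioned above.

```python
def local_schema_ref(sample_type: type, version: str = "1.0.0") -> str:
    """Build a reference shaped like 'local://schemas/{module.Class}@{version}'."""
    qualified = f"{sample_type.__module__}.{sample_type.__name__}"
    return f"local://schemas/{qualified}@{version}"


class ToyIndex:
    """Hypothetical stand-in for an index; stores schema records in a dict."""

    def __init__(self) -> None:
        self._schemas = {}

    def publish_schema(self, sample_type, version="1.0.0", **kwargs):
        ref = local_schema_ref(sample_type, version)
        self._schemas[ref] = {
            "name": sample_type.__name__,
            "version": version,
            "fields": [],  # field extraction is out of scope for this sketch
        }
        return ref

    def get_schema(self, ref):
        # A missing reference raises KeyError, matching the documented contract.
        return self._schemas[ref]

    def list_schemas(self):
        # Materialized list, not a lazy iterator.
        return list(self._schemas.values())
```

Callers that cannot be sure a schema exists would wrap `get_schema` in `try`/`except KeyError`, as the Raises tables above imply.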

+19 -7

docs_src/api/AtUri.qmd

··· 8 8 9 9 AT URIs follow the format: at://<authority>/<collection>/<rkey> 10 10 11 - Example: 11 + ## Example {.doc-section .doc-section-example} 12 + 13 + :: 14 + 12 15 >>> uri = AtUri.parse("at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz") 13 16 >>> uri.authority 14 17 'did:plc:abc123' ··· 39 42 40 43 Parse an AT URI string into components. 41 44 42 - Args: 43 - uri: AT URI string in format ``at://<authority>/<collection>/<rkey>`` 45 + #### Parameters {.doc-section .doc-section-parameters} 46 + 47 + | Name | Type | Description | Default | 48 + |--------|--------------|------------------------------------------------------------------|------------| 49 + | uri | [str](`str`) | AT URI string in format ``at://<authority>/<collection>/<rkey>`` | _required_ | 50 + 51 + #### Returns {.doc-section .doc-section-returns} 52 + 53 + | Name | Type | Description | 54 + |--------|-------------------------------------------|------------------------| 55 + | | [AtUri](`atdata.atmosphere._types.AtUri`) | Parsed AtUri instance. | 44 56 45 - Returns: 46 - Parsed AtUri instance. 57 + #### Raises {.doc-section .doc-section-raises} 47 58 48 - Raises: 49 - ValueError: If the URI format is invalid. 59 + | Name | Type | Description | 60 + |--------|----------------------------|-------------------------------| 61 + | | [ValueError](`ValueError`) | If the URI format is invalid. |
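The `at://<authority>/<collection>/<rkey>` format is simple enough to sketch by hand. `AtUriSketch` is a hypothetical class, not the `AtUri` implementation in `atdata.atmosphere._types`. It assumes exactly three non-empty path segments and does no further validation of the authority or collection.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtUriSketch:
    """Hypothetical three-part AT URI holder, for illustration only."""

    authority: str
    collection: str
    rkey: str

    @classmethod
    def parse(cls, uri: str) -> "AtUriSketch":
        prefix = "at://"
        if not uri.startswith(prefix):
            raise ValueError(f"not an AT URI: {uri!r}")
        parts = uri[len(prefix):].split("/")
        # Require exactly authority, collection, and rkey, all non-empty.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected at://<authority>/<collection>/<rkey>: {uri!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"at://{self.authority}/{self.collection}/{self.rkey}"
```

Parsing and then calling `str()` reproduces the input string, which gives a cheap sanity check for any URI that is stored and later reloaded.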

+203 -100

docs_src/api/AtmosphereClient.qmd

··· 9 9 This class wraps the atproto SDK client and provides higher-level methods 10 10 for working with atdata records (schemas, datasets, lenses). 11 11 12 - Example: 12 + ## Example {.doc-section .doc-section-example} 13 + 14 + :: 15 + 13 16 >>> client = AtmosphereClient() 14 17 >>> client.login("alice.bsky.social", "app-password") 15 18 >>> print(client.did) 16 19 'did:plc:...' 17 20 18 - Note: 19 - The password should be an app-specific password, not your main account 20 - password. Create app passwords in your Bluesky account settings. 21 + ## Note {.doc-section .doc-section-note} 22 + 23 + The password should be an app-specific password, not your main account 24 + password. Create app passwords in your Bluesky account settings. 21 25 22 26 ## Attributes 23 27 ··· 60 64 61 65 Create a record in the user's repository. 62 66 63 - Args: 64 - collection: The NSID of the record collection 65 - (e.g., 'ac.foundation.dataset.sampleSchema'). 66 - record: The record data. Must include a '$type' field. 67 - rkey: Optional explicit record key. If not provided, a TID is generated. 68 - validate: Whether to validate against the Lexicon schema. Set to False 69 - for custom lexicons that the PDS doesn't know about. 67 + #### Parameters {.doc-section .doc-section-parameters} 70 68 71 - Returns: 72 - The AT URI of the created record. 69 + | Name | Type | Description | Default | 70 + |------------|-----------------------------------------------|-------------------------------------------------------------------------------------------------------------------|------------| 71 + | collection | [str](`str`) | The NSID of the record collection (e.g., 'ac.foundation.dataset.sampleSchema'). | _required_ | 72 + | record | [dict](`dict`) | The record data. Must include a '$type' field. | _required_ | 73 + | rkey | [Optional](`typing.Optional`)\[[str](`str`)\] | Optional explicit record key. If not provided, a TID is generated. 
| `None` | 74 + | validate | [bool](`bool`) | Whether to validate against the Lexicon schema. Set to False for custom lexicons that the PDS doesn't know about. | `False` | 73 75 74 - Raises: 75 - ValueError: If not authenticated. 76 - atproto.exceptions.AtProtocolError: If record creation fails. 76 + #### Returns {.doc-section .doc-section-returns} 77 + 78 + | Name | Type | Description | 79 + |--------|-------------------------------------------|-----------------------------------| 80 + | | [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the created record. | 81 + 82 + #### Raises {.doc-section .doc-section-raises} 83 + 84 + | Name | Type | Description | 85 + |--------|-----------------------------------------------------------------------------------------------------------------|---------------------------| 86 + | | [ValueError](`ValueError`) | If not authenticated. | 87 + | | [atproto](`atproto`).[exceptions](`atproto.exceptions`).[AtProtocolError](`atproto.exceptions.AtProtocolError`) | If record creation fails. | 77 88 78 89 ### delete_record { #atdata.atmosphere.AtmosphereClient.delete_record } 79 90 ··· 83 94 84 95 Delete a record. 85 96 86 - Args: 87 - uri: The AT URI of the record to delete. 88 - swap_commit: Optional CID for compare-and-swap delete. 97 + #### Parameters {.doc-section .doc-section-parameters} 98 + 99 + | Name | Type | Description | Default | 100 + |-------------|-----------------------------------------------------------|-------------------------------------------|------------| 101 + | uri | [str](`str`) \| [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the record to delete. | _required_ | 102 + | swap_commit | [Optional](`typing.Optional`)\[[str](`str`)\] | Optional CID for compare-and-swap delete. | `None` | 89 103 90 - Raises: 91 - ValueError: If not authenticated. 92 - atproto.exceptions.AtProtocolError: If deletion fails. 
104 + #### Raises {.doc-section .doc-section-raises} 105 + 106 + | Name | Type | Description | 107 + |--------|-----------------------------------------------------------------------------------------------------------------|-----------------------| 108 + | | [ValueError](`ValueError`) | If not authenticated. | 109 + | | [atproto](`atproto`).[exceptions](`atproto.exceptions`).[AtProtocolError](`atproto.exceptions.AtProtocolError`) | If deletion fails. | 93 110 94 111 ### export_session { #atdata.atmosphere.AtmosphereClient.export_session } 95 112 ··· 99 116 100 117 Export the current session for later reuse. 101 118 102 - Returns: 103 - Session string that can be passed to ``login_with_session()``. 119 + #### Returns {.doc-section .doc-section-returns} 120 + 121 + | Name | Type | Description | 122 + |--------|--------------|----------------------------------------------------------------| 123 + | | [str](`str`) | Session string that can be passed to ``login_with_session()``. | 124 + 125 + #### Raises {.doc-section .doc-section-raises} 104 126 105 - Raises: 106 - ValueError: If not authenticated. 127 + | Name | Type | Description | 128 + |--------|----------------------------|-----------------------| 129 + | | [ValueError](`ValueError`) | If not authenticated. | 107 130 108 131 ### get_blob { #atdata.atmosphere.AtmosphereClient.get_blob } 109 132 ··· 116 139 This resolves the PDS endpoint from the DID document and fetches 117 140 the blob directly from the PDS. 118 141 119 - Args: 120 - did: The DID of the repository containing the blob. 121 - cid: The CID of the blob. 142 + #### Parameters {.doc-section .doc-section-parameters} 143 + 144 + | Name | Type | Description | Default | 145 + |--------|--------------|------------------------------------------------|------------| 146 + | did | [str](`str`) | The DID of the repository containing the blob. | _required_ | 147 + | cid | [str](`str`) | The CID of the blob. 
| _required_ | 148 + 149 + #### Returns {.doc-section .doc-section-returns} 122 150 123 - Returns: 124 - The blob data as bytes. 151 + | Name | Type | Description | 152 + |--------|------------------|-------------------------| 153 + | | [bytes](`bytes`) | The blob data as bytes. | 125 154 126 - Raises: 127 - ValueError: If PDS endpoint cannot be resolved. 128 - requests.HTTPError: If blob fetch fails. 155 + #### Raises {.doc-section .doc-section-raises} 156 + 157 + | Name | Type | Description | 158 + |--------|----------------------------------------------------------|-------------------------------------| 159 + | | [ValueError](`ValueError`) | If PDS endpoint cannot be resolved. | 160 + | | [requests](`requests`).[HTTPError](`requests.HTTPError`) | If blob fetch fails. | 129 161 130 162 ### get_blob_url { #atdata.atmosphere.AtmosphereClient.get_blob_url } 131 163 ··· 137 169 138 170 This is useful for passing to WebDataset or other HTTP clients. 139 171 140 - Args: 141 - did: The DID of the repository containing the blob. 142 - cid: The CID of the blob. 172 + #### Parameters {.doc-section .doc-section-parameters} 143 173 144 - Returns: 145 - The full URL for fetching the blob. 174 + | Name | Type | Description | Default | 175 + |--------|--------------|------------------------------------------------|------------| 176 + | did | [str](`str`) | The DID of the repository containing the blob. | _required_ | 177 + | cid | [str](`str`) | The CID of the blob. | _required_ | 146 178 147 - Raises: 148 - ValueError: If PDS endpoint cannot be resolved. 179 + #### Returns {.doc-section .doc-section-returns} 180 + 181 + | Name | Type | Description | 182 + |--------|--------------|-------------------------------------| 183 + | | [str](`str`) | The full URL for fetching the blob. 
| 184 + 185 + #### Raises {.doc-section .doc-section-raises} 186 + 187 + | Name | Type | Description | 188 + |--------|----------------------------|-------------------------------------| 189 + | | [ValueError](`ValueError`) | If PDS endpoint cannot be resolved. | 149 190 150 191 ### get_record { #atdata.atmosphere.AtmosphereClient.get_record } 151 192 ··· 155 196 156 197 Fetch a record by AT URI. 157 198 158 - Args: 159 - uri: The AT URI of the record. 199 + #### Parameters {.doc-section .doc-section-parameters} 160 200 161 - Returns: 162 - The record data as a dictionary. 201 + | Name | Type | Description | Default | 202 + |--------|-----------------------------------------------------------|---------------------------|------------| 203 + | uri | [str](`str`) \| [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the record. | _required_ | 163 204 164 - Raises: 165 - atproto.exceptions.AtProtocolError: If record not found. 205 + #### Returns {.doc-section .doc-section-returns} 206 + 207 + | Name | Type | Description | 208 + |--------|----------------|----------------------------------| 209 + | | [dict](`dict`) | The record data as a dictionary. | 210 + 211 + #### Raises {.doc-section .doc-section-raises} 212 + 213 + | Name | Type | Description | 214 + |--------|-----------------------------------------------------------------------------------------------------------------|----------------------| 215 + | | [atproto](`atproto`).[exceptions](`atproto.exceptions`).[AtProtocolError](`atproto.exceptions.AtProtocolError`) | If record not found. | 166 216 167 217 ### list_datasets { #atdata.atmosphere.AtmosphereClient.list_datasets } 168 218 ··· 172 222 173 223 List dataset records. 174 224 175 - Args: 176 - repo: The DID to query. Defaults to authenticated user. 177 - limit: Maximum number to return. 
225 + #### Parameters {.doc-section .doc-section-parameters} 226 + 227 + | Name | Type | Description | Default | 228 + |--------|-----------------------------------------------|---------------------------------------------------|-----------| 229 + | repo | [Optional](`typing.Optional`)\[[str](`str`)\] | The DID to query. Defaults to authenticated user. | `None` | 230 + | limit | [int](`int`) | Maximum number to return. | `100` | 231 + 232 + #### Returns {.doc-section .doc-section-returns} 178 233 179 - Returns: 180 - List of dataset records. 234 + | Name | Type | Description | 235 + |--------|----------------------------------|--------------------------| 236 + | | [list](`list`)\[[dict](`dict`)\] | List of dataset records. | 181 237 182 238 ### list_lenses { #atdata.atmosphere.AtmosphereClient.list_lenses } 183 239 ··· 187 243 188 244 List lens records. 189 245 190 - Args: 191 - repo: The DID to query. Defaults to authenticated user. 192 - limit: Maximum number to return. 246 + #### Parameters {.doc-section .doc-section-parameters} 193 247 194 - Returns: 195 - List of lens records. 248 + | Name | Type | Description | Default | 249 + |--------|-----------------------------------------------|---------------------------------------------------|-----------| 250 + | repo | [Optional](`typing.Optional`)\[[str](`str`)\] | The DID to query. Defaults to authenticated user. | `None` | 251 + | limit | [int](`int`) | Maximum number to return. | `100` | 252 + 253 + #### Returns {.doc-section .doc-section-returns} 254 + 255 + | Name | Type | Description | 256 + |--------|----------------------------------|-----------------------| 257 + | | [list](`list`)\[[dict](`dict`)\] | List of lens records. | 196 258 197 259 ### list_records { #atdata.atmosphere.AtmosphereClient.list_records } 198 260 ··· 208 270 209 271 List records in a collection. 210 272 211 - Args: 212 - collection: The NSID of the record collection. 213 - repo: The DID of the repository to query. 
Defaults to the 214 - authenticated user's repository. 215 - limit: Maximum number of records to return (default 100). 216 - cursor: Pagination cursor from a previous call. 273 + #### Parameters {.doc-section .doc-section-parameters} 217 274 218 - Returns: 219 - A tuple of (records, next_cursor). The cursor is None if there 220 - are no more records. 275 + | Name | Type | Description | Default | 276 + |------------|-----------------------------------------------|--------------------------------------------------------------------------------------|------------| 277 + | collection | [str](`str`) | The NSID of the record collection. | _required_ | 278 + | repo | [Optional](`typing.Optional`)\[[str](`str`)\] | The DID of the repository to query. Defaults to the authenticated user's repository. | `None` | 279 + | limit | [int](`int`) | Maximum number of records to return (default 100). | `100` | 280 + | cursor | [Optional](`typing.Optional`)\[[str](`str`)\] | Pagination cursor from a previous call. | `None` | 221 281 222 - Raises: 223 - ValueError: If repo is None and not authenticated. 282 + #### Returns {.doc-section .doc-section-returns} 283 + 284 + | Name | Type | Description | 285 + |--------|-----------------------------------------------|----------------------------------------------------------------| 286 + | | [list](`list`)\[[dict](`dict`)\] | A tuple of (records, next_cursor). The cursor is None if there | 287 + | | [Optional](`typing.Optional`)\[[str](`str`)\] | are no more records. | 288 + 289 + #### Raises {.doc-section .doc-section-raises} 290 + 291 + | Name | Type | Description | 292 + |--------|----------------------------|----------------------------------------| 293 + | | [ValueError](`ValueError`) | If repo is None and not authenticated. | 224 294 225 295 ### list_schemas { #atdata.atmosphere.AtmosphereClient.list_schemas } 226 296 ··· 230 300 231 301 List schema records. 232 302 233 - Args: 234 - repo: The DID to query. 
Defaults to authenticated user. 235 - limit: Maximum number to return. 303 + #### Parameters {.doc-section .doc-section-parameters} 236 304 237 - Returns: 238 - List of schema records. 305 + | Name | Type | Description | Default | 306 + |--------|-----------------------------------------------|---------------------------------------------------|-----------| 307 + | repo | [Optional](`typing.Optional`)\[[str](`str`)\] | The DID to query. Defaults to authenticated user. | `None` | 308 + | limit | [int](`int`) | Maximum number to return. | `100` | 309 + 310 + #### Returns {.doc-section .doc-section-returns} 311 + 312 + | Name | Type | Description | 313 + |--------|----------------------------------|-------------------------| 314 + | | [list](`list`)\[[dict](`dict`)\] | List of schema records. | 239 315 240 316 ### login { #atdata.atmosphere.AtmosphereClient.login } 241 317 ··· 245 321 246 322 Authenticate with the ATProto PDS. 247 323 248 - Args: 249 - handle: Your Bluesky handle (e.g., 'alice.bsky.social'). 250 - password: App-specific password (not your main password). 324 + #### Parameters {.doc-section .doc-section-parameters} 325 + 326 + | Name | Type | Description | Default | 327 + |----------|--------------|--------------------------------------------------|------------| 328 + | handle | [str](`str`) | Your Bluesky handle (e.g., 'alice.bsky.social'). | _required_ | 329 + | password | [str](`str`) | App-specific password (not your main password). | _required_ | 251 330 252 - Raises: 253 - atproto.exceptions.AtProtocolError: If authentication fails. 331 + #### Raises {.doc-section .doc-section-raises} 332 + 333 + | Name | Type | Description | 334 + |--------|-----------------------------------------------------------------------------------------------------------------|--------------------------| 335 + | | [atproto](`atproto`).[exceptions](`atproto.exceptions`).[AtProtocolError](`atproto.exceptions.AtProtocolError`) | If authentication fails. 
| 254 336 255 337 ### login_with_session { #atdata.atmosphere.AtmosphereClient.login_with_session } 256 338 ··· 263 345 This allows reusing a session without re-authenticating, which helps 264 346 avoid rate limits on session creation. 265 347 266 - Args: 267 - session_string: Session string from ``export_session()``. 348 + #### Parameters {.doc-section .doc-section-parameters} 349 + 350 + | Name | Type | Description | Default | 351 + |----------------|--------------|-------------------------------------------|------------| 352 + | session_string | [str](`str`) | Session string from ``export_session()``. | _required_ | 268 353 269 354 ### put_record { #atdata.atmosphere.AtmosphereClient.put_record } 270 355 ··· 281 366 282 367 Create or update a record at a specific key. 283 368 284 - Args: 285 - collection: The NSID of the record collection. 286 - rkey: The record key. 287 - record: The record data. Must include a '$type' field. 288 - validate: Whether to validate against the Lexicon schema. 289 - swap_commit: Optional CID for compare-and-swap update. 369 + #### Parameters {.doc-section .doc-section-parameters} 370 + 371 + | Name | Type | Description | Default | 372 + |-------------|-----------------------------------------------|-------------------------------------------------|------------| 373 + | collection | [str](`str`) | The NSID of the record collection. | _required_ | 374 + | rkey | [str](`str`) | The record key. | _required_ | 375 + | record | [dict](`dict`) | The record data. Must include a '$type' field. | _required_ | 376 + | validate | [bool](`bool`) | Whether to validate against the Lexicon schema. | `False` | 377 + | swap_commit | [Optional](`typing.Optional`)\[[str](`str`)\] | Optional CID for compare-and-swap update. 
| `None` | 378 + 379 + #### Returns {.doc-section .doc-section-returns} 380 + 381 + | Name | Type | Description | 382 + |--------|-------------------------------------------|---------------------------| 383 + | | [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the record. | 290 384 291 - Returns: 292 - The AT URI of the record. 385 + #### Raises {.doc-section .doc-section-raises} 293 386 294 - Raises: 295 - ValueError: If not authenticated. 296 - atproto.exceptions.AtProtocolError: If operation fails. 387 + | Name | Type | Description | 388 + |--------|-----------------------------------------------------------------------------------------------------------------|-----------------------| 389 + | | [ValueError](`ValueError`) | If not authenticated. | 390 + | | [atproto](`atproto`).[exceptions](`atproto.exceptions`).[AtProtocolError](`atproto.exceptions.AtProtocolError`) | If operation fails. | 297 391 298 392 ### upload_blob { #atdata.atmosphere.AtmosphereClient.upload_blob } 299 393 ··· 306 400 307 401 Upload binary data as a blob to the PDS. 308 402 309 - Args: 310 - data: Binary data to upload. 311 - mime_type: MIME type of the data (for reference, not enforced by PDS). 403 + #### Parameters {.doc-section .doc-section-parameters} 404 + 405 + | Name | Type | Description | Default | 406 + |-----------|------------------|-------------------------------------------------------------|------------------------------| 407 + | data | [bytes](`bytes`) | Binary data to upload. | _required_ | 408 + | mime_type | [str](`str`) | MIME type of the data (for reference, not enforced by PDS). | `'application/octet-stream'` | 409 + 410 + #### Returns {.doc-section .doc-section-returns} 411 + 412 + | Name | Type | Description | 413 + |--------|----------------|----------------------------------------------------------------------| 414 + | | [dict](`dict`) | A blob reference dict with keys: '$type', 'ref', 'mimeType', 'size'. 
| 415 + | | [dict](`dict`) | This can be embedded directly in record fields. | 312 416 313 - Returns: 314 - A blob reference dict with keys: '$type', 'ref', 'mimeType', 'size'. 315 - This can be embedded directly in record fields. 417 + #### Raises {.doc-section .doc-section-raises} 316 418 317 - Raises: 318 - ValueError: If not authenticated. 319 - atproto.exceptions.AtProtocolError: If upload fails. 419 + | Name | Type | Description | 420 + |--------|-----------------------------------------------------------------------------------------------------------------|-----------------------| 421 + | | [ValueError](`ValueError`) | If not authenticated. | 422 + | | [atproto](`atproto`).[exceptions](`atproto.exceptions`).[AtProtocolError](`atproto.exceptions.AtProtocolError`) | If upload fails. |
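The `list_records` entry in this file documents a `(records, next_cursor)` return value, with the cursor `None` once there are no more records. A minimal sketch of a pagination loop built on that contract is shown here. `FakeClient`, `iter_all_records`, and the collection NSID are hypothetical stand-ins so the snippet runs without a PDS; only the parameter names (`collection`, `repo`, `limit`, `cursor`) and the return shape are taken from the generated docs.

```python
from typing import Iterator, Optional


class FakeClient:
    """Hypothetical stand-in that mimics the documented list_records contract."""

    def __init__(self, records: list[dict]) -> None:
        self._records = records

    def list_records(
        self,
        collection: str,
        repo: Optional[str] = None,
        limit: int = 100,
        cursor: Optional[str] = None,
    ) -> tuple[list[dict], Optional[str]]:
        # Assumption: the cursor is a stringified offset; a real PDS cursor is opaque.
        start = int(cursor) if cursor is not None else 0
        page = self._records[start : start + limit]
        end = start + len(page)
        next_cursor = str(end) if end < len(self._records) else None
        return page, next_cursor


def iter_all_records(client, collection: str, page_size: int = 100) -> Iterator[dict]:
    """Yield every record in a collection, following cursors until one is None."""
    cursor: Optional[str] = None
    while True:
        records, cursor = client.list_records(collection, limit=page_size, cursor=cursor)
        yield from records
        if cursor is None:
            break


client = FakeClient([{"n": i} for i in range(5)])
all_records = list(iter_all_records(client, "example.collection.nsid", page_size=2))
```

Treating the cursor as an opaque token and stopping only on `None` keeps the loop independent of how the server encodes its position.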

+94 -40

docs_src/api/AtmosphereIndex.qmd

··· 9 9 Wraps SchemaPublisher/Loader and DatasetPublisher/Loader to provide 10 10 a unified interface compatible with LocalIndex. 11 11 12 - Example: 12 + ## Example {.doc-section .doc-section-example} 13 + 14 + :: 15 + 13 16 >>> client = AtmosphereClient() 14 17 >>> client.login("handle.bsky.social", "app-password") 15 18 >>> ··· 44 47 45 48 Reconstruct a Python type from a schema record. 46 49 47 - Args: 48 - ref: AT URI of the schema record. 50 + #### Parameters {.doc-section .doc-section-parameters} 49 51 50 - Returns: 51 - Dynamically generated Packable type. 52 + | Name | Type | Description | Default | 53 + |--------|--------------|------------------------------|------------| 54 + | ref | [str](`str`) | AT URI of the schema record. | _required_ | 52 55 53 - Raises: 54 - ValueError: If schema cannot be decoded. 56 + #### Returns {.doc-section .doc-section-returns} 57 + 58 + | Name | Type | Description | 59 + |--------|-------------------------------------------------------------------|--------------------------------------| 60 + | | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | Dynamically generated Packable type. | 61 + 62 + #### Raises {.doc-section .doc-section-raises} 63 + 64 + | Name | Type | Description | 65 + |--------|----------------------------|------------------------------| 66 + | | [ValueError](`ValueError`) | If schema cannot be decoded. | 55 67 56 68 ### get_dataset { #atdata.atmosphere.AtmosphereIndex.get_dataset } 57 69 ··· 61 73 62 74 Get a dataset by AT URI. 63 75 64 - Args: 65 - ref: AT URI of the dataset record. 76 + #### Parameters {.doc-section .doc-section-parameters} 77 + 78 + | Name | Type | Description | Default | 79 + |--------|--------------|-------------------------------|------------| 80 + | ref | [str](`str`) | AT URI of the dataset record. 
| _required_ | 81 + 82 + #### Returns {.doc-section .doc-section-returns} 83 + 84 + | Name | Type | Description | 85 + |--------|------------------------------------------------------------------|---------------------------------------| 86 + | | [AtmosphereIndexEntry](`atdata.atmosphere.AtmosphereIndexEntry`) | AtmosphereIndexEntry for the dataset. | 66 87 67 - Returns: 68 - AtmosphereIndexEntry for the dataset. 88 + #### Raises {.doc-section .doc-section-raises} 69 89 70 - Raises: 71 - ValueError: If record is not a dataset. 90 + | Name | Type | Description | 91 + |--------|----------------------------|-----------------------------| 92 + | | [ValueError](`ValueError`) | If record is not a dataset. | 72 93 73 94 ### get_schema { #atdata.atmosphere.AtmosphereIndex.get_schema } 74 95 ··· 78 99 79 100 Get a schema record by AT URI. 80 101 81 - Args: 82 - ref: AT URI of the schema record. 102 + #### Parameters {.doc-section .doc-section-parameters} 103 + 104 + | Name | Type | Description | Default | 105 + |--------|--------------|------------------------------|------------| 106 + | ref | [str](`str`) | AT URI of the schema record. | _required_ | 83 107 84 - Returns: 85 - Schema record dictionary. 108 + #### Returns {.doc-section .doc-section-returns} 86 109 87 - Raises: 88 - ValueError: If record is not a schema. 110 + | Name | Type | Description | 111 + |--------|----------------|---------------------------| 112 + | | [dict](`dict`) | Schema record dictionary. | 113 + 114 + #### Raises {.doc-section .doc-section-raises} 115 + 116 + | Name | Type | Description | 117 + |--------|----------------------------|----------------------------| 118 + | | [ValueError](`ValueError`) | If record is not a schema. | 89 119 90 120 ### insert_dataset { #atdata.atmosphere.AtmosphereIndex.insert_dataset } 91 121 ··· 101 131 102 132 Insert a dataset into ATProto. 103 133 104 - Args: 105 - ds: The Dataset to publish. 106 - name: Human-readable name. 
107 - schema_ref: Optional schema AT URI. If None, auto-publishes schema. 108 - **kwargs: Additional options (description, tags, license). 134 + #### Parameters {.doc-section .doc-section-parameters} 109 135 110 - Returns: 111 - AtmosphereIndexEntry for the inserted dataset. 136 + | Name | Type | Description | Default | 137 + |------------|-----------------------------------------------|---------------------------------------------------------|------------| 138 + | ds | [Dataset](`atdata.dataset.Dataset`) | The Dataset to publish. | _required_ | 139 + | name | [str](`str`) | Human-readable name. | _required_ | 140 + | schema_ref | [Optional](`typing.Optional`)\[[str](`str`)\] | Optional schema AT URI. If None, auto-publishes schema. | `None` | 141 + | **kwargs | | Additional options (description, tags, license). | `{}` | 142 + 143 + #### Returns {.doc-section .doc-section-returns} 144 + 145 + | Name | Type | Description | 146 + |--------|------------------------------------------------------------------|------------------------------------------------| 147 + | | [AtmosphereIndexEntry](`atdata.atmosphere.AtmosphereIndexEntry`) | AtmosphereIndexEntry for the inserted dataset. | 112 148 113 149 ### list_datasets { #atdata.atmosphere.AtmosphereIndex.list_datasets } 114 150 ··· 118 154 119 155 Get all dataset entries as a materialized list (AbstractIndex protocol). 120 156 121 - Args: 122 - repo: DID of repository. Defaults to authenticated user. 157 + #### Parameters {.doc-section .doc-section-parameters} 158 + 159 + | Name | Type | Description | Default | 160 + |--------|-----------------------------------------------|----------------------------------------------------|-----------| 161 + | repo | [Optional](`typing.Optional`)\[[str](`str`)\] | DID of repository. Defaults to authenticated user. | `None` | 123 162 124 - Returns: 125 - List of AtmosphereIndexEntry for each dataset. 
163 + #### Returns {.doc-section .doc-section-returns} 164 + 165 + | Name | Type | Description | 166 + |--------|------------------------------------------------------------------------------------|------------------------------------------------| 167 + | | [list](`list`)\[[AtmosphereIndexEntry](`atdata.atmosphere.AtmosphereIndexEntry`)\] | List of AtmosphereIndexEntry for each dataset. | 126 168 127 169 ### list_schemas { #atdata.atmosphere.AtmosphereIndex.list_schemas } 128 170 ··· 132 174 133 175 Get all schema records as a materialized list (AbstractIndex protocol). 134 176 135 - Args: 136 - repo: DID of repository. Defaults to authenticated user. 177 + #### Parameters {.doc-section .doc-section-parameters} 178 + 179 + | Name | Type | Description | Default | 180 + |--------|-----------------------------------------------|----------------------------------------------------|-----------| 181 + | repo | [Optional](`typing.Optional`)\[[str](`str`)\] | DID of repository. Defaults to authenticated user. | `None` | 182 + 183 + #### Returns {.doc-section .doc-section-returns} 137 184 138 - Returns: 139 - List of schema records as dictionaries. 185 + | Name | Type | Description | 186 + |--------|----------------------------------|-----------------------------------------| 187 + | | [list](`list`)\[[dict](`dict`)\] | List of schema records as dictionaries. | 140 188 141 189 ### publish_schema { #atdata.atmosphere.AtmosphereIndex.publish_schema } 142 190 ··· 151 199 152 200 Publish a schema to ATProto. 153 201 154 - Args: 155 - sample_type: A Packable type (PackableSample subclass or @packable-decorated). 156 - version: Semantic version string. 157 - **kwargs: Additional options (description, metadata). 
202 + #### Parameters {.doc-section .doc-section-parameters} 203 + 204 + | Name | Type | Description | Default | 205 + |-------------|-------------------------------------------------------------------|-------------------------------------------------------------------|------------| 206 + | sample_type | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | A Packable type (PackableSample subclass or @packable-decorated). | _required_ | 207 + | version | [str](`str`) | Semantic version string. | `'1.0.0'` | 208 + | **kwargs | | Additional options (description, metadata). | `{}` | 158 209 159 - Returns: 160 - AT URI of the schema record. 210 + #### Returns {.doc-section .doc-section-returns} 211 + 212 + | Name | Type | Description | 213 + |--------|--------------|------------------------------| 214 + | | [str](`str`) | AT URI of the schema record. |

+5 -12

docs_src/api/AtmosphereIndexEntry.qmd

···

 Entry wrapper for ATProto dataset records implementing IndexEntry protocol.

- Attributes:
- _uri: AT URI of the record.
- _record: Raw record dictionary.
+ ## Attributes {.doc-section .doc-section-attributes}

- ## Attributes
-
- | Name | Description |
- | --- | --- |
- | [data_urls](#atdata.atmosphere.AtmosphereIndexEntry.data_urls) | WebDataset URLs from external storage. |
- | [metadata](#atdata.atmosphere.AtmosphereIndexEntry.metadata) | Metadata from the record, if any. |
- | [name](#atdata.atmosphere.AtmosphereIndexEntry.name) | Human-readable dataset name. |
- | [schema_ref](#atdata.atmosphere.AtmosphereIndexEntry.schema_ref) | AT URI of the schema record. |
- | [uri](#atdata.atmosphere.AtmosphereIndexEntry.uri) | AT URI of this record. |
+ | Name | Type | Description |
+ |---------|--------|------------------------|
+ | _uri | | AT URI of the record. |
+ | _record | | Raw record dictionary. |

+24 -9

docs_src/api/DataSource.qmd

···
 - ATProto blob streaming
 - Any other source that can provide file-like objects

- Example:
+ ## Example {.doc-section .doc-section-example}
+
+ ::
+
 >>> source = S3Source(
 ... bucket="my-bucket",
 ... keys=["data-000.tar", "data-001.tar"],
···
 streaming data. Implementations should return identifiers that
 match what shards would yield.

- Returns:
- List of shard identifier strings.
+ #### Returns {.doc-section .doc-section-returns}
+
+ | Name | Type | Description |
+ |--------|--------------------------------|-----------------------------------|
+ | | [list](`list`)\[[str](`str`)\] | List of shard identifier strings. |

 ### open_shard { #atdata.DataSource.open_shard }

···
 required for PyTorch DataLoader worker splitting. Each worker opens
 only its assigned shards rather than iterating all shards.

- Args:
- shard_id: Shard identifier from shard_list.
+ #### Parameters {.doc-section .doc-section-parameters}
+
+ | Name | Type | Description | Default |
+ |----------|--------------|-----------------------------------|------------|
+ | shard_id | [str](`str`) | Shard identifier from shard_list. | _required_ |

- Returns:
- File-like stream for reading the shard.
+ #### Returns {.doc-section .doc-section-returns}

- Raises:
- KeyError: If shard_id is not in shard_list.
+ | Name | Type | Description |
+ |--------|---------------------------------------|-----------------------------------------|
+ | | [IO](`typing.IO`)\[[bytes](`bytes`)\] | File-like stream for reading the shard. |
+
+ #### Raises {.doc-section .doc-section-raises}
+
+ | Name | Type | Description |
+ |--------|------------------------|-----------------------------------|
+ | | [KeyError](`KeyError`) | If shard_id is not in shard_list. |
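The `DataSource` hunks above describe a small contract: a list of shard identifier strings, and an `open_shard(shard_id)` call that returns a file-like byte stream or raises `KeyError` for an unknown id. A rough sketch of an object satisfying that contract follows. `InMemorySource` is a hypothetical class, not part of atdata, and whether `shard_list` is a property or a method is not visible in this diff; the sketch assumes a property.

```python
import io
from typing import IO


class InMemorySource:
    """Hypothetical data source that keeps shard bytes in a dict."""

    def __init__(self, shards: dict[str, bytes]) -> None:
        self._shards = dict(shards)

    @property
    def shard_list(self) -> list[str]:
        # Assumption: exposed as a property; the docs only show the return type.
        return list(self._shards)

    def open_shard(self, shard_id: str) -> IO[bytes]:
        # The dict lookup raises KeyError for unknown ids, matching the documented Raises entry.
        return io.BytesIO(self._shards[shard_id])


source = InMemorySource({"data-000.tar": b"first", "data-001.tar": b"second"})
first_bytes = source.open_shard(source.shard_list[0]).read()
```

Because each shard is opened by id, a worker can be handed a subset of `shard_list` and open only those shards, which is the property the docs call out for PyTorch DataLoader worker splitting.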

+116 -89

docs_src/api/Dataset.qmd

··· 16 16 - Type transformations via the lens system (``as_type()``) 17 17 - Export to parquet format 18 18 19 - Type Parameters: 20 - ST: The sample type for this dataset, must derive from ``PackableSample``. 19 + ## Parameters {.doc-section .doc-section-parameters} 20 + 21 + | Name | Type | Description | Default | 22 + |--------|--------|------------------------------------------------------------------------|------------| 23 + | ST | | The sample type for this dataset, must derive from ``PackableSample``. | _required_ | 21 24 22 - Attributes: 23 - url: WebDataset brace-notation URL for the tar file(s). 25 + ## Attributes {.doc-section .doc-section-attributes} 24 26 25 - Example: 27 + | Name | Type | Description | 28 + |--------|--------|----------------------------------------------------| 29 + | url | | WebDataset brace-notation URL for the tar file(s). | 30 + 31 + ## Example {.doc-section .doc-section-example} 32 + 33 + :: 34 + 26 35 >>> ds = Dataset[MyData]("path/to/data-{000000..000009}.tar") 27 36 >>> for sample in ds.ordered(batch_size=32): 28 37 ... # sample is SampleBatch[MyData] with batch_size samples ··· 31 40 >>> # Transform to a different view 32 41 >>> ds_view = ds.as_type(MyDataView) 33 42 34 - Note: 35 - This class uses Python's ``__orig_class__`` mechanism to extract the 36 - type parameter at runtime. Instances must be created using the 37 - subscripted syntax ``Dataset[MyType](url)`` rather than calling the 38 - constructor directly with an unsubscripted class. 39 - 40 - ## Attributes 43 + ## Note {.doc-section .doc-section-note} 41 44 42 - | Name | Description | 43 - | --- | --- | 44 - | [batch_type](#atdata.Dataset.batch_type) | The type of batches produced by this dataset. | 45 - | [metadata](#atdata.Dataset.metadata) | Fetch and cache metadata from metadata_url. | 46 - | [metadata_url](#atdata.Dataset.metadata_url) | Optional URL to msgpack-encoded metadata for this dataset. 
| 47 - | [sample_type](#atdata.Dataset.sample_type) | The type of each returned sample from this dataset's iterator. | 48 - | [shard_list](#atdata.Dataset.shard_list) | List of individual dataset shards (deprecated, use list_shards()). | 49 - | [source](#atdata.Dataset.source) | The underlying data source for this dataset. | 45 + This class uses Python's ``__orig_class__`` mechanism to extract the 46 + type parameter at runtime. Instances must be created using the 47 + subscripted syntax ``Dataset[MyType](url)`` rather than calling the 48 + constructor directly with an unsubscripted class. 50 49 51 50 ## Methods 52 51 ··· 68 67 69 68 View this dataset through a different sample type using a registered lens. 70 69 71 - Args: 72 - other: The target sample type to transform into. Must be a type 73 - derived from ``PackableSample``. 70 + #### Parameters {.doc-section .doc-section-parameters} 71 + 72 + | Name | Type | Description | Default | 73 + |--------|----------------------------------------------------|-------------------------------------------------------------------------------------------|------------| 74 + | other | [Type](`typing.Type`)\[[RT](`atdata.dataset.RT`)\] | The target sample type to transform into. Must be a type derived from ``PackableSample``. | _required_ | 75 + 76 + #### Returns {.doc-section .doc-section-returns} 77 + 78 + | Name | Type | Description | 79 + |--------|------------------------------------------------------------------|------------------------------------------------------------------| 80 + | | [Dataset](`atdata.dataset.Dataset`)\[[RT](`atdata.dataset.RT`)\] | A new ``Dataset`` instance that yields samples of type ``other`` | 81 + | | [Dataset](`atdata.dataset.Dataset`)\[[RT](`atdata.dataset.RT`)\] | by applying the appropriate lens transformation from the global | 82 + | | [Dataset](`atdata.dataset.Dataset`)\[[RT](`atdata.dataset.RT`)\] | ``LensNetwork`` registry. 
| 74 83 75 - Returns: 76 - A new ``Dataset`` instance that yields samples of type ``other`` 77 - by applying the appropriate lens transformation from the global 78 - ``LensNetwork`` registry. 84 + #### Raises {.doc-section .doc-section-raises} 79 85 80 - Raises: 81 - ValueError: If no registered lens exists between the current 82 - sample type and the target type. 86 + | Name | Type | Description | 87 + |--------|----------------------------|-----------------------------------------------------------------------------------| 88 + | | [ValueError](`ValueError`) | If no registered lens exists between the current sample type and the target type. | 83 89 84 90 ### list_shards { #atdata.Dataset.list_shards } 85 91 ··· 89 95 90 96 Get list of individual dataset shards. 91 97 92 - Returns: 93 - A full (non-lazy) list of the individual ``tar`` files within the 94 - source WebDataset. 98 + #### Returns {.doc-section .doc-section-returns} 99 + 100 + | Name | Type | Description | 101 + |--------|--------------------------------|-------------------------------------------------------------------| 102 + | | [list](`list`)\[[str](`str`)\] | A full (non-lazy) list of the individual ``tar`` files within the | 103 + | | [list](`list`)\[[str](`str`)\] | source WebDataset. | 95 104 96 105 ### ordered { #atdata.Dataset.ordered } 97 106 ··· 101 110 102 111 Iterate over the dataset in order 103 112 104 - Args: 105 - batch_size (:obj:`int`, optional): The size of iterated batches. 106 - Default: None (unbatched). If ``None``, iterates over one 107 - sample at a time with no batch dimension. 113 + #### Parameters {.doc-section .doc-section-parameters} 114 + 115 + | Name | Type | Description | Default | 116 + |--------------|--------|---------------------------------------------------------------------------------------------------------------------------------------------------------|------------| 117 + | batch_size ( | | obj:`int`, optional): The size of iterated batches. 
Default: None (unbatched). If ``None``, iterates over one sample at a time with no batch dimension. | _required_ | 118 + 119 + #### Returns {.doc-section .doc-section-returns} 108 120 109 - Returns: 110 - :obj:`webdataset.DataPipeline` A data pipeline that iterates over 111 - the dataset in its original sample order 121 + | Name | Type | Description | 122 + |--------|------------------------------------------------------------|------------------------------------------------------------------| 123 + | | [Iterable](`typing.Iterable`)\[[ST](`atdata.dataset.ST`)\] | obj:`webdataset.DataPipeline` A data pipeline that iterates over | 124 + | | [Iterable](`typing.Iterable`)\[[ST](`atdata.dataset.ST`)\] | the dataset in its original sample order | 112 125 113 126 ### shuffled { #atdata.Dataset.shuffled } 114 127 ··· 118 131 119 132 Iterate over the dataset in random order. 120 133 121 - Args: 122 - buffer_shards: Number of shards to buffer for shuffling at the 123 - shard level. Larger values increase randomness but use more 124 - memory. Default: 100. 125 - buffer_samples: Number of samples to buffer for shuffling within 126 - shards. Larger values increase randomness but use more memory. 127 - Default: 10,000. 128 - batch_size: The size of iterated batches. Default: None (unbatched). 129 - If ``None``, iterates over one sample at a time with no batch 130 - dimension. 134 + #### Parameters {.doc-section .doc-section-parameters} 135 + 136 + | Name | Type | Description | Default | 137 + |----------------|----------------------|-----------------------------------------------------------------------------------------------------------------------------------|-----------| 138 + | buffer_shards | [int](`int`) | Number of shards to buffer for shuffling at the shard level. Larger values increase randomness but use more memory. Default: 100. | `100` | 139 + | buffer_samples | [int](`int`) | Number of samples to buffer for shuffling within shards. 
Larger values increase randomness but use more memory. Default: 10,000. | `10000` | 140 + | batch_size | [int](`int`) \| None | The size of iterated batches. Default: None (unbatched). If ``None``, iterates over one sample at a time with no batch dimension. | `None` | 141 + 142 + #### Returns {.doc-section .doc-section-returns} 131 143 132 - Returns: 133 - A WebDataset data pipeline that iterates over the dataset in 134 - randomized order. If ``batch_size`` is not ``None``, yields 135 - ``SampleBatch[ST]`` instances; otherwise yields individual ``ST`` 136 - samples. 144 + | Name | Type | Description | 145 + |--------|------------------------------------------------------------|-------------------------------------------------------------------| 146 + | | [Iterable](`typing.Iterable`)\[[ST](`atdata.dataset.ST`)\] | A WebDataset data pipeline that iterates over the dataset in | 147 + | | [Iterable](`typing.Iterable`)\[[ST](`atdata.dataset.ST`)\] | randomized order. If ``batch_size`` is not ``None``, yields | 148 + | | [Iterable](`typing.Iterable`)\[[ST](`atdata.dataset.ST`)\] | ``SampleBatch[ST]`` instances; otherwise yields individual ``ST`` | 149 + | | [Iterable](`typing.Iterable`)\[[ST](`atdata.dataset.ST`)\] | samples. | 137 150 138 151 ### to_parquet { #atdata.Dataset.to_parquet } 139 152 ··· 146 159 Converts all samples to a pandas DataFrame and saves to parquet file(s). 147 160 Useful for interoperability with data analysis tools. 148 161 149 - Args: 150 - path: Output path for the parquet file. If ``maxcount`` is specified, 151 - files are named ``{stem}-{segment:06d}.parquet``. 152 - sample_map: Optional function to convert samples to dictionaries. 153 - Defaults to ``dataclasses.asdict``. 154 - maxcount: If specified, split output into multiple files with at most 155 - this many samples each. Recommended for large datasets. 156 - **kwargs: Additional arguments passed to ``pandas.DataFrame.to_parquet()``. 
157 - Common options include ``compression``, ``index``, ``engine``. 162 + #### Parameters {.doc-section .doc-section-parameters} 158 163 159 - Warning: 160 - **Memory Usage**: When ``maxcount=None`` (default), this method loads 161 - the **entire dataset into memory** as a pandas DataFrame before writing. 162 - For large datasets, this can cause memory exhaustion. 164 + | Name | Type | Description | Default | 165 + |------------|--------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------|------------| 166 + | path | [Pathlike](`atdata.dataset.Pathlike`) | Output path for the parquet file. If ``maxcount`` is specified, files are named ``{stem}-{segment:06d}.parquet``. | _required_ | 167 + | sample_map | [Optional](`typing.Optional`)\[[SampleExportMap](`atdata.dataset.SampleExportMap`)\] | Optional function to convert samples to dictionaries. Defaults to ``dataclasses.asdict``. | `None` | 168 + | maxcount | [Optional](`typing.Optional`)\[[int](`int`)\] | If specified, split output into multiple files with at most this many samples each. Recommended for large datasets. | `None` | 169 + | **kwargs | | Additional arguments passed to ``pandas.DataFrame.to_parquet()``. Common options include ``compression``, ``index``, ``engine``. | `{}` | 163 170 164 - For datasets larger than available RAM, always specify ``maxcount``:: 171 + #### Warning {.doc-section .doc-section-warning} 165 172 166 - # Safe for large datasets - processes in chunks 167 - ds.to_parquet("output.parquet", maxcount=10000) 173 + **Memory Usage**: When ``maxcount=None`` (default), this method loads 174 + the **entire dataset into memory** as a pandas DataFrame before writing. 175 + For large datasets, this can cause memory exhaustion. 168 176 169 - This creates multiple parquet files: ``output-000000.parquet``, 170 - ``output-000001.parquet``, etc. 
177 + For datasets larger than available RAM, always specify ``maxcount``:: 171 178 172 - Example: 179 + # Safe for large datasets - processes in chunks 180 + ds.to_parquet("output.parquet", maxcount=10000) 181 + 182 + This creates multiple parquet files: ``output-000000.parquet``, 183 + ``output-000001.parquet``, etc. 184 + 185 + #### Example {.doc-section .doc-section-example} 186 + 187 + :: 188 + 173 189 >>> ds = Dataset[MySample]("data.tar") 174 190 >>> # Small dataset - load all at once 175 191 >>> ds.to_parquet("output.parquet") ··· 185 201 186 202 Wrap a raw msgpack sample into the appropriate dataset-specific type. 187 203 188 - Args: 189 - sample: A dictionary containing at minimum a ``'msgpack'`` key with 190 - serialized sample bytes. 204 + #### Parameters {.doc-section .doc-section-parameters} 205 + 206 + | Name | Type | Description | Default | 207 + |--------|-----------------------------------------------|--------------------------------------------------------------------------------------|------------| 208 + | sample | [WDSRawSample](`atdata.dataset.WDSRawSample`) | A dictionary containing at minimum a ``'msgpack'`` key with serialized sample bytes. | _required_ | 209 + 210 + #### Returns {.doc-section .doc-section-returns} 191 211 192 - Returns: 193 - A deserialized sample of type ``ST``, optionally transformed through 194 - a lens if ``as_type()`` was called. 212 + | Name | Type | Description | 213 + |--------|---------------------------|----------------------------------------------------------------------| 214 + | | [ST](`atdata.dataset.ST`) | A deserialized sample of type ``ST``, optionally transformed through | 215 + | | [ST](`atdata.dataset.ST`) | a lens if ``as_type()`` was called. | 195 216 196 217 ### wrap_batch { #atdata.Dataset.wrap_batch } 197 218 ··· 201 222 202 223 Wrap a batch of raw msgpack samples into a typed SampleBatch. 
203 224 204 - Args: 205 - batch: A dictionary containing a ``'msgpack'`` key with a list of 206 - serialized sample bytes. 225 + #### Parameters {.doc-section .doc-section-parameters} 226 + 227 + | Name | Type | Description | Default | 228 + |--------|---------------------------------------------|-------------------------------------------------------------------------------------|------------| 229 + | batch | [WDSRawBatch](`atdata.dataset.WDSRawBatch`) | A dictionary containing a ``'msgpack'`` key with a list of serialized sample bytes. | _required_ | 230 + 231 + #### Returns {.doc-section .doc-section-returns} 232 + 233 + | Name | Type | Description | 234 + |--------|--------------------------------------------------------------------------|-------------------------------------------------------------------| 235 + | | [SampleBatch](`atdata.dataset.SampleBatch`)\[[ST](`atdata.dataset.ST`)\] | A ``SampleBatch[ST]`` containing deserialized samples, optionally | 236 + | | [SampleBatch](`atdata.dataset.SampleBatch`)\[[ST](`atdata.dataset.ST`)\] | transformed through a lens if ``as_type()`` was called. | 207 237 208 - Returns: 209 - A ``SampleBatch[ST]`` containing deserialized samples, optionally 210 - transformed through a lens if ``as_type()`` was called. 238 + #### Note {.doc-section .doc-section-note} 211 239 212 - Note: 213 - This implementation deserializes samples one at a time, then 214 - aggregates them into a batch. 240 + This implementation deserializes samples one at a time, then 241 + aggregates them into a batch.
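The `to_parquet` entry above says that with `maxcount` set, output is split into files named `{stem}-{segment:06d}.parquet`. A minimal sketch of that naming and chunking scheme, using only the standard library, might look like the following. The helper names `iter_segments` and `segment_path` are hypothetical and are not part of atdata; the real method also writes each chunk through pandas, which is omitted here.

```python
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_segments(samples: Iterable[T], maxcount: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most ``maxcount`` samples (hypothetical helper)."""
    it = iter(samples)
    while True:
        chunk = list(islice(it, maxcount))
        if not chunk:
            return
        yield chunk


def segment_path(path: str, segment: int) -> Path:
    """Build ``{stem}-{segment:06d}.parquet`` alongside ``path`` (hypothetical helper)."""
    p = Path(path)
    return p.with_name(f"{p.stem}-{segment:06d}.parquet")


# Plan the output files for 25 samples with maxcount=10; only one chunk
# of samples is held in memory at a time.
plan = [
    (str(segment_path("output.parquet", i)), len(chunk))
    for i, chunk in enumerate(iter_segments(range(25), maxcount=10))
]
```

Each entry in `plan` pairs a segment file name with the number of samples it would hold, which is why chunked export bounds memory by `maxcount` instead of by dataset size.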

+9 -3

docs_src/api/DatasetDict.qmd

··· 10 10 multiple dataset splits (train, test, validation, etc.) with convenience 11 11 methods that operate across all splits. 12 12 13 - Type Parameters: 14 - ST: The sample type for all datasets in this dict. 13 + ## Parameters {.doc-section .doc-section-parameters} 15 14 16 - Example: 15 + | Name | Type | Description | Default | 16 + |--------|--------|------------------------------------------------|------------| 17 + | ST | | The sample type for all datasets in this dict. | _required_ | 18 + 19 + ## Example {.doc-section .doc-section-example} 20 + 21 + :: 22 + 17 23 >>> ds_dict = load_dataset("path/to/data", MyData) 18 24 >>> train = ds_dict["train"] 19 25 >>> test = ds_dict["test"]

+120 -48

docs_src/api/DatasetLoader.qmd

··· 10 10 from them. Note that loading a dataset requires having the corresponding 11 11 Python class for the sample type. 12 12 13 - Example: 13 + ## Example {.doc-section .doc-section-example} 14 + 15 + :: 16 + 14 17 >>> client = AtmosphereClient() 15 18 >>> loader = DatasetLoader(client) 16 19 >>> ··· 43 46 44 47 Fetch a dataset record by AT URI. 45 48 46 - Args: 47 - uri: The AT URI of the dataset record. 49 + #### Parameters {.doc-section .doc-section-parameters} 48 50 49 - Returns: 50 - The dataset record as a dictionary. 51 + | Name | Type | Description | Default | 52 + |--------|-----------------------------------------------------------|-----------------------------------|------------| 53 + | uri | [str](`str`) \| [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the dataset record. | _required_ | 51 54 52 - Raises: 53 - ValueError: If the record is not a dataset record. 55 + #### Returns {.doc-section .doc-section-returns} 56 + 57 + | Name | Type | Description | 58 + |--------|----------------|-------------------------------------| 59 + | | [dict](`dict`) | The dataset record as a dictionary. | 60 + 61 + #### Raises {.doc-section .doc-section-raises} 62 + 63 + | Name | Type | Description | 64 + |--------|----------------------------|----------------------------------------| 65 + | | [ValueError](`ValueError`) | If the record is not a dataset record. | 54 66 55 67 ### get_blob_urls { #atdata.atmosphere.DatasetLoader.get_blob_urls } 56 68 ··· 63 75 This resolves the PDS endpoint and constructs URLs that can be 64 76 used to fetch the blob data directly. 65 77 66 - Args: 67 - uri: The AT URI of the dataset record. 78 + #### Parameters {.doc-section .doc-section-parameters} 79 + 80 + | Name | Type | Description | Default | 81 + |--------|-----------------------------------------------------------|-----------------------------------|------------| 82 + | uri | [str](`str`) \| [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the dataset record. 
| _required_ | 68 83 69 - Returns: 70 - List of URLs for fetching the blob data. 84 + #### Returns {.doc-section .doc-section-returns} 71 85 72 - Raises: 73 - ValueError: If storage type is not blobs or PDS cannot be resolved. 86 + | Name | Type | Description | 87 + |--------|--------------------------------|------------------------------------------| 88 + | | [list](`list`)\[[str](`str`)\] | List of URLs for fetching the blob data. | 89 + 90 + #### Raises {.doc-section .doc-section-raises} 91 + 92 + | Name | Type | Description | 93 + |--------|----------------------------|---------------------------------------------------------| 94 + | | [ValueError](`ValueError`) | If storage type is not blobs or PDS cannot be resolved. | 74 95 75 96 ### get_blobs { #atdata.atmosphere.DatasetLoader.get_blobs } 76 97 ··· 80 101 81 102 Get the blob references from a dataset record. 82 103 83 - Args: 84 - uri: The AT URI of the dataset record. 104 + #### Parameters {.doc-section .doc-section-parameters} 105 + 106 + | Name | Type | Description | Default | 107 + |--------|-----------------------------------------------------------|-----------------------------------|------------| 108 + | uri | [str](`str`) \| [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the dataset record. | _required_ | 85 109 86 - Returns: 87 - List of blob reference dicts with keys: $type, ref, mimeType, size. 110 + #### Returns {.doc-section .doc-section-returns} 88 111 89 - Raises: 90 - ValueError: If the storage type is not blobs. 112 + | Name | Type | Description | 113 + |--------|----------------------------------|---------------------------------------------------------------------| 114 + | | [list](`list`)\[[dict](`dict`)\] | List of blob reference dicts with keys: $type, ref, mimeType, size. 
| 115 + 116 + #### Raises {.doc-section .doc-section-raises} 117 + 118 + | Name | Type | Description | 119 + |--------|----------------------------|-----------------------------------| 120 + | | [ValueError](`ValueError`) | If the storage type is not blobs. | 91 121 92 122 ### get_metadata { #atdata.atmosphere.DatasetLoader.get_metadata } 93 123 ··· 97 127 98 128 Get the metadata from a dataset record. 99 129 100 - Args: 101 - uri: The AT URI of the dataset record. 130 + #### Parameters {.doc-section .doc-section-parameters} 131 + 132 + | Name | Type | Description | Default | 133 + |--------|-----------------------------------------------------------|-----------------------------------|------------| 134 + | uri | [str](`str`) \| [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the dataset record. | _required_ | 135 + 136 + #### Returns {.doc-section .doc-section-returns} 102 137 103 - Returns: 104 - The metadata dictionary, or None if no metadata. 138 + | Name | Type | Description | 139 + |--------|-------------------------------------------------|--------------------------------------------------| 140 + | | [Optional](`typing.Optional`)\[[dict](`dict`)\] | The metadata dictionary, or None if no metadata. | 105 141 106 142 ### get_storage_type { #atdata.atmosphere.DatasetLoader.get_storage_type } 107 143 ··· 111 147 112 148 Get the storage type of a dataset record. 113 149 114 - Args: 115 - uri: The AT URI of the dataset record. 150 + #### Parameters {.doc-section .doc-section-parameters} 151 + 152 + | Name | Type | Description | Default | 153 + |--------|-----------------------------------------------------------|-----------------------------------|------------| 154 + | uri | [str](`str`) \| [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the dataset record. 
| _required_ | 155 + 156 + #### Returns {.doc-section .doc-section-returns} 157 + 158 + | Name | Type | Description | 159 + |--------|--------------|-------------------------------| 160 + | | [str](`str`) | Either "external" or "blobs". | 116 161 117 - Returns: 118 - Either "external" or "blobs". 162 + #### Raises {.doc-section .doc-section-raises} 119 163 120 - Raises: 121 - ValueError: If storage type is unknown. 164 + | Name | Type | Description | 165 + |--------|----------------------------|-----------------------------| 166 + | | [ValueError](`ValueError`) | If storage type is unknown. | 122 167 123 168 ### get_urls { #atdata.atmosphere.DatasetLoader.get_urls } 124 169 ··· 128 173 129 174 Get the WebDataset URLs from a dataset record. 130 175 131 - Args: 132 - uri: The AT URI of the dataset record. 176 + #### Parameters {.doc-section .doc-section-parameters} 177 + 178 + | Name | Type | Description | Default | 179 + |--------|-----------------------------------------------------------|-----------------------------------|------------| 180 + | uri | [str](`str`) \| [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the dataset record. | _required_ | 181 + 182 + #### Returns {.doc-section .doc-section-returns} 183 + 184 + | Name | Type | Description | 185 + |--------|--------------------------------|--------------------------| 186 + | | [list](`list`)\[[str](`str`)\] | List of WebDataset URLs. | 133 187 134 - Returns: 135 - List of WebDataset URLs. 188 + #### Raises {.doc-section .doc-section-raises} 136 189 137 - Raises: 138 - ValueError: If the storage type is not external URLs. 190 + | Name | Type | Description | 191 + |--------|----------------------------|-------------------------------------------| 192 + | | [ValueError](`ValueError`) | If the storage type is not external URLs. | 139 193 140 194 ### list_all { #atdata.atmosphere.DatasetLoader.list_all } 141 195 ··· 145 199 146 200 List dataset records from a repository. 
147 201 148 - Args: 149 - repo: The DID of the repository. Defaults to authenticated user. 150 - limit: Maximum number of records to return. 202 + #### Parameters {.doc-section .doc-section-parameters} 151 203 152 - Returns: 153 - List of dataset records. 204 + | Name | Type | Description | Default | 205 + |--------|-----------------------------------------------|------------------------------------------------------------|-----------| 206 + | repo | [Optional](`typing.Optional`)\[[str](`str`)\] | The DID of the repository. Defaults to authenticated user. | `None` | 207 + | limit | [int](`int`) | Maximum number of records to return. | `100` | 208 + 209 + #### Returns {.doc-section .doc-section-returns} 210 + 211 + | Name | Type | Description | 212 + |--------|----------------------------------|--------------------------| 213 + | | [list](`list`)\[[dict](`dict`)\] | List of dataset records. | 154 214 155 215 ### to_dataset { #atdata.atmosphere.DatasetLoader.to_dataset } 156 216 ··· 166 226 167 227 Supports both external URL storage and ATProto blob storage. 168 228 169 - Args: 170 - uri: The AT URI of the dataset record. 171 - sample_type: The Python class for the sample type. 229 + #### Parameters {.doc-section .doc-section-parameters} 172 230 173 - Returns: 174 - A Dataset instance configured from the record. 231 + | Name | Type | Description | Default | 232 + |-------------|---------------------------------------------------------------|---------------------------------------|------------| 233 + | uri | [str](`str`) \| [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the dataset record. | _required_ | 234 + | sample_type | [Type](`typing.Type`)\[[ST](`atdata.atmosphere.records.ST`)\] | The Python class for the sample type. 
| _required_ | 235 + 236 + #### Returns {.doc-section .doc-section-returns} 237 + 238 + | Name | Type | Description | 239 + |--------|-----------------------------------------------------------------------------|------------------------------------------------| 240 + | | [Dataset](`atdata.dataset.Dataset`)\[[ST](`atdata.atmosphere.records.ST`)\] | A Dataset instance configured from the record. | 241 + 242 + #### Raises {.doc-section .doc-section-raises} 175 243 176 - Raises: 177 - ValueError: If no storage URLs can be resolved. 244 + | Name | Type | Description | 245 + |--------|----------------------------|-------------------------------------| 246 + | | [ValueError](`ValueError`) | If no storage URLs can be resolved. | 178 247 179 - Example: 248 + #### Example {.doc-section .doc-section-example} 249 + 250 + :: 251 + 180 252 >>> loader = DatasetLoader(client) 181 253 >>> dataset = loader.to_dataset(uri, MySampleType) 182 254 >>> for batch in dataset.shuffled(batch_size=32):

+66 -43

docs_src/api/DatasetPublisher.qmd

··· 9 9 This class creates dataset records that reference a schema and point to 10 10 external storage (WebDataset URLs) or ATProto blobs. 11 11 12 - Example: 12 + ## Example {.doc-section .doc-section-example} 13 + 14 + :: 15 + 13 16 >>> dataset = atdata.Dataset[MySample]("s3://bucket/data-{000000..000009}.tar") 14 17 >>> 15 18 >>> client = AtmosphereClient() ··· 50 53 51 54 Publish a dataset index record to ATProto. 52 55 53 - Args: 54 - dataset: The Dataset to publish. 55 - name: Human-readable dataset name. 56 - schema_uri: AT URI of the schema record. If not provided and 57 - auto_publish_schema is True, the schema will be published. 58 - description: Human-readable description. 59 - tags: Searchable tags for discovery. 60 - license: SPDX license identifier (e.g., 'MIT', 'Apache-2.0'). 61 - auto_publish_schema: If True and schema_uri not provided, 62 - automatically publish the schema first. 63 - schema_version: Version for auto-published schema. 64 - rkey: Optional explicit record key. 56 + #### Parameters {.doc-section .doc-section-parameters} 57 + 58 + | Name | Type | Description | Default | 59 + |---------------------|-----------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|------------| 60 + | dataset | [Dataset](`atdata.dataset.Dataset`)\[[ST](`atdata.atmosphere.records.ST`)\] | The Dataset to publish. | _required_ | 61 + | name | [str](`str`) | Human-readable dataset name. | _required_ | 62 + | schema_uri | [Optional](`typing.Optional`)\[[str](`str`)\] | AT URI of the schema record. If not provided and auto_publish_schema is True, the schema will be published. | `None` | 63 + | description | [Optional](`typing.Optional`)\[[str](`str`)\] | Human-readable description. | `None` | 64 + | tags | [Optional](`typing.Optional`)\[[list](`list`)\[[str](`str`)\]\] | Searchable tags for discovery. 
| `None` | 65 + | license | [Optional](`typing.Optional`)\[[str](`str`)\] | SPDX license identifier (e.g., 'MIT', 'Apache-2.0'). | `None` | 66 + | auto_publish_schema | [bool](`bool`) | If True and schema_uri not provided, automatically publish the schema first. | `True` | 67 + | schema_version | [str](`str`) | Version for auto-published schema. | `'1.0.0'` | 68 + | rkey | [Optional](`typing.Optional`)\[[str](`str`)\] | Optional explicit record key. | `None` | 69 + 70 + #### Returns {.doc-section .doc-section-returns} 71 + 72 + | Name | Type | Description | 73 + |--------|-------------------------------------------|-------------------------------------------| 74 + | | [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the created dataset record. | 65 75 66 - Returns: 67 - The AT URI of the created dataset record. 76 + #### Raises {.doc-section .doc-section-raises} 68 77 69 - Raises: 70 - ValueError: If schema_uri is not provided and auto_publish_schema is False. 78 + | Name | Type | Description | 79 + |--------|----------------------------|-----------------------------------------------------------------| 80 + | | [ValueError](`ValueError`) | If schema_uri is not provided and auto_publish_schema is False. | 71 81 72 82 ### publish_with_blobs { #atdata.atmosphere.DatasetPublisher.publish_with_blobs } 73 83 ··· 92 102 a dataset record referencing them. Suitable for smaller datasets that 93 103 fit within blob size limits (typically 50MB per blob, configurable). 94 104 95 - Args: 96 - blobs: List of binary data (e.g., tar shards) to upload as blobs. 97 - schema_uri: AT URI of the schema record. 98 - name: Human-readable dataset name. 99 - description: Human-readable description. 100 - tags: Searchable tags for discovery. 101 - license: SPDX license identifier. 102 - metadata: Arbitrary metadata dictionary. 103 - mime_type: MIME type for the blobs (default: application/x-tar). 104 - rkey: Optional explicit record key. 
105 + #### Parameters {.doc-section .doc-section-parameters} 106 + 107 + | Name | Type | Description | Default | 108 + |-------------|-----------------------------------------------------------------|------------------------------------------------------------|-----------------------| 109 + | blobs | [list](`list`)\[[bytes](`bytes`)\] | List of binary data (e.g., tar shards) to upload as blobs. | _required_ | 110 + | schema_uri | [str](`str`) | AT URI of the schema record. | _required_ | 111 + | name | [str](`str`) | Human-readable dataset name. | _required_ | 112 + | description | [Optional](`typing.Optional`)\[[str](`str`)\] | Human-readable description. | `None` | 113 + | tags | [Optional](`typing.Optional`)\[[list](`list`)\[[str](`str`)\]\] | Searchable tags for discovery. | `None` | 114 + | license | [Optional](`typing.Optional`)\[[str](`str`)\] | SPDX license identifier. | `None` | 115 + | metadata | [Optional](`typing.Optional`)\[[dict](`dict`)\] | Arbitrary metadata dictionary. | `None` | 116 + | mime_type | [str](`str`) | MIME type for the blobs (default: application/x-tar). | `'application/x-tar'` | 117 + | rkey | [Optional](`typing.Optional`)\[[str](`str`)\] | Optional explicit record key. | `None` | 105 118 106 - Returns: 107 - The AT URI of the created dataset record. 119 + #### Returns {.doc-section .doc-section-returns} 120 + 121 + | Name | Type | Description | 122 + |--------|-------------------------------------------|-------------------------------------------| 123 + | | [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the created dataset record. | 124 + 125 + #### Note {.doc-section .doc-section-note} 108 126 109 - Note: 110 - Blobs are only retained by the PDS when referenced in a committed 111 - record. This method handles that automatically. 127 + Blobs are only retained by the PDS when referenced in a committed 128 + record. This method handles that automatically. 
112 129 113 130 ### publish_with_urls { #atdata.atmosphere.DatasetPublisher.publish_with_urls } 114 131 ··· 131 148 This method allows publishing a dataset record without having a 132 149 Dataset object, useful for registering existing WebDataset files. 133 150 134 - Args: 135 - urls: List of WebDataset URLs with brace notation. 136 - schema_uri: AT URI of the schema record. 137 - name: Human-readable dataset name. 138 - description: Human-readable description. 139 - tags: Searchable tags for discovery. 140 - license: SPDX license identifier. 141 - metadata: Arbitrary metadata dictionary. 142 - rkey: Optional explicit record key. 151 + #### Parameters {.doc-section .doc-section-parameters} 152 + 153 + | Name | Type | Description | Default | 154 + |-------------|-----------------------------------------------------------------|----------------------------------------------|------------| 155 + | urls | [list](`list`)\[[str](`str`)\] | List of WebDataset URLs with brace notation. | _required_ | 156 + | schema_uri | [str](`str`) | AT URI of the schema record. | _required_ | 157 + | name | [str](`str`) | Human-readable dataset name. | _required_ | 158 + | description | [Optional](`typing.Optional`)\[[str](`str`)\] | Human-readable description. | `None` | 159 + | tags | [Optional](`typing.Optional`)\[[list](`list`)\[[str](`str`)\]\] | Searchable tags for discovery. | `None` | 160 + | license | [Optional](`typing.Optional`)\[[str](`str`)\] | SPDX license identifier. | `None` | 161 + | metadata | [Optional](`typing.Optional`)\[[dict](`dict`)\] | Arbitrary metadata dictionary. | `None` | 162 + | rkey | [Optional](`typing.Optional`)\[[str](`str`)\] | Optional explicit record key. | `None` | 163 + 164 + #### Returns {.doc-section .doc-section-returns} 143 165 144 - Returns: 145 - The AT URI of the created dataset record. 
166 + | Name | Type | Description | 167 + |--------|-------------------------------------------|-------------------------------------------| 168 + | | [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the created dataset record. |
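The `publish` parameters above describe a small decision: use `schema_uri` if given, otherwise publish the schema when `auto_publish_schema` is true, otherwise raise `ValueError`. A rough sketch of that rule in isolation follows. The function `resolve_schema_uri`, the `publish_schema` callback, and the example AT URIs are all assumptions made for illustration; the actual `DatasetPublisher` internals are not shown in this diff.

```python
from typing import Callable, Optional


def resolve_schema_uri(
    schema_uri: Optional[str],
    auto_publish_schema: bool,
    publish_schema: Callable[[], str],
) -> str:
    """Pick the schema URI for a dataset record (hypothetical helper)."""
    if schema_uri is not None:
        # An explicit URI always wins; nothing is published.
        return schema_uri
    if not auto_publish_schema:
        raise ValueError(
            "schema_uri is required when auto_publish_schema is False"
        )
    # No URI given and auto-publish enabled: publish first, then use the result.
    return publish_schema()


# Made-up URIs, used only to exercise the three branches.
explicit = resolve_schema_uri("at://example/schema/1", True, lambda: "unused")
published = resolve_schema_uri(None, True, lambda: "at://example/schema/2")
```

Passing the publishing step in as a callback keeps the sketch free of any network or client dependency.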

+39 -17

docs_src/api/DictSample.qmd

··· 20 20 ``@packable``-decorated class. Every ``@packable`` class automatically 21 21 registers a lens from ``DictSample``, making this conversion seamless. 22 22 23 - Example: 23 + ## Example {.doc-section .doc-section-example} 24 + 25 + :: 26 + 24 27 >>> ds = load_dataset("path/to/data.tar") # Returns Dataset[DictSample] 25 28 >>> for sample in ds.ordered(): 26 29 ... print(sample.some_field) # Attribute access ··· 30 33 >>> # Convert to typed schema 31 34 >>> typed_ds = ds.as_type(MyTypedSample) 32 35 33 - Note: 34 - NDArray fields are stored as raw bytes in DictSample. They are only 35 - converted to numpy arrays when accessed through a typed sample class. 36 + ## Note {.doc-section .doc-section-note} 37 + 38 + NDArray fields are stored as raw bytes in DictSample. They are only 39 + converted to numpy arrays when accessed through a typed sample class. 36 40 37 41 ## Attributes 38 42 ··· 61 65 62 66 Create a DictSample from raw msgpack bytes. 63 67 64 - Args: 65 - bs: Raw bytes from a msgpack-serialized sample. 68 + #### Parameters {.doc-section .doc-section-parameters} 66 69 67 - Returns: 68 - New DictSample instance with the unpacked data. 70 + | Name | Type | Description | Default | 71 + |--------|------------------|---------------------------------------------|------------| 72 + | bs | [bytes](`bytes`) | Raw bytes from a msgpack-serialized sample. | _required_ | 73 + 74 + #### Returns {.doc-section .doc-section-returns} 75 + 76 + | Name | Type | Description | 77 + |--------|-------------------------------------------|-------------------------------------------------| 78 + | | [DictSample](`atdata.dataset.DictSample`) | New DictSample instance with the unpacked data. | 69 79 70 80 ### from_data { #atdata.DictSample.from_data } 71 81 ··· 75 85 76 86 Create a DictSample from unpacked msgpack data. 77 87 78 - Args: 79 - data: Dictionary with field names as keys. 
88 + #### Parameters {.doc-section .doc-section-parameters} 80 89 81 - Returns: 82 - New DictSample instance wrapping the data. 90 + | Name | Type | Description | Default | 91 + |--------|-----------------------------------------------------|--------------------------------------|------------| 92 + | data | [dict](`dict`)\[[str](`str`), [Any](`typing.Any`)\] | Dictionary with field names as keys. | _required_ | 93 + 94 + #### Returns {.doc-section .doc-section-returns} 95 + 96 + | Name | Type | Description | 97 + |--------|-------------------------------------------|--------------------------------------------| 98 + | | [DictSample](`atdata.dataset.DictSample`) | New DictSample instance wrapping the data. | 83 99 84 100 ### get { #atdata.DictSample.get } 85 101 ··· 89 105 90 106 Get a field value with optional default. 91 107 92 - Args: 93 - key: Field name to access. 94 - default: Value to return if field doesn't exist. 108 + #### Parameters {.doc-section .doc-section-parameters} 95 109 96 - Returns: 97 - The field value or default. 110 + | Name | Type | Description | Default | 111 + |---------|---------------------|-----------------------------------------|------------| 112 + | key | [str](`str`) | Field name to access. | _required_ | 113 + | default | [Any](`typing.Any`) | Value to return if field doesn't exist. | `None` | 114 + 115 + #### Returns {.doc-section .doc-section-returns} 116 + 117 + | Name | Type | Description | 118 + |--------|---------------------|-----------------------------| 119 + | | [Any](`typing.Any`) | The field value or default. | 98 120 99 121 ### items { #atdata.DictSample.items } 100 122
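The `DictSample` docs above describe attribute access over unpacked msgpack data plus a `get` method with an optional default. A minimal stand-in that mimics that behaviour is sketched below. `DictSampleSketch` is a hypothetical class written for illustration, not the real `atdata.DictSample`, and it skips msgpack decoding entirely.

```python
from typing import Any, Dict


class DictSampleSketch:
    """Hypothetical stand-in for a dict-backed sample; not the atdata class."""

    def __init__(self, data: Dict[str, Any]) -> None:
        self._data = dict(data)

    @classmethod
    def from_data(cls, data: Dict[str, Any]) -> "DictSampleSketch":
        """Wrap already-unpacked field data."""
        return cls(data)

    def __getattr__(self, name: str) -> Any:
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name) from None

    def get(self, key: str, default: Any = None) -> Any:
        """Return the field value, or ``default`` if the field is absent."""
        return self._data.get(key, default)


sample = DictSampleSketch.from_data({"label": 3, "image": b"\x00\x01"})
```

In this sketch `sample.label` reads a field by attribute, `sample.get("missing", 0)` falls back to the default, and `sample.image` stays as raw bytes, in line with the note that array fields are only converted when accessed through a typed sample class.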

+6 -5

docs_src/api/IndexEntry.qmd

··· 9 9 Both LocalDatasetEntry and atmosphere DatasetRecord-based entries 10 10 should satisfy this protocol, enabling code that works with either. 11 11 12 - Properties: 13 - name: Human-readable dataset name 14 - schema_ref: Reference to schema (local:// path or AT URI) 15 - data_urls: WebDataset URLs for the data 16 - metadata: Arbitrary metadata dict, or None 12 + ## Properties {.doc-section .doc-section-properties} 13 + 14 + name: Human-readable dataset name 15 + schema_ref: Reference to schema (local:// path or AT URI) 16 + data_urls: WebDataset URLs for the data 17 + metadata: Arbitrary metadata dict, or None 17 18 18 19 ## Attributes 19 20
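The four properties listed for `IndexEntry` can be read as a structural interface: any object exposing them should work with code written against the protocol. A sketch of that idea using `typing.Protocol` is shown below. `IndexEntrySketch`, `InMemoryEntry`, `describe`, and the example `local://` path are assumptions for illustration; the real protocol in `_protocols.py` may declare more members or different types.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol, runtime_checkable


@runtime_checkable
class IndexEntrySketch(Protocol):
    """Hypothetical protocol mirroring the four documented properties."""

    @property
    def name(self) -> str: ...

    @property
    def schema_ref(self) -> str: ...

    @property
    def data_urls(self) -> List[str]: ...

    @property
    def metadata(self) -> Optional[dict]: ...


@dataclass
class InMemoryEntry:
    """Plain dataclass that satisfies the protocol without inheriting from it."""

    name: str
    schema_ref: str
    data_urls: List[str]
    metadata: Optional[dict] = None


def describe(entry: IndexEntrySketch) -> str:
    """Works with any entry type that exposes the protocol's properties."""
    return f"{entry.name}: {len(entry.data_urls)} shard URL(s), schema {entry.schema_ref}"


entry = InMemoryEntry(
    name="demo",
    schema_ref="local://schemas/Demo",  # made-up reference for illustration
    data_urls=["data-{000000..000001}.tar"],
)
summary = describe(entry)
```

Because the check is structural, a local entry class and an atmosphere-backed one could both be passed to `describe` without sharing a base class.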

+99 -47

docs_src/api/Lens.qmd

··· 18 18 Lenses support the functional programming concept of composable, well-behaved 19 19 transformations that satisfy lens laws (GetPut and PutGet). 20 20 21 - Example: 21 + ## Example {.doc-section .doc-section-example} 22 + 23 + :: 24 + 22 25 >>> @packable 23 26 ... class FullData: 24 27 ... name: str ··· 61 64 and an optional putter that transforms ``(V, S) -> S``, enabling updates to 62 65 the view to be reflected back in the source. 63 66 64 - Type Parameters: 65 - S: The source type, must derive from ``PackableSample``. 66 - V: The view type, must derive from ``PackableSample``. 67 + #### Parameters {.doc-section .doc-section-parameters} 67 68 68 - Example: 69 + | Name | Type | Description | Default | 70 + |--------|--------|-------------------------------------------------------|------------| 71 + | S | | The source type, must derive from ``PackableSample``. | _required_ | 72 + | V | | The view type, must derive from ``PackableSample``. | _required_ | 73 + 74 + #### Example {.doc-section .doc-section-example} 75 + 76 + :: 77 + 69 78 >>> @lens 70 79 ... def name_lens(full: FullData) -> NameOnly: 71 80 ... return NameOnly(name=full.name) ··· 90 99 91 100 Transform the source into the view type. 92 101 93 - Args: 94 - s: The source sample of type ``S``. 102 + ###### Parameters {.doc-section .doc-section-parameters} 95 103 96 - Returns: 97 - A view of the source as type ``V``. 104 + | Name | Type | Description | Default | 105 + |--------|----------------------|----------------------------------|------------| 106 + | s | [S](`atdata.lens.S`) | The source sample of type ``S``. | _required_ | 107 + 108 + ###### Returns {.doc-section .doc-section-returns} 109 + 110 + | Name | Type | Description | 111 + |--------|----------------------|-------------------------------------| 112 + | | [V](`atdata.lens.V`) | A view of the source as type ``V``. 
| 98 113 99 114 ##### put { #atdata.lens.Lens.put } 100 115 ··· 104 119 105 120 Update the source based on a modified view. 106 121 107 - Args: 108 - v: The modified view of type ``V``. 109 - s: The original source of type ``S``. 122 + ###### Parameters {.doc-section .doc-section-parameters} 123 + 124 + | Name | Type | Description | Default | 125 + |--------|----------------------|------------------------------------|------------| 126 + | v | [V](`atdata.lens.V`) | The modified view of type ``V``. | _required_ | 127 + | s | [S](`atdata.lens.S`) | The original source of type ``S``. | _required_ | 110 128 111 - Returns: 112 - An updated source of type ``S`` that reflects changes from the view. 129 + ###### Returns {.doc-section .doc-section-returns} 130 + 131 + | Name | Type | Description | 132 + |--------|----------------------|----------------------------------------------------------------------| 133 + | | [S](`atdata.lens.S`) | An updated source of type ``S`` that reflects changes from the view. | 113 134 114 135 ##### putter { #atdata.lens.Lens.putter } 115 136 ··· 119 140 120 141 Decorator to register a putter function for this lens. 121 142 122 - Args: 123 - put: A function that takes a view of type ``V`` and source of type 124 - ``S``, and returns an updated source of type ``S``. 143 + ###### Parameters {.doc-section .doc-section-parameters} 125 144 126 - Returns: 127 - The putter function, allowing this to be used as a decorator. 145 + | Name | Type | Description | Default | 146 + |--------|--------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------|------------| 147 + | put | [LensPutter](`atdata.lens.LensPutter`)\[[S](`atdata.lens.S`), [V](`atdata.lens.V`)\] | A function that takes a view of type ``V`` and source of type ``S``, and returns an updated source of type ``S``. 
| _required_ | 128 148 129 - Example: 149 + ###### Returns {.doc-section .doc-section-returns} 150 + 151 + | Name | Type | Description | 152 + |--------|--------------------------------------------------------------------------------------|---------------------------------------------------------------| 153 + | | [LensPutter](`atdata.lens.LensPutter`)\[[S](`atdata.lens.S`), [V](`atdata.lens.V`)\] | The putter function, allowing this to be used as a decorator. | 154 + 155 + ###### Example {.doc-section .doc-section-example} 156 + 157 + :: 158 + 130 159 >>> @my_lens.putter 131 160 ... def my_lens_put(view: ViewType, source: SourceType) -> SourceType: 132 161 ... return SourceType(...) ··· 143 172 all lenses decorated with ``@lens``. It enables looking up transformations 144 173 between different ``PackableSample`` types. 145 174 146 - Attributes: 147 - _instance: The singleton instance of this class. 148 - _registry: Dictionary mapping ``(source_type, view_type)`` tuples to 149 - their corresponding ``Lens`` objects. 175 + #### Attributes {.doc-section .doc-section-attributes} 176 + 177 + | Name | Type | Description | 178 + |-----------|---------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------| 179 + | _instance | | The singleton instance of this class. | 180 + | _registry | [Dict](`typing.Dict`)\[[LensSignature](`atdata.lens.LensSignature`), [Lens](`atdata.lens.Lens`)\] | Dictionary mapping ``(source_type, view_type)`` tuples to their corresponding ``Lens`` objects. | 150 181 151 182 #### Methods 152 183 ··· 163 194 164 195 Register a lens as the canonical transformation between two types. 165 196 166 - Args: 167 - _lens: The lens to register. Will be stored in the registry under 168 - the key ``(_lens.source_type, _lens.view_type)``. 
197 + ###### Parameters {.doc-section .doc-section-parameters} 198 + 199 + | Name | Type | Description | Default | 200 + |--------|----------------------------|--------------------------------------------------------------------------------------------------------------|------------| 201 + | _lens | [Lens](`atdata.lens.Lens`) | The lens to register. Will be stored in the registry under the key ``(_lens.source_type, _lens.view_type)``. | _required_ | 202 + 203 + ###### Note {.doc-section .doc-section-note} 169 204 170 - Note: 171 - If a lens already exists for the same type pair, it will be 172 - overwritten. 205 + If a lens already exists for the same type pair, it will be 206 + overwritten. 173 207 174 208 ##### transform { #atdata.lens.LensNetwork.transform } 175 209 ··· 179 213 180 214 Look up the lens transformation between two sample types. 181 215 182 - Args: 183 - source: The source sample type (must derive from ``PackableSample``). 184 - view: The target view type (must derive from ``PackableSample``). 216 + ###### Parameters {.doc-section .doc-section-parameters} 217 + 218 + | Name | Type | Description | Default | 219 + |--------|------------------------------------------|---------------------------------------------------------------|------------| 220 + | source | [DatasetType](`atdata.lens.DatasetType`) | The source sample type (must derive from ``PackableSample``). | _required_ | 221 + | view | [DatasetType](`atdata.lens.DatasetType`) | The target view type (must derive from ``PackableSample``). | _required_ | 222 + 223 + ###### Returns {.doc-section .doc-section-returns} 224 + 225 + | Name | Type | Description | 226 + |--------|----------------------------|----------------------------------------------------------------------| 227 + | | [Lens](`atdata.lens.Lens`) | The registered ``Lens`` that transforms from ``source`` to ``view``. 
| 228 + 229 + ###### Raises {.doc-section .doc-section-raises} 185 230 186 - Returns: 187 - The registered ``Lens`` that transforms from ``source`` to ``view``. 231 + | Name | Type | Description | 232 + |--------|----------------------------|---------------------------------------------------------| 233 + | | [ValueError](`ValueError`) | If no lens has been registered for the given type pair. | 188 234 189 - Raises: 190 - ValueError: If no lens has been registered for the given type pair. 235 + ###### Note {.doc-section .doc-section-note} 191 236 192 - Note: 193 - Currently only supports direct transformations. Compositional 194 - transformations (chaining multiple lenses) are not yet implemented. 237 + Currently only supports direct transformations. Compositional 238 + transformations (chaining multiple lenses) are not yet implemented. 195 239 196 240 ## Functions 197 241 ··· 210 254 This decorator converts a getter function into a ``Lens`` object and 211 255 automatically registers it in the global ``LensNetwork`` registry. 212 256 213 - Args: 214 - f: A getter function that transforms from source type ``S`` to view 215 - type ``V``. Must have exactly one parameter with a type annotation. 257 + #### Parameters {.doc-section .doc-section-parameters} 216 258 217 - Returns: 218 - A ``Lens[S, V]`` object that can be called to apply the transformation 219 - or decorated with ``@lens_name.putter`` to add a putter function. 259 + | Name | Type | Description | Default | 260 + |--------|--------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|------------| 261 + | f | [LensGetter](`atdata.lens.LensGetter`)\[[S](`atdata.lens.S`), [V](`atdata.lens.V`)\] | A getter function that transforms from source type ``S`` to view type ``V``. Must have exactly one parameter with a type annotation. 
| _required_ | 262 + 263 + #### Returns {.doc-section .doc-section-returns} 220 264 221 - Example: 265 + | Name | Type | Description | 266 + |--------|--------------------------------------------------------------------------|------------------------------------------------------------------------| 267 + | | [Lens](`atdata.lens.Lens`)\[[S](`atdata.lens.S`), [V](`atdata.lens.V`)\] | A ``Lens[S, V]`` object that can be called to apply the transformation | 268 + | | [Lens](`atdata.lens.Lens`)\[[S](`atdata.lens.S`), [V](`atdata.lens.V`)\] | or decorated with ``@lens_name.putter`` to add a putter function. | 269 + 270 + #### Example {.doc-section .doc-section-example} 271 + 272 + :: 273 + 222 274 >>> @lens 223 275 ... def extract_name(full: FullData) -> NameOnly: 224 276 ... return NameOnly(name=full.name)
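
The getter/putter/registry relationship documented in this file can be sketched in a few lines. This is a minimal illustration under stated assumptions, not atdata's implementation: `SketchLens`, `SketchRegistry`, `FullData`, and `NameOnly` are hypothetical names, and only the behaviour described in the docstrings above (`put(v, s)`, the `putter` decorator returning the function, the `(source_type, view_type)` registry key, `ValueError` on a missing pair) is reproduced.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

S = TypeVar("S")
V = TypeVar("V")


class SketchLens(Generic[S, V]):
    """Hypothetical stand-in for a lens: a getter plus an optional putter."""

    def __init__(self, source_type: type, view_type: type, getter: Callable[[S], V]) -> None:
        self.source_type = source_type
        self.view_type = view_type
        self._getter = getter
        self._putter: Optional[Callable[[V, S], S]] = None

    def get(self, s: S) -> V:
        # Project the source down to the view.
        return self._getter(s)

    def putter(self, put: Callable[[V, S], S]) -> Callable[[V, S], S]:
        # Store the putter and return it unchanged, so this works as a decorator.
        self._putter = put
        return put

    def put(self, v: V, s: S) -> S:
        # Push a modified view back into the original source.
        if self._putter is None:
            raise NotImplementedError("no putter registered for this lens")
        return self._putter(v, s)


class SketchRegistry:
    """Hypothetical registry keyed by (source_type, view_type)."""

    def __init__(self) -> None:
        self._registry: Dict[Tuple[type, type], SketchLens] = {}

    def register(self, lens: SketchLens) -> None:
        # A later registration for the same pair overwrites the earlier one.
        self._registry[(lens.source_type, lens.view_type)] = lens

    def transform(self, source: type, view: type) -> SketchLens:
        # Direct lookups only; no chaining of lenses.
        try:
            return self._registry[(source, view)]
        except KeyError:
            raise ValueError(f"no lens registered for {source} -> {view}") from None


@dataclass(frozen=True)
class FullData:
    name: str
    age: int


@dataclass(frozen=True)
class NameOnly:
    name: str


name_lens = SketchLens(FullData, NameOnly, lambda full: NameOnly(name=full.name))


@name_lens.putter
def put_name(view: NameOnly, source: FullData) -> FullData:
    # Only the field exposed by the view is updated; the rest is preserved.
    return replace(source, name=view.name)


registry = SketchRegistry()
registry.register(name_lens)
```

With these definitions, `registry.transform(FullData, NameOnly)` returns `name_lens`, while the reverse pair raises `ValueError` because no lens was registered in that direction.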

+42 -19

docs_src/api/LensLoader.qmd

··· 10 10 using a lens requires installing the referenced code and importing 11 11 it manually. 12 12 13 - Example: 13 + ## Example {.doc-section .doc-section-example} 14 + 15 + :: 16 + 14 17 >>> client = AtmosphereClient() 15 18 >>> loader = LensLoader(client) 16 19 >>> ··· 39 42 40 43 Find lenses that transform between specific schemas. 41 44 42 - Args: 43 - source_schema_uri: AT URI of the source schema. 44 - target_schema_uri: Optional AT URI of the target schema. 45 - If not provided, returns all lenses from the source. 46 - repo: The DID of the repository to search. 45 + #### Parameters {.doc-section .doc-section-parameters} 46 + 47 + | Name | Type | Description | Default | 48 + |-------------------|-----------------------------------------------|--------------------------------------------------------------------------------------------|------------| 49 + | source_schema_uri | [str](`str`) | AT URI of the source schema. | _required_ | 50 + | target_schema_uri | [Optional](`typing.Optional`)\[[str](`str`)\] | Optional AT URI of the target schema. If not provided, returns all lenses from the source. | `None` | 51 + | repo | [Optional](`typing.Optional`)\[[str](`str`)\] | The DID of the repository to search. | `None` | 52 + 53 + #### Returns {.doc-section .doc-section-returns} 47 54 48 - Returns: 49 - List of matching lens records. 55 + | Name | Type | Description | 56 + |--------|----------------------------------|--------------------------------| 57 + | | [list](`list`)\[[dict](`dict`)\] | List of matching lens records. | 50 58 51 59 ### get { #atdata.atmosphere.LensLoader.get } 52 60 ··· 56 64 57 65 Fetch a lens record by AT URI. 58 66 59 - Args: 60 - uri: The AT URI of the lens record. 
67 + #### Parameters {.doc-section .doc-section-parameters} 68 + 69 + | Name | Type | Description | Default | 70 + |--------|-----------------------------------------------------------|--------------------------------|------------| 71 + | uri | [str](`str`) \| [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the lens record. | _required_ | 72 + 73 + #### Returns {.doc-section .doc-section-returns} 61 74 62 - Returns: 63 - The lens record as a dictionary. 75 + | Name | Type | Description | 76 + |--------|----------------|----------------------------------| 77 + | | [dict](`dict`) | The lens record as a dictionary. | 78 + 79 + #### Raises {.doc-section .doc-section-raises} 64 80 65 - Raises: 66 - ValueError: If the record is not a lens record. 81 + | Name | Type | Description | 82 + |--------|----------------------------|-------------------------------------| 83 + | | [ValueError](`ValueError`) | If the record is not a lens record. | 67 84 68 85 ### list_all { #atdata.atmosphere.LensLoader.list_all } 69 86 ··· 73 90 74 91 List lens records from a repository. 75 92 76 - Args: 77 - repo: The DID of the repository. Defaults to authenticated user. 78 - limit: Maximum number of records to return. 93 + #### Parameters {.doc-section .doc-section-parameters} 94 + 95 + | Name | Type | Description | Default | 96 + |--------|-----------------------------------------------|------------------------------------------------------------|-----------| 97 + | repo | [Optional](`typing.Optional`)\[[str](`str`)\] | The DID of the repository. Defaults to authenticated user. | `None` | 98 + | limit | [int](`int`) | Maximum number of records to return. | `100` | 99 + 100 + #### Returns {.doc-section .doc-section-returns} 79 101 80 - Returns: 81 - List of lens records. 102 + | Name | Type | Description | 103 + |--------|----------------------------------|-----------------------| 104 + | | [list](`list`)\[[dict](`dict`)\] | List of lens records. |

+49 -32

docs_src/api/LensPublisher.qmd

··· 9 9 This class creates lens records that reference source and target schemas 10 10 and point to the transformation code in a git repository. 11 11 12 - Example: 12 + ## Example {.doc-section .doc-section-example} 13 + 14 + :: 15 + 13 16 >>> @atdata.lens 14 17 ... def my_lens(source: SourceType) -> TargetType: 15 18 ... return TargetType(field=source.other_field) ··· 28 31 ... putter_path="mymodule.lenses:my_lens_putter", 29 32 ... ) 30 33 31 - Security Note: 32 - Lens code is stored as references to git repositories rather than 33 - inline code. This prevents arbitrary code execution from ATProto 34 - records. Users must manually install and trust lens implementations. 34 + ## Security Note {.doc-section .doc-section-security-note} 35 + 36 + Lens code is stored as references to git repositories rather than 37 + inline code. This prevents arbitrary code execution from ATProto 38 + records. Users must manually install and trust lens implementations. 35 39 36 40 ## Methods 37 41 ··· 58 62 59 63 Publish a lens transformation record to ATProto. 60 64 61 - Args: 62 - name: Human-readable lens name. 63 - source_schema_uri: AT URI of the source schema. 64 - target_schema_uri: AT URI of the target schema. 65 - description: What this transformation does. 66 - code_repository: Git repository URL containing the lens code. 67 - code_commit: Git commit hash for reproducibility. 68 - getter_path: Module path to the getter function 69 - (e.g., 'mymodule.lenses:my_getter'). 70 - putter_path: Module path to the putter function 71 - (e.g., 'mymodule.lenses:my_putter'). 72 - rkey: Optional explicit record key. 65 + #### Parameters {.doc-section .doc-section-parameters} 66 + 67 + | Name | Type | Description | Default | 68 + |-------------------|-----------------------------------------------|-------------------------------------------------------------------------|------------| 69 + | name | [str](`str`) | Human-readable lens name. 
| _required_ | 70 + | source_schema_uri | [str](`str`) | AT URI of the source schema. | _required_ | 71 + | target_schema_uri | [str](`str`) | AT URI of the target schema. | _required_ | 72 + | description | [Optional](`typing.Optional`)\[[str](`str`)\] | What this transformation does. | `None` | 73 + | code_repository | [Optional](`typing.Optional`)\[[str](`str`)\] | Git repository URL containing the lens code. | `None` | 74 + | code_commit | [Optional](`typing.Optional`)\[[str](`str`)\] | Git commit hash for reproducibility. | `None` | 75 + | getter_path | [Optional](`typing.Optional`)\[[str](`str`)\] | Module path to the getter function (e.g., 'mymodule.lenses:my_getter'). | `None` | 76 + | putter_path | [Optional](`typing.Optional`)\[[str](`str`)\] | Module path to the putter function (e.g., 'mymodule.lenses:my_putter'). | `None` | 77 + | rkey | [Optional](`typing.Optional`)\[[str](`str`)\] | Optional explicit record key. | `None` | 78 + 79 + #### Returns {.doc-section .doc-section-returns} 80 + 81 + | Name | Type | Description | 82 + |--------|-------------------------------------------|----------------------------------------| 83 + | | [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the created lens record. | 73 84 74 - Returns: 75 - The AT URI of the created lens record. 85 + #### Raises {.doc-section .doc-section-raises} 76 86 77 - Raises: 78 - ValueError: If code references are incomplete. 87 + | Name | Type | Description | 88 + |--------|----------------------------|------------------------------------| 89 + | | [ValueError](`ValueError`) | If code references are incomplete. | 79 90 80 91 ### publish_from_lens { #atdata.atmosphere.LensPublisher.publish_from_lens } 81 92 ··· 98 109 This method extracts the getter and putter function names from 99 110 the Lens object and publishes a record referencing them. 100 111 101 - Args: 102 - lens_obj: The Lens object to publish. 103 - name: Human-readable lens name. 
104 - source_schema_uri: AT URI of the source schema. 105 - target_schema_uri: AT URI of the target schema. 106 - code_repository: Git repository URL. 107 - code_commit: Git commit hash. 108 - description: What this transformation does. 109 - rkey: Optional explicit record key. 112 + #### Parameters {.doc-section .doc-section-parameters} 110 113 111 - Returns: 112 - The AT URI of the created lens record. 114 + | Name | Type | Description | Default | 115 + |-------------------|-----------------------------------------------|--------------------------------|------------| 116 + | lens_obj | [Lens](`atdata.lens.Lens`) | The Lens object to publish. | _required_ | 117 + | name | [str](`str`) | Human-readable lens name. | _required_ | 118 + | source_schema_uri | [str](`str`) | AT URI of the source schema. | _required_ | 119 + | target_schema_uri | [str](`str`) | AT URI of the target schema. | _required_ | 120 + | code_repository | [str](`str`) | Git repository URL. | _required_ | 121 + | code_commit | [str](`str`) | Git commit hash. | _required_ | 122 + | description | [Optional](`typing.Optional`)\[[str](`str`)\] | What this transformation does. | `None` | 123 + | rkey | [Optional](`typing.Optional`)\[[str](`str`)\] | Optional explicit record key. | `None` | 124 + 125 + #### Returns {.doc-section .doc-section-returns} 126 + 127 + | Name | Type | Description | 128 + |--------|-------------------------------------------|----------------------------------------| 129 + | | [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the created lens record. |
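
The `getter_path` and `putter_path` values use a `module:attribute` form (e.g., 'mymodule.lenses:my_getter'). The diff does not show how, or whether, atdata resolves these strings, so the following is only a sketch of how a reader could turn such a path into a callable after manually installing and reviewing the referenced code. `resolve_code_path` is a hypothetical helper, not part of the library.

```python
import importlib
from typing import Any, Callable


def resolve_code_path(path: str) -> Callable[..., Any]:
    """Resolve a 'package.module:attribute' string to a callable (hypothetical helper)."""
    module_name, sep, attr = path.partition(":")
    if not sep or not module_name or not attr:
        raise ValueError(f"expected 'module:attribute', got {path!r}")

    # Importing executes the module, so only do this for code you trust.
    obj: Any = importlib.import_module(module_name)

    # Allow dotted attributes such as 'module:Class.method'.
    for part in attr.split("."):
        obj = getattr(obj, part)

    if not callable(obj):
        raise TypeError(f"{path!r} does not refer to a callable")
    return obj
```

For example, `resolve_code_path("json:dumps")` returns the standard library's `json.dumps`. Because the import runs arbitrary module code, this step belongs after the manual trust decision described in the Security Note, not before it.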

+4 -1

docs_src/api/Packable-protocol.qmd

··· 18 18 - Schema publishing (class introspection via dataclass fields) 19 19 - Serialization/deserialization (packed, from_bytes) 20 20 21 - Example: 21 + ## Example {.doc-section .doc-section-example} 22 + 23 + :: 24 + 22 25 >>> @packable 23 26 ... class MySample: 24 27 ... name: str

+24 -9

docs_src/api/PackableSample.qmd

··· 15 15 1. Direct inheritance with the ``@dataclass`` decorator 16 16 2. Using the ``@packable`` decorator (recommended) 17 17 18 - Example: 18 + ## Example {.doc-section .doc-section-example} 19 + 20 + :: 21 + 19 22 >>> @packable 20 23 ... class MyData: 21 24 ... name: str ··· 47 50 48 51 Create a sample instance from raw msgpack bytes. 49 52 50 - Args: 51 - bs: Raw bytes from a msgpack-serialized sample. 53 + #### Parameters {.doc-section .doc-section-parameters} 52 54 53 - Returns: 54 - A new instance of this sample class deserialized from the bytes. 55 + | Name | Type | Description | Default | 56 + |--------|------------------|---------------------------------------------|------------| 57 + | bs | [bytes](`bytes`) | Raw bytes from a msgpack-serialized sample. | _required_ | 58 + 59 + #### Returns {.doc-section .doc-section-returns} 60 + 61 + | Name | Type | Description | 62 + |--------|-----------------------|------------------------------------------------------------------| 63 + | | [Self](`typing.Self`) | A new instance of this sample class deserialized from the bytes. | 55 64 56 65 ### from_data { #atdata.PackableSample.from_data } 57 66 ··· 61 70 62 71 Create a sample instance from unpacked msgpack data. 63 72 64 - Args: 65 - data: Dictionary with keys matching the sample's field names. 73 + #### Parameters {.doc-section .doc-section-parameters} 66 74 67 - Returns: 68 - New instance with NDArray fields auto-converted from bytes. 75 + | Name | Type | Description | Default | 76 + |--------|-----------------------------------------------|---------------------------------------------------------|------------| 77 + | data | [WDSRawSample](`atdata.dataset.WDSRawSample`) | Dictionary with keys matching the sample's field names. 
| _required_ | 78 + 79 + #### Returns {.doc-section .doc-section-returns} 80 + 81 + | Name | Type | Description | 82 + |--------|-----------------------|-------------------------------------------------------------| 83 + | | [Self](`typing.Self`) | New instance with NDArray fields auto-converted from bytes. |

+68 -40

docs_src/api/S3Source.qmd

··· 24 24 or global gopen_schemes registration. Credentials are scoped to the 25 25 source instance. 26 26 27 - Attributes: 28 - bucket: S3 bucket name. 29 - keys: List of object keys (paths within bucket). 30 - endpoint: Optional custom endpoint URL for S3-compatible services. 31 - access_key: Optional AWS access key ID. 32 - secret_key: Optional AWS secret access key. 33 - region: Optional AWS region (defaults to us-east-1). 27 + ## Attributes {.doc-section .doc-section-attributes} 28 + 29 + | Name | Type | Description | 30 + |------------|--------------------------------|----------------------------------------------------------| 31 + | bucket | [str](`str`) | S3 bucket name. | 32 + | keys | [list](`list`)\[[str](`str`)\] | List of object keys (paths within bucket). | 33 + | endpoint | [str](`str`) \| None | Optional custom endpoint URL for S3-compatible services. | 34 + | access_key | [str](`str`) \| None | Optional AWS access key ID. | 35 + | secret_key | [str](`str`) \| None | Optional AWS secret access key. | 36 + | region | [str](`str`) \| None | Optional AWS region (defaults to us-east-1). | 37 + 38 + ## Example {.doc-section .doc-section-example} 39 + 40 + :: 34 41 35 - Example: 36 42 >>> source = S3Source( 37 43 ... bucket="my-datasets", 38 44 ... keys=["train/shard-000.tar", "train/shard-001.tar"], ··· 43 49 >>> for shard_id, stream in source.shards: 44 50 ... process(stream) 45 51 46 - ## Attributes 47 - 48 - | Name | Description | 49 - | --- | --- | 50 - | [shard_list](#atdata.S3Source.shard_list) | Return list of S3 URIs for the shards (deprecated, use list_shards()). | 51 - | [shards](#atdata.S3Source.shards) | Lazily yield (s3_uri, stream) pairs for each shard. | 52 - 53 52 ## Methods 54 53 55 54 | Name | Description | ··· 69 68 70 69 Accepts the same credential format used by S3DataStore. 71 70 72 - Args: 73 - credentials: Dict with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, 74 - and optionally AWS_ENDPOINT. 75 - bucket: S3 bucket name. 
76 - keys: List of object keys. 71 + #### Parameters {.doc-section .doc-section-parameters} 72 + 73 + | Name | Type | Description | Default | 74 + |-------------|----------------------------------------------|----------------------------------------------------------------------------------|------------| 75 + | credentials | [dict](`dict`)\[[str](`str`), [str](`str`)\] | Dict with AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and optionally AWS_ENDPOINT. | _required_ | 76 + | bucket | [str](`str`) | S3 bucket name. | _required_ | 77 + | keys | [list](`list`)\[[str](`str`)\] | List of object keys. | _required_ | 78 + 79 + #### Returns {.doc-section .doc-section-returns} 80 + 81 + | Name | Type | Description | 82 + |--------|--------------|----------------------| 83 + | | \'S3Source\' | Configured S3Source. | 77 84 78 - Returns: 79 - Configured S3Source. 85 + #### Example {.doc-section .doc-section-example} 80 86 81 - Example: 87 + :: 88 + 82 89 >>> creds = { 83 90 ... "AWS_ACCESS_KEY_ID": "...", 84 91 ... "AWS_SECRET_ACCESS_KEY": "...", ··· 104 111 Parses s3://bucket/key URLs and extracts bucket and keys. 105 112 All URLs must be in the same bucket. 106 113 107 - Args: 108 - urls: List of s3:// URLs. 109 - endpoint: Optional custom endpoint. 110 - access_key: Optional access key. 111 - secret_key: Optional secret key. 112 - region: Optional region. 114 + #### Parameters {.doc-section .doc-section-parameters} 115 + 116 + | Name | Type | Description | Default | 117 + |------------|--------------------------------|---------------------------|------------| 118 + | urls | [list](`list`)\[[str](`str`)\] | List of s3:// URLs. | _required_ | 119 + | endpoint | [str](`str`) \| None | Optional custom endpoint. | `None` | 120 + | access_key | [str](`str`) \| None | Optional access key. | `None` | 121 + | secret_key | [str](`str`) \| None | Optional secret key. | `None` | 122 + | region | [str](`str`) \| None | Optional region. 
| `None` | 123 + 124 + #### Returns {.doc-section .doc-section-returns} 113 125 114 - Returns: 115 - S3Source configured for the given URLs. 126 + | Name | Type | Description | 127 + |--------|--------------|-----------------------------------------| 128 + | | \'S3Source\' | S3Source configured for the given URLs. | 116 129 117 - Raises: 118 - ValueError: If URLs are not valid s3:// URLs or span multiple buckets. 130 + #### Raises {.doc-section .doc-section-raises} 119 131 120 - Example: 132 + | Name | Type | Description | 133 + |--------|----------------------------|------------------------------------------------------------| 134 + | | [ValueError](`ValueError`) | If URLs are not valid s3:// URLs or span multiple buckets. | 135 + 136 + #### Example {.doc-section .doc-section-example} 137 + 138 + :: 139 + 121 140 >>> source = S3Source.from_urls( 122 141 ... ["s3://my-bucket/train-000.tar", "s3://my-bucket/train-001.tar"], 123 142 ... endpoint="https://r2.example.com", ··· 139 158 140 159 Open a single shard by S3 URI. 141 160 142 - Args: 143 - shard_id: S3 URI of the shard (s3://bucket/key). 161 + #### Parameters {.doc-section .doc-section-parameters} 162 + 163 + | Name | Type | Description | Default | 164 + |----------|--------------|----------------------------------------|------------| 165 + | shard_id | [str](`str`) | S3 URI of the shard (s3://bucket/key). | _required_ | 166 + 167 + #### Returns {.doc-section .doc-section-returns} 168 + 169 + | Name | Type | Description | 170 + |--------|---------------------------------------|---------------------------------------| 171 + | | [IO](`typing.IO`)\[[bytes](`bytes`)\] | StreamingBody for reading the object. | 144 172 145 - Returns: 146 - StreamingBody for reading the object. 173 + #### Raises {.doc-section .doc-section-raises} 147 174 148 - Raises: 149 - KeyError: If shard_id is not in list_shards(). 
175 + | Name | Type | Description | 176 + |--------|------------------------|--------------------------------------| 177 + | | [KeyError](`KeyError`) | If shard_id is not in list_shards(). |
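
The `from_urls` contract above (parse `s3://bucket/key` URLs, require a single bucket, raise `ValueError` otherwise) can be illustrated with the standard library alone. This is a sketch of the documented behaviour, not the library's code; `split_s3_urls` is a hypothetical name, and rejecting an empty list is an assumption the docs do not state.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def split_s3_urls(urls: List[str]) -> Tuple[str, List[str]]:
    """Split s3:// URLs into (bucket, keys), requiring one shared bucket."""
    if not urls:
        # Assumption: an empty list is treated as an error.
        raise ValueError("at least one s3:// URL is required")

    buckets = set()
    keys: List[str] = []
    for url in urls:
        parsed = urlparse(url)
        key = parsed.path.lstrip("/")
        # A valid URL needs the s3 scheme, a bucket, and a non-empty key.
        if parsed.scheme != "s3" or not parsed.netloc or not key:
            raise ValueError(f"not a valid s3:// URL: {url!r}")
        buckets.add(parsed.netloc)
        keys.append(key)

    if len(buckets) != 1:
        raise ValueError(f"URLs span multiple buckets: {sorted(buckets)}")
    return buckets.pop(), keys
```

Calling it with the two URLs from the `from_urls` example would yield the bucket `my-bucket` and the keys `train-000.tar` and `train-001.tar`, which correspond to the `bucket` and `keys` attributes of the documented class.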

+19 -15

docs_src/api/SampleBatch.qmd

··· 14 14 NDArray fields are stacked into a numpy array with a batch dimension. 15 15 Other fields are aggregated into a list. 16 16 17 - Type Parameters: 18 - DT: The sample type, must derive from ``PackableSample``. 17 + ## Parameters {.doc-section .doc-section-parameters} 18 + 19 + | Name | Type | Description | Default | 20 + |--------|--------|-------------------------------------------------------|------------| 21 + | DT | | The sample type, must derive from ``PackableSample``. | _required_ | 22 + 23 + ## Attributes {.doc-section .doc-section-attributes} 24 + 25 + | Name | Type | Description | 26 + |---------|--------|---------------------------------------------| 27 + | samples | | The list of sample instances in this batch. | 19 28 20 - Attributes: 21 - samples: The list of sample instances in this batch. 29 + ## Example {.doc-section .doc-section-example} 22 30 23 - Example: 31 + :: 32 + 24 33 >>> batch = SampleBatch[MyData]([sample1, sample2, sample3]) 25 34 >>> batch.embeddings # Returns stacked numpy array of shape (3, ...) 26 35 >>> batch.names # Returns list of names 27 36 28 - Note: 29 - This class uses Python's ``__orig_class__`` mechanism to extract the 30 - type parameter at runtime. Instances must be created using the 31 - subscripted syntax ``SampleBatch[MyType](samples)`` rather than 32 - calling the constructor directly with an unsubscripted class. 37 + ## Note {.doc-section .doc-section-note} 33 38 34 - ## Attributes 35 - 36 - | Name | Description | 37 - | --- | --- | 38 - | [sample_type](#atdata.SampleBatch.sample_type) | The type of each sample in this batch. | 39 + This class uses Python's ``__orig_class__`` mechanism to extract the 40 + type parameter at runtime. Instances must be created using the 41 + subscripted syntax ``SampleBatch[MyType](samples)`` rather than 42 + calling the constructor directly with an unsubscripted class.
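
The note about ``__orig_class__`` explains why the subscripted syntax is required. A small sketch of that mechanism follows; `SketchBatch` is a hypothetical class, and the behaviour shown relies on how CPython's `typing` module sets `__orig_class__` on instances created through a subscripted generic alias, which is an implementation detail rather than a documented guarantee.

```python
from typing import Generic, List, TypeVar, get_args

DT = TypeVar("DT")


class SketchBatch(Generic[DT]):
    """Hypothetical batch that recovers its type parameter at runtime."""

    def __init__(self, samples: List[DT]) -> None:
        # __orig_class__ is not yet set here; typing assigns it only after
        # __init__ returns, so the lookup is deferred to a property.
        self.samples = list(samples)

    @property
    def sample_type(self) -> type:
        orig = getattr(self, "__orig_class__", None)
        if orig is None:
            raise TypeError("create the batch as SketchBatch[SomeType](samples)")
        # get_args(SketchBatch[int]) == (int,)
        return get_args(orig)[0]
```

`SketchBatch[int]([1, 2, 3]).sample_type` evaluates to `int`, whereas `SketchBatch([1, 2, 3]).sample_type` raises `TypeError` because no subscripted alias was involved in the construction.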

+31 -13

docs_src/api/SchemaLoader.qmd

··· 9 9 This class fetches schema records from ATProto and can list available 10 10 schemas from a repository. 11 11 12 - Example: 12 + ## Example {.doc-section .doc-section-example} 13 + 14 + :: 15 + 13 16 >>> client = AtmosphereClient() 14 17 >>> client.login("handle", "password") 15 18 >>> ··· 33 36 34 37 Fetch a schema record by AT URI. 35 38 36 - Args: 37 - uri: The AT URI of the schema record. 39 + #### Parameters {.doc-section .doc-section-parameters} 40 + 41 + | Name | Type | Description | Default | 42 + |--------|-----------------------------------------------------------|----------------------------------|------------| 43 + | uri | [str](`str`) \| [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the schema record. | _required_ | 38 44 39 - Returns: 40 - The schema record as a dictionary. 45 + #### Returns {.doc-section .doc-section-returns} 41 46 42 - Raises: 43 - ValueError: If the record is not a schema record. 44 - atproto.exceptions.AtProtocolError: If record not found. 47 + | Name | Type | Description | 48 + |--------|----------------|------------------------------------| 49 + | | [dict](`dict`) | The schema record as a dictionary. | 50 + 51 + #### Raises {.doc-section .doc-section-raises} 52 + 53 + | Name | Type | Description | 54 + |--------|-----------------------------------------------------------------------------------------------------------------|---------------------------------------| 55 + | | [ValueError](`ValueError`) | If the record is not a schema record. | 56 + | | [atproto](`atproto`).[exceptions](`atproto.exceptions`).[AtProtocolError](`atproto.exceptions.AtProtocolError`) | If record not found. | 45 57 46 58 ### list_all { #atdata.atmosphere.SchemaLoader.list_all } 47 59 ··· 51 63 52 64 List schema records from a repository. 53 65 54 - Args: 55 - repo: The DID of the repository. Defaults to authenticated user. 56 - limit: Maximum number of records to return. 
66 + #### Parameters {.doc-section .doc-section-parameters} 57 67 58 - Returns: 59 - List of schema records. 68 + | Name | Type | Description | Default | 69 + |--------|-----------------------------------------------|------------------------------------------------------------|-----------| 70 + | repo | [Optional](`typing.Optional`)\[[str](`str`)\] | The DID of the repository. Defaults to authenticated user. | `None` | 71 + | limit | [int](`int`) | Maximum number of records to return. | `100` | 72 + 73 + #### Returns {.doc-section .doc-section-returns} 74 + 75 + | Name | Type | Description | 76 + |--------|----------------------------------|-------------------------| 77 + | | [list](`list`)\[[dict](`dict`)\] | List of schema records. |

+25 -13

docs_src/api/SchemaPublisher.qmd

··· 9 9 This class introspects a PackableSample class to extract its field 10 10 definitions and publishes them as an ATProto schema record. 11 11 12 - Example: 12 + ## Example {.doc-section .doc-section-example} 13 + 14 + :: 15 + 13 16 >>> @atdata.packable 14 17 ... class MySample: 15 18 ... image: NDArray ··· 45 48 46 49 Publish a PackableSample schema to ATProto. 47 50 48 - Args: 49 - sample_type: The PackableSample class to publish. 50 - name: Human-readable name. Defaults to the class name. 51 - version: Semantic version string (e.g., '1.0.0'). 52 - description: Human-readable description. 53 - metadata: Arbitrary metadata dictionary. 54 - rkey: Optional explicit record key. If not provided, a TID is generated. 51 + #### Parameters {.doc-section .doc-section-parameters} 52 + 53 + | Name | Type | Description | Default | 54 + |-------------|--------------------------------------------------------------|--------------------------------------------------------------------|------------| 55 + | sample_type | [Type](`typing.Type`)\[[ST](`atdata.atmosphere.schema.ST`)\] | The PackableSample class to publish. | _required_ | 56 + | name | [Optional](`typing.Optional`)\[[str](`str`)\] | Human-readable name. Defaults to the class name. | `None` | 57 + | version | [str](`str`) | Semantic version string (e.g., '1.0.0'). | `'1.0.0'` | 58 + | description | [Optional](`typing.Optional`)\[[str](`str`)\] | Human-readable description. | `None` | 59 + | metadata | [Optional](`typing.Optional`)\[[dict](`dict`)\] | Arbitrary metadata dictionary. | `None` | 60 + | rkey | [Optional](`typing.Optional`)\[[str](`str`)\] | Optional explicit record key. If not provided, a TID is generated. 
| `None` | 61 + 62 + #### Returns {.doc-section .doc-section-returns} 63 + 64 + | Name | Type | Description | 65 + |--------|-------------------------------------------|------------------------------------------| 66 + | | [AtUri](`atdata.atmosphere._types.AtUri`) | The AT URI of the created schema record. | 55 67 56 - Returns: 57 - The AT URI of the created schema record. 68 + #### Raises {.doc-section .doc-section-raises} 58 69 59 - Raises: 60 - ValueError: If sample_type is not a dataclass or client is not authenticated. 61 - TypeError: If a field type is not supported. 70 + | Name | Type | Description | 71 + |--------|----------------------------|-------------------------------------------------------------------| 72 + | | [ValueError](`ValueError`) | If sample_type is not a dataclass or client is not authenticated. | 73 + | | [TypeError](`TypeError`) | If a field type is not supported. |

+24 -16

docs_src/api/URLSource.qmd

··· 12 12 13 13 This is the default source type when a string URL is passed to Dataset. 14 14 15 - Attributes: 16 - url: URL or brace pattern for the shards. 15 + ## Attributes {.doc-section .doc-section-attributes} 16 + 17 + | Name | Type | Description | 18 + |--------|--------------|--------------------------------------| 19 + | url | [str](`str`) | URL or brace pattern for the shards. | 20 + 21 + ## Example {.doc-section .doc-section-example} 22 + 23 + :: 17 24 18 - Example: 19 25 >>> source = URLSource("https://example.com/train-{000..009}.tar") 20 26 >>> for shard_id, stream in source.shards: 21 27 ... print(f"Streaming {shard_id}") 22 - 23 - ## Attributes 24 - 25 - | Name | Description | 26 - | --- | --- | 27 - | [shard_list](#atdata.URLSource.shard_list) | Expand brace pattern and return list of shard URLs (deprecated, use list_shards()). | 28 - | [shards](#atdata.URLSource.shards) | Lazily yield (url, stream) pairs for each shard. | 29 28 30 29 ## Methods 31 30 ··· 50 49 51 50 Open a single shard by URL. 52 51 53 - Args: 54 - shard_id: URL of the shard to open. 52 + #### Parameters {.doc-section .doc-section-parameters} 55 53 56 - Returns: 57 - File-like stream from gopen. 54 + | Name | Type | Description | Default | 55 + |----------|--------------|---------------------------|------------| 56 + | shard_id | [str](`str`) | URL of the shard to open. | _required_ | 58 57 59 - Raises: 60 - KeyError: If shard_id is not in list_shards(). 58 + #### Returns {.doc-section .doc-section-returns} 59 + 60 + | Name | Type | Description | 61 + |--------|---------------------------------------|------------------------------| 62 + | | [IO](`typing.IO`)\[[bytes](`bytes`)\] | File-like stream from gopen. | 63 + 64 + #### Raises {.doc-section .doc-section-raises} 65 + 66 + | Name | Type | Description | 67 + |--------|------------------------|--------------------------------------| 68 + | | [KeyError](`KeyError`) | If shard_id is not in list_shards(). |
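
The brace patterns accepted by `URLSource` (for example `train-{000..009}.tar`) expand into one URL per shard. The sketch below shows one way such an expansion can work; it is not the expansion code atdata or WebDataset uses. `expand_braces` is a hypothetical helper that handles comma lists and ascending numeric ranges only, with no nested braces.

```python
import re
from typing import List

# Matches the first brace group that contains no nested braces.
_BRACE = re.compile(r"\{([^{}]*)\}")
_RANGE = re.compile(r"(\d+)\.\.(\d+)")


def expand_braces(pattern: str) -> List[str]:
    """Expand {a,b} lists and {000..009} ranges into concrete strings."""
    match = _BRACE.search(pattern)
    if match is None:
        return [pattern]

    head, body, tail = pattern[: match.start()], match.group(1), pattern[match.end():]

    range_match = _RANGE.fullmatch(body)
    if range_match:
        lo, hi = range_match.groups()
        # Zero-pad only when the pattern itself is zero-padded.
        padded = any(len(n) > 1 and n.startswith("0") for n in (lo, hi))
        width = max(len(lo), len(hi)) if padded else 0
        options = [str(i).zfill(width) for i in range(int(lo), int(hi) + 1)]
    else:
        options = body.split(",")

    # Substitute each option, then expand any remaining groups recursively.
    return [
        expanded
        for option in options
        for expanded in expand_braces(head + option + tail)
    ]
```

Applied to `https://example.com/train-{000..009}.tar`, this produces ten URLs from `train-000.tar` through `train-009.tar`, which is the list a method such as `list_shards()` would be expected to return for that pattern.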

+23 -33

docs_src/api/load_dataset.qmd

··· 23 23 provides dynamic dict-like access to fields. Use ``.as_type(MyType)`` to 24 24 convert to a typed schema. 25 25 26 - Args: 27 - path: Path to dataset. Can be: 28 - - Index lookup: "@handle/dataset-name" or "@local/dataset-name" 29 - - WebDataset brace notation: "path/to/{train,test}-{000..099}.tar" 30 - - Local directory: "./data/" (scans for .tar files) 31 - - Glob pattern: "path/to/*.tar" 32 - - Remote URL: "s3://bucket/path/data-*.tar" 33 - - Single file: "path/to/data.tar" 26 + ## Parameters {.doc-section .doc-section-parameters} 34 27 35 - sample_type: The PackableSample subclass defining the schema. If None, 36 - returns ``Dataset[DictSample]`` with dynamic field access. Can also 37 - be resolved from an index when using @handle/dataset syntax. 28 + | Name | Type | Description | Default | 29 + |-------------|------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| 30 + | path | [str](`str`) | Path to dataset. Can be: - Index lookup: "@handle/dataset-name" or "@local/dataset-name" - WebDataset brace notation: "path/to/{train,test}-{000..099}.tar" - Local directory: "./data/" (scans for .tar files) - Glob pattern: "path/to/*.tar" - Remote URL: "s3://bucket/path/data-*.tar" - Single file: "path/to/data.tar" | _required_ | 31 + | sample_type | [Type](`typing.Type`)\[[ST](`atdata._hf_api.ST`)\] \| None | The PackableSample subclass defining the schema. If None, returns ``Dataset[DictSample]`` with dynamic field access. Can also be resolved from an index when using @handle/dataset syntax. 
| `None` | 32 + | split | [str](`str`) \| None | Which split to load. If None, returns a DatasetDict with all detected splits. If specified (e.g., "train", "test"), returns a single Dataset for that split. | `None` | 33 + | data_files | [str](`str`) \| [list](`list`)\[[str](`str`)\] \| [dict](`dict`)\[[str](`str`), [str](`str`) \| [list](`list`)\[[str](`str`)\]\] \| None | Optional explicit mapping of data files. Can be: - str: Single file pattern - list[str]: List of file patterns (assigned to "train") - dict[str, str \| list[str]]: Explicit split -> files mapping | `None` | 34 + | streaming | [bool](`bool`) | If True, explicitly marks the dataset for streaming mode. Note: atdata Datasets are already lazy/streaming via WebDataset pipelines, so this parameter primarily signals intent. | `False` | 35 + | index | [Optional](`typing.Optional`)\[\'AbstractIndex\'\] | Optional AbstractIndex for dataset lookup. Required when using @handle/dataset syntax. When provided with an indexed path, the schema can be auto-resolved from the index. | `None` | 38 36 39 - split: Which split to load. If None, returns a DatasetDict with all 40 - detected splits. If specified (e.g., "train", "test"), returns 41 - a single Dataset for that split. 37 + ## Returns {.doc-section .doc-section-returns} 42 38 43 - data_files: Optional explicit mapping of data files. Can be: 44 - - str: Single file pattern 45 - - list[str]: List of file patterns (assigned to "train") 46 - - dict[str, str | list[str]]: Explicit split -> files mapping 39 + | Name | Type | Description | 40 + |--------|----------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------| 41 + | | [Dataset](`atdata.dataset.Dataset`)\[[ST](`atdata._hf_api.ST`)\] \| [DatasetDict](`atdata._hf_api.DatasetDict`)\[[ST](`atdata._hf_api.ST`)\] | If split is None: DatasetDict with all detected splits. 
| 42 + | | [Dataset](`atdata.dataset.Dataset`)\[[ST](`atdata._hf_api.ST`)\] \| [DatasetDict](`atdata._hf_api.DatasetDict`)\[[ST](`atdata._hf_api.ST`)\] | If split is specified: Dataset for that split. | 43 + | | [Dataset](`atdata.dataset.Dataset`)\[[ST](`atdata._hf_api.ST`)\] \| [DatasetDict](`atdata._hf_api.DatasetDict`)\[[ST](`atdata._hf_api.ST`)\] | Type is ``ST`` if sample_type provided, otherwise ``DictSample``. | 47 44 48 - streaming: If True, explicitly marks the dataset for streaming mode. 49 - Note: atdata Datasets are already lazy/streaming via WebDataset 50 - pipelines, so this parameter primarily signals intent. 45 + ## Raises {.doc-section .doc-section-raises} 51 46 52 - index: Optional AbstractIndex for dataset lookup. Required when using 53 - @handle/dataset syntax. When provided with an indexed path, the 54 - schema can be auto-resolved from the index. 47 + | Name | Type | Description | 48 + |--------|------------------------------------------|-----------------------------------------| 49 + | | [ValueError](`ValueError`) | If the specified split is not found. | 50 + | | [FileNotFoundError](`FileNotFoundError`) | If no data files are found at the path. | 51 + | | [KeyError](`KeyError`) | If dataset not found in index. | 55 52 56 - Returns: 57 - If split is None: DatasetDict with all detected splits. 58 - If split is specified: Dataset for that split. 59 - Type is ``ST`` if sample_type provided, otherwise ``DictSample``. 53 + ## Example {.doc-section .doc-section-example} 60 54 61 - Raises: 62 - ValueError: If the specified split is not found. 63 - FileNotFoundError: If no data files are found at the path. 64 - KeyError: If dataset not found in index. 55 + :: 65 56 66 - Example: 67 57 >>> # Load without type - get DictSample for exploration 68 58 >>> ds = load_dataset("./data/train.tar", split="train") 69 59 >>> for sample in ds.ordered():
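
Illustrative note (not part of the diff): the `data_files` and `split` rows in this file describe three accepted input shapes and two return modes. The sketch below shows one way that normalization could look. `normalize_data_files` and `select_split` are hypothetical names, and treating a bare string like a one-element list assigned to `"train"` is an assumption; the table only states the `"train"` assignment for lists.

```python
from typing import Dict, List, Optional, Union

DataFiles = Union[str, List[str], Dict[str, Union[str, List[str]]], None]


def normalize_data_files(data_files: DataFiles) -> Dict[str, List[str]]:
    """Map every accepted ``data_files`` shape to ``{split: [patterns]}``."""
    if data_files is None:
        return {}
    if isinstance(data_files, str):
        # Assumption: a single pattern is handled like a one-element list.
        return {"train": [data_files]}
    if isinstance(data_files, list):
        return {"train": list(data_files)}
    # dict: explicit split -> files mapping; wrap bare strings in a list.
    return {
        split: [files] if isinstance(files, str) else list(files)
        for split, files in data_files.items()
    }


def select_split(splits: Dict[str, List[str]], split: Optional[str]):
    """Return every split when ``split`` is None, otherwise just that split."""
    if split is None:
        return splits
    if split not in splits:
        raise ValueError(f"Split {split!r} not found; available: {sorted(splits)}")
    return splits[split]


splits = normalize_data_files({"train": "train-*.tar", "test": ["test-0.tar", "test-1.tar"]})
print(select_split(splits, "test"))
```

Normalizing first keeps the split lookup and its `ValueError` in a single place, whichever shape the caller passed.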

+220 -122

docs_src/api/local.Index.qmd

··· 20 20 shards to storage before indexing. Without a data_store, insert_dataset() 21 21 only indexes existing URLs. 22 22 23 - Attributes: 24 - _redis: Redis connection for index storage. 25 - _data_store: Optional AbstractDataStore for writing dataset shards. 26 - 27 - ## Attributes 23 + ## Attributes {.doc-section .doc-section-attributes} 28 24 29 - | Name | Description | 30 - | --- | --- | 31 - | [all_entries](#atdata.local.Index.all_entries) | Get all index entries as a list (deprecated, use list_entries()). | 32 - | [data_store](#atdata.local.Index.data_store) | The data store for writing shards, or None if index-only. | 33 - | [datasets](#atdata.local.Index.datasets) | Lazily iterate over all dataset entries (AbstractIndex protocol). | 34 - | [entries](#atdata.local.Index.entries) | Iterate over all index entries. | 35 - | [schemas](#atdata.local.Index.schemas) | Iterate over all schema records in this index. | 36 - | [stub_dir](#atdata.local.Index.stub_dir) | Directory where stub files are written, or None if auto-stubs disabled. | 37 - | [types](#atdata.local.Index.types) | Namespace for accessing loaded schema types. | 25 + | Name | Type | Description | 26 + |-------------|--------|--------------------------------------------------------| 27 + | _redis | | Redis connection for index storage. | 28 + | _data_store | | Optional AbstractDataStore for writing dataset shards. | 38 29 39 30 ## Methods 40 31 ··· 67 58 68 59 Creates a LocalDatasetEntry for the dataset and persists it to Redis. 69 60 70 - Args: 71 - ds: The dataset to add to the index. 72 - name: Human-readable name for the dataset. 73 - schema_ref: Optional schema reference. If None, generates from sample type. 74 - metadata: Optional metadata dictionary. If None, uses ds._metadata if available. 
61 + #### Parameters {.doc-section .doc-section-parameters} 62 + 63 + | Name | Type | Description | Default | 64 + |------------|-----------------------------|------------------------------------------------------------------------|------------| 65 + | ds | [Dataset](`atdata.Dataset`) | The dataset to add to the index. | _required_ | 66 + | name | [str](`str`) | Human-readable name for the dataset. | _required_ | 67 + | schema_ref | [str](`str`) \| None | Optional schema reference. If None, generates from sample type. | `None` | 68 + | metadata | [dict](`dict`) \| None | Optional metadata dictionary. If None, uses ds._metadata if available. | `None` | 69 + 70 + #### Returns {.doc-section .doc-section-returns} 75 71 76 - Returns: 77 - The created LocalDatasetEntry object. 72 + | Name | Type | Description | 73 + |--------|-------------------------------------------------------|---------------------------------------| 74 + | | [LocalDatasetEntry](`atdata.local.LocalDatasetEntry`) | The created LocalDatasetEntry object. | 78 75 79 76 ### clear_stubs { #atdata.local.Index.clear_stubs } 80 77 ··· 86 83 87 84 Only works if auto_stubs was enabled when creating the Index. 88 85 89 - Returns: 90 - Number of stub files removed, or 0 if auto_stubs is disabled. 86 + #### Returns {.doc-section .doc-section-returns} 87 + 88 + | Name | Type | Description | 89 + |--------|--------------|---------------------------------------------------------------| 90 + | | [int](`int`) | Number of stub files removed, or 0 if auto_stubs is disabled. | 91 91 92 92 ### decode_schema { #atdata.local.Index.decode_schema } 93 93 ··· 105 105 class will be imported from it, providing full IDE autocomplete support. 106 106 The returned class has proper type information that IDEs can understand. 107 107 108 - Args: 109 - ref: Schema reference string (atdata://local/sampleSchema/... or 110 - legacy local://schemas/...). 
108 + #### Parameters {.doc-section .doc-section-parameters} 111 109 112 - Returns: 113 - A PackableSample subclass - either imported from a generated module 114 - (if auto_stubs is enabled) or dynamically created. 110 + | Name | Type | Description | Default | 111 + |--------|--------------|------------------------------------------------------------------------------------------|------------| 112 + | ref | [str](`str`) | Schema reference string (atdata://local/sampleSchema/... or legacy local://schemas/...). | _required_ | 113 + 114 + #### Returns {.doc-section .doc-section-returns} 115 + 116 + | Name | Type | Description | 117 + |--------|-------------------------------------------------------------------|---------------------------------------------------------------------| 118 + | | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | A PackableSample subclass - either imported from a generated module | 119 + | | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | (if auto_stubs is enabled) or dynamically created. | 120 + 121 + #### Raises {.doc-section .doc-section-raises} 115 122 116 - Raises: 117 - KeyError: If schema not found. 118 - ValueError: If schema cannot be decoded. 123 + | Name | Type | Description | 124 + |--------|----------------------------|------------------------------| 125 + | | [KeyError](`KeyError`) | If schema not found. | 126 + | | [ValueError](`ValueError`) | If schema cannot be decoded. | 119 127 120 128 ### decode_schema_as { #atdata.local.Index.decode_schema_as } 121 129 ··· 129 137 type information for IDE autocomplete. Use this when you have a 130 138 stub file for the schema and want full IDE support. 131 139 132 - Args: 133 - ref: Schema reference string. 134 - type_hint: The stub type to use for type hints. Import this from 135 - the generated stub file. 
140 + #### Parameters {.doc-section .doc-section-parameters} 141 + 142 + | Name | Type | Description | Default | 143 + |-----------|-----------------------------------------|--------------------------------------------------------------------------------|------------| 144 + | ref | [str](`str`) | Schema reference string. | _required_ | 145 + | type_hint | [type](`type`)\[[T](`atdata.local.T`)\] | The stub type to use for type hints. Import this from the generated stub file. | _required_ | 146 + 147 + #### Returns {.doc-section .doc-section-returns} 148 + 149 + | Name | Type | Description | 150 + |--------|-----------------------------------------|----------------------------------------------------------------| 151 + | | [type](`type`)\[[T](`atdata.local.T`)\] | The decoded type, cast to match the type_hint for IDE support. | 152 + 153 + #### Example {.doc-section .doc-section-example} 136 154 137 - Returns: 138 - The decoded type, cast to match the type_hint for IDE support. 155 + :: 139 156 140 - Example: 141 157 >>> # After enabling auto_stubs and configuring IDE extraPaths: 142 158 >>> from local.MySample_1_0_0 import MySample 143 159 >>> ··· 145 161 >>> DecodedType = index.decode_schema_as(ref, MySample) 146 162 >>> sample = DecodedType(text="hello", value=42) # IDE knows signature! 147 163 148 - Note: 149 - The type_hint is only used for static type checking - at runtime, 150 - the actual decoded type from the schema is returned. Ensure the 151 - stub matches the schema to avoid runtime surprises. 164 + #### Note {.doc-section .doc-section-note} 165 + 166 + The type_hint is only used for static type checking - at runtime, 167 + the actual decoded type from the schema is returned. Ensure the 168 + stub matches the schema to avoid runtime surprises. 152 169 153 170 ### get_dataset { #atdata.local.Index.get_dataset } 154 171 ··· 158 175 159 176 Get a dataset entry by name (AbstractIndex protocol). 160 177 161 - Args: 162 - ref: Dataset name. 
178 + #### Parameters {.doc-section .doc-section-parameters} 179 + 180 + | Name | Type | Description | Default | 181 + |--------|--------------|---------------|------------| 182 + | ref | [str](`str`) | Dataset name. | _required_ | 183 + 184 + #### Returns {.doc-section .doc-section-returns} 185 + 186 + | Name | Type | Description | 187 + |--------|-------------------------------------------------------|-----------------------------| 188 + | | [LocalDatasetEntry](`atdata.local.LocalDatasetEntry`) | IndexEntry for the dataset. | 163 189 164 - Returns: 165 - IndexEntry for the dataset. 190 + #### Raises {.doc-section .doc-section-raises} 166 191 167 - Raises: 168 - KeyError: If dataset not found. 192 + | Name | Type | Description | 193 + |--------|------------------------|-----------------------| 194 + | | [KeyError](`KeyError`) | If dataset not found. | 169 195 170 196 ### get_entry { #atdata.local.Index.get_entry } 171 197 ··· 175 201 176 202 Get an entry by its CID. 177 203 178 - Args: 179 - cid: Content identifier of the entry. 204 + #### Parameters {.doc-section .doc-section-parameters} 205 + 206 + | Name | Type | Description | Default | 207 + |--------|--------------|----------------------------------|------------| 208 + | cid | [str](`str`) | Content identifier of the entry. | _required_ | 209 + 210 + #### Returns {.doc-section .doc-section-returns} 211 + 212 + | Name | Type | Description | 213 + |--------|-------------------------------------------------------|--------------------------------------| 214 + | | [LocalDatasetEntry](`atdata.local.LocalDatasetEntry`) | LocalDatasetEntry for the given CID. | 180 215 181 - Returns: 182 - LocalDatasetEntry for the given CID. 216 + #### Raises {.doc-section .doc-section-raises} 183 217 184 - Raises: 185 - KeyError: If entry not found. 218 + | Name | Type | Description | 219 + |--------|------------------------|---------------------| 220 + | | [KeyError](`KeyError`) | If entry not found. 
| 186 221 187 222 ### get_entry_by_name { #atdata.local.Index.get_entry_by_name } 188 223 ··· 192 227 193 228 Get an entry by its human-readable name. 194 229 195 - Args: 196 - name: Human-readable name of the entry. 230 + #### Parameters {.doc-section .doc-section-parameters} 197 231 198 - Returns: 199 - LocalDatasetEntry with the given name. 232 + | Name | Type | Description | Default | 233 + |--------|--------------|-----------------------------------|------------| 234 + | name | [str](`str`) | Human-readable name of the entry. | _required_ | 235 + 236 + #### Returns {.doc-section .doc-section-returns} 237 + 238 + | Name | Type | Description | 239 + |--------|-------------------------------------------------------|----------------------------------------| 240 + | | [LocalDatasetEntry](`atdata.local.LocalDatasetEntry`) | LocalDatasetEntry with the given name. | 241 + 242 + #### Raises {.doc-section .doc-section-raises} 200 243 201 - Raises: 202 - KeyError: If no entry with that name exists. 244 + | Name | Type | Description | 245 + |--------|------------------------|------------------------------------| 246 + | | [KeyError](`KeyError`) | If no entry with that name exists. | 203 247 204 248 ### get_import_path { #atdata.local.Index.get_import_path } 205 249 ··· 212 256 When auto_stubs is enabled, this returns the import path that can 213 257 be used to import the schema type with full IDE support. 214 258 215 - Args: 216 - ref: Schema reference string. 259 + #### Parameters {.doc-section .doc-section-parameters} 217 260 218 - Returns: 219 - Import path like "local.MySample_1_0_0", or None if auto_stubs 220 - is disabled. 261 + | Name | Type | Description | Default | 262 + |--------|--------------|--------------------------|------------| 263 + | ref | [str](`str`) | Schema reference string. 
| _required_ | 264 + 265 + #### Returns {.doc-section .doc-section-returns} 221 266 222 - Example: 267 + | Name | Type | Description | 268 + |--------|----------------------|----------------------------------------------------------------| 269 + | | [str](`str`) \| None | Import path like "local.MySample_1_0_0", or None if auto_stubs | 270 + | | [str](`str`) \| None | is disabled. | 271 + 272 + #### Example {.doc-section .doc-section-example} 273 + 274 + :: 275 + 223 276 >>> index = LocalIndex(auto_stubs=True) 224 277 >>> ref = index.publish_schema(MySample, version="1.0.0") 225 278 >>> index.load_schema(ref) ··· 236 289 237 290 Get a schema record by reference (AbstractIndex protocol). 238 291 239 - Args: 240 - ref: Schema reference string. Supports both new format 241 - (atdata://local/sampleSchema/{name}@{version}) and legacy 242 - format (local://schemas/{module.Class}@{version}). 292 + #### Parameters {.doc-section .doc-section-parameters} 293 + 294 + | Name | Type | Description | Default | 295 + |--------|--------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| 296 + | ref | [str](`str`) | Schema reference string. Supports both new format (atdata://local/sampleSchema/{name}@{version}) and legacy format (local://schemas/{module.Class}@{version}). | _required_ | 297 + 298 + #### Returns {.doc-section .doc-section-returns} 299 + 300 + | Name | Type | Description | 301 + |--------|----------------|------------------------------------------------------------| 302 + | | [dict](`dict`) | Schema record as a dictionary with keys 'name', 'version', | 303 + | | [dict](`dict`) | 'fields', '$ref', etc. | 243 304 244 - Returns: 245 - Schema record as a dictionary with keys 'name', 'version', 246 - 'fields', '$ref', etc. 305 + #### Raises {.doc-section .doc-section-raises} 247 306 248 - Raises: 249 - KeyError: If schema not found. 
250 - ValueError: If reference format is invalid. 307 + | Name | Type | Description | 308 + |--------|----------------------------|---------------------------------| 309 + | | [KeyError](`KeyError`) | If schema not found. | 310 + | | [ValueError](`ValueError`) | If reference format is invalid. | 251 311 252 312 ### get_schema_record { #atdata.local.Index.get_schema_record } 253 313 ··· 260 320 Use this when you need the full LocalSchemaRecord with typed properties. 261 321 For Protocol-compliant dict access, use get_schema() instead. 262 322 263 - Args: 264 - ref: Schema reference string. 323 + #### Parameters {.doc-section .doc-section-parameters} 265 324 266 - Returns: 267 - LocalSchemaRecord with schema details. 325 + | Name | Type | Description | Default | 326 + |--------|--------------|--------------------------|------------| 327 + | ref | [str](`str`) | Schema reference string. | _required_ | 268 328 269 - Raises: 270 - KeyError: If schema not found. 271 - ValueError: If reference format is invalid. 329 + #### Returns {.doc-section .doc-section-returns} 330 + 331 + | Name | Type | Description | 332 + |--------|-------------------------------------------------------|----------------------------------------| 333 + | | [LocalSchemaRecord](`atdata.local.LocalSchemaRecord`) | LocalSchemaRecord with schema details. | 334 + 335 + #### Raises {.doc-section .doc-section-raises} 336 + 337 + | Name | Type | Description | 338 + |--------|----------------------------|---------------------------------| 339 + | | [KeyError](`KeyError`) | If schema not found. | 340 + | | [ValueError](`ValueError`) | If reference format is invalid. | 272 341 273 342 ### insert_dataset { #atdata.local.Index.insert_dataset } 274 343 ··· 282 351 to storage first, then indexes the new URLs. Otherwise, indexes the 283 352 dataset's existing URL. 284 353 285 - Args: 286 - ds: The Dataset to register. 287 - name: Human-readable name for the dataset. 288 - schema_ref: Optional schema reference. 
289 - **kwargs: Additional options: 290 - - metadata: Optional metadata dict 291 - - prefix: Storage prefix (default: dataset name) 292 - - cache_local: If True, cache writes locally first 354 + #### Parameters {.doc-section .doc-section-parameters} 293 355 294 - Returns: 295 - IndexEntry for the inserted dataset. 356 + | Name | Type | Description | Default | 357 + |------------|-----------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| 358 + | ds | [Dataset](`atdata.Dataset`) | The Dataset to register. | _required_ | 359 + | name | [str](`str`) | Human-readable name for the dataset. | _required_ | 360 + | schema_ref | [str](`str`) \| None | Optional schema reference. | `None` | 361 + | **kwargs | | Additional options: - metadata: Optional metadata dict - prefix: Storage prefix (default: dataset name) - cache_local: If True, cache writes locally first | `{}` | 362 + 363 + #### Returns {.doc-section .doc-section-returns} 364 + 365 + | Name | Type | Description | 366 + |--------|-------------------------------------------------------|--------------------------------------| 367 + | | [LocalDatasetEntry](`atdata.local.LocalDatasetEntry`) | IndexEntry for the inserted dataset. | 296 368 297 369 ### list_datasets { #atdata.local.Index.list_datasets } 298 370 ··· 302 374 303 375 Get all dataset entries as a materialized list (AbstractIndex protocol). 304 376 305 - Returns: 306 - List of IndexEntry for each dataset. 377 + #### Returns {.doc-section .doc-section-returns} 378 + 379 + | Name | Type | Description | 380 + |--------|-------------------------------------------------------------------------|--------------------------------------| 381 + | | [list](`list`)\[[LocalDatasetEntry](`atdata.local.LocalDatasetEntry`)\] | List of IndexEntry for each dataset. 
| 307 382 308 383 ### list_entries { #atdata.local.Index.list_entries } 309 384 ··· 313 388 314 389 Get all index entries as a materialized list. 315 390 316 - Returns: 317 - List of all LocalDatasetEntry objects in the index. 391 + #### Returns {.doc-section .doc-section-returns} 392 + 393 + | Name | Type | Description | 394 + |--------|-------------------------------------------------------------------------|-----------------------------------------------------| 395 + | | [list](`list`)\[[LocalDatasetEntry](`atdata.local.LocalDatasetEntry`)\] | List of all LocalDatasetEntry objects in the index. | 318 396 319 397 ### list_schemas { #atdata.local.Index.list_schemas } 320 398 ··· 324 402 325 403 Get all schema records as a materialized list (AbstractIndex protocol). 326 404 327 - Returns: 328 - List of schema records as dictionaries. 405 + #### Returns {.doc-section .doc-section-returns} 406 + 407 + | Name | Type | Description | 408 + |--------|----------------------------------|-----------------------------------------| 409 + | | [list](`list`)\[[dict](`dict`)\] | List of schema records as dictionaries. | 329 410 330 411 ### load_schema { #atdata.local.Index.load_schema } 331 412 ··· 339 420 for IDE support (if auto_stubs is enabled), and registers the type 340 421 in the :attr:`types` namespace for easy access. 341 422 342 - Args: 343 - ref: Schema reference string (atdata://local/sampleSchema/... or 344 - legacy local://schemas/...). 423 + #### Parameters {.doc-section .doc-section-parameters} 424 + 425 + | Name | Type | Description | Default | 426 + |--------|--------------|------------------------------------------------------------------------------------------|------------| 427 + | ref | [str](`str`) | Schema reference string (atdata://local/sampleSchema/... or legacy local://schemas/...). 
| _required_ | 428 + 429 + #### Returns {.doc-section .doc-section-returns} 430 + 431 + | Name | Type | Description | 432 + |--------|-------------------------------------------------------------------|---------------------------------------------------------| 433 + | | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | The decoded PackableSample subclass. Also available via | 434 + | | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | ``index.types.<ClassName>`` after this call. | 435 + 436 + #### Raises {.doc-section .doc-section-raises} 437 + 438 + | Name | Type | Description | 439 + |--------|----------------------------|------------------------------| 440 + | | [KeyError](`KeyError`) | If schema not found. | 441 + | | [ValueError](`ValueError`) | If schema cannot be decoded. | 345 442 346 - Returns: 347 - The decoded PackableSample subclass. Also available via 348 - ``index.types.<ClassName>`` after this call. 443 + #### Example {.doc-section .doc-section-example} 349 444 350 - Raises: 351 - KeyError: If schema not found. 352 - ValueError: If schema cannot be decoded. 445 + :: 353 446 354 - Example: 355 447 >>> # Load and use immediately 356 448 >>> MyType = index.load_schema("atdata://local/sampleSchema/MySample@1.0.0") 357 449 >>> sample = MyType(name="hello", value=42) ··· 368 460 369 461 Publish a schema for a sample type to Redis. 370 462 371 - Args: 372 - sample_type: The PackableSample subclass to publish. 373 - version: Semantic version string (e.g., '1.0.0'). If None, 374 - auto-increments from the latest published version (patch bump), 375 - or starts at '1.0.0' if no previous version exists. 376 - description: Optional human-readable description. If None, uses 377 - the class docstring. 463 + #### Parameters {.doc-section .doc-section-parameters} 378 464 379 - Returns: 380 - Schema reference string: 'atdata://local/sampleSchema/{name}@{version}'. 
465 + | Name | Type | Description | Default | 466 + |-------------|-------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| 467 + | sample_type | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | The PackableSample subclass to publish. | _required_ | 468 + | version | [str](`str`) \| None | Semantic version string (e.g., '1.0.0'). If None, auto-increments from the latest published version (patch bump), or starts at '1.0.0' if no previous version exists. | `None` | 469 + | description | [str](`str`) \| None | Optional human-readable description. If None, uses the class docstring. | `None` | 470 + 471 + #### Returns {.doc-section .doc-section-returns} 472 + 473 + | Name | Type | Description | 474 + |--------|--------------|--------------------------------------------------------------------------| 475 + | | [str](`str`) | Schema reference string: 'atdata://local/sampleSchema/{name}@{version}'. | 381 476 382 - Raises: 383 - ValueError: If sample_type is not a dataclass. 384 - TypeError: If a field type is not supported. 477 + #### Raises {.doc-section .doc-section-raises} 478 + 479 + | Name | Type | Description | 480 + |--------|----------------------------|------------------------------------| 481 + | | [ValueError](`ValueError`) | If sample_type is not a dataclass. | 482 + | | [TypeError](`TypeError`) | If a field type is not supported. |
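
Illustrative note (not part of the diff): the `publish_schema` and `get_schema` tables in this file describe a reference format (`atdata://local/sampleSchema/{name}@{version}`, plus the legacy `local://schemas/{module.Class}@{version}`) and a patch-bump default for `version`. The sketch below shows how those two rules could be expressed. All three function names are hypothetical, and the assumption that versions are plain `major.minor.patch` integers is mine, not something the diff confirms.

```python
from typing import Optional, Tuple

NEW_PREFIX = "atdata://local/sampleSchema/"
LEGACY_PREFIX = "local://schemas/"


def next_version(latest: Optional[str]) -> str:
    """Patch-bump ``latest``, or start at '1.0.0' when nothing is published."""
    if latest is None:
        return "1.0.0"
    major, minor, patch = (int(part) for part in latest.split("."))
    return f"{major}.{minor}.{patch + 1}"


def make_schema_ref(name: str, version: str) -> str:
    """Build a reference in the new format."""
    return f"{NEW_PREFIX}{name}@{version}"


def parse_schema_ref(ref: str) -> Tuple[str, str]:
    """Split a new-style or legacy reference into ``(name, version)``."""
    for prefix in (NEW_PREFIX, LEGACY_PREFIX):
        if ref.startswith(prefix):
            name, sep, version = ref[len(prefix):].rpartition("@")
            if sep and name and version:
                return name, version
    raise ValueError(f"Invalid schema reference: {ref!r}")


ref = make_schema_ref("MySample", next_version("1.0.0"))
print(ref, parse_schema_ref(ref))
```

For a legacy reference this sketch returns the dotted `module.Class` path unchanged as the name; how the real index maps that to a schema name is not shown in the diff.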

+28 -25

docs_src/api/local.LocalDatasetEntry.qmd

···
21 21 ensuring the same data produces the same CID whether stored locally or
22 22 in the atmosphere. This enables seamless promotion from local to ATProto.
23 23
24 - Attributes:
25 - name: Human-readable name for this dataset.
26 - schema_ref: Reference to the schema for this dataset.
27 - data_urls: WebDataset URLs for the data.
28 - metadata: Arbitrary metadata dictionary, or None if not set.
24 + ## Attributes {.doc-section .doc-section-attributes}
29 25
30 - ## Attributes
31 -
32 - | Name | Description |
33 - | --- | --- |
34 - | [cid](#atdata.local.LocalDatasetEntry.cid) | Content identifier (ATProto-compatible CID). |
35 - | [data_urls](#atdata.local.LocalDatasetEntry.data_urls) | WebDataset URLs for the data. |
36 - | [metadata](#atdata.local.LocalDatasetEntry.metadata) | Arbitrary metadata dictionary, or None if not set. |
37 - | [name](#atdata.local.LocalDatasetEntry.name) | Human-readable name for this dataset. |
38 - | [sample_kind](#atdata.local.LocalDatasetEntry.sample_kind) | Legacy property: returns schema_ref for backwards compatibility. |
39 - | [schema_ref](#atdata.local.LocalDatasetEntry.schema_ref) | Reference to the schema for this dataset. |
40 - | [wds_url](#atdata.local.LocalDatasetEntry.wds_url) | Legacy property: returns first data URL for backwards compatibility. |
26 + | Name | Type | Description |
27 + |------------|--------------------------------|----------------------------------------------------|
28 + | name | [str](`str`) | Human-readable name for this dataset. |
29 + | schema_ref | [str](`str`) | Reference to the schema for this dataset. |
30 + | data_urls | [list](`list`)\[[str](`str`)\] | WebDataset URLs for the data. |
31 + | metadata | [dict](`dict`) \| None | Arbitrary metadata dictionary, or None if not set. |
41 32
42 33 ## Methods
43 34
···
54 45
55 46 Load an entry from Redis by CID.
56 47
57 - Args:
58 - redis: Redis connection to read from.
59 - cid: Content identifier of the entry to load.
48 + #### Parameters {.doc-section .doc-section-parameters}
49 +
50 + | Name | Type | Description | Default |
51 + |--------|------------------------|------------------------------------------|------------|
52 + | redis | [Redis](`redis.Redis`) | Redis connection to read from. | _required_ |
53 + | cid | [str](`str`) | Content identifier of the entry to load. | _required_ |
60 54
61 - Returns:
62 - LocalDatasetEntry loaded from Redis.
55 + #### Returns {.doc-section .doc-section-returns}
63 56
64 - Raises:
65 - KeyError: If entry not found.
57 + | Name | Type | Description |
58 + |--------|-------------------------------------------------------|--------------------------------------|
59 + | | [LocalDatasetEntry](`atdata.local.LocalDatasetEntry`) | LocalDatasetEntry loaded from Redis. |
60 +
61 + #### Raises {.doc-section .doc-section-raises}
62 +
63 + | Name | Type | Description |
64 + |--------|------------------------|---------------------|
65 + | | [KeyError](`KeyError`) | If entry not found. |
66 66
67 67 ### write_to { #atdata.local.LocalDatasetEntry.write_to }
68 68
···
74 74
75 75 Stores the entry as a Redis hash with key '{REDIS_KEY_DATASET_ENTRY}:{cid}'.
76 76
77 - Args:
78 - redis: Redis connection to write to.
77 + #### Parameters {.doc-section .doc-section-parameters}
78 +
79 + | Name | Type | Description | Default |
80 + |--------|------------------------|-------------------------------|------------|
81 + | redis | [Redis](`redis.Redis`) | Redis connection to write to. | _required_ |
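
Illustrative note (not part of the diff): `write_to` is documented as storing the entry in a Redis hash keyed `'{REDIS_KEY_DATASET_ENTRY}:{cid}'`, and `from_redis` as raising `KeyError` for a missing CID. The sketch below walks through that round trip against a small in-memory stand-in so it runs without a server. The prefix value, the JSON encoding of list and dict fields, and the names `FakeRedis`, `write_entry`, and `read_entry` are all assumptions made for this note.

```python
import json

# Hypothetical value: the diff names this constant but does not show its value.
REDIS_KEY_DATASET_ENTRY = "DatasetEntry"


class FakeRedis:
    """In-memory stand-in exposing only the two hash calls used below."""

    def __init__(self):
        self._hashes = {}

    def hset(self, name, mapping):
        self._hashes.setdefault(name, {}).update(mapping)
        return len(mapping)

    def hgetall(self, name):
        # Like Redis, a missing key yields an empty mapping rather than an error.
        return dict(self._hashes.get(name, {}))


def write_entry(redis, cid, name, schema_ref, data_urls, metadata=None):
    """Store one entry as a hash under '<prefix>:<cid>' and return the key."""
    key = f"{REDIS_KEY_DATASET_ENTRY}:{cid}"
    redis.hset(key, mapping={
        "name": name,
        "schema_ref": schema_ref,
        # Assumption: non-string fields are JSON-encoded into hash values.
        "data_urls": json.dumps(data_urls),
        "metadata": json.dumps(metadata),
    })
    return key


def read_entry(redis, cid):
    """Load an entry by CID, raising KeyError when the hash is empty."""
    raw = redis.hgetall(f"{REDIS_KEY_DATASET_ENTRY}:{cid}")
    if not raw:
        raise KeyError(f"No dataset entry for CID {cid!r}")
    return {
        "name": raw["name"],
        "schema_ref": raw["schema_ref"],
        "data_urls": json.loads(raw["data_urls"]),
        "metadata": json.loads(raw["metadata"]),
    }


store = FakeRedis()
key = write_entry(store, "bafy123", "mnist", "atdata://local/sampleSchema/Mnist@1.0.0",
                  ["s3://bucket/mnist-000.tar"])
print(key, read_entry(store, "bafy123"))
```

With a real Redis client the hash values come back as bytes unless the connection decodes responses, so a production reader would need to handle that as well.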

+41 -20

docs_src/api/local.S3DataStore.qmd

··· 9 9 Handles writing dataset shards to S3-compatible object storage and 10 10 resolving URLs for reading. 11 11 12 - Attributes: 13 - credentials: S3 credentials dictionary. 14 - bucket: Target bucket name. 15 - _fs: S3FileSystem instance. 12 + ## Attributes {.doc-section .doc-section-attributes} 13 + 14 + | Name | Type | Description | 15 + |-------------|--------|----------------------------| 16 + | credentials | | S3 credentials dictionary. | 17 + | bucket | | Target bucket name. | 18 + | _fs | | S3FileSystem instance. | 16 19 17 20 ## Methods 18 21 ··· 37 40 For standard AWS S3 (no custom endpoint), URLs are returned unchanged 38 41 since WebDataset's built-in s3fs integration handles them. 39 42 40 - Args: 41 - url: S3 URL to resolve (e.g., 's3://bucket/path/file.tar'). 43 + #### Parameters {.doc-section .doc-section-parameters} 44 + 45 + | Name | Type | Description | Default | 46 + |--------|--------------|--------------------------------------------------------|------------| 47 + | url | [str](`str`) | S3 URL to resolve (e.g., 's3://bucket/path/file.tar'). | _required_ | 48 + 49 + #### Returns {.doc-section .doc-section-returns} 42 50 43 - Returns: 44 - HTTPS URL if custom endpoint is configured, otherwise unchanged. 45 - Example: 's3://bucket/path' -> 'https://endpoint.com/bucket/path' 51 + | Name | Type | Description | 52 + |---------|--------------|------------------------------------------------------------------| 53 + | | [str](`str`) | HTTPS URL if custom endpoint is configured, otherwise unchanged. | 54 + | Example | [str](`str`) | 's3://bucket/path' -> 'https://endpoint.com/bucket/path' | 46 55 47 56 ### supports_streaming { #atdata.local.S3DataStore.supports_streaming } 48 57 ··· 52 61 53 62 S3 supports streaming reads. 54 63 55 - Returns: 56 - True. 64 + #### Returns {.doc-section .doc-section-returns} 65 + 66 + | Name | Type | Description | 67 + |--------|----------------|---------------| 68 + | | [bool](`bool`) | True. 
| 57 69 58 70 ### write_shards { #atdata.local.S3DataStore.write_shards } 59 71 ··· 63 75 64 76 Write dataset shards to S3. 65 77 66 - Args: 67 - ds: The Dataset to write. 68 - prefix: Path prefix within bucket (e.g., 'datasets/mnist/v1'). 69 - cache_local: If True, write locally first then copy to S3. 70 - **kwargs: Additional args passed to wds.ShardWriter (e.g., maxcount). 78 + #### Parameters {.doc-section .doc-section-parameters} 71 79 72 - Returns: 73 - List of S3 URLs for the written shards. 80 + | Name | Type | Description | Default | 81 + |-------------|-----------------------------|-------------------------------------------------------------|------------| 82 + | ds | [Dataset](`atdata.Dataset`) | The Dataset to write. | _required_ | 83 + | prefix | [str](`str`) | Path prefix within bucket (e.g., 'datasets/mnist/v1'). | _required_ | 84 + | cache_local | [bool](`bool`) | If True, write locally first then copy to S3. | `False` | 85 + | **kwargs | | Additional args passed to wds.ShardWriter (e.g., maxcount). | `{}` | 86 + 87 + #### Returns {.doc-section .doc-section-returns} 88 + 89 + | Name | Type | Description | 90 + |--------|--------------------------------|-----------------------------------------| 91 + | | [list](`list`)\[[str](`str`)\] | List of S3 URLs for the written shards. | 74 92 75 - Raises: 76 - RuntimeError: If no shards were written. 93 + #### Raises {.doc-section .doc-section-raises} 94 + 95 + | Name | Type | Description | 96 + |--------|--------------------------------|----------------------------| 97 + | | [RuntimeError](`RuntimeError`) | If no shards were written. |
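
The URL-resolution rule documented above (custom endpoint gives an HTTPS URL, standard AWS S3 leaves the URL unchanged) can be sketched as follows. This is a minimal illustration, not the `S3DataStore` implementation: the helper name `resolve_s3_url` and its `endpoint` parameter are hypothetical, and rejecting non-`s3://` input is an assumption the source does not state.

```python
from typing import Optional
from urllib.parse import urlparse


def resolve_s3_url(url: str, endpoint: Optional[str] = None) -> str:
    """Hypothetical helper mirroring the documented resolution rule."""
    if endpoint is None:
        # Standard AWS S3 (no custom endpoint): return the URL unchanged.
        return url
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        # Assumption: anything that is not an s3:// URL is rejected.
        raise ValueError(f"not an s3:// URL: {url!r}")
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    # Path-style addressing: <endpoint>/<bucket>/<key>
    return f"{endpoint.rstrip('/')}/{bucket}/{key}"
```

With `endpoint="https://endpoint.com"`, the input `s3://bucket/path` maps to `https://endpoint.com/bucket/path`, matching the example in the Returns table.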

+27 -18

docs_src/api/packable.qmd

··· 14 14 with all atdata APIs that accept packable types (e.g., ``publish_schema``, 15 15 lens transformations, etc.). 16 16 17 - Args: 18 - cls: The class to convert. Should have type annotations for its fields. 17 + ## Parameters {.doc-section .doc-section-parameters} 18 + 19 + | Name | Type | Description | Default | 20 + |--------|---------------------------------------------|--------------------------------------------------------------------|------------| 21 + | cls | [type](`type`)\[[_T](`atdata.dataset._T`)\] | The class to convert. Should have type annotations for its fields. | _required_ | 22 + 23 + ## Returns {.doc-section .doc-section-returns} 24 + 25 + | Name | Type | Description | 26 + |--------|---------------------------------------------|---------------------------------------------------------------------------| 27 + | | [type](`type`)\[[_T](`atdata.dataset._T`)\] | A new dataclass that inherits from ``PackableSample`` with the same | 28 + | | [type](`type`)\[[_T](`atdata.dataset._T`)\] | name and annotations as the original class. The class satisfies the | 29 + | | [type](`type`)\[[_T](`atdata.dataset._T`)\] | ``Packable`` protocol and can be used with ``Type[Packable]`` signatures. | 19 30 20 - Returns: 21 - A new dataclass that inherits from ``PackableSample`` with the same 22 - name and annotations as the original class. The class satisfies the 23 - ``Packable`` protocol and can be used with ``Type[Packable]`` signatures. 31 + ## Examples {.doc-section .doc-section-examples} 32 + 33 + This is a test of the functionality:: 24 34 25 - Example: 26 - >>> @packable 27 - ... class MyData: 28 - ... name: str 29 - ... values: NDArray 30 - ... 
31 - >>> sample = MyData(name="test", values=np.array([1, 2, 3])) 32 - >>> bytes_data = sample.packed 33 - >>> restored = MyData.from_bytes(bytes_data) 34 - >>> 35 - >>> # Works with Packable-typed APIs 36 - >>> index.publish_schema(MyData, version="1.0.0") # Type-safe 35 + @packable 36 + class MyData: 37 + name: str 38 + values: NDArray 39 + 40 + sample = MyData(name="test", values=np.array([1, 2, 3])) 41 + bytes_data = sample.packed 42 + restored = MyData.from_bytes(bytes_data) 43 + 44 + # Works with Packable-typed APIs 45 + index.publish_schema(MyData, version="1.0.0") # Type-safe

+27 -16

docs_src/api/promote_to_atmosphere.qmd

··· 19 19 This function takes a locally-indexed dataset and publishes it to ATProto, 20 20 making it discoverable on the federated atmosphere network. 21 21 22 - Args: 23 - local_entry: The LocalDatasetEntry to promote. 24 - local_index: Local index containing the schema for this entry. 25 - atmosphere_client: Authenticated AtmosphereClient. 26 - data_store: Optional data store for copying data to new location. 27 - If None, the existing data_urls are used as-is. 28 - name: Override name for the atmosphere record. Defaults to local name. 29 - description: Optional description for the dataset. 30 - tags: Optional tags for discovery. 31 - license: Optional license identifier. 22 + ## Parameters {.doc-section .doc-section-parameters} 23 + 24 + | Name | Type | Description | Default | 25 + |-------------------|--------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------|------------| 26 + | local_entry | [LocalDatasetEntry](`atdata.local.LocalDatasetEntry`) | The LocalDatasetEntry to promote. | _required_ | 27 + | local_index | [LocalIndex](`atdata.local.Index`) | Local index containing the schema for this entry. | _required_ | 28 + | atmosphere_client | [AtmosphereClient](`atdata.atmosphere.AtmosphereClient`) | Authenticated AtmosphereClient. | _required_ | 29 + | data_store | [AbstractDataStore](`atdata._protocols.AbstractDataStore`) \| None | Optional data store for copying data to new location. If None, the existing data_urls are used as-is. | `None` | 30 + | name | [str](`str`) \| None | Override name for the atmosphere record. Defaults to local name. | `None` | 31 + | description | [str](`str`) \| None | Optional description for the dataset. | `None` | 32 + | tags | [list](`list`)\[[str](`str`)\] \| None | Optional tags for discovery. | `None` | 33 + | license | [str](`str`) \| None | Optional license identifier. 
| `None` | 34 + 35 + ## Returns {.doc-section .doc-section-returns} 36 + 37 + | Name | Type | Description | 38 + |--------|--------------|--------------------------------------------------| 39 + | | [str](`str`) | AT URI of the created atmosphere dataset record. | 40 + 41 + ## Raises {.doc-section .doc-section-raises} 42 + 43 + | Name | Type | Description | 44 + |--------|----------------------------|-------------------------------------| 45 + | | [KeyError](`KeyError`) | If schema not found in local index. | 46 + | | [ValueError](`ValueError`) | If local entry has no data URLs. | 32 47 33 - Returns: 34 - AT URI of the created atmosphere dataset record. 48 + ## Example {.doc-section .doc-section-example} 35 49 36 - Raises: 37 - KeyError: If schema not found in local index. 38 - ValueError: If local entry has no data URLs. 50 + :: 39 51 40 - Example: 41 52 >>> entry = local_index.get_dataset("mnist-train") 42 53 >>> uri = promote_to_atmosphere(entry, local_index, client) 43 54 >>> print(uri)

+2

justfile

··· 1 + docs: 2 + (cd docs_src && uv run quartodoc build && quarto render)

+1 -1

pyproject.toml

··· 51 51 "moto[s3]>=5.0.29", 52 52 "pytest>=8.4.2", 53 53 "pytest-cov>=7.0.0", 54 - "quartodoc>=0.9.0", 54 + "quartodoc>=0.11.1", 55 55 "ruff>=0.14.13", 56 56 ]

+28 -18

src/atdata/_cid.py

··· 13 13 seamless promotion from local storage to atmosphere (ATProto network). 14 14 15 15 Example: 16 - >>> schema = {"name": "ImageSample", "version": "1.0.0", "fields": [...]} 17 - >>> cid = generate_cid(schema) 18 - >>> print(cid) 19 - bafyreihffx5a2e7k6r5zqgp5iwpjqr2gfyheqhzqtlxagvqjqyxzqpzqaa 16 + :: 17 + 18 + >>> schema = {"name": "ImageSample", "version": "1.0.0", "fields": [...]} 19 + >>> cid = generate_cid(schema) 20 + >>> print(cid) 21 + bafyreihffx5a2e7k6r5zqgp5iwpjqr2gfyheqhzqtlxagvqjqyxzqpzqaa 20 22 """ 21 23 22 24 import hashlib ··· 49 51 ValueError: If the data cannot be encoded as DAG-CBOR. 50 52 51 53 Example: 52 - >>> generate_cid({"name": "test", "value": 42}) 53 - 'bafyrei...' 54 + :: 55 + 56 + >>> generate_cid({"name": "test", "value": 42}) 57 + 'bafyrei...' 54 58 """ 55 59 # Encode data as DAG-CBOR 56 60 try: ··· 83 87 CIDv1 string in base32 multibase format. 84 88 85 89 Example: 86 - >>> cbor_bytes = libipld.encode_dag_cbor({"key": "value"}) 87 - >>> cid = generate_cid_from_bytes(cbor_bytes) 90 + :: 91 + 92 + >>> cbor_bytes = libipld.encode_dag_cbor({"key": "value"}) 93 + >>> cid = generate_cid_from_bytes(cbor_bytes) 88 94 """ 89 95 sha256_hash = hashlib.sha256(data_bytes).digest() 90 96 raw_cid_bytes = bytes([CID_VERSION_1, CODEC_DAG_CBOR, HASH_SHA256, SHA256_SIZE]) + sha256_hash ··· 102 108 True if the CID matches the data, False otherwise. 103 109 104 110 Example: 105 - >>> cid = generate_cid({"name": "test"}) 106 - >>> verify_cid(cid, {"name": "test"}) 107 - True 108 - >>> verify_cid(cid, {"name": "different"}) 109 - False 111 + :: 112 + 113 + >>> cid = generate_cid({"name": "test"}) 114 + >>> verify_cid(cid, {"name": "test"}) 115 + True 116 + >>> verify_cid(cid, {"name": "different"}) 117 + False 110 118 """ 111 119 expected_cid = generate_cid(data) 112 120 return cid == expected_cid ··· 123 131 The 'hash' value is itself a dict with 'code', 'size', and 'digest'. 
124 132 125 133 Example: 126 - >>> info = parse_cid('bafyrei...') 127 - >>> info['version'] 128 - 1 129 - >>> info['codec'] 130 - 113 # 0x71 = dag-cbor 134 + :: 135 + 136 + >>> info = parse_cid('bafyrei...') 137 + >>> info['version'] 138 + 1 139 + >>> info['codec'] 140 + 113 # 0x71 = dag-cbor 131 141 """ 132 142 return libipld.decode_cid(cid) 133 143
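
For readers unfamiliar with the CID layout that `generate_cid_from_bytes` assembles, here is a standard-library-only sketch of the same steps: prefix bytes, SHA-256 digest, then base32 multibase encoding. The constant names come from the diff, but their values are assumed here from the multiformats tables (the `parse_cid` example confirms only version `1` and codec `113`). The function name `cid_from_bytes_sketch` is hypothetical, and the input is assumed to be already DAG-CBOR encoded.

```python
import base64
import hashlib

# Assumed values for the constants named in the diff (multiformats tables).
CID_VERSION_1 = 0x01
CODEC_DAG_CBOR = 0x71  # 113, as shown in the parse_cid example
HASH_SHA256 = 0x12     # multihash code for sha2-256
SHA256_SIZE = 0x20     # digest length in bytes (32)


def cid_from_bytes_sketch(data_bytes: bytes) -> str:
    """Build a CIDv1 string from bytes assumed to be DAG-CBOR encoded."""
    digest = hashlib.sha256(data_bytes).digest()
    raw = bytes([CID_VERSION_1, CODEC_DAG_CBOR, HASH_SHA256, SHA256_SIZE]) + digest
    # Multibase base32: lowercase RFC 4648 alphabet, no padding, "b" prefix.
    encoded = base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    return "b" + encoded
```

Because the four prefix bytes are fixed, every CID built this way starts with `bafyrei`, which is why the docstring examples abbreviate results as `'bafyrei...'`.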

+45 -38

src/atdata/_hf_api.py

··· 10 10 - No Arrow caching layer (WebDataset handles remote/local transparently) 11 11 12 12 Example: 13 - >>> import atdata 14 - >>> from atdata import load_dataset 15 - >>> 16 - >>> @atdata.packable 17 - ... class MyData: 18 - ... text: str 19 - ... label: int 20 - >>> 21 - >>> # Load a single split 22 - >>> ds = load_dataset("path/to/train-{000000..000099}.tar", MyData, split="train") 23 - >>> 24 - >>> # Load all splits (returns DatasetDict) 25 - >>> ds_dict = load_dataset("path/to/{train,test}-*.tar", MyData) 26 - >>> train_ds = ds_dict["train"] 13 + :: 14 + 15 + >>> import atdata 16 + >>> from atdata import load_dataset 17 + >>> 18 + >>> @atdata.packable 19 + ... class MyData: 20 + ... text: str 21 + ... label: int 22 + >>> 23 + >>> # Load a single split 24 + >>> ds = load_dataset("path/to/train-{000000..000099}.tar", MyData, split="train") 25 + >>> 26 + >>> # Load all splits (returns DatasetDict) 27 + >>> ds_dict = load_dataset("path/to/{train,test}-*.tar", MyData) 28 + >>> train_ds = ds_dict["train"] 27 29 """ 28 30 29 31 from __future__ import annotations ··· 62 64 multiple dataset splits (train, test, validation, etc.) with convenience 63 65 methods that operate across all splits. 64 66 65 - Type Parameters: 67 + Parameters: 66 68 ST: The sample type for all datasets in this dict. 67 69 68 70 Example: 69 - >>> ds_dict = load_dataset("path/to/data", MyData) 70 - >>> train = ds_dict["train"] 71 - >>> test = ds_dict["test"] 72 - >>> 73 - >>> # Iterate over all splits 74 - >>> for split_name, dataset in ds_dict.items(): 75 - ... print(f"{split_name}: {len(dataset.shard_list)} shards") 71 + :: 72 + 73 + >>> ds_dict = load_dataset("path/to/data", MyData) 74 + >>> train = ds_dict["train"] 75 + >>> test = ds_dict["test"] 76 + >>> 77 + >>> # Iterate over all splits 78 + >>> for split_name, dataset in ds_dict.items(): 79 + ... 
print(f"{split_name}: {len(dataset.shard_list)} shards") 76 80 """ 81 + # TODO The above has a line for "Parameters:" that should be "Type Parameters:"; this is a temporary fix for `quartodoc` auto-generation bugs. 77 82 78 83 def __init__( 79 84 self, ··· 585 590 KeyError: If dataset not found in index. 586 591 587 592 Example: 588 - >>> # Load without type - get DictSample for exploration 589 - >>> ds = load_dataset("./data/train.tar", split="train") 590 - >>> for sample in ds.ordered(): 591 - ... print(sample.keys()) # Explore fields 592 - ... print(sample["text"]) # Dict-style access 593 - ... print(sample.label) # Attribute access 594 - >>> 595 - >>> # Convert to typed schema 596 - >>> typed_ds = ds.as_type(TextData) 597 - >>> 598 - >>> # Or load with explicit type directly 599 - >>> train_ds = load_dataset("./data/train-*.tar", TextData, split="train") 600 - >>> 601 - >>> # Load from index with auto-type resolution 602 - >>> index = LocalIndex() 603 - >>> ds = load_dataset("@local/my-dataset", index=index, split="train") 593 + :: 594 + 595 + >>> # Load without type - get DictSample for exploration 596 + >>> ds = load_dataset("./data/train.tar", split="train") 597 + >>> for sample in ds.ordered(): 598 + ... print(sample.keys()) # Explore fields 599 + ... print(sample["text"]) # Dict-style access 600 + ... print(sample.label) # Attribute access 601 + >>> 602 + >>> # Convert to typed schema 603 + >>> typed_ds = ds.as_type(TextData) 604 + >>> 605 + >>> # Or load with explicit type directly 606 + >>> train_ds = load_dataset("./data/train-*.tar", TextData, split="train") 607 + >>> 608 + >>> # Load from index with auto-type resolution 609 + >>> index = LocalIndex() 610 + >>> ds = load_dataset("@local/my-dataset", index=index, split="train") 604 611 """ 605 612 # Handle @handle/dataset indexed path resolution 606 613 if _is_indexed_path(path):

+63 -49

src/atdata/_protocols.py

··· 20 20 AbstractDataStore: Protocol for data storage operations 21 21 22 22 Example: 23 - >>> def process_datasets(index: AbstractIndex) -> None: 24 - ... for entry in index.list_datasets(): 25 - ... print(f"{entry.name}: {entry.data_urls}") 26 - ... 27 - >>> # Works with either LocalIndex or AtmosphereIndex 28 - >>> process_datasets(local_index) 29 - >>> process_datasets(atmosphere_index) 23 + :: 24 + 25 + >>> def process_datasets(index: AbstractIndex) -> None: 26 + ... for entry in index.list_datasets(): 27 + ... print(f"{entry.name}: {entry.data_urls}") 28 + ... 29 + >>> # Works with either LocalIndex or AtmosphereIndex 30 + >>> process_datasets(local_index) 31 + >>> process_datasets(atmosphere_index) 30 32 """ 31 33 32 34 from typing import ( ··· 66 68 - Serialization/deserialization (packed, from_bytes) 67 69 68 70 Example: 69 - >>> @packable 70 - ... class MySample: 71 - ... name: str 72 - ... value: int 73 - ... 74 - >>> def process(sample_type: Type[Packable]) -> None: 75 - ... # Type checker knows sample_type has from_bytes, packed, etc. 76 - ... instance = sample_type.from_bytes(data) 77 - ... print(instance.packed) 71 + :: 72 + 73 + >>> @packable 74 + ... class MySample: 75 + ... name: str 76 + ... value: int 77 + ... 78 + >>> def process(sample_type: Type[Packable]) -> None: 79 + ... # Type checker knows sample_type has from_bytes, packed, etc. 80 + ... instance = sample_type.from_bytes(data) 81 + ... print(instance.packed) 78 82 """ 79 83 80 84 @classmethod ··· 166 170 If present, ``load_dataset`` will use it for S3 credential resolution. 167 171 168 172 Example: 169 - >>> def publish_and_list(index: AbstractIndex) -> None: 170 - ... # Publish schemas for different types 171 - ... schema1 = index.publish_schema(ImageSample, version="1.0.0") 172 - ... schema2 = index.publish_schema(TextSample, version="1.0.0") 173 - ... 174 - ... # Insert datasets of different types 175 - ... index.insert_dataset(image_ds, name="images") 176 - ... 
index.insert_dataset(text_ds, name="texts") 177 - ... 178 - ... # List all datasets (mixed types) 179 - ... for entry in index.list_datasets(): 180 - ... print(f"{entry.name} -> {entry.schema_ref}") 173 + :: 174 + 175 + >>> def publish_and_list(index: AbstractIndex) -> None: 176 + ... # Publish schemas for different types 177 + ... schema1 = index.publish_schema(ImageSample, version="1.0.0") 178 + ... schema2 = index.publish_schema(TextSample, version="1.0.0") 179 + ... 180 + ... # Insert datasets of different types 181 + ... index.insert_dataset(image_ds, name="images") 182 + ... index.insert_dataset(text_ds, name="texts") 183 + ... 184 + ... # List all datasets (mixed types) 185 + ... for entry in index.list_datasets(): 186 + ... print(f"{entry.name} -> {entry.schema_ref}") 181 187 """ 182 188 183 189 # Optional data store (not required by protocol, but supported by some implementations) ··· 317 323 ValueError: If schema cannot be decoded (unsupported field types). 318 324 319 325 Example: 320 - >>> entry = index.get_dataset("my-dataset") 321 - >>> SampleType = index.decode_schema(entry.schema_ref) 322 - >>> ds = Dataset[SampleType](entry.data_urls[0]) 323 - >>> for sample in ds.ordered(): 324 - ... print(sample) # sample is instance of SampleType 326 + :: 327 + 328 + >>> entry = index.get_dataset("my-dataset") 329 + >>> SampleType = index.decode_schema(entry.schema_ref) 330 + >>> ds = Dataset[SampleType](entry.data_urls[0]) 331 + >>> for sample in ds.ordered(): 332 + ... print(sample) # sample is instance of SampleType 325 333 """ 326 334 ... 327 335 ··· 342 350 S3 storage, or atmosphere index with PDS blobs. 343 351 344 352 Example: 345 - >>> store = S3DataStore(credentials, bucket="my-bucket") 346 - >>> urls = store.write_shards(dataset, prefix="training/v1") 347 - >>> print(urls) 348 - ['s3://my-bucket/training/v1/shard-000000.tar', ...] 
353 + :: 354 + 355 + >>> store = S3DataStore(credentials, bucket="my-bucket") 356 + >>> urls = store.write_shards(dataset, prefix="training/v1") 357 + >>> print(urls) 358 + ['s3://my-bucket/training/v1/shard-000000.tar', ...] 349 359 """ 350 360 351 361 def write_shards( ··· 415 425 - Any other source that can provide file-like objects 416 426 417 427 Example: 418 - >>> source = S3Source( 419 - ... bucket="my-bucket", 420 - ... keys=["data-000.tar", "data-001.tar"], 421 - ... endpoint="https://r2.example.com", 422 - ... credentials=creds, 423 - ... ) 424 - >>> ds = Dataset[MySample](source) 425 - >>> for sample in ds.ordered(): 426 - ... print(sample) 428 + :: 429 + 430 + >>> source = S3Source( 431 + ... bucket="my-bucket", 432 + ... keys=["data-000.tar", "data-001.tar"], 433 + ... endpoint="https://r2.example.com", 434 + ... credentials=creds, 435 + ... ) 436 + >>> ds = Dataset[MySample](source) 437 + >>> for sample in ds.ordered(): 438 + ... print(sample) 427 439 """ 428 440 429 441 @property ··· 437 449 Tuple of (shard_identifier, file_like_stream). 438 450 439 451 Example: 440 - >>> for shard_id, stream in source.shards: 441 - ... print(f"Processing {shard_id}") 442 - ... data = stream.read() 452 + :: 453 + 454 + >>> for shard_id, stream in source.shards: 455 + ... print(f"Processing {shard_id}") 456 + ... data = stream.read() 443 457 """ 444 458 ... 445 459

+31 -23

src/atdata/_schema_codec.py

··· 10 10 arrays, and schema references. 11 11 12 12 Example: 13 - >>> schema = { 14 - ... "name": "ImageSample", 15 - ... "version": "1.0.0", 16 - ... "fields": [ 17 - ... {"name": "image", "fieldType": {"$type": "...#ndarray", "dtype": "float32"}, "optional": False}, 18 - ... {"name": "label", "fieldType": {"$type": "...#primitive", "primitive": "str"}, "optional": False}, 19 - ... ] 20 - ... } 21 - >>> ImageSample = schema_to_type(schema) 22 - >>> sample = ImageSample(image=np.zeros((64, 64)), label="cat") 13 + :: 14 + 15 + >>> schema = { 16 + ... "name": "ImageSample", 17 + ... "version": "1.0.0", 18 + ... "fields": [ 19 + ... {"name": "image", "fieldType": {"$type": "...#ndarray", "dtype": "float32"}, "optional": False}, 20 + ... {"name": "label", "fieldType": {"$type": "...#primitive", "primitive": "str"}, "optional": False}, 21 + ... ] 22 + ... } 23 + >>> ImageSample = schema_to_type(schema) 24 + >>> sample = ImageSample(image=np.zeros((64, 64)), label="cat") 23 25 """ 24 26 25 27 from dataclasses import field, make_dataclass ··· 150 152 ValueError: If schema is malformed or contains unsupported types. 151 153 152 154 Example: 153 - >>> schema = index.get_schema("local://schemas/MySample@1.0.0") 154 - >>> MySample = schema_to_type(schema) 155 - >>> ds = Dataset[MySample]("data.tar") 156 - >>> for sample in ds.ordered(): 157 - ... print(sample) 155 + :: 156 + 157 + >>> schema = index.get_schema("local://schemas/MySample@1.0.0") 158 + >>> MySample = schema_to_type(schema) 159 + >>> ds = Dataset[MySample]("data.tar") 160 + >>> for sample in ds.ordered(): 161 + ... print(sample) 158 162 """ 159 163 # Check cache first 160 164 if use_cache: ··· 279 283 String content for a .pyi stub file. 
280 284 281 285 Example: 282 - >>> schema = index.get_schema("atdata://local/sampleSchema/MySample@1.0.0") 283 - >>> stub_content = generate_stub(schema.to_dict()) 284 - >>> # Save to a stubs directory configured in your IDE 285 - >>> with open("stubs/my_sample.pyi", "w") as f: 286 - ... f.write(stub_content) 286 + :: 287 + 288 + >>> schema = index.get_schema("atdata://local/sampleSchema/MySample@1.0.0") 289 + >>> stub_content = generate_stub(schema.to_dict()) 290 + >>> # Save to a stubs directory configured in your IDE 291 + >>> with open("stubs/my_sample.pyi", "w") as f: 292 + ... f.write(stub_content) 287 293 """ 288 294 name = schema.get("name", "UnknownSample") 289 295 version = schema.get("version", "1.0.0") ··· 355 361 String content for a .py module file. 356 362 357 363 Example: 358 - >>> schema = index.get_schema("atdata://local/sampleSchema/MySample@1.0.0") 359 - >>> module_content = generate_module(schema.to_dict()) 360 - >>> # The module can be imported after being saved 364 + :: 365 + 366 + >>> schema = index.get_schema("atdata://local/sampleSchema/MySample@1.0.0") 367 + >>> module_content = generate_module(schema.to_dict()) 368 + >>> # The module can be imported after being saved 361 369 """ 362 370 name = schema.get("name", "UnknownSample") 363 371 version = schema.get("version", "1.0.0")
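
The schema-to-class idea in `schema_to_type` can be illustrated with `dataclasses.make_dataclass`, which the module imports. The sketch below handles primitive fields only and is not the module's implementation: `schema_to_type_sketch` and the `_PRIMITIVES` mapping are hypothetical, the `#primitive` suffix check is inferred from the elided `$type` strings in the docstring, and non-primitive fields such as `ndarray` are left as `Any`.

```python
from dataclasses import field, make_dataclass
from typing import Any, Optional

# Assumed mapping; only the "str" primitive appears in the source example.
_PRIMITIVES = {"str": str, "int": int, "float": float, "bool": bool}


def schema_to_type_sketch(schema: dict) -> type:
    """Build a dataclass from a schema dict (primitive fields only)."""
    required, optional = [], []
    for spec in schema["fields"]:
        field_type = spec["fieldType"]
        if field_type["$type"].endswith("#primitive"):
            name = field_type["primitive"]
            if name not in _PRIMITIVES:
                raise ValueError(f"unsupported primitive: {name!r}")
            py_type = _PRIMITIVES[name]
        else:
            py_type = Any  # ndarray and schema references are not modeled here
        if spec.get("optional", False):
            optional.append((spec["name"], Optional[py_type], field(default=None)))
        else:
            required.append((spec["name"], py_type))
    # Dataclasses require fields without defaults to come first.
    return make_dataclass(schema["name"], required + optional)
```

Placing required fields before optional ones changes the positional argument order relative to the schema, so callers of a class built this way are safer using keyword arguments.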

+45 -35

src/atdata/_sources.py

··· 14 14 endpoints, and future backends like ATProto blobs. 15 15 16 16 Example: 17 - >>> # Standard URL (uses WebDataset's gopen) 18 - >>> source = URLSource("https://example.com/data-{000..009}.tar") 19 - >>> ds = Dataset[MySample](source) 20 - >>> 21 - >>> # Private S3 with credentials 22 - >>> source = S3Source( 23 - ... bucket="my-bucket", 24 - ... keys=["train/shard-000.tar", "train/shard-001.tar"], 25 - ... endpoint="https://my-r2.cloudflarestorage.com", 26 - ... access_key="...", 27 - ... secret_key="...", 28 - ... ) 29 - >>> ds = Dataset[MySample](source) 17 + :: 18 + 19 + >>> # Standard URL (uses WebDataset's gopen) 20 + >>> source = URLSource("https://example.com/data-{000..009}.tar") 21 + >>> ds = Dataset[MySample](source) 22 + >>> 23 + >>> # Private S3 with credentials 24 + >>> source = S3Source( 25 + ... bucket="my-bucket", 26 + ... keys=["train/shard-000.tar", "train/shard-001.tar"], 27 + ... endpoint="https://my-r2.cloudflarestorage.com", 28 + ... access_key="...", 29 + ... secret_key="...", 30 + ... ) 31 + >>> ds = Dataset[MySample](source) 30 32 """ 31 33 32 34 from __future__ import annotations ··· 53 55 url: URL or brace pattern for the shards. 54 56 55 57 Example: 56 - >>> source = URLSource("https://example.com/train-{000..009}.tar") 57 - >>> for shard_id, stream in source.shards: 58 - ... print(f"Streaming {shard_id}") 58 + :: 59 + 60 + >>> source = URLSource("https://example.com/train-{000..009}.tar") 61 + >>> for shard_id, stream in source.shards: 62 + ... print(f"Streaming {shard_id}") 59 63 """ 60 64 61 65 url: str ··· 128 132 region: Optional AWS region (defaults to us-east-1). 129 133 130 134 Example: 131 - >>> source = S3Source( 132 - ... bucket="my-datasets", 133 - ... keys=["train/shard-000.tar", "train/shard-001.tar"], 134 - ... endpoint="https://abc123.r2.cloudflarestorage.com", 135 - ... access_key="AKIAIOSFODNN7EXAMPLE", 136 - ... secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY", 137 - ... 
) 138 - >>> for shard_id, stream in source.shards: 139 - ... process(stream) 135 + :: 136 + 137 + >>> source = S3Source( 138 + ... bucket="my-datasets", 139 + ... keys=["train/shard-000.tar", "train/shard-001.tar"], 140 + ... endpoint="https://abc123.r2.cloudflarestorage.com", 141 + ... access_key="AKIAIOSFODNN7EXAMPLE", 142 + ... secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY", 143 + ... ) 144 + >>> for shard_id, stream in source.shards: 145 + ... process(stream) 140 146 """ 141 147 142 148 bucket: str ··· 253 259 ValueError: If URLs are not valid s3:// URLs or span multiple buckets. 254 260 255 261 Example: 256 - >>> source = S3Source.from_urls( 257 - ... ["s3://my-bucket/train-000.tar", "s3://my-bucket/train-001.tar"], 258 - ... endpoint="https://r2.example.com", 259 - ... ) 262 + :: 263 + 264 + >>> source = S3Source.from_urls( 265 + ... ["s3://my-bucket/train-000.tar", "s3://my-bucket/train-001.tar"], 266 + ... endpoint="https://r2.example.com", 267 + ... ) 260 268 """ 261 269 if not urls: 262 270 raise ValueError("urls cannot be empty") ··· 310 318 Configured S3Source. 311 319 312 320 Example: 313 - >>> creds = { 314 - ... "AWS_ACCESS_KEY_ID": "...", 315 - ... "AWS_SECRET_ACCESS_KEY": "...", 316 - ... "AWS_ENDPOINT": "https://r2.example.com", 317 - ... } 318 - >>> source = S3Source.from_credentials(creds, "my-bucket", ["data.tar"]) 321 + :: 322 + 323 + >>> creds = { 324 + ... "AWS_ACCESS_KEY_ID": "...", 325 + ... "AWS_SECRET_ACCESS_KEY": "...", 326 + ... "AWS_ENDPOINT": "https://r2.example.com", 327 + ... } 328 + >>> source = S3Source.from_credentials(creds, "my-bucket", ["data.tar"]) 319 329 """ 320 330 return cls( 321 331 bucket=bucket,
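
`S3Source.from_urls` is documented to raise `ValueError` when the URLs are not valid `s3://` URLs or span multiple buckets. A minimal sketch of that validation, under the assumption that it reduces to splitting each URL into bucket and key, might look like this. The helper name `split_s3_urls` is hypothetical and the error messages (other than the empty-list one shown in the diff) are illustrative.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def split_s3_urls(urls: List[str]) -> Tuple[str, List[str]]:
    """Hypothetical helper: return (bucket, keys) for same-bucket s3:// URLs."""
    if not urls:
        raise ValueError("urls cannot be empty")
    buckets = set()
    keys = []
    for url in urls:
        parsed = urlparse(url)
        if parsed.scheme != "s3" or not parsed.netloc:
            raise ValueError(f"not a valid s3:// URL: {url!r}")
        buckets.add(parsed.netloc)
        keys.append(parsed.path.lstrip("/"))
    if len(buckets) > 1:
        raise ValueError(f"URLs span multiple buckets: {sorted(buckets)}")
    return buckets.pop(), keys
```

The returned pair corresponds to the `bucket` and `keys` arguments used in the `S3Source(...)` examples above.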

+20 -16

src/atdata/_stub_manager.py

··· 9 9 typed classes that work with both static type checkers and runtime. 10 10 11 11 Example: 12 - >>> from atdata.local import Index 13 - >>> 14 - >>> # Enable auto-stub generation 15 - >>> index = Index(auto_stubs=True) 16 - >>> 17 - >>> # Modules are generated automatically on decode_schema 18 - >>> MyType = index.decode_schema("atdata://local/sampleSchema/MySample@1.0.0") 19 - >>> # MyType is now properly typed for IDE autocomplete! 20 - >>> 21 - >>> # Get the stub directory path for IDE configuration 22 - >>> print(f"Add to IDE: {index.stub_dir}") 12 + :: 13 + 14 + >>> from atdata.local import Index 15 + >>> 16 + >>> # Enable auto-stub generation 17 + >>> index = Index(auto_stubs=True) 18 + >>> 19 + >>> # Modules are generated automatically on decode_schema 20 + >>> MyType = index.decode_schema("atdata://local/sampleSchema/MySample@1.0.0") 21 + >>> # MyType is now properly typed for IDE autocomplete! 22 + >>> 23 + >>> # Get the stub directory path for IDE configuration 24 + >>> print(f"Add to IDE: {index.stub_dir}") 23 25 """ 24 26 25 27 from pathlib import Path ··· 100 102 stub_dir: Directory to write module files. Defaults to ``~/.atdata/stubs/``. 101 103 102 104 Example: 103 - >>> manager = StubManager() 104 - >>> schema_dict = {"name": "MySample", "version": "1.0.0", "fields": [...]} 105 - >>> SampleClass = manager.ensure_module(schema_dict) 106 - >>> print(manager.stub_dir) 107 - /Users/you/.atdata/stubs 105 + :: 106 + 107 + >>> manager = StubManager() 108 + >>> schema_dict = {"name": "MySample", "version": "1.0.0", "fields": [...]} 109 + >>> SampleClass = manager.ensure_module(schema_dict) 110 + >>> print(manager.stub_dir) 111 + /Users/you/.atdata/stubs 108 112 """ 109 113 110 114 def __init__(self, stub_dir: Optional[Union[str, Path]] = None):
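
Since the same `Example:` / `::` / blank-line pattern is repeated across every module in this change, it may be worth checking it mechanically. The following is a rough sketch of such a check, not part of the project: `example_uses_literal_block` is a hypothetical helper, and it verifies only the header, the `::` marker, and the blank line, not the indentation widths.

```python
import inspect
from typing import Optional


def example_uses_literal_block(docstring: Optional[str]) -> bool:
    """Return True if each Example header is followed by '::' and a blank line."""
    if not docstring:
        return True
    lines = inspect.cleandoc(docstring).splitlines()
    for i, line in enumerate(lines):
        if line.strip() in ("Example:", "Examples:"):
            following = lines[i + 1 : i + 3]
            if len(following) < 2:
                return False
            if following[0].strip() != "::" or following[1].strip() != "":
                return False
    return True
```

A test suite could run this over `obj.__doc__` for each public object and fail on any docstring that still uses the old format.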

+17 -13

src/atdata/atmosphere/__init__.py

··· 16 16 or discover datasets on the ATProto network. 17 17 18 18 Example: 19 - >>> from atdata.atmosphere import AtmosphereClient, SchemaPublisher 20 - >>> 21 - >>> client = AtmosphereClient() 22 - >>> client.login("handle.bsky.social", "app-password") 23 - >>> 24 - >>> publisher = SchemaPublisher(client) 25 - >>> schema_uri = publisher.publish(MySampleType, version="1.0.0") 19 + :: 20 + 21 + >>> from atdata.atmosphere import AtmosphereClient, SchemaPublisher 22 + >>> 23 + >>> client = AtmosphereClient() 24 + >>> client.login("handle.bsky.social", "app-password") 25 + >>> 26 + >>> publisher = SchemaPublisher(client) 27 + >>> schema_uri = publisher.publish(MySampleType, version="1.0.0") 26 28 27 29 Note: 28 30 This module requires the ``atproto`` package to be installed:: ··· 101 103 a unified interface compatible with LocalIndex. 102 104 103 105 Example: 104 - >>> client = AtmosphereClient() 105 - >>> client.login("handle.bsky.social", "app-password") 106 - >>> 107 - >>> index = AtmosphereIndex(client) 108 - >>> schema_ref = index.publish_schema(MySample, version="1.0.0") 109 - >>> entry = index.insert_dataset(dataset, name="my-data") 106 + :: 107 + 108 + >>> client = AtmosphereClient() 109 + >>> client.login("handle.bsky.social", "app-password") 110 + >>> 111 + >>> index = AtmosphereIndex(client) 112 + >>> schema_ref = index.publish_schema(MySample, version="1.0.0") 113 + >>> entry = index.insert_dataset(dataset, name="my-data") 110 114 """ 111 115 112 116 def __init__(self, client: AtmosphereClient):

+9 -7

src/atdata/atmosphere/_types.py

··· 20 20 AT URIs follow the format: at://<authority>/<collection>/<rkey> 21 21 22 22 Example: 23 - >>> uri = AtUri.parse("at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz") 24 - >>> uri.authority 25 - 'did:plc:abc123' 26 - >>> uri.collection 27 - 'ac.foundation.dataset.sampleSchema' 28 - >>> uri.rkey 29 - 'xyz' 23 + :: 24 + 25 + >>> uri = AtUri.parse("at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz") 26 + >>> uri.authority 27 + 'did:plc:abc123' 28 + >>> uri.collection 29 + 'ac.foundation.dataset.sampleSchema' 30 + >>> uri.rkey 31 + 'xyz' 30 32 """ 31 33 32 34 authority: str
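
The `at://<authority>/<collection>/<rkey>` format described in this docstring can be parsed in a few lines. The class below is an illustrative sketch rather than the project's `AtUri`: the name `AtUriSketch` is hypothetical, and it assumes exactly three non-empty path segments with no slashes inside the record key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtUriSketch:
    """Illustrative stand-in for an AT URI with three components."""

    authority: str
    collection: str
    rkey: str

    @classmethod
    def parse(cls, uri: str) -> "AtUriSketch":
        prefix = "at://"
        if not uri.startswith(prefix):
            raise ValueError(f"AT URI must start with {prefix!r}: {uri!r}")
        parts = uri[len(prefix):].split("/")
        # Assumption: exactly authority/collection/rkey, all non-empty.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected at://<authority>/<collection>/<rkey>: {uri!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"at://{self.authority}/{self.collection}/{self.rkey}"
```

Parsing the URI from the docstring example yields the same three attribute values shown there, and `str()` reproduces the original string.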

+6 -4

src/atdata/atmosphere/client.py

··· 34 34 for working with atdata records (schemas, datasets, lenses). 35 35 36 36 Example: 37 - >>> client = AtmosphereClient() 38 - >>> client.login("alice.bsky.social", "app-password") 39 - >>> print(client.did) 40 - 'did:plc:...' 37 + :: 38 + 39 + >>> client = AtmosphereClient() 40 + >>> client.login("alice.bsky.social", "app-password") 41 + >>> print(client.did) 42 + 'did:plc:...' 41 43 42 44 Note: 43 45 The password should be an app-specific password, not your main account

+28 -24

src/atdata/atmosphere/lens.py

··· 32 32 and point to the transformation code in a git repository. 33 33 34 34 Example: 35 - >>> @atdata.lens 36 - ... def my_lens(source: SourceType) -> TargetType: 37 - ... return TargetType(field=source.other_field) 38 - >>> 39 - >>> client = AtmosphereClient() 40 - >>> client.login("handle", "password") 41 - >>> 42 - >>> publisher = LensPublisher(client) 43 - >>> uri = publisher.publish( 44 - ... name="my_lens", 45 - ... source_schema_uri="at://did:plc:abc/ac.foundation.dataset.sampleSchema/source", 46 - ... target_schema_uri="at://did:plc:abc/ac.foundation.dataset.sampleSchema/target", 47 - ... code_repository="https://github.com/user/repo", 48 - ... code_commit="abc123def456", 49 - ... getter_path="mymodule.lenses:my_lens", 50 - ... putter_path="mymodule.lenses:my_lens_putter", 51 - ... ) 35 + :: 36 + 37 + >>> @atdata.lens 38 + ... def my_lens(source: SourceType) -> TargetType: 39 + ... return TargetType(field=source.other_field) 40 + >>> 41 + >>> client = AtmosphereClient() 42 + >>> client.login("handle", "password") 43 + >>> 44 + >>> publisher = LensPublisher(client) 45 + >>> uri = publisher.publish( 46 + ... name="my_lens", 47 + ... source_schema_uri="at://did:plc:abc/ac.foundation.dataset.sampleSchema/source", 48 + ... target_schema_uri="at://did:plc:abc/ac.foundation.dataset.sampleSchema/target", 49 + ... code_repository="https://github.com/user/repo", 50 + ... code_commit="abc123def456", 51 + ... getter_path="mymodule.lenses:my_lens", 52 + ... putter_path="mymodule.lenses:my_lens_putter", 53 + ... ) 52 54 53 55 Security Note: 54 56 Lens code is stored as references to git repositories rather than ··· 194 196 it manually. 
195 197 196 198 Example: 197 - >>> client = AtmosphereClient() 198 - >>> loader = LensLoader(client) 199 - >>> 200 - >>> record = loader.get("at://did:plc:abc/ac.foundation.dataset.lens/xyz") 201 - >>> print(record["name"]) 202 - >>> print(record["sourceSchema"]) 203 - >>> print(record.get("getterCode", {}).get("repository")) 199 + :: 200 + 201 + >>> client = AtmosphereClient() 202 + >>> loader = LensLoader(client) 203 + >>> 204 + >>> record = loader.get("at://did:plc:abc/ac.foundation.dataset.lens/xyz") 205 + >>> print(record["name"]) 206 + >>> print(record["sourceSchema"]) 207 + >>> print(record.get("getterCode", {}).get("repository")) 204 208 """ 205 209 206 210 def __init__(self, client: AtmosphereClient):

+32 -26

src/atdata/atmosphere/records.py

··· 32 32 external storage (WebDataset URLs) or ATProto blobs. 33 33 34 34 Example: 35 - >>> dataset = atdata.Dataset[MySample]("s3://bucket/data-{000000..000009}.tar") 36 - >>> 37 - >>> client = AtmosphereClient() 38 - >>> client.login("handle", "password") 39 - >>> 40 - >>> publisher = DatasetPublisher(client) 41 - >>> uri = publisher.publish( 42 - ... dataset, 43 - ... name="My Training Data", 44 - ... description="Training data for my model", 45 - ... tags=["computer-vision", "training"], 46 - ... ) 35 + :: 36 + 37 + >>> dataset = atdata.Dataset[MySample]("s3://bucket/data-{000000..000009}.tar") 38 + >>> 39 + >>> client = AtmosphereClient() 40 + >>> client.login("handle", "password") 41 + >>> 42 + >>> publisher = DatasetPublisher(client) 43 + >>> uri = publisher.publish( 44 + ... dataset, 45 + ... name="My Training Data", 46 + ... description="Training data for my model", 47 + ... tags=["computer-vision", "training"], 48 + ... ) 47 49 """ 48 50 49 51 def __init__(self, client: AtmosphereClient): ··· 266 268 Python class for the sample type. 267 269 268 270 Example: 269 - >>> client = AtmosphereClient() 270 - >>> loader = DatasetLoader(client) 271 - >>> 272 - >>> # List available datasets 273 - >>> datasets = loader.list() 274 - >>> for ds in datasets: 275 - ... print(ds["name"], ds["schemaRef"]) 276 - >>> 277 - >>> # Get a specific dataset record 278 - >>> record = loader.get("at://did:plc:abc/ac.foundation.dataset.record/xyz") 271 + :: 272 + 273 + >>> client = AtmosphereClient() 274 + >>> loader = DatasetLoader(client) 275 + >>> 276 + >>> # List available datasets 277 + >>> datasets = loader.list() 278 + >>> for ds in datasets: 279 + ... print(ds["name"], ds["schemaRef"]) 280 + >>> 281 + >>> # Get a specific dataset record 282 + >>> record = loader.get("at://did:plc:abc/ac.foundation.dataset.record/xyz") 279 283 """ 280 284 281 285 def __init__(self, client: AtmosphereClient): ··· 475 479 ValueError: If no storage URLs can be resolved. 
476 480 477 481 Example: 478 - >>> loader = DatasetLoader(client) 479 - >>> dataset = loader.to_dataset(uri, MySampleType) 480 - >>> for batch in dataset.shuffled(batch_size=32): 481 - ... process(batch) 482 + :: 483 + 484 + >>> loader = DatasetLoader(client) 485 + >>> dataset = loader.to_dataset(uri, MySampleType) 486 + >>> for batch in dataset.shuffled(batch_size=32): 487 + ... process(batch) 482 488 """ 483 489 # Import here to avoid circular import 484 490 from ..dataset import Dataset

+23 -19

src/atdata/atmosphere/schema.py

··· 38 38 definitions and publishes them as an ATProto schema record. 39 39 40 40 Example: 41 - >>> @atdata.packable 42 - ... class MySample: 43 - ... image: NDArray 44 - ... label: str 45 - ... 46 - >>> client = AtmosphereClient() 47 - >>> client.login("handle", "password") 48 - >>> 49 - >>> publisher = SchemaPublisher(client) 50 - >>> uri = publisher.publish(MySample, version="1.0.0") 51 - >>> print(uri) 52 - at://did:plc:.../ac.foundation.dataset.sampleSchema/... 41 + :: 42 + 43 + >>> @atdata.packable 44 + ... class MySample: 45 + ... image: NDArray 46 + ... label: str 47 + ... 48 + >>> client = AtmosphereClient() 49 + >>> client.login("handle", "password") 50 + >>> 51 + >>> publisher = SchemaPublisher(client) 52 + >>> uri = publisher.publish(MySample, version="1.0.0") 53 + >>> print(uri) 54 + at://did:plc:.../ac.foundation.dataset.sampleSchema/... 53 55 """ 54 56 55 57 def __init__(self, client: AtmosphereClient): ··· 177 179 schemas from a repository. 178 180 179 181 Example: 180 - >>> client = AtmosphereClient() 181 - >>> client.login("handle", "password") 182 - >>> 183 - >>> loader = SchemaLoader(client) 184 - >>> schema = loader.get("at://did:plc:.../ac.foundation.dataset.sampleSchema/...") 185 - >>> print(schema["name"]) 186 - 'MySample' 182 + :: 183 + 184 + >>> client = AtmosphereClient() 185 + >>> client.login("handle", "password") 186 + >>> 187 + >>> loader = SchemaLoader(client) 188 + >>> schema = loader.get("at://did:plc:.../ac.foundation.dataset.sampleSchema/...") 189 + >>> print(schema["name"]) 190 + 'MySample' 187 191 """ 188 192 189 193 def __init__(self, client: AtmosphereClient):

+71 -55

src/atdata/dataset.py

··· 14 14 archives. 15 15 16 16 Example: 17 - >>> @packable 18 - ... class ImageSample: 19 - ... image: NDArray 20 - ... label: str 21 - ... 22 - >>> ds = Dataset[ImageSample]("data-{000000..000009}.tar") 23 - >>> for batch in ds.shuffled(batch_size=32): 24 - ... images = batch.image # Stacked numpy array (32, H, W, C) 25 - ... labels = batch.label # List of 32 strings 17 + :: 18 + 19 + >>> @packable 20 + ... class ImageSample: 21 + ... image: NDArray 22 + ... label: str 23 + ... 24 + >>> ds = Dataset[ImageSample]("data-{000000..000009}.tar") 25 + >>> for batch in ds.shuffled(batch_size=32): 26 + ... images = batch.image # Stacked numpy array (32, H, W, C) 27 + ... labels = batch.label # List of 32 strings 26 28 """ 27 29 28 30 ## ··· 124 126 registers a lens from ``DictSample``, making this conversion seamless. 125 127 126 128 Example: 127 - >>> ds = load_dataset("path/to/data.tar") # Returns Dataset[DictSample] 128 - >>> for sample in ds.ordered(): 129 - ... print(sample.some_field) # Attribute access 130 - ... print(sample["other_field"]) # Dict access 131 - ... print(sample.keys()) # Inspect available fields 132 - ... 133 - >>> # Convert to typed schema 134 - >>> typed_ds = ds.as_type(MyTypedSample) 129 + :: 130 + 131 + >>> ds = load_dataset("path/to/data.tar") # Returns Dataset[DictSample] 132 + >>> for sample in ds.ordered(): 133 + ... print(sample.some_field) # Attribute access 134 + ... print(sample["other_field"]) # Dict access 135 + ... print(sample.keys()) # Inspect available fields 136 + ... 137 + >>> # Convert to typed schema 138 + >>> typed_ds = ds.as_type(MyTypedSample) 135 139 136 140 Note: 137 141 NDArray fields are stored as raw bytes in DictSample. They are only ··· 285 289 2. Using the ``@packable`` decorator (recommended) 286 290 287 291 Example: 288 - >>> @packable 289 - ... class MyData: 290 - ... name: str 291 - ... embeddings: NDArray 292 - ... 
293 - >>> sample = MyData(name="test", embeddings=np.array([1.0, 2.0])) 294 - >>> packed = sample.packed # Serialize to bytes 295 - >>> restored = MyData.from_bytes(packed) # Deserialize 292 + :: 293 + 294 + >>> @packable 295 + ... class MyData: 296 + ... name: str 297 + ... embeddings: NDArray 298 + ... 299 + >>> sample = MyData(name="test", embeddings=np.array([1.0, 2.0])) 300 + >>> packed = sample.packed # Serialize to bytes 301 + >>> restored = MyData.from_bytes(packed) # Deserialize 296 302 """ 297 303 298 304 def _ensure_good( self ): ··· 417 423 NDArray fields are stacked into a numpy array with a batch dimension. 418 424 Other fields are aggregated into a list. 419 425 420 - Type Parameters: 426 + Parameters: 421 427 DT: The sample type, must derive from ``PackableSample``. 422 428 423 429 Attributes: 424 430 samples: The list of sample instances in this batch. 425 431 426 432 Example: 427 - >>> batch = SampleBatch[MyData]([sample1, sample2, sample3]) 428 - >>> batch.embeddings # Returns stacked numpy array of shape (3, ...) 429 - >>> batch.names # Returns list of names 433 + :: 434 + 435 + >>> batch = SampleBatch[MyData]([sample1, sample2, sample3]) 436 + >>> batch.embeddings # Returns stacked numpy array of shape (3, ...) 437 + >>> batch.names # Returns list of names 430 438 431 439 Note: 432 440 This class uses Python's ``__orig_class__`` mechanism to extract the ··· 434 442 subscripted syntax ``SampleBatch[MyType](samples)`` rather than 435 443 calling the constructor directly with an unsubscripted class. 436 444 """ 445 + # TODO The above has a line for "Parameters:" that should be "Type Parameters:"; this is a temporary fix for `quartodoc` auto-generation bugs. 437 446 438 447 def __init__( self, samples: Sequence[DT] ): 439 448 """Create a batch from a sequence of samples. 
··· 539 548 - Type transformations via the lens system (``as_type()``) 540 549 - Export to parquet format 541 550 542 - Type Parameters: 551 + Parameters: 543 552 ST: The sample type for this dataset, must derive from ``PackableSample``. 544 553 545 554 Attributes: 546 555 url: WebDataset brace-notation URL for the tar file(s). 547 556 548 557 Example: 549 - >>> ds = Dataset[MyData]("path/to/data-{000000..000009}.tar") 550 - >>> for sample in ds.ordered(batch_size=32): 551 - ... # sample is SampleBatch[MyData] with batch_size samples 552 - ... embeddings = sample.embeddings # shape: (32, ...) 553 - ... 554 - >>> # Transform to a different view 555 - >>> ds_view = ds.as_type(MyDataView) 558 + :: 559 + 560 + >>> ds = Dataset[MyData]("path/to/data-{000000..000009}.tar") 561 + >>> for sample in ds.ordered(batch_size=32): 562 + ... # sample is SampleBatch[MyData] with batch_size samples 563 + ... embeddings = sample.embeddings # shape: (32, ...) 564 + ... 565 + >>> # Transform to a different view 566 + >>> ds_view = ds.as_type(MyDataView) 556 567 557 568 Note: 558 569 This class uses Python's ``__orig_class__`` mechanism to extract the ··· 560 571 subscripted syntax ``Dataset[MyType](url)`` rather than calling the 561 572 constructor directly with an unsubscripted class. 562 573 """ 574 + # TODO The above has a line for "Parameters:" that should be "Type Parameters:"; this is a temporary fix for `quartodoc` auto-generation bugs. 563 575 564 576 @property 565 577 def sample_type( self ) -> Type: ··· 809 821 ``output-000001.parquet``, etc. 
810 822 811 823 Example: 812 - >>> ds = Dataset[MySample]("data.tar") 813 - >>> # Small dataset - load all at once 814 - >>> ds.to_parquet("output.parquet") 815 - >>> 816 - >>> # Large dataset - process in chunks 817 - >>> ds.to_parquet("output.parquet", maxcount=50000) 824 + :: 825 + 826 + >>> ds = Dataset[MySample]("data.tar") 827 + >>> # Small dataset - load all at once 828 + >>> ds.to_parquet("output.parquet") 829 + >>> 830 + >>> # Large dataset - process in chunks 831 + >>> ds.to_parquet("output.parquet", maxcount=50000) 818 832 """ 819 833 ## 820 834 ··· 938 952 name and annotations as the original class. The class satisfies the 939 953 ``Packable`` protocol and can be used with ``Type[Packable]`` signatures. 940 954 941 - Example: 942 - >>> @packable 943 - ... class MyData: 944 - ... name: str 945 - ... values: NDArray 946 - ... 947 - >>> sample = MyData(name="test", values=np.array([1, 2, 3])) 948 - >>> bytes_data = sample.packed 949 - >>> restored = MyData.from_bytes(bytes_data) 950 - >>> 951 - >>> # Works with Packable-typed APIs 952 - >>> index.publish_schema(MyData, version="1.0.0") # Type-safe 955 + Examples: 956 + This is a test of the functionality:: 957 + 958 + @packable 959 + class MyData: 960 + name: str 961 + values: NDArray 962 + 963 + sample = MyData(name="test", values=np.array([1, 2, 3])) 964 + bytes_data = sample.packed 965 + restored = MyData.from_bytes(bytes_data) 966 + 967 + # Works with Packable-typed APIs 968 + index.publish_schema(MyData, version="1.0.0") # Type-safe 953 969 """ 954 970 955 971 ##

+48 -39

src/atdata/lens.py

··· 15 15 transformations that satisfy lens laws (GetPut and PutGet). 16 16 17 17 Example: 18 - >>> @packable 19 - ... class FullData: 20 - ... name: str 21 - ... age: int 22 - ... embedding: NDArray 23 - ... 24 - >>> @packable 25 - ... class NameOnly: 26 - ... name: str 27 - ... 28 - >>> @lens 29 - ... def name_view(full: FullData) -> NameOnly: 30 - ... return NameOnly(name=full.name) 31 - ... 32 - >>> @name_view.putter 33 - ... def name_view_put(view: NameOnly, source: FullData) -> FullData: 34 - ... return FullData(name=view.name, age=source.age, 35 - ... embedding=source.embedding) 36 - ... 37 - >>> ds = Dataset[FullData]("data.tar") 38 - >>> ds_names = ds.as_type(NameOnly) # Uses registered lens 18 + :: 19 + 20 + >>> @packable 21 + ... class FullData: 22 + ... name: str 23 + ... age: int 24 + ... embedding: NDArray 25 + ... 26 + >>> @packable 27 + ... class NameOnly: 28 + ... name: str 29 + ... 30 + >>> @lens 31 + ... def name_view(full: FullData) -> NameOnly: 32 + ... return NameOnly(name=full.name) 33 + ... 34 + >>> @name_view.putter 35 + ... def name_view_put(view: NameOnly, source: FullData) -> FullData: 36 + ... return FullData(name=view.name, age=source.age, 37 + ... embedding=source.embedding) 38 + ... 39 + >>> ds = Dataset[FullData]("data.tar") 40 + >>> ds_names = ds.as_type(NameOnly) # Uses registered lens 39 41 """ 40 42 41 43 ## ··· 86 88 and an optional putter that transforms ``(V, S) -> S``, enabling updates to 87 89 the view to be reflected back in the source. 88 90 89 - Type Parameters: 91 + Parameters: 90 92 S: The source type, must derive from ``PackableSample``. 91 93 V: The view type, must derive from ``PackableSample``. 92 94 93 95 Example: 94 - >>> @lens 95 - ... def name_lens(full: FullData) -> NameOnly: 96 - ... return NameOnly(name=full.name) 97 - ... 98 - >>> @name_lens.putter 99 - ... def name_lens_put(view: NameOnly, source: FullData) -> FullData: 100 - ... 
return FullData(name=view.name, age=source.age) 96 + :: 97 + 98 + >>> @lens 99 + ... def name_lens(full: FullData) -> NameOnly: 100 + ... return NameOnly(name=full.name) 101 + ... 102 + >>> @name_lens.putter 103 + ... def name_lens_put(view: NameOnly, source: FullData) -> FullData: 104 + ... return FullData(name=view.name, age=source.age) 101 105 """ 106 + # TODO The above has a line for "Parameters:" that should be "Type Parameters:"; this is a temporary fix for `quartodoc` auto-generation bugs. 102 107 103 108 def __init__( self, get: LensGetter[S, V], 104 109 put: Optional[LensPutter[S, V]] = None ··· 159 164 The putter function, allowing this to be used as a decorator. 160 165 161 166 Example: 162 - >>> @my_lens.putter 163 - ... def my_lens_put(view: ViewType, source: SourceType) -> SourceType: 164 - ... return SourceType(...) 167 + :: 168 + 169 + >>> @my_lens.putter 170 + ... def my_lens_put(view: ViewType, source: SourceType) -> SourceType: 171 + ... return SourceType(...) 165 172 """ 166 173 ## 167 174 self._putter = put ··· 212 219 or decorated with ``@lens_name.putter`` to add a putter function. 213 220 214 221 Example: 215 - >>> @lens 216 - ... def extract_name(full: FullData) -> NameOnly: 217 - ... return NameOnly(name=full.name) 218 - ... 219 - >>> @extract_name.putter 220 - ... def extract_name_put(view: NameOnly, source: FullData) -> FullData: 221 - ... return FullData(name=view.name, age=source.age) 222 + :: 223 + 224 + >>> @lens 225 + ... def extract_name(full: FullData) -> NameOnly: 226 + ... return NameOnly(name=full.name) 227 + ... 228 + >>> @extract_name.putter 229 + ... def extract_name_put(view: NameOnly, source: FullData) -> FullData: 230 + ... return FullData(name=view.name, age=source.age) 222 231 """ 223 232 ret = Lens[S, V]( f ) 224 233 _network.register( ret )

+36 -26

src/atdata/local.py

··· 85 85 schema's class becomes available as an attribute on this namespace. 86 86 87 87 Example: 88 - >>> index.load_schema("atdata://local/sampleSchema/MySample@1.0.0") 89 - >>> MyType = index.types.MySample 90 - >>> sample = MyType(field1="hello", field2=42) 88 + :: 89 + 90 + >>> index.load_schema("atdata://local/sampleSchema/MySample@1.0.0") 91 + >>> MyType = index.types.MySample 92 + >>> sample = MyType(field1="hello", field2=42) 91 93 92 94 The namespace supports: 93 95 - Attribute access: ``index.types.MySample`` ··· 1026 1028 as attributes on this namespace. 1027 1029 1028 1030 Example: 1029 - >>> index.load_schema("atdata://local/sampleSchema/MySample@1.0.0") 1030 - >>> MyType = index.types.MySample 1031 - >>> sample = MyType(name="hello", value=42) 1031 + :: 1032 + 1033 + >>> index.load_schema("atdata://local/sampleSchema/MySample@1.0.0") 1034 + >>> MyType = index.types.MySample 1035 + >>> sample = MyType(name="hello", value=42) 1032 1036 1033 1037 Returns: 1034 1038 SchemaNamespace containing all loaded schema types. ··· 1055 1059 ValueError: If schema cannot be decoded. 
1056 1060 1057 1061 Example: 1058 - >>> # Load and use immediately 1059 - >>> MyType = index.load_schema("atdata://local/sampleSchema/MySample@1.0.0") 1060 - >>> sample = MyType(name="hello", value=42) 1061 - >>> 1062 - >>> # Or access later via namespace 1063 - >>> index.load_schema("atdata://local/sampleSchema/OtherType@1.0.0") 1064 - >>> other = index.types.OtherType(data="test") 1062 + :: 1063 + 1064 + >>> # Load and use immediately 1065 + >>> MyType = index.load_schema("atdata://local/sampleSchema/MySample@1.0.0") 1066 + >>> sample = MyType(name="hello", value=42) 1067 + >>> 1068 + >>> # Or access later via namespace 1069 + >>> index.load_schema("atdata://local/sampleSchema/OtherType@1.0.0") 1070 + >>> other = index.types.OtherType(data="test") 1065 1071 """ 1066 1072 # Decode the schema (uses generated module if auto_stubs enabled) 1067 1073 cls = self.decode_schema(ref) ··· 1085 1091 is disabled. 1086 1092 1087 1093 Example: 1088 - >>> index = LocalIndex(auto_stubs=True) 1089 - >>> ref = index.publish_schema(MySample, version="1.0.0") 1090 - >>> index.load_schema(ref) 1091 - >>> print(index.get_import_path(ref)) 1092 - local.MySample_1_0_0 1093 - >>> # Then in your code: 1094 - >>> # from local.MySample_1_0_0 import MySample 1094 + :: 1095 + 1096 + >>> index = LocalIndex(auto_stubs=True) 1097 + >>> ref = index.publish_schema(MySample, version="1.0.0") 1098 + >>> index.load_schema(ref) 1099 + >>> print(index.get_import_path(ref)) 1100 + local.MySample_1_0_0 1101 + >>> # Then in your code: 1102 + >>> # from local.MySample_1_0_0 import MySample 1095 1103 """ 1096 1104 if self._stub_manager is None: 1097 1105 return None ··· 1526 1534 The decoded type, cast to match the type_hint for IDE support. 
1527 1535 1528 1536 Example: 1529 - >>> # After enabling auto_stubs and configuring IDE extraPaths: 1530 - >>> from local.MySample_1_0_0 import MySample 1531 - >>> 1532 - >>> # This gives full IDE autocomplete: 1533 - >>> DecodedType = index.decode_schema_as(ref, MySample) 1534 - >>> sample = DecodedType(text="hello", value=42) # IDE knows signature! 1537 + :: 1538 + 1539 + >>> # After enabling auto_stubs and configuring IDE extraPaths: 1540 + >>> from local.MySample_1_0_0 import MySample 1541 + >>> 1542 + >>> # This gives full IDE autocomplete: 1543 + >>> DecodedType = index.decode_schema_as(ref, MySample) 1544 + >>> sample = DecodedType(text="hello", value=42) # IDE knows signature! 1535 1545 1536 1546 Note: 1537 1547 The type_hint is only used for static type checking - at runtime,

+20 -16

src/atdata/promote.py

··· 5 5 federation while maintaining schema consistency. 6 6 7 7 Example: 8 - >>> from atdata.local import LocalIndex, Repo 9 - >>> from atdata.atmosphere import AtmosphereClient, AtmosphereIndex 10 - >>> from atdata.promote import promote_to_atmosphere 11 - >>> 12 - >>> # Setup 13 - >>> local_index = LocalIndex() 14 - >>> client = AtmosphereClient() 15 - >>> client.login("handle.bsky.social", "app-password") 16 - >>> 17 - >>> # Promote a dataset 18 - >>> entry = local_index.get_dataset("my-dataset") 19 - >>> at_uri = promote_to_atmosphere(entry, local_index, client) 8 + :: 9 + 10 + >>> from atdata.local import LocalIndex, Repo 11 + >>> from atdata.atmosphere import AtmosphereClient, AtmosphereIndex 12 + >>> from atdata.promote import promote_to_atmosphere 13 + >>> 14 + >>> # Setup 15 + >>> local_index = LocalIndex() 16 + >>> client = AtmosphereClient() 17 + >>> client.login("handle.bsky.social", "app-password") 18 + >>> 19 + >>> # Promote a dataset 20 + >>> entry = local_index.get_dataset("my-dataset") 21 + >>> at_uri = promote_to_atmosphere(entry, local_index, client) 20 22 """ 21 23 22 24 from typing import TYPE_CHECKING, Type ··· 127 129 ValueError: If local entry has no data URLs. 128 130 129 131 Example: 130 - >>> entry = local_index.get_dataset("mnist-train") 131 - >>> uri = promote_to_atmosphere(entry, local_index, client) 132 - >>> print(uri) 133 - at://did:plc:abc123/ac.foundation.dataset.datasetIndex/... 132 + :: 133 + 134 + >>> entry = local_index.get_dataset("mnist-train") 135 + >>> uri = promote_to_atmosphere(entry, local_index, client) 136 + >>> print(uri) 137 + at://did:plc:abc123/ac.foundation.dataset.datasetIndex/... 134 138 """ 135 139 from .atmosphere import DatasetPublisher 136 140 from ._schema_codec import schema_to_type
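The hunks above all apply the same mechanical change: an `Example:` header is now followed by a `::` literal-block marker. A convention like this can regress silently, so a small check may be useful. The sketch below is not part of this commit. The names `find_unmarked_examples` and `check_source` are hypothetical, and the rule it enforces (the first non-blank line after an `Example:` or `Examples:` header must end with `::`) is an assumption inferred from the diffs, chosen so that the `This is a test of the functionality::` form used in `dataset.py` also passes.

```python
import ast
import re

# Matches an "Example:" or "Examples:" section header on its own line.
_HEADER = re.compile(r"^\s*Examples?:\s*$")


def find_unmarked_examples(docstring: str) -> list[int]:
    """Return 1-based docstring line numbers of Example headers lacking a ``::`` line."""
    lines = docstring.splitlines()
    bad = []
    for i, line in enumerate(lines):
        if not _HEADER.match(line):
            continue
        # First non-blank line after the header, or "" if the docstring ends.
        following = next((l for l in lines[i + 1:] if l.strip()), "")
        if not following.rstrip().endswith("::"):
            bad.append(i + 1)
    return bad


def check_source(source: str) -> list[tuple[str, int]]:
    """Return (object name, docstring line) pairs for each unmarked Example section."""
    tree = ast.parse(source)
    documented = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    problems = []
    for node in ast.walk(tree):
        if isinstance(node, documented):
            # clean=False keeps the raw indentation so line numbers match the source.
            doc = ast.get_docstring(node, clean=False)
            if doc:
                name = getattr(node, "name", "<module>")
                problems.extend((name, n) for n in find_unmarked_examples(doc))
    return problems
```

Running `check_source(path.read_text())` over each file under `src/atdata/` would list any docstring still using the old format. The reported line numbers are relative to the start of the docstring, not the file.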
