A loose federation of distributed, typed datasets

docs: add deployment and troubleshooting guides with loader API documentation

- Add new reference pages: deployment.html and troubleshooting.html
- Document lower-level loaders (SchemaLoader, DatasetLoader, LensLoader) in atmosphere reference
- Update navigation across all doc pages to include new references
- Fix failing tests in test_integration_error_handling.py (#337)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
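
Note on the loader docs added here: every `loader.get(...)` call in the new reference section takes an AT URI of the form `at://<authority>/<collection>/<rkey>`, and the existing `AtUri` example exposes exactly those three parts. The sketch below illustrates that decomposition using only the standard library. `ParsedAtUri` and `parse_at_uri` are hypothetical names for illustration, not atdata's `AtUri` implementation, and the validation is deliberately minimal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsedAtUri:
    """Hypothetical stand-in for illustration; not atdata's AtUri class."""

    authority: str   # e.g. a DID such as 'did:plc:abc123'
    collection: str  # e.g. 'ac.foundation.dataset.sampleSchema'
    rkey: str        # record key within the collection

    def __str__(self) -> str:
        # Format back to the at:// string form.
        return f"at://{self.authority}/{self.collection}/{self.rkey}"


def parse_at_uri(uri: str) -> ParsedAtUri:
    """Split an at:// record URI into authority, collection, and rkey."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    # Assumption: a record URI has exactly three non-empty path parts.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected at://<authority>/<collection>/<rkey>, got {uri!r}"
        )
    return ParsedAtUri(*parts)


parsed = parse_at_uri("at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz")
print(parsed.authority)
print(parsed.collection)
print(parsed.rkey)
```

A helper like this can be handy for checking which collection a URI points at (schema, dataset record, or lens) before choosing a loader; the real `AtUri.parse` may accept or reject inputs differently.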

+3486 -334
.chainlink/issues.db

This is a binary file and will not be displayed.

+1
CHANGELOG.md
···
  - **Comprehensive integration test suite**: 593 tests covering E2E flows, error handling, edge cases

  ### Changed
+ - Fix failing tests in test_integration_error_handling.py (#337)
  - v0.2.2 beta release improvements (#326)
  - Document to_parquet() memory usage (#336)
  - Evaluate splitting local.py into modules (#335)
+20
docs/api/index.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 498 506 <div class="sidebar-item-container"> 499 507 <a href="../reference/uri-spec.html" class="sidebar-item-text sidebar-link"> 500 508 <span class="menu-text">URI Specification</span></a> 509 + </div> 510 + </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 501 521 </div> 502 522 </li> 503 523 </ul>
+26 -6
docs/index.html
··· 358 358 <a class="dropdown-item" href="./reference/uri-spec.html"> 359 359 <span class="dropdown-text">URI Specification</span></a> 360 360 </li> 361 + <li> 362 + <a class="dropdown-item" href="./reference/troubleshooting.html"> 363 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 364 + </li> 365 + <li> 366 + <a class="dropdown-item" href="./reference/deployment.html"> 367 + <span class="dropdown-text">Deployment Guide</span></a> 368 + </li> 361 369 </ul> 362 370 </li> 363 371 <li class="nav-item"> ··· 499 507 <span class="menu-text">URI Specification</span></a> 500 508 </div> 501 509 </li> 510 + <li class="sidebar-item"> 511 + <div class="sidebar-item-container"> 512 + <a href="./reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 513 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 514 + </div> 515 + </li> 516 + <li class="sidebar-item"> 517 + <div class="sidebar-item-container"> 518 + <a href="./reference/deployment.html" class="sidebar-item-text sidebar-link"> 519 + <span class="menu-text">Deployment Guide</span></a> 520 + </div> 521 + </li> 502 522 </ul> 503 523 </li> 504 524 <li class="sidebar-item sidebar-item-section"> ··· 616 636 <h2 class="anchored" data-anchor-id="quick-example">Quick Example</h2> 617 637 <section id="define-a-sample-type" class="level3"> 618 638 <h3 class="anchored" data-anchor-id="define-a-sample-type">Define a Sample Type</h3> 619 - <div id="916a9cce" class="cell"> 639 + <div id="51496c92" class="cell"> 620 640 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 621 641 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 622 642 <span id="cb2-3"><a href="#cb2-3" 
aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 630 650 </section> 631 651 <section id="create-and-write-samples" class="level3"> 632 652 <h3 class="anchored" data-anchor-id="create-and-write-samples">Create and Write Samples</h3> 633 - <div id="3d97fc6b" class="cell"> 653 + <div id="7283938c" class="cell"> 634 654 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 635 655 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 636 656 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> ··· 649 669 </section> 650 670 <section id="load-and-iterate" class="level3"> 651 671 <h3 class="anchored" data-anchor-id="load-and-iterate">Load and Iterate</h3> 652 - <div id="0563cacd" class="cell"> 672 + <div id="95680d15" class="cell"> 653 673 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data-000000.tar"</span>)</span> 654 674 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 655 675 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Iterate with batching</span></span> ··· 662 682 </section> 663 683 <section id="huggingface-style-loading" class="level2"> 664 684 <h2 class="anchored" data-anchor-id="huggingface-style-loading">HuggingFace-Style Loading</h2> 665 - <div id="41d862a6" class="cell"> 685 + <div id="04eeccba" class="cell"> 666 686 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode 
python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Load from local path</span></span> 667 687 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> atdata.load_dataset(<span class="st">"path/to/data-{000000..000009}.tar"</span>, split<span class="op">=</span><span class="st">"train"</span>)</span> 668 688 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span> ··· 674 694 </section> 675 695 <section id="local-storage-with-redis-s3" class="level2"> 676 696 <h2 class="anchored" data-anchor-id="local-storage-with-redis-s3">Local Storage with Redis + S3</h2> 677 - <div id="da8cd0f5" class="cell"> 697 + <div id="1b0c3cfe" class="cell"> 678 698 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex, S3DataStore</span> 679 699 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 680 700 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span> ··· 698 718 </section> 699 719 <section id="publish-to-atproto-federation" class="level2"> 700 720 <h2 class="anchored" data-anchor-id="publish-to-atproto-federation">Publish to ATProto Federation</h2> 701 - <div id="72b91bec" class="cell"> 721 + <div id="ba25ed5b" class="cell"> 702 722 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient</span> 703 723 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> 
atdata.promote <span class="im">import</span> promote_to_atmosphere</span> 704 724 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span>
+185 -88
docs/reference/atmosphere.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 500 508 <span class="menu-text">URI Specification</span></a> 501 509 </div> 502 510 </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 503 523 </ul> 504 524 </li> 505 525 <li class="sidebar-item sidebar-item-section"> ··· 548 568 <li><a href="#datasetpublisher" id="toc-datasetpublisher" class="nav-link" data-scroll-target="#datasetpublisher">DatasetPublisher</a></li> 549 569 <li><a href="#lenspublisher" id="toc-lenspublisher" class="nav-link" data-scroll-target="#lenspublisher">LensPublisher</a></li> 550 570 </ul></li> 571 + <li><a href="#lower-level-loaders" id="toc-lower-level-loaders" class="nav-link" data-scroll-target="#lower-level-loaders">Lower-Level Loaders</a> 572 + <ul class="collapse"> 573 + <li><a href="#schemaloader" id="toc-schemaloader" class="nav-link" data-scroll-target="#schemaloader">SchemaLoader</a></li> 574 + <li><a href="#datasetloader" id="toc-datasetloader" class="nav-link" data-scroll-target="#datasetloader">DatasetLoader</a></li> 575 + <li><a href="#lensloader" 
id="toc-lensloader" class="nav-link" data-scroll-target="#lensloader">LensLoader</a></li> 576 + </ul></li> 551 577 <li><a href="#at-uris" id="toc-at-uris" class="nav-link" data-scroll-target="#at-uris">AT URIs</a></li> 552 578 <li><a href="#supported-field-types" id="toc-supported-field-types" class="nav-link" data-scroll-target="#supported-field-types">Supported Field Types</a></li> 553 579 <li><a href="#complete-example" id="toc-complete-example" class="nav-link" data-scroll-target="#complete-example">Complete Example</a></li> ··· 602 628 <section id="atmosphereclient" class="level2"> 603 629 <h2 class="anchored" data-anchor-id="atmosphereclient">AtmosphereClient</h2> 604 630 <p>The client handles authentication and record operations:</p> 605 - <div id="5fa059eb" class="cell"> 631 + <div id="54a7d635" class="cell"> 606 632 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient</span> 607 633 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span> 608 634 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> ··· 629 655 <section id="session-management" class="level3"> 630 656 <h3 class="anchored" data-anchor-id="session-management">Session Management</h3> 631 657 <p>Save and restore sessions to avoid re-authentication:</p> 632 - <div id="8959ae2e" class="cell"> 658 + <div id="cb24eb75" class="cell"> 633 659 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Export session for later</span></span> 634 660 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" 
tabindex="-1"></a>session_string <span class="op">=</span> client.export_session()</span> 635 661 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span> ··· 641 667 <section id="custom-pds" class="level3"> 642 668 <h3 class="anchored" data-anchor-id="custom-pds">Custom PDS</h3> 643 669 <p>Connect to a custom PDS instead of bsky.social:</p> 644 - <div id="ad2ea8c0" class="cell"> 670 + <div id="2ea717d1" class="cell"> 645 671 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient(base_url<span class="op">=</span><span class="st">"https://pds.example.com"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 646 672 </div> 647 673 </section> ··· 649 675 <section id="atmosphereindex" class="level2"> 650 676 <h2 class="anchored" data-anchor-id="atmosphereindex">AtmosphereIndex</h2> 651 677 <p>The unified interface for ATProto operations, implementing the AbstractIndex protocol:</p> 652 - <div id="64bb9377" class="cell"> 678 + <div id="9c653596" class="cell"> 653 679 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient, AtmosphereIndex</span> 654 680 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 655 681 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> ··· 659 685 </div> 660 686 <section id="publishing-schemas" class="level3"> 661 687 <h3 class="anchored" data-anchor-id="publishing-schemas">Publishing Schemas</h3> 662 - <div id="ce0c598d" class="cell"> 688 + 
<div id="e49cb8cd" class="cell"> 663 689 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 664 690 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 665 691 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span> ··· 680 706 </section> 681 707 <section id="publishing-datasets" class="level3"> 682 708 <h3 class="anchored" data-anchor-id="publishing-datasets">Publishing Datasets</h3> 683 - <div id="20718b28" class="cell"> 709 + <div id="d2dca930" class="cell"> 684 710 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 685 711 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span> 686 712 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> ··· 698 724 </section> 699 725 <section id="listing-and-retrieving" class="level3"> 700 726 <h3 class="anchored" data-anchor-id="listing-and-retrieving">Listing and Retrieving</h3> 701 - <div id="993e8525" class="cell"> 727 + <div id="c49b3743" class="cell"> 702 728 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># List your datasets</span></span> 703 729 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> entry <span class="kw">in</span> 
index.list_datasets():</span> 704 730 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"</span><span class="sc">{</span>entry<span class="sc">.</span>name<span class="sc">}</span><span class="ss">: </span><span class="sc">{</span>entry<span class="sc">.</span>schema_ref<span class="sc">}</span><span class="ss">"</span>)</span> ··· 724 750 <p>For more control, use the individual publisher classes:</p> 725 751 <section id="schemapublisher" class="level3"> 726 752 <h3 class="anchored" data-anchor-id="schemapublisher">SchemaPublisher</h3> 727 - <div id="0a1832cc" class="cell"> 753 + <div id="e627e8b5" class="cell"> 728 754 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> SchemaPublisher</span> 729 755 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 730 756 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>publisher <span class="op">=</span> SchemaPublisher(client)</span> ··· 740 766 </section> 741 767 <section id="datasetpublisher" class="level3"> 742 768 <h3 class="anchored" data-anchor-id="datasetpublisher">DatasetPublisher</h3> 743 - <div id="330466cd" class="cell"> 769 + <div id="c3ac2153" class="cell"> 744 770 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> DatasetPublisher</span> 745 771 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span> 746 772 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>publisher <span class="op">=</span> 
DatasetPublisher(client)</span> ··· 758 784 <section id="blob-storage" class="level4"> 759 785 <h4 class="anchored" data-anchor-id="blob-storage">Blob Storage</h4> 760 786 <p>For smaller datasets (up to ~50MB per shard), you can store data directly in ATProto blobs instead of external URLs:</p> 761 - <div id="4a14d06b" class="cell"> 787 + <div id="a6ef8b57" class="cell"> 762 788 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> io</span> 763 789 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 764 790 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a></span> ··· 778 804 <span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 779 805 </div> 780 806 <p>To load datasets with blob storage:</p> 781 - <div id="20df2904" class="cell"> 807 + <div id="6b91066e" class="cell"> 782 808 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> DatasetLoader</span> 783 809 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a></span> 784 810 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> DatasetLoader(client)</span> ··· 799 825 </section> 800 826 <section id="lenspublisher" class="level3"> 801 827 <h3 class="anchored" data-anchor-id="lenspublisher">LensPublisher</h3> 802 - <div id="4903cae5" class="cell"> 828 + <div id="a506b50a" class="cell"> 803 829 <div 
class="sourceCode cell-code" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> LensPublisher</span> 804 830 <span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a></span> 805 831 <span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>publisher <span class="op">=</span> LensPublisher(client)</span> ··· 837 863 </div> 838 864 </section> 839 865 </section> 866 + <section id="lower-level-loaders" class="level2"> 867 + <h2 class="anchored" data-anchor-id="lower-level-loaders">Lower-Level Loaders</h2> 868 + <p>For direct access to records, use the loader classes:</p> 869 + <section id="schemaloader" class="level3"> 870 + <h3 class="anchored" data-anchor-id="schemaloader">SchemaLoader</h3> 871 + <div id="222d40c3" class="cell"> 872 + <div class="sourceCode cell-code" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> SchemaLoader</span> 873 + <span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a></span> 874 + <span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> SchemaLoader(client)</span> 875 + <span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a></span> 876 + <span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Get a specific schema</span></span> 877 + <span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a>schema <span class="op">=</span> loader.get(<span class="st">"at://did:plc:abc/ac.foundation.dataset.sampleSchema/xyz"</span>)</span> 878 + <span id="cb14-7"><a href="#cb14-7" aria-hidden="true" 
tabindex="-1"></a><span class="bu">print</span>(schema[<span class="st">"name"</span>], schema[<span class="st">"version"</span>])</span> 879 + <span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a></span> 880 + <span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a><span class="co"># List all schemas from a repository</span></span> 881 + <span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> schema <span class="kw">in</span> loader.list_all(repo<span class="op">=</span><span class="st">"did:plc:other-user"</span>):</span> 882 + <span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(schema[<span class="st">"name"</span>])</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 883 + </div> 884 + </section> 885 + <section id="datasetloader" class="level3"> 886 + <h3 class="anchored" data-anchor-id="datasetloader">DatasetLoader</h3> 887 + <div id="f22ac274" class="cell"> 888 + <div class="sourceCode cell-code" id="cb15"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> DatasetLoader</span> 889 + <span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a></span> 890 + <span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> DatasetLoader(client)</span> 891 + <span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a></span> 892 + <span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Get a specific dataset record</span></span> 893 + <span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a>record <span class="op">=</span> loader.get(<span 
class="st">"at://did:plc:abc/ac.foundation.dataset.record/xyz"</span>)</span> 894 + <span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a></span> 895 + <span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Check storage type</span></span> 896 + <span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a>storage_type <span class="op">=</span> loader.get_storage_type(uri) <span class="co"># "external" or "blobs"</span></span> 897 + <span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a></span> 898 + <span id="cb15-11"><a href="#cb15-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Get URLs based on storage type</span></span> 899 + <span id="cb15-12"><a href="#cb15-12" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> storage_type <span class="op">==</span> <span class="st">"external"</span>:</span> 900 + <span id="cb15-13"><a href="#cb15-13" aria-hidden="true" tabindex="-1"></a> urls <span class="op">=</span> loader.get_urls(uri)</span> 901 + <span id="cb15-14"><a href="#cb15-14" aria-hidden="true" tabindex="-1"></a><span class="cf">else</span>:</span> 902 + <span id="cb15-15"><a href="#cb15-15" aria-hidden="true" tabindex="-1"></a> urls <span class="op">=</span> loader.get_blob_urls(uri)</span> 903 + <span id="cb15-16"><a href="#cb15-16" aria-hidden="true" tabindex="-1"></a></span> 904 + <span id="cb15-17"><a href="#cb15-17" aria-hidden="true" tabindex="-1"></a><span class="co"># Get metadata</span></span> 905 + <span id="cb15-18"><a href="#cb15-18" aria-hidden="true" tabindex="-1"></a>metadata <span class="op">=</span> loader.get_metadata(uri)</span> 906 + <span id="cb15-19"><a href="#cb15-19" aria-hidden="true" tabindex="-1"></a></span> 907 + <span id="cb15-20"><a href="#cb15-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a Dataset object directly</span></span> 908 + <span id="cb15-21"><a href="#cb15-21" aria-hidden="true" 
tabindex="-1"></a>dataset <span class="op">=</span> loader.to_dataset(uri, MySampleType)</span> 909 + <span id="cb15-22"><a href="#cb15-22" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 910 + <span id="cb15-23"><a href="#cb15-23" aria-hidden="true" tabindex="-1"></a> process(batch)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 911 + </div> 912 + </section> 913 + <section id="lensloader" class="level3"> 914 + <h3 class="anchored" data-anchor-id="lensloader">LensLoader</h3> 915 + <div id="1f7d10f0" class="cell"> 916 + <div class="sourceCode cell-code" id="cb16"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> LensLoader</span> 917 + <span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a></span> 918 + <span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> LensLoader(client)</span> 919 + <span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a></span> 920 + <span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Get a specific lens record</span></span> 921 + <span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a>lens <span class="op">=</span> loader.get(<span class="st">"at://did:plc:abc/ac.foundation.dataset.lens/xyz"</span>)</span> 922 + <span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(lens[<span class="st">"name"</span>])</span> 923 + <span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(lens[<span class="st">"sourceSchema"</span>], <span 
class="st">"-&gt;"</span>, lens[<span class="st">"targetSchema"</span>])</span> 924 + <span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a></span> 925 + <span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a><span class="co"># List all lenses from a repository</span></span> 926 + <span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> lens <span class="kw">in</span> loader.list_all():</span> 927 + <span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(lens[<span class="st">"name"</span>])</span> 928 + <span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a></span> 929 + <span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Find lenses by schema</span></span> 930 + <span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a>lenses <span class="op">=</span> loader.find_by_schemas(</span> 931 + <span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a> source_schema_uri<span class="op">=</span><span class="st">"at://did:plc:abc/ac.foundation.dataset.sampleSchema/source"</span>,</span> 932 + <span id="cb16-17"><a href="#cb16-17" aria-hidden="true" tabindex="-1"></a> target_schema_uri<span class="op">=</span><span class="st">"at://did:plc:abc/ac.foundation.dataset.sampleSchema/target"</span>,</span> 933 + <span id="cb16-18"><a href="#cb16-18" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 934 + </div> 935 + </section> 936 + </section> 840 937 <section id="at-uris" class="level2"> 841 938 <h2 class="anchored" data-anchor-id="at-uris">AT URIs</h2> 842 939 <p>ATProto records are identified by AT URIs:</p> 843 - <div id="fd00e706" class="cell"> 844 - <div class="sourceCode cell-code" id="cb14"><pre class="sourceCode python code-with-copy"><code 
class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtUri</span> 845 - <span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a></span> 846 - <span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Parse an AT URI</span></span> 847 - <span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> AtUri.parse(<span class="st">"at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz"</span>)</span> 848 - <span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a></span> 849 - <span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(uri.authority) <span class="co"># 'did:plc:abc123'</span></span> 850 - <span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(uri.collection) <span class="co"># 'ac.foundation.dataset.sampleSchema'</span></span> 851 - <span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(uri.rkey) <span class="co"># 'xyz'</span></span> 852 - <span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a></span> 853 - <span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Format back to string</span></span> 854 - <span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="bu">str</span>(uri)) <span class="co"># 'at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz'</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 940 + <div id="3a1f912a" class="cell"> 941 + <div class="sourceCode cell-code" id="cb17"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><a 
href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtUri</span> 942 + <span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a></span> 943 + <span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Parse an AT URI</span></span> 944 + <span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> AtUri.parse(<span class="st">"at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz"</span>)</span> 945 + <span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a></span> 946 + <span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(uri.authority) <span class="co"># 'did:plc:abc123'</span></span> 947 + <span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(uri.collection) <span class="co"># 'ac.foundation.dataset.sampleSchema'</span></span> 948 + <span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(uri.rkey) <span class="co"># 'xyz'</span></span> 949 + <span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a></span> 950 + <span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Format back to string</span></span> 951 + <span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="bu">str</span>(uri)) <span class="co"># 'at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz'</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 855 952 </div> 856 953 </section> 857 954 <section id="supported-field-types" class="level2"> ··· 906 1003 </section> 907 1004 <section id="complete-example" class="level2"> 908 1005 <h2 class="anchored" data-anchor-id="complete-example">Complete 
Example</h2> 909 - <div id="ceb4ed17" class="cell"> 910 - <div class="sourceCode cell-code" id="cb15"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 911 - <span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 912 - <span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 913 - <span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient, AtmosphereIndex</span> 914 - <span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 915 - <span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a></span> 916 - <span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. 
Define and create samples</span></span> 917 - <span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 918 - <span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> FeatureSample:</span> 919 - <span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a> features: NDArray</span> 920 - <span id="cb15-11"><a href="#cb15-11" aria-hidden="true" tabindex="-1"></a> label: <span class="bu">int</span></span> 921 - <span id="cb15-12"><a href="#cb15-12" aria-hidden="true" tabindex="-1"></a> source: <span class="bu">str</span></span> 922 - <span id="cb15-13"><a href="#cb15-13" aria-hidden="true" tabindex="-1"></a></span> 923 - <span id="cb15-14"><a href="#cb15-14" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> 924 - <span id="cb15-15"><a href="#cb15-15" aria-hidden="true" tabindex="-1"></a> FeatureSample(</span> 925 - <span id="cb15-16"><a href="#cb15-16" aria-hidden="true" tabindex="-1"></a> features<span class="op">=</span>np.random.randn(<span class="dv">128</span>).astype(np.float32),</span> 926 - <span id="cb15-17"><a href="#cb15-17" aria-hidden="true" tabindex="-1"></a> label<span class="op">=</span>i <span class="op">%</span> <span class="dv">10</span>,</span> 927 - <span id="cb15-18"><a href="#cb15-18" aria-hidden="true" tabindex="-1"></a> source<span class="op">=</span><span class="st">"synthetic"</span>,</span> 928 - <span id="cb15-19"><a href="#cb15-19" aria-hidden="true" tabindex="-1"></a> )</span> 929 - <span id="cb15-20"><a href="#cb15-20" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i <span class="kw">in</span> <span class="bu">range</span>(<span class="dv">1000</span>)</span> 930 - <span id="cb15-21"><a href="#cb15-21" aria-hidden="true" tabindex="-1"></a>]</span> 931 - <span id="cb15-22"><a href="#cb15-22" aria-hidden="true" tabindex="-1"></a></span> 932 - <span id="cb15-23"><a 
href="#cb15-23" aria-hidden="true" tabindex="-1"></a><span class="co"># 2. Write to tar</span></span> 933 - <span id="cb15-24"><a href="#cb15-24" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(<span class="st">"features.tar"</span>) <span class="im">as</span> sink:</span> 934 - <span id="cb15-25"><a href="#cb15-25" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i, s <span class="kw">in</span> <span class="bu">enumerate</span>(samples):</span> 935 - <span id="cb15-26"><a href="#cb15-26" aria-hidden="true" tabindex="-1"></a> sink.write({<span class="op">**</span>s.as_wds, <span class="st">"__key__"</span>: <span class="ss">f"</span><span class="sc">{</span>i<span class="sc">:06d}</span><span class="ss">"</span>})</span> 936 - <span id="cb15-27"><a href="#cb15-27" aria-hidden="true" tabindex="-1"></a></span> 937 - <span id="cb15-28"><a href="#cb15-28" aria-hidden="true" tabindex="-1"></a><span class="co"># 3. Authenticate</span></span> 938 - <span id="cb15-29"><a href="#cb15-29" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 939 - <span id="cb15-30"><a href="#cb15-30" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"myhandle.bsky.social"</span>, <span class="st">"app-password"</span>)</span> 940 - <span id="cb15-31"><a href="#cb15-31" aria-hidden="true" tabindex="-1"></a></span> 941 - <span id="cb15-32"><a href="#cb15-32" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> AtmosphereIndex(client)</span> 942 - <span id="cb15-33"><a href="#cb15-33" aria-hidden="true" tabindex="-1"></a></span> 943 - <span id="cb15-34"><a href="#cb15-34" aria-hidden="true" tabindex="-1"></a><span class="co"># 4. 
Publish schema</span></span> 944 - <span id="cb15-35"><a href="#cb15-35" aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> index.publish_schema(</span> 945 - <span id="cb15-36"><a href="#cb15-36" aria-hidden="true" tabindex="-1"></a> FeatureSample,</span> 946 - <span id="cb15-37"><a href="#cb15-37" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">"1.0.0"</span>,</span> 947 - <span id="cb15-38"><a href="#cb15-38" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Feature vectors with labels"</span>,</span> 948 - <span id="cb15-39"><a href="#cb15-39" aria-hidden="true" tabindex="-1"></a>)</span> 949 - <span id="cb15-40"><a href="#cb15-40" aria-hidden="true" tabindex="-1"></a></span> 950 - <span id="cb15-41"><a href="#cb15-41" aria-hidden="true" tabindex="-1"></a><span class="co"># 5. Publish dataset</span></span> 951 - <span id="cb15-42"><a href="#cb15-42" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[FeatureSample](<span class="st">"features.tar"</span>)</span> 952 - <span id="cb15-43"><a href="#cb15-43" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 953 - <span id="cb15-44"><a href="#cb15-44" aria-hidden="true" tabindex="-1"></a> dataset,</span> 954 - <span id="cb15-45"><a href="#cb15-45" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"synthetic-features-v1"</span>,</span> 955 - <span id="cb15-46"><a href="#cb15-46" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span>schema_uri,</span> 956 - <span id="cb15-47"><a href="#cb15-47" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"features"</span>, <span class="st">"synthetic"</span>],</span> 957 - <span id="cb15-48"><a href="#cb15-48" aria-hidden="true" tabindex="-1"></a>)</span> 958 - <span id="cb15-49"><a href="#cb15-49" 
aria-hidden="true" tabindex="-1"></a></span> 959 - <span id="cb15-50"><a href="#cb15-50" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Published: </span><span class="sc">{</span>entry<span class="sc">.</span>uri<span class="sc">}</span><span class="ss">"</span>)</span> 960 - <span id="cb15-51"><a href="#cb15-51" aria-hidden="true" tabindex="-1"></a></span> 961 - <span id="cb15-52"><a href="#cb15-52" aria-hidden="true" tabindex="-1"></a><span class="co"># 6. Later: discover and load</span></span> 962 - <span id="cb15-53"><a href="#cb15-53" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> dataset_entry <span class="kw">in</span> index.list_datasets():</span> 963 - <span id="cb15-54"><a href="#cb15-54" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"Found: </span><span class="sc">{</span>dataset_entry<span class="sc">.</span>name<span class="sc">}</span><span class="ss">"</span>)</span> 964 - <span id="cb15-55"><a href="#cb15-55" aria-hidden="true" tabindex="-1"></a></span> 965 - <span id="cb15-56"><a href="#cb15-56" aria-hidden="true" tabindex="-1"></a> <span class="co"># Reconstruct type from schema</span></span> 966 - <span id="cb15-57"><a href="#cb15-57" aria-hidden="true" tabindex="-1"></a> SampleType <span class="op">=</span> index.decode_schema(dataset_entry.schema_ref)</span> 967 - <span id="cb15-58"><a href="#cb15-58" aria-hidden="true" tabindex="-1"></a></span> 968 - <span id="cb15-59"><a href="#cb15-59" aria-hidden="true" tabindex="-1"></a> <span class="co"># Load dataset</span></span> 969 - <span id="cb15-60"><a href="#cb15-60" aria-hidden="true" tabindex="-1"></a> ds <span class="op">=</span> atdata.Dataset[SampleType](dataset_entry.data_urls[<span class="dv">0</span>])</span> 970 - <span id="cb15-61"><a href="#cb15-61" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> batch <span class="kw">in</span> ds.ordered(batch_size<span 
class="op">=</span><span class="dv">32</span>):</span> 971 - <span id="cb15-62"><a href="#cb15-62" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(batch.features.shape)</span> 972 - <span id="cb15-63"><a href="#cb15-63" aria-hidden="true" tabindex="-1"></a> <span class="cf">break</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1006 + <div id="2ada07a7" class="cell"> 1007 + <div class="sourceCode cell-code" id="cb18"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 1008 + <span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 1009 + <span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 1010 + <span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient, AtmosphereIndex</span> 1011 + <span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 1012 + <span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a></span> 1013 + <span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. 
Define and create samples</span></span> 1014 + <span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 1015 + <span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> FeatureSample:</span> 1016 + <span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a> features: NDArray</span> 1017 + <span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a> label: <span class="bu">int</span></span> 1018 + <span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a> source: <span class="bu">str</span></span> 1019 + <span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"></a></span> 1020 + <span id="cb18-14"><a href="#cb18-14" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> 1021 + <span id="cb18-15"><a href="#cb18-15" aria-hidden="true" tabindex="-1"></a> FeatureSample(</span> 1022 + <span id="cb18-16"><a href="#cb18-16" aria-hidden="true" tabindex="-1"></a> features<span class="op">=</span>np.random.randn(<span class="dv">128</span>).astype(np.float32),</span> 1023 + <span id="cb18-17"><a href="#cb18-17" aria-hidden="true" tabindex="-1"></a> label<span class="op">=</span>i <span class="op">%</span> <span class="dv">10</span>,</span> 1024 + <span id="cb18-18"><a href="#cb18-18" aria-hidden="true" tabindex="-1"></a> source<span class="op">=</span><span class="st">"synthetic"</span>,</span> 1025 + <span id="cb18-19"><a href="#cb18-19" aria-hidden="true" tabindex="-1"></a> )</span> 1026 + <span id="cb18-20"><a href="#cb18-20" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i <span class="kw">in</span> <span class="bu">range</span>(<span class="dv">1000</span>)</span> 1027 + <span id="cb18-21"><a href="#cb18-21" aria-hidden="true" tabindex="-1"></a>]</span> 1028 + <span id="cb18-22"><a href="#cb18-22" aria-hidden="true" tabindex="-1"></a></span> 1029 + <span 
id="cb18-23"><a href="#cb18-23" aria-hidden="true" tabindex="-1"></a><span class="co"># 2. Write to tar</span></span> 1030 + <span id="cb18-24"><a href="#cb18-24" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(<span class="st">"features.tar"</span>) <span class="im">as</span> sink:</span> 1031 + <span id="cb18-25"><a href="#cb18-25" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i, s <span class="kw">in</span> <span class="bu">enumerate</span>(samples):</span> 1032 + <span id="cb18-26"><a href="#cb18-26" aria-hidden="true" tabindex="-1"></a> sink.write({<span class="op">**</span>s.as_wds, <span class="st">"__key__"</span>: <span class="ss">f"</span><span class="sc">{</span>i<span class="sc">:06d}</span><span class="ss">"</span>})</span> 1033 + <span id="cb18-27"><a href="#cb18-27" aria-hidden="true" tabindex="-1"></a></span> 1034 + <span id="cb18-28"><a href="#cb18-28" aria-hidden="true" tabindex="-1"></a><span class="co"># 3. Authenticate</span></span> 1035 + <span id="cb18-29"><a href="#cb18-29" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 1036 + <span id="cb18-30"><a href="#cb18-30" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"myhandle.bsky.social"</span>, <span class="st">"app-password"</span>)</span> 1037 + <span id="cb18-31"><a href="#cb18-31" aria-hidden="true" tabindex="-1"></a></span> 1038 + <span id="cb18-32"><a href="#cb18-32" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> AtmosphereIndex(client)</span> 1039 + <span id="cb18-33"><a href="#cb18-33" aria-hidden="true" tabindex="-1"></a></span> 1040 + <span id="cb18-34"><a href="#cb18-34" aria-hidden="true" tabindex="-1"></a><span class="co"># 4. 
Publish schema</span></span> 1041 + <span id="cb18-35"><a href="#cb18-35" aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> index.publish_schema(</span> 1042 + <span id="cb18-36"><a href="#cb18-36" aria-hidden="true" tabindex="-1"></a> FeatureSample,</span> 1043 + <span id="cb18-37"><a href="#cb18-37" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">"1.0.0"</span>,</span> 1044 + <span id="cb18-38"><a href="#cb18-38" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Feature vectors with labels"</span>,</span> 1045 + <span id="cb18-39"><a href="#cb18-39" aria-hidden="true" tabindex="-1"></a>)</span> 1046 + <span id="cb18-40"><a href="#cb18-40" aria-hidden="true" tabindex="-1"></a></span> 1047 + <span id="cb18-41"><a href="#cb18-41" aria-hidden="true" tabindex="-1"></a><span class="co"># 5. Publish dataset</span></span> 1048 + <span id="cb18-42"><a href="#cb18-42" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[FeatureSample](<span class="st">"features.tar"</span>)</span> 1049 + <span id="cb18-43"><a href="#cb18-43" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 1050 + <span id="cb18-44"><a href="#cb18-44" aria-hidden="true" tabindex="-1"></a> dataset,</span> 1051 + <span id="cb18-45"><a href="#cb18-45" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"synthetic-features-v1"</span>,</span> 1052 + <span id="cb18-46"><a href="#cb18-46" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span>schema_uri,</span> 1053 + <span id="cb18-47"><a href="#cb18-47" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"features"</span>, <span class="st">"synthetic"</span>],</span> 1054 + <span id="cb18-48"><a href="#cb18-48" aria-hidden="true" tabindex="-1"></a>)</span> 1055 + <span id="cb18-49"><a 
href="#cb18-49" aria-hidden="true" tabindex="-1"></a></span> 1056 + <span id="cb18-50"><a href="#cb18-50" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Published: </span><span class="sc">{</span>entry<span class="sc">.</span>uri<span class="sc">}</span><span class="ss">"</span>)</span> 1057 + <span id="cb18-51"><a href="#cb18-51" aria-hidden="true" tabindex="-1"></a></span> 1058 + <span id="cb18-52"><a href="#cb18-52" aria-hidden="true" tabindex="-1"></a><span class="co"># 6. Later: discover and load</span></span> 1059 + <span id="cb18-53"><a href="#cb18-53" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> dataset_entry <span class="kw">in</span> index.list_datasets():</span> 1060 + <span id="cb18-54"><a href="#cb18-54" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"Found: </span><span class="sc">{</span>dataset_entry<span class="sc">.</span>name<span class="sc">}</span><span class="ss">"</span>)</span> 1061 + <span id="cb18-55"><a href="#cb18-55" aria-hidden="true" tabindex="-1"></a></span> 1062 + <span id="cb18-56"><a href="#cb18-56" aria-hidden="true" tabindex="-1"></a> <span class="co"># Reconstruct type from schema</span></span> 1063 + <span id="cb18-57"><a href="#cb18-57" aria-hidden="true" tabindex="-1"></a> SampleType <span class="op">=</span> index.decode_schema(dataset_entry.schema_ref)</span> 1064 + <span id="cb18-58"><a href="#cb18-58" aria-hidden="true" tabindex="-1"></a></span> 1065 + <span id="cb18-59"><a href="#cb18-59" aria-hidden="true" tabindex="-1"></a> <span class="co"># Load dataset</span></span> 1066 + <span id="cb18-60"><a href="#cb18-60" aria-hidden="true" tabindex="-1"></a> ds <span class="op">=</span> atdata.Dataset[SampleType](dataset_entry.data_urls[<span class="dv">0</span>])</span> 1067 + <span id="cb18-61"><a href="#cb18-61" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> batch <span class="kw">in</span> 
ds.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 1068 + <span id="cb18-62"><a href="#cb18-62" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(batch.features.shape)</span> 1069 + <span id="cb18-63"><a href="#cb18-63" aria-hidden="true" tabindex="-1"></a> <span class="cf">break</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 973 1070 </div> 974 1071 </section> 975 1072 <section id="related" class="level2">
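The AT URIs section in the hunk above uses `AtUri.parse` to split a URI into `authority`, `collection`, and `rkey`. As a rough illustration of that three-part layout, here is a minimal standalone sketch. It does not use atdata and is not how `AtUri` is implemented; `ParsedAtUri`, `parse_at_uri`, and `format_at_uri` are hypothetical names introduced only for this example, and the sketch assumes the simple `at://authority/collection/rkey` form shown in the docs.

```python
from typing import NamedTuple


class ParsedAtUri(NamedTuple):
    """Hypothetical stand-in for illustration; not atdata's AtUri class."""

    authority: str   # e.g. a DID such as 'did:plc:abc123'
    collection: str  # e.g. 'ac.foundation.dataset.sampleSchema'
    rkey: str        # record key within the collection


def parse_at_uri(uri: str) -> ParsedAtUri:
    """Split 'at://authority/collection/rkey' into its three parts."""
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    # Assumption: exactly three non-empty path segments, as in the docs example.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://authority/collection/rkey, got {uri!r}")
    return ParsedAtUri(*parts)


def format_at_uri(parsed: ParsedAtUri) -> str:
    """Rebuild the string form from the three parts."""
    return f"at://{parsed.authority}/{parsed.collection}/{parsed.rkey}"


example = parse_at_uri("at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz")
print(example.authority)
print(example.collection)
print(example.rkey)
print(format_at_uri(example))
```

The DID in the authority position contains colons but no slashes, which is why splitting on `/` is enough for this form. Real code should use `AtUri` from `atdata.atmosphere`, which may accept forms this sketch rejects.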
+33 -13
docs/reference/datasets.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 500 508 <span class="menu-text">URI Specification</span></a> 501 509 </div> 502 510 </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 503 523 </ul> 504 524 </li> 505 525 <li class="sidebar-item sidebar-item-section"> ··· 590 610 <p>The <code>Dataset</code> class provides typed iteration over WebDataset tar files with automatic batching and lens transformations.</p> 591 611 <section id="creating-a-dataset" class="level2"> 592 612 <h2 class="anchored" data-anchor-id="creating-a-dataset">Creating a Dataset</h2> 593 - <div id="2e18486f" class="cell"> 613 + <div id="67dadae6" class="cell"> 594 614 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 595 615 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 596 
616 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span> ··· 613 633 <section id="url-source-default" class="level3"> 614 634 <h3 class="anchored" data-anchor-id="url-source-default">URL Source (default)</h3> 615 635 <p>When you pass a string to <code>Dataset</code>, it automatically wraps it in a <code>URLSource</code>:</p> 616 - <div id="da2c83cf" class="cell"> 636 + <div id="0d273ed2" class="cell"> 617 637 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="co"># These are equivalent:</span></span> 618 638 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 619 639 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](atdata.URLSource(<span class="st">"data-{000000..000009}.tar"</span>))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> ··· 622 642 <section id="s3-source" class="level3"> 623 643 <h3 class="anchored" data-anchor-id="s3-source">S3 Source</h3> 624 644 <p>For private S3 buckets or S3-compatible storage (Cloudflare R2, MinIO), use <code>S3Source</code>:</p> 625 - <div id="708170e4" class="cell"> 645 + <div id="ba4a3943" class="cell"> 626 646 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># From explicit credentials</span></span> 627 647 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> atdata.S3Source(</span> 628 648 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" 
tabindex="-1"></a> bucket<span class="op">=</span><span class="st">"my-bucket"</span>,</span> ··· 660 680 <section id="ordered-iteration" class="level3"> 661 681 <h3 class="anchored" data-anchor-id="ordered-iteration">Ordered Iteration</h3> 662 682 <p>Iterate through samples in their original order:</p> 663 - <div id="b01d3cf1" class="cell"> 683 + <div id="2beecbaa" class="cell"> 664 684 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># With batching (default batch_size=1)</span></span> 665 685 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 666 686 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> images <span class="op">=</span> batch.image <span class="co"># numpy array (32, H, W, C)</span></span> ··· 674 694 <section id="shuffled-iteration" class="level3"> 675 695 <h3 class="anchored" data-anchor-id="shuffled-iteration">Shuffled Iteration</h3> 676 696 <p>Iterate with randomized order at both shard and sample levels:</p> 677 - <div id="6e6f41af" class="cell"> 697 + <div id="7575f1aa" class="cell"> 678 698 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.shuffled(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 679 699 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> <span class="co"># Samples are shuffled</span></span> 680 700 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> process(batch)</span> ··· 705 725 <section id="samplebatch" 
class="level2"> 706 726 <h2 class="anchored" data-anchor-id="samplebatch">SampleBatch</h2> 707 727 <p>When iterating with a <code>batch_size</code>, each iteration yields a <code>SampleBatch</code> with automatic attribute aggregation.</p> 708 - <div id="f8dbbfd8" class="cell"> 728 + <div id="b062fd54" class="cell"> 709 729 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 710 730 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> Sample:</span> 711 731 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> features: NDArray <span class="co"># shape (256,)</span></span> ··· 725 745 <section id="type-transformations-with-lenses" class="level2"> 726 746 <h2 class="anchored" data-anchor-id="type-transformations-with-lenses">Type Transformations with Lenses</h2> 727 747 <p>View a dataset through a different sample type using registered lenses:</p> 728 - <div id="4af800d8" class="cell"> 748 + <div id="177825bf" class="cell"> 729 749 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 730 750 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> SimplifiedSample:</span> 731 751 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> label: <span class="bu">str</span></span> ··· 747 767 <section id="shard-list" class="level3"> 748 768 <h3 class="anchored" data-anchor-id="shard-list">Shard List</h3> 749 769 <p>Get the list of individual tar files:</p> 750 - <div id="4b585c57" class="cell"> 770 + <div id="89615cde" class="cell"> 751 771 <div class="sourceCode cell-code" 
id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[Sample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 752 772 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>shards <span class="op">=</span> dataset.shard_list</span> 753 773 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="co"># ['data-000000.tar', 'data-000001.tar', ..., 'data-000009.tar']</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> ··· 756 776 <section id="metadata" class="level3"> 757 777 <h3 class="anchored" data-anchor-id="metadata">Metadata</h3> 758 778 <p>Datasets can have associated metadata from a URL:</p> 759 - <div id="edb2350d" class="cell"> 779 + <div id="2e7028fe" class="cell"> 760 780 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[Sample](</span> 761 781 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"data-{000000..000009}.tar"</span>,</span> 762 782 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a> metadata_url<span class="op">=</span><span class="st">"https://example.com/metadata.msgpack"</span></span> ··· 770 790 <section id="writing-datasets" class="level2"> 771 791 <h2 class="anchored" data-anchor-id="writing-datasets">Writing Datasets</h2> 772 792 <p>Use WebDataset’s <code>TarWriter</code> or <code>ShardWriter</code> to create datasets:</p> 773 - <div id="21bf922f" class="cell"> 793 + <div id="97f2dcd8" class="cell"> 774 794 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode 
python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 775 795 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 776 796 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a></span> ··· 793 813 <section id="parquet-export" class="level2"> 794 814 <h2 class="anchored" data-anchor-id="parquet-export">Parquet Export</h2> 795 815 <p>Export dataset contents to parquet format:</p> 796 - <div id="01aee72e" class="cell"> 816 + <div id="fe7a74ac" class="cell"> 797 817 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Export entire dataset</span></span> 798 818 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>dataset.to_parquet(<span class="st">"output.parquet"</span>)</span> 799 819 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a></span> ··· 844 864 <section id="source" class="level3"> 845 865 <h3 class="anchored" data-anchor-id="source">Source</h3> 846 866 <p>Access the underlying <code>DataSource</code>:</p> 847 - <div id="003a36bf" class="cell"> 867 + <div id="b7542849" class="cell"> 848 868 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[Sample](<span class="st">"data.tar"</span>)</span> 849 869 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> dataset.source <span class="co"># URLSource instance</span></span> 850 870 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span 
class="bu">print</span>(source.shard_list) <span class="co"># ['data.tar']</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> ··· 853 873 <section id="sample-type" class="level3"> 854 874 <h3 class="anchored" data-anchor-id="sample-type">Sample Type</h3> 855 875 <p>Get the type parameter used to create the dataset:</p> 856 - <div id="4630dffb" class="cell"> 876 + <div id="b3062e6b" class="cell"> 857 877 <div class="sourceCode cell-code" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data.tar"</span>)</span> 858 878 <span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(dataset.sample_type) <span class="co"># &lt;class 'ImageSample'&gt;</span></span> 859 879 <span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(dataset.batch_type) <span class="co"># SampleBatch[ImageSample]</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+1360
docs/reference/deployment.html
··· 1 + <!DOCTYPE html> 2 + <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head> 3 + 4 + <meta charset="utf-8"> 5 + <meta name="generator" content="quarto-1.7.34"> 6 + 7 + <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes"> 8 + 9 + <meta name="description" content="Production deployment for local storage and ATProto integration"> 10 + 11 + <title>Deployment Guide – atdata</title> 12 + <style> 13 + code{white-space: pre-wrap;} 14 + span.smallcaps{font-variant: small-caps;} 15 + div.columns{display: flex; gap: min(4vw, 1.5em);} 16 + div.column{flex: auto; overflow-x: auto;} 17 + div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;} 18 + ul.task-list{list-style: none;} 19 + ul.task-list li input[type="checkbox"] { 20 + width: 0.8em; 21 + margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ 22 + vertical-align: middle; 23 + } 24 + /* CSS for syntax highlighting */ 25 + html { -webkit-text-size-adjust: 100%; } 26 + pre > code.sourceCode { white-space: pre; position: relative; } 27 + pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } 28 + pre > code.sourceCode > span:empty { height: 1.2em; } 29 + .sourceCode { overflow: visible; } 30 + code.sourceCode > span { color: inherit; text-decoration: inherit; } 31 + div.sourceCode { margin: 1em 0; } 32 + pre.sourceCode { margin: 0; } 33 + @media screen { 34 + div.sourceCode { overflow: auto; } 35 + } 36 + @media print { 37 + pre > code.sourceCode { white-space: pre-wrap; } 38 + pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } 39 + } 40 + pre.numberSource code 41 + { counter-reset: source-line 0; } 42 + pre.numberSource code > span 43 + { position: relative; left: -4em; counter-increment: source-line; } 44 + pre.numberSource code > span > a:first-child::before 45 + { content: counter(source-line); 46 + position: relative; left: -1em; text-align: right; 
vertical-align: baseline; 47 + border: none; display: inline-block; 48 + -webkit-touch-callout: none; -webkit-user-select: none; 49 + -khtml-user-select: none; -moz-user-select: none; 50 + -ms-user-select: none; user-select: none; 51 + padding: 0 4px; width: 4em; 52 + } 53 + pre.numberSource { margin-left: 3em; padding-left: 4px; } 54 + div.sourceCode 55 + { } 56 + @media screen { 57 + pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } 58 + } 59 + </style> 60 + 61 + 62 + <script src="../site_libs/quarto-nav/quarto-nav.js"></script> 63 + <script src="../site_libs/quarto-nav/headroom.min.js"></script> 64 + <script src="../site_libs/clipboard/clipboard.min.js"></script> 65 + <script src="../site_libs/quarto-search/autocomplete.umd.js"></script> 66 + <script src="../site_libs/quarto-search/fuse.min.js"></script> 67 + <script src="../site_libs/quarto-search/quarto-search.js"></script> 68 + <meta name="quarto:offset" content="../"> 69 + <script src="../site_libs/quarto-html/quarto.js" type="module"></script> 70 + <script src="../site_libs/quarto-html/tabsets/tabsets.js" type="module"></script> 71 + <script src="../site_libs/quarto-html/popper.min.js"></script> 72 + <script src="../site_libs/quarto-html/tippy.umd.min.js"></script> 73 + <script src="../site_libs/quarto-html/anchor.min.js"></script> 74 + <link href="../site_libs/quarto-html/tippy.css" rel="stylesheet"> 75 + <link href="../site_libs/quarto-html/quarto-syntax-highlighting-9582434199d49cc9e91654cdeeb4866b.css" rel="stylesheet" class="quarto-color-scheme" id="quarto-text-highlighting-styles"> 76 + <link href="../site_libs/quarto-html/quarto-syntax-highlighting-dark-8dcd8563ea6803ab7cbb3d71ca5772e1.css" rel="stylesheet" class="quarto-color-scheme quarto-color-alternate" id="quarto-text-highlighting-styles"> 77 + <link href="../site_libs/quarto-html/quarto-syntax-highlighting-9582434199d49cc9e91654cdeeb4866b.css" rel="stylesheet" class="quarto-color-scheme-extra" 
id="quarto-text-highlighting-styles"> 78 + <script src="../site_libs/bootstrap/bootstrap.min.js"></script> 79 + <link href="../site_libs/bootstrap/bootstrap-icons.css" rel="stylesheet"> 80 + <link href="../site_libs/bootstrap/bootstrap-62bce24ca844314e7bb1a34dbdfe05cc.min.css" rel="stylesheet" append-hash="true" class="quarto-color-scheme" id="quarto-bootstrap" data-mode="light"> 81 + <link href="../site_libs/bootstrap/bootstrap-dark-7964ffd8887b0991fe8d71c6c8bc75d6.min.css" rel="stylesheet" append-hash="true" class="quarto-color-scheme quarto-color-alternate" id="quarto-bootstrap" data-mode="dark"> 82 + <link href="../site_libs/bootstrap/bootstrap-62bce24ca844314e7bb1a34dbdfe05cc.min.css" rel="stylesheet" append-hash="true" class="quarto-color-scheme-extra" id="quarto-bootstrap" data-mode="light"> 83 + <script id="quarto-search-options" type="application/json">{ 84 + "location": "navbar", 85 + "copy-button": false, 86 + "collapse-after": 3, 87 + "panel-placement": "end", 88 + "type": "overlay", 89 + "limit": 50, 90 + "keyboard-shortcut": [ 91 + "f", 92 + "/", 93 + "s" 94 + ], 95 + "show-item-context": false, 96 + "language": { 97 + "search-no-results-text": "No results", 98 + "search-matching-documents-text": "matching documents", 99 + "search-copy-link-title": "Copy link to search", 100 + "search-hide-matches-text": "Hide additional matches", 101 + "search-more-match-text": "more match in this document", 102 + "search-more-matches-text": "more matches in this document", 103 + "search-clear-button-title": "Clear", 104 + "search-text-placeholder": "", 105 + "search-detached-cancel-button-title": "Cancel", 106 + "search-submit-button-title": "Submit", 107 + "search-label": "Search" 108 + } 109 + }</script> 110 + 111 + 112 + <link rel="stylesheet" href="../assets/styles.css"> 113 + </head> 114 + 115 + <body class="nav-sidebar docked nav-fixed quarto-light"><script id="quarto-html-before-body" type="application/javascript"> 116 + const toggleBodyColorMode = 
(bsSheetEl) => { 117 + const mode = bsSheetEl.getAttribute("data-mode"); 118 + const bodyEl = window.document.querySelector("body"); 119 + if (mode === "dark") { 120 + bodyEl.classList.add("quarto-dark"); 121 + bodyEl.classList.remove("quarto-light"); 122 + } else { 123 + bodyEl.classList.add("quarto-light"); 124 + bodyEl.classList.remove("quarto-dark"); 125 + } 126 + } 127 + const toggleBodyColorPrimary = () => { 128 + const bsSheetEl = window.document.querySelector("link#quarto-bootstrap:not([rel=disabled-stylesheet])"); 129 + if (bsSheetEl) { 130 + toggleBodyColorMode(bsSheetEl); 131 + } 132 + } 133 + const setColorSchemeToggle = (alternate) => { 134 + const toggles = window.document.querySelectorAll('.quarto-color-scheme-toggle'); 135 + for (let i=0; i < toggles.length; i++) { 136 + const toggle = toggles[i]; 137 + if (toggle) { 138 + if (alternate) { 139 + toggle.classList.add("alternate"); 140 + } else { 141 + toggle.classList.remove("alternate"); 142 + } 143 + } 144 + } 145 + }; 146 + const toggleColorMode = (alternate) => { 147 + // Switch the stylesheets 148 + const primaryStylesheets = window.document.querySelectorAll('link.quarto-color-scheme:not(.quarto-color-alternate)'); 149 + const alternateStylesheets = window.document.querySelectorAll('link.quarto-color-scheme.quarto-color-alternate'); 150 + manageTransitions('#quarto-margin-sidebar .nav-link', false); 151 + if (alternate) { 152 + // note: dark is layered on light, we don't disable primary! 
153 + enableStylesheet(alternateStylesheets); 154 + for (const sheetNode of alternateStylesheets) { 155 + if (sheetNode.id === "quarto-bootstrap") { 156 + toggleBodyColorMode(sheetNode); 157 + } 158 + } 159 + } else { 160 + disableStylesheet(alternateStylesheets); 161 + enableStylesheet(primaryStylesheets) 162 + toggleBodyColorPrimary(); 163 + } 164 + manageTransitions('#quarto-margin-sidebar .nav-link', true); 165 + // Switch the toggles 166 + setColorSchemeToggle(alternate) 167 + // Hack to workaround the fact that safari doesn't 168 + // properly recolor the scrollbar when toggling (#1455) 169 + if (navigator.userAgent.indexOf('Safari') > 0 && navigator.userAgent.indexOf('Chrome') == -1) { 170 + manageTransitions("body", false); 171 + window.scrollTo(0, 1); 172 + setTimeout(() => { 173 + window.scrollTo(0, 0); 174 + manageTransitions("body", true); 175 + }, 40); 176 + } 177 + } 178 + const disableStylesheet = (stylesheets) => { 179 + for (let i=0; i < stylesheets.length; i++) { 180 + const stylesheet = stylesheets[i]; 181 + stylesheet.rel = 'disabled-stylesheet'; 182 + } 183 + } 184 + const enableStylesheet = (stylesheets) => { 185 + for (let i=0; i < stylesheets.length; i++) { 186 + const stylesheet = stylesheets[i]; 187 + if(stylesheet.rel !== 'stylesheet') { // for Chrome, which will still FOUC without this check 188 + stylesheet.rel = 'stylesheet'; 189 + } 190 + } 191 + } 192 + const manageTransitions = (selector, allowTransitions) => { 193 + const els = window.document.querySelectorAll(selector); 194 + for (let i=0; i < els.length; i++) { 195 + const el = els[i]; 196 + if (allowTransitions) { 197 + el.classList.remove('notransition'); 198 + } else { 199 + el.classList.add('notransition'); 200 + } 201 + } 202 + } 203 + const isFileUrl = () => { 204 + return window.location.protocol === 'file:'; 205 + } 206 + const hasAlternateSentinel = () => { 207 + let styleSentinel = getColorSchemeSentinel(); 208 + if (styleSentinel !== null) { 209 + return styleSentinel 
=== "alternate"; 210 + } else { 211 + return false; 212 + } 213 + } 214 + const setStyleSentinel = (alternate) => { 215 + const value = alternate ? "alternate" : "default"; 216 + if (!isFileUrl()) { 217 + window.localStorage.setItem("quarto-color-scheme", value); 218 + } else { 219 + localAlternateSentinel = value; 220 + } 221 + } 222 + const getColorSchemeSentinel = () => { 223 + if (!isFileUrl()) { 224 + const storageValue = window.localStorage.getItem("quarto-color-scheme"); 225 + return storageValue != null ? storageValue : localAlternateSentinel; 226 + } else { 227 + return localAlternateSentinel; 228 + } 229 + } 230 + const toggleGiscusIfUsed = (isAlternate, darkModeDefault) => { 231 + const baseTheme = document.querySelector('#giscus-base-theme')?.value ?? 'light'; 232 + const alternateTheme = document.querySelector('#giscus-alt-theme')?.value ?? 'dark'; 233 + let newTheme = ''; 234 + if(authorPrefersDark) { 235 + newTheme = isAlternate ? baseTheme : alternateTheme; 236 + } else { 237 + newTheme = isAlternate ? 
alternateTheme : baseTheme; 238 + } 239 + const changeGiscusTheme = () => { 240 + // From: https://github.com/giscus/giscus/issues/336 241 + const sendMessage = (message) => { 242 + const iframe = document.querySelector('iframe.giscus-frame'); 243 + if (!iframe) return; 244 + iframe.contentWindow.postMessage({ giscus: message }, 'https://giscus.app'); 245 + } 246 + sendMessage({ 247 + setConfig: { 248 + theme: newTheme 249 + } 250 + }); 251 + } 252 + const isGiscussLoaded = window.document.querySelector('iframe.giscus-frame') !== null; 253 + if (isGiscussLoaded) { 254 + changeGiscusTheme(); 255 + } 256 + }; 257 + const authorPrefersDark = false; 258 + const darkModeDefault = authorPrefersDark; 259 + document.querySelector('link#quarto-text-highlighting-styles.quarto-color-scheme-extra').rel = 'disabled-stylesheet'; 260 + document.querySelector('link#quarto-bootstrap.quarto-color-scheme-extra').rel = 'disabled-stylesheet'; 261 + let localAlternateSentinel = darkModeDefault ? 'alternate' : 'default'; 262 + // Dark / light mode switch 263 + window.quartoToggleColorScheme = () => { 264 + // Read the current dark / light value 265 + let toAlternate = !hasAlternateSentinel(); 266 + toggleColorMode(toAlternate); 267 + setStyleSentinel(toAlternate); 268 + toggleGiscusIfUsed(toAlternate, darkModeDefault); 269 + window.dispatchEvent(new Event('resize')); 270 + }; 271 + // Switch to dark mode if need be 272 + if (hasAlternateSentinel()) { 273 + toggleColorMode(true); 274 + } else { 275 + toggleColorMode(false); 276 + } 277 + </script> 278 + 279 + <div id="quarto-search-results"></div> 280 + <header id="quarto-header" class="headroom fixed-top"> 281 + <nav class="navbar navbar-expand-lg " data-bs-theme="dark"> 282 + <div class="navbar-container container-fluid"> 283 + <div class="navbar-brand-container mx-auto"> 284 + <a class="navbar-brand" href="../index.html"> 285 + <span class="navbar-title">atdata</span> 286 + </a> 287 + </div> 288 + <div id="quarto-search" class="" 
title="Search"></div> 289 + <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarCollapse" aria-controls="navbarCollapse" role="menu" aria-expanded="false" aria-label="Toggle navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }"> 290 + <span class="navbar-toggler-icon"></span> 291 + </button> 292 + <div class="collapse navbar-collapse" id="navbarCollapse"> 293 + <ul class="navbar-nav navbar-nav-scroll me-auto"> 294 + <li class="nav-item"> 295 + <a class="nav-link active" href="../index.html" aria-current="page"> 296 + <span class="menu-text">Guide</span></a> 297 + </li> 298 + <li class="nav-item dropdown "> 299 + <a class="nav-link dropdown-toggle" href="#" id="nav-menu-tutorials" role="link" data-bs-toggle="dropdown" aria-expanded="false"> 300 + <span class="menu-text">Tutorials</span> 301 + </a> 302 + <ul class="dropdown-menu" aria-labelledby="nav-menu-tutorials"> 303 + <li> 304 + <a class="dropdown-item" href="../tutorials/quickstart.html"> 305 + <span class="dropdown-text">Quick Start</span></a> 306 + </li> 307 + <li> 308 + <a class="dropdown-item" href="../tutorials/local-workflow.html"> 309 + <span class="dropdown-text">Local Workflow</span></a> 310 + </li> 311 + <li> 312 + <a class="dropdown-item" href="../tutorials/atmosphere.html"> 313 + <span class="dropdown-text">Atmosphere Publishing</span></a> 314 + </li> 315 + <li> 316 + <a class="dropdown-item" href="../tutorials/promotion.html"> 317 + <span class="dropdown-text">Promotion Workflow</span></a> 318 + </li> 319 + </ul> 320 + </li> 321 + <li class="nav-item dropdown "> 322 + <a class="nav-link dropdown-toggle" href="#" id="nav-menu-reference" role="link" data-bs-toggle="dropdown" aria-expanded="false"> 323 + <span class="menu-text">Reference</span> 324 + </a> 325 + <ul class="dropdown-menu" aria-labelledby="nav-menu-reference"> 326 + <li> 327 + <a class="dropdown-item" href="../reference/packable-samples.html"> 328 + <span 
class="dropdown-text">Packable Samples</span></a> 329 + </li> 330 + <li> 331 + <a class="dropdown-item" href="../reference/datasets.html"> 332 + <span class="dropdown-text">Datasets</span></a> 333 + </li> 334 + <li> 335 + <a class="dropdown-item" href="../reference/lenses.html"> 336 + <span class="dropdown-text">Lenses</span></a> 337 + </li> 338 + <li> 339 + <a class="dropdown-item" href="../reference/local-storage.html"> 340 + <span class="dropdown-text">Local Storage</span></a> 341 + </li> 342 + <li> 343 + <a class="dropdown-item" href="../reference/atmosphere.html"> 344 + <span class="dropdown-text">Atmosphere</span></a> 345 + </li> 346 + <li> 347 + <a class="dropdown-item" href="../reference/promotion.html"> 348 + <span class="dropdown-text">Promotion</span></a> 349 + </li> 350 + <li> 351 + <a class="dropdown-item" href="../reference/load-dataset.html"> 352 + <span class="dropdown-text">load_dataset API</span></a> 353 + </li> 354 + <li> 355 + <a class="dropdown-item" href="../reference/protocols.html"> 356 + <span class="dropdown-text">Protocols</span></a> 357 + </li> 358 + <li> 359 + <a class="dropdown-item" href="../reference/uri-spec.html"> 360 + <span class="dropdown-text">URI Specification</span></a> 361 + </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 370 + </ul> 371 + </li> 372 + <li class="nav-item"> 373 + <a class="nav-link" href="../api/index.html"> 374 + <span class="menu-text">API</span></a> 375 + </li> 376 + </ul> 377 + <ul class="navbar-nav navbar-nav-scroll ms-auto"> 378 + <li class="nav-item compact"> 379 + <a class="nav-link" href="https://github.com/your-org/atdata"> <i class="bi bi-github" role="img"> 380 + </i> 381 + <span class="menu-text"></span></a> 382 + </li> 383 
+ </ul> 384 + </div> <!-- /navcollapse --> 385 + <div class="quarto-navbar-tools"> 386 + <a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a> 387 + </div> 388 + </div> <!-- /container-fluid --> 389 + </nav> 390 + <nav class="quarto-secondary-nav"> 391 + <div class="container-fluid d-flex"> 392 + <button type="button" class="quarto-btn-toggle btn" data-bs-toggle="collapse" role="button" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }"> 393 + <i class="bi bi-layout-text-sidebar-reverse"></i> 394 + </button> 395 + <nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="../reference/packable-samples.html">Reference</a></li><li class="breadcrumb-item"><a href="../reference/deployment.html">Deployment Guide</a></li></ol></nav> 396 + <a class="flex-grow-1" role="navigation" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }"> 397 + </a> 398 + </div> 399 + </nav> 400 + </header> 401 + <!-- content --> 402 + <div id="quarto-content" class="quarto-container page-columns page-rows-contents page-layout-article page-navbar"> 403 + <!-- sidebar --> 404 + <nav id="quarto-sidebar" class="sidebar collapse collapse-horizontal quarto-sidebar-collapse-item sidebar-navigation docked overflow-auto"> 405 + <div class="sidebar-menu-container"> 406 + <ul class="list-unstyled mt-1"> 407 + <li class="sidebar-item"> 408 + <div class="sidebar-item-container"> 409 + <a href="../index.html" class="sidebar-item-text sidebar-link"> 410 + <span 
class="menu-text">atdata</span></a> 411 + </div> 412 + </li> 413 + <li class="sidebar-item sidebar-item-section"> 414 + <div class="sidebar-item-container"> 415 + <a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-1" role="navigation" aria-expanded="true"> 416 + <span class="menu-text">Getting Started</span></a> 417 + <a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-1" role="navigation" aria-expanded="true" aria-label="Toggle section"> 418 + <i class="bi bi-chevron-right ms-2"></i> 419 + </a> 420 + </div> 421 + <ul id="quarto-sidebar-section-1" class="collapse list-unstyled sidebar-section depth1 show"> 422 + <li class="sidebar-item"> 423 + <div class="sidebar-item-container"> 424 + <a href="../tutorials/quickstart.html" class="sidebar-item-text sidebar-link"> 425 + <span class="menu-text">Quick Start</span></a> 426 + </div> 427 + </li> 428 + <li class="sidebar-item"> 429 + <div class="sidebar-item-container"> 430 + <a href="../tutorials/local-workflow.html" class="sidebar-item-text sidebar-link"> 431 + <span class="menu-text">Local Workflow</span></a> 432 + </div> 433 + </li> 434 + <li class="sidebar-item"> 435 + <div class="sidebar-item-container"> 436 + <a href="../tutorials/atmosphere.html" class="sidebar-item-text sidebar-link"> 437 + <span class="menu-text">Atmosphere Publishing</span></a> 438 + </div> 439 + </li> 440 + <li class="sidebar-item"> 441 + <div class="sidebar-item-container"> 442 + <a href="../tutorials/promotion.html" class="sidebar-item-text sidebar-link"> 443 + <span class="menu-text">Promotion Workflow</span></a> 444 + </div> 445 + </li> 446 + </ul> 447 + </li> 448 + <li class="sidebar-item sidebar-item-section"> 449 + <div class="sidebar-item-container"> 450 + <a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-2" role="navigation" 
aria-expanded="true"> 451 + <span class="menu-text">Reference</span></a> 452 + <a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-2" role="navigation" aria-expanded="true" aria-label="Toggle section"> 453 + <i class="bi bi-chevron-right ms-2"></i> 454 + </a> 455 + </div> 456 + <ul id="quarto-sidebar-section-2" class="collapse list-unstyled sidebar-section depth1 show"> 457 + <li class="sidebar-item"> 458 + <div class="sidebar-item-container"> 459 + <a href="../reference/packable-samples.html" class="sidebar-item-text sidebar-link"> 460 + <span class="menu-text">Packable Samples</span></a> 461 + </div> 462 + </li> 463 + <li class="sidebar-item"> 464 + <div class="sidebar-item-container"> 465 + <a href="../reference/datasets.html" class="sidebar-item-text sidebar-link"> 466 + <span class="menu-text">Datasets</span></a> 467 + </div> 468 + </li> 469 + <li class="sidebar-item"> 470 + <div class="sidebar-item-container"> 471 + <a href="../reference/lenses.html" class="sidebar-item-text sidebar-link"> 472 + <span class="menu-text">Lenses</span></a> 473 + </div> 474 + </li> 475 + <li class="sidebar-item"> 476 + <div class="sidebar-item-container"> 477 + <a href="../reference/local-storage.html" class="sidebar-item-text sidebar-link"> 478 + <span class="menu-text">Local Storage</span></a> 479 + </div> 480 + </li> 481 + <li class="sidebar-item"> 482 + <div class="sidebar-item-container"> 483 + <a href="../reference/atmosphere.html" class="sidebar-item-text sidebar-link"> 484 + <span class="menu-text">Atmosphere (ATProto Integration)</span></a> 485 + </div> 486 + </li> 487 + <li class="sidebar-item"> 488 + <div class="sidebar-item-container"> 489 + <a href="../reference/promotion.html" class="sidebar-item-text sidebar-link"> 490 + <span class="menu-text">Promotion Workflow</span></a> 491 + </div> 492 + </li> 493 + <li class="sidebar-item"> 494 + <div class="sidebar-item-container"> 495 + <a 
href="../reference/load-dataset.html" class="sidebar-item-text sidebar-link"> 496 + <span class="menu-text">load_dataset API</span></a> 497 + </div> 498 + </li> 499 + <li class="sidebar-item"> 500 + <div class="sidebar-item-container"> 501 + <a href="../reference/protocols.html" class="sidebar-item-text sidebar-link"> 502 + <span class="menu-text">Protocols</span></a> 503 + </div> 504 + </li> 505 + <li class="sidebar-item"> 506 + <div class="sidebar-item-container"> 507 + <a href="../reference/uri-spec.html" class="sidebar-item-text sidebar-link"> 508 + <span class="menu-text">URI Specification</span></a> 509 + </div> 510 + </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link active"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 523 + </ul> 524 + </li> 525 + <li class="sidebar-item sidebar-item-section"> 526 + <div class="sidebar-item-container"> 527 + <a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-3" role="navigation" aria-expanded="true"> 528 + <span class="menu-text">API Reference</span></a> 529 + <a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-3" role="navigation" aria-expanded="true" aria-label="Toggle section"> 530 + <i class="bi bi-chevron-right ms-2"></i> 531 + </a> 532 + </div> 533 + <ul id="quarto-sidebar-section-3" class="collapse list-unstyled sidebar-section depth1 show"> 534 + <li class="sidebar-item"> 535 + <div class="sidebar-item-container"> 536 + <a href="../api/index.html" class="sidebar-item-text sidebar-link"> 537 + 
<span class="menu-text">API Reference</span></a> 538 + </div> 539 + </li> 540 + </ul> 541 + </li> 542 + </ul> 543 + </div> 544 + </nav> 545 + <div id="quarto-sidebar-glass" class="quarto-sidebar-collapse-item" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item"></div> 546 + <!-- margin-sidebar --> 547 + <div id="quarto-margin-sidebar" class="sidebar margin-sidebar"> 548 + <nav id="TOC" role="doc-toc" class="toc-active"> 549 + <h2 id="toc-title">On this page</h2> 550 + 551 + <ul> 552 + <li><a href="#local-storage-deployment" id="toc-local-storage-deployment" class="nav-link active" data-scroll-target="#local-storage-deployment">Local Storage Deployment</a> 553 + <ul class="collapse"> 554 + <li><a href="#redis-setup" id="toc-redis-setup" class="nav-link" data-scroll-target="#redis-setup">Redis Setup</a></li> 555 + <li><a href="#s3-storage-setup" id="toc-s3-storage-setup" class="nav-link" data-scroll-target="#s3-storage-setup">S3 Storage Setup</a></li> 556 + <li><a href="#production-checklist" id="toc-production-checklist" class="nav-link" data-scroll-target="#production-checklist">Production Checklist</a></li> 557 + </ul></li> 558 + <li><a href="#atproto-deployment" id="toc-atproto-deployment" class="nav-link" data-scroll-target="#atproto-deployment">ATProto Deployment</a> 559 + <ul class="collapse"> 560 + <li><a href="#account-setup" id="toc-account-setup" class="nav-link" data-scroll-target="#account-setup">Account Setup</a></li> 561 + <li><a href="#authentication-patterns" id="toc-authentication-patterns" class="nav-link" data-scroll-target="#authentication-patterns">Authentication Patterns</a></li> 562 + <li><a href="#custom-pds-deployment" id="toc-custom-pds-deployment" class="nav-link" data-scroll-target="#custom-pds-deployment">Custom PDS Deployment</a></li> 563 + <li><a href="#rate-limiting-considerations" id="toc-rate-limiting-considerations" class="nav-link" data-scroll-target="#rate-limiting-considerations">Rate Limiting 
Considerations</a></li> 564 + </ul></li> 565 + <li><a href="#docker-compose-example" id="toc-docker-compose-example" class="nav-link" data-scroll-target="#docker-compose-example">Docker Compose Example</a></li> 566 + <li><a href="#monitoring" id="toc-monitoring" class="nav-link" data-scroll-target="#monitoring">Monitoring</a> 567 + <ul class="collapse"> 568 + <li><a href="#redis-metrics" id="toc-redis-metrics" class="nav-link" data-scroll-target="#redis-metrics">Redis Metrics</a></li> 569 + <li><a href="#s3-metrics" id="toc-s3-metrics" class="nav-link" data-scroll-target="#s3-metrics">S3 Metrics</a></li> 570 + </ul></li> 571 + <li><a href="#security-best-practices" id="toc-security-best-practices" class="nav-link" data-scroll-target="#security-best-practices">Security Best Practices</a> 572 + <ul class="collapse"> 573 + <li><a href="#s3-iam-policy-example" id="toc-s3-iam-policy-example" class="nav-link" data-scroll-target="#s3-iam-policy-example">S3 IAM Policy Example</a></li> 574 + </ul></li> 575 + </ul> 576 + <div class="toc-actions"><ul><li><a href="https://github.com/your-org/atdata/edit/main/reference/deployment.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li><li><a href="https://github.com/your-org/atdata/issues/new" class="toc-action"><i class="bi empty"></i>Report an issue</a></li></ul></div></nav> 577 + </div> 578 + <!-- main --> 579 + <main class="content" id="quarto-document-content"> 580 + 581 + 582 + <header id="title-block-header" class="quarto-title-block default"><nav class="quarto-page-breadcrumbs quarto-title-breadcrumbs d-none d-lg-block" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="../reference/packable-samples.html">Reference</a></li><li class="breadcrumb-item"><a href="../reference/deployment.html">Deployment Guide</a></li></ol></nav> 583 + <div class="quarto-title"> 584 + <h1 class="title">Deployment Guide</h1> 585 + </div> 586 + 587 + <div> 588 + <div class="description"> 589 
+ Production deployment for local storage and ATProto integration 590 + </div> 591 + </div> 592 + 593 + 594 + <div class="quarto-title-meta"> 595 + 596 + 597 + 598 + 599 + </div> 600 + 601 + 602 + 603 + </header> 604 + 605 + 606 + <p>This guide covers deploying atdata in production environments, including Redis setup for LocalIndex, S3 storage configuration, and ATProto publishing considerations.</p> 607 + <section id="local-storage-deployment" class="level2"> 608 + <h2 class="anchored" data-anchor-id="local-storage-deployment">Local Storage Deployment</h2> 609 + <p>The local storage backend uses Redis for metadata indexing and S3-compatible storage for dataset files.</p> 610 + <section id="redis-setup" class="level3"> 611 + <h3 class="anchored" data-anchor-id="redis-setup">Redis Setup</h3> 612 + <section id="requirements" class="level4"> 613 + <h4 class="anchored" data-anchor-id="requirements">Requirements</h4> 614 + <ul> 615 + <li>Redis 6.0+ (for Redis-OM compatibility)</li> 616 + <li>Sufficient memory for index metadata (typically &lt; 100MB for most deployments)</li> 617 + </ul> 618 + </section> 619 + <section id="docker-deployment" class="level4"> 620 + <h4 class="anchored" data-anchor-id="docker-deployment">Docker Deployment</h4> 621 + <div class="sourceCode" id="cb1"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Basic Redis</span></span> 622 + <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="ex">docker</span> run <span class="at">-d</span> <span class="dt">\</span></span> 623 + <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a> <span class="at">--name</span> atdata-redis <span class="dt">\</span></span> 624 + <span id="cb1-4"><a href="#cb1-4" aria-hidden="true" tabindex="-1"></a> <span class="at">-p</span> 6379:6379 <span class="dt">\</span></span> 625 + <span id="cb1-5"><a href="#cb1-5" 
aria-hidden="true" tabindex="-1"></a> <span class="at">-v</span> redis-data:/data <span class="dt">\</span></span> 626 + <span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> redis:7-alpine <span class="dt">\</span></span> 627 + <span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> redis-server <span class="at">--appendonly</span> yes</span> 628 + <span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a></span> 629 + <span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a><span class="co"># With password</span></span> 630 + <span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a><span class="ex">docker</span> run <span class="at">-d</span> <span class="dt">\</span></span> 631 + <span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> <span class="at">--name</span> atdata-redis <span class="dt">\</span></span> 632 + <span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> <span class="at">-p</span> 6379:6379 <span class="dt">\</span></span> 633 + <span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a> <span class="at">-v</span> redis-data:/data <span class="dt">\</span></span> 634 + <span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a> redis:7-alpine <span class="dt">\</span></span> 635 + <span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a> redis-server <span class="at">--appendonly</span> yes <span class="at">--requirepass</span> yourpassword</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 636 + </section> 637 + <section id="configuration" class="level4"> 638 + <h4 class="anchored" data-anchor-id="configuration">Configuration</h4> 639 + <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span 
class="im">from</span> redis <span class="im">import</span> Redis</span> 640 + <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 641 + <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span> 642 + <span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Basic connection</span></span> 643 + <span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a>redis <span class="op">=</span> Redis(host<span class="op">=</span><span class="st">"localhost"</span>, port<span class="op">=</span><span class="dv">6379</span>)</span> 644 + <span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> LocalIndex(redis<span class="op">=</span>redis)</span> 645 + <span id="cb2-7"><a href="#cb2-7" aria-hidden="true" tabindex="-1"></a></span> 646 + <span id="cb2-8"><a href="#cb2-8" aria-hidden="true" tabindex="-1"></a><span class="co"># With authentication</span></span> 647 + <span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>redis <span class="op">=</span> Redis(</span> 648 + <span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a> host<span class="op">=</span><span class="st">"redis.example.com"</span>,</span> 649 + <span id="cb2-11"><a href="#cb2-11" aria-hidden="true" tabindex="-1"></a> port<span class="op">=</span><span class="dv">6379</span>,</span> 650 + <span id="cb2-12"><a href="#cb2-12" aria-hidden="true" tabindex="-1"></a> password<span class="op">=</span><span class="st">"yourpassword"</span>,</span> 651 + <span id="cb2-13"><a href="#cb2-13" aria-hidden="true" tabindex="-1"></a> ssl<span class="op">=</span><span class="va">True</span>, <span class="co"># For production</span></span> 652 + <span id="cb2-14"><a href="#cb2-14" aria-hidden="true" tabindex="-1"></a>)</span> 653 + <span id="cb2-15"><a href="#cb2-15" 
aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> LocalIndex(redis<span class="op">=</span>redis)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 654 + </section> 655 + <section id="redis-clustering" class="level4"> 656 + <h4 class="anchored" data-anchor-id="redis-clustering">Redis Clustering</h4> 657 + <p>For high-availability deployments:</p> 658 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> redis.cluster <span class="im">import</span> RedisCluster</span> 659 + <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 660 + <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Redis Cluster connection</span></span> 661 + <span id="cb3-4"><a href="#cb3-4" aria-hidden="true" tabindex="-1"></a>redis <span class="op">=</span> RedisCluster(</span> 662 + <span id="cb3-5"><a href="#cb3-5" aria-hidden="true" tabindex="-1"></a> host<span class="op">=</span><span class="st">"redis-cluster.example.com"</span>,</span> 663 + <span id="cb3-6"><a href="#cb3-6" aria-hidden="true" tabindex="-1"></a> port<span class="op">=</span><span class="dv">6379</span>,</span> 664 + <span id="cb3-7"><a href="#cb3-7" aria-hidden="true" tabindex="-1"></a> password<span class="op">=</span><span class="st">"yourpassword"</span>,</span> 665 + <span id="cb3-8"><a href="#cb3-8" aria-hidden="true" tabindex="-1"></a>)</span> 666 + <span id="cb3-9"><a href="#cb3-9" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> LocalIndex(redis<span class="op">=</span>redis)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 667 + <div class="callout callout-style-default callout-note callout-titled"> 668 + <div class="callout-header 
d-flex align-content-center"> 669 + <div class="callout-icon-container"> 670 + <i class="callout-icon"></i> 671 + </div> 672 + <div class="callout-title-container flex-fill"> 673 + Note 674 + </div> 675 + </div> 676 + <div class="callout-body-container callout-body"> 677 + <p>Redis-OM (used internally) supports Redis Cluster mode. Ensure all nodes have the same configuration.</p> 678 + </div> 679 + </div> 680 + </section> 681 + </section> 682 + <section id="s3-storage-setup" class="level3"> 683 + <h3 class="anchored" data-anchor-id="s3-storage-setup">S3 Storage Setup</h3> 684 + <section id="aws-s3" class="level4"> 685 + <h4 class="anchored" data-anchor-id="aws-s3">AWS S3</h4> 686 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> S3DataStore</span> 687 + <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 688 + <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Using environment credentials (recommended for AWS)</span></span> 689 + <span id="cb4-4"><a href="#cb4-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY</span></span> 690 + <span id="cb4-5"><a href="#cb4-5" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> S3DataStore(</span> 691 + <span id="cb4-6"><a href="#cb4-6" aria-hidden="true" tabindex="-1"></a> bucket<span class="op">=</span><span class="st">"my-atdata-bucket"</span>,</span> 692 + <span id="cb4-7"><a href="#cb4-7" aria-hidden="true" tabindex="-1"></a> prefix<span class="op">=</span><span class="st">"datasets/"</span>,</span> 693 + <span id="cb4-8"><a href="#cb4-8" aria-hidden="true" tabindex="-1"></a>)</span> 694 + <span id="cb4-9"><a href="#cb4-9" aria-hidden="true" tabindex="-1"></a></span> 695 + <span 
id="cb4-10"><a href="#cb4-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Explicit credentials</span></span> 696 + <span id="cb4-11"><a href="#cb4-11" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> S3DataStore(</span> 697 + <span id="cb4-12"><a href="#cb4-12" aria-hidden="true" tabindex="-1"></a> bucket<span class="op">=</span><span class="st">"my-atdata-bucket"</span>,</span> 698 + <span id="cb4-13"><a href="#cb4-13" aria-hidden="true" tabindex="-1"></a> prefix<span class="op">=</span><span class="st">"datasets/"</span>,</span> 699 + <span id="cb4-14"><a href="#cb4-14" aria-hidden="true" tabindex="-1"></a> credentials<span class="op">=</span>{</span> 700 + <span id="cb4-15"><a href="#cb4-15" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_ACCESS_KEY_ID"</span>: <span class="st">"..."</span>,</span> 701 + <span id="cb4-16"><a href="#cb4-16" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_SECRET_ACCESS_KEY"</span>: <span class="st">"..."</span>,</span> 702 + <span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_DEFAULT_REGION"</span>: <span class="st">"us-west-2"</span>,</span> 703 + <span id="cb4-18"><a href="#cb4-18" aria-hidden="true" tabindex="-1"></a> },</span> 704 + <span id="cb4-19"><a href="#cb4-19" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 705 + </section> 706 + <section id="s3-compatible-storage-minio-cloudflare-r2-etc." 
class="level4"> 707 + <h4 class="anchored" data-anchor-id="s3-compatible-storage-minio-cloudflare-r2-etc.">S3-Compatible Storage (MinIO, Cloudflare R2, etc.)</h4> 708 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> S3DataStore(</span> 709 + <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> bucket<span class="op">=</span><span class="st">"my-bucket"</span>,</span> 710 + <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> prefix<span class="op">=</span><span class="st">"datasets/"</span>,</span> 711 + <span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a> endpoint_url<span class="op">=</span><span class="st">"https://s3.example.com"</span>,</span> 712 + <span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a> credentials<span class="op">=</span>{</span> 713 + <span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_ACCESS_KEY_ID"</span>: <span class="st">"..."</span>,</span> 714 + <span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_SECRET_ACCESS_KEY"</span>: <span class="st">"..."</span>,</span> 715 + <span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a> },</span> 716 + <span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 717 + </section> 718 + <section id="minio-deployment" class="level4"> 719 + <h4 class="anchored" data-anchor-id="minio-deployment">MinIO Deployment</h4> 720 + <div class="sourceCode" id="cb6"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Docker deployment</span></span> 
721 + <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="ex">docker</span> run <span class="at">-d</span> <span class="dt">\</span></span> 722 + <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> <span class="at">--name</span> minio <span class="dt">\</span></span> 723 + <span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> <span class="at">-p</span> 9000:9000 <span class="dt">\</span></span> 724 + <span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> <span class="at">-p</span> 9001:9001 <span class="dt">\</span></span> 725 + <span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> <span class="at">-v</span> minio-data:/data <span class="dt">\</span></span> 726 + <span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> <span class="at">-e</span> MINIO_ROOT_USER=minioadmin <span class="dt">\</span></span> 727 + <span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> <span class="at">-e</span> MINIO_ROOT_PASSWORD=minioadmin <span class="dt">\</span></span> 728 + <span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a> minio/minio server /data <span class="at">--console-address</span> <span class="st">":9001"</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 729 + <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> S3DataStore(</span> 730 + <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> bucket<span class="op">=</span><span class="st">"atdata"</span>,</span> 731 + <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> endpoint_url<span class="op">=</span><span class="st">"http://localhost:9000"</span>,</span> 732 + <span id="cb7-4"><a href="#cb7-4" 
aria-hidden="true" tabindex="-1"></a> credentials<span class="op">=</span>{</span> 733 + <span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_ACCESS_KEY_ID"</span>: <span class="st">"minioadmin"</span>,</span> 734 + <span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_SECRET_ACCESS_KEY"</span>: <span class="st">"minioadmin"</span>,</span> 735 + <span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> },</span> 736 + <span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 737 + </section> 738 + </section> 739 + <section id="production-checklist" class="level3"> 740 + <h3 class="anchored" data-anchor-id="production-checklist">Production Checklist</h3> 741 + <ul class="task-list"> 742 + <li><label><input type="checkbox">Redis persistence enabled (<code>appendonly yes</code>)</label></li> 743 + <li><label><input type="checkbox">Redis password authentication configured</label></li> 744 + <li><label><input type="checkbox">Redis TLS enabled for remote connections</label></li> 745 + <li><label><input type="checkbox">S3 bucket access policies configured (least privilege)</label></li> 746 + <li><label><input type="checkbox">S3 bucket versioning enabled (for data recovery)</label></li> 747 + <li><label><input type="checkbox">Monitoring for Redis memory usage</label></li> 748 + <li><label><input type="checkbox">Backup strategy for Redis data</label></li> 749 + </ul> 750 + </section> 751 + </section> 752 + <section id="atproto-deployment" class="level2"> 753 + <h2 class="anchored" data-anchor-id="atproto-deployment">ATProto Deployment</h2> 754 + <section id="account-setup" class="level3"> 755 + <h3 class="anchored" data-anchor-id="account-setup">Account Setup</h3> 756 + <ol type="1"> 757 + <li>Create a Bluesky account or use your existing account</li> 758 
+ <li>Generate an app-specific password at <a href="https://bsky.app/settings/app-passwords">bsky.app/settings/app-passwords</a></li> 759 + <li>Never use your main account password in code</li> 760 + </ol> 761 + <div class="callout callout-style-default callout-warning callout-titled"> 762 + <div class="callout-header d-flex align-content-center"> 763 + <div class="callout-icon-container"> 764 + <i class="callout-icon"></i> 765 + </div> 766 + <div class="callout-title-container flex-fill"> 767 + Warning 768 + </div> 769 + </div> 770 + <div class="callout-body-container callout-body"> 771 + <p><strong>Security</strong>: Always use app passwords, never your main password. App passwords can be revoked without affecting your account.</p> 772 + </div> 773 + </div> 774 + </section> 775 + <section id="authentication-patterns" class="level3"> 776 + <h3 class="anchored" data-anchor-id="authentication-patterns">Authentication Patterns</h3> 777 + <section id="environment-variables-recommended" class="level4"> 778 + <h4 class="anchored" data-anchor-id="environment-variables-recommended">Environment Variables (Recommended)</h4> 779 + <div class="sourceCode" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> os</span> 780 + <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient</span> 781 + <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a></span> 782 + <span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 783 + <span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a>client.login(</span> 784 + <span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a> os.environ[<span class="st">"ATPROTO_HANDLE"</span>],</span> 
785 + <span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> os.environ[<span class="st">"ATPROTO_APP_PASSWORD"</span>],</span> 786 + <span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 787 + </section> 788 + <section id="session-persistence" class="level4"> 789 + <h4 class="anchored" data-anchor-id="session-persistence">Session Persistence</h4> 790 + <p>For long-running services, persist the session and reuse it on restart instead of logging in again:</p> 791 + <div class="sourceCode" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> os</span> 792 + <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> pathlib <span class="im">import</span> Path</span> 793 + <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient</span> 794 + <span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a>SESSION_FILE <span class="op">=</span> Path(<span class="st">"~/.atdata/session"</span>).expanduser()</span> 795 + <span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a></span> 796 + <span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 797 + <span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a></span> 798 + <span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> SESSION_FILE.exists():</span> 799 + <span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a> <span class="co"># Restore existing session</span></span> 800 + <span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a> session_string <span class="op">=</span> SESSION_FILE.read_text()</span> 801 + <span id="cb9-11"><a href="#cb9-11" 
aria-hidden="true" tabindex="-1"></a> <span class="cf">try</span>:</span> 802 + <span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a> client.login_with_session(session_string)</span> 803 + <span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a> <span class="cf">except</span> <span class="pp">Exception</span>:</span> 804 + <span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a> <span class="co"># Session expired or invalid: re-authenticate</span></span> 805 + <span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a> client.login(os.environ[<span class="st">"ATPROTO_HANDLE"</span>], os.environ[<span class="st">"ATPROTO_APP_PASSWORD"</span>])</span> 806 + <span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a> SESSION_FILE.parent.mkdir(parents<span class="op">=</span><span class="va">True</span>, exist_ok<span class="op">=</span><span class="va">True</span>)</span> 807 + <span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a> SESSION_FILE.write_text(client.export_session())</span> 808 + <span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a><span class="cf">else</span>:</span> 809 + <span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a> <span class="co"># Initial login</span></span> 810 + <span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a> client.login(os.environ[<span class="st">"ATPROTO_HANDLE"</span>], os.environ[<span class="st">"ATPROTO_APP_PASSWORD"</span>])</span> 811 + <span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a> SESSION_FILE.parent.mkdir(parents<span class="op">=</span><span class="va">True</span>, exist_ok<span class="op">=</span><span class="va">True</span>)</span> 812 + <span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a> SESSION_FILE.write_text(client.export_session())</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 813 + </section> 814 + </section> 815 + <section id="custom-pds-deployment" class="level3"> 816 + <h3 class="anchored" 
data-anchor-id="custom-pds-deployment">Custom PDS Deployment</h3> 817 + <p>For self-hosted ATProto infrastructure:</p> 818 + <div class="sourceCode" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient(base_url<span class="op">=</span><span class="st">"https://pds.example.com"</span>)</span> 819 + <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"handle.example.com"</span>, <span class="st">"app-password"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 820 + <p>See <a href="https://github.com/bluesky-social/pds">ATProto PDS documentation</a> for self-hosting setup.</p> 821 + </section> 822 + <section id="rate-limiting-considerations" class="level3"> 823 + <h3 class="anchored" data-anchor-id="rate-limiting-considerations">Rate Limiting Considerations</h3> 824 + <p>ATProto has rate limits. 
For bulk operations:</p> 825 + <ul> 826 + <li>Space out record creation (1-2 per second for bulk uploads)</li> 827 + <li>Use batch operations where available</li> 828 + <li>Implement exponential backoff for retries</li> 829 + <li>Consider blob storage limits (~50MB per blob)</li> 830 + </ul> 831 + <div class="sourceCode" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> time</span> 832 + <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a></span> 833 + <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> i, dataset <span class="kw">in</span> <span class="bu">enumerate</span>(datasets_to_publish):</span> 834 + <span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> index.insert_dataset(dataset, name<span class="op">=</span><span class="ss">f"dataset-</span><span class="sc">{</span>i<span class="sc">}</span><span class="ss">"</span>, ...)</span> 835 + <span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> time.sleep(<span class="dv">1</span>) <span class="co"># Rate limiting</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 836 + </section> 837 + </section> 838 + <section id="docker-compose-example" class="level2"> 839 + <h2 class="anchored" data-anchor-id="docker-compose-example">Docker Compose Example</h2> 840 + <p>Complete local deployment with Redis and MinIO:</p> 841 + <div class="sourceCode" id="cb12"><pre class="sourceCode yaml code-with-copy"><code class="sourceCode yaml"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="co"># docker-compose.yml</span></span> 842 + <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="fu">version</span><span class="kw">:</span><span class="at"> 
</span><span class="st">'3.8'</span></span> 843 + <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a></span> 844 + <span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a><span class="fu">services</span><span class="kw">:</span></span> 845 + <span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">redis</span><span class="kw">:</span></span> 846 + <span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">image</span><span class="kw">:</span><span class="at"> redis:7-alpine</span></span> 847 + <span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">command</span><span class="kw">:</span><span class="at"> redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}</span></span> 848 + <span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">ports</span><span class="kw">:</span></span> 849 + <span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="st">"6379:6379"</span></span> 850 + <span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">volumes</span><span class="kw">:</span></span> 851 + <span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> redis-data:/data</span></span> 852 + <span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a></span> 853 + <span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">minio</span><span class="kw">:</span></span> 854 + <span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span 
class="fu">image</span><span class="kw">:</span><span class="at"> minio/minio</span></span> 855 + <span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">command</span><span class="kw">:</span><span class="at"> server /data --console-address ":9001"</span></span> 856 + <span id="cb12-16"><a href="#cb12-16" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">ports</span><span class="kw">:</span></span> 857 + <span id="cb12-17"><a href="#cb12-17" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="st">"9000:9000"</span></span> 858 + <span id="cb12-18"><a href="#cb12-18" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> </span><span class="st">"9001:9001"</span></span> 859 + <span id="cb12-19"><a href="#cb12-19" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">environment</span><span class="kw">:</span></span> 860 + <span id="cb12-20"><a href="#cb12-20" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">MINIO_ROOT_USER</span><span class="kw">:</span><span class="at"> ${MINIO_USER}</span></span> 861 + <span id="cb12-21"><a href="#cb12-21" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">MINIO_ROOT_PASSWORD</span><span class="kw">:</span><span class="at"> ${MINIO_PASSWORD}</span></span> 862 + <span id="cb12-22"><a href="#cb12-22" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">volumes</span><span class="kw">:</span></span> 863 + <span id="cb12-23"><a href="#cb12-23" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="kw">-</span><span class="at"> minio-data:/data</span></span> 864 + <span id="cb12-24"><a href="#cb12-24" aria-hidden="true" tabindex="-1"></a></span> 865 + <span id="cb12-25"><a href="#cb12-25" 
aria-hidden="true" tabindex="-1"></a><span class="fu">volumes</span><span class="kw">:</span></span> 866 + <span id="cb12-26"><a href="#cb12-26" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">redis-data</span><span class="kw">:</span></span> 867 + <span id="cb12-27"><a href="#cb12-27" aria-hidden="true" tabindex="-1"></a><span class="at"> </span><span class="fu">minio-data</span><span class="kw">:</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 868 + <div class="sourceCode" id="cb13"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="co"># .env</span></span> 869 + <span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="va">REDIS_PASSWORD</span><span class="op">=</span>your-redis-password</span> 870 + <span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="va">MINIO_USER</span><span class="op">=</span>minioadmin</span> 871 + <span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a><span class="va">MINIO_PASSWORD</span><span class="op">=</span>your-minio-password</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 872 + </section> 873 + <section id="monitoring" class="level2"> 874 + <h2 class="anchored" data-anchor-id="monitoring">Monitoring</h2> 875 + <section id="redis-metrics" class="level3"> 876 + <h3 class="anchored" data-anchor-id="redis-metrics">Redis Metrics</h3> 877 + <p>Key metrics to monitor:</p> 878 + <ul> 879 + <li><code>used_memory</code>: Memory usage</li> 880 + <li><code>connected_clients</code>: Active connections</li> 881 + <li><code>keyspace_hits/misses</code>: Cache efficiency</li> 882 + <li><code>aof_last_write_status</code>: Persistence health</li> 883 + </ul> 884 + <div class="sourceCode" id="cb14"><pre 
class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="ex">redis-cli</span> INFO <span class="kw">|</span> <span class="fu">grep</span> <span class="at">-E</span> <span class="st">"used_memory|connected_clients|keyspace|aof_last_write_status"</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 885 + </section> 886 + <section id="s3-metrics" class="level3"> 887 + <h3 class="anchored" data-anchor-id="s3-metrics">S3 Metrics</h3> 888 + <ul> 889 + <li>Request counts and latency</li> 890 + <li>Error rates (4xx, 5xx)</li> 891 + <li>Storage usage by prefix</li> 892 + <li>Data transfer costs</li> 893 + </ul> 894 + </section> 895 + </section> 896 + <section id="security-best-practices" class="level2"> 897 + <h2 class="anchored" data-anchor-id="security-best-practices">Security Best Practices</h2> 898 + <ol type="1"> 899 + <li><strong>Network Isolation</strong>: Run Redis and S3 in private networks</li> 900 + <li><strong>TLS Everywhere</strong>: Encrypt connections to Redis and S3</li> 901 + <li><strong>Credential Rotation</strong>: Rotate API keys and passwords regularly</li> 902 + <li><strong>Access Logging</strong>: Enable S3 access logging for audit trails</li> 903 + <li><strong>Least Privilege</strong>: Use minimal IAM permissions for S3 access</li> 904 + </ol> 905 + <section id="s3-iam-policy-example" class="level3"> 906 + <h3 class="anchored" data-anchor-id="s3-iam-policy-example">S3 IAM Policy Example</h3> 907 + <div class="sourceCode" id="cb15"><pre class="sourceCode json code-with-copy"><code class="sourceCode json"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span></span> 908 + <span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a> <span class="dt">"Version"</span><span class="fu">:</span> <span class="st">"2012-10-17"</span><span class="fu">,</span></span> 
909 + <span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a> <span class="dt">"Statement"</span><span class="fu">:</span> <span class="ot">[</span></span> 910 + <span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a> <span class="fu">{</span></span> 911 + <span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a> <span class="dt">"Effect"</span><span class="fu">:</span> <span class="st">"Allow"</span><span class="fu">,</span></span> 912 + <span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a> <span class="dt">"Action"</span><span class="fu">:</span> <span class="ot">[</span></span> 913 + <span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a> <span class="st">"s3:GetObject"</span><span class="ot">,</span></span> 914 + <span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a> <span class="st">"s3:PutObject"</span><span class="ot">,</span></span> 915 + <span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a> <span class="st">"s3:ListBucket"</span></span> 916 + <span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a> <span class="ot">]</span><span class="fu">,</span></span> 917 + <span id="cb15-11"><a href="#cb15-11" aria-hidden="true" tabindex="-1"></a> <span class="dt">"Resource"</span><span class="fu">:</span> <span class="ot">[</span></span> 918 + <span id="cb15-12"><a href="#cb15-12" aria-hidden="true" tabindex="-1"></a> <span class="st">"arn:aws:s3:::my-atdata-bucket"</span><span class="ot">,</span></span> 919 + <span id="cb15-13"><a href="#cb15-13" aria-hidden="true" tabindex="-1"></a> <span class="st">"arn:aws:s3:::my-atdata-bucket/*"</span></span> 920 + <span id="cb15-14"><a href="#cb15-14" aria-hidden="true" tabindex="-1"></a> <span class="ot">]</span></span> 921 + <span id="cb15-15"><a href="#cb15-15" aria-hidden="true" tabindex="-1"></a> <span class="fu">}</span></span> 922 + <span id="cb15-16"><a href="#cb15-16" 
aria-hidden="true" tabindex="-1"></a> <span class="ot">]</span></span> 923 + <span id="cb15-17"><a href="#cb15-17" aria-hidden="true" tabindex="-1"></a><span class="fu">}</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 924 + 925 + 926 + </section> 927 + </section> 928 + 929 + </main> <!-- /main --> 930 + <script id="quarto-html-after-body" type="application/javascript"> 931 + window.document.addEventListener("DOMContentLoaded", function (event) { 932 + // Ensure there is a toggle, if there isn't float one in the top right 933 + if (window.document.querySelector('.quarto-color-scheme-toggle') === null) { 934 + const a = window.document.createElement('a'); 935 + a.classList.add('top-right'); 936 + a.classList.add('quarto-color-scheme-toggle'); 937 + a.href = ""; 938 + a.onclick = function() { try { window.quartoToggleColorScheme(); } catch {} return false; }; 939 + const i = window.document.createElement("i"); 940 + i.classList.add('bi'); 941 + a.appendChild(i); 942 + window.document.body.appendChild(a); 943 + } 944 + setColorSchemeToggle(hasAlternateSentinel()) 945 + const icon = ""; 946 + const anchorJS = new window.AnchorJS(); 947 + anchorJS.options = { 948 + placement: 'right', 949 + icon: icon 950 + }; 951 + anchorJS.add('.anchored'); 952 + const isCodeAnnotation = (el) => { 953 + for (const clz of el.classList) { 954 + if (clz.startsWith('code-annotation-')) { 955 + return true; 956 + } 957 + } 958 + return false; 959 + } 960 + const onCopySuccess = function(e) { 961 + // button target 962 + const button = e.trigger; 963 + // don't keep focus 964 + button.blur(); 965 + // flash "checked" 966 + button.classList.add('code-copy-button-checked'); 967 + var currentTitle = button.getAttribute("title"); 968 + button.setAttribute("title", "Copied!"); 969 + let tooltip; 970 + if (window.bootstrap) { 971 + button.setAttribute("data-bs-toggle", "tooltip"); 972 + 
button.setAttribute("data-bs-placement", "left"); 973 + button.setAttribute("data-bs-title", "Copied!"); 974 + tooltip = new bootstrap.Tooltip(button, 975 + { trigger: "manual", 976 + customClass: "code-copy-button-tooltip", 977 + offset: [0, -8]}); 978 + tooltip.show(); 979 + } 980 + setTimeout(function() { 981 + if (tooltip) { 982 + tooltip.hide(); 983 + button.removeAttribute("data-bs-title"); 984 + button.removeAttribute("data-bs-toggle"); 985 + button.removeAttribute("data-bs-placement"); 986 + } 987 + button.setAttribute("title", currentTitle); 988 + button.classList.remove('code-copy-button-checked'); 989 + }, 1000); 990 + // clear code selection 991 + e.clearSelection(); 992 + } 993 + const getTextToCopy = function(trigger) { 994 + const codeEl = trigger.previousElementSibling.cloneNode(true); 995 + for (const childEl of codeEl.children) { 996 + if (isCodeAnnotation(childEl)) { 997 + childEl.remove(); 998 + } 999 + } 1000 + return codeEl.innerText; 1001 + } 1002 + const clipboard = new window.ClipboardJS('.code-copy-button:not([data-in-quarto-modal])', { 1003 + text: getTextToCopy 1004 + }); 1005 + clipboard.on('success', onCopySuccess); 1006 + if (window.document.getElementById('quarto-embedded-source-code-modal')) { 1007 + const clipboardModal = new window.ClipboardJS('.code-copy-button[data-in-quarto-modal]', { 1008 + text: getTextToCopy, 1009 + container: window.document.getElementById('quarto-embedded-source-code-modal') 1010 + }); 1011 + clipboardModal.on('success', onCopySuccess); 1012 + } 1013 + var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//); 1014 + var mailtoRegex = new RegExp(/^mailto:/); 1015 + var filterRegex = new RegExp("https:\/\/github\.com\/your-org\/atdata"); 1016 + var isInternal = (href) => { 1017 + return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href); 1018 + } 1019 + // Inspect non-navigation links and adorn them if external 1020 + var links = 
window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item):not(.quarto-navigation-tool):not(.about-link)'); 1021 + for (var i=0; i<links.length; i++) { 1022 + const link = links[i]; 1023 + if (!isInternal(link.href)) { 1024 + // undo the damage that might have been done by quarto-nav.js in the case of 1025 + // links that we want to consider external 1026 + if (link.dataset.originalHref !== undefined) { 1027 + link.href = link.dataset.originalHref; 1028 + } 1029 + } 1030 + } 1031 + function tippyHover(el, contentFn, onTriggerFn, onUntriggerFn) { 1032 + const config = { 1033 + allowHTML: true, 1034 + maxWidth: 500, 1035 + delay: 100, 1036 + arrow: false, 1037 + appendTo: function(el) { 1038 + return el.parentElement; 1039 + }, 1040 + interactive: true, 1041 + interactiveBorder: 10, 1042 + theme: 'quarto', 1043 + placement: 'bottom-start', 1044 + }; 1045 + if (contentFn) { 1046 + config.content = contentFn; 1047 + } 1048 + if (onTriggerFn) { 1049 + config.onTrigger = onTriggerFn; 1050 + } 1051 + if (onUntriggerFn) { 1052 + config.onUntrigger = onUntriggerFn; 1053 + } 1054 + window.tippy(el, config); 1055 + } 1056 + const noterefs = window.document.querySelectorAll('a[role="doc-noteref"]'); 1057 + for (var i=0; i<noterefs.length; i++) { 1058 + const ref = noterefs[i]; 1059 + tippyHover(ref, function() { 1060 + // use id or data attribute instead here 1061 + let href = ref.getAttribute('data-footnote-href') || ref.getAttribute('href'); 1062 + try { href = new URL(href).hash; } catch {} 1063 + const id = href.replace(/^#\/?/, ""); 1064 + const note = window.document.getElementById(id); 1065 + if (note) { 1066 + return note.innerHTML; 1067 + } else { 1068 + return ""; 1069 + } 1070 + }); 1071 + } 1072 + const xrefs = window.document.querySelectorAll('a.quarto-xref'); 1073 + const processXRef = (id, note) => { 1074 + 
// Strip column container classes 1075 + const stripColumnClz = (el) => { 1076 + el.classList.remove("page-full", "page-columns"); 1077 + if (el.children) { 1078 + for (const child of el.children) { 1079 + stripColumnClz(child); 1080 + } 1081 + } 1082 + } 1083 + stripColumnClz(note) 1084 + if (id === null || id.startsWith('sec-')) { 1085 + // Special case sections, only their first couple elements 1086 + const container = document.createElement("div"); 1087 + if (note.children && note.children.length > 2) { 1088 + container.appendChild(note.children[0].cloneNode(true)); 1089 + for (let i = 1; i < note.children.length; i++) { 1090 + const child = note.children[i]; 1091 + if (child.tagName === "P" && child.innerText === "") { 1092 + continue; 1093 + } else { 1094 + container.appendChild(child.cloneNode(true)); 1095 + break; 1096 + } 1097 + } 1098 + if (window.Quarto?.typesetMath) { 1099 + window.Quarto.typesetMath(container); 1100 + } 1101 + return container.innerHTML 1102 + } else { 1103 + if (window.Quarto?.typesetMath) { 1104 + window.Quarto.typesetMath(note); 1105 + } 1106 + return note.innerHTML; 1107 + } 1108 + } else { 1109 + // Remove any anchor links if they are present 1110 + const anchorLink = note.querySelector('a.anchorjs-link'); 1111 + if (anchorLink) { 1112 + anchorLink.remove(); 1113 + } 1114 + if (window.Quarto?.typesetMath) { 1115 + window.Quarto.typesetMath(note); 1116 + } 1117 + if (note.classList.contains("callout")) { 1118 + return note.outerHTML; 1119 + } else { 1120 + return note.innerHTML; 1121 + } 1122 + } 1123 + } 1124 + for (var i=0; i<xrefs.length; i++) { 1125 + const xref = xrefs[i]; 1126 + tippyHover(xref, undefined, function(instance) { 1127 + instance.disable(); 1128 + let url = xref.getAttribute('href'); 1129 + let hash = undefined; 1130 + if (url.startsWith('#')) { 1131 + hash = url; 1132 + } else { 1133 + try { hash = new URL(url).hash; } catch {} 1134 + } 1135 + if (hash) { 1136 + const id = hash.replace(/^#\/?/, ""); 1137 + const 
note = window.document.getElementById(id); 1138 + if (note !== null) { 1139 + try { 1140 + const html = processXRef(id, note.cloneNode(true)); 1141 + instance.setContent(html); 1142 + } finally { 1143 + instance.enable(); 1144 + instance.show(); 1145 + } 1146 + } else { 1147 + // See if we can fetch this 1148 + fetch(url.split('#')[0]) 1149 + .then(res => res.text()) 1150 + .then(html => { 1151 + const parser = new DOMParser(); 1152 + const htmlDoc = parser.parseFromString(html, "text/html"); 1153 + const note = htmlDoc.getElementById(id); 1154 + if (note !== null) { 1155 + const html = processXRef(id, note); 1156 + instance.setContent(html); 1157 + } 1158 + }).finally(() => { 1159 + instance.enable(); 1160 + instance.show(); 1161 + }); 1162 + } 1163 + } else { 1164 + // See if we can fetch a full url (with no hash to target) 1165 + // This is a special case and we should probably do some content thinning / targeting 1166 + fetch(url) 1167 + .then(res => res.text()) 1168 + .then(html => { 1169 + const parser = new DOMParser(); 1170 + const htmlDoc = parser.parseFromString(html, "text/html"); 1171 + const note = htmlDoc.querySelector('main.content'); 1172 + if (note !== null) { 1173 + // This should only happen for chapter cross references 1174 + // (since there is no id in the URL) 1175 + // remove the first header 1176 + if (note.children.length > 0 && note.children[0].tagName === "HEADER") { 1177 + note.children[0].remove(); 1178 + } 1179 + const html = processXRef(null, note); 1180 + instance.setContent(html); 1181 + } 1182 + }).finally(() => { 1183 + instance.enable(); 1184 + instance.show(); 1185 + }); 1186 + } 1187 + }, function(instance) { 1188 + }); 1189 + } 1190 + let selectedAnnoteEl; 1191 + const selectorForAnnotation = ( cell, annotation) => { 1192 + let cellAttr = 'data-code-cell="' + cell + '"'; 1193 + let lineAttr = 'data-code-annotation="' + annotation + '"'; 1194 + const selector = 'span[' + cellAttr + '][' + lineAttr + ']'; 1195 + return selector; 
1196 + } 1197 + const selectCodeLines = (annoteEl) => { 1198 + const doc = window.document; 1199 + const targetCell = annoteEl.getAttribute("data-target-cell"); 1200 + const targetAnnotation = annoteEl.getAttribute("data-target-annotation"); 1201 + const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation)); 1202 + const lines = annoteSpan.getAttribute("data-code-lines").split(","); 1203 + const lineIds = lines.map((line) => { 1204 + return targetCell + "-" + line; 1205 + }) 1206 + let top = null; 1207 + let height = null; 1208 + let parent = null; 1209 + if (lineIds.length > 0) { 1210 + //compute the position of the single el (top and bottom and make a div) 1211 + const el = window.document.getElementById(lineIds[0]); 1212 + top = el.offsetTop; 1213 + height = el.offsetHeight; 1214 + parent = el.parentElement.parentElement; 1215 + if (lineIds.length > 1) { 1216 + const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]); 1217 + const bottom = lastEl.offsetTop + lastEl.offsetHeight; 1218 + height = bottom - top; 1219 + } 1220 + if (top !== null && height !== null && parent !== null) { 1221 + // cook up a div (if necessary) and position it 1222 + let div = window.document.getElementById("code-annotation-line-highlight"); 1223 + if (div === null) { 1224 + div = window.document.createElement("div"); 1225 + div.setAttribute("id", "code-annotation-line-highlight"); 1226 + div.style.position = 'absolute'; 1227 + parent.appendChild(div); 1228 + } 1229 + div.style.top = top - 2 + "px"; 1230 + div.style.height = height + 4 + "px"; 1231 + div.style.left = 0; 1232 + let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter"); 1233 + if (gutterDiv === null) { 1234 + gutterDiv = window.document.createElement("div"); 1235 + gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter"); 1236 + gutterDiv.style.position = 'absolute'; 1237 + const codeCell = 
window.document.getElementById(targetCell); 1238 + const gutter = codeCell.querySelector('.code-annotation-gutter'); 1239 + gutter.appendChild(gutterDiv); 1240 + } 1241 + gutterDiv.style.top = top - 2 + "px"; 1242 + gutterDiv.style.height = height + 4 + "px"; 1243 + } 1244 + selectedAnnoteEl = annoteEl; 1245 + } 1246 + }; 1247 + const unselectCodeLines = () => { 1248 + const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"]; 1249 + elementsIds.forEach((elId) => { 1250 + const div = window.document.getElementById(elId); 1251 + if (div) { 1252 + div.remove(); 1253 + } 1254 + }); 1255 + selectedAnnoteEl = undefined; 1256 + }; 1257 + // Handle positioning of the toggle 1258 + window.addEventListener( 1259 + "resize", 1260 + throttle(() => { 1261 + elRect = undefined; 1262 + if (selectedAnnoteEl) { 1263 + selectCodeLines(selectedAnnoteEl); 1264 + } 1265 + }, 10) 1266 + ); 1267 + function throttle(fn, ms) { 1268 + let throttle = false; 1269 + let timer; 1270 + return (...args) => { 1271 + if(!throttle) { // first call gets through 1272 + fn.apply(this, args); 1273 + throttle = true; 1274 + } else { // all the others get throttled 1275 + if(timer) clearTimeout(timer); // cancel #2 1276 + timer = setTimeout(() => { 1277 + fn.apply(this, args); 1278 + timer = throttle = false; 1279 + }, ms); 1280 + } 1281 + }; 1282 + } 1283 + // Attach click handler to the DT 1284 + const annoteDls = window.document.querySelectorAll('dt[data-target-cell]'); 1285 + for (const annoteDlNode of annoteDls) { 1286 + annoteDlNode.addEventListener('click', (event) => { 1287 + const clickedEl = event.target; 1288 + if (clickedEl !== selectedAnnoteEl) { 1289 + unselectCodeLines(); 1290 + const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active'); 1291 + if (activeEl) { 1292 + activeEl.classList.remove('code-annotation-active'); 1293 + } 1294 + selectCodeLines(clickedEl); 1295 + 
clickedEl.classList.add('code-annotation-active'); 1296 + } else { 1297 + // Unselect the line 1298 + unselectCodeLines(); 1299 + clickedEl.classList.remove('code-annotation-active'); 1300 + } 1301 + }); 1302 + } 1303 + const findCites = (el) => { 1304 + const parentEl = el.parentElement; 1305 + if (parentEl) { 1306 + const cites = parentEl.dataset.cites; 1307 + if (cites) { 1308 + return { 1309 + el, 1310 + cites: cites.split(' ') 1311 + }; 1312 + } else { 1313 + return findCites(el.parentElement) 1314 + } 1315 + } else { 1316 + return undefined; 1317 + } 1318 + }; 1319 + var bibliorefs = window.document.querySelectorAll('a[role="doc-biblioref"]'); 1320 + for (var i=0; i<bibliorefs.length; i++) { 1321 + const ref = bibliorefs[i]; 1322 + const citeInfo = findCites(ref); 1323 + if (citeInfo) { 1324 + tippyHover(citeInfo.el, function() { 1325 + var popup = window.document.createElement('div'); 1326 + citeInfo.cites.forEach(function(cite) { 1327 + var citeDiv = window.document.createElement('div'); 1328 + citeDiv.classList.add('hanging-indent'); 1329 + citeDiv.classList.add('csl-entry'); 1330 + var biblioDiv = window.document.getElementById('ref-' + cite); 1331 + if (biblioDiv) { 1332 + citeDiv.innerHTML = biblioDiv.innerHTML; 1333 + } 1334 + popup.appendChild(citeDiv); 1335 + }); 1336 + return popup.innerHTML; 1337 + }); 1338 + } 1339 + } 1340 + }); 1341 + </script> 1342 + </div> <!-- /content --> 1343 + <footer class="footer"> 1344 + <div class="nav-footer"> 1345 + <div class="nav-footer-left"> 1346 + <p>Built with <a href="https://quarto.org/">Quarto</a></p> 1347 + </div> 1348 + <div class="nav-footer-center"> 1349 + &nbsp; 1350 + <div class="toc-actions d-sm-block d-md-none"><ul><li><a href="https://github.com/your-org/atdata/edit/main/reference/deployment.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li><li><a href="https://github.com/your-org/atdata/issues/new" class="toc-action"><i class="bi empty"></i>Report an 
issue</a></li></ul></div></div> 1351 + <div class="nav-footer-right"> 1352 + <p>MIT License</p> 1353 + </div> 1354 + </div> 1355 + </footer> 1356 + 1357 + 1358 + 1359 + 1360 + </body></html>
+30 -10
docs/reference/lenses.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 498 506 <div class="sidebar-item-container"> 499 507 <a href="../reference/uri-spec.html" class="sidebar-item-text sidebar-link"> 500 508 <span class="menu-text">URI Specification</span></a> 509 + </div> 510 + </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 501 521 </div> 502 522 </li> 503 523 </ul> ··· 582 602 <section id="creating-a-lens" class="level2"> 583 603 <h2 class="anchored" data-anchor-id="creating-a-lens">Creating a Lens</h2> 584 604 <p>Use the <code>@lens</code> decorator to define a getter:</p> 585 - <div id="bb40c899" class="cell"> 605 + <div id="45f5ac05" class="cell"> 586 606 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 587 607 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 588 608 <span id="cb1-3"><a 
href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span> ··· 612 632 <section id="adding-a-putter" class="level2"> 613 633 <h2 class="anchored" data-anchor-id="adding-a-putter">Adding a Putter</h2> 614 634 <p>To enable bidirectional updates, add a putter:</p> 615 - <div id="320457e1" class="cell"> 635 + <div id="8f4129ae" class="cell"> 616 636 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="at">@simplify.putter</span></span> 617 637 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> simplify_put(view: SimpleSample, source: FullSample) <span class="op">-&gt;</span> FullSample:</span> 618 638 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> FullSample(</span> ··· 632 652 <section id="using-lenses-with-datasets" class="level2"> 633 653 <h2 class="anchored" data-anchor-id="using-lenses-with-datasets">Using Lenses with Datasets</h2> 634 654 <p>Lenses integrate with <code>Dataset.as_type()</code>:</p> 635 - <div id="08a2bea4" class="cell"> 655 + <div id="80f0f8ed" class="cell"> 636 656 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[FullSample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 637 657 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 638 658 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="co"># View through a different type</span></span> ··· 647 667 <section id="direct-lens-usage" class="level2"> 648 668 <h2 class="anchored" data-anchor-id="direct-lens-usage">Direct Lens Usage</h2> 649 669 <p>Lenses can also be called directly:</p> 650 
- <div id="1feebfa5" class="cell"> 670 + <div id="1f6c843a" class="cell"> 651 671 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 652 672 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 653 673 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>full <span class="op">=</span> FullSample(</span> ··· 676 696 <div class="tab-content"> 677 697 <div id="tabset-1-1" class="tab-pane active" role="tabpanel" aria-labelledby="tabset-1-1-tab"> 678 698 <p>If you get a view and immediately put it back, the source is unchanged:</p> 679 - <div id="9fd0ceae" class="cell"> 699 + <div id="4f5e3828" class="cell"> 680 700 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>view <span class="op">=</span> lens.get(source)</span> 681 701 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="cf">assert</span> lens.put(view, source) <span class="op">==</span> source</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 682 702 </div> 683 703 </div> 684 704 <div id="tabset-1-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-1-2-tab"> 685 705 <p>If you put a view, getting it back yields that view:</p> 686 - <div id="4180490e" class="cell"> 706 + <div id="fa5c30b6" class="cell"> 687 707 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>updated <span class="op">=</span> lens.put(view, source)</span> 688 708 <span id="cb6-2"><a href="#cb6-2" 
aria-hidden="true" tabindex="-1"></a><span class="cf">assert</span> lens.get(updated) <span class="op">==</span> view</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 689 709 </div> 690 710 </div> 691 711 <div id="tabset-1-3" class="tab-pane" role="tabpanel" aria-labelledby="tabset-1-3-tab"> 692 712 <p>Putting twice is equivalent to putting once with the final value:</p> 693 - <div id="b91446f8" class="cell"> 713 + <div id="909783e4" class="cell"> 694 714 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>result1 <span class="op">=</span> lens.put(v2, lens.put(v1, source))</span> 695 715 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>result2 <span class="op">=</span> lens.put(v2, source)</span> 696 716 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="cf">assert</span> result1 <span class="op">==</span> result2</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> ··· 702 722 <section id="trivial-putter" class="level2"> 703 723 <h2 class="anchored" data-anchor-id="trivial-putter">Trivial Putter</h2> 704 724 <p>If no putter is defined, a trivial putter is used that ignores view updates:</p> 705 - <div id="04ca3e2d" class="cell"> 725 + <div id="275968e3" class="cell"> 706 726 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.lens</span></span> 707 727 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> extract_label(src: FullSample) <span class="op">-&gt;</span> SimpleSample:</span> 708 728 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" 
tabindex="-1"></a> <span class="cf">return</span> SimpleSample(label<span class="op">=</span>src.label, confidence<span class="op">=</span>src.confidence)</span> ··· 716 736 <section id="lensnetwork-registry" class="level2"> 717 737 <h2 class="anchored" data-anchor-id="lensnetwork-registry">LensNetwork Registry</h2> 718 738 <p>The <code>LensNetwork</code> is a singleton that stores all registered lenses:</p> 719 - <div id="ed7f3c46" class="cell"> 739 + <div id="5bac1458" class="cell"> 720 740 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.lens <span class="im">import</span> LensNetwork</span> 721 741 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 722 742 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>network <span class="op">=</span> LensNetwork()</span> ··· 733 753 </section> 734 754 <section id="example-feature-extraction" class="level2"> 735 755 <h2 class="anchored" data-anchor-id="example-feature-extraction">Example: Feature Extraction</h2> 736 - <div id="37b2ed4a" class="cell"> 756 + <div id="a29d12f5" class="cell"> 737 757 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 738 758 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> RawSample:</span> 739 759 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> audio: NDArray</span>
+32 -12
docs/reference/load-dataset.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 500 508 <span class="menu-text">URI Specification</span></a> 501 509 </div> 502 510 </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 503 523 </ul> 504 524 </li> 505 525 <li class="sidebar-item sidebar-item-section"> ··· 591 611 </section> 592 612 <section id="basic-usage" class="level2"> 593 613 <h2 class="anchored" data-anchor-id="basic-usage">Basic Usage</h2> 594 - <div id="8431ca90" class="cell"> 614 + <div id="46f7c26d" class="cell"> 595 615 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 596 616 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata <span class="im">import</span> load_dataset</span> 597 617 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> 
NDArray</span> ··· 614 634 <h2 class="anchored" data-anchor-id="path-formats">Path Formats</h2> 615 635 <section id="webdataset-brace-notation" class="level3"> 616 636 <h3 class="anchored" data-anchor-id="webdataset-brace-notation">WebDataset Brace Notation</h3> 617 - <div id="c4d94a2a" class="cell"> 637 + <div id="ed8498ef" class="cell"> 618 638 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Range notation</span></span> 619 639 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(<span class="st">"data-{000000..000099}.tar"</span>, MySample, split<span class="op">=</span><span class="st">"train"</span>)</span> 620 640 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span> ··· 624 644 </section> 625 645 <section id="glob-patterns" class="level3"> 626 646 <h3 class="anchored" data-anchor-id="glob-patterns">Glob Patterns</h3> 627 - <div id="bc3dc32e" class="cell"> 647 + <div id="2e6503f7" class="cell"> 628 648 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Match all tar files</span></span> 629 649 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(<span class="st">"path/to/*.tar"</span>, MySample)</span> 630 650 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span> ··· 634 654 </section> 635 655 <section id="local-directory" class="level3"> 636 656 <h3 class="anchored" data-anchor-id="local-directory">Local Directory</h3> 637 - <div id="356f81ca" class="cell"> 657 + <div id="35083602" class="cell"> 638 658 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python 
code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Scans for .tar files</span></span> 639 659 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(<span class="st">"./my-dataset/"</span>, MySample)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 640 660 </div> 641 661 </section> 642 662 <section id="remote-urls" class="level3"> 643 663 <h3 class="anchored" data-anchor-id="remote-urls">Remote URLs</h3> 644 - <div id="ef39304f" class="cell"> 664 + <div id="9389fed0" class="cell"> 645 665 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># S3 (public buckets)</span></span> 646 666 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(<span class="st">"s3://bucket/data-{000..099}.tar"</span>, MySample, split<span class="op">=</span><span class="st">"train"</span>)</span> 647 667 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span> ··· 667 687 </section> 668 688 <section id="index-lookup" class="level3"> 669 689 <h3 class="anchored" data-anchor-id="index-lookup">Index Lookup</h3> 670 - <div id="5311414e" class="cell"> 690 + <div id="060d1d0d" class="cell"> 671 691 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 672 692 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a></span> 673 693 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>index <span 
class="op">=</span> LocalIndex()</span> ··· 734 754 <section id="datasetdict" class="level2"> 735 755 <h2 class="anchored" data-anchor-id="datasetdict">DatasetDict</h2> 736 756 <p>When loading without <code>split=</code>, returns a <code>DatasetDict</code>:</p> 737 - <div id="dd3ecd9d" class="cell"> 757 + <div id="fece2ae6" class="cell"> 738 758 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>ds_dict <span class="op">=</span> load_dataset(<span class="st">"path/to/data/"</span>, MySample)</span> 739 759 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span> 740 760 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Access splits</span></span> ··· 754 774 <section id="explicit-data-files" class="level2"> 755 775 <h2 class="anchored" data-anchor-id="explicit-data-files">Explicit Data Files</h2> 756 776 <p>Override automatic detection with <code>data_files</code>:</p> 757 - <div id="87a8c451" class="cell"> 777 + <div id="32616278" class="cell"> 758 778 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Single pattern</span></span> 759 779 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(</span> 760 780 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"path/to/"</span>,</span> ··· 783 803 <section id="streaming-mode" class="level2"> 784 804 <h2 class="anchored" data-anchor-id="streaming-mode">Streaming Mode</h2> 785 805 <p>The <code>streaming</code> parameter signals intent for streaming mode:</p> 786 - <div id="9b4892ce" class="cell"> 806 + <div id="3b0629de" class="cell"> 787 807 <div class="sourceCode 
cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Mark as streaming</span></span> 788 808 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>ds_dict <span class="op">=</span> load_dataset(<span class="st">"path/to/data.tar"</span>, MySample, streaming<span class="op">=</span><span class="va">True</span>)</span> 789 809 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a></span> ··· 808 828 <section id="auto-type-resolution" class="level2"> 809 829 <h2 class="anchored" data-anchor-id="auto-type-resolution">Auto Type Resolution</h2> 810 830 <p>When using index lookup, the sample type can be resolved automatically:</p> 811 - <div id="89c1ab32" class="cell"> 831 + <div id="1775aa36" class="cell"> 812 832 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 813 833 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span> 814 834 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> LocalIndex()</span> ··· 822 842 </section> 823 843 <section id="error-handling" class="level2"> 824 844 <h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2> 825 - <div id="900a9be5" class="cell"> 845 + <div id="77761582" class="cell"> 826 846 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="cf">try</span>:</span> 827 847 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> ds <span class="op">=</span> load_dataset(<span 
class="st">"path/to/data.tar"</span>, MySample, split<span class="op">=</span><span class="st">"train"</span>)</span> 828 848 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="cf">except</span> <span class="pp">FileNotFoundError</span>:</span> ··· 838 858 </section> 839 859 <section id="complete-example" class="level2"> 840 860 <h2 class="anchored" data-anchor-id="complete-example">Complete Example</h2> 841 - <div id="9d2fc6bf" class="cell"> 861 + <div id="428d187a" class="cell"> 842 862 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 843 863 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 844 864 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span>
+31 -11
docs/reference/local-storage.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 500 508 <span class="menu-text">URI Specification</span></a> 501 509 </div> 502 510 </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 503 523 </ul> 504 524 </li> 505 525 <li class="sidebar-item sidebar-item-section"> ··· 590 610 <section id="localindex" class="level2"> 591 611 <h2 class="anchored" data-anchor-id="localindex">LocalIndex</h2> 592 612 <p>The index tracks datasets in Redis:</p> 593 - <div id="5be7427d" class="cell"> 613 + <div id="8802dc6f" class="cell"> 594 614 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 595 615 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span> 596 616 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Default connection (localhost:6379)</span></span> ··· 606 626 
</div> 607 627 <section id="adding-entries" class="level3"> 608 628 <h3 class="anchored" data-anchor-id="adding-entries">Adding Entries</h3> 609 - <div id="99e4a66a" class="cell"> 629 + <div id="0aa7bd42" class="cell"> 610 630 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 611 631 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 612 632 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span> ··· 631 651 </section> 632 652 <section id="listing-and-retrieving" class="level3"> 633 653 <h3 class="anchored" data-anchor-id="listing-and-retrieving">Listing and Retrieving</h3> 634 - <div id="d4b9bc5b" class="cell"> 654 + <div id="d445e281" class="cell"> 635 655 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Iterate all entries</span></span> 636 656 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> entry <span class="kw">in</span> index.entries:</span> 637 657 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"</span><span class="sc">{</span>entry<span class="sc">.</span>name<span class="sc">}</span><span class="ss">: </span><span class="sc">{</span>entry<span class="sc">.</span>cid<span class="sc">}</span><span class="ss">"</span>)</span> ··· 663 683 </div> 664 684 </div> 665 685 <p>The Repo class combines S3 storage with Redis indexing:</p> 666 - <div id="af0fa0b4" class="cell"> 686 + <div id="5f19e841" class="cell"> 667 687 <div class="sourceCode cell-code" id="cb4"><pre 
class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> Repo</span> 668 688 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 669 689 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co"># From credentials file</span></span> ··· 683 703 <span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 684 704 </div> 685 705 <p><strong>Preferred approach</strong> - Use <code>LocalIndex</code> with <code>S3DataStore</code>:</p> 686 - <div id="d52c1263" class="cell"> 706 + <div id="318b526c" class="cell"> 687 707 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex, S3DataStore</span> 688 708 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 689 709 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> S3DataStore(</span> ··· 721 741 </section> 722 742 <section id="inserting-datasets" class="level3"> 723 743 <h3 class="anchored" data-anchor-id="inserting-datasets">Inserting Datasets</h3> 724 - <div id="03361cfa" class="cell"> 744 + <div id="40d08e10" class="cell"> 725 745 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 726 746 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span 
class="im">import</span> numpy <span class="im">as</span> np</span> 727 747 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span> ··· 751 771 </section> 752 772 <section id="insert-options" class="level3"> 753 773 <h3 class="anchored" data-anchor-id="insert-options">Insert Options</h3> 754 - <div id="98749d62" class="cell"> 774 + <div id="6a0584d5" class="cell"> 755 775 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>entry, ds <span class="op">=</span> repo.insert(</span> 756 776 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> dataset,</span> 757 777 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"my-dataset"</span>,</span> ··· 765 785 <section id="localdatasetentry" class="level2"> 766 786 <h2 class="anchored" data-anchor-id="localdatasetentry">LocalDatasetEntry</h2> 767 787 <p>Index entries provide content-addressable identification:</p> 768 - <div id="49574a97" class="cell"> 788 + <div id="8e1a7430" class="cell"> 769 789 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.get_entry_by_name(<span class="st">"my-dataset"</span>)</span> 770 790 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 771 791 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Core properties (IndexEntry protocol)</span></span> ··· 798 818 <section id="schema-storage" class="level2"> 799 819 <h2 class="anchored" data-anchor-id="schema-storage">Schema Storage</h2> 800 820 <p>Schemas can be stored and retrieved from the index:</p> 801 - <div id="752e8ad4" class="cell"> 821 + <div 
id="590708b3" class="cell"> 802 822 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish a schema</span></span> 803 823 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>schema_ref <span class="op">=</span> index.publish_schema(</span> 804 824 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> ImageSample,</span> ··· 829 849 <section id="s3datastore" class="level2"> 830 850 <h2 class="anchored" data-anchor-id="s3datastore">S3DataStore</h2> 831 851 <p>For direct S3 operations without Redis indexing:</p> 832 - <div id="8b0d8caa" class="cell"> 852 + <div id="b4b71f5c" class="cell"> 833 853 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> S3DataStore</span> 834 854 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a></span> 835 855 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> S3DataStore(</span> ··· 851 871 </section> 852 872 <section id="complete-workflow-example" class="level2"> 853 873 <h2 class="anchored" data-anchor-id="complete-workflow-example">Complete Workflow Example</h2> 854 - <div id="52a3e561" class="cell"> 874 + <div id="25517ea9" class="cell"> 855 875 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 856 876 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span 
class="im">import</span> NDArray</span> 857 877 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span>
+32 -12
docs/reference/packable-samples.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 500 508 <span class="menu-text">URI Specification</span></a> 501 509 </div> 502 510 </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 503 523 </ul> 504 524 </li> 505 525 <li class="sidebar-item sidebar-item-section"> ··· 585 605 <section id="the-packable-decorator" class="level2"> 586 606 <h2 class="anchored" data-anchor-id="the-packable-decorator">The <code>@packable</code> Decorator</h2> 587 607 <p>The recommended way to define a sample type is with the <code>@packable</code> decorator:</p> 588 - <div id="ad073d28" class="cell"> 608 + <div id="14b6af1d" class="cell"> 589 609 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 590 610 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> 
NDArray</span> 591 611 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 607 627 <h2 class="anchored" data-anchor-id="supported-field-types">Supported Field Types</h2> 608 628 <section id="primitives" class="level3"> 609 629 <h3 class="anchored" data-anchor-id="primitives">Primitives</h3> 610 - <div id="077be434" class="cell"> 630 + <div id="e78d8889" class="cell"> 611 631 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 612 632 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> PrimitiveSample:</span> 613 633 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> name: <span class="bu">str</span></span> ··· 620 640 <section id="numpy-arrays" class="level3"> 621 641 <h3 class="anchored" data-anchor-id="numpy-arrays">NumPy Arrays</h3> 622 642 <p>Fields annotated as <code>NDArray</code> are automatically converted:</p> 623 - <div id="c3d07424" class="cell"> 643 + <div id="f47fe19a" class="cell"> 624 644 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 625 645 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ArraySample:</span> 626 646 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> features: NDArray <span class="co"># Required array</span></span> ··· 642 662 </section> 643 663 <section id="lists" class="level3"> 644 664 <h3 class="anchored" data-anchor-id="lists">Lists</h3> 645 - <div id="44c4fa21" class="cell"> 665 + <div id="0e7f2d72" class="cell"> 646 666 <div class="sourceCode 
cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 647 667 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ListSample:</span> 648 668 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> tags: <span class="bu">list</span>[<span class="bu">str</span>]</span> ··· 654 674 <h2 class="anchored" data-anchor-id="serialization">Serialization</h2> 655 675 <section id="packing-to-bytes" class="level3"> 656 676 <h3 class="anchored" data-anchor-id="packing-to-bytes">Packing to Bytes</h3> 657 - <div id="019b76e0" class="cell"> 677 + <div id="75b55786" class="cell"> 658 678 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>sample <span class="op">=</span> ImageSample(</span> 659 679 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> image<span class="op">=</span>np.random.rand(<span class="dv">224</span>, <span class="dv">224</span>, <span class="dv">3</span>).astype(np.float32),</span> 660 680 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> label<span class="op">=</span><span class="st">"cat"</span>,</span> ··· 668 688 </section> 669 689 <section id="unpacking-from-bytes" class="level3"> 670 690 <h3 class="anchored" data-anchor-id="unpacking-from-bytes">Unpacking from Bytes</h3> 671 - <div id="e9126b33" class="cell"> 691 + <div id="93d8820a" class="cell"> 672 692 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Deserialize from bytes</span></span> 673 693 <span id="cb6-2"><a href="#cb6-2" 
aria-hidden="true" tabindex="-1"></a>restored <span class="op">=</span> ImageSample.from_bytes(packed_bytes)</span> 674 694 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span> ··· 680 700 <section id="webdataset-format" class="level3"> 681 701 <h3 class="anchored" data-anchor-id="webdataset-format">WebDataset Format</h3> 682 702 <p>The <code>as_wds</code> property returns a dict ready for WebDataset:</p> 683 - <div id="a7ae0258" class="cell"> 703 + <div id="5e39d6a5" class="cell"> 684 704 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>wds_dict <span class="op">=</span> sample.as_wds</span> 685 705 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="co"># {'__key__': '1234...', 'msgpack': b'...'}</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 686 706 </div> 687 707 <p>Write samples to a tar file:</p> 688 - <div id="1de01007" class="cell"> 708 + <div id="d79d658a" class="cell"> 689 709 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 690 710 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a></span> 691 711 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(<span class="st">"data-000000.tar"</span>) <span class="im">as</span> sink:</span> ··· 698 718 <section id="direct-inheritance-alternative" class="level2"> 699 719 <h2 class="anchored" data-anchor-id="direct-inheritance-alternative">Direct Inheritance (Alternative)</h2> 700 720 <p>You can also inherit directly from 
<code>PackableSample</code>:</p> 701 - <div id="970773ca" class="cell"> 721 + <div id="36a5373c" class="cell"> 702 722 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> dataclasses <span class="im">import</span> dataclass</span> 703 723 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 704 724 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="at">@dataclass</span></span> ··· 736 756 <section id="the-_ensure_good-method" class="level3"> 737 757 <h3 class="anchored" data-anchor-id="the-_ensure_good-method">The <code>_ensure_good()</code> Method</h3> 738 758 <p>This method runs automatically after construction and handles NDArray conversion:</p> 739 - <div id="296b7470" class="cell"> 759 + <div id="66bc56e5" class="cell"> 740 760 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> _ensure_good(<span class="va">self</span>):</span> 741 761 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> field <span class="kw">in</span> dataclasses.fields(<span class="va">self</span>):</span> 742 762 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> _is_possibly_ndarray_type(field.<span class="bu">type</span>):</span> ··· 752 772 <ul class="nav nav-tabs" role="tablist"><li class="nav-item" role="presentation"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" role="tab" aria-controls="tabset-2-1" aria-selected="true">Do</a></li><li class="nav-item" role="presentation"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" 
data-bs-target="#tabset-2-2" role="tab" aria-controls="tabset-2-2" aria-selected="false">Don’t</a></li></ul> 753 773 <div class="tab-content"> 754 774 <div id="tabset-2-1" class="tab-pane active" role="tabpanel" aria-labelledby="tabset-2-1-tab"> 755 - <div id="3dee4ce6" class="cell"> 775 + <div id="843b97c7" class="cell"> 756 776 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 757 777 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> GoodSample:</span> 758 778 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> features: NDArray <span class="co"># Clear type annotation</span></span> ··· 762 782 </div> 763 783 </div> 764 784 <div id="tabset-2-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-2-2-tab"> 765 - <div id="d0d184ec" class="cell"> 785 + <div id="a31c16bf" class="cell"> 766 786 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 767 787 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> BadSample:</span> 768 788 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> <span class="co"># DON'T: Nested dataclasses not supported</span></span>
+27 -7
docs/reference/promotion.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 500 508 <span class="menu-text">URI Specification</span></a> 501 509 </div> 502 510 </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 503 523 </ul> 504 524 </li> 505 525 <li class="sidebar-item sidebar-item-section"> ··· 581 601 </section> 582 602 <section id="basic-usage" class="level2"> 583 603 <h2 class="anchored" data-anchor-id="basic-usage">Basic Usage</h2> 584 - <div id="9b7db9ea" class="cell"> 604 + <div id="136db9af" class="cell"> 585 605 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 586 606 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient</span> 587 607 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span 
class="im">from</span> atdata.promote <span class="im">import</span> promote_to_atmosphere</span> ··· 601 621 </section> 602 622 <section id="with-metadata" class="level2"> 603 623 <h2 class="anchored" data-anchor-id="with-metadata">With Metadata</h2> 604 - <div id="fec408db" class="cell"> 624 + <div id="2e4913b8" class="cell"> 605 625 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>at_uri <span class="op">=</span> promote_to_atmosphere(</span> 606 626 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> entry,</span> 607 627 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> local_index,</span> ··· 616 636 <section id="schema-deduplication" class="level2"> 617 637 <h2 class="anchored" data-anchor-id="schema-deduplication">Schema Deduplication</h2> 618 638 <p>The promotion workflow automatically checks for existing schemas:</p> 619 - <div id="b493a780" class="cell"> 639 + <div id="054870cb" class="cell"> 620 640 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># First promotion: publishes schema</span></span> 621 641 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>uri1 <span class="op">=</span> promote_to_atmosphere(entry1, local_index, client)</span> 622 642 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span> ··· 636 656 <div class="tab-content"> 637 657 <div id="tabset-1-1" class="tab-pane active" role="tabpanel" aria-labelledby="tabset-1-1-tab"> 638 658 <p>By default, promotion keeps the original data URLs:</p> 639 - <div id="98e3c467" class="cell"> 659 + <div id="84889b62" class="cell"> 640 660 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python 
code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Data stays in original S3 location</span></span> 641 661 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>at_uri <span class="op">=</span> promote_to_atmosphere(entry, local_index, client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 642 662 </div> ··· 649 669 </div> 650 670 <div id="tabset-1-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-1-2-tab"> 651 671 <p>To copy data to a different storage location:</p> 652 - <div id="d1f0b29d" class="cell"> 672 + <div id="f2a0312e" class="cell"> 653 673 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> S3DataStore</span> 654 674 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 655 675 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Create new data store</span></span> ··· 677 697 </section> 678 698 <section id="complete-workflow-example" class="level2"> 679 699 <h2 class="anchored" data-anchor-id="complete-workflow-example">Complete Workflow Example</h2> 680 - <div id="f43825cc" class="cell"> 700 + <div id="5ebd107b" class="cell"> 681 701 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 682 702 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 683 703 <span id="cb6-3"><a href="#cb6-3" 
aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 748 768 </section> 749 769 <section id="error-handling" class="level2"> 750 770 <h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2> 751 - <div id="1d9ce394" class="cell"> 771 + <div id="e9403122" class="cell"> 752 772 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="cf">try</span>:</span> 753 773 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> at_uri <span class="op">=</span> promote_to_atmosphere(entry, local_index, client)</span> 754 774 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="cf">except</span> <span class="pp">KeyError</span> <span class="im">as</span> e:</span>
+32 -12
docs/reference/protocols.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 500 508 <span class="menu-text">URI Specification</span></a> 501 509 </div> 502 510 </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 503 523 </ul> 504 524 </li> 505 525 <li class="sidebar-item sidebar-item-section"> ··· 602 622 <section id="indexentry-protocol" class="level2"> 603 623 <h2 class="anchored" data-anchor-id="indexentry-protocol">IndexEntry Protocol</h2> 604 624 <p>Represents a dataset entry in any index:</p> 605 - <div id="8932c6c6" class="cell"> 625 + <div id="0fbcfe4b" class="cell"> 606 626 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> IndexEntry</span> 607 627 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span> 608 628 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> 
process_entry(entry: IndexEntry) <span class="op">-&gt;</span> <span class="va">None</span>:</span> ··· 656 676 <section id="abstractindex-protocol" class="level2"> 657 677 <h2 class="anchored" data-anchor-id="abstractindex-protocol">AbstractIndex Protocol</h2> 658 678 <p>Defines operations for managing schemas and datasets:</p> 659 - <div id="d320f18a" class="cell"> 679 + <div id="3e5a364e" class="cell"> 660 680 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> AbstractIndex</span> 661 681 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span> 662 682 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> list_all_datasets(index: AbstractIndex) <span class="op">-&gt;</span> <span class="va">None</span>:</span> ··· 666 686 </div> 667 687 <section id="dataset-operations" class="level3"> 668 688 <h3 class="anchored" data-anchor-id="dataset-operations">Dataset Operations</h3> 669 - <div id="3950da62" class="cell"> 689 + <div id="a9c35e38" class="cell"> 670 690 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Insert a dataset</span></span> 671 691 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 672 692 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> dataset,</span> ··· 684 704 </section> 685 705 <section id="schema-operations" class="level3"> 686 706 <h3 class="anchored" data-anchor-id="schema-operations">Schema Operations</h3> 687 - <div id="d13f65bf" class="cell"> 707 + <div id="64780c26" class="cell"> 688 708 <div 
class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish a schema</span></span> 689 709 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>schema_ref <span class="op">=</span> index.publish_schema(</span> 690 710 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> MySample,</span> ··· 715 735 <section id="abstractdatastore-protocol" class="level2"> 716 736 <h2 class="anchored" data-anchor-id="abstractdatastore-protocol">AbstractDataStore Protocol</h2> 717 737 <p>Abstracts over different storage backends:</p> 718 - <div id="9a971785" class="cell"> 738 + <div id="c7f29440" class="cell"> 719 739 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> AbstractDataStore</span> 720 740 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 721 741 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> write_dataset(store: AbstractDataStore, dataset) <span class="op">-&gt;</span> <span class="bu">list</span>[<span class="bu">str</span>]:</span> ··· 725 745 </div> 726 746 <section id="methods" class="level3"> 727 747 <h3 class="anchored" data-anchor-id="methods">Methods</h3> 728 - <div id="284ccbe5" class="cell"> 748 + <div id="05ac09cf" class="cell"> 729 749 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Write dataset shards</span></span> 730 750 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>urls <span 
class="op">=</span> store.write_shards(</span> 731 751 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> dataset,</span> ··· 752 772 <section id="datasource-protocol" class="level2"> 753 773 <h2 class="anchored" data-anchor-id="datasource-protocol">DataSource Protocol</h2> 754 774 <p>Abstracts over different data source backends for streaming dataset shards:</p> 755 - <div id="8dabac06" class="cell"> 775 + <div id="5e778f11" class="cell"> 756 776 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> DataSource</span> 757 777 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span> 758 778 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> load_from_source(source: DataSource) <span class="op">-&gt;</span> <span class="va">None</span>:</span> ··· 765 785 </div> 766 786 <section id="methods-1" class="level3"> 767 787 <h3 class="anchored" data-anchor-id="methods-1">Methods</h3> 768 - <div id="ecfb7796" class="cell"> 788 + <div id="0b3e720c" class="cell"> 769 789 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Get list of shard identifiers</span></span> 770 790 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>shard_ids <span class="op">=</span> source.shard_list <span class="co"># ['data-000000.tar', 'data-000001.tar', ...]</span></span> 771 791 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a></span> ··· 788 808 <section id="creating-custom-data-sources" class="level3"> 789 809 <h3 class="anchored" data-anchor-id="creating-custom-data-sources">Creating Custom Data 
Sources</h3> 790 810 <p>Implement the <code>DataSource</code> protocol for custom backends:</p> 791 - <div id="a4538f04" class="cell"> 811 + <div id="a35e3790" class="cell"> 792 812 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> typing <span class="im">import</span> Iterator, IO</span> 793 813 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> DataSource</span> 794 814 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a></span> ··· 826 846 <section id="using-protocols-for-polymorphism" class="level2"> 827 847 <h2 class="anchored" data-anchor-id="using-protocols-for-polymorphism">Using Protocols for Polymorphism</h2> 828 848 <p>Write code that works with any backend:</p> 829 - <div id="041124c9" class="cell"> 849 + <div id="01a704ec" class="cell"> 830 850 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> AbstractIndex, IndexEntry</span> 831 851 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata <span class="im">import</span> Dataset</span> 832 852 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a></span> ··· 897 917 <section id="type-checking" class="level2"> 898 918 <h2 class="anchored" data-anchor-id="type-checking">Type Checking</h2> 899 919 <p>Protocols are runtime-checkable:</p> 900 - <div id="a8985032" class="cell"> 920 + <div id="edf89e8d" class="cell"> 901 921 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode 
python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> IndexEntry, AbstractIndex</span> 902 922 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a></span> 903 923 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Check if object implements protocol</span></span> ··· 911 931 </section> 912 932 <section id="complete-example" class="level2"> 913 933 <h2 class="anchored" data-anchor-id="complete-example">Complete Example</h2> 914 - <div id="d95bd7bf" class="cell"> 934 + <div id="58710f3d" class="cell"> 915 935 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 916 936 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex, S3DataStore</span> 917 937 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient, AtmosphereIndex</span>
+1275
docs/reference/troubleshooting.html
··· 1 + <!DOCTYPE html> 2 + <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head> 3 + 4 + <meta charset="utf-8"> 5 + <meta name="generator" content="quarto-1.7.34"> 6 + 7 + <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes"> 8 + 9 + <meta name="description" content="Common issues and frequently asked questions"> 10 + 11 + <title>Troubleshooting &amp; FAQ – atdata</title> 12 + <style> 13 + code{white-space: pre-wrap;} 14 + span.smallcaps{font-variant: small-caps;} 15 + div.columns{display: flex; gap: min(4vw, 1.5em);} 16 + div.column{flex: auto; overflow-x: auto;} 17 + div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;} 18 + ul.task-list{list-style: none;} 19 + ul.task-list li input[type="checkbox"] { 20 + width: 0.8em; 21 + margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ 22 + vertical-align: middle; 23 + } 24 + /* CSS for syntax highlighting */ 25 + html { -webkit-text-size-adjust: 100%; } 26 + pre > code.sourceCode { white-space: pre; position: relative; } 27 + pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } 28 + pre > code.sourceCode > span:empty { height: 1.2em; } 29 + .sourceCode { overflow: visible; } 30 + code.sourceCode > span { color: inherit; text-decoration: inherit; } 31 + div.sourceCode { margin: 1em 0; } 32 + pre.sourceCode { margin: 0; } 33 + @media screen { 34 + div.sourceCode { overflow: auto; } 35 + } 36 + @media print { 37 + pre > code.sourceCode { white-space: pre-wrap; } 38 + pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } 39 + } 40 + pre.numberSource code 41 + { counter-reset: source-line 0; } 42 + pre.numberSource code > span 43 + { position: relative; left: -4em; counter-increment: source-line; } 44 + pre.numberSource code > span > a:first-child::before 45 + { content: counter(source-line); 46 + position: relative; left: -1em; text-align: right; vertical-align: 
baseline; 47 + border: none; display: inline-block; 48 + -webkit-touch-callout: none; -webkit-user-select: none; 49 + -khtml-user-select: none; -moz-user-select: none; 50 + -ms-user-select: none; user-select: none; 51 + padding: 0 4px; width: 4em; 52 + } 53 + pre.numberSource { margin-left: 3em; padding-left: 4px; } 54 + div.sourceCode 55 + { } 56 + @media screen { 57 + pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } 58 + } 59 + </style> 60 + 61 + 62 + <script src="../site_libs/quarto-nav/quarto-nav.js"></script> 63 + <script src="../site_libs/quarto-nav/headroom.min.js"></script> 64 + <script src="../site_libs/clipboard/clipboard.min.js"></script> 65 + <script src="../site_libs/quarto-search/autocomplete.umd.js"></script> 66 + <script src="../site_libs/quarto-search/fuse.min.js"></script> 67 + <script src="../site_libs/quarto-search/quarto-search.js"></script> 68 + <meta name="quarto:offset" content="../"> 69 + <script src="../site_libs/quarto-html/quarto.js" type="module"></script> 70 + <script src="../site_libs/quarto-html/tabsets/tabsets.js" type="module"></script> 71 + <script src="../site_libs/quarto-html/popper.min.js"></script> 72 + <script src="../site_libs/quarto-html/tippy.umd.min.js"></script> 73 + <script src="../site_libs/quarto-html/anchor.min.js"></script> 74 + <link href="../site_libs/quarto-html/tippy.css" rel="stylesheet"> 75 + <link href="../site_libs/quarto-html/quarto-syntax-highlighting-9582434199d49cc9e91654cdeeb4866b.css" rel="stylesheet" class="quarto-color-scheme" id="quarto-text-highlighting-styles"> 76 + <link href="../site_libs/quarto-html/quarto-syntax-highlighting-dark-8dcd8563ea6803ab7cbb3d71ca5772e1.css" rel="stylesheet" class="quarto-color-scheme quarto-color-alternate" id="quarto-text-highlighting-styles"> 77 + <link href="../site_libs/quarto-html/quarto-syntax-highlighting-9582434199d49cc9e91654cdeeb4866b.css" rel="stylesheet" class="quarto-color-scheme-extra" 
id="quarto-text-highlighting-styles"> 78 + <script src="../site_libs/bootstrap/bootstrap.min.js"></script> 79 + <link href="../site_libs/bootstrap/bootstrap-icons.css" rel="stylesheet"> 80 + <link href="../site_libs/bootstrap/bootstrap-62bce24ca844314e7bb1a34dbdfe05cc.min.css" rel="stylesheet" append-hash="true" class="quarto-color-scheme" id="quarto-bootstrap" data-mode="light"> 81 + <link href="../site_libs/bootstrap/bootstrap-dark-7964ffd8887b0991fe8d71c6c8bc75d6.min.css" rel="stylesheet" append-hash="true" class="quarto-color-scheme quarto-color-alternate" id="quarto-bootstrap" data-mode="dark"> 82 + <link href="../site_libs/bootstrap/bootstrap-62bce24ca844314e7bb1a34dbdfe05cc.min.css" rel="stylesheet" append-hash="true" class="quarto-color-scheme-extra" id="quarto-bootstrap" data-mode="light"> 83 + <script id="quarto-search-options" type="application/json">{ 84 + "location": "navbar", 85 + "copy-button": false, 86 + "collapse-after": 3, 87 + "panel-placement": "end", 88 + "type": "overlay", 89 + "limit": 50, 90 + "keyboard-shortcut": [ 91 + "f", 92 + "/", 93 + "s" 94 + ], 95 + "show-item-context": false, 96 + "language": { 97 + "search-no-results-text": "No results", 98 + "search-matching-documents-text": "matching documents", 99 + "search-copy-link-title": "Copy link to search", 100 + "search-hide-matches-text": "Hide additional matches", 101 + "search-more-match-text": "more match in this document", 102 + "search-more-matches-text": "more matches in this document", 103 + "search-clear-button-title": "Clear", 104 + "search-text-placeholder": "", 105 + "search-detached-cancel-button-title": "Cancel", 106 + "search-submit-button-title": "Submit", 107 + "search-label": "Search" 108 + } 109 + }</script> 110 + 111 + 112 + <link rel="stylesheet" href="../assets/styles.css"> 113 + </head> 114 + 115 + <body class="nav-sidebar docked nav-fixed quarto-light"><script id="quarto-html-before-body" type="application/javascript"> 116 + const toggleBodyColorMode = 
(bsSheetEl) => { 117 + const mode = bsSheetEl.getAttribute("data-mode"); 118 + const bodyEl = window.document.querySelector("body"); 119 + if (mode === "dark") { 120 + bodyEl.classList.add("quarto-dark"); 121 + bodyEl.classList.remove("quarto-light"); 122 + } else { 123 + bodyEl.classList.add("quarto-light"); 124 + bodyEl.classList.remove("quarto-dark"); 125 + } 126 + } 127 + const toggleBodyColorPrimary = () => { 128 + const bsSheetEl = window.document.querySelector("link#quarto-bootstrap:not([rel=disabled-stylesheet])"); 129 + if (bsSheetEl) { 130 + toggleBodyColorMode(bsSheetEl); 131 + } 132 + } 133 + const setColorSchemeToggle = (alternate) => { 134 + const toggles = window.document.querySelectorAll('.quarto-color-scheme-toggle'); 135 + for (let i=0; i < toggles.length; i++) { 136 + const toggle = toggles[i]; 137 + if (toggle) { 138 + if (alternate) { 139 + toggle.classList.add("alternate"); 140 + } else { 141 + toggle.classList.remove("alternate"); 142 + } 143 + } 144 + } 145 + }; 146 + const toggleColorMode = (alternate) => { 147 + // Switch the stylesheets 148 + const primaryStylesheets = window.document.querySelectorAll('link.quarto-color-scheme:not(.quarto-color-alternate)'); 149 + const alternateStylesheets = window.document.querySelectorAll('link.quarto-color-scheme.quarto-color-alternate'); 150 + manageTransitions('#quarto-margin-sidebar .nav-link', false); 151 + if (alternate) { 152 + // note: dark is layered on light, we don't disable primary! 
153 + enableStylesheet(alternateStylesheets); 154 + for (const sheetNode of alternateStylesheets) { 155 + if (sheetNode.id === "quarto-bootstrap") { 156 + toggleBodyColorMode(sheetNode); 157 + } 158 + } 159 + } else { 160 + disableStylesheet(alternateStylesheets); 161 + enableStylesheet(primaryStylesheets) 162 + toggleBodyColorPrimary(); 163 + } 164 + manageTransitions('#quarto-margin-sidebar .nav-link', true); 165 + // Switch the toggles 166 + setColorSchemeToggle(alternate) 167 + // Hack to workaround the fact that safari doesn't 168 + // properly recolor the scrollbar when toggling (#1455) 169 + if (navigator.userAgent.indexOf('Safari') > 0 && navigator.userAgent.indexOf('Chrome') == -1) { 170 + manageTransitions("body", false); 171 + window.scrollTo(0, 1); 172 + setTimeout(() => { 173 + window.scrollTo(0, 0); 174 + manageTransitions("body", true); 175 + }, 40); 176 + } 177 + } 178 + const disableStylesheet = (stylesheets) => { 179 + for (let i=0; i < stylesheets.length; i++) { 180 + const stylesheet = stylesheets[i]; 181 + stylesheet.rel = 'disabled-stylesheet'; 182 + } 183 + } 184 + const enableStylesheet = (stylesheets) => { 185 + for (let i=0; i < stylesheets.length; i++) { 186 + const stylesheet = stylesheets[i]; 187 + if(stylesheet.rel !== 'stylesheet') { // for Chrome, which will still FOUC without this check 188 + stylesheet.rel = 'stylesheet'; 189 + } 190 + } 191 + } 192 + const manageTransitions = (selector, allowTransitions) => { 193 + const els = window.document.querySelectorAll(selector); 194 + for (let i=0; i < els.length; i++) { 195 + const el = els[i]; 196 + if (allowTransitions) { 197 + el.classList.remove('notransition'); 198 + } else { 199 + el.classList.add('notransition'); 200 + } 201 + } 202 + } 203 + const isFileUrl = () => { 204 + return window.location.protocol === 'file:'; 205 + } 206 + const hasAlternateSentinel = () => { 207 + let styleSentinel = getColorSchemeSentinel(); 208 + if (styleSentinel !== null) { 209 + return styleSentinel 
=== "alternate"; 210 + } else { 211 + return false; 212 + } 213 + } 214 + const setStyleSentinel = (alternate) => { 215 + const value = alternate ? "alternate" : "default"; 216 + if (!isFileUrl()) { 217 + window.localStorage.setItem("quarto-color-scheme", value); 218 + } else { 219 + localAlternateSentinel = value; 220 + } 221 + } 222 + const getColorSchemeSentinel = () => { 223 + if (!isFileUrl()) { 224 + const storageValue = window.localStorage.getItem("quarto-color-scheme"); 225 + return storageValue != null ? storageValue : localAlternateSentinel; 226 + } else { 227 + return localAlternateSentinel; 228 + } 229 + } 230 + const toggleGiscusIfUsed = (isAlternate, darkModeDefault) => { 231 + const baseTheme = document.querySelector('#giscus-base-theme')?.value ?? 'light'; 232 + const alternateTheme = document.querySelector('#giscus-alt-theme')?.value ?? 'dark'; 233 + let newTheme = ''; 234 + if(authorPrefersDark) { 235 + newTheme = isAlternate ? baseTheme : alternateTheme; 236 + } else { 237 + newTheme = isAlternate ? 
alternateTheme : baseTheme; 238 + } 239 + const changeGiscusTheme = () => { 240 + // From: https://github.com/giscus/giscus/issues/336 241 + const sendMessage = (message) => { 242 + const iframe = document.querySelector('iframe.giscus-frame'); 243 + if (!iframe) return; 244 + iframe.contentWindow.postMessage({ giscus: message }, 'https://giscus.app'); 245 + } 246 + sendMessage({ 247 + setConfig: { 248 + theme: newTheme 249 + } 250 + }); 251 + } 252 + const isGiscussLoaded = window.document.querySelector('iframe.giscus-frame') !== null; 253 + if (isGiscussLoaded) { 254 + changeGiscusTheme(); 255 + } 256 + }; 257 + const authorPrefersDark = false; 258 + const darkModeDefault = authorPrefersDark; 259 + document.querySelector('link#quarto-text-highlighting-styles.quarto-color-scheme-extra').rel = 'disabled-stylesheet'; 260 + document.querySelector('link#quarto-bootstrap.quarto-color-scheme-extra').rel = 'disabled-stylesheet'; 261 + let localAlternateSentinel = darkModeDefault ? 'alternate' : 'default'; 262 + // Dark / light mode switch 263 + window.quartoToggleColorScheme = () => { 264 + // Read the current dark / light value 265 + let toAlternate = !hasAlternateSentinel(); 266 + toggleColorMode(toAlternate); 267 + setStyleSentinel(toAlternate); 268 + toggleGiscusIfUsed(toAlternate, darkModeDefault); 269 + window.dispatchEvent(new Event('resize')); 270 + }; 271 + // Switch to dark mode if need be 272 + if (hasAlternateSentinel()) { 273 + toggleColorMode(true); 274 + } else { 275 + toggleColorMode(false); 276 + } 277 + </script> 278 + 279 + <div id="quarto-search-results"></div> 280 + <header id="quarto-header" class="headroom fixed-top"> 281 + <nav class="navbar navbar-expand-lg " data-bs-theme="dark"> 282 + <div class="navbar-container container-fluid"> 283 + <div class="navbar-brand-container mx-auto"> 284 + <a class="navbar-brand" href="../index.html"> 285 + <span class="navbar-title">atdata</span> 286 + </a> 287 + </div> 288 + <div id="quarto-search" class="" 
title="Search"></div> 289 + <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarCollapse" aria-controls="navbarCollapse" role="menu" aria-expanded="false" aria-label="Toggle navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }"> 290 + <span class="navbar-toggler-icon"></span> 291 + </button> 292 + <div class="collapse navbar-collapse" id="navbarCollapse"> 293 + <ul class="navbar-nav navbar-nav-scroll me-auto"> 294 + <li class="nav-item"> 295 + <a class="nav-link active" href="../index.html" aria-current="page"> 296 + <span class="menu-text">Guide</span></a> 297 + </li> 298 + <li class="nav-item dropdown "> 299 + <a class="nav-link dropdown-toggle" href="#" id="nav-menu-tutorials" role="link" data-bs-toggle="dropdown" aria-expanded="false"> 300 + <span class="menu-text">Tutorials</span> 301 + </a> 302 + <ul class="dropdown-menu" aria-labelledby="nav-menu-tutorials"> 303 + <li> 304 + <a class="dropdown-item" href="../tutorials/quickstart.html"> 305 + <span class="dropdown-text">Quick Start</span></a> 306 + </li> 307 + <li> 308 + <a class="dropdown-item" href="../tutorials/local-workflow.html"> 309 + <span class="dropdown-text">Local Workflow</span></a> 310 + </li> 311 + <li> 312 + <a class="dropdown-item" href="../tutorials/atmosphere.html"> 313 + <span class="dropdown-text">Atmosphere Publishing</span></a> 314 + </li> 315 + <li> 316 + <a class="dropdown-item" href="../tutorials/promotion.html"> 317 + <span class="dropdown-text">Promotion Workflow</span></a> 318 + </li> 319 + </ul> 320 + </li> 321 + <li class="nav-item dropdown "> 322 + <a class="nav-link dropdown-toggle" href="#" id="nav-menu-reference" role="link" data-bs-toggle="dropdown" aria-expanded="false"> 323 + <span class="menu-text">Reference</span> 324 + </a> 325 + <ul class="dropdown-menu" aria-labelledby="nav-menu-reference"> 326 + <li> 327 + <a class="dropdown-item" href="../reference/packable-samples.html"> 328 + <span 
class="dropdown-text">Packable Samples</span></a> 329 + </li> 330 + <li> 331 + <a class="dropdown-item" href="../reference/datasets.html"> 332 + <span class="dropdown-text">Datasets</span></a> 333 + </li> 334 + <li> 335 + <a class="dropdown-item" href="../reference/lenses.html"> 336 + <span class="dropdown-text">Lenses</span></a> 337 + </li> 338 + <li> 339 + <a class="dropdown-item" href="../reference/local-storage.html"> 340 + <span class="dropdown-text">Local Storage</span></a> 341 + </li> 342 + <li> 343 + <a class="dropdown-item" href="../reference/atmosphere.html"> 344 + <span class="dropdown-text">Atmosphere</span></a> 345 + </li> 346 + <li> 347 + <a class="dropdown-item" href="../reference/promotion.html"> 348 + <span class="dropdown-text">Promotion</span></a> 349 + </li> 350 + <li> 351 + <a class="dropdown-item" href="../reference/load-dataset.html"> 352 + <span class="dropdown-text">load_dataset API</span></a> 353 + </li> 354 + <li> 355 + <a class="dropdown-item" href="../reference/protocols.html"> 356 + <span class="dropdown-text">Protocols</span></a> 357 + </li> 358 + <li> 359 + <a class="dropdown-item" href="../reference/uri-spec.html"> 360 + <span class="dropdown-text">URI Specification</span></a> 361 + </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 370 + </ul> 371 + </li> 372 + <li class="nav-item"> 373 + <a class="nav-link" href="../api/index.html"> 374 + <span class="menu-text">API</span></a> 375 + </li> 376 + </ul> 377 + <ul class="navbar-nav navbar-nav-scroll ms-auto"> 378 + <li class="nav-item compact"> 379 + <a class="nav-link" href="https://github.com/your-org/atdata"> <i class="bi bi-github" role="img"> 380 + </i> 381 + <span class="menu-text"></span></a> 382 + </li> 383 
+ </ul> 384 + </div> <!-- /navcollapse --> 385 + <div class="quarto-navbar-tools"> 386 + <a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a> 387 + </div> 388 + </div> <!-- /container-fluid --> 389 + </nav> 390 + <nav class="quarto-secondary-nav"> 391 + <div class="container-fluid d-flex"> 392 + <button type="button" class="quarto-btn-toggle btn" data-bs-toggle="collapse" role="button" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }"> 393 + <i class="bi bi-layout-text-sidebar-reverse"></i> 394 + </button> 395 + <nav class="quarto-page-breadcrumbs" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="../reference/packable-samples.html">Reference</a></li><li class="breadcrumb-item"><a href="../reference/troubleshooting.html">Troubleshooting &amp; FAQ</a></li></ol></nav> 396 + <a class="flex-grow-1" role="navigation" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item" aria-controls="quarto-sidebar" aria-expanded="false" aria-label="Toggle sidebar navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }"> 397 + </a> 398 + </div> 399 + </nav> 400 + </header> 401 + <!-- content --> 402 + <div id="quarto-content" class="quarto-container page-columns page-rows-contents page-layout-article page-navbar"> 403 + <!-- sidebar --> 404 + <nav id="quarto-sidebar" class="sidebar collapse collapse-horizontal quarto-sidebar-collapse-item sidebar-navigation docked overflow-auto"> 405 + <div class="sidebar-menu-container"> 406 + <ul class="list-unstyled mt-1"> 407 + <li class="sidebar-item"> 408 + <div class="sidebar-item-container"> 409 + <a href="../index.html" class="sidebar-item-text sidebar-link"> 410 + <span 
class="menu-text">atdata</span></a> 411 + </div> 412 + </li> 413 + <li class="sidebar-item sidebar-item-section"> 414 + <div class="sidebar-item-container"> 415 + <a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-1" role="navigation" aria-expanded="true"> 416 + <span class="menu-text">Getting Started</span></a> 417 + <a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-1" role="navigation" aria-expanded="true" aria-label="Toggle section"> 418 + <i class="bi bi-chevron-right ms-2"></i> 419 + </a> 420 + </div> 421 + <ul id="quarto-sidebar-section-1" class="collapse list-unstyled sidebar-section depth1 show"> 422 + <li class="sidebar-item"> 423 + <div class="sidebar-item-container"> 424 + <a href="../tutorials/quickstart.html" class="sidebar-item-text sidebar-link"> 425 + <span class="menu-text">Quick Start</span></a> 426 + </div> 427 + </li> 428 + <li class="sidebar-item"> 429 + <div class="sidebar-item-container"> 430 + <a href="../tutorials/local-workflow.html" class="sidebar-item-text sidebar-link"> 431 + <span class="menu-text">Local Workflow</span></a> 432 + </div> 433 + </li> 434 + <li class="sidebar-item"> 435 + <div class="sidebar-item-container"> 436 + <a href="../tutorials/atmosphere.html" class="sidebar-item-text sidebar-link"> 437 + <span class="menu-text">Atmosphere Publishing</span></a> 438 + </div> 439 + </li> 440 + <li class="sidebar-item"> 441 + <div class="sidebar-item-container"> 442 + <a href="../tutorials/promotion.html" class="sidebar-item-text sidebar-link"> 443 + <span class="menu-text">Promotion Workflow</span></a> 444 + </div> 445 + </li> 446 + </ul> 447 + </li> 448 + <li class="sidebar-item sidebar-item-section"> 449 + <div class="sidebar-item-container"> 450 + <a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-2" role="navigation" 
aria-expanded="true"> 451 + <span class="menu-text">Reference</span></a> 452 + <a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-2" role="navigation" aria-expanded="true" aria-label="Toggle section"> 453 + <i class="bi bi-chevron-right ms-2"></i> 454 + </a> 455 + </div> 456 + <ul id="quarto-sidebar-section-2" class="collapse list-unstyled sidebar-section depth1 show"> 457 + <li class="sidebar-item"> 458 + <div class="sidebar-item-container"> 459 + <a href="../reference/packable-samples.html" class="sidebar-item-text sidebar-link"> 460 + <span class="menu-text">Packable Samples</span></a> 461 + </div> 462 + </li> 463 + <li class="sidebar-item"> 464 + <div class="sidebar-item-container"> 465 + <a href="../reference/datasets.html" class="sidebar-item-text sidebar-link"> 466 + <span class="menu-text">Datasets</span></a> 467 + </div> 468 + </li> 469 + <li class="sidebar-item"> 470 + <div class="sidebar-item-container"> 471 + <a href="../reference/lenses.html" class="sidebar-item-text sidebar-link"> 472 + <span class="menu-text">Lenses</span></a> 473 + </div> 474 + </li> 475 + <li class="sidebar-item"> 476 + <div class="sidebar-item-container"> 477 + <a href="../reference/local-storage.html" class="sidebar-item-text sidebar-link"> 478 + <span class="menu-text">Local Storage</span></a> 479 + </div> 480 + </li> 481 + <li class="sidebar-item"> 482 + <div class="sidebar-item-container"> 483 + <a href="../reference/atmosphere.html" class="sidebar-item-text sidebar-link"> 484 + <span class="menu-text">Atmosphere (ATProto Integration)</span></a> 485 + </div> 486 + </li> 487 + <li class="sidebar-item"> 488 + <div class="sidebar-item-container"> 489 + <a href="../reference/promotion.html" class="sidebar-item-text sidebar-link"> 490 + <span class="menu-text">Promotion Workflow</span></a> 491 + </div> 492 + </li> 493 + <li class="sidebar-item"> 494 + <div class="sidebar-item-container"> 495 + <a 
href="../reference/load-dataset.html" class="sidebar-item-text sidebar-link"> 496 + <span class="menu-text">load_dataset API</span></a> 497 + </div> 498 + </li> 499 + <li class="sidebar-item"> 500 + <div class="sidebar-item-container"> 501 + <a href="../reference/protocols.html" class="sidebar-item-text sidebar-link"> 502 + <span class="menu-text">Protocols</span></a> 503 + </div> 504 + </li> 505 + <li class="sidebar-item"> 506 + <div class="sidebar-item-container"> 507 + <a href="../reference/uri-spec.html" class="sidebar-item-text sidebar-link"> 508 + <span class="menu-text">URI Specification</span></a> 509 + </div> 510 + </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link active"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 523 + </ul> 524 + </li> 525 + <li class="sidebar-item sidebar-item-section"> 526 + <div class="sidebar-item-container"> 527 + <a class="sidebar-item-text sidebar-link text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-3" role="navigation" aria-expanded="true"> 528 + <span class="menu-text">API Reference</span></a> 529 + <a class="sidebar-item-toggle text-start" data-bs-toggle="collapse" data-bs-target="#quarto-sidebar-section-3" role="navigation" aria-expanded="true" aria-label="Toggle section"> 530 + <i class="bi bi-chevron-right ms-2"></i> 531 + </a> 532 + </div> 533 + <ul id="quarto-sidebar-section-3" class="collapse list-unstyled sidebar-section depth1 show"> 534 + <li class="sidebar-item"> 535 + <div class="sidebar-item-container"> 536 + <a href="../api/index.html" class="sidebar-item-text sidebar-link"> 537 + 
<span class="menu-text">API Reference</span></a> 538 + </div> 539 + </li> 540 + </ul> 541 + </li> 542 + </ul> 543 + </div> 544 + </nav> 545 + <div id="quarto-sidebar-glass" class="quarto-sidebar-collapse-item" data-bs-toggle="collapse" data-bs-target=".quarto-sidebar-collapse-item"></div> 546 + <!-- margin-sidebar --> 547 + <div id="quarto-margin-sidebar" class="sidebar margin-sidebar"> 548 + <nav id="TOC" role="doc-toc" class="toc-active"> 549 + <h2 id="toc-title">On this page</h2> 550 + 551 + <ul> 552 + <li><a href="#common-errors" id="toc-common-errors" class="nav-link active" data-scroll-target="#common-errors">Common Errors</a> 553 + <ul class="collapse"> 554 + <li><a href="#typeerror-type-object-is-not-subscriptable" id="toc-typeerror-type-object-is-not-subscriptable" class="nav-link" data-scroll-target="#typeerror-type-object-is-not-subscriptable">TypeError: ‘type’ object is not subscriptable</a></li> 555 + <li><a href="#attributeerror-nonetype-object-has-no-attribute" id="toc-attributeerror-nonetype-object-has-no-attribute" class="nav-link" data-scroll-target="#attributeerror-nonetype-object-has-no-attribute">AttributeError: ‘NoneType’ object has no attribute…</a></li> 556 + <li><a href="#runtimeerror-msgpack-field-not-found-in-sample" id="toc-runtimeerror-msgpack-field-not-found-in-sample" class="nav-link" data-scroll-target="#runtimeerror-msgpack-field-not-found-in-sample">RuntimeError: msgpack field not found in sample</a></li> 557 + <li><a href="#valueerror-field-type-not-supported" id="toc-valueerror-field-type-not-supported" class="nav-link" data-scroll-target="#valueerror-field-type-not-supported">ValueError: Field type not supported</a></li> 558 + <li><a href="#keyerror-when-iterating-dataset" id="toc-keyerror-when-iterating-dataset" class="nav-link" data-scroll-target="#keyerror-when-iterating-dataset">KeyError when iterating dataset</a></li> 559 + </ul></li> 560 + <li><a href="#faq" id="toc-faq" class="nav-link" data-scroll-target="#faq">FAQ</a> 
561 + <ul class="collapse"> 562 + <li><a href="#how-do-i-check-the-sample-type-of-a-dataset" id="toc-how-do-i-check-the-sample-type-of-a-dataset" class="nav-link" data-scroll-target="#how-do-i-check-the-sample-type-of-a-dataset">How do I check the sample type of a dataset?</a></li> 563 + <li><a href="#how-do-i-convert-a-dataset-to-a-different-type" id="toc-how-do-i-convert-a-dataset-to-a-different-type" class="nav-link" data-scroll-target="#how-do-i-convert-a-dataset-to-a-different-type">How do I convert a dataset to a different type?</a></li> 564 + <li><a href="#how-do-i-handle-optional-ndarray-fields" id="toc-how-do-i-handle-optional-ndarray-fields" class="nav-link" data-scroll-target="#how-do-i-handle-optional-ndarray-fields">How do I handle optional NDArray fields?</a></li> 565 + <li><a href="#why-is-my-dataset-iteration-slow" id="toc-why-is-my-dataset-iteration-slow" class="nav-link" data-scroll-target="#why-is-my-dataset-iteration-slow">Why is my dataset iteration slow?</a></li> 566 + <li><a href="#how-do-i-export-to-parquet" id="toc-how-do-i-export-to-parquet" class="nav-link" data-scroll-target="#how-do-i-export-to-parquet">How do I export to parquet?</a></li> 567 + <li><a href="#how-do-i-handle-multiple-shards" id="toc-how-do-i-handle-multiple-shards" class="nav-link" data-scroll-target="#how-do-i-handle-multiple-shards">How do I handle multiple shards?</a></li> 568 + <li><a href="#can-i-use-s3-or-other-cloud-storage" id="toc-can-i-use-s3-or-other-cloud-storage" class="nav-link" data-scroll-target="#can-i-use-s3-or-other-cloud-storage">Can I use S3 or other cloud storage?</a></li> 569 + <li><a href="#how-do-i-publish-to-atprotoatmosphere" id="toc-how-do-i-publish-to-atprotoatmosphere" class="nav-link" data-scroll-target="#how-do-i-publish-to-atprotoatmosphere">How do I publish to ATProto/Atmosphere?</a></li> 570 + <li><a href="#whats-the-difference-between-localindex-and-atmosphereindex" id="toc-whats-the-difference-between-localindex-and-atmosphereindex" 
class="nav-link" data-scroll-target="#whats-the-difference-between-localindex-and-atmosphereindex">What’s the difference between LocalIndex and AtmosphereIndex?</a></li> 571 + </ul></li> 572 + <li><a href="#getting-help" id="toc-getting-help" class="nav-link" data-scroll-target="#getting-help">Getting Help</a></li> 573 + </ul> 574 + <div class="toc-actions"><ul><li><a href="https://github.com/your-org/atdata/edit/main/reference/troubleshooting.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li><li><a href="https://github.com/your-org/atdata/issues/new" class="toc-action"><i class="bi empty"></i>Report an issue</a></li></ul></div></nav> 575 + </div> 576 + <!-- main --> 577 + <main class="content" id="quarto-document-content"> 578 + 579 + 580 + <header id="title-block-header" class="quarto-title-block default"><nav class="quarto-page-breadcrumbs quarto-title-breadcrumbs d-none d-lg-block" aria-label="breadcrumb"><ol class="breadcrumb"><li class="breadcrumb-item"><a href="../reference/packable-samples.html">Reference</a></li><li class="breadcrumb-item"><a href="../reference/troubleshooting.html">Troubleshooting &amp; FAQ</a></li></ol></nav> 581 + <div class="quarto-title"> 582 + <h1 class="title">Troubleshooting &amp; FAQ</h1> 583 + </div> 584 + 585 + <div> 586 + <div class="description"> 587 + Common issues and frequently asked questions 588 + </div> 589 + </div> 590 + 591 + 592 + <div class="quarto-title-meta"> 593 + 594 + 595 + 596 + 597 + </div> 598 + 599 + 600 + 601 + </header> 602 + 603 + 604 + <p>This page covers common issues, error messages, and frequently asked questions when working with atdata.</p> 605 + <section id="common-errors" class="level2"> 606 + <h2 class="anchored" data-anchor-id="common-errors">Common Errors</h2> 607 + <section id="typeerror-type-object-is-not-subscriptable" class="level3"> 608 + <h3 class="anchored" data-anchor-id="typeerror-type-object-is-not-subscriptable">TypeError: ‘type’ object is not 
subscriptable</h3> 609 + <p><strong>Error:</strong></p> 610 + <pre><code>TypeError: 'type' object is not subscriptable</code></pre> 611 + <p><strong>Cause:</strong> Using <code>Dataset</code> or <code>SampleBatch</code> without subscripting the type parameter on Python &lt; 3.9, or using an unsubscripted generic.</p> 612 + <p><strong>Solution:</strong> Always use the subscripted form:</p> 613 + <div class="sourceCode" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Correct</span></span> 614 + <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> Dataset[MySample](<span class="st">"data.tar"</span>)</span> 615 + <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>batch <span class="op">=</span> SampleBatch[MySample](samples)</span> 616 + <span id="cb2-4"><a href="#cb2-4" aria-hidden="true" tabindex="-1"></a></span> 617 + <span id="cb2-5"><a href="#cb2-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Incorrect</span></span> 618 + <span id="cb2-6"><a href="#cb2-6" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> Dataset(<span class="st">"data.tar"</span>) <span class="co"># Missing type parameter</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 619 + </section> 620 + <section id="attributeerror-nonetype-object-has-no-attribute" class="level3"> 621 + <h3 class="anchored" data-anchor-id="attributeerror-nonetype-object-has-no-attribute">AttributeError: ‘NoneType’ object has no attribute…</h3> 622 + <p><strong>Error:</strong></p> 623 + <pre><code>AttributeError: 'NoneType' object has no attribute '__args__'</code></pre> 624 + <p><strong>Cause:</strong> Creating a <code>Dataset</code> or <code>SampleBatch</code> without using the subscripted syntax <code>Class[Type](...)</code>.</p> 625 
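<p>To see why the unsubscripted form fails, here is a minimal sketch. It is not atdata's actual code: <code>Box</code> and <code>get_sample_type</code> are hypothetical stand-ins, assuming a generic class that reads its type parameter from <code>__orig_class__</code>, the attribute Python's <code>typing</code> machinery sets on an instance only when it is created through a subscripted alias.</p>

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class Box(Generic[T]):
    """Hypothetical stand-in for a generic class such as Dataset (not atdata code)."""

    def __init__(self, value):
        self.value = value

    def get_sample_type(self):
        # __orig_class__ exists only on instances built via Box[SomeType](...);
        # for a plain Box(...) call the lookup falls back to None.
        orig = getattr(self, "__orig_class__", None)
        return orig.__args__[0]


typed = Box[int](1)
print(typed.get_sample_type())  # <class 'int'>

untyped = Box(1)
try:
    untyped.get_sample_type()
except AttributeError as exc:
    # No type information was recorded, so the lookup hits None.
    print(f"AttributeError: {exc}")
```

<p>Note that <code>__orig_class__</code> is assigned after <code>__init__</code> returns, so in this sketch the type parameter is read lazily in a method rather than in the constructor.</p>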
+ <p><strong>Solution:</strong> These classes use Python’s <code>__orig_class__</code> mechanism to extract type parameters at runtime. You must use:</p> 626 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> Dataset[MySample](url) <span class="co"># Correct</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 627 + <p>Not:</p> 628 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> Dataset(url) <span class="co"># Wrong - no type information</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 629 + </section> 630 + <section id="runtimeerror-msgpack-field-not-found-in-sample" class="level3"> 631 + <h3 class="anchored" data-anchor-id="runtimeerror-msgpack-field-not-found-in-sample">RuntimeError: msgpack field not found in sample</h3> 632 + <p><strong>Error:</strong></p> 633 + <pre><code>RuntimeError: Malformed sample: 'msgpack' field not found</code></pre> 634 + <p><strong>Cause:</strong> The tar file contains samples that weren’t written with atdata’s serialization format.</p> 635 + <p><strong>Solution:</strong> Ensure samples are written using <code>sample.as_wds</code>:</p> 636 + <div class="sourceCode" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(<span class="st">"data.tar"</span>) <span class="im">as</span> sink:</span> 637 + <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> sample <span 
class="kw">in</span> samples:</span> 638 + <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> sink.write(sample.as_wds) <span class="co"># Correct</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 639 + </section> 640 + <section id="valueerror-field-type-not-supported" class="level3"> 641 + <h3 class="anchored" data-anchor-id="valueerror-field-type-not-supported">ValueError: Field type not supported</h3> 642 + <p><strong>Error:</strong></p> 643 + <pre><code>TypeError: Unsupported type for schema field: &lt;class 'SomeType'&gt;</code></pre> 644 + <p><strong>Cause:</strong> Using an unsupported Python type in a PackableSample field.</p> 645 + <p><strong>Supported types:</strong></p> 646 + <table class="caption-top table"> 647 + <thead> 648 + <tr class="header"> 649 + <th>Python Type</th> 650 + <th>Notes</th> 651 + </tr> 652 + </thead> 653 + <tbody> 654 + <tr class="odd"> 655 + <td><code>str</code></td> 656 + <td>Unicode strings</td> 657 + </tr> 658 + <tr class="even"> 659 + <td><code>int</code></td> 660 + <td>Integers</td> 661 + </tr> 662 + <tr class="odd"> 663 + <td><code>float</code></td> 664 + <td>Floating point</td> 665 + </tr> 666 + <tr class="even"> 667 + <td><code>bool</code></td> 668 + <td>Boolean</td> 669 + </tr> 670 + <tr class="odd"> 671 + <td><code>bytes</code></td> 672 + <td>Binary data</td> 673 + </tr> 674 + <tr class="even"> 675 + <td><code>NDArray</code></td> 676 + <td>NumPy arrays (any dtype)</td> 677 + </tr> 678 + <tr class="odd"> 679 + <td><code>list[T]</code></td> 680 + <td>Lists of primitives</td> 681 + </tr> 682 + <tr class="even"> 683 + <td><code>T | None</code></td> 684 + <td>Optional fields</td> 685 + </tr> 686 + </tbody> 687 + </table> 688 + <p><strong>Not supported:</strong> Nested dataclasses, dicts, custom classes.</p> 689 + </section> 690 + <section id="keyerror-when-iterating-dataset" class="level3"> 691 + <h3 class="anchored" 
data-anchor-id="keyerror-when-iterating-dataset">KeyError when iterating dataset</h3> 692 + <p><strong>Error:</strong></p> 693 + <pre><code>KeyError: 'msgpack'</code></pre> 694 + <p><strong>Cause:</strong> The WebDataset tar file structure doesn’t match expected format.</p> 695 + <p><strong>Solution:</strong> Verify your tar file was created correctly:</p> 696 + <div class="sourceCode" id="cb10"><pre class="sourceCode bash code-with-copy"><code class="sourceCode bash"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Check tar contents</span></span> 697 + <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="fu">tar</span> <span class="at">-tvf</span> data.tar <span class="kw">|</span> <span class="fu">head</span> <span class="at">-20</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 698 + <p>Each sample should have a <code>.msgpack</code> extension in the tar file.</p> 699 + </section> 700 + </section> 701 + <section id="faq" class="level2"> 702 + <h2 class="anchored" data-anchor-id="faq">FAQ</h2> 703 + <section id="how-do-i-check-the-sample-type-of-a-dataset" class="level3"> 704 + <h3 class="anchored" data-anchor-id="how-do-i-check-the-sample-type-of-a-dataset">How do I check the sample type of a dataset?</h3> 705 + <div class="sourceCode" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> Dataset[MySample](<span class="st">"data.tar"</span>)</span> 706 + <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(ds.sample_type) <span class="co"># &lt;class 'MySample'&gt;</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 707 + </section> 708 + <section 
id="how-do-i-convert-a-dataset-to-a-different-type" class="level3"> 709 + <h3 class="anchored" data-anchor-id="how-do-i-convert-a-dataset-to-a-different-type">How do I convert a dataset to a different type?</h3> 710 + <p>Use the <code>as_type()</code> method with a registered lens:</p> 711 + <div class="sourceCode" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.lens</span></span> 712 + <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> my_lens(src: SourceType) <span class="op">-&gt;</span> TargetType:</span> 713 + <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> TargetType(field<span class="op">=</span>src.other_field)</span> 714 + <span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a></span> 715 + <span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>ds_view <span class="op">=</span> ds.as_type(TargetType)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 716 + </section> 717 + <section id="how-do-i-handle-optional-ndarray-fields" class="level3"> 718 + <h3 class="anchored" data-anchor-id="how-do-i-handle-optional-ndarray-fields">How do I handle optional NDArray fields?</h3> 719 + <p>Use <code>NDArray | None</code> annotation:</p> 720 + <div class="sourceCode" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 721 + <span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> MySample:</span> 722 + <span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a> required_array: NDArray</span> 723 + <span id="cb13-4"><a 
href="#cb13-4" aria-hidden="true" tabindex="-1"></a> optional_array: NDArray <span class="op">|</span> <span class="va">None</span> <span class="op">=</span> <span class="va">None</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 724 + </section> 725 + <section id="why-is-my-dataset-iteration-slow" class="level3"> 726 + <h3 class="anchored" data-anchor-id="why-is-my-dataset-iteration-slow">Why is my dataset iteration slow?</h3> 727 + <p>Common causes:</p> 728 + <ol type="1"> 729 + <li><strong>Network latency</strong>: Use local caching for remote datasets</li> 730 + <li><strong>Small batch sizes</strong>: Increase <code>batch_size</code> in <code>ordered()</code> or <code>shuffled()</code></li> 731 + <li><strong>Shuffle buffer</strong>: For <code>shuffled()</code>, the <code>initial</code> parameter controls buffer size</li> 732 + </ol> 733 + <div class="sourceCode" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Larger batches = better throughput</span></span> 734 + <span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> ds.shuffled(batch_size<span class="op">=</span><span class="dv">64</span>, initial<span class="op">=</span><span class="dv">1000</span>):</span> 735 + <span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a> ...</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 736 + </section> 737 + <section id="how-do-i-export-to-parquet" class="level3"> 738 + <h3 class="anchored" data-anchor-id="how-do-i-export-to-parquet">How do I export to parquet?</h3> 739 + <div class="sourceCode" id="cb15"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1" 
aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> Dataset[MySample](<span class="st">"data.tar"</span>)</span> 740 + <span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a>ds.to_parquet(<span class="st">"output.parquet"</span>)</span> 741 + <span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a></span> 742 + <span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="co"># With sample limit (for large datasets)</span></span> 743 + <span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a>ds.to_parquet(<span class="st">"output.parquet"</span>, maxcount<span class="op">=</span><span class="dv">10000</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 744 + <div class="callout callout-style-default callout-warning callout-titled"> 745 + <div class="callout-header d-flex align-content-center"> 746 + <div class="callout-icon-container"> 747 + <i class="callout-icon"></i> 748 + </div> 749 + <div class="callout-title-container flex-fill"> 750 + Warning 751 + </div> 752 + </div> 753 + <div class="callout-body-container callout-body"> 754 + <p><code>to_parquet()</code> loads the dataset into memory. 
For very large datasets, use <code>maxcount</code> to limit samples or process in chunks.</p> 755 + </div> 756 + </div> 757 + </section> 758 + <section id="how-do-i-handle-multiple-shards" class="level3"> 759 + <h3 class="anchored" data-anchor-id="how-do-i-handle-multiple-shards">How do I handle multiple shards?</h3> 760 + <p>Use WebDataset brace notation:</p> 761 + <div class="sourceCode" id="cb16"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Single shard</span></span> 762 + <span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> Dataset[MySample](<span class="st">"data-000000.tar"</span>)</span> 763 + <span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a></span> 764 + <span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Multiple shards (range)</span></span> 765 + <span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> Dataset[MySample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 766 + <span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a></span> 767 + <span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Multiple shards (list)</span></span> 768 + <span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> Dataset[MySample](<span class="st">"data-{000000,000005,000009}.tar"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 769 + </section> 770 + <section id="can-i-use-s3-or-other-cloud-storage" class="level3"> 771 + <h3 class="anchored" data-anchor-id="can-i-use-s3-or-other-cloud-storage">Can I use S3 or other cloud storage?</h3> 772 + <p>Yes, use <code>S3Source</code> for S3-compatible storage:</p> 773 + <div 
class="sourceCode" id="cb17"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata <span class="im">import</span> S3Source, Dataset</span> 774 + <span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a></span> 775 + <span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> S3Source.from_urls(</span> 776 + <span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a> [<span class="st">"s3://bucket/data-000000.tar"</span>, <span class="st">"s3://bucket/data-000001.tar"</span>],</span> 777 + <span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a> endpoint_url<span class="op">=</span><span class="st">"https://s3.example.com"</span>, <span class="co"># Optional for non-AWS S3</span></span> 778 + <span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a>)</span> 779 + <span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a></span> 780 + <span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> Dataset[MySample](source)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 781 + </section> 782 + <section id="how-do-i-publish-to-atprotoatmosphere" class="level3"> 783 + <h3 class="anchored" data-anchor-id="how-do-i-publish-to-atprotoatmosphere">How do I publish to ATProto/Atmosphere?</h3> 784 + <div class="sourceCode" id="cb18"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient, AtmosphereIndex</span> 785 + <span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a></span> 786 + <span id="cb18-3"><a href="#cb18-3" 
aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 787 + <span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"handle.bsky.social"</span>, <span class="st">"app-password"</span>) <span class="co"># Use app password!</span></span> 788 + <span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a></span> 789 + <span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> AtmosphereIndex(client)</span> 790 + <span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a></span> 791 + <span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish schema</span></span> 792 + <span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> index.publish_schema(MySample, version<span class="op">=</span><span class="st">"1.0.0"</span>)</span> 793 + <span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a></span> 794 + <span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish dataset</span></span> 795 + <span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(ds, name<span class="op">=</span><span class="st">"my-dataset"</span>, schema_ref<span class="op">=</span>schema_uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 796 + </section> 797 + <section id="whats-the-difference-between-localindex-and-atmosphereindex" class="level3"> 798 + <h3 class="anchored" data-anchor-id="whats-the-difference-between-localindex-and-atmosphereindex">What’s the difference between LocalIndex and AtmosphereIndex?</h3> 799 + <table class="caption-top table"> 800 + <thead> 801 + <tr class="header"> 802 + <th>Feature</th> 803 + <th>LocalIndex</th> 804 + 
<th>AtmosphereIndex</th> 805 + </tr> 806 + </thead> 807 + <tbody> 808 + <tr class="odd"> 809 + <td>Storage</td> 810 + <td>Redis + S3</td> 811 + <td>ATProto PDS</td> 812 + </tr> 813 + <tr class="even"> 814 + <td>Discovery</td> 815 + <td>Local only</td> 816 + <td>Federated network</td> 817 + </tr> 818 + <tr class="odd"> 819 + <td>Auth</td> 820 + <td>None required</td> 821 + <td>ATProto account</td> 822 + </tr> 823 + <tr class="even"> 824 + <td>Use case</td> 825 + <td>Development, private data</td> 826 + <td>Public distribution</td> 827 + </tr> 828 + </tbody> 829 + </table> 830 + <p>Both implement the <code>AbstractIndex</code> protocol, so code can work with either.</p> 831 + </section> 832 + </section> 833 + <section id="getting-help" class="level2"> 834 + <h2 class="anchored" data-anchor-id="getting-help">Getting Help</h2> 835 + <ul> 836 + <li><strong>GitHub Issues</strong>: <a href="https://github.com/your-org/atdata/issues">github.com/your-org/atdata/issues</a></li> 837 + <li><strong>Documentation</strong>: Check the reference pages for detailed API documentation</li> 838 + <li><strong>Examples</strong>: See the <code>examples/</code> directory for working code samples</li> 839 + </ul> 840 + 841 + 842 + </section> 843 + 844 + </main> <!-- /main --> 845 + <script id="quarto-html-after-body" type="application/javascript"> 846 + window.document.addEventListener("DOMContentLoaded", function (event) { 847 + // Ensure there is a toggle, if there isn't float one in the top right 848 + if (window.document.querySelector('.quarto-color-scheme-toggle') === null) { 849 + const a = window.document.createElement('a'); 850 + a.classList.add('top-right'); 851 + a.classList.add('quarto-color-scheme-toggle'); 852 + a.href = ""; 853 + a.onclick = function() { try { window.quartoToggleColorScheme(); } catch {} return false; }; 854 + const i = window.document.createElement("i"); 855 + i.classList.add('bi'); 856 + a.appendChild(i); 857 + window.document.body.appendChild(a); 858 + } 
859 + setColorSchemeToggle(hasAlternateSentinel()) 860 + const icon = ""; 861 + const anchorJS = new window.AnchorJS(); 862 + anchorJS.options = { 863 + placement: 'right', 864 + icon: icon 865 + }; 866 + anchorJS.add('.anchored'); 867 + const isCodeAnnotation = (el) => { 868 + for (const clz of el.classList) { 869 + if (clz.startsWith('code-annotation-')) { 870 + return true; 871 + } 872 + } 873 + return false; 874 + } 875 + const onCopySuccess = function(e) { 876 + // button target 877 + const button = e.trigger; 878 + // don't keep focus 879 + button.blur(); 880 + // flash "checked" 881 + button.classList.add('code-copy-button-checked'); 882 + var currentTitle = button.getAttribute("title"); 883 + button.setAttribute("title", "Copied!"); 884 + let tooltip; 885 + if (window.bootstrap) { 886 + button.setAttribute("data-bs-toggle", "tooltip"); 887 + button.setAttribute("data-bs-placement", "left"); 888 + button.setAttribute("data-bs-title", "Copied!"); 889 + tooltip = new bootstrap.Tooltip(button, 890 + { trigger: "manual", 891 + customClass: "code-copy-button-tooltip", 892 + offset: [0, -8]}); 893 + tooltip.show(); 894 + } 895 + setTimeout(function() { 896 + if (tooltip) { 897 + tooltip.hide(); 898 + button.removeAttribute("data-bs-title"); 899 + button.removeAttribute("data-bs-toggle"); 900 + button.removeAttribute("data-bs-placement"); 901 + } 902 + button.setAttribute("title", currentTitle); 903 + button.classList.remove('code-copy-button-checked'); 904 + }, 1000); 905 + // clear code selection 906 + e.clearSelection(); 907 + } 908 + const getTextToCopy = function(trigger) { 909 + const codeEl = trigger.previousElementSibling.cloneNode(true); 910 + for (const childEl of codeEl.children) { 911 + if (isCodeAnnotation(childEl)) { 912 + childEl.remove(); 913 + } 914 + } 915 + return codeEl.innerText; 916 + } 917 + const clipboard = new window.ClipboardJS('.code-copy-button:not([data-in-quarto-modal])', { 918 + text: getTextToCopy 919 + }); 920 + 
clipboard.on('success', onCopySuccess); 921 + if (window.document.getElementById('quarto-embedded-source-code-modal')) { 922 + const clipboardModal = new window.ClipboardJS('.code-copy-button[data-in-quarto-modal]', { 923 + text: getTextToCopy, 924 + container: window.document.getElementById('quarto-embedded-source-code-modal') 925 + }); 926 + clipboardModal.on('success', onCopySuccess); 927 + } 928 + var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//); 929 + var mailtoRegex = new RegExp(/^mailto:/); 930 + var filterRegex = new RegExp("https:\/\/github\.com\/your-org\/atdata"); 931 + var isInternal = (href) => { 932 + return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href); 933 + } 934 + // Inspect non-navigation links and adorn them if external 935 + var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item):not(.quarto-navigation-tool):not(.about-link)'); 936 + for (var i=0; i<links.length; i++) { 937 + const link = links[i]; 938 + if (!isInternal(link.href)) { 939 + // undo the damage that might have been done by quarto-nav.js in the case of 940 + // links that we want to consider external 941 + if (link.dataset.originalHref !== undefined) { 942 + link.href = link.dataset.originalHref; 943 + } 944 + } 945 + } 946 + function tippyHover(el, contentFn, onTriggerFn, onUntriggerFn) { 947 + const config = { 948 + allowHTML: true, 949 + maxWidth: 500, 950 + delay: 100, 951 + arrow: false, 952 + appendTo: function(el) { 953 + return el.parentElement; 954 + }, 955 + interactive: true, 956 + interactiveBorder: 10, 957 + theme: 'quarto', 958 + placement: 'bottom-start', 959 + }; 960 + if (contentFn) { 961 + config.content = contentFn; 962 + } 963 + if (onTriggerFn) { 964 + config.onTrigger = onTriggerFn; 965 + } 966 + if (onUntriggerFn) { 967 + 
config.onUntrigger = onUntriggerFn; 968 + } 969 + window.tippy(el, config); 970 + } 971 + const noterefs = window.document.querySelectorAll('a[role="doc-noteref"]'); 972 + for (var i=0; i<noterefs.length; i++) { 973 + const ref = noterefs[i]; 974 + tippyHover(ref, function() { 975 + // use id or data attribute instead here 976 + let href = ref.getAttribute('data-footnote-href') || ref.getAttribute('href'); 977 + try { href = new URL(href).hash; } catch {} 978 + const id = href.replace(/^#\/?/, ""); 979 + const note = window.document.getElementById(id); 980 + if (note) { 981 + return note.innerHTML; 982 + } else { 983 + return ""; 984 + } 985 + }); 986 + } 987 + const xrefs = window.document.querySelectorAll('a.quarto-xref'); 988 + const processXRef = (id, note) => { 989 + // Strip column container classes 990 + const stripColumnClz = (el) => { 991 + el.classList.remove("page-full", "page-columns"); 992 + if (el.children) { 993 + for (const child of el.children) { 994 + stripColumnClz(child); 995 + } 996 + } 997 + } 998 + stripColumnClz(note) 999 + if (id === null || id.startsWith('sec-')) { 1000 + // Special case sections, only their first couple elements 1001 + const container = document.createElement("div"); 1002 + if (note.children && note.children.length > 2) { 1003 + container.appendChild(note.children[0].cloneNode(true)); 1004 + for (let i = 1; i < note.children.length; i++) { 1005 + const child = note.children[i]; 1006 + if (child.tagName === "P" && child.innerText === "") { 1007 + continue; 1008 + } else { 1009 + container.appendChild(child.cloneNode(true)); 1010 + break; 1011 + } 1012 + } 1013 + if (window.Quarto?.typesetMath) { 1014 + window.Quarto.typesetMath(container); 1015 + } 1016 + return container.innerHTML 1017 + } else { 1018 + if (window.Quarto?.typesetMath) { 1019 + window.Quarto.typesetMath(note); 1020 + } 1021 + return note.innerHTML; 1022 + } 1023 + } else { 1024 + // Remove any anchor links if they are present 1025 + const anchorLink = 
note.querySelector('a.anchorjs-link'); 1026 + if (anchorLink) { 1027 + anchorLink.remove(); 1028 + } 1029 + if (window.Quarto?.typesetMath) { 1030 + window.Quarto.typesetMath(note); 1031 + } 1032 + if (note.classList.contains("callout")) { 1033 + return note.outerHTML; 1034 + } else { 1035 + return note.innerHTML; 1036 + } 1037 + } 1038 + } 1039 + for (var i=0; i<xrefs.length; i++) { 1040 + const xref = xrefs[i]; 1041 + tippyHover(xref, undefined, function(instance) { 1042 + instance.disable(); 1043 + let url = xref.getAttribute('href'); 1044 + let hash = undefined; 1045 + if (url.startsWith('#')) { 1046 + hash = url; 1047 + } else { 1048 + try { hash = new URL(url).hash; } catch {} 1049 + } 1050 + if (hash) { 1051 + const id = hash.replace(/^#\/?/, ""); 1052 + const note = window.document.getElementById(id); 1053 + if (note !== null) { 1054 + try { 1055 + const html = processXRef(id, note.cloneNode(true)); 1056 + instance.setContent(html); 1057 + } finally { 1058 + instance.enable(); 1059 + instance.show(); 1060 + } 1061 + } else { 1062 + // See if we can fetch this 1063 + fetch(url.split('#')[0]) 1064 + .then(res => res.text()) 1065 + .then(html => { 1066 + const parser = new DOMParser(); 1067 + const htmlDoc = parser.parseFromString(html, "text/html"); 1068 + const note = htmlDoc.getElementById(id); 1069 + if (note !== null) { 1070 + const html = processXRef(id, note); 1071 + instance.setContent(html); 1072 + } 1073 + }).finally(() => { 1074 + instance.enable(); 1075 + instance.show(); 1076 + }); 1077 + } 1078 + } else { 1079 + // See if we can fetch a full url (with no hash to target) 1080 + // This is a special case and we should probably do some content thinning / targeting 1081 + fetch(url) 1082 + .then(res => res.text()) 1083 + .then(html => { 1084 + const parser = new DOMParser(); 1085 + const htmlDoc = parser.parseFromString(html, "text/html"); 1086 + const note = htmlDoc.querySelector('main.content'); 1087 + if (note !== null) { 1088 + // This should 
only happen for chapter cross references 1089 + // (since there is no id in the URL) 1090 + // remove the first header 1091 + if (note.children.length > 0 && note.children[0].tagName === "HEADER") { 1092 + note.children[0].remove(); 1093 + } 1094 + const html = processXRef(null, note); 1095 + instance.setContent(html); 1096 + } 1097 + }).finally(() => { 1098 + instance.enable(); 1099 + instance.show(); 1100 + }); 1101 + } 1102 + }, function(instance) { 1103 + }); 1104 + } 1105 + let selectedAnnoteEl; 1106 + const selectorForAnnotation = ( cell, annotation) => { 1107 + let cellAttr = 'data-code-cell="' + cell + '"'; 1108 + let lineAttr = 'data-code-annotation="' + annotation + '"'; 1109 + const selector = 'span[' + cellAttr + '][' + lineAttr + ']'; 1110 + return selector; 1111 + } 1112 + const selectCodeLines = (annoteEl) => { 1113 + const doc = window.document; 1114 + const targetCell = annoteEl.getAttribute("data-target-cell"); 1115 + const targetAnnotation = annoteEl.getAttribute("data-target-annotation"); 1116 + const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation)); 1117 + const lines = annoteSpan.getAttribute("data-code-lines").split(","); 1118 + const lineIds = lines.map((line) => { 1119 + return targetCell + "-" + line; 1120 + }) 1121 + let top = null; 1122 + let height = null; 1123 + let parent = null; 1124 + if (lineIds.length > 0) { 1125 + //compute the position of the single el (top and bottom and make a div) 1126 + const el = window.document.getElementById(lineIds[0]); 1127 + top = el.offsetTop; 1128 + height = el.offsetHeight; 1129 + parent = el.parentElement.parentElement; 1130 + if (lineIds.length > 1) { 1131 + const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]); 1132 + const bottom = lastEl.offsetTop + lastEl.offsetHeight; 1133 + height = bottom - top; 1134 + } 1135 + if (top !== null && height !== null && parent !== null) { 1136 + // cook up a div (if necessary) and position it 
1137 + let div = window.document.getElementById("code-annotation-line-highlight"); 1138 + if (div === null) { 1139 + div = window.document.createElement("div"); 1140 + div.setAttribute("id", "code-annotation-line-highlight"); 1141 + div.style.position = 'absolute'; 1142 + parent.appendChild(div); 1143 + } 1144 + div.style.top = top - 2 + "px"; 1145 + div.style.height = height + 4 + "px"; 1146 + div.style.left = 0; 1147 + let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter"); 1148 + if (gutterDiv === null) { 1149 + gutterDiv = window.document.createElement("div"); 1150 + gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter"); 1151 + gutterDiv.style.position = 'absolute'; 1152 + const codeCell = window.document.getElementById(targetCell); 1153 + const gutter = codeCell.querySelector('.code-annotation-gutter'); 1154 + gutter.appendChild(gutterDiv); 1155 + } 1156 + gutterDiv.style.top = top - 2 + "px"; 1157 + gutterDiv.style.height = height + 4 + "px"; 1158 + } 1159 + selectedAnnoteEl = annoteEl; 1160 + } 1161 + }; 1162 + const unselectCodeLines = () => { 1163 + const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"]; 1164 + elementsIds.forEach((elId) => { 1165 + const div = window.document.getElementById(elId); 1166 + if (div) { 1167 + div.remove(); 1168 + } 1169 + }); 1170 + selectedAnnoteEl = undefined; 1171 + }; 1172 + // Handle positioning of the toggle 1173 + window.addEventListener( 1174 + "resize", 1175 + throttle(() => { 1176 + elRect = undefined; 1177 + if (selectedAnnoteEl) { 1178 + selectCodeLines(selectedAnnoteEl); 1179 + } 1180 + }, 10) 1181 + ); 1182 + function throttle(fn, ms) { 1183 + let throttle = false; 1184 + let timer; 1185 + return (...args) => { 1186 + if(!throttle) { // first call gets through 1187 + fn.apply(this, args); 1188 + throttle = true; 1189 + } else { // all the others get throttled 1190 + if(timer) clearTimeout(timer); // cancel #2 1191 + timer = 
setTimeout(() => { 1192 + fn.apply(this, args); 1193 + timer = throttle = false; 1194 + }, ms); 1195 + } 1196 + }; 1197 + } 1198 + // Attach click handler to the DT 1199 + const annoteDls = window.document.querySelectorAll('dt[data-target-cell]'); 1200 + for (const annoteDlNode of annoteDls) { 1201 + annoteDlNode.addEventListener('click', (event) => { 1202 + const clickedEl = event.target; 1203 + if (clickedEl !== selectedAnnoteEl) { 1204 + unselectCodeLines(); 1205 + const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active'); 1206 + if (activeEl) { 1207 + activeEl.classList.remove('code-annotation-active'); 1208 + } 1209 + selectCodeLines(clickedEl); 1210 + clickedEl.classList.add('code-annotation-active'); 1211 + } else { 1212 + // Unselect the line 1213 + unselectCodeLines(); 1214 + clickedEl.classList.remove('code-annotation-active'); 1215 + } 1216 + }); 1217 + } 1218 + const findCites = (el) => { 1219 + const parentEl = el.parentElement; 1220 + if (parentEl) { 1221 + const cites = parentEl.dataset.cites; 1222 + if (cites) { 1223 + return { 1224 + el, 1225 + cites: cites.split(' ') 1226 + }; 1227 + } else { 1228 + return findCites(el.parentElement) 1229 + } 1230 + } else { 1231 + return undefined; 1232 + } 1233 + }; 1234 + var bibliorefs = window.document.querySelectorAll('a[role="doc-biblioref"]'); 1235 + for (var i=0; i<bibliorefs.length; i++) { 1236 + const ref = bibliorefs[i]; 1237 + const citeInfo = findCites(ref); 1238 + if (citeInfo) { 1239 + tippyHover(citeInfo.el, function() { 1240 + var popup = window.document.createElement('div'); 1241 + citeInfo.cites.forEach(function(cite) { 1242 + var citeDiv = window.document.createElement('div'); 1243 + citeDiv.classList.add('hanging-indent'); 1244 + citeDiv.classList.add('csl-entry'); 1245 + var biblioDiv = window.document.getElementById('ref-' + cite); 1246 + if (biblioDiv) { 1247 + citeDiv.innerHTML = biblioDiv.innerHTML; 1248 + } 1249 + popup.appendChild(citeDiv); 1250 + 
}); 1251 + return popup.innerHTML; 1252 + }); 1253 + } 1254 + } 1255 + }); 1256 + </script> 1257 + </div> <!-- /content --> 1258 + <footer class="footer"> 1259 + <div class="nav-footer"> 1260 + <div class="nav-footer-left"> 1261 + <p>Built with <a href="https://quarto.org/">Quarto</a></p> 1262 + </div> 1263 + <div class="nav-footer-center"> 1264 + &nbsp; 1265 + <div class="toc-actions d-sm-block d-md-none"><ul><li><a href="https://github.com/your-org/atdata/edit/main/reference/troubleshooting.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li><li><a href="https://github.com/your-org/atdata/issues/new" class="toc-action"><i class="bi empty"></i>Report an issue</a></li></ul></div></div> 1266 + <div class="nav-footer-right"> 1267 + <p>MIT License</p> 1268 + </div> 1269 + </div> 1270 + </footer> 1271 + 1272 + 1273 + 1274 + 1275 + </body></html>
+22 -2
docs/reference/uri-spec.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 500 508 <span class="menu-text">URI Specification</span></a> 501 509 </div> 502 510 </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 503 523 </ul> 504 524 </li> 505 525 <li class="sidebar-item sidebar-item-section"> ··· 672 692 <h2 class="anchored" data-anchor-id="examples">Examples</h2> 673 693 <section id="local-development" class="level3"> 674 694 <h3 class="anchored" data-anchor-id="local-development">Local Development</h3> 675 - <div id="2cff2485" class="cell"> 695 + <div id="2c57810f" class="cell"> 676 696 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> Index</span> 677 697 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span> 678 698 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> Index()</span> 
··· 691 711 </section> 692 712 <section id="atmosphere-atproto-federation" class="level3"> 693 713 <h3 class="anchored" data-anchor-id="atmosphere-atproto-federation">Atmosphere (ATProto Federation)</h3> 694 - <div id="6a5dabc7" class="cell"> 714 + <div id="28df91d7" class="cell"> 695 715 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> Client</span> 696 716 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 697 717 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> Client()</span>
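Reviewer note: the examples in this commit pass AT URIs such as `at://did:plc:abc/ac.foundation.dataset.sampleSchema/xyz` to the loaders. A minimal sketch of how such a record URI breaks down into authority, collection and record key is given here. The `AtUri` class and `parse_at_uri` helper are hypothetical names for illustration, not part of atdata, and the sketch only handles full three-part record URIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtUri:
    """Hypothetical container for the three parts of a record-level AT URI."""
    authority: str   # DID or handle, e.g. "did:plc:abc"
    collection: str  # record collection NSID
    rkey: str        # record key within the collection


def parse_at_uri(uri: str) -> AtUri:
    """Split at://<authority>/<collection>/<rkey> into its parts.

    Illustrative helper only: shorter forms (authority-only or
    authority/collection) are rejected rather than handled.
    """
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected at://<authority>/<collection>/<rkey>, got {uri!r}"
        )
    return AtUri(*parts)


parsed = parse_at_uri("at://did:plc:abc/ac.foundation.dataset.sampleSchema/xyz")
print(parsed.authority, parsed.collection, parsed.rkey)
```

Rejecting anything but the three-part form keeps the sketch short; a real implementation would probably also validate the DID or handle syntax.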
+215 -83
docs/search.json
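Reviewer note: the deployment guide text indexed in this file recommends exponential backoff for retries but only shows a fixed `time.sleep(1)`. The advice could be sketched as follows. `retry_with_backoff` and its parameters are hypothetical and not part of atdata; catching every `Exception` is an assumption that a real service would likely narrow to rate-limit or network errors.

```python
import random
import time


def retry_with_backoff(fn, *, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call fn(), retrying on failure with exponentially growing delays.

    The delay before retry n (0-based) is min(max_delay, base_delay * 2**n)
    plus up to 50% random jitter. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter spreads out retries from clients that failed together
            sleep(delay + random.uniform(0, delay / 2))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky():
        # Stand-in for a publish call that fails twice, then succeeds
        calls["n"] += 1
        if calls["n"] < 3:
            raise RuntimeError("rate limited")
        return "ok"

    print(retry_with_backoff(flaky, base_delay=0.01))
```

In the guide's bulk loop, the `index.insert_dataset(...)` call could be wrapped in a lambda and passed as `fn`, with the fixed one-second pause kept between records.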
··· 132 132 ] 133 133 }, 134 134 { 135 + "objectID": "reference/datasets.html", 136 + "href": "reference/datasets.html", 137 + "title": "Datasets", 138 + "section": "", 139 + "text": "The Dataset class provides typed iteration over WebDataset tar files with automatic batching and lens transformations.", 140 + "crumbs": [ 141 + "Guide", 142 + "Reference", 143 + "Datasets" 144 + ] 145 + }, 146 + { 147 + "objectID": "reference/datasets.html#creating-a-dataset", 148 + "href": "reference/datasets.html#creating-a-dataset", 149 + "title": "Datasets", 150 + "section": "Creating a Dataset", 151 + "text": "Creating a Dataset\n\nimport atdata\nfrom numpy.typing import NDArray\n\n@atdata.packable\nclass ImageSample:\n image: NDArray\n label: str\n\n# Single shard (string URL - most common)\ndataset = atdata.Dataset[ImageSample](\"data-000000.tar\")\n\n# Multiple shards with brace notation\ndataset = atdata.Dataset[ImageSample](\"data-{000000..000009}.tar\")\n\nThe type parameter [ImageSample] specifies what sample type the dataset contains. 
This enables type-safe iteration and automatic deserialization.", 152 + "crumbs": [ 153 + "Guide", 154 + "Reference", 155 + "Datasets" 156 + ] 157 + }, 158 + { 159 + "objectID": "reference/datasets.html#data-sources", 160 + "href": "reference/datasets.html#data-sources", 161 + "title": "Datasets", 162 + "section": "Data Sources", 163 + "text": "Data Sources\nDatasets can be created from different data sources using the DataSource protocol:\n\nURL Source (default)\nWhen you pass a string to Dataset, it automatically wraps it in a URLSource:\n\n# These are equivalent:\ndataset = atdata.Dataset[ImageSample](\"data-{000000..000009}.tar\")\ndataset = atdata.Dataset[ImageSample](atdata.URLSource(\"data-{000000..000009}.tar\"))\n\n\n\nS3 Source\nFor private S3 buckets or S3-compatible storage (Cloudflare R2, MinIO), use S3Source:\n\n# From explicit credentials\nsource = atdata.S3Source(\n bucket=\"my-bucket\",\n keys=[\"data-000000.tar\", \"data-000001.tar\"],\n endpoint=\"https://my-r2-account.r2.cloudflarestorage.com\",\n access_key=\"AKID...\",\n secret_key=\"SECRET...\",\n)\ndataset = atdata.Dataset[ImageSample](source)\n\n# From S3 URLs\nsource = atdata.S3Source.from_urls([\n \"s3://my-bucket/data-000000.tar\",\n \"s3://my-bucket/data-000001.tar\",\n])\ndataset = atdata.Dataset[ImageSample](source)\n\n\n\n\n\n\n\nNote\n\n\n\nS3Source uses boto3 for streaming, enabling authentication with private buckets. 
For public S3 URLs, a string URL with URLSource works directly.", 164 + "crumbs": [ 165 + "Guide", 166 + "Reference", 167 + "Datasets" 168 + ] 169 + }, 170 + { 171 + "objectID": "reference/datasets.html#iteration-modes", 172 + "href": "reference/datasets.html#iteration-modes", 173 + "title": "Datasets", 174 + "section": "Iteration Modes", 175 + "text": "Iteration Modes\n\nOrdered Iteration\nIterate through samples in their original order:\n\n# With batching (default batch_size=1)\nfor batch in dataset.ordered(batch_size=32):\n images = batch.image # numpy array (32, H, W, C)\n labels = batch.label # list of 32 strings\n\n# Without batching (raw samples)\nfor sample in dataset.ordered(batch_size=None):\n print(sample.label)\n\n\n\nShuffled Iteration\nIterate with randomized order at both shard and sample levels:\n\nfor batch in dataset.shuffled(batch_size=32):\n # Samples are shuffled\n process(batch)\n\n# Control shuffle buffer sizes\nfor batch in dataset.shuffled(\n buffer_shards=100, # Shards to buffer (default: 100)\n buffer_samples=10000, # Samples to buffer (default: 10,000)\n batch_size=32,\n):\n process(batch)\n\n\n\n\n\n\n\nTip\n\n\n\nLarger buffer sizes increase randomness but use more memory. 
For training, buffer_samples=10000 is usually a good balance.", 176 + "crumbs": [ 177 + "Guide", 178 + "Reference", 179 + "Datasets" 180 + ] 181 + }, 182 + { 183 + "objectID": "reference/datasets.html#samplebatch", 184 + "href": "reference/datasets.html#samplebatch", 185 + "title": "Datasets", 186 + "section": "SampleBatch", 187 + "text": "SampleBatch\nWhen iterating with a batch_size, each iteration yields a SampleBatch with automatic attribute aggregation.\n\n@atdata.packable\nclass Sample:\n features: NDArray # shape (256,)\n label: str\n score: float\n\nfor batch in dataset.ordered(batch_size=16):\n # NDArray fields are stacked with a batch dimension\n features = batch.features # numpy array (16, 256)\n\n # Other fields become lists\n labels = batch.label # list of 16 strings\n scores = batch.score # list of 16 floats\n\nResults are cached, so accessing the same attribute multiple times is efficient.", 188 + "crumbs": [ 189 + "Guide", 190 + "Reference", 191 + "Datasets" 192 + ] 193 + }, 194 + { 195 + "objectID": "reference/datasets.html#type-transformations-with-lenses", 196 + "href": "reference/datasets.html#type-transformations-with-lenses", 197 + "title": "Datasets", 198 + "section": "Type Transformations with Lenses", 199 + "text": "Type Transformations with Lenses\nView a dataset through a different sample type using registered lenses:\n\n@atdata.packable\nclass SimplifiedSample:\n label: str\n\n@atdata.lens\ndef simplify(src: ImageSample) -&gt; SimplifiedSample:\n return SimplifiedSample(label=src.label)\n\n# Transform dataset to different type\nsimple_ds = dataset.as_type(SimplifiedSample)\n\nfor batch in simple_ds.ordered(batch_size=16):\n print(batch.label) # Only label field available\n\nSee Lenses for details on defining transformations.", 200 + "crumbs": [ 201 + "Guide", 202 + "Reference", 203 + "Datasets" 204 + ] 205 + }, 206 + { 207 + "objectID": "reference/datasets.html#dataset-properties", 208 + "href": 
"reference/datasets.html#dataset-properties", 209 + "title": "Datasets", 210 + "section": "Dataset Properties", 211 + "text": "Dataset Properties\n\nShard List\nGet the list of individual tar files:\n\ndataset = atdata.Dataset[Sample](\"data-{000000..000009}.tar\")\nshards = dataset.shard_list\n# ['data-000000.tar', 'data-000001.tar', ..., 'data-000009.tar']\n\n\n\nMetadata\nDatasets can have associated metadata from a URL:\n\ndataset = atdata.Dataset[Sample](\n \"data-{000000..000009}.tar\",\n metadata_url=\"https://example.com/metadata.msgpack\"\n)\n\n# Fetched and cached on first access\nmetadata = dataset.metadata # dict or None", 212 + "crumbs": [ 213 + "Guide", 214 + "Reference", 215 + "Datasets" 216 + ] 217 + }, 218 + { 219 + "objectID": "reference/datasets.html#writing-datasets", 220 + "href": "reference/datasets.html#writing-datasets", 221 + "title": "Datasets", 222 + "section": "Writing Datasets", 223 + "text": "Writing Datasets\nUse WebDataset’s TarWriter or ShardWriter to create datasets:\n\nimport webdataset as wds\nimport numpy as np\n\nsamples = [\n ImageSample(image=np.random.rand(224, 224, 3).astype(np.float32), label=\"cat\")\n for _ in range(100)\n]\n\n# Single tar file\nwith wds.writer.TarWriter(\"data-000000.tar\") as sink:\n for i, sample in enumerate(samples):\n sink.write({**sample.as_wds, \"__key__\": f\"sample_{i:06d}\"})\n\n# Multiple shards with automatic splitting\nwith wds.writer.ShardWriter(\"data-%06d.tar\", maxcount=1000) as sink:\n for i, sample in enumerate(samples):\n sink.write({**sample.as_wds, \"__key__\": f\"sample_{i:06d}\"})", 224 + "crumbs": [ 225 + "Guide", 226 + "Reference", 227 + "Datasets" 228 + ] 229 + }, 230 + { 231 + "objectID": "reference/datasets.html#parquet-export", 232 + "href": "reference/datasets.html#parquet-export", 233 + "title": "Datasets", 234 + "section": "Parquet Export", 235 + "text": "Parquet Export\nExport dataset contents to parquet format:\n\n# Export entire 
dataset\ndataset.to_parquet(\"output.parquet\")\n\n# Export with custom field mapping\ndef extract_fields(sample):\n return {\"label\": sample.label, \"score\": sample.confidence}\n\ndataset.to_parquet(\"output.parquet\", sample_map=extract_fields)\n\n# Export in segments\ndataset.to_parquet(\"output.parquet\", maxcount=10000)\n# Creates output-000000.parquet, output-000001.parquet, etc.", 236 + "crumbs": [ 237 + "Guide", 238 + "Reference", 239 + "Datasets" 240 + ] 241 + }, 242 + { 243 + "objectID": "reference/datasets.html#url-formats", 244 + "href": "reference/datasets.html#url-formats", 245 + "title": "Datasets", 246 + "section": "URL Formats", 247 + "text": "URL Formats\nWhen using string URLs (via URLSource), WebDataset supports various formats:\n\n\n\n\n\n\n\nFormat\nExample\n\n\n\n\nLocal files\n./data/file.tar, /absolute/path/file-{000000..000009}.tar\n\n\nHTTP/HTTPS\nhttps://example.com/data-{000000..000009}.tar\n\n\nGoogle Cloud\ngs://bucket/path/file.tar\n\n\n\nFor S3 with authentication, use S3Source instead of s3:// URLs.", 248 + "crumbs": [ 249 + "Guide", 250 + "Reference", 251 + "Datasets" 252 + ] 253 + }, 254 + { 255 + "objectID": "reference/datasets.html#dataset-properties-1", 256 + "href": "reference/datasets.html#dataset-properties-1", 257 + "title": "Datasets", 258 + "section": "Dataset Properties", 259 + "text": "Dataset Properties\n\nSource\nAccess the underlying DataSource:\n\ndataset = atdata.Dataset[Sample](\"data.tar\")\nsource = dataset.source # URLSource instance\nprint(source.shard_list) # ['data.tar']\n\n\n\nSample Type\nGet the type parameter used to create the dataset:\n\ndataset = atdata.Dataset[ImageSample](\"data.tar\")\nprint(dataset.sample_type) # &lt;class 'ImageSample'&gt;\nprint(dataset.batch_type) # SampleBatch[ImageSample]", 260 + "crumbs": [ 261 + "Guide", 262 + "Reference", 263 + "Datasets" 264 + ] 265 + }, 266 + { 267 + "objectID": "reference/datasets.html#related", 268 + "href": "reference/datasets.html#related", 269 + 
"title": "Datasets", 270 + "section": "Related", 271 + "text": "Related\n\nPackable Samples - Defining typed samples\nLenses - Type transformations\nload_dataset - HuggingFace-style loading API\nProtocols - DataSource protocol details", 272 + "crumbs": [ 273 + "Guide", 274 + "Reference", 275 + "Datasets" 276 + ] 277 + }, 278 + { 135 279 "objectID": "reference/packable-samples.html", 136 280 "href": "reference/packable-samples.html", 137 281 "title": "Packable Samples", ··· 1588 1732 ] 1589 1733 }, 1590 1734 { 1735 + "objectID": "reference/atmosphere.html#lower-level-loaders", 1736 + "href": "reference/atmosphere.html#lower-level-loaders", 1737 + "title": "Atmosphere (ATProto Integration)", 1738 + "section": "Lower-Level Loaders", 1739 + "text": "Lower-Level Loaders\nFor direct access to records, use the loader classes:\n\nSchemaLoader\n\nfrom atdata.atmosphere import SchemaLoader\n\nloader = SchemaLoader(client)\n\n# Get a specific schema\nschema = loader.get(\"at://did:plc:abc/ac.foundation.dataset.sampleSchema/xyz\")\nprint(schema[\"name\"], schema[\"version\"])\n\n# List all schemas from a repository\nfor schema in loader.list_all(repo=\"did:plc:other-user\"):\n print(schema[\"name\"])\n\n\n\nDatasetLoader\n\nfrom atdata.atmosphere import DatasetLoader\n\nloader = DatasetLoader(client)\n\n# Get a specific dataset record\nrecord = loader.get(\"at://did:plc:abc/ac.foundation.dataset.record/xyz\")\n\n# Check storage type\nstorage_type = loader.get_storage_type(uri) # \"external\" or \"blobs\"\n\n# Get URLs based on storage type\nif storage_type == \"external\":\n urls = loader.get_urls(uri)\nelse:\n urls = loader.get_blob_urls(uri)\n\n# Get metadata\nmetadata = loader.get_metadata(uri)\n\n# Create a Dataset object directly\ndataset = loader.to_dataset(uri, MySampleType)\nfor batch in dataset.ordered(batch_size=32):\n process(batch)\n\n\n\nLensLoader\n\nfrom atdata.atmosphere import LensLoader\n\nloader = LensLoader(client)\n\n# Get a specific lens record\nlens = 
loader.get(\"at://did:plc:abc/ac.foundation.dataset.lens/xyz\")\nprint(lens[\"name\"])\nprint(lens[\"sourceSchema\"], \"-&gt;\", lens[\"targetSchema\"])\n\n# List all lenses from a repository\nfor lens in loader.list_all():\n print(lens[\"name\"])\n\n# Find lenses by schema\nlenses = loader.find_by_schemas(\n source_schema_uri=\"at://did:plc:abc/ac.foundation.dataset.sampleSchema/source\",\n target_schema_uri=\"at://did:plc:abc/ac.foundation.dataset.sampleSchema/target\",\n)", 1740 + "crumbs": [ 1741 + "Guide", 1742 + "Reference", 1743 + "Atmosphere (ATProto Integration)" 1744 + ] 1745 + }, 1746 + { 1591 1747 "objectID": "reference/atmosphere.html#at-uris", 1592 1748 "href": "reference/atmosphere.html#at-uris", 1593 1749 "title": "Atmosphere (ATProto Integration)", ··· 1636 1792 ] 1637 1793 }, 1638 1794 { 1639 - "objectID": "reference/datasets.html", 1640 - "href": "reference/datasets.html", 1641 - "title": "Datasets", 1795 + "objectID": "reference/deployment.html", 1796 + "href": "reference/deployment.html", 1797 + "title": "Deployment Guide", 1642 1798 "section": "", 1643 - "text": "The Dataset class provides typed iteration over WebDataset tar files with automatic batching and lens transformations.", 1799 + "text": "This guide covers deploying atdata in production environments, including Redis setup for LocalIndex, S3 storage configuration, and ATProto publishing considerations.", 1644 1800 "crumbs": [ 1645 1801 "Guide", 1646 1802 "Reference", 1647 - "Datasets" 1803 + "Deployment Guide" 1648 1804 ] 1649 1805 }, 1650 1806 { 1651 - "objectID": "reference/datasets.html#creating-a-dataset", 1652 - "href": "reference/datasets.html#creating-a-dataset", 1653 - "title": "Datasets", 1654 - "section": "Creating a Dataset", 1655 - "text": "Creating a Dataset\n\nimport atdata\nfrom numpy.typing import NDArray\n\n@atdata.packable\nclass ImageSample:\n image: NDArray\n label: str\n\n# Single shard (string URL - most common)\ndataset = 
atdata.Dataset[ImageSample](\"data-000000.tar\")\n\n# Multiple shards with brace notation\ndataset = atdata.Dataset[ImageSample](\"data-{000000..000009}.tar\")\n\nThe type parameter [ImageSample] specifies what sample type the dataset contains. This enables type-safe iteration and automatic deserialization.", 1807 + "objectID": "reference/deployment.html#local-storage-deployment", 1808 + "href": "reference/deployment.html#local-storage-deployment", 1809 + "title": "Deployment Guide", 1810 + "section": "Local Storage Deployment", 1811 + "text": "Local Storage Deployment\nThe local storage backend uses Redis for metadata indexing and S3-compatible storage for dataset files.\n\nRedis Setup\n\nRequirements\n\nRedis 6.0+ (for Redis-OM compatibility)\nSufficient memory for index metadata (typically &lt; 100MB for most deployments)\n\n\n\nDocker Deployment\n# Basic Redis\ndocker run -d \\\n --name atdata-redis \\\n -p 6379:6379 \\\n -v redis-data:/data \\\n redis:7-alpine \\\n redis-server --appendonly yes\n\n# With password\ndocker run -d \\\n --name atdata-redis \\\n -p 6379:6379 \\\n -v redis-data:/data \\\n redis:7-alpine \\\n redis-server --appendonly yes --requirepass yourpassword\n\n\nConfiguration\nfrom redis import Redis\nfrom atdata.local import LocalIndex\n\n# Basic connection\nredis = Redis(host=\"localhost\", port=6379)\nindex = LocalIndex(redis=redis)\n\n# With authentication\nredis = Redis(\n host=\"redis.example.com\",\n port=6379,\n password=\"yourpassword\",\n ssl=True, # For production\n)\nindex = LocalIndex(redis=redis)\n\n\nRedis Clustering\nFor high-availability deployments:\nfrom redis.cluster import RedisCluster\n\n# Redis Cluster connection\nredis = RedisCluster(\n host=\"redis-cluster.example.com\",\n port=6379,\n password=\"yourpassword\",\n)\nindex = LocalIndex(redis=redis)\n\n\n\n\n\n\nNote\n\n\n\nRedis-OM (used internally) supports Redis Cluster mode. 
Ensure all nodes have the same configuration.\n\n\n\n\n\nS3 Storage Setup\n\nAWS S3\nfrom atdata.local import S3DataStore\n\n# Using environment credentials (recommended for AWS)\n# Set AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY\nstore = S3DataStore(\n bucket=\"my-atdata-bucket\",\n prefix=\"datasets/\",\n)\n\n# Explicit credentials\nstore = S3DataStore(\n bucket=\"my-atdata-bucket\",\n prefix=\"datasets/\",\n credentials={\n \"AWS_ACCESS_KEY_ID\": \"...\",\n \"AWS_SECRET_ACCESS_KEY\": \"...\",\n \"AWS_DEFAULT_REGION\": \"us-west-2\",\n },\n)\n\n\nS3-Compatible Storage (MinIO, Cloudflare R2, etc.)\nstore = S3DataStore(\n bucket=\"my-bucket\",\n prefix=\"datasets/\",\n endpoint_url=\"https://s3.example.com\",\n credentials={\n \"AWS_ACCESS_KEY_ID\": \"...\",\n \"AWS_SECRET_ACCESS_KEY\": \"...\",\n },\n)\n\n\nMinIO Deployment\n# Docker deployment\ndocker run -d \\\n --name minio \\\n -p 9000:9000 \\\n -p 9001:9001 \\\n -v minio-data:/data \\\n -e MINIO_ROOT_USER=minioadmin \\\n -e MINIO_ROOT_PASSWORD=minioadmin \\\n minio/minio server /data --console-address \":9001\"\nstore = S3DataStore(\n bucket=\"atdata\",\n endpoint_url=\"http://localhost:9000\",\n credentials={\n \"AWS_ACCESS_KEY_ID\": \"minioadmin\",\n \"AWS_SECRET_ACCESS_KEY\": \"minioadmin\",\n },\n)\n\n\n\nProduction Checklist\n\nRedis persistence enabled (appendonly yes)\nRedis password authentication configured\nRedis TLS enabled for remote connections\nS3 bucket access policies configured (least privilege)\nS3 bucket versioning enabled (for data recovery)\nMonitoring for Redis memory usage\nBackup strategy for Redis data", 1656 1812 "crumbs": [ 1657 1813 "Guide", 1658 1814 "Reference", 1659 - "Datasets" 1815 + "Deployment Guide" 1660 1816 ] 1661 1817 }, 1662 1818 { 1663 - "objectID": "reference/datasets.html#data-sources", 1664 - "href": "reference/datasets.html#data-sources", 1665 - "title": "Datasets", 1666 - "section": "Data Sources", 1667 - "text": "Data Sources\nDatasets can be created from 
different data sources using the DataSource protocol:\n\nURL Source (default)\nWhen you pass a string to Dataset, it automatically wraps it in a URLSource:\n\n# These are equivalent:\ndataset = atdata.Dataset[ImageSample](\"data-{000000..000009}.tar\")\ndataset = atdata.Dataset[ImageSample](atdata.URLSource(\"data-{000000..000009}.tar\"))\n\n\n\nS3 Source\nFor private S3 buckets or S3-compatible storage (Cloudflare R2, MinIO), use S3Source:\n\n# From explicit credentials\nsource = atdata.S3Source(\n bucket=\"my-bucket\",\n keys=[\"data-000000.tar\", \"data-000001.tar\"],\n endpoint=\"https://my-r2-account.r2.cloudflarestorage.com\",\n access_key=\"AKID...\",\n secret_key=\"SECRET...\",\n)\ndataset = atdata.Dataset[ImageSample](source)\n\n# From S3 URLs\nsource = atdata.S3Source.from_urls([\n \"s3://my-bucket/data-000000.tar\",\n \"s3://my-bucket/data-000001.tar\",\n])\ndataset = atdata.Dataset[ImageSample](source)\n\n\n\n\n\n\n\nNote\n\n\n\nS3Source uses boto3 for streaming, enabling authentication with private buckets. For public S3 URLs, a string URL with URLSource works directly.", 1819 + "objectID": "reference/deployment.html#atproto-deployment", 1820 + "href": "reference/deployment.html#atproto-deployment", 1821 + "title": "Deployment Guide", 1822 + "section": "ATProto Deployment", 1823 + "text": "ATProto Deployment\n\nAccount Setup\n\nCreate a Bluesky account or use your existing account\nGenerate an app-specific password at bsky.app/settings/app-passwords\nNever use your main account password in code\n\n\n\n\n\n\n\nWarning\n\n\n\nSecurity: Always use app passwords, never your main password. 
App passwords can be revoked without affecting your account.\n\n\n\n\nAuthentication Patterns\n\nEnvironment Variables (Recommended)\nimport os\nfrom atdata.atmosphere import AtmosphereClient\n\nclient = AtmosphereClient()\nclient.login(\n    os.environ[\"ATPROTO_HANDLE\"],\n    os.environ[\"ATPROTO_APP_PASSWORD\"],\n)\n\n\nSession Persistence\nFor long-running services, persist and reuse sessions:\nimport os\nfrom pathlib import Path\n\nfrom atdata.atmosphere import AtmosphereClient\n\nSESSION_FILE = Path(\"~/.atdata/session\").expanduser()\n\n# Credentials come from the environment, as in the example above\nhandle = os.environ[\"ATPROTO_HANDLE\"]\napp_password = os.environ[\"ATPROTO_APP_PASSWORD\"]\n\nclient = AtmosphereClient()\n\nif SESSION_FILE.exists():\n    # Restore existing session\n    session_string = SESSION_FILE.read_text()\n    try:\n        client.login_with_session(session_string)\n    except Exception:\n        # Session expired, re-authenticate\n        client.login(handle, app_password)\n        SESSION_FILE.parent.mkdir(parents=True, exist_ok=True)\n        SESSION_FILE.write_text(client.export_session())\nelse:\n    # Initial login\n    client.login(handle, app_password)\n    SESSION_FILE.parent.mkdir(parents=True, exist_ok=True)\n    SESSION_FILE.write_text(client.export_session())\n\n\n\nCustom PDS Deployment\nFor self-hosted ATProto infrastructure:\nclient = AtmosphereClient(base_url=\"https://pds.example.com\")\nclient.login(\"handle.example.com\", \"app-password\")\nSee ATProto PDS documentation for self-hosting setup.\n\n\nRate Limiting Considerations\nATProto has rate limits. 
For bulk operations:\n\nSpace out record creation (1-2 per second for bulk uploads)\nUse batch operations where available\nImplement exponential backoff for retries\nConsider blob storage limits (~50MB per blob)\n\nimport time\n\nfor i, dataset in enumerate(datasets_to_publish):\n index.insert_dataset(dataset, name=f\"dataset-{i}\", ...)\n time.sleep(1) # Rate limiting", 1668 1824 "crumbs": [ 1669 1825 "Guide", 1670 1826 "Reference", 1671 - "Datasets" 1672 - ] 1673 - }, 1674 - { 1675 - "objectID": "reference/datasets.html#iteration-modes", 1676 - "href": "reference/datasets.html#iteration-modes", 1677 - "title": "Datasets", 1678 - "section": "Iteration Modes", 1679 - "text": "Iteration Modes\n\nOrdered Iteration\nIterate through samples in their original order:\n\n# With batching (default batch_size=1)\nfor batch in dataset.ordered(batch_size=32):\n images = batch.image # numpy array (32, H, W, C)\n labels = batch.label # list of 32 strings\n\n# Without batching (raw samples)\nfor sample in dataset.ordered(batch_size=None):\n print(sample.label)\n\n\n\nShuffled Iteration\nIterate with randomized order at both shard and sample levels:\n\nfor batch in dataset.shuffled(batch_size=32):\n # Samples are shuffled\n process(batch)\n\n# Control shuffle buffer sizes\nfor batch in dataset.shuffled(\n buffer_shards=100, # Shards to buffer (default: 100)\n buffer_samples=10000, # Samples to buffer (default: 10,000)\n batch_size=32,\n):\n process(batch)\n\n\n\n\n\n\n\nTip\n\n\n\nLarger buffer sizes increase randomness but use more memory. 
For training, buffer_samples=10000 is usually a good balance.", 1680 - "crumbs": [ 1681 - "Guide", 1682 - "Reference", 1683 - "Datasets" 1684 - ] 1685 - }, 1686 - { 1687 - "objectID": "reference/datasets.html#samplebatch", 1688 - "href": "reference/datasets.html#samplebatch", 1689 - "title": "Datasets", 1690 - "section": "SampleBatch", 1691 - "text": "SampleBatch\nWhen iterating with a batch_size, each iteration yields a SampleBatch with automatic attribute aggregation.\n\n@atdata.packable\nclass Sample:\n features: NDArray # shape (256,)\n label: str\n score: float\n\nfor batch in dataset.ordered(batch_size=16):\n # NDArray fields are stacked with a batch dimension\n features = batch.features # numpy array (16, 256)\n\n # Other fields become lists\n labels = batch.label # list of 16 strings\n scores = batch.score # list of 16 floats\n\nResults are cached, so accessing the same attribute multiple times is efficient.", 1692 - "crumbs": [ 1693 - "Guide", 1694 - "Reference", 1695 - "Datasets" 1827 + "Deployment Guide" 1696 1828 ] 1697 1829 }, 1698 1830 { 1699 - "objectID": "reference/datasets.html#type-transformations-with-lenses", 1700 - "href": "reference/datasets.html#type-transformations-with-lenses", 1701 - "title": "Datasets", 1702 - "section": "Type Transformations with Lenses", 1703 - "text": "Type Transformations with Lenses\nView a dataset through a different sample type using registered lenses:\n\n@atdata.packable\nclass SimplifiedSample:\n label: str\n\n@atdata.lens\ndef simplify(src: ImageSample) -&gt; SimplifiedSample:\n return SimplifiedSample(label=src.label)\n\n# Transform dataset to different type\nsimple_ds = dataset.as_type(SimplifiedSample)\n\nfor batch in simple_ds.ordered(batch_size=16):\n print(batch.label) # Only label field available\n\nSee Lenses for details on defining transformations.", 1831 + "objectID": "reference/deployment.html#docker-compose-example", 1832 + "href": "reference/deployment.html#docker-compose-example", 1833 + "title": 
"Deployment Guide", 1834 + "section": "Docker Compose Example", 1835 + "text": "Docker Compose Example\nComplete local deployment with Redis and MinIO:\n# docker-compose.yml\nversion: '3.8'\n\nservices:\n redis:\n image: redis:7-alpine\n command: redis-server --appendonly yes --requirepass ${REDIS_PASSWORD}\n ports:\n - \"6379:6379\"\n volumes:\n - redis-data:/data\n\n minio:\n image: minio/minio\n command: server /data --console-address \":9001\"\n ports:\n - \"9000:9000\"\n - \"9001:9001\"\n environment:\n MINIO_ROOT_USER: ${MINIO_USER}\n MINIO_ROOT_PASSWORD: ${MINIO_PASSWORD}\n volumes:\n - minio-data:/data\n\nvolumes:\n redis-data:\n minio-data:\n# .env\nREDIS_PASSWORD=your-redis-password\nMINIO_USER=minioadmin\nMINIO_PASSWORD=your-minio-password", 1704 1836 "crumbs": [ 1705 1837 "Guide", 1706 1838 "Reference", 1707 - "Datasets" 1839 + "Deployment Guide" 1708 1840 ] 1709 1841 }, 1710 1842 { 1711 - "objectID": "reference/datasets.html#dataset-properties", 1712 - "href": "reference/datasets.html#dataset-properties", 1713 - "title": "Datasets", 1714 - "section": "Dataset Properties", 1715 - "text": "Dataset Properties\n\nShard List\nGet the list of individual tar files:\n\ndataset = atdata.Dataset[Sample](\"data-{000000..000009}.tar\")\nshards = dataset.shard_list\n# ['data-000000.tar', 'data-000001.tar', ..., 'data-000009.tar']\n\n\n\nMetadata\nDatasets can have associated metadata from a URL:\n\ndataset = atdata.Dataset[Sample](\n \"data-{000000..000009}.tar\",\n metadata_url=\"https://example.com/metadata.msgpack\"\n)\n\n# Fetched and cached on first access\nmetadata = dataset.metadata # dict or None", 1843 + "objectID": "reference/deployment.html#monitoring", 1844 + "href": "reference/deployment.html#monitoring", 1845 + "title": "Deployment Guide", 1846 + "section": "Monitoring", 1847 + "text": "Monitoring\n\nRedis Metrics\nKey metrics to monitor:\n\nused_memory: Memory usage\nconnected_clients: Active connections\nkeyspace_hits/misses: Cache 
efficiency\naof_last_write_status: Persistence health\n\nredis-cli INFO | grep -E \"used_memory|connected_clients|keyspace\"\n\n\nS3 Metrics\n\nRequest counts and latency\nError rates (4xx, 5xx)\nStorage usage by prefix\nData transfer costs", 1716 1848 "crumbs": [ 1717 1849 "Guide", 1718 1850 "Reference", 1719 - "Datasets" 1851 + "Deployment Guide" 1720 1852 ] 1721 1853 }, 1722 1854 { 1723 - "objectID": "reference/datasets.html#writing-datasets", 1724 - "href": "reference/datasets.html#writing-datasets", 1725 - "title": "Datasets", 1726 - "section": "Writing Datasets", 1727 - "text": "Writing Datasets\nUse WebDataset’s TarWriter or ShardWriter to create datasets:\n\nimport webdataset as wds\nimport numpy as np\n\nsamples = [\n ImageSample(image=np.random.rand(224, 224, 3).astype(np.float32), label=\"cat\")\n for _ in range(100)\n]\n\n# Single tar file\nwith wds.writer.TarWriter(\"data-000000.tar\") as sink:\n for i, sample in enumerate(samples):\n sink.write({**sample.as_wds, \"__key__\": f\"sample_{i:06d}\"})\n\n# Multiple shards with automatic splitting\nwith wds.writer.ShardWriter(\"data-%06d.tar\", maxcount=1000) as sink:\n for i, sample in enumerate(samples):\n sink.write({**sample.as_wds, \"__key__\": f\"sample_{i:06d}\"})", 1855 + "objectID": "reference/deployment.html#security-best-practices", 1856 + "href": "reference/deployment.html#security-best-practices", 1857 + "title": "Deployment Guide", 1858 + "section": "Security Best Practices", 1859 + "text": "Security Best Practices\n\nNetwork Isolation: Run Redis and S3 in private networks\nTLS Everywhere: Encrypt connections to Redis and S3\nCredential Rotation: Rotate API keys and passwords regularly\nAccess Logging: Enable S3 access logging for audit trails\nLeast Privilege: Use minimal IAM permissions for S3 access\n\n\nS3 IAM Policy Example\n{\n \"Version\": \"2012-10-17\",\n \"Statement\": [\n {\n \"Effect\": \"Allow\",\n \"Action\": [\n \"s3:GetObject\",\n \"s3:PutObject\",\n \"s3:ListBucket\"\n ],\n 
\"Resource\": [\n \"arn:aws:s3:::my-atdata-bucket\",\n \"arn:aws:s3:::my-atdata-bucket/*\"\n ]\n }\n ]\n}", 1728 1860 "crumbs": [ 1729 1861 "Guide", 1730 1862 "Reference", 1731 - "Datasets" 1863 + "Deployment Guide" 1732 1864 ] 1733 1865 }, 1734 1866 { 1735 - "objectID": "reference/datasets.html#parquet-export", 1736 - "href": "reference/datasets.html#parquet-export", 1737 - "title": "Datasets", 1738 - "section": "Parquet Export", 1739 - "text": "Parquet Export\nExport dataset contents to parquet format:\n\n# Export entire dataset\ndataset.to_parquet(\"output.parquet\")\n\n# Export with custom field mapping\ndef extract_fields(sample):\n return {\"label\": sample.label, \"score\": sample.confidence}\n\ndataset.to_parquet(\"output.parquet\", sample_map=extract_fields)\n\n# Export in segments\ndataset.to_parquet(\"output.parquet\", maxcount=10000)\n# Creates output-000000.parquet, output-000001.parquet, etc.", 1867 + "objectID": "reference/troubleshooting.html", 1868 + "href": "reference/troubleshooting.html", 1869 + "title": "Troubleshooting & FAQ", 1870 + "section": "", 1871 + "text": "This page covers common issues, error messages, and frequently asked questions when working with atdata.", 1740 1872 "crumbs": [ 1741 1873 "Guide", 1742 1874 "Reference", 1743 - "Datasets" 1875 + "Troubleshooting & FAQ" 1744 1876 ] 1745 1877 }, 1746 1878 { 1747 - "objectID": "reference/datasets.html#url-formats", 1748 - "href": "reference/datasets.html#url-formats", 1749 - "title": "Datasets", 1750 - "section": "URL Formats", 1751 - "text": "URL Formats\nWhen using string URLs (via URLSource), WebDataset supports various formats:\n\n\n\n\n\n\n\nFormat\nExample\n\n\n\n\nLocal files\n./data/file.tar, /absolute/path/file-{000000..000009}.tar\n\n\nHTTP/HTTPS\nhttps://example.com/data-{000000..000009}.tar\n\n\nGoogle Cloud\ngs://bucket/path/file.tar\n\n\n\nFor S3 with authentication, use S3Source instead of s3:// URLs.", 1879 + "objectID": "reference/troubleshooting.html#common-errors", 
1880 + "href": "reference/troubleshooting.html#common-errors", 1881 + "title": "Troubleshooting & FAQ", 1882 + "section": "Common Errors", 1883 + "text": "Common Errors\n\nTypeError: ‘type’ object is not subscriptable\nError:\nTypeError: 'type' object is not subscriptable\nCause: Using Dataset or SampleBatch without subscripting the type parameter on Python &lt; 3.9, or using an unsubscripted generic.\nSolution: Always use the subscripted form:\n# Correct\nds = Dataset[MySample](\"data.tar\")\nbatch = SampleBatch[MySample](samples)\n\n# Incorrect\nds = Dataset(\"data.tar\") # Missing type parameter\n\n\nAttributeError: ‘NoneType’ object has no attribute…\nError:\nAttributeError: 'NoneType' object has no attribute '__args__'\nCause: Creating a Dataset or SampleBatch without using the subscripted syntax Class[Type](...).\nSolution: These classes use Python’s __orig_class__ mechanism to extract type parameters at runtime. You must use:\nds = Dataset[MySample](url) # Correct\nNot:\nds = Dataset(url) # Wrong - no type information\n\n\nRuntimeError: msgpack field not found in sample\nError:\nRuntimeError: Malformed sample: 'msgpack' field not found\nCause: The tar file contains samples that weren’t written with atdata’s serialization format.\nSolution: Ensure samples are written using sample.as_wds:\nwith wds.writer.TarWriter(\"data.tar\") as sink:\n for sample in samples:\n sink.write(sample.as_wds) # Correct\n\n\nValueError: Field type not supported\nError:\nTypeError: Unsupported type for schema field: &lt;class 'SomeType'&gt;\nCause: Using an unsupported Python type in a PackableSample field.\nSupported types:\n\n\n\nPython Type\nNotes\n\n\n\n\nstr\nUnicode strings\n\n\nint\nIntegers\n\n\nfloat\nFloating point\n\n\nbool\nBoolean\n\n\nbytes\nBinary data\n\n\nNDArray\nNumpy arrays (any dtype)\n\n\nlist[T]\nLists of primitives\n\n\nT \\| None\nOptional fields\n\n\n\nNot supported: Nested dataclasses, dicts, custom classes.\n\n\nKeyError when iterating 
dataset\nError:\nKeyError: 'msgpack'\nCause: The WebDataset tar file structure doesn’t match expected format.\nSolution: Verify your tar file was created correctly:\n# Check tar contents\ntar -tvf data.tar | head -20\nEach sample should have a .msgpack extension in the tar file.", 1752 1884 "crumbs": [ 1753 1885 "Guide", 1754 1886 "Reference", 1755 - "Datasets" 1887 + "Troubleshooting & FAQ" 1756 1888 ] 1757 1889 }, 1758 1890 { 1759 - "objectID": "reference/datasets.html#dataset-properties-1", 1760 - "href": "reference/datasets.html#dataset-properties-1", 1761 - "title": "Datasets", 1762 - "section": "Dataset Properties", 1763 - "text": "Dataset Properties\n\nSource\nAccess the underlying DataSource:\n\ndataset = atdata.Dataset[Sample](\"data.tar\")\nsource = dataset.source # URLSource instance\nprint(source.shard_list) # ['data.tar']\n\n\n\nSample Type\nGet the type parameter used to create the dataset:\n\ndataset = atdata.Dataset[ImageSample](\"data.tar\")\nprint(dataset.sample_type) # &lt;class 'ImageSample'&gt;\nprint(dataset.batch_type) # SampleBatch[ImageSample]", 1891 + "objectID": "reference/troubleshooting.html#faq", 1892 + "href": "reference/troubleshooting.html#faq", 1893 + "title": "Troubleshooting & FAQ", 1894 + "section": "FAQ", 1895 + "text": "FAQ\n\nHow do I check the sample type of a dataset?\nds = Dataset[MySample](\"data.tar\")\nprint(ds.sample_type) # &lt;class 'MySample'&gt;\n\n\nHow do I convert a dataset to a different type?\nUse the as_type() method with a registered lens:\n@atdata.lens\ndef my_lens(src: SourceType) -&gt; TargetType:\n return TargetType(field=src.other_field)\n\nds_view = ds.as_type(TargetType)\n\n\nHow do I handle optional NDArray fields?\nUse NDArray | None annotation:\n@atdata.packable\nclass MySample:\n required_array: NDArray\n optional_array: NDArray | None = None\n\n\nWhy is my dataset iteration slow?\nCommon causes:\n\nNetwork latency: Use local caching for remote datasets\nSmall batch sizes: Increase batch_size in 
ordered() or shuffled()\nShuffle buffer: For shuffled(), the initial parameter controls buffer size\n\n# Larger batches = better throughput\nfor batch in ds.shuffled(batch_size=64, initial=1000):\n ...\n\n\nHow do I export to parquet?\nds = Dataset[MySample](\"data.tar\")\nds.to_parquet(\"output.parquet\")\n\n# With sample limit (for large datasets)\nds.to_parquet(\"output.parquet\", maxcount=10000)\n\n\n\n\n\n\nWarning\n\n\n\nto_parquet() loads the dataset into memory. For very large datasets, use maxcount to limit samples or process in chunks.\n\n\n\n\nHow do I handle multiple shards?\nUse WebDataset brace notation:\n# Single shard\nds = Dataset[MySample](\"data-000000.tar\")\n\n# Multiple shards (range)\nds = Dataset[MySample](\"data-{000000..000009}.tar\")\n\n# Multiple shards (list)\nds = Dataset[MySample](\"data-{000000,000005,000009}.tar\")\n\n\nCan I use S3 or other cloud storage?\nYes, use S3Source for S3-compatible storage:\nfrom atdata import S3Source, Dataset\n\nsource = S3Source.from_urls(\n [\"s3://bucket/data-000000.tar\", \"s3://bucket/data-000001.tar\"],\n endpoint_url=\"https://s3.example.com\", # Optional for non-AWS S3\n)\n\nds = Dataset[MySample](source)\n\n\nHow do I publish to ATProto/Atmosphere?\nfrom atdata.atmosphere import AtmosphereClient, AtmosphereIndex\n\nclient = AtmosphereClient()\nclient.login(\"handle.bsky.social\", \"app-password\") # Use app password!\n\nindex = AtmosphereIndex(client)\n\n# Publish schema\nschema_uri = index.publish_schema(MySample, version=\"1.0.0\")\n\n# Publish dataset\nentry = index.insert_dataset(ds, name=\"my-dataset\", schema_ref=schema_uri)\n\n\nWhat’s the difference between LocalIndex and AtmosphereIndex?\n\n\n\nFeature\nLocalIndex\nAtmosphereIndex\n\n\n\n\nStorage\nRedis + S3\nATProto PDS\n\n\nDiscovery\nLocal only\nFederated network\n\n\nAuth\nNone required\nATProto account\n\n\nUse case\nDevelopment, private data\nPublic distribution\n\n\n\nBoth implement the AbstractIndex protocol, so code can work 
with either.", 1764 1896 "crumbs": [ 1765 1897 "Guide", 1766 1898 "Reference", 1767 - "Datasets" 1899 + "Troubleshooting & FAQ" 1768 1900 ] 1769 1901 }, 1770 1902 { 1771 - "objectID": "reference/datasets.html#related", 1772 - "href": "reference/datasets.html#related", 1773 - "title": "Datasets", 1774 - "section": "Related", 1775 - "text": "Related\n\nPackable Samples - Defining typed samples\nLenses - Type transformations\nload_dataset - HuggingFace-style loading API\nProtocols - DataSource protocol details", 1903 + "objectID": "reference/troubleshooting.html#getting-help", 1904 + "href": "reference/troubleshooting.html#getting-help", 1905 + "title": "Troubleshooting & FAQ", 1906 + "section": "Getting Help", 1907 + "text": "Getting Help\n\nGitHub Issues: github.com/your-org/atdata/issues\nDocumentation: Check the reference pages for detailed API documentation\nExamples: See the examples/ directory for working code samples", 1776 1908 "crumbs": [ 1777 1909 "Guide", 1778 1910 "Reference", 1779 - "Datasets" 1911 + "Troubleshooting & FAQ" 1780 1912 ] 1781 1913 } 1782 1914 ]
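The "Rate Limiting Considerations" entry in the search index above recommends exponential backoff for retries but only shows a fixed `time.sleep(1)`. A minimal sketch of the backoff part follows, using only the Python standard library. The helper name `retry_with_backoff` and all of its parameters are hypothetical and are not part of atdata; which exception atdata raises on a rate-limit response is not stated in the guide, so the sketch retries on any `Exception` by default.

```python
import random
import time


def retry_with_backoff(
    operation,
    max_attempts=5,
    base_delay=1.0,
    max_delay=30.0,
    retry_on=(Exception,),
    sleep=time.sleep,
):
    """Call operation() until it succeeds or max_attempts is exhausted.

    Hypothetical helper for illustration; not an atdata API.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts - 1:
                raise
            # Delay doubles on each failed attempt and is capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% random jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

In the guide's bulk-publish loop, the call could plausibly be wrapped as `retry_with_backoff(lambda: index.insert_dataset(...))`, with the arguments left as elided in the guide. Narrowing `retry_on` to the specific rate-limit exception, once that is known, would avoid retrying errors that cannot succeed on a second attempt.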
+20 -12
docs/sitemap.xml
··· 2 2 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"> 3 3 <url> 4 4 <loc>https://github.com/your-org/atdata/reference/protocols.html</loc> 5 - <lastmod>2026-01-22T19:18:54.598Z</lastmod> 5 + <lastmod>2026-01-22T19:31:03.723Z</lastmod> 6 + </url> 7 + <url> 8 + <loc>https://github.com/your-org/atdata/reference/datasets.html</loc> 9 + <lastmod>2026-01-22T19:31:03.722Z</lastmod> 6 10 </url> 7 11 <url> 8 12 <loc>https://github.com/your-org/atdata/reference/packable-samples.html</loc> ··· 14 18 </url> 15 19 <url> 16 20 <loc>https://github.com/your-org/atdata/reference/load-dataset.html</loc> 17 - <lastmod>2026-01-22T19:20:17.161Z</lastmod> 21 + <lastmod>2026-01-22T19:31:03.722Z</lastmod> 18 22 </url> 19 23 <url> 20 24 <loc>https://github.com/your-org/atdata/reference/promotion.html</loc> 21 - <lastmod>2026-01-20T00:37:12.561Z</lastmod> 25 + <lastmod>2026-01-22T19:31:03.723Z</lastmod> 22 26 </url> 23 27 <url> 24 28 <loc>https://github.com/your-org/atdata/tutorials/local-workflow.html</loc> 25 - <lastmod>2026-01-20T00:37:30.604Z</lastmod> 29 + <lastmod>2026-01-22T19:31:03.723Z</lastmod> 26 30 </url> 27 31 <url> 28 32 <loc>https://github.com/your-org/atdata/tutorials/promotion.html</loc> 29 - <lastmod>2026-01-20T20:29:57.798Z</lastmod> 33 + <lastmod>2026-01-22T19:31:03.724Z</lastmod> 30 34 </url> 31 35 <url> 32 36 <loc>https://github.com/your-org/atdata/index.html</loc> 33 - <lastmod>2026-01-20T00:36:22.475Z</lastmod> 37 + <lastmod>2026-01-22T19:31:03.722Z</lastmod> 34 38 </url> 35 39 <url> 36 40 <loc>https://github.com/your-org/atdata/api/index.html</loc> 37 - <lastmod>2026-01-20T20:30:08.284Z</lastmod> 41 + <lastmod>2026-01-22T19:31:03.721Z</lastmod> 38 42 </url> 39 43 <url> 40 44 <loc>https://github.com/your-org/atdata/tutorials/atmosphere.html</loc> ··· 46 50 </url> 47 51 <url> 48 52 <loc>https://github.com/your-org/atdata/reference/uri-spec.html</loc> 49 - <lastmod>2026-01-20T23:29:15.382Z</lastmod> 53 + <lastmod>2026-01-22T19:31:03.723Z</lastmod> 50 54 
</url> 51 55 <url> 52 56 <loc>https://github.com/your-org/atdata/reference/local-storage.html</loc> 53 - <lastmod>2026-01-22T19:20:33.675Z</lastmod> 57 + <lastmod>2026-01-22T19:31:03.723Z</lastmod> 54 58 </url> 55 59 <url> 56 60 <loc>https://github.com/your-org/atdata/reference/atmosphere.html</loc> 57 - <lastmod>2026-01-18T03:31:39.823Z</lastmod> 61 + <lastmod>2026-01-22T20:06:07.401Z</lastmod> 58 62 </url> 59 63 <url> 60 - <loc>https://github.com/your-org/atdata/reference/datasets.html</loc> 61 - <lastmod>2026-01-22T19:18:20.449Z</lastmod> 64 + <loc>https://github.com/your-org/atdata/reference/deployment.html</loc> 65 + <lastmod>2026-01-22T20:19:56.455Z</lastmod> 66 + </url> 67 + <url> 68 + <loc>https://github.com/your-org/atdata/reference/troubleshooting.html</loc> 69 + <lastmod>2026-01-22T20:18:56.494Z</lastmod> 62 70 </url> 63 71 </urlset>
+32 -12
docs/tutorials/atmosphere.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 500 508 <span class="menu-text">URI Specification</span></a> 501 509 </div> 502 510 </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 503 523 </ul> 504 524 </li> 505 525 <li class="sidebar-item sidebar-item-section"> ··· 600 620 </section> 601 621 <section id="setup" class="level2"> 602 622 <h2 class="anchored" data-anchor-id="setup">Setup</h2> 603 - <div id="a227b908" class="cell"> 623 + <div id="be857a4f" class="cell"> 604 624 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 605 625 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 606 626 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 617 637 
</section> 618 638 <section id="define-sample-types" class="level2"> 619 639 <h2 class="anchored" data-anchor-id="define-sample-types">Define Sample Types</h2> 620 - <div id="827244f1" class="cell"> 640 + <div id="a2d68ccd" class="cell"> 621 641 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 622 642 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ImageSample:</span> 623 643 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="co">"""A sample containing image data with metadata."""</span></span> ··· 636 656 <section id="type-introspection" class="level2"> 637 657 <h2 class="anchored" data-anchor-id="type-introspection">Type Introspection</h2> 638 658 <p>See what information is available from a PackableSample type:</p> 639 - <div id="48fe4343" class="cell"> 659 + <div id="a532916f" class="cell"> 640 660 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> dataclasses <span class="im">import</span> fields, is_dataclass</span> 641 661 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 642 662 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Sample type: </span><span class="sc">{</span>ImageSample<span class="sc">.</span><span class="va">__name__</span><span class="sc">}</span><span class="ss">"</span>)</span> ··· 664 684 <section id="at-uri-parsing" class="level2"> 665 685 <h2 class="anchored" data-anchor-id="at-uri-parsing">AT URI Parsing</h2> 666 686 <p>ATProto records are identified by AT URIs:</p> 667 - <div id="f2f4e7c0" 
class="cell"> 687 + <div id="3f958c3b" class="cell"> 668 688 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>uris <span class="op">=</span> [</span> 669 689 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz789"</span>,</span> 670 690 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"at://alice.bsky.social/ac.foundation.dataset.record/my-dataset"</span>,</span> ··· 681 701 <section id="authentication" class="level2"> 682 702 <h2 class="anchored" data-anchor-id="authentication">Authentication</h2> 683 703 <p>Connect to ATProto:</p> 684 - <div id="87811271" class="cell"> 704 + <div id="9165dec8" class="cell"> 685 705 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 686 706 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"your.handle.social"</span>, <span class="st">"your-app-password"</span>)</span> 687 707 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span> ··· 691 711 </section> 692 712 <section id="publish-a-schema" class="level2"> 693 713 <h2 class="anchored" data-anchor-id="publish-a-schema">Publish a Schema</h2> 694 - <div id="3cbc4aed" class="cell"> 714 + <div id="f985aa1b" class="cell"> 695 715 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>schema_publisher <span class="op">=</span> SchemaPublisher(client)</span> 696 716 <span id="cb6-2"><a href="#cb6-2" 
aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> schema_publisher.publish(</span> 697 717 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> ImageSample,</span> ··· 704 724 </section> 705 725 <section id="list-your-schemas" class="level2"> 706 726 <h2 class="anchored" data-anchor-id="list-your-schemas">List Your Schemas</h2> 707 - <div id="95a4b60b" class="cell"> 727 + <div id="2193d988" class="cell"> 708 728 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>schema_loader <span class="op">=</span> SchemaLoader(client)</span> 709 729 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>schemas <span class="op">=</span> schema_loader.list_all(limit<span class="op">=</span><span class="dv">10</span>)</span> 710 730 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Found </span><span class="sc">{</span><span class="bu">len</span>(schemas)<span class="sc">}</span><span class="ss"> schema(s)"</span>)</span> ··· 717 737 <h2 class="anchored" data-anchor-id="publish-a-dataset">Publish a Dataset</h2> 718 738 <section id="with-external-urls" class="level3"> 719 739 <h3 class="anchored" data-anchor-id="with-external-urls">With External URLs</h3> 720 - <div id="abacd51d" class="cell"> 740 + <div id="a0d2b9fb" class="cell"> 721 741 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>dataset_publisher <span class="op">=</span> DatasetPublisher(client)</span> 722 742 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>dataset_uri <span class="op">=</span> dataset_publisher.publish_with_urls(</span> 723 743 <span id="cb8-3"><a href="#cb8-3" 
aria-hidden="true" tabindex="-1"></a> urls<span class="op">=</span>[<span class="st">"s3://example-bucket/demo-data-{000000..000009}.tar"</span>],</span> ··· 733 753 <section id="with-blob-storage" class="level3"> 734 754 <h3 class="anchored" data-anchor-id="with-blob-storage">With Blob Storage</h3> 735 755 <p>For smaller datasets, store data directly in ATProto blobs:</p> 736 - <div id="c0ac92af" class="cell"> 756 + <div id="87822526" class="cell"> 737 757 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> io</span> 738 758 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 739 759 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> ··· 774 794 </section> 775 795 <section id="list-and-load-datasets" class="level2"> 776 796 <h2 class="anchored" data-anchor-id="list-and-load-datasets">List and Load Datasets</h2> 777 - <div id="a1b08c22" class="cell"> 797 + <div id="2de5f130" class="cell"> 778 798 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>dataset_loader <span class="op">=</span> DatasetLoader(client)</span> 779 799 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>datasets <span class="op">=</span> dataset_loader.list_all(limit<span class="op">=</span><span class="dv">10</span>)</span> 780 800 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Found </span><span class="sc">{</span><span class="bu">len</span>(datasets)<span class="sc">}</span><span class="ss"> dataset(s)"</span>)</span> ··· 789 809 </section> 790 810 <section id="load-a-dataset" class="level2"> 
791 811 <h2 class="anchored" data-anchor-id="load-a-dataset">Load a Dataset</h2> 792 - <div id="c4b1f1e4" class="cell"> 812 + <div id="6ebf2e30" class="cell"> 793 813 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Check storage type</span></span> 794 814 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>storage_type <span class="op">=</span> dataset_loader.get_storage_type(<span class="bu">str</span>(blob_dataset_uri))</span> 795 815 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Storage type: </span><span class="sc">{</span>storage_type<span class="sc">}</span><span class="ss">"</span>)</span> ··· 806 826 </section> 807 827 <section id="complete-publishing-workflow" class="level2"> 808 828 <h2 class="anchored" data-anchor-id="complete-publishing-workflow">Complete Publishing Workflow</h2> 809 - <div id="191ea1d0" class="cell"> 829 + <div id="a7ba0337" class="cell"> 810 830 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. Define and create samples</span></span> 811 831 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 812 832 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> FeatureSample:</span>
+28 -8
docs/tutorials/local-workflow.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 500 508 <span class="menu-text">URI Specification</span></a> 501 509 </div> 502 510 </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 503 523 </ul> 504 524 </li> 505 525 <li class="sidebar-item sidebar-item-section"> ··· 596 616 </section> 597 617 <section id="setup" class="level2"> 598 618 <h2 class="anchored" data-anchor-id="setup">Setup</h2> 599 - <div id="fa7797b8" class="cell"> 619 + <div id="a22ae0ac" class="cell"> 600 620 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 601 621 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 602 622 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 606 626 
</section> 607 627 <section id="define-sample-types" class="level2"> 608 628 <h2 class="anchored" data-anchor-id="define-sample-types">Define Sample Types</h2> 609 - <div id="3bf1f5cb" class="cell"> 629 + <div id="495a7723" class="cell"> 610 630 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 611 631 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> TrainingSample:</span> 612 632 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="co">"""A sample containing features and label for training."""</span></span> ··· 623 643 <section id="localdatasetentry" class="level2"> 624 644 <h2 class="anchored" data-anchor-id="localdatasetentry">LocalDatasetEntry</h2> 625 645 <p>Create entries with content-addressable CIDs:</p> 626 - <div id="c9a861b2" class="cell"> 646 + <div id="dc3f7345" class="cell"> 627 647 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Create an entry manually</span></span> 628 648 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> LocalDatasetEntry(</span> 629 649 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> _name<span class="op">=</span><span class="st">"my-dataset"</span>,</span> ··· 655 675 <section id="localindex" class="level2"> 656 676 <h2 class="anchored" data-anchor-id="localindex">LocalIndex</h2> 657 677 <p>The index tracks datasets in Redis:</p> 658 - <div id="ae2e4dcb" class="cell"> 678 + <div id="d63d9f9a" class="cell"> 659 679 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode 
python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> redis <span class="im">import</span> Redis</span> 660 680 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 661 681 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Connect to Redis</span></span> ··· 666 686 </div> 667 687 <section id="schema-management" class="level3"> 668 688 <h3 class="anchored" data-anchor-id="schema-management">Schema Management</h3> 669 - <div id="e4c5047e" class="cell"> 689 + <div id="03d8f6b1" class="cell"> 670 690 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish a schema</span></span> 671 691 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>schema_ref <span class="op">=</span> index.publish_schema(TrainingSample, version<span class="op">=</span><span class="st">"1.0.0"</span>)</span> 672 692 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Published schema: </span><span class="sc">{</span>schema_ref<span class="sc">}</span><span class="ss">"</span>)</span> ··· 688 708 <section id="s3datastore" class="level2"> 689 709 <h2 class="anchored" data-anchor-id="s3datastore">S3DataStore</h2> 690 710 <p>For direct S3 operations:</p> 691 - <div id="afd44dd0" class="cell"> 711 + <div id="d7a25b8c" class="cell"> 692 712 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>creds <span class="op">=</span> {</span> 693 713 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_ENDPOINT"</span>: <span 
class="st">"http://localhost:9000"</span>,</span> 694 714 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_ACCESS_KEY_ID"</span>: <span class="st">"minioadmin"</span>,</span> ··· 704 724 <section id="complete-index-workflow" class="level2"> 705 725 <h2 class="anchored" data-anchor-id="complete-index-workflow">Complete Index Workflow</h2> 706 726 <p>Use <code>LocalIndex</code> with <code>S3DataStore</code> to store datasets with S3 storage and Redis indexing:</p> 707 - <div id="5cd97320" class="cell"> 727 + <div id="dea0aa44" class="cell"> 708 728 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. Create sample data</span></span> 709 729 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> 710 730 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> TrainingSample(</span> ··· 753 773 <section id="using-load_dataset-with-index" class="level2"> 754 774 <h2 class="anchored" data-anchor-id="using-load_dataset-with-index">Using load_dataset with Index</h2> 755 775 <p>The <code>load_dataset()</code> function supports index lookup:</p> 756 - <div id="5f0fd961" class="cell"> 776 + <div id="0567b9c0" class="cell"> 757 777 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata <span class="im">import</span> load_dataset</span> 758 778 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 759 779 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Load from local index</span></span>
+31 -11
docs/tutorials/promotion.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 500 508 <span class="menu-text">URI Specification</span></a> 501 509 </div> 502 510 </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 503 523 </ul> 504 524 </li> 505 525 <li class="sidebar-item sidebar-item-section"> ··· 590 610 </section> 591 611 <section id="setup" class="level2"> 592 612 <h2 class="anchored" data-anchor-id="setup">Setup</h2> 593 - <div id="947895e5" class="cell"> 613 + <div id="f4fd206b" class="cell"> 594 614 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 595 615 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 596 616 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 603 623 <section 
id="prepare-a-local-dataset" class="level2"> 604 624 <h2 class="anchored" data-anchor-id="prepare-a-local-dataset">Prepare a Local Dataset</h2> 605 625 <p>First, set up a dataset in local storage:</p> 606 - <div id="c7031498" class="cell"> 626 + <div id="0702ff15" class="cell"> 607 627 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. Define sample type</span></span> 608 628 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 609 629 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ExperimentSample:</span> ··· 653 673 <section id="basic-promotion" class="level2"> 654 674 <h2 class="anchored" data-anchor-id="basic-promotion">Basic Promotion</h2> 655 675 <p>Promote the dataset to ATProto:</p> 656 - <div id="18fc428b" class="cell"> 676 + <div id="51d5e5c7" class="cell"> 657 677 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Connect to atmosphere</span></span> 658 678 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 659 679 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"myhandle.bsky.social"</span>, <span class="st">"app-password"</span>)</span> ··· 666 686 <section id="promotion-with-metadata" class="level2"> 667 687 <h2 class="anchored" data-anchor-id="promotion-with-metadata">Promotion with Metadata</h2> 668 688 <p>Add description, tags, and license:</p> 669 - <div id="66e32961" class="cell"> 689 + <div id="b774b2ef" class="cell"> 670 690 <div class="sourceCode cell-code" id="cb5"><pre 
class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>at_uri <span class="op">=</span> promote_to_atmosphere(</span> 671 691 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> local_entry,</span> 672 692 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> local_index,</span> ··· 682 702 <section id="schema-deduplication" class="level2"> 683 703 <h2 class="anchored" data-anchor-id="schema-deduplication">Schema Deduplication</h2> 684 704 <p>The promotion workflow automatically checks for existing schemas:</p> 685 - <div id="9c49a719" class="cell"> 705 + <div id="03a8c55a" class="cell"> 686 706 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.promote <span class="im">import</span> _find_existing_schema</span> 687 707 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a></span> 688 708 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Check if schema already exists</span></span> ··· 694 714 <span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="st">"No existing schema found, will publish new one"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 695 715 </div> 696 716 <p>When you promote multiple datasets with the same sample type:</p> 697 - <div id="92e1244d" class="cell"> 717 + <div id="fa0ee4c6" class="cell"> 698 718 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="co"># First promotion: publishes schema</span></span> 699 
719 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>uri1 <span class="op">=</span> promote_to_atmosphere(entry1, local_index, client)</span> 700 720 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span> ··· 709 729 <div class="tab-content"> 710 730 <div id="tabset-1-1" class="tab-pane active" role="tabpanel" aria-labelledby="tabset-1-1-tab"> 711 731 <p>By default, promotion keeps the original data URLs:</p> 712 - <div id="bed4f755" class="cell"> 732 + <div id="976d836f" class="cell"> 713 733 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Data stays in original S3 location</span></span> 714 734 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>at_uri <span class="op">=</span> promote_to_atmosphere(local_entry, local_index, client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 715 735 </div> ··· 722 742 </div> 723 743 <div id="tabset-1-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-1-2-tab"> 724 744 <p>To copy data to a different storage location:</p> 725 - <div id="8000be2d" class="cell"> 745 + <div id="d4a43aa3" class="cell"> 726 746 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> S3DataStore</span> 727 747 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 728 748 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Create new data store</span></span> ··· 752 772 <section id="verify-on-atmosphere" class="level2"> 753 773 <h2 class="anchored" 
data-anchor-id="verify-on-atmosphere">Verify on Atmosphere</h2> 754 774 <p>After promotion, verify the dataset is accessible:</p> 755 - <div id="4ac381c9" class="cell"> 775 + <div id="03ba8286" class="cell"> 756 776 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereIndex</span> 757 777 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span> 758 778 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>atm_index <span class="op">=</span> AtmosphereIndex(client)</span> ··· 773 793 </section> 774 794 <section id="error-handling" class="level2"> 775 795 <h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2> 776 - <div id="fbffbd4e" class="cell"> 796 + <div id="7233337c" class="cell"> 777 797 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="cf">try</span>:</span> 778 798 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> at_uri <span class="op">=</span> promote_to_atmosphere(local_entry, local_index, client)</span> 779 799 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="cf">except</span> <span class="pp">KeyError</span> <span class="im">as</span> e:</span> ··· 797 817 </section> 798 818 <section id="complete-workflow" class="level2"> 799 819 <h2 class="anchored" data-anchor-id="complete-workflow">Complete Workflow</h2> 800 - <div id="7ff1e514" class="cell"> 820 + <div id="7d88d8ec" class="cell"> 801 821 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" 
aria-hidden="true" tabindex="-1"></a><span class="co"># Complete local-to-atmosphere workflow</span></span> 802 822 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 803 823 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span>
+26 -6
docs/tutorials/quickstart.html
··· 359 359 <a class="dropdown-item" href="../reference/uri-spec.html"> 360 360 <span class="dropdown-text">URI Specification</span></a> 361 361 </li> 362 + <li> 363 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 364 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 365 + </li> 366 + <li> 367 + <a class="dropdown-item" href="../reference/deployment.html"> 368 + <span class="dropdown-text">Deployment Guide</span></a> 369 + </li> 362 370 </ul> 363 371 </li> 364 372 <li class="nav-item"> ··· 500 508 <span class="menu-text">URI Specification</span></a> 501 509 </div> 502 510 </li> 511 + <li class="sidebar-item"> 512 + <div class="sidebar-item-container"> 513 + <a href="../reference/troubleshooting.html" class="sidebar-item-text sidebar-link"> 514 + <span class="menu-text">Troubleshooting &amp; FAQ</span></a> 515 + </div> 516 + </li> 517 + <li class="sidebar-item"> 518 + <div class="sidebar-item-container"> 519 + <a href="../reference/deployment.html" class="sidebar-item-text sidebar-link"> 520 + <span class="menu-text">Deployment Guide</span></a> 521 + </div> 522 + </li> 503 523 </ul> 504 524 </li> 505 525 <li class="sidebar-item sidebar-item-section"> ··· 579 599 <section id="define-a-sample-type" class="level2"> 580 600 <h2 class="anchored" data-anchor-id="define-a-sample-type">Define a Sample Type</h2> 581 601 <p>Use the <code>@packable</code> decorator to create a typed sample:</p> 582 - <div id="c8931636" class="cell"> 602 + <div id="80ae9e01" class="cell"> 583 603 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 584 604 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 585 605 <span id="cb2-3"><a 
href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 600 620 </section> 601 621 <section id="create-sample-instances" class="level2"> 602 622 <h2 class="anchored" data-anchor-id="create-sample-instances">Create Sample Instances</h2> 603 - <div id="9ab76c0c" class="cell"> 623 + <div id="08540797" class="cell"> 604 624 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a single sample</span></span> 605 625 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>sample <span class="op">=</span> ImageSample(</span> 606 626 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> image<span class="op">=</span>np.random.rand(<span class="dv">224</span>, <span class="dv">224</span>, <span class="dv">3</span>).astype(np.float32),</span> ··· 621 641 <section id="write-a-dataset" class="level2"> 622 642 <h2 class="anchored" data-anchor-id="write-a-dataset">Write a Dataset</h2> 623 643 <p>Use WebDataset’s <code>TarWriter</code> to create dataset files:</p> 624 - <div id="7a3ef6e8" class="cell"> 644 + <div id="1356be97" class="cell"> 625 645 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 626 646 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 627 647 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Create 100 samples</span></span> ··· 645 665 <section id="load-and-iterate" class="level2"> 646 666 <h2 class="anchored" data-anchor-id="load-and-iterate">Load and Iterate</h2> 647 667 <p>Create a typed <code>Dataset</code> and iterate with 
batching:</p> 648 - <div id="ac97711e" class="cell"> 668 + <div id="5606f904" class="cell"> 649 669 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Load dataset with type</span></span> 650 670 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"my-dataset-000000.tar"</span>)</span> 651 671 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span> ··· 666 686 <section id="shuffled-iteration" class="level2"> 667 687 <h2 class="anchored" data-anchor-id="shuffled-iteration">Shuffled Iteration</h2> 668 688 <p>For training, use shuffled iteration:</p> 669 - <div id="817072c1" class="cell"> 689 + <div id="9a6bedd7" class="cell"> 670 690 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.shuffled(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 671 691 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> <span class="co"># Samples are shuffled at shard and sample level</span></span> 672 692 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> images <span class="op">=</span> batch.image</span> ··· 680 700 <section id="use-lenses-for-type-transformations" class="level2"> 681 701 <h2 class="anchored" data-anchor-id="use-lenses-for-type-transformations">Use Lenses for Type Transformations</h2> 682 702 <p>View datasets through different schemas:</p> 683 - <div id="235eb997" class="cell"> 703 + <div id="000711d4" class="cell"> 684 704 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code 
class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Define a simplified view type</span></span> 685 705 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 686 706 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> SimplifiedSample:</span>
+28 -29
tests/test_integration_error_handling.py
@@ -575,50 +575,49 @@
     def test_s3_access_denied_error(self):
         """S3 access denied should raise clear error."""
         from atdata import S3Source
+        from botocore.exceptions import ClientError

-        # Mock S3 client that raises access denied
-        with patch("boto3.client") as mock_boto:
-            from botocore.exceptions import ClientError
+        # Create source with mock credentials
+        source = S3Source(
+            bucket="test-bucket",
+            keys=["data.tar"],
+            access_key="test",
+            secret_key="test",
+        )

+        # Mock the client after source creation
+        with patch.object(source, "_get_client") as mock_get_client:
             mock_client = Mock()
-            mock_client.list_objects_v2.side_effect = ClientError(
+            mock_client.get_object.side_effect = ClientError(
                 {"Error": {"Code": "AccessDenied", "Message": "Access Denied"}},
-                "ListObjects",
-            )
-            mock_boto.return_value = mock_client
-
-            source = S3Source(
-                bucket="test-bucket",
-                keys=["data.tar"],
-                credentials={
-                    "AWS_ACCESS_KEY_ID": "test",
-                    "AWS_SECRET_ACCESS_KEY": "test",
-                },
+                "GetObject",
             )
+            mock_get_client.return_value = mock_client

             # Opening shard should propagate the error
+            # Use full S3 URI as returned by shard_list
             with pytest.raises(ClientError):
-                source.open_shard("data.tar")
+                source.open_shard("s3://test-bucket/data.tar")

     def test_s3_connection_timeout_simulation(self):
         """S3 connection timeout should raise appropriate error."""
         from atdata import S3Source
+        from botocore.exceptions import ConnectTimeoutError

-        with patch("boto3.client") as mock_boto:
-            from botocore.exceptions import ConnectTimeoutError
+        # Create source with mock credentials
+        source = S3Source(
+            bucket="test-bucket",
+            keys=["data.tar"],
+            access_key="test",
+            secret_key="test",
+        )

+        # Mock the client after source creation
+        with patch.object(source, "_get_client") as mock_get_client:
             mock_client = Mock()
             mock_client.get_object.side_effect = ConnectTimeoutError(endpoint_url="s3://test")
-            mock_boto.return_value = mock_client
-
-            source = S3Source(
-                bucket="test-bucket",
-                keys=["data.tar"],
-                credentials={
-                    "AWS_ACCESS_KEY_ID": "test",
-                    "AWS_SECRET_ACCESS_KEY": "test",
-                },
-            )
+            mock_get_client.return_value = mock_client

+            # Use full S3 URI as returned by shard_list
             with pytest.raises(ConnectTimeoutError):
-                source.open_shard("data.tar")
+                source.open_shard("s3://test-bucket/data.tar")
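The test fix above patches the client factory on the already-built source instance instead of patching `boto3.client` globally. The pattern can be sketched in isolation as follows. This is a minimal illustration, not atdata's implementation: `FakeSource`, `AccessDenied`, and `check_error_propagates` are hypothetical stand-ins (for `S3Source`, botocore's `ClientError`, and the test body), chosen so the sketch runs without boto3 installed. Only the `_get_client`, `get_object`, and `open_shard` names and the `s3://bucket/key` URI shape are taken from the diff.

```python
from unittest.mock import Mock, patch


class AccessDenied(Exception):
    """Hypothetical stand-in for botocore's ClientError."""


class FakeSource:
    """Hypothetical stand-in for S3Source; mirrors only the shape the test relies on."""

    def __init__(self, bucket):
        self.bucket = bucket

    def _get_client(self):
        # A real source would build an S3 client here; the sketch never reaches this.
        raise RuntimeError("no real client in this sketch")

    def open_shard(self, uri):
        # Accept a full S3 URI and strip the bucket prefix to obtain the object key.
        prefix = f"s3://{self.bucket}/"
        key = uri[len(prefix):] if uri.startswith(prefix) else uri
        return self._get_client().get_object(Bucket=self.bucket, Key=key)["Body"]


def check_error_propagates():
    """Return True if the mocked client's error surfaces through open_shard."""
    source = FakeSource("test-bucket")

    # Patch the factory on this one instance, after it has been constructed.
    with patch.object(source, "_get_client") as mock_get_client:
        mock_client = Mock()
        mock_client.get_object.side_effect = AccessDenied("Access Denied")
        mock_get_client.return_value = mock_client

        try:
            source.open_shard("s3://test-bucket/data.tar")
        except AccessDenied:
            # The mock also records how the client was called.
            mock_client.get_object.assert_called_once_with(
                Bucket="test-bucket", Key="data.tar"
            )
            return True
    return False
```

Patching the instance keeps the mock independent of how and when the source constructs its client, which is presumably why the earlier `patch("boto3.client")` version stopped matching the code path under test.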