A loose federation of distributed, typed datasets

docs: document PDSBlobStore and BlobSource with updated tutorials and examples

- Add API reference pages for PDSBlobStore and BlobSource
- Update atmosphere tutorial with blob storage workflow
- Update atmosphere_demo.py example with blob storage case
- Regenerate docs site with new classes in API index

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
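
The BlobSource reference page added by this commit documents two accepted blob reference shapes (the `{"$type": "blob", "ref": {"$link": ...}}` form returned by upload_blob and a simplified `{"did": ..., "cid": ...}` form) and shard identifiers written as `at://did/blob/cid`. The following is a minimal sketch of how those pieces could fit together, not atdata's actual implementation. The helper names (`normalize_ref`, `shard_id`, `parse_shard_id`, `blob_url`) and the `default_did` parameter are hypothetical, and the URL builder assumes blobs are fetched through the standard ATProto `com.atproto.sync.getBlob` XRPC endpoint, which the diff does not confirm.

```python
from typing import Optional
from urllib.parse import urlencode


def normalize_ref(ref: dict, default_did: Optional[str] = None) -> dict:
    """Reduce either documented reference shape to {"did": ..., "cid": ...}."""
    # Simplified form: already carries both keys.
    if "did" in ref and "cid" in ref:
        return {"did": ref["did"], "cid": ref["cid"]}
    # Upload-style form: the CID lives under ref["ref"]["$link"].
    inner = ref.get("ref")
    if ref.get("$type") == "blob" and isinstance(inner, dict) and "$link" in inner:
        if default_did is None:
            # Assumption: the owning DID must be supplied by the caller here.
            raise ValueError("upload-style blob reference needs a DID")
        return {"did": default_did, "cid": inner["$link"]}
    raise ValueError(f"unrecognized blob reference: {ref!r}")


def shard_id(ref: dict) -> str:
    """Build the AT URI-style identifier documented for open_shard."""
    return f"at://{ref['did']}/blob/{ref['cid']}"


def parse_shard_id(sid: str) -> dict:
    """Invert shard_id(); raise ValueError on any other layout."""
    prefix = "at://"
    parts = sid[len(prefix):].split("/") if sid.startswith(prefix) else []
    if len(parts) != 3 or parts[1] != "blob" or not parts[0] or not parts[2]:
        raise ValueError(f"invalid shard id: {sid!r}")
    return {"did": parts[0], "cid": parts[2]}


def blob_url(endpoint: str, ref: dict) -> str:
    """Assumed download URL: the ATProto com.atproto.sync.getBlob XRPC call."""
    query = urlencode({"did": ref["did"], "cid": ref["cid"]})
    return f"{endpoint.rstrip('/')}/xrpc/com.atproto.sync.getBlob?{query}"


if __name__ == "__main__":
    # Placeholder DID, CID, and endpoint; none of these are real records.
    ref = normalize_ref(
        {"$type": "blob", "ref": {"$link": "bafyexample"}},
        default_did="did:plc:abc123",
    )
    print(shard_id(ref))
    print(blob_url("https://pds.example.com", ref))
```

Keeping the normalized form down to a DID and a CID means the shard identifier and the download URL can both be derived from the same two fields, which matches the attribute table in the generated page.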

+3765 -662
.chainlink/issues.db

This is a binary file and will not be displayed.

+7
CHANGELOG.md
··· 25 25 - **Comprehensive integration test suite**: 593 tests covering E2E flows, error handling, edge cases 26 26 27 27 ### Changed 28 + - Update atmosphere example with blob storage case (#216) 29 + - Implement PDSBlobStore for atmosphere data storage (#244) 30 + - Update docs and examples to include PDSBlobStore (#384) 31 + - Add API docs for PDSBlobStore and BlobSource (#388) 32 + - Update atmosphere_demo.py example (#387) 33 + - Update atmosphere reference docs (#386) 34 + - Update atmosphere tutorial with PDSBlobStore (#385) 28 35 - Implement PDSBlobStore for ATProto blob storage (#380) 29 36 - Add tests for PDSBlobStore and BlobSource (#383) 30 37 - Add BlobSource for reading PDS blobs as DataSource (#382)
+8 -3
docs/api/AbstractIndex.html
··· 459 459 </thead> 460 460 <tbody> 461 461 <tr class="odd"> 462 + <td><a href="#atdata.AbstractIndex.data_store">data_store</a></td> 463 + <td>Optional data store for reading/writing shards.</td> 464 + </tr> 465 + <tr class="even"> 462 466 <td><a href="#atdata.AbstractIndex.datasets">datasets</a></td> 463 467 <td>Lazily iterate over all dataset entries in this index.</td> 464 468 </tr> 465 - <tr class="even"> 469 + <tr class="odd"> 466 470 <td><a href="#atdata.AbstractIndex.schemas">schemas</a></td> 467 471 <td>Lazily iterate over all schema records in this index.</td> 468 472 </tr> ··· 847 851 <h3 class="anchored" data-anchor-id="atdata.AbstractIndex.publish_schema">publish_schema</h3> 848 852 <div class="sourceCode" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>AbstractIndex.publish_schema(sample_type, <span class="op">*</span>, version<span class="op">=</span><span class="st">'1.0.0'</span>, <span class="op">**</span>kwargs)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 849 853 <p>Publish a schema for a sample type.</p> 854 + <p>The sample_type is accepted as <code>type</code> rather than <code>Type[Packable]</code> to support <code>@packable</code>-decorated classes, which satisfy the Packable protocol at runtime but cannot be statically verified by type checkers.</p> 850 855 <section id="parameters-4" class="level4 doc-section doc-section-parameters"> 851 856 <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-4">Parameters</h4> 852 857 <table class="caption-top table"> ··· 861 866 <tbody> 862 867 <tr class="odd"> 863 868 <td>sample_type</td> 864 - <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td> 865 - <td>A Packable type (PackableSample subclass or <span class="citation" 
data-cites="packable-decorated">@packable-decorated</span>).</td> 869 + <td><a href="`type`">type</a></td> 870 + <td>A Packable type (PackableSample subclass or <span class="citation" data-cites="packable-decorated">@packable-decorated</span>). Validated at runtime via the <span class="citation" data-cites="runtime_checkable">@runtime_checkable</span> Packable protocol.</td> 866 871 <td><em>required</em></td> 867 872 </tr> 868 873 <tr class="even">
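
The publish_schema note in the hunk above says `sample_type` is typed as plain `type` because `@packable`-decorated classes satisfy the Packable protocol at runtime but cannot be verified statically. The sketch below illustrates that general pattern with Python's `typing.runtime_checkable`. Everything in it is a hypothetical stand-in: `PackableLike`, its `pack`/`unpack` methods, the `packable_like` decorator, and this `publish_schema` function are invented for illustration and are not atdata's real protocol, decorator, or method.

```python
import json
from dataclasses import asdict, dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class PackableLike(Protocol):
    """Hypothetical stand-in for a Packable-style protocol."""

    def pack(self) -> bytes: ...

    @classmethod
    def unpack(cls, data: bytes) -> "PackableLike": ...


def packable_like(cls: type) -> type:
    """Hypothetical decorator: make cls a dataclass and attach pack/unpack.

    The methods are added dynamically, so a static type checker looking at
    the undecorated class body cannot see that the protocol is satisfied.
    """
    cls = dataclass(cls)

    def pack(self) -> bytes:
        return json.dumps(asdict(self)).encode("utf-8")

    def unpack(klass, data: bytes):
        return klass(**json.loads(data.decode("utf-8")))

    cls.pack = pack
    cls.unpack = classmethod(unpack)
    return cls


@packable_like
class MySample:
    label: str
    value: int


def publish_schema(sample_type: type, *, version: str = "1.0.0") -> str:
    """Accept a plain `type` and validate it against the protocol at runtime."""
    if not (isinstance(sample_type, type) and issubclass(sample_type, PackableLike)):
        raise TypeError(f"{sample_type!r} does not satisfy PackableLike")
    return f"{sample_type.__name__}@{version}"
```

The runtime check only confirms that the named methods exist on the class; it does not check their signatures, which is a known limit of `runtime_checkable` protocols.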
+12 -3
docs/api/AtmosphereIndex.html
··· 423 423 424 424 <section id="atdata.atmosphere.AtmosphereIndex" class="level1"> 425 425 <h1>AtmosphereIndex</h1> 426 - <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex(client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 426 + <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.AtmosphereIndex(client, <span class="op">*</span>, data_store<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 427 427 <p>ATProto index implementing AbstractIndex protocol.</p> 428 428 <p>Wraps SchemaPublisher/Loader and DatasetPublisher/Loader to provide a unified interface compatible with LocalIndex.</p> 429 + <p>Optionally accepts a <code>PDSBlobStore</code> for writing dataset shards as ATProto blobs, enabling fully decentralized dataset storage.</p> 429 430 <section id="example" class="level2 doc-section doc-section-example"> 430 431 <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 431 432 <p>::</p> 432 433 <pre><code>&gt;&gt;&gt; client = AtmosphereClient() 433 434 &gt;&gt;&gt; client.login("handle.bsky.social", "app-password") 434 435 &gt;&gt;&gt; 436 + &gt;&gt;&gt; # Without blob storage (external URLs only) 435 437 &gt;&gt;&gt; index = AtmosphereIndex(client) 436 - &gt;&gt;&gt; schema_ref = index.publish_schema(MySample, version="1.0.0") 438 + &gt;&gt;&gt; 439 + &gt;&gt;&gt; # With PDS blob storage 440 + &gt;&gt;&gt; store = PDSBlobStore(client) 441 + &gt;&gt;&gt; index = AtmosphereIndex(client, data_store=store) 437 442 &gt;&gt;&gt; entry = 
index.insert_dataset(dataset, name="my-data")</code></pre> 438 443 </section> 439 444 <section id="attributes" class="level2"> ··· 447 452 </thead> 448 453 <tbody> 449 454 <tr class="odd"> 455 + <td><a href="#atdata.atmosphere.AtmosphereIndex.data_store">data_store</a></td> 456 + <td>The PDS blob store for writing shards, or None if not configured.</td> 457 + </tr> 458 + <tr class="even"> 450 459 <td><a href="#atdata.atmosphere.AtmosphereIndex.datasets">datasets</a></td> 451 460 <td>Lazily iterate over all dataset entries (AbstractIndex protocol).</td> 452 461 </tr> 453 - <tr class="even"> 462 + <tr class="odd"> 454 463 <td><a href="#atdata.atmosphere.AtmosphereIndex.schemas">schemas</a></td> 455 464 <td>Lazily iterate over all schema records (AbstractIndex protocol).</td> 456 465 </tr>
+1068
docs/api/BlobSource.html
··· 1 + <!DOCTYPE html> 2 + <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head> 3 + 4 + <meta charset="utf-8"> 5 + <meta name="generator" content="quarto-1.7.34"> 6 + 7 + <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes"> 8 + 9 + 10 + <title>blobsource – atdata</title> 11 + <style> 12 + code{white-space: pre-wrap;} 13 + span.smallcaps{font-variant: small-caps;} 14 + div.columns{display: flex; gap: min(4vw, 1.5em);} 15 + div.column{flex: auto; overflow-x: auto;} 16 + div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;} 17 + ul.task-list{list-style: none;} 18 + ul.task-list li input[type="checkbox"] { 19 + width: 0.8em; 20 + margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ 21 + vertical-align: middle; 22 + } 23 + /* CSS for syntax highlighting */ 24 + html { -webkit-text-size-adjust: 100%; } 25 + pre > code.sourceCode { white-space: pre; position: relative; } 26 + pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } 27 + pre > code.sourceCode > span:empty { height: 1.2em; } 28 + .sourceCode { overflow: visible; } 29 + code.sourceCode > span { color: inherit; text-decoration: inherit; } 30 + div.sourceCode { margin: 1em 0; } 31 + pre.sourceCode { margin: 0; } 32 + @media screen { 33 + div.sourceCode { overflow: auto; } 34 + } 35 + @media print { 36 + pre > code.sourceCode { white-space: pre-wrap; } 37 + pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } 38 + } 39 + pre.numberSource code 40 + { counter-reset: source-line 0; } 41 + pre.numberSource code > span 42 + { position: relative; left: -4em; counter-increment: source-line; } 43 + pre.numberSource code > span > a:first-child::before 44 + { content: counter(source-line); 45 + position: relative; left: -1em; text-align: right; vertical-align: baseline; 46 + border: none; display: inline-block; 47 + -webkit-touch-callout: none; -webkit-user-select: 
none; 48 + -khtml-user-select: none; -moz-user-select: none; 49 + -ms-user-select: none; user-select: none; 50 + padding: 0 4px; width: 4em; 51 + } 52 + pre.numberSource { margin-left: 3em; padding-left: 4px; } 53 + div.sourceCode 54 + { } 55 + @media screen { 56 + pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } 57 + } 58 + </style> 59 + 60 + 61 + <script src="../site_libs/quarto-nav/quarto-nav.js"></script> 62 + <script src="../site_libs/quarto-nav/headroom.min.js"></script> 63 + <script src="../site_libs/clipboard/clipboard.min.js"></script> 64 + <script src="../site_libs/quarto-search/autocomplete.umd.js"></script> 65 + <script src="../site_libs/quarto-search/fuse.min.js"></script> 66 + <script src="../site_libs/quarto-search/quarto-search.js"></script> 67 + <meta name="quarto:offset" content="../"> 68 + <script src="../site_libs/quarto-html/quarto.js" type="module"></script> 69 + <script src="../site_libs/quarto-html/tabsets/tabsets.js" type="module"></script> 70 + <script src="../site_libs/quarto-html/popper.min.js"></script> 71 + <script src="../site_libs/quarto-html/tippy.umd.min.js"></script> 72 + <script src="../site_libs/quarto-html/anchor.min.js"></script> 73 + <link href="../site_libs/quarto-html/tippy.css" rel="stylesheet"> 74 + <link href="../site_libs/quarto-html/quarto-syntax-highlighting-9582434199d49cc9e91654cdeeb4866b.css" rel="stylesheet" class="quarto-color-scheme" id="quarto-text-highlighting-styles"> 75 + <link href="../site_libs/quarto-html/quarto-syntax-highlighting-dark-8dcd8563ea6803ab7cbb3d71ca5772e1.css" rel="stylesheet" class="quarto-color-scheme quarto-color-alternate" id="quarto-text-highlighting-styles"> 76 + <link href="../site_libs/quarto-html/quarto-syntax-highlighting-9582434199d49cc9e91654cdeeb4866b.css" rel="stylesheet" class="quarto-color-scheme-extra" id="quarto-text-highlighting-styles"> 77 + <script src="../site_libs/bootstrap/bootstrap.min.js"></script> 78 + <link 
href="../site_libs/bootstrap/bootstrap-icons.css" rel="stylesheet"> 79 + <link href="../site_libs/bootstrap/bootstrap-62bce24ca844314e7bb1a34dbdfe05cc.min.css" rel="stylesheet" append-hash="true" class="quarto-color-scheme" id="quarto-bootstrap" data-mode="light"> 80 + <link href="../site_libs/bootstrap/bootstrap-dark-7964ffd8887b0991fe8d71c6c8bc75d6.min.css" rel="stylesheet" append-hash="true" class="quarto-color-scheme quarto-color-alternate" id="quarto-bootstrap" data-mode="dark"> 81 + <link href="../site_libs/bootstrap/bootstrap-62bce24ca844314e7bb1a34dbdfe05cc.min.css" rel="stylesheet" append-hash="true" class="quarto-color-scheme-extra" id="quarto-bootstrap" data-mode="light"> 82 + <script id="quarto-search-options" type="application/json">{ 83 + "location": "navbar", 84 + "copy-button": false, 85 + "collapse-after": 3, 86 + "panel-placement": "end", 87 + "type": "overlay", 88 + "limit": 50, 89 + "keyboard-shortcut": [ 90 + "f", 91 + "/", 92 + "s" 93 + ], 94 + "show-item-context": false, 95 + "language": { 96 + "search-no-results-text": "No results", 97 + "search-matching-documents-text": "matching documents", 98 + "search-copy-link-title": "Copy link to search", 99 + "search-hide-matches-text": "Hide additional matches", 100 + "search-more-match-text": "more match in this document", 101 + "search-more-matches-text": "more matches in this document", 102 + "search-clear-button-title": "Clear", 103 + "search-text-placeholder": "", 104 + "search-detached-cancel-button-title": "Cancel", 105 + "search-submit-button-title": "Submit", 106 + "search-label": "Search" 107 + } 108 + }</script> 109 + 110 + 111 + <link rel="stylesheet" href="../assets/styles.css"> 112 + </head> 113 + 114 + <body class="nav-fixed quarto-light"><script id="quarto-html-before-body" type="application/javascript"> 115 + const toggleBodyColorMode = (bsSheetEl) => { 116 + const mode = bsSheetEl.getAttribute("data-mode"); 117 + const bodyEl = window.document.querySelector("body"); 118 + if (mode 
=== "dark") { 119 + bodyEl.classList.add("quarto-dark"); 120 + bodyEl.classList.remove("quarto-light"); 121 + } else { 122 + bodyEl.classList.add("quarto-light"); 123 + bodyEl.classList.remove("quarto-dark"); 124 + } 125 + } 126 + const toggleBodyColorPrimary = () => { 127 + const bsSheetEl = window.document.querySelector("link#quarto-bootstrap:not([rel=disabled-stylesheet])"); 128 + if (bsSheetEl) { 129 + toggleBodyColorMode(bsSheetEl); 130 + } 131 + } 132 + const setColorSchemeToggle = (alternate) => { 133 + const toggles = window.document.querySelectorAll('.quarto-color-scheme-toggle'); 134 + for (let i=0; i < toggles.length; i++) { 135 + const toggle = toggles[i]; 136 + if (toggle) { 137 + if (alternate) { 138 + toggle.classList.add("alternate"); 139 + } else { 140 + toggle.classList.remove("alternate"); 141 + } 142 + } 143 + } 144 + }; 145 + const toggleColorMode = (alternate) => { 146 + // Switch the stylesheets 147 + const primaryStylesheets = window.document.querySelectorAll('link.quarto-color-scheme:not(.quarto-color-alternate)'); 148 + const alternateStylesheets = window.document.querySelectorAll('link.quarto-color-scheme.quarto-color-alternate'); 149 + manageTransitions('#quarto-margin-sidebar .nav-link', false); 150 + if (alternate) { 151 + // note: dark is layered on light, we don't disable primary! 
152 + enableStylesheet(alternateStylesheets); 153 + for (const sheetNode of alternateStylesheets) { 154 + if (sheetNode.id === "quarto-bootstrap") { 155 + toggleBodyColorMode(sheetNode); 156 + } 157 + } 158 + } else { 159 + disableStylesheet(alternateStylesheets); 160 + enableStylesheet(primaryStylesheets) 161 + toggleBodyColorPrimary(); 162 + } 163 + manageTransitions('#quarto-margin-sidebar .nav-link', true); 164 + // Switch the toggles 165 + setColorSchemeToggle(alternate) 166 + // Hack to workaround the fact that safari doesn't 167 + // properly recolor the scrollbar when toggling (#1455) 168 + if (navigator.userAgent.indexOf('Safari') > 0 && navigator.userAgent.indexOf('Chrome') == -1) { 169 + manageTransitions("body", false); 170 + window.scrollTo(0, 1); 171 + setTimeout(() => { 172 + window.scrollTo(0, 0); 173 + manageTransitions("body", true); 174 + }, 40); 175 + } 176 + } 177 + const disableStylesheet = (stylesheets) => { 178 + for (let i=0; i < stylesheets.length; i++) { 179 + const stylesheet = stylesheets[i]; 180 + stylesheet.rel = 'disabled-stylesheet'; 181 + } 182 + } 183 + const enableStylesheet = (stylesheets) => { 184 + for (let i=0; i < stylesheets.length; i++) { 185 + const stylesheet = stylesheets[i]; 186 + if(stylesheet.rel !== 'stylesheet') { // for Chrome, which will still FOUC without this check 187 + stylesheet.rel = 'stylesheet'; 188 + } 189 + } 190 + } 191 + const manageTransitions = (selector, allowTransitions) => { 192 + const els = window.document.querySelectorAll(selector); 193 + for (let i=0; i < els.length; i++) { 194 + const el = els[i]; 195 + if (allowTransitions) { 196 + el.classList.remove('notransition'); 197 + } else { 198 + el.classList.add('notransition'); 199 + } 200 + } 201 + } 202 + const isFileUrl = () => { 203 + return window.location.protocol === 'file:'; 204 + } 205 + const hasAlternateSentinel = () => { 206 + let styleSentinel = getColorSchemeSentinel(); 207 + if (styleSentinel !== null) { 208 + return styleSentinel 
=== "alternate"; 209 + } else { 210 + return false; 211 + } 212 + } 213 + const setStyleSentinel = (alternate) => { 214 + const value = alternate ? "alternate" : "default"; 215 + if (!isFileUrl()) { 216 + window.localStorage.setItem("quarto-color-scheme", value); 217 + } else { 218 + localAlternateSentinel = value; 219 + } 220 + } 221 + const getColorSchemeSentinel = () => { 222 + if (!isFileUrl()) { 223 + const storageValue = window.localStorage.getItem("quarto-color-scheme"); 224 + return storageValue != null ? storageValue : localAlternateSentinel; 225 + } else { 226 + return localAlternateSentinel; 227 + } 228 + } 229 + const toggleGiscusIfUsed = (isAlternate, darkModeDefault) => { 230 + const baseTheme = document.querySelector('#giscus-base-theme')?.value ?? 'light'; 231 + const alternateTheme = document.querySelector('#giscus-alt-theme')?.value ?? 'dark'; 232 + let newTheme = ''; 233 + if(authorPrefersDark) { 234 + newTheme = isAlternate ? baseTheme : alternateTheme; 235 + } else { 236 + newTheme = isAlternate ? 
alternateTheme : baseTheme; 237 + } 238 + const changeGiscusTheme = () => { 239 + // From: https://github.com/giscus/giscus/issues/336 240 + const sendMessage = (message) => { 241 + const iframe = document.querySelector('iframe.giscus-frame'); 242 + if (!iframe) return; 243 + iframe.contentWindow.postMessage({ giscus: message }, 'https://giscus.app'); 244 + } 245 + sendMessage({ 246 + setConfig: { 247 + theme: newTheme 248 + } 249 + }); 250 + } 251 + const isGiscussLoaded = window.document.querySelector('iframe.giscus-frame') !== null; 252 + if (isGiscussLoaded) { 253 + changeGiscusTheme(); 254 + } 255 + }; 256 + const authorPrefersDark = false; 257 + const darkModeDefault = authorPrefersDark; 258 + document.querySelector('link#quarto-text-highlighting-styles.quarto-color-scheme-extra').rel = 'disabled-stylesheet'; 259 + document.querySelector('link#quarto-bootstrap.quarto-color-scheme-extra').rel = 'disabled-stylesheet'; 260 + let localAlternateSentinel = darkModeDefault ? 'alternate' : 'default'; 261 + // Dark / light mode switch 262 + window.quartoToggleColorScheme = () => { 263 + // Read the current dark / light value 264 + let toAlternate = !hasAlternateSentinel(); 265 + toggleColorMode(toAlternate); 266 + setStyleSentinel(toAlternate); 267 + toggleGiscusIfUsed(toAlternate, darkModeDefault); 268 + window.dispatchEvent(new Event('resize')); 269 + }; 270 + // Switch to dark mode if need be 271 + if (hasAlternateSentinel()) { 272 + toggleColorMode(true); 273 + } else { 274 + toggleColorMode(false); 275 + } 276 + </script> 277 + 278 + <div id="quarto-search-results"></div> 279 + <header id="quarto-header" class="headroom fixed-top"> 280 + <nav class="navbar navbar-expand-lg " data-bs-theme="dark"> 281 + <div class="navbar-container container-fluid"> 282 + <div class="navbar-brand-container mx-auto"> 283 + <a class="navbar-brand" href="../index.html"> 284 + <span class="navbar-title">atdata</span> 285 + </a> 286 + </div> 287 + <div id="quarto-search" class="" 
title="Search"></div> 288 + <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarCollapse" aria-controls="navbarCollapse" role="menu" aria-expanded="false" aria-label="Toggle navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }"> 289 + <span class="navbar-toggler-icon"></span> 290 + </button> 291 + <div class="collapse navbar-collapse" id="navbarCollapse"> 292 + <ul class="navbar-nav navbar-nav-scroll me-auto"> 293 + <li class="nav-item"> 294 + <a class="nav-link" href="../index.html"> 295 + <span class="menu-text">Guide</span></a> 296 + </li> 297 + <li class="nav-item dropdown "> 298 + <a class="nav-link dropdown-toggle" href="#" id="nav-menu-tutorials" role="link" data-bs-toggle="dropdown" aria-expanded="false"> 299 + <span class="menu-text">Tutorials</span> 300 + </a> 301 + <ul class="dropdown-menu" aria-labelledby="nav-menu-tutorials"> 302 + <li> 303 + <a class="dropdown-item" href="../tutorials/quickstart.html"> 304 + <span class="dropdown-text">Quick Start</span></a> 305 + </li> 306 + <li> 307 + <a class="dropdown-item" href="../tutorials/local-workflow.html"> 308 + <span class="dropdown-text">Local Workflow</span></a> 309 + </li> 310 + <li> 311 + <a class="dropdown-item" href="../tutorials/atmosphere.html"> 312 + <span class="dropdown-text">Atmosphere Publishing</span></a> 313 + </li> 314 + <li> 315 + <a class="dropdown-item" href="../tutorials/promotion.html"> 316 + <span class="dropdown-text">Promotion Workflow</span></a> 317 + </li> 318 + </ul> 319 + </li> 320 + <li class="nav-item dropdown "> 321 + <a class="nav-link dropdown-toggle" href="#" id="nav-menu-reference" role="link" data-bs-toggle="dropdown" aria-expanded="false"> 322 + <span class="menu-text">Reference</span> 323 + </a> 324 + <ul class="dropdown-menu" aria-labelledby="nav-menu-reference"> 325 + <li> 326 + <a class="dropdown-item" href="../reference/packable-samples.html"> 327 + <span class="dropdown-text">Packable 
Samples</span></a> 328 + </li> 329 + <li> 330 + <a class="dropdown-item" href="../reference/datasets.html"> 331 + <span class="dropdown-text">Datasets</span></a> 332 + </li> 333 + <li> 334 + <a class="dropdown-item" href="../reference/lenses.html"> 335 + <span class="dropdown-text">Lenses</span></a> 336 + </li> 337 + <li> 338 + <a class="dropdown-item" href="../reference/local-storage.html"> 339 + <span class="dropdown-text">Local Storage</span></a> 340 + </li> 341 + <li> 342 + <a class="dropdown-item" href="../reference/atmosphere.html"> 343 + <span class="dropdown-text">Atmosphere</span></a> 344 + </li> 345 + <li> 346 + <a class="dropdown-item" href="../reference/promotion.html"> 347 + <span class="dropdown-text">Promotion</span></a> 348 + </li> 349 + <li> 350 + <a class="dropdown-item" href="../reference/load-dataset.html"> 351 + <span class="dropdown-text">load_dataset API</span></a> 352 + </li> 353 + <li> 354 + <a class="dropdown-item" href="../reference/protocols.html"> 355 + <span class="dropdown-text">Protocols</span></a> 356 + </li> 357 + <li> 358 + <a class="dropdown-item" href="../reference/uri-spec.html"> 359 + <span class="dropdown-text">URI Specification</span></a> 360 + </li> 361 + <li> 362 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 363 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 364 + </li> 365 + <li> 366 + <a class="dropdown-item" href="../reference/deployment.html"> 367 + <span class="dropdown-text">Deployment Guide</span></a> 368 + </li> 369 + </ul> 370 + </li> 371 + <li class="nav-item"> 372 + <a class="nav-link" href="../api/index.html"> 373 + <span class="menu-text">API</span></a> 374 + </li> 375 + </ul> 376 + <ul class="navbar-nav navbar-nav-scroll ms-auto"> 377 + <li class="nav-item compact"> 378 + <a class="nav-link" href="https://github.com/your-org/atdata"> <i class="bi bi-github" role="img"> 379 + </i> 380 + <span class="menu-text"></span></a> 381 + </li> 382 + </ul> 383 + </div> <!-- 
/navcollapse --> 384 + <div class="quarto-navbar-tools"> 385 + <a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a> 386 + </div> 387 + </div> <!-- /container-fluid --> 388 + </nav> 389 + </header> 390 + <!-- content --> 391 + <div id="quarto-content" class="quarto-container page-columns page-rows-contents page-layout-article page-navbar"> 392 + <!-- sidebar --> 393 + <!-- margin-sidebar --> 394 + <div id="quarto-margin-sidebar" class="sidebar margin-sidebar"> 395 + <nav id="TOC" role="doc-toc" class="toc-active"> 396 + <h2 id="toc-title">On this page</h2> 397 + 398 + <ul> 399 + <li><a href="#atdata.BlobSource" id="toc-atdata.BlobSource" class="nav-link active" data-scroll-target="#atdata.BlobSource">BlobSource</a> 400 + <ul class="collapse"> 401 + <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 403 + <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 404 + <ul class="collapse"> 405 + <li><a href="#atdata.BlobSource.from_refs" id="toc-atdata.BlobSource.from_refs" class="nav-link" data-scroll-target="#atdata.BlobSource.from_refs">from_refs</a></li> 406 + <li><a href="#atdata.BlobSource.list_shards" id="toc-atdata.BlobSource.list_shards" class="nav-link" data-scroll-target="#atdata.BlobSource.list_shards">list_shards</a></li> 407 + <li><a href="#atdata.BlobSource.open_shard" id="toc-atdata.BlobSource.open_shard" class="nav-link" data-scroll-target="#atdata.BlobSource.open_shard">open_shard</a></li> 408 + </ul></li> 409 + </ul></li> 410 + </ul> 411 + <div class="toc-actions"><ul><li><a href="https://github.com/your-org/atdata/edit/main/api/BlobSource.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this 
page</a></li><li><a href="https://github.com/your-org/atdata/issues/new" class="toc-action"><i class="bi empty"></i>Report an issue</a></li></ul></div></nav> 412 + </div> 413 + <!-- main --> 414 + <main class="content" id="quarto-document-content"><header id="title-block-header" class="quarto-title-block"></header> 415 + 416 + 417 + 418 + 419 + 420 + <section id="atdata.BlobSource" class="level1"> 421 + <h1>BlobSource</h1> 422 + <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>BlobSource(blob_refs, pds_endpoint<span class="op">=</span><span class="va">None</span>, _endpoint_cache<span class="op">=</span><span class="bu">dict</span>())</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 423 + <p>Data source for ATProto PDS blob storage.</p> 424 + <p>Streams dataset shards stored as blobs on an ATProto Personal Data Server. 
Each shard is identified by a blob reference containing the DID and CID.</p> 425 + <p>This source resolves blob references to HTTP URLs and streams the content directly, supporting efficient iteration over shards without downloading everything upfront.</p> 426 + <section id="attributes" class="level2 doc-section doc-section-attributes"> 427 + <h2 class="doc-section doc-section-attributes anchored" data-anchor-id="attributes">Attributes</h2> 428 + <table class="caption-top table"> 429 + <thead> 430 + <tr class="header"> 431 + <th>Name</th> 432 + <th>Type</th> 433 + <th>Description</th> 434 + </tr> 435 + </thead> 436 + <tbody> 437 + <tr class="odd"> 438 + <td>blob_refs</td> 439 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>[<a href="`str`">str</a>, <a href="`str`">str</a>]]</td> 440 + <td>List of blob reference dicts with ‘did’ and ‘cid’ keys.</td> 441 + </tr> 442 + <tr class="even"> 443 + <td>pds_endpoint</td> 444 + <td><a href="`str`">str</a> | None</td> 445 + <td>Optional PDS endpoint URL. If not provided, resolved from DID.</td> 446 + </tr> 447 + </tbody> 448 + </table> 449 + </section> 450 + <section id="example" class="level2 doc-section doc-section-example"> 451 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 452 + <p>::</p> 453 + <pre><code>&gt;&gt;&gt; source = BlobSource( 454 + ... blob_refs=[ 455 + ... {"did": "did:plc:abc123", "cid": "bafyrei..."}, 456 + ... {"did": "did:plc:abc123", "cid": "bafyrei..."}, 457 + ... ], 458 + ... ) 459 + &gt;&gt;&gt; for shard_id, stream in source.shards: 460 + ... 
process(stream)</code></pre> 461 + </section> 462 + <section id="methods" class="level2"> 463 + <h2 class="anchored" data-anchor-id="methods">Methods</h2> 464 + <table class="caption-top table"> 465 + <thead> 466 + <tr class="header"> 467 + <th>Name</th> 468 + <th>Description</th> 469 + </tr> 470 + </thead> 471 + <tbody> 472 + <tr class="odd"> 473 + <td><a href="#atdata.BlobSource.from_refs">from_refs</a></td> 474 + <td>Create BlobSource from blob reference dicts.</td> 475 + </tr> 476 + <tr class="even"> 477 + <td><a href="#atdata.BlobSource.list_shards">list_shards</a></td> 478 + <td>Return list of AT URI-style shard identifiers.</td> 479 + </tr> 480 + <tr class="odd"> 481 + <td><a href="#atdata.BlobSource.open_shard">open_shard</a></td> 482 + <td>Open a single shard by its AT URI.</td> 483 + </tr> 484 + </tbody> 485 + </table> 486 + <section id="atdata.BlobSource.from_refs" class="level3"> 487 + <h3 class="anchored" data-anchor-id="atdata.BlobSource.from_refs">from_refs</h3> 488 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>BlobSource.from_refs(refs, <span class="op">*</span>, pds_endpoint<span class="op">=</span><span class="va">None</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 489 + <p>Create BlobSource from blob reference dicts.</p> 490 + <p>Accepts blob references in the format returned by upload_blob: <code>{"$type": "blob", "ref": {"$link": "cid"}, ...}</code></p> 491 + <p>Also accepts simplified format: <code>{"did": "...", "cid": "..."}</code></p> 492 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 493 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 494 + <table class="caption-top table"> 495 + <thead> 496 + <tr class="header"> 497 + <th>Name</th> 498 + 
<th>Type</th> 499 + <th>Description</th> 500 + <th>Default</th> 501 + </tr> 502 + </thead> 503 + <tbody> 504 + <tr class="odd"> 505 + <td>refs</td> 506 + <td><a href="`list`">list</a>[<a href="`dict`">dict</a>]</td> 507 + <td>List of blob reference dicts.</td> 508 + <td><em>required</em></td> 509 + </tr> 510 + <tr class="even"> 511 + <td>pds_endpoint</td> 512 + <td><a href="`str`">str</a> | None</td> 513 + <td>Optional PDS endpoint to use for all blobs.</td> 514 + <td><code>None</code></td> 515 + </tr> 516 + </tbody> 517 + </table> 518 + </section> 519 + <section id="returns" class="level4 doc-section doc-section-returns"> 520 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 521 + <table class="caption-top table"> 522 + <thead> 523 + <tr class="header"> 524 + <th>Name</th> 525 + <th>Type</th> 526 + <th>Description</th> 527 + </tr> 528 + </thead> 529 + <tbody> 530 + <tr class="odd"> 531 + <td></td> 532 + <td>'BlobSource'</td> 533 + <td>Configured BlobSource.</td> 534 + </tr> 535 + </tbody> 536 + </table> 537 + </section> 538 + <section id="raises" class="level4 doc-section doc-section-raises"> 539 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 540 + <table class="caption-top table"> 541 + <thead> 542 + <tr class="header"> 543 + <th>Name</th> 544 + <th>Type</th> 545 + <th>Description</th> 546 + </tr> 547 + </thead> 548 + <tbody> 549 + <tr class="odd"> 550 + <td></td> 551 + <td><a href="`ValueError`">ValueError</a></td> 552 + <td>If refs is empty or format is invalid.</td> 553 + </tr> 554 + </tbody> 555 + </table> 556 + </section> 557 + </section> 558 + <section id="atdata.BlobSource.list_shards" class="level3"> 559 + <h3 class="anchored" data-anchor-id="atdata.BlobSource.list_shards">list_shards</h3> 560 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" 
tabindex="-1"></a>BlobSource.list_shards()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 561 + <p>Return list of AT URI-style shard identifiers.</p> 562 + </section> 563 + <section id="atdata.BlobSource.open_shard" class="level3"> 564 + <h3 class="anchored" data-anchor-id="atdata.BlobSource.open_shard">open_shard</h3> 565 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>BlobSource.open_shard(shard_id)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 566 + <p>Open a single shard by its AT URI.</p> 567 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 568 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 569 + <table class="caption-top table"> 570 + <thead> 571 + <tr class="header"> 572 + <th>Name</th> 573 + <th>Type</th> 574 + <th>Description</th> 575 + <th>Default</th> 576 + </tr> 577 + </thead> 578 + <tbody> 579 + <tr class="odd"> 580 + <td>shard_id</td> 581 + <td><a href="`str`">str</a></td> 582 + <td>AT URI of the shard (at://did/blob/cid).</td> 583 + <td><em>required</em></td> 584 + </tr> 585 + </tbody> 586 + </table> 587 + </section> 588 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 589 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 590 + <table class="caption-top table"> 591 + <thead> 592 + <tr class="header"> 593 + <th>Name</th> 594 + <th>Type</th> 595 + <th>Description</th> 596 + </tr> 597 + </thead> 598 + <tbody> 599 + <tr class="odd"> 600 + <td></td> 601 + <td><a href="`typing.IO`">IO</a>[<a href="`bytes`">bytes</a>]</td> 602 + <td>Streaming response body for reading the blob.</td> 603 + </tr> 604 + </tbody> 605 + </table> 606 + </section> 
607 + <section id="raises-1" class="level4 doc-section doc-section-raises"> 608 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-1">Raises</h4> 609 + <table class="caption-top table"> 610 + <thead> 611 + <tr class="header"> 612 + <th>Name</th> 613 + <th>Type</th> 614 + <th>Description</th> 615 + </tr> 616 + </thead> 617 + <tbody> 618 + <tr class="odd"> 619 + <td></td> 620 + <td><a href="`KeyError`">KeyError</a></td> 621 + <td>If shard_id is not in list_shards().</td> 622 + </tr> 623 + <tr class="even"> 624 + <td></td> 625 + <td><a href="`ValueError`">ValueError</a></td> 626 + <td>If shard_id format is invalid.</td> 627 + </tr> 628 + </tbody> 629 + </table> 630 + 631 + 632 + </section> 633 + </section> 634 + </section> 635 + </section> 636 + 637 + </main> <!-- /main --> 638 + <script id="quarto-html-after-body" type="application/javascript"> 639 + window.document.addEventListener("DOMContentLoaded", function (event) { 640 + // Ensure there is a toggle, if there isn't float one in the top right 641 + if (window.document.querySelector('.quarto-color-scheme-toggle') === null) { 642 + const a = window.document.createElement('a'); 643 + a.classList.add('top-right'); 644 + a.classList.add('quarto-color-scheme-toggle'); 645 + a.href = ""; 646 + a.onclick = function() { try { window.quartoToggleColorScheme(); } catch {} return false; }; 647 + const i = window.document.createElement("i"); 648 + i.classList.add('bi'); 649 + a.appendChild(i); 650 + window.document.body.appendChild(a); 651 + } 652 + setColorSchemeToggle(hasAlternateSentinel()) 653 + const icon = ""; 654 + const anchorJS = new window.AnchorJS(); 655 + anchorJS.options = { 656 + placement: 'right', 657 + icon: icon 658 + }; 659 + anchorJS.add('.anchored'); 660 + const isCodeAnnotation = (el) => { 661 + for (const clz of el.classList) { 662 + if (clz.startsWith('code-annotation-')) { 663 + return true; 664 + } 665 + } 666 + return false; 667 + } 668 + const onCopySuccess = function(e) 
{ 669 + // button target 670 + const button = e.trigger; 671 + // don't keep focus 672 + button.blur(); 673 + // flash "checked" 674 + button.classList.add('code-copy-button-checked'); 675 + var currentTitle = button.getAttribute("title"); 676 + button.setAttribute("title", "Copied!"); 677 + let tooltip; 678 + if (window.bootstrap) { 679 + button.setAttribute("data-bs-toggle", "tooltip"); 680 + button.setAttribute("data-bs-placement", "left"); 681 + button.setAttribute("data-bs-title", "Copied!"); 682 + tooltip = new bootstrap.Tooltip(button, 683 + { trigger: "manual", 684 + customClass: "code-copy-button-tooltip", 685 + offset: [0, -8]}); 686 + tooltip.show(); 687 + } 688 + setTimeout(function() { 689 + if (tooltip) { 690 + tooltip.hide(); 691 + button.removeAttribute("data-bs-title"); 692 + button.removeAttribute("data-bs-toggle"); 693 + button.removeAttribute("data-bs-placement"); 694 + } 695 + button.setAttribute("title", currentTitle); 696 + button.classList.remove('code-copy-button-checked'); 697 + }, 1000); 698 + // clear code selection 699 + e.clearSelection(); 700 + } 701 + const getTextToCopy = function(trigger) { 702 + const codeEl = trigger.previousElementSibling.cloneNode(true); 703 + for (const childEl of codeEl.children) { 704 + if (isCodeAnnotation(childEl)) { 705 + childEl.remove(); 706 + } 707 + } 708 + return codeEl.innerText; 709 + } 710 + const clipboard = new window.ClipboardJS('.code-copy-button:not([data-in-quarto-modal])', { 711 + text: getTextToCopy 712 + }); 713 + clipboard.on('success', onCopySuccess); 714 + if (window.document.getElementById('quarto-embedded-source-code-modal')) { 715 + const clipboardModal = new window.ClipboardJS('.code-copy-button[data-in-quarto-modal]', { 716 + text: getTextToCopy, 717 + container: window.document.getElementById('quarto-embedded-source-code-modal') 718 + }); 719 + clipboardModal.on('success', onCopySuccess); 720 + } 721 + var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//); 
722 + var mailtoRegex = new RegExp(/^mailto:/); 723 + var filterRegex = new RegExp("https:\/\/github\.com\/your-org\/atdata"); 724 + var isInternal = (href) => { 725 + return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href); 726 + } 727 + // Inspect non-navigation links and adorn them if external 728 + var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item):not(.quarto-navigation-tool):not(.about-link)'); 729 + for (var i=0; i<links.length; i++) { 730 + const link = links[i]; 731 + if (!isInternal(link.href)) { 732 + // undo the damage that might have been done by quarto-nav.js in the case of 733 + // links that we want to consider external 734 + if (link.dataset.originalHref !== undefined) { 735 + link.href = link.dataset.originalHref; 736 + } 737 + } 738 + } 739 + function tippyHover(el, contentFn, onTriggerFn, onUntriggerFn) { 740 + const config = { 741 + allowHTML: true, 742 + maxWidth: 500, 743 + delay: 100, 744 + arrow: false, 745 + appendTo: function(el) { 746 + return el.parentElement; 747 + }, 748 + interactive: true, 749 + interactiveBorder: 10, 750 + theme: 'quarto', 751 + placement: 'bottom-start', 752 + }; 753 + if (contentFn) { 754 + config.content = contentFn; 755 + } 756 + if (onTriggerFn) { 757 + config.onTrigger = onTriggerFn; 758 + } 759 + if (onUntriggerFn) { 760 + config.onUntrigger = onUntriggerFn; 761 + } 762 + window.tippy(el, config); 763 + } 764 + const noterefs = window.document.querySelectorAll('a[role="doc-noteref"]'); 765 + for (var i=0; i<noterefs.length; i++) { 766 + const ref = noterefs[i]; 767 + tippyHover(ref, function() { 768 + // use id or data attribute instead here 769 + let href = ref.getAttribute('data-footnote-href') || ref.getAttribute('href'); 770 + try { href = new URL(href).hash; } catch {} 771 + const id = 
href.replace(/^#\/?/, ""); 772 + const note = window.document.getElementById(id); 773 + if (note) { 774 + return note.innerHTML; 775 + } else { 776 + return ""; 777 + } 778 + }); 779 + } 780 + const xrefs = window.document.querySelectorAll('a.quarto-xref'); 781 + const processXRef = (id, note) => { 782 + // Strip column container classes 783 + const stripColumnClz = (el) => { 784 + el.classList.remove("page-full", "page-columns"); 785 + if (el.children) { 786 + for (const child of el.children) { 787 + stripColumnClz(child); 788 + } 789 + } 790 + } 791 + stripColumnClz(note) 792 + if (id === null || id.startsWith('sec-')) { 793 + // Special case sections, only their first couple elements 794 + const container = document.createElement("div"); 795 + if (note.children && note.children.length > 2) { 796 + container.appendChild(note.children[0].cloneNode(true)); 797 + for (let i = 1; i < note.children.length; i++) { 798 + const child = note.children[i]; 799 + if (child.tagName === "P" && child.innerText === "") { 800 + continue; 801 + } else { 802 + container.appendChild(child.cloneNode(true)); 803 + break; 804 + } 805 + } 806 + if (window.Quarto?.typesetMath) { 807 + window.Quarto.typesetMath(container); 808 + } 809 + return container.innerHTML 810 + } else { 811 + if (window.Quarto?.typesetMath) { 812 + window.Quarto.typesetMath(note); 813 + } 814 + return note.innerHTML; 815 + } 816 + } else { 817 + // Remove any anchor links if they are present 818 + const anchorLink = note.querySelector('a.anchorjs-link'); 819 + if (anchorLink) { 820 + anchorLink.remove(); 821 + } 822 + if (window.Quarto?.typesetMath) { 823 + window.Quarto.typesetMath(note); 824 + } 825 + if (note.classList.contains("callout")) { 826 + return note.outerHTML; 827 + } else { 828 + return note.innerHTML; 829 + } 830 + } 831 + } 832 + for (var i=0; i<xrefs.length; i++) { 833 + const xref = xrefs[i]; 834 + tippyHover(xref, undefined, function(instance) { 835 + instance.disable(); 836 + let url = 
xref.getAttribute('href'); 837 + let hash = undefined; 838 + if (url.startsWith('#')) { 839 + hash = url; 840 + } else { 841 + try { hash = new URL(url).hash; } catch {} 842 + } 843 + if (hash) { 844 + const id = hash.replace(/^#\/?/, ""); 845 + const note = window.document.getElementById(id); 846 + if (note !== null) { 847 + try { 848 + const html = processXRef(id, note.cloneNode(true)); 849 + instance.setContent(html); 850 + } finally { 851 + instance.enable(); 852 + instance.show(); 853 + } 854 + } else { 855 + // See if we can fetch this 856 + fetch(url.split('#')[0]) 857 + .then(res => res.text()) 858 + .then(html => { 859 + const parser = new DOMParser(); 860 + const htmlDoc = parser.parseFromString(html, "text/html"); 861 + const note = htmlDoc.getElementById(id); 862 + if (note !== null) { 863 + const html = processXRef(id, note); 864 + instance.setContent(html); 865 + } 866 + }).finally(() => { 867 + instance.enable(); 868 + instance.show(); 869 + }); 870 + } 871 + } else { 872 + // See if we can fetch a full url (with no hash to target) 873 + // This is a special case and we should probably do some content thinning / targeting 874 + fetch(url) 875 + .then(res => res.text()) 876 + .then(html => { 877 + const parser = new DOMParser(); 878 + const htmlDoc = parser.parseFromString(html, "text/html"); 879 + const note = htmlDoc.querySelector('main.content'); 880 + if (note !== null) { 881 + // This should only happen for chapter cross references 882 + // (since there is no id in the URL) 883 + // remove the first header 884 + if (note.children.length > 0 && note.children[0].tagName === "HEADER") { 885 + note.children[0].remove(); 886 + } 887 + const html = processXRef(null, note); 888 + instance.setContent(html); 889 + } 890 + }).finally(() => { 891 + instance.enable(); 892 + instance.show(); 893 + }); 894 + } 895 + }, function(instance) { 896 + }); 897 + } 898 + let selectedAnnoteEl; 899 + const selectorForAnnotation = ( cell, annotation) => { 900 + let 
cellAttr = 'data-code-cell="' + cell + '"'; 901 + let lineAttr = 'data-code-annotation="' + annotation + '"'; 902 + const selector = 'span[' + cellAttr + '][' + lineAttr + ']'; 903 + return selector; 904 + } 905 + const selectCodeLines = (annoteEl) => { 906 + const doc = window.document; 907 + const targetCell = annoteEl.getAttribute("data-target-cell"); 908 + const targetAnnotation = annoteEl.getAttribute("data-target-annotation"); 909 + const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation)); 910 + const lines = annoteSpan.getAttribute("data-code-lines").split(","); 911 + const lineIds = lines.map((line) => { 912 + return targetCell + "-" + line; 913 + }) 914 + let top = null; 915 + let height = null; 916 + let parent = null; 917 + if (lineIds.length > 0) { 918 + //compute the position of the single el (top and bottom and make a div) 919 + const el = window.document.getElementById(lineIds[0]); 920 + top = el.offsetTop; 921 + height = el.offsetHeight; 922 + parent = el.parentElement.parentElement; 923 + if (lineIds.length > 1) { 924 + const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]); 925 + const bottom = lastEl.offsetTop + lastEl.offsetHeight; 926 + height = bottom - top; 927 + } 928 + if (top !== null && height !== null && parent !== null) { 929 + // cook up a div (if necessary) and position it 930 + let div = window.document.getElementById("code-annotation-line-highlight"); 931 + if (div === null) { 932 + div = window.document.createElement("div"); 933 + div.setAttribute("id", "code-annotation-line-highlight"); 934 + div.style.position = 'absolute'; 935 + parent.appendChild(div); 936 + } 937 + div.style.top = top - 2 + "px"; 938 + div.style.height = height + 4 + "px"; 939 + div.style.left = 0; 940 + let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter"); 941 + if (gutterDiv === null) { 942 + gutterDiv = window.document.createElement("div"); 943 + 
gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter"); 944 + gutterDiv.style.position = 'absolute'; 945 + const codeCell = window.document.getElementById(targetCell); 946 + const gutter = codeCell.querySelector('.code-annotation-gutter'); 947 + gutter.appendChild(gutterDiv); 948 + } 949 + gutterDiv.style.top = top - 2 + "px"; 950 + gutterDiv.style.height = height + 4 + "px"; 951 + } 952 + selectedAnnoteEl = annoteEl; 953 + } 954 + }; 955 + const unselectCodeLines = () => { 956 + const elementsIds = ["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"]; 957 + elementsIds.forEach((elId) => { 958 + const div = window.document.getElementById(elId); 959 + if (div) { 960 + div.remove(); 961 + } 962 + }); 963 + selectedAnnoteEl = undefined; 964 + }; 965 + // Handle positioning of the toggle 966 + window.addEventListener( 967 + "resize", 968 + throttle(() => { 969 + elRect = undefined; 970 + if (selectedAnnoteEl) { 971 + selectCodeLines(selectedAnnoteEl); 972 + } 973 + }, 10) 974 + ); 975 + function throttle(fn, ms) { 976 + let throttle = false; 977 + let timer; 978 + return (...args) => { 979 + if(!throttle) { // first call gets through 980 + fn.apply(this, args); 981 + throttle = true; 982 + } else { // all the others get throttled 983 + if(timer) clearTimeout(timer); // cancel #2 984 + timer = setTimeout(() => { 985 + fn.apply(this, args); 986 + timer = throttle = false; 987 + }, ms); 988 + } 989 + }; 990 + } 991 + // Attach click handler to the DT 992 + const annoteDls = window.document.querySelectorAll('dt[data-target-cell]'); 993 + for (const annoteDlNode of annoteDls) { 994 + annoteDlNode.addEventListener('click', (event) => { 995 + const clickedEl = event.target; 996 + if (clickedEl !== selectedAnnoteEl) { 997 + unselectCodeLines(); 998 + const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active'); 999 + if (activeEl) { 1000 + activeEl.classList.remove('code-annotation-active'); 1001 + } 1002 + 
selectCodeLines(clickedEl); 1003 + clickedEl.classList.add('code-annotation-active'); 1004 + } else { 1005 + // Unselect the line 1006 + unselectCodeLines(); 1007 + clickedEl.classList.remove('code-annotation-active'); 1008 + } 1009 + }); 1010 + } 1011 + const findCites = (el) => { 1012 + const parentEl = el.parentElement; 1013 + if (parentEl) { 1014 + const cites = parentEl.dataset.cites; 1015 + if (cites) { 1016 + return { 1017 + el, 1018 + cites: cites.split(' ') 1019 + }; 1020 + } else { 1021 + return findCites(el.parentElement) 1022 + } 1023 + } else { 1024 + return undefined; 1025 + } 1026 + }; 1027 + var bibliorefs = window.document.querySelectorAll('a[role="doc-biblioref"]'); 1028 + for (var i=0; i<bibliorefs.length; i++) { 1029 + const ref = bibliorefs[i]; 1030 + const citeInfo = findCites(ref); 1031 + if (citeInfo) { 1032 + tippyHover(citeInfo.el, function() { 1033 + var popup = window.document.createElement('div'); 1034 + citeInfo.cites.forEach(function(cite) { 1035 + var citeDiv = window.document.createElement('div'); 1036 + citeDiv.classList.add('hanging-indent'); 1037 + citeDiv.classList.add('csl-entry'); 1038 + var biblioDiv = window.document.getElementById('ref-' + cite); 1039 + if (biblioDiv) { 1040 + citeDiv.innerHTML = biblioDiv.innerHTML; 1041 + } 1042 + popup.appendChild(citeDiv); 1043 + }); 1044 + return popup.innerHTML; 1045 + }); 1046 + } 1047 + } 1048 + }); 1049 + </script> 1050 + </div> <!-- /content --> 1051 + <footer class="footer"> 1052 + <div class="nav-footer"> 1053 + <div class="nav-footer-left"> 1054 + <p>Built with <a href="https://quarto.org/">Quarto</a></p> 1055 + </div> 1056 + <div class="nav-footer-center"> 1057 + &nbsp; 1058 + <div class="toc-actions d-sm-block d-md-none"><ul><li><a href="https://github.com/your-org/atdata/edit/main/api/BlobSource.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li><li><a href="https://github.com/your-org/atdata/issues/new" class="toc-action"><i class="bi 
empty"></i>Report an issue</a></li></ul></div></div> 1059 + <div class="nav-footer-right"> 1060 + <p>MIT License</p> 1061 + </div> 1062 + </div> 1063 + </footer> 1064 + 1065 + 1066 + 1067 + 1068 + </body></html>
+1187
docs/api/PDSBlobStore.html
··· 1 + <!DOCTYPE html> 2 + <html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en"><head> 3 + 4 + <meta charset="utf-8"> 5 + <meta name="generator" content="quarto-1.7.34"> 6 + 7 + <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes"> 8 + 9 + 10 + <title>pdsblobstore – atdata</title> 11 + <style> 12 + code{white-space: pre-wrap;} 13 + span.smallcaps{font-variant: small-caps;} 14 + div.columns{display: flex; gap: min(4vw, 1.5em);} 15 + div.column{flex: auto; overflow-x: auto;} 16 + div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;} 17 + ul.task-list{list-style: none;} 18 + ul.task-list li input[type="checkbox"] { 19 + width: 0.8em; 20 + margin: 0 0.8em 0.2em -1em; /* quarto-specific, see https://github.com/quarto-dev/quarto-cli/issues/4556 */ 21 + vertical-align: middle; 22 + } 23 + /* CSS for syntax highlighting */ 24 + html { -webkit-text-size-adjust: 100%; } 25 + pre > code.sourceCode { white-space: pre; position: relative; } 26 + pre > code.sourceCode > span { display: inline-block; line-height: 1.25; } 27 + pre > code.sourceCode > span:empty { height: 1.2em; } 28 + .sourceCode { overflow: visible; } 29 + code.sourceCode > span { color: inherit; text-decoration: inherit; } 30 + div.sourceCode { margin: 1em 0; } 31 + pre.sourceCode { margin: 0; } 32 + @media screen { 33 + div.sourceCode { overflow: auto; } 34 + } 35 + @media print { 36 + pre > code.sourceCode { white-space: pre-wrap; } 37 + pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; } 38 + } 39 + pre.numberSource code 40 + { counter-reset: source-line 0; } 41 + pre.numberSource code > span 42 + { position: relative; left: -4em; counter-increment: source-line; } 43 + pre.numberSource code > span > a:first-child::before 44 + { content: counter(source-line); 45 + position: relative; left: -1em; text-align: right; vertical-align: baseline; 46 + border: none; display: inline-block; 47 + -webkit-touch-callout: none; -webkit-user-select: 
none; 48 + -khtml-user-select: none; -moz-user-select: none; 49 + -ms-user-select: none; user-select: none; 50 + padding: 0 4px; width: 4em; 51 + } 52 + pre.numberSource { margin-left: 3em; padding-left: 4px; } 53 + div.sourceCode 54 + { } 55 + @media screen { 56 + pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; } 57 + } 58 + </style> 59 + 60 + 61 + <script src="../site_libs/quarto-nav/quarto-nav.js"></script> 62 + <script src="../site_libs/quarto-nav/headroom.min.js"></script> 63 + <script src="../site_libs/clipboard/clipboard.min.js"></script> 64 + <script src="../site_libs/quarto-search/autocomplete.umd.js"></script> 65 + <script src="../site_libs/quarto-search/fuse.min.js"></script> 66 + <script src="../site_libs/quarto-search/quarto-search.js"></script> 67 + <meta name="quarto:offset" content="../"> 68 + <script src="../site_libs/quarto-html/quarto.js" type="module"></script> 69 + <script src="../site_libs/quarto-html/tabsets/tabsets.js" type="module"></script> 70 + <script src="../site_libs/quarto-html/popper.min.js"></script> 71 + <script src="../site_libs/quarto-html/tippy.umd.min.js"></script> 72 + <script src="../site_libs/quarto-html/anchor.min.js"></script> 73 + <link href="../site_libs/quarto-html/tippy.css" rel="stylesheet"> 74 + <link href="../site_libs/quarto-html/quarto-syntax-highlighting-9582434199d49cc9e91654cdeeb4866b.css" rel="stylesheet" class="quarto-color-scheme" id="quarto-text-highlighting-styles"> 75 + <link href="../site_libs/quarto-html/quarto-syntax-highlighting-dark-8dcd8563ea6803ab7cbb3d71ca5772e1.css" rel="stylesheet" class="quarto-color-scheme quarto-color-alternate" id="quarto-text-highlighting-styles"> 76 + <link href="../site_libs/quarto-html/quarto-syntax-highlighting-9582434199d49cc9e91654cdeeb4866b.css" rel="stylesheet" class="quarto-color-scheme-extra" id="quarto-text-highlighting-styles"> 77 + <script src="../site_libs/bootstrap/bootstrap.min.js"></script> 78 + <link 
href="../site_libs/bootstrap/bootstrap-icons.css" rel="stylesheet"> 79 + <link href="../site_libs/bootstrap/bootstrap-62bce24ca844314e7bb1a34dbdfe05cc.min.css" rel="stylesheet" append-hash="true" class="quarto-color-scheme" id="quarto-bootstrap" data-mode="light"> 80 + <link href="../site_libs/bootstrap/bootstrap-dark-7964ffd8887b0991fe8d71c6c8bc75d6.min.css" rel="stylesheet" append-hash="true" class="quarto-color-scheme quarto-color-alternate" id="quarto-bootstrap" data-mode="dark"> 81 + <link href="../site_libs/bootstrap/bootstrap-62bce24ca844314e7bb1a34dbdfe05cc.min.css" rel="stylesheet" append-hash="true" class="quarto-color-scheme-extra" id="quarto-bootstrap" data-mode="light"> 82 + <script id="quarto-search-options" type="application/json">{ 83 + "location": "navbar", 84 + "copy-button": false, 85 + "collapse-after": 3, 86 + "panel-placement": "end", 87 + "type": "overlay", 88 + "limit": 50, 89 + "keyboard-shortcut": [ 90 + "f", 91 + "/", 92 + "s" 93 + ], 94 + "show-item-context": false, 95 + "language": { 96 + "search-no-results-text": "No results", 97 + "search-matching-documents-text": "matching documents", 98 + "search-copy-link-title": "Copy link to search", 99 + "search-hide-matches-text": "Hide additional matches", 100 + "search-more-match-text": "more match in this document", 101 + "search-more-matches-text": "more matches in this document", 102 + "search-clear-button-title": "Clear", 103 + "search-text-placeholder": "", 104 + "search-detached-cancel-button-title": "Cancel", 105 + "search-submit-button-title": "Submit", 106 + "search-label": "Search" 107 + } 108 + }</script> 109 + 110 + 111 + <link rel="stylesheet" href="../assets/styles.css"> 112 + </head> 113 + 114 + <body class="nav-fixed quarto-light"><script id="quarto-html-before-body" type="application/javascript"> 115 + const toggleBodyColorMode = (bsSheetEl) => { 116 + const mode = bsSheetEl.getAttribute("data-mode"); 117 + const bodyEl = window.document.querySelector("body"); 118 + if (mode 
=== "dark") { 119 + bodyEl.classList.add("quarto-dark"); 120 + bodyEl.classList.remove("quarto-light"); 121 + } else { 122 + bodyEl.classList.add("quarto-light"); 123 + bodyEl.classList.remove("quarto-dark"); 124 + } 125 + } 126 + const toggleBodyColorPrimary = () => { 127 + const bsSheetEl = window.document.querySelector("link#quarto-bootstrap:not([rel=disabled-stylesheet])"); 128 + if (bsSheetEl) { 129 + toggleBodyColorMode(bsSheetEl); 130 + } 131 + } 132 + const setColorSchemeToggle = (alternate) => { 133 + const toggles = window.document.querySelectorAll('.quarto-color-scheme-toggle'); 134 + for (let i=0; i < toggles.length; i++) { 135 + const toggle = toggles[i]; 136 + if (toggle) { 137 + if (alternate) { 138 + toggle.classList.add("alternate"); 139 + } else { 140 + toggle.classList.remove("alternate"); 141 + } 142 + } 143 + } 144 + }; 145 + const toggleColorMode = (alternate) => { 146 + // Switch the stylesheets 147 + const primaryStylesheets = window.document.querySelectorAll('link.quarto-color-scheme:not(.quarto-color-alternate)'); 148 + const alternateStylesheets = window.document.querySelectorAll('link.quarto-color-scheme.quarto-color-alternate'); 149 + manageTransitions('#quarto-margin-sidebar .nav-link', false); 150 + if (alternate) { 151 + // note: dark is layered on light, we don't disable primary! 
152 + enableStylesheet(alternateStylesheets); 153 + for (const sheetNode of alternateStylesheets) { 154 + if (sheetNode.id === "quarto-bootstrap") { 155 + toggleBodyColorMode(sheetNode); 156 + } 157 + } 158 + } else { 159 + disableStylesheet(alternateStylesheets); 160 + enableStylesheet(primaryStylesheets) 161 + toggleBodyColorPrimary(); 162 + } 163 + manageTransitions('#quarto-margin-sidebar .nav-link', true); 164 + // Switch the toggles 165 + setColorSchemeToggle(alternate) 166 + // Hack to workaround the fact that safari doesn't 167 + // properly recolor the scrollbar when toggling (#1455) 168 + if (navigator.userAgent.indexOf('Safari') > 0 && navigator.userAgent.indexOf('Chrome') == -1) { 169 + manageTransitions("body", false); 170 + window.scrollTo(0, 1); 171 + setTimeout(() => { 172 + window.scrollTo(0, 0); 173 + manageTransitions("body", true); 174 + }, 40); 175 + } 176 + } 177 + const disableStylesheet = (stylesheets) => { 178 + for (let i=0; i < stylesheets.length; i++) { 179 + const stylesheet = stylesheets[i]; 180 + stylesheet.rel = 'disabled-stylesheet'; 181 + } 182 + } 183 + const enableStylesheet = (stylesheets) => { 184 + for (let i=0; i < stylesheets.length; i++) { 185 + const stylesheet = stylesheets[i]; 186 + if(stylesheet.rel !== 'stylesheet') { // for Chrome, which will still FOUC without this check 187 + stylesheet.rel = 'stylesheet'; 188 + } 189 + } 190 + } 191 + const manageTransitions = (selector, allowTransitions) => { 192 + const els = window.document.querySelectorAll(selector); 193 + for (let i=0; i < els.length; i++) { 194 + const el = els[i]; 195 + if (allowTransitions) { 196 + el.classList.remove('notransition'); 197 + } else { 198 + el.classList.add('notransition'); 199 + } 200 + } 201 + } 202 + const isFileUrl = () => { 203 + return window.location.protocol === 'file:'; 204 + } 205 + const hasAlternateSentinel = () => { 206 + let styleSentinel = getColorSchemeSentinel(); 207 + if (styleSentinel !== null) { 208 + return styleSentinel 
=== "alternate"; 209 + } else { 210 + return false; 211 + } 212 + } 213 + const setStyleSentinel = (alternate) => { 214 + const value = alternate ? "alternate" : "default"; 215 + if (!isFileUrl()) { 216 + window.localStorage.setItem("quarto-color-scheme", value); 217 + } else { 218 + localAlternateSentinel = value; 219 + } 220 + } 221 + const getColorSchemeSentinel = () => { 222 + if (!isFileUrl()) { 223 + const storageValue = window.localStorage.getItem("quarto-color-scheme"); 224 + return storageValue != null ? storageValue : localAlternateSentinel; 225 + } else { 226 + return localAlternateSentinel; 227 + } 228 + } 229 + const toggleGiscusIfUsed = (isAlternate, darkModeDefault) => { 230 + const baseTheme = document.querySelector('#giscus-base-theme')?.value ?? 'light'; 231 + const alternateTheme = document.querySelector('#giscus-alt-theme')?.value ?? 'dark'; 232 + let newTheme = ''; 233 + if(authorPrefersDark) { 234 + newTheme = isAlternate ? baseTheme : alternateTheme; 235 + } else { 236 + newTheme = isAlternate ? 
alternateTheme : baseTheme; 237 + } 238 + const changeGiscusTheme = () => { 239 + // From: https://github.com/giscus/giscus/issues/336 240 + const sendMessage = (message) => { 241 + const iframe = document.querySelector('iframe.giscus-frame'); 242 + if (!iframe) return; 243 + iframe.contentWindow.postMessage({ giscus: message }, 'https://giscus.app'); 244 + } 245 + sendMessage({ 246 + setConfig: { 247 + theme: newTheme 248 + } 249 + }); 250 + } 251 + const isGiscussLoaded = window.document.querySelector('iframe.giscus-frame') !== null; 252 + if (isGiscussLoaded) { 253 + changeGiscusTheme(); 254 + } 255 + }; 256 + const authorPrefersDark = false; 257 + const darkModeDefault = authorPrefersDark; 258 + document.querySelector('link#quarto-text-highlighting-styles.quarto-color-scheme-extra').rel = 'disabled-stylesheet'; 259 + document.querySelector('link#quarto-bootstrap.quarto-color-scheme-extra').rel = 'disabled-stylesheet'; 260 + let localAlternateSentinel = darkModeDefault ? 'alternate' : 'default'; 261 + // Dark / light mode switch 262 + window.quartoToggleColorScheme = () => { 263 + // Read the current dark / light value 264 + let toAlternate = !hasAlternateSentinel(); 265 + toggleColorMode(toAlternate); 266 + setStyleSentinel(toAlternate); 267 + toggleGiscusIfUsed(toAlternate, darkModeDefault); 268 + window.dispatchEvent(new Event('resize')); 269 + }; 270 + // Switch to dark mode if need be 271 + if (hasAlternateSentinel()) { 272 + toggleColorMode(true); 273 + } else { 274 + toggleColorMode(false); 275 + } 276 + </script> 277 + 278 + <div id="quarto-search-results"></div> 279 + <header id="quarto-header" class="headroom fixed-top"> 280 + <nav class="navbar navbar-expand-lg " data-bs-theme="dark"> 281 + <div class="navbar-container container-fluid"> 282 + <div class="navbar-brand-container mx-auto"> 283 + <a class="navbar-brand" href="../index.html"> 284 + <span class="navbar-title">atdata</span> 285 + </a> 286 + </div> 287 + <div id="quarto-search" class="" 
title="Search"></div> 288 + <button class="navbar-toggler" type="button" data-bs-toggle="collapse" data-bs-target="#navbarCollapse" aria-controls="navbarCollapse" role="menu" aria-expanded="false" aria-label="Toggle navigation" onclick="if (window.quartoToggleHeadroom) { window.quartoToggleHeadroom(); }"> 289 + <span class="navbar-toggler-icon"></span> 290 + </button> 291 + <div class="collapse navbar-collapse" id="navbarCollapse"> 292 + <ul class="navbar-nav navbar-nav-scroll me-auto"> 293 + <li class="nav-item"> 294 + <a class="nav-link" href="../index.html"> 295 + <span class="menu-text">Guide</span></a> 296 + </li> 297 + <li class="nav-item dropdown "> 298 + <a class="nav-link dropdown-toggle" href="#" id="nav-menu-tutorials" role="link" data-bs-toggle="dropdown" aria-expanded="false"> 299 + <span class="menu-text">Tutorials</span> 300 + </a> 301 + <ul class="dropdown-menu" aria-labelledby="nav-menu-tutorials"> 302 + <li> 303 + <a class="dropdown-item" href="../tutorials/quickstart.html"> 304 + <span class="dropdown-text">Quick Start</span></a> 305 + </li> 306 + <li> 307 + <a class="dropdown-item" href="../tutorials/local-workflow.html"> 308 + <span class="dropdown-text">Local Workflow</span></a> 309 + </li> 310 + <li> 311 + <a class="dropdown-item" href="../tutorials/atmosphere.html"> 312 + <span class="dropdown-text">Atmosphere Publishing</span></a> 313 + </li> 314 + <li> 315 + <a class="dropdown-item" href="../tutorials/promotion.html"> 316 + <span class="dropdown-text">Promotion Workflow</span></a> 317 + </li> 318 + </ul> 319 + </li> 320 + <li class="nav-item dropdown "> 321 + <a class="nav-link dropdown-toggle" href="#" id="nav-menu-reference" role="link" data-bs-toggle="dropdown" aria-expanded="false"> 322 + <span class="menu-text">Reference</span> 323 + </a> 324 + <ul class="dropdown-menu" aria-labelledby="nav-menu-reference"> 325 + <li> 326 + <a class="dropdown-item" href="../reference/packable-samples.html"> 327 + <span class="dropdown-text">Packable 
Samples</span></a> 328 + </li> 329 + <li> 330 + <a class="dropdown-item" href="../reference/datasets.html"> 331 + <span class="dropdown-text">Datasets</span></a> 332 + </li> 333 + <li> 334 + <a class="dropdown-item" href="../reference/lenses.html"> 335 + <span class="dropdown-text">Lenses</span></a> 336 + </li> 337 + <li> 338 + <a class="dropdown-item" href="../reference/local-storage.html"> 339 + <span class="dropdown-text">Local Storage</span></a> 340 + </li> 341 + <li> 342 + <a class="dropdown-item" href="../reference/atmosphere.html"> 343 + <span class="dropdown-text">Atmosphere</span></a> 344 + </li> 345 + <li> 346 + <a class="dropdown-item" href="../reference/promotion.html"> 347 + <span class="dropdown-text">Promotion</span></a> 348 + </li> 349 + <li> 350 + <a class="dropdown-item" href="../reference/load-dataset.html"> 351 + <span class="dropdown-text">load_dataset API</span></a> 352 + </li> 353 + <li> 354 + <a class="dropdown-item" href="../reference/protocols.html"> 355 + <span class="dropdown-text">Protocols</span></a> 356 + </li> 357 + <li> 358 + <a class="dropdown-item" href="../reference/uri-spec.html"> 359 + <span class="dropdown-text">URI Specification</span></a> 360 + </li> 361 + <li> 362 + <a class="dropdown-item" href="../reference/troubleshooting.html"> 363 + <span class="dropdown-text">Troubleshooting &amp; FAQ</span></a> 364 + </li> 365 + <li> 366 + <a class="dropdown-item" href="../reference/deployment.html"> 367 + <span class="dropdown-text">Deployment Guide</span></a> 368 + </li> 369 + </ul> 370 + </li> 371 + <li class="nav-item"> 372 + <a class="nav-link" href="../api/index.html"> 373 + <span class="menu-text">API</span></a> 374 + </li> 375 + </ul> 376 + <ul class="navbar-nav navbar-nav-scroll ms-auto"> 377 + <li class="nav-item compact"> 378 + <a class="nav-link" href="https://github.com/your-org/atdata"> <i class="bi bi-github" role="img"> 379 + </i> 380 + <span class="menu-text"></span></a> 381 + </li> 382 + </ul> 383 + </div> <!-- 
/navcollapse --> 384 + <div class="quarto-navbar-tools"> 385 + <a href="" class="quarto-color-scheme-toggle quarto-navigation-tool px-1" onclick="window.quartoToggleColorScheme(); return false;" title="Toggle dark mode"><i class="bi"></i></a> 386 + </div> 387 + </div> <!-- /container-fluid --> 388 + </nav> 389 + </header> 390 + <!-- content --> 391 + <div id="quarto-content" class="quarto-container page-columns page-rows-contents page-layout-article page-navbar"> 392 + <!-- sidebar --> 393 + <!-- margin-sidebar --> 394 + <div id="quarto-margin-sidebar" class="sidebar margin-sidebar"> 395 + <nav id="TOC" role="doc-toc" class="toc-active"> 396 + <h2 id="toc-title">On this page</h2> 397 + 398 + <ul> 399 + <li><a href="#atdata.atmosphere.PDSBlobStore" id="toc-atdata.atmosphere.PDSBlobStore" class="nav-link active" data-scroll-target="#atdata.atmosphere.PDSBlobStore">PDSBlobStore</a> 400 + <ul class="collapse"> 401 + <li><a href="#attributes" id="toc-attributes" class="nav-link" data-scroll-target="#attributes">Attributes</a></li> 402 + <li><a href="#example" id="toc-example" class="nav-link" data-scroll-target="#example">Example</a></li> 403 + <li><a href="#methods" id="toc-methods" class="nav-link" data-scroll-target="#methods">Methods</a> 404 + <ul class="collapse"> 405 + <li><a href="#atdata.atmosphere.PDSBlobStore.create_source" id="toc-atdata.atmosphere.PDSBlobStore.create_source" class="nav-link" data-scroll-target="#atdata.atmosphere.PDSBlobStore.create_source">create_source</a></li> 406 + <li><a href="#atdata.atmosphere.PDSBlobStore.read_url" id="toc-atdata.atmosphere.PDSBlobStore.read_url" class="nav-link" data-scroll-target="#atdata.atmosphere.PDSBlobStore.read_url">read_url</a></li> 407 + <li><a href="#atdata.atmosphere.PDSBlobStore.supports_streaming" id="toc-atdata.atmosphere.PDSBlobStore.supports_streaming" class="nav-link" data-scroll-target="#atdata.atmosphere.PDSBlobStore.supports_streaming">supports_streaming</a></li> 408 + <li><a 
href="#atdata.atmosphere.PDSBlobStore.write_shards" id="toc-atdata.atmosphere.PDSBlobStore.write_shards" class="nav-link" data-scroll-target="#atdata.atmosphere.PDSBlobStore.write_shards">write_shards</a></li> 409 + </ul></li> 410 + </ul></li> 411 + </ul> 412 + <div class="toc-actions"><ul><li><a href="https://github.com/your-org/atdata/edit/main/api/PDSBlobStore.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li><li><a href="https://github.com/your-org/atdata/issues/new" class="toc-action"><i class="bi empty"></i>Report an issue</a></li></ul></div></nav> 413 + </div> 414 + <!-- main --> 415 + <main class="content" id="quarto-document-content"><header id="title-block-header" class="quarto-title-block"></header> 416 + 417 + 418 + 419 + 420 + 421 + <section id="atdata.atmosphere.PDSBlobStore" class="level1"> 422 + <h1>PDSBlobStore</h1> 423 + <div class="sourceCode" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a>atmosphere.PDSBlobStore(client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 424 + <p>PDS blob store implementing AbstractDataStore protocol.</p> 425 + <p>Stores dataset shards as ATProto blobs, enabling decentralized dataset storage on the AT Protocol network.</p> 426 + <p>Each shard is written to a temporary tar file, then uploaded as a blob to the user’s PDS. 
The returned URLs are AT URIs that can be resolved to HTTP URLs for streaming.</p> 427 + <section id="attributes" class="level2 doc-section doc-section-attributes"> 428 + <h2 class="doc-section doc-section-attributes anchored" data-anchor-id="attributes">Attributes</h2> 429 + <table class="caption-top table"> 430 + <colgroup> 431 + <col style="width: 11%"> 432 + <col style="width: 30%"> 433 + <col style="width: 58%"> 434 + </colgroup> 435 + <thead> 436 + <tr class="header"> 437 + <th>Name</th> 438 + <th>Type</th> 439 + <th>Description</th> 440 + </tr> 441 + </thead> 442 + <tbody> 443 + <tr class="odd"> 444 + <td>client</td> 445 + <td>'AtmosphereClient'</td> 446 + <td>Authenticated AtmosphereClient instance.</td> 447 + </tr> 448 + </tbody> 449 + </table> 450 + </section> 451 + <section id="example" class="level2 doc-section doc-section-example"> 452 + <h2 class="doc-section doc-section-example anchored" data-anchor-id="example">Example</h2> 453 + <p>::</p> 454 + <pre><code>&gt;&gt;&gt; store = PDSBlobStore(client) 455 + &gt;&gt;&gt; urls = store.write_shards(dataset, prefix="training/v1") 456 + &gt;&gt;&gt; # Returns AT URIs like: 457 + &gt;&gt;&gt; # ['at://did:plc:abc/blob/bafyrei...', ...]</code></pre> 458 + </section> 459 + <section id="methods" class="level2"> 460 + <h2 class="anchored" data-anchor-id="methods">Methods</h2> 461 + <table class="caption-top table"> 462 + <thead> 463 + <tr class="header"> 464 + <th>Name</th> 465 + <th>Description</th> 466 + </tr> 467 + </thead> 468 + <tbody> 469 + <tr class="odd"> 470 + <td><a href="#atdata.atmosphere.PDSBlobStore.create_source">create_source</a></td> 471 + <td>Create a BlobSource for reading these AT URIs.</td> 472 + </tr> 473 + <tr class="even"> 474 + <td><a href="#atdata.atmosphere.PDSBlobStore.read_url">read_url</a></td> 475 + <td>Resolve an AT URI blob reference to an HTTP URL.</td> 476 + </tr> 477 + <tr class="odd"> 478 + <td><a 
href="#atdata.atmosphere.PDSBlobStore.supports_streaming">supports_streaming</a></td> 479 + <td>PDS blobs support streaming via HTTP.</td> 480 + </tr> 481 + <tr class="even"> 482 + <td><a href="#atdata.atmosphere.PDSBlobStore.write_shards">write_shards</a></td> 483 + <td>Write dataset shards as PDS blobs.</td> 484 + </tr> 485 + </tbody> 486 + </table> 487 + <section id="atdata.atmosphere.PDSBlobStore.create_source" class="level3"> 488 + <h3 class="anchored" data-anchor-id="atdata.atmosphere.PDSBlobStore.create_source">create_source</h3> 489 + <div class="sourceCode" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>atmosphere.PDSBlobStore.create_source(urls)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 490 + <p>Create a BlobSource for reading these AT URIs.</p> 491 + <p>This is a convenience method for creating a DataSource that can stream the blobs written by this store.</p> 492 + <section id="parameters" class="level4 doc-section doc-section-parameters"> 493 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters">Parameters</h4> 494 + <table class="caption-top table"> 495 + <thead> 496 + <tr class="header"> 497 + <th>Name</th> 498 + <th>Type</th> 499 + <th>Description</th> 500 + <th>Default</th> 501 + </tr> 502 + </thead> 503 + <tbody> 504 + <tr class="odd"> 505 + <td>urls</td> 506 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 507 + <td>List of AT URIs from write_shards().</td> 508 + <td><em>required</em></td> 509 + </tr> 510 + </tbody> 511 + </table> 512 + </section> 513 + <section id="returns" class="level4 doc-section doc-section-returns"> 514 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns">Returns</h4> 515 + <table class="caption-top table"> 516 + <thead> 517 + <tr class="header"> 518 + <th>Name</th> 519 + 
<th>Type</th> 520 + <th>Description</th> 521 + </tr> 522 + </thead> 523 + <tbody> 524 + <tr class="odd"> 525 + <td></td> 526 + <td>'BlobSource'</td> 527 + <td>BlobSource configured for the given URLs.</td> 528 + </tr> 529 + </tbody> 530 + </table> 531 + </section> 532 + <section id="raises" class="level4 doc-section doc-section-raises"> 533 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises">Raises</h4> 534 + <table class="caption-top table"> 535 + <thead> 536 + <tr class="header"> 537 + <th>Name</th> 538 + <th>Type</th> 539 + <th>Description</th> 540 + </tr> 541 + </thead> 542 + <tbody> 543 + <tr class="odd"> 544 + <td></td> 545 + <td><a href="`ValueError`">ValueError</a></td> 546 + <td>If URLs are not valid AT URIs.</td> 547 + </tr> 548 + </tbody> 549 + </table> 550 + </section> 551 + </section> 552 + <section id="atdata.atmosphere.PDSBlobStore.read_url" class="level3"> 553 + <h3 class="anchored" data-anchor-id="atdata.atmosphere.PDSBlobStore.read_url">read_url</h3> 554 + <div class="sourceCode" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>atmosphere.PDSBlobStore.read_url(url)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 555 + <p>Resolve an AT URI blob reference to an HTTP URL.</p> 556 + <p>Transforms <code>at://did/blob/cid</code> URIs to HTTP URLs that can be streamed by WebDataset.</p> 557 + <section id="parameters-1" class="level4 doc-section doc-section-parameters"> 558 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-1">Parameters</h4> 559 + <table class="caption-top table"> 560 + <thead> 561 + <tr class="header"> 562 + <th>Name</th> 563 + <th>Type</th> 564 + <th>Description</th> 565 + <th>Default</th> 566 + </tr> 567 + </thead> 568 + <tbody> 569 + <tr class="odd"> 570 + <td>url</td> 571 + <td><a 
href="`str`">str</a></td> 572 + <td>AT URI in format <code>at://{did}/blob/{cid}</code>.</td> 573 + <td><em>required</em></td> 574 + </tr> 575 + </tbody> 576 + </table> 577 + </section> 578 + <section id="returns-1" class="level4 doc-section doc-section-returns"> 579 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-1">Returns</h4> 580 + <table class="caption-top table"> 581 + <thead> 582 + <tr class="header"> 583 + <th>Name</th> 584 + <th>Type</th> 585 + <th>Description</th> 586 + </tr> 587 + </thead> 588 + <tbody> 589 + <tr class="odd"> 590 + <td></td> 591 + <td><a href="`str`">str</a></td> 592 + <td>HTTP URL for fetching the blob via PDS API.</td> 593 + </tr> 594 + </tbody> 595 + </table> 596 + </section> 597 + <section id="raises-1" class="level4 doc-section doc-section-raises"> 598 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-1">Raises</h4> 599 + <table class="caption-top table"> 600 + <thead> 601 + <tr class="header"> 602 + <th>Name</th> 603 + <th>Type</th> 604 + <th>Description</th> 605 + </tr> 606 + </thead> 607 + <tbody> 608 + <tr class="odd"> 609 + <td></td> 610 + <td><a href="`ValueError`">ValueError</a></td> 611 + <td>If URL format is invalid or PDS cannot be resolved.</td> 612 + </tr> 613 + </tbody> 614 + </table> 615 + </section> 616 + </section> 617 + <section id="atdata.atmosphere.PDSBlobStore.supports_streaming" class="level3"> 618 + <h3 class="anchored" data-anchor-id="atdata.atmosphere.PDSBlobStore.supports_streaming">supports_streaming</h3> 619 + <div class="sourceCode" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>atmosphere.PDSBlobStore.supports_streaming()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 620 + <p>PDS blobs support streaming via HTTP.</p> 621 + <section id="returns-2" class="level4 doc-section 
doc-section-returns"> 622 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-2">Returns</h4> 623 + <table class="caption-top table"> 624 + <thead> 625 + <tr class="header"> 626 + <th>Name</th> 627 + <th>Type</th> 628 + <th>Description</th> 629 + </tr> 630 + </thead> 631 + <tbody> 632 + <tr class="odd"> 633 + <td></td> 634 + <td><a href="`bool`">bool</a></td> 635 + <td>True.</td> 636 + </tr> 637 + </tbody> 638 + </table> 639 + </section> 640 + </section> 641 + <section id="atdata.atmosphere.PDSBlobStore.write_shards" class="level3"> 642 + <h3 class="anchored" data-anchor-id="atdata.atmosphere.PDSBlobStore.write_shards">write_shards</h3> 643 + <div class="sourceCode" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>atmosphere.PDSBlobStore.write_shards(</span> 644 + <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> ds,</span> 645 + <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> <span class="op">*</span>,</span> 646 + <span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> prefix,</span> 647 + <span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> maxcount<span class="op">=</span><span class="dv">10000</span>,</span> 648 + <span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> maxsize<span class="op">=</span><span class="fl">3000000000.0</span>,</span> 649 + <span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> <span class="op">**</span>kwargs,</span> 650 + <span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 651 + <p>Write dataset shards as PDS blobs.</p> 652 + <p>Creates tar archives from the dataset and uploads each as a blob to the authenticated user’s PDS.</p> 653 + <section 
id="parameters-2" class="level4 doc-section doc-section-parameters"> 654 + <h4 class="doc-section doc-section-parameters anchored" data-anchor-id="parameters-2">Parameters</h4> 655 + <table class="caption-top table"> 656 + <thead> 657 + <tr class="header"> 658 + <th>Name</th> 659 + <th>Type</th> 660 + <th>Description</th> 661 + <th>Default</th> 662 + </tr> 663 + </thead> 664 + <tbody> 665 + <tr class="odd"> 666 + <td>ds</td> 667 + <td>'Dataset'</td> 668 + <td>The Dataset to write.</td> 669 + <td><em>required</em></td> 670 + </tr> 671 + <tr class="even"> 672 + <td>prefix</td> 673 + <td><a href="`str`">str</a></td> 674 + <td>Logical path prefix for naming (used in shard names only).</td> 675 + <td><em>required</em></td> 676 + </tr> 677 + <tr class="odd"> 678 + <td>maxcount</td> 679 + <td><a href="`int`">int</a></td> 680 + <td>Maximum samples per shard (default: 10000).</td> 681 + <td><code>10000</code></td> 682 + </tr> 683 + <tr class="even"> 684 + <td>maxsize</td> 685 + <td><a href="`float`">float</a></td> 686 + <td>Maximum shard size in bytes (default: 3 GB; PDS blob size limits vary, see the Note below).</td> 687 + <td><code>3000000000.0</code></td> 688 + </tr> 689 + <tr class="odd"> 690 + <td>**kwargs</td> 691 + <td><a href="`typing.Any`">Any</a></td> 692 + <td>Additional keyword arguments passed to wds.ShardWriter.</td> 693 + <td><code>{}</code></td> 694 + </tr> 695 + </tbody> 696 + </table> 697 + </section> 698 + <section id="returns-3" class="level4 doc-section doc-section-returns"> 699 + <h4 class="doc-section doc-section-returns anchored" data-anchor-id="returns-3">Returns</h4> 700 + <table class="caption-top table"> 701 + <thead> 702 + <tr class="header"> 703 + <th>Name</th> 704 + <th>Type</th> 705 + <th>Description</th> 706 + </tr> 707 + </thead> 708 + <tbody> 709 + <tr class="odd"> 710 + <td></td> 711 + <td><a href="`list`">list</a>[<a href="`str`">str</a>]</td> 712 + <td>List of AT URIs for the written blobs, in format:</td> 713 + </tr> 714 + <tr class="even"> 715 + <td></td> 716 + <td><a 
href="`list`">list</a>[<a href="`str`">str</a>]</td> 717 + <td><code>at://{did}/blob/{cid}</code></td> 718 + </tr> 719 + </tbody> 720 + </table> 721 + </section> 722 + <section id="raises-2" class="level4 doc-section doc-section-raises"> 723 + <h4 class="doc-section doc-section-raises anchored" data-anchor-id="raises-2">Raises</h4> 724 + <table class="caption-top table"> 725 + <thead> 726 + <tr class="header"> 727 + <th>Name</th> 728 + <th>Type</th> 729 + <th>Description</th> 730 + </tr> 731 + </thead> 732 + <tbody> 733 + <tr class="odd"> 734 + <td></td> 735 + <td><a href="`ValueError`">ValueError</a></td> 736 + <td>If not authenticated.</td> 737 + </tr> 738 + <tr class="even"> 739 + <td></td> 740 + <td><a href="`RuntimeError`">RuntimeError</a></td> 741 + <td>If no shards were written.</td> 742 + </tr> 743 + </tbody> 744 + </table> 745 + </section> 746 + <section id="note" class="level4 doc-section doc-section-note"> 747 + <h4 class="doc-section doc-section-note anchored" data-anchor-id="note">Note</h4> 748 + <p>PDS blobs have size limits (typically 50MB-5GB depending on PDS). 
Adjust maxcount/maxsize to stay within limits.</p> 749 + 750 + 751 + </section> 752 + </section> 753 + </section> 754 + </section> 755 + 756 + </main> <!-- /main --> 757 + <script id="quarto-html-after-body" type="application/javascript"> 758 + window.document.addEventListener("DOMContentLoaded", function (event) { 759 + // Ensure there is a toggle, if there isn't float one in the top right 760 + if (window.document.querySelector('.quarto-color-scheme-toggle') === null) { 761 + const a = window.document.createElement('a'); 762 + a.classList.add('top-right'); 763 + a.classList.add('quarto-color-scheme-toggle'); 764 + a.href = ""; 765 + a.onclick = function() { try { window.quartoToggleColorScheme(); } catch {} return false; }; 766 + const i = window.document.createElement("i"); 767 + i.classList.add('bi'); 768 + a.appendChild(i); 769 + window.document.body.appendChild(a); 770 + } 771 + setColorSchemeToggle(hasAlternateSentinel()) 772 + const icon = ""; 773 + const anchorJS = new window.AnchorJS(); 774 + anchorJS.options = { 775 + placement: 'right', 776 + icon: icon 777 + }; 778 + anchorJS.add('.anchored'); 779 + const isCodeAnnotation = (el) => { 780 + for (const clz of el.classList) { 781 + if (clz.startsWith('code-annotation-')) { 782 + return true; 783 + } 784 + } 785 + return false; 786 + } 787 + const onCopySuccess = function(e) { 788 + // button target 789 + const button = e.trigger; 790 + // don't keep focus 791 + button.blur(); 792 + // flash "checked" 793 + button.classList.add('code-copy-button-checked'); 794 + var currentTitle = button.getAttribute("title"); 795 + button.setAttribute("title", "Copied!"); 796 + let tooltip; 797 + if (window.bootstrap) { 798 + button.setAttribute("data-bs-toggle", "tooltip"); 799 + button.setAttribute("data-bs-placement", "left"); 800 + button.setAttribute("data-bs-title", "Copied!"); 801 + tooltip = new bootstrap.Tooltip(button, 802 + { trigger: "manual", 803 + customClass: "code-copy-button-tooltip", 804 + offset: [0, 
-8]}); 805 + tooltip.show(); 806 + } 807 + setTimeout(function() { 808 + if (tooltip) { 809 + tooltip.hide(); 810 + button.removeAttribute("data-bs-title"); 811 + button.removeAttribute("data-bs-toggle"); 812 + button.removeAttribute("data-bs-placement"); 813 + } 814 + button.setAttribute("title", currentTitle); 815 + button.classList.remove('code-copy-button-checked'); 816 + }, 1000); 817 + // clear code selection 818 + e.clearSelection(); 819 + } 820 + const getTextToCopy = function(trigger) { 821 + const codeEl = trigger.previousElementSibling.cloneNode(true); 822 + for (const childEl of codeEl.children) { 823 + if (isCodeAnnotation(childEl)) { 824 + childEl.remove(); 825 + } 826 + } 827 + return codeEl.innerText; 828 + } 829 + const clipboard = new window.ClipboardJS('.code-copy-button:not([data-in-quarto-modal])', { 830 + text: getTextToCopy 831 + }); 832 + clipboard.on('success', onCopySuccess); 833 + if (window.document.getElementById('quarto-embedded-source-code-modal')) { 834 + const clipboardModal = new window.ClipboardJS('.code-copy-button[data-in-quarto-modal]', { 835 + text: getTextToCopy, 836 + container: window.document.getElementById('quarto-embedded-source-code-modal') 837 + }); 838 + clipboardModal.on('success', onCopySuccess); 839 + } 840 + var localhostRegex = new RegExp(/^(?:http|https):\/\/localhost\:?[0-9]*\//); 841 + var mailtoRegex = new RegExp(/^mailto:/); 842 + var filterRegex = new RegExp("https:\/\/github\.com\/your-org\/atdata"); 843 + var isInternal = (href) => { 844 + return filterRegex.test(href) || localhostRegex.test(href) || mailtoRegex.test(href); 845 + } 846 + // Inspect non-navigation links and adorn them if external 847 + var links = window.document.querySelectorAll('a[href]:not(.nav-link):not(.navbar-brand):not(.toc-action):not(.sidebar-link):not(.sidebar-item-toggle):not(.pagination-link):not(.no-external):not([aria-hidden]):not(.dropdown-item):not(.quarto-navigation-tool):not(.about-link)'); 848 + for (var i=0; 
i<links.length; i++) { 849 + const link = links[i]; 850 + if (!isInternal(link.href)) { 851 + // undo the damage that might have been done by quarto-nav.js in the case of 852 + // links that we want to consider external 853 + if (link.dataset.originalHref !== undefined) { 854 + link.href = link.dataset.originalHref; 855 + } 856 + } 857 + } 858 + function tippyHover(el, contentFn, onTriggerFn, onUntriggerFn) { 859 + const config = { 860 + allowHTML: true, 861 + maxWidth: 500, 862 + delay: 100, 863 + arrow: false, 864 + appendTo: function(el) { 865 + return el.parentElement; 866 + }, 867 + interactive: true, 868 + interactiveBorder: 10, 869 + theme: 'quarto', 870 + placement: 'bottom-start', 871 + }; 872 + if (contentFn) { 873 + config.content = contentFn; 874 + } 875 + if (onTriggerFn) { 876 + config.onTrigger = onTriggerFn; 877 + } 878 + if (onUntriggerFn) { 879 + config.onUntrigger = onUntriggerFn; 880 + } 881 + window.tippy(el, config); 882 + } 883 + const noterefs = window.document.querySelectorAll('a[role="doc-noteref"]'); 884 + for (var i=0; i<noterefs.length; i++) { 885 + const ref = noterefs[i]; 886 + tippyHover(ref, function() { 887 + // use id or data attribute instead here 888 + let href = ref.getAttribute('data-footnote-href') || ref.getAttribute('href'); 889 + try { href = new URL(href).hash; } catch {} 890 + const id = href.replace(/^#\/?/, ""); 891 + const note = window.document.getElementById(id); 892 + if (note) { 893 + return note.innerHTML; 894 + } else { 895 + return ""; 896 + } 897 + }); 898 + } 899 + const xrefs = window.document.querySelectorAll('a.quarto-xref'); 900 + const processXRef = (id, note) => { 901 + // Strip column container classes 902 + const stripColumnClz = (el) => { 903 + el.classList.remove("page-full", "page-columns"); 904 + if (el.children) { 905 + for (const child of el.children) { 906 + stripColumnClz(child); 907 + } 908 + } 909 + } 910 + stripColumnClz(note) 911 + if (id === null || id.startsWith('sec-')) { 912 + // 
Special case sections, only their first couple elements 913 + const container = document.createElement("div"); 914 + if (note.children && note.children.length > 2) { 915 + container.appendChild(note.children[0].cloneNode(true)); 916 + for (let i = 1; i < note.children.length; i++) { 917 + const child = note.children[i]; 918 + if (child.tagName === "P" && child.innerText === "") { 919 + continue; 920 + } else { 921 + container.appendChild(child.cloneNode(true)); 922 + break; 923 + } 924 + } 925 + if (window.Quarto?.typesetMath) { 926 + window.Quarto.typesetMath(container); 927 + } 928 + return container.innerHTML 929 + } else { 930 + if (window.Quarto?.typesetMath) { 931 + window.Quarto.typesetMath(note); 932 + } 933 + return note.innerHTML; 934 + } 935 + } else { 936 + // Remove any anchor links if they are present 937 + const anchorLink = note.querySelector('a.anchorjs-link'); 938 + if (anchorLink) { 939 + anchorLink.remove(); 940 + } 941 + if (window.Quarto?.typesetMath) { 942 + window.Quarto.typesetMath(note); 943 + } 944 + if (note.classList.contains("callout")) { 945 + return note.outerHTML; 946 + } else { 947 + return note.innerHTML; 948 + } 949 + } 950 + } 951 + for (var i=0; i<xrefs.length; i++) { 952 + const xref = xrefs[i]; 953 + tippyHover(xref, undefined, function(instance) { 954 + instance.disable(); 955 + let url = xref.getAttribute('href'); 956 + let hash = undefined; 957 + if (url.startsWith('#')) { 958 + hash = url; 959 + } else { 960 + try { hash = new URL(url).hash; } catch {} 961 + } 962 + if (hash) { 963 + const id = hash.replace(/^#\/?/, ""); 964 + const note = window.document.getElementById(id); 965 + if (note !== null) { 966 + try { 967 + const html = processXRef(id, note.cloneNode(true)); 968 + instance.setContent(html); 969 + } finally { 970 + instance.enable(); 971 + instance.show(); 972 + } 973 + } else { 974 + // See if we can fetch this 975 + fetch(url.split('#')[0]) 976 + .then(res => res.text()) 977 + .then(html => { 978 + const 
parser = new DOMParser(); 979 + const htmlDoc = parser.parseFromString(html, "text/html"); 980 + const note = htmlDoc.getElementById(id); 981 + if (note !== null) { 982 + const html = processXRef(id, note); 983 + instance.setContent(html); 984 + } 985 + }).finally(() => { 986 + instance.enable(); 987 + instance.show(); 988 + }); 989 + } 990 + } else { 991 + // See if we can fetch a full url (with no hash to target) 992 + // This is a special case and we should probably do some content thinning / targeting 993 + fetch(url) 994 + .then(res => res.text()) 995 + .then(html => { 996 + const parser = new DOMParser(); 997 + const htmlDoc = parser.parseFromString(html, "text/html"); 998 + const note = htmlDoc.querySelector('main.content'); 999 + if (note !== null) { 1000 + // This should only happen for chapter cross references 1001 + // (since there is no id in the URL) 1002 + // remove the first header 1003 + if (note.children.length > 0 && note.children[0].tagName === "HEADER") { 1004 + note.children[0].remove(); 1005 + } 1006 + const html = processXRef(null, note); 1007 + instance.setContent(html); 1008 + } 1009 + }).finally(() => { 1010 + instance.enable(); 1011 + instance.show(); 1012 + }); 1013 + } 1014 + }, function(instance) { 1015 + }); 1016 + } 1017 + let selectedAnnoteEl; 1018 + const selectorForAnnotation = ( cell, annotation) => { 1019 + let cellAttr = 'data-code-cell="' + cell + '"'; 1020 + let lineAttr = 'data-code-annotation="' + annotation + '"'; 1021 + const selector = 'span[' + cellAttr + '][' + lineAttr + ']'; 1022 + return selector; 1023 + } 1024 + const selectCodeLines = (annoteEl) => { 1025 + const doc = window.document; 1026 + const targetCell = annoteEl.getAttribute("data-target-cell"); 1027 + const targetAnnotation = annoteEl.getAttribute("data-target-annotation"); 1028 + const annoteSpan = window.document.querySelector(selectorForAnnotation(targetCell, targetAnnotation)); 1029 + const lines = 
annoteSpan.getAttribute("data-code-lines").split(","); 1030 + const lineIds = lines.map((line) => { 1031 + return targetCell + "-" + line; 1032 + }) 1033 + let top = null; 1034 + let height = null; 1035 + let parent = null; 1036 + if (lineIds.length > 0) { 1037 + //compute the position of the single el (top and bottom and make a div) 1038 + const el = window.document.getElementById(lineIds[0]); 1039 + top = el.offsetTop; 1040 + height = el.offsetHeight; 1041 + parent = el.parentElement.parentElement; 1042 + if (lineIds.length > 1) { 1043 + const lastEl = window.document.getElementById(lineIds[lineIds.length - 1]); 1044 + const bottom = lastEl.offsetTop + lastEl.offsetHeight; 1045 + height = bottom - top; 1046 + } 1047 + if (top !== null && height !== null && parent !== null) { 1048 + // cook up a div (if necessary) and position it 1049 + let div = window.document.getElementById("code-annotation-line-highlight"); 1050 + if (div === null) { 1051 + div = window.document.createElement("div"); 1052 + div.setAttribute("id", "code-annotation-line-highlight"); 1053 + div.style.position = 'absolute'; 1054 + parent.appendChild(div); 1055 + } 1056 + div.style.top = top - 2 + "px"; 1057 + div.style.height = height + 4 + "px"; 1058 + div.style.left = 0; 1059 + let gutterDiv = window.document.getElementById("code-annotation-line-highlight-gutter"); 1060 + if (gutterDiv === null) { 1061 + gutterDiv = window.document.createElement("div"); 1062 + gutterDiv.setAttribute("id", "code-annotation-line-highlight-gutter"); 1063 + gutterDiv.style.position = 'absolute'; 1064 + const codeCell = window.document.getElementById(targetCell); 1065 + const gutter = codeCell.querySelector('.code-annotation-gutter'); 1066 + gutter.appendChild(gutterDiv); 1067 + } 1068 + gutterDiv.style.top = top - 2 + "px"; 1069 + gutterDiv.style.height = height + 4 + "px"; 1070 + } 1071 + selectedAnnoteEl = annoteEl; 1072 + } 1073 + }; 1074 + const unselectCodeLines = () => { 1075 + const elementsIds = 
["code-annotation-line-highlight", "code-annotation-line-highlight-gutter"]; 1076 + elementsIds.forEach((elId) => { 1077 + const div = window.document.getElementById(elId); 1078 + if (div) { 1079 + div.remove(); 1080 + } 1081 + }); 1082 + selectedAnnoteEl = undefined; 1083 + }; 1084 + // Handle positioning of the toggle 1085 + window.addEventListener( 1086 + "resize", 1087 + throttle(() => { 1088 + elRect = undefined; 1089 + if (selectedAnnoteEl) { 1090 + selectCodeLines(selectedAnnoteEl); 1091 + } 1092 + }, 10) 1093 + ); 1094 + function throttle(fn, ms) { 1095 + let throttle = false; 1096 + let timer; 1097 + return (...args) => { 1098 + if(!throttle) { // first call gets through 1099 + fn.apply(this, args); 1100 + throttle = true; 1101 + } else { // all the others get throttled 1102 + if(timer) clearTimeout(timer); // cancel #2 1103 + timer = setTimeout(() => { 1104 + fn.apply(this, args); 1105 + timer = throttle = false; 1106 + }, ms); 1107 + } 1108 + }; 1109 + } 1110 + // Attach click handler to the DT 1111 + const annoteDls = window.document.querySelectorAll('dt[data-target-cell]'); 1112 + for (const annoteDlNode of annoteDls) { 1113 + annoteDlNode.addEventListener('click', (event) => { 1114 + const clickedEl = event.target; 1115 + if (clickedEl !== selectedAnnoteEl) { 1116 + unselectCodeLines(); 1117 + const activeEl = window.document.querySelector('dt[data-target-cell].code-annotation-active'); 1118 + if (activeEl) { 1119 + activeEl.classList.remove('code-annotation-active'); 1120 + } 1121 + selectCodeLines(clickedEl); 1122 + clickedEl.classList.add('code-annotation-active'); 1123 + } else { 1124 + // Unselect the line 1125 + unselectCodeLines(); 1126 + clickedEl.classList.remove('code-annotation-active'); 1127 + } 1128 + }); 1129 + } 1130 + const findCites = (el) => { 1131 + const parentEl = el.parentElement; 1132 + if (parentEl) { 1133 + const cites = parentEl.dataset.cites; 1134 + if (cites) { 1135 + return { 1136 + el, 1137 + cites: cites.split(' ') 1138 
+ }; 1139 + } else { 1140 + return findCites(el.parentElement) 1141 + } 1142 + } else { 1143 + return undefined; 1144 + } 1145 + }; 1146 + var bibliorefs = window.document.querySelectorAll('a[role="doc-biblioref"]'); 1147 + for (var i=0; i<bibliorefs.length; i++) { 1148 + const ref = bibliorefs[i]; 1149 + const citeInfo = findCites(ref); 1150 + if (citeInfo) { 1151 + tippyHover(citeInfo.el, function() { 1152 + var popup = window.document.createElement('div'); 1153 + citeInfo.cites.forEach(function(cite) { 1154 + var citeDiv = window.document.createElement('div'); 1155 + citeDiv.classList.add('hanging-indent'); 1156 + citeDiv.classList.add('csl-entry'); 1157 + var biblioDiv = window.document.getElementById('ref-' + cite); 1158 + if (biblioDiv) { 1159 + citeDiv.innerHTML = biblioDiv.innerHTML; 1160 + } 1161 + popup.appendChild(citeDiv); 1162 + }); 1163 + return popup.innerHTML; 1164 + }); 1165 + } 1166 + } 1167 + }); 1168 + </script> 1169 + </div> <!-- /content --> 1170 + <footer class="footer"> 1171 + <div class="nav-footer"> 1172 + <div class="nav-footer-left"> 1173 + <p>Built with <a href="https://quarto.org/">Quarto</a></p> 1174 + </div> 1175 + <div class="nav-footer-center"> 1176 + &nbsp; 1177 + <div class="toc-actions d-sm-block d-md-none"><ul><li><a href="https://github.com/your-org/atdata/edit/main/api/PDSBlobStore.qmd" class="toc-action"><i class="bi bi-github"></i>Edit this page</a></li><li><a href="https://github.com/your-org/atdata/issues/new" class="toc-action"><i class="bi empty"></i>Report an issue</a></li></ul></div></div> 1178 + <div class="nav-footer-right"> 1179 + <p>MIT License</p> 1180 + </div> 1181 + </div> 1182 + </footer> 1183 + 1184 + 1185 + 1186 + 1187 + </body></html>
+14 -6
docs/api/index.html
···
467 467 <td><a href="../api/S3Source.html#atdata.S3Source">S3Source</a></td>
468 468 <td>Data source for S3-compatible storage with explicit credentials.</td>
469 469 </tr>
470 + <tr class="odd">
471 + <td><a href="../api/BlobSource.html#atdata.BlobSource">BlobSource</a></td>
472 + <td>Data source for ATProto PDS blob storage.</td>
473 + </tr>
470 474 </tbody>
471 475 </table>
472 476 </section>
···
508 512 <td>Entry wrapper for ATProto dataset records implementing IndexEntry protocol.</td>
509 513 </tr>
510 514 <tr class="even">
515 + <td><a href="../api/PDSBlobStore.html#atdata.atmosphere.PDSBlobStore">PDSBlobStore</a></td>
516 + <td>PDS blob store implementing AbstractDataStore protocol.</td>
517 + </tr>
518 + <tr class="odd">
511 519 <td><a href="../api/SchemaPublisher.html#atdata.atmosphere.SchemaPublisher">SchemaPublisher</a></td>
512 520 <td>Publishes PackableSample schemas to ATProto.</td>
513 521 </tr>
514 - <tr class="odd">
522 + <tr class="even">
515 523 <td><a href="../api/SchemaLoader.html#atdata.atmosphere.SchemaLoader">SchemaLoader</a></td>
516 524 <td>Loads PackableSample schemas from ATProto.</td>
517 525 </tr>
518 - <tr class="even">
526 + <tr class="odd">
519 527 <td><a href="../api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher">DatasetPublisher</a></td>
520 528 <td>Publishes dataset index records to ATProto.</td>
521 529 </tr>
522 - <tr class="odd">
530 + <tr class="even">
523 531 <td><a href="../api/DatasetLoader.html#atdata.atmosphere.DatasetLoader">DatasetLoader</a></td>
524 532 <td>Loads dataset records from ATProto.</td>
525 533 </tr>
526 - <tr class="even">
534 + <tr class="odd">
527 535 <td><a href="../api/LensPublisher.html#atdata.atmosphere.LensPublisher">LensPublisher</a></td>
528 536 <td>Publishes Lens transformation records to ATProto.</td>
529 537 </tr>
530 - <tr class="odd">
538 + <tr class="even">
531 539 <td><a href="../api/LensLoader.html#atdata.atmosphere.LensLoader">LensLoader</a></td>
532 540 <td>Loads lens records from ATProto.</td>
533 541 </tr>
534 - <tr class="even">
542 + <tr class="odd">
535 543 <td><a href="../api/AtUri.html#atdata.atmosphere.AtUri">AtUri</a></td>
536 544 <td>Parsed AT Protocol URI.</td>
537 545 </tr>
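The index entries in this file list BlobSource as a data source for ATProto PDS blob storage. As a rough illustration of what reading such a blob involves, the sketch below maps a blob AT URI of the shape used in this commit's reference examples (at://<did>/blob/<cid>) to a com.atproto.sync.getBlob URL. The helper name blob_uri_to_http_url and the explicit pds_endpoint argument are assumptions made for illustration only; this is not atdata's implementation of PDSBlobStore.read_url, which may resolve the PDS endpoint differently.

```python
from urllib.parse import urlencode


def blob_uri_to_http_url(at_uri: str, pds_endpoint: str) -> str:
    """Hypothetical helper: map at://<did>/blob/<cid> to a getBlob URL.

    Assumes the caller already knows the PDS endpoint; real code would
    normally resolve it from the DID document.
    """
    prefix = "at://"
    if not at_uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {at_uri!r}")

    # Expect exactly three path parts: the DID, the literal "blob", the CID.
    parts = at_uri[len(prefix):].split("/")
    if len(parts) != 3 or parts[1] != "blob" or not parts[0] or not parts[2]:
        raise ValueError(f"expected at://<did>/blob/<cid>, got {at_uri!r}")
    did, _, cid = parts

    # com.atproto.sync.getBlob takes the DID and CID as query parameters.
    query = urlencode({"did": did, "cid": cid})
    return f"{pds_endpoint.rstrip('/')}/xrpc/com.atproto.sync.getBlob?{query}"


if __name__ == "__main__":
    print(blob_uri_to_http_url(
        "at://did:plc:abc123/blob/bafyrei111",
        "https://pds.example.com",
    ))
```

Keeping the endpoint as an argument keeps the sketch free of network calls; the colons in the DID are percent-encoded by urlencode, which is a valid query-string form.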
+3 -3
docs/api/local.Index.html
···
1415 1415 <tbody>
1416 1416 <tr class="odd">
1417 1417 <td>sample_type</td>
1418 - <td><a href="`typing.Type`">Type</a>[<a href="`atdata._protocols.Packable`">Packable</a>]</td>
1419 - <td>The PackableSample subclass to publish.</td>
1418 + <td><a href="`type`">type</a></td>
1419 + <td>A Packable type (<span class="citation" data-cites="packable-decorated">@packable-decorated</span> or PackableSample subclass).</td>
1420 1420 <td><em>required</em></td>
1421 1421 </tr>
1422 1422 <tr class="even">
···
1472 1472 <tr class="even">
1473 1473 <td></td>
1474 1474 <td><a href="`TypeError`">TypeError</a></td>
1475 - <td>If a field type is not supported.</td>
1475 + <td>If sample_type doesn’t satisfy the Packable protocol, or if a field type is not supported.</td>
1476 1476 </tr>
1477 1477 </tbody>
1478 1478 </table>
+6 -6
docs/index.html
··· 619 619 <h2 class="anchored" data-anchor-id="quick-example">Quick Example</h2> 620 620 <section id="define-a-sample-type" class="level3"> 621 621 <h3 class="anchored" data-anchor-id="define-a-sample-type">Define a Sample Type</h3> 622 - <div id="5e6ae3e6" class="cell"> 622 + <div id="e139d8b7" class="cell"> 623 623 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 624 624 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 625 625 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 633 633 </section> 634 634 <section id="create-and-write-samples" class="level3"> 635 635 <h3 class="anchored" data-anchor-id="create-and-write-samples">Create and Write Samples</h3> 636 - <div id="0eba47de" class="cell"> 636 + <div id="080b7c8f" class="cell"> 637 637 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 638 638 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 639 639 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> ··· 652 652 </section> 653 653 <section id="load-and-iterate" class="level3"> 654 654 <h3 class="anchored" data-anchor-id="load-and-iterate">Load and Iterate</h3> 655 - <div id="4c9f9c26" class="cell"> 655 + <div id="960dad1e" class="cell"> 656 656 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode 
python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data-000000.tar"</span>)</span> 657 657 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 658 658 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Iterate with batching</span></span> ··· 665 665 </section> 666 666 <section id="huggingface-style-loading" class="level2"> 667 667 <h2 class="anchored" data-anchor-id="huggingface-style-loading">HuggingFace-Style Loading</h2> 668 - <div id="0e22b49e" class="cell"> 668 + <div id="f199e76c" class="cell"> 669 669 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Load from local path</span></span> 670 670 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> atdata.load_dataset(<span class="st">"path/to/data-{000000..000009}.tar"</span>, split<span class="op">=</span><span class="st">"train"</span>)</span> 671 671 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span> ··· 677 677 </section> 678 678 <section id="local-storage-with-redis-s3" class="level2"> 679 679 <h2 class="anchored" data-anchor-id="local-storage-with-redis-s3">Local Storage with Redis + S3</h2> 680 - <div id="5bff0047" class="cell"> 680 + <div id="4513b884" class="cell"> 681 681 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex, S3DataStore</span> 682 682 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span 
class="im">as</span> wds</span> 683 683 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span> ··· 701 701 </section> 702 702 <section id="publish-to-atproto-federation" class="level2"> 703 703 <h2 class="anchored" data-anchor-id="publish-to-atproto-federation">Publish to ATProto Federation</h2> 704 - <div id="afceac3d" class="cell"> 704 + <div id="8e79a745" class="cell"> 705 705 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient</span> 706 706 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.promote <span class="im">import</span> promote_to_atmosphere</span> 707 707 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span>
+398 -286
docs/reference/atmosphere.html
··· 539 539 <li><a href="#session-management" id="toc-session-management" class="nav-link" data-scroll-target="#session-management">Session Management</a></li> 540 540 <li><a href="#custom-pds" id="toc-custom-pds" class="nav-link" data-scroll-target="#custom-pds">Custom PDS</a></li> 541 541 </ul></li> 542 + <li><a href="#pdsblobstore" id="toc-pdsblobstore" class="nav-link" data-scroll-target="#pdsblobstore">PDSBlobStore</a> 543 + <ul class="collapse"> 544 + <li><a href="#size-limits" id="toc-size-limits" class="nav-link" data-scroll-target="#size-limits">Size Limits</a></li> 545 + </ul></li> 546 + <li><a href="#blobsource" id="toc-blobsource" class="nav-link" data-scroll-target="#blobsource">BlobSource</a></li> 542 547 <li><a href="#atmosphereindex" id="toc-atmosphereindex" class="nav-link" data-scroll-target="#atmosphereindex">AtmosphereIndex</a> 543 548 <ul class="collapse"> 544 549 <li><a href="#publishing-schemas" id="toc-publishing-schemas" class="nav-link" data-scroll-target="#publishing-schemas">Publishing Schemas</a></li> ··· 611 616 <section id="atmosphereclient" class="level2"> 612 617 <h2 class="anchored" data-anchor-id="atmosphereclient">AtmosphereClient</h2> 613 618 <p>The client handles authentication and record operations:</p> 614 - <div id="4c64e271" class="cell"> 619 + <div id="aaaea216" class="cell"> 615 620 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient</span> 616 621 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span> 617 622 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> ··· 638 643 <section id="session-management" class="level3"> 639 644 <h3 class="anchored" 
data-anchor-id="session-management">Session Management</h3> 640 645 <p>Save and restore sessions to avoid re-authentication:</p> 641 - <div id="d219859d" class="cell"> 646 + <div id="8f1cfb03" class="cell"> 642 647 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Export session for later</span></span> 643 648 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>session_string <span class="op">=</span> client.export_session()</span> 644 649 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span> ··· 650 655 <section id="custom-pds" class="level3"> 651 656 <h3 class="anchored" data-anchor-id="custom-pds">Custom PDS</h3> 652 657 <p>Connect to a custom PDS instead of bsky.social:</p> 653 - <div id="7d355909" class="cell"> 658 + <div id="458da4e2" class="cell"> 654 659 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient(base_url<span class="op">=</span><span class="st">"https://pds.example.com"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 655 660 </div> 656 661 </section> 657 662 </section> 658 - <section id="atmosphereindex" class="level2"> 659 - <h2 class="anchored" data-anchor-id="atmosphereindex">AtmosphereIndex</h2> 660 - <p>The unified interface for ATProto operations, implementing the AbstractIndex protocol:</p> 661 - <div id="c54e0fd7" class="cell"> 662 - <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span 
class="im">import</span> AtmosphereClient, AtmosphereIndex</span> 663 + <section id="pdsblobstore" class="level2"> 664 + <h2 class="anchored" data-anchor-id="pdsblobstore">PDSBlobStore</h2> 665 + <p>Store dataset shards as ATProto blobs for fully decentralized storage:</p> 666 + <div id="3223b941" class="cell"> 667 + <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient, PDSBlobStore</span> 663 668 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 664 669 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 665 670 <span id="cb5-4"><a href="#cb5-4" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"handle.bsky.social"</span>, <span class="st">"app-password"</span>)</span> 666 671 <span id="cb5-5"><a href="#cb5-5" aria-hidden="true" tabindex="-1"></a></span> 667 - <span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> AtmosphereIndex(client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 672 + <span id="cb5-6"><a href="#cb5-6" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> PDSBlobStore(client)</span> 673 + <span id="cb5-7"><a href="#cb5-7" aria-hidden="true" tabindex="-1"></a></span> 674 + <span id="cb5-8"><a href="#cb5-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Write shards as blobs</span></span> 675 + <span id="cb5-9"><a href="#cb5-9" aria-hidden="true" tabindex="-1"></a>urls <span class="op">=</span> store.write_shards(dataset, prefix<span class="op">=</span><span class="st">"my-data/v1"</span>)</span> 676 + <span id="cb5-10"><a href="#cb5-10" aria-hidden="true" 
tabindex="-1"></a><span class="co"># Returns: ['at://did:plc:.../blob/bafyrei...', ...]</span></span> 677 + <span id="cb5-11"><a href="#cb5-11" aria-hidden="true" tabindex="-1"></a></span> 678 + <span id="cb5-12"><a href="#cb5-12" aria-hidden="true" tabindex="-1"></a><span class="co"># Transform AT URIs to HTTP URLs for reading</span></span> 679 + <span id="cb5-13"><a href="#cb5-13" aria-hidden="true" tabindex="-1"></a>http_url <span class="op">=</span> store.read_url(urls[<span class="dv">0</span>])</span> 680 + <span id="cb5-14"><a href="#cb5-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Returns: 'https://pds.example.com/xrpc/com.atproto.sync.getBlob?...'</span></span> 681 + <span id="cb5-15"><a href="#cb5-15" aria-hidden="true" tabindex="-1"></a></span> 682 + <span id="cb5-16"><a href="#cb5-16" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a BlobSource for streaming</span></span> 683 + <span id="cb5-17"><a href="#cb5-17" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> store.create_source(urls)</span> 684 + <span id="cb5-18"><a href="#cb5-18" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> atdata.Dataset[MySample](source)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 685 + </div> 686 + <section id="size-limits" class="level3"> 687 + <h3 class="anchored" data-anchor-id="size-limits">Size Limits</h3> 688 + <p>PDS blobs typically have size limits (often 50MB-5GB depending on the PDS). 
Use <code>maxcount</code> and <code>maxsize</code> parameters to control shard sizes:</p> 689 + <div id="d0ac7ae7" class="cell"> 690 + <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>urls <span class="op">=</span> store.write_shards(</span> 691 + <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> dataset,</span> 692 + <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> prefix<span class="op">=</span><span class="st">"large-data/v1"</span>,</span> 693 + <span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a> maxcount<span class="op">=</span><span class="dv">5000</span>, <span class="co"># Max 5000 samples per shard</span></span> 694 + <span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a> maxsize<span class="op">=</span><span class="fl">50e6</span>, <span class="co"># Max 50MB per shard</span></span> 695 + <span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 696 + </div> 697 + </section> 698 + </section> 699 + <section id="blobsource" class="level2"> 700 + <h2 class="anchored" data-anchor-id="blobsource">BlobSource</h2> 701 + <p>Read datasets stored as PDS blobs:</p> 702 + <div id="1501abd5" class="cell"> 703 + <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata <span class="im">import</span> BlobSource</span> 704 + <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span> 705 + <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="co"># From blob references</span></span> 706 + <span id="cb7-4"><a 
href="#cb7-4" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> BlobSource.from_refs([</span> 707 + <span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> {<span class="st">"did"</span>: <span class="st">"did:plc:abc123"</span>, <span class="st">"cid"</span>: <span class="st">"bafyrei111"</span>},</span> 708 + <span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> {<span class="st">"did"</span>: <span class="st">"did:plc:abc123"</span>, <span class="st">"cid"</span>: <span class="st">"bafyrei222"</span>},</span> 709 + <span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a>])</span> 710 + <span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a></span> 711 + <span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Or from PDSBlobStore</span></span> 712 + <span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> store.create_source(urls)</span> 713 + <span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a></span> 714 + <span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a><span class="co"># Use with Dataset</span></span> 715 + <span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> atdata.Dataset[MySample](source)</span> 716 + <span id="cb7-14"><a href="#cb7-14" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> ds.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 717 + <span id="cb7-15"><a href="#cb7-15" aria-hidden="true" tabindex="-1"></a> process(batch)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 718 + </div> 719 + </section> 720 + <section id="atmosphereindex" class="level2"> 721 + <h2 class="anchored" data-anchor-id="atmosphereindex">AtmosphereIndex</h2> 722 + <p>The unified 
interface for ATProto operations, implementing the AbstractIndex protocol:</p> 723 + <div id="7a57a26c" class="cell"> 724 + <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient, AtmosphereIndex, PDSBlobStore</span> 725 + <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a></span> 726 + <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 727 + <span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"handle.bsky.social"</span>, <span class="st">"app-password"</span>)</span> 728 + <span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a></span> 729 + <span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Without blob storage (use external URLs)</span></span> 730 + <span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> AtmosphereIndex(client)</span> 731 + <span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a></span> 732 + <span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="co"># With PDS blob storage (recommended for full decentralization)</span></span> 733 + <span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> PDSBlobStore(client)</span> 734 + <span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> AtmosphereIndex(client, data_store<span class="op">=</span>store)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 668 735 </div> 669 736 <section id="publishing-schemas" class="level3"> 670 737 <h3 
class="anchored" data-anchor-id="publishing-schemas">Publishing Schemas</h3> 671 - <div id="03e3bdc7" class="cell"> 672 - <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 673 - <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 674 - <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span> 675 - <span id="cb6-4"><a href="#cb6-4" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 676 - <span id="cb6-5"><a href="#cb6-5" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ImageSample:</span> 677 - <span id="cb6-6"><a href="#cb6-6" aria-hidden="true" tabindex="-1"></a> image: NDArray</span> 678 - <span id="cb6-7"><a href="#cb6-7" aria-hidden="true" tabindex="-1"></a> label: <span class="bu">str</span></span> 679 - <span id="cb6-8"><a href="#cb6-8" aria-hidden="true" tabindex="-1"></a> confidence: <span class="bu">float</span></span> 680 - <span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a></span> 681 - <span id="cb6-10"><a href="#cb6-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish schema</span></span> 682 - <span id="cb6-11"><a href="#cb6-11" aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> index.publish_schema(</span> 683 - <span id="cb6-12"><a href="#cb6-12" aria-hidden="true" tabindex="-1"></a> ImageSample,</span> 684 - <span id="cb6-13"><a href="#cb6-13" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">"1.0.0"</span>,</span> 685 - <span id="cb6-14"><a href="#cb6-14" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Image classification sample"</span>,</span> 686 
- <span id="cb6-15"><a href="#cb6-15" aria-hidden="true" tabindex="-1"></a>)</span> 687 - <span id="cb6-16"><a href="#cb6-16" aria-hidden="true" tabindex="-1"></a><span class="co"># Returns: "at://did:plc:.../ac.foundation.dataset.sampleSchema/..."</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 738 + <div id="87415c7d" class="cell"> 739 + <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 740 + <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 741 + <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a></span> 742 + <span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 743 + <span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ImageSample:</span> 744 + <span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a> image: NDArray</span> 745 + <span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a> label: <span class="bu">str</span></span> 746 + <span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a> confidence: <span class="bu">float</span></span> 747 + <span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a></span> 748 + <span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish schema</span></span> 749 + <span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> index.publish_schema(</span> 750 + <span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a> ImageSample,</span> 751 + <span id="cb9-13"><a href="#cb9-13" 
aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">"1.0.0"</span>,</span> 752 + <span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Image classification sample"</span>,</span> 753 + <span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a>)</span> 754 + <span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a><span class="co"># Returns: "at://did:plc:.../ac.foundation.dataset.sampleSchema/..."</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 688 755 </div> 689 756 </section> 690 757 <section id="publishing-datasets" class="level3"> 691 758 <h3 class="anchored" data-anchor-id="publishing-datasets">Publishing Datasets</h3> 692 - <div id="0034f50d" class="cell"> 693 - <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 694 - <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span> 695 - <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 696 - <span id="cb7-4"><a href="#cb7-4" aria-hidden="true" tabindex="-1"></a> dataset,</span> 697 - <span id="cb7-5"><a href="#cb7-5" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"imagenet-subset"</span>,</span> 698 - <span id="cb7-6"><a href="#cb7-6" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span>schema_uri, <span class="co"># Optional - auto-publishes if omitted</span></span> 699 - <span id="cb7-7"><a href="#cb7-7" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span 
class="st">"ImageNet subset"</span>,</span> 700 - <span id="cb7-8"><a href="#cb7-8" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"images"</span>, <span class="st">"classification"</span>],</span> 701 - <span id="cb7-9"><a href="#cb7-9" aria-hidden="true" tabindex="-1"></a> license<span class="op">=</span><span class="st">"MIT"</span>,</span> 702 - <span id="cb7-10"><a href="#cb7-10" aria-hidden="true" tabindex="-1"></a>)</span> 703 - <span id="cb7-11"><a href="#cb7-11" aria-hidden="true" tabindex="-1"></a></span> 704 - <span id="cb7-12"><a href="#cb7-12" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(entry.uri) <span class="co"># AT URI of the record</span></span> 705 - <span id="cb7-13"><a href="#cb7-13" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(entry.data_urls) <span class="co"># WebDataset URLs</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 759 + <div id="fe8a4a31" class="cell"> 760 + <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 761 + <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span> 762 + <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 763 + <span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a> dataset,</span> 764 + <span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"imagenet-subset"</span>,</span> 765 + <span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span>schema_uri, <span 
class="co"># Optional - auto-publishes if omitted</span></span> 766 + <span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"ImageNet subset"</span>,</span> 767 + <span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"images"</span>, <span class="st">"classification"</span>],</span> 768 + <span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a> license<span class="op">=</span><span class="st">"MIT"</span>,</span> 769 + <span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a>)</span> 770 + <span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a></span> 771 + <span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(entry.uri) <span class="co"># AT URI of the record</span></span> 772 + <span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(entry.data_urls) <span class="co"># WebDataset URLs</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 706 773 </div> 707 774 </section> 708 775 <section id="listing-and-retrieving" class="level3"> 709 776 <h3 class="anchored" data-anchor-id="listing-and-retrieving">Listing and Retrieving</h3> 710 - <div id="636979ca" class="cell"> 711 - <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># List your datasets</span></span> 712 - <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> entry <span class="kw">in</span> index.list_datasets():</span> 713 - <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"</span><span 
class="sc">{</span>entry<span class="sc">.</span>name<span class="sc">}</span><span class="ss">: </span><span class="sc">{</span>entry<span class="sc">.</span>schema_ref<span class="sc">}</span><span class="ss">"</span>)</span> 714 - <span id="cb8-4"><a href="#cb8-4" aria-hidden="true" tabindex="-1"></a></span> 715 - <span id="cb8-5"><a href="#cb8-5" aria-hidden="true" tabindex="-1"></a><span class="co"># List from another user</span></span> 716 - <span id="cb8-6"><a href="#cb8-6" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> entry <span class="kw">in</span> index.list_datasets(repo<span class="op">=</span><span class="st">"did:plc:other-user"</span>):</span> 717 - <span id="cb8-7"><a href="#cb8-7" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(entry.name)</span> 718 - <span id="cb8-8"><a href="#cb8-8" aria-hidden="true" tabindex="-1"></a></span> 719 - <span id="cb8-9"><a href="#cb8-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Get specific dataset</span></span> 720 - <span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.get_dataset(<span class="st">"at://did:plc:.../ac.foundation.dataset.record/..."</span>)</span> 721 - <span id="cb8-11"><a href="#cb8-11" aria-hidden="true" tabindex="-1"></a></span> 722 - <span id="cb8-12"><a href="#cb8-12" aria-hidden="true" tabindex="-1"></a><span class="co"># List schemas</span></span> 723 - <span id="cb8-13"><a href="#cb8-13" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> schema <span class="kw">in</span> index.list_schemas():</span> 724 - <span id="cb8-14"><a href="#cb8-14" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"</span><span class="sc">{</span>schema[<span class="st">'name'</span>]<span class="sc">}</span><span class="ss"> v</span><span class="sc">{</span>schema[<span class="st">'version'</span>]<span class="sc">}</span><span class="ss">"</span>)</span> 
725 - <span id="cb8-15"><a href="#cb8-15" aria-hidden="true" tabindex="-1"></a></span> 726 - <span id="cb8-16"><a href="#cb8-16" aria-hidden="true" tabindex="-1"></a><span class="co"># Decode schema to Python type</span></span> 727 - <span id="cb8-17"><a href="#cb8-17" aria-hidden="true" tabindex="-1"></a>SampleType <span class="op">=</span> index.decode_schema(schema_uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 777 + <div id="f5d05245" class="cell"> 778 + <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="co"># List your datasets</span></span> 779 + <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> entry <span class="kw">in</span> index.list_datasets():</span> 780 + <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"</span><span class="sc">{</span>entry<span class="sc">.</span>name<span class="sc">}</span><span class="ss">: </span><span class="sc">{</span>entry<span class="sc">.</span>schema_ref<span class="sc">}</span><span class="ss">"</span>)</span> 781 + <span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a></span> 782 + <span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a><span class="co"># List from another user</span></span> 783 + <span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> entry <span class="kw">in</span> index.list_datasets(repo<span class="op">=</span><span class="st">"did:plc:other-user"</span>):</span> 784 + <span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(entry.name)</span> 785 + <span id="cb11-8"><a href="#cb11-8" aria-hidden="true" 
tabindex="-1"></a></span> 786 + <span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Get specific dataset</span></span> 787 + <span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.get_dataset(<span class="st">"at://did:plc:.../ac.foundation.dataset.record/..."</span>)</span> 788 + <span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a></span> 789 + <span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a><span class="co"># List schemas</span></span> 790 + <span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> schema <span class="kw">in</span> index.list_schemas():</span> 791 + <span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"</span><span class="sc">{</span>schema[<span class="st">'name'</span>]<span class="sc">}</span><span class="ss"> v</span><span class="sc">{</span>schema[<span class="st">'version'</span>]<span class="sc">}</span><span class="ss">"</span>)</span> 792 + <span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a></span> 793 + <span id="cb11-16"><a href="#cb11-16" aria-hidden="true" tabindex="-1"></a><span class="co"># Decode schema to Python type</span></span> 794 + <span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a>SampleType <span class="op">=</span> index.decode_schema(schema_uri)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 728 795 </div> 729 796 </section> 730 797 </section> ··· 733 800 <p>For more control, use the individual publisher classes:</p> 734 801 <section id="schemapublisher" class="level3"> 735 802 <h3 class="anchored" data-anchor-id="schemapublisher">SchemaPublisher</h3> 736 - <div id="690e6ad9" class="cell"> 737 - <div class="sourceCode cell-code" id="cb9"><pre 
class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> SchemaPublisher</span> 738 - <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 739 - <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>publisher <span class="op">=</span> SchemaPublisher(client)</span> 740 - <span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a></span> 741 - <span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> publisher.publish(</span> 742 - <span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a> ImageSample,</span> 743 - <span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"ImageSample"</span>,</span> 744 - <span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">"1.0.0"</span>,</span> 745 - <span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Image with label"</span>,</span> 746 - <span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a> metadata<span class="op">=</span>{<span class="st">"source"</span>: <span class="st">"training"</span>},</span> 747 - <span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 803 + <div id="1d16a999" class="cell"> 804 + <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> SchemaPublisher</span> 805 + <span 
id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a></span> 806 + <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>publisher <span class="op">=</span> SchemaPublisher(client)</span> 807 + <span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a></span> 808 + <span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> publisher.publish(</span> 809 + <span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a> ImageSample,</span> 810 + <span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"ImageSample"</span>,</span> 811 + <span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">"1.0.0"</span>,</span> 812 + <span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Image with label"</span>,</span> 813 + <span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a> metadata<span class="op">=</span>{<span class="st">"source"</span>: <span class="st">"training"</span>},</span> 814 + <span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 748 815 </div> 749 816 </section> 750 817 <section id="datasetpublisher" class="level3"> 751 818 <h3 class="anchored" data-anchor-id="datasetpublisher">DatasetPublisher</h3> 752 - <div id="b4abaae7" class="cell"> 753 - <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> DatasetPublisher</span> 754 - <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" 
tabindex="-1"></a></span> 755 - <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>publisher <span class="op">=</span> DatasetPublisher(client)</span> 756 - <span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a></span> 757 - <span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> publisher.publish(</span> 758 - <span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> dataset,</span> 759 - <span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"training-images"</span>,</span> 760 - <span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a> schema_uri<span class="op">=</span>schema_uri, <span class="co"># Required if auto_publish_schema=False</span></span> 761 - <span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a> auto_publish_schema<span class="op">=</span><span class="va">True</span>, <span class="co"># Publish schema automatically</span></span> 762 - <span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Training images"</span>,</span> 763 - <span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"training"</span>, <span class="st">"images"</span>],</span> 764 - <span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a> license<span class="op">=</span><span class="st">"MIT"</span>,</span> 765 - <span id="cb10-13"><a href="#cb10-13" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 819 + <div id="bce70b2f" class="cell"> 820 + <div class="sourceCode cell-code" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" 
tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> DatasetPublisher</span> 821 + <span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a></span> 822 + <span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>publisher <span class="op">=</span> DatasetPublisher(client)</span> 823 + <span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a></span> 824 + <span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> publisher.publish(</span> 825 + <span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a> dataset,</span> 826 + <span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"training-images"</span>,</span> 827 + <span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a> schema_uri<span class="op">=</span>schema_uri, <span class="co"># Required if auto_publish_schema=False</span></span> 828 + <span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a> auto_publish_schema<span class="op">=</span><span class="va">True</span>, <span class="co"># Publish schema automatically</span></span> 829 + <span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Training images"</span>,</span> 830 + <span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"training"</span>, <span class="st">"images"</span>],</span> 831 + <span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a> license<span class="op">=</span><span class="st">"MIT"</span>,</span> 832 + <span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 766 833 </div> 767 834 <section id="blob-storage" 
class="level4"> 768 835 <h4 class="anchored" data-anchor-id="blob-storage">Blob Storage</h4> 769 - <p>For smaller datasets (up to ~50MB per shard), you can store data directly in ATProto blobs instead of external URLs:</p> 770 - <div id="17cf3e11" class="cell"> 771 - <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> io</span> 772 - <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 773 - <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a></span> 774 - <span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Create tar data in memory</span></span> 775 - <span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a>tar_buffer <span class="op">=</span> io.BytesIO()</span> 776 - <span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(tar_buffer) <span class="im">as</span> sink:</span> 777 - <span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i, sample <span class="kw">in</span> <span class="bu">enumerate</span>(samples):</span> 778 - <span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> sink.write({<span class="op">**</span>sample.as_wds, <span class="st">"__key__"</span>: <span class="ss">f"</span><span class="sc">{</span>i<span class="sc">:06d}</span><span class="ss">"</span>})</span> 779 - <span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a></span> 780 - <span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish with blob storage</span></span> 781 - <span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a>uri 
<span class="op">=</span> publisher.publish_with_blobs(</span> 782 - <span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a> blobs<span class="op">=</span>[tar_buffer.getvalue()],</span> 783 - <span id="cb11-13"><a href="#cb11-13" aria-hidden="true" tabindex="-1"></a> schema_uri<span class="op">=</span>schema_uri,</span> 784 - <span id="cb11-14"><a href="#cb11-14" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"small-dataset"</span>,</span> 785 - <span id="cb11-15"><a href="#cb11-15" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Dataset stored in ATProto blobs"</span>,</span> 786 - <span id="cb11-16"><a href="#cb11-16" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"small"</span>, <span class="st">"demo"</span>],</span> 787 - <span id="cb11-17"><a href="#cb11-17" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 836 + <p>There are two approaches to storing data as ATProto blobs:</p> 837 + <p><strong>Approach 1: PDSBlobStore (Recommended)</strong></p> 838 + <p>Use <code>PDSBlobStore</code> with <code>AtmosphereIndex</code> for automatic shard management:</p> 839 + <div id="38be7db2" class="cell"> 840 + <div class="sourceCode cell-code" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> PDSBlobStore, AtmosphereIndex</span> 841 + <span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a></span> 842 + <span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> PDSBlobStore(client)</span> 843 + <span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a>index <span 
class="op">=</span> AtmosphereIndex(client, data_store<span class="op">=</span>store)</span> 844 + <span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a></span> 845 + <span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Dataset shards are automatically uploaded as blobs</span></span> 846 + <span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 847 + <span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a> dataset,</span> 848 + <span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"my-dataset"</span>,</span> 849 + <span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span>schema_uri,</span> 850 + <span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a>)</span> 851 + <span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a></span> 852 + <span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a><span class="co"># Later: load using BlobSource</span></span> 853 + <span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> store.create_source(entry.data_urls)</span> 854 + <span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> atdata.Dataset[MySample](source)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 855 + </div> 856 + <p><strong>Approach 2: Manual Blob Publishing</strong></p> 857 + <p>For more control, use <code>DatasetPublisher.publish_with_blobs()</code> directly:</p> 858 + <div id="60bbdb6f" class="cell"> 859 + <div class="sourceCode cell-code" id="cb15"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" 
tabindex="-1"></a><span class="im">import</span> io</span> 860 + <span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 861 + <span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a></span> 862 + <span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Create tar data in memory</span></span> 863 + <span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a>tar_buffer <span class="op">=</span> io.BytesIO()</span> 864 + <span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(tar_buffer) <span class="im">as</span> sink:</span> 865 + <span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i, sample <span class="kw">in</span> <span class="bu">enumerate</span>(samples):</span> 866 + <span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a> sink.write({<span class="op">**</span>sample.as_wds, <span class="st">"__key__"</span>: <span class="ss">f"</span><span class="sc">{</span>i<span class="sc">:06d}</span><span class="ss">"</span>})</span> 867 + <span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a></span> 868 + <span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish with blob storage</span></span> 869 + <span id="cb15-11"><a href="#cb15-11" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> publisher.publish_with_blobs(</span> 870 + <span id="cb15-12"><a href="#cb15-12" aria-hidden="true" tabindex="-1"></a> blobs<span class="op">=</span>[tar_buffer.getvalue()],</span> 871 + <span id="cb15-13"><a href="#cb15-13" aria-hidden="true" tabindex="-1"></a> schema_uri<span class="op">=</span>schema_uri,</span> 872 + <span id="cb15-14"><a href="#cb15-14" aria-hidden="true" tabindex="-1"></a> name<span 
class="op">=</span><span class="st">"small-dataset"</span>,</span> 873 + <span id="cb15-15"><a href="#cb15-15" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Dataset stored in ATProto blobs"</span>,</span> 874 + <span id="cb15-16"><a href="#cb15-16" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"small"</span>, <span class="st">"demo"</span>],</span> 875 + <span id="cb15-17"><a href="#cb15-17" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 788 876 </div> 789 - <p>To load datasets with blob storage:</p> 790 - <div id="411daf78" class="cell"> 791 - <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> DatasetLoader</span> 792 - <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a></span> 793 - <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> DatasetLoader(client)</span> 794 - <span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a></span> 795 - <span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Check storage type</span></span> 796 - <span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a>storage_type <span class="op">=</span> loader.get_storage_type(uri) <span class="co"># "external" or "blobs"</span></span> 797 - <span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a></span> 798 - <span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> storage_type <span class="op">==</span> <span class="st">"blobs"</span>:</span> 799 - <span id="cb12-9"><a href="#cb12-9" 
aria-hidden="true" tabindex="-1"></a> <span class="co"># Get blob URLs for direct access</span></span> 800 - <span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a> blob_urls <span class="op">=</span> loader.get_blob_urls(uri)</span> 801 - <span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a></span> 802 - <span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a><span class="co"># to_dataset() handles both storage types automatically</span></span> 803 - <span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> loader.to_dataset(uri, MySample)</span> 804 - <span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 805 - <span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a> process(batch)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 877 + <p><strong>Loading Blob-Stored Datasets</strong></p> 878 + <div id="98785dee" class="cell"> 879 + <div class="sourceCode cell-code" id="cb16"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> DatasetLoader</span> 880 + <span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata <span class="im">import</span> BlobSource</span> 881 + <span id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a></span> 882 + <span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> DatasetLoader(client)</span> 883 + <span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a></span> 884 + <span 
id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Check storage type</span></span> 885 + <span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a>storage_type <span class="op">=</span> loader.get_storage_type(uri) <span class="co"># "external" or "blobs"</span></span> 886 + <span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a></span> 887 + <span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> storage_type <span class="op">==</span> <span class="st">"blobs"</span>:</span> 888 + <span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a> <span class="co"># Get blob URLs and create BlobSource</span></span> 889 + <span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a> blob_urls <span class="op">=</span> loader.get_blob_urls(uri)</span> 890 + <span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a> <span class="co"># Parse to blob refs for BlobSource</span></span> 891 + <span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a> <span class="co"># Or use loader.to_dataset() which handles this automatically</span></span> 892 + <span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a></span> 893 + <span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a><span class="co"># to_dataset() handles both storage types automatically</span></span> 894 + <span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> loader.to_dataset(uri, MySample)</span> 895 + <span id="cb16-17"><a href="#cb16-17" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 896 + <span id="cb16-18"><a href="#cb16-18" aria-hidden="true" tabindex="-1"></a> process(batch)</span></code><button title="Copy to 
Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 806 897 </div> 807 898 </section> 808 899 </section> 809 900 <section id="lenspublisher" class="level3"> 810 901 <h3 class="anchored" data-anchor-id="lenspublisher">LensPublisher</h3> 811 - <div id="aac3596a" class="cell"> 812 - <div class="sourceCode cell-code" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> LensPublisher</span> 813 - <span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a></span> 814 - <span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a>publisher <span class="op">=</span> LensPublisher(client)</span> 815 - <span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a></span> 816 - <span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a><span class="co"># With code references</span></span> 817 - <span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> publisher.publish(</span> 818 - <span id="cb13-7"><a href="#cb13-7" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"simplify"</span>,</span> 819 - <span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a> source_schema<span class="op">=</span>full_schema_uri,</span> 820 - <span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a> target_schema<span class="op">=</span>simple_schema_uri,</span> 821 - <span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Extract label only"</span>,</span> 822 - <span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a> getter_code<span class="op">=</span>{</span> 823 - <span id="cb13-12"><a href="#cb13-12" aria-hidden="true" 
tabindex="-1"></a> <span class="st">"repository"</span>: <span class="st">"https://github.com/org/repo"</span>,</span> 824 - <span id="cb13-13"><a href="#cb13-13" aria-hidden="true" tabindex="-1"></a> <span class="st">"commit"</span>: <span class="st">"abc123def..."</span>,</span> 825 - <span id="cb13-14"><a href="#cb13-14" aria-hidden="true" tabindex="-1"></a> <span class="st">"path"</span>: <span class="st">"transforms/simplify.py:simplify_getter"</span>,</span> 826 - <span id="cb13-15"><a href="#cb13-15" aria-hidden="true" tabindex="-1"></a> },</span> 827 - <span id="cb13-16"><a href="#cb13-16" aria-hidden="true" tabindex="-1"></a> putter_code<span class="op">=</span>{</span> 828 - <span id="cb13-17"><a href="#cb13-17" aria-hidden="true" tabindex="-1"></a> <span class="st">"repository"</span>: <span class="st">"https://github.com/org/repo"</span>,</span> 829 - <span id="cb13-18"><a href="#cb13-18" aria-hidden="true" tabindex="-1"></a> <span class="st">"commit"</span>: <span class="st">"abc123def..."</span>,</span> 830 - <span id="cb13-19"><a href="#cb13-19" aria-hidden="true" tabindex="-1"></a> <span class="st">"path"</span>: <span class="st">"transforms/simplify.py:simplify_putter"</span>,</span> 831 - <span id="cb13-20"><a href="#cb13-20" aria-hidden="true" tabindex="-1"></a> },</span> 832 - <span id="cb13-21"><a href="#cb13-21" aria-hidden="true" tabindex="-1"></a>)</span> 833 - <span id="cb13-22"><a href="#cb13-22" aria-hidden="true" tabindex="-1"></a></span> 834 - <span id="cb13-23"><a href="#cb13-23" aria-hidden="true" tabindex="-1"></a><span class="co"># Or publish from a Lens object</span></span> 835 - <span id="cb13-24"><a href="#cb13-24" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.lens <span class="im">import</span> lens</span> 836 - <span id="cb13-25"><a href="#cb13-25" aria-hidden="true" tabindex="-1"></a></span> 837 - <span id="cb13-26"><a href="#cb13-26" aria-hidden="true" tabindex="-1"></a><span 
class="at">@lens</span></span> 838 - <span id="cb13-27"><a href="#cb13-27" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> simplify(src: FullSample) <span class="op">-&gt;</span> SimpleSample:</span> 839 - <span id="cb13-28"><a href="#cb13-28" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> SimpleSample(label<span class="op">=</span>src.label)</span> 840 - <span id="cb13-29"><a href="#cb13-29" aria-hidden="true" tabindex="-1"></a></span> 841 - <span id="cb13-30"><a href="#cb13-30" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> publisher.publish_from_lens(</span> 842 - <span id="cb13-31"><a href="#cb13-31" aria-hidden="true" tabindex="-1"></a> simplify,</span> 843 - <span id="cb13-32"><a href="#cb13-32" aria-hidden="true" tabindex="-1"></a> source_schema<span class="op">=</span>full_schema_uri,</span> 844 - <span id="cb13-33"><a href="#cb13-33" aria-hidden="true" tabindex="-1"></a> target_schema<span class="op">=</span>simple_schema_uri,</span> 845 - <span id="cb13-34"><a href="#cb13-34" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 902 + <div id="c6332a11" class="cell"> 903 + <div class="sourceCode cell-code" id="cb17"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> LensPublisher</span> 904 + <span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a></span> 905 + <span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a>publisher <span class="op">=</span> LensPublisher(client)</span> 906 + <span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a></span> 907 + <span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a><span class="co"># With code 
references</span></span> 908 + <span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> publisher.publish(</span> 909 + <span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"simplify"</span>,</span> 910 + <span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a> source_schema<span class="op">=</span>full_schema_uri,</span> 911 + <span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a> target_schema<span class="op">=</span>simple_schema_uri,</span> 912 + <span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Extract label only"</span>,</span> 913 + <span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a> getter_code<span class="op">=</span>{</span> 914 + <span id="cb17-12"><a href="#cb17-12" aria-hidden="true" tabindex="-1"></a> <span class="st">"repository"</span>: <span class="st">"https://github.com/org/repo"</span>,</span> 915 + <span id="cb17-13"><a href="#cb17-13" aria-hidden="true" tabindex="-1"></a> <span class="st">"commit"</span>: <span class="st">"abc123def..."</span>,</span> 916 + <span id="cb17-14"><a href="#cb17-14" aria-hidden="true" tabindex="-1"></a> <span class="st">"path"</span>: <span class="st">"transforms/simplify.py:simplify_getter"</span>,</span> 917 + <span id="cb17-15"><a href="#cb17-15" aria-hidden="true" tabindex="-1"></a> },</span> 918 + <span id="cb17-16"><a href="#cb17-16" aria-hidden="true" tabindex="-1"></a> putter_code<span class="op">=</span>{</span> 919 + <span id="cb17-17"><a href="#cb17-17" aria-hidden="true" tabindex="-1"></a> <span class="st">"repository"</span>: <span class="st">"https://github.com/org/repo"</span>,</span> 920 + <span id="cb17-18"><a href="#cb17-18" aria-hidden="true" tabindex="-1"></a> <span class="st">"commit"</span>: <span class="st">"abc123def..."</span>,</span> 921 + 
<span id="cb17-19"><a href="#cb17-19" aria-hidden="true" tabindex="-1"></a> <span class="st">"path"</span>: <span class="st">"transforms/simplify.py:simplify_putter"</span>,</span> 922 + <span id="cb17-20"><a href="#cb17-20" aria-hidden="true" tabindex="-1"></a> },</span> 923 + <span id="cb17-21"><a href="#cb17-21" aria-hidden="true" tabindex="-1"></a>)</span> 924 + <span id="cb17-22"><a href="#cb17-22" aria-hidden="true" tabindex="-1"></a></span> 925 + <span id="cb17-23"><a href="#cb17-23" aria-hidden="true" tabindex="-1"></a><span class="co"># Or publish from a Lens object</span></span> 926 + <span id="cb17-24"><a href="#cb17-24" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.lens <span class="im">import</span> lens</span> 927 + <span id="cb17-25"><a href="#cb17-25" aria-hidden="true" tabindex="-1"></a></span> 928 + <span id="cb17-26"><a href="#cb17-26" aria-hidden="true" tabindex="-1"></a><span class="at">@lens</span></span> 929 + <span id="cb17-27"><a href="#cb17-27" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> simplify(src: FullSample) <span class="op">-&gt;</span> SimpleSample:</span> 930 + <span id="cb17-28"><a href="#cb17-28" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> SimpleSample(label<span class="op">=</span>src.label)</span> 931 + <span id="cb17-29"><a href="#cb17-29" aria-hidden="true" tabindex="-1"></a></span> 932 + <span id="cb17-30"><a href="#cb17-30" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> publisher.publish_from_lens(</span> 933 + <span id="cb17-31"><a href="#cb17-31" aria-hidden="true" tabindex="-1"></a> simplify,</span> 934 + <span id="cb17-32"><a href="#cb17-32" aria-hidden="true" tabindex="-1"></a> source_schema<span class="op">=</span>full_schema_uri,</span> 935 + <span id="cb17-33"><a href="#cb17-33" aria-hidden="true" tabindex="-1"></a> target_schema<span class="op">=</span>simple_schema_uri,</span> 936 + <span id="cb17-34"><a href="#cb17-34" 
aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 846 937 </div> 847 938 </section> 848 939 </section> ··· 851 942 <p>For direct access to records, use the loader classes:</p> 852 943 <section id="schemaloader" class="level3"> 853 944 <h3 class="anchored" data-anchor-id="schemaloader">SchemaLoader</h3> 854 - <div id="5c5becfb" class="cell"> 855 - <div class="sourceCode cell-code" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> SchemaLoader</span> 856 - <span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a></span> 857 - <span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> SchemaLoader(client)</span> 858 - <span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a></span> 859 - <span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Get a specific schema</span></span> 860 - <span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a>schema <span class="op">=</span> loader.get(<span class="st">"at://did:plc:abc/ac.foundation.dataset.sampleSchema/xyz"</span>)</span> 861 - <span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(schema[<span class="st">"name"</span>], schema[<span class="st">"version"</span>])</span> 862 - <span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a></span> 863 - <span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a><span class="co"># List all schemas from a repository</span></span> 864 - <span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> schema <span class="kw">in</span> 
loader.list_all(repo<span class="op">=</span><span class="st">"did:plc:other-user"</span>):</span> 865 - <span id="cb14-11"><a href="#cb14-11" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(schema[<span class="st">"name"</span>])</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 945 + <div id="c8e9a8ec" class="cell"> 946 + <div class="sourceCode cell-code" id="cb18"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> SchemaLoader</span> 947 + <span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a></span> 948 + <span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> SchemaLoader(client)</span> 949 + <span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a></span> 950 + <span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Get a specific schema</span></span> 951 + <span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a>schema <span class="op">=</span> loader.get(<span class="st">"at://did:plc:abc/ac.foundation.dataset.sampleSchema/xyz"</span>)</span> 952 + <span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(schema[<span class="st">"name"</span>], schema[<span class="st">"version"</span>])</span> 953 + <span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a></span> 954 + <span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a><span class="co"># List all schemas from a repository</span></span> 955 + <span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> schema <span class="kw">in</span> loader.list_all(repo<span class="op">=</span><span 
class="st">"did:plc:other-user"</span>):</span> 956 + <span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(schema[<span class="st">"name"</span>])</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 866 957 </div> 867 958 </section> 868 959 <section id="datasetloader" class="level3"> 869 960 <h3 class="anchored" data-anchor-id="datasetloader">DatasetLoader</h3> 870 - <div id="c0a3e839" class="cell"> 871 - <div class="sourceCode cell-code" id="cb15"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> DatasetLoader</span> 872 - <span id="cb15-2"><a href="#cb15-2" aria-hidden="true" tabindex="-1"></a></span> 873 - <span id="cb15-3"><a href="#cb15-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> DatasetLoader(client)</span> 874 - <span id="cb15-4"><a href="#cb15-4" aria-hidden="true" tabindex="-1"></a></span> 875 - <span id="cb15-5"><a href="#cb15-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Get a specific dataset record</span></span> 876 - <span id="cb15-6"><a href="#cb15-6" aria-hidden="true" tabindex="-1"></a>record <span class="op">=</span> loader.get(<span class="st">"at://did:plc:abc/ac.foundation.dataset.record/xyz"</span>)</span> 877 - <span id="cb15-7"><a href="#cb15-7" aria-hidden="true" tabindex="-1"></a></span> 878 - <span id="cb15-8"><a href="#cb15-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Check storage type</span></span> 879 - <span id="cb15-9"><a href="#cb15-9" aria-hidden="true" tabindex="-1"></a>storage_type <span class="op">=</span> loader.get_storage_type(uri) <span class="co"># "external" or "blobs"</span></span> 880 - <span id="cb15-10"><a href="#cb15-10" aria-hidden="true" tabindex="-1"></a></span> 881 - 
<span id="cb15-11"><a href="#cb15-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Get URLs based on storage type</span></span> 882 - <span id="cb15-12"><a href="#cb15-12" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> storage_type <span class="op">==</span> <span class="st">"external"</span>:</span> 883 - <span id="cb15-13"><a href="#cb15-13" aria-hidden="true" tabindex="-1"></a> urls <span class="op">=</span> loader.get_urls(uri)</span> 884 - <span id="cb15-14"><a href="#cb15-14" aria-hidden="true" tabindex="-1"></a><span class="cf">else</span>:</span> 885 - <span id="cb15-15"><a href="#cb15-15" aria-hidden="true" tabindex="-1"></a> urls <span class="op">=</span> loader.get_blob_urls(uri)</span> 886 - <span id="cb15-16"><a href="#cb15-16" aria-hidden="true" tabindex="-1"></a></span> 887 - <span id="cb15-17"><a href="#cb15-17" aria-hidden="true" tabindex="-1"></a><span class="co"># Get metadata</span></span> 888 - <span id="cb15-18"><a href="#cb15-18" aria-hidden="true" tabindex="-1"></a>metadata <span class="op">=</span> loader.get_metadata(uri)</span> 889 - <span id="cb15-19"><a href="#cb15-19" aria-hidden="true" tabindex="-1"></a></span> 890 - <span id="cb15-20"><a href="#cb15-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a Dataset object directly</span></span> 891 - <span id="cb15-21"><a href="#cb15-21" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> loader.to_dataset(uri, MySampleType)</span> 892 - <span id="cb15-22"><a href="#cb15-22" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 893 - <span id="cb15-23"><a href="#cb15-23" aria-hidden="true" tabindex="-1"></a> process(batch)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 961 + <div id="b663bd4f" class="cell"> 962 + <div class="sourceCode 
cell-code" id="cb19"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> DatasetLoader</span> 963 + <span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a></span> 964 + <span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> DatasetLoader(client)</span> 965 + <span id="cb19-4"><a href="#cb19-4" aria-hidden="true" tabindex="-1"></a></span> 966 + <span id="cb19-5"><a href="#cb19-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Get a specific dataset record</span></span> 967 + <span id="cb19-6"><a href="#cb19-6" aria-hidden="true" tabindex="-1"></a>record <span class="op">=</span> loader.get(<span class="st">"at://did:plc:abc/ac.foundation.dataset.record/xyz"</span>)</span> 968 + <span id="cb19-7"><a href="#cb19-7" aria-hidden="true" tabindex="-1"></a></span> 969 + <span id="cb19-8"><a href="#cb19-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Check storage type</span></span> 970 + <span id="cb19-9"><a href="#cb19-9" aria-hidden="true" tabindex="-1"></a>storage_type <span class="op">=</span> loader.get_storage_type(uri) <span class="co"># "external" or "blobs"</span></span> 971 + <span id="cb19-10"><a href="#cb19-10" aria-hidden="true" tabindex="-1"></a></span> 972 + <span id="cb19-11"><a href="#cb19-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Get URLs based on storage type</span></span> 973 + <span id="cb19-12"><a href="#cb19-12" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> storage_type <span class="op">==</span> <span class="st">"external"</span>:</span> 974 + <span id="cb19-13"><a href="#cb19-13" aria-hidden="true" tabindex="-1"></a> urls <span class="op">=</span> loader.get_urls(uri)</span> 975 + <span id="cb19-14"><a href="#cb19-14" aria-hidden="true" 
tabindex="-1"></a><span class="cf">else</span>:</span> 976 + <span id="cb19-15"><a href="#cb19-15" aria-hidden="true" tabindex="-1"></a> urls <span class="op">=</span> loader.get_blob_urls(uri)</span> 977 + <span id="cb19-16"><a href="#cb19-16" aria-hidden="true" tabindex="-1"></a></span> 978 + <span id="cb19-17"><a href="#cb19-17" aria-hidden="true" tabindex="-1"></a><span class="co"># Get metadata</span></span> 979 + <span id="cb19-18"><a href="#cb19-18" aria-hidden="true" tabindex="-1"></a>metadata <span class="op">=</span> loader.get_metadata(uri)</span> 980 + <span id="cb19-19"><a href="#cb19-19" aria-hidden="true" tabindex="-1"></a></span> 981 + <span id="cb19-20"><a href="#cb19-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a Dataset object directly</span></span> 982 + <span id="cb19-21"><a href="#cb19-21" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> loader.to_dataset(uri, MySampleType)</span> 983 + <span id="cb19-22"><a href="#cb19-22" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 984 + <span id="cb19-23"><a href="#cb19-23" aria-hidden="true" tabindex="-1"></a> process(batch)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 894 985 </div> 895 986 </section> 896 987 <section id="lensloader" class="level3"> 897 988 <h3 class="anchored" data-anchor-id="lensloader">LensLoader</h3> 898 - <div id="282a71b9" class="cell"> 899 - <div class="sourceCode cell-code" id="cb16"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> LensLoader</span> 900 - <span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a></span> 901 - <span 
id="cb16-3"><a href="#cb16-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> LensLoader(client)</span> 902 - <span id="cb16-4"><a href="#cb16-4" aria-hidden="true" tabindex="-1"></a></span> 903 - <span id="cb16-5"><a href="#cb16-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Get a specific lens record</span></span> 904 - <span id="cb16-6"><a href="#cb16-6" aria-hidden="true" tabindex="-1"></a>lens <span class="op">=</span> loader.get(<span class="st">"at://did:plc:abc/ac.foundation.dataset.lens/xyz"</span>)</span> 905 - <span id="cb16-7"><a href="#cb16-7" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(lens[<span class="st">"name"</span>])</span> 906 - <span id="cb16-8"><a href="#cb16-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(lens[<span class="st">"sourceSchema"</span>], <span class="st">"-&gt;"</span>, lens[<span class="st">"targetSchema"</span>])</span> 907 - <span id="cb16-9"><a href="#cb16-9" aria-hidden="true" tabindex="-1"></a></span> 908 - <span id="cb16-10"><a href="#cb16-10" aria-hidden="true" tabindex="-1"></a><span class="co"># List all lenses from a repository</span></span> 909 - <span id="cb16-11"><a href="#cb16-11" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> lens <span class="kw">in</span> loader.list_all():</span> 910 - <span id="cb16-12"><a href="#cb16-12" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(lens[<span class="st">"name"</span>])</span> 911 - <span id="cb16-13"><a href="#cb16-13" aria-hidden="true" tabindex="-1"></a></span> 912 - <span id="cb16-14"><a href="#cb16-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Find lenses by schema</span></span> 913 - <span id="cb16-15"><a href="#cb16-15" aria-hidden="true" tabindex="-1"></a>lenses <span class="op">=</span> loader.find_by_schemas(</span> 914 - <span id="cb16-16"><a href="#cb16-16" aria-hidden="true" tabindex="-1"></a> source_schema_uri<span 
class="op">=</span><span class="st">"at://did:plc:abc/ac.foundation.dataset.sampleSchema/source"</span>,</span> 915 - <span id="cb16-17"><a href="#cb16-17" aria-hidden="true" tabindex="-1"></a> target_schema_uri<span class="op">=</span><span class="st">"at://did:plc:abc/ac.foundation.dataset.sampleSchema/target"</span>,</span> 916 - <span id="cb16-18"><a href="#cb16-18" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 989 + <div id="479fac54" class="cell"> 990 + <div class="sourceCode cell-code" id="cb20"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> LensLoader</span> 991 + <span id="cb20-2"><a href="#cb20-2" aria-hidden="true" tabindex="-1"></a></span> 992 + <span id="cb20-3"><a href="#cb20-3" aria-hidden="true" tabindex="-1"></a>loader <span class="op">=</span> LensLoader(client)</span> 993 + <span id="cb20-4"><a href="#cb20-4" aria-hidden="true" tabindex="-1"></a></span> 994 + <span id="cb20-5"><a href="#cb20-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Get a specific lens record</span></span> 995 + <span id="cb20-6"><a href="#cb20-6" aria-hidden="true" tabindex="-1"></a>lens <span class="op">=</span> loader.get(<span class="st">"at://did:plc:abc/ac.foundation.dataset.lens/xyz"</span>)</span> 996 + <span id="cb20-7"><a href="#cb20-7" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(lens[<span class="st">"name"</span>])</span> 997 + <span id="cb20-8"><a href="#cb20-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(lens[<span class="st">"sourceSchema"</span>], <span class="st">"-&gt;"</span>, lens[<span class="st">"targetSchema"</span>])</span> 998 + <span id="cb20-9"><a href="#cb20-9" aria-hidden="true" tabindex="-1"></a></span> 999 
+ <span id="cb20-10"><a href="#cb20-10" aria-hidden="true" tabindex="-1"></a><span class="co"># List all lenses from a repository</span></span> 1000 + <span id="cb20-11"><a href="#cb20-11" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> lens <span class="kw">in</span> loader.list_all():</span> 1001 + <span id="cb20-12"><a href="#cb20-12" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(lens[<span class="st">"name"</span>])</span> 1002 + <span id="cb20-13"><a href="#cb20-13" aria-hidden="true" tabindex="-1"></a></span> 1003 + <span id="cb20-14"><a href="#cb20-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Find lenses by schema</span></span> 1004 + <span id="cb20-15"><a href="#cb20-15" aria-hidden="true" tabindex="-1"></a>lenses <span class="op">=</span> loader.find_by_schemas(</span> 1005 + <span id="cb20-16"><a href="#cb20-16" aria-hidden="true" tabindex="-1"></a> source_schema_uri<span class="op">=</span><span class="st">"at://did:plc:abc/ac.foundation.dataset.sampleSchema/source"</span>,</span> 1006 + <span id="cb20-17"><a href="#cb20-17" aria-hidden="true" tabindex="-1"></a> target_schema_uri<span class="op">=</span><span class="st">"at://did:plc:abc/ac.foundation.dataset.sampleSchema/target"</span>,</span> 1007 + <span id="cb20-18"><a href="#cb20-18" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 917 1008 </div> 918 1009 </section> 919 1010 </section> 920 1011 <section id="at-uris" class="level2"> 921 1012 <h2 class="anchored" data-anchor-id="at-uris">AT URIs</h2> 922 1013 <p>ATProto records are identified by AT URIs:</p> 923 - <div id="c3a85cc4" class="cell"> 924 - <div class="sourceCode cell-code" id="cb17"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span 
class="im">import</span> AtUri</span> 925 - <span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a></span> 926 - <span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Parse an AT URI</span></span> 927 - <span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> AtUri.parse(<span class="st">"at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz"</span>)</span> 928 - <span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a></span> 929 - <span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(uri.authority) <span class="co"># 'did:plc:abc123'</span></span> 930 - <span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(uri.collection) <span class="co"># 'ac.foundation.dataset.sampleSchema'</span></span> 931 - <span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(uri.rkey) <span class="co"># 'xyz'</span></span> 932 - <span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a></span> 933 - <span id="cb17-10"><a href="#cb17-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Format back to string</span></span> 934 - <span id="cb17-11"><a href="#cb17-11" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="bu">str</span>(uri)) <span class="co"># 'at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz'</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1014 + <div id="8b50cb03" class="cell"> 1015 + <div class="sourceCode cell-code" id="cb21"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtUri</span> 1016 + <span 
id="cb21-2"><a href="#cb21-2" aria-hidden="true" tabindex="-1"></a></span> 1017 + <span id="cb21-3"><a href="#cb21-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Parse an AT URI</span></span> 1018 + <span id="cb21-4"><a href="#cb21-4" aria-hidden="true" tabindex="-1"></a>uri <span class="op">=</span> AtUri.parse(<span class="st">"at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz"</span>)</span> 1019 + <span id="cb21-5"><a href="#cb21-5" aria-hidden="true" tabindex="-1"></a></span> 1020 + <span id="cb21-6"><a href="#cb21-6" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(uri.authority) <span class="co"># 'did:plc:abc123'</span></span> 1021 + <span id="cb21-7"><a href="#cb21-7" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(uri.collection) <span class="co"># 'ac.foundation.dataset.sampleSchema'</span></span> 1022 + <span id="cb21-8"><a href="#cb21-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(uri.rkey) <span class="co"># 'xyz'</span></span> 1023 + <span id="cb21-9"><a href="#cb21-9" aria-hidden="true" tabindex="-1"></a></span> 1024 + <span id="cb21-10"><a href="#cb21-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Format back to string</span></span> 1025 + <span id="cb21-11"><a href="#cb21-11" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="bu">str</span>(uri)) <span class="co"># 'at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz'</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 935 1026 </div> 936 1027 </section> 937 1028 <section id="supported-field-types" class="level2"> ··· 986 1077 </section> 987 1078 <section id="complete-example" class="level2"> 988 1079 <h2 class="anchored" data-anchor-id="complete-example">Complete Example</h2> 989 - <div id="d3ec5a19" class="cell"> 990 - <div class="sourceCode cell-code" id="cb18"><pre class="sourceCode python 
code-with-copy"><code class="sourceCode python"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 991 - <span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 992 - <span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 993 - <span id="cb18-4"><a href="#cb18-4" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient, AtmosphereIndex</span> 994 - <span id="cb18-5"><a href="#cb18-5" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 995 - <span id="cb18-6"><a href="#cb18-6" aria-hidden="true" tabindex="-1"></a></span> 996 - <span id="cb18-7"><a href="#cb18-7" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. 
Define and create samples</span></span> 997 - <span id="cb18-8"><a href="#cb18-8" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 998 - <span id="cb18-9"><a href="#cb18-9" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> FeatureSample:</span> 999 - <span id="cb18-10"><a href="#cb18-10" aria-hidden="true" tabindex="-1"></a> features: NDArray</span> 1000 - <span id="cb18-11"><a href="#cb18-11" aria-hidden="true" tabindex="-1"></a> label: <span class="bu">int</span></span> 1001 - <span id="cb18-12"><a href="#cb18-12" aria-hidden="true" tabindex="-1"></a> source: <span class="bu">str</span></span> 1002 - <span id="cb18-13"><a href="#cb18-13" aria-hidden="true" tabindex="-1"></a></span> 1003 - <span id="cb18-14"><a href="#cb18-14" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> 1004 - <span id="cb18-15"><a href="#cb18-15" aria-hidden="true" tabindex="-1"></a> FeatureSample(</span> 1005 - <span id="cb18-16"><a href="#cb18-16" aria-hidden="true" tabindex="-1"></a> features<span class="op">=</span>np.random.randn(<span class="dv">128</span>).astype(np.float32),</span> 1006 - <span id="cb18-17"><a href="#cb18-17" aria-hidden="true" tabindex="-1"></a> label<span class="op">=</span>i <span class="op">%</span> <span class="dv">10</span>,</span> 1007 - <span id="cb18-18"><a href="#cb18-18" aria-hidden="true" tabindex="-1"></a> source<span class="op">=</span><span class="st">"synthetic"</span>,</span> 1008 - <span id="cb18-19"><a href="#cb18-19" aria-hidden="true" tabindex="-1"></a> )</span> 1009 - <span id="cb18-20"><a href="#cb18-20" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i <span class="kw">in</span> <span class="bu">range</span>(<span class="dv">1000</span>)</span> 1010 - <span id="cb18-21"><a href="#cb18-21" aria-hidden="true" tabindex="-1"></a>]</span> 1011 - <span id="cb18-22"><a href="#cb18-22" aria-hidden="true" tabindex="-1"></a></span> 1012 - <span 
id="cb18-23"><a href="#cb18-23" aria-hidden="true" tabindex="-1"></a><span class="co"># 2. Write to tar</span></span> 1013 - <span id="cb18-24"><a href="#cb18-24" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(<span class="st">"features.tar"</span>) <span class="im">as</span> sink:</span> 1014 - <span id="cb18-25"><a href="#cb18-25" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i, s <span class="kw">in</span> <span class="bu">enumerate</span>(samples):</span> 1015 - <span id="cb18-26"><a href="#cb18-26" aria-hidden="true" tabindex="-1"></a> sink.write({<span class="op">**</span>s.as_wds, <span class="st">"__key__"</span>: <span class="ss">f"</span><span class="sc">{</span>i<span class="sc">:06d}</span><span class="ss">"</span>})</span> 1016 - <span id="cb18-27"><a href="#cb18-27" aria-hidden="true" tabindex="-1"></a></span> 1017 - <span id="cb18-28"><a href="#cb18-28" aria-hidden="true" tabindex="-1"></a><span class="co"># 3. Authenticate</span></span> 1018 - <span id="cb18-29"><a href="#cb18-29" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 1019 - <span id="cb18-30"><a href="#cb18-30" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"myhandle.bsky.social"</span>, <span class="st">"app-password"</span>)</span> 1020 - <span id="cb18-31"><a href="#cb18-31" aria-hidden="true" tabindex="-1"></a></span> 1021 - <span id="cb18-32"><a href="#cb18-32" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> AtmosphereIndex(client)</span> 1022 - <span id="cb18-33"><a href="#cb18-33" aria-hidden="true" tabindex="-1"></a></span> 1023 - <span id="cb18-34"><a href="#cb18-34" aria-hidden="true" tabindex="-1"></a><span class="co"># 4. 
Publish schema</span></span> 1024 - <span id="cb18-35"><a href="#cb18-35" aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> index.publish_schema(</span> 1025 - <span id="cb18-36"><a href="#cb18-36" aria-hidden="true" tabindex="-1"></a> FeatureSample,</span> 1026 - <span id="cb18-37"><a href="#cb18-37" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">"1.0.0"</span>,</span> 1027 - <span id="cb18-38"><a href="#cb18-38" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Feature vectors with labels"</span>,</span> 1028 - <span id="cb18-39"><a href="#cb18-39" aria-hidden="true" tabindex="-1"></a>)</span> 1029 - <span id="cb18-40"><a href="#cb18-40" aria-hidden="true" tabindex="-1"></a></span> 1030 - <span id="cb18-41"><a href="#cb18-41" aria-hidden="true" tabindex="-1"></a><span class="co"># 5. Publish dataset</span></span> 1031 - <span id="cb18-42"><a href="#cb18-42" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[FeatureSample](<span class="st">"features.tar"</span>)</span> 1032 - <span id="cb18-43"><a href="#cb18-43" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 1033 - <span id="cb18-44"><a href="#cb18-44" aria-hidden="true" tabindex="-1"></a> dataset,</span> 1034 - <span id="cb18-45"><a href="#cb18-45" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"synthetic-features-v1"</span>,</span> 1035 - <span id="cb18-46"><a href="#cb18-46" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span>schema_uri,</span> 1036 - <span id="cb18-47"><a href="#cb18-47" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"features"</span>, <span class="st">"synthetic"</span>],</span> 1037 - <span id="cb18-48"><a href="#cb18-48" aria-hidden="true" tabindex="-1"></a>)</span> 1038 - <span id="cb18-49"><a 
href="#cb18-49" aria-hidden="true" tabindex="-1"></a></span> 1039 - <span id="cb18-50"><a href="#cb18-50" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Published: </span><span class="sc">{</span>entry<span class="sc">.</span>uri<span class="sc">}</span><span class="ss">"</span>)</span> 1040 - <span id="cb18-51"><a href="#cb18-51" aria-hidden="true" tabindex="-1"></a></span> 1041 - <span id="cb18-52"><a href="#cb18-52" aria-hidden="true" tabindex="-1"></a><span class="co"># 6. Later: discover and load</span></span> 1042 - <span id="cb18-53"><a href="#cb18-53" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> dataset_entry <span class="kw">in</span> index.list_datasets():</span> 1043 - <span id="cb18-54"><a href="#cb18-54" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"Found: </span><span class="sc">{</span>dataset_entry<span class="sc">.</span>name<span class="sc">}</span><span class="ss">"</span>)</span> 1044 - <span id="cb18-55"><a href="#cb18-55" aria-hidden="true" tabindex="-1"></a></span> 1045 - <span id="cb18-56"><a href="#cb18-56" aria-hidden="true" tabindex="-1"></a> <span class="co"># Reconstruct type from schema</span></span> 1046 - <span id="cb18-57"><a href="#cb18-57" aria-hidden="true" tabindex="-1"></a> SampleType <span class="op">=</span> index.decode_schema(dataset_entry.schema_ref)</span> 1047 - <span id="cb18-58"><a href="#cb18-58" aria-hidden="true" tabindex="-1"></a></span> 1048 - <span id="cb18-59"><a href="#cb18-59" aria-hidden="true" tabindex="-1"></a> <span class="co"># Load dataset</span></span> 1049 - <span id="cb18-60"><a href="#cb18-60" aria-hidden="true" tabindex="-1"></a> ds <span class="op">=</span> atdata.Dataset[SampleType](dataset_entry.data_urls[<span class="dv">0</span>])</span> 1050 - <span id="cb18-61"><a href="#cb18-61" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> batch <span class="kw">in</span> 
ds.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 1051 - <span id="cb18-62"><a href="#cb18-62" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(batch.features.shape)</span> 1052 - <span id="cb18-63"><a href="#cb18-63" aria-hidden="true" tabindex="-1"></a> <span class="cf">break</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1080 + <p>This example shows the full workflow using <code>PDSBlobStore</code> for decentralized storage:</p> 1081 + <div id="e67be904" class="cell"> 1082 + <div class="sourceCode cell-code" id="cb22"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 1083 + <span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 1084 + <span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 1085 + <span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient, AtmosphereIndex, PDSBlobStore</span> 1086 + <span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 1087 + <span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a></span> 1088 + <span id="cb22-7"><a href="#cb22-7" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. 
Define and create samples</span></span> 1089 + <span id="cb22-8"><a href="#cb22-8" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 1090 + <span id="cb22-9"><a href="#cb22-9" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> FeatureSample:</span> 1091 + <span id="cb22-10"><a href="#cb22-10" aria-hidden="true" tabindex="-1"></a> features: NDArray</span> 1092 + <span id="cb22-11"><a href="#cb22-11" aria-hidden="true" tabindex="-1"></a> label: <span class="bu">int</span></span> 1093 + <span id="cb22-12"><a href="#cb22-12" aria-hidden="true" tabindex="-1"></a> source: <span class="bu">str</span></span> 1094 + <span id="cb22-13"><a href="#cb22-13" aria-hidden="true" tabindex="-1"></a></span> 1095 + <span id="cb22-14"><a href="#cb22-14" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> 1096 + <span id="cb22-15"><a href="#cb22-15" aria-hidden="true" tabindex="-1"></a> FeatureSample(</span> 1097 + <span id="cb22-16"><a href="#cb22-16" aria-hidden="true" tabindex="-1"></a> features<span class="op">=</span>np.random.randn(<span class="dv">128</span>).astype(np.float32),</span> 1098 + <span id="cb22-17"><a href="#cb22-17" aria-hidden="true" tabindex="-1"></a> label<span class="op">=</span>i <span class="op">%</span> <span class="dv">10</span>,</span> 1099 + <span id="cb22-18"><a href="#cb22-18" aria-hidden="true" tabindex="-1"></a> source<span class="op">=</span><span class="st">"synthetic"</span>,</span> 1100 + <span id="cb22-19"><a href="#cb22-19" aria-hidden="true" tabindex="-1"></a> )</span> 1101 + <span id="cb22-20"><a href="#cb22-20" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i <span class="kw">in</span> <span class="bu">range</span>(<span class="dv">1000</span>)</span> 1102 + <span id="cb22-21"><a href="#cb22-21" aria-hidden="true" tabindex="-1"></a>]</span> 1103 + <span id="cb22-22"><a href="#cb22-22" aria-hidden="true" tabindex="-1"></a></span> 1104 + <span 
id="cb22-23"><a href="#cb22-23" aria-hidden="true" tabindex="-1"></a><span class="co"># 2. Write to tar</span></span> 1105 + <span id="cb22-24"><a href="#cb22-24" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(<span class="st">"features.tar"</span>) <span class="im">as</span> sink:</span> 1106 + <span id="cb22-25"><a href="#cb22-25" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i, s <span class="kw">in</span> <span class="bu">enumerate</span>(samples):</span> 1107 + <span id="cb22-26"><a href="#cb22-26" aria-hidden="true" tabindex="-1"></a> sink.write({<span class="op">**</span>s.as_wds, <span class="st">"__key__"</span>: <span class="ss">f"</span><span class="sc">{</span>i<span class="sc">:06d}</span><span class="ss">"</span>})</span> 1108 + <span id="cb22-27"><a href="#cb22-27" aria-hidden="true" tabindex="-1"></a></span> 1109 + <span id="cb22-28"><a href="#cb22-28" aria-hidden="true" tabindex="-1"></a><span class="co"># 3. Authenticate and set up blob storage</span></span> 1110 + <span id="cb22-29"><a href="#cb22-29" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 1111 + <span id="cb22-30"><a href="#cb22-30" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"myhandle.bsky.social"</span>, <span class="st">"app-password"</span>)</span> 1112 + <span id="cb22-31"><a href="#cb22-31" aria-hidden="true" tabindex="-1"></a></span> 1113 + <span id="cb22-32"><a href="#cb22-32" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> PDSBlobStore(client)</span> 1114 + <span id="cb22-33"><a href="#cb22-33" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> AtmosphereIndex(client, data_store<span class="op">=</span>store)</span> 1115 + <span id="cb22-34"><a href="#cb22-34" aria-hidden="true" tabindex="-1"></a></span> 1116 + <span id="cb22-35"><a href="#cb22-35" aria-hidden="true" tabindex="-1"></a><span class="co"># 4. 
Publish schema</span></span> 1117 + <span id="cb22-36"><a href="#cb22-36" aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> index.publish_schema(</span> 1118 + <span id="cb22-37"><a href="#cb22-37" aria-hidden="true" tabindex="-1"></a> FeatureSample,</span> 1119 + <span id="cb22-38"><a href="#cb22-38" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">"1.0.0"</span>,</span> 1120 + <span id="cb22-39"><a href="#cb22-39" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Feature vectors with labels"</span>,</span> 1121 + <span id="cb22-40"><a href="#cb22-40" aria-hidden="true" tabindex="-1"></a>)</span> 1122 + <span id="cb22-41"><a href="#cb22-41" aria-hidden="true" tabindex="-1"></a></span> 1123 + <span id="cb22-42"><a href="#cb22-42" aria-hidden="true" tabindex="-1"></a><span class="co"># 5. Publish dataset (shards uploaded as blobs)</span></span> 1124 + <span id="cb22-43"><a href="#cb22-43" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[FeatureSample](<span class="st">"features.tar"</span>)</span> 1125 + <span id="cb22-44"><a href="#cb22-44" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 1126 + <span id="cb22-45"><a href="#cb22-45" aria-hidden="true" tabindex="-1"></a> dataset,</span> 1127 + <span id="cb22-46"><a href="#cb22-46" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"synthetic-features-v1"</span>,</span> 1128 + <span id="cb22-47"><a href="#cb22-47" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span>schema_uri,</span> 1129 + <span id="cb22-48"><a href="#cb22-48" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"features"</span>, <span class="st">"synthetic"</span>],</span> 1130 + <span id="cb22-49"><a href="#cb22-49" aria-hidden="true" tabindex="-1"></a>)</span> 1131 + <span 
id="cb22-50"><a href="#cb22-50" aria-hidden="true" tabindex="-1"></a></span> 1132 + <span id="cb22-51"><a href="#cb22-51" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Published: </span><span class="sc">{</span>entry<span class="sc">.</span>uri<span class="sc">}</span><span class="ss">"</span>)</span> 1133 + <span id="cb22-52"><a href="#cb22-52" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Blob URLs: </span><span class="sc">{</span>entry<span class="sc">.</span>data_urls<span class="sc">}</span><span class="ss">"</span>)</span> 1134 + <span id="cb22-53"><a href="#cb22-53" aria-hidden="true" tabindex="-1"></a></span> 1135 + <span id="cb22-54"><a href="#cb22-54" aria-hidden="true" tabindex="-1"></a><span class="co"># 6. Later: discover and load from blobs</span></span> 1136 + <span id="cb22-55"><a href="#cb22-55" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> dataset_entry <span class="kw">in</span> index.list_datasets():</span> 1137 + <span id="cb22-56"><a href="#cb22-56" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"Found: </span><span class="sc">{</span>dataset_entry<span class="sc">.</span>name<span class="sc">}</span><span class="ss">"</span>)</span> 1138 + <span id="cb22-57"><a href="#cb22-57" aria-hidden="true" tabindex="-1"></a></span> 1139 + <span id="cb22-58"><a href="#cb22-58" aria-hidden="true" tabindex="-1"></a> <span class="co"># Reconstruct type from schema</span></span> 1140 + <span id="cb22-59"><a href="#cb22-59" aria-hidden="true" tabindex="-1"></a> SampleType <span class="op">=</span> index.decode_schema(dataset_entry.schema_ref)</span> 1141 + <span id="cb22-60"><a href="#cb22-60" aria-hidden="true" tabindex="-1"></a></span> 1142 + <span id="cb22-61"><a href="#cb22-61" aria-hidden="true" tabindex="-1"></a> <span class="co"># Create source from blob URLs</span></span> 1143 + <span id="cb22-62"><a href="#cb22-62" 
aria-hidden="true" tabindex="-1"></a> source <span class="op">=</span> store.create_source(dataset_entry.data_urls)</span> 1144 + <span id="cb22-63"><a href="#cb22-63" aria-hidden="true" tabindex="-1"></a></span> 1145 + <span id="cb22-64"><a href="#cb22-64" aria-hidden="true" tabindex="-1"></a> <span class="co"># Load dataset from blobs</span></span> 1146 + <span id="cb22-65"><a href="#cb22-65" aria-hidden="true" tabindex="-1"></a> ds <span class="op">=</span> atdata.Dataset[SampleType](source)</span> 1147 + <span id="cb22-66"><a href="#cb22-66" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> batch <span class="kw">in</span> ds.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 1148 + <span id="cb22-67"><a href="#cb22-67" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(batch.features.shape)</span> 1149 + <span id="cb22-68"><a href="#cb22-68" aria-hidden="true" tabindex="-1"></a> <span class="cf">break</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1150 + </div> 1151 + <p>For external URL storage (without <code>PDSBlobStore</code>):</p> 1152 + <div id="58ed78ba" class="cell"> 1153 + <div class="sourceCode cell-code" id="cb23"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Use AtmosphereIndex without data_store</span></span> 1154 + <span id="cb23-2"><a href="#cb23-2" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> AtmosphereIndex(client)</span> 1155 + <span id="cb23-3"><a href="#cb23-3" aria-hidden="true" tabindex="-1"></a></span> 1156 + <span id="cb23-4"><a href="#cb23-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Dataset URLs will be stored as-is (external references)</span></span> 1157 + <span id="cb23-5"><a href="#cb23-5" aria-hidden="true" tabindex="-1"></a>entry <span 
class="op">=</span> index.insert_dataset(</span> 1158 + <span id="cb23-6"><a href="#cb23-6" aria-hidden="true" tabindex="-1"></a> dataset,</span> 1159 + <span id="cb23-7"><a href="#cb23-7" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"external-features"</span>,</span> 1160 + <span id="cb23-8"><a href="#cb23-8" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span>schema_uri,</span> 1161 + <span id="cb23-9"><a href="#cb23-9" aria-hidden="true" tabindex="-1"></a>)</span> 1162 + <span id="cb23-10"><a href="#cb23-10" aria-hidden="true" tabindex="-1"></a></span> 1163 + <span id="cb23-11"><a href="#cb23-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Load using standard URL source</span></span> 1164 + <span id="cb23-12"><a href="#cb23-12" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> atdata.Dataset[FeatureSample](entry.data_urls[<span class="dv">0</span>])</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 1053 1165 </div> 1054 1166 </section> 1055 1167 <section id="related" class="level2">
+13 -13
docs/reference/datasets.html
··· 593 593 <p>The <code>Dataset</code> class provides typed iteration over WebDataset tar files with automatic batching and lens transformations.</p> 594 594 <section id="creating-a-dataset" class="level2"> 595 595 <h2 class="anchored" data-anchor-id="creating-a-dataset">Creating a Dataset</h2> 596 - <div id="f29acbac" class="cell"> 596 + <div id="a18fd27c" class="cell"> 597 597 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 598 598 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 599 599 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span> ··· 616 616 <section id="url-source-default" class="level3"> 617 617 <h3 class="anchored" data-anchor-id="url-source-default">URL Source (default)</h3> 618 618 <p>When you pass a string to <code>Dataset</code>, it automatically wraps it in a <code>URLSource</code>:</p> 619 - <div id="613ca0b1" class="cell"> 619 + <div id="9bdf912a" class="cell"> 620 620 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="co"># These are equivalent:</span></span> 621 621 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 622 622 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](atdata.URLSource(<span class="st">"data-{000000..000009}.tar"</span>))</span></code><button title="Copy to Clipboard" class="code-copy-button"><i 
class="bi"></i></button></pre></div> ··· 625 625 <section id="s3-source" class="level3"> 626 626 <h3 class="anchored" data-anchor-id="s3-source">S3 Source</h3> 627 627 <p>For private S3 buckets or S3-compatible storage (Cloudflare R2, MinIO), use <code>S3Source</code>:</p> 628 - <div id="7024789e" class="cell"> 628 + <div id="acb40c51" class="cell"> 629 629 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># From explicit credentials</span></span> 630 630 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> atdata.S3Source(</span> 631 631 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> bucket<span class="op">=</span><span class="st">"my-bucket"</span>,</span> ··· 663 663 <section id="ordered-iteration" class="level3"> 664 664 <h3 class="anchored" data-anchor-id="ordered-iteration">Ordered Iteration</h3> 665 665 <p>Iterate through samples in their original order:</p> 666 - <div id="caf78bf2" class="cell"> 666 + <div id="3c6db39b" class="cell"> 667 667 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># With batching (default batch_size=1)</span></span> 668 668 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 669 669 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> images <span class="op">=</span> batch.image <span class="co"># numpy array (32, H, W, C)</span></span> ··· 677 677 <section id="shuffled-iteration" class="level3"> 678 678 <h3 class="anchored" 
data-anchor-id="shuffled-iteration">Shuffled Iteration</h3> 679 679 <p>Iterate with randomized order at both shard and sample levels:</p> 680 - <div id="ab24a75c" class="cell"> 680 + <div id="e4b30717" class="cell"> 681 681 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.shuffled(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 682 682 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> <span class="co"># Samples are shuffled</span></span> 683 683 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> process(batch)</span> ··· 708 708 <section id="samplebatch" class="level2"> 709 709 <h2 class="anchored" data-anchor-id="samplebatch">SampleBatch</h2> 710 710 <p>When iterating with a <code>batch_size</code>, each iteration yields a <code>SampleBatch</code> with automatic attribute aggregation.</p> 711 - <div id="46b42eb2" class="cell"> 711 + <div id="dddda579" class="cell"> 712 712 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 713 713 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> Sample:</span> 714 714 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> features: NDArray <span class="co"># shape (256,)</span></span> ··· 728 728 <section id="type-transformations-with-lenses" class="level2"> 729 729 <h2 class="anchored" data-anchor-id="type-transformations-with-lenses">Type Transformations with Lenses</h2> 730 730 <p>View a dataset through a different sample type using registered lenses:</p> 731 - <div id="a15c6b46" class="cell"> 731 + 
<div id="bcb900f4" class="cell"> 732 732 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 733 733 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> SimplifiedSample:</span> 734 734 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> label: <span class="bu">str</span></span> ··· 750 750 <section id="shard-list" class="level3"> 751 751 <h3 class="anchored" data-anchor-id="shard-list">Shard List</h3> 752 752 <p>Get the list of individual tar files:</p> 753 - <div id="96651430" class="cell"> 753 + <div id="5819836e" class="cell"> 754 754 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[Sample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 755 755 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>shards <span class="op">=</span> dataset.shard_list</span> 756 756 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="co"># ['data-000000.tar', 'data-000001.tar', ..., 'data-000009.tar']</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> ··· 759 759 <section id="metadata" class="level3"> 760 760 <h3 class="anchored" data-anchor-id="metadata">Metadata</h3> 761 761 <p>Datasets can have associated metadata from a URL:</p> 762 - <div id="0fe36b3d" class="cell"> 762 + <div id="d4dc6651" class="cell"> 763 763 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>dataset 
<span class="op">=</span> atdata.Dataset[Sample](</span> 764 764 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"data-{000000..000009}.tar"</span>,</span> 765 765 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a> metadata_url<span class="op">=</span><span class="st">"https://example.com/metadata.msgpack"</span></span> ··· 773 773 <section id="writing-datasets" class="level2"> 774 774 <h2 class="anchored" data-anchor-id="writing-datasets">Writing Datasets</h2> 775 775 <p>Use WebDataset’s <code>TarWriter</code> or <code>ShardWriter</code> to create datasets:</p> 776 - <div id="5b8843dc" class="cell"> 776 + <div id="0ebfbcf7" class="cell"> 777 777 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 778 778 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 779 779 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a></span> ··· 796 796 <section id="parquet-export" class="level2"> 797 797 <h2 class="anchored" data-anchor-id="parquet-export">Parquet Export</h2> 798 798 <p>Export dataset contents to parquet format:</p> 799 - <div id="c8523229" class="cell"> 799 + <div id="369a7e40" class="cell"> 800 800 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Export entire dataset</span></span> 801 801 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>dataset.to_parquet(<span class="st">"output.parquet"</span>)</span> 802 802 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" 
tabindex="-1"></a></span> ··· 847 847 <section id="source" class="level3"> 848 848 <h3 class="anchored" data-anchor-id="source">Source</h3> 849 849 <p>Access the underlying <code>DataSource</code>:</p> 850 - <div id="c088cf54" class="cell"> 850 + <div id="4fce3c8d" class="cell"> 851 851 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[Sample](<span class="st">"data.tar"</span>)</span> 852 852 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> dataset.source <span class="co"># URLSource instance</span></span> 853 853 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(source.shard_list) <span class="co"># ['data.tar']</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> ··· 856 856 <section id="sample-type" class="level3"> 857 857 <h3 class="anchored" data-anchor-id="sample-type">Sample Type</h3> 858 858 <p>Get the type parameter used to create the dataset:</p> 859 - <div id="237b62eb" class="cell"> 859 + <div id="67e26931" class="cell"> 860 860 <div class="sourceCode cell-code" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"data.tar"</span>)</span> 861 861 <span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(dataset.sample_type) <span class="co"># &lt;class 'ImageSample'&gt;</span></span> 862 862 <span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(dataset.batch_type) <span class="co"># 
SampleBatch[ImageSample]</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
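Reviewer note on the `shard_list` example in the hunk above: the docs show `"data-{000000..000009}.tar"` expanding to ten shard names. The sketch below illustrates that numeric brace-range expansion with the standard library only. It is an assumption-laden illustration, not atdata's or WebDataset's implementation; `expand_braces` is a hypothetical helper name, and it handles only the zero-padded `{start..end}` form.

```python
import re

# Matches a numeric range such as {000000..000009}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_braces(pattern: str) -> list:
    """Expand every {start..end} numeric range in `pattern` (hypothetical helper)."""
    match = _RANGE.search(pattern)
    if match is None:
        # No range left: the pattern is already a concrete shard name.
        return [pattern]

    start_text, end_text = match.group(1), match.group(2)
    width = len(start_text)  # keep the zero padding of the start value
    head, tail = pattern[: match.start()], pattern[match.end():]

    shards = []
    for n in range(int(start_text), int(end_text) + 1):
        # Recurse so that patterns with several ranges expand fully.
        shards.extend(expand_braces(f"{head}{n:0{width}d}{tail}"))
    return shards


shards = expand_braces("data-{000000..000009}.tar")
print(len(shards), shards[0], shards[-1])
```

A pattern without a range comes back unchanged as a one-element list, which matches the `['data.tar']` shard list shown for the single-file `Dataset` in the same hunk.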
+10 -10
docs/reference/lenses.html
··· 585 585 <section id="creating-a-lens" class="level2"> 586 586 <h2 class="anchored" data-anchor-id="creating-a-lens">Creating a Lens</h2> 587 587 <p>Use the <code>@lens</code> decorator to define a getter:</p> 588 - <div id="d8be4d9e" class="cell"> 588 + <div id="8e469fe8" class="cell"> 589 589 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 590 590 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 591 591 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a></span> ··· 615 615 <section id="adding-a-putter" class="level2"> 616 616 <h2 class="anchored" data-anchor-id="adding-a-putter">Adding a Putter</h2> 617 617 <p>To enable bidirectional updates, add a putter:</p> 618 - <div id="c7c0b1d2" class="cell"> 618 + <div id="23772595" class="cell"> 619 619 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="at">@simplify.putter</span></span> 620 620 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> simplify_put(view: SimpleSample, source: FullSample) <span class="op">-&gt;</span> FullSample:</span> 621 621 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> FullSample(</span> ··· 635 635 <section id="using-lenses-with-datasets" class="level2"> 636 636 <h2 class="anchored" data-anchor-id="using-lenses-with-datasets">Using Lenses with Datasets</h2> 637 637 <p>Lenses integrate with <code>Dataset.as_type()</code>:</p> 638 - <div id="2f033a35" class="cell"> 638 + <div id="3fc1a303" class="cell"> 639 639 <div 
class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[FullSample](<span class="st">"data-{000000..000009}.tar"</span>)</span> 640 640 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 641 641 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="co"># View through a different type</span></span> ··· 650 650 <section id="direct-lens-usage" class="level2"> 651 651 <h2 class="anchored" data-anchor-id="direct-lens-usage">Direct Lens Usage</h2> 652 652 <p>Lenses can also be called directly:</p> 653 - <div id="59469c09" class="cell"> 653 + <div id="8ada7a56" class="cell"> 654 654 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 655 655 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 656 656 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>full <span class="op">=</span> FullSample(</span> ··· 679 679 <div class="tab-content"> 680 680 <div id="tabset-1-1" class="tab-pane active" role="tabpanel" aria-labelledby="tabset-1-1-tab"> 681 681 <p>If you get a view and immediately put it back, the source is unchanged:</p> 682 - <div id="634c1b5b" class="cell"> 682 + <div id="6ac5163e" class="cell"> 683 683 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>view <span class="op">=</span> lens.get(source)</span> 684 684 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a><span class="cf">assert</span> lens.put(view, source) 
<span class="op">==</span> source</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 685 685 </div> 686 686 </div> 687 687 <div id="tabset-1-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-1-2-tab"> 688 688 <p>If you put a view, getting it back yields that view:</p> 689 - <div id="d9273642" class="cell"> 689 + <div id="86e70b00" class="cell"> 690 690 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>updated <span class="op">=</span> lens.put(view, source)</span> 691 691 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="cf">assert</span> lens.get(updated) <span class="op">==</span> view</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 692 692 </div> 693 693 </div> 694 694 <div id="tabset-1-3" class="tab-pane" role="tabpanel" aria-labelledby="tabset-1-3-tab"> 695 695 <p>Putting twice is equivalent to putting once with the final value:</p> 696 - <div id="4936f018" class="cell"> 696 + <div id="545cbcf1" class="cell"> 697 697 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>result1 <span class="op">=</span> lens.put(v2, lens.put(v1, source))</span> 698 698 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>result2 <span class="op">=</span> lens.put(v2, source)</span> 699 699 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="cf">assert</span> result1 <span class="op">==</span> result2</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> ··· 705 705 <section id="trivial-putter" class="level2"> 706 706 <h2 
class="anchored" data-anchor-id="trivial-putter">Trivial Putter</h2> 707 707 <p>If no putter is defined, a trivial putter is used that ignores view updates:</p> 708 - <div id="cef605de" class="cell"> 708 + <div id="0636b6ac" class="cell"> 709 709 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.lens</span></span> 710 710 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> extract_label(src: FullSample) <span class="op">-&gt;</span> SimpleSample:</span> 711 711 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">return</span> SimpleSample(label<span class="op">=</span>src.label, confidence<span class="op">=</span>src.confidence)</span> ··· 719 719 <section id="lensnetwork-registry" class="level2"> 720 720 <h2 class="anchored" data-anchor-id="lensnetwork-registry">LensNetwork Registry</h2> 721 721 <p>The <code>LensNetwork</code> is a singleton that stores all registered lenses:</p> 722 - <div id="7b7548a8" class="cell"> 722 + <div id="c3b32f7d" class="cell"> 723 723 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.lens <span class="im">import</span> LensNetwork</span> 724 724 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 725 725 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>network <span class="op">=</span> LensNetwork()</span> ··· 736 736 </section> 737 737 <section id="example-feature-extraction" class="level2"> 738 738 <h2 class="anchored" data-anchor-id="example-feature-extraction">Example: Feature Extraction</h2> 739 - <div id="72811776" class="cell"> 739 + <div id="a4bc199f" 
class="cell"> 740 740 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 741 741 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> RawSample:</span> 742 742 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> audio: NDArray</span>
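Note on the lens laws shown in the hunk above: the three well-behavedness checks (get-then-put, put-then-get, put-twice) and the trivial putter can be exercised without atdata at all. The following is a minimal sketch, not the library's implementation. The `Lens` class, the `notes` field, and the helper names are hypothetical stand-ins; only the `label` and `confidence` fields come from the documented example.

```python
from dataclasses import dataclass, replace
from typing import Callable, Generic, TypeVar

S = TypeVar("S")
V = TypeVar("V")


@dataclass(frozen=True)
class FullSample:
    label: str
    confidence: float
    notes: str  # hypothetical extra field that the view does not expose


@dataclass(frozen=True)
class SimpleSample:
    label: str
    confidence: float


@dataclass(frozen=True)
class Lens(Generic[S, V]):
    """Hypothetical stand-in for a lens: a getter paired with a putter."""
    get: Callable[[S], V]
    put: Callable[[V, S], S]


def _get(src: FullSample) -> SimpleSample:
    # Project the source down to the fields the view exposes.
    return SimpleSample(label=src.label, confidence=src.confidence)


def _put(view: SimpleSample, src: FullSample) -> FullSample:
    # Write the view's fields back, keeping everything else from the source.
    return replace(src, label=view.label, confidence=view.confidence)


lens = Lens(get=_get, put=_put)

source = FullSample(label="cat", confidence=0.9, notes="blurry")
v1 = SimpleSample(label="dog", confidence=0.5)
v2 = SimpleSample(label="fox", confidence=0.7)

# GetPut: putting back what you got leaves the source unchanged.
getput_holds = lens.put(lens.get(source), source) == source
# PutGet: getting after a put yields the view that was put.
putget_holds = lens.get(lens.put(v1, source)) == v1
# PutPut: two puts are equivalent to one put of the final value.
putput_holds = lens.put(v2, lens.put(v1, source)) == lens.put(v2, source)

# A trivial putter ignores the view, so PutGet no longer holds for it.
trivial = Lens(get=_get, put=lambda view, src: src)
trivial_putget_holds = trivial.get(trivial.put(v1, source)) == v1
```

With a real putter all three laws hold for this pair of types. With the trivial putter the view update is dropped, which is why the put-then-get check fails there.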
+12 -12
docs/reference/load-dataset.html
··· 594 594 </section> 595 595 <section id="basic-usage" class="level2"> 596 596 <h2 class="anchored" data-anchor-id="basic-usage">Basic Usage</h2> 597 - <div id="4d4d1559" class="cell"> 597 + <div id="f2161b26" class="cell"> 598 598 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 599 599 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata <span class="im">import</span> load_dataset</span> 600 600 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> ··· 617 617 <h2 class="anchored" data-anchor-id="path-formats">Path Formats</h2> 618 618 <section id="webdataset-brace-notation" class="level3"> 619 619 <h3 class="anchored" data-anchor-id="webdataset-brace-notation">WebDataset Brace Notation</h3> 620 - <div id="f383e82d" class="cell"> 620 + <div id="6ada3862" class="cell"> 621 621 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Range notation</span></span> 622 622 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(<span class="st">"data-{000000..000099}.tar"</span>, MySample, split<span class="op">=</span><span class="st">"train"</span>)</span> 623 623 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span> ··· 627 627 </section> 628 628 <section id="glob-patterns" class="level3"> 629 629 <h3 class="anchored" data-anchor-id="glob-patterns">Glob Patterns</h3> 630 - <div id="66bf9de6" class="cell"> 630 + <div id="610d80d4" class="cell"> 631 631 <div class="sourceCode cell-code" 
id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Match all tar files</span></span> 632 632 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(<span class="st">"path/to/*.tar"</span>, MySample)</span> 633 633 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span> ··· 637 637 </section> 638 638 <section id="local-directory" class="level3"> 639 639 <h3 class="anchored" data-anchor-id="local-directory">Local Directory</h3> 640 - <div id="98f56181" class="cell"> 640 + <div id="9e9b33de" class="cell"> 641 641 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Scans for .tar files</span></span> 642 642 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(<span class="st">"./my-dataset/"</span>, MySample)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 643 643 </div> 644 644 </section> 645 645 <section id="remote-urls" class="level3"> 646 646 <h3 class="anchored" data-anchor-id="remote-urls">Remote URLs</h3> 647 - <div id="8c1287e4" class="cell"> 647 + <div id="57f4b2e2" class="cell"> 648 648 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># S3 (public buckets)</span></span> 649 649 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(<span class="st">"s3://bucket/data-{000..099}.tar"</span>, MySample, split<span class="op">=</span><span 
class="st">"train"</span>)</span> 650 650 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span> ··· 670 670 </section> 671 671 <section id="index-lookup" class="level3"> 672 672 <h3 class="anchored" data-anchor-id="index-lookup">Index Lookup</h3> 673 - <div id="e8e3bd9a" class="cell"> 673 + <div id="9131d1a1" class="cell"> 674 674 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 675 675 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a></span> 676 676 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> LocalIndex()</span> ··· 737 737 <section id="datasetdict" class="level2"> 738 738 <h2 class="anchored" data-anchor-id="datasetdict">DatasetDict</h2> 739 739 <p>When loading without <code>split=</code>, returns a <code>DatasetDict</code>:</p> 740 - <div id="45c2fb32" class="cell"> 740 + <div id="5ad65dd0" class="cell"> 741 741 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>ds_dict <span class="op">=</span> load_dataset(<span class="st">"path/to/data/"</span>, MySample)</span> 742 742 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span> 743 743 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Access splits</span></span> ··· 757 757 <section id="explicit-data-files" class="level2"> 758 758 <h2 class="anchored" data-anchor-id="explicit-data-files">Explicit Data Files</h2> 759 759 <p>Override automatic detection with <code>data_files</code>:</p> 760 - <div id="fdfc9d5b" class="cell"> 760 + <div id="44ab4049" class="cell"> 761 761 <div 
class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Single pattern</span></span> 762 762 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> load_dataset(</span> 763 763 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"path/to/"</span>,</span> ··· 786 786 <section id="streaming-mode" class="level2"> 787 787 <h2 class="anchored" data-anchor-id="streaming-mode">Streaming Mode</h2> 788 788 <p>The <code>streaming</code> parameter signals intent for streaming mode:</p> 789 - <div id="973b8c68" class="cell"> 789 + <div id="a0a85527" class="cell"> 790 790 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Mark as streaming</span></span> 791 791 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>ds_dict <span class="op">=</span> load_dataset(<span class="st">"path/to/data.tar"</span>, MySample, streaming<span class="op">=</span><span class="va">True</span>)</span> 792 792 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a></span> ··· 811 811 <section id="auto-type-resolution" class="level2"> 812 812 <h2 class="anchored" data-anchor-id="auto-type-resolution">Auto Type Resolution</h2> 813 813 <p>When using index lookup, the sample type can be resolved automatically:</p> 814 - <div id="73592d21" class="cell"> 814 + <div id="0b982c6c" class="cell"> 815 815 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 816 
816 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span> 817 817 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> LocalIndex()</span> ··· 825 825 </section> 826 826 <section id="error-handling" class="level2"> 827 827 <h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2> 828 - <div id="a95e0c73" class="cell"> 828 + <div id="decfab55" class="cell"> 829 829 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="cf">try</span>:</span> 830 830 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> ds <span class="op">=</span> load_dataset(<span class="st">"path/to/data.tar"</span>, MySample, split<span class="op">=</span><span class="st">"train"</span>)</span> 831 831 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="cf">except</span> <span class="pp">FileNotFoundError</span>:</span> ··· 841 841 </section> 842 842 <section id="complete-example" class="level2"> 843 843 <h2 class="anchored" data-anchor-id="complete-example">Complete Example</h2> 844 - <div id="e5dd31bb" class="cell"> 844 + <div id="f793e66a" class="cell"> 845 845 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 846 846 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 847 847 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span>
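Note on the brace notation in the load-dataset hunk above: a pattern such as `data-{000000..000099}.tar` names one shard per integer in the range, zero-padded to the width of the start value. The sketch below illustrates that expansion using only the standard library. `expand_numeric_range` is a hypothetical helper, not part of atdata or WebDataset, and it handles only the first numeric range in a pattern.

```python
import re

# Matches a numeric range such as {000000..000099}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_numeric_range(pattern: str) -> list[str]:
    """Expand the first {start..stop} range in a shard pattern (illustrative only)."""
    match = _RANGE.search(pattern)
    if match is None:
        # No range present: the pattern names a single file.
        return [pattern]
    start_text, stop_text = match.group(1), match.group(2)
    width = len(start_text)  # preserve the zero padding of the start value
    head = pattern[: match.start()]
    tail = pattern[match.end():]
    return [
        f"{head}{number:0{width}d}{tail}"
        for number in range(int(start_text), int(stop_text) + 1)
    ]


shards = expand_numeric_range("data-{000000..000099}.tar")
```

The range is inclusive at both ends, so the pattern above yields 100 shard names, from `data-000000.tar` through `data-000099.tar`.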
+11 -11
docs/reference/local-storage.html
··· 593 593 <section id="localindex" class="level2"> 594 594 <h2 class="anchored" data-anchor-id="localindex">LocalIndex</h2> 595 595 <p>The index tracks datasets in Redis:</p> 596 - <div id="9eda0b05" class="cell"> 596 + <div id="c9920a8a" class="cell"> 597 597 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 598 598 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span> 599 599 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Default connection (localhost:6379)</span></span> ··· 609 609 </div> 610 610 <section id="adding-entries" class="level3"> 611 611 <h3 class="anchored" data-anchor-id="adding-entries">Adding Entries</h3> 612 - <div id="06fc3264" class="cell"> 612 + <div id="18fb47fd" class="cell"> 613 613 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 614 614 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 615 615 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a></span> ··· 634 634 </section> 635 635 <section id="listing-and-retrieving" class="level3"> 636 636 <h3 class="anchored" data-anchor-id="listing-and-retrieving">Listing and Retrieving</h3> 637 - <div id="ad6bfc60" class="cell"> 637 + <div id="b0fa453f" class="cell"> 638 638 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Iterate 
all entries</span></span> 639 639 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> entry <span class="kw">in</span> index.entries:</span> 640 640 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"</span><span class="sc">{</span>entry<span class="sc">.</span>name<span class="sc">}</span><span class="ss">: </span><span class="sc">{</span>entry<span class="sc">.</span>cid<span class="sc">}</span><span class="ss">"</span>)</span> ··· 666 666 </div> 667 667 </div> 668 668 <p>The Repo class combines S3 storage with Redis indexing:</p> 669 - <div id="29ea9596" class="cell"> 669 + <div id="9c79f00d" class="cell"> 670 670 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> Repo</span> 671 671 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 672 672 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co"># From credentials file</span></span> ··· 686 686 <span id="cb4-17"><a href="#cb4-17" aria-hidden="true" tabindex="-1"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 687 687 </div> 688 688 <p><strong>Preferred approach</strong> - Use <code>LocalIndex</code> with <code>S3DataStore</code>:</p> 689 - <div id="6c613634" class="cell"> 689 + <div id="812b9d3f" class="cell"> 690 690 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex, S3DataStore</span> 691 691 <span id="cb5-2"><a href="#cb5-2" 
aria-hidden="true" tabindex="-1"></a></span> 692 692 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> S3DataStore(</span> ··· 724 724 </section> 725 725 <section id="inserting-datasets" class="level3"> 726 726 <h3 class="anchored" data-anchor-id="inserting-datasets">Inserting Datasets</h3> 727 - <div id="7f3d98eb" class="cell"> 727 + <div id="137b7e3b" class="cell"> 728 728 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 729 729 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 730 730 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span> ··· 754 754 </section> 755 755 <section id="insert-options" class="level3"> 756 756 <h3 class="anchored" data-anchor-id="insert-options">Insert Options</h3> 757 - <div id="7983708f" class="cell"> 757 + <div id="11cfaf93" class="cell"> 758 758 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>entry, ds <span class="op">=</span> repo.insert(</span> 759 759 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a> dataset,</span> 760 760 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"my-dataset"</span>,</span> ··· 768 768 <section id="localdatasetentry" class="level2"> 769 769 <h2 class="anchored" data-anchor-id="localdatasetentry">LocalDatasetEntry</h2> 770 770 <p>Index entries provide content-addressable identification:</p> 771 - <div id="9cc14a25" class="cell"> 771 + <div id="475245f1" class="cell"> 772 772 <div 
class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.get_entry_by_name(<span class="st">"my-dataset"</span>)</span> 773 773 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 774 774 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Core properties (IndexEntry protocol)</span></span> ··· 801 801 <section id="schema-storage" class="level2"> 802 802 <h2 class="anchored" data-anchor-id="schema-storage">Schema Storage</h2> 803 803 <p>Schemas can be stored and retrieved from the index:</p> 804 - <div id="10fa70b6" class="cell"> 804 + <div id="726131f5" class="cell"> 805 805 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish a schema</span></span> 806 806 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>schema_ref <span class="op">=</span> index.publish_schema(</span> 807 807 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> ImageSample,</span> ··· 832 832 <section id="s3datastore" class="level2"> 833 833 <h2 class="anchored" data-anchor-id="s3datastore">S3DataStore</h2> 834 834 <p>For direct S3 operations without Redis indexing:</p> 835 - <div id="e7ebf388" class="cell"> 835 + <div id="9e2c3eda" class="cell"> 836 836 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> S3DataStore</span> 837 837 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a></span> 838 838 <span id="cb11-3"><a 
href="#cb11-3" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> S3DataStore(</span> ··· 854 854 </section> 855 855 <section id="complete-workflow-example" class="level2"> 856 856 <h2 class="anchored" data-anchor-id="complete-workflow-example">Complete Workflow Example</h2> 857 - <div id="315c6d71" class="cell"> 857 + <div id="7ef04fda" class="cell"> 858 858 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 859 859 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 860 860 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span>
+12 -12
docs/reference/packable-samples.html
··· 588 588 <section id="the-packable-decorator" class="level2"> 589 589 <h2 class="anchored" data-anchor-id="the-packable-decorator">The <code>@packable</code> Decorator</h2> 590 590 <p>The recommended way to define a sample type is with the <code>@packable</code> decorator:</p> 591 - <div id="c7fe1d4d" class="cell"> 591 + <div id="94abf55d" class="cell"> 592 592 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 593 593 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 594 594 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 610 610 <h2 class="anchored" data-anchor-id="supported-field-types">Supported Field Types</h2> 611 611 <section id="primitives" class="level3"> 612 612 <h3 class="anchored" data-anchor-id="primitives">Primitives</h3> 613 - <div id="726a3050" class="cell"> 613 + <div id="cc7f0a01" class="cell"> 614 614 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 615 615 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> PrimitiveSample:</span> 616 616 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> name: <span class="bu">str</span></span> ··· 623 623 <section id="numpy-arrays" class="level3"> 624 624 <h3 class="anchored" data-anchor-id="numpy-arrays">NumPy Arrays</h3> 625 625 <p>Fields annotated as <code>NDArray</code> are automatically converted:</p> 626 - <div id="5d1ff141" class="cell"> 626 + <div 
id="1f91f2d7" class="cell"> 627 627 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 628 628 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ArraySample:</span> 629 629 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> features: NDArray <span class="co"># Required array</span></span> ··· 645 645 </section> 646 646 <section id="lists" class="level3"> 647 647 <h3 class="anchored" data-anchor-id="lists">Lists</h3> 648 - <div id="b6955e98" class="cell"> 648 + <div id="9302a515" class="cell"> 649 649 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 650 650 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ListSample:</span> 651 651 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> tags: <span class="bu">list</span>[<span class="bu">str</span>]</span> ··· 657 657 <h2 class="anchored" data-anchor-id="serialization">Serialization</h2> 658 658 <section id="packing-to-bytes" class="level3"> 659 659 <h3 class="anchored" data-anchor-id="packing-to-bytes">Packing to Bytes</h3> 660 - <div id="f3a59094" class="cell"> 660 + <div id="90438dc8" class="cell"> 661 661 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>sample <span class="op">=</span> ImageSample(</span> 662 662 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> image<span class="op">=</span>np.random.rand(<span class="dv">224</span>, <span 
class="dv">224</span>, <span class="dv">3</span>).astype(np.float32),</span> 663 663 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> label<span class="op">=</span><span class="st">"cat"</span>,</span> ··· 671 671 </section> 672 672 <section id="unpacking-from-bytes" class="level3"> 673 673 <h3 class="anchored" data-anchor-id="unpacking-from-bytes">Unpacking from Bytes</h3> 674 - <div id="3cef8193" class="cell"> 674 + <div id="b6063f2c" class="cell"> 675 675 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Deserialize from bytes</span></span> 676 676 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>restored <span class="op">=</span> ImageSample.from_bytes(packed_bytes)</span> 677 677 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a></span> ··· 683 683 <section id="webdataset-format" class="level3"> 684 684 <h3 class="anchored" data-anchor-id="webdataset-format">WebDataset Format</h3> 685 685 <p>The <code>as_wds</code> property returns a dict ready for WebDataset:</p> 686 - <div id="6a42126d" class="cell"> 686 + <div id="47d071b5" class="cell"> 687 687 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>wds_dict <span class="op">=</span> sample.as_wds</span> 688 688 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="co"># {'__key__': '1234...', 'msgpack': b'...'}</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 689 689 </div> 690 690 <p>Write samples to a tar file:</p> 691 - <div id="f3b5feb9" class="cell"> 691 + <div id="b50b3d88" class="cell"> 692 692 <div class="sourceCode cell-code" id="cb8"><pre 
class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 693 693 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a></span> 694 694 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(<span class="st">"data-000000.tar"</span>) <span class="im">as</span> sink:</span> ··· 701 701 <section id="direct-inheritance-alternative" class="level2"> 702 702 <h2 class="anchored" data-anchor-id="direct-inheritance-alternative">Direct Inheritance (Alternative)</h2> 703 703 <p>You can also inherit directly from <code>PackableSample</code>:</p> 704 - <div id="d7828cb0" class="cell"> 704 + <div id="4cbad78a" class="cell"> 705 705 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> dataclasses <span class="im">import</span> dataclass</span> 706 706 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 707 707 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="at">@dataclass</span></span> ··· 739 739 <section id="the-_ensure_good-method" class="level3"> 740 740 <h3 class="anchored" data-anchor-id="the-_ensure_good-method">The <code>_ensure_good()</code> Method</h3> 741 741 <p>This method runs automatically after construction and handles NDArray conversion:</p> 742 - <div id="821f434d" class="cell"> 742 + <div id="60472d01" class="cell"> 743 743 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> _ensure_good(<span 
class="va">self</span>):</span> 744 744 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> field <span class="kw">in</span> dataclasses.fields(<span class="va">self</span>):</span> 745 745 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> _is_possibly_ndarray_type(field.<span class="bu">type</span>):</span> ··· 755 755 <ul class="nav nav-tabs" role="tablist"><li class="nav-item" role="presentation"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" role="tab" aria-controls="tabset-2-1" aria-selected="true">Do</a></li><li class="nav-item" role="presentation"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" role="tab" aria-controls="tabset-2-2" aria-selected="false">Don’t</a></li></ul> 756 756 <div class="tab-content"> 757 757 <div id="tabset-2-1" class="tab-pane active" role="tabpanel" aria-labelledby="tabset-2-1-tab"> 758 - <div id="55dd65fe" class="cell"> 758 + <div id="7d6b1360" class="cell"> 759 759 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 760 760 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> GoodSample:</span> 761 761 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> features: NDArray <span class="co"># Clear type annotation</span></span> ··· 765 765 </div> 766 766 </div> 767 767 <div id="tabset-2-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-2-2-tab"> 768 - <div id="726bdb4d" class="cell"> 768 + <div id="204cf068" class="cell"> 769 769 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a 
href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 770 770 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> BadSample:</span> 771 771 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a> <span class="co"># DON'T: Nested dataclasses not supported</span></span>
+7 -7
docs/reference/promotion.html
··· 584 584 </section> 585 585 <section id="basic-usage" class="level2"> 586 586 <h2 class="anchored" data-anchor-id="basic-usage">Basic Usage</h2> 587 - <div id="4a508603" class="cell"> 587 + <div id="d2fdd123" class="cell"> 588 588 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex</span> 589 589 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereClient</span> 590 590 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.promote <span class="im">import</span> promote_to_atmosphere</span> ··· 604 604 </section> 605 605 <section id="with-metadata" class="level2"> 606 606 <h2 class="anchored" data-anchor-id="with-metadata">With Metadata</h2> 607 - <div id="5f1b68d6" class="cell"> 607 + <div id="f9d07277" class="cell"> 608 608 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>at_uri <span class="op">=</span> promote_to_atmosphere(</span> 609 609 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a> entry,</span> 610 610 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> local_index,</span> ··· 619 619 <section id="schema-deduplication" class="level2"> 620 620 <h2 class="anchored" data-anchor-id="schema-deduplication">Schema Deduplication</h2> 621 621 <p>The promotion workflow automatically checks for existing schemas:</p> 622 - <div id="5ebcd712" class="cell"> 622 + <div id="f1f595bc" class="cell"> 623 623 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code 
class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># First promotion: publishes schema</span></span> 624 624 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>uri1 <span class="op">=</span> promote_to_atmosphere(entry1, local_index, client)</span> 625 625 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a></span> ··· 639 639 <div class="tab-content"> 640 640 <div id="tabset-1-1" class="tab-pane active" role="tabpanel" aria-labelledby="tabset-1-1-tab"> 641 641 <p>By default, promotion keeps the original data URLs:</p> 642 - <div id="c2e3746b" class="cell"> 642 + <div id="810334d9" class="cell"> 643 643 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Data stays in original S3 location</span></span> 644 644 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>at_uri <span class="op">=</span> promote_to_atmosphere(entry, local_index, client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 645 645 </div> ··· 652 652 </div> 653 653 <div id="tabset-1-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-1-2-tab"> 654 654 <p>To copy data to a different storage location:</p> 655 - <div id="560f6131" class="cell"> 655 + <div id="9fc50efb" class="cell"> 656 656 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> S3DataStore</span> 657 657 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 658 658 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Create 
new data store</span></span> ··· 680 680 </section> 681 681 <section id="complete-workflow-example" class="level2"> 682 682 <h2 class="anchored" data-anchor-id="complete-workflow-example">Complete Workflow Example</h2> 683 - <div id="298a5d1e" class="cell"> 683 + <div id="e5a75aed" class="cell"> 684 684 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 685 685 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 686 686 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 751 751 </section> 752 752 <section id="error-handling" class="level2"> 753 753 <h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2> 754 - <div id="d6f7527e" class="cell"> 754 + <div id="cae0ca6d" class="cell"> 755 755 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="cf">try</span>:</span> 756 756 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> at_uri <span class="op">=</span> promote_to_atmosphere(entry, local_index, client)</span> 757 757 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="cf">except</span> <span class="pp">KeyError</span> <span class="im">as</span> e:</span>
+12 -12
docs/reference/protocols.html
··· 605 605 <section id="indexentry-protocol" class="level2"> 606 606 <h2 class="anchored" data-anchor-id="indexentry-protocol">IndexEntry Protocol</h2> 607 607 <p>Represents a dataset entry in any index:</p> 608 - <div id="fd34618d" class="cell"> 608 + <div id="aefd7844" class="cell"> 609 609 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> IndexEntry</span> 610 610 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span> 611 611 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> process_entry(entry: IndexEntry) <span class="op">-&gt;</span> <span class="va">None</span>:</span> ··· 659 659 <section id="abstractindex-protocol" class="level2"> 660 660 <h2 class="anchored" data-anchor-id="abstractindex-protocol">AbstractIndex Protocol</h2> 661 661 <p>Defines operations for managing schemas and datasets:</p> 662 - <div id="787a9642" class="cell"> 662 + <div id="1e72213b" class="cell"> 663 663 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> AbstractIndex</span> 664 664 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span> 665 665 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> list_all_datasets(index: AbstractIndex) <span class="op">-&gt;</span> <span class="va">None</span>:</span> ··· 669 669 </div> 670 670 <section id="dataset-operations" class="level3"> 671 671 <h3 class="anchored" data-anchor-id="dataset-operations">Dataset Operations</h3> 672 - <div id="60dc2e88" class="cell"> 672 + 
<div id="d8eeea73" class="cell"> 673 673 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Insert a dataset</span></span> 674 674 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 675 675 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> dataset,</span> ··· 687 687 </section> 688 688 <section id="schema-operations" class="level3"> 689 689 <h3 class="anchored" data-anchor-id="schema-operations">Schema Operations</h3> 690 - <div id="d74fb8cf" class="cell"> 690 + <div id="4e007e1a" class="cell"> 691 691 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish a schema</span></span> 692 692 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>schema_ref <span class="op">=</span> index.publish_schema(</span> 693 693 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> MySample,</span> ··· 718 718 <section id="abstractdatastore-protocol" class="level2"> 719 719 <h2 class="anchored" data-anchor-id="abstractdatastore-protocol">AbstractDataStore Protocol</h2> 720 720 <p>Abstracts over different storage backends:</p> 721 - <div id="681d1f3b" class="cell"> 721 + <div id="8481566e" class="cell"> 722 722 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> AbstractDataStore</span> 723 723 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 724 724 <span id="cb5-3"><a href="#cb5-3" 
aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> write_dataset(store: AbstractDataStore, dataset) <span class="op">-&gt;</span> <span class="bu">list</span>[<span class="bu">str</span>]:</span> ··· 728 728 </div> 729 729 <section id="methods" class="level3"> 730 730 <h3 class="anchored" data-anchor-id="methods">Methods</h3> 731 - <div id="5556c4b7" class="cell"> 731 + <div id="6a693b70" class="cell"> 732 732 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Write dataset shards</span></span> 733 733 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>urls <span class="op">=</span> store.write_shards(</span> 734 734 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> dataset,</span> ··· 755 755 <section id="datasource-protocol" class="level2"> 756 756 <h2 class="anchored" data-anchor-id="datasource-protocol">DataSource Protocol</h2> 757 757 <p>Abstracts over different data source backends for streaming dataset shards:</p> 758 - <div id="fd9a5c6c" class="cell"> 758 + <div id="692f8fa5" class="cell"> 759 759 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> DataSource</span> 760 760 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span> 761 761 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> load_from_source(source: DataSource) <span class="op">-&gt;</span> <span class="va">None</span>:</span> ··· 768 768 </div> 769 769 <section id="methods-1" class="level3"> 770 770 <h3 class="anchored" data-anchor-id="methods-1">Methods</h3> 771 - <div id="9585dc37" class="cell"> 
771 + <div id="d3ea7448" class="cell"> 772 772 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Get list of shard identifiers</span></span> 773 773 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>shard_ids <span class="op">=</span> source.shard_list <span class="co"># ['data-000000.tar', 'data-000001.tar', ...]</span></span> 774 774 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a></span> ··· 791 791 <section id="creating-custom-data-sources" class="level3"> 792 792 <h3 class="anchored" data-anchor-id="creating-custom-data-sources">Creating Custom Data Sources</h3> 793 793 <p>Implement the <code>DataSource</code> protocol for custom backends:</p> 794 - <div id="d872ea12" class="cell"> 794 + <div id="9372d738" class="cell"> 795 795 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> typing <span class="im">import</span> Iterator, IO</span> 796 796 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> DataSource</span> 797 797 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a></span> ··· 829 829 <section id="using-protocols-for-polymorphism" class="level2"> 830 830 <h2 class="anchored" data-anchor-id="using-protocols-for-polymorphism">Using Protocols for Polymorphism</h2> 831 831 <p>Write code that works with any backend:</p> 832 - <div id="a0cb9067" class="cell"> 832 + <div id="090796d5" class="cell"> 833 833 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" 
tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> AbstractIndex, IndexEntry</span> 834 834 <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata <span class="im">import</span> Dataset</span> 835 835 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a></span> ··· 900 900 <section id="type-checking" class="level2"> 901 901 <h2 class="anchored" data-anchor-id="type-checking">Type Checking</h2> 902 902 <p>Protocols are runtime-checkable:</p> 903 - <div id="915ac335" class="cell"> 903 + <div id="09ca5138" class="cell"> 904 904 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata._protocols <span class="im">import</span> IndexEntry, AbstractIndex</span> 905 905 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a></span> 906 906 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Check if object implements protocol</span></span> ··· 914 914 </section> 915 915 <section id="complete-example" class="level2"> 916 916 <h2 class="anchored" data-anchor-id="complete-example">Complete Example</h2> 917 - <div id="3773658b" class="cell"> 917 + <div id="4107bbb2" class="cell"> 918 918 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 919 919 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> LocalIndex, S3DataStore</span> 920 920 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere 
<span class="im">import</span> AtmosphereClient, AtmosphereIndex</span>
+2 -2
docs/reference/uri-spec.html
··· 675 675 <h2 class="anchored" data-anchor-id="examples">Examples</h2> 676 676 <section id="local-development" class="level3"> 677 677 <h3 class="anchored" data-anchor-id="local-development">Local Development</h3> 678 - <div id="f4a7e7c1" class="cell"> 678 + <div id="0ec519c5" class="cell"> 679 679 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> Index</span> 680 680 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span> 681 681 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> Index()</span> ··· 694 694 </section> 695 695 <section id="atmosphere-atproto-federation" class="level3"> 696 696 <h3 class="anchored" data-anchor-id="atmosphere-atproto-federation">Atmosphere (ATProto Federation)</h3> 697 - <div id="85edc00e" class="cell"> 697 + <div id="80732fae" class="cell"> 698 698 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> Client</span> 699 699 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 700 700 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> Client()</span>
+133 -53
docs/search.json
··· 1072 1072 "text": "Name\nDescription\n\n\n\n\nnum_shards\nNumber of shards in each split.\n\n\nsample_type\nThe sample type for datasets in this dict.\n\n\nstreaming\nWhether this DatasetDict was loaded in streaming mode." 1073 1073 }, 1074 1074 { 1075 - "objectID": "api/AtmosphereClient.html", 1076 - "href": "api/AtmosphereClient.html", 1077 - "title": "AtmosphereClient", 1075 + "objectID": "api/PackableSample.html", 1076 + "href": "api/PackableSample.html", 1077 + "title": "PackableSample", 1078 + "section": "", 1079 + "text": "PackableSample()\nBase class for samples that can be serialized with msgpack.\nThis abstract base class provides automatic serialization/deserialization for dataclass-based samples. Fields annotated as NDArray or NDArray | None are automatically converted between numpy arrays and bytes during packing/unpacking.\nSubclasses should be defined either by: 1. Direct inheritance with the @dataclass decorator 2. Using the @packable decorator (recommended)\n\n\n::\n&gt;&gt;&gt; @packable\n... class MyData:\n... name: str\n... 
embeddings: NDArray\n...\n&gt;&gt;&gt; sample = MyData(name=\"test\", embeddings=np.array([1.0, 2.0]))\n&gt;&gt;&gt; packed = sample.packed # Serialize to bytes\n&gt;&gt;&gt; restored = MyData.from_bytes(packed) # Deserialize\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nas_wds\nPack this sample’s data for writing to WebDataset.\n\n\npacked\nPack this sample’s data into msgpack bytes.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_bytes\nCreate a sample instance from raw msgpack bytes.\n\n\nfrom_data\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nPackableSample.from_bytes(bs)\nCreate a sample instance from raw msgpack bytes.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbs\nbytes\nRaw bytes from a msgpack-serialized sample.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSelf\nA new instance of this sample class deserialized from the bytes.\n\n\n\n\n\n\n\nPackableSample.from_data(data)\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\nWDSRawSample\nDictionary with keys matching the sample’s field names.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSelf\nNew instance with NDArray fields auto-converted from bytes." 1080 + }, 1081 + { 1082 + "objectID": "api/PackableSample.html#example", 1083 + "href": "api/PackableSample.html#example", 1084 + "title": "PackableSample", 1085 + "section": "", 1086 + "text": "::\n&gt;&gt;&gt; @packable\n... class MyData:\n... name: str\n... 
embeddings: NDArray\n...\n&gt;&gt;&gt; sample = MyData(name=\"test\", embeddings=np.array([1.0, 2.0]))\n&gt;&gt;&gt; packed = sample.packed # Serialize to bytes\n&gt;&gt;&gt; restored = MyData.from_bytes(packed) # Deserialize" 1087 + }, 1088 + { 1089 + "objectID": "api/PackableSample.html#attributes", 1090 + "href": "api/PackableSample.html#attributes", 1091 + "title": "PackableSample", 1092 + "section": "", 1093 + "text": "Name\nDescription\n\n\n\n\nas_wds\nPack this sample’s data for writing to WebDataset.\n\n\npacked\nPack this sample’s data into msgpack bytes." 1094 + }, 1095 + { 1096 + "objectID": "api/PackableSample.html#methods", 1097 + "href": "api/PackableSample.html#methods", 1098 + "title": "PackableSample", 1078 1099 "section": "", 1079 - "text": "atmosphere.AtmosphereClient(base_url=None, *, _client=None)\nATProto client wrapper for atdata operations.\nThis class wraps the atproto SDK client and provides higher-level methods for working with atdata records (schemas, datasets, lenses).\n\n\n::\n&gt;&gt;&gt; client = AtmosphereClient()\n&gt;&gt;&gt; client.login(\"alice.bsky.social\", \"app-password\")\n&gt;&gt;&gt; print(client.did)\n'did:plc:...'\n\n\n\nThe password should be an app-specific password, not your main account password. 
Create app passwords in your Bluesky account settings.\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndid\nGet the DID of the authenticated user.\n\n\nhandle\nGet the handle of the authenticated user.\n\n\nis_authenticated\nCheck if the client has a valid session.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ncreate_record\nCreate a record in the user’s repository.\n\n\ndelete_record\nDelete a record.\n\n\nexport_session\nExport the current session for later reuse.\n\n\nget_blob\nDownload a blob from a PDS.\n\n\nget_blob_url\nGet the direct URL for fetching a blob.\n\n\nget_record\nFetch a record by AT URI.\n\n\nlist_datasets\nList dataset records.\n\n\nlist_lenses\nList lens records.\n\n\nlist_records\nList records in a collection.\n\n\nlist_schemas\nList schema records.\n\n\nlogin\nAuthenticate with the ATProto PDS.\n\n\nlogin_with_session\nAuthenticate using an exported session string.\n\n\nput_record\nCreate or update a record at a specific key.\n\n\nupload_blob\nUpload binary data as a blob to the PDS.\n\n\n\n\n\natmosphere.AtmosphereClient.create_record(\n collection,\n record,\n *,\n rkey=None,\n validate=False,\n)\nCreate a record in the user’s repository.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection (e.g., ‘ac.foundation.dataset.sampleSchema’).\nrequired\n\n\nrecord\ndict\nThe record data. Must include a ‘$type’ field.\nrequired\n\n\nrkey\nOptional[str]\nOptional explicit record key. If not provided, a TID is generated.\nNone\n\n\nvalidate\nbool\nWhether to validate against the Lexicon schema. 
Set to False for custom lexicons that the PDS doesn’t know about.\nFalse\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf record creation fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.delete_record(uri, *, swap_commit=None)\nDelete a record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the record to delete.\nrequired\n\n\nswap_commit\nOptional[str]\nOptional CID for compare-and-swap delete.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf deletion fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.export_session()\nExport the current session for later reuse.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSession string that can be passed to login_with_session().\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_blob(did, cid)\nDownload a blob from a PDS.\nThis resolves the PDS endpoint from the DID document and fetches the blob directly from the PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndid\nstr\nThe DID of the repository containing the blob.\nrequired\n\n\ncid\nstr\nThe CID of the blob.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbytes\nThe blob data as bytes.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf PDS endpoint cannot be resolved.\n\n\n\nrequests.HTTPError\nIf blob fetch fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_blob_url(did, cid)\nGet the direct URL for fetching a blob.\nThis is useful for passing to WebDataset or other HTTP clients.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndid\nstr\nThe DID of the repository containing the blob.\nrequired\n\n\ncid\nstr\nThe CID of the 
blob.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nThe full URL for fetching the blob.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf PDS endpoint cannot be resolved.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_record(uri)\nFetch a record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nThe record data as a dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\natproto.exceptions.AtProtocolError\nIf record not found.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_datasets(repo=None, limit=100)\nList dataset records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of dataset records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_lenses(repo=None, limit=100)\nList lens records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of lens records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_records(\n collection,\n *,\n repo=None,\n limit=100,\n cursor=None,\n)\nList records in a collection.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection.\nrequired\n\n\nrepo\nOptional[str]\nThe DID of the repository to query. Defaults to the authenticated user’s repository.\nNone\n\n\nlimit\nint\nMaximum number of records to return (default 100).\n100\n\n\ncursor\nOptional[str]\nPagination cursor from a previous call.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nA tuple of (records, next_cursor). 
The cursor is None if there\n\n\n\nOptional[str]\nare no more records.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf repo is None and not authenticated.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_schemas(repo=None, limit=100)\nList schema records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.login(handle, password)\nAuthenticate with the ATProto PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nhandle\nstr\nYour Bluesky handle (e.g., ‘alice.bsky.social’).\nrequired\n\n\npassword\nstr\nApp-specific password (not your main password).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\natproto.exceptions.AtProtocolError\nIf authentication fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.login_with_session(session_string)\nAuthenticate using an exported session string.\nThis allows reusing a session without re-authenticating, which helps avoid rate limits on session creation.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsession_string\nstr\nSession string from export_session().\nrequired\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.put_record(\n collection,\n rkey,\n record,\n *,\n validate=False,\n swap_commit=None,\n)\nCreate or update a record at a specific key.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection.\nrequired\n\n\nrkey\nstr\nThe record key.\nrequired\n\n\nrecord\ndict\nThe record data. 
Must include a ‘$type’ field.\nrequired\n\n\nvalidate\nbool\nWhether to validate against the Lexicon schema.\nFalse\n\n\nswap_commit\nOptional[str]\nOptional CID for compare-and-swap update.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf operation fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.upload_blob(\n data,\n mime_type='application/octet-stream',\n)\nUpload binary data as a blob to the PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\nbytes\nBinary data to upload.\nrequired\n\n\nmime_type\nstr\nMIME type of the data (for reference, not enforced by PDS).\n'application/octet-stream'\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nA blob reference dict with keys: ‘$type’, ‘ref’, ‘mimeType’, ‘size’.\n\n\n\ndict\nThis can be embedded directly in record fields.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf upload fails." 1100 + "text": "Name\nDescription\n\n\n\n\nfrom_bytes\nCreate a sample instance from raw msgpack bytes.\n\n\nfrom_data\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nPackableSample.from_bytes(bs)\nCreate a sample instance from raw msgpack bytes.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbs\nbytes\nRaw bytes from a msgpack-serialized sample.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSelf\nA new instance of this sample class deserialized from the bytes.\n\n\n\n\n\n\n\nPackableSample.from_data(data)\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\nWDSRawSample\nDictionary with keys matching the sample’s field names.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSelf\nNew instance with NDArray fields auto-converted from bytes." 
1080 1101 }, 1081 1102 { 1082 - "objectID": "api/AtmosphereClient.html#example", 1083 - "href": "api/AtmosphereClient.html#example", 1084 - "title": "AtmosphereClient", 1103 + "objectID": "api/PDSBlobStore.html", 1104 + "href": "api/PDSBlobStore.html", 1105 + "title": "PDSBlobStore", 1085 1106 "section": "", 1086 - "text": "::\n&gt;&gt;&gt; client = AtmosphereClient()\n&gt;&gt;&gt; client.login(\"alice.bsky.social\", \"app-password\")\n&gt;&gt;&gt; print(client.did)\n'did:plc:...'" 1107 + "text": "atmosphere.PDSBlobStore(client)\nPDS blob store implementing AbstractDataStore protocol.\nStores dataset shards as ATProto blobs, enabling decentralized dataset storage on the AT Protocol network.\nEach shard is written to a temporary tar file, then uploaded as a blob to the user’s PDS. The returned URLs are AT URIs that can be resolved to HTTP URLs for streaming.\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\nclient\n'AtmosphereClient'\nAuthenticated AtmosphereClient instance.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; store = PDSBlobStore(client)\n&gt;&gt;&gt; urls = store.write_shards(dataset, prefix=\"training/v1\")\n&gt;&gt;&gt; # Returns AT URIs like:\n&gt;&gt;&gt; # ['at://did:plc:abc/blob/bafyrei...', ...]\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ncreate_source\nCreate a BlobSource for reading these AT URIs.\n\n\nread_url\nResolve an AT URI blob reference to an HTTP URL.\n\n\nsupports_streaming\nPDS blobs support streaming via HTTP.\n\n\nwrite_shards\nWrite dataset shards as PDS blobs.\n\n\n\n\n\natmosphere.PDSBlobStore.create_source(urls)\nCreate a BlobSource for reading these AT URIs.\nThis is a convenience method for creating a DataSource that can stream the blobs written by this store.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nurls\nlist[str]\nList of AT URIs from write_shards().\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\n'BlobSource'\nBlobSource configured for the given 
URLs.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf URLs are not valid AT URIs.\n\n\n\n\n\n\n\natmosphere.PDSBlobStore.read_url(url)\nResolve an AT URI blob reference to an HTTP URL.\nTransforms at://did/blob/cid URIs to HTTP URLs that can be streamed by WebDataset.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nurl\nstr\nAT URI in format at://{did}/blob/{cid}.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nHTTP URL for fetching the blob via PDS API.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf URL format is invalid or PDS cannot be resolved.\n\n\n\n\n\n\n\natmosphere.PDSBlobStore.supports_streaming()\nPDS blobs support streaming via HTTP.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbool\nTrue.\n\n\n\n\n\n\n\natmosphere.PDSBlobStore.write_shards(\n ds,\n *,\n prefix,\n maxcount=10000,\n maxsize=3000000000.0,\n **kwargs,\n)\nWrite dataset shards as PDS blobs.\nCreates tar archives from the dataset and uploads each as a blob to the authenticated user’s PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\n'Dataset'\nThe Dataset to write.\nrequired\n\n\nprefix\nstr\nLogical path prefix for naming (used in shard names only).\nrequired\n\n\nmaxcount\nint\nMaximum samples per shard (default: 10000).\n10000\n\n\nmaxsize\nfloat\nMaximum shard size in bytes (default: 3GB, PDS limit).\n3000000000.0\n\n\n**kwargs\nAny\nAdditional args passed to wds.ShardWriter.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nList of AT URIs for the written blobs, in format:\n\n\n\nlist[str]\nat://{did}/blob/{cid}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\nRuntimeError\nIf no shards were written.\n\n\n\n\n\n\nPDS blobs have size limits (typically 50MB-5GB depending on PDS). Adjust maxcount/maxsize to stay within limits." 
1087 1108 }, 1088 1109 { 1089 - "objectID": "api/AtmosphereClient.html#note", 1090 - "href": "api/AtmosphereClient.html#note", 1091 - "title": "AtmosphereClient", 1110 + "objectID": "api/PDSBlobStore.html#attributes", 1111 + "href": "api/PDSBlobStore.html#attributes", 1112 + "title": "PDSBlobStore", 1092 1113 "section": "", 1093 - "text": "The password should be an app-specific password, not your main account password. Create app passwords in your Bluesky account settings." 1114 + "text": "Name\nType\nDescription\n\n\n\n\nclient\n'AtmosphereClient'\nAuthenticated AtmosphereClient instance." 1094 1115 }, 1095 1116 { 1096 - "objectID": "api/AtmosphereClient.html#attributes", 1097 - "href": "api/AtmosphereClient.html#attributes", 1098 - "title": "AtmosphereClient", 1117 + "objectID": "api/PDSBlobStore.html#example", 1118 + "href": "api/PDSBlobStore.html#example", 1119 + "title": "PDSBlobStore", 1099 1120 "section": "", 1100 - "text": "Name\nDescription\n\n\n\n\ndid\nGet the DID of the authenticated user.\n\n\nhandle\nGet the handle of the authenticated user.\n\n\nis_authenticated\nCheck if the client has a valid session." 
1121 + "text": "::\n&gt;&gt;&gt; store = PDSBlobStore(client)\n&gt;&gt;&gt; urls = store.write_shards(dataset, prefix=\"training/v1\")\n&gt;&gt;&gt; # Returns AT URIs like:\n&gt;&gt;&gt; # ['at://did:plc:abc/blob/bafyrei...', ...]" 1101 1122 }, 1102 1123 { 1103 - "objectID": "api/AtmosphereClient.html#methods", 1104 - "href": "api/AtmosphereClient.html#methods", 1105 - "title": "AtmosphereClient", 1124 + "objectID": "api/PDSBlobStore.html#methods", 1125 + "href": "api/PDSBlobStore.html#methods", 1126 + "title": "PDSBlobStore", 1106 1127 "section": "", 1107 - "text": "Name\nDescription\n\n\n\n\ncreate_record\nCreate a record in the user’s repository.\n\n\ndelete_record\nDelete a record.\n\n\nexport_session\nExport the current session for later reuse.\n\n\nget_blob\nDownload a blob from a PDS.\n\n\nget_blob_url\nGet the direct URL for fetching a blob.\n\n\nget_record\nFetch a record by AT URI.\n\n\nlist_datasets\nList dataset records.\n\n\nlist_lenses\nList lens records.\n\n\nlist_records\nList records in a collection.\n\n\nlist_schemas\nList schema records.\n\n\nlogin\nAuthenticate with the ATProto PDS.\n\n\nlogin_with_session\nAuthenticate using an exported session string.\n\n\nput_record\nCreate or update a record at a specific key.\n\n\nupload_blob\nUpload binary data as a blob to the PDS.\n\n\n\n\n\natmosphere.AtmosphereClient.create_record(\n collection,\n record,\n *,\n rkey=None,\n validate=False,\n)\nCreate a record in the user’s repository.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection (e.g., ‘ac.foundation.dataset.sampleSchema’).\nrequired\n\n\nrecord\ndict\nThe record data. Must include a ‘$type’ field.\nrequired\n\n\nrkey\nOptional[str]\nOptional explicit record key. If not provided, a TID is generated.\nNone\n\n\nvalidate\nbool\nWhether to validate against the Lexicon schema. 
Set to False for custom lexicons that the PDS doesn’t know about.\nFalse\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf record creation fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.delete_record(uri, *, swap_commit=None)\nDelete a record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the record to delete.\nrequired\n\n\nswap_commit\nOptional[str]\nOptional CID for compare-and-swap delete.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf deletion fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.export_session()\nExport the current session for later reuse.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSession string that can be passed to login_with_session().\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_blob(did, cid)\nDownload a blob from a PDS.\nThis resolves the PDS endpoint from the DID document and fetches the blob directly from the PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndid\nstr\nThe DID of the repository containing the blob.\nrequired\n\n\ncid\nstr\nThe CID of the blob.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbytes\nThe blob data as bytes.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf PDS endpoint cannot be resolved.\n\n\n\nrequests.HTTPError\nIf blob fetch fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_blob_url(did, cid)\nGet the direct URL for fetching a blob.\nThis is useful for passing to WebDataset or other HTTP clients.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndid\nstr\nThe DID of the repository containing the blob.\nrequired\n\n\ncid\nstr\nThe CID of the 
blob.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nThe full URL for fetching the blob.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf PDS endpoint cannot be resolved.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_record(uri)\nFetch a record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nThe record data as a dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\natproto.exceptions.AtProtocolError\nIf record not found.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_datasets(repo=None, limit=100)\nList dataset records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of dataset records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_lenses(repo=None, limit=100)\nList lens records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of lens records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_records(\n collection,\n *,\n repo=None,\n limit=100,\n cursor=None,\n)\nList records in a collection.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection.\nrequired\n\n\nrepo\nOptional[str]\nThe DID of the repository to query. Defaults to the authenticated user’s repository.\nNone\n\n\nlimit\nint\nMaximum number of records to return (default 100).\n100\n\n\ncursor\nOptional[str]\nPagination cursor from a previous call.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nA tuple of (records, next_cursor). 
The cursor is None if there\n\n\n\nOptional[str]\nare no more records.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf repo is None and not authenticated.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_schemas(repo=None, limit=100)\nList schema records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.login(handle, password)\nAuthenticate with the ATProto PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nhandle\nstr\nYour Bluesky handle (e.g., ‘alice.bsky.social’).\nrequired\n\n\npassword\nstr\nApp-specific password (not your main password).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\natproto.exceptions.AtProtocolError\nIf authentication fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.login_with_session(session_string)\nAuthenticate using an exported session string.\nThis allows reusing a session without re-authenticating, which helps avoid rate limits on session creation.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsession_string\nstr\nSession string from export_session().\nrequired\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.put_record(\n collection,\n rkey,\n record,\n *,\n validate=False,\n swap_commit=None,\n)\nCreate or update a record at a specific key.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection.\nrequired\n\n\nrkey\nstr\nThe record key.\nrequired\n\n\nrecord\ndict\nThe record data. 
Must include a ‘$type’ field.\nrequired\n\n\nvalidate\nbool\nWhether to validate against the Lexicon schema.\nFalse\n\n\nswap_commit\nOptional[str]\nOptional CID for compare-and-swap update.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf operation fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.upload_blob(\n data,\n mime_type='application/octet-stream',\n)\nUpload binary data as a blob to the PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\nbytes\nBinary data to upload.\nrequired\n\n\nmime_type\nstr\nMIME type of the data (for reference, not enforced by PDS).\n'application/octet-stream'\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nA blob reference dict with keys: ‘$type’, ‘ref’, ‘mimeType’, ‘size’.\n\n\n\ndict\nThis can be embedded directly in record fields.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf upload fails." 
1128 + "text": "Name\nDescription\n\n\n\n\ncreate_source\nCreate a BlobSource for reading these AT URIs.\n\n\nread_url\nResolve an AT URI blob reference to an HTTP URL.\n\n\nsupports_streaming\nPDS blobs support streaming via HTTP.\n\n\nwrite_shards\nWrite dataset shards as PDS blobs.\n\n\n\n\n\natmosphere.PDSBlobStore.create_source(urls)\nCreate a BlobSource for reading these AT URIs.\nThis is a convenience method for creating a DataSource that can stream the blobs written by this store.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nurls\nlist[str]\nList of AT URIs from write_shards().\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\n'BlobSource'\nBlobSource configured for the given URLs.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf URLs are not valid AT URIs.\n\n\n\n\n\n\n\natmosphere.PDSBlobStore.read_url(url)\nResolve an AT URI blob reference to an HTTP URL.\nTransforms at://did/blob/cid URIs to HTTP URLs that can be streamed by WebDataset.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nurl\nstr\nAT URI in format at://{did}/blob/{cid}.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nHTTP URL for fetching the blob via PDS API.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf URL format is invalid or PDS cannot be resolved.\n\n\n\n\n\n\n\natmosphere.PDSBlobStore.supports_streaming()\nPDS blobs support streaming via HTTP.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbool\nTrue.\n\n\n\n\n\n\n\natmosphere.PDSBlobStore.write_shards(\n ds,\n *,\n prefix,\n maxcount=10000,\n maxsize=3000000000.0,\n **kwargs,\n)\nWrite dataset shards as PDS blobs.\nCreates tar archives from the dataset and uploads each as a blob to the authenticated user’s PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\n'Dataset'\nThe Dataset to write.\nrequired\n\n\nprefix\nstr\nLogical path prefix for naming (used in shard names only).\nrequired\n\n\nmaxcount\nint\nMaximum samples per shard 
(default: 10000).\n10000\n\n\nmaxsize\nfloat\nMaximum shard size in bytes (default: 3GB, PDS limit).\n3000000000.0\n\n\n**kwargs\nAny\nAdditional args passed to wds.ShardWriter.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[str]\nList of AT URIs for the written blobs, in format:\n\n\n\nlist[str]\nat://{did}/blob/{cid}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\nRuntimeError\nIf no shards were written.\n\n\n\n\n\n\nPDS blobs have size limits (typically 50MB-5GB depending on PDS). Adjust maxcount/maxsize to stay within limits." 1108 1129 }, 1109 1130 { 1110 1131 "objectID": "api/DictSample.html", ··· 1167 1188 "href": "api/AtmosphereIndex.html", 1168 1189 "title": "AtmosphereIndex", 1169 1190 "section": "", 1170 - "text": "atmosphere.AtmosphereIndex(client)\nATProto index implementing AbstractIndex protocol.\nWraps SchemaPublisher/Loader and DatasetPublisher/Loader to provide a unified interface compatible with LocalIndex.\n\n\n::\n&gt;&gt;&gt; client = AtmosphereClient()\n&gt;&gt;&gt; client.login(\"handle.bsky.social\", \"app-password\")\n&gt;&gt;&gt;\n&gt;&gt;&gt; index = AtmosphereIndex(client)\n&gt;&gt;&gt; schema_ref = index.publish_schema(MySample, version=\"1.0.0\")\n&gt;&gt;&gt; entry = index.insert_dataset(dataset, name=\"my-data\")\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndatasets\nLazily iterate over all dataset entries (AbstractIndex protocol).\n\n\nschemas\nLazily iterate over all schema records (AbstractIndex protocol).\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python type from a schema record.\n\n\nget_dataset\nGet a dataset by AT URI.\n\n\nget_schema\nGet a schema record by AT URI.\n\n\ninsert_dataset\nInsert a dataset into ATProto.\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\npublish_schema\nPublish a schema 
to ATProto.\n\n\n\n\n\natmosphere.AtmosphereIndex.decode_schema(ref)\nReconstruct a Python type from a schema record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nAT URI of the schema record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nDynamically generated Packable type.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.get_dataset(ref)\nGet a dataset by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nAT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtmosphereIndexEntry\nAtmosphereIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf record is not a dataset.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.get_schema(ref)\nGet a schema record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nAT URI of the schema record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf record is not a schema.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.insert_dataset(\n ds,\n *,\n name,\n schema_ref=None,\n **kwargs,\n)\nInsert a dataset into ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to publish.\nrequired\n\n\nname\nstr\nHuman-readable name.\nrequired\n\n\nschema_ref\nOptional[str]\nOptional schema AT URI. If None, auto-publishes schema.\nNone\n\n\n**kwargs\n\nAdditional options (description, tags, license).\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtmosphereIndexEntry\nAtmosphereIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.list_datasets(repo=None)\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nDID of repository. 
Defaults to authenticated user.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[AtmosphereIndexEntry]\nList of AtmosphereIndexEntry for each dataset.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.list_schemas(repo=None)\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nDID of repository. Defaults to authenticated user.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.publish_schema(\n sample_type,\n *,\n version='1.0.0',\n **kwargs,\n)\nPublish a schema to ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[Packable]\nA Packable type (PackableSample subclass or @packable-decorated).\nrequired\n\n\nversion\nstr\nSemantic version string.\n'1.0.0'\n\n\n**kwargs\n\nAdditional options (description, metadata).\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nAT URI of the schema record." 
1191 + "text": "atmosphere.AtmosphereIndex(client, *, data_store=None)\nATProto index implementing AbstractIndex protocol.\nWraps SchemaPublisher/Loader and DatasetPublisher/Loader to provide a unified interface compatible with LocalIndex.\nOptionally accepts a PDSBlobStore for writing dataset shards as ATProto blobs, enabling fully decentralized dataset storage.\n\n\n::\n&gt;&gt;&gt; client = AtmosphereClient()\n&gt;&gt;&gt; client.login(\"handle.bsky.social\", \"app-password\")\n&gt;&gt;&gt;\n&gt;&gt;&gt; # Without blob storage (external URLs only)\n&gt;&gt;&gt; index = AtmosphereIndex(client)\n&gt;&gt;&gt;\n&gt;&gt;&gt; # With PDS blob storage\n&gt;&gt;&gt; store = PDSBlobStore(client)\n&gt;&gt;&gt; index = AtmosphereIndex(client, data_store=store)\n&gt;&gt;&gt; entry = index.insert_dataset(dataset, name=\"my-data\")\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndata_store\nThe PDS blob store for writing shards, or None if not configured.\n\n\ndatasets\nLazily iterate over all dataset entries (AbstractIndex protocol).\n\n\nschemas\nLazily iterate over all schema records (AbstractIndex protocol).\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python type from a schema record.\n\n\nget_dataset\nGet a dataset by AT URI.\n\n\nget_schema\nGet a schema record by AT URI.\n\n\ninsert_dataset\nInsert a dataset into ATProto.\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\npublish_schema\nPublish a schema to ATProto.\n\n\n\n\n\natmosphere.AtmosphereIndex.decode_schema(ref)\nReconstruct a Python type from a schema record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nAT URI of the schema record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nDynamically generated Packable type.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf schema cannot be 
decoded.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.get_dataset(ref)\nGet a dataset by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nAT URI of the dataset record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtmosphereIndexEntry\nAtmosphereIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf record is not a dataset.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.get_schema(ref)\nGet a schema record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nAT URI of the schema record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf record is not a schema.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.insert_dataset(\n ds,\n *,\n name,\n schema_ref=None,\n **kwargs,\n)\nInsert a dataset into ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to publish.\nrequired\n\n\nname\nstr\nHuman-readable name.\nrequired\n\n\nschema_ref\nOptional[str]\nOptional schema AT URI. If None, auto-publishes schema.\nNone\n\n\n**kwargs\n\nAdditional options (description, tags, license).\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtmosphereIndexEntry\nAtmosphereIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.list_datasets(repo=None)\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nDID of repository. Defaults to authenticated user.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[AtmosphereIndexEntry]\nList of AtmosphereIndexEntry for each dataset.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.list_schemas(repo=None)\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nDID of repository. 
Defaults to authenticated user.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\natmosphere.AtmosphereIndex.publish_schema(\n sample_type,\n *,\n version='1.0.0',\n **kwargs,\n)\nPublish a schema to ATProto.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[Packable]\nA Packable type (PackableSample subclass or @packable-decorated).\nrequired\n\n\nversion\nstr\nSemantic version string.\n'1.0.0'\n\n\n**kwargs\n\nAdditional options (description, metadata).\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nAT URI of the schema record." 1171 1192 }, 1172 1193 { 1173 1194 "objectID": "api/AtmosphereIndex.html#example", 1174 1195 "href": "api/AtmosphereIndex.html#example", 1175 1196 "title": "AtmosphereIndex", 1176 1197 "section": "", 1177 - "text": "::\n&gt;&gt;&gt; client = AtmosphereClient()\n&gt;&gt;&gt; client.login(\"handle.bsky.social\", \"app-password\")\n&gt;&gt;&gt;\n&gt;&gt;&gt; index = AtmosphereIndex(client)\n&gt;&gt;&gt; schema_ref = index.publish_schema(MySample, version=\"1.0.0\")\n&gt;&gt;&gt; entry = index.insert_dataset(dataset, name=\"my-data\")" 1198 + "text": "::\n&gt;&gt;&gt; client = AtmosphereClient()\n&gt;&gt;&gt; client.login(\"handle.bsky.social\", \"app-password\")\n&gt;&gt;&gt;\n&gt;&gt;&gt; # Without blob storage (external URLs only)\n&gt;&gt;&gt; index = AtmosphereIndex(client)\n&gt;&gt;&gt;\n&gt;&gt;&gt; # With PDS blob storage\n&gt;&gt;&gt; store = PDSBlobStore(client)\n&gt;&gt;&gt; index = AtmosphereIndex(client, data_store=store)\n&gt;&gt;&gt; entry = index.insert_dataset(dataset, name=\"my-data\")" 1178 1199 }, 1179 1200 { 1180 1201 "objectID": "api/AtmosphereIndex.html#attributes", 1181 1202 "href": "api/AtmosphereIndex.html#attributes", 1182 1203 "title": "AtmosphereIndex", 1183 1204 "section": "", 1184 - "text": "Name\nDescription\n\n\n\n\ndatasets\nLazily iterate over all dataset entries (AbstractIndex 
protocol).\n\n\nschemas\nLazily iterate over all schema records (AbstractIndex protocol)." 1205 + "text": "Name\nDescription\n\n\n\n\ndata_store\nThe PDS blob store for writing shards, or None if not configured.\n\n\ndatasets\nLazily iterate over all dataset entries (AbstractIndex protocol).\n\n\nschemas\nLazily iterate over all schema records (AbstractIndex protocol)." 1185 1206 }, 1186 1207 { 1187 1208 "objectID": "api/AtmosphereIndex.html#methods", ··· 1272 1293 "href": "api/local.Index.html", 1273 1294 "title": "local.Index", 1274 1295 "section": "", 1275 - "text": "local.Index(\n redis=None,\n data_store=None,\n auto_stubs=False,\n stub_dir=None,\n **kwargs,\n)\nRedis-backed index for tracking datasets in a repository.\nImplements the AbstractIndex protocol. Maintains a registry of LocalDatasetEntry objects in Redis, allowing enumeration and lookup of stored datasets.\nWhen initialized with a data_store, insert_dataset() will write dataset shards to storage before indexing. Without a data_store, insert_dataset() only indexes existing URLs.\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n_redis\n\nRedis connection for index storage.\n\n\n_data_store\n\nOptional AbstractDataStore for writing dataset shards.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nadd_entry\nAdd a dataset to the index.\n\n\nclear_stubs\nRemove all auto-generated stub files.\n\n\ndecode_schema\nReconstruct a Python PackableSample type from a stored schema.\n\n\ndecode_schema_as\nDecode a schema with explicit type hint for IDE support.\n\n\nget_dataset\nGet a dataset entry by name (AbstractIndex protocol).\n\n\nget_entry\nGet an entry by its CID.\n\n\nget_entry_by_name\nGet an entry by its human-readable name.\n\n\nget_import_path\nGet the import path for a schema’s generated module.\n\n\nget_schema\nGet a schema record by reference (AbstractIndex protocol).\n\n\nget_schema_record\nGet a schema record as LocalSchemaRecord object.\n\n\ninsert_dataset\nInsert a dataset into the index 
(AbstractIndex protocol).\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_entries\nGet all index entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\nload_schema\nLoad a schema and make it available in the types namespace.\n\n\npublish_schema\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nlocal.Index.add_entry(ds, *, name, schema_ref=None, metadata=None)\nAdd a dataset to the index.\nCreates a LocalDatasetEntry for the dataset and persists it to Redis.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe dataset to add to the index.\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nstr | None\nOptional schema reference. If None, generates from sample type.\nNone\n\n\nmetadata\ndict | None\nOptional metadata dictionary. If None, uses ds._metadata if available.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nThe created LocalDatasetEntry object.\n\n\n\n\n\n\n\nlocal.Index.clear_stubs()\nRemove all auto-generated stub files.\nOnly works if auto_stubs was enabled when creating the Index.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nint\nNumber of stub files removed, or 0 if auto_stubs is disabled.\n\n\n\n\n\n\n\nlocal.Index.decode_schema(ref)\nReconstruct a Python PackableSample type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a PackableSample subclass matching the schema definition.\nIf auto_stubs is enabled, a Python module will be generated and the class will be imported from it, providing full IDE autocomplete support. 
The returned class has proper type information that IDEs can understand.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nA PackableSample subclass - either imported from a generated module\n\n\n\nType[Packable]\n(if auto_stubs is enabled) or dynamically created.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n\nlocal.Index.decode_schema_as(ref, type_hint)\nDecode a schema with explicit type hint for IDE support.\nThis is a typed wrapper around decode_schema() that preserves the type information for IDE autocomplete. Use this when you have a stub file for the schema and want full IDE support.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\ntype_hint\ntype[T]\nThe stub type to use for type hints. Import this from the generated stub file.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntype[T]\nThe decoded type, cast to match the type_hint for IDE support.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; # After enabling auto_stubs and configuring IDE extraPaths:\n&gt;&gt;&gt; from local.MySample_1_0_0 import MySample\n&gt;&gt;&gt;\n&gt;&gt;&gt; # This gives full IDE autocomplete:\n&gt;&gt;&gt; DecodedType = index.decode_schema_as(ref, MySample)\n&gt;&gt;&gt; sample = DecodedType(text=\"hello\", value=42) # IDE knows signature!\n\n\n\nThe type_hint is only used for static type checking - at runtime, the actual decoded type from the schema is returned. 
Ensure the stub matches the schema to avoid runtime surprises.\n\n\n\n\nlocal.Index.get_dataset(ref)\nGet a dataset entry by name (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nDataset name.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf dataset not found.\n\n\n\n\n\n\n\nlocal.Index.get_entry(cid)\nGet an entry by its CID.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncid\nstr\nContent identifier of the entry.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry for the given CID.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf entry not found.\n\n\n\n\n\n\n\nlocal.Index.get_entry_by_name(name)\nGet an entry by its human-readable name.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nname\nstr\nHuman-readable name of the entry.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry with the given name.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf no entry with that name exists.\n\n\n\n\n\n\n\nlocal.Index.get_import_path(ref)\nGet the import path for a schema’s generated module.\nWhen auto_stubs is enabled, this returns the import path that can be used to import the schema type with full IDE support.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr | None\nImport path like “local.MySample_1_0_0”, or None if auto_stubs\n\n\n\nstr | None\nis disabled.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; index = LocalIndex(auto_stubs=True)\n&gt;&gt;&gt; ref = index.publish_schema(MySample, version=\"1.0.0\")\n&gt;&gt;&gt; index.load_schema(ref)\n&gt;&gt;&gt; print(index.get_import_path(ref))\nlocal.MySample_1_0_0\n&gt;&gt;&gt; # Then in your code:\n&gt;&gt;&gt; # from 
local.MySample_1_0_0 import MySample\n\n\n\n\nlocal.Index.get_schema(ref)\nGet a schema record by reference (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string. Supports both new format (atdata://local/sampleSchema/{name}@version) and legacy format (local://schemas/{module.Class}@version).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record as a dictionary with keys ‘name’, ‘version’,\n\n\n\ndict\n‘fields’, ‘$ref’, etc.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf reference format is invalid.\n\n\n\n\n\n\n\nlocal.Index.get_schema_record(ref)\nGet a schema record as LocalSchemaRecord object.\nUse this when you need the full LocalSchemaRecord with typed properties. For Protocol-compliant dict access, use get_schema() instead.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalSchemaRecord\nLocalSchemaRecord with schema details.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf reference format is invalid.\n\n\n\n\n\n\n\nlocal.Index.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index (AbstractIndex protocol).\nIf a data_store was provided at initialization, writes dataset shards to storage first, then indexes the new URLs. 
Otherwise, indexes the dataset’s existing URL.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to register.\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nstr | None\nOptional schema reference.\nNone\n\n\n**kwargs\n\nAdditional options: - metadata: Optional metadata dict - prefix: Storage prefix (default: dataset name) - cache_local: If True, cache writes locally first\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\nlocal.Index.list_datasets()\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[LocalDatasetEntry]\nList of IndexEntry for each dataset.\n\n\n\n\n\n\n\nlocal.Index.list_entries()\nGet all index entries as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[LocalDatasetEntry]\nList of all LocalDatasetEntry objects in the index.\n\n\n\n\n\n\n\nlocal.Index.list_schemas()\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\nlocal.Index.load_schema(ref)\nLoad a schema and make it available in the types namespace.\nThis method decodes the schema, optionally generates a Python module for IDE support (if auto_stubs is enabled), and registers the type in the :attr:types namespace for easy access.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nThe decoded PackableSample subclass. 
Also available via\n\n\n\nType[Packable]\nindex.types.&lt;ClassName&gt; after this call.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; # Load and use immediately\n&gt;&gt;&gt; MyType = index.load_schema(\"atdata://local/sampleSchema/MySample@1.0.0\")\n&gt;&gt;&gt; sample = MyType(name=\"hello\", value=42)\n&gt;&gt;&gt;\n&gt;&gt;&gt; # Or access later via namespace\n&gt;&gt;&gt; index.load_schema(\"atdata://local/sampleSchema/OtherType@1.0.0\")\n&gt;&gt;&gt; other = index.types.OtherType(data=\"test\")\n\n\n\n\nlocal.Index.publish_schema(sample_type, *, version=None, description=None)\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[Packable]\nThe PackableSample subclass to publish.\nrequired\n\n\nversion\nstr | None\nSemantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists.\nNone\n\n\ndescription\nstr | None\nOptional human-readable description. If None, uses the class docstring.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSchema reference string: ‘atdata://local/sampleSchema/{name}@version’.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf sample_type is not a dataclass.\n\n\n\nTypeError\nIf a field type is not supported." 1296 + "text": "local.Index(\n redis=None,\n data_store=None,\n auto_stubs=False,\n stub_dir=None,\n **kwargs,\n)\nRedis-backed index for tracking datasets in a repository.\nImplements the AbstractIndex protocol. Maintains a registry of LocalDatasetEntry objects in Redis, allowing enumeration and lookup of stored datasets.\nWhen initialized with a data_store, insert_dataset() will write dataset shards to storage before indexing. 
Without a data_store, insert_dataset() only indexes existing URLs.\n\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n_redis\n\nRedis connection for index storage.\n\n\n_data_store\n\nOptional AbstractDataStore for writing dataset shards.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nadd_entry\nAdd a dataset to the index.\n\n\nclear_stubs\nRemove all auto-generated stub files.\n\n\ndecode_schema\nReconstruct a Python PackableSample type from a stored schema.\n\n\ndecode_schema_as\nDecode a schema with explicit type hint for IDE support.\n\n\nget_dataset\nGet a dataset entry by name (AbstractIndex protocol).\n\n\nget_entry\nGet an entry by its CID.\n\n\nget_entry_by_name\nGet an entry by its human-readable name.\n\n\nget_import_path\nGet the import path for a schema’s generated module.\n\n\nget_schema\nGet a schema record by reference (AbstractIndex protocol).\n\n\nget_schema_record\nGet a schema record as LocalSchemaRecord object.\n\n\ninsert_dataset\nInsert a dataset into the index (AbstractIndex protocol).\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_entries\nGet all index entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\nload_schema\nLoad a schema and make it available in the types namespace.\n\n\npublish_schema\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nlocal.Index.add_entry(ds, *, name, schema_ref=None, metadata=None)\nAdd a dataset to the index.\nCreates a LocalDatasetEntry for the dataset and persists it to Redis.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe dataset to add to the index.\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nstr | None\nOptional schema reference. If None, generates from sample type.\nNone\n\n\nmetadata\ndict | None\nOptional metadata dictionary. 
If None, uses ds._metadata if available.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nThe created LocalDatasetEntry object.\n\n\n\n\n\n\n\nlocal.Index.clear_stubs()\nRemove all auto-generated stub files.\nOnly works if auto_stubs was enabled when creating the Index.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nint\nNumber of stub files removed, or 0 if auto_stubs is disabled.\n\n\n\n\n\n\n\nlocal.Index.decode_schema(ref)\nReconstruct a Python PackableSample type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a PackableSample subclass matching the schema definition.\nIf auto_stubs is enabled, a Python module will be generated and the class will be imported from it, providing full IDE autocomplete support. The returned class has proper type information that IDEs can understand.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nA PackableSample subclass - either imported from a generated module\n\n\n\nType[Packable]\n(if auto_stubs is enabled) or dynamically created.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n\nlocal.Index.decode_schema_as(ref, type_hint)\nDecode a schema with explicit type hint for IDE support.\nThis is a typed wrapper around decode_schema() that preserves the type information for IDE autocomplete. Use this when you have a stub file for the schema and want full IDE support.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\ntype_hint\ntype[T]\nThe stub type to use for type hints. 
Import this from the generated stub file.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntype[T]\nThe decoded type, cast to match the type_hint for IDE support.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; # After enabling auto_stubs and configuring IDE extraPaths:\n&gt;&gt;&gt; from local.MySample_1_0_0 import MySample\n&gt;&gt;&gt;\n&gt;&gt;&gt; # This gives full IDE autocomplete:\n&gt;&gt;&gt; DecodedType = index.decode_schema_as(ref, MySample)\n&gt;&gt;&gt; sample = DecodedType(text=\"hello\", value=42) # IDE knows signature!\n\n\n\nThe type_hint is only used for static type checking - at runtime, the actual decoded type from the schema is returned. Ensure the stub matches the schema to avoid runtime surprises.\n\n\n\n\nlocal.Index.get_dataset(ref)\nGet a dataset entry by name (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nDataset name.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf dataset not found.\n\n\n\n\n\n\n\nlocal.Index.get_entry(cid)\nGet an entry by its CID.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncid\nstr\nContent identifier of the entry.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry for the given CID.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf entry not found.\n\n\n\n\n\n\n\nlocal.Index.get_entry_by_name(name)\nGet an entry by its human-readable name.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nname\nstr\nHuman-readable name of the entry.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry with the given name.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf no entry with that name exists.\n\n\n\n\n\n\n\nlocal.Index.get_import_path(ref)\nGet the import path for a schema’s generated module.\nWhen auto_stubs is 
enabled, this returns the import path that can be used to import the schema type with full IDE support.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr | None\nImport path like “local.MySample_1_0_0”, or None if auto_stubs\n\n\n\nstr | None\nis disabled.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; index = LocalIndex(auto_stubs=True)\n&gt;&gt;&gt; ref = index.publish_schema(MySample, version=\"1.0.0\")\n&gt;&gt;&gt; index.load_schema(ref)\n&gt;&gt;&gt; print(index.get_import_path(ref))\nlocal.MySample_1_0_0\n&gt;&gt;&gt; # Then in your code:\n&gt;&gt;&gt; # from local.MySample_1_0_0 import MySample\n\n\n\n\nlocal.Index.get_schema(ref)\nGet a schema record by reference (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string. Supports both new format (atdata://local/sampleSchema/{name}@version) and legacy format (local://schemas/{module.Class}@version).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record as a dictionary with keys ‘name’, ‘version’,\n\n\n\ndict\n‘fields’, ‘$ref’, etc.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf reference format is invalid.\n\n\n\n\n\n\n\nlocal.Index.get_schema_record(ref)\nGet a schema record as LocalSchemaRecord object.\nUse this when you need the full LocalSchemaRecord with typed properties. 
For Protocol-compliant dict access, use get_schema() instead.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalSchemaRecord\nLocalSchemaRecord with schema details.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf reference format is invalid.\n\n\n\n\n\n\n\nlocal.Index.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index (AbstractIndex protocol).\nIf a data_store was provided at initialization, writes dataset shards to storage first, then indexes the new URLs. Otherwise, indexes the dataset’s existing URL.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to register.\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nstr | None\nOptional schema reference.\nNone\n\n\n**kwargs\n\nAdditional options: - metadata: Optional metadata dict - prefix: Storage prefix (default: dataset name) - cache_local: If True, cache writes locally first\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\nlocal.Index.list_datasets()\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[LocalDatasetEntry]\nList of IndexEntry for each dataset.\n\n\n\n\n\n\n\nlocal.Index.list_entries()\nGet all index entries as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[LocalDatasetEntry]\nList of all LocalDatasetEntry objects in the index.\n\n\n\n\n\n\n\nlocal.Index.list_schemas()\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\nlocal.Index.load_schema(ref)\nLoad a schema and make it available in the types namespace.\nThis method decodes 
the schema, optionally generates a Python module for IDE support (if auto_stubs is enabled), and registers the type in the :attr:types namespace for easy access.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nThe decoded PackableSample subclass. Also available via\n\n\n\nType[Packable]\nindex.types.&lt;ClassName&gt; after this call.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; # Load and use immediately\n&gt;&gt;&gt; MyType = index.load_schema(\"atdata://local/sampleSchema/MySample@1.0.0\")\n&gt;&gt;&gt; sample = MyType(name=\"hello\", value=42)\n&gt;&gt;&gt;\n&gt;&gt;&gt; # Or access later via namespace\n&gt;&gt;&gt; index.load_schema(\"atdata://local/sampleSchema/OtherType@1.0.0\")\n&gt;&gt;&gt; other = index.types.OtherType(data=\"test\")\n\n\n\n\nlocal.Index.publish_schema(sample_type, *, version=None, description=None)\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\ntype\nA Packable type (@packable-decorated or PackableSample subclass).\nrequired\n\n\nversion\nstr | None\nSemantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists.\nNone\n\n\ndescription\nstr | None\nOptional human-readable description. If None, uses the class docstring.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSchema reference string: ‘atdata://local/sampleSchema/{name}@version’.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf sample_type is not a dataclass.\n\n\n\nTypeError\nIf sample_type doesn’t satisfy the Packable protocol, or if a field type is not supported." 
1276 1297 }, 1277 1298 { 1278 1299 "objectID": "api/local.Index.html#attributes", ··· 1286 1307 "href": "api/local.Index.html#methods", 1287 1308 "title": "local.Index", 1288 1309 "section": "", 1289 - "text": "Name\nDescription\n\n\n\n\nadd_entry\nAdd a dataset to the index.\n\n\nclear_stubs\nRemove all auto-generated stub files.\n\n\ndecode_schema\nReconstruct a Python PackableSample type from a stored schema.\n\n\ndecode_schema_as\nDecode a schema with explicit type hint for IDE support.\n\n\nget_dataset\nGet a dataset entry by name (AbstractIndex protocol).\n\n\nget_entry\nGet an entry by its CID.\n\n\nget_entry_by_name\nGet an entry by its human-readable name.\n\n\nget_import_path\nGet the import path for a schema’s generated module.\n\n\nget_schema\nGet a schema record by reference (AbstractIndex protocol).\n\n\nget_schema_record\nGet a schema record as LocalSchemaRecord object.\n\n\ninsert_dataset\nInsert a dataset into the index (AbstractIndex protocol).\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_entries\nGet all index entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\nload_schema\nLoad a schema and make it available in the types namespace.\n\n\npublish_schema\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nlocal.Index.add_entry(ds, *, name, schema_ref=None, metadata=None)\nAdd a dataset to the index.\nCreates a LocalDatasetEntry for the dataset and persists it to Redis.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe dataset to add to the index.\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nstr | None\nOptional schema reference. If None, generates from sample type.\nNone\n\n\nmetadata\ndict | None\nOptional metadata dictionary. 
If None, uses ds._metadata if available.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nThe created LocalDatasetEntry object.\n\n\n\n\n\n\n\nlocal.Index.clear_stubs()\nRemove all auto-generated stub files.\nOnly works if auto_stubs was enabled when creating the Index.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nint\nNumber of stub files removed, or 0 if auto_stubs is disabled.\n\n\n\n\n\n\n\nlocal.Index.decode_schema(ref)\nReconstruct a Python PackableSample type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a PackableSample subclass matching the schema definition.\nIf auto_stubs is enabled, a Python module will be generated and the class will be imported from it, providing full IDE autocomplete support. The returned class has proper type information that IDEs can understand.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nA PackableSample subclass - either imported from a generated module\n\n\n\nType[Packable]\n(if auto_stubs is enabled) or dynamically created.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n\nlocal.Index.decode_schema_as(ref, type_hint)\nDecode a schema with explicit type hint for IDE support.\nThis is a typed wrapper around decode_schema() that preserves the type information for IDE autocomplete. Use this when you have a stub file for the schema and want full IDE support.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\ntype_hint\ntype[T]\nThe stub type to use for type hints. 
Import this from the generated stub file.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntype[T]\nThe decoded type, cast to match the type_hint for IDE support.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; # After enabling auto_stubs and configuring IDE extraPaths:\n&gt;&gt;&gt; from local.MySample_1_0_0 import MySample\n&gt;&gt;&gt;\n&gt;&gt;&gt; # This gives full IDE autocomplete:\n&gt;&gt;&gt; DecodedType = index.decode_schema_as(ref, MySample)\n&gt;&gt;&gt; sample = DecodedType(text=\"hello\", value=42) # IDE knows signature!\n\n\n\nThe type_hint is only used for static type checking - at runtime, the actual decoded type from the schema is returned. Ensure the stub matches the schema to avoid runtime surprises.\n\n\n\n\nlocal.Index.get_dataset(ref)\nGet a dataset entry by name (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nDataset name.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf dataset not found.\n\n\n\n\n\n\n\nlocal.Index.get_entry(cid)\nGet an entry by its CID.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncid\nstr\nContent identifier of the entry.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry for the given CID.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf entry not found.\n\n\n\n\n\n\n\nlocal.Index.get_entry_by_name(name)\nGet an entry by its human-readable name.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nname\nstr\nHuman-readable name of the entry.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry with the given name.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf no entry with that name exists.\n\n\n\n\n\n\n\nlocal.Index.get_import_path(ref)\nGet the import path for a schema’s generated module.\nWhen auto_stubs is 
enabled, this returns the import path that can be used to import the schema type with full IDE support.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr | None\nImport path like “local.MySample_1_0_0”, or None if auto_stubs\n\n\n\nstr | None\nis disabled.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; index = LocalIndex(auto_stubs=True)\n&gt;&gt;&gt; ref = index.publish_schema(MySample, version=\"1.0.0\")\n&gt;&gt;&gt; index.load_schema(ref)\n&gt;&gt;&gt; print(index.get_import_path(ref))\nlocal.MySample_1_0_0\n&gt;&gt;&gt; # Then in your code:\n&gt;&gt;&gt; # from local.MySample_1_0_0 import MySample\n\n\n\n\nlocal.Index.get_schema(ref)\nGet a schema record by reference (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string. Supports both new format (atdata://local/sampleSchema/{name}@version) and legacy format (local://schemas/{module.Class}@version).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record as a dictionary with keys ‘name’, ‘version’,\n\n\n\ndict\n‘fields’, ‘$ref’, etc.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf reference format is invalid.\n\n\n\n\n\n\n\nlocal.Index.get_schema_record(ref)\nGet a schema record as LocalSchemaRecord object.\nUse this when you need the full LocalSchemaRecord with typed properties. 
For Protocol-compliant dict access, use get_schema() instead.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalSchemaRecord\nLocalSchemaRecord with schema details.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf reference format is invalid.\n\n\n\n\n\n\n\nlocal.Index.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index (AbstractIndex protocol).\nIf a data_store was provided at initialization, writes dataset shards to storage first, then indexes the new URLs. Otherwise, indexes the dataset’s existing URL.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to register.\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nstr | None\nOptional schema reference.\nNone\n\n\n**kwargs\n\nAdditional options: - metadata: Optional metadata dict - prefix: Storage prefix (default: dataset name) - cache_local: If True, cache writes locally first\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\nlocal.Index.list_datasets()\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[LocalDatasetEntry]\nList of IndexEntry for each dataset.\n\n\n\n\n\n\n\nlocal.Index.list_entries()\nGet all index entries as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[LocalDatasetEntry]\nList of all LocalDatasetEntry objects in the index.\n\n\n\n\n\n\n\nlocal.Index.list_schemas()\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\nlocal.Index.load_schema(ref)\nLoad a schema and make it available in the types namespace.\nThis method decodes 
the schema, optionally generates a Python module for IDE support (if auto_stubs is enabled), and registers the type in the :attr:types namespace for easy access.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nThe decoded PackableSample subclass. Also available via\n\n\n\nType[Packable]\nindex.types.&lt;ClassName&gt; after this call.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; # Load and use immediately\n&gt;&gt;&gt; MyType = index.load_schema(\"atdata://local/sampleSchema/MySample@1.0.0\")\n&gt;&gt;&gt; sample = MyType(name=\"hello\", value=42)\n&gt;&gt;&gt;\n&gt;&gt;&gt; # Or access later via namespace\n&gt;&gt;&gt; index.load_schema(\"atdata://local/sampleSchema/OtherType@1.0.0\")\n&gt;&gt;&gt; other = index.types.OtherType(data=\"test\")\n\n\n\n\nlocal.Index.publish_schema(sample_type, *, version=None, description=None)\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[Packable]\nThe PackableSample subclass to publish.\nrequired\n\n\nversion\nstr | None\nSemantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists.\nNone\n\n\ndescription\nstr | None\nOptional human-readable description. If None, uses the class docstring.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSchema reference string: ‘atdata://local/sampleSchema/{name}@version’.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf sample_type is not a dataclass.\n\n\n\nTypeError\nIf a field type is not supported." 
1310 + "text": "Name\nDescription\n\n\n\n\nadd_entry\nAdd a dataset to the index.\n\n\nclear_stubs\nRemove all auto-generated stub files.\n\n\ndecode_schema\nReconstruct a Python PackableSample type from a stored schema.\n\n\ndecode_schema_as\nDecode a schema with explicit type hint for IDE support.\n\n\nget_dataset\nGet a dataset entry by name (AbstractIndex protocol).\n\n\nget_entry\nGet an entry by its CID.\n\n\nget_entry_by_name\nGet an entry by its human-readable name.\n\n\nget_import_path\nGet the import path for a schema’s generated module.\n\n\nget_schema\nGet a schema record by reference (AbstractIndex protocol).\n\n\nget_schema_record\nGet a schema record as LocalSchemaRecord object.\n\n\ninsert_dataset\nInsert a dataset into the index (AbstractIndex protocol).\n\n\nlist_datasets\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\nlist_entries\nGet all index entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\nload_schema\nLoad a schema and make it available in the types namespace.\n\n\npublish_schema\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nlocal.Index.add_entry(ds, *, name, schema_ref=None, metadata=None)\nAdd a dataset to the index.\nCreates a LocalDatasetEntry for the dataset and persists it to Redis.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe dataset to add to the index.\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nstr | None\nOptional schema reference. If None, generates from sample type.\nNone\n\n\nmetadata\ndict | None\nOptional metadata dictionary. 
If None, uses ds._metadata if available.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nThe created LocalDatasetEntry object.\n\n\n\n\n\n\n\nlocal.Index.clear_stubs()\nRemove all auto-generated stub files.\nOnly works if auto_stubs was enabled when creating the Index.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nint\nNumber of stub files removed, or 0 if auto_stubs is disabled.\n\n\n\n\n\n\n\nlocal.Index.decode_schema(ref)\nReconstruct a Python PackableSample type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a PackableSample subclass matching the schema definition.\nIf auto_stubs is enabled, a Python module will be generated and the class will be imported from it, providing full IDE autocomplete support. The returned class has proper type information that IDEs can understand.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nA PackableSample subclass - either imported from a generated module\n\n\n\nType[Packable]\n(if auto_stubs is enabled) or dynamically created.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n\nlocal.Index.decode_schema_as(ref, type_hint)\nDecode a schema with explicit type hint for IDE support.\nThis is a typed wrapper around decode_schema() that preserves the type information for IDE autocomplete. Use this when you have a stub file for the schema and want full IDE support.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\ntype_hint\ntype[T]\nThe stub type to use for type hints. 
Import this from the generated stub file.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ntype[T]\nThe decoded type, cast to match the type_hint for IDE support.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; # After enabling auto_stubs and configuring IDE extraPaths:\n&gt;&gt;&gt; from local.MySample_1_0_0 import MySample\n&gt;&gt;&gt;\n&gt;&gt;&gt; # This gives full IDE autocomplete:\n&gt;&gt;&gt; DecodedType = index.decode_schema_as(ref, MySample)\n&gt;&gt;&gt; sample = DecodedType(text=\"hello\", value=42) # IDE knows signature!\n\n\n\nThe type_hint is only used for static type checking - at runtime, the actual decoded type from the schema is returned. Ensure the stub matches the schema to avoid runtime surprises.\n\n\n\n\nlocal.Index.get_dataset(ref)\nGet a dataset entry by name (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nDataset name.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf dataset not found.\n\n\n\n\n\n\n\nlocal.Index.get_entry(cid)\nGet an entry by its CID.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncid\nstr\nContent identifier of the entry.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry for the given CID.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf entry not found.\n\n\n\n\n\n\n\nlocal.Index.get_entry_by_name(name)\nGet an entry by its human-readable name.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nname\nstr\nHuman-readable name of the entry.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nLocalDatasetEntry with the given name.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf no entry with that name exists.\n\n\n\n\n\n\n\nlocal.Index.get_import_path(ref)\nGet the import path for a schema’s generated module.\nWhen auto_stubs is 
enabled, this returns the import path that can be used to import the schema type with full IDE support.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr | None\nImport path like “local.MySample_1_0_0”, or None if auto_stubs\n\n\n\nstr | None\nis disabled.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; index = LocalIndex(auto_stubs=True)\n&gt;&gt;&gt; ref = index.publish_schema(MySample, version=\"1.0.0\")\n&gt;&gt;&gt; index.load_schema(ref)\n&gt;&gt;&gt; print(index.get_import_path(ref))\nlocal.MySample_1_0_0\n&gt;&gt;&gt; # Then in your code:\n&gt;&gt;&gt; # from local.MySample_1_0_0 import MySample\n\n\n\n\nlocal.Index.get_schema(ref)\nGet a schema record by reference (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string. Supports both new format (atdata://local/sampleSchema/{name}@version) and legacy format (local://schemas/{module.Class}@version).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record as a dictionary with keys ‘name’, ‘version’,\n\n\n\ndict\n‘fields’, ‘$ref’, etc.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf reference format is invalid.\n\n\n\n\n\n\n\nlocal.Index.get_schema_record(ref)\nGet a schema record as LocalSchemaRecord object.\nUse this when you need the full LocalSchemaRecord with typed properties. 
For Protocol-compliant dict access, use get_schema() instead.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalSchemaRecord\nLocalSchemaRecord with schema details.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf reference format is invalid.\n\n\n\n\n\n\n\nlocal.Index.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index (AbstractIndex protocol).\nIf a data_store was provided at initialization, writes dataset shards to storage first, then indexes the new URLs. Otherwise, indexes the dataset’s existing URL.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to register.\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nstr | None\nOptional schema reference.\nNone\n\n\n**kwargs\n\nAdditional options: - metadata: Optional metadata dict - prefix: Storage prefix (default: dataset name) - cache_local: If True, cache writes locally first\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nLocalDatasetEntry\nIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\nlocal.Index.list_datasets()\nGet all dataset entries as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[LocalDatasetEntry]\nList of IndexEntry for each dataset.\n\n\n\n\n\n\n\nlocal.Index.list_entries()\nGet all index entries as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[LocalDatasetEntry]\nList of all LocalDatasetEntry objects in the index.\n\n\n\n\n\n\n\nlocal.Index.list_schemas()\nGet all schema records as a materialized list (AbstractIndex protocol).\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\nlocal.Index.load_schema(ref)\nLoad a schema and make it available in the types namespace.\nThis method decodes 
the schema, optionally generates a Python module for IDE support (if auto_stubs is enabled), and registers the type in the :attr:types namespace for easy access.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (atdata://local/sampleSchema/… or legacy local://schemas/…).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nThe decoded PackableSample subclass. Also available via\n\n\n\nType[Packable]\nindex.types.&lt;ClassName&gt; after this call.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; # Load and use immediately\n&gt;&gt;&gt; MyType = index.load_schema(\"atdata://local/sampleSchema/MySample@1.0.0\")\n&gt;&gt;&gt; sample = MyType(name=\"hello\", value=42)\n&gt;&gt;&gt;\n&gt;&gt;&gt; # Or access later via namespace\n&gt;&gt;&gt; index.load_schema(\"atdata://local/sampleSchema/OtherType@1.0.0\")\n&gt;&gt;&gt; other = index.types.OtherType(data=\"test\")\n\n\n\n\nlocal.Index.publish_schema(sample_type, *, version=None, description=None)\nPublish a schema for a sample type to Redis.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\ntype\nA Packable type (@packable-decorated or PackableSample subclass).\nrequired\n\n\nversion\nstr | None\nSemantic version string (e.g., ‘1.0.0’). If None, auto-increments from the latest published version (patch bump), or starts at ‘1.0.0’ if no previous version exists.\nNone\n\n\ndescription\nstr | None\nOptional human-readable description. If None, uses the class docstring.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSchema reference string: ‘atdata://local/sampleSchema/{name}@version’.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf sample_type is not a dataclass.\n\n\n\nTypeError\nIf sample_type doesn’t satisfy the Packable protocol, or if a field type is not supported." 
1290 1311 }, 1291 1312 { 1292 1313 "objectID": "api/Dataset.html", ··· 1626 1647 "href": "api/AbstractIndex.html", 1627 1648 "title": "AbstractIndex", 1628 1649 "section": "", 1629 - "text": "AbstractIndex()\nProtocol for index operations - implemented by LocalIndex and AtmosphereIndex.\nThis protocol defines the common interface for managing dataset metadata: - Publishing and retrieving schemas - Inserting and listing datasets - (Future) Publishing and retrieving lenses\nA single index can hold datasets of many different sample types. The sample type is tracked via schema references, not as a generic parameter on the index.\n\n\nSome index implementations support additional features: - data_store: An AbstractDataStore for reading/writing dataset shards. If present, load_dataset will use it for S3 credential resolution.\n\n\n\n::\n&gt;&gt;&gt; def publish_and_list(index: AbstractIndex) -&gt; None:\n... # Publish schemas for different types\n... schema1 = index.publish_schema(ImageSample, version=\"1.0.0\")\n... schema2 = index.publish_schema(TextSample, version=\"1.0.0\")\n...\n... # Insert datasets of different types\n... index.insert_dataset(image_ds, name=\"images\")\n... index.insert_dataset(text_ds, name=\"texts\")\n...\n... # List all datasets (mixed types)\n... for entry in index.list_datasets():\n... 
print(f\"{entry.name} -&gt; {entry.schema_ref}\")\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndatasets\nLazily iterate over all dataset entries in this index.\n\n\nschemas\nLazily iterate over all schema records in this index.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python Packable type from a stored schema.\n\n\nget_dataset\nGet a dataset entry by name or reference.\n\n\nget_schema\nGet a schema record by reference.\n\n\ninsert_dataset\nInsert a dataset into the index.\n\n\nlist_datasets\nGet all dataset entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list.\n\n\npublish_schema\nPublish a schema for a sample type.\n\n\n\n\n\nAbstractIndex.decode_schema(ref)\nReconstruct a Python Packable type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a Packable class matching the schema definition.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (local:// or at://).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nA dynamically generated Packable class with fields matching\n\n\n\nType[Packable]\nthe schema definition. The class can be used with\n\n\n\nType[Packable]\nDataset[T] to load and iterate over samples.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded (unsupported field types).\n\n\n\n\n\n\n::\n&gt;&gt;&gt; entry = index.get_dataset(\"my-dataset\")\n&gt;&gt;&gt; SampleType = index.decode_schema(entry.schema_ref)\n&gt;&gt;&gt; ds = Dataset[SampleType](entry.data_urls[0])\n&gt;&gt;&gt; for sample in ds.ordered():\n... 
print(sample) # sample is instance of SampleType\n\n\n\n\nAbstractIndex.get_dataset(ref)\nGet a dataset entry by name or reference.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nDataset name, path, or full reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIndexEntry\nIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf dataset not found.\n\n\n\n\n\n\n\nAbstractIndex.get_schema(ref)\nGet a schema record by reference.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (local:// or at://).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record as a dictionary with fields like ‘name’, ‘version’,\n\n\n\ndict\n‘fields’, etc.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\n\n\n\n\nAbstractIndex.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index.\nThe sample type is inferred from ds.sample_type. If schema_ref is not provided, the schema may be auto-published based on the sample type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to register in the index (any sample type).\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nOptional[str]\nOptional explicit schema reference. 
If not provided, the schema may be auto-published or inferred from ds.sample_type.\nNone\n\n\n**kwargs\n\nAdditional backend-specific options.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIndexEntry\nIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\nAbstractIndex.list_datasets()\nGet all dataset entries as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[IndexEntry]\nList of IndexEntry for each dataset.\n\n\n\n\n\n\n\nAbstractIndex.list_schemas()\nGet all schema records as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\nAbstractIndex.publish_schema(sample_type, *, version='1.0.0', **kwargs)\nPublish a schema for a sample type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[Packable]\nA Packable type (PackableSample subclass or @packable-decorated).\nrequired\n\n\nversion\nstr\nSemantic version string for the schema.\n'1.0.0'\n\n\n**kwargs\n\nAdditional backend-specific options.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSchema reference string:\n\n\n\nstr\n- Local: ‘local://schemas/{module.Class}@version’\n\n\n\nstr\n- Atmosphere: ‘at://did:plc:…/ac.foundation.dataset.sampleSchema/…’" 1650 + "text": "AbstractIndex()\nProtocol for index operations - implemented by LocalIndex and AtmosphereIndex.\nThis protocol defines the common interface for managing dataset metadata: - Publishing and retrieving schemas - Inserting and listing datasets - (Future) Publishing and retrieving lenses\nA single index can hold datasets of many different sample types. The sample type is tracked via schema references, not as a generic parameter on the index.\n\n\nSome index implementations support additional features: - data_store: An AbstractDataStore for reading/writing dataset shards. 
If present, load_dataset will use it for S3 credential resolution.\n\n\n\n::\n&gt;&gt;&gt; def publish_and_list(index: AbstractIndex) -&gt; None:\n... # Publish schemas for different types\n... schema1 = index.publish_schema(ImageSample, version=\"1.0.0\")\n... schema2 = index.publish_schema(TextSample, version=\"1.0.0\")\n...\n... # Insert datasets of different types\n... index.insert_dataset(image_ds, name=\"images\")\n... index.insert_dataset(text_ds, name=\"texts\")\n...\n... # List all datasets (mixed types)\n... for entry in index.list_datasets():\n... print(f\"{entry.name} -&gt; {entry.schema_ref}\")\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndata_store\nOptional data store for reading/writing shards.\n\n\ndatasets\nLazily iterate over all dataset entries in this index.\n\n\nschemas\nLazily iterate over all schema records in this index.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python Packable type from a stored schema.\n\n\nget_dataset\nGet a dataset entry by name or reference.\n\n\nget_schema\nGet a schema record by reference.\n\n\ninsert_dataset\nInsert a dataset into the index.\n\n\nlist_datasets\nGet all dataset entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list.\n\n\npublish_schema\nPublish a schema for a sample type.\n\n\n\n\n\nAbstractIndex.decode_schema(ref)\nReconstruct a Python Packable type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a Packable class matching the schema definition.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (local:// or at://).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nA dynamically generated Packable class with fields matching\n\n\n\nType[Packable]\nthe schema definition. 
The class can be used with\n\n\n\nType[Packable]\nDataset[T] to load and iterate over samples.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded (unsupported field types).\n\n\n\n\n\n\n::\n&gt;&gt;&gt; entry = index.get_dataset(\"my-dataset\")\n&gt;&gt;&gt; SampleType = index.decode_schema(entry.schema_ref)\n&gt;&gt;&gt; ds = Dataset[SampleType](entry.data_urls[0])\n&gt;&gt;&gt; for sample in ds.ordered():\n... print(sample) # sample is instance of SampleType\n\n\n\n\nAbstractIndex.get_dataset(ref)\nGet a dataset entry by name or reference.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nDataset name, path, or full reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIndexEntry\nIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf dataset not found.\n\n\n\n\n\n\n\nAbstractIndex.get_schema(ref)\nGet a schema record by reference.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (local:// or at://).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record as a dictionary with fields like ‘name’, ‘version’,\n\n\n\ndict\n‘fields’, etc.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\n\n\n\n\nAbstractIndex.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index.\nThe sample type is inferred from ds.sample_type. If schema_ref is not provided, the schema may be auto-published based on the sample type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to register in the index (any sample type).\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nOptional[str]\nOptional explicit schema reference. 
If not provided, the schema may be auto-published or inferred from ds.sample_type.\nNone\n\n\n**kwargs\n\nAdditional backend-specific options.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIndexEntry\nIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\nAbstractIndex.list_datasets()\nGet all dataset entries as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[IndexEntry]\nList of IndexEntry for each dataset.\n\n\n\n\n\n\n\nAbstractIndex.list_schemas()\nGet all schema records as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\nAbstractIndex.publish_schema(sample_type, *, version='1.0.0', **kwargs)\nPublish a schema for a sample type.\nThe sample_type is accepted as type rather than Type[Packable] to support @packable-decorated classes, which satisfy the Packable protocol at runtime but cannot be statically verified by type checkers.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\ntype\nA Packable type (PackableSample subclass or @packable-decorated). Validated at runtime via the @runtime_checkable Packable protocol.\nrequired\n\n\nversion\nstr\nSemantic version string for the schema.\n'1.0.0'\n\n\n**kwargs\n\nAdditional backend-specific options.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSchema reference string:\n\n\n\nstr\n- Local: ‘local://schemas/{module.Class}@version’\n\n\n\nstr\n- Atmosphere: ‘at://did:plc:…/ac.foundation.dataset.sampleSchema/…’" 1630 1651 }, 1631 1652 { 1632 1653 "objectID": "api/AbstractIndex.html#optional-extensions", ··· 1647 1668 "href": "api/AbstractIndex.html#attributes", 1648 1669 "title": "AbstractIndex", 1649 1670 "section": "", 1650 - "text": "Name\nDescription\n\n\n\n\ndatasets\nLazily iterate over all dataset entries in this index.\n\n\nschemas\nLazily iterate over all schema records in this index." 
1671 + "text": "Name\nDescription\n\n\n\n\ndata_store\nOptional data store for reading/writing shards.\n\n\ndatasets\nLazily iterate over all dataset entries in this index.\n\n\nschemas\nLazily iterate over all schema records in this index." 1651 1672 }, 1652 1673 { 1653 1674 "objectID": "api/AbstractIndex.html#methods", 1654 1675 "href": "api/AbstractIndex.html#methods", 1655 1676 "title": "AbstractIndex", 1656 1677 "section": "", 1657 - "text": "Name\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python Packable type from a stored schema.\n\n\nget_dataset\nGet a dataset entry by name or reference.\n\n\nget_schema\nGet a schema record by reference.\n\n\ninsert_dataset\nInsert a dataset into the index.\n\n\nlist_datasets\nGet all dataset entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list.\n\n\npublish_schema\nPublish a schema for a sample type.\n\n\n\n\n\nAbstractIndex.decode_schema(ref)\nReconstruct a Python Packable type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. The index retrieves the schema record and dynamically generates a Packable class matching the schema definition.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (local:// or at://).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nA dynamically generated Packable class with fields matching\n\n\n\nType[Packable]\nthe schema definition. The class can be used with\n\n\n\nType[Packable]\nDataset[T] to load and iterate over samples.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded (unsupported field types).\n\n\n\n\n\n\n::\n&gt;&gt;&gt; entry = index.get_dataset(\"my-dataset\")\n&gt;&gt;&gt; SampleType = index.decode_schema(entry.schema_ref)\n&gt;&gt;&gt; ds = Dataset[SampleType](entry.data_urls[0])\n&gt;&gt;&gt; for sample in ds.ordered():\n... 
print(sample) # sample is instance of SampleType\n\n\n\n\nAbstractIndex.get_dataset(ref)\nGet a dataset entry by name or reference.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nDataset name, path, or full reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIndexEntry\nIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf dataset not found.\n\n\n\n\n\n\n\nAbstractIndex.get_schema(ref)\nGet a schema record by reference.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (local:// or at://).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record as a dictionary with fields like ‘name’, ‘version’,\n\n\n\ndict\n‘fields’, etc.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\n\n\n\n\nAbstractIndex.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index.\nThe sample type is inferred from ds.sample_type. If schema_ref is not provided, the schema may be auto-published based on the sample type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to register in the index (any sample type).\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nOptional[str]\nOptional explicit schema reference. 
If not provided, the schema may be auto-published or inferred from ds.sample_type.\nNone\n\n\n**kwargs\n\nAdditional backend-specific options.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIndexEntry\nIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\nAbstractIndex.list_datasets()\nGet all dataset entries as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[IndexEntry]\nList of IndexEntry for each dataset.\n\n\n\n\n\n\n\nAbstractIndex.list_schemas()\nGet all schema records as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\nAbstractIndex.publish_schema(sample_type, *, version='1.0.0', **kwargs)\nPublish a schema for a sample type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\nType[Packable]\nA Packable type (PackableSample subclass or @packable-decorated).\nrequired\n\n\nversion\nstr\nSemantic version string for the schema.\n'1.0.0'\n\n\n**kwargs\n\nAdditional backend-specific options.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSchema reference string:\n\n\n\nstr\n- Local: ‘local://schemas/{module.Class}@version’\n\n\n\nstr\n- Atmosphere: ‘at://did:plc:…/ac.foundation.dataset.sampleSchema/…’" 1678 + "text": "Name\nDescription\n\n\n\n\ndecode_schema\nReconstruct a Python Packable type from a stored schema.\n\n\nget_dataset\nGet a dataset entry by name or reference.\n\n\nget_schema\nGet a schema record by reference.\n\n\ninsert_dataset\nInsert a dataset into the index.\n\n\nlist_datasets\nGet all dataset entries as a materialized list.\n\n\nlist_schemas\nGet all schema records as a materialized list.\n\n\npublish_schema\nPublish a schema for a sample type.\n\n\n\n\n\nAbstractIndex.decode_schema(ref)\nReconstruct a Python Packable type from a stored schema.\nThis method enables loading datasets without knowing the sample type ahead of time. 
The index retrieves the schema record and dynamically generates a Packable class matching the schema definition.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (local:// or at://).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nType[Packable]\nA dynamically generated Packable class with fields matching\n\n\n\nType[Packable]\nthe schema definition. The class can be used with\n\n\n\nType[Packable]\nDataset[T] to load and iterate over samples.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\nValueError\nIf schema cannot be decoded (unsupported field types).\n\n\n\n\n\n\n::\n&gt;&gt;&gt; entry = index.get_dataset(\"my-dataset\")\n&gt;&gt;&gt; SampleType = index.decode_schema(entry.schema_ref)\n&gt;&gt;&gt; ds = Dataset[SampleType](entry.data_urls[0])\n&gt;&gt;&gt; for sample in ds.ordered():\n... print(sample) # sample is instance of SampleType\n\n\n\n\nAbstractIndex.get_dataset(ref)\nGet a dataset entry by name or reference.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nDataset name, path, or full reference string.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIndexEntry\nIndexEntry for the dataset.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf dataset not found.\n\n\n\n\n\n\n\nAbstractIndex.get_schema(ref)\nGet a schema record by reference.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nref\nstr\nSchema reference string (local:// or at://).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nSchema record as a dictionary with fields like ‘name’, ‘version’,\n\n\n\ndict\n‘fields’, etc.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf schema not found.\n\n\n\n\n\n\n\nAbstractIndex.insert_dataset(ds, *, name, schema_ref=None, **kwargs)\nInsert a dataset into the index.\nThe sample type is inferred from ds.sample_type. 
If schema_ref is not provided, the schema may be auto-published based on the sample type.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nds\nDataset\nThe Dataset to register in the index (any sample type).\nrequired\n\n\nname\nstr\nHuman-readable name for the dataset.\nrequired\n\n\nschema_ref\nOptional[str]\nOptional explicit schema reference. If not provided, the schema may be auto-published or inferred from ds.sample_type.\nNone\n\n\n**kwargs\n\nAdditional backend-specific options.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIndexEntry\nIndexEntry for the inserted dataset.\n\n\n\n\n\n\n\nAbstractIndex.list_datasets()\nGet all dataset entries as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[IndexEntry]\nList of IndexEntry for each dataset.\n\n\n\n\n\n\n\nAbstractIndex.list_schemas()\nGet all schema records as a materialized list.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records as dictionaries.\n\n\n\n\n\n\n\nAbstractIndex.publish_schema(sample_type, *, version='1.0.0', **kwargs)\nPublish a schema for a sample type.\nThe sample_type is accepted as type rather than Type[Packable] to support @packable-decorated classes, which satisfy the Packable protocol at runtime but cannot be statically verified by type checkers.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsample_type\ntype\nA Packable type (PackableSample subclass or @packable-decorated). 
Validated at runtime via the @runtime_checkable Packable protocol.\nrequired\n\n\nversion\nstr\nSemantic version string for the schema.\n'1.0.0'\n\n\n**kwargs\n\nAdditional backend-specific options.\n{}\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSchema reference string:\n\n\n\nstr\n- Local: ‘local://schemas/{module.Class}@version’\n\n\n\nstr\n- Atmosphere: ‘at://did:plc:…/ac.foundation.dataset.sampleSchema/…’" 1658 1679 }, 1659 1680 { 1660 1681 "objectID": "api/local.LocalDatasetEntry.html", ··· 1731 1752 "href": "api/index.html", 1732 1753 "title": "API Reference", 1733 1754 "section": "", 1734 - "text": "Core types, decorators, and dataset classes\n\n\n\npackable\nDecorator to convert a regular class into a PackableSample.\n\n\nPackableSample\nBase class for samples that can be serialized with msgpack.\n\n\nDictSample\nDynamic sample type providing dict-like access to raw msgpack data.\n\n\nDataset\nA typed dataset built on WebDataset with lens transformations.\n\n\nSampleBatch\nA batch of samples with automatic attribute aggregation.\n\n\nLens\nA bidirectional transformation between two sample types.\n\n\nlens\nLens-based type transformations for datasets.\n\n\nload_dataset\nLoad a dataset from local files, remote URLs, or an index.\n\n\nDatasetDict\nA dictionary of split names to Dataset instances.\n\n\n\n\n\n\nAbstract protocols for storage backends\n\n\n\nPackable\nStructural protocol for packable sample types.\n\n\nIndexEntry\nCommon interface for index entries (local or atmosphere).\n\n\nAbstractIndex\nProtocol for index operations - implemented by LocalIndex and AtmosphereIndex.\n\n\nAbstractDataStore\nProtocol for data storage operations.\n\n\nDataSource\nProtocol for data sources that provide streams to Dataset.\n\n\n\n\n\n\nData source implementations for streaming\n\n\n\nURLSource\nData source for WebDataset-compatible URLs.\n\n\nS3Source\nData source for S3-compatible storage with explicit credentials.\n\n\n\n\n\n\nLocal Redis/S3 
storage backend\n\n\n\nlocal.Index\nRedis-backed index for tracking datasets in a repository.\n\n\nlocal.LocalDatasetEntry\nIndex entry for a dataset stored in the local repository.\n\n\nlocal.S3DataStore\nS3-compatible data store implementing AbstractDataStore protocol.\n\n\n\n\n\n\nATProto federation\n\n\n\nAtmosphereClient\nATProto client wrapper for atdata operations.\n\n\nAtmosphereIndex\nATProto index implementing AbstractIndex protocol.\n\n\nAtmosphereIndexEntry\nEntry wrapper for ATProto dataset records implementing IndexEntry protocol.\n\n\nSchemaPublisher\nPublishes PackableSample schemas to ATProto.\n\n\nSchemaLoader\nLoads PackableSample schemas from ATProto.\n\n\nDatasetPublisher\nPublishes dataset index records to ATProto.\n\n\nDatasetLoader\nLoads dataset records from ATProto.\n\n\nLensPublisher\nPublishes Lens transformation records to ATProto.\n\n\nLensLoader\nLoads lens records from ATProto.\n\n\nAtUri\nParsed AT Protocol URI.\n\n\n\n\n\n\nLocal to atmosphere migration\n\n\n\npromote_to_atmosphere\nPromote a local dataset to the atmosphere network." 
1755 + "text": "Core types, decorators, and dataset classes\n\n\n\npackable\nDecorator to convert a regular class into a PackableSample.\n\n\nPackableSample\nBase class for samples that can be serialized with msgpack.\n\n\nDictSample\nDynamic sample type providing dict-like access to raw msgpack data.\n\n\nDataset\nA typed dataset built on WebDataset with lens transformations.\n\n\nSampleBatch\nA batch of samples with automatic attribute aggregation.\n\n\nLens\nA bidirectional transformation between two sample types.\n\n\nlens\nLens-based type transformations for datasets.\n\n\nload_dataset\nLoad a dataset from local files, remote URLs, or an index.\n\n\nDatasetDict\nA dictionary of split names to Dataset instances.\n\n\n\n\n\n\nAbstract protocols for storage backends\n\n\n\nPackable\nStructural protocol for packable sample types.\n\n\nIndexEntry\nCommon interface for index entries (local or atmosphere).\n\n\nAbstractIndex\nProtocol for index operations - implemented by LocalIndex and AtmosphereIndex.\n\n\nAbstractDataStore\nProtocol for data storage operations.\n\n\nDataSource\nProtocol for data sources that provide streams to Dataset.\n\n\n\n\n\n\nData source implementations for streaming\n\n\n\nURLSource\nData source for WebDataset-compatible URLs.\n\n\nS3Source\nData source for S3-compatible storage with explicit credentials.\n\n\nBlobSource\nData source for ATProto PDS blob storage.\n\n\n\n\n\n\nLocal Redis/S3 storage backend\n\n\n\nlocal.Index\nRedis-backed index for tracking datasets in a repository.\n\n\nlocal.LocalDatasetEntry\nIndex entry for a dataset stored in the local repository.\n\n\nlocal.S3DataStore\nS3-compatible data store implementing AbstractDataStore protocol.\n\n\n\n\n\n\nATProto federation\n\n\n\nAtmosphereClient\nATProto client wrapper for atdata operations.\n\n\nAtmosphereIndex\nATProto index implementing AbstractIndex protocol.\n\n\nAtmosphereIndexEntry\nEntry wrapper for ATProto dataset records implementing IndexEntry 
protocol.\n\n\nPDSBlobStore\nPDS blob store implementing AbstractDataStore protocol.\n\n\nSchemaPublisher\nPublishes PackableSample schemas to ATProto.\n\n\nSchemaLoader\nLoads PackableSample schemas from ATProto.\n\n\nDatasetPublisher\nPublishes dataset index records to ATProto.\n\n\nDatasetLoader\nLoads dataset records from ATProto.\n\n\nLensPublisher\nPublishes Lens transformation records to ATProto.\n\n\nLensLoader\nLoads lens records from ATProto.\n\n\nAtUri\nParsed AT Protocol URI.\n\n\n\n\n\n\nLocal to atmosphere migration\n\n\n\npromote_to_atmosphere\nPromote a local dataset to the atmosphere network." 1735 1756 }, 1736 1757 { 1737 1758 "objectID": "api/index.html#core", ··· 1752 1773 "href": "api/index.html#data-sources", 1753 1774 "title": "API Reference", 1754 1775 "section": "", 1755 - "text": "Data source implementations for streaming\n\n\n\nURLSource\nData source for WebDataset-compatible URLs.\n\n\nS3Source\nData source for S3-compatible storage with explicit credentials." 1776 + "text": "Data source implementations for streaming\n\n\n\nURLSource\nData source for WebDataset-compatible URLs.\n\n\nS3Source\nData source for S3-compatible storage with explicit credentials.\n\n\nBlobSource\nData source for ATProto PDS blob storage." 
1756 1777 }, 1757 1778 { 1758 1779 "objectID": "api/index.html#local-storage", ··· 1766 1787 "href": "api/index.html#atmosphere", 1767 1788 "title": "API Reference", 1768 1789 "section": "", 1769 - "text": "ATProto federation\n\n\n\nAtmosphereClient\nATProto client wrapper for atdata operations.\n\n\nAtmosphereIndex\nATProto index implementing AbstractIndex protocol.\n\n\nAtmosphereIndexEntry\nEntry wrapper for ATProto dataset records implementing IndexEntry protocol.\n\n\nSchemaPublisher\nPublishes PackableSample schemas to ATProto.\n\n\nSchemaLoader\nLoads PackableSample schemas from ATProto.\n\n\nDatasetPublisher\nPublishes dataset index records to ATProto.\n\n\nDatasetLoader\nLoads dataset records from ATProto.\n\n\nLensPublisher\nPublishes Lens transformation records to ATProto.\n\n\nLensLoader\nLoads lens records from ATProto.\n\n\nAtUri\nParsed AT Protocol URI." 1790 + "text": "ATProto federation\n\n\n\nAtmosphereClient\nATProto client wrapper for atdata operations.\n\n\nAtmosphereIndex\nATProto index implementing AbstractIndex protocol.\n\n\nAtmosphereIndexEntry\nEntry wrapper for ATProto dataset records implementing IndexEntry protocol.\n\n\nPDSBlobStore\nPDS blob store implementing AbstractDataStore protocol.\n\n\nSchemaPublisher\nPublishes PackableSample schemas to ATProto.\n\n\nSchemaLoader\nLoads PackableSample schemas from ATProto.\n\n\nDatasetPublisher\nPublishes dataset index records to ATProto.\n\n\nDatasetLoader\nLoads dataset records from ATProto.\n\n\nLensPublisher\nPublishes Lens transformation records to ATProto.\n\n\nLensLoader\nLoads lens records from ATProto.\n\n\nAtUri\nParsed AT Protocol URI." 1770 1791 }, 1771 1792 { 1772 1793 "objectID": "api/index.html#promotion", ··· 1916 1937 "text": "::\n&gt;&gt;&gt; # Load without type - get DictSample for exploration\n&gt;&gt;&gt; ds = load_dataset(\"./data/train.tar\", split=\"train\")\n&gt;&gt;&gt; for sample in ds.ordered():\n... print(sample.keys()) # Explore fields\n... 
print(sample[\"text\"]) # Dict-style access\n... print(sample.label) # Attribute access\n&gt;&gt;&gt;\n&gt;&gt;&gt; # Convert to typed schema\n&gt;&gt;&gt; typed_ds = ds.as_type(TextData)\n&gt;&gt;&gt;\n&gt;&gt;&gt; # Or load with explicit type directly\n&gt;&gt;&gt; train_ds = load_dataset(\"./data/train-*.tar\", TextData, split=\"train\")\n&gt;&gt;&gt;\n&gt;&gt;&gt; # Load from index with auto-type resolution\n&gt;&gt;&gt; index = LocalIndex()\n&gt;&gt;&gt; ds = load_dataset(\"@local/my-dataset\", index=index, split=\"train\")" 1917 1938 }, 1918 1939 { 1919 - "objectID": "api/PackableSample.html", 1920 - "href": "api/PackableSample.html", 1921 - "title": "PackableSample", 1940 + "objectID": "api/AtmosphereClient.html", 1941 + "href": "api/AtmosphereClient.html", 1942 + "title": "AtmosphereClient", 1922 1943 "section": "", 1923 - "text": "PackableSample()\nBase class for samples that can be serialized with msgpack.\nThis abstract base class provides automatic serialization/deserialization for dataclass-based samples. Fields annotated as NDArray or NDArray | None are automatically converted between numpy arrays and bytes during packing/unpacking.\nSubclasses should be defined either by: 1. Direct inheritance with the @dataclass decorator 2. Using the @packable decorator (recommended)\n\n\n::\n&gt;&gt;&gt; @packable\n... class MyData:\n... name: str\n... 
embeddings: NDArray\n...\n&gt;&gt;&gt; sample = MyData(name=\"test\", embeddings=np.array([1.0, 2.0]))\n&gt;&gt;&gt; packed = sample.packed # Serialize to bytes\n&gt;&gt;&gt; restored = MyData.from_bytes(packed) # Deserialize\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nas_wds\nPack this sample’s data for writing to WebDataset.\n\n\npacked\nPack this sample’s data into msgpack bytes.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_bytes\nCreate a sample instance from raw msgpack bytes.\n\n\nfrom_data\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nPackableSample.from_bytes(bs)\nCreate a sample instance from raw msgpack bytes.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbs\nbytes\nRaw bytes from a msgpack-serialized sample.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSelf\nA new instance of this sample class deserialized from the bytes.\n\n\n\n\n\n\n\nPackableSample.from_data(data)\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\nWDSRawSample\nDictionary with keys matching the sample’s field names.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSelf\nNew instance with NDArray fields auto-converted from bytes." 1944 + "text": "atmosphere.AtmosphereClient(base_url=None, *, _client=None)\nATProto client wrapper for atdata operations.\nThis class wraps the atproto SDK client and provides higher-level methods for working with atdata records (schemas, datasets, lenses).\n\n\n::\n&gt;&gt;&gt; client = AtmosphereClient()\n&gt;&gt;&gt; client.login(\"alice.bsky.social\", \"app-password\")\n&gt;&gt;&gt; print(client.did)\n'did:plc:...'\n\n\n\nThe password should be an app-specific password, not your main account password. 
Create app passwords in your Bluesky account settings.\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ndid\nGet the DID of the authenticated user.\n\n\nhandle\nGet the handle of the authenticated user.\n\n\nis_authenticated\nCheck if the client has a valid session.\n\n\n\n\n\n\n\n\n\nName\nDescription\n\n\n\n\ncreate_record\nCreate a record in the user’s repository.\n\n\ndelete_record\nDelete a record.\n\n\nexport_session\nExport the current session for later reuse.\n\n\nget_blob\nDownload a blob from a PDS.\n\n\nget_blob_url\nGet the direct URL for fetching a blob.\n\n\nget_record\nFetch a record by AT URI.\n\n\nlist_datasets\nList dataset records.\n\n\nlist_lenses\nList lens records.\n\n\nlist_records\nList records in a collection.\n\n\nlist_schemas\nList schema records.\n\n\nlogin\nAuthenticate with the ATProto PDS.\n\n\nlogin_with_session\nAuthenticate using an exported session string.\n\n\nput_record\nCreate or update a record at a specific key.\n\n\nupload_blob\nUpload binary data as a blob to the PDS.\n\n\n\n\n\natmosphere.AtmosphereClient.create_record(\n collection,\n record,\n *,\n rkey=None,\n validate=False,\n)\nCreate a record in the user’s repository.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection (e.g., ‘ac.foundation.dataset.sampleSchema’).\nrequired\n\n\nrecord\ndict\nThe record data. Must include a ‘$type’ field.\nrequired\n\n\nrkey\nOptional[str]\nOptional explicit record key. If not provided, a TID is generated.\nNone\n\n\nvalidate\nbool\nWhether to validate against the Lexicon schema. 
Set to False for custom lexicons that the PDS doesn’t know about.\nFalse\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf record creation fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.delete_record(uri, *, swap_commit=None)\nDelete a record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the record to delete.\nrequired\n\n\nswap_commit\nOptional[str]\nOptional CID for compare-and-swap delete.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf deletion fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.export_session()\nExport the current session for later reuse.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSession string that can be passed to login_with_session().\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_blob(did, cid)\nDownload a blob from a PDS.\nThis resolves the PDS endpoint from the DID document and fetches the blob directly from the PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndid\nstr\nThe DID of the repository containing the blob.\nrequired\n\n\ncid\nstr\nThe CID of the blob.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbytes\nThe blob data as bytes.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf PDS endpoint cannot be resolved.\n\n\n\nrequests.HTTPError\nIf blob fetch fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_blob_url(did, cid)\nGet the direct URL for fetching a blob.\nThis is useful for passing to WebDataset or other HTTP clients.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndid\nstr\nThe DID of the repository containing the blob.\nrequired\n\n\ncid\nstr\nThe CID of the 
blob.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nThe full URL for fetching the blob.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf PDS endpoint cannot be resolved.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_record(uri)\nFetch a record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nThe record data as a dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\natproto.exceptions.AtProtocolError\nIf record not found.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_datasets(repo=None, limit=100)\nList dataset records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of dataset records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_lenses(repo=None, limit=100)\nList lens records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of lens records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_records(\n collection,\n *,\n repo=None,\n limit=100,\n cursor=None,\n)\nList records in a collection.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection.\nrequired\n\n\nrepo\nOptional[str]\nThe DID of the repository to query. Defaults to the authenticated user’s repository.\nNone\n\n\nlimit\nint\nMaximum number of records to return (default 100).\n100\n\n\ncursor\nOptional[str]\nPagination cursor from a previous call.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nA tuple of (records, next_cursor). 
The cursor (Optional[str]) is None if there are no more records.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf repo is None and not authenticated.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_schemas(repo=None, limit=100)\nList schema records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.login(handle, password)\nAuthenticate with the ATProto PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nhandle\nstr\nYour Bluesky handle (e.g., ‘alice.bsky.social’).\nrequired\n\n\npassword\nstr\nApp-specific password (not your main password).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\natproto.exceptions.AtProtocolError\nIf authentication fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.login_with_session(session_string)\nAuthenticate using an exported session string.\nThis allows reusing a session without re-authenticating, which helps avoid rate limits on session creation.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsession_string\nstr\nSession string from export_session().\nrequired\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.put_record(\n collection,\n rkey,\n record,\n *,\n validate=False,\n swap_commit=None,\n)\nCreate or update a record at a specific key.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection.\nrequired\n\n\nrkey\nstr\nThe record key.\nrequired\n\n\nrecord\ndict\nThe record data. 
Must include a ‘$type’ field.\nrequired\n\n\nvalidate\nbool\nWhether to validate against the Lexicon schema.\nFalse\n\n\nswap_commit\nOptional[str]\nOptional CID for compare-and-swap update.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf operation fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.upload_blob(\n data,\n mime_type='application/octet-stream',\n)\nUpload binary data as a blob to the PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\nbytes\nBinary data to upload.\nrequired\n\n\nmime_type\nstr\nMIME type of the data (for reference, not enforced by PDS).\n'application/octet-stream'\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nA blob reference dict with keys: ‘$type’, ‘ref’, ‘mimeType’, ‘size’. This can be embedded directly in record fields.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf upload fails." 1924 1945 }, 1925 1946 { 1926 - "objectID": "api/PackableSample.html#example", 1927 - "href": "api/PackableSample.html#example", 1928 - "title": "PackableSample", 1947 + "objectID": "api/AtmosphereClient.html#example", 1948 + "href": "api/AtmosphereClient.html#example", 1949 + "title": "AtmosphereClient", 1950 + "section": "", 1951 + "text": "::\n&gt;&gt;&gt; client = AtmosphereClient()\n&gt;&gt;&gt; client.login(\"alice.bsky.social\", \"app-password\")\n&gt;&gt;&gt; print(client.did)\n'did:plc:...'" 1952 + }, 1953 + { 1954 + "objectID": "api/AtmosphereClient.html#note", 1955 + "href": "api/AtmosphereClient.html#note", 1956 + "title": "AtmosphereClient", 1957 + "section": "", 1958 + "text": "The password should be an app-specific password, not your main account password. Create app passwords in your Bluesky account settings." 
1959 + }, 1960 + { 1961 + "objectID": "api/AtmosphereClient.html#attributes", 1962 + "href": "api/AtmosphereClient.html#attributes", 1963 + "title": "AtmosphereClient", 1964 + "section": "", 1965 + "text": "Name\nDescription\n\n\n\n\ndid\nGet the DID of the authenticated user.\n\n\nhandle\nGet the handle of the authenticated user.\n\n\nis_authenticated\nCheck if the client has a valid session." 1966 + }, 1967 + { 1968 + "objectID": "api/AtmosphereClient.html#methods", 1969 + "href": "api/AtmosphereClient.html#methods", 1970 + "title": "AtmosphereClient", 1971 + "section": "", 1972 + "text": "Name\nDescription\n\n\n\n\ncreate_record\nCreate a record in the user’s repository.\n\n\ndelete_record\nDelete a record.\n\n\nexport_session\nExport the current session for later reuse.\n\n\nget_blob\nDownload a blob from a PDS.\n\n\nget_blob_url\nGet the direct URL for fetching a blob.\n\n\nget_record\nFetch a record by AT URI.\n\n\nlist_datasets\nList dataset records.\n\n\nlist_lenses\nList lens records.\n\n\nlist_records\nList records in a collection.\n\n\nlist_schemas\nList schema records.\n\n\nlogin\nAuthenticate with the ATProto PDS.\n\n\nlogin_with_session\nAuthenticate using an exported session string.\n\n\nput_record\nCreate or update a record at a specific key.\n\n\nupload_blob\nUpload binary data as a blob to the PDS.\n\n\n\n\n\natmosphere.AtmosphereClient.create_record(\n collection,\n record,\n *,\n rkey=None,\n validate=False,\n)\nCreate a record in the user’s repository.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection (e.g., ‘ac.foundation.dataset.sampleSchema’).\nrequired\n\n\nrecord\ndict\nThe record data. Must include a ‘$type’ field.\nrequired\n\n\nrkey\nOptional[str]\nOptional explicit record key. If not provided, a TID is generated.\nNone\n\n\nvalidate\nbool\nWhether to validate against the Lexicon schema. 
Set to False for custom lexicons that the PDS doesn’t know about.\nFalse\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the created record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf record creation fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.delete_record(uri, *, swap_commit=None)\nDelete a record.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the record to delete.\nrequired\n\n\nswap_commit\nOptional[str]\nOptional CID for compare-and-swap delete.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf deletion fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.export_session()\nExport the current session for later reuse.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nSession string that can be passed to login_with_session().\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_blob(did, cid)\nDownload a blob from a PDS.\nThis resolves the PDS endpoint from the DID document and fetches the blob directly from the PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndid\nstr\nThe DID of the repository containing the blob.\nrequired\n\n\ncid\nstr\nThe CID of the blob.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nbytes\nThe blob data as bytes.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf PDS endpoint cannot be resolved.\n\n\n\nrequests.HTTPError\nIf blob fetch fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_blob_url(did, cid)\nGet the direct URL for fetching a blob.\nThis is useful for passing to WebDataset or other HTTP clients.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndid\nstr\nThe DID of the repository containing the blob.\nrequired\n\n\ncid\nstr\nThe CID of the 
blob.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nstr\nThe full URL for fetching the blob.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf PDS endpoint cannot be resolved.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.get_record(uri)\nFetch a record by AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nuri\nstr | AtUri\nThe AT URI of the record.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nThe record data as a dictionary.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\natproto.exceptions.AtProtocolError\nIf record not found.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_datasets(repo=None, limit=100)\nList dataset records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of dataset records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_lenses(repo=None, limit=100)\nList lens records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of lens records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_records(\n collection,\n *,\n repo=None,\n limit=100,\n cursor=None,\n)\nList records in a collection.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection.\nrequired\n\n\nrepo\nOptional[str]\nThe DID of the repository to query. Defaults to the authenticated user’s repository.\nNone\n\n\nlimit\nint\nMaximum number of records to return (default 100).\n100\n\n\ncursor\nOptional[str]\nPagination cursor from a previous call.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nA tuple of (records, next_cursor). 
The cursor (Optional[str]) is None if there are no more records.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf repo is None and not authenticated.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.list_schemas(repo=None, limit=100)\nList schema records.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrepo\nOptional[str]\nThe DID to query. Defaults to authenticated user.\nNone\n\n\nlimit\nint\nMaximum number to return.\n100\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nlist[dict]\nList of schema records.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.login(handle, password)\nAuthenticate with the ATProto PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nhandle\nstr\nYour Bluesky handle (e.g., ‘alice.bsky.social’).\nrequired\n\n\npassword\nstr\nApp-specific password (not your main password).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\natproto.exceptions.AtProtocolError\nIf authentication fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.login_with_session(session_string)\nAuthenticate using an exported session string.\nThis allows reusing a session without re-authenticating, which helps avoid rate limits on session creation.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nsession_string\nstr\nSession string from export_session().\nrequired\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.put_record(\n collection,\n rkey,\n record,\n *,\n validate=False,\n swap_commit=None,\n)\nCreate or update a record at a specific key.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ncollection\nstr\nThe NSID of the record collection.\nrequired\n\n\nrkey\nstr\nThe record key.\nrequired\n\n\nrecord\ndict\nThe record data. 
Must include a ‘$type’ field.\nrequired\n\n\nvalidate\nbool\nWhether to validate against the Lexicon schema.\nFalse\n\n\nswap_commit\nOptional[str]\nOptional CID for compare-and-swap update.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nAtUri\nThe AT URI of the record.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf operation fails.\n\n\n\n\n\n\n\natmosphere.AtmosphereClient.upload_blob(\n data,\n mime_type='application/octet-stream',\n)\nUpload binary data as a blob to the PDS.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\nbytes\nBinary data to upload.\nrequired\n\n\nmime_type\nstr\nMIME type of the data (for reference, not enforced by PDS).\n'application/octet-stream'\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\ndict\nA blob reference dict with keys: ‘$type’, ‘ref’, ‘mimeType’, ‘size’. This can be embedded directly in record fields.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf not authenticated.\n\n\n\natproto.exceptions.AtProtocolError\nIf upload fails." 1973 + }, 1974 + { 1975 + "objectID": "api/BlobSource.html", 1976 + "href": "api/BlobSource.html", 1977 + "title": "BlobSource", 1978 + "section": "", 1979 + "text": "BlobSource(blob_refs, pds_endpoint=None, _endpoint_cache=dict())\nData source for ATProto PDS blob storage.\nStreams dataset shards stored as blobs on an ATProto Personal Data Server. Each shard is identified by a blob reference containing the DID and CID.\nThis source resolves blob references to HTTP URLs and streams the content directly, supporting efficient iteration over shards without downloading everything upfront.\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\nblob_refs\nlist[dict[str, str]]\nList of blob reference dicts with ‘did’ and ‘cid’ keys.\n\n\npds_endpoint\nstr | None\nOptional PDS endpoint URL. 
If not provided, resolved from DID.\n\n\n\n\n\n\n::\n&gt;&gt;&gt; source = BlobSource(\n... blob_refs=[\n... {\"did\": \"did:plc:abc123\", \"cid\": \"bafyrei...\"},\n... {\"did\": \"did:plc:abc123\", \"cid\": \"bafyrei...\"},\n... ],\n... )\n&gt;&gt;&gt; for shard_id, stream in source.shards:\n... process(stream)\n\n\n\n\n\n\nName\nDescription\n\n\n\n\nfrom_refs\nCreate BlobSource from blob reference dicts.\n\n\nlist_shards\nReturn list of AT URI-style shard identifiers.\n\n\nopen_shard\nOpen a single shard by its AT URI.\n\n\n\n\n\nBlobSource.from_refs(refs, *, pds_endpoint=None)\nCreate BlobSource from blob reference dicts.\nAccepts blob references in the format returned by upload_blob: {\"$type\": \"blob\", \"ref\": {\"$link\": \"cid\"}, ...}\nAlso accepts simplified format: {\"did\": \"...\", \"cid\": \"...\"}\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrefs\nlist[dict]\nList of blob reference dicts.\nrequired\n\n\npds_endpoint\nstr | None\nOptional PDS endpoint to use for all blobs.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\n'BlobSource'\nConfigured BlobSource.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf refs is empty or format is invalid.\n\n\n\n\n\n\n\nBlobSource.list_shards()\nReturn list of AT URI-style shard identifiers.\n\n\n\nBlobSource.open_shard(shard_id)\nOpen a single shard by its AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nshard_id\nstr\nAT URI of the shard (at://did/blob/cid).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIO[bytes]\nStreaming response body for reading the blob.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf shard_id is not in list_shards().\n\n\n\nValueError\nIf shard_id format is invalid." 1980 + }, 1981 + { 1982 + "objectID": "api/BlobSource.html#attributes", 1983 + "href": "api/BlobSource.html#attributes", 1984 + "title": "BlobSource", 1929 1985 "section": "", 1930 - "text": "::\n&gt;&gt;&gt; @packable\n... 
class MyData:\n... name: str\n... embeddings: NDArray\n...\n&gt;&gt;&gt; sample = MyData(name=\"test\", embeddings=np.array([1.0, 2.0]))\n&gt;&gt;&gt; packed = sample.packed # Serialize to bytes\n&gt;&gt;&gt; restored = MyData.from_bytes(packed) # Deserialize" 1986 + "text": "Name\nType\nDescription\n\n\n\n\nblob_refs\nlist[dict[str, str]]\nList of blob reference dicts with ‘did’ and ‘cid’ keys.\n\n\npds_endpoint\nstr | None\nOptional PDS endpoint URL. If not provided, resolved from DID." 1931 1987 }, 1932 1988 { 1933 - "objectID": "api/PackableSample.html#attributes", 1934 - "href": "api/PackableSample.html#attributes", 1935 - "title": "PackableSample", 1989 + "objectID": "api/BlobSource.html#example", 1990 + "href": "api/BlobSource.html#example", 1991 + "title": "BlobSource", 1936 1992 "section": "", 1937 - "text": "Name\nDescription\n\n\n\n\nas_wds\nPack this sample’s data for writing to WebDataset.\n\n\npacked\nPack this sample’s data into msgpack bytes." 1993 + "text": "::\n&gt;&gt;&gt; source = BlobSource(\n... blob_refs=[\n... {\"did\": \"did:plc:abc123\", \"cid\": \"bafyrei...\"},\n... {\"did\": \"did:plc:abc123\", \"cid\": \"bafyrei...\"},\n... ],\n... )\n&gt;&gt;&gt; for shard_id, stream in source.shards:\n... 
process(stream)" 1938 1994 }, 1939 1995 { 1940 - "objectID": "api/PackableSample.html#methods", 1941 - "href": "api/PackableSample.html#methods", 1942 - "title": "PackableSample", 1996 + "objectID": "api/BlobSource.html#methods", 1997 + "href": "api/BlobSource.html#methods", 1998 + "title": "BlobSource", 1943 1999 "section": "", 1944 - "text": "Name\nDescription\n\n\n\n\nfrom_bytes\nCreate a sample instance from raw msgpack bytes.\n\n\nfrom_data\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nPackableSample.from_bytes(bs)\nCreate a sample instance from raw msgpack bytes.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nbs\nbytes\nRaw bytes from a msgpack-serialized sample.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSelf\nA new instance of this sample class deserialized from the bytes.\n\n\n\n\n\n\n\nPackableSample.from_data(data)\nCreate a sample instance from unpacked msgpack data.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\ndata\nWDSRawSample\nDictionary with keys matching the sample’s field names.\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nSelf\nNew instance with NDArray fields auto-converted from bytes." 
2000 + "text": "Name\nDescription\n\n\n\n\nfrom_refs\nCreate BlobSource from blob reference dicts.\n\n\nlist_shards\nReturn list of AT URI-style shard identifiers.\n\n\nopen_shard\nOpen a single shard by its AT URI.\n\n\n\n\n\nBlobSource.from_refs(refs, *, pds_endpoint=None)\nCreate BlobSource from blob reference dicts.\nAccepts blob references in the format returned by upload_blob: {\"$type\": \"blob\", \"ref\": {\"$link\": \"cid\"}, ...}\nAlso accepts simplified format: {\"did\": \"...\", \"cid\": \"...\"}\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nrefs\nlist[dict]\nList of blob reference dicts.\nrequired\n\n\npds_endpoint\nstr | None\nOptional PDS endpoint to use for all blobs.\nNone\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\n'BlobSource'\nConfigured BlobSource.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nValueError\nIf refs is empty or format is invalid.\n\n\n\n\n\n\n\nBlobSource.list_shards()\nReturn list of AT URI-style shard identifiers.\n\n\n\nBlobSource.open_shard(shard_id)\nOpen a single shard by its AT URI.\n\n\n\n\n\nName\nType\nDescription\nDefault\n\n\n\n\nshard_id\nstr\nAT URI of the shard (at://did/blob/cid).\nrequired\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nIO[bytes]\nStreaming response body for reading the blob.\n\n\n\n\n\n\n\n\n\nName\nType\nDescription\n\n\n\n\n\nKeyError\nIf shard_id is not in list_shards().\n\n\n\nValueError\nIf shard_id format is invalid." 
1945 2001 }, 1946 2002 { 1947 2003 "objectID": "api/SchemaLoader.html", ··· 1993 2049 "href": "tutorials/atmosphere.html#setup", 1994 2050 "title": "Atmosphere Publishing", 1995 2051 "section": "Setup", 1996 - "text": "Setup\n\nimport numpy as np\nfrom numpy.typing import NDArray\nimport atdata\nfrom atdata.atmosphere import (\n AtmosphereClient,\n SchemaPublisher,\n SchemaLoader,\n DatasetPublisher,\n DatasetLoader,\n AtUri,\n)\nimport webdataset as wds", 2052 + "text": "Setup\n\nimport numpy as np\nfrom numpy.typing import NDArray\nimport atdata\nfrom atdata.atmosphere import (\n AtmosphereClient,\n AtmosphereIndex,\n PDSBlobStore,\n SchemaPublisher,\n SchemaLoader,\n DatasetPublisher,\n DatasetLoader,\n AtUri,\n)\nfrom atdata import BlobSource\nimport webdataset as wds", 1997 2053 "crumbs": [ 1998 2054 "Guide", 1999 2055 "Getting Started", ··· 2077 2133 "href": "tutorials/atmosphere.html#publish-a-dataset", 2078 2134 "title": "Atmosphere Publishing", 2079 2135 "section": "Publish a Dataset", 2080 - "text": "Publish a Dataset\n\nWith External URLs\n\ndataset_publisher = DatasetPublisher(client)\ndataset_uri = dataset_publisher.publish_with_urls(\n urls=[\"s3://example-bucket/demo-data-{000000..000009}.tar\"],\n schema_uri=str(schema_uri),\n name=\"Demo Image Dataset\",\n description=\"Example dataset demonstrating atmosphere publishing\",\n tags=[\"demo\", \"images\", \"atdata\"],\n license=\"MIT\",\n)\nprint(f\"Dataset URI: {dataset_uri}\")\n\n\n\nWith Blob Storage\nFor smaller datasets, store data directly in ATProto blobs:\n\nimport io\n\n@atdata.packable\nclass DemoSample:\n id: int\n text: str\n\n# Create samples\nsamples = [\n DemoSample(id=0, text=\"Hello from blob storage!\"),\n DemoSample(id=1, text=\"ATProto is decentralized.\"),\n DemoSample(id=2, text=\"atdata makes ML data easy.\"),\n]\n\n# Create tar in memory\ntar_buffer = io.BytesIO()\nwith wds.writer.TarWriter(tar_buffer) as sink:\n for sample in samples:\n sink.write(sample.as_wds)\n\ntar_data = 
tar_buffer.getvalue()\nprint(f\"Created tar with {len(samples)} samples ({len(tar_data):,} bytes)\")\n\n# Publish schema\nblob_schema_uri = schema_publisher.publish(DemoSample, version=\"1.0.0\")\n\n# Publish with blob storage\nblob_dataset_uri = dataset_publisher.publish_with_blobs(\n blobs=[tar_data],\n schema_uri=str(blob_schema_uri),\n name=\"Blob Storage Demo Dataset\",\n description=\"Small dataset stored directly in ATProto blobs\",\n tags=[\"demo\", \"blob-storage\"],\n)\nprint(f\"Dataset URI: {blob_dataset_uri}\")", 2136 + "text": "Publish a Dataset\n\nWith External URLs\n\ndataset_publisher = DatasetPublisher(client)\ndataset_uri = dataset_publisher.publish_with_urls(\n urls=[\"s3://example-bucket/demo-data-{000000..000009}.tar\"],\n schema_uri=str(schema_uri),\n name=\"Demo Image Dataset\",\n description=\"Example dataset demonstrating atmosphere publishing\",\n tags=[\"demo\", \"images\", \"atdata\"],\n license=\"MIT\",\n)\nprint(f\"Dataset URI: {dataset_uri}\")\n\n\n\nWith PDS Blob Storage (Recommended)\nFor fully decentralized storage, use PDSBlobStore to store dataset shards directly as ATProto blobs in your PDS:\n\n# Create store and index with blob storage\nstore = PDSBlobStore(client)\nindex = AtmosphereIndex(client, data_store=store)\n\n# Define sample type\n@atdata.packable\nclass FeatureSample:\n features: NDArray\n label: int\n\n# Create dataset in memory or from existing tar\nsamples = [FeatureSample(features=np.random.randn(64).astype(np.float32), label=i % 10) for i in range(100)]\n\n# Write to temporary tar\nwith wds.writer.TarWriter(\"temp.tar\") as sink:\n for i, s in enumerate(samples):\n sink.write({**s.as_wds, \"__key__\": f\"{i:06d}\"})\n\ndataset = atdata.Dataset[FeatureSample](\"temp.tar\")\n\n# Publish - shards are uploaded as blobs automatically\nschema_uri = index.publish_schema(FeatureSample, version=\"1.0.0\")\nentry = index.insert_dataset(\n dataset,\n name=\"blob-stored-features\",\n schema_ref=schema_uri,\n 
description=\"Features stored as PDS blobs\",\n)\n\nprint(f\"Dataset URI: {entry.uri}\")\nprint(f\"Blob URLs: {entry.data_urls}\") # at://did/blob/cid format\n\n\n\n\n\n\n\nReading Blob-Stored Datasets\n\n\n\nUse BlobSource to stream directly from PDS blobs:\n\n# Create source from the blob URLs\nsource = store.create_source(entry.data_urls)\n\n# Or manually from blob references\nsource = BlobSource.from_refs([\n {\"did\": client.did, \"cid\": \"bafyrei...\"},\n])\n\n# Load and iterate\nds = atdata.Dataset[FeatureSample](source)\nfor batch in ds.ordered(batch_size=32):\n print(batch.features.shape)\n\n\n\n\n\nWith External URLs\nFor larger datasets or when using existing object storage:\n\ndataset_publisher = DatasetPublisher(client)\ndataset_uri = dataset_publisher.publish_with_urls(\n urls=[\"s3://example-bucket/demo-data-{000000..000009}.tar\"],\n schema_uri=str(schema_uri),\n name=\"Demo Image Dataset\",\n description=\"Example dataset demonstrating atmosphere publishing\",\n tags=[\"demo\", \"images\", \"atdata\"],\n license=\"MIT\",\n)\nprint(f\"Dataset URI: {dataset_uri}\")", 2081 2137 "crumbs": [ 2082 2138 "Guide", 2083 2139 "Getting Started", ··· 2113 2169 "href": "tutorials/atmosphere.html#complete-publishing-workflow", 2114 2170 "title": "Atmosphere Publishing", 2115 2171 "section": "Complete Publishing Workflow", 2116 - "text": "Complete Publishing Workflow\n\n# 1. Define and create samples\n@atdata.packable\nclass FeatureSample:\n features: NDArray\n label: int\n source: str\n\nsamples = [\n FeatureSample(\n features=np.random.randn(128).astype(np.float32),\n label=i % 10,\n source=\"synthetic\",\n )\n for i in range(1000)\n]\n\n# 2. Write to tar\nwith wds.writer.TarWriter(\"features.tar\") as sink:\n for i, s in enumerate(samples):\n sink.write({**s.as_wds, \"__key__\": f\"{i:06d}\"})\n\n# 3. 
Authenticate\nfrom atdata.atmosphere import AtmosphereIndex\n\nclient = AtmosphereClient()\nclient.login(\"myhandle.bsky.social\", \"app-password\")\nindex = AtmosphereIndex(client)\n\n# 4. Publish schema\nschema_uri = index.publish_schema(\n FeatureSample,\n version=\"1.0.0\",\n description=\"Feature vectors with labels\",\n)\n\n# 5. Publish dataset\ndataset = atdata.Dataset[FeatureSample](\"features.tar\")\nentry = index.insert_dataset(\n dataset,\n name=\"synthetic-features-v1\",\n schema_ref=schema_uri,\n tags=[\"features\", \"synthetic\"],\n)\n\nprint(f\"Published: {entry.uri}\")", 2172 + "text": "Complete Publishing Workflow\nThis example shows the recommended workflow using PDSBlobStore for fully decentralized storage:\n\n# 1. Define and create samples\n@atdata.packable\nclass FeatureSample:\n features: NDArray\n label: int\n source: str\n\nsamples = [\n FeatureSample(\n features=np.random.randn(128).astype(np.float32),\n label=i % 10,\n source=\"synthetic\",\n )\n for i in range(1000)\n]\n\n# 2. Write to tar\nwith wds.writer.TarWriter(\"features.tar\") as sink:\n for i, s in enumerate(samples):\n sink.write({**s.as_wds, \"__key__\": f\"{i:06d}\"})\n\n# 3. Authenticate and create index with blob storage\nclient = AtmosphereClient()\nclient.login(\"myhandle.bsky.social\", \"app-password\")\n\nstore = PDSBlobStore(client)\nindex = AtmosphereIndex(client, data_store=store)\n\n# 4. Publish schema\nschema_uri = index.publish_schema(\n FeatureSample,\n version=\"1.0.0\",\n description=\"Feature vectors with labels\",\n)\n\n# 5. Publish dataset (shards uploaded as blobs automatically)\ndataset = atdata.Dataset[FeatureSample](\"features.tar\")\nentry = index.insert_dataset(\n dataset,\n name=\"synthetic-features-v1\",\n schema_ref=schema_uri,\n tags=[\"features\", \"synthetic\"],\n)\n\nprint(f\"Published: {entry.uri}\")\nprint(f\"Data stored at: {entry.data_urls}\") # at://did/blob/cid URLs\n\n# 6. 
Later: load from blobs\nsource = store.create_source(entry.data_urls)\nds = atdata.Dataset[FeatureSample](source)\nfor batch in ds.ordered(batch_size=32):\n print(f\"Loaded batch with {len(batch.label)} samples\")\n break", 2117 2173 "crumbs": [ 2118 2174 "Guide", 2119 2175 "Getting Started", ··· 2469 2525 ] 2470 2526 }, 2471 2527 { 2528 + "objectID": "reference/atmosphere.html#pdsblobstore", 2529 + "href": "reference/atmosphere.html#pdsblobstore", 2530 + "title": "Atmosphere (ATProto Integration)", 2531 + "section": "PDSBlobStore", 2532 + "text": "PDSBlobStore\nStore dataset shards as ATProto blobs for fully decentralized storage:\n\nfrom atdata.atmosphere import AtmosphereClient, PDSBlobStore\n\nclient = AtmosphereClient()\nclient.login(\"handle.bsky.social\", \"app-password\")\n\nstore = PDSBlobStore(client)\n\n# Write shards as blobs\nurls = store.write_shards(dataset, prefix=\"my-data/v1\")\n# Returns: ['at://did:plc:.../blob/bafyrei...', ...]\n\n# Transform AT URIs to HTTP URLs for reading\nhttp_url = store.read_url(urls[0])\n# Returns: 'https://pds.example.com/xrpc/com.atproto.sync.getBlob?...'\n\n# Create a BlobSource for streaming\nsource = store.create_source(urls)\nds = atdata.Dataset[MySample](source)\n\n\nSize Limits\nPDS blobs typically have size limits (often 50MB-5GB depending on the PDS). 
Use the maxcount and maxsize parameters to control shard sizes:\n\nurls = store.write_shards(\n dataset,\n prefix=\"large-data/v1\",\n maxcount=5000, # Max 5000 samples per shard\n maxsize=50e6, # Max 50MB per shard\n)", 2533 + "crumbs": [ 2534 + "Guide", 2535 + "Reference", 2536 + "Atmosphere (ATProto Integration)" 2537 + ] 2538 + }, 2539 + { 2540 + "objectID": "reference/atmosphere.html#blobsource", 2541 + "href": "reference/atmosphere.html#blobsource", 2542 + "title": "Atmosphere (ATProto Integration)", 2543 + "section": "BlobSource", 2544 + "text": "BlobSource\nRead datasets stored as PDS blobs:\n\nfrom atdata import BlobSource\n\n# From blob references\nsource = BlobSource.from_refs([\n {\"did\": \"did:plc:abc123\", \"cid\": \"bafyrei111\"},\n {\"did\": \"did:plc:abc123\", \"cid\": \"bafyrei222\"},\n])\n\n# Or from PDSBlobStore\nsource = store.create_source(urls)\n\n# Use with Dataset\nds = atdata.Dataset[MySample](source)\nfor batch in ds.ordered(batch_size=32):\n process(batch)", 2545 + "crumbs": [ 2546 + "Guide", 2547 + "Reference", 2548 + "Atmosphere (ATProto Integration)" 2549 + ] 2550 + }, 2551 + { 2472 2552 "objectID": "reference/atmosphere.html#atmosphereindex", 2473 2553 "href": "reference/atmosphere.html#atmosphereindex", 2474 2554 "title": "Atmosphere (ATProto Integration)", 2475 2555 "section": "AtmosphereIndex", 2476 - "text": "AtmosphereIndex\nThe unified interface for ATProto operations, implementing the AbstractIndex protocol:\n\nfrom atdata.atmosphere import AtmosphereClient, AtmosphereIndex\n\nclient = AtmosphereClient()\nclient.login(\"handle.bsky.social\", \"app-password\")\n\nindex = AtmosphereIndex(client)\n\n\nPublishing Schemas\n\nimport atdata\nfrom numpy.typing import NDArray\n\n@atdata.packable\nclass ImageSample:\n image: NDArray\n label: str\n confidence: float\n\n# Publish schema\nschema_uri = index.publish_schema(\n ImageSample,\n version=\"1.0.0\",\n description=\"Image classification sample\",\n)\n# Returns: 
\"at://did:plc:.../ac.foundation.dataset.sampleSchema/...\"\n\n\n\nPublishing Datasets\n\ndataset = atdata.Dataset[ImageSample](\"data-{000000..000009}.tar\")\n\nentry = index.insert_dataset(\n dataset,\n name=\"imagenet-subset\",\n schema_ref=schema_uri, # Optional - auto-publishes if omitted\n description=\"ImageNet subset\",\n tags=[\"images\", \"classification\"],\n license=\"MIT\",\n)\n\nprint(entry.uri) # AT URI of the record\nprint(entry.data_urls) # WebDataset URLs\n\n\n\nListing and Retrieving\n\n# List your datasets\nfor entry in index.list_datasets():\n print(f\"{entry.name}: {entry.schema_ref}\")\n\n# List from another user\nfor entry in index.list_datasets(repo=\"did:plc:other-user\"):\n print(entry.name)\n\n# Get specific dataset\nentry = index.get_dataset(\"at://did:plc:.../ac.foundation.dataset.record/...\")\n\n# List schemas\nfor schema in index.list_schemas():\n print(f\"{schema['name']} v{schema['version']}\")\n\n# Decode schema to Python type\nSampleType = index.decode_schema(schema_uri)", 2556 + "text": "AtmosphereIndex\nThe unified interface for ATProto operations, implementing the AbstractIndex protocol:\n\nfrom atdata.atmosphere import AtmosphereClient, AtmosphereIndex, PDSBlobStore\n\nclient = AtmosphereClient()\nclient.login(\"handle.bsky.social\", \"app-password\")\n\n# Without blob storage (use external URLs)\nindex = AtmosphereIndex(client)\n\n# With PDS blob storage (recommended for full decentralization)\nstore = PDSBlobStore(client)\nindex = AtmosphereIndex(client, data_store=store)\n\n\nPublishing Schemas\n\nimport atdata\nfrom numpy.typing import NDArray\n\n@atdata.packable\nclass ImageSample:\n image: NDArray\n label: str\n confidence: float\n\n# Publish schema\nschema_uri = index.publish_schema(\n ImageSample,\n version=\"1.0.0\",\n description=\"Image classification sample\",\n)\n# Returns: \"at://did:plc:.../ac.foundation.dataset.sampleSchema/...\"\n\n\n\nPublishing Datasets\n\ndataset = 
atdata.Dataset[ImageSample](\"data-{000000..000009}.tar\")\n\nentry = index.insert_dataset(\n dataset,\n name=\"imagenet-subset\",\n schema_ref=schema_uri, # Optional - auto-publishes if omitted\n description=\"ImageNet subset\",\n tags=[\"images\", \"classification\"],\n license=\"MIT\",\n)\n\nprint(entry.uri) # AT URI of the record\nprint(entry.data_urls) # WebDataset URLs\n\n\n\nListing and Retrieving\n\n# List your datasets\nfor entry in index.list_datasets():\n print(f\"{entry.name}: {entry.schema_ref}\")\n\n# List from another user\nfor entry in index.list_datasets(repo=\"did:plc:other-user\"):\n print(entry.name)\n\n# Get specific dataset\nentry = index.get_dataset(\"at://did:plc:.../ac.foundation.dataset.record/...\")\n\n# List schemas\nfor schema in index.list_schemas():\n print(f\"{schema['name']} v{schema['version']}\")\n\n# Decode schema to Python type\nSampleType = index.decode_schema(schema_uri)", 2477 2557 "crumbs": [ 2478 2558 "Guide", 2479 2559 "Reference", ··· 2485 2565 "href": "reference/atmosphere.html#lower-level-publishers", 2486 2566 "title": "Atmosphere (ATProto Integration)", 2487 2567 "section": "Lower-Level Publishers", 2488 - "text": "Lower-Level Publishers\nFor more control, use the individual publisher classes:\n\nSchemaPublisher\n\nfrom atdata.atmosphere import SchemaPublisher\n\npublisher = SchemaPublisher(client)\n\nuri = publisher.publish(\n ImageSample,\n name=\"ImageSample\",\n version=\"1.0.0\",\n description=\"Image with label\",\n metadata={\"source\": \"training\"},\n)\n\n\n\nDatasetPublisher\n\nfrom atdata.atmosphere import DatasetPublisher\n\npublisher = DatasetPublisher(client)\n\nuri = publisher.publish(\n dataset,\n name=\"training-images\",\n schema_uri=schema_uri, # Required if auto_publish_schema=False\n auto_publish_schema=True, # Publish schema automatically\n description=\"Training images\",\n tags=[\"training\", \"images\"],\n license=\"MIT\",\n)\n\n\nBlob Storage\nFor smaller datasets (up to ~50MB per shard), you 
can store data directly in ATProto blobs instead of external URLs:\n\nimport io\nimport webdataset as wds\n\n# Create tar data in memory\ntar_buffer = io.BytesIO()\nwith wds.writer.TarWriter(tar_buffer) as sink:\n for i, sample in enumerate(samples):\n sink.write({**sample.as_wds, \"__key__\": f\"{i:06d}\"})\n\n# Publish with blob storage\nuri = publisher.publish_with_blobs(\n blobs=[tar_buffer.getvalue()],\n schema_uri=schema_uri,\n name=\"small-dataset\",\n description=\"Dataset stored in ATProto blobs\",\n tags=[\"small\", \"demo\"],\n)\n\nTo load datasets with blob storage:\n\nfrom atdata.atmosphere import DatasetLoader\n\nloader = DatasetLoader(client)\n\n# Check storage type\nstorage_type = loader.get_storage_type(uri) # \"external\" or \"blobs\"\n\nif storage_type == \"blobs\":\n # Get blob URLs for direct access\n blob_urls = loader.get_blob_urls(uri)\n\n# to_dataset() handles both storage types automatically\ndataset = loader.to_dataset(uri, MySample)\nfor batch in dataset.ordered(batch_size=32):\n process(batch)\n\n\n\n\nLensPublisher\n\nfrom atdata.atmosphere import LensPublisher\n\npublisher = LensPublisher(client)\n\n# With code references\nuri = publisher.publish(\n name=\"simplify\",\n source_schema=full_schema_uri,\n target_schema=simple_schema_uri,\n description=\"Extract label only\",\n getter_code={\n \"repository\": \"https://github.com/org/repo\",\n \"commit\": \"abc123def...\",\n \"path\": \"transforms/simplify.py:simplify_getter\",\n },\n putter_code={\n \"repository\": \"https://github.com/org/repo\",\n \"commit\": \"abc123def...\",\n \"path\": \"transforms/simplify.py:simplify_putter\",\n },\n)\n\n# Or publish from a Lens object\nfrom atdata.lens import lens\n\n@lens\ndef simplify(src: FullSample) -&gt; SimpleSample:\n return SimpleSample(label=src.label)\n\nuri = publisher.publish_from_lens(\n simplify,\n source_schema=full_schema_uri,\n target_schema=simple_schema_uri,\n)", 2568 + "text": "Lower-Level Publishers\nFor more control, use the 
individual publisher classes:\n\nSchemaPublisher\n\nfrom atdata.atmosphere import SchemaPublisher\n\npublisher = SchemaPublisher(client)\n\nuri = publisher.publish(\n ImageSample,\n name=\"ImageSample\",\n version=\"1.0.0\",\n description=\"Image with label\",\n metadata={\"source\": \"training\"},\n)\n\n\n\nDatasetPublisher\n\nfrom atdata.atmosphere import DatasetPublisher\n\npublisher = DatasetPublisher(client)\n\nuri = publisher.publish(\n dataset,\n name=\"training-images\",\n schema_uri=schema_uri, # Required if auto_publish_schema=False\n auto_publish_schema=True, # Publish schema automatically\n description=\"Training images\",\n tags=[\"training\", \"images\"],\n license=\"MIT\",\n)\n\n\nBlob Storage\nThere are two approaches to storing data as ATProto blobs:\nApproach 1: PDSBlobStore (Recommended)\nUse PDSBlobStore with AtmosphereIndex for automatic shard management:\n\nfrom atdata.atmosphere import PDSBlobStore, AtmosphereIndex\n\nstore = PDSBlobStore(client)\nindex = AtmosphereIndex(client, data_store=store)\n\n# Dataset shards are automatically uploaded as blobs\nentry = index.insert_dataset(\n dataset,\n name=\"my-dataset\",\n schema_ref=schema_uri,\n)\n\n# Later: load using BlobSource\nsource = store.create_source(entry.data_urls)\nds = atdata.Dataset[MySample](source)\n\nApproach 2: Manual Blob Publishing\nFor more control, use DatasetPublisher.publish_with_blobs() directly:\n\nimport io\nimport webdataset as wds\n\n# Create tar data in memory\ntar_buffer = io.BytesIO()\nwith wds.writer.TarWriter(tar_buffer) as sink:\n for i, sample in enumerate(samples):\n sink.write({**sample.as_wds, \"__key__\": f\"{i:06d}\"})\n\n# Publish with blob storage\nuri = publisher.publish_with_blobs(\n blobs=[tar_buffer.getvalue()],\n schema_uri=schema_uri,\n name=\"small-dataset\",\n description=\"Dataset stored in ATProto blobs\",\n tags=[\"small\", \"demo\"],\n)\n\nLoading Blob-Stored Datasets\n\nfrom atdata.atmosphere import DatasetLoader\nfrom atdata import 
BlobSource\n\nloader = DatasetLoader(client)\n\n# Check storage type\nstorage_type = loader.get_storage_type(uri) # \"external\" or \"blobs\"\n\nif storage_type == \"blobs\":\n # Get blob URLs for direct access\n blob_urls = loader.get_blob_urls(uri)\n # To build a BlobSource by hand, parse these URLs into blob refs;\n # loader.to_dataset() below handles this automatically\n\n# to_dataset() handles both storage types automatically\ndataset = loader.to_dataset(uri, MySample)\nfor batch in dataset.ordered(batch_size=32):\n process(batch)\n\n\n\n\nLensPublisher\n\nfrom atdata.atmosphere import LensPublisher\n\npublisher = LensPublisher(client)\n\n# With code references\nuri = publisher.publish(\n name=\"simplify\",\n source_schema=full_schema_uri,\n target_schema=simple_schema_uri,\n description=\"Extract label only\",\n getter_code={\n \"repository\": \"https://github.com/org/repo\",\n \"commit\": \"abc123def...\",\n \"path\": \"transforms/simplify.py:simplify_getter\",\n },\n putter_code={\n \"repository\": \"https://github.com/org/repo\",\n \"commit\": \"abc123def...\",\n \"path\": \"transforms/simplify.py:simplify_putter\",\n },\n)\n\n# Or publish from a Lens object\nfrom atdata.lens import lens\n\n@lens\ndef simplify(src: FullSample) -&gt; SimpleSample:\n return SimpleSample(label=src.label)\n\nuri = publisher.publish_from_lens(\n simplify,\n source_schema=full_schema_uri,\n target_schema=simple_schema_uri,\n)", 2489 2569 "crumbs": [ 2490 2570 "Guide", 2491 2571 "Reference", ··· 2533 2613 "href": "reference/atmosphere.html#complete-example", 2534 2614 "title": "Atmosphere (ATProto Integration)", 2535 2615 "section": "Complete Example", 2536 - "text": "Complete Example\n\nimport numpy as np\nfrom numpy.typing import NDArray\nimport atdata\nfrom atdata.atmosphere import AtmosphereClient, AtmosphereIndex\nimport webdataset as wds\n\n# 1. 
Define and create samples\n@atdata.packable\nclass FeatureSample:\n features: NDArray\n label: int\n source: str\n\nsamples = [\n FeatureSample(\n features=np.random.randn(128).astype(np.float32),\n label=i % 10,\n source=\"synthetic\",\n )\n for i in range(1000)\n]\n\n# 2. Write to tar\nwith wds.writer.TarWriter(\"features.tar\") as sink:\n for i, s in enumerate(samples):\n sink.write({**s.as_wds, \"__key__\": f\"{i:06d}\"})\n\n# 3. Authenticate\nclient = AtmosphereClient()\nclient.login(\"myhandle.bsky.social\", \"app-password\")\n\nindex = AtmosphereIndex(client)\n\n# 4. Publish schema\nschema_uri = index.publish_schema(\n FeatureSample,\n version=\"1.0.0\",\n description=\"Feature vectors with labels\",\n)\n\n# 5. Publish dataset\ndataset = atdata.Dataset[FeatureSample](\"features.tar\")\nentry = index.insert_dataset(\n dataset,\n name=\"synthetic-features-v1\",\n schema_ref=schema_uri,\n tags=[\"features\", \"synthetic\"],\n)\n\nprint(f\"Published: {entry.uri}\")\n\n# 6. Later: discover and load\nfor dataset_entry in index.list_datasets():\n print(f\"Found: {dataset_entry.name}\")\n\n # Reconstruct type from schema\n SampleType = index.decode_schema(dataset_entry.schema_ref)\n\n # Load dataset\n ds = atdata.Dataset[SampleType](dataset_entry.data_urls[0])\n for batch in ds.ordered(batch_size=32):\n print(batch.features.shape)\n break", 2616 + "text": "Complete Example\nThis example shows the full workflow using PDSBlobStore for decentralized storage:\n\nimport numpy as np\nfrom numpy.typing import NDArray\nimport atdata\nfrom atdata.atmosphere import AtmosphereClient, AtmosphereIndex, PDSBlobStore\nimport webdataset as wds\n\n# 1. Define and create samples\n@atdata.packable\nclass FeatureSample:\n features: NDArray\n label: int\n source: str\n\nsamples = [\n FeatureSample(\n features=np.random.randn(128).astype(np.float32),\n label=i % 10,\n source=\"synthetic\",\n )\n for i in range(1000)\n]\n\n# 2. 
Write to tar\nwith wds.writer.TarWriter(\"features.tar\") as sink:\n for i, s in enumerate(samples):\n sink.write({**s.as_wds, \"__key__\": f\"{i:06d}\"})\n\n# 3. Authenticate and set up blob storage\nclient = AtmosphereClient()\nclient.login(\"myhandle.bsky.social\", \"app-password\")\n\nstore = PDSBlobStore(client)\nindex = AtmosphereIndex(client, data_store=store)\n\n# 4. Publish schema\nschema_uri = index.publish_schema(\n FeatureSample,\n version=\"1.0.0\",\n description=\"Feature vectors with labels\",\n)\n\n# 5. Publish dataset (shards uploaded as blobs)\ndataset = atdata.Dataset[FeatureSample](\"features.tar\")\nentry = index.insert_dataset(\n dataset,\n name=\"synthetic-features-v1\",\n schema_ref=schema_uri,\n tags=[\"features\", \"synthetic\"],\n)\n\nprint(f\"Published: {entry.uri}\")\nprint(f\"Blob URLs: {entry.data_urls}\")\n\n# 6. Later: discover and load from blobs\nfor dataset_entry in index.list_datasets():\n print(f\"Found: {dataset_entry.name}\")\n\n # Reconstruct type from schema\n SampleType = index.decode_schema(dataset_entry.schema_ref)\n\n # Create source from blob URLs\n source = store.create_source(dataset_entry.data_urls)\n\n # Load dataset from blobs\n ds = atdata.Dataset[SampleType](source)\n for batch in ds.ordered(batch_size=32):\n print(batch.features.shape)\n break\n\nFor external URL storage (without PDSBlobStore):\n\n# Use AtmosphereIndex without data_store\nindex = AtmosphereIndex(client)\n\n# Dataset URLs will be stored as-is (external references)\nentry = index.insert_dataset(\n dataset,\n name=\"external-features\",\n schema_ref=schema_uri,\n)\n\n# Load using standard URL source\nds = atdata.Dataset[FeatureSample](entry.data_urls[0])", 2537 2617 "crumbs": [ 2538 2618 "Guide", 2539 2619 "Reference",
+19 -11
docs/sitemap.xml
··· 37 37 <lastmod>2026-01-24T19:19:45.336Z</lastmod> 38 38 </url> 39 39 <url> 40 - <loc>https://github.com/your-org/atdata/api/AtmosphereClient.html</loc> 41 - <lastmod>2026-01-23T23:20:15.723Z</lastmod> 40 + <loc>https://github.com/your-org/atdata/api/PackableSample.html</loc> 41 + <lastmod>2026-01-23T23:20:15.564Z</lastmod> 42 + </url> 43 + <url> 44 + <loc>https://github.com/your-org/atdata/api/PDSBlobStore.html</loc> 45 + <lastmod>2026-01-27T05:36:00.303Z</lastmod> 42 46 </url> 43 47 <url> 44 48 <loc>https://github.com/your-org/atdata/api/DictSample.html</loc> ··· 50 54 </url> 51 55 <url> 52 56 <loc>https://github.com/your-org/atdata/api/AtmosphereIndex.html</loc> 53 - <lastmod>2026-01-23T23:20:15.736Z</lastmod> 57 + <lastmod>2026-01-27T05:36:00.293Z</lastmod> 54 58 </url> 55 59 <url> 56 60 <loc>https://github.com/your-org/atdata/api/DataSource.html</loc> ··· 62 66 </url> 63 67 <url> 64 68 <loc>https://github.com/your-org/atdata/api/Lens.html</loc> 65 - <lastmod>2026-01-24T19:29:16.065Z</lastmod> 69 + <lastmod>2026-01-27T05:36:00.154Z</lastmod> 66 70 </url> 67 71 <url> 68 72 <loc>https://github.com/your-org/atdata/api/local.Index.html</loc> 69 - <lastmod>2026-01-23T23:20:15.683Z</lastmod> 73 + <lastmod>2026-01-27T05:36:00.238Z</lastmod> 70 74 </url> 71 75 <url> 72 76 <loc>https://github.com/your-org/atdata/api/Dataset.html</loc> ··· 110 114 </url> 111 115 <url> 112 116 <loc>https://github.com/your-org/atdata/api/AbstractIndex.html</loc> 113 - <lastmod>2026-01-23T23:20:15.632Z</lastmod> 117 + <lastmod>2026-01-27T05:36:00.180Z</lastmod> 114 118 </url> 115 119 <url> 116 120 <loc>https://github.com/your-org/atdata/api/local.LocalDatasetEntry.html</loc> ··· 126 130 </url> 127 131 <url> 128 132 <loc>https://github.com/your-org/atdata/api/index.html</loc> 129 - <lastmod>2026-01-24T19:29:16.007Z</lastmod> 133 + <lastmod>2026-01-27T05:36:00.093Z</lastmod> 130 134 </url> 131 135 <url> 132 136 <loc>https://github.com/your-org/atdata/api/URLSource.html</loc> ··· 149 153 
<lastmod>2026-01-24T19:19:45.334Z</lastmod> 150 154 </url> 151 155 <url> 152 - <loc>https://github.com/your-org/atdata/api/PackableSample.html</loc> 153 - <lastmod>2026-01-23T23:20:15.564Z</lastmod> 156 + <loc>https://github.com/your-org/atdata/api/AtmosphereClient.html</loc> 157 + <lastmod>2026-01-23T23:20:15.723Z</lastmod> 158 + </url> 159 + <url> 160 + <loc>https://github.com/your-org/atdata/api/BlobSource.html</loc> 161 + <lastmod>2026-01-27T05:36:00.209Z</lastmod> 154 162 </url> 155 163 <url> 156 164 <loc>https://github.com/your-org/atdata/api/SchemaLoader.html</loc> ··· 158 166 </url> 159 167 <url> 160 168 <loc>https://github.com/your-org/atdata/tutorials/atmosphere.html</loc> 161 - <lastmod>2026-01-18T03:31:39.824Z</lastmod> 169 + <lastmod>2026-01-27T05:31:23.765Z</lastmod> 162 170 </url> 163 171 <url> 164 172 <loc>https://github.com/your-org/atdata/tutorials/quickstart.html</loc> ··· 174 182 </url> 175 183 <url> 176 184 <loc>https://github.com/your-org/atdata/reference/atmosphere.html</loc> 177 - <lastmod>2026-01-22T20:06:07.401Z</lastmod> 185 + <lastmod>2026-01-27T05:32:25.227Z</lastmod> 178 186 </url> 179 187 <url> 180 188 <loc>https://github.com/your-org/atdata/reference/deployment.html</loc>
+176 -124
docs/tutorials/atmosphere.html
··· 543 543 <li><a href="#publish-a-dataset" id="toc-publish-a-dataset" class="nav-link" data-scroll-target="#publish-a-dataset">Publish a Dataset</a> 544 544 <ul class="collapse"> 545 545 <li><a href="#with-external-urls" id="toc-with-external-urls" class="nav-link" data-scroll-target="#with-external-urls">With External URLs</a></li> 546 - <li><a href="#with-blob-storage" id="toc-with-blob-storage" class="nav-link" data-scroll-target="#with-blob-storage">With Blob Storage</a></li> 546 + <li><a href="#with-pds-blob-storage-recommended" id="toc-with-pds-blob-storage-recommended" class="nav-link" data-scroll-target="#with-pds-blob-storage-recommended">With PDS Blob Storage (Recommended)</a></li> 547 + <li><a href="#with-external-urls-1" id="toc-with-external-urls-1" class="nav-link" data-scroll-target="#with-external-urls-1">With External URLs</a></li> 547 548 </ul></li> 548 549 <li><a href="#list-and-load-datasets" id="toc-list-and-load-datasets" class="nav-link" data-scroll-target="#list-and-load-datasets">List and Load Datasets</a></li> 549 550 <li><a href="#load-a-dataset" id="toc-load-a-dataset" class="nav-link" data-scroll-target="#load-a-dataset">Load a Dataset</a></li> ··· 603 604 </section> 604 605 <section id="setup" class="level2"> 605 606 <h2 class="anchored" data-anchor-id="setup">Setup</h2> 606 - <div id="38b7697f" class="cell"> 607 + <div id="1598fc78" class="cell"> 607 608 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 608 609 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 609 610 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> 610 611 <span id="cb1-4"><a 
href="#cb1-4" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> (</span> 611 612 <span id="cb1-5"><a href="#cb1-5" aria-hidden="true" tabindex="-1"></a> AtmosphereClient,</span> 612 - <span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> SchemaPublisher,</span> 613 - <span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> SchemaLoader,</span> 614 - <span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> DatasetPublisher,</span> 615 - <span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> DatasetLoader,</span> 616 - <span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> AtUri,</span> 617 - <span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a>)</span> 618 - <span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 613 + <span id="cb1-6"><a href="#cb1-6" aria-hidden="true" tabindex="-1"></a> AtmosphereIndex,</span> 614 + <span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> PDSBlobStore,</span> 615 + <span id="cb1-8"><a href="#cb1-8" aria-hidden="true" tabindex="-1"></a> SchemaPublisher,</span> 616 + <span id="cb1-9"><a href="#cb1-9" aria-hidden="true" tabindex="-1"></a> SchemaLoader,</span> 617 + <span id="cb1-10"><a href="#cb1-10" aria-hidden="true" tabindex="-1"></a> DatasetPublisher,</span> 618 + <span id="cb1-11"><a href="#cb1-11" aria-hidden="true" tabindex="-1"></a> DatasetLoader,</span> 619 + <span id="cb1-12"><a href="#cb1-12" aria-hidden="true" tabindex="-1"></a> AtUri,</span> 620 + <span id="cb1-13"><a href="#cb1-13" aria-hidden="true" tabindex="-1"></a>)</span> 621 + <span id="cb1-14"><a href="#cb1-14" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata 
<span class="im">import</span> BlobSource</span> 622 + <span id="cb1-15"><a href="#cb1-15" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 619 623 </div> 620 624 </section> 621 625 <section id="define-sample-types" class="level2"> 622 626 <h2 class="anchored" data-anchor-id="define-sample-types">Define Sample Types</h2> 623 - <div id="96ff9fa1" class="cell"> 627 + <div id="284570d6" class="cell"> 624 628 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 625 629 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ImageSample:</span> 626 630 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a> <span class="co">"""A sample containing image data with metadata."""</span></span> ··· 639 643 <section id="type-introspection" class="level2"> 640 644 <h2 class="anchored" data-anchor-id="type-introspection">Type Introspection</h2> 641 645 <p>See what information is available from a PackableSample type:</p> 642 - <div id="bcc4ef1b" class="cell"> 646 + <div id="daae7f6e" class="cell"> 643 647 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> dataclasses <span class="im">import</span> fields, is_dataclass</span> 644 648 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a></span> 645 649 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Sample type: </span><span class="sc">{</span>ImageSample<span 
class="sc">.</span><span class="va">__name__</span><span class="sc">}</span><span class="ss">"</span>)</span> ··· 667 671 <section id="at-uri-parsing" class="level2"> 668 672 <h2 class="anchored" data-anchor-id="at-uri-parsing">AT URI Parsing</h2> 669 673 <p>ATProto records are identified by AT URIs:</p> 670 - <div id="0586bdec" class="cell"> 674 + <div id="ad781e91" class="cell"> 671 675 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>uris <span class="op">=</span> [</span> 672 676 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"at://did:plc:abc123/ac.foundation.dataset.sampleSchema/xyz789"</span>,</span> 673 677 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"at://alice.bsky.social/ac.foundation.dataset.record/my-dataset"</span>,</span> ··· 684 688 <section id="authentication" class="level2"> 685 689 <h2 class="anchored" data-anchor-id="authentication">Authentication</h2> 686 690 <p>Connect to ATProto:</p> 687 - <div id="e5f8657b" class="cell"> 691 + <div id="95a2266a" class="cell"> 688 692 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 689 693 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"your.handle.social"</span>, <span class="st">"your-app-password"</span>)</span> 690 694 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span> ··· 694 698 </section> 695 699 <section id="publish-a-schema" class="level2"> 696 700 <h2 class="anchored" data-anchor-id="publish-a-schema">Publish a Schema</h2> 697 - <div id="e4252ae0" class="cell"> 701 + <div id="fd62bf26" 
class="cell"> 698 702 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>schema_publisher <span class="op">=</span> SchemaPublisher(client)</span> 699 703 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> schema_publisher.publish(</span> 700 704 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> ImageSample,</span> ··· 707 711 </section> 708 712 <section id="list-your-schemas" class="level2"> 709 713 <h2 class="anchored" data-anchor-id="list-your-schemas">List Your Schemas</h2> 710 - <div id="d67ee978" class="cell"> 714 + <div id="f9bd719b" class="cell"> 711 715 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>schema_loader <span class="op">=</span> SchemaLoader(client)</span> 712 716 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>schemas <span class="op">=</span> schema_loader.list_all(limit<span class="op">=</span><span class="dv">10</span>)</span> 713 717 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Found </span><span class="sc">{</span><span class="bu">len</span>(schemas)<span class="sc">}</span><span class="ss"> schema(s)"</span>)</span> ··· 720 724 <h2 class="anchored" data-anchor-id="publish-a-dataset">Publish a Dataset</h2> 721 725 <section id="with-external-urls" class="level3"> 722 726 <h3 class="anchored" data-anchor-id="with-external-urls">With External URLs</h3> 723 - <div id="10827a3b" class="cell"> 727 + <div id="25d3a845" class="cell"> 724 728 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" 
aria-hidden="true" tabindex="-1"></a>dataset_publisher <span class="op">=</span> DatasetPublisher(client)</span> 725 729 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>dataset_uri <span class="op">=</span> dataset_publisher.publish_with_urls(</span> 726 730 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> urls<span class="op">=</span>[<span class="st">"s3://example-bucket/demo-data-{000000..000009}.tar"</span>],</span> ··· 733 737 <span id="cb8-10"><a href="#cb8-10" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Dataset URI: </span><span class="sc">{</span>dataset_uri<span class="sc">}</span><span class="ss">"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 734 738 </div> 735 739 </section> 736 - <section id="with-blob-storage" class="level3"> 737 - <h3 class="anchored" data-anchor-id="with-blob-storage">With Blob Storage</h3> 738 - <p>For smaller datasets, store data directly in ATProto blobs:</p> 739 - <div id="02cdc5b0" class="cell"> 740 - <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> io</span> 741 - <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 742 - <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 743 - <span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> DemoSample:</span> 744 - <span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a> <span class="bu">id</span>: <span class="bu">int</span></span> 745 - <span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a> text: <span class="bu">str</span></span> 746 - <span id="cb9-7"><a href="#cb9-7" 
aria-hidden="true" tabindex="-1"></a></span> 747 - <span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Create samples</span></span> 748 - <span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> 749 - <span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a> DemoSample(<span class="bu">id</span><span class="op">=</span><span class="dv">0</span>, text<span class="op">=</span><span class="st">"Hello from blob storage!"</span>),</span> 750 - <span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a> DemoSample(<span class="bu">id</span><span class="op">=</span><span class="dv">1</span>, text<span class="op">=</span><span class="st">"ATProto is decentralized."</span>),</span> 751 - <span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a> DemoSample(<span class="bu">id</span><span class="op">=</span><span class="dv">2</span>, text<span class="op">=</span><span class="st">"atdata makes ML data easy."</span>),</span> 752 - <span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a>]</span> 753 - <span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a></span> 754 - <span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a><span class="co"># Create tar in memory</span></span> 755 - <span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a>tar_buffer <span class="op">=</span> io.BytesIO()</span> 756 - <span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(tar_buffer) <span class="im">as</span> sink:</span> 757 - <span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> sample <span class="kw">in</span> samples:</span> 758 - <span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a> sink.write(sample.as_wds)</span> 740 + <section 
id="with-pds-blob-storage-recommended" class="level3"> 741 + <h3 class="anchored" data-anchor-id="with-pds-blob-storage-recommended">With PDS Blob Storage (Recommended)</h3> 742 + <p>For fully decentralized storage, use <code>PDSBlobStore</code> to store dataset shards directly as ATProto blobs in your PDS:</p> 743 + <div id="e4cf8aef" class="cell"> 744 + <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Create store and index with blob storage</span></span> 745 + <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> PDSBlobStore(client)</span> 746 + <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> AtmosphereIndex(client, data_store<span class="op">=</span>store)</span> 747 + <span id="cb9-4"><a href="#cb9-4" aria-hidden="true" tabindex="-1"></a></span> 748 + <span id="cb9-5"><a href="#cb9-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Define sample type</span></span> 749 + <span id="cb9-6"><a href="#cb9-6" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 750 + <span id="cb9-7"><a href="#cb9-7" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> FeatureSample:</span> 751 + <span id="cb9-8"><a href="#cb9-8" aria-hidden="true" tabindex="-1"></a> features: NDArray</span> 752 + <span id="cb9-9"><a href="#cb9-9" aria-hidden="true" tabindex="-1"></a> label: <span class="bu">int</span></span> 753 + <span id="cb9-10"><a href="#cb9-10" aria-hidden="true" tabindex="-1"></a></span> 754 + <span id="cb9-11"><a href="#cb9-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Create dataset in memory or from existing tar</span></span> 755 + <span id="cb9-12"><a href="#cb9-12" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> 
[FeatureSample(features<span class="op">=</span>np.random.randn(<span class="dv">64</span>).astype(np.float32), label<span class="op">=</span>i <span class="op">%</span> <span class="dv">10</span>) <span class="cf">for</span> i <span class="kw">in</span> <span class="bu">range</span>(<span class="dv">100</span>)]</span> 756 + <span id="cb9-13"><a href="#cb9-13" aria-hidden="true" tabindex="-1"></a></span> 757 + <span id="cb9-14"><a href="#cb9-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Write to temporary tar</span></span> 758 + <span id="cb9-15"><a href="#cb9-15" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(<span class="st">"temp.tar"</span>) <span class="im">as</span> sink:</span> 759 + <span id="cb9-16"><a href="#cb9-16" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i, s <span class="kw">in</span> <span class="bu">enumerate</span>(samples):</span> 760 + <span id="cb9-17"><a href="#cb9-17" aria-hidden="true" tabindex="-1"></a> sink.write({<span class="op">**</span>s.as_wds, <span class="st">"__key__"</span>: <span class="ss">f"</span><span class="sc">{</span>i<span class="sc">:06d}</span><span class="ss">"</span>})</span> 761 + <span id="cb9-18"><a href="#cb9-18" aria-hidden="true" tabindex="-1"></a></span> 762 + <span id="cb9-19"><a href="#cb9-19" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[FeatureSample](<span class="st">"temp.tar"</span>)</span> 759 763 <span id="cb9-20"><a href="#cb9-20" aria-hidden="true" tabindex="-1"></a></span> 760 - <span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a>tar_data <span class="op">=</span> tar_buffer.getvalue()</span> 761 - <span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Created tar with </span><span class="sc">{</span><span class="bu">len</span>(samples)<span class="sc">}</span><span class="ss"> samples 
(</span><span class="sc">{</span><span class="bu">len</span>(tar_data)<span class="sc">:,}</span><span class="ss"> bytes)"</span>)</span> 762 - <span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a></span> 763 - <span id="cb9-24"><a href="#cb9-24" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish schema</span></span> 764 - <span id="cb9-25"><a href="#cb9-25" aria-hidden="true" tabindex="-1"></a>blob_schema_uri <span class="op">=</span> schema_publisher.publish(DemoSample, version<span class="op">=</span><span class="st">"1.0.0"</span>)</span> 765 - <span id="cb9-26"><a href="#cb9-26" aria-hidden="true" tabindex="-1"></a></span> 766 - <span id="cb9-27"><a href="#cb9-27" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish with blob storage</span></span> 767 - <span id="cb9-28"><a href="#cb9-28" aria-hidden="true" tabindex="-1"></a>blob_dataset_uri <span class="op">=</span> dataset_publisher.publish_with_blobs(</span> 768 - <span id="cb9-29"><a href="#cb9-29" aria-hidden="true" tabindex="-1"></a> blobs<span class="op">=</span>[tar_data],</span> 769 - <span id="cb9-30"><a href="#cb9-30" aria-hidden="true" tabindex="-1"></a> schema_uri<span class="op">=</span><span class="bu">str</span>(blob_schema_uri),</span> 770 - <span id="cb9-31"><a href="#cb9-31" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"Blob Storage Demo Dataset"</span>,</span> 771 - <span id="cb9-32"><a href="#cb9-32" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Small dataset stored directly in ATProto blobs"</span>,</span> 772 - <span id="cb9-33"><a href="#cb9-33" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"demo"</span>, <span class="st">"blob-storage"</span>],</span> 773 - <span id="cb9-34"><a href="#cb9-34" aria-hidden="true" tabindex="-1"></a>)</span> 774 - <span id="cb9-35"><a href="#cb9-35" aria-hidden="true" tabindex="-1"></a><span 
class="bu">print</span>(<span class="ss">f"Dataset URI: </span><span class="sc">{</span>blob_dataset_uri<span class="sc">}</span><span class="ss">"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 764 + <span id="cb9-21"><a href="#cb9-21" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish - shards are uploaded as blobs automatically</span></span> 765 + <span id="cb9-22"><a href="#cb9-22" aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> index.publish_schema(FeatureSample, version<span class="op">=</span><span class="st">"1.0.0"</span>)</span> 766 + <span id="cb9-23"><a href="#cb9-23" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 767 + <span id="cb9-24"><a href="#cb9-24" aria-hidden="true" tabindex="-1"></a> dataset,</span> 768 + <span id="cb9-25"><a href="#cb9-25" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"blob-stored-features"</span>,</span> 769 + <span id="cb9-26"><a href="#cb9-26" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span>schema_uri,</span> 770 + <span id="cb9-27"><a href="#cb9-27" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Features stored as PDS blobs"</span>,</span> 771 + <span id="cb9-28"><a href="#cb9-28" aria-hidden="true" tabindex="-1"></a>)</span> 772 + <span id="cb9-29"><a href="#cb9-29" aria-hidden="true" tabindex="-1"></a></span> 773 + <span id="cb9-30"><a href="#cb9-30" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Dataset URI: </span><span class="sc">{</span>entry<span class="sc">.</span>uri<span class="sc">}</span><span class="ss">"</span>)</span> 774 + <span id="cb9-31"><a href="#cb9-31" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Blob URLs: </span><span class="sc">{</span>entry<span 
class="sc">.</span>data_urls<span class="sc">}</span><span class="ss">"</span>) <span class="co"># at://did/blob/cid format</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 775 + </div> 776 + <div class="callout callout-style-default callout-tip callout-titled"> 777 + <div class="callout-header d-flex align-content-center"> 778 + <div class="callout-icon-container"> 779 + <i class="callout-icon"></i> 780 + </div> 781 + <div class="callout-title-container flex-fill"> 782 + Reading Blob-Stored Datasets 783 + </div> 784 + </div> 785 + <div class="callout-body-container callout-body"> 786 + <p>Use <code>BlobSource</code> to stream directly from PDS blobs:</p> 787 + <div id="13f07c85" class="cell"> 788 + <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Create source from the blob URLs</span></span> 789 + <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> store.create_source(entry.data_urls)</span> 790 + <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a></span> 791 + <span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Or manually from blob references</span></span> 792 + <span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> BlobSource.from_refs([</span> 793 + <span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> {<span class="st">"did"</span>: client.did, <span class="st">"cid"</span>: <span class="st">"bafyrei..."</span>},</span> 794 + <span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a>])</span> 795 + <span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a></span> 796 + <span id="cb10-9"><a href="#cb10-9" 
aria-hidden="true" tabindex="-1"></a><span class="co"># Load and iterate</span></span> 797 + <span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> atdata.Dataset[FeatureSample](source)</span> 798 + <span id="cb10-11"><a href="#cb10-11" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> ds.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 799 + <span id="cb10-12"><a href="#cb10-12" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(batch.features.shape)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 800 + </div> 801 + </div> 802 + </div> 803 + </section> 804 + <section id="with-external-urls-1" class="level3"> 805 + <h3 class="anchored" data-anchor-id="with-external-urls-1">With External URLs</h3> 806 + <p>For larger datasets or when using existing object storage:</p> 807 + <div id="e5dc0c7d" class="cell"> 808 + <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>dataset_publisher <span class="op">=</span> DatasetPublisher(client)</span> 809 + <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>dataset_uri <span class="op">=</span> dataset_publisher.publish_with_urls(</span> 810 + <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a> urls<span class="op">=</span>[<span class="st">"s3://example-bucket/demo-data-{000000..000009}.tar"</span>],</span> 811 + <span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a> schema_uri<span class="op">=</span><span class="bu">str</span>(schema_uri),</span> 812 + <span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"Demo Image Dataset"</span>,</span> 813 + <span 
id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Example dataset demonstrating atmosphere publishing"</span>,</span> 814 + <span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"demo"</span>, <span class="st">"images"</span>, <span class="st">"atdata"</span>],</span> 815 + <span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a> license<span class="op">=</span><span class="st">"MIT"</span>,</span> 816 + <span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a>)</span> 817 + <span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Dataset URI: </span><span class="sc">{</span>dataset_uri<span class="sc">}</span><span class="ss">"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 775 818 </div> 776 819 </section> 777 820 </section> 778 821 <section id="list-and-load-datasets" class="level2"> 779 822 <h2 class="anchored" data-anchor-id="list-and-load-datasets">List and Load Datasets</h2> 780 - <div id="13e2d797" class="cell"> 781 - <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a>dataset_loader <span class="op">=</span> DatasetLoader(client)</span> 782 - <span id="cb10-2"><a href="#cb10-2" aria-hidden="true" tabindex="-1"></a>datasets <span class="op">=</span> dataset_loader.list_all(limit<span class="op">=</span><span class="dv">10</span>)</span> 783 - <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Found </span><span class="sc">{</span><span class="bu">len</span>(datasets)<span class="sc">}</span><span class="ss"> dataset(s)"</span>)</span> 784 - 
<span id="cb10-4"><a href="#cb10-4" aria-hidden="true" tabindex="-1"></a></span> 785 - <span id="cb10-5"><a href="#cb10-5" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> ds <span class="kw">in</span> datasets:</span> 786 - <span id="cb10-6"><a href="#cb10-6" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f" - </span><span class="sc">{</span>ds<span class="sc">.</span>get(<span class="st">'name'</span>, <span class="st">'Unknown'</span>)<span class="sc">}</span><span class="ss">"</span>)</span> 787 - <span id="cb10-7"><a href="#cb10-7" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f" Schema: </span><span class="sc">{</span>ds<span class="sc">.</span>get(<span class="st">'schemaRef'</span>, <span class="st">'N/A'</span>)<span class="sc">}</span><span class="ss">"</span>)</span> 788 - <span id="cb10-8"><a href="#cb10-8" aria-hidden="true" tabindex="-1"></a> tags <span class="op">=</span> ds.get(<span class="st">'tags'</span>, [])</span> 789 - <span id="cb10-9"><a href="#cb10-9" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> tags:</span> 790 - <span id="cb10-10"><a href="#cb10-10" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f" Tags: </span><span class="sc">{</span><span class="st">', '</span><span class="sc">.</span>join(tags)<span class="sc">}</span><span class="ss">"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 823 + <div id="1d53ab16" class="cell"> 824 + <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>dataset_loader <span class="op">=</span> DatasetLoader(client)</span> 825 + <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a>datasets <span class="op">=</span> 
dataset_loader.list_all(limit<span class="op">=</span><span class="dv">10</span>)</span> 826 + <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Found </span><span class="sc">{</span><span class="bu">len</span>(datasets)<span class="sc">}</span><span class="ss"> dataset(s)"</span>)</span> 827 + <span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a></span> 828 + <span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> ds <span class="kw">in</span> datasets:</span> 829 + <span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f" - </span><span class="sc">{</span>ds<span class="sc">.</span>get(<span class="st">'name'</span>, <span class="st">'Unknown'</span>)<span class="sc">}</span><span class="ss">"</span>)</span> 830 + <span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f" Schema: </span><span class="sc">{</span>ds<span class="sc">.</span>get(<span class="st">'schemaRef'</span>, <span class="st">'N/A'</span>)<span class="sc">}</span><span class="ss">"</span>)</span> 831 + <span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a> tags <span class="op">=</span> ds.get(<span class="st">'tags'</span>, [])</span> 832 + <span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a> <span class="cf">if</span> tags:</span> 833 + <span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f" Tags: </span><span class="sc">{</span><span class="st">', '</span><span class="sc">.</span>join(tags)<span class="sc">}</span><span class="ss">"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 791 834 </div> 792 835 </section> 793 836 <section id="load-a-dataset" 
class="level2"> 794 837 <h2 class="anchored" data-anchor-id="load-a-dataset">Load a Dataset</h2> 795 - <div id="3fd0dcc4" class="cell"> 796 - <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Check storage type</span></span> 797 - <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a>storage_type <span class="op">=</span> dataset_loader.get_storage_type(<span class="bu">str</span>(blob_dataset_uri))</span> 798 - <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Storage type: </span><span class="sc">{</span>storage_type<span class="sc">}</span><span class="ss">"</span>)</span> 799 - <span id="cb11-4"><a href="#cb11-4" aria-hidden="true" tabindex="-1"></a></span> 800 - <span id="cb11-5"><a href="#cb11-5" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> storage_type <span class="op">==</span> <span class="st">"blobs"</span>:</span> 801 - <span id="cb11-6"><a href="#cb11-6" aria-hidden="true" tabindex="-1"></a> blob_urls <span class="op">=</span> dataset_loader.get_blob_urls(<span class="bu">str</span>(blob_dataset_uri))</span> 802 - <span id="cb11-7"><a href="#cb11-7" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"Blob URLs: </span><span class="sc">{</span><span class="bu">len</span>(blob_urls)<span class="sc">}</span><span class="ss"> blob(s)"</span>)</span> 803 - <span id="cb11-8"><a href="#cb11-8" aria-hidden="true" tabindex="-1"></a></span> 804 - <span id="cb11-9"><a href="#cb11-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Load and iterate (works for both storage types)</span></span> 805 - <span id="cb11-10"><a href="#cb11-10" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> dataset_loader.to_dataset(<span 
class="bu">str</span>(blob_dataset_uri), DemoSample)</span> 806 - <span id="cb11-11"><a href="#cb11-11" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> ds.ordered():</span> 807 - <span id="cb11-12"><a href="#cb11-12" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"Sample id=</span><span class="sc">{</span>batch<span class="sc">.</span><span class="bu">id</span><span class="sc">}</span><span class="ss">, text=</span><span class="sc">{</span>batch<span class="sc">.</span>text<span class="sc">}</span><span class="ss">"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 838 + <div id="3ec469cc" class="cell"> 839 + <div class="sourceCode cell-code" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Check storage type</span></span> 840 + <span id="cb13-2"><a href="#cb13-2" aria-hidden="true" tabindex="-1"></a>storage_type <span class="op">=</span> dataset_loader.get_storage_type(<span class="bu">str</span>(blob_dataset_uri))</span> 841 + <span id="cb13-3"><a href="#cb13-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Storage type: </span><span class="sc">{</span>storage_type<span class="sc">}</span><span class="ss">"</span>)</span> 842 + <span id="cb13-4"><a href="#cb13-4" aria-hidden="true" tabindex="-1"></a></span> 843 + <span id="cb13-5"><a href="#cb13-5" aria-hidden="true" tabindex="-1"></a><span class="cf">if</span> storage_type <span class="op">==</span> <span class="st">"blobs"</span>:</span> 844 + <span id="cb13-6"><a href="#cb13-6" aria-hidden="true" tabindex="-1"></a> blob_urls <span class="op">=</span> dataset_loader.get_blob_urls(<span class="bu">str</span>(blob_dataset_uri))</span> 845 + <span id="cb13-7"><a href="#cb13-7" 
aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"Blob URLs: </span><span class="sc">{</span><span class="bu">len</span>(blob_urls)<span class="sc">}</span><span class="ss"> blob(s)"</span>)</span> 846 + <span id="cb13-8"><a href="#cb13-8" aria-hidden="true" tabindex="-1"></a></span> 847 + <span id="cb13-9"><a href="#cb13-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Load and iterate (works for both storage types)</span></span> 848 + <span id="cb13-10"><a href="#cb13-10" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> dataset_loader.to_dataset(<span class="bu">str</span>(blob_dataset_uri), DemoSample)</span> 849 + <span id="cb13-11"><a href="#cb13-11" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> ds.ordered():</span> 850 + <span id="cb13-12"><a href="#cb13-12" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"Sample id=</span><span class="sc">{</span>batch<span class="sc">.</span><span class="bu">id</span><span class="sc">}</span><span class="ss">, text=</span><span class="sc">{</span>batch<span class="sc">.</span>text<span class="sc">}</span><span class="ss">"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 808 851 </div> 809 852 </section> 810 853 <section id="complete-publishing-workflow" class="level2"> 811 854 <h2 class="anchored" data-anchor-id="complete-publishing-workflow">Complete Publishing Workflow</h2> 812 - <div id="86e1598b" class="cell"> 813 - <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. 
Define and create samples</span></span> 814 - <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 815 - <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> FeatureSample:</span> 816 - <span id="cb12-4"><a href="#cb12-4" aria-hidden="true" tabindex="-1"></a> features: NDArray</span> 817 - <span id="cb12-5"><a href="#cb12-5" aria-hidden="true" tabindex="-1"></a> label: <span class="bu">int</span></span> 818 - <span id="cb12-6"><a href="#cb12-6" aria-hidden="true" tabindex="-1"></a> source: <span class="bu">str</span></span> 819 - <span id="cb12-7"><a href="#cb12-7" aria-hidden="true" tabindex="-1"></a></span> 820 - <span id="cb12-8"><a href="#cb12-8" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> 821 - <span id="cb12-9"><a href="#cb12-9" aria-hidden="true" tabindex="-1"></a> FeatureSample(</span> 822 - <span id="cb12-10"><a href="#cb12-10" aria-hidden="true" tabindex="-1"></a> features<span class="op">=</span>np.random.randn(<span class="dv">128</span>).astype(np.float32),</span> 823 - <span id="cb12-11"><a href="#cb12-11" aria-hidden="true" tabindex="-1"></a> label<span class="op">=</span>i <span class="op">%</span> <span class="dv">10</span>,</span> 824 - <span id="cb12-12"><a href="#cb12-12" aria-hidden="true" tabindex="-1"></a> source<span class="op">=</span><span class="st">"synthetic"</span>,</span> 825 - <span id="cb12-13"><a href="#cb12-13" aria-hidden="true" tabindex="-1"></a> )</span> 826 - <span id="cb12-14"><a href="#cb12-14" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i <span class="kw">in</span> <span class="bu">range</span>(<span class="dv">1000</span>)</span> 827 - <span id="cb12-15"><a href="#cb12-15" aria-hidden="true" tabindex="-1"></a>]</span> 828 - <span id="cb12-16"><a href="#cb12-16" aria-hidden="true" tabindex="-1"></a></span> 829 - <span id="cb12-17"><a href="#cb12-17" 
aria-hidden="true" tabindex="-1"></a><span class="co"># 2. Write to tar</span></span> 830 - <span id="cb12-18"><a href="#cb12-18" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(<span class="st">"features.tar"</span>) <span class="im">as</span> sink:</span> 831 - <span id="cb12-19"><a href="#cb12-19" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i, s <span class="kw">in</span> <span class="bu">enumerate</span>(samples):</span> 832 - <span id="cb12-20"><a href="#cb12-20" aria-hidden="true" tabindex="-1"></a> sink.write({<span class="op">**</span>s.as_wds, <span class="st">"__key__"</span>: <span class="ss">f"</span><span class="sc">{</span>i<span class="sc">:06d}</span><span class="ss">"</span>})</span> 833 - <span id="cb12-21"><a href="#cb12-21" aria-hidden="true" tabindex="-1"></a></span> 834 - <span id="cb12-22"><a href="#cb12-22" aria-hidden="true" tabindex="-1"></a><span class="co"># 3. Authenticate</span></span> 835 - <span id="cb12-23"><a href="#cb12-23" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereIndex</span> 836 - <span id="cb12-24"><a href="#cb12-24" aria-hidden="true" tabindex="-1"></a></span> 837 - <span id="cb12-25"><a href="#cb12-25" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 838 - <span id="cb12-26"><a href="#cb12-26" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"myhandle.bsky.social"</span>, <span class="st">"app-password"</span>)</span> 839 - <span id="cb12-27"><a href="#cb12-27" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> AtmosphereIndex(client)</span> 840 - <span id="cb12-28"><a href="#cb12-28" aria-hidden="true" tabindex="-1"></a></span> 841 - <span id="cb12-29"><a href="#cb12-29" aria-hidden="true" tabindex="-1"></a><span class="co"># 4. 
Publish schema</span></span> 842 - <span id="cb12-30"><a href="#cb12-30" aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> index.publish_schema(</span> 843 - <span id="cb12-31"><a href="#cb12-31" aria-hidden="true" tabindex="-1"></a> FeatureSample,</span> 844 - <span id="cb12-32"><a href="#cb12-32" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">"1.0.0"</span>,</span> 845 - <span id="cb12-33"><a href="#cb12-33" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Feature vectors with labels"</span>,</span> 846 - <span id="cb12-34"><a href="#cb12-34" aria-hidden="true" tabindex="-1"></a>)</span> 847 - <span id="cb12-35"><a href="#cb12-35" aria-hidden="true" tabindex="-1"></a></span> 848 - <span id="cb12-36"><a href="#cb12-36" aria-hidden="true" tabindex="-1"></a><span class="co"># 5. Publish dataset</span></span> 849 - <span id="cb12-37"><a href="#cb12-37" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[FeatureSample](<span class="st">"features.tar"</span>)</span> 850 - <span id="cb12-38"><a href="#cb12-38" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 851 - <span id="cb12-39"><a href="#cb12-39" aria-hidden="true" tabindex="-1"></a> dataset,</span> 852 - <span id="cb12-40"><a href="#cb12-40" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"synthetic-features-v1"</span>,</span> 853 - <span id="cb12-41"><a href="#cb12-41" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span>schema_uri,</span> 854 - <span id="cb12-42"><a href="#cb12-42" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"features"</span>, <span class="st">"synthetic"</span>],</span> 855 - <span id="cb12-43"><a href="#cb12-43" aria-hidden="true" tabindex="-1"></a>)</span> 856 - <span id="cb12-44"><a href="#cb12-44" 
aria-hidden="true" tabindex="-1"></a></span> 857 - <span id="cb12-45"><a href="#cb12-45" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Published: </span><span class="sc">{</span>entry<span class="sc">.</span>uri<span class="sc">}</span><span class="ss">"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 855 + <p>This example shows the recommended workflow using <code>PDSBlobStore</code> for fully decentralized storage:</p> 856 + <div id="0d13f586" class="cell"> 857 + <div class="sourceCode cell-code" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. Define and create samples</span></span> 858 + <span id="cb14-2"><a href="#cb14-2" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 859 + <span id="cb14-3"><a href="#cb14-3" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> FeatureSample:</span> 860 + <span id="cb14-4"><a href="#cb14-4" aria-hidden="true" tabindex="-1"></a> features: NDArray</span> 861 + <span id="cb14-5"><a href="#cb14-5" aria-hidden="true" tabindex="-1"></a> label: <span class="bu">int</span></span> 862 + <span id="cb14-6"><a href="#cb14-6" aria-hidden="true" tabindex="-1"></a> source: <span class="bu">str</span></span> 863 + <span id="cb14-7"><a href="#cb14-7" aria-hidden="true" tabindex="-1"></a></span> 864 + <span id="cb14-8"><a href="#cb14-8" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> 865 + <span id="cb14-9"><a href="#cb14-9" aria-hidden="true" tabindex="-1"></a> FeatureSample(</span> 866 + <span id="cb14-10"><a href="#cb14-10" aria-hidden="true" tabindex="-1"></a> features<span class="op">=</span>np.random.randn(<span class="dv">128</span>).astype(np.float32),</span> 867 + <span id="cb14-11"><a href="#cb14-11" 
aria-hidden="true" tabindex="-1"></a> label<span class="op">=</span>i <span class="op">%</span> <span class="dv">10</span>,</span> 868 + <span id="cb14-12"><a href="#cb14-12" aria-hidden="true" tabindex="-1"></a> source<span class="op">=</span><span class="st">"synthetic"</span>,</span> 869 + <span id="cb14-13"><a href="#cb14-13" aria-hidden="true" tabindex="-1"></a> )</span> 870 + <span id="cb14-14"><a href="#cb14-14" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i <span class="kw">in</span> <span class="bu">range</span>(<span class="dv">1000</span>)</span> 871 + <span id="cb14-15"><a href="#cb14-15" aria-hidden="true" tabindex="-1"></a>]</span> 872 + <span id="cb14-16"><a href="#cb14-16" aria-hidden="true" tabindex="-1"></a></span> 873 + <span id="cb14-17"><a href="#cb14-17" aria-hidden="true" tabindex="-1"></a><span class="co"># 2. Write to tar</span></span> 874 + <span id="cb14-18"><a href="#cb14-18" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> wds.writer.TarWriter(<span class="st">"features.tar"</span>) <span class="im">as</span> sink:</span> 875 + <span id="cb14-19"><a href="#cb14-19" aria-hidden="true" tabindex="-1"></a> <span class="cf">for</span> i, s <span class="kw">in</span> <span class="bu">enumerate</span>(samples):</span> 876 + <span id="cb14-20"><a href="#cb14-20" aria-hidden="true" tabindex="-1"></a> sink.write({<span class="op">**</span>s.as_wds, <span class="st">"__key__"</span>: <span class="ss">f"</span><span class="sc">{</span>i<span class="sc">:06d}</span><span class="ss">"</span>})</span> 877 + <span id="cb14-21"><a href="#cb14-21" aria-hidden="true" tabindex="-1"></a></span> 878 + <span id="cb14-22"><a href="#cb14-22" aria-hidden="true" tabindex="-1"></a><span class="co"># 3. 
Authenticate and create index with blob storage</span></span> 879 + <span id="cb14-23"><a href="#cb14-23" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 880 + <span id="cb14-24"><a href="#cb14-24" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"myhandle.bsky.social"</span>, <span class="st">"app-password"</span>)</span> 881 + <span id="cb14-25"><a href="#cb14-25" aria-hidden="true" tabindex="-1"></a></span> 882 + <span id="cb14-26"><a href="#cb14-26" aria-hidden="true" tabindex="-1"></a>store <span class="op">=</span> PDSBlobStore(client)</span> 883 + <span id="cb14-27"><a href="#cb14-27" aria-hidden="true" tabindex="-1"></a>index <span class="op">=</span> AtmosphereIndex(client, data_store<span class="op">=</span>store)</span> 884 + <span id="cb14-28"><a href="#cb14-28" aria-hidden="true" tabindex="-1"></a></span> 885 + <span id="cb14-29"><a href="#cb14-29" aria-hidden="true" tabindex="-1"></a><span class="co"># 4. Publish schema</span></span> 886 + <span id="cb14-30"><a href="#cb14-30" aria-hidden="true" tabindex="-1"></a>schema_uri <span class="op">=</span> index.publish_schema(</span> 887 + <span id="cb14-31"><a href="#cb14-31" aria-hidden="true" tabindex="-1"></a> FeatureSample,</span> 888 + <span id="cb14-32"><a href="#cb14-32" aria-hidden="true" tabindex="-1"></a> version<span class="op">=</span><span class="st">"1.0.0"</span>,</span> 889 + <span id="cb14-33"><a href="#cb14-33" aria-hidden="true" tabindex="-1"></a> description<span class="op">=</span><span class="st">"Feature vectors with labels"</span>,</span> 890 + <span id="cb14-34"><a href="#cb14-34" aria-hidden="true" tabindex="-1"></a>)</span> 891 + <span id="cb14-35"><a href="#cb14-35" aria-hidden="true" tabindex="-1"></a></span> 892 + <span id="cb14-36"><a href="#cb14-36" aria-hidden="true" tabindex="-1"></a><span class="co"># 5. 
Publish dataset (shards uploaded as blobs automatically)</span></span> 893 + <span id="cb14-37"><a href="#cb14-37" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[FeatureSample](<span class="st">"features.tar"</span>)</span> 894 + <span id="cb14-38"><a href="#cb14-38" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> index.insert_dataset(</span> 895 + <span id="cb14-39"><a href="#cb14-39" aria-hidden="true" tabindex="-1"></a> dataset,</span> 896 + <span id="cb14-40"><a href="#cb14-40" aria-hidden="true" tabindex="-1"></a> name<span class="op">=</span><span class="st">"synthetic-features-v1"</span>,</span> 897 + <span id="cb14-41"><a href="#cb14-41" aria-hidden="true" tabindex="-1"></a> schema_ref<span class="op">=</span>schema_uri,</span> 898 + <span id="cb14-42"><a href="#cb14-42" aria-hidden="true" tabindex="-1"></a> tags<span class="op">=</span>[<span class="st">"features"</span>, <span class="st">"synthetic"</span>],</span> 899 + <span id="cb14-43"><a href="#cb14-43" aria-hidden="true" tabindex="-1"></a>)</span> 900 + <span id="cb14-44"><a href="#cb14-44" aria-hidden="true" tabindex="-1"></a></span> 901 + <span id="cb14-45"><a href="#cb14-45" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Published: </span><span class="sc">{</span>entry<span class="sc">.</span>uri<span class="sc">}</span><span class="ss">"</span>)</span> 902 + <span id="cb14-46"><a href="#cb14-46" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f"Data stored at: </span><span class="sc">{</span>entry<span class="sc">.</span>data_urls<span class="sc">}</span><span class="ss">"</span>) <span class="co"># at://did/blob/cid URLs</span></span> 903 + <span id="cb14-47"><a href="#cb14-47" aria-hidden="true" tabindex="-1"></a></span> 904 + <span id="cb14-48"><a href="#cb14-48" aria-hidden="true" tabindex="-1"></a><span class="co"># 6. 
Later: load from blobs</span></span> 905 + <span id="cb14-49"><a href="#cb14-49" aria-hidden="true" tabindex="-1"></a>source <span class="op">=</span> store.create_source(entry.data_urls)</span> 906 + <span id="cb14-50"><a href="#cb14-50" aria-hidden="true" tabindex="-1"></a>ds <span class="op">=</span> atdata.Dataset[FeatureSample](source)</span> 907 + <span id="cb14-51"><a href="#cb14-51" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> ds.ordered(batch_size<span class="op">=</span><span class="dv">32</span>):</span> 908 + <span id="cb14-52"><a href="#cb14-52" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="ss">f"Loaded batch with </span><span class="sc">{</span><span class="bu">len</span>(batch.label)<span class="sc">}</span><span class="ss"> samples"</span>)</span> 909 + <span id="cb14-53"><a href="#cb14-53" aria-hidden="true" tabindex="-1"></a> <span class="cf">break</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 858 910 </div> 859 911 </section> 860 912 <section id="next-steps" class="level2">
+8 -8
docs/tutorials/local-workflow.html
··· 599 599 </section> 600 600 <section id="setup" class="level2"> 601 601 <h2 class="anchored" data-anchor-id="setup">Setup</h2> 602 - <div id="bf308a3b" class="cell"> 602 + <div id="cba2b198" class="cell"> 603 603 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 604 604 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 605 605 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 609 609 </section> 610 610 <section id="define-sample-types" class="level2"> 611 611 <h2 class="anchored" data-anchor-id="define-sample-types">Define Sample Types</h2> 612 - <div id="b92ec5f1" class="cell"> 612 + <div id="8bf33c29" class="cell"> 613 613 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 614 614 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> TrainingSample:</span> 615 615 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> <span class="co">"""A sample containing features and label for training."""</span></span> ··· 626 626 <section id="localdatasetentry" class="level2"> 627 627 <h2 class="anchored" data-anchor-id="localdatasetentry">LocalDatasetEntry</h2> 628 628 <p>Create entries with content-addressable CIDs:</p> 629 - <div id="f75b391d" class="cell"> 629 + <div id="a93468d2" class="cell"> 630 630 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span 
id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Create an entry manually</span></span> 631 631 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>entry <span class="op">=</span> LocalDatasetEntry(</span> 632 632 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> _name<span class="op">=</span><span class="st">"my-dataset"</span>,</span> ··· 658 658 <section id="localindex" class="level2"> 659 659 <h2 class="anchored" data-anchor-id="localindex">LocalIndex</h2> 660 660 <p>The index tracks datasets in Redis:</p> 661 - <div id="162ca9e1" class="cell"> 661 + <div id="05315823" class="cell"> 662 662 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> redis <span class="im">import</span> Redis</span> 663 663 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span> 664 664 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Connect to Redis</span></span> ··· 669 669 </div> 670 670 <section id="schema-management" class="level3"> 671 671 <h3 class="anchored" data-anchor-id="schema-management">Schema Management</h3> 672 - <div id="9c358257" class="cell"> 672 + <div id="a16e84e2" class="cell"> 673 673 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Publish a schema</span></span> 674 674 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a>schema_ref <span class="op">=</span> index.publish_schema(TrainingSample, version<span class="op">=</span><span class="st">"1.0.0"</span>)</span> 675 675 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span 
class="ss">f"Published schema: </span><span class="sc">{</span>schema_ref<span class="sc">}</span><span class="ss">"</span>)</span> ··· 691 691 <section id="s3datastore" class="level2"> 692 692 <h2 class="anchored" data-anchor-id="s3datastore">S3DataStore</h2> 693 693 <p>For direct S3 operations:</p> 694 - <div id="9b5e4a31" class="cell"> 694 + <div id="0b93923c" class="cell"> 695 695 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>creds <span class="op">=</span> {</span> 696 696 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_ENDPOINT"</span>: <span class="st">"http://localhost:9000"</span>,</span> 697 697 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a> <span class="st">"AWS_ACCESS_KEY_ID"</span>: <span class="st">"minioadmin"</span>,</span> ··· 707 707 <section id="complete-index-workflow" class="level2"> 708 708 <h2 class="anchored" data-anchor-id="complete-index-workflow">Complete Index Workflow</h2> 709 709 <p>Use <code>LocalIndex</code> with <code>S3DataStore</code> to store datasets with S3 storage and Redis indexing:</p> 710 - <div id="37996513" class="cell"> 710 + <div id="437797e0" class="cell"> 711 711 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. 
Create sample data</span></span> 712 712 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>samples <span class="op">=</span> [</span> 713 713 <span id="cb8-3"><a href="#cb8-3" aria-hidden="true" tabindex="-1"></a> TrainingSample(</span> ··· 756 756 <section id="using-load_dataset-with-index" class="level2"> 757 757 <h2 class="anchored" data-anchor-id="using-load_dataset-with-index">Using load_dataset with Index</h2> 758 758 <p>The <code>load_dataset()</code> function supports index lookup:</p> 759 - <div id="633bb1d8" class="cell"> 759 + <div id="a2176738" class="cell"> 760 760 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata <span class="im">import</span> load_dataset</span> 761 761 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 762 762 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Load from local index</span></span>
+11 -11
docs/tutorials/promotion.html
··· 593 593 </section> 594 594 <section id="setup" class="level2"> 595 595 <h2 class="anchored" data-anchor-id="setup">Setup</h2> 596 - <div id="103f946d" class="cell"> 596 + <div id="a4944327" class="cell"> 597 597 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 598 598 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 599 599 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 606 606 <section id="prepare-a-local-dataset" class="level2"> 607 607 <h2 class="anchored" data-anchor-id="prepare-a-local-dataset">Prepare a Local Dataset</h2> 608 608 <p>First, set up a dataset in local storage:</p> 609 - <div id="1d38a39c" class="cell"> 609 + <div id="5d9c3c9c" class="cell"> 610 610 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># 1. 
Define sample type</span></span> 611 611 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 612 612 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> ExperimentSample:</span> ··· 656 656 <section id="basic-promotion" class="level2"> 657 657 <h2 class="anchored" data-anchor-id="basic-promotion">Basic Promotion</h2> 658 658 <p>Promote the dataset to ATProto:</p> 659 - <div id="e42328f0" class="cell"> 659 + <div id="ad535d77" class="cell"> 660 660 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Connect to atmosphere</span></span> 661 661 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>client <span class="op">=</span> AtmosphereClient()</span> 662 662 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>client.login(<span class="st">"myhandle.bsky.social"</span>, <span class="st">"app-password"</span>)</span> ··· 669 669 <section id="promotion-with-metadata" class="level2"> 670 670 <h2 class="anchored" data-anchor-id="promotion-with-metadata">Promotion with Metadata</h2> 671 671 <p>Add description, tags, and license:</p> 672 - <div id="7d374aac" class="cell"> 672 + <div id="dc02ee9b" class="cell"> 673 673 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a>at_uri <span class="op">=</span> promote_to_atmosphere(</span> 674 674 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a> local_entry,</span> 675 675 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a> local_index,</span> ··· 685 685 <section id="schema-deduplication" class="level2"> 686 686 <h2 class="anchored" 
data-anchor-id="schema-deduplication">Schema Deduplication</h2> 687 687 <p>The promotion workflow automatically checks for existing schemas:</p> 688 - <div id="128c6d02" class="cell"> 688 + <div id="0f305439" class="cell"> 689 689 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.promote <span class="im">import</span> _find_existing_schema</span> 690 690 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a></span> 691 691 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Check if schema already exists</span></span> ··· 697 697 <span id="cb6-9"><a href="#cb6-9" aria-hidden="true" tabindex="-1"></a> <span class="bu">print</span>(<span class="st">"No existing schema found, will publish new one"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 698 698 </div> 699 699 <p>When you promote multiple datasets with the same sample type:</p> 700 - <div id="55249656" class="cell"> 700 + <div id="5f2af9ec" class="cell"> 701 701 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="co"># First promotion: publishes schema</span></span> 702 702 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a>uri1 <span class="op">=</span> promote_to_atmosphere(entry1, local_index, client)</span> 703 703 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a></span> ··· 712 712 <div class="tab-content"> 713 713 <div id="tabset-1-1" class="tab-pane active" role="tabpanel" aria-labelledby="tabset-1-1-tab"> 714 714 <p>By default, promotion keeps the original data URLs:</p> 715 - <div id="341b0658" class="cell"> 715 
+ <div id="11816bc6" class="cell"> 716 716 <div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Data stays in original S3 location</span></span> 717 717 <span id="cb8-2"><a href="#cb8-2" aria-hidden="true" tabindex="-1"></a>at_uri <span class="op">=</span> promote_to_atmosphere(local_entry, local_index, client)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div> 718 718 </div> ··· 725 725 </div> 726 726 <div id="tabset-1-2" class="tab-pane" role="tabpanel" aria-labelledby="tabset-1-2-tab"> 727 727 <p>To copy data to a different storage location:</p> 728 - <div id="7078a3dd" class="cell"> 728 + <div id="a09bd7f1" class="cell"> 729 729 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.local <span class="im">import</span> S3DataStore</span> 730 730 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a></span> 731 731 <span id="cb9-3"><a href="#cb9-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Create new data store</span></span> ··· 755 755 <section id="verify-on-atmosphere" class="level2"> 756 756 <h2 class="anchored" data-anchor-id="verify-on-atmosphere">Verify on Atmosphere</h2> 757 757 <p>After promotion, verify the dataset is accessible:</p> 758 - <div id="891b451d" class="cell"> 758 + <div id="9698b9c2" class="cell"> 759 759 <div class="sourceCode cell-code" id="cb10"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> atdata.atmosphere <span class="im">import</span> AtmosphereIndex</span> 760 760 <span id="cb10-2"><a 
href="#cb10-2" aria-hidden="true" tabindex="-1"></a></span> 761 761 <span id="cb10-3"><a href="#cb10-3" aria-hidden="true" tabindex="-1"></a>atm_index <span class="op">=</span> AtmosphereIndex(client)</span> ··· 776 776 </section> 777 777 <section id="error-handling" class="level2"> 778 778 <h2 class="anchored" data-anchor-id="error-handling">Error Handling</h2> 779 - <div id="31b5f22a" class="cell"> 779 + <div id="10f3b8be" class="cell"> 780 780 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="cf">try</span>:</span> 781 781 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a> at_uri <span class="op">=</span> promote_to_atmosphere(local_entry, local_index, client)</span> 782 782 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="cf">except</span> <span class="pp">KeyError</span> <span class="im">as</span> e:</span> ··· 800 800 </section> 801 801 <section id="complete-workflow" class="level2"> 802 802 <h2 class="anchored" data-anchor-id="complete-workflow">Complete Workflow</h2> 803 - <div id="8e76bf26" class="cell"> 803 + <div id="d8dd7cbc" class="cell"> 804 804 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Complete local-to-atmosphere workflow</span></span> 805 805 <span id="cb12-2"><a href="#cb12-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 806 806 <span id="cb12-3"><a href="#cb12-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span>
+6 -6
docs/tutorials/quickstart.html
··· 582 582 <section id="define-a-sample-type" class="level2"> 583 583 <h2 class="anchored" data-anchor-id="define-a-sample-type">Define a Sample Type</h2> 584 584 <p>Use the <code>@packable</code> decorator to create a typed sample:</p> 585 - <div id="2184c2cd" class="cell"> 585 + <div id="3049c1b6" class="cell"> 586 586 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span> 587 587 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> numpy.typing <span class="im">import</span> NDArray</span> 588 588 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> atdata</span> ··· 603 603 </section> 604 604 <section id="create-sample-instances" class="level2"> 605 605 <h2 class="anchored" data-anchor-id="create-sample-instances">Create Sample Instances</h2> 606 - <div id="edf75cb3" class="cell"> 606 + <div id="e1f58bf7" class="cell"> 607 607 <div class="sourceCode cell-code" id="cb3"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a single sample</span></span> 608 608 <span id="cb3-2"><a href="#cb3-2" aria-hidden="true" tabindex="-1"></a>sample <span class="op">=</span> ImageSample(</span> 609 609 <span id="cb3-3"><a href="#cb3-3" aria-hidden="true" tabindex="-1"></a> image<span class="op">=</span>np.random.rand(<span class="dv">224</span>, <span class="dv">224</span>, <span class="dv">3</span>).astype(np.float32),</span> ··· 624 624 <section id="write-a-dataset" class="level2"> 625 625 <h2 class="anchored" data-anchor-id="write-a-dataset">Write a Dataset</h2> 626 626 <p>Use WebDataset’s <code>TarWriter</code> to create dataset files:</p> 627 - <div 
id="2cf451ef" class="cell"> 627 + <div id="58d86c25" class="cell"> 628 628 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> webdataset <span class="im">as</span> wds</span> 629 629 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a></span> 630 630 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Create 100 samples</span></span> ··· 648 648 <section id="load-and-iterate" class="level2"> 649 649 <h2 class="anchored" data-anchor-id="load-and-iterate">Load and Iterate</h2> 650 650 <p>Create a typed <code>Dataset</code> and iterate with batching:</p> 651 - <div id="f95261a1" class="cell"> 651 + <div id="7bba76ea" class="cell"> 652 652 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Load dataset with type</span></span> 653 653 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a>dataset <span class="op">=</span> atdata.Dataset[ImageSample](<span class="st">"my-dataset-000000.tar"</span>)</span> 654 654 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a></span> ··· 669 669 <section id="shuffled-iteration" class="level2"> 670 670 <h2 class="anchored" data-anchor-id="shuffled-iteration">Shuffled Iteration</h2> 671 671 <p>For training, use shuffled iteration:</p> 672 - <div id="8b4f6c9a" class="cell"> 672 + <div id="e9cca4f6" class="cell"> 673 673 <div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> batch <span class="kw">in</span> dataset.shuffled(batch_size<span 
class="op">=</span><span class="dv">32</span>):</span> 674 674 <span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> <span class="co"># Samples are shuffled at shard and sample level</span></span> 675 675 <span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> images <span class="op">=</span> batch.image</span> ··· 683 683 <section id="use-lenses-for-type-transformations" class="level2"> 684 684 <h2 class="anchored" data-anchor-id="use-lenses-for-type-transformations">Use Lenses for Type Transformations</h2> 685 685 <p>View datasets through different schemas:</p> 686 - <div id="c2cdb8c9" class="cell"> 686 + <div id="4299fc09" class="cell"> 687 687 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Define a simplified view type</span></span> 688 688 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="at">@atdata.packable</span></span> 689 689 <span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a><span class="kw">class</span> SimplifiedSample:</span>
+4
docs_src/_quarto.yml
··· 52 52 children: embedded 53 53 - name: S3Source 54 54 children: embedded 55 + - name: BlobSource 56 + children: embedded 55 57 56 58 # Local storage 57 59 - title: Local Storage ··· 74 76 - name: AtmosphereIndex 75 77 children: embedded 76 78 - name: AtmosphereIndexEntry 79 + children: embedded 80 + - name: PDSBlobStore 77 81 children: embedded 78 82 - name: SchemaPublisher 79 83 - name: SchemaLoader
+10 -5
docs_src/api/AbstractIndex.qmd
···
41 41
42 42 | Name | Description |
43 43 | --- | --- |
44 + | [data_store](#atdata.AbstractIndex.data_store) | Optional data store for reading/writing shards. |
44 45 | [datasets](#atdata.AbstractIndex.datasets) | Lazily iterate over all dataset entries in this index. |
45 46 | [schemas](#atdata.AbstractIndex.schemas) | Lazily iterate over all schema records in this index. |
46 47
···
214 215
215 216 Publish a schema for a sample type.
216 217
218 + The sample_type is accepted as ``type`` rather than ``Type[Packable]`` to
219 + support ``@packable``-decorated classes, which satisfy the Packable protocol
220 + at runtime but cannot be statically verified by type checkers.
221 +
217 222 #### Parameters {.doc-section .doc-section-parameters}
218 223
219 - | Name | Type | Description | Default |
220 - |-------------|-------------------------------------------------------------------|-------------------------------------------------------------------|------------|
221 - | sample_type | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | A Packable type (PackableSample subclass or @packable-decorated). | _required_ |
222 - | version | [str](`str`) | Semantic version string for the schema. | `'1.0.0'` |
223 - | **kwargs | | Additional backend-specific options. | `{}` |
224 + | Name | Type | Description | Default |
225 + |-------------|----------------|--------------------------------------------------------------------------------------------------------------------------------------|------------|
226 + | sample_type | [type](`type`) | A Packable type (PackableSample subclass or @packable-decorated). Validated at runtime via the @runtime_checkable Packable protocol. | _required_ |
227 + | version | [str](`str`) | Semantic version string for the schema. | `'1.0.0'` |
228 + | **kwargs | | Additional backend-specific options. | `{}` |
224 229
225 230 #### Returns {.doc-section .doc-section-returns}
226 231
+10 -2
docs_src/api/AtmosphereIndex.qmd
···
1 1 # AtmosphereIndex { #atdata.atmosphere.AtmosphereIndex }
2 2
3 3 ```python
4 - atmosphere.AtmosphereIndex(client)
4 + atmosphere.AtmosphereIndex(client, *, data_store=None)
5 5 ```
6 6
7 7 ATProto index implementing AbstractIndex protocol.
···
9 9 Wraps SchemaPublisher/Loader and DatasetPublisher/Loader to provide
10 10 a unified interface compatible with LocalIndex.
11 11
12 + Optionally accepts a ``PDSBlobStore`` for writing dataset shards as
13 + ATProto blobs, enabling fully decentralized dataset storage.
14 +
12 15 ## Example {.doc-section .doc-section-example}
13 16
14 17 ::
···
16 19     >>> client = AtmosphereClient()
17 20     >>> client.login("handle.bsky.social", "app-password")
18 21     >>>
22 +     >>> # Without blob storage (external URLs only)
19 23     >>> index = AtmosphereIndex(client)
20 -     >>> schema_ref = index.publish_schema(MySample, version="1.0.0")
24 +     >>>
25 +     >>> # With PDS blob storage
26 +     >>> store = PDSBlobStore(client)
27 +     >>> index = AtmosphereIndex(client, data_store=store)
21 28     >>> entry = index.insert_dataset(dataset, name="my-data")
22 29
23 30 ## Attributes
24 31
25 32 | Name | Description |
26 33 | --- | --- |
34 + | [data_store](#atdata.atmosphere.AtmosphereIndex.data_store) | The PDS blob store for writing shards, or None if not configured. |
27 35 | [datasets](#atdata.atmosphere.AtmosphereIndex.datasets) | Lazily iterate over all dataset entries (AbstractIndex protocol). |
28 36 | [schemas](#atdata.atmosphere.AtmosphereIndex.schemas) | Lazily iterate over all schema records (AbstractIndex protocol). |
29 37
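
How an optional `data_store` might change what `insert_dataset` records can be sketched with toy stand-ins. Everything below is hypothetical: `ToyEntry`, `ToyBlobStore`, `ToyIndex`, the `write_shards` method, and the fake CIDs are illustrations, not atdata's implementation. The only behavior taken from the docs is the branch itself: without a store the entry keeps external URLs, and with one the shards are written as blobs and the entry points at `at://did/blob/cid` URLs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyEntry:
    """Hypothetical index entry holding a name and shard URLs."""

    name: str
    data_urls: list[str]


class ToyBlobStore:
    """Hypothetical store: pretends to upload shards and returns blob URLs."""

    def __init__(self, did: str) -> None:
        self.did = did

    def write_shards(self, shard_urls: list[str]) -> list[str]:
        # A real store would upload bytes and get content-derived CIDs back;
        # a counter stands in for the CID here.
        return [f"at://{self.did}/blob/fake-cid-{i}" for i in range(len(shard_urls))]


class ToyIndex:
    """Hypothetical index with a keyword-only, optional data store."""

    def __init__(self, *, data_store: Optional[ToyBlobStore] = None) -> None:
        self.data_store = data_store

    def insert_dataset(self, shard_urls: list[str], *, name: str) -> ToyEntry:
        if self.data_store is None:
            # No store configured: record the external URLs unchanged.
            urls = list(shard_urls)
        else:
            # Store configured: write shards and record the blob URLs.
            urls = self.data_store.write_shards(shard_urls)
        return ToyEntry(name=name, data_urls=urls)


shards = ["https://example.com/features.tar"]
plain = ToyIndex().insert_dataset(shards, name="my-data")
blobbed = ToyIndex(data_store=ToyBlobStore("did:plc:abc123")).insert_dataset(
    shards, name="my-data"
)
print(plain.data_urls)
print(blobbed.data_urls)
```

Making `data_store` keyword-only, as the updated signature does, keeps existing `AtmosphereIndex(client)` call sites working unchanged.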
+109
docs_src/api/BlobSource.qmd
··· 1 + # BlobSource { #atdata.BlobSource } 2 + 3 + ```python 4 + BlobSource(blob_refs, pds_endpoint=None, _endpoint_cache=dict()) 5 + ``` 6 + 7 + Data source for ATProto PDS blob storage. 8 + 9 + Streams dataset shards stored as blobs on an ATProto Personal Data Server. 10 + Each shard is identified by a blob reference containing the DID and CID. 11 + 12 + This source resolves blob references to HTTP URLs and streams the content 13 + directly, supporting efficient iteration over shards without downloading 14 + everything upfront. 15 + 16 + ## Attributes {.doc-section .doc-section-attributes} 17 + 18 + | Name | Type | Description | 19 + |--------------|----------------------------------------------------------------|----------------------------------------------------------------| 20 + | blob_refs | [list](`list`)\[[dict](`dict`)\[[str](`str`), [str](`str`)\]\] | List of blob reference dicts with 'did' and 'cid' keys. | 21 + | pds_endpoint | [str](`str`) \| None | Optional PDS endpoint URL. If not provided, resolved from DID. | 22 + 23 + ## Example {.doc-section .doc-section-example} 24 + 25 + :: 26 + 27 + >>> source = BlobSource( 28 + ... blob_refs=[ 29 + ... {"did": "did:plc:abc123", "cid": "bafyrei..."}, 30 + ... {"did": "did:plc:abc123", "cid": "bafyrei..."}, 31 + ... ], 32 + ... ) 33 + >>> for shard_id, stream in source.shards: 34 + ... process(stream) 35 + 36 + ## Methods 37 + 38 + | Name | Description | 39 + | --- | --- | 40 + | [from_refs](#atdata.BlobSource.from_refs) | Create BlobSource from blob reference dicts. | 41 + | [list_shards](#atdata.BlobSource.list_shards) | Return list of AT URI-style shard identifiers. | 42 + | [open_shard](#atdata.BlobSource.open_shard) | Open a single shard by its AT URI. | 43 + 44 + ### from_refs { #atdata.BlobSource.from_refs } 45 + 46 + ```python 47 + BlobSource.from_refs(refs, *, pds_endpoint=None) 48 + ``` 49 + 50 + Create BlobSource from blob reference dicts. 
51 + 52 + Accepts blob references in the format returned by upload_blob: 53 + ``{"$type": "blob", "ref": {"$link": "cid"}, ...}`` 54 + 55 + Also accepts simplified format: ``{"did": "...", "cid": "..."}`` 56 + 57 + #### Parameters {.doc-section .doc-section-parameters} 58 + 59 + | Name | Type | Description | Default | 60 + |--------------|----------------------------------|---------------------------------------------|------------| 61 + | refs | [list](`list`)\[[dict](`dict`)\] | List of blob reference dicts. | _required_ | 62 + | pds_endpoint | [str](`str`) \| None | Optional PDS endpoint to use for all blobs. | `None` | 63 + 64 + #### Returns {.doc-section .doc-section-returns} 65 + 66 + | Name | Type | Description | 67 + |--------|----------------|------------------------| 68 + | | \'BlobSource\' | Configured BlobSource. | 69 + 70 + #### Raises {.doc-section .doc-section-raises} 71 + 72 + | Name | Type | Description | 73 + |--------|----------------------------|----------------------------------------| 74 + | | [ValueError](`ValueError`) | If refs is empty or format is invalid. | 75 + 76 + ### list_shards { #atdata.BlobSource.list_shards } 77 + 78 + ```python 79 + BlobSource.list_shards() 80 + ``` 81 + 82 + Return list of AT URI-style shard identifiers. 83 + 84 + ### open_shard { #atdata.BlobSource.open_shard } 85 + 86 + ```python 87 + BlobSource.open_shard(shard_id) 88 + ``` 89 + 90 + Open a single shard by its AT URI. 91 + 92 + #### Parameters {.doc-section .doc-section-parameters} 93 + 94 + | Name | Type | Description | Default | 95 + |----------|--------------|------------------------------------------|------------| 96 + | shard_id | [str](`str`) | AT URI of the shard (at://did/blob/cid). 
| _required_ | 97 + 98 + #### Returns {.doc-section .doc-section-returns} 99 + 100 + | Name | Type | Description | 101 + |--------|---------------------------------------|-----------------------------------------------| 102 + | | [IO](`typing.IO`)\[[bytes](`bytes`)\] | Streaming response body for reading the blob. | 103 + 104 + #### Raises {.doc-section .doc-section-raises} 105 + 106 + | Name | Type | Description | 107 + |--------|----------------------------|--------------------------------------| 108 + | | [KeyError](`KeyError`) | If shard_id is not in list_shards(). | 109 + | | [ValueError](`ValueError`) | If shard_id format is invalid. |
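The shard-identifier contract documented above for `list_shards` and `open_shard` can be sketched offline. This is a minimal illustration under stated assumptions, not atdata's implementation: `shard_id_for`, `parse_shard_id`, and `ShardTable` are hypothetical names, only the simplified `{"did": ..., "cid": ...}` reference format is handled, and no HTTP request is made. It shows how `at://did/blob/cid` identifiers could be built and how the two documented error cases (`ValueError` for a malformed id, `KeyError` for an unknown shard) might be kept apart.

```python
def shard_id_for(ref: dict[str, str]) -> str:
    """Build an AT URI-style shard id from a simplified {'did', 'cid'} ref."""
    try:
        did, cid = ref["did"], ref["cid"]
    except KeyError as exc:
        raise ValueError(f"blob ref is missing key {exc}") from None
    return f"at://{did}/blob/{cid}"


def parse_shard_id(shard_id: str) -> tuple[str, str]:
    """Split 'at://did/blob/cid' into (did, cid); ValueError if malformed."""
    prefix = "at://"
    if not shard_id.startswith(prefix):
        raise ValueError(f"not an AT URI: {shard_id!r}")
    parts = shard_id[len(prefix):].split("/")
    if len(parts) != 3 or parts[1] != "blob" or not parts[0] or not parts[2]:
        raise ValueError(f"expected at://did/blob/cid, got {shard_id!r}")
    return parts[0], parts[2]


class ShardTable:
    """Hypothetical stand-in for the lookup side of a blob-backed source."""

    def __init__(self, refs: list[dict[str, str]]) -> None:
        if not refs:
            raise ValueError("refs must not be empty")
        self._ids = [shard_id_for(ref) for ref in refs]

    def list_shards(self) -> list[str]:
        return list(self._ids)

    def lookup(self, shard_id: str) -> tuple[str, str]:
        # Check the format first so malformed ids raise ValueError,
        # then membership so unknown-but-valid ids raise KeyError.
        did, cid = parse_shard_id(shard_id)
        if shard_id not in self._ids:
            raise KeyError(shard_id)
        return did, cid
```

Checking the format before membership is a design choice made for this sketch; the reference above documents both exceptions but does not say which check runs first.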
+157
docs_src/api/PDSBlobStore.qmd
··· 1 + # PDSBlobStore { #atdata.atmosphere.PDSBlobStore } 2 + 3 + ```python 4 + atmosphere.PDSBlobStore(client) 5 + ``` 6 + 7 + PDS blob store implementing AbstractDataStore protocol. 8 + 9 + Stores dataset shards as ATProto blobs, enabling decentralized dataset 10 + storage on the AT Protocol network. 11 + 12 + Each shard is written to a temporary tar file, then uploaded as a blob 13 + to the user's PDS. The returned URLs are AT URIs that can be resolved 14 + to HTTP URLs for streaming. 15 + 16 + ## Attributes {.doc-section .doc-section-attributes} 17 + 18 + | Name | Type | Description | 19 + |--------|----------------------|------------------------------------------| 20 + | client | \'AtmosphereClient\' | Authenticated AtmosphereClient instance. | 21 + 22 + ## Example {.doc-section .doc-section-example} 23 + 24 + :: 25 + 26 + >>> store = PDSBlobStore(client) 27 + >>> urls = store.write_shards(dataset, prefix="training/v1") 28 + >>> # Returns AT URIs like: 29 + >>> # ['at://did:plc:abc/blob/bafyrei...', ...] 30 + 31 + ## Methods 32 + 33 + | Name | Description | 34 + | --- | --- | 35 + | [create_source](#atdata.atmosphere.PDSBlobStore.create_source) | Create a BlobSource for reading these AT URIs. | 36 + | [read_url](#atdata.atmosphere.PDSBlobStore.read_url) | Resolve an AT URI blob reference to an HTTP URL. | 37 + | [supports_streaming](#atdata.atmosphere.PDSBlobStore.supports_streaming) | PDS blobs support streaming via HTTP. | 38 + | [write_shards](#atdata.atmosphere.PDSBlobStore.write_shards) | Write dataset shards as PDS blobs. | 39 + 40 + ### create_source { #atdata.atmosphere.PDSBlobStore.create_source } 41 + 42 + ```python 43 + atmosphere.PDSBlobStore.create_source(urls) 44 + ``` 45 + 46 + Create a BlobSource for reading these AT URIs. 47 + 48 + This is a convenience method for creating a DataSource that can 49 + stream the blobs written by this store. 
50 + 51 + #### Parameters {.doc-section .doc-section-parameters} 52 + 53 + | Name | Type | Description | Default | 54 + |--------|--------------------------------|--------------------------------------|------------| 55 + | urls | [list](`list`)\[[str](`str`)\] | List of AT URIs from write_shards(). | _required_ | 56 + 57 + #### Returns {.doc-section .doc-section-returns} 58 + 59 + | Name | Type | Description | 60 + |--------|----------------|-------------------------------------------| 61 + | | \'BlobSource\' | BlobSource configured for the given URLs. | 62 + 63 + #### Raises {.doc-section .doc-section-raises} 64 + 65 + | Name | Type | Description | 66 + |--------|----------------------------|--------------------------------| 67 + | | [ValueError](`ValueError`) | If URLs are not valid AT URIs. | 68 + 69 + ### read_url { #atdata.atmosphere.PDSBlobStore.read_url } 70 + 71 + ```python 72 + atmosphere.PDSBlobStore.read_url(url) 73 + ``` 74 + 75 + Resolve an AT URI blob reference to an HTTP URL. 76 + 77 + Transforms ``at://did/blob/cid`` URIs to HTTP URLs that can be 78 + streamed by WebDataset. 79 + 80 + #### Parameters {.doc-section .doc-section-parameters} 81 + 82 + | Name | Type | Description | Default | 83 + |--------|--------------|---------------------------------------------|------------| 84 + | url | [str](`str`) | AT URI in format ``at://{did}/blob/{cid}``. | _required_ | 85 + 86 + #### Returns {.doc-section .doc-section-returns} 87 + 88 + | Name | Type | Description | 89 + |--------|--------------|---------------------------------------------| 90 + | | [str](`str`) | HTTP URL for fetching the blob via PDS API. | 91 + 92 + #### Raises {.doc-section .doc-section-raises} 93 + 94 + | Name | Type | Description | 95 + |--------|----------------------------|-----------------------------------------------------| 96 + | | [ValueError](`ValueError`) | If URL format is invalid or PDS cannot be resolved. 
| 97 + 98 + ### supports_streaming { #atdata.atmosphere.PDSBlobStore.supports_streaming } 99 + 100 + ```python 101 + atmosphere.PDSBlobStore.supports_streaming() 102 + ``` 103 + 104 + PDS blobs support streaming via HTTP. 105 + 106 + #### Returns {.doc-section .doc-section-returns} 107 + 108 + | Name | Type | Description | 109 + |--------|----------------|---------------| 110 + | | [bool](`bool`) | True. | 111 + 112 + ### write_shards { #atdata.atmosphere.PDSBlobStore.write_shards } 113 + 114 + ```python 115 + atmosphere.PDSBlobStore.write_shards( 116 + ds, 117 + *, 118 + prefix, 119 + maxcount=10000, 120 + maxsize=3000000000.0, 121 + **kwargs, 122 + ) 123 + ``` 124 + 125 + Write dataset shards as PDS blobs. 126 + 127 + Creates tar archives from the dataset and uploads each as a blob 128 + to the authenticated user's PDS. 129 + 130 + #### Parameters {.doc-section .doc-section-parameters} 131 + 132 + | Name | Type | Description | Default | 133 + |----------|---------------------|------------------------------------------------------------|----------------| 134 + | ds | \'Dataset\' | The Dataset to write. | _required_ | 135 + | prefix | [str](`str`) | Logical path prefix for naming (used in shard names only). | _required_ | 136 + | maxcount | [int](`int`) | Maximum samples per shard (default: 10000). | `10000` | 137 + | maxsize | [float](`float`) | Maximum shard size in bytes (default: 3GB, PDS limit). | `3000000000.0` | 138 + | **kwargs | [Any](`typing.Any`) | Additional args passed to wds.ShardWriter. 
| `{}` | 139 + 140 + #### Returns {.doc-section .doc-section-returns} 141 + 142 + | Name | Type | Description | 143 + |--------|--------------------------------|---------------------------------------------------| 144 + | | [list](`list`)\[[str](`str`)\] | List of AT URIs for the written blobs, in format: | 145 + | | [list](`list`)\[[str](`str`)\] | ``at://{did}/blob/{cid}`` | 146 + 147 + #### Raises {.doc-section .doc-section-raises} 148 + 149 + | Name | Type | Description | 150 + |--------|--------------------------------|----------------------------| 151 + | | [ValueError](`ValueError`) | If not authenticated. | 152 + | | [RuntimeError](`RuntimeError`) | If no shards were written. | 153 + 154 + #### Note {.doc-section .doc-section-note} 155 + 156 + PDS blobs have size limits (typically 50MB-5GB depending on PDS). 157 + Adjust maxcount/maxsize to stay within limits.
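The `read_url` transformation described above, from `at://{did}/blob/{cid}` to an HTTP URL, can be sketched as a pure function. This is a hedged illustration rather than the library's code: `blob_http_url` is a hypothetical helper, the PDS endpoint is passed in explicitly instead of being resolved from the DID, and the use of the ATProto `com.atproto.sync.getBlob` XRPC method is an assumption about what "fetching the blob via PDS API" means here.

```python
from urllib.parse import urlencode


def blob_http_url(at_uri: str, pds_endpoint: str) -> str:
    """Turn 'at://{did}/blob/{cid}' into a getBlob URL on the given PDS.

    Assumption: the blob is served by the com.atproto.sync.getBlob XRPC
    method, which takes 'did' and 'cid' query parameters.
    """
    prefix = "at://"
    if not at_uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {at_uri!r}")
    parts = at_uri[len(prefix):].split("/")
    if len(parts) != 3 or parts[1] != "blob" or not parts[0] or not parts[2]:
        raise ValueError(f"expected at://did/blob/cid, got {at_uri!r}")
    did, cid = parts[0], parts[2]
    query = urlencode({"did": did, "cid": cid})
    return f"{pds_endpoint.rstrip('/')}/xrpc/com.atproto.sync.getBlob?{query}"
```

For example, `blob_http_url("at://did:plc:abc/blob/bafyexample", "https://pds.example.com")` yields a URL whose query string carries the percent-encoded DID and the CID; `pds.example.com` is a placeholder host.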
+2
docs_src/api/index.qmd
··· 36 36 | --- | --- | 37 37 | [URLSource](URLSource.qmd#atdata.URLSource) | Data source for WebDataset-compatible URLs. | 38 38 | [S3Source](S3Source.qmd#atdata.S3Source) | Data source for S3-compatible storage with explicit credentials. | 39 + | [BlobSource](BlobSource.qmd#atdata.BlobSource) | Data source for ATProto PDS blob storage. | 39 40 40 41 ## Local Storage 41 42 ··· 56 57 | [AtmosphereClient](AtmosphereClient.qmd#atdata.atmosphere.AtmosphereClient) | ATProto client wrapper for atdata operations. | 57 58 | [AtmosphereIndex](AtmosphereIndex.qmd#atdata.atmosphere.AtmosphereIndex) | ATProto index implementing AbstractIndex protocol. | 58 59 | [AtmosphereIndexEntry](AtmosphereIndexEntry.qmd#atdata.atmosphere.AtmosphereIndexEntry) | Entry wrapper for ATProto dataset records implementing IndexEntry protocol. | 60 + | [PDSBlobStore](PDSBlobStore.qmd#atdata.atmosphere.PDSBlobStore) | PDS blob store implementing AbstractDataStore protocol. | 59 61 | [SchemaPublisher](SchemaPublisher.qmd#atdata.atmosphere.SchemaPublisher) | Publishes PackableSample schemas to ATProto. | 60 62 | [SchemaLoader](SchemaLoader.qmd#atdata.atmosphere.SchemaLoader) | Loads PackableSample schemas from ATProto. | 61 63 | [DatasetPublisher](DatasetPublisher.qmd#atdata.atmosphere.DatasetPublisher) | Publishes dataset index records to ATProto. |
+9 -9
docs_src/api/local.Index.qmd
··· 462 462 463 463 #### Parameters {.doc-section .doc-section-parameters} 464 464 465 - | Name | Type | Description | Default | 466 - |-------------|-------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| 467 - | sample_type | [Type](`typing.Type`)\[[Packable](`atdata._protocols.Packable`)\] | The PackableSample subclass to publish. | _required_ | 468 - | version | [str](`str`) \| None | Semantic version string (e.g., '1.0.0'). If None, auto-increments from the latest published version (patch bump), or starts at '1.0.0' if no previous version exists. | `None` | 469 - | description | [str](`str`) \| None | Optional human-readable description. If None, uses the class docstring. | `None` | 465 + | Name | Type | Description | Default | 466 + |-------------|----------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| 467 + | sample_type | [type](`type`) | A Packable type (@packable-decorated or PackableSample subclass). | _required_ | 468 + | version | [str](`str`) \| None | Semantic version string (e.g., '1.0.0'). If None, auto-increments from the latest published version (patch bump), or starts at '1.0.0' if no previous version exists. | `None` | 469 + | description | [str](`str`) \| None | Optional human-readable description. If None, uses the class docstring. | `None` | 470 470 471 471 #### Returns {.doc-section .doc-section-returns} 472 472 ··· 476 476 477 477 #### Raises {.doc-section .doc-section-raises} 478 478 479 - | Name | Type | Description | 480 - |--------|----------------------------|------------------------------------| 481 - | | [ValueError](`ValueError`) | If sample_type is not a dataclass. 
| 482 - | | [TypeError](`TypeError`) | If a field type is not supported. | 479 + | Name | Type | Description | 480 + |--------|----------------------------|--------------------------------------------------------------------------------------------| 481 + | | [ValueError](`ValueError`) | If sample_type is not a dataclass. | 482 + | | [TypeError](`TypeError`) | If sample_type doesn't satisfy the Packable protocol, or if a field type is not supported. |
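The switch from `Type[Packable]` to plain `type` moves the Packable check from the type checker to runtime. A rough sketch of such a check follows, assuming the protocol members are the four listed in the API index (`as_wds`, `packed`, `from_bytes`, `from_data`). `check_sample_type` and the toy `Point` class are hypothetical, and the comma-separated encoding is for illustration only; it is not atdata's serialization.

```python
from dataclasses import dataclass, is_dataclass

# Member names taken from the Packable entries in the API index.
REQUIRED_MEMBERS = ("as_wds", "packed", "from_bytes", "from_data")


def check_sample_type(sample_type: type) -> None:
    """Raise TypeError if Packable members are missing, ValueError if not a dataclass."""
    missing = [name for name in REQUIRED_MEMBERS if not hasattr(sample_type, name)]
    if missing:
        raise TypeError(f"{sample_type.__name__} lacks Packable members: {missing}")
    if not is_dataclass(sample_type):
        raise ValueError(f"{sample_type.__name__} is not a dataclass")


@dataclass
class Point:
    """Toy sample type that satisfies the member check structurally."""

    x: int
    y: int

    @property
    def packed(self) -> bytes:
        return f"{self.x},{self.y}".encode()

    @property
    def as_wds(self) -> dict[str, bytes]:
        return {"point": self.packed}

    @classmethod
    def from_bytes(cls, data: bytes) -> "Point":
        x, y = data.decode().split(",")
        return cls(int(x), int(y))

    @classmethod
    def from_data(cls, data: dict[str, int]) -> "Point":
        return cls(**data)
```

A `hasattr` check is used instead of `issubclass` against a runtime-checkable protocol because `issubclass` rejects protocols with non-method members such as properties. The order of the two checks is a choice made for this sketch.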
+1 -1
docs_src/objects.json
··· 1 - {"project": "atdata", "version": "0.0.9999", "count": 322, "items": [{"name": "atdata.packable", "domain": "py", "role": "function", "priority": "1", "uri": "api/packable.html#atdata.packable", "dispname": "-"}, {"name": "atdata.dataset.packable", "domain": "py", "role": "function", "priority": "1", "uri": "api/packable.html#atdata.packable", "dispname": "atdata.packable"}, {"name": "atdata.PackableSample.as_wds", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.as_wds", "dispname": "-"}, {"name": "atdata.dataset.PackableSample.as_wds", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.as_wds", "dispname": "atdata.PackableSample.as_wds"}, {"name": "atdata.PackableSample.from_bytes", "domain": "py", "role": "function", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.from_bytes", "dispname": "-"}, {"name": "atdata.dataset.PackableSample.from_bytes", "domain": "py", "role": "function", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.from_bytes", "dispname": "atdata.PackableSample.from_bytes"}, {"name": "atdata.PackableSample.from_data", "domain": "py", "role": "function", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.from_data", "dispname": "-"}, {"name": "atdata.dataset.PackableSample.from_data", "domain": "py", "role": "function", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.from_data", "dispname": "atdata.PackableSample.from_data"}, {"name": "atdata.PackableSample.packed", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.packed", "dispname": "-"}, {"name": "atdata.dataset.PackableSample.packed", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.packed", "dispname": "atdata.PackableSample.packed"}, {"name": "atdata.PackableSample", 
"domain": "py", "role": "class", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample", "dispname": "-"}, {"name": "atdata.dataset.PackableSample", "domain": "py", "role": "class", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample", "dispname": "atdata.PackableSample"}, {"name": "atdata.DictSample.as_wds", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.as_wds", "dispname": "-"}, {"name": "atdata.dataset.DictSample.as_wds", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.as_wds", "dispname": "atdata.DictSample.as_wds"}, {"name": "atdata.DictSample.from_bytes", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.from_bytes", "dispname": "-"}, {"name": "atdata.dataset.DictSample.from_bytes", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.from_bytes", "dispname": "atdata.DictSample.from_bytes"}, {"name": "atdata.DictSample.from_data", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.from_data", "dispname": "-"}, {"name": "atdata.dataset.DictSample.from_data", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.from_data", "dispname": "atdata.DictSample.from_data"}, {"name": "atdata.DictSample.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.get", "dispname": "-"}, {"name": "atdata.dataset.DictSample.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.get", "dispname": "atdata.DictSample.get"}, {"name": "atdata.DictSample.items", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.items", "dispname": "-"}, {"name": "atdata.dataset.DictSample.items", "domain": "py", "role": 
"function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.items", "dispname": "atdata.DictSample.items"}, {"name": "atdata.DictSample.keys", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.keys", "dispname": "-"}, {"name": "atdata.dataset.DictSample.keys", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.keys", "dispname": "atdata.DictSample.keys"}, {"name": "atdata.DictSample.packed", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.packed", "dispname": "-"}, {"name": "atdata.dataset.DictSample.packed", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.packed", "dispname": "atdata.DictSample.packed"}, {"name": "atdata.DictSample.to_dict", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.to_dict", "dispname": "-"}, {"name": "atdata.dataset.DictSample.to_dict", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.to_dict", "dispname": "atdata.DictSample.to_dict"}, {"name": "atdata.DictSample.values", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.values", "dispname": "-"}, {"name": "atdata.dataset.DictSample.values", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.values", "dispname": "atdata.DictSample.values"}, {"name": "atdata.DictSample", "domain": "py", "role": "class", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample", "dispname": "-"}, {"name": "atdata.dataset.DictSample", "domain": "py", "role": "class", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample", "dispname": "atdata.DictSample"}, {"name": "atdata.Dataset.as_type", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/Dataset.html#atdata.Dataset.as_type", "dispname": "-"}, {"name": "atdata.dataset.Dataset.as_type", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.as_type", "dispname": "atdata.Dataset.as_type"}, {"name": "atdata.Dataset.batch_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.batch_type", "dispname": "-"}, {"name": "atdata.dataset.Dataset.batch_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.batch_type", "dispname": "atdata.Dataset.batch_type"}, {"name": "atdata.Dataset.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.list_shards", "dispname": "-"}, {"name": "atdata.dataset.Dataset.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.list_shards", "dispname": "atdata.Dataset.list_shards"}, {"name": "atdata.Dataset.metadata", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.metadata", "dispname": "-"}, {"name": "atdata.dataset.Dataset.metadata", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.metadata", "dispname": "atdata.Dataset.metadata"}, {"name": "atdata.Dataset.metadata_url", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.metadata_url", "dispname": "-"}, {"name": "atdata.dataset.Dataset.metadata_url", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.metadata_url", "dispname": "atdata.Dataset.metadata_url"}, {"name": "atdata.Dataset.ordered", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.ordered", "dispname": "-"}, {"name": "atdata.dataset.Dataset.ordered", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.ordered", "dispname": 
"atdata.Dataset.ordered"}, {"name": "atdata.Dataset.sample_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.sample_type", "dispname": "-"}, {"name": "atdata.dataset.Dataset.sample_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.sample_type", "dispname": "atdata.Dataset.sample_type"}, {"name": "atdata.Dataset.shard_list", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.shard_list", "dispname": "-"}, {"name": "atdata.dataset.Dataset.shard_list", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.shard_list", "dispname": "atdata.Dataset.shard_list"}, {"name": "atdata.Dataset.shuffled", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.shuffled", "dispname": "-"}, {"name": "atdata.dataset.Dataset.shuffled", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.shuffled", "dispname": "atdata.Dataset.shuffled"}, {"name": "atdata.Dataset.source", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.source", "dispname": "-"}, {"name": "atdata.dataset.Dataset.source", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.source", "dispname": "atdata.Dataset.source"}, {"name": "atdata.Dataset.to_parquet", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.to_parquet", "dispname": "-"}, {"name": "atdata.dataset.Dataset.to_parquet", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.to_parquet", "dispname": "atdata.Dataset.to_parquet"}, {"name": "atdata.Dataset.wrap", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.wrap", "dispname": "-"}, {"name": "atdata.dataset.Dataset.wrap", "domain": "py", 
"role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.wrap", "dispname": "atdata.Dataset.wrap"}, {"name": "atdata.Dataset.wrap_batch", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.wrap_batch", "dispname": "-"}, {"name": "atdata.dataset.Dataset.wrap_batch", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.wrap_batch", "dispname": "atdata.Dataset.wrap_batch"}, {"name": "atdata.Dataset", "domain": "py", "role": "class", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset", "dispname": "-"}, {"name": "atdata.dataset.Dataset", "domain": "py", "role": "class", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset", "dispname": "atdata.Dataset"}, {"name": "atdata.SampleBatch.sample_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/SampleBatch.html#atdata.SampleBatch.sample_type", "dispname": "-"}, {"name": "atdata.dataset.SampleBatch.sample_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/SampleBatch.html#atdata.SampleBatch.sample_type", "dispname": "atdata.SampleBatch.sample_type"}, {"name": "atdata.SampleBatch", "domain": "py", "role": "class", "priority": "1", "uri": "api/SampleBatch.html#atdata.SampleBatch", "dispname": "-"}, {"name": "atdata.dataset.SampleBatch", "domain": "py", "role": "class", "priority": "1", "uri": "api/SampleBatch.html#atdata.SampleBatch", "dispname": "atdata.SampleBatch"}, {"name": "atdata.Lens.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/Lens.html#atdata.Lens.get", "dispname": "-"}, {"name": "atdata.lens.Lens.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/Lens.html#atdata.Lens.get", "dispname": "atdata.Lens.get"}, {"name": "atdata.Lens.put", "domain": "py", "role": "function", "priority": "1", "uri": "api/Lens.html#atdata.Lens.put", "dispname": "-"}, {"name": "atdata.lens.Lens.put", "domain": "py", "role": "function", 
"priority": "1", "uri": "api/Lens.html#atdata.Lens.put", "dispname": "atdata.Lens.put"}, {"name": "atdata.Lens.putter", "domain": "py", "role": "function", "priority": "1", "uri": "api/Lens.html#atdata.Lens.putter", "dispname": "-"}, {"name": "atdata.lens.Lens.putter", "domain": "py", "role": "function", "priority": "1", "uri": "api/Lens.html#atdata.Lens.putter", "dispname": "atdata.Lens.putter"}, {"name": "atdata.Lens", "domain": "py", "role": "class", "priority": "1", "uri": "api/Lens.html#atdata.Lens", "dispname": "-"}, {"name": "atdata.lens.Lens", "domain": "py", "role": "class", "priority": "1", "uri": "api/Lens.html#atdata.Lens", "dispname": "atdata.Lens"}, {"name": "atdata.lens.Lens.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/lens.html#atdata.lens.Lens.get", "dispname": "-"}, {"name": "atdata.lens.Lens.put", "domain": "py", "role": "function", "priority": "1", "uri": "api/lens.html#atdata.lens.Lens.put", "dispname": "-"}, {"name": "atdata.lens.Lens.putter", "domain": "py", "role": "function", "priority": "1", "uri": "api/lens.html#atdata.lens.Lens.putter", "dispname": "-"}, {"name": "atdata.lens.Lens", "domain": "py", "role": "class", "priority": "1", "uri": "api/lens.html#atdata.lens.Lens", "dispname": "-"}, {"name": "atdata.lens.LensNetwork.register", "domain": "py", "role": "function", "priority": "1", "uri": "api/lens.html#atdata.lens.LensNetwork.register", "dispname": "-"}, {"name": "atdata.lens.LensNetwork.transform", "domain": "py", "role": "function", "priority": "1", "uri": "api/lens.html#atdata.lens.LensNetwork.transform", "dispname": "-"}, {"name": "atdata.lens.LensNetwork", "domain": "py", "role": "class", "priority": "1", "uri": "api/lens.html#atdata.lens.LensNetwork", "dispname": "-"}, {"name": "atdata.lens.lens", "domain": "py", "role": "function", "priority": "1", "uri": "api/lens.html#atdata.lens.lens", "dispname": "-"}, {"name": "atdata.lens", "domain": "py", "role": "module", "priority": "1", "uri": 
"api/lens.html#atdata.lens", "dispname": "-"}, {"name": "atdata.load_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/load_dataset.html#atdata.load_dataset", "dispname": "-"}, {"name": "atdata._hf_api.load_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/load_dataset.html#atdata.load_dataset", "dispname": "atdata.load_dataset"}, {"name": "atdata.DatasetDict.num_shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict.num_shards", "dispname": "-"}, {"name": "atdata._hf_api.DatasetDict.num_shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict.num_shards", "dispname": "atdata.DatasetDict.num_shards"}, {"name": "atdata.DatasetDict.sample_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict.sample_type", "dispname": "-"}, {"name": "atdata._hf_api.DatasetDict.sample_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict.sample_type", "dispname": "atdata.DatasetDict.sample_type"}, {"name": "atdata.DatasetDict.streaming", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict.streaming", "dispname": "-"}, {"name": "atdata._hf_api.DatasetDict.streaming", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict.streaming", "dispname": "atdata.DatasetDict.streaming"}, {"name": "atdata.DatasetDict", "domain": "py", "role": "class", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict", "dispname": "-"}, {"name": "atdata._hf_api.DatasetDict", "domain": "py", "role": "class", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict", "dispname": "atdata.DatasetDict"}, {"name": "atdata.Packable.as_wds", "domain": "py", "role": "attribute", "priority": "1", "uri": 
"api/Packable-protocol.html#atdata.Packable.as_wds", "dispname": "-"}, {"name": "atdata._protocols.Packable.as_wds", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.as_wds", "dispname": "atdata.Packable.as_wds"}, {"name": "atdata.Packable.from_bytes", "domain": "py", "role": "function", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.from_bytes", "dispname": "-"}, {"name": "atdata._protocols.Packable.from_bytes", "domain": "py", "role": "function", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.from_bytes", "dispname": "atdata.Packable.from_bytes"}, {"name": "atdata.Packable.from_data", "domain": "py", "role": "function", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.from_data", "dispname": "-"}, {"name": "atdata._protocols.Packable.from_data", "domain": "py", "role": "function", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.from_data", "dispname": "atdata.Packable.from_data"}, {"name": "atdata.Packable.packed", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.packed", "dispname": "-"}, {"name": "atdata._protocols.Packable.packed", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.packed", "dispname": "atdata.Packable.packed"}, {"name": "atdata.Packable", "domain": "py", "role": "class", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable", "dispname": "-"}, {"name": "atdata._protocols.Packable", "domain": "py", "role": "class", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable", "dispname": "atdata.Packable"}, {"name": "atdata.IndexEntry.data_urls", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.data_urls", "dispname": "-"}, {"name": "atdata._protocols.IndexEntry.data_urls", "domain": "py", "role": "attribute", "priority": "1", 
"uri": "api/IndexEntry.html#atdata.IndexEntry.data_urls", "dispname": "atdata.IndexEntry.data_urls"}, {"name": "atdata.IndexEntry.metadata", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.metadata", "dispname": "-"}, {"name": "atdata._protocols.IndexEntry.metadata", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.metadata", "dispname": "atdata.IndexEntry.metadata"}, {"name": "atdata.IndexEntry.name", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.name", "dispname": "-"}, {"name": "atdata._protocols.IndexEntry.name", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.name", "dispname": "atdata.IndexEntry.name"}, {"name": "atdata.IndexEntry.schema_ref", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.schema_ref", "dispname": "-"}, {"name": "atdata._protocols.IndexEntry.schema_ref", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.schema_ref", "dispname": "atdata.IndexEntry.schema_ref"}, {"name": "atdata.IndexEntry", "domain": "py", "role": "class", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry", "dispname": "-"}, {"name": "atdata._protocols.IndexEntry", "domain": "py", "role": "class", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry", "dispname": "atdata.IndexEntry"}, {"name": "atdata.AbstractIndex.datasets", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.datasets", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.datasets", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.datasets", "dispname": "atdata.AbstractIndex.datasets"}, {"name": "atdata.AbstractIndex.decode_schema", "domain": "py", "role": 
"function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.decode_schema", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.decode_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.decode_schema", "dispname": "atdata.AbstractIndex.decode_schema"}, {"name": "atdata.AbstractIndex.get_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.get_dataset", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.get_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.get_dataset", "dispname": "atdata.AbstractIndex.get_dataset"}, {"name": "atdata.AbstractIndex.get_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.get_schema", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.get_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.get_schema", "dispname": "atdata.AbstractIndex.get_schema"}, {"name": "atdata.AbstractIndex.insert_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.insert_dataset", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.insert_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.insert_dataset", "dispname": "atdata.AbstractIndex.insert_dataset"}, {"name": "atdata.AbstractIndex.list_datasets", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.list_datasets", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.list_datasets", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.list_datasets", "dispname": "atdata.AbstractIndex.list_datasets"}, {"name": 
"atdata.AbstractIndex.list_schemas", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.list_schemas", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.list_schemas", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.list_schemas", "dispname": "atdata.AbstractIndex.list_schemas"}, {"name": "atdata.AbstractIndex.publish_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.publish_schema", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.publish_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.publish_schema", "dispname": "atdata.AbstractIndex.publish_schema"}, {"name": "atdata.AbstractIndex.schemas", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.schemas", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.schemas", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.schemas", "dispname": "atdata.AbstractIndex.schemas"}, {"name": "atdata.AbstractIndex", "domain": "py", "role": "class", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex", "domain": "py", "role": "class", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex", "dispname": "atdata.AbstractIndex"}, {"name": "atdata.AbstractDataStore.read_url", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore.read_url", "dispname": "-"}, {"name": "atdata._protocols.AbstractDataStore.read_url", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore.read_url", "dispname": "atdata.AbstractDataStore.read_url"}, {"name": 
"atdata.AbstractDataStore.supports_streaming", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore.supports_streaming", "dispname": "-"}, {"name": "atdata._protocols.AbstractDataStore.supports_streaming", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore.supports_streaming", "dispname": "atdata.AbstractDataStore.supports_streaming"}, {"name": "atdata.AbstractDataStore.write_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore.write_shards", "dispname": "-"}, {"name": "atdata._protocols.AbstractDataStore.write_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore.write_shards", "dispname": "atdata.AbstractDataStore.write_shards"}, {"name": "atdata.AbstractDataStore", "domain": "py", "role": "class", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore", "dispname": "-"}, {"name": "atdata._protocols.AbstractDataStore", "domain": "py", "role": "class", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore", "dispname": "atdata.AbstractDataStore"}, {"name": "atdata.DataSource.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource.list_shards", "dispname": "-"}, {"name": "atdata._protocols.DataSource.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource.list_shards", "dispname": "atdata.DataSource.list_shards"}, {"name": "atdata.DataSource.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource.open_shard", "dispname": "-"}, {"name": "atdata._protocols.DataSource.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource.open_shard", "dispname": 
"atdata.DataSource.open_shard"}, {"name": "atdata.DataSource.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource.shards", "dispname": "-"}, {"name": "atdata._protocols.DataSource.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource.shards", "dispname": "atdata.DataSource.shards"}, {"name": "atdata.DataSource", "domain": "py", "role": "class", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource", "dispname": "-"}, {"name": "atdata._protocols.DataSource", "domain": "py", "role": "class", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource", "dispname": "atdata.DataSource"}, {"name": "atdata.URLSource.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.list_shards", "dispname": "-"}, {"name": "atdata._sources.URLSource.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.list_shards", "dispname": "atdata.URLSource.list_shards"}, {"name": "atdata.URLSource.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.open_shard", "dispname": "-"}, {"name": "atdata._sources.URLSource.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.open_shard", "dispname": "atdata.URLSource.open_shard"}, {"name": "atdata.URLSource.shard_list", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.shard_list", "dispname": "-"}, {"name": "atdata._sources.URLSource.shard_list", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.shard_list", "dispname": "atdata.URLSource.shard_list"}, {"name": "atdata.URLSource.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.shards", "dispname": "-"}, 
{"name": "atdata._sources.URLSource.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.shards", "dispname": "atdata.URLSource.shards"}, {"name": "atdata.URLSource", "domain": "py", "role": "class", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource", "dispname": "-"}, {"name": "atdata._sources.URLSource", "domain": "py", "role": "class", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource", "dispname": "atdata.URLSource"}, {"name": "atdata.S3Source.from_credentials", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.from_credentials", "dispname": "-"}, {"name": "atdata._sources.S3Source.from_credentials", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.from_credentials", "dispname": "atdata.S3Source.from_credentials"}, {"name": "atdata.S3Source.from_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.from_urls", "dispname": "-"}, {"name": "atdata._sources.S3Source.from_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.from_urls", "dispname": "atdata.S3Source.from_urls"}, {"name": "atdata.S3Source.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.list_shards", "dispname": "-"}, {"name": "atdata._sources.S3Source.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.list_shards", "dispname": "atdata.S3Source.list_shards"}, {"name": "atdata.S3Source.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.open_shard", "dispname": "-"}, {"name": "atdata._sources.S3Source.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.open_shard", "dispname": "atdata.S3Source.open_shard"}, {"name": 
"atdata.S3Source.shard_list", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.shard_list", "dispname": "-"}, {"name": "atdata._sources.S3Source.shard_list", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.shard_list", "dispname": "atdata.S3Source.shard_list"}, {"name": "atdata.S3Source.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.shards", "dispname": "-"}, {"name": "atdata._sources.S3Source.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.shards", "dispname": "atdata.S3Source.shards"}, {"name": "atdata.S3Source", "domain": "py", "role": "class", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source", "dispname": "-"}, {"name": "atdata._sources.S3Source", "domain": "py", "role": "class", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source", "dispname": "atdata.S3Source"}, {"name": "atdata.local.Index.add_entry", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.add_entry", "dispname": "-"}, {"name": "atdata.local.Index.all_entries", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.all_entries", "dispname": "-"}, {"name": "atdata.local.Index.clear_stubs", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.clear_stubs", "dispname": "-"}, {"name": "atdata.local.Index.data_store", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.data_store", "dispname": "-"}, {"name": "atdata.local.Index.datasets", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.datasets", "dispname": "-"}, {"name": "atdata.local.Index.decode_schema", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/local.Index.html#atdata.local.Index.decode_schema", "dispname": "-"}, {"name": "atdata.local.Index.decode_schema_as", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.decode_schema_as", "dispname": "-"}, {"name": "atdata.local.Index.entries", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.entries", "dispname": "-"}, {"name": "atdata.local.Index.get_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.get_dataset", "dispname": "-"}, {"name": "atdata.local.Index.get_entry", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.get_entry", "dispname": "-"}, {"name": "atdata.local.Index.get_entry_by_name", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.get_entry_by_name", "dispname": "-"}, {"name": "atdata.local.Index.get_import_path", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.get_import_path", "dispname": "-"}, {"name": "atdata.local.Index.get_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.get_schema", "dispname": "-"}, {"name": "atdata.local.Index.get_schema_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.get_schema_record", "dispname": "-"}, {"name": "atdata.local.Index.insert_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.insert_dataset", "dispname": "-"}, {"name": "atdata.local.Index.list_datasets", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.list_datasets", "dispname": "-"}, {"name": "atdata.local.Index.list_entries", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/local.Index.html#atdata.local.Index.list_entries", "dispname": "-"}, {"name": "atdata.local.Index.list_schemas", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.list_schemas", "dispname": "-"}, {"name": "atdata.local.Index.load_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.load_schema", "dispname": "-"}, {"name": "atdata.local.Index.publish_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.publish_schema", "dispname": "-"}, {"name": "atdata.local.Index.schemas", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.schemas", "dispname": "-"}, {"name": "atdata.local.Index.stub_dir", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.stub_dir", "dispname": "-"}, {"name": "atdata.local.Index.types", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.types", "dispname": "-"}, {"name": "atdata.local.Index", "domain": "py", "role": "class", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.cid", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.cid", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.data_urls", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.data_urls", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.from_redis", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.from_redis", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.metadata", "domain": "py", "role": "attribute", "priority": "1", "uri": 
"api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.metadata", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.name", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.name", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.sample_kind", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.sample_kind", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.schema_ref", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.schema_ref", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.wds_url", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.wds_url", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.write_to", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.write_to", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry", "domain": "py", "role": "class", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry", "dispname": "-"}, {"name": "atdata.local.S3DataStore.read_url", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.S3DataStore.html#atdata.local.S3DataStore.read_url", "dispname": "-"}, {"name": "atdata.local.S3DataStore.supports_streaming", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.S3DataStore.html#atdata.local.S3DataStore.supports_streaming", "dispname": "-"}, {"name": "atdata.local.S3DataStore.write_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.S3DataStore.html#atdata.local.S3DataStore.write_shards", "dispname": "-"}, {"name": "atdata.local.S3DataStore", "domain": "py", "role": "class", "priority": "1", "uri": 
"api/local.S3DataStore.html#atdata.local.S3DataStore", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereClient.create_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.create_record", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.create_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.create_record", "dispname": "atdata.atmosphere.AtmosphereClient.create_record"}, {"name": "atdata.atmosphere.AtmosphereClient.delete_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.delete_record", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.delete_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.delete_record", "dispname": "atdata.atmosphere.AtmosphereClient.delete_record"}, {"name": "atdata.atmosphere.AtmosphereClient.did", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.did", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.did", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.did", "dispname": "atdata.atmosphere.AtmosphereClient.did"}, {"name": "atdata.atmosphere.AtmosphereClient.export_session", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.export_session", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.export_session", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.export_session", "dispname": "atdata.atmosphere.AtmosphereClient.export_session"}, {"name": 
"atdata.atmosphere.AtmosphereClient.get_blob", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.get_blob", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.get_blob", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.get_blob", "dispname": "atdata.atmosphere.AtmosphereClient.get_blob"}, {"name": "atdata.atmosphere.AtmosphereClient.get_blob_url", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.get_blob_url", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.get_blob_url", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.get_blob_url", "dispname": "atdata.atmosphere.AtmosphereClient.get_blob_url"}, {"name": "atdata.atmosphere.AtmosphereClient.get_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.get_record", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.get_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.get_record", "dispname": "atdata.atmosphere.AtmosphereClient.get_record"}, {"name": "atdata.atmosphere.AtmosphereClient.handle", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.handle", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.handle", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.handle", "dispname": "atdata.atmosphere.AtmosphereClient.handle"}, {"name": "atdata.atmosphere.AtmosphereClient.is_authenticated", "domain": "py", "role": "attribute", "priority": "1", "uri": 
"api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.is_authenticated", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.is_authenticated", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.is_authenticated", "dispname": "atdata.atmosphere.AtmosphereClient.is_authenticated"}, {"name": "atdata.atmosphere.AtmosphereClient.list_datasets", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_datasets", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.list_datasets", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_datasets", "dispname": "atdata.atmosphere.AtmosphereClient.list_datasets"}, {"name": "atdata.atmosphere.AtmosphereClient.list_lenses", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_lenses", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.list_lenses", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_lenses", "dispname": "atdata.atmosphere.AtmosphereClient.list_lenses"}, {"name": "atdata.atmosphere.AtmosphereClient.list_records", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_records", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.list_records", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_records", "dispname": "atdata.atmosphere.AtmosphereClient.list_records"}, {"name": "atdata.atmosphere.AtmosphereClient.list_schemas", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_schemas", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.list_schemas", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_schemas", "dispname": "atdata.atmosphere.AtmosphereClient.list_schemas"}, {"name": "atdata.atmosphere.AtmosphereClient.login", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.login", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.login", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.login", "dispname": "atdata.atmosphere.AtmosphereClient.login"}, {"name": "atdata.atmosphere.AtmosphereClient.login_with_session", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.login_with_session", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.login_with_session", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.login_with_session", "dispname": "atdata.atmosphere.AtmosphereClient.login_with_session"}, {"name": "atdata.atmosphere.AtmosphereClient.put_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.put_record", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.put_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.put_record", "dispname": "atdata.atmosphere.AtmosphereClient.put_record"}, {"name": "atdata.atmosphere.AtmosphereClient.upload_blob", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.upload_blob", 
"dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.upload_blob", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.upload_blob", "dispname": "atdata.atmosphere.AtmosphereClient.upload_blob"}, {"name": "atdata.atmosphere.AtmosphereClient", "domain": "py", "role": "class", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient", "domain": "py", "role": "class", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient", "dispname": "atdata.atmosphere.AtmosphereClient"}, {"name": "atdata.atmosphere.AtmosphereIndex.datasets", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.datasets", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.decode_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.decode_schema", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.get_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.get_dataset", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.get_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.get_schema", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.insert_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.insert_dataset", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.list_datasets", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.list_datasets", "dispname": "-"}, {"name": 
"atdata.atmosphere.AtmosphereIndex.list_schemas", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.list_schemas", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.publish_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.publish_schema", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.schemas", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.schemas", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex", "domain": "py", "role": "class", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndexEntry.data_urls", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndexEntry.html#atdata.atmosphere.AtmosphereIndexEntry.data_urls", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndexEntry.metadata", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndexEntry.html#atdata.atmosphere.AtmosphereIndexEntry.metadata", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndexEntry.name", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndexEntry.html#atdata.atmosphere.AtmosphereIndexEntry.name", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndexEntry.schema_ref", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndexEntry.html#atdata.atmosphere.AtmosphereIndexEntry.schema_ref", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndexEntry.uri", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndexEntry.html#atdata.atmosphere.AtmosphereIndexEntry.uri", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndexEntry", "domain": "py", "role": "class", "priority": "1", 
"uri": "api/AtmosphereIndexEntry.html#atdata.atmosphere.AtmosphereIndexEntry", "dispname": "-"}, {"name": "atdata.atmosphere.SchemaPublisher.publish", "domain": "py", "role": "function", "priority": "1", "uri": "api/SchemaPublisher.html#atdata.atmosphere.SchemaPublisher.publish", "dispname": "-"}, {"name": "atdata.atmosphere.schema.SchemaPublisher.publish", "domain": "py", "role": "function", "priority": "1", "uri": "api/SchemaPublisher.html#atdata.atmosphere.SchemaPublisher.publish", "dispname": "atdata.atmosphere.SchemaPublisher.publish"}, {"name": "atdata.atmosphere.SchemaPublisher", "domain": "py", "role": "class", "priority": "1", "uri": "api/SchemaPublisher.html#atdata.atmosphere.SchemaPublisher", "dispname": "-"}, {"name": "atdata.atmosphere.schema.SchemaPublisher", "domain": "py", "role": "class", "priority": "1", "uri": "api/SchemaPublisher.html#atdata.atmosphere.SchemaPublisher", "dispname": "atdata.atmosphere.SchemaPublisher"}, {"name": "atdata.atmosphere.SchemaLoader.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/SchemaLoader.html#atdata.atmosphere.SchemaLoader.get", "dispname": "-"}, {"name": "atdata.atmosphere.schema.SchemaLoader.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/SchemaLoader.html#atdata.atmosphere.SchemaLoader.get", "dispname": "atdata.atmosphere.SchemaLoader.get"}, {"name": "atdata.atmosphere.SchemaLoader.list_all", "domain": "py", "role": "function", "priority": "1", "uri": "api/SchemaLoader.html#atdata.atmosphere.SchemaLoader.list_all", "dispname": "-"}, {"name": "atdata.atmosphere.schema.SchemaLoader.list_all", "domain": "py", "role": "function", "priority": "1", "uri": "api/SchemaLoader.html#atdata.atmosphere.SchemaLoader.list_all", "dispname": "atdata.atmosphere.SchemaLoader.list_all"}, {"name": "atdata.atmosphere.SchemaLoader", "domain": "py", "role": "class", "priority": "1", "uri": "api/SchemaLoader.html#atdata.atmosphere.SchemaLoader", "dispname": "-"}, {"name": 
"atdata.atmosphere.schema.SchemaLoader", "domain": "py", "role": "class", "priority": "1", "uri": "api/SchemaLoader.html#atdata.atmosphere.SchemaLoader", "dispname": "atdata.atmosphere.SchemaLoader"}, {"name": "atdata.atmosphere.DatasetPublisher.publish", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher.publish", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetPublisher.publish", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher.publish", "dispname": "atdata.atmosphere.DatasetPublisher.publish"}, {"name": "atdata.atmosphere.DatasetPublisher.publish_with_blobs", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher.publish_with_blobs", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetPublisher.publish_with_blobs", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher.publish_with_blobs", "dispname": "atdata.atmosphere.DatasetPublisher.publish_with_blobs"}, {"name": "atdata.atmosphere.DatasetPublisher.publish_with_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher.publish_with_urls", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetPublisher.publish_with_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher.publish_with_urls", "dispname": "atdata.atmosphere.DatasetPublisher.publish_with_urls"}, {"name": "atdata.atmosphere.DatasetPublisher", "domain": "py", "role": "class", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetPublisher", "domain": "py", "role": "class", "priority": "1", "uri": 
"api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher", "dispname": "atdata.atmosphere.DatasetPublisher"}, {"name": "atdata.atmosphere.DatasetLoader.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get", "dispname": "atdata.atmosphere.DatasetLoader.get"}, {"name": "atdata.atmosphere.DatasetLoader.get_blob_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_blob_urls", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.get_blob_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_blob_urls", "dispname": "atdata.atmosphere.DatasetLoader.get_blob_urls"}, {"name": "atdata.atmosphere.DatasetLoader.get_blobs", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_blobs", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.get_blobs", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_blobs", "dispname": "atdata.atmosphere.DatasetLoader.get_blobs"}, {"name": "atdata.atmosphere.DatasetLoader.get_metadata", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_metadata", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.get_metadata", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_metadata", "dispname": "atdata.atmosphere.DatasetLoader.get_metadata"}, {"name": "atdata.atmosphere.DatasetLoader.get_storage_type", "domain": "py", "role": 
"function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_storage_type", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.get_storage_type", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_storage_type", "dispname": "atdata.atmosphere.DatasetLoader.get_storage_type"}, {"name": "atdata.atmosphere.DatasetLoader.get_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_urls", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.get_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_urls", "dispname": "atdata.atmosphere.DatasetLoader.get_urls"}, {"name": "atdata.atmosphere.DatasetLoader.list_all", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.list_all", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.list_all", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.list_all", "dispname": "atdata.atmosphere.DatasetLoader.list_all"}, {"name": "atdata.atmosphere.DatasetLoader.to_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.to_dataset", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.to_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.to_dataset", "dispname": "atdata.atmosphere.DatasetLoader.to_dataset"}, {"name": "atdata.atmosphere.DatasetLoader", "domain": "py", "role": "class", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader", "domain": "py", "role": "class", 
"priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader", "dispname": "atdata.atmosphere.DatasetLoader"}, {"name": "atdata.atmosphere.LensPublisher.publish", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensPublisher.html#atdata.atmosphere.LensPublisher.publish", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensPublisher.publish", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensPublisher.html#atdata.atmosphere.LensPublisher.publish", "dispname": "atdata.atmosphere.LensPublisher.publish"}, {"name": "atdata.atmosphere.LensPublisher.publish_from_lens", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensPublisher.html#atdata.atmosphere.LensPublisher.publish_from_lens", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensPublisher.publish_from_lens", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensPublisher.html#atdata.atmosphere.LensPublisher.publish_from_lens", "dispname": "atdata.atmosphere.LensPublisher.publish_from_lens"}, {"name": "atdata.atmosphere.LensPublisher", "domain": "py", "role": "class", "priority": "1", "uri": "api/LensPublisher.html#atdata.atmosphere.LensPublisher", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensPublisher", "domain": "py", "role": "class", "priority": "1", "uri": "api/LensPublisher.html#atdata.atmosphere.LensPublisher", "dispname": "atdata.atmosphere.LensPublisher"}, {"name": "atdata.atmosphere.LensLoader.find_by_schemas", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader.find_by_schemas", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensLoader.find_by_schemas", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader.find_by_schemas", "dispname": "atdata.atmosphere.LensLoader.find_by_schemas"}, {"name": "atdata.atmosphere.LensLoader.get", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/LensLoader.html#atdata.atmosphere.LensLoader.get", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensLoader.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader.get", "dispname": "atdata.atmosphere.LensLoader.get"}, {"name": "atdata.atmosphere.LensLoader.list_all", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader.list_all", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensLoader.list_all", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader.list_all", "dispname": "atdata.atmosphere.LensLoader.list_all"}, {"name": "atdata.atmosphere.LensLoader", "domain": "py", "role": "class", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensLoader", "domain": "py", "role": "class", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader", "dispname": "atdata.atmosphere.LensLoader"}, {"name": "atdata.atmosphere.AtUri.authority", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.authority", "dispname": "-"}, {"name": "atdata.atmosphere._types.AtUri.authority", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.authority", "dispname": "atdata.atmosphere.AtUri.authority"}, {"name": "atdata.atmosphere.AtUri.collection", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.collection", "dispname": "-"}, {"name": "atdata.atmosphere._types.AtUri.collection", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.collection", "dispname": "atdata.atmosphere.AtUri.collection"}, {"name": "atdata.atmosphere.AtUri.parse", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/AtUri.html#atdata.atmosphere.AtUri.parse", "dispname": "-"}, {"name": "atdata.atmosphere._types.AtUri.parse", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.parse", "dispname": "atdata.atmosphere.AtUri.parse"}, {"name": "atdata.atmosphere.AtUri.rkey", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.rkey", "dispname": "-"}, {"name": "atdata.atmosphere._types.AtUri.rkey", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.rkey", "dispname": "atdata.atmosphere.AtUri.rkey"}, {"name": "atdata.atmosphere.AtUri", "domain": "py", "role": "class", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri", "dispname": "-"}, {"name": "atdata.atmosphere._types.AtUri", "domain": "py", "role": "class", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri", "dispname": "atdata.atmosphere.AtUri"}, {"name": "atdata.promote.promote_to_atmosphere", "domain": "py", "role": "function", "priority": "1", "uri": "api/promote_to_atmosphere.html#atdata.promote.promote_to_atmosphere", "dispname": "-"}]} 1 + {"project": "atdata", "version": "0.0.9999", "count": 347, "items": [{"name": "atdata.packable", "domain": "py", "role": "function", "priority": "1", "uri": "api/packable.html#atdata.packable", "dispname": "-"}, {"name": "atdata.dataset.packable", "domain": "py", "role": "function", "priority": "1", "uri": "api/packable.html#atdata.packable", "dispname": "atdata.packable"}, {"name": "atdata.PackableSample.as_wds", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.as_wds", "dispname": "-"}, {"name": "atdata.dataset.PackableSample.as_wds", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.as_wds", "dispname": "atdata.PackableSample.as_wds"}, {"name": "atdata.PackableSample.from_bytes", "domain": "py", 
"role": "function", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.from_bytes", "dispname": "-"}, {"name": "atdata.dataset.PackableSample.from_bytes", "domain": "py", "role": "function", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.from_bytes", "dispname": "atdata.PackableSample.from_bytes"}, {"name": "atdata.PackableSample.from_data", "domain": "py", "role": "function", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.from_data", "dispname": "-"}, {"name": "atdata.dataset.PackableSample.from_data", "domain": "py", "role": "function", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.from_data", "dispname": "atdata.PackableSample.from_data"}, {"name": "atdata.PackableSample.packed", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.packed", "dispname": "-"}, {"name": "atdata.dataset.PackableSample.packed", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample.packed", "dispname": "atdata.PackableSample.packed"}, {"name": "atdata.PackableSample", "domain": "py", "role": "class", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample", "dispname": "-"}, {"name": "atdata.dataset.PackableSample", "domain": "py", "role": "class", "priority": "1", "uri": "api/PackableSample.html#atdata.PackableSample", "dispname": "atdata.PackableSample"}, {"name": "atdata.DictSample.as_wds", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.as_wds", "dispname": "-"}, {"name": "atdata.dataset.DictSample.as_wds", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.as_wds", "dispname": "atdata.DictSample.as_wds"}, {"name": "atdata.DictSample.from_bytes", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.from_bytes", "dispname": 
"-"}, {"name": "atdata.dataset.DictSample.from_bytes", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.from_bytes", "dispname": "atdata.DictSample.from_bytes"}, {"name": "atdata.DictSample.from_data", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.from_data", "dispname": "-"}, {"name": "atdata.dataset.DictSample.from_data", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.from_data", "dispname": "atdata.DictSample.from_data"}, {"name": "atdata.DictSample.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.get", "dispname": "-"}, {"name": "atdata.dataset.DictSample.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.get", "dispname": "atdata.DictSample.get"}, {"name": "atdata.DictSample.items", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.items", "dispname": "-"}, {"name": "atdata.dataset.DictSample.items", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.items", "dispname": "atdata.DictSample.items"}, {"name": "atdata.DictSample.keys", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.keys", "dispname": "-"}, {"name": "atdata.dataset.DictSample.keys", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.keys", "dispname": "atdata.DictSample.keys"}, {"name": "atdata.DictSample.packed", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.packed", "dispname": "-"}, {"name": "atdata.dataset.DictSample.packed", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.packed", "dispname": "atdata.DictSample.packed"}, {"name": 
"atdata.DictSample.to_dict", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.to_dict", "dispname": "-"}, {"name": "atdata.dataset.DictSample.to_dict", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.to_dict", "dispname": "atdata.DictSample.to_dict"}, {"name": "atdata.DictSample.values", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.values", "dispname": "-"}, {"name": "atdata.dataset.DictSample.values", "domain": "py", "role": "function", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample.values", "dispname": "atdata.DictSample.values"}, {"name": "atdata.DictSample", "domain": "py", "role": "class", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample", "dispname": "-"}, {"name": "atdata.dataset.DictSample", "domain": "py", "role": "class", "priority": "1", "uri": "api/DictSample.html#atdata.DictSample", "dispname": "atdata.DictSample"}, {"name": "atdata.Dataset.as_type", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.as_type", "dispname": "-"}, {"name": "atdata.dataset.Dataset.as_type", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.as_type", "dispname": "atdata.Dataset.as_type"}, {"name": "atdata.Dataset.batch_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.batch_type", "dispname": "-"}, {"name": "atdata.dataset.Dataset.batch_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.batch_type", "dispname": "atdata.Dataset.batch_type"}, {"name": "atdata.Dataset.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.list_shards", "dispname": "-"}, {"name": "atdata.dataset.Dataset.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/Dataset.html#atdata.Dataset.list_shards", "dispname": "atdata.Dataset.list_shards"}, {"name": "atdata.Dataset.metadata", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.metadata", "dispname": "-"}, {"name": "atdata.dataset.Dataset.metadata", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.metadata", "dispname": "atdata.Dataset.metadata"}, {"name": "atdata.Dataset.metadata_url", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.metadata_url", "dispname": "-"}, {"name": "atdata.dataset.Dataset.metadata_url", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.metadata_url", "dispname": "atdata.Dataset.metadata_url"}, {"name": "atdata.Dataset.ordered", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.ordered", "dispname": "-"}, {"name": "atdata.dataset.Dataset.ordered", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.ordered", "dispname": "atdata.Dataset.ordered"}, {"name": "atdata.Dataset.sample_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.sample_type", "dispname": "-"}, {"name": "atdata.dataset.Dataset.sample_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.sample_type", "dispname": "atdata.Dataset.sample_type"}, {"name": "atdata.Dataset.shard_list", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.shard_list", "dispname": "-"}, {"name": "atdata.dataset.Dataset.shard_list", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.shard_list", "dispname": "atdata.Dataset.shard_list"}, {"name": "atdata.Dataset.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": 
"api/Dataset.html#atdata.Dataset.shards", "dispname": "-"}, {"name": "atdata.dataset.Dataset.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.shards", "dispname": "atdata.Dataset.shards"}, {"name": "atdata.Dataset.shuffled", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.shuffled", "dispname": "-"}, {"name": "atdata.dataset.Dataset.shuffled", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.shuffled", "dispname": "atdata.Dataset.shuffled"}, {"name": "atdata.Dataset.source", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.source", "dispname": "-"}, {"name": "atdata.dataset.Dataset.source", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.source", "dispname": "atdata.Dataset.source"}, {"name": "atdata.Dataset.to_parquet", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.to_parquet", "dispname": "-"}, {"name": "atdata.dataset.Dataset.to_parquet", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.to_parquet", "dispname": "atdata.Dataset.to_parquet"}, {"name": "atdata.Dataset.wrap", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.wrap", "dispname": "-"}, {"name": "atdata.dataset.Dataset.wrap", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.wrap", "dispname": "atdata.Dataset.wrap"}, {"name": "atdata.Dataset.wrap_batch", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.wrap_batch", "dispname": "-"}, {"name": "atdata.dataset.Dataset.wrap_batch", "domain": "py", "role": "function", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset.wrap_batch", "dispname": "atdata.Dataset.wrap_batch"}, {"name": "atdata.Dataset", 
"domain": "py", "role": "class", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset", "dispname": "-"}, {"name": "atdata.dataset.Dataset", "domain": "py", "role": "class", "priority": "1", "uri": "api/Dataset.html#atdata.Dataset", "dispname": "atdata.Dataset"}, {"name": "atdata.SampleBatch.sample_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/SampleBatch.html#atdata.SampleBatch.sample_type", "dispname": "-"}, {"name": "atdata.dataset.SampleBatch.sample_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/SampleBatch.html#atdata.SampleBatch.sample_type", "dispname": "atdata.SampleBatch.sample_type"}, {"name": "atdata.SampleBatch", "domain": "py", "role": "class", "priority": "1", "uri": "api/SampleBatch.html#atdata.SampleBatch", "dispname": "-"}, {"name": "atdata.dataset.SampleBatch", "domain": "py", "role": "class", "priority": "1", "uri": "api/SampleBatch.html#atdata.SampleBatch", "dispname": "atdata.SampleBatch"}, {"name": "atdata.Lens.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/Lens.html#atdata.Lens.get", "dispname": "-"}, {"name": "atdata.lens.Lens.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/Lens.html#atdata.Lens.get", "dispname": "atdata.Lens.get"}, {"name": "atdata.Lens.put", "domain": "py", "role": "function", "priority": "1", "uri": "api/Lens.html#atdata.Lens.put", "dispname": "-"}, {"name": "atdata.lens.Lens.put", "domain": "py", "role": "function", "priority": "1", "uri": "api/Lens.html#atdata.Lens.put", "dispname": "atdata.Lens.put"}, {"name": "atdata.Lens.putter", "domain": "py", "role": "function", "priority": "1", "uri": "api/Lens.html#atdata.Lens.putter", "dispname": "-"}, {"name": "atdata.lens.Lens.putter", "domain": "py", "role": "function", "priority": "1", "uri": "api/Lens.html#atdata.Lens.putter", "dispname": "atdata.Lens.putter"}, {"name": "atdata.Lens", "domain": "py", "role": "class", "priority": "1", "uri": "api/Lens.html#atdata.Lens", 
"dispname": "-"}, {"name": "atdata.lens.Lens", "domain": "py", "role": "class", "priority": "1", "uri": "api/Lens.html#atdata.Lens", "dispname": "atdata.Lens"}, {"name": "atdata.lens.Lens.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/lens.html#atdata.lens.Lens.get", "dispname": "-"}, {"name": "atdata.lens.Lens.put", "domain": "py", "role": "function", "priority": "1", "uri": "api/lens.html#atdata.lens.Lens.put", "dispname": "-"}, {"name": "atdata.lens.Lens.putter", "domain": "py", "role": "function", "priority": "1", "uri": "api/lens.html#atdata.lens.Lens.putter", "dispname": "-"}, {"name": "atdata.lens.Lens", "domain": "py", "role": "class", "priority": "1", "uri": "api/lens.html#atdata.lens.Lens", "dispname": "-"}, {"name": "atdata.lens.LensNetwork.register", "domain": "py", "role": "function", "priority": "1", "uri": "api/lens.html#atdata.lens.LensNetwork.register", "dispname": "-"}, {"name": "atdata.lens.LensNetwork.transform", "domain": "py", "role": "function", "priority": "1", "uri": "api/lens.html#atdata.lens.LensNetwork.transform", "dispname": "-"}, {"name": "atdata.lens.LensNetwork", "domain": "py", "role": "class", "priority": "1", "uri": "api/lens.html#atdata.lens.LensNetwork", "dispname": "-"}, {"name": "atdata.lens.lens", "domain": "py", "role": "function", "priority": "1", "uri": "api/lens.html#atdata.lens.lens", "dispname": "-"}, {"name": "atdata.lens", "domain": "py", "role": "module", "priority": "1", "uri": "api/lens.html#atdata.lens", "dispname": "-"}, {"name": "atdata.load_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/load_dataset.html#atdata.load_dataset", "dispname": "-"}, {"name": "atdata._hf_api.load_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/load_dataset.html#atdata.load_dataset", "dispname": "atdata.load_dataset"}, {"name": "atdata.DatasetDict.num_shards", "domain": "py", "role": "attribute", "priority": "1", "uri": 
"api/DatasetDict.html#atdata.DatasetDict.num_shards", "dispname": "-"}, {"name": "atdata._hf_api.DatasetDict.num_shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict.num_shards", "dispname": "atdata.DatasetDict.num_shards"}, {"name": "atdata.DatasetDict.sample_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict.sample_type", "dispname": "-"}, {"name": "atdata._hf_api.DatasetDict.sample_type", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict.sample_type", "dispname": "atdata.DatasetDict.sample_type"}, {"name": "atdata.DatasetDict.streaming", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict.streaming", "dispname": "-"}, {"name": "atdata._hf_api.DatasetDict.streaming", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict.streaming", "dispname": "atdata.DatasetDict.streaming"}, {"name": "atdata.DatasetDict", "domain": "py", "role": "class", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict", "dispname": "-"}, {"name": "atdata._hf_api.DatasetDict", "domain": "py", "role": "class", "priority": "1", "uri": "api/DatasetDict.html#atdata.DatasetDict", "dispname": "atdata.DatasetDict"}, {"name": "atdata.Packable.as_wds", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.as_wds", "dispname": "-"}, {"name": "atdata._protocols.Packable.as_wds", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.as_wds", "dispname": "atdata.Packable.as_wds"}, {"name": "atdata.Packable.from_bytes", "domain": "py", "role": "function", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.from_bytes", "dispname": "-"}, {"name": "atdata._protocols.Packable.from_bytes", "domain": "py", "role": 
"function", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.from_bytes", "dispname": "atdata.Packable.from_bytes"}, {"name": "atdata.Packable.from_data", "domain": "py", "role": "function", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.from_data", "dispname": "-"}, {"name": "atdata._protocols.Packable.from_data", "domain": "py", "role": "function", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.from_data", "dispname": "atdata.Packable.from_data"}, {"name": "atdata.Packable.packed", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.packed", "dispname": "-"}, {"name": "atdata._protocols.Packable.packed", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable.packed", "dispname": "atdata.Packable.packed"}, {"name": "atdata.Packable", "domain": "py", "role": "class", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable", "dispname": "-"}, {"name": "atdata._protocols.Packable", "domain": "py", "role": "class", "priority": "1", "uri": "api/Packable-protocol.html#atdata.Packable", "dispname": "atdata.Packable"}, {"name": "atdata.IndexEntry.data_urls", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.data_urls", "dispname": "-"}, {"name": "atdata._protocols.IndexEntry.data_urls", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.data_urls", "dispname": "atdata.IndexEntry.data_urls"}, {"name": "atdata.IndexEntry.metadata", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.metadata", "dispname": "-"}, {"name": "atdata._protocols.IndexEntry.metadata", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.metadata", "dispname": "atdata.IndexEntry.metadata"}, {"name": "atdata.IndexEntry.name", "domain": 
"py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.name", "dispname": "-"}, {"name": "atdata._protocols.IndexEntry.name", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.name", "dispname": "atdata.IndexEntry.name"}, {"name": "atdata.IndexEntry.schema_ref", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.schema_ref", "dispname": "-"}, {"name": "atdata._protocols.IndexEntry.schema_ref", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry.schema_ref", "dispname": "atdata.IndexEntry.schema_ref"}, {"name": "atdata.IndexEntry", "domain": "py", "role": "class", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry", "dispname": "-"}, {"name": "atdata._protocols.IndexEntry", "domain": "py", "role": "class", "priority": "1", "uri": "api/IndexEntry.html#atdata.IndexEntry", "dispname": "atdata.IndexEntry"}, {"name": "atdata.AbstractIndex.data_store", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.data_store", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.data_store", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.data_store", "dispname": "atdata.AbstractIndex.data_store"}, {"name": "atdata.AbstractIndex.datasets", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.datasets", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.datasets", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.datasets", "dispname": "atdata.AbstractIndex.datasets"}, {"name": "atdata.AbstractIndex.decode_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.decode_schema", "dispname": "-"}, {"name": 
"atdata._protocols.AbstractIndex.decode_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.decode_schema", "dispname": "atdata.AbstractIndex.decode_schema"}, {"name": "atdata.AbstractIndex.get_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.get_dataset", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.get_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.get_dataset", "dispname": "atdata.AbstractIndex.get_dataset"}, {"name": "atdata.AbstractIndex.get_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.get_schema", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.get_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.get_schema", "dispname": "atdata.AbstractIndex.get_schema"}, {"name": "atdata.AbstractIndex.insert_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.insert_dataset", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.insert_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.insert_dataset", "dispname": "atdata.AbstractIndex.insert_dataset"}, {"name": "atdata.AbstractIndex.list_datasets", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.list_datasets", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.list_datasets", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.list_datasets", "dispname": "atdata.AbstractIndex.list_datasets"}, {"name": "atdata.AbstractIndex.list_schemas", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/AbstractIndex.html#atdata.AbstractIndex.list_schemas", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.list_schemas", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.list_schemas", "dispname": "atdata.AbstractIndex.list_schemas"}, {"name": "atdata.AbstractIndex.publish_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.publish_schema", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.publish_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.publish_schema", "dispname": "atdata.AbstractIndex.publish_schema"}, {"name": "atdata.AbstractIndex.schemas", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.schemas", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex.schemas", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex.schemas", "dispname": "atdata.AbstractIndex.schemas"}, {"name": "atdata.AbstractIndex", "domain": "py", "role": "class", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex", "dispname": "-"}, {"name": "atdata._protocols.AbstractIndex", "domain": "py", "role": "class", "priority": "1", "uri": "api/AbstractIndex.html#atdata.AbstractIndex", "dispname": "atdata.AbstractIndex"}, {"name": "atdata.AbstractDataStore.read_url", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore.read_url", "dispname": "-"}, {"name": "atdata._protocols.AbstractDataStore.read_url", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore.read_url", "dispname": "atdata.AbstractDataStore.read_url"}, {"name": "atdata.AbstractDataStore.supports_streaming", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/AbstractDataStore.html#atdata.AbstractDataStore.supports_streaming", "dispname": "-"}, {"name": "atdata._protocols.AbstractDataStore.supports_streaming", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore.supports_streaming", "dispname": "atdata.AbstractDataStore.supports_streaming"}, {"name": "atdata.AbstractDataStore.write_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore.write_shards", "dispname": "-"}, {"name": "atdata._protocols.AbstractDataStore.write_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore.write_shards", "dispname": "atdata.AbstractDataStore.write_shards"}, {"name": "atdata.AbstractDataStore", "domain": "py", "role": "class", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore", "dispname": "-"}, {"name": "atdata._protocols.AbstractDataStore", "domain": "py", "role": "class", "priority": "1", "uri": "api/AbstractDataStore.html#atdata.AbstractDataStore", "dispname": "atdata.AbstractDataStore"}, {"name": "atdata.DataSource.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource.list_shards", "dispname": "-"}, {"name": "atdata._protocols.DataSource.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource.list_shards", "dispname": "atdata.DataSource.list_shards"}, {"name": "atdata.DataSource.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource.open_shard", "dispname": "-"}, {"name": "atdata._protocols.DataSource.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource.open_shard", "dispname": "atdata.DataSource.open_shard"}, {"name": "atdata.DataSource.shards", "domain": "py", "role": "attribute", 
"priority": "1", "uri": "api/DataSource.html#atdata.DataSource.shards", "dispname": "-"}, {"name": "atdata._protocols.DataSource.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource.shards", "dispname": "atdata.DataSource.shards"}, {"name": "atdata.DataSource", "domain": "py", "role": "class", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource", "dispname": "-"}, {"name": "atdata._protocols.DataSource", "domain": "py", "role": "class", "priority": "1", "uri": "api/DataSource.html#atdata.DataSource", "dispname": "atdata.DataSource"}, {"name": "atdata.URLSource.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.list_shards", "dispname": "-"}, {"name": "atdata._sources.URLSource.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.list_shards", "dispname": "atdata.URLSource.list_shards"}, {"name": "atdata.URLSource.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.open_shard", "dispname": "-"}, {"name": "atdata._sources.URLSource.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.open_shard", "dispname": "atdata.URLSource.open_shard"}, {"name": "atdata.URLSource.shard_list", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.shard_list", "dispname": "-"}, {"name": "atdata._sources.URLSource.shard_list", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.shard_list", "dispname": "atdata.URLSource.shard_list"}, {"name": "atdata.URLSource.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource.shards", "dispname": "-"}, {"name": "atdata._sources.URLSource.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": 
"api/URLSource.html#atdata.URLSource.shards", "dispname": "atdata.URLSource.shards"}, {"name": "atdata.URLSource", "domain": "py", "role": "class", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource", "dispname": "-"}, {"name": "atdata._sources.URLSource", "domain": "py", "role": "class", "priority": "1", "uri": "api/URLSource.html#atdata.URLSource", "dispname": "atdata.URLSource"}, {"name": "atdata.S3Source.from_credentials", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.from_credentials", "dispname": "-"}, {"name": "atdata._sources.S3Source.from_credentials", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.from_credentials", "dispname": "atdata.S3Source.from_credentials"}, {"name": "atdata.S3Source.from_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.from_urls", "dispname": "-"}, {"name": "atdata._sources.S3Source.from_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.from_urls", "dispname": "atdata.S3Source.from_urls"}, {"name": "atdata.S3Source.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.list_shards", "dispname": "-"}, {"name": "atdata._sources.S3Source.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.list_shards", "dispname": "atdata.S3Source.list_shards"}, {"name": "atdata.S3Source.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.open_shard", "dispname": "-"}, {"name": "atdata._sources.S3Source.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.open_shard", "dispname": "atdata.S3Source.open_shard"}, {"name": "atdata.S3Source.shard_list", "domain": "py", "role": "attribute", "priority": "1", "uri": 
"api/S3Source.html#atdata.S3Source.shard_list", "dispname": "-"}, {"name": "atdata._sources.S3Source.shard_list", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.shard_list", "dispname": "atdata.S3Source.shard_list"}, {"name": "atdata.S3Source.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.shards", "dispname": "-"}, {"name": "atdata._sources.S3Source.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source.shards", "dispname": "atdata.S3Source.shards"}, {"name": "atdata.S3Source", "domain": "py", "role": "class", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source", "dispname": "-"}, {"name": "atdata._sources.S3Source", "domain": "py", "role": "class", "priority": "1", "uri": "api/S3Source.html#atdata.S3Source", "dispname": "atdata.S3Source"}, {"name": "atdata.BlobSource.from_refs", "domain": "py", "role": "function", "priority": "1", "uri": "api/BlobSource.html#atdata.BlobSource.from_refs", "dispname": "-"}, {"name": "atdata._sources.BlobSource.from_refs", "domain": "py", "role": "function", "priority": "1", "uri": "api/BlobSource.html#atdata.BlobSource.from_refs", "dispname": "atdata.BlobSource.from_refs"}, {"name": "atdata.BlobSource.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/BlobSource.html#atdata.BlobSource.list_shards", "dispname": "-"}, {"name": "atdata._sources.BlobSource.list_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/BlobSource.html#atdata.BlobSource.list_shards", "dispname": "atdata.BlobSource.list_shards"}, {"name": "atdata.BlobSource.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": "api/BlobSource.html#atdata.BlobSource.open_shard", "dispname": "-"}, {"name": "atdata._sources.BlobSource.open_shard", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/BlobSource.html#atdata.BlobSource.open_shard", "dispname": "atdata.BlobSource.open_shard"}, {"name": "atdata.BlobSource.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/BlobSource.html#atdata.BlobSource.shards", "dispname": "-"}, {"name": "atdata._sources.BlobSource.shards", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/BlobSource.html#atdata.BlobSource.shards", "dispname": "atdata.BlobSource.shards"}, {"name": "atdata.BlobSource", "domain": "py", "role": "class", "priority": "1", "uri": "api/BlobSource.html#atdata.BlobSource", "dispname": "-"}, {"name": "atdata._sources.BlobSource", "domain": "py", "role": "class", "priority": "1", "uri": "api/BlobSource.html#atdata.BlobSource", "dispname": "atdata.BlobSource"}, {"name": "atdata.local.Index.add_entry", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.add_entry", "dispname": "-"}, {"name": "atdata.local.Index.all_entries", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.all_entries", "dispname": "-"}, {"name": "atdata.local.Index.clear_stubs", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.clear_stubs", "dispname": "-"}, {"name": "atdata.local.Index.data_store", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.data_store", "dispname": "-"}, {"name": "atdata.local.Index.datasets", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.datasets", "dispname": "-"}, {"name": "atdata.local.Index.decode_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.decode_schema", "dispname": "-"}, {"name": "atdata.local.Index.decode_schema_as", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.decode_schema_as", 
"dispname": "-"}, {"name": "atdata.local.Index.entries", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.entries", "dispname": "-"}, {"name": "atdata.local.Index.get_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.get_dataset", "dispname": "-"}, {"name": "atdata.local.Index.get_entry", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.get_entry", "dispname": "-"}, {"name": "atdata.local.Index.get_entry_by_name", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.get_entry_by_name", "dispname": "-"}, {"name": "atdata.local.Index.get_import_path", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.get_import_path", "dispname": "-"}, {"name": "atdata.local.Index.get_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.get_schema", "dispname": "-"}, {"name": "atdata.local.Index.get_schema_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.get_schema_record", "dispname": "-"}, {"name": "atdata.local.Index.insert_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.insert_dataset", "dispname": "-"}, {"name": "atdata.local.Index.list_datasets", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.list_datasets", "dispname": "-"}, {"name": "atdata.local.Index.list_entries", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.list_entries", "dispname": "-"}, {"name": "atdata.local.Index.list_schemas", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.list_schemas", "dispname": "-"}, {"name": 
"atdata.local.Index.load_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.load_schema", "dispname": "-"}, {"name": "atdata.local.Index.publish_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.publish_schema", "dispname": "-"}, {"name": "atdata.local.Index.schemas", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.schemas", "dispname": "-"}, {"name": "atdata.local.Index.stub_dir", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.stub_dir", "dispname": "-"}, {"name": "atdata.local.Index.types", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index.types", "dispname": "-"}, {"name": "atdata.local.Index", "domain": "py", "role": "class", "priority": "1", "uri": "api/local.Index.html#atdata.local.Index", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.cid", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.cid", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.data_urls", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.data_urls", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.from_redis", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.from_redis", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.metadata", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.metadata", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.name", "domain": "py", "role": "attribute", "priority": "1", "uri": 
"api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.name", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.sample_kind", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.sample_kind", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.schema_ref", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.schema_ref", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.wds_url", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.wds_url", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry.write_to", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry.write_to", "dispname": "-"}, {"name": "atdata.local.LocalDatasetEntry", "domain": "py", "role": "class", "priority": "1", "uri": "api/local.LocalDatasetEntry.html#atdata.local.LocalDatasetEntry", "dispname": "-"}, {"name": "atdata.local.S3DataStore.read_url", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.S3DataStore.html#atdata.local.S3DataStore.read_url", "dispname": "-"}, {"name": "atdata.local.S3DataStore.supports_streaming", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.S3DataStore.html#atdata.local.S3DataStore.supports_streaming", "dispname": "-"}, {"name": "atdata.local.S3DataStore.write_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/local.S3DataStore.html#atdata.local.S3DataStore.write_shards", "dispname": "-"}, {"name": "atdata.local.S3DataStore", "domain": "py", "role": "class", "priority": "1", "uri": "api/local.S3DataStore.html#atdata.local.S3DataStore", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereClient.create_record", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.create_record", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.create_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.create_record", "dispname": "atdata.atmosphere.AtmosphereClient.create_record"}, {"name": "atdata.atmosphere.AtmosphereClient.delete_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.delete_record", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.delete_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.delete_record", "dispname": "atdata.atmosphere.AtmosphereClient.delete_record"}, {"name": "atdata.atmosphere.AtmosphereClient.did", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.did", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.did", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.did", "dispname": "atdata.atmosphere.AtmosphereClient.did"}, {"name": "atdata.atmosphere.AtmosphereClient.export_session", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.export_session", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.export_session", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.export_session", "dispname": "atdata.atmosphere.AtmosphereClient.export_session"}, {"name": "atdata.atmosphere.AtmosphereClient.get_blob", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.get_blob", "dispname": "-"}, 
{"name": "atdata.atmosphere.client.AtmosphereClient.get_blob", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.get_blob", "dispname": "atdata.atmosphere.AtmosphereClient.get_blob"}, {"name": "atdata.atmosphere.AtmosphereClient.get_blob_url", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.get_blob_url", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.get_blob_url", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.get_blob_url", "dispname": "atdata.atmosphere.AtmosphereClient.get_blob_url"}, {"name": "atdata.atmosphere.AtmosphereClient.get_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.get_record", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.get_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.get_record", "dispname": "atdata.atmosphere.AtmosphereClient.get_record"}, {"name": "atdata.atmosphere.AtmosphereClient.handle", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.handle", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.handle", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.handle", "dispname": "atdata.atmosphere.AtmosphereClient.handle"}, {"name": "atdata.atmosphere.AtmosphereClient.is_authenticated", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.is_authenticated", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.is_authenticated", "domain": "py", "role": "attribute", 
"priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.is_authenticated", "dispname": "atdata.atmosphere.AtmosphereClient.is_authenticated"}, {"name": "atdata.atmosphere.AtmosphereClient.list_datasets", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_datasets", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.list_datasets", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_datasets", "dispname": "atdata.atmosphere.AtmosphereClient.list_datasets"}, {"name": "atdata.atmosphere.AtmosphereClient.list_lenses", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_lenses", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.list_lenses", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_lenses", "dispname": "atdata.atmosphere.AtmosphereClient.list_lenses"}, {"name": "atdata.atmosphere.AtmosphereClient.list_records", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_records", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.list_records", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_records", "dispname": "atdata.atmosphere.AtmosphereClient.list_records"}, {"name": "atdata.atmosphere.AtmosphereClient.list_schemas", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_schemas", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.list_schemas", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.list_schemas", "dispname": "atdata.atmosphere.AtmosphereClient.list_schemas"}, {"name": "atdata.atmosphere.AtmosphereClient.login", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.login", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.login", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.login", "dispname": "atdata.atmosphere.AtmosphereClient.login"}, {"name": "atdata.atmosphere.AtmosphereClient.login_with_session", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.login_with_session", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.login_with_session", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.login_with_session", "dispname": "atdata.atmosphere.AtmosphereClient.login_with_session"}, {"name": "atdata.atmosphere.AtmosphereClient.put_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.put_record", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.put_record", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.put_record", "dispname": "atdata.atmosphere.AtmosphereClient.put_record"}, {"name": "atdata.atmosphere.AtmosphereClient.upload_blob", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.upload_blob", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient.upload_blob", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient.upload_blob", 
"dispname": "atdata.atmosphere.AtmosphereClient.upload_blob"}, {"name": "atdata.atmosphere.AtmosphereClient", "domain": "py", "role": "class", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient", "dispname": "-"}, {"name": "atdata.atmosphere.client.AtmosphereClient", "domain": "py", "role": "class", "priority": "1", "uri": "api/AtmosphereClient.html#atdata.atmosphere.AtmosphereClient", "dispname": "atdata.atmosphere.AtmosphereClient"}, {"name": "atdata.atmosphere.AtmosphereIndex.data_store", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.data_store", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.datasets", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.datasets", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.decode_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.decode_schema", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.get_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.get_dataset", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.get_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.get_schema", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.insert_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.insert_dataset", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.list_datasets", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.list_datasets", "dispname": "-"}, {"name": 
"atdata.atmosphere.AtmosphereIndex.list_schemas", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.list_schemas", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.publish_schema", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.publish_schema", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex.schemas", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex.schemas", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndex", "domain": "py", "role": "class", "priority": "1", "uri": "api/AtmosphereIndex.html#atdata.atmosphere.AtmosphereIndex", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndexEntry.data_urls", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndexEntry.html#atdata.atmosphere.AtmosphereIndexEntry.data_urls", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndexEntry.metadata", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndexEntry.html#atdata.atmosphere.AtmosphereIndexEntry.metadata", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndexEntry.name", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndexEntry.html#atdata.atmosphere.AtmosphereIndexEntry.name", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndexEntry.schema_ref", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndexEntry.html#atdata.atmosphere.AtmosphereIndexEntry.schema_ref", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndexEntry.uri", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtmosphereIndexEntry.html#atdata.atmosphere.AtmosphereIndexEntry.uri", "dispname": "-"}, {"name": "atdata.atmosphere.AtmosphereIndexEntry", "domain": "py", "role": "class", "priority": "1", 
"uri": "api/AtmosphereIndexEntry.html#atdata.atmosphere.AtmosphereIndexEntry", "dispname": "-"}, {"name": "atdata.atmosphere.PDSBlobStore.create_source", "domain": "py", "role": "function", "priority": "1", "uri": "api/PDSBlobStore.html#atdata.atmosphere.PDSBlobStore.create_source", "dispname": "-"}, {"name": "atdata.atmosphere.store.PDSBlobStore.create_source", "domain": "py", "role": "function", "priority": "1", "uri": "api/PDSBlobStore.html#atdata.atmosphere.PDSBlobStore.create_source", "dispname": "atdata.atmosphere.PDSBlobStore.create_source"}, {"name": "atdata.atmosphere.PDSBlobStore.read_url", "domain": "py", "role": "function", "priority": "1", "uri": "api/PDSBlobStore.html#atdata.atmosphere.PDSBlobStore.read_url", "dispname": "-"}, {"name": "atdata.atmosphere.store.PDSBlobStore.read_url", "domain": "py", "role": "function", "priority": "1", "uri": "api/PDSBlobStore.html#atdata.atmosphere.PDSBlobStore.read_url", "dispname": "atdata.atmosphere.PDSBlobStore.read_url"}, {"name": "atdata.atmosphere.PDSBlobStore.supports_streaming", "domain": "py", "role": "function", "priority": "1", "uri": "api/PDSBlobStore.html#atdata.atmosphere.PDSBlobStore.supports_streaming", "dispname": "-"}, {"name": "atdata.atmosphere.store.PDSBlobStore.supports_streaming", "domain": "py", "role": "function", "priority": "1", "uri": "api/PDSBlobStore.html#atdata.atmosphere.PDSBlobStore.supports_streaming", "dispname": "atdata.atmosphere.PDSBlobStore.supports_streaming"}, {"name": "atdata.atmosphere.PDSBlobStore.write_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/PDSBlobStore.html#atdata.atmosphere.PDSBlobStore.write_shards", "dispname": "-"}, {"name": "atdata.atmosphere.store.PDSBlobStore.write_shards", "domain": "py", "role": "function", "priority": "1", "uri": "api/PDSBlobStore.html#atdata.atmosphere.PDSBlobStore.write_shards", "dispname": "atdata.atmosphere.PDSBlobStore.write_shards"}, {"name": "atdata.atmosphere.PDSBlobStore", "domain": "py", "role": 
"class", "priority": "1", "uri": "api/PDSBlobStore.html#atdata.atmosphere.PDSBlobStore", "dispname": "-"}, {"name": "atdata.atmosphere.store.PDSBlobStore", "domain": "py", "role": "class", "priority": "1", "uri": "api/PDSBlobStore.html#atdata.atmosphere.PDSBlobStore", "dispname": "atdata.atmosphere.PDSBlobStore"}, {"name": "atdata.atmosphere.SchemaPublisher.publish", "domain": "py", "role": "function", "priority": "1", "uri": "api/SchemaPublisher.html#atdata.atmosphere.SchemaPublisher.publish", "dispname": "-"}, {"name": "atdata.atmosphere.schema.SchemaPublisher.publish", "domain": "py", "role": "function", "priority": "1", "uri": "api/SchemaPublisher.html#atdata.atmosphere.SchemaPublisher.publish", "dispname": "atdata.atmosphere.SchemaPublisher.publish"}, {"name": "atdata.atmosphere.SchemaPublisher", "domain": "py", "role": "class", "priority": "1", "uri": "api/SchemaPublisher.html#atdata.atmosphere.SchemaPublisher", "dispname": "-"}, {"name": "atdata.atmosphere.schema.SchemaPublisher", "domain": "py", "role": "class", "priority": "1", "uri": "api/SchemaPublisher.html#atdata.atmosphere.SchemaPublisher", "dispname": "atdata.atmosphere.SchemaPublisher"}, {"name": "atdata.atmosphere.SchemaLoader.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/SchemaLoader.html#atdata.atmosphere.SchemaLoader.get", "dispname": "-"}, {"name": "atdata.atmosphere.schema.SchemaLoader.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/SchemaLoader.html#atdata.atmosphere.SchemaLoader.get", "dispname": "atdata.atmosphere.SchemaLoader.get"}, {"name": "atdata.atmosphere.SchemaLoader.list_all", "domain": "py", "role": "function", "priority": "1", "uri": "api/SchemaLoader.html#atdata.atmosphere.SchemaLoader.list_all", "dispname": "-"}, {"name": "atdata.atmosphere.schema.SchemaLoader.list_all", "domain": "py", "role": "function", "priority": "1", "uri": "api/SchemaLoader.html#atdata.atmosphere.SchemaLoader.list_all", "dispname": 
"atdata.atmosphere.SchemaLoader.list_all"}, {"name": "atdata.atmosphere.SchemaLoader", "domain": "py", "role": "class", "priority": "1", "uri": "api/SchemaLoader.html#atdata.atmosphere.SchemaLoader", "dispname": "-"}, {"name": "atdata.atmosphere.schema.SchemaLoader", "domain": "py", "role": "class", "priority": "1", "uri": "api/SchemaLoader.html#atdata.atmosphere.SchemaLoader", "dispname": "atdata.atmosphere.SchemaLoader"}, {"name": "atdata.atmosphere.DatasetPublisher.publish", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher.publish", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetPublisher.publish", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher.publish", "dispname": "atdata.atmosphere.DatasetPublisher.publish"}, {"name": "atdata.atmosphere.DatasetPublisher.publish_with_blobs", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher.publish_with_blobs", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetPublisher.publish_with_blobs", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher.publish_with_blobs", "dispname": "atdata.atmosphere.DatasetPublisher.publish_with_blobs"}, {"name": "atdata.atmosphere.DatasetPublisher.publish_with_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher.publish_with_urls", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetPublisher.publish_with_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher.publish_with_urls", "dispname": "atdata.atmosphere.DatasetPublisher.publish_with_urls"}, {"name": "atdata.atmosphere.DatasetPublisher", "domain": "py", "role": "class", 
"priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetPublisher", "domain": "py", "role": "class", "priority": "1", "uri": "api/DatasetPublisher.html#atdata.atmosphere.DatasetPublisher", "dispname": "atdata.atmosphere.DatasetPublisher"}, {"name": "atdata.atmosphere.DatasetLoader.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get", "dispname": "atdata.atmosphere.DatasetLoader.get"}, {"name": "atdata.atmosphere.DatasetLoader.get_blob_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_blob_urls", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.get_blob_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_blob_urls", "dispname": "atdata.atmosphere.DatasetLoader.get_blob_urls"}, {"name": "atdata.atmosphere.DatasetLoader.get_blobs", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_blobs", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.get_blobs", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_blobs", "dispname": "atdata.atmosphere.DatasetLoader.get_blobs"}, {"name": "atdata.atmosphere.DatasetLoader.get_metadata", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_metadata", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.get_metadata", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_metadata", "dispname": "atdata.atmosphere.DatasetLoader.get_metadata"}, {"name": "atdata.atmosphere.DatasetLoader.get_storage_type", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_storage_type", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.get_storage_type", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_storage_type", "dispname": "atdata.atmosphere.DatasetLoader.get_storage_type"}, {"name": "atdata.atmosphere.DatasetLoader.get_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_urls", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.get_urls", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.get_urls", "dispname": "atdata.atmosphere.DatasetLoader.get_urls"}, {"name": "atdata.atmosphere.DatasetLoader.list_all", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.list_all", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.list_all", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.list_all", "dispname": "atdata.atmosphere.DatasetLoader.list_all"}, {"name": "atdata.atmosphere.DatasetLoader.to_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.to_dataset", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader.to_dataset", "domain": "py", "role": "function", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader.to_dataset", "dispname": "atdata.atmosphere.DatasetLoader.to_dataset"}, {"name": "atdata.atmosphere.DatasetLoader", 
"domain": "py", "role": "class", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader", "dispname": "-"}, {"name": "atdata.atmosphere.records.DatasetLoader", "domain": "py", "role": "class", "priority": "1", "uri": "api/DatasetLoader.html#atdata.atmosphere.DatasetLoader", "dispname": "atdata.atmosphere.DatasetLoader"}, {"name": "atdata.atmosphere.LensPublisher.publish", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensPublisher.html#atdata.atmosphere.LensPublisher.publish", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensPublisher.publish", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensPublisher.html#atdata.atmosphere.LensPublisher.publish", "dispname": "atdata.atmosphere.LensPublisher.publish"}, {"name": "atdata.atmosphere.LensPublisher.publish_from_lens", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensPublisher.html#atdata.atmosphere.LensPublisher.publish_from_lens", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensPublisher.publish_from_lens", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensPublisher.html#atdata.atmosphere.LensPublisher.publish_from_lens", "dispname": "atdata.atmosphere.LensPublisher.publish_from_lens"}, {"name": "atdata.atmosphere.LensPublisher", "domain": "py", "role": "class", "priority": "1", "uri": "api/LensPublisher.html#atdata.atmosphere.LensPublisher", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensPublisher", "domain": "py", "role": "class", "priority": "1", "uri": "api/LensPublisher.html#atdata.atmosphere.LensPublisher", "dispname": "atdata.atmosphere.LensPublisher"}, {"name": "atdata.atmosphere.LensLoader.find_by_schemas", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader.find_by_schemas", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensLoader.find_by_schemas", "domain": "py", "role": "function", "priority": "1", "uri": 
"api/LensLoader.html#atdata.atmosphere.LensLoader.find_by_schemas", "dispname": "atdata.atmosphere.LensLoader.find_by_schemas"}, {"name": "atdata.atmosphere.LensLoader.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader.get", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensLoader.get", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader.get", "dispname": "atdata.atmosphere.LensLoader.get"}, {"name": "atdata.atmosphere.LensLoader.list_all", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader.list_all", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensLoader.list_all", "domain": "py", "role": "function", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader.list_all", "dispname": "atdata.atmosphere.LensLoader.list_all"}, {"name": "atdata.atmosphere.LensLoader", "domain": "py", "role": "class", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader", "dispname": "-"}, {"name": "atdata.atmosphere.lens.LensLoader", "domain": "py", "role": "class", "priority": "1", "uri": "api/LensLoader.html#atdata.atmosphere.LensLoader", "dispname": "atdata.atmosphere.LensLoader"}, {"name": "atdata.atmosphere.AtUri.authority", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.authority", "dispname": "-"}, {"name": "atdata.atmosphere._types.AtUri.authority", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.authority", "dispname": "atdata.atmosphere.AtUri.authority"}, {"name": "atdata.atmosphere.AtUri.collection", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.collection", "dispname": "-"}, {"name": "atdata.atmosphere._types.AtUri.collection", "domain": "py", "role": "attribute", "priority": "1", "uri": 
"api/AtUri.html#atdata.atmosphere.AtUri.collection", "dispname": "atdata.atmosphere.AtUri.collection"}, {"name": "atdata.atmosphere.AtUri.parse", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.parse", "dispname": "-"}, {"name": "atdata.atmosphere._types.AtUri.parse", "domain": "py", "role": "function", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.parse", "dispname": "atdata.atmosphere.AtUri.parse"}, {"name": "atdata.atmosphere.AtUri.rkey", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.rkey", "dispname": "-"}, {"name": "atdata.atmosphere._types.AtUri.rkey", "domain": "py", "role": "attribute", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri.rkey", "dispname": "atdata.atmosphere.AtUri.rkey"}, {"name": "atdata.atmosphere.AtUri", "domain": "py", "role": "class", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri", "dispname": "-"}, {"name": "atdata.atmosphere._types.AtUri", "domain": "py", "role": "class", "priority": "1", "uri": "api/AtUri.html#atdata.atmosphere.AtUri", "dispname": "atdata.atmosphere.AtUri"}, {"name": "atdata.promote.promote_to_atmosphere", "domain": "py", "role": "function", "priority": "1", "uri": "api/promote_to_atmosphere.html#atdata.promote.promote_to_atmosphere", "dispname": "-"}]}
+134 -11
docs_src/reference/atmosphere.qmd
···
 client = AtmosphereClient(base_url="https://pds.example.com")
 ```

+## PDSBlobStore
+
+Store dataset shards as ATProto blobs for fully decentralized storage:
+
+```{python}
+#| eval: false
+from atdata.atmosphere import AtmosphereClient, PDSBlobStore
+
+client = AtmosphereClient()
+client.login("handle.bsky.social", "app-password")
+
+store = PDSBlobStore(client)
+
+# Write shards as blobs
+urls = store.write_shards(dataset, prefix="my-data/v1")
+# Returns: ['at://did:plc:.../blob/bafyrei...', ...]
+
+# Transform AT URIs to HTTP URLs for reading
+http_url = store.read_url(urls[0])
+# Returns: 'https://pds.example.com/xrpc/com.atproto.sync.getBlob?...'
+
+# Create a BlobSource for streaming
+source = store.create_source(urls)
+ds = atdata.Dataset[MySample](source)
+```
+
+### Size Limits
+
+PDS blobs typically have size limits (often 50MB-5GB depending on the PDS). Use `maxcount` and `maxsize` parameters to control shard sizes:
+
+```{python}
+#| eval: false
+urls = store.write_shards(
+    dataset,
+    prefix="large-data/v1",
+    maxcount=5000,  # Max 5000 samples per shard
+    maxsize=50e6,   # Max 50MB per shard
+)
+```
+
+## BlobSource
+
+Read datasets stored as PDS blobs:
+
+```{python}
+#| eval: false
+from atdata import BlobSource
+
+# From blob references
+source = BlobSource.from_refs([
+    {"did": "did:plc:abc123", "cid": "bafyrei111"},
+    {"did": "did:plc:abc123", "cid": "bafyrei222"},
+])
+
+# Or from PDSBlobStore
+source = store.create_source(urls)
+
+# Use with Dataset
+ds = atdata.Dataset[MySample](source)
+for batch in ds.ordered(batch_size=32):
+    process(batch)
+```
+
 ## AtmosphereIndex

 The unified interface for ATProto operations, implementing the AbstractIndex protocol:

 ```{python}
 #| eval: false
-from atdata.atmosphere import AtmosphereClient, AtmosphereIndex
+from atdata.atmosphere import AtmosphereClient, AtmosphereIndex, PDSBlobStore

 client = AtmosphereClient()
 client.login("handle.bsky.social", "app-password")

+# Without blob storage (use external URLs)
 index = AtmosphereIndex(client)
+
+# With PDS blob storage (recommended for full decentralization)
+store = PDSBlobStore(client)
+index = AtmosphereIndex(client, data_store=store)
 ```

 ### Publishing Schemas
···

 #### Blob Storage {#blob-storage}

-For smaller datasets (up to ~50MB per shard), you can store data directly in ATProto blobs instead of external URLs:
+There are two approaches to storing data as ATProto blobs:
+
+**Approach 1: PDSBlobStore (Recommended)**
+
+Use `PDSBlobStore` with `AtmosphereIndex` for automatic shard management:
+
+```{python}
+#| eval: false
+from atdata.atmosphere import PDSBlobStore, AtmosphereIndex
+
+store = PDSBlobStore(client)
+index = AtmosphereIndex(client, data_store=store)
+
+# Dataset shards are automatically uploaded as blobs
+entry = index.insert_dataset(
+    dataset,
+    name="my-dataset",
+    schema_ref=schema_uri,
+)
+
+# Later: load using BlobSource
+source = store.create_source(entry.data_urls)
+ds = atdata.Dataset[MySample](source)
+```
+
+**Approach 2: Manual Blob Publishing**
+
+For more control, use `DatasetPublisher.publish_with_blobs()` directly:

 ```{python}
 #| eval: false
···
 )
 ```

-To load datasets with blob storage:
+**Loading Blob-Stored Datasets**

 ```{python}
 #| eval: false
 from atdata.atmosphere import DatasetLoader
+from atdata import BlobSource

 loader = DatasetLoader(client)

···
 storage_type = loader.get_storage_type(uri)  # "external" or "blobs"

 if storage_type == "blobs":
-    # Get blob URLs for direct access
+    # Get blob URLs and create BlobSource
     blob_urls = loader.get_blob_urls(uri)
+    # Parse to blob refs for BlobSource
+    # Or use loader.to_dataset() which handles this automatically

 # to_dataset() handles both storage types automatically
 dataset = loader.to_dataset(uri, MySample)
···

 ## Complete Example

+This example shows the full workflow using `PDSBlobStore` for decentralized storage:
+
 ```{python}
 #| eval: false
 import numpy as np
 from numpy.typing import NDArray
 import atdata
-from atdata.atmosphere import AtmosphereClient, AtmosphereIndex
+from atdata.atmosphere import AtmosphereClient, AtmosphereIndex, PDSBlobStore
 import webdataset as wds

 # 1. Define and create samples
···
     for i, s in enumerate(samples):
         sink.write({**s.as_wds, "__key__": f"{i:06d}"})

-# 3. Authenticate
+# 3. Authenticate and set up blob storage
 client = AtmosphereClient()
 client.login("myhandle.bsky.social", "app-password")

-index = AtmosphereIndex(client)
+store = PDSBlobStore(client)
+index = AtmosphereIndex(client, data_store=store)

 # 4. Publish schema
 schema_uri = index.publish_schema(
···
     description="Feature vectors with labels",
 )

-# 5. Publish dataset
+# 5. Publish dataset (shards uploaded as blobs)
 dataset = atdata.Dataset[FeatureSample]("features.tar")
 entry = index.insert_dataset(
     dataset,
···
 )

 print(f"Published: {entry.uri}")
+print(f"Blob URLs: {entry.data_urls}")

-# 6. Later: discover and load
+# 6. Later: discover and load from blobs
 for dataset_entry in index.list_datasets():
     print(f"Found: {dataset_entry.name}")

     # Reconstruct type from schema
     SampleType = index.decode_schema(dataset_entry.schema_ref)

-    # Load dataset
-    ds = atdata.Dataset[SampleType](dataset_entry.data_urls[0])
+    # Create source from blob URLs
+    source = store.create_source(dataset_entry.data_urls)
+
+    # Load dataset from blobs
+    ds = atdata.Dataset[SampleType](source)
     for batch in ds.ordered(batch_size=32):
         print(batch.features.shape)
         break
+```
+
+For external URL storage (without `PDSBlobStore`):
+
+```{python}
+#| eval: false
+# Use AtmosphereIndex without data_store
+index = AtmosphereIndex(client)
+
+# Dataset URLs will be stored as-is (external references)
+entry = index.insert_dataset(
+    dataset,
+    name="external-features",
+    schema_ref=schema_uri,
+)
+
+# Load using standard URL source
+ds = atdata.Dataset[FeatureSample](entry.data_urls[0])
 ```

 ## Related
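The Size Limits section added in this file says `maxcount` and `maxsize` bound each shard by sample count and by bytes. A minimal sketch of how such a splitting rule could behave, assuming a shard is closed just before the sample that would push it past either limit. `plan_shards` is a hypothetical helper written for illustration; it is not part of atdata, and the actual writer may account for sizes differently.

```python
from typing import Iterable, List


def plan_shards(
    sample_sizes: Iterable[int], maxcount: int, maxsize: float
) -> List[List[int]]:
    """Group sample indices into shards bounded by count and total bytes.

    Hypothetical illustration only; not the atdata implementation.
    """
    shards: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for idx, size in enumerate(sample_sizes):
        # Close the current shard if this sample would break either limit.
        if current and (len(current) >= maxcount or current_bytes + size > maxsize):
            shards.append(current)
            current, current_bytes = [], 0
        current.append(idx)
        current_bytes += size
    if current:
        shards.append(current)
    return shards


# Ten 20 MB samples with the limits used in the docs example above.
shards = plan_shards([20_000_000] * 10, maxcount=5000, maxsize=50e6)
print([len(s) for s in shards])
```

Under this rule a single sample larger than `maxsize` still gets a shard of its own, so it would need to be checked against the PDS blob limit separately.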
+83 -34
docs_src/tutorials/atmosphere.qmd
···
 import atdata
 from atdata.atmosphere import (
     AtmosphereClient,
+    AtmosphereIndex,
+    PDSBlobStore,
     SchemaPublisher,
     SchemaLoader,
     DatasetPublisher,
     DatasetLoader,
     AtUri,
 )
+from atdata import BlobSource
 import webdataset as wds
 ```

···
 print(f"Dataset URI: {dataset_uri}")
 ```

-### With Blob Storage
+### With PDS Blob Storage (Recommended)

-For smaller datasets, store data directly in ATProto blobs:
+For fully decentralized storage, use `PDSBlobStore` to store dataset shards directly as ATProto blobs in your PDS:

 ```{python}
 #| eval: false
-import io
+# Create store and index with blob storage
+store = PDSBlobStore(client)
+index = AtmosphereIndex(client, data_store=store)

+# Define sample type
 @atdata.packable
-class DemoSample:
-    id: int
-    text: str
+class FeatureSample:
+    features: NDArray
+    label: int

-# Create samples
-samples = [
-    DemoSample(id=0, text="Hello from blob storage!"),
-    DemoSample(id=1, text="ATProto is decentralized."),
-    DemoSample(id=2, text="atdata makes ML data easy."),
-]
+# Create dataset in memory or from existing tar
+samples = [FeatureSample(features=np.random.randn(64).astype(np.float32), label=i % 10) for i in range(100)]
+
+# Write to temporary tar
+with wds.writer.TarWriter("temp.tar") as sink:
+    for i, s in enumerate(samples):
+        sink.write({**s.as_wds, "__key__": f"{i:06d}"})
+
+dataset = atdata.Dataset[FeatureSample]("temp.tar")
+
+# Publish - shards are uploaded as blobs automatically
+schema_uri = index.publish_schema(FeatureSample, version="1.0.0")
+entry = index.insert_dataset(
+    dataset,
+    name="blob-stored-features",
+    schema_ref=schema_uri,
+    description="Features stored as PDS blobs",
+)
+
+print(f"Dataset URI: {entry.uri}")
+print(f"Blob URLs: {entry.data_urls}")  # at://did/blob/cid format
+```
+
+::: {.callout-tip}
+## Reading Blob-Stored Datasets
+
+Use `BlobSource` to stream directly from PDS blobs:
+
+```{python}
+#| eval: false
+# Create source from the blob URLs
+source = store.create_source(entry.data_urls)
+
+# Or manually from blob references
+source = BlobSource.from_refs([
+    {"did": client.did, "cid": "bafyrei..."},
+])

-# Create tar in memory
-tar_buffer = io.BytesIO()
-with wds.writer.TarWriter(tar_buffer) as sink:
-    for sample in samples:
-        sink.write(sample.as_wds)
+# Load and iterate
+ds = atdata.Dataset[FeatureSample](source)
+for batch in ds.ordered(batch_size=32):
+    print(batch.features.shape)
+```
+:::

-tar_data = tar_buffer.getvalue()
-print(f"Created tar with {len(samples)} samples ({len(tar_data):,} bytes)")
+### With External URLs

-# Publish schema
-blob_schema_uri = schema_publisher.publish(DemoSample, version="1.0.0")
+For larger datasets or when using existing object storage:

-# Publish with blob storage
-blob_dataset_uri = dataset_publisher.publish_with_blobs(
-    blobs=[tar_data],
-    schema_uri=str(blob_schema_uri),
-    name="Blob Storage Demo Dataset",
-    description="Small dataset stored directly in ATProto blobs",
-    tags=["demo", "blob-storage"],
+```{python}
+#| eval: false
+dataset_publisher = DatasetPublisher(client)
+dataset_uri = dataset_publisher.publish_with_urls(
+    urls=["s3://example-bucket/demo-data-{000000..000009}.tar"],
+    schema_uri=str(schema_uri),
+    name="Demo Image Dataset",
+    description="Example dataset demonstrating atmosphere publishing",
+    tags=["demo", "images", "atdata"],
+    license="MIT",
 )
-print(f"Dataset URI: {blob_dataset_uri}")
+print(f"Dataset URI: {dataset_uri}")
 ```

 ## List and Load Datasets
···

 ## Complete Publishing Workflow

+This example shows the recommended workflow using `PDSBlobStore` for fully decentralized storage:
+
 ```{python}
 #| eval: false
 # 1. Define and create samples
···
     for i, s in enumerate(samples):
         sink.write({**s.as_wds, "__key__": f"{i:06d}"})

-# 3. Authenticate
-from atdata.atmosphere import AtmosphereIndex
-
+# 3. Authenticate and create index with blob storage
 client = AtmosphereClient()
 client.login("myhandle.bsky.social", "app-password")
-index = AtmosphereIndex(client)
+
+store = PDSBlobStore(client)
+index = AtmosphereIndex(client, data_store=store)

 # 4. Publish schema
 schema_uri = index.publish_schema(
···
     description="Feature vectors with labels",
 )

-# 5. Publish dataset
+# 5. Publish dataset (shards uploaded as blobs automatically)
 dataset = atdata.Dataset[FeatureSample]("features.tar")
 entry = index.insert_dataset(
     dataset,
···
 )

 print(f"Published: {entry.uri}")
+print(f"Data stored at: {entry.data_urls}")  # at://did/blob/cid URLs
+
+# 6. Later: load from blobs
+source = store.create_source(entry.data_urls)
+ds = atdata.Dataset[FeatureSample](source)
+for batch in ds.ordered(batch_size=32):
+    print(f"Loaded batch with {len(batch.label)} samples")
+    break
 ```

 ## Next Steps
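The tutorial notes that `entry.data_urls` use an `at://did/blob/cid` form and that `BlobSource.from_refs` takes `{"did": ..., "cid": ...}` dictionaries. A minimal sketch of the conversion between the two, plus the HTTP URL for the `com.atproto.sync.getBlob` XRPC endpoint, which takes `did` and `cid` query parameters. `parse_blob_uri` and `blob_http_url` are hypothetical helpers for illustration, not atdata functions, and the PDS host is an assumed placeholder.

```python
from urllib.parse import urlencode


def parse_blob_uri(uri: str) -> dict:
    """Split 'at://<did>/blob/<cid>' into a {'did', 'cid'} reference.

    Hypothetical helper; assumes the URI layout shown in the tutorial.
    """
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or parts[1] != "blob" or not parts[0] or not parts[2]:
        raise ValueError(f"not a blob URI: {uri!r}")
    return {"did": parts[0], "cid": parts[2]}


def blob_http_url(pds_endpoint: str, ref: dict) -> str:
    """Build a getBlob URL for a blob reference on the given PDS host."""
    query = urlencode({"did": ref["did"], "cid": ref["cid"]})
    return f"{pds_endpoint.rstrip('/')}/xrpc/com.atproto.sync.getBlob?{query}"


ref = parse_blob_uri("at://did:plc:abc123/blob/bafyrei111")
url = blob_http_url("https://pds.example.com", ref)
print(ref)
print(url)
```

In practice `store.create_source(...)` and `store.read_url(...)` cover this; the sketch only makes the mapping explicit for readers who build refs by hand.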
+111 -1
examples/atmosphere_demo.py
···
 import atdata
 from atdata.atmosphere import (
     AtmosphereClient,
+    AtmosphereIndex,
+    PDSBlobStore,
     SchemaPublisher,
     SchemaLoader,
     DatasetPublisher,
     DatasetLoader,
     AtUri,
 )
+from atdata import BlobSource


 # =============================================================================
···
     print("\nBlob storage demo complete!")


+def demo_pds_blob_store(handle: str, password: str):
+    """Demonstrate PDSBlobStore for decentralized dataset storage.
+
+    PDSBlobStore is the recommended way to store datasets as ATProto blobs.
+    It provides automatic shard management and integrates with AtmosphereIndex.
+
+    Args:
+        handle: Bluesky handle (e.g., 'alice.bsky.social')
+        password: App-specific password
+    """
+    import webdataset as wds
+
+    print("\n" + "=" * 60)
+    print("PDSBlobStore Demo (Recommended Approach)")
+    print("=" * 60)
+
+    # Create client and authenticate
+    print(f"\nConnecting as {handle}...")
+    client = AtmosphereClient()
+    client.login(handle, password)
+    print(f"Authenticated as {client.handle}")
+
+    # Create PDSBlobStore and AtmosphereIndex
+    print("\nSetting up PDSBlobStore and AtmosphereIndex...")
+    store = PDSBlobStore(client)
+    index = AtmosphereIndex(client, data_store=store)
+    print(" Store and index configured")
+
+    # Define a sample type
+    @atdata.packable
+    class FeatureSample:
+        features: NDArray
+        label: int
+        source: str
+
+    # Create sample data
+    print("\nCreating sample data...")
+    samples = [
+        FeatureSample(
+            features=np.random.randn(64).astype(np.float32),
+            label=i % 5,
+            source="demo",
+        )
+        for i in range(50)
+    ]
+
+    # Write to temporary tar file
+    import tempfile
+    import os
+
+    with tempfile.TemporaryDirectory() as temp_dir:
+        tar_path = os.path.join(temp_dir, "demo.tar")
+
+        with wds.writer.TarWriter(tar_path) as sink:
+            for i, s in enumerate(samples):
+                sink.write({**s.as_wds, "__key__": f"{i:06d}"})
+
+        print(f" Created tar with {len(samples)} samples")
+
+        # Create dataset
+        dataset = atdata.Dataset[FeatureSample](tar_path)
+
+        # Publish schema
+        print("\nPublishing schema...")
+        schema_uri = index.publish_schema(
+            FeatureSample,
+            version="1.0.0",
+            description="Demo feature vectors",
+        )
+        print(f" Schema URI: {schema_uri}")
+
+        # Publish dataset with blob storage
+        print("\nPublishing dataset (shards uploaded as blobs)...")
+        entry = index.insert_dataset(
+            dataset,
+            name=f"pds-blob-demo-{datetime.now().strftime('%Y%m%d-%H%M%S')}",
+            schema_ref=schema_uri,
+            description="Dataset stored using PDSBlobStore",
+            tags=["demo", "pds-blob-store"],
+        )
+
+        print(f" Dataset URI: {entry.uri}")
+        print(f" Blob URLs: {len(entry.data_urls)} shard(s)")
+        for url in entry.data_urls:
+            print(f" {url[:70]}...")
+
+        # Load back from blobs
+        print("\nLoading dataset from blobs...")
+        source = store.create_source(entry.data_urls)
+        print(f" Created BlobSource with {len(source.blob_refs)} blob(s)")
+
+        loaded_ds = atdata.Dataset[FeatureSample](source)
+        count = 0
+        for batch in loaded_ds.ordered():
+            count += 1
+            print(f" Sample {count}: label={batch.label}, features shape={batch.features.shape}")
+            if count >= 3:
+                print(" ...")
+                break
+
+    print("\nPDSBlobStore demo complete!")
+    print(" - Shards stored as ATProto blobs in your PDS")
+    print(" - No external storage required")
+    print(" - Fully decentralized!")
+
+
 def demo_dataset_loading():
     """Demonstrate loading a dataset from an ATProto record."""
     print("\n" + "=" * 60)
···
     # Run live demos if credentials provided
     if args.handle and args.password:
         demo_live_connection(args.handle, args.password)
-        demo_blob_storage(args.handle, args.password)
+        demo_pds_blob_store(args.handle, args.password)  # Recommended approach
+        demo_blob_storage(args.handle, args.password)  # Legacy approach
     else:
         print("\n" + "=" * 60)
         print("Live Demo Skipped")
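The last hunk gates the live demos on credentials and runs the recommended `PDSBlobStore` demo before the legacy one. A minimal sketch of that dispatch pattern with stub demos, so it can run without network access. The `--handle` and `--password` flag definitions and the `main(argv)` signature are assumptions for illustration; only the `args.handle` / `args.password` attribute names and the demo order come from the diff.

```python
import argparse
from typing import List, Optional, Sequence


def main(argv: Optional[Sequence[str]] = None) -> List[str]:
    """Run the live demos in order when both credentials are supplied."""
    parser = argparse.ArgumentParser(description="atmosphere demo (sketch)")
    parser.add_argument("--handle")    # assumed flag name
    parser.add_argument("--password")  # assumed flag name
    args = parser.parse_args(argv)

    ran: List[str] = []

    def stub(name: str):
        # Stand-in for a real demo; records that it was called.
        return lambda handle, password: ran.append(name)

    live_demos = [
        stub("demo_live_connection"),
        stub("demo_pds_blob_store"),  # recommended approach
        stub("demo_blob_storage"),    # legacy approach
    ]

    if args.handle and args.password:
        for demo in live_demos:
            demo(args.handle, args.password)
    else:
        print("Live Demo Skipped")
    return ran


print(main(["--handle", "alice.bsky.social", "--password", "app-password"]))
print(main([]))
```

Passing `argv` explicitly keeps the sketch testable; a real script would call `main()` with no arguments so that `argparse` reads `sys.argv`.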