A loose federation of distributed, typed datasets

refactor: standardize WebDataset writer imports across codebase

- Update CLAUDE.md to document wds.writer.TarWriter import pattern
- Note that wds.writer imports avoid linting issues vs wds.TarWriter
- Update 5 instances in test_local.py to use wds.writer.TarWriter
- Fix typo: parseable → parsable

All imports now follow a consistent pattern that prevents type checker warnings.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

+8 -6
.chainlink/issues.db

This is a binary file and will not be displayed.

+3 -1
CLAUDE.md
···

  **WebDataset Integration**

- - Uses `wds.ShardWriter` / `wds.TarWriter` for writing
+ - Uses `wds.writer.ShardWriter` / `wds.writer.TarWriter` for writing
+ - **Important:** Always import from `wds.writer` (e.g., `wds.writer.TarWriter`) instead of `wds.TarWriter`
+ - This avoids linting issues while functionally equivalent
  - Dataset iteration via `wds.DataPipeline` with custom `wrap()` / `wrap_batch()` methods
  - Supports `ordered()` and `shuffled()` iteration modes

+5 -5
tests/test_local.py
···
      # Create a temporary WebDataset
      dataset_path = tmp_path / "test-dataset-000000.tar"

-     with wds.TarWriter(str(dataset_path)) as sink:
+     with wds.writer.TarWriter(str(dataset_path)) as sink:
          for i in range(10):
              sample = SimpleTestSample(name=f"sample_{i}", value=i * 10)
              sink.write(sample.as_wds)
···
  def make_simple_dataset(tmp_path: Path, num_samples: int = 10, name: str = "test") -> atdata.Dataset:
      """Create a SimpleTestSample dataset for testing."""
      dataset_path = tmp_path / f"{name}-dataset-000000.tar"
-     with wds.TarWriter(str(dataset_path)) as sink:
+     with wds.writer.TarWriter(str(dataset_path)) as sink:
          for i in range(num_samples):
              sample = SimpleTestSample(name=f"sample_{i}", value=i * 10)
              sink.write(sample.as_wds)
···
  def make_array_dataset(tmp_path: Path, num_samples: int = 3, array_shape: tuple = (10, 10)) -> atdata.Dataset:
      """Create an ArrayTestSample dataset for testing."""
      dataset_path = tmp_path / "array-dataset-000000.tar"
-     with wds.TarWriter(str(dataset_path)) as sink:
+     with wds.writer.TarWriter(str(dataset_path)) as sink:
          for i in range(num_samples):
              arr = np.random.randn(*array_shape)
              sample = ArrayTestSample(label=f"array_{i}", data=arr)
···
      """Test that BasicIndexEntry generates a valid UUID by default.

      Should auto-generate a unique UUID when none is provided, and it should be
-     parseable as a valid UUID.
+     parsable as a valid UUID.
      """
      entry = atlocal.BasicIndexEntry(
          wds_url="s3://bucket/dataset.tar",
···
      RuntimeError.
      """
      dataset_path = tmp_path / "empty-dataset-000000.tar"
-     with wds.TarWriter(str(dataset_path)) as sink:
+     with wds.writer.TarWriter(str(dataset_path)) as sink:
          pass  # Write no samples

      ds = atdata.Dataset[SimpleTestSample](url=str(dataset_path))
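
The convention this commit adopts (`wds.writer.TarWriter` rather than `wds.TarWriter`) could be checked mechanically. The following is a minimal sketch only, not part of the change: the helper name `find_top_level_writer_uses`, the `DISCOURAGED` set, and the assumption that the package is always imported under the alias `wds` are all hypothetical. It uses only the standard-library `ast` module and parses source text without importing WebDataset.

```python
import ast

# Hypothetical set of names that should be reached via `wds.writer`.
DISCOURAGED = {"TarWriter", "ShardWriter"}


def find_top_level_writer_uses(source: str, alias: str = "wds") -> list[tuple[int, str]]:
    """Return (line, expression) pairs for uses such as `wds.TarWriter`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Match `<alias>.<Name>` exactly. In `wds.writer.TarWriter` the
        # value of the outer attribute is itself an Attribute, not a Name,
        # so the preferred form is not flagged.
        if (
            isinstance(node, ast.Attribute)
            and node.attr in DISCOURAGED
            and isinstance(node.value, ast.Name)
            and node.value.id == alias
        ):
            hits.append((node.lineno, f"{alias}.{node.attr}"))
    return sorted(hits)


# Illustrative input: one discouraged use and one preferred use.
example = '''
import webdataset as wds

with wds.TarWriter("a.tar") as sink:
    pass
with wds.writer.TarWriter("b.tar") as sink:
    pass
'''

print(find_top_level_writer_uses(example))
```

Run against a test module's source, a non-empty result would point at the lines that still use the top-level form. A check like this only sees the literal alias it is given, so code that imports the package under another name would need that alias passed explicitly.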