Malachite is a tool to import your Last.fm and Spotify listening history to the AT Protocol network using the fm.teal.alpha.feed.play lexicon.

chore: archive — consolidated into ewanc26/pkgs

+13 -1033
+3 -197
CONTRIBUTING.md
···
  # Contributing to Malachite

- Thanks for wanting to help. This document covers the layout of the project, how to get a dev environment running, and what to keep in mind when opening a PR.
-
- ## Table of Contents
-
- - [Project layout](#project-layout)
- - [Prerequisites](#prerequisites)
- - [Getting started](#getting-started)
- - [Running the CLI](#running-the-cli)
- - [Running the web app](#running-the-web-app)
- - [Running tests](#running-tests)
- - [Code architecture](#code-architecture)
- - [Making changes](#making-changes)
- - [Opening a pull request](#opening-a-pull-request)
- - [Publishing `@ewanc26/tid`](#publishing-ewanc26tid)
-
- ---
-
- ## Project layout
-
- This is a pnpm monorepo with three separate projects:
-
- ```
- malachite/
- ├── src/                  # CLI — TypeScript, compiled to dist/ by tsc
- │   ├── core/             # Environment-agnostic logic (shared with web)
- │   ├── lib/              # CLI wrappers around core (Node.js-specific)
- │   └── utils/            # Legacy CLI utilities (gradually migrating to core/)
- ├── web/                  # SvelteKit web app
- │   └── src/lib/
- │       ├── core/         # Thin re-exports of src/core/ via the $core alias
- │       └── ...           # Svelte components, routes, config
- ├── packages/
- │   └── tid/              # @ewanc26/tid — standalone npm package
- └── lexicons/             # fm.teal.alpha lexicon definitions
- ```
-
- The key architectural rule is that **`src/core/` is the single source of truth** for all non-UI logic. The CLI wrappers in `src/lib/` adapt it for terminal use (spinners, file I/O, credentials). The web app re-exports it via `$core` path alias — there should be no duplicated logic between the two surfaces.
-
- ---
-
- ## Prerequisites
-
- - **Node.js 20+** (required for Web Crypto API via `globalThis.crypto`)
- - **pnpm 9+** — install with `npm i -g pnpm` if you don't have it
-
- ---
-
- ## Getting started
-
- ```sh
- # Clone and install everything in one shot — pnpm workspaces handles all three projects
- git clone https://github.com/ewanc26/malachite
- cd malachite
- pnpm install
- ```
-
- ---
-
- ## Running the CLI
-
- ```sh
- # Build
- pnpm build
-
- # Dry run against a real Last.fm export
- node dist/index.js -i my-export.csv --dry-run
-
- # Interactive mode
- node dist/index.js
-
- # Rebuild and run in one step
- pnpm dev
- ```
-
- For development work it's worth using `--dev` mode, which enables verbose logging and caps batch sizes to 20 records so you're not waiting through thousands of API calls:
-
- ```sh
- node dist/index.js -i my-export.csv --dev --dry-run
- ```
-
- ---
-
- ## Running the web app
-
- ```sh
- cd web
- pnpm dev # starts at http://127.0.0.1:5173
- ```
-
- The dev server **must** run on `127.0.0.1:5173` exactly — the ATProto OAuth loopback `redirect_uri` is pinned to that origin per RFC 8252. Don't change the host or port without also updating `web/src/lib/core/oauth.ts`.
-
- For changes to shared `src/core/` files, the web app picks them up immediately via the `$core` alias — no separate build step needed.
-
- ---
-
- ## Running tests
-
- Tests use Node's built-in test runner — no Jest, no Vitest.
-
- ```sh
- # Build first, then run all tests
- pnpm test
-
- # Run just the TID tests
- pnpm test:tid
-
- # Watch mode
- pnpm test:watch
- ```
-
- Tests live in `src/tests/`. If you're adding a new file to `src/core/`, add a corresponding test file there. The existing tests are a good reference for the style — `node:test` + `node:assert`, no extra dependencies.
-
- ---
-
- ## Code architecture
-
- ### The `src/core/` contract
-
- Any code that lives in `src/core/` must be:
-
- - **Zero Node.js dependencies** — no `fs`, `path`, `crypto` module, etc. Web Crypto (`globalThis.crypto`) is fine because Node 20+ and all modern browsers support it.
- - **Callback-based for progress/logging** — functions accept optional `onProgress` callbacks rather than calling `console.log` directly.
- - **Tested** — new files should have coverage in `src/tests/`.
-
- If you need Node.js-specific behaviour (file I/O, terminal spinners, credential storage), put it in `src/lib/` as a thin wrapper that calls into `src/core/`.
-
- ### Adding something to the web
-
- The web's `web/src/lib/core/` files are almost all one-liners:
-
- ```ts
- export * from '$core/your-new-file.js';
- ```
-
- The only exceptions are `csv.ts` and `spotify.ts`, which add browser `File` API loaders on top of the core parsers, and `oauth.ts` / `import.ts`, which are web-only. Follow that same pattern for anything new.
-
- ### Rate limiting
-
- The rate limiter in `src/core/rate-limiter.ts` learns quota from response headers on the first successful batch and then gates all subsequent batches. If you're changing publish behaviour, keep the 15% headroom buffer intact — exceeding the PDS daily limit affects every user on a shared instance, not just the one importing.
-
- ### TIDs
-
- Record keys are generated from `playedTime` using the TID clock in `src/core/tid.ts`. The clock is monotonic — even if records arrive out of order, every call produces a strictly increasing TID within the same process/page. Don't make the generation async (it isn't) and don't reset `lastUs` outside of tests.
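Reviewer note: the monotonic TID clock described in the removed "TIDs" section above could look roughly like the sketch below. It is an illustration only, not the contents of `src/core/tid.ts`. The names `TidClock`, `next`, and `encodeBase32` are hypothetical, and the layout (53 bits of microseconds plus a 10-bit clock id, base32-sortable, 13 characters) follows the public AT Protocol TID format rather than anything confirmed by this diff.

```typescript
// Hypothetical sketch of a monotonic TID clock; names and structure are assumptions.
const ALPHABET = "234567abcdefghijklmnopqrstuvwxyz"; // base32-sortable alphabet

// Encode a non-negative integer (< 2^53) as a fixed-width base32-sortable string.
function encodeBase32(value: number, length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out = ALPHABET.charAt(value % 32) + out;
    value = Math.floor(value / 32);
  }
  return out;
}

class TidClock {
  private lastUs = 0; // last microsecond timestamp handed out
  private readonly clockId: number;

  constructor(clockId = 0) {
    this.clockId = clockId & 0x3ff; // keep only the low 10 bits
  }

  // Synchronous on purpose: returns a TID strictly greater than any earlier one
  // from this clock, even when playedTime values arrive out of order.
  next(playedTime: string): string {
    const ms = Date.parse(playedTime);
    if (Number.isNaN(ms)) throw new Error(`invalid timestamp: ${playedTime}`);
    let us = ms * 1000;
    if (us <= this.lastUs) us = this.lastUs + 1; // bump to stay strictly increasing
    this.lastUs = us;
    // 11 chars carry the microsecond timestamp, the final 2 chars the 10-bit clock id.
    return encodeBase32(us, 11) + encodeBase32(this.clockId, 2);
  }
}
```

Because the encoding is fixed-width and the alphabet is in ascending ASCII order, comparing two TIDs as plain strings gives the same ordering as comparing the underlying timestamps, which is what makes the keys useful for chronological ordering.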
-
- ---
-
- ## Making changes
-
- ### Bugfixes
-
- Open a PR against `main` with a description of what was broken and how you've verified the fix. A failing test that now passes is ideal.
-
- ### New features
-
- Open an issue first if the change is significant — it's worth a quick discussion before writing a lot of code. For smaller additions (a new CLI flag, a missing field in the data mapping, an extra export from `@ewanc26/tid`) just open the PR directly.
-
- ### Changing shared core logic
-
- Changes to `src/core/` affect all three surfaces (CLI, web, `@ewanc26/tid` if relevant). Run both `pnpm test` and `cd web && pnpm check` before submitting to make sure nothing is broken on either side.
-
- ### Style
-
- - TypeScript strict mode is on — no `any` without a comment explaining why.
- - No new runtime dependencies in `src/core/` or `packages/tid/`.
- - Prefer named exports over default exports.
- - Keep comments on the *why*, not the *what*.
-
- ---
-
- ## Opening a pull request
-
- 1. Fork the repo and create a branch from `main`.
- 2. Make your changes, including tests where applicable.
- 3. Verify `pnpm test` passes and `pnpm run type-check` is clean.
- 4. If you touched the web app, verify `cd web && pnpm check` is clean too.
- 5. Open a PR with a clear description of what changed and why.
+ **This repository has been archived.** Development continues in [`ewanc26/pkgs`](https://github.com/ewanc26/pkgs) — please open issues and PRs there.

- There's no formal CLA or contributor agreement — AGPL-3.0 covers contributions automatically.
-
- ---
-
- ## Publishing `@ewanc26/tid`
-
- > **Note:** `@ewanc26/tid` is now canonically maintained in the [`@ewanc26/pkgs`](https://github.com/ewanc26/pkgs) monorepo. The copy in `packages/tid/` here is kept for historical context. All version bumps, releases, and npm publishes should happen from there.
-
- To cut a new release, work from the pkgs monorepo:
-
- ```sh
- cd /path/to/pkgs
- git subtree pull --prefix=packages/tid malachite malachite-split
-
- cd packages/tid
- # Bump the version in package.json, then:
- pnpm build
- npm publish --access public --otp=<your-2fa-code>
- ```
-
- The package has no runtime dependencies and must stay that way. If a change to `src/core/tid.ts` in this repo affects the public API of the package, update `packages/tid/src/index.ts` in `pkgs` to match and bump the version accordingly (semver — patch for fixes, minor for new exports, major for breaking changes).
+ - CLI: [`packages/malachite/`](https://github.com/ewanc26/pkgs/tree/main/packages/malachite)
+ - Web: [`packages/malachite-web/`](https://github.com/ewanc26/pkgs/tree/main/packages/malachite-web)
+10 -836
README.md
···
- # Malachite
-
- Import your Last.fm and Spotify listening history to the AT Protocol network using the `fm.teal.alpha.feed.play` lexicon.
-
- **Repository:** [malachite](https://github.com/ewanc26/malachite)
- [Also available on Tangled](https://tangled.org/did:plc:ofrbh253gwicbkc5nktqepol/atproto-lastfm-importer)
-
- ## Table of Contents
-
- - [⚠️ Important: Rate Limits](#️-important-rate-limits)
- - [How Dynamic Batch Sizing Works](#how-dynamic-batch-sizing-works)
- - [What's with the name?](#whats-with-the-name)
- - [Web App](#web-app)
- - [Quick Start](#quick-start)
- - [Interactive Mode (Recommended for First-Time Users)](#interactive-mode-recommended-for-first-time-users)
- - [Command Line Mode](#command-line-mode)
- - [Features](#features)
- - [Import Capabilities](#import-capabilities)
- - [Performance & Safety](#performance--safety)
- - [User Experience](#user-experience)
- - [Technical Features](#technical-features)
- - [Usage Examples](#usage-examples)
- - [Combined Import (Last.fm + Spotify)](#combined-import-lastfm--spotify)
- - [Re-Sync Mode](#re-sync-mode)
- - [Remove Duplicates](#remove-duplicates)
- - [Import from Spotify](#import-from-spotify)
- - [Import from Last.fm](#import-from-lastfm)
- - [Advanced Options](#advanced-options)
- - [Command Line Options](#command-line-options)
- - [Required Options](#required-options)
- - [Import Mode](#import-mode)
- - [Additional Options](#additional-options)
- - [PDS Override](#pds-override)
- - [Legacy Flags (Backwards Compatible)](#legacy-flags-backwards-compatible)
- - [Getting Your Data](#getting-your-data)
- - [Last.fm Export](#lastfm-export)
- - [Spotify Export](#spotify-export)
- - [Data Format](#data-format)
- - [Required Fields](#required-fields)
- - [Optional Fields](#optional-fields)
- - [Example Records](#example-records)
- - [How It Works](#how-it-works)
- - [Processing Flow](#processing-flow)
- - [Automatic Duplicate Prevention](#automatic-duplicate-prevention)
- - [Rate Limiting Algorithm](#rate-limiting-algorithm)
- - [Multi-Day Imports](#multi-day-imports)
- - [Logging and Output](#logging-and-output)
- - [Verbosity Levels](#verbosity-levels)
- - [Error Handling](#error-handling)
- - [Troubleshooting](#troubleshooting)
- - [Authentication Issues](#authentication-issues)
- - [Performance Issues](#performance-issues)
- - [Connection Issues](#connection-issues)
- - [Output Control](#output-control)
- - [Development](#development)
- - [File Storage](#file-storage)
- - [Credential Storage](#credential-storage)
- - [Project Structure](#project-structure)
- - [Technical Details](#technical-details)
- - [Authentication](#authentication)
- - [Batch Publishing](#batch-publishing)
- - [Data Mapping](#data-mapping)
- - [Lexicon Reference](#lexicon-reference)
- - [Contributing](#contributing)
- - [License](#license)
- - [Credits](#credits)
-
- ## ⚠️ Important: Rate Limits
-
- **CRITICAL**: Bluesky's AppView has rate limits on PDS instances. Exceeding 10K records per day can rate limit your **ENTIRE PDS**, affecting all users on your instance.
-
- This importer automatically protects your PDS by:
-
- - **Dynamic batch sizing** (1-200 records) that adapts to available quota in real-time
- - **15% headroom buffer** prevents quota exhaustion before hitting the limit
- - Limiting imports to **7,500 records per day** (with 75% safety margin)
- - Calculating optimal batch sizes and delays
- - **Graceful degradation** - scales down smoothly as quota depletes
- - **Instant recovery** - immediately returns to maximum speed after quota resets
- - Pausing 24 hours between days for large imports
- - Providing clear progress tracking and time estimates
- - Persisting state across restarts for safe resume
-
- ### How Dynamic Batch Sizing Works
-
- Malachite continuously monitors your rate limit quota and automatically adjusts batch size:
-
- ```
- Fresh Quota (5000 points) → Batch Size: 200 records (maximum speed)
- Half Depleted (2500 points) → Batch Size: 200 records (still optimal)
- Approaching Limit (1200) → Batch Size: 150 records (scaling down)
- Near Headroom (900) → Batch Size: 50 records (conservative)
- Below Headroom (700) → Batch Size: 1 record (minimal progress)
- [Quota Resets] → Batch Size: 200 records (instant recovery)
- ```
-
- **Benefits:**
-
- - ✅ **2x faster** when quota is fresh (200 vs 100 records/batch)
- - ✅ **Never hits rate limits** - proactive scaling with 15% buffer
- - ✅ **Always makes progress** - even with minimal quota (batch size 1)
- - ✅ **Automatic recovery** - no manual intervention needed
- - ✅ **Transparent** - logs all batch size changes with reasons
-
- For more details, see the [Bluesky Rate Limits Documentation](https://docs.bsky.app/blog/rate-limits-pds-v3).
-
- ## What's with the name?
-
- It used to be called `atproto-lastfm-importer` — generic as fuck. That name told you what it did and nothing about why it mattered, and it sounded like a disposable weekend script. So I renamed it.
-
- At the moment, the repository is still called `atproto-lastfm-importer` on Tangled, but the GitHub link has been updated to `malachite`. I do not know if this can be resolved.
-
- **Malachite** is a greenish-blue copper mineral associated with preservation and transformation. That's exactly what this tool does: it preserves your scrobbles and transforms them into proper `fm.teal.alpha.feed.play` records on the AT Protocol. The colour match isn't an accident — malachite sits squarely in the teal/green range, a deliberate nod to the `teal` lexicon it publishes to.
-
- ## Web App
-
- Malachite also ships a browser-based web app (`web/`) built with SvelteKit. It supports all five import modes and signs in via ATProto OAuth — no app password required.
-
- **Running the web app in development:**
-
- ```bash
- cd web
- pnpm install
- pnpm dev # starts at http://127.0.0.1:5173
- ```
-
- > **Note:** The dev server must run on `127.0.0.1:5173` exactly. This is enforced in `vite.config.ts` because the OAuth loopback `redirect_uri` is pinned to that origin (RFC 8252 §7.3). Do not change the host or port without updating the OAuth client metadata.
-
- The web app fetches existing records using the same CAR-export path as the CLI (`com.atproto.sync.getRepo`) so it costs zero AppView write-quota points to check for duplicates.
-
- ## Quick Start
-
- **Note:** You must build the project first, then run with arguments.
-
- ### Interactive Mode (Recommended for First-Time Users)
-
- Just run without any arguments and Malachite will guide you through the process:
-
- ```bash
- # Install dependencies and build
- pnpm install
- pnpm build
-
- # Run in interactive mode
- pnpm start
- ```
-
- The interactive mode will:
-
- - Present a menu of available actions
- - Prompt for all required information (handle, password, files)
- - Ask for optional settings (dry run, verbose logging, etc.)
- - Provide helpful descriptions for each option
-
- ### Command Line Mode
-
- For automation or if you prefer command-line arguments:
-
- ```bash
- # Show help
- pnpm start --help
-
- # Run with command line arguments
- pnpm start -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y
-
- # Alternative: run directly with node
- node dist/index.js -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y
- ```
-
- ## Features
-
- ### Import Capabilities
-
- - ✅ **Last.fm Import**: Full support for Last.fm CSV exports with MusicBrainz IDs
- - ✅ **Spotify Import**: Import Extended Streaming History JSON files
- - ✅ **Combined Import**: Merge Last.fm and Spotify exports with intelligent deduplication
- - ✅ **Re-Sync Mode**: Import only new scrobbles without creating duplicates
- - ✅ **Duplicate Removal**: Clean up accidentally imported duplicate records
-
- ### Performance & Safety
-
- - ✅ **Automatic Duplicate Prevention**: Fetches your existing Teal records via CAR export and skips anything already imported
- - ✅ **Input Deduplication**: Removes duplicate entries within the source file before submission
- - ✅ **Dynamic Batch Sizing**: Automatically adjusts batch size (1-200 records) based on available rate limit quota
- - ✅ **Batch Operations**: Uses `com.atproto.repo.applyWrites` for efficient batch publishing (up to 200 records per call)
- - ✅ **Zero-cost sync check**: Existing record fetching uses `com.atproto.sync.getRepo` (CAR export) — a separate, far more generous rate-limit envelope that costs zero AppView write-quota points
- - ✅ **Intelligent Rate Limiting**: Real-time quota monitoring with 15% headroom buffer prevents rate limit exhaustion
- - ✅ **Adaptive Recovery**: Automatically scales back to maximum speed after quota resets
- - ✅ **Multi-Day Imports**: Large imports automatically span multiple days with 24-hour pauses
- - ✅ **Resume Support**: Safe to stop (Ctrl+C) and restart - continues from where it left off
- - ✅ **Graceful Cancellation**: Press Ctrl+C to stop after the current batch completes
-
- ### User Experience
-
- - ✅ **Structured Logging**: Color-coded output with debug/verbose modes
- - ✅ **Progress Tracking**: Real-time progress with time estimates
- - ✅ **Dry Run Mode**: Preview records without publishing
- - ✅ **Interactive Mode**: Simple prompts guide you through the process
- - ✅ **Command Line Mode**: Full automation support for scripting
- - ✅ **Web App**: Browser-based UI with ATProto OAuth sign-in
-
- ### Technical Features
-
- - ✅ **TID-based Record Keys**: Timestamp-based identifiers for chronological ordering
- - ✅ **Identity Resolution**: Resolves ATProto handles/DIDs using Slingshot
- - ✅ **PDS Auto-Discovery**: Automatically connects to your personal PDS
- - ✅ **MusicBrainz Support**: Preserves MusicBrainz IDs when available (Last.fm)
- - ✅ **Chronological Ordering**: Processes oldest first (or newest with `-r` flag)
- - ✅ **Error Handling**: Continues on errors with detailed reporting
-
- ## Usage Examples
-
- ### Combined Import (Last.fm + Spotify)
-
- Merge your Last.fm and Spotify listening history into a single, deduplicated import:
-
- ```bash
- # Preview the merged import
- pnpm start -i lastfm.csv --spotify-input spotify-export/ -m combined --dry-run
-
- # Perform the combined import
- pnpm start -i lastfm.csv --spotify-input spotify-export/ -m combined -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y
- ```
-
- **What combined mode does:**
-
- 1. Parses both Last.fm CSV and Spotify JSON exports
- 2. Normalizes track names and artist names for comparison
- 3. Identifies duplicate plays (same track within 5 minutes)
- 4. Chooses the best version of each play (prefers Last.fm with MusicBrainz IDs)
- 5. Merges into a single chronological timeline
- 6. Shows detailed statistics about the merge
-
- ### Re-Sync Mode
-
- Sync your Last.fm export with Teal without creating duplicates:
-
- ```bash
- # Preview what will be synced
- pnpm start -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -m sync --dry-run
-
- # Perform the sync
- pnpm start -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -m sync -y
- ```
-
- **Perfect for:**
-
- - Re-running imports with updated Last.fm exports
- - Recovering from interrupted imports
- - Adding recent scrobbles without duplicating old ones
-
- **Note:** Sync mode requires authentication even in dry-run mode to fetch existing records.
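Reviewer note: the "What combined mode does" steps in the removed README text above (normalise names, treat the same track within 5 minutes as one play, prefer the Last.fm version with MusicBrainz IDs, merge chronologically) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's merge code. The `Play` shape, `mergePlays`, `normalize`, and `score` are hypothetical names, and the exact normalisation rules are a guess.

```typescript
// Hypothetical sketch of the combined-mode merge; all names here are assumptions.
interface Play {
  trackName: string;
  artistName: string;
  playedTime: string; // ISO 8601
  source: "lastfm" | "spotify";
  recordingMbId?: string;
}

const WINDOW_MS = 5 * 60 * 1000; // "same track within 5 minutes"

function normalize(text: string): string {
  return text.trim().toLowerCase().replace(/\s+/g, " ");
}

// Higher score wins: Last.fm with a MusicBrainz ID, then Last.fm, then Spotify.
function score(play: Play): number {
  if (play.source === "lastfm") return play.recordingMbId ? 2 : 1;
  return 0;
}

function mergePlays(lastfm: Play[], spotify: Play[]): Play[] {
  const byTime = (a: Play, b: Play) =>
    Date.parse(a.playedTime) - Date.parse(b.playedTime);
  const all = [...lastfm, ...spotify].sort(byTime);

  const merged: Play[] = [];
  const lastIndexByTrack = new Map<string, number>(); // track key → index in merged

  for (const play of all) {
    const key = `${normalize(play.artistName)}\u0000${normalize(play.trackName)}`;
    const index = lastIndexByTrack.get(key);
    const kept = index === undefined ? undefined : merged[index];
    if (index !== undefined && kept !== undefined) {
      const gap = Date.parse(play.playedTime) - Date.parse(kept.playedTime);
      if (gap <= WINDOW_MS) {
        // Same play seen twice: keep whichever version scores higher.
        if (score(play) > score(kept)) merged[index] = play;
        continue;
      }
    }
    lastIndexByTrack.set(key, merged.length);
    merged.push(play);
  }
  return merged.sort(byTime); // a swapped-in version may shift slightly in time
}
```

Comparing each play against the most recently kept play of the same track keeps the pass linear after the initial sort. A real implementation would also need to report the merge statistics mentioned in step 6.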
253 - 254 - ### Remove Duplicates 255 - 256 - Clean up accidentally imported duplicate records: 257 - 258 - ```bash 259 - # Preview duplicates (dry run) 260 - pnpm start -m deduplicate -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx --dry-run 261 - 262 - # Remove duplicates (keeps first occurrence) 263 - pnpm start -m deduplicate -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx 264 - ``` 265 - 266 - ### Import from Spotify 267 - 268 - ```bash 269 - # Import single Spotify JSON file 270 - pnpm start -i Streaming_History_Audio_2021-2023_0.json -m spotify -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y 271 - 272 - # Import directory with multiple Spotify files (recommended) 273 - pnpm start -i '/path/to/Spotify Extended Streaming History' -m spotify -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y 274 - ``` 275 - 276 - ### Import from Last.fm 277 - 278 - ```bash 279 - # Standard Last.fm import 280 - pnpm start -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y 281 - 282 - # Preview without publishing 283 - pnpm start -i lastfm.csv --dry-run 284 - 285 - # Process newest tracks first 286 - pnpm start -i lastfm.csv -h alice.bsky.social -r -y 287 - 288 - # Verbose debug output 289 - pnpm start -i lastfm.csv --dry-run -v 290 - 291 - # Quiet mode (only warnings and errors) 292 - pnpm start -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -q -y 293 - ``` 294 - 295 - ### Advanced Options 296 - 297 - ```bash 298 - # Development mode (verbose + file logging + smaller batches for debugging) 299 - pnpm start -i lastfm.csv --dev --dry-run 300 - 301 - # Custom batch settings (advanced users only) 302 - pnpm start -i lastfm.csv -h alice.bsky.social -b 20 -d 3000 303 - 304 - # Full automation with all flags 305 - pnpm start -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y -q 306 - ``` 307 - 308 - ## Command Line Options 309 - 310 - **Note:** When importing data (not in deduplicate mode), you must provide `--input`, `--handle`, and `--password`. 
The `--yes` flag skips confirmation prompts for automation. 311 - 312 - ### Required Options 313 - 314 - | Option | Short | Description | Example | 315 - | ------------------- | ----- | -------------------------------------------------- | ------------------------ | 316 - | `--input <path>` | `-i` | Path to Last.fm CSV or Spotify JSON file/directory | `-i lastfm.csv` | 317 - | `--handle <handle>` | `-h` | ATProto handle or DID | `-h alice.bsky.social` | 318 - | `--password <pass>` | `-p` | ATProto app password | `-p xxxx-xxxx-xxxx-xxxx` | 319 - 320 - ### Import Mode 321 - 322 - | Option | Short | Description | Default | 323 - | --------------- | ----- | ----------- | -------- | 324 - | `--mode <mode>` | `-m` | Import mode | `lastfm` | 325 - 326 - **Available modes:** 327 - 328 - - `lastfm` - Import Last.fm export only 329 - - `spotify` - Import Spotify export only 330 - - `combined` - Merge Last.fm + Spotify exports 331 - - `sync` - Skip existing records (sync mode) 332 - - `deduplicate` - Remove duplicate records 333 - 334 - ### Additional Options 335 - 336 - | Option | Short | Description | Default | 337 - | ------------------------ | ----- | ----------------------------------------------------------- | --------------- | 338 - | `--spotify-input <path>` | | Path to Spotify export (for combined mode) | - | 339 - | `--reverse` | `-r` | Process newest first | `false` | 340 - | `--yes` | `-y` | Skip confirmation prompts | `false` | 341 - | `--dry-run` | | Preview without importing | `false` | 342 - | `--verbose` | `-v` | Enable debug logging | `false` | 343 - | `--quiet` | `-q` | Suppress non-essential output | `false` | 344 - | `--dev` | | Development mode (verbose + file logging + smaller batches) | `false` | 345 - | `--batch-size <num>` | `-b` | Initial batch size (1-200, dynamically adjusted) | Auto-calculated | 346 - | `--batch-delay <ms>` | `-d` | Delay between batches in ms | `500` (min) | 347 - | `--help` | | Show help message | - | 348 - 349 - ### PDS 
Override 350 - 351 - If you already know the base URL of your Personal Data Server (PDS) you can bypass the Slingshot identity resolver and provide it directly with the `--pds` flag. This is useful for private instances, testing, or when the resolver is unreliable. 352 - 353 - | Option | Description | 354 - | ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- | 355 - | `--pds <url>` | PDS base URL to use for authentication and API calls (e.g. `https://pds.example.com`). When provided, Malachite will skip Slingshot lookup and use this URL directly. | 356 - 357 - Notes: 358 - 359 - - The `--pds` flag overrides the configured Slingshot resolver for identity lookup. If `--pds` is given, Malachite will attempt to authenticate directly against the supplied PDS using your handle/DID and app password. 360 - - Use the full base URL (including scheme), e.g. `https://pds.example.com`. 361 - - If authentication fails when using `--pds`, try removing the flag so Malachite can resolve your PDS automatically via Slingshot. 362 - 363 - ### Legacy Flags (Backwards Compatible) 364 - 365 - These old flags still work but are deprecated: 366 - 367 - - `--file` → Use `--input` 368 - - `--identifier` → Use `--handle` 369 - - `--spotify-file` → Use `--spotify-input` 370 - - `--reverse-chronological` → Use `--reverse` 371 - - `--spotify` → Use `--mode spotify` 372 - - `--combined` → Use `--mode combined` 373 - - `--sync` → Use `--mode sync` 374 - - `--remove-duplicates` → Use `--mode deduplicate` 375 - 376 - ## Getting Your Data 377 - 378 - ### Last.fm Export 379 - 380 - 1. Visit [Last.fm Export Tool](https://lastfm.ghan.nl/export/) 381 - 2. Request your data export in CSV format 382 - 3. Download the CSV file when ready 383 - 4. Use the CSV file path with this importer 384 - 385 - ### Spotify Export 386 - 387 - 1. 
Go to [Spotify Privacy Settings](https://www.spotify.com/account/privacy/) 388 - 2. Scroll to "Download your data" and request your data 389 - 3. Select "Extended streaming history" (can take up to 30 days) 390 - 4. When ready, download and extract the ZIP file 391 - 5. Use either: 392 - - A single JSON file: `Streaming_History_Audio_2021-2023_0.json` 393 - - The entire extracted directory (recommended) 394 - 395 - **Note:** The importer automatically: 396 - 397 - - Reads all `Streaming_History_Audio_*.json` files in a directory 398 - - Filters out podcasts, audiobooks, and non-music content 399 - - Combines all music tracks into a single import 400 - 401 - ## Data Format 402 - 403 - Each scrobble becomes an `fm.teal.alpha.feed.play` record with: 404 - 405 - ### Required Fields 406 - 407 - - **trackName**: The name of the track 408 - - **artists**: Array of artist objects (requires `artistName`, optional `artistMbId` for Last.fm) 409 - - **playedTime**: ISO 8601 timestamp of when you listened 410 - - **submissionClientAgent**: Identifies this importer (`malachite/v0.10.0` for CLI, `malachite/v0.3.0 (web)` for the web app) 411 - - **musicServiceBaseDomain**: Set to `last.fm` or `spotify.com` 412 - 413 - ### Optional Fields 414 - 415 - - **releaseName**: Album/release name 416 - - **releaseMbId**: MusicBrainz release ID (Last.fm only) 417 - - **recordingMbId**: MusicBrainz recording/track ID (Last.fm only) 418 - - **originUrl**: Link to the track on Last.fm or Spotify 419 - 420 - ### Example Records 421 - 422 - **Last.fm Record:** 423 - 424 - ```json 425 - { 426 - "$type": "fm.teal.alpha.feed.play", 427 - "trackName": "Paint My Masterpiece", 428 - "artists": [ 429 - { 430 - "artistName": "Cjbeards", 431 - "artistMbId": "c8d4f4bf-1b82-4d4d-9d73-05909faaff89" 432 - } 433 - ], 434 - "releaseName": "Masquerade", 435 - "releaseMbId": "fdb2397b-78d5-4019-8fad-656d286e4d33", 436 - "recordingMbId": "3a390ad3-fe56-45f2-a073-bebc45d6bde1", 437 - "playedTime": 
"2025-11-13T23:49:36Z", 438 - "originUrl": "https://www.last.fm/music/Cjbeards/_/Paint+My+Masterpiece", 439 - "submissionClientAgent": "malachite/v0.10.0", 440 - "musicServiceBaseDomain": "last.fm" 441 - } 442 - ``` 443 - 444 - **Spotify Record:** 445 - 446 - ```json 447 - { 448 - "$type": "fm.teal.alpha.feed.play", 449 - "trackName": "Don't Give Up", 450 - "artists": [ 451 - { 452 - "artistName": "Chicane" 453 - } 454 - ], 455 - "releaseName": "Twenty", 456 - "playedTime": "2021-09-09T10:34:08Z", 457 - "originUrl": "https://open.spotify.com/track/3gZqDJkMZipOYCRjlHWgOV", 458 - "submissionClientAgent": "malachite/v0.10.0", 459 - "musicServiceBaseDomain": "spotify.com" 460 - } 461 - ``` 462 - 463 - ## How It Works 464 - 465 - ### Processing Flow 466 - 467 - 1. **Parses input file(s)**: 468 - - Last.fm: CSV using `csv-parse` library 469 - - Spotify: JSON files (single or multiple in directory) 470 - 2. **Filters data**: 471 - - Spotify: Automatically removes podcasts, audiobooks, and non-music content 472 - 3. **Converts to schema**: Maps to `fm.teal.alpha.feed.play` format 473 - 4. **Deduplicates input**: Removes duplicate entries from the source data (keeps first occurrence) 474 - 5. **Checks Teal**: Downloads the entire repo as a CAR file (`com.atproto.sync.getRepo`) and skips any records already imported — costs zero AppView write-quota points 475 - 6. **Sorts records**: Chronologically (oldest first) or reverse with `-r` flag 476 - 7. **Generates TID-based keys**: From `playedTime` for chronological ordering 477 - 8. **Validates fields**: Ensures required fields are present 478 - 9. 
**Publishes in batches**: Uses `com.atproto.repo.applyWrites` (up to 200 records per call)
-
-### Automatic Duplicate Prevention
-
-The importer has **two layers of duplicate prevention** to ensure you never import the same record twice:
-
-#### Step 1: Input File Deduplication
-
-Removes duplicates within your source file(s):
-
-**How duplicates are identified:**
-
-- Same track name (case-insensitive)
-- Same artist name (case-insensitive)
-- Same timestamp (exact match)
-
-**What happens:**
-
-- First occurrence is kept
-- Subsequent duplicates are removed
-- Shows message: "No duplicates found in input data" or "Removed X duplicate(s)"
-
-#### Step 2: Teal Comparison via CAR Export
-
-**Automatically checks your existing Teal records** by downloading your entire repo as a CARv1 file:
-
-- One HTTP request fetches the whole repo (`com.atproto.sync.getRepo`)
-- The CAR file is parsed locally in memory — no AppView quota consumed
-- Compares every record against your input and skips anything already imported
-- Shows: "Skipped X already-imported record(s)"
-
-**This means:**
-
-- ✅ Safe to re-run imports with updated exports
-- ✅ Won't create duplicates if you run the import twice
-- ✅ Zero AppView write-quota cost for the sync check
-- ✅ Works automatically - no special mode needed
-
-**Note:**
-
-- Credentials are required even for `--dry-run` to fetch the CAR export
-- **Sync mode** (`-m sync`): Shows detailed statistics about what's being skipped
-- **Deduplicate mode** (`-m deduplicate`): Removes duplicates from already-imported Teal records (cleanup tool)
-
-### Rate Limiting Algorithm
-
-1. Calculates safe daily limit (75% of 10K = 7,500 records/day by default)
-2. Determines how many days needed for your import
-3. **Monitors rate limit quota in real-time** before each batch
-4. **Dynamically adjusts batch size** (1-200 records) based on available points
-5. **Preserves 15% headroom buffer** to prevent exhaustion
-6. **Automatically waits** when quota is exhausted (with countdown timer)
-7. **Instantly scales back up** to maximum batch size after quota resets
-8. Enforces minimum delay between batches
-9. Shows clear schedule and real-time batch size adjustments
-
-### Multi-Day Imports
-
-For imports exceeding the daily limit, the importer automatically:
-
-1. **Calculates a schedule**: Splits your import across multiple days
-2. **Shows the plan**: Displays which records will be imported each day
-3. **Processes Day 1**: Imports the first batch of records
-4. **Pauses 24 hours**: Waits a full day before continuing
-5. **Repeats**: Continues until all records are imported
-
-**Important notes:**
-
-- You can safely stop (Ctrl+C) and restart
-- Progress is preserved - continues where it left off
-- Each day's progress is clearly displayed
-- Time estimates account for multi-day duration
-
-## Logging and Output
-
-The importer uses color-coded output for clarity:
-
-- **Green (✓)**: Success messages
-- **Cyan (→)**: Progress updates
-- **Yellow (⚠️)**: Warnings
-- **Red (✗)**: Errors
-- **Bold Red (🛑)**: Fatal errors
-
-### Verbosity Levels
-
-**Default Mode**: Standard operational messages
-
-```bash
-pnpm start -i lastfm.csv -h alice.bsky.social -p pass
-```
-
-**Verbose Mode** (`-v`): Detailed debug information including batch timing and API calls
-
-```bash
-pnpm start -i lastfm.csv -h alice.bsky.social -p pass -v
-```
-
-**Quiet Mode** (`-q`): Only warnings and errors
-
-```bash
-pnpm start -i lastfm.csv -h alice.bsky.social -p pass -q
-```
-
-**Development Mode** (`--dev`): Verbose logging + file logging to `~/.malachite/logs/` + smaller batch sizes
-
-```bash
-pnpm start -i lastfm.csv --dev --dry-run
-```
+# Malachite — Archived
 
-Development mode is perfect for:
+**This repository has been consolidated into [`ewanc26/pkgs`](https://github.com/ewanc26/pkgs).**
 
-- Debugging import issues with detailed logs
-- Testing changes with smaller batches (20 records max)
-- Preserving logs for later analysis
-- Troubleshooting problems with support
+The code lives on — it's just no longer maintained here as a standalone repo.
 
-## Error Handling
+| What | Where |
+|---|---|
+| CLI importer | [`packages/malachite/`](https://github.com/ewanc26/pkgs/tree/main/packages/malachite) |
+| Web frontend | [`packages/malachite-web/`](https://github.com/ewanc26/pkgs/tree/main/packages/malachite-web) |
+| Issues & PRs | [ewanc26/pkgs](https://github.com/ewanc26/pkgs/issues) |
 
-The importer is designed to be resilient:
-
-- **Network errors**: Failed records are logged but don't stop the import
-- **Invalid data**: Skipped with error messages
-- **Authentication issues**: Clear error messages with suggested fixes
-- **Rate limit hits**: Automatic adjustment and retry logic
-- **Ctrl+C handling**: Gracefully stops after current batch
-
-## Troubleshooting
-
-### Authentication Issues
-
-**"Handle not found"**
-
-- Verify your ATProto handle is correct (e.g., `alice.bsky.social`)
-- Ensure you're using a valid DID or handle
-
-**"Invalid credentials"**
-
-- Use an **app password**, not your main account password
-- Generate app passwords in your account settings
-
-### Performance Issues
-
-**"Rate limit exceeded"**
-
-- The importer should prevent this automatically
-- If you see this, wait 24 hours before retrying
-- Consider reducing batch size with `-b` flag
-
-**Import seems stuck**
-
-- Check progress messages - large imports take time
-- Multi-day imports pause for 24 hours between days
-- You can safely stop (Ctrl+C) and resume later
-- Use `--verbose` flag to see detailed progress
-
-### Connection Issues
-
-**"Connection refused"**
-
-- Check your internet connection
-- Verify your PDS is accessible
-- Some PDSs may have firewall rules
-
-### Output Control
-
-**Too much output**
-
-- Use `--quiet` flag to suppress non-essential messages
-- Only warnings and errors will be shown
-
-**Need more details**
-
-- Use `--verbose` flag to see debug-level information
-- Shows batch timing, API calls, and detailed progress
-
-## Development
-
-```bash
-# Type checking
-pnpm run type-check
-
-# Build
-pnpm run build
-
-# Development mode (rebuild + run)
-pnpm run dev
-
-# Run tests
-pnpm run test
-
-# Clean build artifacts
-pnpm run clean
-```
-
-## File Storage
-
-Malachite stores all its data in `~/.malachite/`:
-
-```
-~/.malachite/
-├── cache/ # Cached Teal records (24-hour TTL)
-├── state/ # Import state for resume functionality
-├── logs/ # Import logs (when file logging is enabled)
-└── credentials.json # Encrypted credentials (optional, machine-specific)
-```
-
-This keeps your project directory clean and follows standard Unix conventions.
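As a rough illustration of the `~/.malachite/` layout described in the removed File Storage section, the paths could be derived from a single base directory. This is a minimal sketch, not Malachite's actual code: `malachitePaths`, `isCacheFresh`, and `CACHE_TTL_MS` are hypothetical names, POSIX-style separators are assumed, and only the directory names and the 24-hour cache TTL come from the README text.

```typescript
// Hypothetical helpers; none of these names are taken from Malachite's source.
interface MalachitePaths {
  base: string;
  cache: string;
  state: string;
  logs: string;
  credentials: string;
}

// 24-hour cache TTL, as stated in the README's directory listing.
const CACHE_TTL_MS = 24 * 60 * 60 * 1000;

// Builds the ~/.malachite/ layout from a home directory (POSIX separators assumed).
function malachitePaths(homeDir: string): MalachitePaths {
  // Drop any trailing slashes so the join below never produces "//".
  const base = `${homeDir.replace(/\/+$/, "")}/.malachite`;
  return {
    base,
    cache: `${base}/cache`,
    state: `${base}/state`,
    logs: `${base}/logs`,
    credentials: `${base}/credentials.json`,
  };
}

// A cached entry counts as fresh while it is strictly younger than the TTL.
function isCacheFresh(writtenAtMs: number, nowMs: number): boolean {
  return nowMs - writtenAtMs < CACHE_TTL_MS;
}
```

In a Node.js CLI the home directory would typically come from `os.homedir()`; it is passed in as a parameter here so the sketch stays free of runtime dependencies.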
-
-### Credential Storage
-
-Malachite automatically saves your ATProto credentials after a successful login so you don't need to re-enter them on the next run:
-
-**Security Features:**
-
-- ✅ **AES-256-GCM encryption** - Military-grade encryption
-- ✅ **Machine-specific** - Credentials are bound to your computer and can't be transferred
-- ✅ **Secure key derivation** - Uses PBKDF2 with 100,000 iterations
-- ✅ **File permissions** - Credentials file is readable only by you (Unix)
-
-**How It Works:**
-
-1. Credentials are encrypted using a key derived from your hostname + username and saved to `~/.malachite/credentials.json` after every successful login
-2. On the next run, Malachite loads saved credentials automatically
-3. In interactive mode, you'll be prompted whether to use the saved credentials or enter new ones
-
-**Managing Credentials:**
-
-```bash
-# Clear saved credentials
-pnpm start --clear-credentials
-
-# Or through interactive mode (option 7)
-pnpm start
-```
-
-**Important Notes:**
-
-- Credentials are machine-specific and won't work if you copy the file to another computer
-- This is a convenience feature - you can always enter credentials manually
-- If you change your password, clear and re-save credentials
-
-## Project Structure
-
-```
-malachite/
-├── src/
-│   ├── lib/
-│   │   ├── auth.ts # Authentication & identity resolution
-│   │   ├── cli.ts # Command line interface & argument parsing
-│   │   ├── csv.ts # CSV parsing & record conversion
-│   │   ├── publisher.ts # Batch publishing with rate limiting
-│   │   ├── spotify.ts # Spotify JSON parsing
-│   │   ├── merge.ts # Combined import deduplication
-│   │   └── sync.ts # Re-sync mode & duplicate detection
-│   ├── utils/
-│   │   ├── car-fetch.ts # CAR export fetcher (com.atproto.sync.getRepo)
-│   │   ├── logger.ts # Structured logging system
-│   │   ├── helpers.ts # Utility functions (timing, formatting)
-│   │   ├── input.ts # User input handling (prompts, passwords)
-│   │   ├── rate-limiter.ts # Rate limiting with server-learned quota
-│   │   ├── killswitch.ts # Graceful shutdown handling
-│   │   ├── tid.ts # TID generation from timestamps
-│   │   └── ui.ts # UI elements (spinners, progress bars)
-│   ├── config.ts # Configuration constants & version
-│   └── types.ts # TypeScript type definitions
-├── web/ # SvelteKit web app
-│   └── src/lib/
-│       ├── core/ # Browser-safe equivalents of src/lib & src/utils
-│       │   ├── auth.ts # Password-based ATProto login
-│       │   ├── car-fetch.ts# CAR export fetcher (browser-safe)
-│       │   ├── csv.ts # CSV parser (no csv-parse dep)
-│       │   ├── import.ts # Import orchestration
-│       │   ├── merge.ts # Combined import deduplication
-│       │   ├── oauth.ts # ATProto OAuth client
-│       │   ├── publisher.ts# Batch publisher with progress callbacks
-│       │   ├── rate-limiter.ts # In-memory rate limiter
-│       │   ├── spotify.ts # Spotify JSON parser
-│       │   ├── sync.ts # CAR-based sync & dedup
-│       │   └── tid.ts # TID generation (Web Crypto API)
-│       ├── config.ts # Shared constants (version injected by Vite)
-│       ├── modes.ts # Import mode definitions
-│       └── types.ts # TypeScript type definitions
-├── lexicons/ # fm.teal.alpha lexicon definitions
-│   └── fm.teal.alpha/
-│       └── feed/
-│           └── play.json # Play record schema
-├── package.json
-├── tsconfig.json
-└── README.md
-```
-
-## Technical Details
-
-### Authentication
-
-- Uses Slingshot resolver to discover your PDS from your handle/DID
-- Requires an ATProto app password (not your main password) for the CLI
-- Web app supports ATProto OAuth — no app password needed
-- Automatically configures the agent for your personal PDS
-
-### Batch Publishing
-
-- Uses `com.atproto.repo.applyWrites` for efficiency (up to 20x faster than individual calls)
-- Batches up to 200 records per API call (PDS maximum)
-- **Dynamic batch sizing** (1-200 records) based on real-time rate limit quota
-- **Intelligent quota monitoring** with 15% headroom buffer
-- **Automatic adjustment** - scales down as quota depletes, scales up after reset
-- Enforces minimum delays between batches for rate limit safety
-
-### CAR Export Sync
-
-All read paths (duplicate checks, sync, deduplicate) use `com.atproto.sync.getRepo` to download the user's entire repo as a CARv1 file. The CAR is parsed locally using `@ipld/car` and `@ipld/dag-cbor` — no AppView XRPC calls are made for reads, so the sync check costs zero write-quota points.
-
-### Data Mapping
-
-**Last.fm:**
-
-- Direct mapping from CSV columns
-- Converts Unix timestamps to ISO 8601
-- Preserves MusicBrainz IDs when present
-- Generates URLs from artist/track names
-- Wraps artists in array format with optional MBID
-
-**Spotify:**
-
-- Extracts data from JSON fields
-- Already in ISO 8601 format (`ts` field)
-- Generates URLs from `spotify_track_uri`
-- Automatically filters non-music content
-- Extracts artist and album from metadata fields
-
-### Lexicon Reference
-
-This importer follows the official `fm.teal.alpha` lexicon defined in `/lexicons/fm.teal.alpha/feed/play.json`.
-
-The lexicon defines required and optional field types, string length constraints, array formats, timestamp formatting, and URL validation.
-
-## Contributing
-
-Contributions are welcome — see [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions, architecture notes, and PR guidelines.
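To make the Spotify bullets under Data Mapping in the removed README more concrete, here is a minimal sketch of how one export entry might be converted into a `fm.teal.alpha.feed.play` record. It is an illustration under assumptions, not the project's implementation: `spotifyToPlay` and both interfaces are hypothetical, and the `master_metadata_*` field names are assumed from Spotify's extended streaming history export (the README itself only names `ts` and `spotify_track_uri`).

```typescript
// Assumed shape of one Spotify export entry. Only `ts` and `spotify_track_uri`
// are named in the README; the `master_metadata_*` fields are assumptions.
interface SpotifyEntry {
  ts: string;
  spotify_track_uri: string | null;
  master_metadata_track_name: string | null;
  master_metadata_album_artist_name: string | null;
  master_metadata_album_album_name: string | null;
}

// Subset of the play record fields shown in the README's example records.
interface PlayRecord {
  $type: "fm.teal.alpha.feed.play";
  trackName: string;
  artists: { artistName: string }[];
  releaseName?: string;
  playedTime: string;
  originUrl: string;
  submissionClientAgent: string;
  musicServiceBaseDomain: string;
}

const TRACK_URI_PREFIX = "spotify:track:";

// Hypothetical converter: returns null for entries that are not music tracks
// (for example podcast episodes, which carry no track URI or track name).
function spotifyToPlay(entry: SpotifyEntry, clientAgent: string): PlayRecord | null {
  const uri = entry.spotify_track_uri;
  const track = entry.master_metadata_track_name;
  const artist = entry.master_metadata_album_artist_name;
  if (!uri || !uri.startsWith(TRACK_URI_PREFIX) || !track || !artist) {
    return null;
  }
  const record: PlayRecord = {
    $type: "fm.teal.alpha.feed.play",
    trackName: track,
    artists: [{ artistName: artist }],
    playedTime: entry.ts, // already ISO 8601 in the export
    originUrl: `https://open.spotify.com/track/${uri.slice(TRACK_URI_PREFIX.length)}`,
    submissionClientAgent: clientAgent,
    musicServiceBaseDomain: "spotify.com",
  };
  const album = entry.master_metadata_album_album_name;
  if (album) {
    record.releaseName = album;
  }
  return record;
}
```

Returning `null` for non-track entries keeps the filtering step and the conversion step in one place; a caller would simply drop the nulls before deduplication.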
-
-## License
-
-AGPL-3.0-only - See LICENCE file for details
-
-## ☕ Support
-
-If you found this useful, consider [buying me a ko-fi](https://ko-fi.com/ewancroft)!
-
-## Credits
-
-- Uses [@atproto/api](https://www.npmjs.com/package/@atproto/api) for ATProto interactions
-- CSV parsing via [csv-parse](https://www.npmjs.com/package/csv-parse)
-- Identity resolution via [Slingshot](https://slingshot.danner.cloud)
-- Follows the `fm.teal.alpha` lexicon standard
-- Colored output via [chalk](https://www.npmjs.com/package/chalk)
-- Progress indicators via [ora](https://www.npmjs.com/package/ora) and [cli-progress](https://www.npmjs.com/package/cli-progress)
-- Web app built with [SvelteKit](https://kit.svelte.dev) and [Tailwind CSS](https://tailwindcss.com)
-- ATProto OAuth via [@atproto/oauth-client-browser](https://www.npmjs.com/package/@atproto/oauth-client-browser)
-
----
-
-**Note**: This tool is for personal use. Respect the terms of service and rate limits when importing your data.
+Git history has been fully preserved in the monorepo via `git filter-repo`.
+This repository is now archived and will not receive further updates.