11# Contributing to Malachite
2233-Thanks for wanting to help. This document covers the layout of the project, how to get a dev environment running, and what to keep in mind when opening a PR.
44-55-## Table of Contents
66-77-- [Project layout](#project-layout)
88-- [Prerequisites](#prerequisites)
99-- [Getting started](#getting-started)
1010-- [Running the CLI](#running-the-cli)
1111-- [Running the web app](#running-the-web-app)
1212-- [Running tests](#running-tests)
1313-- [Code architecture](#code-architecture)
1414-- [Making changes](#making-changes)
1515-- [Opening a pull request](#opening-a-pull-request)
1616-- [Publishing `@ewanc26/tid`](#publishing-ewanc26tid)
1717-1818----
1919-2020-## Project layout
2121-2222-This is a pnpm monorepo with three separate projects:
2323-2424-```
2525-malachite/
2626-├── src/ # CLI — TypeScript, compiled to dist/ by tsc
2727-│ ├── core/ # Environment-agnostic logic (shared with web)
2828-│ ├── lib/ # CLI wrappers around core (Node.js-specific)
2929-│ └── utils/ # Legacy CLI utilities (gradually migrating to core/)
3030-├── web/ # SvelteKit web app
3131-│ └── src/lib/
3232-│ ├── core/ # Thin re-exports of src/core/ via the $core alias
3333-│ └── ... # Svelte components, routes, config
3434-├── packages/
3535-│ └── tid/ # @ewanc26/tid — standalone npm package
3636-└── lexicons/ # fm.teal.alpha lexicon definitions
3737-```
3838-3939-The key architectural rule is that **`src/core/` is the single source of truth** for all non-UI logic. The CLI wrappers in `src/lib/` adapt it for terminal use (spinners, file I/O, credentials). The web app re-exports it via the `$core` path alias, so there should be no duplicated logic between the two surfaces.
4040-4141----
4242-4343-## Prerequisites
4444-4545-- **Node.js 20+** (required for Web Crypto API via `globalThis.crypto`)
4646-- **pnpm 9+** — install with `npm i -g pnpm` if you don't have it
4747-4848----
4949-5050-## Getting started
5151-5252-```sh
5353-# Clone and install everything in one shot — pnpm workspaces handles all three projects
5454-git clone https://github.com/ewanc26/malachite
5555-cd malachite
5656-pnpm install
5757-```
5858-5959----
6060-6161-## Running the CLI
6262-6363-```sh
6464-# Build
6565-pnpm build
6666-6767-# Dry run against a real Last.fm export
6868-node dist/index.js -i my-export.csv --dry-run
6969-7070-# Interactive mode
7171-node dist/index.js
7272-7373-# Rebuild and run in one step
7474-pnpm dev
7575-```
7676-7777-For development work it's worth using `--dev` mode, which enables verbose logging and caps batch sizes at 20 records so you're not waiting through thousands of API calls:
7878-7979-```sh
8080-node dist/index.js -i my-export.csv --dev --dry-run
8181-```
8282-8383----
8484-8585-## Running the web app
8686-8787-```sh
8888-cd web
8989-pnpm dev # starts at http://127.0.0.1:5173
9090-```
9191-9292-The dev server **must** run on `127.0.0.1:5173` exactly — the ATProto OAuth loopback `redirect_uri` is pinned to that origin per RFC 8252. Don't change the host or port without also updating `web/src/lib/core/oauth.ts`.
9393-9494-For changes to shared `src/core/` files, the web app picks them up immediately via the `$core` alias — no separate build step needed.
9595-9696----
9797-9898-## Running tests
9999-100100-Tests use Node's built-in test runner — no Jest, no Vitest.
101101-102102-```sh
103103-# Build first, then run all tests
104104-pnpm test
105105-106106-# Run just the TID tests
107107-pnpm test:tid
108108-109109-# Watch mode
110110-pnpm test:watch
111111-```
112112-113113-Tests live in `src/tests/`. If you're adding a new file to `src/core/`, add a corresponding test file there. The existing tests are a good reference for the style — `node:test` + `node:assert`, no extra dependencies.
114114-115115----
116116-117117-## Code architecture
118118-119119-### The `src/core/` contract
120120-121121-Any code that lives in `src/core/` must be:
122122-123123-- **Zero Node.js dependencies** — no `fs`, `path`, `crypto` module, etc. Web Crypto (`globalThis.crypto`) is fine because Node 20+ and all modern browsers support it.
124124-- **Callback-based for progress/logging** — functions accept optional `onProgress` callbacks rather than calling `console.log` directly.
125125-- **Tested** — new files should have coverage in `src/tests/`.
126126-127127-If you need Node.js-specific behaviour (file I/O, terminal spinners, credential storage), put it in `src/lib/` as a thin wrapper that calls into `src/core/`.
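As a rough illustration of the contract above, a core function might report progress through an optional callback and leave all terminal output to its `src/lib/` wrapper. The names `publishAll`, `ImportProgress`, and `publishWithLogging` are hypothetical and are not taken from the codebase.

```ts
// Hypothetical progress payload; not the project's real type.
interface ImportProgress {
  done: number;
  total: number;
}

// Core-style function: no Node.js imports and no console output.
// `send` stands in for whatever actually publishes one batch.
async function publishAll<T>(
  items: T[],
  batchSize: number,
  send: (batch: T[]) => Promise<void>,
  onProgress?: (progress: ImportProgress) => void,
): Promise<void> {
  for (let i = 0; i < items.length; i += batchSize) {
    await send(items.slice(i, i + batchSize));
    onProgress?.({ done: Math.min(i + batchSize, items.length), total: items.length });
  }
}

// Wrapper in the style of src/lib/: the only place that touches the terminal.
async function publishWithLogging<T>(
  items: T[],
  send: (batch: T[]) => Promise<void>,
): Promise<void> {
  await publishAll(items, 20, send, ({ done, total }) => {
    console.log(`published ${done}/${total}`);
  });
}
```

Because the core function never logs on its own, the web app can pass a callback that updates a progress bar instead.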
128128-129129-### Adding something to the web
130130-131131-The web's `web/src/lib/core/` files are almost all one-liners:
132132-133133-```ts
134134-export * from '$core/your-new-file.js';
135135-```
136136-137137-The only exceptions are `csv.ts` and `spotify.ts`, which add browser `File` API loaders on top of the core parsers, and `oauth.ts` / `import.ts`, which are web-only. Follow that same pattern for anything new.
138138-139139-### Rate limiting
140140-141141-The rate limiter in `src/core/rate-limiter.ts` learns quota from response headers on the first successful batch and then gates all subsequent batches. If you're changing publish behaviour, keep the 15% headroom buffer intact — exceeding the PDS daily limit affects every user on a shared instance, not just the one importing.
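A minimal sketch of reading quota from response headers, assuming the PDS sends `ratelimit-limit`, `ratelimit-remaining`, and `ratelimit-reset`. The header names and the `readQuota` helper are assumptions for illustration; the real logic lives in `src/core/rate-limiter.ts` and may differ.

```ts
// Structural type so the sketch works with fetch's Headers or a test double.
interface HeaderReader {
  get(name: string): string | null;
}

interface Quota {
  limit: number;     // total points in the window
  remaining: number; // points left in the window
  resetAt: number;   // Unix seconds at which the window resets
}

// Returns null when the headers are missing or malformed, so callers can
// fall back to conservative defaults instead of guessing.
function readQuota(headers: HeaderReader): Quota | null {
  const limit = Number(headers.get("ratelimit-limit"));
  const remaining = Number(headers.get("ratelimit-remaining"));
  const resetAt = Number(headers.get("ratelimit-reset"));
  if (![limit, remaining, resetAt].every((n) => Number.isFinite(n))) return null;
  // Number(null) is 0, so a missing limit header ends up here.
  if (limit <= 0) return null;
  return { limit, remaining, resetAt };
}
```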
142142-143143-### TIDs
144144-145145-Record keys are generated from `playedTime` using the TID clock in `src/core/tid.ts`. The clock is monotonic — even if records arrive out of order, every call produces a strictly increasing TID within the same process/page. Don't make the generation async (it isn't) and don't reset `lastUs` outside of tests.
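The monotonic behaviour described above can be sketched as follows. This is not the code in `src/core/tid.ts`; it assumes the AT Protocol TID layout (a microsecond timestamp followed by a 10-bit clock ID, encoded as 13 base32-sortable characters), and the function names are hypothetical. Only `lastUs` is a name taken from the text.

```ts
const S32_ALPHABET = "234567abcdefghijklmnopqrstuvwxyz";

// Encode a non-negative integer as fixed-width base32-sortable text.
function s32encode(value: number, width: number): string {
  let out = "";
  let n = value;
  for (let i = 0; i < width; i++) {
    out = S32_ALPHABET.charAt(n % 32) + out;
    n = Math.floor(n / 32);
  }
  return out;
}

// Last timestamp handed out, in microseconds. Shared by every call in the process/page.
let lastUs = 0;

// Synchronous on purpose: an await between reading and writing lastUs
// would let two callers observe the same value.
function tidFromPlayedTime(playedTime: string, clockId = 0): string {
  const ms = Date.parse(playedTime);
  if (Number.isNaN(ms)) throw new Error(`invalid playedTime: ${playedTime}`);
  // Never go backwards: an older record still gets a key above the last one.
  lastUs = Math.max(ms * 1000, lastUs + 1);
  return s32encode(lastUs, 11) + s32encode(clockId, 2);
}
```

Since the alphabet sorts in ASCII order, comparing two keys as plain strings gives the same result as comparing the underlying timestamps.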
146146-147147----
148148-149149-## Making changes
150150-151151-### Bugfixes
152152-153153-Open a PR against `main` with a description of what was broken and how you've verified the fix. A failing test that now passes is ideal.
154154-155155-### New features
156156-157157-Open an issue first if the change is significant — it's worth a quick discussion before writing a lot of code. For smaller additions (a new CLI flag, a missing field in the data mapping, an extra export from `@ewanc26/tid`) just open the PR directly.
158158-159159-### Changing shared core logic
160160-161161-Changes to `src/core/` affect all three surfaces (CLI, web, `@ewanc26/tid` if relevant). Run both `pnpm test` and `cd web && pnpm check` before submitting to make sure nothing is broken on either side.
162162-163163-### Style
164164-165165-- TypeScript strict mode is on — no `any` without a comment explaining why.
166166-- No new runtime dependencies in `src/core/` or `packages/tid/`.
167167-- Prefer named exports over default exports.
168168-- Keep comments on the *why*, not the *what*.
169169-170170----
171171-172172-## Opening a pull request
173173-174174-1. Fork the repo and create a branch from `main`.
175175-2. Make your changes, including tests where applicable.
176176-3. Verify `pnpm test` passes and `pnpm run type-check` is clean.
177177-4. If you touched the web app, verify `cd web && pnpm check` is clean too.
178178-5. Open a PR with a clear description of what changed and why.
33+**This repository has been archived.** Development continues in [`ewanc26/pkgs`](https://github.com/ewanc26/pkgs) — please open issues and PRs there.
1794180180-There's no formal CLA or contributor agreement — AGPL-3.0 covers contributions automatically.
181181-182182----
183183-184184-## Publishing `@ewanc26/tid`
185185-186186-> **Note:** `@ewanc26/tid` is now canonically maintained in the [`@ewanc26/pkgs`](https://github.com/ewanc26/pkgs) monorepo. The copy in `packages/tid/` here is kept for historical context. All version bumps, releases, and npm publishes should happen from there.
187187-188188-To cut a new release, work from the pkgs monorepo:
189189-190190-```sh
191191-cd /path/to/pkgs
192192-git subtree pull --prefix=packages/tid malachite malachite-split
193193-194194-cd packages/tid
195195-# Bump the version in package.json, then:
196196-pnpm build
197197-npm publish --access public --otp=<your-2fa-code>
198198-```
199199-200200-The package has no runtime dependencies and must stay that way. If a change to `src/core/tid.ts` in this repo affects the public API of the package, update `packages/tid/src/index.ts` in `pkgs` to match and bump the version accordingly (semver — patch for fixes, minor for new exports, major for breaking changes).
55+- CLI: [`packages/malachite/`](https://github.com/ewanc26/pkgs/tree/main/packages/malachite)
66+- Web: [`packages/malachite-web/`](https://github.com/ewanc26/pkgs/tree/main/packages/malachite-web)
+10-836
README.md
11-# Malachite
22-33-Import your Last.fm and Spotify listening history to the AT Protocol network using the `fm.teal.alpha.feed.play` lexicon.
44-55-**Repository:** [malachite](https://github.com/ewanc26/malachite)
66-[Also available on Tangled](https://tangled.org/did:plc:ofrbh253gwicbkc5nktqepol/atproto-lastfm-importer)
77-88-## Table of Contents
99-1010-- [⚠️ Important: Rate Limits](#️-important-rate-limits)
1111- - [How Dynamic Batch Sizing Works](#how-dynamic-batch-sizing-works)
1212-- [What's with the name?](#whats-with-the-name)
1313-- [Web App](#web-app)
1414-- [Quick Start](#quick-start)
1515- - [Interactive Mode (Recommended for First-Time Users)](#interactive-mode-recommended-for-first-time-users)
1616- - [Command Line Mode](#command-line-mode)
1717-- [Features](#features)
1818- - [Import Capabilities](#import-capabilities)
1919- - [Performance & Safety](#performance--safety)
2020- - [User Experience](#user-experience)
2121- - [Technical Features](#technical-features)
2222-- [Usage Examples](#usage-examples)
2323- - [Combined Import (Last.fm + Spotify)](#combined-import-lastfm--spotify)
2424- - [Re-Sync Mode](#re-sync-mode)
2525- - [Remove Duplicates](#remove-duplicates)
2626- - [Import from Spotify](#import-from-spotify)
2727- - [Import from Last.fm](#import-from-lastfm)
2828- - [Advanced Options](#advanced-options)
2929-- [Command Line Options](#command-line-options)
3030- - [Required Options](#required-options)
3131- - [Import Mode](#import-mode)
3232- - [Additional Options](#additional-options)
3333- - [PDS Override](#pds-override)
3434- - [Legacy Flags (Backwards Compatible)](#legacy-flags-backwards-compatible)
3535-- [Getting Your Data](#getting-your-data)
3636- - [Last.fm Export](#lastfm-export)
3737- - [Spotify Export](#spotify-export)
3838-- [Data Format](#data-format)
3939- - [Required Fields](#required-fields)
4040- - [Optional Fields](#optional-fields)
4141- - [Example Records](#example-records)
4242-- [How It Works](#how-it-works)
4343- - [Processing Flow](#processing-flow)
4444- - [Automatic Duplicate Prevention](#automatic-duplicate-prevention)
4545- - [Rate Limiting Algorithm](#rate-limiting-algorithm)
4646- - [Multi-Day Imports](#multi-day-imports)
4747-- [Logging and Output](#logging-and-output)
4848- - [Verbosity Levels](#verbosity-levels)
4949-- [Error Handling](#error-handling)
5050-- [Troubleshooting](#troubleshooting)
5151- - [Authentication Issues](#authentication-issues)
5252- - [Performance Issues](#performance-issues)
5353- - [Connection Issues](#connection-issues)
5454- - [Output Control](#output-control)
5555-- [Development](#development)
5656-- [File Storage](#file-storage)
5757- - [Credential Storage](#credential-storage)
5858-- [Project Structure](#project-structure)
5959-- [Technical Details](#technical-details)
6060- - [Authentication](#authentication)
6161- - [Batch Publishing](#batch-publishing)
6262- - [Data Mapping](#data-mapping)
6363- - [Lexicon Reference](#lexicon-reference)
6464-- [Contributing](#contributing)
6565-- [License](#license)
6666-- [Credits](#credits)
6767-6868-## ⚠️ Important: Rate Limits
6969-7070-**CRITICAL**: Bluesky's AppView has rate limits on PDS instances. Exceeding 10K records per day can rate limit your **ENTIRE PDS**, affecting all users on your instance.
7171-7272-This importer automatically protects your PDS by:
7373-7474-- **Dynamic batch sizing** (1-200 records) that adapts to available quota in real-time
7575-- **15% headroom buffer** prevents quota exhaustion before hitting the limit
7676-7777-- Limiting imports to **7,500 records per day** (75% of the 10K limit, leaving a 25% safety margin)
7777-- Calculating optimal batch sizes and delays
7878-- **Graceful degradation** - scales down smoothly as quota depletes
7979-- **Instant recovery** - immediately returns to maximum speed after quota resets
8080-- Pausing 24 hours between days for large imports
8181-- Providing clear progress tracking and time estimates
8282-- Persisting state across restarts for safe resume
8383-8484-### How Dynamic Batch Sizing Works
8585-8686-Malachite continuously monitors your rate limit quota and automatically adjusts batch size:
8787-8888-```
8989-Fresh Quota (5000 points) → Batch Size: 200 records (maximum speed)
9090-Half Depleted (2500 points) → Batch Size: 200 records (still optimal)
9191-Approaching Limit (1200) → Batch Size: 150 records (scaling down)
9292-Near Headroom (900) → Batch Size: 50 records (conservative)
9393-Below Headroom (700) → Batch Size: 1 record (minimal progress)
9494-[Quota Resets] → Batch Size: 200 records (instant recovery)
9595-```
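One formula that reproduces the figures in the table above is sketched below. It assumes a 5000-point window and a cost of 3 points per created record; both the cost and the `batchSizeFor` helper are assumptions for illustration, not the project's actual implementation.

```ts
const MAX_BATCH = 200;
const HEADROOM_PERCENT = 15;
const POINTS_PER_CREATE = 3; // assumed cost of creating one record

// Spend only the quota above the headroom buffer, capped at the maximum batch size.
function batchSizeFor(remaining: number, limit: number): number {
  const headroom = Math.ceil((limit * HEADROOM_PERCENT) / 100);
  const usable = remaining - headroom;
  if (usable <= 0) return 1; // below headroom: keep making minimal progress
  const affordable = Math.floor(usable / POINTS_PER_CREATE);
  return Math.min(MAX_BATCH, Math.max(1, affordable));
}

// batchSizeFor(5000, 5000) → 200
// batchSizeFor(1200, 5000) → 150
// batchSizeFor(900, 5000)  → 50
// batchSizeFor(700, 5000)  → 1
```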
9696-9797-**Benefits:**
9898-9999-- ✅ **2x faster** when quota is fresh (200 vs 100 records/batch)
100100-- ✅ **Never hits rate limits** - proactive scaling with 15% buffer
101101-- ✅ **Always makes progress** - even with minimal quota (batch size 1)
102102-- ✅ **Automatic recovery** - no manual intervention needed
103103-- ✅ **Transparent** - logs all batch size changes with reasons
104104-105105-For more details, see the [Bluesky Rate Limits Documentation](https://docs.bsky.app/blog/rate-limits-pds-v3).
106106-107107-## What's with the name?
108108-109109-It used to be called `atproto-lastfm-importer` — generic as fuck. That name told you what it did and nothing about why it mattered, and it sounded like a disposable weekend script. So I renamed it.
110110-111111-At the moment, the repository is still called `atproto-lastfm-importer` on Tangled, but the GitHub link has been updated to `malachite`. I do not know if this can be resolved.
112112-113113-**Malachite** is a greenish-blue copper mineral associated with preservation and transformation. That's exactly what this tool does: it preserves your scrobbles and transforms them into proper `fm.teal.alpha.feed.play` records on the AT Protocol. The colour match isn't an accident — malachite sits squarely in the teal/green range, a deliberate nod to the `teal` lexicon it publishes to.
114114-115115-## Web App
116116-117117-Malachite also ships a browser-based web app (`web/`) built with SvelteKit. It supports all five import modes and signs in via ATProto OAuth — no app password required.
118118-119119-**Running the web app in development:**
120120-121121-```bash
122122-cd web
123123-pnpm install
124124-pnpm dev # starts at http://127.0.0.1:5173
125125-```
126126-127127-> **Note:** The dev server must run on `127.0.0.1:5173` exactly. This is enforced in `vite.config.ts` because the OAuth loopback `redirect_uri` is pinned to that origin (RFC 8252 §7.3). Do not change the host or port without updating the OAuth client metadata.
128128-129129-The web app fetches existing records using the same CAR-export path as the CLI (`com.atproto.sync.getRepo`) so it costs zero AppView write-quota points to check for duplicates.
130130-131131-## Quick Start
132132-133133-**Note:** You must build the project first, then run with arguments.
134134-135135-### Interactive Mode (Recommended for First-Time Users)
136136-137137-Just run without any arguments and Malachite will guide you through the process:
138138-139139-```bash
140140-# Install dependencies and build
141141-pnpm install
142142-pnpm build
143143-144144-# Run in interactive mode
145145-pnpm start
146146-```
147147-148148-The interactive mode will:
149149-150150-- Present a menu of available actions
151151-- Prompt for all required information (handle, password, files)
152152-- Ask for optional settings (dry run, verbose logging, etc.)
153153-- Provide helpful descriptions for each option
154154-155155-### Command Line Mode
156156-157157-For automation or if you prefer command-line arguments:
158158-159159-```bash
160160-# Show help
161161-pnpm start --help
162162-163163-# Run with command line arguments
164164-pnpm start -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y
165165-166166-# Alternative: run directly with node
167167-node dist/index.js -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y
168168-```
169169-170170-## Features
171171-172172-### Import Capabilities
173173-174174-- ✅ **Last.fm Import**: Full support for Last.fm CSV exports with MusicBrainz IDs
175175-- ✅ **Spotify Import**: Import Extended Streaming History JSON files
176176-- ✅ **Combined Import**: Merge Last.fm and Spotify exports with intelligent deduplication
177177-- ✅ **Re-Sync Mode**: Import only new scrobbles without creating duplicates
178178-- ✅ **Duplicate Removal**: Clean up accidentally imported duplicate records
179179-180180-### Performance & Safety
181181-182182-- ✅ **Automatic Duplicate Prevention**: Fetches your existing Teal records via CAR export and skips anything already imported
183183-- ✅ **Input Deduplication**: Removes duplicate entries within the source file before submission
184184-- ✅ **Dynamic Batch Sizing**: Automatically adjusts batch size (1-200 records) based on available rate limit quota
185185-- ✅ **Batch Operations**: Uses `com.atproto.repo.applyWrites` for efficient batch publishing (up to 200 records per call)
186186-- ✅ **Zero-cost sync check**: Existing record fetching uses `com.atproto.sync.getRepo` (CAR export) — a separate, far more generous rate-limit envelope that costs zero AppView write-quota points
187187-- ✅ **Intelligent Rate Limiting**: Real-time quota monitoring with 15% headroom buffer prevents rate limit exhaustion
188188-- ✅ **Adaptive Recovery**: Automatically scales back to maximum speed after quota resets
189189-- ✅ **Multi-Day Imports**: Large imports automatically span multiple days with 24-hour pauses
190190-- ✅ **Resume Support**: Safe to stop (Ctrl+C) and restart - continues from where it left off
191191-- ✅ **Graceful Cancellation**: Press Ctrl+C to stop after the current batch completes
192192-193193-### User Experience
194194-195195-- ✅ **Structured Logging**: Color-coded output with debug/verbose modes
196196-- ✅ **Progress Tracking**: Real-time progress with time estimates
197197-- ✅ **Dry Run Mode**: Preview records without publishing
198198-- ✅ **Interactive Mode**: Simple prompts guide you through the process
199199-- ✅ **Command Line Mode**: Full automation support for scripting
200200-- ✅ **Web App**: Browser-based UI with ATProto OAuth sign-in
201201-202202-### Technical Features
203203-204204-- ✅ **TID-based Record Keys**: Timestamp-based identifiers for chronological ordering
205205-- ✅ **Identity Resolution**: Resolves ATProto handles/DIDs using Slingshot
206206-- ✅ **PDS Auto-Discovery**: Automatically connects to your personal PDS
207207-- ✅ **MusicBrainz Support**: Preserves MusicBrainz IDs when available (Last.fm)
208208-- ✅ **Chronological Ordering**: Processes oldest first (or newest with `-r` flag)
209209-- ✅ **Error Handling**: Continues on errors with detailed reporting
210210-211211-## Usage Examples
212212-213213-### Combined Import (Last.fm + Spotify)
214214-215215-Merge your Last.fm and Spotify listening history into a single, deduplicated import:
216216-217217-```bash
218218-# Preview the merged import
219219-pnpm start -i lastfm.csv --spotify-input spotify-export/ -m combined --dry-run
220220-221221-# Perform the combined import
222222-pnpm start -i lastfm.csv --spotify-input spotify-export/ -m combined -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y
223223-```
224224-225225-**What combined mode does:**
226226-227227-1. Parses both Last.fm CSV and Spotify JSON exports
228228-2. Normalizes track names and artist names for comparison
229229-3. Identifies duplicate plays (same track within 5 minutes)
230230-4. Chooses the best version of each play (prefers Last.fm with MusicBrainz IDs)
231231-5. Merges into a single chronological timeline
232232-6. Shows detailed statistics about the merge
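The merge steps above could look roughly like this. It is a simplified sketch under assumptions: the `Play` shape, the scoring rule, and the helper names are hypothetical, and the real normalization is likely more thorough than trimming and lowercasing.

```ts
interface Play {
  trackName: string;
  artistName: string;
  playedTime: string; // ISO 8601
  source: "lastfm" | "spotify";
  recordingMbId?: string;
}

const WINDOW_MS = 5 * 60 * 1000;

const normalize = (s: string): string => s.trim().toLowerCase();

const sameTrack = (a: Play, b: Play): boolean =>
  normalize(a.trackName) === normalize(b.trackName) &&
  normalize(a.artistName) === normalize(b.artistName);

// Higher score wins: Last.fm first, MusicBrainz ID as a further bonus.
const score = (p: Play): number =>
  (p.source === "lastfm" ? 2 : 0) + (p.recordingMbId ? 1 : 0);

function mergePlays(lastfm: Play[], spotify: Play[]): Play[] {
  const sorted = [...lastfm, ...spotify].sort(
    (a, b) => Date.parse(a.playedTime) - Date.parse(b.playedTime),
  );
  const merged: Play[] = [];
  for (const play of sorted) {
    const t = Date.parse(play.playedTime);
    let dupIndex = -1;
    // Walk back only through plays inside the 5-minute window.
    for (let i = merged.length - 1; i >= 0; i--) {
      if (t - Date.parse(merged[i].playedTime) > WINDOW_MS) break;
      if (sameTrack(merged[i], play)) { dupIndex = i; break; }
    }
    if (dupIndex === -1) {
      merged.push(play);
    } else if (score(play) > score(merged[dupIndex])) {
      // Swap in the better version; pushing keeps the list in time order.
      merged.splice(dupIndex, 1);
      merged.push(play);
    }
  }
  return merged;
}
```

Note that this sketch also collapses a genuine repeat of the same track within five minutes, which may or may not match the tool's behaviour.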
233233-234234-### Re-Sync Mode
235235-236236-Sync your Last.fm export with Teal without creating duplicates:
237237-238238-```bash
239239-# Preview what will be synced
240240-pnpm start -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -m sync --dry-run
241241-242242-# Perform the sync
243243-pnpm start -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -m sync -y
244244-```
245245-246246-**Perfect for:**
247247-248248-- Re-running imports with updated Last.fm exports
249249-- Recovering from interrupted imports
250250-- Adding recent scrobbles without duplicating old ones
251251-252252-**Note:** Sync mode requires authentication even in dry-run mode to fetch existing records.
253253-254254-### Remove Duplicates
255255-256256-Clean up accidentally imported duplicate records:
257257-258258-```bash
259259-# Preview duplicates (dry run)
260260-pnpm start -m deduplicate -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx --dry-run
261261-262262-# Remove duplicates (keeps first occurrence)
263263-pnpm start -m deduplicate -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx
264264-```
265265-266266-### Import from Spotify
267267-268268-```bash
269269-# Import single Spotify JSON file
270270-pnpm start -i Streaming_History_Audio_2021-2023_0.json -m spotify -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y
271271-272272-# Import directory with multiple Spotify files (recommended)
273273-pnpm start -i '/path/to/Spotify Extended Streaming History' -m spotify -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y
274274-```
275275-276276-### Import from Last.fm
277277-278278-```bash
279279-# Standard Last.fm import
280280-pnpm start -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y
281281-282282-# Preview without publishing
283283-pnpm start -i lastfm.csv --dry-run
284284-285285-# Process newest tracks first
286286-pnpm start -i lastfm.csv -h alice.bsky.social -r -y
287287-288288-# Verbose debug output
289289-pnpm start -i lastfm.csv --dry-run -v
290290-291291-# Quiet mode (only warnings and errors)
292292-pnpm start -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -q -y
293293-```
294294-295295-### Advanced Options
296296-297297-```bash
298298-# Development mode (verbose + file logging + smaller batches for debugging)
299299-pnpm start -i lastfm.csv --dev --dry-run
300300-301301-# Custom batch settings (advanced users only)
302302-pnpm start -i lastfm.csv -h alice.bsky.social -b 20 -d 3000
303303-304304-# Full automation with all flags
305305-pnpm start -i lastfm.csv -h alice.bsky.social -p xxxx-xxxx-xxxx-xxxx -y -q
306306-```
307307-308308-## Command Line Options
310310-311311-**Note:** When importing data, you must provide `--input`, `--handle`, and `--password`. Deduplicate mode takes no `--input`, as the examples above show. The `--yes` flag skips confirmation prompts for automation.
311311-312312-### Required Options
313313-314314-| Option | Short | Description | Example |
315315-| ------------------- | ----- | -------------------------------------------------- | ------------------------ |
316316-| `--input <path>` | `-i` | Path to Last.fm CSV or Spotify JSON file/directory | `-i lastfm.csv` |
317317-| `--handle <handle>` | `-h` | ATProto handle or DID | `-h alice.bsky.social` |
318318-| `--password <pass>` | `-p` | ATProto app password | `-p xxxx-xxxx-xxxx-xxxx` |
319319-320320-### Import Mode
321321-322322-| Option | Short | Description | Default |
323323-| --------------- | ----- | ----------- | -------- |
324324-| `--mode <mode>` | `-m` | Import mode | `lastfm` |
325325-326326-**Available modes:**
327327-328328-- `lastfm` - Import Last.fm export only
329329-- `spotify` - Import Spotify export only
330330-- `combined` - Merge Last.fm + Spotify exports
331331-- `sync` - Skip existing records (sync mode)
332332-- `deduplicate` - Remove duplicate records
333333-334334-### Additional Options
335335-336336-| Option | Short | Description | Default |
337337-| ------------------------ | ----- | ----------------------------------------------------------- | --------------- |
338338-| `--spotify-input <path>` | | Path to Spotify export (for combined mode) | - |
339339-| `--reverse` | `-r` | Process newest first | `false` |
340340-| `--yes` | `-y` | Skip confirmation prompts | `false` |
341341-| `--dry-run` | | Preview without importing | `false` |
342342-| `--verbose` | `-v` | Enable debug logging | `false` |
343343-| `--quiet` | `-q` | Suppress non-essential output | `false` |
344344-| `--dev` | | Development mode (verbose + file logging + smaller batches) | `false` |
345345-| `--batch-size <num>` | `-b` | Initial batch size (1-200, dynamically adjusted) | Auto-calculated |
346346-| `--batch-delay <ms>` | `-d` | Delay between batches in ms | `500` (min) |
347347-| `--help` | | Show help message | - |
348348-349349-### PDS Override
350350-351351-If you already know the base URL of your Personal Data Server (PDS) you can bypass the Slingshot identity resolver and provide it directly with the `--pds` flag. This is useful for private instances, testing, or when the resolver is unreliable.
352352-353353-| Option | Description |
354354-| ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
355355-| `--pds <url>` | PDS base URL to use for authentication and API calls (e.g. `https://pds.example.com`). When provided, Malachite will skip Slingshot lookup and use this URL directly. |
356356-357357-Notes:
358358-359359-- The `--pds` flag overrides the configured Slingshot resolver for identity lookup. If `--pds` is given, Malachite will attempt to authenticate directly against the supplied PDS using your handle/DID and app password.
360360-- Use the full base URL (including scheme), e.g. `https://pds.example.com`.
361361-- If authentication fails when using `--pds`, try removing the flag so Malachite can resolve your PDS automatically via Slingshot.
362362-363363-### Legacy Flags (Backwards Compatible)
364364-365365-These old flags still work but are deprecated:
366366-367367-- `--file` → Use `--input`
368368-- `--identifier` → Use `--handle`
369369-- `--spotify-file` → Use `--spotify-input`
370370-- `--reverse-chronological` → Use `--reverse`
371371-- `--spotify` → Use `--mode spotify`
372372-- `--combined` → Use `--mode combined`
373373-- `--sync` → Use `--mode sync`
374374-- `--remove-duplicates` → Use `--mode deduplicate`
375375-376376-## Getting Your Data
377377-378378-### Last.fm Export
379379-380380-1. Visit [Last.fm Export Tool](https://lastfm.ghan.nl/export/)
381381-2. Request your data export in CSV format
382382-3. Download the CSV file when ready
383383-4. Use the CSV file path with this importer
384384-385385-### Spotify Export
386386-387387-1. Go to [Spotify Privacy Settings](https://www.spotify.com/account/privacy/)
388388-2. Scroll to "Download your data" and request your data
389389-3. Select "Extended streaming history" (can take up to 30 days)
390390-4. When ready, download and extract the ZIP file
391391-5. Use either:
392392- - A single JSON file: `Streaming_History_Audio_2021-2023_0.json`
393393- - The entire extracted directory (recommended)
394394-395395-**Note:** The importer automatically:
396396-397397-- Reads all `Streaming_History_Audio_*.json` files in a directory
398398-- Filters out podcasts, audiobooks, and non-music content
399399-- Combines all music tracks into a single import
400400-401401-## Data Format
402402-403403-Each scrobble becomes an `fm.teal.alpha.feed.play` record with:
404404-405405-### Required Fields
406406-407407-- **trackName**: The name of the track
408408-- **artists**: Array of artist objects (requires `artistName`, optional `artistMbId` for Last.fm)
409409-- **playedTime**: ISO 8601 timestamp of when you listened
410410-- **submissionClientAgent**: Identifies this importer (`malachite/v0.10.0` for CLI, `malachite/v0.3.0 (web)` for the web app)
411411-- **musicServiceBaseDomain**: Set to `last.fm` or `spotify.com`
412412-413413-### Optional Fields
414414-415415-- **releaseName**: Album/release name
416416-- **releaseMbId**: MusicBrainz release ID (Last.fm only)
417417-- **recordingMbId**: MusicBrainz recording/track ID (Last.fm only)
418418-- **originUrl**: Link to the track on Last.fm or Spotify
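The field lists above can be written down as a type plus a small required-field check. This is a sketch based only on the lists in this README, not on the lexicon definition itself; `PlayRecord` and `missingRequiredFields` are hypothetical names.

```ts
interface PlayRecord {
  $type: "fm.teal.alpha.feed.play";
  trackName: string;
  artists: { artistName: string; artistMbId?: string }[];
  playedTime: string; // ISO 8601
  submissionClientAgent: string;
  musicServiceBaseDomain: string;
  releaseName?: string;
  releaseMbId?: string;
  recordingMbId?: string;
  originUrl?: string;
}

// Returns the names of required fields that are absent or empty.
function missingRequiredFields(record: Partial<PlayRecord>): string[] {
  const missing: string[] = [];
  if (!record.trackName) missing.push("trackName");
  if (!record.artists || record.artists.length === 0 || record.artists.some((a) => !a.artistName)) {
    missing.push("artists");
  }
  if (!record.playedTime || Number.isNaN(Date.parse(record.playedTime))) {
    missing.push("playedTime");
  }
  if (!record.submissionClientAgent) missing.push("submissionClientAgent");
  if (!record.musicServiceBaseDomain) missing.push("musicServiceBaseDomain");
  return missing;
}
```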
419419-420420-### Example Records
421421-422422-**Last.fm Record:**
423423-424424-```json
425425-{
426426- "$type": "fm.teal.alpha.feed.play",
427427- "trackName": "Paint My Masterpiece",
428428- "artists": [
429429- {
430430- "artistName": "Cjbeards",
431431- "artistMbId": "c8d4f4bf-1b82-4d4d-9d73-05909faaff89"
432432- }
433433- ],
434434- "releaseName": "Masquerade",
435435- "releaseMbId": "fdb2397b-78d5-4019-8fad-656d286e4d33",
436436- "recordingMbId": "3a390ad3-fe56-45f2-a073-bebc45d6bde1",
437437- "playedTime": "2025-11-13T23:49:36Z",
438438- "originUrl": "https://www.last.fm/music/Cjbeards/_/Paint+My+Masterpiece",
439439- "submissionClientAgent": "malachite/v0.10.0",
440440- "musicServiceBaseDomain": "last.fm"
441441-}
442442-```
443443-444444-**Spotify Record:**
445445-446446-```json
447447-{
448448- "$type": "fm.teal.alpha.feed.play",
449449- "trackName": "Don't Give Up",
450450- "artists": [
451451- {
452452- "artistName": "Chicane"
453453- }
454454- ],
455455- "releaseName": "Twenty",
456456- "playedTime": "2021-09-09T10:34:08Z",
457457- "originUrl": "https://open.spotify.com/track/3gZqDJkMZipOYCRjlHWgOV",
458458- "submissionClientAgent": "malachite/v0.10.0",
459459- "musicServiceBaseDomain": "spotify.com"
460460-}
461461-```
462462-463463-## How It Works
464464-465465-### Processing Flow
466466-467467-1. **Parses input file(s)**:
468468- - Last.fm: CSV using `csv-parse` library
469469- - Spotify: JSON files (single or multiple in directory)
470470-2. **Filters data**:
471471- - Spotify: Automatically removes podcasts, audiobooks, and non-music content
472472-3. **Converts to schema**: Maps to `fm.teal.alpha.feed.play` format
473473-4. **Deduplicates input**: Removes duplicate entries from the source data (keeps first occurrence)
474474-5. **Checks Teal**: Downloads the entire repo as a CAR file (`com.atproto.sync.getRepo`) and skips any records already imported — costs zero AppView write-quota points
475475-6. **Sorts records**: Chronologically (oldest first) or reverse with `-r` flag
476476-7. **Generates TID-based keys**: From `playedTime` for chronological ordering
477477-8. **Validates fields**: Ensures required fields are present
478478-9. **Publishes in batches**: Uses `com.atproto.repo.applyWrites` (up to 200 records per call)
479479-480480-### Automatic Duplicate Prevention
481481-482482-The importer has **two layers of duplicate prevention** to ensure you never import the same record twice:
483483-484484-#### Step 1: Input File Deduplication
485485-486486-Removes duplicates within your source file(s):
487487-488488-**How duplicates are identified:**
489489-490490-- Same track name (case-insensitive)
491491-- Same artist name (case-insensitive)
492492-- Same timestamp (exact match)
493493-494494-**What happens:**
495495-496496-- First occurrence is kept
497497-- Subsequent duplicates are removed
498498-- Shows message: "No duplicates found in input data" or "Removed X duplicate(s)"
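The matching rule above amounts to keying each entry on its lowercased track name, lowercased artist name, and exact timestamp, then keeping the first entry per key. A minimal sketch, with a hypothetical `Scrobble` shape and `dedupeInput` helper:

```ts
interface Scrobble {
  trackName: string;
  artistName: string;
  playedTime: string;
}

function dedupeInput<T extends Scrobble>(records: T[]): { unique: T[]; removed: number } {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const record of records) {
    // JSON.stringify avoids collisions that a plain separator could cause.
    const key = JSON.stringify([
      record.trackName.toLowerCase(),
      record.artistName.toLowerCase(),
      record.playedTime,
    ]);
    if (seen.has(key)) continue; // later duplicates are dropped
    seen.add(key);
    unique.push(record); // first occurrence is kept
  }
  return { unique, removed: records.length - unique.length };
}
```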
499499-500500-#### Step 2: Teal Comparison via CAR Export
501501-502502-**Automatically checks your existing Teal records** by downloading your entire repo as a CARv1 file:
503503-504504-- One HTTP request fetches the whole repo (`com.atproto.sync.getRepo`)
505505-- The CAR file is parsed locally in memory — no AppView quota consumed
506506-- Compares every record against your input and skips anything already imported
507507-- Shows: "Skipped X already-imported record(s)"
508508-509509-**This means:**
510510-511511-- ✅ Safe to re-run imports with updated exports
512512-- ✅ Won't create duplicates if you run the import twice
513513-- ✅ Zero AppView write-quota cost for the sync check
514514-- ✅ Works automatically - no special mode needed
515515-516516-**Note:**
517517-518518-- Credentials are required even for `--dry-run` to fetch the CAR export
519519-- **Sync mode** (`-m sync`): Shows detailed statistics about what's being skipped
520520-- **Deduplicate mode** (`-m deduplicate`): Removes duplicates from already-imported Teal records (cleanup tool)
521521-522522-### Rate Limiting Algorithm
523523-524524-1. Calculates safe daily limit (75% of 10K = 7,500 records/day by default)
525525-2. Determines how many days needed for your import
526526-3. **Monitors rate limit quota in real-time** before each batch
527527-4. **Dynamically adjusts batch size** (1-200 records) based on available points
528528-5. **Preserves 15% headroom buffer** to prevent exhaustion
529529-6. **Automatically waits** when quota is exhausted (with countdown timer)
530530-7. **Instantly scales back up** to maximum batch size after quota resets
531531-8. Enforces minimum delay between batches
532532-9. Shows clear schedule and real-time batch size adjustments
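
A rough sketch of the batch-sizing step, assuming the server reports a remaining-points figure and a window limit before each batch. The function name and the `pointsPerRecord` parameter are hypothetical; the real per-record cost is not stated here.

```typescript
const MAX_BATCH_SIZE = 200; // applyWrites upper bound mentioned above
const HEADROOM_PERCENT = 15; // share of the quota that is never spent

// Returns how many records the next batch may contain,
// or 0 when the caller should wait for the quota to reset.
function nextBatchSize(remaining: number, limit: number, pointsPerRecord: number): number {
  const reserve = Math.ceil((limit * HEADROOM_PERCENT) / 100);
  const usable = remaining - reserve;
  const affordable = Math.floor(usable / pointsPerRecord);
  if (affordable < 1) return 0;
  return Math.min(MAX_BATCH_SIZE, affordable);
}
```

Since the size is recomputed before every batch, it returns to the maximum as soon as a reset restores `remaining`, with no separate ramp-up step.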
533533-534534-### Multi-Day Imports
535535-536536-For imports exceeding the daily limit, the importer automatically:
537537-538538-1. **Calculates a schedule**: Splits your import across multiple days
539539-2. **Shows the plan**: Displays which records will be imported each day
540540-3. **Processes Day 1**: Imports the first batch of records
541541-4. **Pauses 24 hours**: Waits a full day before continuing
542542-5. **Repeats**: Continues until all records are imported
543543-544544-**Important notes:**
545545-546546-- You can safely stop (Ctrl+C) and restart
547547-- Progress is preserved - continues where it left off
548548-- Each day's progress is clearly displayed
549549-- Time estimates account for multi-day duration
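
The schedule calculation can be approximated like this, using the default figures quoted earlier (75% of a 10,000-record daily cap). `planDays` is a hypothetical name, and the real scheduler may weigh other factors.

```typescript
const DAILY_CAP = 10000;
const SAFE_FRACTION = 0.75;
const SAFE_DAILY_LIMIT = Math.floor(DAILY_CAP * SAFE_FRACTION); // 7,500 records

// Split a total record count into per-day quotas.
function planDays(totalRecords: number, perDay: number = SAFE_DAILY_LIMIT): number[] {
  const days: number[] = [];
  let left = totalRecords;
  while (left > 0) {
    const today = Math.min(perDay, left);
    days.push(today);
    left -= today;
  }
  return days;
}
```

For example, 20,000 records at the default limit yields three days: 7,500, 7,500, and 5,000.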
550550-551551-## Logging and Output
552552-553553-The importer uses color-coded output for clarity:
554554-555555-- **Green (✓)**: Success messages
556556-- **Cyan (→)**: Progress updates
557557-- **Yellow (⚠️)**: Warnings
558558-- **Red (✗)**: Errors
559559-- **Bold Red (🛑)**: Fatal errors
560560-561561-### Verbosity Levels
562562-563563-**Default Mode**: Standard operational messages
564564-565565-```bash
566566-pnpm start -i lastfm.csv -h alice.bsky.social -p pass
567567-```
568568-569569-**Verbose Mode** (`-v`): Detailed debug information including batch timing and API calls
570570-571571-```bash
572572-pnpm start -i lastfm.csv -h alice.bsky.social -p pass -v
573573-```
574574-575575-**Quiet Mode** (`-q`): Only warnings and errors
576576-577577-```bash
578578-pnpm start -i lastfm.csv -h alice.bsky.social -p pass -q
579579-```
580580-581581-**Development Mode** (`--dev`): Verbose logging + file logging to `~/.malachite/logs/` + smaller batch sizes
582582-583583-```bash
584584-pnpm start -i lastfm.csv --dev --dry-run
585585-```
11+# Malachite — Archived
5862587587-Development mode is perfect for:
33+**This repository has been consolidated into [`ewanc26/pkgs`](https://github.com/ewanc26/pkgs).**
5884589589-- Debugging import issues with detailed logs
590590-- Testing changes with smaller batches (20 records max)
591591-- Preserving logs for later analysis
592592-- Troubleshooting problems with support
55+The code lives on — it's just no longer maintained here as a standalone repo.
5936594594-## Error Handling
77+| What | Where |
88+|---|---|
99+| CLI importer | [`packages/malachite/`](https://github.com/ewanc26/pkgs/tree/main/packages/malachite) |
1010+| Web frontend | [`packages/malachite-web/`](https://github.com/ewanc26/pkgs/tree/main/packages/malachite-web) |
1111+| Issues & PRs | [ewanc26/pkgs](https://github.com/ewanc26/pkgs/issues) |
59512596596-The importer is designed to be resilient:
597597-598598-- **Network errors**: Failed records are logged but don't stop the import
599599-- **Invalid data**: Skipped with error messages
600600-- **Authentication issues**: Clear error messages with suggested fixes
601601-- **Rate limit hits**: Automatic adjustment and retry logic
602602-- **Ctrl+C handling**: Gracefully stops after current batch
603603-604604-## Troubleshooting
605605-606606-### Authentication Issues
607607-608608-**"Handle not found"**
609609-610610-- Verify your ATProto handle is correct (e.g., `alice.bsky.social`)
611611-- Ensure you're using a valid DID or handle
612612-613613-**"Invalid credentials"**
614614-615615-- Use an **app password**, not your main account password
616616-- Generate app passwords in your account settings
617617-618618-### Performance Issues
619619-620620-**"Rate limit exceeded"**
621621-622622-- The importer should prevent this automatically
623623-- If you see this, wait 24 hours before retrying
624624-- Consider reducing the batch size with the `-b` flag
625625-626626-**Import seems stuck**
627627-628628-- Check progress messages - large imports take time
629629-- Multi-day imports pause for 24 hours between days
630630-- You can safely stop (Ctrl+C) and resume later
631631-- Use the `--verbose` flag to see detailed progress
632632-633633-### Connection Issues
634634-635635-**"Connection refused"**
636636-637637-- Check your internet connection
638638-- Verify your PDS is accessible
639639-- Some PDSs may have firewall rules
640640-641641-### Output Control
642642-643643-**Too much output**
645645-- Use the `--quiet` flag to suppress non-essential messages
646646-- Only warnings and errors will be shown
647647-648648-**Need more details**
650650-- Use the `--verbose` flag to see debug-level information
651651-- Shows batch timing, API calls, and detailed progress
652652-653653-## Development
654654-655655-```bash
656656-# Type checking
657657-pnpm run type-check
658658-659659-# Build
660660-pnpm run build
661661-662662-# Development mode (rebuild + run)
663663-pnpm run dev
664664-665665-# Run tests
666666-pnpm run test
667667-668668-# Clean build artifacts
669669-pnpm run clean
670670-```
671671-672672-## File Storage
673673-674674-Malachite stores all its data in `~/.malachite/`:
675675-676676-```
677677-~/.malachite/
678678-├── cache/ # Cached Teal records (24-hour TTL)
679679-├── state/ # Import state for resume functionality
680680-├── logs/ # Import logs (when file logging is enabled)
681681-└── credentials.json # Encrypted credentials (optional, machine-specific)
682682-```
684684-This keeps your project directory clean and follows the common Unix convention of a per-user dot-directory in your home folder.
685685-686686-### Credential Storage
687687-688688-Malachite automatically saves your ATProto credentials after a successful login so you don't need to re-enter them on the next run:
689689-690690-**Security Features:**
692692-- ✅ **AES-256-GCM encryption** - Authenticated encryption with a 256-bit key
693693-- ✅ **Machine-specific** - Credentials are bound to your computer and can't be transferred
694694-- ✅ **Secure key derivation** - Uses PBKDF2 with 100,000 iterations
695695-- ✅ **File permissions** - Credentials file is readable only by you (Unix)
696696-697697-**How It Works:**
698698-699699-1. Credentials are encrypted using a key derived from your hostname + username and saved to `~/.malachite/credentials.json` after every successful login
700700-2. On the next run, Malachite loads saved credentials automatically
701701-3. In interactive mode, you'll be prompted whether to use the saved credentials or enter new ones
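
The scheme above can be sketched with Node's built-in `crypto` module. Everything beyond AES-256-GCM and PBKDF2 with 100,000 iterations is an assumption made for illustration: the SHA-256 digest, the random salt, the `StoredSecret` layout, and the function names are not taken from Malachite's source.

```typescript
import { createCipheriv, createDecipheriv, pbkdf2Sync, randomBytes } from "node:crypto";

// Hypothetical on-disk shape; all fields are base64 strings.
interface StoredSecret {
  salt: string;
  iv: string;
  tag: string;
  data: string;
}

// Derive a 256-bit key from a machine identifier (e.g. hostname + username).
function deriveKey(machineId: string, salt: Buffer): Buffer {
  return pbkdf2Sync(machineId, salt, 100000, 32, "sha256");
}

function encryptSecret(plaintext: string, machineId: string): StoredSecret {
  const salt = randomBytes(16);
  const iv = randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = createCipheriv("aes-256-gcm", deriveKey(machineId, salt), iv);
  const data = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    salt: salt.toString("base64"),
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    data: data.toString("base64"),
  };
}

function decryptSecret(stored: StoredSecret, machineId: string): string {
  const key = deriveKey(machineId, Buffer.from(stored.salt, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, Buffer.from(stored.iv, "base64"));
  decipher.setAuthTag(Buffer.from(stored.tag, "base64"));
  // final() throws if the key is wrong or the data was tampered with.
  const plain = Buffer.concat([
    decipher.update(Buffer.from(stored.data, "base64")),
    decipher.final(),
  ]);
  return plain.toString("utf8");
}
```

A caller could build `machineId` from `os.hostname()` and `os.userInfo().username`. Decrypting on a different machine then fails the GCM authentication check, which is why a copied credentials file does not work elsewhere.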
702702-703703-**Managing Credentials:**
704704-705705-```bash
706706-# Clear saved credentials
707707-pnpm start --clear-credentials
708708-709709-# Or through interactive mode (option 7)
710710-pnpm start
711711-```
712712-713713-**Important Notes:**
714714-715715-- Credentials are machine-specific and won't work if you copy the file to another computer
716716-- This is a convenience feature - you can always enter credentials manually
717717-- If you change your password, clear and re-save credentials
718718-719719-## Project Structure
720720-721721-```
722722-malachite/
723723-├── src/
724724-│ ├── lib/
725725-│ │ ├── auth.ts # Authentication & identity resolution
726726-│ │ ├── cli.ts # Command line interface & argument parsing
727727-│ │ ├── csv.ts # CSV parsing & record conversion
728728-│ │ ├── publisher.ts # Batch publishing with rate limiting
729729-│ │ ├── spotify.ts # Spotify JSON parsing
730730-│ │ ├── merge.ts # Combined import deduplication
731731-│ │ └── sync.ts # Re-sync mode & duplicate detection
732732-│ ├── utils/
733733-│ │ ├── car-fetch.ts # CAR export fetcher (com.atproto.sync.getRepo)
734734-│ │ ├── logger.ts # Structured logging system
735735-│ │ ├── helpers.ts # Utility functions (timing, formatting)
736736-│ │ ├── input.ts # User input handling (prompts, passwords)
737737-│ │ ├── rate-limiter.ts # Rate limiting with server-learned quota
738738-│ │ ├── killswitch.ts # Graceful shutdown handling
739739-│ │ ├── tid.ts # TID generation from timestamps
740740-│ │ └── ui.ts # UI elements (spinners, progress bars)
741741-│ ├── config.ts # Configuration constants & version
742742-│ └── types.ts # TypeScript type definitions
743743-├── web/ # SvelteKit web app
744744-│ └── src/lib/
745745-│ ├── core/ # Browser-safe equivalents of src/lib & src/utils
746746-│ │ ├── auth.ts # Password-based ATProto login
747747-│ │ ├── car-fetch.ts# CAR export fetcher (browser-safe)
748748-│ │ ├── csv.ts # CSV parser (no csv-parse dep)
749749-│ │ ├── import.ts # Import orchestration
750750-│ │ ├── merge.ts # Combined import deduplication
751751-│ │ ├── oauth.ts # ATProto OAuth client
752752-│ │ ├── publisher.ts# Batch publisher with progress callbacks
753753-│ │ ├── rate-limiter.ts # In-memory rate limiter
754754-│ │ ├── spotify.ts # Spotify JSON parser
755755-│ │ ├── sync.ts # CAR-based sync & dedup
756756-│ │ └── tid.ts # TID generation (Web Crypto API)
757757-│ ├── config.ts # Shared constants (version injected by Vite)
758758-│ ├── modes.ts # Import mode definitions
759759-│ └── types.ts # TypeScript type definitions
760760-├── lexicons/ # fm.teal.alpha lexicon definitions
761761-│ └── fm.teal.alpha/
762762-│ └── feed/
763763-│ └── play.json # Play record schema
764764-├── package.json
765765-├── tsconfig.json
766766-└── README.md
767767-```
768768-769769-## Technical Details
770770-771771-### Authentication
772772-773773-- Uses Slingshot resolver to discover your PDS from your handle/DID
774774-- Requires an ATProto app password (not your main password) for the CLI
775775-- Web app supports ATProto OAuth — no app password needed
776776-- Automatically configures the agent for your personal PDS
777777-778778-### Batch Publishing
779779-780780-- Uses `com.atproto.repo.applyWrites` for efficiency (up to 20x faster than individual calls)
781781-- Batches up to 200 records per API call (PDS maximum)
782782-- **Dynamic batch sizing** (1-200 records) based on real-time rate limit quota
783783-- **Intelligent quota monitoring** with 15% headroom buffer
784784-- **Automatic adjustment** - scales down as quota depletes, scales up after reset
785785-- Enforces minimum delays between batches for rate limit safety
786786-787787-### CAR Export Sync
788788-789789-All read paths (duplicate checks, sync, deduplicate) use `com.atproto.sync.getRepo` to download the user's entire repo as a CARv1 file. The CAR is parsed locally using `@ipld/car` and `@ipld/dag-cbor` — no AppView XRPC calls are made for reads, so the sync check costs zero write-quota points.
790790-791791-### Data Mapping
792792-793793-**Last.fm:**
794794-795795-- Direct mapping from CSV columns
796796-- Converts Unix timestamps to ISO 8601
797797-- Preserves MusicBrainz IDs when present
798798-- Generates URLs from artist/track names
799799-- Wraps artists in array format with optional MBID
800800-801801-**Spotify:**
802802-803803-- Extracts data from JSON fields
804804-- Already in ISO 8601 format (`ts` field)
805805-- Generates URLs from `spotify_track_uri`
806806-- Automatically filters non-music content
807807-- Extracts artist and album from metadata fields
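
As an illustration of the Spotify mapping, the sketch below converts one streaming-history entry into a simplified play object. The input field names are those found in Spotify's extended streaming history export. On the output side only `playedTime` is taken from this document; the other field names and the `spotifyToPlay` helper are hypothetical, so check them against the lexicon before relying on them.

```typescript
// Subset of a Spotify extended streaming history entry.
interface SpotifyEntry {
  ts: string; // already ISO 8601
  master_metadata_track_name: string | null;
  master_metadata_album_artist_name: string | null;
  master_metadata_album_album_name: string | null;
  spotify_track_uri: string | null;
}

// Simplified output shape (field names other than playedTime are assumed).
interface PlaySketch {
  trackName: string;
  artistName: string;
  albumName?: string;
  playedTime: string;
  originUrl: string;
}

const TRACK_URI_PREFIX = "spotify:track:";

// Returns null for podcasts, audiobooks, and other non-music entries.
function spotifyToPlay(entry: SpotifyEntry): PlaySketch | null {
  const uri = entry.spotify_track_uri;
  const track = entry.master_metadata_track_name;
  const artist = entry.master_metadata_album_artist_name;
  if (!uri || !uri.startsWith(TRACK_URI_PREFIX) || !track || !artist) return null;
  const play: PlaySketch = {
    trackName: track,
    artistName: artist,
    playedTime: entry.ts,
    // spotify:track:<id> becomes https://open.spotify.com/track/<id>
    originUrl: `https://open.spotify.com/track/${uri.slice(TRACK_URI_PREFIX.length)}`,
  };
  if (entry.master_metadata_album_album_name) {
    play.albumName = entry.master_metadata_album_album_name;
  }
  return play;
}
```

Returning `null` for entries without a track URI lets the caller drop non-music content with a single filter pass.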
808808-809809-### Lexicon Reference
810810-811811-This importer follows the official `fm.teal.alpha` lexicon defined in `/lexicons/fm.teal.alpha/feed/play.json`.
812812-813813-The lexicon defines required and optional field types, string length constraints, array formats, timestamp formatting, and URL validation.
814814-815815-## Contributing
816816-817817-Contributions are welcome — see [CONTRIBUTING.md](CONTRIBUTING.md) for setup instructions, architecture notes, and PR guidelines.
818818-819819-## License
820820-821821-AGPL-3.0-only - See LICENCE file for details
822822-823823-## ☕ Support
824824-825825-If you found this useful, consider [buying me a ko-fi](https://ko-fi.com/ewancroft)!
826826-827827-## Credits
828828-829829-- Uses [@atproto/api](https://www.npmjs.com/package/@atproto/api) for ATProto interactions
830830-- CSV parsing via [csv-parse](https://www.npmjs.com/package/csv-parse)
831831-- Identity resolution via [Slingshot](https://slingshot.danner.cloud)
832832-- Follows the `fm.teal.alpha` lexicon standard
833833-- Colored output via [chalk](https://www.npmjs.com/package/chalk)
834834-- Progress indicators via [ora](https://www.npmjs.com/package/ora) and [cli-progress](https://www.npmjs.com/package/cli-progress)
835835-- Web app built with [SvelteKit](https://kit.svelte.dev) and [Tailwind CSS](https://tailwindcss.com)
836836-- ATProto OAuth via [@atproto/oauth-client-browser](https://www.npmjs.com/package/@atproto/oauth-client-browser)
837837-838838----
839839-840840-**Note**: This tool is for personal use. Respect the terms of service and rate limits when importing your data.
1313+Git history has been fully preserved in the monorepo via `git filter-repo`.
1414+This repository is now archived and will not receive further updates.