Navigate a directory full of directories, identifying repos and worktrees

add planner-pushdown design for staged scan and sort pushdown

Refs: is-tree-scan-priority
Refs: is-tree-fuzzel-pipeline

rektide 7fb578d9 d0e33890

+293
doc/planner-pushdown.md
# Planner + Pushdown Engine for `is-tree`

This document proposes a lightweight query planner for `is-tree` so we can push computation earlier in the pipeline, especially for `--scan` and picker workflows.

Related docs and code:

- [`/doc/pick-iter.md`](/doc/pick-iter.md)
- [`/README.md`](/README.md)
- [`/src/main.rs`](/src/main.rs)
- [`/src/plugin.rs`](/src/plugin.rs)

## Why now

We already landed one targeted optimization: short-circuiting `--all --format directory` in [`/src/main.rs`](/src/main.rs).

That win validates the direction, but it is still a special case. We need a general mechanism that can answer:

- Which columns are actually needed?
- Which filters can run before expensive plugin work?
- Which sorts can run early enough to keep `--scan` responsive?

In short: turn the CLI request into an execution plan, then push expensive work as late as possible.

## Goals

- Preserve current CLI behavior by default.
- Make fast paths automatic when query shape allows.
- Keep `--scan` interactive by prioritizing early-sort keys.
- Reuse existing plugin registry architecture instead of bypassing it.
- Allow incremental rollout without rewriting the whole runtime.

## Non-goals

- No SQL parser or user-facing query DSL.
- No distributed execution.
- No breaking changes to existing output formats.

## Core idea

Treat each invocation as a query:

- **Projection**: requested output columns
- **Filters**: row predicates
- **Sort**: ordered keys
- **Mode**: full vs scan
- **Input**: explicit paths or discovered roots

Then compile to a physical plan where each column and predicate is annotated by when it becomes available and how expensive it is.
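
As a rough illustration of treating an invocation as a query, the sketch below collects every column a request touches. The `Query` struct and `required_columns` function are hypothetical names used only for this example, and filters are reduced to the column they read; the proposed `LogicalQuery` shape appears in the data model draft.

```rust
use std::collections::BTreeSet;

/// Simplified, hypothetical view of one invocation as a query.
/// Filters are reduced to the name of the column each predicate reads.
struct Query {
    projection: Vec<String>,
    filter_columns: Vec<String>,
    sort_columns: Vec<String>,
}

/// Every column the query touches: output projection, filter predicates,
/// and sort keys. Columns outside this set never need to be computed.
fn required_columns(q: &Query) -> BTreeSet<&str> {
    q.projection
        .iter()
        .map(String::as_str)
        .chain(q.filter_columns.iter().map(String::as_str))
        .chain(q.sort_columns.iter().map(String::as_str))
        .collect()
}
```

A set is used so that a column named in both the format string and the sort keys is only counted once.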

## Architecture

```mermaid
flowchart LR
    ParseCli[Parse CLI Args] --> LogicalQuery[Build LogicalQuery]
    LogicalQuery --> PlanRules[Apply Pushdown Rules]
    PlanRules --> PhysicalPlan[Build PhysicalPlan]
    PhysicalPlan --> EnumerateStage[Enumerate Candidate Paths]
    EnumerateStage --> EarlyProbeStage[Run Early Probes]
    EarlyProbeStage --> EarlyFilterSort[Apply Early Filters and Sorts]
    EarlyFilterSort --> LateProbeStage[Run Late Plugin Probes If Required]
    LateProbeStage --> FinalFilterSort[Apply Remaining Filters and Sorts]
    FinalFilterSort --> RenderStage[Render Text or JSON]
```

## Data model draft

These are implementation-level structs we can add near the runtime planning code.

```rust
/// Whether the invocation runs the full pipeline or the fast `--scan` path.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecMode {
    Full,
    Scan,
}

/// The CLI request expressed as a query, before any pushdown rules apply.
// NOTE: `FilterExpr` is not defined in this draft.
#[derive(Debug, Clone, PartialEq, Eq)]
struct LogicalQuery {
    mode: ExecMode,
    roots: Vec<std::path::PathBuf>,
    projection: Vec<String>,
    filters: Vec<FilterExpr>,
    sort_keys: Vec<SortKey>,
    emit_json: bool,
}

/// One sort key: a column name plus direction.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SortKey {
    column: String,
    desc: bool,
}

/// Pipeline stage at which a column becomes available.
/// Variants are declared earliest-first, so the derived `Ord` follows pipeline order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum AvailabilityStage {
    Enumerate,
    EarlyProbe,
    LateProbe,
    Finalize,
}

/// Coarse cost bucket for computing a column.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CostClass {
    Free,
    Cheap,
    Expensive,
}

/// Planner metadata for a single column.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ColumnPlanMeta {
    key: &'static str,
    stage: AvailabilityStage,
    cost: CostClass,
    stable_in_scan: bool,
}
```

### Practical column classification (initial)

| Column | Stage | Cost | Notes |
|---|---|---|---|
| `directory` | `Enumerate` | `Free` | Known from the input path list |
| `status` | `EarlyProbe` | `Cheap` | `detect_repo(path)` performs local filesystem checks |
| `workparent` | `LateProbe` | `Cheap` | Path parsing + repo metadata checks |
| `change-date` | `EarlyProbe` | `Cheap` | Local metadata mtime |
| `commit-date` | `LateProbe` | `Expensive` | Subprocess/git history lookup |
| `ahead` | `LateProbe` | `Expensive` | jj/git remote-related logic |

This table is the planner contract. It can start hard-coded and later move to plugin metadata.

## Pushdown rules

### 1) Projection pushdown

Only compute columns that are needed by:

- output projection
- filter predicates
- sort keys

If the only requested column is `directory`, skip the repo probe and plugins entirely.

### 2) Filter pushdown

Apply each predicate at the earliest stage where its columns are available.

Examples:

- `status == jj` can run at `EarlyProbe`.
- `ahead > 0` must wait for `LateProbe`.

### 3) Sort pushdown

Sort as early as possible, but only once the sort keys are available.

- `--sort directory+` sorts during enumeration.
- `--sort change-date-` sorts after `EarlyProbe`.
- `--sort ahead-` requires `LateProbe`.

If early and late keys are mixed, the planner splits the sort into staged ordering:

- Early stable sort on early keys
- Final sort after late keys are available

### 4) Mode-aware gating

`--scan` should avoid `LateProbe` by default.

Planner behavior in scan mode:

- If the query needs only `Enumerate`/`EarlyProbe` columns, stay scan-fast.
- If the query requests late columns or late sort keys, apply a policy:
  - `upgrade`: automatically switch to the full plan
  - `defer`: keep scan-fast behavior and warn that late requirements are skipped
  - `error`: fail with a clear message

Default recommendation: `upgrade` for correctness, unless the user opts into strict fast mode.

## Execution examples

### Case A: `--all --format directory`

Plan:

1. Enumerate candidate subdirectories
2. Render path list

No probe, no plugin execution.

### Case B: `--scan --format "{status} {directory}" --sort directory+`

Plan:

1. Enumerate
2. Early probe for `status`
3. Early sort by `directory`
4. Stream render

No late stage required.

### Case C: `--scan --sort ahead- --format directory`

The planner detects `ahead` as late/expensive.

- With `upgrade`: switch to full mode and compute `ahead` before the final sort.
- With `defer`: keep the scan-only ordering and warn that the `ahead` sort is not applied.

## Integration with picker pipeline

This planner directly supports the high-value pipeline from [`/doc/pick-iter.md`](/doc/pick-iter.md):

```bash
is-tree --scan --sort change-date- --format directory | fuzzel --dmenu --multi | is-tree --stdin --format all
```

Key benefits:

- Fast candidate emission (`Enumerate` + `EarlyProbe` only)
- Useful prioritization (`change-date` pushdown)
- Expensive columns deferred until the user has narrowed the selection

## Implementation plan

### Phase 1: planner metadata and rule engine

- Add `LogicalQuery`, `PhysicalPlan`, and column metadata table.
- Build planner from existing CLI args (`format`, `sort`, `filter`, `json`, `all`).
- Keep old runtime path as fallback.
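
One way the Phase 1 planner could assign stages is a plain lookup over the initial column classification table, taking the latest stage among the required columns. The `stage_of` and `required_stage` functions are illustrative assumptions rather than existing `is-tree` code, and reporting unknown columns as an error is a choice made for this sketch.

```rust
/// Stage names follow the data model draft; variants are declared
/// earliest-first so the derived `Ord` matches pipeline order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum AvailabilityStage {
    Enumerate,
    EarlyProbe,
    LateProbe,
    Finalize,
}

/// Hard-coded lookup mirroring the initial column classification table.
/// Returns `None` for columns this sketch does not know about.
fn stage_of(column: &str) -> Option<AvailabilityStage> {
    use AvailabilityStage::*;
    match column {
        "directory" => Some(Enumerate),
        "status" | "change-date" => Some(EarlyProbe),
        "workparent" | "commit-date" | "ahead" => Some(LateProbe),
        _ => None,
    }
}

/// The latest stage that any required column needs.
/// An empty column list needs nothing beyond `Enumerate`.
fn required_stage(columns: &[&str]) -> Result<AvailabilityStage, String> {
    let mut latest = AvailabilityStage::Enumerate;
    for &col in columns {
        // Assumption: unknown columns are rejected instead of guessed.
        let stage = stage_of(col).ok_or_else(|| format!("unknown column: {}", col))?;
        latest = latest.max(stage);
    }
    Ok(latest)
}
```

Because the result depends only on the column names, the same query always maps to the same stage, and a `directory`-only query resolves to `Enumerate`.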

Acceptance:

- Planner returns deterministic stage assignment for projection/filter/sort keys.
- `--all --format directory` is represented as enumerate-only plan.

### Phase 2: staged execution runtime

- Introduce execution stages in `run()` path:
  - enumerate
  - early probe/filter/sort
  - optional late probe/filter/sort
  - render
- Route current short-circuit through planner instead of bespoke branch.

Acceptance:

- Existing directory-only optimization remains fast and behaviorally identical.
- Query results remain equivalent to current behavior for full-mode queries.

### Phase 3: scan policy + diagnostics

- Add scan late-key policy (`upgrade`, `defer`, `error`).
- Emit explicit diagnostics when requested sort/filter cannot run in scan-fast stage.

Acceptance:

- Users can predictably control correctness vs speed in scan mode.
- Help text documents scan policy behavior.

### Phase 4: plugin metadata integration

- Extend plugin column declarations with planning hints (`stage`, `cost`).
- Remove hard-coded planner map once plugin hints are complete.

Acceptance:

- Planner decisions come from plugin metadata rather than ad-hoc key matching.
- New plugins can participate in pushdown automatically.

## Testing strategy

- Unit tests for planner rule decisions:
  - projection-only query
  - mixed early/late sort keys
  - scan policy behaviors
- Integration tests for runtime equivalence:
  - full mode unchanged output
  - scan mode staged behavior
- Performance checks:
  - compare current vs planned execution on large directory sets

## Ticket alignment

- `is-tree-scan-priority`: provides the mechanism to prioritize and stream candidates.
- `is-tree-fuzzel-pipeline`: provides the UX workflow that consumes staged scan output.
- `is-tree-per-file-stats` and `is-tree-staleness-views`: benefit from selecting expensive drill-down only after narrowing candidates.

## Decision summary

We should evolve `is-tree` from ad-hoc fast paths into a small planner-driven runtime:

- classify column availability/cost
- push projection/filter/sort as early as possible
- keep `--scan` responsive while preserving correctness controls

This gives us a reusable optimization model, not just one-off special cases.
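
To make the scan-mode correctness control concrete, here is a minimal sketch of how the three policies could resolve a request. `ScanPolicy`, `Resolution`, and `resolve` are hypothetical names; the design itself only fixes the policy names `upgrade`, `defer`, and `error`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ExecMode {
    Full,
    Scan,
}

/// The three scan late-key policies named in the design.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScanPolicy {
    Upgrade,
    Defer,
    Error,
}

/// Outcome of gating a request (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
enum Resolution {
    /// Run in this mode with every requested key honored.
    Run(ExecMode),
    /// Stay in scan mode; the listed late keys are skipped and should be warned about.
    RunSkipping(Vec<String>),
    /// Refuse to run, with a message for the user.
    Reject(String),
}

/// Decide how to run given the requested mode, the late-stage keys the
/// query needs, and the configured policy.
fn resolve(mode: ExecMode, late_keys: &[&str], policy: ScanPolicy) -> Resolution {
    // Full mode, or a scan that needs no late keys, runs as requested.
    if mode == ExecMode::Full || late_keys.is_empty() {
        return Resolution::Run(mode);
    }
    match policy {
        ScanPolicy::Upgrade => Resolution::Run(ExecMode::Full),
        ScanPolicy::Defer => {
            Resolution::RunSkipping(late_keys.iter().map(|k| (*k).to_string()).collect())
        }
        ScanPolicy::Error => Resolution::Reject(format!(
            "--scan cannot satisfy late keys: {}",
            late_keys.join(", ")
        )),
    }
}
```

Returning a value instead of printing keeps the decision testable; the caller would turn `RunSkipping` into a warning and `Reject` into a non-zero exit.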