* Deployment Policy: Implementation Plan

See [[file:spec-deployment-policy.org][spec-deployment-policy.org]] for the full specification.

** Compatibility Constraints

- Gardens and servers can be at different versions.
- Server must be upgraded first: it needs to accept both old and new formats.
- Old gardens send old fields (reboot_policy, allow_realtime, etc.) and must
  continue to work with a new server.
- New gardens sending policy rules must not break an old server, so the garden
  upgrade must happen after the server can accept both formats.
- Garden config files (client.json) are user-facing. Users need a migration
  path and deprecation warnings, not a hard cutover.

** Phase 1: Foundation

No behavior change. Build the new types and evaluation logic, fully tested.

*** Schema additions (sower_client)

- Add =SowerClient.Orchestration.Subscription.Policy= module
  - =Policy.Rule= schema: actions, triggers, window, confirm
  - Reuses existing =Subscription.Window= schema for the window field
- Add =policy= field to =SowerClient.Orchestration.Subscription= schema
  - Type: array of =Policy.Rule=, default: =[]= (empty means use default policy)
  - Add to =sower_client.ex= schema list

*** Shared evaluator (sower_client)

- =Policy.evaluate(policy_rules, trigger, now, seed_type)=
  - Returns ={:allow, action}=, ={:confirm, action}=, or =:deny=
  - Walks the disruption hierarchy (restart → activate → stage)
  - Applies default policy when rules are empty
  - Validates actions against seed type
- =Policy.trigger_for_reason(audit_reason)=
  - Maps audit reasons to policy triggers
- Window evaluation logic with overnight span support (subsumes sow-160)

*** Database migration (sower)

- Add =policy= column (=:map= / jsonb array) to subscriptions table
- Column is nullable: nil means "use old fields" during transition

*** Tests

- Policy evaluator unit tests covering:
  - Each trigger type
  - Window matching (including overnight spans)
  - Disruption hierarchy (highest permitted action wins)
  - Default policy behavior
  - Confirm flag (confirm wins when multiple rules match)
  - Seed type action validation
  - Empty/nil policy → default

** Phase 2: Server-side adoption

Server uses policy for all deployment decisions. Old gardens still work.

*** Old-to-new conversion

- Add =Policy.from_legacy(subscription)= in sower_client
  - Converts reboot_policy, allow_realtime, poll_on_connect, window,
    activation_args into equivalent policy rules
  - Used at registration time when a garden sends old-format subscriptions
- Server's =register_subscription/2= and =sync_subscriptions/2=:
  - If subscription has =policy= rules: store as-is
  - If subscription has no policy (old garden): call =from_legacy=, store
    the converted policy alongside the old fields

*** Replace server-side evaluation

- =Sower.Workers.DeploySubscription.run/2=:
  - Replace =within_window?= check with =Policy.evaluate/4= (the four-argument
    form listed under Phase 1, which also takes the seed type)
  - Trigger is =:realtime= (already known from worker context)
- =Sower.Orchestration.Deployment.deploy_subscription/2=:
  - Evaluate policy before creating deployment
  - =event_reason= already flows through opts; map it to a trigger
- Manual deploy path (UI):
  - Pass =:user_triggered= through to policy evaluation
  - Handle =:confirm= result (UI confirmation flow)
- =find_realtime_subscriptions/1=:
  - Replace =sub.allow_realtime= filter with policy check for realtime trigger

*** Audit reason updates

- Rename =retry= → =user_retry= in deployment_events enum (migration)
- Add =poll_on_connect= to deployment_events enum (migration)

*** UI updates

- Subscription show page: display policy rules instead of old fields
- Keep displaying old fields as fallback if policy is empty (transition period)

** Phase 3: Garden-side adoption

Garden uses policy for all deployment decisions.

*** Garden evaluation injection

- =Garden.Socket.handle_cast(:sync_subscriptions)=:
  - Evaluate policy for poll_on_connect subscriptions before deploying
  - Pass =:poll_on_connect= as audit reason to server
- =Garden.Scheduler=:
  - Evaluate policy before deploying on cron fire
- =Garden.Deployer=:
  - Replace =Garden.Seed.activation_mode(subscription)= with action from
    policy evaluation result
  - Replace =reboot_reason/1= logic: reboot decision comes from policy
    (=:restart= action permitted or not), not from =reboot_policy= field

*** Garden config support

- =SowerClient.Config.preprocess_subscriptions/1=:
  - Support =policy= key in subscription config (new format)
  - Continue supporting old keys (reboot_policy, etc.), converting via
    =Policy.from_legacy= with deprecation warning logged
- Document new config format for users

** Phase 4: Cleanup

Remove old fields once all gardens have upgraded.

*** Remove old fields

- Remove from =SowerClient.Orchestration.Subscription= schema:
  =reboot_policy=, =allow_realtime=, =poll_on_connect=, =window=,
  =activation_args=
- Remove from server Ecto schema and changeset
- Remove from =register_subscription= and =create_subscription= field lists
- Remove =Policy.from_legacy/1= conversion code
- Remove =Garden.Seed.activation_mode/1=
- Remove =Sower.Orchestration.Subscription.within_window?/2=
- Remove =reboot_reason/1= from =Garden.Deployer=

*** Database migration

- Drop old columns: =reboot_policy=, =allow_realtime=, =window=,
  =activation_args= from subscriptions table

*** Config cleanup

- Remove support for old config keys in =preprocess_subscriptions=
- Deprecation warnings become errors

** Release Sequence

1. Release Phases 1+2+3 together → policy system fully functional on both
   server and garden. Old fields are kept alongside the new policy field.
   =Policy.from_legacy= converts old garden configs automatically with
   deprecation warnings. No user action is required to upgrade.
2. After all gardens are running release 1 → release Phase 4 cleanup (drop
   old fields, remove conversion code, deprecation warnings become errors).
docs/spec-deployment-policy.org
* Deployment Policy Rules: Specification

** Overview

A deployment policy is a list of rules attached to a subscription that controls
when and how deployment actions are permitted. The system uses *default deny*:
if no rule matches a given action/trigger/time combination, the action is blocked.

The subscription retains responsibility for *what* to deploy (seed identity,
matching rules, schedule). The policy controls *whether* a deployment action is
allowed to proceed.

** Design Principles

- Default deny: the absence of a matching rule means the action is not permitted.
- Allow-only: rules grant permission. There is no "deny" effect.
- Any-match: rules are unordered. If any rule matches, the action is allowed.
- Omitted fields mean "any": a rule with no =triggers= field matches all trigger types.
- Overlapping rules are harmless: redundancy is acceptable, not an error.
- Seed types define which actions are valid: the policy is validated against the
  seed type's supported actions.

** Actions

Actions represent what a deployment does, generalized across seed types.

| Action   | Description                                              | Disruption |
|----------+----------------------------------------------------------+------------|
| stage    | Download and pin a closure (gcroot). No runtime changes. | None       |
| activate | Apply the configuration via the seed type's activation.  | Service    |
| restart  | Full machine reboot.                                     | Full       |

*** Actions by Seed Type

| Seed Type    | stage | activate | restart |
|--------------+-------+----------+---------|
| nixos        | yes   | yes      | yes     |
| home-manager | yes   | yes      | no      |
| gcroot       | yes   | no       | no      |
| service      | yes   | yes      | no      |

This table will grow as new seed types are added. The policy engine should
reject rules that reference actions unsupported by the subscription's seed type.
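
As a rough illustration of that validation, the sketch below checks a rule list
against the seed-type table. The module name =SeedActionsSketch=, the function
=validate/2=, and the error shapes are hypothetical and not part of
sower_client; rules are assumed to be maps with string keys, as decoded from
the JSON form.

#+begin_src elixir
defmodule SeedActionsSketch do
  # Supported actions per seed type, copied from the table above.
  @supported %{
    "nixos" => ~w(stage activate restart),
    "home-manager" => ~w(stage activate),
    "gcroot" => ~w(stage),
    "service" => ~w(stage activate)
  }

  # Returns :ok when every action named by the rules is supported by the
  # seed type, otherwise an error tuple listing the offending actions.
  def validate(rules, seed_type) do
    case Map.fetch(@supported, seed_type) do
      :error ->
        {:error, :unknown_seed_type}

      {:ok, supported} ->
        unsupported =
          rules
          |> Enum.flat_map(&Map.get(&1, "actions", []))
          |> Enum.uniq()
          |> Enum.reject(&(&1 in supported))

        if unsupported == [] do
          :ok
        else
          {:error, {:unsupported_actions, unsupported}}
        end
    end
  end
end
#+end_src

Under these assumptions, a =restart= rule on a =gcroot= subscription would be
rejected at validation time rather than silently never matching.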

*** Activation Mode Derivation

The activation mode is not configured directly. It is derived from the set of
actions permitted at the time of deployment. This keeps policy rules purely
about permission while the system selects the appropriate mechanism.

**** NixOS

| Permitted Actions        | Activation Mode | Behavior                           |
|--------------------------+-----------------+------------------------------------|
| activate only            | switch          | Activate immediately, set as boot  |
| activate + restart       | boot            | Set as boot config, then reboot    |
| stage only               | (none)          | Download and pin closure only      |

When =activate= is permitted but =restart= is not, the system uses =switch=:
the configuration takes effect immediately without a reboot. When both
=activate= and =restart= are permitted, the system uses =boot= mode and
triggers a reboot, ensuring the full configuration (including kernel changes)
is applied.

This means a subscription with separate time windows for =activate= and
=restart= naturally produces the desired behavior:

- During activate-only windows: =switch= (non-disruptive)
- During activate+restart windows: =boot= + reboot (full apply)

**** Home-manager

| Permitted Actions | Activation Mode | Behavior                       |
|-------------------+-----------------+--------------------------------|
| activate          | switch          | Activate new generation        |
| stage only        | (none)          | Download and pin closure only  |

**** Service

| Permitted Actions | Activation Mode | Behavior                       |
|-------------------+-----------------+--------------------------------|
| activate          | restart         | Restart the service            |
| stage only        | (none)          | Download and pin closure only  |

**** Gcroot

| Permitted Actions | Activation Mode | Behavior                       |
|-------------------+-----------------+--------------------------------|
| stage             | (none)          | Download and pin closure only  |
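
The four tables can be collapsed into one lookup keyed on seed type and the
highest permitted action. This is only a sketch: =ActivationModeSketch=, the
atom names for seed types, and the =%{mode: ..., reboot: ...}= return shape are
assumptions made for illustration. A highest permitted action of =:restart= is
read as "activate + restart", since restart implies activate in the disruption
hierarchy.

#+begin_src elixir
defmodule ActivationModeSketch do
  @seed_types [:nixos, :home_manager, :service, :gcroot]

  # NixOS: restart permitted -> boot mode followed by a reboot.
  def derive(:nixos, :restart), do: {:ok, %{mode: :boot, reboot: true}}
  # NixOS: activate only -> switch, no reboot.
  def derive(:nixos, :activate), do: {:ok, %{mode: :switch, reboot: false}}
  # Home-manager: activate -> switch to the new generation.
  def derive(:home_manager, :activate), do: {:ok, %{mode: :switch, reboot: false}}
  # Service: activate -> restart the service (not the machine).
  def derive(:service, :activate), do: {:ok, %{mode: :restart, reboot: false}}

  # Stage only: download and pin the closure, no activation mode.
  def derive(seed_type, :stage) when seed_type in @seed_types,
    do: {:ok, %{mode: nil, reboot: false}}

  # Any other combination is not listed in the tables.
  def derive(_seed_type, _action), do: {:error, :unsupported_action}
end
#+end_src

Keeping the mapping as plain function clauses means a new seed type only adds
clauses; nothing in the policy rules themselves has to change.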

** Triggers

Triggers represent how a deployment was initiated. These align with the
existing =deployment_events= audit system's =reason= field.

| Trigger         | Audit Reason       | Description                                      | Source    |
|-----------------+--------------------+--------------------------------------------------+-----------|
| manual          | user_triggered     | User clicked deploy in the UI or invoked via API | Human     |
| manual          | user_retry         | User retried a failed deployment                 | Human     |
| scheduled       | schedule_triggered | Cron schedule fired and a new seed was found     | Automated |
| realtime        | realtime_triggered | A matching seed was registered                   | Automated |
| poll_on_connect | poll_on_connect    | Garden connected and polled for pending seeds    | Automated |

Multiple audit reasons can map to the same policy trigger. The policy engine
evaluates against the trigger column; the audit reason is preserved for
traceability. =user_retry= maps to the =manual= trigger, so it is treated
identically to =user_triggered= for policy purposes.

** Windows

A window constrains when a rule applies based on day of week and time of day.
Windows naturally handle overnight spans: when =time_start= is after
=time_end=, the window wraps across midnight.

| Field      | Type     | Required | Description                            |
|------------+----------+----------+----------------------------------------|
| days       | string[] | yes      | Days of the week the window is active  |
| time_start | string   | yes      | Start of window (HH:MM)                |
| time_end   | string   | yes      | End of window (HH:MM)                  |

Day values: =mon=, =tue=, =wed=, =thu=, =fri=, =sat=, =sun=.

Timezone is specified once on the subscription (=timezone= field) and applies
to all windows in the policy.

When =time_start= > =time_end= (e.g. =22:00= to =06:00=), the window spans
midnight. The =days= field refers to the day the window *opens*. A rule with
=days: ["fri"], time_start: "22:00", time_end: "06:00"= covers Friday 22:00
through Saturday 06:00.

A rule with no =window= field is not time-constrained: it matches at any time.
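
The overnight rule can be sketched as follows. Everything here is illustrative:
=WindowSketch.within?/2= is a hypothetical name, =now= is assumed to be a
=NaiveDateTime= already converted to the subscription's timezone, and the
window is assumed to be start-inclusive and end-exclusive, which the text above
does not specify. A window whose start equals its end is treated as empty.

#+begin_src elixir
defmodule WindowSketch do
  @day_names ~w(mon tue wed thu fri sat sun)

  # No window at all: the rule is not time-constrained.
  def within?(nil, _now), do: true

  def within?(%{"days" => days, "time_start" => ts, "time_end" => te}, %NaiveDateTime{} = now) do
    start_t = parse_time(ts)
    end_t = parse_time(te)
    time = NaiveDateTime.to_time(now)
    today = NaiveDateTime.to_date(now)
    yesterday = Date.add(today, -1)

    after_start? = Time.compare(time, start_t) != :lt
    before_end? = Time.compare(time, end_t) == :lt

    if Time.compare(start_t, end_t) == :gt do
      # Overnight span: the evening part belongs to a window opened today,
      # the early-morning part to a window opened yesterday.
      (day_name(today) in days and after_start?) or
        (day_name(yesterday) in days and before_end?)
    else
      day_name(today) in days and after_start? and before_end?
    end
  end

  # "HH:MM" -> Time, by appending the seconds that ISO 8601 parsing expects.
  defp parse_time(hhmm), do: Time.from_iso8601!(hhmm <> ":00")

  # Date.day_of_week/1 returns 1 for Monday through 7 for Sunday.
  defp day_name(date), do: Enum.at(@day_names, Date.day_of_week(date) - 1)
end

friday_night = %{"days" => ["fri"], "time_start" => "22:00", "time_end" => "06:00"}
# 2024-01-05 is a Friday, 2024-01-06 a Saturday.
IO.inspect(WindowSketch.within?(friday_night, ~N[2024-01-06 03:00:00])) # true
IO.inspect(WindowSketch.within?(friday_night, ~N[2024-01-05 03:00:00])) # false
#+end_src

The second call is false because Friday 03:00 would belong to a window opened
on Thursday, and =thu= is not in =days=.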

** Rule Schema

A single policy rule:

| Field    | Type       | Required | Default | Description                                     |
|----------+------------+----------+---------+-------------------------------------------------|
| actions  | string[]   | yes      |         | Actions this rule permits                       |
| triggers | string[]   | no       | any     | Triggers this rule applies to. Omit for all.    |
| window   | object     | no       | none    | Time window constraint (see Windows section)    |
| confirm  | boolean    | no       | false   | Require explicit confirmation before proceeding |

Multiple rules are ORed (any-match). Within a rule, all fields are ANDed:
the action, trigger, and window must all match.
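
To make the defaults concrete, here is a minimal sketch of turning a decoded
JSON rule into a struct. =RuleSketch= and =from_map/1= are hypothetical names,
not the =Policy.Rule= schema itself; representing an omitted =triggers= field
as =:any= is an assumption, as are the error atoms.

#+begin_src elixir
defmodule RuleSketch do
  # Defaults follow the schema table: triggers "any", no window, confirm false.
  defstruct actions: [], triggers: :any, window: nil, confirm: false

  @actions ~w(stage activate restart)
  @triggers ~w(manual scheduled realtime poll_on_connect)

  # `actions` is the only required field and must be a non-empty list.
  def from_map(%{"actions" => [_ | _] = actions} = map) do
    triggers = Map.get(map, "triggers", :any)

    cond do
      Enum.any?(actions, &(&1 not in @actions)) ->
        {:error, :invalid_action}

      triggers != :any and Enum.any?(triggers, &(&1 not in @triggers)) ->
        {:error, :invalid_trigger}

      true ->
        {:ok,
         %__MODULE__{
           actions: actions,
           triggers: triggers,
           window: Map.get(map, "window"),
           confirm: Map.get(map, "confirm", false)
         }}
    end
  end

  def from_map(_other), do: {:error, :missing_actions}
end
#+end_src

With this shape, ={"actions": ["restart"], "confirm": true}= becomes a rule
that applies to every trigger at any time but always requires confirmation.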

** Policy Evaluation

Actions form a disruption hierarchy. Each level subsumes the levels below it:

| Level | Action   | Implies         |
|-------+----------+-----------------|
| 3     | restart  | activate, stage |
| 2     | activate | stage           |
| 1     | stage    | (none)          |

On each deployment event with =(trigger, current_time)=:

1. For each action the seed type supports (from highest to lowest disruption):
   a. Filter rules where the action is in the rule's =actions= list.
   b. Filter rules where =trigger= is in the rule's =triggers= list (or triggers is omitted).
   c. For remaining rules, evaluate the =window= against =current_time= (or pass if no window).
   d. If any rule matches: the action is *permitted*.
2. Select the highest-disruption permitted action. All lower actions are
   implied and do not need separate rules.
3. Derive the activation mode from the selected action (see Activation Mode Derivation).
4. If any rule matching the selected action has =confirm: true=: block pending
   out-of-band approval before proceeding.
5. If no action is permitted: *deny* (do nothing).
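
The steps above can be sketched as a single function. This is an illustration
under stated assumptions, not the shared evaluator itself: =PolicyEvalSketch=
is a hypothetical module, rules are maps with string keys as in the JSON
examples, =now= is a =NaiveDateTime= already in the subscription's timezone,
windows are start-inclusive and end-exclusive, and an empty or nil rule list
falls back to the default policy given under Resolved Decisions.

#+begin_src elixir
defmodule PolicyEvalSketch do
  # Highest disruption first; each entry pairs the JSON name with the result atom.
  @hierarchy [{"restart", :restart}, {"activate", :activate}, {"stage", :stage}]
  @supported %{
    "nixos" => ~w(stage activate restart),
    "home-manager" => ~w(stage activate),
    "gcroot" => ~w(stage),
    "service" => ~w(stage activate)
  }
  @default [%{"actions" => ["activate"], "triggers" => ~w(manual scheduled poll_on_connect)}]
  @days ~w(mon tue wed thu fri sat sun)

  def evaluate(rules, trigger, %NaiveDateTime{} = now, seed_type) do
    rules = if rules in [nil, []], do: @default, else: rules
    supported = Map.get(@supported, seed_type, [])

    @hierarchy
    |> Enum.filter(fn {name, _atom} -> name in supported end)
    |> Enum.find_value(:deny, fn {name, atom} ->
      # Steps 1a-1d: collect the rules that match this action right now.
      case Enum.filter(rules, &matches?(&1, name, trigger, now)) do
        [] ->
          nil

        matching ->
          # Step 4: any matching rule with confirm: true forces confirmation.
          if Enum.any?(matching, &(&1["confirm"] == true)),
            do: {:confirm, atom},
            else: {:allow, atom}
      end
    end)
  end

  defp matches?(rule, action, trigger, now) do
    action in Map.get(rule, "actions", []) and
      (is_nil(rule["triggers"]) or trigger in rule["triggers"]) and
      (is_nil(rule["window"]) or in_window?(rule["window"], now))
  end

  defp in_window?(%{"days" => days, "time_start" => ts, "time_end" => te}, now) do
    {from, to, time} = {to_time(ts), to_time(te), NaiveDateTime.to_time(now)}
    today = NaiveDateTime.to_date(now)
    after_start? = Time.compare(time, from) != :lt
    before_end? = Time.compare(time, to) == :lt

    if Time.compare(from, to) == :gt do
      # Overnight span: `days` names the day the window opens.
      (day(today) in days and after_start?) or
        (day(Date.add(today, -1)) in days and before_end?)
    else
      day(today) in days and after_start? and before_end?
    end
  end

  defp to_time(hhmm), do: Time.from_iso8601!(hhmm <> ":00")
  defp day(date), do: Enum.at(@days, Date.day_of_week(date) - 1)
end
#+end_src

Because the walk starts at the most disruptive supported action and stops at
the first one with a matching rule, lower actions never need their own rules
once a higher one is permitted.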

** Relationship to Schedule

The subscription's =schedule= field defines *when to poll for new seeds*. The
policy defines *whether to act on what is found*. These are separate concerns.

A schedule that fires outside any policy window results in a poll that finds a
seed but does not deploy it. This is intentional: the seed will be deployed
when the next poll fires within the window, or when a manual deploy is triggered.

** Examples

*** Allow manual apply anytime, reboot only 2-4am

#+begin_src json
[
  {
    "actions": ["activate"],
    "triggers": ["manual"]
  },
  {
    "actions": ["restart"],
    "window": {
      "days": ["mon", "tue", "wed", "thu", "fri", "sat", "sun"],
      "time_start": "02:00",
      "time_end": "04:00"
    }
  }
]
#+end_src

*** Automated deploys on weekdays, reboots on weekends

#+begin_src json
[
  {
    "actions": ["activate"],
    "triggers": ["manual"]
  },
  {
    "actions": ["stage", "activate"],
    "triggers": ["scheduled", "realtime"],
    "window": {
      "days": ["mon", "tue", "wed", "thu", "fri"],
      "time_start": "09:00",
      "time_end": "17:00"
    }
  },
  {
    "actions": ["restart"],
    "window": {
      "days": ["fri"],
      "time_start": "22:00",
      "time_end": "06:00"
    }
  }
]
#+end_src

*** Staging only (gcroot subscription)

#+begin_src json
[
  {
    "actions": ["stage"],
    "triggers": ["scheduled", "realtime"]
  }
]
#+end_src

*** Everything allowed, manual confirmation for reboot

#+begin_src json
[
  {
    "actions": ["stage", "activate"]
  },
  {
    "actions": ["restart"],
    "confirm": true
  }
]
#+end_src

*** Overnight maintenance window

#+begin_src json
[
  {
    "actions": ["activate"],
    "triggers": ["manual"]
  },
  {
    "actions": ["activate", "restart"],
    "triggers": ["scheduled"],
    "window": {
      "days": ["tue", "thu"],
      "time_start": "23:00",
      "time_end": "05:00"
    }
  }
]
#+end_src

** Migration from Current Schema

*** Field Mapping

The following subscription fields are replaced by policy rules:

| Old Field        | Replacement                                                                   |
|------------------+-------------------------------------------------------------------------------|
| reboot_policy    | =restart= action in policy rules with window constraints                      |
| allow_realtime   | =realtime= trigger in policy rules                                            |
| poll_on_connect  | =poll_on_connect= trigger in policy rules                                     |
| window           | =window= on policy rules                                                      |
| activation_args  | Derived from permitted actions per seed type (see Activation Mode Derivation) |

*** Audit Reason Alignment

The existing =deployment_events.reason= enum needs updates to align with
policy triggers:

| Current Reason       | Change                                                   |
|----------------------+----------------------------------------------------------|
| user_triggered       | Keep; maps to =manual= trigger                           |
| retry                | Rename to =user_retry=; maps to =manual= trigger         |
| schedule_triggered   | Keep; maps to =scheduled= trigger                        |
| realtime_triggered   | Keep; maps to =realtime= trigger                         |
| (new)                | Add =poll_on_connect=; maps to =poll_on_connect= trigger |
| superseded           | Keep; lifecycle event, not a trigger                     |
| stale                | Keep; lifecycle event, not a trigger                     |

The =poll_on_connect= audit reason is currently missing. Garden-side
poll-on-connect deploys do not pass audit context to the server. This must be
added so that:
1. The audit trail captures why the deployment was created.
2. The policy engine can evaluate the correct trigger type.

*** Shared Policy Evaluation Module

The policy engine lives in =sower_client= as
=SowerClient.Orchestration.Subscription.Policy=. Both the server (=sower=)
and the garden depend on =sower_client=, so one implementation serves both
sides.

#+begin_example
alias SowerClient.Orchestration.Subscription.Policy

Policy.evaluate(subscription.policy, trigger, now)
  → {:allow, :restart}    # highest permitted action
  → {:allow, :activate}
  → {:allow, :stage}
  → {:confirm, :restart}  # allowed but requires confirmation
  → :deny
#+end_example

The trigger is derived from the audit reason:

#+begin_example
Policy.trigger_for_reason(:user_triggered)     → :manual
Policy.trigger_for_reason(:user_retry)         → :manual
Policy.trigger_for_reason(:schedule_triggered) → :scheduled
Policy.trigger_for_reason(:realtime_triggered) → :realtime
Policy.trigger_for_reason(:poll_on_connect)    → :poll_on_connect
#+end_example
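
A runnable version of this mapping might look like the sketch below.
=TriggerSketch= is a hypothetical module name. Unlike the notation above, it
returns tagged tuples so that lifecycle reasons such as =superseded= and
=stale=, which are not triggers, produce an explicit error; that return shape
is an assumption and not something the specification prescribes.

#+begin_src elixir
defmodule TriggerSketch do
  # Audit reason -> policy trigger, as listed in the Triggers table.
  @mapping %{
    user_triggered: :manual,
    user_retry: :manual,
    schedule_triggered: :scheduled,
    realtime_triggered: :realtime,
    poll_on_connect: :poll_on_connect
  }

  def trigger_for_reason(reason) do
    case Map.fetch(@mapping, reason) do
      {:ok, trigger} -> {:ok, trigger}
      # Lifecycle reasons (e.g. :superseded, :stale) and unknown values land here.
      :error -> {:error, {:not_a_trigger, reason}}
    end
  end
end
#+end_src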

*** Injection Points

*Server-side:*

- =Sower.Workers.DeploySubscription.run/2=: currently checks =within_window?=.
  Replace with =Policy.evaluate/3=. Covers realtime deploys.
- =Sower.Orchestration.Deployment.deploy_subscription/2=: entry point for
  manual (UI) and schedule catch-up deploys. Evaluate policy before creating
  the deployment. The =event_reason= is already passed via opts.

*Garden-side:*

- =Garden.Socket.handle_cast(:sync_subscriptions)=: poll-on-connect path.
  Evaluate policy before triggering deploy. The subscription is already
  available from local storage.
- =Garden.Scheduler=: scheduled deploys. Evaluate policy before deploying.
  The subscription is already available.
- =Garden.Deployer=: derives activation mode from =subscription.activation_args=
  today. Replace with the action returned by policy evaluation. The seed
  deployment's =subscription_sid= is used to look up the subscription, which
  already happens in =find_subscription/1=.

*Both sides:*

The =within_window?= function on =Sower.Orchestration.Subscription= is
replaced entirely by the shared policy evaluator. The server-side window
schema (=embeds_one :window=) on subscriptions is replaced by windows
embedded within policy rules.

*** Other Implementation Notes

- =within_window?= must be updated to handle overnight spans (sow-160) as
  the same window logic is reused within the policy evaluator.
- The deploy button in the UI currently has no concept of trigger type. It
  will need to pass =user_triggered= (or =user_retry=) through to the policy
  engine so manual deploys can be distinguished from automated ones.

** Resolved Decisions

- *Staging must be explicit.* =stage= is not implicitly allowed. However,
  =activate= will always ensure staging is complete before proceeding: if a
  policy allows =activate= but not =stage= independently, staging happens as
  part of the activation flow, not as a separately-gated action.
- *Default policy is narrow but not empty.* When no policy is specified (or the
  policy list is empty), the following default applies:
  #+begin_src json
  [{"actions": ["activate"], "triggers": ["manual", "scheduled", "poll_on_connect"]}]
  #+end_src
  This ensures gardens are never stranded: they can always apply an upgrade
  through safe channels. Notably excluded: =realtime= (too aggressive for a
  default) and =restart= (too disruptive). Users must explicitly opt in to both.
  When a user provides any policy rules, the default is fully replaced; there
  is no merging.
- *Confirm blocks any trigger.* The =confirm= flag is not limited to manual
  triggers. When a rule with =confirm: true= matches, the action is blocked
  pending out-of-band approval, regardless of trigger type. If multiple rules
  match the same action and any of them has =confirm: true=, confirmation is
  required.
- *Realtime is gated at both ends.* The server will not send realtime deployment
  events to a garden unless =realtime= appears as a trigger in at least one
  policy rule. The garden will also reject realtime events not present in its
  policy. This avoids unnecessary traffic and provides defense in depth.
- *Inhibition is a separate layer.* Runtime inhibition (garden-side lock) is
  evaluated as a pre-check before policy evaluation. It is out of scope for
  this spec and will be specified separately.
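
The realtime gate and the default-policy fallback can be illustrated together.
This is a sketch only: =RealtimeGateSketch.realtime_enabled?/1= is a
hypothetical helper, and it assumes that a rule with no =triggers= field
counts as covering =realtime= (following "omitted fields mean any"). A
stricter reading of "appears as a trigger" would require the trigger to be
listed explicitly.

#+begin_src elixir
defmodule RealtimeGateSketch do
  # Default policy from the decision above; it deliberately omits "realtime".
  @default [%{"actions" => ["activate"], "triggers" => ~w(manual scheduled poll_on_connect)}]

  # True when at least one effective rule covers the realtime trigger.
  def realtime_enabled?(rules) do
    # An empty or missing policy is fully replaced by the default (no merging).
    effective = if rules in [nil, []], do: @default, else: rules

    Enum.any?(effective, fn rule ->
      is_nil(rule["triggers"]) or "realtime" in rule["triggers"]
    end)
  end
end
#+end_src

Under these assumptions a subscription with no policy never receives realtime
events, which matches the intent that users opt in to =realtime= explicitly.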

** Future Considerations

- Additional window constraints (e.g. specific dates, week-of-month).
- Generic conditions (non-time-based variables) could be added alongside windows
  if needed, using an operator-based format (eq, in, between).
- Cross-subscription coordination (rolling deployments, staged rollouts) would
  be a separate orchestration layer above this policy system.
- An embedded scripting language (Lua via Luerl) could replace or supplement
  structured rules if the policy needs outgrow what windows and triggers can express.