---
date: 2026-04-30
title: Slack and Lark Multimodal Image Input
status: draft
---

# Slack and Lark Multimodal Image Input

## 1) Scope

Add inbound image understanding for Slack and Lark runtime messages.

V1 should only support images that arrive with the current user message. It should not add generic attachment browsing, arbitrary file reading, video, audio, OCR-specific tools, or outbound rich media changes.

The target behavior:

- If the runtime source is enabled in `multimodal.image.sources`, inbound images are downloaded to the runtime file cache and passed to the main LLM request as image parts.
- If the source is disabled, the runtime should produce a clear text-only fallback prompt, aligned with LINE.
- If the selected model does not support image parts, the message should still run as text-only.

## 2) Current State

Telegram and LINE already have the working shape:

- Inbound runtime stores image paths on the job.
- `build*PromptMessages(...)` receives `ImageRecognitionEnabled`.
- Image files are converted into `llm.PartTypeImageBase64` parts.
- Size, count, and MIME checks happen before building the LLM message.

Slack does not yet have this path:

- `slackEvent`, `slackInboundEvent`, `slackJob`, and `slackbus.InboundMessage` only carry text and message metadata.
- `BuildSlackRunOptions` does not read `multimodal.image.sources`.
- `buildSlackPromptMessages(...)` always builds plain text messages.

Lark does not yet have this path:

- `inboundMessageFromWebhookEvent(...)` only accepts text messages.
- `larkbus.InboundMessage` and `larkJob` do not carry image paths.
- `BuildLarkRunOptions` reads `multimodal.image.sources` for LINE only; Lark has no image flag.
- `buildLarkPromptMessages(...)` always builds plain text messages.

The config example currently lists `slack` as a supported image source, but Slack is not implemented. Lark is not listed.

## 3) First Principles

1. Runtime input decides whether an image exists.
   The LLM layer should only receive local image paths or ready-made `llm.Part` values. It should not know Slack or Lark file APIs.

2. Download belongs at the channel edge.
   Slack token handling and Lark token handling must stay in their runtime API layers.

3. Image handling should converge after download.
   After a platform image is stored in `file_cache_dir/<runtime>/`, Slack and Lark should reuse the same kind of local-image-to-LLM-parts logic used by Telegram and LINE.

4. No image content in memory by default.
   Chat history and memory should record that an image was present, but should not store base64 image data.

5. Failing to read an image should not crash the runtime.
   The user task can continue with a short text note such as "image attachment could not be read" unless the whole inbound event is malformed.

## 4) Config

Use the existing setting:

```yaml
multimodal:
  image:
    sources: ["telegram", "line", "slack", "lark"]
```

Update `assets/config/config.example.yaml` so the documented supported values match implementation.

Runtime behavior:

- `slack` present: Slack inbound image recognition enabled.
- `lark` present: Lark inbound image recognition enabled.
- Missing source: do not download image for LLM input; provide a text fallback when the user sent only images.

No new channel-specific config is needed in V1.
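
The source check described above could behave roughly like the sketch below. The name `sourceEnabled` matches the call used later in this document, but the body is an assumption: it treats matching as case-insensitive and whitespace-tolerant, which the real helper may not do.

```go
package main

import "strings"

// sourceEnabled reports whether name is listed in sources.
// Assumption: comparison ignores case and surrounding whitespace;
// the existing helper in the codebase may be stricter.
func sourceEnabled(sources []string, name string) bool {
	want := strings.TrimSpace(name)
	if want == "" {
		return false
	}
	for _, s := range sources {
		if strings.EqualFold(strings.TrimSpace(s), want) {
			return true
		}
	}
	return false
}
```

With `sources: ["telegram", "line", "slack"]`, this returns true for `"slack"` and false for `"lark"`, so a Lark image-only message would take the text fallback path.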

## 5) Shared Data Model

The smallest useful shape is still `[]string` of local image paths.

Add image path fields to channel-specific inbound/job structs:

- Slack:
  - `slackInboundEvent.ImagePaths []string`
  - `slackbus.InboundMessage.ImagePaths []string`
  - `slackJob.ImagePaths []string`

- Lark:
  - `larkbus.InboundMessage.ImagePaths []string`
  - `larkJob.ImagePaths []string`

If a platform needs delayed download, add `ImagePending bool` only where it is actually needed. LINE needs it because webhook processing and image download are split by message content API timing. Do not copy `ImagePending` into Slack or Lark unless their event flow needs the same delay.

For bus messages, reuse `MessageExtensions.ImagePaths`.

History items can remain text-first. If needed, add a short textual marker to the rendered current message:

```text
[image attachments: 2]
```

Do not add base64 to `ChatHistoryItem`.
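
As a rough illustration of the text-first history rule, the sketch below renders the history text for a job that carries image paths. The `larkJob` struct here is a trimmed, hypothetical stand-in (the real struct has more fields), and `renderHistoryText` is an invented name.

```go
package main

import "fmt"

// larkJob is a hypothetical, trimmed stand-in for the real job struct.
type larkJob struct {
	Text       string
	ImagePaths []string // local cache paths only, never base64 data
}

// renderHistoryText returns the text to store in chat history.
// It appends a short marker when images were present and never
// includes image bytes.
func renderHistoryText(job larkJob) string {
	if len(job.ImagePaths) == 0 {
		return job.Text
	}
	marker := fmt.Sprintf("[image attachments: %d]", len(job.ImagePaths))
	if job.Text == "" {
		return marker
	}
	return job.Text + "\n" + marker
}
```

The same shape would apply to `slackJob`; only the marker reaches history, while the paths stay on the job for the current LLM request.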

## 6) Shared Image Builder

Telegram and LINE currently duplicate local image conversion rules. Slack and Lark should not add two more copies.

Add a small shared helper under a runtime-neutral internal package, for example:

```go
func BuildImageMessage(baseText string, model string, imagePaths []string, opts ImageMessageOptions) (llm.Message, error)
```

Suggested options:

- max images: `3`
- max bytes per image: `5 MiB`
- supported MIME types: PNG, JPEG, WebP where provider/model supports it
- optional WebP conversion hook only for Telegram if still needed

Keep this helper about local files and `llm.Part` construction only. It should not download remote files and should not import Slack, Lark, Telegram, or LINE packages.

If this extraction becomes too noisy, implement Slack/Lark with a minimal local helper first, then collapse duplication in a follow-up PR. Do not block image support on a broad refactor.

## 7) Slack Plan

### 7.1 Parse Image Metadata

Extend Slack event parsing to capture image files from message events.

Required fields should be the minimum needed to download and validate:

- file id
- MIME type or mimetype
- private download URL
- size when present
- filename when present

Ignore non-image files in V1.
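
The field list above might map onto a struct like the one in this sketch. The JSON names (`id`, `name`, `mimetype`, `size`, `url_private_download`) follow Slack's file object as commonly documented, but they should be checked against the payloads the runtime actually receives; `slackFile` and `parseSlackImageFiles` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"strings"
)

// slackFile holds the minimum metadata needed to download and validate.
// Field names are assumed from Slack's file object; verify before use.
type slackFile struct {
	ID                 string `json:"id"`
	Name               string `json:"name"`
	MIMEType           string `json:"mimetype"`
	Size               int64  `json:"size"`
	URLPrivateDownload string `json:"url_private_download"`
}

// parseSlackImageFiles decodes the "files" array of a message event
// and keeps only entries with an image MIME type and a download URL.
func parseSlackImageFiles(rawFiles []byte) ([]slackFile, error) {
	if len(rawFiles) == 0 {
		return nil, nil
	}
	var files []slackFile
	if err := json.Unmarshal(rawFiles, &files); err != nil {
		return nil, err
	}
	var images []slackFile
	for _, f := range files {
		if f.ID == "" || f.URLPrivateDownload == "" {
			continue
		}
		if !strings.HasPrefix(strings.ToLower(f.MIMEType), "image/") {
			continue // non-image files are ignored in V1
		}
		images = append(images, f)
	}
	return images, nil
}
```

The declared MIME type is only a first filter here; the download step and the shared builder should still validate the actual content.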

### 7.2 Download to Cache

Add a Slack API method for authenticated file download.

Rules:

- Use the bot token.
- Save under `file_cache_dir/slack/`.
- Enforce max image bytes before or during download.
- Only accept image MIME types.
- Use secure child directory creation, matching existing cache rules.

### 7.3 Runtime Wiring

Add `ImageRecognitionEnabled` to Slack run options and runtime task options.

In `BuildSlackRunOptions`, compute it with:

```go
sourceEnabled(cfg.MultimodalImageSources, "slack")
```

In the worker path:

- Parse Slack file metadata from the inbound event.
- If enabled, download images before enqueueing/running the job.
- Put local paths on `slackJob.ImagePaths`.
- Pass image paths into `buildSlackPromptMessages(...)`.

### 7.4 Prompt Message

Change:

```go
buildSlackPromptMessages(history, job)
```

to include model and image flag, aligned with Telegram/LINE:

```go
buildSlackPromptMessages(history, job, model, imageRecognitionEnabled, logger)
```

Use image parts only for the current message, not old history.

## 8) Lark Plan

### 8.1 Accept Image Messages

Extend webhook parsing beyond `message_type == "text"`.

V1 should support:

- text messages
- image messages
- image messages with optional user text, if Lark provides it in the event content

If the message is image-only and image recognition is enabled, synthesize a small task text:

```text
User sent an image.
```

If image recognition is disabled, use a clear fallback text similar to LINE:

```text
User sent an image, but image recognition is disabled in the current Lark runtime. Reply briefly and ask the user to describe the image in text or enable lark in multimodal.image.sources.
```
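
The text selection above can be sketched as a small pure function. `larkTaskText` is a hypothetical name, and one behavior is an assumption rather than something stated in this plan: when the user sends both text and an image, the user text is kept as-is even if image recognition is disabled.

```go
package main

import "strings"

const (
	imageOnlyTaskText     = "User sent an image."
	imageDisabledTaskText = "User sent an image, but image recognition is disabled in the current Lark runtime. Reply briefly and ask the user to describe the image in text or enable lark in multimodal.image.sources."
)

// larkTaskText picks the task text for an inbound Lark message.
// Assumption: non-empty user text always wins over synthesized text.
func larkTaskText(userText string, hasImage, imageRecognitionEnabled bool) string {
	text := strings.TrimSpace(userText)
	switch {
	case text != "":
		return text
	case !hasImage:
		return ""
	case imageRecognitionEnabled:
		return imageOnlyTaskText
	default:
		return imageDisabledTaskText
	}
}
```

Keeping this as a pure function makes the disabled-image fallback easy to cover in unit tests without a webhook fixture.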

### 8.2 Download to Cache

Add the minimum Lark API method needed to download image binary content from the message content identifier.

Rules:

- Use the existing tenant token client.
- Save under `file_cache_dir/lark/`.
- Enforce max image bytes.
- Only accept image MIME types.
- Keep the API method local to Lark runtime; do not create a broad Lark SDK.

### 8.3 Runtime Wiring

Add `ImageRecognitionEnabled` to Lark run options and runtime task options.

In `BuildLarkRunOptions`, compute it with:

```go
sourceEnabled(cfg.MultimodalImageSources, "lark")
```

Add `ImagePaths` to `larkbus.InboundMessage` and `larkJob`, then pass them to prompt message building.

### 8.4 Prompt Message

Change:

```go
buildLarkPromptMessages(history, job)
```

to:

```go
buildLarkPromptMessages(history, job, model, imageRecognitionEnabled, logger)
```

Use image parts only for the current message.

## 9) Error Handling

Use text fallback instead of hard failures for normal media issues:

- unsupported MIME type
- image too large
- download failed
- model does not support image parts

Hard failure is acceptable for malformed runtime state:

- missing Slack channel/message identifiers
- missing Lark chat/message identifiers
- invalid configured cache directory

Log enough context to debug:

- channel
- message id
- image count
- skipped count
- reason

Do not log private download URLs or base64 image data.
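
One way to produce the log context listed above is to summarize per-image outcomes before logging, as in this sketch. All names here (`imageOutcome`, `summarizeOutcomes`, `logImageSummary`) and the reason strings are hypothetical, and the standard `log` package stands in for whatever logger the runtimes already use.

```go
package main

import (
	"log"
	"sort"
)

// imageOutcome records what happened to one inbound image.
// It intentionally has no URL or payload field.
type imageOutcome struct {
	Used   bool
	Reason string // skip reason such as "image_too_large"; empty when Used
}

// imageLogSummary is the loggable view of a message's image handling.
type imageLogSummary struct {
	ImageCount   int
	SkippedCount int
	Reasons      []string // distinct skip reasons, sorted
}

// summarizeOutcomes counts used and skipped images and collects reasons.
func summarizeOutcomes(outcomes []imageOutcome) imageLogSummary {
	sum := imageLogSummary{ImageCount: len(outcomes)}
	seen := map[string]bool{}
	for _, o := range outcomes {
		if o.Used {
			continue
		}
		sum.SkippedCount++
		if !seen[o.Reason] {
			seen[o.Reason] = true
			sum.Reasons = append(sum.Reasons, o.Reason)
		}
	}
	sort.Strings(sum.Reasons)
	return sum
}

// logImageSummary writes one line with the fields listed above.
func logImageSummary(logger *log.Logger, channel, messageID string, sum imageLogSummary) {
	logger.Printf("image input: channel=%s message_id=%s image_count=%d skipped_count=%d reasons=%v",
		channel, messageID, sum.ImageCount, sum.SkippedCount, sum.Reasons)
}
```

Because the outcome type carries only a reason string, private download URLs and base64 data cannot leak into the log line by accident.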

## 10) Tests

### Slack

- Parse Slack message events with image files.
- Ignore non-image files.
- `BuildSlackRunOptions` enables image recognition when `slack` is in `multimodal.image.sources`.
- Download helper rejects non-image MIME and oversized images.
- `buildSlackPromptMessages` adds `llm.PartTypeImageBase64` for supported image models.
- Unsupported image models degrade to text-only.
- Bus adapter preserves `ImagePaths`.

### Lark

- Parse Lark text event as before.
- Parse Lark image event into inbound message.
- Ignore unsupported message types.
- `BuildLarkRunOptions` enables image recognition when `lark` is in `multimodal.image.sources`.
- Download helper rejects non-image MIME and oversized images.
- `buildLarkPromptMessages` adds image parts for supported image models.
- Unsupported image models degrade to text-only.
- Bus adapter preserves `ImagePaths`.

### Shared

- Shared image builder covers:
  - max image count
  - max bytes
  - MIME detection
  - base64 part generation
  - empty image list

## 11) Suggested PR Split

1. Shared local image builder extraction.
   No Slack/Lark behavior change.

2. Slack image input.
   Event parsing, download, config flag, prompt message parts, tests.

3. Lark image input.
   Webhook parsing, download, config flag, prompt message parts, tests.

4. Docs/config cleanup.
   Update user docs and `config.example.yaml` supported source list.

If the shared helper extraction starts to pull too much code around, defer it until after the Slack and Lark work instead. The feature is the image input path, not a new media framework.

## 12) Acceptance Criteria

- With `multimodal.image.sources` containing `slack`, a Slack user can send an image and ask a question about it; the selected image-capable model receives an image part.
- With `multimodal.image.sources` containing `lark`, a Lark user can send an image and ask a question about it; the selected image-capable model receives an image part.
- With the source disabled, the runtime replies with a short text fallback instead of silently ignoring the image.
- Existing text-only Slack and Lark tests continue to pass.
- Telegram and LINE image behavior remains unchanged.

## 13) Implementation Tasks

1. Add shared local image message building.
   Extract only the file-to-`llm.Part` path after an image is already local. Keep platform download code outside this helper.

2. Wire Slack image metadata through runtime structs.
   Parse Slack file metadata, add image paths to inbound messages and jobs, and keep non-image files out of the image path.

3. Add Slack authenticated image download.
   Download only supported image MIME types into `file_cache_dir/slack/`, enforce size limits, and pass local paths into the current LLM message.

4. Wire Lark image metadata through runtime structs.
   Accept text and image message events, add image paths to inbound messages and jobs, and keep history text-only.

5. Add Lark authenticated image download.
   Download only supported image MIME types into `file_cache_dir/lark/`, enforce size limits, and pass local paths into the current LLM message.

6. Update config support lists.
   Add `lark` where the UI or config template lists supported image sources.

7. Add focused tests.
   Cover parsing, config flags, download validation, bus image path preservation, prompt image parts, disabled image fallback, and text-only model fallback.