Unified Agent + reusable Go agent core.

feat: add slack and lark multimodal images

Lyric 3664af8a 8d0f4075

+1893 -325
+1 -1
assets/config/config.example.yaml
···
 multimodal:
   image:
     # Enabled source whitelist.
-    # Supported values: telegram, slack, line, remote_download.
+    # Supported values: telegram, slack, line, lark, remote_download.
     sources: ["telegram", "line"]

 logging:
+1 -1
cmd/mistermorph/consolecmd/agent_settings.go
···
 	toolsSettingsKey = "tools"
 )

-var supportedMultimodalSources = []string{"telegram", "slack", "line", "remote_download"}
+var supportedMultimodalSources = []string{"telegram", "slack", "line", "lark", "remote_download"}

 var agentSettingsEnvRefPattern = regexp.MustCompile(`^\$\{([a-zA-Z_][a-zA-Z0-9_]*)\}$`)

+357
docs/feat/feat_20260430_slack_lark_multimodal_images.md
···
+---
+date: 2026-04-30
+title: Slack and Lark Multimodal Image Input
+status: draft
+---
+
+# Slack and Lark Multimodal Image Input
+
+## 1) Scope
+
+Add inbound image understanding for Slack and Lark runtime messages.
+
+V1 should only support images that arrive with the current user message. It should not add generic attachment browsing, arbitrary file reading, video, audio, OCR-specific tools, or outbound rich media changes.
+
+The target behavior:
+
+- If the runtime source is enabled in `multimodal.image.sources`, inbound images are downloaded to the runtime file cache and passed to the main LLM request as image parts.
+- If the source is disabled, the runtime should produce a clear text-only fallback prompt, aligned with LINE.
+- If the selected model does not support image parts, the message should still run as text-only.
+
+## 2) Current State
+
+Telegram and LINE already have the working shape:
+
+- Inbound runtime stores image paths on the job.
+- `build*PromptMessages(...)` receives `ImageRecognitionEnabled`.
+- Image files are converted into `llm.PartTypeImageBase64` parts.
+- Size, count, and MIME checks happen before building the LLM message.
+
+Slack does not yet have this path:
+
+- `slackEvent`, `slackInboundEvent`, `slackJob`, and `slackbus.InboundMessage` only carry text and message metadata.
+- `BuildSlackRunOptions` does not read `multimodal.image.sources`.
+- `buildSlackPromptMessages(...)` always builds plain text messages.
+
+Lark does not yet have this path:
+
+- `inboundMessageFromWebhookEvent(...)` only accepts text messages.
+- `larkbus.InboundMessage` and `larkJob` do not carry image paths.
+- `BuildLarkRunOptions` reads `multimodal.image.sources` for LINE only; Lark has no image flag.
+- `buildLarkPromptMessages(...)` always builds plain text messages.
+
+The config example currently lists `slack` as a supported image source, but Slack is not implemented. Lark is not listed.
+
+## 3) First Principles
+
+1. Runtime input decides whether an image exists.
+   The LLM layer should only receive local image paths or ready-made `llm.Part` values. It should not know Slack or Lark file APIs.
+
+2. Download belongs at the channel edge.
+   Slack token handling and Lark token handling must stay in their runtime API layers.
+
+3. Image handling should converge after download.
+   After a platform image is stored in `file_cache_dir/<runtime>/`, Slack and Lark should reuse the same kind of local-image-to-LLM-parts logic used by Telegram and LINE.
+
+4. No image content in memory by default.
+   Chat history and memory should record that an image was present, but should not store base64 image data.
+
+5. Failing to read an image should not crash the runtime.
+   The user task can continue with a short text note such as "image attachment could not be read" unless the whole inbound event is malformed.
+
+## 4) Config
+
+Use the existing setting:
+
+```yaml
+multimodal:
+  image:
+    sources: ["telegram", "line", "slack", "lark"]
+```
+
+Update `assets/config/config.example.yaml` so the documented supported values match implementation.
+
+Runtime behavior:
+
+- `slack` present: Slack inbound image recognition enabled.
+- `lark` present: Lark inbound image recognition enabled.
+- Missing source: do not download image for LLM input; provide a text fallback when the user sent only images.
+
+No new channel-specific config is needed in V1.
+
+## 5) Shared Data Model
+
+The smallest useful shape is still `[]string` of local image paths.
+
+Add image path fields to channel-specific inbound/job structs:
+
+- Slack:
+  - `slackInboundEvent.ImagePaths []string`
+  - `slackbus.InboundMessage.ImagePaths []string`
+  - `slackJob.ImagePaths []string`
+
+- Lark:
+  - `larkbus.InboundMessage.ImagePaths []string`
+  - `larkJob.ImagePaths []string`
+
+If a platform needs delayed download, add `ImagePending bool` only where it is actually needed. LINE needs it because webhook processing and image download are split by message content API timing. Do not copy `ImagePending` into Slack or Lark unless their event flow needs the same delay.
+
+For bus messages, reuse `MessageExtensions.ImagePaths`.
+
+History items can remain text-first. If needed, add a short textual marker to the rendered current message:
+
+```text
+[image attachments: 2]
+```
+
+Do not add base64 to `ChatHistoryItem`.
+
+## 6) Shared Image Builder
+
+Telegram and LINE currently duplicate local image conversion rules. Slack and Lark should not add two more copies.
+
+Add a small shared helper under a runtime-neutral internal package, for example:
+
+```go
+func BuildImageMessage(baseText string, model string, imagePaths []string, opts ImageMessageOptions) (llm.Message, error)
+```
+
+Suggested options:
+
+- max images: `3`
+- max bytes per image: `5 MiB`
+- supported MIME types: PNG, JPEG, WebP where provider/model supports it
+- optional WebP conversion hook only for Telegram if still needed
+
+Keep this helper about local files and `llm.Part` construction only. It should not download remote files and should not import Slack, Lark, Telegram, or LINE packages.
+
+If this extraction becomes too noisy, implement Slack/Lark with a minimal local helper first, then collapse duplication in a follow-up PR. Do not block image support on a broad refactor.
+
+## 7) Slack Plan
+
+### 7.1 Parse Image Metadata
+
+Extend Slack event parsing to capture image files from message events.
+
+Required fields should be the minimum needed to download and validate:
+
+- file id
+- MIME type or mimetype
+- private download URL
+- size when present
+- filename when present
+
+Ignore non-image files in V1.
+
+### 7.2 Download to Cache
+
+Add Slack API method for authenticated file download.
+
+Rules:
+
+- Use the bot token.
+- Save under `file_cache_dir/slack/`.
+- Enforce max image bytes before or during download.
+- Only accept image MIME types.
+- Use secure child directory creation, matching existing cache rules.
+
+### 7.3 Runtime Wiring
+
+Add `ImageRecognitionEnabled` to Slack run options and runtime task options.
+
+In `BuildSlackRunOptions`, compute it with:
+
+```go
+sourceEnabled(cfg.MultimodalImageSources, "slack")
+```
+
+In the worker path:
+
+- Parse Slack file metadata from the inbound event.
+- If enabled, download images before enqueueing/running the job.
+- Put local paths on `slackJob.ImagePaths`.
+- Pass image paths into `buildSlackPromptMessages(...)`.
+
+### 7.4 Prompt Message
+
+Change:
+
+```go
+buildSlackPromptMessages(history, job)
+```
+
+to include model and image flag, aligned with Telegram/LINE:
+
+```go
+buildSlackPromptMessages(history, job, model, imageRecognitionEnabled, logger)
+```
+
+Use image parts only for the current message, not old history.
+
+## 8) Lark Plan
+
+### 8.1 Accept Image Messages
+
+Extend webhook parsing beyond `message_type == "text"`.
+
+V1 should support:
+
+- text messages
+- image messages
+- image message with optional user text if Lark provides it in the event content
+
+If the message is image-only and image recognition is enabled, synthesize a small task text:
+
+```text
+User sent an image.
+```
+
+If image recognition is disabled, use a clear fallback text similar to LINE:
+
+```text
+User sent an image, but image recognition is disabled in the current Lark runtime. Reply briefly and ask the user to describe the image in text or enable lark in multimodal.image.sources.
+```
+
+### 8.2 Download to Cache
+
+Add the minimum Lark API method needed to download image binary content from the message content identifier.
+
+Rules:
+
+- Use the existing tenant token client.
+- Save under `file_cache_dir/lark/`.
+- Enforce max image bytes.
+- Only accept image MIME types.
+- Keep the API method local to Lark runtime; do not create a broad Lark SDK.
+
+### 8.3 Runtime Wiring
+
+Add `ImageRecognitionEnabled` to Lark run options and runtime task options.
+
+In `BuildLarkRunOptions`, compute it with:
+
+```go
+sourceEnabled(cfg.MultimodalImageSources, "lark")
+```
+
+Add `ImagePaths` to `larkbus.InboundMessage` and `larkJob`, then pass them to prompt message building.
+
+### 8.4 Prompt Message
+
+Change:
+
+```go
+buildLarkPromptMessages(history, job)
+```
+
+to:
+
+```go
+buildLarkPromptMessages(history, job, model, imageRecognitionEnabled, logger)
+```
+
+Use image parts only for the current message.
+
+## 9) Error Handling
+
+Use text fallback instead of hard failures for normal media issues:
+
+- unsupported MIME type
+- image too large
+- download failed
+- model does not support image parts
+
+Hard failure is acceptable for malformed runtime state:
+
+- missing Slack channel/message identifiers
+- missing Lark chat/message identifiers
+- invalid configured cache directory
+
+Log enough context to debug:
+
+- channel
+- message id
+- image count
+- skipped count
+- reason
+
+Do not log private download URLs or base64 image data.
+
+## 10) Tests
+
+### Slack
+
+- Parse Slack message events with image files.
+- Ignore non-image files.
+- `BuildSlackRunOptions` enables image recognition when `slack` is in `multimodal.image.sources`.
+- Download helper rejects non-image MIME and oversized images.
+- `buildSlackPromptMessages` adds `llm.PartTypeImageBase64` for supported image models.
+- Unsupported image models degrade to text-only.
+- Bus adapter preserves `ImagePaths`.
+
+### Lark
+
+- Parse Lark text event as before.
+- Parse Lark image event into inbound message.
+- Ignore unsupported message types.
+- `BuildLarkRunOptions` enables image recognition when `lark` is in `multimodal.image.sources`.
+- Download helper rejects non-image MIME and oversized images.
+- `buildLarkPromptMessages` adds image parts for supported image models.
+- Unsupported image models degrade to text-only.
+- Bus adapter preserves `ImagePaths`.
+
+### Shared
+
+- Shared image builder covers:
+  - max image count
+  - max bytes
+  - MIME detection
+  - base64 part generation
+  - empty image list
+
+## 11) Suggested PR Split
+
+1. Shared local image builder extraction.
+   No Slack/Lark behavior change.
+
+2. Slack image input.
+   Event parsing, download, config flag, prompt message parts, tests.
+
+3. Lark image input.
+   Webhook parsing, download, config flag, prompt message parts, tests.
+
+4. Docs/config cleanup.
+   Update user docs and `config.example.yaml` supported source list.
+
+If the shared helper extraction starts to pull too much code around, split it after Slack and Lark work instead. The feature is the image input path, not a new media framework.
+
+## 12) Acceptance Criteria
+
+- With `multimodal.image.sources` containing `slack`, a Slack user can send an image and ask a question about it; the selected image-capable model receives an image part.
+- With `multimodal.image.sources` containing `lark`, a Lark user can send an image and ask a question about it; the selected image-capable model receives an image part.
+- With the source disabled, the runtime replies with a short text fallback instead of silently ignoring the image.
+- Existing text-only Slack and Lark tests continue to pass.
+- Telegram and LINE image behavior remains unchanged.
+
+## 13) Implementation Tasks
+
+1. Add shared local image message building.
+   Extract only the file-to-`llm.Part` path after an image is already local. Keep platform download code outside this helper.
+
+2. Wire Slack image metadata through runtime structs.
+   Parse Slack file metadata, add image paths to inbound messages and jobs, and keep non-image files out of the image path.
+
+3. Add Slack authenticated image download.
+   Download only supported image MIME types into `file_cache_dir/slack/`, enforce size limits, and pass local paths into the current LLM message.
+
+4. Wire Lark image metadata through runtime structs.
+   Accept text and image message events, add image paths to inbound messages and jobs, and keep history text-only.
+
+5. Add Lark authenticated image download.
+   Download only supported image MIME types into `file_cache_dir/lark/`, enforce size limits, and pass local paths into the current LLM message.
+
+6. Update config support lists.
+   Add `lark` where the UI or config template lists supported image sources.
+
+7. Add focused tests.
+   Cover parsing, config flags, download validation, bus image path preservation, prompt image parts, disabled image fallback, and text-only model fallback.
+31
internal/bus/adapters/lark/inbound.go
···
 	Text         string
 	MentionUsers []string
 	EventID      string
+	ImagePaths   []string
 }

 type InboundAdapter struct {
···
 	if err != nil {
 		return false, err
 	}
+	imagePaths, err := normalizeImagePaths(msg.ImagePaths)
+	if err != nil {
+		return false, err
+	}

 	now := a.nowFn().UTC()
 	sentAt := msg.SentAt.UTC()
···
 			FromUserRef:  fromUserID,
 			EventID:      strings.TrimSpace(msg.EventID),
 			MentionUsers: mentionUsers,
+			ImagePaths:   imagePaths,
 		},
 	}
 	return a.flow.PublishValidatedInbound(ctx, platformMessageID, busMsg)
···
 	if err != nil {
 		return InboundMessage{}, err
 	}
+	imagePaths, err := normalizeImagePaths(msg.Extensions.ImagePaths)
+	if err != nil {
+		return InboundMessage{}, err
+	}

 	return InboundMessage{
 		ChatID: chatID,
···
 		Text:         strings.TrimSpace(env.Text),
 		MentionUsers: mentionUsers,
 		EventID:      strings.TrimSpace(msg.Extensions.EventID),
+		ImagePaths:   imagePaths,
 	}, nil
 }

···
 		return "", fmt.Errorf("lark chat id is required")
 	}
 	return chatID, nil
+}
+
+func normalizeImagePaths(paths []string) ([]string, error) {
+	if len(paths) == 0 {
+		return nil, nil
+	}
+	out := make([]string, 0, len(paths))
+	seen := make(map[string]bool, len(paths))
+	for _, raw := range paths {
+		path := strings.TrimSpace(raw)
+		if path == "" {
+			return nil, fmt.Errorf("image path is required")
+		}
+		if seen[path] {
+			continue
+		}
+		seen[path] = true
+		out = append(out, path)
+	}
+	return out, nil
 }

 func normalizeLarkChatType(raw string) (string, error) {
+8
internal/bus/adapters/lark/inbound_test.go
···
 		Text:         "hello lark",
 		MentionUsers: []string{"ou_123", "ou_456"},
 		EventID:      "ev_001",
+		ImagePaths:   []string{"/tmp/a.png", "/tmp/a.png", "/tmp/b.jpg"},
 		SentAt:       time.Date(2026, 3, 6, 1, 2, 3, 0, time.UTC),
 	})
 	if err != nil {
···
 	}
 	if msg.Extensions.PlatformMessageID != "oc_group123:om_1001" {
 		t.Fatalf("platform_message_id mismatch: got %q", msg.Extensions.PlatformMessageID)
+	}
+	if len(msg.Extensions.ImagePaths) != 2 {
+		t.Fatalf("image_paths len = %d, want 2", len(msg.Extensions.ImagePaths))
 	}
 	env, envErr := msg.Envelope()
 	if envErr != nil {
···
 			ChannelID:    "oc_group123",
 			EventID:      "ev_001",
 			MentionUsers: []string{"ou_123", "ou_456"},
+			ImagePaths:   []string{"/tmp/a.png", "/tmp/b.jpg"},
 		},
 	}

···
 	}
 	if inbound.Text != "hello lark" {
 		t.Fatalf("text mismatch: got %q want %q", inbound.Text, "hello lark")
+	}
+	if len(inbound.ImagePaths) != 2 {
+		t.Fatalf("image_paths len = %d, want 2", len(inbound.ImagePaths))
 	}
 }
+31
internal/bus/adapters/slack/inbound.go
···
 	SentAt       time.Time
 	MentionUsers []string
 	EventID      string
+	ImagePaths   []string
 }

 type InboundAdapter struct {
···
 	if err != nil {
 		return false, err
 	}
+	imagePaths, err := normalizeImagePaths(msg.ImagePaths)
+	if err != nil {
+		return false, err
+	}
 	now := a.nowFn().UTC()
 	sentAt := msg.SentAt.UTC()
 	if sentAt.IsZero() {
···
 			ThreadTS:     threadTS,
 			EventID:      strings.TrimSpace(msg.EventID),
 			MentionUsers: mentionUsers,
+			ImagePaths:   imagePaths,
 		},
 	}
 	return a.flow.PublishValidatedInbound(ctx, platformMessageID, busMsg)
···
 	if err != nil {
 		return InboundMessage{}, err
 	}
+	imagePaths, err := normalizeImagePaths(msg.Extensions.ImagePaths)
+	if err != nil {
+		return InboundMessage{}, err
+	}
 	threadTS := strings.TrimSpace(msg.Extensions.ThreadTS)
 	if threadTS == "" {
 		threadTS = strings.TrimSpace(msg.Extensions.ReplyTo)
···
 		SentAt:       sentAt.UTC(),
 		MentionUsers: mentionUsers,
 		EventID:      strings.TrimSpace(msg.Extensions.EventID),
+		ImagePaths:   imagePaths,
 	}, nil
 }

···
 			return nil, fmt.Errorf("mention user is required")
 		}
 		out = append(out, item)
+	}
+	return out, nil
+}
+
+func normalizeImagePaths(paths []string) ([]string, error) {
+	if len(paths) == 0 {
+		return nil, nil
+	}
+	out := make([]string, 0, len(paths))
+	seen := make(map[string]bool, len(paths))
+	for _, raw := range paths {
+		path := strings.TrimSpace(raw)
+		if path == "" {
+			return nil, fmt.Errorf("image path is required")
+		}
+		if seen[path] {
+			continue
+		}
+		seen[path] = true
+		out = append(out, path)
 	}
 	return out, nil
 }
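The `normalizeImagePaths` helper added to both bus adapters trims each entry, drops duplicates while preserving first-seen order, and returns an error as soon as it meets a blank entry. The standalone sketch below copies the function body from the diff so it can run on its own and exercises both behaviors; only the `main` function is new and illustrative.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeImagePaths is copied from the adapter diff so this sketch is
// self-contained.
func normalizeImagePaths(paths []string) ([]string, error) {
	if len(paths) == 0 {
		return nil, nil
	}
	out := make([]string, 0, len(paths))
	seen := make(map[string]bool, len(paths))
	for _, raw := range paths {
		path := strings.TrimSpace(raw)
		if path == "" {
			return nil, fmt.Errorf("image path is required")
		}
		if seen[path] {
			continue
		}
		seen[path] = true
		out = append(out, path)
	}
	return out, nil
}

func main() {
	// Duplicates are detected after trimming; order of first appearance is kept.
	out, err := normalizeImagePaths([]string{" /tmp/a.png", "/tmp/a.png", "/tmp/b.jpg"})
	fmt.Println(out, err) // [/tmp/a.png /tmp/b.jpg] <nil>

	// A whitespace-only entry rejects the whole list.
	_, err = normalizeImagePaths([]string{"/tmp/a.png", "  "})
	fmt.Println(err) // image path is required
}
```

Because the adapters return this error from the publish path, a single blank entry fails the whole inbound message rather than being skipped. The adapter tests in this PR cover the dedupe case; the blank-entry case may be worth a test of its own.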
+8
internal/bus/adapters/slack/inbound_test.go
···
 		Text:         "hello from slack",
 		MentionUsers: []string{"@alice", "@bob"},
 		EventID:      "Ev01",
+		ImagePaths:   []string{"/tmp/a.png", "/tmp/a.png", "/tmp/b.jpg"},
 	})
 	if err != nil {
 		t.Fatalf("HandleInboundMessage() error = %v", err)
···
 	}
 	if msg.Extensions.EventID != "Ev01" {
 		t.Fatalf("event_id mismatch: got %q want %q", msg.Extensions.EventID, "Ev01")
+	}
+	if len(msg.Extensions.ImagePaths) != 2 {
+		t.Fatalf("image_paths len = %d, want 2", len(msg.Extensions.ImagePaths))
 	}
 	env, envErr := msg.Envelope()
 	if envErr != nil {
···
 			ThreadTS:     "1739667000.000050",
 			EventID:      "Ev01",
 			MentionUsers: []string{"@alice", "@bob"},
+			ImagePaths:   []string{"/tmp/a.png", "/tmp/b.jpg"},
 		},
 	}
 	inbound, err := InboundMessageFromBusMessage(msg)
···
 	}
 	if inbound.Text != "hello from slack" {
 		t.Fatalf("text mismatch: got %q want %q", inbound.Text, "hello from slack")
+	}
+	if len(inbound.ImagePaths) != 2 {
+		t.Fatalf("image_paths len = %d, want 2", len(inbound.ImagePaths))
 	}
 }
+12
internal/channelopts/options.go
···
 	MemoryShortTermDays     int
 	MemoryInjectionEnabled  bool
 	MemoryInjectionMaxItems int
+	MultimodalImageSources  []string
 }

 type SlackInput struct {
···
 		MemoryShortTermDays:     r.GetInt("memory.short_term_days"),
 		MemoryInjectionEnabled:  r.GetBool("memory.injection.enabled"),
 		MemoryInjectionMaxItems: r.GetInt("memory.injection.max_items"),
+		MultimodalImageSources:  append([]string(nil), r.GetStringSlice("multimodal.image.sources")...),
 	}
 }

···
 	}
 	fileCacheDir := strings.TrimSpace(cfg.FileCacheDir)
 	serverListen := normalizeServerListen(cfg.ServerListen)
+	imageRecognitionEnabled := sourceEnabled(cfg.MultimodalImageSources, "slack")
 	baseURL := strings.TrimSpace(in.BaseURL)
 	if baseURL == "" {
 		baseURL = strings.TrimSpace(cfg.BaseURL)
···
 		MemoryShortTermDays:     cfg.MemoryShortTermDays,
 		MemoryInjectionEnabled:  cfg.MemoryInjectionEnabled,
 		MemoryInjectionMaxItems: cfg.MemoryInjectionMaxItems,
+		ImageRecognitionEnabled: imageRecognitionEnabled,
 		Hooks:                   in.Hooks,
 		InspectPrompt:           in.InspectPrompt,
 		InspectRequest:          in.InspectRequest,
···
 	TaskTimeout       time.Duration
 	GlobalTaskTimeout time.Duration
 	MaxConcurrency    int
+	FileCacheDir      string
 	ServerListen      string
 	ServerAuthToken   string
 	ServerMaxQueue    int
···
 	MemoryShortTermDays     int
 	MemoryInjectionEnabled  bool
 	MemoryInjectionMaxItems int
+	MultimodalImageSources  []string
 }

 type LarkInput struct {
···
 		TaskTimeout:       r.GetDuration("lark.task_timeout"),
 		GlobalTaskTimeout: r.GetDuration("timeout"),
 		MaxConcurrency:    r.GetInt("lark.max_concurrency"),
+		FileCacheDir:      strings.TrimSpace(r.GetString("file_cache_dir")),
 		ServerListen:      resolveServeListen(r, "lark.serve_listen", defaultLarkServeListen),
 		ServerAuthToken:   strings.TrimSpace(r.GetString("server.auth_token")),
 		ServerMaxQueue:    r.GetInt("server.max_queue"),
···
 		MemoryShortTermDays:     r.GetInt("memory.short_term_days"),
 		MemoryInjectionEnabled:  r.GetBool("memory.injection.enabled"),
 		MemoryInjectionMaxItems: r.GetInt("memory.injection.max_items"),
+		MultimodalImageSources:  append([]string(nil), r.GetStringSlice("multimodal.image.sources")...),
 	}
 }

···
 	if maxConcurrency <= 0 {
 		maxConcurrency = cfg.MaxConcurrency
 	}
+	fileCacheDir := strings.TrimSpace(cfg.FileCacheDir)
 	serverListen := normalizeServerListen(cfg.ServerListen)
 	baseURL := strings.TrimSpace(in.BaseURL)
 	if baseURL == "" {
···
 	if encryptKey == "" {
 		encryptKey = strings.TrimSpace(cfg.EncryptKey)
 	}
+	imageRecognitionEnabled := sourceEnabled(cfg.MultimodalImageSources, "lark")

 	return larkruntime.RunOptions{
 		AppID: strings.TrimSpace(in.AppID),
···
 		AddressingInterjectThreshold: addressingInterjectThreshold,
 		TaskTimeout:                  taskTimeout,
 		MaxConcurrency:               maxConcurrency,
+		FileCacheDir:                 fileCacheDir,
 		ServerListen:                 serverListen,
 		ServerAuthToken:              cfg.ServerAuthToken,
 		ServerMaxQueue:               cfg.ServerMaxQueue,
···
 		MemoryShortTermDays:     cfg.MemoryShortTermDays,
 		MemoryInjectionEnabled:  cfg.MemoryInjectionEnabled,
 		MemoryInjectionMaxItems: cfg.MemoryInjectionMaxItems,
+		ImageRecognitionEnabled: imageRecognitionEnabled,
 		Hooks:                   in.Hooks,
 		InspectPrompt:           in.InspectPrompt,
 		InspectRequest:          in.InspectRequest,
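Both `BuildSlackRunOptions` and `BuildLarkRunOptions` now derive `ImageRecognitionEnabled` from `sourceEnabled(cfg.MultimodalImageSources, ...)`. The body of `sourceEnabled` is not part of this diff, so the sketch below is only a guess at its shape: a membership check over the configured list. Whether the real helper trims whitespace or ignores case is an assumption made here for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// sourceEnabled reports whether name appears in sources.
// ASSUMPTION: trimming and case-insensitive matching are illustrative choices;
// the helper in internal/channelopts is not shown in this diff and may differ.
func sourceEnabled(sources []string, name string) bool {
	want := strings.TrimSpace(name)
	if want == "" {
		return false
	}
	for _, s := range sources {
		if strings.EqualFold(strings.TrimSpace(s), want) {
			return true
		}
	}
	return false
}

func main() {
	sources := []string{"telegram", "line", " Lark "}
	fmt.Println(sourceEnabled(sources, "lark"))  // true
	fmt.Println(sourceEnabled(sources, "slack")) // false
	fmt.Println(sourceEnabled(nil, "slack"))     // false
}
```

With the example config value `sources: ["telegram", "line"]`, both new flags stay off until `slack` or `lark` is added, which matches the opt-in behavior described in the design doc.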
+12
internal/channelopts/options_test.go
···
 			MemoryShortTermDays:     9,
 			MemoryInjectionEnabled:  true,
 			MemoryInjectionMaxItems: 33,
+			MultimodalImageSources:  []string{"slack"},
 		},
 		SlackInput{
 			BotToken: "xoxb-1",
···
 	}
 	if !opts.MemoryEnabled || opts.MemoryShortTermDays != 9 || !opts.MemoryInjectionEnabled || opts.MemoryInjectionMaxItems != 33 {
 		t.Fatalf("memory options mismatch: %#v", opts)
+	}
+	if !opts.ImageRecognitionEnabled {
+		t.Fatalf("ImageRecognitionEnabled = false, want true when slack is in sources")
 	}
 }

···
 			TaskTimeout:                          0,
 			GlobalTaskTimeout:                    5 * time.Minute,
 			MaxConcurrency:                       3,
+			FileCacheDir:                         "/tmp/morph-cache",
 			DefaultGroupTriggerMode:              "smart",
 			DefaultAddressingConfidenceThreshold: 0.6,
 			DefaultAddressingInterjectThreshold:  0.6,
 			AgentLimits:                          agent.Limits{ToolRepeatLimit: 13},
+			MultimodalImageSources:               []string{"lark"},
 		},
 		LarkInput{
 			AppID: "cli_xxx",
···
 	}
 	if opts.AgentLimits.ToolRepeatLimit != 13 {
 		t.Fatalf("agent tool repeat limit = %d, want 13", opts.AgentLimits.ToolRepeatLimit)
+	}
+	if opts.FileCacheDir != "/tmp/morph-cache" {
+		t.Fatalf("file cache dir = %q, want %q", opts.FileCacheDir, "/tmp/morph-cache")
+	}
+	if !opts.ImageRecognitionEnabled {
+		t.Fatalf("ImageRecognitionEnabled = false, want true when lark is in sources")
 	}
 }

+162
internal/channelruntime/imageinput/message.go
···
+package imageinput
+
+import (
+	"encoding/base64"
+	"fmt"
+	"log/slog"
+	"os"
+	"path/filepath"
+	"strings"
+
+	"github.com/quailyquaily/mistermorph/llm"
+)
+
+type TranscodeFunc func(raw []byte, mimeType string) ([]byte, string, error)
+
+type MessageOptions struct {
+	MaxImages int
+	MaxBytes  int64
+	Logger    *slog.Logger
+	LogPrefix string
+	Transcode TranscodeFunc
+}
+
+func BuildUserMessage(content string, model string, imagePaths []string, opts MessageOptions) (llm.Message, error) {
+	msg := llm.Message{Role: "user", Content: content}
+	if !llm.ModelSupportsImageParts(model) || len(imagePaths) == 0 || opts.MaxImages <= 0 || opts.MaxBytes <= 0 {
+		return msg, nil
+	}
+
+	parts := make([]llm.Part, 0, 1+minInt(len(imagePaths), opts.MaxImages))
+	if strings.TrimSpace(content) != "" {
+		parts = append(parts, llm.Part{Type: llm.PartTypeText, Text: content})
+	}
+
+	seen := make(map[string]bool, len(imagePaths))
+	imageCount := 0
+	for _, rawPath := range imagePaths {
+		if imageCount >= opts.MaxImages {
+			break
+		}
+		path := strings.TrimSpace(rawPath)
+		if path == "" || seen[path] {
+			continue
+		}
+		seen[path] = true
+
+		info, err := os.Stat(path)
+		if err != nil {
+			logWarn(opts, "image_part_skip", "path", path, "error", err.Error())
+			continue
+		}
+		if info.Size() <= 0 {
+			continue
+		}
+		if info.Size() > opts.MaxBytes {
+			// Message text means "image too large".
+			return llm.Message{}, fmt.Errorf("图片太大: %s (%d bytes > %d bytes)", filepath.Base(path), info.Size(), opts.MaxBytes)
+		}
+
+		raw, err := os.ReadFile(path)
+		if err != nil {
+			logWarn(opts, "image_part_read_error", "path", path, "error", err.Error())
+			continue
+		}
+		mimeType := MIMETypeFromPath(path)
+		if !SupportedUploadMIME(mimeType) {
+			logWarn(opts, "image_part_skip_unsupported_format", "path", path, "mime_type", mimeType)
+			continue
+		}
+		if opts.Transcode != nil {
+			transcodedRaw, transcodedMIME, transcodeErr := opts.Transcode(raw, mimeType)
+			if transcodeErr != nil {
+				// Message text means "image conversion failed".
+				return llm.Message{}, fmt.Errorf("图片转换失败: %s: %w", filepath.Base(path), transcodeErr)
+			}
+			raw = transcodedRaw
+			mimeType = strings.TrimSpace(strings.ToLower(transcodedMIME))
+			if !SupportedUploadMIME(mimeType) {
+				// Message text means "format not supported after image conversion".
+				return llm.Message{}, fmt.Errorf("图片转换后格式不支持: %s (%s)", filepath.Base(path), mimeType)
+			}
+		}
+
+		parts = append(parts, llm.Part{
+			Type:       llm.PartTypeImageBase64,
+			MIMEType:   mimeType,
+			DataBase64: base64.StdEncoding.EncodeToString(raw),
+		})
+		imageCount++
+	}
+	if imageCount == 0 {
+		return msg, nil
+	}
+	msg.Parts = parts
+	return msg, nil
+}
+
+func MIMETypeFromPath(path string) string {
+	switch strings.ToLower(strings.TrimSpace(filepath.Ext(path))) {
+	case ".jpg", ".jpeg":
+		return "image/jpeg"
+	case ".png":
+		return "image/png"
+	case ".webp":
+		return "image/webp"
+	case ".gif":
+		return "image/gif"
+	case ".bmp":
+		return "image/bmp"
+	case ".heic":
+		return "image/heic"
+	case ".heif":
+		return "image/heif"
+	default:
+		return ""
+	}
+}
+
+func NormalizeMIMEType(mimeType string) string {
+	mimeType = strings.TrimSpace(strings.ToLower(mimeType))
+	if idx := strings.Index(mimeType, ";"); idx >= 0 {
+		mimeType = strings.TrimSpace(mimeType[:idx])
+	}
+	return mimeType
+}
+
+func SupportedUploadMIME(mimeType string) bool {
+	switch NormalizeMIMEType(mimeType) {
+	case "image/jpeg", "image/png", "image/webp":
+		return true
+	default:
+		return false
+	}
+}
+
+func ExtensionForMIMEType(mimeType string) string {
+	switch NormalizeMIMEType(mimeType) {
+	case "image/jpeg":
+		return ".jpg"
+	case "image/png":
+		return ".png"
+	case "image/webp":
+		return ".webp"
+	default:
+		return ""
+	}
+}
+
+func logWarn(opts MessageOptions, suffix string, args ...any) {
+	if opts.Logger == nil {
+		return
+	}
+	prefix := strings.TrimSpace(opts.LogPrefix)
+	if prefix == "" {
+		prefix = "image"
+	}
+	opts.Logger.Warn(prefix+"_"+suffix, args...)
+}
+
+func minInt(a, b int) int {
+	if a < b {
+		return a
+	}
+	return b
+}
+101
internal/channelruntime/imageinput/message_test.go
···
+package imageinput
+
+import (
+	"encoding/base64"
+	"os"
+	"path/filepath"
+	"testing"
+
+	"github.com/quailyquaily/mistermorph/llm"
+)
+
+func TestBuildUserMessageWithImageParts(t *testing.T) {
+	t.Parallel()
+
+	dir := t.TempDir()
+	path := filepath.Join(dir, "image.png")
+	raw := []byte("png-data")
+	if err := os.WriteFile(path, raw, 0o600); err != nil {
+		t.Fatalf("write image: %v", err)
+	}
+
+	msg, err := BuildUserMessage("hello", "gpt-5.2", []string{path}, MessageOptions{
+		MaxImages: 3,
+		MaxBytes:  1024,
+	})
+	if err != nil {
+		t.Fatalf("BuildUserMessage() error = %v", err)
+	}
+	if len(msg.Parts) != 2 {
+		t.Fatalf("parts len = %d, want 2", len(msg.Parts))
+	}
+	if msg.Parts[0].Type != llm.PartTypeText || msg.Parts[0].Text != "hello" {
+		t.Fatalf("text part mismatch: %+v", msg.Parts[0])
+	}
+	if msg.Parts[1].Type != llm.PartTypeImageBase64 {
+		t.Fatalf("image part type = %q, want %q", msg.Parts[1].Type, llm.PartTypeImageBase64)
+	}
+	if msg.Parts[1].MIMEType != "image/png" {
+		t.Fatalf("image MIME = %q, want image/png", msg.Parts[1].MIMEType)
+	}
+	if msg.Parts[1].DataBase64 != base64.StdEncoding.EncodeToString(raw) {
+		t.Fatalf("image data mismatch")
+	}
+}
+
+func TestBuildUserMessageSkipsUnknownTypes(t *testing.T) {
+	t.Parallel()
+
+	dir := t.TempDir()
+	path := filepath.Join(dir, "image.bin")
+	if err := os.WriteFile(path, []byte("not-image"), 0o600); err != nil {
+		t.Fatalf("write file: %v", err)
+	}
+
+	msg, err := BuildUserMessage("hello", "gpt-5.2", []string{path}, MessageOptions{
+		MaxImages: 3,
+		MaxBytes:  1024,
+	})
+	if err != nil {
+		t.Fatalf("BuildUserMessage() error = %v", err)
+	}
+	if len(msg.Parts) != 0 {
+		t.Fatalf("parts len = %d, want 0", len(msg.Parts))
+	}
+	if msg.Content != "hello" {
+		t.Fatalf("content = %q, want hello", msg.Content)
+	}
+}
+
+func TestBuildUserMessageTranscode(t *testing.T) {
+	t.Parallel()
+
+	dir := t.TempDir()
+	path := filepath.Join(dir, "image.jpg")
+	if err := os.WriteFile(path, []byte("jpg-data"), 0o600); err != nil {
+		t.Fatalf("write image: %v", err)
+	}
+
+	msg, err := BuildUserMessage("hello", "gpt-5.2", []string{path}, MessageOptions{
+		MaxImages: 3,
+		MaxBytes:  1024,
+		Transcode: func(raw []byte, mimeType string) ([]byte, string, error) {
+			if mimeType != "image/jpeg" {
+				t.Fatalf("transcode MIME = %q, want image/jpeg", mimeType)
+			}
+			return []byte("webp-data"), "image/webp", nil
+		},
+	})
+	if err != nil {
+		t.Fatalf("BuildUserMessage() error = %v", err)
+	}
+	if len(msg.Parts) != 2 {
+		t.Fatalf("parts len = %d, want 2", len(msg.Parts))
+	}
+	if msg.Parts[1].MIMEType != "image/webp" {
+		t.Fatalf("image MIME = %q, want image/webp", msg.Parts[1].MIMEType)
+	}
+	if msg.Parts[1].DataBase64 != base64.StdEncoding.EncodeToString([]byte("webp-data")) {
+		t.Fatalf("image data mismatch")
+	}
+}
+149
internal/channelruntime/lark/images.go
···
+package lark
+
+import (
+	"context"
+	"fmt"
+	"net/http"
+	"os"
+	"path/filepath"
+	"strings"
+	"time"
+
+	"github.com/quailyquaily/mistermorph/internal/channelruntime/imageinput"
+	"github.com/quailyquaily/mistermorph/internal/telegramutil"
+)
+
+const (
+	larkLLMMaxImages     = 3
+	larkLLMMaxImageBytes = int64(5 * 1024 * 1024)
+)
+
+const larkImageRecognitionDisabledPrompt = "User sent an image, but image recognition is disabled in the current Lark runtime. Reply briefly and ask the user to describe the image in text or enable lark in multimodal.image.sources."
+
+func downloadLarkImageToCache(ctx context.Context, api *larkAPI, cacheDir string, messageID string, imageKey string, maxBytes int64) (string, error) {
+	if ctx == nil {
+		ctx = context.Background()
+	}
+	if api == nil {
+		return "", fmt.Errorf("lark api is not initialized")
+	}
+	cacheDir = strings.TrimSpace(cacheDir)
+	if cacheDir == "" {
+		return "", fmt.Errorf("lark image cache dir is required")
+	}
+	messageID = strings.TrimSpace(messageID)
+	if messageID == "" {
+		return "", fmt.Errorf("lark message id is required")
+	}
+	imageKey = strings.TrimSpace(imageKey)
+	if imageKey == "" {
+		return "", fmt.Errorf("lark image key is required")
+	}
+	if maxBytes <= 0 {
+		return "", fmt.Errorf("lark image max bytes must be positive")
+	}
+	if err := telegramutil.EnsureSecureCacheDir(cacheDir); err != nil {
+		return "", err
+	}
+
+	raw, mimeType, err := api.messageResource(ctx, messageID, imageKey, "image", maxBytes)
+	if err != nil {
+		return "", err
+	}
+	mimeType = imageinput.NormalizeMIMEType(mimeType)
+	if !imageinput.SupportedUploadMIME(mimeType) {
+		detected := imageinput.NormalizeMIMEType(http.DetectContentType(raw))
+		if imageinput.SupportedUploadMIME(detected) {
+			mimeType = detected
+		}
+	}
+	if !imageinput.SupportedUploadMIME(mimeType) {
+		return "", fmt.Errorf("lark image format is not supported: %s", mimeType)
+	}
+	ext := imageinput.ExtensionForMIMEType(mimeType)
+	if ext == "" {
+		return "", fmt.Errorf("lark image extension is not supported: %s", mimeType)
+	}
+
+	pattern := "lark_" + sanitizeLarkFileToken(imageKey) + "_*" + ext
+	tmp, err := os.CreateTemp(cacheDir, pattern)
+	if err != nil {
+		return "", err
+	}
+	tmpPath := tmp.Name()
+	if _, err := tmp.Write(raw); err != nil {
+		_ = tmp.Close()
+		_ = os.Remove(tmpPath)
+		return "", err
+	}
+	if err := tmp.Close(); err != nil {
+		_ = os.Remove(tmpPath)
+		return "", err
+	}
+	return tmpPath, nil
+}
+
+func larkImageCacheDir(fileCacheDir string) string {
+	fileCacheDir = strings.TrimSpace(fileCacheDir)
+	if fileCacheDir == "" {
+		return ""
+	}
+	return filepath.Join(fileCacheDir, "lark")
+}
+
+func larkImageFallbackText(text string, imageRecognitionEnabled bool, imageCount int) string {
+	text = strings.TrimSpace(text)
+	if imageCount <= 0 || imageRecognitionEnabled {
+		if text != "" {
+			return text
+		}
+		return "User sent an image."
+	}
+	if text == "" || text == "User sent an image." {
+		return larkImageRecognitionDisabledPrompt
+	}
+	return text + "\n\n" + larkImageRecognitionDisabledPrompt
+}
+
+func appendLarkImageReadFailure(text string) string {
+	text = strings.TrimSpace(text)
+	note := "Image attachment could not be read."
+	if text == "" || text == "User sent an image." {
+		return note
+	}
+	return text + "\n\n" + note
+}
+
+func sanitizeLarkFileToken(raw string) string {
+	raw = strings.TrimSpace(raw)
+	if raw == "" {
+		return "img"
+	}
+	var b strings.Builder
+	for _, r := range raw {
+		switch {
+		case r >= 'a' && r <= 'z':
+			b.WriteRune(r)
+		case r >= 'A' && r <= 'Z':
+			b.WriteRune(r)
+		case r >= '0' && r <= '9':
+			b.WriteRune(r)
+		case r == '_' || r == '-':
+			b.WriteRune(r)
+		default:
+			b.WriteByte('_')
+		}
+	}
+	out := strings.TrimSpace(b.String())
+	if out == "" {
+		return "img"
+	}
+	return out
+}
+
+func larkImageDownloadContext(parent context.Context) (context.Context, context.CancelFunc) {
+	if parent == nil {
+		parent = context.Background()
+	}
+	return context.WithTimeout(parent, 4*time.Second)
+}
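The MIME fallback in `downloadLarkImageToCache` (trust the `Content-Type` header first, then sniff the bytes) can be sketched in isolation. This is a minimal standalone illustration, not the project's code: `normalizeMIME`, `supportedMIME`, and `resolveImageMIME` are hypothetical stand-ins for the `imageinput` helpers, whose bodies this diff does not show, and the supported set is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// supportedMIME is an assumed whitelist; the real list lives in imageinput.
var supportedMIME = map[string]bool{
	"image/png":  true,
	"image/jpeg": true,
	"image/webp": true,
}

// normalizeMIME lowercases the value and drops parameters such as "; charset=...".
func normalizeMIME(mimeType string) string {
	mimeType = strings.ToLower(strings.TrimSpace(mimeType))
	if idx := strings.Index(mimeType, ";"); idx >= 0 {
		mimeType = strings.TrimSpace(mimeType[:idx])
	}
	return mimeType
}

// resolveImageMIME prefers the header value and falls back to sniffing the
// payload with http.DetectContentType when the header is not a supported type.
func resolveImageMIME(header string, raw []byte) (string, bool) {
	mimeType := normalizeMIME(header)
	if supportedMIME[mimeType] {
		return mimeType, true
	}
	detected := normalizeMIME(http.DetectContentType(raw))
	if supportedMIME[detected] {
		return detected, true
	}
	return mimeType, false
}

func main() {
	png := []byte{0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a}
	fmt.Println(resolveImageMIME("application/octet-stream", png))
	fmt.Println(resolveImageMIME("application/octet-stream", []byte("unknown")))
}
```

Sniffing only runs when the header is unusable, so a well-behaved server response is never second-guessed; whether that ordering is the right trade-off for Lark is a judgment the diff itself makes.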
+80
internal/channelruntime/lark/images_test.go
···
+package lark
+
+import (
+	"context"
+	"net/http"
+	"net/http/httptest"
+	"os"
+	"path/filepath"
+	"testing"
+)
+
+func TestDownloadLarkImageToCache(t *testing.T) {
+	t.Parallel()
+
+	raw := []byte{
+		0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a,
+		0x00, 0x00, 0x00, 0x0d,
+	}
+	// The handler runs on a server goroutine, so it reports failures with
+	// t.Errorf; FailNow (and therefore t.Fatalf) may only be called from the
+	// goroutine running the test.
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		switch r.URL.Path {
+		case "/auth/v3/tenant_access_token/internal":
+			w.Header().Set("Content-Type", "application/json")
+			_, _ = w.Write([]byte(`{"code":0,"tenant_access_token":"tenant-token","expire":7200}`))
+		case "/im/v1/messages/om_1001/resources/img_123":
+			if r.URL.Query().Get("type") != "image" {
+				t.Errorf("type query = %q, want image", r.URL.Query().Get("type"))
+			}
+			if r.Header.Get("Authorization") != "Bearer tenant-token" {
+				t.Errorf("authorization = %q, want tenant token", r.Header.Get("Authorization"))
+			}
+			w.Header().Set("Content-Type", "image/png")
+			_, _ = w.Write(raw)
+		default:
+			t.Errorf("unexpected path: %s", r.URL.Path)
+			http.NotFound(w, r)
+		}
+	}))
+	defer srv.Close()
+
+	tokenClient := NewTenantTokenClient(srv.Client(), srv.URL, "app_id", "app_secret")
+	api := newLarkAPI(srv.Client(), srv.URL, tokenClient)
+	path, err := downloadLarkImageToCache(context.Background(), api, t.TempDir(), "om_1001", "img_123", 1024*1024)
+	if err != nil {
+		t.Fatalf("downloadLarkImageToCache() error = %v", err)
+	}
+	if filepath.Ext(path) != ".png" {
+		t.Fatalf("extension = %q, want .png", filepath.Ext(path))
+	}
+	got, err := os.ReadFile(path)
+	if err != nil {
+		t.Fatalf("read file: %v", err)
+	}
+	if string(got) != string(raw) {
+		t.Fatalf("downloaded content mismatch")
+	}
+}
+
+func TestDownloadLarkImageToCacheRejectsUnknownType(t *testing.T) {
+	t.Parallel()
+
+	// As above, the handler uses t.Errorf because it does not run on the test goroutine.
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		switch r.URL.Path {
+		case "/auth/v3/tenant_access_token/internal":
+			w.Header().Set("Content-Type", "application/json")
+			_, _ = w.Write([]byte(`{"code":0,"tenant_access_token":"tenant-token","expire":7200}`))
+		case "/im/v1/messages/om_1001/resources/img_123":
+			w.Header().Set("Content-Type", "application/octet-stream")
+			_, _ = w.Write([]byte("unknown"))
+		default:
+			t.Errorf("unexpected path: %s", r.URL.Path)
+			http.NotFound(w, r)
+		}
+	}))
+	defer srv.Close()
+
+	tokenClient := NewTenantTokenClient(srv.Client(), srv.URL, "app_id", "app_secret")
+	api := newLarkAPI(srv.Client(), srv.URL, tokenClient)
+	_, err := downloadLarkImageToCache(context.Background(), api, t.TempDir(), "om_1001", "img_123", 1024*1024)
+	if err == nil {
+		t.Fatalf("downloadLarkImageToCache() expected error")
+	}
+}
+50
internal/channelruntime/lark/lark_api.go
···
 	}
 	return nil
 }
+
+func (api *larkAPI) messageResource(ctx context.Context, messageID, fileKey, fileType string, maxBytes int64) ([]byte, string, error) {
+	if api == nil {
+		return nil, "", fmt.Errorf("lark api is not initialized")
+	}
+	if api.tokenClient == nil {
+		return nil, "", fmt.Errorf("lark token client is not initialized")
+	}
+	messageID = strings.TrimSpace(messageID)
+	fileKey = strings.TrimSpace(fileKey)
+	fileType = strings.TrimSpace(fileType)
+	if messageID == "" {
+		return nil, "", fmt.Errorf("lark message id is required")
+	}
+	if fileKey == "" {
+		return nil, "", fmt.Errorf("lark file key is required")
+	}
+	if fileType == "" {
+		return nil, "", fmt.Errorf("lark file type is required")
+	}
+	if maxBytes <= 0 {
+		return nil, "", fmt.Errorf("lark max bytes must be positive")
+	}
+	token, err := api.tokenClient.Token(ctx)
+	if err != nil {
+		return nil, "", err
+	}
+	endpoint := api.baseURL + "/im/v1/messages/" + url.PathEscape(messageID) + "/resources/" + url.PathEscape(fileKey) + "?type=" + url.QueryEscape(fileType)
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, endpoint, nil)
+	if err != nil {
+		return nil, "", err
+	}
+	req.Header.Set("Authorization", "Bearer "+token)
+	resp, err := api.http.Do(req)
+	if err != nil {
+		return nil, "", err
+	}
+	defer resp.Body.Close()
+	raw, readErr := io.ReadAll(io.LimitReader(resp.Body, maxBytes+1))
+	if readErr != nil {
+		return nil, "", readErr
+	}
+	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+		return nil, "", fmt.Errorf("lark resource download http %d: %s", resp.StatusCode, strings.TrimSpace(string(raw)))
+	}
+	if int64(len(raw)) > maxBytes {
+		return nil, "", fmt.Errorf("lark resource too large: > %d bytes", maxBytes)
+	}
+	return raw, resp.Header.Get("Content-Type"), nil
+}
+2
internal/channelruntime/lark/run.go
···
 	AddressingInterjectThreshold float64
 	TaskTimeout                  time.Duration
 	MaxConcurrency               int
+	FileCacheDir                 string
 	ServerListen                 string
 	ServerAuthToken              string
 	ServerMaxQueue               int
···
 	MemoryShortTermDays          int
 	MemoryInjectionEnabled       bool
 	MemoryInjectionMaxItems      int
+	ImageRecognitionEnabled      bool
 	Hooks                        Hooks
 	InspectPrompt                bool
 	InspectRequest               bool
+5
internal/channelruntime/lark/runtime.go
···
 		MemoryEnabled:           opts.MemoryEnabled,
 		MemoryInjectionEnabled:  opts.MemoryInjectionEnabled,
 		MemoryInjectionMaxItems: opts.MemoryInjectionMaxItems,
+		ImageRecognitionEnabled: opts.ImageRecognitionEnabled,
 		MemoryOrchestrator:      memRuntime.Orchestrator,
 		MemoryProjectionWorker:  memRuntime.ProjectionWorker,
 	}
···
 			FromUserID:   inbound.FromUserID,
 			DisplayName:  inbound.DisplayName,
 			Text:         text,
+			ImagePaths:   append([]string(nil), inbound.ImagePaths...),
 			WorkspaceDir: workspaceDir,
 			SentAt:       inbound.SentAt,
 			Version:      version,
···
 		EncryptKey:       strings.TrimSpace(opts.EncryptKey),
 		Inbound:          larkInboundAdapter,
 		AllowedChats:     allowedChats,
+		API:              api,
+		FileCacheDir:     larkImageCacheDir(opts.FileCacheDir),
+		ImageRecognition: opts.ImageRecognitionEnabled,
 		Logger:           logger,
 	}))
 	webhookServer := &http.Server{
+10
internal/channelruntime/lark/runtime_options.go
···
 	"time"

 	"github.com/quailyquaily/mistermorph/agent"
+	"github.com/quailyquaily/mistermorph/internal/pathutil"
 )

 type runtimeLoopOptions struct {
···
 	AddressingInterjectThreshold float64
 	TaskTimeout                  time.Duration
 	MaxConcurrency               int
+	FileCacheDir                 string
 	ServerListen                 string
 	ServerAuthToken              string
 	ServerMaxQueue               int
···
 	MemoryShortTermDays          int
 	MemoryInjectionEnabled       bool
 	MemoryInjectionMaxItems      int
+	ImageRecognitionEnabled      bool
 	Hooks                        Hooks
 	InspectPrompt                bool
 	InspectRequest               bool
···
 		AddressingInterjectThreshold: opts.AddressingInterjectThreshold,
 		TaskTimeout:                  opts.TaskTimeout,
 		MaxConcurrency:               opts.MaxConcurrency,
+		FileCacheDir:                 strings.TrimSpace(opts.FileCacheDir),
 		ServerListen:                 strings.TrimSpace(opts.ServerListen),
 		ServerAuthToken:              strings.TrimSpace(opts.ServerAuthToken),
 		ServerMaxQueue:               opts.ServerMaxQueue,
···
 		MemoryShortTermDays:          opts.MemoryShortTermDays,
 		MemoryInjectionEnabled:       opts.MemoryInjectionEnabled,
 		MemoryInjectionMaxItems:      opts.MemoryInjectionMaxItems,
+		ImageRecognitionEnabled:      opts.ImageRecognitionEnabled,
 		Hooks:                        opts.Hooks,
 		InspectPrompt:                opts.InspectPrompt,
 		InspectRequest:               opts.InspectRequest,
···
 	opts.AppSecret = strings.TrimSpace(opts.AppSecret)
 	opts.AllowedChatIDs = normalizeRunStringSlice(opts.AllowedChatIDs)
 	opts.GroupTriggerMode = strings.ToLower(strings.TrimSpace(opts.GroupTriggerMode))
+	opts.FileCacheDir = strings.TrimSpace(opts.FileCacheDir)
 	opts.ServerListen = strings.TrimSpace(opts.ServerListen)
 	opts.ServerAuthToken = strings.TrimSpace(opts.ServerAuthToken)
 	opts.BaseURL = strings.TrimSpace(opts.BaseURL)
···
 	if opts.BaseURL == "" {
 		opts.BaseURL = defaultLarkBaseURL
 	}
+	if opts.FileCacheDir == "" {
+		opts.FileCacheDir = "~/.cache/morph"
+	}
+	opts.FileCacheDir = pathutil.ExpandHomePath(opts.FileCacheDir)
 	if opts.ServerListen == "" {
 		opts.ServerListen = "127.0.0.1:8787"
 	}
+40 -4
internal/channelruntime/lark/runtime_task.go
···
 import (
 	"context"
 	"fmt"
+	"log/slog"
 	"strings"
 	"time"

···
 	"github.com/quailyquaily/mistermorph/agent"
 	busruntime "github.com/quailyquaily/mistermorph/internal/bus"
 	larkbus "github.com/quailyquaily/mistermorph/internal/bus/adapters/lark"
+	"github.com/quailyquaily/mistermorph/internal/channelruntime/imageinput"
 	"github.com/quailyquaily/mistermorph/internal/channelruntime/taskruntime"
 	"github.com/quailyquaily/mistermorph/internal/chathistory"
 	"github.com/quailyquaily/mistermorph/internal/idempotency"
···
 	MemoryEnabled           bool
 	MemoryInjectionEnabled  bool
 	MemoryInjectionMaxItems int
+	ImageRecognitionEnabled bool
 	MemoryOrchestrator      *memoryruntime.Orchestrator
 	MemoryProjectionWorker  *memoryruntime.ProjectionWorker
 }
···
 	FromUserID   string
 	DisplayName  string
 	Text         string
+	ImagePaths   []string
 	WorkspaceDir string
 	SentAt       time.Time
 	Version      uint64
···
 	ctx = llmstats.WithMetadata(ctx, job.TaskID, job.EventID)
 	ctx = pathroots.WithWorkspaceDir(ctx, job.WorkspaceDir)
 	ctx = builtin.WithContactsSendRuntimeContext(ctx, contactsSendRuntimeContextForLark(job))
+	logger := rt.Logger
 	task := strings.TrimSpace(job.Text)
 	if task == "" {
 		return nil, nil, nil, fmt.Errorf("empty lark task")
 	}
-	historyMsg, currentMsg, err := buildLarkPromptMessages(history, job)
+	mainRoute, err := rt.ResolveMainRouteForRun()
+	if err != nil {
+		return nil, nil, nil, err
+	}
+	mainModel := strings.TrimSpace(mainRoute.ClientConfig.Model)
+	historyMsg, currentMsg, err := buildLarkPromptMessages(history, job, mainModel, runtimeOpts.ImageRecognitionEnabled, logger)
 	if err != nil {
 		return nil, nil, nil, err
 	}
···
 	}
 	result, err := rt.Run(ctx, taskruntime.RunRequest{
 		Task:    task,
+		Model:   mainModel,
 		Scene:   "lark.loop",
 		History: llmHistory,
 		Meta:    meta,
···
 	return result.Final, result.Context, result.LoadedSkills, nil
 }

-func buildLarkPromptMessages(history []chathistory.ChatHistoryItem, job larkJob) (*llm.Message, *llm.Message, error) {
+func buildLarkPromptMessages(history []chathistory.ChatHistoryItem, job larkJob, model string, imageRecognitionEnabled bool, logger *slog.Logger) (*llm.Message, *llm.Message, error) {
 	historyRaw, err := chathistory.RenderHistoryContext(chathistory.ChannelLark, history)
 	if err != nil {
 		return nil, nil, fmt.Errorf("render lark history context: %w", err)
···
 	if err != nil {
 		return nil, nil, fmt.Errorf("render lark current message: %w", err)
 	}
-	current := llm.Message{Role: "user", Content: currentRaw}
+	imagePaths := append([]string(nil), job.ImagePaths...)
+	if !imageRecognitionEnabled {
+		imagePaths = nil
+	}
+	current, err := imageinput.BuildUserMessage(currentRaw, model, imagePaths, imageinput.MessageOptions{
+		MaxImages: larkLLMMaxImages,
+		MaxBytes:  larkLLMMaxImageBytes,
+		Logger:    logger,
+		LogPrefix: "lark",
+	})
+	if err != nil {
+		return nil, nil, err
+	}
 	return historyMsg, &current, nil
 }

···
 		ReplyToMessageID: strings.TrimSpace(job.MessageID),
 		SentAt:           job.SentAt.UTC(),
 		Sender:           larkSenderFromJob(job, false),
-		Text:             strings.TrimSpace(job.Text),
+		Text:             larkHistoryText(job.Text, len(job.ImagePaths)),
+	}
+}
+
+func larkHistoryText(text string, imageCount int) string {
+	text = strings.TrimSpace(text)
+	if imageCount <= 0 {
+		return text
 	}
+	marker := fmt.Sprintf("[image attachments: %d]", imageCount)
+	if text == "" {
+		return marker
+	}
+	return text + "\n" + marker
 }

 func larkJobFromInbound(inbound larkbus.InboundMessage) larkJob {
···
 		FromUserID:   strings.TrimSpace(inbound.FromUserID),
 		DisplayName:  strings.TrimSpace(inbound.DisplayName),
 		Text:         strings.TrimSpace(inbound.Text),
+		ImagePaths:   append([]string(nil), inbound.ImagePaths...),
 		SentAt:       inbound.SentAt.UTC(),
 		MentionUsers: append([]string(nil), inbound.MentionUsers...),
 		EventID:      strings.TrimSpace(inbound.EventID),
+40 -2
internal/channelruntime/lark/runtime_task_test.go
···
 package lark

 import (
+	"os"
+	"path/filepath"
 	"strings"
 	"testing"
 	"time"
···
 		DisplayName: "Alice",
 		Text:        "latest",
 		SentAt:      time.Date(2026, 3, 8, 9, 2, 0, 0, time.UTC),
-	})
+	}, "gpt-5.2", true, nil)
 	if err != nil {
 		t.Fatalf("buildLarkPromptMessages() error = %v", err)
 	}
···
 		DisplayName: "Alice",
 		Text:        "latest",
 		SentAt:      time.Date(2026, 3, 8, 9, 2, 0, 0, time.UTC),
-	})
+	}, "gpt-5.2", false, nil)
 	if err != nil {
 		t.Fatalf("buildLarkPromptMessages() error = %v", err)
 	}
···
 		t.Fatalf("current message should still be present: %#v", currentMsg)
 	}
 }
+
+func TestBuildLarkPromptMessagesWithImageParts(t *testing.T) {
+	t.Parallel()
+
+	dir := t.TempDir()
+	path := filepath.Join(dir, "image.png")
+	if err := os.WriteFile(path, []byte("png-data"), 0o600); err != nil {
+		t.Fatalf("write image: %v", err)
+	}
+
+	historyMsg, currentMsg, err := buildLarkPromptMessages(nil, larkJob{
+		ChatID:      "oc_123",
+		ChatType:    "group",
+		MessageID:   "102",
+		FromUserID:  "ou_123",
+		DisplayName: "Alice",
+		Text:        "latest",
+		ImagePaths:  []string{path},
+		SentAt:      time.Date(2026, 3, 8, 9, 2, 0, 0, time.UTC),
+	}, "gpt-5.2", true, nil)
+	if err != nil {
+		t.Fatalf("buildLarkPromptMessages() error = %v", err)
+	}
+	if historyMsg != nil {
+		t.Fatalf("historyMsg should be nil")
+	}
+	if currentMsg == nil {
+		t.Fatalf("currentMsg = nil")
+	}
+	if len(currentMsg.Parts) != 2 {
+		t.Fatalf("current parts len = %d, want 2", len(currentMsg.Parts))
+	}
+	if currentMsg.Parts[1].MIMEType != "image/png" {
+		t.Fatalf("image MIME = %q, want image/png", currentMsg.Parts[1].MIMEType)
+	}
+}
+96 -35
internal/channelruntime/lark/webhook.go
···
 	EncryptKey        string
 	Inbound           *larkbus.InboundAdapter
 	AllowedChats      map[string]bool
+	API               *larkAPI
+	FileCacheDir      string
+	ImageRecognition  bool
 	Logger            *slog.Logger
 }

···
 	Text string `json:"text,omitempty"`
 }

+type larkImageContent struct {
+	ImageKey string `json:"image_key,omitempty"`
+	Text     string `json:"text,omitempty"`
+}
+
+type larkParsedInboundMessage struct {
+	Message   larkbus.InboundMessage
+	ImageKeys []string
+}
+
 func newLarkWebhookHandler(opts larkWebhookHandlerOptions) http.Handler {
 	verificationToken := strings.TrimSpace(opts.VerificationToken)
 	encryptKey := strings.TrimSpace(opts.EncryptKey)
···
 			http.Error(w, "invalid json", http.StatusBadRequest)
 			return
 		}
-		inbound, ok, normalizeErr := inboundMessageFromWebhookEvent(payload, allowedChats)
+		parsed, ok, normalizeErr := inboundMessageFromWebhookEvent(payload, allowedChats)
 		if normalizeErr != nil {
 			logLarkWebhookWarn(opts.Logger, "lark_webhook_event_invalid",
 				"event_id", strings.TrimSpace(payload.Header.GetEventID()),
···
 			return
 		}
 		if ok {
+			inbound := parsed.Message
+			if len(parsed.ImageKeys) > 0 {
+				inbound.Text = larkImageFallbackText(inbound.Text, opts.ImageRecognition, len(parsed.ImageKeys))
+				if opts.ImageRecognition {
+					for _, imageKey := range parsed.ImageKeys {
+						if len(inbound.ImagePaths) >= larkLLMMaxImages {
+							break
+						}
+						imageCtx, cancelImage := larkImageDownloadContext(r.Context())
+						path, imageErr := downloadLarkImageToCache(imageCtx, opts.API, strings.TrimSpace(opts.FileCacheDir), inbound.MessageID, imageKey, larkLLMMaxImageBytes)
+						cancelImage()
+						if imageErr != nil {
+							logLarkWebhookWarn(opts.Logger, "lark_image_download_failed",
+								"event_id", strings.TrimSpace(payload.Header.GetEventID()),
+								"chat_id", strings.TrimSpace(inbound.ChatID),
+								"message_id", strings.TrimSpace(inbound.MessageID),
+								"image_key", strings.TrimSpace(imageKey),
+								"error", imageErr.Error(),
+							)
+							continue
+						}
+						inbound.ImagePaths = append(inbound.ImagePaths, path)
+					}
+					if len(inbound.ImagePaths) == 0 {
+						inbound.Text = appendLarkImageReadFailure(inbound.Text)
+					}
+				}
+			}
 			accepted, publishErr := opts.Inbound.HandleInboundMessage(r.Context(), inbound)
 			if publishErr != nil {
 				logLarkWebhookWarn(opts.Logger, "lark_webhook_publish_error",
···
 	return nil
 }

-func inboundMessageFromWebhookEvent(payload larkWebhookEnvelope, allowedChats map[string]bool) (larkbus.InboundMessage, bool, error) {
+func inboundMessageFromWebhookEvent(payload larkWebhookEnvelope, allowedChats map[string]bool) (larkParsedInboundMessage, bool, error) {
 	if payload.Event == nil {
-		return larkbus.InboundMessage{}, false, nil
+		return larkParsedInboundMessage{}, false, nil
 	}
 	event := payload.Event
 	if !strings.EqualFold(strings.TrimSpace(event.Sender.SenderType), "user") {
-		return larkbus.InboundMessage{}, false, nil
+		return larkParsedInboundMessage{}, false, nil
 	}
 	chatID := strings.TrimSpace(event.Message.ChatID)
 	if chatID == "" {
-		return larkbus.InboundMessage{}, false, fmt.Errorf("chat_id is required")
+		return larkParsedInboundMessage{}, false, fmt.Errorf("chat_id is required")
 	}
 	if len(allowedChats) > 0 && !allowedChats[chatID] {
-		return larkbus.InboundMessage{}, false, nil
+		return larkParsedInboundMessage{}, false, nil
 	}
 	messageID := strings.TrimSpace(event.Message.MessageID)
 	if messageID == "" {
-		return larkbus.InboundMessage{}, false, fmt.Errorf("message_id is required")
+		return larkParsedInboundMessage{}, false, fmt.Errorf("message_id is required")
 	}
 	chatType, err := normalizeLarkInboundChatType(event.Message.ChatType)
 	if err != nil {
-		return larkbus.InboundMessage{}, false, err
+		return larkParsedInboundMessage{}, false, err
 	}
 	fromUserID := strings.TrimSpace(event.Sender.SenderID.OpenID)
 	if fromUserID == "" {
-		return larkbus.InboundMessage{}, false, fmt.Errorf("from_user_id is required")
+		return larkParsedInboundMessage{}, false, fmt.Errorf("from_user_id is required")
 	}
-	text, ok, err := extractLarkTextContent(event.Message.MessageType, event.Message.Content)
+	text, imageKeys, ok, err := extractLarkMessageContent(event.Message.MessageType, event.Message.Content)
 	if err != nil {
-		return larkbus.InboundMessage{}, false, err
+		return larkParsedInboundMessage{}, false, err
 	}
 	if !ok {
-		return larkbus.InboundMessage{}, false, nil
+		return larkParsedInboundMessage{}, false, nil
 	}
-	return larkbus.InboundMessage{
-		ChatID:       chatID,
-		MessageID:    messageID,
-		SentAt:       parseLarkEventTime(event.Message.CreateTime),
-		ChatType:     chatType,
-		FromUserID:   fromUserID,
-		DisplayName:  "",
-		Text:         text,
-		MentionUsers: collectLarkMentionUsers(event.Message.Mentions),
-		EventID:      strings.TrimSpace(payload.Header.GetEventID()),
+	return larkParsedInboundMessage{
+		Message: larkbus.InboundMessage{
+			ChatID:       chatID,
+			MessageID:    messageID,
+			SentAt:       parseLarkEventTime(event.Message.CreateTime),
+			ChatType:     chatType,
+			FromUserID:   fromUserID,
+			DisplayName:  "",
+			Text:         text,
+			MentionUsers: collectLarkMentionUsers(event.Message.Mentions),
+			EventID:      strings.TrimSpace(payload.Header.GetEventID()),
+		},
+		ImageKeys: imageKeys,
 	}, true, nil
 }

···
 	}
 }

-func extractLarkTextContent(messageType, content string) (string, bool, error) {
-	if !strings.EqualFold(strings.TrimSpace(messageType), "text") {
-		return "", false, nil
-	}
+func extractLarkMessageContent(messageType, content string) (string, []string, bool, error) {
+	messageType = strings.ToLower(strings.TrimSpace(messageType))
 	content = strings.TrimSpace(content)
 	if content == "" {
-		return "", false, nil
+		return "", nil, false, nil
 	}
-	var textContent larkTextContent
-	if err := json.Unmarshal([]byte(content), &textContent); err != nil {
-		return "", false, fmt.Errorf("invalid text content")
-	}
-	text := strings.TrimSpace(textContent.Text)
-	if text == "" {
-		return "", false, nil
+	switch messageType {
+	case "text":
+		var textContent larkTextContent
+		if err := json.Unmarshal([]byte(content), &textContent); err != nil {
+			return "", nil, false, fmt.Errorf("invalid text content")
+		}
+		text := strings.TrimSpace(textContent.Text)
+		if text == "" {
+			return "", nil, false, nil
+		}
+		return text, nil, true, nil
+	case "image":
+		var imageContent larkImageContent
+		if err := json.Unmarshal([]byte(content), &imageContent); err != nil {
+			return "", nil, false, fmt.Errorf("invalid image content")
+		}
+		imageKey := strings.TrimSpace(imageContent.ImageKey)
+		if imageKey == "" {
+			return "", nil, false, nil
+		}
+		text := strings.TrimSpace(imageContent.Text)
+		if text == "" {
+			text = "User sent an image."
+		}
+		return text, []string{imageKey}, true, nil
+	default:
+		return "", nil, false, nil
 	}
-	return text, true, nil
 }

 func collectLarkMentionUsers(items []larkWebhookMentionEvent) []string {
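The `image` branch of `extractLarkMessageContent` treats the message `content` field as a small JSON document carrying an `image_key`. The sketch below isolates that step under stated assumptions: `imageContent` and `imageKeyFromContent` are hypothetical names for illustration only, and the three-way result (key found, nothing to do, malformed input) mirrors how the diff separates "skip" from "error".

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// imageContent mirrors the JSON shape used in the diff's test payload:
// {"image_key":"img_123"}.
type imageContent struct {
	ImageKey string `json:"image_key,omitempty"`
}

// imageKeyFromContent returns the trimmed image key. ok is false when the
// content is empty or carries no key; err is set only for malformed JSON.
func imageKeyFromContent(content string) (key string, ok bool, err error) {
	content = strings.TrimSpace(content)
	if content == "" {
		return "", false, nil
	}
	var parsed imageContent
	if err := json.Unmarshal([]byte(content), &parsed); err != nil {
		return "", false, fmt.Errorf("invalid image content")
	}
	key = strings.TrimSpace(parsed.ImageKey)
	if key == "" {
		return "", false, nil
	}
	return key, true, nil
}

func main() {
	fmt.Println(imageKeyFromContent(`{"image_key":"img_123"}`))
	fmt.Println(imageKeyFromContent(`{}`))
	fmt.Println(imageKeyFromContent(`not json`))
}
```

Keeping "no key" as a non-error lets the webhook acknowledge the event without logging it as invalid, which matches how the text branch handles empty text.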
+41 -1
internal/channelruntime/lark/webhook_test.go
···
 		},
 	}

-	msg, ok, err := inboundMessageFromWebhookEvent(payload, map[string]bool{})
+	parsed, ok, err := inboundMessageFromWebhookEvent(payload, map[string]bool{})
 	if err != nil {
 		t.Fatalf("inboundMessageFromWebhookEvent() error = %v", err)
 	}
 	if !ok {
 		t.Fatalf("inboundMessageFromWebhookEvent() ok=false, want true")
 	}
+	msg := parsed.Message
 	if msg.ChatID != "oc_group123" {
 		t.Fatalf("chat_id mismatch: got %q want %q", msg.ChatID, "oc_group123")
 	}
···
 	wantSentAt := time.UnixMilli(1760000000123).UTC()
 	if !msg.SentAt.Equal(wantSentAt) {
 		t.Fatalf("sent_at = %s, want %s", msg.SentAt.Format(time.RFC3339Nano), wantSentAt.Format(time.RFC3339Nano))
+	}
+}
+
+func TestInboundMessageFromWebhookEventImage(t *testing.T) {
+	t.Parallel()
+
+	payload := larkWebhookEnvelope{
+		Header: &larkWebhookHeader{
+			EventID:   "ev_img",
+			EventType: "im.message.receive_v1",
+		},
+		Event: &larkWebhookEvent{
+			Sender: larkWebhookSender{
+				SenderType: "user",
+				SenderID:   larkWebhookUserID{OpenID: "ou_123"},
+			},
+			Message: larkWebhookMessage{
+				MessageID:   "om_1001",
+				CreateTime:  "1760000000123",
+				ChatID:      "oc_group123",
+				ChatType:    "group",
+				MessageType: "image",
+				Content:     `{"image_key":"img_123"}`,
+			},
+		},
+	}
+
+	parsed, ok, err := inboundMessageFromWebhookEvent(payload, map[string]bool{})
+	if err != nil {
+		t.Fatalf("inboundMessageFromWebhookEvent() error = %v", err)
+	}
+	if !ok {
+		t.Fatalf("inboundMessageFromWebhookEvent() ok=false, want true")
+	}
+	if parsed.Message.Text != "User sent an image." {
+		t.Fatalf("text = %q, want image fallback", parsed.Message.Text)
+	}
+	if len(parsed.ImageKeys) != 1 || parsed.ImageKeys[0] != "img_123" {
+		t.Fatalf("image keys = %#v, want [img_123]", parsed.ImageKeys)
 	}
 }

+10 -114
internal/channelruntime/line/images.go
@@ -2,13 +2,13 @@
 
 import (
 	"context"
-	"encoding/base64"
 	"fmt"
 	"log/slog"
 	"os"
 	"path/filepath"
 	"strings"
 
+	"github.com/quailyquaily/mistermorph/internal/channelruntime/imageinput"
 	"github.com/quailyquaily/mistermorph/internal/telegramutil"
 	"github.com/quailyquaily/mistermorph/llm"
 )
@@ -18,73 +18,13 @@
 	lineLLMMaxImageBytes = int64(5 * 1024 * 1024)
 )
 
-func buildLineHistoryMessage(content string, model string, imagePaths []string, logger *slog.Logger) (llm.Message, error) {
-	return buildLineCurrentMessage(content, model, imagePaths, logger)
-}
-
 func buildLineCurrentMessage(content string, model string, imagePaths []string, logger *slog.Logger) (llm.Message, error) {
-	msg := llm.Message{Role: "user", Content: content}
-	if !llm.ModelSupportsImageParts(model) || len(imagePaths) == 0 {
-		return msg, nil
-	}
-	parts := make([]llm.Part, 0, 1+min(len(imagePaths), lineLLMMaxImages))
-	if strings.TrimSpace(content) != "" {
-		parts = append(parts, llm.Part{Type: llm.PartTypeText, Text: content})
-	}
-
-	seen := make(map[string]bool, len(imagePaths))
-	imageCount := 0
-	for _, rawPath := range imagePaths {
-		if imageCount >= lineLLMMaxImages {
-			break
-		}
-		path := strings.TrimSpace(rawPath)
-		if path == "" || seen[path] {
-			continue
-		}
-		seen[path] = true
-
-		info, err := os.Stat(path)
-		if err != nil {
-			if logger != nil {
-				logger.Warn("line_image_part_skip", "path", path, "error", err.Error())
-			}
-			continue
-		}
-		if info.Size() <= 0 {
-			continue
-		}
-		if info.Size() > lineLLMMaxImageBytes {
-			return llm.Message{}, fmt.Errorf("图片太大: %s (%d bytes > %d bytes)", filepath.Base(path), info.Size(), lineLLMMaxImageBytes)
-		}
-
-		raw, err := os.ReadFile(path)
-		if err != nil {
-			if logger != nil {
-				logger.Warn("line_image_part_read_error", "path", path, "error", err.Error())
-			}
-			continue
-		}
-		mimeType := lineImageMIMEType(path)
-		if !isLineSupportedUploadImageMIME(mimeType) {
-			if logger != nil {
-				logger.Warn("line_image_part_skip_unsupported_format", "path", path, "mime_type", mimeType)
-			}
-			continue
-		}
-
-		parts = append(parts, llm.Part{
-			Type:       llm.PartTypeImageBase64,
-			MIMEType:   mimeType,
-			DataBase64: base64.StdEncoding.EncodeToString(raw),
-		})
-		imageCount++
-	}
-	if imageCount == 0 {
-		return msg, nil
-	}
-	msg.Parts = parts
-	return msg, nil
+	return imageinput.BuildUserMessage(content, model, imagePaths, imageinput.MessageOptions{
+		MaxImages: lineLLMMaxImages,
+		MaxBytes:  lineLLMMaxImageBytes,
+		Logger:    logger,
+		LogPrefix: "line",
+	})
 }
 
 func downloadLineImageToCache(ctx context.Context, api *lineAPI, cacheDir string, messageID string, maxBytes int64) (string, error) {
@@ -113,11 +53,11 @@
 	if err != nil {
 		return "", err
 	}
-	mimeType = lineNormalizeMIMEType(mimeType)
-	if !isLineSupportedUploadImageMIME(mimeType) {
+	mimeType = imageinput.NormalizeMIMEType(mimeType)
+	if !imageinput.SupportedUploadMIME(mimeType) {
 		return "", fmt.Errorf("line image format is not supported: %s", mimeType)
 	}
-	ext := lineImageExtFromMIMEType(mimeType)
+	ext := imageinput.ExtensionForMIMEType(mimeType)
 	if ext == "" {
 		return "", fmt.Errorf("line image extension is not supported: %s", mimeType)
 	}
@@ -162,50 +102,6 @@
 		return fmt.Errorf("child dir is not under parent dir: %s", childAbs)
 	}
 	return telegramutil.EnsureSecureCacheDir(childAbs)
-}
-
-func lineNormalizeMIMEType(mimeType string) string {
-	mimeType = strings.TrimSpace(strings.ToLower(mimeType))
-	if idx := strings.Index(mimeType, ";"); idx >= 0 {
-		mimeType = strings.TrimSpace(mimeType[:idx])
-	}
-	return mimeType
-}
-
-func lineImageExtFromMIMEType(mimeType string) string {
-	switch lineNormalizeMIMEType(mimeType) {
-	case "image/jpeg":
-		return ".jpg"
-	case "image/png":
-		return ".png"
-	case "image/webp":
-		return ".webp"
-	default:
-		return ""
-	}
-}
-
-func lineImageMIMEType(path string) string {
-	ext := strings.ToLower(strings.TrimSpace(filepath.Ext(path)))
-	switch ext {
-	case ".jpg", ".jpeg":
-		return "image/jpeg"
-	case ".png":
-		return "image/png"
-	case ".webp":
-		return "image/webp"
-	default:
-		return ""
-	}
-}
-
-func isLineSupportedUploadImageMIME(mimeType string) bool {
-	switch lineNormalizeMIMEType(mimeType) {
-	case "image/jpeg", "image/png", "image/webp":
-		return true
-	default:
-		return false
-	}
 }
 
 func sanitizeLineFileToken(raw string) string {
+30 -9
internal/channelruntime/line/images_test.go
@@ -27,7 +27,7 @@
 	0x42, 0x60, 0x82,
 }
 
-func TestBuildLineHistoryMessageWithImageParts(t *testing.T) {
+func TestBuildLineCurrentMessageWithImageParts(t *testing.T) {
 	t.Parallel()
 
 	dir := t.TempDir()
@@ -36,9 +36,9 @@
 		t.Fatalf("write image: %v", err)
 	}
 
-	msg, err := buildLineHistoryMessage("hello", "gpt-5.2", []string{path}, nil)
+	msg, err := buildLineCurrentMessage("hello", "gpt-5.2", []string{path}, nil)
 	if err != nil {
-		t.Fatalf("buildLineHistoryMessage() error = %v", err)
+		t.Fatalf("buildLineCurrentMessage() error = %v", err)
 	}
 	if len(msg.Parts) != 2 {
 		t.Fatalf("parts len = %d, want 2", len(msg.Parts))
@@ -61,12 +61,12 @@
 	}
 }
 
-func TestBuildLineHistoryMessageUnsupportedModel(t *testing.T) {
+func TestBuildLineCurrentMessageUnsupportedModel(t *testing.T) {
 	t.Parallel()
 
-	msg, err := buildLineHistoryMessage("hello", "text-only-model", []string{"/tmp/x.png"}, nil)
+	msg, err := buildLineCurrentMessage("hello", "text-only-model", []string{"/tmp/x.png"}, nil)
 	if err != nil {
-		t.Fatalf("buildLineHistoryMessage() error = %v", err)
+		t.Fatalf("buildLineCurrentMessage() error = %v", err)
 	}
 	if len(msg.Parts) != 0 {
 		t.Fatalf("parts len = %d, want 0", len(msg.Parts))
@@ -151,7 +151,7 @@
 	}
 }
 
-func TestBuildLineHistoryMessageImageTooLarge(t *testing.T) {
+func TestBuildLineCurrentMessageImageTooLarge(t *testing.T) {
 	t.Parallel()
 
 	dir := t.TempDir()
@@ -161,12 +161,33 @@
 		t.Fatalf("write image: %v", err)
 	}
 
-	_, err := buildLineHistoryMessage("hello", "gpt-5.2", []string{path}, nil)
+	_, err := buildLineCurrentMessage("hello", "gpt-5.2", []string{path}, nil)
 	if err == nil {
-		t.Fatalf("buildLineHistoryMessage() expected error")
+		t.Fatalf("buildLineCurrentMessage() expected error")
 	}
 	if !strings.Contains(err.Error(), "图片太大") {
 		t.Fatalf("error = %v, want 图片太大", err)
+	}
+}
+
+func TestBuildLineCurrentMessageSkipsUnknownFileTypes(t *testing.T) {
+	t.Parallel()
+
+	dir := t.TempDir()
+	path := filepath.Join(dir, "image.bin")
+	if err := os.WriteFile(path, []byte("not-an-image"), 0o600); err != nil {
+		t.Fatalf("write file: %v", err)
+	}
+
+	msg, err := buildLineCurrentMessage("hello", "gpt-5.2", []string{path}, nil)
+	if err != nil {
+		t.Fatalf("buildLineCurrentMessage() error = %v", err)
+	}
+	if len(msg.Parts) != 0 {
+		t.Fatalf("parts len = %d, want 0", len(msg.Parts))
+	}
+	if msg.Content != "hello" {
+		t.Fatalf("content = %q, want hello", msg.Content)
 	}
 }
 
+1 -4
internal/channelruntime/line/runtime_task.go
@@ -146,10 +146,7 @@
 	}
 	var historyMsg *llm.Message
 	if strings.TrimSpace(historyRaw) != "" {
-		msg, buildErr := buildLineHistoryMessage(historyRaw, model, nil, logger)
-		if buildErr != nil {
-			return nil, nil, buildErr
-		}
+		msg := llm.Message{Role: "user", Content: historyRaw}
 		historyMsg = &msg
 	}
 
+168
internal/channelruntime/slack/images.go
@@ -0,0 +1,168 @@
+package slack
+
+import (
+	"context"
+	"fmt"
+	"io"
+	"net/http"
+	"os"
+	"path/filepath"
+	"strings"
+	"time"
+
+	"github.com/quailyquaily/mistermorph/internal/channelruntime/imageinput"
+	"github.com/quailyquaily/mistermorph/internal/telegramutil"
+)
+
+const (
+	slackLLMMaxImages     = 3
+	slackLLMMaxImageBytes = int64(5 * 1024 * 1024)
+)
+
+const slackImageRecognitionDisabledPrompt = "User sent an image, but image recognition is disabled in the current Slack runtime. Reply briefly and ask the user to describe the image in text or enable slack in multimodal.image.sources."
+
+func downloadSlackImageToCache(ctx context.Context, api *slackAPI, cacheDir string, file slackEventFile, maxBytes int64) (string, error) {
+	if ctx == nil {
+		ctx = context.Background()
+	}
+	if api == nil || api.http == nil {
+		return "", fmt.Errorf("slack api is not initialized")
+	}
+	cacheDir = strings.TrimSpace(cacheDir)
+	if cacheDir == "" {
+		return "", fmt.Errorf("slack image cache dir is required")
+	}
+	if maxBytes <= 0 {
+		return "", fmt.Errorf("slack image max bytes must be positive")
+	}
+	if file.Size > maxBytes {
+		return "", fmt.Errorf("slack image too large: %d bytes > %d bytes", file.Size, maxBytes)
+	}
+	mimeType := imageinput.NormalizeMIMEType(slackFileMIMEType(file))
+	if !imageinput.SupportedUploadMIME(mimeType) {
+		return "", fmt.Errorf("slack image format is not supported: %s", mimeType)
+	}
+	ext := imageinput.ExtensionForMIMEType(mimeType)
+	if ext == "" {
+		return "", fmt.Errorf("slack image extension is not supported: %s", mimeType)
+	}
+	downloadURL := slackFileDownloadURL(file)
+	if downloadURL == "" {
+		return "", fmt.Errorf("slack image download url is required")
+	}
+	if err := telegramutil.EnsureSecureCacheDir(cacheDir); err != nil {
+		return "", err
+	}
+
+	req, err := http.NewRequestWithContext(ctx, http.MethodGet, downloadURL, nil)
+	if err != nil {
+		return "", err
+	}
+	req.Header.Set("Authorization", "Bearer "+strings.TrimSpace(api.botToken))
+	resp, err := api.http.Do(req)
+	if err != nil {
+		return "", err
+	}
+	defer resp.Body.Close()
+	if resp.StatusCode < 200 || resp.StatusCode >= 300 {
+		raw, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
+		msg := strings.TrimSpace(string(raw))
+		if msg == "" {
+			return "", fmt.Errorf("slack image download http %d", resp.StatusCode)
+		}
+		return "", fmt.Errorf("slack image download http %d: %s", resp.StatusCode, msg)
+	}
+	raw, err := io.ReadAll(io.LimitReader(resp.Body, maxBytes+1))
+	if err != nil {
+		return "", err
+	}
+	if int64(len(raw)) > maxBytes {
+		return "", fmt.Errorf("slack image too large: > %d bytes", maxBytes)
+	}
+
+	token := strings.TrimSpace(file.ID)
+	if token == "" {
+		token = strings.TrimSpace(file.Name)
+	}
+	pattern := "slack_" + sanitizeSlackFileToken(token) + "_*" + ext
+	tmp, err := os.CreateTemp(cacheDir, pattern)
+	if err != nil {
+		return "", err
+	}
+	tmpPath := tmp.Name()
+	if _, err := tmp.Write(raw); err != nil {
+		_ = tmp.Close()
+		_ = os.Remove(tmpPath)
+		return "", err
+	}
+	if err := tmp.Close(); err != nil {
+		_ = os.Remove(tmpPath)
+		return "", err
+	}
+	return tmpPath, nil
+}
+
+func slackImageCacheDir(fileCacheDir string) string {
+	fileCacheDir = strings.TrimSpace(fileCacheDir)
+	if fileCacheDir == "" {
+		return ""
+	}
+	return filepath.Join(fileCacheDir, "slack")
+}
+
+func slackImageFallbackText(text string, imageRecognitionEnabled bool, imageCount int) string {
+	text = strings.TrimSpace(text)
+	if imageCount <= 0 || imageRecognitionEnabled {
+		if text != "" {
+			return text
+		}
+		return "User sent an image."
+	}
+	if text == "" {
+		return slackImageRecognitionDisabledPrompt
+	}
+	return text + "\n\n" + slackImageRecognitionDisabledPrompt
+}
+
+func appendSlackImageReadFailure(text string) string {
+	text = strings.TrimSpace(text)
+	note := "Image attachment could not be read."
+	if text == "" || text == "User sent an image." {
+		return note
+	}
+	return text + "\n\n" + note
+}
+
+func sanitizeSlackFileToken(raw string) string {
+	raw = strings.TrimSpace(raw)
+	if raw == "" {
+		return "img"
+	}
+	var b strings.Builder
+	for _, r := range raw {
+		switch {
+		case r >= 'a' && r <= 'z':
+			b.WriteRune(r)
+		case r >= 'A' && r <= 'Z':
+			b.WriteRune(r)
+		case r >= '0' && r <= '9':
+			b.WriteRune(r)
+		case r == '_' || r == '-':
+			b.WriteRune(r)
+		default:
+			b.WriteByte('_')
+		}
+	}
+	out := strings.TrimSpace(b.String())
+	if out == "" {
+		return "img"
+	}
+	return out
+}
+
+func slackImageDownloadContext(parent context.Context) (context.Context, context.CancelFunc) {
+	if parent == nil {
+		parent = context.Background()
+	}
+	return context.WithTimeout(parent, 20*time.Second)
+}
+60
internal/channelruntime/slack/images_test.go
@@ -0,0 +1,60 @@
+package slack
+
+import (
+	"context"
+	"net/http"
+	"net/http/httptest"
+	"os"
+	"path/filepath"
+	"testing"
+)
+
+func TestDownloadSlackImageToCache(t *testing.T) {
+	t.Parallel()
+
+	raw := []byte("png-data")
+	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+		if r.Header.Get("Authorization") != "Bearer xoxb-token" {
+			t.Fatalf("authorization = %q, want bot token", r.Header.Get("Authorization"))
+		}
+		w.Header().Set("Content-Type", "image/png")
+		_, _ = w.Write(raw)
+	}))
+	defer srv.Close()
+
+	api := newSlackAPI(srv.Client(), "", "xoxb-token", "xapp-token")
+	path, err := downloadSlackImageToCache(context.Background(), api, t.TempDir(), slackEventFile{
+		ID:                 "F111",
+		Mimetype:           "image/png",
+		URLPrivateDownload: srv.URL + "/file",
+		Size:               int64(len(raw)),
+	}, 1024*1024)
+	if err != nil {
+		t.Fatalf("downloadSlackImageToCache() error = %v", err)
+	}
+	if filepath.Ext(path) != ".png" {
+		t.Fatalf("extension = %q, want .png", filepath.Ext(path))
+	}
+	got, err := os.ReadFile(path)
+	if err != nil {
+		t.Fatalf("read file: %v", err)
+	}
+	if string(got) != string(raw) {
+		t.Fatalf("downloaded content mismatch")
+	}
+}
+
+func TestDownloadSlackImageToCacheRejectsUnknownType(t *testing.T) {
+	t.Parallel()
+
+	api := newSlackAPI(http.DefaultClient, "", "xoxb-token", "xapp-token")
+	_, err := downloadSlackImageToCache(context.Background(), api, t.TempDir(), slackEventFile{
+		ID:                 "F111",
+		Mimetype:           "application/octet-stream",
+		URLPrivateDownload: "https://files.slack.test/file",
+		Size:               10,
+	}, 1024*1024)
+	if err == nil {
+		t.Fatalf("downloadSlackImageToCache() expected error")
+	}
+}
+51
internal/channelruntime/slack/runtime.go
@@ -50,6 +50,7 @@
 	MemoryShortTermDays     int
 	MemoryInjectionEnabled  bool
 	MemoryInjectionMaxItems int
+	ImageRecognitionEnabled bool
 	Hooks                   Hooks
 	InspectPrompt           bool
 	InspectRequest          bool
@@ -75,6 +76,7 @@
 	Username     string
 	DisplayName  string
 	Text         string
+	ImagePaths   []string
 	WorkspaceDir string
 	SentAt       time.Time
 	Version      uint64
@@ -301,6 +303,7 @@
 		MemoryEnabled:           opts.MemoryEnabled,
 		MemoryInjectionEnabled:  opts.MemoryInjectionEnabled,
 		MemoryInjectionMaxItems: opts.MemoryInjectionMaxItems,
+		ImageRecognitionEnabled: opts.ImageRecognitionEnabled,
 		MemoryOrchestrator:      memRuntime.Orchestrator,
 		MemoryProjectionWorker:  memRuntime.ProjectionWorker,
 	}
@@ -651,6 +654,7 @@
 			Username:     inbound.Username,
 			DisplayName:  inbound.DisplayName,
 			Text:         text,
+			ImagePaths:   append([]string(nil), inbound.ImagePaths...),
 			WorkspaceDir: workspaceDir,
 			SentAt:       inbound.SentAt,
 			Version:      version,
@@ -774,6 +778,7 @@
 			Text:         event.Text,
 			SentAt:       event.SentAt,
 			MentionUsers: append([]string(nil), event.MentionUsers...),
+			ImagePaths:   append([]string(nil), event.ImagePaths...),
 		}))
 		history[historyScopeKey] = trimChatHistoryItems(cur, slackHistoryCapForMode(groupTriggerMode))
 		mu.Unlock()
@@ -823,6 +828,7 @@
 		}
 		logger.Info("slack_inbound_event",
 			"event_type", event.EventType,
+			"event_subtype", event.EventSubtype,
 			"event_id", event.EventID,
 			"team_id", event.TeamID,
 			"channel_id", event.ChannelID,
@@ -830,6 +836,7 @@
 			"user_id", event.UserID,
 			"message_ts", event.MessageTS,
 			"thread_ts", event.ThreadTS,
+			"image_file_count", len(event.ImageFiles),
 			"is_app_mention", event.IsAppMention,
 			"is_thread_message", event.IsThreadMessage,
 		)
@@ -868,6 +875,7 @@
 		}
 		event.Username = username
 		event.DisplayName = displayName
+		event.Text = slackImageFallbackText(event.Text, taskRuntimeOpts.ImageRecognitionEnabled, len(event.ImageFiles))
 		handledCommand, cmdErr := maybeHandleSlackCommand(context.Background(), d, inprocBus, workspaceStore, conversationKey, event, botUserID)
 		if cmdErr != nil {
 			logger.Warn("slack_command_error",
@@ -940,6 +948,23 @@
 				return nil
 			}
 			if !accepted {
+				logger.Info("slack_group_ignored",
+					"team_id", event.TeamID,
+					"channel_id", event.ChannelID,
+					"message_ts", event.MessageTS,
+					"thread_ts", event.ThreadTS,
+					"text_len", len(event.Text),
+					"image_file_count", len(event.ImageFiles),
+					"llm_attempted", dec.AddressingLLMAttempted,
+					"llm_ok", dec.AddressingLLMOK,
+					"llm_addressed", dec.Addressing.Addressed,
+					"confidence", dec.Addressing.Confidence,
+					"wanna_interject", dec.Addressing.WannaInterject,
+					"interject", dec.Addressing.Interject,
+					"impulse", dec.Addressing.Impulse,
+					"is_lightweight", dec.Addressing.IsLightweight,
+					"reason", dec.Reason,
+				)
 				if strings.EqualFold(groupTriggerMode, "talkative") {
 					appendIgnoredInboundHistory(event)
 				}
@@ -947,6 +972,31 @@
 			}
 			event.ThreadTS = quoteReplyThreadTSForGroupTrigger(event, dec)
 		}
+		if taskRuntimeOpts.ImageRecognitionEnabled && len(event.ImageFiles) > 0 {
+			imageCacheDir := slackImageCacheDir(fileCacheDir)
+			for _, file := range event.ImageFiles {
+				if len(event.ImagePaths) >= slackLLMMaxImages {
+					break
+				}
+				imageCtx, cancelImage := slackImageDownloadContext(context.Background())
+				path, imageErr := downloadSlackImageToCache(imageCtx, api, imageCacheDir, file, slackLLMMaxImageBytes)
+				cancelImage()
+				if imageErr != nil {
+					logger.Warn("slack_image_download_failed",
+						"team_id", event.TeamID,
+						"channel_id", event.ChannelID,
+						"message_ts", event.MessageTS,
+						"file_id", strings.TrimSpace(file.ID),
+						"error", imageErr.Error(),
+					)
+					continue
+				}
+				event.ImagePaths = append(event.ImagePaths, path)
+			}
+			if len(event.ImagePaths) == 0 {
+				event.Text = appendSlackImageReadFailure(event.Text)
+			}
+		}
 
 		accepted, err := slackInboundAdapter.HandleInboundMessage(context.Background(), slackbus.InboundMessage{
 			TeamID:       event.TeamID,
@@ -961,6 +1011,7 @@
 			SentAt:       event.SentAt,
 			MentionUsers: append([]string(nil), event.MentionUsers...),
 			EventID:      event.EventID,
+			ImagePaths:   append([]string(nil), event.ImagePaths...),
 		})
 		if err != nil {
 			logger.Warn("slack_bus_publish_error", "channel_id", event.ChannelID, "message_ts", event.MessageTS, "bus_error_code", busErrorCodeString(err), "error", err.Error())
+2
internal/channelruntime/slack/runtime_options.go
@@ -31,6 +31,7 @@
 	MemoryShortTermDays     int
 	MemoryInjectionEnabled  bool
 	MemoryInjectionMaxItems int
+	ImageRecognitionEnabled bool
 	InspectPrompt           bool
 	InspectRequest          bool
 	TaskStore               daemonruntime.TaskView
@@ -64,6 +65,7 @@
 		MemoryShortTermDays:     opts.MemoryShortTermDays,
 		MemoryInjectionEnabled:  opts.MemoryInjectionEnabled,
 		MemoryInjectionMaxItems: opts.MemoryInjectionMaxItems,
+		ImageRecognitionEnabled: opts.ImageRecognitionEnabled,
 		InspectPrompt:           opts.InspectPrompt,
 		InspectRequest:          opts.InspectRequest,
 		TaskStore:               opts.TaskStore,
+37 -4
internal/channelruntime/slack/runtime_task.go
@@ -3,12 +3,14 @@
 import (
 	"context"
 	"fmt"
+	"log/slog"
 	"strings"
 	"time"
 
 	"github.com/google/uuid"
 	"github.com/quailyquaily/mistermorph/agent"
 	busruntime "github.com/quailyquaily/mistermorph/internal/bus"
+	"github.com/quailyquaily/mistermorph/internal/channelruntime/imageinput"
 	"github.com/quailyquaily/mistermorph/internal/channelruntime/taskruntime"
 	"github.com/quailyquaily/mistermorph/internal/chathistory"
 	"github.com/quailyquaily/mistermorph/internal/idempotency"
@@ -30,6 +32,7 @@
 	MemoryEnabled           bool
 	MemoryInjectionEnabled  bool
 	MemoryInjectionMaxItems int
+	ImageRecognitionEnabled bool
 	MemoryOrchestrator      *memoryruntime.Orchestrator
 	MemoryProjectionWorker  *memoryruntime.ProjectionWorker
 }
@@ -59,7 +62,12 @@
 	if task == "" {
 		return nil, nil, nil, nil, fmt.Errorf("empty slack task")
 	}
-	historyMsg, currentMsg, err := buildSlackPromptMessages(history, job)
+	mainRoute, err := rt.ResolveMainRouteForRun()
+	if err != nil {
+		return nil, nil, nil, nil, err
+	}
+	mainModel := strings.TrimSpace(mainRoute.ClientConfig.Model)
+	historyMsg, currentMsg, err := buildSlackPromptMessages(history, job, mainModel, runtimeOpts.ImageRecognitionEnabled, logger)
 	if err != nil {
 		return nil, nil, nil, nil, err
 	}
@@ -130,6 +138,7 @@
 	}
 	result, err := rt.Run(ctx, taskruntime.RunRequest{
 		Task:    task,
+		Model:   mainModel,
 		Scene:   "slack.loop",
 		History: llmHistory,
 		Meta:    meta,
@@ -165,7 +174,7 @@
 	return result.Final, result.Context, result.LoadedSkills, reaction, nil
 }
 
-func buildSlackPromptMessages(history []chathistory.ChatHistoryItem, job slackJob) (*llm.Message, *llm.Message, error) {
+func buildSlackPromptMessages(history []chathistory.ChatHistoryItem, job slackJob, model string, imageRecognitionEnabled bool, logger *slog.Logger) (*llm.Message, *llm.Message, error) {
 	historyRaw, err := chathistory.RenderHistoryContext(chathistory.ChannelSlack, history)
 	if err != nil {
 		return nil, nil, fmt.Errorf("render slack history context: %w", err)
@@ -179,7 +188,19 @@
 	if err != nil {
 		return nil, nil, fmt.Errorf("render slack current message: %w", err)
 	}
-	current := llm.Message{Role: "user", Content: currentRaw}
+	imagePaths := append([]string(nil), job.ImagePaths...)
+	if !imageRecognitionEnabled {
+		imagePaths = nil
+	}
+	current, err := imageinput.BuildUserMessage(currentRaw, model, imagePaths, imageinput.MessageOptions{
+		MaxImages: slackLLMMaxImages,
+		MaxBytes:  slackLLMMaxImageBytes,
+		Logger:    logger,
+		LogPrefix: "slack",
+	})
+	if err != nil {
+		return nil, nil, err
+	}
 	return historyMsg, &current, nil
 }
 
@@ -237,8 +258,20 @@
 		ReplyToMessageID: strings.TrimSpace(job.ThreadTS),
 		SentAt:           job.SentAt.UTC(),
 		Sender:           slackSenderFromJob(job, false, ""),
-		Text:             strings.TrimSpace(job.Text),
+		Text:             slackHistoryText(job.Text, len(job.ImagePaths)),
 	}
+}
+
+func slackHistoryText(text string, imageCount int) string {
+	text = strings.TrimSpace(text)
+	if imageCount <= 0 {
+		return text
+	}
+	marker := fmt.Sprintf("[image attachments: %d]", imageCount)
+	if text == "" {
+		return marker
+	}
+	return text + "\n" + marker
 }
 
 func newSlackOutboundAgentHistoryItem(job slackJob, output string, sentAt time.Time, botUserID string) chathistory.ChatHistoryItem {
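Note on `slackHistoryText`: history items keep only a text marker for images, not the image data, which matches the V1 scope of passing images with the current message only. The sketch below copies the function body from the hunk above into a standalone program so the three cases can be seen side by side; the `main` function and its sample inputs are illustrative and not part of the change.

```go
package main

import (
	"fmt"
	"strings"
)

// slackHistoryText is copied from the diff: it trims the text and, when the
// job carried images, appends a "[image attachments: N]" marker line.
func slackHistoryText(text string, imageCount int) string {
	text = strings.TrimSpace(text)
	if imageCount <= 0 {
		return text
	}
	marker := fmt.Sprintf("[image attachments: %d]", imageCount)
	if text == "" {
		return marker
	}
	return text + "\n" + marker
}

func main() {
	// Illustrative inputs only.
	fmt.Printf("%q\n", slackHistoryText("  look at this ", 2)) // "look at this\n[image attachments: 2]"
	fmt.Printf("%q\n", slackHistoryText("", 1))                // "[image attachments: 1]"
	fmt.Printf("%q\n", slackHistoryText("plain text", 0))      // "plain text"
}
```

Because the marker counts `job.ImagePaths`, it reflects images that were actually downloaded, not the number of files attached to the Slack event.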
+43 -2
internal/channelruntime/slack/runtime_task_test.go
@@ -1,6 +1,8 @@
 package slack
 
 import (
+	"os"
+	"path/filepath"
 	"strings"
 	"testing"
 	"time"
@@ -113,7 +115,7 @@
 		DisplayName: "Alice",
 		Text:        "latest",
 		SentAt:      time.Date(2026, 3, 8, 9, 2, 0, 0, time.UTC),
-	})
+	}, "gpt-5.2", true, nil)
 	if err != nil {
 		t.Fatalf("buildSlackPromptMessages() error = %v", err)
 	}
@@ -148,7 +150,7 @@
 		DisplayName: "Alice",
 		Text:        "latest",
 		SentAt:      time.Date(2026, 3, 8, 9, 2, 0, 0, time.UTC),
-	})
+	}, "gpt-5.2", false, nil)
 	if err != nil {
 		t.Fatalf("buildSlackPromptMessages() error = %v", err)
 	}
@@ -157,6 +159,45 @@
 	}
 	if currentMsg == nil || !strings.Contains(currentMsg.Content, "\"text\": \"latest\"") {
 		t.Fatalf("current message should still be present: %#v", currentMsg)
+	}
+}
+
+func TestBuildSlackPromptMessagesWithImageParts(t *testing.T) {
+	t.Parallel()
+
+	dir := t.TempDir()
+	path := filepath.Join(dir, "image.png")
+	if err := os.WriteFile(path, []byte("png-data"), 0o600); err != nil {
+		t.Fatalf("write image: %v", err)
+	}
+
+	historyMsg, currentMsg, err := buildSlackPromptMessages(nil, slackJob{
+		TeamID:      "T1",
+		ChannelID:   "C1",
+		ChatType:    "channel",
+		MessageTS:   "102.0001",
+		ThreadTS:    "102.0001",
+		UserID:      "U1",
+		Username:    "alice",
+		DisplayName: "Alice",
+		Text:        "latest",
+		ImagePaths:  []string{path},
+		SentAt:      time.Date(2026, 3, 8, 9, 2, 0, 0, time.UTC),
+	}, "gpt-5.2", true, nil)
+	if err != nil {
+		t.Fatalf("buildSlackPromptMessages() error = %v", err)
+	}
+	if historyMsg != nil {
+		t.Fatalf("historyMsg should be nil")
+	}
+	if currentMsg == nil {
+		t.Fatalf("currentMsg = nil")
+	}
+	if len(currentMsg.Parts) != 2 {
+		t.Fatalf("current parts len = %d, want 2", len(currentMsg.Parts))
+	}
+	if currentMsg.Parts[1].MIMEType != "image/png" {
+		t.Fatalf("image MIME = %q, want image/png", currentMsg.Parts[1].MIMEType)
 	}
 }
 
+94
internal/channelruntime/slack/runtime_test.go
@@ -89,6 +89,100 @@
 	}
 }
 
+func TestParseSlackInboundEventWithImageFile(t *testing.T) {
+	t.Parallel()
+
+	payload, err := json.Marshal(map[string]any{
+		"team_id":  "T111",
+		"event_id": "Ev03",
+		"event": map[string]any{
+			"type":         "message",
+			"subtype":      "file_share",
+			"user":         "U111",
+			"text":         "",
+			"channel":      "D222",
+			"channel_type": "im",
+			"ts":           "1739667600.000100",
+			"files": []map[string]any{
+				{
+					"id":                   "F111",
+					"name":                 "photo.png",
+					"mimetype":             "image/png",
+					"url_private_download": "https://files.slack.test/photo.png",
+					"size":                 123,
+				},
+				{
+					"id":                   "F222",
+					"name":                 "note.txt",
+					"mimetype":             "text/plain",
+					"url_private_download": "https://files.slack.test/note.txt",
+				},
+			},
+		},
+	})
+	if err != nil {
+		t.Fatalf("json.Marshal() error = %v", err)
+	}
+	event, ok, err := parseSlackInboundEvent(slackSocketEnvelope{
+		Type:    "events_api",
+		Payload: payload,
+	}, "U999")
+	if err != nil {
+		t.Fatalf("parseSlackInboundEvent() error = %v", err)
+	}
+	if !ok {
+		t.Fatalf("parseSlackInboundEvent() ok=false, want true")
+	}
+	if len(event.ImageFiles) != 1 {
+		t.Fatalf("image files len = %d, want 1", len(event.ImageFiles))
+	}
+	if event.EventSubtype != "file_share" {
+		t.Fatalf("event subtype = %q, want file_share", event.EventSubtype)
+	}
+	if event.ImageFiles[0].ID != "F111" {
+		t.Fatalf("image file id = %q, want F111", event.ImageFiles[0].ID)
+	}
+}
+
+func TestParseSlackInboundEventIgnoresNonImageFileShare(t *testing.T) {
+	t.Parallel()
+
+	payload, err := json.Marshal(map[string]any{
+		"team_id":  "T111",
+		"event_id": "Ev04",
+		"event": map[string]any{
+			"type":         "message",
+			"subtype":      "file_share",
+			"user":         "U111",
+			"text":         "",
+			"channel":      "D222",
+			"channel_type": "im",
+			"ts":           "1739667600.000100",
+			"files": []map[string]any{
+				{
+					"id":                   "F222",
+					"name":                 "note.txt",
+					"mimetype":             "text/plain",
+					"url_private_download": "https://files.slack.test/note.txt",
+				},
+			},
+		},
+	})
+	if err != nil {
+		t.Fatalf("json.Marshal() error = %v", err)
+	}
+	_, ok, err := parseSlackInboundEvent(slackSocketEnvelope{
+		Type:    "events_api",
+		Payload: payload,
+	}, "U999")
+	if err != nil {
+		t.Fatalf("parseSlackInboundEvent() error = %v", err)
+	}
+	if ok {
+		t.Fatalf("parseSlackInboundEvent() ok=true, want false")
+	}
+}
+
 func TestDecideSlackGroupTrigger_Strict(t *testing.T) {
 	t.Parallel()
 
+116 -14
internal/channelruntime/slack/socket_events.go
@@ -32,21 +32,34 @@
 }
 
 type slackEvent struct {
-	Type        string `json:"type,omitempty"`
-	Subtype     string `json:"subtype,omitempty"`
-	User        string `json:"user,omitempty"`
-	Text        string `json:"text,omitempty"`
-	Channel     string `json:"channel,omitempty"`
-	ChannelType string `json:"channel_type,omitempty"`
-	TS          string `json:"ts,omitempty"`
-	ThreadTS    string `json:"thread_ts,omitempty"`
-	BotID       string `json:"bot_id,omitempty"`
-	Team        string `json:"team,omitempty"`
-	EventTS     string `json:"event_ts,omitempty"`
+	Type        string           `json:"type,omitempty"`
+	Subtype     string           `json:"subtype,omitempty"`
+	User        string           `json:"user,omitempty"`
+	Text        string           `json:"text,omitempty"`
+	Channel     string           `json:"channel,omitempty"`
+	ChannelType string           `json:"channel_type,omitempty"`
+	TS          string           `json:"ts,omitempty"`
+	ThreadTS    string           `json:"thread_ts,omitempty"`
+	BotID       string           `json:"bot_id,omitempty"`
+	Team        string           `json:"team,omitempty"`
+	EventTS     string           `json:"event_ts,omitempty"`
+	Files       []slackEventFile `json:"files,omitempty"`
+}
+
+type slackEventFile struct {
+	ID                 string `json:"id,omitempty"`
+	Name               string `json:"name,omitempty"`
+	Title              string `json:"title,omitempty"`
+	Mimetype           string `json:"mimetype,omitempty"`
+	Filetype           string `json:"filetype,omitempty"`
+	URLPrivate         string `json:"url_private,omitempty"`
+	URLPrivateDownload string `json:"url_private_download,omitempty"`
+	Size               int64  `json:"size,omitempty"`
 }
 
 type slackInboundEvent struct {
 	EventType       string
+	EventSubtype    string
 	TeamID          string
 	ChannelID       string
 	ChatType        string
@@ -59,6 +72,8 @@
 	EventID         string
 	SentAt          time.Time
 	MentionUsers    []string
+	ImageFiles      []slackEventFile
+	ImagePaths      []string
 	IsAppMention    bool
 	IsThreadMessage bool
 }
@@ -112,7 +127,9 @@
 		return slackInboundEvent{}, false, nil
 	}
 	subtype := strings.TrimSpace(event.Subtype)
-	if subtype != "" {
+	text := strings.TrimSpace(event.Text)
+	imageFiles := slackImageFilesFromEvent(event.Files)
+	if !acceptSlackMessageSubtype(subtype, imageFiles) {
 		return slackInboundEvent{}, false, nil
 	}
 	if strings.TrimSpace(event.BotID) != "" {
@@ -133,8 +150,7 @@
 	if messageTS == "" {
 		return slackInboundEvent{}, false, nil
 	}
-	text := strings.TrimSpace(event.Text)
-	if text == "" {
+	if text == "" && len(imageFiles) == 0 {
 		return slackInboundEvent{}, false, nil
 	}
 	teamID := strings.TrimSpace(payload.TeamID)
@@ -158,6 +174,7 @@
 
 	return slackInboundEvent{
 		EventType:       eventType,
+		EventSubtype:    subtype,
 		TeamID:          teamID,
 		ChannelID:       channelID,
 		ChatType:        chatType,
@@ -168,9 +185,94 @@
 		EventID:         strings.TrimSpace(payload.EventID),
 		SentAt:          sentAt,
 		MentionUsers:    collectSlackMentionUsers(text),
+		ImageFiles:      imageFiles,
 		IsAppMention:    isAppMention,
 		IsThreadMessage: strings.TrimSpace(event.ThreadTS) != "",
 	}, true, nil
+}
+
+func acceptSlackMessageSubtype(subtype string, imageFiles []slackEventFile) bool {
+	subtype = strings.TrimSpace(subtype)
+	if subtype == "" {
+		return true
+	}
+	return subtype == "file_share" && len(imageFiles) > 0
+}
+
+func slackImageFilesFromEvent(files []slackEventFile) []slackEventFile {
+	if len(files) == 0 {
+		return nil
+	}
+	out := make([]slackEventFile, 0, len(files))
+	seen := make(map[string]bool, len(files))
+	for _, file := range files {
+		id := strings.TrimSpace(file.ID)
+		url := slackFileDownloadURL(file)
+		mimeType := slackFileMIMEType(file)
+		if url == "" || !strings.HasPrefix(mimeType, "image/") {
+			continue
+		}
+		key := id
+		if key == "" {
+			key = url
+		}
+		if seen[key] {
+			continue
+		}
+		seen[key] = true
+		out = append(out, file)
+	}
+	return out
+}
+
+func slackFileDownloadURL(file slackEventFile) string {
+	if url := strings.TrimSpace(file.URLPrivateDownload); url != "" {
+		return url
+	}
+	return strings.TrimSpace(file.URLPrivate)
+}
+
+func slackFileMIMEType(file slackEventFile) string {
+	mimeType := strings.TrimSpace(strings.ToLower(file.Mimetype))
+	if idx := strings.Index(mimeType, ";"); idx >= 0 {
+		mimeType = strings.TrimSpace(mimeType[:idx])
+	}
+	if mimeType != "" {
+		return mimeType
+	}
+	switch strings.ToLower(strings.TrimSpace(file.Filetype)) {
+	case "jpg", "jpeg":
+		return "image/jpeg"
+	case "png":
+		return "image/png"
+	case "webp":
+		return "image/webp"
+	case "gif":
+		return "image/gif"
+	}
+	for _, name := range []string{file.Name, file.Title, slackFileDownloadURL(file)} {
+		mimeType = slackImageMIMEFromName(name)
+		if mimeType != "" {
+			return mimeType
+		}
+	}
+	return ""
+}
+
+func slackImageMIMEFromName(name string) string {
+	name = strings.ToLower(strings.TrimSpace(name))
+	switch {
+	case strings.HasSuffix(name, ".jpg"), strings.HasSuffix(name, ".jpeg"):
+		return "image/jpeg"
+	case strings.HasSuffix(name, ".png"):
+		return "image/png"
+	case strings.HasSuffix(name, ".webp"):
+		return "image/webp"
+	case strings.HasSuffix(name, ".gif"):
+		return "image/gif"
+	default:
+		return ""
+	}
 }
 
 func isSlackGroupChat(chatType string) bool {
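Note on the MIME resolution order in `slackFileMIMEType`: a declared `mimetype` always wins (after lowercasing and dropping any `;` parameters), then `filetype`, then a filename suffix. The condensed sketch below illustrates that order under simplifying assumptions: `fileMeta` and `resolveImageMIME` are hypothetical names, the struct carries only three of the fields, and the title and URL fallbacks from the diff are omitted.

```go
package main

import (
	"fmt"
	"strings"
)

// fileMeta is a hypothetical, reduced stand-in for slackEventFile.
type fileMeta struct {
	Mimetype string
	Filetype string
	Name     string
}

// imageMIMEByExt lists the same four formats the diff recognises by
// filetype or filename suffix.
var imageMIMEByExt = map[string]string{
	"jpg":  "image/jpeg",
	"jpeg": "image/jpeg",
	"png":  "image/png",
	"webp": "image/webp",
	"gif":  "image/gif",
}

// resolveImageMIME sketches the fallback order: declared MIME type first,
// then the filetype field, then the extension of the file name.
func resolveImageMIME(f fileMeta) string {
	mimeType := strings.TrimSpace(strings.ToLower(f.Mimetype))
	if idx := strings.Index(mimeType, ";"); idx >= 0 {
		mimeType = strings.TrimSpace(mimeType[:idx])
	}
	if mimeType != "" {
		return mimeType
	}
	if m, ok := imageMIMEByExt[strings.ToLower(strings.TrimSpace(f.Filetype))]; ok {
		return m
	}
	name := strings.ToLower(strings.TrimSpace(f.Name))
	if dot := strings.LastIndex(name, "."); dot >= 0 {
		if m, ok := imageMIMEByExt[name[dot+1:]]; ok {
			return m
		}
	}
	return ""
}

func main() {
	fmt.Println(resolveImageMIME(fileMeta{Mimetype: "Image/PNG; charset=binary"})) // image/png
	fmt.Println(resolveImageMIME(fileMeta{Filetype: "jpg"}))                       // image/jpeg
	fmt.Println(resolveImageMIME(fileMeta{Name: "scan.WEBP"}))                     // image/webp
	fmt.Println(resolveImageMIME(fileMeta{Mimetype: "text/plain", Name: "x.png"})) // text/plain
}
```

One consequence visible in the diff itself: `slackImageFilesFromEvent` accepts any `image/` type, including `image/gif`, while `downloadSlackImageToCache` additionally requires `imageinput.SupportedUploadMIME`. If that helper keeps the JPEG/PNG/WebP set that the removed LINE helper used (an assumption, since its body is not shown here), a GIF would be counted as an image file at parse time and then skipped at download time.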
+19 -113
internal/channelruntime/telegram/runtime_task.go
···
 import (
 	"bytes"
 	"context"
-	"encoding/base64"
 	"fmt"
 	"image"
 	_ "image/gif"
···
 	_ "image/png"
 	"io"
 	"log/slog"
-	"os"
-	"path/filepath"
 	"strings"
 	"time"

 	"github.com/nickalie/go-webpbin"
 	"github.com/quailyquaily/mistermorph/agent"
 	busruntime "github.com/quailyquaily/mistermorph/internal/bus"
+	"github.com/quailyquaily/mistermorph/internal/channelruntime/imageinput"
 	"github.com/quailyquaily/mistermorph/internal/channelruntime/taskruntime"
 	"github.com/quailyquaily/mistermorph/internal/chathistory"
 	"github.com/quailyquaily/mistermorph/internal/llmstats"
···
 	}
 	var historyMsg *llm.Message
 	if strings.TrimSpace(historyRaw) != "" {
-		msg, buildErr := buildTelegramHistoryMessage(historyRaw, model, nil, logger)
-		if buildErr != nil {
-			return nil, nil, buildErr
-		}
+		msg := llm.Message{Role: "user", Content: historyRaw}
 		historyMsg = &msg
 	}

···
 	return strings.TrimSpace(plan.Steps[index].Step)
 }

-func buildTelegramHistoryMessage(content string, model string, imagePaths []string, logger *slog.Logger) (llm.Message, error) {
-	return buildTelegramCurrentMessage(content, model, imagePaths, logger)
-}
-
 func buildTelegramCurrentMessage(content string, model string, imagePaths []string, logger *slog.Logger) (llm.Message, error) {
-	msg := llm.Message{Role: "user", Content: content}
-	if !llm.ModelSupportsImageParts(model) {
-		return msg, nil
-	}
-	if len(imagePaths) == 0 {
-		return msg, nil
-	}
-	parts := make([]llm.Part, 0, 1+min(len(imagePaths), telegramLLMMaxImages))
-	if strings.TrimSpace(content) != "" {
-		parts = append(parts, llm.Part{Type: llm.PartTypeText, Text: content})
-	}
-
-	enableWebPTranscode := llm.ModelSupportsWebPTranscode(model)
-	seen := make(map[string]bool, len(imagePaths))
-	imageCount := 0
-	for _, rawPath := range imagePaths {
-		if imageCount >= telegramLLMMaxImages {
-			break
-		}
-		path := strings.TrimSpace(rawPath)
-		if path == "" || seen[path] {
-			continue
-		}
-		seen[path] = true
-
-		info, err := os.Stat(path)
-		if err != nil {
-			if logger != nil {
-				logger.Warn("telegram_image_part_skip", "path", path, "error", err.Error())
-			}
-			continue
-		}
-		if info.Size() <= 0 {
-			continue
-		}
-		if info.Size() > telegramLLMMaxImageBytes {
-			return llm.Message{}, fmt.Errorf("图片太大: %s (%d bytes > %d bytes)", filepath.Base(path), info.Size(), telegramLLMMaxImageBytes)
-		}
-
-		raw, err := os.ReadFile(path)
-		if err != nil {
-			if logger != nil {
-				logger.Warn("telegram_image_part_read_error", "path", path, "error", err.Error())
-			}
-			continue
-		}
-		mimeType := telegramImageMIMEType(path)
-		if !isTelegramSupportedUploadImageMIME(mimeType) {
-			if logger != nil {
-				logger.Warn("telegram_image_part_skip_unsupported_format", "path", path, "mime_type", mimeType)
-			}
-			continue
-		}
-		if enableWebPTranscode && shouldTelegramTranscodeToWebP(mimeType) {
-			webpRaw, webpErr := encodeImageToWebP(raw)
-			if webpErr != nil {
-				return llm.Message{}, fmt.Errorf("图片转换失败: %s: %w", filepath.Base(path), webpErr)
+	var transcode imageinput.TranscodeFunc
+	if llm.ModelSupportsWebPTranscode(model) {
+		transcode = func(raw []byte, mimeType string) ([]byte, string, error) {
+			if shouldTelegramTranscodeToWebP(mimeType) {
+				webpRaw, err := encodeImageToWebP(raw)
+				if err != nil {
+					return nil, "", err
+				}
+				return webpRaw, "image/webp", nil
 			}
-			raw = webpRaw
-			mimeType = "image/webp"
+			return raw, mimeType, nil
 		}
-
-		parts = append(parts, llm.Part{
-			Type:       llm.PartTypeImageBase64,
-			MIMEType:   mimeType,
-			DataBase64: base64.StdEncoding.EncodeToString(raw),
-		})
-		imageCount++
 	}
-	if imageCount == 0 {
-		return msg, nil
-	}
-	msg.Parts = parts
-	return msg, nil
-}
-
-func telegramImageMIMEType(path string) string {
-	ext := strings.ToLower(strings.TrimSpace(filepath.Ext(path)))
-	switch ext {
-	case ".jpg", ".jpeg":
-		return "image/jpeg"
-	case ".png":
-		return "image/png"
-	case ".webp":
-		return "image/webp"
-	case ".gif":
-		return "image/gif"
-	case ".bmp":
-		return "image/bmp"
-	case ".heic":
-		return "image/heic"
-	case ".heif":
-		return "image/heif"
-	}
-	return "image/png"
-}
-
-func isTelegramSupportedUploadImageMIME(mimeType string) bool {
-	mimeType = strings.ToLower(strings.TrimSpace(mimeType))
-	switch mimeType {
-	case "image/jpeg", "image/png", "image/webp":
-		return true
-	default:
-		return false
-	}
+	return imageinput.BuildUserMessage(content, model, imagePaths, imageinput.MessageOptions{
+		MaxImages: telegramLLMMaxImages,
+		MaxBytes:  telegramLLMMaxImageBytes,
+		Logger:    logger,
+		LogPrefix: "telegram",
+		Transcode: transcode,
+	})
 }

 func shouldTelegramTranscodeToWebP(mimeType string) bool {
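The Telegram change above moves the inline loop into a shared `imageinput.BuildUserMessage` and passes the WebP step in as an optional `Transcode` hook that stays nil for models without WebP support. The body of the `imageinput` package is not part of this section, so the following is only a hypothetical sketch of how a builder might honor such a hook. The `options`, `image`, `part`, and `buildParts` names are invented for illustration; only the `TranscodeFunc` signature is taken from the closure in the diff. The error handling mirrors the removed inline code (oversized images and transcode failures abort, empty images are skipped), which the real package may or may not preserve.

```go
package main

import (
	"bytes"
	"fmt"
)

// TranscodeFunc has the same signature as the closure in the diff.
type TranscodeFunc func(raw []byte, mimeType string) ([]byte, string, error)

// options is a hypothetical, reduced stand-in for imageinput.MessageOptions.
type options struct {
	MaxImages int
	MaxBytes  int
	Transcode TranscodeFunc // nil means "pass bytes through unchanged"
}

// image and part are hypothetical in-memory stand-ins for files and llm.Part.
type image struct {
	MIMEType string
	Raw      []byte
}

type part struct {
	MIMEType string
	Data     []byte
}

// buildParts applies the count cap and size limit, then the optional hook.
func buildParts(images []image, opts options) ([]part, error) {
	parts := make([]part, 0, len(images))
	for _, img := range images {
		if len(parts) >= opts.MaxImages {
			break
		}
		if len(img.Raw) == 0 {
			continue
		}
		if len(img.Raw) > opts.MaxBytes {
			return nil, fmt.Errorf("image too large: %d bytes > %d bytes", len(img.Raw), opts.MaxBytes)
		}
		raw, mimeType := img.Raw, img.MIMEType
		if opts.Transcode != nil {
			var err error
			if raw, mimeType, err = opts.Transcode(raw, mimeType); err != nil {
				return nil, fmt.Errorf("image transcode failed: %w", err)
			}
		}
		parts = append(parts, part{MIMEType: mimeType, Data: raw})
	}
	return parts, nil
}

func main() {
	// Fake "encoder": upper-cases JPEG bytes and relabels them as WebP.
	toWebP := func(raw []byte, mimeType string) ([]byte, string, error) {
		if mimeType == "image/jpeg" {
			return bytes.ToUpper(raw), "image/webp", nil
		}
		return raw, mimeType, nil
	}
	imgs := []image{
		{MIMEType: "image/jpeg", Raw: []byte("abc")},
		{MIMEType: "image/png", Raw: []byte("def")},
	}
	plain, _ := buildParts(imgs, options{MaxImages: 3, MaxBytes: 10})
	hooked, _ := buildParts(imgs, options{MaxImages: 3, MaxBytes: 10, Transcode: toWebP})
	// prints "image/jpeg image/webp image/png"
	fmt.Println(plain[0].MIMEType, hooked[0].MIMEType, hooked[1].MIMEType)
}
```

Keeping the hook nil-able lets each channel opt in to transcoding without the shared builder knowing about WebP or any model capability check.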
+25 -21
internal/channelruntime/telegram/runtime_task_test.go
···
 	}
 }

-func TestBuildTelegramHistoryMessageWithImageParts(t *testing.T) {
+func TestBuildTelegramCurrentMessageWithImageParts(t *testing.T) {
 	orig := encodeImageToWebP
 	encodeImageToWebP = func(raw []byte) ([]byte, error) { return []byte("webp-bytes"), nil }
 	t.Cleanup(func() { encodeImageToWebP = orig })
···
 		t.Fatalf("WriteFile() error = %v", err)
 	}

-	msg, err := buildTelegramHistoryMessage("history", "gpt-5.2", []string{imgPath}, nil)
+	msg, err := buildTelegramCurrentMessage("history", "gpt-5.2", []string{imgPath}, nil)
 	if err != nil {
-		t.Fatalf("buildTelegramHistoryMessage() error = %v", err)
+		t.Fatalf("buildTelegramCurrentMessage() error = %v", err)
 	}
 	if msg.Role != "user" {
 		t.Fatalf("role = %q, want user", msg.Role)
···
 	}
 }

-func TestBuildTelegramHistoryMessageSkipsMissingAndCapsCount(t *testing.T) {
+func TestBuildTelegramCurrentMessageSkipsMissingAndCapsCount(t *testing.T) {
 	dir := t.TempDir()
 	paths := make([]string, 0, 4)
 	for i := 0; i < 4; i++ {
···
 		paths = append(paths, path)
 	}

-	msg, err := buildTelegramHistoryMessage("history", "grok-4", append([]string{"/missing.png"}, paths...), nil)
+	msg, err := buildTelegramCurrentMessage("history", "grok-4", append([]string{"/missing.png"}, paths...), nil)
 	if err != nil {
-		t.Fatalf("buildTelegramHistoryMessage() error = %v", err)
+		t.Fatalf("buildTelegramCurrentMessage() error = %v", err)
 	}
 	if len(msg.Parts) != 4 {
 		t.Fatalf("parts len = %d, want 4 (1 text + 3 images)", len(msg.Parts))
···
 	}
 }

-func TestBuildTelegramHistoryMessageUnsupportedModelSkipsImageParts(t *testing.T) {
+func TestBuildTelegramCurrentMessageUnsupportedModelSkipsImageParts(t *testing.T) {
 	dir := t.TempDir()
 	imgPath := filepath.Join(dir, "x.jpg")
 	if err := os.WriteFile(imgPath, []byte("abc"), 0o600); err != nil {
 		t.Fatalf("WriteFile() error = %v", err)
 	}

-	msg, err := buildTelegramHistoryMessage("history", "qwen-max", []string{imgPath}, nil)
+	msg, err := buildTelegramCurrentMessage("history", "qwen-max", []string{imgPath}, nil)
 	if err != nil {
-		t.Fatalf("buildTelegramHistoryMessage() error = %v", err)
+		t.Fatalf("buildTelegramCurrentMessage() error = %v", err)
 	}
 	if len(msg.Parts) != 0 {
 		t.Fatalf("parts len = %d, want 0", len(msg.Parts))
···
 	}
 }

-func TestBuildTelegramHistoryMessageReturnsErrorWhenImageTooLarge(t *testing.T) {
+func TestBuildTelegramCurrentMessageReturnsErrorWhenImageTooLarge(t *testing.T) {
 	orig := encodeImageToWebP
 	encodeImageToWebP = func(raw []byte) ([]byte, error) { return raw, nil }
 	t.Cleanup(func() { encodeImageToWebP = orig })
···
 	}
 	_ = f.Close()

-	_, err = buildTelegramHistoryMessage("history", "gpt-5.2", []string{imgPath}, nil)
+	_, err = buildTelegramCurrentMessage("history", "gpt-5.2", []string{imgPath}, nil)
 	if err == nil {
-		t.Fatalf("buildTelegramHistoryMessage() expected error")
+		t.Fatalf("buildTelegramCurrentMessage() expected error")
 	}
 	if !strings.Contains(err.Error(), "图片太大") {
 		t.Fatalf("error = %q, want contains 图片太大", err.Error())
 	}
 }

-func TestBuildTelegramHistoryMessageUsesWebPForSupportedModel(t *testing.T) {
+func TestBuildTelegramCurrentMessageUsesWebPForSupportedModel(t *testing.T) {
 	orig := encodeImageToWebP
 	encodeImageToWebP = func(raw []byte) ([]byte, error) { return []byte("webp-bytes"), nil }
 	t.Cleanup(func() { encodeImageToWebP = orig })
···
 		t.Fatalf("WriteFile() error = %v", err)
 	}

-	msg, err := buildTelegramHistoryMessage("history", "gpt-5.2", []string{imgPath}, nil)
+	msg, err := buildTelegramCurrentMessage("history", "gpt-5.2", []string{imgPath}, nil)
 	if err != nil {
-		t.Fatalf("buildTelegramHistoryMessage() error = %v", err)
+		t.Fatalf("buildTelegramCurrentMessage() error = %v", err)
 	}
 	if len(msg.Parts) != 2 {
 		t.Fatalf("parts len = %d, want 2", len(msg.Parts))
···
 	}
 }

-func TestBuildTelegramHistoryMessageDoesNotForceWebPForUnsupportedModel(t *testing.T) {
+func TestBuildTelegramCurrentMessageDoesNotForceWebPForUnsupportedModel(t *testing.T) {
 	orig := encodeImageToWebP
 	encodeImageToWebP = func(raw []byte) ([]byte, error) { return []byte("unexpected"), nil }
 	t.Cleanup(func() { encodeImageToWebP = orig })
···
 		t.Fatalf("WriteFile() error = %v", err)
 	}

-	msg, err := buildTelegramHistoryMessage("history", "grok-4", []string{imgPath}, nil)
+	msg, err := buildTelegramCurrentMessage("history", "grok-4", []string{imgPath}, nil)
 	if err != nil {
-		t.Fatalf("buildTelegramHistoryMessage() error = %v", err)
+		t.Fatalf("buildTelegramCurrentMessage() error = %v", err)
 	}
 	if len(msg.Parts) != 2 {
 		t.Fatalf("parts len = %d, want 2", len(msg.Parts))
···
 	}
 }

-func TestBuildTelegramHistoryMessageSkipsUnsupportedImageFormats(t *testing.T) {
+func TestBuildTelegramCurrentMessageSkipsUnsupportedImageFormats(t *testing.T) {
 	orig := encodeImageToWebP
 	encodeImageToWebP = func(raw []byte) ([]byte, error) { return []byte("unexpected"), nil }
 	t.Cleanup(func() { encodeImageToWebP = orig })
···
 	if err := os.WriteFile(gifPath, []byte("gif-bytes"), 0o600); err != nil {
 		t.Fatalf("WriteFile() error = %v", err)
 	}
+	unknownPath := filepath.Join(dir, "x.bin")
+	if err := os.WriteFile(unknownPath, []byte("unknown-bytes"), 0o600); err != nil {
+		t.Fatalf("WriteFile() error = %v", err)
+	}

-	msg, err := buildTelegramHistoryMessage("history", "gpt-5.2", []string{gifPath}, nil)
+	msg, err := buildTelegramCurrentMessage("history", "gpt-5.2", []string{gifPath, unknownPath}, nil)
 	if err != nil {
-		t.Fatalf("buildTelegramHistoryMessage() error = %v", err)
+		t.Fatalf("buildTelegramCurrentMessage() error = %v", err)
 	}
 	if len(msg.Parts) != 0 {
 		t.Fatalf("parts len = %d, want 0", len(msg.Parts))