# README Embedding Feature

## Overview

Enhance the repository page (`/r/{handle}/{repository}`) with README content fetched from the source repository, similar to Docker Hub's "Overview" tab.

## Current State

The repository page currently shows:

- Repository metadata from OCI annotations
- Short description from `org.opencontainers.image.description`
- External links to the source (`org.opencontainers.image.source`) and docs (`org.opencontainers.image.documentation`)
- Tags and manifests lists

## Proposed Feature

Automatically fetch and render `README.md` from the source repository when one is available, and display it in an "Overview" section on the repository page.

## Implementation Approach

### 1. Source URL Detection

Parse the `org.opencontainers.image.source` annotation to detect GitHub repositories:

- Pattern: `https://github.com/{owner}/{repo}`
- Extract the owner and repository name

### 2. README Fetching

Fetch `README.md` from GitHub via the raw content URL:

```
https://raw.githubusercontent.com/{owner}/{repo}/{branch}/README.md
```

Try these branch names in order:

1. `main`
2. `master`
3. `develop`

If no README is found on any branch, or the fetch fails, omit the Overview section and render the page as it is today.

### 3. Markdown Rendering

Use a Go markdown library to render README content:

- **Option A**: `github.com/gomarkdown/markdown`: pure Go, fast
- **Option B**: `github.com/yuin/goldmark`: CommonMark compliant, extensible
- **Option C**: Call GitHub's markdown API (requires a network call per render)

Recommended: `goldmark`, for CommonMark compliance and GitHub-flavored markdown support (via its GFM extension).

### 4. Caching Strategy

Cache the rendered README to avoid repeated fetches:

**Option A: In-memory cache**

- Simple, fast
- Lost on restart
- Good for an MVP

**Option B: Database cache**

- Add a `readme_html` column to the `manifests` table
- Update on new manifest pushes
- Persistent across restarts
- Background job to refresh periodically

**Option C: Hybrid**

- Cache in the database
- Also cache in memory for frequently accessed repos
- TTL-based refresh (e.g., 1 hour)

### 5. UI Integration

Add an "Overview" section to the repository page:

- Show it after the repository header, before the tags/manifests
- Render the markdown as HTML
- Apply CSS styling for markdown elements (headings, code blocks, tables, etc.)
- Handle images in the README (may require proxying or allowing external images)

## Implementation Steps

1. **Add README fetcher** (`pkg/appview/readme/fetcher.go`)

   ```go
   // Fetcher retrieves and renders README content for a repository page.
   // Cache is the abstraction over the caching options in section 4 (not yet defined).
   type Fetcher struct {
       httpClient *http.Client
       cache      Cache
   }

   // FetchGitHubReadme returns the raw README.md for a github.com source URL.
   func (f *Fetcher) FetchGitHubReadme(sourceURL string) (string, error)

   // RenderMarkdown converts markdown to HTML (to be sanitized before display).
   func (f *Fetcher) RenderMarkdown(content string) (string, error)
   ```

2. **Update database schema** (optional, for caching)

   ```sql
   ALTER TABLE manifests ADD COLUMN readme_html TEXT;
   ALTER TABLE manifests ADD COLUMN readme_fetched_at TIMESTAMP;
   ```

3. **Update RepositoryPageHandler**
   - Fetch the README for the repository
   - Pass the rendered HTML to the template

4. **Update repository.html template**
   - Add the "Overview" section
   - Pass the HTML as `template.HTML` so `html/template` does not escape it; because `template.HTML` disables escaping, only use it on sanitizer output

5. **Add markdown CSS**
   - Style headings, code blocks, lists, and tables
   - Syntax highlighting for code blocks (optional)

## Security Considerations

1. **XSS Prevention**
   - Sanitize the HTML output from the markdown renderer
   - Use `bluemonday` or a similar HTML sanitizer
   - Only allow safe HTML elements and attributes

2. **Rate Limiting**
   - Cache aggressively to avoid hitting GitHub rate limits
   - Consider the GitHub API instead of raw content (requires a token, but has higher limits)
   - Handle 429 responses gracefully

3. **Image Handling**
   - READMEs may contain images with relative URLs
   - Options:
     - Rewrite image URLs to absolute GitHub URLs
     - Proxy images through ATCR (caching, security)
     - Block external images (simplest, but breaks many READMEs)

4. **Content Size**
   - Limit README size (e.g., 1 MB max)
   - Truncate very long READMEs with a "View on GitHub" link

## Future Enhancements

1. **Support other platforms**
   - GitLab: `https://gitlab.com/{owner}/{repo}/-/raw/{branch}/README.md`
   - Gitea/Forgejo
   - Bitbucket

2. **Custom README upload**
   - Allow users to upload a custom README via the UI
   - Store it in the PDS as an `io.atcr.readme` record
   - Priority: custom > source repo

3. **Automatic updates**
   - Background job to refresh READMEs periodically
   - Webhook support to update on push to the source repo

4. **Syntax highlighting**
   - Use highlight.js or similar for code blocks
   - Support multiple languages

## Example Flow

1. User pushes an image with the label `org.opencontainers.image.source=https://github.com/alice/myapp`
2. Manifest is stored with the source URL annotation
3. User visits `/r/alice/myapp`
4. RepositoryPageHandler:
   - Checks the cache for the README
   - If not cached or expired:
     - Fetches `https://raw.githubusercontent.com/alice/myapp/main/README.md` (falling back to `master`, then `develop`)
     - Renders the markdown to HTML
     - Sanitizes the HTML
     - Caches the result
   - Passes the README HTML to the template
5. Template renders the Overview section with the README content

## Dependencies

```
// go.mod
require (
	// Markdown rendering
	github.com/yuin/goldmark v1.6.0
	github.com/yuin/goldmark-emoji v1.0.2 // GitHub emoji support

	// HTML sanitization
	github.com/microcosm-cc/bluemonday v1.0.26
)
```

## References

- [OCI Image Spec - Annotations](https://github.com/opencontainers/image-spec/blob/main/annotations.md)
- [Docker Hub Overview tab behavior](https://hub.docker.com/)
- [Goldmark documentation](https://github.com/yuin/goldmark)
- [GitHub raw content URLs](https://raw.githubusercontent.com/)