README Embedding Feature
Overview
Enhance the repository page (/r/{handle}/{repository}) with embedded README content fetched from the source repository, similar to Docker Hub's "Overview" tab.
Current State
The repository page currently shows:
- Repository metadata from OCI annotations
- Short description from org.opencontainers.image.description
- External links to source (org.opencontainers.image.source) and docs (org.opencontainers.image.documentation)
- Tags and manifests lists
Proposed Feature
Automatically fetch and render README.md content from the source repository when available, displaying it in an "Overview" section on the repository page.
Implementation Approach
1. Source URL Detection
Parse org.opencontainers.image.source annotation to detect GitHub repositories:
- Pattern: https://github.com/{owner}/{repo}
- Extract owner and repo name
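The detection step could look roughly like the sketch below. It uses only the standard library; the function name ParseGitHubSource and the choice to tolerate a trailing slash, a ".git" suffix, and deeper paths are assumptions for illustration, not part of the design above.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// ParseGitHubSource extracts the owner and repository name from an
// org.opencontainers.image.source value such as
// https://github.com/alice/myapp. The function name is hypothetical.
func ParseGitHubSource(source string) (owner, repo string, err error) {
	u, err := url.Parse(strings.TrimSpace(source))
	if err != nil {
		return "", "", fmt.Errorf("parse source URL: %w", err)
	}
	if u.Scheme != "https" || !strings.EqualFold(u.Hostname(), "github.com") {
		return "", "", fmt.Errorf("not a GitHub HTTPS URL: %q", source)
	}
	// Assumption: accept a trailing slash and deeper paths such as
	// /owner/repo/tree/main, using only the first two segments.
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return "", "", fmt.Errorf("missing owner or repo in %q", source)
	}
	return parts[0], strings.TrimSuffix(parts[1], ".git"), nil
}

func main() {
	owner, repo, err := ParseGitHubSource("https://github.com/alice/myapp")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(owner, repo)
}
```

Returning an error for non-GitHub hosts lets the caller skip the Overview section without treating the page as broken.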
2. README Fetching
Fetch README.md from GitHub via raw content URL:
https://raw.githubusercontent.com/{owner}/{repo}/{branch}/README.md
Try multiple branch names in order:
1. main
2. master
3. develop
If the README is not found on any of these branches, or the fetch fails, fall back to rendering the page without README content.
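A minimal sketch of the branch fallback, using only the standard library. The names FetchReadme, readmeURLs, and ErrReadmeNotFound are hypothetical, and the choice to move on to the next branch only on a 404 (and stop on any other error) is an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"io"
	"net/http"
)

// ErrReadmeNotFound reports that no candidate branch had a README.md.
var ErrReadmeNotFound = errors.New("readme not found")

// branches lists the candidate branch names in the order they are tried.
var branches = []string{"main", "master", "develop"}

// readmeURLs builds the raw content URLs to try, in order.
func readmeURLs(owner, repo string) []string {
	urls := make([]string, 0, len(branches))
	for _, b := range branches {
		urls = append(urls, fmt.Sprintf(
			"https://raw.githubusercontent.com/%s/%s/%s/README.md", owner, repo, b))
	}
	return urls
}

// FetchReadme returns the first README.md found among the candidate branches.
func FetchReadme(ctx context.Context, client *http.Client, owner, repo string) (string, error) {
	for _, u := range readmeURLs(owner, repo) {
		body, err := fetchOne(ctx, client, u)
		if errors.Is(err, ErrReadmeNotFound) {
			continue // this branch has no README; try the next one
		}
		if err != nil {
			return "", err
		}
		return body, nil
	}
	return "", ErrReadmeNotFound
}

// fetchOne downloads a single URL and maps a 404 to ErrReadmeNotFound.
func fetchOne(ctx context.Context, client *http.Client, u string) (string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, u, nil)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	switch resp.StatusCode {
	case http.StatusOK:
		data, err := io.ReadAll(resp.Body)
		return string(data), err
	case http.StatusNotFound:
		return "", ErrReadmeNotFound
	default:
		return "", fmt.Errorf("fetch %s: unexpected status %s", u, resp.Status)
	}
}

func main() {
	// Print the candidate URLs only; no network call is made here.
	for _, u := range readmeURLs("alice", "myapp") {
		fmt.Println(u)
	}
}
```

In a real fetcher the client would presumably carry a timeout, and the body read would be size-limited as discussed under Security Considerations.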
3. Markdown Rendering
Use a Go markdown library to render README content:
- Option A: github.com/gomarkdown/markdown - Pure Go, fast
- Option B: github.com/yuin/goldmark - CommonMark compliant, extensible
- Option C: Call GitHub's markdown API (requires network call)
Recommended: goldmark for CommonMark compliance and GitHub-flavored markdown support.
4. Caching Strategy
Cache rendered README to avoid repeated fetches:
Option A: In-memory cache
- Simple, fast
- Lost on restart
- Good for MVP
Option B: Database cache
- Add readme_html column to manifests table
- Update on new manifest pushes
- Persistent across restarts
- Background job to refresh periodically
Option C: Hybrid
- Cache in database
- Also cache in memory for frequently accessed repos
- TTL-based refresh (e.g., 1 hour)
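Option A, with the TTL idea from Option C, might be sketched as follows. MemoryCache and its methods are hypothetical names; the sketch never evicts expired entries, which a production cache would need to do.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// entry holds rendered README HTML and the time after which it is stale.
type entry struct {
	html      string
	expiresAt time.Time
}

// MemoryCache is a TTL-based in-memory cache keyed by source URL.
// Contents are lost on restart, as noted for Option A.
type MemoryCache struct {
	mu      sync.RWMutex
	ttl     time.Duration
	entries map[string]entry
}

// NewMemoryCache returns an empty cache whose entries live for ttl.
func NewMemoryCache(ttl time.Duration) *MemoryCache {
	return &MemoryCache{ttl: ttl, entries: make(map[string]entry)}
}

// Get returns the cached HTML, or false if it is missing or expired.
func (c *MemoryCache) Get(key string) (string, bool) {
	c.mu.RLock()
	e, ok := c.entries[key]
	c.mu.RUnlock()
	if !ok || !time.Now().Before(e.expiresAt) {
		return "", false
	}
	return e.html, true
}

// Set stores HTML under key and resets its expiry.
func (c *MemoryCache) Set(key, html string) {
	c.mu.Lock()
	c.entries[key] = entry{html: html, expiresAt: time.Now().Add(c.ttl)}
	c.mu.Unlock()
}

func main() {
	cache := NewMemoryCache(time.Hour) // 1 hour, matching the example TTL
	cache.Set("https://github.com/alice/myapp", "<h1>myapp</h1>")
	html, ok := cache.Get("https://github.com/alice/myapp")
	fmt.Println(ok, html)
}
```

Keying by source URL rather than by repository handle is a design choice here: two repositories that point at the same source would share one cached README.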
5. UI Integration
Add "Overview" section to repository page:
- Show after repository header, before tags/manifests
- Render markdown as HTML
- Apply CSS styling for markdown elements (headings, code blocks, tables, etc.)
- Handle images in README (may need to proxy or allow external images)
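The placement described above could be expressed with html/template roughly as follows. The RepositoryPage type, its field names, and the inline template are placeholders; the real repository.html template is not shown in this document.

```go
package main

import (
	"fmt"
	"html/template"
	"strings"
)

// RepositoryPage is a hypothetical view model for the repository page.
type RepositoryPage struct {
	Name string
	// ReadmeHTML must already be sanitized: template.HTML is emitted
	// verbatim and bypasses html/template's auto-escaping.
	ReadmeHTML template.HTML
}

// pageTmpl places the Overview section after the header and before the
// tags list, and omits it entirely when there is no README.
var pageTmpl = template.Must(template.New("repository").Parse(
	`<h1>{{.Name}}</h1>
{{if .ReadmeHTML}}<section class="overview">
<h2>Overview</h2>
{{.ReadmeHTML}}
</section>
{{end}}<h2>Tags</h2>
`))

// renderPage executes the template into a string.
func renderPage(p RepositoryPage) (string, error) {
	var sb strings.Builder
	if err := pageTmpl.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := renderPage(RepositoryPage{
		Name:       "myapp",
		ReadmeHTML: "<p>Hello from the README.</p>",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

Because the section is wrapped in a conditional, a repository without a README renders exactly as it does today.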
Implementation Steps
1. Add README fetcher (pkg/appview/readme/fetcher.go)

       type Fetcher struct {
           httpClient *http.Client
           cache      Cache
       }

       func (f *Fetcher) FetchGitHubReadme(sourceURL string) (string, error)
       func (f *Fetcher) RenderMarkdown(content string) (string, error)

2. Update database schema (optional, for caching)

       ALTER TABLE manifests ADD COLUMN readme_html TEXT;
       ALTER TABLE manifests ADD COLUMN readme_fetched_at TIMESTAMP;

3. Update RepositoryPageHandler
   - Fetch README for repository
   - Pass rendered HTML to template
4. Update repository.html template
   - Add "Overview" section
   - Emit the rendered HTML via template.HTML; this type bypasses auto-escaping, so pass it only sanitized output
5. Add markdown CSS
   - Style headings, code blocks, lists, tables
   - Syntax highlighting for code blocks (optional)
Security Considerations
1. XSS Prevention
   - Sanitize HTML output from markdown renderer
   - Use bluemonday or similar HTML sanitizer
   - Only allow safe HTML elements and attributes
2. Rate Limiting
   - Cache aggressively to avoid hitting GitHub rate limits
   - Consider GitHub API instead of raw content (requires token but higher limits)
   - Handle 429 responses gracefully
3. Image Handling
   - README may contain images with relative URLs
   - Options:
     - Rewrite image URLs to absolute GitHub URLs
     - Proxy images through ATCR (caching, security)
     - Block external images (simplest, but breaks many READMEs)
4. Content Size
   - Limit README size (e.g., 1MB max)
   - Truncate very long READMEs with "View on GitHub" link
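Two of the points above, the size limit and rewriting relative image URLs, can be sketched with the standard library alone. The helper names readLimited and resolveImageURL are hypothetical, and treating a leading "/" in an image path as relative to the repository root is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"net/url"
	"strings"
)

// maxReadmeBytes is the example 1MB limit from the list above.
const maxReadmeBytes = 1 << 20

// readLimited reads at most max bytes from r and reports whether the
// input was longer. Truncation may split a multi-byte UTF-8 character.
func readLimited(r io.Reader, max int64) (data []byte, truncated bool, err error) {
	// Read one extra byte so that "exactly max" and "more than max"
	// can be told apart.
	data, err = io.ReadAll(io.LimitReader(r, max+1))
	if err != nil {
		return nil, false, err
	}
	if int64(len(data)) > max {
		return data[:max], true, nil
	}
	return data, false, nil
}

// resolveImageURL turns an image reference from a README into an
// absolute URL. Absolute references are returned unchanged.
func resolveImageURL(owner, repo, branch, src string) (string, error) {
	base := &url.URL{
		Scheme: "https",
		Host:   "raw.githubusercontent.com",
		Path:   "/" + owner + "/" + repo + "/" + branch + "/",
	}
	ref, err := url.Parse(src)
	if err != nil {
		return "", fmt.Errorf("parse image URL: %w", err)
	}
	if ref.Scheme == "" && ref.Host == "" {
		// Assumption: "/logo.png" means the repository root, not the host root.
		ref.Path = strings.TrimPrefix(ref.Path, "/")
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	data, truncated, err := readLimited(strings.NewReader("# myapp\nlong text"), 8)
	fmt.Printf("%q %v %v\n", data, truncated, err)

	abs, err := resolveImageURL("alice", "myapp", "main", "docs/logo.png")
	fmt.Println(abs, err)
}
```

When readLimited reports truncation, the page can append the "View on GitHub" link mentioned above instead of rendering a partial document silently.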
Future Enhancements
1. Support other platforms
   - GitLab: https://gitlab.com/{owner}/{repo}/-/raw/{branch}/README.md
   - Gitea/Forgejo
   - Bitbucket
2. Custom README upload
   - Allow users to upload custom README via UI
   - Store in PDS as io.atcr.readme record
   - Priority: custom > source repo
3. Automatic updates
   - Background job to refresh READMEs periodically
   - Webhook support to update on push to source repo
4. Syntax highlighting
   - Use highlight.js or similar for code blocks
   - Support multiple languages
Example Flow
1. User pushes image with label: org.opencontainers.image.source=https://github.com/alice/myapp
2. Manifest stored with source URL annotation
3. User visits /r/alice/myapp
4. RepositoryPageHandler:
   - Checks cache for README
   - If not cached or expired:
     - Fetches https://raw.githubusercontent.com/alice/myapp/main/README.md
     - Renders markdown to HTML
     - Sanitizes HTML
     - Caches result
   - Passes README HTML to template
5. Template renders Overview section with README content
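The handler's part of this flow can be outlined as a small pipeline. Everything here is illustrative: Pipeline, its function fields, and mapCache are hypothetical stand-ins for the real fetcher, renderer, sanitizer, and cache, and returning an empty string on any failure is an assumption about how the page should degrade.

```go
package main

import "fmt"

// Cache is the minimal interface the flow needs.
type Cache interface {
	Get(key string) (string, bool)
	Set(key, html string)
}

// mapCache is a trivial, non-expiring, non-concurrent stand-in cache.
type mapCache map[string]string

func (m mapCache) Get(k string) (string, bool) { v, ok := m[k]; return v, ok }
func (m mapCache) Set(k, v string)             { m[k] = v }

// Pipeline wires the steps of the flow; each function field is a
// placeholder for the corresponding real component.
type Pipeline struct {
	Cache    Cache
	Fetch    func(sourceURL string) (string, error)
	Render   func(markdown string) (string, error)
	Sanitize func(html string) string
}

// ReadmeHTML returns sanitized HTML for sourceURL, or "" when no README
// is available so the page can render without an Overview section.
func (p *Pipeline) ReadmeHTML(sourceURL string) string {
	if sourceURL == "" {
		return ""
	}
	if html, ok := p.Cache.Get(sourceURL); ok {
		return html // cache hit: skip fetch, render, and sanitize
	}
	markdown, err := p.Fetch(sourceURL)
	if err != nil {
		return ""
	}
	html, err := p.Render(markdown)
	if err != nil {
		return ""
	}
	html = p.Sanitize(html)
	p.Cache.Set(sourceURL, html)
	return html
}

func main() {
	p := &Pipeline{
		Cache:    mapCache{},
		Fetch:    func(string) (string, error) { return "# myapp", nil },
		Render:   func(md string) (string, error) { return "<pre>" + md + "</pre>", nil },
		Sanitize: func(html string) string { return html },
	}
	fmt.Println(p.ReadmeHTML("https://github.com/alice/myapp"))
}
```

Caching the sanitized HTML rather than the raw markdown means the render and sanitize steps run once per refresh instead of once per page view.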
Dependencies
// Markdown rendering
github.com/yuin/goldmark v1.6.0
github.com/yuin/goldmark-emoji v1.0.2 // GitHub emoji support
// HTML sanitization
github.com/microcosm-cc/bluemonday v1.0.26