Stagehand - Telegram Image Queue Bot#

A Telegram bot that takes links from supported websites, extracts images and videos, and queues them for posting to a Telegram channel at scheduled intervals.

Features#

  • Extract images and videos from various supported websites
  • Process forwarded messages containing links (including button links)
  • Queue media for scheduled posting
  • Customizable posting schedule using cron syntax
  • Access control to limit who can use the bot
  • Post media with source attribution and link back to original
  • Modular design for easy addition of new website scrapers
  • Interactive visual queue management with inline buttons
  • Intelligent queue monitoring with automatic alerts
  • Perceptual image hashing for duplicate detection
  • Automatic media cache management with recaching support
  • Scheduled announcements with individual cron schedules
  • Auto-updater for seamless updates from Git repository
  • Discord webhook integration (optional)

Supported Websites#

  • e621 - Uses OpenGraph scraping and Cheerio DOM parsing to extract images and videos
  • FurAffinity - Leverages the FA Export API to fetch submission data and direct download links
  • Bluesky - Uses the official @atproto/api library with BskyAgent to support posts, images, and videos
  • SoFurry - Uses SoFurry's own API to fetch and download submissions
  • Weasyl - Uses Weasyl's API, which requires your own API key from Weasyl (currently broken)

Methodology for Supported Sites#

Bluesky#

  • Uses the official @atproto/api library with BskyAgent
  • Parses URLs in the format bsky.app/profile/{handle}/post/{id}
  • Extracts user DIDs and content identifiers
  • Supports both image and video content extraction
  • Handles thumbnails for video posts
  • Works with public posts without requiring authentication
  • Processes quoted content and multiple images in a single post
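
As a rough illustration of the URL parsing step above, the sketch below splits a bsky.app post URL into its handle and post ID and builds the AT URI for the post. The function names `parseBlueskyPostUrl` and `toAtUri` are hypothetical and not taken from the Stagehand source; resolving the handle to a DID is left to the API client.

```javascript
// Hypothetical helper: parse a bsky.app post URL into handle and post ID.
function parseBlueskyPostUrl(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null; // not a valid absolute URL
  }
  if (url.hostname !== 'bsky.app') return null;

  // Expected path: /profile/{handle}/post/{id}
  const match = url.pathname.match(/^\/profile\/([^/]+)\/post\/([^/]+)\/?$/);
  if (!match) return null;
  return { handle: match[1], postId: match[2] };
}

// Build the AT URI of a post once the handle has been resolved to a DID.
function toAtUri(did, postId) {
  return `at://${did}/app.bsky.feed.post/${postId}`;
}
```

Returning `null` for anything that does not match lets a caller try the next scraper module instead of failing.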

e621#

  • Uses Cheerio to parse the HTML DOM of e621 pages
  • Extracts media URLs from OpenGraph tags and direct DOM elements
  • Handles both image and video content
  • Processes and caches media files locally
  • Supports fallback methods if primary extraction fails
  • Ensures proper URL resolution for relative paths

FurAffinity#

  • Extracts submission IDs from URLs in the format furaffinity.net/view/{id}
  • Uses the FAExport API to fetch submission data
  • Extracts direct download URLs, titles, and artist information
  • Handles both image and video content
  • Preserves proper attribution and metadata
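
A minimal sketch of the ID extraction and API lookup described above. Both function names are hypothetical, and the `/submission/{id}.json` route is an assumption about the FAExport instance you point it at; check it against that instance's documentation before relying on it.

```javascript
// Hypothetical helper: pull the numeric submission ID out of a FurAffinity URL.
function extractSubmissionId(input) {
  let url;
  try {
    url = new URL(input);
  } catch {
    return null;
  }
  if (url.hostname.replace(/^www\./, '') !== 'furaffinity.net') return null;

  // Expected path: /view/{id}
  const match = url.pathname.match(/^\/view\/(\d+)\/?$/);
  return match ? match[1] : null;
}

// Assumption: the FAExport instance serves submissions at /submission/{id}.json.
function faExportUrl(baseUrl, submissionId) {
  return new URL(`/submission/${submissionId}.json`, baseUrl).toString();
}
```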

SoFurry#

  • Uses direct access to the SoFurry API with OAuth authentication
  • Extracts submission IDs from URLs
  • Requires manual setup:
    1. Create an application on the SoFurry Developer Portal
    2. Generate an OAuth access token manually
    3. Add the access token to your .env file as SOFURRY_ACCESS_TOKEN
  • Fetches submission data, including display URLs and metadata
  • Supports both image and video content
  • Handles proper attribution with author and title information

Weasyl (Currently Broken)#

  • Implements Weasyl's API for submission data
  • Requires an API key from Weasyl
  • Add your API key to .env as WEASYL_API_KEY
  • Extracts submission information and media URLs

Media Caching and Transcoding#

Stagehand caches downloaded media locally and transcodes videos so that files from each source are fetched once and play reliably in Telegram:

Media Caching#

  • All downloaded media is cached locally to reduce bandwidth usage and improve performance
  • Files are stored in organized directories:
    • cache/images/ - For static images
    • cache/videos/ - For original video files
    • cache/transcoded/ - For processed/transcoded videos
  • Filenames are generated using MD5 hashes of source URLs to ensure uniqueness
  • Cache is automatically cleaned up, with files older than 15 days (configurable) being removed
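
The naming and expiry rules above could look roughly like this. `cachePathFor` and `isExpired` are hypothetical names, and the exact directory layout inside Stagehand may differ from this sketch.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');

// Derive a cache filename from the MD5 hash of the source URL.
// `kind` is one of the cache subdirectories: 'images', 'videos', 'transcoded'.
function cachePathFor(sourceUrl, kind, extension) {
  const hash = crypto.createHash('md5').update(sourceUrl).digest('hex');
  return path.join('cache', kind, `${hash}${extension}`);
}

// A file is expired once it is older than the configured maximum age.
// `mtimeMs` is the modification time, e.g. from fs.stat().
function isExpired(mtimeMs, maxAgeDays = 15, now = Date.now()) {
  return now - mtimeMs > maxAgeDays * 24 * 60 * 60 * 1000;
}
```

Because the filename depends only on the URL, queuing the same link twice maps to the same cache file.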

Media Processing#

  • Content type detection based on HTTP headers and URL patterns
  • Intelligent fallback mechanisms if metadata is unavailable
  • Special handling for different media sources (e.g., Bluesky API)
  • File extension determination from both URL and content type
  • Maximum download size limit of 50MB to prevent abuse
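
One way to combine header-based detection, the URL fallback, and the size limit is sketched below. The type table, the function names, and the reading of 50MB as 50 × 1024 × 1024 bytes are all assumptions made for illustration.

```javascript
const path = require('node:path');

// Assumption: 50MB is interpreted as 50 * 1024 * 1024 bytes.
const MAX_DOWNLOAD_BYTES = 50 * 1024 * 1024;

const EXTENSION_BY_TYPE = {
  'image/jpeg': '.jpg',
  'image/png': '.png',
  'image/gif': '.gif',
  'image/webp': '.webp',
  'video/mp4': '.mp4',
  'video/webm': '.webm',
};
const KNOWN_EXTENSIONS = new Set(['.jpg', '.jpeg', '.png', '.gif', '.webp', '.mp4', '.webm']);

// Prefer the Content-Type header; fall back to the extension in the URL path.
function resolveExtension(sourceUrl, contentType) {
  const type = (contentType || '').split(';')[0].trim().toLowerCase();
  if (EXTENSION_BY_TYPE[type]) return EXTENSION_BY_TYPE[type];

  const ext = path.extname(new URL(sourceUrl).pathname).toLowerCase();
  return KNOWN_EXTENSIONS.has(ext) ? ext : null;
}

// Reject downloads whose declared Content-Length exceeds the limit.
function isTooLarge(contentLengthHeader) {
  const bytes = Number(contentLengthHeader);
  return Number.isFinite(bytes) && bytes > MAX_DOWNLOAD_BYTES;
}
```

A missing Content-Length header passes this check, so a real downloader would also need to count bytes while streaming.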

Video Transcoding#

  • Videos are automatically transcoded to H.264 MP4 format for maximum compatibility with Telegram
  • Uses FFmpeg (via fluent-ffmpeg) with optimized settings:
    • H.264 video codec for wide compatibility
    • AAC audio codec at 128kbps
    • Medium preset balancing quality and processing speed
    • CRF 23 for good quality-to-size ratio
    • MP4 container with faststart flag for immediate playback
    • YUV420p pixel format for maximum device compatibility
  • Animated GIFs are handled appropriately based on content type
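
For reference, the settings listed above correspond to the following FFmpeg command-line flags. Stagehand drives FFmpeg through fluent-ffmpeg; this sketch instead builds the raw argument list, and `buildTranscodeArgs` is a hypothetical name.

```javascript
// Build the ffmpeg CLI arguments matching the settings listed above.
function buildTranscodeArgs(inputPath, outputPath) {
  return [
    '-y',                       // overwrite the output file if it exists
    '-i', inputPath,
    '-c:v', 'libx264',          // H.264 video
    '-preset', 'medium',        // balance of speed and compression
    '-crf', '23',               // constant rate factor
    '-pix_fmt', 'yuv420p',      // widely supported pixel format
    '-c:a', 'aac',              // AAC audio
    '-b:a', '128k',             // 128 kbps audio bitrate
    '-movflags', '+faststart',  // move the index to the front for streaming
    outputPath,
  ];
}

// Usage (requires ffmpeg on the PATH):
//   const { spawn } = require('node:child_process');
//   spawn('ffmpeg', buildTranscodeArgs('in.webm', 'out.mp4'), { stdio: 'inherit' });
```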

Together, caching and transcoding mean each file is downloaded once, converted to a format Telegram plays reliably, and removed from disk once it expires.

Bot Architecture#

Stagehand uses a modular bot architecture for better maintainability and extensibility:

Modular Design#

  • Commands: Each bot command (/start, /help, /queue, etc.) is implemented as a separate module
  • Helpers: Shared functionality (auth, media posting, queue management) is organized into helper classes
  • Registry System: A central command registry coordinates all modules and their dependencies
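
To make the registry idea concrete, here is a minimal sketch of a command registry that injects shared helpers into each command module. The class and method names are hypothetical; the actual design is described in docs/bot-architecture.md.

```javascript
// Hypothetical registry: maps command names to modules and injects helpers.
class CommandRegistry {
  constructor(helpers = {}) {
    this.helpers = helpers;
    this.commands = new Map();
  }

  // A command module is assumed to look like { name, handler }.
  register(command) {
    if (this.commands.has(command.name)) {
      throw new Error(`Command already registered: ${command.name}`);
    }
    this.commands.set(command.name, command);
    return this;
  }

  // Parse "/name args" (optionally "/name@BotName args") into command and args.
  resolve(text) {
    const match = /^\/(\w+)(?:@\w+)?(?:\s+([\s\S]*))?$/.exec(text.trim());
    if (!match) return null;
    const command = this.commands.get(match[1]);
    return command ? { command, args: match[2] || '' } : null;
  }

  // Run the matching handler; returns false when no command matched.
  async dispatch(text, ctx = {}) {
    const resolved = this.resolve(text);
    if (!resolved) return false;
    await resolved.command.handler({ ...ctx, args: resolved.args, helpers: this.helpers });
    return true;
  }
}
```

Each command stays in its own module and only sees the helpers it is handed, which is what allows it to be tested in isolation.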

Migration Status#

  • Current Version: Modular architecture (active)
  • 📁 Legacy Backup: Original monolithic version saved as telegramBot.js.backup
  • 🔄 Migration Script: Use ./migrate-bot.sh to switch between versions if needed

Benefits#

  • Easier Maintenance: Changes to one command don't affect others
  • Better Testing: Individual modules can be tested in isolation
  • Extensibility: New commands and features can be added easily
  • Code Organization: Clear separation of concerns

For detailed architecture documentation, see docs/bot-architecture.md.

Installation#

Linux (Ubuntu, Fedora, Arch)#

Run the automated installation script:

./install.sh

This script will:

  • Detect your Linux distribution and install required dependencies
  • Set up your .env configuration interactively
  • Install Node.js dependencies
  • Create necessary cache directories
  • Optionally set up a systemd service for automatic startup

Windows#

Run the automated installation batch file:

install.bat

This script will:

  • Check for required dependencies (Node.js, Git, FFmpeg)
  • Install missing dependencies using Chocolatey or winget if available
  • Guide you through an interactive configuration process
  • Install Node.js dependencies
  • Create necessary cache directories
  • Optionally set up a Windows service using PM2

Manual Installation#

If you prefer to install manually:

  1. Clone this repository
  2. Install dependencies:
    npm install
    
  3. Copy .env.example to .env and fill in your credentials:
    cp .env.example .env
    
  4. Edit the .env file with your Telegram bot token and channel ID
  5. Ensure PM2 is installed globally:
    npm install -g pm2
    

Installation Script Details#

The automated installation scripts provide several advantages:

  1. Dependency Management

    • Automatically installs Node.js, Git, FFmpeg, and PM2 if missing
    • Uses native package managers (apt, dnf, pacman) on Linux
    • Utilizes Chocolatey or winget on Windows if available
  2. Interactive Configuration

    • Guides you through setting up all required environment variables
    • Provides sensible defaults for optional settings
    • Validates that required fields like bot token are entered
  3. System Integration

    • Configures systemd service on Linux
    • Sets up Windows service with PM2
    • Creates proper directory structure for media caching

Environment Variables#

The bot is configured via environment variables in a .env file. The installation scripts will help you create this file interactively, but here's a reference of available settings:

Required Settings#

  • BOT_TOKEN: Your Telegram bot token from BotFather
  • CHANNEL_ID: Your Telegram channel ID or username (e.g., @mychannel)

User Access Control#

  • AUTHORIZED_USERS: Comma-separated list of Telegram user IDs that are allowed to use the bot
  • OWNER_ID: (Optional) User ID of the bot owner for admin-level commands
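
A sketch of how these two variables might be interpreted. The helper names are hypothetical, and treating the owner as implicitly authorized is an assumption, not confirmed behavior.

```javascript
// Parse "123, 456" into a Set of numeric Telegram user IDs.
function parseAuthorizedUsers(raw) {
  return new Set(
    (raw || '')
      .split(',')
      .map((part) => part.trim())
      .filter(Boolean)
      .map(Number)
      .filter(Number.isInteger) // drop anything that is not a whole number
  );
}

// Assumption: the owner may use the bot even when not listed explicitly.
function isAuthorized(userId, env = process.env) {
  if (env.OWNER_ID && Number(env.OWNER_ID) === userId) return true;
  return parseAuthorizedUsers(env.AUTHORIZED_USERS).has(userId);
}
```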

Integration Options#

  • WEASYL_API_KEY: (Optional) API key for Weasyl integration - Get yours from Weasyl
  • SOFURRY_ACCESS_TOKEN: (Optional) OAuth access token for SoFurry API - Create an app and generate a token at the SoFurry Developer Portal
  • DISCORD_WEBHOOK_URL: (Optional) For Discord integration
  • DISCORD_ENABLED: Set to 'true' to enable Discord posting

Queue Configuration#

  • DEFAULT_CRON_SCHEDULE: When to post images (cron format, default: '0 */1 * * *' - every hour)
  • IMAGES_PER_INTERVAL: Number of images to post each time (default: 1)

Queue Monitoring#

  • QUEUE_LOW_THRESHOLD: Alert when queue has this many items or fewer (default: 10)
  • QUEUE_EMPTY_THRESHOLD: Alert when queue hits this level (default: 0)
  • QUEUE_ALERTS_ENABLED: Enable/disable queue monitoring (default: true)
  • QUEUE_ALERT_COOLDOWN_HOURS: Hours between repeated alerts (default: 24)

Media Cache#

  • MAX_CACHE_AGE_DAYS: Days to keep cached media files (default: 15)
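
The defaults listed in this section can be summarized in a small loader like the one below. This is a sketch, not the bot's actual configuration code; `loadConfig` and the property names on the returned object are hypothetical.

```javascript
// Parse an integer setting, falling back to the default when unset or invalid.
function intFromEnv(value, fallback) {
  const parsed = Number.parseInt(value, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

function loadConfig(env = process.env) {
  for (const key of ['BOT_TOKEN', 'CHANNEL_ID']) {
    if (!env[key]) throw new Error(`Missing required environment variable: ${key}`);
  }
  return {
    botToken: env.BOT_TOKEN,
    channelId: env.CHANNEL_ID,
    cronSchedule: env.DEFAULT_CRON_SCHEDULE || '0 */1 * * *',
    imagesPerInterval: intFromEnv(env.IMAGES_PER_INTERVAL, 1),
    queueLowThreshold: intFromEnv(env.QUEUE_LOW_THRESHOLD, 10),
    queueEmptyThreshold: intFromEnv(env.QUEUE_EMPTY_THRESHOLD, 0),
    queueAlertsEnabled: env.QUEUE_ALERTS_ENABLED !== 'false',
    queueAlertCooldownHours: intFromEnv(env.QUEUE_ALERT_COOLDOWN_HOURS, 24),
    maxCacheAgeDays: intFromEnv(env.MAX_CACHE_AGE_DAYS, 15),
    discordEnabled: env.DISCORD_ENABLED === 'true',
    discordWebhookUrl: env.DISCORD_WEBHOOK_URL || null,
  };
}
```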

Running the Bot#

This bot uses PM2 by default to ensure it runs persistently and automatically restarts after crashes or system reboots.

Starting the bot#

npm start

Stopping the bot#

npm run stop

Restarting the bot#

npm run restart

Viewing logs#

npm run logs

Checking status#

npm run status

Setting up automatic startup on system boot#

pm2 startup

Then follow the instructions provided by the command.

Saving the current PM2 process list#

After starting your bot, run:

pm2 save

This ensures your bot restarts automatically if the system reboots.

Development mode#

For development with auto-reload on file changes:

npm run dev

Commands#

  • /start - Start the bot
  • /help - Show help information
  • /queue - Show current queue status with interactive management
  • /status - Show detailed queue status and alert information
  • /send - Post the next image in the queue immediately
  • /schedule [cron] - Set posting schedule using cron syntax
  • /setcount [number] - Set number of images per post interval
  • /shuffle - Toggle shuffle mode (randomizes queue after each post, persists between restarts)
  • /clear - Clear the entire queue
  • /cleancache - Clean expired items from media cache
  • /recache - Recache missing files and remove items that fail after 3 attempts
  • /announce - Create a new announcement (with custom text and schedule)
  • /announcements - Manage existing announcements (view, edit, delete)
  • /update - Check for updates, stash changes, pull latest code, and restart bot (owner only)

Adding Images to Queue#

Send a link from any supported website to the bot in a direct message. The bot will extract the image and add it to the queue.

A single message can contain several links to supported websites, and the bot processes each of them. This covers:

  • Multiple URLs in Text: Messages containing several URLs will have all valid links processed
  • Mixed Content: Messages with both text and inline keyboard buttons containing URLs
  • Embedded URLs: Links that appear anywhere within message text (not just at the beginning)
  • Button Links: URLs embedded in inline keyboard buttons are automatically detected and processed
  • Error Handling: Invalid links are silently discarded - the bot will only show an error if zero valid links are found
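
The link collection described above might be sketched as follows, using the `entities` and `reply_markup.inline_keyboard` fields of a Telegram Bot API message. The function names and the host list are assumptions made for illustration; the bot's real matching logic lives in its scraper modules.

```javascript
// Assumption: hostnames of the supported sites, without a leading "www.".
const SUPPORTED_HOSTS = new Set(['e621.net', 'furaffinity.net', 'bsky.app', 'sofurry.com', 'weasyl.com']);

// Collect every URL from message text, text links, and inline keyboard buttons.
function collectUrls(message) {
  const urls = new Set();
  const text = message.text || message.caption || '';
  const entities = message.entities || message.caption_entities || [];

  for (const entity of entities) {
    if (entity.type === 'url') {
      urls.add(text.slice(entity.offset, entity.offset + entity.length));
    } else if (entity.type === 'text_link' && entity.url) {
      urls.add(entity.url);
    }
  }

  const rows = (message.reply_markup && message.reply_markup.inline_keyboard) || [];
  for (const row of rows) {
    for (const button of row) {
      if (button.url) urls.add(button.url);
    }
  }
  return [...urls];
}

// Keep only links to supported sites; invalid links are dropped silently.
function supportedUrls(message) {
  return collectUrls(message).filter((raw) => {
    try {
      const withScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(raw) ? raw : `https://${raw}`;
      return SUPPORTED_HOSTS.has(new URL(withScheme).hostname.replace(/^www\./, ''));
    } catch {
      return false;
    }
  });
}
```

A caller would report an error only when `supportedUrls` returns an empty array, matching the behavior described in the list.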

Forwarded Messages#

The bot can also process forwarded messages that contain links to supported websites with the same capabilities as regular messages.

Usage: Send any message containing supported website links to the bot. The bot will automatically detect and process all valid URLs while silently discarding invalid ones.

Interactive Queue Management#

The /queue command displays an interactive visual interface for managing queued items:

  • Page Navigation: Browse the queue page by page using the Previous/Next buttons
  • Item Preview: View a preview of any queued item before it's posted
  • Item Removal: Remove specific items from the queue with a single click
  • Reordering: Move any item to the top of the queue to be posted next
  • Shuffle Mode: Randomize the queue after each post with the /shuffle command (the setting persists between restarts)

The interface shows important information about each queued item including:

  • Item position in queue
  • Content type (image or video)
  • Title and source website
  • Controls for managing each item
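
The queue operations behind this interface are simple array manipulations. The sketch below shows one possible shape for paging, move-to-top, and shuffling; the function names and the page size of 5 are assumptions.

```javascript
// Return one page of the queue, clamping the page number to the valid range.
function getPage(queue, page, pageSize = 5) {
  const totalPages = Math.max(1, Math.ceil(queue.length / pageSize));
  const current = Math.min(Math.max(page, 1), totalPages);
  const start = (current - 1) * pageSize;
  return { items: queue.slice(start, start + pageSize), page: current, totalPages };
}

// Move the item at `index` to the front so it is posted next.
function moveToTop(queue, index) {
  const next = queue.slice();
  if (index < 0 || index >= next.length) return next;
  const [item] = next.splice(index, 1);
  next.unshift(item);
  return next;
}

// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
function shuffle(queue, random = Math.random) {
  const next = queue.slice();
  for (let i = next.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [next[i], next[j]] = [next[j], next[i]];
  }
  return next;
}
```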

Queue Monitoring and Alerts#

Stagehand monitors the queue and automatically alerts authorized users when it is running low or empty:

Features#

  • Periodic monitoring: Checks queue levels every 30 seconds
  • Configurable thresholds: Separate thresholds for low-queue and empty-queue alerts
  • Alert cooldown: Repeated alerts are suppressed for 24 hours by default (QUEUE_ALERT_COOLDOWN_HOURS)
  • Multi-user support: All authorized users receive alerts simultaneously
  • Admin controls: Test and manage alerts via the /status command

Configuration#

Add these variables to your .env file to customize alert behavior:

# Queue Alert Configuration
QUEUE_LOW_THRESHOLD=10          # Alert when queue ≤ this number
QUEUE_EMPTY_THRESHOLD=0         # Alert when queue is critically low/empty
QUEUE_ALERTS_ENABLED=true       # Enable/disable monitoring
QUEUE_ALERT_COOLDOWN_HOURS=24   # Hours between repeated alerts
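
To show how the thresholds and the cooldown interact, here is a minimal sketch of the alert decision. The function name, the shape of `state`, and the choice to track the cooldown separately for low and empty alerts are assumptions.

```javascript
// Decide whether an alert should fire for the current queue length.
// Returns 'empty', 'low', or null. `state.lastAlertAt` remembers when each
// alert level last fired so repeats within the cooldown are suppressed.
function checkQueueAlert(queueLength, state, config, now = Date.now()) {
  if (!config.alertsEnabled) return null;

  let level = null;
  if (queueLength <= config.emptyThreshold) level = 'empty';
  else if (queueLength <= config.lowThreshold) level = 'low';
  if (!level) return null;

  const cooldownMs = config.cooldownHours * 60 * 60 * 1000;
  const last = state.lastAlertAt[level];
  if (last !== undefined && now - last < cooldownMs) return null;

  state.lastAlertAt[level] = now;
  return level;
}
```

Calling this from a 30-second timer would reproduce the monitoring interval described above.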

Commands#

  • /status - View detailed queue status and alert configuration
  • Use admin controls in /status to test alerts and reset cooldowns

For detailed configuration options, see Queue Monitoring Configuration.

Documentation#

This project includes comprehensive documentation in the docs/ directory.

Advanced Features#

Perceptual Image Hashing#

Stagehand computes a perceptual hash for each cached image so that duplicate and visually similar images can be detected:

  • Automatic Processing: Images are automatically hashed when cached
  • SQLite Database: Stores perceptual hashes with URLs and metadata
  • Similarity Detection: Find visually similar images using Hamming distance
  • Duplicate Prevention: Helps identify duplicate content from different sources
  • Cleanup Management: Automatically removes database entries for deleted files

The image hashing system uses the imghash library to generate perceptual hashes (pHash) that can detect similar images even after resizing, compression, or minor modifications.
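
Comparing two hashes by Hamming distance means counting the bits in which they differ. The sketch below assumes hashes are stored as equal-length hexadecimal strings; the function names and the similarity threshold of 5 bits are illustrative choices, not values taken from Stagehand.

```javascript
// Count differing bits between two equal-length hexadecimal hash strings.
function hammingDistance(hexA, hexB) {
  if (hexA.length !== hexB.length) {
    throw new Error('Hashes must have the same length');
  }
  let distance = 0;
  for (let i = 0; i < hexA.length; i++) {
    // XOR one hex digit (4 bits) at a time and count the set bits.
    let diff = parseInt(hexA[i], 16) ^ parseInt(hexB[i], 16);
    while (diff) {
      distance += diff & 1;
      diff >>= 1;
    }
  }
  return distance;
}

// Assumption: images within `maxDistance` bits are treated as similar.
function isSimilar(hexA, hexB, maxDistance = 5) {
  return hammingDistance(hexA, hexB) <= maxDistance;
}
```

A distance of 0 means the hashes are identical; small distances typically indicate the same image after resizing or recompression.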

For detailed information, see Image Hashing Documentation.

Announcement System#

Create and manage scheduled text announcements that are posted to your channel:

  • Multiple Announcements: Support for multiple independent announcements
  • Individual Schedules: Each announcement can have its own cron schedule
  • Interactive Management: Create, edit, and delete announcements through the bot
  • Persistent Storage: Announcements are saved and survive bot restarts
  • Test Functionality: Test announcements before scheduling them

Use /announce to create new announcements and /announcements to manage existing ones.

Auto-Updater#

Stagehand includes an automatic update system that keeps your bot up to date:

  • Periodic Checks: Automatically checks for updates every 12 hours
  • Git Integration: Pulls updates from your Git repository
  • PM2 Restart: Automatically restarts the bot using PM2 after updating
  • Manual Updates: Use /update command to check and apply updates immediately
  • Owner-Only: Update command is restricted to the bot owner
  • Dev Mode Exclusion: Auto-updater is disabled in development mode

Update Flow#

When an update is triggered (either automatically or via /update command), the bot follows this process:

  1. Stash Local Changes: Any uncommitted local modifications are automatically stashed to prevent conflicts
  2. Fetch Remote Changes: Retrieves the latest commits from the remote repository
  3. Check for Updates: Verifies if there are new commits to pull
  4. Pull Changes: Applies the updates using git pull
  5. Save Queue: Forces a save of the current queue to disk to prevent data loss
  6. Display Commit Info: Shows the latest commit message with author and timestamp
  7. Restart Bot: Gracefully restarts the bot via PM2 with updated environment variables
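
The seven steps above can be sketched as a sequence of git and PM2 commands. This is an illustration under stated assumptions, not the bot's actual updater: `runUpdate`, the injectable `run` and `saveQueue` callbacks, and the PM2 process name `stagehand` are hypothetical, and the update check assumes the current branch has an upstream configured.

```javascript
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');
const execFileAsync = promisify(execFile);

// Default command runner: resolves with trimmed stdout, rejects on failure.
async function defaultRun(cmd, args) {
  const { stdout } = await execFileAsync(cmd, args);
  return stdout.trim();
}

// Wrap a step so that a failure names the step that went wrong.
async function step(name, fn) {
  try {
    return await fn();
  } catch (err) {
    throw new Error(`${name} failed: ${err.message}`);
  }
}

async function runUpdate({ run = defaultRun, saveQueue = async () => {}, processName = 'stagehand' } = {}) {
  await step('Stash', () => run('git', ['stash']));
  await step('Fetch', () => run('git', ['fetch']));

  // Number of commits on the upstream branch that are not in HEAD.
  const behind = Number(await step('Check', () => run('git', ['rev-list', '--count', 'HEAD..@{u}'])));
  if (behind === 0) return { updated: false };

  await step('Pull', () => run('git', ['pull']));
  await step('Save queue', saveQueue);
  const commit = await step('Commit info', () => run('git', ['log', '-1', '--pretty=format:%s (%an, %ad)']));
  await step('Restart', () => run('pm2', ['restart', processName, '--update-env']));
  return { updated: true, commit };
}
```

Passing `run` in as a parameter keeps the flow testable without touching a real repository or PM2 process.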

Error Handling#

Each step in the update process includes comprehensive error handling:

  • Stash Failures: Reports if local changes cannot be stashed
  • Fetch Failures: Reports connection or repository access issues
  • Pull Failures: Reports merge conflicts or pull errors
  • Restart Failures: Provides instructions to manually restart if PM2 fails
  • Descriptive Messages: Each error includes specific information about what went wrong

The updater handles the normal PM2 SIGINT signal correctly and won't report it as an error during successful restarts.

Media Recaching#

Automatically handle missing or corrupted cache files:

  • Automatic Detection: Identifies missing cache files in the queue
  • Redownload Support: Automatically redownloads missing media files
  • Failure Tracking: Tracks failed attempts and removes items after 3 failures
  • Manual Trigger: Use /recache command to manually trigger recaching
  • Scheduled Execution: Can be scheduled to run automatically via cron
  • Progress Reporting: Reports how many items were processed and removed

This ensures your queue remains healthy and all media files are available when needed.
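
A recaching pass with failure tracking could be structured like this. The function name, the `cachePath` and `failedAttempts` fields, and the injected `exists` and `download` callbacks are hypothetical.

```javascript
// Re-download missing cache files; drop items that keep failing.
// `exists(path)` and `download(item)` are supplied by the caller.
async function recacheQueue(queue, { exists, download, maxAttempts = 3 }) {
  const kept = [];
  let recached = 0;
  let removed = 0;

  for (const item of queue) {
    if (await exists(item.cachePath)) {
      kept.push(item); // file is present, nothing to do
      continue;
    }
    try {
      await download(item);
      item.failedAttempts = 0;
      recached++;
      kept.push(item);
    } catch {
      item.failedAttempts = (item.failedAttempts || 0) + 1;
      if (item.failedAttempts >= maxAttempts) {
        removed++; // give up on this item
      } else {
        kept.push(item); // retry on the next pass
      }
    }
  }
  return { queue: kept, recached, removed };
}
```

The returned counts are what a /recache reply would report back to the user.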

Discord Integration#

Optional Discord webhook support for cross-platform posting:

  • Webhook Support: Post media to Discord channels via webhooks
  • Parallel Posting: Post to both Telegram and Discord simultaneously
  • Independent Configuration: Enable/disable Discord without affecting Telegram
  • Media Compatibility: Handles both images and videos

Configure using the DISCORD_WEBHOOK_URL and DISCORD_ENABLED environment variables.
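
As an illustration of parallel, independent posting, the sketch below sends an item to every enabled destination and reports success per destination. All names are hypothetical, the item fields `title` and `mediaUrl` are assumed, and the webhook call uses a plain JSON message with a `content` field rather than whatever upload mechanism the bot actually uses. The global `fetch` requires Node.js 18 or later.

```javascript
// Post to every enabled destination in parallel; one failure does not block the other.
async function postEverywhere(item, { postToTelegram, postToDiscord, discordEnabled }) {
  const targets = [['telegram', postToTelegram]];
  if (discordEnabled) targets.push(['discord', postToDiscord]);

  const results = await Promise.allSettled(
    targets.map(([, post]) => Promise.resolve().then(() => post(item)))
  );
  return Object.fromEntries(
    results.map((result, i) => [targets[i][0], result.status === 'fulfilled'])
  );
}

// Assumption: a simple webhook message whose text carries the title and media URL.
async function postToDiscordWebhook(webhookUrl, item) {
  const response = await fetch(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ content: `${item.title}\n${item.mediaUrl}` }),
  });
  if (!response.ok) throw new Error(`Discord webhook returned ${response.status}`);
}
```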

License#

GPL V3

ToDo List#

  • ATProto Implementation
  • Basic e621 Scraper
  • FurAffinity Scraper
  • SoFurry Scraper
  • Weasyl Scraper
  • Interactive Graphical Queue Manager
  • Add shuffle mode for queue
  • Add perceptual hashing
  • Redo Bluesky Module
  • Redo Telegram Module
  • Queue monitoring and alerts
  • Announcement system
  • Auto-updater
  • Media recaching system
  • Redo Queue Manager
  • Redo Discord Module