A Rust implementation of skywatch-phash

Types Module

Purpose

This module defines the core data structures shared across the application. They ensure that data passed between modules (from the firehose to the queue, to the workers, and on to the moderation module) is structured and strongly typed. Using jacquard-specific types (Did, Cid, AtUri) and CowStr instead of plain Strings adds type safety, since identifiers are validated types rather than arbitrary text, and can avoid unnecessary allocations, since a CowStr can hold either borrowed or owned string data.

Key Data Structures

BlobCheck

Represents a single rule loaded from rules/blobs.json. It defines a set of perceptual hashes (phashes) to check for and the moderation actions to take when an image matches one of them.

  • phashes: Vec<CowStr<'static>>: A list of known bad phashes.
  • label: CowStr<'static>: The moderation label to apply (e.g., "spam").
  • comment: CowStr<'static>: A human-readable description of the rule for moderation logs.
  • Action flags (report_acct, label_post, etc.): Booleans that determine which actions to trigger.
  • hamming_threshold: Option<u32>: An optional override for the similarity threshold for this specific rule.
  • ignore_did: Option<Vec<Did<'static>>>: A list of DIDs to exempt from this check.
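As a rough illustration of how the optional fields might be consumed, the sketch below models BlobCheck with std types only: `Cow<'static, str>` stands in for jacquard's CowStr, DIDs are plain strings rather than `Did<'static>`, and only two of the action flags are shown. The helper methods `effective_threshold` and `ignores`, and the idea of a caller-supplied global default threshold, are hypothetical and not taken from the project.

```rust
use std::borrow::Cow;

// Stand-in for jacquard's CowStr<'static> (assumption for this sketch).
type CowStr = Cow<'static, str>;

/// Simplified BlobCheck: DIDs are plain strings and only two action flags
/// are modeled; the real struct uses jacquard types.
#[derive(Debug, Clone)]
pub struct BlobCheck {
    pub phashes: Vec<CowStr>,
    pub label: CowStr,
    pub comment: CowStr,
    pub report_acct: bool,
    pub label_post: bool,
    pub hamming_threshold: Option<u32>,
    pub ignore_did: Option<Vec<CowStr>>,
}

impl BlobCheck {
    /// Hypothetical helper: the per-rule override if present, otherwise the
    /// global default passed in by the caller.
    pub fn effective_threshold(&self, default: u32) -> u32 {
        self.hamming_threshold.unwrap_or(default)
    }

    /// Hypothetical helper: true if `did` appears in this rule's exemption list.
    pub fn ignores(&self, did: &str) -> bool {
        self.ignore_did
            .as_ref()
            .map_or(false, |dids| dids.iter().any(|d| d == did))
    }
}
```

Keeping `hamming_threshold` and `ignore_did` as `Option`s lets a rule omit them entirely in the JSON, with `None` meaning "use the default" and "exempt nobody" respectively.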

BlobReference

A lightweight reference to an image blob found within a post record.

  • cid: Cid<'static>: The Content Identifier of the blob.
  • mime_type: Option<CowStr<'static>>: The image's MIME type (e.g., "image/jpeg").
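To make the CowStr<'static> benefit concrete, here is a std-only sketch in which `Cow<'static, str>` stands in for jacquard's CowStr and the CID is a plain string rather than `Cid<'static>`. The `new` constructor and its handling of common MIME types are hypothetical: it stores well-known MIME types as borrowed `'static` literals (no allocation) and falls back to an owned copy for anything else, while both cases share one field type.

```rust
use std::borrow::Cow;

/// Simplified BlobReference; the CID is a string here, not jacquard's Cid.
#[derive(Debug, Clone, PartialEq)]
pub struct BlobReference {
    pub cid: Cow<'static, str>,
    pub mime_type: Option<Cow<'static, str>>,
}

impl BlobReference {
    /// Hypothetical constructor: the runtime CID is stored as owned data, while
    /// common MIME types reuse a `'static` literal instead of allocating.
    pub fn new(cid: String, mime_type: Option<&str>) -> Self {
        let mime_type = mime_type.map(|m| match m {
            "image/jpeg" => Cow::Borrowed("image/jpeg"),
            "image/png" => Cow::Borrowed("image/png"),
            // Any other value is copied into an owned String.
            other => Cow::Owned(other.to_owned()),
        });
        BlobReference { cid: Cow::Owned(cid), mime_type }
    }
}
```

Callers read either variant the same way (for example via `mime_type.as_deref()`), which is what lets one field type cover both borrowed and owned data.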

ImageJob

The primary unit of work that is serialized and passed through the Redis queue.

  • post_uri: AtUri<'static>: The unique AT-URI of the post containing the image(s).
  • post_cid: Cid<'static>: The CID of the post record, identifying the specific version of the post that was created.
  • post_did: Did<'static>: The DID of the user who authored the post.
  • blobs: Vec<BlobReference>: The list of images to be processed for this post.
  • attempts: u32: A counter for tracking retry attempts.
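One plausible way to use the `attempts` counter is sketched below. It is std-only: plain `Cow<'static, str>` strings stand in for AtUri, Cid and Did, and the Redis serialization step is omitted. The `into_retry` method and the `MAX_ATTEMPTS` value of 3 are hypothetical; the actual retry policy is not described here.

```rust
use std::borrow::Cow;

/// Simplified BlobReference (CID held as a string, not jacquard's Cid).
#[derive(Debug, Clone)]
pub struct BlobReference {
    pub cid: Cow<'static, str>,
    pub mime_type: Option<Cow<'static, str>>,
}

/// Simplified ImageJob: plain strings stand in for AtUri, Cid and Did.
#[derive(Debug, Clone)]
pub struct ImageJob {
    pub post_uri: Cow<'static, str>,
    pub post_cid: Cow<'static, str>,
    pub post_did: Cow<'static, str>,
    pub blobs: Vec<BlobReference>,
    pub attempts: u32,
}

/// Hypothetical cap on total processing attempts per job.
pub const MAX_ATTEMPTS: u32 = 3;

impl ImageJob {
    /// Records a failed attempt. Returns the job for re-queueing while the
    /// count stays below MAX_ATTEMPTS, or None once the budget is used up.
    pub fn into_retry(mut self) -> Option<Self> {
        self.attempts += 1;
        if self.attempts < MAX_ATTEMPTS {
            Some(self)
        } else {
            None
        }
    }
}
```

Because the counter travels inside the job itself, any worker that pulls the job from the queue can decide whether to retry without shared state.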

MatchResult

A struct created by the processor when a computed phash matches a BlobCheck rule. It bundles all necessary information for the moderation module to act.

  • phash: CowStr<'static>: The phash computed from the downloaded image.
  • matched_check: BlobCheck: A clone of the rule that was matched.
  • matched_phash: CowStr<'static>: The specific known bad phash from the rule that triggered the match.
  • hamming_distance: u32: The Hamming distance between the two phashes, i.e. the number of differing bits; lower values mean more similar images.
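The sketch below shows how a MatchResult could be produced. It assumes each phash is a 64-bit hash encoded as a hex string, and that a match means a distance less than or equal to the threshold; both are assumptions, not documented behavior. The functions `hamming_distance` and `find_match`, and the trimmed-down BlobCheck (only the fields needed for matching), are hypothetical, and `Cow<'static, str>` again stands in for CowStr.

```rust
use std::borrow::Cow;

type CowStr = Cow<'static, str>;

/// Trimmed-down rule: only the fields needed for matching.
#[derive(Debug, Clone)]
pub struct BlobCheck {
    pub phashes: Vec<CowStr>,
    pub label: CowStr,
    pub hamming_threshold: Option<u32>,
}

#[derive(Debug, Clone)]
pub struct MatchResult {
    pub phash: CowStr,
    pub matched_check: BlobCheck,
    pub matched_phash: CowStr,
    pub hamming_distance: u32,
}

/// Number of differing bits, assuming hex-encoded 64-bit phashes.
/// Returns None if either string does not parse as a u64.
pub fn hamming_distance(a: &str, b: &str) -> Option<u32> {
    let a = u64::from_str_radix(a, 16).ok()?;
    let b = u64::from_str_radix(b, 16).ok()?;
    Some((a ^ b).count_ones())
}

/// Returns the first known phash within its rule's threshold (assumed inclusive).
pub fn find_match(phash: &str, checks: &[BlobCheck], default_threshold: u32) -> Option<MatchResult> {
    for check in checks {
        let threshold = check.hamming_threshold.unwrap_or(default_threshold);
        for known in &check.phashes {
            if let Some(d) = hamming_distance(phash, known) {
                if d <= threshold {
                    return Some(MatchResult {
                        phash: Cow::Owned(phash.to_owned()),
                        matched_check: check.clone(),
                        matched_phash: known.clone(),
                        hamming_distance: d,
                    });
                }
            }
        }
    }
    None
}
```

XOR leaves a 1 bit wherever the two hashes differ, so `count_ones` on the result is exactly the Hamming distance.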

Deserialization

This module includes custom serde deserialization functions (e.g., deserialize_did, deserialize_cowstr_vec). Where possible, they borrow from the input instead of copying it: each function first deserializes to a &str slice and then parses that slice directly into the final jacquard type. This avoids allocating an intermediate String for every field, reducing memory allocations compared with deserializing to String first.
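The borrow-then-parse pattern can be illustrated without serde. In the std-only sketch below, `Did<'a>` is a hypothetical stand-in for jacquard's Did, its validation is a deliberately minimal "did:<method>:<id>" shape check rather than the full DID syntax, and `did_from_input` mimics what a deserialize_did helper would do with the `&str` a deserializer hands back (in serde that slice would come from deserializing `&str`). Validation runs on the borrowed slice; the only allocation happens when converting to an owned `'static` value.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for jacquard's Did<'a>: a validated DID that either
/// borrows from the input or owns its data.
#[derive(Debug, Clone, PartialEq)]
pub struct Did<'a>(Cow<'a, str>);

impl<'a> Did<'a> {
    /// Validates a borrowed slice without allocating. Minimal shape check only.
    pub fn parse(s: &'a str) -> Result<Self, String> {
        let mut parts = s.splitn(3, ':');
        match (parts.next(), parts.next(), parts.next()) {
            (Some("did"), Some(method), Some(id)) if !method.is_empty() && !id.is_empty() => {
                Ok(Did(Cow::Borrowed(s)))
            }
            _ => Err(format!("invalid DID: {s}")),
        }
    }

    /// Converts to an owned 'static value; this is the single allocation.
    pub fn into_static(self) -> Did<'static> {
        Did(Cow::Owned(self.0.into_owned()))
    }

    pub fn as_str(&self) -> &str {
        &self.0
    }
}

/// Mirrors the shape of a deserialize_did helper: borrow, validate, then own.
pub fn did_from_input(raw: &str) -> Result<Did<'static>, String> {
    Did::parse(raw).map(Did::into_static)
}
```

Invalid input is rejected before any allocation takes place, and valid input is copied exactly once.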