Monorepo for Tangled tangled.org

proposal/discussion: extensible markup lexicon #113

Status: open. Opened by boltless.me.

Several issues with the current markup situation:

Related Issues:

I recommend reading all related issues before discussing.

Markup format is not extensible from the lexicon

The current lexicon definition doesn't specify the markup format. Right now, we only support a single blessed, Tangled-specific Markdown variant. But in the future, we want to support other syntaxes, such as Org mode, as requested in #197.

Markdown facets

It is pretty common to reference objects such as users, issues, pulls, repositories, blobs, or even git commits from Markdown. When someone references something, we want that reference to be permanent.

For example, if alice references bob as @bob.tngl.sh and bob later changes their handle, @bob.tngl.sh should still point to the same user (bob). We currently include mentioned/referenced identities in the record to invalidate the legacy link, but this isn't enough. Bluesky uses app.bsky.richtext.facet to embed resolved metadata into rich text, but it's hard to adopt the same solution because we would need to apply byte-wise facets to a markup language. Byte-wise facets are quite doable for Markdown variants or Djot, but I assume not every markup language or parser will allow this.
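
For reference, an app.bsky.richtext.facet attaches features (such as a #mention carrying a DID) to a span of the text given as UTF-8 byte offsets, with byteStart inclusive and byteEnd exclusive. The Python sketch below computes such a facet for a mention; the helper name mention_facet and the DID value are made up for illustration.

```python
def mention_facet(text: str, handle: str, did: str) -> dict:
    """Build an app.bsky.richtext.facet-style mention for the first "@handle" in text."""
    needle = "@" + handle
    char_start = text.find(needle)
    if char_start < 0:
        raise ValueError(f"{needle!r} not found in text")
    # Facet indices are UTF-8 byte offsets, not Python str (code point) indices.
    byte_start = len(text[:char_start].encode("utf-8"))
    byte_end = byte_start + len(needle.encode("utf-8"))  # exclusive end
    return {
        "index": {"byteStart": byte_start, "byteEnd": byte_end},
        "features": [{"$type": "app.bsky.richtext.facet#mention", "did": did}],
    }


text = "héllo @bob.tngl.sh!"
facet = mention_facet(text, "bob.tngl.sh", "did:plc:bob")
start, end = facet["index"]["byteStart"], facet["index"]["byteEnd"]
print(text.find("@bob.tngl.sh"), start, end)  # character 6, bytes 7..19
print(text.encode("utf-8")[start:end])         # b'@bob.tngl.sh'
```

The mention starts at character 6 but byte 7, because "é" takes two bytes in UTF-8. In Markdown the @handle sits verbatim in the source, so the offsets are easy to compute; a markup format whose parser doesn't expose source positions would make this harder.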

Proposal

Introduce a sh.tangled.markup lexicon.

sh.tangled.markup#markdown

Represents the title/body text of an issue, pull, or comment.[1]

Both lexicons have two fields:

  • text (raw text)
  • refMap (uri -> item map)

refMap maps any URI used in text to a resolved identifier such as a DID, AT-URI, or blob. For maximum extensibility, the key (URI) itself should be extensible too.
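
A minimal sketch of how a consumer might use refMap, assuming keys are link targets exactly as they appear in text and values are plain identifier strings. Both are assumptions: the proposal doesn't fix the key or value format, and the tangled.org URL and the class and helper names below are made up.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class MarkupMarkdown:
    """Hypothetical in-memory shape of a sh.tangled.markup#markdown value."""
    text: str                                              # raw markup text
    ref_map: dict[str, str] = field(default_factory=dict)  # mirrors refMap: URI in text -> DID / AT-URI / blob


# Target of a Markdown inline link: [label](target). Ignores titles and nesting.
LINK_TARGET = re.compile(r"\]\(([^)\s]+)\)")


def resolve_links(body: MarkupMarkdown) -> list[tuple[str, str | None]]:
    """Pair each inline link target with its pinned identifier, or None if unpinned."""
    return [(uri, body.ref_map.get(uri)) for uri in LINK_TARGET.findall(body.text)]


body = MarkupMarkdown(
    text="thanks [@bob.tngl.sh](https://tangled.org/bob.tngl.sh), see [docs](https://example.com)",
    ref_map={"https://tangled.org/bob.tngl.sh": "did:plc:bob"},
)
print(resolve_links(body))
# [('https://tangled.org/bob.tngl.sh', 'did:plc:bob'), ('https://example.com', None)]
```

Because the DID is pinned in refMap, the link keeps pointing at bob even if the handle bob.tngl.sh later changes hands; links without an entry stay ordinary, unpinned URLs.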


Honestly, I'm not satisfied with my own solution, but I think we do need some kind of dedicated lexicon to represent markup content instead of a raw string type.

I'm open to more thoughts.

[1]: The title might use sh.tangled.markup#markdown_inline instead, to be more specific.

i am open to the idea of defining a rich, facet-style markdown lexicon for our use case. representing a markdown AST as a lexicon is quite an undertaking, and its usefulness is questionable, given that other implementors would need to lower their markdown AST into the lexicon AST. on the plus side, we could be sure that issues/comments render identically on all tangled appviews.
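
To illustrate what "lowering" means here, a toy Python sketch: a parser-side AST is converted node by node into $type-tagged JSON objects. The Node class and the type names under sh.tangled.markup.ast are hypothetical; nothing in this thread defines them.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy parser-side Markdown AST node (real parsers expose far richer trees)."""
    kind: str                                           # "paragraph", "text", "emphasis", ...
    value: str = ""                                     # literal text, used by "text" leaves
    children: list[Node] = field(default_factory=list)


# Hypothetical lexicon type names; nothing in this thread defines them.
LEXICON_TYPES = {
    "paragraph": "sh.tangled.markup.ast#paragraph",
    "text": "sh.tangled.markup.ast#text",
    "emphasis": "sh.tangled.markup.ast#emphasis",
}


def lower(node: Node) -> dict:
    """Recursively lower a parser AST node into a $type-tagged JSON object."""
    if node.kind not in LEXICON_TYPES:
        # Every node kind a parser can emit needs an agreed-upon lexicon type.
        raise ValueError(f"no lexicon type for node kind {node.kind!r}")
    out: dict = {"$type": LEXICON_TYPES[node.kind]}
    if node.kind == "text":
        out["value"] = node.value
    else:
        out["children"] = [lower(child) for child in node.children]
    return out


tree = Node("paragraph", children=[
    Node("text", "hello "),
    Node("emphasis", children=[Node("text", "world")]),
])
print(lower(tree))
```

Every node kind a parser can emit needs a mapping, and every implementor has to produce the same lexicon tree from the same source, which is where most of the effort would go.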

one reason to prefer raw-string markdown: other rendered content, such as README files, is plain text, so any alternate appview would need to know how to render plaintext markdown anyway.

To share the plan publicly, here is my final design:

```json
{
  "$type": "sh.tangled.feed.comment",
  "body": {
    "$type": "sh.tangled.markup.markdown",
    "original": "hello @alice.com, see issue#123 and ![image](blob://cid)",
    "text": "hello [did:plc:alice], see [at://did:plc:blob/sh.tangled.repo.issue/rkey] and ![image](blob://cid)",
    "blobs": [
      {
        "$type": "blob",
        "ref": { "$link": "cid" },
        "mimeType": "image/jpeg",
        "size": 1234
      }
    ]
  }
}
```

  • original: the raw text the user entered. It's optional and is restored when editing.
  • text: the actual Markdown content in Tangled-flavored syntax. Users and records are referenced via DIDs/AT-URIs.
  • blobs: the list of referenced blobs, so the PDS knows about the blob references.
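
Since blobs exists so the PDS knows which blobs the record references, an appview or client might check that every blob:// link in text has a matching entry before writing the record. A small Python sketch of that check, assuming the blob://<cid> link form from the example record and a dict-shaped body; the helper name is hypothetical.

```python
from __future__ import annotations

import re

# Assumes blob links use the blob://<cid> form from the example record.
BLOB_LINK = re.compile(r"blob://([A-Za-z0-9]+)")


def missing_blob_refs(body: dict) -> set[str]:
    """CIDs linked as blob://<cid> in body["text"] but absent from body["blobs"]."""
    linked = set(BLOB_LINK.findall(body["text"]))
    declared = {blob["ref"]["$link"] for blob in body.get("blobs", [])}
    return linked - declared


body = {
    "$type": "sh.tangled.markup.markdown",
    "text": "hello [did:plc:alice] and ![image](blob://cid)",
    "blobs": [
        {"$type": "blob", "ref": {"$link": "cid"}, "mimeType": "image/jpeg", "size": 1234},
    ],
}
print(missing_blob_refs(body))                   # set(): every linked blob is declared
print(missing_blob_refs({**body, "blobs": []}))  # {'cid'}: this reference would be undeclared
```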

When a user submits original text, the appview normalizes it into text. Mentions/references are expected to be indexed from this normalized text value. This allows backlink support across multiple appviews (currently, backlinks are just tangled.org links).
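
A sketch of indexing mentions and references from the normalized text, assuming the bracketed [did:...] / [at://...] form used in the example record. The regular expression and helper name are illustrative, not part of the design.

```python
from __future__ import annotations

import re

# Assumed reference form from the example: a bare DID or AT-URI in square brackets.
REF = re.compile(r"\[(did:[a-z0-9]+:[^\]\s]+|at://[^\]\s]+)\]")


def index_refs(text: str) -> tuple[list[str], list[str]]:
    """Split bracketed references in normalized text into (mentions, records)."""
    mentions: list[str] = []
    records: list[str] = []
    for ref in REF.findall(text):
        (mentions if ref.startswith("did:") else records).append(ref)
    return mentions, records


text = ("hello [did:plc:alice], see [at://did:plc:blob/sh.tangled.repo.issue/rkey] "
        "and ![image](blob://cid)")
mentions, records = index_refs(text)
print(mentions)  # ['did:plc:alice']
print(records)   # ['at://did:plc:blob/sh.tangled.repo.issue/rkey']
```

An indexer could then record each target as a backlink of the comment's AT-URI, regardless of which appview wrote the record.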

Normalization:

```
@alice.com -> [did:plc:alice]
issue#123  -> [at://did:plc:alice/sh.tangled.repo.issue/rkey]
```
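
A minimal normalization sketch, assuming in-memory tables in place of real handle resolution and issue lookup. HANDLES, ISSUES and normalize are hypothetical, and issue#N is only the placeholder syntax from the mapping above.

```python
import re

# Hypothetical stand-ins for real handle resolution and issue lookup.
HANDLES = {"alice.com": "did:plc:alice"}
ISSUES = {123: "at://did:plc:alice/sh.tangled.repo.issue/rkey"}

# "@handle" not preceded by a word character (so e-mail addresses are skipped);
# the handle must end in a letter or digit, so trailing punctuation is excluded.
MENTION = re.compile(r"(?<!\w)@([A-Za-z0-9.-]*[A-Za-z0-9])")
# Placeholder issue syntax from the thread; the real syntax is not settled.
ISSUE = re.compile(r"\bissue#(\d+)\b")


def normalize(original: str) -> str:
    """Rewrite @handle and issue#N into bracketed DIDs / AT-URIs; leave unknowns as-is."""
    def sub_mention(m: re.Match) -> str:
        did = HANDLES.get(m.group(1))
        return f"[{did}]" if did else m.group(0)

    def sub_issue(m: re.Match) -> str:
        uri = ISSUES.get(int(m.group(1)))
        return f"[{uri}]" if uri else m.group(0)

    return ISSUE.sub(sub_issue, MENTION.sub(sub_mention, original))


print(normalize("hello @alice.com, see issue#123 and ![image](blob://cid)"))
# hello [did:plc:alice], see [at://did:plc:alice/sh.tangled.repo.issue/rkey] and ![image](blob://cid)
```

Unknown handles and issue numbers are left untouched. Because only the normalizer knows the input syntax, it could be swapped for a different one without changing how text is stored.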

Note that issue#123 is just an example. We haven't settled on the syntax yet, and thanks to this design, we can easily change it later.

Labels: none yet.
Assignee: none yet.
Participants: 2
AT URI: at://did:plc:xasnlahkri4ewmbuzly2rlc5/sh.tangled.repo.issue/3mctoic4vhe22