🚧 unfinished 🚧

we are using esbuild (https://esbuild.github.io) for bundling
start ./host.sh, then open http://[::1]:8067/test.html in a browser
OR
run ./build.sh, then open file:///.../test.html in a browser
+4
editor1/bundle.sh
./node_modules/.bin/esbuild site/root.js --bundle --outdir=site/bundle --sourcemap --banner:js="'use strict'" --format=esm --target=firefox91,safari15,chrome999 "$@"
# note, we set --format=esm even though we load as a normal script
# this is to trick esbuild into defining all module-global variables as window globals
# which is useful for testing (also note that esbuild avoids name collisions here)
ok so with the parser we gotta output 2 things:
1: faceted text (i.e. text with a list of non-overlapping ranges with features)
2: highlighter spans (i.e. the exact text of the input, split into non-overlapping ranges with styling on them)

for 2:

there's basically 2 kinds of text we push.
- syntax spans, which correspond to text which doesn't appear in the output (though may map to text in the output)
- visible spans, which do appear in the output directly

when the parser hits something that gets eaten and doesn't go to output (e.g. `**`)
we need to:
- flush the current regular text before it
- create a new node in the tree, set that as the current node

then, later on we have 2 options:
- 'cancel' this node (i.e. move all its contents to the parent, set current node to parent)
- 'close' this node (set current node to parent)

ok so how about each highlight span has like, a copy of the array of currently open nodes, then at the end we filter out the ones that have like .cancelled=true, and convert it to a string?

anyway so open():
what this actually does is say,
- end current segment
- add a feature to the list of features
- set temp variable on that feature saying it is supporting that segment (segments have a reference count - well of course, that's the features array length)
- start a new segment


close():
- end current segment
- remove feature from the list of features
- start a new segment

cancel():
- for each segment, if it contains that feature, remove it. we may need to merge the first segment containing this feature with the one before it, if their feature lists are identical now.
- note, we can stay within the current segment
- oh wait shit but we gotta go back and insert the syntax characters uhhh

ok let's say that um
we can only begin a single feature at a time.
i.e. each segment has a feature that caused it to begin, and likewise each feature has a segment that it began.
so let's say we have a situation like "ab /cd"
we have segments:
{text:"ab ",features:[]}
{text:"cd", features:[{<italic>,oncancel:{prefix:"/",segment:thisone}}]}
wait how about this:
{text:"ab ",start:[],end:[]}
{text:"cd",start:[{<italic>,oncancel:"/",startsat:thissegment}],end:[]}
and also an end:[] array.
then, all we have to do to cancel a feature is:
- seg = feature.startsat
- remove feature from seg.start
- seg.text = feature.oncancel + seg.text
- merge back to the previous segment
- and we haven't added it to the end array yet
now consider an example like "ab /**cd**"
we have segments: ah let's make them "events" instead:
- "ab "
- start:{<italic>,oncancel:"/"}
- start:{<bold>,oncancel:"**"}
- "cd"
- end:{<bold>}
then we reach the end of list and we see, ah italic is still open. so we find its start event, replace it with its oncancel string.
then at the end we iterate over this list of events and build the facets!!
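
a rough sketch of that last event-list idea, to check that it holds together. this is not the real parser: the name `eventsToFacets`, the `name` field on features, and the `{start, end, features}` facet shape are all made up for illustration, and it drops the syntax characters of closed features entirely (no strip handling).

```javascript
// Sketch only: event shapes ({text} / {start} / {end}) follow the notes above.
// `eventsToFacets`, `feature.name` and the facet output shape are assumptions.
function eventsToFacets(events) {
	// 1. find every feature that has an end event
	const closed = new Set()
	for (const ev of events)
		if (ev.end) closed.add(ev.end)
	// 2. cancel unclosed features: their start event becomes plain text (the oncancel string)
	const resolved = events.map(ev =>
		ev.start && !closed.has(ev.start) ? {text: ev.start.oncancel} : ev)

	// 3. walk the resolved list, cutting a new range whenever the open set changes
	let text = ""
	let rangeStart = 0
	const open = new Set()
	const facets = []
	const flush = () => {
		if (text.length > rangeStart && open.size)
			facets.push({start: rangeStart, end: text.length, features: [...open].map(f => f.name)})
		rangeStart = text.length
	}
	for (const ev of resolved) {
		if (ev.text != null) text += ev.text
		else if (ev.start) { flush(); open.add(ev.start) }
		else if (ev.end) { flush(); open.delete(ev.end) }
	}
	flush()
	return {text, facets}
}

// the "ab /**cd**" example: italic is never closed, bold is
const italic = {name: "italic", oncancel: "/"}
const bold = {name: "bold", oncancel: "**"}
const result = eventsToFacets([
	{text: "ab "},
	{start: italic},
	{start: bold},
	{text: "cd"},
	{end: bold},
])
console.log(result)
```

the cancelled italic comes back as a literal "/" in the text, and only the bold range survives as a facet.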
+30
editor1/notes2.txt
ok so unparsing... this is something which we don't /really/ need to deal with, but
in clients which allow "editing" or quoting other people's posts, we need a way to losslessly convert back from faceted text to markup
first, the obvious. we have to escape anything which would be parsed that we don't want.
next, facets
so in some cases we can probably "figure out" the original intent.
e.g. **abc /test/ def** is gonna produce a pattern of facets that we can reconstruct the original from
though, there are awkward cases like ok what if someone else's client uses "*" for bold instead of "**". so their post contains *bold* (with strip:[1,1]) but if we convert that back to \b[bold] then oops our post now generates **bold**. so do we consider that equivalent? since after stripping, the output is the same, after all. that's a tough question.

anyway, in cases where we can't figure it out: we need a raw facet syntax.
previously i had something like \facet<features>"""[text] or whatever. so then you parse the json and that's a list of features
that's /ok/ but gets messy. especially with overlapping styles where features are repeated. so here's an idea, what if we store a palette of features?
like we say:
the result of parsing `**abc /test/ def**` would then convert back into:
\feature(a){$type:markup, style:italic}
\feature(b){$type:markup, style:bold}
\use(b)[abc ]\use(a,b)[test]\use(b)[ def]
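
a rough sketch of generating that palette form from faceted text. the \feature / \use syntax is only the idea floated above, not a finished format; `toPaletteMarkup` and the `{start, end, features}` facet shape are made up for illustration. it writes each feature as JSON rather than the loose `{$type:markup, ...}` notation, names palette entries in order of first use (so the letters can differ from the hand-written example), and does not escape the text at all.

```javascript
// Sketch only: `toPaletteMarkup` and the facet shape are assumptions, not existing code.
function toPaletteMarkup(text, facets) {
	const palette = new Map() // serialized feature → palette name
	const nameFor = (feature) => {
		const key = JSON.stringify(feature)
		if (!palette.has(key))
			palette.set(key, String.fromCharCode(97 + palette.size)) // a, b, c, ... (sketch: 26 names max)
		return palette.get(key)
	}
	let body = ""
	let pos = 0
	for (const {start, end, features} of facets) {
		body += text.slice(pos, start) // unstyled text between facets
		const names = features.map(nameFor).sort()
		body += `\\use(${names.join(",")})[${text.slice(start, end)}]`
		pos = end
	}
	body += text.slice(pos) // unstyled text after the last facet
	const header = [...palette].map(([key, name]) => `\\feature(${name})${key}\n`).join("")
	return header + body
}

const boldFeature = {$type: "markup", style: "bold"}
const italicFeature = {$type: "markup", style: "italic"}
const markup = toPaletteMarkup("abc test def", [
	{start: 0, end: 4, features: [boldFeature]},
	{start: 4, end: 8, features: [boldFeature, italicFeature]},
	{start: 8, end: 12, features: [boldFeature]},
])
console.log(markup)
```

features are deduplicated by their serialized form, so a feature repeated across overlapping ranges is only written out once.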




what if we could show the length limit as highlighting?
this is tricky because it's kinda not direct, the way things contribute to length. so like..
e.g. you write `\i[test]` and the length limit is 4. your output text is `/test/` (6 chars) so do you mark like..
well, the offending characters of the output are:
- `/` (contributed by the italic end event)
- `t` (contributed by the text event "test")
therefore, you could argue that `t]` should be highlighted.
however, of course, removing those chars doesn't bring you to a valid length (would produce `\i[tes` or something).
furthermore, removing the entire italic span would bring you under length. so perhaps that should be marked..

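
a rough sketch of the "highlight `t]`" option, just to make the mapping concrete. it assumes each event remembers the source span it came from and the output text it contributes; `overLimitSourceRanges` and the `{src, out}` event shape are made up for illustration. it counts UTF-16 code units rather than graphemes, and it doesn't try to answer whether the whole italic span should be marked instead.

```javascript
// Sketch only: `overLimitSourceRanges` and the event shape {src:[start,end], out} are assumptions.
function overLimitSourceRanges(events, limit) {
	const ranges = []
	let outPos = 0
	for (const {src: [srcStart, srcEnd], out} of events) {
		const outEnd = outPos + out.length
		if (out.length && outEnd > limit) {
			if (out.length === srcEnd - srcStart) {
				// 1:1 text: mark only the characters past the limit
				ranges.push([srcStart + Math.max(0, limit - outPos), srcEnd])
			} else {
				// syntax whose source and output lengths differ: mark the whole source span
				ranges.push([srcStart, srcEnd])
			}
		}
		outPos = outEnd
	}
	return ranges
}

// `\i[test]` with a limit of 4: output is `/test/`
const source = String.raw`\i[test]`
const ranges = overLimitSourceRanges([
	{src: [0, 3], out: "/"},    // `\i[`  contributes the opening `/`
	{src: [3, 7], out: "test"}, // `test` contributes `test`
	{src: [7, 8], out: "/"},    // `]`    contributes the closing `/`
], 4)
console.log(ranges.map(([s, e]) => source.slice(s, e)))
```

for this input the marked source pieces are `t` and `]`, i.e. the `t]` from the notes.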
+285
editor1/old/hl-live.js
···11+"use strict"
22+33+function first_difference(str1, str2, tokens) {
44+ let i
55+ let ti = 0
66+ let ind = 0
77+ let offset = 9 // account for regex lookahead..
88+ if (tokens.length)
99+ for (i=-offset; i<str1.length; i++) {
1010+ if (str1[i+offset] !== str2[i+offset])
1111+ break
1212+ if (i >= ind+tokens[ti].len) {
1313+ ind += tokens[ti].len
1414+ ti++
1515+ }
1616+ }
1717+ return [ti-1, ind]
1818+}
1919+2020+let nw = 0
2121+2222+class Parser {
2323+ constructor(states) {
2424+ this.states = states
2525+ }
2626+ parse(text, oldtext, oldtokens) {
2727+ nw++
2828+ let iloop = 0
2929+ let current, s_name
3030+ let [t1, ind] = first_difference(oldtext, text, oldtokens)
3131+ let shift = text.length - oldtext.length
3232+ let suff_start
3333+ for (suff_start=text.length; suff_start>=ind; suff_start--) {
3434+ if (text[suff_start] !== oldtext[suff_start-shift])
3535+ break
3636+ }
3737+ suff_start++
3838+ let t2 = null
3939+4040+ let token1
4141+ let tokens
4242+ if (t1<0) {
4343+ token1 = {len:0, type:undefined, state:'data'}
4444+ tokens = []
4545+ } else {
4646+ token1 = oldtokens[t1]
4747+ tokens = oldtokens.slice(0, t1+1)
4848+ }
4949+ let lastIndex = ind
5050+5151+ let to_state = (name)=>{
5252+ s_name = name
5353+ current = this.states[name]
5454+ current.regex.lastIndex = lastIndex
5555+ }
5656+ to_state(token1.state)
5757+5858+ function output(start, end, type) {
5959+ if (start==end)
6060+ return
6161+ if (start >= suff_start) {
6262+ let ind2=ind+shift
6363+ for (let i=t1+1; i<oldtokens.length; i++) {
6464+ let x = oldtokens[i]
6565+ if (ind2==start && ind2+x.len==end && x.type==type && x.state==s_name) {
6666+ t2 = i
6767+ return true
6868+ }
6969+ ind2 += x.len
7070+ }
7171+ }
7272+ tokens.push({len:end-start, type, state:s_name, new:nw})
7373+ }
7474+7575+ function merge() {
7676+ if (t2==null)
7777+ return [tokens, t1, t2, tokens.length, ind]
7878+ return [tokens.concat(oldtokens.slice(t2)), t1, t2, tokens.length, ind]
7979+ }
8080+ //console.log("starting on char: "+lastIndex, "suffix: ", suffix)
8181+8282+ let match
8383+ while (match = current.regex.exec(text)) {
8484+ if (output(lastIndex, match.index))
8585+ return merge()
8686+ // infinite loop protection
8787+ if (lastIndex == current.regex.lastIndex) {
8888+ if (iloop++ > 5)
8989+ throw new Error('infinite loop '+lastIndex)
9090+ } else
9191+ iloop=0
9292+ // process match
9393+ lastIndex = current.regex.lastIndex
9494+ let g = current.groups[match.indexOf("", 1)-1]
9595+ if ('function'==typeof g)
9696+ g = g(match[0])
9797+ if (g.state)
9898+ s_name = g.state
9999+ if (output(match.index, lastIndex, g.token))
100100+ return merge()
101101+ if (g.state)
102102+ to_state(g.state)
103103+ }
104104+ output(lastIndex, text.length)
105105+ return merge()
106106+ }
107107+}
108108+109109+function STATE({raw}, ...values) {
110110+ let r = raw.join("()").slice(1, -1)
111111+ .replace(/\n/g, "|").replace(/\\`/g, "`")
112112+ .replace(/[(](?![?)])/g, "(?:")
113113+ let regex = new RegExp(r, 'g')
114114+ return {regex, groups: values}
115115+}
116116+117117+// todo: function to determine new state
118118+// "default" highlight for skipped chars (i.e. within rawtext states)
119119+120120+let parse_html = new Parser({
121121+ data: STATE`
122122+&([a-zA-Z0-9]+|#[xX][0-9a-fA-F]+|#[0-9]+);?${{token:'charref'}}
123123+<(?=/?[a-zA-Z])${{token:'tag', state:'tag'}}
124124+<!---?>${{token:'comment'}}
125125+<!--${{token:'comment', state:'comment'}}
126126+<[!?/][^>]*>?${{token:'comment'}}
127127+\n${{}}
128128+`,
129129+ comment: STATE`
130130+(--!?>|$)${{token:'comment', state:'data'}}
131131+`,
132132+ tag: STATE`
133133+script(?![^\s/>])${{token:'name', state: 'in_script_tag'}}
134134+[a-zA-Z][^\s/>]*${{token:'name', state:'in_tag'}}
135135+/[a-zA-Z][^\s/>]*${{token:'name', state:'in_tag'}}
136136+`,
137137+ in_tag: STATE`
138138+[\s/]*(>${{token:'tag', state:'data'}}
139139+=[^\s/>=]*${{token:'key', state:'after_key'}}
140140+[^\s/>=]+${{token:'key', state:'after_key'}})
141141+`,
142142+ after_key: STATE`
143143+\s*=\s*${{state:'value'}}
144144+(?:)${{state:'in_tag'}}
145145+`,
146146+ value: STATE`
147147+"[^"]*"?${{token:'value', state:'in_tag'}}
148148+'[^']*'?${{token:'value', state:'in_tag'}}
149149+[^\s>]*${{token:'value', state:'in_tag'}}
150150+`,
151151+152152+ in_script_tag: STATE`
153153+[\s/]*(>${{token:'tag', state: 'js'}}
154154+=[^\s/>=]*${{token:'key', state:'after_script_key'}}
155155+[^\s/>=]+${{token:'key', state:'after_script_key'}})
156156+`,
157157+ after_script_key: STATE`
158158+\s*=\s*${{state:'script_value'}}
159159+(?:)${{state:'in_script_tag'}}
160160+`,
161161+ script_value: STATE`
162162+"[^"]*"?${{token:'value', state:'in_script_tag'}}
163163+'[^']*'?${{token:'value', state:'in_script_tag'}}
164164+[^\s>]*${{token:'value', state:'in_script_tag'}}
165165+`,
166166+167167+ js: STATE`
168168+(?=</script)${{state:'data'}}
(break|catch|class|continue|default|do|else|finally|for|function|if|switch|try|while|with|case|return|throw|yield|=>)(?![\w$])${{token:'flow'}}
170170+(typeof|await|delete|void|in|instanceof|new)(?![\w$])${{token:'operator'}}
171171+[?]?[.]\s*${{state:'js_property'}}
172172+([+-]{2}|[!=]==?|[!~])${{token:'operator'}}
173173+=${{token:'assignment'}}
174174+([-*%+&^|]|[*<>&|?]{2}|>>>)(=${{token:'assignment'}})?${{token:'operator'}}
175175+([?]|:|[<>]=?)${{token:'operator'}}
176176+(super|this)(?![\w$])${{token:'keyword', state:'js_after_value'}}
177177+(const|debugger|export|import|var|enum|implements|interface|let|package|private|protected|public|static|extends)(?![\w$])${{token:'keyword'}}
178178+(?!\d)[\w$]+(?=\s*:)${{token:'property', state:'js_after_label'}}
179179+(?!\d)[\w$]+${{token:'word', state:'js_after_value'}}
180180+"${{token:'string', state:'js_string1'}}
181181+'${{token:'string', state:'js_string2'}}
182182+\`${{token:'string', state:'js_string3'}}
183183+//${{token:'comment',state:'js_comment'}}
184184+/[*]${{token:'comment',state:'js_block_comment'}}
185185+/${{token:'string', state:'js_regex'}}
186186+<!--${{token:'comment',state:'js_comment'}}
187187+(0[xXbBoO]|[.])?[\dA-Fa-f]+(_?[\dA-Fa-f]+)*${{token:'constant', state:'js_after_value'}}
188188+\n${{}}
189189+;${{token:'semicolon'}}
190190+`,
191191+ js_regex: STATE`
192192+(?=</script)${{state:'data'}}
193193+\n${{state:'js'}}
194194+\\[/\\]${{}}
195195+/[idgmuy]*${{token:'string',state:'js_after_value'}}
196196+`,
197197+ js_string1: STATE`
198198+(?=</script)${{state:'data'}}
199199+\n${{state:'js'}}
200200+\\["\\\n]${{}}
201201+"${{token:'string',state:'js_after_value'}}
202202+`,
203203+ js_string2: STATE`
204204+(?=</script)${{state:'data'}}
205205+\n${{state:'js'}}
206206+\\['\\\n]${{}}
207207+'${{token:'string',state:'js_after_value'}}
208208+`,
209209+ js_string3: STATE`
210210+(?=</script)${{state:'data'}}
211211+\\[\`\\]${{}}
212212+\`${{token:'string',state:'js_after_value'}}
213213+`,
214214+ js_comment: STATE`
215215+(?=</script)${{state:'data'}}
216216+\n${{token:'comment', state:'js'}}
217217+`,
218218+ js_block_comment: STATE`
219219+(?=</script)${{state:'data'}}
220220+[*]/${{token:'comment', state:'js'}}
221221+`,
222222+ js_after_value: STATE`
223223+(?=</script)${{state:'data'}}
224224+/=${{token:'assignment'}}
225225+/${{token:'operator'}}
226226+[+-]{2}${{token:'operator'}}
227227+${{state:'js'}}
228228+`,
229229+ js_property: STATE`
230230+(?=</script)${{state:'data'}}
231231+(?!\d)[\w$]+${{token:'property', state:'js'}}
232232+${{state:'js'}}
233233+`,
234234+ js_after_label: STATE`
235235+\s*:${{state:'js'}}
236236+`
237237+})
238238+239239+let parser = parse_html
240240+let old_tokens=[], old_text=""
241241+function render(t, out) {
242242+ let [tokens, t1, t2, nlen, ind] = parser.parse(t, old_text, old_tokens)
243243+ let pp = performance.now()
244244+ old_tokens = tokens
245245+ old_text = t
246246+ let elem1 = out.childNodes[t1+1]
247247+ let elem2 = t2==null ? null : out.childNodes[t2]
248248+ let prev
249249+ let nchanged = 0
250250+ // todo: delete nodes with this?
251251+ //let range = document.createRange()
252252+ //range.setStart(out, nlen)
253253+ //range.setEndBefore(out, t2)
254254+ for (let i=t1+1; i<nlen; i++) {
255255+ let changed
256256+ if (elem1==elem2) {
257257+ elem1 = document.createElement('span')
258258+ out.insertBefore(elem1, elem2)
259259+ changed = true
260260+ }
 let text = t.substr(ind, tokens[i].len).replace(/[\0-\10\13\14\16-\37\177]/g, "\uFFFF") // replace control chars with a single placeholder char (alternative: \uEE00)
262262+ if (elem1.textContent != text) {
263263+ elem1.textContent = text
264264+ changed = true
265265+ }
266266+ if (elem1.className != tokens[i].type) {
267267+ elem1.className = tokens[i].type||""
268268+ changed = true
269269+ }
270270+ if (changed)
271271+ nchanged++
272272+// if (changed)
273273+ // elem1.dataset.anim = elem1.dataset.anim=='false'
274274+ elem1 = elem1.nextSibling
275275+ ind += tokens[i].len
276276+ }
277277+ while (elem1!=elem2) {
278278+ let prev = elem1
279279+ elem1 = elem1.nextSibling
280280+ prev.remove()
281281+ nchanged++
282282+ }
283283+ graph.set_status(nchanged)
284284+ return pp
285285+}
···11+let MARKUP_TYPE = "com.example.richtext.facet#markup"
22+let r = String.raw
33+let whitespace = r`\s\u00AD\u2060\u200A\u200B\u200C\u200D\u20e2`
44+let url = r`[-\w/%&=#+~@$*'!?,.;:]*`
55+let url_final = r`[-\w/%&=#+~@$*']`
66+let big_regex = RegExp([
77+ r`\b(?<link>https?://${url}${url_final}([(]${url}[)](${url}${url_final})?)?)(?<link_open>\[)?`,
88+ // alternative less strict link regex:
99+ //r`\b(?<link>(https?://|([a-zA-Z0-9-]+[.])+[a-zA-Z0-9-]{2,}(?=[/]))${url}${url_final}([(]${url}[)](${url}${url_final})?)?)(?<link_open>\[)?`,
1010+ // urls must either have a scheme, or have a slash after the domain, to be considered.
1111+ // ie we will never link common "word.word" mistakes.
 r`(?<=^|\s)[#\uFF03](?<hashtag>(?!\uFE0F)[^${whitespace}]*[^\p{P}${whitespace}])`, // note: must filter out #123
1313+ r`(?<=^|\s|[(])@(?<mention>[-a-zA-Z0-9]+([.][-a-zA-Z0-9]+)+)\b`,
1414+ r`(?<style>[*][*]|[_][_]|[~][~]|[/])`,
1515+ r`(?<close>\])`,
1616+ r`(?<open>\[)`,
1717+ r`\x60(?<code>.*?)\x60`,
1818+ r`[\\]facet(?<facet>\[.*?\])\n\[`,
1919+ r`[\\](?<tag_open>[a-z]+)\[`,
2020+ r`[\\](?<escaped>.)`,
2121+].join("|"), 'gu')
const INITIALS = /(https?:[/][/])|#|\uFF03|@|[*][*]|[_][_]|[~][~]|[/]|\]|\[|`|[\\]/g
2323+2424+const STYLE_SURROUND = {
2525+ __proto__: null,
2626+ italic: Object.freeze(["/","/"]),
2727+ bold: Object.freeze(["**","**"]),
2828+ underline: Object.freeze(["__","__"]),
2929+ strikethrough: Object.freeze(["~~","~~"]),
3030+ code: Object.freeze(["`","`"]),
3131+}
3232+const TAG_FEATURES = {
3333+ __proto__: null,
3434+ i: Object.freeze({$type:MARKUP_TYPE, style:'italic', strip:STYLE_SURROUND['italic']}),
3535+ b: Object.freeze({$type:MARKUP_TYPE, style:'bold', strip:STYLE_SURROUND['bold']}),
3636+ u: Object.freeze({$type:MARKUP_TYPE, style:'underline', strip:STYLE_SURROUND['underline']}),
3737+ s: Object.freeze({$type:MARKUP_TYPE, style:'strikethrough', strip:STYLE_SURROUND['strikethrough']}),
3838+ code: Object.freeze({$type:MARKUP_TYPE, style:'code', strip:STYLE_SURROUND['code']}),
3939+}
4040+4141+const STYLE_START
4242+ = /^[\s,][^\s,]|^['"}{(>|\[][^\s,'"]/
4343+const STYLE_END
4444+ = /^[^\s,][-\s.,:;!?'"}{)<\\|\]]/
4545+const ITALIC_START
4646+ = /^[\s,][^\s,/]|^['"}{(|\[][^\s,'"/<]/
4747+const ITALIC_END
4848+ = /^[^\s,/>][-\s.,:;!?'"}{)\\|\]]/
4949+const STYLE_SYNTAX = {
5050+ __proto__:null,
5151+ '**': 'bold',
5252+ '__': 'underline',
5353+ '~~': 'strikethrough',
5454+ '/': 'italic',
5555+}
5656+function check_style(match, open_styles) {
5757+ let type = STYLE_SYNTAX[match.groups.style]
5858+ let before = match.input.charAt(match.index-1)||"\n"
5959+ let after = match.input.charAt(match.index+match[0].length)||"\n"
6060+6161+ let feature = open_styles.get(type)
6262+ let side = 'none'
6363+ if (feature) {
6464+ if (('italic'==type ? ITALIC_END : STYLE_END).test(before+after))
6565+ side = 'close'
6666+ } else {
6767+ if (('italic'==type ? ITALIC_START : STYLE_START).test(before+after))
6868+ side = 'open'
6969+ }
7070+ return {side, type, feature}
7171+}
7272+7373+function finalize_overlay_span({text, features, owner}) {
7474+ let cn = ""
7575+ for (let f of features)
7676+ if (!f._cancelled)
7777+ cn += " "+f.$type.split("#")[1]
7878+ if (owner) {
7979+ if (owner._cancelled)
8080+ // if a syntax span has its owner feature cancelled, we still want to highlight that
8181+ // so the user knows there /would/ be something here if they had closed the tag etc.
8282+ cn += " cancelled"
8383+ cn += " "+owner.$type.split("#")[1]+"-syntax"
8484+ }
8585+ return [text, cn]
8686+}
8787+8888+function parse_facet_json(text) {
8989+ try {
9090+ let json = JSON.parse(text)
9191+ if (!(json instanceof Array))
9292+ return null
9393+ for (let feature of json) {
9494+ if ('string'!=typeof feature.$type)
9595+ return null
9696+ }
9797+ return json
9898+ } catch(e) {
9999+ }
100100+ return null
101101+}
102102+103103+function richtext_to_markup({text, facets}) {
104104+ text = text.replace(INITIALS, "\\$&") // hm can't we just pass big_regex to this?
105105+ // TODO: try to convert faceted text back into a markup string, for post editing
106106+ // this is a difficult task, and will often be impossible. in that case, we have a raw facet tag
107107+ // but ideally, we should try to use regular syntax as much as possible.
108108+ return text
109109+}
110110+111111+function highlight(text) {
112112+ // facet system
113113+ let events = [] // [event]
114114+ let open_features = new Map() // feature → event
115115+ let open_styles = new Map() // type → feature
116116+ let open_brackets = [] // [feature]
117117+118118+ // overlay system
119119+ let overlay_last = 0
120120+ let overlay_spans = []
121121+ function overlay(end, feature=null) {
122122+ if (end > overlay_last) {
123123+ let text2 = text.slice(overlay_last, end)
124124+ overlay_spans.push({text:text2, features:new Set(open_features.keys()), owner:feature})
125125+ overlay_last = end
126126+ }
127127+ }
128128+129129+ // events
130130+ function event_text(text) {
131131+ events.push({type:'text', feature:null, text})
132132+ }
133133+ function event_open(feature, oncancel="") {
134134+ let event = {type:'open', feature, text:oncancel}
135135+ events.push(event)
136136+ open_features.set(feature, event)
137137+ }
138138+ function event_close(feature, oncancel="") {
139139+ let event = {type:'close', feature, text:oncancel}
140140+ events.push(event)
141141+ open_features.delete(feature)
142142+ }
143143+ function event_cancel(feature) {
144144+ let event = open_features.get(feature)
145145+ open_features.delete(feature)
146146+ event.type = 'text'
147147+ event.feature = null
148148+ feature._cancelled = true // eghh.. (this is ok because these are only used for overlay spans now)
149149+ }
150150+151151+ let last = 0
152152+ for (let match of text.matchAll(big_regex)) {
153153+ let start = match.index
154154+ let end = start + match[0].length
155155+ let g = match.groups
156156+157157+ if (start > last)
158158+ event_text(text.slice(last, start))
159159+ last = end
160160+161161+ if (g.mention!=null) {
162162+ let feature = {$type:"app.bsky.richtext.facet#mention", did:g.mention}
163163+ overlay(start)
164164+ event_open(feature)
165165+ overlay(end, feature)
166166+ event_text(match[0])
167167+ event_close(feature)
168168+ } else if (g.link!=null) {
169169+ let feature = {$type:"app.bsky.richtext.facet#link", uri:g.link}
170170+ overlay(start)
171171+ if (g.link_open!=null) {
172172+ overlay(end, feature)
173173+ event_open(feature, match[0])
174174+ open_brackets.push(feature)
175175+ } else {
176176+ event_open(feature)
177177+ overlay(end, feature)
178178+ event_text(g.link)
179179+ event_close(feature)
180180+ }
181181+ } else if (g.hashtag!=null && !/^\d+$/.test(g.hashtag)) {
182182+ let feature = {$type:"app.bsky.richtext.facet#tag", tag:g.hashtag}
183183+ overlay(start)
184184+ event_open(feature)
185185+ overlay(end, feature)
186186+ event_text(match[0])
187187+ event_close(feature)
188188+ } else if (g.style!=null) {
189189+ let {side, type, feature} = check_style(match, open_styles)
190190+ if ('open'==side) {
191191+ feature = {$type:MARKUP_TYPE, style:type, strip:STYLE_SURROUND[type]}
192192+ overlay(start)
193193+ overlay(end, feature)
194194+ open_styles.set(type, feature)
195195+ event_open(feature, match[0])
196196+ } else if ('close'==side) {
197197+ overlay(start)
198198+ event_close(feature, match[0])
199199+ overlay(end, feature)
200200+ open_styles.delete(type)
201201+ } else {
202202+ event_text(match[0])
203203+ }
204204+ } else if (g.open!=null) {
205205+ event_text(match[0])
206206+ open_brackets.push(null)
207207+ } else if (g.close!=null && open_brackets.length) {
208208+ let feature = open_brackets.pop()
209209+ if (feature) {
210210+ overlay(start)
211211+ if (feature instanceof Array) {
212212+ for (let f of feature)
213213+ event_close(f) // nn
214214+ overlay(end, {$type:"#multi"})
215215+ } else {
216216+ event_close(feature, match[0])
217217+ overlay(end, feature)
218218+ }
219219+ } else {
220220+ event_text(match[0])
221221+ }
222222+ } else if (g.code!=null) {
223223+ let feature = {$type:MARKUP_TYPE, style:'code', strip:STYLE_SURROUND['code']}
224224+ overlay(start)
225225+ overlay(end, feature)
226226+ event_open(feature, "`")
227227+ event_text(g.code)
228228+ event_close(feature, "`")
229229+ } else if (g.tag_open!=null) {
230230+ let feature = TAG_FEATURES[g.tag_open]
231231+ if (feature) {
232232+ feature = {...feature}
233233+ overlay(start)
234234+ overlay(end, feature)
235235+ event_open(feature, match[0])
236236+ open_brackets.push(feature)
237237+ } else {
238238+ event_text(match[0])
239239+ }
240240+ } else if (g.escaped!=null) {
241241+ let feature = {$type:"#escape"} // only used for overlay
242242+ overlay(start)
243243+ event_text(g.escaped)
244244+ overlay(end, feature)
245245+ } else if (g.facet!=null) {
246246+ let features = parse_facet_json(g.facet)
247247+ if (features) {
248248+ overlay(start)
249249+ overlay(end, {$type: "#facet"})
250250+ for (let feature of features)
251251+ event_open(feature)
252252+ open_brackets.push(features)
253253+ } else {
254254+ overlay(start)
255255+ overlay(end, {$type: "#facet", _cancelled:true})
256256+ event_text("<invalid>[")
257257+ open_brackets.push(null)
258258+ }
259259+ } else {
260260+ overlay(end)
261261+ event_text(match[0])
262262+ }
263263+ }
264264+ if (text.length > last)
265265+ event_text(text.slice(last, text.length))
266266+267267+ overlay(text.length)
268268+269269+ for (let feature of open_features.keys())
270270+ event_cancel(feature)
271271+272272+ //console.log(events, overlay_spans)
273273+274274+ return {
275275+ overlay_spans: overlay_spans.map((span)=>finalize_overlay_span(span)),
276276+ data: events_to_richtext(events),
277277+ }
278278+}
279279+280280+function has_strip(feature) {
281281+ return MARKUP_TYPE==feature.$type
282282+}
283283+284284+// note: do not reuse feature objects
285285+function events_to_richtext(events) {
286286+ let text = ""
287287+ let facets = []
288288+ let open_features = new Set()
289289+ let start = 0
290290+ function flush_facet() {
291291+ if (text.length > start) {
292292+ facets.push({
293293+ index: {start, end:text.length},
294294+ features: [...open_features].map(feature=>{
295295+ if (has_strip(feature)) {
296296+ // convert _surround into strip
297297+ let feat = {...feature, strip:[0,0]} // copy
298298+ if (start == feature.strip.start) // at start
299299+ feat.strip[0] = feature.strip.surround[0].length
300300+ if (text.length == feature.strip.end) {
301301+ // at end
302302+ feat.strip[1] = feature.strip.surround[1].length
303303+ } else {
304304+ //feat.cont = true // todo: for the renderer: maybe continuation should be the default if the strips are both 0?
305305+ // anyway, which side of the gap should this flag be on? should it be a continued forward flag, or a merge backward flag?
306306+ // merge is probably easier for rendering (or actually do we want it to be a "don't merge, start a new block" flag?)
307307+ // ugh this one is annoying because like we dont have merge flags for builtin facets.. so like,
308308+ // maybe we can try to store that information elsewhere? or idk.
 // is there a defined meaning for 2 facets with the same exact data and no gap between them? maybe those should always be considered equivalent, and so to disambiguate, if we DO want to not merge, we add some dummy property. ah, but strip is already gonna break things hm..
310310+ }
311311+ feature = feat
312312+ }
313313+ return feature
314314+ }),
315315+ })
316316+ start = text.length
317317+ }
318318+ }
319319+ for (let event of events) {
320320+ if ('text'==event.type) {
321321+ text += event.text
322322+ }
323323+ if ('open'==event.type) {
324324+ flush_facet()
325325+ open_features.add(event.feature)
326326+ if (has_strip(event.feature)) {
327327+ event.feature.strip = {surround:event.feature.strip, start: text.length, end: null}
328328+ text += event.feature.strip.surround[0]
329329+ }
330330+ }
331331+ if ('close'==event.type) {
332332+ if (has_strip(event.feature)) {
333333+ text += event.feature.strip.surround[1]
334334+ event.feature.strip.end = text.length
335335+ }
336336+ flush_facet()
337337+ open_features.delete(event.feature)
338338+ }
339339+ }
340340+ return {text, facets}
341341+}
+36
editor1/old/speed.html
···11+<!doctype html><meta charset=utf-8>
22+33+<script src=./template.js></script>
44+<script src=./html.js></script>
55+<script src=./length-meter.js></script>
66+<script src=./length-meter2.js></script>
77+<body>
88+<div id=$h1 hidden></div>
99+<div id=$h2 hidden></div>
1010+<script>
1111+ let h1 = $h1, h2=$h2
1212+ window.x=1
1313+ customElements.define('length-meter2', LengthMeter2)
1414+1515+ let s
1616+1717+1818+ s=performance.now()
1919+ for (let i=0; i<100; i++) {
2020+ let n = document.createElement('length-meter2')
2121+ n.setup({limit:300})
2222+ h1.append(n)
2323+ }
2424+ console.log(performance.now()-s)
2525+2626+ s=performance.now()
2727+ for (let i=0; i<100; i++) {
2828+ let n = new LengthMeter({limit:300})
2929+ h2.append(n.$root)
3030+ }
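
 // n.update(count) // disabled: `n` is scoped to the loop above and `count` is never defined, so this line would throw a ReferenceError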
3333+ console.log(performance.now()-s)
3434+3535+ console.log(window.x)
3636+</script>
···11+import {Widget, HTML} from './template.js'
22+33+function get_last_indent(str, end) {
44+ let start
55+ for (start=end-1; start>=0 && str[start]!="\n"; start--)
66+ if (str[start]!=" " && str[start]!="\t")
77+ end = start
88+ return str.substring(start+1, end)
99+}
1010+1111+function compare(a, b) {
1212+ return a.textContent==b[0] && a.className==b[1]
1313+}
1414+1515+function find_gap(ot, nt) {
1616+ let start = 0
1717+ let oend = ot.length
1818+ let nend = nt.length
1919+ while (start<oend && start<nend && compare(ot[start], nt[start])) {
2020+ start++
2121+ }
2222+ while (oend>start && nend>start && compare(ot[oend-1], nt[nend-1])) {
2323+ oend--
2424+ nend--
2525+ }
2626+ return [start, oend, nend]
2727+}
2828+2929+function create_elem(x) {
3030+ let elem = document.createElement('span')
3131+ elem.className = x[1]
3232+ elem.textContent = x[0]
3333+ return elem
3434+}
3535+3636+export default class Editor1 extends Widget {
3737+ constructor({
3838+ parser,
3939+ graph,
4040+ oninput,
4141+ preview=false,
4242+ }) {
4343+ super()
4444+ this.tag_name = 'hl-textarea'
4545+ this.parser = parser
4646+ this.lock = false
4747+ this.missed = true
4848+ this.graph = graph
4949+ this.keydown_time = performance.now()
5050+ this.indent_newline = false
5151+ this.oninput = oninput
5252+ this.preview = preview
 // todo: use html events properly (we may want to block the internal textarea's events also)
 // (note you can't rely on its oninput event, because our parsing may be delayed)
5555+5656+ let elem = document.createElement(this.tag_name)
5757+ elem.attachShadow({mode: 'open', delegatesFocus: true})
5858+ elem.shadowRoot.append(this.$root)
5959+ this.$root = elem
6060+6161+ // todo: don't set this event unless we need it?
6262+ this.$textarea.onbeforeinput = ev=>{
6363+ if (this.graph)
6464+ this.record_input()
6565+ if (this.indent_newline)
6666+ if ('insertLineBreak'==ev.inputType) {
6767+ let indent = get_last_indent(this.$textarea.value, this.$textarea.selectionStart)
6868+ if (indent) {
6969+ ev.preventDefault()
7070+ document.execCommand('insertText', false, "\n"+indent)
7171+ }
7272+ }
7373+ }
7474+7575+ this.$textarea.addEventListener('input', e=>{
7676+ this.last_value = this.$textarea.value
7777+ if (this.lock) {
7878+ this.missed = true
7979+ return
8080+ }
8181+ // lock rendering, then add an event (to the end of the queue) to unlock it
8282+ // this way if we get like, 10 input events at once, we'll render on the first one, then
8383+ // our unlock event happens after all those input events. so we'll only render twice in total.
8484+ this.lock = true
8585+ setTimeout(()=>{
8686+ this.lock = false
8787+ if (this.missed)
8888+ this.run()
8989+ })
9090+ this.run()
9191+ }, {passive: true})
9292+9393+ // this.run() we dont want to deal with this in the constructor yknow (esp if it errors)
9494+ }
9595+ record_input() {
9696+ this.keydown_time = performance.now()
9797+ }
9898+ set_overlay(spans) {
9999+ for (let x of spans)
100100+ // in some browsers, certain control characters render inside inputs but not normal elements.
101101+ // e.g. firefox `-moz-control-character-visibility: visible`
102102+ // so we have to replace them with something that will (probably?) render at the same size.
103103+ // this is one of the sketchier problems we have to deal with, but it's very rare for a user to input any of these chars
 // also, \x00 is sometimes present in fonts (in which case it shouldn't be replaced) but it can't even be pasted in a textarea so whatever.
105105+ x[0] = x[0].replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/g, "\uFFFF")
106106+107107+ let nodes = [...this.$overlay.childNodes]
108108+ let [start, oend, nend] = find_gap(nodes, spans)
109109+110110+ let nchanged = 0
111111+ let after = nodes[oend]
112112+ let i = start
113113+ for (; i<oend && i<nend; i++) {
114114+ nodes[i].textContent = spans[i][0]
115115+ nodes[i].className = spans[i][1]
116116+ nchanged++
117117+ }
118118+ for (; i<oend; i++) {
119119+ nodes[i].remove()
120120+ nchanged++
121121+ }
122122+ for (; i<nend; i++) {
123123+ this.$overlay.insertBefore(create_elem(spans[i]), after)
124124+ nchanged++
125125+ }
126126+ if (this.graph)
127127+ this.graph.set_status(nchanged)
128128+ }
129129+ // todo: so we have preview mode, but
130130+ // ok what does that mean exactly...
131131+ // in our example page, we actually dont want to be in preview mode
132132+ // because we have to show the facets and stuff
133133+ // so like, we dont want to have to run the parser a second time
134134+ // hmmm.. maybe have a flag on the editor or.. idk
135135+ // im also not satisfied with the length system
136136+ // it seems messy, also why is it not just in the data field?
137137+ // ret is weird, i want to reorganize that again.
138138+ // like overlay_spans shouldnt be exposed
139139+ // but basically there's 3 or 4 outputs for the parser:
140140+ // 1: the overlay spans, for the editor to use internally
141141+ // 2: other status info, for a composer ui (e.g. grapheme length)
142142+ // 3: the final post data, only needed on submit
143143+ // 4? data to render a preview (in theory can be lighter than final)
144144+ // so, we should have a way to tell the editor, like
145145+ // - configure the input event, what fn to call and which info needed
146146+ // - method to request the final data (mayb #3 should be async? mostly for if the parser is delayed and you click send too fast somehow. but also - yea for like @mention resolution! yea we can solve it that way)
147147+ // ok yea let's do this
148148+ run() {
149149+ this.missed = false
150150+ let t_0 = performance.now(), t_parse, t_render
151151+ let text = this.$textarea.value
152152+ let ret
153153+ try {
154154+ ret = this.parser.parse(text, this.preview ? 'preview' : 'status')
155155+ } finally {
156156+ t_parse = performance.now()
157157+ if (!ret) {
158158+ // fallback in case of error
159159+ ret = {
160160+ highlight: [[text, "error"]],
161161+ status: {cost: text.length}, // hack
162162+ final: {text}, // hack
163163+ }
164164+ }
165165+ this.set_overlay(ret.highlight)
166166+ t_render = performance.now()
167167+ if (this.graph) {
168168+ void this.$overlay.scrollHeight // force layout recalc (slow!)
169169+ let t_layout = performance.now()
170170+ this.graph.graph_time(t_0, ['lime', t_parse, '#F60', t_render, 'purple', t_layout], ['gray', this.keydown_time])
171171+ this.keydown_time = null
172172+ }
173173+ if (this.oninput) {
174174+ let preview = ret.preview ?? ret.final
175175+ this.oninput(ret.status, preview)
176176+ }
177177+ }
178178+ }
179179+ // render the final post data
180180+ get_data() {
181181+ let text = this.$textarea.value
182182+ let ret = this.parser.parse(text, 'final')
183183+ return ret.final
184184+ }
185185+ set_styles(css) {
186186+ this.$colors.textContent = css
187187+ }
188188+ set_text(text, preserve_history) {
189189+ return this.splice_text(text, 'all', preserve_history)
190190+ }
191191+ // range:
192192+ // - 'all': entire textarea
193193+ // - 'selection': current user selection
194194+ // - {start:Number, end:Number}: custom range
195195+ // text:
196196+ // - String: string to insert
197197+ // - Function(String) -> String: function that converts the old text to new text
198198+ // preserve_history:
199199+ // - Boolean: try to preserve history (using execCommand)
200200+ // return:
201201+ // - Boolean: whether undo history was preserved
202202+ splice_text(text, range='selection', preserve_history=true) {
203203+ let good = false
204204+ let curr = this.$textarea.value
205205+ let after, start = 0, end = Infinity
206206+ if ('all'===range) {
207207+ if ('function'==typeof text)
208208+ text = text(curr)
209209+ after = text
210210+ } else {
211211+ if ('selection'===range) {
212212+ start = this.$textarea.selectionStart
213213+ end = this.$textarea.selectionEnd
214214+ } else {
215215+ start = +range.start
216216+ end = +range.end
217217+ }
218218+ if ('function'==typeof text)
219219+ text = text(curr.slice(start, end))
220220+ after = curr.slice(0, start) + text + curr.slice(end)
221221+ }
222222+ if (preserve_history) {
223223+ if ('all'===range) {
224224+ this.$textarea.select()
225225+ } else if ('selection'===range) {
226226+ this.$textarea.focus()
227227+ } else {
228228+ this.$textarea.focus()
229229+ this.$textarea.selectionStart = start
230230+ this.$textarea.selectionEnd = end
231231+ }
232232+ if (this.$root.contains(document.activeElement)) {
233233+ if (text)
234234+ document.execCommand('insertText', false, text)
235235+ else if (start!=end)
236236+ document.execCommand('delete')
237237+ good = true
238238+ }
239239+ /*if ('selection'===range) {
240240+ // todo: should we try to restore the selection after an operation? need to calculate overlap cases and such.. it's annoying generally. i tried this for just range=selection mode, but i dont like it because it can turn the cursor into a selection (since cursor is just 0-length selection)
241241+ this.$textarea.selectionStart = start
242242+ this.$textarea.selectionEnd = end - (curr.length - after.length)
243243+ }*/
244244+ }
245245+ // check this always, in case we can't use execCommand
246246+ // todo: will .value ever not be the string we inserted? like inserting weird chars or something
247247+ if (this.$textarea.value != after) {
248248+ this.$textarea.value = after
249249+ good = false
250250+ }
251251+ this.run()
252252+ return good
253253+ }
254254+}
255255+Editor1.template = HTML`
256256+<div>
257257+ <textarea-overlay $=overlay></textarea-overlay>
258258+ <textarea $=textarea></textarea>
259259+</div>
260260+<style>
261261+:host {
262262+ /* defaults */
263263+ white-space: pre-wrap; white-space: break-spaces;
264264+ font-family: monospace;
265265+ font-size: 1em;
266266+ /* font-kerning: none; /* might be good to set this just in case? */
267267+ font-variant-ligatures: none;
268268+ color: black;
269269+ background: white;
270270+ --inner-padding: 2px; /* use this instead of padding */
271271+ display: inline-block;
272272+ margin: 1px;
273273+ overflow-y: auto;
274274+ border: 2px inset ThreeDLightShadow;
275275+ scroll-padding: var(--inner-padding, 0px);
276276+}
277277+/* basically we have to ensure the textarea and overlay have all the same text styles */
278278+/* either set to a fixed value, or inherited from the editor-container element */
279279+:host > div, :host > div > * {
280280+ all: initial;
281281+ /* stuff that affects text positioning */
282282+ -webkit-text-size-adjust: none;
283283+ white-space: inherit;
284284+ word-break: break-word;
285285+ font: inherit;
286286+ font-size: 1em;
287287+ tab-size: inherit; -moz-tab-size: inherit;
288288+ letter-spacing: inherit;
289289+ word-spacing: inherit;
290290+}
291291+:host > div {
292292+ position: relative;
293293+ contain: content;
294294+ overflow: visible;
295295+ display: block;
296296+ min-height: 100%; /* fill remaining space, if host is sized */
297297+}
298298+:host > div > * {
299299+ display: block;
300300+ overflow: hidden; overflow: clip;
301301+ width: -webkit-fill-available; width: -moz-available; width: stretch;
302302+ padding: var(--inner-padding, 0px);
303303+}
304304+:host > div > textarea {
305305+ position: absolute;
306306+ top: 0;
307307+ z-index: 1;
308308+ resize: none;
309309+ height: 200%; /* todo: use grid or flex or something instead of a hardcoded height */
310310+ overflow: hidden;
311311+312312+ background: transparent;
313313+ -webkit-text-fill-color: transparent; /* sets color without affecting caret-color */
314314+ color: inherit;
315315+ contain: strict;
316316+}
317317+:host > div > textarea-overlay {
318318+ pointer-events: none;
319319+ background: inherit;
320320+ color: inherit;
321321+ will-change: contents;
322322+ contain: content;
323323+}
324324+/* necessary to avoid hiding last linebreak */
325325+:host > div > textarea-overlay::after {
326326+ content: "\\A";
327327+}
328328+/* delegatesFocus is a newer feature, so we can't rely on :focus */
329329+:host(:focus-within) {
330330+ outline: auto;
331331+}
332332+</style>
333333+<style $=colors></style>
334334+`
335335+Editor1.graph_categories = [
336336+ {color: 'gray', label: "input delay"},
337337+ {color: 'lime', label: "parsing"},
338338+ {color: '#F60', label: "rendering"},
339339+ {color: 'purple', label: "layout"},
340340+]
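
The `input` listener in the editor above coalesces bursts of events with a `lock`/`missed` pair: render on the first event, then render once more after the burst if anything was skipped. A minimal standalone sketch of that pattern follows. The helper name `make_coalescer` and the injectable `schedule` parameter are assumptions made for illustration (the patch calls `setTimeout` directly and keeps the flags on the editor instance).

```javascript
// Hypothetical helper, not part of the patch: wraps `run` so that a burst of
// calls triggers at most two runs (one immediately, one after the burst).
// `schedule` defaults to setTimeout, matching what the editor does.
function make_coalescer(run, schedule = fn => setTimeout(fn)) {
	let lock = false
	let missed = false
	return function on_input() {
		if (lock) {
			// a run already happened during this burst; remember that we skipped one
			missed = true
			return
		}
		lock = true
		// the unlock callback is queued behind any input events that are already pending
		schedule(() => {
			lock = false
			if (missed) {
				missed = false
				run()
			}
		})
		run()
	}
}
```

Passing a custom `schedule` only exists here so the behaviour can be exercised without timers; in the editor itself the default `setTimeout` path is the one that matters.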
+3
editor1/site/export.js
···11+export {default as HlTextarea} from './editor.js'
22+export {default as EditorParser1} from './parse-example.js'
33+export {finalize_facets} from './make-facets.js'
···11+import {grapheme_cost} from './lib/grapheme.js'
22+import {getUtf8Length as utf8_length} from '@atcute/uint8array'
33+44+export const MARKUP_TYPE = "com.example.richtext.facet#markup"
55+66+export function has_strip(feature) {
77+ return MARKUP_TYPE===feature.$type
88+}
99+1010+// this generates facets and also adds overlength property to the overlay spans
1111+export function finalize_facets(parserstate, mode, opts) {
1212+ const events = parserstate.events
1313+ let text = ""
1414+ // first we need to build the entire string so we can calculate the length and see where it goes over the limit
1515+ for (const event of events) {
1616+ if ('text'==event.type) {
1717+ text += event.text
1818+ } else if ('open'==event.type) {
1919+ if (has_strip(event.feature)) {
2020+ text += event.feature.strip[0]
2121+ }
2222+ } else if ('close'==event.type) {
2323+ if (has_strip(event.feature)) {
2424+ text += event.feature.strip[1]
2525+ }
2626+ }
2727+ }
2828+ // cost: number
2929+ // over_index: utf16 index
3030+ const {cost, over_index} = grapheme_cost(text, opts.budget)
3131+3232+ // now do the actual stuff
3333+ const open_features = new Set()
3434+ const overlength_features = new Set()
3535+ const facets = []
3636+ let total_length = 0 // utf16 index
3737+ let bstart = 0, bcurr = 0 // utf8 indexes
3838+ const flush_facet = ()=>{
3939+ if (bcurr <= bstart)
4040+ return
4141+ if (open_features.size)
4242+ facets.push({
4343+ index: {byteStart:bstart, byteEnd:bcurr},
4444+ features: [...open_features].map(feature=>{
4545+ if (!has_strip(feature))
4646+ return feature
4848+ // convert the temporary {surround, bstart, bend} object back into a [start, end] strip pair
4949+ const feat = {...feature, strip:[0,0]} // copy
5050+ if (bstart == feature.strip.bstart) // at start
5151+ feat.strip[0] = feature.strip.surround[0].length
5252+ if (bcurr == feature.strip.bend) {
5353+ // at end
5454+ feat.strip[1] = feature.strip.surround[1].length
5555+ } else {
5656+ // feat.cont = true (todo: i'd rather cont be the default, and we set a field if we don't want 2 adjacent features merged. or maybe we rely on the good old fashioned like if 2 features have all the same fields then they get merged. and we add a start/end field to each feature which will be the start/end of the original feature before splitting. which we can use to determine strip, so we can set the same strip on all of the created features and only apply it if the feature is at the start/end of the original)
5757+ }
5858+ return feat
5959+ }),
6060+ })
6161+ bstart = bcurr
6262+ }
6363+ for (const event of events) {
6464+ if ('text'==event.type) {
6565+ bcurr += utf8_length(event.text)
6666+ total_length += event.text.length
6767+ if (total_length > over_index) {
6868+ const not_over = Math.max(0, event.text.length - (total_length - over_index))
6969+ event.overlength = not_over // note: here we assume that characters in text nodes have a 1:1 mapping to characters in their overlay node
7070+ }
7171+ } else if ('open'==event.type) {
7272+ flush_facet()
7373+ open_features.add(event.feature)
7474+ if (has_strip(event.feature)) {
7575+ const surround = event.feature.strip
7676+ event.feature.strip = {surround, bstart: bcurr, bend: null} // hacky.. but this is a field we can safely use todo: clean this up. like store it on the event or something during the first pass idk.
7777+ bcurr += utf8_length(surround[0])
7878+ // add both surrounds' lengths here, so we report length overruns earlier
7979+ total_length += surround[0].length + surround[1].length
8080+ }
8181+ if (total_length > over_index) {
8282+ event.overlength = 0
8383+ overlength_features.add(event.feature)
8484+ }
8585+ } else if ('close'==event.type) {
8686+ if (has_strip(event.feature)) {
8787+ bcurr += utf8_length(event.feature.strip.surround[1]) // bcurr is a utf8 index, so count bytes like the 'open' branch does
8888+ // note: we added to `total_length` earlier
8989+ event.feature.strip.bend = bcurr
9090+ }
9191+ flush_facet()
9292+ open_features.delete(event.feature)
9393+ if (overlength_features.has(event.feature)) {
9494+ event.overlength = 0
9595+ }
9696+ }
9797+ }
9898+9999+ return {
100100+ highlight: parserstate.finalize_overlay_spans(),
101101+ status: {cost, budget:opts.budget},
102102+ final: {text, facets},
103103+ }
104104+}
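
`finalize_facets` tracks two different kinds of offsets: `total_length` counts UTF-16 code units (what `String.prototype.length` reports), while `bstart`/`bcurr` count UTF-8 bytes for the facet `index`. The sketch below shows how the two diverge on non-ASCII text. The helpers `utf8_len` and `byte_range` are hypothetical names for illustration; the patch itself uses `getUtf8Length` from `@atcute/uint8array`, and this sketch swaps in the standard `TextEncoder` instead.

```javascript
const encoder = new TextEncoder()

// Hypothetical helper: byte length of a JS string once encoded as UTF-8.
function utf8_len(str) {
	return encoder.encode(str).length
}

// Hypothetical helper: convert a UTF-16 [start, end) range into the
// {byteStart, byteEnd} shape used for facet indexes.
// Assumes start and end do not fall in the middle of a surrogate pair.
function byte_range(text, start, end) {
	const byteStart = utf8_len(text.slice(0, start))
	const byteEnd = byteStart + utf8_len(text.slice(start, end))
	return {byteStart, byteEnd}
}

// "héllo 😀 wörld": the word "wörld" sits at UTF-16 indexes 9..14
const sample = "h\u00E9llo \u{1F600} w\u00F6rld"
const range = byte_range(sample, 9, 14)
```

Here the UTF-16 range 9..14 maps to bytes 12..18, because `é` and `ö` take two bytes each and the emoji takes four bytes (but only two UTF-16 units).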
+213
editor1/site/parse-example.js
···11+import ParserState from './parser-state.js'
22+import {MARKUP_TYPE} from './make-facets.js'
33+44+const r = String.raw
55+const whitespace = r`\s\u00AD\u2060\u200A\u200B\u200C\u200D\u20e2`
66+const url = r`[-\w/%&=#+~@$*'!?,.;:]*`
77+const url_final = r`[-\w/%&=#+~@$*']`
88+const ipv6_address = r`(?:\[[0-9A-Fa-f:]+\])`
99+const BIG_REGEX = RegExp([
1010+ r`\b(?<link>(?<link_begin>https?://)(?:${ipv6_address}?${url}${url_final}(?:[(]${url}[)](?:${url}${url_final})?)?|${ipv6_address}))(?<link_open>\[)?`,
1111+ // alternative less strict link regex:
1212+ //r`\b(?<link>(https?://|([a-zA-Z0-9-]+[.])+[a-zA-Z0-9-]{2,}(?=[/]))${url}${url_final}([(]${url}[)](${url}${url_final})?)?)(?<link_open>\[)?`, (todo: support ipv6 on this one)
1313+ // urls must either have a scheme, or have a slash after the domain, to be considered.
1414+ // ie we will never link common "word.word" mistakes.
1515+ r`[##](?<hashtag>(?!\uFE0F)[^${whitespace}]*[^\p{P}${whitespace}])`, // note: must filter out #123
1616+ r`@(?<mention>[-a-zA-Z0-9]+(?:[.][-a-zA-Z0-9]+)+)\b`,
1717+ r`(?<style>[*][*]|[_][_]|[~][~]|[/])`,
1818+ r`(?<close>\])`,
1919+ r`(?<open>\[)`,
2020+ r`\x60(?<code>.*?)\x60`,
2121+ r`[\\]facet(?<facet>\[.*?\])\n\[`,
2222+ r`[\\](?<tag_open>[a-z]+)\[`,
2323+ r`[\\](?<escaped>.)`,
2424+].join("|"), 'gu')
2525+2626+const STYLE_SURROUND = {
2727+ __proto__: null,
2828+ italic: Object.freeze(["/","/"]),
2929+ bold: Object.freeze(["**","**"]),
3030+ underline: Object.freeze(["__","__"]),
3131+ strikethrough: Object.freeze(["~~","~~"]),
3232+ code: Object.freeze(["`","`"]),
3333+}
3434+const TAG_FEATURES = {
3535+ __proto__: null,
3636+ i: Object.freeze({$type:MARKUP_TYPE, style:'italic', strip:STYLE_SURROUND['italic']}),
3737+ b: Object.freeze({$type:MARKUP_TYPE, style:'bold', strip:STYLE_SURROUND['bold']}),
3838+ u: Object.freeze({$type:MARKUP_TYPE, style:'underline', strip:STYLE_SURROUND['underline']}),
3939+ s: Object.freeze({$type:MARKUP_TYPE, style:'strikethrough', strip:STYLE_SURROUND['strikethrough']}),
4040+ code: Object.freeze({$type:MARKUP_TYPE, style:'code', strip:STYLE_SURROUND['code']}),
4141+}
4242+4343+// these regexes are used on a string containing the char before and the char after the style token. e.g. "a**b" we do STYLE_START.test("ab")
4444+const STYLE_START
4545+ = /^[\s,][^\s,]|^['"}{(>|\[][^\s,'"]/
4646+const STYLE_END
4747+ = /^[^\s,][-\s.,:;!?'"}{)<\\|\]]/
4848+const ITALIC_START
4949+ = /^[\s,][^\s,/]|^['"}{(|\[][^\s,'"/<]/
5050+const ITALIC_END
5151+ = /^[^\s,/>][-\s.,:;!?'"}{)\\|\]]/
5252+const STYLE_SYNTAX = {
5353+ __proto__:null,
5454+ '**': 'bold',
5555+ '__': 'underline',
5656+ '~~': 'strikethrough',
5757+ '/': 'italic',
5858+}
5959+// todo: add a system to do this automatically (replacement for lookbehind which i am trying to avoid for now)
6060+const TAG_BEFORE
6161+ = /^[\s(]/
6262+const MENTION_BEFORE
6363+ = /^[\s(]/
6464+6565+function parse_facet_json(text) {
6666+ try {
6767+ let json = JSON.parse(text)
6868+ if (!(json instanceof Array))
6969+ return null
7070+ for (let feature of json) {
7171+ if ('string'!=typeof feature.$type)
7272+ return null
7373+ }
7474+ return json
7575+ } catch(e) {
7676+ }
7777+ return null
7878+}
7979+8080+export default class EditorParser1 {
8181+ options = {}
8282+ constructor(opts) {
8383+ this.options = opts
8484+ }
8585+ parse(text, mode='all') {
8686+ const p = new ParserState(text, BIG_REGEX, this.options)
8787+ const open_styles = new Map() // type → feature
8888+ const open_brackets = [] // [feature]
8989+ for (p.begin(); p.match; p.step(p.ok)) { // pass p.ok so a rejected match is retried from the next char
9090+ const match = p.match
9191+ const start = match.index
9292+ const end = start + match[0].length
9393+ const g = match.groups
9494+9595+ decide: if (g.mention!=null) {
9696+ if (!MENTION_BEFORE.test(p.charBefore(start)||"\n")) {
9797+ p.reject()
9898+ break decide
9999+ }
100100+ // @mention
101101+ p.accept()
102102+ let feature = {$type:"app.bsky.richtext.facet#mention", did:g.mention}
103103+ p.element_direct(end, feature)
104104+ } else if (g.link!=null) {
105105+ let feature = {$type:"app.bsky.richtext.facet#link", uri:g.link}
106106+ if (g.link_open!=null) {
107107+ // link with text
108108+ p.accept()
109109+ p.event('open', end, feature)
110110+ open_brackets.push(feature)
111111+ } else {
112112+ // standalone link
113113+ p.accept()
114114+ // ok so we have to pre-shorten link text, because like
115115+ // if we dont remove the scheme part, bsky.app will truncate links a lot
116116+ let sch = g.link_begin?.length ?? 0
117117+ p.event('open', start+sch, feature)
118118+ p.event('text', end)
119119+ p.event('close', end, feature)
120120+ }
121121+ } else if (g.hashtag!=null) {
122122+ if (/^\d+$/.test(g.hashtag) || !TAG_BEFORE.test(p.charBefore(start)||"\n")) {
123123+ p.reject()
124124+ break decide
125125+ }
126126+ // #hashtag
127127+ p.accept()
128128+ let feature = {$type:"app.bsky.richtext.facet#tag", tag:g.hashtag}
129129+ p.element_direct(end, feature)
130130+ } else if (g.style!=null) {
131131+ let type = STYLE_SYNTAX[g.style]
132132+ let before = p.charBefore(start)||"\n"
133133+ let after = p.charAfter(end)||"\n"
134134+ let feature = open_styles.get(type)
135135+ if (!feature) {
136136+ if (('italic'==type ? ITALIC_START : STYLE_START).test(before+after)) {
137137+ // style open
138138+ p.accept()
139139+ feature = {$type:MARKUP_TYPE, style:type, strip:STYLE_SURROUND[type]}
140140+ open_styles.set(type, feature)
141141+ p.event('open', end, feature)
142142+ break decide
143143+ }
144144+ } else {
145145+ if (('italic'==type ? ITALIC_END : STYLE_END).test(before+after)) {
146146+ // style close
147147+ p.accept()
148148+ p.event('close', end, feature)
149149+ open_styles.delete(type)
150150+ break decide
151151+ }
152152+ }
153153+ p.reject()
154154+ break decide
155155+ } else if (g.open!=null) {
156156+ // open bracket
157157+ p.accept()
158158+ p.event('text', end)
159159+ open_brackets.push(null)
160160+ } else if (g.close!=null) {
161161+ if (!open_brackets.length) {
162162+ p.reject()
163163+ break decide
164164+ }
165165+ let feature = open_brackets.pop()
166166+ if (feature) {
167167+ // close bracket
168168+ p.accept()
169169+ p.event('close', end, feature)
170170+ } else {
171171+ // null close bracket
172172+ p.accept()
173173+ p.event('text', end)
174174+ }
175175+ } else if (g.code!=null) {
176176+ // `code` (this is one match instead of separate open/close)
177177+ p.accept()
178178+ let feature = {$type:MARKUP_TYPE, style:'code', strip:STYLE_SURROUND['code']}
179179+ p.event('open', start+1, feature)
180180+ p.event('text', end-1)
181181+ p.event('close', end, feature)
182182+ } else if (g.tag_open!=null) {
183183+ let feature = TAG_FEATURES[g.tag_open]
184184+ if (feature) {
185185+ // valid \tag[
186186+ p.accept()
187187+ feature = {...feature}
188188+ p.event('open', end, feature)
189189+ open_brackets.push(feature)
190190+ } else {
191191+ // invalid \tag[
192192+ p.accept()
193193+ p.event('text', end)
194194+ }
195195+ } else if (g.escaped!=null) {
196196+ // escaped char
197197+ p.accept()
198198+ let feature = {$type:"#escape"} // only used for overlay
199199+ p.event('text', end, feature, g.escaped)
200200+ } else {
201201+ // other (should never happen)
202202+ p.reject()
203203+ break decide
204204+ }
205205+ }
206206+ p.finish()
207207+ // cancel unclosed
208208+ for (let feature of p.open_features.keys())
209209+ p.cancel_feature(feature)
210210+211211+ return this.options.finalize(p, mode, this.options)
212212+ }
213213+}
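
The style branch of `parse()` decides whether a token like `**` opens or closes a style by testing a two-character string made of the character before the token and the character after it. A small sketch of that check in isolation follows. The two regexes are copied from the patch; the helper name `classify_token` is an assumption for illustration, and the italic-specific variants are left out.

```javascript
// Copied from parse-example.js above.
const STYLE_START = /^[\s,][^\s,]|^['"}{(>|\[][^\s,'"]/
const STYLE_END = /^[^\s,][-\s.,:;!?'"}{)<\\|\]]/

// Hypothetical helper: classify the style token occupying text[start, end).
// Like the parser, it treats the edges of the input as a newline.
function classify_token(text, start, end) {
	const before = text.charAt(start - 1) || "\n"
	const after = text.charAt(end) || "\n"
	const pair = before + after
	return {
		can_open: STYLE_START.test(pair),
		can_close: STYLE_END.test(pair),
	}
}

const opening = classify_token("**bold** x", 0, 2) // pair is "\n" + "b"
const closing = classify_token("**bold** x", 6, 8) // pair is "d" + " "
const inside = classify_token("a**b", 1, 3)        // pair is "a" + "b"
```

A token wedged between two word characters (the `a**b` case) satisfies neither regex, which is why the parser rejects it and leaves it as plain text.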
+151
editor1/site/parser-state.js
···11+// todo: parse modes, e.g.:
22+// 'status' : overlay + status
33+// 'preview' : overlay + status + preview
44+// 'final' : final
55+66+export default class ParserState {
77+ events = [] // [event] // note: each feature object should occur exactly 2 times in events
88+ open_features = new Map() // {feature → event}
99+ overlay_last = 0 // index into rawtext
1010+ overlay_spans = [] // [temp_overlay]
1111+1212+ rawtext = ""
1313+ regex = null
1414+1515+ options = {}
1616+1717+ match = null
1818+ ok = false
1919+2020+ constructor(text, regex, opt) {
2121+ this.rawtext = text
2222+ this.regex = regex
2323+ this.options = opt
2424+ }
2525+ // todo: make a single function that does overlay + event creation
2626+ // also: what if we made overlay point to event rather than vice versa?
2727+ // for most cases (when overlay and events are 1:1)
2828+ // exceptions:
2929+ // - event without an overlay (e.g. auto-closed tag)
3030+ // - easy: no overlays point to the event
3131+ // - overlay without an event (e.g. sets a flag, comment syntax)
3232+ // - easy: overlay.event is null
3333+ // - multiple overlays for a single event
3434+ // - easy
3535+ // - overlay that triggers multiple events
3636+ // - uh oh!
3737+3838+ // hm i wonder if we can think about
3939+ // events (particularly open events) as like
4040+ // this is basically a text event, but with the promise that ok if this isnt cancelled, it will be a feature event instead.
4141+ // so basically consider a bold open event. if that gets cancelled, what happens is it becomes a direct text event. i think let's make that a rule
4242+ element_direct(end, feature) {
4343+ this.event('open', undefined, feature)
4444+ this.event('text', end)
4545+ this.event('close', undefined, feature)
4646+ }
4747+ // what are the ways to eat a span of text:
4848+ // - open_syntax (consumes text and creates an open event)
4949+ // - close_syntax (consumes text and creates a close event)
5050+ // - text_direct (consumes text and creates a text event with that text)
5151+ // - text_indirect (consumes text and creates a text event with custom text)
5252+ // actually let's just give each one a variant with custom text
5353+ // so it'll be like
5454+ // - type (open/close/text)
5555+ // - end (how much text to consume)
5656+ // - text - to set the text/oncancel field (optional, defaults to slice(last, end))
5757+ // - feature: for open/close: the feature (sets overlay owner field, type 1). for text: this is used for the overlay owner field corresp' type 2
5858+ event(type, end=this.overlay_last, feature=null, text=this.slice(this.overlay_last, end)) {
5959+ // add the event
6060+ const event = {type, feature, text}
6161+ this.events.push(event)
6262+ // before
6363+ if ('close'==type)
6464+ this.open_features.delete(feature)
6565+ // add the overlay span
6666+ if (end > this.overlay_last) {
6767+ const text = this.slice(this.overlay_last, end)
6868+ const overlay = {text, features:new Set(this.open_features.keys()), owner:feature, event}
6969+ this.overlay_spans.push(overlay)
7070+ this.overlay_last = end
7171+ }
7272+ // after
7373+ if ('open'==type)
7474+ this.open_features.set(feature, event)
7575+ }
7676+ cancel_feature(feature) {
7777+ const event = this.open_features.get(feature)
7878+ this.open_features.delete(feature)
7979+ event.type = 'text'
8080+ event.feature = null
8181+ feature._cancelled = true // eghh.. (this is ok because these are only used for overlay spans now)
8282+ }
8383+ begin() {
8484+ this.regex.lastIndex = 0 // nnnn
8585+ this.match = this.regex.exec(this.rawtext)
8686+ }
8787+ step(ok=true) {
8888+ if (!ok)
8989+ this.regex.lastIndex = this.match.index + (this.rawtext.codePointAt(this.match.index) < 0x10000 ? 1 : 2)
9090+ this.match = this.regex.exec(this.rawtext)
9191+ }
9292+ finish() {
9393+ const end = this.rawtext.length
9494+ this.event('text', end)
9595+ }
9696+ accept() {
9797+ this.ok = true
9898+ const end = this.match.index
9999+ this.event('text', end)
100100+ }
101101+ reject() {
102102+ this.ok = false
103103+ }
104104+ // ** - .owner: the bold feature, .event: the bold start event
105105+ // #test - .owner: the hashtag feature, .event: the text event
106106+ // and in the first case, the event can be changed into a text event if the bold is cancelled
107107+ // basically:
108108+ // .owner is for spans which are markup syntax that caused a feature to be created,
109109+ // .event is for overlength checking
110110+ // .features are all the currently open features, mainly for text spans (todo: i dont like how we have to copy the features list for every span, and also filter out cancelled ones. let's find a nicer way someday)
111111+ // also, why do we actually have to store the text strings; we could just store an end index until later.
112112+ slice(start, end) {
113113+ return this.rawtext.slice(start, end)
114114+ }
115115+ charAfter(start) {
116116+ return this.rawtext.charAt(start)
117117+ }
118118+ charBefore(end) {
119119+ return this.rawtext.charAt(end-1)
120120+ }
121121+ finalize_overlay_spans() {
122122+ return this.overlay_spans.flatMap((span)=>{
123123+ let cn = ""
124124+ for (const f of span.features)
125125+ if (!f._cancelled)
126126+ cn += " "+f.$type.split("#")[1]
127127+ if (span.owner) {
128128+ if (span.owner._cancelled)
129129+ // if a syntax span had its owner feature cancelled, we still want to highlight that
130130+ // so the user knows there /would/ be something here if they had closed the tag etc.
131131+ cn += " cancelled"
132132+ cn += " "+span.owner.$type.split("#")[1]+"-syntax"
133133+ // todo: we have basically 2 kinds of owner highlighting:
134134+ // 1: syntax which wont appear in the output. e.g. **test** highlighting on the **. this should use bg color
135135+ // 2: "syntax" which will appear in the output. like, these chars Activated the parser but are also expected to be shown as text. e.g. #test highlighting on the entire thing. this should not use strong bg color but does need to be indicated somehow (underline, fg color, etc.).
136136+ }
137137+ if (span.event) {
138138+ const over = span.event.overlength
139139+ if (0===over) {
140140+ cn += " overlength"
141141+ } else if (over>0 && over<=span.text.length) {
142142+ return [
143143+ [span.text.slice(0, over), cn],
144144+ [span.text.slice(over), cn+" overlength"],
145145+ ]
146146+ }
147147+ }
148148+ return [[span.text, cn]]
149149+ })
150150+ }
151151+}
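
The last step of `finalize_overlay_spans` splits a span in two when the length budget runs out partway through it. A minimal sketch of just that branch follows, assuming a hypothetical standalone function `split_overlength` (the patch does this inline inside the `flatMap` callback).

```javascript
// Hypothetical standalone version of the overlength branch.
// `over` mirrors event.overlength: the number of leading UTF-16 units of the
// span that still fit in the budget, or undefined if the whole span fits.
function split_overlength(text, cn, over) {
	if (0 === over)
		// nothing fits: mark the whole span
		return [[text, cn + " overlength"]]
	if (over > 0 && over <= text.length)
		// part of it fits: split at the boundary and mark only the tail
		return [
			[text.slice(0, over), cn],
			[text.slice(over), cn + " overlength"],
		]
	// everything fits (over is undefined) or the value is out of range
	return [[text, cn]]
}

const partial = split_overlength("hello", " tag", 2)
const none_fit = split_overlength("hello", " tag", 0)
const all_fit = split_overlength("hello", " tag", undefined)
```

Returning an array of one or two `[text, className]` pairs is what lets the caller use `flatMap` to keep the overlay a flat list of spans.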