MIT License

Copyright (c) 2020 Phil Plückthun

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
README.md
<div align="center">
  <img alt="reghex" width="250" src="docs/reghex-logo.png" />
  <br />
  <br />
  <strong>
    The magical sticky regex-based parser generator
  </strong>
  <br />
  <br />
  <br />
</div>

Leveraging the power of sticky regexes and Babel code generation, `reghex` allows
you to code parsers quickly, by surrounding regular expressions with a regex-like
[DSL](https://en.wikipedia.org/wiki/Domain-specific_language).

With `reghex` you can generate a parser from a tagged template literal, which is
quick to prototype and generates reasonably compact and performant code.

_This project is still in its early stages and is experimental. Its API may still
change and some issues may need to be ironed out._

## Quick Start

##### 1. Install with yarn or npm

```sh
yarn add reghex
# or
npm install --save reghex
```

##### 2. Add the plugin to your Babel configuration (`.babelrc`, `babel.config.js`, or `package.json:babel`)

```json
{
  "plugins": ["reghex/babel"]
}
```

Alternatively, you can set up [`babel-plugin-macros`](https://github.com/kentcdodds/babel-plugin-macros) and
import `reghex` from `"reghex/macro"` instead.

##### 3. Have fun writing parsers!

```js
import match, { parse } from 'reghex';

const name = match('name')`
  ${/\w+/}
`;

parse(name)('hello');
// [ "hello", .tag = "name" ]
```

## Concepts

The fundamental concept behind `reghex` is regexes, specifically
[sticky regexes](https://www.loganfranken.com/blog/831/es6-everyday-sticky-regex-matches/)!
These are regular expressions that don't search a target string, but instead only match at the
position stored in their `lastIndex` property. The flag for sticky regexes is `y`, and hence
they can be created using `/phrase/y` or `new RegExp('phrase', 'y')`.

**Sticky Regexes** are the perfect foundation for a parsing framework in JavaScript!
Because they only match at a single position, they can be used to match patterns
continuously, as a parser would. Like global regexes, we can then manipulate where
they should be matched by setting `regex.lastIndex = index;` and, after matching,
read back their updated `regex.lastIndex`.
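The difference from a global regex is easiest to see side by side. The following plain-JavaScript snippet does not use `reghex` at all. It runs a global regex and a sticky regex against the same input:

```javascript
const input = 'say phrase';

// A global regex searches forward from lastIndex until it finds a match.
const globalRegex = /phrase/g;
globalRegex.lastIndex = 0;
const globalMatch = globalRegex.exec(input);
console.log(globalMatch.index); // 4

// A sticky regex only tries to match exactly at lastIndex.
const stickyRegex = /phrase/y;
stickyRegex.lastIndex = 0;
console.log(stickyRegex.exec(input)); // null, "say" sits at index 0
console.log(stickyRegex.lastIndex); // 0, reset after the failed match

// Moving lastIndex to where "phrase" starts makes the sticky regex match,
// and lastIndex then points just past the match.
stickyRegex.lastIndex = 4;
console.log(stickyRegex.exec(input)[0]); // phrase
console.log(stickyRegex.lastIndex); // 10
```

A failed sticky match resets `lastIndex` to `0`. For that reason it can be convenient to track the current position in a separate variable rather than on the regex itself.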

> **Note:** Sticky Regexes aren't natively
> [supported in Internet Explorer](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/sticky#Browser_compatibility). `reghex` works around this by imitating their behaviour, which may decrease performance on IE11.

This primitive allows us to build up a parser from the regexes you pass when
authoring a parser function, also called a "matcher" in `reghex`. When `reghex` compiles
to parser code, this code is just a sequence and combination of sticky regexes that
are executed in order!

```js
let input = 'phrases should be parsed...';
let lastIndex = 0;

const regex = /phrase/y;
function matcher() {
  let match;
  // Before matching we set the current index on the RegExp
  regex.lastIndex = lastIndex;
  // Then we match and store the result
  if ((match = regex.exec(input))) {
    // If the RegExp matches successfully, we update our lastIndex
    lastIndex = regex.lastIndex;
  }
  // Hand the match (or null) back to the caller
  return match;
}
```

This mechanism is used in all matcher functions that `reghex` generates.
Internally, `reghex` keeps track of the input string and the current index on
that string, and the matcher functions execute regexes against this state.
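Here is a rough mental model of that state. It is not the code `reghex` actually generates. The sketch keeps the input and the current index in one object and builds matchers on top of it. The names `state`, `pattern`, and `sequence` are hypothetical and were chosen only for this illustration:

```javascript
// Hypothetical parser state: the input string and the current position in it.
const state = { input: '', index: 0 };

// Wrap a regex into a matcher that only matches at `state.index`.
function pattern(regex) {
  // Force the sticky flag so the regex never searches ahead.
  const sticky = new RegExp(regex.source, regex.flags.replace('y', '') + 'y');
  return function match() {
    sticky.lastIndex = state.index;
    const result = sticky.exec(state.input);
    if (!result) return undefined;
    state.index = sticky.lastIndex; // advance past the match
    return result[0];
  };
}

// Run matchers in order; if one fails, restore the starting index.
function sequence(...matchers) {
  return function match() {
    const start = state.index;
    const results = [];
    for (const matcher of matchers) {
      const result = matcher();
      if (result === undefined) {
        state.index = start; // roll back partial progress
        return undefined;
      }
      results.push(result);
    }
    return results;
  };
}

const greeting = sequence(pattern(/hello /), pattern(/\w+/));

state.input = 'hello tim';
state.index = 0;
console.log(greeting()); // [ 'hello ', 'tim' ]
console.log(state.index); // 9
```

Every matcher in this sketch reads and writes the same `state.index`. A failing `sequence` therefore only has to reset one number to undo its partial progress.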

## Authoring Guide

You can write "matchers" by importing the default import from `reghex` and
using it to write a matcher expression.

```js
import match from 'reghex';

const name = match('name')`
  ${/\w+/}
`;
```

As can be seen above, the `match` function, which is what we've called the
default import, is called with a "node name" and is then called as a tagged
template. This template is our **parsing definition**.

`reghex` only works when compiled with its Babel plugin (or via `reghex/macro`), which will detect `match('name')`
and replace the entire tag with a parsing function, which may then look like
the following in your transpiled code:

```js
import { _pattern /* ... */ } from 'reghex';

var _name_expression = _pattern(/\w+/);
var name = function name() {
  /* ... */
};
```

We've now successfully created a matcher that matches a single regex:
a pattern of one or more word characters (`\w`, i.e. letters, digits, or underscores).
We can execute this matcher by calling it with the curried `parse` utility:

```js
import { parse } from 'reghex';

const result = parse(name)('Tim');

console.log(result); // [ "Tim", .tag = "name" ]
console.log(result.tag); // "name"
```

If the string (here: "Tim") is parsed successfully by the matcher, `parse` will
return an array that contains the result of the regex. The array is special
in that it will also have a `tag` property set to the matcher's name, here
`"name"`, which we determined when we defined the matcher as `match('name')`.

```js
import { parse } from 'reghex';
parse(name)('...'); // undefined
```

Similarly, if the matcher does not parse an input string successfully, it will
return `undefined` instead.
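To make the output shape concrete, here is a hand-written function that behaves roughly like `parse(name)` from the example above. `parseName` is a hypothetical stand-in and is not part of `reghex`. For simplicity, it does not check whether the whole input was consumed:

```javascript
// Hypothetical stand-in for parse(name)(input); not reghex's generated code.
function parseName(input) {
  const regex = /\w+/y;
  regex.lastIndex = 0;
  const match = regex.exec(input);
  // A mismatch produces no node at all.
  if (!match) return undefined;

  // A node is an ordinary array of matches with an extra `tag` property.
  const node = [match[0]];
  node.tag = 'name';
  return node;
}

const result = parseName('Tim');
console.log(Array.isArray(result), result[0], result.tag); // true Tim name
console.log(parseName('...')); // undefined
```

`\w` also matches digits and underscores, so an input like `'42'` would produce a node here as well.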

### Nested matchers

This on its own is nice, but a parser must be able to traverse a string and
turn it into an [Abstract Syntax Tree](https://en.wikipedia.org/wiki/Abstract_syntax_tree).
To introduce nesting to `reghex` matchers, we can refer to one matcher in another!
Let's extend our original example:

```js
import match from 'reghex';

const name = match('name')`
  ${/\w+/}
`;

const hello = match('hello')`
  ${/hello /} ${name}
`;
```

The new `hello` matcher is set to match `/hello /` and then attempts to match
the `name` matcher afterwards. If either of these fails, `hello` will return
`undefined` as well and roll back its changes. Using this matcher will give us
**nested abstract output**.

We can also see in this example that _outside_ of the regex interpolations,
whitespace and newlines don't matter.

```js
import { parse } from 'reghex';

parse(hello)('hello tim');
/*
  [
    "hello",
    ["tim", .tag = "name"],
    .tag = "hello"
  ]
*/
```
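The nesting and rollback behaviour can be sketched by hand as two functions that share one position. This is only an illustrative approximation. The `state` object and the functions `exec`, `node`, `name`, and `hello` below are hypothetical and much simpler than what `reghex` generates:

```javascript
// Hypothetical shared parser state.
const state = { input: '', index: 0 };

// Match a sticky regex at the current index and advance on success.
function exec(regex) {
  regex.lastIndex = state.index;
  const match = regex.exec(state.input);
  if (!match) return undefined;
  state.index = regex.lastIndex;
  return match[0];
}

// Attach a node name to an array of children.
function node(children, tag) {
  children.tag = tag;
  return children;
}

function name() {
  const word = exec(/\w+/y);
  return word === undefined ? undefined : node([word], 'name');
}

function hello() {
  const start = state.index;
  // The prefix includes the trailing space consumed by /hello /.
  const prefix = exec(/hello /y);
  // The nested matcher's node becomes a child of this node.
  const child = prefix === undefined ? undefined : name();
  if (child === undefined) {
    state.index = start; // roll back what this matcher consumed
    return undefined;
  }
  return node([prefix, child], 'hello');
}

state.input = 'hello tim';
state.index = 0;
const tree = hello();
console.log(tree.tag, tree[1].tag, tree[1][0]); // hello name tim

state.input = 'hello !';
state.index = 0;
console.log(hello(), state.index); // undefined 0
```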

### Regex-like DSL

We've seen in the previous examples that matchers are authored using tagged
template literals, where interpolations can either be filled using regexes,
`${/pattern/}`, or with other matchers `${name}`.

The tagged template syntax supports more ways to match these interpolations,
using a regex-like Domain Specific Language. Unlike in regexes, whitespace
and newlines don't matter, which makes matchers easier to format and read.

We can create **sequences** of matchers by adding multiple expressions in
a row. A matcher using `${/1/} ${/2/}` will attempt to match `1` and then `2`
in the parsed string. This is just one feature of the regex-like DSL. The
available operators are the following:

| Operator | Example            | Description |
| -------- | ------------------ | ----------- |
| `?`      | `${/1/}?`          | An **optional** makes an interpolation optional: it may or may not match. |
| `*`      | `${/1/}*`          | A **star** matches an interpolation any number of times, including zero: it may repeat itself or not be matched at all. |
| `+`      | `${/1/}+`          | A **plus** is used like `*`, but must match one or more times. If the interpolation doesn't match at least once, the matcher fails, since the match isn't optional. |
| `\|`     | `${/1/} \| ${/2/}` | An **alternation** matches either one thing or another, falling back to the next alternative when the first one fails. |
| `()`     | `(${/1/} ${/2/})+` | A **group** can be used to apply one of the other operators to an entire group of interpolations. |
| `(?: )`  | `(?: ${/1/})`      | A **non-capturing group** is like a regular group, but whatever the interpolations inside it match won't appear in the parser's output. |
| `(?= )`  | `(?= ${/1/})`      | A **positive lookahead** checks whether the interpolations match, and if so continues the matcher without consuming any input. If it matches, it's essentially ignored. |
| `(?! )`  | `(?! ${/1/})`      | A **negative lookahead** checks whether the interpolations _don't_ match, and if so continues the matcher without consuming any input. If the interpolations do match, the matcher fails. |
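One way to picture the quantifiers and lookaheads is as higher-order functions that wrap a matcher. The sketch below is a hypothetical approximation, not `reghex`'s generated code. `pattern`, `optional`, `star`, `plus`, `lookahead`, and `negativeLookahead` are illustrative names, and each matcher returns an array of captured strings, or `undefined` on a mismatch:

```javascript
// Hypothetical parser state shared by all matchers below.
const state = { input: '', index: 0 };

// Base matcher for a sticky regex; returns an array of captures or undefined.
const pattern = (regex) => () => {
  regex.lastIndex = state.index;
  const match = regex.exec(state.input);
  if (!match) return undefined;
  state.index = regex.lastIndex;
  return [match[0]];
};

// `?`: a mismatch is fine and simply captures nothing.
const optional = (matcher) => () => matcher() || [];

// `*`: repeat until a mismatch (or an empty match, which would loop forever).
const star = (matcher) => () => {
  const captures = [];
  for (;;) {
    const start = state.index;
    const result = matcher();
    if (!result || state.index === start) return captures;
    captures.push(...result);
  }
};

// `+`: one required match, followed by `*`.
const plus = (matcher) => () => {
  const first = matcher();
  return first && [...first, ...star(matcher)()];
};

// `(?= )`: require a match, then restore the index and capture nothing.
const lookahead = (matcher) => () => {
  const start = state.index;
  const result = matcher();
  state.index = start;
  return result ? [] : undefined;
};

// `(?! )`: require a mismatch; never consumes input.
const negativeLookahead = (matcher) => () => {
  const start = state.index;
  const result = matcher();
  state.index = start;
  return result ? undefined : [];
};

// Run a matcher from the start of a fresh input.
function run(matcher, input) {
  state.input = input;
  state.index = 0;
  return matcher();
}

const one = pattern(/1/y);
console.log(run(star(one), '111')); // [ '1', '1', '1' ]
console.log(run(plus(one), '2')); // undefined
console.log(run(optional(one), '2')); // []
console.log(run(lookahead(one), '1'), state.index); // [] 0
console.log(run(negativeLookahead(one), '1')); // undefined
```

The empty-match check in `star` is a design choice of this sketch. Without it, a pattern such as `/a*/y` could keep matching the empty string without ever advancing.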

We can combine and compose these operators to create more complex matchers.
For instance, we can extend the original example to only allow a specific set
of names by using the `|` operator:

```js
const name = match('name')`
  ${/tim/} | ${/tom/} | ${/tam/}
`;

parse(name)('tim'); // [ "tim", .tag = "name" ]
parse(name)('tom'); // [ "tom", .tag = "name" ]
parse(name)('patrick'); // undefined
```

The above will now only match specific name strings. When one pattern in this
chain of **alternations** does not match, the matcher tries the next one.
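Alternation can be sketched as trying each alternative from the same starting index until one succeeds. `alternation`, `pattern`, and the `state` object below are hypothetical helpers, not part of `reghex`'s API:

```javascript
// Hypothetical parser state.
const state = { input: '', index: 0 };

// Sticky-regex matcher that captures its match or returns undefined.
const pattern = (regex) => () => {
  regex.lastIndex = state.index;
  const match = regex.exec(state.input);
  if (!match) return undefined;
  state.index = regex.lastIndex;
  return [match[0]];
};

// `a | b | c`: return the result of the first alternative that matches.
const alternation = (...alternatives) => () => {
  const start = state.index;
  for (const alternative of alternatives) {
    const result = alternative();
    if (result) return result;
    state.index = start; // each alternative starts from the same position
  }
  return undefined;
};

const name = alternation(pattern(/tim/y), pattern(/tom/y), pattern(/tam/y));

for (const input of ['tim', 'tom', 'patrick']) {
  state.input = input;
  state.index = 0;
  console.log(input, name());
}
// tim [ 'tim' ]
// tom [ 'tom' ]
// patrick undefined
```

A single regex never advances on a mismatch. Resetting `state.index` between alternatives therefore only starts to matter once an alternative is a sequence that can consume part of the input before failing.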

We can also use **groups** to add more matchers around the alternations themselves,
by surrounding the alternations with `(` and `)`:

```js
const name = match('name')`
  (${/tim/} | ${/tom/}) ${/!/}
`;

parse(name)('tim!'); // [ "tim", "!", .tag = "name" ]
parse(name)('tom!'); // [ "tom", "!", .tag = "name" ]
parse(name)('tim'); // undefined
```

Maybe we're also not that interested in the `"!"` showing up in the output node.
If we want to get rid of it, we can use a **non-capturing group** to hide it,
while still requiring it.

```js
const name = match('name')`
  (${/tim/} | ${/tom/}) (?: ${/!/})
`;

parse(name)('tim!'); // [ "tim", .tag = "name" ]
parse(name)('tim'); // undefined
```

Lastly, like in regexes, `?`, `*`, and `+` may be used as "quantifiers". The first two
may also _not_ match their patterns without the matcher failing.
The `+` operator is used to match an interpolation _one or more_ times, while the
`*` operator may match _zero or more_ times. Let's use this to allow the `"!"`
to repeat.

```js
const name = match('name')`
  (${/tim/} | ${/tom/})+ (?: ${/!/})*
`;

parse(name)('tim!'); // [ "tim", .tag = "name" ]
parse(name)('tim!!!!'); // [ "tim", .tag = "name" ]
parse(name)('tim'); // [ "tim", .tag = "name" ]
parse(name)('timtim'); // [ "tim", "tim", .tag = "name" ]
```

As we can see from the above, like in regexes, quantifiers can be combined with groups,
non-capturing groups, or other kinds of groups.
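The last example can be approximated by composing small hand-written combinators, which shows how the outputs above come about. Everything here (`pattern`, `alternation`, `star`, `plus`, `nonCapturing`, `sequence`) is a hypothetical stand-in, not `reghex`'s internals. The composition only roughly mirrors `(${/tim/} | ${/tom/})+ (?: ${/!/})*`:

```javascript
// Hypothetical parser state shared by all matchers below.
const state = { input: '', index: 0 };

// Sticky-regex matcher that captures its match.
const pattern = (regex) => () => {
  regex.lastIndex = state.index;
  const match = regex.exec(state.input);
  if (!match) return undefined;
  state.index = regex.lastIndex;
  return [match[0]];
};

// `a | b`: first alternative that matches, each tried from the same index.
const alternation = (...alternatives) => () => {
  const start = state.index;
  for (const alternative of alternatives) {
    const result = alternative();
    if (result) return result;
    state.index = start;
  }
  return undefined;
};

// `*`: zero or more matches; also stops on an empty match to avoid looping.
const star = (matcher) => () => {
  const captures = [];
  for (;;) {
    const start = state.index;
    const result = matcher();
    if (!result || state.index === start) return captures;
    captures.push(...result);
  }
};

// `+`: one required match followed by `*`.
const plus = (matcher) => () => {
  const first = matcher();
  return first && [...first, ...star(matcher)()];
};

// `(?: )`: must match, but its captures are dropped.
const nonCapturing = (matcher) => () => (matcher() ? [] : undefined);

// A sequence concatenates captures and rolls back if any part fails.
const sequence = (...parts) => () => {
  const start = state.index;
  const captures = [];
  for (const part of parts) {
    const result = part();
    if (!result) {
      state.index = start;
      return undefined;
    }
    captures.push(...result);
  }
  return captures;
};

// Roughly: (${/tim/} | ${/tom/})+ (?: ${/!/})*
const name = sequence(
  plus(alternation(pattern(/tim/y), pattern(/tom/y))),
  star(nonCapturing(pattern(/!/y)))
);

for (const input of ['tim!', 'tim!!!!', 'tim', 'timtim']) {
  state.input = input;
  state.index = 0;
  console.log(input, name());
}
// tim! [ 'tim' ]
// tim!!!! [ 'tim' ]
// tim [ 'tim' ]
// timtim [ 'tim', 'tim' ]
```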

### Transforming as we match

In the previous sections, we've seen that the **nodes** that `reghex` outputs are arrays containing
match strings or other nodes and have a special `tag` property with the node's type.
We can **change this output** while we're parsing by passing a function as the second argument to `match`.

```js
const name = match('name', (x) => x[0])`
  (${/tim/} | ${/tom/}) ${/!/}
`;

parse(name)('tim!'); // "tim"
```

In the above example, we're passing a small function, `x => x[0]`, to the matcher as a
second argument. Its return value replaces the matcher's output, so the parser now
returns `"tim"` instead of a node for this matcher.

We can use this function creatively by outputting full AST nodes, for instance
ones that resemble Babel's output:

```js
const identifier = match('identifier', (x) => ({
  type: 'Identifier',
  name: x[0],
}))`
  ${/[A-Za-z_]\w*/}
`;

parse(identifier)('var_name'); // { type: "Identifier", name: "var_name" }
```

We've now entirely changed the output of the parser for this matcher. Given that each
matcher can change its output, we're free to change the parser's output entirely.
By **returning a falsy value** from this function, we can also make the matcher
fail, which causes other matchers to treat it as a mismatch!

```js
import match, { parse } from 'reghex';

const name = match('name', (x) => {
  // x is the matched node, e.g. ["tom", .tag = "name"]
  return x[0] !== 'tim' ? x : undefined;
})`
  ${/\w+/}
`;

const hello = match('hello')`
  ${/hello /} ${name}
`;

parse(hello)('hello tom'); // ["hello", ["tom", .tag = "name"], .tag = "hello"]
parse(hello)('hello tim'); // undefined
```
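The sketch below is one way a transform function could slot into a matcher. The node is built first and passed to the transform. A falsy return value is then treated as a mismatch that rolls back the index. `regexMatcher`, `run`, and `state` are hypothetical helpers, and the sketch does not claim to mirror `reghex`'s generated code:

```javascript
// Hypothetical shared parser state.
const state = { input: '', index: 0 };

// Build a tagged-node matcher for one sticky regex, with an optional transform.
function regexMatcher(tag, regex, transform = (node) => node) {
  return () => {
    const start = state.index;
    regex.lastIndex = state.index;
    const match = regex.exec(state.input);
    if (!match) return undefined;
    state.index = regex.lastIndex;

    const node = [match[0]];
    node.tag = tag;
    const output = transform(node);
    if (!output) {
      state.index = start; // a falsy result is treated as a mismatch
      return undefined;
    }
    return output;
  };
}

// Rejects "tim", like the `name` matcher in the example above.
const name = regexMatcher('name', /\w+/y, (x) => (x[0] !== 'tim' ? x : undefined));

// Replaces the node with a Babel-like AST object.
const identifier = regexMatcher('identifier', /[A-Za-z_]\w*/y, (x) => ({
  type: 'Identifier',
  name: x[0],
}));

function run(matcher, input) {
  state.input = input;
  state.index = 0;
  return matcher();
}

const tom = run(name, 'tom');
console.log(tom[0], tom.tag); // tom name
console.log(run(name, 'tim'), state.index); // undefined 0
console.log(run(identifier, 'var_name')); // { type: 'Identifier', name: 'var_name' }
```

The sketch uses a plain truthiness check, so a transform that returns `0` or `''` would also count as a mismatch.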

Lastly, if we need to create these special array nodes ourselves, we can use `reghex`'s
`tag` export for this purpose.

```js
import { tag } from 'reghex';

tag(['test'], 'node_name');
// ["test", .tag = "node_name"]
```
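Outside of `reghex`, a tagged node behaves like an ordinary array with one extra property. The `makeNode` and `toPlain` helpers below are hypothetical and written only for illustration. `makeNode` is not `reghex`'s `tag` implementation. One consequence worth knowing: `JSON.stringify` only serializes array elements, so the `tag` property is dropped when a node is converted to JSON.

```javascript
// Hypothetical stand-in for a tagging helper: adds a `tag` to an array.
function makeNode(children, tagName) {
  children.tag = tagName;
  return children;
}

const node = makeNode(['test'], 'node_name');

console.log(Array.isArray(node), node.length, node.tag); // true 1 node_name
// Non-index properties are not serialized, so the tag is lost in JSON.
console.log(JSON.stringify(node)); // ["test"]

// One way to keep tags when serializing: convert nodes recursively.
function toPlain(value) {
  if (!Array.isArray(value)) return value;
  return { tag: value.tag, children: value.map((child) => toPlain(child)) };
}
console.log(JSON.stringify(toPlain(node))); // {"tag":"node_name","children":["test"]}
```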

**That's it! May the RegExp be ever in your favor.**