From 953d2ee2a0f73631c2d7efaa4e37bd3a5f3cad1d Mon Sep 17 00:00:00 2001
From: John Doty
Date: Sat, 2 Nov 2024 09:29:37 -0700
Subject: [PATCH 1/4] [dingus] about.html

---
 dingus/about.md | 209 ++++++++++++++++++++++++++++++++++++++++++++++++
 makefile        |   9 ++-
 2 files changed, 217 insertions(+), 1 deletion(-)
 create mode 100644 dingus/about.md

diff --git a/dingus/about.md b/dingus/about.md
new file mode 100644
index 0000000..a573b9c
--- /dev/null
+++ b/dingus/about.md
@@ -0,0 +1,209 @@
+% About The Grammar Dingus
+
+
+(This is a demo)[index.html] for a (library)[https://github.com/decarabas/lrparsers]
+about doing fun things with grammars.
+
+## How to Use The Dingus
+
+- Define your grammar in the left hand pane in python.
+- Write some text in your language in the middle pane.
+- Poke around the tree and errors on the right hand side.
+
+## Making Grammars
+
+To get started, create a grammar that derives from the `Grammar`
+class. Create one method per non-terminal, decorated with the `rule`
+decorator. Here's an example:
+
+```python {.numberLines}
+    from parser import *
+
+    class SimpleGrammar(Grammar):
+        start = "expression"
+
+        @rule
+        def expression(self):
+            return seq(self.expression, self.PLUS, self.term) | self.term
+
+        @rule
+        def term(self):
+            return seq(self.LPAREN, self.expression, self.RPAREN) | self.ID
+
+        PLUS = Terminal('+')
+        LPAREN = Terminal('(')
+        RPAREN = Terminal(')')
+        ID = Terminal(
+            Re.seq(
+                Re.set(("a", "z"), ("A", "Z"), "_"),
+                Re.set(("a", "z"), ("A", "Z"), ("0", "9"), "_").star(),
+            ),
+        )
+```
+
+Terminals can be plain strings or regular expressions constructed with
+the `Re` object. (Ironically, I guess this library is not clever
+enough to parse a regular expression string into one of these
+structures. If you want to build one, go nuts! It's just Python, you
+can do whatever you want so long as the result is an `Re` object.)
+
+Productions can be built out of terminals and non-terminals,
+concatenated with the `seq` function or the `+` operator. Alternatives
+can be expressed with the `alt` function or the `|` operator. These
+things can be freely nested, as desired.
+
+There are no helpers (yet!) for consuming lists, so they need to be
+constructed in the classic context-free grammar way:
+
+```python {.numberLines}
+    class NumberList(Grammar):
+        start = "list"
+
+        @rule
+        def list(self):
+            return self.NUMBER | (self.list + self.COMMA + self.NUMBER)
+
+        NUMBER = Terminal(Re.set(("0", "9")).plus())
+        COMMA = Terminal(',')
+```
+
+(Unlike with PEGs, you can write grammars with left or right-recursion,
+without restriction, either is fine.)
+
+When used to generate a parser, the grammar describes a concrete
+syntax tree. Unfortunately, that means that the list example above
+will generate a very awkward tree for `1,2,3`:
+
+```
+list
+  list
+    list
+      NUMBER ("1")
+    COMMA
+    NUMBER ("2")
+  COMMA
+  NUMBER ("3")
+```
+
+In order to make this a little cleaner, rules can be "transparent",
+which means they don't generate nodes in the tree and just dump their
+contents into the parent node instead.
+
+```python {.numberLines}
+    class NumberList(Grammar):
+        start = "list"
+
+        @rule
+        def list(self):
+            # The starting rule can't be transparent: there has to be something to
+            # hold on to!
+            return self.transparent_list
+
+        @rule(transparent=True)
+        def transparent_list(self) -> Rule:
+            return self.NUMBER | (self.transparent_list + self.COMMA + self.NUMBER)
+
+        NUMBER = Terminal(Re.set(("0", "9")).plus())
+        COMMA = Terminal(',')
+```
+
+This grammar will generate the far more useful tree:
+
+```
+list
+  NUMBER ("1")
+  COMMA
+  NUMBER ("2")
+  COMMA
+  NUMBER ("3")
+```
+
+Rules that start with `_` are also interpreted as transparent,
+following the lead set by tree-sitter, and so the grammar above is
+probably better-written as:
+
+```python {.numberLines}
+    class NumberList(Grammar):
+        start = "list"
+
+        @rule
+        def list(self):
+            return self._list
+
+        @rule
+        def _list(self):
+            return self.NUMBER | (self._list + self.COMMA + self.NUMBER)
+
+        NUMBER = Terminal(Re.set(("0", "9")).plus())
+        COMMA = Terminal(',')
+```
+
+That will generate the same tree, but a little more succinctly.
+
+### Trivia
+
+Most folks that want to parse something want to skip blanks when they
+do it. Our grammars don't say anything about that by default (sorry),
+so you probably want to be explicit about such things.
+
+To allow (and ignore) spaces, newlines, tabs, and carriage-returns in
+our number lists, we would modify the grammar as follows:
+
+```python {.numberLines}
+    class NumberList(Grammar):
+        start = "list"
+        trivia = ["BLANKS"] # <- Add a `trivia` member
+
+        @rule
+        def list(self):
+            return self._list
+
+        @rule
+        def _list(self):
+            return self.NUMBER | (self._list + self.COMMA + self.NUMBER)
+
+        NUMBER = Terminal(Re.set(("0", "9")).plus())
+        COMMA = Terminal(',')
+
+        BLANKS = Terminal(Re.set(" ", "\t", "\r", "\n").plus())
+        # ^ and add a new terminal to describe it
+```
+
+Now we can parse a list with spaces! "1 , 2, 3" will parse happily
+into:
+
+```
+list
+  NUMBER ("1")
+  COMMA
+  NUMBER ("2")
+  COMMA
+  NUMBER ("3")
+```
+
+### Error recovery
+
+In order to get good error recovery, you have to... do nothing.
+
+The parser runtime we're using here uses a non-interactive version of
+[CPCT+](https://tratt.net/laurie/blog/2020/automatic_syntax_error_recovery.html).
+
+I find that it actually works quite well! If you're skeptical that a
+machine-generated parser can do well enough for, say, an LSP, give
+your favorite examples a try here. You might be surprised.
+
+(Go ahead, give it some of your [favorite examples of resilient
+parsing](https://matklad.github.io/2023/05/21/resilient-ll-parsing-tutorial.html)
+and see how it does. I would love to see examples of where the
+recovery went fully off the rails!)
+
+### Syntax highlighting
+
+*You can annotate the terminals and nonterminals to generate syntax
+highlighting but the dingus doesn't have it wired into the editors
+yet.*
+
+### Pretty-printing
+
+*You can annotate the grammar with rules for pretty printing but the
+dingus doesn't expose it yet.*
diff --git a/makefile b/makefile
index a6d04c0..304a9fa 100644
--- a/makefile
+++ b/makefile
@@ -24,10 +24,17 @@ dist/lrparsers-$(VERSION).tar.gz dist/lrparsers-$(VERSION)-py3-none-any.whl: pyp
 clean:
 	rm -rf ./dist
 	rm -rf ./dingus/wheel/*
+	rm ./dingus/about.html
+
+# TODO: Get the built dingus artifacts out of the tree :P
+# Use hard-links to make editing pleasant.
 
 .PHONY: dingus
-dingus: dingus/wheel/lrparsers-$(VERSION)-py3-none-any.whl
+dingus: dingus/wheel/lrparsers-$(VERSION)-py3-none-any.whl dingus/about.html
 	python3 ./dingus/srvit.py
 
+dingus/about.html: dingus/about.md
+	pandoc $< -o $@ -s
+
 dingus/wheel/lrparsers-$(VERSION)-py3-none-any.whl: dist/lrparsers-$(VERSION)-py3-none-any.whl
 	cp $< $@

From 3d9c3b2c99b04c920fd297d46bd08f98c726b170 Mon Sep 17 00:00:00 2001
From: John Doty
Date: Sat, 2 Nov 2024 09:56:49 -0700
Subject: [PATCH 2/4] [dingus] Fix links in about page

---
 dingus/about.md                    | 2 +-
 dingus/{dingus.html => index.html} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename dingus/{dingus.html => index.html} (100%)

diff --git a/dingus/about.md b/dingus/about.md
index a573b9c..3290b5a 100644
--- a/dingus/about.md
+++ b/dingus/about.md
@@ -1,7 +1,7 @@
 % About The Grammar Dingus
 
 
-(This is a demo)[index.html] for a (library)[https://github.com/decarabas/lrparsers]
+[This is a demo](index.html) for a [library](https://github.com/decarabas/lrparsers)
 about doing fun things with grammars.
 
 ## How to Use The Dingus
diff --git a/dingus/dingus.html b/dingus/index.html
similarity index 100%
rename from dingus/dingus.html
rename to dingus/index.html

From fb181667b5341dd1eb7e95fb0ed8e272899c5618 Mon Sep 17 00:00:00 2001
From: John Doty
Date: Sat, 2 Nov 2024 09:57:17 -0700
Subject: [PATCH 3/4] [dingus] Home tweaks

---
 dingus/index.html |  8 +++++---
 dingus/style.css  | 14 ++++++++++++--
 2 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/dingus/index.html b/dingus/index.html
index 82caab3..50ebcd6 100644
--- a/dingus/index.html
+++ b/dingus/index.html
@@ -4,17 +4,19 @@ Dingus - -
+
+

Grammar Dingus

+ What is this? +
+
Write Your Grammar Here
diff --git a/dingus/style.css b/dingus/style.css
index 2da25ab..d2244c0 100644
--- a/dingus/style.css
+++ b/dingus/style.css
@@ -1,14 +1,18 @@
-/* set codemirror ide height to 100% of the textarea */
 body {
   height: 100vh;
   box-sizing: border-box;
   margin: 0;
 }
 
+h1 {
+  margin-top: 0;
+  margin-bottom: 0.25rem;
+}
+
 .page-container {
   display: grid;
   grid-template-columns: 1fr 1fr 1fr;
-  grid-template-rows: 4rem 2rem 1fr 2rem;
+  grid-template-rows: auto 2rem 1fr 2rem;
   width: 100%;
   height: 100%;
 }
@@ -33,6 +37,12 @@ body {
   border-top: 1px solid;
 }
 
+.page-title {
+  grid-column: 1 / 4;
+  grid-row: 1;
+  padding: 0.5rem;
+}
+
 .grammar-title {
   grid-column: 1;
   grid-row: 2;

From 42183be0f4b327c85a68ce82ab17ad7cdc6c053d Mon Sep 17 00:00:00 2001
From: John Doty
Date: Sat, 2 Nov 2024 09:57:26 -0700
Subject: [PATCH 4/4] [dingus] Build out of tree

---
 makefile | 40 ++++++++++++++++++++++++++++++++--------
 1 file changed, 32 insertions(+), 8 deletions(-)

diff --git a/makefile b/makefile
index 304a9fa..84f17dd 100644
--- a/makefile
+++ b/makefile
@@ -23,18 +23,42 @@ dist/lrparsers-$(VERSION).tar.gz dist/lrparsers-$(VERSION)-py3-none-any.whl: pyp
 .PHONY: clean
 clean:
 	rm -rf ./dist
-	rm -rf ./dingus/wheel/*
-	rm ./dingus/about.html
 
-# TODO: Get the built dingus artifacts out of the tree :P
-# Use hard-links to make editing pleasant.
+DINGUS_FILES=\
+	dingus/srvit.py \
+	dingus/index.html \
+	dingus/dingus.js \
+	dingus/worker.js \
+	dingus/style.css \
+	dingus/codemirror/codemirror.css \
+	dingus/codemirror/codemirror.js \
+	dingus/codemirror/python.js \
+	dingus/pyodide/micropip-0.6.0-py3-none-any.whl \
+	dingus/pyodide/micropip-0.6.0-py3-none-any.whl.metadata \
+	dingus/pyodide/packaging-23.2-py3-none-any.whl \
+	dingus/pyodide/packaging-23.2-py3-none-any.whl.metadata \
+	dingus/pyodide/pyodide.asm.js \
+	dingus/pyodide/pyodide.asm.wasm \
+	dingus/pyodide/pyodide-core-0.26.2.tar \
+	dingus/pyodide/pyodide.d.ts \
+	dingus/pyodide/pyodide.js \
+	dingus/pyodide/pyodide-lock.json \
+	dingus/pyodide/pyodide.mjs \
+	dingus/pyodide/python_stdlib.zip \
+
+DINGUS_TARGETS=$(addprefix dist/, $(DINGUS_FILES))
 
 .PHONY: dingus
-dingus: dingus/wheel/lrparsers-$(VERSION)-py3-none-any.whl dingus/about.html
-	python3 ./dingus/srvit.py
+dingus: $(DINGUS_TARGETS) dist/dingus/wheel/lrparsers-$(VERSION)-py3-none-any.whl dist/dingus/about.html
+	python3 ./dist/dingus/srvit.py
 
-dingus/about.html: dingus/about.md
+dist/dingus/%: dingus/%
+	mkdir -p $(dir $@)
+	ln $< $@
+
+dist/dingus/about.html: dingus/about.md
 	pandoc $< -o $@ -s
 
-dingus/wheel/lrparsers-$(VERSION)-py3-none-any.whl: dist/lrparsers-$(VERSION)-py3-none-any.whl
+dist/dingus/wheel/lrparsers-$(VERSION)-py3-none-any.whl: dist/lrparsers-$(VERSION)-py3-none-any.whl
+	mkdir -p $(dir $@)
 	cp $< $@
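
Note on patch 4/4: for readers less used to make pattern rules, the `dist/dingus/%: dingus/%` rule (create the parent directory, then hard-link the source into `dist/`) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `link_into_dist` is a hypothetical helper that does not exist in the repository, and skipping files that already exist is a simplification of make's up-to-date check.

```python
import os
import tempfile


def link_into_dist(files, project_root):
    """Hard-link each listed file to <project_root>/dist/<same relative path>.

    Hypothetical helper mirroring the recipe of `dist/dingus/%: dingus/%`.
    Returns the list of links that were created.
    """
    linked = []
    for rel in files:
        src = os.path.join(project_root, rel)
        dst = os.path.join(project_root, "dist", rel)
        if os.path.exists(dst):
            # Simplification: treat an existing target as up to date.
            continue
        os.makedirs(os.path.dirname(dst), exist_ok=True)  # mkdir -p $(dir $@)
        os.link(src, dst)                                 # ln $< $@
        linked.append(dst)
    return linked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "dingus"))
        src = os.path.join(root, "dingus", "style.css")
        with open(src, "w") as f:
            f.write("body {}\n")

        (dst,) = link_into_dist(["dingus/style.css"], root)

        # Both names refer to the same file, so an in-place edit of the
        # source is immediately visible through the dist/ path.
        with open(src, "a") as f:
            f.write("h1 {}\n")
        with open(dst) as f:
            print(f.read())
```

Because both paths name the same file, in-place edits under `dingus/` show up in `dist/dingus/` without a rebuild, which seems to be the point of the earlier "Use hard-links to make editing pleasant" comment. An editor that saves by writing a new file and renaming it over the old one would break that link.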