Commit graph

46 commits

Author SHA1 Message Date
3bffe98df0 [parser] one_or_more
Finally making lists easier
2024-11-10 18:47:24 -08:00
5064a768e7 [all] A whole new style for grammars
Say goodbye to the sea of `self.`!
2024-11-09 11:21:30 -08:00
179405d849 [parser] Remove embarrassing debug output 2024-11-03 07:04:11 -08:00
13f1353134 [runtime] When all else fails ask the author
Better error messages by allowing the author to customize the string.
(This kinda works actually.)
2024-10-28 06:24:47 -07:00
0a0f7b3612 [parser] Comment cleanup, documentation cleanup 2024-10-27 08:36:16 -07:00
385c378edb [parser] Everything is an ItemSet now 2024-10-26 07:51:13 -07:00
923b01f6fd [parser] Simplify StateGraph 2024-10-26 07:35:28 -07:00
27e6bb413c [parser] Remove Canonical LR1 generator
This is fine probably.
2024-10-26 07:25:37 -07:00
2b72811486 [parser] ConfigurationSetInfo -> StateGraph 2024-10-26 06:56:30 -07:00
e501caa073 [parser] Remove unused import 2024-10-26 06:53:53 -07:00
e55bc140f9 [parser] Move ItemSet 2024-10-26 06:53:36 -07:00
2d5c73f0b0 [parser] Remove LR0 and SLR1
Sorry, when this was educational it was nice to have the other
generators but as part of cleaning I'm just getting rid of them.
2024-10-15 07:43:52 -07:00
bb94fc6c9c [parser] clean clean clean 2024-10-11 07:52:48 -07:00
2656a1d328 [parser] Remove bad LALR implementation, start cleanup 2024-10-10 07:58:16 -07:00
eef1db72da [parser] Pager's algorithm. Faster.
As good as LALR but the implementation isn't embarrassing. (Still
pretty bad though.)

Honestly the next thing to do is to delete LALR and just use Pager's
and also rebuild ConfigSet et al to be ItemSet so that Pager's alg
can go even faster. I think I want to keep LR1 just for completeness
so I might as well not delete SLR and LR0, although I *could* I
suppose.
2024-10-05 16:00:41 -07:00
bb52ab8da5 [parser] Error recovery tests
Based on the blog post "Resilient LL Parsing Tutorial" by Alex Kladov, at
https://matklad.github.io/2023/05/21/resilient-ll-parsing-tutorial.html

Because I was trying to be "simple" in my grammar definition I found
a bug in the grammar class, whoops! :)
2024-09-22 08:46:54 -07:00
8a17cfd586 [wadler] Prettier handling of trivia
Split the rules for pre- and post- trivia, understand when we want to
do either, handle multi-line-break (in an unsatisfying way, I guess)
but otherwise lay the groundwork for thinking about it better.

Also now we don't generate lazy "Text" nodes because I thought I might
want to actually look at the newlines in the source but I don't yet.
I *can* now, though. (I can also detect EOF so there's that.)
2024-09-19 16:39:32 -07:00
d5ccd5b147 Really messing around with trivia, it's not good yet
It's really not clear how to track it and how to compose it with
groups yet. Really very difficult.
2024-09-14 17:14:07 -07:00
d7a6891519 Finish annotating test grammar, forced breaks, fixes
Forced breaks force a newline in a spot, which is sometimes what we
want. (Like, this syntax should *never* be on a single line.)
2024-09-13 11:57:16 -07:00
d6dd54f4df Actual pretty-printing!
Now we're cooking with gas ALTHOUGH now we have to deal with the fact
that we're gluing everything together where there *should* be spaces.

Many more improvements to come.
2024-09-11 11:08:02 -07:00
443bf8bd33 Move formatting meta around, actually mark stuff up 2024-09-10 11:47:22 -07:00
0cbf696303 The start rule cannot be transparent 2024-09-09 06:23:11 -07:00
00b4cd4702 Starting to look at pretty-printing with the idea of auto-indentation
I wonder if it will work?
2024-09-06 16:23:14 -07:00
501c2e3fbe Teach the highlight meta about emacs face names 2024-09-06 10:20:17 -07:00
51c4f14c26 Emit highlight queries for tree-sitter
Now we're starting to get somewhere!
2024-09-05 14:52:35 -07:00
be8e017fd9 Fix regex generation, extras 2024-09-05 06:32:28 -07:00
591da0c971 Rework highlighting metadata GOD 2024-09-02 08:50:36 -07:00
0354fbf4a4 More ways of writing
Sometimes prettier
2024-09-01 06:52:13 -07:00
3012df4ac6 Precedence but it doesn't work
Tree-sitter doesn't let me do token-based precedence? I don't like
tree-sitter's "make it inline but give it a number" system; it seems like
a bug farm to me.
2024-08-31 07:22:49 -07:00
2d87207b54 Some small tweaks 2024-08-30 09:04:18 -07:00
80d932b36a Refactor to use non_terminals() 2024-08-29 08:23:55 -07:00
f8b62bf4a4 Terminal 'value' is 'name', compile_lexer is method 2024-08-29 08:22:23 -07:00
344dde51be Grammar start symbol is public 2024-08-29 08:12:08 -07:00
dc03bf7373 Grammars can be named 2024-08-29 08:00:40 -07:00
abcb0e516a OptionalRule is not required but MetadataRule is 2024-08-28 08:33:32 -07:00
02c1aa507e Muck around with usability 2024-08-28 08:27:46 -07:00
49ad7fdb52 Associate metadata with terminals
This is a half-assed attempt at doing syntax coloring which I think
will almost certainly turn out to be insufficient. I'm committing it
just to record some of the work I've done but. BUT.

Probably trying to match tree-sitter is a better way of doing
this. (But, like, emitting tree-sitter grammars? Really? Wow, dude.
Way to give up.)
2024-08-27 15:43:07 -07:00
76ef85483e Accept is single-valued, the multi-value thing didn't ever make sense
I mean, it did when we thought we were going to weave NFA states as we
were building them but we ended up not doing that and instead just
using the fancy EdgeList splitting magic when building DFAs from the
NFA. It has the same power and is simpler code, and also means that
we'll *never* be asked to have multiple Terminals be accepted from a
single NFA state.
2024-08-27 15:43:01 -07:00
7a5f17f74b Specify and honor trivia tokens
e.g. "this is how machine-generated parsers know to skip blanks and
comments"

The run time implementation could be better; we don't really want to
just discard trivia because it's useful for e.g. doc comments and the
like. BUT for now this is fine.
2024-08-24 10:01:40 -07:00
0c952e4905 Correct NFA construction
There was a bug in the way that I was converting regular expressions
to NFAs. I'm still not entirely sure what was going on, but I
re-visited the construction and made it follow the literature more
closely and it fixed the problem.
2024-08-24 09:24:29 -07:00
454e6fd6fd Regex API "improvements"
I mean, is it better than a regex parser? No, probably not.
2024-08-24 08:35:45 -07:00
6d6aabdeb3 Terminal name must be explicit on construction 2024-08-24 08:35:10 -07:00
72052645d6 Generated lexers actually kinda work
But regular expressions are underpowered and verbose
2024-08-23 15:32:35 -07:00
58c3004702 Move terminals into grammar definition
Starting to work on machine-generated lexers too
2024-08-23 07:24:30 -07:00
e04aa1966e Start moving the examples into tests 2024-06-15 07:52:16 -07:00
7b3c94c469 Move things into more modules
It will help with testing and profiling
It breaks pyright (it's probably time to abandon pyright)
2024-06-10 05:50:09 -07:00
Renamed from parser.py