Commit graph

79 commits

Author SHA1 Message Date
179405d849 [parser] Remove embarrassing debug output 2024-11-03 07:04:11 -08:00
13f1353134 [runtime] When all else fails ask the author
Better error messages by allowing the author to customize the string.
(This kinda works actually.)
2024-10-28 06:24:47 -07:00
c44083b610 [runtime] Slightly better messages
Picking the final reduction seems to get to a better place.
2024-10-28 06:06:23 -07:00
063584fb7e [runtime] Better error messages?
(Sometimes.)
2024-10-27 09:10:31 -07:00
0a0f7b3612 [parser] Comment cleanup, documentation cleanup 2024-10-27 08:36:16 -07:00
385c378edb [parser] Everything is an ItemSet now 2024-10-26 07:51:13 -07:00
923b01f6fd [parser] Simplify StateGraph 2024-10-26 07:35:28 -07:00
27e6bb413c [parser] Remove Canonical LR1 generator
This is fine probably.
2024-10-26 07:25:37 -07:00
2b72811486 [parser] ConfigurationSetInfo -> StateGraph 2024-10-26 06:56:30 -07:00
e501caa073 [parser] Remove unused import 2024-10-26 06:53:53 -07:00
e55bc140f9 [parser] Move ItemSet 2024-10-26 06:53:36 -07:00
2d5c73f0b0 [parser] Remove LR0 and SLR1
Sorry, when this was educational it was nice to have the other
generators but as part of cleaning I'm just getting rid of them.
2024-10-15 07:43:52 -07:00
bb94fc6c9c [parser] clean clean clean 2024-10-11 07:52:48 -07:00
2656a1d328 [parser] Remove bad LALR implementation, start cleanup 2024-10-10 07:58:16 -07:00
eef1db72da [parser] Pager's algorithm. Faster.
As good as LALR but the implementation isn't embarrassing. (Still
pretty bad though.)

Honestly the next thing to do is to delete LALR and just use Pager's
and also rebuild ConfigSet et al to be ItemSet so that Pager's alg
can go even faster. I think I want to keep LR1 just for completeness
so I might as well not delete SLR and LR0, although I *could* I
suppose.
2024-10-05 16:00:41 -07:00
bb52ab8da5 [parser] Error recovery tests
Based on the blog post "Resilient LL Parsing Tutorial" by Alex Kladov, at
https://matklad.github.io/2023/05/21/resilient-ll-parsing-tutorial.html

Because I was trying to be "simple" in my grammar definition I found
a bug in the grammar class, whoops! :)
2024-09-22 08:46:54 -07:00
071cd29d8f [readme] Rewrite the readme and add a helper
The helper is nice actually.
2024-09-21 08:45:49 -07:00
b1f4c56f49 [wadler] One more bit of writing. 2024-09-21 07:45:59 -07:00
1a3ce02d48 [wadler] Re-factor into multiple modules
Hard split between builder and runtime, as is proper.
2024-09-21 07:42:52 -07:00
1f84752538 [wadler] Refactor: data and runtime split
Now we convert the grammar into data for a pretty-printer, so in
theory we could write the pretty-printer in a different language.
2024-09-21 06:44:53 -07:00
e4585170d8 [wadler] Fixup EOF trivia
Trivia handling feels pretty good now actually.
2024-09-19 20:23:02 -07:00
8a17cfd586 [wadler] Prettier handling of trivia
Split the rules for pre- and post- trivia, understand when we want to
do either, handle multi-line-break (in an unsatisfying way, I guess)
but otherwise lay the groundwork for thinking about it better.

Also now we don't generate lazy "Text" nodes because I thought I might
want to actually look at the newlines in the source but I don't yet.
I *can* now, though. (I can also detect EOF so there's that.)
2024-09-19 16:39:32 -07:00
c31d527077 [wadler] Trivia escapes groups
This means that forced breaks from comments don't screw up the
following single-line things. But this still isn't right; we need to
fine tune how we represent trivia.
2024-09-15 08:51:18 -07:00
9d55588a35 [wadler] Cons has a list of documents in it
I think I want to start thinking about "leftmost" and "rightmost" and
it's just easier and faster if cons has an actual list in it instead
of dotted pairs.
2024-09-15 08:12:30 -07:00
d5ccd5b147 Really messing around with trivia, it's not good yet
It's really not clear how to track it and how to compose it with
groups yet. Really very difficult.
2024-09-14 17:14:07 -07:00
f3a4c4348a Custom indentation 2024-09-14 07:28:18 -07:00
39ae937ddc Comments
Not really.
2024-09-13 23:16:46 -07:00
c8fef52c0c Support prefix newlines, stop generating empty productions
This required rebuilding the matcher compiler significantly, and it
was a lot a lot of work. But now we don't generate so many spurious
newlines and the document we produce is a lot a lot better.
2024-09-13 22:41:33 -07:00
709ba060b4 Hack for metadata in the document 2024-09-13 16:53:51 -07:00
8845bcfdab Fix accidental group duplication
Leading to unnecessary ambiguities
2024-09-13 15:28:50 -07:00
ca240906c9 Remove the text follow stuff
I think we have a good solution to the problem.
2024-09-13 12:02:44 -07:00
d7a6891519 Finish annotating test grammar, forced breaks, fixes
Forced breaks force a newline in a spot, which is sometimes what we
want. (Like, this syntax should *never* be on a single line.)
2024-09-13 11:57:16 -07:00
938f0e5c69 Support newline replacements
This allows us to do maybe more complicated spacing.

Still unclear about identifier/punctuation spacing.
2024-09-12 11:09:14 -07:00
b3b2102864 Record trivia in tokens
This will make our formatting better I think.
2024-09-12 06:22:49 -07:00
276449287d Allow for text to follow tokens in pretty-printing
It's weird that it counts against the line length though, like if you
were going to break you could ignore it right? At least, for the
grammar I'm working on here....
2024-09-11 11:22:41 -07:00
d6dd54f4df Actual pretty-printing!
Now we're cooking with gas ALTHOUGH now we have to deal with the fact
that we're gluing everything together where there *should* be spaces.

Many more improvements to come.
2024-09-11 11:08:02 -07:00
443bf8bd33 Move formatting meta around, actually mark stuff up 2024-09-10 11:47:22 -07:00
7edf5e06bf Rebuild the matcher on grammars
Well that wasn't so bad now was it? Eh? Nice to have a parser
generator lying around. Let's keep working to see if I can actually
finish it.
2024-09-09 11:40:14 -07:00
1d28c82007 Saving this for posterity, but it is doomed
Remember that tree levels are generated by context free languages, not
regular languages, and so they can only be recognized by push-down
automata, not finite state machines.

What happened was that I failed to account for transparent rules.
Without transparent rules the children of a tree node do not have any
recursion in them (by definition!) and so therefore *are* a regular
language. But transparent rules change that: there *can be* recursion
hidden on the same tree level, and it should have been clear from a
moment's reflection that the recursion there meant that tree levels
were once again a context free language.

Fortunately we have a recognizer for context free languages lying
around, so we can just use that I guess.
2024-09-09 06:23:25 -07:00
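
To make the argument in 1d28c82007 concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration and not taken from the repository: the `Node` class, the convention that a leading underscore marks a transparent rule, and the toy rule `_nested: "(" _nested ")" | "x"`. It shows that once transparent nodes are spliced into their parent, a single tree level can hold the sequence `(` repeated n times, `x`, then `)` repeated n times, which needs a counter (a stack) to recognize and so is not a regular language.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical tree node: children are either Node instances or token strings.
    name: str
    children: list = field(default_factory=list)


def is_transparent(name: str) -> bool:
    # Assumption for this sketch: a leading underscore marks a transparent rule.
    return name.startswith("_")


def flatten(children):
    """Splice the children of transparent nodes into the parent's level."""
    out = []
    for child in children:
        if isinstance(child, Node) and is_transparent(child.name):
            out.extend(flatten(child.children))
        elif isinstance(child, Node):
            out.append(child.name)
        else:
            out.append(child)
    return out


def nested(depth: int) -> Node:
    # Toy transparent rule: _nested: "(" _nested ")" | "x"
    if depth == 0:
        return Node("_nested", ["x"])
    return Node("_nested", ["(", nested(depth - 1), ")"])


def level_matches(symbols) -> bool:
    """Recognize "(" * n + "x" + ")" * n using a counter as a one-symbol stack."""
    depth = 0
    i = 0
    while i < len(symbols) and symbols[i] == "(":
        depth += 1
        i += 1
    if i >= len(symbols) or symbols[i] != "x":
        return False
    i += 1
    while i < len(symbols) and symbols[i] == ")":
        depth -= 1
        i += 1
    return i == len(symbols) and depth == 0


root = Node("expr", [nested(3)])
level = flatten(root.children)
```

Here `level` is the flat child list of `expr`, and the recursion inside `_nested` has ended up entirely on that one level. A finite state machine cannot check that the parentheses balance for arbitrary depth, which is why the matcher has to be a recognizer for context-free languages.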
0cbf696303 The start rule cannot be transparent 2024-09-09 06:23:11 -07:00
49b76b9bcc Teach trees to format themselves. 2024-09-09 06:22:56 -07:00
a2c6390c23 Start to work on a prettier system.
Still very garbage but I think the "hard" part of building a Wadler
document from a parse tree might be there. It's a backtracking matcher
which might turn out to be too slow for alternatives but maybe will be
fine?

Still needs lots of tests.
2024-09-07 13:02:09 -07:00
00b4cd4702 Starting to look at pretty-printing with the idea of auto-indentation
I wonder if it will work?
2024-09-06 16:23:14 -07:00
d7dfd556ec Emit an emacs major mode
With coloring! Next up: formatting but that might be hard.
2024-09-06 11:51:09 -07:00
4941cd049c Helper routines for generating source code
This includes "signing" source to detect modifications, and
maintaining user-modified sections. Hooray!
2024-09-06 11:50:17 -07:00
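
For readers wondering what "signing" generated source might look like, here is a rough sketch in Python. It is not the helper from 4941cd049c: the function names, the `# signature: ` trailer format, and the choice of SHA-256 are all assumptions. It covers only the detect-modifications half, not the preservation of user-modified sections.

```python
import hashlib

# Hypothetical trailer format; a real generator would use the target
# language's own comment syntax.
SIG_PREFIX = "# signature: "


def sign(body: str) -> str:
    """Append a trailer line holding a hash of everything above it."""
    if not body.endswith("\n"):
        body += "\n"
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return f"{body}{SIG_PREFIX}{digest}\n"


def is_unmodified(signed: str) -> bool:
    """Return True if the text above the trailer still matches its hash."""
    body, sep, tail = signed.rpartition(SIG_PREFIX)
    if not sep:
        # No trailer at all: treat the file as not ours or as modified.
        return False
    expected = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return tail.strip() == expected


generated = sign("x = 1\n")
tampered = generated.replace("x = 1", "x = 2")
```

With this scheme a generator can refuse to overwrite a file when `is_unmodified` returns False, since that suggests someone edited the output by hand.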
23981f82ce Start working on emacs mode generation 2024-09-06 10:20:34 -07:00
501c2e3fbe Teach the highlight meta about emacs face names 2024-09-06 10:20:17 -07:00
676ddedbaf Add the language name to the end of generated scopes 2024-09-05 15:12:55 -07:00
dbf893e48b Generate queries a little better 2024-09-05 15:03:44 -07:00
51c4f14c26 Emit highlight queries for tree-sitter
Now we're starting to get somewhere!
2024-09-05 14:52:35 -07:00