As good as LALR but the implementation isn't embarassing. (Still
pretty bad though.)
Honestly the next thing to do is to delete LALR and just use Pager's
and also rebuild ConfigSet et al to be ItemSet so that Pager's alg
can go even faster. I think I want to keep LR1 just for completeness
so I might as well not delete SLR and LR0, although I *could* I
suppose.
Split the rules for pre- and post- trivia, understand when we want to
do either, handle multi-line-break (in an unsatisfying way, I guess)
but otherwise lay the groundwork for thinking about it better.
Also now we don't generate lazy "Text" nodes because I thought I might
want to actually look at the newlines in the source but I don't yet.
I *can* now, though. (I can also detect EOF so there's that.)
This means that forced breaks from comments don't screw up the
following single-line things. But this still isn't right; we need to
fine tune how we represent trivia.
I think I want to start thinking about "leftmost" and "rightmost" and
it's just easier and faster if cons has an actual list in it instead
of dotted pairs.
This required rebuilding the matcher compiler significantly, and it
was a lot a lot of work. But now we don't generate so many spurious
newlines and the document we produce is a lot a lot better.
It's weird that it counts against the line length though, like if you
were going to break you could ignore it right? At least, for the
grammar I'm working here....
Now we're cooking with gas ALTHOUGH now we have to deal with the fact
that we're gluing everything together where there *should* be spaces.
Many more improvements to come.
Remember that tree levels are generated by context free languages, not
regular languages, and so they can only be recognized by push-down
automatons, not finite state machines.
What happened was that I failed to account for transparent rules.
Without transparent rules the children of a tree node do not have any
recursion in them (by definition!) and so therefore *are* a regular
language. But transparent rules change that: there *can be* recursion
hidden on the same tree level, and it should have been clear from a
moment's reflection that the recursion there meant that tree levels
were once again a context free language.
Fortunately we have a recognizer for context free languages lying
around, so we can just use that I guess.
Still very garbage but I think the "hard" part of building a Wadler
document from a parse tree might be there. It's a backtracking matcher
which might turn out to be too slow for alternatives but maybe will be
fine?
Still needs lots of tests.
Instead of trying to build a regular expression string, just build a
structured thing with seq() and choice() and whatnot. This is
technically uglier but fixes a problem I found with comment regular
expressions so you know, it works, which is better than not working.
Also now tokens get named and maybe that's good? It's so hard to say.