Remember that tree levels are context-free languages, not regular
languages, and so they can only be recognized by push-down automata,
not finite state machines.
What happened was that I failed to account for transparent rules.
Without transparent rules the children of a tree node have no
recursion in them (by definition!) and so *are* a regular language.
But transparent rules change that: there *can be* recursion hidden on
the same tree level, and it should have been clear from a moment's
reflection that this recursion makes tree levels a context-free
language once again.
Fortunately we have a recognizer for context-free languages lying
around, so we can just use that, I guess.
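To make the transparent-rule problem concrete, here is a minimal sketch
with a made-up grammar. The rule names, the leading-underscore convention
for "transparent", and the tuple-shaped tree are all assumptions for
illustration, not this project's actual representation. The children of
`block` end up being `(`^n `x` `)`^n, which no finite state machine can
recognize.

```python
# Hypothetical grammar; a leading "_" marks a transparent rule here, meaning
# its children are spliced into the parent instead of getting their own node.
#   block   : _parens
#   _parens : "(" _parens ")" | "x"

def parse_parens(tokens, pos=0):
    """Recursive-descent parse of _parens; returns (children, next_pos)."""
    if pos < len(tokens) and tokens[pos] == "x":
        return ["x"], pos + 1
    if pos < len(tokens) and tokens[pos] == "(":
        inner, pos = parse_parens(tokens, pos + 1)
        if pos < len(tokens) and tokens[pos] == ")":
            # No node is created for _parens: the inner children land on
            # the same tree level as the surrounding parentheses.
            return ["("] + inner + [")"], pos + 1
    raise SyntaxError(f"unexpected input at position {pos}")

def parse_block(tokens):
    """Parse a whole `block`; all children sit on one flat tree level."""
    children, pos = parse_parens(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return ("block", children)
```

The recursion lives in the grammar, but after splicing it is only visible
as a balanced sequence of siblings, which is why a regular-expression
matcher over the children is not enough.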
Still very rough, but I think the "hard" part of building a Wadler
document from a parse tree might be there. It's a backtracking
matcher, which might turn out to be too slow for alternatives, but
maybe it will be fine?
Still needs lots of tests.
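For reference, a backtracking matcher over a list of child names can be
sketched roughly like this. The tuple-based pattern shape (`"lit"`,
`"seq"`, `"alt"`, `"star"`) is an assumption made up for this example and
is not the matcher in this commit; it only shows where the backtracking
cost in alternatives comes from.

```python
def match(pattern, items, pos=0):
    """Yield every end position at which `pattern` can match items[pos:]."""
    kind = pattern[0]
    if kind == "lit":
        if pos < len(items) and items[pos] == pattern[1]:
            yield pos + 1
    elif kind == "seq":
        parts = pattern[1:]
        if not parts:
            yield pos
        else:
            # Try every way the first part can match, then the rest.
            for mid in match(parts[0], items, pos):
                yield from match(("seq",) + parts[1:], items, mid)
    elif kind == "alt":
        # Alternatives are tried in order; each one may be re-entered
        # later when a following part fails, hence the backtracking.
        for option in pattern[1:]:
            yield from match(option, items, pos)
    elif kind == "star":
        yield pos  # zero repetitions
        for mid in match(pattern[1], items, pos):
            if mid > pos:  # guard against looping on empty matches
                yield from match(pattern, items, mid)
    else:
        raise ValueError(f"unknown pattern kind: {kind}")

def matches(pattern, items):
    """True if `pattern` matches the whole of `items`."""
    return any(end == len(items) for end in match(pattern, items))
```

Because the generators are lazy, a successful match stops early, but a
failing match still explores every combination of alternatives.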
Instead of trying to build a regular expression string, just build a
structured thing with seq() and choice() and whatnot. This is
technically uglier, but it fixes a problem I found with comment
regular expressions, so, you know, it works, which is better than not
working. Also, tokens now get named, and maybe that's good? It's so
hard to say.
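A rough sketch of what "a structured thing" could look like. The class
names, the `repeat()` helper, and the `to_regex()` renderer are
hypothetical and only mirror the seq()/choice() idea; the real node types
in this change may differ. The point is that literals are kept as data
and escaped in one place, instead of being pasted into a string by hand.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Lit:
    text: str

@dataclass(frozen=True)
class Seq:
    parts: tuple

@dataclass(frozen=True)
class Choice:
    options: tuple

@dataclass(frozen=True)
class Repeat:
    inner: object

def _wrap(x):
    # Bare strings are treated as literals for convenience.
    return Lit(x) if isinstance(x, str) else x

def seq(*parts):
    return Seq(tuple(_wrap(p) for p in parts))

def choice(*options):
    return Choice(tuple(_wrap(o) for o in options))

def repeat(inner):
    return Repeat(_wrap(inner))

def to_regex(node):
    """Render a node as a Python regular expression string."""
    if isinstance(node, Lit):
        return re.escape(node.text)  # escaping happens in exactly one place
    if isinstance(node, Seq):
        return "".join(to_regex(p) for p in node.parts)
    if isinstance(node, Choice):
        return "(?:" + "|".join(to_regex(o) for o in node.options) + ")"
    if isinstance(node, Repeat):
        return "(?:" + to_regex(node.inner) + ")*"
    raise TypeError(f"not a pattern node: {node!r}")

# A toy block comment over a tiny alphabet, just to exercise the pieces.
comment = seq("/*", repeat(choice("a", "b", " ")), "*/")
```

With this shape, `re.fullmatch(to_regex(comment), "/* ab */")` succeeds
without anyone having to remember that `*` needs a backslash.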
This allows most of our precedence to be re-used. There are some cases
still where tree-sitter gets confused (and we don't), see the
corresponding change to grammar.py. I wish I knew how to fix this but
I don't. :(
tree-sitter doesn't let me do token-based precedence? I don't like
tree-sitter's "make it inline but give it a number" system; it seems
like a bug farm to me.
This is a half-assed attempt at doing syntax coloring which I think
will almost certainly turn out to be insufficient. I'm committing it
just to record some of the work I've done but. BUT.
Probably trying to match tree-sitter is a better way of doing
this. (But, like, emitting tree-sitter grammars? Really? Wow, dude.
Way to give up.)
I mean, it did when we thought we were going to weave NFA states as we
were building them but we ended up not doing that and instead just
using the fancy EdgeList splitting magic when building DFAs from the
NFA. It has the same power and is simpler code, and also means that
we'll *never* be asked to have multiple Terminals be accepted from a
single NFA state.
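For readers who have not seen the range-splitting trick: the general idea
is to cut overlapping character ranges into disjoint pieces, each carrying
the set of targets that covered it. The sketch below is a generic version
of that technique with made-up names; it is not the project's EdgeList
and makes no claim about how EdgeList is implemented.

```python
def split_ranges(edges):
    """Split overlapping inclusive ranges into disjoint ones.

    `edges` is a list of (lo, hi, target) with inclusive integer bounds.
    Returns a sorted list of (lo, hi, frozenset_of_targets) with no overlaps.
    """
    # Every range start, and every position just past a range end, is a
    # point where the set of active targets can change.
    points = sorted({lo for lo, _, _ in edges} | {hi + 1 for _, hi, _ in edges})
    result = []
    for lo, nxt in zip(points, points[1:]):
        hi = nxt - 1
        targets = frozenset(t for a, b, t in edges if a <= lo and hi <= b)
        if targets:  # skip gaps that no range covers
            result.append((lo, hi, targets))
    return result
```

During subset construction each disjoint piece then maps to exactly one
set of NFA states, so a DFA state only ever needs one outgoing edge per
piece.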
e.g. "this is how machine-generated parsers know to skip blanks and
comments"
The runtime implementation could be better; we don't really want to
just discard trivia because it's useful for e.g. doc comments and the
like. BUT for now this is fine.
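One possible direction for not discarding trivia, sketched under
assumptions: the `Token` class, the `leading_trivia` field, and the
`TRIVIA_KINDS` set are all hypothetical and not part of the current
runtime. The parser still sees only significant tokens, but each one
remembers the blanks and comments that came before it.

```python
from dataclasses import dataclass, field

# Assumed trivia kinds; a real grammar would declare its own.
TRIVIA_KINDS = {"whitespace", "comment"}

@dataclass
class Token:
    kind: str
    text: str
    leading_trivia: list = field(default_factory=list)

def attach_trivia(tokens):
    """Remove trivia from the stream but keep it on the following token.

    Returns (significant_tokens, trailing_trivia), where trailing_trivia
    is whatever trivia appeared after the last significant token.
    """
    result, pending = [], []
    for token in tokens:
        if token.kind in TRIVIA_KINDS:
            pending.append(token)
        else:
            token.leading_trivia = pending
            pending = []
            result.append(token)
    return result, pending
```

A doc-comment pass could then look at `leading_trivia` on a declaration's
first token instead of re-lexing the source.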
There was a bug in the way that I was converting regular expressions
to NFAs. I'm still not entirely sure what was going on, but I
revisited the construction and made it follow the literature more
closely and it fixed the problem.
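For context, the textbook construction looks roughly like the sketch
below: a Thompson-style build where every sub-expression gets its own
start and accept state, joined by epsilon edges. The `NFA` class and the
tuple-shaped regex nodes are assumptions for this example, not the code
in this change, and sequences are linked with epsilon edges rather than
by merging states.

```python
import itertools

class NFA:
    def __init__(self):
        self.edges = {}  # state -> list of (symbol or None for epsilon, target)
        self._ids = itertools.count()

    def new_state(self):
        state = next(self._ids)
        self.edges[state] = []
        return state

    def add(self, src, symbol, dst):
        self.edges[src].append((symbol, dst))

def build(nfa, node):
    """Return (start, accept) for `node`; each fragment has one of each."""
    start, accept = nfa.new_state(), nfa.new_state()
    kind = node[0]
    if kind == "lit":
        nfa.add(start, node[1], accept)
    elif kind == "seq":
        prev = start
        for part in node[1:]:
            s, a = build(nfa, part)
            nfa.add(prev, None, s)
            prev = a
        nfa.add(prev, None, accept)
    elif kind == "alt":
        for option in node[1:]:
            s, a = build(nfa, option)
            nfa.add(start, None, s)
            nfa.add(a, None, accept)
    elif kind == "star":
        s, a = build(nfa, node[1])
        nfa.add(start, None, s)       # enter the loop
        nfa.add(start, None, accept)  # or skip it entirely
        nfa.add(a, None, s)           # repeat
        nfa.add(a, None, accept)      # or leave
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return start, accept

def closure(nfa, states):
    """Epsilon closure of a set of states."""
    stack, seen = list(states), set(states)
    while stack:
        for symbol, dst in nfa.edges[stack.pop()]:
            if symbol is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(node, text):
    """Simulate the NFA for `node` on `text`, one character at a time."""
    nfa = NFA()
    start, accept = build(nfa, node)
    current = closure(nfa, {start})
    for ch in text:
        moved = {dst for s in current for sym, dst in nfa.edges[s] if sym == ch}
        current = closure(nfa, moved)
    return accept in current
```

Keeping the "one start, one accept, nothing leaves accept" invariant for
every fragment is what makes the pieces compose without surprises.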