Instead of trying to build a regular expression string, just build a
structured thing with seq() and choice() and whatnot. This is
technically uglier but fixes a problem I found with comment regular
expressions, so, you know, it works, which is better than not working.
Also now tokens get named and maybe that's good? It's so hard to say.
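
Roughly what "build a structured thing" could look like, as a sketch
only. The Literal/Seq/Choice/Star classes and emit() are made-up names
for illustration, not the ones in this repo; seq(), choice() and
repeat() are the tree-sitter grammar DSL functions being targeted.

```python
import json
from dataclasses import dataclass


# Hypothetical regex AST; the real node types in the repo may differ.
@dataclass
class Literal:
    text: str


@dataclass
class Seq:
    items: list


@dataclass
class Choice:
    items: list


@dataclass
class Star:
    child: object


def emit(node) -> str:
    """Render a regex AST node as a tree-sitter rule expression (as text)."""
    if isinstance(node, Literal):
        # json.dumps yields a double-quoted string that is also valid JavaScript.
        return json.dumps(node.text)
    if isinstance(node, Seq):
        return "seq(" + ", ".join(emit(item) for item in node.items) + ")"
    if isinstance(node, Choice):
        return "choice(" + ", ".join(emit(item) for item in node.items) + ")"
    if isinstance(node, Star):
        return "repeat(" + emit(node.child) + ")"
    raise TypeError(f"unknown regex node: {node!r}")


rule = Seq([Literal("//"), Star(Choice([Literal("a"), Literal("b")]))])
print(emit(rule))
```

The point of going through a tree is that nothing ever has to be
escaped into regular expression syntax, which is where string building
tends to go wrong.
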
This allows most of our precedence to be re-used. There are still some
cases where tree-sitter gets confused (and we don't); see the
corresponding change to grammar.py. I wish I knew how to fix this but
I don't. :(
Tree-sitter doesn't let me do token-based precedence? I don't like
tree-sitter's "make it inline but give it a number" system; it seems
like a bug farm to me.
This is a half-assed attempt at doing syntax coloring which I think
will almost certainly turn out to be insufficient. I'm committing it
just to record some of the work I've done but. BUT.
Probably trying to match tree-sitter is a better way of doing
this. (But, like, emitting tree-sitter grammars? Really? Wow, dude.
Way to give up.)
I mean, it did when we thought we were going to weave NFA states as we
were building them but we ended up not doing that and instead just
using the fancy EdgeList splitting magic when building DFAs from the
NFA. It has the same power and is simpler code, and also means that
we'll *never* be asked to have multiple Terminals be accepted from a
single NFA state.
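
For the record, the range-splitting idea can be sketched like this.
It is not the actual EdgeList code; split_edges() and the
(lo, hi, target) tuple shape are assumptions made up for the example.
Overlapping character ranges get cut at every boundary so each
resulting piece maps to one set of NFA targets, which is exactly what
the subset construction wants for a DFA edge.

```python
def split_edges(edges):
    """Split overlapping inclusive ranges into disjoint pieces.

    edges: list of (lo, hi, target) with integer code points.
    Returns a sorted list of (lo, hi, frozenset_of_targets).
    """
    # Every range start, and every position just past a range end, is a cut point.
    points = sorted({p for lo, hi, _ in edges for p in (lo, hi + 1)})
    pieces = []
    for start, end in zip(points, points[1:]):
        # Collect every target whose range fully covers [start, end - 1].
        targets = frozenset(
            target for lo, hi, target in edges if lo <= start and end - 1 <= hi
        )
        if targets:
            pieces.append((start, end - 1, targets))
    return pieces


# 'a'..'z' goes to NFA state 1, and 'f' alone also goes to NFA state 2.
for lo, hi, targets in split_edges([(ord("a"), ord("z"), 1), (ord("f"), ord("f"), 2)]):
    print(chr(lo), chr(hi), sorted(targets))
```

Adjacent pieces with identical target sets are not merged here; that
would be a small follow-up pass if it mattered.
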
e.g. "this is how machine-generated parsers know to skip blanks and
comments"
The runtime implementation could be better; we don't really want to
just discard trivia because it's useful for e.g. doc comments and the
like. BUT for now this is fine.
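
One way the "don't discard trivia" version might go, as a sketch and
not what the runtime does today: hang skipped blanks and comments off
the next real token so doc comments stay reachable. Token, TOKEN_RE,
TRIVIA and tokenize() are all invented for this example, and the token
kinds are a toy set.

```python
import re
from dataclasses import dataclass, field

# Toy token set for illustration; order matters, so comments win over punctuation.
TOKEN_RE = re.compile(
    r"""
    (?P<blank>\s+)
  | (?P<comment>//[^\n]*)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<punct>[^\sA-Za-z_0-9])
    """,
    re.VERBOSE,
)

TRIVIA = {"blank", "comment"}


@dataclass
class Token:
    kind: str
    text: str
    leading_trivia: list = field(default_factory=list)


def tokenize(source):
    """Return (tokens, trailing_trivia); trivia is attached, not dropped."""
    tokens, pending, pos = [], [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        kind = match.lastgroup
        if kind in TRIVIA:
            # Hold on to trivia until the next significant token shows up.
            pending.append((kind, match.group()))
        else:
            tokens.append(Token(kind, match.group(), pending))
            pending = []
        pos = match.end()
    return tokens, pending


tokens, trailing = tokenize("// doc\nfoo 42")
for token in tokens:
    print(token.kind, repr(token.text), token.leading_trivia)
```

The parser still only ever sees the significant tokens, so the grammar
does not change; anything that cares about comments can look at
leading_trivia.
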
There was a bug in the way that I was converting regular expressions
to NFAs. I'm still not entirely sure what was going on, but I
re-visited the construction and made it follow the literature more
closely, and that fixed the problem.
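
For reference, a textbook Thompson-style construction looks roughly
like the sketch below. This is not the code in this change; the tuple
AST ("lit", "cat", "alt", "star") and the NFA class are assumptions
made up for the example. The property worth noticing is that every
fragment has exactly one start state and one accept state, and
fragments are only ever glued together with epsilon edges.

```python
class NFA:
    def __init__(self):
        self.eps = []    # eps[s]: states reachable from s on epsilon
        self.edges = []  # edges[s]: list of (char, target)

    def new_state(self):
        self.eps.append([])
        self.edges.append([])
        return len(self.eps) - 1


def build(nfa, node):
    """Return (start, accept) for a node of the hypothetical tuple AST."""
    kind = node[0]
    start, accept = nfa.new_state(), nfa.new_state()
    if kind == "lit":
        nfa.edges[start].append((node[1], accept))
    elif kind == "cat":
        s1, a1 = build(nfa, node[1])
        s2, a2 = build(nfa, node[2])
        nfa.eps[start].append(s1)
        nfa.eps[a1].append(s2)
        nfa.eps[a2].append(accept)
    elif kind == "alt":
        for child in node[1:]:
            s, a = build(nfa, child)
            nfa.eps[start].append(s)
            nfa.eps[a].append(accept)
    elif kind == "star":
        s, a = build(nfa, node[1])
        nfa.eps[start] += [s, accept]  # enter the loop, or skip it entirely
        nfa.eps[a] += [s, accept]      # go around again, or leave
    else:
        raise ValueError(f"unknown node kind: {kind!r}")
    return start, accept


def closure(nfa, states):
    """All states reachable from `states` using only epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        state = stack.pop()
        for target in nfa.eps[state]:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def matches(node, text):
    nfa = NFA()
    start, accept = build(nfa, node)
    current = closure(nfa, {start})
    for ch in text:
        step = {t for s in current for c, t in nfa.edges[s] if c == ch}
        current = closure(nfa, step)
    return accept in current


# a(b|c)*
regex = ("cat", ("lit", "a"), ("star", ("alt", ("lit", "b"), ("lit", "c"))))
print(matches(regex, "abcb"), matches(regex, "ad"))
```

It makes more states than strictly necessary, but the invariants are
easy to check by eye, which is most of the appeal.
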