Commit graph

132 commits

SHA1 Message Date
ea5fab4e4e Tree-sitter regexps are structured
Instead of trying to build a regular expression string, just build a
structured thing with seq() and choice() and whatnot. This is
technically uglier but fixes a problem I found with comment regular
expressions so you know, it works, which is better than not working.

Also now tokens get named and maybe that's good? It's so hard to say.
2024-09-05 11:51:29 -07:00
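
A minimal sketch of the idea in ea5fab4e4e, assuming a small regex tree on the Python side: rather than flattening the tree into a regular expression string, each node is rendered as the matching tree-sitter grammar DSL call (`seq`, `choice`, `repeat`). The class names `Lit`, `Seq`, `Alt`, `Star` and the function `to_tree_sitter` are hypothetical and are not taken from the repository.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    """A literal string (hypothetical node type)."""
    text: str


@dataclass(frozen=True)
class Seq:
    """Match every part, in order."""
    parts: tuple


@dataclass(frozen=True)
class Alt:
    """Match exactly one of the options."""
    options: tuple


@dataclass(frozen=True)
class Star:
    """Match the child zero or more times."""
    child: object


def to_tree_sitter(node) -> str:
    """Render a regex tree as a tree-sitter grammar.js expression."""
    if isinstance(node, Lit):
        # json.dumps yields a double-quoted, escaped JavaScript string literal.
        return json.dumps(node.text)
    if isinstance(node, Seq):
        return "seq(" + ", ".join(to_tree_sitter(p) for p in node.parts) + ")"
    if isinstance(node, Alt):
        return "choice(" + ", ".join(to_tree_sitter(o) for o in node.options) + ")"
    if isinstance(node, Star):
        return "repeat(" + to_tree_sitter(node.child) + ")"
    raise TypeError(f"unsupported regex node: {node!r}")


# A toy comment-like pattern: "/*", any run of "a" or "b", then "*/".
comment = Seq((Lit("/*"), Star(Alt((Lit("a"), Lit("b")))), Lit("*/")))
print(to_tree_sitter(comment))
```

Because the output is built node by node, nothing has to be escaped for a regex engine, which is presumably where the comment patterns went wrong in the string-based version.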
5e12af9f31 Extra marks, fields, whatnot 2024-09-05 06:32:28 -07:00
be8e017fd9 Fix regex generation, extras 2024-09-05 06:32:28 -07:00
94f5958087 Field propagation 2024-09-05 06:30:55 -07:00
591da0c971 Rework highlighting metadata GOD 2024-09-02 08:50:36 -07:00
e4a8ad7b76 Trailing thing 2024-09-01 11:30:59 -07:00
a99b3ecb70 Interpret precedence the way tree-sitter does, kinda
This allows most of our precedence to be re-used. There are some cases
still where tree-sitter gets confused (and we don't), see the
corresponding change to grammar.py. I wish I knew how to fix this but
I don't. :(
2024-09-01 07:38:46 -07:00
0354fbf4a4 More ways of writing
Sometimes prettier
2024-09-01 06:52:13 -07:00
3012df4ac6 Precedence but it doesn't work
Tree sitter doesn't let me do token-based precedence? I don't like
tree-sitter's "make it inline but give it a number" system- seems like
a bug farm to me.
2024-08-31 07:22:49 -07:00
98c4bb950f Fix bugs but still doesn't work for Fine 2024-08-30 09:14:01 -07:00
066d2d8439 A converter from grammars to tree-sitter grammars
Hmm, isn't this fine!
2024-08-30 09:04:32 -07:00
2d87207b54 Some small tweaks 2024-08-30 09:04:18 -07:00
80d932b36a Refactor to use non_terminals() 2024-08-29 08:23:55 -07:00
f8b62bf4a4 Terminal 'value' is 'name', compile_lexer is method 2024-08-29 08:22:23 -07:00
344dde51be Grammar start symbol is public 2024-08-29 08:12:08 -07:00
dc03bf7373 Grammars can be named 2024-08-29 08:00:40 -07:00
abcb0e516a OptionalRule is not required but MetadataRule is 2024-08-28 08:33:32 -07:00
e07f2be3fa Something wrong with this, need to understand more 2024-08-28 08:29:09 -07:00
02c1aa507e Muck around with usability 2024-08-28 08:27:46 -07:00
cd62b65789 Remove the old hand-lexer
The machine lexer is working now
2024-08-27 16:49:03 -07:00
d62076f3c4 Fix a bug in terminal declaration
whoops. now it parses correctly
2024-08-27 16:47:58 -07:00
d03dc6e3d9 Harness uses grammar-generated token stream 2024-08-27 16:47:42 -07:00
0be0075cfe Generic token stream
Compatible with the harness
2024-08-27 16:47:26 -07:00
49ad7fdb52 Associate metadata with terminals
This is a half-assed attempt at doing syntax coloring which I think
will almost certainly turn out to be insufficient. I'm committing it
just to record some of the work I've done but. BUT.

Probably trying to match tree-sitter is a better way of doing
this. (But, like, emitting tree-sitter grammars? Really? Wow, dude.
Way to give up.)
2024-08-27 15:43:07 -07:00
76ef85483e Accept is single-valued, the multi-value thing didn't ever make sense
I mean, it did when we thought we were going to weave NFA states as we
were building them but we ended up not doing that and instead just
using the fancy EdgeList splitting magic when building DFAs from the
NFA. It has the same power and is simpler code, and also means that
we'll *never* be asked to have multiple Terminals be accepted from a
single NFA state.
2024-08-27 15:43:01 -07:00
208491d56e This was out of date 2024-08-26 08:05:01 -07:00
2473ae713d Trivia tests 2024-08-24 15:09:08 -07:00
7a5f17f74b Specify and honor trivia tokens
e.g. "this is how machine-generated parsers know to skip blanks and
comments"

The run time implementation could be better; we don't really want to
just discard trivia because it's useful for e.g. doc comments and the
like. BUT for now this is fine.
2024-08-24 10:01:40 -07:00
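
A rough sketch of the trivia handling described in 7a5f17f74b, under the assumption that tokens are simple (kind, text) pairs. Instead of discarding trivia outright, it attaches the trivia seen before each significant token to that token, so a later pass could still find doc comments. The names `Token`, `TRIVIA`, and `with_leading_trivia` are hypothetical, as are the token kinds; none of this is the repository's actual runtime.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Token(NamedTuple):
    kind: str
    text: str


# Assumed trivia kinds; a grammar would declare its own set.
TRIVIA = frozenset({"BLANKS", "COMMENT"})


def with_leading_trivia(
    tokens: Iterable[Token], trivia: frozenset = TRIVIA
) -> Iterator[Tuple[Token, Tuple[Token, ...]]]:
    """Yield (token, leading_trivia) for every non-trivia token.

    Trivia that follows the last significant token is dropped here.
    """
    pending = []
    for tok in tokens:
        if tok.kind in trivia:
            pending.append(tok)  # hold it for the next significant token
        else:
            yield tok, tuple(pending)
            pending = []


stream = [
    Token("COMMENT", "// doc"),
    Token("BLANKS", "\n"),
    Token("IDENT", "x"),
    Token("BLANKS", " "),
    Token("NUMBER", "1e3"),
]
for tok, leading in with_leading_trivia(stream):
    print(tok.kind, [t.kind for t in leading])
```

A parser that only wants the "skip blanks and comments" behavior can ignore the second element of each pair.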
8e22c59aa8 This is a nicer number format (e.g. 1e3) 2024-08-24 10:01:05 -07:00
f29ec5072f Augment number pattern, tests
More robust testing. Error messages would be nice but.
2024-08-24 09:38:21 -07:00
0c952e4905 Correct NFA construction
There was a bug in the way that I was converting regular expressions
to NFAs. I'm still not entirely sure what was going on, but I
re-visited the construction and made it follow the literature more
closely and it fixed the problem.
2024-08-24 09:24:29 -07:00
30f7798719 Actual strings and floats
Using the new regex features
2024-08-24 08:36:28 -07:00
c0b623bd6d Remove unused imports 2024-08-24 08:36:20 -07:00
454e6fd6fd Regex API "improvements"
I mean, is it better than a regex parser? No, probably not.
2024-08-24 08:35:45 -07:00
6d6aabdeb3 Terminal name must be explicit on construction 2024-08-24 08:35:10 -07:00
72052645d6 Generated lexers actually kinda work
But regular expressions are underpowered and verbose
2024-08-23 15:32:35 -07:00
58c3004702 Move terminals into grammar definition
Starting to work on machine-generated lexers too
2024-08-23 07:24:30 -07:00
f6bc2ccea8 Move the examples into tests 2024-06-15 12:23:36 -07:00
e04aa1966e Start moving the examples into tests 2024-06-15 07:52:16 -07:00
d3b8d0e836 Configure pyright 2024-06-15 06:20:37 -07:00
fe7e67cce6 Ignore more stuff 2024-06-15 06:18:14 -07:00
c82f53c346 Document the decision 2024-06-15 06:18:04 -07:00
fb2dff51df Tests. Well, test. 2024-06-15 06:14:37 -07:00
3483d99a7d It's been some time coming
The actual library should not require a venv or a setup or anything
but it does make testing and whatnot easier.
2024-06-14 06:29:15 -07:00
7b3c94c469 Move things into more modules
It will help with testing and profiling
It breaks pyright (it's probably time to abandon pyright)
2024-06-10 05:50:09 -07:00
c3098aa435 Various bug fixes related to the end of file 2024-06-09 17:18:44 -07:00
b843dc84f4 Comment 2024-06-09 08:53:06 -07:00
a3786c62ba Proper positions for synthetic tokens 2024-06-09 07:24:19 -07:00
0c3e6b211c Lots of logging tweaks 2024-06-09 07:12:35 -07:00
38837f5c4d Don't crash with empty lists 2024-06-09 07:12:14 -07:00