Vendor things
This commit is contained in:
parent
5deceec006
commit
977e3c17e5
19434 changed files with 10682014 additions and 0 deletions
174
third-party/vendor/aho-corasick/README.md
vendored
Normal file
174
third-party/vendor/aho-corasick/README.md
vendored
Normal file
|
|
@ -0,0 +1,174 @@
|
|||
aho-corasick
|
||||
============
|
||||
A library for finding occurrences of many patterns at once with SIMD
|
||||
acceleration in some cases. This library provides multiple pattern
|
||||
search principally through an implementation of the
|
||||
[Aho-Corasick algorithm](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm),
|
||||
which builds a finite state machine for executing searches in linear time.
|
||||
Features include case insensitive matching, overlapping matches, fast searching
|
||||
via SIMD and optional full DFA construction and search & replace in streams.
|
||||
|
||||
[](https://github.com/BurntSushi/aho-corasick/actions)
|
||||
[](https://crates.io/crates/aho-corasick)
|
||||
|
||||
Dual-licensed under MIT or the [UNLICENSE](https://unlicense.org/).
|
||||
|
||||
|
||||
### Documentation
|
||||
|
||||
https://docs.rs/aho-corasick
|
||||
|
||||
|
||||
### Usage
|
||||
|
||||
Run `cargo add aho-corasick` to automatically add this crate as a dependency
|
||||
in your `Cargo.toml` file.
|
||||
|
||||
|
||||
### Example: basic searching
|
||||
|
||||
This example shows how to search for occurrences of multiple patterns
|
||||
simultaneously. Each match includes the pattern that matched along with the
|
||||
byte offsets of the match.
|
||||
|
||||
```rust
|
||||
use aho_corasick::{AhoCorasick, PatternID};
|
||||
|
||||
let patterns = &["apple", "maple", "Snapple"];
|
||||
let haystack = "Nobody likes maple in their apple flavored Snapple.";
|
||||
|
||||
let ac = AhoCorasick::new(patterns).unwrap();
|
||||
let mut matches = vec![];
|
||||
for mat in ac.find_iter(haystack) {
|
||||
matches.push((mat.pattern(), mat.start(), mat.end()));
|
||||
}
|
||||
assert_eq!(matches, vec![
|
||||
(PatternID::must(1), 13, 18),
|
||||
(PatternID::must(0), 28, 33),
|
||||
(PatternID::must(2), 43, 50),
|
||||
]);
|
||||
```
|
||||
|
||||
|
||||
### Example: ASCII case insensitivity
|
||||
|
||||
This is like the previous example, but matches `Snapple` case insensitively
|
||||
using `AhoCorasickBuilder`:
|
||||
|
||||
```rust
|
||||
use aho_corasick::{AhoCorasick, PatternID};
|
||||
|
||||
let patterns = &["apple", "maple", "snapple"];
|
||||
let haystack = "Nobody likes maple in their apple flavored Snapple.";
|
||||
|
||||
let ac = AhoCorasick::builder()
|
||||
.ascii_case_insensitive(true)
|
||||
.build(patterns)
|
||||
.unwrap();
|
||||
let mut matches = vec![];
|
||||
for mat in ac.find_iter(haystack) {
|
||||
matches.push((mat.pattern(), mat.start(), mat.end()));
|
||||
}
|
||||
assert_eq!(matches, vec![
|
||||
(PatternID::must(1), 13, 18),
|
||||
(PatternID::must(0), 28, 33),
|
||||
(PatternID::must(2), 43, 50),
|
||||
]);
|
||||
```
|
||||
|
||||
|
||||
### Example: replacing matches in a stream
|
||||
|
||||
This example shows how to execute a search and replace on a stream without
|
||||
loading the entire stream into memory first.
|
||||
|
||||
```rust,ignore
|
||||
use aho_corasick::AhoCorasick;
|
||||
|
||||
let patterns = &["fox", "brown", "quick"];
|
||||
let replace_with = &["sloth", "grey", "slow"];
|
||||
|
||||
// In a real example, these might be `std::fs::File`s instead. All you need to
|
||||
// do is supply a pair of `std::io::Read` and `std::io::Write` implementations.
|
||||
let rdr = "The quick brown fox.";
|
||||
let mut wtr = vec![];
|
||||
|
||||
let ac = AhoCorasick::new(patterns).unwrap();
|
||||
ac.stream_replace_all(rdr.as_bytes(), &mut wtr, replace_with)
|
||||
.expect("stream_replace_all failed");
|
||||
assert_eq!(b"The slow grey sloth.".to_vec(), wtr);
|
||||
```
|
||||
|
||||
|
||||
### Example: finding the leftmost first match
|
||||
|
||||
In the textbook description of Aho-Corasick, its formulation is typically
|
||||
structured such that it reports all possible matches, even when they overlap
|
||||
with another. In many cases, overlapping matches may not be desired, such as
|
||||
the case of finding all successive non-overlapping matches like you might with
|
||||
a standard regular expression.
|
||||
|
||||
Unfortunately the "obvious" way to modify the Aho-Corasick algorithm to do
|
||||
this doesn't always work in the expected way, since it will report matches as
|
||||
soon as they are seen. For example, consider matching the regex `Samwise|Sam`
|
||||
against the text `Samwise`. Most regex engines (that are Perl-like, or
|
||||
non-POSIX) will report `Samwise` as a match, but the standard Aho-Corasick
|
||||
algorithm modified for reporting non-overlapping matches will report `Sam`.
|
||||
|
||||
A novel contribution of this library is the ability to change the match
|
||||
semantics of Aho-Corasick (without additional search time overhead) such that
|
||||
`Samwise` is reported instead. For example, here's the standard approach:
|
||||
|
||||
```rust
|
||||
use aho_corasick::AhoCorasick;
|
||||
|
||||
let patterns = &["Samwise", "Sam"];
|
||||
let haystack = "Samwise";
|
||||
|
||||
let ac = AhoCorasick::new(patterns).unwrap();
|
||||
let mat = ac.find(haystack).expect("should have a match");
|
||||
assert_eq!("Sam", &haystack[mat.start()..mat.end()]);
|
||||
```
|
||||
|
||||
And now here's the leftmost-first version, which matches how a Perl-like
|
||||
regex will work:
|
||||
|
||||
```rust
|
||||
use aho_corasick::{AhoCorasick, MatchKind};
|
||||
|
||||
let patterns = &["Samwise", "Sam"];
|
||||
let haystack = "Samwise";
|
||||
|
||||
let ac = AhoCorasick::builder()
|
||||
.match_kind(MatchKind::LeftmostFirst)
|
||||
.build(patterns)
|
||||
.unwrap();
|
||||
let mat = ac.find(haystack).expect("should have a match");
|
||||
assert_eq!("Samwise", &haystack[mat.start()..mat.end()]);
|
||||
```
|
||||
|
||||
In addition to leftmost-first semantics, this library also supports
|
||||
leftmost-longest semantics, which match the POSIX behavior of a regular
|
||||
expression alternation. See `MatchKind` in the docs for more details.
|
||||
|
||||
|
||||
### Minimum Rust version policy
|
||||
|
||||
This crate's minimum supported `rustc` version is `1.60.0`.
|
||||
|
||||
The current policy is that the minimum Rust version required to use this crate
|
||||
can be increased in minor version updates. For example, if `crate 1.0` requires
|
||||
Rust 1.20.0, then `crate 1.0.z` for all values of `z` will also require Rust
|
||||
1.20.0 or newer. However, `crate 1.y` for `y > 0` may require a newer minimum
|
||||
version of Rust.
|
||||
|
||||
In general, this crate will be conservative with respect to the minimum
|
||||
supported version of Rust.
|
||||
|
||||
|
||||
### FFI bindings
|
||||
|
||||
* [G-Research/ahocorasick_rs](https://github.com/G-Research/ahocorasick_rs/)
|
||||
is a Python wrapper for this library.
|
||||
* [tmikus/ahocorasick_rs](https://github.com/tmikus/ahocorasick_rs) is a Go
|
||||
wrapper for this library.
|
||||
Loading…
Add table
Add a link
Reference in a new issue