For the last week I've been working on an exact-pattern-matching solution, and my code hit the limit to what's possible on my machine. On my win10/7300HQ/2667MHz single channel DDR4, running mostly isolated on a single core, it crunches through 512megs (random [a-z,A-Z]) in 58 milliseconds (faster with actually random data) looking for a Needle the size of 8 bytes, but it's not fixed to that length.
I wanted to compare it to ripgrep, but there's no way of getting the numbers for its comparison algorithm only, so it'd be unfair.
Short of scavenging github for (apparently mostly) C code (which I don't speak, and I really want to avoid installing Visual Studio) to use for comparing, I don't know what to do. Worse, looking through the code I've found, despite not speaking it, made it look pretty obvious that all these solutions out there aren't going to beat mine, because they contain far more instructions, memory accesses and comparisons/branches than my own.
I'm not trying to brag. I'm rather desperate and hoping that someone can help me solving my problem.
Is there a benchmark suite of some sorts (a binary?) I can download and feed with my own haystack/needle? Is there some "known to be fastest" pattern matching program I'm unaware of, which I could use for comparing?
And what is the gold standard for exact pattern matching anyway? I know of Rabin-Karp, Boyer-Moore, but I'm not convinced they actually qualify for full range pattern matching, instead of looking for text strings specifically.
PS: To give you some context about me: When it comes to programming, my friend says: "[he] isn't happy until the speed of his code approaches the theoretical limits of light."
I've hit the limits on my machine short of running it on multiple cores, but there's no point in doing so when I can't compare it to anything.
Thank you for any possible help/advice you can give me.
(My apologies for the formatting. I can't seem to get this looking right)
The gold standard for string pattern matching is Hyperscan. This has also benchmarks to support their claims.
For tree matching I have no idea.
https://arxiv.org/pdf/1012.2547v1.pdf
It contains lots of benchmark results, but I don't understand how they can be useful for me, as I can't reproduce their way of benchmarking or how big the haystack really is.
In fact, I don't really understand any of this.
What's making matters worse, HyperScan has no binary for windows. I'd have to compile it myself, using cygwin. I can't find a precompiled binary. It irritates me that there is no easily available windows version for what's supposed to be the fastest pattern matching solution out there.
But worse, it seems to be creating pattern matching files for faster matching, making it relatively useless for things like grep. On the plus side, it hinted me at an additional way of improving my solution's performance.
I never considered that people might wish to save files which allow for faster pattern matching.