As someone who has dabbled quite a lot with deflate, this was a very interesting find. It seems like the format that never wants to die.
First of all, I am surprised this is even possible. Given how minimal and non-byte-aligned the deflate block headers are, I had actually written this off as infeasible, assuming the false positive rate would be too high.
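To illustrate why false positives are the worry: a deflate block can start at any bit, so a searcher has to test every bit offset, and only a few header fields (RFC 1951, section 3.2.7) can be checked cheaply. Below is a minimal sketch of such a check for dynamic Huffman blocks. The function names are hypothetical and this is not rapidgzip's implementation. It is also an assumption of this sketch that the code-length code must be complete (neither oversubscribed nor incomplete). These checks alone still accept a fair share of random offsets, so a practical block finder would have to keep validating the code lengths that follow.

```python
# Hypothetical sketch: can a deflate dynamic-Huffman block header plausibly
# start at this bit offset? Not rapidgzip's actual code.

def read_bits(data: bytes, bit_offset: int, count: int) -> int:
    """Read `count` bits starting at `bit_offset`, least significant bit first,
    which is how deflate packs its fixed-width header fields."""
    value = 0
    for i in range(count):
        pos = bit_offset + i
        bit = (data[pos >> 3] >> (pos & 7)) & 1
        value |= bit << i
    return value


def is_plausible_dynamic_header(data: bytes, bit_offset: int) -> bool:
    total_bits = len(data) * 8
    # BFINAL(1) + BTYPE(2) + HLIT(5) + HDIST(5) + HCLEN(4) = 17 bits
    if bit_offset + 17 > total_bits:
        return False
    if read_bits(data, bit_offset + 1, 2) != 2:  # BTYPE 2 = dynamic Huffman
        return False
    hlit = read_bits(data, bit_offset + 3, 5)    # literal/length codes - 257
    hdist = read_bits(data, bit_offset + 8, 5)   # distance codes - 1
    hclen = read_bits(data, bit_offset + 13, 4)  # code-length codes - 4
    # At most 286 literal/length codes and 30 distance codes are usable.
    if hlit > 29 or hdist > 29:
        return False
    count = hclen + 4
    start = bit_offset + 17
    if start + 3 * count > total_bits:
        return False
    # Kraft sum of the 3-bit code-length code lengths, scaled by 2**7.
    kraft = 0
    for i in range(count):
        length = read_bits(data, start + 3 * i, 3)
        if length:
            kraft += 1 << (7 - length)
    # Assumption: require a complete prefix code (sum exactly 2**7).
    return kraft == 128


def find_candidates(data: bytes) -> list[int]:
    """Return every bit offset that passes the cheap header checks above."""
    return [off for off in range(len(data) * 8)
            if is_plausible_dynamic_header(data, off)]
```

For example, `find_candidates(bytes([0x05, 0x00, 0x24, 0x09]))` finds a header at bit offset 0 (BTYPE 2, four code-length codes of length 2 each), while stored or fixed-Huffman bit patterns are rejected immediately.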
The paper and the implementation are amazing pieces of engineering. Hats off to the author. With an index you are in "easy" mode, so I consider the unindexed "hard mode" the big accomplishment.
I am still digesting it. If I read the paper correctly, the false positive rate is approximately 1 per 5 GB. Very reasonable.
Repo: https://github.com/mxmlnkn/rapidgzip