• by sakras on 10/2/2024, 7:01:45 PM

    What really made HNSW click for me was when someone described it as a "skiplist graph". It really does seem to share a lot of the same properties and intuition as a simple skiplist, where descending levels corresponds to finer-grained navigation through the space.

    A couple of other thoughts with this perspective: - if a drawing of a skiplist is 2D, HNSW seems to be a 3D generalization. It makes me wonder if you could apply an HNSW-like scheme for a traditional relational database index (maybe a multicolumn index?). - if a skiplist is a probabilistic alternative to a B+-tree, what's HNSW a probabilistic alternative to?

    One thing that still mystifies me is how in the context of vector indexing, the HNSW seems to "bootstrap" itself by using itself to find the neighbors of a newly-inserted point. It's not clear to me why that doesn't diverge to some garbage results as the index grows.

  • by janalsncm on 10/2/2024, 8:44:47 PM

    HNSW was the first approximate nearest neighbor search algorithm I encountered. There are others that are useful in different contexts (i.e. depending on your data size and available memory). And you can combine them sometimes. This video is a really awesome overview of them:

    https://youtu.be/SKrHs03i08Q

  • by WhitneyLand on 10/3/2024, 1:58:51 PM

    For anyone new to this, the topic has gained interest in recent years alongside the rise of AI. Many ML models project text, images, etc, into an embedding layer in high dimensional space as a vector (an array of numbers). It then becomes possible to numerically compare these vectors to see how closely related they are, in others words to find the semantic similarity.

    This nifty feature leads to the need to scale up. For example, given many vectors in a database which one is most similar to my word or image. Applications like this need ways to efficiently scale these kinds of searches and HNSW is one prominent technique to enable this.

  • by 3abiton on 10/2/2024, 6:17:20 PM

    Interesting read, I worked on graph theory for a while some long time ago and did not even know some of these terminologies.