• by rattray on 7/27/2021, 3:45:39 PM

    Something that's missing from this which I'm curious about is how far can't postgres search take you?

    That is, what tends to be the "killer feature" that makes teams groan and set up Elasticsearch because you just can't do it in Postgres and your business needs it?

    Having dealt with ES, I'd really like to avoid the operational burden if possible, but I wouldn't want to choose an intermediary solution without being able to say, "keep in mind we'll need to budget a 3-mo transition to ES once we need X, Y, or Z".

  • by tabbott on 7/27/2021, 6:30:49 PM

    Zulip's search is powered by this built-in Postgres full-text search feature, and it's been a fantastic experience. There's a few things I love about it:

    * One can cheaply compose full-text search with other search operators by just doing normal joins on database indexes, which means we can cheaply and performantly support tons of useful operators (https://zulip.com/help/search-for-messages).

    * We don't have to build a pipeline to synchronize data between the real database and the search database. Being a chat product, a lot of the things users search for are things that changed recently; so lag, races, and inconsistencies are important to avoid. With the Postgres full-text search, all one needs to do is commit database transactions as usual, and we know that all future searches will return correct results.

    * We don't have to operate, manage, and scale a separate service just to support search. And neither do the thousands of self-hosted Zulip installations.

    Responding to the "Scaling bottleneck" concerns in comments below, one can send search traffic (which is fundamentally read-only) to a replica, with much less complexity than a dedicated search service.

    Doing fancy scoring pipelines is a good reason to use a specialized search service over the Postgres feature.

    I should also mention that a weakness of Postgres full-text search is that it only supports doing stemming for one language. The excellent PGroonga extension (https://pgroonga.github.io/) supports search in all languages; it's a huge improvement especially for character-based languages like Japanese. We're planning to migrate Zulip to using it by default; right now it's available as an option.

    More details are available here: https://zulip.readthedocs.io/en/latest/subsystems/full-text-...

  • by lettergram on 7/27/2021, 3:55:39 PM

    I actually built a search engine back in 2018 using postgresql

    https://austingwalters.com/fast-full-text-search-in-postgres...

    Worked quite well and still use it daily. Basically doing weighted searches on vectors is slower than my approach, but definitely good enough.

    Currently, I can search around 50m HN & Reddit comments in 200ms on the postgresql running on my machine.

  • by jcuenod on 7/27/2021, 5:43:00 PM

    Huh, just yesterday I blogged[0] about using FTS in SQLite[1] to search my PDF database. SQLite's full-text search is really excellent. The thing that tripped me up for a while was `GROUP BY` with the `snippet`/`highlight` function but that's the point of the blog post.

    [0] https://jcuenod.github.io/bibletech/2021/07/26/full-text-sea...

    [1] https://www.sqlite.org/fts5.html

  • by bityard on 7/27/2021, 4:10:55 PM

    I know Postgres and SQLite have mostly different purposes but FWIW, SQLite also has a surprisingly capable full-text search extension built right in: https://www.sqlite.org/fts5.html

  • by theandrewbailey on 7/27/2021, 4:18:00 PM

    > You could also look into enabling extensions such as unaccent (remove diacritic signs from lexemes) or pg_trgm (for fuzzy search).

    Trigrams (pg_trgm) are practically needed for usable search when it comes to misspellings and compound words (e.g. a search for "down loads" won't return "downloads").

    I also recommend using websearch_to_tsquery instead of using the cryptic syntax of to_tsquery.

  • by simonw on 7/27/2021, 7:04:45 PM

    The Django ORM includes support for PostgreSQL search and I've found it a really productive way to add search to a project: https://docs.djangoproject.com/en/3.2/ref/contrib/postgres/s...

  • by SigmundA on 7/27/2021, 4:11:01 PM

    Keep wondering if RUM Indexes [1] will ever get merged for faster and better ranking (TF/IDF). Really would make PG a much more complete text search engine.

    https://github.com/postgrespro/rum

  • by Grimm1 on 7/27/2021, 3:58:57 PM

    Postgres Full-Text search is a great way to get search running for a lot of standard web applications. I recently used just this in Elixir to set up a simple search by keyword. My only complaint was Ecto (Elixir's query builder library) doesn't have first class support for it and neither does Postgrex the lower level connector they use. Still, using fragments with sanitized SQL wasn't too messy at all.

  • by thom on 7/27/2021, 6:54:37 PM

    We get really nice results with gist indexes (gist_trgm_ops) searching across multiple entity types to do top X queries. It’s very useful to be able to make a stab at a difficult-to-spell foreign football player’s name, possibly with lots of diacritics, and get quick results back. I’m always surprised when I find a search engine on any site that is so unkind as to make you spell things exactly.

  • by rattray on 7/27/2021, 3:43:13 PM

    TBH I hadn't known you could do weighted ranking with Postgres search before.

    Curious there's no mention of zombodb[0] though, which gives you the full power of elasticsearch from within postgres (with consistency no, less!). You have to be willing to tolerate slow writes, of course, so using postgres' built-in search functionality still makes sense for a lot of cases.

    [0] https://github.com/zombodb/zombodb

  • by syoc on 7/27/2021, 4:25:25 PM

    My worst search experiences always come from the features applauded here. Word stemming and removing stop words is a big hurdle when you know what you are looking for but get flooded by noise because some part of the search string was ignored. Another issue is having to type out a full word before you get a hit in dynamic search boxes (looking at you Confluence).

  • by MushyRoom on 7/27/2021, 4:02:59 PM

    I was hyped when I found out about it a while ago. Then I wasn't anymore.

    When you have 12 locales (kr/ru/cn/jp/..) it's not that fun anymore. Especially on a one man project :)

  • by kureikain on 7/28/2021, 8:23:24 AM

    I used Postgres full-text search for mail log feature on my email forward app https://hanami.run

    Essentially allow arbitraty query in from/to/subject/body. One thing that make full-text serch work great for me is that I don't need to sort or rank the relevant of query. I just show a list of email that match the query order by their id.

    I also don't do pagination and counting, instead users has to load more paged and the ID of the email is pass to the query as a point to compare( where id < requests.get.before).

    And with those strategy, full text search works great for us since we don't really want to bring in ElasticSearch because only about 20% of users use this features.

  • by pvsukale3 on 7/27/2021, 4:37:42 PM

    If you are using Rails with Postgres you can use pg_search gem to build the named scopes to take advantage of full text search.

    https://github.com/Casecommons/pg_search

  • by shakascchen on 7/27/2021, 10:23:31 PM

    No fun doing it for Chinese, especially for traditional Chinese.

    I had to install software but on Cloud SQL you can't. You have to do it on your instances.

  • by eric4smith on 7/27/2021, 11:18:52 PM

    Postgres FTS is normally quite good.

    But it does not know how to deal with languages like Chinese, Japanese and Thai.

    For that you have to use something like PGroonga extension.

    The rest of PostgreSQL mostly handles things ok, unless you try to sort on one of these languages and the same things happen again.

    There are all ways around these problems. But it’s not as easy as turning on Unicode and just expect everything to work!

    Yes I’m native English speaker who started to develop in Asia and discovered all of this recently.

  • by mrinterweb on 7/27/2021, 8:14:03 PM

    I've seen Elasticsearch set up for applications that would have equal benefit from just using the postgresql db's full-text search they already have access to.

    The additional complexity is usually incurred when the data in postgresql changes, and those changes need to be mirrored up to Elasticsearch. Elasticsearch obviously has its uses, but for some cases, postgresql's built in full-text search can make more sense.

  • by nuker on 7/28/2021, 5:15:03 AM

    Is there alternative to ES that scales nicely? I'm running ELK stack for logging using AWS Elasticsearch. Logs have unpredictable traffic volume and even overprovisioned ES cluster gets clogged sometimes. I wonder is there something more scalable than ES, and have nice GUI like Kibana?