Hacker News Clone

Stract: Open-source, non-profit search engine

by FLpxpyJ on 2/4/2024, 8:36:28 PM with 108 comments

by denysvitali on 2/5/2024, 12:06:07 AM
Everyone here is complaining about the search results - but instead I think we should all take some time to appreciate that someone worked hard to create a search engine (including the scraper / crawler part) and made it open source (AGPL).
The results will be improved over time I guess, and for the few search queries I've done - I'm fairly happy with the results.
Kudos to the authors!
by lock-the-spock on 2/5/2024, 7:24:54 AM
Wonderful project, congratulations! I love the speed, clean design, many options, multilingual results, overall very impressive!!
Some quibbles/points to consider:
* I can't find anything on the people/organisation behind it, and can only guess from the Terms that the team is based in DK.
* Search results are broad and interesting; maybe a bit more weighting for the joint occurrence of terms would be great.
* Developing a site weight over time might be interesting, maybe even with user votes. Currently minor and major sites appear all together, and e.g. a search for "Donald" gives me an interesting ranking order that puts neither the most famous Donalds nor the most reliable sites first (not problematic per se - my fault for entering an unclear search term).
* There are some interesting result patterns, with official sites often quite low. For instance, a search for "EU" with some term like subsidy (in any of the languages I speak) gives me random project websites but nothing from any of the official EU websites, and "Microsoft 365" (sorry...) gives me no MS website.
* Very minor but hopefully a very easy fix: at least on Firefox mobile there is no direct way to add the search to my search engines; I had to add it manually. For other engines I can long-press on the search field and then get the option.
Great work, keep it up! I will certainly start using this :-)
by vladstudio on 2/5/2024, 5:06:27 AM
To make Stract usable for me (slightly reduced vision), I had to apply the following custom CSS:
```
html, body, div, td, th, p, h1, h2, h3, h4, h5, b, i, strong, li, button {
  font-family: ui-sans-serif, sans-serif !important;
  -webkit-font-smoothing: antialiased;
  font-weight: 400;
  text-rendering: geometricPrecision;
}
```
by eviks on 2/5/2024, 5:14:47 AM
> immensely customizable -- We aim to give you the ability to customize everything about the search. You can block sites, boost sites, prioritize links from specific sites and much, much more.
Great! Can I use more than one optic? The drop-down list seems to allow only 1.
> Oh, and if we ever become evil (maybe by changing our motto) please take our code and start a competitor.
The most important part is the index data, what would be the deal with that?
by remram on 2/5/2024, 4:09:06 AM
Sources: https://github.com/StractOrg/stract
Backend in Rust (axum web framework, rocksdb), frontend with Svelte.
by john-radio on 2/5/2024, 3:52:13 AM
VERY cool product. I have a quibble. I searched for "cool pokemon to use" and the top result was "How to use Paypal on Amazon" from "online-tech-tips.com". Understandable that the search results are not perfect - the second result was a perfect match for what I searched for - but anyway, clicking the "dials" icon gets me the following options:
"""Do you like results from online-tech-tips.com? (thumbs down, thumbs up, or banned emoji options) <a href='make-an-llm-do-something-stupid.com">Summarize result</a> """
IMO this feedback widget and (maybe) its backing API could use work. It's not that I like or don't like results from online-tech-tips.com; it's that they're a bad result for the specific context of this search.
by skeptrune on 2/4/2024, 11:29:59 PM
The option to return results only from sites popular on HN, the blogroll, and the other "manage optics" settings are incredibly cool and useful. I could see myself using this just for that feature alone.
Exciting stuff.
by Brian_K_White on 2/5/2024, 12:09:27 PM
I just set up a YaCy jail on my truenas box at home. It's a distributed p2p system.
Haven't actually used it yet since I'm currently paying for kagi and it's good, and I only just set it up yesterday.
But this just struck me, I just said 2 things there and this post is yet another, between kagi, yacy, and now stract, not just 3 different names but 3 different types of solution to a problem, and all seemingly actually viable, that have popped up recently after decades of no one really feeling like they needed anything else.
I think something is changing.
by jqpabc123 on 2/4/2024, 10:20:03 PM
> clearly labelled, contextual ads based on your current search query and a subscription option without ads
Perfect! This is the way the god of the internet intended search engines to work.
But DuckDuckGo does the same and currently provides superior results based on a very brief test.
So good luck with that.
by gregw134 on 2/4/2024, 9:39:23 PM
Wanted to say congrats on launching! I'm building a search engine myself, I can tell a lot of work went into this.
I think the biggest thing you overlooked is page titles. When you issue a query it's a bit hard to quickly scan and judge what a site is about because the page titles are missing.
by com on 2/4/2024, 9:06:07 PM
Fast, feels clean and uncluttered to use and the search results are fairly high quality. I like the “optic” idea.
After reading the about page, I’m not sure what the developers are trying to achieve? Perhaps a sort of alternative-Universe Google search funded by search-context AdWords?
by spaduf on 2/4/2024, 9:30:06 PM
Really like the explore feature. It lets you put in a url and shows you similar sites. Very promising project. Love to see people actually thinking about what search would be rather than rehashing decades old ideas.
by lpellis on 2/5/2024, 12:30:52 AM
Seems surprisingly OK for coding-related queries ('celery rate limit'). I'm curious about their scraping setup; building that out must be quite a big task.
by a1o on 2/4/2024, 11:50:21 PM
Searching for "adventure game studio", neither the website that has the forums nor the GitHub repository is on the first page. Most results on the first page of search are really old things. Neither Wikipedia nor repology, which has the package info, is anywhere in the results.
by bomewish on 2/5/2024, 11:57:25 AM
Thought of grabbing like a big chunk of the Wayback Machine and having THAT in the index? There’s always so much good stuff that gets nuked, and being able to search across it properly would be potentially very interesting.
by logicprog on 2/5/2024, 6:36:44 PM
I really think the existence of this project is a Good Thing. Massive kudos to the people working on it. Previously I was always disappointed that our only options seemed to be open source meta search engines and closed source search engines, with most of the latter being corporate surveillance capitalist hellscapes, or anonymized portals to the same (Google and Bing, Duck Duck Go and Startpage), with only Kagi being an exception, although still closed source. It seemed like a really uniquely bad landscape, given that in most other areas of software there are at least some FLOSS alternatives to proprietary platforms that actually implement their core functionality, regardless of their relative quality. Stract finally changes that, which means several good things to me. First, you get actual, real accountability for them to stick to their stated privacy goals. Second, you get the ability for a wide variety of people to contribute to and influence the project, and/or learn from the project how to do this stuff themselves. And finally, most importantly, since it's a full reimplementation from indexing up, it's an opportunity to innovate on and experiment with the fundamentals, instead of just rearranging deck chairs on the Titanic like e.g. Startpage. That's really great :)
by Pufferbo on 2/4/2024, 10:33:34 PM
Tried searching for Dota (the video game), and the game’s website is buried by a bunch of SEO spam. It might not even have been crawled because it doesn’t appear on the first or second page.
by crotchfire on 2/4/2024, 10:10:59 PM
Where does their crawl come from?
by fulmicoton on 2/5/2024, 8:40:23 AM
That looks quite promising! Thank you for crediting tantivy in the github README, that's well appreciated! Ping me if I can help with anything.
by mcny on 2/5/2024, 1:04:50 AM
In swagger/open API, why is everything a post?
I tried the first endpoint get suggestions and tried searching for Gemini or Gemin hoping it would at least auto complete a word but the result set is empty.
https://stract.com/beta/api/docs/#/autosuggest/route
by gardnr on 2/5/2024, 8:55:18 AM
I just set it as my default search engine for a day. It's not quite there for my use cases. Can we help improve the search results?
by lastdong on 2/4/2024, 10:31:42 PM
Optics are a great idea, something we don’t see on other engines.
Fully open source -`ღ´-
Haven’t dug in to see what’s powering the search; I think DDG uses Bing
by RDaneel0livaw on 2/5/2024, 2:51:09 AM
So is this truly its own search engine / crawler / etc... and not using anyone else's searches? I know ddg / kagi often use results from bing and other places, so just want to make sure.
also, how can I add this to my firefox search inside the address bar / search field?
by PixelForg on 2/4/2024, 11:46:41 PM
Search still needs some improvement, I typed "gundam watch order reddit" and was expecting some reddit links, but none of the results are reddit links. Perhaps there's another way to limit search results to a particular site here?
by icar on 2/5/2024, 8:11:20 AM
The only current search engine that I can use in my native language, Catalan, is Google. I can't wait for a project like this one to get good at that.
by vanous on 2/4/2024, 9:44:48 PM
Congrats!
I tried to search for a particular domain data but neither search nor the explore would have the domain listed. What's the process to get unlisted domains indexed?
by blinding-streak on 2/5/2024, 1:52:49 PM
Sad to say this for a promising idea, but the search results are objectively terrible. If it wants to succeed, it needs to nail the primary use case.
by kuratkull on 2/4/2024, 10:16:17 PM
It's failing (completely wrong results) my go-to query for testing search engines: "best sub 10 usd Linux single board computer". Try it out.
by aabbcc1241 on 2/9/2024, 10:21:03 AM
It has the same problem I experienced on Google search: when I search for my projects it doesn't find them, even when I use exact wording; adding github, npm, and username to the search query doesn't help...
by highmastdon on 2/5/2024, 7:15:17 AM
Great stuff!
Just want to mention, when I search for “ExpressLRS use uart on older f4 fcs” it gives me about 15 results, but only the first two are unique. The other 13 are a literal copy of the first, both in content and in URL. Probably best to filter for uniqueness
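A minimal sketch of that kind of uniqueness filter, assuming each result is a dict with a `url` key (a hypothetical shape, not Stract's actual result format). The `normalize_url` and `dedupe_results` helpers are illustrative names only, and the normalization rules are one possible choice among many:

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Build a comparison key: lowercase scheme and host, no fragment, no trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe_results(results: list[dict]) -> list[dict]:
    """Keep the first result for each normalized URL, preserving ranking order."""
    seen: set[str] = set()
    unique = []
    for result in results:
        key = normalize_url(result["url"])
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


# Hypothetical results: the second entry differs from the first only by host case and a trailing slash.
results = [
    {"title": "A", "url": "https://example.com/page"},
    {"title": "A (copy)", "url": "https://EXAMPLE.com/page/"},
    {"title": "B", "url": "https://example.com/other"},
]
unique = dedupe_results(results)
print([r["title"] for r in unique])
```

Keeping the first occurrence means the highest-ranked copy survives; filtering on content similarity rather than URL would need a different approach.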
by MythTrashcan on 2/6/2024, 6:02:10 AM
This is a very cool search engine. Still surprised why this was made though. I thought there were a lot of other search engines already around that were open-source. Anyways, interested in seeing what changes as time goes on.
by safety1st on 2/5/2024, 3:30:25 AM
Generates some pretty interesting results. No way to make it my default search engine?
by latentdeepspace on 2/5/2024, 8:46:28 AM
Can someone provide a bit of background how the crawling part works?
by pcblues on 2/5/2024, 8:22:55 AM
If you are interested in setting up your own non-profit org marketplace or know someone who is, I made an example one using free tools (https://donate.pcblues.com/) that costs me only about $10 USD per month to host, because it is just a hosted Linux VM and not SaaS or software-subscription based. I configured the VM myself and then "just" installed the software and configured it. It hasn't been down for ages. I only just remembered to check it. It does everything from merch and service websites to escrowed payment transactions, user reputation, etc.
by sydbarrett74 on 2/5/2024, 7:02:58 AM
Very impressive, and kudos to the developers and originators.
I just hope Stract doesn't go 'corporate' the way DDG did. :(
by gettodachoppa on 2/5/2024, 9:07:20 PM
Fantastic!
This is what I've been waiting for for 10 years, since Google removed the feature: a search engine which realizes 99% of the time, I want to search Discussions, and gives me the option to only show those. (reddit, forums, mastodon etc). This cuts down the SEO crap by 99.99%.
That said, the results aren't great, hopefully it's something that improves as they index more pages. For example reddit doesn't seem to be indexed, why not? It's a goldmine of user content (even if the frontpage is 99% astroturfed US neolib propaganda).
by tortoise_in on 2/5/2024, 3:03:10 AM
So I put in two queries about my local country, but they didn't show up
by charcircuit on 2/4/2024, 9:57:52 PM
>how many bits are in a byte
I checked 11 pages and none of the results were relevant.
by abrowniejr on 2/5/2024, 1:23:28 AM
I searched for "calories in 450 gm of steak" and the top 3 results were:
1. Brexit as the start of the reversal of neoliberal globalization - softpanorama.org
2. Directory Search - Fulshear-Katy Area Chamber of Commerce - chamberorganizer.com
3. The 100 Best New Products of 2020 - gearpatrol.com
And none of the Page 1 results were related to my search query...
by vaicorinthians on 2/5/2024, 8:23:06 AM
I would give stract.org a shot tho
by Zuiii on 2/6/2024, 2:31:23 PM
Supports negative prompts! Will happily switch to this if I can figure out how to add it to firefox.
by stainablesteel on 2/5/2024, 2:50:45 AM
this is a neat thing, i like it, i'll add it to my list of search engines i use
by snvzz on 2/5/2024, 1:18:54 AM
The search bar should really be full width.
It can be very annoying to have your query not fit it while the window has plenty of room left.
by bbsz on 2/5/2024, 3:11:39 AM
I think a lot of people will now go and benchmark queries only to report back disappointed with results.
Trying to build a generalized search engine for the modern internet that will come close to Google/Bing would require a "tech megaproject" level of investment and commitment, most likely only to end up with the same optimizations and architecture as existing big-search and a very similar level of experience.
I think it's a better direction to build a search based on a more limited amount of topic-based data and focus on a great matching engine within it, then just aggregate the relevant ones together. Far more maintainable on the crawling part, too. I can use google/bing to find the Honda dealership or read keyboard reviews, or get the 50 most useful unix commands.
I also wonder if, with the rise of LLMs, while it still may not be feasible in such a large-scale production environment, those can serve as guides/agents to also improve the query itself and not the results of the query, for example - a chat-like search where user answers shift the relevancy metrics for returned documents. This would fit perfectly for smaller but open source, customizable and thematic search.
That being said, I think it's great that projects such as this pop up more often. (Phind.com was also on my radar this year)
by AlienRobot on 2/4/2024, 11:25:48 PM
I searched for "horror movies" and the first result was a lemmy community that has literally "616 subscribers" "30 Posts" and "76 Comments" which is about as dead as you would expect from a lemmy instance.
I also searched for "league of legends", and it couldn't find its homepage.
I think its ranking algorithm may need improvement.
Edit: also, I'd rather not say this, but do we really need another DuckDuckGo? I don't think Google fails at its job because of financial incentives. I think it might fail at this job simply put because the web of 2024 isn't the web of 1990. For example, the lemmy result, it's a link aggregation about horror movie articles. The search engine could literally do the job of the link aggregator, as it has a SERP that aggregates links, and yet it's aggregating links to link aggregators. Why are the search engines doing this? Because it's 2024. I wish someone tried a new approach at this problem rather than just copying Google's design and saying "it's Google but not yucky".