Hacker News Clone

Finding Dead Websites

by ingve on 6/17/2025, 12:03:01 PM with 22 comments

by zinekeller on 6/20/2025, 1:51:23 AM
The TLS verifier workaround that you've constructed is reasonably sound (based on how TLS validation works in browsers), but the boring way to work around certificate problems is to do what Firefox actually does: skip AIA and only rely on known intermediates. You can import Mozilla's intermediate list as roots (https://wiki.mozilla.org/CA/Intermediate_Certificates, requires additional processing to convert to usable certs) to emulate this.
Chrome on the other hand... just read this article (https://blog.benjojo.co.uk/post/browsers-biggest-tls-mistake), it's very hard to emulate and is really, really bonkers.
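A minimal Python sketch of the "known intermediates as extra trust anchors" idea, assuming Mozilla's intermediate list has already been converted into a PEM bundle. The helper name `build_context` and the file name `mozilla-intermediates.pem` are hypothetical; as far as I know Python's `ssl` module does not fetch AIA issuers on its own, so the preloaded bundle is what lets an incomplete server chain validate.

```python
import ssl
from typing import Optional


def build_context(intermediates_pem: Optional[str] = None) -> ssl.SSLContext:
    """Return a verifying client context, optionally trusting extra intermediates."""
    # Start from the platform's default root store, with hostname checks on.
    ctx = ssl.create_default_context()
    if intermediates_pem is not None:
        # Add the preloaded intermediates (hypothetical PEM bundle) to the
        # trust store so a server sending an incomplete chain can still be
        # validated without any AIA fetching.
        ctx.load_verify_locations(cafile=intermediates_pem)
    return ctx
```

Usage would look like `build_context("mozilla-intermediates.pem")`, with the result passed as the TLS context to whatever HTTP client the crawler uses.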
by atribecalledqst on 6/19/2025, 3:20:38 PM
Before I RTFA, I was wondering if this would be about trying to find a way to include Wayback Machine results in search. Searching the Wayback Machine is always such a nightmare, and wouldn't it be nice if your search turned up that long-dead 1997 web page that has the exact answer for what you're looking for...
(A minor use case I had recently: I was trying to find old Japanese blogs about Tamagotchis, which I gather there were a ton of in the 90s but almost none survive today. Imagine if I could get those instead of the 1,000,000 sites just trying to sell them to me.)
by JdeBP on 6/19/2025, 5:50:00 PM
As someone with a WWW site hit by Brexit, where half the country voted to stop me having my domain name (and some other things), I read this with interest, to consider how badly it would be caught out by the sort of false positive where a WWW site owner has to change ASes, change HTTP servers, set up redirects and meta information for the time left before eu. becomes unavailable, and even change DNS servers, let alone a number of resource records. A lot of those seem to be things that will add up in this model. As would the fact that my prior domain name is today parked. In Canada!
Not the first sudden and unwelcome discontinuity, either.
Google came close to thinking that I was dead and, when I recently checked, turned out to be still looking for me under eu., years after the fact.
And with a broader view, this sort of stuff happens to the world, and there are enough people in the same boat that it is worth thinking of false positives when major upheavals occur. They can range from ISPs just up and deciding to close up shop with zero notice (which also happened to me) to international geopolitical upheavals. Who knows! If Brexit happened, it is conceivable that one day, the island of Niue might eventually prevail and then decide overnight that non-Niue citizens may not own a nu. domain. (-:
I wonder how many times Marginalia would have declared me dead, by now. (-:
by 55555 on 6/19/2025, 2:34:55 PM
It's a real edge case, but someone could conceivably let their own domain expire and then register it anew and restore their website. It will be impossible to tell this apart from an SEO buying and restoring a website to use for link juice.
by tart-lemonade on 6/20/2025, 2:49:14 PM
> While the change detection currently only runs on a subset of about 2 million domains, the crawler is aware of approximately 36,000,000 domains in total, and about 1,500,000 of those are subdomains of tumblr.com.
So Tumblr makes up ~4% of all internet domains (or at least, all linked publicly-accessible domains)? Makes sense, but still wild to think about. I'd love to know what the top 10 domains (by sub-domain count) are.
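A quick check of that percentage, using only the two figures from the quoted passage (the variable names are just for illustration):

```python
total_domains = 36_000_000     # domains the crawler is aware of (from the quote)
tumblr_subdomains = 1_500_000  # subdomains of tumblr.com (from the quote)

# Share of known domains that are tumblr.com subdomains.
share = tumblr_subdomains / total_domains
print(f"{share:.1%}")  # 4.2%
```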
by renegat0x0 on 6/19/2025, 5:44:32 PM
Whoa, this is what I have been wondering for some time, for my crawler.
Crawler results depend on domain authority. If the page owner or the page contents change, the ranking may, or should, change.
However, the original author could also change the contents, in which case the page ranking should not change. So it is not easy to determine what to do with a domain if it becomes inactive or changes its contents dramatically.
Currently I only use a 30-day window to keep track of domains. After that period, an inactive domain is thrown out of the window.
However, valuable domains, even if dead, stay longer. My UI provides an easy link to the Wayback Machine, so even for dead links I can browse them.
I also noticed that some domains, even if expired, still serve content after the author has abandoned them. The page content is served, but with a notice that the domain has expired.
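A rough sketch of the retention rule described above, assuming a simple "last seen active" timestamp per domain. The function name `should_keep` and the longer window for valuable domains are hypothetical; the comment only gives the 30-day figure.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)            # the 30-day window from the comment
VALUABLE_WINDOW = timedelta(days=365)  # hypothetical; no figure is given


def should_keep(last_active: datetime, now: datetime, valuable: bool = False) -> bool:
    """Return True while the domain is still inside its retention window."""
    # Valuable domains get the longer (assumed) window, others get 30 days.
    limit = VALUABLE_WINDOW if valuable else WINDOW
    return now - last_active <= limit
```

With this rule, a domain last seen active 45 days ago is dropped unless it has been flagged as valuable.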
by mlhpdx on 6/19/2025, 4:40:58 PM
I’m not sure what the author's point was with respect to ASN 16509. Are they saying parked domains don’t like being viewed by Amazon IPs or that moving to Amazon is a strong signal for being parked? The latter seems absurd. But is it?
by l5870uoo9y on 6/19/2025, 4:32:10 PM
What a pleasant website theme for reading.
by koprocezar on 6/19/2025, 5:37:26 PM
That was interesting.