Author of the post here. Bot detection and related topics come up often on HN, especially lately with the rise of AI scrapers targeting protected websites to gather training data for LLMs.
Most of the time, people just recommend adding a CAPTCHA or implementing rate limiting. In the article, I cover the most popular bot detection approaches (CAPTCHA, IP-based rate limiting, geo-blocking, and static JS fingerprinting) and discuss their main limitations.