• by gregjor on 9/6/2024, 8:52:24 AM

    Most of the social media companies are scraping everything to train their LLMs. I think we’ll see some court decisions soon regarding legality.

    Some of the social platforms have APIs you can pay to access. Some have aggressive anti-scraping countermeasures.

  • by fragmede on 9/6/2024, 4:45:51 PM

    hiQ v. LinkedIn addressed this. If the content is available without a login, it's fair game. If a login is required, it's not. That's why Twitter now requires a login to view extended content.

  • by ActorNightly on 9/6/2024, 5:36:51 PM

    Legal, yes, as long as you are not accessing stuff you are not supposed to.

    Possible, very much so, just depends on the platform and the rate of access that they allow. Some platforms will basically rate limit hard if they detect a lot of traffic from a single IP.

    With paid API access, you may have a higher rate limit and an easier time getting the data (usually without having to parse HTML).
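
    To illustrate the rate-limiting point, here is a minimal sketch of a retry loop that backs off when a server answers with HTTP 429 or 503. The function name `fetch_with_backoff`, its parameters, and the injected `fetch` callable are hypothetical and chosen for illustration. Actual limits and status codes vary by platform, so treat the defaults as placeholders rather than recommended values.

```python
import time
from typing import Callable, Tuple


def fetch_with_backoff(
    fetch: Callable[[str], Tuple[int, str]],
    url: str,
    max_retries: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch(url); on HTTP 429/503, wait and retry with doubling delays.

    `fetch` is a hypothetical callable returning (status_code, body), so the
    sketch stays independent of any particular HTTP library.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status == 200:
            return body
        # Retry only on "slow down" style responses; fail fast on the rest.
        if status not in (429, 503) or attempt == max_retries:
            raise RuntimeError(f"giving up on {url}: HTTP {status}")
        sleep(min(delay, max_delay))
        delay *= 2  # exponential backoff, capped by max_delay above
    raise RuntimeError("unreachable")
```

    In practice `fetch` would wrap whatever HTTP client you already use. Passing `sleep` as a parameter just makes the loop easy to test without real waiting.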

  • by leros on 9/7/2024, 1:28:51 PM

    Generally speaking, if you're not logged in and nobody has told you to stop, you should be ok.

    There is a service called SerpAPI that provides an API around stuff you might scrape. Haven't tried it myself, but I've heard it's good.

  • by vieques on 9/6/2024, 8:02:01 AM

    This question is way too broad. What is your purpose? What specifically are you scraping (i.e., images, text, audio, video)? Please expand.

  • by brudgers on 9/6/2024, 9:55:43 PM

    Approaching the platforms adversarially makes you an adversary. This might not be a solid foundation for a stable business.

    Your lawyer is the best source for an opinion regarding legality.

    Good luck.

  • by fasa99 on 9/8/2024, 1:26:25 PM

    they's already scraped bro

    they's on the archive dot org

    that site has everything but their search is shite

    so to find scraped things that they scraped, you need to scrape their site and build a non-broken search engine for yourself

    but you'll find your post-scraped social media sites

    and many other interdasting things
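
    As a rough sketch of finding archived captures without scraping the archive's own search pages: the Wayback Machine exposes a CDX index endpoint that can return capture records as JSON. This swaps the "scrape their site" approach above for that index. The helper names (`build_cdx_url`, `parse_cdx_json`, `snapshot_url`) are hypothetical, and the code only builds the query URL and parses a response. Fetching it is left to whatever HTTP client you prefer.

```python
import json
from urllib.parse import urlencode

# Wayback Machine CDX index endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(url_pattern, limit=50, year_from=None, year_to=None):
    """Build a CDX query URL listing captures that match url_pattern."""
    params = {"url": url_pattern, "output": "json", "limit": str(limit)}
    if year_from:
        params["from"] = str(year_from)
    if year_to:
        params["to"] = str(year_to)
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_json(text):
    """Turn the JSON response (first row is the header) into dicts."""
    rows = json.loads(text) if text.strip() else []
    if not rows:
        return []
    header, *records = rows
    return [dict(zip(header, record)) for record in records]


def snapshot_url(record):
    """Build the replay URL for one capture record."""
    return f"https://web.archive.org/web/{record['timestamp']}/{record['original']}"
```

    Feeding the parsed records into your own local index (even a simple SQLite table) is one way to get the "non-broken search" described above. The throttling and terms-of-use caveats elsewhere in the thread apply to the archive as well.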