• gaylord_fartmaster@lemmy.world · 3 months ago

    They’re already ignoring robots.txt, so I’m not sure why anyone would think they won’t just ignore this too. All they have to do is get a new IP and change their useragent.
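
    A minimal sketch of why this is so easy: robots.txt is purely advisory, and the User-Agent header is whatever the client claims it is. The URL and the browser-style header string below are placeholders, not any specific crawler's behaviour.

    ```python
    # A polite crawler consults robots.txt before fetching; nothing at the
    # protocol level forces an impolite one to do the same.
    import urllib.robotparser
    import urllib.request

    TARGET = "https://example.com/some-page"  # hypothetical page

    # What a well-behaved crawler does:
    rp = urllib.robotparser.RobotFileParser("https://example.com/robots.txt")
    rp.read()
    print("robots.txt allows this fetch?", rp.can_fetch("MyBot", TARGET))

    # What nothing prevents: skip the check entirely and present a
    # browser-like User-Agent so the request blends in with normal traffic.
    req = urllib.request.Request(
        TARGET,
        headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    print(len(body), "bytes fetched regardless of what robots.txt said")
    ```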

  • scarabine@lemmynsfw.com · 3 months ago

    I have an idea. Why don’t I put a bunch of my website stuff in one place, say a pdf, and you screw heads just buy that? We’ll call it a “book”

  • magic_smoke@links.hackliberty.org · 3 months ago

    As someone who uses invidious daily, I’ve always been of the belief that if you don’t want something scraped, then maybe don’t upload it to a public web page/server.

    • Justas🇱🇹@sh.itjust.works · 3 months ago

      Imagine a company that sells a lot of products online. Now imagine a scraping bot coming in at peak sales hours and hitting each product list and product page separately. Now realise that some genuine users will have a worse buying experience because of that.