WIRED was able to download stories from publishers like The New York Times and The Atlantic using Poe’s Assistant bot. One expert calls it “prima facie copyright infringement,” which Quora disputes.
NYT usually sets a cookie to count how many free articles you’ve read, and once you go over the limit you get the paywall. The bots probably don’t store or send cookies, so NYT never sees them hit the limit. Also, I’d imagine the bots come from many different IPs, so even server-side blocking by IP wouldn’t catch everything and the bot would eventually get to the article. User agents can also be spoofed.
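To make the cookie point concrete, here is a toy simulation of a cookie-based meter. It is only a sketch of the general idea, not how NYT actually implements its paywall: `serve_article`, `browse`, the `articles_read` cookie name, and the limit of 3 are all made up for illustration, and nothing here touches the network.

```python
FREE_ARTICLE_LIMIT = 3  # hypothetical limit, not any publisher's real number


def serve_article(request_cookies):
    """Simulated server: return (body, cookies_to_set) for one request.

    The read count lives only in the cookie the client sends back,
    which is the weakness being illustrated.
    """
    count = int(request_cookies.get("articles_read", 0))
    if count >= FREE_ARTICLE_LIMIT:
        return "PAYWALL", request_cookies
    return "FULL ARTICLE", {"articles_read": str(count + 1)}


def browse(n_requests, keep_cookies):
    """Simulated client: request n articles, optionally keeping cookies."""
    cookies = {}
    results = []
    for _ in range(n_requests):
        body, new_cookies = serve_article(cookies)
        results.append(body)
        if keep_cookies:
            cookies = new_cookies  # a normal browser stores the cookie
        # a cookie-less bot just drops it and looks like a new visitor
    return results


browser_view = browse(5, keep_cookies=True)
bot_view = browse(5, keep_cookies=False)
print(browser_view)
print(bot_view)
```

In this toy model the browser gets three articles and then the paywall, while the client that drops cookies gets the full article every time. Any meter that relies on state the client volunteers has this problem, which is why the count would need to be enforced server-side against something the client can't simply omit.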
Quora should be respecting robots.txt, but also why are the NYT etc. serving the full article to the Quora bot anyway?
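For anyone wondering what “respecting robots.txt” looks like in practice, here is a small sketch using Python’s standard `urllib.robotparser`. The rules, the `ExampleBot` name, and the example.com URL are all invented for illustration; this is not NYT’s actual robots.txt or Quora’s actual crawler name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent, url):
    """Return True if the rules above permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/some-article"
print(allowed("ExampleBot", url))    # the crawler named in the rules
print(allowed("SomeOtherBot", url))  # any other name falls under "*"
```

The check runs on the crawler’s side, so it only works if the bot chooses to run it and identifies itself honestly. A bot that skips the check, or sends a different user agent, is not stopped by anything in this file.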