"Such demonstrably false premises lead inevitably to poor public policy, with services that are often harmful, unfair, complex, costly to administer, counterproductive and bound to fail.
The original article contains 26 words, the summary contains 26 words. Saved 0%. I’m a bot and I’m open source!
Hey @rikudou@lemmings.world, the bot seems to be having some difficulties with correctly parsing articles from the ABC. It’s been doing it on a fair few posts (see below examples as well). As far as I can tell, it’s only occuring on articles from the ABC and I’m not entirely sure what’s causing it.
It looks like ABC must have changed the internal layout of their pages for whatever reason. It seems like the bot is just selecting the first block quote as the entire article.
On The Register for example it selects the div with the id #body. For ABC it seems that it looks for the class Article_Body which I can’t find on that article. I might have a closer look later if I’ve got some time and try to get a PR in if it doesn’t get fixed.
That’s the case, they removed one level of nesting from the html. Anyway, it doesn’t look for Article_Body class, but any class that starts with Article_Body. They’re using randomized class names with the prefix being constant, that’s why I have to do it that way. I’ve updated it to this horrible looking selector: div[class*="Article_body"] > div > p, div[class*="Article_body"] > div > ul:not([class*="ShareUtility"]) > li.
Thanks! I thought it might’ve been a wildcard thing but wasn’t sure. They really don’t want their articles summarised do they (or they’re probably trying to discourage AI scrapers)
This is the best summary I could come up with:
"Such demonstrably false premises lead inevitably to poor public policy, with services that are often harmful, unfair, complex, costly to administer, counterproductive and bound to fail.
The original article contains 26 words, the summary contains 26 words. Saved 0%. I’m a bot and I’m open source!
Hey @rikudou@lemmings.world, the bot seems to be having some difficulties with correctly parsing articles from the ABC. It’s been doing it on a fair few posts (see below examples as well). As far as I can tell, it’s only occurring on articles from the ABC and I’m not entirely sure what’s causing it.
Other examples:
https://lemmings.world/comment/8105800
https://lemmings.world/comment/8196693
Thanks for the report! It’s fixed now.
Thank you!
It looks like ABC must have changed the internal layout of their pages for whatever reason. It seems like the bot is just selecting the first block quote as the entire article.
On The Register for example it selects the div with the id #body. For ABC it seems that it looks for the class Article_Body which I can’t find on that article. I might have a closer look later if I’ve got some time and try to get a PR in if it doesn’t get fixed.
That’s the case, they removed one level of nesting from the html. Anyway, it doesn’t look for Article_Body class, but any class that starts with Article_Body. They’re using randomized class names with the prefix being constant, that’s why I have to do it that way. I’ve updated it to this horrible looking selector: div[class*="Article_body"] > div > p, div[class*="Article_body"] > div > ul:not([class*="ShareUtility"]) > li.
Thanks! I thought it might’ve been a wildcard thing but wasn’t sure. They really don’t want their articles summarised do they (or they’re probably trying to discourage AI scrapers)
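For anyone curious what that selector actually picks up, here is a rough sketch in plain Python. It is not the bot's code: the class `ArticleBodyExtractor`, the helper `extract_blocks`, the `SAMPLE` markup and the random-looking class suffixes are all made up for illustration, and the matching is hand-rolled with the standard-library `html.parser` instead of a real CSS engine. It assumes well-formed HTML with explicit closing tags and does not handle matches nested inside other matches. Note that `[class*="..."]` matches the text anywhere in the class attribute, not only at the start.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class ArticleBodyExtractor(HTMLParser):
    """Hypothetical, hand-rolled stand-in for the two-branch selector."""

    def __init__(self):
        super().__init__()
        self.stack = []            # (tag, class attribute) of every open element
        self.blocks = []           # text of each matched <p> or <li>
        self.capture_depth = None  # stack depth of the element being captured

    def _matches(self):
        """Does the innermost open element match either selector branch?"""
        s = self.stack
        # Branch 1: div[class*="Article_body"] > div > p
        if len(s) >= 3 and s[-1][0] == "p":
            parent, body = s[-2], s[-3]
            return (parent[0] == "div" and body[0] == "div"
                    and "Article_body" in body[1])
        # Branch 2: div[class*="Article_body"] > div
        #           > ul:not([class*="ShareUtility"]) > li
        if len(s) >= 4 and s[-1][0] == "li":
            ul, parent, body = s[-2], s[-3], s[-4]
            return (ul[0] == "ul" and "ShareUtility" not in ul[1]
                    and parent[0] == "div" and body[0] == "div"
                    and "Article_body" in body[1])
        return False

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.stack.append((tag, dict(attrs).get("class") or ""))
        if self.capture_depth is None and self._matches():
            self.capture_depth = len(self.stack)
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Pop back to the most recent open element with this tag name.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        if self.capture_depth is not None and len(self.stack) < self.capture_depth:
            self.capture_depth = None

    def handle_data(self, data):
        if self.capture_depth is not None:
            self.blocks[-1] += data


def extract_blocks(html):
    """Return the stripped text of every matched paragraph or list item."""
    parser = ArticleBodyExtractor()
    parser.feed(html)
    parser.close()
    return [b.strip() for b in parser.blocks if b.strip()]


# Made-up markup; the class suffixes imitate randomized class names.
SAMPLE = """
<div class="Article_body__x1y2z">
  <div>
    <p>First paragraph.</p>
    <ul class="ShareUtility_list__abc"><li>Share on X</li></ul>
    <ul><li>Key point</li></ul>
    <blockquote><p>Quoted text.</p></blockquote>
  </div>
</div>
"""

print(extract_blocks(SAMPLE))  # ['First paragraph.', 'Key point']
```

The share-button list is skipped by the :not(...) part, and the paragraph inside the blockquote is skipped because > only matches direct children. That same strictness is presumably why removing one level of nesting broke the old selector.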