Wats0ns@programming.dev to Programming@programming.dev • Yearly reminder that github still does not have an IPv6 address (2023)
But why?
“Each team is full-stack and full-lifecycle: responsible for front-end, back-end, database, business analysis, feature prioritization, UX, testing, deployment, monitoring”
“But they also shouldn’t be too large, ideally each one is a Two Pizza Team”
Either that’s a team with some hugely diversified skills, or that’s two car-sized pizzas.
Yep, try Scrapy. It also handles the concurrency of your pipeline items, the configuration for every part,…
The huge feature of Scrapy is its pipelining system: you scrape a page, pass it to the filtering part, then to the deduplication part, then to the DB, and so on.
Hugely useful when you’re scraping and extracting data. I reckon it’s less useful if you’re only grabbing raw pages.
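To make the pipelining idea concrete, here’s a rough plain-Python sketch of the filter → dedup → DB chain. It doesn’t import Scrapy at all: it only mimics the `process_item(item, spider)` shape that Scrapy item pipelines use. The `DropItem` class here is a local stand-in for `scrapy.exceptions.DropItem`, and `run_pipelines`, the three pipeline classes, the item fields, and the example URLs are all made up for illustration. In a real project Scrapy runs the stages for you, in the order you give in the `ITEM_PIPELINES` setting.

```python
# Plain-Python sketch of the pipeline idea; no Scrapy import needed.

class DropItem(Exception):
    """Local stand-in for scrapy.exceptions.DropItem."""


class FilterPipeline:
    """Drop items that have no title (hypothetical filtering rule)."""

    def process_item(self, item, spider):
        if not item.get("title"):
            raise DropItem("missing title")
        return item


class DedupPipeline:
    """Drop items whose URL was already seen."""

    def __init__(self):
        self.seen = set()

    def process_item(self, item, spider):
        if item["url"] in self.seen:
            raise DropItem("duplicate url")
        self.seen.add(item["url"])
        return item


class StorePipeline:
    """Stand-in for the DB stage: just keeps items in a list."""

    def __init__(self):
        self.rows = []

    def process_item(self, item, spider):
        self.rows.append(item)
        return item


def run_pipelines(items, pipelines, spider=None):
    """Pass each item through every stage in order, skipping dropped items."""
    kept = []
    for item in items:
        try:
            for stage in pipelines:
                item = stage.process_item(item, spider)
        except DropItem:
            continue  # a stage rejected this item; later stages never see it
        kept.append(item)
    return kept


scraped = [
    {"url": "https://example.com/a", "title": "A"},
    {"url": "https://example.com/a", "title": "A"},  # duplicate URL
    {"url": "https://example.com/b", "title": ""},   # fails the filter
    {"url": "https://example.com/c", "title": "C"},
]
store = StorePipeline()
kept = run_pipelines(scraped, [FilterPipeline(), DedupPipeline(), store])
```

Each stage only knows about its own job, so you can reorder or swap them without touching the scraping code, which is pretty much the whole appeal.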
Same, that’s why I don’t understand how this is supposed to stay a two-pizza team system