renzev@lemmy.world · 9 months ago

With all the talk about privacy, people seem to be forgetting that censorship is also a major problem for today's internet.

Mojeek@lemmy.ml · 9 months ago

If you look at the repo, they give thanks to:

“The commoncrawl organization for crawling the web and making the dataset readily available. Even though we have our own crawler now, commoncrawl has been a huge help in the early stages of development.”

There is nothing I can find that says how much of the index comes from Common Crawl (CC) and how much from their own crawler. If a decent share is CC, that matters: Common Crawl was originally built for researchers and the like, so it's not the best resource in the world for a search index: https://commoncrawl.org/

That being said, speaking as an independent search engine ourselves, it's always good to see people take on the massive task of actually building an index rather than just becoming a proxy.