Hello and thanks for the reply, Mr. yegg13!
I'm here to help, but it would take too much effort to list all the malware sites by hand. To start with, what about filtering out all those sites that just forward your query to their own search engine? They don't offer any concrete results, and their queries are usually fake. I wonder if it's possible for you to write a regex that filters out sites using
"this+kind+of+url" or "this worse one" in a malicious way? A well-written genetic algorithm could also help improve automation as DDG gets bigger, I believe.
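To make the regex idea a bit more concrete, here is a rough sketch of what I mean. It is only a guess at how such a check could look: the function name echoes_query is made up, I'm assuming the filter knows the user's original query, and I only treat "+" and spaces (including %20) as separators because hyphenated slugs are common on perfectly good sites.

```python
import re
from urllib.parse import unquote


def echoes_query(url: str, query: str) -> bool:
    """Heuristic: True if the URL contains the whole query, with its terms
    joined by '+' or spaces (hypothetical helper, not anything DDG uses)."""
    terms = [re.escape(term) for term in query.lower().split()]
    if len(terms) < 2:
        # A single word shows up in far too many legitimate URLs.
        return False
    # unquote() turns %20 into a space and leaves '+' untouched,
    # so both URL styles end up matching the same separator class.
    pattern = r"[+\s]+".join(terms)
    return re.search(pattern, unquote(url.lower())) is not None


print(echoes_query("http://example.com/search?q=this+kind+of+url", "this kind of url"))
print(echoes_query("http://example.com/s/this%20worse%20one", "this worse one"))
print(echoes_query("http://example.com/articles/regex-tips", "regex tips"))
```

A match on its own doesn't prove a site is trash (real search pages echo queries too), so I'd treat it as one signal to combine with others rather than an automatic block.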
I see you use the SimilarSites API... you could run it on your blacklist to find sites similar to those you've already blocked. I'm testing it right now on a couple of bad sites (I'm replacing Os with *s because I don't want to give them free SEO). For example:
mediafireb*t(dot)com
is a trash site, and SimilarSites found a lot of matching sites, probably run by the same shady guys.
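Something like the following is what I have in mind for growing the blacklist. This is just a sketch under my own assumptions: I don't know how the SimilarSites API is actually called, so similar is a placeholder function you would plug in yourselves, and expand_blacklist, max_depth and the example domains are all invented for illustration.

```python
from collections import deque
from typing import Callable, Iterable, Set


def expand_blacklist(
    seeds: Iterable[str],
    similar: Callable[[str], Iterable[str]],
    max_depth: int = 1,
) -> Set[str]:
    """Breadth-first walk from already-blocked sites to their lookalikes.

    `similar` is a hypothetical stand-in for a similar-sites lookup.
    Returns only newly found sites, not the seeds themselves.
    """
    seen = set(seeds)
    queue = deque((site, 0) for site in seen)
    candidates: Set[str] = set()
    while queue:
        site, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not wander further away from known-bad sites
        for other in similar(site):
            if other not in seen:
                seen.add(other)
                candidates.add(other)
                queue.append((other, depth + 1))
    return candidates


# Stub data standing in for real lookups (placeholder domains).
graph = {
    "bad-a.example": ["bad-b.example", "bad-c.example"],
    "bad-b.example": ["bad-d.example"],
}
found = expand_blacklist(["bad-a.example"], lambda s: graph.get(s, []))
print(sorted(found))
```

I'd keep max_depth small and have a human look at the candidates before blocking anything, since "similar" doesn't always mean "equally bad".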
The problem could be bigger than I thought, as I'm noticing a lot of BlogSpot sites are basically mirrors for bigger trash sites.
Other trash sites that seem to pop up a lot in the results are:
m*x*lbums(dot)com (fake queries)
g*gab*twarez(dot)com (trash)
studiaw*b(dot)com (fake queries)
Basically, 90% of the sites that have "warez" in their URL are like this.
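The "warez" rule is easy to try out. Here's a tiny sketch, assuming you only look at the host name (my choice, so that a legitimate article that merely mentions the word in its path isn't caught). The keyword list and function name are made up.

```python
from urllib.parse import urlsplit

# Hypothetical keyword list; "warez" is the only one I've actually noticed.
SUSPICIOUS_KEYWORDS = ("warez",)


def has_suspicious_host(url: str) -> bool:
    """True if the host name of the URL contains a flagged keyword."""
    host = urlsplit(url).hostname or ""  # hostname is None without a scheme
    return any(keyword in host for keyword in SUSPICIOUS_KEYWORDS)


print(has_suspicious_host("http://some-warez-site.example/page"))
print(has_suspicious_host("http://example.com/article-about-warez"))
```

Since it's 90% and not 100%, this is probably better as a way to flag sites for review than as a hard filter.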
Also, more importantly, I've seen very, very suspicious-looking pornographic URLs and titles in the results, even with SafeSearch turned on (which at least partially reduces the trash links). Like, VERY suspicious-looking, if you know what I mean. I used to think nobody would actually click on those links, but you should never underestimate the stupidity (or ignorance/inexperience) of people. Those links should definitely not appear in the results, whether they are fake (and they probably are) or legitimate, and whether the filter is on or off. Right now I am not seeing one, but I would definitely report it if I had the chance.
Finally, I understand what you guys are trying to do with the UI and it's awesome. However, DDG could definitely use some help from its users, and at least in this phase it needs a report function. Possible solutions that don't clutter the UI:
- A standalone report page
- A Greasemonkey script (only those who really want to contribute would install it, but that's good enough)
- A semi-visible warning sign next to "Similar Sites". This could probably be done by changing the alpha value of the image, so you wouldn't need more than one image.
And these are just generic ideas, I bet you guys can be more creative.
Be sure to specify that the reports are just for spam/trash/extremely illegal sites, otherwise you'll find yourself flooded with reports made by RIAA lawyers to censor actually good sites like TPB, and that would be wrong and a waste of time.
That's all for now. Thanks for listening, and I hope I didn't sound too demanding. I'm just trying to be a good user :)