Finally, some Google news: Principal engineer Matt Cutts wrote a post on the Google Blog explaining steps the search-engine giant will take to try to reduce spam in its search results:
January brought a spate of stories about Google’s search quality. Reading through some of these recent articles, you might ask whether our search quality has gotten worse. The short answer is that, according to the evaluation metrics we’ve refined over more than a decade, Google’s search quality is better than it has ever been in terms of relevance, freshness, and comprehensiveness. Today, English-language spam in Google’s results is less than half what it was five years ago, and spam in most other languages is even lower than in English. However, we have seen a slight uptick of spam in recent months, and while we’ve already made progress, we have new efforts underway to continue to improve our search quality.
Just as a reminder, webspam is the junk you see in search results when Web sites try to cheat their way into higher positions or otherwise violate search-engine quality guidelines. A decade ago, the spam situation was so bad that search engines would regularly return off-topic webspam for many different searches. For the most part, Google has successfully beaten back that type of “pure webspam” — even as some spammers resort to sneakier or even illegal tactics, such as hacking Web sites.
As we’ve increased both our size and freshness in recent months, we’ve naturally indexed a lot of good content, and some spam as well. To respond to that challenge, we recently launched a redesigned document-level classifier that makes it harder for spammy on-page content to rank highly. The new classifier is better at detecting spam on individual Web pages, e.g., repeated spammy words — the sort of phrases you tend to see in junky, automated, self-promoting blog comments. We’ve also radically improved our ability to detect hacked sites, which were a major source of spam in 2010. And we’re evaluating multiple changes that should help drive spam levels even lower, including one change that primarily affects sites that copy others’ content and sites with low levels of original content. We’ll continue to explore ways to reduce spam, including new ways for users to give more explicit feedback about spammy and low-quality sites.
As “pure webspam” has decreased over time, attention has shifted instead to “content farms,” which are sites with shallow or low-quality content. In 2010, we launched two major algorithmic changes focused on low-quality sites. Nonetheless, we hear the feedback from the Web loud and clear: People are asking for even stronger action on content farms and sites that consist primarily of spammy or low-quality content. We take pride in Google search and strive to make each and every search perfect. The fact is that we’re not perfect, and combined with users’ skyrocketing expectations of Google, these imperfections get magnified in perception. However, we can and should do better.