Sunday, August 23, 2009

Automated spam blog detection

Woke up this morning to find this on my blog dashboard:

"This blog has been locked due to possible Blogger Terms of Service violations. You may not publish new posts until your blog is reviewed and unlocked.
This blog will be deleted within 20 days unless you request a review."

And this in my email:
"Your blog at: http://marketdesigner.blogspot.com/ has been identified as a potential spam blog. To correct this, please request a review by filling out the form at http://www.blogger.com/unlock-blog.g?lockedBlogID=4748060798655400108

Your blog will be deleted in 20 days if it isn't reviewed, and your readers will see a warning page during this time. After we receive your request, we'll review your blog and unlock it within two business days. Once we have reviewed and determined your blog is not spam, the blog will be unlocked and the message in your Blogger dashboard will no longer be displayed. If this blog doesn't belong to you, you don't have to do anything, and any other blogs you may have won't be affected.

We find spam by using an automated classifier. Automatic spam detection is inherently fuzzy, and occasionally a blog like yours is flagged incorrectly. We sincerely apologize for this error. By using this kind of system, however, we can dedicate more storage, bandwidth, and engineering resources to bloggers like you instead of to spammers. For more information, please see Blogger Help: http://help.blogger.com/bin/answer.py?answer=42577

Thank you for your understanding and for your help with our spam-fighting efforts.

Sincerely,

The Blogger Team"

Let's see if I can publish this. (Update: it looks like I can still publish, but I have to interpret a captcha to show I'm probably human...)

Further update: how could Google's automatic spam blog detector be improved? Well, Google offers a lot of tools for reading blogs. Some fraction of my regular readers apparently read Market Design on Google Reader, which reported 858 subscribers when I checked just now. (You can check too, or subscribe, by going to Google Reader, clicking on the + box next to "add a subscription," and typing "market design." You aren't committed at that point, but you can see the feed, and the number...)

So, a thought for the humans who program the automatic spam detector: check whether spam blogs have fewer subscribers than real blogs and, if so, include subscriber counts in the next version of the algorithm.
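Here is a minimal sketch of the "check first, then include" step, assuming someone already has a set of blogs hand-labeled as spam or real, along with each blog's subscriber count. The function names and the 0.6 cutoff are hypothetical, and the statistic (the probability that a randomly chosen real blog has more subscribers than a randomly chosen spam blog, which is the AUC) is one simple choice I've swapped in for illustration. It is not how Google's classifier actually works.

```python
from typing import Sequence


def subscriber_auc(spam_counts: Sequence[int], real_counts: Sequence[int]) -> float:
    """Chance that a random real blog has more subscribers than a random spam blog.

    Ties count as half a win. A value near 0.5 means subscriber counts don't
    separate spam from real blogs; a value near 1.0 means they separate them well.
    """
    if not spam_counts or not real_counts:
        raise ValueError("need at least one labeled blog of each kind")
    wins = 0.0
    # Compare every (real, spam) pair of labeled blogs.
    for r in real_counts:
        for s in spam_counts:
            if r > s:
                wins += 1.0
            elif r == s:
                wins += 0.5
    return wins / (len(spam_counts) * len(real_counts))


def worth_adding_as_feature(spam_counts: Sequence[int],
                            real_counts: Sequence[int],
                            threshold: float = 0.6) -> bool:
    """Hypothetical rule: include the feature only if it beats the cutoff."""
    return subscriber_auc(spam_counts, real_counts) >= threshold
```

The pairwise comparison is deliberately simple and only looks at ranks, so a handful of very popular real blogs can't dominate the result the way they would in a comparison of averages.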

1 comment:

Pradeep said...

Why would spam blogs not fake the feed readers too?