The latest tips and news from the Blogger team
On Spam
October 17, 2005
Spam is a tricky problem. Or as Matt Haughey says, "spam bloggers sure are resourceful little bastards."
For a while now, the Blogger team has been contending with spam on Blog*Spot through mechanisms like Flag as Objectionable and comment/blog creation CAPTCHAs. The spam classifier that Pal described has also dramatically reduced the amount of spam that folks experience when browsing NextBlog.
However, spam is still being created and, as was widely noted, Blogger was especially targeted this weekend.
Folks who use blog search services, and those who subscribe to feeds of results from those services, are particularly affected by blog spam. When spam goes up, it directly affects the quality of those results. I'm exceedingly sympathetic to these folks because, well, we run one of those services ourselves.
So given that the problem is hard, what more are we doing? One thing we can do is improve the quality of the Recently Updated information we publish.
Recently Updated lists like the one Blogger publishes are used by search services to determine what to crawl and index. A big goal in deploying the filtered NextBlog and Flag as Objectionable was to improve our spam classifiers. As we improve these algorithms, we plan to pass the filtered information along automatically. Just as a first step, we're publishing a list of deleted subdomains that were created this weekend during the spamalanche.
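For search services that consume these lists, here is a minimal sketch of how a deleted-subdomains list might be applied to a Recently Updated list before crawling. The file format is an assumption (one subdomain label per line), and the function names `subdomain_of` and `filter_recently_updated` are hypothetical, not part of anything Blogger publishes.

```python
from urllib.parse import urlparse


def subdomain_of(url: str) -> str:
    """Return the leftmost host label, lowercased ('foo' for foo.blogspot.com)."""
    host = urlparse(url).hostname or ""
    return host.split(".")[0].lower()


def filter_recently_updated(updated_urls, deleted_subdomains):
    """Drop URLs whose subdomain appears in the deleted list.

    Assumption: deleted_subdomains yields one subdomain label per entry,
    possibly with trailing whitespace or blank lines.
    """
    deleted = {s.strip().lower() for s in deleted_subdomains if s.strip()}
    return [u for u in updated_urls if subdomain_of(u) not in deleted]


if __name__ == "__main__":
    updated = [
        "http://goodblog.blogspot.com/",
        "http://cheap-pills-123.blogspot.com/",
    ]
    deleted = ["cheap-pills-123\n", ""]
    print(filter_recently_updated(updated, deleted))
```

A set lookup keeps the filter cheap even when the deleted list is large, which matters if it is applied on every crawl pass.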
Greg from Blogdigger (one of the folks who consume blog data) points out that "ultimately the responsibility for providing a quality service rests on the shoulders of the individual services themselves, not Google and/or Blogger." However, we think by sharing what we've learned about spam on Blogger we can hopefully improve the situation for everyone.
We can also make it more difficult for suspected spammers to create content. This includes placing challenges in front of would-be spammers to deter automation.
Of course, false positives are an unavoidable risk with automatic classifiers. And it's important to remember that the majority of content being posted on Blog*Spot is not spam (we know this from the ongoing manual reviews used to train the spam classifier).
Some have suggested that we go a step further and place CAPTCHA challenges in front of all users before posting. I don't believe this is an acceptable solution.
First off, CAPTCHAs are a burden for all users (the majority of whom are legit), an impossible barrier for some, and incompatible with API access to Blogger.
But, most importantly, wrong-doers are already breaking CAPTCHAs on a daily basis. And not through clever algorithmic means but via the old-fashioned human-powered way. We've actually been able to observe when human-powered CAPTCHA solvers come online by analyzing our logs. You can even use the timestamps to determine where this CAPTCHA-solving originates.
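To illustrate the kind of timestamp analysis described here, the following is a rough sketch, not Blogger's actual method. It assumes each solved CAPTCHA is logged as a Unix timestamp (an assumption about the log format), buckets solves by UTC hour of day, and finds the busiest contiguous window. A window that lines up with working hours in some time zone would hint at where the solvers are. The names `solves_per_hour` and `busiest_window` are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def solves_per_hour(timestamps):
    """Count events by UTC hour of day (0-23)."""
    counts = Counter()
    for ts in timestamps:
        hour = datetime.fromtimestamp(ts, tz=timezone.utc).hour
        counts[hour] += 1
    return counts


def busiest_window(counts, width=8):
    """Return the UTC start hour of the busiest `width`-hour window.

    The window wraps around midnight. Ties go to the earliest start hour.
    """
    def total(start):
        return sum(counts.get((start + i) % 24, 0) for i in range(width))

    return max(range(24), key=total)


if __name__ == "__main__":
    # Three made-up solves at 02:00, 03:00 and 04:00 UTC.
    sample = [2 * 3600, 3 * 3600, 4 * 3600]
    counts = solves_per_hour(sample)
    print(busiest_window(counts, width=3))
```

An hour-of-day histogram is a blunt tool: it suggests a time zone band rather than a location, and it only works when the human-solved traffic can be separated from ordinary signups.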
One thing we've learned from Blog Search is that even if spam were completely solved on Blog*Spot, there would still be a problem. As others have concluded, we've realized that this is going to be an ongoing challenge for Blogger, Google and all of us who are interested in making it easier for people to create and share content online.