Spam in blogs
Spam in blogs (also called simply blog spam or comment spam) is a form of
spamdexing. It is done by automatically posting random comments that promote
commercial services to blogs, wikis, guestbooks, or other publicly accessible
online discussion boards. Any web application that accepts and displays
hyperlinks submitted by visitors may be a target.
Adding links that point to the spammer's web site increases the site's page
ranking in the search engine Google. An increased page rank means the spammer's
commercial site would be listed ahead of other sites for certain Google
searches, increasing the number of potential visitors and paying customers.
History
This type of spam originally appeared in internet guestbooks, where spammers
would repeatedly fill a guestbook with links to their own site, with no
relevant comment, in order to increase search engine rankings. If an actual
comment was given, it was often just "cool page", "nice website", or keywords
of the spammed link.
In 2003, spammers began to take advantage of the open nature of comments in
blogging software such as Movable Type by repeatedly placing comments on
various blog posts that provided nothing more than a link to the spammer's
commercial web site. Jay Allen created a free plugin, called MT-BlackList, for
the Movable Type weblog tool (versions prior to 3.2) that attempted to
alleviate this problem. Many blog software packages now have methods of
preventing or reducing the effect of blog spam.
Possible solutions
rel=nofollow
In early 2005 Google announced that hyperlinks marked with
rel="nofollow"
would not influence the link target's ranking in the search engine's index.
(rel=nofollow actually tells a search engine "Don't score this
link" rather than "Don't follow this link." This differs from the meaning of
nofollow as used within a
robots meta tag, which does tell a search engine: "Do not
follow any of the hyperlinks in the body of this document.")
Using rel=nofollow is a much easier solution that makes the improvised
techniques described below largely unnecessary. Most weblog software now marks
reader-submitted links this way by default (with no option to disable it without
code modification). More sophisticated server software could omit nofollow for
links submitted by trusted users, such as those who have been registered for a
long time, appear on a whitelist, or have high karma. Some server software adds
rel=nofollow to pages that have been recently edited but omits it from stable
pages, on the theory that human editors will already have removed offending
links from stable pages.
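As a rough illustration of the trusted-user idea, the sketch below marks links
in reader-submitted HTML with rel="nofollow" unless the submitter counts as
trusted. The function name markLinks, the user fields, and the karma threshold
of 100 are all assumptions made for this example, not part of any particular
weblog package. The regular expression is deliberately naive and would mishandle
attribute values containing ">" or the text " rel="; a real application should
use an HTML parser.

```javascript
// Hypothetical helper: add rel="nofollow" to every link in submitted HTML,
// except when the submitting user is trusted (assumed rule: whitelisted,
// or karma of at least 100).
function markLinks(html, user) {
  const trusted = user.whitelisted || user.karma >= 100;
  if (trusted) {
    return html;
  }
  // Naive pass over opening <a ...> tags.
  return html.replace(/<a\b([^>]*)>/gi, (tag, attrs) => {
    // Drop any rel attribute the submitter supplied, then add our own.
    const cleaned = attrs.replace(/\s+rel\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi, "");
    return `<a${cleaned} rel="nofollow">`;
  });
}

const comment = '<a href="http://example.com">nice website</a>';
console.log(markLinks(comment, { whitelisted: false, karma: 0 }));
console.log(markLinks(comment, { whitelisted: true, karma: 0 }));
```

Removing a submitter-supplied rel attribute before adding the new one is a
design choice in this sketch, so that a spammer cannot keep a different rel
value simply by including one.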
Some weblog authors object to the use of rel=nofollow, arguing,
for example[1], that:
- Link spammers will continue to spam everyone in order to reach the sites
that do not use rel=nofollow.
- Link spammers will continue to place links for surfers to click, even if
those links are ignored by search engines.
- Google is advocating the use of rel=nofollow in order to reduce the
effect of heavy inter-blog linking on page ranking.
On Wikipedia in particular, it was decided after discussion not to use
nofollow and to use a spam blacklist instead. In this way, Wikipedia
contributes to the scores of the pages it links to, and expects editors to link
to relevant pages.
Turing tests
Various methods that force spammers to post by hand have been attempted. A
variety of CAPTCHA gateways have been implemented in an effort to prevent bots
from submitting entries. The drawbacks are the annoyance they pose for regular
users, the lack of any alternative for visually impaired users, and the ability
of some advanced bots to fool simple CAPTCHAs most of the time.
Server-side redirects
Instead of displaying a direct hyperlink submitted by a visitor, a web
application could display a link to a script on its own website that redirects
to the correct URL.
This will not prevent all spam, since spammers do not always check for link
redirection, but it has proven effective. Redirecting links is intended to
prevent Google from factoring the link into its PageRank algorithm for the
target site, which would make the spam ineffective. An added benefit is that
the redirection script can count how many people visit external URLs, although
it will increase the load on the site.
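The redirect-script idea can be sketched as two small functions: one that
rewrites a visitor-submitted URL into a link to a local script, and one that
the script would run to resolve the request and count the click. The path
/redirect, the query parameter name url, and both function names are
hypothetical. Restricting targets to http and https is an extra precaution
added in this sketch, not something the text above describes.

```javascript
// In-memory click counter, keyed by target URL (illustration only).
const clickCounts = new Map();

// Build the link that is displayed instead of the visitor's direct hyperlink.
function toRedirectLink(targetUrl) {
  return "/redirect?url=" + encodeURIComponent(targetUrl);
}

// What the hypothetical /redirect script would do with its query string.
function handleRedirect(queryString) {
  const target = new URLSearchParams(queryString).get("url");
  let parsed;
  try {
    parsed = new URL(target);
  } catch (err) {
    return { status: 400 }; // missing or malformed target
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { status: 400 }; // refuse javascript:, data:, and similar schemes
  }
  clickCounts.set(target, (clickCounts.get(target) || 0) + 1);
  return { status: 302, location: target };
}

const link = toRedirectLink("http://example.com/a?b=1");
console.log(link);
console.log(handleRedirect(link.split("?")[1]));
```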
This kind of redirection can also be done via the
.htaccess file in Apache, thus saving the load of a script.
Another way of preventing PageRank leakage without using client-side
JavaScript or an .htaccess file is to use a public redirection service such as
TinyURL or My-Own.Net. For example,
<a href="http://my-own.net/alias_of_target" rel="nofollow" >Link</a>
where 'alias_of_target' is the alias of the target address.
Client-side redirects
Another option is for the script to be client-side
JavaScript.
For example,
<a href="javascript:window.location.href='http://www.wiki.org'">Link</a>
would work as a link but would not be picked up by Google. Moreover, the
JavaScript could be made more complicated, with the URL encoded, to ensure that
the link would never be picked up. For example,
<a href="javascript:redirectFunction('hfksksgjlsll')">Link</a>
where 'hfksksgjlsll' is an encoded URL that is decoded by the JavaScript
function redirectFunction, which would presumably be defined in the HEAD
element of the page. A downside of this is that visitors who have disabled
JavaScript in their browser would be unable to follow the links.
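The text does not say how the URL would be encoded, so the following is only
one possible shape for redirectFunction. It assumes, purely for illustration,
that the page author writes each character of an ASCII URL as a two-digit
hexadecimal code; encodeUrl and decodeUrl are hypothetical helper names, and
the sample string 'hfksksgjlsll' above is not valid under this scheme.

```javascript
// Assumed encoding: every character of an ASCII URL becomes two hex digits.
function encodeUrl(url) {
  return Array.from(url, (ch) => ch.charCodeAt(0).toString(16).padStart(2, "0")).join("");
}

// Reverse of encodeUrl: read the string two hex digits at a time.
function decodeUrl(encoded) {
  let url = "";
  for (let i = 0; i < encoded.length; i += 2) {
    url += String.fromCharCode(parseInt(encoded.slice(i, i + 2), 16));
  }
  return url;
}

// Called from the link's href; only meaningful inside a browser page.
function redirectFunction(encoded) {
  window.location.href = decodeUrl(encoded);
}

const encoded = encodeUrl("http://www.wiki.org");
console.log(encoded);
console.log(decodeUrl(encoded));
```

The page author would run encodeUrl once when generating the page and place
the result in the link, as in href="javascript:redirectFunction('...')".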
Distributed approaches
Distributed filtering is a relatively new approach to link spam. One of the
shortcomings of link spam filters is that most sites receive only one link from
each domain that is running a spam campaign. If the spammer varies IP
addresses, there is little to no distinguishable pattern left on the vandalized
site. The pattern is, however, left across the thousands of sites that were hit
in quick succession with the same links.
A distributed approach, like the free LinkSleeve, uses XML-RPC to communicate
between the various server applications (such as blogs, guestbooks, forums, and
wikis) and the filter server, in this case LinkSleeve. URLs are extracted from
the posted data and each URL is checked against URLs recently submitted across
the web. If a threshold is exceeded, a "reject" response is returned, and the
comment, message, or posting is discarded. Otherwise, an "accept" message is
sent.
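The threshold check on the filter server might look roughly like the sketch
below. This is not LinkSleeve's actual code or API: the class name
LinkFrequencyFilter, the in-memory store, the URL pattern, and the threshold and
time-window values are all assumptions chosen to illustrate the idea, and the
XML-RPC transport is left out.

```javascript
// Rough pattern for pulling http/https URLs out of posted text.
const URL_PATTERN = /https?:\/\/[^\s"'<>]+/gi;

// Hypothetical central filter shared by many blogs, forums, and wikis.
class LinkFrequencyFilter {
  constructor(threshold, windowMs) {
    this.threshold = threshold; // max sightings of one URL inside the window
    this.windowMs = windowMs;   // how long a sighting counts as "recent"
    this.seen = new Map();      // URL -> timestamps of recent sightings
  }

  // Returns "reject" if any URL in the post has been seen too often recently.
  check(postText, now = Date.now()) {
    const urls = new Set(postText.match(URL_PATTERN) || []);
    let verdict = "accept";
    for (const url of urls) {
      const recent = (this.seen.get(url) || []).filter((t) => now - t < this.windowMs);
      recent.push(now);
      this.seen.set(url, recent);
      if (recent.length > this.threshold) {
        verdict = "reject";
      }
    }
    return verdict;
  }
}

const filter = new LinkFrequencyFilter(2, 60000);
for (let i = 0; i < 3; i++) {
  console.log(filter.check("great post! http://spam.example/pills", i * 1000));
}
```

Because every participating site reports to the same filter, the third
submission of the same link is rejected even if each site saw it only once.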
A more robust distributed approach is Akismet, which works similarly to
LinkSleeve but uses API keys to assign trust to nodes and has wider
distribution as a result of being bundled with the 2.0 release of WordPress.
Its developers claim that over 140,000 blogs contribute to the system. Akismet
libraries have been implemented for Java, Python, Ruby, and PHP, but adoption
may be hindered by the requirement of an API key and by its commercial use
restrictions.
Application-specific anti-spam methods
Particularly popular software products such as Movable Type and MediaWiki have
developed their own custom anti-spam measures as spammers focus more attention
on those platforms. Whitelists and blacklists that control which IPs may post,
or that block content matching certain filters, are common defenses. More
advanced access control lists require various forms of validation before users
can contribute anything resembling link spam.
The goal in every case is to allow good users to continue to add links to
their comments, as that is considered by some to be a valuable aspect of any
comments section.
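A minimal sketch of the whitelist and blacklist idea follows. The function
name screenPost, the example IP addresses (taken from the documentation-only
ranges), and the content patterns are all made up for illustration; neither
Movable Type nor MediaWiki is claimed to work this way.

```javascript
// Hypothetical lists; a real installation would load these from configuration.
const trustedIps = new Set(["198.51.100.20"]);
const blockedIps = new Set(["203.0.113.7"]);
const blockedPatterns = [/casino/i, /cheap\s+pills/i];

// Decide whether a post may be published.
function screenPost(ip, text) {
  if (trustedIps.has(ip)) {
    return "allow"; // whitelisted posters skip the content filters
  }
  if (blockedIps.has(ip)) {
    return "block";
  }
  if (blockedPatterns.some((pattern) => pattern.test(text))) {
    return "block";
  }
  return "allow"; // ordinary users may still post links
}

console.log(screenPost("192.0.2.1", "See http://example.com for details"));
console.log(screenPost("192.0.2.1", "Cheap  pills at http://spam.example"));
```

Letting whitelisted addresses bypass the content filters is one possible
ordering; a stricter site could apply the filters to everyone.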
RSS feed monitoring
Some wikis provide an RSS feed of recent changes or comments. If you add that
feed to your news reader and set up a smart search for common spam terms
(usually Viagra and other drug names), you can quickly identify and remove the
offending spam.
External links
- Anti-spam Features of MediaWiki
- Six Apart Comment Spam Guide, a fairly broad overview from Movable Type's
authors.
- The (Evil) Genius of Comment Spammers, an article on link spam from Wired
magazine.
- Gilad Mishne, David Carmel and Ronny Lempel: Blocking Blog Spam with
Language Model Disagreement, PDF. From the First International Workshop on
Adversarial Information Retrieval (AIRWeb'05), Chiba, Japan, 2005.
- Spam Huntress, the Norwegian Spam Huntress, Ann Elisabeth
- LinkSleeve XML-RPC, a free tool to integrate with blogs, forums, wikis, and
guestbooks to fight link spam.
- Tim Longhurst article on Coca-Cola, which explores Coca-Cola's link spam
campaign to promote Coke Zero, with links to affected bulletin boards and
communities.
- Protect Web Form, a free service providing verification images. Anti-spam
project.
- www.cerospam.com.ar, a free form protection/validation service.
- Spam Blocker Crawler, a free public service for scanning guestbooks and
sending abuse reports to Google and the spammer's hosting provider.