Spam mass
Spam mass is defined as "the measure of the impact of link
spamming on a page's ranking." The concept was developed by Zoltán
Gyöngyi and Hector Garcia-Molina of Stanford University in association
with Pavel Berkhin and Jan Pedersen of Yahoo!.
The paper introducing the concept expands upon their previously proposed
TrustRank methodology.
The researchers selected a good core and a bad core of Web documents, and
from these cores they estimated spam mass across a collection of
documents. Two measurements, absolute mass and relative mass, are used to
compare groups of documents. The higher the mass measurements, the
more likely the documents are to be spam.
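
The two measurements can be illustrated with a minimal sketch. It assumes the
commonly cited formulation, which this article does not spell out: a page's
absolute mass is its ordinary PageRank minus the PageRank it receives when
random jumps land only on the good core, and its relative mass is that
difference divided by its ordinary PageRank. The function names, the damping
value and the toy link graph are hypothetical and serve only as illustration.

```python
def pagerank(links, jump, damping=0.85, iterations=100):
    """Power-iteration PageRank with an arbitrary random-jump vector.

    links: dict mapping every page to the list of pages it links to.
    jump:  dict mapping every page to its random-jump weight.
    """
    scores = dict(jump)
    for _ in range(iterations):
        nxt = {page: (1 - damping) * jump[page] for page in links}
        for src, targets in links.items():
            if not targets:
                continue  # dangling page: its score is dropped in this sketch
            share = damping * scores[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        scores = nxt
    return scores


def spam_mass(links, good_core, damping=0.85):
    """Return (pagerank, absolute_mass, relative_mass) under the assumed definitions."""
    n = len(links)
    uniform = {page: 1.0 / n for page in links}
    # Assumption: random jumps restricted to the good core, same per-page weight.
    core_only = {page: (1.0 / n if page in good_core else 0.0) for page in links}
    p = pagerank(links, uniform, damping)
    p_good = pagerank(links, core_only, damping)
    absolute = {page: p[page] - p_good[page] for page in links}
    relative = {page: absolute[page] / p[page] for page in links}
    return p, absolute, relative


# Hypothetical toy graph: a small trusted cluster and a three-page link farm.
links = {
    "g1": ["g2", "honest"],
    "g2": ["g1"],
    "honest": ["g1"],
    "s1": ["target"],
    "s2": ["target"],
    "s3": ["target"],
    "target": [],
}
p, absolute, relative = spam_mass(links, good_core={"g1", "g2"})
for page in sorted(links):
    print(f"{page:7s} pagerank={p[page]:.4f} relative_mass={relative[page]:.3f}")
```

In this toy graph the page "target" receives all of its PageRank from pages
outside the good core, so its relative mass is higher than that of "honest",
which is linked from the good core.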
Thresholds
A threshold value is used to identify groups of documents as spam. If their
relative mass value exceeds the threshold, the documents are considered to be
spam. A second threshold is applied to the PageRank values of the selected
documents, so that only high-PageRank documents are labelled as spam.
The purpose of the methodology is to identify spam documents with
artificially inflated PageRank values.
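
The two-threshold rule described in this section can be sketched as follows.
This is an illustration only: the function name, the sample scores and both
threshold values are hypothetical, and treating the PageRank cut-off as
"greater than or equal to" is an assumption.

```python
def label_spam(pagerank, relative_mass, mass_threshold, pagerank_threshold):
    """Return the pages that pass both thresholds.

    A page is labelled as spam only if its PageRank is high enough
    (assumed here: >= pagerank_threshold) and its relative mass
    exceeds mass_threshold.
    """
    return {
        page
        for page, score in pagerank.items()
        if score >= pagerank_threshold and relative_mass[page] > mass_threshold
    }


# Hypothetical scores for four pages.
sample_pagerank = {"a": 0.30, "b": 0.25, "c": 0.02, "d": 0.01}
sample_relative_mass = {"a": 0.95, "b": 0.10, "c": 0.99, "d": 0.05}

flagged = label_spam(
    sample_pagerank,
    sample_relative_mass,
    mass_threshold=0.9,      # hypothetical value
    pagerank_threshold=0.1,  # hypothetical value
)
print(sorted(flagged))
```

Page "c" has a very high relative mass but a low PageRank, so it is not
flagged. The rule targets only pages whose ranking has actually been inflated.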