Spamdexing


Spamdexing is the deliberate manipulation of search engine indexes. It involves a number of methods, such as link building and repeating related or unrelated phrases, to manipulate the relevance or prominence of resources indexed in a manner inconsistent with the purpose of the indexing system.
Spamdexing could be considered a part of search engine optimization (SEO), although many legitimate SEO methods improve the quality and presentation of website content and serve content that is useful to users.

Overview

Search engines use a variety of algorithms to determine relevancy ranking. Some of these include determining whether the search term appears in the body text or URL of a web page. Many search engines check for instances of spamdexing and will remove suspect pages from their indexes. Also, search-engine operators can quickly block the results listing from entire websites that use spamdexing, perhaps in response to user complaints of false matches. The rise of spamdexing in the mid-1990s made the leading search engines of the time less useful. Using unethical methods to make websites rank higher in search engine results than they otherwise would is commonly referred to in the SEO industry as "black-hat" SEO. These methods focus on breaking search-engine promotion rules and guidelines, and perpetrators run the risk of their websites being severely penalized by the Google Panda and Google Penguin search-results ranking algorithms.
Common spamdexing techniques can be classified into two broad classes: content spam and link spam.

History

The earliest known reference to the term spamdexing is by Eric Convey in his article "Porn sneaks way back on Web", The Boston Herald, May 22, 1996, where he said:
The problem arises when site operators load their Web pages with hundreds of extraneous terms so search engines will list them among legitimate addresses.
The process is called "Spamdexing," a combination of spamming—the Internet term for sending users unsolicited information—and "indexing."

Keyword stuffing was used in the past to obtain top search engine rankings and visibility for particular phrases. The method is outdated and adds no value to rankings today; in particular, Google no longer gives good rankings to pages that employ it.
Hiding text from the visitor is done in many different ways. Common techniques include text colored to blend with the background, CSS z-index positioning to place text underneath an image (and therefore out of the visitor's view), and CSS absolute positioning to place text far from the page center. By 2005, many invisible-text techniques were easily detected by major search engines.
"Noscript" tags are another way to place hidden content within a page. While they are a valid optimization method for displaying an alternative representation of scripted content, they may be abused, since search engines may index content that is invisible to most visitors.
In the past, keyword stuffing was considered to be either a white-hat or a black-hat tactic, depending on the context of the technique and the opinion of the person judging it. While a great deal of keyword stuffing was employed to aid in spamdexing, which is of little benefit to the user, keyword stuffing in certain circumstances was not intended to skew results in a deceptive manner. Whether the term carries a pejorative or neutral connotation depends on whether the practice is used to pollute the results with pages of little relevance, or to direct traffic to a relevant page that would otherwise have been de-emphasized due to the search engine's inability to interpret related ideas. This is no longer the case: search engines now employ themed, related-keyword techniques to interpret the intent of the content on a page.

Content spam

These techniques involve altering the logical view that a search engine has of a page's contents. They all target variants of the vector space model for information retrieval on text collections.
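
As a minimal sketch of why such manipulation can work, the toy scorer below ranks a document against a query by cosine similarity over raw term-frequency vectors, a bare-bones variant of the vector space model. The sample phrases and the absence of IDF weighting or any spam defense are illustrative assumptions, not any engine's actual formula.

```python
from collections import Counter
from math import sqrt

def cosine_similarity(doc: str, query: str) -> float:
    """Score a document against a query using raw term-frequency vectors."""
    d, q = Counter(doc.lower().split()), Counter(query.lower().split())
    dot = sum(d[t] * q[t] for t in q)
    norms = sqrt(sum(v * v for v in d.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norms if norms else 0.0

honest = "a short page about cheap cars and local dealers"
stuffed = honest + " cheap cars" * 20   # the same phrase repeated 20 times
query = "cheap cars"

print(round(cosine_similarity(honest, query), 2))   # modest score (~0.47)
print(round(cosine_similarity(stuffed, query), 2))  # inflated score (~1.0)
```

Under this naive model, repetition drags the document vector toward the query vector, which is exactly the weakness keyword stuffing exploits.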

Keyword stuffing

Keyword stuffing is a search engine optimization technique in which keywords are loaded into a web page's meta tags, visible content, or backlink anchor text in an attempt to gain an unfair rank advantage in search engines. Keyword stuffing may lead to a website being temporarily or permanently banned or penalized by major search engines. The repetition of words in meta tags may explain why many search engines no longer use these tags. Search engines now favor content that is unique, comprehensive, relevant, and helpful, which makes keyword stuffing largely useless, yet it is still practiced by many webmasters.
Many major search engines have implemented algorithms that recognize keyword stuffing and reduce or eliminate any unfair search advantage the tactic may have been intended to gain, and they will often also penalize, demote, or remove from their indexes websites that implement it.
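
For intuition, one naive recognition signal is keyword density: the share of a page's words taken up by a single term. The sketch below flags terms above a threshold; the 5% cutoff and the crude tokenization are arbitrary illustrative choices, and production ranking systems rely on far richer signals.

```python
from collections import Counter

def keyword_density_flags(text: str, threshold: float = 0.05) -> dict:
    """Flag terms whose share of all words on the page exceeds a threshold."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    total = len(words)
    return {term: round(n / total, 3)
            for term, n in Counter(words).items() if n / total > threshold}

page = "buy cheap widgets " * 30 + "we also sell a few other products"
print(keyword_density_flags(page))
# {'buy': 0.309, 'cheap': 0.309, 'widgets': 0.309}
```
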
Changes and algorithms specifically intended to penalize or ban sites using keyword stuffing include the Google Florida update, Google Panda, Google Hummingbird, and Bing's September 2014 update.
Headlines in online news sites are increasingly packed with just the search-friendly keywords that identify the story. Traditional reporters and editors frown on the practice, but it is effective in optimizing news stories for search.

Hidden or invisible text

Unrelated hidden text is disguised by making it the same color as the background, using a tiny font size, or hiding it within HTML code such as "noframes" sections, alt attributes, zero-sized DIVs, and "noscript" sections. People manually screening red-flagged websites for a search-engine company might temporarily or permanently block an entire website for having invisible text on some of its pages. However, hidden text is not always spamdexing: it can also be used to enhance accessibility.
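
A rough sketch of how invisible text might be flagged automatically is shown below: it scans inline style attributes for a few patterns that commonly hide text. The pattern list is an illustrative assumption; real crawlers also resolve external stylesheets and can render pages to check what a visitor actually sees.

```python
from html.parser import HTMLParser

# Illustrative (and deliberately incomplete) inline-style patterns.
# "color:#ffffff" only implies hiding if the background is white.
SUSPECT_STYLES = ("display:none", "visibility:hidden", "font-size:0",
                  "text-indent:-9999", "color:#ffffff")

class HiddenTextFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.suspects = []

    def handle_starttag(self, tag, attrs):
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        if any(p in style for p in SUSPECT_STYLES):
            self.suspects.append(tag)

finder = HiddenTextFinder()
finder.feed('<div style="display: none">free prizes free prizes</div>')
print(finder.suspects)  # ['div']
```
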

Meta-tag stuffing

This involves repeating keywords in the meta tags and using meta keywords that are unrelated to the site's content. This tactic has been ineffective; in September 2009, Google declared that it does not use the keywords meta tag in its web search ranking.

Doorway pages

"Gateway" or doorway pages are low-quality web pages created with very little content, which are instead stuffed with very similar keywords and phrases. They are designed to rank highly within the search results, but serve no purpose to visitors looking for information. A doorway page will generally have "click here to enter" on the page; autoforwarding can also be used for this purpose. In 2006, Google ousted vehicle manufacturer BMW for using "doorway pages" to the company's German site, BMW.de. Google announced that they will inflict a penalty on doorway tactics.

Scraper sites

Scraper sites are created using various programs designed to "scrape" search-engine results pages or other sources of content and create "content" for a website. The specific presentation of content on these sites is unique, but it is merely an amalgamation of content taken from other sources, often without permission. Such websites are generally full of advertising, or they redirect the user to other sites. It is even feasible for scraper sites to outrank the original websites for their own information and organization names.
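
One standard way to catch such copying is w-shingling: compare the sets of overlapping word windows in two documents and treat a high Jaccard overlap as likely duplication. The sketch below illustrates the idea; the window size and the near-duplicate interpretation are assumptions, not a description of any particular engine's pipeline.

```python
def shingles(text: str, w: int = 4) -> set:
    """Every contiguous w-word window ("shingle") in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a: str, b: str) -> float:
    """Shingle-set overlap; values near 1.0 suggest copied content."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

original = "city council approves new budget for road repairs next year"
scraped = "breaking news city council approves new budget for road repairs next year"
print(round(jaccard(original, scraped), 2))  # ~0.78: high overlap flags duplication
```
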

Article spinning

Article spinning involves rewriting existing articles, as opposed to merely scraping content from other sites, to avoid penalties imposed by search engines for duplicate content. This process is undertaken by hired writers or automated using a thesaurus database or an artificial neural network.
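
A minimal sketch of thesaurus-based spinning, assuming a toy synonym table in place of the large databases such tools actually use:

```python
import random

# A toy synonym table; real spinning tools draw on large thesaurus databases.
THESAURUS = {
    "buy": ["purchase", "acquire"],
    "cheap": ["inexpensive", "affordable"],
    "great": ["excellent", "fantastic"],
}

def spin(text: str, seed: int = 0) -> str:
    """Replace each known word with a randomly chosen synonym."""
    rng = random.Random(seed)
    return " ".join(rng.choice(THESAURUS.get(w, [w])) for w in text.lower().split())

print(spin("buy great products cheap"))
# e.g. "purchase fantastic products affordable"
```

Word-by-word substitution like this is also what makes spun text easy to spot: it ignores grammar and collocation, producing the stilted phrasing search engines learn to demote.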

Machine translation

As with article spinning, some sites use machine translation to render their content in several languages with no human editing, resulting in unintelligible texts that nonetheless continue to be indexed by search engines, thereby attracting traffic.

Link spam

Link spam is defined as links between pages that are present for reasons other than merit. Link spam takes advantage of link-based ranking algorithms, which give a website a higher ranking the more highly ranked websites link to it. These techniques also aim at influencing other link-based ranking techniques such as the HITS algorithm.
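
To see why inbound links confer rank, the sketch below runs a simplified PageRank power iteration over a small adjacency list. The damping factor of 0.85 follows the published algorithm, but the assumption that every page appears as a key and has at least one outbound link (no dangling-node handling) is a simplification.

```python
def pagerank(links: dict, d: float = 0.85, iters: int = 50) -> dict:
    """Simplified PageRank by power iteration over an adjacency list.
    Assumes every page is a key and has at least one outbound link."""
    n = len(links)
    rank = {page: 1.0 / n for page in links}
    for _ in range(iters):
        new = {page: (1.0 - d) / n for page in links}
        for page, outs in links.items():
            share = d * rank[page] / len(outs)  # rank flows along each outlink
            for target in outs:
                new[target] += share
        rank = new
    return rank
```
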

Link farms

Link farms are tightly knit networks of websites that link to one another for the sole purpose of exploiting search engine ranking algorithms. These are also known facetiously as mutual admiration societies. The use of link farms greatly declined after the launch of Google's first Panda update in February 2011, which introduced significant improvements in its spam-detection algorithm.
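
Reusing the pagerank() sketch above, a hypothetical three-page farm that links in a ring and also points at a target page illustrates the effect a farm is meant to have:

```python
# A three-page farm linking in a ring, with every farm page also pointing
# at "target"; "honest" receives only one inbound link. Hypothetical pages.
farm = {f"farm{i}": [f"farm{(i + 1) % 3}", "target"] for i in range(3)}
web = {**farm, "target": ["honest"], "honest": ["target"]}

ranks = pagerank(web)
print(round(ranks["target"], 3), round(ranks["honest"], 3))
# target ends up well above honest purely from farm links
```
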

Private blog networks

Private blog networks (PBNs) are groups of authoritative websites used as a source of contextual links that point to the owner's main website to achieve a higher search engine ranking. Owners of PBN websites use expired domains or auction domains that have backlinks from high-authority websites. Since 2014, Google has targeted and penalized PBN users on several occasions with massive deindexing campaigns.

Hidden links

Putting hyperlinks where visitors will not see them is used to increase link popularity. Highlighted link text can help a webpage rank higher for queries matching that phrase.

Sybil attack

A Sybil attack is the forging of multiple identities for malicious intent, named after the famous dissociative identity disorder patient and the book about her that shares her name, "Sybil". A spammer may create multiple websites at different domain names that all link to each other, such as fake blogs.

Spam blogs

Spam blogs are blogs created solely for commercial promotion and the passage of link authority to target sites. Often these "splogs" are designed in a misleading manner that gives the effect of a legitimate website but, upon close inspection, are frequently written using spinning software or are very poorly written, with barely readable content. They are similar in nature to link farms.