Long tail


In statistics and business, a long tail of some distributions of numbers is the portion of the distribution having many occurrences far from the "head" or central part of the distribution. The distribution could involve popularities, random numbers of occurrences of events with various probabilities, etc. The term is often used loosely, with an imprecise definition, but precise definitions are possible.
In statistics, the term long-tailed distribution has a narrow technical meaning, and is a subtype of heavy-tailed distribution. Intuitively, a distribution is long-tailed if, for any fixed amount, when a quantity exceeds a high level, it almost certainly exceeds it by at least that amount: large quantities are probably even larger. Note that there is no sense of the "long tail" of a distribution, but only the property of a distribution being long-tailed.
In business, the term long tail is applied to rank-size distributions or rank-frequency distributions, which often form power laws and are thus long-tailed distributions in the statistical sense. This is used to describe the retailing strategy of selling many unique items with relatively small quantities sold of each —usually in addition to selling fewer popular items in large quantities. Sometimes an intermediate category is also included, variously called the body, belly, torso, or middle. The specific cutoff of what part of a distribution is the "long tail" is often arbitrary, but in some cases may be specified objectively; see segmentation of rank-size distributions.
The long tail concept has found some ground for application, research, and experimentation. It is a term used in online business, mass media, micro-finance, user-driven innovation, knowledge management, and social network mechanisms, economic models, marketing, and IT Security threat hunting within a SOC.

History

s with long tails have been studied by statisticians since at least 1946. The term has also been used in the finance and insurance business for many years. The work of Benoît Mandelbrot in the 1950s and later has led to him being referred to as "the father of long tails".
The long tail was popularized by Chris Anderson in an October 2004 Wired magazine article, in which he mentioned Amazon.com, Apple and Yahoo! as examples of businesses applying this strategy. Anderson elaborated the concept in his book The Long Tail: Why the Future of Business Is Selling Less of More.
Anderson cites research published in 2003 by Erik Brynjolfsson, Yu Hu, and Michael D. Smith, who first used a log-linear curve on an XY graph to describe the relationship between Amazon.com sales and sales ranking. They showed that the primary value of the internet to consumers comes from releasing new sources of value by providing access to products in the long tail.

Business

The distribution and inventory costs of businesses successfully applying a long tail strategy allow them to realize significant profit out of selling small volumes of hard-to-find items to many customers instead of only selling large volumes of a reduced number of popular items. The total sales of this large number of "non-hit items" is called "the long tail".
Given enough choice, a large population of customers, and negligible stocking and distribution costs, the selection and buying pattern of the population results in the demand across products having a power law distribution or Pareto distribution.
It is important to understand why some distributions are normal vs. long tail distributions. Chris Anderson argues that while quantities such as human height or IQ follow a normal distribution, in scale-free networks with preferential attachments, power law distributions are created, i.e. because some nodes are more connected than others.

Statistical meaning

The long tail is the name for a long-known feature of some statistical distributions. In "long-tailed" distributions a high-frequency or high-amplitude population is followed by a low-frequency or low-amplitude population which gradually "tails off" asymptotically. The events at the far end of the tail have a very low probability of occurrence.
Power law distributions or functions characterize an important number of behaviors from nature and human endeavor. This fact has given rise to a keen scientific and social interest in such distributions, and the relationships that create them. The observation of such a distribution often points to specific kinds of mechanisms, and can often indicate a deep connection with other, seemingly unrelated systems. Examples of behaviors that exhibit long-tailed distribution are the occurrence of certain words in a given language, the income distribution of a business or the intensity of earthquakes.
Chris Anderson's and Clay Shirky's articles highlight special cases in which we are able to modify the underlying relationships and evaluate the impact on the frequency of events. In those cases the infrequent, low-amplitude events – the long tail, represented here by the portion of the curve to the right of the 20th percentile – can become the largest area under the line. This suggests that a variation of one mechanism or relationship can significantly shift the frequency of occurrence of certain events in the distribution. The shift has a crucial effect in probability and in the customer demographics of businesses like mass media and online sellers.
However, the long tails characterizing distributions such as the Gutenberg–Richter law or the words-occurrence Zipf's law, and those highlighted by Anderson and Shirky are of very different, if not opposite, nature: Anderson and Shirky refer to frequency-rank relations, whereas the Gutenberg–Richter law and the Zipf's law are probability distributions. Therefore, in these latter cases "tails" correspond to large-intensity events such as large earthquakes and most popular words, which dominate the distributions. By contrast, the long tails in the frequency-rank plots highlighted by Anderson and Shirky would rather correspond to short tails in the associated probability distributions, and therefore illustrate an opposite phenomenon compared to the Gutenberg–Richter and the Zipf's laws.

Chris Anderson and Clay Shirky

Use of the phrase the long tail in business as "the notion of looking at the tail itself as a new market" of consumers was first coined by Chris Anderson. The concept drew in part from a February 2003 essay by Clay Shirky, "Power Laws, Weblogs and Inequality", which noted that a relative handful of weblogs have many links going into them but "the long tail" of millions of weblogs may have only a handful of links going into them. Anderson described the effects of the long tail on current and future business models beginning with a series of speeches in early 2004 and with the publication of a Wired magazine article in October 2004. Anderson later extended it into the book The Long Tail: Why the Future of Business is Selling Less of More.
Anderson argues that products in low demand or that have a low sales volume can collectively make up a market share that rivals or exceeds the relatively few current bestsellers and blockbusters, if the store or distribution channel is large enough. Anderson cites earlier research by Erik Brynjolfsson, Yu Hu, and Michael D. Smith, that showed that a significant portion of Amazon.com's sales come from obscure books that are not available in brick-and-mortar stores. The long tail is a potential market and, as the examples illustrate, the distribution and sales channel opportunities created by the Internet often enable businesses to tap that market successfully.
In his Wired article Anderson opens with an anecdote about creating a niche market for books on Amazon. He writes about a book titled Touching the Void about a near-death mountain climbing accident that took place in the Peruvian Andes. Anderson states the book got good reviews, but didn't have much commercial success. However, ten years later a book titled Into Thin Air by Jon Krakauer was published and Touching the Void began to sell again. Anderson realized that this was due to Amazon's recommendations. This created a niche market for those who enjoy books about mountain climbing even though it is not considered a popular genre supporting the long tail theory.
An Amazon employee described the long tail as follows: "We sold more books today that didn't sell at all yesterday than we sold today of all the books that did sell yesterday."
Anderson has explained the term as a reference to the tail of a demand curve. The term has since been rederived from an XY graph that is created when charting popularity to inventory. In the graph shown above, Amazon's book sales would be represented along the vertical axis, while the book or movie ranks are along the horizontal axis. The total volume of low popularity items exceeds the volume of high popularity items.

Academic research

Effects of online access

, Yu Hu, and Michael D. Smith found that a large proportion of Amazon.com's book sales come from obscure books that were not available in brick-and-mortar stores. They then quantified the potential value of the long tail to consumers. In an article published in 2003, these authors showed that, while most of the discussion about the value of the Internet to consumers has revolved around lower prices, consumer benefit from access to increased product variety in online book stores is ten times larger than their benefit from access to lower prices online.

The longer tail over time

A subsequent study by Erik Brynjolfsson, Yu Hu, and Michael D. Smith found that the long tail has grown longer over time, with niche books accounting for a larger share of total sales. Their analyses suggested that by 2008, niche books accounted for 36.7% of Amazon's sales while the consumer surplus generated by niche books has increased at least fivefold from 2000 to 2008. In addition, their new methodology finds that, while the widely used power laws are a good first approximation for the rank-sales relationship, the slope may not be constant for all book ranks, with the slope becoming progressively steeper for more obscure books.
In support of their findings, Wenqi Zhou and Wenjing Duan not only find a longer tail but also a fatter tail by an in-depth analysis on consumer software downloading pattern in their paper "Online user reviews, product variety, and the long tail". The demand for all products decreases, but the decrease for the hits is more pronounced, indicating the demand shifting from the hits to the niches over time. In addition, they also observe a superstar effect in the presence of the long tail. A small number of very popular products still dominates the demand.