Search engine privacy


Search engine privacy is a subset of internet privacy that deals with user data being collected by search engines. Both types of privacy fall under the umbrella of information privacy. Privacy concerns regarding search engines can take many forms, such as the ability for search engines to log individual search queries, browsing history, IP addresses, and cookies of users, and conducting user profiling in general. The collection of personally identifiable information of users by search engines is referred to as tracking.
This is controversial because search engines often claim to collect a user's data in order to better tailor results to that specific user and to provide the user with a better searching experience. However, search engines can also abuse and compromise its users' privacy by selling their data to advertisers for profit. In the absence of regulations, users must decide what is more important to their search engine experience: relevance and speed of results or their privacy, and choose a search engine accordingly.
The legal framework in the United States for protecting user privacy is not very solid. The most popular search engines collect personal information, but other search engines that are focused on privacy have cropped up recently. There have been several well publicized breaches of search engine user privacy that occurred with companies like AOL and Yahoo. For individuals interested in preserving their privacy, there are options available to them, such as using software like Tor which makes the user's location and personal information anonymous or using a privacy focused search engine.

Privacy policies

Search engines generally publish privacy policies to inform users about what data of theirs may be collected and what purposes it may be used for. While these policies may be an attempt at transparency by search engines, many people never read them and are therefore unaware of how much of their private information, like passwords and saved files, are collected from cookies and may be logged and kept by the search engine. This ties in with the phenomenon of notice and consent, which is how many privacy policies are structured.
Notice and consent policies essentially consist of a site showing the user a privacy policy and having them click to agree. This is intended to let the user freely decide whether or not to go ahead and use the website. This decision, however, may not actually be made so freely because the costs of opting out can be very high. Another big issue with putting the privacy policy in front of users and having them accept quickly is that they are often very hard to understand, even in the unlikely case that a user decides to read them. Privacy minded search engines, such as DuckDuckGo, state in their privacy policies that they collect much less data than search engines such as Google or Yahoo, and may not collect any. As of 2008, search engines were not in the business of selling user data to third parties, though they do note in their privacy policies that they comply with government subpoenas.

Google and Yahoo

Google, founded in 1998, is the most widely used search engine, receiving billions and billions of search queries every month. Google logs all search terms in a database along with the date and time of search, browser and operating system, IP address of user, the Google cookie, and the URL that shows the search engine and search query. The privacy policy of Google states that they pass user data on to various affiliates, subsidiaries, and "trusted" business partners.
Yahoo, founded in 1994, also collects user data. It is a well-known fact that users do not read privacy policies, even for services that they use daily, such as Yahoo! Mail and Gmail. This persistent failure of consumers to read these privacy policies can be disadvantageous to them because while they may not pick up on differences in the language of privacy policies, judges in court cases certainly do. This means that search engine and email companies like Google and Yahoo are technically able to keep up the practice of targeting advertisements based on email content since they declare that they do so in their privacy policies. A study was done to see how much consumers cared about privacy policies of Google, specifically Gmail, and their detail, and it determined that users often thought that Google's practices were somewhat intrusive but that users would not often be willing to counteract this by paying a premium for their privacy.

DuckDuckGo

DuckDuckGo, founded in 2008, claims to be privacy focused. DuckDuckGo does not collect or share any personal information of users, such as IP addresses or cookies, which other search engines usually do log and keep for some time. It also does not have spam, and protects user privacy further by anonymizing search queries from the website the user chooses and using encryption. Similarly privacy oriented search engines include Startpage, Ecosia, Qwant, MetaGer and Disconnect. Mojeek and Brave Search are privacy-focused search engines that build their own indexes.

Types of data collected by search engines

Most search engines can, and do, collect personal information about their users according to their own privacy policies. This user data could be anything from location information to cookies, IP addresses, search query histories, click-through history, and online fingerprints. This data is often stored in large databases, and users may be assigned numbers in an attempt to provide them with anonymity.
Data can be stored for an extended period of time. For example, the data collected by Google on its users is retained for up to 9 months. Some studies state that this number is actually 18 months. This data is used for various reasons such as optimizing and personalizing search results for users, targeting advertising, and trying to protect users from scams and phishing attacks. Such data can be collected even when a user is not logged in to their account or when using a different IP address by using cookies.

Uses

User profiling and personalization

What search engines often do once they have collected information about a user's habits is to create a profile of them, which helps the search engine decide which links to show for different search queries submitted by that user or which ads to target them with. An interesting development in this field is the invention of automated learning, also known as machine learning. Using this, search engines can refine their profiling models to more accurately predict what any given user may want to click on by doing A/B testing of results offered to users and measuring the reactions of users.
Companies like Google, Netflix, YouTube, and Amazon have all started personalizing results more and more. One notable example is how Google Scholar takes into account the publication history of a user in order to produce results it deems relevant. Personalization also occurs when Amazon recommends books or when IMDb suggests movies by using previously collected information about a user to predict their tastes. For personalization to occur, a user need not even be logged into their account.

Targeted advertising

The internet advertising company DoubleClick, which helps advertisers target users for specific ads, was bought by Google in 2008 and was a subsidiary until June 2018, when Google rebranded and merged DoubleClick into its Google Marketing Platform. DoubleClick worked by depositing cookies on user's computers that would track sites they visited with DoubleClick ads on them. There was a privacy concern when Google was in the process of acquiring DoubleClick that the acquisition would let Google create even more comprehensive profiles of its users since they would be collecting data about search queries and additionally tracking websites visited. This could lead to users being shown ads that are increasingly effective with the use of behavioral targeting. With more effective ads comes the possibility of more purchases from consumers that they may not have made otherwise. In 1994, a conflict between selling ads and relevance of results on search engines began. This was sparked by the development of the cost-per-click model, which challenged the methods of the already-created cost-per-mille model. The cost-per-click method was directly related to what users searched, whereas the cost-per-mille method was directly influenced by how much a company could pay for an ad, no matter how many times people interacted with it.

Improving search quality

Besides ad targeting and personalization, Google also uses data collected on users to improve the quality of searches. Search result click histories and query logs are crucial in helping search engines optimize search results for individual users. Search logs also help search engines in the development of the algorithms they use to return results, such as Google's well known PageRank. An example of this is how Google uses databases of information to refine Google Spell Checker.

Privacy organizations

There are many who believe that user profiling is a severe invasion of user privacy, and there are organizations such as the Electronic Privacy Information Center and Privacy International that are focused on advocating for user privacy rights. In fact, EPIC filed a complaint in 2007 with the Federal Trade Commission claiming that Google should not be able to acquire DoubleClick on the grounds that it would compromise user privacy. The Open Search Foundation specifically targets search engine privacy by investigating ways of making search a public, collaborative good where people can search freely without their personal data being collected and evaluated.

Users' perception of privacy

Experiments have been done to examine consumer behavior when given information on the privacy of retailers by integrating privacy ratings with search engines. Researchers used a search engine for the treatment group called Privacy Finder, which scans websites and automatically generates an icon to show the level of privacy the site will give the consumer as it compares to the privacy policies that consumer has specified that they prefer. The results of the experiment were that subjects in the treatment group, those who were using a search engine that indicated privacy levels of websites, purchased products from websites that gave them higher levels of privacy, whereas the participants in the control groups opted for the products that were simply the cheapest. The study participants also were given financial incentive because they would get to keep leftover money from purchases. This study suggests that since participants had to use their own credit cards, they had a significant aversion to purchasing products from sites that did not offer the level of privacy they wanted, indicating that consumers value their privacy monetarily.