Data valuation
Data valuation is a discipline in the fields of accounting and information economics. It is concerned with methods to calculate the value of data collected, stored, analyzed and traded by organizations. This valuation depends on the type, reliability and field of data.
History
In the 21st century, exponential increases in computing power and data storage capabilities have led to a proliferation of big data, machine learning and other data analysis techniques. Businesses increasingly adopt these techniques and technologies to pursue data-driven strategies and create new business models. Traditional accounting techniques used to value organizations were developed before high-volume data capture and analysis became widespread, and they focus on tangible assets. As a result, accounting calculations often ignore data and leave its value off organizations' balance sheets.
Notably, in the wake of the 9/11 attacks on the World Trade Center in 2001, a number of businesses lost significant amounts of data. They filed claims with their insurance companies for the value of the information that was destroyed, but the insurance companies denied the claims, arguing that information did not count as property and therefore was not covered by their policies.
A number of organizations and individuals began noticing this gap and publishing on the topic of data valuation. Doug Laney, vice president and analyst at Gartner, researched companies valued by Wall Street and found that those that had become information-centric, treating data as an asset, often had market-to-book values two to three times higher than the norm. On the topic, Laney commented: "Even as we are in the midst of the Information Age, information simply is not valued by those in the valuation business. However, we believe that, over the next several years, those in the business of valuing corporate investments, including equity analysts, will be compelled to consider a company's wealth of information in properly valuing the company itself." In the latter part of the 2010s, the list of the most valuable firms in the world was dominated by data firms: Microsoft, Alphabet, Apple, Amazon and Facebook.
Characteristics of data as an asset
A 2020 study by the Nuffield Institute at Cambridge University, UK divided the characteristics of data into two categories: economic characteristics and informational characteristics.
Economic characteristics
- Data is non-rival. Multiple people can use data without it being depleted or used up.
- Data varies in whether it is excludable. Data can be a public good or a club good, depending on what type of information it contains. Some data can reasonably be shared with anyone who desires to access it. Other data is limited to particular users and contexts.
- Data involves externalities. In economics, an externality is a cost or benefit that affects a third party who did not choose to incur it. Data can create positive externalities: when new data is produced, it combines with existing data to produce new insights, increasing the value of both. It can also create negative externalities when it is leaked, breached or otherwise misused.
- Data may have increasing or decreasing returns. Sometimes collecting more data increases insight or value, though at other times it can simply lead to hoarding.
- Data has a large option value. Due to the perpetual development of new technologies and datasets, it is hard to predict how the value of a particular data asset might change. Organizations may store data, anticipating possible future value, rather than actual present value.
- Data collection often has high up-front cost and low marginal cost. Collecting data often requires significant investment in technologies and digitization. Once these are established, further data collection may cost much less. High entry barriers may prevent smaller organizations from collecting data.
- Data use requires complementary investment. Organizations may need to invest in software, hardware and personnel to realize value from data.
Informational characteristics
- Subject matter. Encompasses what the data describes and what it can help with.
- Generality. Some data is useful across a range of analyses; other data is useful only in particular cases.
- Temporal coverage. Data can be forecast, real-time, historic or back-cast. These are used differently, for planning, operational and historical analyses.
- Quality. Higher quality data is generally more valuable as it reduces uncertainty and risk, though the required quality varies from use to use. Greater automation in data collection tends to lead to higher quality.
- Sensitivity. Sensitive data is data that could be used in damaging ways. Costs and risks are incurred keeping sensitive data safe.
- Interoperability and linkability. Interoperability relates to the use of data standards when representing data, which means that data relating to the same things can be easily brought together. Linkability relates to the use of standard identifiers within the data set that enables a record in one data set to be connected to additional data in another data set.
Data value drivers
- Exclusivity. Having exclusive access to a data asset makes it more valuable than if it is accessible to multiple license holders.
- Timeliness. For much data, the more closely it reflects the present, the more reliable the conclusions that can be drawn from it. Recently captured data is more valuable than historic data.
- Accuracy. The more closely data describes the truth, the more valuable it is.
- Completeness. The more variables about a particular event or object described by data, the more valuable the data is.
- Consistency. The more a data asset is consistent with other similar data assets, the more valuable it is.
- Usage Restrictions. Data collected without necessary approvals for usage is less valuable as it cannot be used legally.
- Interoperability/Accessibility. The more easily and effectively data can be combined with other organizational data to produce insights, the more valuable it is.
- Liabilities and Risk. Reputational consequences and financial penalties for breaching data regulations such as GDPR can be severe. The greater the risk associated with data use, the lower its value.
Methods for valuing data
Due to the wide range of potential datasets and use cases, as well as the relative infancy of data valuation, there are no simple or universally agreed upon methods. High option value and externalities mean data value may fluctuate unpredictably, and seemingly worthless data may suddenly become extremely valuable at an unspecified future date. Nonetheless, a number of methods have been proposed for calculating or estimating data value.
Information-theoretic characterization
Information-theoretic characterization provides quantitative mechanisms for data valuation. For instance, secure data sharing requires careful protection of individual privacy or organizational intellectual property. Information-theoretic approaches and data obfuscation can be applied to sanitize data prior to its dissemination. Information-theoretic measures, such as entropy, information gain, and information cost, are useful for anomaly and outlier detection. In data-driven analytics, a common problem is quantifying whether larger data sizes and/or more complex data elements actually enhance, degrade, or alter the information content and utility of the data. The data value metric quantifies the useful information content of large and heterogeneous datasets in terms of the tradeoffs between the size, utility, value, and energy of the data. Such methods can be used to determine whether appending, expanding, or augmenting an existing dataset may improve the modeling or understanding of the underlying phenomenon.
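As a rough illustration of the entropy and information gain measures mentioned above, the following Python sketch estimates how much a candidate data field reduces uncertainty about a target variable. The function names and the toy records are hypothetical and chosen only for illustration; this is not the data value metric itself, only the standard Shannon definitions applied to empirical frequencies.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, feature):
    """Reduction in the entropy of `labels` after grouping records by `feature`."""
    total = len(labels)
    groups = {}
    for label, value in zip(labels, feature):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature group.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy records: did a customer churn, and two candidate fields.
churned = ["yes", "yes", "no", "no"]
plan = ["basic", "basic", "pro", "pro"]          # separates churners perfectly
region = ["north", "south", "north", "south"]    # unrelated to churn

gain_plan = information_gain(churned, plan)
gain_region = information_gain(churned, region)
print(gain_plan, gain_region)
```

Under these assumptions, a field with a higher information gain contributes more to explaining the target, which is one way to ask whether augmenting a dataset with that field is likely to be worthwhile.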
Infonomics valuation models
Doug Laney identifies six approaches for valuing data, dividing these into two categories: foundational models and financial models. Foundational models assign a relative, informational value to data, whereas financial models assign an absolute, economic value.
Foundational models
- Intrinsic Value of Information measures data value drivers including correctness, completeness and exclusivity of data and assigns a value accordingly.
- Business Value of Information measures how fit the data is for specific business purposes.
- Performance Value of Information measures how the usage of the data affects key business drivers and KPIs, often using a control group study.
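The control group comparison behind the Performance Value of Information can be sketched as a relative KPI uplift. The Python example below is a minimal sketch, not Laney's published formula: the function name `kpi_uplift` and the order figures are hypothetical, and it assumes the two groups are otherwise comparable.

```python
from statistics import mean


def kpi_uplift(with_data, without_data):
    """Relative change in the mean KPI of the group that used the data
    compared with the control group that did not."""
    baseline = mean(without_data)
    if baseline == 0:
        raise ValueError("control-group mean is zero; relative uplift is undefined")
    return (mean(with_data) - baseline) / baseline


# Hypothetical monthly orders per sales team (illustrative numbers only).
treated = [110, 115, 112, 113]   # teams given access to the data asset
control = [100, 98, 102, 100]    # teams working without it

uplift = kpi_uplift(treated, control)
print(f"{uplift:.1%}")
```

A real study would also need to check that the difference is statistically significant and not caused by other differences between the groups.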
Financial models
- Cost Value of Information measures the cost to produce and store the data, the cost to replace it, or the impact on cash flows if it were lost.
- Market Value of Information measures the actual or estimated value the data would be traded for in the data marketplace.
- Economic Value of Information measures the expected cash flows, returns or savings from the usage of the data.
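The Economic Value of Information can be approximated with a standard discounted cash flow calculation. The Python sketch below assumes end-of-year cash flows attributable to the data, a fixed discount rate, and a single lifecycle cost figure. All names and numbers are hypothetical, and the netting of costs is one possible convention rather than a definitive formula.

```python
def present_value(cash_flows, discount_rate):
    """Discount a sequence of end-of-year cash flows back to today."""
    return sum(
        cf / (1 + discount_rate) ** year
        for year, cf in enumerate(cash_flows, start=1)
    )


def economic_value(cash_flows, discount_rate, lifecycle_cost):
    """Present value of the cash flows attributed to the data,
    net of the cost of acquiring, storing and managing it."""
    return present_value(cash_flows, discount_rate) - lifecycle_cost


# Hypothetical figures: extra revenue attributed to the data in years 1 and 2.
flows = [110.0, 121.0]
value = economic_value(flows, discount_rate=0.10, lifecycle_cost=50.0)
print(round(value, 2))
```

In practice the hardest step is attributing cash flows to a particular data asset; the arithmetic is straightforward once that estimate exists.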