Data and information visualization
Data and information visualization is the practice of designing and creating graphic or visual representations of quantitative and qualitative data and information with the help of static, dynamic or interactive visual items. These visualizations are intended to help a target audience visually explore and discover, quickly understand, interpret and gain important insights into otherwise difficult-to-identify structures, relationships, correlations, local and global patterns, trends, variations, constancy, clusters, outliers and unusual groupings within data. When intended for the public to convey a concise version of information in an engaging manner, it is typically called infographics.
Data visualization is concerned with presenting sets of primarily quantitative raw data in a schematic form, using imagery. The visual formats used in data visualization includes charts and graphs, geospatial maps, figures, correlation matrices, percentage gauges, etc..
Information visualization deals with multiple, large-scale and complicated datasets which contain quantitative data, as well as qualitative, and primarily abstract information, and its goal is to add value to raw data, improve the viewers' comprehension, reinforce their cognition and help derive insights and make decisions as they navigate and interact with the graphical display. Visual tools used include maps for location based data; hierarchical organisations of data; displays that prioritise relationships such as Sankey diagrams; flowcharts, timelines.
In addition, Narrative visualization is a method that uses visual elements such as charts, graphs, maps and the like to convey data or information. This approach blends data analysis, storytelling, and visualization to present information through structured narrative flows- using visuals not just to present statistics, but to tell a story through data. Its goal is to help people identify trends, patterns, and relationships in data through a compelling story-driven visual experience.
Emerging technologies like virtual, augmented and mixed reality have the potential to make information visualization more immersive, intuitive, interactive and easily manipulable and thus enhance the user's visual perception and cognition. In data and information visualization, the goal is to graphically present and explore abstract, non-physical and non-spatial data collected from databases, information systems, file systems, documents, business data, which is different from scientific visualization, where the goal is to render realistic images based on physical and spatial scientific data to confirm or reject hypotheses.
Effective data visualization is well-sourced, appropriately contextualized, and presented in a simple, uncluttered manner. The underlying data is accurate and up-to-date to ensure insights are reliable. Graphical items are well-chosen and aesthetically appealing, with shapes, colors and other visual elements used deliberately in a meaningful and non-distracting manner. The visuals are accompanied by supporting texts. Verbal and graphical components complement each other to ensure clear, quick and memorable understanding. Effective information visualization is aware of the needs and expertise level of the target audience. Effective visualization can be used for conveying specialized, complex, big data-driven ideas to a non-technical audience in a visually appealing, engaging and accessible manner, and domain experts and executives for making decisions, monitoring performance, generating ideas and stimulating research.
Data scientists, analysts and data mining specialists use data visualization to check data quality, find errors, unusual gaps, missing values, clean data, explore the structures and features of data, and assess outputs of data-driven models. Data and information visualization can be part of data storytelling, where they are paired with a narrative structure, to contextualize the analyzed data and communicate insights gained from analyzing it to convince the audience into making a decision or taking action. This can be contrasted with statistical graphics, where complex data are communicated graphically among researchers and analysts to help them perform exploratory data analysis or convey results of such analyses, where visual appeal, capturing attention to a certain issue and storytelling are less important.
Data and information visualization is interdisciplinary, it incorporates principles found in descriptive statistics, visual communication, graphic design, cognitive science and, interactive computer graphics and human-computer interaction. Since effective visualization requires design skills, statistical skills and computing skills, it is both an art and a science. Visual analytics combines statistical data analysis, data and information visualization, and human analytical reasoning through interactive visual interfaces to help users reach conclusions, gain actionable insights and make informed decisions which are otherwise difficult for computers to do. Research into how people read and misread types of visualizations helps to determine what types and features of visualizations are most understandable and effective. Unintentionally poor or intentionally misleading and deceptive visualizations can function as powerful tools which disseminate misinformation, manipulate public perception and divert public opinion. Thus data visualization literacy has become an important component of data and information literacy in the information age akin to the roles played by textual, mathematical and visual literacy in the past.
Overview
The field of data and information visualization has emerged "from research in human–computer interaction, computer science, graphics, visual design, psychology, photography and business methods. It is increasingly applied as a critical component in scientific research, digital libraries, data mining, financial data analysis, market studies, manufacturing production control, and drug discovery".Data and information visualization presumes that "visual representations and interaction techniques take advantage of the human eye's broad bandwidth pathway into the mind to allow users to see, explore, and understand large amounts of information at once. Information visualization focused on the creation of approaches for conveying abstract information in intuitive ways."
Data analysis is an indispensable part of all applied research and problem solving in industry. The most fundamental data analysis approaches are visualization, statistics, data mining, and machine learning methods. Among these approaches, information visualization, or visual data analysis, is the most reliant on the cognitive skills of human analysts, and allows the discovery of unstructured actionable insights that are limited only by human imagination and creativity. The analyst does not have to learn any sophisticated methods to be able to interpret the visualizations of the data. Information visualization is also a hypothesis generation scheme, which can be, and is typically followed by more analytical or formal analysis, such as statistical hypothesis testing.
To communicate information clearly and efficiently, data visualization uses statistical graphics, plots, information graphics and other tools. Numerical data may be encoded using dots, lines, or bars, to visually communicate a quantitative message. Effective visualization helps users analyze and reason about data and evidence. It makes complex data more accessible, understandable, and usable, but can also be reductive. Users may have particular analytical tasks, such as making comparisons or understanding causality, and the design principle of the graphic follows the task. Tables are generally used where users will look up a specific measurement, while charts of various types are used to show patterns or relationships in the data for one or more variables.
Data visualization refers to the techniques used to communicate data or information by encoding it as visual objects contained in graphics. The goal is to communicate information clearly and efficiently to users. It is one of the steps in data analysis or data science. According to Vitaly Friedman the "main goal of data visualization is to communicate information clearly and effectively through graphical means. It doesn't mean that data visualization needs to look boring to be functional or extremely sophisticated to look beautiful. To convey ideas effectively, both aesthetic form and functionality need to go hand in hand, providing insights into a rather sparse and complex data set by communicating its key aspects in a more intuitive way. Yet designers often fail to achieve a balance between form and function, creating gorgeous data visualizations which fail to serve their main purpose — to communicate information".
Indeed, Fernanda Viegas and Martin M. Wattenberg suggested that an ideal visualization should not only communicate clearly, but stimulate viewer engagement and attention.
Data visualization is closely related to information graphics, information visualization, scientific visualization, exploratory data analysis and statistical graphics. In the new millennium, data visualization has become an active area of research, teaching and development. According to Post et al., it has united scientific and information visualization.
In the commercial environment data visualization is often referred to as dashboards. Infographics are another very common form of data visualization.
Principles
Characteristics of effective graphical displays
has explained that users of information displays are executing particular analytical tasks such as making comparisons. The design principle of the information graphic should support the analytical task. As William Cleveland and Robert McGill show, different graphical elements accomplish this more or less effectively. For example, dot plots and bar charts outperform pie charts.In his 1983 book The Visual Display of Quantitative Information, Edward Tufte defines 'graphical displays' and principles for effective graphical display in the following passage:
"Excellence in statistical graphics consists of complex ideas communicated with clarity, precision, and efficiency. Graphical displays should:
- show the data
- induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production, or something else
- avoid distorting what the data has to say
- present many numbers in a small space
- make large data sets coherent
- encourage the eye to compare different pieces of data
- reveal the data at several levels of detail, from a broad overview to the fine structure
- serve a reasonably clear purpose: description, exploration, tabulation, or decoration
- be closely integrated with the statistical and verbal descriptions of a data set.
For example, the Minard diagram shows the losses suffered by Napoleon's army in the 1812–1813 period. Six variables are plotted: the size of the army, its location on a two-dimensional surface, time, the direction of movement, and temperature. The line width illustrates a comparison, while the temperature axis suggests a cause of the change in army size. This multivariate display on a two-dimensional surface tells a story that can be grasped immediately while identifying the source data to build credibility. Tufte wrote in 1983 that: "It may well be the best statistical graphic ever drawn."
Not applying these principles may result in misleading graphs, distorting the message, or supporting an erroneous conclusion. According to Tufte, chartjunk refers to the extraneous interior decoration of the graphic that does not enhance the message or gratuitous three-dimensional or perspective effects. Needlessly separating the explanatory key from the image itself, requiring the eye to travel back and forth from the image to the key, is a form of "administrative debris." The ratio of "data to ink" should be maximized, erasing non-data ink where feasible.
The Congressional Budget Office summarized several best practices for graphical displays in a June 2014 presentation. These included: a) Knowing your audience; b) Designing graphics that can stand alone outside the report's context; and c) Designing graphics that communicate the key messages in the report.
Useful criteria for a data or information visualization include:
- It is based on data - that is, a data/info viz is not image processing and collage;
- It creates an image - specifically that the image plays the primary role in communicating meaning and is not an illustration accompanying the data in text form; and
- The result is readable.
Kosara also identifies the need for a visualisation to be "recognisable as a visualisation and not appear to be something else". He also states that recognisability and readability may not always be required in all types of visualisation e.g. "informative art" or "artistic visualisation".