SAP IQ
SAP IQ is a column-based, petabyte scale, relational database software system used for business intelligence, data warehousing, and data marts. Produced by Sybase Inc., now an SAP company, its primary function is to analyze large amounts of data in a low-cost, highly available environment. SAP IQ is often credited with pioneering the commercialization of column-store technology.
At the foundation of SAP IQ lies a column store technology that allows for speed compression and ad-hoc analysis. SAP IQ has an open interface approach towards its ecosystem. SAP IQ is also integrated with SAP's Business Intelligence portfolio of products to form an end-to-end business analytics software stack, and is an integral component of SAP's In-Memory Data Fabric Architecture and Data Management Platform.
History
In the early 1990s, Waltham, Massachusetts-based Expressway Technologies, Inc. developed the Expressway 103, a column-based, engine optimized for analytics, that would eventually become Sybase IQ. Sybase acquired Expressway and re-introduced the product in 1995 as IQ Accelerator, then renamed it shortly thereafter to Sybase IQ, giving it version number 11.0.By offering the IQ product as part of a collection of related technologies often found in a data warehouse, Sybase became one of the first mainstream companies to acknowledge the need for specialized products for the data warehouse market.
With version 12.0, Sybase replaced the loosely coupled query interface from Adaptive Server Enterprise with a tight coupling with SQL Anywhere.
Version 16 brings a re-engineered column store for extreme, petabyte scale, data volumes, and more extreme data compression.
In 2014, SAP HANA, together with partners BMMsoft, HP, Intel, NetApp, and Red Hat announced the world's largest data warehouse.
A team of engineers from SAP, BMMsoft, HP, Intel, NetApp, and Red Hat, built the data warehouse using SAP HANA and SAP IQ 16, with BMMsoft Federated EDMT running on HP DL580 servers using Intel Xeon E7-4870 processors under Red Hat Enterprise Linux 6 and NetApp FAS6290 and E5460 storage. The development and testing of the 12.1PB data warehouse was conducted by the SAP/Intel Petascale lab in Santa Clara, Calif., and audited by InfoSizing, an independent Transaction Processing Council certified auditor.
Version history
With the release of SP08, the version numbers have been changed to align with and match SAP HANA's version numbers to reflect the product's continuous integration with SAP HANA. The actual release title SP03 is a follow-on to SP02, covering all platforms not affected by the release.In-memory data fabric
SAP's new approach streamlines and simplifies Data Warehousing into an In-Memory Data Fabric.SAP IQ with SAP HANA
With the advent of big data, SAP IQ has coupled with SAP HANA to deliver a distributed in-memory analytics platform.There are three main applications and use cases which try to capitalize on SAP IQ's strengths concerning scalability and performance as an EDW and big data processor, while leveraging SAP HANA's in-memory speed for operational reporting:
SAP IQ as a Near-line Service (NLS) to SAP HANA
SAP HANA for operational reporting with SAP IQ for big data processing (NLS)
In this scenario, SAP Enterprise Resource Planning data goes into SAP HANA which acts as an operational data store for immediate analysis. Once the data is analyzed it is integrated into SAP IQ via Near-line storage mechanisms. Here SAP IQ acts as an enterprise data warehouse that receives data from a variety of traditional sources, and SAP HANA Operational Data StoreSAP IQ as an Enterprise Data Warehouse (EDW) with SAP HANA as Agile Data Mart
When SAP IQ is used as an EDW, it can also be augmented with HANA's in-memory technology. Common uses include planning and analysis reports where simultaneous OLTP processing is needed. In this case, data flows from SAP IQ to SAP HANA. SAP BusinessObjects BI can be used to achieve visibility across both platforms.Technology
To a user, SAP IQ looks just like any relational DBMS with a SQL-based language layer accessible via ODBC/JDBC drivers. However, inside, Sybase IQ is a column-oriented DBMS, which stores data tables as sections of columns of data rather than as rows of data like most transactional databases.Column-Store Architecture
Column-orientation has a number of advantages. If a search is being done for items matching a particular value in a column of data, only the storage objects corresponding to that data column within the table need to be accessed. A traditional row-based database would have to read the whole table, top to bottom. Another advantage is that when indexed correctly, a value that would have to be stored once in each row of data in a traditional database is stored only once, and in SAP IQ, an n-bitindex is used to access the data. Nbit and tiered indexing is used to allow for increased compression and fast, incremental batch loads.Additionally, the column-based storage enables SAP IQ to compress data efficiently on the fly.
Indexing Technology
Prior to SAP IQ 16, each data page was structured as an array of cells of a fixed size, so all values have the same data type. While this storage approach is efficient for structured and fixed length data, this does not hold for the more unstructured and variable sized data that is seen today. To combat storage inefficiency and store variable sized data with minimal wasted space, each page is composed of cells of a variable size that are packed tightly together; the column store architecture supports a variable number of cells per page and various page formats within a column. SAP IQ also applies Lempel-Ziv-Welch compression algorithms to each data page when it is written to disk, to significantly reduce data volume.Bitmaps are used for secondary indexes.
Massively Parallel Processing Framework
SAP IQ has a massively parallel processing framework based on a shared-everything environment that supports distributed query processing. Most other products capable of MPP tend to be based on shared-nothing environments. The benefit of shared-everything is that it's more flexible in terms of the variety of queries that can be optimized—especially for balancing the needs of many concurrent users. The downside is that in extreme cases, competition among processors to access a shared pool of storage, can lead to I/O contention, which affects query performance.However, the aforementioned storage architecture of SAP IQ allows compute and storage layers to scale out independently of each other and also allows these resources to be provisioned on-demand for better utilization without restructuring the underlying database.
Multiplex Architecture
SAP IQ uses a clustered grid architecture, which is made up of clusters of SAP IQ servers, or Multiplex. These clusters are used to scale performance for large numbers of concurrent queries or queries that are great in complexity. This is built upon a shared-everything architecture where all compute nodes interact with the same shared storage and queries have the ability to distribute across all compute nodes. The Multiplex has a coordinator node which manages the database catalog and coordinates transactional writes to the store. Other nodes can be reader only nodes, or readers and writes, like the coordinator node. The storage fabric can be implemented with numerous technologies that allow sharing amongst the multiplex nodes.This architecture has multiple uses, including workload balancing and elastic virtual data marts. Workload balancing is achieved by the SAP IQ query engine through dynamically increasing/decreasing parallelism in response to changes in server activity. There is automatic failover if a node stops participating in a query, and other nodes will pick up work originally assigned to the failed node so the query can complete. On the client side, compatibility with external load balancers ensures that queries are initiated on physical servers in a balanced fashion to eliminate bottlenecks. Physical nodes in the Multiplex can be grouped together into “logical servers” which allow workloads to be isolated from each other ; machines can be added to these as demand changes. The aim of the grid architecture is to enable resiliency even during global transactions.
Loading Engine
The SAP IQ Loading Engine can be used for incremental batch, low latency, concurrent loading, and bulk loading. The bulk loading process allows for multiple load processes to occur simultaneously, if the loads are of different tables. Data can be loaded from other databases as well as files. Page-level snapshot versioning allows concurrent loads and queries, with locking occurring at the table level only. With SAP Replication Server, now enhanced to optimize loads into SAP IQ, transactions are compiled into the fewest set of operations, and then bulk micro-batch loads into SAP IQ are performed, which gives the appearance of real-time, continuous loads.The bulk loader now performs all operations in parallel to make full use of all server cores, remove bottlenecks, and keep all threads productive, instead of serializing the process. The loading process remains a two-phase process, first reading raw data and creating FP indexed, and second creating secondary indexes, but everything is executed in parallel.
High Group indexes, which the query optimizer relies on for information about which columns/rows contain which data values, are now structured as a set of tiers, increasing as you move down the pyramid.
Lastly, SAP IQ introduces a write-optimized, Row Level Versioned Delta store which enables high-velocity data loads and fast availability of data to users. This store is minimally indexes and compressed, with row-level locking for concurrent write, and its own transaction log and is append only, and acts as a companion to the main store, with data being loaded at high speed to the RLV store, and migrating to the main store later, merging to it periodically. To the user it does not appear as though there are two separate entities at work and queries operate transparently across the two stores. To make use of this the users can specify particular “hot” database tables as RLV tables.