Object storage


Object storage is a computer data storage approach that manages data as "blobs" or "objects", as opposed to other storage architectures such as file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks. Each object is typically associated with a variable amount of metadata and a globally unique identifier. Object storage can be implemented at multiple levels, including the device level, the system level, and the interface level. In each case, object storage seeks to enable capabilities not addressed by other storage architectures, such as interfaces that are directly programmable by the application, a namespace that can span multiple instances of physical hardware, and data-management functions such as data replication and data distribution at object-level granularity.
Object storage systems allow retention of massive amounts of unstructured data that is written once and then read once or many times. Object storage is used for purposes such as storing videos and photos on Facebook, songs on Spotify, and files in online collaboration services such as Dropbox. One limitation of object storage is that it is not intended for transactional data: object storage was not designed to replace NAS file access and sharing, and it does not support the locking and sharing mechanisms needed to maintain a single, accurately updated version of a file.

History

Origins

Jim Starkey coined the term "blob" while working at Digital Equipment Corporation to refer to opaque data entities. The terminology was adopted for Rdb/VMS. "Blob" is often humorously explained to be an abbreviation for "binary large object". According to Starkey, this backronym arose when Terry McKiever, working in marketing at Apollo Computer, felt that the term needed to be an abbreviation. McKiever began using the expansion "Basic Large Object". This was later eclipsed by the retroactive explanation of blobs as "Binary Large Objects". According to Starkey, "Blob don't stand for nothin'." Rejecting the acronym, he explained his motivation behind the coinage, saying, "A blob is the thing that ate Cincinnati, Cleveland, or whatever," referring to the 1958 science fiction film The Blob.
In 1995, research led by Garth Gibson on Network-Attached Secure Disks (NASD) first promoted the concept of separating less common operations, such as namespace manipulations, from common operations, such as reads and writes, to optimize the performance and scale of both. In the same year, a Belgian company, FilePool, was established to build the basis for archiving functions. Object storage was proposed at Gibson's Carnegie Mellon University lab as a research project in 1996. Another key concept was abstracting the writes and reads of data to more flexible data containers. Fine-grained access control through object storage architecture was further described by one of the NASD team members, Howard Gobioff, who later was one of the inventors of the Google File System.
Other related work includes the Coda file system project at Carnegie Mellon, which started in 1987 and spawned the Lustre file system. There is also the OceanStore project at UC Berkeley, which started in 1999, and the Logistical Networking project at the University of Tennessee, Knoxville, which started in 1998. In 1999, Gibson founded Panasas to commercialize the concepts developed by the NASD team.

Development

Seagate Technology played a central role in the development of object storage. According to the Storage Networking Industry Association (SNIA), "Object storage originated in the late 1990s: Seagate specifications from 1999 Introduced some of the first commands and how operating system effectively removed from consumption of the storage."
A preliminary version of the "OBJECT BASED STORAGE DEVICES Command Set Proposal", dated October 25, 1999, was submitted by Seagate. Edited by Seagate's Dave Anderson, it was the product of work by the National Storage Industry Consortium, with contributions from Carnegie Mellon University, Seagate, IBM, Quantum, and StorageTek. The paper was proposed to INCITS T10 with the goal of forming a committee and designing a specification based on the SCSI interface protocol. It defined objects as abstracted data with unique identifiers and metadata, described how objects related to file systems, and introduced many other concepts. Anderson presented many of these ideas at the SNIA conference in October 1999. The presentation revealed an IP agreement, signed in February 1997 by the original collaborators, that covered the benefits of object storage, scalable computing, platform independence, and storage management.

Architecture

Abstraction of storage

One of the design principles of object storage is to abstract some of the lower layers of storage away from the administrators and applications. Thus, data is exposed and managed as objects instead of blocks or files. Objects contain additional descriptive properties which can be used for better indexing or management. Administrators do not have to perform lower-level storage functions like constructing and managing logical volumes to utilize disk capacity or setting RAID levels to deal with disk failure.
Object storage also allows the addressing and identification of individual objects by more than just file name and file path. Object storage adds a unique identifier within a bucket, or across the entire system, to support much larger namespaces and eliminate name collisions.
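A minimal sketch of a flat, bucket-scoped namespace, assuming an in-memory Python store. The class and method names (`FlatObjectStore`, `put`, `get`) are hypothetical and do not correspond to any product's API. Keys such as `photos/2010/cat.jpg` are opaque strings rather than paths, and each stored object also receives a system-generated unique identifier (a random UUID here, which is an assumption).

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes
    metadata: dict[str, str]
    # System-assigned identifier; a random UUID is an illustrative choice.
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class FlatObjectStore:
    """Hypothetical in-memory store: no directory tree, only (bucket, key) -> object."""

    def __init__(self) -> None:
        self._objects: dict[tuple[str, str], StoredObject] = {}

    def put(self, bucket: str, key: str, data: bytes, **metadata: str) -> str:
        # Slashes in `key` are ordinary characters; no parent "folder" must exist.
        obj = StoredObject(data, dict(metadata))
        self._objects[(bucket, key)] = obj
        return obj.object_id

    def get(self, bucket: str, key: str) -> StoredObject:
        return self._objects[(bucket, key)]


store = FlatObjectStore()
# The same key in two buckets names two distinct objects, so the names never collide.
id_a = store.put("team-a", "photos/2010/cat.jpg", b"...", owner="alice")
id_b = store.put("team-b", "photos/2010/cat.jpg", b"...", owner="bob")
print(id_a != id_b)  # True
```

Because nothing in the store walks a directory tree, administrators never create or size volumes for it; capacity is simply the sum of the objects stored.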

Inclusion of rich custom metadata within the object

Object storage explicitly separates file metadata from data to support additional capabilities.
Unlike the fixed metadata of file systems, object storage provides full-function, custom, object-level metadata in order to:
  • Capture application-specific or user-specific information for better indexing purposes
  • Support data-management policies
  • Centralize management of storage across many individual nodes and clusters
  • Optimize metadata storage and caching/indexing independently from the data storage
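As a rough illustration of the first two purposes, the sketch below keeps a hypothetical secondary index over user-defined metadata, stored apart from the object data, and applies a hypothetical retention policy driven by a per-object tag. The tag names (`modality`, `retention-days`) and the function names are invented for the example.

```python
from __future__ import annotations

from collections import defaultdict


class MetadataIndex:
    """Hypothetical index from (metadata name, value) to object keys, kept apart from object data."""

    def __init__(self) -> None:
        self._index: defaultdict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, key: str, metadata: dict[str, str]) -> None:
        for name, value in metadata.items():
            self._index[(name, value)].add(key)

    def find(self, name: str, value: str) -> list[str]:
        # .get() avoids inserting empty sets for values that were never indexed.
        return sorted(self._index.get((name, value), set()))


def due_for_deletion(metadata: dict[str, str], age_days: int) -> bool:
    """Hypothetical policy: delete once an object outlives its "retention-days" tag.
    Objects without the tag are kept indefinitely."""
    days = metadata.get("retention-days")
    return days is not None and age_days > int(days)


index = MetadataIndex()
index.add("scan-001.tif", {"modality": "MRI", "retention-days": "30"})
index.add("scan-002.tif", {"modality": "CT"})
print(index.find("modality", "MRI"))  # ['scan-001.tif']
```

Keeping the index separate from the data is what allows it to be cached or rebuilt independently of the objects themselves.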
Additionally, in some object-based file-system implementations:
  • The file system clients only contact metadata servers once when the file is opened and then get content directly via object-storage servers
  • Data objects can be configured on a per-file basis to allow adaptive stripe width, even across multiple object-storage servers, supporting optimizations in bandwidth and I/O
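Per-file striping can be pictured with a round-robin (RAID 0-style) layout; that scheme is an assumption here, and real file systems define their own layout formats. In this sketch a file's layout names the object servers it spans (its stripe width) and a stripe unit, so a client can compute which server holds any byte offset without contacting the metadata server again. `Layout` and `locate` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical per-file layout returned when the file is opened."""
    servers: tuple[str, ...]  # stripe width = len(servers), chosen per file
    stripe_unit: int          # bytes placed on one server before moving to the next


def locate(layout: Layout, offset: int) -> tuple[str, int]:
    """Map a byte offset in the file to (server, offset inside that server's object)."""
    chunk = offset // layout.stripe_unit
    width = len(layout.servers)
    server = layout.servers[chunk % width]
    # Each full pass over all servers adds one stripe unit to every object.
    object_offset = (chunk // width) * layout.stripe_unit + offset % layout.stripe_unit
    return server, object_offset


layout = Layout(("oss0", "oss1", "oss2"), stripe_unit=1024)
print(locate(layout, 3 * 1024 + 5))  # ('oss0', 1029)
```

A wider stripe spreads a large file's I/O over more servers, while a small file can use a width of one to avoid extra round trips.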
Object-based storage devices as well as some software implementations manage metadata and data at the storage device level:
  • Instead of providing a block-oriented interface that reads and writes fixed-size blocks of data, data is organized into flexible-sized data containers, called objects
  • Each object has both data and metadata; physically encapsulating both together benefits recoverability.
  • The command interface includes commands to create and delete objects, write bytes and read bytes to and from individual objects, and to set and get attributes on objects
  • Security mechanisms provide per-object and per-command access control
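A toy model of such a device's command interface: objects are variable-length byte containers with attached attributes, and every command is checked against a per-object access list. The command names and the access-list scheme are assumptions for illustration; they are not the T10 OSD opcodes, and the standard's actual security protocol is not reproduced here.

```python
from __future__ import annotations

from dataclasses import dataclass, field

ALL_COMMANDS = frozenset({"read", "write", "get_attr", "set_attr", "remove"})


class AccessDenied(Exception):
    """Raised when a principal issues a command it may not run on an object."""


@dataclass
class OSDObject:
    data: bytearray = field(default_factory=bytearray)
    attributes: dict[str, bytes] = field(default_factory=dict)
    acl: dict[str, set[str]] = field(default_factory=dict)  # principal -> allowed commands


class ObjectStorageDevice:
    """Hypothetical object-based device: objects instead of fixed-size blocks."""

    def __init__(self) -> None:
        self._objects: dict[int, OSDObject] = {}
        self._next_id = 1

    def _check(self, principal: str, object_id: int, command: str) -> OSDObject:
        obj = self._objects[object_id]
        if command not in obj.acl.get(principal, set()):
            raise AccessDenied(f"{principal} may not {command} object {object_id}")
        return obj

    def create(self, owner: str) -> int:
        object_id, self._next_id = self._next_id, self._next_id + 1
        self._objects[object_id] = OSDObject(acl={owner: set(ALL_COMMANDS)})
        return object_id

    def grant(self, object_id: int, principal: str, commands: set[str]) -> None:
        # Stands in for an out-of-band security manager; no check is done here.
        self._objects[object_id].acl.setdefault(principal, set()).update(commands)

    def write(self, principal: str, object_id: int, offset: int, data: bytes) -> None:
        obj = self._check(principal, object_id, "write")
        end = offset + len(data)
        if end > len(obj.data):
            obj.data.extend(bytes(end - len(obj.data)))  # objects grow as needed
        obj.data[offset:end] = data

    def read(self, principal: str, object_id: int, offset: int, length: int) -> bytes:
        obj = self._check(principal, object_id, "read")
        return bytes(obj.data[offset:offset + length])

    def set_attr(self, principal: str, object_id: int, name: str, value: bytes) -> None:
        self._check(principal, object_id, "set_attr").attributes[name] = value

    def get_attr(self, principal: str, object_id: int, name: str) -> bytes:
        return self._check(principal, object_id, "get_attr").attributes[name]

    def remove(self, principal: str, object_id: int) -> None:
        self._check(principal, object_id, "remove")
        del self._objects[object_id]
```

For example, after `oid = dev.create("alice")` and `dev.write("alice", oid, 0, b"hello")`, the call `dev.read("alice", oid, 1, 3)` returns `b"ell"`, while the same read issued as `"bob"` raises `AccessDenied` until `dev.grant(oid, "bob", {"read"})` is called.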

Programmatic data management

Object storage provides programmatic interfaces that allow applications to manipulate data. At the base level, these include create, read, update, and delete (CRUD) functions for basic read, write, and delete operations. Some object storage implementations go further, supporting additional functionality such as object/file versioning, object replication, life-cycle management, and movement of objects between different tiers and types of storage. Most API implementations are REST-based, allowing the use of many standard HTTP calls.
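A sketch of how the CRUD operations commonly map onto HTTP verbs in REST-style object APIs, using only Python's standard library to build (but not send) the requests. The endpoint, the `/bucket/key` URL layout, and the helper names are assumptions; real services differ in addressing and require request signing or tokens, which are omitted. In many such APIs "update" means uploading a complete replacement object rather than modifying bytes in place.

```python
from urllib.parse import quote
from urllib.request import Request

# Hypothetical endpoint; a real service also requires authentication, omitted here.
ENDPOINT = "https://storage.example.com"


def object_url(bucket: str, key: str) -> str:
    # quote() keeps "/" by default, so keys that look like paths stay readable.
    return f"{ENDPOINT}/{quote(bucket)}/{quote(key)}"


def build_put(bucket: str, key: str, data: bytes) -> Request:
    # Create and update share one call: PUT stores a whole new copy of the object.
    return Request(object_url(bucket, key), data=data, method="PUT",
                   headers={"Content-Type": "application/octet-stream"})


def build_get(bucket: str, key: str) -> Request:
    return Request(object_url(bucket, key), method="GET")


def build_delete(bucket: str, key: str) -> Request:
    return Request(object_url(bucket, key), method="DELETE")


req = build_put("photos", "2010/cat 1.jpg", b"\xff\xd8")
print(req.get_method(), req.full_url)
# PUT https://storage.example.com/photos/2010/cat%201.jpg
```

Sending any of these requests would be a matter of passing it to `urllib.request.urlopen`.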

Implementation

Cloud storage

The vast majority of cloud storage available on the market uses an object-storage architecture. Notable examples include Amazon Web Services S3, which debuted in March 2006; Microsoft Azure Blob Storage; IBM Cloud Object Storage; Rackspace Cloud Files; and Google Cloud Storage, released in May 2010.

Object-based file systems

Some distributed file systems use an object-based architecture, where file metadata is stored in metadata servers and file data is stored in object storage servers. File system client software interacts with the distinct servers, and abstracts them to present a full file system to users and applications.
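A simplified sketch of that division of labor, under stated assumptions: the metadata server maps a path to an ordered list of whole objects, each object server is an in-memory dictionary, and striping, caching, and locking are left out. All class names are hypothetical. The client contacts the metadata server once per open and then reads file contents directly from the object servers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MetadataServer:
    """Hypothetical MDS: maps a path to the ordered objects holding its data."""
    layouts: dict[str, list[tuple[str, int]]] = field(default_factory=dict)
    lookups: int = 0

    def open(self, path: str) -> list[tuple[str, int]]:
        self.lookups += 1
        return self.layouts[path]


@dataclass
class ObjectServer:
    """Hypothetical object storage server: object id -> bytes."""
    objects: dict[int, bytes] = field(default_factory=dict)

    def read(self, object_id: int) -> bytes:
        return self.objects[object_id]


class FileSystemClient:
    """Presents ordinary file reads on top of the two kinds of server."""

    def __init__(self, mds: MetadataServer, servers: dict[str, ObjectServer]) -> None:
        self.mds = mds
        self.servers = servers

    def read_file(self, path: str) -> bytes:
        layout = self.mds.open(path)  # one metadata round trip per open
        # File data is fetched straight from the object servers, bypassing the MDS.
        return b"".join(self.servers[name].read(oid) for name, oid in layout)


mds = MetadataServer({"/home/a.txt": [("oss1", 7), ("oss2", 3)]})
servers = {"oss1": ObjectServer({7: b"hello "}), "oss2": ObjectServer({3: b"world"})}
client = FileSystemClient(mds, servers)
print(client.read_file("/home/a.txt"), mds.lookups)  # b'hello world' 1
```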

Object-storage systems

Some early incarnations of object storage were used for archiving, as implementations were optimized for data services such as immutability rather than performance. EMC Centera and Hitachi HCP are two commonly cited object storage products for archiving. Another example is the Quantum ActiveScale Object Storage Platform.
More general-purpose object-storage systems came to market around 2008. Drawn by the rapid growth of "captive" storage systems within web applications such as Yahoo Mail and by the early success of cloud storage, vendors offered object-storage systems that promised the scale and capabilities of cloud storage, with the ability to deploy the system within an enterprise or at an aspiring cloud-storage service provider.
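Immutability can be enforced in several ways; one generic technique is content addressing, sketched below under the assumption of SHA-256 addresses. It is not a description of how the products named here are implemented, and `WriteOnceStore` is a hypothetical name. Because the store exposes no update or delete operation, a revised document can only be stored as a new object with a new address.

```python
from __future__ import annotations

import hashlib


class WriteOnceStore:
    """Hypothetical content-addressed store: an object's address is a hash of its bytes."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical content maps to the same address; an existing entry is never replaced.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hashing on read detects silent corruption of the stored bytes.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError(f"stored content no longer matches address {address}")
        return data

    # Deliberately no update() or delete(): a changed document becomes a new object.
```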

Unified file and object storage

A few object-storage systems support unified file and object storage, allowing some clients to store objects on a storage system while other clients simultaneously store files on the same system. Other vendors in the area of hybrid cloud storage use cloud storage gateways to provide a file access layer over object storage, implementing file access protocols such as SMB and NFS.
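One way a gateway could present a directory tree over a flat object namespace, sketched under assumptions: file paths map to keys by dropping the leading slash, and a "directory" is emulated by grouping keys on a prefix and a "/" delimiter, similar in spirit to the prefix-and-delimiter listing that S3-style APIs expose. The function names are hypothetical, and real gateways also handle caching, locking, and partial writes, none of which is shown.

```python
from __future__ import annotations

from collections.abc import Iterable


def path_to_key(path: str) -> str:
    """Hypothetical gateway mapping: '/docs/a.txt' -> 'docs/a.txt'."""
    return path.lstrip("/")


def list_directory(keys: Iterable[str], prefix: str,
                   delimiter: str = "/") -> tuple[list[str], list[str]]:
    """Emulate one directory level over a flat key space.

    Returns (files, subdirectories) directly under `prefix`; deeper keys are
    folded into a single subdirectory entry at the first delimiter.
    """
    files: set[str] = set()
    subdirs: set[str] = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            subdirs.add(prefix + head + delimiter)
        elif rest:  # skip a key equal to the prefix itself (a "folder marker")
            files.add(key)
    return sorted(files), sorted(subdirs)


keys = ["docs/a.txt", "docs/b.txt", "docs/old/c.txt", "img/x.png"]
print(list_directory(keys, path_to_key("/docs/")))
# (['docs/a.txt', 'docs/b.txt'], ['docs/old/'])
```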

"Captive" object storage

Some large Internet companies developed their own software when object-storage products were not commercially available or their use cases were very specific. Facebook, for example, developed its own object-storage software, code-named Haystack, to efficiently meet its massive-scale photo management needs.
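Haystack has been publicly described as packing many photos into large append-only files and keeping an in-memory index from photo identifier to file offset, so that serving a photo avoids per-file filesystem metadata lookups. The sketch below illustrates only that idea under simplifying assumptions: an in-memory `BytesIO` stands in for the on-disk volume, deletes only drop the index entry, and there is no replication or compaction. It is not Facebook's code, and `AppendOnlyVolume` is a hypothetical name.

```python
from __future__ import annotations

import io


class AppendOnlyVolume:
    """Hypothetical sketch: many photos in one large file plus an in-memory index."""

    def __init__(self) -> None:
        self._volume = io.BytesIO()  # stands in for one large on-disk volume file
        self._index: dict[int, tuple[int, int]] = {}  # photo id -> (offset, size)

    def append(self, photo_id: int, data: bytes) -> None:
        offset = self._volume.seek(0, io.SEEK_END)  # writes only ever go to the end
        self._volume.write(data)
        self._index[photo_id] = (offset, len(data))

    def read(self, photo_id: int) -> bytes:
        offset, size = self._index[photo_id]  # the lookup itself touches only memory
        self._volume.seek(offset)
        return self._volume.read(size)

    def delete(self, photo_id: int) -> None:
        # Logical delete only: the bytes stay in the volume until a compaction pass (not shown).
        del self._index[photo_id]
```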

Object-based storage devices

Object storage at the protocol and device layer was proposed in 1999 and approved for the SCSI command set in 2004 as "Object-based Storage Device Commands"; however, it was not put into production until the development of the Seagate Kinetic Open Storage platform. The SCSI command set for Object Storage Devices was developed by a working group of the SNIA for the T10 committee of the International Committee for Information Technology Standards (INCITS), which is responsible for all SCSI standards.