Google Cloud Datastore
Google Cloud Datastore is a NoSQL database service provided by Google Cloud Platform. It is a fully managed database that can handle massive amounts of data. It is designed to store structured data and provides a highly reliable and efficient platform for building scalable applications. Unlike traditional relational databases, Datastore is schema-less. This allows flexible data modeling and lets the structure of the data change dynamically, without downtime for the services that rely on the database. Google Cloud Datastore is used for data handling in mobile apps, web applications, and IoT systems because of key characteristics such as automatic scaling, strong consistency for key lookups, and smooth integration with other Google Cloud services. It is built for applications that require high scalability, low-latency reads and writes, and automatic management of data across distributed systems.
Google Cloud Datastore organizes data into entities and properties, and entities are grouped into kinds. Kinds are similar to tables in relational databases; however, because Datastore is a NoSQL database, kinds impose no schema constraints. Each entity in Datastore is uniquely identified by a key, which can use a custom identifier chosen by the user or an ID generated automatically by the system.
Google Cloud Datastore offers an API and client libraries for general-purpose programming languages such as Python, Java, and Node.js. The libraries support several release versions of these languages, so Cloud Datastore can be integrated with both legacy and modern applications written in them. The API also supports asynchronous operations, which lets developers build non-blocking, highly responsive systems. In terms of data consistency, Google Cloud Datastore provides strong consistency for lookups of individual entities by key and eventual consistency for queries across multiple entities.
History
Google Cloud Datastore was announced on April 11, 2013, as a fully managed NoSQL document database designed to support large-scale web and mobile applications. It was based on the Datastore that had been part of Google App Engine since 2008, but it was designed to offer features such as scalability, higher availability, and automatic data replication across multiple data centers. Before the launch of Cloud Datastore, developers on Google App Engine used a built-in Datastore that only worked with App Engine apps. As Google Cloud Platform grew in the market, developers wanted a database they could also use outside App Engine and integrate with their own apps, with more flexibility and wider availability. Cloud Datastore met this need as a standalone service with features such as automatic sharding, indexing, and support for eventual consistency.
Google introduced Cloud Firestore in beta in 2017 as a new NoSQL database with features such as real-time updates, offline support, and faster query execution. It was positioned as the successor to Cloud Datastore, and new users were encouraged to choose Firestore instead. However, because many developers were still building applications on Datastore, Google rebranded it as "Firestore in Datastore mode" in 2020. This way, existing users could keep using familiar Datastore features, with the option to upgrade to Firestore later when needed.
Even after the rise of Firestore, Cloud Datastore is still widely used in legacy applications, especially those that need a managed NoSQL database with strong multi-region replication and automatic scaling. Google Cloud continues to support these legacy systems so that they remain reliable and fully functional.
Overview
Access and management
Users can access the database in Google Cloud Datastore through the Google Cloud Console, the gcloud command-line tool, or client libraries for different programming languages. Depending on their needs, they can interact with the database either through the graphical interface or by writing code.
Data organization
Data in Google Cloud Datastore is organized into entities, which are similar to individual records. Entities are grouped into kinds, which are similar to tables in a traditional database. However, unlike rows in a relational table, entities of the same kind do not have to follow a fixed structure; they can have different sets of properties.
Entities and properties
Each entity consists of a set of properties. Properties are name-value pairs, and values can be, for example, strings, numbers, Booleans, timestamps, arrays, or geographic points. The flexible nature of properties allows developers to model complex data structures without a rigid schema.
Entity keys
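Every entity in Datastore is uniquely identified by a key. A key includes:
- A project ID,
- An optional namespace,
- An optional ancestor path (the kinds and identifiers of its parent entities),
- A kind,
- A name or numeric ID that distinguishes the entity from other entities of the same kind and parent.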
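The key structure described above can be modeled in a few lines of Python. This is a minimal sketch, assuming a hypothetical `Key` dataclass whose `path` is a tuple of (kind, identifier) pairs; it is not the client library's `Key` class, although the real Python client builds comparable keys with calls such as `client.key("Customer", "alice", "Order", 1001)`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Hypothetical model of a Datastore key; not the google-cloud-datastore Key class.
Identifier = Union[int, str]           # numeric ID or string name
PathElement = Tuple[str, Identifier]   # (kind, identifier)


@dataclass(frozen=True)
class Key:
    project: str
    path: Tuple[PathElement, ...]      # ancestors first, the entity itself last
    namespace: Optional[str] = None    # None stands for the default namespace

    @property
    def kind(self) -> str:
        return self.path[-1][0]

    @property
    def id_or_name(self) -> Identifier:
        return self.path[-1][1]

    @property
    def parent(self) -> Optional["Key"]:
        # A key with a single path element is a root key and has no parent.
        if len(self.path) == 1:
            return None
        return Key(self.project, self.path[:-1], self.namespace)

    def child(self, kind: str, identifier: Identifier) -> "Key":
        # Append one (kind, identifier) element to build a descendant key.
        return Key(self.project, self.path + ((kind, identifier),), self.namespace)


customer = Key("my-project", (("Customer", "alice"),))
order = customer.child("Order", 1001)
print(order.kind, order.id_or_name, order.parent == customer)  # prints: Order 1001 True
```

Because the order key's parent is the customer key, both entities share the same root and therefore belong to the same entity group.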
Data types
Google Cloud Datastore supports a range of property data types, as shown below:
- String
- Integer
- Float
- Boolean
- Timestamp
- Array
- Embedded Entity
- GeoPoint
- Binary data
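To illustrate how application values correspond to the types listed above, the following sketch maps ordinary Python values to Datastore type names. The `datastore_type` function, the `GeoPoint` named tuple, and the sample `task` entity are hypothetical; the real Python client performs its own conversions and ships its own GeoPoint helper.

```python
import datetime
from typing import Any, NamedTuple


class GeoPoint(NamedTuple):
    # Hypothetical stand-in for a geographic point value.
    latitude: float
    longitude: float


def datastore_type(value: Any) -> str:
    """Return the Datastore property type a Python value would map to (illustrative)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Float"
    if isinstance(value, str):
        return "String"
    if isinstance(value, bytes):
        return "Binary data"
    if isinstance(value, datetime.datetime):
        return "Timestamp"
    if isinstance(value, GeoPoint):
        return "GeoPoint"
    if isinstance(value, list):
        return "Array"
    if isinstance(value, dict):
        return "Embedded Entity"
    raise TypeError(f"unsupported property value: {value!r}")


# A hypothetical Task entity whose properties use each listed type.
task = {
    "title": "Write report",
    "priority": 4,
    "percent_complete": 12.5,
    "done": False,
    "created": datetime.datetime(2024, 1, 15, 9, 30, tzinfo=datetime.timezone.utc),
    "tags": ["work", "urgent"],
    "assignee": {"name": "Alice", "team": "docs"},
    "location": GeoPoint(37.42, -122.08),
    "attachment": b"\x89PNG",
}

for name, value in task.items():
    print(f"{name}: {datastore_type(value)}")
```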
Querying and indexes
By default, Datastore automatically indexes each property to enable efficient queries. More complex queries that involve multiple properties require composite indexes, which must be defined manually. Index management is typically handled through an index.yaml file or through the console.
GQL
GQL is a SQL-like query language designed for Google Cloud Datastore. It lets users query Datastore with statements that resemble SQL but are adapted to the NoSQL nature of the platform. GQL provides ways to filter, sort, and limit Datastore entities without writing complex queries against the underlying Datastore APIs. Unlike SQL, GQL does not support joins. However, it supports querying by properties with equality and inequality filters, including range queries, and users can combine multiple conditions in one query. This makes GQL suitable for a wide range of read-oriented use cases, such as retrieving user data or browsing product catalogs; GQL itself only retrieves data and cannot update the database.
GQL also supports ancestor queries, which let users retrieve related entities based on their position in a key hierarchy. This is useful for applications that manage hierarchical data, such as content management systems, or data models with parent-child relationships. Ancestor queries are strongly consistent, whereas other GQL queries are subject to the eventual consistency that Datastore applies to queries across entity groups.
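An ancestor query can be thought of as a prefix match on key paths. The sketch below assumes keys are represented as tuples of (kind, identifier) pairs with the root first; `has_ancestor`, `ancestor_query`, and the sample paths are hypothetical. In GQL the same idea is expressed with a `HAS ANCESTOR` condition on `__key__`, and the Python client accepts an `ancestor` argument to `client.query`.

```python
from typing import List, Optional, Tuple, Union

# Hypothetical key paths: (kind, identifier) pairs with the root ancestor first.
Path = Tuple[Tuple[str, Union[int, str]], ...]

paths: List[Path] = [
    (("Site", "docs"), ("Page", "intro")),
    (("Site", "docs"), ("Page", "intro"), ("Section", 1)),
    (("Site", "blog"), ("Page", "launch")),
]


def has_ancestor(path: Path, ancestor: Path) -> bool:
    """True when `ancestor` is a prefix of `path`, i.e. the entity sits under it."""
    return path[: len(ancestor)] == ancestor


def ancestor_query(candidates: List[Path], ancestor: Path, kind: Optional[str] = None) -> List[Path]:
    """Return the paths under `ancestor`, optionally restricted to one kind."""
    return [
        p
        for p in candidates
        if has_ancestor(p, ancestor) and (kind is None or p[-1][0] == kind)
    ]


docs_site: Path = (("Site", "docs"),)
print(len(ancestor_query(paths, docs_site)))  # prints: 2
print(ancestor_query(paths, docs_site, kind="Section"))
```

Because every entity in the hierarchy shares the same root element, such a query stays inside one entity group, which is what allows it to be served consistently.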
Example GQL query:
SELECT * FROM Task WHERE status = 'completed' AND priority = 'high' ORDER BY created DESC
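This query retrieves all Task entities with a status of "completed" and a priority of "high", ordered by the created timestamp in descending order. Because it combines equality filters with a sort order on a different property, it needs a composite index on status, priority, and created (descending).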
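A composite index helps because all matching entries sit in one contiguous, already sorted range. This sketch builds a hypothetical in-memory composite index for the example query (status ascending, priority ascending, created descending) and answers the query with a single binary search followed by a scan; the sample Task data is invented for illustration. In an index.yaml file, such an index would be declared for kind Task with the properties status, priority, and created (direction desc).

```python
import bisect
from datetime import datetime

EPOCH = datetime(1970, 1, 1)

# Hypothetical Task entities: (key, status, priority, created).
tasks = [
    ("t1", "completed", "high", datetime(2024, 3, 1)),
    ("t2", "open", "high", datetime(2024, 3, 2)),
    ("t3", "completed", "high", datetime(2024, 3, 5)),
    ("t4", "completed", "low", datetime(2024, 3, 4)),
]

# Composite index on (status ASC, priority ASC, created DESC). The descending
# column is stored negated so that a single ascending sort orders all entries.
composite_index = sorted(
    (status, priority, -(created - EPOCH).total_seconds(), key)
    for key, status, priority, created in tasks
)


def completed_high_newest_first(index):
    """Answer the example query with one binary search and one contiguous scan."""
    prefix = ("completed", "high")
    start = bisect.bisect_left(index, prefix)  # first entry whose leading columns are >= prefix
    keys = []
    for status, priority, _neg_created, key in index[start:]:
        if (status, priority) != prefix:
            break  # past the block of matching entries
        keys.append(key)
    return keys


print(completed_high_newest_first(composite_index))  # prints: ['t3', 't1']
```

Without such an index, the matching entities would have to be collected from separate single-property indexes and then sorted, which is why Datastore requires the composite index for this query shape.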
Although GQL offers an easy-to-use interface for querying Google Cloud Datastore, more complex data access such as joins has to be implemented in application code with Datastore's native APIs, such as the Google Cloud Datastore API, which give developers greater flexibility and control. For example, GQL cannot express a join that retrieves a customer together with that customer's orders in a single query.
Instead, such a join can be implemented in application code as a two-step process: first, use the Datastore API to fetch the customer by ID; then, use that customer ID in a second query that retrieves the related orders, as in the following code.
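from google.cloud import datastore

client = datastore.Client()
customer_id = 12345  # hypothetical ID; the kind and property names below are illustrative too

# Step 1: look up the customer directly by key (client.get returns None if it does not exist).
customer = client.get(client.key("Customer", customer_id))

# Step 2: query the Order entities that store this customer's ID.
query = client.query(kind="Order")
query.add_filter("customer_id", "=", customer_id)
orders = list(query.fetch()) if customer is not None else []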
Best practices for developers
Indexing strategy
By default, Cloud Datastore automatically indexes every property of each entity to support fast queries. However, complex queries with multiple filters or sort orders require composite indexes, which must be defined manually in an index.yaml file. Developers should review the queries their applications issue and manage indexes carefully, because unnecessary indexes increase write latency and storage costs.
Query limitations
Datastore does not support joins, subqueries, or the general-purpose aggregation found in relational databases such as Microsoft SQL Server and MySQL. Because of this, application design often relies on denormalization, that is, storing related data together within a single entity, and on entity groups to maintain hierarchical relationships. Query filters must be served by existing indexes, and certain combinations of inequality filters and sort orders may require custom composite indexes.
Transactions and consistency
Cloud Datastore supports ACID transactions on entities within a single entity group, which enables safe updates to related data such as parent and child entities. Single-entity lookups and ancestor queries are strongly consistent, whereas general queries across multiple entity groups are eventually consistent.
Language and API support
Google Cloud Datastore offers a RESTful interface and a gRPC API, which are useful for developers who design distributed applications. These APIs give direct access to Datastore features such as transactions, queries, and entity management. The client libraries are built on top of these APIs and hide their complexity, making Datastore easier to use from languages such as Python, Java, Go, Node.js, and C# by taking care of connection management, retries, and serialization. Datastore's API is designed for high-throughput, low-latency access and supports batch operations, ancestor queries, and strongly consistent reads within entity groups. With these features, developers can build scalable applications without managing complex database infrastructure themselves.
Query execution
When a user sends a request to Google Cloud Datastore, such as a write or a read, Datastore follows a predefined sequence of steps to perform the task while maintaining fast performance, proper partitioning, and consistency.
Write commands (put, update, delete)
Unlike traditional relational systems, which parse SQL queries and optimize execution plans, GCP Datastore uses a NoSQL, document-style architecture. Each entity is uniquely identified by a key composed of a kind, an identifier, and optionally a parent key.
The first step in a write operation is to authenticate the caller and check their IAM permissions. Once authorization succeeds, Datastore uses key-based partitioning to route the request to the correct entity group and then to a specific Datastore node; the entity's key determines which partition handles the request.
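Key-based routing can be sketched as follows. Because the entity group is identified by the root element of a key's path, every key in the same group maps to the same partition. The hash-based scheme, `NUM_PARTITIONS`, and the helper names are assumptions chosen for illustration; Datastore's actual data placement is internal and not documented in these terms.

```python
import hashlib
from typing import Tuple, Union

NUM_PARTITIONS = 8  # hypothetical number of partitions

PathElement = Tuple[str, Union[int, str]]


def entity_group(path: Tuple[PathElement, ...]) -> PathElement:
    """The entity group is identified by the root (first) element of the key path."""
    return path[0]


def partition_for(path: Tuple[PathElement, ...], num_partitions: int = NUM_PARTITIONS) -> int:
    """Route a key to a partition by hashing its entity-group root (simplified scheme)."""
    kind, identifier = entity_group(path)
    digest = hashlib.sha256(f"{kind}:{identifier}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


customer_key = (("Customer", "alice"),)
order_key = (("Customer", "alice"), ("Order", 1001))
# Both keys share the root ("Customer", "alice"), so they land on the same partition.
print(partition_for(customer_key) == partition_for(order_key))  # prints: True
```

A stable hash such as SHA-256 is used instead of Python's built-in `hash`, because string hashing is randomized per interpreter run and would route the same key differently across processes.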
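If the write targets a single entity group, Datastore offers strong consistency, because writes to that group are serialized and applied in order. When a write involves multiple entity groups, queries across those groups may reflect it only eventually, and the operation must be handled with a cross-group transaction (limited to 25 entity groups in the original Cloud Datastore) if the changes have to succeed or fail together, or with batched writes otherwise.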
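Serialized writes to one entity group can be illustrated with optimistic concurrency control: a transaction records the group's version when it reads, and its commit is rejected if another commit changed that version in the meantime. This is a minimal in-memory sketch of that technique, not a description of Datastore's internals; `EntityGroupStore`, `ConflictError`, and `run_in_transaction` are hypothetical names, and the real Python client instead exposes transactions through `with client.transaction():`.

```python
from typing import Callable


class ConflictError(Exception):
    """Raised when another commit changed the entity group after it was read."""


class EntityGroupStore:
    """Hypothetical store that keeps one version number per entity group."""

    def __init__(self) -> None:
        self.data = {}      # key path -> properties
        self.versions = {}  # root element (entity group) -> committed version

    def read(self, group):
        """Return the group's current version and a copy of its entities."""
        version = self.versions.get(group, 0)
        snapshot = {k: dict(v) for k, v in self.data.items() if k[0] == group}
        return version, snapshot

    def commit(self, group, read_version, writes):
        """Apply writes only if nobody committed to the group since it was read."""
        if self.versions.get(group, 0) != read_version:
            raise ConflictError(group)
        self.data.update(writes)
        self.versions[group] = read_version + 1


def run_in_transaction(store, group, fn: Callable, max_attempts: int = 3):
    """Read, compute writes with fn, and commit; start over if a conflict occurs."""
    for _ in range(max_attempts):
        version, snapshot = store.read(group)
        writes = fn(snapshot)
        try:
            store.commit(group, version, writes)
            return writes
        except ConflictError:
            continue  # another writer won; retry with fresh data
    raise ConflictError(group)


# Hypothetical entity group: two wallets under one Account root.
root = ("Account", "alice")
eur = (root, ("Wallet", "eur"))
usd = (root, ("Wallet", "usd"))
store = EntityGroupStore()
store.data[eur] = {"balance": 100}
store.data[usd] = {"balance": 0}


def transfer_30(snapshot):
    return {
        eur: {"balance": snapshot[eur]["balance"] - 30},
        usd: {"balance": snapshot[usd]["balance"] + 30},
    }


run_in_transaction(store, root, transfer_30)
print(store.data[eur], store.data[usd])  # prints: {'balance': 70} {'balance': 30}
```

Both wallet updates are committed together under one version check, so a concurrent writer to the same group either sees both changes or is forced to retry.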
Datastore relies on Google Cloud's Spanner infrastructure for high data durability and availability. Each write is replicated across multiple zones of the selected region (or across regions for a multi-region location). This replication is abstracted away, so developers do not need to understand the internals of the database engine when issuing writes. Each write operation also automatically updates all relevant single-property and composite indexes.
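Automatic index maintenance on write can be sketched as removing an entity's old index entries and adding entries for its new values. The `IndexedStore` class below is a hypothetical, simplified model that keeps only single-property entries in hash sets and assumes hashable property values; as in Datastore, array values produce one index entry per element.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Set, Tuple


class IndexedStore:
    """Hypothetical store that keeps single-property index entries in sync with writes."""

    def __init__(self) -> None:
        self.entities: Dict[str, Dict[str, Any]] = {}
        self.index: Dict[Tuple[str, Any], Set[str]] = defaultdict(set)

    @staticmethod
    def _entries(props: Dict[str, Any]) -> Iterator[Tuple[str, Any]]:
        # One (property, value) entry per value; arrays contribute one entry per element.
        for name, value in props.items():
            for v in (value if isinstance(value, list) else [value]):
                yield (name, v)

    def _unindex(self, key: str) -> None:
        old = self.entities.get(key)
        if old is not None:
            for entry in self._entries(old):
                self.index[entry].discard(key)

    def put(self, key: str, props: Dict[str, Any]) -> None:
        self._unindex(key)              # drop entries for the previous version
        self.entities[key] = props
        for entry in self._entries(props):
            self.index[entry].add(key)  # add entries for the new values

    def delete(self, key: str) -> None:
        self._unindex(key)
        self.entities.pop(key, None)

    def keys_where(self, name: str, value: Any) -> List[str]:
        return sorted(self.index.get((name, value), set()))


store = IndexedStore()
store.put("t1", {"status": "open", "tags": ["a", "b"]})
store.put("t1", {"status": "done", "tags": ["b"]})
print(store.keys_where("status", "open"), store.keys_where("status", "done"))  # prints: [] ['t1']
```

Every indexed value costs extra work on each write, which is why excluding rarely queried properties from indexes reduces write latency and storage.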
Read commands (get, query)
Read commands, namely lookups with get and structured queries, work according to how the data is stored and indexed. For an entity lookup, a get call uses the entity key to go directly to the appropriate partition and retrieve the entity; such lookups by key are strongly consistent.
When a query is executed, any filtering or sorting it involves is translated into an index lookup instead of a full table scan. The query engine uses the relevant indexes to identify the keys of entities that match the query criteria and then retrieves the full entities.
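The difference between an index lookup and a full table scan shows up in how many entries must be examined. This sketch assumes a hypothetical kind of 1,000 entities with an integer `priority` property and a built-in ascending index modeled as a sorted list of (value, key) pairs; it answers the same range query both ways, including the second phase that fetches entities by key, and reports how many entries each approach examined.

```python
import bisect

# Hypothetical kind with 1,000 entities, each having an integer "priority" from 0 to 9.
entities = {f"task-{i}": {"priority": i % 10} for i in range(1000)}

# Built-in ascending index on "priority", modeled as sorted (value, key) pairs.
priority_index = sorted((props["priority"], key) for key, props in entities.items())


def full_scan(min_p, max_p):
    """Examine every entity of the kind and keep the ones in range."""
    matches = [k for k, p in entities.items() if min_p <= p["priority"] <= max_p]
    return sorted(matches), len(entities)


def index_scan(min_p, max_p):
    """Binary-search the index, read only the matching range, then fetch by key."""
    lo = bisect.bisect_left(priority_index, (min_p,))
    hi = bisect.bisect_left(priority_index, (max_p + 1,))  # assumes integer values
    keys = [key for _, key in priority_index[lo:hi]]
    fetched = {key: entities[key] for key in keys}  # second phase: lookups by key
    return sorted(fetched), hi - lo


scan_keys, scanned = full_scan(8, 9)
index_keys, examined = index_scan(8, 9)
print(scan_keys == index_keys, scanned, examined)  # prints: True 1000 200
```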
GCP Datastore has two consistency modes for queries. Strong consistency applies to ancestor queries and direct key lookups within a single entity group. Eventual consistency applies to most queries across unrelated entities or entity groups, where the system trades consistency for higher availability. Firestore in Datastore mode removes this distinction and makes all queries strongly consistent.
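The eventual consistency described above can be modeled by letting entity writes take effect immediately while index updates are applied later. In this hypothetical `EventuallyConsistentStore`, `get` reads the entity directly, `query` reads only the index, and `apply_pending` stands in for background index propagation; it is a simplified illustration, not how Datastore replicates data internally.

```python
from typing import Any, Dict, List, Set, Tuple


class EventuallyConsistentStore:
    """Hypothetical model: entity writes are visible at once, index updates lag behind."""

    def __init__(self) -> None:
        self.entities: Dict[str, Dict[str, Any]] = {}
        self.index: Dict[Tuple[str, Any], Set[str]] = {}
        self.pending: List[Tuple[str, Dict[str, Any], Dict[str, Any]]] = []

    def put(self, key: str, props: Dict[str, Any]) -> None:
        old = self.entities.get(key, {})
        self.entities[key] = props
        self.pending.append((key, old, props))  # index update is queued, not applied

    def get(self, key: str):
        # A lookup by key reads the entity itself, so it always sees the latest write.
        return self.entities.get(key)

    def query(self, name: str, value: Any) -> List[str]:
        # A non-ancestor query reads only the index, which may not reflect recent writes.
        return sorted(self.index.get((name, value), set()))

    def apply_pending(self) -> None:
        # Stands in for background propagation of index updates.
        for key, old, new in self.pending:
            for name, value in old.items():
                self.index.get((name, value), set()).discard(key)
            for name, value in new.items():
                self.index.setdefault((name, value), set()).add(key)
        self.pending.clear()


store = EventuallyConsistentStore()
store.put("t1", {"status": "done"})
print(store.get("t1"), store.query("status", "done"))  # prints: {'status': 'done'} []
store.apply_pending()
print(store.query("status", "done"))  # prints: ['t1']
```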
The query planner decides how to execute a given query. Because Datastore does not support joins like traditional relational databases, developers have to run multiple separate queries and combine the results themselves, a technique known as "application-side joins." The query planner also checks whether a query can be served by existing indexes. If no suitable index is found, the query fails unless the developer has defined the required composite index in a separate index.yaml file.
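The index check performed by the query planner can be approximated as follows. This is a simplified sketch under stated assumptions: queries use only equality filters and sort orders (inequality filters are left out), a query over a single property or with equality filters and no sort order is treated as servable by built-in indexes, and composite indexes are given as tuples of (property, direction) pairs such as those declared in index.yaml. `plan_query` and `NeedIndexError` are hypothetical names.

```python
from typing import Iterable, Optional, Sequence, Tuple

SortOrder = Tuple[str, str]             # (property, "asc" or "desc")
CompositeIndex = Tuple[SortOrder, ...]


class NeedIndexError(Exception):
    """Raised when neither built-in nor composite indexes can serve a query."""


def plan_query(
    equality_props: Iterable[str],
    sort_orders: Sequence[SortOrder],
    composite_indexes: Iterable[CompositeIndex],
) -> Tuple[str, Optional[CompositeIndex]]:
    """Pick an index for a query under simplified rules (inequality filters omitted)."""
    eq = set(equality_props)
    props = eq | {name for name, _ in sort_orders}
    # A single-property query, or equality filters without sort orders,
    # can be served by the automatic single-property indexes.
    if len(props) <= 1 or not sort_orders:
        return ("built-in", None)
    for index in composite_indexes:
        head, tail = index[: len(eq)], list(index[len(eq):])
        # Equality properties may appear in any order; sort orders must match exactly.
        if {name for name, _ in head} == eq and tail == list(sort_orders):
            return ("composite", index)
    raise NeedIndexError(f"no index for equality={sorted(eq)} sort={list(sort_orders)}")


task_index = (("status", "asc"), ("priority", "asc"), ("created", "desc"))
print(plan_query(["status", "priority"], [("created", "desc")], [task_index])[0])  # prints: composite
```

Failing fast with an explicit error, rather than falling back to a scan, mirrors the behavior described above: a query without a matching index is rejected until the index is defined.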