Jaql
Jaql is a functional data processing and query language most commonly used for JSON query processing on big data.
It started as an open source project at Google but the latest release was on 2010-07-12. IBM took it over as primary data processing language for their Hadoop software package .
Although having been developed for JSON it supports a variety of other data sources like CSV, TSV, XML.
A comparison to other BigData query languages like PIG Latin and Hive QL illustrates performance and usability aspects of these technologies.
Jaql supports lazy evaluation, so expressions are only materialized when needed.
Syntax
The basic concept of Jaql issource -> operator -> sink ;
where a sink can be a source for a downstream operator. So typically a Jaql program has to following structure, expressing a data processing graph:
source -> operator1 -> operator2 -> operator2 -> operator3 -> operator4 -> sink ;
Most commonly for readability reasons Jaql programs are linebreaked after the arrow, as is also a common idiom in Twitter :
source -> operator1
-> operator2
-> operator2
-> operator3
-> operator4
-> sink ;
Core operators
Source:Expand
Use the EXPAND expression to flatten nested arrays. This expression takes as input an array of nested arrays and produces an output array, by promoting the elements of each nested array to the top-level output array.Filter
Use the FILTER operator to filter away elements from the specified input array. This operator takes as input an array of elements of type T and outputs an array of the same type, retaining those elements for which a predicate evaluates to true. It is the Jaql equivalent of the SQL WHERE clause.Example:
data = ;
data -> filter $.manager;
data -> filter $.income < 30000;
,