Gremlin (query language)


Gremlin is a graph traversal language and virtual machine developed by Apache TinkerPop of the Apache Software Foundation. Gremlin works with both OLTP-based graph databases and OLAP-based graph processors. Gremlin's automata and functional language foundation enables it to naturally support imperative and declarative querying, host language agnosticism, user-defined domain-specific languages, an extensible compiler/optimizer, single- and multi-machine execution models, and hybrid depth- and breadth-first evaluation with Turing completeness.
As an explanatory analogy, Apache TinkerPop and Gremlin are to graph databases what JDBC and SQL are to relational databases. Likewise, the Gremlin traversal machine is to graph computing what the Java virtual machine is to general-purpose computing.

History

Vendor integration

Gremlin is a graph traversal language licensed under the Apache License 2.0 that can be used by graph system vendors. These vendors typically fall into two types: OLTP graph databases and OLAP graph processors. The table below lists the graph system vendors that support Gremlin.
Vendor                 Graph system
Neo4j                  graph database
OrientDB               graph database
DataStax Enterprise    graph database
Hadoop                 graph processor
InfiniteGraph          graph database
JanusGraph             graph database
Cosmos DB              graph database
Amazon Neptune         graph database
ArcadeDB               graph database

Traversal examples

The following examples of Gremlin queries and responses in a Gremlin-Groovy environment are relative to a graph representation of the MovieLens dataset. The dataset includes users who rate movies. Each user has one occupation, and each movie has one or more categories associated with it. The MovieLens graph schema is detailed below.

user--rated-->movie
user--occupation-->occupation
movie--category-->category
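
To make the schema concrete, the following is a minimal sketch in Python (not Gremlin) of a toy property graph that uses the same three edge labels. The vertex identifiers, the edge list, and the helper `edge_signature` are hypothetical and are not taken from MovieLens.

```python
from collections import Counter

# Hypothetical toy graph: identifiers and counts are illustrative only.
# Each vertex id maps to its label.
vertices = {
    "u1": "user",
    "u2": "user",
    "m1": "movie",
    "c1": "category",
    "o1": "occupation",
}

# Each edge is (source vertex, edge label, target vertex).
edges = [
    ("u1", "rated", "m1"),
    ("u2", "rated", "m1"),
    ("u1", "occupation", "o1"),
    ("u2", "occupation", "o1"),
    ("m1", "category", "c1"),
]


def edge_signature(source, label, target):
    """Render an edge in the same arrow notation as the schema lines."""
    return f"{vertices[source]}--{label}-->{vertices[target]}"


# Count how many edges instantiate each schema line.
schema_counts = Counter(edge_signature(*edge) for edge in edges)

for signature, count in sorted(schema_counts.items()):
    print(signature, count)
```

Every edge in the toy data falls under exactly one of the three schema lines, which is the constraint the schema expresses.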

Simple traversals


gremlin> g.V().label().groupCount()
==>[occupation:21, movie:3883, category:18, user:6040]



gremlin> g.V().hasLabel(…).values(…).min()
==>1919



gremlin> g.V().has(…).inE(…).values(…).mean()
==>4.121848739495798
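
The three simple traversals reduce a stream of values to a single result: a count per vertex label, a minimum over a property, and a mean over edge values. As a rough illustration of those reductions, here is a Python sketch over a toy in-memory graph. It is not Gremlin, and the property names `name`, `year`, and `stars` as well as all data values are assumptions made for the example.

```python
from collections import Counter
from statistics import mean

# Hypothetical vertices: id -> properties (including the label).
vertices = {
    1: {"label": "movie", "name": "Movie A", "year": 1990},
    2: {"label": "movie", "name": "Movie B", "year": 1985},
    3: {"label": "user"},
    4: {"label": "user"},
}

# Hypothetical "rated" edges: (user id, movie id, stars).
rated = [
    (3, 1, 5),
    (4, 1, 4),
    (3, 2, 3),
]

# Group count: how many vertices carry each label.
label_counts = Counter(v["label"] for v in vertices.values())

# Minimum: the smallest "year" among movie vertices.
oldest_year = min(v["year"] for v in vertices.values() if v["label"] == "movie")

# Mean: the average "stars" on edges pointing at one named movie.
movie_a_mean = mean(
    stars for _, movie_id, stars in rated
    if vertices[movie_id]["name"] == "Movie A"
)

print(dict(label_counts), oldest_year, movie_a_mean)
```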


Projection traversals


gremlin> g.V().hasLabel(…).as(…).
           select(…).
             by(…).
             by(….count())
==>[a:Adventure, b:283]
==>[a:Action, b:503]
==>[a:Sci-Fi, b:276]
==>[a:Mystery, b:106]
==>[a:Western, b:68]



gremlin> g.V().hasLabel(…).as(…).
           where(….count().is(…)).
           select(…).
             by(…).
             by(….values(…).mean()).
           order().by(…).
           limit(…)
==>[a:Godfather, The, b:4.524966261808367]
==>[a:Wrong Trousers, The, b:4.507936507936508]
==>[a:Raiders of the Lost Ark, b:4.47772]
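
The second projection reads as a filter on a count, followed by a projection into `a` and `b`, a sort, and a limit. A minimal Python sketch of that shape on toy data is given below. It is not Gremlin; the function `top_rated`, its parameters `min_count` and `k`, the choice of a "greater than or equal" threshold, and the ratings themselves are all hypothetical.

```python
from statistics import mean

# Hypothetical ratings: movie name -> list of star values.
ratings = {
    "Movie A": [5, 4, 5],
    "Movie B": [3, 4],
    "Movie C": [5],
    "Movie D": [4, 4, 3],
}


def top_rated(ratings, min_count, k):
    """Return up to k {a: name, b: mean rating} rows, best first."""
    rows = [
        {"a": name, "b": mean(stars)}
        for name, stars in ratings.items()
        if len(stars) >= min_count  # filter on the number of ratings
    ]
    rows.sort(key=lambda row: row["b"], reverse=True)  # order by b, descending
    return rows[:k]  # limit


result = top_rated(ratings, min_count=2, k=2)
for row in result:
    print(row)
```

"Movie C" has the highest single rating but is dropped by the count filter, which is the reason such a filter is applied before ranking by average.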


Declarative pattern matching traversals

Gremlin supports declarative graph pattern matching similar to SPARQL. For instance, the following query uses Gremlin's match-step.

gremlin> g.V().
           match(
             __.as(…).hasLabel(…),
             __.as(…).out(…).has(…),
             __.as(…).has(…),
             __.as(…).inE(…).as(…),
             __.as(…).has(…),
             __.as(…).outV().as(…),
             __.as(…).out(…).has(…),
             __.as(…)).
           select(…).groupCount().by(…).
           order(…).by(…).
           limit(…)
==>Star Wars Episode VI - Return of the Jedi=22
==>Indiana Jones and the Last Crusade=11
==>Abyss, The=9
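
Declarative matching can be pictured as a set of independent patterns over shared variables, where a candidate binding survives only if every pattern holds. The Python sketch below illustrates that idea on toy data and then group-counts the survivors by movie name. It is not Gremlin's match-step and performs no query optimization; the variable names `a` and `c`, the property values such as "Action" and "programmer", and all data are assumptions made for the example.

```python
from collections import Counter

# Hypothetical vertices and their properties.
movies = {
    "m1": {"name": "Movie A", "category": "Action"},
    "m2": {"name": "Movie B", "category": "Drama"},
}
users = {
    "u1": {"occupation": "programmer"},
    "u2": {"occupation": "artist"},
    "u3": {"occupation": "programmer"},
}

# Hypothetical "rated" edges: (user id, movie id, stars).
rated = [
    ("u1", "m1", 5),
    ("u2", "m1", 5),
    ("u3", "m1", 5),
    ("u1", "m2", 5),
    ("u3", "m1", 3),
]

# Each pattern is a predicate over one candidate binding.
patterns = [
    lambda b: movies[b["a"]]["category"] == "Action",
    lambda b: b["stars"] == 5,
    lambda b: users[b["c"]]["occupation"] == "programmer",
]

# Candidate bindings: a = movie, c = user, plus the edge's stars.
bindings = [{"a": m, "c": u, "stars": s} for u, m, s in rated]

# A binding matches only if all patterns hold (order does not matter).
matches = [b for b in bindings if all(p(b) for p in patterns)]

# Group count the matched movies by name, most frequent first.
counts = Counter(movies[b["a"]]["name"] for b in matches)
print(counts.most_common())
```

Because the patterns are a conjunction, they can be listed in any order without changing the result, which is what leaves an engine free to choose its own evaluation order.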


OLAP traversal


gremlin> g = graph.traversal(…)
==>graphtraversalsource[hadoopgraph[gryoinputformat->gryooutputformat], sparkgraphcomputer]

gremlin> g.V().repeat(….has(…).inV().
                 groupCount(…).by(…).
                 inE(…).has(…).outV()).
               times(…).cap(…)
==>Star Wars Episode IV - A New Hope
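
The OLAP traversal alternates between users and movies inside a repeat step and accumulates a count per movie as a side effect, which is read out at the end. A minimal Python sketch of that looping pattern on a toy edge list follows. It is not Gremlin and does not run on a graph computer, it omits the edge filter (whose arguments are not shown above), and the identifiers `central_movies`, `start_users`, and `loops` are hypothetical.

```python
from collections import Counter

# Hypothetical "rated" edges: (user id, movie id).
rated = [
    ("u1", "m1"),
    ("u2", "m1"),
    ("u2", "m2"),
]


def central_movies(rated, start_users, loops):
    """Walk user -> movie -> user for a fixed number of loops,
    counting every movie visit along the way."""
    counts = Counter()
    traversers = list(start_users)
    for _ in range(loops):
        next_traversers = []
        for user in traversers:
            for rater, movie in rated:
                if rater != user:
                    continue
                counts[movie] += 1  # side-effect count per movie
                # Step back to every user who rated this movie.
                next_traversers.extend(u for u, m in rated if m == movie)
        traversers = next_traversers
    return counts


counts = central_movies(rated, start_users=["u1", "u2"], loops=2)
print(counts.most_common())
```

Movies reachable along many user-movie-user paths accumulate higher counts, so the count acts as a simple centrality score over the implicit co-rating graph.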