Cross shard Relationships: Xlinks (OLAP)
With Doradus OLAP, links can only reference objects in the same shard. A link such as Person.Manager cannot reference an object in another shard. (In fact, setting Person.Manager to a given object ID implicitly creates the inverse object in the same shard if it does not already exist.) This means that some objects may need to be duplicated in multiple shards so that each shard is a complete graph, allowing queries to work efficiently. Because OLAP stores data compactly, the duplication is worthwhile since same-shard link path evaluation is extremely fast.
However, sometimes object duplication is not practical, and relationships must span shards. For example, suppose we want to track messages in the same conversation thread. Since replies and forwards can be sent at any future date, messages in the same thread may reside in any shard. We could add a scalar field ThreadID that identifies messages in the same thread and query for a given value across shards. For some scenarios, this may be sufficient.
But in some cases, we may want to treat the object relationships in a way that allows us to use link paths. For these cases, Doradus OLAP supports a cross-shard field type called an xlink.
Xlinks are similar to regular links: a pair of xlinks is defined in the schema as inverses of each other, forming a bi-directional relationship. However, xlinks are not explicitly assigned: relationships are _implicitly_ formed via foreign keys called _junction_ fields. In its definition, each xlink identifies its junction field, which is a text field whose values point to related objects. Those objects may reside in the same shard, in other shards, or both. An example for connecting objects in a message thread is shown below:
    <table name="Message">
        <fields>
            <field name="ThreadID" type="Text"/>
            <field name="InReplyTo" type="XLINK" table="Message" inverse="Responses"
                   junction="ThreadID"/>
            <field name="Responses" type="XLINK" table="Message" inverse="InReplyTo"
                   junction="_ID"/>
            ...
        </fields>
    </table>
Here is how the xlinks InReplyTo and Responses work:
- The _ID of a _root_ message that begins a new conversation thread is used as the thread ID.
- When a new message is created that is **not** part of another thread, we set its ThreadID to its own _ID. That is, every root message is initially the only member of its own thread.
- When other messages are created (replies or forwards) in the same message thread, we set their ThreadID to the root message’s _ID, even if the root message resides in another shard.
- We can then traverse Message.Responses to navigate from the root message to other messages in the same thread. To do this, Doradus takes the root message’s _ID (because it is the junction field for Responses) and searches for messages in other shards whose ThreadID matches (because it is the junction field for the inverse link, InReplyTo).
- Similarly, we can traverse Message.InReplyTo to navigate from any message back to the root message. In this case, Doradus takes the message’s ThreadID and searches for another message with a matching _ID.
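The matching rule described above can be sketched in a few lines of Python. This is only an illustration, not how Doradus is implemented: the `Message` class, the `SHARDS` dictionary, and the helper names are hypothetical, and shards are modeled as plain in-memory lists.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    id: str         # stands in for the _ID field
    thread_id: str  # stands in for the ThreadID junction field


# Hypothetical stand-in for shards: shard name -> messages stored in it.
SHARDS = {
    "2014-01-01": [Message("root1", "root1")],
    "2014-01-02": [Message("reply1", "root1"), Message("root2", "root2")],
    "2014-01-03": [Message("reply2", "root1")],
}


def responses(message, shards):
    """Follow Responses: messages whose ThreadID matches this message's _ID."""
    return [m for msgs in shards.values() for m in msgs
            if m.thread_id == message.id]


def in_reply_to(message, shards):
    """Follow InReplyTo: messages whose _ID matches this message's ThreadID."""
    return [m for msgs in shards.values() for m in msgs
            if m.id == message.thread_id]


root = SHARDS["2014-01-01"][0]
thread = responses(root, SHARDS)                   # spans three shards
parent = in_reply_to(SHARDS["2014-01-03"][0], SHARDS)
```

In this sketch the root message also matches itself under `responses`, because its ThreadID equals its own _ID.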
One consideration used in this example is shard merging. In an OLAP database that uses time-oriented shards, we generally want to add data to **new** shards, which are then merged. We don’t want to modify data in older shards if possible because this requires extra merging. In the example above, message threads are formed by simply setting the ThreadID of newer messages. Older messages in the thread, including the root message, are never modified, hence we don’t need to merge older shards.
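The point about merging can be illustrated with a small sketch. The `ShardStore` class below is hypothetical: it tracks which shards have been written since the last merge, so we can see that adding a reply touches only the new shard.

```python
class ShardStore:
    """Hypothetical in-memory stand-in for a set of time-oriented shards."""

    def __init__(self):
        self.shards = {}      # shard name -> list of message dicts
        self.touched = set()  # shards written since the last merge

    def add_message(self, shard, message_id, thread_id=None):
        # A root message uses its own _ID as its ThreadID.
        thread_id = message_id if thread_id is None else thread_id
        self.shards.setdefault(shard, []).append(
            {"_ID": message_id, "ThreadID": thread_id})
        self.touched.add(shard)

    def merge(self):
        """Pretend to merge every touched shard and return their names."""
        merged, self.touched = sorted(self.touched), set()
        return merged


store = ShardStore()
store.add_message("2014-01-01", "root1")   # root message in an old shard
store.merge()

# A later reply joins the thread by setting only its own ThreadID.
store.add_message("2014-02-01", "reply1", thread_id="root1")
pending = store.merge()                    # only the new shard needs merging
```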
Although xlinks are similar to regular links, there are differences in how they are declared and used:
- The inverse of an xlink must also be an xlink. In the example above, Responses and InReplyTo are inverses. Although these xlinks both belong to Message, in general xlinks can relate objects between any tables.
- Each xlink identifies a _junction_ field, which must be a text field belonging to the same table or the _ID field. The junction field is a foreign key to related objects. In a given relationship, at least one xlink must use the _ID field as its junction field. If the junction field is not explicitly defined, it defaults to the _ID field.
- One xlink can use a text field as its junction field. This is the normal practice for most use cases. In the example above:
  - The xlink InReplyTo defines ThreadID as its junction field. This means an object is related via InReplyTo to the message(s) whose _ID matches its ThreadID.
  - The xlink Responses uses _ID as its junction field. This means an object is related via Responses to the message(s) whose ThreadID matches its _ID.
- If both xlinks in a relationship use _ID as their junction field, each object is related to objects with the same object ID. This is allowed even if the xlinks are defined in different tables.
- An xlink’s junction field can be a multi-valued (MV) text field, thereby allowing the xlink to refer to multiple objects in each shard.
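The declaration rules above can be expressed as a small checker. This is a sketch under assumptions: the schema is represented as a nested Python dictionary that mirrors the XML attributes, and `junction_of` and `check_xlink` are hypothetical helpers, not part of Doradus.

```python
# Hypothetical dictionary form of the Message table from the example schema.
SCHEMA = {
    "Message": {
        "ThreadID": {"type": "Text"},
        "InReplyTo": {"type": "XLINK", "table": "Message",
                      "inverse": "Responses", "junction": "ThreadID"},
        "Responses": {"type": "XLINK", "table": "Message",
                      "inverse": "InReplyTo", "junction": "_ID"},
    }
}


def junction_of(xlink):
    """The junction field defaults to _ID when it is not declared."""
    return xlink.get("junction", "_ID")


def check_xlink(schema, table, field):
    """Return a list of rule violations for one xlink and its inverse."""
    errors = []
    xlink = schema[table][field]
    inverse = schema.get(xlink["table"], {}).get(xlink["inverse"])
    if inverse is None or inverse.get("type") != "XLINK":
        return ["the inverse of an xlink must also be an xlink"]
    # Each junction must be _ID or a text field of the xlink's own table.
    for owner, link in ((table, xlink), (xlink["table"], inverse)):
        junction = junction_of(link)
        if junction != "_ID" and schema[owner].get(junction, {}).get("type") != "Text":
            errors.append(f"junction {junction!r} must be _ID or a text field of {owner}")
    if "_ID" not in (junction_of(xlink), junction_of(inverse)):
        errors.append("at least one xlink must use _ID as its junction field")
    return errors


problems = check_xlink(SCHEMA, "Message", "InReplyTo")  # no violations expected
```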
Xlinks form _soft_ relationships, hence no referential integrity is assured. When a junction field is assigned a value, there may or may not exist any foreign objects with a matching value. Likewise, if two objects are related, the relationship may be broken by altering the junction field value, deleting one of the objects, or shard aging. Traversing an xlink whose junction field doesn’t match any foreign objects acts as if the xlink is null.
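A minimal sketch of what a soft relationship means in practice, assuming messages are plain dictionaries and `follow_in_reply_to` is a hypothetical helper: a traversal with no matching object simply yields nothing, and deleting the target silently breaks an existing relationship.

```python
def follow_in_reply_to(message, messages):
    """Return the messages whose _ID equals this message's ThreadID."""
    return [m for m in messages if m["_ID"] == message["ThreadID"]]


root = {"_ID": "root1", "ThreadID": "root1"}
reply = {"_ID": "reply1", "ThreadID": "root1"}
orphan = {"_ID": "reply9", "ThreadID": "missing-root"}  # no such root exists
messages = [root, reply, orphan]

linked = follow_in_reply_to(reply, messages)     # finds the root message
dangling = follow_in_reply_to(orphan, messages)  # empty: acts like a null link

# Deleting the root (or aging out its shard) breaks the relationship;
# nothing stops the deletion and no error is raised on traversal.
messages.remove(root)
broken = follow_in_reply_to(reply, messages)
```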
In aggregate queries, xlinks can be used anywhere regular links are used: query expressions, aggregate grouping expressions, and metric expressions. Doradus OLAP searches the shards defined by the shards or range parameter for _perspective_ objects, and it searches shards defined by the xshards or xrange parameter for objects related via xlinks. For example:
    GET /Email/Message/_aggregate?m=COUNT(*)&q=_ID=XYZ&shards=2014-01-01&xrange=2014-01-01&f=Responses.Sender.Person.Department&range=0
This query counts the messages in the thread rooted at the message with _ID=XYZ, grouped by the Sender.Person.Department of each response. Only shard 2014-01-01 is searched for the root message; all shards named 2014-01-01 or greater are searched for objects related via the xlink Responses. See the section [Doradus Query Language (OLAP)](https://github.com/dell-oss/Doradus/wiki/Doradus Query Language (OLAP)) for more details on query parameters.
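The shard selection described for this query can be sketched as follows. The function names are hypothetical, and the sketch assumes that "or greater" means an ordinary string comparison of shard names; only the single-value form of the range is modeled.

```python
def shards_at_or_after(shard_names, start):
    """Shards whose name compares greater than or equal to start."""
    return sorted(name for name in shard_names if name >= start)


def plan_search(shard_names, shards, xrange_start):
    """Split a query into perspective shards and xlink-related shards."""
    perspective = [name for name in shards if name in shard_names]
    related = shards_at_or_after(shard_names, xrange_start)
    return perspective, related


# Hypothetical set of shards present in the application.
available = {"2013-12-01", "2014-01-01", "2014-02-01", "2014-03-01"}

# shards=2014-01-01 selects where the root message is looked up;
# xrange=2014-01-01 selects where objects related via Responses are looked up.
perspective, related = plan_search(available, ["2014-01-01"], "2014-01-01")
```

Here `perspective` holds the single shard searched for the root message, while `related` holds every shard from 2014-01-01 onward.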
Using xlinks in queries is slower than regular links. Consequently, they should be used only in those cases where normal links are not feasible.