Introduction to the Constellation Query Language
(Copyright 2008 Clifford Heath)
The Constellation Query Language (CQL) is a language for constructing and querying fact-based information models. It is based on the graphical Object Role Modeling language, with some terminology adopted from the Semantics of Business Vocabulary and Business Rules.
Because CQL has an open vocabulary, and because it is designed to resemble natural language, the formal grammar has potential ambiguities. Most of these ambiguities can be resolved automatically, and the resolutions will become obvious as the syntax rules unfold. As described here, CQL uses English keywords and expressions, though variants of CQL are being defined that substitute keywords and expressions from other languages. Some of these keywords may not be used as open vocabulary, because they would create unresolvable ambiguities. These include all the logical operators, such as and, or, maybe, not, none, no, only if, etc. Other keywords, such as is, identified, kind, of, as, at, etc., may be used anywhere open vocabulary is allowed; their special meaning applies only in the specific places noted.
CQL is case sensitive: "Person" is not the same thing as "person". It's conventional, though not required, to use a capital letter for all object type names.
White space and comments are allowed as in C and C++: a comment of the form /* ... */ may span lines, and // introduces a comment that runs to the end of the current line.
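To make these lexical rules concrete, here is a small Python sketch (not taken from any CQL implementation) that removes both kinds of comment from CQL text. It assumes that string literals are delimited by straight single quotes with no escape sequences, and that comment markers inside a literal must be preserved; the function name strip_comments is hypothetical.

```python
def strip_comments(source):
    """Remove /* ... */ and // comments, leaving single-quoted literals intact.

    Hypothetical helper: a sketch of C-style comment rules applied to CQL text.
    """
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == "'":
            # Copy a single-quoted literal verbatim (no escapes in this sketch).
            end = source.find("'", i + 1)
            end = n if end == -1 else end + 1
            out.append(source[i:end])
            i = end
        elif source.startswith("/*", i):
            # Block comment: skip to the closing */, possibly on a later line.
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
            out.append(" ")  # keep the tokens on either side apart
        elif source.startswith("//", i):
            # Line comment: skip up to, but not including, the newline.
            end = source.find("\n", i)
            i = n if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)


text = "Name 'a//b'; /* spans\nlines */ Person; // trailing\nDone;"
print(strip_comments(text).split())
```

The literal scan comes first so that a // or /* inside a quoted value is not mistaken for the start of a comment.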
CQL Definitions
A fact-based model (known in CQL as a vocabulary) comprises definitions of the following kinds:
- Object Types designated by name, including entity types and value types.
- Fact Types each designated by one or more readings. Fact types declare a relationship between object types, or a boolean property of a single object type. A fact type may be designated by a name (objectified), which allows its use as an object type in other fact types. A fact type may be derived from a query, analogous to an SQL view.
- Instances of object types and fact types as examples or as reference data.
- Constraints which restrict the allowed object instances and facts within a valid population.
- Units used to automate value conversion.
A CQL file must start with a vocabulary definition, and may import elements from one or more other vocabularies.
- statement:
- definition:
An import definition imports object type names from another vocabulary, possibly using the alias syntax to rename some terms. In addition, fact type readings from the imported vocabulary may subsequently be included in new definitions which provide translations specific to this vocabulary.
- import_def:
An object type represents a type of thing that can be perceived or conceived. Each object type has at least one name (or term), and the phrase object type as used throughout this document implies the use of one of these terms. An object type definition starts with its name, and is one of the following kinds. Names in CQL are case sensitive, and it’s conventional practice to use initial capital letters for object type names (this is required in Object Role Modeling but not in CQL). Doing so is nevertheless a good way to distinguish object type names from the same words in lower case, where these occur in fact type readings.
- object type:
Note that although shown above, a fact type is only an object type if it is named (objectified).
Value Types
A Value Type is a kind of thing which has a single value that may be written down, that is, a lexical type, like a number, a name, a date, etc.
A value type is usually derived from another value type, where top-level value types are defined in an imported vocabulary. A value type may refine its supertype by the use of length and scale parameters where relevant. (The ability to define custom parameters is anticipated in a revision of the language.) A value restriction might also limit the allowable values from those allowed by the supertype; these restrictions are discussed below.
- value type:
- value type details:
Top-level value types are defined by self-reference, e.g.
Integer is written as Integer;
or implicitly by being used as a supertype for another value type. However, top-level types must be known to any underlying mapping layer, such as a procedural language or relational database, in order to be used with that layer.
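As an illustration, the Python sketch below follows a chain of value type definitions up to its top-level type, which is the type an underlying mapping layer would need to recognise. It assumes a hypothetical mapping from each value type name to its declared supertype, where a self-reference marks a top-level type and a name that appears only as a supertype is treated as implicitly top-level. The type names other than Integer are invented for the example.

```python
def top_level_type(name, supertype_of):
    """Follow 'is written as' links until a top-level value type is reached.

    supertype_of: hypothetical dict mapping a value type to its supertype.
    """
    seen = set()
    while True:
        if name in seen:
            raise ValueError("cyclic value type definition involving " + name)
        seen.add(name)
        parent = supertype_of.get(name)
        # A self-reference, or a name used only as a supertype, is top-level.
        if parent is None or parent == name:
            return name
        name = parent


types = {
    "Integer": "Integer",          # Integer is written as Integer;
    "UnsignedInteger": "Integer",  # invented derived types
    "Age": "UnsignedInteger",
    "Name": "String",              # String is never defined: implicitly top-level
}
print(top_level_type("Age", types), top_level_type("Name", types))
```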
A unit definition defines a new unit identifier in terms of an optional coefficient (real number or integer fraction) multiplied by one or more base units, each raised to an integer power. It's common to define the singular form of a unit, then also define the plural as equivalent.
- unit_def:
- unit_derivation:
- unit:
An extensive library of unit definitions is provided, and you can define your own.
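To show how a unit definition of this shape supports automatic conversion, here is a minimal Python sketch. It assumes a unit is stored as an exact coefficient (a Fraction; a real-number coefficient could be given as a decimal string such as Fraction("0.3048")) together with a map from base unit to integer exponent. The unit definitions and the names Unit, base_unit, derive and convert are hypothetical; they are not taken from the CQL unit library.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Unit:
    """A unit as coefficient * product of base units raised to integer powers."""
    coefficient: Fraction
    powers: dict  # base unit name -> integer exponent


def base_unit(name):
    """A fundamental unit: coefficient 1, itself to the power 1."""
    return Unit(Fraction(1), {name: 1})


def derive(coefficient, *factors):
    """Define a unit as coefficient * product of (unit ** power) factors,
    reducing everything to base units."""
    coeff = Fraction(coefficient)
    powers = {}
    for unit, power in factors:
        coeff *= unit.coefficient ** power
        for base, p in unit.powers.items():
            powers[base] = powers.get(base, 0) + p * power
    # Drop base units whose exponents cancel out.
    return Unit(coeff, {b: p for b, p in powers.items() if p != 0})


def convert(value, from_unit, to_unit):
    """Convert a quantity between units that reduce to the same base units."""
    if from_unit.powers != to_unit.powers:
        raise ValueError("units are not commensurable")
    return Fraction(value) * from_unit.coefficient / to_unit.coefficient


# Hypothetical definitions, for illustration only.
metre = base_unit("metre")
second = base_unit("second")
kilometre = derive(1000, (metre, 1))
hour = derive(3600, (second, 1))
kilometres_per_hour = derive(1, (kilometre, 1), (hour, -1))
metres_per_second = derive(1, (metre, 1), (second, -1))

print(convert(36, kilometres_per_hour, metres_per_second))  # 10
```

Reducing every unit to base-unit exponents makes the compatibility check a simple comparison, and exact fractions avoid rounding drift when definitions are chained.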
Entity Types
Each Entity Type plays roles in at least one fact type, and is identified by the combination of one or more such roles. At least one identifying role must be mandatory. CQL uses the closed-world assumption for non-mandatory identifying roles, which means that the same identifier (with the same role missing) may not occur more than once in a population.
- entity type:
- supertypes:
- identification:
- id_fact_types:
- role_ref:
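The closed-world treatment of a missing identifying role can be sketched in Python by treating an absent role value (None here) as an ordinary value when checking that identifiers are unique. This is an illustrative model only: it assumes each entity instance is represented by a tuple of its identifying role values, and the function name duplicate_identifiers is hypothetical.

```python
from collections import Counter


def duplicate_identifiers(identifiers):
    """Return identifiers that occur more than once in a population.

    Each identifier is a tuple of identifying role values; None marks a
    non-mandatory identifying role that is not played. Under the
    closed-world assumption, two identifiers with the same role missing
    are the same identifier.
    """
    counts = Counter(identifiers)
    return [ident for ident, n in counts.items() if n > 1]


population = [
    ("Smith", "John"),
    ("Smith", None),  # second identifying role missing
    ("Smith", None),  # the same identifier again: not allowed
]
print(duplicate_identifiers(population))
```

This differs from the treatment of NULL in UNIQUE constraints in many SQL databases, where two NULLs are usually considered distinct and such duplicates would be accepted.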
The simplest form of entity identification scheme is by a single role value, for which the reference mode shorthand is provided, as in:
Thing is identified by its Value;
Value is assumed to be (or created as) a value type. A subtype of Value called ThingValue is also assumed (or created). The Thing is then associated with ThingValue in an identifying one-to-one fact type. The result is equivalent to saying:
ThingValue is written as Value; Thing is identified by ThingValue where Thing has one ThingValue, ThingValue is of at most one Thing;
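Because the reference mode expansion is mechanical, it can be sketched as a text transformation. The Python function below is hypothetical (it is not part of any CQL compiler and does not parse CQL); it simply generates the longhand statements for a declaration of the form Thing is identified by its Value, using the naming rule described in the paragraph above.

```python
def expand_reference_mode(entity, value_type):
    """Expand '<entity> is identified by its <value_type>;' into longhand CQL.

    Hypothetical helper: it only builds statement text.
    """
    subtype = entity + value_type  # e.g. Thing + Value -> ThingValue
    return [
        f"{subtype} is written as {value_type};",
        f"{entity} is identified by {subtype} where "
        f"{entity} has one {subtype}, {subtype} is of at most one {entity};",
    ]


for statement in expand_reference_mode("Employee", "Number"):
    print(statement)
```

Applied to Employee and Number, the same rule would produce an EmployeeNumber value type, which is how the Employee subtype declared later in this paper acquires its identifier.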
If the default fact type readings (has/is of) aren't appropriate, you can provide one or more alternative readings. The required uniqueness and mandatory constraints are still added where needed.
Thing is identified by its Name where Thing is called ThingName;
The full form of identification must be used where a new entity type is identified by its relationship to another entity type, where adjectives are applied, or where the entity type has more than one identifying role. In this last case, at least one reading must be provided for each fact type involved in the identification. Normally these readings will embed the appropriate uniqueness constraints to create each one-to-many relationship. Each fact type reading in an entity type definition must involve the entity type and one of the identifying roles, and no other roles, as in the following example.
Note that hyphens are used here to indicate the use of adjectives, which can be either leading or trailing as required by the language. The hyphen is only required once within a declaration, and this associates the adjective with that role player throughout this declaration.
Person is identified by given-Name and family-Name where Person is called one given Name, given Name is of Person, Person has one family Name, family Name is of Person;
Note that the uniqueness constraints for these fact types are shown. They could alternatively be provided later, but must be included within the same vocabulary.
Hyphens may be used to designate multiple adjectives, but must have a space beside the hyphen, on the side of the existing object type name. Otherwise the pair of (previously unseen) words is treated as a simple hyphenated term:
suitably- trained Person is allowed to drive semi-trailer;
Finally, when the full form of identification is used but the fact type readings all refer to the same roles, none of which is played by the entity type being defined, these are the readings of a new fact type, which is objectified (named) as the new entity type. This is discussed below; in this case, the entity type has an identification scheme that is not drawn from the fact type roles.
Subtypes
An entity type may be declared to be a subtype (or, more informally, a “kind”) of one or more other entity types, its supertypes. Any subtype may play any of the roles of its supertypes. A subtype may have its own identification scheme, but doesn't need to; if it has none, it is identified by its relationship with its first supertype.
Apple is a kind of Fruit; Employee is a kind of Person identified by its Number;
ShelfLife is written as Time in days; Perishable has at most one ShelfLife; Fruit has one Price per kg; Apple is a kind of Fruit, Perishable;
In this example, each Apple must have a Price and may have a ShelfLife.
Declaring a subtype creates subtyping fact types, which is useful when subtyping relationships must be constrained.
Fact Types
Fact types are declared as one or more fact type readings. Each reading provides a verbal description of the relationship between two or more object types, or a property of a single object type. Each object type is referred to as "playing a role" in the fact type.
All the readings of a fact type must have the same set of role players. The first reading of a fact type is the default reading, and provides the identification scheme when needed (when the fact type is objectified).
A derived fact type is followed by its derivation conditions, introduced by where.
- fact_type:
- clause_list:
- clause:
- qualifiers:
Each fact type reading may contain a quantifier expression before the last role player, which can assert mandatory, uniqueness or frequency constraints over the allowed population of instances of that fact. A qualifier that asserts a ring constraint (see below) may also follow a reading.
Note that a fact type does not have to be named; it can simply be a set of readings. Naming makes the fact type an object type, and is required whenever no unique quantifier exists in the fact type, or where there are more than two roles. Where a fact type isn’t named, it cannot play roles in other fact types. This example shows an un-named and a named (objectified) fact type:
Person was born at one birth-Place; Directorship is where Person directs Company;
If an object type plays more than one role in a fact type, the separate roles must be distinguished by either adjectives or a defined role name. See the resolution rules under Readings for more details.
The qualifiers are enclosed in square brackets, and are most commonly used for ring constraints. See the section on constraints for more details. The 'maybe' outer-join qualifier is only used in derivation clauses. Derivations are discussed below, under queries.
- derivation:
Readings
Fact type readings are a sequence of roles and linking words:
- reading:
- fact_role:
- role_name_def:
The syntax here is ambiguous, because a fact_role consists of one or more IDs with optional hyphens to indicate adjectives, interspersed with arbitrary linking words (also IDs). So how is the ambiguity resolved?
First of all, CQL looks through all the readings in this definition, and finds the occurrences of known object type names. Each occurrence may be followed by the definition of a role name, or may have one or more associated adjectives attached using a hyphen (dash). Adjectives may not be the same as the name of any object type, or of any local role name, and the complete term (with adjectives) must likewise be unique.
A subsequent scan finds all occurrences of these role names and terms with adjectives (which now don't require the hyphen). Any remaining words are the open vocabulary which forms the reading that designates this fact type.
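The two-pass resolution described above can be sketched in Python. This is a deliberately simplified model, not the actual CQL parser: it assumes whitespace-separated words, handles only single leading adjectives written as adjective-Term (no trailing adjectives, multi-word adjectives or role names), and the function name resolve_reading is hypothetical.

```python
def resolve_reading(readings, object_types):
    """Split each reading into role references and linking words.

    Returns, per reading, a list of ("role", adjective_or_None, term) and
    ("link", word) tuples. Simplified sketch: single leading adjectives only.
    """
    # Pass 1: collect adjective-Term pairs marked with a hyphen anywhere
    # in the definition.
    adjectives = set()
    for reading in readings:
        for word in reading.split():
            adj, sep, term = word.rpartition("-")
            if sep and adj and term in object_types:
                adjectives.add((adj, term))

    # Pass 2: match roles with or without the hyphen; the rest are linking words.
    result = []
    for reading in readings:
        words = reading.split()
        tokens, i = [], 0
        while i < len(words):
            adj, sep, term = words[i].rpartition("-")
            if sep and (adj, term) in adjectives:
                tokens.append(("role", adj, term))
                i += 1
            elif i + 1 < len(words) and (words[i], words[i + 1]) in adjectives:
                tokens.append(("role", words[i], words[i + 1]))
                i += 2
            elif words[i] in object_types:
                tokens.append(("role", None, words[i]))
                i += 1
            else:
                tokens.append(("link", words[i]))
                i += 1
        result.append(tokens)
    return result


for tokens in resolve_reading(
        ["Person is called given-Name", "given Name is of Person"],
        {"Person", "Name"}):
    print(tokens)
```

Because the first pass covers every reading in the definition, the hyphen in given-Name is enough for the second reading's plain "given Name" to be recognised as the same adjective and term. A hyphenated word such as semi-trailer, whose last part is not a known object type, falls through to the linking words.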
When a fact type reading is repeated in order to invoke an existing fact type (for example in a constraint, in a derivation, or to add a new reading to an existing fact type), there may be adjectives in the definition of the invoked fact type. These must also be matched in the reading.
Embedded Presence Constraints
The final role in a reading may be preceded by a quantifier, which does not form part of the reading. Quantifiers are normally used to apply a uniqueness constraint (at most one), a mandatory constraint (at least one) or both (exactly one or just one). Other forms allow various other role frequency constraints. The quantity here may be a positive integer or the word one. Note that some quantifiers are only used in derivations or in constraints.
In this way, CQL absorbs many of the uniqueness, mandatory and frequency constraints of Object Role Modeling.
The only ORM characteristic that cannot be expressed with a plain quantifier is a non-mandatory frequency constraint with a minimum frequency above one; for example, a constraint that allows either zero or more than two occurrences. For example, in a footy tipping competition, it might be the case that if a participant submits no tips in a given week, they get the tips published by a known tipster, but if they do submit tips, they must submit at least eight. This kind of non-mandatory frequency constraint may be expressed in CQL using the maybe qualifier, which is also used in outer join derivations.
maybe Participant entered at least 8 Tips
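Over a population, an embedded presence constraint bounds how many times each instance of an object type occurs in a role: at most one is a frequency of 0 to 1, exactly one is 1 to 1, and at least one is 1 or more. The Python sketch below is a minimal model under stated assumptions: facts are (player, other) pairs whose first role is the constrained one, a maximum of None means unbounded, and a maybe flag additionally permits zero occurrences. The function name and the data are hypothetical.

```python
from collections import Counter


def frequency_violations(players, facts, minimum, maximum=None, maybe=False):
    """Return the players whose occurrence count in facts breaks the constraint.

    players: all instances of the object type playing the constrained role.
    facts:   (player, other) pairs; only the first role is counted.
    minimum, maximum: allowed number of occurrences (maximum None = unbounded).
    maybe:   also allow zero occurrences, as in 'maybe ... at least 8'.
    """
    counts = Counter(player for player, _ in facts)
    bad = []
    for p in players:
        n = counts[p]  # Counter returns 0 for players with no facts
        if maybe and n == 0:
            continue
        if n < minimum or (maximum is not None and n > maximum):
            bad.append(p)
    return bad


participants = ["Ann", "Bob", "Cat"]
tips = [("Ann", f"game{i}") for i in range(9)] + [("Bob", "game1"), ("Bob", "game2")]
# Ann entered 9 tips, Bob entered 2, Cat entered none.
print(frequency_violations(participants, tips, minimum=8, maybe=True))
```

Without the maybe flag, Cat's zero tips would also be reported, which is exactly the difference between a mandatory and a non-mandatory frequency constraint.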
Value Restrictions
A value restriction may follow a role reference where the role is played by a value type (or by an entity type ultimately identified by a single value type), and this constrains the allowed values of that value type in this role. In addition to fact type definitions, a value or a value restriction may be applied to fact instances and in derivations, where it has the obvious effect.
In addition to the value restrictions that can apply to value types, a role played by a value type may be restricted to specified values or value ranges:
- restriction:
- range:
- numeric_range:
- string_range:
Note that the ranges in a value restriction may be open ended at one end.
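A value restriction amounts to a list of allowed values and ranges, any of which may be open at one end. The Python sketch below assumes each entry is either a single value or a (low, high) pair with None for an open end, and that range bounds are inclusive; these representation choices and the name satisfies are assumptions for illustration, not CQL's own syntax.

```python
def satisfies(value, restriction):
    """True if value matches any allowed value or range in the restriction.

    restriction: list of single values or (low, high) tuples; a bound of
    None leaves that end of the range open. Bounds are treated as inclusive.
    """
    for allowed in restriction:
        if isinstance(allowed, tuple):
            low, high = allowed
            if (low is None or value >= low) and (high is None or value <= high):
                return True
        elif value == allowed:
            return True
    return False


# Hypothetical restriction: 0, 5 to 10, or 100 and above.
restriction = [0, (5, 10), (100, None)]
print([v for v in (0, 3, 7, 250) if satisfies(v, restriction)])
```

Python's comparison operators let the same function handle string ranges, but they compare strings by code point, which may not match the collation a real implementation would use.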
Constraints
Quantifiers allow the definition of the most common kinds of constraints: the mandatory, uniqueness and frequency constraints (collectively, CQL calls these presence constraints). However, some constraints cannot be expressed in this form, such as when an object type must play one of many unrelated roles. These are handled in CQL by an external constraint definition, or by a ring constraint qualifier.
- constraint:
- mandatory_or_exclusive_constraint:
Mandatory (and either-or) constraints
When a single role player must play one and only one (or at least one) of a set of roles, we can say:
each Range occurs at least one time in Range has minimum-Bound, Range has maximum-Bound; for each ReceivedItem exactly one of these holds: ReceivedItem is for PurchaseOrderItem, ReceivedItem is for TransferRequest;
for each Unit exactly one of these holds: Unit is fundamental, that Unit is derived from some base-Unit;
Where exactly one of two fact types applies, you can use the more natural form:
either Unit is fundamental or Unit is derived from some base-Unit but not both;
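Checked against a population, these constraints count how many of the listed fact types each instance participates in. A minimal Python sketch, assuming each fact type's population is given as the set of instances playing the constrained role; the function name and the data are hypothetical.

```python
def either_or_violations(instances, role_populations, exclusive=True):
    """Return instances playing none of the roles, or (if exclusive) more than one.

    role_populations: one set per fact type, holding the instances that play
    the constrained role. exclusive=True models 'exactly one of these holds';
    exclusive=False models 'occurs at least one time in'.
    """
    bad = []
    for x in instances:
        n = sum(x in population for population in role_populations)
        if n == 0 or (exclusive and n > 1):
            bad.append(x)
    return bad


units = ["metre", "kilometre", "furlong"]
fundamental = {"metre"}
derived = {"kilometre", "metre"}  # metre wrongly recorded as derived too
print(either_or_violations(units, [fundamental, derived]))
print(either_or_violations(units, [fundamental, derived], exclusive=False))
```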
External Uniqueness Constraints
For example, suppose we were to identify Person instances by given name and family name (not a good idea in a real system!). We would then need to ensure that each combination of given name and family name is unique. We can say:
each family Name, given Name occurs at most one time in Person is known by given-Name, Person has family-Name;
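This constraint joins the two fact types on Person and requires each resulting (family Name, given Name) combination to belong to at most one Person. A Python sketch of the check, assuming each fact type is given as a dict from a Person identifier to a Name (each Person has at most one of each); the data and the function name are hypothetical.

```python
from collections import defaultdict


def external_uniqueness_violations(given_name_of, family_name_of):
    """Return name combinations shared by more than one Person.

    Joins the two fact types on Person, then groups Persons by their
    (family Name, given Name) combination.
    """
    people_by_name = defaultdict(set)
    for person in given_name_of.keys() & family_name_of.keys():
        combo = (family_name_of[person], given_name_of[person])
        people_by_name[combo].add(person)
    return {combo: ps for combo, ps in people_by_name.items() if len(ps) > 1}


given = {"p1": "Fred", "p2": "Fred", "p3": "Jane"}
family = {"p1": "Bloggs", "p2": "Bloggs", "p3": "Bloggs"}
print(external_uniqueness_violations(given, family))
```

Unlike an embedded uniqueness constraint, no single fact type can enforce this: each Person has only one given Name and one family Name, and the duplicate only appears once the two fact types are joined.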
Subset Constraints
When one role may be played only if another is, you can use a subset constraint:
Address has third-StreetLine only if Address has second-StreetLine;
Note that this example didn’t use the first and second StreetLine, as we assume that the first StreetLine is a mandatory part of the address, so the subset constraint would be redundant.
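Over a population, this subset constraint says that every Address playing the first role must also play the second. A minimal Python sketch, assuming each role population is a set of Address identifiers; the function name and data are made up for illustration.

```python
def subset_violations(subset_role, superset_role):
    """Instances that play the first role without also playing the second."""
    return subset_role - superset_role


has_third_line = {"a1", "a3"}
has_second_line = {"a1", "a2"}
print(subset_violations(has_third_line, has_second_line))
```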
Equality Constraints
Equality constraints declare that the populations of two or more roles (or sequences of roles) are the same. They are expressed using ‘if and only if’:
Competition is in Series if and only if Competition has series-Number;
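An equality constraint is a subset constraint in both directions, so the violating instances are those that play exactly one of the two roles. A minimal Python sketch, assuming each role population is a set of Competition identifiers; the names and data are hypothetical.

```python
def equality_violations(role_a, role_b):
    """Instances playing exactly one of the two roles (symmetric difference)."""
    return role_a ^ role_b


in_series = {"c1", "c2"}
has_series_number = {"c1", "c3"}
print(sorted(equality_violations(in_series, has_series_number)))
```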
Ring Constraints
When a fact type includes the same object type more than once, or includes a supertype and its subtype, the same instance may play both roles. This is often undesirable, and it also opens up a whole class of further situations that can be restricted using ring constraints. The CQL keywords used in fact clause qualifiers for ring constraints are intransitive, transitive, acyclic and symmetric.
Intransitive means that just because “A relates to B” and “B relates to C”, that doesn’t mean that “A relates to C”. Transitive means the opposite. Acyclic means that no A may relate to itself, or to any B that has that relation to A, and so on. Symmetric means that if A relates to B, B also relates to A (so there is only one fact instance possible between A and B).
This method of defining ring constraints is not fully general, and a new syntax is required to cover more complex cases.
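To show what two of these ring constraints mean over a population, here is a Python sketch that checks a binary fact type, given as a set of (A, B) pairs, for acyclicity (no instance can reach itself by following the relation) and transitivity. It illustrates the definitions above, not how a CQL implementation enforces them; the function names are hypothetical.

```python
def is_acyclic(pairs):
    """True if following the relation from any instance never returns to it."""
    successors = {}
    for a, b in pairs:
        successors.setdefault(a, set()).add(b)

    visiting, done = set(), set()

    def visit(node):
        # Depth-first search: reaching a node still on the stack means a cycle.
        if node in visiting:
            return False
        if node in done:
            return True
        visiting.add(node)
        ok = all(visit(nxt) for nxt in successors.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return ok

    return all(visit(node) for node in list(successors))


def is_transitive(pairs):
    """True if 'A relates to B' and 'B relates to C' always imply 'A relates to C'."""
    pairs = set(pairs)
    return all((a, c) in pairs for a, b in pairs for b2, c in pairs if b == b2)


print(is_acyclic({("a", "b"), ("b", "c")}), is_acyclic({("a", "a")}))
print(is_transitive({("a", "b"), ("b", "c"), ("a", "c")}))
```

A self-referencing pair such as ("a", "a") is the smallest cycle, so the acyclic check also rules out an instance relating to itself.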
Join constraints
Most of the above constraint types may use joins, where more than one fact type is joined together with the and keyword. A full discussion is beyond the scope of this paper, but here’s a small example of a subset constraint using a join:
Diplomat speaks Language; Country uses Language, Language is spoken in Country; Diplomat serves in Country; Diplomat serves in Country only if Country uses Language and Diplomat speaks Language;
This constraint requires that in order to serve in a country, a diplomat must speak at least one language used in that country. Many joins may be contracted using who or that; this example can be contracted to:
Diplomat serves in Country only if Diplomat speaks Language that is spoken in Country;
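The join constraint can be checked by first joining the two fact types on Language, then testing the subset. A Python sketch, assuming each fact type is given as a set of pairs; the function name and the diplomats, countries and languages are invented for illustration.

```python
def join_subset_violations(serves_in, speaks, uses):
    """(Diplomat, Country) facts with no Language that is both spoken by the
    Diplomat and used in the Country."""
    # Join 'Diplomat speaks Language' with 'Country uses Language' on Language.
    allowed = {(d, c) for d, lang in speaks for c, lang2 in uses if lang == lang2}
    return serves_in - allowed


speaks = {("Ana", "Spanish"), ("Ben", "French")}
uses = {("Mexico", "Spanish"), ("Canada", "French"), ("Canada", "English")}
serves_in = {("Ana", "Mexico"), ("Ben", "Mexico"), ("Ben", "Canada")}
print(join_subset_violations(serves_in, speaks, uses))
```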
Fact Instances
When a fact reading is invoked with values, a fact instance is created. The simplest case is a declaration consisting of just an object type name followed by a value:
Name ‘Fred’;
This form is allowed for any value type, or any entity type that’s identified by a single value type (or an entity type identified by a single entity type that is itself identified by a single value type, and so on). In more complex cases, it may be necessary to invoke more than one fact type to define the instance:
Person is called given Name ‘Fred’, Person has family Name ‘Bloggs’;
or
given Name ‘Fred’ is of Person who has family Name ‘Bloggs’;
The Person instance being defined is a reference to the same instance in each fact type reading; there is an implicit join over the two clauses.
An objectification join allows traversal of an objectified fact type:
Directorship (where Person ‘Fred Bloggs’ directs Company ‘Acme, Inc’) began on appointment Date ‘2010-04-27’;
Fact Derivation (Queries)
Fact derivation is a large subject by itself, so we’ll leave detailed coverage for a future paper; here’s a short introduction, however.
When a fact type has the optional clauses that define its derivation, the population of that fact type is derived as a query over the fact types it invokes. Each condition clause is either a comparison, which compares a role value with a constant or another role value, or a fact type reading.
For a reading, at least one role must match an occurrence of that role in the new fact type or in another reading in the derivation. This match defines a logical join operation across the fact populations. Where the same object type occurs in different roles, and those roles are not to be joined, additional adjectives or defined role names may be used to separate the roles.
It’s possible to negate any invoked fact type by inserting the word no into it as a quantifier, or not as an additional linking word. Here are some examples of fact type derivations (the full fact type definition is elided):
Person has family Name, family Name = ‘Bloggs’, Person is not called given Name ‘Fred’, Person is a kind of Employee, Employee is managed by no Manager;
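As a rough analogy, the derivation above can be evaluated over a small in-memory population in Python: each clause becomes a filter on the same Person variable, which is the implicit join. This is a sketch under stated assumptions, not CQL's query engine. The populations are hypothetical dicts and sets, the subtype clause is modelled as membership in a set of Employees, and negation is treated closed-world, so a Person with no given Name at all also satisfies "is not called given Name ‘Fred’".

```python
def bloggs_without_manager(family_name, given_name, employees, managed_by):
    """Persons named Bloggs, not called Fred, who are Employees with no Manager."""
    managed = {employee for employee, _ in managed_by}
    return {
        p
        for p, name in family_name.items()  # Person has family Name,
        if name == "Bloggs"                 # family Name = 'Bloggs',
        and given_name.get(p) != "Fred"     # Person is not called given Name 'Fred',
        and p in employees                  # the Person is an Employee,
        and p not in managed                # Employee is managed by no Manager
    }


family_name = {"p1": "Bloggs", "p2": "Bloggs", "p3": "Smith", "p4": "Bloggs"}
given_name = {"p1": "Fred", "p2": "Jane", "p3": "Joe", "p4": "Anne"}
employees = {"p1", "p2", "p3", "p4"}
managed_by = {("p2", "m1")}  # (Employee, Manager) facts
print(bloggs_without_manager(family_name, given_name, employees, managed_by))
```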
Result Constellations
A fact derivation may include a returning clause. Normally, when processing a query, only the object instances that play the roles of the derived fact type (and satisfy the query) will be available in the results, and the values have no defined ordering. When the returning clause is used, additional object and fact instances may be accessible from the result, which may also be sorted. This extension of the result set is transitive, so that if a derived fact type invokes another derived fact type, the returned instances from the invoked fact type’s returning clause will also be available.
The results now include more than just a simple table of the instances that play the roles of the derived fact type. Instead, each object instance may be associated with additional facts for other roles it plays, and the roles of those facts will be populated by further object instances, and so on. This data structure is hereby defined as a constellation, which is where CQL gets its name. The query has selected certain instances from the entire fact population, much as an astronomer might select stars from the night sky.
The use of returning doesn’t change the contents of the defined fact type; it’s merely a pragmatic instruction to the query engine about which additional instances will be useful to the calling program, and in what order.