Property Mapping
Overview
The main goal of the Topaz library is to help applications map RDF graphs onto objects. Applications that work with RDF graphs typically have to write queries to get a list of RDF statements and map the results onto an existing application data-structure. Similarly, changes made to the application data-structure are then written back to the triple-store.
For large applications, this task quickly becomes tedious and error-prone. With the Topaz library, the application annotates its data-structures, as you saw in the Annotations chapter, and works with the Topaz Session API to load and save these objects.
This chapter explains the details of this mapping.
Types of Mapping
Forward Mapping
The majority of mapping usage belongs to the forward mapping category. This is where a property value is mapped in the following form:
<id> <property> $value

An example is:
article:23 --> topaz:hasAuthor --> author:42
See the inverse attribute in Predicate for how this is configured. The value of the inverse attribute should be 'false' to indicate forward mapping. This is also the default.
Reverse Mapping
While the majority of an application's mapping requirements belong to the forward mapping case, there are also cases where you want to know the reverse.
For example the statement:
article:23 --> topaz:hasAuthor --> author:42
can be forward mapped into an Article object or reverse mapped into an Author object. The reverse mapping in this example is making the assertion that if an Article object has a topaz:hasAuthor statement that connects it to this Author, then this Author can claim that article as something he/she wrote.
So a reverse mapping has the form:
<$value> <property> <id>

See the inverse attribute in Predicate for how this is configured. The value of the inverse attribute should be set to 'true' to indicate reverse mapping.
Is this inferencing?
At a conceptual level this does look like inferencing. At the implementation level, however, all Topaz does is issue the query differently.
In other words Topaz is not inferring new statements either at insert time or query time to perform the reverse mapping.
[TBD: Does this need more explanation?]
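The difference between the two mappings can be sketched with a tiny in-memory statement list. This is only an illustration of the two match patterns, not Topaz's implementation: the Triple and MappingSketch classes and their methods are hypothetical names introduced here, and a real triple-store would be queried rather than scanned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** A single RDF statement; all three parts are plain strings for brevity. */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

/** Hypothetical stand-in for the two query shapes; not part of Topaz. */
final class MappingSketch {
    /** Forward mapping: match <id> <property> $value and collect the objects. */
    static List<String> forward(List<Triple> store, String id, String property) {
        List<String> values = new ArrayList<>();
        for (Triple t : store) {
            if (t.subject.equals(id) && t.predicate.equals(property)) {
                values.add(t.object);
            }
        }
        return values;
    }

    /** Reverse mapping: match <$value> <property> <id> and collect the subjects. */
    static List<String> reverse(List<Triple> store, String id, String property) {
        List<String> values = new ArrayList<>();
        for (Triple t : store) {
            if (t.object.equals(id) && t.predicate.equals(property)) {
                values.add(t.subject);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<Triple> store = Arrays.asList(
            new Triple("article:23", "topaz:hasAuthor", "author:42"));
        // Loading the Article: which authors does article:23 have?
        System.out.println(forward(store, "article:23", "topaz:hasAuthor")); // prints [author:42]
        // Loading the Author: which articles point at author:42?
        System.out.println(reverse(store, "author:42", "topaz:hasAuthor"));  // prints [article:23]
    }
}
```

Both lookups read the same single statement; no new statement is created in either direction, which is the sense in which reverse mapping is not inferencing.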
Handling cardinality
In RDF, there is no requirement that any {subject, predicate} pair be unique. For example, an article with multiple authors can be expressed as:
article:23 --> topaz:hasAuthor --> author:42
article:23 --> topaz:hasAuthor --> author:43
article:23 --> topaz:hasAuthor --> author:44
This is perfectly acceptable to Topaz. However, this set of values may only be loaded into a non-scalar field.
In POJO mapping, this means a Java array, a java.util.Collection, or a custom application data-structure for which the application has registered a Binder. Mapping it to a scalar property field results in a run-time exception.
See collectionType attribute in Predicate for how this is configured. The value for collectionType attribute should be set to CollectionType.PREDICATE to indicate multi-valued properties. This is also the default value.
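The rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not Topaz's loader: the CardinalitySketch class is hypothetical, and IllegalStateException merely stands in for whatever run-time exception Topaz actually raises.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of cardinality handling; not Topaz's actual loader. */
final class CardinalitySketch {
    /** A collection field accepts any number of values, including zero or one. */
    static List<String> loadCollection(List<String> values) {
        return new ArrayList<>(values);
    }

    /** A scalar field is the max-cardinality-1 case: more than one value is an error. */
    static String loadScalar(List<String> values) {
        if (values.size() > 1) {
            // Assumption: the concrete exception type used by Topaz is not shown here.
            throw new IllegalStateException(
                "expected at most one value but found " + values.size());
        }
        return values.isEmpty() ? null : values.get(0);
    }

    public static void main(String[] args) {
        List<String> authors = Arrays.asList("author:42", "author:43", "author:44");
        System.out.println(loadCollection(authors).size()); // prints 3
        try {
            loadScalar(authors);
        } catch (IllegalStateException e) {
            System.out.println("scalar load rejected: " + e.getMessage());
        }
    }
}
```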
Cardinality and Queries
As with loading and saving objects above, queries also treat collection and scalar properties in the same way. For example, the following OQL query:
select a from Article a, Author p where a.author.givenname = 'Joe';
will work the same way whether the 'author' property in Article is mapped to a scalar field or to a collection field.
In short, a scalar property is a special case where the max-cardinality is '1', so Topaz does not treat it any differently anywhere.
Rdf Collections and Containers
A common usage in RDF is the rdf:List and rdf:Bag constructs to represent ordered sets of values. It is possible to create the mappings needed to represent RDF Collections and RDF Containers using the normal forward mapping described above. However, this quickly becomes cumbersome and error-prone, especially when writing queries involving association attributes.
Topaz makes it easy for applications to work with RDF Collections and Containers. Most applications only care about the ordered set of values that RDF Collections and RDF Containers provide. The intermediate blank nodes in the graph are of no interest to them. Therefore Topaz handles the blank nodes internally and maps only the useful property values to the application's data-structures. This way the mapped property appears to the application as a multi-valued collection, much the same as a predicate with no max-cardinality restriction.
See collectionType attribute in Predicate for how this is configured. The value for collectionType attribute should be set to CollectionType.RDFLIST to indicate Rdf Collections and any one of CollectionType.RDFBAG or CollectionType.RDFSEQ or CollectionType.RDFALT to indicate Rdf Container usage.
Note: Before RDF Containers can be used in your application, Mulgara Prefix Resolver needs to be configured. Please see Topaz/Manual/Section03#ConfiguringPrefixResolver for help in configuring this.
Rdf Collection Example
An example from http://www.w3.org/TR/REC-rdf-syntax/#collections:
This can be mapped to a POJO as:
@Entity
class Course {
  ....
  ....
  List<Student> getStudents() {...}

  @Predicate(uri="http://example.org/students/vocab#students", collectionType = CollectionType.RDFLIST)
  void setStudents(List<Student> students) { ... }
  ....
}
That is all there is to it. Now you can do queries of the form:
select c from Student s, Course c where c.students = s and s.givenname = 'Johann';
This will return all courses taken by any student with the givenname 'Johann'. Notice how this query is the same regardless of the underlying mapping. In this case, all of the details of transitively traversing the graph are taken care of by Topaz.
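To give a feel for the traversal that is being hidden, the following sketch flattens an RDF Collection by following its rdf:first / rdf:rest chain. It is an illustration only and not how Topaz does it: the RdfListSketch class, the two lookup maps, the blank node labels and the student identifiers are all hypothetical, and a well-formed, acyclic list is assumed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of flattening an rdf:List; not Topaz's implementation. */
final class RdfListSketch {
    static final String NIL = "rdf:nil";

    /**
     * Follows the rdf:rest chain from the head node until rdf:nil (or a missing
     * link), collecting each node's rdf:first value in order.
     */
    static List<String> flatten(String head, Map<String, String> first, Map<String, String> rest) {
        List<String> values = new ArrayList<>();
        String node = head;
        while (node != null && !NIL.equals(node)) {
            values.add(first.get(node)); // the useful value
            node = rest.get(node);       // the next intermediate blank node
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, String> first = new HashMap<>();
        Map<String, String> rest = new HashMap<>();
        first.put("_:b1", "student:Amy");     rest.put("_:b1", "_:b2");
        first.put("_:b2", "student:Mohamed"); rest.put("_:b2", "_:b3");
        first.put("_:b3", "student:Johann");  rest.put("_:b3", NIL);
        // prints [student:Amy, student:Mohamed, student:Johann]
        System.out.println(flatten("_:b1", first, rest));
    }
}
```

Each element costs one extra hop through a blank node, which is why hand-written mappings and queries over this structure get cumbersome.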
Rdf Container Example
From http://www.w3.org/TR/REC-rdf-syntax/#containers, the same example above can be represented as an rdf:bag as follows:
The only thing we need to change is the collectionType mapping:
@Entity
class Course {
  ....
  ....
  List<Student> getStudents() {...}

  @Predicate(uri="http://example.org/students/vocab#students", collectionType = CollectionType.RDFBAG)
  void setStudents(List<Student> students) { ... }
  ....
}
Which should I use?
There is no difference in queries or any other usage by the application since Topaz handles the details of Rdf Collection/Container mapping.
One difference, however, may be significant. Queries involving containers are faster to execute than queries involving collections, since no transitive traversal is required. Therefore it is better to prefer CollectionType.RDFSEQ over CollectionType.RDFLIST.
However, if List semantics (i.e. ordering and allowing duplicates) are not a requirement, preference should be given to CollectionType.PREDICATE, for two reasons:
- queries are faster
- reverse mapping is only possible for CollectionType.PREDICATE mapped fields.
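The point about containers needing no transitive traversal can be illustrated with a sketch of how container members are laid out: every member hangs directly off the single container node via the membership properties rdf:_1, rdf:_2 and so on. The RdfContainerSketch class and its map argument are hypothetical, and contiguous numbering from rdf:_1 is assumed for simplicity; this is not Topaz's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of reading an RDF Container; not Topaz's implementation. */
final class RdfContainerSketch {
    /**
     * Reads rdf:_1, rdf:_2, ... directly off the container node until one is
     * absent. No chain of intermediate nodes has to be followed.
     */
    static List<String> members(Map<String, String> containerProperties) {
        List<String> values = new ArrayList<>();
        for (int i = 1; containerProperties.containsKey("rdf:_" + i); i++) {
            values.add(containerProperties.get("rdf:_" + i));
        }
        return values;
    }

    public static void main(String[] args) {
        // All properties of the one blank node that represents the container.
        Map<String, String> bag = new HashMap<>();
        bag.put("rdf:type", "rdf:Bag");
        bag.put("rdf:_1", "student:Amy");
        bag.put("rdf:_2", "student:Mohamed");
        bag.put("rdf:_3", "student:Johann");
        // prints [student:Amy, student:Mohamed, student:Johann]
        System.out.println(members(bag));
    }
}
```

Compared with an RDF Collection, where each element sits behind its own blank node, all members here are one hop from the container node.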
Mapping values
In RDF, the object of a statement can be a literal value or a resource. Topaz can be configured to handle the expected type of a property (the rdfs:range).
This is done through the following three options in the RdfDefinition object configured in the SessionFactory for a given property (see below for the @Predicate options):
- objectProperty - true or false; indicates whether the value is a resource or a literal
- dataType - the data type, in the case of literals
- associatedEntity - the name of the entity corresponding to the value, e.g. 'Author' in a list of authors for an 'Article'.
These also correspond to the following properties in the @Predicate annotation:
- type - determines if the value is a literal or a resource
- dataType - the dataType or the special flag, UNTYPED to indicate untyped. By default, the dataType is guessed by Topaz.
The associatedEntity is inferred by the annotation parser.
Literal Values
Topaz can convert literal values from their lexical form to any object that the application wants, as long as a 'Serializer' is defined for it. By default Topaz has serializers defined for all primitive Java types and their object forms (e.g. 'int' and 'java.lang.Integer') as well as commonly used classes like java.util.Date and java.util.Calendar. The following shows how additional serializers can be added:
class MySerializer implements Serializer<MyClass> {
  public String serialize(MyClass o) throws Exception {
    return (o == null) ? null : o.toString();
  }

  public MyClass deserialize(String o, Class c) throws Exception {
    return MyClass.valueOf(c, o);
  }
}

// Add this serializer for our class with a custom data-type.
sessionFactory.getSerializerFactory().setSerializer(MyClass.class, Rdf.xsd + "myType", new MySerializer(), true);
Once this is set up, preload() or preloadFromClassPath() will start treating MyClass.class as a serializable literal, no different from the default literals.
Resource Reference Values
A property value can reference another graph node. Topaz can be configured to treat these either as raw values or as an association to another Entity.
Blobs
From an RDF graph point of view, blobs are really literal values that can potentially be quite large. While it may be possible to store these directly in a triple-store, the storage and retrieval mechanisms that a triple-store API provides do not adequately satisfy the performance expectations of most applications. This is where specialized blob-stores come into the picture. Blobs differ from literals in three aspects:
- a separate dedicated store
- additional streaming APIs
- stored as {id, value} pairs as opposed to triples
Restrictions on Blob fields
Because of the differences between blobs and literal values, Topaz imposes the following restrictions on the usage of Blobs:
- there can only be one field in an entity that is a Blob. This follows from the 'pair' storage model of a blob-store.
- the id of the blob is the same as the id field of the entity. If you recall from the sections above, the id field is the field that is annotated with the @Id annotation.
- blobs can be streamed, or the entire content can be stored in a 'byte[]' or java.lang.String field. For Blobs that are streamed, Topaz is the factory that creates these Blob objects. This restriction comes from the need to support a getOutputStream() mechanism that lets applications write directly to the underlying Blob.
Blob features to note
Blob-stores share the following features with triple-stores:
- all writes and reads are within a transaction scope
- writes that are not committed are not visible to other transactions
- if a transaction is rolled back, all writes to the blobs are rolled back too.
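The three guarantees listed above can be sketched with a toy in-memory store of {id, value} pairs. This is purely illustrative: BlobStoreSketch and its Tx class are hypothetical names, not Topaz or blob-store APIs, and the sketch ignores concurrency control, streaming and persistence.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory blob store illustrating transaction scoping; not a Topaz API. */
final class BlobStoreSketch {
    /** Committed {id, value} pairs, visible to every transaction. */
    private final Map<String, byte[]> committed = new HashMap<>();

    /** Starts a transaction that buffers its own writes. */
    Tx begin() {
        return new Tx();
    }

    final class Tx {
        /** Writes made in this transaction that are not yet committed. */
        private final Map<String, byte[]> pending = new HashMap<>();

        void write(String id, byte[] value) {
            pending.put(id, value.clone());
        }

        /** Reads see this transaction's own writes first, then committed data. */
        byte[] read(String id) {
            byte[] v = pending.containsKey(id) ? pending.get(id) : committed.get(id);
            return v == null ? null : v.clone();
        }

        /** Makes the buffered writes visible to other transactions. */
        void commit() {
            committed.putAll(pending);
            pending.clear();
        }

        /** Discards the buffered writes. */
        void rollback() {
            pending.clear();
        }
    }

    public static void main(String[] args) {
        BlobStoreSketch store = new BlobStoreSketch();
        Tx writer = store.begin();
        Tx reader = store.begin();
        writer.write("article:23", new byte[] {1, 2, 3});
        System.out.println(reader.read("article:23") == null); // prints true
        writer.commit();
        System.out.println(reader.read("article:23").length);  // prints 3
    }
}
```

Note that the blob is keyed by the entity id alone, which mirrors the restriction that an entity can have only one Blob field.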


