Ambra Content Model

Content models are formal descriptions of classes of data objects (single units of content, complex data objects, or aggregations of data objects) in a Fedora Repository. Each content model documents the datastreams (media and metadata files) and the set of disseminators that its member objects subscribe to. By definition, all objects of the same class subscribe to the same disseminators and have the same configuration of datastreams, which must meet certain standards. The disseminators encode the behaviors (the functionality an end user or another system encounters when using an object) together with the programmatic mechanisms needed to exhibit those behaviors. In the case of Topaz, we probably will not use Fedora disseminators, but will instead rely on external applications to do the rendering based on local rules.
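As a rough illustration of what "the same configuration of datastreams" means in practice, the sketch below treats a content model as a set of required datastream IDs and checks whether a given object conforms. It is not Fedora's actual content-model format: the class names, the datastream IDs DC and XML, and the PIDs are hypothetical, and neither disseminators nor the standards that datastream content must meet are modeled.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ContentModel:
    """Hypothetical stand-in for a content model: a class name plus required datastream IDs."""
    name: str
    required_datastreams: frozenset

@dataclass
class DataObject:
    """A repository object: a PID plus its datastreams, keyed by datastream ID."""
    pid: str
    datastreams: dict = field(default_factory=dict)

def missing_datastreams(model, obj):
    """Return the required datastream IDs that obj lacks (an empty set means it conforms)."""
    return set(model.required_datastreams) - set(obj.datastreams)

def conforms(model, obj):
    """True if obj has every datastream the model requires (extra datastreams are allowed)."""
    return not missing_datastreams(model, obj)

# Hypothetical "article" model and two objects checked against it.
article_model = ContentModel("article", frozenset({"DC", "XML"}))
good = DataObject("test:1", {"DC": b"<dc/>", "XML": b"<article/>", "PDF": b"%PDF"})
bad = DataObject("test:2", {"DC": b"<dc/>"})

print(conforms(article_model, good))            # True
print(missing_datastreams(article_model, bad))  # {'XML'}
```

This only checks presence; whether objects of one class may carry extra datastreams is a policy decision the sketch leaves open.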

Topaz will support a variety of content, and at this stage we see an inherent difficulty in defining a single content model that encompasses everything. Our plan is to start by documenting the content model of each individual service and then work towards a single model (if achievable). The content models for Topaz are:

Article to Fedora-Object mapping

An article has a DOI and consists of a "main" part (e.g. the pmc.xml) plus a number of related objects (e.g. images, videos, spreadsheets), each with its own DOI. Each object (DOI) may have one or more representations; e.g. the main article text may be available in both XML and PDF, an image may be available as TIFF and as PNG, or data may be available as a spreadsheet and as a graph. Lastly, there may be one or more versions of an article and all its associated objects. An article is versioned as a whole, though, so an update to any part (whether the main part or any related object) implies a new version of all parts.
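The "versioned as a whole" rule can be sketched with an immutable snapshot per version: replacing any one part yields a new version number for the entire article, while the old snapshot stays intact. The class and function names, the DOIs under the 10.1234 prefix, and the representation keys are hypothetical; this is not Topaz code.

```python
from dataclasses import dataclass

@dataclass
class Part:
    """One DOI-identified object and its representations, e.g. {"XML": ..., "PDF": ...}."""
    doi: str
    representations: dict

@dataclass(frozen=True)
class ArticleVersion:
    """An immutable snapshot of a whole article: main part plus related parts."""
    version: int
    main: Part
    related: tuple

def update_part(article, doi, representations):
    """Replace one part's representations and return a new version of the WHOLE article."""
    new_part = Part(doi, dict(representations))
    if article.main.doi == doi:
        return ArticleVersion(article.version + 1, new_part, article.related)
    if all(p.doi != doi for p in article.related):
        raise KeyError(f"{doi} is not part of this article")
    # Unchanged parts are carried over as-is; only the version number covers them anew.
    related = tuple(new_part if p.doi == doi else p for p in article.related)
    return ArticleVersion(article.version + 1, article.main, related)

v1 = ArticleVersion(
    1,
    Part("10.1234/a", {"XML": b"<article/>", "PDF": b"%PDF"}),
    (Part("10.1234/a.g001", {"TIFF": b"...", "PNG": b"..."}),),
)
v2 = update_part(v1, "10.1234/a.g001", {"TIFF": b"...", "PNG": b"new"})
print(v2.version, v1.version)  # 2 1
```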

The triple (DOI, version, representation) will be mapped into Fedora and RDF as follows:

Topaz            Fedora / RDF
DOI              Fedora PID
version          Fedora date range (via an RDF lookup)
representation   Fedora datastream
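A minimal sketch of the triple mapping, under several loudly stated assumptions: the page does not say how a DOI becomes a PID, so doi_to_pid uses a hypothetical "doi:" namespace with the DOI percent-encoded; the "RDF lookup" for versions is stood in for by a plain dictionary from (DOI, version) to a date range; and the representation name is used directly as the datastream ID. The dates are made-up example values.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Optional
from urllib.parse import quote

class FedoraLocation(NamedTuple):
    pid: str                         # Fedora object for the DOI
    valid_from: datetime             # start of the version's date range
    valid_until: Optional[datetime]  # end of the range; None means "current version"
    datastream_id: str               # datastream holding the representation

def doi_to_pid(doi: str) -> str:
    # ASSUMPTION: hypothetical "doi:" namespace with the DOI percent-encoded ("/" -> "%2F").
    return "doi:" + quote(doi, safe="")

def resolve(doi, version, representation, version_ranges):
    """Map (DOI, version, representation) to a Fedora PID, date range and datastream ID.

    version_ranges is a stand-in for the RDF lookup: (doi, version) -> (from, until).
    """
    valid_from, valid_until = version_ranges[(doi, version)]
    # ASSUMPTION: the representation name doubles as the datastream ID (e.g. "PDF").
    return FedoraLocation(doi_to_pid(doi), valid_from, valid_until, representation)

ranges = {
    ("10.1234/a", 1): (datetime(2005, 1, 1, tzinfo=timezone.utc),
                       datetime(2005, 6, 1, tzinfo=timezone.utc)),
    ("10.1234/a", 2): (datetime(2005, 6, 1, tzinfo=timezone.utc), None),
}
loc = resolve("10.1234/a", 2, "PDF", ranges)
print(loc.pid, loc.datastream_id)  # doi:10.1234%2Fa PDF
```

Any timestamp inside [valid_from, valid_until) would then identify which stored version of the datastream belongs to the requested article version.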

The main and related parts are linked together via RDF statements: the main part contains a list of <topaz:hasMember> predicates, and each related object contains a reverse link property <topaz:isMemberOf>.
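The two-way linking can be sketched as plain (subject, predicate, object) tuples, together with a check that every hasMember link has its isMemberOf counterpart. Only the two predicate names come from this page; the full topaz namespace URI is not given, and the PIDs are hypothetical.

```python
HAS_MEMBER = "topaz:hasMember"     # predicate names as used on this page;
IS_MEMBER_OF = "topaz:isMemberOf"  # the full topaz namespace URI is not given here

def membership_triples(main_pid, related_pids):
    """Emit hasMember on the main part and the reverse isMemberOf on each related object."""
    triples = []
    for pid in related_pids:
        triples.append((main_pid, HAS_MEMBER, pid))
        triples.append((pid, IS_MEMBER_OF, main_pid))
    return triples

def unmatched_links(triples):
    """Return (main, related) pairs for which only one direction of the link exists."""
    forward = {(s, o) for s, p, o in triples if p == HAS_MEMBER}
    reverse = {(o, s) for s, p, o in triples if p == IS_MEMBER_OF}
    return forward ^ reverse  # symmetric difference: links missing their counterpart

triples = membership_triples("test:main", ["test:fig1", "test:fig2"])
print(len(triples))                   # 4
print(unmatched_links(triples))       # set()
print(unmatched_links(triples[:-1]))  # {('test:main', 'test:fig2')}
```

Storing both directions duplicates information, so a consistency check like this is one way to catch a half-written update.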

The above mapping has some properties worthy of note:

  • Different representations share the same DOI, since they represent the same "thing".
  • Because RDF statements can currently only be attached to Fedora objects, no statements can be made directly about an individual datastream. One hack around this is to create specially named predicates that embed the datastream ID in the name, e.g. <topaz:PDF-objectSize>.
  • The ID of a datastream is formed from the PID of its object: <pid>/<ds-id>.
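Both conventions above are simple string constructions; the sketch below builds a per-datastream predicate and composes and splits a <pid>/<ds-id> identifier. The helper names and the PID test:1 are hypothetical. Splitting on the last slash is an assumption chosen because the left-hand side may itself contain a slash (a raw DOI such as 10.1234/a does), which only works if datastream IDs never contain one.

```python
def datastream_predicate(ds_id: str, prop: str) -> str:
    """Build a per-datastream predicate, e.g. ("PDF", "objectSize") -> "topaz:PDF-objectSize"."""
    return f"topaz:{ds_id}-{prop}"

def datastream_ref(pid: str, ds_id: str) -> str:
    """Compose the <pid>/<ds-id> identifier of a datastream."""
    return f"{pid}/{ds_id}"

def split_datastream_ref(ref: str) -> tuple:
    """Split <pid>/<ds-id> on the LAST slash.

    ASSUMPTION: datastream IDs never contain "/", while the left-hand side may.
    """
    pid, sep, ds_id = ref.rpartition("/")
    if not sep or not pid or not ds_id:
        raise ValueError(f"not a <pid>/<ds-id> reference: {ref!r}")
    return pid, ds_id

# A statement about the PDF datastream, attached to its Fedora object (hypothetical values).
triple = ("test:1", datastream_predicate("PDF", "objectSize"), "123456")
print(triple[1])                                                 # topaz:PDF-objectSize
print(split_datastream_ref(datastream_ref("10.1234/a", "PDF")))  # ('10.1234/a', 'PDF')
```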

Open question: can a handle server resolve <doi>/<ds-id>?

RDF Schema

Dublin Core

The RDF representation of DC is poorly defined, though this is being rectified. For example, DCMES-XML (Expressing Simple Dublin Core in RDF/XML) defines the range of dc:creator as rdfs:Literal, whereas DCQ-RDFXML (Expressing Qualified Dublin Core in RDF/XML) defines it as a node. Newer versions of the specs (DC-RDF and DC-RDF-NOTES) are cleaning this up. Although they are still working drafts, they promise cleaner (and more interoperable) definitions and seem more future-proof, so we'll follow those.
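To make the difference concrete, the sketch below reads dc:creator from both shapes: a plain literal, and a node that carries the name. Triples are plain Python tuples rather than a real RDF library; the subject ex:article1, the blank-node ID b1 and the use of rdf:value as the predicate holding the name on the node are assumptions for illustration, not taken from either spec.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Literal:
    value: str

@dataclass(frozen=True)
class BNode:
    id: str

NAME_PREDICATE = "rdf:value"  # ASSUMPTION: predicate carrying the name on a creator node

def creator_names(triples, subject):
    """Collect dc:creator names for subject, whether the object is a literal or a node."""
    names = []
    for s, p, o in triples:
        if s != subject or p != "dc:creator":
            continue
        if isinstance(o, Literal):
            names.append(o.value)  # literal form: the value is the name itself
        else:
            # node form: follow the node and read its name literal(s)
            names.extend(v.value for s2, p2, v in triples
                         if s2 == o and p2 == NAME_PREDICATE and isinstance(v, Literal))
    return names

literal_form = [("ex:article1", "dc:creator", Literal("Jane Doe"))]
node_form = [
    ("ex:article1", "dc:creator", BNode("b1")),
    (BNode("b1"), NAME_PREDICATE, Literal("Jane Doe")),
]
print(creator_names(literal_form, "ex:article1"))  # ['Jane Doe']
print(creator_names(node_form, "ex:article1"))     # ['Jane Doe']
```

Code that consumes data written both ways has to branch on the object type, which is the kind of interoperability cost that cleaner definitions would avoid.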