Background

The Research Vocabularies Australia Portal can present a browse visualization of vocabulary data that has been described in SKOS format. Such vocabulary data can be complex; in particular, the SKOS model of concept schemes, collections, and concepts is very flexible, admitting cycles, polyhierarchies, nested collections, etc.

A key observation that has driven design and development to this point is that almost all of the vocabulary data published in RVA is tree-structured. We do have some vocabularies that have polyhierarchies, but they are still "tree-like", and it is meaningful to present such data in a tree-like way.

In fact, the visualization is of a forest, not a tree. The use of the term "tree" is historical: the initial implementation was known as the "browse tree". The following description may occasionally blur the distinction between tree and forest.

Things that have to be taken into account

Hierarchies

Certain SKOS relationships either directly state, or imply, a hierarchical relationship between resources.

Non-hierarchies

Gotchas

Deep dive

The rest of this document takes a "deep dive" into the implementation of the browse visualization. For the most part, the implementation tries to do "what you expect". If you want to know why the implementation behaves the way it does, this description may provide the answers. For further clarification, please contact services@ardc.edu.au.

Terminology

We start with some terminology drawn from the Vocabulary Registry Schema.

The following terminology related to vocabulary data and its presentation comes up in various places in the rest of this document. These definitions are exclusive in this sense: a statement of the form "we say that X is Y if Z" is to be read as "we say that X is Y if and only if Z".

Exclusions

Both the RDF model and the SKOS model are flexible to a ridiculous degree. For example, an RDF list may have any number of "first" elements, e.g., zero, or three. We could try to support all of the possibilities, but we have instead decided to take a pragmatic approach: support what we expect people will actually use (including, in particular, the behaviour of PoolParty), and exclude as many pathological cases as possible.

These possibilities are excluded:

How things should be

This section describes, using a question/answer format, what a user can expect of the behaviour of the visualization.

The graph of resources and relations

The SKOS relations (skos:broader, skos:narrower, skos:inScheme, skos:member, etc.) between resources in the vocabulary induce a directed graph, in which the resources are nodes and the relations are edges. The processing of this graph has to produce a forest, so it uses a depth-first search to construct a spanning forest. The roots of the spanning forest are the resources that have been chosen to appear at the top level of the visualization, consistent with the description in the previous section. Both the configuration and the result of this depth-first search depend on the settings of the includeConceptSchemes and includeCollections browse flags.

If includeConceptSchemes is true, and includeCollections is false

We construct a graph in which every edge representing the broader/narrower relation is coloured. There is a colour for every concept scheme, and a special colour none to represent separation from all concept schemes.

For every concept scheme CS, we consider the graph to have a directed edge that is coloured with CS's colour from CS to every concept that has been chosen to appear as a direct descendant of CS.

For every pair of concepts C1 and C2 such that both C1 and C2 are marked as belonging to a concept scheme CS, if C1 is marked as broader than C2, then we consider the graph to have a directed edge from C1 to C2 that is coloured with CS's colour.

For every pair of concepts C1 and C2 such that both C1 and C2 are not marked as belonging to any concept scheme, if C1 is marked as broader than C2, then we consider the graph to have a directed edge from C1 to C2 that is coloured with the special colour none.

No other edges are added between concepts. To be specific: for every edge in the completed graph that is between two concepts C1 and C2 and which is of colour c, it is the case that either (a) c is the colour of a concept scheme CS, and both C1 and C2 are marked as belonging to CS, or (b) c is the special colour none, and both C1 and C2 are not marked as belonging to any concept scheme.

The processing of the graph is then a set of depth-first searches of the edge-coloured graph, one for each colour. The searches start from each concept scheme, and from the concepts that are not marked as belonging to any concept scheme and which have been chosen to appear at the top level of the visualization. The result is a forest that is the union of the spanning forests produced by each search.
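The edge-colouring rules and the per-colour searches described above can be illustrated with a minimal Python sketch. It is not the Registry's implementation: the function names, the input dictionaries, and the nested-tuple representation of the forest are all assumptions made for illustration.

```python
from collections import defaultdict

NONE = "none"  # the special colour for concepts outside every concept scheme


def build_coloured_edges(top_concepts, in_scheme, broader_than):
    """Return {colour: {node: [children]}} (hypothetical data layout).

    top_concepts: {scheme: [concepts chosen as its direct descendants]}
    in_scheme:    {concept: set of schemes it is marked as belonging to}
    broader_than: list of (c1, c2) pairs, c1 marked as broader than c2
    """
    edges = defaultdict(lambda: defaultdict(list))
    # Edge from each concept scheme to its chosen direct descendants.
    for scheme, concepts in top_concepts.items():
        for concept in concepts:
            edges[scheme][scheme].append(concept)
    for c1, c2 in broader_than:
        s1 = in_scheme.get(c1, set())
        s2 = in_scheme.get(c2, set())
        # One edge per concept scheme that both concepts belong to.
        for scheme in sorted(s1 & s2):
            edges[scheme][c1].append(c2)
        # Colour "none" only if neither concept belongs to any scheme.
        if not s1 and not s2:
            edges[NONE][c1].append(c2)
    return edges


def spanning_forest(children, roots):
    """Depth-first search over the edges of one colour."""
    visited = set()

    def visit(node):
        visited.add(node)
        subtrees = []
        for child in children.get(node, []):
            if child not in visited:
                subtrees.append(visit(child))
        return (node, subtrees)

    forest = []
    for root in roots:
        if root not in visited:
            forest.append(visit(root))
    return forest


def browse_forest(top_concepts, in_scheme, broader_than, top_level_unschemed):
    """Union of the spanning forests found for each colour."""
    edges = build_coloured_edges(top_concepts, in_scheme, broader_than)
    forest = []
    for scheme in top_concepts:
        forest.extend(spanning_forest(edges[scheme], [scheme]))
    forest.extend(spanning_forest(edges[NONE], top_level_unschemed))
    return forest


# Concept A belongs to both schemes; D and E belong to none.
forest = browse_forest(
    top_concepts={"CS1": ["A"], "CS2": ["A"]},
    in_scheme={"A": {"CS1", "CS2"}, "B": {"CS1"}, "C": {"CS2"}},
    broader_than=[("A", "B"), ("A", "C"), ("D", "E")],
    top_level_unschemed=["D"],
)
```

In this sketch, A appears once under each of CS1 and CS2, but B is shown only under CS1 and C only under CS2, because an edge is followed only by the search for its own colour.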

If includeConceptSchemes is false, and includeCollections is false

In this case, we ignore membership of concept schemes. We construct a graph in which edges are not coloured. For every pair of concepts C1 and C2, if C1 is marked as broader than C2, then we consider the graph to have a directed edge from C1 to C2.

The processing of the graph is then one depth-first search of the graph; the result is a spanning forest.
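As a rough illustration of the uncoloured case, the following Python sketch runs a single depth-first search and returns the spanning forest as a list of (depth, concept) rows, in the order a browse tree might display them. The function name, the input format, and the row representation are assumptions for illustration only.

```python
def spanning_forest_rows(broader_than, roots):
    """Single depth-first search; returns [(depth, concept), ...] in display order.

    broader_than: list of (c1, c2) pairs, c1 marked as broader than c2
    roots:        concepts chosen to appear at the top level
    """
    children = {}
    for c1, c2 in broader_than:
        children.setdefault(c1, []).append(c2)

    visited = set()
    rows = []
    # Explicit stack; entries are pushed in reverse so that they pop in order.
    stack = [(root, 0) for root in reversed(roots)]
    while stack:
        node, depth = stack.pop()
        if node in visited:
            continue  # already placed in the forest, e.g. an edge closing a cycle
        visited.add(node)
        rows.append((depth, node))
        for child in reversed(children.get(node, [])):
            stack.append((child, depth + 1))
    return rows


# A -> B -> C -> A is a cycle; the edge from C back to A is not followed.
rows = spanning_forest_rows(
    broader_than=[("A", "B"), ("B", "C"), ("C", "A"), ("A", "D")],
    roots=["A"],
)
```

Because each concept is visited at most once, the edge that would close the cycle is simply dropped, and the output is a forest even when the input graph is not.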

If includeCollections is true

The above subsections cover the cases in which includeCollections is false. If it is true, then there is a graph node for each collection CO, and for each resource R marked as directly belonging to CO, there is a directed edge from CO to R. The depth-first search includes searches starting at the nodes for each collection that has been chosen to appear at the top level of the visualization. For the purposes of cycle detection, only collection nodes are considered, as the SKOS broader/narrower hierarchy of concepts plays no part in the visualization of collections.
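The handling of collections can be sketched in Python as follows. This is only one possible reading of the paragraph above: it assumes that a cycle is detected when a nested collection is already an ancestor on the current path, and that non-collection members are shown as leaves. The function name and data layout are hypothetical.

```python
def collection_tree(collection, members, collections, path=()):
    """Build (node, subtrees) for one collection.

    members:     {collection: [resources marked as directly belonging to it]}
    collections: set of all resources that are collections
    path:        collections on the way from the top level down to this one
    """
    path = path + (collection,)
    subtrees = []
    for resource in members.get(collection, []):
        if resource in collections:
            # Assumption: only collection nodes take part in cycle detection.
            if resource in path:
                continue  # following this edge would close a cycle
            subtrees.append(collection_tree(resource, members, collections, path))
        else:
            # Concepts are leaves here; broader/narrower is not followed.
            subtrees.append((resource, []))
    return (collection, subtrees)


# CO1 contains CO2, and CO2 (pathologically) contains CO1.
members = {"CO1": ["A", "CO2"], "CO2": ["A", "B", "CO1"]}
tree = collection_tree("CO1", members, collections={"CO1", "CO2"})
```

Under these assumptions, concept A is shown under both CO1 and CO2, while the nested reference from CO2 back to CO1 is omitted.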

Sort orders

There are currently two supported sort orders. The order "Label" was originally known as "Preferred label", since it was based on the values of the skos:prefLabel predicate. Now, a resource's label might instead come from the value of the dcterms:title or rdfs:label predicate.

Sort by "Label"

Sort by "Label" means: to order resources X and Y:

Sort by "Notation"

Sort by "Notation" means: to order resources X and Y: