The Embarkation Rolls as Linked Data

Although incredibly useful, and supplemented with data from many other sources, the pages returned by a search of the Cenotaph database are of limited help when we want to start answering questions such as:

  • How many New Zealanders took part in WWI?
  • What were the relative contributions of the different regions?
  • What was the makeup in terms of occupation and age?

The answers to questions such as these are inherent in the data, but we need to do more work manipulating it before we can extract them. Once we have those answers, we will be in a position to work towards answering even more meaningful questions such as:

  • What was the social makeup of those who took part in WWI?
  • What effect did the embarkation of so many people have on New Zealand society?

Semantic Web technologies

To discover answers to these questions, we need to make the Embarkation Roll data available in a more regular and granular form, and then construct ad hoc queries that a computer can process against it. Thankfully, people have been thinking about this sort of problem for a while, and Semantic Web technologies offer us some solutions.

Specifically, the Resource Description Framework (RDF) offers a way of storing data such as that contained in the Embarkation Rolls in a regular and granular form, while query languages such as SPARQL allow us to run ad hoc queries against this data to retrieve meaningful results.
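
To make "regular and granular" concrete, here is a minimal sketch in plain Python of how RDF breaks data down into subject-predicate-object statements, and how a query pattern selects from them. The predicate names awm:hasMembership and awm:unitTitle are taken from the query later in this section; the ex: identifiers, the unit title and the match helper are made up for illustration and are not part of the actual dataset or of any RDF library.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# All ex: identifiers and the title below are hypothetical sample data.
triples = {
    ("ex:person1", "awm:hasMembership", "ex:unitA"),
    ("ex:person2", "awm:hasMembership", "ex:unitA"),
    ("ex:unitA", "awm:unitTitle", "Example Unit A"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "Which people are members of ex:unitA?"
members = match(triples, p="awm:hasMembership", o="ex:unitA")
print(len(members))
```

A real triple store does the same kind of pattern matching, only over far more statements and with proper indexing.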

Other benefits of using a Semantic Web approach are:

  • The ability to merge our knowledge with other domains of knowledge (e.g. third-party websites) which overlap with ours, and to query the merged datasets
  • The ability to offer sophisticated browsing across and between the collections which are described in the merged datasets
  • The ability to improve and extend our underlying model without requiring extensive data manipulation or migration when we do so
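
The first of these benefits can be sketched in a few lines of Python. In the simplest case (no blank nodes), merging two RDF datasets amounts to taking the union of their statements, provided both use the same identifier for the same thing. Everything in this sketch is hypothetical: the ex: identifiers, the ex:bornIn predicate and the third-party data are invented purely to show the idea.

```python
# Hypothetical data: two independently maintained sets of triples that
# happen to use the same identifier (ex:person1) for the same person.
ours = {
    ("ex:person1", "awm:hasMembership", "ex:unitA"),
}
third_party = {
    ("ex:person1", "ex:bornIn", "ex:placeX"),
}

# With no blank nodes involved, merging is a plain set union of triples.
merged = ours | third_party

# Everything now known about ex:person1, across both sources.
about_person1 = sorted((p, o) for s, p, o in merged if s == "ex:person1")
print(about_person1)
```

No schema change or data migration was needed to accommodate the new predicate, which is also the point of the third benefit listed above.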

Querying the data

Armed with these technologies, we can start to formulate ways of asking questions of our data. The SPARQL query below demonstrates how we can determine the strength (number of personnel) of the various units involved in the embarkations; its results are shown beneath the query.

Once we have results such as these, we can present them visually, for example as the pie chart shown below.

Pie chart summarising the strength of units.

# The awm: prefix is assumed to be declared elsewhere (e.g. as a repository namespace)
SELECT ?unitTitle (COUNT(?unitTitle) AS ?numberUnits)
WHERE {
  ?person awm:hasMembership ?unit .
  ?unit awm:unitTitle ?unitTitle .
}
GROUP BY ?unitTitle
ORDER BY DESC(?numberUnits) ?unitTitle
SPARQL query used to produce data for the pie chart above

Query results as returned by Sesame Open RDF Workbench (truncated)
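
For readers unfamiliar with SPARQL, the following Python sketch mirrors what the query above computes: it joins each membership to its unit's title, counts memberships per title, and sorts by descending count and then by title. The sample triples are hypothetical and do not reflect the real Embarkation Roll figures; the sketch also assumes, for simplicity, that each unit has exactly one title.

```python
from collections import Counter

# Hypothetical sample data; not taken from the Embarkation Rolls.
triples = [
    ("ex:person1", "awm:hasMembership", "ex:unitA"),
    ("ex:person2", "awm:hasMembership", "ex:unitA"),
    ("ex:person3", "awm:hasMembership", "ex:unitB"),
    ("ex:unitA", "awm:unitTitle", "Example Unit A"),
    ("ex:unitB", "awm:unitTitle", "Example Unit B"),
]

# Pattern: ?unit awm:unitTitle ?unitTitle  (one title per unit assumed)
titles = {s: o for s, p, o in triples if p == "awm:unitTitle"}

# Pattern: ?person awm:hasMembership ?unit, joined on ?unit,
# then GROUP BY ?unitTitle with a COUNT per group.
counts = Counter(
    titles[o] for s, p, o in triples
    if p == "awm:hasMembership" and o in titles
)

# ORDER BY DESC(?numberUnits) ?unitTitle
rows = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
for unit_title, number in rows:
    print(unit_title, number)
```

Note that the value the query names ?numberUnits is really the number of membership statements, and so of personnel, recorded against each unit title.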