Tutorial

We will demonstrate in this tutorial how to query our triplestore, and return the results in a form that we can then manipulate for the purposes of visualisation.

Creating our query

Once we have the Embarkation Roll data transformed into RDF triples and stored in a triplestore, we can query our triplestore using SPARQL queries. For example, to determine which countries New Zealander personnel died in during WWI, we could use a query such as the following:

PREFIX awm:<http://muse.aucklandmuseum.com/databases/cenotaph#>

SELECT *
WHERE
{
 {
  SELECT ?namePosition (COUNT(?namePosition) AS ?numberDied)
  WHERE {
    ?person awm:diedAt_country ?diedAt .
    ?diedAt awm:addressConstitutedOf ?addressConstitutedOf .
    ?addressConstitutedOf awm:namePosition ?namePosition .
  }
  GROUP BY ?namePosition
  HAVING (COUNT(?namePosition) > 3)
  ORDER BY DESC(?numberDied) ?namePosition
 }
UNION
 {
  SELECT ?namePosition (COUNT(?numberDied) as ?numberDied)
  WHERE
  {
   SELECT ?namePosition (COUNT(?namePosition) AS ?numberDied)
   WHERE {
     ?person awm:diedAt_country ?diedAt .
     ?diedAt awm:addressConstitutedOf ?addressConstitutedOf .
     ?addressConstitutedOf awm:namePosition ?nameOfPosition .
     BIND ( "Other" AS ?namePosition)
   }
   GROUP BY ?nameOfPosition ?namePosition
   HAVING (COUNT(?namePosition) <= 3)
   ORDER BY DESC(?numberDied) ?namePosition
  }
  GROUP BY ?namePosition
 }
}       

This query is slightly complicated because of the union which we use to show individually those countries where more than three personnel died (the first part of the union) as well as combining those countries where three or less personnel died into an "Other" result item (the second part of the union), but it does demonstrate some of the functionality available with SPARQL.

This query would provide us with a result similar to the following (as displayed by the Sesame OpenRDF Workbench):

Obtaining results in a more useful format

Using software such as Sesame is fine if we are only interested in storing and querying the triplestore locally.

However, what we really want is to be able to query the triplestore remotely, and to have the results returned in a regular form that we can manipulate programmatically. As Sesame allows us to expose a SPARQL end-point for remote querying, with a bit of extra work and the help of the phpSesame library, we can write a script to query a remote installation of Sesame and have results returned as either XML or JSON, which we can parse more easily for visualisation purposes:

{
  "head": {
    "vars": [ "namePosition", "numberDied" ]
  }, 
  "results": {
    "bindings": [
      {
        "namePosition": { "type": "literal", "value": "France" }, 
        "numberDied": { "type": "typed-literal", "datatype": "http:\/\/www.w3.org\/2001\/XMLSchema#integer", "value": "4469" }
      }, 
      {
        "namePosition": { "type": "literal", "value": "Belgium" }, 
        "numberDied": { "type": "typed-literal", "datatype": "http:\/\/www.w3.org\/2001\/XMLSchema#integer", "value": "2759" }
      }, 
      {
        "namePosition": { "type": "literal", "value": "Turkey" }, 
        "numberDied": { "type": "typed-literal", "datatype": "http:\/\/www.w3.org\/2001\/XMLSchema#integer", "value": "2061" }
      }, 
      {
        "namePosition": { "type": "literal", "value": "New Zealand" }, 
        "numberDied": { "type": "typed-literal", "datatype": "http:\/\/www.w3.org\/2001\/XMLSchema#integer", "value": "489" }
      }, 
      {
        "namePosition": { "type": "literal", "value": "Greece" }, 
        "numberDied": { "type": "typed-literal", "datatype": "http:\/\/www.w3.org\/2001\/XMLSchema#integer", "value": "45" }
      }, 
      {
        "namePosition": { "type": "literal", "value": "Auckland" }, 
        "numberDied": { "type": "typed-literal", "datatype": "http:\/\/www.w3.org\/2001\/XMLSchema#integer", "value": "7" }
      }, 
      {
        "namePosition": { "type": "literal", "value": "England" }, 
        "numberDied": { "type": "typed-literal", "datatype": "http:\/\/www.w3.org\/2001\/XMLSchema#integer", "value": "7" }
      }, 
      {
        "namePosition": { "type": "literal", "value": "Australia" }, 
        "numberDied": { "type": "typed-literal", "datatype": "http:\/\/www.w3.org\/2001\/XMLSchema#integer", "value": "6" }
      }, 
      {
        "namePosition": { "type": "literal", "value": "Palestine" }, 
        "numberDied": { "type": "typed-literal", "datatype": "http:\/\/www.w3.org\/2001\/XMLSchema#integer", "value": "5" }
      }, 
      {
        "namePosition": { "type": "literal", "value": "Egypt" }, 
        "numberDied": { "type": "typed-literal", "datatype": "http:\/\/www.w3.org\/2001\/XMLSchema#integer", "value": "4" }
      }, 
      {
        "namePosition": { "type": "literal", "value": "Other" }, 
        "numberDied": { "type": "typed-literal", "datatype": "http:\/\/www.w3.org\/2001\/XMLSchema#integer", "value": "33" }
      }
    ]
  }
}       

A script to run our query

To acheive the above results, we can fashion a page such as the following:

<?php 
  /* build the SPARQL query*/
  $sparql = '
    PREFIX awm:<http://muse.aucklandmuseum.com/databases/cenotaph#>

    SELECT *
    WHERE
    {
       {
          SELECT ?namePosition (COUNT(?namePosition) AS ?numberDied)
          WHERE {
            ?person awm:diedAt_country ?diedAt .
            ?diedAt awm:addressConstitutedOf ?addressConstitutedOf .
            ?addressConstitutedOf awm:namePosition ?namePosition .
          }
          GROUP BY ?namePosition
          HAVING (COUNT(?namePosition) > 3)
          ORDER BY DESC(?numberDied) ?namePosition
       }
    UNION
       {
          SELECT ?namePosition (COUNT(?numberDied) as ?numberDied)
          WHERE
          {
             SELECT ?namePosition (COUNT(?namePosition) AS ?numberDied)
             WHERE {
               ?person awm:diedAt_country ?diedAt .
               ?diedAt awm:addressConstitutedOf ?addressConstitutedOf .
               ?addressConstitutedOf awm:namePosition ?nameOfPosition .
               BIND ( "Other" AS ?namePosition)
             }
             GROUP BY ?nameOfPosition ?namePosition
             HAVING (COUNT(?namePosition) <= 3)
             ORDER BY DESC(?numberDied) ?namePosition
          }
          GROUP BY ?namePosition
       }
    }';

  /* retrieve the results of the SPARQL query */

  $url = 'http://cenotaph.mebooks.co.nz/endpoint.php';
  $ch = curl_init( $url );

  curl_setopt( $ch, CURLOPT_POST, true );
  // 'format' can either be 'json' or 'xml' (defaults to 'xml' if not specified).
  curl_setopt( $ch, CURLOPT_POSTFIELDS, array('query' => urlencode($sparql), 'format' => $_GET['format'], 'user' => '1234') );
  curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true );
  curl_setopt( $ch, CURLOPT_HEADER, true );
  curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true );          
  curl_setopt( $ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT'] );
  // supress the "100 Continue" header.
  curl_setopt( $ch, CURLOPT_HTTPHEADER, array( 'Expect:' ) );

  $response = curl_exec( $ch );
  list( $header, $contents ) = preg_split( '/([\r\n][\r\n])\\1/', $response, 2 );
  $status = curl_getinfo( $ch );
  curl_close( $ch );
  
  // spit out the results.
  echo "$contents;\n";          

Running this script allows us to either obtain the results as JSON or obtain the results as XML. Note that we have opted to use a scripting language here, as using javascript is problematic because of the cross-domain restriction and the fact that we are doing a POST.

Note also that we have passed three parameters as part of our POST request:

  • query: the SPARQL query
  • format: the desired return format ('json' or 'xml')
  • user: the token identifying us as a user.

The user token allows the service to track usage, which will be important if the performance of the service becomes affected by long-running queries. In order to obtain a valid value for the user token, please email us at mebooks.consulting@gmail.com and we'll supply a token to allow you to query this service.

Visualising our results

Finally, with the use of a javascript charting library such as highcharts, we can create a script such as pie-country-of-death.php (which includes retrieve_sparql.php) to transform this JSON into a more easily-understood graphic (click on the graphic to see the script run in realtime):

Other resources