AGLDWG LD Catalogue

This is the Australian Government Linked Data Working Group's instance of the ld.cat tool.

The following content is the same as the README page from this system's code repository.

LD Cat

A simple Linked Data catalogue tool that contains both a harvester and a web display framework.

This web-based catalogue tool harvests the metadata for, and lists, Datasets, Linksets, definitional items and anything else available as Linked Data that it is "pointed at" (i.e. given the identifying URI of). It only contains information extracted from the items via their URIs and doesn't store anything locally, except for caching purposes.

Instances

This tool is used for the following catalogues:

The items that constitute each catalogue differ per instance and are specified per installation in a config file. The branding each instance uses for its web pages is maintained in branches of this repository.

Catalogue Implementation

Harvester

The harvesting component of this catalogue uses a series of very simple Python programming language scripts to collect metadata for Datasets, Linksets, ontologies and tools from their points of truth. It is able to do this very simply since all of those items present basic DCAT (revised) metadata at easy-to-find web addresses, using the RDF Turtle data format.

For example, the GNAF Dataset is online at the persistent URI of http://linked.data.gov.au/def/gnaf and its DCAT (rev.) metadata is accessible by adding the Query String Arguments _view and _format to that URI: http://linked.data.gov.au/def/gnaf?_view=dcat&_format=text/turtle.
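The retrieval above can be sketched with the Requests module. This is an illustrative snippet, not the harvester's actual code; the function names are the author's own, and only the _view/_format query arguments come from the example above.

```python
import requests

# Query String Arguments that select the DCAT (rev.) view in RDF Turtle
DCAT_PARAMS = {"_view": "dcat", "_format": "text/turtle"}

def dcat_url(item_uri):
    # Build (without sending) the URL the harvester would request
    return requests.Request("GET", item_uri, params=DCAT_PARAMS).prepare().url

def fetch_dcat_turtle(item_uri):
    # Retrieve the item's DCAT metadata as an RDF Turtle string
    response = requests.get(item_uri, params=DCAT_PARAMS)
    response.raise_for_status()
    return response.text
```

Called with the GNAF URI, dcat_url() reproduces the example address given above.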

The harvester uses Python's Requests module to retrieve each item's DCAT (rev.) RDF and stores it in a Python rdflib data graph. It then applies rule-based reasoning to that graph using the OWL-RL Web Ontology Language (OWL) rule engine to create generic properties from the items' specialised ones.

The harvester validates each item's RDF data using pySHACL, a SHACL validator, comparing the information it retrieves to 'shapes' templates of expected information.

When done, the harvester stores the items' validated information in a single on-disk graph that it can use to service catalogue requests for information (see next section).

Web catalogue

This catalogue uses a very simple Python Flask HTTP framework instance to service requests for the information it contains. In general, it receives a request (someone or some tool clicking on a web link at http://{IMPLEMENTATION_URI}/...) and translates that into a Python function call that accesses the information the catalogue contains. Since the catalogue stores all of its information within an RDF data graph, it uses either direct rdflib graph traversal (looping over triples) or SPARQL queries facilitated by rdflib.

Sitemap

The full sitemap of the LocI project's implementation of this catalogue is:

Dependencies

See the requirements.txt standard Python dependency listing file.

License

This code is licensed under the GPL v3 licence. See the LICENSE file for the deed.

Contacts

Author:
Nicholas Car
Senior Experimental Scientist
CSIRO Land & Water, Environmental Informatics Group
nicholas.car@csiro.au

Co-maintainer:
Edmond Chuc
Junior Developer
CSIRO Land & Water, Environmental Informatics Group
edmond.chuc@csiro.au