Introduction

Apache Marmotta can optionally be extended with a Linked Data Caching Module that retrieves resources from the Linked Data Cloud when they are needed (e.g. when a query follows a relation to a remote resource) and caches them locally, without any action by the caller. Linked Data Caching is integrated at the triple store level and is therefore available to all services that access the triple store, including the SPARQL endpoint.

Functionality

Transparent Linked Data Access

The Linked Data Caching module is a powerful component for integrating with other Linked Data servers, either on the public Linked Data Cloud or on local servers deployed in an intranet. It provides transparent access to resources on Linked Data-aware servers. When enabled, it is triggered whenever the triples of a non-local resource are requested from the triple store. A typical case is a local resource that links to a resource in the Linked Data Cloud through a triple, combined with a query that requests information about this external resource.

For example, the FOAF description of a user in Marmotta might contain a reference to the FOAF file of a user at some external location:

		local:peter foaf:knows <http://example.com/john.rdf>
	

A query could then ask for all the names of persons that peter knows, e.g. in SPARQL:

		SELECT ?name WHERE { local:peter foaf:knows ?p . ?p foaf:name ?name }
	

When evaluating the query, the Linked Data Caching module would then transparently retrieve the resource http://example.com/john.rdf and try to answer the query using the triples contained therein.
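The lookup described above can be illustrated with a small, self-contained sketch. It is not Marmotta's implementation: `CachingStore`, `triples_for`, `names_known_by` and the injected `fetch` function are hypothetical names chosen for this example, and "local" is assumed to mean "the subject starts with a configured prefix".

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


class CachingStore:
    """Toy triple store that retrieves non-local subjects on demand."""

    def __init__(self, local_prefix: str, fetch: Callable[[str], List[Triple]]):
        self.local_prefix = local_prefix  # assumption: locality is decided by prefix
        self.fetch = fetch                # hypothetical hook that retrieves remote triples
        self.triples: List[Triple] = []
        self.cached: Dict[str, List[Triple]] = {}

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.append((s, p, o))

    def triples_for(self, subject: str) -> List[Triple]:
        # Local subjects are answered directly from the store.
        if subject.startswith(self.local_prefix):
            return [t for t in self.triples if t[0] == subject]
        # Remote subjects are fetched once and then served from the cache.
        if subject not in self.cached:
            self.cached[subject] = self.fetch(subject)
        return self.cached[subject]


def names_known_by(store: CachingStore, person: str) -> List[str]:
    """Mirror of the SPARQL query above: names of everyone `person` knows."""
    names: List[str] = []
    for _, predicate, friend in store.triples_for(person):
        if predicate == "foaf:knows":
            names += [o for _, q, o in store.triples_for(friend) if q == "foaf:name"]
    return names
```

With a store that contains only the `local:peter foaf:knows` triple, calling `names_known_by(store, "local:peter")` invokes `fetch` for http://example.com/john.rdf on the first call and answers from the cached triples afterwards.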

Local Caching

When the triple data of a remote resource are retrieved, they are cached locally in Marmotta's triple store in a special named graph called "cache", together with provenance and cache expiry information. The cache graph is special in two respects: by default, resources in this graph are not considered for direct indexing in the semantic search (i.e. a foaf:Person retrieved from the Linked Data Cloud will not be returned by the Semantic Search component), and triples and resources in this graph are not included in versioning.

A query to a resource is answered from the local cache as long as the entry is not expired, i.e. subsequent queries to a resource will be significantly faster than the first query as long as they are carried out within a certain time frame. The expiry date of a resource in the cache is determined in two ways:
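As a rough illustration of the behaviour just described (answering from the cache until the entry expires), the following sketch stores an expiry timestamp next to each cached entry. It deliberately does not model how the expiry date is determined: the fixed `ttl_seconds` value, the `ExpiringCache` class and the injected `fetch` and `clock` functions are assumptions made for this example only.

```python
import time
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


class ExpiringCache:
    """Toy cache: an entry is reused until its expiry timestamp has passed."""

    def __init__(self, fetch: Callable[[str], List[Triple]],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.time):
        self.fetch = fetch        # hypothetical hook that retrieves remote triples
        self.ttl = ttl_seconds    # assumption: one fixed lifetime for every entry
        self.clock = clock        # injectable so the behaviour can be tested
        self.entries: Dict[str, Tuple[float, List[Triple]]] = {}

    def get(self, uri: str) -> List[Triple]:
        now = self.clock()
        entry = self.entries.get(uri)
        if entry is not None and now < entry[0]:
            return entry[1]       # not expired: answer from the local cache
        triples = self.fetch(uri) # missing or expired: retrieve again
        self.entries[uri] = (now + self.ttl, triples)
        return triples
```

Repeated calls to `get` within the lifetime of an entry never touch the remote server, which is why subsequent queries are faster than the first one.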

Modes of Operation

Because the Linked Data Cloud is distributed over the Web, some services may not offer the necessary reliability or availability, and some servers are not yet Linked Data aware. The Linked Data Caching Module therefore offers different modes of operation for accessing resources:

The different modes of operation can be configured as described below. Besides RDF/XML, the Linked Data Caching Module supports other RDF serialisation formats, which can be configured for individual Linked Data servers.

Configuration

The Linked Data Caching Module can be configured using the Marmotta configuration mechanism. The following section describes the available configuration options.

Configuration Options

The following options affect the general behaviour of the Linked Data Caching Module:

Endpoint Configuration

The Linked Data Caching Module lets you define mappings from URI prefixes to endpoints. These mappings can be used either to redirect Linked Data queries to cache or SPARQL endpoints, in order to improve performance or to access non-Linked Data resources, or to set different parameters for certain servers. An endpoint definition consists of the following parameters:
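To make the idea of a URI prefix to endpoint mapping concrete, here is a minimal sketch of how a resource URI could be matched against configured prefixes. The `Endpoint` fields (`name`, `prefix`, `url`) and the longest-prefix-wins rule are assumptions made for illustration; they are not taken from Marmotta's actual endpoint definition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    name: str    # hypothetical: human-readable label
    prefix: str  # URI prefix this endpoint is responsible for
    url: str     # hypothetical: where requests for matching URIs are sent


def find_endpoint(uri: str, endpoints: List[Endpoint]) -> Optional[Endpoint]:
    """Return the endpoint with the longest matching prefix, or None."""
    matches = [e for e in endpoints if uri.startswith(e.prefix)]
    # Assumption: the most specific (longest) prefix takes precedence.
    return max(matches, key=lambda e: len(e.prefix), default=None)


endpoints = [
    Endpoint("example", "http://example.com/", "http://cache.example.org/lookup"),
    Endpoint("example-people", "http://example.com/people/", "http://example.com/sparql"),
]
```

A URI that matches no configured prefix yields `None`, in which case a caller would presumably fall back to plain Linked Data retrieval.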

Webservice

Endpoints are configured using the Linked Data Caching webservice, available under <APPDIR>/cache/endpoint. The webservice provides the following operations for managing endpoints:

In addition to managing endpoints, the webservice also offers two methods for retrieving resources:

For your convenience, the Marmotta administration interface offers a simple UI for adding, listing and removing endpoint definitions.