For a current project we need to load data from an RDF graph store to populate tables in an existing SQL database. The approach we chose is to use SPARQL SELECT queries to express the mapping from the graph to tabular model. The CSV results from such a query can be used to populate the tables in the target database.
One of the columns in the target table contains coded values, where each code indicates the ‘type’ of the thing the row describes.
The codes are two or three letters (e.g. FE
, FA
, NMF
) where each code has a predetermined meaning.
However the RDF data used more types than in the target database, so it was necessary to create an n:1 mapping.
An example of such mappings using Schema.org classes is:
schema:Person --> "PE"
schema:Book --> "BK"
schema:MusicAlbum --> "ALB"
schema:TVClip --> "TV"
schema:TVSeries --> "TV"
schema:TVEpisode --> "TV"
One option would be to express these mappings in RDF, load them to the graph store and query over them. The above mappings might be written in Turtle as:
@prefix schema: <http:></http:> .
@prefix dct: <http:></http:> .
schema:Person dct:identifier "PE" .
schema:Book dct:identifier "BK" .
schema:MusicAlbum dct:identifier "ALB" .
schema:TVClip dct:identifier "TV" .
schema:TVSeries dct:identifier "TV" .
schema:TVEpisode dct:identifier "TV" .
If this was loaded to the RDF graph store along with the dataset, we can do a query like this:
PREFIX schema: <http:></http:>
SELECT ?name ?type_code
WHERE {
?s a ?type ;
schema:name ?name .
?type dct:identifier ?type_code .
}
In case it makes sense to somehow partition the data, the code list could be loaded to a named graph, or to a separate SPARQL endpoint and use federation.
However in this case, as the value list is only really applicable to the target database, we decided to use the VALUES
clause in SPARQL to associate the code to the class.
In some ways this is simpler as all the logic is encapsulated in a single SPARQL query, which should be easier to maintain.
PREFIX schema: <http:></http:>
SELECT ?name ?type_code
WHERE {
VALUES (?type ?type_code) {
(schema:Person "PE")
(schema:Book "BK")
(schema:MusicAlbum "ALB")
(schema:TVClip "TV")
(schema:TVSeries "TV")
(schema:TVEpisode "TV")
}
?s a ?type ;
schema:name ?name .
}
Should it be necessary to have some ‘otherwise’ case, the VALUES
claue can be wrapped in an OPTIONAL
clause, then use COALESCE
function to provide a default binding for any solution where ?type_code
variable is not bound:
PREFIX schema: <http:></http:>
SELECT ?name ?type_code_default
WHERE {
?s a ?type ;
schema:name ?name .
OPTIONAL {
VALUES (?type ?type_code) {
(schema:Person "PE")
(schema:Book "BK")
(schema:MusicAlbum "ALB")
(schema:TVClip "TV")
(schema:TVSeries "TV")
(schema:TVEpisode "TV")
}
}
BIND (COALESCE(?type_code, "NK") as ?type_code_default)
}