Aggregates functions like SUM
, MIN
, MAX
, AVG
and GROUP_CONCAT
provide an easy way to aggregate over
a complete dataset, or groups within a dataset in combination with a GROUP BY
clause. This provides an easy
way to summarise information, but often we want to know more about specific solutions within a group.
For demo purposes, consider a dataset consisting of information about books, an excerpt of which is shown below:
base <http://example.com/id/>
prefix schema: <http://schema.org/>
<book/1> a schema:Book ;
schema:name "The Hobbit" ;
schema:genre "Fantasy" ;
schema:pages 367 ;
schema:author <person/1> .
<book/2> a schema:Book ;
schema:name "The Lord Of The Rings: The Fellowship Of The Ring" ;
schema:genre "Fantasy" ;
schema:pages 404 ;
schema:author <person/1> .
<book/2> a schema:Book ;
schema:name "The Lord Of The Rings: The Two Towers" ;
schema:genre "Fantasy" ;
schema:pages 450 ;
schema:author <person/1> .
<book/2> a schema:Book ;
schema:name "The Lord Of The Rings: The Return Of The King" ;
schema:genre "Fantasy" ;
schema:pages 496 ;
schema:author <person/1> .
<person/1> a schema:Person ;
schema:name "J.R.R. Tolkien" .
Use case: finding the longest book in each genre.
Solutions:
MAX
aggregate, then split the during projection to give filenameprefix schema: <http://schema.org/>
select ?genre (strafter(max(concat(?Book_pages, "//", ?Book_name)), "//") as ?Book_pages_name)
where {
[] a schema:Book ;
schema:name ?name ;
schema:pages ?pages ;
schema:genre ?genre .
}
group by ?genre
This quick approach can satisfy many simpler use cases, but starts to fall apart when you want to return more information about the solution within a group (especially if that information consists of RDF terms that are not strings or plain literals). Also this approach will return once solution from each group, in this case the book with most pages and the alphabetically last name. In some use cases, we may want to know all books with the maximum number of pages within each group.
prefix schema: <http://schema.org/>
select *
where {
{
# find max pages per genre
select ?genre (max(?pages) as ?max_pages) {
[] a schema:Book ;
schema:genre ?genre ;
schema:pages ?pages .
}
group by ?genre
}
?Book a schema:Book ;
schema:name ?name ;
schema:genre ?genre ;
schema:pages ?max_pages .
}
Use case: How many authors have X number of books?
Solutions:
As demonstrated in the query:
prefix schema: <http://schema.org/>
select ?count_books (count(*) as ?count_authors)
where {
select ?author (count(*) as ?count_books) {
[] a schema:Book ;
schema:author ?author .
}
group by ?author
}
group by ?count_books