SPARQL

SPARQL (SPARQL Protocol and RDF Query Language) is a query language for RDF.

SPARQL endpoints are services, identified by URLs, that can process queries and return results.
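Under the SPARQL 1.1 Protocol, a query can be sent to an endpoint as an HTTP GET request that carries the query text in the `query` parameter. The sketch below only builds such a request with the Python standard library and does not send it. It assumes the DBpedia endpoint URL used later in this section, and the helper name `build_sparql_request` is purely illustrative. The `Accept` header asks for the SPARQL JSON results format.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://dbpedia.org/sparql"  # the endpoint queried later in this section


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Hypothetical helper: build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL 1.1 JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = build_sparql_request(ENDPOINT, "SELECT ?name WHERE { ?x rdfs:label ?name } LIMIT 10")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would actually send it; libraries such as SPARQLWrapper wrap this step for you.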

This allows an organization to offer access to its data both interactively and, more importantly, in an automated manner.

An increasing number of endpoints are becoming available, and several sources list available SPARQL endpoints, such as:

The DBpedia project extracts structured data from Wikipedia and makes this data available in various formats:

Looking at a sample page such as http://dbpedia.org/page/Spain, we find various properties that appear to be associated with countries:

In the following example we will query the endpoint to get a list of countries with their populations and capitals.

SPARQL queries match patterns against triples, so a query can be read as a list of restrictions, each of which narrows down the result further. This is only a way of reading the query; it does not mean that the endpoint actually executes it in this order.
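To make the "list of restrictions" reading concrete, here is a toy matcher in plain Python. It is not how a real triple store evaluates queries. It assumes a hypothetical in-memory list of triples with made-up identifiers, treats terms starting with `?` as variables, and applies the patterns one at a time, so each pattern can only keep (and extend) the bindings that survived the previous ones. The function names are illustrative.

```python
# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("ex:Spain", "rdfs:label", "Spain"),
    ("ex:Spain", "dbo:capital", "ex:Madrid"),
    ("ex:Madrid", "rdfs:label", "Madrid"),
    ("ex:Paella", "rdfs:label", "Paella"),
]


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Unify one triple pattern with one triple; return the extended binding or None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def evaluate(patterns, triples):
    """Apply the patterns in order; each one filters and extends the current bindings."""
    bindings = [{}]
    for pattern in patterns:
        narrowed = []
        for b in bindings:
            for t in triples:
                extended = match_pattern(pattern, t, b)
                if extended is not None:
                    narrowed.append(extended)
        bindings = narrowed
    return bindings


# Names of all things, then only the names of things that have a capital.
all_names = evaluate([("?x", "rdfs:label", "?name")], TRIPLES)
with_capital = evaluate([("?x", "rdfs:label", "?name"), ("?x", "dbo:capital", "?c")], TRIPLES)
print(sorted(b["?name"] for b in all_names))
print([b["?name"] for b in with_capital])
```

Adding the second pattern removes Madrid and Paella, mirroring how the queries below add `?x dbo:capital ?c` to narrow the result.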

The following queries can be executed on the DBpedia SPARQL endpoint. We use LIMIT to restrict the number of results so as not to overload the endpoint.

For example, to select the names of all things in the triple store (only the first 10 are returned):

SELECT ?name WHERE { ?x rdfs:label ?name } LIMIT 10

And to narrow it down to things with a capital:

SELECT ?name WHERE { ?x rdfs:label ?name . ?x dbo:capital ?c } LIMIT 10

Since DBpedia is multilingual, we narrow it down further to the English versions of the names:

SELECT ?name WHERE { ?x rdfs:label ?name . ?x dbo:capital ?c filter(lang(?name) = 'en') } LIMIT 10
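Labels in DBpedia carry a language tag, and in the SPARQL 1.1 JSON results format a tagged literal appears with an `xml:lang` key. To show what `filter(lang(?name) = 'en')` keeps, the sketch below applies the equivalent check client-side to a hypothetical, hand-written list of bindings. It is only an illustration; filtering on the endpoint, as in the query above, is what keeps the transferred result small.

```python
# Hypothetical bindings for ?name, shaped like the SPARQL 1.1 JSON results format.
bindings = [
    {"name": {"type": "literal", "xml:lang": "en", "value": "Spain"}},
    {"name": {"type": "literal", "xml:lang": "es", "value": "España"}},
    {"name": {"type": "literal", "xml:lang": "de", "value": "Spanien"}},
    {"name": {"type": "literal", "value": "ES"}},  # no language tag
]


def lang(term):
    """Like SPARQL's lang(): the language tag, or "" for an untagged literal."""
    return term.get("xml:lang", "")


english = [b for b in bindings if lang(b["name"]) == "en"]
print([b["name"]["value"] for b in english])
```

Note that untagged literals have an empty language tag, so the filter drops them as well.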

Add some prefixes, do a little more browsing through the DBpedia page on Spain, and we can formulate our query.
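Prefixes are only abbreviations for namespace IRIs. One way to keep them in one place in Python is a small helper that prepends `PREFIX` declarations to a query body. The helper name `with_prefixes` and the `PREFIXES` dictionary are hypothetical; the three namespaces are the ones used in the full query below.

```python
# Namespaces used in the full query below.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbo": "http://dbpedia.org/ontology/",
    "dbp": "http://dbpedia.org/property/",
}


def with_prefixes(body: str, prefixes: dict = PREFIXES) -> str:
    """Hypothetical helper: prepend one PREFIX line per namespace to a query body."""
    header = "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in prefixes.items())
    return header + "\n" + body


query = with_prefixes("SELECT ?name WHERE { ?x rdfs:label ?name . ?x dbo:capital ?c } LIMIT 10")
print(query)
```

The resulting string can be passed to `sparql.setQuery()` in the same way as the hand-written query below.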

To automate the process in a Jupyter notebook, we use the Python SPARQLWrapper package to access the endpoint.

  1. Prepare the query:

        from SPARQLWrapper import SPARQLWrapper, JSON, CSV

        # DBpedia's public SPARQL endpoint
        sparql = SPARQLWrapper("http://dbpedia.org/sparql")
        sparql.setQuery("""
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            PREFIX dbo: <http://dbpedia.org/ontology/>
            PREFIX dbp: <http://dbpedia.org/property/>
            SELECT ?place ?pop ?capital
            WHERE {
                ?x dbp:populationCensus ?pop .
                ?x dbo:capital ?c .
                ?x rdfs:label ?place .
                ?c rdfs:label ?capital
                filter(lang(?place) = 'en')
                filter(lang(?capital) = 'en')
            }
            LIMIT 20
        """)
        # Ask for the results in the SPARQL JSON results format
        sparql.setReturnFormat(JSON)

  2. Execute the query (wait a few seconds between executions of this statement):

        # Sends the request and parses the JSON response into a Python dict
        data = sparql.query().convert()

  3. Display the results (accessing the proper fields of the JSON record is somewhat tedious):

        # Each binding maps a variable name to a dict with "type" and "value" keys
        for res in data["results"]["bindings"]:
            print(res["place"]["value"], res["pop"]["value"], res["capital"]["value"])
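Since picking `["value"]` out of every binding is tedious, a small helper can flatten the JSON results into one dictionary per row. This sketch assumes that `data` has the structure of the SPARQL 1.1 JSON results format, which is what `convert()` returns after `setReturnFormat(JSON)`. The helper name `rows` is hypothetical, and the sample `data` is hand-written in that shape (values taken from the first line of the output below) so the example runs without contacting the endpoint.

```python
def rows(data):
    """Hypothetical helper: flatten SPARQL JSON results into one {variable: value} dict per row."""
    variables = data["head"]["vars"]
    return [
        # A variable can be unbound in a row (e.g. with OPTIONAL); it becomes None here.
        {v: b[v]["value"] if v in b else None for v in variables}
        for b in data["results"]["bindings"]
    ]


# Hand-written sample in the SPARQL 1.1 JSON results format.
data = {
    "head": {"vars": ["place", "pop", "capital"]},
    "results": {"bindings": [
        {"place": {"type": "literal", "xml:lang": "en", "value": "Albania"},
         "pop": {"type": "literal", "value": "2821977"},
         "capital": {"type": "literal", "xml:lang": "en", "value": "Tirana"}},
    ]},
}

for row in rows(data):
    print(row["place"], row["pop"], row["capital"])
```

With the real `data` from step 2, the same loop prints the full result list.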

And there you have it, although the list includes various types of populated places:

    Albania 2821977 Tirana
    Geography of Saint Pierre and Miquelon 7036 Saint-Pierre, Saint Pierre and Miquelon
    Anguilla 13452 The Valley, Anguilla
    Nevis 12106 Charlestown, Nevis
    Algeria 37900000 Algiers
    Greece 10815197 Athens
    Egypt 72798000 Cairo
    Indonesia 237424363 Jakarta
    Angola 25789024 Luanda
    Norway 5214890 Oslo
    Czech Republic 10436560 Prague
    Latvia 2070371 Riga
    Italy 59433744 Rome
    Bosnia and Herzegovina 3531159 Sarajevo
    Nigeria 140431790 Abuja
    Ghana 24200000 Accra
    Madagascar 12238914 Antananarivo
    Mali 14517176 Bamako
    Central African Republic 4987640 Bangui
    Belize 324528 Belmopan
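All values come back as strings, so sorting by population needs a conversion first; otherwise the sort is lexicographic. The sketch below uses a few (place, population, capital) rows copied from the output above. It assumes every population value is a plain integer string, which holds for these rows but is not guaranteed for every DBpedia value.

```python
# A few rows copied from the output above: (place, population, capital), all strings.
results = [
    ("Albania", "2821977", "Tirana"),
    ("Indonesia", "237424363", "Jakarta"),
    ("Belize", "324528", "Belmopan"),
    ("Nigeria", "140431790", "Abuja"),
]

# Sorting the raw strings compares them character by character ("3..." beats "2...").
as_strings = sorted(results, key=lambda r: r[1], reverse=True)

# Converting to int gives a numeric sort, largest population first.
by_population = sorted(results, key=lambda r: int(r[1]), reverse=True)
for place, pop, capital in by_population:
    print(f"{place:<12} {int(pop):>12,} {capital}")
```

The string sort wrongly puts Belize first, while the numeric sort starts with Indonesia.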

Exercises: