SPARQL
SPARQL (SPARQL Protocol and RDF Query Language) is a query language for RDF.
SPARQL endpoints are services that are identified by URLs and are capable of processing queries and returning results.
In this manner an organization can provide access to its data both interactively and, more importantly, in an automated manner.
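To make the idea of an endpoint as a URL concrete, here is a minimal sketch of how a query can be turned into an HTTP request. It assumes the GET binding of the SPARQL 1.1 Protocol, where the query text travels in the `query` parameter and the result format is requested with an `Accept` header; the helper name `build_sparql_request` is made up for illustration. The request is only built here, not sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_sparql_request(endpoint, query):
    """Return (url, headers) for a SPARQL protocol GET request.

    Hypothetical helper: the query goes into the `query` parameter,
    and the Accept header asks for results in the SPARQL JSON format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return url, headers

query = "SELECT ?name WHERE { ?x rdfs:label ?name } LIMIT 10"
url, headers = build_sparql_request("http://dbpedia.org/sparql", query)
print(url)

# Sending it would look roughly like this (left commented out on purpose):
# import urllib.request
# with urllib.request.urlopen(urllib.request.Request(url, headers=headers)) as r:
#     body = r.read()
```

In practice a package such as SPARQLWrapper takes care of these details.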
An increasing number of endpoints are becoming available; several sources list available SPARQL endpoints, such as:
- https://www.wikidata.org/wiki/Wikidata:Lists/SPARQL_endpoints
- https://www.w3.org/wiki/SparqlEndpoints
The DBPedia project extracts structured data from Wikipedia and makes this data available in various formats.
Taking a look at a sample page such as http://dbpedia.org/page/Spain, we find various properties that seem to be associated with countries:
- dbo:capital
- dbp:populationCensus
In the following example we will query the endpoint to get a list of countries with their populations and capitals.
SPARQL queries match patterns against triples, so a query can be read as a list of restrictions, each of which narrows down the result further.
This does not mean that the query is actually executed in this fashion.
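The reading of a query as a list of restrictions can be illustrated with a toy matcher. This is only a sketch of that mental model, not of how an endpoint evaluates queries; the sample triples, the `ex:` identifiers, and the helper names `match_pattern` and `solve` are all made up for illustration.

```python
# A tiny in-memory "triple store" of (subject, predicate, object) tuples.
# The data is invented for this example and is not taken from DBPedia.
TRIPLES = [
    ("ex:Spain", "rdfs:label", "Spain"),
    ("ex:Spain", "dbo:capital", "ex:Madrid"),
    ("ex:Madrid", "rdfs:label", "Madrid"),
    ("ex:Ebro", "rdfs:label", "Ebro"),
]

def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new

def solve(patterns, triples):
    """Apply the patterns in order; each one narrows the set of bindings."""
    bindings = [{}]
    for pattern in patterns:
        narrowed = []
        for binding in bindings:
            for triple in triples:
                new = match_pattern(pattern, triple, binding)
                if new is not None:
                    narrowed.append(new)
        bindings = narrowed
    return bindings

# One restriction: everything that has a label.
labelled = solve([("?x", "rdfs:label", "?name")], TRIPLES)
# Two restrictions: things that have a label and a capital.
with_capital = solve(
    [("?x", "rdfs:label", "?name"), ("?x", "dbo:capital", "?c")], TRIPLES
)
print([b["?name"] for b in labelled])
print([b["?name"] for b in with_capital])
```

Adding the second pattern removes every binding whose `?x` has no capital, which mirrors the step-by-step narrowing of a query.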
The following queries can be executed on the DBPedia SPARQL endpoint. We limit our results so as not to overload the endpoint.
For example, to select the labels of all things in the triple store:
SELECT ?name WHERE { ?x rdfs:label ?name } LIMIT 10
And to narrow it down to things with a capital:
SELECT ?name WHERE {
  ?x rdfs:label ?name .
  ?x dbo:capital ?c
}
LIMIT 10
Since DBPedia is multilingual, we narrow the result down further to the English versions of the names:
SELECT ?name WHERE {
  ?x rdfs:label ?name .
  ?x dbo:capital ?c
  filter(lang(?name) = 'en')
}
LIMIT 10
Add some prefixes, do a little more browsing through the DBPedia page on Spain, and we can formulate our query.
To automate the process in a Jupyter notebook we use the Python SPARQLWrapper package to access the endpoint.
- Prepare the query:
from SPARQLWrapper import SPARQLWrapper, JSON, CSV
sparql = SPARQLWrapper("http://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbp: <http://dbpedia.org/property/>
    SELECT ?place ?pop ?capital
    WHERE {
        ?x dbp:populationCensus ?pop .
        ?x dbo:capital ?c .
        ?x rdfs:label ?place .
        ?c rdfs:label ?capital
        filter(lang(?place) = 'en')
        filter(lang(?capital) = 'en')
    }
    LIMIT 20
""")
sparql.setReturnFormat(JSON)
- Execute the query (wait a few seconds between executions of this statement):
data = sparql.query().convert()
- Display the results (accessing the proper fields of the JSON record is somewhat tedious):
for res in data["results"]["bindings"]:
    print(res["place"]["value"], res["pop"]["value"], res["capital"]["value"])
And there you have it, although the list includes various types of populated places, not only countries:
Albania 2821977 Tirana
Geography of Saint Pierre and Miquelon 7036 Saint-Pierre, Saint Pierre and Miquelon
Anguilla 13452 The Valley, Anguilla
Nevis 12106 Charlestown, Nevis
Algeria 37900000 Algiers
Greece 10815197 Athens
Egypt 72798000 Cairo
Indonesia 237424363 Jakarta
Angola 25789024 Luanda
Norway 5214890 Oslo
Czech Republic 10436560 Prague
Latvia 2070371 Riga
Italy 59433744 Rome
Bosnia and Herzegovina 3531159 Sarajevo
Nigeria 140431790 Abuja
Ghana 24200000 Accra
Madagascar 12238914 Antananarivo
Mali 14517176 Bamako
Central African Republic 4987640 Bangui
Belize 324528 Belmopan
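Because digging the values out of the JSON record is tedious, a small helper can flatten the result into plain rows. The sketch below assumes the standard SPARQL JSON results layout (`head.vars` and `results.bindings`); the helper name `bindings_to_rows` is made up, and `sample` is a hand-written record that mirrors the first row above rather than a captured response.

```python
def bindings_to_rows(data):
    """Flatten a SPARQL JSON result into a list of {variable: value} dicts."""
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable can be unbound in a solution, so fall back to None.
        rows.append({var: binding.get(var, {}).get("value") for var in variables})
    return rows

# Hand-written sample in the shape of a SPARQL JSON result (illustration only).
sample = {
    "head": {"vars": ["place", "pop", "capital"]},
    "results": {
        "bindings": [
            {
                "place": {"type": "literal", "xml:lang": "en", "value": "Albania"},
                "pop": {
                    "type": "literal",
                    "datatype": "http://www.w3.org/2001/XMLSchema#integer",
                    "value": "2821977",
                },
                "capital": {"type": "literal", "xml:lang": "en", "value": "Tirana"},
            }
        ]
    },
}

rows = bindings_to_rows(sample)
for row in rows:
    print(row["place"], row["pop"], row["capital"])
```

Note that the values arrive as strings, so numeric columns such as the population still need to be converted before doing arithmetic on them.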
Exercises:
- Find more interesting and useful applications of DBPedia SPARQL queries.
- Do not overload the endpoint:
  - Wait a few seconds before executing another query.
  - Limit your results to a small number of rows with the LIMIT option.
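One way to follow the advice about waiting between queries is to put the delay into a small helper instead of remembering it by hand. The `Throttle` class below is a hypothetical sketch, not part of SPARQLWrapper; the interval of 5 seconds is an arbitrary choice, and the `clock` and `sleep` parameters exist only so that the behaviour can be checked without really waiting.

```python
import time

class Throttle:
    """Make sure at least `min_interval` seconds pass between calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
        self._last = self._clock()

# Demonstration with a fake clock, so nothing actually sleeps.
fake_time = [0.0]
slept = []

def fake_sleep(seconds):
    slept.append(seconds)
    fake_time[0] += seconds

throttle = Throttle(5, clock=lambda: fake_time[0], sleep=fake_sleep)
throttle.wait()        # first call: no delay needed
fake_time[0] += 2      # pretend two seconds have passed
throttle.wait()        # second call: waits for the remaining three seconds
print(slept)
```

With the real clock, calling `Throttle(5).wait()` before each `sparql.query()` would space the requests at least five seconds apart.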