RDF Reasoning
Remember that an RDF graph is composed of triples which state facts, such as:
data = """
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <http://my.org/> .
@prefix : <#> .
:spot a :Dog ; :name "Spot" .
:rex a :Dog ; :name "Rex" .
:minka a :Cat ; :name "Minka" .
:Dog rdfs:subClassOf :Animal .
:Cat rdfs:subClassOf :Animal .
"""
We already saw a number of examples where logical deductions would make sense, as in this case:
- from the fact that :spot is a :Dog, and
- :Dog is a subclass of :Animal, there follows
- :spot is an :Animal .
However, the expression 'there follows' is from our point of view as humans associating meaning with words.
The last statement is not in the graph, only the facts that we entered:
import rdflib
g = rdflib.Graph()
g.parse(data=data, format='turtle')
print('Triples:', len(g))
Triples: 8
So if we query the graph for a list of the names of our animals, we get nothing:
for row in g.query("SELECT ?a WHERE { ?s a :Animal . ?s :name ?a }"):
    print(row['a'])
We already know a trick that can help us in this particular situation:
q = """
SELECT ?a WHERE {
?s a/rdfs:subClassOf* :Animal .
?s :name ?a
}
"""
for row in g.query(q):
    print(row['a'])
Spot
Rex
Minka
We are manually doing the 'reasoning', in this case querying not only subjects that are in the :Animal class, but also in any subclass of :Animal.
How can we make the expression
?s a :Animal .
apply to all members of subclasses in our graph, without using the property path notation (a/rdfs:subClassOf*)?
This is a simple example of reasoning: inferring additional facts from known ones.
There are many implementations of RDF reasoners that can deal with this sort of problem, and many others, but instead of using one of them as a black box, we want to know what is happening, so we write our own mini reasoner.
Implementing a Mini-Reasoner
For our toy graphs this is not such a daunting task. We already have the means to infer additional facts from existing ones: queries.
Remember that this corresponds to the way that additional facts can be derived from the contents of relational tables, if we interpret them as collections of facts:
- Given a DB table with the names and clubs of soccer players,
- we can derive the number of players per club.
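The relational analogy can be sketched with Python's built-in sqlite3 module. The table layout, player names and club names below are made up for illustration.

```python
import sqlite3

# Hypothetical table of soccer players; all names are invented
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE player (name TEXT, club TEXT)")
con.executemany(
    "INSERT INTO player VALUES (?, ?)",
    [("Ana", "FC North"), ("Ben", "FC North"), ("Cleo", "SC South")],
)

# The counts are not stored anywhere: the query derives them from the stored facts
players_per_club = dict(
    con.execute("SELECT club, COUNT(*) FROM player GROUP BY club")
)
print(players_per_club)
```

The table only holds one fact per player, yet the query yields a new fact per club. Our reasoner does the same with SPARQL queries and additionally writes the derived facts back into the graph.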
For the situation above we have to add the statements that can be inferred from the existing ones, such as:
g.parse(data="@base <http://my.org/> . @prefix : <#> . :spot a :Animal .",
format='turtle')
<Graph identifier=N03a43909dd08475cb7ebf3b9dd68332a (<class 'rdflib.graph.Graph'>)>
We added another statement, so now we have the following triples in our graph:
for s,p,o in g:
    print(s,p,o)
print('Triples:', len(g))
http://my.org/#Dog http://www.w3.org/2000/01/rdf-schema#subClassOf http://my.org/#Animal
http://my.org/#spot http://my.org/#name Spot
http://my.org/#rex http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://my.org/#Dog
http://my.org/#minka http://my.org/#name Minka
http://my.org/#rex http://my.org/#name Rex
http://my.org/#spot http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://my.org/#Dog
http://my.org/#minka http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://my.org/#Cat
http://my.org/#spot http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://my.org/#Animal
http://my.org/#Cat http://www.w3.org/2000/01/rdf-schema#subClassOf http://my.org/#Animal
Triples: 9
Now we have a statement about :spot being an :Animal, and therefore:
for row in g.query("SELECT ?n WHERE { ?s a :Animal . ?s :name ?n }"):
    print(row['n'])
Spot
subClassOf
Let's find the subjects that are in a class that is a subclass of some other class:
q = """
SELECT ?s ?c2 WHERE {
?s a ?c1 .
?c1 rdfs:subClassOf ?c2
}
"""
for row in g.query(q):
    print(row)
(rdflib.term.URIRef('http://my.org/#spot'), rdflib.term.URIRef('http://my.org/#Animal'))
(rdflib.term.URIRef('http://my.org/#rex'), rdflib.term.URIRef('http://my.org/#Animal'))
(rdflib.term.URIRef('http://my.org/#minka'), rdflib.term.URIRef('http://my.org/#Animal'))
For each such case we add another statement to the graph:
from rdflib.namespace import RDF
q = """
SELECT ?s ?c2 WHERE {
?s a ?c1 .
?c1 rdfs:subClassOf ?c2
}
"""
for s, c in g.query(q):
    print('found:', s, c)
    g.add((s, RDF.type, c))
print('Triples:', len(g))
found: http://my.org/#spot http://my.org/#Animal
found: http://my.org/#rex http://my.org/#Animal
found: http://my.org/#minka http://my.org/#Animal
Triples: 11
Note how we went from 9 triples to 11 and not to 12, since we already added :spot as an :Animal.
Stating a known fact again has no effect: the graph, and therefore the number of triples, stays the same.
domain and range
To see that this approach is not limited to subclass relations, we add another feature to our little reasoner: domains and ranges of properties.
We add some more definitions to our graph:
data ="""
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <http://my.org/> .
@prefix : <#> .
:owns rdfs:domain :Person .
:owns rdfs:range :Animal .
:john :owns :spot .
:suzi :owns :minka .
"""
g.parse(data=data, format='turtle')
print('Triples:', len(g))
Triples: 15
This gives us 4 more statements. But there is more to learn from them: the domain of a property tells us the class of the subject of every statement that uses this property, so let's add that to our reasoner:
from rdflib import Namespace
#n = Namespace('http://my.org/#')
for s,d in g.query("SELECT ?s ?d WHERE { ?s ?p ?o . ?p rdfs:domain ?d }"):
    print('found:', s, d)
    g.add((s, RDF.type, d))
print('Triples:', len(g))
found: http://my.org/#john http://my.org/#Person
found: http://my.org/#suzi http://my.org/#Person
Triples: 17
We have not explicitly stated that the two people are in class :Person, yet after our reasoner did its work they are returned in queries:
for row in g.query("SELECT ?s WHERE { ?s a :Person }"):
    print(row[0])
http://my.org/#john
http://my.org/#suzi
☆ N3 Rules
The N3 specification includes rules that can be used for reasoning, see https://w3c.github.io/N3/spec/ for details.
Let's start with some facts:
%%file facts.n3
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://example.org/>.
:spiderman a :SuperHero, :Man .
:greengoblin a :SuperHero, :Goblin .
Overwriting facts.n3
We want to mark everything that is a superhero and a goblin as imaginary:
%%file rules.n3
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://example.org/>.
{ ?x a :SuperHero . ?x a :Goblin . } => { ?x a :Imaginary } .
Overwriting rules.n3
Challenge: come up with Python code to apply such rules!
Here is a very naive approach:
- Getting to the facts and the parts of the rule is not too difficult, since N3 rules are part of the graph, with their own QuotedGraph type
- both the left side and the right side of the rule can have more than one statement
- When we look at rules like the one above it becomes clear that all variables used in the right part must occur in the left part, but not vice versa
- Using graph.query() gives the results of the left side, then we need to bind the proper values to (subj, prop, obj) of the right side depending on type (Variable or URIRef)
- Apparently it is not possible to serialize a QuotedGraph without the @prefix statements, so here we are relying on an empty line after those statements; not clean.
import rdflib
from owlrl import DeductiveClosure
from owlrl import OWLRL_Semantics
# make a SELECT statement from the QuotedGraph holding the left side of a rule
def makequery(left, variables):
    # fragile: relies on the empty line that follows the @prefix statements
    txt = left.serialize(format='n3').strip()
    i = txt.find('\n\n')
    # select the variables that the right side of the rule needs
    v = ' '.join(['?' + x for x in variables])
    q = 'select ' + v + ' where { ' + txt[i+2:] + ' }'
    return q

# replace variables with the value in the current query result row
def bind(x, row):
    if isinstance(x, rdflib.term.Variable):
        return row[x]
    else: # URIRef
        return x

g = rdflib.Graph()
g.parse('facts.n3')
g.parse('rules.n3')

IMPLIES = rdflib.term.URIRef('http://www.w3.org/2000/10/swap/log#implies')

# find rules in the graph and apply them;
# iterate over a copy, since applying a rule adds triples to g
for s,p,o in list(g):
    if p == IMPLIES:
        print('rule left side:')
        for a,b,c in s:
            print('s:', type(a), a)
            print('p:', type(b), b)
            print('o:', type(c), c)
        print('rule right side:')
        variables = set()
        for d,e,f in o:
            print('s:', type(d), d)
            print('p:', type(e), e)
            print('o:', type(f), f)
            # collect the variables that have to be bound on the right side
            for x in (d,e,f):
                if isinstance(x, rdflib.term.Variable):
                    variables.add(x)
        q = makequery(s, variables)
        print('query:', q)
        # collect the results first, so g is not changed while it is queried
        for row in list(g.query(q)):
            for d,e,f in o:
                print('add:')
                print('d:', type(d), bind(d, row))
                print('e:', type(e), bind(e, row))
                print('f:', type(f), bind(f, row))
                g.add(tuple(bind(x, row) for x in (d,e,f)))
print('graph now:')
print(g.serialize(format='n3'))
rule left side:
s: <class 'rdflib.term.Variable'> x
p: <class 'rdflib.term.URIRef'> http://www.w3.org/1999/02/22-rdf-syntax-ns#type
o: <class 'rdflib.term.URIRef'> http://example.org/SuperHero
s: <class 'rdflib.term.Variable'> x
p: <class 'rdflib.term.URIRef'> http://www.w3.org/1999/02/22-rdf-syntax-ns#type
o: <class 'rdflib.term.URIRef'> http://example.org/Goblin
rule right side:
s: <class 'rdflib.term.Variable'> x
p: <class 'rdflib.term.URIRef'> http://www.w3.org/1999/02/22-rdf-syntax-ns#type
o: <class 'rdflib.term.URIRef'> http://example.org/Imaginary
query: select ?x where { ?x a :Goblin, :SuperHero . }
add:
d: <class 'rdflib.term.Variable'> http://example.org/greengoblin
e: <class 'rdflib.term.URIRef'> http://www.w3.org/1999/02/22-rdf-syntax-ns#type
f: <class 'rdflib.term.URIRef'> http://example.org/Imaginary
graph now:
@prefix : <http://example.org/> .
:greengoblin a :Goblin, :Imaginary, :SuperHero .
:spiderman a :Man, :SuperHero .
{ ?x a :Goblin, :SuperHero . } => { ?x a :Imaginary . } .
Our naive approach works as a proof of concept.
For practical applications it is better to use an existing reasoning engine, such as
EYE (Euler Yet another proof Engine), which has to be downloaded and installed separately.
There is much more to reasoners and how they can be implemented efficiently for large graphs, but we have covered a fair amount of basics, and now it is again your turn:
Exercises
find more applications for reasoning
specify
- a query
- the expected result
- the triples to be added to the graph
- make it happen in Python
moderate experience in programming:
- implement your own additional reasoning rules
- test them on small toy problems
substantial coding skills and huge motivation:
- build on the code above
- apply one pattern only, e.g. ?x someprop someobj => ?x otherprop otherobj
- extend to more patterns..