RDF Reasoning¶

Remember that an RDF graph is composed of triples which state facts, such as:

In [1]:
data = """
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <http://my.org/> .
@prefix : <#> .

:spot a :Dog ; :name "Spot" .
:rex a :Dog ; :name "Rex" .
:minka a :Cat ; :name "Minka" .

:Dog rdfs:subClassOf :Animal .
:Cat rdfs:subClassOf :Animal .
"""

We already saw a number of examples where logical deductions would make sense, as in this case:

  • from the fact that :spot is a :Dog, and
  • :Dog is a subclass of :Animal, there follows
  • :spot is an :Animal .

However, 'there follows' only holds from our point of view as humans, who associate meaning with the words.

The last statement is not in the graph; the graph contains only the facts that we entered:

In [2]:
import rdflib

g = rdflib.Graph()

g.parse(data=data, format='turtle')
print('Triples:', len(g))
Triples: 8

So if we query the graph for a list of the names of our animals, we get nothing:

In [3]:
for row in g.query("SELECT ?a WHERE { ?s a :Animal . ?s :name ?a }"):
    print(row['a'])

We already know a trick that can help us in this particular situation:

In [4]:
q = """
SELECT ?a WHERE { 
  ?s a/rdfs:subClassOf* :Animal . 
  ?s :name ?a 
  }
"""

for row in g.query(q):
    print(row['a'])
Spot
Rex
Minka

We are manually doing the 'reasoning', in this case querying not only subjects that are in the :Animal class, but also in any subclass of :Animal.

How can we make the expression

?s a :Animal .

apply to all members of subclasses in our graph, without using the property path notation a/rdfs:subClassOf* ?

This is a simple example of reasoning: inferring additional facts from known ones.
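Stripped of any RDF library, the deduction about :spot can be sketched with plain Python tuples. This is only an illustration: the short strings stand in for the IRIs of the Turtle data, and the helper name infer_types is made up here.

```python
# Triples as plain (subject, predicate, object) tuples; the strings are
# simplified stand-ins for the IRIs used in the Turtle data.
facts = {
    ("spot", "type", "Dog"),
    ("Dog", "subClassOf", "Animal"),
}

def infer_types(triples):
    """Return the type statements that follow from one subclass step."""
    inferred = set()
    for s, p, c1 in triples:
        if p != "type":
            continue
        for c, q, c2 in triples:
            # c1 is a subclass of c2, so s is also a member of c2
            if q == "subClassOf" and c == c1:
                inferred.add((s, "type", c2))
    return inferred

new_facts = infer_types(facts)
print(new_facts)
```

The inferred statement is returned separately from the known facts, which makes it easy to see what the rule contributes before anything is added.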

There are many implementations of RDF reasoners that can deal with this sort of problem, and many others. Instead of using one of those as a black box, we want to know what is happening, so we write our own mini reasoner.

Implementing a Mini-Reasoner¶

For our toy graphs this is not such a daunting task. We already have the means to infer additional facts from existing ones: queries.

Remember that this corresponds to the way that additional facts can be derived from the contents of relational tables, if we interpret them as collections of facts:

  • Given a DB table with the names and clubs of soccer players,
  • we can derive the number of players per club.
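To make the relational analogy concrete, here is a small sketch using Python's built-in sqlite3 module. The table name, the column names and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory database with a made-up table of players and their clubs.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE player (name TEXT, club TEXT)")
con.executemany(
    "INSERT INTO player VALUES (?, ?)",
    [("Ann", "Reds"), ("Bob", "Reds"), ("Cid", "Blues")],
)

# The counts are not stored anywhere; the query derives them from the rows.
rows = con.execute(
    "SELECT club, COUNT(*) FROM player GROUP BY club ORDER BY club"
).fetchall()
players_per_club = dict(rows)
print(players_per_club)
con.close()
```

The number of players per club is a derived fact in the same sense as an inferred triple: it follows from the stored rows but is not one of them.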

For the situation above we have to add the statements that can be inferred from the existing ones, such as:

In [5]:
g.parse(data="@base <http://my.org/> . @prefix : <#> . :spot a :Animal .", 
        format='turtle')
Out[5]:
<Graph identifier=N03a43909dd08475cb7ebf3b9dd68332a (<class 'rdflib.graph.Graph'>)>

We added another statement, so now we have the following triples in our graph:

In [6]:
for s,p,o in g:
    print(s,p,o)
    
print('Triples:', len(g))
http://my.org/#Dog http://www.w3.org/2000/01/rdf-schema#subClassOf http://my.org/#Animal
http://my.org/#spot http://my.org/#name Spot
http://my.org/#rex http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://my.org/#Dog
http://my.org/#minka http://my.org/#name Minka
http://my.org/#rex http://my.org/#name Rex
http://my.org/#spot http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://my.org/#Dog
http://my.org/#minka http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://my.org/#Cat
http://my.org/#spot http://www.w3.org/1999/02/22-rdf-syntax-ns#type http://my.org/#Animal
http://my.org/#Cat http://www.w3.org/2000/01/rdf-schema#subClassOf http://my.org/#Animal
Triples: 9

Now we have a statement about :spot being an :Animal, and therefore:

In [7]:
for row in g.query("SELECT ?n WHERE { ?s a :Animal . ?s :name ?n }"):
    print(row['n'])
Spot

subClassOf¶

Let's find the subjects that are in a class that is a subclass of some other class:

In [8]:
q = """
SELECT ?s ?c2 WHERE { 
  ?s a ?c1 . 
  ?c1 rdfs:subClassOf ?c2 
  }
"""

for row in g.query(q):
    print(row)
(rdflib.term.URIRef('http://my.org/#spot'), rdflib.term.URIRef('http://my.org/#Animal'))
(rdflib.term.URIRef('http://my.org/#rex'), rdflib.term.URIRef('http://my.org/#Animal'))
(rdflib.term.URIRef('http://my.org/#minka'), rdflib.term.URIRef('http://my.org/#Animal'))

For each such case we add another statement to the graph:

In [9]:
from rdflib.namespace import RDF

q = """
SELECT ?s ?c2 WHERE { 
  ?s a ?c1 . 
  ?c1 rdfs:subClassOf ?c2 
  }
"""

for s, c in g.query(q):
    print('found:', s, c)
    g.add((s, RDF.type, c))
    
print('Triples:', len(g))
found: http://my.org/#spot http://my.org/#Animal
found: http://my.org/#rex http://my.org/#Animal
found: http://my.org/#minka http://my.org/#Animal
Triples: 11

Note how we went from 9 triples to 11 and not to 12, since we already added :spot as an :Animal.

Stating a known fact again has no effect: the graph, and therefore the number of triples, stays the same.

domain and range¶

To see that this approach is not limited to subclass relations, we add another piece of functionality to our little reasoner: domains and ranges of properties.

We add some more definitions to our graph:

In [10]:
data = """
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@base <http://my.org/> .
@prefix : <#> .

:owns rdfs:domain :Person .
:owns rdfs:range :Animal .

:john :owns :spot .
:suzi :owns :minka .
"""
In [11]:
g.parse(data=data, format='turtle')
print('Triples:', len(g))
Triples: 15

This gives us 4 more statements. Beyond that, the domain of a property tells us something about the subject of every statement that uses the property, so let's add that to our reasoner:

In [12]:
# domain rule: the subject of a statement belongs to the domain class of its property

for s,d in g.query("SELECT ?s ?d WHERE { ?s ?p ?o . ?p rdfs:domain ?d }"):
    print('found:', s, d)
    g.add((s, RDF.type, d))
    
print('Triples:', len(g))
found: http://my.org/#john http://my.org/#Person
found: http://my.org/#suzi http://my.org/#Person
Triples: 17

We have not explicitly stated that the two people are in the class :Person, yet after our reasoner has done its work they are returned in queries:

In [13]:
for row in g.query("SELECT ?s WHERE { ?s a :Person }"):
    print(row[0])
http://my.org/#john
http://my.org/#suzi

☆ N3 Rules¶

The N3 specification includes rules that can be used for reasoning, see https://w3c.github.io/N3/spec/ for details.

Let's start with some facts:

In [14]:
%%file facts.n3

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://example.org/>.

:spiderman a :SuperHero, :Man .
:greengoblin a :SuperHero, :Goblin .
Overwriting facts.n3

We want to mark everything that is both a superhero and a goblin as imaginary:

In [15]:
%%file rules.n3

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix : <http://example.org/>.

{ ?x a :SuperHero . ?x a :Goblin . } => { ?x a :Imaginary } . 
Overwriting rules.n3

Challenge: come up with Python code to apply such rules!

Here is a very naive approach:

  • Getting at the facts and the parts of a rule is not too difficult: N3 rules are part of the graph, and each side of a rule is a QuotedGraph.
  • Both the left side and the right side of a rule can contain more than one statement.
  • Rules like the one above show that every variable used on the right side must also occur on the left side, but not vice versa.
  • graph.query() gives the matches for the left side; for each match we bind the values to (subj, prop, obj) of the right side, depending on the type of each term (Variable or URIRef).
  • Apparently it is not possible to serialize a QuotedGraph without the @prefix statements, so here we rely on the empty line after those statements; this is not clean.
In [16]:
import rdflib
from rdflib.term import URIRef, Variable

LOG_IMPLIES = URIRef('http://www.w3.org/2000/10/swap/log#implies')

# make a SELECT statement from the QuotedGraph on the left side of a rule
def makequery(lhs, variables):
    # hack: cut off the @prefix statements of the serialization,
    # relying on the empty line that follows them
    txt = lhs.serialize(format='n3').strip()
    i = txt.find('\n\n')
    # select the variables that are needed on the right side of the rule
    v = ' '.join(['?' + x for x in variables])
    q = 'select ' + v + ' where { ' + txt[i+2:] + ' }'
    return q

# replace variables with the value in the current query result row
def bind(x, row):
    if isinstance(x, Variable):
        return row[x]
    else: # URIRef
        return x

g = rdflib.Graph()
g.parse('facts.n3')
g.parse('rules.n3')

# find rules in the graph and apply them; list() takes a snapshot,
# because we add triples to g inside the loop
for s,p,o in list(g):
    if p == LOG_IMPLIES:
        print('rule left side:')
        for a,b,c in s:
            print('s:', type(a), a)
            print('p:', type(b), b)
            print('o:', type(c), c)
        print('rule right side:')
        variables = set()
        for d,e,f in o:
            print('s:', type(d), d)
            print('p:', type(e), e)
            print('o:', type(f), f)
            for x in (d,e,f):
                if isinstance(x, Variable):
                    variables.add(x)
        q = makequery(s, variables)
        print('query:', q)
        for row in g.query(q):
            for d,e,f in o:
                print('add:')
                print('d:', type(d), bind(d, row))
                print('e:', type(e), bind(e, row))
                print('f:', type(f), bind(f, row))
                # g.add() expects a (subject, predicate, object) tuple
                g.add(tuple(bind(x, row) for x in (d,e,f)))

print('graph now:')
print(g.serialize(format='n3'))
rule left side:
s: <class 'rdflib.term.Variable'> x
p: <class 'rdflib.term.URIRef'> http://www.w3.org/1999/02/22-rdf-syntax-ns#type
o: <class 'rdflib.term.URIRef'> http://example.org/SuperHero
s: <class 'rdflib.term.Variable'> x
p: <class 'rdflib.term.URIRef'> http://www.w3.org/1999/02/22-rdf-syntax-ns#type
o: <class 'rdflib.term.URIRef'> http://example.org/Goblin
rule right side:
s: <class 'rdflib.term.Variable'> x
p: <class 'rdflib.term.URIRef'> http://www.w3.org/1999/02/22-rdf-syntax-ns#type
o: <class 'rdflib.term.URIRef'> http://example.org/Imaginary
query: select ?x where { ?x a :Goblin,
        :SuperHero . }
add:
d: <class 'rdflib.term.Variable'> http://example.org/greengoblin
e: <class 'rdflib.term.URIRef'> http://www.w3.org/1999/02/22-rdf-syntax-ns#type
f: <class 'rdflib.term.URIRef'> http://example.org/Imaginary
graph now:
@prefix : <http://example.org/> .

:greengoblin a :Goblin,
        :Imaginary,
        :SuperHero .

:spiderman a :Man,
        :SuperHero .

{
    ?x a :Goblin,
            :SuperHero .

} => {
        ?x a :Imaginary .

    } .


Our naive approach works as a proof of concept.

For practical applications it is better to use an existing reasoning engine, such as EYE (Euler Yet another proof Engine), which can be downloaded and installed from

https://github.com/eyereasoner/eye

There is much more to reasoners and how they can be implemented efficiently for large graphs, but we have covered a fair amount of basics, and now it is again your turn:

Exercises¶

  • find more applications for reasoning

  • specify

    • a query
    • the expected result
    • the triples to be added to the graph
    • make it happen in Python
  • moderate experience in programming:

    • implement your own additional reasoning rules
    • test them on small toy problems
  • substantial coding skills and huge motivation:

    • build on the code above
    • apply one pattern only, e.g. ?x someprop someobj => ?x otherprop otherobj
    • extend to more patterns ...