#  Knowledge Graphs, RDF, and N3

Since ancient times philosophers have wrestled with the question: What can we know?

We take a practical approach: knowledge is intimately tied to language, so we
consider only the type of knowledge that can be put into words. Knowledge
processing then becomes word processing. We answer a query by
putting words together, and if we do it well enough a human 
reader will get the impression that our system performs in an 'intelligent'
fashion.

#### &star; Symbolic vs Connectionist AI

When we automate the process of question answering in a computer program, we 
create a basic form of 'artificial intelligence' if the program responds 
to queries as we would expect a human to. There are two basic approaches:

- symbolic AI, such as methods based on knowledge graphs as described here. The
  knowledge and the processing are encoded in an explicit fashion, and it
  is easy to follow the reasoning, to see how it derives its answers to queries.
  Collecting and coding large amounts of explicit knowledge is a huge challenge.

- connectionist AI using
  <a href=../dsai1>artificial neural networks</a> and huge amounts of training
  data, such as very large collections of already existing natural language text.
  The knowledge is encoded in the connections of the network, and it is not easy
  to explain the reasoning. Connectionist AI based on large language models has
  seen spectacular successes recently, such as ChatGPT and similar systems.

Since both symbolic and connectionist AI have their strengths and weaknesses 
there is growing interest in hybrid systems that seek to combine the advantages of both
approaches.

An **ontology** is a formal description of knowledge. It lists the types of 
things that exist and the properties that are used to describe them. 
Ontologies do this in a machine-readable way and are concerned with

- classes
- attributes
- relationships
- restrictions
- rules

Ontologies can provide a sharable and reusable knowledge representation and 
allow for adding new knowledge about the domain. 
An ontology is more easily created for a clearly defined and very
limited domain.

An ontology is concerned with knowledge about classes of things; a **knowledge graph** adds data about
individual objects belonging to those classes. Therefore, knowledge graphs tend to be much bigger than
ontologies; e.g., a book ontology is concerned with general concepts about books and not
individual instances. The ontology describes classes such as book, author, 
publisher, and their relationships, while
a book knowledge graph contains data on individual books, such as
their titles, authors, and year of publication.

ontology + instance data = knowledge graph

## Graphs

Both ontologies and instance data can be represented as graphs. A graph consists of 
nodes and edges and is convenient to view in an image, provided the graph is not very big. 
With increasing graph size images become less useful.

<img src="france.jpg">

**RDF**, the Resource Description Framework, is commonly used as a method of 
specifying knowledge graphs. 
A collection of statements describes the graph, each consisting of a

- subject
- property (also called predicate)
- object 

Since there are three elements such statements are also called **triples**. 

An example in the image above is the triple (France, capital, Paris).

- The subject is some sort of resource about which a statement is made
- The property is used to make that statement 
- The object is another resource

In RDF everything is considered a resource, including literals, such as numbers and strings.

A resource can play different roles in different statements: in the triple
(France, capital, Paris) the resource 'capital' acts as a property, but it can
be the subject or object of other triples.
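
As a minimal sketch of this idea, assuming plain Python strings stand in for resources (real RDF would use URIs), a graph can be held as a set of (subject, property, object) tuples. The helper `roles` is hypothetical and simply reports in which positions a resource occurs; the triple `("capital", "type", "Property")` is an invented example of 'capital' appearing as a subject.

```python
# A knowledge graph as a set of (subject, property, object) triples.
# Plain strings stand in for resources; real RDF would use URIs.
graph = {
    ("France", "capital", "Paris"),
    ("capital", "type", "Property"),  # hypothetical: 'capital' as a subject
}


def roles(resource, triples):
    """Return the positions (subject/property/object) in which a resource occurs."""
    found = set()
    for s, p, o in triples:
        if s == resource:
            found.add("subject")
        if p == resource:
            found.add("property")
        if o == resource:
            found.add("object")
    return found


print(sorted(roles("capital", graph)))  # ['property', 'subject']
print(sorted(roles("Paris", graph)))    # ['object']
```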

RDF data is often stored in XML format (RDF/XML). However, there are other storage
formats for RDF, and XML as a storage format is used for many types of data,
not just RDF.

### XML 

XML is a well-established format. Many software tools exist for processing XML documents.

The essential requirements for XML documents are:

- all elements are closed, e.g. &lt;a&gt; ... &lt;/a&gt;
- there is a single root element, everything else is nested in the root or other elements
- elements must not overlap, e.g. the following is not allowed:
  &lt;a&gt; ... &lt;b&gt; ... &lt;/a&gt; ... &lt;/b&gt;
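
These requirements can be checked with Python's standard `xml.etree.ElementTree` module, whose parser raises `ParseError` for documents that are not well-formed. The following is a minimal sketch (`is_well_formed` is a hypothetical helper name); it checks only well-formedness, not whether the content follows any particular schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as a well-formed XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<a> <b> </b> </a>"))  # True: properly nested
print(is_well_formed("<a> <b> </a> </b>"))  # False: elements overlap
print(is_well_formed("<a> </a> <b> </b>"))  # False: two root elements
print(is_well_formed("<a> <b> </a>"))       # False: <b> is never closed
```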

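XML is used for various types of content. 
HTML written in its XML syntax (XHTML) conforms to XML; ordinary HTML need not,
since it allows, for example, unclosed elements such as &lt;br&gt;.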

An XML version
of the customer table in a relational DB can look like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <table name="cust">
      <row>
        <id>BK</id>
        <name>Buster Keaton</name>
      </row>
      <row>
        <id>DF</id>
        <name>Douglas Fairbanks</name>
      </row>
    </table>
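
Such a table already contains triples: each row can be read as a subject, each column name as a property, and each cell as an object. The sketch below is one simple way to do this (not a standard mapping), parsing just the `<table>` element with Python's `xml.etree.ElementTree`; the function name `rows_to_triples` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The customer rows from above, with <table> as the root element.
doc = """<table name="cust">
  <row><id>BK</id><name>Buster Keaton</name></row>
  <row><id>DF</id><name>Douglas Fairbanks</name></row>
</table>"""


def rows_to_triples(xml_text):
    """Map each row to triples: (row id, column name, cell value)."""
    root = ET.fromstring(xml_text)
    triples = []
    for row in root.findall("row"):
        subject = row.findtext("id")
        for cell in row:
            if cell.tag != "id":  # the id column becomes the subject
                triples.append((subject, cell.tag, cell.text))
    return triples


for triple in rows_to_triples(doc):
    print(triple)
# ('BK', 'name', 'Buster Keaton')
# ('DF', 'name', 'Douglas Fairbanks')
```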

Here is an RDF document in XML format about a person identified by '#tom'; 
this person has the name 'Tom'.

    <foaf:Person rdf:about="#tom"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:foaf="http://xmlns.com/foaf/0.1/">
      <foaf:name>Tom</foaf:name>
    </foaf:Person>
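
To see which triples this XML encodes, here is a minimal sketch using Python's `xml.etree.ElementTree`. It assumes the `rdf:` prefix is declared (otherwise the XML does not parse) and handles only this simple pattern: the element name gives the class of the subject, and each child element gives a property with a literal value. It is not a general RDF/XML parser; for instance, a full parser would also resolve `#tom` against the document's base URI. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

doc = """<foaf:Person rdf:about="#tom"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:name>Tom</foaf:name>
</foaf:Person>"""


def to_uri(tag):
    """Turn ElementTree's '{namespace}local' tag form into a full URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def simple_rdfxml_triples(xml_text):
    """Extract triples from one typed node element with literal-valued children."""
    node = ET.fromstring(xml_text)
    subject = node.get("{" + RDF + "}about")  # left unresolved, e.g. '#tom'
    triples = [(subject, RDF + "type", to_uri(node.tag))]
    for child in node:
        triples.append((subject, to_uri(child.tag), child.text))
    return triples


for t in simple_rdfxml_triples(doc):
    print(t)
# ('#tom', 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type', 'http://xmlns.com/foaf/0.1/Person')
# ('#tom', 'http://xmlns.com/foaf/0.1/name', 'Tom')
```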
    


### N3


XML is fine for automated processing but tedious for human readers.

The Semantic Web Primer, written by Tim Berners-Lee (the inventor of the World Wide Web),
is still very informative:

https://www.w3.org/2000/10/swap/Primer 

It describes the more convenient N3 notation.
In N3 we can write something like

    <#tom> <#name> "Tom" .

We will use this notation here since it is much easier to read.

### URI

RDF uses <b>URI</b>s (Uniform Resource Identifiers) to identify subjects,
properties, and non-literal objects. 

<i>&star;
There is also the IRI (Internationalized Resource Identifier), which permits
a much wider range of Unicode characters than the URI specification
(a subset of ASCII); however, as with 'international' (non-ASCII) characters
in general, this can cause many problems in practical applications. Use plain
ASCII characters whenever you can; they are most likely to work everywhere.</i>

URIs look very much like
URLs. However, here that format is just used to identify something, such as 

    http://example.com/people#tom

The idea is that someone else writing N3 documents about a different 'tom'
will use a different URI, such as

    http://widgets4us.com/team#tom

In some cases a URI can also be used as a URL, i.e. there is some web 
content at that address; however, for the purposes of RDF this is not necessary. 
When used as a URL, the example.com URI returns some HTML content, but the
widgets4us URI results in an error (at least at the time of writing).

Sometimes a (hopefully) global identification is not necessary:
leaving out everything before the hash sign results in #tom,
which identifies some 'tom' in the current document only.
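
How a local name such as `#tom` becomes a global URI can be sketched with Python's `urllib.parse.urljoin`, which resolves a reference against a base address. Here we assume, for illustration only, that the two documents live at the addresses used in the examples above.

```python
from urllib.parse import urljoin

# Resolve the same local name '#tom' in two (assumed) documents.
ours = urljoin("http://example.com/people", "#tom")
theirs = urljoin("http://widgets4us.com/team", "#tom")

print(ours)            # http://example.com/people#tom
print(theirs)          # http://widgets4us.com/team#tom
print(ours == theirs)  # False: two different 'tom's
```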

### Populating the Knowledge Graph

In this example we start by adding data into the knowledge graph. 

In N3 we can write:

    <#tom> <#knows> <#jane> . 
    <#jane> <#age> 28 .
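
To make the structure of these statements concrete, here is a minimal sketch of a reader for exactly this simple form, *subject property object .*, written in Python with the standard `re` module. It is a hypothetical helper, not a real N3 parser: it supports no prefixes, no `;` or `,` abbreviations, and no escapes inside strings. URI references keep their angle brackets so they can be told apart from string literals.

```python
import re

# One token: a <...> URI reference, a "..." string, or a number.
TOKEN = re.compile(r'<[^>]*>|"[^"]*"|-?\d+(?:\.\d+)?')


def convert(token):
    """Keep URI references as-is, unquote strings, turn numbers into int/float."""
    if token.startswith('"'):
        return token[1:-1]
    if token.startswith("<"):
        return token
    return float(token) if "." in token else int(token)


def parse_simple_n3(text):
    """Parse lines of the form 'subject property object .' into tuples."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tokens = TOKEN.findall(line)
        if len(tokens) != 3 or not line.endswith("."):
            raise ValueError("not a simple triple: " + line)
        triples.append(tuple(convert(t) for t in tokens))
    return triples


data = """<#tom> <#knows> <#jane> .
<#jane> <#age> 28 ."""
for t in parse_simple_n3(data):
    print(t)
# ('<#tom>', '<#knows>', '<#jane>')
# ('<#jane>', '<#age>', 28)
```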

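While subject and property
are given as URIs, the third part of the statement (the object) can also be a literal, such
as a string or number. Note that #age and #knows are both properties: #age links
a resource to a literal value (an attribute), while #knows links two resources
and can therefore be understood as a relationship.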


The meaning for the human reader is obvious. 
However, in terms of RDF
processing the string identifying (within the current document) our 'tom' 
does not need to be human-readable.
Some knowledge graphs use opaque codes for subjects and properties, 
e.g. Wikidata uses 'Q91' for Abraham Lincoln
(the 16th president of the United States), while DBpedia uses 'Abraham_Lincoln':

     http://www.wikidata.org/entity/Q91
     http://dbpedia.org/resource/Abraham_Lincoln



Since N3 was designed to help human readers, it also offers some abbreviations:


    <#jane> <#child> <#albert>, <#martha> ;
         <#age> 28 ;
         <#eyecolor> "blue" .

This avoids repetition and says that Jane has two children, Albert and Martha, her (Jane's) eye color
is blue, and her age is 28. It is equivalent to the more tedious:

    <#jane> <#child> <#albert> .
    <#jane> <#child> <#martha> .
    <#jane> <#age> 28 .
    <#jane> <#eyecolor> "blue" .
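
The expansion from the short to the long form is mechanical. As a minimal sketch, assuming the statement has already been grouped into a subject and a mapping from each property to its list of objects (a real N3 parser builds this while reading), expansion is a nested loop. The name `expand_shorthand` is hypothetical, and string literals are shown without their quotes.

```python
def expand_shorthand(subject, properties):
    """Expand N3's ';' and ',' abbreviations into full triples.

    'properties' maps each property to a list of objects, mirroring
    'subject p1 o1, o2 ; p2 o3 .' (',' repeats the property,
    ';' repeats the subject).
    """
    return [(subject, prop, obj)
            for prop, objects in properties.items()
            for obj in objects]


# <#jane> <#child> <#albert>, <#martha> ; <#age> 28 ; <#eyecolor> "blue" .
short = expand_shorthand("<#jane>", {
    "<#child>": ["<#albert>", "<#martha>"],
    "<#age>": [28],
    "<#eyecolor>": ["blue"],
})

long_form = [
    ("<#jane>", "<#child>", "<#albert>"),
    ("<#jane>", "<#child>", "<#martha>"),
    ("<#jane>", "<#age>", 28),
    ("<#jane>", "<#eyecolor>", "blue"),
]
print(short == long_form)  # True
```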

Similarly, we can write the data in a table-like layout:

    <#albert> <#age>  2;  <#eyecolor> "green" .
    <#martha> <#age>  4;  <#eyecolor> "blue" .

which says that Albert is 2 years old and has green eye color, while Martha is 4 years old and has blue eye 
color.

We can also make a statement about objects without identifiers: we only want to state that they exist
and have certain properties. This can be done by using square brackets with the properties inside:

    <#jane> <#child> [ <#age> 2 ] , [ <#age> 4 ] .

This says that Jane has two children aged 2 and 4. 
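
Internally, such anonymous objects are represented by *blank nodes*: a processor invents a fresh label for each pair of square brackets. The sketch below, in Python, uses the `_:label` form, which is how N3 and Turtle write labelled blank nodes; the helper names `blank_node` and `add_anonymous` are hypothetical.

```python
import itertools

# Source of fresh blank node labels: _:b0, _:b1, ...
_ids = itertools.count()


def blank_node():
    """Return a new blank node label, unique within this run."""
    return "_:b%d" % next(_ids)


def add_anonymous(subject, prop, description, triples):
    """Add 'subject prop [ p1 o1 ; p2 o2 ]' to triples via a blank node."""
    node = blank_node()
    triples.append((subject, prop, node))
    for p, o in description.items():
        triples.append((node, p, o))
    return node


# <#jane> <#child> [ <#age> 2 ] , [ <#age> 4 ] .
triples = []
add_anonymous("<#jane>", "<#child>", {"<#age>": 2}, triples)
add_anonymous("<#jane>", "<#child>", {"<#age>": 4}, triples)
for t in triples:
    print(t)
# ('<#jane>', '<#child>', '_:b0')
# ('_:b0', '<#age>', 2)
# ('<#jane>', '<#child>', '_:b1')
# ('_:b1', '<#age>', 4)
```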

At this point let us again stress some concepts:

An identifier like <#jane> works very much like an employee ID:
the letters do not state that the person's
name is "Jane". 

We can add that information by saying something like 

    <#jane> <#name> "Jane" . 

The same applies to the properties. They were chosen to provide a nice example; as far as the automated
processing is concerned, we could have used <#P40> instead of <#child>
(e.g. see Wikidata https://www.wikidata.org/wiki/Property:P40).

The identifiers we used work just fine in our own document, but when we process data from different sources
there may well be a name clash: the same name is used in another source in a different way. On the other hand,
we do not want to always write the full URI in our statements. 

### Introducing Namespaces


**Namespaces** solve the problem of name clashes. Suppose we save the little
knowledge graph we are building step by step in a file, and we 
want to give that document a title, very much like the title of a web page:

    <> <#title>  "Some N3 Examples".
    
In N3 the expression <> refers to the current document, i.e. the document in which it is written.

In the following statement the meaning of the word 'title' is clear to fantasy fans, 
since lotr commonly means Lord of the Rings in a fantasy context.

    <#lotr> <#title> "Lord of the Rings" .
    
However, the next statement also uses the word 'title'

    <#tom> <#title> "Managing Director" .

For the human reader the term 'title' *in this context* is again clear to us. However, out of context 
even the innocent little word 'title' can refer to a number of things:

- the title of a document, book, film, piece of music
- an academic title or job title
- a sports prize, such as heavyweight boxing title
- a legal right, such as title to a property



To clarify our meaning of terms we can use a namespace, such as the 
<a href=https://www.dublincore.org/specifications/dublin-core/dcmi-terms/>Dublin Core</a>, 
a small vocabulary to describe certain types of resources:

    @prefix dc:  <http://purl.org/dc/elements/1.1/> .
    <> dc:title "Some N3 Examples" .
  
Let's add another prefix for the FOAF vocabulary 
(<a href=http://xmlns.com/foaf/spec/>Friend of a Friend</a>):

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    
    <#tom> foaf:name "Tom" .
    <#jane> foaf:name "Jane" .

Now we are using pre-defined vocabularies identified by prefixes such as foaf. There are a number of
well-known vocabularies; prefix statements typically include
RDF, RDFS (RDF Schema), FOAF, and OWL (Web Ontology Language):

    @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .

The empty prefix is commonly bound to the current document, which can be specified as

    @prefix : <#> .

This makes the notation even shorter and easier to read.
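
What a prefix declaration does can be sketched as simple text substitution: the prefix is replaced by the URI it was bound to. The following Python sketch uses the prefix table from the examples above, with the empty prefix bound to `<#>`; `expand_qname` is a hypothetical helper, and real parsers also check which characters are allowed in local names.

```python
# Prefix table from the examples above; ':' is bound to this document ('<#>').
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "": "#",
}


def expand_qname(qname, prefixes=PREFIXES):
    """Replace 'prefix:local' by the full URI in angle brackets."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError("unknown prefix in " + qname)
    return "<" + prefixes[prefix] + local + ">"


print(expand_qname("foaf:name"))  # <http://xmlns.com/foaf/0.1/name>
print(expand_qname("dc:title"))   # <http://purl.org/dc/elements/1.1/title>
print(expand_qname(":tom"))       # <#tom>
```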


## Defining an Ontology

In addition to providing a vocabulary an ontology usually defines
a type hierarchy for classes and some rules or restrictions for
properties. OWL provides means to define more rigorous data models, 
but we will use the simpler RDF and RDFS here.

First we specify types, i.e. classes of things. 

    :Person rdf:type rdfs:Class .

Since this is done so often, N3 has a special keyword
acting as a shorthand for rdf:type, simply 'a':

    :Person a rdfs:Class .

Now that we have defined a class for people we can add instances to that class:

    :tom a :Person .
    :jane a :Person .        

RDFS provides a number of vocabulary elements to specify details for classes and properties, such
as a class hierarchy:

    :Man a rdfs:Class; rdfs:subClassOf :Person .
    :Woman a rdfs:Class; rdfs:subClassOf :Person .

Now, when we say that

    :martha a :Woman .

it follows logically that 

    :martha a :Person . 
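
A minimal sketch of how this inference could be implemented, in Python, with triples written as plain strings in prefixed form (the keyword 'a' is simply rdf:type). The function `infer_types` is hypothetical: it keeps applying the rule *if x has type C and C is a subclass of D, then x has type D* until no new triples appear, so chains of subclasses are handled as well.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer_types(triples):
    """Add (x rdf:type D) whenever x has a type C with (C rdfs:subClassOf D).

    Repeats until nothing new is found, so chains of subclasses work too.
    """
    graph = set(triples)
    while True:
        new = {(x, RDF_TYPE, sup)
               for (x, p, cls) in graph if p == RDF_TYPE
               for (sub, q, sup) in graph if q == SUBCLASS and sub == cls}
        if new <= graph:
            return graph
        graph |= new


graph = infer_types({
    (":Woman", SUBCLASS, ":Person"),
    (":martha", RDF_TYPE, ":Woman"),  # ':martha a :Woman .'
})
print((":martha", RDF_TYPE, ":Person") in graph)  # True
```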

When that logic is implemented we can automatically
make such inferences. Similar logic can be defined for properties:

    :brother a rdf:Property .
    :sister a rdf:Property .
    
    :brother rdfs:domain :Person; rdfs:range :Man .
    :sister  rdfs:domain :Person; rdfs:range :Woman .

These statements provide information on properties:

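- rdfs:domain states the type (class) of the left side, i.e. of the subject
- rdfs:range states the type (class) of the right side, i.e. of the object

In RDFS these statements do not reject data; they allow the type of the subject or object to be inferred.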

Given these definitions we now make the following statement:

    :martha :brother :albert .

Since the property :brother has the range :Man, this implies:

    :albert a :Man .

This is clear to us human readers, but not to the machine. Just like the
type hierarchy, the logic behind domain
and range must be implemented in software for these inferences to be made.
This will be the subject of the section on <a href=infer.html>reasoning</a>.
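
As a preview, the domain and range rules can be sketched in a few lines of Python, again with triples written as plain strings in prefixed form. The function `infer_domain_range` is hypothetical and applies each rule only once; a reasoner would repeat this, together with the subclass rule, until nothing new is found.

```python
def infer_domain_range(triples):
    """Apply the RDFS domain and range rules once; return only new triples.

    (p rdfs:domain C) and (x p y)  =>  (x rdf:type C)
    (p rdfs:range C)  and (x p y)  =>  (y rdf:type C)
    """
    domains = {(p, c) for (p, q, c) in triples if q == "rdfs:domain"}
    ranges = {(p, c) for (p, q, c) in triples if q == "rdfs:range"}
    inferred = set()
    for (x, p, y) in triples:
        for (prop, cls) in domains:
            if prop == p:
                inferred.add((x, "rdf:type", cls))
        for (prop, cls) in ranges:
            if prop == p:
                inferred.add((y, "rdf:type", cls))
    return inferred - set(triples)


facts = {
    (":brother", "rdfs:domain", ":Person"),
    (":brother", "rdfs:range", ":Man"),
    (":martha", ":brother", ":albert"),
}
for t in sorted(infer_domain_range(facts)):
    print(t)
# (':albert', 'rdf:type', ':Man')
# (':martha', 'rdf:type', ':Person')
```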

Note that since the domain of rdfs:range and rdfs:domain is rdf:Property, it follows that 
:brother and :sister are both of type rdf:Property. Stating this explicitly, as we did above, is
therefore redundant, but harmless, since it does not introduce a contradiction.

**EXERCISES**:

- Continue to populate both the knowledge graph and the ontology part above
- Determine what can be logically inferred from the statements
- Find other applications for this graph notation; e.g. our database tables

&star; Define an ontology for some topic you are interested in, such as a sport
  or other hobby. Populate the graph with some facts and see how far you
  can go before encountering the limits of the approach.