Practical introduction to RDF and SPARQL#

Version: 1.3

Objective#

The goal of this notebook is to make you comfortable with representing (simple) knowledge graphs in RDF, and then write simple SPARQL queries.

Reminder: IRIs and Literals

Resource refer two complex objects identified by an IRI (International Resource Identifier == URI allowing international characters). Note that URLs are IRIs pointing to web accessible documents/data. URIs can be shortened with PREFIX. As an example <http://my/super/vocab/my_term> can be shortened as ns:my_term if ns is defined as a prefix for http://my/super/vocab/.

Literals refer two simple values (numercial values, strings, boolean, dates)

Reminder: RDF, triples

an RDF statement represents a relationship between two resources: a subject and an object
relationships are directional and are called a .red[predicates] (or RDF properties)
(logical) statements are called triple : {subject, predicate, object}
a set of triples form a directed labelled graph : subject nodes are IRIs, edges are predicate (IRIs only), object nodes are IRIs or Literals.

Go through https://www.w3.org/TR/rdf11-primer/ to have more details on RDF.

Reminder: Turtle syntax

header to define prefix
- example: with @prefix ns: http://my_voc# ., http://my_voc#term can be written as ns:term
generally one line per triple with a . at the end: <subject> <predicate> <object> .
possible shortcuts to share the same subject: ;

s p1 o1 ; p2 o2 .

- possible shortcuts to share the same subject-predicate: `,`

s p o1, o2, o3 .

Exemple

turtle syntax:

<http://HG37> rdf:type <http://human_genome> .
<http://sample1> <http://is_aligned_with> <http://HG37> .
<http://sample1> rdfs:comment "Sample 1 from Study X [...]"^^xsd:string .

<http://HG37> rdf:type <http://human_genome> .
<http://sample1> <http://is_aligned_with> <http://HG37> ;
              rdfs:comment "Sample 1 from Study X [...]"^^xsd:string .

Question 1#

Consider the following RDF properties family:has_mother, family:has_father, family:has_sister
Represent with RDF triples the following family:
- The mother of John is Mary,
- Mickael is the son of Mark,
- Mickael and John are cousins,
- Mark is the uncle of John.
Go to https://www.ldf.fi/service/rdf-grapher
Generate a graphical representation of the RDF graph.

Answer 1#

my_rdf_data = """
PREFIX family: <http://etbii>

<http://John> family:has_mother <http://Mary> .
<http://Mickael> family:has_father <http://Mark> .
<http://Mark> family:has_sister <http://Mary> .

"""

SPARQL hands-on#

SPARQL is the standards language to query multiple data sources expressed in RDF. The principle consists in defining a graph pattern to be matched against an RDF graph.

Note

Triple Patterns (TPs) are like RDF triples except that each of the subject, predicate and object may be a variable. Variables are prefixed with a ? .

Example

Triple Patterns

?x <is_a_variant_of> <RAC1> .

RDF graph

<SNP:123> <is_a_variant_of> <NEMO> .
<SNP:rs527330002> <is_a_variant_of> <RAC1> .
<SNP:rs527330002> <refers_to_organism> <http://www.uniprot.org/taxonomy/9606> .
<SNP:rs61753123> <is_a_variant_of> <RAC1> .

Bindings of variables ?x

?x = <SNP:rs527330002>
?x = <SNP:rs61753123>

Definition

Basic Graph Patterns (BGPs) consist in a set of triple patterns to be matched against an RDF graph.

4 Types of SPARQL queries#

SELECT : returns the variables values (i.e. bound variables) for each graph pattern match ;
CONSTRUCT : returns an RDF graph constructed by substituting variables in a set of triple patterns ;
ASK : returns a boolean (true/false) indicating whether a query pattern matches or not ;
DESCRIBE : returns an RDF graph that describes the resources found (resources neighborhood).

Additional features: Optional BGPs, union, filters, aggregate functions, negation, service, *etc.*

Anatomy of a SPARQL query#

:scale 95%

DESCRIBE <http://identifiers.org/hgnc.symbol/RAC1>

Question 2#

We will now use the RDFlib package to parse RDF Data and do some very basic SPARQL queries.

from rdflib import Graph

# RDF graph, in turtle syntax, stored in a string
my_rdf_data = """
@prefix ns: <http://my_voc/> .
@prefix snp: <http://my_snps/> .

snp:123 ns:is_a_variant_of "NEMO" .
snp:rs527330002 ns:is_a_variant_of "RAC1" .
snp:rs527330002 ns:refers_to_organism <http://www.uniprot.org/taxonomy/9606> .
snp:rs61753123 ns:is_a_variant_of "RAC1" .
"""

# Initialization of the in-memory RDF graph, RDFlib Graph object
kg = Graph()

# Parsing of the RDF data
kg.parse(data=my_rdf_data, format='turtle')

# Printing the size of the graph and serializing it again. 
print(f'the knowledge graph contains {len(kg)} triples\n')
print(kg.serialize(format="turtle"))

the knowledge graph contains 4 triples

@prefix ns: <http://my_voc/> .
@prefix snp: <http://my_snps/> .

snp:123 ns:is_a_variant_of "NEMO" .

snp:rs527330002 ns:is_a_variant_of "RAC1" ;
    ns:refers_to_organism <http://www.uniprot.org/taxonomy/9606> .

snp:rs61753123 ns:is_a_variant_of "RAC1" .

We now execute a simple query to search for all “variants” of RAC1.

q = """
SELECT ?x WHERE {
    ?x ns:is_a_variant_of "RAC1" .
}
"""

res = kg.query(q)
for row in res:
    print(f"{row['x']} is a variant of RAC1")

http://my_snps/rs527330002 is a variant of RAC1
http://my_snps/rs61753123 is a variant of RAC1

Question 3#

Generalize this query to show all is a variant of relations. You can use two variables ?x and ?y.

q = """

PREFIX ns: <http://my_voc/>

SELECT ?x ?y WHERE {
?x ns:is_a_variant_of ?y .

}
"""

res = kg.query(q)
for row in res:
    print(f"{row['x']} is ...")

http://my_snps/123 is ...
http://my_snps/rs527330002 is ...
http://my_snps/rs61753123 is ...

Question 4#

Search for the name of the gene who has a variant refering to the http://www.uniprot.org/taxonomy/9606 organism

q = """

PREFIX ns: <http://my_voc/>

SELECT ?x WHERE {
?x ns:is_an_organism_of ?x .

}
"""

res = kg.query(q)
for row in res:
    print(row)

Recap#

IRI#: International Resource Identifier == URI allowing international characters
RDF#: Resource Description Framework

Web semantic

Practical introduction to RDF and SPARQL

Contents