BASICS
Counting every triple in a given RDF graph database can be done with:
SELECT (COUNT(*) AS ?count) WHERE {
?subject ?property ?object.
}
Here ?subject, ?property and ?object are variables that match any possible
value in an RDF graph.
So there are four possible kinds of selection against an RDF graph:
1. With three variables: ?subject ?property ?object (filtering the result
as needed)
2. With two variables
3. With one variable
4. With no variables, when we know every value. (In this case we usually just
want to know whether the triple exists in the graph.)
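The fourth case can be sketched with an ASK query, which returns only true or false. This is a minimal illustration: the resource URL and the label value are borrowed from the JavaScript examples used in this text, and the rdfs: prefix is declared explicitly in case the endpoint does not predefine it.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Case 4: every position is a known value, so there is nothing to select.
# ASK answers true if the triple exists in the graph, false otherwise.
ASK {
  <http://dbpedia.org/resource/JavaScript> rdfs:label "JavaScript"@en .
}
```

Replacing any of the three known values with a variable and switching back to SELECT gives cases 3, 2 and 1.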
For more convenient use, SPARQL lets us create an alias for a URL.
This alias is called a prefix.
DBpedia has a lot of predefined prefixes, listed here:
http://dbpedia.org/sparql?nsdecl
In SPARQL a prefix is created with the following syntax:
PREFIX alias:<URL>
For example, the JavaScript subject URL can be given a prefix with the
following definition:
PREFIX javascript:<http://dbpedia.org/resource/JavaScript>
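A prefix does not have to cover a whole resource URL. A common convention is to alias only the namespace part and write the resource name after the colon. The sketch below assumes the alias name dbr, chosen here for illustration and declared explicitly, so it does not depend on the endpoint's predefined list.

```sparql
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# dbr:JavaScript expands to <http://dbpedia.org/resource/JavaScript>
SELECT ?js_label WHERE {
  dbr:JavaScript rdfs:label ?js_label .
}
```

With a namespace prefix, the same alias can be reused for any other resource in that namespace.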
Querying the JavaScript resource is normally done with this query:
SELECT ?js_label WHERE {
<http://dbpedia.org/resource/JavaScript> rdfs:label ?js_label.
}
This query returns the label in every available language. If we want to see only
the English version, we can add a filter for it:
PREFIX javascript: <http://dbpedia.org/resource/JavaScript>
SELECT ?js_label WHERE {
javascript: rdfs:label ?js_label.
FILTER (LANG(?js_label)='en')
}
The result is now only the English label. We see that a language tag, @en, is
appended to the value. If we don't want to display that, we
can use the STR function to cast the result into a plain string.
PREFIX javascript: <http://dbpedia.org/resource/JavaScript>
SELECT (STR(?js_label) AS ?label) WHERE {
javascript: rdfs:label ?js_label.
FILTER (LANG(?js_label)='en')
}
I used the AS operator to give the column in the result a custom name.
Most of the time we don't know the subject, but we do know its label. In
this case, the subject can be found with the query:
SELECT ?subject WHERE {
?subject rdfs:label "JavaScript"@en.
}
This finds every subject that has the label JavaScript in English. Note that we
must add the language tag after our string: "JavaScript" without a tag is a
different literal from "JavaScript"@en, so without it we won't get any result.
Most subjects have one or more type properties (rdf:type). These types can be
retrieved for a given label with a query that spans several lines.
Multiple triple selections are separated by a . (dot).
Listing all the types of the JavaScript subject can be done with:
SELECT ?subject ?types WHERE {
?subject rdfs:label "JavaScript"@en.
?subject rdf:type ?types.
}
When the same subject is used in two or more consecutive triple selections,
the query can be written in this short form:
SELECT ?subject ?types WHERE {
?subject rdfs:label "JavaScript"@en;
rdf:type ?types.
}
Note that the first selection has a ; (semicolon) at the end, and the next line has
only two resources. Its subject is the same as on the first line.
This selection shows the links to each type. If we want to see the label of
these types, we must add two more lines to the query:
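The query itself is not shown at this point, so here is a minimal sketch of what it could look like. It assumes the two extra lines are a label lookup on ?types plus an English-language filter; the variable name ?type_label is borrowed from the Apache query later in this text, and the prefixes are declared explicitly.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?subject ?types ?type_label WHERE {
  ?subject rdfs:label "JavaScript"@en ;
           rdf:type ?types .
  # The two added lines: fetch each type's label, keep only the English ones.
  ?types rdfs:label ?type_label .
  FILTER (LANG(?type_label) = 'en')
}
```

Types that have no English label are dropped from the result by this filter.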
Most of the time we don't know the exact letter case of the term we want to
search for in DBpedia. We can transform the label to lowercase and filter on
that to find the subjects.
SELECT ?subject {
?subject rdfs:label ?subject_label.
FILTER(LANG(?subject_label)='en')
filter(lcase(str(?subject_label)) = 'javascript')
}
Sometimes we select some terms but want to exclude specific URLs
from the result. This selection looks at the types of the word Barcode. The
result contains a lot of classes from the YAGO domain.
SELECT DISTINCT ?label {
<http://dbpedia.org/resource/Barcode> rdf:type ?type.
<http://dbpedia.org/resource/Barcode> rdfs:label ?label.
FILTER (LANG(?label)='en')
}
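The exclusion step itself is not shown here, so the following is only a sketch of one way to do it. It assumes that the YAGO classes live under the namespace http://dbpedia.org/class/yago/ and selects the type URLs directly; STRSTARTS is a SPARQL 1.1 string function.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT DISTINCT ?type WHERE {
  <http://dbpedia.org/resource/Barcode> rdf:type ?type .
  # Drop every type whose URL starts with the assumed YAGO namespace.
  FILTER (!STRSTARTS(STR(?type), "http://dbpedia.org/class/yago/"))
}
```

The same pattern works for any other group of URLs that share a common beginning.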
We can observe that selecting only what is directly related to
category:Occupations gives a very small list of occupations. This is because
DBpedia has a lot of subcategories for each occupation, and not all of them are
related directly to category:Occupations, but most of them have a path to it
through other nodes.
If we want more occupations listed, we can specify a recursive selection
of the subcategories:
SELECT (STR(?subject_label) AS ?subject_name) (STR(?category_label) AS ?category_name) {
?subject dcterms:subject ?category;
rdfs:label ?subject_label.
?category skos:broader{,1} category:Occupations;
rdfs:label ?category_label.
FILTER(LANG(?category_label)='en').
}
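The {,1} range quantifier comes from an earlier draft of SPARQL 1.1 property paths and is not part of the final specification, although DBpedia's endpoint accepts it. A sketch of the same query in standard syntax follows. It assumes that ? (zero or one step) is the intended equivalent and that the category: prefix stands for http://dbpedia.org/resource/Category:; the output names ?subject_name and ?category_name are chosen here for illustration.

```sparql
PREFIX dcterms:  <http://purl.org/dc/terms/>
PREFIX skos:     <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
PREFIX category: <http://dbpedia.org/resource/Category:>

SELECT (STR(?subject_label) AS ?subject_name)
       (STR(?category_label) AS ?category_name)
WHERE {
  ?subject dcterms:subject ?category ;
           rdfs:label ?subject_label .
  # "?" means zero or one skos:broader step; "*" would follow any depth.
  ?category skos:broader? category:Occupations ;
            rdfs:label ?category_label .
  FILTER (LANG(?category_label) = 'en')
}
```

Replacing ? with * walks the whole subcategory tree, which returns many more rows and may be slow on a public endpoint.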
Often we know only part of the label, and we want to check whether it
belongs to a term of a certain kind.
For instance, let's say we have the term Petru Maior and we want to check if
it is part of the name of a university.
One solution is to apply a regular-expression filter with this string to the result.
PREFIX univ_ontology: <http://dbpedia.org/ontology/University>
SELECT (STR(?university_label) AS ?university_name) {
?univ_subject rdf:type univ_ontology:;
rdfs:label ?university_label.
FILTER(LANG(?university_label)='en')
FILTER(REGEX(?university_label, "Petru Maior", "i"))
}
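When the search string contains no regular-expression syntax, a plain substring test is an alternative. The sketch below uses the SPARQL 1.1 functions CONTAINS and LCASE; the dbo: alias and the output name ?university_name are chosen here for illustration, with dbo:University expanding to the same URL as the univ_ontology: prefix above.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:  <http://dbpedia.org/ontology/>

SELECT (STR(?university_label) AS ?university_name) WHERE {
  ?univ_subject rdf:type dbo:University ;
                rdfs:label ?university_label .
  FILTER (LANG(?university_label) = 'en')
  # Lowercase the label and compare with a lowercase needle,
  # so the substring test ignores letter case.
  FILTER (CONTAINS(LCASE(STR(?university_label)), "petru maior"))
}
```

Unlike REGEX, this treats characters such as . or ( in the search string literally.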
DBPEDIA DISAMBIGUATES
getting the types of subjects (kind of tagging)
working with disambiguated terms.
Using UNION
Sometimes a term can have more meanings than the one we get as a first
result from DBpedia.
For example, let's query the types of the label Apache.
SELECT (STR(?type_label) AS ?type_name) WHERE {
?subject rdfs:label "Apache"@en;
rdf:type ?types.
?types rdfs:label ?type_label.
FILTER (LANG(?type_label)='en')
}
As a result we get concept and ethnic group. But we know that Apache can
also be an HTTP server. How can we include that in the result as well?
This is where we can use the DBpedia disambiguation feature.
For this we need to find the disambiguation URL for the term. This is done with:
SELECT DISTINCT ?subject_disambiguation_url{
?subject rdfs:label "Apache"@en;
rdf:type ?subject_type.
?subject_disambiguation_url dbpedia-owl:wikiPageDisambiguates ?subject.
}
Now that we have this URL, we can use it to select the subjects that share the same
disambiguation URL.
SELECT DISTINCT ?disamb_subjects{
?subject rdfs:label "Apache"@en;
rdf:type ?subject_type.
?subject_disambiguation_url dbpedia-owl:wikiPageDisambiguates ?subject.
?subject_disambiguation_url dbpedia-owl:wikiPageDisambiguates ?disamb_subjects.
}
We see that there are several, and the one we need is among them:
http://dbpedia.org/resource/Apache_HTTP_Server
Now we can list each type for each of these disambiguated subjects related to
Apache.
SELECT DISTINCT ?disamb_subjects_types{
?subject rdfs:label "Apache"@en;
rdf:type ?subject_type.
?subject_disambiguation_url dbpedia-owl:wikiPageDisambiguates ?subject.
?subject_disambiguation_url dbpedia-owl:wikiPageDisambiguates ?disamb_subjects.
?disamb_subjects rdf:type ?disamb_subjects_types.
}
And of course we can show only the labels of these types, in English.
SELECT DISTINCT (STR(?disamb_subjects_labels) AS ?type_name) {
?subject rdfs:label "Apache"@en;
rdf:type ?subject_type.
?subject_disambiguation_url dbpedia-owl:wikiPageDisambiguates ?subject.
?subject_disambiguation_url dbpedia-owl:wikiPageDisambiguates ?disamb_subjects.
?disamb_subjects rdf:type ?disamb_subjects_types.
?disamb_subjects_types rdfs:label ?disamb_subjects_labels.
FILTER(LANG(?disamb_subjects_labels)='en')
}
What if we have multiple terms and want to show the disambiguations for
each of them in one query? For this we can use UNION in this way:
SELECT DISTINCT ?subject (STR(?disamb_subjects_labels) AS ?type_name) {
{?subject rdfs:label "Apache"@en.} UNION {?subject rdfs:label "Java"@en.}
?subject rdf:type ?subject_type.
?subject_disambiguation_url dbpedia-owl:wikiPageDisambiguates ?subject.
?subject_disambiguation_url dbpedia-owl:wikiPageDisambiguates ?disamb_subjects.
?disamb_subjects rdf:type ?disamb_subjects_types.
?disamb_subjects_types rdfs:label ?disamb_subjects_labels.
FILTER(LANG(?disamb_subjects_labels)='en')
}
The result will show all the disambiguations for Apache and then for Java.
Of course, the subject URLs can be turned into labels by adding another
line, ?subject rdfs:label ?subject_label, and selecting this instead of ?subject.
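Put together, that change could look like the sketch below. The output names ?subject_name and ?type_name are chosen here for illustration, the prefixes are declared explicitly (dbpedia-owl: is assumed to stand for http://dbpedia.org/ontology/), and an extra language filter is added because ?subject_label would otherwise match the labels in every language.

```sparql
PREFIX rdf:         <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:        <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT DISTINCT (STR(?subject_label) AS ?subject_name)
                (STR(?disamb_subjects_labels) AS ?type_name)
WHERE {
  { ?subject rdfs:label "Apache"@en . } UNION { ?subject rdfs:label "Java"@en . }
  # The added line: read the subject's label instead of showing its URL.
  ?subject rdfs:label ?subject_label ;
           rdf:type ?subject_type .
  ?subject_disambiguation_url dbpedia-owl:wikiPageDisambiguates ?subject .
  ?subject_disambiguation_url dbpedia-owl:wikiPageDisambiguates ?disamb_subjects .
  ?disamb_subjects rdf:type ?disamb_subjects_types .
  ?disamb_subjects_types rdfs:label ?disamb_subjects_labels .
  FILTER (LANG(?subject_label) = 'en')
  FILTER (LANG(?disamb_subjects_labels) = 'en')
}
```

More terms can be covered by appending further UNION branches to the first line of the pattern.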