Anda di halaman 1dari 7

See

discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/306064084

A Model of Provenance Applied to Biodiversity


Datasets

Conference Paper June 2016


DOI: 10.1109/WETICE.2016.59

CITATIONS READS

0 42

7 authors, including:

Tom De Nies Ruben Verborgh


Ghent University Ghent University
44 PUBLICATIONS 61 CITATIONS 188 PUBLICATIONS 432 CITATIONS

SEE PROFILE SEE PROFILE

Erik Mannens Dilvan Abreu Moreira


iMinds - Ghent University University of So Paulo
249 PUBLICATIONS 580 CITATIONS 99 PUBLICATIONS 233 CITATIONS

SEE PROFILE SEE PROFILE

Some of the authors of this publication are also working on these related projects:

Linked Data Fragments View project

DiSSeCt View project

All in-text references underlined in blue are linked to publications on ResearchGate, Available from: Flor Karina Mamani Amanqui
letting you access and read them immediately. Retrieved on: 25 November 2016
25th IEEE International Conference on Enabling Technologies: Infrastructure for Collaborative Enterprises

A Model of Provenance Applied to Biodiversity Datasets

Flor K. Amanqui , Tom De Nies , Anastasia Dimou , Ruben Verborgh ,


Erik Mannens , Rik Van de Walle and Dilvan Moreira
Ghent University - iMinds - Data Science Lab, Belgium
University of Sao Paulo, ICMC, Sao Carlos, Brazil
Email: ork@icmc.usp.br, tom.denies@ugent.be

AbstractNowadays, the Web has become one of the main


sources of biodiversity information. An increasing number of
biodiversity research institutions add new specimens and their
related information to their biological collections and make
this information available on the Web. However, mechanisms
which are currently available provide insufcient provenance
of biodiversity information. In this paper, we propose a new
biodiversity provenance model extending the W3C PROV Data
Model. Biodiversity data is mapped to terms from relevant
ontologies, such as Dublin Core and GeoSPARQL, stored in
triple stores and queried using SPARQL endpoints. Addition-
ally, we provide a use case using our provenance model to
enrich collection data. Figure 1: Example of Provenance in Biodiversity Data
Keywords-Linked Data; Provenance; Biodiversity.

I. I NTRODUCTION model is based on W3C PROV Data Model[3]. The PROV


Biological diversity is essential to life on Earth and specication provides the concepts and supporting deni-
motivates many efforts to collect data about species [1]. That tions to enable the inter-operable interchange of provenance
gives rise to large amounts of data. These data are collected information in heterogeneous environments such as the
in different places and published in different formats. Col- Web [4]. This conforms to the principles of Linked Open
lecting data in the eld is expensive, difcult, and sometimes Data (LOD), which encourages a set of best practices for
dangerous. Not only does it require close interaction with publishing and connecting structured data on the Web [5].
organisms, but it also requires close collaboration with To further clarify the relation between the biodiversity
different people [2]. domain and provenance, we provide an example in Figure
Several research institutions are setting up biological 1.
collection programs as part of their scientic strategic plan. Due to new discoveries, species names could change
Some of these research institutions are: the Global Biodiver- over time. As Figure 1 shows, this is the case for the
sity Information Facility1 (GBIF), the Biodiversity Database Batrachospermun Alga. Data related to this species was
Collection of the National Research Institute for the Ama- collected and saved into a csv le by the Collector. After
zon2 (INPA), the Large-Scale Biosphere-Atmosphere Exper- a cataloguing process, this species was determined as Ba-
iment in Amazonia3 (LBA), the Reference Center on Envi- trachospermum Helminthosum by the Cataloguer. After 15
ronmental Information4 (CRIA), and the New York Botani- years, a User needs to determine the genetic name of this
cal Garden5 (NYBG). While most researches in biodiversity species. Subsequent to molecular studies, this species had
pay much attention to the generation of biodiversity datasets its name changed to Batrachospermum viride-brasiliense.
and Web access, information that species how/where these This is a common problem faced by biodiversity collections.
data are derived and who owns/publish the data is often The user needs to answer the following questions: Who was
ignored. the cataloguer of the species? Who was the collector of the
In this paper, we propose a conceptual model for prove- species? When was the data collected? Why was the data
nance in biodiversity data for species identication. This collected? Which institution can provide the data? The user
needs to know the history (provenance) of the species. This
1 http://www.gbif.org
2 http://colecoes.inpa.gov.br
means that the trustworthiness of the cataloguer, the person
3 http://lba.inpa.gov.br who determined the species and the user involved should
4 http://www.cria.org.br be judged, since they participate in the identication and
5 http://www.nybg.org modication of the species name.

978-1-5090-1663-1/16 $31.00 2016 IEEE 235


DOI 10.1109/WETICE.2016.59
As our main contribution in this paper, we present an data store, while a spatial process provenance is represented
extended provenance model applied to biodiversity datasets. as graphs and stored using Semantic Web technologies based
We mapped a set of representative data about biodiversity on the RDF, and is backed with standard storage tecnologies
(217,829 records) from the Botanical Institute (IBt/SP). (e.g. database) and RDF stores (e.g. Sesame). However, the
This data was downloaded from the SpeciesLink web site6 . authors do not consider the variable spacetime to enhance
SpeciesLink is a distributed information system that inte- the geospatial capabilities of either provenance-aware GIS.
grates primary data from biological collections. We also use Yuan et al. [11] propose a Linked Data approach
the GeoSPARQL language7 (an extension to the SPARQL for geospatial data provenance. The authors dened a
language that allows queries based on spatial relations, such geospatial data provenance ontology based on the Provenir
as being inside a polygon, etc.) to answer spatially complex ontology[14], published geospatial data provenance as
queries. Linked Data, and analyzed queries of linked geospatial data
The remainder of this article proceeds as follows: Section provenance. Their approach is based on the Registry Infor-
2 discusses related work. Section 3 shows the provenance mation Model (ebRIM)9 and the DCMI model. However,
model for biodiversity datasets. In Section 4, we present a this approach does not achieve geospatial reasoning based
use case, where we model all 217,829 records of the Species- on the linked geospatial data provenance.
Link website using our proposed approach and Section 5 Magnuson et al. [1] explain that biodiversity research will
concludes by summarizing our results and describing future often have specic taxonomic or ecosystem interests, but
work. the primary keys that link all of these things are related to
space (geographic location) and time (when the observation
II. R ELATED W ORK
was made). However, most of the available biological col-
Biodiversity data is an assortment of different types lections are not consistently georeferenced making use of a
of data about organisms that co-occur in time and space coordinate system. The authors explain that there is still a
(geospatial) [1]. We investigated existing works related with fundamental lack to answer complex queries, i.e., queries
the geospatial and biodiversity domains that use a prove- that need logical inference that use spatial and temporal
nance model. relations (e.g., plantations within a protected area in Manaus,
Zhao et al. [6] propose a method for recording the Brazil between 2005 and 2011).
provenance data into biological datasets. This method helps A critical look at the available literature indicates that
scientists obtain the information about particular biological a number of techniques have been developed for using
terms. The authors use the Dublin Core Metadata Initiative provenance models, such as OPM and DCMI, in different
(DCMI) [7] and named Resource Description Framework scientics domains (biological, biodiversity and geospatial).
(RDF) graphs to represent the aspects of data provenance. Despite the variety of models, there is currently no unied,
The authors only considered two types of links between conceptual model for biodiversity information that can be
a pair of genes, i.e., either they are the same or different applied to different datasets and setups, while remaining
from each other. Provenance information about these links both expressive and generic enough to cover many use cases.
is needed to provide reliable and accurate services to re- The PROV specication [15] denes a core data model
searchers. for provenance for building representations of the entities,
Beserra et al. [8] propose a provenance-based approach to people and processes involved in producing a piece of
manage long term preservation of scientic data. This ap- data or thing in the world. However, there is a lack of
proach uses a case study related to the long term preservation expressiveness using this generic W3C recommendation to
of the animal sound collection at the Fonoteca Neotropical model the different types of organisms that co-occur in time
Jacques Vielliard (FNJV)8 . Their approach is based on the and space (geospatial relations).
Open Provenance Model (OPM) [9]. However, this approach
does not provide support to connect curated metadata with III. A RCHITECTURE FOR PUBLISHING LINKED
Linked Open Data. It would allow breaking down disci- BIODIVERSITY DATA
plinary boundaries among repositories and enhance reuse. This section presents our architecture for publishing
There are a number of studies, that have used provenance linked biodiversity data (as illustrated in Figure 2). Our
in the geospatial domain [10], [11], [12], [13]. For example, architecture uses Linked Data and Semantic Web standards
Wang et al. [10] propose a provenance-aware architecture to (Resource Description Framework RDF[16] and the Web
record the lineage of spatial data in Geographic Information Ontology Language OWL[17]) to represent biodiversity
Systems (GIS). Their architecture is based on the OPM data.
model and organize spatial provenance as objects in a spatial SPARQL is a W3C standard language for querying RDF
data (triples) [18]. A RDF triple is comprised of three pieces
6 http://splink.cria.org.br/
7 http://www.opengeospatial.org/standards/geosparql 9 http://docs.oasis-open.org/regrep/regrep-core/v4.0/regrep-core-rim-
8 http://www2.ib.unicamp.br/fnjv/ v4.0.html

236
In our model, a species is denominated Collection Object
(CO). The CO was generated by an activity denominated
Collecting (in Figure 3, prov:wasGeneratedBy). A Collec-
tor agent was associated with this activity (in Figure 3,
prov:wasAssociatedWith). We trace the date this activity was
executed with the property prov:atTime.
After a Collecting process, the Collection Object needs to
be identied through an activity denominated Cataloguing
Figure 2: Architecture of publishing linked biodiversity data (in Figure 3, prov:wasGeneratedBy). The Cataloguer agent
assigns a unique identier to each collection item using the
taxonomic classication. The Cataloguer uses the reference
of information: Subject (S), Predicate (P), and Object (O). work to indicate the published material in which the collec-
Where S and O are nodes and P is the property or aspect tion object is mentioned (in Figure 3, Reference Work).
that relates the subject to the object. The Collection Object has a relationship to the
SPARQL syntax and the way it queries data are based on Reference Work, indicating the published material in
the RDF triple scheme (the basis for RDF data represen- which the Collection Object is mentioned (in Figure 3,
tation). That makes it possible to create searches that seek prov:wasDerivedFrom). The Location describes where the
not only based on instances, but also on the relationships data was collected (in Figure 3, prov:atLocation). It denes
between them. SPARQL Endpoints are portals to data that a locality, named place, habitat and spatial information.
provider makes available for querying using SPARQL. They The Agents describe persons and organizations, which
are usually implemented using triple stores. Triple store is deal with the biological collection information and interacts
the common name given to a database management system with all activities and all information that is updated in
for RDF triples. They provide data management and data the model. The Curator, Collector, Cataloguer and User
access by way of APIs and query languages to RDF triples agents are members of an Organization Agent (in Figure
(such as SPARQL). 3, prov:actedOnBehalfOf). An User agent can be inu-
When biodiversity data are collected and catalogued by enced by a Cataloguer agent and vice-versa (in Figure 3,
third parties (Cataloguer and Collector), they are registered prov:wasInuencedBy), since they participate in the identi-
and stored in commercial spreadsheets (e.g., Microsoft Ex- cation and modication of the species name.
cel) or databases (e.g., DBase, PostgreSQL) or les asso- The PROV data model provides extensibility points that
ciated with statistical programs (e.g., R or SPSS) [1]. A allow designers to specialize it for specic applications
mapping component converts the biodiversity data to RDF. or domains (subtypes, roles, and Attribute-value lists)[3].
The RML language [19] is used to map biodiversity data In order to model the species names that are dened by
to RDF triples. The RDF triples are integrated with our the agents, we propose the following extensions that are
provenance model for biodiversity data. These RDF triples subtypes of prov:Entity:
are stored in triple stores. After that, users can retrieve bioprov:OriginalSpeciesName: denotes an original
biodiversity information through SPARQL queries. species name that is not derived from any other name
and the collector who emitted it is the initiator of the
A. Provenance Model for Biodiversity Data (BioProv) identication process.
In order to create our provenance model, ve biodi- bioprov:CataloguerSpeciesName: denotes a species
versity scientists were interviewed to categorize important name which is based on another name that has been
information from biodiversity data (e.g. collecting process, published in the past. The Cataloguer agent emitted
genus, family, species, location). These interviews helped this name based on the original species name.
us to understand more about their work and to form a bioprov:MolecularSpeciesName: denotes a species
common ground for discussions. A list of our interviews name that is produced by modifying an existing species
are available at https://www.researchgate.net/publication/ name. It is possible that the species name is altered.
301287330 Interviews. With these three extensions we have covered the main
Using the information gathered through these interviews, case of the example illustrated in Section 1 (species names
we created our provenance model for biodiversity data, as could change over time). Our model is available at https:
shown in Figure 3. This model is based on the W3C PROV //www.researchgate.net/publication/299690682 BioProv.
Data Model[3]. It denes a set of starting point terms divided
into three classes: Entity, Agent and Activity. These classes B. Mapping Provenance and Biodiversity Data to RDF
are associated by relations such as prov:wasAttributedTo, The Mapping component of our architecture to publish-
prov:wasInformedBy, etc. The entity responsible for com- ing linked biodiversity data loads the domain ontologies,
manding the execution of an activity is modeled as an Agent.

237
Figure 3: Provenance model for biodiversity data

provenance model, taxonomic information and the collection (URIs) are generated for the mapped resources and is used
database and transforms them in a set of RDF triples. We as the subject of all RDF triples generated from this Triples
used the RDF Mapping Language (RML) [19] to represent Map. A Predicate-Object Map (Line 17-20) consists of
the mapping between rows of data tables (in csv les) and Predicate Maps, which dene the rule that generates the
properties and objects in RDF. triples predicate and Object Maps or Referencing Object
RML is a mapping language dened to express cus- Maps (Line 18, 20), which dene how the triples object
tomized mapping rules from heterogeneous data structures is generated. The Subject Map, the Predicate Map and the
and serializations to the RDF data model. RML is dened Object Map are Term Maps, namely rules that generate an
as a superset of the W3C-standardized mapping language RDF term (an Internationalized Resource Identier (IRI), a
(R2RML) [19]. A Triples Map denes how triples of the blank node or a literal).
form (subject, predicate, object) will be generated. A Triples
IV. U SE C ASE
Map consists of three main parts: the Logical Source, the
Subject Map and zero or more Predicate-Object Maps. In In order to validate our provenance model, biodiversity
the following, we show an example of a triple map: scientists were interviewed to dene use cases with features
and scenarios to identify the various user tasks. A list
1 @prefix rr:<http://www.w3.org/ns/r2rml#>. of our use cases are available at https://www.researchgate.
2 @prefix rml:<http://semweb.mmlab.be/ns/rml#> .
3 @prefix geobio:<http://geobio.lod.usp.br/>.
net/publication/301287330 Interviews. In this article, we
4 @prefix prov:<http://www.w3.org/ns/prov#> . present one of these use cases:
5 @prefix dcterms:<http://purl.org/dc/terms/> . USE CASE 01: Molecular Identication of Cladophora
6 @prefix adms:<http://www.w3.org/ns/adms#>.
7 @prefix bioprov:<http://geobio.lod.usp/bioprov/>. delicatula Alga
8 @prefix skos:<http://www.w3.org/2004/skos/core#> . USER: Monica Paiano, 32 years-old, Collector and Cat-
9 @prefix xsd:<http://www.w3.org/2001/XMLSchema#> .
10 @prefix foaf:< http://xmlns.com/foaf/0.1/> . aloguer of the Laboratory BETA, UNESP, Brazil and Phy-
11 <#BioMapping> cology Research Group, Ghent University, Belgium.
12 rml:logicalSource[rml:source "speciesLink.csv"; GOAL: To determine the scientic name of Cladophora
13 rml:referenceFormulation ql:CSV];
14 rr:subjectMap[ delicatula alga through a genetic identication.
15 rr:template MOTIVATION: Due to new discoveries, species names of
"http://geobio.lod.usp.br/ibt/id/{code}";
16 rr:class adms:Identifier]; Cladophora delicatula alga could change over time. Keeping
17 rr:predicateObjectMap [rr:predicate skos:notation; such data up to date and consistent is extremely important
18 rr:objectMap[ rml:reference "code";]];
19 rr:predicateObjectMap[ rr:predicate prov:agent;
because the presence or not of some species of this alga can
20 rr:objectMap[ rml:reference "institutioncode";]]; serve as biological markers (bio indicators) that indicate the
degree of conservation or degradation in a aquatic habitat.
The Logical Source represents the source to be mapped. TASKS
This can be a pointer to any dataset (Line 12-13). The 1. Retrieve all information about Cladophora delicatula
Subject Map (Line 14-16) denes how unique identiers alga. For example, when it was collected, who collected it,

238
all the specic characteristics; :Location
a prov:Entity;
2. Store all the information in csv, text le or in a foaf:name "Ponta do Gil Lake";
biodiversity database; prov:qualifiedGeneration [ a prov:Generation;
3. Start the molecular studies of the Cladophora delicatula prov:activity geo:feature;
prov:atTime "1966-11-11T01:01:01Z";
alga; prov:atLocation
4. Identify the species names in a exible way: using the "POINT(-46.7175 -23.653056)"geo:wktLiteral;] .
broader taxonomic level (phylum or genus) without having
to worry about whether the original collection used this B. Cataloguing Activity
particular classication level. The Cataloguing activity (Figure 3, Cataloguing) permits
NECESSARY TOOL FEATURES the taxonomic identication of the biodiversity data. The
1. Retrieve all specications of the bio-marker species
taxonomic identication information contains the identiers
using the species name or any higher taxonomic level, like
of the biological classication as Order, Family, Genus,
phylum, genus or family.
Species and nearly all of them were used. In the following,
After studying our use case, we mapped the corre-
we show an example:
sponding biodiversity provenance records to RDF. We used
the RML language to convert all IBTs records from the :AgentCataloguer
a prov:Agent;
SpeciesLink web site (217,829 records) to RDF triples. foaf:givenName "D.P. Santos";
This RDF data was stored in our Strabon Triple Store prov:actedOnBehalfOf :IBT .
and can be explored using SPARQL queries. The biodiver-
:Cataloguing
sity datasets are available at https://www.researchgate.net/ a prov:Activity;
publication/299740010 ProvGeoIBT Dataset. prov:wasAssociatedWith :AgentCataloguer;
In order to show how the previous use case was imple- prov:atTime "1982-01-01T01:01:01Z";
bioprov:CataloguerSpeciesName Cladophor Delicatula
mented, in the following subsections we explain the more
important activities of our provenance model: Collecting and We use the ProvValidator and ProvTranslator tools10
Cataloguing Activity. to validate and translate PROV representations about the
collecting and cataloguing activities of our biodiversity
A. Collecting Activity
datasets, making them fully interoperable. The complete
For this activity (Figure 3, Collecting), it is important representations of our use case are available at https://www.
to keep track of (1) When was the species collected (2) researchgate.net/publication/301287278 BioProvExample.
Who was the collector of the species, (3) Where was the To integrate the biodiversity data in RDF to the wider
species collected, and (4) Which institution can provide the LOD community on the Web, we set up a SPARQL end-
species. This activity is crucial for capturing the origin of the point11 . Our endpoint allows third-party programs to query
biodiversity data, as it is only at this step that information our knowledge base, via the SPARQL language, and reuse
is known. In the following, we show an example of our it in their applications.
provenance model applied to the collecting process for
Cladophora Delicatula species. C. Querying Linked Biodiversity Data Provenance
In Example 1, we used the subclass bio- For the previous use case, a User wants to identify all
prov:OriginalSpeciesName to dene the species name information about Cladophora delicatula alga. One of the
(Figure 3, bioprov:OriginalSpeciesName). big advantages of having the biodiversity data in RDF is to
:AgentCollector be able to connect it to other sources. We created a SPARQL
a prov:Agent; query for integrating different triples stores. The following
foaf:givenName "M.C.Marino & R.Marino"; example is provided to show the SPARQL query used to
prov:actedOnBehalfOf :IBT.
obtain the provenance information of a specic dataset.
:Collecting SELECT ?Species ?agent ?activity ?date ?pontowkt
a prov:Activity; WHERE {?Species prov:wasAttributedTo ?agent.
prov:wasAssociatedWith :AgentCollector; ?Species prov:wasGeneratedBy ?activity.
prov:atTime "1966-11-11T01:01:01Z"; ?activity prov:atTime ?date.
bioprov:OriginalSpeciesName Cladophor Delicatula. ?Species geo:hasGeometry ?Geometry.
?Geometry geo:asWKT ?pontowkt.
In Example 2, we used the properties prov:activity, ?Species bioprov:CataloguerSpeciesName
prov:atTime and prov:atLocation to dene the spatiotem- "Cladophora delicatula".
poral location of our species collected. To deal with this, FILTER(?date > "1980-01-01T01:01:01Z").
geof:sfWithin(?pontowkt,
we used the GeoSPARQL language and the Well-Known "Polygon(-1 -58,-7 -58,-7 -69,-1 -69,-1 -5))").}
Text (WKT), a pattern dened by the Open Geospatial
Consortium (OGC) for dening coordinates in the form of 10 https://provenance.ecs.soton.ac.uk/
11 http://java.icmc.usp.br:1100/strabonendpoint/
points, lines and polygons.

239
Using this query, we could retrieve the lineage of the [6] J. Zhao, G. Klyne, and D. Shotton, Provenance and Linked
Cladophora delicatula species. In our provenance model, Data in Biological Data Webs, in Proceedings of the
we reused the GeoSPARQL ontology terms to describe WWW2008 Workshop on Linked Data on the Web (LDOW
2008), C. Bizer, T. Heath, K. Idehen, and T. Berners-Lee,
georeferenced data. This implementation permits to answer Eds., 2008.
complex queries such as: Locate all occurrences containing
Cladophora delicatula alga samples inside of a Polygon (-1 [7] S. L. Weibel and T. Koch, Dublin Core Metadata Initiative
-58, -7 -58, -7 -69, -1 -69, -1 -58) (DCMI), 2000.

V. C ONCLUSION [8] R. Beserra Sousa, D. Cintra Cugler, J. Gonzales Malaverri,


and C. Bauzer Medeiros, A Provenance-based Approach
In this work, we presented a model for biodiversity data to Manage Long Term Preservation of Scientic Data, in
provenance (BioProv). This model is based on the W3C Data Engineering Workshops (ICDEW), 2014 IEEE 30th
PROV ontology and data model. BioProv enables appli- International Conference on, March 2014, pp. 162133.
cations that analyze biodiversity to incorporate provenance
[9] L. Moreau, J. Freire, J. Futrelle, R. E. McGrath, J. Myers, and
data in their information. We dened a mapping document P. Paulson, The Open Provenance Model: An Overview, in
for the biodiversity data from IBT to generate RDF triples. Provenance and Annotation of Data and Processes. Springer,
We also reused the GeoSPARQL ontology terms to describe 2008, pp. 323326.
georeferenced data. We use the provenance information to
[10] S. Wang, A. Padmanabhan, J. D. Myers, W. Tang, and
allow experts in biodiversity to perform queries and answer Y. Liu, Towards Provenance-aware Geographic Information
scientic questions. Systems, in Proceedings of the 16th ACM SIGSPATIAL Inter-
As future work, we also intend to extend our current national Conference on Advances in Geographic Information
implementation with more advanced structured queries, in Systems, ser. GIS 08. New York, NY, USA: ACM, 2008,
partnership with biodiversity researchers. We also intend to pp. 70:170:4.
build a benchmark to evaluate the precision and recall of [11] J. Yuan, P. Yue, J. Gong, and M. Zhang, A Linked Data
our queries. Approach for Geospatial Data Provenance, Geoscience and
Remote Sensing, IEEE Transactions on, vol. 51, no. 11, pp.
ACKNOWLEDGMENT 51055112, Nov 2013.
The research activities described in this paper were funded
by Ghent University, iMinds, the IWT-Flanders, the FWO- [12] J. E. G. Malaverri, C. B. Medeiros, and R. C. Lamparelli,
A Provenance Approach to Assess the Quality of Geospatial
Flanders, and the European Union, and the FINCyT Science Data, in Proceedings of the 27th Annual ACM Symposium
and Technology Program from Peru. on Applied Computing, ser. SAC 12. New York, NY, USA:
ACM, 2012, pp. 20432044.
R EFERENCES
[1] W. Magnusson, R. Braga-Neto, F. Pezzini, F. Baccaro, [13] I. Celino, Human Computation VGI Provenance: Semantic
H. Bergallo, J. Penha, D. d. J. Rodrigues, L. M. Verdade, Web-Based Representation and Publishing, Geoscience and
A. Lima, A. L. Albernaz, J.-M. Hero, B. Lawson, C. Castilho, Remote Sensing, IEEE Transactions on, vol. 51, no. 11, pp.
D. Drucker, E. Franklin, F. Medonca, F. Costa, G. Galdino, 51375144, Nov 2013.
G. Castley, J. Zuanon, J. d. Vale, J. L. C. d. Santos, R. Luizao,
R. Cintra, R. I. Barbosa, A. Lisboa, R. Koblitz, C. N. d. [14] A. S. Satya S. Sahoo, Provenir Ontology: Towards a Frame-
Cunha, and A. R. M. Pontes, Biodiversity and Integrated En- work for eScience Provenance Management, pp. 1517,
vironmental Monitoring. Program for Planned Biodiversity 2009.
and Ecosystem Research (PPBio), 2013.
[15] W3C, The PROV Ontology (PROV-O), 2013.
[2] J. L. C. dos Santos, A Biodiversity Information System in
an Open Data/Metadatabase Architecture, Ph.D. dissertation, [16] O. Lassila, R. R. Swick, W. Wide, and W. Consortium,
Enschede, 2003. Resource Description Framework (RDF) Model and Syntax
Specication, 1998.
[3] L. Moreau, P. Missier (Eds.), and W3C Provenance Working
Group, PROV-DM: The PROV Data Model. W3C, 2013. [17] G. Antoniou, , G. Antoniou, G. Antoniou, F. V. Harmelen,
and F. V. Harmelen, Web Ontology Language: OWL, in
[4] I. Taxidou, T. De Nies, R. Verborgh, P. Fischer, E. Mannens, Handbook on Ontologies in Information Systems. Springer,
and R. Van de Walle, Modeling Information Diffusion in So- 2003, pp. 6792.
cial Media as Provenance with W3C PROV, in Proceedings
of the 6th International Workshop on Modeling Social Media, [18] E. Prudhommeaux and A. Seaborne, SPARQL Query Lan-
May 2015. guage for RDF, W3C, Tech. Rep., 2006.

[5] W. Kuhn, T. Kauppinen, and K. Janowicz, Linked Data [19] A. Dimou, M. Vander Sande, P. Colpaert, R. Verborgh,
- A Paradigm Shift for Geographic Information Science, E. Mannens, and R. Van de Walle, RML: A Generic Lan-
in Geographic Information Science, ser. Lecture Notes in guage for Integrated RDF Mappings of Heterogeneous Data,
Computer Science. Springer International Publishing, 2014, in Proceedings of the 7th Workshop on Linked Data on the
vol. 8728, pp. 173186. Web, apr 2014.

240