Digital Libraries
Ebook, 235 pages, 2 hours

About this ebook

The technological interoperability of digital libraries must be rethought in order to adapt to new uses and networks. Informative digital environments aimed at responding to heritage, cultural, scientific or commercial demands have taken over the global cyberspace and have redesigned the techno-informative landscape of the Web. However, while the technological models demonstrate their effectiveness and explain to a large extent the creation of digital libraries, archives and repositories, the underlying concept of uses continues to cause debate.

The information technologies used by heterogeneous digital libraries enable a technical interoperability of content. This is not enough to win over a public with very different information profiles and techniques. This book explores the avenues of a user-orientated interoperability in which consultation interfaces and content description processes are studied.

  • Discusses Metadata as a resource for linking
  • Provides a practical approach
  • A valuable resource for anyone involved in digital library developments and digital collections and services
Language: English
Release date: Jan 29, 2016
ISBN: 9780081004869
Author

Fabrice Papy

Fabrice Papy is Professor of Information and Communication Sciences at the University of Lorraine in France. His research activities focus on the problems of mediation instrumented by ICT in the digital devices of the "Society of Information".

    Book preview

    Digital Libraries - Fabrice Papy

    Digital Libraries

    Interoperability and Uses

    First Edition

    Fabrice Papy

    Table of Contents

    Cover image

    Title page

    Copyright page

    1: Information Retrieval, Web and Interoperability

    Abstract

    1.1 Information retrieval: from theory to practice

    1.2 Information sources and documentary resources

    1.3 Digital uses: IS and IR specificity

    1.4 From websites to new technical documentation architectures

    1.5 Digital libraries: between library and computer science

    1.6 Web interoperability

    1.7 Interoperability for use: RSS

    1.8 Interoperability in digital libraries

    2: Interoperability, Interface and Hypertext

    Abstract

    2.1 Digital spaces and territories

    2.2 Interactive services

    2.3 Web of documents and hypertext browsing

    2.4 The persistence of hypertext

    2.5 Hypertext and hypermedia

    2.6 Browsing paths

    2.7 Consultation interfaces in the service of autonomy

    2.8 Tools for hypertext and tools for Web users

    3: Usage-Oriented Interoperability Instruments

    Abstract

    3.1 From computer science to information and communication technologies (ICT), and from ICT to Web technologies

    3.2 Digital roaming and the social Web

    3.3 Instrumentation of digital libraries using Web technologies

    3.4 Architectural invariants of digital libraries

    3.5 HTML for the Web of documents, XML for the Web of data

    3.6 Data visualization: shapes, CANVAS and SVG

    3.7 Homogenizing interoperability data

    3.8 XML and DTD: lingua franca of technological interoperability

    3.9 Plasticity of content and dissolution of documents

    3.10 Interoperability for use: a looped technological loop

    4: Usage Prospects

    Abstract

    4.1 Technological, semantic and cultural interoperability

    4.2 Content and documents in digital libraries

    4.3 Digital interfaces for the information society

    4.4 Information searching and interoperable technological architectures

    4.5 Rethinking the role of digital libraries

    4.6 Utopia of deintermediation in digital libraries

    Bibliography

    Index

    Copyright

    First published 2016 in Great Britain and the United States by ISTE Press Ltd and Elsevier Ltd

    Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

    ISTE Press Ltd

    27-37 St George's Road

    London SW19 4EU

    UK

    www.iste.co.uk

    Elsevier Ltd

    The Boulevard, Langford Lane

    Kidlington, Oxford, OX5 1GB

    UK

    www.elsevier.com

    Notices

    Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods, professional practices, or medical treatment may become necessary.

    Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information, methods, compounds, or experiments described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

    To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

    For information on all our publications visit our website at http://store.elsevier.com/

    © ISTE Press Ltd 2016

    The rights of Fabrice Papy to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.

    British Library Cataloguing-in-Publication Data

    A CIP record for this book is available from the British Library

    Library of Congress Cataloging in Publication Data

    A catalog record for this book is available from the Library of Congress

    ISBN 978-1-78548-045-4

    Printed and bound in the UK and US

    1

    Information Retrieval, Web and Interoperability

    Abstract

    Widespread access to the Internet and its most commonly used services, such as electronic mail, the Web and, most recently, digital social networks, has made information retrieval (IR) practices commonplace [GRI 11, DIN 14, DIN 07, CIA 05, ASS 02], whereas in the recent past they were reserved for information specialists (journalists, information officers, guards, archivists, librarians, etc.) [CAT 01, DUF 01, LEF 00]. Having been introduced to the general public by free-access general search engines, IR was for several years left to Internet surfers at the mercy of these automated indexing and retrieval systems with basic mechanisms¹, which can precisely and rapidly list a large part of the visible document production in the Web of documents [LEW 08, CHI 07, RIE 06, LEL 99].

    Keywords

    Computer science

    Digital use

    HTTP

    Information retrieval

    Interoperability

    IR specificity

    IS specificity

    Repositories

    Technical documentation

    WordPress data model

    1.1 Information retrieval: from theory to practice

    Widespread access to the Internet and its most commonly used services, such as electronic mail, the Web and, most recently, digital social networks, has made information retrieval (IR) practices commonplace [GRI 11, DIN 14, DIN 07, CIA 05, ASS 02], whereas in the recent past they were reserved for information specialists (journalists, information officers, guards, archivists, librarians, etc.) [CAT 01, DUF 01, LEF 00]. Having been introduced to the general public by free-access general search engines, IR was for several years left to Internet surfers at the mercy of these automated indexing and retrieval systems with basic mechanisms¹, which can precisely and rapidly list a large part of the visible document production in the Web of documents [LEW 08, CHI 07, RIE 06, LEL 99].

    A survey conducted in 2008² among 2,218 PhD students in Bretagne on their training needs in mastering scientific information revealed that 96% of the respondents used search engines³ (73% very often and 23% regularly) as resources, while 53% used specialist portals, which came in second position.

    Thanks to their algorithmic simplicity, supported by easily scalable software architectures (clustering and cloud computing), these search tools were able to adapt to the variety of document formats and absorb the exponential increase in queries⁴. This technological performance has long contributed to search engines being perceived as state-of-the-art retrieval systems. From a simplified IR perspective, the situation might have stayed the same, had it not been for the social turning point (Web 2.0) that took hold of the Web and disrupted the spectacular technological progression in the accessibility of digital documents by reintroducing variations in information and digital practices.

    While the conceptual approach of IR [DEN 03, MAN 02, LEF 00] is based on a systematic and methodical model that is easy to understand (Figure 1.1), in practice it takes multiple forms depending on the primary or secondary resources used, and above all on the technical means required (indexing and classification languages, query languages, technical tools, etc.) [TAS 14, ZOU 13, PIR 10, REP 11]. Within a short time, the multiple technological vectors in the field of document retrieval applied to natively digital information have generated a novel complexity, which calls for retrieval strategies based on precise knowledge of the technical functioning of IR systems, irrespective of their nature: "It is already true that technological devices propose retrieval solutions that the average person ignores. Surprisingly, it is often a convergence of such solutions that should be used for qualitative information" [MOE 98, p. 67].

    Figure 1.1 The information retrieval loop (inspired by Bernard Pochet, 2008, Knowing how to search and query under Creative Commons License). The entry point to the figure is the initial question, which implies a cognitive approach to information literacy, with a personal and implicit covering. In compensation, information problem solving aims at confronting this initial question with a first detailed explanation (without specializing it for a particular technical system) in words, terms or expressions to use (canonical or non-canonical form), excluded terms, and terms to be mandatorily or optionally associated (Boolean operators). Above all, it allows users to define the vocabulary available to describe their information expectations in various ways. It is an important stage during which the user establishes a lexical strategy: identifying generic or specific terms, whether common or specialized, choosing synonyms or antonyms, etc.
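
    The lexical strategy described in the caption of Figure 1.1 (terms to use, excluded terms, terms to be mandatorily or optionally associated) can be sketched independently of any particular retrieval system. The following is a minimal illustration only: the class `LexicalStrategy`, the functions `matches` and `search`, and the four sample documents are hypothetical names and data introduced here, not taken from the book or from Pochet's model.

```python
from dataclasses import dataclass, field


@dataclass
class LexicalStrategy:
    """Terms chosen before any particular retrieval system is queried (hypothetical model)."""
    required: set[str] = field(default_factory=set)  # AND: every term must appear
    optional: set[str] = field(default_factory=set)  # OR: at least one must appear, if any are given
    excluded: set[str] = field(default_factory=set)  # NOT: none may appear


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words without punctuation."""
    words = {word.strip(".,;:!?()\"'").lower() for word in text.split()}
    return words - {""}


def matches(strategy: LexicalStrategy, text: str) -> bool:
    """Apply the Boolean constraints of the strategy to one document."""
    words = tokenize(text)
    if not strategy.required <= words:
        return False
    if strategy.optional and not (strategy.optional & words):
        return False
    return not (strategy.excluded & words)


def search(strategy: LexicalStrategy, documents: dict[str, str]) -> list[str]:
    """Return the identifiers of matching documents, sorted for stable output."""
    return sorted(doc_id for doc_id, text in documents.items() if matches(strategy, text))


# Invented sample corpus, for illustration only.
documents = {
    "d1": "Metadata harvesting in digital libraries.",
    "d2": "Interoperability of library catalogues and archives.",
    "d3": "Interoperability and metadata for museum archives.",
    "d4": "Commercial interoperability of payment metadata.",
}

strategy = LexicalStrategy(
    required={"interoperability"},
    optional={"library", "libraries", "archives"},  # synonyms and related terms
    excluded={"commercial"},
)

print(search(strategy, documents))
```

    Keeping the strategy separate from the matching function mirrors the point made in the caption: the vocabulary is fixed first, and only afterwards translated into the query language of a given tool.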

    Through the democratization of constantly evolving content production (articles, blog posts and comments) enriched with multimedia data [PAP 03a, PAP 99], the blogosphere⁵ has disrupted the technological order established by search engine engineering. These publishing systems have facilitated content creation driven by an essentially editorial logic [ANG 11, DES 09, SOU 03a], eliminating the constraint of indispensable computer skills, which had made it accessible only to experts and specialist communication agencies that were disconnected from publishing and online writing techniques [PAP 14b, AMG 08, TAR 07].

    1.2 Information sources and documentary resources

    During the 2000s, the worldwide frenzy of social networks [BOU 10, REB 07] and participative and collaborative approaches further shook up IR as orchestrated by the main Web search engines. We know, for example, that Facebook is currently made up of over one million (third party) developers distributed in more than 180 countries. Facebook today consists of over 550,000 applications developed by third parties, and more than 70% of the users interact with these applications [TCH 11, p. 60].

    The functioning of the blogosphere and digital social networks is characterized by the permanent instability of contents and paratextual data⁶. Blog posts, virtual pages or social networks never reach a definitive form. The object may evolve at any time during a content update (insertion, modification and deletion) performed by one or more authors, depending on the assigned access rights (Figure 1.2). Instability is equally generated by social interactions, which give rise to data that accompany the primary resource and enrich it with various facets, as soon as the information is available online. Comments, evaluations, ratings, as well as readers' identity, status and quality, date, etc., are all data that lend themselves to retrieval.

    Figure 1.2 Partial view of the WordPress data model (version 3.8, extracted from https://codex.wordpress.org/). The relational tables wp_term_taxonomy, wp_links and wp_term_relationships host the semantics of links and posts stored in the blog's Content Management System. The posts are connected to categories and semantic markers (tags) within the wp_terms table. The semantic links (associations) are stored in the wp_term_relationships table.
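
    The relationships named in the caption of Figure 1.2 can be made concrete with a small in-memory database. The sketch below uses Python's standard sqlite3 module and a deliberately reduced subset of the WordPress tables: only a few columns are kept, wp_links is omitted, and the rows are invented for illustration. The helper `posts_for_term` is a hypothetical name, not a WordPress function.

```python
import sqlite3

# Simplified subset of the WordPress schema (assumption: many columns omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_title TEXT);
CREATE TABLE wp_terms (term_id INTEGER PRIMARY KEY, name TEXT, slug TEXT);
CREATE TABLE wp_term_taxonomy (
    term_taxonomy_id INTEGER PRIMARY KEY,
    term_id INTEGER REFERENCES wp_terms(term_id),
    taxonomy TEXT              -- 'category' or 'post_tag'
);
CREATE TABLE wp_term_relationships (
    object_id INTEGER,         -- a post ID in this sketch
    term_taxonomy_id INTEGER REFERENCES wp_term_taxonomy(term_taxonomy_id)
);
""")

# Invented sample rows.
conn.executemany("INSERT INTO wp_posts VALUES (?, ?)",
                 [(1, "First post"), (2, "Second post"), (3, "Third post")])
conn.executemany("INSERT INTO wp_terms VALUES (?, ?, ?)",
                 [(1, "Interoperability", "interoperability"),
                  (2, "Metadata", "metadata")])
conn.executemany("INSERT INTO wp_term_taxonomy VALUES (?, ?, ?)",
                 [(10, 1, "category"), (20, 2, "post_tag")])
conn.executemany("INSERT INTO wp_term_relationships VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 20), (3, 10)])


def posts_for_term(slug: str, taxonomy: str) -> list[str]:
    """Return titles of posts associated with a term, following the three-table join."""
    rows = conn.execute(
        """
        SELECT p.post_title
        FROM wp_posts AS p
        JOIN wp_term_relationships AS r ON r.object_id = p.ID
        JOIN wp_term_taxonomy AS tt ON tt.term_taxonomy_id = r.term_taxonomy_id
        JOIN wp_terms AS t ON t.term_id = tt.term_id
        WHERE t.slug = ? AND tt.taxonomy = ?
        ORDER BY p.ID
        """,
        (slug, taxonomy),
    ).fetchall()
    return [title for (title,) in rows]


print(posts_for_term("metadata", "post_tag"))
print(posts_for_term("interoperability", "category"))
```

    The join shows why such structural data are hard for a generic crawler to exploit: the association between a post and its tags lives in the database, not necessarily in the rendered page.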

    For example, the Facebook Graph Application Programming Interface (API)⁷ is the primary mechanism for extracting data from the social platform. It is a low-level API based on hypertext transfer protocol (HTTP) transactions that can be used for querying data, adding data to a user account, downloading all sorts of resources, etc. The Facebook data framework relies on an architecture based on nodes (things such as user, image, page, comments, etc.), edges (connections between nodes) and fields (user's date of birth, name of the page, most recent viewing date, etc.)⁸. Fields are extremely diverse and they significantly enrich nodes and edges with additional semantic data. Knowledge of these fields is, therefore, critical to reducing noise during IR (and enhancing query relevance), in particular when applied to huge sets of data⁹.
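
    The node, edge and field vocabulary can be illustrated with a toy in-memory graph. This is not the Facebook Graph API and makes no HTTP calls: the classes `Node` and `Graph`, the method `query`, and the sample data are hypothetical, introduced only to show how naming the wanted fields narrows what a query returns.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A thing in the graph (user, page, comment...) carrying descriptive fields."""
    node_id: str
    kind: str
    fields: dict = field(default_factory=dict)


class Graph:
    """Toy model: nodes indexed by id, edges indexed by (source id, edge name)."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.edges: dict[tuple[str, str], list[str]] = {}

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def connect(self, source_id: str, edge_name: str, target_id: str) -> None:
        self.edges.setdefault((source_id, edge_name), []).append(target_id)

    def query(self, node_id: str, edge_name: str, wanted_fields: list[str]) -> list[dict]:
        """Follow one edge from a node and keep only the requested fields."""
        results = []
        for target_id in self.edges.get((node_id, edge_name), []):
            target = self.nodes[target_id]
            results.append({name: target.fields[name]
                            for name in wanted_fields if name in target.fields})
        return results


# Invented sample data.
graph = Graph()
graph.add_node(Node("p1", "page", {"name": "Library news"}))
graph.add_node(Node("c1", "comment", {"message": "Useful", "created_time": "2015-01-02"}))
graph.add_node(Node("c2", "comment", {"message": "Thanks", "created_time": "2015-01-03"}))
graph.connect("p1", "comments", "c1")
graph.connect("p1", "comments", "c2")

print(graph.query("p1", "comments", ["message"]))
```

    Requesting only `message` leaves out the other fields of each comment, which is the noise-reduction effect the paragraph above attributes to knowing the available fields.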

    The permanent instability of these objects and the copresence of content data and data generated by social relations make it difficult for general search engines to index these publishing areas. On the one hand, indexing operations would have to be performed at a pace that indexing robots cannot sustain; on the other hand, the availability of blogs and social networks would be affected by the constant requests sent by robots, to the detriment of human users¹⁰.

    On top of these questions of automated indexing, there is reduced compatibility between the very generic mechanisms of blog functioning and the basic methods of search engines. Collaborative categorization, node (and connection) semantization and co-construction of fields and themes through cross-references between posts on the same site or platform (blogs being extremely simplified versions of the latter) are handled very poorly (if at all) by search engine software engineering. This fact contributes to the trend of customized development in the field of IR. Thanks to a variety of additional scripts (plugins, add-ons and specific developments), each blog can indefinitely enrich its possibilities for retrieving final information (primary data and resources) by refining them using the posts' structural data¹¹.

    Websites, blogs and social networks thus produce content whose specificities are expressed as additional specific data that cannot all be subject to the same query methods. The importance of making the structure of information systems (IS) visible has long been known. In 1982, Y. Corson, a researcher at INRIA (French national institute for computer science and applied mathematics), had already taken an interest in the psychological aspects related to database interrogation [COR 82]. He stressed the importance of the conceptual view adopted by the database organization operators, which significantly improves IR performance when operators have at their disposal frameworks that reflect the types of relations existing between data.

    Subsequent works in cognitive psychology follow the same direction and highlight the need to have an explicit organizational structure: Certain experiences (…) indicate that an individual modifies his IR strategies as result of own experiences and practices, as he comes to better understand the structure of the body of information, the information content and tools he has available
