Anda di halaman 1dari 7

WordNet Atlas: a web application

for visualizing WordNet as a zoomable map


Matteo Abrate, Clara Bacciu, Andrea Marchetti and Maurizio Tesconi
Istituto di Informatica e Telematica (IIT) - CNR
Via G. Moruzzi, 1 - Pisa, Italy
{matteo.abrate,clara.bacciu,andrea.marchetti,
maurizio.tesconi}@iit.cnr.it

Abstract signed interface could be of little help if it makes


the user feel overwhelmed or disoriented.
The English WordNet is a lexical database The information visualization community
containing more than 206,900 word- agrees that providing a visual representation of
concept pairs and more than 377,500 se- graphs is of a great importance for end users.
mantic and lexical links. Trying to visual- Visual depictions exploit human visual processing
ize them all at once as a node-link diagram to reduce the cognitive load of many tasks that
results in a representation which can be too require understanding of global or local structure
big and complex for the user to grasp, and of data (Munzner, 2000; Ware, 2004; Zhang,
which cannot be easily processed by cur- 1991).
rent web technologies. To obtain such representations, the data must
We propose a visualization technique be preprocessed and interpreted. For example, in-
based on the concept of semantic zooming, put data can be simplified by highlighting or hid-
usually found in popular web map appli- ing some features, considered respectively more or
cations. This technique makes it feasible less interesting for a given task. The data can also
to handle large and complex graphs, and be enriched by derived features.
display them in an interactive representa- Size is a major challenge in graph visualization,
tion. By zooming, it is possible to switch leading to problems like algorithm complexity,
from an overview visualization, quickly display cluttering, poor readability and difficulties
and clearly presenting the global charac- in navigation (Cui, 2011). The aforementioned
teristics of the structure, to a detailed one, issues are also found within the context of visu-
displaying a classic node-link diagram. alization of smaller-scale graphs, where the size
is still a relevant issue (100,000 edges or more).
WordNet Atlas is a web application that Structures of that size could be obtained from
leverages this method to aid the user in the document collections such as Wikipedia, Linked
exploration of the WordNet data set, from Data1 datasets like DBpedia or GeoNames, lexi-
its global taxonomy structure to the detail cal databases like WordNet.
of words and synonym sets. The English WordNet version 3.0 (Miller, 1995;
Fellbaum, 1998) contains 155,287 words linked
1 Introduction to 117,659 sets of synonyms (synsets) each rep-
A big variety of data sources often contain a mas- resenting a concept, defining 206,941 word-synset
sive amount of interlinked elements, forming very pairs2 . Synsets are grouped by part-of-speech in
large graph structures: Telecommunication traffic, four sub-structures: nouns, verbs, adjectives, and
social networks and the World Wide Web are all adverbs. Moreover, words and synonym sets are
prominent examples of sources that could yield interconnected by 285,348 semantic links (sem-
graphs having more than 1 million edges (Abello links) and 92,231 lexical links (lexlinks) of 28
et al., 2002; Cui, 2011). Because of their size different types. A complete visual representation
and topology, such structures could be very dif- of WordNet using a standard node-link diagram
ficult to explore, even for expert users. Getting would result in a particularly heavy graphic to pro-
an overview of or finding detailed information in 1
http://linkeddata.org
2
them can prove to be complicated, and a poorly de- http://wordnet.princeton.edu/wordnet/man/wnstats.7WN.html
cess and display, and would probably be too big • It gives us the ability to comprehend large
and complex for users to understand. amounts of data;
Current web technologies cannot cope with
• It lets us perceive properties that where non
such a number of elements without special tech-
evident before;
niques. These techniques should somehow re-
duce the amount of information sent to the client • It makes problems with the data immediately
and rendered by the user agent, without exces- apparent;
sively compromising the meaning of the graphic
or worsening the quality of user interaction. More- • It facilitates us in understanding both large-
over, without choosing a good layout, the Word- scale and small-scale features of the data;
Net graph could easily render in a large, intricate • It facilitates hypothesis formation about the
and unorganized diagram. data.
This paper describes an interactive map3 based
on a modified radial tree layout. It lets the user To best exploit such advantages, visualizations can
navigate the entire structure of the WordNet graph be carefully designed in a way that conveys mean-
of noun synsets and makes him capable to have ing while keeping aesthetics into account. This
a general overview, as well as to concentrate on kind of visualizations are often referred to as in-
details on demand (Shneiderman, 1996). We be- fographics (information graphics).
lieve that this visualization adds usability and use- In his survey, Cui (2011) describes the main is-
fulness to WordNet, because it helps the user to get sues behind graph visualization, along with layout
oriented in the large amount of data it contains. techniques, like node-link and space division, and
interaction styles, such as filtering and semantic
1.1 Outline of the paper zooming. The latter technique improves a typical
In section 1, we discussed the main issues related zoom and pan approach by introducing the con-
to the visualization of graphs obtained from large cept of Level Of Detail (LOD), thus suggesting to
data sets, such as WordNet. display the graph diagram in a Zooming User In-
In section 2, we provide a background reference terface.
for general visualization concepts and methods, Zooming User Interfaces (ZUI) are a kind of
ending with a brief survey of the state of the art graphical interfaces that gives the user the ability
of web interfaces for browsing WordNet (subsec- to zoom and pan an environment. Within this envi-
tion 2.2). ronment, visual representations of the data are dis-
In section 3, we describe the proposed solution, played with different Level Of Detail according to
along with a report of the analysis of WordNet’s the zoom level. Very popular and successful ser-
structure (subsection 3.1) and implementation de- vices that embrace this approach are web mapping
tails (subsection 3.2). applications such as Google Maps4 or Bing Live
In section 4, we present our conclusions and Maps5 , where the user could quickly zoom out for
hint at possible directions for further research. an overview of the whole planet and zoom in for
observing the details of a particular place. It is our
2 Background opinion that the semantic zooming approach could
prove to be valid outside the context of digital car-
2.1 Graph visualization concepts
tography, and in particular within the context of
It is known that our perception of things is mostly visualization of big graphs such as WordNet.
due to vision: we acquire more information Many tools have been developed to help re-
through sight than through all the other senses searchers to visualize graphs, even of big size.
combined (Ware, 2004). This is why a good visual These tools can be very useful to analyze a graph
representation can be a powerful interface between to envision a meaningful graphical representation
large amounts of data and our cognitive system, to base an infographic upon.
helping us in analysis and decision-making tasks. Gephi6 is an interactive visualization and ex-
It has been pointed out by Ware (2004) that a good ploration platform for various kinds of networks
visualization has five big advantages: 4
http://maps.google.com
3 5
The web application will be publicly available at http://www.bing.com/maps/
6
http://wafi.iit.cnr.it/wordnetatlas http://gephi.org/
and complex systems, dynamic and hierarchical sition is computed as soon as the graph is drawn,
graphs up to 50 thousand nodes and 500 thousand they remain still (they don’t adjust their position
edges. Tulip7 (Auber, 2003) is a framework to to better fit the screen), and a fisheye view (Fur-
interactively visualize graphs up to 1 million el- nas, 1986) is used to focus on the details. Even
ements, that makes several graph drawing, clus- in this case, though, the graph is partial and cen-
tering and visual mapping algorithms available. tered in a specific word sense. Collins (2007) uses
Among others, Visone8 is a research project that a fisheye technique too, giving detail as well as a
designs and implement an interactive visualization good level of overview. The showed graph is never
software for social network analysis. Cytoscape9 complete, though, and it is always rooted in the se-
is an open source platform designed for biological lected node.
research, now used to display and analyze differ-
ent kind of networks. NodeXL10 is a template for 3 WordNet Atlas
Microsoft Excel that integrates a graph visualiza- The potentially huge amount of nodes and edges
tion and analysis tool into the application. of a graph could be organized in a meaningful spa-
tial layout, alongside artificially added spatial fea-
2.2 Web interfaces to WordNet
tures such as colored regions or placemarks, pos-
There are many different interactive visualizations sibly derived from graph data or measures. The re-
of the WordNet graph that complement the tradi- sulting visualization will be a mix between a clas-
tional WordNet Search11 . They often show words sic node-link diagram and a so-called infographic,
of different part-of-speech, synsets and senses as a graphic way to present information that abstracts
graph nodes, and word-synset links, semlinks and and represents many qualitative and quantitative
lexlinks as edges. They show a small subgraph of aspects of a subject in a carefully studied design.
WordNet, created on the fly. This graph is dynam- To make the visualization and exploration of the
ically adjusted as the user chooses to navigate to WordNet graph more convenient for the users, we
other nodes, selected by clicking or by issuing a tried to make it easier for them to get an orien-
word search. tation and remember where the nodes are in the
WordVis12 , Synonym13 and Javascript Visual- map, and consequently to enhance their ability to
Wordnet14 show the graph from the vantage point find them again (revisitation). This is achieved by
of a selected word or synset. It centers the se- drawing an interactive graphic that gives a stable
lected node in the view, showing details about it representation of the synset nodes (they are always
and deleting the nodes that become not directly drawn in the same position) and offers the users an
connected to it. WordNet Editor15 , Visuwords16 overview of the entire data set, along with some
and the visualization tool17 described in the work added landmarks, useful to improve their own ori-
by Kamps (2002) actually allow a deeper explo- entation. Since the diagram is intended for pre-
ration of the graph: the user can click on a node sentation and for a better fruition mostly by non
and the connected nodes are added to the view, experienced WordNet users, some simplification
without deleting the old ones. The disadvantage of in data has been made. Therefore, the map is far
these approaches is the progressive performance from being a perfect representation of the English
deterioration as nodes and links are added. Tree- WordNet lexical database.
bolic18 uses a different kind of graph visualization In WordNet Atlas, we choose to show two dif-
compared with the previous ones: the nodes po- ferent visualizations at once, one best suited for
7
exploring the graph of synsets, and one for navi-
http://tulip.labri.fr/TulipDrupal/
8 gating among words (see figure 1).
http://www.visone.info/
9
http://www.cytoscape.org/ The first is an interactive map of a graph, rep-
10
http://nodexl.codeplex.com/ resenting all the WordNet noun synsets arranged
11
http://wordnetweb.princeton.edu/perl/webwn in a special circular layout. At the center of the
12
http://wordvis.com/
13 map is the most generic node of the noun synset
http://code.google.com/p/synonym/
14
http://kylescholz.com/projects/wordnet/ taxonomy, the one containing the word “entity”19 ,
15
http://wordventure.eti.pg.gda.pl/wne.html serving as the root of a radial tree layout.
16
http://www.visuwords.com/ To further characterize the representation, we
17
http://staff.science.uva.nl/ kamps/wordnet/
18 19
http://id.asianwordnet.org/visualize/treebolic/ http://wordnet.princeton.edu
Figure 1: Screenshot of WordNet Atlas prototype interface.

graph is shown as a circle made of concentric


rings, representing the depth level of the taxonomy
tree. The ring thickness depends on how many
synset are present at that depth level. Rings play
the role of spatial landmarks (Ghani and Elmqvist,
2011) while giving an overview of the taxonomy
tree structure.
By zooming in, the user can load the actual
Figure 2: Different Levels Of Detail: at minimum synset nodes grouped in the rings, shown as white
zoom (left), the whole graphic of the noun synsets, circles. Rings are still displayed as colored regions
divided into rings, and at a greater zoom (right), under the graph, following the technique of sub-
the central region, in which the classic node-link strate encoding (Ghani and Elmqvist, 2011). The
diagram is visibile. user can use this visual information to more eas-
ily recognize the location of the map he or she is
looking at.
decided to display each level of the tree as a col- By interacting with the synset, the user could
ored ring underlying the representation of nodes. reach the maximum level of detail, obtaining fine
This should also have the effect of enhancing the grained information such as the synset’s defini-
graph memorability by leveraging the user’s spa- tion, its set of synonym words and semantic links
tial memory (Ghani and Elmqvist, 2011). (edges) that connect it with other synsets in differ-
To address both the user’s inability to cope with ent points of the map.
large amounts of displayed symbols and the tech- We developed a second visualization, comple-
nical problems of performance of current web mentary to the map: an hypertextual dictionary
technologies, we decided to follow a semantic shown at the left side of the interface. The user
zooming approach. can search for a word entry by using the provided
At the minimum Level Of Detail, the entire search box, or scroll the whole dictionary by drag-
ging a scroll bar. Like a paper dictionary, each node through hyponym links21 as a measure of
entry displays a word, its part-of-speech and a list tree depth. Then, we grouped the nodes in rings
of numbered word senses. For each sense, a list by depth value. Unlike a standard tree layout, we
of synonym words and a definition are displayed. decided to place the nodes nearer to each other, by
Not unlike the map, we decided to give the user an packing the nodes of a ring in more than one or-
overview of the amount of available information bit. By using a traditional radial layout, in which
in the word database by giving them the illusion all the nodes of a ring are placed in a single orbit,
that the whole dictionary was loaded in a scrol- rings would have to be drawn very far from each
lable section of the interface. other, because of the fast growth in the number of
It is possible to jump from a word to another nodes as the ring index increases.
within the dictionary by clicking it in the synonym
Ring Nodes Internal Leaves Types Inst.
lists or in the map, or to jump to a synset within 0 1 1 0 1 0
the map by clicking its definition. This behavior 1 3 3 0 3 0
integrates seamlessly with the standard user agent 2 22 12 10 22 0
3 228 146 82 227 1
interface, so the user is able to exploit the familiar 4 2,020 891 1,129 2,011 9
navigation buttons, the browser history and book- 5 6,249 2,033 4,216 5,641 608
6 12,267 3,171 9,096 10,555 1,712
marks to navigate words and synsets. 7 18,936 3,276 15,660 16,892 2,044
8 14,155 2,666 11,489 13,028 1,127
3.1 Analysis of the WordNet database 9 11,042 1,955 9,087 9,286 1,756
10 7,207 1,251 5,956 6,867 340
In order to find useful properties to obtain the de- 11 4,267 738 3,529 4,204 63
scribed visualization, we analyzed an SQL version 12 2,505 451 2,054 2,450 55
of the WordNet 3.0 database. 13 1,383 261 1,122 1,381 2
14 846 135 711 845 1
We chose the synsets nouns as graph nodes, 15 449 99 350 448 1
trying to identify a hierarchy rooted in the node 16 341 57 284 341 0
“entity”, and hypernymy-hyponymy (“is-a”) links 17 164 11 153 153 11
18 30 0 30 30 0
as edges, since they are the most frequently en- Total 82,115 17,157 64,958 74,385 7,730
coded20 and more likely to form a meaningful
structure, resembling a taxonomy. Even with this Table 2: Count of synset nodes, grouped by ring.
simplification, the remaining nodes and edges do The ring index corresponds to the minimum depth
not define a simple tree, but define instead a tan- of the node (length of the shortest path to the root
gled tree, i. e. a tree that contains a few nodes that node).
can have multiple parents.
WordNet synsets are divided in two categories:
Par. Nodes Internal Leaves Types Inst.
types and instances. We identified and marked
0 1 1 0 1 0
1 79,901 16,680 63,221 72,962 6,939 7,730 instance noun synsets (target of instance hy-
2 2,134 463 1,671 1,388 746 ponym links22 ) to differentiate them visually in the
3 63 12 51 30 33
4 12 0 12 3 9
map. We also marked the leaf nodes, 64,958 out
5 3 1 2 1 2 of the total of 82,115 noun synsets (79%).
6 1 0 1 0 1
Total 82,115 17,157 64,958 74,385 7,730 3.2 Implementation details
Displaying a graph with more than 80,000 nodes is
Table 1: Count of synset nodes, grouped by num- a daunting task, even for state-of-the-art web tech-
ber of parents. nologies. Moreover, a visualization of a graph that
big would probably be unsuitable for giving the
We decided to consider the root synset
user an overview of it. We decided to compute
100001740 (entity, “That which is perceived or
offline the fixed position of each synset node rep-
known or inferred to have its own distinct exis-
resentation within the space of the graphic, and to
tence (living or nonliving)”) as the map center,
21
and to place the remaining nodes around it, us- Both regular and instance hyponym links, though in the
visualization they are differentiated.
ing a modified radial tree layout. For each synset, 22
Type nodes are always destinations for regular hy-
we computed the shortest path to reach the root ponymy edges only, and instance nodes are destinations for
instance hyponymy edges. This is true except for 5 specific
20
http://wordnet.princeton.edu/ nodes that are probably errors in the database.
store the resulting data in a geospatial database23 . user to perform the navigation task effectively, by
This kind of databases provide spatial indices, leveraging proven user interaction paradigms and
to achieve good performance when issuing spa- familiar interfaces. Each item presented by Word-
tial queries. For example, a spatial query could Net Atlas is thus uniquely identified by a derefer-
request all the countries that overlap a specified enceable URI29 that can be shared or simply writ-
polygon in a map. We implemented a REST- ten in a document to unambiguously refer to the
style (Fielding, 2000) web API that makes use item.
of this technique to retrieve a JSON24 represen-
tation of the nodes that overlap the rectangular re- 4 Conclusions
gion defined by the client viewport. This approach
We presented WordNet Atlas, a web application
ensures that only the visible nodes are actually
capable of giving WordNet users an interactive vi-
loaded from the server and displayed by the user
sualization of the resource. We believe that its ca-
agent, while giving to the user the illusion of hav-
pability to scale from a meaningful overview to
ing the whole graph loaded when panning.
the details of WordNet structure makes it ideal for
When the user decreases the zoom, the viewport presentation or teaching purposes, or for the pro-
increases the covered region, and consequently the motion of WordNet to a broader audience. We
amount of information to load and show. As previ- also believe that the effort to investigate a graphi-
ously described, we followed a common approach cal way to organize such a big graph could assist
used for Zooming User Interfaces, and reduced in the inspection and improvement of the underly-
this amount of data by implementing a concept of ing database, for example by exposing anomalies
Level Of Detail: each graphic element stored in in the structure caused by human errors.
the database is given a minimum zoom level for it
to appear. The spatial database receives the cur- 4.1 Future work
rent zoom setting of the client along with the posi-
In the next step of our work, we will search for
tion and size of the viewport, and discards graphic
confirmation of the usefulness of the presented vi-
symbols that are considered too fine-grained to ap-
sualization technique. We are already setting up
pear at that particular zoom level.
a round of user evaluation tests, with comparisons
The JSON response returned by the web API with interactive representations of WordNet found
is then processed client-side and transformed into in literature.
an interactive vector graphic by a cross-browser25 In our future research, we hope to improve both
JavaScript library26 that leverages HTML5 can- the quality of the interaction and the amount of in-
vas27 and Scalable Vector Graphics (SVG)28 tech- formation the application is capable of conveying,
nologies. while keeping its design polished. In respect to
The hypertextual dictionary view of WordNet is that, the visualization layout can probably be im-
implemented by using similar techniques, applied proved by taking into account more information,
to the single scroll dimension instead of the 2D such as the node height (length of the path to a
space of the map. leaf) or the distinction between type and instance
The selection of words and synsets is imple- nodes. We plan to provide a visualization for the
mented by following a common web application other three WordNet subnets, verbs, adjectives and
practice that requires to update the page hash URI adverbs, linked to the current one.
whenever a meaningful change in application state It could be interesting to use the same system
is detected. In our work, this makes it possible behind WordNet Atlas to display wordnets in other
to use the browser’s history, navigation buttons languages, and to compare and interlink the result-
and bookmarks to browse words and synsets, each ing representations.
uniquely identified by an URI. This enables the The described methodology could also prove to
23
be useful to represent other big graphs, such as so-
http://postgis.refractions.net
24
http://www.json.org cial graphs or semantic web graphs, and even data
25
WordNet Atlas is reported to work in Mozilla Firefox 4+, sets with a non-graph structure.
Internet Explorer 8+ and Chrome 14+.
26
http://raphaeljs.com
27 29
http://www.w3.org/TR/html5/the-canvas-element.html This is a best practice in web development, commonly
28
http://www.w3.org/Graphics/SVG/ used in the semantic web initiative (http://semanticweb.org).
References
James Abello, Jeffrey Korn, and Matthias Kreuseler.
2002. Navigating giga-graphs. In Proceedings of
the Working Conference on Advanced Visual Inter-
faces, AVI ’02, pages 290–299, New York, NY,
USA. ACM.
David Auber. 2003. Tulip: A huge graph visualization
framework. Graph Drawing Softwares, pages 105–
4126.
Christopher Collins. 2007. Wordnet explorer: Apply-
ing visualization principles to lexical semantics.
Weiwei Cui. 2011. A survey on graph visualization.
Christiane Fellbaum. 1998. WordNet: An Electronic
Lexical Database. The MIT Press, Cambridge, MA.

Roy Thomas Fielding. 2000. Architectural styles and


the design of network-based software architectures.
George W. Furnas. 1986. Generalized fisheye views.
In Proceedings of the SIGCHI conference on Human
factors in computing systems, pages 16–23.
Sohaib Ghani and Niklas Elmqvist. 2011. Improving
revisitation in graphs through static spatial features.
In Proceedings of Graphics Interface 2011, pages
175–182.
Jaap Kamps. 2002. Visualizing wordnet structure. In
Proc. of the 1st International Conference on Global
WordNet, pages 182–186.
George A. Miller. 1995. Wordnet: a lexical database
for english. Commun. ACM, 38:39–41, November.
Tamara Munzner. 2000. Interactive visualization of
large graphs and networks.
Ben Shneiderman. 1996. The eyes have it: A task
by data type taxonomy for information visualiza-
tions. In IEEE Symposium on Visual Languages,
pages 336–343.
Colin Ware. 2004. Information Visualization: Percep-
tion for Design. Morgan Kaufmann Publishers Inc.,
San Francisco, CA, USA.
Jiajie Zhang. 1991. The interaction of internal and
external representations in a problem solving task.
In Proceedings of the Thirteenth Annual Conference
of Cognitive Science Society, pages 88–91.

Anda mungkin juga menyukai