
Tagonomy

By Ferdy Christant

TAXONOMY MEETS FOLKSONOMY

Introduction
The rise of online communities has led to an explosion of information and content for mankind to enjoy. But for us to enjoy it, that information needs to be findable. Classic information architecture literature discusses two major ways to find information: Search and Browse.

Clearly, search is a dominant way to find information. Advanced search engines have made it possible to find information quickly and directly, without the need to remember site names or to browse through layers of navigation at the site level. But what if you are not searching? What if you are not sure what you are looking for and just want to browse for interesting content? What if you want to browse to related content based on a search result? What if you want to do a narrow search after browsing to a section? Search does not replace browse. In fact, they often work together to make information findable and explorable.

In order for information to be findable via browsing, it needs to be classified. Information architects generally refer to two major systems to do so: Taxonomies and Folksonomies. Taxonomies are as old as civilization, whereas folksonomies, often implemented as tagging, are relatively new. The two systems differ dramatically in how they approach the structure, classification, data quality and findability of content.

This article discusses the major pros and cons of both systems. Better yet, it introduces a way to overcome the cons of both systems whilst preserving their pros. Tagonomy is a pragmatic marriage between classic taxonomy and modern folksonomy that results in an ultimately flexible and powerful information architecture for online communities and applications in general. Since ideas are nothing without execution, this article will showcase and discuss a real implementation of Tagonomy in an actual application called ImageDragon. Enjoy.


The South African Impala - folksonomy


It was in the summer of 2009 that my girlfriend and I travelled down to South Africa and took this picture of a female Impala in Kruger National Park:

If I were to share this picture in an online community, I would want to classify it so that others can easily find it. In photo communities, a title, description and tags field are commonly used to do so:

Title: Female impala close-up
Description: Female impala close-up in Kruger National Park, South Africa
Tags: Impala, South Africa, Kruger National Park

In essence, all of these data fields are free format. There is no real data structure, data hierarchy, list of possible value sets or strong input validation. The classification of the image lies completely in the hands of the community. The community, rather than the system administrators, makes up categories (tags) as it sees fit. This is called a folksonomy. A folksonomy is an incredibly flexible and friendly way to categorize and annotate content and is in use on many popular web sites.


The South African Impala - taxonomy


Now let us look at the South African Impala on Wikipedia:

In this case, the content is a page about Impalas, not an image. The page has a lot of unstructured content about Impalas, mainly body text that is collaboratively edited by the Wikipedia community. The interesting part, however, sits in the right panel of the page. Apparently, Wikipedia knows that this page is not just any page: it is a species page. Because it is a species page, Wikipedia shows a variety of structured information within context. For example, there is the conservation status of the species. This is a structured field with a limited value set for classification. And there is the zoological classification, a hierarchical data structure used to classify species.

The structured classification of data like this is called a taxonomy. Taxonomies bring structure and order, and thereby meaning, to data. Taxonomy systems have existed for a long time and you will find them all around you, for example in a library, at the airport and in business applications.


Folksonomy versus Taxonomy


With the Impala example fresh in mind, let us discuss the differences between folksonomies and taxonomies. We will start with folksonomies, which we will simply call tagging from now on.

Tagging - The good


There is a reason why some of the world's largest websites adopted tagging. That reason is flexibility. You can call it freedom too. Tags are inherently flexible in that they allow users to classify content as they see fit. This is not only about freely choosing the values; it is also about freely choosing structures. In the Impala example, there was no field for the image location, yet I tagged it South Africa. I find location to be an important browsing perspective; therefore I tagged it as such. Nobody asked me for a location, nor was I limited in the actual value. Tagging allows for complete freedom in both structure and values. Users can give content as little or as much meaning as they see fit.

Flexibility and usability are the major selling points of tagging. Another great thing about tagging is that it largely decouples the information architecture from the system, meaning the system implementers have little or no data structure maintenance.

Tagging - The bad


The major strength of tagging, its flexibility, is also the cause of its most important weaknesses:

o Tags lack strong relationships amongst each other. For example, the system does not know that the tags South Africa and Kruger National Park have anything to do with each other. They are tags just like any other tags, completely isolated from each other. In addition, tags have no sense of hierarchical relations. The system does not know that Impala belongs to the Animalia kingdom, or that The Netherlands is a country within Europe.
o Tags lack meaning, to a system that is. The system does not know that Africa is a continent. The system does not know that Impala is a species. The system does not know that Kruger National Park is a national park. Because it does not know this, it cannot provide rich context information about any of these entities.
o Tags suffer from data quality issues. Due to the unlimited value set of possible tags, there is nobody stopping a user from:
   o Not tagging something at all
   o Misspelling tag names
   o Introducing many synonyms for the same thing


The combination of the above weaknesses leads to a relatively poor experience when it comes to browsing for information.

Taxonomy - The good


Humanity uses taxonomy for almost everything it wants to organize. Here's why:

o Taxonomy allows for strong relationships between data classifications. For example, it is possible for a taxonomy system to know the relationship between a region and a country. This in turn is great for how one can browse and filter data.
o Taxonomy brings meaning to data. It is possible for a taxonomy system to know what a species is and provide you with rich, supportive data for that particular species. Note that technically, it is up for debate whether that context data is part of the taxonomy or simply attached to the taxonomy. In this article I will assume both the classification and related elements are the taxonomy.
o Although taxonomies are not immune to data quality issues, they do perform much better in this respect:
   o Once a system owner has defined and implemented a taxonomy, that structure cannot be changed by the user
   o Unlike tags, a taxonomy system can enforce strict input validation and translation rules on the actual values entered by the user

Taxonomies bring control, structure and quality to data, which makes it great for both browsing and searching.

Taxonomy - The bad


Despite the immense power of taxonomies, there are a few important cons to mention:

o Taxonomies are generally inflexible. Users can only classify content the way the system creator has envisioned it. This applies to data elements, relationships and data value sets.
o Taxonomies are generally tightly coupled with the data they try to classify. In IT terms, one could say that taxonomies are hardcoded.

The implications of these limitations are significant:

o It places control exclusively in the hands of system owners
o Data structure maintenance is required whenever the classification needs to change
o Taxonomies are domain specific


Folksonomy marries taxonomy - The challenge


Having discussed both folksonomies and taxonomies we have learned that there is both good and bad in both systems. One could argue that each method is designed for a specific purpose and that one should select the right approach for the right system. But what if you want tagging minus the cons? What if you want to strengthen tagging with the pros of taxonomy? Is it possible to design an information architecture that bridges the gap, combining the pros and minimizing the cons? Why ask these questions to begin with?

ImageDragon
It is time to introduce the case, the reason why I am writing this article. For the last two years, I have been developing ImageDragon, a challenging pet project. ImageDragon is a social photo community web application. I could write a lengthy book about what it can do, but the essence is about users sharing photos. These photos are shared in vertical communities. For example, there is JungleDragon, an instance of ImageDragon that focuses on wildlife photos. A second example could be CarDragon, a separate community that is about sports car photos. Each community has a completely isolated access point (URL), user base and storage container. Same software, different content.

In the development of ImageDragon, I started out with a tagging system. Users are free to tag photos in any way they like. However, I had three strong desires that were not met by this tagging system:

o Hierarchy. For example, in the JungleDragon site, which is about wildlife photos, it would be cool if users could hierarchically browse to photos, like so: Animals :: Mammals :: Antelopes :: Impala. Tags do not allow such a drilldown, since their relationships are undefined. Only a human can understand the relationship between tags.
o Context. It would be cool if I could tag a picture as Impala, after which the system shows extra information about this species. Tags cannot do that because they lack context. The system does not know what the tag means. Likewise, if I tag a picture as Africa, the system just sees this as a text label, not as a continent.
o Data quality. I want to minimize the data quality issues associated with tagging systems.

I want a tagging system with taxonomy benefits, without the cons of either system. If that is not challenging enough already, it has to be loosely coupled: I do not need species information in a community about cars. I want to continue to run general software (ImageDragon), yet deploy a domain-specific information architecture per vertical community. It is time for a Tagonomy.


You write so much text. Is it time for pictures yet?


Yes. In the remainder of this article, I will show how the aforementioned challenge is solved in ImageDragon. We will be looking at the JungleDragon instance.

Tag relations - Implicit


One problem with tags that we identified is that they have no relationship with each other. This is not entirely true. Although they have no explicitly defined relationship with each other, there can be an implicit relationship based on tag usage:

[Figure: picture A tagged Bear and Canada; picture B tagged Canada and Salmon]

Picture A is tagged as Bear, Canada, whilst picture B is tagged as Canada, Salmon. Since both images have a tag in common (Canada), there is likely a relationship between Bear and Salmon. This is pure magic at work. Well, not really. What we are doing is finding the images that are tagged as Canada, and then seeing what other tags are used for those images. This creates an implicit relationship, the strength of which is determined by the number of images that connect the current tag to the related tag.

The great thing about implicit tag relations is that you would never have modeled them. Bears eat Salmon, that is their relationship. We have never defined this relationship. It is discovered. See, tags can have all kinds of relationships; they do not necessarily have to be of a hierarchical nature. In JungleDragon, our case implementation, the tag navigation bar shows implicitly related tags to the right of the currently opened tag:

We have opened the Snake tag, which holds one image. To the right (in white) are the related tags. Reptiles seem to be related to snakes. In fact, we know that Snakes are reptiles. There is also a relationship with South America, although this relationship is weaker than the Reptile connection. Supposedly, snakes are largely present on this continent. Again, none of this was designed. It is discovered through how images are tagged.

You might expect that this functionality requires some sophisticated neural network or other AI algorithm. It is in fact an embarrassingly simple SQL query where you input the tag ID and get back the related tag IDs, if any.
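The article does not show the query itself, so the following is only a minimal sketch of what such a co-occurrence query could look like. The table and column names (`tag`, `tag_map`, `tag_id`, `image_id`) are assumptions made for illustration, as is the choice to measure strength as the number of images that carry both tags.

```python
import sqlite3

# Hypothetical minimal schema: tags, plus a map table linking tags to images.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tag (tag_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE tag_map (tag_id INTEGER NOT NULL, image_id INTEGER NOT NULL);
""")
conn.executemany(
    "INSERT INTO tag (tag_id, name) VALUES (?, ?)",
    [(1, "Bear"), (2, "Canada"), (3, "Salmon"), (4, "Snake")],
)
# Image 10: Bear, Canada. Image 11: Canada, Salmon. Image 12: Bear, Canada.
conn.executemany(
    "INSERT INTO tag_map (tag_id, image_id) VALUES (?, ?)",
    [(1, 10), (2, 10), (2, 11), (3, 11), (1, 12), (2, 12)],
)

# For the opened tag, find its images, then count the other tags on those images.
RELATED_TAGS_SQL = """
    SELECT t.name, COUNT(DISTINCT rel.image_id) AS strength
    FROM tag_map AS cur
    JOIN tag_map AS rel
      ON rel.image_id = cur.image_id AND rel.tag_id <> cur.tag_id
    JOIN tag AS t ON t.tag_id = rel.tag_id
    WHERE cur.tag_id = ?
    GROUP BY t.tag_id, t.name
    ORDER BY strength DESC, t.name
"""


def related_tags(conn, tag_id):
    """Return (tag name, strength) pairs for tags that share images with tag_id."""
    return conn.execute(RELATED_TAGS_SQL, (tag_id,)).fetchall()


print(related_tags(conn, 2))  # related tags for "Canada", strongest first
```

With this sample data, opening the Canada tag yields Bear with a strength of 2 and Salmon with a strength of 1, while a tag without images, such as Snake here, yields no related tags.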

Tag relations - Explicit


Of course we like the implicit relations between tags. They are a nice, unexpected gift. They are not designed though. If we are to allow for hierarchical browsing, we must take control too. Sticking with the Snake example, one might imagine how it is organized in a hierarchical structure like so:

[Figure: tag tree]
Row 1: Animals | Plants | Micro organisms
Row 2 (children of Animals): Reptiles | Mammals | Fish | Birds
Row 3 (children of Reptiles): Snakes | Alligators | Lizards

This example shows a tree-like structure of hierarchical tags that are used to describe an image. Before discussing the JungleDragon implementation, let us first establish some terms:

o Depth: This is the depth in the tree. For example, the tags Animals, Plants, and Micro organisms all share the same depth. The row below has a different depth.
o Sort order: This is the horizontal order in which elements within a depth are arranged. For example, we see that Mammals comes before Fish in the second row.
o Parent tag: The upward relation from one tag to another tag. For example, the parent tag of Snakes is Reptiles. The parent tag of Reptiles is Animals.

This design assumes that a tag can only have one parent tag. It is very well possible to allow for more parents, although that complicates the implementation.


The sole purpose of hierarchical tags is navigation, being able to drill down into content. Here is how this looks in the JungleDragon tag navigation bar:

There are different ways to visualize a hierarchy. JungleDragon uses a layered model where you can see the current tag, one level up and one level down. In the screenshot above we are at the top level and have selected the tag Animals. Sideways we see tags of the same depth. Below Animals we see its children. Animals is a top level tag, so it has no parent. Since we are interested in Reptiles, let's select it:

This time, we have the Reptiles tag open. It has a direct parent, Animals, marked in green. The parent row (top) also shows the other tags at that parent level. At the Reptiles level, we see the tags of the same depth, allowing for sideways navigation. Below that, we see the children of Reptiles. Finally, let us select the Snakes tag:

We have arrived at the bottom of the hierarchy. The tag Snakes has no children. We can still navigate sideways and upwards though. The point should be clear: hierarchical tags allow for hierarchical navigation, something that is not possible with normal tags.

A noteworthy consideration about this implementation is that although it does not allow one tag to have multiple parents, it does allow for multiple hierarchies in isolation. For example, it is possible to organize tags by a geographic hierarchy, by ecosystem, by zoology, anything really. There are some other considerations to make:

o Specific browsing or cumulative browsing. In a cumulative tag browsing model, all content of the current tag and its children, all the way down the tree, is shown and counted. For example, by choosing Animals one would see images linked to child tags, such as Snakes. In a specific tag browsing model, one does not see the Snake image until one has browsed to that depth. JungleDragon currently supports specific browsing only.
o Hierarchical tags coexist with normal tags. We do not want to destroy the flexibility of normal tagging. Users are still free to make up new tags as they like. They can also choose from the hierarchical tags that we preload. If they do so, their image will be easier to find. Administrators can also choose to upgrade a normal tag to a hierarchical one by linking it to a parent. We keep the flexibility of normal tagging and enrich it with hierarchical tags. For users there is no difference, it is tagging as usual. Browsing, however, becomes better when hierarchical tags are used.
o Hierarchical tags only work when properly used. This is a data quality issue that will be discussed later in this article.
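The difference between specific and cumulative browsing can be illustrated with a small in-memory sketch. This is not JungleDragon code; the `PARENT` and `IMAGES` dictionaries and the image numbers are made up for the example, and a single parent per tag is assumed, as in the design above.

```python
# Hypothetical data: each tag maps to its single parent (None for top-level tags).
PARENT = {
    "Animals": None,
    "Reptiles": "Animals",
    "Mammals": "Animals",
    "Snakes": "Reptiles",
    "Lizards": "Reptiles",
}

# Hypothetical image IDs that carry each tag directly.
IMAGES = {"Animals": {1}, "Reptiles": {2}, "Snakes": {3}, "Lizards": {4, 5}}


def children_of(tag):
    """Return the direct child tags of a tag."""
    return [child for child, parent in PARENT.items() if parent == tag]


def specific_images(tag):
    """Specific browsing: only images tagged with exactly this tag."""
    return set(IMAGES.get(tag, set()))


def cumulative_images(tag):
    """Cumulative browsing: images of this tag and of all its descendants."""
    found = specific_images(tag)
    for child in children_of(tag):
        found |= cumulative_images(child)
    return found


print(sorted(specific_images("Animals")))
print(sorted(cumulative_images("Animals")))
```

In the specific model, Animals shows only the image tagged Animals itself; in the cumulative model it also collects the Reptiles, Snakes and Lizards images by walking down the tree.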

How is one to implement the hierarchical tag system demonstrated and discussed above? It is quite simple really. In the database, tags have some basic metadata for navigational positioning. In addition, a parent tag field points to another tag to indicate its parent. The tags table therefore has a foreign key to itself:

Tags: Tag ID | Name | Depth | Sort order | Parent ID

Our content is mapped to this tag table. In the case of JungleDragon, this concerns images:

Tags: Tag ID | Name | Depth | Sort order | Parent ID
Tag map: Tag Map ID | Tag ID | Image ID
Images: Image ID | Title | Description
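As a rough sketch of the tables described above, here is how the self-referencing tag table and the mapping to images might look in SQLite, together with simple parent, child and same-level queries. The snake_case table and column names, the sample rows and the helper function names are all assumptions for illustration; the article only lists the fields.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE tag (
        tag_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE,
        depth      INTEGER NOT NULL,
        sort_order INTEGER NOT NULL,
        parent_id  INTEGER REFERENCES tag(tag_id)  -- NULL for top-level tags
    );
    CREATE TABLE image (
        image_id    INTEGER PRIMARY KEY,
        title       TEXT,
        description TEXT
    );
    CREATE TABLE tag_map (
        tag_map_id INTEGER PRIMARY KEY,
        tag_id     INTEGER NOT NULL REFERENCES tag(tag_id),
        image_id   INTEGER NOT NULL REFERENCES image(image_id)
    );
""")
conn.executemany(
    "INSERT INTO tag (tag_id, name, depth, sort_order, parent_id) VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Animals", 1, 1, None),
        (2, "Plants", 1, 2, None),
        (3, "Reptiles", 2, 1, 1),
        (4, "Mammals", 2, 2, 1),
        (5, "Snakes", 3, 1, 3),
        (6, "Alligators", 3, 2, 3),
        (7, "Lizards", 3, 3, 3),
    ],
)


def parent_of(conn, name):
    """Return the parent tag name, or None for a top-level tag."""
    row = conn.execute(
        "SELECT p.name FROM tag AS c JOIN tag AS p ON p.tag_id = c.parent_id "
        "WHERE c.name = ?",
        (name,),
    ).fetchone()
    return row[0] if row else None


def children_of(conn, name):
    """Return child tag names in sort order."""
    rows = conn.execute(
        "SELECT c.name FROM tag AS c JOIN tag AS p ON p.tag_id = c.parent_id "
        "WHERE p.name = ? ORDER BY c.sort_order",
        (name,),
    ).fetchall()
    return [r[0] for r in rows]


def same_level(conn, name):
    """Return the tags sharing this tag's parent (including itself), in sort order."""
    rows = conn.execute(
        # IS compares NULL-safely in SQLite, so top-level tags match each other.
        "SELECT s.name FROM tag AS c JOIN tag AS s ON s.parent_id IS c.parent_id "
        "WHERE c.name = ? ORDER BY s.sort_order",
        (name,),
    ).fetchall()
    return [r[0] for r in rows]
```

These three lookups correspond to the three rows of the layered navigation bar: one level up, the current level, and one level down.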

Note: if we were to support multiple parent tags per tag, we would require an additional tag parents table. For now we assume a single parent model. With these tables in place and filled, it is trivial to query them for parent and child tags.

Now let us assume that we want to include a geographic hierarchy of tags in the system that consists of the following structure:

World :: Region :: Country

It would be quite painful to manually insert these relations into tables. Therefore, JungleDragon has a very powerful administration screen for the creation of new tags:


In this example, we are creating a set of tags, regions in this case, that we set at depth 2 (just below depth 1, which is World) and link all these new tags to the parent World. If World does not exist yet, it will be created. We could also create the parent first, and then the children; the system is smart enough to handle both approaches. In only a few easy transactions, an administrator can set up an extensive hierarchy of tags. As mentioned before, multiple hierarchies can coexist next to each other.

With a proper set of hierarchical tags preloaded, users can start using them right away. If a user tags an image as South Africa, the system knows that the parent tag is Africa. It also knows the other countries in the same continent (siblings). In conclusion, the combination of implicit and explicit relationships between tags can dramatically improve the content browsing experience.
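The behavior of such a bulk creation step could be sketched as follows. This is an assumption about how it might work, not the actual JungleDragon admin code: the function names `get_or_create_tag` and `create_child_tags`, the schema, and the decision to relink an already existing tag to the given parent are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tag (
        tag_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE,
        depth      INTEGER NOT NULL,
        sort_order INTEGER NOT NULL DEFAULT 0,
        parent_id  INTEGER REFERENCES tag(tag_id)
    )
""")


def get_or_create_tag(conn, name, depth, parent_id=None):
    """Return the ID of the tag with this name, creating it if it does not exist."""
    row = conn.execute("SELECT tag_id FROM tag WHERE name = ?", (name,)).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute(
        "INSERT INTO tag (name, depth, parent_id) VALUES (?, ?, ?)",
        (name, depth, parent_id),
    )
    return cursor.lastrowid


def create_child_tags(conn, parent_name, child_names, depth):
    """Create (or relink) child tags at `depth` under a parent, in one transaction.

    A missing parent is created one level up, so parent-first and
    children-first workflows both end in the same state.
    """
    with conn:
        parent_id = get_or_create_tag(conn, parent_name, depth - 1)
        for sort_order, name in enumerate(child_names, start=1):
            child_id = get_or_create_tag(conn, name, depth, parent_id)
            # An existing normal tag is upgraded by linking it to the parent.
            conn.execute(
                "UPDATE tag SET depth = ?, sort_order = ?, parent_id = ? "
                "WHERE tag_id = ?",
                (depth, sort_order, parent_id, child_id),
            )


create_child_tags(conn, "World", ["Africa", "Asia", "Europe"], depth=2)
create_child_tags(conn, "Africa", ["South Africa", "Mozambique"], depth=3)
```

The first call creates World on the fly because it does not exist yet; the second call reuses the existing Africa tag as the parent rather than creating a duplicate.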


Tag content types - theory


In the previous section, we borrowed a strength of taxonomies, data classification relations, and applied it to tagging without giving up on the flexibility of tagging. Users have the same tagging capabilities as before, yet better browsing capabilities. Administrators have hardly taken on any extra work due to the simplicity of setting up a tag hierarchy.

The next problem to solve is tag context. As discussed before, tags have no meaning. If I tag an image as Impala, we as humans might know what it is (an antelope species), but the system has no clue. To the system, an Impala tag is a piece of flat text that is used to group content. The tag could have been called anything really. To the system, there is no contextual difference between Africa, a continent, and Impala, a species. The problem with this lack of meaning is that we cannot provide contextual information for content that carries a specific tag.

As you recall from the Impala Wikipedia page, it had all kinds of rich context attached to it, such as its conservation status and place in the zoology. There was even a graphic map detailing where Impalas are present. Obviously, Wikipedia knows that that particular page is a species page and thereby automatically discloses this rich context-specific information. If you were to open a Wikipedia page of a specific country, it would show you country-specific rich context information. These pages have meaning, and content attached to that meaning.

A naive approach to designing meaning into our content system is to directly include it on the record:

Title: Female impala close-up
Description: Female impala close-up in Kruger National Park, South Africa
Tags: Impala, South Africa, Kruger National Park
Conservation status: protected
Main territory: South Africa, Mozambique

Note that we have added two context-specific data elements: conservation status and main territory. A model like this has severe limitations:

o Not all images are of species, yet these fields are present at all times.
o It only allows for one context. What if we want to provide context information for a region, not a species? To support multiple content types we would have to add the combination of all fields of all content types.
o We are bothering the users with data elements they may not know or care about.
o We have tightly coupled domain-specific data elements to actual content. In a vertical community about cars, we would require totally different contextual data elements and would have to deeply hack into the database design, blocking the community site from using standard software.

Although we cannot completely avoid domain-specific data elements, we can loosely couple them to content by using tags as an intermediary:

[Figure: tags (Africa, Kruger, Impala, Close-up) linked to content types (Continent, Park, Species), which are in turn linked to content records (Africa, Kruger, Impala)]

Our curious impala is tagged as Africa, Kruger, Impala, Close-up. In this setup, we say that the tag Africa is a continent. A continent is a content type that has specific data elements relevant to a continent only. Of course there are multiple continents, each having different values, so the tag Africa is not only linked to a content type, it is also linked to a specific record of that content type. Africa is not just a continent, it is the African continent.

The second tag, Kruger, is the name of a national park, which we have linked to the content type Park. Of course we want to link it to a specific park, so it also has a content record attached to it.

The third tag, Impala, is an interesting case. It is linked to multiple content types. Apparently, Impala is both the name of a national park and the name of a species. That's right, tags can have multiple contexts. This is all too common in natural language. Take the word "light" for example. It can be used in the context of color (light blue), mass (a light car), calories (light beer) and illumination (a light source).

The fourth tag, Close-up, has no content type attached to it. This is perfectly normal. Tags do not need to have context.

If you are confused by now, I failed to explain the concept clearly. No worries, in the next section we will see what all of this means.
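To make the indirection concrete, here is a small sketch of the tag to content type to content record mapping described above. The `ContentLink` class, the `TAG_LINKS` table and the record IDs are invented for illustration and do not come from ImageDragon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContentLink:
    """Links a tag to one record of one content type (both hypothetical here)."""
    content_type: str  # e.g. "Continent"
    record_id: int     # ID of the record in that content type's own storage


# Hypothetical link table: tag name -> zero or more content type links.
TAG_LINKS = {
    "Africa": [ContentLink("Continent", 1)],
    "Kruger": [ContentLink("Park", 7)],
    # One tag can carry several contexts at once.
    "Impala": [ContentLink("Park", 8), ContentLink("Species", 3)],
    # "Close-up" has no entry: tags do not need to have context.
}


def contexts_for_image(tags):
    """Return (tag, content type, record ID) for every context the tags provide."""
    return [
        (tag, link.content_type, link.record_id)
        for tag in tags
        for link in TAG_LINKS.get(tag, [])
    ]


for context in contexts_for_image(["Africa", "Kruger", "Impala", "Close-up"]):
    print(context)
```

Note that the image record itself holds nothing but tags. All domain-specific data sits behind the links, which is what keeps the content model free of species or park fields.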


Tag content types - practice


To demonstrate the power of content types, I have developed two simple example content types:

o Continent. A simple data structure containing the continent's name, description and its position on a globe. Continent info is maintained by administrators, outside of the UI of the application.
o National Park. This content type contains a single data element: a Wikipedia link to the National Park. Unlike the Continent content type, this content type can actually be edited in the UI by the community, not only by administrators.

To fully appreciate the power of content types, let us first look at the Impala page without content types:


In the left column we see the image title, image, description, tags and comments (none yet). Note that we have tagged the image as Africa, Impala, Kruger National Park. In the right column we see more images uploaded by the same user, the user who uploaded and edited the image, general image information and statistics. All of this information is general purpose. It is specific to this image but not to its context. There is no context on Africa or the Kruger National Park. Now, this is what happens when the content type Continent is activated for the tag Africa and the tag Africa is linked to the correct continent:


Notice the new Africa section in the right column, showing the continent title and position on the globe. It gives extra meaning to the tag Africa, and thereby the current image on screen. Although the image remains the central attention point, the content type enriches it. However, if we click on the globe, we are taken to a large view of this context:

We are now at the Info tab of the tag Africa. Here we see a richer version of the African context (although this example is still very simple). From here, all images tagged as Africa are only one click away. Remember, all that end users need to do to get this extra context is to tag an image with a continent. As discussed before, the content type Continent is maintained by administrators. They prepared the data for the continents, which is exposed when users use the correct tag.

Now, let us discuss the second content type, National Park. This content type is equally simple, yet it behaves totally differently. First, let us activate the content type National Park for the tag Kruger. This is like saying "Kruger is a National Park":


All an administrator needs to do is to open the admin tab of the tag and click the Activate link of the desired content type. This will tell the system that Kruger is a National Park. In addition, this will create an empty National Park record. Let's have a look at our image once more. Remember, it was tagged Africa, Impala, Kruger:

Notice how, after activating the content type National Park for the tag Kruger, a Wikipedia link to that park is added in the context sidebar. Since this image has multiple tags, it also has multiple contexts. In fact, one single tag can have multiple content types. Just for the sake of demonstration, we're going to activate the content type National Park for the tag Africa:


We now have two National Park Wikipedia links, one for the tag Africa, the other for the tag Kruger. We still have our globe, which shows that one tag can have multiple contexts (content types).

As I told you before, the Continent content type is admin provided. The content is prepared in the database and exposed when classifying an image with the right tag (the right continent). The National Park content type, however, can be edited by the community. On the info tab of the Kruger tag, we see how this works:

The Kruger tag has only one content type associated with it, but if it had multiple content types associated with it, this screen would show multiple edit buttons, like little portlets. Clicking the Edit button brings us into edit mode:

This content type has a single field, the Wikipedia link to the actual park. I'm sure you can imagine that this is just a simple example and that we could have included many data elements related to a National Park. The form has both client-side and server-side validation and a Cancel button that brings us back to the previous screen. If we were to save our updated link, the Wikipedia link on the image screen would point to the new address.


Tag content types - options


We just witnessed how content types can enrich tags and particularly the content (images in our case) linked to those tags. Once again I would like to stress that these example content types are incredibly simple and merely for the sake of demonstration. Content types can be as simple or rich as the developer of that content type decides. But there is more power under the hood. There are a few settings per content type that determine how they behave:

o View mode. The Continent content type example appeared in a small view in the context of the image and in a large view when clicking through (in the context of the tag). Simple content types may decide to only have the small view implemented. This setting determines which views are used.
o View position. The information coming from the content types on the image screen was positioned at the bottom of the right column. This is not hard-coded. The view position number allows administrators to position it anywhere. This applies to both the small and large view.
o View column. Another positioning setting. When set to 0, the content type will use the full width of the page. When set to 1 it will be positioned in the left column, when set to 2 it will be positioned in the right column. This setting works together with the View position setting to position content type blocks horizontally and vertically.
o Editable by. Perhaps the most powerful content type setting. As the name suggests, this setting determines who can edit the data of the content type:
   o Nobody. There is no edit UI for the content type data. Database administrators need to directly enter the data into the database. This setting was used for the Continent content type.
   o Admin. There is a UI to edit the content type data, which can only be used by administrators.
   o All. Everybody in the community (meaning all signed in users) is allowed to edit the content type data via a UI. This setting was used for the National Park content type.
   o Class. It depends on your reputation in the community (in JungleDragon called your class) whether you can edit the content type data or not. This is handy when you want content type data to be editable by trusted users only.
o Karma log. This setting determines whether a content type data edit event is recorded in the karma log of the user. This is specific to JungleDragon and has nothing to do with content types in themselves.
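As an illustration of how these settings might fit together, here is a sketch of a settings record and an edit permission check. The class and field names, the numeric `min_class` threshold and the sample values are assumptions; the article only names the settings and their meaning.

```python
from dataclasses import dataclass
from enum import Enum


class EditableBy(Enum):
    NOBODY = "nobody"  # no edit UI; data is entered directly into the database
    ADMIN = "admin"    # edit UI for administrators only
    ALL = "all"        # edit UI for every signed in user
    CLASS = "class"    # edit UI for users with enough reputation


@dataclass
class ContentTypeSettings:
    name: str
    has_large_view: bool   # view mode: small view only, or small and large
    view_position: int     # vertical ordering of the block
    view_column: int       # 0 = full width, 1 = left column, 2 = right column
    editable_by: EditableBy
    min_class: int = 0     # hypothetical threshold, used only for CLASS
    karma_log: bool = False


@dataclass
class User:
    signed_in: bool
    is_admin: bool = False
    user_class: int = 0    # hypothetical numeric reputation level


def can_edit(settings, user):
    """Decide whether a user may edit the data of a content type via the UI."""
    if not user.signed_in or settings.editable_by is EditableBy.NOBODY:
        return False
    if settings.editable_by is EditableBy.ADMIN:
        return user.is_admin
    if settings.editable_by is EditableBy.ALL:
        return True
    # CLASS: assumed here to be a simple reputation threshold.
    return user.user_class >= settings.min_class


# Illustrative values only; the real configuration is not given in the article.
continent = ContentTypeSettings("Continent", True, 10, 2, EditableBy.NOBODY)
national_park = ContentTypeSettings("National Park", False, 11, 2, EditableBy.ALL)
```

Under these assumptions even an administrator cannot edit Continent data through the UI, whereas any signed in member can edit National Park data.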


The above options that can be used to set up the behavior of a content type are typically configured only once. Nevertheless, there is a convenient admin screen for it:

Tag content types - how they're made


In the theory section of content types we discussed how we decoupled content types from content. Instead, we tie the content type to a tag. This is to avoid the tight coupling of data elements to content. Although our content types are hard-coded, our architecture is loosely coupled.

As mentioned in the beginning of this article, the JungleDragon community site runs on the general purpose ImageDragon software. At the same time, we know we have created JungleDragon specific content types, such as National Park. What would happen if a new version of the ImageDragon software were released and we had to upgrade our JungleDragon instance? Nothing would happen. Even if the new ImageDragon version updated the code, database model and style of the JungleDragon instance, all custom content types would continue to work. This is great: we have the ability to run and reuse powerful general software while preserving custom content types specific to a vertical community. This meets the challenge identified earlier.


Now, on to the question of how content types are created. They are really plug-ins, consisting of a set of components, some optional, some required:

o Content type entry. A content type needs to be registered in the general content type table. This makes it available for activation and the editing of content type settings.
o Storage (optional). A content type that requires storage (most do) requires the manual creation of tables in the database model. These tables follow a naming convention and are not linked in any way to the general purpose database model (that holds the actual content and tags).
o Small view. At a minimum, a content type needs a small view. A view in this case is an HTML fragment that outputs the content type data record. This view can be as simple or complex as required.
o Large view (optional). If required, a content type can implement a large view. This too is an HTML fragment that outputs the content type data record, but this time in the context of the tag, not the image.
o Edit view (optional). If the content type data is editable via the UI, the Edit view is the HTML form that allows users to edit the content type data record.

And then there is the API that each content type needs to implement. These methods are kept separate from the general-purpose code and are called at runtime using class and method introspection:

o GetContentTypeData($objectid). Returns the requested content type data record. This method is used to display content types. The result of this method is passed to the views.
o CreateContentTypeData($data). Creates a content type data record with the initial values from the data array. It is called when activating a content type for a tag, to prepare the record for editing. This method is only required for content types with an edit UI.
o PutContentTypeData($objectid, $data). Saves the content type data record using the values in the data array. This method is only required for content types with an edit UI.
o DeleteContentTypeData($objectid). Deletes a content type data record. This method is only required for content types with an edit UI.
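
To make the plug-in API more concrete, here is a minimal sketch of how such methods could be discovered and called through introspection. It is written in Python for brevity rather than in the language of ImageDragon, and it is not the actual implementation. The four method names come from the list above; the class names, the in-memory storage, the id returned by CreateContentTypeData and the helper functions are assumptions made for illustration.

```python
class NationalParkType:
    """Hypothetical editable content type; a dict stands in for its own tables."""

    def __init__(self):
        self._records = {}
        self._next_id = 1

    def GetContentTypeData(self, objectid):
        # Returns the data record, or None when it does not exist.
        return self._records.get(objectid)

    def CreateContentTypeData(self, data):
        # Assumption: the new record id is returned to the caller.
        objectid = self._next_id
        self._next_id += 1
        self._records[objectid] = dict(data)
        return objectid

    def PutContentTypeData(self, objectid, data):
        self._records[objectid].update(data)

    def DeleteContentTypeData(self, objectid):
        self._records.pop(objectid, None)


class ReadOnlyType:
    """Hypothetical display-only content type: it implements the getter only."""

    def GetContentTypeData(self, objectid):
        return {"note": "static record"}


EDIT_METHODS = ("CreateContentTypeData", "PutContentTypeData", "DeleteContentTypeData")


def supports_editing(plugin):
    """A plug-in has an edit UI only if all three edit methods are present."""
    return all(callable(getattr(plugin, name, None)) for name in EDIT_METHODS)


def call_content_type(plugin, method_name, *args):
    """Look the method up by name at runtime and call it."""
    method = getattr(plugin, method_name, None)
    if not callable(method):
        raise NotImplementedError(f"{type(plugin).__name__} lacks {method_name}")
    return method(*args)
```

The general code only needs the plug-in object and a method name, so a new content type can be added without touching the general code, which is the loose coupling described earlier.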

All of these components combined form a Content Type implementation. Some considerations:

o Content types are custom developed, but smartly integrated into the architecture. They do not interfere with the general software, database or UI.
o Developing a simple custom content type including edit capability takes as little as one hour due to the high level of standardization, templating and example code.

Tagging and data quality


Phew. We have the heavy lifting behind us. We just solved two huge information architecture challenges. We strengthened the flexible nature of tagging with strong relationships for better browsing and with content types for taxonomy-based rich context information. All without sacrificing flexibility, portability and the user experience. We hardly increased our administrative load either. Life is good! But not perfect. On to our last challenge related to tagging: data quality. We identified the following potential problems:

o Users not tagging at all
o Users introducing many tag synonyms and misspellings
o Users not reusing the tags we preloaded (which is essential for hierarchy and rich context)
o Administrators hating to clean up this mess

Ultimately, the quality of tag usage lies in the hands of the community. That is what makes tagging so flexible and user friendly. Whilst there is no single answer that will guarantee perfect data quality, there are many things we can do to improve things.

Tagging and data quality: the incentive


Our first potential problem, users not tagging at all, could easily be solved by making the tag field required. That, however, would frustrate users and likely make tag data quality even worse because of fake data. Let's face it: tagging content is like paying tax, bad for the individual, good for the greater community. In JungleDragon, therefore, the tag field is not required. However, if you do tag an image, you are rewarded with karma points. Karma points are the currency used in the JungleDragon community. Not only are your karma points a badge of honor, they also allow you to climb the community ladder. The higher your position, the more influence you have in voting. You even get access to exclusive reputation-based features.

The above screenshot shows the JungleDragon reputation system in action. Note the panel within the green header (on the right). It permanently shows your reputation summary, no matter which page you are on. The main page itself in this case shows the current user's profile page. There is a karma graph reporting the reputation trend of the current user. Below the graph are historic events (25 types in total) and their karma rewards, giving the user clear insight into positive behavior (that is rewarded) and negative behavior (that is punished). To the right is the current user's class, in JungleDragon literally implemented as a food chain. You start out as an Ant and work your way up to the Lion class. Gradually, your influence, visibility and powers are increased, making you long for more. Reward models like these are often found in games, yet can be highly effective and addictive in online communities. Once a user is at a high enough level in the food chain, he will be able to edit, and thus tag, images uploaded by other users. As this has an enormous potential to earn karma, another incentive is born. Summarizing, reward systems can create an incentive for users to tag their content. They can even create an incentive to tag content correctly, since poorly tagged images may be downvoted by others, costing the user reputation.
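
As a rough illustration of how karma events and the food chain could fit together, here is a small Python sketch. It is not JungleDragon's implementation: the article names only the Ant and Lion classes and states that there are 25 event types, so the event names, the karma values, the thresholds and the middle class below are all placeholders.

```python
import bisect

# Karma per event type. These three events and their values are invented
# for illustration; the real system has 25 event types.
KARMA_REWARDS = {"image_tagged": 2, "image_upvoted": 5, "image_downvoted": -3}

# (minimum karma, class name), ordered from low to high.
# Only "Ant" and "Lion" come from the article; the rest is hypothetical.
FOOD_CHAIN = [(0, "Ant"), (500, "Intermediate"), (5000, "Lion")]


def total_karma(events):
    """Sum the karma of a user's event history (a list of event names)."""
    return sum(KARMA_REWARDS[event] for event in events)


def food_chain_class(karma):
    """Return the highest class whose threshold the karma total has reached."""
    thresholds = [minimum for minimum, _ in FOOD_CHAIN]
    index = bisect.bisect_right(thresholds, karma) - 1
    return FOOD_CHAIN[max(index, 0)][1]  # negative karma still maps to the lowest class
```

Because negative events subtract karma, a user whose images are downvoted can drop back to a lower class, which is the corrective side of the incentive.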

Tagging and data quality: Synonyms and misspellings


Incentive through reward, as just discussed, does not guarantee that tags are used or used correctly. It helps though. We will deal later with the content where it did not help. Now, let's focus on another tagging quality issue: synonyms and misspellings. The following tags could perfectly coexist in a community:

o Afrika
o Africa
o Affrica

Clearly, these three tags mean the same thing: Africa. By having a single meaning spread across three variations we are hurting the findability of the content inside these tags. This problem can be solved, or at least reduced, in a few ways. The best way is to prevent such variations in the first place. A very powerful way to prevent tag misspellings and synonyms is to implement an auto suggest feature that helps the user select existing tags.
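
The core of an auto suggest feature is a lookup of existing tags by what the user has typed so far. The Python sketch below shows one simple way to do that, assuming case-insensitive prefix matching and ranking by the number of images per tag. Both choices, like the function name and the sample data, are assumptions and not a description of the JungleDragon control.

```python
def suggest_tags(prefix, tags, limit=5):
    """Suggest existing tags that start with the typed prefix.

    tags maps each tag name to its number of images. More popular tags are
    listed first so users are steered towards well-established tags.
    """
    needle = prefix.strip().casefold()
    if not needle:
        return []
    matches = [name for name in tags if name.casefold().startswith(needle)]
    matches.sort(key=lambda name: (-tags[name], name.casefold()))
    return matches[:limit]


existing = {"Africa": 120, "African elephant": 40, "Asia": 75, "Affrica": 1}
print(suggest_tags("afr", existing))
```

Typing "afr" offers the well-used Africa tag before the user has a chance to finish typing a variant of it.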

Needless to say, this will only work when the list of preloaded tags is of good quality. One other hugely important benefit of auto suggest tagging is that it dramatically increases the chance that users tag content with a hierarchy or content type tag, which we discussed so extensively in this article. A secondary preventive measure against tag misnaming may be the user's browser spell checker, if available.

What can we do after the fact for those tags that slip through the cracks, where the preventive measures did not work? JungleDragon supports two methods:

o A tag can be deleted. This will not delete the images associated with it, only the tag itself. This is a fairly aggressive form of moderation, in practice only used for inappropriately named tags.
o A tag can be merged with another tag. This is a powerful moderation tool used when a tag is misspelled or a synonym of another tag.

Let's say we want to rename the tag Affrica (a misspelling) to Africa. Here is the moderation screen in action:

If we were to change the Name field value from Affrica to Africa, the following would happen:

o If the target tag (Africa) does not exist yet, the current tag will simply be renamed. All links to this tag and its images will remain unaffected.
o If the target tag already exists, the system merges the source and target tag:
  o All images linked to the source tag will be relinked to the target tag
  o The source tag will be deleted
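
The rename-or-merge rule above can be sketched in a few lines of Python. The sketch assumes tags are held in a dictionary that maps a tag name to the set of image ids linked to it. That data structure and the function name are illustrative; the real application keeps this in its database.

```python
def rename_tag(tag_images, source, target):
    """Rename source to target, merging when target already exists.

    tag_images maps tag name -> set of image ids and is updated in place.
    Returns "renamed" or "merged" to report what happened.
    """
    if source not in tag_images:
        raise KeyError(source)
    if source == target:
        return "renamed"  # nothing to do
    images = tag_images.pop(source)  # the source tag disappears in both cases
    if target in tag_images:
        tag_images[target] |= images  # relink images to the existing target tag
        return "merged"
    tag_images[target] = images  # plain rename: image links stay as they were
    return "renamed"


tags = {"Affrica": {1, 2}, "Africa": {2, 3}}
print(rename_tag(tags, "Affrica", "Africa"), tags)
```

Using a set per tag means an image that carried both the misspelled and the correct tag ends up linked to the target tag only once.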

Note that next to merging, this screen also allows us to link any tag into our tag hierarchy. Manually moderating tags is not anyone's idea of fun, but this feature at least makes it incredibly simple.

Tagging and data quality: Leave no tag behind


The methods above will reduce, but not eliminate tag data quality issues. We create incentive for users to tag, tag suggestions for them to tag correctly, and a powerful rename tool that helps us properly name tags and position them in the hierarchy.

But what do we do with untagged content? What if the incentive generated by our reward system did not work? In a large community site, it is realistic to assume that many images will not be tagged by users at all. You could just leave that situation as is and hope for this content to be findable via search, since users can hardly browse to it. A second approach has been discussed already: we allow trusted users to tag the content of others and reward them for it. This secondary incentive will reduce the problem even further. What is left are the hopeless cases. The only remaining method for us to properly tag these images is manual moderation. As painful as that sounds, a sophisticated implementation can turn the tide. It can turn it from a chore into a fun activity. Here is how this works in JungleDragon.

This turtle dude tells the user that he can earn karma by tagging untagged images. Both administrators and trusted users (with enough reputation) can take on this moderation task. After clicking on the untagged images link, the list of untagged images appears:

The screen shows 26 untagged images. The list of images can be sorted in various ways and viewed in 4 display modes (small, medium, large, slideshow). Images with an orange frame are authorized to be tagged by the current user:

o Basic users can only tag their own images
o Trusted users (enough karma points) can also tag images of others
o Administrators can tag all images
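
The three authorization rules above translate into a small check that decides whether an image gets the orange frame. The following Python sketch assumes a single karma threshold for trusted users. The threshold value, the class names and the field names are all hypothetical.

```python
from dataclasses import dataclass

TRUSTED_KARMA = 1000  # hypothetical threshold; the article does not state the real value


@dataclass
class User:
    id: int
    karma: int = 0
    is_admin: bool = False


@dataclass
class Image:
    id: int
    owner_id: int


def can_tag(user, image):
    """Return True when the user may tag the image."""
    if user.is_admin:
        return True  # administrators can tag all images
    if image.owner_id == user.id:
        return True  # everyone can tag their own images
    return user.karma >= TRUSTED_KARMA  # trusted users can tag images of others
```

The untagged images screen would then call can_tag once per image for the current user and frame only the images for which it returns True.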

The first image is titled Eagle. The title is the only required field of an image, next to the image itself. Normally, if I clicked on the image, it would bring me to the image page shown earlier in this article (remember the Impala?). In this special Untagged screen, however, it launches a modal dialog for inline tagging:

This way, the administrator stays in focus and can quickly tag the image whilst looking at it. Of course, the autosuggest feature works here too. Since this is mostly a task performed by administrators, they will have intimate knowledge of the tag structure of the community. Tagging one image is a matter of seconds. When clicking Save, the image is tagged. The modal dialog closes automatically and refreshes the screen below it:

Notice how the first Eagle image is gone? It has no business in this overview anymore since we just tagged it. Next in line is an owl waiting to be tagged. To make a long story short, the following pattern can be repeated very quickly:

o Click on first image
o Tag it
o Click on first image
o Tag it
o Etc.

Although one's idea of fun may differ, it is far from a painful activity. Some people even volunteer to do micro tasks like this for search engines. It is a Mechanical Turk in action.

Tagonomy: Conclusion and summary


If you made it this far into the article, congratulations and thank you. It was not easy explaining the entire concept and I'm sure it was not easy to grasp for you as a reader. Allow me to wrap up and summarize. At the beginning of this article we introduced two information architecture systems, folksonomy and taxonomy, and the need to combine the strengths of both whilst minimizing the cons of both. We started out with a classic tagging model and tried to address the following challenges:

o Adding relations between tags in order to make them better browsable. We solved this puzzle with implicit relations (related tags through content discovery) and explicit relations (an admin-managed tag hierarchy concept supported by a friendly way to navigate the tree and powerful ways to upload and manage this structure).
o Adding meaning to tags. We turned tags from flat text values into contextual data structures using content types, whilst preserving the flexible, user-driven mentality of tagging. Our approach led to a plug-in driven model backed by powerful admin tools.
o Tag data quality issues. We addressed this issue using various strategies: a sophisticated reward system, social tagging, an autosuggest control, a powerful tag rename/merge tool and an engaging, fast way to cover untagged images.

I'd like to think that we managed to combine the strengths of folksonomies and taxonomies whilst minimizing their cons. We do all of this for the user and the community, so that they can find things and have a great experience. I have no proof of how this total concept, which I titled Tagonomy, works in the field. It is experimental and, as far as I know, never applied like this in a real application. Soon I will. Should it not work, then nothing is lost. In fact, I can simply flip the switch in JungleDragon:

    // switch for taxonomy. when enabled, hierarchical tag browsing is enabled
    $config['pd_taxonomy'] = true;
    // switch for enabling custom types for tags
    $config['pd_taxonomy_type'] = true;

By setting these flags to false, all hierarchy between tags will disappear and all content types associated with them too. Like nothing happened, back to classic tagging. I will not need these flags though. I am convinced that a Tagonomy has the potential to be world-class information architecture.

Bonus: Tag visualizations


Although tag visualization has nothing to do with taxonomy, it is vital to the experience of browsing and searching. Many tag implementations only offer tags visualized as a tag cloud:

In a tag cloud, popular tags are displayed in a large font size and often also in a darker color. The great thing about tag clouds is that it is very easy to spot popular tags. It is a mistake, however, to only offer a tag cloud, as that ignores two other user needs:

o The ability to look up a tag (regardless of popularity)
o The desire to explore. Tag popularity is no measure of content quality and gets in the way of finding unpopular tags with great content.
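
For readers who have not built a tag cloud before, the sizing step can be sketched as follows in Python. The sketch maps each tag's image count onto a font size between a minimum and a maximum. The logarithmic scale and the pixel values are common choices assumed here for illustration, not details taken from JungleDragon.

```python
import math


def cloud_font_sizes(tag_counts, min_px=12, max_px=32):
    """Map tag name -> font size in pixels, larger for more popular tags.

    A logarithmic scale keeps one very popular tag from shrinking all
    other tags down to the minimum size.
    """
    if not tag_counts:
        return {}
    logs = {name: math.log(max(count, 1)) for name, count in tag_counts.items()}
    low, high = min(logs.values()), max(logs.values())
    span = high - low
    sizes = {}
    for name, value in logs.items():
        weight = (value - low) / span if span else 0.0  # 0.0 = least popular, 1.0 = most popular
        sizes[name] = round(min_px + weight * (max_px - min_px))
    return sizes
```

The same weight could also drive the darker color mentioned above, for example by interpolating between a light and a dark shade.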

The JungleDragon tag visualizing implementation meets all of these desires. By default, tags are visualized at equal size in tabular format, each tag showing the name and number of images:

The list of tags can be sorted in 4 different ways, making it ideal for a tag lookup. Note that this screen packs three ways to find a tag (or better said, to find the content inside a tag):

o Scan the list with tags on display. Optionally sort it to your needs first.
o Search for a tag. Just type ahead in the large search box. Matches will be shown instantly in the same tabular format.
o Hierarchical browsing, starting from the row above the search bar.

Users that quickly want to find popular content and want this visualized as a cloud, can simply switch the display mode to cloud. It will show the same tags, just visualized differently:

And if you're more of an explorer, there is a picture tag visualization display mode:

This visualization shows a thumbnail of the most popular image inside each tag. There are more noteworthy features of this tag visualization implementation:

o Users can save their preferred display mode by simply clicking the little green save icon to the right. From that moment on, tags will always be visualized as per the user's preference.
o There is a cool Surprise sort option that returns tags in completely random order, allowing for the discovery of unexpected content, old or new, popular or not.
o The number of tags one sees per page depends on the user's reputation. The higher the reputation, the more tags will be shown, reducing the need to paginate through results.
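
The last two features are simple to sketch. The Python example below shows a random "Surprise" ordering and a page size that grows with karma. The function names and every number in it (base page size, step, karma per step, cap) are assumptions for illustration only.

```python
import random


def surprise_sort(tags, rng=None):
    """Return the tags in random order without changing the input list."""
    if rng is None:
        rng = random.Random()
    shuffled = list(tags)
    rng.shuffle(shuffled)
    return shuffled


def tags_per_page(karma, base=20, step=10, karma_per_step=1000, cap=100):
    """Show more tags per page as reputation grows, up to a fixed cap."""
    earned_steps = max(karma, 0) // karma_per_step
    return min(base + step * earned_steps, cap)
```

Passing a seeded random.Random instance as rng makes the ordering reproducible, which is handy for testing. In production the default unseeded generator gives a different order on each request.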

Final words
Again, thank you for taking the time to read this article. I hope you found it useful. Please take note of the following final words:

o I love feedback. Please do rate and comment on this article on my blog (ferdychristant.com). It makes writing articles worthwhile. I also appreciate link love, for example by linking to this article from social bookmarking sites like Digg.
o I cannot and will not stop people from distributing this document in isolation, but I would appreciate it if you always refer to this article via the blog entry on ferdychristant.com. This way all feedback stays in one place and people always have the latest version.
o It was a lot of hard work to write this article and even more work to research and implement its suggestions. If you feel the need to give something back, you can make a voluntary donation at jungledragon.com to support my pet project. Nature for all, all for nature.
o Some of you may be interested in the code produced for Tagonomy. Sharing it goes beyond the scope of this article, and the code is fairly specific to my ImageDragon project. What I can do, however, is consult and advise you if you want to go down a similar path. You can find my contact details on my blog.

Peace out.
