Supported Languages
Support for over 60 languages including:
Afrikaans Albanian Arabic Azeri Basque Belarussian Breton Bulgarian Catalan Chinese Croatian Czech Danish Dutch English (all common varieties) Estonian Faroese Finnish French (incl. Canadian) Gaelic Galician German Greek Greenlandic Hebrew Hindi Hungarian Icelandic Indonesian Italian Japanese Kazakh Korean Kurdish Kyrgyz Lappish Latin Latvian Lithuanian Luxembourgish Macedonian Malay Maltese Maori Mongolian Norwegian Persian Polish Portuguese (incl. Brazil) Romanian Russian Serbian Slovak Slovenian Somali Sorbian Spanish (incl. South/Central America) Swahili Swedish Tagalog Tatar Thai Turkish Ukrainian Urdu Uzbek Valencian Vietnamese Welsh
Stemming
In many languages, related words share a common morphological root. Autonomy provides stemming algorithms that reduce words to this root form, so that concepts can be matched regardless of the grammatical form in which a word appears. In English, for example, the words "run", "runner" and "running" can all be reduced to their stem "run" without significant loss of meaning. Autonomy provides, as standard, a set of stemming algorithms for the most commonly used languages.
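As a rough illustration of the idea, the following Python sketch strips a few English suffixes. It is not Autonomy's algorithm: the function name toy_stem and the suffix list are assumptions made for this example, and a production stemmer handles far more cases.

```python
# Toy suffix-stripping stemmer (illustrative only, not Autonomy's algorithm).
# The suffix list is a hypothetical, deliberately tiny sample for English.
SUFFIXES = ("ing", "ers", "er", "s")


def toy_stem(word: str) -> str:
    """Reduce a word to a crude stem by removing one known suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            # Undo consonant doubling, e.g. "runn" becomes "run".
            if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
                word = word[:-1]
            break
    return word


stems = {toy_stem(w) for w in ["run", "runner", "running"]}
print(stems)
```

All three example words collapse to the single stem "run", so a query containing any one of them can match documents containing the others. The consonant-undoubling rule is intentionally naive and will over-stem some words (for instance "falling").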
Transliteration schemes
Transliteration represents letters that do not belong to the Latin alphabet, or words that contain accented letters, using the corresponding characters of another alphabet. This makes familiarity with the accents and special characters of different languages unnecessary.
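A minimal Python sketch of transliteration is shown here, assuming a Unicode-based approach: accents are removed by decomposing characters and dropping combining marks, and a small hand-written table maps a few Cyrillic letters to Latin ones. The table and the function name transliterate are hypothetical and cover only the letters needed for the example; they do not reflect the schemes Autonomy ships.

```python
import unicodedata

# Hypothetical, deliberately incomplete Cyrillic-to-Latin table.
CYRILLIC_TO_LATIN = {"М": "M", "о": "o", "с": "s", "к": "k", "в": "v", "а": "a"}


def transliterate(text: str) -> str:
    """Map known non-Latin letters to Latin and strip accents."""
    mapped = "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", mapped)
    # Drop the combining marks, keeping only the base letters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(transliterate("Москва"))  # Moskva
print(transliterate("café"))    # cafe
```

With this in place, a user who types "Moskva" or "cafe" on a plain Latin keyboard can still match text written as "Москва" or "café".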
Canonicalization of characters
Some encodings have more than one way of representing a character. Japanese katakana, for example, can be written in full-width or half-width characters; regardless of its width, the character carries the same meaning. Autonomy's software infrastructure uses canonicalization to ensure that all character forms are treated equally, automatically converting them to an internationally recognized canonical form.
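The katakana case can be sketched in Python using Unicode normalization. This assumes that the canonical form is Unicode NFKC, which the text above does not state; the function name canonicalize is likewise an assumption for this example.

```python
import unicodedata


def canonicalize(text: str) -> str:
    """Fold compatibility variants (such as half-width katakana) into one form.

    Assumption: Unicode NFKC is used as the canonical form.
    """
    return unicodedata.normalize("NFKC", text)


half_width = "ｶﾀｶﾅ"      # "katakana" written in half-width characters
full_width = "カタカナ"  # the same word in full-width characters

print(half_width == full_width)                              # False
print(canonicalize(half_width) == canonicalize(full_width))  # True
```

The raw strings differ, but after canonicalization they compare equal, so both spellings index and match identically.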
Stoplists
Every language has words that carry little meaning on their own. In grammatical terms these are normally articles, prepositions, conjunctions, auxiliary verbs and so on (for example, "the", "a", "and" and "to" in English). These words can be safely ignored when processing content. Autonomy provides, as standard, a set of stoplists for the most commonly used languages.
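Applying a stoplist amounts to filtering tokens against a set, as in the Python sketch below. The four-word stoplist is taken from the English examples above and is far smaller than a real one; the function name remove_stopwords and the whitespace tokenization are simplifying assumptions.

```python
# Hypothetical four-word stoplist built from the examples in the text.
ENGLISH_STOPLIST = frozenset({"the", "a", "and", "to"})


def remove_stopwords(text: str, stoplist=ENGLISH_STOPLIST) -> list:
    """Lowercase the text, split on whitespace, and drop stoplisted words."""
    return [token for token in text.lower().split() if token not in stoplist]


print(remove_stopwords("The runner went to the park and a shop"))
# ['runner', 'went', 'park', 'shop']
```

Only the content-bearing words remain, which reduces index size and keeps common function words from dominating matches.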
Multiple encodings
Autonomy supports multiple encodings for languages such as Greek and Russian. Different encodings can be used interchangeably, so it does not matter which encoding the text of a language is supplied in. It is possible, for example, to query in one recognized encoding for a language and receive results that are stored in other encodings.
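One way to picture encoding-independent matching is to decode every byte string to Unicode before comparing, as in this Python sketch. It assumes the encoding of each input is already known; the helper name matches is hypothetical, and the choice of KOI8-R and Windows-1251 for Russian is made for illustration only.

```python
def matches(query: bytes, query_encoding: str, doc: bytes, doc_encoding: str) -> bool:
    """Return True if the query text occurs in the document text.

    Both inputs are decoded to Unicode first, so their byte encodings
    may differ. Encodings are assumed to be known in advance.
    """
    return query.decode(query_encoding) in doc.decode(doc_encoding)


# The same Russian word has different bytes in different encodings.
query = "Москва".encode("koi8_r")
document = "Город Москва".encode("cp1251")

print(query == "Москва".encode("cp1251"))             # False
print(matches(query, "koi8_r", document, "cp1251"))  # True
```

A byte-level comparison would miss this match; comparing after decoding finds it regardless of which encoding each side uses.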
Autonomy Inc. One Market Plaza, 19th Floor, San Francisco, CA 94105 Tel: 415 243 9955 Fax: 415 243 9984 Email: info@us.autonomy.com
Autonomy Systems Ltd Cambridge Business Park Cowley Road Cambridge CB4 0WZ Tel: +44 (0) 1223 448 000 Fax: +44 (0) 1223 448 001 Email: autonomy@autonomy.com
Other Offices Autonomy has additional offices in Atlanta, Boston and New York, as well as in Amsterdam, Brussels, Copenhagen, Frankfurt, Madrid, Milan, Munich, Paris, Oslo, Stockholm, Singapore and Sydney.
Copyright 2004-2005 Autonomy Corp. All rights reserved. Other trademarks are registered trademarks and the properties of their respective owners. Product specifications and features are subject to change without notice. Use of Autonomy software is under license.
www.autonomy.com