Cultural Elements in Internet Software Localization

Valentina Dagienė, Tatjana Jevsikova
email@example.com, firstname.lastname@example.org
Institute of Mathematics and Informatics, Lithuania

Internet Software

The term "Internet software" is used here as a general term covering both software used to access Internet resources (usually on the client side) and web-based applications (on the server side).

Culture

Culture provides the context in which the world is understood: rules for behavior, communication, interaction and understanding. It is often described with multilevel, onion-like models: basic assumptions and values give rise to behavioral norms, attitudes and beliefs, which in turn manifest themselves in systems and institutions, in behavioral patterns, and in non-behavioral items.

There is a relation between a user's culture and software usability. Software can influence culture as well; this applies especially to Internet software.

Software Localization

Software localization is the adaptation of software to a particular cultural environment (locale). Unfortunately, it is still usually referred to as "language translation". Localized software must look and feel as if it had been made for the target language and culture.

Solving Culture-sensitive Issues

- At software production time: making the software language- and culture-neutral and suitable for localization (internationalization).
- After software production time: modifying the original code at localization time.

An Aim of the Presentation

To look at software elements that are based on culture and cultural conventions (a kind of "reflection" of cultural dimensions), and to classify and discuss the elements most important for successful cultural portability, based on an analysis of related normative documents and more than 10 years of experience in software localization.

Classification

Culture has been a topic of study for G. Hofstede, F. Trompenaars and E. Hall.
The cultural dimensions identified by Hofstede make it possible to structure culture according to five concepts:
- Power Distance;
- Individualism vs. Collectivism;
- Masculinity vs. Femininity;
- Uncertainty Avoidance;
- Long-term vs. Short-term Orientation.

These categories organize general cultural data. Speaking about software, we can instead look at the software elements that are based on culture and cultural conventions.

Structure of Cultural Elements in Software

[Figure: structure of cultural elements in software]

Possible Users of the Classification

- Researchers: to evaluate the level of internationalization of the original software and to check the user-friendliness of localized software.
- Software developers: to develop better-internationalized software.
- Localizers: to adapt more cultural elements to the target culture and to detect internationalization bugs.

Formal Definition of Cultural Elements

The international standard on procedures for registration of cultural elements (ISO/IEC 15897) defines a locale as "the definition of the subset of a user's information technology environment that depends on language, territory, or other cultural customs". A locale is usually identified by its language, using a two-letter language code (ISO 639-1), and by its territory, using a two-letter territory code (ISO 3166-1).

POSIX Locale Categories

POSIX defines six locale categories: LC_CTYPE (character classification and case conversion), LC_COLLATE (collation order), LC_MONETARY (monetary formatting), LC_NUMERIC (non-monetary numeric formatting), LC_TIME (date and time formats) and LC_MESSAGES (affirmative and negative responses).

Set of Formal Definitions of Cultural Conventions (FDCC), ISO/IEC 14652

In addition to the POSIX categories, ISO/IEC 14652 covers the format of postal addresses, information on the measurement system, the format of personal names, and the format of telephone numbers and other telephone information.

International Standard on Procedures for Registration of Cultural Elements (ISO/IEC 15897)

This standard specifies the procedures to be followed in preparing, publishing and maintaining a register of cultural specifications for computer use. Its first six clauses coincide with the POSIX locale categories.
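A locale identifier built from an ISO 639-1 language code and an ISO 3166-1 territory code is conventionally written as lt_LT on POSIX systems (often with a codeset suffix such as .UTF-8), while BCP 47 style tags use a hyphen (lt-LT). The Python sketch below splits such an identifier and lists the six POSIX categories. The parse_locale_id helper and LocaleId type are hypothetical illustrations, and the parser deliberately handles only the language[_TERRITORY][.codeset] shape, not modifiers such as @euro or script subtags.

```python
import re
from typing import NamedTuple, Optional

# The six locale categories defined by POSIX.
POSIX_CATEGORIES = ("LC_CTYPE", "LC_COLLATE", "LC_MONETARY",
                    "LC_NUMERIC", "LC_TIME", "LC_MESSAGES")


class LocaleId(NamedTuple):
    language: str             # ISO 639-1 code, two lowercase letters
    territory: Optional[str]  # ISO 3166-1 alpha-2 code, two uppercase letters
    codeset: Optional[str]    # e.g. "UTF-8"


# Hypothetical parser for "ll", "ll_TT" or "ll_TT.codeset";
# the hyphen is accepted too, so BCP 47 style "lt-LT" also parses.
_LOCALE_RE = re.compile(r"([a-z]{2})(?:[_-]([A-Z]{2}))?(?:\.([A-Za-z0-9-]+))?")


def parse_locale_id(text: str) -> LocaleId:
    match = _LOCALE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported locale identifier: {text!r}")
    # Unmatched optional groups come back as None.
    return LocaleId(*match.groups())


if __name__ == "__main__":
    print(parse_locale_id("lt_LT.UTF-8"))
    print(parse_locale_id("lt-LT"))
    print(", ".join(POSIX_CATEGORIES))
```

Keeping language and territory as separate fields lets an application fall back from lt_LT to plain lt when no territory-specific data exists.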
Additional information covered by ISO/IEC 15897 includes: national or cultural information technology terminology; personal naming rules; inflection; hyphenation; spelling; numbering; coding of national entities; identification of persons and organizations; electronic mail addresses; keyboard layout; man-machine dialogue, etc.

Unicode CLDR (more than 100 locales registered)

- Date and time formats.
- Number and currency formats.
- Measurement system.
- Collation specification (sorting, searching, matching).
- Translated names for languages, territories, scripts, time zones and currencies.
- Script and characters used by a language.

Locale Implementation in Software

[Figure: locale implementation in software; locale-defined elements are marked with red rectangles]

Language-driven Elements: Alphabets and Names

Names (identifiers of various objects in Internet software, e.g. files, logins, passwords, domains) are used not only by computers but also by humans. Names in a user's native language and script are easier to devise, memorize, guess, understand, manipulate, correct, etc.

Restricting names to letters of the English alphabet (as in outdated software):
- forces users to give up some or all letters of their native alphabet while still allowing letters that are foreign to it. Most languages, even those using the Latin script, have extra letters (e.g. å, š, ž), and some English letters, usually q, w and x, are not used in most languages that use the Latin script;
- makes it impossible to use characters of non-Latin scripts.

The main reasons why international characters are still not used in names today:
- External: some restrictions on the characters allowed in names still exist in today's software.
- Internal: experience with earlier restrictions on names discourages users from using national characters in names, even where such usage is now technically possible.

Login Name

Login names are used in many web-based applications (virtual learning environments, e-mail clients, instant messengers, etc.). Characters: usually only underscores, digits and letters of the basic Latin alphabet are accepted.
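The gap between the usual login rule and a Unicode-aware one can be sketched in a few lines of Python. The function names below are hypothetical; the sketch assumes that accepting any Unicode letter or digit plus the underscore is the desired policy. Normalizing to NFC first matters because a letter such as ž can arrive either as one precomposed code point or as z followed by a combining caron.

```python
import re
import unicodedata

# Legacy rule described in the text: basic Latin letters, digits and underscore only.
ASCII_LOGIN = re.compile(r"[A-Za-z0-9_]+")

# In Python 3, \w in a str pattern matches Unicode letters and digits plus "_",
# so national letters (š, ž, ą, Cyrillic, Greek, ...) are accepted.
UNICODE_LOGIN = re.compile(r"\w+")


def is_valid_ascii_login(name: str) -> bool:
    return ASCII_LOGIN.fullmatch(name) is not None


def normalize_login(name: str) -> str:
    # NFC turns "z" + COMBINING CARON into the single code point "ž",
    # so the same visible name always yields the same stored string.
    return unicodedata.normalize("NFC", name)


def is_valid_unicode_login(name: str) -> bool:
    # Without normalization the combining caron (a mark, not a letter)
    # would make \w+ fail on a perfectly ordinary name.
    return UNICODE_LOGIN.fullmatch(normalize_login(name)) is not None


if __name__ == "__main__":
    for login in ("jonas_1", "Jonas_Žukauskas", "z\u030cukauskas"):
        print(login, is_valid_ascii_login(login), is_valid_unicode_login(login))
```

Storing the normalized form, not the raw input, also keeps later lookups and comparisons consistent.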
Some systems use the login name not only for internal identification but also for addressing the user.

Personal Name

Today practically all software allows all letters of the alphabet to be used in a person's first name and last name (surname), so a user should not have to change or misspell his or her real name to register in a system. However, in telecommunications many users avoid their native alphabet and write their names with spelling errors. For example, the proportion of incorrectly written names of Skype users varies from 10% to 90% depending on the language. Such widespread "illiteracy" may be caused by previous experience with outdated software or by present restrictions on login names.

Passwords

Passwords are used in software that authorizes users (virtual learning environments, e-mail clients, instant messengers, etc.). They may usually be composed of letters and digits, but many programs still restrict the set of letters to the ASCII alphabet. Restricting the character set available for passwords reduces their security.

Passwords (an example)

A user usually does not assume that "letters" in this context means only the letters of the English alphabet; he or she expects the letters of his or her native language to be accepted as well.

File/Folder Names

File and folder names are used for:
- Storing documents on a local computer: no technical problems in today's operating systems.
- Exchanging documents between computers on removable storage devices: works well as long as both computers use the same 8-bit encoding.
- Sending documents as parts of, or attachments to, e-mail messages, or directly by instant messengers: no technical problems. Before sending, names are encoded in UTF-8 and escaped into ASCII-only %XX sequences, and they are decoded back after receipt.
- Storing web pages or other web content on a server: theoretically solved, using the same method as for e-mail.
- Use inside applications: it is the developer's duty to provide user-friendly names for visible items.

Domain Names

Until 2003, domain names could contain only the 26 letters of the basic Latin alphabet, digits and the hyphen.
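The password and file-name points above can be made concrete with a short Python sketch. For a password chosen uniformly at random, entropy is length × log2(alphabet size), so widening the alphabet with national letters raises it; human-chosen passwords are weaker than this bound, and the Lithuanian letter set is only an illustrative choice. The helper names are hypothetical. The file-name part uses the standard urllib.parse.quote and unquote functions to show the UTF-8 percent-encoding described in the text.

```python
import math
import unicodedata
from urllib.parse import quote, unquote

# --- Passwords -----------------------------------------------------------
BASIC_LATIN = "abcdefghijklmnopqrstuvwxyz"
LITHUANIAN_EXTRA = "ąčęėįšųūž"  # Lithuanian letters missing from basic Latin
DIGITS = "0123456789"

ascii_alphabet = BASIC_LATIN + BASIC_LATIN.upper() + DIGITS          # 62 symbols
extended_alphabet = ascii_alphabet + LITHUANIAN_EXTRA + LITHUANIAN_EXTRA.upper()


def random_password_entropy_bits(length: int, alphabet: str) -> float:
    # Valid only for passwords drawn uniformly at random from the alphabet.
    return length * math.log2(len(set(alphabet)))


def password_key(password: str) -> bytes:
    # Normalize before hashing or comparing: "ž" may be typed precomposed
    # or as "z" + combining caron, depending on the keyboard and OS.
    return unicodedata.normalize("NFC", password).encode("utf-8")


# --- File names ----------------------------------------------------------
def encode_file_name(name: str) -> str:
    # Each non-ASCII character becomes the %XX escapes of its UTF-8 bytes.
    return quote(name, safe="")


def decode_file_name(encoded: str) -> str:
    return unquote(encoded)  # unquote decodes the bytes as UTF-8 by default


if __name__ == "__main__":
    print(round(random_password_entropy_bits(8, ascii_alphabet), 1))     # 47.6
    print(round(random_password_entropy_bits(8, extended_alphabet), 1))  # 50.6
    print(encode_file_name("ąžuolas.txt"))  # %C4%85%C5%BEuolas.txt
```

Accepting national letters only helps if every component that handles the password normalizes it the same way; otherwise a user may be locked out by an input method change.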
In 2003, documents on using international characters in domain names were issued (RFC 3490, RFC 3491, RFC 3492). International characters (represented in Unicode) are converted to an ASCII string (Punycode), and before the name is shown to the user it is converted back to Unicode characters:

räksmörgås.josefsson.org ↔ xn--rksmrgs-5wao1o.josefsson.org

Problems: the use of homographs, i.e. characters from different scripts that look identical or nearly so (for example, Latin "a" and Cyrillic "а"), which can be used to imitate another domain name.

Domain Names in Browsers

[Figure: internationalized domain names displayed in browsers]

Semantically-expressed Elements: Matching Plural and Singular Forms

- English has 2 forms: 1 object, 2 objects, 10 objects.
- Lithuanian, Polish, Russian, etc. have 3 forms.
- Some European languages, e.g. Slovenian and Maltese, have 4 forms.

Plural and Singular Forms in Other Languages

Number of forms, number of languages, example languages:
- 1 form, 12 languages: Georgian, Japanese, Korean, Vietnamese, Turkish, etc.
- 2 forms, 46 languages: Dutch, English, German, Norwegian, Swedish, Estonian, Finnish, Greek, Hebrew, Italian, Portuguese, Spanish, etc.
- 3 forms, 14 languages: Slovak, Czech, Polish, Lithuanian, Russian, Romanian, etc.
- 4 forms, 2 languages: Slovenian, Maltese.
- 5 forms, 0 languages.
- 6 forms, 1 language: Arabic.

Grammatical Name Forms

In inflective languages (Lithuanian, Finnish, Polish, etc.) names in dialog windows may have to appear in various grammatical cases. 'Hello, Jonas' in English becomes 'Sveikas, Jonai' in Lithuanian.

Gender

Consider the string "%S is logged in", where %S is a user name. English: "John is logged in.", "Mary is logged in." In Lithuanian (and many other languages) the verb form depends on the user's gender: "John yra prisijungęs.", "Mary yra prisijungusi."

Human-sensitive Elements

- Usually not defined by national or international standards (normative documents).
- Depend on deep cultural habits and on the cultural conventions of a country or its historical region.
- May also depend on individual persons and should be adaptable to personal habits.
- Difficult to express formally (e.g. to include in a formal locale definition).

Some Examples

Icons/metaphors; images, photos; colour meaning; usage of sounds and videos; examples; jokes and analogies; political statements; navigation scheme; page layout; etc.
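The domain-name, plural and gender issues described above can each be handled explicitly in code. The Python sketch below relies on the standard "idna" codec, which implements IDNA 2003 (Nameprep plus Punycode), to reproduce the räksmörgås example. It implements the three-form Lithuanian plural rule in the form used by gettext catalogs, and keeps one message variant per gender instead of a single "%S is logged in" template. All function and table names are hypothetical, and the two-value gender key is a simplification.

```python
# --- Internationalized domain names (IDNA 2003: RFC 3490-3492) -----------
def to_ascii_domain(domain: str) -> str:
    # Python's built-in "idna" codec applies Nameprep and Punycode per label.
    return domain.encode("idna").decode("ascii")


def to_unicode_domain(domain: str) -> str:
    return domain.encode("ascii").decode("idna")


# --- Plural forms --------------------------------------------------------
def plural_index_en(n: int) -> int:
    return 0 if n == 1 else 1


def plural_index_lt(n: int) -> int:
    # Lithuanian rule as commonly written in gettext Plural-Forms headers:
    # form 0: 1, 21, 31, ...  form 1: 2-9, 22-29, ...  form 2: 0, 10-20, 30, ...
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if n % 10 >= 2 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


FILE_COUNT = {
    "en": (plural_index_en, ("{n} file", "{n} files")),
    "lt": (plural_index_lt, ("{n} failas", "{n} failai", "{n} failų")),
}


def format_file_count(lang: str, n: int) -> str:
    rule, forms = FILE_COUNT[lang]
    return forms[rule(n)].format(n=n)


# --- Gender-dependent messages ------------------------------------------
# One translatable variant per gender instead of one parameterized string.
LOGGED_IN = {
    "en": {"m": "{name} is logged in.", "f": "{name} is logged in."},
    "lt": {"m": "{name} yra prisijungęs.", "f": "{name} yra prisijungusi."},
}


def logged_in_message(lang: str, name: str, gender: str) -> str:
    return LOGGED_IN[lang][gender].format(name=name)


if __name__ == "__main__":
    print(to_ascii_domain("räksmörgås.josefsson.org"))
    print([format_file_count("lt", n) for n in (1, 2, 10, 11, 21)])
    print(logged_in_message("lt", "Mary", "f"))
```

The plural rule is a function of n rather than a simple "n == 1" test because 11 takes the same form as 10 in Lithuanian while 21 takes the same form as 1; the idna codec does not address homographs, which need a separate policy such as rejecting labels that mix scripts.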
Colour-Culture Chart (Boor & Russo, 1993)

- Red. China: happiness; Japan: anger, danger; Egypt: death; France: aristocracy; USA: danger, stop.
- Blue. China: heavens, clouds; Japan: villainy; Egypt: virtue, faith, truth; France: freedom, peace; USA: masculine.
- Green. China: Ming Dynasty, heavens; Japan: future, youth, energy; Egypt: fertility, strength; France: criminality; USA: safety, go.
- Yellow. China: birth, wealth, power; Japan: grace, nobility; Egypt: happiness, prosperity; France: temporary; USA: cowardice, temporary.
- White. China: death, purity; Japan: death; Egypt: joy; France: neutrality; USA: purity.

Icons Example: Home Function

[Figure: Home icons in MS Internet Explorer and Mozilla Firefox, and possible Chinese icons]

Problems

The elements mentioned are more difficult to implement in Internet software than in autonomously running software: they are deeply "grown" into the program, and Internet software has many links with other software. Requirements: such elements should be flexibly adaptable to the software and to other cultural components, fit flexibly with each other, and be flexibly chosen by the user (multiple choices).

Existing Ways of Solution

The Cultural Web Spider was designed to extract information on culture-specific web page design elements (cultural markers) from the HTML and CSS code of websites in a particular country domain; this information could help to create a prototyping tool for culturally appropriate interface "look and feel" (Kondratova I., Goldfarb I., Gervais R., Fournier L., 2005).

Many researchers confirm the importance of the cultural dimensions defined by Hofstede. They are used to create recommendations for website navigation schemes and content presentation (Marcus A., Gould E.W., and others).

Existing Ways of Solution (continued)

Recent research on incorporating cultural dimensions into global software includes attempts to create culturally adaptive software using AI mechanisms. It has also been proposed to incorporate culture into a user model in order to implement adaptable personalization mechanisms, assigning a Hofstede value for each cultural dimension according to the user's birthplace, countries of current and former residence, languages, sex, age, political orientation and education level (Reinecke K. et al., 2007).
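One modest way to make a human-sensitive element such as colour meaning available to software is to encode the colour-culture chart above as data. The Python sketch below does exactly that and adds a hypothetical helper that lists the cultures in which a colour does not carry the meaning a designer intends. It is a lookup aid under strong assumptions, not a rule engine: the chart covers only five colours and five countries, dates from 1993, and, as noted for human-sensitive elements, individual users may read colours differently.

```python
from typing import Dict, List

# Colour meanings transcribed from the Boor & Russo (1993) colour-culture chart.
COLOUR_MEANINGS: Dict[str, Dict[str, List[str]]] = {
    "red": {"China": ["happiness"], "Japan": ["anger", "danger"], "Egypt": ["death"],
            "France": ["aristocracy"], "USA": ["danger", "stop"]},
    "blue": {"China": ["heavens", "clouds"], "Japan": ["villainy"],
             "Egypt": ["virtue", "faith", "truth"], "France": ["freedom", "peace"],
             "USA": ["masculine"]},
    "green": {"China": ["Ming Dynasty", "heavens"], "Japan": ["future", "youth", "energy"],
              "Egypt": ["fertility", "strength"], "France": ["criminality"],
              "USA": ["safety", "go"]},
    "yellow": {"China": ["birth", "wealth", "power"], "Japan": ["grace", "nobility"],
               "Egypt": ["happiness", "prosperity"], "France": ["temporary"],
               "USA": ["cowardice", "temporary"]},
    "white": {"China": ["death", "purity"], "Japan": ["death"], "Egypt": ["joy"],
              "France": ["neutrality"], "USA": ["purity"]},
}


def colour_meanings(colour: str, culture: str) -> List[str]:
    # Unknown colours or cultures yield an empty list rather than an error.
    return COLOUR_MEANINGS.get(colour.lower(), {}).get(culture, [])


def conflicting_cultures(colour: str, intended: str) -> List[str]:
    """Cultures in the chart where `colour` does not carry the intended meaning."""
    by_culture = COLOUR_MEANINGS.get(colour.lower(), {})
    return sorted(c for c, meanings in by_culture.items() if intended not in meanings)


if __name__ == "__main__":
    print(conflicting_cultures("red", "danger"))
    print(colour_meanings("White", "Japan"))
```

Keeping the table as data rather than hard-coding colours into page templates is one way to meet the requirement that such elements be flexibly adaptable and chosen per user.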
Conclusions

Existing shortcomings in software internationalization can be explained by the lack of categories in formal locale definitions and by the lack of compatibility between different locale models.

Although the developed list of cultural elements is limited, we hope it can help draw attention to the complex set of cultural elements when designing, localizing and testing Internet software that is localized or intended for localization.

During Internet software development, special attention should be paid not only to the generalized set of elements defined in existing locale models, but also to:
- the ability to use international characters in object names (logins, files, domains, passwords);
- the ability to include a component that generates a language's grammatical forms;
- reducing the use of parameters in localizable strings, since the rules for composing words and phrases differ between cultures.

Another direction for future work could be some formalization of the human-sensitive elements used in software.
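As a small illustration of what a grammatical-form component might look like, the Python sketch below turns a Lithuanian first name from the nominative into the vocative by replacing common endings, reproducing the 'Sveikas, Jonai' example, and also makes the greeting word agree in gender. The rule table and function names are hypothetical, and the rules are a simplified heuristic covering frequent first-name endings only. A real component would need a morphological dictionary for exceptions and for surnames.

```python
# Simplified nominative -> vocative rules for Lithuanian first names.
# Longer endings are listed first so that "-ius" wins over "-us"-like matches.
VOCATIVE_RULES = (
    ("ius", "iau"),  # Paulius -> Pauliau
    ("as", "ai"),    # Jonas -> Jonai
    ("ys", "y"),     # Kazys -> Kazy
    ("is", "i"),     # Algis -> Algi
    ("ė", "e"),      # Eglė -> Egle
)


def lithuanian_vocative(name: str) -> str:
    """Heuristic vocative form of a Lithuanian first name."""
    for ending, replacement in VOCATIVE_RULES:
        if name.endswith(ending):
            return name[: -len(ending)] + replacement
    return name  # e.g. feminine names in -a (Rūta) keep the same form


def greeting(name: str, gender: str) -> str:
    # The greeting word itself agrees in gender: "Sveikas" (m) / "Sveika" (f).
    word = "Sveikas" if gender == "m" else "Sveika"
    return f"{word}, {lithuanian_vocative(name)}"


if __name__ == "__main__":
    print(greeting("Jonas", "m"))
    print(greeting("Eglė", "f"))
```

Generating the form in code keeps the translatable string free of a bare name parameter that would otherwise force the nominative case into a sentence that requires the vocative.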