A Study on Collation of Languages from Developing Asia


The collation of every written language is defined in its dictionaries, has developed over centuries, and is thus highly representative of cultural tradition. However, although collation is well understood within these cultures, it is not always thoroughly documented or well understood in the context of existing character encodings, especially Unicode.

This volume aims to address the complex algorithms needed to sort words into sequence for a small but diverse set of scripts and languages chosen from the developing Asian region. The set was chosen for the variety it exhibits and for the challenges it poses in solving the collation puzzle.

This work should be taken as an initial step towards addressing the collation of languages in the region: there is still more to be said about the collation of these languages, and many more languages remain to be documented.

The data on the different languages were obtained from dictionaries published in these languages and through interaction with the PAN Localization project teams in the relevant countries.

Surveyed Languages

Bengali

Dzongkha

Lao

Mongolian

Sindhi

Sinhala

Tamil

Urdu

