“In the field of IT, all language communities are entitled to have
at their disposal equipment adapted to their linguistic system and
tools and products in their language, so as to derive full advantage
from the potential offered by such technologies for self-expression,
education, communication, publication, translation and information
processing and the dissemination of culture in general” [1].
The PAN Localization project (www.panl10n.net)
has been a regional initiative addressing these challenges and
promoting the use of language technology across Asia. The project,
initiated in 2003, has developed and disseminated computing
solutions for Bahasa Indonesia, Bangla, Dzongkha, Khmer, Lao,
Mongolian, Nepali, Pashto, Sinhala, Tamil, Tibetan and Urdu. These
languages represent a population of nearly one billion people across
developing Asia.
On the occasion of the eleventh International Mother Language Day,
21st February 2010, the PAN Localization project is pleased
to release its research, technology and resources through its
website.
This project has been carried out in collaboration with the Pan Asia
Networking (PAN) program of IDRC, Canada (www.idrc.ca), the
Center for Research in Urdu Language Processing (www.crulp.org)
at the National University of Computer and Emerging Sciences, Pakistan (www.nu.edu.pk),
and the following partner organizations:
- Afghan Computer Science Association (ACSA: www.acsa.org.af), Afghanistan
- BRAC University (CRBLP: crblp.bracu.ac.bd), Bangladesh
- Development Research Network (D.NET: www.dnet-bangladesh.org), Bangladesh
- Department of IT (DIT: www.dit.gov.bt), Bhutan
- Ministry of Education, Youth and Sports (www.pancambodia.info), Cambodia
- Institute of Technology (ITC: www.itc.edu.kh), Cambodia
- National ICT Development Authority (NIDA: www.nida.gov.kh), Cambodia
- Tibet University (TU: www.utibet.edu.cn), China
- Institute of Science and Technology, TAR China
- Tibet Academy of Agricultural and Animal Husbandry Sciences, China
- University of Indonesia (UI: www.ui.ac.id), Indonesia
- Agency for the Assessment and Application of Technology (BPPT: www.bppt.go.id), Indonesia
- National Authority for Science and Technology (NAST: www.nast.gov.la), Laos
- InfoCon Co. Ltd. (www.infocon.mn), Mongolia
- Mongolian University of Science and Technology (MUST: www.must.edu.mn), Mongolia
- National University of Mongolia (NUM: www.num.edu.mn), Mongolia
- Madan Puraskar Pustakalaya (MPP: www.mpp.org.np), Nepal
- E-Network Research and Development (ENRD: www.enrd.org), Nepal
- University of Colombo School of Computing (LTRL, UCSC: www.ucsc.cmb.ac.lk/ltrl), Sri Lanka
Bahasa Indonesia
Statistical Machine Translation,
English-Bahasa Parallel Corpus (1 Million words), POS Tagged Bahasa
Corpus (500,000 words), Part of Speech Tagset and Tagger,…[details]
Bangla
Text to Speech System (Awarded),
Optical Character Recognition System (Shortlisted for Award), Bangla
Pad, Spell Checker, Lexicon, Language Table for IDNs, Part of Speech
Tagset and Tagger, Wordnet (1000 words), Tagged Corpus (5 Million
words), English-Bangla Parallel Corpus, Training on Content
Development using infomediaries, Online Legal Content for Farmers in
Bangla,…[details]
Dzongkha
DzongkhaLinux, Optical
Character Recognition System, Language Table for IDNs, Part of
Speech Tagset, Corpus (600,000 words), Lexicon (23,000 words), Text
to Speech System (prototype), Dzongkha Terminology, Collation,
Locale, Fonts and Keyboard, Training on DzongkhaLinux,…[details]
Khmer
Optical Character
Recognition System, Java Applications and OpenOffice.org Plug-ins
for Collation, Encoding Conversion, Word Segmentation, Locale,
Mobile SMS, Language Table for IDNs, Part of Speech Tagset and
Tagger, Lexicon, Text to Speech System (prototype), Tagged Corpus
(150,000 words), Online Khmer Content, Training of Govt. officials
on Khmer Open Source Software,…[details]
Lao
Optical Character
Recognition System, OpenOffice.org and MS Office Plug-in for Word
Segmentation, Collation, Spell Checker, Lao Pad, Fonts, Keyboard,
Language Table for IDNs, Part of Speech Tagset, POS Tagged Corpus,
Parallel Corpus (37,000 words), Online Lao Content,…[details]
Mongolian
Part of Speech Tagset
and Tagger, Spell Checker, Corpus (1,000,000 words), Tagged Corpus
(100,000 words), Lexicon (10,000 words), Automatic Speech
Recognition, Localization of Pidgin and SeaMonkey,… [details]
Nepali
NepaLinux (Awarded),
Spell Checker, Grammar Checker, Parallel Corpus (100,000 words),
Tagged Corpus (80,000 words), Lexicon (37,000 words), Optical
Character Recognition System (prototype), Language Table for IDNs,
Training Material on NepaLinux, Training of Rural Centers on Nepali
Open Source Software,…[details]
Pashto
Localized SeaMonkey (Awarded),
Keyboard, Fonts, Language Table for IDNs,…[details]
Sinhala & Tamil
Sinhala Optical Character Recognition System, Sinhala Text to Speech
System (Awarded), Screen Reader for Sinhala for the Blind, Language
Learning Tool for Tamil in Sinhala and English, Sinhala Wordnet,
Localized OpenTM, Language Table for IDNs, Collation Standard,
Encoding Conversion Tool,…[details]
Tibetan
Collation Standard, Online Tibetan Content, Farmer Training on using
Online Tibetan Content,…[details]
Urdu
Parallel Corpus (100,000
words), Stemmer, Collation, Optical Character Recognition,
Localization of OpenOffice.org, SeaMonkey, Web Composer and Psi,
Terminology Glossary, Gendered Outcome Mapping Tool (Awarded),
Part of Speech Tagset and Tagger, Tagged Corpus (200,000 words),
Language Table for IDNs, Training Material on Localized
Applications, Training on Localized Software to Rural School
Children, Content Generated by Rural School Children and Teachers,…[details]