PAN Localization Phase II

The International Development Research Centre (IDRC), Canada, through its Pan Asia Networking (PAN) Program, and the National University of Computer and Emerging Sciences (NUCES), Pakistan, through its Centre for Research in Urdu Language Processing (CRULP), are pleased to announce their continued partnership in the PAN Localization project from 2007 to 2010, following the successful completion of Phase I (2004 to 2007). The partnership is part of an initiative across South and South-East Asia to build capacity in regional institutions for local language computing.

Phase I of the PAN Localization project focused on developing local language standards and technology across seven Asian countries. The countries (and languages) included in the project were Afghanistan (Pashto), Bangladesh (Bangla), Bhutan (Dzongkha), Cambodia (Khmer), Laos (Lao), Nepal (Nepali) and Sri Lanka (Sinhala, Tamil).

At the end of Phase I, the countries had successfully completed the planned spectrum of research and developed local language technology and applications. Outputs include Linux distributions for Dzongkha and Nepali; working Optical Character Recognition systems for Sinhala, Bangla and Lao; lexica and spell-checking utilities for Bangla, Dzongkha, Khmer, Lao and Nepali; a Text To Speech system for Sinhala; keyboard and collation standards; fonts; and more.

The project has also carried out an extensive training program to raise capacity to develop language technology, conducting national and regional short-term and long-term programs across all partner countries. Training has been imparted in linguistics, standards development, open source software localization, speech processing, script processing and computational linguistics. Details of the training conducted, the training programs and the training material are published on the project website (under the Activities link). The first phase has also built an Asian network of researchers to share knowledge in language computing. The project has been (and continues to be) publishing research reports that document effective processes, results and recommendations.

Phase II of the PAN Localization project has conducted research into the challenges associated with the digital literacy of end-users who use localized technology to communicate and to produce local language content. The project has also matured the language technology in the target languages.

A complete list of software and associated research outputs is posted under the Outputs link.

Objectives

The general objectives of Phase II of the project are to:

examine effective means to develop digital literacy through the use of local language computing and content.

explore development of sustainable human resource capacity for R&D in local language computing as a means to raise current levels of technological support for Asian languages.

advance policy for development and use of local language computing and content.

study and develop coherent instruments to gauge the effectiveness of multi-disciplinary research concerning the adoption of local language technology by rural communities.

Scope

Phase II aims to train end-users across partner countries to use the localized technology developed in Phase I, so that the work delivers its intended socio-economic benefit. The project will also study effective methods for training a variety of user-groups to access and publish local language content, including rural populations, students, monks, government staff and the private sector, across most of the partner countries. These groups will be trained in document processing, emailing, accessing the internet and publishing local content through websites. Training material for this purpose will also be developed in the local languages of participating countries.

PAN Localization Phase II will also be looking at various aspects related to local content development, including requirement gathering, creation and deployment for all the languages of the project. Primarily the research will be dealing with appropriate models for the identification of user-groups and their local language content requirements across different communities, languages and countries. It will also address the study of effectiveness of current standards for online publishing of local language content. The project will also be producing research reports on effective ways of development and publishing of content and language resources.

Phase II aims to consolidate and further develop the advanced end-user applications for languages across the region and to explore localization of the emerging mobile platform. The research will study challenges and effective means to develop local language standards, tools and frameworks available for developing local language technology. Planned outputs include a Speech Recognition system, a Text To Speech system, Open Office localization, mobile applications, Linux distributions, and language processing resources and applications, e.g. tagged corpora, parallel corpora, lexica, and applications for word segmentation and for morphological and syntactic analysis. These applications will be developed in multiple languages across partner countries.

The project will also look at the policy support to develop and promote local language technology, training and content, and evaluative techniques for such work. This region-wide initiative will particularly benefit non-English speakers in rural Asia, who form the digitally-divided populations of the region.

This project will be led by researchers at CRULP, NUCES. CRULP will be coordinating efforts across Asia through ICT researchers, practitioners, linguists and policymakers from government agencies, universities and the private sector. The countries (and languages) included in the second phase of the project are Afghanistan (Pashto), Bangladesh (Bangla), Bhutan (Dzongkha), Cambodia (Khmer), China (Tibetan), Laos (Lao), Mongolia (Mongolian), Nepal (Nepali), Pakistan (Urdu) and Sri Lanka (Sinhala, Tamil).

Partner Institutions

The implementers of the Project are ICT researchers, practitioners, linguists and policy-makers from government agencies, universities and the private sector. In addition to PAN and CRULP, the following are the participating institutions in this Project:


	The regional project is a collaboration between

	Pakistan:	Center for Research in Urdu Language Processing, National University of Computer and Emerging Sciences (NUCES)

	and the following partner organizations across developing Asia

	Afghanistan:	Afghan Computer Science Association

	Bangladesh:	Center for Research on Bangla Language Processing, BRAC University; D.Net

	Bhutan:	Department of Information Technology; Dzongkha Development Authority

	Cambodia:	Institute of Technology; Ministry of Education, Youth and Sports; The National ICT Development Authority (NIDA)

	China:	Institute of Science Technology Tibet Academy of Agricultural and Animal Husbandry Sciences; Tibet University

	Indonesia:	University of Indonesia; Agency for the Assessment and Application of Technology (BPPT)

	Laos:	National Authority of Science and Technology

	Mongolia:	InfoCon Co. Ltd.; Mongolia University of Science and Technology; National University of Mongolia

	Nepal:	E-Network Research and Development; Madan Puraskar Pustakalaya

	Sri Lanka:	Language Technology Research Center, University of Colombo School of Computing

	and is funded by

	Canada:	International Development Research Centre (IDRC)
