Partners - UCSC, Sri Lanka

Sri Lanka

University of Colombo School of Computing

The University of Colombo School of Computing (UCSC) was established in September 2002 by merging the Institute of Computer Technology and the Department of Computer Science, both of the University of Colombo, as the first centre of higher learning in computing in Sri Lanka. Three M.Sc programmes, in Computer Science, Information Technology and Advanced Computing, were started in 2002 and admitted a total of 200 students. M.Phil and Ph.D programmes were also introduced by the UCSC in 2002. The number of students admitted to these programmes may be increased in the coming years.

Computer Science course modules are also conducted for first, second, third and fourth year students of the Physical and Bio Science streams of the Faculty of Science, University of Colombo. The vision of the UCSC is to be a centre of international repute in training in Information and Communication Technologies (ICT).

The UCSC is equipped with nine student laboratories, two multimedia laboratories, two research laboratories and a campus-wide fibre network, and the entire UCSC building complex is wired for Internet access. The UCSC also has a state-of-the-art network operations centre. The library is well stocked with books, journals and CDs, and provides Internet access to e-journals. The UCSC is involved in many research areas, such as Natural & Local Language Processing, Human Computer Interfaces, Image Processing and Vision, Cryptographic Systems, Multimedia and Virtual Reality, Intelligent Agent Systems, Pattern Recognition, Distributed Systems, Information Retrieval and Data Mining, Process Broker Modeling Systems, Web Based Business Services, Multimedia Database Systems, e-Learning, Strategic Planning & Management of IT, IT Policy and Multi Database Systems.

The major goal of the UCSC is to prepare students for careers in Information and Communication Technology as Software Developers, Systems Analysts, Network Administrators, Database Administrators, Web Developers, IT Managers, IT Strategic Planners and IT Policy Makers.

Team's Achievements

The Language Technology Research Laboratory (LTRL) is involved in many projects, on its own or in partnership with other organizations, to promote local language computing.

Collation Algorithm for Sinhala

Localization of Google interface

User Interface Terminology for Sinhala

Sinhala Lexicon database for Microsoft Corporation

The lexicon is intended for use as a spelling checker with Office applications and aims to contain the full set of words of the Sinhala language. The idea is to develop a word pool covering all parts of speech, together with the generated forms of words, that provides 90% coverage of the language.

Speech Technology Initiative

Under this initiative, all speech-related research is carried out, and the relevant products and updates are freely available to the community.

Font Encoding converters for Sinhala/Tamil

Both online and desktop tools for converting proprietary font encodings to Unicode and vice versa are freely available to the community.
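As a rough illustration of how a converter of this kind can work, the sketch below applies a longest-match lookup table to text typed in a legacy font. It is not the UCSC tool: the table entries are invented placeholders rather than the mapping of any real Sinhala or Tamil font, the name `legacy_to_unicode` is an assumption made for this example, and the rules a real converter needs for reordering vowel signs are left out.

```python
# Hypothetical mapping from legacy font key sequences to Unicode text.
# These entries are placeholders for illustration, not a real font table.
LEGACY_TO_UNICODE = {
    "l": "\u0d9a",         # Sinhala letter ka
    "lS": "\u0d9a\u0dd3",  # Sinhala letter ka followed by a long i vowel sign
    "u": "\u0db8",         # Sinhala letter ma
}


def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Convert legacy-encoded text to Unicode using longest-match lookup."""
    # Try longer keys first so that "lS" wins over "l".
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for key in keys:
            if text.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No table entry matches here: copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(legacy_to_unicode("lSu"))
```

Conversion in the other direction could reuse the same function with an inverted table, provided the mapping is one-to-one.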

Localization of AMIS

AMIS is free, open-source playback software for DAISY books. The Sinhala localization of AMIS is a collaboration between LTRL and the DAISY Lanka Foundation.

English to Sinhala dictionary for Windows Mobile® based devices

Computer based teaching tool for Sinhala

UCSCSellinam - Tamil SMS Application

Unicode Fonts by UCSC

Achievements during Phase I

In Phase I, two electronic resources and two commercial-grade applications, free for non-commercial use, were delivered.

Sinhala corpus of 10 million words

The aim was to build an electronic corpus for various language processing tasks. It contains a large amount of Sinhala electronic text from a wide range of sources in UNICODE format. This corpus, containing 10,000,000 words, can be obtained for research purposes through a written request to LTRL.

Sinhala lexicon with translations to Tamil and English

The lexicon contains a list of more than 25,000 Sinhala words together with some grammatical features. The features currently identified are part of speech, number and gender, but these may be extended as requirements arise. In addition, the lexicon contains English and Tamil translations of the corresponding Sinhala words, providing a resource for language translation work.
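One way to picture an entry in such a lexicon is sketched below. This is only an assumed layout for illustration, not the actual LTRL format: the class name `LexiconEntry`, its field names and the helper `build_index` are all hypothetical, and the sample values are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LexiconEntry:
    """One Sinhala headword with grammatical features and translations."""
    sinhala: str
    part_of_speech: str
    number: Optional[str] = None   # e.g. "singular" or "plural"
    gender: Optional[str] = None   # left empty when not recorded
    english: List[str] = field(default_factory=list)
    tamil: List[str] = field(default_factory=list)


def build_index(entries: List[LexiconEntry]) -> Dict[str, List[LexiconEntry]]:
    """Group entries by headword so that homographs share one key."""
    index: Dict[str, List[LexiconEntry]] = {}
    for entry in entries:
        index.setdefault(entry.sinhala, []).append(entry)
    return index


if __name__ == "__main__":
    # Illustrative entry: the Sinhala word for "book".
    book = LexiconEntry(
        sinhala="පොත",
        part_of_speech="noun",
        number="singular",
        english=["book"],
        tamil=["புத்தகம்"],
    )
    index = build_index([book])
    print(index["පොත"][0].english)
```

Keeping the features as optional fields leaves room for the extensions mentioned above without changing existing entries.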

Commercial-grade Sinhala Text-To-Speech system

While some experimental Sinhala TTS systems were already under development at the UCSC, the aim of this project was to produce one of commercial quality. To this end, considerable effort was spent on the quality aspects of this activity. Apart from identifying the phonetic alphabet of the language, recording relevant words and sentences in the database and building a text analysis component, the project also produced a synthesis engine that generates a natural-sounding Sinhala voice.

Commercial-grade Sinhala OCR software

Previous work at the UCSC on OCR had concentrated on developing a technique best suited to recognizing printed Sinhala characters. This component focused on turning that research into a real product by making it robust to variations in font size, particularly those found in widely read material such as newspaper print and government publications. Later it will be developed into font-independent OCR software.

Goals during Phase II

Five main tasks are identified under the Phase 2 proposal:

Regional Tasks

This activity is proposed in collaboration with other partners in order to build up a repository of tools and resources across the language groups covered by the project. It is divided into several subtasks, some of which are supported by the corpus collected in Phase I.

Machine Assisted Translation Tool

One of the key enablers of wide access to ICT is the availability of content in one's native tongue. With the vast amount of information already available on the web in English and other non-local languages, translation becomes crucially important if citizens of countries such as Sri Lanka are to benefit. This task is concerned with making the process faster for human translators, with much of the routine translation done automatically. The approach is based on Example Based Machine Translation: a mechanism for sharing Translation Memories will be built so that translators have access to the experience of other translators.
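The idea of a translation memory can be sketched in a few lines: previously translated segments are stored as source and target pairs, and a new sentence is matched against the stored sources so the translator can reuse the closest earlier translation. The sketch below is an assumption-laden illustration, not the project's design: the function name `best_match`, the 0.7 threshold and the placeholder target strings are hypothetical, and similarity is measured with Python's standard `difflib.SequenceMatcher` rather than any technique named by the project.

```python
from difflib import SequenceMatcher


def best_match(source, memory, threshold=0.7):
    """Return (stored_source, stored_target, score) for the closest stored
    segment, or None if nothing is at least `threshold` similar."""
    best = None
    best_score = 0.0
    for stored_source, stored_target in memory.items():
        score = SequenceMatcher(
            None, source.lower(), stored_source.lower()
        ).ratio()
        if score > best_score:
            best = (stored_source, stored_target)
            best_score = score
    if best is not None and best_score >= threshold:
        return best[0], best[1], best_score
    return None


if __name__ == "__main__":
    # Placeholder targets stand in for real Sinhala translations.
    memory = {
        "Open the file": "<stored Sinhala translation 1>",
        "Save the document": "<stored Sinhala translation 2>",
    }
    print(best_match("Open the files", memory))
```

A shared memory would amount to many translators contributing to, and querying, the same collection of pairs.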

Handwriting Recognition on Mobile Devices

With the ever-increasing importance of the stylus as an input device for handheld devices, online handwriting recognition is becoming crucial. The main objective of this task is to develop an easy-to-use, Graffiti-style solution for recognizing Sinhala characters on different mobile platforms.

Effective Training Methodology and Material Development for Local Language Teaching and Learning

Language is a very powerful tool for mutual understanding between people. Lack of knowledge of another's language, on the other hand, has been the cause of many misunderstandings and even wars. Sri Lanka's ethnic conflict has roots in language, among other things. No local language project can ignore the strategic opportunity that technology provides to scale the teaching and learning of another language. The aim of this task is to develop effective training materials and methodology that make learning another language less arduous. The framework developed is expected to be flexible enough for other project partners to use in teaching their own languages.

Training on Sinhala Web Content Development

The distribution of content on the World Wide Web across the languages of the world does not accurately reflect the number of users of those languages. The languages of the partner countries are grossly under-represented. To mitigate this anomaly, this component is designed to encourage the publishing of local language content on the web. Apart from the technologies surrounding UNICODE, the training will cover methods of content publishing ranging from web site development to uploading content to blogs and wikis.

Project Team Leader

Dr. A.R. Weerasinghe
With over 18 years of teaching experience, Dr. A.R. Weerasinghe has more recently been dealing with issues of scaling ICT education through technology. He has served on various committees and boards at the UGC, Ministry of Education and CINTEC/ICT Agency in this capacity over the years. He currently also serves as a member of the NCED Software & ICT Cluster of the Treasury and the ICT Advisory Committee of the Export Development Board. His research interests are in Human Language Technology, including Machine Translation, and particularly in techniques employing statistical methods and machine learning using corpora. He leads a research group of 6-10 members working on Language Technology at the UCSC's Language Technology Research Lab. To pursue these research interests, he was awarded fellowships by the European Union, as an ERCIM Fellow at France's INRIA Labs, and by the Fulbright Commission, at Carnegie Mellon's Language Technologies Institute, in 2001 and 2002 respectively.

CPI Team Members and their Designations

Dr. A.R. Weerasinghe

Head of Laboratory & Project Lead of PAN Localization Project
The present Director of the UCSC, with a PhD in Natural Language Processing and wide experience over the years supervising many student projects in the field.

Mr. Harsha Wijewardhane

Software Consultant & Deputy Project Lead of PAN Localization Project
An experienced software developer and current coordinator of the Sinhala UNICODE font and keyboard implementation work. Responsible for fundamental aspects of language support implementation, such as collation sequence and sorting, and for the development, deployment, testing and quality assurance of the products to be delivered.

Prof. J.B. Disanayaka

Senior Linguist
One of the most widely recognized Sinhala experts, responsible for all linguistic aspects of the project, including training.

Mr. S.T. Nandasara

Senior Research Associate
Has a long history in Sinhala language support on computers, beginning with the DOS environment. Primarily responsible for the Text-to-Speech (TTS) system.

Dr. Lalith Premaratne

Senior Research Associate
Holds a PhD in signal processing and presented a new method of optical character recognition optimized for Sinhala characters. Responsible for training and development of the OCR system.

Mr. Vincent Halahakone

Corpus Linguist

Mr. Harshula Jayasuriya

Visiting Research Associate

Mr. Dulip Herath

Senior Research Assistant & Team Lead of PAN Localization Project

Mr. Viraj Welgama

Senior Research Assistant

Mr. Rajathurai Premkumar

Research Assistant

Mr. Asanka Wasala

Research Assistant

Mr. Namal Udalamatta

Research Assistant

Mr. Asiri Ranasinghe

Research Assistant

Mr. Chamila Liyanage

Research Assistant