
Sri Lanka
   
University of Colombo School of Computing

 

In September 2002, the University of Colombo School of Computing (UCSC) was established by merging the Institute of Computer Technology and the Department of Computer Science, both of the University of Colombo, as the first centre of higher learning in computing in Sri Lanka. The number of students admitted to these programmes may be increased in the coming years. Three M.Sc programmes, in Computer Science, Information Technology and Advanced Computing, were started in 2002 and admitted a total of 200 students. M.Phil and Ph.D programmes were also introduced by the UCSC in 2002.

   
 

Computer Science course modules are also conducted for first, second, third and fourth year students of the Physical and Bio Science streams of the Faculty of Science, University of Colombo. The vision of the UCSC is to be a centre of international repute in training in Information and Communication Technologies (ICT).

 

 

The UCSC is equipped with nine student laboratories, two multimedia laboratories, two research laboratories and a campus-wide fibre network, and the entire UCSC building complex is wired for Internet access. The UCSC has a state-of-the-art network operations centre. The library is well stocked with books, journals and CDs, and provides Internet access to e-journals. The UCSC is involved in many research areas, such as Natural & Local Language Processing, Human Computer Interfaces, Image Processing and Vision, Cryptographic Systems, Multimedia and Virtual Reality, Intelligent Agent Systems, Pattern Recognition, Distributed Systems, Information Retrieval and Data Mining, Process Broker Modeling Systems, Web Based Business Services, Multimedia Database Systems, e-Learning, Strategic Planning & Management of IT, IT Policy and Multi Database Systems.

The major goal of the UCSC is to prepare students for careers in Information and Communication Technology as Software Developers, Systems Analysts, Network Administrators, Database Administrators, Web Developers, IT Managers, IT Strategic Planners and IT Policy Makers.

 

 

   
Team's Achievements

 

The Language Technology Research Laboratory (LTRL) is involved in many projects, on its own or in partnership with other organizations, to promote local language computing.

 
 
 
 
  • Sinhala Lexicon database for Microsoft Corporation
 

The lexicon is intended to serve as a spelling checker for Office applications and to contain all the words of the Sinhala language. The idea is to develop a word pool covering all parts of speech and the generated forms of words, providing 90% coverage of the language.
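The page does not say how the 90% coverage figure is measured. One common reading is the share of running words in a corpus that appear in the word list. The sketch below assumes that reading; the function name `token_coverage` and the sample data are illustrative only and are not taken from the project.

```python
def token_coverage(corpus_tokens, lexicon):
    """Return the fraction of corpus tokens that are found in the lexicon.

    Assumption: coverage is counted over running words (tokens), so a
    frequent word contributes once for every occurrence.
    """
    tokens = list(corpus_tokens)
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in lexicon)
    return known / len(tokens)


if __name__ == "__main__":
    # Placeholder tokens standing in for Sinhala word forms.
    sample_corpus = ["a", "b", "a", "c"]
    sample_lexicon = {"a", "b"}
    print(f"coverage: {token_coverage(sample_corpus, sample_lexicon):.0%}")
```

Counting distinct word types instead of tokens would give a different, usually lower, figure, so the measure chosen should be stated alongside any coverage target.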

 
 

Under this initiative, all speech-related research is carried out, and the relevant products and updates are freely available to the community.

 
 

Both online and desktop application tools for converting proprietary font encodings to Unicode and vice versa are freely available for the community.
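As a rough illustration of what such a converter has to do, the sketch below maps legacy font codes to Unicode code points and reorders the pre-base vowel sign. In Unicode, the Sinhala vowel sign kombuva (U+0DD9) is stored after its consonant, whereas visually ordered legacy encodings typically store it before. The mapping table here is hypothetical and is not taken from any real font or from the LTRL tools.

```python
# Hypothetical legacy-font table: the keys are illustrative only and do not
# describe any real Sinhala font encoding.
LEGACY_TO_UNICODE = {
    "l": "\u0d9a",  # SINHALA LETTER ALPAPRAANA KAYANNA (ka)
    "u": "\u0db8",  # SINHALA LETTER MAYANNA (ma)
    "f": "\u0dd9",  # SINHALA VOWEL SIGN KOMBUVA
}
KOMBUVA = "\u0dd9"


def is_sinhala_consonant(ch):
    """Sinhala consonants occupy U+0D9A..U+0DC6 in Unicode."""
    return "\u0d9a" <= ch <= "\u0dc6"


def legacy_to_unicode(text):
    """Map legacy codes to Unicode and move kombuva after its consonant."""
    out = []
    pending_kombuva = False
    for ch in text:
        mapped = LEGACY_TO_UNICODE.get(ch, ch)  # unknown codes pass through
        if mapped == KOMBUVA:
            pending_kombuva = True  # hold it until the consonant arrives
            continue
        out.append(mapped)
        if pending_kombuva and is_sinhala_consonant(mapped):
            out.append(KOMBUVA)
            pending_kombuva = False
    if pending_kombuva:
        out.append(KOMBUVA)  # stray sign at end of input: keep it
    return "".join(out)


if __name__ == "__main__":
    print(legacy_to_unicode("fl"))  # kombuva + ka in legacy (visual) order
```

A real converter needs a full table per font and further rules for conjuncts and multi-part vowel signs; this sketch covers only the single reordering case.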

 
 

AMIS is free, open-source DAISY book playback software. The AMIS Sinhala localization is a collaboration between LTRL and the DAISY Lanka Foundation.

 
 
 
 
   
 

   
   
Achievements during Phase I

  In Phase I, two electronic resources and two commercial-grade applications, free for non-commercial use, were delivered.
 
 

The aim was to build an electronic corpus for various language processing tasks. It contains a large amount of Sinhala electronic text from a wide range of sources in Unicode format. This corpus, containing 10,000,000 words, can be obtained for research purposes through a written request to LTRL.

 
 

The lexicon contains a list of more than 25,000 Sinhala words together with some grammatical features. The features currently identified are part of speech, number and gender, but these may be extended as requirements arise. In addition, the lexicon contains English and Tamil translations of the corresponding Sinhala words, providing a resource for language translation work.
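The description above suggests a record with a headword, a few grammatical features and translations in two languages. A minimal sketch of such an entry follows; the class name, field names and sample values are assumptions made for illustration and do not reflect the actual LTRL file format.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LexiconEntry:
    """One lexicon record (hypothetical layout)."""
    sinhala: str
    part_of_speech: str
    number: Optional[str] = None
    gender: Optional[str] = None
    english: List[str] = field(default_factory=list)
    tamil: List[str] = field(default_factory=list)
    # Room for features added later, as the text anticipates.
    extra: Dict[str, str] = field(default_factory=dict)


def index_by_english(entries):
    """Build a lookup from each English gloss to the entries that carry it."""
    index = defaultdict(list)
    for entry in entries:
        for gloss in entry.english:
            index[gloss.lower()].append(entry)
    return index


if __name__ == "__main__":
    # Illustrative entry, not copied from the LTRL lexicon.
    book = LexiconEntry(
        sinhala="පොත",
        part_of_speech="noun",
        number="singular",
        english=["book"],
        tamil=["புத்தகம்"],
    )
    index = index_by_english([book])
    for match in index["book"]:
        print(match.sinhala, match.tamil)
```

Keeping the optional features as separate fields with an open `extra` dictionary lets new features be added without changing existing records.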

 
 

While some experimental Sinhala TTS systems were already under development at the UCSC, the aim of this project was to produce one of commercial quality. To this end, considerable effort was spent on the quality aspects of this activity. Apart from identifying the phonetic alphabet of the language, recording relevant words and sentences in the database and building a text analysis component, the project also produced a synthesizing engine that produces a natural-sounding Sinhala voice.

 
 

Previous work at the UCSC on OCR had concentrated on developing a technique best suited to detecting printed Sinhala characters. This component focused on converting that research into a real product by making it robust to variations in font size, particularly the sizes most commonly encountered, including those in newspaper print and government publications. Later it will be developed into font-independent OCR software.

 

   
   
Goals during Second Phase

  Five main tasks are identified under the Phase 2 proposal. They are:
 
  • Regional Tasks
 

This activity is proposed in collaboration with other partners in order to build up a repository of tools and resources across the language groups covered by the project. It is divided into several subtasks, some of which are supported by the corpus collected in Phase 1.

 
  • Machine Assisted Translation Tool
 

One of the key enablers of wide access to ICT is the availability of content in one's native tongue. With the vast amount of information already available on the web in English and other non-local languages, translation becomes crucially important if citizens of countries such as Sri Lanka are to benefit. This task is concerned with making this process faster for human translators, with much of the routine translation done automatically. The approach is based on Example-Based Machine Translation: a mechanism for sharing Translation Memories will be built so that translators have access to the experience of other translators.
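The core of a shared translation memory can be sketched as a fuzzy lookup over previously translated segments pooled from several translators. The example below uses character-level similarity from Python's standard `difflib`; the function names, the 0.7 threshold and the placeholder translations are assumptions for illustration and do not describe the tool the project planned to build.

```python
from difflib import SequenceMatcher


def merge_memories(*memories):
    """Pool several translators' memories, dropping exact duplicate pairs."""
    merged = []
    seen = set()
    for memory in memories:
        for pair in memory:
            if pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged


def best_match(source, memory, threshold=0.7):
    """Return (stored_source, stored_target, score) for the closest stored
    segment, or None if nothing reaches the similarity threshold."""
    best = None
    best_score = 0.0
    for stored_source, stored_target in memory:
        score = SequenceMatcher(
            None, source.lower(), stored_source.lower()
        ).ratio()
        if score > best_score:
            best, best_score = (stored_source, stored_target), score
    if best is not None and best_score >= threshold:
        return best[0], best[1], best_score
    return None


if __name__ == "__main__":
    # Targets are placeholders standing in for Sinhala translations.
    translator_a = [("Open the file", "<Sinhala translation A1>")]
    translator_b = [("Close the window", "<Sinhala translation B1>")]
    shared = merge_memories(translator_a, translator_b)
    print(best_match("Open the files", shared))
```

A match above the threshold is offered to the translator as a suggestion to edit rather than inserted automatically; production systems usually match on words or aligned sub-segments rather than raw characters.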

 
  • Handwriting Recognition on Mobile Devices
 

With the ever-increasing importance of the stylus as an input device for handheld devices, online handwriting recognition is becoming crucial. The main objective of this task is to develop an easy-to-use, Graffiti-style solution for recognizing Sinhala characters on different mobile platforms.

 
  • Effective Training Methodology and Material Development for Local Language Teaching and Learning
 

Language is a very powerful tool for mutual understanding between people. Lack of knowledge of another's language, on the other hand, has been the cause of many misunderstandings and even wars. Sri Lanka's ethnic conflict has roots in language, among other things. No local language project could ignore the strategic opportunity that technology provides to scale the teaching and learning of another language. The aim of this task is therefore to develop effective training materials and methodology that make learning another language less arduous. The framework developed is expected to be flexible enough for other project partners to use it to teach their own languages.

 
  • Training on Sinhala Web Content Development
 

The distribution of content on the World Wide Web across the languages of the world does not accurately reflect the number of users of those languages. Languages in the partner countries are grossly underrepresented. To mitigate this anomaly, this component is designed to encourage the publishing of local language content on the web. Apart from the technologies surrounding Unicode, the training will cover methods of content publishing, ranging from web site development to uploading content to blogs and wikis.

 

   
   
Project Team Leader

 

Dr. A.R. Weerasinghe
With over 18 years of teaching experience, Dr. A.R. Weerasinghe has more recently been addressing the problem of scaling ICT education through technology. He has served on various committees and boards at the UGC, Ministry of Education and CINTEC/ICT Agency in this capacity over the years. Currently he also serves as a member of the NCED Software & ICT Cluster of the Treasury and the ICT Advisory Committee of the Export Development Board. His research interests are in Human Language Technology, including Machine Translation, and particularly in techniques employing statistical methods and machine learning using corpora. He leads a research group of 6-10 people working on Language Technology at the UCSC’s Language Technology Research Lab. He was awarded fellowships by the European Union, as an ERCIM Fellow at France’s INRIA Labs, and by the Fulbright Commission, at Carnegie Mellon’s Language Technologies Institute, to pursue these research interests in 2001 and 2002 respectively.

 

 

 

 

CPI Team Members and their Designations

  Dr. A.R. Weerasinghe
 

Head of Laboratory & Project Lead of PAN Localization Project 
The present Director of the UCSC, with a PhD in Natural Language Processing and wide experience over the years supervising many student projects in the field.

 

Mr. Harsha Wijewardhane

 

Software Consultant & Deputy Project Lead of PAN Localization Project
Experienced software developer and a current coordinator of the Sinhala Unicode font and keyboard implementation work, responsible for fundamental aspects of language support implementation such as collation sequence and sorting, as well as the development, deployment, testing and quality assurance of the products to be delivered.

  Prof. J.B. Disanayaka
 

Senior Linguist
One of the widely accepted Sinhala experts, responsible for all linguistic aspects of the project including training.

  Mr. S.T. Nandasara
 

Senior Research Associate
Has a long history in the field of Sinhala language support on computers, beginning with the DOS environment; primarily responsible for the Text-to-Speech (TTS) system.

 

Dr. Lalith Premarante

 

Senior Research Associate
With a PhD in signal processing, presented a new method of optical character recognition optimized for recognizing Sinhala characters, responsible for training and development of the OCR system.

  Mr. Vincent Halahakone
  Corpus Linguist
  Mr. Harshula Jayasuriya
  Visiting Research Associate
UCSC Team
  Mr. Dulip Herath
  Senior Research Assistant & Team Lead of PAN Localization Project
  Mr. Viraj Welgama
  Senior Research Assistant
  Mr. Rajathurai Premkumar
  Research Assistant
  Mr. Asanka Wasala
  Research Assistant
  Mr. Namal Udalamatta
  Research Assistant
  Mr. Asiri Ranasinghe
  Research Assistant
  Mr. Chamila Liyanage
  Research Assistant