Sri Lanka

University of Colombo School of Computing
In September 2002, the University of Colombo School of Computing (UCSC) was established as the first centre of higher learning in computing in Sri Lanka by merging the Institute of Computer Technology and the Department of Computer Science, both of the University of Colombo. The number of students admitted to its programmes may be increased in the coming years. Three M.Sc programmes, in Computer Science, Information Technology and Advanced Computing, were started in 2002 and admitted a total of 200 students. M.Phil and Ph.D programmes were also introduced by the UCSC in 2002.

Computer Science course modules are also conducted for first-, second-, third- and fourth-year students of the Physical and Bio Science streams of the Faculty of Science, University of Colombo. The vision of the UCSC is to be a centre of international repute for training in Information and Communication Technologies (ICT).

The UCSC is equipped with nine student laboratories, two multimedia laboratories, two research laboratories and a campus-wide fibre network, with the entire UCSC building complex wired for Internet access. The UCSC also has a state-of-the-art network operations centre. The library is well stocked with books, journals and CDs, and provides Internet access to e-journals. The UCSC is involved in many research areas, such as Natural & Local Language Processing, Human-Computer Interfaces, Image Processing and Vision, Cryptographic Systems, Multimedia and Virtual Reality, Intelligent Agent Systems, Pattern Recognition, Distributed Systems, Information Retrieval and Data Mining, Process Broker Modeling Systems, Web-Based Business Services, Multimedia Database Systems, e-Learning, Strategic Planning & Management of IT, IT Policy, and Multi-Database Systems.

The major goal of the UCSC is to prepare students for careers in ICT as Software Developers, Systems Analysts, Network Administrators, Database Administrators, Web Developers, IT Managers, IT Strategic Planners and IT Policy Makers.
Team's Achievements

The Language Technology Research Laboratory (LTRL) is involved in many projects, on its own or in partnership with other organizations, to promote local language computing.

- Sinhala Lexicon Database for Microsoft Corporation

The lexicon is intended to serve as a speller (spell checker) for Office applications and to contain the full set of words of the Sinhala language. The idea is to develop a word pool covering all parts of speech, together with the generated forms of words, that provides 90% coverage of the language.

Under this initiative, all speech-related research is being carried out, and the relevant products and updates are freely available to the community.

Both online and desktop tools for converting proprietary font encodings to Unicode, and vice versa, are freely available to the community.
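The conversion tools themselves are not described here. As a rough illustration of what such a converter has to handle, the Python sketch below maps the glyph codes of an imaginary legacy font to Unicode and then reorders the kombuva vowel sign (U+0DD9), which visual-order legacy fonts typically store before the consonant it is drawn next to, whereas Unicode stores it after that consonant. The mapping table and the function names are hypothetical and do not correspond to any real font or to the LTRL tools.

```python
# Hypothetical glyph table for an imaginary legacy Sinhala font; every real
# proprietary font uses its own (different) code assignments.
LEGACY_TO_UNICODE = {
    "k": "\u0d9a",  # SINHALA LETTER ALPAPRAANA KAYANNA
    "m": "\u0db8",  # SINHALA LETTER MAYANNA
    "r": "\u0dbb",  # SINHALA LETTER RAYANNA
    "e": "\u0dd9",  # SINHALA VOWEL SIGN KOMBUVA (drawn left of the consonant)
    "a": "\u0dcf",  # SINHALA VOWEL SIGN AELA-PILLA
}

KOMBUVA = "\u0dd9"


def is_sinhala_consonant(ch: str) -> bool:
    """True for code points in the Sinhala consonant range U+0D9A..U+0DC6."""
    return "\u0d9a" <= ch <= "\u0dc6"


def legacy_to_unicode(text: str) -> str:
    """Convert visually ordered legacy-font text to logically ordered Unicode."""
    # Step 1: map each legacy glyph code to its Unicode character;
    # unknown characters pass through unchanged.
    mapped = [LEGACY_TO_UNICODE.get(ch, ch) for ch in text]
    out = []
    i = 0
    while i < len(mapped):
        # Step 2: a kombuva stored before a consonant must follow it in Unicode.
        if (mapped[i] == KOMBUVA and i + 1 < len(mapped)
                and is_sinhala_consonant(mapped[i + 1])):
            out.append(mapped[i + 1])
            out.append(KOMBUVA)
            i += 2
        else:
            out.append(mapped[i])
            i += 1
    return "".join(out)
```

The reverse direction (Unicode to a legacy font) would invert the table and move the kombuva back in front of its consonant; a production converter would also need rules for the other multi-part vowel signs and conjuncts, which this sketch omits.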
AMIS is free, open-source playback software for DAISY books. The Sinhala localization of AMIS was done collaboratively by LTRL and the DAISY Lanka Foundation.
Achievements during Phase I

In Phase I, two electronic resources and two commercial-grade applications were delivered free of charge for non-commercial use.

The aim was to build an electronic corpus for various language processing tasks. It contains a large amount of Sinhala electronic text, in Unicode format, from a wide range of sources. This corpus of 10,000,000 words can be obtained for research purposes through a written request to LTRL.

The lexicon contains more than 25,000 Sinhala words together with some grammatical features. The features currently identified are part of speech, number and gender, but these may be extended as requirements arise. In addition, the lexicon contains English and Tamil translations of the Sinhala words, providing a resource for language translation work.

While some experimental text-to-speech (TTS) systems for Sinhala were already under development at the UCSC, the aim of this project was to produce one of commercial quality. To this end, considerable effort was spent on quality. Apart from identifying the phonetic alphabet of the language, recording relevant words and sentences in the database and building a text analysis component, the project also produced a synthesis engine that generates natural-sounding Sinhala speech.

Previous work at the UCSC on optical character recognition (OCR) had concentrated on developing the technique best suited to detecting printed Sinhala characters. This component of the work focused on turning that research into a real product by making it robust to variations in font size, particularly in the fonts most commonly used, including those in newspaper prints and government publications. It will later be developed into font-independent OCR software.
Goals during the Second Phase

Five main tasks are identified under the Phase 2 proposal. They are as follows.

This activity is proposed in collaboration with other partners in order to build up a repository of tools and resources across the language groups covered by the project. It is divided into several subtasks, some of which are supported by the corpus collected in Phase 1.

- Machine-Assisted Translation Tool

One of the key enabling technologies for wide access to ICT is the availability of content in one's native tongue. With the vast amount of information already available on the web in English and other non-local languages, translation becomes crucially important if citizens of countries such as Sri Lanka are to benefit. This task is concerned with making the process faster for human translators by performing much of the routine translation automatically. The approach is based on Example-Based Machine Translation, with a mechanism for sharing Translation Memories that gives translators access to the experience of other translators.
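The text does not say how shared Translation Memories are searched. A common building block is fuzzy lookup: given a new source sentence, retrieve the most similar sentence that has already been translated, so the translator can reuse and adapt its translation. The Python sketch below does this with the standard library's `difflib.SequenceMatcher` character-level similarity. The function name `best_tm_match`, the 0.7 threshold and the placeholder memory entries are assumptions for illustration; this is plain fuzzy matching, not a full Example-Based Machine Translation system.

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple

# A translation memory: (source sentence, stored translation) pairs.
Memory = List[Tuple[str, str]]


def best_tm_match(source: str, memory: Memory,
                  threshold: float = 0.7) -> Optional[Tuple[str, str, float]]:
    """Return (stored source, stored translation, similarity) for the closest
    memory entry, or None if no entry reaches the threshold.
    Hypothetical helper; the threshold is chosen only for illustration."""
    best: Optional[Tuple[str, str, float]] = None
    for stored_src, stored_tgt in memory:
        # ratio() is 2*M/T: matched characters relative to total length, in [0, 1].
        score = SequenceMatcher(None, source, stored_src).ratio()
        if best is None or score > best[2]:
            best = (stored_src, stored_tgt, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


if __name__ == "__main__":
    # Placeholder targets stand in for real Sinhala translations.
    tm = [
        ("The office is closed today.", "<Sinhala translation 1>"),
        ("Open the file.", "<Sinhala translation 2>"),
    ]
    print(best_tm_match("The office is closed on Sunday.", tm))
```

Sharing a memory between translators then amounts to pooling their (source, translation) pairs; character-level similarity is the simplest choice, and word- or subword-level matching would usually be preferred for Sinhala text in practice.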
- Handwriting Recognition on Mobile Devices

With the ever-increasing importance of the stylus as an input device for handheld devices, online handwriting recognition is becoming crucial. The main objective of this task is to develop an easy-to-use, Graffiti-style solution for recognizing Sinhala characters on different mobile platforms.

- Effective Training Methodology and Material Development for Local Language Teaching and Learning

Language is a very powerful tool for mutual understanding between people. Conversely, a lack of knowledge of another's language has been the cause of many misunderstandings and even wars; Sri Lanka's ethnic conflict has roots in language, among other things. No local language project can ignore the strategic opportunity that technology provides to scale the teaching and learning of another language. That is the aim of this task: to develop effective training materials and methodology that make learning another language less arduous. The framework is expected to be flexible enough for other project partners to use in teaching their own languages.

- Training on Sinhala Web Content Development

The distribution of content on the World Wide Web across the world's languages does not accurately reflect the number of users of those languages, and the languages of the partner countries are grossly under-represented. To mitigate this imbalance, this component is designed to encourage the publishing of local language content on the web. Apart from the technologies surrounding Unicode, the training will cover methods of content publishing ranging from website development to uploading content to blogs and wikis.
Project Team Leader

Dr. A.R. Weerasinghe

With over 18 years of teaching experience, Dr. A.R. Weerasinghe has more recently been dealing with issues of scaling ICT education through technology, and has served in this capacity over the years on various committees and boards at the UGC, the Ministry of Education and the CINTEC/ICT Agency. He currently also serves as a member of the NCED Software & ICT Cluster of the Treasury and of the ICT Advisory Committee of the Export Development Board. His research interests are in Human Language Technology, including Machine Translation, and particularly in techniques employing statistical methods and machine learning using corpora. He leads a research group of 6 to 10 people working on language technology at the UCSC's Language Technology Research Lab. He was awarded fellowships by the European Union as an ERCIM Fellow at France's INRIA labs, and by the Fulbright Commission at Carnegie Mellon's Language Technologies Institute, to pursue these research interests in 2001 and 2002 respectively.
CPI Team Members and their Designations

Dr. A.R. Weerasinghe
Head of Laboratory & Project Lead of PAN Localization Project
The present Director of the UCSC, with a PhD in Natural Language Processing and wide experience over the years supervising many student projects in the field.

Mr. Harsha Wijewardhane
Software Consultant & Deputy Project Lead of PAN Localization Project
An experienced software developer and the current coordinator of the Sinhala Unicode font and keyboard implementation work; responsible for fundamental aspects of language support implementation, such as collation sequence and sorting, and for the development, deployment, testing and quality assurance of the products to be delivered.

Prof. J.B. Disanayaka
Senior Linguist
One of the most widely recognized experts on Sinhala; responsible for all linguistic aspects of the project, including training.

Mr. S.T. Nandasara
Senior Research Associate
Has a long history in the field of Sinhala language support on computers, beginning with the DOS environment; primarily responsible for the Text-to-Speech (TTS) system.

Dr. Lalith Premaratne
Senior Research Associate
Holds a PhD in signal processing and presented a new method of optical character recognition optimized for recognizing Sinhala characters; responsible for training and development of the OCR system.

Mr. Vincent Halahakone
Corpus Linguist

Mr. Harshula Jayasuriya
Visiting Research Associate

Mr. Dulip Herath
Senior Research Assistant & Team Lead of PAN Localization Project

Mr. Viraj Welgama
Senior Research Assistant

Mr. Rajathurai Premkumar
Research Assistant

Mr. Asanka Wasala
Research Assistant

Mr. Namal Udalamatta
Research Assistant

Mr. Asiri Ranasinghe
Research Assistant

Mr. Chamila Liyanage
Research Assistant