Bangladesh

Center for Research on Bangla Language Processing (CRBLP)

The Center for Research on Bangla Language Processing (CRBLP) of BRAC University conducts research projects on Bangla language processing. At present the research team is working on Bangla document authoring, information retrieval (spelling checker, search engine), optical character recognition, pronunciation generation, speech processing, morphological analysis, part-of-speech tagging, syntax, grammar checking, text categorization, language modeling, and other research areas.

Team's Achievements

- CRBLPConverter: CRBLPConverter is a software package that converts Bangla documents encoded for various TTF fonts to Unicode encoding. It includes converters for SutonnyMJ, Bangsee Alpona, Prothoma, and Alo. The software is free and open source, released under the GNU General Public License (GPL) version 2.

- BanglaPad: BanglaPad is an open source, full-featured Unicode rich text editor capable of editing Bangla. Because it is written in the Java programming language, it runs on different operating systems, such as Windows and Linux/Unix. Users can type Bangla text without external helper applications such as keyboard drivers, and can check the spelling of both Bangla and English documents.

- Automated Pronunciation Generator: When you input any Bangla word, this application will give the pronunciation of that word in IPA (International Phonetic Alphabet).

Goals during Second Phase

- Optical Character Recognition: OCR systems convert printed text into digital text that can be edited and reformatted for other uses. We are developing a Bangla character recognizer that can recognize printed Bangla documents and convert them to editable text.

- Speech Recognition: Speech recognition is the process of converting a speech signal to a sequence of words by means of an algorithm implemented as a computer program. We are currently working on Bangla speech recognition using Hidden Markov Models (HMMs) as the technique and HTK as the toolkit.

- Speech Synthesis: Speech synthesis is the artificial production of human speech. A text-to-speech system converts normal text to speech. We are working on generating speech signals from Bangla text.

- TTF to Unicode Font Converter: There are a number of ASCII-based Bangla fonts in use. The trouble with these fonts is that if the host machine doesn’t have the font installed, the text appears jumbled. We are working on a TTF to Unicode font converter that will convert such ASCII text to Unicode text. That way, only a Unicode Bangla font needs to be installed for the text to display properly.

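The conversion idea described for the TTF to Unicode font converter can be illustrated with a minimal Python sketch. It is not the CRBLP implementation: the legacy byte codes in `LEGACY_TO_UNICODE` are invented for illustration and do not match SutonnyMJ or any other real font, and the function name `legacy_to_unicode` is hypothetical. The sketch only shows two steps such a converter typically needs: a per-character lookup, and moving vowel signs that legacy fonts store before the consonant (visual order) to after it (Unicode logical order). Conjuncts and multi-part vowel signs are ignored here.

```python
# Hypothetical legacy-font table: the ASCII codes on the left are invented
# for illustration and do NOT correspond to any real Bangla font.
LEGACY_TO_UNICODE = {
    "K": "\u0995",  # BENGALI LETTER KA
    "L": "\u0996",  # BENGALI LETTER KHA
    "j": "\u09B2",  # BENGALI LETTER LA
    "v": "\u09BE",  # BENGALI VOWEL SIGN AA (drawn after the consonant)
    "w": "\u09BF",  # BENGALI VOWEL SIGN I (drawn before the consonant)
    "e": "\u09C7",  # BENGALI VOWEL SIGN E (drawn before the consonant)
}

# Vowel signs that are drawn to the left of the consonant. Legacy fonts
# store them first; Unicode stores them after the consonant.
PREBASE_SIGNS = {"\u09BF", "\u09C7", "\u09C8"}


def legacy_to_unicode(text, table=LEGACY_TO_UNICODE):
    """Map each legacy character and reorder pre-base vowel signs."""
    out = []
    pending = None  # pre-base sign waiting for the character it follows
    for ch in text:
        uni = table.get(ch, ch)  # unmapped characters pass through unchanged
        if uni in PREBASE_SIGNS:
            if pending is not None:  # malformed input: flush the earlier sign
                out.append(pending)
            pending = uni
        else:
            out.append(uni)
            if pending is not None:  # place the sign after the next character
                out.append(pending)
                pending = None
    if pending is not None:  # sign at end of input with nothing to attach to
        out.append(pending)
    return "".join(out)


if __name__ == "__main__":
    print(legacy_to_unicode("wK"))   # KA followed by VOWEL SIGN I
    print(legacy_to_unicode("Kvj"))  # KA, VOWEL SIGN AA, LA
```

A real converter would need one such table per supported font, plus rules for conjuncts and for vowel signs that are split into two glyphs in the legacy encoding.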
- Corpus Analysis: We have developed a tool for extensive corpus analysis of word frequency distributions. Our corpus currently consists of one year of Prothom-Alo newspaper text, together with Charjapad and Baru Chandi Das Er Kabbo. We have analyzed the corpus for regularities and anomalies in Bangla word usage.

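As a rough illustration of the word frequency analysis mentioned above, the following Python sketch counts Bangla word tokens in a text. It is not the CRBLP tool: the tokenizer is an assumption that simply treats any run of characters from the Bengali Unicode block (U+0980 to U+09FF) as a word, so Bangla digits are counted as words too, and the names `word_frequencies` and `frequency_of_frequencies` are hypothetical.

```python
import re
from collections import Counter

# Assumed tokenizer: a word is a run of characters from the Bengali block.
BANGLA_WORD = re.compile(r"[\u0980-\u09FF]+")


def word_frequencies(text):
    """Return a Counter mapping each Bangla word to its number of occurrences."""
    return Counter(BANGLA_WORD.findall(text))


def frequency_of_frequencies(counts):
    """Return how many distinct words occur once, twice, and so on."""
    return Counter(counts.values())


# Tiny sample text standing in for a real corpus.
sample = "আমি ভাত খাই। তুমি ভাত খাও।"
counts = word_frequencies(sample)

if __name__ == "__main__":
    for word, n in counts.most_common():
        print(word, n)
    print(frequency_of_frequencies(counts))
```

On a full corpus, the frequency-of-frequencies table is a quick way to see how many words occur only once, which is one kind of regularity such an analysis can surface.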
- Lexicon: Any kind of Bangla language processing needs a rich and informative lexicon. We have developed a wordlist of 160 thousand words with first-step part-of-speech tagging.

- Wordnet: We are developing a semantic lexicon for Bangla that divides Bangla words into sets of synonyms and maps various semantic relations between these word sets.
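
The structure described for the Wordnet, sets of synonyms linked by semantic relations, could be represented along the following lines. This is only a sketch under stated assumptions: the class names `Synset` and `MiniWordnet`, the synset identifiers, and the relation name `hypernym` are all hypothetical and are not taken from the CRBLP project.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One set of synonyms plus its outgoing semantic relations."""
    synset_id: str
    words: list
    gloss: str = ""
    # relation name -> list of target synset ids
    relations: dict = field(default_factory=dict)


class MiniWordnet:
    def __init__(self):
        self.synsets = {}  # synset id -> Synset
        self.index = {}    # word -> set of synset ids containing it

    def add_synset(self, synset_id, words, gloss=""):
        self.synsets[synset_id] = Synset(synset_id, list(words), gloss)
        for word in words:
            self.index.setdefault(word, set()).add(synset_id)

    def relate(self, source_id, relation, target_id):
        """Record a directed relation between two existing synsets."""
        self.synsets[source_id].relations.setdefault(relation, []).append(target_id)

    def synonyms(self, word):
        """All other words that share a synset with the given word."""
        result = set()
        for sid in self.index.get(word, ()):
            result.update(self.synsets[sid].words)
        result.discard(word)
        return result

    def related(self, word, relation):
        """Words reachable from the word's synsets via one relation step."""
        result = set()
        for sid in self.index.get(word, ()):
            for target in self.synsets[sid].relations.get(relation, []):
                result.update(self.synsets[target].words)
        return result


# Illustrative entries only: two Bangla words for "water" and one for "liquid".
wn = MiniWordnet()
wn.add_synset("water.n.1", ["জল", "পানি"], gloss="water")
wn.add_synset("liquid.n.1", ["তরল"], gloss="liquid")
wn.relate("water.n.1", "hypernym", "liquid.n.1")

if __name__ == "__main__":
    print(wn.synonyms("জল"))
    print(wn.related("পানি", "hypernym"))
```

Indexing words to synset ids keeps lookups cheap and lets one word belong to several synsets, which is needed for words with more than one sense.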

CPI Team Members and their Designations

- Mumit Khan, Head, Center for Research on Bangla Language Processing, Associate Professor, CSE Department
- Matin Saad Abdullah, Program Manager, Center for Research on Bangla Language Processing, Senior Lecturer, CSE Department
- Naira Khan, Linguist, Lecturer - English and Humanities
- Zahurul Islam, Research Programmer
- Naushhad Uzzaman, Research Programmer (on leave)
- Md. Abul Hasnat, Research Programmer
- S.M. Murtaza Habib, Research Programmer
- Firoj Alam, Research Programmer
- Fahim Tawfique Chowdhury, Research Programmer