Monday, 1 April 2013

Singlish - Sinhala Unicode translator API in JAVA

     Hmmmm…. It’s been a long time, nuh… ;) My 2nd post… In this post I’m going to introduce a Java API for Singlish to Sinhala Unicode transliteration, or in other words an API that translates Sinhala text typed in English letters into Sinhala Unicode characters, along with a small application built using the API.

     This code + API has some nice features that improve the usability of the application. The API can be used in applications that need to translate the English text a user types into Sinhala Unicode in place, while it is being edited. To represent consonant modifiers you can use either uppercase letters or the same lowercase letters. The user should use the minus symbol to combine letters (බැඳි අකුරු); for example, to type the ක් letter, k-SHA should be used. The “-” is left out of the translated text, since it only serves to combine letters. If the user wants to type an actual “-” sign, two minus signs should be typed, as “--”. The other features and the corresponding English letters of the API are described in the user guide.

     You can get the Java files from the links at the bottom. When it comes to using these classes in applications, there are two main approaches.

     So I’ll start with the easy way. You can directly add the Singlish to Sinhala translation capability to any text component of your Swing application that inherits from JTextComponent, by setting a document filter on its document.

JEditorPane editor = new JEditorPane();
((AbstractDocument) editor.getDocument()).setDocumentFilter(new SinhalaTransliatorFilter());
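
If you are curious how a document filter hooks into a text component, here is a minimal sketch. It is not the code of SinhalaTransliatorFilter; ToyFilter and its two-letter mapping are made up for illustration only, and the real filter surely does much more. It uses a PlainDocument directly so it can run without opening a window.

```java
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.DocumentFilter;
import javax.swing.text.PlainDocument;

// Hypothetical stand-in for the real filter: it only knows two letters.
class ToyFilter extends DocumentFilter {

    // Convert the typed text before it reaches the document.
    static String convert(String text) {
        if (text == null) {
            return null;
        }
        return text.replace('a', '\u0D85').replace('k', '\u0D9A');
    }

    @Override
    public void insertString(FilterBypass fb, int offset, String text, AttributeSet attrs)
            throws BadLocationException {
        super.insertString(fb, offset, convert(text), attrs);
    }

    @Override
    public void replace(FilterBypass fb, int offset, int length, String text, AttributeSet attrs)
            throws BadLocationException {
        super.replace(fb, offset, length, convert(text), attrs);
    }

    // Push some typed text through a filtered document and return what was stored.
    static String typeInto(String typed) {
        try {
            PlainDocument doc = new PlainDocument();
            doc.setDocumentFilter(new ToyFilter());
            doc.insertString(0, typed, null);
            return doc.getText(0, doc.getLength());
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(typeInto("ak"));
    }
}
```

The point is only that every insert and replace goes through the filter first, so the component never sees the raw English letters.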

The second approach is to call the translator yourself, and there are two possible ways to do that.
You can translate text directly by using the static method translate, as

String str = SinhalaTranslator.translate("k K g G X gA ch CH j k-DH");

     This method has some drawbacks: if you use it, you have to keep the original English text separately and re-translate the entire text as the user edits it. The next possible, and more efficient, way is to create an object of SinhalaTranslator.

     In this case you can append, insert or delete text. Apart from that, the entire translated text can be retrieved by calling the getText method of the SinhalaTranslator instance, or the most recently translated portion can be retrieved by using the getReplacedPosition, getReplacedLength and getTranslatedText methods. These methods are useful if you want to translate text in place (at the same spot where the user types, as in the demo application). The reason is that while the user is typing there is always a portion of the text that hasn’t been translated yet, so as the user appends new characters that untranslated part may also need to be translated.

SinhalaTranslator trans = new SinhalaTranslator();
trans.appendText("ae");
trans.appendText("eaa");
String translated = trans.getText(); // aa

or

trans.appendText("ae");
String str = trans.getTranslatedText(); // ae # case 1, replacedLength - 0,
                                        // replacedPosition - 0
trans.appendText("eaa");
str = trans.getTranslatedText();        // aa # case 2, replacedLength - 2,
                                        // replacedPosition - 0

     getReplacedPosition() returns the position in the text where the replacement happened, and getReplacedLength() returns the number of chars in the original text that were replaced by the most recent append, delete or insert operation.
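
To make the position and length idea a bit more concrete, here is a small sketch of how an editor could patch the text it is showing. It assumes (my reading, not something taken from the API code) that the position and length refer to the text currently displayed. InPlaceUpdate is a hypothetical helper, and the strings "AB" and "C" are just placeholders, not real translator output.

```java
// Hypothetical helper: applies one replacement to the text shown in an editor.
class InPlaceUpdate {

    // Replace replacedLength chars starting at replacedPosition with the new text.
    static void apply(StringBuilder shown, int replacedPosition, int replacedLength,
                      String translated) {
        shown.replace(replacedPosition, replacedPosition + replacedLength, translated);
    }

    public static void main(String[] args) {
        StringBuilder shown = new StringBuilder();

        // Like case 1: nothing is replaced (length 0), the text is simply added.
        apply(shown, 0, 0, "AB");
        System.out.println(shown); // AB

        // Like case 2: the first 2 chars are replaced by the new translation.
        apply(shown, 0, 2, "C");
        System.out.println(shown); // C
    }
}
```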

     As a future enhancement, I think we should add a feature to insert English text without translating it. There might also be some bugs in text deletion. The following links contain the class files and the help document.

     If you find any bugs or have suggestions, let me know. In the next post I’m going to introduce a Sinhala – Sinhala Unicode translation add-on for Firefox.  Mmmmmm, c ya all in the next post…. Byee…….. :D



Monday, 18 February 2013

Sinhala unicode with Java


     Welcome, all of you, to my new blog The Life 0.5. I hope to meet you with a collection of different, different and different kinds of topics, at least twice per week. Actually, I thought of starting this blog on the last 23rd, but time matters, as you know.  I saw somewhere that there are two kinds of people: one kind is really busy and has no time to do anything, and the other kind is really busy doing nothing.  Unfortunately I’m in the latter group :(..   And that’s the end of my welcome speech, and thank you all… ;)

     Ok, as the first topic, I’m going to write about using Sinhala Unicode in Java. I had a lot of trouble with this subject, so I googled all over the internet, and the following are my final results.

     Actually Sinhala Unicode is pretty common these days. As we all know, ASCII uses a single byte to represent a character, but Unicode may need several bytes for a single character. Unicode itself only assigns a number (a code point) to each character; how that number is stored as bytes depends on the character encoding. UTF-8 uses one to four bytes per character, UTF-16 uses two or four, and UTF-32 always uses four.
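
To see the byte counts for yourself, here is a small sketch that encodes a Latin letter and a Sinhala letter in two encodings. EncodingSizes is just a name I picked for the example.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizes {

    // Number of bytes the string takes when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string takes when encoded as UTF-16 (big endian, no BOM).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String latin = "A";
        String sinhala = "\u0D85"; // the Sinhala letter අ

        System.out.println(utf8Length(latin) + " " + utf16Length(latin));     // 1 2
        System.out.println(utf8Length(sinhala) + " " + utf16Length(sinhala)); // 3 2
    }
}
```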

     Unicode Sinhala characters are mapped to the range 0D80 to 0DFF, so every Sinhala character can be represented by a hex number. As an example, “අ” can be represented by 0D85. In Java we can write characters in Unicode by their corresponding hex numbers, with a backslash and ‘u’ in front of them.

char c = '\u0D85';
or
char c = 'අ';

     But if you want to represent a letter with ispili, paapili and so on, then you have to use two characters: one for the letter and the other one for the ispilla, papilla or whatever. So as an example, if you want to show the letter ‘කැ’ then you have to use two characters, one for ‘ක’ (0D9A) and the other for the aelapilla (0DD0).

String s = "\u0D9A\u0DD0";

     The above string represents “කැ”. Some letters need more than two characters.  For example, it takes four characters to represent ර්ම (repaya). A complete list of the characters and the way they should be used can be found in the links at the bottom of the page.  Following is a very simple programme.
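
Before that, a quick sketch to check how many chars a letter really takes. I am assuming here the usual way of writing repaya: ර (0DBB), hal kirima (0DCA), a zero-width joiner (200D) and then the next consonant, ම (0DB8). CodePoints is just a name for this example.

```java
class CodePoints {

    // List every char of the string as U+XXXX, separated by spaces.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String kae = "\u0D9A\u0DD0";
        // Assumed spelling of repaya followed by the letter ම.
        String repaya = "\u0DBB\u0DCA\u200D\u0DB8";

        System.out.println(kae.length() + ": " + describe(kae));
        // 2: U+0D9A U+0DD0
        System.out.println(repaya.length() + ": " + describe(repaya));
        // 4: U+0DBB U+0DCA U+200D U+0DB8
    }
}
```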

public class Sinhala {

    public static void main(String[] args) {

        // The same letter, once as a hex escape and once typed directly.
        char c1 = '\u0D85';
        char c2 = 'අ';

        // A letter plus its vowel sign, again written both ways.
        String s1 = "බි";
        String s2 = "\u0DB6\u0DD2";

        System.out.println(c1 + " " + c2); // prints අ අ
        System.out.println(s1 + " " + s2); // prints බි බි

        දුවපන්(5, 10);
    }

    // Sinhala identifiers are legal Java names.
    public static void දුවපන්(int පටන්, int අවසානය) {

        for (; පටන් < අවසානය; පටන්++) {
            System.out.println(පටන් +
                    " - \u0DAF\u0DD4\u0DC0\u0DB1\u0DDD");
        }
    }
}

     Sometimes, if you run the above code in a command prompt, it may not show the correct output. But it runs just fine in Eclipse; keep in mind to save the source file in a Unicode encoding such as UTF-8.
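
One thing that may help with the command prompt is printing through a PrintStream with an explicit encoding, as in the sketch below. This is only a sketch: whether the letters actually show up still depends on the console’s own code page and font, which Java can’t fix. Utf8Console is a name I made up for the example.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {

    // Wrap a stream so that everything printed to it is encoded as UTF-8.
    static PrintStream utf8(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        out.println("\u0D85");
    }
}
```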

     Actually it’s not recommended to type Unicode letters directly in code; it’s better to use their corresponding hex values. Apart from that, I used some Sinhala letters for method names and variable names and it works, because Java lets you use Unicode characters in variable names, method names and so on. But it’s funny nuh  :D… 

     In the next part of the article we are going to discuss using Sinhala letters with Swing components. In order to do this, you have to change the font of the Swing component to a font that can display Sinhala. As far as I know, the “Iskoola Pota” font comes with Windows 7 and sometimes with Windows Vista, so I will use that font to view Sinhala letters. You change the font of a Swing component by using the setFont method. Following is an example, 

import java.awt.*;
import javax.swing.*;

public class SinhalaSwing extends JFrame {

    // One shared font that can display Sinhala letters.
    private static final Font SINHALA_FONT = new Font("Iskoola Pota", Font.PLAIN, 14);

    public SinhalaSwing() {

        super("පාටතෝරන්න");
        setLayout(new FlowLayout());

        JButton button = new JButton("රතු");
        button.setFont(SINHALA_FONT);
        add(button);

        button = new JButton("කහ");
        button.setFont(SINHALA_FONT);
        add(button);

        JTextField field = new JTextField("තේරූ පාට - රතු");
        field.setFont(SINHALA_FONT);
        add(field);

        // Exit the program when the window is closed.
        setDefaultCloseOperation(EXIT_ON_CLOSE);
        setSize(100, 100);
        setVisible(true);
    }

    public static void main(String[] args) {
        // Swing components should be created on the event dispatch thread.
        SwingUtilities.invokeLater(SinhalaSwing::new);
    }
}

    I was too lazy to type the Sinhala characters using \u in the code, but that is the recommended way. If you are not sure whether the Iskoola Pota font is on your system, and you want to find the correct font to use, you can list all the available fonts in your system with the following two lines of code. Then try each of them until you find the correct font.

 GraphicsEnvironment e = GraphicsEnvironment.getLocalGraphicsEnvironment();
 Font[] fonts = e.getAllFonts();  
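
Instead of trying every font by hand, you could let Java tell you which ones can show Sinhala. Here is a small sketch using Font.canDisplayUpTo, which returns -1 when the font can display the whole string. SinhalaFonts and familiesShowing are names I made up, and the sample text is just one letter with a vowel sign, so a font that passes may still miss other letters.

```java
import java.awt.Font;
import java.awt.GraphicsEnvironment;
import java.util.LinkedHashSet;
import java.util.Set;

class SinhalaFonts {

    // Collect the family names of the fonts that can display the whole sample.
    static Set<String> familiesShowing(Font[] fonts, String sample) {
        Set<String> families = new LinkedHashSet<>();
        for (Font font : fonts) {
            if (font.canDisplayUpTo(sample) == -1) {
                families.add(font.getFamily());
            }
        }
        return families;
    }

    public static void main(String[] args) {
        Font[] fonts = GraphicsEnvironment.getLocalGraphicsEnvironment().getAllFonts();
        // Sample text: the letter ක followed by the aelapilla.
        for (String family : familiesShowing(fonts, "\u0D9A\u0DD0")) {
            System.out.println(family);
        }
    }
}
```

Any name it prints can be passed to new Font(name, Font.PLAIN, 14) in place of "Iskoola Pota".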

     So that’s the end of my first article on The Life 0.5. In the next article I will introduce some Singlish to Sinhala conversion code, which should help developers build their applications. C yaaaaaaaaaaa all……..  :D


See these too…. 

http://www.nsrc.org/ASIA/LK/03-Jan-2003_UCSC-Paper-on-Unicode.pdf

https://docs.google.com/document/pub?id=1gaRbfdmt31W51Y6j2YVBbISQLll1x7mZd8NBEL9ftI8 

http://www.silumina.lk/punkalasa/20121216/_art.asp?fn=ar12121611