Mast Kalandar

bandar's colander of random jamun aur aam

Thu, 10 Apr 2008


Typing in Indic Languages


I wrote the following in response to a query from Sourendu. (All this thought does not seem to help in the implementation of proper wide-char support for elvis :-( ).

There are multiple input mechanisms for Unicode, depending on the language.

  1. A special keyboard with a suitable "modifier key" which directly sends the appropriate Unicode (multi-byte) character to the application. Basically, this is like a Compose-Lock key which shifts the keyboard into some kind of input mode where keys have different meanings.

  2. You can "emulate" the above in some situations with Alt+number keys, but you need to know the Unicode character number (code point) for the character you want, which is a pain.

  3. You use an input method which takes normal ASCII input keys and converts them to Unicode in the file. For example, there is the "devnag" input method which uses essentially the same ASCII key sequences as those understood by the devnag TeX package. It is a bit like "abbrev" in vi. In other words, input and edit are not quite inverses of each other!
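
As a rough illustration of method 3, here is a toy converter in Python. The key table (`CONSONANTS`, `VOWELS`) and the function name `to_devanagari` are made up for this sketch and cover only a handful of letters; this is not the actual devnag key scheme, just the general idea of turning ASCII key sequences into Unicode.

```python
# Toy ASCII-to-Devanagari converter. The key table below is a made-up
# illustration, NOT the real devnag scheme, and covers only a few letters.
CONSONANTS = {
    "k": "\u0915",  # DEVANAGARI LETTER KA
    "m": "\u092E",  # DEVANAGARI LETTER MA
    "n": "\u0928",  # DEVANAGARI LETTER NA
    "r": "\u0930",  # DEVANAGARI LETTER RA
    "l": "\u0932",  # DEVANAGARI LETTER LA
}
# vowel key -> (independent letter, dependent sign used after a consonant)
VOWELS = {
    "a": ("\u0905", ""),        # inherent vowel: no sign is written
    "A": ("\u0906", "\u093E"),
    "i": ("\u0907", "\u093F"),
    "u": ("\u0909", "\u0941"),
}
VIRAMA = "\u094D"  # suppresses the inherent vowel of a consonant


def to_devanagari(ascii_text):
    """Convert a string typed with the toy key table into Devanagari."""
    out = []
    pending = False  # True if the last thing emitted was a bare consonant
    for ch in ascii_text:
        if ch in CONSONANTS:
            if pending:
                out.append(VIRAMA)  # consonant directly follows consonant
            out.append(CONSONANTS[ch])
            pending = True
        elif ch in VOWELS:
            independent, sign = VOWELS[ch]
            out.append(sign if pending else independent)
            pending = False
        else:
            if pending:
                out.append(VIRAMA)
            out.append(ch)  # pass unknown keys through unchanged
            pending = False
    if pending:
        out.append(VIRAMA)
    return "".join(out)


print(to_devanagari("kamala"))
```

Here the six keys `kamala` end up as only three Unicode characters in the file, so deleting one character while editing does not undo one keystroke. That is the sense in which input and edit are not inverses.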

Emacs supports all the above input methods (including multiple choices for method 3).
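
For method 2, the painful part is finding the character number. A small sketch using Python's standard `unicodedata` module can look it up from the official character name; the helper name `code_point_of` is my own, and whether your system wants the number in decimal or hex is something you would have to check.

```python
import unicodedata


def code_point_of(name):
    """Return (character, decimal number, U+XXXX form) for a Unicode name."""
    ch = unicodedata.lookup(name)  # raises KeyError for an unknown name
    return ch, ord(ch), f"U+{ord(ch):04X}"


ch, decimal, hexform = code_point_of("DEVANAGARI LETTER KA")
print(ch, decimal, hexform)
```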

There is one other problem with typing in Indic languages: consonant+vowel combinations behave like "ligatures".

There are two ways to treat ligatures. One way is to treat each ligature as a separate character, and Unicode has some such allotted positions. The other way is to treat them as multi-character glyphs; i.e., multiple characters that get converted to a single glyph while rendering. The latter is the "natural" way from the point of view of text, as one can then edit it as one would "normally" edit text. However, displaying the latter type requires a rather complex text display mechanism which understands the (possibly context-sensitive) grammar of multi-char glyphs, a mechanism which the current xterm does not have. It looks like the new Firefox does support it correctly.
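
To see the multi-character treatment concretely, the following Python sketch lists the characters stored for one Devanagari cluster. The helper name `describe` and the sample string are my own choices for illustration; the character names come from Python's standard `unicodedata` module.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# One visual cluster, stored as four separate characters:
# KA + VIRAMA + SSA + VOWEL SIGN I
kshi = "\u0915\u094D\u0937\u093F"

for code_point, name in describe(kshi):
    print(code_point, name)
```

The vowel sign I is stored last but is drawn to the left of the consonant cluster, which is the kind of context-sensitive rule the display mechanism has to know about.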

A text editor that works well with Unicode is yudit. It uses the "devnag" input method and displays multi-character ligatures correctly. It is a graphic-mode editor in the sense that it draws text to its own window.

