[clean-list] Sanskrit Transliteration - Parsing into Abstract Syntax Trees
metaperl
metaperl at gmail.com
Wed Aug 27 21:23:54 MEST 2008
Ok, it's time to spill the beans. My goal in Clean parsing has to do with
Sanskrit.
Sanskrit is usually written in the Devanagari script
<http://en.wikipedia.org/wiki/Devanagari>. Before modern typography made
that script easy to typeset, people approximated it with plain ASCII
characters.
<http://en.wikipedia.org/wiki/Devanagari_transliteration>
Over time, several competing ASCII schemes evolved.
My goals are:
1 - bidirectional transliteration between the various ASCII schemes:
// Given Harvard-Kyoto, produce Velthuis encoding:
translit Harvard Velthuis "ajJAna" // output will be aj~naana
2 - unidirectional transliteration from any ASCII scheme to Unicode
// Given Harvard-Kyoto, produce Unicode encoding:
// an expansion on http://www.iit.edu/~laksvij/language/sanskrit.html
translit Harvard Unicode "ajJAna" // output will be अज्ञान
I'm thinking of using the Velthuis encoding
<http://en.wikipedia.org/wiki/Devanagari_transliteration#Velthuis>
as the "Abstract Syntax Tree" for the whole project. Whatever ASCII scheme
comes in, I would convert it to Velthuis first and then convert the
Velthuis to the requested target.
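To make the pivot idea concrete, here is a rough sketch. It is in Python
rather than Clean only to keep it short, and it is not a finished design.
The names HK_TO_VELTHUIS, SCHEMES, tokenize, to_pivot and from_pivot are
made up for illustration, the table is written from memory of the published
schemes and should be checked against a reference, and the greedy
longest-match tokenizer is just one possible way to split the input.

```python
# Hypothetical sketch: none of these names come from an existing library.
# Harvard-Kyoto token -> Velthuis token (table from memory; verify before use).
HK_TO_VELTHUIS = {
    "a": "a", "A": "aa", "i": "i", "I": "ii", "u": "u", "U": "uu",
    "R": ".r", "RR": ".rr", "lR": ".l", "lRR": ".ll",
    "e": "e", "ai": "ai", "o": "o", "au": "au",
    "M": ".m", "H": ".h",
    "k": "k", "kh": "kh", "g": "g", "gh": "gh", "G": '"n',
    "c": "c", "ch": "ch", "j": "j", "jh": "jh", "J": "~n",
    "T": ".t", "Th": ".th", "D": ".d", "Dh": ".dh", "N": ".n",
    "t": "t", "th": "th", "d": "d", "dh": "dh", "n": "n",
    "p": "p", "ph": "ph", "b": "b", "bh": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "z": '"s', "S": ".s", "s": "s", "h": "h",
}

# Each scheme is described only by its mapping into the Velthuis pivot.
# Velthuis maps onto itself.
SCHEMES = {
    "Harvard": HK_TO_VELTHUIS,
    "Velthuis": {v: v for v in HK_TO_VELTHUIS.values()},
}


def tokenize(text, table):
    """Split text into the longest tokens found in table, left to right."""
    longest = max(len(tok) for tok in table)
    out, i = [], 0
    while i < len(text):
        for width in range(longest, 0, -1):
            chunk = text[i:i + width]
            if chunk in table:
                out.append(chunk)
                i += width
                break
        else:
            raise ValueError(f"unrecognised input at position {i}: {text[i:]!r}")
    return out


def to_pivot(scheme, text):
    """Parse text in the given scheme into a list of Velthuis tokens."""
    table = SCHEMES[scheme]
    return [table[tok] for tok in tokenize(text, table)]


def from_pivot(scheme, tokens):
    """Render a list of Velthuis tokens in the given scheme."""
    inverse = {v: k for k, v in SCHEMES[scheme].items()}
    return "".join(inverse[tok] for tok in tokens)


def translit(source, target, text):
    """Convert text from one scheme to another by way of the pivot."""
    return from_pivot(target, to_pivot(source, text))


print(translit("Harvard", "Velthuis", "ajJAna"))    # aj~naana
print(translit("Velthuis", "Harvard", "aj~naana"))  # ajJAna
```

With this layout a new ASCII scheme needs only one table into Velthuis, and
the reverse direction falls out by inverting that table, so N schemes need N
tables instead of N*N converters.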
I still have a few more days of banging my head against the MetarParser, but
I wanted to at least let people know where I'm heading with all these
questions.
Errata:
====
A major hitch in converting ASCII to Unicode is that all of the ASCII
schemes are purely linear: you read them the way you would read English,
left to right.
However, Devanagari is non-linear in at least two places:
* short "i" is written before the consonants that it is pronounced after. In
other words, "agni" is written in Devanaagarii with the "i" between "a" and
"g": "aign", even though it is pronounced "agni"
* "r" is written at the far right of the consonants it _precedes_. In other
words, "rgo" is written in Devanaagarii with the "r" after the "go"
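One thing that may soften this hitch: Unicode stores Devanagari in
pronunciation order, and the font and rendering engine do the visual
reordering of the short "i" sign and the "r" hook. A converter can therefore
emit code points strictly left to right. The sketch below (Python again,
purely illustrative) assumes the input has already been split into Velthuis
tokens. The names to_devanagari, CONSONANTS, INDEPENDENT, SIGNS and MARKS
are hypothetical, and the tables cover only the common letters.

```python
# Hypothetical sketch; names and table layout are illustrative only.
VIRAMA = "\u094d"  # suppresses the inherent "a" of a consonant

CONSONANTS = {
    "k": "क", "kh": "ख", "g": "ग", "gh": "घ", '"n': "ङ",
    "c": "च", "ch": "छ", "j": "ज", "jh": "झ", "~n": "ञ",
    ".t": "ट", ".th": "ठ", ".d": "ड", ".dh": "ढ", ".n": "ण",
    "t": "त", "th": "थ", "d": "द", "dh": "ध", "n": "न",
    "p": "प", "ph": "फ", "b": "ब", "bh": "भ", "m": "म",
    "y": "य", "r": "र", "l": "ल", "v": "व",
    '"s': "श", ".s": "ष", "s": "स", "h": "ह",
}

# Vowel letters, used at the start of a word or after another vowel.
INDEPENDENT = {
    "a": "अ", "aa": "आ", "i": "इ", "ii": "ई", "u": "उ", "uu": "ऊ",
    ".r": "ऋ", "e": "ए", "ai": "ऐ", "o": "ओ", "au": "औ",
}

# Dependent vowel signs, used after a consonant. Short "a" is inherent in
# the consonant letter, so it contributes no code point.
SIGNS = {
    "a": "", "aa": "\u093e", "i": "\u093f", "ii": "\u0940",
    "u": "\u0941", "uu": "\u0942", ".r": "\u0943",
    "e": "\u0947", "ai": "\u0948", "o": "\u094b", "au": "\u094c",
}

MARKS = {".m": "\u0902", ".h": "\u0903"}  # anusvara, visarga


def to_devanagari(tokens):
    """Turn a list of Velthuis tokens into Devanagari in logical order."""
    out = []
    after_consonant = False  # True while a consonant still awaits its vowel
    for tok in tokens:
        if tok in CONSONANTS:
            if after_consonant:
                out.append(VIRAMA)  # consonant cluster: kill the inherent "a"
            out.append(CONSONANTS[tok])
            after_consonant = True
        elif tok in SIGNS:
            out.append(SIGNS[tok] if after_consonant else INDEPENDENT[tok])
            after_consonant = False
        elif tok in MARKS:
            out.append(MARKS[tok])
            after_consonant = False
        else:
            raise ValueError(f"unknown token: {tok!r}")
    if after_consonant:
        out.append(VIRAMA)  # word ends in a bare consonant
    return "".join(out)


print(to_devanagari(["a", "j", "~n", "aa", "n", "a"]))  # अज्ञान
print(to_devanagari(["a", "g", "n", "i"]))              # अग्नि
```

In the second example the "i" sign is emitted last, after "g" and "n", and
it is the renderer that draws it to the left of the cluster.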
There is already a good converter from Harvard-Kyoto to Devanaagarii
<http://www.iit.edu/~laksvij/language/sanskrit.html> so I may just focus on
bidirectional ASCII transliteration and, when I need Unicode, simply use
that online tool.
It would be nice to have all resources available in a Clean program though.