The Karen Script

This page documents the Sgaw and Pwo Karen scripts and how they are implemented in Unicode. This information may be useful to software and web developers.

The Karen writing system can be broken down into a string of well-defined syllables. Each syllable is made up of up to five parts:

<consonant> <medial> <vowel> <tone> <nasal or final>

Pwo Karen syllables can end with nasal sounds. Sgaw Karen never has final consonant sounds of any kind; however, a final is sometimes added when transliterating foreign words. The final consonant is marked with the killer character (U+103A), as in Burmese.
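
The five-part structure above can be sketched as a regular expression. This is only a rough sketch for Sgaw Karen: the character classes are my reading of the Sgaw tables on this page, the names `SYLLABLE` and `split_syllables` are made up for the example, and the pattern does not try to decide whether a consonant carrying U+103A is a foreign final or one of the special standalone syllables.

```python
import re

# Character classes read off the Sgaw Karen tables on this page (an assumption).
CONSONANT = "[\u1000-\u1006\u100A\u1010-\u1012\u1014-\u1016\u1018-\u101F\u1021\u1027\u1061]"
MEDIAL = "[\u103B-\u103E\u1060]"
# U+1062 counts as a vowel only when it is not the start of the 'erthee' tone mark.
VOWEL = "(?:[\u102B\u102D-\u1030\u1032\u1036\u1037]|\u1062(?!\u103A))"
TONE = "(?:[\u1062\u102C\u1063]\u103A|[\u1038\u1064])"
# A final is a consonant topped with the killer character U+103A.
FINAL = "(?:" + CONSONANT + "\u103A)"

# <consonant> <medial> <vowel> <tone> <final>; everything but the consonant is optional.
SYLLABLE = re.compile(CONSONANT + MEDIAL + "?" + VOWEL + "?" + TONE + "?" + FINAL + "?")


def split_syllables(text):
    """Return the syllables found in text; characters that do not fit are skipped."""
    return [m.group(0) for m in SYLLABLE.finditer(text)]
```

Because the final is matched greedily, a consonant followed by U+103A is always attached to the syllable before it. A real implementation would probably need a word list or extra rules to handle the special syllables.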

Sgaw Karen

Character Unicode Code Point Character Frequency Pronunciation
က U+1000 2.79393 g
U+1001 0.83519 k
U+1002 0.89839 ɣ or ʁ
U+1003 0.40768 x
U+1004 0.00075 ŋ
U+1005 1.57541 s
U+1006 0.65118
ၡ (or ရှ) U+1061 (or U+101B U+103E) 0.20450 ʃ
U+100A 0.26987 ɲ
U+1010 3.55647 t
U+1011 0.85088
U+1012 3.59760 d
U+1014 2.04729 n
U+1015 2.10669 p
U+1016 1.23766
U+1018 1.35043 b
U+1019 2.06727 m
U+101A 1.32901 j
U+101B 0.31319 ɾ
U+101C 4.53955 l
U+101D 0.84533 w
U+101E 1.82657 θ
U+101F 0.58053 h
U+1021 4.93165
U+1027 0.26756

U+1061 is the correct way to write sha. However, when this character is followed by the vowels U+102F or U+1030 many people like to see them rendered underneath the consonant, which sometimes doesn't happen with U+1061. The combination of U+101B, U+103E, and one of these vowels will do that, at least in the Padauk font. The most notable syllable in which this is an issue is the Karen transliteration of 'Jesus'. It is unfortunate that the fonts do not yet render these character combinations correctly.
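
One possible way to live with this is to store text with U+1061 and only switch to the U+101B U+103E spelling for display. The sketch below assumes that U+101B U+103E never occurs in Sgaw Karen text for any other reason (the medial table on this page does not list U+101B as a consonant U+103E may follow); the function names are hypothetical.

```python
import re

SHA = "\u1061"             # the correct encoding of sha
SHA_ALT = "\u101B\u103E"   # alternate spelling some typists prefer for rendering


def normalize_sha(text):
    """Rewrite the alternate spelling to U+1061, e.g. before searching or sorting."""
    return text.replace(SHA_ALT, SHA)


def sha_for_display(text):
    """Use the alternate spelling only where sha is followed by U+102F or U+1030."""
    return re.sub(SHA + "(?=[\u102F\u1030])", SHA_ALT, text)
```

Whether the display form actually renders the vowel underneath the consonant still depends on the font.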

Sgaw Medials
Character Unicode Code Point Character Frequency Pronunciation May Follow
U+103E ɣ or ʁ U+1005, U+1006, U+1015, U+1016, U+1018, U+1019
U+1060 0.00230 [(medialgha)] U+1000, U+1001, U+1015, U+1016, U+1018, U+1019.
U+103B 0.46935 l U+1000, U+1001, U+1015, U+1016, U+1018, U+1019
U+103C 0.06656 ɾ U+1000, U+1001, U+1005, U+1006, U+1010, U+1011, U+1015, U+1016, U+101E.
U+103D 0.80900 w U+1000, U+1001, U+1005, U+1006, U+100A, U+1010, U+1011, U+1012, U+1015, U+1016, U+1019, U+101A, U+101B, U+101C, U+101E

If U+1060 follows U+1000, the pronunciation is dʒ; if it follows U+1001, the pronunciation is tʃ. Otherwise, pronunciation varies.
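
The "May Follow" column and the U+1060 rule could be encoded as a small lookup table, for example to flag unlikely sequences in a spell checker. This is a sketch only; `MAY_FOLLOW`, `medial_allowed` and `medial_gha_pronunciation` are names invented for the example, and the data is copied from the table above.

```python
# For each medial, the consonants it may follow (copied from the Sgaw Medials table).
MAY_FOLLOW = {
    "\u103E": frozenset("\u1005\u1006\u1015\u1016\u1018\u1019"),
    "\u1060": frozenset("\u1000\u1001\u1015\u1016\u1018\u1019"),
    "\u103B": frozenset("\u1000\u1001\u1015\u1016\u1018\u1019"),
    "\u103C": frozenset("\u1000\u1001\u1005\u1006\u1010\u1011\u1015\u1016\u101E"),
    "\u103D": frozenset(
        "\u1000\u1001\u1005\u1006\u100A\u1010\u1011\u1012"
        "\u1015\u1016\u1019\u101A\u101B\u101C\u101E"
    ),
}


def medial_allowed(consonant, medial):
    """True if the table lists this consonant as one the medial may follow."""
    return consonant in MAY_FOLLOW.get(medial, frozenset())


def medial_gha_pronunciation(consonant):
    """Pronunciation of consonant + U+1060 where it is fixed, otherwise None."""
    return {"\u1000": "dʒ", "\u1001": "tʃ"}.get(consonant)
```

Note that `medial_allowed("\u101B", "\u103E")` is false, so the alternate spelling of sha would need to be normalized or special-cased before using a check like this.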

Sgaw Vowels
Character Unicode Code Point Character Frequency IPA Pronunciation Notes
U+102B 0.73639 a If this vowel is followed by a tone mark, then it is understood and not written.
U+1036 2.44956 i
U+1062 2.70066 ɘ This character, though a vowel here, is the same as the first character used to make the 'erthee' tone mark. It's unfortunate that the Unicode standard does not provide separate characters for these two very different uses.
U+102F 1.53393 ɨ This character will normally appear below the consonant. However, if there is not space below (maybe because of the presence of a medial consonant, etc.) it may be made taller (up to the top line) and placed directly following the consonant.
U+1030 1.29936 ʉ May appear below or beside as U+102F.
U+1037 3.13266 e Is often shifted forward or backward to accommodate other characters or protrusions under the consonant.
U+1032 2.07882 ɛ
U+102D 2.90762 ɔ
U+102E 6.44103 ɒ
Sgaw Tones
Character Unicode Code Point Character Frequency Name Pronunciation
ၢ် U+1062 U+103A 4.41454 'erthee' Normal length low tone.
ာ် U+102C U+103A 1.09383 'athee' Fairly short, low tone, ending with a glottal stop.
U+1038 4.82099 'plerhsee' Very short, high-mid tone, ending with a glottal stop.
ၣ် U+1063 U+103A 5.74351 'hathee' Falling tone.
U+1064 7.23370 'gapo' Long, mid tone.
(No Written Mark) N/R N/R N/A Slightly rising tone.
Special Character
Character Unicode Code Point Notes
U+103A In standard Sgaw Karen, this character is only used following two consonants (U+1012 and U+1019) to make two special syllables (ဒ် and မ်). It is also occasionally used to transcribe foreign words that have a final consonant. The consonant is simply placed after the syllable and topped with U+103A to indicate that it is the ending of the previous syllable. This character is also used in the 'erthee', 'athee', and 'hathee' tone marks.
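
Since three of the tone marks are two-code-point sequences ending in U+103A, code that looks for tones has to check for the sequences rather than for single characters. A minimal sketch, assuming the input is a single Sgaw syllable and using the tone names from the table above (`tone_name` is a hypothetical helper):

```python
# Tone marks from the Sgaw Tones table, as code point sequences.
TONES = [
    ("\u1062\u103A", "erthee"),
    ("\u102C\u103A", "athee"),
    ("\u1063\u103A", "hathee"),
    ("\u1038", "plerhsee"),
    ("\u1064", "gapo"),
]


def tone_name(syllable):
    """Return the name of the written tone mark in a syllable, or None if there is none."""
    for mark, name in TONES:
        if mark in syllable:
            return name
    return None
```

A bare U+1062 (the vowel) or a consonant followed by U+103A (a final or special syllable) is not reported as a tone, because neither matches one of the full sequences.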

Pwo Karen

Character Unicode Code Point Character Frequency IPA Pronunciation Notes
က U+1000 g
U+1001 k
U+1002 ɣ or ʁ
U+100E x
U+1004 ŋ
U+1005 s
U+1007 0.00000 z
(or ရှ) U+1061 (or U+101B U+103E) ʃ As mentioned for Sgaw Karen, U+1061 is the correct way to write sha.
U+100A ɲ
U+1010 t
U+1012 d
U+1014 n
U+1015 p
U+1018 b
U+1019 m
U+101A j
U+101B ɾ
U+101C l
U+101D w
U+1065 θ
U+101F h
U+1021 This character is used as in Sgaw Karen.
U+1027 This character is similar to the Sgaw Karen equivalent.
Pwo Medials
Character Unicode Code Point Character Frequency IPA Pronunciation Notes
U+103E ɣ or ʁ
U+1060 0.00230
U+103B 0.46935 l
U+103C 0.06656 ɾ
U+103D 0.80900 w
Pwo Vowels
Character Unicode Code Point Character Frequency IPA Pronunciation Notes
U+102B a If this vowel is followed by a tone mark, then it is understood and not written.
U+1036 i
U+1067 ɨ May appear below or beside, as in Sgaw Karen.
U+1030 ʉ May appear below or beside as U+102F.
U+1037 e Is often shifted forward or backward to accommodate other characters or protrusions under the consonant.
U+1032 2.07882 ɛ
U+102D 2.90762 ɔ
U+102E 6.44103 ɒ
Character Unicode Code Point Character Frequency Name Pronunciation
Character Unicode Code Point Character Frequency Name Pronunciation
U+1038 Ngathee
U+1037 Ngathee When Ngathee follows a written tone mark, it appears below the tone.

The following three tones may be followed by the nasal mark: U+1069 ( ၩ့), U+106A ( ၪ့), and U+106B ( ၫ့).

Numbers and Punctuation

Character Unicode Code Point Notes
. U+002E The period (full stop) is used in Karen in a way somewhat similar to English.
, U+002C The comma is also used in Karen in a way similar to English.
? U+003F While not normally used, the question mark can occasionally be found at the end of Sgaw Karen questions, especially in colloquial writing.
“” U+201C U+201D Opening and closing quotation marks are often used.
() U+0028 U+0029 Opening and closing parentheses are occasionally used.
/ U+002F The forward slash (solidus) is commonly used to indicate contractions.
U+2605 The star seems to be a favorite of many Karen typists and is often used for bullet points. It should probably be included in a Karen keyboard layout.
Arabic Myanmar Unicode Code Point
0 ၀ U+1040
1 ၁ U+1041
2 ၂ U+1042
3 ၃ U+1043
4 ၄ U+1044
5 ၅ U+1045
6 ၆ U+1046
7 ၇ U+1047
8 ၈ U+1048
9 ၉ U+1049
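
Because the Myanmar digits occupy ten consecutive code points starting at U+1040, converting between the two sets of digits is a simple character translation. A short sketch (the function names are made up for the example):

```python
# Myanmar digits U+1040..U+1049 correspond to 0..9 in order.
MYANMAR_DIGITS = "".join(chr(0x1040 + i) for i in range(10))

TO_MYANMAR = str.maketrans("0123456789", MYANMAR_DIGITS)
TO_ARABIC = str.maketrans(MYANMAR_DIGITS, "0123456789")


def to_myanmar_digits(text):
    """Replace Arabic digits with Myanmar digits; other characters are untouched."""
    return text.translate(TO_MYANMAR)


def to_arabic_digits(text):
    """Replace Myanmar digits with Arabic digits; other characters are untouched."""
    return text.translate(TO_ARABIC)
```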

All the character frequencies are based on my own analysis of the Karen version of the Holy Bible. Each frequency is the percentage of occurrences of that character among all the alphabetic Karen characters in the corpus (other characters, such as spaces and quotes, were discarded before counting).
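
A count along these lines can be sketched in a few lines of Python. This is not the script that produced the numbers above: it counts individual code points (so a two-code-point tone mark is not treated as one unit, as it is in the tables), and it assumes that "alphabetic" means anything in the Myanmar block U+1000 to U+109F except the digits and punctuation at U+1040 to U+104F.

```python
from collections import Counter


def is_karen_letter(ch):
    """Assumption: Myanmar block, minus digits and punctuation (U+1040..U+104F)."""
    cp = ord(ch)
    return 0x1000 <= cp <= 0x109F and not 0x1040 <= cp <= 0x104F


def char_frequencies(text):
    """Map each counted character to its share of all counted characters, in percent."""
    counts = Counter(ch for ch in text if is_karen_letter(ch))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {ch: 100.0 * n / total for ch, n in counts.items()}
```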

The Line Breaking Problem

A problem often encountered in Sgaw Karen word processing is getting lines to break at the right points. Written Karen does not have spaces between words or syllables. It only uses them between phrases and sentences, which can often be quite long. The traditional 'correct' way to handle this is to put a space wherever you want a line to break. If you later want to edit the paragraph or change the font size, you will need to manually move those spaces to new locations. Many Karen typists, however, don't even do that. They opt instead to use a carriage return at the end of every single line. And then, being used to doing that in Karen, they do the same for their English documents too.

One way to solve this problem is to insert U+200B ZERO WIDTH SPACE between syllables. If a syllable boundary falls at the end of a line, the word processor will automatically break the line there. If it does not fall at the end of a line, the space is invisible and can safely remain in the document.
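
As a rough illustration, U+200B could be inserted automatically wherever two syllables are directly adjacent. The sketch below is for Sgaw Karen only; the syllable pattern is my own approximation built from the tables on this page, and `insert_zwsp` is a hypothetical name. Text that the pattern does not recognize is passed through unchanged.

```python
import re

ZWSP = "\u200B"  # ZERO WIDTH SPACE

# Approximate Sgaw syllable pattern (an assumption based on the tables on this page).
CONSONANT = "[\u1000-\u1006\u100A\u1010-\u1012\u1014-\u1016\u1018-\u101F\u1021\u1027\u1061]"
MEDIAL = "[\u103B-\u103E\u1060]"
VOWEL = "(?:[\u102B\u102D-\u1030\u1032\u1036\u1037]|\u1062(?!\u103A))"
TONE = "(?:[\u1062\u102C\u1063]\u103A|[\u1038\u1064])"
FINAL = "(?:" + CONSONANT + "\u103A)"
SYLLABLE = re.compile(CONSONANT + MEDIAL + "?" + VOWEL + "?" + TONE + "?" + FINAL + "?")


def insert_zwsp(text):
    """Insert U+200B between syllables that touch each other directly."""
    out = []
    pos = 0          # end of the text copied so far
    prev_end = None  # end of the previous syllable, if any
    for m in SYLLABLE.finditer(text):
        out.append(text[pos:m.start()])  # anything between syllables is kept as-is
        if prev_end == m.start():
            out.append(ZWSP)             # two syllables with nothing in between
        out.append(m.group(0))
        pos = prev_end = m.end()
    out.append(text[pos:])
    return "".join(out)
```

Syllables that are already separated by a space or by an existing U+200B are not adjacent, so running the function twice does not add a second U+200B.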

The advantages to this method are:

  1. It's currently one of the only options I know of that doesn't require modifying the line wrapping engines in individual word processing environments.
  2. It's simple to use and implement.

Some disadvantages include:

  1. Even though U+200B spaces are invisible, they are still characters and must still be 'hopped over' when moving the cursor with the keyboard arrows, causing apparent 'cursor hangups' (it can get annoying).
  2. Since U+200B is invisible, it can be difficult to tell where it is and is not present. This could prompt the typist to insert it everywhere, even where it has already been inserted, which would further aggravate the 'cursor hangup' problem.
  3. LibreOffice and OpenOffice display a very annoying gray mark over places where this character is present. The mark would be fantastic for seeing where the characters are, if only there were a way to turn it on and off. So far, I have been unable to find such a way.

I see at least three workflows that a typist could use that incorporate the U+200B space character. Here they are:

  1. Type and edit the document without paying any attention to where lines break (the word processor will break them between phrases). Then after the final edit, and before printing, go through the document and insert U+200B (or a regular space) at the end of every line. With a little practice, you will be able to accurately judge where that will be so you only have to do it once for each line.
  2. Assign the U+200B character to an easy-to-reach key (such as the space bar) and type it between every single syllable that you type. This may work best for documents that do not need a lot of editing.
  3. Write a macro or plugin that goes through your document inserting U+200B wherever a line break would be legal, and tidying up existing U+200B's. Type away, and whenever you get annoyed with the word wrapping, hit the shortcut key for your macro and tidy everything up.
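
The 'tidying up' part of the third workflow might look something like the sketch below. It only cleans up U+200B characters that serve no purpose (doubled ones, and ones next to ordinary whitespace or at the very start or end of the text); it does not decide where new ones belong. `tidy_zwsp` is a made-up name, and hooking it up to a word processor's macro system is left out.

```python
import re

ZWSP = "\u200B"  # ZERO WIDTH SPACE


def tidy_zwsp(text):
    """Remove redundant U+200B characters without touching the useful ones."""
    # Collapse runs of U+200B into a single one.
    text = re.sub("\u200B{2,}", ZWSP, text)
    # Drop U+200B next to ordinary whitespace, where a break is already allowed.
    text = re.sub("\u200B*(\\s+)\u200B*", r"\1", text)
    # Drop U+200B at the very beginning or end of the text.
    return text.strip(ZWSP)
```

Running this before re-inserting U+200B keeps repeated passes from piling up extra characters, which is what makes the 'cursor hangup' problem worse.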



Nathan Miles, 2015/12/11 01:06


I see the following at the beginning of a Sgaw Karen phrase

1016 1032 002C 103D 1064

Is it possible that the comma is really something besides a punctuation mark?

Ben Sharon, 2016/05/23 08:14

Hi Nathan, I can't see how that sequence could be Karen, if it's valid Unicode. Maybe they were using some encoding besides Unicode? Maybe the Zawgyi Karen spin-off?

