Unicode character coding

The UXLC is an XML (eXtensible Markup Language) document containing a Unicode representation of the Leningrad Codex text. The XML markup can be deduced by viewing the files with an ordinary text editor and is not described here. The biblical text is contained in one of three text tags, <w>, <q>, or <k>, corresponding to ordinary, qere, or ketiv orthographic words. Text tags may contain only characters from a limited set of 82 Unicode characters: the code points 0591-05ae, 05b0-05c5, and 05d0-05ea from the Unicode Hebrew block (79 characters), plus 3 special characters, called pseudo accents: 034f (CGJ), 200d (ZWJ), and 0020 (space). The nun hafukha, ׆, doesn't appear in text tags. Instead, it is displayed via the self-closing XML tag <reversednun/>.
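As an illustration of the character set described above, the following Python sketch builds the 82 allowed code points and reports any character in a text tag that falls outside them. The helper names (`disallowed`, `check_fragment`), the `<v>` wrapper element, and the sample fragment are assumptions made for this example; they are not taken from the UXLC markup, which is not described here. The sketch reads only the direct text of each tag and does not handle nested markup.

```python
import xml.etree.ElementTree as ET

# Allowed code points, taken from the ranges listed in the paragraph above.
HEBREW_RANGES = [(0x0591, 0x05AE), (0x05B0, 0x05C5), (0x05D0, 0x05EA)]
PSEUDO_ACCENTS = {0x034F, 0x200D, 0x0020}  # CGJ, ZWJ, space

ALLOWED = {cp for lo, hi in HEBREW_RANGES for cp in range(lo, hi + 1)} | PSEUDO_ACCENTS
TEXT_TAGS = ("w", "q", "k")  # ordinary, qere, ketiv


def disallowed(text):
    """Return the characters of `text` that fall outside the allowed set."""
    return [ch for ch in text if ord(ch) not in ALLOWED]


def check_fragment(xml_source):
    """Yield (tag, text, bad_chars) for each text tag in an XML fragment.

    Only the direct text of each tag is read; nested markup is ignored.
    """
    root = ET.fromstring(xml_source)
    for elem in root.iter():
        if elem.tag in TEXT_TAGS:
            text = elem.text or ""
            yield elem.tag, text, disallowed(text)


# Hypothetical fragment: the <v> wrapper is an assumption, not UXLC markup.
sample = (
    "<v><w>\u05d1\u05b0\u05bc\u05e8\u05b5\u05d0\u05e9\u05b4\u05c1\u0596\u05d9\u05ea</w>"
    "<k>abc</k></v>"
)
for tag, text, bad in check_fragment(sample):
    print(tag, len(text), bad)
```

Building the set from the three ranges, rather than listing characters by hand, makes the count of 82 easy to confirm with `len(ALLOWED)`.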

The three pseudo accents are placed in text tags to improve the display. These special characters are discussed at the bottom of this page.


Character table:

Current Hebrew font: Taamey D Web

Lower combining dots have been added to the 'h' in Unicode names to indicate a Ḥet, ח. They are not part of the formal Unicode names.
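To compare the dotted names used on this page with the formal Unicode names, a short Python sketch using the standard `unicodedata` module may help. The function name `strip_het_dots` is an assumption made for this example. Note that the names in the table are shortened and do not always match the formal names word for word.

```python
import unicodedata

COMBINING_DOT_BELOW = "\u0323"


def strip_het_dots(name):
    """Remove the combining dot below that marks a Het in a displayed name."""
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(ch for ch in decomposed if ch != COMBINING_DOT_BELOW)
    return unicodedata.normalize("NFC", stripped)


# Print the formal Unicode names for a few code points from the table.
for cp in (0x0591, 0x0592, 0x05B6, 0x05D7):
    print(f"{cp:04x}", unicodedata.name(chr(cp)))

print(strip_het_dots("Etna\u1e25ta"))
```

The formal names also show the ACCENT and POINT distinction that separates the two characters called Segol.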

Hex
Char
Unicode name
Hebrew name
Accent type*  UXLC content type  Equivalent  Notes
0591
ס֑
Etnaḥta
אֶתְנַחְתָּ֑א
D-1 Accent      
0592
ס֒
Segol
סֶגוֹל֒
D-2 Accent  Segolta  Postpositive. This is the ACCENT Segol, not to be confused with the vowel (POINT) Segol.
0593
ס֓
Shalshelet
שַׁלְשֶׁ֓לֶת
D-2 Accent      
0594
ס֔
Zaqef Qatan
זָקֵף־קָט֔וֹן
D-2 Accent      
0595
ס֕
Zaqef Gadol
זָקֵף־גָּד֕וֹל
D-2 Accent      
0596
ס֖
Tipeḥa
טִפְּחָ֖א
D-2 Accent  Tarḥa 
0597
ס֗
Revia
רְבִ֗יעַ
D-3 Accent      
0598
ס֘
Zarqa
זַרְקָא֮
D-3 Accent  Tsinorit  This Unicode character is used for a Zarqa appearing directly above a letter, a 'stress helper' Zarqa. A true Zarqa is postpositive and is shown by a Unicode Zinor. (!) The Hebrew name shows a Unicode Zinor.
0599
ס֙
Pashta
פַּשְׁטָא֙
D-3 Accent     Postpositive.
059a
ס֚
Yetiv
יְ‏֚תִיב
D-3 Accent     Prepositive.
059b
ס֛
Tevir
תְּבִ֛יר
D-3 Accent      
059c
ס֜
Geresh
גֶּ֜רֶשׁ
D-4 Accent      
059d
ס֝
Geresh Muqdam
גֵּרֵשׁ־מֻ֝קְדָּם
Accent     Occurs only in the Sifrei Emet. Prepositive.
059e
ס֞
Gershayim
גֵּרְשַׁיִ֞ם
D-4 Accent      
059f
ס֟
Qarney Para
קַרְנֵי־פָרָ֟ה
D-4 Accent  Pazer gadol  
05a0
ס֠
Telisha Gedola
תְּלִישָׁה־גְּ֠דוֹלָ֠ה
D-4 Accent     Prepositive.
05a1
ס֡
Pazer
פָּזֵ֡ר
D-4 Accent      
05a2
ס֢
Atnaḥ Hafukh
אַתְנָח־הָפ֢וּךְ
Sifrei Emet only Accent     The accent should appear as an inverted Etnaḥta.
05a3
ס֣
Munaḥ
מוּנַ֣ח
C Accent     In a word with a Munaḥ ending with a Unicode Paseq, the Unicode Paseq is often interpreted as a Legarmeh accent (D-4).
05a4
ס֤
Mahapakh
מַהְפַּ֤ךְ
C Accent      
05a5
ס֥
Merkha
מֵרְכָ֥א
C Accent  Yored 
05a6
ס֦
Merkha Kefula
מֵרְכָא־כְפוּלָ֦ה
C Accent      
05a7
ס֧
Darga
דַּרְגָ֧א
C Accent      
05a8
ס֨
Qadma
קַדְמָ֨א
C Accent     Easily confused with Pashta, which is postpositive.
05a9
ס֩
Telisha Qetana
תְּלִישָׁה־קְטַנָה֩
C Accent     Postpositive.
05aa
ס֪
Yeraḥ Ben Yomo
יֶרַח־בֶּן־יוֹמ֪וֹ
C Accent  Galgal  The name means "new moon"; the accent should appear as a semi-circular cup. Displayed as a Unicode Atnaḥ Hafukh, x05a2, with most fonts.
05ab
ס֫
Ole
עוֹלֶ֫ה
Sifrei Emet only Accent      
05ac
ס֬
Iluy
עִלּ֬וּי
Sifrei Emet only Accent      
05ad
ס֭
Deḥi
דֶּ֭חִי
Sifrei Emet only Accent     Prepositive.
05ae
ס֮
Zinor
צִינּוֹר֮
D-3 Accent  Zarqa  This character is used for a real (postpositive) Zarqa, as opposed to a 'stress helper' Zarqa, which is implemented as a Unicode Zarqa. The transliterated name should really be Tsinor.
05b0
סְ
Sheva
שְׁוָא
Vowel      
05b1
סֱ
Ḥataf Segol
חֲטַף סֶגוֹל
Vowel      
05b2
סֲ
Ḥataf Pataḥ
חֲטַף פַּתָּח
Vowel      
05b3
סֳ
Ḥataf Qamats
חֲטַף קָמָץ
Vowel      
05b4
סִ
Ḥiriq
חִירִיק
Vowel      
05b5
סֵ
Tsere
צֵירֵי
Vowel      
05b6
סֶ
Segol
סֶגוֹל
Vowel     This is the Vowel (POINT) Segol, not to be confused with the ACCENT Segol.
05b7
סַ
Pataḥ
פַּתָּח
Vowel   Furtive pataḥ is not a distinct character.
05b8
סָ
Qamats
קָמָץ
Vowel      
05b9
וֹ ,סֹ
Ḥolam (dot)
חוֹלָם
Vowel      
05ba
וֺ
Ḥolam ḥaser for vav (dot)
חוֹלָם חָסֵר בְּוָי״ו
Vowel      
05bb
סֻ
Qubuts
קֻבּוּץ
Vowel      
05bc
סּ
Dagesh
דָּגֵשׁ
Vowel  Mapiq, Shuruq dot  Falls within the base letter.
05bd
סֽ
Meteg
מֶתֶג
Accent  Siluq  Often serves as the Hebrew accent Siluq (D-1). A center meteg is preceded by a ZWJ (x200D) for positioning.
05be
ס־
Maqaf
מַקֵּף
Vowel   The UXLC displays this accent in the Vowels content mode.
05bf
סֿ
Rafe
רָפֶה
Accent     Indicates the absence of a dagesh. The UXLC shows only "important" rafes.
05c0
ס ׀
Paseq
פָּסֵק
Accent     This character functions as a Legarmeh (D-4) in some contexts and as a true Paseq in others. Each Unicode Paseq is preceded by a Space (x0020) for positioning.
05c1
שׁ
Shin Dot
Vowel
05c2
שׂ
Sin Dot
Vowel
05c3
ס׃
Sof Pasuq
סוֹף פָּסוּק
Vowel   The UXLC displays this accent in the Vowels content mode.
05c4
סׄ
Upper Dot
Accent
05c5
סׅ
Lower Dot
Accent
05d0
א
Alef
אָלֶף
Consonant      
05d1
ב
Bet
בֵּית
Consonant      
05d2
ג
Gimel
גִימֵל
Consonant      
05d3
ד
Dalet
דָלֶת
Consonant      
05d4
ה
He
הֵא
Consonant      
05d5
ו
Vav
וָו
Consonant      
05d6
ז
Zayin
זַיִן
Consonant      
05d7
ח
Ḥet
חֵית
Consonant      
05d8
ט
Tet
טֵית
Consonant      
05d9
י
Yod
יוֹד
Consonant      
05da
ך
Final Kaf
כַּף סוֹפִית
Consonant      
05db
כ
Kaf
כַּף
Consonant      
05dc
ל
Lamed
לָמֶד
Consonant      
05dd
ם
Final Mem
מֵם סוֹפִית
Consonant      
05de
מ
Mem
מֵם
Consonant      
05df
ן
Final Nun
נוּן סוֹפִית
Consonant      
05e0
נ
Nun
נוּן
Consonant      
05e1
ס
Samekh
סָמֶךְ
Consonant      
05e2
ע
Ayin
עַיִן
Consonant      
05e3
ף
Final Pe
פֵּה סוֹפִית
Consonant      
05e4
פ
Pe
פֵּה
Consonant      
05e5
ץ
Final Tsadi
צַדִי סוֹפִית
Consonant      
05e6
צ
Tsadi
צַדִי
Consonant      
05e7
ק
Qof
קוֹף
Consonant      
05e8
ר
Resh
רֵישׁ
Consonant      
05e9
ש
Shin
שִׁין
Consonant      
05ea
ת
Tav
תָו
Consonant      
Pseudo accents
0020   Space   Accent  Space appears in an XML tag only before a Unicode Paseq.
034f
͏
Combining grapheme joiner (CGJ)   Accent  Improves browser display of Decalogue/Reuven saga, Jerusalem, and leading Metegs.
200d
Zero width joiner (ZWJ)   Accent  Center Meteg is a ZWJ + Meteg.
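The positioning rules stated in the table (a Space appears only before a Unicode Paseq, each Paseq is preceded by a Space, and a center Meteg is a ZWJ followed by a Meteg) can be checked mechanically. The following Python sketch is one possible way to do so; the function names are assumptions made for this example, and no rule is assumed for the CGJ because none is stated here.

```python
SPACE, ZWJ = "\u0020", "\u200d"
METEG, PASEQ = "\u05bd", "\u05c0"


def center_meteg_positions(text):
    """Return the indices of Metegs that are immediately preceded by a ZWJ."""
    return [
        i for i in range(1, len(text))
        if text[i] == METEG and text[i - 1] == ZWJ
    ]


def space_paseq_consistent(text):
    """True if every Space is followed by a Paseq and every Paseq follows a Space."""
    for i, ch in enumerate(text):
        if ch == SPACE and (i + 1 >= len(text) or text[i + 1] != PASEQ):
            return False
        if ch == PASEQ and (i == 0 or text[i - 1] != SPACE):
            return False
    return True


# Hypothetical strings built from code points, not quotations from the text.
print(center_meteg_positions("\u05d0\u200d\u05bd"))
print(space_paseq_consistent("\u05d0\u05a3 \u05c0"))
```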

*  The "Accent type" column gives accent types for accents appearing in Tanach books other than Job, Proverbs, and Psalms. These 3 books are called the Sifrei Emet, where Emet, אמת, is formed from the Hebrew initials of the books: איוב משלי תהלים. The listed accent types, and the comments below, therefore do not apply to the Sifrei Emet.

The accents consist of 8 conjunctives, marked with a 'C', and 18 disjunctives, marked with a 'D-N'. The disjunctive accents have 4 levels, indicated by the number N after the 'D-'. Two disjunctives are not in the table. The level 1 disjunctive Siluq (D-1) is represented graphically by a Meteg. A word having the level 4 disjunctive Legarmeh (fully: munaḥ legarmeh) has two accents, a Munaḥ and a final Unicode Paseq. One disjunctive, Zarqa, occupies two code points, Zarqa or Zinor, depending on positioning.
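The counts above can be cross-checked against the "Accent type" column of the table. The Python sketch below encodes that column as a lookup and adds a test for Legarmeh candidates. The names `ACCENT_TYPE`, `accent_types`, and `is_legarmeh_candidate` are assumptions made for this example. Because a Munaḥ followed by a final Paseq is only often, not always, read as Legarmeh, the check identifies candidates rather than certain cases.

```python
# Accent types for books outside the Sifrei Emet, copied from the table.
ACCENT_TYPE = {0x0591: "D-1"}
ACCENT_TYPE.update({cp: "D-2" for cp in range(0x0592, 0x0597)})
ACCENT_TYPE.update({cp: "D-3" for cp in (*range(0x0597, 0x059C), 0x05AE)})
ACCENT_TYPE.update({cp: "D-4" for cp in (0x059C, 0x059E, 0x059F, 0x05A0, 0x05A1)})
ACCENT_TYPE.update({cp: "C" for cp in range(0x05A3, 0x05AB)})

MUNAH, PASEQ = "\u05a3", "\u05c0"


def accent_types(word):
    """Return the table's accent type for each accent character in `word`."""
    return [ACCENT_TYPE[ord(ch)] for ch in word if ord(ch) in ACCENT_TYPE]


def is_legarmeh_candidate(word):
    """True if `word` has a Munah and ends with a Unicode Paseq."""
    return MUNAH in word and word.endswith(PASEQ)


conjunctives = sum(1 for t in ACCENT_TYPE.values() if t == "C")
disjunctive_code_points = sum(1 for t in ACCENT_TYPE.values() if t.startswith("D-"))
print(conjunctives, disjunctive_code_points)
```

The table holds 17 disjunctive code points. Since Zarqa occupies two of them, that is 16 distinct disjunctives, and adding Siluq and Legarmeh gives the 18 stated above.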


Special characters
