CollectioML Unicode
The Use of Unicode for the Collectio Markup Language

Utilities for Data Input

XML fully supports Unicode, which is especially important for transcriptions of manuscripts and inscriptions.
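As a minimal sketch of what this means in practice, the Python snippet below parses the same word twice with the standard library's `xml.etree.ElementTree`: once as literal UTF-8 text and once as numeric character references in a pure-ASCII file. The `<line>` element and the `parse_text` helper are hypothetical names chosen for illustration; they are not taken from CollectioML.

```python
import xml.etree.ElementTree as ET

# First word of the Iliad sample on this page, spelled out as code points.
WORD = "\u039c\u1fc6\u03bd\u03b9\u03bd"


def parse_text(xml_bytes: bytes) -> str:
    """Return the text content of the root element of a small XML document."""
    return ET.fromstring(xml_bytes).text


# Variant 1: the characters are stored directly, encoded as UTF-8.
utf8_doc = (
    '<?xml version="1.0" encoding="UTF-8"?><line>' + WORD + "</line>"
).encode("utf-8")

# Variant 2: a pure-ASCII file using hexadecimal character references.
ascii_doc = b"<line>&#x39C;&#x1FC6;&#x3BD;&#x3B9;&#x3BD;</line>"

if __name__ == "__main__":
    # Both variants yield the same five-character string after parsing.
    print(parse_text(utf8_doc) == parse_text(ascii_doc))
```

Character references are handy when an editor or transfer channel cannot be trusted to preserve UTF-8, at the cost of a less readable source file.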

Most questions about the use of Unicode, including software for various operating systems, are best answered on Alan Wood's Unicode page. More references, including converters, can be found on the Software-X website.

Of all these programs, we would like to single out Sharmahd Computing's Unipad, which is extremely useful for all writing purposes and highly customizable. We have found an additional keyboard for Polytonic Greek, which we have put up here. We have lost track of the original download location, and unfortunately the original author's name is not included …

The Unicode Inputter is also useful.

Browser Support for Unicode

Apparently, browsers of the Mozilla and Firefox family usually do the best job of rendering Unicode text, because they automatically fall back to a font that contains the required characters, even when another (incomplete) typeface is specified.

Unicode Typefaces

In 95% of all cases, the standard set of typefaces found in most Office packages is sufficient. Most modern typefaces support at least Extended Latin and Basic Greek. The Greek Extended set is found, for example, in Palatino Linotype, which is part of the latest MS Office products, and in Sylfaen and Thorndale, which are part of the OpenOffice.org package. Fonts for Polytonic Greek are discussed on TLG's Unicode test page.

A special mention must go to two more outstanding typefaces:

Samples

Here are small samples in Arabic and Polytonic Greek.

ما هي الشفرة الموحدة "يونِكود" ؟

أساسًا، تتعامل الحواسيب فقط مع الأرقام، وتقوم بتخزين الأحرف والمحارف الأخرى بعد أن تُعطي رقما معينا لكل واحد منها. وقبل اختراع "يونِكود"، كان هناك مئات الأنظمة للتشفير وتخصيص هذه الأرقام للمحارف، ولم يوجد نظام تشفير واحد يحتوي على جميع المحارف الضرورية. وعلى سبيل المثال، فإن الاتحاد الأوروبي لوحده، احتوى العديد من الشفرات المختلفة ليغطي جميع اللغات المستخدمة في الاتحاد. وحتى لو اعتبرنا لغة واحدة، كاللغة الإنجليزية، فإن جدول شفرة واحد لم يكف لاستيعاب جميع الأحرف وعلامات الترقيم والرموز الفنية والعلمية الشائعة الاستعما

ΙΛΙΑΔΟΣ

Α Μῆνιν ἄειδε θεὰ Πηληϊάδεω ᾿Αχιλῆος

οὐλομένην, ἣ μυρί’ ᾿Αχαιοῖς ἄλγε’ ἔθηκε,

πολλὰς δ’ ἰφθίμους ψυχὰς ῎Αϊδι προΐαψεν

ἡρώων, αὐτοὺς δὲ ἑλώρια τεῦχε κύνεσσιν

οἰωνοῖσί τε πᾶσι, Διὸς δ’ ἐτελείετο βουλή,

ἐξ οὗ δὴ τὰ πρῶτα διαστήτην ἐρίσαντε

᾿Ατρεΐδης τε ἄναξ ἀνδρῶν καὶ δῖος ᾿Αχιλλεύς.

Unicode Tables (Excerpt)

As a simple test, we display here a small group of Unicode characters. How completely they are displayed depends on which Unicode fonts are installed on your system (Arial Unicode MS, Gentium, Palatino Linotype, Lucida Sans Unicode, etc.). More of these tables can be found, for example, on Tex Texin's website.

ASCII + Latin 1 Supplement

! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~ ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ

Latin Extended A

Ā ā Ă ă Ą ą Ć ć Ĉ ĉ Ċ ċ Č č Ď ď Đ đ Ē ē Ĕ ĕ Ė ė Ę ę Ě ě Ĝ ĝ Ğ ğ Ġ ġ Ģ ģ Ĥ ĥ Ħ ħ Ĩ ĩ Ī ī Ĭ ĭ Į į İ ı Ĳ ĳ Ĵ ĵ Ķ ķ ĸ Ĺ ĺ Ļ ļ Ľ ľ Ŀ ŀ Ł ł Ń ń Ņ ņ Ň ň ŉ Ŋ ŋ Ō ō Ŏ ŏ Ő ő Œ œ Ŕ ŕ Ŗ ŗ Ř ř Ś ś Ŝ ŝ Ş ş Š š Ţ ţ Ť ť Ŧ ŧ Ũ ũ Ū ū Ŭ ŭ Ů ů Ű ű Ų ų Ŵ ŵ Ŷ ŷ Ÿ Ź ź Ż ż Ž ž ſ

Latin Extended B

ƀ Ɓ Ƃ ƃ Ƅ ƅ Ɔ Ƈ ƈ Ɖ Ɗ Ƌ ƌ ƍ Ǝ Ə Ɛ Ƒ ƒ Ɠ Ɣ ƕ Ɩ Ɨ Ƙ ƙ ƚ ƛ Ɯ Ɲ ƞ Ɵ Ơ ơ Ƣ ƣ Ƥ ƥ Ʀ Ƨ ƨ Ʃ ƪ ƫ Ƭ ƭ Ʈ Ư ư Ʊ Ʋ Ƴ ƴ Ƶ ƶ Ʒ Ƹ ƹ ƺ ƻ Ƽ ƽ ƾ ƿ ǀ ǁ ǂ ǃ Ǆ ǅ ǆ Ǉ ǈ ǉ Ǌ ǋ ǌ Ǎ ǎ Ǐ ǐ Ǒ ǒ Ǔ ǔ Ǖ ǖ Ǘ ǘ Ǚ ǚ Ǜ ǜ ǝ Ǟ ǟ Ǡ ǡ Ǣ ǣ Ǥ ǥ Ǧ ǧ Ǩ ǩ Ǫ ǫ Ǭ ǭ Ǯ ǯ ǰ Ǳ ǲ ǳ Ǵ ǵ Ƕ Ƿ Ǹ ǹ Ǻ ǻ Ǽ ǽ Ǿ ǿ Ȁ ȁ Ȃ ȃ Ȅ ȅ Ȇ ȇ Ȉ ȉ Ȋ ȋ Ȍ ȍ Ȏ ȏ Ȑ ȑ Ȓ ȓ Ȕ ȕ Ȗ ȗ Ș ș Ț ț Ȝ ȝ Ȟ ȟ Ƞ Ȣ ȣ Ȥ ȥ Ȧ ȧ Ȩ ȩ Ȫ ȫ Ȭ ȭ Ȯ ȯ Ȱ ȱ Ȳ ȳ

Latin Extended Additional

Ḁ ḁ Ḃ ḃ Ḅ ḅ Ḇ ḇ Ḉ ḉ Ḋ ḋ Ḍ ḍ Ḏ ḏ Ḑ ḑ Ḓ ḓ Ḕ ḕ Ḗ ḗ Ḙ ḙ Ḛ ḛ Ḝ ḝ Ḟ ḟ Ḡ ḡ Ḣ ḣ Ḥ ḥ Ḧ ḧ Ḩ ḩ Ḫ ḫ Ḭ ḭ Ḯ ḯ Ḱ ḱ Ḳ ḳ Ḵ ḵ Ḷ ḷ Ḹ ḹ Ḻ ḻ Ḽ ḽ Ḿ ḿ Ṁ ṁ Ṃ ṃ Ṅ ṅ Ṇ ṇ Ṉ ṉ Ṋ ṋ Ṍ ṍ Ṏ ṏ Ṑ ṑ Ṓ ṓ Ṕ ṕ Ṗ ṗ Ṙ ṙ Ṛ ṛ Ṝ ṝ Ṟ ṟ Ṡ ṡ Ṣ ṣ Ṥ ṥ Ṧ ṧ Ṩ ṩ Ṫ ṫ Ṭ ṭ Ṯ ṯ Ṱ ṱ Ṳ ṳ Ṵ ṵ Ṷ ṷ Ṹ ṹ Ṻ ṻ Ṽ ṽ Ṿ ṿ Ẁ ẁ Ẃ ẃ Ẅ ẅ Ẇ ẇ Ẉ ẉ Ẋ ẋ Ẍ ẍ Ẏ ẏ Ẑ ẑ Ẓ ẓ Ẕ ẕ ẖ ẗ ẘ ẙ ẚ ẛ Ạ ạ Ả ả Ấ ấ Ầ ầ Ẩ ẩ Ẫ ẫ Ậ ậ Ắ ắ Ằ ằ Ẳ ẳ Ẵ ẵ Ặ ặ Ẹ ẹ Ẻ ẻ Ẽ ẽ Ế ế Ề ề Ể ể Ễ ễ Ệ ệ Ỉ ỉ Ị ị Ọ ọ Ỏ ỏ Ố ố Ồ ồ Ổ ổ Ỗ ỗ Ộ ộ Ớ ớ Ờ ờ Ở ở Ỡ ỡ Ợ ợ Ụ ụ Ủ ủ Ứ ứ Ừ ừ Ử ử Ữ ữ Ự ự Ỳ ỳ Ỵ ỵ Ỷ ỷ Ỹ ỹ

Greek and Coptic

ʹ ͵ ͺ ; ΄ ΅ Ά · Έ Ή Ί Ό Ύ Ώ ΐ Α Β Γ Δ Ε Ζ Η Θ Ι Κ Λ Μ Ν Ξ Ο Π Ρ Σ Τ Υ Φ Χ Ψ Ω Ϊ Ϋ ά έ ή ί ΰ α β γ δ ε ζ η θ ι κ λ μ ν ξ ο π ρ ς σ τ υ φ χ ψ ω ϊ ϋ ό ύ ώ ϐ ϑ ϒ ϓ ϔ ϕ ϖ Ϙ ϙ ϗ Ϛ ϛ Ϝ ϝ Ϟ ϟ Ϡ ϡ Ϣ ϣ Ϥ ϥ Ϧ ϧ Ϩ ϩ Ϫ ϫ Ϭ ϭ Ϯ ϯ ϰ ϱ ϲ ϳ ϴ ϵ ϶

Greek Extended

ἀ ἁ ἂ ἃ ἄ ἅ ἆ ἇ Ἀ Ἁ Ἂ Ἃ Ἄ Ἅ Ἆ Ἇ ἐ ἑ ἒ ἓ ἔ ἕ Ἐ Ἑ Ἒ Ἓ Ἔ Ἕ ἠ ἡ ἢ ἣ ἤ ἥ ἦ ἧ Ἠ Ἡ Ἢ Ἣ Ἤ Ἥ Ἦ Ἧ ἰ ἱ ἲ ἳ ἴ ἵ ἶ ἷ Ἰ Ἱ Ἲ Ἳ Ἴ Ἵ Ἶ Ἷ ὀ ὁ ὂ ὃ ὄ ὅ Ὀ Ὁ Ὂ Ὃ Ὄ Ὅ ὐ ὑ ὒ ὓ ὔ ὕ ὖ ὗ Ὑ Ὓ Ὕ Ὗ ὠ ὡ ὢ ὣ ὤ ὥ ὦ ὧ Ὠ Ὡ Ὢ Ὣ Ὤ Ὥ Ὦ Ὧ ὰ ά ὲ έ ὴ ή ὶ ί ὸ ό ὺ ύ ὼ ώ ᾀ ᾁ ᾂ ᾃ ᾄ ᾅ ᾆ ᾇ ᾈ ᾉ ᾊ ᾋ ᾌ ᾍ ᾎ ᾏ ᾐ ᾑ ᾒ ᾓ ᾔ ᾕ ᾖ ᾗ ᾘ ᾙ ᾚ ᾛ ᾜ ᾝ ᾞ ᾟ ᾠ ᾡ ᾢ ᾣ ᾤ ᾥ ᾦ ᾧ ᾨ ᾩ ᾪ ᾫ ᾬ ᾭ ᾮ ᾯ ᾰ ᾱ ᾲ ᾳ ᾴ ᾶ ᾷ Ᾰ Ᾱ Ὰ Ά ᾼ ᾽ ι ᾿ ῀ ῁ ῂ ῃ ῄ ῆ ῇ Ὲ Έ Ὴ Ή ῌ ῍ ῎ ῏ ῐ ῑ ῒ ΐ ῖ ῗ Ῐ Ῑ Ὶ Ί ῝ ῞ ῟ ῠ ῡ ῢ ΰ ῤ ῥ ῦ ῧ Ῠ Ῡ Ὺ Ύ Ῥ ῭ ΅ ` ῲ ῳ ῴ ῶ ῷ Ὸ Ό Ὼ Ώ ῼ ´ ῾

Runic

ᚠ ᚡ ᚢ ᚣ ᚤ ᚥ ᚦ ᚧ ᚨ ᚩ ᚪ ᚫ ᚬ ᚭ ᚮ ᚯ ᚰ ᚱ ᚲ ᚳ ᚴ ᚵ ᚶ ᚷ ᚸ ᚹ ᚺ ᚻ ᚼ ᚽ ᚾ ᚿ ᛀ ᛁ ᛂ ᛃ ᛄ ᛅ ᛆ ᛇ ᛈ ᛉ ᛊ ᛋ ᛌ ᛍ ᛎ ᛏ ᛐ ᛑ ᛒ ᛓ ᛔ ᛕ ᛖ ᛗ ᛘ ᛙ ᛚ ᛛ ᛜ ᛝ ᛞ ᛟ ᛠ ᛡ ᛢ ᛣ ᛤ ᛥ ᛦ ᛧ ᛨ ᛩ ᛪ ᛫ ᛬ ᛭ ᛮ ᛯ ᛰ

Combining Diacritical Marks

̀ ́ ̂ ̃ ̄ ̅ ̆ ̇ ̈ ̉ ̊ ̋ ̌ ̍ ̎ ̏ ̐ ̑ ̒ ̓ ̔ ̕ ̖ ̗ ̘ ̙ ̚ ̛ ̜ ̝ ̞ ̠ ̡ ̢ ̣
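Excerpts like the ones above can be generated rather than typed. The following is a rough sketch using Python's standard `unicodedata` module; the helper name `block_chars` and the hard-coded block ranges are assumptions made for this illustration. Code points that have no name in the Unicode database of the running Python version are skipped, which is why unassigned positions such as U+1F16 in Greek Extended leave no gap in the output.

```python
import unicodedata


def block_chars(first: int, last: int) -> list[str]:
    """Return the named (assigned) characters in the range first..last inclusive."""
    chars = []
    for code_point in range(first, last + 1):
        ch = chr(code_point)
        # With a default argument, name() returns None for unnamed code points
        # instead of raising ValueError.
        if unicodedata.name(ch, None) is not None:
            chars.append(ch)
    return chars


if __name__ == "__main__":
    # Block boundaries as published in the Unicode code charts.
    print(" ".join(block_chars(0x16A0, 0x16FF)))  # Runic
    print(" ".join(block_chars(0x1F00, 0x1FFF)))  # Greek Extended
```

The exact number of characters printed depends on the Unicode version bundled with the Python interpreter, so a newer interpreter may list more characters than the excerpts on this page.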

Characters for Epigraphic Use

The following table lists characters used in epigraphic studies; you can print it out.

Alternatively, we have prepared a two-page flyer with the most frequently used epigraphic signs.

David J. Perry discusses issues for scholars of Classical languages, including the Greek characters newly proposed by the Thesaurus Linguae Graecae.

Table 1. Unicode Characters for Epigraphic Use

Character Hexadecimal Name
〚 U+301A LEFT WHITE SQUARE BRACKET
〛 U+301B RIGHT WHITE SQUARE BRACKET
‖ U+2016 DOUBLE VERTICAL LINE
| U+007C VERTICAL LINE
+ U+002B PLUS SIGN
⊂ U+2282 SUBSET OF
⊃ U+2283 SUPERSET OF
☧ U+2627 CHI RHO
✠ U+2720 MALTESE CROSS
※ U+203B REFERENCE MARK
Ϝ U+03DC GREEK LETTER DIGAMMA
ϲ U+03F2 GREEK LUNATE SIGMA SYMBOL
Ϡ U+03E0 GREEK LETTER SAMPI
Ϟ U+03DE GREEK LETTER KOPPA
Ϙ U+03D8 GREEK LETTER ARCHAIC KOPPA
Ϛ U+03DA GREEK LETTER STIGMA
▲ U+25B2 BLACK UP-POINTING TRIANGLE
▴ U+25B4 BLACK UP-POINTING SMALL TRIANGLE
▼ U+25BC BLACK DOWN-POINTING TRIANGLE
▾ U+25BE BLACK DOWN-POINTING SMALL TRIANGLE
⋮ U+22EE VERTICAL ELLIPSIS
〈 U+3008 LEFT ANGLE BRACKET
〉 U+3009 RIGHT ANGLE BRACKET
《 U+300A LEFT DOUBLE ANGLE BRACKET
》 U+300B RIGHT DOUBLE ANGLE BRACKET
〈 U+2329 LEFT-POINTING ANGLE BRACKET
〉 U+232A RIGHT-POINTING ANGLE BRACKET
⌜o U+231C TOP LEFT CORNER
o⌝ U+231D TOP RIGHT CORNER
′o U+2032 PRIME
o‵ U+2035 REVERSED PRIME
∘ U+2218 RING OPERATOR
∙ U+2219 BULLET OPERATOR
Ⅰ U+2160 ROMAN NUMERAL ONE
Ⅴ U+2164 ROMAN NUMERAL FIVE
Ⅹ U+2169 ROMAN NUMERAL TEN
Ⅼ U+216C ROMAN NUMERAL FIFTY
Ⅽ U+216D ROMAN NUMERAL ONE HUNDRED
Ⅾ U+216E ROMAN NUMERAL FIVE HUNDRED
Ⅿ U+216F ROMAN NUMERAL ONE THOUSAND
ↀ U+2180 ROMAN NUMERAL ONE THOUSAND C D
ↁ U+2181 ROMAN NUMERAL FIVE THOUSAND
ↂ U+2182 ROMAN NUMERAL TEN THOUSAND
o̥ U+0325 COMBINING RING BELOW
ō U+0304 COMBINING MACRON
o̅ U+0305 COMBINING OVERLINE
ó U+0301 COMBINING ACUTE ACCENT
ó U+0341 COMBINING ACUTE TONE MARK
ô U+0302 COMBINING CIRCUMFLEX ACCENT
o̵ U+0335 COMBINING SHORT STROKE OVERLAY
o̶ U+0336 COMBINING LONG STROKE OVERLAY
{ U+007B LEFT CURLY BRACKET
} U+007D RIGHT CURLY BRACKET
⁰ U+2070 SUPERSCRIPT ZERO
¹ U+00B9 SUPERSCRIPT ONE
² U+00B2 SUPERSCRIPT TWO
³ U+00B3 SUPERSCRIPT THREE
⁴ U+2074 SUPERSCRIPT FOUR
⁵ U+2075 SUPERSCRIPT FIVE
⁶ U+2076 SUPERSCRIPT SIX
⁷ U+2077 SUPERSCRIPT SEVEN
⁸ U+2078 SUPERSCRIPT EIGHT
⁹ U+2079 SUPERSCRIPT NINE
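
The entries of Table 1 can be cross-checked mechanically. The sketch below uses Python's standard `unicodedata` module to rebuild a row (character, code point, name) from the code point alone, and shows how a combining mark from the table is attached by writing it directly after its base letter. The function names `table_row` and `with_mark` are hypothetical, and the base letter `o` simply follows the convention used in the table.

```python
import unicodedata


def table_row(code_point: int) -> str:
    """Format one row as 'character U+XXXX NAME'."""
    ch = chr(code_point)
    return f"{ch} U+{code_point:04X} {unicodedata.name(ch)}"


def with_mark(base: str, mark: int) -> str:
    """Attach a combining mark by placing it directly after the base letter."""
    return base + chr(mark)


if __name__ == "__main__":
    for cp in (0x301A, 0x2627, 0x03DC, 0x2180):
        print(table_row(cp))
    # A letter with COMBINING RING BELOW (U+0325) is two code points long,
    # although it is displayed as a single sign.
    ringed = with_mark("o", 0x0325)
    print(ringed, len(ringed))
```

Because the combining mark is a separate code point, search and comparison routines may need to treat the base letter and the marked letter as related; how to do that is a design decision for the individual project.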