Unicode
- Unicode
- An internationalised character set. cf. ASCII, BOM.
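If viewed in an ASCII text editor, a Unicode file saved in the UTF-16 encoding will typically appear as though only every other character is a letter, with a null byte between each pair. This is because UTF-16 uses two bytes for most characters (four for characters outside the Basic Multilingual Plane), compared with ASCII's one byte per character. Other Unicode encodings differ: UTF-8, for example, uses one to four bytes per character and stores ASCII characters as the same single bytes that ASCII does.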
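The effect described above can be illustrated with a short Python sketch. It assumes the text is stored as UTF-16 little-endian without a BOM, which is one common two-byte encoding of Unicode; the helper name `as_ascii_editor` is hypothetical and only imitates how a simple single-byte viewer might display the data.

```python
text = "Hi"

# The same two characters in three encodings.
ascii_bytes = text.encode("ascii")      # one byte per character
utf16_bytes = text.encode("utf-16-le")  # two bytes per character here, no BOM
utf8_bytes = text.encode("utf-8")       # identical to ASCII for these characters


def as_ascii_editor(data: bytes) -> str:
    """Hypothetical helper: show printable ASCII bytes as-is, all others as '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


print(ascii_bytes)                   # b'Hi'
print(utf16_bytes)                   # b'H\x00i\x00'
print(as_ascii_editor(utf16_bytes))  # H.i.
```

Each letter is followed by a zero byte in the UTF-16 output, which is why only every other character looks like a letter in an ASCII editor.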
Unicode characters are written as "U+xxxx", where "xxxx" is the hexadecimal code point that identifies the character (at least four hex digits, and up to six for higher code points). So U+0041 is the letter "A", which (by design) has the same value as the ASCII code for "A".
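As a rough illustration of the "U+xxxx" notation, the following Python sketch converts between characters and their written code points. The function names `to_notation` and `from_notation` are hypothetical, chosen for this example only.

```python
def to_notation(ch: str) -> str:
    """Format a single character as U+ followed by at least four hex digits."""
    return f"U+{ord(ch):04X}"


def from_notation(notation: str) -> str:
    """Turn a string such as 'U+0041' back into the character it names."""
    if not notation.upper().startswith("U+"):
        raise ValueError(f"not in U+xxxx form: {notation!r}")
    return chr(int(notation[2:], 16))


print(to_notation("A"))         # U+0041
print(from_notation("U+0041"))  # A

# The code point of "A" matches its ASCII code (0x41, decimal 65).
print(ord("A") == "A".encode("ascii")[0])  # True
```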
For more information see:
- www.unicode.org - Unicode Home Page, providing information about and resources relating to Unicode.
- www.unicode.org/charts - Unicode character code charts by script.
- www.sql-und-xml.de/unicode-database - Unicode characters that can be used in HTML/XML, ordered by block, category and other properties. (Site in German, but the tables should be intelligible whatever your language.)
- www.joelonsoftware.com/articles/Unicode.html - Article aimed at developers giving the rationale behind Unicode and how it relates to other character sets and encodings.
- www.jbrowse.com/text/ - Unicode terms, FAQs and mistakes.
- http://everything.explained.at/unicode/ - Unicode explained.