Unicode Codes

4D v20 R7

In 4D, both the language and the database engine store and process Unicode characters natively, which facilitates the internationalization of 4D applications. Unicode is a standard, unified character set that covers practically every written language in common use. A character set is a table that maps characters to number values, for example “a”->97, “b”->98, “5”->53, “œ”->339, and so on. Whereas ASCII values range from 0 to 127, Unicode values range up to 1,114,111 (hexadecimal 10FFFF), so nearly every character of every language can be represented.
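As a language-neutral illustration (Python, not 4D code), the character/number correspondence can be looked up directly. The names `samples`, `codes` and `MAX_CODE_POINT` are chosen here for the example only.

```python
# Look up the Unicode number value of a few characters.
samples = ["a", "b", "5", "\u0153"]  # "\u0153" is the ligature "œ"
codes = {ch: ord(ch) for ch in samples}
print(codes)  # {'a': 97, 'b': 98, '5': 53, 'œ': 339}

# The reverse lookup: from number value back to character.
print(chr(97))  # a

# Highest number value in the Unicode code space (hexadecimal 10FFFF).
MAX_CODE_POINT = 0x10FFFF
print(MAX_CODE_POINT)  # 1114111
```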

There are several ways to encode Unicode number values: UTF-16 uses one or two 16-bit integers per character, UTF-32 uses a single 32-bit integer, and UTF-8 uses one to four 8-bit integers. 4D mainly uses UTF-16 (like Windows and macOS). In some cases, mainly for Internet-related needs, 4D uses UTF-8, which is more compact for text made up mostly of ASCII characters and keeps those characters (a-z, 0-9, and so on) as single bytes identical to their ASCII values, so they remain directly readable.
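The difference in size between the encodings can be sketched as follows. This is a Python illustration rather than 4D code; `encoded_sizes` is a hypothetical helper, and the byte counts exclude any byte order mark.

```python
def encoded_sizes(text: str) -> dict:
    """Return the number of bytes `text` occupies in each encoding (no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }

# Plain a-z / 0-9 text: UTF-8 needs one byte per character.
print(encoded_sizes("abc123"))      # {'utf-8': 6, 'utf-16': 12, 'utf-32': 24}

# An accented Latin letter (U+00E9): two bytes in both UTF-8 and UTF-16.
print(encoded_sizes("\u00e9"))      # {'utf-8': 2, 'utf-16': 2, 'utf-32': 4}

# A character above 65,535 (U+1F600): two 16-bit units in UTF-16.
print(encoded_sizes("\U0001F600"))  # {'utf-8': 4, 'utf-16': 4, 'utf-32': 4}

# ASCII characters keep their ASCII byte values in UTF-8.
print("abc123".encode("utf-8"))     # b'abc123'
```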

Note: 4D internally uses UTF-16 encoding to store text in fields and variables. Unless mentioned specifically, a character, position or length handled through the programming language always refers to values in UTF-16.
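To illustrate what a length expressed in UTF-16 values means, the sketch below counts 16-bit units in Python (not 4D code). `utf16_length` is a hypothetical helper; how a given 4D command counts should be checked against that command's own documentation.

```python
def utf16_length(text: str) -> int:
    """Count the 16-bit UTF-16 units needed to store `text`."""
    # Each UTF-16 unit is 2 bytes; "utf-16-le" adds no byte order mark.
    return len(text.encode("utf-16-le")) // 2

print(utf16_length("4D"))          # 2
print(utf16_length("\u00e9"))      # 1

# A character above 65,535 is stored as two 16-bit units (a surrogate pair),
# so its UTF-16 length is 2 even though it is a single Unicode character.
print(utf16_length("\U0001F600"))  # 2
print(len("\U0001F600"))           # 1 (Python counts Unicode characters)
```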

For more information about the Unicode standard, please refer, for example, to the following page:
http://en.wikipedia.org/wiki/Unicode

A list of Unicode codes:
http://en.wikipedia.org/wiki/List_of_Unicode_characters

Warning: The following character codes are reserved and must never be included in a text:
0
65534 (FFFE)
65535 (FFFF)
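Text coming from an external source could be checked for these codes before it is stored. The following is a minimal sketch in Python, not 4D code; `RESERVED_CODES`, `find_reserved` and `strip_reserved` are hypothetical names, and positions are 0-based.

```python
# The three codes listed in the warning above.
RESERVED_CODES = {0x0000, 0xFFFE, 0xFFFF}

def find_reserved(text: str) -> list:
    """Return (position, code) pairs for every reserved code found in `text`."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if ord(ch) in RESERVED_CODES]

def strip_reserved(text: str) -> str:
    """Return `text` with every reserved code removed."""
    return "".join(ch for ch in text if ord(ch) not in RESERVED_CODES)

raw = "ab\x00c\uffff"
print(find_reserved(raw))   # [(2, 0), (4, 65535)]
print(strip_reserved(raw))  # abc
```

Whether to reject such input or silently strip it is a design choice; reporting the positions first makes either option possible.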