


ANSI (Single-byte), Multibyte and Unicode strings

 

Computers map characters to numerical (usually integer) codes. ANSI (single-byte) strings are stored as sequences of unsigned bytes, one byte per character. ANSI strings are often null-terminated: the last character in the string is a null character (code 0), and the position of this terminator is commonly used to determine the length of the string. Because the ANSI standard uses a single byte per character, it is limited to at most 256 different characters. This is sufficient for English, but it is a problem for languages whose writing systems have more than 256 characters, such as Chinese and Japanese.
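As a rough illustration, the sketch below finds the length of a null-terminated single-byte string the way C's `strlen` does, by counting bytes up to the first null byte, and shows where the 256-character limit comes from. Python is used here only for illustration; the function name `c_strlen` is hypothetical and not part of f90VB.

```python
def c_strlen(buf: bytes) -> int:
    """Return the length of a null-terminated single-byte string.

    Mimics C's strlen: count bytes until the first NUL (0x00) byte.
    Raises IndexError if the buffer contains no terminator.
    """
    length = 0
    while buf[length] != 0:  # indexing a bytes object yields an int 0..255
        length += 1
    return length


# One byte has 8 bits, so it can encode at most 2**8 distinct character codes.
MAX_SINGLE_BYTE_CODES = 2 ** 8

# "Hello" followed by a terminator; bytes after the terminator are ignored.
buffer = b"Hello\x00garbage"
print(c_strlen(buffer))
print(MAX_SINGLE_BYTE_CODES)
```

Note that the length is a property of where the terminator sits, not of the buffer size: the buffer above holds 13 bytes, but the string is 5 characters long.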

 

Multibyte strings were introduced to overcome the 256-character limit of single-byte strings. In the double-byte character sets commonly used on Windows, a character is represented by either one or two bytes, so a multibyte string may contain a mix of single-byte and double-byte characters. A double-byte character consists of a lead byte followed by a trail byte. In a particular multibyte character set, the lead bytes fall within a certain range, as do the trail bytes. When these ranges overlap, the context must be evaluated to determine whether a given byte is acting as a lead byte or a trail byte. Although Windows still supports multibyte characters, they are cumbersome to work with, and on 32-bit Windows they have been superseded by Unicode strings.
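The sketch below counts the characters in a double-byte string by scanning it from the start and treating each lead byte as consuming the byte after it. It assumes the lead-byte ranges of Shift_JIS (Windows code page 932, 0x81 to 0x9F and 0xE0 to 0xFC) purely as an example character set; the helper names are hypothetical. In Shift_JIS the full-width letter "ａ" is stored as 0x82 0x81, so its trail byte 0x81 also falls in the lead-byte range, which is exactly the overlap that forces context-dependent parsing.

```python
# Lead-byte ranges of Shift_JIS (code page 932), used only as an example DBCS.
LEAD_RANGES = ((0x81, 0x9F), (0xE0, 0xFC))


def is_lead_byte(b: int) -> bool:
    """Return True if byte value b falls in one of the lead-byte ranges."""
    return any(lo <= b <= hi for lo, hi in LEAD_RANGES)


def count_chars(mbs: bytes) -> int:
    """Count characters in a double-byte-character-set string.

    The scan must start at the beginning of the string: a lead byte takes the
    following byte as its trail byte, whatever range that byte falls in.
    """
    count = 0
    i = 0
    while i < len(mbs):
        if is_lead_byte(mbs[i]):
            i += 2  # lead byte + trail byte form one character
        else:
            i += 1  # single-byte character
        count += 1
    return count


# "A", Hiragana "あ" (0x82 0xA0) and "B" in Shift_JIS: 4 bytes, 3 characters.
print(count_chars(b"A\x82\xa0B"))
# Full-width "ａ" (0x82 0x81): the trail byte 0x81 is also a valid lead byte.
print(count_chars(b"\x82\x81"), is_lead_byte(0x81))
```

Because the meaning of a byte depends on what precedes it, a program cannot safely start reading in the middle of such a string, which is one reason multibyte strings are awkward to work with.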

 

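Unicode is the standard used natively by Windows NT; Windows 95/98 support it only partially. As implemented there, each Unicode character occupies two bytes (16 bits), regardless of whether it needs all 16 bits. For this reason, Unicode characters are also known as wide characters. (Characters outside the original 16-bit range were later added to Unicode; in UTF-16 they are stored as pairs of 16-bit units called surrogate pairs.) Wide characters generally take more memory than multibyte characters, but they are faster to process: because every character has the same width, the n-th character can be located directly, without scanning for lead bytes. In addition, a multibyte string can represent only one locale (code page) at a time, whereas Unicode can represent the characters of all the world's character sets simultaneously.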

 

f90VB's BStrings library can handle single-byte, multibyte and Unicode strings. Single-byte and multibyte strings are stored as normal Fortran strings. Unicode strings can be stored in three types of string: Fixed OLE Strings, Fortran OLE Strings and BStrings.

 

Fortran strings