Strings in Tcl are encoded using UTF-8 character sequences. Different operating system interfaces or applications may generate strings in other encodings such as Shift-JIS. The encoding command helps to bridge the gap between Tcl strings and these other formats.
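As a minimal sketch of that bridging role, the following round trip converts a Tcl string to Shift-JIS bytes and back. It assumes the shiftjis encoding is available in your installation (check the output of encoding names); the variable names are only illustrative.

```tcl
# Assumption: the "shiftjis" encoding is installed (see [encoding names]).
set text "\u306F"                                 ;# Hiragana letter HA

# Tcl string -> external bytes in Shift-JIS
set bytes [encoding convertto shiftjis $text]
puts [string length $text]                        ;# 1 character
puts [string length $bytes]                       ;# 2 bytes in Shift-JIS

# External bytes -> Tcl string
set back [encoding convertfrom shiftjis $bytes]
puts [expr {$back eq $text}]                      ;# 1 when the round trip is lossless
```

The result of encoding convertto is best treated as binary data, for example when writing to a channel configured with -translation binary.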
Performs one of several encoding-related operations, depending on option. The legal options are:
set text "caf\u00e9" ;# café
set unitext [encoding convertto unicode $text]
set imax [string length $unitext] ;# do not use string bytelength
for {set i 0} {$i < $imax} {incr i} {
    set ch [string index $unitext $i]
    scan $ch %c ich
    lappend icodes $ich
}
puts $icodes
# prints: 99 0 97 0 102 0 233 0   (on littleEndian platforms)
# prints: 0 99 0 97 0 102 0 233   (on bigEndian platforms)
The unicode encoding produces a sequence of two-byte integers in your platform's native byte order, either big-endian or little-endian. The tcl_platform(byteOrder) global variable reports which order is used.
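One way to recover the character codes regardless of byte order is to let tcl_platform(byteOrder) select the binary scan format. This is a sketch only: it assumes Tcl 8.5 or later (for the u unsigned modifier) and text limited to characters that fit in a single two-byte unit.

```tcl
set text "caf\u00e9"
set bytes [encoding convertto unicode $text]

# Pick the 16-bit format that matches the platform's byte order:
# s = little-endian, S = big-endian; u = treat values as unsigned.
if {$tcl_platform(byteOrder) eq "littleEndian"} {
    binary scan $bytes su* units
} else {
    binary scan $bytes Su* units
}
puts $units   ;# 99 97 102 233
```

Decoding whole two-byte units this way gives the same list on both kinds of platform, unlike the byte-by-byte listing.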
It is common practice to write script files using a text editor that produces output in the euc-jp encoding, which represents the ASCII characters as single bytes and Japanese characters as two bytes. This makes it easy to embed literal strings that correspond to non-ASCII characters by simply typing the strings in place in the script. However, because the source command always reads files using the ISO8859-1 encoding, Tcl will treat each byte in the file as a separate character that maps to the 00 page in Unicode. The resulting Tcl strings will not contain the expected Japanese characters. Instead, they will contain a sequence of Latin-1 characters that correspond to the bytes of the original string. The encoding command can be used to convert this string to the expected Japanese Unicode characters. For example,
set s [encoding convertfrom euc-jp "\xA4\xCF"]
would return a string equivalent to "\u306F", which is the Hiragana letter HA.
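The effect described above can be checked with a short sketch. It assumes the euc-jp encoding is installed; the variable raw is a hypothetical stand-in for text that was read one byte per character.

```tcl
# Two Latin-1 characters (U+00A4, U+00CF), as they would appear if the
# euc-jp bytes A4 CF had been read as ISO8859-1.
set raw "\xA4\xCF"
puts [string length $raw]          ;# 2

# Reinterpret those bytes as euc-jp to recover the intended character.
set s [encoding convertfrom euc-jp $raw]
puts [string length $s]            ;# 1
puts [expr {$s eq "\u306F"}]       ;# 1
```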