Legacy Encodings support in Opera Presto 2.10

Although Opera Presto works with the Unicode character set and its UTF-16 and UTF-8 character encodings, most text on the Internet is encoded in legacy encodings, for instance:

  • ISO 8859-1
  • Windows-1251
  • Shift_JIS (MIME name)
  • EUC-KR
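
As a rough illustration of why the encoding has to be known before text can be displayed, the Python sketch below decodes the same single byte under several of the encodings listed above. The codec names are Python's own labels, not names taken from Opera Presto, and the helper `decode_all` is hypothetical.

```python
def decode_all(raw, encodings):
    """Decode the same bytes under each encoding; None marks a decoding failure."""
    results = {}
    for name in encodings:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # For multi-byte encodings a lone lead byte is not valid text.
            results[name] = None
    return results


sample = b"\xe4"
results = decode_all(sample, ["iso8859_1", "cp1251", "shift_jis", "euc_kr"])
for name, text in results.items():
    print(name, repr(text))
```

The single-byte encodings each map the byte to a different character, while the multi-byte encodings reject it as an incomplete sequence.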

Opera Presto handles this by detecting the character encoding used and converting the text to UTF-16. The user has three options for how to handle these pages.

  • Auto-detect: In this mode Opera Presto will attempt to detect the encoding used by the page.
    • If the transport protocol provides an encoding name, then that is used.
    • If not, Opera Presto will look at the page for a charset declaration.
    • If this is missing, Opera Presto will attempt to auto-detect the encoding, using the domain name to see if the script is a CJK script, and if so, which one.
    • Opera Presto can also auto-detect UTF-8.
  • Writing script auto-detect: In this mode the user can tell Opera Presto that the page is, for example, Japanese or Chinese, but that the encoding is unknown. Opera Presto will then analyze the text in the page to determine which encoding is used.
  • Encoding override: In this mode the user selects an encoding. This encoding will be used by Opera Presto, regardless of what the page and transport protocol claim the encoding is.
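
The order of precedence described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not Opera Presto's implementation: the function names, the domain-to-encoding table, the default encoding, the simplified `<meta>` pattern, and the choice to test for UTF-8 before looking at the domain name are all hypothetical.

```python
import re

# Hypothetical fallback table: top-level domain -> likely CJK encoding.
# The real table used by Opera Presto is not given in the text.
TLD_FALLBACK = {"jp": "shift_jis", "kr": "euc_kr", "cn": "gbk", "tw": "big5"}
DEFAULT_ENCODING = "windows-1252"  # assumed default when nothing else matches

# Simplified pattern for a charset declaration inside the page.
META_CHARSET = re.compile(
    rb"""<meta[^>]+charset\s*=\s*["']?([A-Za-z0-9._:-]+)""", re.I
)


def is_valid_utf8(raw):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def choose_encoding(raw, http_charset=None, hostname="", override=None):
    """Pick an encoding name using the precedence described in the list."""
    if override:                      # encoding override always wins
        return override
    if http_charset:                  # name supplied by the transport protocol
        return http_charset
    match = META_CHARSET.search(raw[:1024])
    if match:                         # charset declaration in the page
        return match.group(1).decode("ascii")
    if is_valid_utf8(raw):            # UTF-8 auto-detection
        return "utf-8"
    tld = hostname.rsplit(".", 1)[-1].lower()
    return TLD_FALLBACK.get(tld, DEFAULT_ENCODING)


def to_utf16(raw, encoding):
    """Decode legacy bytes and re-encode them as UTF-16 (little-endian, no BOM)."""
    return raw.decode(encoding, errors="replace").encode("utf-16-le")
```

For example, `choose_encoding(page_bytes, hostname="example.jp")` falls back to the Japanese entry of the table only when no transport-level name, no in-page declaration, and no valid UTF-8 is found.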

Opera Presto includes support for Unicode 5.2 character properties (class, casing, bidirectionality, mirroring, normalization), updated from Unicode 5.0.
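
To make the listed property types concrete, the following Python sketch queries the same kinds of properties through the standard `unicodedata` module. The helper `describe` is hypothetical, and the Unicode version behind `unicodedata` depends on the Python release, so it will generally not be 5.2.

```python
import unicodedata


def describe(ch):
    """Collect the property types named above for a single character."""
    return {
        "category": unicodedata.category(ch),        # general category (class)
        "upper": ch.upper(),                         # casing
        "lower": ch.lower(),
        "bidi": unicodedata.bidirectional(ch),       # bidirectional class
        "mirrored": bool(unicodedata.mirrored(ch)),  # mirrored in RTL text
        "nfd": unicodedata.normalize("NFD", ch),     # canonical decomposition
    }


for ch in ["a", "(", "\u05d0", "\u00e9"]:
    print(repr(ch), describe(ch))
```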

Big5-HKSCS support has been updated to the HKSCS-2008 encoding standard.

Charset CP51932 is now implemented as an alias of euc-jp.
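
An alias of this kind can be pictured as an entry in a label lookup table. The Python sketch below is only an illustration: the table `ALIASES`, the helper names, and the case-insensitive exact matching are assumptions, and Python's own `euc-jp` codec stands in for the decoder.

```python
# Hypothetical alias table: label as received -> canonical encoding name.
ALIASES = {
    "euc-jp": "euc-jp",
    "cp51932": "euc-jp",  # the alias described above
}


def resolve_label(label):
    """Return the canonical name for a label, or None if it is unknown."""
    return ALIASES.get(label.strip().lower())


def decode_page(raw, label):
    """Decode bytes using whichever encoding the label resolves to."""
    canonical = resolve_label(label)
    if canonical is None:
        raise LookupError("unknown charset label: " + label)
    return raw.decode(canonical)
```

With this table, a page labelled `CP51932` is decoded exactly like one labelled `euc-jp`.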

Support for charset alias matching as described in UTS #22, section 1.4, has been removed.
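
For context, the loose matching in UTS #22 section 1.4 compares charset names after deleting every character except ASCII letters and digits, lowercasing, and deleting each 0 that is not preceded by a digit. The Python sketch below follows that reading of the rules; the function name `loose_key` is hypothetical, and the sketch says nothing about how Opera Presto matched labels before or after the change.

```python
def loose_key(name):
    """Reduce a charset name to its loose-matching key (UTS #22, section 1.4)."""
    out = []
    prev_is_digit = False
    for ch in name:
        if not (ch.isascii() and ch.isalnum()):
            continue              # step 1: drop everything but a-z, A-Z, 0-9
        ch = ch.lower()           # step 2: fold to lowercase
        if ch == "0" and not prev_is_digit:
            continue              # step 3: drop a 0 not preceded by a digit
        out.append(ch)
        prev_is_digit = ch.isdigit()
    return "".join(out)


def loosely_equal(a, b):
    """True if two charset names match under the loose rules."""
    return loose_key(a) == loose_key(b)
```

Under these rules `UTF-8`, `utf8`, and `u.t.f-008` all compare equal, while `utf-80` does not match any of them.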

Encoding     | Category | Comments | Support
ISO 8859-1   | Latin    |  | Yes
ISO 8859-2   | Latin    | Used in Eastern Europe | Yes
ISO 8859-3   | Latin    | Rare | Yes
ISO 8859-4   | Latin    | Sami and Baltic country | Yes
ISO 8859-9   | Latin    | Turkish | Yes
ISO 8859-10  | Latin    | Inuit, Sami, and Icelandic | Yes
ISO 8859-13  | Latin    | Rare | Yes
ISO 8859-14  | Latin    | Celtic | Yes
ISO 8859-15  | Latin    | Intended to supersede 8859-1 | Yes
Windows-1250 | Latin    | Used in Eastern Europe | Yes
Windows-1252 | Latin    |  | Yes
Windows-1254 | Latin    | Turkish | Yes
Windows-1257 | Latin    | Baltic | Yes
Windows-1258 | Latin    | Vietnamese | Yes
VISCII       | Latin    | Vietnamese | Yes
IBM 866      | Cyrillic |  | Yes
ISO 8859-5   | Cyrillic |  | Yes
koi8-r       | Cyrillic |  | Yes
koi8-u       | Cyrillic | Ukrainian version of koi8-r | Yes
Windows-1251 | Cyrillic |  | Yes
ISO 8859-6   | Arabic   |  | Yes
Windows-1256 | Arabic   |  | Yes
ISO 8859-7   | Greek    |  | Yes
Windows-1253 | Greek    |  | Yes
ISO 8859-8   | Hebrew   |  | Yes
Windows-1255 | Hebrew   |  | Yes
ISO 8859-11  | Thai     | Also known as TIS-620 | Yes
Windows-874  | Thai     | Extension of ISO 8859-11 | Yes
utf-8        | Unicode  |  | Yes
utf-16       | Unicode  |  | Yes
Shift-JIS    | Japanese |  | Yes
ISO-2022-JP  | Japanese |  | Yes
EUC-JP       | Japanese | Charset CP51932 is now implemented as an alias of euc-jp. | Yes
Big 5        | Chinese  | Big5-HKSCS support for the HKSCS-2008 encoding standard has been updated. | Yes
EUC-CN       | Chinese  | Also erroneously known as GB 2312 | Yes
HZ-GB-2312   | Chinese  | Primarily used in e-mail | Yes
EUC-TW       | Chinese  |  | Yes
GBK          | Chinese  | EUC-CN extension | Yes
EUC-KR       | Korean   |  | Yes
