CopperSpice API
1.9.2
Converts between text encodings and QString.
Classes | |
class | ConverterState |
Stores the current state of the Unicode parser | |
Public Typedefs | |
using | ConversionFlags = QFlags< ConversionFlag > |
Public Types | |
enum | ConversionFlag |
Public Methods | |
virtual QStringList | aliases () const |
bool | canEncode (const QString &str) const |
bool | canEncode (QChar ch) const |
QByteArray | fromUnicode (const QString &str, ConverterState *state=nullptr) const |
QByteArray | fromUnicode (QStringView str, ConverterState *state=nullptr) const |
QTextDecoder * | makeDecoder (ConversionFlags flags=DefaultConversion) const |
QTextEncoder * | makeEncoder (ConversionFlags flags=DefaultConversion) const |
virtual int | mibEnum () const = 0 |
virtual QString | name () const = 0 |
QString | toUnicode (const char *input) const |
QString | toUnicode (const char *input, int len, ConverterState *state=nullptr) const |
QString | toUnicode (const QByteArray &input) const |
Static Public Methods | |
static QStringList | availableCodecs () |
static QList< int > | availableMibs () |
static QTextCodec * | codecForHtml (const QByteArray &data) |
static QTextCodec * | codecForHtml (const QByteArray &data, QTextCodec *defaultCodec) |
static QTextCodec * | codecForLocale () |
static QTextCodec * | codecForMib (int mib) |
static QTextCodec * | codecForName (const char *name) |
static QTextCodec * | codecForName (const QString &name) |
static QTextCodec * | codecForTr () |
static QTextCodec * | codecForUtfText (const QByteArray &data) |
static QTextCodec * | codecForUtfText (const QByteArray &data, QTextCodec *defaultCodec) |
static void | setCodecForLocale (QTextCodec *c) |
static void | setCodecForTr (QTextCodec *c) |
Protected Methods | |
QTextCodec () | |
virtual | ~QTextCodec () |
virtual QByteArray | convertFromUnicode (QStringView str, ConverterState *state) const = 0 |
virtual QString | convertToUnicode (const char *input, int len, ConverterState *state) const = 0 |
The QTextCodec class provides conversions from encoded text to a QString and from a QString to encoded text. To handle data in an encoding which QString does not support, you must use a QTextCodec. If you are using UTF-8 or UTF-16 this class is not required, since QString supports both directly.
CopperSpice provides a set of classes which inherit from QTextCodec to support many non-Unicode formats. You can also implement your own codec by inheriting from QTextCodec and overriding the convertFromUnicode() and convertToUnicode() methods.
The currently supported encodings are listed below.
As an example, a string encoded in Russian KOI8-R can be converted to a QString by obtaining the codec with QTextCodec::codecForName("KOI8-R") and passing the encoded data to toUnicode().
The returned QString contains the original text converted to Unicode. Converting from a QString to the local encoding is very similar and uses fromUnicode().
To read or write files in various encodings use the QTextStream::setCodec() method.
Some care must be taken when converting data in chunks, for example when receiving it over a network. In such cases a multi-byte character may be split across two chunks. At best this results in the loss of a character and at worst it causes the entire conversion to fail.
The approach to use in these situations is to create a QTextDecoder object for the codec by calling makeDecoder(), and to use this QTextDecoder for the whole decoding process, passing it each chunk as it arrives.
The QTextDecoder object maintains state between chunks and therefore works correctly even if a multi-byte character is split between chunks.
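The effect of keeping state between chunks can be illustrated without CopperSpice. The following is a minimal sketch in standard C++ and is not the QTextDecoder implementation. The class name `ChunkDecoder` and its methods are hypothetical, UTF-8 is used only because its multi-byte layout is simple, and the input is assumed to be well-formed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a stateful decoder. Assumes well-formed UTF-8 input.
class ChunkDecoder {
 public:
   // Returns every complete code point found so far. Trailing bytes of an
   // incomplete sequence are kept until the next call.
   std::u32string decode(const std::string &chunk) {
      m_pending += chunk;

      std::u32string result;
      std::size_t pos = 0;

      while (pos < m_pending.size()) {
         unsigned char lead = static_cast<unsigned char>(m_pending[pos]);
         std::size_t length = sequenceLength(lead);

         if (pos + length > m_pending.size()) {
            break;   // the sequence continues in the next chunk
         }

         // payload bits of the lead byte, followed by 6 bits per continuation byte
         char32_t codePoint = (length == 1) ? lead : (lead & (0xFF >> (length + 1)));

         for (std::size_t k = 1; k < length; ++k) {
            codePoint = (codePoint << 6) | (static_cast<unsigned char>(m_pending[pos + k]) & 0x3F);
         }

         result.push_back(codePoint);
         pos += length;
      }

      m_pending.erase(0, pos);
      return result;
   }

   // Number of bytes still waiting for the rest of their sequence.
   std::size_t pendingBytes() const {
      return m_pending.size();
   }

 private:
   static std::size_t sequenceLength(unsigned char lead) {
      assert((lead & 0xC0) != 0x80);   // a continuation byte can not start a sequence

      if (lead < 0x80) { return 1; }
      if ((lead & 0xE0) == 0xC0) { return 2; }
      if ((lead & 0xF0) == 0xE0) { return 3; }
      return 4;
   }

   std::string m_pending;
};
```

Feeding the two bytes of a Cyrillic letter in separate calls yields nothing for the first call and the complete character for the second, which is the behavior the paragraph above describes for a decoder object that lives across the whole decoding process.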
Support for new text encodings can be added to CopperSpice by inheriting from QTextCodec and overriding the methods listed in the table below.
You may find it more convenient to make your codec class available as a plugin. Refer to Creating Plugins for details.
Function | Description |
---|---|
name() | Returns the official name for the encoding. If the encoding is listed in the IANA character-sets encoding file, the name should be the preferred MIME name for the encoding. |
aliases() | Returns a list of alternative names for the encoding. For example, "ISO-8859-1" has "latin1", "CP819", "IBM819", and "iso-ir-100" as aliases. QTextCodec provides a default implementation which returns an empty list. |
mibEnum() | Returns the MIB enum for the encoding if it is listed in the IANA character-sets encoding file. |
convertToUnicode() | Converts an 8-bit character string to Unicode. |
convertFromUnicode() | Converts a Unicode string to an 8-bit character string. |
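
The division of work between the five methods in the table can be sketched in standard C++. This is only a model and does not use the CopperSpice headers. The classes `CodecModel` and `Latin1Model` are hypothetical, the signatures are simplified (the real methods take a QStringView or a `const char *` with a length, plus a ConverterState pointer), and replacing unencodable characters with `?` is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stand-in for the QTextCodec interface, reduced to the five methods in the table.
// std::string holds encoded bytes and std::u32string holds Unicode code points.
class CodecModel {
 public:
   virtual ~CodecModel() = default;

   virtual std::string name() const = 0;
   virtual int mibEnum() const = 0;
   virtual std::vector<std::string> aliases() const { return {}; }   // default: no aliases

   virtual std::u32string convertToUnicode(const std::string &input) const = 0;
   virtual std::string convertFromUnicode(const std::u32string &str) const = 0;
};

class Latin1Model : public CodecModel {
 public:
   std::string name() const override { return "ISO-8859-1"; }
   int mibEnum() const override { return 4; }   // IANA MIBenum for ISO-8859-1

   std::vector<std::string> aliases() const override {
      return {"latin1", "CP819", "IBM819", "iso-ir-100"};
   }

   // Each Latin-1 byte maps to the code point with the same value.
   std::u32string convertToUnicode(const std::string &input) const override {
      std::u32string result;

      for (unsigned char byte : input) {
         result.push_back(byte);
      }

      return result;
   }

   // Code points above U+00FF have no Latin-1 form and become '?' in this sketch.
   std::string convertFromUnicode(const std::u32string &str) const override {
      std::string result;

      for (char32_t codePoint : str) {
         result.push_back(codePoint <= 0xFF ? static_cast<char>(codePoint) : '?');
      }

      return result;
   }
};

// Usage: round-trips a Latin-1 byte string through the model.
inline void latin1Example() {
   Latin1Model codec;
   std::u32string text = codec.convertToUnicode("caf\xE9");

   assert(text.size() == 4 && text[3] == 0xE9);
   assert(codec.convertFromUnicode(text) == "caf\xE9");
}
```

A single-byte encoding was chosen so that neither conversion needs to carry state. A codec for a multi-byte encoding would also have to record incomplete input in the state argument of the real methods.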
using QTextCodec::ConversionFlags = QFlags<ConversionFlag>
Typedef for QFlags<ConversionFlag> which contains an OR combination of ConversionFlag values.
Refer to QTextCodec::ConversionFlag for the enum documentation.
enum QTextCodec::ConversionFlag
Constant | Value | Description |
---|---|---|
QTextCodec::DefaultConversion | 0 | No flag is set. |
QTextCodec::ConvertInvalidToNull | 0x80000000 | If this flag is set, each invalid input character is output as a null character. |
QTextCodec::IgnoreHeader | 0x1 | Ignore any Unicode byte-order mark and do not generate any. |
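
The way these values combine can be checked with a small sketch in standard C++. The enum below is a stand-in which copies the values from the table and is not the CopperSpice type. A plain integer takes the place of QFlags, and the helper `hasFlag` is hypothetical.

```cpp
#include <cstdint>

// Stand-in for QTextCodec::ConversionFlag, values copied from the table above.
enum ConversionFlag : std::uint32_t {
   DefaultConversion    = 0,
   ConvertInvalidToNull = 0x80000000,
   IgnoreHeader         = 0x1
};

// Plain integer standing in for QFlags<ConversionFlag>.
using ConversionFlags = std::uint32_t;

// Returns true when every bit of flag is set in flags.
constexpr bool hasFlag(ConversionFlags flags, ConversionFlag flag) {
   return flag != DefaultConversion && (flags & flag) == flag;
}

// The two non-zero flags occupy different bits, so they can be OR-ed together
// and tested independently.
constexpr ConversionFlags both = IgnoreHeader | ConvertInvalidToNull;

static_assert(both == 0x80000001, "the combined value keeps both bits");
static_assert(hasFlag(both, IgnoreHeader), "IgnoreHeader is set");
static_assert(hasFlag(both, ConvertInvalidToNull), "ConvertInvalidToNull is set");
static_assert(! hasFlag(IgnoreHeader, ConvertInvalidToNull), "the flags are independent");
```

With the CopperSpice types, a combination of this kind is what the flags parameter of makeDecoder() and makeEncoder() accepts.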
QTextCodec::QTextCodec() [protected]
Constructs a QTextCodec. The QTextCodec should always be constructed on the heap using new. CopperSpice takes ownership and will delete it when the application terminates.
QTextCodec::~QTextCodec() [protected, virtual]
Destroys the QTextCodec. You should not delete codecs. Once created their lifetime becomes the responsibility of CopperSpice.
QStringList QTextCodec::aliases() const [virtual]
Subclasses can return a number of aliases for the codec in question. Standard aliases for codecs can be found in the IANA character-sets encoding file.
QStringList QTextCodec::availableCodecs() [static]
Returns the list of all available codecs, by name. Call QTextCodec::codecForName() to obtain the QTextCodec for the name. The list may contain many mentions of the same codec if the codec has aliases.
QList<int> QTextCodec::availableMibs() [static]
Returns the list of MIBs for all available codecs. Call QTextCodec::codecForMib() to obtain the QTextCodec for the MIB.
bool QTextCodec::canEncode(const QString &str) const
Returns true if str can be fully encoded with this codec, otherwise returns false. The QString str contains the string being tested for encode-ability.
bool QTextCodec::canEncode(QChar ch) const
Returns true if the Unicode character ch can be fully encoded with this codec, otherwise returns false.
QTextCodec * QTextCodec::codecForHtml(const QByteArray &data) [static]
Tries to detect the encoding of the provided section of HTML in the byte array data by checking the BOM (Byte Order Mark) and the content-type meta header, and returns a QTextCodec instance that is capable of decoding the HTML to Unicode. If the codec cannot be detected, this overload returns a Latin-1 QTextCodec.
QTextCodec * QTextCodec::codecForHtml(const QByteArray &data, QTextCodec *defaultCodec) [static]
Tries to detect the encoding of the provided section of HTML in the byte array data by checking the BOM (Byte Order Mark) and the content-type meta header, and returns a QTextCodec instance that is capable of decoding the HTML to Unicode. If the codec cannot be detected from the content provided, defaultCodec is returned.
QTextCodec * QTextCodec::codecForLocale() [static]
Returns a pointer to the codec most suitable for this locale. On Windows the codec is based on the system locale. On Unix systems the codec uses the iconv library. In both cases the name of the codec is "System".
QTextCodec * QTextCodec::codecForMib(int mib) [static]
Returns the QTextCodec which matches the MIBenum mib.
QTextCodec * QTextCodec::codecForName(const char *name) [inline, static]
Searches all installed QTextCodec objects and returns the one which best matches name. The match is case insensitive. Returns nullptr if no codec matching the name can be found.
QTextCodec * QTextCodec::codecForName(const QString &name) [static]
Searches all installed QTextCodec objects and returns the one which best matches name. The match is case insensitive. Returns nullptr if no codec matching the name can be found.
QTextCodec * QTextCodec::codecForTr() [inline, static]
Returns the codec used by QObject::tr() on its argument. If this method returns nullptr (the default), tr() assumes Latin-1.
QTextCodec * QTextCodec::codecForUtfText(const QByteArray &data) [static]
Tries to detect the encoding of the provided data by using the BOM (Byte Order Mark) and returns a QTextCodec instance that is capable of decoding the text to Unicode. If the codec cannot be detected, this overload returns a Latin-1 QTextCodec.
QTextCodec * QTextCodec::codecForUtfText(const QByteArray &data, QTextCodec *defaultCodec) [static]
Tries to detect the encoding of the provided data by using the BOM (Byte Order Mark) and returns a QTextCodec instance that is capable of decoding the text to Unicode. If the codec cannot be detected from the content provided, defaultCodec is returned.
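Detection by byte order mark amounts to comparing the first few bytes of the data against a short list of known marks. The following sketch in standard C++ shows the idea and is not the CopperSpice implementation. The function `encodingFromBom` is hypothetical, it returns an encoding name where the real method returns a codec, and the set of marks which CopperSpice actually checks is an assumption.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the name of the encoding announced by a byte order
// mark at the start of data, or fallback when no mark is found.
std::string encodingFromBom(const std::string &data, const std::string &fallback) {
   auto startsWith = [&data](const std::string &mark) {
      return data.compare(0, mark.size(), mark) == 0;
   };

   // UTF-32 is tested first since the UTF-32LE mark begins with the UTF-16LE mark.
   if (startsWith(std::string("\x00\x00\xFE\xFF", 4))) { return "UTF-32BE"; }
   if (startsWith(std::string("\xFF\xFE\x00\x00", 4))) { return "UTF-32LE"; }
   if (startsWith("\xEF\xBB\xBF")) { return "UTF-8"; }
   if (startsWith("\xFE\xFF")) { return "UTF-16BE"; }
   if (startsWith("\xFF\xFE")) { return "UTF-16LE"; }

   return fallback;
}

// Usage: the fallback plays the role of defaultCodec.
inline void bomExample() {
   assert(encodingFromBom("\xEF\xBB\xBF" "hello", "ISO-8859-1") == "UTF-8");
   assert(encodingFromBom("hello", "ISO-8859-1") == "ISO-8859-1");
}
```

Data without a mark carries no information for this kind of detection, which is why both overloads need a codec to fall back on.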
QByteArray QTextCodec::convertFromUnicode(QStringView str, ConverterState *state) const [protected, pure virtual]
Converts the str to the encoding of this codec and returns the result in a QByteArray.
The state can be a nullptr in which case the conversion is stateless and default conversion rules will be used. If state is not a nullptr the codec saves the state after the conversion and adjusts the remainingChars and invalidChars members of the struct.
Classes which inherit from QTextCodec must override this method.
QString QTextCodec::convertToUnicode(const char *input, int len, ConverterState *state) const [protected, pure virtual]
Converts the first len characters of input from an encoded format to Unicode and returns the result in a QString.
The state can be a nullptr in which case the conversion is stateless and default conversion rules will be used. If state is not a nullptr the codec saves the state after the conversion and adjusts the remainingChars and invalidChars members of the struct.
Classes which inherit from QTextCodec must override this method.
QByteArray QTextCodec::fromUnicode(const QString &str, ConverterState *state = nullptr) const [inline]
Converts str from Unicode to the encoding of this codec and returns the result in a QByteArray. This method updates the state.
QByteArray QTextCodec::fromUnicode(QStringView str, ConverterState *state = nullptr) const [inline]
Converts str from Unicode to the encoding of this codec and returns the result in a QByteArray. This method updates the state.
QTextDecoder * QTextCodec::makeDecoder(ConversionFlags flags = DefaultConversion) const
Creates a QTextDecoder with the specified flags to decode chunks of char * data into chunks of Unicode data.
The caller is responsible for deleting the returned object.
QTextEncoder * QTextCodec::makeEncoder(ConversionFlags flags = DefaultConversion) const
Creates a QTextEncoder with the specified flags to encode chunks of Unicode data as char * data.
The caller is responsible for deleting the returned object.
int QTextCodec::mibEnum() const [pure virtual]
Returns the MIBenum corresponding to the selected encoding. It is important that each QTextCodec subclass returns the correct unique value for this method. Subclasses of QTextCodec must reimplement this method.
Refer to the IANA character-sets encoding file for more information.
QString QTextCodec::name() const [pure virtual]
QTextCodec subclasses must reimplement this method. It returns the name of the encoding supported by the subclass.
If the codec is registered as a character set in the IANA character-sets encoding file, this method should return the preferred MIME name for the codec if one is defined, otherwise its name.
void QTextCodec::setCodecForLocale(QTextCodec *c) [static]
Sets the codec returned by codecForLocale() to c. If c is a null pointer the codec is reset to the default. This might be needed for applications which use their own mechanism for setting the locale.
void QTextCodec::setCodecForTr(QTextCodec *c) [inline, static]
Sets the codec used by QObject::tr() on its argument to c. If c is nullptr, tr() assumes Latin-1. If the literal quoted text in the program is not in the Latin-1 encoding, this method can be used to set the appropriate encoding.
For example, software developed by Korean programmers might use eucKR for all the text in the program, in which case the main() function would call QTextCodec::setCodecForTr(QTextCodec::codecForName("eucKR")) before any text is passed to tr().
This is not the way to select the encoding the user has chosen. To convert an application containing literal English strings to Korean, all that is required is for the English strings to be passed through tr() and for translation files to be loaded. For details of internationalization refer to Internationalization.
QString QTextCodec::toUnicode(const char *input) const
Converts input from the encoding of this codec to Unicode and returns the result in a QString. The value for input contains the source characters.
QString QTextCodec::toUnicode(const char *input, int len, ConverterState *state = nullptr) const [inline]
Converts the first len characters from the input from the encoding of this codec to Unicode and returns the result in a QString. The state of the converter used is updated.
QString QTextCodec::toUnicode(const QByteArray &input) const
Converts input from the encoding of this codec to Unicode, and returns the result in a QString.