CopperSpice API  1.8.0
QTextCodec Class Reference (abstract)

The QTextCodec class provides conversions between text encodings and QString.

## Classes

class  ConverterState
Stores the current state of the Unicode parser.

## Public Typedefs

using ConversionFlags = QFlags< ConversionFlag >

## Public Types

enum  ConversionFlag

## Public Methods

virtual QStringList aliases () const

bool canEncode (const QString &str) const

bool canEncode (QChar ch) const

QByteArray fromUnicode (const QString &str, ConverterState *state=nullptr) const

QByteArray fromUnicode (QStringView str, ConverterState *state=nullptr) const

QTextDecoder * makeDecoder (ConversionFlags flags=DefaultConversion) const

QTextEncoder * makeEncoder (ConversionFlags flags=DefaultConversion) const

virtual int mibEnum () const = 0

virtual QString name () const = 0

QString toUnicode (const char *input) const

QString toUnicode (const char *input, int len, ConverterState *state=nullptr) const

QString toUnicode (const QByteArray &input) const

## Static Public Methods

static QStringList availableCodecs ()

static QList< int > availableMibs ()

static QTextCodec * codecForHtml (const QByteArray &data)

static QTextCodec * codecForHtml (const QByteArray &data, QTextCodec *defaultCodec)

static QTextCodec * codecForLocale ()

static QTextCodec * codecForMib (int mib)

static QTextCodec * codecForName (const char *name)

static QTextCodec * codecForName (const QString &name)

static QTextCodec * codecForTr ()

static QTextCodec * codecForUtfText (const QByteArray &data)

static QTextCodec * codecForUtfText (const QByteArray &data, QTextCodec *defaultCodec)

static void setCodecForLocale (QTextCodec *c)

static void setCodecForTr (QTextCodec *c)

## Protected Methods

QTextCodec ()

virtual ~QTextCodec ()

virtual QByteArray convertFromUnicode (QStringView str, ConverterState *state) const = 0

virtual QString convertToUnicode (const char *input, int len, ConverterState *state) const = 0

## Detailed Description

The QTextCodec class provides conversions from encoded text to a QString and from a QString to encoded text. If you need to handle data which uses an encoding that QString does not support, you must use a QTextCodec. If you are using UTF-8 or UTF-16 this class is not required, since these encodings are supported by QString.

CopperSpice provides a set of classes which inherit from QTextCodec to support many non-Unicode formats. You can also implement your own codec by inheriting from QTextCodec and overriding the convertFromUnicode() and convertToUnicode() methods.

The currently supported encodings are listed below.

• Apple Roman
• Big5
• Big5-HKSCS
• CP949
• EUC-JP
• EUC-KR
• GB18030-0
• IBM 850
• IBM 866
• IBM 874
• ISO 2022-JP
• ISO 8859-1 to 10
• ISO 8859-13 to 16
• Iscii-Bng, Dev, Gjr, Knd, Mlm, Ori, Pnj, Tlg, and Tml
• JIS X 0201
• JIS X 0208
• KOI8-R
• KOI8-U
• MuleLao-1
• ROMAN8
• Shift-JIS
• TIS-620
• UTF-8
• UTF-16
• UTF-16BE
• UTF-16LE
• UTF-32
• UTF-32BE
• UTF-32LE
• Windows-1250 to 1258
• WINSAMI2

As an example, if you have a string encoded in Russian KOI8-R and want to convert it to a QString, you would use code similar to the following:

QByteArray encodedString = "...";
QTextCodec *codec = QTextCodec::codecForName("KOI8-R");
QString str = codec->toUnicode(encodedString);

The QString str will contain the original text converted to Unicode. Converting from a QString to the local encoding is very similar.

QString string = "...";
QByteArray encodedString = codec->fromUnicode(string);

To read or write files in various encodings use the QTextStream::setCodec() method.

Some care must be taken when converting data in chunks, for example when receiving it over a network. In such cases a multi-byte character may be split across two chunks. At best this results in the loss of a character, and at worst the entire conversion fails.

The approach to use in these situations is to create a QTextDecoder object for the codec and use this QTextDecoder for the whole decoding process, as shown below:

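QTextCodec *codec = QTextCodec::codecForName("Shift-JIS");
QTextDecoder *decoder = codec->makeDecoder();
QString string;

// new_data_available() and get_new_data() are placeholders for your own input source
while (new_data_available()) {
   QByteArray chunk = get_new_data();
   string += decoder->toUnicode(chunk);   // the decoder keeps partial characters between calls
}

delete decoder;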

The QTextDecoder object maintains state between chunks and therefore works correctly even if a multi-byte character is split between chunks.
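To see why keeping state between chunks matters, the following is a plain C++ model of a stateful decoder. It is not CopperSpice code: the class name StreamingUtf8Decoder is hypothetical, it decodes UTF-8 only, and std::u32string stands in for QString. Any bytes belonging to an unfinished multi-byte sequence are held back until the next chunk arrives, which is the behavior described above for QTextDecoder.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical stand-in for a stateful decoder such as QTextDecoder.
// Decodes UTF-8 into UTF-32 and buffers an incomplete trailing sequence.
// Simplification: continuation bytes are not validated.
class StreamingUtf8Decoder {
public:
    std::u32string toUnicode(std::string_view chunk) {
        pending_.append(chunk.data(), chunk.size());
        std::u32string out;
        std::size_t i = 0;

        while (i < pending_.size()) {
            unsigned char lead = static_cast<unsigned char>(pending_[i]);
            std::size_t len = lead < 0x80 ? 1
                            : (lead >> 5) == 0x6 ? 2
                            : (lead >> 4) == 0xE ? 3
                            : (lead >> 3) == 0x1E ? 4 : 0;

            if (len == 0) {                  // not a valid lead byte
                out.push_back(U'\uFFFD');
                ++i;
                continue;
            }
            if (i + len > pending_.size()) {
                break;                       // sequence split across chunks: wait for more data
            }

            char32_t cp = len == 1 ? lead : lead & (0x7F >> len);
            for (std::size_t k = 1; k < len; ++k) {
                cp = (cp << 6) | (static_cast<unsigned char>(pending_[i + k]) & 0x3F);
            }
            out.push_back(cp);
            i += len;
        }

        pending_.erase(0, i);                // keep only the unfinished tail
        return out;
    }

private:
    std::string pending_;                    // bytes carried over to the next call
};
```

A decoder that discarded its buffer after each call would lose the character "é" when its two bytes (0xC3 0xA9) arrive in separate chunks. The model above emits it once the second byte is available.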

### Creating Your Own Codec Class

Support for new text encodings can be added to CopperSpice by creating QTextCodec subclasses.

The pure virtual methods describe the encoder to the system. The codec is then used as required by the different text file formats supported by QTextStream and, under X11, for locale-specific character input and output.

To add support for another encoding create a subclass of QTextCodec and implement the functions listed in the table below.

| Function | Description |
|---|---|
| name() | Returns the official name for the encoding. If the encoding is listed in the IANA character-sets encoding file, the name should be the preferred MIME name for the encoding. |
| aliases() | Returns a list of alternative names for the encoding. QTextCodec provides a default implementation which returns an empty list. For example, "ISO-8859-1" has "latin1", "CP819", "IBM819", and "iso-ir-100" as aliases. |
| mibEnum() | Returns the MIB enum for the encoding if it is listed in the IANA character-sets encoding file. |
| convertToUnicode() | Converts an 8-bit character string to Unicode. |
| convertFromUnicode() | Converts a Unicode string to an 8-bit character string. |
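The shape of a codec subclass can be sketched with a plain C++ model of the methods in this table. TextCodecModel and Latin1CodecModel are hypothetical names, std::u16string stands in for QString, and the ConverterState parameter of the real convertToUnicode() and convertFromUnicode() is omitted for brevity. The MIB value 4 is the IANA MIBenum for ISO-8859-1; substituting '?' for unencodable characters is an assumption of this sketch.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Plain C++ model of the QTextCodec subclassing contract (not CopperSpice API).
class TextCodecModel {
public:
    virtual ~TextCodecModel() = default;
    virtual std::string name() const = 0;
    virtual std::vector<std::string> aliases() const { return {}; }   // default: no aliases
    virtual int mibEnum() const = 0;
    virtual std::u16string convertToUnicode(const char *input, int len) const = 0;
    virtual std::string convertFromUnicode(std::u16string_view str) const = 0;
};

class Latin1CodecModel : public TextCodecModel {
public:
    std::string name() const override { return "ISO-8859-1"; }

    std::vector<std::string> aliases() const override {
        return {"latin1", "CP819", "IBM819", "iso-ir-100"};
    }

    int mibEnum() const override { return 4; }   // IANA MIBenum for ISO-8859-1

    std::u16string convertToUnicode(const char *input, int len) const override {
        std::u16string out;
        for (int i = 0; i < len; ++i) {
            // each Latin-1 byte maps to the code point with the same value
            out.push_back(static_cast<char16_t>(static_cast<unsigned char>(input[i])));
        }
        return out;
    }

    std::string convertFromUnicode(std::u16string_view str) const override {
        std::string out;
        for (char16_t ch : str) {
            // characters above U+00FF cannot be represented; substitute '?'
            out.push_back(ch <= 0xFF ? static_cast<char>(ch) : '?');
        }
        return out;
    }
};
```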

You may find it more convenient to make your codec class available as a plugin. Refer to Creating Plugins for details.

See also QTextStream, QTextDecoder, QTextEncoder

## Member Typedef Documentation

Typedef for QFlags<ConversionFlag>. Refer to QTextCodec::ConversionFlag for documentation.

## Member Enumeration Documentation

| Constant | Value | Description |
|---|---|---|
| QTextCodec::DefaultConversion | 0 | No flag is set. |
| QTextCodec::ConvertInvalidToNull | 0x80000000 | If this flag is set, each invalid input character is output as a null character. |
| QTextCodec::IgnoreHeader | 0x1 | Ignore any Unicode byte-order mark and do not generate any. |
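Because the enumeration values in the table occupy distinct bits, they can be combined into a single ConversionFlags value. The plain C++ sketch below mirrors only the numeric values from the table; the helper names combine() and hasFlag() are hypothetical, and in CopperSpice the combined value is held in a QFlags&lt;ConversionFlag&gt;.

```cpp
#include <cstdint>

// Hypothetical mirror of QTextCodec::ConversionFlag, using the values from the table.
enum ConversionFlag : std::uint32_t {
    DefaultConversion    = 0,
    IgnoreHeader         = 0x1,
    ConvertInvalidToNull = 0x80000000
};

// Combine two flags by bitwise OR, as a flags type would.
constexpr std::uint32_t combine(ConversionFlag a, ConversionFlag b) {
    return static_cast<std::uint32_t>(a) | static_cast<std::uint32_t>(b);
}

// Test whether a non-zero flag is set in a combined value.
constexpr bool hasFlag(std::uint32_t flags, ConversionFlag f) {
    return (flags & f) != 0;
}
```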

## Constructor & Destructor Documentation

 QTextCodec::QTextCodec ( )
protected

Constructs a QTextCodec. The QTextCodec should always be constructed on the heap using new. CopperSpice takes ownership and will delete it when the application terminates.

 QTextCodec::~QTextCodec ( )
protected virtual

Destroys the QTextCodec. You should not delete codecs. Once created, their lifetime is the responsibility of CopperSpice.

Warning
This method is not conditionally thread safe.

## Method Documentation

 QStringList QTextCodec::aliases ( ) const
virtual

Subclasses can return a number of aliases for the codec in question. Standard aliases for codecs can be found in the IANA character-sets encoding file.

 QStringList QTextCodec::availableCodecs ( )
static

Returns the list of all available codecs, by name. Call QTextCodec::codecForName() to obtain the QTextCodec for the name. The list may contain many mentions of the same codec if the codec has aliases.

See also availableMibs(), name(), aliases()

 QList< int > QTextCodec::availableMibs ( )
static

Returns the list of MIBs for all available codecs. Call QTextCodec::codecForMib() to obtain the QTextCodec for the MIB.

See also availableCodecs(), mibEnum()

 bool QTextCodec::canEncode ( const QString & str ) const

This is an overloaded method. Returns true if every character in str can be fully encoded with this codec, otherwise returns false.

 bool QTextCodec::canEncode ( QChar ch ) const

Returns true if the Unicode character ch can be fully encoded with this codec, otherwise returns false.
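The relationship between the two canEncode() overloads can be sketched in plain C++ for a hypothetical Latin-1 codec: a string is encodable only if each of its characters is. The function name canEncodeLatin1 is hypothetical and std::u16string_view stands in for QString.

```cpp
#include <algorithm>
#include <string_view>

// Single-character check: Latin-1 covers U+0000 to U+00FF.
bool canEncodeLatin1(char16_t ch) {
    return ch <= 0xFF;
}

// String check: true only if every character passes the single-character check.
bool canEncodeLatin1(std::u16string_view str) {
    return std::all_of(str.begin(), str.end(),
                       [](char16_t ch) { return canEncodeLatin1(ch); });
}
```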

 QTextCodec * QTextCodec::codecForHtml ( const QByteArray & data )
static

Tries to detect the encoding of the provided section of HTML in the given byte array, data, by checking the BOM (Byte Order Mark) and the content-type meta header, and returns a QTextCodec instance that is capable of decoding the HTML to Unicode. If the codec cannot be detected, this overload returns a Latin-1 QTextCodec.

 QTextCodec * QTextCodec::codecForHtml ( const QByteArray & data, QTextCodec * defaultCodec )
static

Tries to detect the encoding of the provided section of HTML in the given byte array, data, by checking the BOM (Byte Order Mark) and the content-type meta header, and returns a QTextCodec instance that is capable of decoding the HTML to Unicode. If the codec cannot be detected from the content provided, defaultCodec is returned.
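The meta header step of this detection can be approximated with a simplified plain C++ sketch. The function name sniffHtmlCharset is hypothetical, and this is not the CopperSpice implementation: it does not check the BOM, does not parse HTML, and does not restrict the search to meta tags. It only locates the first "charset=" (case-insensitively) and returns the token after it, falling back to a default name as the two-argument overload does.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

std::string sniffHtmlCharset(const std::string &html, const std::string &defaultName) {
    // lowercase copy so "Charset=" and "CHARSET=" are also found
    std::string lower = html;
    std::transform(lower.begin(), lower.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    std::size_t pos = lower.find("charset=");
    if (pos == std::string::npos) {
        return defaultName;                          // no declaration found
    }

    pos += 8;                                        // skip past "charset="
    while (pos < html.size() && (html[pos] == '"' || html[pos] == '\'')) {
        ++pos;                                       // skip an opening quote
    }

    std::size_t end = pos;
    while (end < html.size() && html[end] != '"' && html[end] != '\'' &&
           html[end] != ';' && html[end] != '>' && html[end] != '/' && html[end] != ' ') {
        ++end;                                       // read up to a delimiter
    }

    return end > pos ? html.substr(pos, end - pos) : defaultName;
}
```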

See also codecForUtfText()

 QTextCodec * QTextCodec::codecForLocale ( )
static

Returns a pointer to the codec most suitable for this locale. On Windows the codec is based on the system locale. On Unix systems the codec uses the iconv library. In both cases the codec's name is "System".

See also setCodecForLocale()

 QTextCodec * QTextCodec::codecForMib ( int mib )
static

Returns the QTextCodec which matches the MIBenum mib.

 QTextCodec * QTextCodec::codecForName ( const char * name )
inline static

Searches all installed QTextCodec objects and returns the one which best matches name. The match is case insensitive. Returns a nullptr if no codec matching the name could be found.

 QTextCodec * QTextCodec::codecForName ( const QString & name )
static

Searches all installed QTextCodec objects and returns the one which best matches name. The match is case insensitive. Returns a nullptr if no codec matching the name could be found.
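Lookup by name and by MIB can be modelled in plain C++ with a small registry in which each codec has a name, a list of aliases, and a MIB number. Every name and alias resolves to the same entry, which is also why availableCodecs() can list one codec several times. CodecEntry, findByName(), and findByMib() are hypothetical names. This sketch uses exact case-insensitive comparison only; how CopperSpice decides the "best" match may be looser. The MIB values shown (4 for ISO-8859-1, 2084 for KOI8-R) are the IANA assignments.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

struct CodecEntry {
    std::string name;
    std::vector<std::string> aliases;
    int mib;
};

// Case-insensitive equality for ASCII names.
bool equalsIgnoreCase(const std::string &a, const std::string &b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(), [](unsigned char x, unsigned char y) {
               return std::tolower(x) == std::tolower(y);
           });
}

// Match against the primary name and every alias; nullptr when nothing matches.
const CodecEntry *findByName(const std::vector<CodecEntry> &registry, const std::string &name) {
    for (const CodecEntry &entry : registry) {
        if (equalsIgnoreCase(entry.name, name)) {
            return &entry;
        }
        for (const std::string &alias : entry.aliases) {
            if (equalsIgnoreCase(alias, name)) {
                return &entry;
            }
        }
    }
    return nullptr;
}

// Match on the MIB number.
const CodecEntry *findByMib(const std::vector<CodecEntry> &registry, int mib) {
    for (const CodecEntry &entry : registry) {
        if (entry.mib == mib) {
            return &entry;
        }
    }
    return nullptr;
}
```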

 QTextCodec * QTextCodec::codecForTr ( )
inline static

Returns the codec used by QObject::tr() on its argument. If this method returns a nullptr (the default), tr() assumes Latin-1.

See also setCodecForTr()

 QTextCodec * QTextCodec::codecForUtfText ( const QByteArray & data )
static

Tries to detect the encoding of the provided data by using the BOM (Byte Order Mark) and returns a QTextCodec instance that is capable of decoding the text to Unicode. If the codec cannot be detected, this overload returns a Latin-1 QTextCodec.

See also codecForHtml()

 QTextCodec * QTextCodec::codecForUtfText ( const QByteArray & data, QTextCodec * defaultCodec )
static

Tries to detect the encoding of the provided data by using the BOM (Byte Order Mark) and returns a QTextCodec instance that is capable of decoding the text to Unicode. If the codec cannot be detected from the content provided, defaultCodec is returned.
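BOM-based detection can be sketched in plain C++ as a prefix check on the first bytes of the data. The function name detectUtfByBom is hypothetical and it returns an encoding name rather than a codec. The byte sequences are the standard Unicode byte-order marks. UTF-32LE must be tested before UTF-16LE because its BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE). As a consequence, UTF-16LE data whose first character is U+0000 is reported as UTF-32LE by this check.

```cpp
#include <string>

std::string detectUtfByBom(const std::string &data, const std::string &defaultName) {
    // true if data begins with the given byte sequence
    auto startsWith = [&data](const std::string &prefix) {
        return data.compare(0, prefix.size(), prefix) == 0;
    };

    if (startsWith(std::string("\x00\x00\xFE\xFF", 4))) return "UTF-32BE";
    if (startsWith(std::string("\xFF\xFE\x00\x00", 4))) return "UTF-32LE";  // before UTF-16LE
    if (startsWith(std::string("\xEF\xBB\xBF", 3)))     return "UTF-8";
    if (startsWith(std::string("\xFE\xFF", 2)))         return "UTF-16BE";
    if (startsWith(std::string("\xFF\xFE", 2)))         return "UTF-16LE";

    return defaultName;                                                        // no BOM found
}
```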

See also codecForHtml()

 QByteArray QTextCodec::convertFromUnicode ( QStringView str, ConverterState * state ) const
protected pure virtual

Converts str to the encoding of this codec and returns the result in a QByteArray.

The state can be a nullptr in which case the conversion is stateless and default conversion rules will be used. If state is not a nullptr the codec saves the state after the conversion and adjusts the remainingChars and invalidChars members of the struct.

Classes which inherit from QTextCodec must override this method.
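The nullptr-versus-state contract can be illustrated with a plain C++ model that encodes to Latin-1. ConverterStateModel and latin1FromUnicode() are hypothetical names, only the invalidChars counter is exercised, and replacing unencodable characters with '?' is an assumption of this sketch. When state is a nullptr the conversion still runs, but nothing is recorded.

```cpp
#include <string>
#include <string_view>

// Hypothetical stand-in for QTextCodec::ConverterState.
struct ConverterStateModel {
    int remainingChars = 0;
    int invalidChars = 0;
};

std::string latin1FromUnicode(std::u16string_view str, ConverterStateModel *state) {
    std::string out;
    for (char16_t ch : str) {
        if (ch <= 0xFF) {
            out.push_back(static_cast<char>(ch));
        } else {
            out.push_back('?');              // assumed replacement character
            if (state != nullptr) {
                ++state->invalidChars;       // only recorded when the caller tracks state
            }
        }
    }
    return out;
}
```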

 QString QTextCodec::convertToUnicode ( const char * input, int len, ConverterState * state ) const
protected pure virtual

Converts the first len characters of input from an encoded format to Unicode and returns the result in a QString.

The state can be a nullptr in which case the conversion is stateless and default conversion rules will be used. If state is not a nullptr the codec saves the state after the conversion and adjusts the remainingChars and invalidChars members of the struct.

Classes which inherit from QTextCodec must override this method.

 QByteArray QTextCodec::fromUnicode ( const QString & str, ConverterState * state = nullptr ) const
inline

Converts str from Unicode to the encoding of this codec and returns the result in a QByteArray. If state is not a nullptr, it is updated to reflect the conversion.

 QByteArray QTextCodec::fromUnicode ( QStringView str, ConverterState * state = nullptr ) const
inline

Converts str from Unicode to the encoding of this codec and returns the result in a QByteArray. If state is not a nullptr, it is updated to reflect the conversion.

 QTextDecoder * QTextCodec::makeDecoder ( ConversionFlags flags = DefaultConversion ) const

Creates a QTextDecoder with the specified flags to decode chunks of char * data into chunks of Unicode data.

The caller is responsible for deleting the returned object.

 QTextEncoder * QTextCodec::makeEncoder ( ConversionFlags flags = DefaultConversion ) const

Creates a QTextEncoder with the specified flags to encode chunks of Unicode data as char * data.

The caller is responsible for deleting the returned object.

 int QTextCodec::mibEnum ( ) const
pure virtual

Returns the MIBenum corresponding to the selected encoding. It is important that each QTextCodec subclass returns the correct unique value for this method. Subclasses of QTextCodec must reimplement this method.

 QString QTextCodec::name ( ) const
pure virtual

QTextCodec subclasses must reimplement this method. It returns the name of the encoding supported by the subclass.

If the codec is registered as a character set in the IANA character-sets encoding file, this method should return the preferred MIME name for the codec if defined, otherwise its name.

 void QTextCodec::setCodecForLocale ( QTextCodec * c )
static

Set the codec to c which will be returned by codecForLocale(). If c is a null pointer the codec is reset to the default. This might be needed for some applications that want to use their own mechanism for setting the locale.
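The override-and-reset behavior can be modelled in plain C++ with a single pointer that, when set, takes precedence over the platform default. LocaleCodecModel, setLocaleCodec(), and localeCodec() are hypothetical names; the default entry is named "System" to match the name codecForLocale() reports.

```cpp
#include <string>

// Hypothetical stand-in for a codec; only the name matters here.
struct LocaleCodecModel {
    std::string name;
};

namespace {
LocaleCodecModel systemCodec{"System"};       // default chosen by the platform
LocaleCodecModel *localeOverride = nullptr;   // application-supplied override
}

void setLocaleCodec(LocaleCodecModel *c) {
    localeOverride = c;                       // passing nullptr restores the default
}

LocaleCodecModel *localeCodec() {
    return localeOverride != nullptr ? localeOverride : &systemCodec;
}
```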

See also codecForLocale()

 void QTextCodec::setCodecForTr ( QTextCodec * c )
inline static

Sets the codec used by QObject::tr() on its argument to c. If c is a nullptr tr() assumes Latin-1. If the literal quoted text in the program is not in the Latin-1 encoding, this function can be used to set the appropriate encoding.

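For example, software developed by Korean programmers might use EUC-KR for all the text in the program, in which case the main() function might look like the following.

int main(int argc, char *argv[])
{
   QApplication app(argc, argv);

   // literal strings passed to tr() are encoded in EUC-KR
   QTextCodec::setCodecForTr(QTextCodec::codecForName("EUC-KR"));

   // do something

   return app.exec();
}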

This is not the way to select the encoding the user has chosen. To convert an application containing literal English strings to Korean, all that is required is for the English strings to be passed through tr() and for translation files to be loaded. For details of internationalization refer to Internationalization.

Warning
This method is not conditionally thread safe.

 QString QTextCodec::toUnicode ( const char * input, int len, ConverterState * state = nullptr ) const