CopperSpice API  1.7.2
QTextCodec Class Reference abstract

Provides conversions between text encodings and QString.

Inherited by QFontLaoCodec, QIsciiCodec, QLatin15Codec, QLatin1Codec, QSimpleTextCodec, QUtf16Codec, QUtf32Codec, and QUtf8Codec.

Classes

class  ConverterState
 Stores the current state of the Unicode parser
 

Public Typedefs

using ConversionFlags = QFlags< ConversionFlag >
 

Public Types

enum  ConversionFlag
 

Public Methods

virtual QStringList aliases () const
 
bool canEncode (const QString &str) const
 
bool canEncode (QChar ch) const
 
QByteArray fromUnicode (const QString &str, ConverterState *state=nullptr) const
 
QByteArray fromUnicode (QStringView str, ConverterState *state=nullptr) const
 
QTextDecoder * makeDecoder (ConversionFlags flags=DefaultConversion) const
 
QTextEncoder * makeEncoder (ConversionFlags flags=DefaultConversion) const
 
virtual int mibEnum () const = 0
 
virtual QString name () const = 0
 
QString toUnicode (const char *input) const
 
QString toUnicode (const char *input, int len, ConverterState *state=nullptr) const
 
QString toUnicode (const QByteArray &input) const
 

Static Public Methods

static QStringList availableCodecs ()
 
static QList< int > availableMibs ()
 
static QTextCodec * codecForHtml (const QByteArray &data)
 
static QTextCodec * codecForHtml (const QByteArray &data, QTextCodec *defaultCodec)
 
static QTextCodec * codecForLocale ()
 
static QTextCodec * codecForMib (int mib)
 
static QTextCodec * codecForName (const char *name)
 
static QTextCodec * codecForName (const QString &name)
 
static QTextCodec * codecForTr ()
 
static QTextCodec * codecForUtfText (const QByteArray &data)
 
static QTextCodec * codecForUtfText (const QByteArray &data, QTextCodec *defaultCodec)
 
static void setCodecForLocale (QTextCodec *c)
 
static void setCodecForTr (QTextCodec *c)
 

Protected Methods

 QTextCodec ()
 
virtual ~QTextCodec ()
 
virtual QByteArray convertFromUnicode (QStringView str, ConverterState *state) const = 0
 
virtual QString convertToUnicode (const char *input, int len, ConverterState *state) const = 0
 

Detailed Description

The QTextCodec class provides conversions from encoded text to a QString and from a QString to encoded text. If you need to handle data which uses an encoding QString does not support, you must use a QTextCodec. If you are using UTF-8 or UTF-16 this class is not required, as these encodings are supported directly by QString.

CopperSpice provides a set of classes which inherit from QTextCodec to support many non-Unicode formats. You can also implement your own codec by inheriting from QTextCodec and overriding the convertFromUnicode() and convertToUnicode() methods.

The currently supported encodings are listed below.

  • Apple Roman
  • Big5
  • Big5-HKSCS
  • CP949
  • EUC-JP
  • EUC-KR
  • GB18030-0
  • IBM 850
  • IBM 866
  • IBM 874
  • ISO 2022-JP
  • ISO 8859-1 to 10
  • ISO 8859-13 to 16
  • Iscii-Bng, Dev, Gjr, Knd, Mlm, Ori, Pnj, Tlg, and Tml
  • JIS X 0201
  • JIS X 0208
  • KOI8-R
  • KOI8-U
  • MuleLao-1
  • ROMAN8
  • Shift-JIS
  • TIS-620
  • UTF-8
  • UTF-16
  • UTF-16BE
  • UTF-16LE
  • UTF-32
  • UTF-32BE
  • UTF-32LE
  • Windows-1250 to 1258
  • WINSAMI2

As an example, if you have a string encoded in Russian KOI8-R and want to convert it to a QString, you would use code similar to the following:

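QByteArray encodedString = "...";
// look up the codec by name, codecForName() returns a nullptr if it is not available
QTextCodec *codec = QTextCodec::codecForName("KOI8-R");
QString str = codec->toUnicode(encodedString);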

The QString str will contain the original text converted to Unicode. Converting from a QString to the local encoding is very similar.

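QString string = "...";
// look up the codec by name, codecForName() returns a nullptr if it is not available
QTextCodec *codec = QTextCodec::codecForName("KOI8-R");
QByteArray encodedString = codec->fromUnicode(string);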

Reading & Writing Files

To read or write files in various encodings use the QTextStream::setCodec() method.

Some care must be taken when converting the data in chunks, for example when receiving it over a network. In such cases it is possible that a multi-byte character will be split over two chunks. At best this results in the loss of a character and at worst it causes the entire conversion to fail.

The approach to use in these situations is to create a QTextDecoder object for the codec and use this QTextDecoder for the whole decoding process, as shown below:

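QTextCodec *codec = QTextCodec::codecForName("Shift-JIS");
QTextDecoder *decoder = codec->makeDecoder();
QString string;

// the same decoder is used for every chunk so its state carries over
while (new_data_available()) {
   QByteArray chunk = get_new_data();
   string += decoder->toUnicode(chunk);
}

// the caller owns the object returned by makeDecoder()
delete decoder;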

The QTextDecoder object maintains state between chunks and therefore works correctly even if a multi-byte character is split between chunks.
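The idea of carrying state between chunks can be illustrated without CopperSpice. The following is a minimal sketch using only the C++ standard library, and it is not how QTextDecoder is implemented. The class name Utf8ChunkBuffer and its methods are hypothetical, and UTF-8 is used only because its sequence lengths are easy to derive from the lead byte. The buffer holds back the bytes of an incomplete sequence until the next chunk completes them.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not a CopperSpice class. It holds back the bytes of an
// incomplete UTF-8 sequence until the next chunk completes them.
class Utf8ChunkBuffer
{
 public:
   // Returns the longest prefix of (pending bytes + chunk) which ends on a
   // character boundary and keeps the remaining bytes for the next call.
   std::string feed(const std::string &chunk)
   {
      std::string data = m_pending + chunk;
      std::size_t cut  = completePrefixLength(data);
      assert(cut <= data.size());

      m_pending = data.substr(cut);
      return data.substr(0, cut);
   }

   // Number of bytes currently held back
   std::size_t pendingBytes() const
   {
      return m_pending.size();
   }

 private:
   // Sequence length implied by a lead byte. The input is not validated,
   // stray continuation bytes and invalid lead bytes count as length 1.
   static std::size_t sequenceLength(unsigned char lead)
   {
      if (lead < 0x80) {
         return 1;
      }
      if ((lead & 0xE0) == 0xC0) {
         return 2;
      }
      if ((lead & 0xF0) == 0xE0) {
         return 3;
      }
      if ((lead & 0xF8) == 0xF0) {
         return 4;
      }
      return 1;
   }

   // Walks whole sequences and stops before the first truncated one
   static std::size_t completePrefixLength(const std::string &data)
   {
      std::size_t pos = 0;

      while (pos < data.size()) {
         std::size_t len = sequenceLength(static_cast<unsigned char>(data[pos]));

         if (pos + len > data.size()) {
            break;
         }
         pos += len;
      }
      return pos;
   }

   std::string m_pending;
};
```

Feeding the three bytes of the euro sign in two separate chunks returns nothing for the first two bytes and the complete character once the third byte arrives. A stateless conversion of each chunk would have seen two invalid fragments instead.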

Creating Your Own Codec Class

Support for new text encodings can be added to CopperSpice by creating QTextCodec subclasses.

The pure virtual functions describe the encoder to the system. The codec is then used as required by the text file formats supported by QTextStream and, under X11, for locale-specific character input and output.

To add support for another encoding create a subclass of QTextCodec and implement the functions listed in the table below.

Function               Description
name()                 Returns the official name for the encoding. If the encoding is listed in the IANA character-sets encoding file, the name should be the preferred MIME name for the encoding.
aliases()              Returns a list of alternative names for the encoding. QTextCodec provides a default implementation which returns an empty list. For example, "ISO-8859-1" has "latin1", "CP819", "IBM819", and "iso-ir-100" as aliases.
mibEnum()              Returns the MIB enum for the encoding if it is listed in the IANA character-sets encoding file.
convertToUnicode()     Converts an 8-bit character string to Unicode.
convertFromUnicode()   Converts a Unicode string to an 8-bit character string.
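The division of work between these five functions can be sketched with a stand-in interface. The code below uses only the C++ standard library so it can be compiled on its own. CodecSketch and AsciiCodecSketch are hypothetical names, std::string and std::u32string take the place of QByteArray and QString, and the ConverterState parameter is left out. A real codec would inherit from QTextCodec and use the signatures shown in this reference.

```cpp
#include <string>
#include <vector>

// Stand-in for the QTextCodec interface, using only standard library types
class CodecSketch
{
 public:
   virtual ~CodecSketch() = default;

   virtual std::string name() const = 0;
   virtual int mibEnum() const = 0;

   // default implementation returns an empty list, as described for aliases()
   virtual std::vector<std::string> aliases() const
   {
      return {};
   }

   virtual std::u32string convertToUnicode(const std::string &input) const = 0;
   virtual std::string convertFromUnicode(const std::u32string &text) const = 0;
};

// US-ASCII is registered with IANA under MIBenum 3
class AsciiCodecSketch : public CodecSketch
{
 public:
   std::string name() const override
   {
      return "US-ASCII";
   }

   int mibEnum() const override
   {
      return 3;
   }

   std::vector<std::string> aliases() const override
   {
      return { "us", "IBM367", "cp367" };
   }

   // bytes outside the 7-bit range become U+FFFD, the replacement character
   std::u32string convertToUnicode(const std::string &input) const override
   {
      std::u32string result;

      for (unsigned char byte : input) {
         result.push_back(byte < 0x80 ? char32_t(byte) : U'\uFFFD');
      }
      return result;
   }

   // code points outside the 7-bit range become a question mark
   std::string convertFromUnicode(const std::u32string &text) const override
   {
      std::string result;

      for (char32_t cp : text) {
         result.push_back(cp < 0x80 ? static_cast<char>(cp) : '?');
      }
      return result;
   }
};
```

The choice of U+FFFD and a question mark as substitutes is an assumption made for this sketch. The substitution used by the CopperSpice codecs is not stated in this reference.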

You may find it more convenient to make your codec class available as a plugin. Refer to Creating Plugins for details.

See also
QTextStream, QTextDecoder, QTextEncoder

Member Typedef Documentation

Member Enumeration Documentation

Constant                           Value        Description
QTextCodec::DefaultConversion      0            No flag is set.
QTextCodec::ConvertInvalidToNull   0x80000000   If this flag is set, each invalid input character is output as a null character.
QTextCodec::IgnoreHeader           0x1          Ignore any Unicode byte-order mark and do not generate any.
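Since the values above are distinct bits, the flags can be combined with a bitwise OR. The following sketch reproduces the three values in a plain enum to show the arithmetic. ConversionFlagSketch, FlagBits, and testFlag() are hypothetical stand-ins for the enum and the QFlags wrapper, and the treatment of a zero flag in testFlag() is an assumption about how such wrappers usually behave.

```cpp
#include <cstdint>

// Plain copy of the documented values, not the CopperSpice declaration
enum ConversionFlagSketch : std::uint32_t {
   DefaultConversion    = 0,
   ConvertInvalidToNull = 0x80000000u,
   IgnoreHeader         = 0x1
};

using FlagBits = std::uint32_t;

// True when every bit of flag is set in flags. A zero flag only matches
// when no bit at all is set.
constexpr bool testFlag(FlagBits flags, ConversionFlagSketch flag)
{
   return (flags & flag) == flag && (flag != 0 || flags == 0);
}
```

With these definitions the expression IgnoreHeader | ConvertInvalidToNull has the value 0x80000001, and testFlag() reports both flags as set.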

Constructor & Destructor Documentation

QTextCodec::QTextCodec ( )
protected

Constructs a QTextCodec. The QTextCodec should always be constructed on the heap using new. CopperSpice takes ownership and will delete it when the application terminates.

QTextCodec::~QTextCodec ( )
protected virtual

Destroys the QTextCodec. You should not delete codecs. Once created their lifetime becomes the responsibility of CopperSpice.

Warning
This method is not conditionally thread safe.

Method Documentation

QStringList QTextCodec::aliases ( ) const
virtual

Subclasses can return a number of aliases for the codec in question. Standard aliases for codecs can be found in the IANA character-sets encoding file.

QStringList QTextCodec::availableCodecs ( )
static

Returns the list of all available codecs, by name. Call QTextCodec::codecForName() to obtain the QTextCodec for the name. The list may contain many mentions of the same codec if the codec has aliases.

See also
availableMibs(), name(), aliases()

QList< int > QTextCodec::availableMibs ( )
static

Returns the list of MIBs for all available codecs. Call QTextCodec::codecForMib() to obtain the QTextCodec for the MIB.

See also
availableCodecs(), mibEnum()

bool QTextCodec::canEncode ( const QString & str) const

Returns true if the string str can be fully encoded with this codec, otherwise returns false. The QString str contains the string being tested for encode-ability.

bool QTextCodec::canEncode ( QChar  ch) const

Returns true if the Unicode character ch can be fully encoded with this codec, otherwise returns false.

QTextCodec * QTextCodec::codecForHtml ( const QByteArray & data)
static

Tries to detect the encoding of the provided snippet of HTML in the given byte array, data, by checking the BOM (Byte Order Mark) and the content-type meta header. Returns a QTextCodec instance that is capable of decoding the HTML to Unicode. If the codec cannot be detected, this overload returns a Latin-1 QTextCodec.

QTextCodec * QTextCodec::codecForHtml ( const QByteArray & data,
QTextCodec *  defaultCodec 
)
static

Tries to detect the encoding of the provided snippet of HTML in the given byte array, data, by checking the BOM (Byte Order Mark) and the content-type meta header. Returns a QTextCodec instance that is capable of decoding the HTML to Unicode. If the codec cannot be detected from the content provided, defaultCodec is returned.
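The meta header part of this detection can be pictured as a search for a charset declaration. The following is a simplified sketch in standard C++ and not the CopperSpice implementation. The function charsetFromHtml() is hypothetical, it returns the declared name as text instead of a codec, and it does not look at the BOM or verify that the declaration sits inside a meta tag.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Returns the value following the first "charset=" in html, or fallback
// when no such declaration is found.
std::string charsetFromHtml(const std::string &html, const std::string &fallback)
{
   // lower case copy so the search is case insensitive, positions stay aligned
   std::string lower = html;
   std::transform(lower.begin(), lower.end(), lower.begin(),
         [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

   const std::string key = "charset=";
   std::size_t pos = lower.find(key);

   if (pos == std::string::npos) {
      return fallback;
   }
   pos += key.size();

   // skip an optional opening quote
   if (pos < html.size() && (html[pos] == '"' || html[pos] == '\'')) {
      ++pos;
   }

   // the name ends at a quote, a space, or the end of the attribute or tag
   std::size_t end = html.find_first_of("\"' ;/>", pos);
   std::string value = html.substr(pos, end == std::string::npos ? std::string::npos : end - pos);

   return value.empty() ? fallback : value;
}
```

The fallback parameter plays the role of defaultCodec. The name which is found would then be passed to a lookup such as codecForName().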

See also
codecForUtfText()

QTextCodec * QTextCodec::codecForLocale ( )
static

Returns a pointer to the codec most suitable for this locale.

On Windows the codec is based on the system locale. On Unix systems the codec uses the iconv library. In both cases the name of the codec will be "System".

See also
setCodecForLocale()

QTextCodec * QTextCodec::codecForMib ( int  mib)
static

Returns the QTextCodec which matches the MIBenum mib.
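A MIBenum is the integer assigned to a character set in the IANA registry. As an illustration, the sketch below maps a few registered values to their preferred MIME names using standard C++ only. The function nameForMib() is hypothetical and the table is a small excerpt, it does not reflect which MIBs a given CopperSpice build reports through availableMibs().

```cpp
#include <map>
#include <string>

// Returns the preferred MIME name for a few IANA MIBenum values, or an
// empty string when the value is not in this excerpt.
std::string nameForMib(int mib)
{
   static const std::map<int, std::string> table = {
      { 4,    "ISO-8859-1" },
      { 17,   "Shift_JIS"  },
      { 18,   "EUC-JP"     },
      { 106,  "UTF-8"      },
      { 1015, "UTF-16"     },
      { 2084, "KOI8-R"     }
   };

   auto iter = table.find(mib);
   return iter == table.end() ? std::string() : iter->second;
}
```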

QTextCodec * QTextCodec::codecForName ( const char *  name)
inline static

Searches all installed QTextCodec objects and returns the one which best matches name. The match is case insensitive. Returns a nullptr if no codec matching the name could be found.

QTextCodec * QTextCodec::codecForName ( const QString & name)
static

Searches all installed QTextCodec objects and returns the one which best matches name. The match is case insensitive. Returns a nullptr if no codec matching the name could be found.
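A case insensitive lookup over names and aliases can be sketched as follows, using only the C++ standard library. CodecEntry and findCodec() are hypothetical, and the sketch accepts exact matches only. Whatever additional rules CopperSpice applies to find the best match are not covered here.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical registry entry with an official name and its aliases
struct CodecEntry {
   std::string name;
   std::vector<std::string> aliases;
};

// Lower case copy of text, used to compare names without regard to case
std::string toLower(std::string text)
{
   std::transform(text.begin(), text.end(), text.begin(),
         [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
   return text;
}

// Returns the entry whose name or alias equals wanted, ignoring case,
// or a nullptr when nothing matches.
const CodecEntry *findCodec(const std::vector<CodecEntry> &registry, const std::string &wanted)
{
   const std::string key = toLower(wanted);

   for (const CodecEntry &entry : registry) {
      if (toLower(entry.name) == key) {
         return &entry;
      }

      for (const std::string &alias : entry.aliases) {
         if (toLower(alias) == key) {
            return &entry;
         }
      }
   }
   return nullptr;
}
```

With a registry holding "ISO-8859-1" and the aliases "latin1", "CP819", "IBM819", and "iso-ir-100", a search for "LATIN1" and a search for "iso-8859-1" both return the same entry.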

QTextCodec * QTextCodec::codecForTr ( )
inline static

Returns the codec used by QObject::tr() on its argument. If this function returns a nullptr (the default), tr() assumes Latin-1.

See also
setCodecForTr()

QTextCodec * QTextCodec::codecForUtfText ( const QByteArray & data)
static

Tries to detect the encoding of the provided snippet data by using the BOM (Byte Order Mark) and returns a QTextCodec instance that is capable of decoding the text to Unicode. If the codec cannot be detected, this overload returns a Latin-1 QTextCodec.

See also
codecForHtml()

QTextCodec * QTextCodec::codecForUtfText ( const QByteArray & data,
QTextCodec *  defaultCodec 
)
static

Tries to detect the encoding of the provided snippet data by using the BOM (Byte Order Mark) and returns a QTextCodec instance that is capable of decoding the text to Unicode. If the codec cannot be detected from the content provided, defaultCodec is returned.
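BOM detection amounts to comparing the first bytes of the data with the known byte order marks. The sketch below shows this in standard C++ and is not the CopperSpice implementation. The function encodingFromBom() is hypothetical and returns an encoding name where codecForUtfText() returns a codec. The four byte marks are tested first because the UTF-32LE mark begins with the same two bytes as the UTF-16LE mark.

```cpp
#include <string>

// Returns the encoding indicated by a leading byte order mark, or
// fallback when data does not start with one.
std::string encodingFromBom(const std::string &data, const std::string &fallback)
{
   struct Bom {
      std::string bytes;
      const char *encoding;
   };

   // explicit lengths are required since two of the marks contain null bytes
   const Bom marks[] = {
      { std::string("\x00\x00\xFE\xFF", 4), "UTF-32BE" },
      { std::string("\xFF\xFE\x00\x00", 4), "UTF-32LE" },
      { std::string("\xEF\xBB\xBF", 3),     "UTF-8"    },
      { std::string("\xFE\xFF", 2),         "UTF-16BE" },
      { std::string("\xFF\xFE", 2),         "UTF-16LE" }
   };

   for (const Bom &mark : marks) {
      // compare() sees a shorter string when data is too short, so no match
      if (data.compare(0, mark.bytes.size(), mark.bytes) == 0) {
         return mark.encoding;
      }
   }
   return fallback;
}
```

One ambiguity remains in any approach of this kind. UTF-16LE text whose first character after the BOM is U+0000 is indistinguishable from a UTF-32LE mark.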

See also
codecForHtml()

QByteArray QTextCodec::convertFromUnicode ( QStringView  str,
ConverterState * state 
) const
protected pure virtual

Converts the str to the encoding of this codec and returns the result in a QByteArray.

The state can be a nullptr in which case the conversion is stateless and default conversion rules will be used. If state is not a nullptr the codec saves the state after the conversion and adjusts the remainingChars and invalidChars members of the struct.

Classes which inherit from QTextCodec must override this method.
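The handling of an optional state pointer can be sketched with a single byte encoding. The code below is a stand-alone illustration in standard C++ and does not use the CopperSpice types. ConverterStateSketch and latin1FromUnicode() are hypothetical, std::u32string takes the place of QStringView, and the use of a question mark for characters which can not be encoded is an assumption.

```cpp
#include <string>

// Stand-in for ConverterState, reduced to the two members named above
struct ConverterStateSketch {
   int remainingChars = 0;
   int invalidChars   = 0;
};

// Converts text to Latin-1. Code points above U+00FF can not be encoded,
// they are replaced and counted in state->invalidChars when state is given.
std::string latin1FromUnicode(const std::u32string &text, ConverterStateSketch *state)
{
   std::string result;
   int invalid = 0;

   for (char32_t cp : text) {
      if (cp <= 0xFF) {
         // Latin-1 maps U+0000 to U+00FF onto the byte with the same value
         result.push_back(static_cast<char>(cp));
      } else {
         result.push_back('?');
         ++invalid;
      }
   }

   if (state != nullptr) {
      state->invalidChars += invalid;

      // every code point was consumed, nothing is left for a later call
      state->remainingChars = 0;
   }
   return result;
}
```

Passing a nullptr for state gives the stateless conversion. Passing the same object to several calls accumulates the count of invalid characters over the whole input.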

QString QTextCodec::convertToUnicode ( const char *  input,
int  len,
ConverterState * state 
) const
protected pure virtual

Converts the first len characters of input from an encoded format to Unicode and returns the result in a QString.

The state can be a nullptr in which case the conversion is stateless and default conversion rules will be used. If state is not a nullptr the codec saves the state after the conversion and adjusts the remainingChars and invalidChars members of the struct.

Classes which inherit from QTextCodec must override this method.

QByteArray QTextCodec::fromUnicode ( const QString & str,
ConverterState * state = nullptr 
) const
inline

Converts str from Unicode to the encoding of this codec and returns the result in a QByteArray. This method updates the state.

QByteArray QTextCodec::fromUnicode ( QStringView  str,
ConverterState * state = nullptr 
) const
inline

Converts str from Unicode to the encoding of this codec and returns the result in a QByteArray. This method updates the state.

QTextDecoder * QTextCodec::makeDecoder ( ConversionFlags  flags = DefaultConversion) const

Creates a QTextDecoder with the specified flags to decode chunks of char * data to create chunks of Unicode data.

The caller is responsible for deleting the returned object.

QTextEncoder * QTextCodec::makeEncoder ( ConversionFlags  flags = DefaultConversion) const

Creates a QTextEncoder with the specified flags to encode chunks of Unicode data as char * data.

The caller is responsible for deleting the returned object.

int QTextCodec::mibEnum ( ) const
pure virtual

Returns the MIBenum corresponding to the selected encoding. It is important that each QTextCodec subclass returns the correct unique value for this method.

Subclasses of QTextCodec must reimplement this function.

Refer to the IANA character-sets encoding file for more information.

QString QTextCodec::name ( ) const
pure virtual

QTextCodec subclasses must reimplement this function. It returns the name of the encoding supported by the subclass.

If the codec is registered as a character set in the IANA character-sets encoding file this method should return the preferred MIME name for the codec if defined, otherwise its name.

void QTextCodec::setCodecForLocale ( QTextCodec *  c)
static

Sets the codec returned by codecForLocale() to c. If c is a nullptr the codec is reset to the default. This might be needed for some applications that want to use their own mechanism for setting the locale.

See also
codecForLocale()

void QTextCodec::setCodecForTr ( QTextCodec *  c)
inline static

Sets the codec used by QObject::tr() on its argument to c. If c is a nullptr tr() assumes Latin-1.

If the literal quoted text in the program is not in the Latin-1 encoding, this function can be used to set the appropriate encoding. For example, software developed by Korean programmers might use eucKR for all the text in the program, in which case the main() function might look like this:

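int main(int argc, char *argv[])
{
   QApplication app(argc, argv);

   // literal text passed to tr() is encoded in eucKR
   QTextCodec::setCodecForTr(QTextCodec::codecForName("eucKR"));
   ...
}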

Note that this is not the way to select the encoding that the user has chosen. For example, to convert an application containing literal English strings to Korean, all that is needed is for the English strings to be passed through tr() and for translation files to be loaded. For details of internationalization refer to Internationalization.

Warning
This method is not conditionally thread safe.
See also
codecForTr(), setCodecForCStrings()

QString QTextCodec::toUnicode ( const char *  input) const

Converts input from the encoding of this codec to Unicode and returns the result in a QString. The value for input contains the source characters.

QString QTextCodec::toUnicode ( const char *  input,
int  len,
ConverterState * state = nullptr 
) const
inline

Converts the first len characters of input from the encoding of this codec to Unicode and returns the result in a QString. The state of the converter used is updated.

QString QTextCodec::toUnicode ( const QByteArray & input) const

Converts input from the encoding of this codec to Unicode, and returns the result in a QString.