Object Serialization and Persistence using QDataStream

By ICS Development Team

Introduction

When I first started using Qt in 2005, one of the classes I found most interesting was QDataStream. It resembled something I had written many years earlier in standard C++ to do the same job with iostreams. I wanted to cover enough of this topic to make it immediately useful to the reader, so this post is a bit on the long side. Some of this material is already covered in the Qt documentation for QDataStream, which I recommend reading first. The source code supports both Qt 4 and Qt 5.

QDataStream is a very useful class for serializing objects to and from a file, socket or memory buffer (or any other QIODevice subclass). Serialization is the process of turning a data structure into a sequence of bytes that can later be restored through deserialization. Much like the C++ Standard Library iostream classes, QDataStream has stream insertion and extraction operators (<< and >>) for all basic C/C++ data types as well as many Qt types such as QString, QDateTime and QVariant. Serializing your own objects is then a matter of decomposing them into those types, or of using the raw read and write functions to handle binary data directly. It is important to note that QDataStream serializes to binary data, not to text as the iostream operators do. The closer Qt relative of iostream is QTextStream.
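To make the binary-versus-text distinction concrete, here is a minimal standard C++ sketch (written without Qt so it compiles anywhere) of how fixed-width integers become bytes in big-endian order, which is QDataStream's default byte order (see QDataStream::setByteOrder()). The helpers appendBigEndian() and readBigEndian() are hypothetical names used only for illustration; QDataStream performs this conversion internally.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Append an unsigned integer to a byte buffer, most significant byte first
// (big-endian), the byte order QDataStream uses unless setByteOrder() is called.
template <typename UInt>
void appendBigEndian(std::vector<std::uint8_t>& buffer, UInt value)
{
    for (int shift = 8 * (static_cast<int>(sizeof(UInt)) - 1); shift >= 0; shift -= 8)
        buffer.push_back(static_cast<std::uint8_t>((value >> shift) & 0xFF));
}

// Read an unsigned integer back from 'buffer' at 'offset' and advance 'offset'.
template <typename UInt>
UInt readBigEndian(const std::vector<std::uint8_t>& buffer, std::size_t& offset)
{
    if (offset + sizeof(UInt) > buffer.size())
        throw std::out_of_range("read past end of buffer");
    UInt value = 0;
    for (std::size_t i = 0; i < sizeof(UInt); ++i)
        value = static_cast<UInt>((value << 8) | buffer[offset + i]);
    offset += sizeof(UInt);
    return value;
}
```

With QDataStream itself, dataStream << quint32( 0x12345678 ) on a default-configured stream produces the same four bytes: 12 34 56 78.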

This article will demonstrate how a user-defined class can use overloaded QDataStream operators for serialization and persistence. Also demonstrated are uses of QFile, QBuffer, QFileInfo, QDir and some Qt debugging and error handling techniques.

There are three classes defined in this project:

UserException is a simple exception class (a subclass of std::exception) that can also provide source filename, line and function-name information about where it originated. It will be used to flag extraction errors. The reason for using an exception for this will become apparent later.

UserRecord is a simple class that mainly has data members of various types. The types were chosen for demonstrating various QDataStream features. This class will include functions for serialization.

UserRecordFile is a utility class for storing a QList of UserRecords in a file. The same functionality could be provided by a few free functions, but in other use cases it can be advantageous to encapsulate the concept of a file format in a class. This class could also be templatized to support saving and loading any type or container of types, or extended to support persisting QLists of pointers to objects.

The general design of the streaming operations in both UserRecord and UserRecordFile is similar to what is in the QDataStream documentation. A header consisting of a relatively unique sequence of bytes is used for error checking, and an integer version number is used to support backwards compatibility when reading a file created with an older version of the application.

In the case of UserRecord, the header sequence is a simple 32-bit unsigned integer. For UserRecordFile, the header sequence includes a text string that gives a hint as to the file type. Many file formats include a short bit of text that helps identify the file type when it is opened as raw data in an editor.
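The header-plus-version check described above can be sketched in standard C++ as follows. This is an illustration under assumptions, not the article's code: the article does not show the actual streamHeader value, so 0x55524543 (the ASCII bytes "UREC") is a placeholder, and kClassVersion = 3 simply matches the highest version referenced in the UserRecord code below. Both values are written big-endian, as QDataStream writes a quint32 and a quint16.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Placeholder values (assumptions): the real streamHeader is not shown in the article.
constexpr std::uint32_t kStreamHeader = 0x55524543; // ASCII "UREC"
constexpr std::uint16_t kClassVersion = 3;

// Writes the 4-byte header and 2-byte version, big-endian, the same layout
// that "dataStream << streamHeader << classVersion" produces.
void writeHeader(std::vector<std::uint8_t>& out)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(kStreamHeader >> shift));
    out.push_back(static_cast<std::uint8_t>(kClassVersion >> 8));
    out.push_back(static_cast<std::uint8_t>(kClassVersion & 0xFF));
}

// Validates the header and returns the version found in the data. Throws if
// the data is too short, the header does not match, or the version is newer
// than this code understands.
std::uint16_t checkHeader(const std::vector<std::uint8_t>& in)
{
    if (in.size() < 6)
        throw std::runtime_error("truncated header");

    std::uint32_t header = 0;
    for (std::size_t i = 0; i < 4; ++i)
        header = (header << 8) | in[i];
    if (header != kStreamHeader)
        throw std::runtime_error("prefix mismatch");

    const auto version = static_cast<std::uint16_t>((in[4] << 8) | in[5]);
    if (version > kClassVersion)
        throw std::runtime_error("unsupported future version " + std::to_string(version));
    return version; // the caller extracts only the fields that exist in 'version'
}
```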

UserRecord

UserRecord is primarily data with a constructor for proper initialization and QDataStream insertion and extraction member functions. The streamHeader and classVersion constants are static members. The types included in this class are: QString, QDate, QImage, QVector, quint64, float, QVariant and a custom enumerated type. Also in this file are global QDataStream insertion and extraction operators (<< and >>) for UserRecord object types. These functions simply call the UserRecord member functions.

class UserRecord
{
public:
    UserRecord( 
        const QString& firstName_ = "",
        const QString& middleName_ = "",
        const QString& lastName_ = "",
        const QDate& birthDate_ = QDate::currentDate() );

    // For simplicity in this example, just allow direct access to these.
    QString firstName;
    QString middleName;
    QString lastName;
    QDate birthDate;
    QImage avatar;
    QVector<quint16> productIds;
    quint64 serialNumber;
    float gpa;
    QVariant variant;
    
    enum MaritalStatusEnum
    {
        MaritalStatus_Unknown,
        MaritalStatus_Single,
        MaritalStatus_Divorced,
        MaritalStatus_Other
    } maritalStatus;

    void insertToDataStream( QDataStream& dataStream ) const;

    // Important: this will throw a UserException on error
    void extractFromDataStream( QDataStream& dataStream ); // deprecated: throw( UserException )

private:
    static const quint32 streamHeader;
    static const quint16 classVersion;
};

QDataStream& operator<<( QDataStream& dataStream, const UserRecord& userRecord );

// Important: this will throw a UserException on error
QDataStream& operator>>( QDataStream& dataStream, UserRecord& userRecord ); // deprecated: throw( UserException )

The stream insertion function insertToDataStream() is relatively simple. It takes a reference to an existing QDataStream instance and uses the Qt-defined QDataStream insertion operators for most of the data members. The enumerated member is stored as a qint32: the size of an enumerated type is not guaranteed to be 32 bits on all compilers and platforms, so we make it explicit. It is also important to note that, for stream versions Qt_4_6 and later, floating point data is streamed as double-precision data by default, so the float value is actually inserted into the stream as a double. When it is extracted, it is converted back to a float. See QDataStream::setFloatingPointPrecision() in the Qt documentation for more information.
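The float-to-double behavior can be modeled in standard C++. The sketch below assumes IEEE 754 float and double types (true on mainstream platforms); it widens a float to double, writes its 64 bits big-endian as QDataStream's default DoublePrecision mode does, and narrows the value back to float on read. The function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

static_assert(sizeof(double) == 8, "this sketch expects a 64-bit double");

// Widens the float to double and writes its 64 bits big-endian, which is how
// QDataStream stores a float in DoublePrecision mode (stream versions Qt_4_6+).
std::vector<std::uint8_t> writeFloatAsDouble(float value)
{
    const double widened = value;
    std::uint64_t bits = 0;
    std::memcpy(&bits, &widened, sizeof bits);
    std::vector<std::uint8_t> out;
    for (int shift = 56; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(bits >> shift));
    return out;
}

// Reads 8 bytes back as a double and narrows the result to float, as
// extracting into a float member does.
float readFloatFromDouble(const std::vector<std::uint8_t>& in)
{
    if (in.size() < 8)
        throw std::out_of_range("need 8 bytes for a double");
    std::uint64_t bits = 0;
    for (std::size_t i = 0; i < 8; ++i)
        bits = (bits << 8) | in[i];
    double widened = 0.0;
    std::memcpy(&widened, &bits, sizeof widened);
    return static_cast<float>(widened);
}
```

The practical consequence is that each float member costs 8 bytes in the stream rather than 4, unless setFloatingPointPrecision( QDataStream::SinglePrecision ) is used consistently on both the writing and the reading side.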

void UserRecord::insertToDataStream( QDataStream& dataStream ) const
{
    dataStream << streamHeader << classVersion;

    // Added in classVersion = 1
    dataStream << firstName << middleName << lastName << birthDate;

    // Added in classVersion = 2
    dataStream << avatar;

    // Added in classVersion = 3
    dataStream << productIds << serialNumber << gpa << variant;

    // Stream enumerated types as qint32
    const qint32 maritalStatus_int = static_cast<qint32>( maritalStatus );
    dataStream << maritalStatus_int;
}

The stream extraction function extractFromDataStream() is a bit more complex. The additional complexity is due to the error handling and backwards compatibility support.

// Important: this will throw a UserException on error
void UserRecord::extractFromDataStream( QDataStream& dataStream ) // deprecated: throw( UserException )
{
    quint32 actualStreamHeader = 0;
    dataStream >> actualStreamHeader;

    if ( actualStreamHeader != streamHeader )
    {
        QString message = QString( 
            "UserRecord::extractFromDataStream() failed.\n"
            "UserRecord prefix mismatch error: actualStreamHeader = 0x%1 and streamHeader = 0x%2" )
                .arg( actualStreamHeader, 8, 0x10, QChar( '0' ) )
                .arg( streamHeader, 8, 0x10, QChar( '0' ) );
        throw UserException_( message );
    }

    quint16 actualClassVersion = 0;
    dataStream >> actualClassVersion;

    if ( actualClassVersion > classVersion )
    {
        QString message = QString( 
            "UserRecord::extractFromDataStream() failed.\n"
            "UserRecord compatibility error: actualClassVersion = %1 and classVersion = %2" )
                .arg( actualClassVersion ).arg( classVersion );
        throw UserException_( message );
    }

    dataStream >> firstName >> middleName >> lastName >> birthDate;

    if ( actualClassVersion >= 2 )
        dataStream >> avatar;

    if ( actualClassVersion >= 3 )
    {
        dataStream >> productIds >> serialNumber >> gpa >> variant;

        // Stream enumerated types as qint32
        qint32 maritalStatus_int = 0;
        dataStream >> maritalStatus_int;
        maritalStatus = static_cast<MaritalStatusEnum>( maritalStatus_int );
    }
}

First, the stream header is read into a temporary variable, actualStreamHeader. If this value doesn't match the class member, an explanatory message is constructed and thrown as a UserException. Note that instead of constructing a UserException directly, the code uses the macro UserException_ (the same name with a trailing underscore). This macro passes along the standard predefined macros __FILE__ and __LINE__, which expand to the source filename and line number, and the Qt macro Q_FUNC_INFO, which expands to a description of the current function whose exact form is compiler dependent. This is not really necessary for this example but can be useful in other debugging scenarios.
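Since the article does not list UserException itself, here is a hypothetical standard C++ sketch of such a class and its UserException_ macro, just to show how the throw site is captured automatically. Only message() corresponds to something the article uses (in UserRecordFile::readFile()); the constructor signature, the other accessors and the use of std::string instead of QString are assumptions. The standard __func__ identifier stands in for Q_FUNC_INFO.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for the article's UserException (std::string instead of QString).
class UserException : public std::exception
{
public:
    UserException(std::string message, std::string file, int line, std::string function)
        : m_message(std::move(message)), m_file(std::move(file)),
          m_line(line), m_function(std::move(function))
    {
        std::ostringstream text;
        text << m_file << ':' << m_line << " in " << m_function << ": " << m_message;
        m_what = text.str();
    }

    const char* what() const noexcept override { return m_what.c_str(); }
    const std::string& message() const { return m_message; }
    const std::string& file() const { return m_file; }
    int line() const { return m_line; }
    const std::string& function() const { return m_function; }

private:
    std::string m_message;
    std::string m_file;
    int m_line;
    std::string m_function;
    std::string m_what; // preformatted text returned by what()
};

// Records where the exception was created. Because this is a macro, __FILE__,
// __LINE__ and __func__ describe the throw site, not this header.
#define UserException_(message) UserException((message), __FILE__, __LINE__, __func__)
```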

After the stream header is extracted and checked, we read the class version number. If the version we read is greater than the static classVersion member, we must be loading data written by a future version of the program. It is possible to write insertion and extraction functions that are forward compatible, but it is difficult, and it is generally reasonable to expect that users will not load files written by a future version of the application. If this situation is detected, an error message is constructed and thrown in the same way as for the stream header mismatch.

Once the error checking is done, most of the values can be read using the QDataStream extraction operators in much the same way that they were inserted. Note that the extraction function checks the actualClassVersion value it read to decide which values to extract. If the version in the data is older, only the values that existed in that version are extracted; the remaining members keep whatever values they already had (the constructor defaults, for a freshly constructed object). Also, the enumerated value is extracted into a temporary qint32 and then cast to the enumerated type. If you find yourself doing this a lot, you can write a template function for it.
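One possible shape for that template function is sketched below. It is written against a generic Stream type so that it compiles without Qt; with QDataStream, std::int32_t should correspond to qint32 (both are int on mainstream platforms, an assumption worth checking on yours). The names writeEnumAsInt32(), readEnumFromInt32() and roundTripThroughStringStream() are hypothetical, and the text-based std::stringstream is used only to exercise the templates.

```cpp
#include <cstdint>
#include <sstream>
#include <type_traits>

// Writes any enumeration as a fixed-width 32-bit signed integer.
template <typename Stream, typename Enum>
Stream& writeEnumAsInt32(Stream& stream, Enum value)
{
    static_assert(std::is_enum<Enum>::value, "Enum must be an enumeration type");
    stream << static_cast<std::int32_t>(value);
    return stream;
}

// Reads a 32-bit signed integer and casts it to the enumeration type.
// No range check is done, just like the cast in extractFromDataStream().
template <typename Stream, typename Enum>
Stream& readEnumFromInt32(Stream& stream, Enum& value)
{
    static_assert(std::is_enum<Enum>::value, "Enum must be an enumeration type");
    std::int32_t raw = 0;
    stream >> raw;
    value = static_cast<Enum>(raw);
    return stream;
}

// Exercises both templates with a text stream, so no Qt is needed to try them.
template <typename Enum>
Enum roundTripThroughStringStream(Enum value)
{
    std::stringstream stream;
    writeEnumAsInt32(stream, value);
    Enum result{};
    readEnumFromInt32(stream, result);
    return result;
}
```

In extractFromDataStream(), the temporary-variable pattern would then shrink to a single call such as readEnumFromInt32( dataStream, maritalStatus );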

UserRecordFile

UserRecordFile is similar to UserRecord in that there is a header and version but these are for the file type and not the object type. To support a file format version string in the header (readable if you open the file in a text editor), the header is a QByteArray.

This class is used by creating an instance and calling the readFile or writeFile functions which return true on success. If false is returned then an error occurred. The error description is stored in the errorString member. This is different from the UserRecord implementation where the error is thrown as an exception. There are three sets of read and write functions. The first set of functions takes a pointer to a QIODevice. The functions that take a QString argument are for files and the ones that take a QByteArray are for memory buffers.

class UserRecordFile
{
public:
    UserRecordFile( void );

    // all read/write functions return true on success. On failure, the error is stored in errorString();

    bool writeFile( QIODevice* ioDevice, const QList<UserRecord>& userRecordList );
    bool readFile( QIODevice* ioDevice, QList<UserRecord>& userRecordList );

    bool writeFile( const QString& filePath, const QList<UserRecord>& userRecordList );
    bool readFile( const QString& filePath, QList<UserRecord>& userRecordList );

    bool writeFile( QByteArray& byteArray, const QList<UserRecord>& userRecordList ); // non-const: the data is written into byteArray
    bool readFile( QByteArray& byteArray, QList<UserRecord>& userRecordList );

    const QString& errorString( void ) const { return m_errorString; }

private:
    QString m_errorString;

    static const QByteArray fileHeaderByteArray;
    static const quint16 fileVersion;
};

The writeFile function that takes a QIODevice pointer can take any QIODevice-derived type. This includes QFile, QBuffer and QTcpSocket. The QString and QByteArray versions use this function to do the actual work. Most of the work in this function is to set up the stream and do error checking.

bool UserRecordFile::writeFile( QIODevice* ioDevice, const QList<UserRecord>& userRecordList )
{
    m_errorString.clear();

    const bool wasOpen = ioDevice->isOpen();

    if ( wasOpen || ioDevice->open( QIODevice::WriteOnly ) )
    {
        QDataStream dataStream( ioDevice );
        dataStream.setVersion( QDataStream::Qt_4_6 );

        // Don't use the << operator for QByteArray. See the note in readFile() below.
        dataStream.writeRawData( fileHeaderByteArray.constData(), fileHeaderByteArray.size() );
        dataStream << fileVersion;
        dataStream << userRecordList;

        if ( !wasOpen )
            ioDevice->close(); // Only close this if it was opened by this function.
        return true;
    }
    else
    {
        m_errorString = ioDevice->errorString();
        return false;
    }
}

The function starts by clearing the error string and recording whether the QIODevice was already open. If it was not, this function opens it and closes it again when done; otherwise it does neither. If the open fails, the errorString member is set to the QIODevice's errorString() value and the function returns failure. Devices in either state are supported because a QTcpSocket would always be passed in already open, while QFile and QBuffer can rely on this function's common open/close handling, which limits code duplication. The actual writing of the QList<UserRecord> is done in one line; that part could be moved into a virtual function, making this class a good starting point for supporting many file formats.

If the open succeeds, or the device was already open, a QDataStream adapter is created on the QIODevice. The stream version is set to Qt_4_6, which fixes the binary format regardless of whether the application is built with Qt 4 (4.6 or later) or Qt 5. The version affects how some data is encoded. For example, with stream versions prior to Qt_4_6, floats were written with 32-bit precision and doubles with 64-bit precision; from Qt_4_6 on, both are written as 64-bit doubles by default. The stream version could conceivably be stored in the file header, but it is probably better to keep it explicit and fixed for a given file format.

The file header QByteArray is then written as raw data. Its size is not written, as it normally would be for a variable-length array of bytes, because the header always has the same size as the static member variable, which should never change for a given file format. We could have used the QDataStream insertion operator to store the QByteArray, but then, if the wrong file type were read in readFile(), its first 4 bytes would be interpreted as the array size, which could result in excessive memory allocation before we determine that this was the wrong file type.
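The risk is easy to quantify. The standard C++ sketch below takes the first four bytes of a ZIP archive ("PK" followed by 0x03 0x04) as an example of a foreign file and shows what they become when read as a big-endian 32-bit length, which is how QDataStream prefixes a QByteArray. It then shows the fixed-size raw comparison that readFile() uses instead. The header text "UserRecordFile" is an assumed placeholder; the article does not show the real fileHeaderByteArray contents.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// First four bytes of a ZIP archive: an example of "some other file type".
constexpr std::array<std::uint8_t, 4> kZipSignature = {0x50, 0x4B, 0x03, 0x04};

// How a length-prefixed read would see those bytes: as a big-endian 32-bit
// length, the way QDataStream prefixes a QByteArray.
std::uint32_t interpretAsLengthPrefix(const std::uint8_t* bytes)
{
    return (std::uint32_t(bytes[0]) << 24) | (std::uint32_t(bytes[1]) << 16) |
           (std::uint32_t(bytes[2]) << 8) | std::uint32_t(bytes[3]);
}

// The approach readFile() takes instead: compare exactly as many raw bytes as
// the expected header contains, so the input never dictates a size.
bool headerMatches(const std::uint8_t* bytes, std::size_t available,
                   const char* expectedHeader, std::size_t headerLength)
{
    return available >= headerLength &&
           std::memcmp(bytes, expectedHeader, headerLength) == 0;
}
```

Read as a length, those four bytes (0x504B0304) ask for 1,347,093,252 bytes, about 1.35 GB, while the raw comparison simply fails after looking at a handful of bytes.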

The file version is then written, and the QList of UserRecords is inserted using the << operator. QList's insertion operator writes the number of elements and then inserts each element into the stream. If the QIODevice was opened by this function, it is closed, and the function returns true.
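The container framing can be pictured with a short standard C++ sketch: a 32-bit element count followed by each element, written by the element's own insertion function. This mirrors how QList<T>'s operator<< lays out data in the Qt 4 and Qt 5 stream formats; the helper names and the simple length-prefixed string used as the element type are assumptions for illustration (QString itself is written as UTF-16, so its real encoding differs).

```cpp
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Big-endian 32-bit unsigned integer, as QDataStream writes a quint32 by default.
void appendUInt32(Bytes& out, std::uint32_t value)
{
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>(value >> shift));
}

// Container framing: element count first, then every element written by the
// element type's own insertion function.
template <typename T, typename WriteElement>
void appendList(Bytes& out, const std::vector<T>& items, WriteElement writeElement)
{
    appendUInt32(out, static_cast<std::uint32_t>(items.size()));
    for (const T& item : items)
        writeElement(out, item);
}

// Hypothetical element writer: byte length followed by the raw characters.
void appendString(Bytes& out, const std::string& text)
{
    appendUInt32(out, static_cast<std::uint32_t>(text.size()));
    out.insert(out.end(), text.begin(), text.end());
}
```

For the two strings "John" and "Tyler", this produces 4 + (4 + 4) + (4 + 5) = 21 bytes.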

The readFile function also keeps track of whether the QIODevice was already open and opens it if required. The error handling for failure is similar to that in UserRecord.

bool UserRecordFile::readFile( QIODevice* ioDevice, QList<UserRecord>& userRecordList )
{
    userRecordList.clear();
    m_errorString.clear();

    const bool wasOpen = ioDevice->isOpen();

    if ( wasOpen || ioDevice->open( QIODevice::ReadOnly ) )
    {
        QDataStream dataStream( ioDevice );
        dataStream.setVersion( QDataStream::Qt_4_6 );

        // Note: we could have used the QDataStream << and >> operators on QByteArray but since the first
        // bytes of the stream will be the size of the array, we might end up attempting to allocate
        // a large amount of memory if the wrong file type was read. Instead, we'll just read the
        // same number of bytes that are in the array we are comparing it to. No size was written.
        const int len = fileHeaderByteArray.size();
        QByteArray actualFileHeaderByteArray( len, '\0' );
        dataStream.readRawData( actualFileHeaderByteArray.data(), len );

        if ( actualFileHeaderByteArray != fileHeaderByteArray )
        {
            // prefixes don't match
            m_errorString = QString( "UserRecordFile::readFile() failed. UserRecordFile prefix mismatch error." );
            if ( !wasOpen ) // Only close this if it was opened by this function.
                ioDevice->close();
            return false;
        }

        quint16 actualFileVersion = 0;
        dataStream >> actualFileVersion;

        if ( actualFileVersion > fileVersion )
        {
            // file is from a future version that we don't know how to load
            m_errorString = QString( 
                "UserRecordFile::readFile() failed.\n"
                "UserRecordFile compatibility error: actualFileVersion = %1 and fileVersion = %2" )
                    .arg( actualFileVersion ).arg( fileVersion );
            if ( !wasOpen ) // Only close this if it was opened by this function.
                ioDevice->close();
            return false;
        }

        try
        {
            // This may throw an exception if one of the UserRecord objects is corrupt or unsupported.
            // For example, if this file is from a future version of this code.
            dataStream >> userRecordList;
        }
        catch ( const UserException& except )
        {
            // Uses the overloaded ostream << operator defined in UserException.h
            std::cerr << except << std::endl;

            m_errorString = except.message();
            if ( !wasOpen )
                ioDevice->close();
            return false;
        }

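        // A truncated stream does not necessarily throw: QDataStream records the
        // failure in its status and yields zero or empty values instead.
        if ( dataStream.status() != QDataStream::Ok )
        {
            m_errorString = QString( "UserRecordFile::readFile() failed. The stream is truncated or corrupt." );
            if ( !wasOpen )
                ioDevice->close();
            return false;
        }

        if ( !wasOpen )
            ioDevice->close();
        return true;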
    }
    else
    {
        m_errorString = ioDevice->errorString();
        return false;
    }
}

As with the extractFromDataStream function for UserRecord, reading the file is more complicated than writing it because of the error handling. Here, instead of throwing exceptions, we return a success flag and store the error in a member variable. This is one of the reasons a class instance was used. If we had wanted to use free functions instead, the error string could have been the return value, with an empty string indicating success, or it could have been passed as a reference argument. Exceptions could also have been used.

The first data read is the file header. We use the QDataStream::readRawData() function and read only the number of bytes we expect the header to contain. The actual file header is compared to the expected one, much like in UserRecord, and if they don't match, the errorString member variable is set and failure is returned. Then the file version is read and checked to make sure we aren't trying to read an unsupported version. There is currently only a single file version, so this value is used only to reject unsupported versions.

The QList<UserRecord> is then extracted from the stream inside a try-catch block, and this is where using an exception pays off. The QList<UserRecord> extraction operator reads each UserRecord from the stream in turn, and the only way for a UserRecord to interrupt it on error is to throw an exception. In complex data structures, the failure point may be nested even more deeply. Throwing an exception makes error handling much cleaner in this situation. If an exception is caught, it is printed to the console and the errorString member is set. The QIODevice is closed if required and failure is returned.
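The control flow that makes exceptions attractive here can be sketched in standard C++. In the hypothetical example below, readList() plays the role of QList's extraction operator: it only knows how many elements to read, so a failing element reader has no way to stop it except by throwing. tryReadList() mirrors the pattern in UserRecordFile::readFile() by catching the exception at the top, storing the message and returning a failure flag. All names, including the CorruptElement exception type, are assumptions.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical per-element error, standing in for UserException.
struct CorruptElement : std::runtime_error
{
    using std::runtime_error::runtime_error;
};

// Generic container reader: it knows the element count but has no channel for
// reporting an element-level failure, so the element reader throws instead.
template <typename T, typename ReadElement>
std::vector<T> readList(std::size_t count, ReadElement readElement)
{
    std::vector<T> items;
    for (std::size_t i = 0; i < count; ++i)
        items.push_back(readElement(i)); // a throw here unwinds out of the loop
    return items;
}

// Caller-side pattern: catch at the top, record the message, return a flag.
template <typename T, typename ReadElement>
bool tryReadList(std::size_t count, ReadElement readElement,
                 std::vector<T>& result, std::string& errorString)
{
    try {
        result = readList<T>(count, readElement);
        return true;
    } catch (const CorruptElement& error) {
        result.clear();
        errorString = error.what();
        return false;
    }
}
```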

The writeFile() function that takes a file path will use the strategy of writing to a temp file to help ensure that a failed file write doesn't clobber a good file. A backup of an existing file is also created. The corresponding readFile() function is relatively simple. It checks to make sure that the file exists, and if it doesn't, it returns an error. A QFile is then created and passed to the readFile() function that takes a QIODevice pointer. The versions of these functions that take QByteArray use the QBuffer adapter which allows a QByteArray to be treated like a QIODevice.
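The article describes, but does not list, the temp-file and backup strategy. A minimal standard C++17 sketch of one way to do it follows; the .tmp and .bak suffixes and the function name writeFileSafely() are assumptions. The new data is written completely to a temporary file, the existing file is copied to a backup, and the temporary file is then renamed over the original, so a failed write leaves the existing file untouched.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Writes 'contents' to 'target' without ever leaving a partially written
// target file. Returns false and sets errorString on failure.
bool writeFileSafely(const fs::path& target, const std::string& contents, std::string& errorString)
{
    const fs::path tempPath = target.string() + ".tmp";
    const fs::path backupPath = target.string() + ".bak";
    std::error_code ec;

    // 1. Write all of the new data to a temporary file next to the target.
    {
        std::ofstream out(tempPath, std::ios::binary | std::ios::trunc);
        out.write(contents.data(), static_cast<std::streamsize>(contents.size()));
        out.close();
        if (!out) {
            errorString = "could not write " + tempPath.string();
            fs::remove(tempPath, ec); // best-effort cleanup
            return false;
        }
    }

    // 2. Keep a copy of the existing file, if any, as a backup.
    if (fs::exists(target, ec)) {
        fs::copy_file(target, backupPath, fs::copy_options::overwrite_existing, ec);
        if (ec) {
            errorString = "could not create backup: " + ec.message();
            fs::remove(tempPath, ec);
            return false;
        }
    }

    // 3. Replace the target with the completed temporary file.
    fs::rename(tempPath, target, ec);
    if (ec) {
        errorString = "could not replace target: " + ec.message();
        return false;
    }
    return true;
}
```

Copying rather than renaming the old file into the backup means the target path always holds a complete file; on POSIX systems the final rename replaces the target in a single step. Qt 5.1 and later also offer QSaveFile, which implements the temporary-file-and-commit part of this approach (but not the backup).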

The main() function is used for demonstrating and testing these classes.

int main(int argc, char *argv[])
{
#if QT_VERSION >= 0x050000
    qInstallMessageHandler( myMessageOutput );
#else
    qInstallMsgHandler( myMessageOutput );
#endif

    QApplication application(argc, argv);

#ifdef Q_OS_WIN
    // On Windows, show the console and redirect stdout and stderr.
    AllocConsole();
    FILE* stream = NULL;
    freopen_s( &stream, "CONOUT$", "w", stdout );
    freopen_s( &stream, "CONOUT$", "w", stderr );
#endif

    // Generate some fake data records.

    const int numRecords = 10;
    QList<UserRecord> userRecordList;

    const char* const names[ 3 * numRecords ] = { 
        "George", "", "Washington",
        "John", "", "Adams",
        "Thomas", "", "Jefferson",
        "James", "", "Madison",
        "James", "", "Monroe",
        "John", "Quincy", "Adams",
        "Andrew", "", "Jackson",
        "Martin", "", "Van Buren",
        "William", "Henry", "Harrison",
        "John", "", "Tyler"
    };

    for ( int i = 0 ; i < numRecords ; ++i ) 
    {
        qDebug() << names[ i*3 ];

        userRecordList.append( UserRecord( names[ i*3 ], names[ i*3+1 ], names[ i*3+2 ] ) );
    }

    // Write it to a file and read it back.
#if QT_VERSION >= 0x050000
    QDir outputDir( QStandardPaths::writableLocation( QStandardPaths::DataLocation ) );
#else
    QDir outputDir( QDesktopServices::storageLocation( QDesktopServices::DataLocation ) );
#endif

    if ( !outputDir.exists() )
    {
        std::cout << "creating: " << outputDir.absolutePath() << std::endl;
        outputDir.mkpath(".");
    }

    // or use the application directory ...
    // QDir outputDir( QCoreApplication::applicationDirPath() );
    QString filePath = outputDir.absoluteFilePath( "UserDataFile.userDataList" );

    std::cout << "Writing to: " << filePath << std::endl;

    // write the file data
    { // userRecordFile scope
        UserRecordFile userRecordFile;
        if ( !userRecordFile.writeFile( filePath, userRecordList ) )
            std::cout << "An error occurred: " << userRecordFile.errorString() << std::endl;
    }

    // read the file data
    { 
        UserRecordFile userRecordFile;
        QList<UserRecord> fileUserRecordList;
        if ( !userRecordFile.readFile( filePath, fileUserRecordList ) )
            std::cout << "An error occurred: " << userRecordFile.errorString() << std::endl;
    }

    // Write it to a byteArray and read it back.

    QByteArray byteArray;

    { // write the byteArray data
        UserRecordFile userRecordFile;
        if ( !userRecordFile.writeFile( byteArray, userRecordList ) )
            std::cout << "An error occurred: " << userRecordFile.errorString() << std::endl;
    }

    { // read the byteArray data
        UserRecordFile userRecordFile;
        QList<UserRecord> fileUserRecordList;
        if ( !userRecordFile.readFile( byteArray, fileUserRecordList ) )
            std::cout << "An error occurred: " << userRecordFile.errorString() << std::endl;
    }

    QPushButton button( "Press to Close" );
    QObject::connect( &button, SIGNAL( clicked() ), &application, SLOT( quit() ) );
    button.show();

    return application.exec();
}

There are a few tips and tricks here for debugging to the console, such as overloading the std::ostream << operator so that QStrings can be streamed to std::cout. A custom Qt message handler is also installed, which writes to stderr. Although this is configured as a GUI application, running it from the command line on OS X or Linux is sufficient for the output to appear in the console. On Windows, a console is created and stdout and stderr are redirected to it.

A QList of UserRecords is created and populated with fake data, and some of the data is output to the console using qDebug(). This list is used to test the UserRecordFile write and read functions. First, we determine a suitable output location for testing: QStandardPaths (Qt 5) or QDesktopServices (Qt 4) is used to find the application data directory for this user. If this directory doesn't already exist, it is created.

Next, UserRecordFile is used to perform two write-and-read round trips: first to the data file and then to a QByteArray. If errors occur, they are printed to the console. In a real GUI application, a QMessageBox would be used instead.

There are many different approaches to handling serialization and file output. This example provides a methodology that is consistent with that provided in the documentation for Qt and expands on it to demonstrate different methods of error handling.

You can download a zip archive of the source files for the application from here.