Follow @Openwall on Twitter for new release announcements and other news
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Date: Sat, 24 Dec 2016 16:18:33 +0100
From: Solar Designer <solar@...nwall.com>
To: oss-security@...ts.openwall.com
Subject: Qt QXmlSimpleReader

Hi,

To what extent has Qt's QXmlSimpleReader class been reviewed for
vulnerabilities?  I found only Florian Weimer's CVE-2013-4549
"XML entity expansion denial of service", which Red Hat somehow chose
not to fix (no intent to parse untrusted XML?) even though they got
upstream to fix it.

https://bugzilla.redhat.com/show_bug.cgi?id=955375
http://lists.qt-project.org/pipermail/announce/2013-December/000036.html
https://codereview.qt-project.org/#/c/71010/
http://blog.qt.io/blog/2014/04/24/qt-4-8-6-released/

Is high memory consumption for large XML files/inputs expected by users
of this library, or is there an expectation that there would be some
safety limits in place in the library?  In my testing, for very long tag
names or element contents, memory consumption is 4x their size - e.g.,
about 8 GB for an almost 2 GB tag or element.

I guess CVE-2013-4549 was worse than that?  Did it result in recursive
expansion, meaning that even a tiny input would exhaust all memory?
(I didn't try triggering it specifically.)

I just found that (at least for a rebuild of the RHEL7 package of
qt-4.8.5-12) it is possible to trigger a stack overflow by nesting many
XML opening tags.  Luckily, there doesn't appear to be a way to jump
over the guard page to another thread's stack on RHEL7/x86_64, but
that's platform specific.

Sample crash 1:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fd13c395700 (LWP 875129)]
0x00007fd176289ee1 in _int_malloc () from /lib64/libc.so.6
(gdb) bt
#0  0x00007fd176289ee1 in _int_malloc () from /lib64/libc.so.6
#1  0x00007fd17628c26c in malloc () from /lib64/libc.so.6
#2  0x00007fd176b450cd in operator new(unsigned long) () from /lib64/libstdc++.so.6
#3  0x00007fd1776365b0 in QDomDocumentPrivate::createElement(QString const&) () from /lib64/libQtXml.so.4
#4  0x00007fd17763c3a0 in QDomHandler::startElement(QString const&, QString const&, QString const&, QXmlAttributes const&) () from /lib64/libQtXml.so.4
#5  0x00007fd177650c27 in QXmlSimpleReaderPrivate::parseElement() () from /lib64/libQtXml.so.4
#6  0x00007fd177651398 in QXmlSimpleReaderPrivate::parseContent() () from /lib64/libQtXml.so.4
#7  0x00007fd177650b00 in QXmlSimpleReaderPrivate::parseElement() () from /lib64/libQtXml.so.4
#8  0x00007fd177651398 in QXmlSimpleReaderPrivate::parseContent() () from /lib64/libQtXml.so.4
#9  0x00007fd177650b00 in QXmlSimpleReaderPrivate::parseElement() () from /lib64/libQtXml.so.4
#10 0x00007fd177651398 in QXmlSimpleReaderPrivate::parseContent() () from /lib64/libQtXml.so.4
#11 0x00007fd177650b00 in QXmlSimpleReaderPrivate::parseElement() () from /lib64/libQtXml.so.4
#12 0x00007fd177651398 in QXmlSimpleReaderPrivate::parseContent() () from /lib64/libQtXml.so.4

   0x00007fd176289ed0 <+384>:   callq  0x7fd1762879a0 <malloc_consolidate>
   0x00007fd176289ed5 <+389>:   mov    0x8(%rsp),%r9
   0x00007fd176289eda <+394>:   mov    (%rsp),%r10d
   0x00007fd176289ede <+398>:   mov    %r14d,%eax
=> 0x00007fd176289ee1 <+401>:   movq   $0x1,(%rsp)
   0x00007fd176289ee9 <+409>:   lea    0x58(%rbp),%r15
   0x00007fd176289eed <+413>:   shr    $0x4,%eax
   0x00007fd176289ef0 <+416>:   mov    %r9,%r12
   0x00007fd176289ef3 <+419>:   mov    %eax,0x8(%rsp)

(gdb) p $rsp
$1 = (void *) 0x7fd13c295fd0

7fd13c295000-7fd13c296000 ---p 00000000 00:00 0 
7fd13c296000-7fd13e361000 rw-p 00000000 00:00 0                          [stack:875129]

Sample crash 2 (combining nested tags with long content in each element,
but maybe the difference from the above crash is actually just luck):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f21a85f7700 (LWP 876178)]
0x00007f21c2a405a9 in sysmalloc () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f21c2a405a9 in sysmalloc () from /lib64/libc.so.6
#1  0x00007f21c2a416c5 in _int_malloc () from /lib64/libc.so.6
#2  0x00007f21c2a4241c in _int_realloc () from /lib64/libc.so.6
#3  0x00007f21c2a438b2 in realloc () from /lib64/libc.so.6
#4  0x00007f21c366eb68 in QString::realloc(int) () from /lib64/libQtCore.so.4
#5  0x00007f21c366ec8d in QString::resize(int) () from /lib64/libQtCore.so.4
#6  0x00007f21c3df6a8d in updateValue(QString&, QChar const*, int&, int&) () from /lib64/libQtXml.so.4
#7  0x00007f21c3e08706 in QXmlSimpleReaderPrivate::parseContent() () from /lib64/libQtXml.so.4
#8  0x00007f21c3e07b00 in QXmlSimpleReaderPrivate::parseElement() () from /lib64/libQtXml.so.4
#9  0x00007f21c3e08398 in QXmlSimpleReaderPrivate::parseContent() () from /lib64/libQtXml.so.4
#10 0x00007f21c3e07b00 in QXmlSimpleReaderPrivate::parseElement() () from /lib64/libQtXml.so.4
#11 0x00007f21c3e08398 in QXmlSimpleReaderPrivate::parseContent() () from /lib64/libQtXml.so.4
#12 0x00007f21c3e07b00 in QXmlSimpleReaderPrivate::parseElement() () from /lib64/libQtXml.so.4

   0x00007f21c2a40596 <sysmalloc+38>:   je     0x7f21c2a406ca <sysmalloc+346>
   0x00007f21c2a4059c <sysmalloc+44>:   cmp    %rdi,0x33dc0d(%rip)        # 0x7f21c2d7e1b0 <mp_+16>
   0x00007f21c2a405a3 <sysmalloc+51>:   jbe    0x7f21c2a406b8 <sysmalloc+328>
=> 0x00007f21c2a405a9 <sysmalloc+57>:   movb   $0x0,0x8(%rsp)
   0x00007f21c2a405ae <sysmalloc+62>:   mov    0x58(%rbx),%r12
   0x00007f21c2a405b2 <sysmalloc+66>:   lea    0x33e1a7(%rip),%r15        # 0x7f21c2d7e760 <main_arena>
   0x00007f21c2a405b9 <sysmalloc+73>:   mov    0x8(%r12),%r13

(gdb) p $rsp
$1 = (void *) 0x7f21a83f7fd0

7f21a83f7000-7f21a83f8000 ---p 00000000 00:00 0 
7f21a83f8000-7f21a85f8000 rw-p 00000000 00:00 0                          [stack:876178]

So parseElement() and parseContent() call each other recursively.

Since thread stacks are typically tiny this issue is easy to trigger
with small XML files/streams.

Looking further at the source code for a version of Qt similar to RHEL7's:

http://cep.xray.aps.anl.gov/software/qt4-x11-4.8.6-browser/d6/d5d/qxml_8cpp_source.html

I notice this:

 8187 inline static void updateValue(QString &value, const QChar *array, int &arrayPos, int &valueLen)
 8188 {
 8189     value.resize(valueLen + arrayPos);
 8190     memcpy(value.data() + valueLen, array, arrayPos * sizeof(QChar));
 8191     valueLen += arrayPos;
 8192     arrayPos = 0;
 8193 }

Here, arrayPos is at most 256, but valueLen might be large.  Can this
signed int overflow on "valueLen + arrayPos" and "valueLen += arrayPos"?
I guess so.  (Unfortunately, the specific app I am testing with limits
me to passing < 2 GB, so I can't test this easily.  Someone should write
a testcase.)

This would be UB, but since it's not detectable as UB at compile time, I
guess in practice and in typical builds the value will wrap around.

Then there's value.resize(), which also accepts a signed int (so the
above code's use of signed int may have been justified, after all):

http://doc.qt.io/qt-4.8/qstring.html#resize

"If size is greater than the current size, the string is extended to
make it size characters long with the extra characters added to the end.
The new characters are uninitialized.

If size is less than the current size, characters are removed from the end."

No clear explanation on what will happen on a negative size, and besides
it might also be possible to exceed 4 GB and get to positive values again.

Is there anything at higher layers, yet applicable to all published Qt's
APIs, consistenly limiting XML inputs to below 2 GB?  If so, this may be
OK (but a comment would be nice).  If not, we have a problem.

Speaking of the CVE-2013-4549 fix, it also looks susceptible to integer
overflows, albeit possibly only for inputs so ridiculously large that
mitigating that original DoS is irrelevant?  In the code below
"++referencesToOtherEntities[toSearch][entityName];" and/or
"*expandedIt += expandedSizes.value(referenceTo) * references +
literalEntitySizes.value(referenceTo) * references;" might overflow,
including possibly in ways that the second main loop doesn't run or that
the check "*expandedIt > entityCharacterLimit" is false.

  429     // The entity at (QMap<QString,) referenced the entities at (QMap<QString,) (int>) times.
  430     QHash<QString, QHash<QString, int> > referencesToOtherEntities;
  431     QHash<QString, int> expandedSizes;

 6651 bool QXmlSimpleReaderPrivate::isExpandedEntityValueTooLarge(QString *errorMessage)
 6652 {
 6653     QString entityNameBuffer;
 6654 
 6655     // For every entity, check how many times all entity names were referenced in its value.
 6656     for (QMap<QString,QString>::const_iterator toSearchIt = entities.constBegin();
 6657          toSearchIt != entities.constEnd();
 6658          ++toSearchIt) {
 6659         const QString &toSearch = toSearchIt.key();
 6660 
 6661         // Don't check the same entities twice.
 6662         if (!literalEntitySizes.contains(toSearch)) {
 6663             // The amount of characters that weren't entity names, but literals, like 'X'.
 6664             QString leftOvers = entities.value(toSearch);
 6665             // How many times was entityName referenced by toSearch?
 6666             for (QMap<QString,QString>::const_iterator referencedIt = entities.constBegin();
 6667                  referencedIt != entities.constEnd();
 6668                  ++referencedIt) {
 6669                 const QString &entityName = referencedIt.key();
 6670 
 6671                 for (int i = 0; i < leftOvers.size() && i != -1; ) {
 6672                     entityNameBuffer = QLatin1Char('&') + entityName + QLatin1Char(';');
 6673 
 6674                     i = leftOvers.indexOf(entityNameBuffer, i);
 6675                     if (i != -1) {
 6676                         leftOvers.remove(i, entityName.size() + 2);
 6677                         // The entityName we're currently trying to find was matched in this string; increase our count.
 6678                         ++referencesToOtherEntities[toSearch][entityName];
 6679                     }
 6680                 }
 6681             }
 6682             literalEntitySizes[toSearch] = leftOvers.size();
 6683         }
 6684     }
 6685 
 6686     for (QHash<QString, QHash<QString, int> >::const_iterator entityIt = referencesToOtherEntities.constBegin();
 6687          entityIt != referencesToOtherEntities.constEnd();
 6688          ++entityIt) {
 6689         const QString &entity = entityIt.key();
 6690 
 6691         QHash<QString, int>::iterator expandedIt = expandedSizes.find(entity);
 6692         if (expandedIt == expandedSizes.end()) {
 6693             expandedIt = expandedSizes.insert(entity, literalEntitySizes.value(entity));
 6694             for (QHash<QString, int>::const_iterator referenceIt = entityIt->constBegin();
 6695                  referenceIt != entityIt->constEnd();
 6696                  ++referenceIt) {
 6697                 const QString &referenceTo = referenceIt.key();
 6698                 const int references = referencesToOtherEntities.value(entity).value(referenceTo);
 6699                 // The total size of an entity's value is the expanded size of all of its referenced entities, plus its literal size.
 6700                 *expandedIt += expandedSizes.value(referenceTo) * references + literalEntitySizes.value(referenceTo) * references;
 6701             }
 6702 
 6703             if (*expandedIt > entityCharacterLimit) {
 6704                 if (errorMessage) {
 6705                     *errorMessage = QString::fromLatin1("The XML entity \"%1\" expands to a string that is too large to process (%2 characters > %3).")
 6706                         .arg(entity, *expandedIt, entityCharacterLimit);
 6707                 }
 6708                 return true;
 6709             }
 6710         }
 6711     }
 6712     return false;
 6713 }

I'd appreciate any comments, especially from Florian and from upstream.

I am Bcc'ing this to the address given at
https://wiki.qt.io/Qt_Project_Security_Policy so that they have this
message with the Message-ID for replies to the same thread, and I will
also notify them separately.

Alexander

Powered by blists - more mailing lists

Please check out the Open Source Software Security Wiki, which is counterpart to this mailing list.

Confused about mailing lists and their use? Read about mailing lists on Wikipedia and check out these guidelines on proper formatting of your messages.