July 2013
Please note that republishing this article in full or in part is only allowed under the conditions described here.
Dubious HTTP II - Unusual HTTP Content-Encodings
The Content-Encoding header is usually used to specify how the content is compressed. The usual values are gzip (RFC1952) and deflate (RFC1951). While combining these encodings does not make much sense, the HTTP standard (RFC2616) allows other content encodings and also allows multiple encodings to be applied.
To determine the behavior of the browsers I tested with:
- Microsoft Internet Explorer (MSIE) versions 8 and 10
- Firefox 22
- Google Chrome 28
- Opera 12.15 (before WebKit)
- Rekonq (KDE project) 2.2.1 - Konqueror (KDE) seems to behave the same
To evaluate the behavior of intermediate systems I let virustotal.com (2013/7/1) check some URLs with unusual content-encodings. I also looked at the source code of common IDS:
- Bro IDS 2.1
- Snort IDS 2.9.4.6
- Suricata IDS 1.4.3
To reproduce the results you might point your browser to my test site or set up your own using my test suite.
Supported Encodings
- Microsoft Internet Explorer and Bro IDS seem to support only gzip and deflate
- Firefox, Google Chrome, Opera, Rekonq and virustotal.com also support deflate according to RFC1950, i.e. deflate data wrapped in a zlib header and checksum
- Firefox, Chrome, Opera, Rekonq, virustotal.com and Snort IDS support x-gzip as an alias to gzip
- Opera and Rekonq support x-deflate as an alias to deflate
- Suricata IDS supports gzip and x-gzip, but no deflate
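The formats behind these names differ only in their framing, which is easy to see with Python's standard zlib and gzip modules. The following is a minimal sketch for generating test payloads; the function name encode_body and the labels "zlib" and "raw-deflate" are made up for this illustration and are not HTTP tokens.

```python
import gzip
import zlib


def encode_body(data: bytes, encoding: str) -> bytes:
    """Compress data in one of the three framings (labels are illustrative)."""
    if encoding == "gzip":
        # RFC 1952: gzip header (magic 1f 8b) plus CRC-32 trailer
        return gzip.compress(data)
    if encoding == "zlib":
        # RFC 1950: 2-byte zlib header plus Adler-32 checksum
        return zlib.compress(data)
    if encoding == "raw-deflate":
        # RFC 1951: bare deflate stream; negative wbits disables the framing
        compressor = zlib.compressobj(wbits=-zlib.MAX_WBITS)
        return compressor.compress(data) + compressor.flush()
    raise ValueError(f"unknown encoding label: {encoding!r}")


if __name__ == "__main__":
    sample = b"unusual content-encodings " * 10
    for label in ("gzip", "zlib", "raw-deflate"):
        body = encode_body(sample, label)
        print(label, len(body), body[:2].hex())
```

Serving each of these bodies with "Content-Encoding: deflate" or "Content-Encoding: gzip" is one way to check which framings a given client accepts.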
Interpretation of Content-Encoding Header
- MSIE and Rekonq do not support continuation lines (e.g. "Content-encoding:\r\n gzip", where the value continues on a line starting with whitespace)
- Opera and Firefox interpret "Content-encoding: gzip x" as gzip
- Firefox also interprets "Content-encoding: x gzip" as gzip
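To try these header variants against a client, the raw response has to be written by hand, since most HTTP libraries normalize headers. The sketch below builds such responses as bytes; build_response and the VARIANTS table are hypothetical names for this example, and the folded variant assumes the continuation line starts with a single space.

```python
import gzip


# Header lines to test; keys are illustrative labels, not part of any standard.
VARIANTS = {
    "plain": "Content-Encoding: gzip",
    "folded": "Content-Encoding:\r\n gzip",       # continuation line
    "trailing-token": "Content-Encoding: gzip x",
    "leading-token": "Content-Encoding: x gzip",
}


def build_response(body: bytes, header_line: str) -> bytes:
    """Return a complete HTTP response whose body is always gzip-compressed."""
    payload = gzip.compress(body)
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        f"Content-Length: {len(payload)}\r\n"
        f"{header_line}\r\n"
        "\r\n"
    )
    return head.encode("latin-1") + payload


if __name__ == "__main__":
    for label, line in VARIANTS.items():
        response = build_response(b"test body\n", line)
        print(label, len(response))
```

A client that shows "test body" for a variant has treated the header as gzip; one that shows binary garbage has ignored it.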
Mismatch Between Specified and Real Encoding
- if deflate is specified but gzip content is provided, MSIE8, MSIE10 and Opera detect this and apply gzip decoding; the reverse case, specifying gzip and sending deflate, is not detected
- the others don't try to guess encoding
- virustotal.com reports invalid data if real and specified encoding don't match
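Guessing the real encoding is straightforward because gzip data starts with the magic bytes 1f 8b. The following sketch shows one way a decoder could sniff the payload instead of trusting the header. It is an assumption about how such detection might work, not the actual logic of any of the tested browsers, and decode_sniffing is a made-up name.

```python
import zlib


def decode_sniffing(payload: bytes) -> bytes:
    """Decode a compressed body by inspecting it, ignoring the declared encoding."""
    if payload[:2] == b"\x1f\x8b":
        # gzip magic found: 16 + MAX_WBITS tells zlib to expect gzip framing
        return zlib.decompress(payload, 16 + zlib.MAX_WBITS)
    try:
        # try zlib framing (RFC 1950) first
        return zlib.decompress(payload)
    except zlib.error:
        # fall back to a raw deflate stream (RFC 1951); heuristic only
        return zlib.decompress(payload, -zlib.MAX_WBITS)


if __name__ == "__main__":
    import gzip

    original = b"mismatched encoding " * 10
    # Body is gzip although the (imagined) header said deflate.
    print(decode_sniffing(gzip.compress(original)) == original)
```

A security system that trusts the header while the browser sniffs the content, or the other way round, ends up analyzing different data than the user receives.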
Stacking of Multiple Content-Encodings
- Chrome, Firefox and Rekonq can handle multiple encodings, like double gzip or deflate followed by gzip
- the rest only understand a single encoding, but they differ in how they react to several:
  - Opera tries to use the first encoding
  - virustotal.com tries to use the last encoding
  - MSIE8 and MSIE10 ignore any encoding and will use the raw data
- based on the source code, Bro IDS and Snort IDS seem to use the last encoding, while Suricata uses the first (but Suricata only knows about gzip/x-gzip anyway)
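The difference between full and single-layer decoding can be sketched as follows. RFC2616 requires encodings to be listed in the order they were applied, so decoding has to run in reverse. All function names here are hypothetical, and "deflate" is assumed to mean the zlib framing.

```python
import gzip
import zlib
from typing import List

ENCODERS = {"gzip": gzip.compress, "x-gzip": gzip.compress, "deflate": zlib.compress}
DECODERS = {"gzip": gzip.decompress, "x-gzip": gzip.decompress, "deflate": zlib.decompress}


def parse_encodings(header_value: str) -> List[str]:
    """Split a Content-Encoding value into lower-cased tokens."""
    return [tok.strip().lower() for tok in header_value.split(",") if tok.strip()]


def encode_stacked(data: bytes, header_value: str) -> bytes:
    """Apply the encodings in the order they are listed."""
    for name in parse_encodings(header_value):
        data = ENCODERS[name](data)
    return data


def decode_stacked(payload: bytes, header_value: str) -> bytes:
    """Undo all encodings, last applied first."""
    for name in reversed(parse_encodings(header_value)):
        payload = DECODERS[name](payload)
    return payload


def decode_last_only(payload: bytes, header_value: str) -> bytes:
    """Mimic a decoder that only removes the last listed encoding."""
    return DECODERS[parse_encodings(header_value)[-1]](payload)


if __name__ == "__main__":
    header = "deflate, gzip"
    original = b"stacked encodings " * 10
    wire = encode_stacked(original, header)
    print(decode_stacked(wire, header) == original)    # full decoding
    print(decode_last_only(wire, header) == original)  # one layer only
```

A scanner that removes only one layer is left with compressed data, so signatures written for the plain content will not match.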
Behavior on Unknown Encodings
- virustotal.com reports invalid data if Content-encoding is not gzip, x-gzip or deflate
- the tested IDS ignore any content-encoding header they don't understand without even logging the problem
- all browsers ignore the content-encoding header if they don't know the encoding, thus using the raw data
Transfer-Encoding versus Content-Encoding
- the HTTP standard explicitly allows compression using the Transfer-Encoding header, but only Opera and Rekonq support it (gzip and deflate)
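A response that compresses at the transfer layer can be built by hand for testing. The sketch below assumes the common combination "Transfer-Encoding: gzip, chunked", with chunked applied last as RFC2616 requires. The helper names chunk, dechunk and build_te_response are invented for this example.

```python
import gzip


def chunk(data: bytes, size: int = 1024) -> bytes:
    """Wrap data in HTTP chunked transfer coding."""
    out = bytearray()
    for i in range(0, len(data), size):
        piece = data[i:i + size]
        out += f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    out += b"0\r\n\r\n"  # terminating zero-length chunk, no trailers
    return bytes(out)


def dechunk(stream: bytes) -> bytes:
    """Reverse chunk(); ignores chunk extensions and trailers for brevity."""
    out = bytearray()
    pos = 0
    while True:
        eol = stream.index(b"\r\n", pos)
        size = int(stream[pos:eol], 16)
        if size == 0:
            return bytes(out)
        start = eol + 2
        out += stream[start:start + size]
        pos = start + size + 2  # skip the CRLF after the chunk data


def build_te_response(body: bytes) -> bytes:
    """Return a response compressed via Transfer-Encoding, not Content-Encoding."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/plain\r\n"
        "Transfer-Encoding: gzip, chunked\r\n"
        "\r\n"
    )
    return head.encode("ascii") + chunk(gzip.compress(body))


if __name__ == "__main__":
    response = build_te_response(b"transfer-encoded body\n")
    _, _, rest = response.partition(b"\r\n\r\n")
    print(gzip.decompress(dechunk(rest)))
```

A client without support for this will remove the chunking at most and hand the still compressed data to the application.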
Conclusion
If an attacker has full control over a web server serving malware, they can use unusual Content-Encoding or Transfer-Encoding values to easily bypass security systems that interpret these headers differently from the browser.