Offzip - deflate/zlib scanning and extraction

aluigi · Post by **aluigi** » Tue Aug 05, 2014 9:34 am

Tool:
Offzip - http://aluigi.org/mytoolz.htm#offzip

Abstract:
dump zlib/deflate compressed data, help in reversing file formats, and why scanning for compressed data isn't possible with every algorithm

Imagine you have a file and no idea of its content, or you don't know its format well enough to write a parser.

The first thing to try is usually checking whether it contains compressed data and extracting it on the fly; alternatively, you can use the compressed and uncompressed sizes to help write a file format parser.

The most used compression algorithm in the world is deflate, often referred to as zlib after its most common library.
Deflate and zlib are basically the same algorithm, with a small but important difference:

deflate [url=http://www.ietf.org/rfc/rfc1951.txt]RFC1951[/url] is the pure compressed stream: no headers, just data
zlib [url=http://www.ietf.org/rfc/rfc1950.txt]RFC1950[/url] is a deflate stream with a small 2-byte header at the beginning and an Adler-32 checksum at the end
You can usually recognize it by the 0x78 byte at the beginning of the compressed data (the value used with the maximum 32 KB window, which is by far the most common setting)
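The difference between the two formats can be checked with Python's standard zlib module. This is only a minimal sketch with a made-up payload: it builds one zlib stream and shows that it is a raw deflate stream wrapped in a 2-byte header and a 4-byte Adler-32 trailer.

```python
import struct
import zlib

data = b"offzip example payload " * 20   # made-up sample data

# zlib (RFC 1950): 2-byte header + deflate data + 4-byte Adler-32 trailer
z = zlib.compress(data)

print(hex(z[0]))                      # first header byte (CMF)
print(((z[0] << 8) | z[1]) % 31)      # the 16-bit header is a multiple of 31

# The trailer is the big-endian Adler-32 of the uncompressed data
trailer = struct.unpack(">I", z[-4:])[0]
print(trailer == zlib.adler32(data))

# Without header and trailer, what remains is a raw deflate stream
# (RFC 1951), decodable with a negative windowBits value
print(zlib.decompress(z[2:-4], -15) == data)
```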

Just for the record, there is also Gzip [url=http://www.ietf.org/rfc/rfc1952.txt]RFC1952[/url], but it's mainly a container rather than a stream compression, and its format allows for other algorithms, not only deflate.

The problem with compression algorithms is that without a checksum it's impossible to know whether the decompressed data is valid or just a "false positive".

Here zlib gives excellent results: thanks to its trailing checksum we can scan a whole file and extract all the compressed streams almost without false positives.

With raw deflate, instead, we will get many false positives, so it's up to us to judge whether it's really used or not.

I wrote a command-line tool that is very helpful for scanning zlib and deflate streams; it's called offzip.
Like all my tools, it displays a runtime help when launched without arguments, so refer to it for any additional options.

The quick command-line example is the following:

offzip -S file.dat 0 0

It will scan the file "file.dat" searching for zlib compressed streams and will report the offset, the compressed and decompressed size of each one, and the total number of streams found.
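The idea behind such a scan can be sketched in a few lines of Python. This is not offzip's actual code, just a brute-force illustration: the function name scan_zlib and the sample data are hypothetical, and copying the buffer at every offset is too slow for big files.

```python
import zlib

def scan_zlib(buf, wbits=15):
    """Yield (offset, compressed_size, uncompressed_size) for each stream."""
    pos = 0
    while pos < len(buf):
        d = zlib.decompressobj(wbits)
        try:
            out = d.decompress(buf[pos:])
        except zlib.error:
            pos += 1              # no valid stream here, try the next byte
            continue
        if d.eof:                 # the stream ended cleanly
            zsize = len(buf) - pos - len(d.unused_data)
            yield pos, zsize, len(out)
            pos += zsize          # continue right after this stream
        else:
            pos += 1              # truncated or incomplete data

# Two streams surrounded by junk bytes (made-up test data)
blob = (b"\x00" * 10 + zlib.compress(b"first stream " * 40)
        + b"JUNK" + zlib.compress(b"second " * 10) + b"\xff" * 5)

for offset, zsize, size in scan_zlib(blob):
    print("0x%08x  %d --> %d" % (offset, zsize, size))
```

Passing wbits=-15 makes the same loop look for raw deflate instead, which accepts far more garbage because there is no header or checksum to validate.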

It's also possible to dump the results directly in a folder:

offzip -a file.dat output_folder 0

If you want to scan the file for raw deflate compressed data use the following:

offzip -z -15 -S file.dat 0 0

The "-z" option specifies the windowBits value passed to the [url=http://zlib.net]zlib[/url] library, where a positive number selects zlib and a negative one raw deflate.
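Python's zlib module exposes the same windowBits convention through its wbits parameter, so the effect of the sign is easy to try. A minimal sketch with made-up data; how offzip itself passes the value to the library is not shown in this thread.

```python
import zlib

data = b"window bits demo " * 50          # made-up sample data

zlib_stream = zlib.compress(data)          # zlib container (windowBits 15)
co = zlib.compressobj(wbits=-15)           # raw deflate (windowBits -15)
raw_stream = co.compress(data) + co.flush()

# Positive value: the decoder expects the zlib header and checksum
print(zlib.decompress(zlib_stream, 15) == data)

# Negative value: the decoder expects a bare deflate stream
print(zlib.decompress(raw_stream, -15) == data)

# The raw stream lacks the 2-byte header and the 4-byte Adler-32
print(len(zlib_stream), len(raw_stream))
```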

The following is an example of a successful zlib scan:

Offset file unzipper 0.3.5
by Luigi Auriemma
e-mail: aluigi@autistici.org
web:    aluigi.org

- open input file:    steamservice.idb
- zip data to check:  32 bytes
- zip windowBits:     15
- seek offset:        0x00000000  (0)

+------------+-------------+-------------------------+
| hex_offset | blocks_dots | zip_size --> unzip_size |
+------------+-------------+-------------------------+
  0x00000105  ...................... 4342469 --> 18112512
  0x004243d3  ...................... 1687404 --> 11886592
  0x005c0348  ........ 14626 --> 49152
  0x005c3c73  . 601 --> 1380


- 4 valid zip blocks found

And how do you know if there are false positives?
Simple: you will see lots of dots without the compressed/uncompressed sizes on the right, and many error messages.
With deflate (-z -15) don't trust the results; they are often false positives unless you see many consecutive streams.

Personally I find offzip very useful when I work on not-so-simple archives and want a quick way to retrieve the compressed and uncompressed size values, which I can then search for in the file with a hex editor to locate the header.
In this case I use the -x option, which dumps the size values in hexadecimal format.
I take one of the two values, search for it as a 32-bit little-endian number with the hex editor, and then check whether the other value is close to it.
Once you locate the header containing the information about the archived files, it's easier to get an idea of the structure used for each entry.
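The hex editor search described above can also be automated. The following is a minimal sketch, not part of offzip: the function name find_size_pairs, the 16-byte search window and the fake index entry layout (offset, zsize, size) are all assumptions made for illustration.

```python
import struct

def find_size_pairs(buf, zsize, size, window=16):
    """Offsets where zsize (32-bit LE) has size (32-bit LE) within `window` bytes."""
    a = struct.pack("<I", zsize)
    b = struct.pack("<I", size)
    hits = []
    pos = buf.find(a)
    while pos != -1:
        lo = max(0, pos - window)
        if b in buf[lo:pos + 4 + window]:   # is the other value nearby?
            hits.append(pos)
        pos = buf.find(a, pos + 1)
    return hits

# Hypothetical index entry built from the last row of the scan shown earlier:
# offset 0x005c3c73, compressed size 601, uncompressed size 1380
entry = struct.pack("<III", 0x005C3C73, 601, 1380)
fake_archive = b"\xaa" * 32 + entry + b"\xaa" * 32

print([hex(h) for h in find_size_pairs(fake_archive, 601, 1380)])
```

If the two sizes are equal (stored data), every hit matches trivially, so this check is only meaningful for entries that are really compressed.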

Just to recap, if you want to:

know if there are zlib compressed streams: offzip -S file.dat 0 0
know if there are deflate compressed streams (false positives!): offzip -S -z -15 -q file.dat 0 0
dump the zlib results: offzip -a file.dat c:\output_folder 0
dump the results in one unique file, useful when there are chunked files: offzip -a -1 file.dat c:\output_folder 0
analyze an archive to retrieve the index table containing offset/zsize/size: offzip -S -x file.dat 0 0
dump the results in a file (maybe to use later with packzip [work-in-progress!]): offzip -a -L c:\output_folder\dump.txt file.dat c:\output_folder 0

Mondraconus · Post by **Mondraconus** » Fri Aug 25, 2017 5:46 am

Thanks for the code. Yesterday I ran into a problem with a .STREAM2 file (from Project Gotham Racing 4, Xbox 360). I tried your Quickbms with Project_Gotham_4.bms but it can't extract that file; it suggests using offzip -a to dump the raw data. So I moved to offzip. I already tried all the possibilities, from offzip -a and offzip -z -15 to offzip -z 15, but the result is always an error saying there is no valid zip data. So, how can I offzip this .STREAM2 file? Or maybe you can change this part of your Project_Gotham_4.bms script:

elif EXT == "STREAM2"

print "Use offzip -a on this file do dump the raw data:\nhttp://aluigi.altervista.org/mytoolz.htm#offzip"
cleanexit

I hope the .STREAM2 file contains textures and that your Quickbms or offzip can extract them. Thanks for your work...

~~~~~~~~
Regards

aluigi · Post by **aluigi** » Fri Aug 25, 2017 9:40 am

It's something more appropriate for the Game Archive section since it's a topic related to a specific game.
When I tested the STREAM2 files they contained zlib data, probably you got a stream2 file containing non-zlib data too. That's perfectly possible because often there are archives which are just containers without information or have a very complex structure that can't be easily handled (except if you want to waste tons of time and effort on reverse engineering them) so the best and easiest solution is just scanning them with offzip.

Mondraconus · Post by **Mondraconus** » Fri Aug 25, 2017 6:05 pm

aluigi wrote:It's something more appropriate for the Game Archive section since it's a topic related to a specific game.
When I tested the STREAM2 files they contained zlib data, probably you got a stream2 file containing non-zlib data too. That's perfectly possible because often there are archives which are just containers without information or have a very complex structure that can't be easily handled (except if you want to waste tons of time and effort on reverse engineering them) so the best and easiest solution is just scanning them with offzip.

Thanks for the reply. I scanned the Macau_sunny.stream2 file, first with offzip -S and then with offzip -S -x -Q, and the result is that no valid full zip data is found. Maybe this file, as you said, contains non-zlib data. So what can I do now with this .STREAM2 file? Does it mean I can't do anything with it, neither unpack it nor export the textures inside it?

~~~~~~~~
regards

aluigi · Post by **aluigi** » Fri Aug 25, 2017 9:18 pm

Open a new topic in the Game Archive section and upload the file there. Here you are off-topic.

Mondraconus · Post by **Mondraconus** » Sat Aug 26, 2017 4:44 am

aluigi wrote:Open a new topic in the Game Archive section and upload the file there. Here you are off-topic.

Thanks Aluigi for your suggestion, I have opened a new topic about this and uploaded the file via Mega; maybe as a moderator you can help me identify and unpack the textures inside this .STREAM2 file. If you are interested in helping, the new thread is at: http://zenhax.com/viewtopic.php?f=9&t=4824. Thank you so much again for your work.

~~~~~~~~~~
Regards

Vido · Post by **Vido** » Sun Jan 14, 2018 4:50 am

Great tool tnx

ass · Post by **ass** » Tue Mar 20, 2018 1:50 pm

In "Offzip 4.0" there is a parameter "-D FD: use a dictionary from file FD" (File Description, I guess). What is this parameter, and where can I get samples?

aluigi · Post by **aluigi** » Fri Mar 23, 2018 10:49 am

It's not what you think.
It's the zlib dictionary (the preset deflate dictionary).
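For anyone unfamiliar with preset dictionaries, this is how they behave in the zlib library, shown through Python's zlib module. A minimal sketch with a made-up dictionary and payload; it says nothing about how offzip loads the file given to -D.

```python
import zlib

# Made-up dictionary: bytes that are likely to appear in the data
zdict = b'<config><entry name="" value=""/></config>'
data = b'<config><entry name="a" value="1"/></config>'

co = zlib.compressobj(zdict=zdict)
stream = co.compress(data) + co.flush()

# A stream that needs a dictionary has the FDICT bit (0x20) set in byte 1
print(bool(stream[1] & 0x20))

# Decompression works only when the same dictionary is supplied
do = zlib.decompressobj(zdict=zdict)
restored = do.decompress(stream)
print(restored == data)

# Without the dictionary the library refuses to decode the stream
try:
    zlib.decompress(stream)
except zlib.error as e:
    print("failed without dictionary:", e)
```

So there are no generic "samples": the dictionary is whatever byte sequence the original program passed to zlib when compressing.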

cn_tony · Post by **cn_tony** » Sun Apr 22, 2018 7:15 am

Hello Aluigi,

Thanks for developing this great tool!
I have a question about using it: I used offzip to unpack a file that I downloaded from an ONU (optical network unit) and got 6 files. I would like to edit one of the 6 files, pack them back into one *.bin file and then flash it back to the ONU. Is that possible with offzip?
Thanks, I look forward to your feedback.

The following is the scan result for this bin file:

c:\offzip>offzip -S config.bin 0 0

Offzip 0.4
by Luigi Auriemma
e-mail: aluigi@autistici.org
web: aluigi.org

- open input file: config.bin
- zip data to check: 32 bytes
- zip windowBits: 15
- seek offset: 0x00000000 (0)

+------------+-----+----------------------------+----------------------+
| hex_offset | ... | zip -> unzip size / offset | spaces before | info |
+------------+-----+----------------------------+----------------------+
0x000000d8 .... 7382 -> 65536 / 0x00001dae _ 216 8:7:26:0:1:ad6bc1c9
0x00001dba .. 3423 -> 65536 / 0x00002b19 _ 12 8:7:26:0:1:153fee04
0x00002b25 . 1888 -> 65536 / 0x00003285 _ 12 8:7:26:0:1:2d1bcef2
0x00003291 .. 2624 -> 65536 / 0x00003cd1 _ 12 8:7:26:0:1:cbedd83e
0x00003cdd .. 3269 -> 65536 / 0x000049a2 _ 12 8:7:26:0:1:d74bf3ca
0x000049ae ... 5554 -> 54683 / 0x00005f60 _ 12 8:7:26:0:1:301d258f

- 6 valid compressed streams found
- 0x00005e4c -> 0x0005d59b bytes covering the 98% of the file

aluigi · Post by **aluigi** » Sun Apr 22, 2018 7:29 am

Yes that's possible, just use the -r option and the "same" other arguments you used for extraction.
Example:

Extract:
offzip -a config.bin output_folder 0

Edit the files in output_folder and try to keep their size and entropy the same as or lower than the original.

Reimport:
offzip -a -r config.bin output_folder 0

Yes, output_folder acts as the input folder in this mode, and config.bin is edited in place by recompressing the files on the fly and reinjecting them.
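The advice about size and entropy presumably exists because the recompressed stream has to fit in the space occupied by the original one. A quick way to check an edited file before reimporting is sketched below; the helper name fits_in_slot, the compression level and the sample data are assumptions, not something offzip provides.

```python
import os
import zlib

def fits_in_slot(edited, original_zsize, level=9):
    """Return (fits, new_zsize) for `edited` recompressed as a zlib stream."""
    new_zsize = len(zlib.compress(edited, level))
    return new_zsize <= original_zsize, new_zsize

original = b"setting=1\n" * 100                 # made-up original file
slot = len(zlib.compress(original, 9))          # space available in the archive

# Same size and similar content: the result should still fit
print(fits_in_slot(original.replace(b"=1", b"=2"), slot))

# Same size but high entropy (random bytes): it compresses badly
print(fits_in_slot(os.urandom(len(original)), slot))
```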

underwater · Post by **underwater** » Fri Sep 21, 2018 9:46 pm

If I understand the gzip documentation correctly, there should be a way of embedding the filename in the data.

Is it possible to use offzip to extract those names?

aluigi · Post by **aluigi** » Sat Sep 22, 2018 3:10 am

offzip is a scanner of deflate streams.
zlib is a container for deflate and it's the default type of data searched by offzip because it's the most used and has a very limited number of false positives due to the 32bit checksum at the end.

gzip is a container for various types of compression algorithms including deflate which is the most used, and yes, it "may" contain the original filename but in practice that's never implemented, indeed in all the gzip data I have seen in various games I don't remember one time in which the filename was stored in it.

Offzip doesn't scan the gzip containers but it can find the deflate streams used in them if launched with -z -15, and therefore there is no support for names in gzip.
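If the data really is gzip, the optional filename can be read from the header by hand, following RFC 1952. The sketch below is independent of offzip; the function name gzip_member_info and the sample filename are hypothetical, and the test member is generated with Python's gzip module.

```python
import gzip
import io
import struct
import zlib

def gzip_member_info(buf):
    """Parse one gzip header; return (stored_name_or_None, deflate_offset)."""
    if buf[:2] != b"\x1f\x8b" or buf[2] != 8:
        raise ValueError("not a gzip member using deflate")
    flags = buf[3]
    pos = 10                                  # fixed part of the header
    if flags & 0x04:                          # FEXTRA: 2-byte length + payload
        (xlen,) = struct.unpack_from("<H", buf, pos)
        pos += 2 + xlen
    name = None
    if flags & 0x08:                          # FNAME: zero-terminated Latin-1
        end = buf.index(b"\x00", pos)
        name = buf[pos:end].decode("latin-1")
        pos = end + 1
    if flags & 0x10:                          # FCOMMENT: zero-terminated
        pos = buf.index(b"\x00", pos) + 1
    if flags & 0x02:                          # FHCRC: 2-byte header CRC
        pos += 2
    return name, pos

# Build a sample gzip member that stores a (made-up) filename
raw = io.BytesIO()
with gzip.GzipFile(filename="texture.dds", mode="wb", fileobj=raw) as f:
    f.write(b"payload " * 10)
blob = raw.getvalue()

name, off = gzip_member_info(blob)
print(name)
# Between the header and the 8-byte trailer (CRC32 + size) there is the
# raw deflate stream, the part a -z -15 scan can find
print(zlib.decompress(blob[off:-8], -15))
```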

underwater · Post by **underwater** » Sat Sep 22, 2018 1:47 pm

aluigi wrote:gzip is a container for various types of compression algorithms including deflate which is the most used, and yes, it "may" contain the original filename but in practice that's never implemented, indeed in all the gzip data I have seen in various games I don't remember one time in which the filename was stored in it.

I'm sorry if this is going off-topic now, and if I should post it in the relevant game thread, but didn't some of the PAKs for the Trials games do that?

E.g. the one below has no visible filenames in the data, but your BMS script extracts the proper names just fine:
https://mega.nz/#!WOwEmC7Z!dML-l_oW4UtR ... tPl6G4LPrk

aluigi · Post by **aluigi** » Sat Sep 22, 2018 6:44 pm

No, the filenames are well visible in the file and easy to search in the hex editor.
Additionally, trials2.bms doesn't even use the gzip compression/format, so it's not just off-topic for this thread but even off-topic with respect to your own post.

Anyway the filenames are stored in one of the archived files.

underwater · Post by **underwater** » Sat Sep 22, 2018 6:56 pm

Oh, sorry. Found them now.
I was looking at the end (where they were in some of the other files, but in this one they ended up in the middle somehow...)
Feel free to delete all these offtopic posts. I'm obviously still learning, and had somehow become fixated on the wrong path...

aluigi · Post by **aluigi** » Sat Sep 22, 2018 7:17 pm

Don't worry it's ok and I like to explain how these things work.

johnz1 · Post by **johnz1** » Fri Jan 18, 2019 4:39 am

In the offzip output, what is the "info" column (the last column, after "spaces before")? By any chance is it giving details about the compression level of each file?

If not, is there any way to analyze a zlib-compressed file to determine its compression level/settings? I have a zlib file that offzip can decompress into an XML file, but when I try to use Packzip to compress and re-inject that XML file, the file is different from the original. The file is part of a PS3 save game, and the game always says the file is corrupt after the re-compressed file has been injected, whether or not the XML file was modified.
This is what the offzip output looks like for the file:

0x0000ace8 .. 3098 -> 17663 / 0x0000b902 _ 4 8:7:26:0:1:a994c01c

I'd like to figure this out on my own so I haven't created a new post, but maybe I'm not the only person who'd like more info about the "info" column.

Thanks for the wonderful tools and site!

aluigi · Post by **aluigi** » Sat Jan 19, 2019 7:50 pm

8:7:26:0:1:a994c01c

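The fields are, in order: CM, CINFO, FCHECK, FDICT, FLEVEL, ADLER32 (the zlib header fields defined in RFC 1950, followed by the Adler-32 checksum of the stream).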
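These fields can be decoded from any zlib stream by following the bit layout in RFC 1950. The sketch below is an independent illustration (the function name zlib_info is hypothetical); offzip's own decoding is not shown in this thread, so the values it prints may not match this layout exactly.

```python
import struct
import zlib

def zlib_info(stream):
    """Decode the RFC 1950 header fields and the trailing Adler-32."""
    cmf, flg = stream[0], stream[1]
    if ((cmf << 8) | flg) % 31:
        raise ValueError("not a valid zlib header")
    return {
        "CM": cmf & 0x0F,          # compression method, 8 = deflate
        "CINFO": cmf >> 4,         # log2(window size) - 8, 7 = 32 KB
        "FCHECK": flg & 0x1F,      # makes the 16-bit header a multiple of 31
        "FDICT": (flg >> 5) & 1,   # 1 if a preset dictionary is required
        "FLEVEL": flg >> 6,        # 2-bit hint: 0 fastest ... 3 maximum
        "ADLER32": struct.unpack(">I", stream[-4:])[0],
    }

sample = zlib.compress(b"<xml>save game</xml>" * 50, 9)   # made-up data
info = zlib_info(sample)
print(":".join(str(info[k]) for k in ("CM", "CINFO", "FCHECK", "FDICT", "FLEVEL")),
      "%08x" % info["ADLER32"])
```

Note that FLEVEL is only a coarse hint written by the compressor: it does not record the exact level, memory settings or strategy, so the header alone is not enough to reproduce a byte-identical stream.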

predprey · Post by **predprey** » Tue Aug 06, 2019 5:47 am

Offzip 0.4 throws an "Invalid Argument" error if the offset is larger than 0x7FFFFFFF

ZenHAX
