QuickBMS - Scan all the supported compressions

aluigi
Site Admin
Posts: 12984
Joined: Wed Jul 30, 2014 9:32 pm

Post by aluigi »

While reverse engineering an archive or an unknown file, you may notice that it uses compression, either from some parameters found in the index table or from its "scrambled" content:
Image


There are usually some tricks for recognizing a known compression algorithm: for example, zlib data usually starts with 0x78, lzma with 0x5d followed by some zeroes, while lzss and lzo leave parts of the uncompressed content visible, and so on.
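The first-byte tricks above can be sketched in a few lines of Python. This is only a rough heuristic, not part of QuickBMS: the function name `guess_compression` is made up for illustration, the zlib check adds the header checksum rule from the zlib format (the first two bytes, read as a big-endian number, are a multiple of 31), and the lzma check assumes the two bytes after 0x5d are zero, which is common but not guaranteed.

```python
import zlib


def guess_compression(data: bytes) -> str:
    """Return a rough guess based only on the first bytes; it proves nothing."""
    # zlib: first byte is usually 0x78 and the 16-bit big-endian header
    # is a multiple of 31.
    if len(data) >= 2 and data[0] == 0x78 and ((data[0] << 8) | data[1]) % 31 == 0:
        return "zlib?"
    # lzma: properties byte 0x5d, then a little-endian dictionary size whose
    # low bytes are usually zero (assumption, see the text above the code).
    if len(data) >= 5 and data[0] == 0x5D and data[1:3] == b"\x00\x00":
        return "lzma?"
    return "unknown"


# Quick self-check with data we compress ourselves.
sample = zlib.compress(b"some test data")
print(guess_compression(sample))
```

A positive guess only tells you which algorithm to try first; a negative one tells you nothing, which is where the scanner comes in.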

But if we don't know the algorithm, or we want to confirm its name, or we want to see which result is closest to the original uncompressed file, we need the following script and bat file:
http://aluigi.org/papers/bms/comtype_scan2.bat
http://aluigi.org/papers/bms/comtype_scan2.bms

This is the situation in our folder, where dump.dat is our compressed file:
Image


And this is the runtime help of comtype_scan2.bat:
Image


Let's enter this command line to start the scan:

comtype_scan2.bat comtype_scan2.bms dump.dat output

Please note that if we already know the uncompressed size, it's HIGHLY recommended to add it to the command line, as in this example:

comtype_scan2.bat comtype_scan2.bms dump.dat output 0x7cf


During the scan QuickBMS will show lots of messages and errors.
That's perfectly normal.
You will often notice that it freezes, like in this case:
Image


No problem, press CTRL-C and type 'n':
Image


Finally we reach the end of the scan:
Image


The next step is manually checking the results dumped in the output folder.
There are some ways to automate this process, but the simplest is to sort the files by size in descending order:
Image
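If you prefer a script to the file manager, the same sorting can be sketched in Python. This is just an illustration and not part of comtype_scan2: the helper name `list_by_size` is made up, and it assumes the dumps sit directly inside one folder.

```python
from pathlib import Path


def list_by_size(folder):
    """Return (size, name) pairs for the files in folder, largest first."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    return sorted(((p.stat().st_size, p.name) for p in files), reverse=True)


# Example usage (the folder name is whatever you passed to the bat file):
# for size, name in list_by_size("output"):
#     print(f"{size:>12}  {name}")
```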


And then open them one by one with a hex editor:
Image


That 8.dmp seems to contain valid PNG data, so let's try to open it with an image viewer:
Image


Bingo, that's the correct algorithm.
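When the original file is a common format, the hex editor step can be partly automated by looking for well-known file signatures at the start of each dump. The sketch below is a hedged illustration: the `SIGNATURES` table and both function names are my own, the list of magic numbers is deliberately tiny, and it will find nothing if the content is a custom format or raw data.

```python
from pathlib import Path

# A few well-known magic numbers; extend the table as needed.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"RIFF": "riff (wav/avi/webp)",
    b"DDS ": "dds",
}


def identify(data: bytes):
    """Return the name of the first matching signature, or None."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def identify_dumps(folder):
    """Map file name -> detected type for every recognised dump in folder."""
    result = {}
    for p in sorted(Path(folder).iterdir()):
        if p.is_file():
            with p.open("rb") as f:
                kind = identify(f.read(16))  # the longest signature is 8 bytes
            if kind:
                result[p.name] = kind
    return result
```

A match is only a hint: a wrong algorithm can still emit a few correct leading bytes, so opening the file in a real viewer remains the final check.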

Now open the defs.h text file inside the QuickBMS source code (src folder in quickbms.zip) and check which algorithm corresponds to number 8:
Image


Yeah, the algorithm is lzo1x.

Don't think that it's always so easy to find the correct algorithm: sometimes you don't know the name of the file and its content is a custom format or raw audio/image data.

Post by aluigi »

Ah, I have attached the original dump.dat in case someone wants to run their own tests.

You can even create it yourself with quickbms:

comtype lzo1x_compress
get SIZE asize
clog "dump.dat" 0 SIZE SIZE

Post by aluigi »

I want to stress that the comtype scanner should be used only if you really know what you are doing.

Very quickly:

- do you have a file that may contain chunks of compressed data?
DO NOT USE the comtype scanner

- do you have a raw file that may contain anything?
DO NOT USE the comtype scanner

- do you have a raw file that you are sure contains compressed data from offset 0 to its end?
YES, USE the comtype scanner

- is the comtype scanner a way to find compressed chunks of data in a file?
NO

- is the comtype scanner a way to find what algorithm is used on a specific piece of data?
YES, but the compressed data must cover the whole file: if the file is 0x123 bytes long and the compressed data goes from offset 0 to 0x10 or from offset 0x10 to 0x123, it will fail!

- example, if you use comtype scanner on a ZIP archive you will find absolutely NOTHING

- example, if you use comtype scanner on the compressed part of a ZIP archive you will have success (deflate algorithm)
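The "whole file" rule can be reproduced outside QuickBMS with Python's zlib module. This is only a sketch of the idea, not how the scanner works internally: `try_raw_deflate` is a made-up helper that accepts a blob only if it is one complete raw deflate stream with nothing before or after it, and the fake 30-byte header stands in for a ZIP local file header.

```python
import zlib


def try_raw_deflate(blob: bytes):
    """Return the decompressed bytes, or None if blob is not exactly one raw deflate stream."""
    d = zlib.decompressobj(-15)  # negative wbits = raw deflate, no zlib header
    try:
        out = d.decompress(blob)
    except zlib.error:
        return None
    # Strict check: the stream must have ended and consumed all the input.
    if not d.eof or d.unused_data:
        return None
    return out


# Build a raw deflate stream, like the compressed part of a ZIP entry.
c = zlib.compressobj(9, zlib.DEFLATED, -15)
payload = c.compress(b"hello world " * 20) + c.flush()

print(try_raw_deflate(payload) is not None)                    # only the compressed part
print(try_raw_deflate(b"PK\x03\x04" + b"\x00" * 26 + payload))  # header in front of it
```

The first call succeeds, while the second returns None because the decompressor starts on the header bytes instead of the compressed data.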

In general the rule is not to use the scanner unless you want to waste your time and resources. That's up to you, but then don't blame quickbms for your own mistakes.
usabdt
Posts: 12
Joined: Tue Nov 13, 2018 7:34 pm

Post by usabdt »

Does comtype_scan2.bat not work on Win 10?

Post by aluigi »

@usabdt
It works on Win10 too. Do you get an error, and if so which one?

Post by usabdt »

aluigi wrote:@usabdt
It works on Win10 too. Do you get an error, and if so which one?

I can't run the file comtype_scan2.bat. Can you record what you do when running it on Win 10?

Post by aluigi »

usabdt wrote:I can't run the file comtype_scan2.bat.

Details?
Anyway, that's something meant only for advanced users. If you want support for a format or a compression, ask on the forum and do NOT try it, just as written in my FAQ post above.
BCGhost
Posts: 35
Joined: Fri Dec 15, 2017 1:42 pm

Post by BCGhost »

In version 0.1.2 of the bms script, there are these lines:

set NAME string QUICKBMS_COMTYPE
if NAME & "_COMTYPE"    # check if the variable is set
    set NAME string i
endif

But with QuickBMS v0.9.2 the output NAME will always be the value of the index, because QUICKBMS_COMTYPE is treated as the literal string "QUICKBMS_COMTYPE". Is that normal?

aluigi wrote:Please note that if we already know the uncompressed size, it's HIGHLY recommended to add it to the command line

I found that for some algorithms like oodle, if you don't specify the uncompressed size, it just throws an error even if the buffer is larger than the uncompressed size.
But in most cases I'd rather not pass the uncompressed size and instead use it as a filter condition for picking the right result, since the number of outputs doesn't seem to go down after adding it anyway.
So is it possible to always have both benefits at the same time?

Post by aluigi »

Yes because it's an internal variable used by quickbms :)
Just like QUICKBMS_HASH and QUICKBMS_CRC.
So if QUICKBMS_COMTYPE contains the string "_COMTYPE" it means that the variable has not been set by quickbms.

The oodle behavior is correct, it's just how oodle is meant to work. The library requires the exact compressed and uncompressed size or it gives an error.
To bypass these "picky" compression algorithms I implemented the -e option in quickbms, which ignores any error from the algorithms, but it forces you to check every scanned file by hand... 700 files! :D

Post by BCGhost »

aluigi wrote:So if QUICKBMS_COMTYPE contains the string "_COMTYPE" it means that the variable has not been set by quickbms.

But if so then that condition will always be true, which means NAME will be set to a numeric index (or is it never meant to assign the algo name to the output?), whereas the indexes have been removed in "comtype.h".

aluigi wrote:The oodle behavior is correct, it's just how oodle is meant to work. The library requires the exact compressed and uncompressed size or it gives an error.
To bypass these "picky" compression algorithms I implemented the -e option in quickbms, which ignores any error from the algorithms, but it forces you to check every scanned file by hand... 700 files! :D

If the decompression succeeds and the size is correct (I haven't actually tested it yet though), it's easy to narrow the search based on the uncompressed size in the raw file.
The point is, if the compression you're dealing with happens to be one of those "picky" algorithms and none of the outputs matches, it's futile to check all of them. Perhaps it's better to specify the size and take another chance then?

Post by aluigi »

BCGhost wrote:
aluigi wrote:So if QUICKBMS_COMTYPE contains the string "_COMTYPE" it means that the variable has not been set by quickbms.

But if so then that condition will always be true, which means NAME will be set to a numeric index (or is it never meant to assign the algo name to the output?), whereas the indexes have been removed in "comtype.h".

What's the exact problem?
I mean do you have a script or command proof-of-concept that gives a problem?
Here everything works perfectly.


BCGhost wrote:If the decompression succeeds and the size is correct (I haven't actually tested it yet though), it's easy to narrow the search based on the uncompressed size in the raw file.
The point is, if the compression you're dealing with happens to be one of those "picky" algorithms and none of the outputs matches, it's futile to check all of them. Perhaps it's better to specify the size and take another chance then?

comtype_scan2.bat already has an optional field for specifying the decompressed size, do you mean that one?

comtype_scan2 c:\comtype_scan2.bms c:\dump.dat c:\output_folder [max_size]

Post by BCGhost »

aluigi wrote:What's the exact problem?
I mean do you have a script or command proof-of-concept that gives a problem?
Here everything works perfectly.

Here's the thing: since QuickBMS was updated (not sure from which version on), the enum of the algos has been moved to a separate file, comtype.h, and the comments giving the scan ID every 5 algos that appeared in the original defs.h are now gone. So I assumed that the comtype scan had been updated as well, and based on that extra code I thought it might use the algo names directly as the output names instead of their IDs. But in fact I still need to look up the IDs to find the algo names, using the OLD defs.h so as not to count them one by one in comtype.h. If the comtype scan hasn't gained any new feature, though, then everything does work perfectly.

aluigi wrote:comtype_scan2.bat already has an optional field for specifying the decompressed size, do you mean that one?

What I meant is that when I know the decompressed size, I'd rather not specify this field and instead use the size to filter the results after the comtype scan has done its job. On the one hand, leaving the size out and using it only as a filter condition can increase the chance of finding the correct output, but then I have to make sure that the algo I'm dealing with doesn't need the decompressed size to decompress correctly. On the other hand, if I do specify the decompressed size, some wrong outputs might simply be truncated to that size when decompression is cut off. Of course, since I don't know much about compression algorithms, all these questions are based on my assumptions, which is why I'd like someone familiar enough with this area to answer them.
aluigi
Site Admin
Posts: 12984
Joined: Wed Jul 30, 2014 9:32 pm

Re: QuickBMS - Scan all the supported compressions

Post by aluigi »

The numeric IDs are no longer available in the source code because they are no longer used in comtype_scan2.bat/bms.
I guess you are using an old version of the bat/bms.
If you have an old dump from a previous version of comtype_scan2, you can easily recover the ID by counting the lines. ID 1 starts from QUICKBMS_COMP(ZLIB).
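Counting the lines by hand gets tedious, so here is a small Python sketch that does it. It rests on assumptions I haven't checked against every QuickBMS release: that the header lists the algorithms one per line as QUICKBMS_COMP(NAME), in ID order, with ZLIB first as stated above. The function name `comtype_ids` and the sample text are made up for illustration.

```python
import re


def comtype_ids(header_text: str) -> dict:
    """Map 1-based position -> name for each line starting with QUICKBMS_COMP(NAME)."""
    # Anchored at line start so that a '#define QUICKBMS_COMP(...)' line is not counted.
    names = re.findall(r"^\s*QUICKBMS_COMP\(\s*(\w+)\s*\)", header_text, re.MULTILINE)
    return {i: name for i, name in enumerate(names, start=1)}


# Made-up excerpt; the entries after ZLIB are placeholders, not the real list.
sample = """
#define QUICKBMS_COMP(X) COMP_##X,
QUICKBMS_COMP(ZLIB)
QUICKBMS_COMP(SECOND_ALGO)
QUICKBMS_COMP(THIRD_ALGO)
"""
print(comtype_ids(sample)[1])
```

With the real file you would read it with `open("comtype.h").read()` and look up the number of the dump you are interested in.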

Regarding the decompressed size, you can filter by size in Windows or by using the hexdump_scanner.bms script, which reports the exact size of each file:
http://aluigi.org/bms/hexdump_scanner.bms
Usage:

quickbms.exe hexdump_scanner.bms c:\folder_with_dumps_from_comtype_scan2
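As an alternative to the two options above, the size filter is easy to sketch in Python. This is not a replacement for hexdump_scanner.bms, just a hedged illustration: `dumps_with_size` is a made-up helper and it assumes the dumps are plain files directly inside the output folder.

```python
from pathlib import Path


def dumps_with_size(folder, expected_size: int):
    """Return the names of the files in folder whose size is exactly expected_size."""
    return sorted(
        p.name
        for p in Path(folder).iterdir()
        if p.is_file() and p.stat().st_size == expected_size
    )


# Example usage with the size from the first post (0x7cf bytes):
# print(dumps_with_size(r"c:\output_folder", 0x7CF))
```

Keep in mind that an exact size match narrows the candidates but does not prove the content is correct.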


From my experience I suggest always specifying the exact decompressed size, which avoids both false positives and false negatives (lz4, oodle and so on).

Post by BCGhost »

aluigi wrote:I guess you are using an old version of the bat/bms.

Something like that. It turns out I forgot to replace the old QuickBMS executable (v0.9.0) in my work directory, and I hadn't added it to the environment variable. Now everything works just fine. Thanks for this new update!

aluigi wrote:From my experience I suggest always specifying the exact decompressed size, which avoids both false positives and false negatives (lz4, oodle and so on).

Um, I guess that's the optimal solution from now on. :)
mk_fan
Posts: 15
Joined: Sun Aug 23, 2020 1:47 pm

Post by mk_fan »

More than 6 years after the initial post, this is still useful for those who want to start with quickbms scripting, like me.

Thanks!

Post by aluigi »

I agree. The comtype scanner is definitely the most powerful feature of the tool.
The next version (currently in beta, already available for download) includes almost 900 compression algorithms to scan.

But remember that this feature is necessary only in a few rare cases.
The most used algorithms are always the following:
  • zlib
  • deflate
  • lzma
  • lzss (sometimes lzss0)
  • lz4
  • lz77wii
  • xmemdecompress
  • lzo1x
  • gzip