Identify Unknown Compression

ngc_kor
Posts: 11
Joined: Tue Oct 28, 2014 4:11 am

Identify Unknown Compression

Post by ngc_kor »

I have a file with a strange compression format that I can't find any clue about.
(From Eternal Darkness by Silicon Knights, published by Nintendo.)
The file signature starts with *SK_ASC*, and the compression is unknown.

Below are the compression methods I tested, but none of them match:

    LZ10
    LZ11
    LZ77
    LZO1x-1
    LZO1x-999
    LZSS
    LZW
    LZMA
    Huffman (4- and 8-byte block sizes)
    RLE
    ZLIB

I have the decompressed data, but still no clue after a few days.
Can anyone identify this compression method?

Here is the file: http://goo.gl/JJQfl4

P.S. It looks like BPE (Byte Pair Encoding), but I'm not sure :(


[File Information]

CMP: the compressed data, which always starts with SK_ASC.
BIN: same as CMP.
DCMP: a raw copy from the memory file, extracted from offset 0x80CA6980.
DMP: DCMP with 0x80CA6980 subtracted from the data values, because it contained so many dummy values left over from the memory dump (so, if I guess right, this is the cleaned decompressed data).
RAM: a RAM dump.
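The DMP cleanup step (subtracting 0x80CA6980) might look roughly like the sketch below. It assumes the affected values are 4-byte-aligned big-endian 32-bit words (the GameCube's PowerPC CPU is big-endian) and that only words pointing inside the extracted block should be adjusted. The post doesn't say which values were changed, and `rebase_pointers` is a hypothetical helper name.

```python
import struct

BASE = 0x80CA6980  # load address of the extracted block, from the post

def rebase_pointers(data: bytes, base: int = BASE) -> bytes:
    """Subtract `base` from every aligned big-endian u32 that points into the block.

    Assumption: pointers are 4-byte aligned and fall inside [base, base + len(data)).
    All other words are left untouched.
    """
    out = bytearray(data)
    end = base + len(data)
    for off in range(0, len(data) - 3, 4):
        (value,) = struct.unpack_from(">I", out, off)
        if base <= value < end:
            struct.pack_into(">I", out, off, value - base)
    return bytes(out)
```

Limiting the rewrite to in-range values avoids corrupting ordinary integers, at the cost of missing pointers that leave the block.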

I've already posted on several forums, but still no clue after weeks :(
http://reverseengineering.stackexchange ... d-gamecube
http://encode.ru/threads/2074-Identifyi ... #post41188
Last edited by ngc_kor on Sun Nov 16, 2014 11:30 am, edited 3 times in total.
aluigi
Site Admin
Posts: 12984
Joined: Wed Jul 30, 2014 9:32 pm

Re: Identify Unknown Compression

Post by aluigi »

Assuming that offset 8 contains a big-endian 32-bit uncompressed size and that the data starts at offset 0xc, the only "good" result I got from the scanner was 230.dat, which is RLE3 (just for tests, it's not actually used), but it's certainly invalid because it doesn't contain any strings or fields.
I also tried LZMA from offset 8, but still nothing.
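A minimal sketch of the header layout being tested here: an 8-byte "*SK_ASC*" signature, a big-endian u32 at offset 8, and the payload from 0xc. The meaning of the u32 is only aluigi's guess, and `parse_sk_asc_header` is a hypothetical name.

```python
import struct

SIGNATURE = b"*SK_ASC*"  # 8 bytes, as seen in the files and in the DOL

def parse_sk_asc_header(blob: bytes):
    """Split a CMP file into (candidate_size, payload).

    Assumption (unconfirmed): 0x0-0x7 = signature, 0x8-0xb = big-endian u32
    uncompressed size, compressed stream starts at 0xc.
    """
    if blob[:8] != SIGNATURE:
        raise ValueError("missing *SK_ASC* signature")
    (size,) = struct.unpack_from(">I", blob, 8)
    return size, blob[0xC:]
```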
ngc_kor

Re: Identify Unknown Compression

Post by ngc_kor »

All of the files have 0x00 0x** at offset 0x08, which suggests it's some kind of size-related field.
But... I compared a lot of data, and AFAIK it is not a size value.

E.g. EKisokMenu.dmp is 2184192 bytes, i.e. 0x215400, but there is no matching value inside EKisokMenu.cmp!
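Searching for a known decompressed size in a few byte encodings can be automated. This is a sketch; the encodings chosen (32-bit big/little-endian, plus a packed 24-bit big-endian field) are assumptions about what a size field might look like, and `find_size_candidates` is a hypothetical helper.

```python
import struct

def find_size_candidates(blob: bytes, size: int) -> dict:
    """Return {encoding: [offsets]} for every place `size` appears in `blob`."""
    needles = {
        "u32be": struct.pack(">I", size),
        "u32le": struct.pack("<I", size),
    }
    if size < 1 << 24:
        # A packed 24-bit big-endian field is also worth checking.
        needles["u24be"] = size.to_bytes(3, "big")
    hits = {}
    for name, needle in needles.items():
        offsets = []
        pos = blob.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = blob.find(needle, pos + 1)
        if offsets:
            hits[name] = offsets
    return hits
```

Running it on EKisokMenu.cmp with 0x215400 would reproduce the check described above. An empty result means none of these encodings appear anywhere in the file.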

My guess is that the first 2-4 bytes at 0x08 are some kind of dictionary size.

Anyway, I uploaded a data comparison sheet here: http://goo.gl/wu6Pkv (it also includes the recompressed sizes with various compression methods).
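The recompression comparison can be reproduced with Python's standard library, assuming the DMP file is the correct decompressed output. The codecs below (zlib, bzip2, xz/LZMA) are just what the stdlib offers. They aren't necessarily the ones used for the sheet.

```python
import bz2
import lzma
import zlib

def recompress_sizes(data: bytes) -> dict:
    """Compressed size of `data` under a few stdlib codecs, for comparison
    against the size of the original CMP file."""
    return {
        "zlib-9": len(zlib.compress(data, 9)),
        "bz2-9": len(bz2.compress(data, 9)),
        "lzma-xz": len(lzma.compress(data)),
    }
```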
aluigi
Site Admin

Re: Identify Unknown Compression

Post by aluigi »

I ran another compression scan from offset 0x8 of EMnMenu.cmp with an uncompressed size of 0x26a920 (the size of the dmp file), but still nothing.
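A toy version of that kind of scan: try several decoders at each start offset and keep results whose output length equals the expected size. This only covers zlib, raw deflate and LZMA/xz from the stdlib. It's a stand-in for a real multi-algorithm scanner, and `scan_offsets` is a hypothetical helper.

```python
import lzma
import zlib

DECODERS = {
    "zlib": zlib.decompress,
    "raw-deflate": lambda c: zlib.decompress(c, -15),  # negative wbits = no header
    "lzma/xz": lzma.decompress,  # FORMAT_AUTO: .xz or legacy .lzma
}

def scan_offsets(blob: bytes, expected_size: int, max_offset: int = 0x20):
    """Return (offset, decoder) pairs whose output length matches expected_size."""
    results = []
    for off in range(min(max_offset, len(blob))):
        chunk = blob[off:]
        for name, decode in DECODERS.items():
            try:
                out = decode(chunk)
            except (zlib.error, lzma.LZMAError):
                continue
            if len(out) == expected_size:
                results.append((off, name))
    return results
```

Matching on the exact output length filters out decoders that happen to accept garbage and emit a few bytes.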
ngc_kor

Re: Identify Unknown Compression

Post by ngc_kor »

Hmm...
Last edited by ngc_kor on Sun Nov 23, 2014 9:08 am, edited 1 time in total.
Wulf
Posts: 49
Joined: Mon Oct 27, 2014 8:30 pm

Re: Identify Unknown Compression

Post by Wulf »

http://blog.delroth.net/2012/03/gcwii-d ... r-ida-6-1/

If you have access to IDA, you can use the plugin above to load the DOL file. GC/Wii isn't my thing, so I can't tell you much about it, but with enough time you should be able to figure it out with that.

IDA found a few references to decompression.
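The DOL loader plugin essentially maps RAM addresses (like 0x8014191C) back to file offsets. A sketch of that mapping, based on the commonly documented DOL header: 18 big-endian u32 file offsets at 0x00 (7 text + 11 data sections), 18 load addresses at 0x48 and 18 sizes at 0x90. The helper names are hypothetical.

```python
import struct

def dol_sections(dol: bytes):
    """Yield (file_offset, load_address, size) for each non-empty section."""
    offsets = struct.unpack_from(">18I", dol, 0x00)
    addresses = struct.unpack_from(">18I", dol, 0x48)
    sizes = struct.unpack_from(">18I", dol, 0x90)
    for off, addr, size in zip(offsets, addresses, sizes):
        if size:
            yield off, addr, size

def address_to_file_offset(dol: bytes, address: int):
    """Map a RAM address to its offset in the DOL file, or None if unmapped."""
    for off, addr, size in dol_sections(dol):
        if addr <= address < addr + size:
            return off + (address - addr)
    return None
```

This makes it possible to locate, for example, sub_8014191C in START.dol with a plain hex editor.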
ngc_kor

Re: Identify Unknown Compression

Post by ngc_kor »

Thanks to Wulf, now I have a clue. It looks like this:

sub_8014191C (IDA Pro + DOL plugin + START.dol):

.set var_28, -0x28
.set var_14, -0x14
.set arg_4, 4

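stwu r1, -0x30(r1) # prologue: allocate a 0x30-byte stack frame
mflr r0
stw r0, 0x30+arg_4(r1) # save the return address
stmw r27, 0x30+var_14(r1) # save r27-r31
mr r28, r4 # 2nd argument: used as a function pointer below
mr r27, r3
mr r30, r6
mr r29, r5
mr r12, r28
mr r31, r7
mr r5, r30 # call arg 3 = original r6
addi r3, r1, 0x30+var_28 # call arg 1 = 8-byte buffer on the stack
li r4, 8 # call arg 2 = 8 (length)
mtctr r12
bctrl # indirect call through the r4 pointer (likely a read callback)
lis r3, ((aSk_asc+0x10000)@h) # "*SK_ASC*"
addi r4, r1, 0x30+var_28 # r4 = the buffer just filled
addi r3, r3, -0xB68 # aSk_asc
li r5, 8 # compare 8 bytes
bl sub_800F3B34 # likely a memcmp/strncmp-style compare
cmpwi r3, 0
bne loc_80141994 # nonzero = signature mismatch, branch away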

So this could be part of the decompression routine.
But what should I do? I'm not good at disassembly :(

My goal is to identify the compression method and make a compressor/decompressor tool.
If anyone could help me, I would appreciate it.
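Read at face value, the start of sub_8014191C reads 8 bytes through a function pointer passed in r4 (mtctr/bctrl) and compares them with "*SK_ASC*" via sub_800F3B34. A rough Python model follows. `read` stands in for that callback, and its one-argument signature is a simplification (the real call also passes a buffer and the original r6).

```python
from typing import Callable

SIGNATURE = b"*SK_ASC*"

def check_sk_asc(read: Callable[[int], bytes]) -> bool:
    """Model of the signature check at the top of sub_8014191C (hypothetical)."""
    header = read(8)           # bctrl: callback fills an 8-byte stack buffer
    return header == SIGNATURE  # sub_800F3B34 + cmpwi/bne: mismatch branches away
```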
Wulf

Re: Identify Unknown Compression

Post by Wulf »

Do you have the capability to debug the game while it runs?

If so, I'd start by feeding it smaller and smaller chunks of the archive and watching how far it gets. Does it read the first 20 bytes before branching to a new sub? 40? Etc. Once it copies out the first chunk of the code, does it do some math on it that converts it into chunks of the RAM dump you found? Or does it create some sort of header information that is then processed in its decrypted state? It could easily be a known compression method wrapped in some form of simple encryption.
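The simplest version of "a known compression method wrapped in simple encryption" can be brute-forced offline: XOR with a single-byte key, then try a known decompressor. This sketch only tests single-byte XOR over zlib. The real wrapper, if there is one, could use any cipher or codec.

```python
import zlib

def try_xor_then_zlib(blob: bytes):
    """Return every single-byte key k for which (blob XOR k) decompresses as zlib."""
    hits = []
    for key in range(256):
        candidate = bytes(b ^ key for b in blob)
        try:
            zlib.decompress(candidate)
        except zlib.error:
            continue
        hits.append(key)
    return hits
```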

What setup are you using to play around with the game? Original disc on original hardware, one of the emulators, something else? Do you have IDA 6.1, 6.5, or other?

Disclaimer: I have no clue what I'm talking about but usually fake it well enough to fluke into a solution.
ngc_kor

Re: Identify Unknown Compression

Post by ngc_kor »

I'm using IDA Pro 6.1. I have the original disc and original hardware, but for testing I use emulators.
The emulator has a debugging feature that I can use.
Right now I've set a breakpoint at 8014191C to investigate.

ngc_kor

Re: Identify Unknown Compression

Post by ngc_kor »

When I set breakpoints at 80141374 / 800f3b34 / 8014191c / 80141998 / 8013FC58,
the first 4000 bytes of the compressed file (JMnMenu.cmp), excluding the signature, were loaded at (80)5ADEC0~(80)5AC3E0 in memory.
This data was then decompressed and stored starting at (80)5AC3E0~.
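To pull those regions out of the RAM dump for comparison, a virtual address has to be turned into a dump offset. This sketch assumes byte 0 of the dump corresponds to 0x80000000 (the start of GameCube main RAM). That depends on how the dump was made, and `ram_slice` is a hypothetical helper. It also accepts ranges written high-to-low, as in (80)5ADEC0~(80)5AC3E0.

```python
RAM_BASE = 0x80000000  # assumption: dump offset 0 == address 0x80000000

def ram_slice(dump: bytes, start: int, end: int) -> bytes:
    """Return dump bytes for virtual addresses [start, end)."""
    if end < start:
        start, end = end, start  # accept ranges written high-to-low
    lo, hi = start - RAM_BASE, end - RAM_BASE
    if lo < 0 or hi > len(dump):
        raise ValueError("range outside the dump")
    return dump[lo:hi]
```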

So I'm guessing that 8014191C (main) / 8013FC58 (subroutine) are definitely the decompression routines.
I tried to figure it out myself, but I'm not good at PPC disassembly, so I couldn't do it.
Could anyone help me figure out how it works?

I uploaded the PowerPC ASM code (decompression routines) here: http://goo.gl/2bQNfj

I also uploaded the decompressed file, compressed file, main executable, and RAM dump.
Wulf

Re: Identify Unknown Compression

Post by Wulf »

I'll see if I can get something set up to debug the code on my own tomorrow. Understanding it all hands-off just from seeing the code is beyond my skills.

What I'd do is confirm the first instruction to read that byte, see if any operations are done to it, then see where it is stored. Then set a read breakpoint on the new location where it is stored, and repeat.

The fewer steps the data takes between first read and final write, the easier it is to figure out what's being done to it.
Wulf

Re: Identify Unknown Compression

Post by Wulf »

I got everything set up and got the debugger working, but I don't have it in me tonight to do more than that. See if I can do more tomorrow.
ngc_kor

Re: Identify Unknown Compression

Post by ngc_kor »

Wulf wrote:I got everything set up and got the debugger working, but I don't have it in me tonight to do more than that. See if I can do more tomorrow.


I'm really counting on you. Thank you, Wulf.
I can't wait for tomorrow! :)


P.S. Yesterday I got some decompiled code.
I'm not sure whether the code is correct, so it may not help much for now, but it's better than the ASM code.

Here-> http://goo.gl/343oVv

There are 3 separate code blocks, starting at 8013F29C, 8013F40C, and 8013FC58.
Entry point: 8014191C, end offset: 801419CC
Decompiled with http://decompiler.fit.vutbr.cz/decompilation-run.
Wulf
Posts: 49
Joined: Mon Oct 27, 2014 8:30 pm

Re: Identify Unknown Compression

Post by Wulf »

The biggest holdup at this moment is trying to get the Dolphin debugging environment similar enough to what I'm used to working in.

And I'll give it my best shot, but this is still pretty unfamiliar ground for me. I wouldn't even want to attach a number to my probability of success with it.
Wulf

Re: Identify Unknown Compression

Post by Wulf »

Didn't get a chance to work on it last night, and if I do get a chance tonight it won't be enough time to do much.

Have you made any progress yourself?
ngc_kor

Re: Identify Unknown Compression

Post by ngc_kor »

I'm looking at compression algorithms used before 2002 (when the game was released), to find the source and compare their compression ratios and structure with the decompiled algorithm.
Because the compression ratio is better than gzip/deflate (zlib), I'm guessing it combines 2 or more algorithms (like LZ77 + Huffman + dictionary).
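One baseline for comparing ratios is the order-0 (context-free) entropy of the decompressed data. It estimates how small a coder that looks only at individual byte frequencies could get, ignoring table overhead. A CMP file well below this figure is very likely exploiting repeats or context (LZ-style matches, dictionaries, etc.). This is a sketch, and `order0_entropy_bound` is a hypothetical name.

```python
import math
from collections import Counter

def order0_entropy_bound(data: bytes) -> float:
    """Sum of -log2(p(byte)) over all bytes, in bytes (bits / 8)."""
    if not data:
        return 0.0
    counts = Counter(data)
    n = len(data)
    bits = -sum(c * math.log2(c / n) for c in counts.values())
    return bits / 8
```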

I'm also checking suspected algorithms like arithmetic coding and Byte Pair Encoding.
Interestingly, some of the decompressed script data uses BPE compression (check here: http://goo.gl/5ztlRu).
It could be related to the original compression, so I'm analysing that in depth too.
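For reference, BPE decoding in general works like this: each code byte stands for a pair of bytes, and pairs can nest, so expansion repeats until only literals remain. The sketch below assumes the pair table is already known as a dict. How Eternal Darkness stores its table (if this is BPE at all) is unknown, and `bpe_expand` is a hypothetical name.

```python
def bpe_expand(data: bytes, pairs: dict) -> bytes:
    """Expand BPE data.

    `pairs` maps a code byte to the (left, right) pair it replaced; bytes not
    in `pairs` are literals. The table must be acyclic. A stack replaces recursion.
    """
    out = bytearray()
    for byte in data:
        stack = [byte]
        while stack:
            b = stack.pop()
            if b in pairs:
                left, right = pairs[b]
                stack.append(right)  # push right first so left is expanded first
                stack.append(left)
            else:
                out.append(b)
    return bytes(out)
```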

ngc_kor

Re: Identify Unknown Compression

Post by ngc_kor »

Last night, by debugging, I found that the compressed data is loaded into register 28 (r28) and the decompressed byte is stored in register 25 (r25).
With a breakpoint at 8013fca8, the loaded data is in r28, and once execution passes 80140688, the data is stored in memory (e.g. 805ac3e0).
Um... I'm almost there... right?


Anyway, I uploaded the decompiled routine graphs here:

[English ver routine 80144F48-80147E64 (Entry: 80146110)] http://goo.gl/oc78ja
[Japanese ver routine 8013F29C-801419CC (Entry: 8013Fce8)] http://goo.gl/rJlS9J
[English ver routine 80146110-80148EB4 (Entry: 80146110)] http://goo.gl/X0elVU
ngc_kor

Re: Identify Unknown Compression

Post by ngc_kor »

Uploaded debugging (register / breakpoint) data: http://goo.gl/hm2Q1z


P.S. Can anyone help me turn the decompression routine into a program I can compile?
Argonaut
Posts: 46
Joined: Sat Sep 27, 2014 10:24 pm

Re: Identify Unknown Compression

Post by Argonaut »

Would you, by any chance, be interested in cracking another unknown compression type, which would unlock the data for five games? Just a thought, no problem in taking a look:

http://forum.xentax.com/viewtopic.php?f=21&t=12133

Thanks (even if you don't check it out)
Wulf

Re: Identify Unknown Compression

Post by Wulf »

ngc, I still haven't given it my best shot but I haven't been able to get Dolphin into any sort of debug setup that I'm comfortable working in. I'm not giving up yet, but if I can't figure out how to set it up how I need, I won't be able to figure it out.

Argonaut, if that was directed at me then I've got too many projects going on to take a look. If it wasn't, you'll probably have better luck creating a new topic.

Edit: Just read viewtopic.php?p=1829#p1921 more closely; that definitely seems like you're close. I'll take another crack at it tonight, focusing on that area.
Are you working with the US or JPN version mainly? Also, what did you use to generate those charts? They look pretty slick.