Unknown Compression

rengareng
Posts: 46
Joined: Thu Aug 14, 2014 3:29 am

Unknown Compression

Post by rengareng »

I have a compressed file and what is probably its uncompressed counterpart. It looks like a custom compression scheme. I ran comtype_scan.bat without any result.
The uncompressed file comes from the PC version of the game, and the compressed file is the same file from the PS4 version.
The uncompressed size of the PS4 version is 37066 bytes.
LokiReborn
Posts: 190
Joined: Fri Aug 26, 2016 3:11 pm

Re: Unknown Compression

Post by LokiReborn »

This looks more like a deflate-style algorithm than a full compression scheme (back-references are usually one component of a compressor, not the only one). A lot of the text is still very readable, which makes me think there is no dictionary to start with. You can see it back-referencing bytes where the same letters appeared earlier, and only 2 bytes at the start don't line up with the original file. I'll see if I can figure out which variant it is, but I would look at deflate / inflate style schemes rather than normal compression.

I'm thinking it's LZ77; I'm going to write a script to test that.
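
To make the back-referencing idea concrete, here is a minimal Python sketch of how an LZ77-style decoder resolves a (distance, length) pair. The function name `copy_match` and the sample bytes are made up for illustration; this is not the format of the attached file, only the general mechanism.

```python
def copy_match(out: bytearray, distance: int, length: int) -> None:
    """Append `length` bytes taken from `distance` bytes back in `out`."""
    if distance <= 0 or distance > len(out):
        raise ValueError("back-reference points outside the output")
    start = len(out) - distance
    for i in range(length):
        # Copy one byte at a time so the match may overlap the bytes
        # it is producing (length greater than distance is legal).
        out.append(out[start + i])


out = bytearray(b"abcabc")               # literals already decoded
copy_match(out, distance=3, length=6)    # overlapping back-reference
print(out.decode())                      # abcabcabcabc
```

Literals pass through unchanged, which is why plain text stays readable in the compressed file; only the repeated parts are replaced by references.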
rengareng
Posts: 46
Joined: Thu Aug 14, 2014 3:29 am

Re: Unknown Compression

Post by rengareng »

I saw similar compression in some EA games, as described here: http://wiki.niotso.org/RefPack
The problem is that I don't know how the distance/length pairs are encoded. I could use a disassembler if this were the PC build, but it is the PS4 one.
LokiReborn
Posts: 190
Joined: Fri Aug 26, 2016 3:11 pm

Re: Unknown Compression

Post by LokiReborn »

Yeah, I don't think it will be that difficult. The part that's tripping me up right now is that the byte order seems to be big endian; I would try rerunning the regular script with that set.
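
As a quick illustration of why byte order matters here, this Python sketch decodes the same two bytes both ways. The value 37066 is the uncompressed size quoted earlier in the thread and is used only as a convenient number; nothing here claims that the file actually stores it at any particular position.

```python
import struct

# 37066 written as a 2-byte big-endian value: 0x90 0xCA
raw = (37066).to_bytes(2, "big")

little = struct.unpack("<H", raw)[0]   # read as little endian
big = struct.unpack(">H", raw)[0]      # read as big endian

print(hex(little), hex(big))           # 0xca90 0x90ca
```

A scanner that assumes the wrong byte order sees 51856 instead of 37066, so size and offset fields stop making sense even when the algorithm guess is right.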
rengareng
Posts: 46
Joined: Thu Aug 14, 2014 3:29 am

Re: Unknown Compression

Post by rengareng »

It looks easy, but I still have no idea what the actual encoding is.
This compression is used in the fat/dat files of the PS4 version of Watch Dogs.
rengareng
Posts: 46
Joined: Thu Aug 14, 2014 3:29 am

Re: Unknown Compression

Post by rengareng »

I found the algorithm. If I delete the first byte, I can use the following quickbms script to unpack the compressed file:

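# run on the compressed file after deleting its first byte
comtype lz77ea_970
get SIZE asize
get NAME filename
string NAME += ".unpacked"
# clog NAME OFFSET ZSIZE SIZE: the whole file is the compressed stream,
# and 10000000 is only a generous upper bound for the output size
clog NAME 0 SIZE 10000000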

I don't know the purpose of the first byte. Any guesses?
Unfortunately, the same approach fails for the attached file, even after deleting the first byte. The first byte could hold some options for the algorithm.
The uncompressed size should be 103475, but it extracts to only 94550.
LokiReborn
Posts: 190
Joined: Fri Aug 26, 2016 3:11 pm

Re: Unknown Compression

Post by LokiReborn »

The first 3 bytes of this file are FEU, which is also the file extension, so it is probably safe to treat them as its magic number. That means I don't think there is anything in front of it to remove. As for the other file, even the LUA name had the -pc removed, so I'm not sure they are exactly the same file. On that premise, the extra byte could be something meaningful for the LUA script itself rather than garbage data. So maybe try removing nothing, or, if that doesn't work, get a better understanding of what that lz77 variation is doing. I might be off base, but usually the simpler explanations are the correct ones.

Edit:
Actually, looking at the FEU file again, I'm seeing multiple repeated strings with no LZ-style compression applied, so you may already have the file in its correct form.
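
A minimal Python sketch of the magic-number check described above. The helper name `has_magic` is made up, and treating FEU as the magic is only the guess from this post, not a documented fact about the format.

```python
def has_magic(data: bytes, magic: bytes = b"FEU") -> bool:
    """Return True when the buffer starts with the expected magic bytes."""
    return data[:len(magic)] == magic


# Hypothetical sample buffers, not the attached file.
print(has_magic(b"FEU\x00\x01\x02"))      # magic at offset 0
print(has_magic(b"\x1fFEU\x00\x01\x02"))  # one extra byte in front
```

If the check passes on the untouched file, there is no leading byte to strip before trying a decompressor.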
rengareng
Posts: 46
Joined: Thu Aug 14, 2014 3:29 am

Re: Unknown Compression

Post by rengareng »

It's definitely LZ4, which is explained here (https://fastcompression.blogspot.com/20 ... ained.html). I've found that they use a trick to allow offsets >= 65536. Here is the 010 Editor template that worked for that file:

// author: celikeins
// watch_dogs 1, cmp type 4 in fat/dat files
LittleEndian();

int read() {
    local int a = 0;
    do {
        struct { ubyte b; } n;
        a += n.b;
    } while (n.b == 0xFF);
    return a;
}
local int out = 0;
local int infile = GetFileNum();
local int outfile = FileNew();
local int i;

byte unknown;
while (!FEof()) {
    struct {
        local int outpos = out;
        ubyte hl;
        // high nibble: literal count, low nibble: match length (before the +4 below)
        local int proceed = hl >> 4, copy = hl & 0x0F;
        if (proceed == 0x0F) {
            proceed += read();
        }
        if (proceed > 0) {
            ubyte proceed_from_input[proceed];
            FileSelect(outfile);
            WriteBytes(proceed_from_input, out, proceed);
            out += proceed;
            FileSelect(infile);
        }
        // the final block may end right after its literals
        if (FEof()) { break; }
        ushort offset0;
        local int offset = offset0;
        // offset can be beyond 64KB
        if (offset >= 0xE000) {
            ubyte offset1;
            offset += offset1 * 0x2000;
        }
        Assert(offset > 0 && (out - offset) >= 0);
        if (copy == 0x0F) {
            copy += read();
        }
        copy += 4;
        FileSelect(outfile);
        // copy from output to output
        for (i = 0; i < copy; ++i) {
            WriteByte(out, ReadByte(out - offset));
            ++out;
        }
        FileSelect(infile);
    } block;
}
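
For anyone without 010 Editor, here is a rough Python port of the same logic. It is only a sketch that mirrors the template above: the names `read_ext` and `decompress` are mine, `skip=1` corresponds to the `byte unknown;` field, and it does nothing about files that carry extra leading bytes or trailing uncompressed data.

```python
def read_ext(data: bytes, pos: int) -> tuple[int, int]:
    """Sum length-extension bytes, stopping after the first byte != 0xFF."""
    total = 0
    while True:
        b = data[pos]
        pos += 1
        total += b
        if b != 0xFF:
            return total, pos


def decompress(data: bytes, skip: int = 1) -> bytes:
    out = bytearray()
    pos = skip                        # skip the leading byte of unknown purpose
    while pos < len(data):
        token = data[pos]
        pos += 1
        literals = token >> 4         # high nibble: literal count
        match = token & 0x0F          # low nibble: match length - 4
        if literals == 0x0F:
            ext, pos = read_ext(data, pos)
            literals += ext
        out += data[pos:pos + literals]
        pos += literals
        if pos >= len(data):          # last block has literals only
            break
        if pos + 2 > len(data):
            raise ValueError("truncated offset field")
        offset = int.from_bytes(data[pos:pos + 2], "little")
        pos += 2
        if offset >= 0xE000:          # extended offset, can exceed 64 KB
            offset += data[pos] * 0x2000
            pos += 1
        if offset <= 0 or offset > len(out):
            raise ValueError("offset points outside the output")
        if match == 0x0F:
            ext, pos = read_ext(data, pos)
            match += ext
        match += 4
        start = len(out) - offset
        for i in range(match):        # byte-wise so overlapping copies work
            out.append(out[start + i])
    return bytes(out)


# Hand-built sample: unknown byte, 3 literals + match of 6 at offset 3, then 2 literals.
sample = b"\x00" + b"\x32abc" + b"\x03\x00" + b"\x20XY"
print(decompress(sample))             # b'abcabcabcXY'
```

The sample stream is hand-built to exercise one literal run, one overlapping match, and a literal-only final block; it is not taken from the game files.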

However, the problem is that some files have 2 extra bytes at the beginning (for example menu_selfshadow.xbt_compressed).
Another problem is that some files have uncompressed data at the end, after the LZ4 sequence blocks.
For example, in the attached barkconfig_37fd2f17.obj file, the last 0x49 bytes do not belong to any LZ4 sequence block.
I suspect the decompressor in the game uses the extra bytes at the beginning of the file to know when to stop decompressing.
Example files are attached.