Patient, possibly infected with "lz77" is in need of treatment!

Programming related discussions related to game research
Endymion
Posts: 2
Joined: Wed Mar 20, 2019 7:50 pm

Patient, possibly infected with "lz77" is in need of treatment!

Post by Endymion »

Hey guys. I have a request that would probably be pretty easy for some of you, so would be cool if you can assist!
I have a small file from PS3. It looks like lz77 but I was not able to properly decompress it with the algorithms I've found online.

So here are my questions:
Is this really lz77 or it is a common variation that has some specific name?
Is there a code somewhere that can properly decompress this file? Preferably not C++, "bare" and as a single function. I'd like to see a code without any frills not required to perform the actual decompression because I am mainly interested in understanding how to decompress this file.

Bonus question:
If decompression solution will be found, in case I want to create a fake-compressed file that would work when fed to a compliant decoder, what would a stream of such a file look like?
Would it be something like this: [uncompsize-long]FFdata-here-data-hereFFdata-here-data-here-and-more-dataFFeven-more-data ?
I'd like to know what exactly these "inserted" bytes would be, would they always be the same, how often they should occur and what are the "rules" of adding them in general?

Additional info:
Provided file has a header of 4 bytes that looks like an uncompressed size.
I don't know where the actual compressed data starts. In other words I am not 100% positive that there is no some kind of additional header beyond 1st 4 bytes and what may look like a header may in fact be the header of an uncompressed file and thus not relevant.
The very last human-readable sentence in file should be something like this "ことも好き" (sjis). I suspect that if it would be possible to obtain it, the result can be declared correct.

EDIT:never mind I am stoopid. It is not lz77, it is classical lzss. Still curious about bonus question though!
aluigi
Site Admin
Posts: 12984
Joined: Wed Jul 30, 2014 9:32 pm

Re: Patient, possibly infected with "lz77" is in need of treatment!

Post by aluigi »

Exist many implementations for all the various compression algorithms.
Basically may exist 1000 ways for using lz77, all incompatible between them.

Regarding the bonus question, if the algorithm is byte-based it's very easy to write a fake compressor just by using the flags for non-compressed data followed by the original data as-is.
It depends by the flags and how they are used in the algorithm, so there is no general rule.

In some algorithms 0xff does the job as flag:

Code: Select all

int moh_lzss_compress_fake(unsigned char *in, int insz, unsigned char *out) {
    QUICK_IN_OUT2
    unsigned char   b = 0;
    while(in < inl) {
        b <<= 1;
        if(!b) {
            *o++ = 0xff;
            b = 1;
        }
        *o++ = *in++;
    }
    return o - out;
}


In RLE algorithms you need to use a specific byte (maybe 0x80) followed by the original data byte, or a certain range.

In other algorithms a certain amount of bits of that byte are the length and the remaining ones are the flag:

Code: Select all

int dr_enc_fake(unsigned char *in, int insz, unsigned char *out) {
    unsigned char   *inl = in + insz;
    unsigned char   *o = out;

    while(in < inl) {
        int     len = 0x1f;
        if((in + len) >= inl) len = inl - in;
        unsigned char flags = 0 | len;
        *o++ = flags;
        memcpy(o, in, len);
        o  += len;
        in += len;
    }
    return o - out;
}


And so on...

For bit-based algorithms... good luck :D