PS2 Textfiles in .dat File from Yakuza Game - possible to rip with QuickBMS ?

SoulPatrol
Posts: 13
Joined: Wed May 15, 2019 8:45 am

Post by SoulPatrol »

Hello,
I need to extract these .dat files from the very old Yakuza PlayStation 2 game.
I searched this site and Google, but did not find a working script.
I'm not sure if this info is right or helpful, but it was the closest I found :(

The compression is a LZ Renau

/*------------------------------------------------------------------------------
 DAT file structure:
 - header
   + 4 bytes: number of files, little endian
   + for each file
     - 4 bytes: LBA, little endian
   + fill
     - aligned to 0x800 bytes, padded with 0x00
 - data
   + X bytes: encoded data
   + fill
     - aligned to 0x800 bytes, padded with 0x00
------------------------------------------------------------------------------*/

/*------------------------------------------------------------------------------
 LZR file structure:
 - header
   + 2 bytes: signature, always "CM"
   + 4 bytes: decoded length, little endian
   + 4 bytes: encoded data length, little endian
 - data
   + X bytes: encoded data
   + Y bytes: flags
 - fill
   + aligned to 0x800 bytes, padded with 0x00

 'flag' data, 8 bits, right to left:
 - 0: uncompressed data, copy 1 byte to 'target'
 - 1: compressed data, copy 'length+3' bytes from 'target-offset-1' to 'target'
      + 12-bits offset, bits 0-11
      + 4-bits length, bits 12-15
------------------------------------------------------------------------------*/
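The LZR description above can be sketched in Python as follows. This is only a reading of the quoted comment, not tested against real files: the function names are made up for illustration, and the comment does not say how the 16-bit copy token is stored in the file, so the sketch takes it as an already assembled integer.

```python
import struct

def parse_lzr_header(data):
    """Read the 10-byte LZR header described above (hypothetical helper)."""
    if data[:2] != b"CM":
        raise ValueError("missing 'CM' signature")
    # Two little-endian 32-bit lengths follow the 2-byte signature.
    decoded_len, encoded_len = struct.unpack_from("<II", data, 2)
    return decoded_len, encoded_len

def split_lzr_token(token):
    """Split a 16-bit copy token: bits 0-11 are the offset, bits 12-15 the length."""
    offset = token & 0x0FFF
    length = (token >> 12) + 3   # 'length+3' bytes are copied
    return offset, length
```

Per the description, the copy source would then be `target - offset - 1`.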



It would be very nice if I could get some help. Maybe someone has already worked on the PS2 version?
I attached 3 files to test, for the USA language.
Too bad Sega did not reuse the language files from the PS2 game, as it has many more languages than just the English of the PC version.

Thank you :)
Kaplas
Posts: 60
Joined: Fri Jan 25, 2019 2:47 pm

Post by Kaplas »

Well, I've been looking into this, and that file structure doesn't fit the attached files.

This is what I've found:

Yakuza DAT header structure:
        uint fileCount; // Number of files, little endian
        uint[3] padding; // zeroes
       
        for each file:
        uint fileOffset;
        uint fileSize;
        uint fileIndex;
        char[4] fileSignature; // TLFD
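
A minimal Python sketch of reading this header, in case it helps to check the layout against other files. It assumes every uint is 32-bit little endian (only stated for fileCount above) and that the entry table starts right after the 16-byte header; `parse_dat_header` is just an illustrative name.

```python
import struct

def parse_dat_header(data):
    """Return the file table of a Yakuza DAT as a list of dicts (hypothetical helper)."""
    (file_count,) = struct.unpack_from("<I", data, 0)
    entries = []
    for i in range(file_count):
        base = 16 + i * 16  # 16-byte header (count + 3 padding uints), 16 bytes per entry
        offset, size, index, signature = struct.unpack_from("<III4s", data, base)
        entries.append({"offset": offset, "size": size,
                        "index": index, "signature": signature})
    return entries
```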

TLFD header structure:
        uint fileCount; // Number of files, little endian
        char[4] fileSignature; // TLFD
        uint[2] padding; // zeroes
       
        for each file:
        uint fileOffset; // relative to the TLFD start
        uint fileSize;
        uint[2] padding; // zeroes
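
The TLFD layout could be walked the same way. Again this is only a sketch: little-endian 32-bit fields are assumed throughout, `split_tlfd` is a made-up name, and `blob` is expected to start at the first byte of the TLFD block so that the relative offsets work.

```python
import struct

def split_tlfd(blob):
    """Return the raw bytes of each file inside one TLFD block (hypothetical helper)."""
    file_count, signature = struct.unpack_from("<I4s", blob, 0)
    if signature != b"TLFD":
        raise ValueError("not a TLFD block")
    files = []
    for i in range(file_count):
        # Entries are 16 bytes each and follow the 16-byte header.
        offset, size = struct.unpack_from("<II", blob, 16 + i * 16)
        files.append(blob[offset:offset + size])  # offset is relative to the TLFD start
    return files
```

An empty TLFD (fileCount of 0) simply yields an empty list.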

Many of the TLFD files are empty, but some have 2 files inside: an "OCB" and an "AVLZ". I haven't looked into the OCBs, but the AVLZ files are compressed. The compression algorithm used is similar to the one you posted:

// Bytes 0-3 are presumably the "AVLZ" signature; two little-endian 32-bit sizes follow.
var uncompressedSize = data[4] + (data[5] << 8) + (data[6] << 16) + (data[7] << 24);
var compressedSize = data[8] + (data[9] << 8) + (data[10] << 16) + (data[11] << 24);

var output = new byte[uncompressedSize];

var processedBytes = 0;
var inputPosition = 0x0C;
var outputPosition = 0;

var flagCount = 0;
var flag = 0;

do
{
    if (flagCount == 0)
    {
        flag = data[inputPosition];
        inputPosition++;
        flagCount = 8;
    }

    if ((flag & 0x01) == 0x01)
    {
        flag = (byte)(flag >> 1);
        flagCount--;

        output[outputPosition] = data[inputPosition];
        inputPosition++;
        outputPosition++;
        processedBytes++;
    }
    else
    {
        flag = (byte)(flag >> 1);
        flagCount--;

        var copyFlags = (ushort)((data[inputPosition] << 8) | data[inputPosition + 1]);
        inputPosition += 2;

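        var copyDistance = ??????; // still unknown; presumably derived from the upper 12 bits of copyFlags
        var copyCount = 3 + (copyFlags & 0xF); // low 4 bits of copyFlags, plus 3

        // Placeholder: write 0xFF for every copied byte until copyDistance is understood.
        var i = 0;
        do
        {
            output[outputPosition] = 0xFF;
            outputPosition++;
            i++;
        } while (i < copyCount);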

        processedBytes += copyCount;
    }
} while (processedBytes < uncompressedSize);


I haven't figured out how to calculate the "copyDistance" value. Maybe it uses a preset dictionary or some other structure.

For example, OGRE4DIR.BIN contains small files compressed with this algorithm:
Image

As you can see, the first flag byte is 'F5' (11110101). Its lowest bit is 1, so I copy the next byte (03). The second bit is 0, so it is a 'jump and copy'. The copyflags are 'EBF4', which gives a copy length of 7 (04+3), but I don't know what to do with 'EBF'. The uncompressed data should be: 03 00 00 00 03 00 00 00
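
The same walk-through as a few lines of Python, for anyone who wants to try other flag bytes. The helper names are made up; the bit order and the big-endian read of the two copyflag bytes follow the decoder code above.

```python
def flag_bits_lsb_first(flag):
    """List the 8 flag bits in the order the decoder consumes them (lowest bit first)."""
    return [(flag >> i) & 1 for i in range(8)]

def split_copy_flags(first, second):
    """Combine the two bytes as the decoder does and split off the low 4-bit length."""
    copy_flags = (first << 8) | second
    upper_12_bits = copy_flags >> 4          # the part whose meaning is still unclear
    copy_count = 3 + (copy_flags & 0xF)
    return upper_12_bits, copy_count

bits = flag_bits_lsb_first(0xF5)             # 1 = copy one byte, 0 = jump and copy
upper, count = split_copy_flags(0xEB, 0xF4)
print(bits, hex(upper), count)
```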
aluigi
Site Admin
Posts: 12984
Joined: Wed Jul 30, 2014 9:32 pm

Post by aluigi »

Isn't it just LZSS?
Kaplas
Posts: 60
Joined: Fri Jan 25, 2019 2:47 pm

Post by Kaplas »

It's similar, but in LZSS (as far as I know), the seek offset can't go beyond the beginning of the uncompressed data.

In the example I wrote, I just have a '03' in the uncompressed data and the offset is 'EBF', so I think there has to be some calculation with this value.
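
One calculation that could be tried here is the classic LZSS layout from Okumura's LZSS.C, where offsets point into a 4096-byte ring buffer whose write position starts at 0xFEE (4096 - 18), so an offset can legitimately land in the pre-filled area "before" the data. The Python sketch below assumes that layout, a zero-filled buffer (the original LZSS.C pre-fills with spaces), and the byte/nibble arrangement from LZSS.C: offset = first byte plus the high nibble of the second byte as bits 8-11. None of this is confirmed for AVLZ; it is only checked against the single 8-byte example in this thread.

```python
def lzss_ring_decode(payload, out_size):
    """Decode classic ring-buffer LZSS; payload starts right after the 12-byte header."""
    N = 4096
    window = bytearray(N)        # assumption: ring buffer pre-filled with zeros
    r = N - 18                   # classic LZSS start position, 0xFEE
    out = bytearray()
    pos = 0
    flags = 0
    bits_left = 0
    while len(out) < out_size:
        if bits_left == 0:
            flags = payload[pos]
            pos += 1
            bits_left = 8
        is_literal = flags & 1   # bits are consumed lowest first
        flags >>= 1
        bits_left -= 1
        if is_literal:
            byte = payload[pos]
            pos += 1
            out.append(byte)
            window[r] = byte
            r = (r + 1) & (N - 1)
        else:
            first, second = payload[pos], payload[pos + 1]
            pos += 2
            src = first | ((second & 0xF0) << 4)   # EB F4 -> 0xFEB
            count = (second & 0x0F) + 3            # EB F4 -> 7
            for i in range(count):
                if len(out) == out_size:
                    break
                byte = window[(src + i) & (N - 1)]
                out.append(byte)
                window[r] = byte
                r = (r + 1) & (N - 1)
    return bytes(out)

example = lzss_ring_decode(bytes([0xF5, 0x03, 0xEB, 0xF4]), 8)
print(example.hex(" "))
```

With these assumptions the first literal 03 is written at 0xFEE, and the copy from 0xFEB reads three zeros, then that 03, then three more zeros, which gives 03 00 00 00 03 00 00 00 for the example. Whether it holds for complete AVLZ files would still need checking against real data.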