Insomniac Engine TOC & DAG files parsing

B1naryKill3r
Posts: 7
Joined: Mon Jan 28, 2019 9:16 pm

Post by B1naryKill3r »

Hello ZenHax!

I'm trying to parse the toc and dag files from Ratchet & Clank (PS4) and Sunset Overdrive.

Here is what I already know:
- Both Ratchet & Clank (PS4) and Sunset Overdrive seem to use the same engine (I'm guessing that because the files have a lot of similarities, and the filenames are the same).
- The TOC file looks like this:

Code:

struct TOC
{
   uint Magic; // 0xAF12AF17 in both games
   int DecompressedTOCSize; // Size in bytes of CompressedTOC after decompression
   byte[FileSize - 8] CompressedTOC; // Runs to the end of the file (FileSize is the physical file size; minus 8 for the two ints above)
}

Note: This is not a correct struct; I wrote it just to show what I think the layout looks like.

CompressedTOC is zlib-compressed data (it starts with the familiar header 78 DA).
After decompressing it, I saw another header, "1TAD".
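
The container layout described above can be sketched in C# roughly as follows. This is only a sketch under assumptions: the two header ints are taken to be little-endian, ZLibStream needs .NET 6 or later, and the names TocContainer, Unwrap and StartsWith1Tad are made up for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class TocContainer
{
    // Magic value reported for both games.
    public const uint Magic = 0xAF12AF17;

    // Takes the raw bytes of a toc file and returns the decompressed payload.
    public static byte[] Unwrap(byte[] file)
    {
        using var reader = new BinaryReader(new MemoryStream(file));

        uint magic = reader.ReadUInt32();            // assumed little-endian
        if (magic != Magic)
            throw new InvalidDataException($"Unexpected magic 0x{magic:X8}");

        int decompressedSize = reader.ReadInt32();
        if (decompressedSize < 0)
            throw new InvalidDataException("Negative decompressed size");

        // Everything after the two ints is treated as one zlib stream (78 DA ...).
        using var zlib = new ZLibStream(reader.BaseStream, CompressionMode.Decompress);
        using var output = new MemoryStream(decompressedSize);
        zlib.CopyTo(output);

        if (output.Length != decompressedSize)
            throw new InvalidDataException("Decompressed size does not match the header");
        return output.ToArray();
    }

    // True if the payload begins with the ASCII bytes "1TAD".
    public static bool StartsWith1Tad(byte[] payload) =>
        payload.Length >= 4 && Encoding.ASCII.GetString(payload, 0, 4) == "1TAD";
}
```

Typical use would be `TocContainer.Unwrap(File.ReadAllBytes("toc"))`, then checking `StartsWith1Tad` on the result before parsing any further.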

And then there is some data:

Code:

        public struct TAD_Header
        {
            public int Unknown;
            public int FileEndOffset;
            public int Unknown2; // Always 6? (Version?)
            public int Unknown3;
            public int ArchiveFilesBlockOffset;
            public int Unknown4; // 480? What?
            public int Unknown5;
            public int Unknown6; // 1024? (Block size?)
            public int UnknownOffset; // Most probably offset
            public int Unknown7;
            public int UnknownOffset2; // Also most probably offset
            public int UnknownOffset3;
            public int Unknown8;
            public int UnknownOffset4;
            public int Unknown9; // 5628??
            public uint Unknown10;
            public int UnknownOffset5;
            public int UnknownOffset6;
            public uint Unknown11;
            public int Unknown12; // 112?
            public int Unknown13; // 912?
        }

Note: This is not a correct struct; I wrote it just to show what I think the layout looks like.
Note 2: Please ignore my comments :P

I haven't looked closely at the dag file, but it also seems to have a magic, *some data*, and compressed data.
After decompressing the dag file I noticed what look like filenames.

Can somebody help me with parsing that data? I'm making some tools to make modding these games easy.

I almost forgot, here are the decompressed and compressed files from both games: https://drive.google.com/file/d/1XmzV8iE2GazF9HwhaLldtGNPlgZw3kpO/view?usp=sharing

There are also some files that the game doesn't use but that (I think) are useful for us:
- layout.csv
- scan.csv

(I'll upload them if needed)

Files in asset_archive folder from Ratchet & Clank (PS4):
2018-01-22 03:13 3 219 804 160 g00s001
2018-01-22 03:13 127 696 896 a00s003.us
2018-01-22 03:13 205 807 616 a00s004.dk
2018-01-22 03:13 205 791 232 a00s005.nl
2018-01-22 03:13 206 594 048 a00s006.fi
2018-01-22 03:13 206 540 800 a00s007.fr
2018-01-22 03:13 207 151 104 a00s008.de
2018-01-22 03:13 206 016 512 a00s009.it
2018-01-22 03:13 219 938 816 a00s011.no
2018-01-22 03:13 221 827 072 a00s012.pl
2018-01-22 03:13 206 778 368 a00s013.pt
2018-01-22 03:13 205 426 688 a00s014.ru
2018-01-22 03:13 188 735 488 a00s015.es
2018-01-22 03:13 206 430 208 a00s016.se
2018-01-22 03:13 206 946 304 a00s019.ar
2018-01-22 03:13 218 836 992 a00s020.tr
2018-01-22 03:13 662 472 archive_input.json
2018-01-22 03:13 603 chunkmap.txt
2018-01-22 03:13 3 388 548 dag
2018-01-22 03:13 2 702 249 984 g00s000
2018-01-22 03:13 1 067 180 032 g00s002
2018-01-22 03:13 2 311 626 752 g01s000
2018-01-22 03:13 1 062 604 800 g02s000
2018-01-22 03:13 2 294 554 624 g03s000
2018-01-22 03:13 1 063 329 792 g04s000
2018-01-22 03:13 813 359 104 g05s000
2018-01-22 03:13 986 714 112 g06s000
2018-01-22 03:13 887 775 232 g07s000
2018-01-22 03:13 1 244 053 504 g08s000
2018-01-22 03:13 802 992 128 g09s000
2018-01-22 03:13 499 343 360 g10s000
2018-01-22 03:13 2 984 398 848 g11s000
2018-01-22 03:13 599 597 056 g11s001
2018-01-22 03:13 1 386 311 680 g12s000
2018-01-22 03:13 1 159 507 968 g13s000
2018-01-22 03:13 41 573 019 layout.csv
2018-02-14 15:16 2 826 240 p000035
2018-02-14 15:16 28 049 408 p000036
2018-02-14 15:16 2 060 288 p000037
2018-02-14 15:16 1 929 216 p000038
2018-02-14 15:16 1 765 376 p000039
2018-02-14 15:16 1 249 280 p000040
2018-02-14 15:16 712 704 p000041
2018-02-14 15:16 1 961 984 p000042
2018-02-14 15:16 585 728 p000043
2018-02-14 15:16 606 208 p000044
2018-02-14 15:16 1 118 208 p000045
2018-02-14 15:16 901 120 p000046
2018-02-14 15:16 684 032 p000047
2018-02-14 15:16 20 480 p000048
2018-01-22 03:13 26 997 867 scan.csv
2018-02-14 15:16 2 998 864 toc


Thank you in advance :)

Re: Insomniac Engine TOC & DAG files parsing

Post by B1naryKill3r »

Small update:
I noticed that aluigi already made a BMS script for the Insomniac Engine, "edge_of_nowhere.bms".

Based on that script I updated my structs:

Code:

        public struct TAD_Header
        {
            public uint ID;
            public int EndOffset;
            public int PartsCount;
            public TAD_Header_Part[] Parts;
        }

        public struct TAD_Header_Part
        {
            public uint ID;
            public int Offset;
            public int Size;
        }


I'm reading it via this code:

Code:


        private void LoadHeader()
        {
            tocHeader.ID = compressedReader.ReadUInt32();
            tocHeader.EndOffset = compressedReader.ReadInt32();
            tocHeader.PartsCount = compressedReader.ReadInt32();
            tocHeader.Parts = new TAD_Header_Part[tocHeader.PartsCount];

            Console.WriteLine("TOC Header ID: " + tocHeader.ID);
            Console.WriteLine("TOC Header EndOffset: " + tocHeader.EndOffset);
            Console.WriteLine("TOC Header Parts: " + tocHeader.PartsCount);

            for (int i = 0; i < tocHeader.Parts.Length; i++)
            {
                Console.WriteLine("--- TOC Header Part " + i + " ---");

                tocHeader.Parts[i].ID = compressedReader.ReadUInt32();
                tocHeader.Parts[i].Offset = compressedReader.ReadInt32();
                tocHeader.Parts[i].Size = compressedReader.ReadInt32();

                Console.WriteLine("ID: " + tocHeader.Parts[i].ID);
                Console.WriteLine("Offset: " + tocHeader.Parts[i].Offset);
                Console.WriteLine("Size: " + tocHeader.Parts[i].Size);
            }
        }


It works very well, but has anybody parsed the DAG file(s)?
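
Once the part table is read, each part can be cut out of the decompressed data by its Offset and Size. The following is a minimal sketch, assuming Offset is measured from the start of the decompressed data and all fields are little-endian; TadSection, TadSectionTable, Read and Extract are hypothetical names, not anything taken from the BMS script.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public sealed class TadSection
{
    public uint Id;
    public int Offset;
    public int Size;
}

public static class TadSectionTable
{
    // Parses the 12-byte header (ID, EndOffset, PartsCount) and the 12-byte entries after it.
    public static List<TadSection> Read(byte[] data)
    {
        using var reader = new BinaryReader(new MemoryStream(data));
        reader.ReadUInt32();                 // ID ("1TAD")
        reader.ReadInt32();                  // EndOffset
        int count = reader.ReadInt32();      // PartsCount

        // Reject counts that cannot fit in the data before allocating anything.
        if (count < 0 || 12L + 12L * count > data.Length)
            throw new InvalidDataException("Implausible PartsCount: " + count);

        var sections = new List<TadSection>(count);
        for (int i = 0; i < count; i++)
        {
            sections.Add(new TadSection
            {
                Id = reader.ReadUInt32(),
                Offset = reader.ReadInt32(),
                Size = reader.ReadInt32(),
            });
        }
        return sections;
    }

    // Copies one part out of the data. Assumes Offset is relative to the start of the data.
    public static byte[] Extract(byte[] data, TadSection section)
    {
        if (section.Offset < 0 || section.Size < 0 ||
            (long)section.Offset + section.Size > data.Length)
            throw new InvalidDataException("Section lies outside the data");

        var bytes = new byte[section.Size];
        Buffer.BlockCopy(data, section.Offset, bytes, 0, section.Size);
        return bytes;
    }
}
```

The bounds checks are there because a wrong guess about the layout otherwise shows up as a confusing exception deep inside the reader.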

Re: Insomniac Engine TOC & DAG files parsing

Post by B1naryKill3r »

Okay guys.

It seems that the DAG file only lists files that have full names and are not generated by the engine(?)
I noticed that all the "readable" entries (names, paths) are in the gXXsXXX archives; NONE of the files (based on layout.csv) seem to be in the audio archives (aXXsXXX.LANG).

aluigi, you did a great job with the "edge_of_nowhere.bms" script; unfortunately it doesn't parse DAG files at all.

Can somebody help me?
I want to extract files with correct names.

Re: Insomniac Engine TOC & DAG files parsing

Post by B1naryKill3r »

Small update:
Audio files are listed in the TOC file,
and it seems that an entry looks like this:

- ID of audio archive file?
- Name of audio file with .wem extension


Re: Insomniac Engine TOC & DAG files parsing

Post by B1naryKill3r »

Okay,
I finally uploaded what I call "Help Files", which may help with parsing the toc & dag files (archive_input.json, chunkmap.txt, layout.csv, scan.csv):

https://drive.google.com/file/d/1YlZEGj ... sp=sharing

Can anybody help?
aluigi
Site Admin
Posts: 12984
Joined: Wed Jul 30, 2014 9:32 pm

Re: Insomniac Engine TOC & DAG files parsing

Post by aluigi »

Just downloaded the samples and tried the script.
The format seems correct and the script works if you rename toc_rac as toc.
The only thing I can add to the script is to skip opening "TOC" if you have already selected a toc* file; let me know if that helps.

Regarding the filenames, there are no names stored in the archive.

Re: Insomniac Engine TOC & DAG files parsing

Post by B1naryKill3r »

aluigi, filenames and other stuff are stored in the DAG file, but it seems that not every file has a filename.

Some of the files (like audio files) have a static path, and their filename is basically a built path saved as an integer in layout.csv.

As you know, the TOC & DAG files are split into parts; your script reads part 0 (archive names), part 2 (file sizes) and part 4 (file offsets & archive).

I realized that part 1 holds the built file names (the ones listed in layout.csv) (also, remember the endianness).

Back to the topic: it seems that some file names are generated from these built file names (these files aren't listed in the DAG file BUT they exist in layout.csv).

Also, in your script you wrote that the DAG file has 1/4 of the file names; I'm wondering why.

It seems that layout.csv is generated FROM these DAG & TOC files.

I know how to parse TOC files (even the built files part: just read 8 bytes at a time until the end of the part), but I want filenames (much better than just xxxxxx.dat files, huh?)
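
Reading the built file names part as consecutive 8-byte values could look roughly like this. It is a sketch only: the byte order is left as a parameter because the post does not say which one applies, and BuiltNamesPart and ReadEntries are hypothetical names.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

public static class BuiltNamesPart
{
    // Splits the raw bytes of a part into consecutive 8-byte unsigned values.
    public static ulong[] ReadEntries(byte[] part, bool bigEndian = false)
    {
        if (part.Length % 8 != 0)
            throw new InvalidDataException("Part length is not a multiple of 8");

        var values = new ulong[part.Length / 8];
        for (int i = 0; i < values.Length; i++)
        {
            ReadOnlySpan<byte> chunk = part.AsSpan(i * 8, 8);
            values[i] = bigEndian
                ? BinaryPrimitives.ReadUInt64BigEndian(chunk)
                : BinaryPrimitives.ReadUInt64LittleEndian(chunk);
        }
        return values;
    }
}
```

Comparing the values from both byte orders against the integers in layout.csv would be one way to find out which order is the right one.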

Re: Insomniac Engine TOC & DAG files parsing

Post by aluigi »

Let me know if you come up with an updated script or other solutions.
redspike474
Posts: 7
Joined: Sat Apr 22, 2017 6:50 pm

Re: Insomniac Engine TOC & DAG files parsing

Post by redspike474 »

Has anyone made any progress on this? Can I extract any files from Sunset Overdrive in 2021?
Even if they don't have filenames, at least being able to dump the contents would be useful.
Slappy
Posts: 12
Joined: Mon Mar 20, 2017 11:52 am

Re: Insomniac Engine TOC & DAG files parsing

Post by Slappy »

+1 for this format.
It looks like it is possible to export files from Spiderman PS4, which uses a similar format: viewtopic.php?t=11623