New World .datasheet file format

Samael
Posts: 2
Joined: Fri Jul 09, 2021 2:41 pm

Post by Samael »

Hi!

I'm having trouble understanding these files, which contain tables of data. So far I've worked out this:

Code: Select all

@offset 0x18   4B   data size
@offset 0x38   4B   header size
@offset 0x44   4B   number of columns
@offset 0x48   4B   number of lines
@offset 0x5c   start of the header data

The header data contains offsets relative to the beginning of the data block (@offset 0x3c + header size).
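
As a minimal sketch of the offsets listed above, the fields can be pulled out of a loaded file like this. It assumes the values are little-endian 32-bit integers; `ReadU32LE`, `HeaderGuess` and `ReadHeaderGuess` are illustrative names, not anything taken from the game.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a little-endian uint32 at byte offset `off`; returns 0 when out of range.
inline uint32_t ReadU32LE(const std::vector<uint8_t>& buf, std::size_t off)
{
   if (off + 4 > buf.size())
   {
      return 0;
   }
   return uint32_t(buf[off]) | (uint32_t(buf[off + 1]) << 8) |
          (uint32_t(buf[off + 2]) << 16) | (uint32_t(buf[off + 3]) << 24);
}

// Hypothetical holder for the fields listed in the post (names are guesses).
struct HeaderGuess
{
   uint32_t dataSize;       // @0x18
   uint32_t headerSize;     // @0x38
   uint32_t columnCount;    // @0x44
   uint32_t rowCount;       // @0x48
   uint32_t dataBlockStart; // 0x3c + headerSize
};

inline HeaderGuess ReadHeaderGuess(const std::vector<uint8_t>& buf)
{
   HeaderGuess h;
   h.dataSize = ReadU32LE(buf, 0x18);
   h.headerSize = ReadU32LE(buf, 0x38);
   h.columnCount = ReadU32LE(buf, 0x44);
   h.rowCount = ReadU32LE(buf, 0x48);
   h.dataBlockStart = 0x3c + h.headerSize;
   return h;
}
```

Every offset stored in the header data would then be added to `dataBlockStart` to get an absolute file position.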

Code: Select all

4B   some kind of ID
4B   offset of what I believe is the name of the table

Then each column title is represented on 12B as follows:
4B   1 or 2, I believe that 2 means that the column is not used
4B   some kind of ID
4B   offset of the end of text, the beginning is the previous offset

The last entry is only 8B because there's no ID:
4B   1 or 2
4B   last offset

Directly after these come the offsets of the actual cell data, and that's the part I don't understand.
In a very simple file where all the columns are used (1), the data is laid out as follows:
4B start offset
4B end offset

But things get more complicated when some columns are unset. Here are some of the smallest files, in case you're interested in making sense of them. :)

https://mega.nz/file/4XxijCTJ#za049-ZPhni5UHjuJrAbFob3eiDWPeyNCKpl0X4pI7M
togogo
Posts: 4
Joined: Tue Jul 20, 2021 7:31 am

Re: New World .datasheet file format

Post by togogo »

I'm not sure if you've made progress since then but I have found that headers can also be marked 3.

Looking through the cell data, a 1 in the column means that the data is found by the offsets in the data tables like you said. But from what I've found a 2 represents a float in that cell and a 3 represents an integer in that cell.
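
A rough sketch of that reading, for anyone who wants to try it: it assumes each cell carries a raw 32-bit value and that the codes are 1 = offset into the string data, 2 = float, 3 = integer. The meaning of 3 is only a guess at this point (code posted later in the thread treats it as a bool), and `ColumnTypeGuess`, `BitsToFloat` and `DescribeCell` are made-up names.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Type codes as reported in this thread; the meaning of 3 is unconfirmed.
enum ColumnTypeGuess : uint32_t
{
   kString = 1,
   kFloat = 2,
   kInt = 3,
};

// Reinterpret 32 raw bits as an IEEE-754 float without aliasing problems.
inline float BitsToFloat(uint32_t bits)
{
   float f;
   std::memcpy(&f, &bits, sizeof f);
   return f;
}

// Render one raw 32-bit cell value according to the column's type code.
inline std::string DescribeCell(uint32_t type, uint32_t raw)
{
   switch (type)
   {
   case kString:
      return "string@" + std::to_string(raw); // offset, still to be resolved
   case kFloat:
      return std::to_string(BitsToFloat(raw));
   case kInt:
      return std::to_string(raw);
   default:
      return "unknown";
   }
}
```

For a type 1 column the returned offset still has to be looked up in the string data; the other two types are self-contained.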
badmp3
Posts: 5
Joined: Tue Jul 20, 2021 11:10 am

Re: New World .datasheet file format

Post by badmp3 »

Anyone got an update on this? The New World Closed Beta just started and I hit the same block: the datasheets are all scrambled up...
togogo
Posts: 4
Joined: Tue Jul 20, 2021 7:31 am

Re: New World .datasheet file format

Post by togogo »

badmp3 wrote:Anyone got an update on this? The New World Closed Beta just started and I hit the same block: the datasheets are all scrambled up...

The data is not scrambled; it is stored in the way described above. Using the original post and the information about the data types, you should be able to figure out how it is formatted and extract the data.
badmp3
Posts: 5
Joined: Tue Jul 20, 2021 11:10 am

Re: New World .datasheet file format

Post by badmp3 »

togogo wrote:
The data is not scrambled, it is saved in a way described above. Using the original post and the information provided about the data types you should be able to figure out how it is formatted to extract the data.


And the message you're saying is scrambled also... why can't this be EZ like the Pak Unpacker script for QuickBMS?
newworld411
Posts: 2
Joined: Wed Jul 21, 2021 6:14 pm

Re: New World .datasheet file format

Post by newworld411 »

What editor/tool can you use to open these datasheet files? Does anybody know? I'd appreciate an answer!
togogo
Posts: 4
Joined: Tue Jul 20, 2021 7:31 am

Re: New World .datasheet file format

Post by togogo »

newworld411 wrote:What editor/tool can you use to open these datasheet-files? does anybody know. Appreciate an answer!


You can use HxD to read the binaries. As far as I know, there are no posted scripts/tools that convert these files to a readable format, so you would have to write your own.
togogo
Posts: 4
Joined: Tue Jul 20, 2021 7:31 am

Re: New World .datasheet file format

Post by togogo »

badmp3 wrote:
And the message your saying is scrambled also... why cant this be EZ like the Pak Unpacker script for quickbms?


Why can't this be easy? Firstly, because it's a new game, and secondly, you can't expect everyone to give everything to you. The information in this thread is enough to get started on parsing these files with your own scripts. QuickBMS is not used for reading binaries; it is for unpacking files. It's like using a screwdriver on a nail: it's not the right tool. You need to learn the basics if you cannot understand how to even start.
Ethal
Posts: 1
Joined: Sun Jul 25, 2021 3:37 pm

Re: New World .datasheet file format

Post by Ethal »

Well, I'm clearly bumping my head on this one as well.

As soon as one thing makes sense, another doesn't :twisted:

I will keep trying.
badmp3
Posts: 5
Joined: Tue Jul 20, 2021 11:10 am

Re: New World .datasheet file format

Post by badmp3 »

Ethal wrote:Well, I'm clearly bumping my head on this one as well.

Once one thing makes sense and other doesn't :twisted:

I will keep trying.


Make a similar thread on the New World subreddit and tell people we have a way to unpack but are stalled at the database stuff...

Someone with a programmer skill set there would probably help out with this whole binary stuff...
Lord Vaako
Posts: 26
Joined: Tue Oct 17, 2017 7:36 pm

Re: New World .datasheet file format

Post by Lord Vaako »

I'm not sure about the floats but I think I'm close ;-). Can someone look at the attached examples to see if the converted data looks reasonable?
Image
retriton
Posts: 1
Joined: Sun Aug 01, 2021 7:38 am

Re: New World .datasheet file format

Post by retriton »

@Lord Vaako that does look pretty close! Seems workable to me at least.

Does anyone have any idea what these files are: https://i.imgur.com/b4FR1oK.png

They look like database files but they are completely unreadable (unlike the datasheet files)
Vag
Posts: 1
Joined: Mon Aug 02, 2021 9:49 am

Re: New World .datasheet file format

Post by Vag »

@Lord Vaako, you should be close. Row 29, Quest Battle Embrace. Here are 2 screenshots from 2 other databases. Maybe they can help.
Samael
Posts: 2
Joined: Fri Jul 09, 2021 2:41 pm

Re: New World .datasheet file format

Post by Samael »

It looks pretty close to me. How did you extract the values whose column type is 1? I don't get how the two offsets work. The type 2 column is indeed a float and the value is directly accessible in the header, so there's no need to extract it.
Lord Vaako
Posts: 26
Joined: Tue Oct 17, 2017 7:36 pm

Re: New World .datasheet file format

Post by Lord Vaako »

Samael wrote:It looks pretty close to me. How did you extract the values whose column type is 1 ? I didn't get how the two offsets works. The type 2 column is indeed a float and the value is directly accessible in the header, so there's no need to extract it.


I used just one offset (the start); strings are separated / terminated by zeros.
The "strings block" starts at 0x3c + header size.

I start parsing header at 0x3c + 0x24

cols * 12bytes (offset to column_name, column type, unknown)

then

rows * cols * 8bytes (depending on the column type, it is a float / int value or an offset)
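
The arithmetic above can be sketched as a few offset helpers. The constants are taken straight from this post and are not confirmed; note that 0x3c + 0x24 = 0x60 is 4 bytes past the 0x5c given in the first post, which may be why the fields read here as (offset, type, unknown). All names below are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Constants as described in the post above; none of them are confirmed.
constexpr uint32_t kTableStart = 0x3c;
constexpr uint32_t kColumnsStart = kTableStart + 0x24; // 0x60
constexpr uint32_t kColumnEntrySize = 12;
constexpr uint32_t kCellSize = 8;

// Absolute offset of the zero-terminated strings block.
inline uint32_t StringsBlockStart(uint32_t headerSize)
{
   return kTableStart + headerSize;
}

// Absolute offset of the 12-byte entry for column `col`.
inline uint32_t ColumnEntryOffset(uint32_t col)
{
   return kColumnsStart + col * kColumnEntrySize;
}

// Absolute offset of the 8-byte cell at (row, col), stored row by row.
inline uint32_t CellOffset(uint32_t cols, uint32_t row, uint32_t col)
{
   return kColumnsStart + cols * kColumnEntrySize + (row * cols + col) * kCellSize;
}
```

With a header size of 0x154, for example, `StringsBlockStart` gives 0x190.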
Kattoor
Posts: 3
Joined: Sun Aug 08, 2021 3:42 pm

Re: New World .datasheet file format

Post by Kattoor »

I'm stuck at extracting the data from the actual cells.
Attached is a small datasheet (had to append the .txt extension for it to upload).

Headers:

Code: Select all

72 55 23 CC 
1C 00 00 00    (FactionType)
01 00 00 00

A6 CC 1D C7
28 00 00 00    (DisplayName)
01 00 00 00

F4 1E 8B 92
34 00 00 00    (DisplayDescription)
01 00 00 00

FD 4A 83 B2
40 00 00 00    (ForegroundColorIndex)
02 00 00 00

57 E1 64 86
55 00 00 00    (ForegroundCrestIndex)
02 00 00 00

5D A5 C6 E8
6A 00 00 00    (BackgroundColorIndex)
02 00 00 00

F7 0E 21 DC
7F 00 00 00    (BackgroundCrestIndex)
02 00 00 00


The data starts at 0x190.
For type 1 columns I can just read 12 bytes of data.

First column, FactionType:

Code: Select all

94 00 00 00 
94 00 00 00
99 00 00 00

0x190+0x94 to 0x190+0x99 reads 'None'. This is correct. I'm not sure why 0x94 (the start offset) is here two times..?

Second column, DisplayName:

Code: Select all

99 00 00 00
AA 00 00 00
AA 00 00 00

0x190+0x99 to 0x190 + 0xaa reads '@ui_tooltip_none'. This is also correct. I'm not sure why 0xaa (the end offset) is here two times..?

So for column 1 there was a duplicate pointer for the start offset, and now for column 2 there is a duplicate pointer for the end offset? What am I missing :(

Column 3 is even weirder:

Code: Select all

AB 00 00 00
00 00 00 00
AB 00 00 00

Why is the start offset == the end offset? Why is there a 00 00 00 00 pointer?

Now for the last 4 columns (these are type 2 columns) I have 5*4 bytes left:

Code: Select all

00 00 00 00 
AB 00 00 00
00 00 00 00
AB 00 00 00
00 00 00 00

Can't really make sense of these pointers / this data..

I'm probably misinterpreting the data. Could someone help me please?
Sir Kane
Posts: 16
Joined: Sun Mar 27, 2016 7:20 pm

Re: New World .datasheet file format

Post by Sir Kane »

Code: Select all

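#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// byte_t is not a standard type, so define it here.
// Note: the structs below are overlaid directly on the file contents, so they
// assume a 64-bit build (8-byte pointers) and a little-endian machine.
typedef uint8_t byte_t;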


enum eType : uint32_t
{
   Type_String = 1,
   Type_Float,
   Type_Bool,
};


struct SStringValue
{
   uint32_t   hash;
   uint32_t   offset;
};


union SCellValue
{
   float      floatValue;
   uint32_t   offset;
   bool      boolValue;
};

struct SCell
{
   uint32_t   stringOffset;
   SCellValue   value;
};

struct SColumnHeader
{
   SStringValue   name;
   eType         type;
};

struct STable
{
   SStringValue   outputName;
   uint32_t      columnCount;
   uint32_t      rowCount;
   SColumnHeader*   pColumnHeaders;
   SCell*         pCells;
};

struct SDataSheet
{
   static constexpr uint32_t MagicVal = 12;
   uint32_t magic;
   SStringValue   datasheetName;
   SStringValue   dataTypeName;
   uint32_t      tableCount;
   uint32_t      stringDataSize;
   uint32_t*      pTableEnds;
   byte_t*         pTables;
   char*         pStrings;
};

inline STable* GetTable(SDataSheet* pDataSheet, uint32_t index)
{
   if (index == 0)
   {
      return (STable*)pDataSheet->pTables;
   }
   else
   {
      return (STable*)(pDataSheet->pTables + pDataSheet->pTableEnds[index - 1]);
   }
}

inline const STable* GetTable(const SDataSheet* pDataSheet, uint32_t index)
{
   if (index == 0)
   {
      return (const STable*)pDataSheet->pTables;
   }
   else
   {
      return (const STable*)(pDataSheet->pTables + pDataSheet->pTableEnds[index - 1]);
   }
}

const SDataSheet* LocateDataSheet(void* pData)
{
   SDataSheet* pDataSheet = (SDataSheet*)pData;
   byte_t* pCur = (byte_t*)(pDataSheet + 1);

   if (pDataSheet->magic != SDataSheet::MagicVal)
   {
      return nullptr;
   }

   pDataSheet->pTableEnds = (uint32_t*)pCur;
   if (pDataSheet->tableCount != 0)
   {
      pCur += sizeof(uint32_t) * pDataSheet->tableCount;
      pDataSheet->pTables = pCur;

      for (uint32_t i = 0; i < pDataSheet->tableCount; ++i)
      {
         STable* pTable = GetTable(pDataSheet, i);
         pCur += sizeof(STable);
         pTable->pColumnHeaders = (SColumnHeader*)(pCur);
         pCur += sizeof(SColumnHeader) * pTable->columnCount;
         pTable->pCells = (SCell*)(pCur);
         pCur += sizeof(SCell) * pTable->columnCount * pTable->rowCount;
      }
      pDataSheet->pStrings = (char*)pCur;
      return pDataSheet;
   }
   else
   {
      return nullptr;
   }
}

void* LoadToMem(const char* pPath)
{
   // fopen_s is MSVC-specific; plain fopen is portable.
   FILE* pFile = fopen(pPath, "rb");
   if (pFile == nullptr)
   {
      return nullptr;
   }
   fseek(pFile, 0, SEEK_END);
   long size = ftell(pFile);
   fseek(pFile, 0, SEEK_SET);
   void* pBuffer = malloc(size_t(size));
   if (pBuffer == nullptr)
   {
      fclose(pFile);
      return nullptr;
   }
   if (fread(pBuffer, size_t(size), 1, pFile) != 1)
   {
      free(pBuffer);
      fclose(pFile);
      return nullptr;
   }
   fclose(pFile);
   return pBuffer;
}

void Test()
{
   void* pData = LoadToMem("javelindata_perks.datasheet");
   if (pData == nullptr)
   {
      return;
   }
   const SDataSheet* pDataSheet = LocateDataSheet(pData);
   if (pDataSheet != nullptr)
   {
      for (uint32_t i = 0; i < pDataSheet->tableCount; ++i)
      {
         const STable* pTable = GetTable(pDataSheet, i);

         const SColumnHeader* pHeaders = pTable->pColumnHeaders;
         for (uint32_t j = 0; j < pTable->columnCount; ++j)
         {
            if (j > 0)
            {
               printf(",");
            }
            printf("%s", pDataSheet->pStrings + pHeaders[j].name.offset);
         }
         printf("\n");
         const SCell* pCells = pTable->pCells;
         for (uint32_t j = 0; j < pTable->rowCount; ++j)
         {
            const SCell* pRow = pCells + (j * pTable->columnCount);
            for (uint32_t k = 0; k < pTable->columnCount; ++k)
            {
               if (k > 0)
               {
                  printf(",");
               }
               switch (pHeaders[k].type)
               {
               case Type_String:
               {
                  printf("%s", pDataSheet->pStrings + pRow[k].value.offset);
                  break;
               }
               case Type_Float:
               {
                  printf("%G", pRow[k].value.floatValue);
                  break;
               }
               case Type_Bool:
               {
                  printf("%s", pRow[k].value.boolValue ? "true" : "false");
                  break;
               }
               }
            }
            printf("\n");
         }

      }
   }
   free(pData);
}
int main(int argc, const char*const*argv)
{
   Test();
   return 0;
}
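
As a cross-check, here is a sketch of the on-disk layout these structs imply, with the pointer members replaced by explicit 8-byte slots. It assumes a 64-bit build with default alignment and a file containing a single table; the `...OnDisk` names are made up for illustration. Under those assumptions the sizes line up with the offsets from the first post (0x18, 0x44, 0x48, 0x5c).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// File header with pointers written out as fixed 8-byte slots (assumption).
struct SDataSheetOnDisk
{
   uint32_t magic;
   uint32_t nameHash, nameOffset;         // datasheetName
   uint32_t typeHash, typeOffset;         // dataTypeName
   uint32_t tableCount;
   uint32_t stringDataSize;
   uint32_t pad;                          // alignment padding before the pointers
   uint64_t pTableEnds, pTables, pStrings; // filled in at load time
};

// Per-table header, again with pointers as 8-byte slots.
struct STableOnDisk
{
   uint32_t nameHash, nameOffset;         // outputName
   uint32_t columnCount;
   uint32_t rowCount;
   uint64_t pColumnHeaders, pCells;       // filled in at load time
};

static_assert(sizeof(SDataSheetOnDisk) == 0x38, "file header is 56 bytes");
static_assert(sizeof(STableOnDisk) == 0x20, "table header is 32 bytes");

// With one table, a single uint32 table-end entry follows the file header.
constexpr std::size_t kFirstTableStart = sizeof(SDataSheetOnDisk) + sizeof(uint32_t);
constexpr std::size_t kFirstColumnHeader = kFirstTableStart + sizeof(STableOnDisk);
```

So the value at 0x38 would be the first entry of the table-end array, and the column headers would start at `kFirstColumnHeader`.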
TrainMan
Posts: 1
Joined: Mon Aug 09, 2021 12:04 am

Re: New World .datasheet file format

Post by TrainMan »

Kattoor wrote:Can't really make sense of these pointers / this data..

I'm probably misinterpreting the data. Could someone help me please?


I think you're off by 4 bytes when reading the columns, and you only need to read 8 bytes when reading the row/cell data.
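
To illustrate the 8-byte reading, here is a small sketch that regroups the 14 dwords from Kattoor's post into pairs. The values are copied from that post; `CellPair`, `GroupIntoCells` and `KattoorDwords` are illustrative names, and what each half of a pair means is an interpretation, not something confirmed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One 8-byte cell seen as two consecutive 32-bit values.
struct CellPair
{
   uint32_t first;
   uint32_t second;
};

// Group a flat run of dwords into consecutive two-dword cells.
inline std::vector<CellPair> GroupIntoCells(const std::vector<uint32_t>& dwords)
{
   std::vector<CellPair> cells;
   for (std::size_t i = 0; i + 1 < dwords.size(); i += 2)
   {
      cells.push_back({dwords[i], dwords[i + 1]});
   }
   return cells;
}

// The dwords Kattoor listed, in file order.
inline std::vector<uint32_t> KattoorDwords()
{
   return {0x94, 0x94, 0x99, 0x99, 0xAA, 0xAA, 0xAB,
           0x00, 0xAB, 0x00, 0xAB, 0x00, 0xAB, 0x00};
}
```

Grouped this way the run gives seven cells, one per column: (94, 94), (99, 99), (AA, AA) for the three type 1 columns, then four times (AB, 00) for the type 2 columns. That looks like one row of seven 8-byte cells rather than 12-byte entries.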


Lord Vaako wrote:rows * cols * 8bytes (depending on the column type, it is a float / int value or an offset)


How are you checking if a cell is empty? For type 1 columns it seems like the start and end offsets are the same, although not always, and that's what's throwing me off at the moment.
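
One possible check that avoids comparing two offsets, assuming the strings are zero-terminated as Lord Vaako described: treat a cell as empty when its offset points straight at a zero byte. `ReadCString` and `IsEmptyCell` are hypothetical helpers, and `strings` stands for the strings block loaded into memory.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the zero-terminated string starting at `offset` in the strings block.
// An out-of-range offset yields an empty string.
inline std::string ReadCString(const std::vector<char>& strings, std::size_t offset)
{
   std::string out;
   while (offset < strings.size() && strings[offset] != '\0')
   {
      out.push_back(strings[offset]);
      ++offset;
   }
   return out;
}

// A cell counts as empty when its offset points directly at a terminator.
inline bool IsEmptyCell(const std::vector<char>& strings, std::size_t offset)
{
   return ReadCString(strings, offset).empty();
}
```

Under this reading, an offset that sits one byte after the previous string's terminator and is itself a zero byte would be an empty cell.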
Kattoor
Posts: 3
Joined: Sun Aug 08, 2021 3:42 pm

Re: New World .datasheet file format

Post by Kattoor »

Sir Kane wrote:

Code: Select all

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
...


Thank you!!

For anyone else like me without a background in C, here's my working Node.js version for datasheet parsing:

https://gist.github.com/Kattoor/50155a2 ... 9def622b27
Soller
Posts: 1
Joined: Fri Aug 06, 2021 8:52 pm

Datasheet Header

Post by Soller »

Rust Version:

Code: Select all

use binread::BinReaderExt;
use binread::{
 derive_binread,
 io::{Cursor, Read, Seek, SeekFrom},
 BinRead, BinResult, NullString, ReadOptions,
};
// use serde::*;
use serde_json::Result as JsonResult;
use std::collections::HashMap;

fn get_string<R, BR, A>(reader: &mut R, ro: &ReadOptions, args: A) -> BinResult<BR>
where
 R: Read + Seek,
 BR: BinRead<Args = A>,
 A: Copy + 'static,
{
 let _pos = reader.seek(SeekFrom::Start(ro.offset))?;
 BR::read_options(reader, &ro, args)
}

#[derive_binread]
#[derive(Debug, Clone, Copy)]
pub struct DatasheetHeader {
 revision: u32,
 unknown1: u32,
 unique_id_offset: u32,
 unknown2: u32,
 type_offset: u32,
 row_number: u32,
 plain_text_length: u32,
 unknown3: u32,
 unknown4: u32,
 unknown5: u32,
 unknown6: u32,
 unknown7: u32,
 unknown8: u32,
 unknown9: u32,
 #[br(temp)]
 _plain_text_offset: u32,
 #[br(calc = 60 + _plain_text_offset)]
 plain_text_offset: u32,
 header_sig: u32,
 unknown10: u32,
 pub columns: u32,
 pub rows: u32,
 unknown11: u32,
 unknown12: u32,
 unknown13: u32,
 unknown14: u32,
}

#[derive_binread]
#[derive(Debug, Clone)]
#[br(import(data_offset: u32))]
pub struct DatasheetColumn {
 unknown15: u32,
 #[br(temp)]
 _column_name_offset: u32,
 #[br(calc = data_offset + _column_name_offset)]
 column_name_offset: u32,
 column_type: u32,
 #[br(parse_with = get_string, offset=column_name_offset as u64)]
 #[br(restore_position)]
 pub column_name: NullString,
}

#[derive_binread]
#[derive(Debug, Clone)]
#[br(import(data_offset: u32))]
pub struct DatasheetRow {
 #[br(temp)]
 _row_value_offset: u32,
 #[br(calc = data_offset + _row_value_offset)]
 row_value_offset: u32,
 row_value_or_something: u32,
 #[br(parse_with = get_string, offset=row_value_offset as u64)]
 #[br(restore_position)]
 pub value: NullString,
}

#[derive(Debug, BinRead, Clone)]
#[br(import(column_count: u32, data_offset: u32))]
#[br(assert(row.len() as u32 == column_count))]
pub struct DatasheetRows {
 #[br(count = column_count)]
 #[br(args(data_offset))]
 pub row: Vec<DatasheetRow>,
}

#[derive(Debug, BinRead)]
#[br(assert(columns.len() as u32 == header.columns))]
#[br(assert(rows.len() as u32 == header.rows))]
pub struct Datasheet {
 pub header: DatasheetHeader,
 #[br(args(header.plain_text_offset))]
 #[br(count = header.columns)]
 pub columns: Vec<DatasheetColumn>,
 #[br(count = header.rows)]
 #[br(args(header.columns, header.plain_text_offset))]
 pub rows: Vec<DatasheetRows>,
}

#[allow(dead_code)]
pub struct DatasheetParser {
 pub datasheet: Datasheet,
}

#[allow(dead_code)]
impl DatasheetParser {
 pub fn to_json(&self) -> JsonResult<String> {
  let _json = serde_json::to_string_pretty(&self.get_data())?;
  Ok(_json)
 }
 pub fn to_xml(&self) -> anyhow::Result<String> {
  let xml = quick_xml::se::to_string(&self.get_data()).unwrap();
  Ok(xml)
 }

 pub fn get_data(&self) -> Vec<HashMap<String, String>> {
  let columns: Vec<String> = self
   .datasheet
   .columns
   .iter()
   .map(|c| c.column_name.clone().into_string())
   .collect();
  let mut combined: Vec<HashMap<_, _>> = Vec::new();
  for n in &self.datasheet.rows {
   let row_data: Vec<String> = n
    .row
    .iter()
    .map(|v| v.value.clone().into_string())
    .collect();
   let data: HashMap<_, _> = columns
    .clone()
    .into_iter()
    .zip(row_data.into_iter())
    .collect();
   combined.push(data)
  }
  combined
 }
 pub fn new(file: Vec<u8>) -> DatasheetParser {
  DatasheetParser {
   datasheet: Cursor::new(file).read_le().unwrap(),
  }
 }
}


TypeScript:

Code: Select all

/* eslint-disable unicorn/filename-case */
import { Parser } from "binary-parser";

let textOffset = 0;
const DataSheetColumn = Parser.start()
  .uint32le("unknown")
  .uint32le("column_name_offset")
  .uint32le("type");

const DataSheetData = Parser.start().array(null, {
  type: Parser.start().array(null, {
    type: Parser.start()
      .uint32le("value_offset")
      .uint32le("offset_or_something"),
    formatter: (val) => {
      return val.map((v) => v.value_offset);
    },
    length: (items) => items.column_count,
  }),
  length: (items) => items.row_count,
});

const DataSheetRow = Parser.start()
  .saveOffset("offset", { formatter: (val) => val - textOffset })
  .string("value", { zeroTerminated: true });

const GetString = Parser.start().string(null, { zeroTerminated: true });

const DataSheetHeader = new Parser()
  .uint32le("revision")
  .uint32le("unknown1")
  .uint32le("unique_id_offset")
  .uint32le("unknown2")
  .uint32le("type_offset")
  .uint32le("row_number")
  .uint32le("plain_text_length")
  .uint32le("unknown3")
  .uint32le("unknown4")
  .uint32le("unknown5")
  .uint32le("unknown6")
  .uint32le("unknown7")
  .uint32le("unknown8")
  .uint32le("unknown9")
  .uint32le("plain_text_offset", {
    formatter: (x) => {
      const offs = (x as number) + 60;
      textOffset = offs;
      return offs;
    },
  })
  .uint32be("header_sig", {
    formatter: (val) => `0x${val.toString(16).toUpperCase()}`,
  })
  .uint32le("unknown10")
  .uint32le("column_count")
  .uint32le("row_count")
  .uint32le("unknown11")
  .uint32le("unknown12")
  .uint32le("unknown13")
  .uint32le("unknown14")
  .pointer("unique_id", {
    type: GetString,
    offset: (items) => {
      return items.plain_text_offset + items.unique_id_offset;
    },
  })
  .pointer("data_type", {
    type: GetString,
    offset: (items) => {
      return items.plain_text_offset + items.type_offset;
    },
  })
  .array("column_names", {
    type: DataSheetColumn,
    length: (items) => {
      return items.column_count;
    },
    formatter: (items) => {
      return items.map((item) => item.column_name_offset);
    },
  })
  .nest("rows", {
    type: DataSheetData,
  })
  .array("plain_text", {
    type: DataSheetRow,
    readUntil: "eof",
    zeroTerminated: true,
    key: "offset",
  });

export function parseDataSheet(data: Buffer) {
  return new Parser().nest(null, { type: DataSheetHeader }).parse(data);
}



I've also attached a Kaitai Struct (https://kaitai.io/) file. Have fun.