Wrong obfuscation result on files larger than 2 GB

grandshot · Post by **grandshot** » Wed Sep 21, 2022 1:48 pm

Hello,
I'm trying to implement the file table obfuscation algorithm used by the *.grp archives of the game Boiling Point: Road to Hell.
The pseudocode I got from one of the engine's *.dll files works fine for archives smaller than 2 GB, but gives wrong results for bigger ones.

I wrote this simple C++ program to demonstrate:

Code: Select all

#include <stdio.h>
#include <stdlib.h>

int main()
{
    unsigned int newgrpSize = 2859709129, newgrp1Size = 1168718524, initialValue = 47536;
    unsigned char data[22] = {7, 211, 11, 95, 141, 76, 227, 180, 107, 133, 207, 242, 88, 9, 168, 238, 124, 46, 30, 15, 251, 38};
    unsigned char data2[22] = { 31, 40, 102, 193, 185, 74, 118, 82, 147, 67, 151, 16, 205, 161, 54, 63, 230, 105, 167, 64, 171, 182 };

    srand((newgrp1Size + initialValue | newgrp1Size + initialValue >> 31 << 32) % 65535);

    for (int i = 0; i < 22; i++)
    {
        int v2 = rand();
        data[i] ^= (v2 | v2 >> 31 << 32) % 255;
        printf("%c", data[i]);
    }

    printf("\n");
    

    srand((newgrpSize + initialValue | newgrpSize + initialValue >> 31 << 32) % 65535);

    for (int i = 0; i < 22; i++)
    {
        int v2 = rand();
        data2[i] ^= (v2 | v2 >> 31 << 32) % 255;
        printf("%c", data2[i]);
    }
}

The bytes in data and data2 are obfuscated path strings cut from the file tables of two different archives. As the output shows, after re-obfuscation data looks like a readable string, but data2 doesn't.

Where might the issue be?

spiritovod · Post by **spiritovod** » Thu Sep 22, 2022 3:01 am

@grandshot: I would suggest using an IDE with a code assistant; errors like that will be explained there. srand accepts an unsigned int, but a left shift by 32 bits would need a wider type (at least int64) to hold the result for any non-zero value. On a 32-bit type that shift is undefined behavior, so the result depends on what the compiler does with it: it may work in some cases and not in others. The same goes for the left shift applied to v2.
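For what it's worth, the effect of signedness on that expression can be checked in Python, where integers never overflow. This is only a sketch: it assumes the `x | x >> 31 << 32` pattern is the decompiler's way of writing a sign extension of a signed 32-bit value, which nobody in the thread has confirmed, and `to_int32` and `widen` are hypothetical helper names.

```python
def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def widen(value: int) -> int:
    """The expression from the pseudocode, evaluated with unbounded ints."""
    return value | value >> 31 << 32


small = 1168718524 + 47536  # fits in a signed 32-bit int
large = 2859709129 + 47536  # exceeds 0x7FFFFFFF

print(widen(small) == small)                      # True
print(widen(large) == large)                      # False: bit 32 gets set
print(widen(to_int32(large)) == to_int32(large))  # True
```

With a signed 32-bit operand the expression changes nothing, so it can simply be dropped. With an unsigned operand above 0x7FFFFFFF it sets bit 32, which changes the result of the modulo that follows.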

grandshot · Post by **grandshot** » Thu Sep 22, 2022 5:29 pm

@spiritovod: Thanks for the reply.

I suspected that, but in the original function from the DLL the rand() result is always cast to int32_t and everything works correctly.

'size' is the size of the obfuscated data; 'pwd' is the total size of the grp archive.

Actually, I wrote the C++ sample only for testing. I prefer Python and have already implemented my own version of the obfuscator. It runs about as fast as the C++ version (and has the same issue), and it even has a small improvement: it can parse the file table on the fly without allocating the whole table data.

Code: Select all

XENUS_INITIAL_VALUE = 47536

    def _randomizer_set_seed(self, initial_value=XENUS_INITIAL_VALUE) -> None:
        value = self.group_file_size + initial_value
        self._seed = (value | value >> 31 << 32) % 0xFFFF
    
    def _randomizer(self) -> int:
        # implementation of C++ rand() function.
        
        self._seed = (self._seed * 214013 + 2531011) % 2**64
        return (self._seed >> 16) & 0x7fff
        
        
    def _obfuscate_bytes(self, data: bytes) -> bytes:
        return bytes(i ^ ((randomizer := self._randomizer()) | randomizer >> 31 << 32) % 255 for i in data)

Code: Select all

    def _parse_file_table(self):
        for file_id in range(self.num_files):
            path_length = unpack('<H', self._obfuscate_bytes(self.file_stream.read(2)))[0]
            
            entry_size = path_length + 12
            if self.file_stream.tell() + entry_size > (self.file_table_size + 16):
                self.error_message = 'Can\'t allocate %d bytes for file entry %d from offset 0x%X.' % (entry_size, file_id, self.file_stream.tell())
                return 0
                    
            file = SingleFile(unpack('<%ds3I' % (path_length), self._obfuscate_bytes(self.file_stream.read(entry_size))))
            self.files.append(file)
            #print(list(i for i in self.files[-1].path))
        return 1

Well, I can call the original DLL function from Python, but I want to keep my code clean and independent of the engine files.

I presume I need to hook the variables in the assembly code and trace their changes at every step, to compare them with the results of my code. I can't say I know how to do that.

spiritovod · Post by **spiritovod** » Fri Sep 23, 2022 1:52 am

@grandshot: This function looks much simpler in IDA:

Code: Select all

void __cdecl gfLoadMajor(unsigned __int8 *data, int size, int pwd)
{
  int i; // esi

  srand((pwd + 47536) % 0xFFFF);
  for ( i = 0; i < size; data[i - 1] ^= rand() % 255 )
    ++i;
}

which is the same implementation as in aluigi's original script, mentioned in this topic.

It means you can just use "srand((fileSize + initialValue) % 0xFFFF)" and "data[i] ^= v2 % 255" in both cases. Just declare the file sizes and the seed/initial value as int instead of unsigned int, and it will work properly.
For reference, compare the behavior of the C++ app with the quickbms script. The script works with grp files smaller than 0x7FFFFFFF bytes (around 2 GB, the maximum value of a signed int) in both the usual quickbms and the 4gb_files version, but for grp files over that size only the usual quickbms will work. That comes down to how values are processed in the two versions.
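The signed-int advice can also be sketched in Python, which has no fixed-width int: the sum has to be wrapped to a signed 32-bit value by hand, and the remainder has to truncate toward zero as in C++ (Python's `%` rounds toward negative infinity instead). The names `to_int32`, `c_rem` and `grp_seed` are hypothetical, and the final conversion to unsigned assumes srand receives the value as a 32-bit unsigned int.

```python
XENUS_INITIAL_VALUE = 47536


def to_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def c_rem(a: int, b: int) -> int:
    """Remainder that truncates toward zero, like % on C/C++ ints."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def grp_seed(file_size: int, initial_value: int = XENUS_INITIAL_VALUE) -> int:
    """Seed as srand((pwd + 47536) % 0xFFFF) would see it with a signed pwd."""
    value = to_int32(file_size + initial_value)
    return c_rem(value, 0xFFFF) & 0xFFFFFFFF  # srand takes an unsigned int


print(grp_seed(1168718524))  # 14870
print(grp_seed(2859709129))  # 4294907630
```

For the archive under 2 GB the seed is the same as with unsigned arithmetic. For the larger one the sum is negative as a signed int, the remainder is -59666, and the conversion to unsigned gives 4294907630 rather than the 5870 that an unsigned modulo would produce.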

grandshot · Post by **grandshot** » Sat Sep 24, 2022 2:54 am

Thanks. I edited my C++ test program and everything works fine.

Now I need to edit my Python _randomizer function somehow to get the same result.

I suspect the modulus and exponent operations at the end are to blame:

Code: Select all

    def _randomizer(self) -> int:
        # implementation of C++ rand() function.
        
        self._seed = (self._seed * 214013 + 2531011) % 2**64  # converts signed to unsigned, which is not needed
        return (self._seed >> 16) & 0x7FFF

But without them the code runs very, very slowly. I couldn't even wait for it to finish: I held out for two minutes and then hit a keyboard interrupt :)

spiritovod · Post by **spiritovod** » Sat Sep 24, 2022 1:27 pm

@grandshot: I'm not familiar with Python, but I still can't understand why "% 2**64" is even required there. If you want to explicitly cast the value to int, there should be other ways to do so. Maybe it's worth taking a look at numpy / ctypes functionality, or just using bitwise operations.
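The bitwise-operation route might look like the sketch below. It keeps the generator state in 32 bits with a mask, using the constants from the _randomizer above (they match the Microsoft C runtime's rand()). Some reduction is needed in Python because its integers are unbounded: without one the state grows by about 18 bits per call, which would explain the slowdown. Since the output only uses bits 16 to 30 of the state, "% 2**64" and "& 0xFFFFFFFF" return the same numbers, so the mismatch on large archives more likely comes from how the seed is derived. `MsvcRand` and `xor_stream` are hypothetical names.

```python
class MsvcRand:
    """LCG with the constants from the thread's _randomizer, 32-bit state."""

    def __init__(self, seed: int) -> None:
        self._state = seed & 0xFFFFFFFF

    def next(self) -> int:
        # The mask emulates 32-bit unsigned wraparound.
        self._state = (self._state * 214013 + 2531011) & 0xFFFFFFFF
        return (self._state >> 16) & 0x7FFF


def xor_stream(data: bytes, seed: int) -> bytes:
    """XOR every byte with rand() % 255; applying it twice restores the input."""
    rng = MsvcRand(seed)
    return bytes(b ^ (rng.next() % 255) for b in data)


rng = MsvcRand(1)
print(rng.next(), rng.next())  # 41 18467
```

Because the operation is a plain XOR with a keystream, the same function both obfuscates and deobfuscates as long as the seed is the same.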

ZenHAX
