古剑奇谭三(Gujian3) text tool

bruhmoment
Posts: 5
Joined: Sun Feb 14, 2021 3:36 pm

古剑奇谭三(Gujian3) text tool

Post by bruhmoment »

Is it possible to make a tool for the game's text? I think it's in the encrypted .exe.
aluigi
Site Admin
Posts: 12984
Joined: Wed Jul 30, 2014 9:32 pm

Re: 古剑奇谭三(Gujian3) text tool

Post by aluigi »

Have you checked if there are tools available for the other games based on the Havok Vision Engine?

Apparently that's the engine used in this game too, probably all the files and formats are the same.
bruhmoment
Posts: 5
Joined: Sun Feb 14, 2021 3:36 pm

Re: 古剑奇谭三(Gujian3) text tool

Post by bruhmoment »

aluigi wrote:Have you checked if there are tools available for the other games based on the Havok Vision Engine?

Apparently that's the engine used in this game too, probably all the files and formats are the same.


Do you have a full list of the scripts for this engine? I searched the QuickBMS scripts page with Ctrl+F but couldn't find any.

The files look like this:

https://prnt.sc/zlfcd6

https://prnt.sc/zlfexs

https://prnt.sc/zlfhkc
yusuf2020
Posts: 106
Joined: Wed Jun 17, 2020 1:12 pm

Re: 古剑奇谭三(Gujian3) text tool

Post by yusuf2020 »

I guess you can extract the data with https://github.com/wmltogether/CriPakTools
bruhmoment
Posts: 5
Joined: Sun Feb 14, 2021 3:36 pm

Re: 古剑奇谭三(Gujian3) text tool

Post by bruhmoment »

CriPakTools doesn't work.

I found https://github.com/Kaplas80/GuJian3Manager but I don't know how to use it.
yusuf2020
Posts: 106
Joined: Wed Jun 17, 2020 1:12 pm

Re: 古剑奇谭三(Gujian3) text tool

Post by yusuf2020 »

Compile with Visual Studio.
alanm
Posts: 21
Joined: Mon Aug 17, 2020 4:54 am

Re: 古剑奇谭三(Gujian3) text tool

Post by alanm »

I ran the .exe in the IDA debugger and, after a lot of failed attempts, found the assembly calls that decrypt the game text into a memory buffer. I dumped this buffer to a file; it is ~62MB in size. It seems to contain all the cutscene subtitles and UI strings. I loaded the file in Notepad++ and set the encoding to UTF-8, and found 3 languages bundled together in some format with cutsceneID and VoiceID:

Image

Image
Kaplas
Posts: 60
Joined: Fri Jan 25, 2019 2:47 pm

Re: 古剑奇谭三(Gujian3) text tool

Post by Kaplas »

Hello alanm

Could you tell me what are the addresses of the decryption functions in the executable?

Thanks!
alanm
Posts: 21
Joined: Mon Aug 17, 2020 4:54 am

Re: 古剑奇谭三(Gujian3) text tool

Post by alanm »

Hi Kaplas,
Are you the same Kaplas from GuJian3Manager? I really appreciate your good work on the GuJian3 file extractor/decryptor.

The original Steam version of the .exe has an encrypted .text section, and it quits when a debugger is running. You need the "alternative" .exe from the web, which does not have an encrypted .text section.

Load the .exe into Ghidra as a PE executable.
Find the decrypt function entry by offset or byte search:
On entry, R14 contains the address of the output buffer and R8 (not EAX, my bad) contains the decoded size. When R8=0x3DF0367, it is handling the text buffer.
Image


What I did was go to the exit of the decrypt function and set a conditional breakpoint there that breaks only when EAX=0x3DF0367. Then I dumped the memory starting at the address in R14, 0x3DF0367 bytes long.
Image
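A quick way to sanity-check a dump made this way is to compare the file size with the decoded size used in the breakpoint condition. This is only a minimal Python sketch; the function name is made up for illustration, and the only value taken from the post is 0x3DF0367.

```python
import os

# Decoded size used in the breakpoint condition (value from the post above).
EXPECTED_SIZE = 0x3DF0367


def dump_size_ok(path: str, expected: int = EXPECTED_SIZE) -> bool:
    """Return True if the dumped file has exactly the expected byte count."""
    return os.path.getsize(path) == expected


if __name__ == "__main__":
    # Show the expected size in MiB for comparison with the ~62MB dump.
    print(f"{EXPECTED_SIZE} bytes = {EXPECTED_SIZE / 2**20:.1f} MiB")
```

If the size does not match, the dump was probably taken at a different call of the decrypt function or with the wrong length.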
Last edited by alanm on Sat Nov 13, 2021 8:57 pm, edited 1 time in total.
Kaplas
Posts: 60
Joined: Fri Jan 25, 2019 2:47 pm

Re: 古剑奇谭三(Gujian3) text tool

Post by Kaplas »

Thank you!!

I'll let you know if I'm able to reverse the encryption.
alanm
Posts: 21
Joined: Mon Aug 17, 2020 4:54 am

Re: 古剑奇谭三(Gujian3) text tool

Post by alanm »

Hello Kaplas,
I hope you make progress with the encryption code. If the code turns out to be too convoluted to reverse, there may be another way to replace the text buffer using the unencrypted .exe. This assumes the buffer I found contains all the localization text; I did not study it in detail.

-Use a PE editor to add a new code section to the .exe; this adds space at the end of the .exe.
-Put the buffer-checking assembly code and a copy of the translated text buffer in the new section.
-Change the epilogue of the original decrypt function: add a jmp instruction that directs control to the new code section.
-In the new section, check the output buffer size. If the size matches, copy the translated text buffer to the output buffer. Then jmp back to the decrypt function and return to the caller.

This method works like malware, so AV will probably flag the .exe :lol:

Also, the translated text cannot be longer than the original text it replaces, since altering the layout of the text buffer would probably upset the game code.
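Note that "longer" here means bytes, not characters. Assuming the buffer is UTF-8 (the dump reads correctly with that encoding), a Chinese character usually takes 3 bytes, so an English replacement for a Chinese string often has room to spare. A small Python sketch of the check, with hypothetical helper names:

```python
def spare_bytes(original: str, translated: str) -> int:
    """UTF-8 bytes left over after the swap (negative if it does not fit)."""
    return len(original.encode("utf-8")) - len(translated.encode("utf-8"))


def fits_in_place(original: str, translated: str) -> bool:
    """True if the translated text needs no more bytes than the original."""
    return spare_bytes(original, translated) >= 0


if __name__ == "__main__":
    # Four Chinese characters occupy 12 bytes; "Gujian" needs only 6.
    print(fits_in_place("古剑奇谭", "Gujian"), spare_bytes("古剑奇谭", "Gujian"))
```

What to do with the leftover bytes (padding, or giving them to another string) depends on the buffer format and is not covered by this check.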

Subtitle text:
Image
Quest text:
Image
Lazy_Cat_2k3
Posts: 31
Joined: Sat Aug 22, 2020 12:43 am

Re: 古剑奇谭三(Gujian3) text tool

Post by Lazy_Cat_2k3 »

Can we create a hook and load the translated text buffer from an external file like text.bin?
Also if we can remove another language, there will be more space to make translated text longer.
Just some ideas like alanm said, I haven't done anything yet :mrgreen:
alanm
Posts: 21
Joined: Mon Aug 17, 2020 4:54 am

Re: 古剑奇谭三(Gujian3) text tool

Post by alanm »

Lazy_Cat_2k3 wrote: Can we create a hook and load the translated text buffer from an external file like text.bin?
Also if we can remove another language, there will be more space to make translated text longer.
Just some ideas like alanm said, I haven't done anything yet :mrgreen:


We can definitely read the translated text buffer from a file. It will just take a bit more time to code.
Thanks for the idea of taking space from another language. The initial test looks promising. All the text strings in the text buffer are prefixed with the bytes "0x08 <length>", and that <length> byte needs adjusting if the text size changes. The total buffer size must remain constant: for example, if an English text gets 10 bytes longer, one must remove 10 bytes from the text of one of the other languages and adjust its length prefix to match. The game will crash if the total buffer size changes.

What is needed at minimum is a parser that extracts the localizable strings from the text buffer file into a UTF-8 text file for translation, and an injector script that puts the translated text back into the text buffer file, automatically balancing the text size across the languages. The text buffer file contains a variety of text, including skills, items, dialog, menu and mission text, and so on. They all have different structures, and it is very challenging to find the display strings. It will take some time to find the tag/value pairs, if it is even possible to get them all.
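The balancing idea can be sketched roughly like this in Python. It only handles the "0x08 <length>" strings described above, all function names are made up, and it assumes nothing else in the buffer (an enclosing size field, for example) has to be updated, which has not been verified.

```python
from typing import Dict, Tuple


def read_str8(buf, offset: int) -> Tuple[bytes, int]:
    """Read a 0x08 <length> <bytes> record; return (text_bytes, end_offset)."""
    if buf[offset] != 0x08:
        raise ValueError(f"no 0x08 string record at offset {offset}")
    length = buf[offset + 1]
    start = offset + 2
    return bytes(buf[start:start + length]), start + length


def make_str8(data: bytes) -> bytes:
    """Build a record with a one-byte length prefix."""
    if len(data) > 255:
        raise ValueError("text too long for a one-byte length")
    return bytes([0x08, len(data)]) + data


def replace_balanced(buf: bytes, edits: Dict[int, bytes]) -> bytes:
    """Replace the records at the given offsets, keeping the total size fixed.

    `edits` maps the offset of a record's 0x08 byte to its new text bytes.
    """
    out = bytearray(buf)
    # Work from the highest offset down so earlier offsets stay valid.
    for offset in sorted(edits, reverse=True):
        _, end = read_str8(out, offset)
        out[offset:end] = make_str8(edits[offset])
    if len(out) != len(buf):
        raise ValueError(f"unbalanced edit: size changed by {len(out) - len(buf)}")
    return bytes(out)
```

For example, growing an English string by 3 bytes while cutting 3 bytes from its Chinese counterpart passes the check, while growing only one of them raises an error instead of producing a buffer that would crash the game.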
Lazy_Cat_2k3
Posts: 31
Joined: Sat Aug 22, 2020 12:43 am

Re: 古剑奇谭三(Gujian3) text tool

Post by Lazy_Cat_2k3 »

Can you upload the text buffer file (decrypted ofc) ?
alanm
Posts: 21
Joined: Mon Aug 17, 2020 4:54 am

Re: 古剑奇谭三(Gujian3) text tool

Post by alanm »

Here is the decrypted text buffer file. Have fun :) .
https://www.mediafire.com/file/m4fc3o5kqqznrgv/gujian3_text.bin/file
Lazy_Cat_2k3
Posts: 31
Joined: Sat Aug 22, 2020 12:43 am

Re: 古剑奇谭三(Gujian3) text tool

Post by Lazy_Cat_2k3 »

After digging around, it seems like they use Lua CJSON to read the text buffer file (binary JSON). And I don't have any experience with Lua :(
Since the size of a string (1-2 bytes?) always comes right before the string, pattern-scanning for tags like "description", "hint" (0x0448696E7408), "EN" (0x02454E08), "DialogText", "text" ... is another way, but it will take a lot of time to find all the tags.
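A rough Python sketch of that pattern scan, assuming the layout implied by the hex examples: one length byte, the tag bytes, then 0x08 and a one-byte length for the value that follows. The function name is made up, and values with a two-byte length are not handled.

```python
def find_tagged_strings(buf: bytes, tag: bytes):
    """Yield (offset, text) for each short string that directly follows `tag`."""
    # For b"EN" this builds 02 45 4E 08, the pattern quoted above.
    pattern = bytes([len(tag)]) + tag + b"\x08"
    pos = buf.find(pattern)
    while pos != -1:
        len_off = pos + len(pattern)
        if len_off < len(buf):
            length = buf[len_off]
            raw = buf[len_off + 1:len_off + 1 + length]
            if len(raw) == length:  # skip a record cut off by the end of the buffer
                yield len_off + 1, raw.decode("utf-8", errors="replace")
        pos = buf.find(pattern, pos + 1)
```

Since this is a plain byte search, it can also hit the same byte sequence inside unrelated data, so the results still need checking by eye.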
Kaplas
Posts: 60
Joined: Fri Jan 25, 2019 2:47 pm

Re: 古剑奇谭三(Gujian3) text tool

Post by Kaplas »

alanm wrote:Game will crash if the total buffer size change.

The encrypted section is at offset 0x132d070 in the exe file, and its first value is the uncompressed section size. If we are able to read the unencrypted section from a file, maybe changing that value would let us translate without having to keep the section size the same.
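Reading that size value could look like the Python sketch below. The field width and byte order are not stated here, so a 4-byte little-endian unsigned integer is an assumption, and the function name is made up.

```python
import struct

SECTION_OFFSET = 0x132D070  # file offset of the encrypted section (from the post above)


def read_section_size(exe_bytes: bytes, offset: int = SECTION_OFFSET) -> int:
    """Read the size value stored at the start of the encrypted section.

    Assumes a 4-byte little-endian unsigned integer, which is a guess.
    """
    (size,) = struct.unpack_from("<I", exe_bytes, offset)
    return size
```

Usage would be something like `read_section_size(open(exe_path, "rb").read())`, where `exe_path` points at the game executable.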
Lazy_Cat_2k3
Posts: 31
Joined: Sat Aug 22, 2020 12:43 am

Re: 古剑奇谭三(Gujian3) text tool

Post by Lazy_Cat_2k3 »

Here are all the strings extracted with their offsets (some strings are wrong because I haven't parsed all the types, and maybe I've parsed it wrong from the beginning :) )
https://www.mediafire.com/file/8xm20hza ... s.rar/file
The code behind:

Code:

            // Walk the decrypted buffer one tagged value at a time.
            // ReadValueU8, ReadValueU16 and ReadString are helper methods this
            // snippet assumes on the stream; they are not part of System.IO.Stream.
            while (input.Position < input.Length)
            {
                byte type = input.ReadValueU8();
                int sizeString;
                switch (type)
                {
                    case 8: // string with a 1-byte length prefix
                        sizeString = input.ReadValueU8();
                        output.WriteLine(input.ReadString(sizeString));
                        break;
                    case 9: // string with a 2-byte length prefix
                        sizeString = input.ReadValueU16();
                        output.WriteLine(input.ReadString(sizeString));
                        break;
                    // Non-string values: skip the payload.
                    case 3:
                        input.ReadBytes(1);
                        break;
                    case 4:
                        input.ReadBytes(2);
                        break;
                    case 5:
                    case 6:
                    case 10:
                        input.ReadBytes(3);
                        break;
                    case 7:
                        input.ReadBytes(8);
                        break;
                    case 11:
                    case 16:
                        input.ReadBytes(4);
                        break;
                    // Any other type byte is treated as having no payload.
                }
            }
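For anyone without the C# stream helpers, here is a rough Python port of the same loop that also records the offset of each string. The payload sizes are copied from the snippet above and are just as unverified; the little-endian 16-bit length and the function name are assumptions.

```python
import struct

# Payload sizes for the non-string type bytes, copied from the C# snippet.
SKIP = {3: 1, 4: 2, 5: 3, 6: 3, 7: 8, 10: 3, 11: 4, 16: 4}


def extract_strings(buf: bytes):
    """Yield (offset, text) for every type-8 / type-9 string in buf."""
    pos = 0
    while pos < len(buf):
        kind = buf[pos]
        pos += 1
        if kind == 8:  # 1-byte length prefix
            if pos + 1 > len(buf):
                break
            length = buf[pos]
            pos += 1
        elif kind == 9:  # 2-byte length prefix, assumed little-endian
            if pos + 2 > len(buf):
                break
            (length,) = struct.unpack_from("<H", buf, pos)
            pos += 2
        else:
            # Unknown type bytes skip nothing, like the original switch.
            pos += SKIP.get(kind, 0)
            continue
        yield pos, buf[pos:pos + length].decode("utf-8", errors="replace")
        pos += length
```

A wrong payload size for any type shifts everything after it, which is one way the "wrong" strings mentioned above could appear.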
alanm
Posts: 21
Joined: Mon Aug 17, 2020 4:54 am

Re: 古剑奇谭三(Gujian3) text tool

Post by alanm »

Kaplas wrote: The encrypted section is at offset 0x132d070 in the exe file, and its first value is the uncompressed section size. If we are able to read the unencrypted section from a file, maybe changing that value would let us translate without having to keep the section size the same.


A quick test that reduced the size at offset 0x132d070 a little caused the game to quit. There may be other checks that depend on the size.

The text buffer hook now reads the text from an external file.
alanm
Posts: 21
Joined: Mon Aug 17, 2020 4:54 am

Re: 古剑奇谭三(Gujian3) text tool

Post by alanm »

Lazy_Cat_2k3 wrote: Here are all the strings extracted with their offsets (some strings are wrong because I haven't parsed all the types, and maybe I've parsed it wrong from the beginning :) )
https://www.mediafire.com/file/8xm20hza ... s.rar/file
The code behind:


The extracted file looks pretty complete. How can we tell which strings are wrong?

It would be even better if we could reduce this file further so that it shows only the localized strings in their 3 language versions. For example, for a dialog line, the keyID would show up only once, followed by the 3 language versions of the dialog text one after another. That way the translator knows exactly which strings require translation, since each key is paired with 3 lines of text.

There are clues in the strings file that tell us where a language block starts. The default Chinese text block starts with the tag "Content", the Traditional Chinese block starts with the tag "CHT", and the English block starts with the tag "EN".
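A possible way to do that grouping, sketched in Python. It assumes the string right after a tag is that tag's text and that every entry begins with "Content"; neither has been checked against the whole file, and the function name is made up. A display string that is literally "EN" or "CHT" would confuse it.

```python
LANG_TAGS = ("Content", "CHT", "EN")


def group_by_language(strings):
    """Group a flat list of extracted strings into one dict per entry."""
    records = []
    current = {}
    it = iter(strings)
    for s in it:
        if s not in LANG_TAGS:
            continue  # keys and other values are ignored in this sketch
        value = next(it, None)  # the text is assumed to follow its tag directly
        if value is None:
            break
        if s == "Content" and current:
            # A new "Content" tag is assumed to start the next entry.
            records.append(current)
            current = {}
        current[s] = value
    if current:
        records.append(current)
    return records
```

Each resulting dict holds up to 3 versions of one line, which could be written out side by side for the translator.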