utf8 string to unicode

happydance
Posts: 81
Joined: Sun Jul 10, 2016 11:07 am


Post by happydance »

Not sure if it's a bug or something wrong in how I coded it, but I'm trying to convert some Japanese text strings from UTF-8 to Unicode (UTF-16LE).

So I used this script to test it first:


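# read a NUL-terminated string (the whole text file if it contains no NUL)
get TEXT string
# write it to a memory file as a "unicode" (UTF-16) string
put TEXT unicode MEMORY_FILE
# get the size of the memory file and dump it to _001.txt
get SIZE ASIZE MEMORY_FILE
log _001.txt 0 SIZE MEMORY_FILE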
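
For reference, here is a hypothetical Python equivalent of what the script above is meant to do. It is an illustration of the expected result, not QuickBMS's internal code: read a NUL-terminated string, decode it as UTF-8 and re-encode it as UTF-16LE without a BOM. The helper names `read_c_string` and `utf8_to_utf16le` are made up for this sketch.

```python
# Hypothetical sketch of the intended conversion (not QuickBMS internals).

def read_c_string(data: bytes) -> bytes:
    """Return the bytes up to the first NUL, or the whole buffer if there is none."""
    end = data.find(b"\x00")
    return data if end == -1 else data[:end]


def utf8_to_utf16le(data: bytes) -> bytes:
    """Decode UTF-8 into code points, then encode each code point as UTF-16LE (no BOM)."""
    return data.decode("utf-8").encode("utf-16-le")


if __name__ == "__main__":
    source = "ダミー".encode("utf-8")            # E3 83 80 E3 83 9F E3 83 BC
    converted = utf8_to_utf16le(read_c_string(source))
    print(converted.hex(" ").upper())           # C0 30 DF 30 FC 30
```

Decoding to code points first is what makes multibyte characters come out right: each 3-byte UTF-8 sequence for a katakana character becomes a single 16-bit unit.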


So I wrote "ダミー" to a UTF-8 text file, but when I ran the script on it with QuickBMS 0.8.0 it converted the text wrongly; the output in hex is E3 00 92 01 80 00 DF 30 FC 30.

When I save the same document as Unicode, I get C0 30 DF 30 FC 30 in hex, which is ダミー in UTF-16LE.

So I tried QuickBMS 0.7.6 and got C0 30 DF 30 FC 30 there too, which is the correct result.
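
A quick way to see what went wrong is to decode both reported outputs as UTF-16LE and compare them code unit by code unit. This is only a diagnostic sketch of the bytes quoted above; it does not reproduce QuickBMS's conversion. The helper `code_points` is a made-up name.

```python
# Compare the two reported outputs as UTF-16LE code units (2 bytes each).
correct = bytes.fromhex("C0 30 DF 30 FC 30")              # QuickBMS 0.7.6 / saved as Unicode
buggy = bytes.fromhex("E3 00 92 01 80 00 DF 30 FC 30")    # QuickBMS 0.8.0


def code_points(data: bytes) -> list:
    """Decode UTF-16LE bytes and list each character as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in data.decode("utf-16-le")]


print(code_points(correct))  # ['U+30C0', 'U+30DF', 'U+30FC']
print(code_points(buggy))    # ['U+00E3', 'U+0192', 'U+0080', 'U+30DF', 'U+30FC']
print("ダ".encode("utf-8").hex(" ").upper())  # E3 83 80
```

Only the first character is broken: the three UTF-8 bytes of ダ (E3 83 80) each ended up as a separate 16-bit unit, while ミ and ー were converted correctly.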

I also tried plain English UTF-8 text without Japanese characters, and it seems to be converted to Unicode properly.
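
Plain English text is a weak test for this kind of problem. Every ASCII character is a single UTF-8 byte whose value equals its code point, so even a careless byte-by-byte conversion yields valid UTF-16LE. The function below is a hypothetical naive converter used only for illustration; it is not what QuickBMS 0.8.0 actually does (in the report above only the first character was damaged).

```python
# Hypothetical naive converter, for illustration only: each byte becomes its own code unit.
def widen_each_byte(data: bytes) -> bytes:
    """Map every byte b to the UTF-16LE code unit (b, 0x00)."""
    return b"".join(bytes((b, 0)) for b in data)


for label, text in (("English", "Dummy"), ("Japanese", "ダミー")):
    raw = text.encode("utf-8")
    naive = widen_each_byte(raw)
    proper = text.encode("utf-16-le")
    print(f"{label}: {len(raw)} UTF-8 bytes, naive == proper: {naive == proper}")
# English: 5 UTF-8 bytes, naive == proper: True
# Japanese: 9 UTF-8 bytes, naive == proper: False
```

So a test string needs at least one multibyte UTF-8 character to exercise the real decoding path.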
aluigi
Site Admin
Posts: 12984
Joined: Wed Jul 30, 2014 9:32 pm

Re: utf8 string to unicode

Post by aluigi »

Try using the following instruction at the beginning of the script:
codepage 932

You can also specify the codepage on the command line with the option -P 932
happydance
Posts: 81
Joined: Sun Jul 10, 2016 11:07 am

Re: utf8 string to unicode

Post by happydance »

I'm not sure I can use codepage 932, since 932 is Shift-JIS Japanese. Is there a codepage for UTF-8 Japanese?

I tried codepage 932 anyway, and it gives me garbled text with the hex 5D 7E 80 00 5D 7E 98 6E 92 01 7C FF.
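
The garbage is what you would expect when UTF-8 bytes are read as Shift-JIS. The sketch below uses Python's `cp932` codec to show the first step: E3 is a Shift-JIS lead byte, so E3 83 is read as one double-byte kanji (U+7E5D, which matches the leading 5D 7E in the output above) instead of the start of ダ. It only illustrates the mismatch and does not reproduce QuickBMS's full output.

```python
utf8_bytes = "ダミー".encode("utf-8")      # E3 83 80 E3 83 9F E3 83 BC

# Correct: decode with the encoding the file was actually saved in.
print(utf8_bytes.decode("utf-8"))          # ダミー

# Wrong: treat the same bytes as Shift-JIS (cp932). E3 83 becomes one kanji.
first = utf8_bytes[:2].decode("cp932")
print(f"U+{ord(first):04X} {first}")       # U+7E5D 繝
```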
aluigi
Site Admin
Posts: 12984
Joined: Wed Jul 30, 2014 9:32 pm

Re: utf8 string to unicode

Post by aluigi »

I understand. Strings, unicode and codepages are a pain.
Do you mean that the output of quickbms 0.7.x is correct instead?
happydance
Posts: 81
Joined: Sun Jul 10, 2016 11:07 am

Re: utf8 string to unicode

Post by happydance »

Yes, it displays properly with 0.7.6; I haven't tested other 0.7.x builds.

Actually, I wrote that script back in January, I think, when 0.7.6 was the latest version. Now I'm trying to rip and repack text from other files of the same game that I hadn't touched yet, using the updated 0.8.0, and I ran into this problem. After hours of testing I went back to the old files and, to my surprise, it worked properly with 0.7.6.
aluigi
Site Admin
Posts: 12984
Joined: Wed Jul 30, 2014 9:32 pm

Re: utf8 string to unicode

Post by aluigi »

I will investigate it.
The TODO list for quickbms 0.8.1 is now very long... eh it's almost time to start working on it :)
aluigi
Site Admin
Posts: 12984
Joined: Wed Jul 30, 2014 9:32 pm

Re: utf8 string to unicode

Post by aluigi »

I confirm that it was a stupid bug caused by a very bad "fix" I used for a rare endless loop issue.
It will be fixed in 0.8.1 that hopefully will be released this week.