I was sent from RomHacking to this forum. I try to translate Dragon Quest IV on PSX and found the textblocks.
This is an example textblock, the parsing result and the two interesting byte arrays (dataCE and dataED).
The second byte array (dataED) seems to be a huffman tree?
I checked:
- a japanese letter (leaf) occurs only once
- the tree always starts with a node 0x8***
- the node numbers always ranging from 0 to (number of nodes - 1)
- the highest node number seems to be always on top
- in the decoding routine in PSX-EXE it jumps from node to node until a leaf is found (see image)
- the binary tree seems to be always full: n nodes and n+1 leafs (0x7f counts also as leafs, null are no leafs)
I tried the typical tree traversals and other binary tree representations, but it did not work out. Maybe I did something wrong? Some example trees are:
The left number is just an index. The 0x7f seems also be leafs (but maybe I am wrong). The "null" is 0x0000 in the data (maybe a placeholder).
For example, what bothers me is that the second tree starts with 3 nodes but then 5 leafs follow or sometimes the node numbers are not sorted well (0 before 1).
The dataCE block has a length of a muliple of 4 (len mod 4 == 0). I think here are the bits to traverse the tree and find the leafs.
This is the related text decoding routine. r17 has the 0x8149 which is node '329' (because 0x149 = 329). The routine uses addresses to point to the parts of the tree, I assume.
You can find more details in my blog post. I will update it, if I find something new.