Add data type import for Ghidra

KeybadeBlox 2026-02-04 19:52:12 -05:00
parent 30f8a5879e
commit 63002e0f08
9 changed files with 233 additions and 31 deletions


@ -83,15 +83,31 @@ executable where objdiff doesn't expect them to be, which will mess up our
diffs. To correct this, open the memory map (`Window > Memory Map`) and
uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.
Now we'll import symbols from the JSRF decompilation repository. After running
the analysis, open the script manager (`Window > Script Manager`) and select
the "Data" folder in the left pane. Double click the script titled
`ImportSymbolsScript.py`, and a file picker will open after a moment. Select
`symboltable.tsv` from the `delink/` directory of your cloned JSRF
decompilation repository, and you should see a bunch of `Created function...`
and `Created label...` in the scripting console window. Save your changes
(save icon in the top left of the CodeBrowser window), and your Ghidra project
should be all ready for creating object files for objdiff.
Now we'll import data types from the decompilation. Open a shell in the
`ghidra/` directory of your copy of the repository and run `make_header.sh`,
which will produce a `jsrf.h` in the same directory with the combined contents
of every header in a format suitable for Ghidra. Then, in Ghidra, select
`File > Parse C Source...` to open a window for importing C headers. Remove
everything from the "Source files to parse" and "Parse options" boxes, and add
`jsrf.h` to the former (click the green + symbol on the right and select the
`jsrf.h` file). Click the "..." on the "Program Architecture:" box and select
the row with the values "x86", "default", "32", "little", and "Visual Studio".
Finally, click the "Parse to Program" button, "Continue" to confirm, and
"Don't Use Open Archives" (the header is completely self-contained and doesn't
need any information from any other data type archives). You should then see a
window reporting successful import, and you'll be able to find `jsrf.h` with
all of its definitions under `default.xbe` in the Data Type Manager window in
the bottom left.
Lastly, we'll import symbols from the JSRF decompilation repository. Open the
script manager (`Window > Script Manager`) and select the "Data" folder in the
left pane. Double-click the script titled `ImportSymbolsScript.py`, and a file
picker will open after a moment. Select `symboltable.tsv` from the `ghidra/`
directory of your cloned JSRF decompilation repository, and you should see a
bunch of `Created function...` and `Created label...` printed to the scripting
console window. Save your changes (save icon in the top left of the
CodeBrowser window), and your Ghidra project should be all ready for creating
object files for objdiff.
### Producing Object Files
@ -198,12 +214,12 @@ request to merge it back into the online copy.
## Contributing to Delinking
Getting the JSRF binary delinked is just as important as decompiling the
resulting object files, but takes a bit more investment. The concrete task of
a delinking contributor is to populate `symboltable.tsv` and `objects.csv` in
the `delink/` directory, which together enable consistent delinking of object
files. The former lists symbols at different addresses through the whole
executable, while the latter lists the address ranges that have been identified
as separable objects. Both of these things are figured out by combing over the
whole executable in Ghidra.
a delinking contributor is to populate `symboltable.tsv` in the `ghidra/`
directory and `objects.csv` in the `delink/` directory, which together enable
consistent delinking of object files. The former lists symbols at different
addresses through the whole executable, while the latter lists the address
ranges that have been identified as separable objects. Both of these things
are figured out by combing over the whole executable in Ghidra.
### Updating `symboltable.tsv`
@ -243,12 +259,37 @@ Now, to actually export the table, right-click on one of the table cells, click
to CSV..." before selecting where to save your exported symbol table.
The final step is converting this CSV file to the format expected by
`ImportSymbolsScript.py`. Open a shell in the repository's `delink/` directory
`ImportSymbolsScript.py`. Open a shell in the repository's `ghidra/` directory
and run `make_symboltable.sh` with the path of your exported CSV as an
argument, and `symboltable.tsv` will be overwritten with a new table containing
your exported symbols.
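As a rough illustration of what this conversion involves (not the actual contents of `make_symboltable.sh`), the sketch below turns CSV rows into the whitespace-separated `name address f|l` lines that `ImportSymbolsScript.py` reads. The column names `Name`, `Location`, and `Type`, the `Function` type value, and the sample symbols are all assumptions made for the example; check them against your own export.

```python
import csv
import io


def csv_to_symbol_lines(csv_text):
    """Convert symbol table CSV text into tab-separated 'name address kind' lines.

    Assumes (hypothetically) that the export has Name, Location, and Type
    columns, and that functions are marked with the Type value "Function".
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "f" marks a function and "l" a label in the import script's input.
        kind = "f" if row["Type"] == "Function" else "l"
        lines.append("\t".join([row["Name"], row["Location"], kind]))
    return lines


# Hypothetical two-row export used only to exercise the function.
sample = (
    "Name,Location,Type\n"
    "main,00011000,Function\n"
    "g_state,0022a000,Data Label\n"
)
print("\n".join(csv_to_symbol_lines(sample)))
```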
### Updating `make_header.sh`
If you've added any header files, you'll want to add them to the `HEADERS`
variable in `ghidra/make_header.sh`. Make sure that any other header files
they depend on are earlier in the list, as this script combines everything into
one file without any `#include` directives. Then confirm that the script runs
successfully and that Ghidra can import the resulting `jsrf.h`.
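The ordering requirement can be checked mechanically. The following is a minimal sketch, not part of the repository, that takes headers in the order they would appear in `HEADERS` and reports any quoted `#include` of a listed header that has not yet appeared. It assumes includes are written with the same names used in the list; the helper name and sample headers are hypothetical.

```python
import re

# Matches quoted includes such as: #include "types.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"')


def misordered_headers(headers):
    """Return (header, dependency) pairs where the dependency is listed later.

    `headers` is an ordered list of (name, source_text) tuples. Includes of
    files that are not in the list at all are ignored.
    """
    listed = {name for name, _ in headers}
    seen = set()
    problems = []
    for name, text in headers:
        for line in text.splitlines():
            match = INCLUDE_RE.match(line)
            if match and match.group(1) in listed and match.group(1) not in seen:
                problems.append((name, match.group(1)))
        seen.add(name)
    return problems


# Hypothetical headers: player.h depends on types.h but is listed first.
bad_order = [
    ("player.h", '#include "types.h"\nstruct Player;'),
    ("types.h", "struct Vec3;"),
]
print(misordered_headers(bad_order))
```

An empty result means every listed dependency precedes the header that needs it.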
Keep in mind that `make_header.sh` uses a fairly rudimentary `awk` script to
convert C++ headers to C, which places some gentle constraints on how
declarations need to be written. In general, it's enough to just keep things
simple and not do anything unusual (keep data type and variable declarations
separate, don't use macros for declarations, etc.), but the one big catch is
that the body of a data type definition must not be on the same line as the
opening or closing braces. That is, do not write
```c++
struct X { unsigned x; };
```
but rather
```c++
struct X {
unsigned x;
};
```
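To catch the one-line form before running `make_header.sh`, a small line-based check along these lines may help. This is only a sketch under the assumption that any text sharing a line with the inside of a brace is a problem; it is not the `awk` script itself and may be stricter or looser than what that script actually tolerates.

```python
import re


def one_line_bodies(source):
    """Return 1-based line numbers where a brace shares a line with body text."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Ignore trailing // comments and surrounding whitespace.
        code = line.split("//")[0].strip()
        # Text following an opening brace, e.g. "struct X { unsigned x;"
        after_open = re.search(r"\{\s*\S", code)
        # Text preceding a closing brace, e.g. "unsigned x; };"
        before_close = re.search(r"\S\s*\}", code)
        if after_open or before_close:
            flagged.append(number)
    return flagged


one_line = "struct X { unsigned x; };"
multi_line = "struct X {\n    unsigned x;\n};"
print(one_line_bodies(one_line), one_line_bodies(multi_line))
```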
### Updating `objects.csv`
`objects.csv` is a listing of addresses for each object file or group of object
files that we've identified. Each column after the first two corresponds to a