Add data type import for Ghidra

2026-02-20 02:07:02 +03:00 · 2026-02-04 19:52:12 -05:00 · 63002e0f08
commit 63002e0f08
parent 30f8a5879e
9 changed files with 233 additions and 31 deletions
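
The brace-placement rule documented under "Updating `make_header.sh`" in the diff below lends itself to a quick automated check before running the script.  The following is a minimal sketch, not part of the repository: `find_brace_violations` is a hypothetical helper, and its line-based regular expressions are an assumption about what the `awk` conversion cannot handle, based only on the description in this change.

```python
import re

# Hypothetical helper, not part of the repository.  It flags lines where a
# definition body shares a line with the opening or closing brace.
_OPEN_WITH_BODY = re.compile(r"\{\s*\S")     # something follows '{' on the line
_BODY_BEFORE_CLOSE = re.compile(r"\S\s*\}")  # something precedes '}' on the line


def find_brace_violations(header_text):
    """Return (line_number, stripped_line) pairs that break the rule."""
    violations = []
    for number, line in enumerate(header_text.splitlines(), start=1):
        code = line.split("//", 1)[0]  # ignore trailing line comments
        if _OPEN_WITH_BODY.search(code) or _BODY_BEFORE_CLOSE.search(code):
            violations.append((number, line.strip()))
    return violations
```

The check is deliberately crude: it also reports empty bodies such as `struct X {};`, inline function bodies, and braces inside block comments, so treat its output as lines to inspect rather than definite errors.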
--- a/documentation/gettingstarted.md
+++ b/documentation/gettingstarted.md
@@ -83,15 +83,31 @@ executable where objdiff doesn't expect them to be, which will mess up our
 diffs.  To correct this, open the memory map (`Window > Memory Map`) and
 uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.

-Now we'll import symbols from the JSRF decompilation repository.  After running
-the analysis, open the script manager (`Window > Script Manager`) and select
-the "Data" folder in the left pane.  Double click the script titled
-`ImportSymbolsScript.py`, and a file picker will open after a moment.  Select
-`symboltable.tsv` from the `delink/` directory of your cloned JSRF
-decompilation repository, and you should see a bunch of `Created function...`
-and `Created label...` in the scripting console window.  Save your changes
-(save icon in the top left of the CodeBrowser window), and your Ghidra project
-should be all ready for creating object files for objdiff.
+Now we'll import data types from the decompilation.  Open a shell in the
+`ghidra/` directory of your copy of the repository and run `make_header.sh`,
+which will produce a `jsrf.h` in the same directory with the combined contents
+of every header in a format suitable for Ghidra.  Then, in Ghidra, select
+`File > Parse C Source...` to open a window for importing C headers.  Remove
+everything from the "Source files to parse" and "Parse options" boxes, and add
+`jsrf.h` to the former (click the green + symbol on the right and select the
+`jsrf.h` file).  Click the "..." on the "Program Architecture:" box and select
+the row with the values "x86", "default", "32", "little", and "Visual Studio".
+Finally, click the "Parse to Program" button, "Continue" to confirm, and
+"Don't Use Open Archives" (the header is completely self-contained and doesn't
+need any information from any other data type archives).  You should then see a
+window reporting successful import, and you'll be able to find `jsrf.h` with
+all of its definitions under `default.xbe` in the Data Type Manager window in
+the bottom left.
+
+Lastly, we'll import symbols from the JSRF decompilation repository.  Open the
+script manager (`Window > Script Manager`) and select the "Data" folder in the
+left pane.  Double click the script titled `ImportSymbolsScript.py`, and a file
+picker will open after a moment.  Select `symboltable.tsv` from the `ghidra/`
+directory of your cloned JSRF decompilation repository, and you should see a
+bunch of `Created function...` and `Created label...` printed to the scripting
+console window.  Save your changes (save icon in the top left of the
+CodeBrowser window), and your Ghidra project should be all ready for creating
+object files for objdiff.


 ### Producing Object Files
@@ -198,12 +214,12 @@ request to merge it back into the online copy.
 ## Contributing to Delinking
 Getting the JSRF binary delinked is just as important as decompiling the
 resulting object files, but takes a bit more investment.  The concrete task of
-a delinking contributor is to populate `symboltable.tsv` and `objects.csv` in
-the `delink/` directory, which together enable consistent delinking of object
-files.  The former lists symbols at different addresses through the whole
-executable, while the latter lists the address ranges that have been identified
-as separable objects.  Both of these things are figured out by combing over the
-whole executable in Ghidra.
+a delinking contributor is to populate `symboltable.tsv` in the `ghidra/`
+directory and `objects.csv` in the `delink/` directory, which together enable
+consistent delinking of object files.  The former lists symbols at different
+addresses through the whole executable, while the latter lists the address
+ranges that have been identified as separable objects.  Both of these things
+are figured out by combing over the whole executable in Ghidra.


 ### Updating `symboltable.tsv`
@@ -243,12 +259,37 @@ Now, to actually export the table, right-click on one of the table cells, click
 to CSV..." before selecting where to save your exported symbol table.

 The final step is converting this CSV file to the format expected by
-`ImportSymbolsScript.py`.  Open a shell in the repository's `delink/` directory
+`ImportSymbolsScript.py`.  Open a shell in the repository's `ghidra/` directory
 and run `make_symboltable.sh` with the path of your exported CSV as an
 argument, and `symboltable.tsv` will be overwritten with a new table containing
 your exported symbols.


+### Updating `make_header.sh`
+If you've added any header files, you'll want to add them to the `HEADERS`
+variable in `ghidra/make_header.sh`.  Make sure that any other header files
+they depend on are earlier in the list, as this script combines everything into
+one file without any `#include` directives.  Afterwards, check that the script
+runs successfully and that Ghidra can import the resulting `jsrf.h`.
+
+Keep in mind that `make_header.sh` uses a fairly rudimentary `awk` script to
+convert C++ headers to C, which places some gentle constraints on how
+declarations need to be written.  In general, it's enough to just keep things
+simple and not do anything unusual (keep data type and variable declarations
+separate, don't use macros for declarations, etc.), but the one big catch is
+that the body of a data type definition must not be on the same line as the
+opening or closing braces.  That is, do not write
+```c++
+struct X { unsigned x; };
+```
+but rather
+```c++
+struct X {
+    unsigned x;
+};
+```
+
+
 ### Updating `objects.csv`
 `objects.csv` is a listing of addresses for each object file or group of object
 files that we've identified.  Each column after the first two corresponds to a