Update documentation for new scripts

This includes the enhanced export/import scripts and the class fixup
script (with the name mangler being used implicitly).  With this, the
switchover from simple label-based sharing of Ghidra project information
to rich type and class information is complete.
KeybadeBlox 2026-02-19 21:16:38 -05:00
parent aac010eb71
commit bbe9d63294


@ -83,38 +83,54 @@ executable where objdiff doesn't expect them to be, which will mess up our
diffs. To correct this, open the memory map (`Window > Memory Map`) and
uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.
Now we'll import data types from the decompilation. Open a shell in the
`ghidra/` directory of your copy of the repository and run `make_header.sh`,
which will produce a `jsrf.h` in the same directory with the combined contents
of every header in a format suitable for Ghidra. Then, in Ghidra, select
`File > Parse C Source...` to open a window for importing C headers. Remove
everything from the "Source files to parse" and "Parse options" boxes, and add
`jsrf.h` to the former (click the green + symbol on the right and select the
`jsrf.h` file). Click the "..." on the "Program Architecture:" box and select
the row with the values "x86," "default," "32," "little," and "Visual Studio."
Finally, click the "Parse to Program" button, "Continue" to confirm, and
"Don't Use Open Archives" (the header is completely self-contained and doesn't
need any information from any other data type archives). You should then see a
window reporting successful import, and you'll be able to find `jsrf.h` with
all of its definitions under `default.xbe` in the Data Type Manager window in
the bottom left.
Now we'll import data types from the decompilation. Open a Unix-style shell
(e.g. Git Bash if on Windows) in the `ghidra/` directory of your copy of the
repository and run `make_header.sh`, which will produce a `jsrf.h` in the same
directory with the combined contents of every header in a format suitable for
Ghidra. Then, in Ghidra, select `File > Parse C Source...` to open a window
for importing C headers. Remove everything from the "Source files to parse"
and "Parse options" boxes, and add `jsrf.h` to the former (click the green +
symbol on the right and select the `jsrf.h` file). Click the "..." on the
"Program Architecture:" box and select the row with the values "x86,"
"default," "32," "little," and "Visual Studio." Finally, click the "Parse to
Program" button, "Continue" to confirm, and "Don't Use Open Archives" (the
header is completely self-contained and doesn't need any information from any
other data type archives). You should then see a window reporting successful
import, and you'll be able to find `jsrf.h` with all of its definitions under
`default.xbe` in the Data Type Manager window in the bottom left.
Lastly, we'll import symbols from the JSRF decompilation repository. Open the
script manager (`Window > Script Manager`) and select the "Data" folder in the
left pane. Double click the script titled `ImportSymbolsScript.py`, and a file
picker will open after a moment. Select `symboltable.tsv` from the `ghidra/`
directory of your cloned JSRF decompilation repository, and you should see a
bunch of `Created function...` and `Created label...` printed to the scripting
console window. Save your changes (save icon in the top left of the
CodeBrowser window), and your Ghidra project should be all ready for creating
object files for objdiff.
Much of our work with Ghidra will make use of some custom scripts we've
written, so we'll have to tell it where to find them. Open up the Script
Manager (`Window > Script Manager`) and then open the Bundle Manager by
clicking the "manage script directories" button (it looks sort of like a
bulleted list). Click the green + in the top right to add a new directory and
select the `ghidra/ghidra_scripts` directory in this repository.
The first script we'll want to run is the symbol importer to get known data and
functions into your Ghidra project. In the Script Manager window, select the
"Import" category in the left pane and double click the `EnhancedImport.java`
script in the right pane to run it. You'll then be asked for an input file;
select `ghidra/symboltable.tsv` from this repository. Afterwards, you'll see a
bunch of "Importing ..." messages in a console in the main CodeBrowser window,
some of which may have "can't find data type X" added on if something's marked
with a type that hasn't made its way into our decompiled code yet, and there'll
be a bunch of new functions and labels defined.
While we imported a bunch of data types earlier, Ghidra's C parser leaves out
some important information that we'll have to fill in with another script. In
the Script Manager, run `ClassFixup.java` from the "Data Types" category, and
you should see some "Converting X to class" and "Fixing calling convention of
X" messages in the console.
Now you've got a Ghidra project containing everything we know about JSRF's
code! Make sure you save your Ghidra project now that everything's set up.
### Producing Object Files
Close all of your Ghidra windows and open a Unix-style shell (e.g. Git Bash if
on Windows) in the decompilation repository's `ghidra/` directory. The
`delink.sh` script is our automated tool for extracting all the object files
that have been identified so far. Invoke it with three arguments:
Close all of your Ghidra windows and open a Unix-style shell in the
decompilation repository's `ghidra/` directory. The `delink.sh` script is our
automated tool for extracting all the object files that have been identified so
far. The easiest way to run it is to invoke it with three arguments:
- The path to your Ghidra installation (the directory with files like
`ghidraRun` and `ghidraRun.bat`, and directories like `docs/` and
@ -128,27 +144,30 @@ Unix-style paths. Make sure the paths are surrounded by quotes, too (e.g.
`'C:\path\to\whatever'`), else the shell won't understand the backslashes
correctly.
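As a quick illustration of why the quotes matter, here's a small sketch you can paste into your shell (the path is just the placeholder from above, not a real location):

```shell
# Single quotes keep the backslashes intact.
quoted='C:\path\to\whatever'
# Without quotes, the shell treats each backslash as an escape and drops it.
unquoted=C:\path\to\whatever

printf '%s\n' "$quoted"
printf '%s\n' "$unquoted"
```

The second line of output has lost its backslashes, which is what a script would receive if the path were passed unquoted.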
If you find typing out these arguments to be too much of a pain, you can also
define the environment variables `$GHIDRA_HOME`, `$JSRFDECOMP_PROJECTPATH`, and
`$JSRFDECOMP_PROJECTNAME` and invoke the script without arguments.
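A minimal sketch of that setup might look like the following. The values here are placeholders made up for illustration; each variable should hold whatever you would otherwise pass as the corresponding argument.

```shell
# Hypothetical values: substitute your own paths and project name.
export GHIDRA_HOME='/opt/ghidra'
export JSRFDECOMP_PROJECTPATH="$HOME/ghidra_projects"
export JSRFDECOMP_PROJECTNAME='jsrf'

# With the variables set, delink.sh can be invoked without arguments.
if [ -x ./delink.sh ]; then
    ./delink.sh
else
    echo "delink.sh not found; run this from the repository's ghidra/ directory" >&2
fi
```

Putting the `export` lines in your shell's startup file (e.g. `~/.bashrc`) saves you from retyping them in every new session.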
There are a couple of errors you might get here:
- `Unable to lock project!`: This means that Ghidra isn't fully closed. Make
sure you've completely closed every Ghidra window before running `delink.sh`.
- `Script not found: DelinkProgram.java` and
`Invalid script: DelinkProgram.java`: This means that either the Ghidra
delinker extension isn't properly installed, or you've somehow invoked the
script in a way that can't see the extension (e.g. installing Ghidra on
Windows and then invoking the script from WSL). Ensure it's installed and
enabled first, and that you're not running in some kind of environment
different from where you installed Ghidra.
- `Script not found` and `Invalid script`: This means that you haven't added
the repository's `ghidra_scripts` directory to the script search path as
described in the previous section (particularly if it mentions
`MSVC7Mangle.java`), the Ghidra delinker extension isn't properly installed
(particularly if it mentions `DelinkProgram.java`), or you've somehow invoked
the script in a way that can't see the scripts (e.g. installing Ghidra on
Windows and then invoking the script from WSL).
- `java.lang.RuntimeException: Failed to export ...`: This means that the
delinker extension doesn't like something about what it was told to delink.
One known cause is duplicate symbol names. If you haven't modified
`objects.csv` or `symboltable.tsv`, let other people on the project know so
that they can look into fixing it.
If all goes well, you'll see the message `Delinking complete!` at the end of
the script's output, and the extracted object files will be in the
`decompile/target/` directory of the repository. Now we're ready to start
recompiling and diffing code with objdiff.
If all goes well, the extracted object files will be in the `decompile/target/`
directory of the repository. Now we're ready to start recompiling and diffing
code with objdiff.
### Setting Up objdiff
@ -167,9 +186,11 @@ correctly set up on your `PATH`.
One important piece of information, to make sure you get the correct match
percentages: set `Diff Options > Function relocation diffs` to "None."
Otherwise, approximately all references to functions and non-local variables
will be marked as nonmatching (this has to do with the delinking process not
applying name mangling, which isn't expected to be fixed).
Otherwise, some references to non-local variables will be marked as nonmatching
(this is because it's sometimes not possible to make certain things named
variables in Ghidra, particularly thread-local storage, and other times it's
not possible to assign a fixed name to certain implicitly generated output in
the recompiled code).
### Using objdiff
@ -180,14 +201,13 @@ them. In the best case, corresponding functions in each file will have the
same name and be in the same section, at which point objdiff can link them
automatically. Otherwise, one has to click on one of the corresponding
functions in one pane and the other function in the other pane to tell objdiff
to link them. Common cases of this are class methods (the names won't match)
and implicitly generated functions, such as exception handling code placed in
`.text$x` in the recompiled object file. Keep in mind that objdiff's matching
does not appear fully reliable in some cases, particularly when diffing data
with external pointers (which appear as `?? ?? ?? ??`) that aren't explicitly
marked as non-matching but still somehow reduce the match percentage, so you'll
have to use a tiny amount of judgement to determine when you actually have a
match.
to link them. The most common cases of this are implicitly generated functions
and data, such as exception handling code placed in `.text$x` in the recompiled
object file. Be aware that objdiff's matching does not appear fully reliable
in some cases, particularly when diffing data with external pointers (which
appear as `?? ?? ?? ??`) that aren't explicitly marked as non-matching but
still somehow reduce the match percentage, so you'll have to use a tiny amount
of judgement to determine when you actually have a match.
Clicking on a function that's been linked across both object files shows a diff
of the disassembly of both versions of the function, with any differences
@ -197,8 +217,20 @@ reaches 100%. Depending on how you configure objdiff, it will rebuild
automatically whenever you save a change to a source file, or you can manually
rebuild with the "Build" button at the top of the right pane.
There are no concrete instructions to give for writing decompiled code. Try
importing headers from `decompile/src/` into Ghidra
When viewing and editing decompiled source files, be mindful of the
`// Status:` annotation above each function, which has the following meanings:
- `unimplemented`: The decompiled function does not yet reproduce the behaviour
of the original
- `nonmatching`: The decompiled function is believed to behave the same as the
original, but it does not fully match in objdiff
- `matching`: The decompiled function perfectly matches the original in objdiff
Be sure to update them as you decompile if appropriate. Some functions may
also have other annotations describing nontrivial effects of link-time code
generation (LTCG), such as a nonstandard calling convention or multiple
functions being merged into one.
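If you'd like a rough picture of overall progress, the annotations are easy to tally from a shell. This is only a sketch: it assumes each annotation is written exactly as `// Status: <value>` with a single space, and that it's run from the repository root so that `decompile/src` resolves; `count_status` and `SRC_DIR` are names made up for this example.

```shell
# Count how many functions carry a given "// Status:" annotation.
# $1: directory to search, $2: status (unimplemented, nonmatching or matching)
count_status() {
    grep -rhF -- "// Status: $2" "$1" 2>/dev/null | wc -l | tr -d '[:space:]'
}

# SRC_DIR is an assumed default; point it at wherever your sources live.
SRC_DIR="${SRC_DIR:-decompile/src}"
for status in unimplemented nonmatching matching; do
    printf '%s: %s\n' "$status" "$(count_status "$SRC_DIR" "$status")"
done
```

The fixed-string search for `// Status: matching` does not also count `nonmatching` lines, because the text after the colon has to line up exactly.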
Otherwise, there are no concrete instructions to give for writing decompiled
code. Try importing headers from `decompile/src/` into Ghidra
(`File > Parse C Source...`) to get access to JSRF classes, and use Ghidra's
decompilation of the function in the CodeBrowser as a starting point for
writing your matching function, exercising whatever C++ and x86 assembly
@ -223,46 +255,11 @@ whole executable in Ghidra.
### Updating `symboltable.tsv`
If you have got a bunch of symbols you'd like to add to `symboltable.tsv`, a
workflow has been devised to generate it from your Ghidra project. Before
regenerating the table, however, make sure that you have all of its symbols
already in your project so that you don't end up deleting any. One option is
to import `symboltable.tsv` into your project with the `ImportSymbolsScript.py`
script as mentioned under "Creating a JSRF Ghidra Project," but be aware that
this will overwrite any names you've assigned to the same symbols. You will
also have to ensure that no two symbols share the same name. This can be
avoided by using namespaces if need be (i.e. `X::symbol` and `Y::symbol` may
coexist), but function overloading must be avoided (you may not have one
function with the signature `void X::f(int)` and another with the signature
`void X::f(float)`), else errors can arise when delinking, as the delinker
extension does not mangle symbol names. Thunked functions can also cause
problems because Ghidra does not include them alongside other functions in the
symbol table, so convert them to regular functions (right click on the thunked
function in the symbol tree and unset it as a thunk in the `Function` submenu).
Once you're ready to export your symbols, open the symbol table
(`Window > Symbol Table`). Open the symbol filter window (cog button near the
top right), and uncheck everything but "User Defined" under "Symbol Source,"
"Data Labels" and "Function Labels" under "Symbol Types," "Use Advanced
Filters," and "Non-Externals" under "Non-Externals." This ensures that you
only export symbols that you've defined and that are useful for delinking.
Now we need to configure the columns that we want to export. Right-click on
one of the column headers, click "Add/Remove Columns..." to open the "Select
Columns" window, and in it check only "Location," "Name," "Namespace," and
"Type." Click "OK" to close the window and ensure that the column order is
"Location," "Namespace," "Name," "Type" (you can drag the column headers to
reorder them if needed).
Now, to actually export the table, right-click on one of the table cells, click
"Select All," and then right-click again on a cell to select "Export > Export
to CSV..." before selecting where to save your exported symbol table.
The final step is converting this CSV file to the format expected by
`ImportSymbolsScript.py`. Open a shell in the repository's `ghidra/` directory
and run `make_symboltable.sh` with the path of your exported CSV as an
argument, and `symboltable.tsv` will be overwritten with a new table containing
your exported symbols.
If you have got a bunch of symbols you'd like to add to `symboltable.tsv`, you
can generate a new copy from your Ghidra project by running the
`EnhancedExport.java` script from the "Export" category. If you want to merge
the new table into the repository, make sure to take a look at the diff first
to ensure you're not inadvertently deleting anything.
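One way to spot inadvertent deletions, sketched here under the assumption that you have the committed table and your fresh export available as two separate files, is to list every line of the old table that has no exact match in the new one. The `removed_entries` helper and the file names are hypothetical.

```shell
# Print every line of the old table that has no exact counterpart in the new one.
# $1: old copy of the table, $2: newly exported table
removed_entries() {
    grep -Fvxf "$2" "$1"
}

# Example (file names are placeholders):
#   git show HEAD:ghidra/symboltable.tsv > old.tsv
#   removed_entries old.tsv new.tsv
```

Because the comparison is on whole lines, a symbol that was merely renamed or retyped will also be listed, so treat the output as a list of things to double-check rather than a list of definite losses.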
### Updating `make_header.sh`
@ -314,12 +311,16 @@ correctly (exception-handling code might be appended onto another function, for
example). Because `symboltable.tsv` should only be populated with symbols that
have been manually defined as per the previous section, this means that you
need to define variable names and labels in Ghidra for everything therein (and
ideally everything referenced externally, as well). Do try to maintain basic
ideally everything referenced externally, as well). Strive to maintain basic
consistency with the rest of the codebase: functions and methods begin with
lowercase letters, for instance, while class/struct/enum names begin with
capital letters, and special methods like constructors and destructors should
have the names they would have in real C++ code (i.e. `Class::Class` and
`Class::~Class`, respectively).
`Class::~Class`, respectively). Special class methods and members like
constructors and vtables must follow their established naming conventions for
our tooling to work properly. Also note that you can (mostly) disable name
mangling for a symbol by making it a member of the `extern_"C"` namespace,
which applies C-style name mangling as used by some symbols.
Once an object is ready for extracting, its `Delink?` column should be set to
`true` and the `objdiff.json` file in the `decompile/` directory should be