Update documentation for new scripts

This includes the enhanced export/import scripts and the class fixup
script (with the name mangler being used implicitly).  With this, the
switchover from simple label-based sharing of Ghidra project information
to rich type and class information is complete.
KeybadeBlox 2026-02-19 21:16:38 -05:00
parent aac010eb71
commit bbe9d63294

@ -83,38 +83,54 @@ executable where objdiff doesn't expect them to be, which will mess up our
diffs. To correct this, open the memory map (`Window > Memory Map`) and diffs. To correct this, open the memory map (`Window > Memory Map`) and
uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`. uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.
Now we'll import data types from the decompilation. Open a shell in the Now we'll import data types from the decompilation. Open a Unix-style shell
`ghidra/` directory of your copy of the repository and run `make_header.sh`, (e.g. Git Bash if on Windows) in the `ghidra/` directory of your copy of the
which will produce a `jsrf.h` in the same directory with the combined contents repository and run `make_header.sh`, which will produce a `jsrf.h` in the same
of every header in a format suitable for Ghidra. Then, in Ghidra, select directory with the combined contents of every header in a format suitable for
`File > Parse C Source...` to open a window for importing C headers. Remove Ghidra. Then, in Ghidra, select `File > Parse C Source...` to open a window
everything from the "Source files to parse" and "Parse options" boxes, and add for importing C headers. Remove everything from the "Source files to parse"
`jsrf.h` to the former (click the green + symbol on the right and select the and "Parse options" boxes, and add `jsrf.h` to the former (click the green +
`jsrf.h` file). Click the "..." on the "Program Architecture:" box and select symbol on the right and select the `jsrf.h` file). Click the "..." on the
the row with the values "x86," "default," "32," "little," and "Visual Studio." "Program Architecture:" box and select the row with the values "x86,"
Finally, click the "Parse to Program" button, "Continue" to confirm, and "default," "32," "little," and "Visual Studio." Finally, click the "Parse to
"Don't Use Open Archives" (the header is completely self-contained and doesn't Program" button, "Continue" to confirm, and "Don't Use Open Archives" (the
need any information from any other data type archives). You should then see a header is completely self-contained and doesn't need any information from any
window reporting successful import, and you'll be able to find `jsrf.h` with other data type archives). You should then see a window reporting successful
all of its definitions under `default.xbe` in the Data Type Manager window in import, and you'll be able to find `jsrf.h` with all of its definitions under
the bottom left. `default.xbe` in the Data Type Manager window in the bottom left.
Lastly, we'll import symbols from the JSRF decompilation repository. Open the Much of our work with Ghidra will make use of some custom scripts we've
script manager (`Window > Script Manager`) and select the "Data" folder in the written, so we'll have to tell it where to find them. Open up the Script
left pane. Double click the script titled `ImportSymbolsScript.py`, and a file Manager (`Window > Script Manager`) and then open the Bundle Manager by
picker will open after a moment. Select `symboltable.tsv` from the `ghidra/` clicking the "manage script directories" button (it looks sort of like a
directory of your cloned JSRF decompilation repository, and you should see a bulleted list). Click the green + in the top right to add a new directory and
bunch of `Created function...` and `Created label...` printed to the scripting select the `ghidra/ghidra_scripts` directory in this repository.
console window. Save your changes (save icon in the top left of the
CodeBrowser window), and your Ghidra project should be all ready for creating The first script we'll want to run is the symbol importer to get known data and
object files for objdiff. functions into your Ghidra project. In the Script Manager window, select the
"Import" category in the left pane and double click the `EnhancedImport.java`
script in the right pane to run it. You'll then be asked for an input file;
select `ghidra/symboltable.tsv` from this repository. Afterwards, you'll see a
bunch of "Importing ..." messages in a console in the main CodeBrowser window,
some of which may have "can't find data type X" added on if something's marked
with a type that hasn't made its way into our decompiled code yet, and there'll
be a bunch of new functions and labels defined.
While we imported a bunch of data types earlier, Ghidra's C parser leaves out
some important information that we'll have to fill in with another script. In
the Script Manager, run `ClassFixup.java` from the "Data Types" category, and
you should see some "Converting X to class" and "Fixing calling convention of
X" messages in the console.
Now you've got a Ghidra project containing everything we know about JSRF's
code! Make sure you save your Ghidra project now that everything's set up.
### Producing Object Files ### Producing Object Files
Close all of your Ghidra windows and open a Unix-style shell (e.g. Git Bash if Close all of your Ghidra windows and open a Unix-style shell in the
on Windows) in the decompilation repository's `ghidra/` directory. The decompilation repository's `ghidra/` directory. The `delink.sh` script is our
`delink.sh` script is our automated tool for extracting all the object files automated tool for extracting all the object files that have been identified so
that have been identified so far. Invoke it with three arguments: far. The easiest way to run it is to invoke it with three arguments:
- The path to your Ghidra installation (the directory with files like - The path to your Ghidra installation (the directory with files like
`ghidraRun` and `ghidraRun.bat`, and directories like `docs/` and `ghidraRun` and `ghidraRun.bat`, and directories like `docs/` and
@ -128,27 +144,30 @@ Unix-style paths. Make sure the paths are surrounded by quotes, too (e.g.
`'C:\path\to\whatever'`), else the shell won't understand the backslashes `'C:\path\to\whatever'`), else the shell won't understand the backslashes
correctly. correctly.
If you find typing out these arguments to be too much of a pain, you can also
define the environment variables `$GHIDRA_HOME`, `$JSRFDECOMP_PROJECTPATH`, and
`$JSRFDECOMP_PROJECTNAME` and invoke the script without arguments.
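
As a rough sketch of the environment-variable approach, the setup might look something like the following in a Unix-style shell. The paths and project name are hypothetical placeholders, and the reading of each variable (Ghidra installation directory, Ghidra project location, Ghidra project name) is an assumption based on the variable names, so substitute whatever you would otherwise pass as arguments.

```shell
# Hypothetical example values; replace them with your own.
# Assumption: these hold the same three values you would pass as arguments.
export GHIDRA_HOME='/opt/ghidra'
export JSRFDECOMP_PROJECTPATH="$HOME/ghidra_projects"
export JSRFDECOMP_PROJECTNAME='jsrf'

# With all three variables set, the script can be invoked without arguments
# from the repository's ghidra/ directory.
if [ -x ./delink.sh ]; then
    ./delink.sh
else
    echo "delink.sh not found; run this from the repository's ghidra/ directory" >&2
fi
```

Placing the `export` lines in your shell's startup file (e.g. `~/.bashrc`) would save retyping them in every new shell.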
There are a couple errors you might get here: There are a couple errors you might get here:
- `Unable to lock project!`: This means that Ghidra isn't fully closed. Make - `Unable to lock project!`: This means that Ghidra isn't fully closed. Make
sure you've completely closed every Ghidra window before running `delink.sh`. sure you've completely closed every Ghidra window before running `delink.sh`.
- `Script not found: DelinkProgram.java` and - `Script not found` and `Invalid script`: This means that you haven't added
`Invalid script: DelinkProgram.java`: This means that either the Ghidra the repository's `ghidra_scripts` directory to the script search path as
delinker extension isn't properly installed, or you've somehow invoked the described in the previous section (particularly if it mentions
script in a way that can't see the extension (e.g. installing Ghidra on `MSVC7Mangle.java`), the Ghidra delinker extension isn't properly installed
Windows and then invoking the script from WSL). Ensure it's installed and (particularly if it mentions `DelinkProgram.java`), or you've somehow invoked
enabled first, and that you're not running in some kind of environment the script in a way that can't see the scripts (e.g. installing Ghidra on
different from where you installed Ghidra. Windows and then invoking the script from WSL).
- `java.lang.RuntimeException: Failed to export ...`: This means that the - `java.lang.RuntimeException: Failed to export ...`: This means that the
delinker extension doesn't like something about what it was told to delink. delinker extension doesn't like something about what it was told to delink.
One known cause is duplicate symbol names. If you haven't modified One known cause is duplicate symbol names. If you haven't modified
`objects.csv` or `symboltable.tsv`, let other people on the project know so `objects.csv` or `symboltable.tsv`, let other people on the project know so
that they can look into fixing it. that they can look into fixing it.
If all goes well, you'll see the message `Delinking complete!` at the end of If all goes well, the extracted object files will be in the `decompile/target/`
the script's output, and the extracted object files will be in the directory of the repository. Now we're ready to start recompiling and diffing
`decompile/target/` directory of the repository. Now we're ready to start code with objdiff.
recompiling and diffing code with objdiff.
### Setting Up objdiff ### Setting Up objdiff
@ -167,9 +186,11 @@ correctly set up on your `PATH`.
One important piece of information, to make sure you get the correct match One important piece of information, to make sure you get the correct match
percentages: set `Diff Options > Function relocation diffs` to "None." percentages: set `Diff Options > Function relocation diffs` to "None."
Otherwise, approximately all references to functions and non-local variables Otherwise, some references to non-local variables will be marked as nonmatching
will be marked as nonmatching (this has to do with the delinking process not (this is because it's sometimes not possible to make certain things named
applying name mangling, which isn't expected to be fixed). variables in Ghidra, particularly thread-local storage, and other times it's
not possible to assign a fixed name to certain implicitly generated output in
the recompiled code).
### Using objdiff ### Using objdiff
@ -180,14 +201,13 @@ them. In the best case, corresponding functions in each file will have the
same name and be in the same section, at which point objdiff can link them same name and be in the same section, at which point objdiff can link them
automatically. Otherwise, one has to click on one of the corresponding automatically. Otherwise, one has to click on one of the corresponding
functions in one pane and the other function in the other pane to tell objdiff functions in one pane and the other function in the other pane to tell objdiff
to link them. Common cases of this are class methods (the names won't match) to link them. The most common cases of this are implicitly generated functions
and implicitly generated functions, such as exception handling code placed in and data, such as exception handling code placed in `.text$x` in the recompiled
`.text$x` in the recompiled object file. Keep in mind that objdiff's matching object file. Be aware that objdiff's matching does not appear fully reliable
does not appear fully reliable in some cases, particularly when diffing data in some cases, particularly when diffing data with external pointers (which
with external pointers (which appear as `?? ?? ?? ??`) that aren't explicitly appear as `?? ?? ?? ??`) that aren't explicitly marked as non-matching but
marked as non-matching but still somehow reduce the match percentage, so you'll still somehow reduce the match percentage, so you'll have to use a tiny amount
have to use a tiny amount of judgement to determine when you actually have a of judgement to determine when you actually have a match.
match.
Clicking on a function that's been linked across both object files shows a diff Clicking on a function that's been linked across both object files shows a diff
of the disassembly of both versions of the function, with any differences of the disassembly of both versions of the function, with any differences
@ -197,8 +217,20 @@ reaches 100%. Depending on how you configure objdiff, it will rebuild
automatically whenever you save a change to a source file, or you can manually automatically whenever you save a change to a source file, or you can manually
rebuild with the "Build" button at the top of the right pane. rebuild with the "Build" button at the top of the right pane.
There are no concrete instructions to give for writing decompiled code. Try When viewing and editing decompiled source files, be mindful of the
importing headers from `decompile/src/` into Ghidra `// Status:` annotation above each function, which has the following meanings:
- `unimplemented`: The decompiled function does not yet reproduce the behaviour
of the original
- `nonmatching`: The decompiled function is believed to behave the same as the
original, but it does not fully match in objdiff
- `matching`: The decompiled function perfectly matches the original in objdiff
Be sure to update them as you decompile if appropriate. Some functions may
also have other annotations describing nontrivial effects of link-time code
generation (LTCG), such as a nonstandard calling convention or multiple
functions being merged into one.
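
To get a quick overview of how many functions carry each status, something along these lines could be used. This is only a sketch: `count_status` is a hypothetical helper that is not part of the repository, and it assumes each annotation sits on its own line written exactly as `// Status: <value>`.

```shell
# Hypothetical helper: count "// Status:" annotations of each kind under a
# directory. Assumes the annotation is on its own line in exactly the form
# "// Status: <value>"; adjust the pattern if the real format differs.
count_status() {
    dir=$1
    for status in unimplemented nonmatching matching; do
        # -r: recurse, -h: omit file names, -E: extended regular expressions
        count=$(( $(grep -rhE "^[[:space:]]*// Status: ${status}[[:space:]]*\$" "$dir" | wc -l) ))
        printf '%s: %d\n' "$status" "$count"
    done
}
```

Calling `count_status decompile/src` from the repository root would then print one line per status value.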
Otherwise, there are no concrete instructions to give for writing decompiled
code. Try importing headers from `decompile/src/` into Ghidra
(`File > Parse C Source...`) to get access to JSRF classes, and use Ghidra's (`File > Parse C Source...`) to get access to JSRF classes, and use Ghidra's
decompilation of the function in the CodeBrowser as a starting point for decompilation of the function in the CodeBrowser as a starting point for
writing your matching function, exercising whatever C++ and x86 assembly writing your matching function, exercising whatever C++ and x86 assembly
@ -223,46 +255,11 @@ whole executable in Ghidra.
### Updating `symboltable.tsv` ### Updating `symboltable.tsv`
If you have got a bunch of symbols you'd like to add to `symboltable.tsv`, a If you have got a bunch of symbols you'd like to add to `symboltable.tsv`, you
workflow has been devised to generate it from your Ghidra project. Before can generate a new copy from your Ghidra project by running the
regenerating the table, however, make sure that you have all of its symbols `EnhancedExport.java` script from the "Export" category. If you want to merge
already in your project so that you don't end up deleting any. One option is the new table into the repository, make sure to take a look at the diff first
to import `symboltable.tsv` into your project with the `ImportSymbolsScript.py` to ensure you're not inadvertently deleting anything.
script as mentioned under "Creating a JSRF Ghidra Project," but be aware that
this will overwrite any names you've assigned to the same symbols. You will
also have to ensure that no two symbols share the same name. This can be
avoided by using namespaces if need be (i.e. `X::symbol` and `Y::symbol` may
coexist), but function overloading must be avoided (you may not have one
function with the signature `void X::f(int)` and another with the signature
`void X::f(float)`), else errors can arise when delinking, as the delinker
extension does not mangle symbol names. Thunked functions can also cause
problems because Ghidra does not include them alongside other functions in the
symbol table, so convert them to regular functions (right click on the thunked
function in the symbol tree and unset it as a thunk in the `Function` submenu).
Once you're ready to export your symbols, open the symbol table
(`Window > Symbol Table`). Open the symbol filter window (cog button near the
top right), and uncheck everything but "User Defined" under "Symbol Source,"
"Data Labels" and "Function Labels" under "Symbol Types," "Use Advanced
Filters," and "Non-Externals" under "Non-Externals." This ensures that you
only export symbols that you've defined and that are useful for delinking.
Now we need to configure the columns that we want to export. Right-click on
one of the column headers, click "Add/Remove Columns..." to open the "Select
Columns" window, and in it check only "Location," "Name," "Namespace," and
"Type." Click "OK" to close the window and ensure that the column order is
"Location," "Namespace," "Name," "Type" (you can drag the column headers to
reorder them if needed).
Now, to actually export the table, right-click on one of the table cells, click
"Select All," and then right-click again on a cell to select "Export > Export
to CSV..." before selecting where to save your exported symbol table.
The final step is converting this CSV file to the format expected by
`ImportSymbolsScript.py`. Open a shell in the repository's `ghidra/` directory
and run `make_symboltable.sh` with the path of your exported CSV as an
argument, and `symboltable.tsv` will be overwritten with a new table containing
your exported symbols.
### Updating `make_header.sh` ### Updating `make_header.sh`
@ -314,12 +311,16 @@ correctly (exception-handling code might be appended onto another function, for
example). Because `symboltable.tsv` should only be populated with symbols that example). Because `symboltable.tsv` should only be populated with symbols that
have been manually defined as per the previous section, this means that you have been manually defined as per the previous section, this means that you
need to define variable names and labels in Ghidra for everything therein (and need to define variable names and labels in Ghidra for everything therein (and
ideally everything referenced externally, as well). Do try to maintain basic ideally everything referenced externally, as well). Strive to maintain basic
consistency with the rest of the codebase: functions and methods begin with consistency with the rest of the codebase: functions and methods begin with
lowercase letters, for instance, while class/struct/enum names begin with lowercase letters, for instance, while class/struct/enum names begin with
capital letters, and special methods like constructors and destructors should capital letters, and special methods like constructors and destructors should
have the names they would have in real C++ code (i.e. `Class::Class` and have the names they would have in real C++ code (i.e. `Class::Class` and
`Class::~Class`, respectively). `Class::~Class`, respectively). Special class methods and members like
constructors and vtables must follow their established naming conventions for
our tooling to work properly. Also note that you can (mostly) disable name
mangling for a symbol by making it a member of the `extern_"C"` namespace,
which applies C-style name mangling as used by some symbols.
Once an object is ready for extracting, its `Delink?` column should be set to Once an object is ready for extracting, its `Delink?` column should be set to
`true` and the `objdiff.json` file in the `decompile/` directory should be `true` and the `objdiff.json` file in the `decompile/` directory should be