Update documentation for new scripts

This includes the enhanced export/import scripts and the class fixup script (with the name mangler being used implicitly). With this, the switchover from simple label-based sharing of Ghidra project information to rich type and class information is complete.
2026-02-20 18:27:04 +03:00 · 2026-02-19 21:16:38 -05:00 · 2026-02-19 21:16:38 -05:00 · bbe9d63294
commit bbe9d63294
parent aac010eb71
1 changed files with 95 additions and 94 deletions
--- a/documentation/gettingstarted.md
+++ b/documentation/gettingstarted.md
@@ -83,38 +83,54 @@ executable where objdiff doesn't expect them to be, which will mess up our
 diffs.  To correct this, open the memory map (`Window > Memory Map`) and
 uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.
-Now we'll import data types from the decompilation.  Open a shell in the
+Now we'll import data types from the decompilation.  Open a Unix-style shell
-`ghidra/` directory of your copy of the repository and run `make_header.sh`,
+(e.g. Git Bash if on Windows) in the `ghidra/` directory of your copy of the
-which will produce a `jsrf.h` in the same directory with the combined contents
+repository and run `make_header.sh`, which will produce a `jsrf.h` in the same
-of every header in a format suitable for Ghidra.  Then, in Ghidra, select
+directory with the combined contents of every header in a format suitable for
-`File > Parse C Source...` to open a window for importing C headers.  Remove
+Ghidra.  Then, in Ghidra, select `File > Parse C Source...` to open a window
-everything from the "Source files to parse" and "Parse options" boxes, and add
+for importing C headers.  Remove everything from the "Source files to parse"
-`jsrf.h` to the former (click the green + symbol on the right and select the
+and "Parse options" boxes, and add `jsrf.h` to the former (click the green +
-`jsrf.h` file).  Click the "..." on the "Program Architecture:" box and select
+symbol on the right and select the `jsrf.h` file).  Click the "..." on the
-the row with the values "x86," "default," "32," "little," and "Visual Studio."
+"Program Architecture:" box and select the row with the values "x86,"
-Finally, click the "Parse to Program" button, "Continue" to confirm, and
+"default," "32," "little," and "Visual Studio." Finally, click the "Parse to
-"Don't Use Open Archives" (the header is completely self-contained and doesn't
+Program" button, "Continue" to confirm, and "Don't Use Open Archives" (the
-need any information from any other data type archives).  You should then see a
+header is completely self-contained and doesn't need any information from any
-window reporting successful import, and you'll be able to find `jsrf.h` with
+other data type archives).  You should then see a window reporting successful
-all of its definitions under `default.xbe` in the Data Type Manager window in
+import, and you'll be able to find `jsrf.h` with all of its definitions under
-the bottom left.
+`default.xbe` in the Data Type Manager window in the bottom left.
-Lastly, we'll import symbols from the JSRF decompilation repository.  Open the
+Much of our work with Ghidra will make use of some custom scripts we've
-script manager (`Window > Script Manager`) and select the "Data" folder in the
+written, so we'll have to tell it where to find them.  Open up the Script
-left pane.  Double click the script titled `ImportSymbolsScript.py`, and a file
+Manager (`Window > Script Manager`) and then open the Bundle Manager by
-picker will open after a moment.  Select `symboltable.tsv` from the `ghidra/`
+clicking the "manage script directories" button (it looks sort of like a
-directory of your cloned JSRF decompilation repository, and you should see a
+bulleted list).  Click the green + in the top right to add a new directory and
-bunch of `Created function...` and `Created label...` printed to the scripting
+select the `ghidra/ghidra_scripts` directory in this repository.
-console window.  Save your changes (save icon in the top left of the
+
-CodeBrowser window), and your Ghidra project should be all ready for creating
+The first script we'll want to run is the symbol importer to get known data and
-object files for objdiff.
+functions into your Ghidra project.  In the Script Manager window, select the
+"Import" category in the left pane and double click the `EnhancedImport.java`
+script in the right pane to run it.  You'll then be asked for an input file;
+select `ghidra/symboltable.tsv` from this repository.  Afterwards, you'll see a
+bunch of "Importing ..." messages in a console in the main CodeBrowser window,
+some of which may have "can't find data type X" added on if something's marked
+with a type that hasn't made its way into our decompiled code yet, and there'll
+be a bunch of new functions and labels defined.
+While we imported a bunch of data types earlier, Ghidra's C parser leaves out
+some important information that we'll have to fill in with another script.  In
+the Script Manager, run `ClassFixup.java` from the "Data Types" category, and
+you should see some "Converting X to class" and "Fixing calling convention of
+X" messages in the console.
+Now you've got a Ghidra project containing everything we know about JSRF's
+code!  Make sure you save your Ghidra project now that everything's set up.
 ### Producing Object Files
-Close all of your Ghidra windows and open a Unix-style shell (e.g. Git Bash if
+Close all of your Ghidra windows and open a Unix-style shell in the
-on Windows) in the decompilation repository's `ghidra/` directory.  The
+decompilation repository's `ghidra/` directory.  The `delink.sh` script is our
-`delink.sh` script is our automated tool for extracting all the object files
+automated tool for extracting all the object files that have been identified so
-that have been identified so far.  Invoke it with three arguments:
+far.  The easiest way to run it is to invoke it with three arguments:
 - The path to your Ghidra installation (the directory with files like
  `ghidraRun` and `ghidraRun.bat`, and directories like `docs/` and
@@ -128,27 +144,30 @@ Unix-style paths.  Make sure the paths are surrounded by quotes, too (e.g.
 `'C:\path\to\whatever'`), else the shell won't understand the backslashes
 correctly.
+If you find typing out these arguments to be too much of a pain, you can also
+define the environment variables `$GHIDRA_HOME`, `$JSRFDECOMP_PROJECTPATH`, and
+`$JSRFDECOMP_PROJECTNAME` and invoke the script without arguments.
 There are a couple errors you might get here:
 - `Unable to lock project!`: This means that Ghidra isn't fully closed.  Make
  sure you've completely closed every Ghidra window before running `delink.sh`.
- `Script not found: DelinkProgram.java` and
+- `Script not found` and `Invalid script`: This means that you haven't added
-  `Invalid script: DelinkProgram.java`: This means that the either the Ghidra
+  the repository's `ghidra_scripts` directory to the script search path as
-  delinker extension isn't properly installed, or you've somehow invoked the
+  described in the previous section (particularly if it mentions
-  script in a way that can't see the extension (e.g. installing Ghidra on
+  `MSVC7Mangle.java`), the Ghidra delinker extension isn't properly installed
-  Windows and then invoking the script from WSL).  Ensure it's installed and
+  (particularly if it mentions `DelinkProgram.java`), or you've somehow invoked
-  enabled first, and that you're not running in some kind of environment
+  the script in a way that can't see the scripts (e.g. installing Ghidra on
-  different from where you installed Ghidra.
+  Windows and then invoking the script from WSL).
 - `java.lang.RuntimeException: Failed to export ...`: This means that the
  delinker extension doesn't like something about what it was told to delink.
  One known cause is duplicate symbol names.  If you haven't modified
  `objects.csv` or `symboltable.tsv`, let other people on the project know so
  that they can look into fixing it.
-If all goes well, you'll see the message `Delinking complete!` at the end of
+If all goes well, the extracted object files will be in the `decompile/target/`
-the script's output, and the extracted object files will be in the
+directory of the repository.  Now we're ready to start recompiling and diffing
-`decompile/target/` directory of the repository.  Now we're ready to start
+code with objdiff.
-recompiling and diffing code with objdiff.
 ### Setting Up objdiff
@@ -167,9 +186,11 @@ correctly set up on your `PATH`.
 One important piece of information, to make sure you get the correct match
 percentages: set `Diff Options > Function relocation diffs` to "None."
-Otherwise, approximately all references to functions and non-local variables
+Otherwise, some references to non-local variables will be marked as nonmatching
-will be marked as nonmatching (this has to do with the delinking process not
+(this is because it's sometimes not possible to make certain things named
-applying name mangling, which isn't expected to be fixed).
+variables in Ghidra, particularly thread-local storage, and other times it's
+not possible to assign a fixed name to certain implicitly generated output in
+the recompiled code).
 ### Using objdiff
@@ -180,14 +201,13 @@ them.  In the best case, corresponding functions in each file will have the
 same name and be in the same section, at which point objdiff can link them
 automatically.  Otherwise, one has to click on one of the corresponding
 functions in one pane and the other function in the other pane to tell objdiff
-to link them.  Common cases of this are class methods (the names won't match)
+to link them.  The most common cases of this are implicitly generated functions
-and implicitly generated functions, such as exception handling code placed in
+and data, such as exception handling code placed in `.text$x` in the recompiled
-`.text$x` in the recompiled object file.  Keep in mind that objdiff's matching
+object file.  Be aware that objdiff's matching does not appear fully reliable
-does not appear fully reliable in some cases, particularly when diffing data
+in some cases, particularly when diffing data with external pointers (which
-with external pointers (which appear as `?? ?? ?? ??`) that aren't explicitly
+appear as `?? ?? ?? ??`) that aren't explicitly marked as non-matching but
-marked as non-matching but still somehow reduce the match percentage, so you'll
+still somehow reduce the match percentage, so you'll have to use a tiny amount
-have to use a tiny amount of judgement to determine when you actually have a
+of judgement to determine when you actually have a match.
-match.
 Clicking on a function that's been linked across both object files shows a diff
 of the disassembly of both versions of the function, with any differences
@@ -197,8 +217,20 @@ reaches 100%.  Depending on how you configure objdiff, it will rebuild
 automatically whenever you save a change to a source file, or you can manually
 rebuild with the "Build" button at the top of the right pane.
-There are no concrete instructions to give for writing decompiled code.  Try
+When viewing and editing decompiled source files, be mindful of the
-importing headers from `decompile/src/` into Ghidra
+`// Status:` annotation above each function, which has the following meanings:
+- `unimplemented`: The decompiled function does not yet reproduce the behaviour
+  of the original
+- `nonmatching`: The decompiled function is believed to behave the same as the
+  original, but it does not fully match in objdiff
+- `matching`: The decompiled function perfectly matches the original in objdiff
+Be sure to update them as you decompile if appropriate.  Some functions may
+also have other annotations describing nontrivial effects of link-time code
+generation (LTCG), such as a nonstandard calling convention or multiple
+functions being merged into one.
+Otherwise, there are no concrete instructions to give for writing decompiled
+code.  Try importing headers from `decompile/src/` into Ghidra
 (`File > Parse C Source...`) to get access to JSRF classes, and use Ghidra's
 decompilation of the function in the CodeBrowser as a starting point for
 writing your matching function, exercising whatever C++ and x86 assembly
@@ -223,46 +255,11 @@ whole executable in Ghidra.
 ### Updating `symboltable.tsv`
-If you have got a bunch of symbols you'd like to add to `symboltable.tsv`, a
+If you have got a bunch of symbols you'd like to add to `symboltable.tsv`, you
-workflow has been devised to generate it from your Ghidra project.  Before
+can generate a new copy from your Ghidra project by running the
-regenerating the table, however, make sure that you have all of it symbols
+`EnhancedExport.java` script from the "Export" category.  If you want to merge
-already in your project so that you don't end up deleting any.  One option is
+the new table into the repository, make sure to take a look at the diff first
-to import `symboltable.tsv` into your project with the `ImportSymbolsScript.py`
+to ensure you're not inadvertently deleting anything.
-script as mentioned under "Creating a JSRF Ghidra Project," but be aware that
-this will overwrite any names you've assigned to the same symbols.  You will
-also have to ensure that no two symbols share the same name.  This can be
-avoided by using namespaces if need be (i.e. `X::symbol` and `Y::symbol` may
-coexist), but function overloading must be avoided (you may not have one
-function with the signature `void X::f(int)` and another with the signature
-`void X::f(float)`), else errors can arise when delinking, as the delinker
-extension does not mangle symbol names.  Thunked functions can also cause
-problems because Ghidra does not include them alongside other functions in the
-symbol table, so convert them to regular functions (right click on the thunked
-function in the symbol tree and unset it as a thunk in the `Function` submenu).
-Once you're ready to export your symbols, open the symbol table
-(`Window > Symbol Table`).  Open the symbol filter window (cog button near the
-top right), and uncheck everything but "User Defined" under "Symbol Source,"
-"Data Labels" and "Function Labels" under "Symbol Types," "Use Advanced
-Filters," and "Non-Externals" under "Non-Externals."  This ensures that you
-only export symbols that you've defined and that are useful for delinking.
-Now we need to configure the columns that we want to export.  Right-click on
-one of the colum headers, click "Add/Remove Columns..." to open the "Select
-Columns" window, and in it check only "Location," "Name," "Namespace," and
-"Type."  Click "OK" to close the window and ensure that the column order is
-"Location," "Namespace," "Name," "Type" (you can drag the column headers to
-reorder them if needed).
-Now, to actually export the table, right-click on one of the table cells, click
-"Select All," and then right-click again on a cell to select "Export > Export
-to CSV..." before selecting where to save your exported symbol table.
-The final step is converting this CSV file to the format expected by
-`ImportSymbolsScript.py`.  Open a shell in the repository's `ghidra/` directory
-and run `make_symboltable.sh` with the path of your exported CSV as an
-argument, and `symboltable.tsv` will be overwritten with a new table containing
-your exported symbols.
 ### Updating `make_header.sh`
@@ -314,12 +311,16 @@ correctly (exception-handling code might be appended onto another function, for
 example).  Because `symboltable.tsv` should only be populated with symbols that
 have been manually defined as per the previous section, this means that you
 need to define variable names and labels in Ghidra for everything therein (and
-ideally everything referenced externally, as well).  Do try to maintain basic
+ideally everything referenced externally, as well).  Strive to maintain basic
 consistency with the rest of the codebase: functions and methods begin with
 lowercase letters, for instance, while class/struct/enum names begin with
 capital letters, and special methods like constructors and destructors should
 have the names they would have in real C++ code (i.e. `Class::Class` and
-`Class::~Class`, respectively).
+`Class::~Class`, respectively).  Special class methods and members like
+constructors and vtables must follow their established naming conventions for
+our tooling to work properly.  Also note that you can (mostly) disable name
+mangling for a symbol by making it a member of the `extern_"C"` namespace,
+which applies C-style name mangling as used by some symbols.
 Once an object is ready for extracting, its `Delink?` column should be set to
 `true` and the `objdiff.json` file in the `decompile/` directory should be