Mirror of https://codeberg.org/KeybadeBlox/JSRF-Decompilation.git, synced 2026-02-20 10:17:03 +03:00
Update documentation for new scripts
This includes the enhanced export/import scripts and the class fixup script (with the name mangler being used implicitly). With this, the switchover from simple label-based sharing of Ghidra project information to rich type and class information is complete.
parent aac010eb71
commit bbe9d63294
1 changed file with 95 additions and 94 deletions
@@ -83,38 +83,54 @@ executable where objdiff doesn't expect them to be, which will mess up our
 diffs. To correct this, open the memory map (`Window > Memory Map`) and
 uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.

-Now we'll import data types from the decompilation. Open a shell in the
-`ghidra/` directory of your copy of the repository and run `make_header.sh`,
-which will produce a `jsrf.h` in the same directory with the combined contents
-of every header in a format suitable for Ghidra. Then, in Ghidra, select
-`File > Parse C Source...` to open a window for importing C headers. Remove
-everything from the "Source files to parse" and "Parse options" boxes, and add
-`jsrf.h` to the former (click the green + symbol on the right and select the
-`jsrf.h` file). Click the "..." on the "Program Architecture:" box and select
-the row with the values "x86," "default," "32," "little," and "Visual Studio."
-Finally, click the "Parse to Program" button, "Continue" to confirm, and
-"Don't Use Open Archives" (the header is completely self-contained and doesn't
-need any information from any other data type archives). You should then see a
-window reporting successful import, and you'll be able to find `jsrf.h` with
-all of its definitions under `default.xbe` in the Data Type Manager window in
-the bottom left.
+Now we'll import data types from the decompilation. Open a Unix-style shell
+(e.g. Git Bash if on Windows) in the `ghidra/` directory of your copy of the
+repository and run `make_header.sh`, which will produce a `jsrf.h` in the same
+directory with the combined contents of every header in a format suitable for
+Ghidra. Then, in Ghidra, select `File > Parse C Source...` to open a window
+for importing C headers. Remove everything from the "Source files to parse"
+and "Parse options" boxes, and add `jsrf.h` to the former (click the green +
+symbol on the right and select the `jsrf.h` file). Click the "..." on the
+"Program Architecture:" box and select the row with the values "x86,"
+"default," "32," "little," and "Visual Studio." Finally, click the "Parse to
+Program" button, "Continue" to confirm, and "Don't Use Open Archives" (the
+header is completely self-contained and doesn't need any information from any
+other data type archives). You should then see a window reporting successful
+import, and you'll be able to find `jsrf.h` with all of its definitions under
+`default.xbe` in the Data Type Manager window in the bottom left.

-Lastly, we'll import symbols from the JSRF decompilation repository. Open the
-script manager (`Window > Script Manager`) and select the "Data" folder in the
-left pane. Double click the script titled `ImportSymbolsScript.py`, and a file
-picker will open after a moment. Select `symboltable.tsv` from the `ghidra/`
-directory of your cloned JSRF decompilation repository, and you should see a
-bunch of `Created function...` and `Created label...` printed to the scripting
-console window. Save your changes (save icon in the top left of the
-CodeBrowser window), and your Ghidra project should be all ready for creating
-object files for objdiff.
+Much of our work with Ghidra will make use of some custom scripts we've
+written, so we'll have to tell it where to find them. Open up the Script
+Manager (`Window > Script Manager`) and then open the Bundle Manager by
+clicking the "manage script directories" button (it looks sort of like a
+bulleted list). Click the green + in the top right to add a new directory and
+select the `ghidra/ghidra_scripts` directory in this repository.
+
+The first script we'll want to run is the symbol importer to get known data and
+functions into your Ghidra project. In the Script Manager window, select the
+"Import" category in the left pane and double click the `EnhancedImport.java`
+script in the right pane to run it. You'll then be asked for an input file;
+select `ghidra/symboltable.tsv` from this repository. Afterwards, you'll see a
+bunch of "Importing ..." messages in a console in the main CodeBrowser window,
+some of which may have "can't find data type X" added on if something's marked
+with a type that hasn't made its way into our decompiled code yet, and there'll
+be a bunch of new functions and labels defined.
+
+While we imported a bunch of data types earlier, Ghidra's C parser leaves out
+some important information that we'll have to fill in with another script. In
+the Script Manager, run `ClassFixup.java` from the "Data Types" category, and
+you should see some "Converting X to class" and "Fixing calling convention of
+X" messages in the console.
+
+Now you've got a Ghidra project containing everything we know about JSRF's
+code! Make sure you save your Ghidra project now that everything's set up.


 ### Producing Object Files
-Close all of your Ghidra windows and open a Unix-style shell (e.g. Git Bash if
-on Windows) in the decompilation repository's `ghidra/` directory. The
-`delink.sh` script is our automated tool for extracting all the object files
-that have been identified so far. Invoke it with three arguments:
+Close all of your Ghidra windows and open a Unix-style shell in the
+decompilation repository's `ghidra/` directory. The `delink.sh` script is our
+automated tool for extracting all the object files that have been identified so
+far. The easiest way to run it is to invoke it with three arguments:

 - The path to your Ghidra installation (the directory with files like
 `ghidraRun` and `ghidraRun.bat`, and directories like `docs/` and
@@ -128,27 +144,30 @@ Unix-style paths. Make sure the paths are surrounded by quotes, too (e.g.
 `'C:\path\to\whatever'`), else the shell won't understand the backslashes
 correctly.
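
A minimal sketch of why the quotes matter, assuming a POSIX-style shell such as Git Bash. The path is the placeholder from the text, not a real location. Without quotes, the shell treats each backslash as an escape character and removes it before the script ever sees the argument:

```shell
# Unquoted: the shell consumes each backslash, mangling the Windows path.
unquoted=$(printf '%s' C:\path\to\whatever)

# Single-quoted: the backslashes reach the program untouched.
quoted=$(printf '%s' 'C:\path\to\whatever')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```

The first line printed shows `C:pathtowhatever`, while the second shows the path intact.
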
+
+If you find typing out these arguments to be too much of a pain, you can also
+define the environment variables `$GHIDRA_HOME`, `$JSRFDECOMP_PROJECTPATH`, and
+`$JSRFDECOMP_PROJECTNAME` and invoke the script without arguments.
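
A sketch of the environment-variable route, assuming a POSIX-style shell. Every path and the project name below are hypothetical placeholders, and what the two `JSRFDECOMP_*` variables should hold is inferred from their names (a project directory and a project name) rather than stated in this excerpt:

```shell
# Hypothetical locations; substitute your own.
export GHIDRA_HOME="$HOME/ghidra"                      # directory containing ghidraRun
export JSRFDECOMP_PROJECTPATH="$HOME/ghidra-projects"  # assumed: directory holding the Ghidra project
export JSRFDECOMP_PROJECTNAME="jsrf"                   # assumed: name of the Ghidra project

# From the repository's ghidra/ directory, the script can then run without arguments.
if [ -x ./delink.sh ]; then
    ./delink.sh
else
    echo "delink.sh not found; run this from the repository's ghidra/ directory" >&2
fi
```

Putting the `export` lines in your shell profile saves retyping them in each session.
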
 
 There are a couple errors you might get here:

 - `Unable to lock project!`: This means that Ghidra isn't fully closed. Make
 sure you've completely closed every Ghidra window before running `delink.sh`.
-- `Script not found: DelinkProgram.java` and
-`Invalid script: DelinkProgram.java`: This means that either the Ghidra
-delinker extension isn't properly installed, or you've somehow invoked the
-script in a way that can't see the extension (e.g. installing Ghidra on
-Windows and then invoking the script from WSL). Ensure it's installed and
-enabled first, and that you're not running in some kind of environment
-different from where you installed Ghidra.
+- `Script not found` and `Invalid script`: This means that you haven't added
+the repository's `ghidra_scripts` directory to the script search path as
+described in the previous section (particularly if it mentions
+`MSVC7Mangle.java`), the Ghidra delinker extension isn't properly installed
+(particularly if it mentions `DelinkProgram.java`), or you've somehow invoked
+the script in a way that can't see the scripts (e.g. installing Ghidra on
+Windows and then invoking the script from WSL).
 - `java.lang.RuntimeException: Failed to export ...`: This means that the
 delinker extension doesn't like something about what it was told to delink.
 One known cause is duplicate symbol names. If you haven't modified
 `objects.csv` or `symboltable.tsv`, let other people on the project know so
 that they can look into fixing it.

-If all goes well, you'll see the message `Delinking complete!` at the end of
-the script's output, and the extracted object files will be in the
-`decompile/target/` directory of the repository. Now we're ready to start
-recompiling and diffing code with objdiff.
+If all goes well, the extracted object files will be in the `decompile/target/`
+directory of the repository. Now we're ready to start recompiling and diffing
+code with objdiff.


 ### Setting Up objdiff
@@ -167,9 +186,11 @@ correctly set up on your `PATH`.

 One important piece of information, to make sure you get the correct match
 percentages: set `Diff Options > Function relocation diffs` to "None."
-Otherwise, approximately all references to functions and non-local variables
-will be marked as nonmatching (this has to do with the delinking process not
-applying name mangling, which isn't expected to be fixed).
+Otherwise, some references to non-local variables will be marked as nonmatching
+(this is because it's sometimes not possible to make certain things named
+variables in Ghidra, particularly thread-local storage, and other times it's
+not possible to assign a fixed name to certain implicitly generated output in
+the recompiled code).


 ### Using objdiff
@@ -180,14 +201,13 @@ them. In the best case, corresponding functions in each file will have the
 same name and be in the same section, at which point objdiff can link them
 automatically. Otherwise, one has to click on one of the corresponding
 functions in one pane and the other function in the other pane to tell objdiff
-to link them. Common cases of this are class methods (the names won't match)
-and implicitly generated functions, such as exception handling code placed in
-`.text$x` in the recompiled object file. Keep in mind that objdiff's matching
-does not appear fully reliable in some cases, particularly when diffing data
-with external pointers (which appear as `?? ?? ?? ??`) that aren't explicitly
-marked as non-matching but still somehow reduce the match percentage, so you'll
-have to use a tiny amount of judgement to determine when you actually have a
-match.
+to link them. The most common cases of this are implicitly generated functions
+and data, such as exception handling code placed in `.text$x` in the recompiled
+object file. Be aware that objdiff's matching does not appear fully reliable
+in some cases, particularly when diffing data with external pointers (which
+appear as `?? ?? ?? ??`) that aren't explicitly marked as non-matching but
+still somehow reduce the match percentage, so you'll have to use a tiny amount
+of judgement to determine when you actually have a match.

 Clicking on a function that's been linked across both object files shows a diff
 of the disassembly of both versions of the function, with any differences
@@ -197,8 +217,20 @@ reaches 100%. Depending on how you configure objdiff, it will rebuild
 automatically whenever you save a change to a source file, or you can manually
 rebuild with the "Build" button at the top of the right pane.

-There are no concrete instructions to give for writing decompiled code. Try
-importing headers from `decompile/src/` into Ghidra
+When viewing and editing decompiled source files, be mindful of the
+`// Status:` annotation above each function, which has the following meanings:
+- `unimplemented`: The decompiled function does not yet reproduce the behaviour
+of the original
+- `nonmatching`: The decompiled function is believed to behave the same as the
+original, but it does not fully match in objdiff
+- `matching`: The decompiled function perfectly matches the original in objdiff
+Be sure to update them as you decompile if appropriate. Some functions may
+also have other annotations describing nontrivial effects of link-time code
+generation (LTCG), such as a nonstandard calling convention or multiple
+functions being merged into one.
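
To see how far along a file or directory is, the annotations can be tallied with standard tools. This is a rough sketch, not part of the repository's tooling: the `status_tally` function is hypothetical, its default directory is an assumption, and it assumes each annotation is written on a single line as `// Status: <value>`.

```shell
# Hypothetical helper: count functions by their `// Status:` annotation.
# Usage: status_tally [directory]   (defaults to decompile/src, an assumed location)
status_tally() {
    src_dir="${1:-decompile/src}"
    for status in unimplemented nonmatching matching; do
        # grep exits non-zero when nothing matches, so `|| true` keeps the
        # pipeline usable under `set -e`.
        count=$( { grep -rhE "//[[:space:]]*Status:[[:space:]]*${status}([^[:alnum:]]|\$)" "$src_dir" 2>/dev/null || true; } | wc -l | tr -d ' ')
        printf '%s: %s\n' "$status" "$count"
    done
}
```

Running `status_tally` from the repository root prints one line per status. The pattern anchors the status word directly after `Status:`, so `matching` is not counted inside `nonmatching`.
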
+
+Otherwise, there are no concrete instructions to give for writing decompiled
+code. Try importing headers from `decompile/src/` into Ghidra
 (`File > Parse C Source...`) to get access to JSRF classes, and use Ghidra's
 decompilation of the function in the CodeBrowser as a starting point for
 writing your matching function, exercising whatever C++ and x86 assembly
@@ -223,46 +255,11 @@ whole executable in Ghidra.


 ### Updating `symboltable.tsv`
-If you have got a bunch of symbols you'd like to add to `symboltable.tsv`, a
-workflow has been devised to generate it from your Ghidra project. Before
-regenerating the table, however, make sure that you have all of its symbols
-already in your project so that you don't end up deleting any. One option is
-to import `symboltable.tsv` into your project with the `ImportSymbolsScript.py`
-script as mentioned under "Creating a JSRF Ghidra Project," but be aware that
-this will overwrite any names you've assigned to the same symbols. You will
-also have to ensure that no two symbols share the same name. This can be
-avoided by using namespaces if need be (i.e. `X::symbol` and `Y::symbol` may
-coexist), but function overloading must be avoided (you may not have one
-function with the signature `void X::f(int)` and another with the signature
-`void X::f(float)`), else errors can arise when delinking, as the delinker
-extension does not mangle symbol names. Thunked functions can also cause
-problems because Ghidra does not include them alongside other functions in the
-symbol table, so convert them to regular functions (right click on the thunked
-function in the symbol tree and unset it as a thunk in the `Function` submenu).
-
-Once you're ready to export your symbols, open the symbol table
-(`Window > Symbol Table`). Open the symbol filter window (cog button near the
-top right), and uncheck everything but "User Defined" under "Symbol Source,"
-"Data Labels" and "Function Labels" under "Symbol Types," "Use Advanced
-Filters," and "Non-Externals" under "Non-Externals." This ensures that you
-only export symbols that you've defined and that are useful for delinking.
-
-Now we need to configure the columns that we want to export. Right-click on
-one of the column headers, click "Add/Remove Columns..." to open the "Select
-Columns" window, and in it check only "Location," "Name," "Namespace," and
-"Type." Click "OK" to close the window and ensure that the column order is
-"Location," "Namespace," "Name," "Type" (you can drag the column headers to
-reorder them if needed).
-
-Now, to actually export the table, right-click on one of the table cells, click
-"Select All," and then right-click again on a cell to select "Export > Export
-to CSV..." before selecting where to save your exported symbol table.
-
-The final step is converting this CSV file to the format expected by
-`ImportSymbolsScript.py`. Open a shell in the repository's `ghidra/` directory
-and run `make_symboltable.sh` with the path of your exported CSV as an
-argument, and `symboltable.tsv` will be overwritten with a new table containing
-your exported symbols.
+If you have got a bunch of symbols you'd like to add to `symboltable.tsv`, you
+can generate a new copy from your Ghidra project by running the
+`EnhancedExport.java` script from the "Export" category. If you want to merge
+the new table into the repository, make sure to take a look at the diff first
+to ensure you're not inadvertently deleting anything.


 ### Updating `make_header.sh`
@@ -314,12 +311,16 @@ correctly (exception-handling code might be appended onto another function, for
 example). Because `symboltable.tsv` should only be populated with symbols that
 have been manually defined as per the previous section, this means that you
 need to define variable names and labels in Ghidra for everything therein (and
-ideally everything referenced externally, as well). Do try to maintain basic
+ideally everything referenced externally, as well). Strive to maintain basic
 consistency with the rest of the codebase: functions and methods begin with
 lowercase letters, for instance, while class/struct/enum names begin with
 capital letters, and special methods like constructors and destructors should
 have the names they would have in real C++ code (i.e. `Class::Class` and
-`Class::~Class`, respectively).
+`Class::~Class`, respectively). Special class methods and members like
+constructors and vtables must follow their established naming conventions for
+our tooling to work properly. Also note that you can (mostly) disable name
+mangling for a symbol by making it a member of the `extern_"C"` namespace,
+which applies C-style name mangling as used by some symbols.

 Once an object is ready for extracting, its `Delink?` column should be set to
 `true` and the `objdiff.json` file in the `decompile/` directory should be