mirror of
https://codeberg.org/KeybadeBlox/JSRF-Decompilation.git
synced 2026-02-20 10:17:03 +03:00
Update documentation for new scripts
This includes the enhanced export/import scripts and the class fixup script (with the name mangler being used implicitly). With this, the switchover from simple label-based sharing of Ghidra project information to rich type and class information is complete.
parent
aac010eb71
commit
bbe9d63294
1 changed files with 95 additions and 94 deletions
|
|
@ -83,38 +83,54 @@ executable where objdiff doesn't expect them to be, which will mess up our
|
||||||
diffs. To correct this, open the memory map (`Window > Memory Map`) and
|
diffs. To correct this, open the memory map (`Window > Memory Map`) and
|
||||||
uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.
|
uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.
|
||||||
|
|
||||||
Now we'll import data types from the decompilation. Open a shell in the
|
Now we'll import data types from the decompilation. Open a Unix-style shell
|
||||||
`ghidra/` directory of your copy of the repository and run `make_header.sh`,
|
(e.g. Git Bash if on Windows) in the `ghidra/` directory of your copy of the
|
||||||
which will produce a `jsrf.h` in the same directory with the combined contents
|
repository and run `make_header.sh`, which will produce a `jsrf.h` in the same
|
||||||
of every header in a format suitable for Ghidra. Then, in Ghidra, select
|
directory with the combined contents of every header in a format suitable for
|
||||||
`File > Parse C Source...` to open a window for importing C headers. Remove
|
Ghidra. Then, in Ghidra, select `File > Parse C Source...` to open a window
|
||||||
everything from the "Source files to parse" and "Parse options" boxes, and add
|
for importing C headers. Remove everything from the "Source files to parse"
|
||||||
`jsrf.h` to the former (click the green + symbol on the right and select the
|
and "Parse options" boxes, and add `jsrf.h` to the former (click the green +
|
||||||
`jsrf.h` file). Click the "..." on the "Program Architecture:" box and select
|
symbol on the right and select the `jsrf.h` file). Click the "..." on the
|
||||||
the row with the values "x86," "default," "32," "little," and "Visual Studio."
|
"Program Architecture:" box and select the row with the values "x86,"
|
||||||
Finally, click the "Parse to Program" button, "Continue" to confirm, and
|
"default," "32," "little," and "Visual Studio." Finally, click the "Parse to
|
||||||
"Don't Use Open Archives" (the header is completely self-contained and doesn't
|
Program" button, "Continue" to confirm, and "Don't Use Open Archives" (the
|
||||||
need any information from any other data type archives). You should then see a
|
header is completely self-contained and doesn't need any information from any
|
||||||
window reporting successful import, and you'll be able to find `jsrf.h` with
|
other data type archives). You should then see a window reporting successful
|
||||||
all of its definitions under `default.xbe` in the Data Type Manager window in
|
import, and you'll be able to find `jsrf.h` with all of its definitions under
|
||||||
the bottom left.
|
`default.xbe` in the Data Type Manager window in the bottom left.
|
||||||
|
|
||||||
Lastly, we'll import symbols from the JSRF decompilation repository. Open the
|
Much of our work with Ghidra will make use of some custom scripts we've
|
||||||
script manager (`Window > Script Manager`) and select the "Data" folder in the
|
written, so we'll have to tell it where to find them. Open up the Script
|
||||||
left pane. Double click the script titled `ImportSymbolsScript.py`, and a file
|
Manager (`Window > Script Manager`) and then open the Bundle Manager by
|
||||||
picker will open after a moment. Select `symboltable.tsv` from the `ghidra/`
|
clicking the "manage script directories" button (it looks sort of like a
|
||||||
directory of your cloned JSRF decompilation repository, and you should see a
|
bulleted list). Click the green + in the top right to add a new directory and
|
||||||
bunch of `Created function...` and `Created label...` printed to the scripting
|
select the `ghidra/ghidra_scripts` directory in this repository.
|
||||||
console window. Save your changes (save icon in the top left of the
|
|
||||||
CodeBrowser window), and your Ghidra project should be all ready for creating
|
The first script we'll want to run is the symbol importer to get known data and
|
||||||
object files for objdiff.
|
functions into your Ghidra project. In the Script Manager window, select the
|
||||||
|
"Import" category in the left pane and double click the `EnhancedImport.java`
|
||||||
|
script in the right pane to run it. You'll then be asked for an input file;
|
||||||
|
select `ghidra/symboltable.tsv` from this repository. Afterwards, you'll see a
|
||||||
|
bunch of "Importing ..." messages in a console in the main CodeBrowser window,
|
||||||
|
some of which may have "can't find data type X" added on if something's marked
|
||||||
|
with a type that hasn't made its way into our decompiled code yet, and there'll
|
||||||
|
be a bunch of new functions and labels defined.
|
||||||
|
|
||||||
|
While we imported a bunch of data types earlier, Ghidra's C parser leaves out
|
||||||
|
some important information that we'll have to fill in with another script. In
|
||||||
|
the Script Manager, run `ClassFixup.java` from the "Data Types" category, and
|
||||||
|
you should see some "Converting X to class" and "Fixing calling convention of
|
||||||
|
X" messages in the console.
|
||||||
|
|
||||||
|
Now you've got a Ghidra project containing everything we know about JSRF's
|
||||||
|
code! Make sure you save your Ghidra project now that everything's set up.
|
||||||
|
|
||||||
|
|
||||||
### Producing Object Files
|
### Producing Object Files
|
||||||
Close all of your Ghidra windows and open a Unix-style shell (e.g. Git Bash if
|
Close all of your Ghidra windows and open a Unix-style shell in the
|
||||||
on Windows) in the decompilation repository's `ghidra/` directory. The
|
decompilation repository's `ghidra/` directory. The `delink.sh` script is our
|
||||||
`delink.sh` script is our automated tool for extracting all the object files
|
automated tool for extracting all the object files that have been identified so
|
||||||
that have been identified so far. Invoke it with three arguments:
|
far. The easiest way to run it is to invoke it with three arguments:
|
||||||
|
|
||||||
- The path to your Ghidra installation (the directory with files like
|
- The path to your Ghidra installation (the directory with files like
|
||||||
`ghidraRun` and `ghidraRun.bat`, and directories like `docs/` and
|
`ghidraRun` and `ghidraRun.bat`, and directories like `docs/` and
|
||||||
|
|
@ -128,27 +144,30 @@ Unix-style paths. Make sure the paths are surrounded by quotes, too (e.g.
|
||||||
`'C:\path\to\whatever'`), else the shell won't understand the backslashes
|
`'C:\path\to\whatever'`), else the shell won't understand the backslashes
|
||||||
correctly.
|
correctly.
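To see why the quotes matter, here is a minimal sketch using the placeholder path from above (it is not a real location). Single quotes keep the backslashes intact, while an unquoted word has them stripped by the shell before the script ever sees the argument.

```shell
# Single quotes keep every backslash literal.
quoted='C:\path\to\whatever'

# Unquoted, the shell treats each backslash as an escape character and
# removes it, so the path arrives mangled.
unquoted=C:\path\to\whatever

printf '%s\n' "$quoted" "$unquoted"
# prints:
#   C:\path\to\whatever
#   C:pathtowhatever
```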
|
||||||
|
|
||||||
|
If you find typing out these arguments to be too much of a pain, you can also
|
||||||
|
define the environment variables `$GHIDRA_HOME`, `$JSRFDECOMP_PROJECTPATH`, and
|
||||||
|
`$JSRFDECOMP_PROJECTNAME` and invoke the script without arguments.
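As a rough sketch, the variables could be set like this before running the script. The three values are hypothetical placeholders, and the reading of `JSRFDECOMP_PROJECTPATH` as the directory holding the Ghidra project and `JSRFDECOMP_PROJECTNAME` as the project's name is an assumption based on the variable names; substitute whatever you would otherwise pass as arguments.

```shell
# Hypothetical values; replace them with your own locations.
export GHIDRA_HOME='/opt/ghidra'
export JSRFDECOMP_PROJECTPATH='/home/user/ghidra-projects'  # assumed: directory containing the project
export JSRFDECOMP_PROJECTNAME='jsrf'                        # assumed: name of the Ghidra project

# With the variables set, the script can be invoked without arguments
# from the repository's ghidra/ directory.
if [ -f ./delink.sh ]; then
    ./delink.sh
else
    echo "delink.sh not found; run this from the ghidra/ directory" >&2
fi
```

Putting the `export` lines in your shell's startup file (e.g. `~/.bashrc`) saves retyping them in every new session.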
|
||||||
|
|
||||||
There are a couple errors you might get here:
|
There are a couple errors you might get here:
|
||||||
|
|
||||||
- `Unable to lock project!`: This means that Ghidra isn't fully closed. Make
|
- `Unable to lock project!`: This means that Ghidra isn't fully closed. Make
|
||||||
sure you've completely closed every Ghidra window before running `delink.sh`.
|
sure you've completely closed every Ghidra window before running `delink.sh`.
|
||||||
- `Script not found: DelinkProgram.java` and
|
- `Script not found` and `Invalid script`: This means that you haven't added
|
||||||
`Invalid script: DelinkProgram.java`: This means that either the Ghidra
|
the repository's `ghidra_scripts` directory to the script search path as
|
||||||
delinker extension isn't properly installed, or you've somehow invoked the
|
described in the previous section (particularly if it mentions
|
||||||
script in a way that can't see the extension (e.g. installing Ghidra on
|
`MSVC7Mangle.java`), the Ghidra delinker extension isn't properly installed
|
||||||
Windows and then invoking the script from WSL). Ensure it's installed and
|
(particularly if it mentions `DelinkProgram.java`), or you've somehow invoked
|
||||||
enabled first, and that you're not running in some kind of environment
|
the script in a way that can't see the scripts (e.g. installing Ghidra on
|
||||||
different from where you installed Ghidra.
|
Windows and then invoking the script from WSL).
|
||||||
- `java.lang.RuntimeException: Failed to export ...`: This means that the
|
- `java.lang.RuntimeException: Failed to export ...`: This means that the
|
||||||
delinker extension doesn't like something about what it was told to delink.
|
delinker extension doesn't like something about what it was told to delink.
|
||||||
One known cause is duplicate symbol names. If you haven't modified
|
One known cause is duplicate symbol names. If you haven't modified
|
||||||
`objects.csv` or `symboltable.tsv`, let other people on the project know so
|
`objects.csv` or `symboltable.tsv`, let other people on the project know so
|
||||||
that they can look into fixing it.
|
that they can look into fixing it.
|
||||||
|
|
||||||
If all goes well, you'll see the message `Delinking complete!` at the end of
|
If all goes well, the extracted object files will be in the `decompile/target/`
|
||||||
the script's output, and the extracted object files will be in the
|
directory of the repository. Now we're ready to start recompiling and diffing
|
||||||
`decompile/target/` directory of the repository. Now we're ready to start
|
code with objdiff.
|
||||||
recompiling and diffing code with objdiff.
|
|
||||||
|
|
||||||
|
|
||||||
### Setting Up objdiff
|
### Setting Up objdiff
|
||||||
|
|
@ -167,9 +186,11 @@ correctly set up on your `PATH`.
|
||||||
|
|
||||||
One important piece of information, to make sure you get the correct match
|
One important piece of information, to make sure you get the correct match
|
||||||
percentages: set `Diff Options > Function relocation diffs` to "None."
|
percentages: set `Diff Options > Function relocation diffs` to "None."
|
||||||
Otherwise, approximately all references to functions and non-local variables
|
Otherwise, some references to non-local variables will be marked as nonmatching
|
||||||
will be marked as nonmatching (this has to do with the delinking process not
|
(this is because it's sometimes not possible to make certain things named
|
||||||
applying name mangling, which isn't expected to be fixed).
|
variables in Ghidra, particularly thread-local storage, and other times it's
|
||||||
|
not possible to assign a fixed name to certain implicitly generated output in
|
||||||
|
the recompiled code).
|
||||||
|
|
||||||
|
|
||||||
### Using objdiff
|
### Using objdiff
|
||||||
|
|
@ -180,14 +201,13 @@ them. In the best case, corresponding functions in each file will have the
|
||||||
same name and be in the same section, at which point objdiff can link them
|
same name and be in the same section, at which point objdiff can link them
|
||||||
automatically. Otherwise, one has to click on one of the corresponding
|
automatically. Otherwise, one has to click on one of the corresponding
|
||||||
functions in one pane and the other function in the other pane to tell objdiff
|
functions in one pane and the other function in the other pane to tell objdiff
|
||||||
to link them. Common cases of this are class methods (the names won't match)
|
to link them. The most common cases of this are implicitly generated functions
|
||||||
and implicitly generated functions, such as exception handling code placed in
|
and data, such as exception handling code placed in `.text$x` in the recompiled
|
||||||
`.text$x` in the recompiled object file. Keep in mind that objdiff's matching
|
object file. Be aware that objdiff's matching does not appear fully reliable
|
||||||
does not appear fully reliable in some cases, particularly when diffing data
|
in some cases, particularly when diffing data with external pointers (which
|
||||||
with external pointers (which appear as `?? ?? ?? ??`) that aren't explicitly
|
appear as `?? ?? ?? ??`) that aren't explicitly marked as non-matching but
|
||||||
marked as non-matching but still somehow reduce the match percentage, so you'll
|
still somehow reduce the match percentage, so you'll have to use a tiny amount
|
||||||
have to use a tiny amount of judgement to determine when you actually have a
|
of judgement to determine when you actually have a match.
|
||||||
match.
|
|
||||||
|
|
||||||
Clicking on a function that's been linked across both object files shows a diff
|
Clicking on a function that's been linked across both object files shows a diff
|
||||||
of the disassembly of both versions of the function, with any differences
|
of the disassembly of both versions of the function, with any differences
|
||||||
|
|
@ -197,8 +217,20 @@ reaches 100%. Depending on how you configure objdiff, it will rebuild
|
||||||
automatically whenever you save a change to a source file, or you can manually
|
automatically whenever you save a change to a source file, or you can manually
|
||||||
rebuild with the "Build" button at the top of the right pane.
|
rebuild with the "Build" button at the top of the right pane.
|
||||||
|
|
||||||
There are no concrete instructions to give for writing decompiled code. Try
|
When viewing and editing decompiled source files, be mindful of the
|
||||||
importing headers from `decompile/src/` into Ghidra
|
`// Status:` annotation above each function, which has the following meanings:
|
||||||
|
- `unimplemented`: The decompiled function does not yet reproduce the behaviour
|
||||||
|
of the original
|
||||||
|
- `nonmatching`: The decompiled function is believed to behave the same as the
|
||||||
|
original, but it does not fully match in objdiff
|
||||||
|
- `matching`: The decompiled function perfectly matches the original in objdiff
|
||||||
|
Be sure to update them as you decompile if appropriate. Some functions may
|
||||||
|
also have other annotations describing nontrivial effects of link-time code
|
||||||
|
generation (LTCG), such as a nonstandard calling convention or multiple
|
||||||
|
functions being merged into one.
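For a quick overview of progress, the annotations can be tallied from a shell. The sketch below builds a small sample file so that it is self-contained. The exact comment format (`// Status: ` followed by the status word on the same line) is an assumption based on the description above, and the sample functions are made up.

```shell
# Create a throwaway sample file; real sources live under decompile/src/.
sample_dir=$(mktemp -d)
cat > "$sample_dir/sample.cpp" <<'EOF'
// Status: matching
void a() {}
// Status: nonmatching
void b() {}
// Status: matching
void c() {}
// Status: unimplemented
void d() {}
EOF

# Extract every status annotation, then count occurrences of each status.
status_counts=$(grep -rhoE '// Status: [a-z]+' "$sample_dir" \
    | sort | uniq -c | awk '{print $NF "=" $1}' | paste -sd' ' -)
echo "$status_counts"
# prints: matching=2 nonmatching=1 unimplemented=1

rm -rf "$sample_dir"
```

Pointing the `grep` at `decompile/src/` instead of the sample directory should give the same kind of summary for the real code, provided the annotations follow the assumed format.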
|
||||||
|
|
||||||
|
Otherwise, there are no concrete instructions to give for writing decompiled
|
||||||
|
code. Try importing headers from `decompile/src/` into Ghidra
|
||||||
(`File > Parse C Source...`) to get access to JSRF classes, and use Ghidra's
|
(`File > Parse C Source...`) to get access to JSRF classes, and use Ghidra's
|
||||||
decompilation of the function in the CodeBrowser as a starting point for
|
decompilation of the function in the CodeBrowser as a starting point for
|
||||||
writing your matching function, exercising whatever C++ and x86 assembly
|
writing your matching function, exercising whatever C++ and x86 assembly
|
||||||
|
|
@ -223,46 +255,11 @@ whole executable in Ghidra.
|
||||||
|
|
||||||
|
|
||||||
### Updating `symboltable.tsv`
|
### Updating `symboltable.tsv`
|
||||||
If you have got a bunch of symbols you'd like to add to `symboltable.tsv`, a
|
If you have got a bunch of symbols you'd like to add to `symboltable.tsv`, you
|
||||||
workflow has been devised to generate it from your Ghidra project. Before
|
can generate a new copy from your Ghidra project by running the
|
||||||
regenerating the table, however, make sure that you have all of its symbols
|
`EnhancedExport.java` script from the "Export" category. If you want to merge
|
||||||
already in your project so that you don't end up deleting any. One option is
|
the new table into the repository, make sure to take a look at the diff first
|
||||||
to import `symboltable.tsv` into your project with the `ImportSymbolsScript.py`
|
to ensure you're not inadvertently deleting anything.
|
||||||
script as mentioned under "Creating a JSRF Ghidra Project," but be aware that
|
|
||||||
this will overwrite any names you've assigned to the same symbols. You will
|
|
||||||
also have to ensure that no two symbols share the same name. This can be
|
|
||||||
avoided by using namespaces if need be (i.e. `X::symbol` and `Y::symbol` may
|
|
||||||
coexist), but function overloading must be avoided (you may not have one
|
|
||||||
function with the signature `void X::f(int)` and another with the signature
|
|
||||||
`void X::f(float)`), else errors can arise when delinking, as the delinker
|
|
||||||
extension does not mangle symbol names. Thunked functions can also cause
|
|
||||||
problems because Ghidra does not include them alongside other functions in the
|
|
||||||
symbol table, so convert them to regular functions (right click on the thunked
|
|
||||||
function in the symbol tree and unset it as a thunk in the `Function` submenu).
|
|
||||||
|
|
||||||
Once you're ready to export your symbols, open the symbol table
|
|
||||||
(`Window > Symbol Table`). Open the symbol filter window (cog button near the
|
|
||||||
top right), and uncheck everything but "User Defined" under "Symbol Source,"
|
|
||||||
"Data Labels" and "Function Labels" under "Symbol Types," "Use Advanced
|
|
||||||
Filters," and "Non-Externals" under "Non-Externals." This ensures that you
|
|
||||||
only export symbols that you've defined and that are useful for delinking.
|
|
||||||
|
|
||||||
Now we need to configure the columns that we want to export. Right-click on
|
|
||||||
one of the column headers, click "Add/Remove Columns..." to open the "Select
|
|
||||||
Columns" window, and in it check only "Location," "Name," "Namespace," and
|
|
||||||
"Type." Click "OK" to close the window and ensure that the column order is
|
|
||||||
"Location," "Namespace," "Name," "Type" (you can drag the column headers to
|
|
||||||
reorder them if needed).
|
|
||||||
|
|
||||||
Now, to actually export the table, right-click on one of the table cells, click
|
|
||||||
"Select All," and then right-click again on a cell to select "Export > Export
|
|
||||||
to CSV..." before selecting where to save your exported symbol table.
|
|
||||||
|
|
||||||
The final step is converting this CSV file to the format expected by
|
|
||||||
`ImportSymbolsScript.py`. Open a shell in the repository's `ghidra/` directory
|
|
||||||
and run `make_symboltable.sh` with the path of your exported CSV as an
|
|
||||||
argument, and `symboltable.tsv` will be overwritten with a new table containing
|
|
||||||
your exported symbols.
|
|
||||||
|
|
||||||
|
|
||||||
### Updating `make_header.sh`
|
### Updating `make_header.sh`
|
||||||
|
|
@ -314,12 +311,16 @@ correctly (exception-handling code might be appended onto another function, for
|
||||||
example). Because `symboltable.tsv` should only be populated with symbols that
|
example). Because `symboltable.tsv` should only be populated with symbols that
|
||||||
have been manually defined as per the previous section, this means that you
|
have been manually defined as per the previous section, this means that you
|
||||||
need to define variable names and labels in Ghidra for everything therein (and
|
need to define variable names and labels in Ghidra for everything therein (and
|
||||||
ideally everything referenced externally, as well). Do try to maintain basic
|
ideally everything referenced externally, as well). Strive to maintain basic
|
||||||
consistency with the rest of the codebase: functions and methods begin with
|
consistency with the rest of the codebase: functions and methods begin with
|
||||||
lowercase letters, for instance, while class/struct/enum names begin with
|
lowercase letters, for instance, while class/struct/enum names begin with
|
||||||
capital letters, and special methods like constructors and destructors should
|
capital letters, and special methods like constructors and destructors should
|
||||||
have the names they would have in real C++ code (i.e. `Class::Class` and
|
have the names they would have in real C++ code (i.e. `Class::Class` and
|
||||||
`Class::~Class`, respectively).
|
`Class::~Class`, respectively). Special class methods and members like
|
||||||
|
constructors and vtables must follow their established naming conventions for
|
||||||
|
our tooling to work properly. Also note that you can (mostly) disable name
|
||||||
|
mangling for a symbol by making it a member of the `extern_"C"` namespace,
|
||||||
|
which applies C-style name mangling as used by some symbols.
|
||||||
|
|
||||||
Once an object is ready for extracting, its `Delink?` column should be set to
|
Once an object is ready for extracting, its `Delink?` column should be set to
|
||||||
`true` and the `objdiff.json` file in the `decompile/` directory should be
|
`true` and the `objdiff.json` file in the `decompile/` directory should be
|
||||||
|
|
|
||||||