# Getting Started
Anybody is welcome to contribute to the decompilation effort! There are two
main roles a contributor can fulfill:

- *Delinking*, which entails analyzing the JSRF executable in situ to figure
  out how to break it up into small chunks of code and data, and
- *Decompiling*, which is writing C++ code that compiles down to the same code
  and data found in those chunks.

Of these two tasks, the latter is more accessible and benefits more from a
large group of volunteers, so we'll begin there. Those who want to participate
in the delinking effort can follow the decompilation guide and then continue
on to the delinking guide.


## Setting Up Decompilation
You'll need a few things to get a decompilation workflow ready:

- The JSRF executable (`default.xbe` in the root directory of the game disc) to
  provide the target compiled code to match
- The Microsoft Visual C++ 7.0 (AKA Visual C++ .NET 2002) compiler to compile
  your C++ code
  - You'll also want to add its `Bin/` directory to your `PATH` so that objdiff
    can find it
- The [Git](https://git-scm.com/) version control tool to clone and work on
  this repository
- The [Ghidra](https://github.com/NationalSecurityAgency/ghidra) reverse
  engineering tool to analyze and browse the executable
- The [XBE extension](https://github.com/XboxDev/ghidra-xbe) for Ghidra to
  import and analyze the JSRF executable
- The [delinker extension](https://github.com/boricj/ghidra-delinker-extension)
  for Ghidra to export object files from the executable
- The [objdiff](https://github.com/encounter/objdiff) code diffing tool to
  compare your C++ code's compiled output to the delinked object files

Keep in mind that Ghidra and its extensions need to have their versions
coordinated. The safest approach is to install Ghidra and extension releases
built for the same Ghidra version, e.g. 11.4. The general flow for installing
an extension is to download a release `.zip` for it from the linked
repository's releases page, open Ghidra, open the `File > Install Extensions`
menu, click the green plus at the top right of the extensions window, and then
select the `.zip` you just downloaded. Make sure the box to the left of the
extension's name is checked to enable it before clicking "OK" to close the
extensions window.

With all these tools acquired, the last thing to get is this repository. Clone
it with `git` in the usual fashion:
```
git clone https://codeberg.org/KeybadeBlox/JSRF-Decompilation.git
```

The following sections detail how to use all these tools to start writing
decompiled code.


### Creating a JSRF Ghidra Project
Even if you have no intention of analyzing the executable in Ghidra otherwise,
Ghidra is needed to produce the object files that objdiff will compare your
recompiled code against. This section only covers the steps needed to get to
that point.

Open Ghidra and create a new project (`File > New Project...`). Select the
"Non-Shared Project" option, and set whatever location and name you'd like.
With the project created, open the file import dialogue
(`File > Import File...`) and select the `default.xbe` from JSRF. Ensure that
the format in the next window is set to "Xbox Executable Format (XBE)" (if this
isn't an option, you need to install/enable the XBE extension), and that the
name is "default.xbe" (our tooling depends on it having this specific name).
Click "OK," and after a moment you should see a window summarizing a successful
import (you'll probably see the message `[xboxkrnl.exe] -> not found in project`,
but this is fine and expected).

`default.xbe` should now be visible in the file listing for the project.
Double-click it to open it in the CodeBrowser. The window that opens is where
you'll do all your in-situ analysis, should you choose to do so. You'll be
asked whether you want to run analyzers; say yes. In the analysis options
window that follows, simply clicking "Analyze" without changing anything is
fine. The analysis will probably take a couple of minutes; while it's still
running, a progress bar in the bottom right shows what it's currently
analyzing.

There's a small oddity that needs fixing: certain parts of memory are marked as
executable where objdiff doesn't expect them to be, which will mess up our
diffs. To correct this, open the memory map (`Window > Memory Map`) and
uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.

Now we'll import data types from the decompilation. Open a Unix-style shell
(e.g. Git Bash if on Windows) in the `ghidra/` directory of your copy of the
repository and run `make_header.sh`, which will produce a `jsrf.h` in the same
directory with the combined contents of every header in a format suitable for
Ghidra. Then, in Ghidra, select `File > Parse C Source...` to open a window
for importing C headers. Remove everything from the "Source files to parse"
and "Parse options" boxes, and add `jsrf.h` to the former (click the green +
symbol on the right and select the `jsrf.h` file). Click the "..." on the
"Program Architecture:" box and select the row with the values "x86,"
"default," "32," "little," and "Visual Studio." Finally, click the "Parse to
Program" button, "Continue" to confirm, and "Don't Use Open Archives" (the
header is completely self-contained and doesn't need any information from any
other data type archives). You should then see a window reporting a successful
import, and you'll be able to find `jsrf.h` with all of its definitions under
`default.xbe` in the Data Type Manager window in the bottom left.

Much of our work with Ghidra will make use of some custom scripts we've
written, so we'll have to tell Ghidra where to find them. Open the Script
Manager (`Window > Script Manager`) and then open the Bundle Manager by
clicking the "manage script directories" button (it looks sort of like a
bulleted list). Click the green + in the top right to add a new directory and
select the `ghidra/ghidra_scripts` directory in this repository.

The first script we'll want to run is the symbol importer, which brings known
data and functions into your Ghidra project. In the Script Manager window,
select the "Import" category in the left pane and double-click the
`EnhancedImport.java` script in the right pane to run it. You'll then be asked
for an input file; select `ghidra/symboltable.tsv` from this repository.
Afterwards, you'll see a bunch of "Importing ..." messages in a console in the
main CodeBrowser window, and a bunch of new functions and labels will be
defined. Some of the messages may have "can't find data type X" appended if a
symbol is marked with a type that hasn't made its way into our decompiled code
yet.

While we imported a bunch of data types earlier, Ghidra's C parser leaves out
some important information that we'll have to fill in with another script. In
the Script Manager, run `ClassFixup.java` from the "Data Types" category, and
you should see some "Converting X to class" and "Fixing calling convention of
X" messages in the console.

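For context on why a fixup pass is needed at all: C has no member functions,
so a class method written out for a C parser can only look like a free function
that takes the object pointer as an explicit parameter. The sketch below
contrasts the two shapes using an invented `Counter` type. On 32-bit MSVC, the
member version uses the `__thiscall` convention (with `this` passed in the
`ECX` register), which is presumably the kind of calling-convention information
`ClassFixup.java` restores.

```c++
// Invented type for illustration; not part of the JSRF codebase.
class Counter {
public:
    Counter() : count(0) {}

    // A real member function. On 32-bit MSVC this uses __thiscall:
    // `this` is passed implicitly in ECX rather than on the stack.
    void add(int amount) { count += amount; }

    int count;
};

// The same operation as plain C can express it: a struct with only data
// members, plus a free function that receives the object pointer as an
// ordinary parameter (stack-passed under the default __cdecl convention).
struct CounterData {
    int count;
};

void CounterData_add(CounterData* self, int amount) {
    self->count += amount;
}
```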
Now you've got a Ghidra project containing everything we know about JSRF's
code! Make sure you save your Ghidra project now that everything's set up.


### Producing Object Files
Close all of your Ghidra windows and open a Unix-style shell in the
decompilation repository's `ghidra/` directory. The `delink.sh` script is our
automated tool for extracting all the object files that have been identified so
far. The easiest way to run it is to invoke it with three arguments:

- The path to your Ghidra installation (the directory with files like
  `ghidraRun` and `ghidraRun.bat`, and directories like `docs/` and
  `Extensions/`)
- The path to your JSRF Ghidra project (the directory with a `.gpr` file and a
  directory with a name ending in `.rep`)
- The name of your JSRF Ghidra project

If you're on Windows, the paths you provide should be Windows filepaths, not
Unix-style paths. Make sure to surround them with single quotes, too (e.g.
`'C:\path\to\whatever'`), or else the shell will interpret the backslashes as
escape characters.

If you find typing out these arguments to be too much of a pain, you can also
define the environment variables `$GHIDRA_HOME`, `$JSRFDECOMP_PROJECTPATH`, and
`$JSRFDECOMP_PROJECTNAME` and invoke the script without arguments.

There are a few errors you might run into here:

- `Unable to lock project!`: This means that Ghidra isn't fully closed. Make
  sure you've completely closed every Ghidra window before running `delink.sh`.
- `Script not found` or `Invalid script`: This means that you haven't added
  the repository's `ghidra_scripts` directory to the script search path as
  described in the previous section (particularly if it mentions
  `MSVC7Mangle.java`), the Ghidra delinker extension isn't properly installed
  (particularly if it mentions `DelinkProgram.java`), or you've somehow invoked
  the script in a way that can't see the scripts (e.g. installing Ghidra on
  Windows and then invoking the script from WSL).
- `java.lang.RuntimeException: Failed to export ...`: This means that the
  delinker extension doesn't like something about what it was told to delink.
  One known cause is duplicate symbol names. If you haven't modified
  `objects.csv` or `symboltable.tsv`, let other people on the project know so
  that they can look into fixing it.

If all goes well, the extracted object files will be in the `decompile/target/`
directory of the repository. Now we're ready to start recompiling and diffing
code with objdiff.


### Setting Up objdiff
Open the objdiff GUI program (by default named something like
`objdiff-os-arch`, e.g. `objdiff-windows-x86_64.exe`). Click "Settings" in the
left sidebar and then "Select" next to "Project directory" in the popup window.
In the file picker, select the `decompile/` directory in the JSRF decompilation
repository.

The sidebar will now have a listing of all the extracted object files. Click
on one, and you should see two panes: one on the left labelled "Target object"
that lists the contents of the extracted object file, and one on the right
listing the contents of the recompiled object file. If the right pane displays
an error like "program not found," the Visual C++ 7.0 compiler probably wasn't
correctly set up on your `PATH`.

To make sure you get correct match percentages, set
`Diff Options > Function relocation diffs` to "None." Otherwise, some
references to non-local variables will be marked as nonmatching. This is
because it's sometimes not possible to give certain things variable names in
Ghidra (particularly thread-local storage), and other times it's not possible
to assign a fixed name to certain implicitly generated output in the
recompiled code.


### Using objdiff
The basic idea of objdiff is to match up the contents of an object file
compiled from our own decompiled code with the contents of an object file
extracted from the game. To that end, functions have to be paired up between
the two. In the best case, corresponding functions in each file will have the
same name and be in the same section, at which point objdiff can link them
automatically. Otherwise, you have to click on a function in one pane and then
on its counterpart in the other pane to tell objdiff to link them. The most
common cases of this are implicitly generated functions and data, such as
exception-handling code placed in `.text$x` in the recompiled object file. Be
aware that objdiff's matching doesn't appear to be fully reliable in some
cases, particularly when diffing data with external pointers (which appear as
`?? ?? ?? ??`): these aren't explicitly marked as nonmatching but can still
somehow reduce the match percentage, so you'll need a little judgement to
determine when you actually have a match.

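As a rough illustration of where such implicit output comes from, consider
the sketch below (all names are invented). `process` creates a local object
with a non-trivial destructor and then calls a function that may throw, so the
compiler has to generate extra cleanup code that destroys the object if an
exception propagates. With MSVC, compiler-generated exception-handling code and
data of this kind end up in sections like `.text$x` and `.rdata$x` under
compiler-chosen names, which is why they usually need to be linked by hand.

```c++
#include <stdexcept>

// Invented RAII helper: its destructor must run even if an exception
// propagates out of the function that owns it.
struct Guard {
    explicit Guard(int* counter) : active(counter) { ++*active; }
    ~Guard() { --*active; }
    int* active;
};

// Invented function that may throw.
int validate(int value) {
    if (value < 0) {
        throw std::runtime_error("negative value");
    }
    return value * 2;
}

// `guard` has a non-trivial destructor and validate() can throw, so the
// compiler must emit unwind cleanup for this function on top of its
// normal code path.
int process(int value, int* activeGuards) {
    Guard guard(activeGuards);
    return validate(value);
}
```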
Clicking on a function that's been linked across both object files shows a diff
of the disassembly of both versions of the function, with any differences
highlighted. The task at hand is to modify the function in the corresponding
source file (in the `decompile/src/` directory) until the match percentage
reaches 100%. Depending on how you configure objdiff, it will either rebuild
automatically whenever you save a change to a source file, or you can manually
rebuild with the "Build" button at the top of the right pane.

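In practice, much of that modification consists of trying semantically
equivalent spellings of the same logic until the compiler's output lines up.
The two invented functions below always compute the same result, but a
compiler may lay them out differently (for instance in branch structure or
block ordering), so swapping one form for the other is a typical experiment
when objdiff reports a mismatch. Which form matches a given JSRF function can
only be determined by trying it.

```c++
// Two equivalent ways to write a clamp; the names are invented. The
// compiled instruction sequences may differ even though results match.

// Form A: early returns.
int clampA(int value, int low, int high) {
    if (value < low) {
        return low;
    }
    if (value > high) {
        return high;
    }
    return value;
}

// Form B: a single exit with nested conditional expressions.
int clampB(int value, int low, int high) {
    return value < low ? low : (value > high ? high : value);
}
```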
When viewing and editing decompiled source files, be mindful of the
`// Status:` annotation above each function, which has the following meanings:

- `unimplemented`: The decompiled function does not yet reproduce the behaviour
  of the original
- `nonmatching`: The decompiled function is believed to behave the same as the
  original, but it does not fully match in objdiff
- `matching`: The decompiled function perfectly matches the original in objdiff

Be sure to update these annotations as appropriate while you decompile. Some
functions may also have other annotations describing nontrivial effects of
link-time code generation (LTCG), such as a nonstandard calling convention or
multiple functions being merged into one.

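To show how the annotations look in a source file, here is a sketch with an
invented `Player` struct; the statuses are illustrative only. The comment
placement follows the description above, but the surrounding details are
assumptions, and existing files in `decompile/src/` are the authority on the
real layout, including how LTCG notes are written.

```c++
// Invented type for illustration; not part of the JSRF codebase.
struct Player {
    float speed;
    int score;

    void setSpeed(float newSpeed);
    void addScore(int points);
    bool isFast() const;
};

// Status: matching
void Player::setSpeed(float newSpeed) {
    speed = newSpeed;
}

// Status: nonmatching
void Player::addScore(int points) {
    score += points;
}

// Status: unimplemented
bool Player::isFast() const {
    return false;  // placeholder until the real logic is decompiled
}
```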
Otherwise, there are no concrete instructions to give for writing decompiled
code. Use the JSRF classes imported into Ghidra from the headers in
`decompile/src/` (via `File > Parse C Source...`), and use Ghidra's
decompilation of the function in the CodeBrowser as a starting point for
writing your matching function, drawing on whatever C++ and x86 assembly
knowledge you have. If you have basic decompilation experience but are new to
decompiling C++ specifically, you might want to take a look at the
[Decompiling C++](decompilingcpp.md) article.

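To give a flavour of that translation step, suppose Ghidra's decompiler
produced something like the commented pseudocode below for a method of an
invented `Skater` type (the function name, field names, and offsets are all
made up). Once the layout is known, raw pointer arithmetic turns into ordinary
member accesses, which is usually the first step toward a match; the result
still has to be checked against the original in objdiff.

```c++
// Hypothetical Ghidra output for the method might look roughly like:
//
//   void __thiscall FUN_00123456(void *this, float param_1)
//   {
//     *(float *)((int)this + 8) = *(float *)((int)this + 8) + param_1;
//     if (100.0 < *(float *)((int)this + 8)) {
//       *(float *)((int)this + 8) = 100.0;
//     }
//     return;
//   }
//
// With the (invented) class layout recovered, the same logic reads as:
struct Skater {
    int id;       // offset 0 (assumed)
    int flags;    // offset 4 (assumed)
    float boost;  // offset 8 (assumed)

    void addBoost(float amount);
};

void Skater::addBoost(float amount) {
    boost += amount;
    if (boost > 100.0f) {
        boost = 100.0f;  // clamp to the maximum
    }
}
```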
Whenever you have some decompiled code that you'd like to contribute to the
repository, commit it to your local copy of the repository, push it to your own
fork, and create a merge request to merge it back into the online copy.


## Contributing to Delinking
Getting the JSRF binary delinked is just as important as decompiling the
resulting object files, but it takes a bit more investment. The concrete task
of a delinking contributor is to populate `symboltable.tsv` and `objects.csv` in
the `ghidra/` directory, which together enable consistent delinking of object
files. The former lists symbols at different addresses throughout the whole
executable, while the latter lists the address ranges that have been identified
as separable objects. Both are worked out by combing over the whole executable
in Ghidra.


### Updating `symboltable.tsv`
If you've got a bunch of symbols you'd like to add to `symboltable.tsv`, you
can generate a new copy from your Ghidra project by running the
`EnhancedExport.java` script from the "Export" category. If you want to merge
the new table into the repository, make sure to look over the diff first to
ensure you're not inadvertently deleting anything.


### Updating `make_header.sh`
If you've added any header files, you'll want to add them to the `HEADERS`
variable in `ghidra/make_header.sh`. Make sure that any other header files
they depend on appear earlier in the list, as this script combines everything
into one file without any `#include` directives. Then check that the script
runs successfully and that Ghidra is able to import the resulting `jsrf.h`.

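To see why the order matters, consider two invented headers that
`make_header.sh` could concatenate. With the `#include` directives gone, a type
that embeds another by value only compiles if the embedded type's definition
comes first, just as in this single-file sketch.

```c++
// ---- invented vec3.hpp, listed first in HEADERS ----
struct Vec3 {
    float x;
    float y;
    float z;
};

// ---- invented ball.hpp, which depends on Vec3 ----
// With no #include to pull in vec3.hpp, this only compiles because the
// full definition of Vec3 already appeared above. Listing ball.hpp first
// would fail to compile, since Vec3 would not be defined yet.
struct Ball {
    Vec3 position;
    Vec3 velocity;
    float radius;
};
```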
Keep in mind that `make_header.sh` uses a fairly rudimentary `awk` script to
convert C++ headers to C, which places some gentle constraints on how
declarations need to be written. In general, it's enough to just keep things
simple and not do anything unusual (keep data type and variable declarations
separate, don't use macros for declarations, etc.), but the one big catch is
that the body of a data type definition must not be on the same line as the
opening or closing braces. That is, do not write
```c++
struct X { unsigned x; };
```
but rather
```c++
struct X {
    unsigned x;
};
```


### Updating `objects.csv`
`objects.csv` is a listing of addresses for each object file or group of object
files that we've identified. Each column after the first two corresponds to a
section of the executable: a filled cell gives the address range occupied by
that object file, an empty cell means the object occupies none of that section,
and a `?` marks an unknown address range or boundary. The `Object` column gives
the path under `decompile/target/` to extract the object file to if the
`Delink?` column is `true`; otherwise, it's just a human-readable label for that
row. `delink.sh` parses this file and uses any rows marked for delinking to
produce object files.

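The meaning of a row can be summarized in code. The following is a minimal
sketch, not how `delink.sh` actually works: it assumes a plain comma-separated
layout with `Object` and `Delink?` as the first two columns, treats every later
cell as an opaque per-section string, and the address syntax in the example
comment is invented. `readyToExtract` is a hypothetical helper encoding the
expectation that a row marked for delinking has no unknown (`?`) cells.

```c++
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// One parsed row of an assumed objects.csv layout:
// Object,Delink?,<section 1>,<section 2>,...
struct ObjectRow {
    std::string object;                 // path under decompile/target/ or a label
    bool delink;                        // "true" => extract this object
    std::vector<std::string> sections;  // raw cell per section ("" = unused)
};

// Splits a line on commas, keeping empty cells.
std::vector<std::string> splitCells(const std::string& line) {
    std::vector<std::string> cells;
    std::string cell;
    std::istringstream in(line);
    while (std::getline(in, cell, ',')) {
        cells.push_back(cell);
    }
    if (!line.empty() && line[line.size() - 1] == ',') {
        cells.push_back("");  // getline drops a trailing empty cell
    }
    return cells;
}

ObjectRow parseRow(const std::string& line) {
    std::vector<std::string> cells = splitCells(line);
    ObjectRow row;
    row.object = cells.size() > 0 ? cells[0] : "";
    row.delink = cells.size() > 1 && cells[1] == "true";
    for (std::size_t i = 2; i < cells.size(); ++i) {
        row.sections.push_back(cells[i]);
    }
    return row;
}

// True only if the row is marked for delinking and no cell is unknown.
bool readyToExtract(const ObjectRow& row) {
    if (!row.delink) {
        return false;
    }
    for (std::size_t i = 0; i < row.sections.size(); ++i) {
        if (row.sections[i].find('?') != std::string::npos) {
            return false;
        }
    }
    return true;
}

// Example: parseRow("Some/Object.obj,true,0x1000-0x1200,,?") yields
// delink == true, but readyToExtract() is false because of the "?".
```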
A couple of criteria should be fulfilled before marking a row in `objects.csv`
for extraction. First, of course, the whole row should be filled with an object
path and with address ranges that we're certain of. Make sure that not just
the `.text` section, but also `.text$x` (exception-handling code), `.data`,
`.rdata`, and `.rdata$x` (data pointing to exception-handling code) are included
in the object file if applicable! Address ranges also should not include any
padding before or after data or code. Second, all of the symbols within those
address ranges need to be present in `symboltable.tsv`; otherwise, delinking
after importing only those symbols won't arrange the object file's internals
correctly (exception-handling code might be appended onto another function, for
example). Because `symboltable.tsv` should only be populated with symbols that
have been manually defined as per the previous section, this means that you
need to define variable names and labels in Ghidra for everything in those
ranges (and ideally everything referenced externally, as well).

Strive to maintain basic consistency with the rest of the codebase: functions
and methods begin with lowercase letters, for instance, while class/struct/enum
names begin with capital letters, and special methods like constructors and
destructors should have the names they would have in real C++ code (i.e.
`Class::Class` and `Class::~Class`, respectively). Special class methods and
members like constructors and vtables must follow their established naming
conventions for our tooling to work properly. Also note that you can (mostly)
disable name mangling for a symbol by making it a member of the `extern_"C"`
namespace, which applies the C-style name mangling used by some symbols.

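These conventions map directly onto ordinary C++ declarations. In this sketch
(the `Camera` class and the functions are invented), the class name is
capitalized, methods start with lowercase letters, and the constructor and
destructor are the functions Ghidra would label `Camera::Camera` and
`Camera::~Camera`. The `extern "C"` function gets a C-style symbol name (for a
`__cdecl` function on 32-bit MSVC, just a leading underscore), which is the
style the `extern_"C"` namespace in Ghidra stands in for, per the text above.

```c++
// Invented class; capitalized type name.
class Camera {
public:
    Camera();                     // Ghidra label: Camera::Camera
    ~Camera();                    // Ghidra label: Camera::~Camera
    void setZoom(float newZoom);  // methods start with a lowercase letter
    float getZoom() const;

private:
    float zoom;
};

Camera::Camera() : zoom(1.0f) {}
Camera::~Camera() {}

void Camera::setZoom(float newZoom) { zoom = newZoom; }
float Camera::getZoom() const { return zoom; }

// C linkage: on 32-bit MSVC this __cdecl function is emitted as
// `_clampAngle` instead of a C++-mangled name.
extern "C" int clampAngle(int degrees) {
    int wrapped = degrees % 360;
    return wrapped < 0 ? wrapped + 360 : wrapped;
}
```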
Once an object is ready for extraction, its `Delink?` column should be set to
`true`, and the `objdiff.json` file in the `decompile/` directory should be
updated to include it (give it an entry in the `units` list, modelled after
other existing entries minus the `complete` and `symbol_mappings` fields). A
`.cpp` file (and `.hpp` file if suitable) should also be added for it in the
`decompile/src/` directory. Make sure that any relevant data structures you've
figured out are included in the new source files, then give extraction via
`delink.sh` a test. Add a new prerequisite to `all:` at the top of the
`Makefile` in the `decompile/` directory, and add an entry at the bottom to
record which header files need to be up to date to build the new object file
(including anything included transitively!). Finally, make sure that the new
object file builds in objdiff, even if its functions haven't actually been
implemented yet.

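For that last step, a new source pair can start out as little more than
declarations and placeholder bodies. The sketch below shows both files in one
block for an invented `GraffitiTag` object; every name is made up, and the
`// Status: unimplemented` comments mark that nothing has been matched yet.

```c++
// ---- graffititag.hpp (invented) ----
struct GraffitiTag {
    GraffitiTag();
    ~GraffitiTag();
    void spray(int layer);
    int coverage() const;

    int layers;
};

// ---- graffititag.cpp (invented) ----
// In the real split layout this file would #include "graffititag.hpp".

// Status: unimplemented
GraffitiTag::GraffitiTag() : layers(0) {}

// Status: unimplemented
GraffitiTag::~GraffitiTag() {}

// Status: unimplemented
void GraffitiTag::spray(int layer) {
    (void)layer;  // stub: real logic not decompiled yet
}

// Status: unimplemented
int GraffitiTag::coverage() const {
    return 0;  // stub value
}
```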
When you have it all sorted out, make a merge request to share your work with
us!