# Getting Started
Anybody is welcome to contribute to the decompilation effort! There are two
main roles a contributor can fulfill:

- *Delinking*, which entails analyzing the JSRF executable in-situ to figure
  out how to break it up into small chunks of code and data, and
- *Decompiling*, which is writing C++ code that compiles down to the same code
  and data found in those chunks.

Of these two tasks, the latter is more accessible and benefits more from a
large group of volunteers, so we'll begin there. Those who want to participate
in the delinking effort can follow the decompilation guide and then continue on
to the delinking guide afterwards.


## Setting Up Decompilation
You'll need a few things to get a decompilation workflow ready:

- The JSRF executable (`default.xbe` in the root directory of the game disc) to
  provide the target compiled code to match
- The Microsoft Visual C++ 7.0 (AKA Visual C++ .NET 2002) compiler to compile
  your C++ code
  - You'll also want to add its `Bin/` directory to your `PATH` so that objdiff
    can find it
- The [Git](https://git-scm.com/) version control tool to clone and work on
  this repository
- The [Ghidra](https://github.com/NationalSecurityAgency/ghidra) reverse
  engineering tool to analyze and browse the executable
- The [XBE extension](https://github.com/XboxDev/ghidra-xbe) for Ghidra to
  import and analyze the JSRF executable
- The [delinker extension](https://github.com/boricj/ghidra-delinker-extension)
  for Ghidra to export object files from the executable
- The [objdiff](https://github.com/encounter/objdiff) code diffing tool to
  compare your C++ code's compiled output to the delinked object files

Keep in mind that Ghidra and its extensions need to have their versions
coordinated. The safest thing to do is to get the same version of each, e.g.
11.4. The general flow for installing extensions is to download a release
`.zip` for the extension from the linked repository's releases page, open
Ghidra, open the `File > Install Extensions` menu, click the green plus at the
top right of the extensions window, and then select the `.zip` you just
downloaded. Make sure the box to the left of the extension's name is checked
to enable it before clicking "OK" to close the extensions window.

With all these tools acquired, the last thing to get is this repository. Clone
it with `git` in the usual fashion:
```
git clone https://codeberg.org/KeybadeBlox/JSRF-Decompilation.git
```

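
With the tools installed and the repository cloned, a quick check that the command-line prerequisites are reachable can save some confusion later, since objdiff reports a missing compiler only once it tries to build. The following is a minimal sketch, not part of the repository's tooling. It assumes the Visual C++ compiler driver is invoked as `cl` and Git as `git`; the `missing_tools` helper is a hypothetical name.

```python
import shutil

# Assumed command names: `cl` for the Visual C++ compiler driver and `git`
# for the Git client. Adjust these if your setup uses different names.
REQUIRED_TOOLS = ("cl", "git")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```
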
The following sections detail how to use all these tools to start writing
decompiled code.


### Creating a JSRF Ghidra Project
Even if you have no intention of analyzing the executable in Ghidra otherwise,
Ghidra is needed to produce the object files that objdiff will compare your
recompiled code against. This section will only cover the steps needed to get
to that point.

Open Ghidra and create a new project (`File > New Project...`). Select the
"Non-Shared Project" option, and set whatever location and name you'd like.
With the project created, open the file import dialogue
(`File > Import File...`) and select the `default.xbe` from JSRF. Ensure that
the format in the next window is set to "Xbox Executable Format (XBE)" (if this
isn't an option, you need to install/enable the XBE extension), and that the
name is "default.xbe" (our tooling depends on it having this specific name).
Click "OK," and you should see a window with a successful import results
summary after a moment (you'll probably see the message
`[xboxkrnl.exe] -> not found in project`, but this is fine and expected).

`default.xbe` should now be visible in the file listing for the project.
Double-click it to open it in the CodeBrowser. The window that opens is where
you'll do all your in-situ analysis, should you choose to do so. You'll be
asked whether you want to run analyzers; say yes. Afterwards, simply clicking
"Analyze" in the analysis options window without changing anything is fine, and
the analysis will probably take a couple of minutes.

There's a small oddity that needs fixing: certain parts of memory are marked as
executable where objdiff doesn't expect them to be, which will mess up our
diffs. To correct this, open the memory map (`Window > Memory Map`) and
uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.

Now we'll import symbols from the JSRF decompilation repository. After running
the analysis, open the script manager (`Window > Script Manager`) and select
the "Data" folder in the left pane. Double-click the script titled
`ImportSymbolsScript.py`, and a file picker will open after a moment. Select
`symboltable.tsv` from the `delink/` directory of your cloned JSRF
decompilation repository, and you should see a bunch of `Created function...`
and `Created label...` in the scripting console window. Save your changes
(save icon in the top left of the CodeBrowser window), and your Ghidra project
should be all ready for creating object files for objdiff.


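
For orientation, Ghidra's `ImportSymbolsScript.py` reads plain text lines of the form `name address kind`, where the kind is `f` for a function or `l` for a label and the address is hexadecimal. The sketch below parses lines of that shape so you can inspect a symbol table outside Ghidra. It assumes `symboltable.tsv` uses whitespace-separated fields in that order; `Symbol` and `parse_symbol_lines` are hypothetical names, not part of the repository.

```python
from typing import List, NamedTuple


class Symbol(NamedTuple):
    name: str
    address: int
    kind: str  # "f" for a function, "l" for a label


def parse_symbol_lines(lines) -> List[Symbol]:
    """Parse `name address kind` lines; a missing kind is treated as a label."""
    symbols = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        kind = fields[2] if len(fields) > 2 else "l"
        # Addresses are hexadecimal, with or without a leading "0x".
        symbols.append(Symbol(fields[0], int(fields[1], 16), kind))
    return symbols
```

Passing an open file works as well as a list of strings, e.g. `parse_symbol_lines(open("delink/symboltable.tsv"))`, which makes it easy to count functions versus labels or to look up the name at a given address.
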
### Producing Object Files
Close all of your Ghidra windows and open a shell in the decompilation
repository's `delink/` directory. The `delink.sh` script is our automated tool
for extracting all the object files that have been identified so far. Invoke
it with three arguments:

- The path to your Ghidra installation (the directory with files like
  `ghidraRun` and `ghidraRun.bat`, and directories like `docs/` and
  `Extensions/`)
- The path to your JSRF Ghidra project (the directory with a `.gpr` file and a
  directory with a name ending in `.rep`)
- The name of your JSRF Ghidra project

There are two common errors you might get here:

- `Unable to lock project!`: This means that Ghidra isn't fully closed. Make
  sure you've completely closed every Ghidra window before running `delink.sh`.
- `Script not found: DelinkProgram.java` and
  `Invalid script: DelinkProgram.java`: This means that the Ghidra delinker
  extension isn't properly installed. Ensure it's installed and enabled first.

If all goes well, you'll see the message `Delinking complete!` at the end of
the script's output, and the extracted object files will be in the
`decompile/target/` directory of the repository. Now we're ready to start
recompiling and diffing code with objdiff.


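
The three arguments described in this section are easy to mix up, so here is a minimal sketch of assembling the `delink.sh` invocation with a sanity check on the project directory. It assumes the project files are named after the project (`<name>.gpr` and `<name>.rep`), which is how Ghidra normally lays out a project. The `delink_command` helper and the example paths are hypothetical.

```python
from pathlib import Path


def delink_command(ghidra_dir, project_dir, project_name):
    """Build the argument list for delink.sh after checking the project layout."""
    project_dir = Path(project_dir)
    gpr = project_dir / (project_name + ".gpr")
    rep = project_dir / (project_name + ".rep")
    if not gpr.is_file():
        raise FileNotFoundError(f"expected project file {gpr}")
    if not rep.is_dir():
        raise FileNotFoundError(f"expected project directory {rep}")
    # Order matters: Ghidra installation, project location, project name.
    return ["./delink.sh", str(ghidra_dir), str(project_dir), project_name]
```

The returned list can be handed to `subprocess.run(..., cwd="delink", check=True)` from the repository root, or simply printed to see the command you would type by hand.
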
### Setting Up objdiff
Open the objdiff GUI program (by default named something like
`objdiff-os-arch`, e.g. `objdiff-windows-x86_64.exe`). Click "Settings" in the
left sidebar and then "Select" next to "Project directory" in the popup window.
In the file picker, select the `decompile/` directory in the JSRF decompilation
repository.

The sidebar will now have a listing of all the extracted object files. Click
on one, and you should see two panes: one on the left labelled "Target object"
that lists the contents of the extracted object file, and one on the right
listing the contents of the recompiled object file. If the right pane displays
an error like "program not found," the Visual C++ 7.0 compiler probably wasn't
correctly set up on your `PATH`.

One setting is important for getting correct match percentages: set
`Diff Options > Function relocation diffs` to "None." Otherwise, nearly all
references to functions and non-local variables will be marked as nonmatching
(this has to do with the delinking process not applying name mangling, which
isn't expected to be fixed).


### Using objdiff
The basic idea of objdiff is to match up the contents of an object file
compiled from our own decompiled code to the contents of an object file
extracted from the game. To that end, functions have to be matched up between
them. In the best case, corresponding functions in each file will have the
same name and be in the same section, at which point objdiff can link them
automatically. Otherwise, one has to click on one of the corresponding
functions in one pane and the other function in the other pane to tell objdiff
to link them. Common cases of this are class methods (the names won't match)
and implicitly generated functions, such as exception handling code placed in
`.text$x` in the recompiled object file. Keep in mind that objdiff's matching
does not appear fully reliable in some cases. In particular, data containing
external pointers (which appear as `?? ?? ?? ??`) isn't explicitly marked as
non-matching but can still reduce the match percentage, so you'll have to use a
little judgement to determine when you actually have a match.

Clicking on a function that's been linked across both object files shows a diff
of the disassembly of both versions of the function, with any differences
highlighted. The task at hand is to modify the function in the corresponding
source file (in the `decompile/src/` directory) such that the match percentage
reaches 100%. Depending on how you configure objdiff, it will rebuild
automatically whenever you save a change to a source file, or you can manually
rebuild with the "Build" button at the top of the right pane.

There are no concrete instructions to give for writing decompiled code. Try
importing headers from `decompile/src/` into Ghidra
(`File > Parse C Source...`) to get access to JSRF classes, and use Ghidra's
decompilation of the function in the CodeBrowser as a starting point for
writing your matching function, exercising whatever C++ and x86 assembly
knowledge you have. Exception handling code in particular can appear in
unexpected places (e.g. around `new` expressions and in constructors) and has
unambiguous but nonobvious signs in the disassembly, so it might be worth
[reading](https://www.openrce.org/articles/full_view/21) up
[on](https://www.openrce.org/articles/full_view/23) how it's
[implemented](https://web.archive.org/web/20101007110629/http://www.microsoft.com/msj/0197/exception/exception.aspx)
to learn to recognize it in disassembly and recreate it in C++ code.

Whenever you have some decompiled code that you'd like to contribute to the
repository, commit it to your local copy of the repository and create a merge
request to merge it back into the online copy.


## Contributing to Delinking
Getting the JSRF binary delinked is just as important as decompiling the
resulting object files, but takes a bit more investment. The concrete task of
a delinking contributor is to populate `symboltable.tsv` and `objects.csv` in
the `delink/` directory, which together enable consistent delinking of object
files. The former lists symbols at different addresses through the whole
executable, while the latter lists the address ranges that have been identified
as separable objects. Both of these things are figured out by combing over the
whole executable in Ghidra.


### Updating `symboltable.tsv`
If you have a bunch of symbols you'd like to add to `symboltable.tsv`, there is
a workflow for regenerating it from your Ghidra project. Before regenerating
the table, however, make sure that you have all of its symbols already in your
project so that you don't end up deleting any. One option is to import
`symboltable.tsv` into your project with the `ImportSymbolsScript.py` script as
mentioned under "Creating a JSRF Ghidra Project," but be aware that this will
overwrite any names you've assigned to the same symbols. You will also have to
ensure that no two symbols share the same name. Clashes can be avoided by
using namespaces if need be (i.e. `X::symbol` and `Y::symbol` may coexist), but
function overloading must be avoided (you may not have one function with the
signature `void X::f(int)` and another with the signature `void X::f(float)`),
else errors can arise when delinking, as the delinker extension does not mangle
symbol names.

Once you're ready to export your symbols, open the symbol table
(`Window > Symbol Table`). Open the symbol filter window (cog button near the
top right), and uncheck everything but "User Defined" under "Symbol Source,"
"Data Labels" and "Function Labels" under "Symbol Types," "Use Advanced
Filters," and "Non-Externals" under "Non-Externals." This ensures that you
only export symbols that you've defined and that are useful for delinking.

Now we need to configure the columns that we want to export. Right-click on
one of the column headers, click "Add/Remove Columns..." to open the "Select
Columns" window, and in it check only "Location," "Name," and "Type." Click
"OK" to close the window and ensure that the column order is "Name,"
"Location," "Type" (you can drag the column headers to reorder them if needed).

Now, to actually export the table, right-click on one of the table cells, click
"Select All," and then right-click again on a cell to select "Export > Export
to CSV..." before selecting where to save your exported symbol table.

The final step is converting this CSV file to the format expected by
`ImportSymbolsScript.py`. Open a shell in the repository's `delink/` directory
and run `make_symboltable.sh` with the path of your exported CSV as an
argument, and `symboltable.tsv` will be overwritten with a new table containing
your exported symbols.


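
To make the conversion and the unique-name rule concrete, here is a rough sketch of what turning the exported CSV into symbol table lines involves. It is not the repository's `make_symboltable.sh`, whose exact behaviour isn't described here. It assumes the CSV has a header row with `Name`, `Location`, and `Type` columns, that functions are exported with the type `Function`, and that the output is tab-separated `name`, `address`, `kind` lines. The helper name and sample symbols are hypothetical.

```python
import csv
import io
from collections import Counter


def csv_to_symbol_lines(csv_text):
    """Convert exported Name/Location/Type rows to `name<TAB>address<TAB>kind`."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # The delinker does not mangle names, so every name must be unique.
    counts = Counter(row["Name"] for row in rows)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError("duplicate symbol names: " + ", ".join(duplicates))
    lines = []
    for row in rows:
        # Assumption: anything that is not a function is imported as a label.
        kind = "f" if row["Type"] == "Function" else "l"
        lines.append("\t".join((row["Name"], row["Location"], kind)))
    return lines
```

Running the duplicate check on your export before regenerating the table is a cheap way to catch accidental overloads such as two entries both named `X::f`.
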
### Updating `objects.csv`
`objects.csv` is a listing of addresses for each object file or group of object
files that we've identified. Each column after the first two corresponds to a
section of the executable, with filled cells indicating an address range
occupied by that object file, empty cells indicating that the object occupies
none of that section, and a `?` indicating an unknown address range or
boundary. The `Object` column gives the path under `decompile/target/` to
extract the object file to if the `Delink?` column is `true`, otherwise it's
just a human-readable label for that row. `delink.sh` parses this file and
uses any rows marked for delinking to produce object files.

A couple of criteria should be fulfilled before marking a row in `objects.csv`
for extraction. First, of course, the whole row should be filled with an
object path and with address ranges that we're certain of. Make sure that not
just the `.text` section, but also `.text$x` (exception handling code),
`.data`, `.rdata`, and `.rdata$x` (data pointing to exception-handling code)
are included in the object file if applicable! Address ranges also should not
include any padding before or after data or code. Second, all of the symbols
within those address ranges need to be present in `symboltable.tsv`, else
delinking after only importing those symbols won't arrange the object file's
internals correctly (exception-handling code might be appended onto another
function, for example). Because `symboltable.tsv` should only be populated
with symbols that have been manually defined as per the previous section, this
means that you need to define variable names and labels in Ghidra for
everything therein (and ideally everything referenced externally, as well). Do
try to maintain basic consistency with the rest of the codebase: functions and
methods begin with lowercase letters, for instance, while class/struct/enum
names begin with capital letters, and special methods like constructors and
destructors should have the names they would have in real C++ code (i.e.
`Class::Class` and `Class::~Class`, respectively).

Once an object is ready for extracting, its `Delink?` column should be set to
`true` and the `objdiff.json` file in the `decompile/` directory should be
updated to include it (give it an entry in the `units` list, modelled after
other existing entries minus the `complete` and `symbol_mappings` fields), and
a `.cpp` file (and `.hpp` file if suitable) should be added for it in the
`decompile/src/` directory. Make sure that any relevant data structures
you've figured out are included in the new source files, then give extraction
via `delink.sh` a test. Add a new prerequisite to `all:` at the top of the
`Makefile` at the top of the `decompile/` directory, and add an entry at the
bottom to record which header files need to be up to date to build the new
object file (including anything included transitively!). Finally, make sure
that the new object file builds in objdiff, even if its functions haven't
actually been implemented yet.

When you have it all sorted out, make a merge request to share your work with
us!


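
The first criterion in this section (a complete row with no uncertain ranges) lends itself to a mechanical check before you open a merge request. The sketch below flags rows that are marked `true` in `Delink?` while still containing a `?` or no address ranges at all. It assumes `objects.csv` has a header row naming the `Object` and `Delink?` columns as described in this section. The section column names and address formats in the usage note are made up for illustration, and both function names are hypothetical.

```python
import csv
import io


def ready_for_delinking(row):
    """Check that a row names an object and has no unknown (`?`) ranges."""
    sections = {
        key: (value or "").strip()
        for key, value in row.items()
        if key is not None and key not in ("Object", "Delink?")
    }
    has_range = any(sections.values())
    has_unknown = any("?" in cell for cell in sections.values())
    return bool((row.get("Object") or "").strip()) and has_range and not has_unknown


def rows_marked_too_early(csv_text):
    """Return the `Object` value of each row marked `true` that is not ready."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row["Object"]
        for row in reader
        if (row.get("Delink?") or "").strip() == "true"
        and not ready_for_delinking(row)
    ]
```

For example, given a table whose `Audio.obj` row is marked `true` but has a `.text` cell of `0x12000-?`, `rows_marked_too_early` would list `Audio.obj`. The check says nothing about padding or missing symbols, which still need to be verified in Ghidra.
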
# Special Topics
This would be a good place to include guidance on some trickier aspects of
reverse engineering C++ code, like an accessible explanation of navigating
exception handling in Ghidra, implementing classes with virtual methods or
inheritance in Ghidra and writing decompiled code for them, or what in the
world a COM object is and how to make Ghidra understand it (especially the one
wrapping all of JSRF's Direct3D calls).