Mirror of https://codeberg.org/KeybadeBlox/JSRF-Decompilation.git (synced 2026-02-20 02:07:02 +03:00)

Commit 547f2ba179 (parent 683818b637): Create docs directory; begin "Decompiling C++"

3 changed files with 159 additions and 19 deletions

`documentation/decompilingcpp.md` (new file, 152 lines)

# Decompiling C++
Like most (all?) Xbox titles, and most sixth-generation games more generally,
JSRF is written not in assembly or C, as many earlier games were, but in C++.
C++ introduces new features that both complicate the final machine code and
weaken the correspondence between that machine code and the original C++
source.

This guide covers various C++ features that appear in JSRF, explaining how
they manifest in the game's executable and how to decompile them properly, to
the extent possible. Basic familiarity with C features (e.g. functions,
structs) and how to decompile them is assumed.


## Name Mangling
(on the off chance you actually get symbol names, like from debug info; also
why symbol names don't match in objdiff)


## Classes
C++ classes extend the C struct by associating the data structure with
functions, which are called methods in this context. Classes can also inherit
from one or more other classes, sharing their data members and access to their
methods. Special methods called constructors and destructors can also be added
to a class, and these are called implicitly when an instance of the class comes
into or goes out of scope. Classes can also mark fields and methods as private,
but these access restrictions are usually erased entirely during compilation
and don't need to be respected by a decompilation.

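As a rough illustration of constructors and destructors being called implicitly, the sketch below uses a hypothetical `Tracker` class (not from JSRF) that counts its own constructor and destructor calls in global variables.

```c++
// Hypothetical class used only to show when constructors and destructors run
static int constructed = 0;
static int destroyed = 0;

struct Tracker {
    Tracker() { ++constructed; }  // Runs implicitly when an instance comes into scope
    ~Tracker() { ++destroyed; }   // Runs implicitly when the instance goes out of scope
};

void useTracker() {
    Tracker t;  // The compiler emits the constructor call here
    (void)t;
    // No explicit cleanup is written in the source...
}               // ...but the compiler emits the destructor call here
```

In compiled code, these implicit calls show up as ordinary calls to the constructor and destructor at the corresponding points in the function.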
### `class` vs. `struct`
The `struct` keyword can still be used in C++ and is equivalent to `class`,
except that `struct` makes members (and base classes) public by default, while
`class` makes them private by default. Since there's little reason to make
anything private in a decompilation, decompilations will usually use `struct`
declarations rather than `class`.

```c++
// These two declarations are equivalent
class SomeClass {
public: // Makes everything after this point public
    float someMemberVariable;
    unsigned anotherMemberVariable;
};

struct SomeStruct {
    float someMemberVariable;
    unsigned anotherMemberVariable;
};
```

A reasonable way to represent an inheriting class in Ghidra is to define the
base class normally, and then give the child class a first member named `super`
whose type is the parent class. Members specific to the child class can then be
added after it.

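A minimal sketch of the idea, reusing `SomeStruct` from above: `SomeStructChild` uses real C++ inheritance, while the hypothetical `SomeStructChildGhidra` is the flattened layout you would define in Ghidra. For simple single inheritance without virtual methods, the two are expected to have the same layout on typical compilers (an assumption, not a guarantee from the C++ standard), which the helper functions below check at runtime.

```c++
#include <cstddef>

struct SomeStruct {
    float someMemberVariable;
    unsigned anotherMemberVariable;
};

// How the class was (presumably) written in C++
struct SomeStructChild : SomeStruct {
    char * additionalMemberVariable;
};

// Hypothetical Ghidra-style equivalent: the parent embedded as a first member named super
struct SomeStructChildGhidra {
    SomeStruct super;
    char * additionalMemberVariable;
};

// Byte offset of the child-specific member when using real inheritance
std::ptrdiff_t inheritedOffset() {
    SomeStructChild c;
    return reinterpret_cast<char *>(&c.additionalMemberVariable)
         - reinterpret_cast<char *>(&c);
}

// Byte offset of the same member in the Ghidra-style struct
std::ptrdiff_t ghidraOffset() {
    SomeStructChildGhidra g;
    return reinterpret_cast<char *>(&g.additionalMemberVariable)
         - reinterpret_cast<char *>(&g);
}
```

With this layout, an access written as `child->super.someMemberVariable` in Ghidra corresponds to `child->someMemberVariable` in the C++ source.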
### Class Methods
Methods are functions declared within a class's namespace, like so:
```c++
class SomeClass {
public: // Lets the methods below be called from outside the class
    // Regular data members
    float someMemberVariable;
    unsigned anotherMemberVariable;

    // Methods declared in class definition
    SomeClass(int anArgument); // Constructor
    ~SomeClass();              // Destructor

    void regularMethod(unsigned anArgument);
    virtual void virtualMethod(char * anArgument);
    static void staticMethod(char * anArgument);

    // Can also provide entire definition in class
    float anotherMethod(float x) {
        this->someMemberVariable += x;
        return this->someMemberVariable;
    }
};

// Definition of a method declared in class
void SomeClass::regularMethod(unsigned anArgument) {
    this->anotherMemberVariable -= anArgument;
}
```

Methods can then be accessed and called with member access syntax, e.g.
`classInstance.regularMethod(3)` and `instancePtr->anotherMethod(1.2)`.

Static methods don't have access to the `this` pointer that other methods can
use, so in compiled code they're indistinguishable from regular functions and
probably won't see much use in decompilations.

Regular methods are similar to regular functions, but have an implicit first
argument called `this`, a pointer to the object that the method was called on.
Some C++ implementations use a different calling convention for method calls;
Microsoft's implementation for the Xbox, for example, uses the `__thiscall`
convention, in which the `this` pointer is passed in the ECX register while all
other arguments are passed on the stack.

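To make the hidden `this` argument concrete, the sketch below writes the same operation once as a regular method and once as a free function taking the object pointer explicitly, which is roughly how Ghidra presents a method. The names `Counter`, `makeZero`, and `addFree` are hypothetical, and the ECX-versus-stack distinction is a detail of MSVC's x86 code generation that portable C++ can't show.

```c++
struct Counter {
    unsigned value;

    // Regular method: 'this' is an implicit first argument (passed in ECX under __thiscall)
    unsigned add(unsigned amount) {
        this->value += amount;
        return this->value;
    }

    // Static method: there is no 'this', so it compiles like an ordinary function
    static Counter makeZero() {
        Counter c;
        c.value = 0;
        return c;
    }
};

// The same logic as Counter::add, with the object pointer passed explicitly,
// as a decompiler would present the method
unsigned addFree(Counter * self, unsigned amount) {
    self->value += amount;
    return self->value;
}
```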
Constructors and destructors function largely like regular methods, but
implicitly return the `this` pointer.

Virtual methods are methods that can be overridden by child classes. They
aren't called directly, but through a hidden first member of the object that
points to an array of method function pointers. This array is usually called a
vtable (Visual C++ 7 calls it `` ClassName::`vftable' ``). If a destructor
specifically is made virtual, additional "deleting destructors" may be
generated as well. These are methods taking a single flag argument (effectively
a boolean) that call the destructor and then, depending on that argument, free
the object's memory.

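The vtable mechanism can be sketched by hand in C-style C++, with the vtable pointer as an explicit first member. Everything here is hypothetical: the `Animal` names, the slot order, and the use of bit 0 of `flags` to mean "also free the memory" (commonly described as MSVC's convention for its deleting destructors) are illustrative assumptions, not JSRF's actual layout. `malloc()`/`free()` stand in for `operator new()`/`operator delete()`.

```c++
#include <cstdlib>

struct Animal;  // Forward declaration for the function pointer types below

// Table of function pointers shared by all instances of a class (the vtable)
struct AnimalVtable {
    void (*deletingDestructor)(Animal * self, unsigned flags);
    int (*speak)(Animal * self);
};

struct Animal {
    AnimalVtable const * vtable;  // The hidden first member in a real C++ object
    int legs;
};

static int destructorCalls = 0;

// Plain destructor: cleans up the object but doesn't free its memory
static void Animal_destructor(Animal * self) {
    (void)self;
    ++destructorCalls;
}

// Deleting destructor: runs the destructor, then frees the memory if bit 0 is set
static void Animal_deletingDestructor(Animal * self, unsigned flags) {
    Animal_destructor(self);
    if (flags & 1) {
        std::free(self);
    }
}

static int Animal_speak(Animal * self) {
    return self->legs;  // Placeholder behaviour for the virtual method
}

static AnimalVtable const Animal_vtable = {
    Animal_deletingDestructor,
    Animal_speak
};

// Stand-in for allocation plus construction; the constructor stores the vtable pointer
Animal * makeAnimal(int legs) {
    Animal * a = static_cast<Animal *>(std::malloc(sizeof(Animal)));
    if (a != nullptr) {
        a->vtable = &Animal_vtable;
        a->legs = legs;
    }
    return a;
}

// A virtual call: load the vtable pointer from the object, then call through a slot
int callSpeak(Animal * a) {
    return a->vtable->speak(a);
}
```

A heap object would be destroyed with flag bit 0 set, while an object that doesn't own its memory (on the stack, say) would be destroyed with it clear.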
(TODO: how to implement methods and vtables in Ghidra)

### Inheritance
Child classes can be used in most places that their parent class can be used:
```c++
// Class inheriting from SomeStruct
struct SomeStructChild : SomeStruct {
    // Inherits these from SomeStruct:
    // float someMemberVariable;
    // unsigned anotherMemberVariable;
    char * additionalMemberVariable;
};

// Could call this with either a SomeStruct* or SomeStructChild* argument
float getSomeMemberVariable(SomeStruct const * const ss) {
    return ss->someMemberVariable;
}
```


## The `new` and `delete` Operators
One way to allocate an object in C++ is with `new` and `delete`. The former
can both allocate and construct the object, while the latter destroys the
object and frees its memory, analogous to calling the destructor and then
`free()`. Each has a corresponding `operator new()` or `operator delete()`
function that is called implicitly.

The generated code for a use of `new` with a constructor (like
`SomeClass *sc = new SomeClass(7)`) performs the allocation and constructor
calls separately, roughly as follows (as it would appear in Ghidra; note that
Ghidra shows the `this` pointer being passed explicitly):
```c++
SomeClass *sc;

sc = (SomeClass *)operator_new(0xc); // 0xc bytes: vtable pointer plus two members
if (sc == NULL) {
    sc = NULL; // No, I'm not sure what the point of reassigning NULL is
}
else {
    SomeClass::SomeClass(sc, 7); // this pointer passed explicitly
}
```


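For comparison, the sketch below spells out by hand roughly what the compiler does for `new` and `delete`, using standard C++ facilities: a non-throwing `operator new`, a null check, and placement `new` to run the constructor on the allocated memory. `Widget`, `manualNew`, and `manualDelete` are hypothetical names, and this mirrors the shape of the Ghidra listing above rather than Visual C++ 7's exact output.

```c++
#include <new>

// Hypothetical class standing in for SomeClass
struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
    ~Widget() {}
};

// Roughly what `Widget *w = new Widget(v);` does
Widget * manualNew(int v) {
    void * memory = ::operator new(sizeof(Widget), std::nothrow); // Allocation only
    if (memory == nullptr) {
        return nullptr;             // Failed allocation: no constructor call, result is null
    }
    return new (memory) Widget(v);  // Placement new: run the constructor on that memory
}

// Roughly what `delete w;` does
void manualDelete(Widget * w) {
    if (w != nullptr) {             // Deleting a null pointer does nothing
        w->~Widget();               // Destructor call
        ::operator delete(w);       // Deallocation only
    }
}
```

For a class with a virtual destructor, compiled code will typically call the deleting destructor through the vtable instead of calling the destructor and `operator delete()` directly.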
## Exception Handling

`documentation/gettingstarted.md` (new file, 278 lines)

# Getting Started
Anybody is welcome to contribute to the decompilation effort! There are two
main roles a contributor can fulfill:

- *Delinking*, which entails analyzing the JSRF executable in situ to figure
  out how to break it up into small chunks of code and data, and
- *Decompiling*, which is writing C++ code that compiles down to the same code
  and data found in those chunks.

Of these two tasks, the latter is more accessible and benefits more from a
large group of volunteers, so we'll begin there. Those who want to participate
in the delinking effort can follow the decompilation guide and then continue on
to the delinking guide.


## Setting Up Decompilation
You'll need a few things to get a decompilation workflow ready:

- The JSRF executable (`default.xbe` in the root directory of the game disc) to
  provide the target compiled code to match
- The Microsoft Visual C++ 7.0 (AKA Visual C++ .NET 2002) compiler to compile
  your C++ code
  - You'll also want to add its `Bin/` directory to your `PATH` so that objdiff
    can find it
- The [Git](https://git-scm.com/) version control tool to clone and work on
  this repository
- The [Ghidra](https://github.com/NationalSecurityAgency/ghidra) reverse
  engineering tool to analyze and browse the executable
- The [XBE extension](https://github.com/XboxDev/ghidra-xbe) for Ghidra to
  import and analyze the JSRF executable
- The [delinker extension](https://github.com/boricj/ghidra-delinker-extension)
  for Ghidra to export object files from the executable
- The [objdiff](https://github.com/encounter/objdiff) code diffing tool to
  compare your C++ code's compiled output to the delinked object files

Keep in mind that Ghidra and its extensions need matching versions. The safest
thing to do is to get the same version of each, e.g. 11.4. The general flow for
installing an extension is to download a release `.zip` for it from the linked
repository's releases page, open Ghidra, open the `File > Install Extensions`
menu, click the green plus at the top right of the extensions window, and then
select the `.zip` you just downloaded. Make sure the box to the left of the
extension's name is checked to enable it before clicking "OK" to close the
extensions window.

With all these tools acquired, the last thing to get is this repository. Clone
it with `git` in the usual fashion:
```
git clone https://codeberg.org/KeybadeBlox/JSRF-Decompilation.git
```

The following sections detail how to use all these tools to start writing
decompiled code.


### Creating a JSRF Ghidra Project
Even if you have no intention of otherwise analyzing the executable in Ghidra,
Ghidra is needed to produce the object files that objdiff will compare your
recompiled code against. This section only covers the steps needed to get to
that point.

Open Ghidra and create a new project (`File > New Project...`). Select the
"Non-Shared Project" option, and set whatever location and name you'd like.
With the project created, open the file import dialogue
(`File > Import File...`) and select the `default.xbe` from JSRF. Ensure that
the format in the next window is set to "Xbox Executable Format (XBE)" (if this
isn't an option, you need to install/enable the XBE extension), and that the
name is "default.xbe" (our tooling depends on it having this specific name).
Click "OK," and after a moment you should see a window summarizing the
successful import (you'll probably see the message
`[xboxkrnl.exe] -> not found in project`, but this is fine and expected).

`default.xbe` should now be visible in the file listing for the project.
Double-click it to open it in the CodeBrowser. The window that opens is where
you'll do all your in-situ analysis, should you choose to do so. You'll be
asked whether you want to run analyzers; say yes. Afterwards, simply clicking
"Analyze" in the analysis options window without changing anything is fine, and
the analysis will probably take a couple of minutes.

There's a small oddity that needs fixing: certain parts of memory are marked as
executable even though objdiff doesn't expect them to be, which will mess up
our diffs. To correct this, open the memory map (`Window > Memory Map`) and
uncheck the "X" column for `.rdata`, `.data`, and `DOLBY`.

Now we'll import symbols from the JSRF decompilation repository. After the
analysis has finished, open the script manager (`Window > Script Manager`) and
select the "Data" folder in the left pane. Double-click the script titled
`ImportSymbolsScript.py`, and a file picker will open after a moment. Select
`symboltable.tsv` from the `delink/` directory of your cloned JSRF
decompilation repository, and you should see a series of `Created function...`
and `Created label...` messages in the scripting console window. Save your
changes (save icon in the top left of the CodeBrowser window), and your Ghidra
project should be ready for creating object files for objdiff.


### Producing Object Files
Close all of your Ghidra windows and open a shell in the decompilation
repository's `delink/` directory. The `delink.sh` script is our automated tool
for extracting all the object files that have been identified so far. Invoke
it with three arguments:

- The path to your Ghidra installation (the directory with files like
  `ghidraRun` and `ghidraRun.bat`, and directories like `docs/` and
  `Extensions/`)
- The path to your JSRF Ghidra project (the directory with a `.gpr` file and a
  directory with a name ending in `.rep`)
- The name of your JSRF Ghidra project

There are two common errors you might get here:

- `Unable to lock project!`: This means that Ghidra isn't fully closed. Make
  sure you've completely closed every Ghidra window before running `delink.sh`.
- `Script not found: DelinkProgram.java` and
  `Invalid script: DelinkProgram.java`: This means that the Ghidra delinker
  extension isn't properly installed. Make sure it's installed and enabled.

If all goes well, you'll see the message `Delinking complete!` at the end of
the script's output, and the extracted object files will be in the
`decompile/target/` directory of the repository. Now we're ready to start
recompiling and diffing code with objdiff.


### Setting Up objdiff
Open the objdiff GUI program (by default named something like
`objdiff-os-arch`, e.g. `objdiff-windows-x86_64.exe`). Click "Settings" in the
left sidebar and then "Select" next to "Project directory" in the popup window.
In the file picker, select the `decompile/` directory in the JSRF decompilation
repository.

The sidebar will now have a listing of all the extracted object files. Click
on one, and you should see two panes: one on the left labelled "Target object"
that lists the contents of the extracted object file, and one on the right
listing the contents of the recompiled object file. If the right pane displays
an error like "program not found," the Visual C++ 7.0 compiler probably wasn't
correctly set up on your `PATH`.

One important setting, to make sure you get correct match percentages: set
`Diff Options > Function relocation diffs` to "None." Otherwise, nearly all
references to functions and non-local variables will be marked as nonmatching
(this has to do with the delinking process not applying name mangling, which
isn't expected to be fixed).


### Using objdiff
The basic idea of objdiff is to match up the contents of an object file
compiled from our own decompiled code with the contents of an object file
extracted from the game. To that end, functions have to be paired up between
them. In the best case, corresponding functions in each file will have the
same name and be in the same section, in which case objdiff can link them
automatically. Otherwise, you have to click on a function in one pane and then
on its counterpart in the other pane to tell objdiff to link them. Common cases
of this are class methods (the names won't match) and implicitly generated
functions, such as exception handling code placed in `.text$x` in the
recompiled object file. Keep in mind that objdiff's matching doesn't appear to
be fully reliable in some cases. In particular, when diffing data containing
external pointers (which appear as `?? ?? ?? ??`), those bytes aren't
explicitly marked as nonmatching but can still somehow reduce the match
percentage, so you'll have to use a small amount of judgement to determine when
you actually have a match.

Clicking on a function that's been linked across both object files shows a diff
of the disassembly of both versions of the function, with any differences
highlighted. The task at hand is to modify the function in the corresponding
source file (in the `decompile/src/` directory) until the match percentage
reaches 100%. Depending on how you configure objdiff, it will either rebuild
automatically whenever you save a change to a source file, or you can manually
rebuild with the "Build" button at the top of the right pane.

There are no concrete instructions to give for writing decompiled code. Try
importing headers from `decompile/src/` into Ghidra
(`File > Parse C Source...`) to get access to JSRF classes, and use Ghidra's
decompilation of the function in the CodeBrowser as a starting point for
writing your matching function, drawing on whatever C++ and x86 assembly
knowledge you have. If you have basic decompilation experience but are new to
decompiling C++ specifically, you might want to take a look at the
[Decompiling C++](decompilingcpp.md) article.

Whenever you have some decompiled code that you'd like to contribute to the
repository, commit it to your local copy of the repository and create a merge
request to merge it back into the online copy.


## Contributing to Delinking
Getting the JSRF binary delinked is just as important as decompiling the
resulting object files, but takes a bit more investment. The concrete task of
a delinking contributor is to populate `symboltable.tsv` and `objects.csv` in
the `delink/` directory, which together enable consistent delinking of object
files. The former lists symbols at various addresses throughout the
executable, while the latter lists the address ranges that have been identified
as separable objects. Both are worked out by combing through the whole
executable in Ghidra.


### Updating `symboltable.tsv`
If you have a batch of symbols you'd like to add to `symboltable.tsv`, a
workflow has been devised to generate it from your Ghidra project. Before
regenerating the table, however, make sure that your project already contains
all of its symbols so that you don't end up deleting any. One option is to
import `symboltable.tsv` into your project with the `ImportSymbolsScript.py`
script as described under "Creating a JSRF Ghidra Project," but be aware that
this will overwrite any names you've assigned to the same symbols. You will
also have to ensure that no two symbols share the same name. Namespaces can be
used to avoid clashes if need be (e.g. `X::symbol` and `Y::symbol` may
coexist), but function overloading must be avoided (you may not have one
function with the signature `void X::f(int)` and another with the signature
`void X::f(float)`), or else errors can arise when delinking, as the delinker
extension does not mangle symbol names.

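As an illustration of this naming rule, the declarations below use hypothetical names (not JSRF symbols). The two `symbol` variables stay distinct because their namespaces differ, but an overload of `X::f` would produce a second symbol also named `X::f` once mangling is dropped, so the sketch gives the second function its own name (`fFloat`, an arbitrary choice) instead.

```c++
// Fine: same base name in different namespaces, so the unmangled names differ
namespace X { int symbol = 1; }
namespace Y { int symbol = 2; }

namespace X {
    // Fine: only one function named X::f
    int f(int n) { return n * 2; }

    // Avoid: an overload would also be exported as X::f without mangling
    // float f(float n);

    // One way around it: give the second function a distinct name
    float fFloat(float n) { return n * 2.0f; }
}
```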
Once you're ready to export your symbols, open the symbol table
(`Window > Symbol Table`). Open the symbol filter window (cog button near the
top right), and uncheck everything except "User Defined" under "Symbol Source,"
"Data Labels" and "Function Labels" under "Symbol Types," "Use Advanced
Filters," and "Non-Externals" under "Non-Externals." This ensures that you
only export symbols that you've defined and that are useful for delinking.

Now we need to configure the columns that we want to export. Right-click on
one of the column headers, click "Add/Remove Columns..." to open the "Select
Columns" window, and in it check only "Location," "Name," and "Type." Click
"OK" to close the window and ensure that the column order is "Name,"
"Location," "Type" (you can drag the column headers to reorder them if needed).

Now, to actually export the table, right-click on one of the table cells, click
"Select All," and then right-click again on a cell to select "Export > Export
to CSV..." before choosing where to save your exported symbol table.

The final step is converting this CSV file to the format expected by
`ImportSymbolsScript.py`. Open a shell in the repository's `delink/` directory
and run `make_symboltable.sh` with the path of your exported CSV as an
argument, and `symboltable.tsv` will be overwritten with a new table containing
your exported symbols.


### Updating `objects.csv`
`objects.csv` is a listing of addresses for each object file or group of object
files that we've identified. Each column after the first two corresponds to a
section of the executable, with filled cells indicating an address range
occupied by that object file, empty cells indicating that the object occupies
none of that section, and a `?` indicating an unknown address range or
boundary. The `Object` column gives the path under `decompile/target/` to
extract the object file to if the `Delink?` column is `true`, otherwise it's
just a human-readable label for that row. `delink.sh` parses this file and
uses any rows marked for delinking to produce object files.

A couple of criteria should be fulfilled before marking a row in `objects.csv`
for extraction. First, of course, the whole row should be filled with an object
path and with address ranges that we're certain of. Make sure that not just
the `.text` section, but also `.text$x` (exception handling code), `.data`,
`.rdata`, and `.rdata$x` (data pointing to exception handling code) are included
in the object file if applicable! Address ranges also should not include any
padding before or after data or code. Second, all of the symbols within those
address ranges need to be present in `symboltable.tsv`; otherwise, delinking
with only those symbols imported won't arrange the object file's internals
correctly (exception handling code might be appended onto another function, for
example). Because `symboltable.tsv` should only be populated with symbols that
have been manually defined as per the previous section, this means that you
need to define variable names and labels in Ghidra for everything therein (and
ideally for everything referenced externally, as well). Do try to maintain basic
consistency with the rest of the codebase: functions and methods begin with
lowercase letters, for instance, while class/struct/enum names begin with
capital letters, and special methods like constructors and destructors should
have the names they would have in real C++ code (i.e. `Class::Class` and
`Class::~Class`, respectively).

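A minimal sketch of these conventions in a header, using made-up names (`Skater`, `SkaterState`, `updatePosition`) purely for illustration; the enumerator naming style is also an arbitrary choice, not a convention stated for this codebase.

```c++
// Hypothetical header contents for a new object file
enum SkaterState {          // Type names begin with a capital letter
    SKATER_IDLE,
    SKATER_GRINDING
};

struct Skater {
    float x;
    float y;
    SkaterState state;

    Skater();               // Named Skater::Skater in the symbol table
    ~Skater();              // Named Skater::~Skater in the symbol table
    void updatePosition(float dx, float dy);  // Methods begin with a lowercase letter
};

Skater::Skater() : x(0.0f), y(0.0f), state(SKATER_IDLE) {}
Skater::~Skater() {}

void Skater::updatePosition(float dx, float dy) {
    this->x += dx;
    this->y += dy;
}
```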
Once an object is ready for extraction, its `Delink?` column should be set to
`true` and the `objdiff.json` file in the `decompile/` directory should be
updated to include it (give it an entry in the `units` list, modelled after
other existing entries minus the `complete` and `symbol_mappings` fields), plus
a `.cpp` file (and `.hpp` file if suitable) should be added for it in the
`decompile/src/` directory. Make sure that any relevant data structures you've
figured out are included in the new source files, then give extraction via
`delink.sh` a test. Add a new prerequisite to `all:` at the top of the
`Makefile` in the `decompile/` directory, and add an entry at the bottom to
record which header files need to be up to date to build the new object file
(including anything included transitively!). Finally, make sure that the new
object file builds in objdiff, even if its functions haven't actually been
implemented yet.

When you have it all sorted out, make a merge request to share your work with
us!