mirror of
https://codeberg.org/KeybadeBlox/JSRF-Decompilation.git
synced 2026-02-20 10:17:03 +03:00
353 lines
15 KiB
Markdown
# Decompiling C++
Like most (all?) Xbox titles, and most sixth-generation games more generally,
JSRF is written not in assembly or C as many earlier games were, but in C++.
C++ introduces new features that both complicate the final machine code and
weaken the correspondence between that machine code and the original C++
source.
|
This guide will cover various C++ features appearing in JSRF, explaining how
they manifest in the game's executable and how to properly decompile them, to
the extent possible. Basic familiarity with C features (e.g. functions,
structs) and how to decompile them is assumed.
|
|
## Name Mangling
Whenever you encounter symbol names actually produced by a C++ compiler, such
as when recompiling decompiled code, they'll probably look garbled, like
`??_GGameObj@@UAEPAXI@Z` or `_ZN7GameObjD1Ev` depending on the compiler. These
are mangled names, which compilers use to keep overloaded functions from
colliding, to encode additional information about symbols (such as parameter
types or calling conventions), and so on.
|
Many tools can print these in human-readable form, e.g.
`` public: virtual void * __thiscall GameObj::`scalar deleting destructor'(unsigned int) ``,
and objdiff will do so by default. When using the Ghidra delinking tool
specifically, keep in mind that the delinked symbol names do _not_ get mangled,
so they won't match the names in the recompiled code exactly, and corresponding
symbols in the delinked and recompiled object files will need to be associated
by hand.
|
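For Itanium-style names like `_ZN7GameObjD1Ev` (used by GCC and Clang),
demangling can also be done programmatically. The following is a minimal
sketch assuming a GCC or Clang toolchain, which provides `abi::__cxa_demangle()`
in `<cxxabi.h>`; the `demangle()` wrapper is a hypothetical helper. It also
illustrates the overloading point: `void f(int)` and `void f(float)` get the
distinct names `_Z1fi` and `_Z1ff`. MSVC-style names like
`??_GGameObj@@UAEPAXI@Z` use a different scheme and aren't handled here; tools
such as `undname` (shipped with Visual Studio) or LLVM's `llvm-undname` decode
those instead.

```c++
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Sketch (GCC/Clang only): turn an Itanium-ABI mangled name into readable form.
// MSVC-style names are not understood by __cxa_demangle and come back as-is.
std::string demangle(const char *mangled) {
    int status = 0;
    // With a null output buffer, __cxa_demangle allocates the result with malloc()
    char *readable = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || readable == nullptr) {
        return mangled;  // Not a valid Itanium mangled name: return it unchanged
    }
    std::string result(readable);
    std::free(readable);
    return result;
}
```
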
|
## Classes
C++ classes extend the C struct by associating the data structure with
functions, which are called methods in this context. Classes can also inherit
from one or more other classes, sharing their data members and access to their
methods. Special methods called constructors and destructors can also be added
to a class, and these are called implicitly when an instance of the class is
created or reaches the end of its lifetime (e.g. goes out of scope). Class
members can also be marked as private, but these access restrictions are
usually erased during compilation and don't need to be respected by a
decompilation.
|
### `class` vs. `struct`
The `struct` keyword can still be used in C++ and is equivalent to `class`,
except that the former makes all members (and base classes) public by default
while the latter makes them private by default. Since there's not much reason
to make anything private in a decompilation, decompilations will usually use
`struct` declarations rather than `class`.
|
```c++
// These two declarations are equivalent (apart from their names)
class SomeClass {
public: // Everything after this label is public
    float someMemberVariable;
    unsigned anotherMemberVariable;
};

struct SomeStruct {
    float someMemberVariable;
    unsigned anotherMemberVariable;
};
```
|
A reasonable way to implement an inherited struct in Ghidra is to define the
base class normally, and then define the child with a first member called
`super` of the parent class type. Members specific to the child class can then
be inserted afterwards.
|
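This works because, with simple single inheritance and no virtual methods, the
parent's members sit at the start of the child object, exactly where a `super`
member would put them. Below is a small check of that claim, assuming a typical
compiler ABI (the C++ standard doesn't strictly guarantee this layout when both
parent and child declare data members, and virtual or multiple inheritance
changes the picture). `SomeStructChild_Ghidra` and the helper functions are
hypothetical names.

```c++
#include <cassert>
#include <cstddef>

struct SomeStruct {
    float someMemberVariable;
    unsigned anotherMemberVariable;
};

// How the source declares the child class
struct SomeStructChild : SomeStruct {
    char * additionalMemberVariable;
};

// How the child would be defined in Ghidra's data type manager
struct SomeStructChild_Ghidra {
    SomeStruct super;
    char * additionalMemberVariable;
};

// Byte offset of the child-only member in the real child class
std::ptrdiff_t childMemberOffset() {
    SomeStructChild c{};
    return reinterpret_cast<char *>(&c.additionalMemberVariable) -
           reinterpret_cast<char *>(&c);
}

// Byte offset of the same member in the Ghidra-style struct
std::ptrdiff_t ghidraMemberOffset() {
    SomeStructChild_Ghidra g{};
    return reinterpret_cast<char *>(&g.additionalMemberVariable) -
           reinterpret_cast<char *>(&g);
}

// True if the parent part of a child object starts at the object's own address
bool parentAtOffsetZero() {
    SomeStructChild c{};
    SomeStruct *asParent = &c;  // Implicit child-to-parent conversion
    return static_cast<void *>(asParent) == static_cast<void *>(&c);
}
```
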
### Class Methods
Methods are functions declared within a class's namespace, like so:
```c++
struct SomeClass {
    // Regular data members
    float someMemberVariable;
    unsigned anotherMemberVariable;

    // Methods declared in class definition
    SomeClass(int anArgument); // Constructor
    ~SomeClass(); // Destructor

    void regularMethod(unsigned anArgument);
    virtual void virtualMethod(char * anArgument);
    static void staticMethod(char * anArgument);

    // Can also provide entire definition in class
    float anotherMethod(float x) {
        this->someMemberVariable += x;
        return this->someMemberVariable;
    }
};

// Definition of a method declared in class
void SomeClass::regularMethod(unsigned anArgument) {
    this->anotherMemberVariable -= anArgument;
}
```
|
Methods can then be accessed and called with member access syntax, like
`classInstance.regularMethod(3)` and `instancePtr->anotherMethod(1.2)`.

Static methods are indistinguishable from regular functions in compiled code,
so they probably won't see much use in decompilations. They don't have access
to the `this` pointer that other types of methods can use.
|
Regular methods are similar to regular functions, but have an implicit first
argument called `this` that points to the object the method was called on.
Some C++ implementations use a different calling convention for method calls;
for example, Microsoft's implementation for the Xbox uses the `__thiscall`
convention, where the `this` pointer is passed in the ECX register while all
other arguments are passed on the stack.
|
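To make the hidden argument concrete, here is a minimal sketch of a method next
to a free function written the way Ghidra displays it, with the object pointer
as an explicit first parameter (Ghidra names it `this`, which isn't a legal
parameter name in C++, so `self` is used here). `SomeClass_regularMethod` is a
hypothetical name, and the ECX-based `__thiscall` convention isn't modelled
since it's MSVC-specific; only the extra pointer argument is.

```c++
#include <cassert>

struct SomeClass {
    float someMemberVariable;
    unsigned anotherMemberVariable;

    // Ordinary method: the object pointer is passed implicitly as `this`
    void regularMethod(unsigned anArgument) {
        this->anotherMemberVariable -= anArgument;
    }
};

// The same operation as Ghidra presents it: the object pointer is simply the
// first parameter (hypothetical free-function equivalent of the method above)
void SomeClass_regularMethod(SomeClass *self, unsigned anArgument) {
    self->anotherMemberVariable -= anArgument;
}
```
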
Constructors and destructors function largely like regular methods, but
implicitly return the `this` pointer. C++ makes certain guarantees about
objects that have constructors and destructors that obligate the compiler to
insert additional code in certain circumstances. Be aware, for instance, that
constructor calls will often be wrapped with stack unwinding code in case an
exception is thrown from within the constructor (see the exception handling
section). An object's destructor is also called automatically at the end of its
lifetime (e.g. when it goes out of scope), so destructor calls can show up in
exception handling code, or at the end of a code block even if the source code
never invokes them explicitly. This automatic resource management is a core
part of C++'s RAII (resource acquisition is initialization) design.
|
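Here is a small runnable sketch of those implicit destructor calls, one at the
end of a scope and one during stack unwinding. `Tracker` and the counter are
hypothetical names; in MSVC-compiled code, the second call would live in the
kind of unwind handler described in the exception handling section rather than
in the function body itself.

```c++
#include <cassert>
#include <stdexcept>

static int g_destructorCalls = 0;

struct Tracker {
    ~Tracker() { ++g_destructorCalls; }  // Never called explicitly below
};

// Normal exit: ~Tracker() runs at the closing brace
void leaveScopeNormally() {
    Tracker t;
}

// Exceptional exit: ~Tracker() runs while the stack unwinds
void leaveScopeByThrowing() {
    Tracker t;
    throw std::runtime_error("unwind me");
}

// Runs both scenarios and reports how many destructor calls happened
int countDestructorCalls() {
    g_destructorCalls = 0;
    leaveScopeNormally();
    try {
        leaveScopeByThrowing();
    } catch (const std::runtime_error &) {
        // By now, `t` from leaveScopeByThrowing() has already been destroyed
    }
    return g_destructorCalls;
}
```
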
Virtual methods are methods that can be overridden in child classes. They're
generally not called directly, but instead through a hidden first member that
points to an array of method function pointers, usually called a vtable (Visual
C++ 7 calls it `` ClassName::`vftable' ``). If a destructor specifically is
made virtual, additional "deleting destructors" may be generated as well, which
are methods taking one `unsigned` argument that call the destructor and then,
depending on the argument, free the object's memory.
|
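As a rough sketch of what a deleting destructor does, here is a hand-written
equivalent of the scalar deleting destructor mentioned in the name mangling
section. The assumption (not checked against JSRF's binary here) is that bit 0
of the argument requests freeing, which is how Microsoft's compiler is commonly
described as generating these; `GameObj_scalarDeletingDestructor` and the
counter are hypothetical.

```c++
#include <cassert>
#include <new>

static int g_gameObjDestroyed = 0;

struct GameObj {
    virtual ~GameObj() { ++g_gameObjDestroyed; }
};

// Hypothetical equivalent of GameObj::`scalar deleting destructor'(unsigned int):
// always destroy the object, then free its memory only if bit 0 of flags is set
void *GameObj_scalarDeletingDestructor(GameObj *self, unsigned flags) {
    void *result = self;        // Returned like the compiler-generated version
    self->GameObj::~GameObj();  // Qualified call, so no virtual dispatch
    if (flags & 1) {
        ::operator delete(self);
    }
    return result;
}
```
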
Ghidra has somewhat obscure support for classes and regular methods, and
virtual methods can be made to work with some admittedly tedious effort.

A class can be defined by right-clicking on "Classes" in the symbol tree window
and selecting "Create Class." Symbols (e.g. methods) can then be added to this
class by putting them in the class's namespace, i.e. opening the Add/Edit Label
or Rename Function window (usually from right-clicking something or its name)
and adding the class name as a prefix, e.g. `ClassName::someSymbol`. Be aware
that certain windows like the Edit Function window have no awareness of
namespaces, and trying to add the namespace prefix there will just modify the
symbol name directly without actually adding it to the namespace. For Microsoft
code (e.g. Xbox), applying the `__thiscall` calling convention enables a
special behaviour where the first argument, passed in ECX, is forcibly named
`this` and has a fixed pointer type (by default `void *`). If the method is
placed in a class's namespace and a struct of the same name exists, however,
the `this` pointer's type will be set to a pointer to that struct.
|
Since virtual method calls go through pointers rather than calling a function
at a fixed address, they show up in Ghidra as unsightly member accesses like
`(**(code **)(*g_graphics + 0x160))(g_graphics,0)`. One can, however,
simulate vtables by hand to get these calls resolving to something more
manageable like `(*g_graphics->vtable->setFogEnable)(g_graphics,0)`, with the
correct number and types of arguments. The class's first member (the pointer
to its vtable) can be set to a pointer to a new struct type whose members are
pointers to functions defined in the data type manager (right click and then
`New > Function Definition...`). To actually find the methods' definitions
(keeping in mind there are likely several for different classes inheriting
from the same base class), it will be necessary to either find where the vtable
is assigned (the class constructor is a good choice) or examine the first
member of an instance of the class at runtime with the help of Cheat Engine or
an emulator's memory viewer.
|
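The same arrangement can be written out in plain C++ to see how the pieces
relate. This is a minimal sketch: `Graphics`, `Graphics_vtable`,
`getFogEnable`, and the other helpers are hypothetical stand-ins (only
`g_graphics` and `setFogEnable` come from the example above), and the real
vtable's layout, including the `0x160` offset, isn't reproduced.

```c++
#include <cassert>

struct Graphics;  // Forward declaration so the vtable type can mention it

// Hand-made vtable type: each entry takes the object pointer explicitly
struct Graphics_vtable {
    void (*setFogEnable)(Graphics *self, int enable);
    int (*getFogEnable)(Graphics *self);
};

// The object itself: the vtable pointer is the (normally hidden) first member
struct Graphics {
    Graphics_vtable *vtable;
    int fogEnabled;
};

// One class's implementations of the methods
void Graphics_setFogEnable(Graphics *self, int enable) { self->fogEnabled = enable; }
int Graphics_getFogEnable(Graphics *self) { return self->fogEnabled; }

// The single vtable instance shared by all objects of this class
Graphics_vtable g_graphicsVtable = {Graphics_setFogEnable, Graphics_getFogEnable};

// What a constructor does first: point the object at its class's vtable
void Graphics_construct(Graphics *self) {
    self->vtable = &g_graphicsVtable;
    self->fogEnabled = 1;
}
```
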
### Inheritance
Child classes can be used in most places that their parent class can be used:
```c++
// Class inheriting from SomeStruct
struct SomeStructChild : SomeStruct {
    // Inherits these from SomeStruct:
    // float someMemberVariable;
    // unsigned anotherMemberVariable;
    char * additionalMemberVariable;
};

// Could call this with either a SomeStruct* or SomeStructChild* argument
float getSomeMemberVariable(SomeStruct const * const ss) {
    return ss->someMemberVariable;
}
```
|
|
## The `new` and `delete` Operators
One way to allocate an object in C++ is using `new` and `delete`. The former
allocates memory and constructs the object in it, while the latter calls the
object's destructor and then frees its memory, much like calling `free()`.
Each has a corresponding `operator new()` or `operator delete()` function that
is called implicitly to do the actual allocation or deallocation.
|
The generated code for a use of `new` with a constructor (like
`SomeStruct *ss = new SomeStruct(7)`) performs the allocator and constructor
calls separately, roughly as follows (as it would appear in Ghidra; note that
Ghidra explicitly shows the `this` pointer being passed):
```c++
SomeStruct *ss;
ss = (SomeStruct *)operator_new(0xc);
if (ss == NULL) {
    ss = NULL; // No, I'm not sure what the point of reassigning NULL is
}
else {
    ss = SomeStruct::SomeStruct(ss,7);
}
```
|
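Written out by hand in portable C++, the two halves of `new` (and of `delete`)
look roughly like the sketch below. It's an illustration under assumptions, not
what the compiler literally emits: the nothrow `operator new` stands in for an
allocator that returns NULL on failure (hence the NULL check above), and the
`try`/`catch` stands in for the unwind handler the compiler generates to free
the memory if the constructor throws. `newSomeStruct()`, `deleteSomeStruct()`,
and this version of `SomeStruct` are hypothetical.

```c++
#include <cassert>
#include <new>
#include <stdexcept>

struct SomeStruct {
    int value;
    explicit SomeStruct(int v) : value(v) {
        if (v < 0) throw std::invalid_argument("negative");
    }
};

// Hypothetical hand-written equivalent of `new SomeStruct(arg)`
SomeStruct *newSomeStruct(int arg) {
    // 1. Allocate raw memory; the nothrow form returns nullptr on failure
    void *mem = ::operator new(sizeof(SomeStruct), std::nothrow);
    if (mem == nullptr) {
        return nullptr;
    }
    // 2. Construct in place, freeing the memory if the constructor throws
    try {
        return new (mem) SomeStruct(arg);
    } catch (...) {
        ::operator delete(mem);
        throw;
    }
}

// Hypothetical hand-written equivalent of `delete ss`
void deleteSomeStruct(SomeStruct *ss) {
    if (ss != nullptr) {
        ss->~SomeStruct();       // Destroy the object...
        ::operator delete(ss);   // ...then release its memory
    }
}
```
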
|
## Exception Handling
C++ offers the ability to throw and catch exceptions, which have highly
platform-specific implementations that require some sophistication to uphold
the language's guarantees about object initialization and destruction. In
particular, some hidden bookkeeping needs to be done to implement `try` and
`catch` blocks, as well as to keep track of what cleanup needs to be done if an
exception is thrown (part of a process known as stack unwinding, i.e. walking
back up the call stack until the exception is caught or the top is reached).
|
We'll focus here on the Microsoft implementation found in Xbox games. The FS
register points to the thread information block, whose first field (`FS:[0]`)
holds the most recently added item of a linked list of structures with
exception handling information, defined thusly:
```c++
struct EXCEPTION_REGISTRATION_RECORD {
    EXCEPTION_REGISTRATION_RECORD * Next; // Next item in linked list
    EXCEPTION_ROUTINE * Handler; // Exception handler function
};
```
|
Functions with any exception handling or stack unwinding will have a prologue
like the following in Ghidra:
```c++
undefined4 *unaff_FS_OFFSET;
undefined4 local_c;
undefined *puStack_8;
undefined4 local_4;

local_4 = 0xffffffff;
puStack_8 = &LAB_00186c4b;
local_c = *unaff_FS_OFFSET;
*unaff_FS_OFFSET = &local_c;
```
|
One might clean this up a bit, revealing that the code is adding a new entry to
the list (here from the JSRF `Game::Game()` constructor):
```c++
EXCEPTION_REGISTRATION_RECORD *_tib; // "thread information block"
EXCEPTION_REGISTRATION_RECORD _err;
int _trylevel;

_err.Next = _tib->Next;
_trylevel = -1;
_err.Handler = Game_handler;
_tib->Next = &_err;
```
|
As the name suggests, `_trylevel` is updated (typically incremented) when
execution enters a new block of code requiring exception handling or stack
unwinding, e.g. around constructors whose memory must be freed if they throw.
The function ends by removing the item it added to the exception handling list
(`_tib->Next = _err.Next`).
|
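To see that list manipulation in isolation, here is a portable simulation of
the prologue and epilogue. It's a sketch under stated assumptions: an ordinary
global, `g_exceptionList` (hypothetical), stands in for the list head at
`FS:[0]` (what the cleaned-up code calls `_tib->Next`), and the handler is
simplified to a plain function pointer rather than a real exception routine.

```c++
#include <cassert>

// Simplified handler type; real records point to an EXCEPTION_ROUTINE
using HandlerFn = void (*)();

struct EXCEPTION_REGISTRATION_RECORD {
    EXCEPTION_REGISTRATION_RECORD *Next;  // Next item in linked list
    HandlerFn Handler;                    // Exception handler function
};

// Stand-in for the list head stored at FS:[0]
EXCEPTION_REGISTRATION_RECORD *g_exceptionList = nullptr;

// Hypothetical handler, only used for its address here
void Game_handler() {}

// Prologue: link a new record in front of the current head
void pushRecord(EXCEPTION_REGISTRATION_RECORD *err, HandlerFn handler) {
    err->Next = g_exceptionList;
    err->Handler = handler;
    g_exceptionList = err;
}

// Epilogue: unlink it again (the `_tib->Next = _err.Next` step)
void popRecord(EXCEPTION_REGISTRATION_RECORD *err) {
    assert(g_exceptionList == err);  // Records are removed in LIFO order
    g_exceptionList = err->Next;
}
```
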
To actually see what the exception handling or stack unwinding code will do, we
need to look at the `.Handler` function that was assigned. It usually looks
something like this:
```c++
void Game_handler(EHExceptionRecord *param_1,EHRegistrationNode *param_2,void *param_3,
                  DispatcherContext *param_4) {
    ___CxxFrameHandler(param_1,param_2,param_3,param_4,&Game_funcinfo);
    return;
}
```
|
What we care about here is the last argument passed to `__CxxFrameHandler()`,
which is a pointer to a `FuncInfo` structure defined as follows:
```c++
struct FuncInfo {
    DWORD magicNumber;
    int maxState;
    UnwindMapEntry * pUnwindMap;
    DWORD nTryBlocks;
    TryBlockMapEntry * pTryBlockMap;
    DWORD nIPMapEntries;
    void * pIPtoStateMap;
};
```
|
Here we can finally distinguish between unwinding code (run as an exception
propagates up the call stack) and catching code (also run when an exception is
thrown, but able to stop the exception from propagating any further): the
former gets entries in `pUnwindMap` (the number of entries being given by
`maxState`), while the latter gets entries in `pTryBlockMap` (the number of
entries being given by `nTryBlocks`).
|
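When pulling these tables out of the executable with a tool running on a 64-bit
host, the pointer members can't be read as native pointers. A minimal sketch,
assuming the 32-bit little-endian layout above with pointers kept as raw virtual
addresses; `FuncInfo32`, `readFuncInfo()`, and `looksLikeFuncInfo()` are
hypothetical. The accepted magic values are also an assumption: `0x19930520`
is the value commonly documented for compilers of this era, with later MSVC
versions using `0x19930521` and `0x19930522` (and appending extra fields this
sketch ignores).

```c++
#include <cassert>
#include <cstdint>
#include <cstring>

// FuncInfo as laid out in a 32-bit x86 executable. Pointer members are kept as
// raw 32-bit virtual addresses so the struct has the same size on any host.
struct FuncInfo32 {
    std::uint32_t magicNumber;
    std::int32_t maxState;       // Number of unwind map entries
    std::uint32_t pUnwindMap;
    std::uint32_t nTryBlocks;    // Number of try block map entries
    std::uint32_t pTryBlockMap;
    std::uint32_t nIPMapEntries;
    std::uint32_t pIPtoStateMap;
};
static_assert(sizeof(FuncInfo32) == 28, "seven 4-byte fields, no padding");

// Copy a FuncInfo out of raw bytes (e.g. read from the game's executable).
// Assumes a little-endian host, matching x86's byte order.
FuncInfo32 readFuncInfo(const unsigned char *bytes) {
    FuncInfo32 info;
    std::memcpy(&info, bytes, sizeof info);
    return info;
}

// Sanity check before trusting the other fields (assumed magic values)
bool looksLikeFuncInfo(const FuncInfo32 &info) {
    return info.magicNumber >= 0x19930520u && info.magicNumber <= 0x19930522u;
}
```
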
The unwind map is the simpler of the two, with each entry being as follows:
```c++
struct UnwindMapEntry {
    int toState;
    void (*action)();
};
```
|
The `toState` member describes which value `_trylevel` will assume after the
function in the second member is called. The second member points to the
actual unwinding code, which will tend to decompile to something simple but
unpleasant like this:
```c++
void Game_handler_unwind1(void) {
    int unaff_EBP;

    operator_delete(*(void **)(unaff_EBP + 8));
    return;
}
```
|
Clearly this is freeing memory (in fact, it frees a particular object's memory
if its constructor throws), but what is the argument? EBP here holds the stack
pointer for the function that this code applies to, so you'll have to look at
the stack layout when this handler is active. While it's easy to guess much of
the time based on what code is being wrapped, in this case one could confirm
that `ESP + 8` in the function holds a pointer to memory that was just
allocated and is being passed to a constructor that's being guarded (shown by a
`CALL operator_new` followed by `MOV dword ptr [ESP + 8],EAX` in the
disassembly; make sure you know your registers and calling conventions!).
|
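Putting the unwind map together with `_trylevel`: when an exception passes
through the function, the runtime repeatedly runs the current state's action
and moves to its `toState` until it reaches the state it's unwinding to (`-1`
meaning everything). Below is a minimal simulation of that loop; the cleanup
functions and the three-entry map are hypothetical, and the real runtime also
sets up EBP for each action and guards against errors, which is omitted here.

```c++
#include <cassert>
#include <vector>

struct UnwindMapEntry {
    int toState;
    void (*action)();  // Cleanup code for this state (may be null)
};

// Hypothetical cleanup actions that just record that they ran
static std::vector<int> g_actionsRun;
void cleanupState0() { g_actionsRun.push_back(0); }
void cleanupState1() { g_actionsRun.push_back(1); }
void cleanupState2() { g_actionsRun.push_back(2); }

// Run cleanup from curState back to targetState by following toState links
void unwindToState(const UnwindMapEntry *unwindMap, int curState, int targetState) {
    while (curState != targetState) {
        const UnwindMapEntry &entry = unwindMap[curState];
        if (entry.action != nullptr) {
            entry.action();
        }
        curState = entry.toState;  // The value _trylevel takes next
    }
}
```
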
Try blocks aren't too much different in reality, with entries defined like
this:
```c++
struct TryBlockMapEntry {
    int tryLow;
    int tryHigh;
    int catchHigh;
    int nCatches;
    HandlerType * pHandlerArray;
};
```
|
The `tryLow` and `tryHigh` members specify the range of `_trylevel` values that
this try block covers, and `nCatches` indicates how many `catch` blocks there
are (which are in an array pointed to by `pHandlerArray`). `HandlerType` is our
final structure to define:
```c++
struct HandlerType {
    DWORD adjectives;
    TypeDescriptor * pType;
    int dispCatchObj;
    void * addressOfHandler;
};
```
|
The first two members specify what kind of exception is caught (`adjectives`
holds qualifier flags such as `const` or reference, and `pType` points to the
caught type's descriptor, or is null for `catch (...)`), the third gives the
stack offset where the caught exception object is stored for the handler, and
the final member is the actual exception handling code, which again uses EBP to
reference data on the original function's stack.
|
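As a rough model of how those tables are consulted, the sketch below picks a
`catch` clause for an exception thrown while the function is in a given state.
It's simplified under stated assumptions: type matching is reduced to comparing
descriptor pointers (the real runtime also considers `adjectives`, base
classes, and so on), a hypothetical `handlerId` replaces `addressOfHandler`,
`catchHigh` goes unused, and the try block map is assumed to list inner blocks
before outer ones.

```c++
#include <cassert>

// Simplified stand-in: a real TypeDescriptor holds RTTI data
struct TypeDescriptor {
    const char *name;
};

struct HandlerType {
    unsigned adjectives;          // Ignored by this sketch
    const TypeDescriptor *pType;  // nullptr means catch (...)
    int dispCatchObj;
    int handlerId;                // Hypothetical stand-in for addressOfHandler
};

struct TryBlockMapEntry {
    int tryLow;
    int tryHigh;
    int catchHigh;
    int nCatches;
    const HandlerType *pHandlerArray;
};

// Return the handler that would catch an exception of type `thrown` raised
// while the function is in `state`, or nullptr if no try block applies
const HandlerType *findHandler(const TryBlockMapEntry *tryBlocks, int nTryBlocks,
                               int state, const TypeDescriptor *thrown) {
    for (int i = 0; i < nTryBlocks; ++i) {
        const TryBlockMapEntry &tb = tryBlocks[i];
        if (state < tb.tryLow || state > tb.tryHigh) {
            continue;  // This try block doesn't cover the current state
        }
        for (int j = 0; j < tb.nCatches; ++j) {
            const HandlerType &h = tb.pHandlerArray[j];
            if (h.pType == nullptr || h.pType == thrown) {
                return &h;  // catch clauses are tried in source order
            }
        }
    }
    return nullptr;
}
```
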
If you'd like another, more thorough treatment of reverse engineering
exceptions, also take a look at
[this article](https://www.openrce.org/articles/full_view/21), or if you'd
really like the whole implementation spelled out in excruciating detail,
[this one](https://web.archive.org/web/20101007110629/http://www.microsoft.com/msj/0197/exception/exception.aspx)
is unparalleled.