Mirror of https://codeberg.org/KeybadeBlox/JSRF-Decompilation.git
(synced 2026-02-20 02:07:02 +03:00)

Complete Decompiling C++ article (for now)

Commit d06de00855 (parent 547f2ba179), 1 changed file with 207 additions and 6 deletions

@@ -12,8 +12,19 @@ structs) and how to decompile them is assumed.

## Name Mangling

Whenever you encounter symbol names actually produced by a C++ compiler, like
when recompiling decompiled code, they'll probably look garbled, like
`??_GGameObj@@UAEPAXI@Z` or `_ZN7GameObjD1Ev` depending on the compiler. These
are mangled names, used by compilers to prevent conflicts between overloaded
functions, communicate additional information about symbols, and so on.

Many tools can print these in human-readable form, producing e.g.
`` public: virtual void * __thiscall GameObj::`scalar deleting destructor'(unsigned int) ``
for the first name above, and objdiff will do so by default. When using the
Ghidra delinking tool specifically, keep in mind that the delinked symbol names
do _not_ get mangled, so they won't have the exact same names as in the
recompiled code, and corresponding symbols in the delinked and recompiled
object files will need to be associated by hand.
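
For the GCC/Clang style of name (the `_ZN7GameObjD1Ev` form, from the Itanium
C++ ABI), the runtime library shipped with those compilers exposes its demangler
as `abi::__cxa_demangle` in `<cxxabi.h>`. The sketch below assumes a GCC or
Clang toolchain, and the `demangle` wrapper is only an illustrative helper.
`<cxxabi.h>` isn't available under MSVC. There, the DbgHelp function
`UnDecorateSymbolName` plays the same role for names like
`??_GGameObj@@UAEPAXI@Z`.

```c++
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Demangle an Itanium-ABI (GCC/Clang) symbol such as "_ZN7GameObjD1Ev".
// Names this demangler doesn't understand (e.g. MSVC's "??_G...") are
// returned unchanged.
std::string demangle(const char *mangled) {
	int status = 0;
	char *result = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
	if (status != 0 || result == nullptr) return mangled;
	std::string readable(result);
	std::free(result);  // __cxa_demangle allocates its result with malloc
	return readable;
}
```

With this, `demangle("_ZN7GameObjD1Ev")` yields `GameObj::~GameObj()` (the
complete object destructor), while the MSVC-style name passes through untouched.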

## Classes

@@ -55,7 +66,7 @@ be inserted afterwards.

### Class Methods

Methods are functions declared within a class's namespace, like so:

```c++
struct SomeClass {
	// Regular data members
	float someMemberVariable;
	unsigned anotherMemberVariable;
	// … (rest of this example falls outside the diff hunk)
```

@@ -96,17 +107,57 @@ for method calls, such as Microsoft's implementation for the Xbox using the

while all other arguments are passed on the stack.
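
As a rough, portable illustration of the hidden `this` argument, the sketch
below pairs a method with the free function a decompiler would effectively show
for it. The `scale` method and the `SomeClass_scale` function are hypothetical
additions to the `SomeClass` example. The explicit `self` parameter stands in
for the ECX-passed `this` of `__thiscall`.

```c++
#include <cassert>

struct SomeClass {
	float someMemberVariable;
	unsigned anotherMemberVariable;

	// Hypothetical method: the compiler passes &object as a hidden `this`
	void scale(float factor) { someMemberVariable *= factor; }
};

// Roughly what the method looks like once decompiled and typed in Ghidra:
// an ordinary function whose first parameter is the object pointer
void SomeClass_scale(SomeClass *self, float factor) {
	self->someMemberVariable *= factor;
}
```

Calling `a.scale(2.0f)` and `SomeClass_scale(&a, 2.0f)` has the same effect,
which is why a method that hasn't been put in a class namespace yet just looks
like a function taking a struct pointer first.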

Constructors and destructors function largely like regular methods, but
implicitly return the `this` pointer. C++ makes certain guarantees about
objects that have constructors and destructors that obligate the compiler to
insert additional code in certain circumstances: be aware, for instance, that
constructor calls will often be wrapped with stack unwinding code in case an
exception is thrown from within the constructor (see the exception handling
section). An object's destructor is also automatically called at the end of
its lifetime (e.g. when it goes out of scope), so it may be called from
exception handling code or simply at the end of a code block even if the
source code doesn't invoke it explicitly. This automatic resource management
is often called part of C++'s RAII (resource acquisition is initialization)
design.
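
A minimal sketch of that implicit destructor call, using a hypothetical `Guard`
type and `g_destroyed` counter. Nothing in `useGuard()` mentions the destructor,
yet it still runs when `g` goes out of scope. In a decompilation this typically
appears as a destructor call (or its inlined body) at the end of the block.

```c++
#include <cassert>

int g_destroyed = 0;  // counts how many Guard objects have been destroyed

struct Guard {
	~Guard() { ++g_destroyed; }
};

void useGuard() {
	Guard g;  // no explicit destructor call anywhere in this function...
	// ...but the compiler emits a call to Guard::~Guard() here,
	// at the end of g's scope
}
```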

Virtual methods are methods that can be overridden by child classes. They're
not called directly, but instead called through a hidden first member that
points to an array of method function pointers, usually called a vtable (Visual
C++ 7 calls it `` ClassName::`vftable' ``). If a destructor specifically is
made virtual, additional "deleting destructors" may be generated as well, which
are methods taking one `unsigned` argument that call the destructor and then,
depending on the argument, free the object's memory.
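
To make the vtable and deleting destructor concrete, here is a hand-rolled,
C-style model of roughly what the compiler generates, written the way it tends
to look once recreated in Ghidra. Everything here is hypothetical: the names,
the `getId` method, and the assumption that bit 0 of the `unsigned` argument
requests freeing the memory (the usual reading of MSVC's scalar deleting
destructor). The real deleting destructor also returns a `void *`, as its
demangled name shows, but that is left out for simplicity.

```c++
#include <cassert>
#include <new>

struct GameObj;

// The table of method pointers the hidden first member points to
struct GameObjVtable {
	void (*scalarDeletingDestructor)(GameObj *self, unsigned flags);
	int (*getId)(GameObj *self);
};

struct GameObj {
	const GameObjVtable *vtable;  // hidden first member
	int id;
};

int g_dtorCalls = 0;  // counts destructor runs

void GameObj_destructor(GameObj *) { ++g_dtorCalls; }

// Run the destructor, then free the memory only if bit 0 of flags is set
void GameObj_scalarDeletingDestructor(GameObj *self, unsigned flags) {
	GameObj_destructor(self);
	if (flags & 1) ::operator delete(self);
}

int GameObj_getId(GameObj *self) { return self->id; }

const GameObjVtable GameObj_vftable = {GameObj_scalarDeletingDestructor, GameObj_getId};

// Stand-in for `new GameObj(id)`: allocate, then install the vtable pointer
GameObj *GameObj_new(int id) {
	void *memory = ::operator new(sizeof(GameObj));
	return new (memory) GameObj{&GameObj_vftable, id};
}
```

Calls of the form `obj->vtable->getId(obj)` are the kind of indirect call that
Ghidra shows as raw offsets until a vtable struct like `GameObjVtable` is
defined. Passing 0 instead of 1 runs the destructor without freeing, as for an
object on the stack.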

Ghidra has somewhat obscure support for classes and regular methods, and
virtual methods can be made to work with some admittedly tedious effort.

A class can be defined by right-clicking on "Classes" in the symbol tree window
and selecting "Create Class." Symbols (e.g. methods) can then be added to this
class by putting them in the class's namespace, i.e. opening the Add/Edit Label
or Rename Function window (usually from right-clicking something or its name)
and adding the class name as a prefix, e.g. `ClassName::someSymbol`. Be aware
that certain windows like the Edit Function window have no awareness of
namespaces, and adding the namespace prefix there will just modify the symbol
name directly without actually adding it to the namespace. For Microsoft code
(e.g. Xbox), applying the appropriate `__thiscall` calling convention enables a
special behaviour where the first argument, passed in ECX, is forcibly named
`this` and has a fixed pointer type (by default `void *`). If the method is
placed in a class's namespace, however, and a struct of the same name exists,
`this` will become a pointer to that struct.

Since virtual method calls go through pointers rather than calling a function
at a fixed address, they show up in Ghidra as unsightly member accesses like
`(**(code **)(*g_graphics + 0x160))(g_graphics,0)`. One can, however,
simulate vtables by hand to get these calls resolving to something somewhat
more manageable like `(*g_graphics->vtable->setFogEnable)(g_graphics,0)` with
the correct number and types of arguments. The class's first member (which
points to its vtable) can be set to a pointer to a new struct type whose
members are pointers to functions defined in the data type manager (right click
and then `New > Function Definition...`). To actually locate the methods'
definitions (keeping in mind there are likely several, for different classes
inheriting from the same base class), it will be necessary to either find
where the vtable is assigned (the class constructor is a good choice) or
examine the first member of an instance of the class at runtime with the help
of Cheat Engine or an emulator's memory viewer.
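
The offset arithmetic behind that cleanup can be sketched in C++. This is a
hypothetical layout built only from the example call above: `setFogEnable` is
the one identified method, and everything before it is placeholder padding.
Each vtable entry is a 4-byte pointer on the Xbox's 32-bit CPU, so the byte
offset 0x160 corresponds to slot 0x160 / 4 = 88.

```c++
#include <cassert>
#include <cstddef>

struct Graphics;

constexpr std::size_t kXboxPointerSize = 4;
constexpr std::size_t kSetFogEnableSlot = 0x160 / kXboxPointerSize;  // 88

struct GraphicsVtable {
	void (*unknown[kSetFogEnableSlot])();  // slots 0 to 87, not identified yet
	void (*setFogEnable)(Graphics *self, int enable);
};

struct Graphics {
	const GraphicsVtable *vtable;  // hidden first member
	int fogEnabled;                // hypothetical data member
};

// setFogEnable must come right after the 88 placeholder slots
static_assert(offsetof(GraphicsVtable, setFogEnable) == kSetFogEnableSlot * sizeof(void (*)()),
              "setFogEnable should be vtable slot 88");

// Hypothetical implementation, standing in for the real method's address
void Graphics_setFogEnable(Graphics *self, int enable) { self->fogEnabled = enable; }
```

Compiled for a 64-bit host, the byte offset becomes 88 * 8 rather than 0x160.
That is why the `static_assert` checks the slot index instead of a hard-coded
byte offset.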

### Inheritance

Child classes can be used in most places that their parent class can be used:

@@ -149,4 +200,154 @@ else {

## Exception Handling

C++ offers the ability to throw and catch exceptions, which have highly
platform-specific implementations that require some sophistication to uphold
the language's guarantees about object initialization and destruction. In
particular, some hidden bookkeeping is needed to implement `try` and `catch`
blocks, as well as to keep track of what cleanup needs to be done if an
exception is thrown (part of a process known as stack unwinding, i.e. walking
back up the call stack until the exception is caught or the top is reached).
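
The guarantee that makes stack unwinding necessary shows up in a short
standard C++ sketch. The `Cleanup`, `thrower`, `middle`, `top`, and `g_log`
names are illustrative only. `middle()` has no `try` block of its own, but its
local object must still be destroyed as the exception passes through on its way
to the `catch` in `top()`, before the handler body runs.

```c++
#include <cassert>
#include <stdexcept>
#include <string>

std::string g_log;  // records the order in which things happen

struct Cleanup {
	~Cleanup() { g_log += "cleanup;"; }
};

void thrower() { throw std::runtime_error("boom"); }

void middle() {
	Cleanup c;  // destroyed during unwinding, not by normal control flow
	thrower();
	g_log += "unreachable;";
}

void top() {
	try {
		middle();
	} catch (const std::runtime_error &) {
		g_log += "caught;";
	}
}
```

After calling `top()`, `g_log` reads `cleanup;caught;`. The destructor call in
`middle()` is the sort of code that compilers move into unwind handlers rather
than the function's normal path.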

We'll focus here on the Microsoft implementation found in Xbox games. The value
at FS:[0] (the start of the thread information block that the FS register
points to) is the most recently added item of a linked list of structures with
exception handling information, defined thusly:

```c++
struct EXCEPTION_REGISTRATION_RECORD {
	EXCEPTION_REGISTRATION_RECORD * Next; // Next item in linked list
	EXCEPTION_ROUTINE * Handler; // Function pointer
};
```

Functions with any exception handling or stack unwinding will have a prologue
like the following in Ghidra:

```c++
undefined4 *unaff_FS_OFFSET;
undefined4 local_c;
undefined *puStack_8;
undefined4 local_4;

local_4 = 0xffffffff;
puStack_8 = &LAB_00186c4b;
local_c = *unaff_FS_OFFSET;
*unaff_FS_OFFSET = &local_c;
```

One might clean this up a bit, revealing that the code is adding a new entry to
the list (here from the JSRF `Game::Game()` constructor):

```c++
EXCEPTION_REGISTRATION_RECORD *_tib; // "thread information block"
EXCEPTION_REGISTRATION_RECORD _err;
int _trylevel;

_err.Next = _tib->Next;
_trylevel = -1;
_err.Handler = Game_handler;
_tib->Next = &_err;
```

As the name suggests, `_trylevel` will be incremented when a new block of code
requiring exception handling or stack unwinding is entered, e.g. around
constructor calls whose object's memory must be freed if the constructor
throws. The function will end by dropping the item that was added to the
exception handling list (`_tib->Next = _err.Next`).
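
The registration and removal steps above amount to pushing onto and popping
from a singly linked list. Here is a portable model with several assumptions.
A plain global record named `tib` stands in for FS:[0], and only its `Next`
field is used, mirroring the `_tib` trick. `nullptr` marks the end of the list,
though the real list's terminator may differ. The handler signature is also
simplified. `pushRecord`, `popRecord`, and `countRecords` are hypothetical
helpers.

```c++
#include <cassert>

struct EXCEPTION_REGISTRATION_RECORD {
	EXCEPTION_REGISTRATION_RECORD *Next;
	void (*Handler)();  // simplified; the real handler takes four arguments
};

// Stand-in for the thread information block: tib.Next is the list head
EXCEPTION_REGISTRATION_RECORD tib = {nullptr, nullptr};

void Game_handler() {}  // placeholder handler

// Prologue: link a new record in front of the current head
void pushRecord(EXCEPTION_REGISTRATION_RECORD &err, void (*handler)()) {
	err.Next = tib.Next;
	err.Handler = handler;
	tib.Next = &err;
}

// Epilogue: unlink the record again
void popRecord(EXCEPTION_REGISTRATION_RECORD &err) {
	tib.Next = err.Next;
}

// Walk the list from the most recently added record
int countRecords() {
	int n = 0;
	for (EXCEPTION_REGISTRATION_RECORD *r = tib.Next; r != nullptr; r = r->Next) ++n;
	return n;
}
```

Pushing two records and popping them in reverse order restores the list to its
original state. That is why each function with such a prologue has to unlink
its record before returning.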

To actually see what the exception handling or stack unwinding code will do, we
need to look at the `.Handler` function that was assigned. It usually looks
something like this:

```c++
void Game_handler(EHExceptionRecord *param_1,EHRegistrationNode *param_2,void *param_3,
                  DispatcherContext *param_4) {
	___CxxFrameHandler(param_1,param_2,param_3,param_4,&Game_funcinfo);
	return;
}
```

What we care about here is the last argument passed to `__CxxFrameHandler()`,
which is a pointer to a `FuncInfo` structure defined as follows:

```c++
struct FuncInfo {
	DWORD magicNumber;
	int maxState;
	UnwindMapEntry * pUnwindMap;
	DWORD nTryBlocks;
	TryBlockMapEntry * pTryBlockMap;
	DWORD nIMapEntries;
	void * pIPtoStateMap;
};
```
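
When hunting for these structures in Ghidra's data view, it can help to check a
candidate against the expected layout. The sketch below models `FuncInfo` as it
appears in a 32-bit image, with every pointer stored as a 4-byte address, and
applies a hypothetical plausibility check. It assumes a little-endian host and
the `0x19930520` magic number of older Visual C++ versions. Newer compilers use
other values and append extra fields.

```c++
#include <cassert>
#include <cstdint>
#include <cstring>

// FuncInfo as laid out in a 32-bit image: seven 4-byte fields, 28 bytes total
struct FuncInfo32 {
	std::uint32_t magicNumber;
	std::int32_t  maxState;
	std::uint32_t pUnwindMap;     // address, not a host pointer
	std::uint32_t nTryBlocks;
	std::uint32_t pTryBlockMap;   // address, not a host pointer
	std::uint32_t nIMapEntries;
	std::uint32_t pIPtoStateMap;  // address, not a host pointer
};
static_assert(sizeof(FuncInfo32) == 28, "FuncInfo is seven dwords");

// Hypothetical check for raw bytes suspected to be a FuncInfo
bool looksLikeFuncInfo(const unsigned char *bytes, FuncInfo32 &out) {
	std::memcpy(&out, bytes, sizeof out);  // assumes a little-endian host
	return out.magicNumber == 0x19930520u && out.maxState >= 0
	    && (out.maxState == 0 || out.pUnwindMap != 0);
}
```

Once a candidate passes, `pUnwindMap` and `pTryBlockMap` give the addresses of
the two arrays discussed below.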

Here we can finally distinguish between unwinding code (called as an exception
propagates up the call stack) and catching code (also called if an exception is
raised, but able to stop the exception from propagating any further): the
former gets entries in `pUnwindMap` (the number of entries being given by
`maxState`), while the latter gets entries in `pTryBlockMap` (the number of
entries being given by `nTryBlocks`).

The unwind map is the simpler of the two, with each entry being as follows:

```c++
struct UnwindMapEntry {
	int toState;
	void (*action)();
};
```

The `toState` member describes which value `_trylevel` will assume after the
function in the second member is called. The second member points to the
actual unwinding code, which will tend to decompile to something simple but
unpleasant like this:

```c++
void Game_handler_unwind1(void) {
	int unaff_EBP;

	operator_delete(*(void **)(unaff_EBP + 8));
	return;
}
```

Clearly this is freeing memory (in fact, it frees a particular object's memory
if its constructor throws), but what is the argument? EBP here holds the stack
pointer for the function that this code applies to, so you'll have to look at
the stack layout when this handler is active. While it's often easy to guess
based on what code is being wrapped, one could confirm in this case that
`ESP + 8` in the function holds a pointer to memory that was just allocated and
is being passed to a constructor that's being guarded (shown by a
`CALL operator_new` followed by `MOV dword ptr [ESP + 8],EAX` in the
disassembly; make sure you know your registers and calling conventions!).
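
Putting the unwind map together with `_trylevel`: the runtime starts from the
current state, runs that state's cleanup action, moves to its `toState`, and
repeats until it reaches the target state. The sketch below models that loop
portably. Real actions are funclets that reach the parent function's stack
through EBP. Here the frame is passed as an explicit argument instead.
`Frame`, `unwindToState`, the two cleanup functions, and the example map are
all hypothetical.

```c++
#include <cassert>
#include <string>

// Stand-in for the guarded function's stack frame
struct Frame {
	std::string log;  // records which cleanup actions ran
};

struct UnwindMapEntry {
	int toState;
	void (*action)(Frame &frame);  // nullptr if this state needs no cleanup
};

// From `state`, run each cleanup action and follow toState links until
// `targetState` is reached (-1 means unwind the whole function)
void unwindToState(const UnwindMapEntry *map, int state, int targetState, Frame &frame) {
	while (state != targetState) {
		const UnwindMapEntry &entry = map[state];
		if (entry.action) entry.action(frame);
		state = entry.toState;
	}
}

void freeFirstObject(Frame &f) { f.log += "free first;"; }
void freeSecondObject(Frame &f) { f.log += "free second;"; }

// State 0: first object allocated; state 1: second object allocated too
const UnwindMapEntry kGameUnwindMap[] = {
	{-1, freeFirstObject},
	{0, freeSecondObject},
};
```

Unwinding from state 1 to -1 frees the second object and then the first, in
reverse order of construction. This matches how each `toState` links a state
back to the one before it.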

Try blocks aren't that different in reality, with entries defined like this:

```c++
struct TryBlockMapEntry {
	int tryLow;
	int tryHigh;
	int catchHigh;
	int nCatches;
	HandlerType * pHandlerArray;
};
```

The `tryLow` and `tryHigh` members give the range of `_trylevel` values that
this try block covers (`catchHigh` being the highest state used inside its
`catch` blocks), and `nCatches` indicates how many `catch` blocks there are
(which are in an array pointed to by `pHandlerArray`). `HandlerType` is our
final structure to define:

```c++
struct HandlerType {
	DWORD adjectives;
	TypeDescriptor * pType;
	int dispCatchObj;
	void * addressOfHandler;
};
```

The first two members specify what kind of exception is being caught
(`adjectives` holds flags such as const or reference qualifiers, and `pType`
describes the caught type), the third gives the stack offset where the caught
exception object is stored, and the final member is the actual exception
handling code, which again uses EBP to reference data on the original
function's stack.
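
Selecting a catch block can be sketched in the same style: find a try block
whose `tryLow` to `tryHigh` range contains the current state, then take the
first handler whose type matches. This is a simplified, hypothetical model. The
real runtime compares `TypeDescriptor`s and honours `adjectives`. Here, `pType`
is replaced by a type name string, a null name is treated as `catch (...)`, and
`handlerId` stands in for `addressOfHandler`.

```c++
#include <cassert>
#include <cstring>

struct HandlerType {
	const char *typeName;  // stand-in for pType; nullptr acts as catch (...)
	int handlerId;         // stand-in for addressOfHandler
};

struct TryBlockMapEntry {
	int tryLow;
	int tryHigh;
	int catchHigh;
	int nCatches;
	const HandlerType *pHandlerArray;
};

// Return the id of the first catch block whose try range covers `state` and
// whose type matches `thrownType`, or -1 if this function doesn't catch it
int findCatch(const TryBlockMapEntry *tryMap, int nTryBlocks, int state, const char *thrownType) {
	for (int i = 0; i < nTryBlocks; ++i) {
		const TryBlockMapEntry &tb = tryMap[i];
		if (state < tb.tryLow || state > tb.tryHigh) continue;
		for (int j = 0; j < tb.nCatches; ++j) {
			const HandlerType &h = tb.pHandlerArray[j];
			if (h.typeName == nullptr || std::strcmp(h.typeName, thrownType) == 0)
				return h.handlerId;
		}
	}
	return -1;
}
```

A state outside every try range returns -1, meaning the exception keeps
propagating and only the unwind map applies to this function.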

If you'd like another, more thorough treatment of reverse engineering
exceptions, also take a look at
[this article](https://www.openrce.org/articles/full_view/21), or if you'd
really like the whole implementation spelled out in excruciating detail,
[this one](https://web.archive.org/web/20101007110629/http://www.microsoft.com/msj/0197/exception/exception.aspx)
is unparalleled.