mirror of
https://codeberg.org/KeybadeBlox/JSRF-Decompilation.git
synced 2026-02-20 02:07:02 +03:00
Complete Decompiling C++ article (for now)
parent 547f2ba179
commit d06de00855
1 changed file with 207 additions and 6 deletions
@ -12,8 +12,19 @@ structs) and how to decompile them is assumed.
## Name Mangling
(on the off chance you actually get symbol names, like from debug info; also
why symbol names don't match in objdiff)

Whenever you encounter symbol names actually produced by a C++ compiler, like
when recompiling decompiled code, they'll probably look garbled like
`??_GGameObj@@UAEPAXI@Z` or `_ZN7GameObjD1Ev` depending on the compiler. These
are mangled names, used by compilers to prevent conflicts from overloaded
functions, communicate additional information about symbols, and so on.

Many tools can print these in human-readable form to produce e.g.
`` public: virtual void * __thiscall GameObj::`scalar deleting destructor'(unsigned int) ``,
and objdiff will do so by default. When using the Ghidra delinking tool
specifically, it's important to keep in mind that the delinked symbol names do
_not_ get mangled, so they won't have the exact same names as in the recompiled
code, and corresponding symbols in the delinked and recompiled object files
will need to be associated by hand.
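
As a small illustration of why mangling exists, the sketch below declares two
overloads that share one source-level name. The function names are made up for
this example, and the symbol names in the comments are what Visual C++ and
Itanium-ABI compilers (e.g. GCC) would be expected to emit for these
signatures, so treat them as a guide rather than output copied from a build.

```c++
#include <cassert>

// Two overloads share the name `add`, so the compiler has to encode the
// parameter types into each symbol to keep them apart in the object file.
int add(int a, int b) { return a + b; }          // MSVC: ?add@@YAHHH@Z  Itanium: _Z3addii
double add(double a, double b) { return a + b; } // MSVC: ?add@@YANNN@Z  Itanium: _Z3adddd

// extern "C" turns C++ mangling off for a symbol (and with it, overloading).
// The symbol is plain `add_c`, or `_add_c` with 32-bit x86 cdecl decoration.
extern "C" int add_c(int a, int b) { return a + b; }
```

Dumping the symbols of the resulting object file (for example in objdiff) is a
quick way to check what a given compiler actually produced.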
## Classes
@ -55,7 +66,7 @@ be inserted afterwards.
### Class Methods
Methods are functions declared within a class's namespace, like so:
```c++
struct SomeClass {
    // Regular data members
    float    someMemberVariable;
    unsigned anotherMemberVariable;
@ -96,17 +107,57 @@ for method calls, such as Microsoft's implementation for the Xbox using the
while all other arguments are passed on the stack.

Constructors and destructors function largely like regular methods, but
implicitly return the `this` pointer. C++ makes certain guarantees about
objects that have constructors and destructors that obligate the compiler to
insert additional code in certain circumstances: be aware, for instance, that
constructor calls will often be wrapped with stack unwinding code in case an
exception is thrown from within the constructor (see the exception handling
section). An object's destructor is also automatically called at the end of
its lifetime (e.g. when it goes out of scope), which can lead to inclusion in
exception handling code or just being called at the end of a code block even if
the source code doesn't invoke it explicitly. This automatic resource
management is often called part of C++'s RAII (resource acquisition is
initialization) design.
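
The following is a minimal sketch of both behaviours described above, using
made-up types (`Tracer`, `Thrower`) and a log string purely for illustration.
It shows destructors running at the end of a block without being written out,
and the hidden cleanup that frees an allocation when the constructor inside a
`new` expression throws.

```c++
#include <cassert>
#include <cstddef>
#include <new>
#include <stdexcept>
#include <string>

static std::string g_log;   // hypothetical log recording the implicit calls

struct Tracer {
    char tag;
    explicit Tracer(char t) : tag(t) { g_log += tag; }
    ~Tracer() { g_log += '~'; g_log += tag; }
};

struct Thrower {
    Thrower() { throw std::runtime_error("constructor failed"); }
    // Class-specific allocation functions, present only to make the
    // compiler-inserted cleanup visible in the log.
    static void* operator new(std::size_t size) { g_log += "[new]"; return ::operator new(size); }
    static void operator delete(void* p) { g_log += "[delete]"; ::operator delete(p); }
};

std::string runRaiiDemo() {
    g_log.clear();
    {
        Tracer a('a');
        Tracer b('b');
    }   // b, then a, are destroyed here with no explicit call in the source
    try {
        Tracer c('c');
        Thrower* t = new Thrower;   // throws: the allocation is freed for us
        delete t;                   // never reached
    } catch (const std::runtime_error&) {
        g_log += "[caught]";        // c has already been destroyed by unwinding
    }
    return g_log;
}
```

Calling `runRaiiDemo()` yields `ab~b~ac[new][delete]~c[caught]`: none of the
`~` or `[delete]` entries correspond to a call written in the source, which is
exactly the kind of code that appears unprompted in a decompilation.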

Virtual methods are methods that can be overridden on child classes. They're
not called directly, but instead called through a hidden first member that
points to an array of method function pointers, usually called a vtable (Visual
C++ 7 calls it `` ClassName::`vftable' ``). If a destructor specifically is
made virtual, additional "deleting destructors" may be generated as well, which
are methods taking one `unsigned` argument that call the destructor and then,
depending on the argument, free the object's memory.
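
To make the hidden machinery concrete, here is a hand-written approximation of
what a compiler generates for a class with virtual methods. Everything here
(`Shape`, its members, the counters) is hypothetical, and treating bit 0 of
the flags argument as "also free the memory" reflects how Visual C++ output is
commonly observed to behave rather than anything guaranteed by the language.

```c++
#include <cassert>
#include <cstdlib>

struct Shape;   // hypothetical class, for illustration only

// One function pointer slot per virtual method.
struct ShapeVtable {
    void* (*deletingDestructor)(Shape* self, unsigned flags);
    int   (*area)(const Shape* self);
};

struct Shape {
    const ShapeVtable* vtable;  // the hidden first member in real C++
    int width;
    int height;
};

static int g_destroyed = 0;     // counters so the calls can be observed
static int g_freed = 0;

void Shape_destructor(Shape* self) { (void)self; ++g_destroyed; }

// Calls the destructor, then frees the object only if bit 0 of flags is set.
void* Shape_deletingDestructor(Shape* self, unsigned flags) {
    Shape_destructor(self);
    if (flags & 1) { std::free(self); ++g_freed; }
    return self;
}

int Shape_area(const Shape* self) { return self->width * self->height; }

const ShapeVtable g_shapeVtable = { Shape_deletingDestructor, Shape_area };

Shape* Shape_create(int width, int height) {
    Shape* s = static_cast<Shape*>(std::malloc(sizeof(Shape)));
    assert(s != nullptr);
    s->vtable = &g_shapeVtable;     // what a constructor does first
    s->width = width;
    s->height = height;
    return s;
}

int demoVirtualCall() {
    Shape* s = Shape_create(3, 4);
    int area = s->vtable->area(s);          // equivalent of s->area()
    s->vtable->deletingDestructor(s, 1);    // equivalent of delete s
    return area;
}
```

A child class would supply its own table with some slots pointing at different
functions, which is why the same call site can reach different code.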

Ghidra has somewhat obscure support for classes and regular methods, and
virtual methods can be made to work with some admittedly tedious effort.

A class can be defined by right-clicking on "Classes" in the symbol tree window
and selecting "Create Class." Symbols (e.g. methods) can then be added to this
class by putting them in the class's namespace, i.e. opening the Add/Edit Label
or Rename Function window (usually from right-clicking something or its name)
and adding the class name as a prefix, e.g. `ClassName::someSymbol`. Be aware
that certain windows like the Edit Function window have no awareness of
namespaces, and trying to add the namespace prefix will just modify the symbol
name directly without actually adding it to the namespace. For Microsoft code
(e.g. Xbox), applying the appropriate `__thiscall` calling convention enables a
special behaviour where the first argument passed in ECX is forcibly named
`this` and has a fixed pointer type (by default `void *`). If the method is
placed in a class's namespace, however, and a struct of the same name exists,
the `this` pointer's type will be set to that struct.

Since virtual method calls go through pointers rather than calling a function
at a fixed address, they show up in Ghidra as unsightly member accesses like
`(**(code **)(*g_graphics + 0x160))(g_graphics,0)`. One can however
simulate vtables by hand to get these calls resolving to something somewhat
more manageable like `(*g_graphics->vtable->setFogEnable)(g_graphics,0)` with
the correct number and types of arguments. The class's first member (which is
a link to its vtable) can be set to a pointer to a new struct type whose
members are pointers to functions defined in the data type manager (right click
and then `New > Function Definition...`). To actually access the methods'
definitions (keeping in mind there are likely multiple for different classes
inheriting from the same base class), it will be necessary to either find
where the vtable is assigned (the class constructor is a good choice) or
potentially examine the first member of an instance of the class at runtime
with the help of Cheat Engine or an emulator's memory viewer.
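
When building such a vtable struct, the byte offset in the raw decompilation
tells you which slot a method occupies. The sketch below models that layout
for the `0x160` example; the type names `Graphics` and `GraphicsVtable` and the
helper `slotIndex` are assumptions made for illustration, and slots are stored
as 32-bit integers so the layout matches the Xbox's 4-byte pointers even when
compiled on a 64-bit host.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>

typedef std::uint32_t Ptr32;    // stand-in for a 4-byte Xbox pointer

struct GraphicsVtable {                         // hypothetical name
    Ptr32 unknown[0x160 / sizeof(Ptr32)];       // slots not yet identified
    Ptr32 setFogEnable;                         // guessed: void (__thiscall *)(Graphics *, int)
};

struct Graphics {                               // hypothetical name
    Ptr32 vtable;                               // GraphicsVtable *, always the first member
};

// The offset seen in `*g_graphics + 0x160` must land on the named slot.
static_assert(offsetof(GraphicsVtable, setFogEnable) == 0x160, "slot offset mismatch");
static_assert(offsetof(Graphics, vtable) == 0, "vtable pointer must come first");

// Converts a byte offset into the vtable to a zero-based slot number.
constexpr std::size_t slotIndex(std::size_t byteOffset) {
    return byteOffset / sizeof(Ptr32);
}
```

Here `slotIndex(0x160)` is 88, so 88 other function pointers precede
`setFogEnable` in the struct that gets defined in the data type manager.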
### Inheritance
Child classes can be used in most places that their parent class can be used:

@ -149,4 +200,154 @@ else {
## Exception Handling
C++ offers the ability to throw and catch exceptions, which have highly
platform-specific implementations that require some sophistication to uphold
the language's guarantees about object initialization and destruction. In
particular, some hidden bookkeeping needs to be done to implement `try` and
`catch` blocks, as well as keep track of what cleanup needs to be done if an
exception is thrown (part of a process known as stack unwinding, i.e. walking
back up the call stack until the exception is caught or the top is reached).

We'll focus here on the Microsoft implementation found in Xbox games. The
start of the segment addressed by the FS register (`FS:[0]`) holds a pointer to
the most recently added item of a linked list of structures with exception
handling information, defined as follows:
```c++
struct EXCEPTION_REGISTRATION_RECORD {
    EXCEPTION_REGISTRATION_RECORD * Next;     // Next item in linked list
    EXCEPTION_ROUTINE             * Handler;  // Function pointer
};
```

Functions with any exception handling or stack unwinding will have a prologue
like the following in Ghidra:
```c++
undefined4 *unaff_FS_OFFSET;
undefined4 local_c;
undefined *puStack_8;
undefined4 local_4;

local_4 = 0xffffffff;
puStack_8 = &LAB_00186c4b;
local_c = *unaff_FS_OFFSET;
*unaff_FS_OFFSET = &local_c;
```

One might clean this up a bit, revealing that the code is adding a new entry to
the list (here from the JSRF `Game::Game()` constructor):
```c++
EXCEPTION_REGISTRATION_RECORD *_tib;  // "thread information block"
EXCEPTION_REGISTRATION_RECORD _err;
int _trylevel;

_err.Next = _tib->Next;
_trylevel = -1;
_err.Handler = Game_handler;
_tib->Next = &_err;
```

As the name suggests, `_trylevel` will be incremented when a new block of code
requiring exception handling or stack unwinding is encountered, e.g. around
constructors whose memory must be freed if they throw. The function will end
by dropping the item that was added to the exception handling list
(`_tib->Next = _err.Next`).
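
The push and pop of the registration list can be imitated in portable C++ to
see how nested calls build it up. This is only a sketch: the global
`g_listHead` stands in for the list head that real code reaches through FS,
the handler type is simplified, and all function names are invented.

```c++
#include <cassert>

typedef void (*HandlerFn)();    // simplified stand-in for EXCEPTION_ROUTINE

struct RegistrationRecord {
    RegistrationRecord* Next;
    HandlerFn           Handler;
};

static RegistrationRecord* g_listHead = nullptr;  // stand-in for the FS-based head

static void innerHandler() {}
static void outerHandler() {}

// Counts the records currently linked into the list.
int listDepth() {
    int depth = 0;
    for (RegistrationRecord* r = g_listHead; r != nullptr; r = r->Next) ++depth;
    return depth;
}

int inner() {
    RegistrationRecord err;
    int trylevel = -1;          // prologue: no guarded region active yet
    err.Next = g_listHead;
    err.Handler = innerHandler;
    g_listHead = &err;

    trylevel = 0;               // entering a region that needs cleanup
    int depth = listDepth();
    trylevel = -1;              // leaving it again

    g_listHead = err.Next;      // epilogue: drop our record
    (void)trylevel;
    return depth;
}

int outer() {
    RegistrationRecord err;
    err.Next = g_listHead;
    err.Handler = outerHandler;
    g_listHead = &err;

    int depth = inner();        // inner() links its record in front of ours

    g_listHead = err.Next;
    return depth;
}
```

`outer()` returns 2 because both records are linked while `inner()` runs, and
the list is empty again afterwards since every epilogue restores the old head.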

To actually see what the exception handling or stack unwinding code will do, we
need to look at the `.Handler` function that was assigned. It usually looks
something like this:
```c++
void Game_handler(EHExceptionRecord *param_1,EHRegistrationNode *param_2,void *param_3,
                  DispatcherContext *param_4) {
    ___CxxFrameHandler(param_1,param_2,param_3,param_4,&Game_funcinfo);
    return;
}
```

What we care about here is the last argument passed to `__CxxFrameHandler()`,
which is a pointer to a `FuncInfo` structure defined as follows:
```c++
struct FuncInfo {
    DWORD              magicNumber;
    int                maxState;
    UnwindMapEntry   * pUnwindMap;
    DWORD              nTryBlocks;
    TryBlockMapEntry * pTryBlockMap;
    DWORD              nIMapEntries;
    void             * pIPtoStateMap;
};
```

Here we can finally distinguish between unwinding code (called as an exception
propagates up the call stack) and catching code (also called if an exception is
raised, but it can stop the exception from propagating any further): the former
gets entries in `pUnwindMap` (the number of entries being given by `maxState`),
while the latter gets entries in `pTryBlockMap` (the number of entries being
given by `nTryBlocks`).

The unwind map is the simpler of the two, with each entry being as follows:
```c++
struct UnwindMapEntry {
    int toState;
    void (*action)();
};
```

The `toState` member describes which value `_trylevel` will assume after the
function in the second member is called. The second member points to the
actual unwinding code, which will tend to decompile to something simple but
unpleasant like this:
```c++
void Game_handler_unwind1(void) {
    int unaff_EBP;

    operator_delete(*(void **)(unaff_EBP + 8));
    return;
}
```

Clearly this is freeing memory (in fact, it frees a particular object's memory
if its constructor throws), but what is the argument? EBP here holds the stack
pointer for the function that this code applies to, so you'll have to look at
the stack layout when this handler is active. While it's easy to guess much of
the time based on what code is being wrapped, one could look to confirm in this
case that `ESP + 8` in the function holds a pointer to memory that was just
allocated and is being passed to a constructor that's being guarded (shown by a
`CALL operator_new` followed by `MOV dword ptr [ESP + 8],EAX` in the
disassembly; make sure you know your registers and calling conventions!).
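
To tie `toState` and `action` together, the sketch below walks a small unwind
map the way the description above implies: run the action for the current
state, move to its `toState`, and repeat. The table contents and function names
are invented, and the real frame handler does considerably more than this, so
read it as a model of the data structure rather than of `__CxxFrameHandler()`.

```c++
#include <cassert>
#include <string>

struct UnwindMapEntry {
    int toState;
    void (*action)();
};

static std::string g_trace;     // hypothetical record of which actions ran

static void destroyFirst() { g_trace += "first;"; }
static void freeSecond()   { g_trace += "second;"; }
static void destroyThird() { g_trace += "third;"; }

// Entry N describes how to leave state N: which cleanup to run and which
// state is current afterwards (-1 meaning nothing is left to clean up).
static const UnwindMapEntry g_unwindMap[] = {
    { -1, destroyFirst },
    {  0, freeSecond   },
    {  1, destroyThird },
};

// Runs cleanup actions starting at `state` until `targetState` is reached.
int unwindTo(const UnwindMapEntry* map, int state, int targetState) {
    while (state != targetState && state != -1) {
        const UnwindMapEntry& entry = map[state];
        if (entry.action != nullptr) entry.action();
        state = entry.toState;
    }
    return state;
}

std::string demoUnwind(int startState, int targetState) {
    g_trace.clear();
    unwindTo(g_unwindMap, startState, targetState);
    return g_trace;
}
```

Starting from state 2 and unwinding fully (`demoUnwind(2, -1)`) runs the
actions newest-first, giving `third;second;first;`, which mirrors objects being
destroyed in the reverse order of their construction.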

Try blocks aren't too much different in reality, with entries defined like
this:
```c++
struct TryBlockMapEntry {
    int           tryLow;
    int           tryHigh;
    int           catchHigh;
    int           nCatches;
    HandlerType * pHandlerArray;
};
```

The `tryLow` and `tryHigh` members specify the `_trylevel` values that this
handler applies to, and `nCatches` indicates how many `catch` blocks there are
(which are in an array pointed to by `pHandlerArray`). `HandlerType` is our
final structure to define:
```c++
struct HandlerType {
    DWORD            adjectives;
    TypeDescriptor * pType;
    int              dispCatchObj;
    void           * addressOfHandler;
};
```

The first two members specify what kinds of exceptions are being caught
(qualifier flags and a pointer to a descriptor of the exception type), the
third gives a stack offset to the exception object that the handler receives,
and the final member is the actual exception handling code, which again uses
EBP to reference data on the original function's stack.
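
As a rough model of how `tryLow` and `tryHigh` are used, the sketch below looks
up which try block covers a given `_trylevel` value. The table values and the
`findTryBlock` helper are made up, `HandlerType` is left opaque, and returning
the first match in table order is a simplification that sidesteps how nested
`try` blocks are ordered in real tables.

```c++
#include <cassert>
#include <cstddef>

struct HandlerType;     // left opaque here; only the ranges matter

struct TryBlockMapEntry {
    int                 tryLow;
    int                 tryHigh;
    int                 catchHigh;
    int                 nCatches;
    const HandlerType * pHandlerArray;
};

// Hypothetical table: states 0..0 are guarded by one catch block, states
// 2..3 by two. States above tryHigh up to catchHigh belong to the catch
// blocks themselves rather than to the guarded code.
static const TryBlockMapEntry g_tryBlocks[] = {
    { 0, 0, 1, 1, nullptr },
    { 2, 3, 4, 2, nullptr },
};

// Returns the first entry whose [tryLow, tryHigh] range contains `state`,
// or nullptr if no try block covers it.
const TryBlockMapEntry* findTryBlock(const TryBlockMapEntry* map,
                                     std::size_t count, int state) {
    for (std::size_t i = 0; i < count; ++i) {
        if (map[i].tryLow <= state && state <= map[i].tryHigh) return &map[i];
    }
    return nullptr;
}
```

With this table, state 3 resolves to the second entry, while state 1 resolves
to nothing because it lies inside the first entry's `catch` code rather than
its `try` body.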

If you'd like another more thorough treatment of reverse engineering
exceptions, also take a look at
[this article](https://www.openrce.org/articles/full_view/21), or if you'd
really like the whole implementation spelled out in excruciating detail,
[this one](https://web.archive.org/web/20101007110629/http://www.microsoft.com/msj/0197/exception/exception.aspx)
is unparalleled.