diff --git a/documentation/decompilingcpp.md b/documentation/decompilingcpp.md index 540025b..8fe24a7 100644 --- a/documentation/decompilingcpp.md +++ b/documentation/decompilingcpp.md @@ -12,8 +12,19 @@ structs) and how to decompile them is assumed. ## Name Mangling -(on the off chance you actually get symbol names, like from debug info; also -why symbol names don't match in objdiff) +Whenever you encounter symbol names actually produced by a C++ compiler, like +when recompiling decompiled code, they'll probably look garbled like +`??_GGameObj@@UAEPAXI@Z` or `_ZN7GameObjD1Ev` depending on the compiler. These +are mangled names, used by compilers to prevent conflicts from overloaded +functions, communicate additional information about symbols, and so on. + +Many tools can print these in human-readable form to produce e.g. +`` public: virtual void * __thiscall GameObj::`scalar deleting destructor'(unsigned int) ``, +and objdiff will do so by default. When using the Ghidra delinking tool +specifically, it's important to keep in mind that the delinked symbol names do +_not_ get mangled, so they won't have the exact same names as in the recompiled +code, and corresponding symbols in the delinked and recompiled object files +will need to be associated by hand. ## Classes @@ -55,7 +66,7 @@ be inserted afterwards. ### Class Methods Methods are functions declared within a class's namespace, like so: ```c++ -class SomeClass { +struct SomeClass { // Regular data members float someMemberVariable; unsigned anotherMemberVariable; @@ -96,17 +107,57 @@ for method calls, such as Microsoft's implementation for the Xbox using the while all other arguments are passed on the stack. Constructors and destructors function largely like regular methods, but -implicitly return the `this` pointer. +implicitly return the `this` pointer. 
C++ makes certain guarantees about +objects that have constructors and destructors that obligate the compiler to +insert additional code in certain circumstances: be aware, for instance, that +constructor calls will often be wrapped with stack unwinding code in case an +exception is thrown from within the constructor (see the exception handling +section). An object's destructor is also automatically called at the end of +its lifetime (e.g. when it goes out of scope), which can lead to inclusion in +exception handling code or just being called at the end of a code block even if +the source code doesn't invoke it explicitly. This automatic resource +management is often called part of C++'s RAII (resource acquisition is +initialization) design. Virtual methods are methods that can be overridden on child classes. They're not called directly, but instead called through a hidden first member that points to an array of method function pointers, usually called a vtable (Visual C++ 7 calls it `` ClassName::`vftable' ``). If a destructor specifically is made virtual, additional "deleting destructors" may be generated as well, which -are methods taking one boolean argument that call the destructor and then, +are methods taking one `unsigned` argument that call the destructor and then, depending on the argument, free the object's memory. -(TODO: how to implement methods and vtables in Ghidra) +Ghidra has somewhat obscure support for classes and regular methods, and +virtual methods can be made to work with some admittedly tedious effort. + +A class can be defined by right-clicking on "Classes" in the symbol tree window +and selecting "Create Class." Symbols (e.g. methods) can then be added to this +class by putting them in the class's namespace, i.e. opening the Add/Edit Label +or Rename Function window (usually from right-clicking something or its name) +and adding the class name as a prefix, e.g. `ClassName::someSymbol`. 
Be aware +that certain windows like the Edit Function window have no awareness of +namespaces, and trying to add the namespace prefix will just modify the symbol +name directly without actually adding it to the namespace. For Microsoft code +(e.g. Xbox), applying the appropriate `__thiscall` calling convention enables a +special behaviour where the first argument passed in ECX is forcibly named +`this` and has a fixed pointer type (by default `void *`). If the method is +placed in a class's namespace, however, and a struct of the same name exists, +the `this` pointer's type will be set to that struct. + +Since virtual method calls go through pointers rather than calling a function +at a fixed address, they show up in Ghidra as unsightly member accesses like +`(**(code **)(*g_graphics + 0x160))(g_graphics,0)`. One can however +simulate vtables by hand to get these calls resolving to something somewhat +more manageable like `(*g_graphics->vtable->setFogEnable)(g_graphics,0)` with +the correct number and types of arguments. The class's first member (which is +a link to its vtable) can be set to a pointer to a new struct type whose +members are pointers to functions defined in the data type manager (right click +and then `New > Function Definition...`). To actually access the methods' +definitions (keeping in mind there are likely multiple for different classes +inheriting from the same base class), it will be necessary to either find +where the vtable is assigned (the class constructor is a good choice) or +potentially examine the first member of an instance of the class at runtime +with the help of Cheat Engine or an emulator's memory viewer. 
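To make the simulated vtable layout concrete, here is a minimal C++ sketch of the same idea written out by hand. The names `Graphics` and `GraphicsVtable`, the slot order, and the `fogEnabled` member are all hypothetical and chosen only to mirror the `setFogEnable` call above; the `__thiscall` convention is deliberately left out so the sketch stays portable.

```cpp
#include <cassert>

struct Graphics; // hypothetical class, forward-declared for the vtable

// Hand-written vtable: one function pointer per virtual method slot.
// The slot order here is an assumption, not taken from any real binary.
struct GraphicsVtable {
    void (*deletingDestructor)(Graphics *self, unsigned flags);
    void (*setFogEnable)(Graphics *self, int enable);
};

struct Graphics {
    const GraphicsVtable *vtable; // hidden first member of the class
    int fogEnabled;               // hypothetical data member
};

// Concrete implementations that one particular class's vtable points at.
static void Graphics_deletingDestructor(Graphics *, unsigned) {}
static void Graphics_setFogEnable(Graphics *self, int enable) {
    self->fogEnabled = enable;
}

static const GraphicsVtable Graphics_vtable = {
    Graphics_deletingDestructor,
    Graphics_setFogEnable,
};

// Performs the same kind of call the cleaned-up decompilation shows and
// returns the resulting fog state.
int callThroughVtable() {
    Graphics graphics = { &Graphics_vtable, 1 }; // what a constructor would set up
    Graphics *g_graphics = &graphics;
    assert(g_graphics->vtable != nullptr);
    (*g_graphics->vtable->setFogEnable)(g_graphics, 0);
    return g_graphics->fogEnabled;
}
```

The struct of function pointers plays the role of the new struct type created in the data type manager, and each member corresponds to one function definition.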
### Inheritance Child classes can be used in most places that their parent class can be used: @@ -149,4 +200,154 @@ else { ## Exception Handling +C++ offers the ability to throw and catch exceptions, which have highly +platform-specific implementations that require some sophistication to uphold +the language's guarantees about object initialization and destruction. In +particular, some hidden bookkeeping needs to be done to implement `try` and +`catch` blocks, as well as keep track of what cleanup needs to be done if an +exception is thrown (part of a process known as stack unwinding, i.e. walking +back up the call stack until the exception is caught or the top is reached). +We'll focus here on the Microsoft implementation found in Xbox games. The FS +register holds the last item of a linked list of structures with exception +handling information, defined thusly: +```c++ +struct EXCEPTION_REGISTRATION_RECORD { + EXCEPTION_REGISTRATION_RECORD * Next; // Next item in linked list + EXCEPTION_ROUTINE * Handler; // Function pointer +}; +``` + +Functions with any exception handling or stack unwinding will have a prologue +like the following in Ghidra: +```c++ +undefined4 *unaff_FS_OFFSET; +undefined4 local_c; +undefined *puStack_8; +undefined4 local_4; + +local_4 = 0xffffffff; +puStack_8 = &LAB_00186c4b; +local_c = *unaff_FS_OFFSET; +*unaff_FS_OFFSET = &local_c; +``` + +One might clean this up a bit, revealing that the code is adding a new entry to +the list (here from the JSRF `Game::Game()` constructor): +```c++ + EXCEPTION_REGISTRATION_RECORD *_tib; // "thread information block" + EXCEPTION_REGISTRATION_RECORD _err; + int _trylevel; + + _err.Next = _tib->Next; + _trylevel = -1; + _err.Handler = Game_handler; + _tib->Next = &_err; +``` + +As the name suggests, `_trylevel` will be incremented when a new block of code +requiring exception handling or stack unwinding is encountered, e.g. around +constructors whose memory must be freed if they throw. 
The function will end +by dropping the item that was added to the exception handling list +(`_tib->Next = _err.Next`). + +To actually see what the exception handling or stack unwinding code will do, we +need to look at the `.Handler` function that was assigned. It usually looks +something like this: +```c++ +void Game_handler(EHExceptionRecord *param_1,EHRegistrationNode *param_2,void *param_3, + DispatcherContext *param_4) { + ___CxxFrameHandler(param_1,param_2,param_3,param_4,&Game_funcinfo); + return; +} +``` + +What we care about here is the last argument passed to `__CxxFrameHandler()`, +which is a pointer to a `FuncInfo` structure defined as follows: +```c++ +struct FuncInfo { + DWORD magicNumber; + int maxState; + UnwindMapEntry * pUnwindMap; + DWORD nTryBlocks; + TryBlockMapEntry * pTryBlockMap; + DWORD nIMapEntries; + void * pIPtoStateMap; +}; +``` + +Here we can finally distinguish between unwinding code (called as an exception +propagates up the call stack) and catching code (also called if an exception is +raised, but it can stop the exception from propagating any further): the former +gets entries in `pUnwindMap` (the number of entries being given by `maxState`), +while the latter gets entries in `pTryBlockMap` (the number of entries being +given by `nTryBlocks`). + +The unwind map is the simpler of the two, with each entry being as follows: +```c++ +struct UnwindMapEntry { + int toState; + void (*action)(); +}; +``` + +The `toState` member describes which value `_trylevel` will assume after the +function in the second member is called. The second member points to the +actual unwinding code, which will tend to decompile to something simple but +unpleasant like this: +```c++ +void Game_handler_unwind1(void) { + int unaff_EBP; + + operator_delete(*(void **)(unaff_EBP + 8)); + return; +} +``` + +Clearly this is freeing memory (in fact, it frees a particular object's memory +if its constructor throws), but what is the argument? 
EBP here holds the stack +pointer for the function that this code applies to, so you'll have to look at +the stack layout when this handler is active. While it's easy to guess much of +the time based on what code is being wrapped, one could look to confirm in this +case that `ESP + 8` in the function holds a pointer to memory that was just +allocated and is being passed to a constructor that's being guarded (shown by a +`CALL operator_new` followed by `MOV dword ptr [ESP + 8],EAX` in the disassembly; +make sure you know your registers and calling conventions!). + +Try blocks aren't too much different in reality, with entries defined like +this: +```c++ +struct TryBlockMapEntry { + int tryLow; + int tryHigh; + int catchHigh; + int nCatches; + HandlerType * pHandlerArray; +}; +``` + +The `tryLow` and `tryHigh` members specify the `_trylevel` values that this handler +applies to, and `nCatches` indicates how many `catch` blocks there are (which +are in an array pointed to by `pHandlerArray`). `HandlerType` is our final +structure to define: +```c++ +struct HandlerType { + DWORD adjectives; + TypeDescriptor * pType; + int dispCatchObj; + void * addressOfHandler; +}; +``` + +The first three members specify what kinds of exceptions are being caught +(either by type in the first two members' case or a stack offset to an +exception object in the third's), and the final member is the actual exception +handling code, which again uses EBP to reference data on the original +function's stack. + +If you'd like another, more thorough treatment of reverse engineering +exceptions, also take a look at +[this article](https://www.openrce.org/articles/full_view/21), or if you'd +really like the whole implementation spelled out in excruciating detail, +[this one](https://web.archive.org/web/20101007110629/http://www.microsoft.com/msj/0197/exception/exception.aspx) +is unparalleled.
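
As a way of checking one's reading of an unwind map, the `toState` chain described earlier can be followed by hand or with a small helper. The sketch below assumes the runtime runs the action for the current `_trylevel`, moves to that entry's `toState`, and repeats until it reaches -1; `unwindOrder` and `kExampleMap` are hypothetical names, and the example map is made up rather than taken from any game.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct UnwindMapEntry {
    int toState;       // value _trylevel assumes after the action runs
    void (*action)();  // unwinding code for this state (may be null)
};

// Returns the states whose unwind actions would be visited, in order,
// when unwinding starts at `state`. Assumes a well-formed map in which
// each toState eventually leads to -1.
std::vector<int> unwindOrder(const UnwindMapEntry *map, int maxState, int state) {
    assert(map != nullptr);
    std::vector<int> order;
    // The size check guards against a malformed map that loops forever.
    while (state >= 0 && state < maxState &&
           order.size() < static_cast<std::size_t>(maxState)) {
        order.push_back(state);
        state = map[state].toState;
    }
    return order;
}

// Made-up map with three nested states: 2 unwinds to 1, 1 to 0, 0 to -1.
static const UnwindMapEntry kExampleMap[] = {
    { -1, nullptr },
    {  0, nullptr },
    {  1, nullptr },
};
```

Calling `unwindOrder(kExampleMap, 3, 2)` on this made-up map visits states 2, 1 and 0, which is the order in which the corresponding `action` functions would be worth reading in Ghidra.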