The research presented here has produced a program!
PS2 Controller Remapper
I recently authored the following code for Castlevania: Curse of Darkness, and there have been questions (see posts #2, #3, #4, #5) regarding the method used to create it. I will try to describe it here.
Originally Posted by pelvicthrustman
patch=1,EE,00404710,word,081DA4E8 //J $007693A0 <- Jump to Injected Code
patch=1,EE,0076939c,word,081DA4F0 //J $007693C0 <- Hide Injected Code
patch=1,EE,007693a0,word,820500c8 //LB a1,$00c8(s0) <- right stick horizontal value
patch=1,EE,007693a4,word,240600ff //LI a2,$00ff
patch=1,EE,007693a8,word,00c52822 //SUB a1,a2,a1
patch=1,EE,007693ac,word,a20500c8 //SB a1,$00c8(s0)
patch=1,EE,007693b0,word,0040282d //DADDU a1, v0, zero <- Replaced instruction
patch=1,EE,007693b4,word,081011C5 //J $00404714 <- Jump back
Disclaimer: A working knowledge of MIPS assembly language and C as well as basic familiarity with ps2dis and pcsx2 is assumed; I will not answer general programming or tool-specific questions here.
To remap the controller, it is first necessary to figure out how the game itself gets the controller data. Thanks to ps2dev.org this is pretty well understood. It is my understanding that many (most/all?) PS2 games use a library called libpad to read the controller. Labels for these functions can be found in ps2dis and begin with "scePad". For the purposes of remapping the controller, scePadRead is the function we're interested in.
Note: I am aware of scePad2Read...when I understand how this changes things I'll have more to say on the difference
Here's the assembly for the routine:
and the reverse-engineered version in C (authored by ps2dev.org):
(sometimes it's not quite the same....versioning is a bitch)
/*
 * Read pad data
 * Result is stored in 'data' which should point to a 32 byte array
 */
padRead(int port, int slot, struct padButtonStatus *data)
{
    struct pad_data *pdata;

    pdata = padGetDmaStr(port, slot);
    memcpy(data, pdata->data, pdata->length);
}
If scePadRead is not labeled in the game you're looking at you can search for 2d388000700003241c0004241818e3701820a400 (the first five instructions).
While this isn't the most reliable way to find scePadRead you can generally compare the assembly above with what you're looking at to find it.
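The signature search can also be run outside ps2dis. The following is only a rough sketch, assuming you have the game's code as a flat byte buffer (for example an eeMemory dump loaded into memory, where the buffer offset is taken to equal the EE address). The names find_signature and SCE_PAD_READ_SIG are made up for this example; the 20 bytes are the hex string quoted above, in memory order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First five instructions of scePadRead as bytes in memory order
 * (the hex string from the post, split into 20 bytes). */
static const uint8_t SCE_PAD_READ_SIG[20] = {
    0x2d, 0x38, 0x80, 0x00,
    0x70, 0x00, 0x03, 0x24,
    0x1c, 0x00, 0x04, 0x24,
    0x18, 0x18, 0xe3, 0x70,
    0x18, 0x20, 0xa4, 0x00
};

/* Return the offset of the first word-aligned match of sig in buf,
 * or -1 if there is none. Instructions are 4 bytes, so only
 * offsets that are multiples of 4 are tested. */
long find_signature(const uint8_t *buf, size_t len,
                    const uint8_t *sig, size_t sig_len)
{
    size_t off;

    if (sig_len == 0 || len < sig_len)
        return -1;
    for (off = 0; off + sig_len <= len; off += 4) {
        if (memcmp(buf + off, sig, sig_len) == 0)
            return (long)off;
    }
    return -1;
}
```

A hit only says that those five instructions appear at that offset, so it is still worth comparing the surrounding disassembly by eye as described above.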
Now that we can see where the game is getting the controller data from, we need to know how the data is arranged. Here is the data structure:
Note that when a button is pressed, the bit corresponding to its position in the layout of the structure below will be set in "btns" AND the byte for that button in the structure will be set to FF. For the analog sticks the centered value is 0x80; for the horizontal axis 0x00 is full left and 0xFF is full right.
/*
 * Button info
 */
struct padButtonStatus
{
    unsigned char ok;
    unsigned char mode;
    unsigned short btns;
    unsigned char rjoy_h;
    unsigned char rjoy_v;
    unsigned char ljoy_h;
    unsigned char ljoy_v;
    // pressure mode
    unsigned char right_p;
    unsigned char left_p;
    unsigned char up_p;
    unsigned char down_p;
    unsigned char triangle_p;
    unsigned char circle_p;
    unsigned char cross_p;
    unsigned char square_p;
    unsigned char l1_p;
    unsigned char r1_p;
    unsigned char l2_p;
    unsigned char r2_p;
    unsigned char unkn16;
};
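To make the stick values concrete, here is a small C sketch of what the quoted patch does to the right stick's horizontal byte. This is an illustration rather than code from the game: the function names are invented, and the second function just walks through the LB / LI / SUB / SB sequence one instruction at a time.

```c
#include <stdint.h>

/* Mirror an analog axis value: 0x00 <-> 0xFF, and 0x80 -> 0x7F. */
uint8_t invert_axis(uint8_t value)
{
    return (uint8_t)(0xFF - value);
}

/* The same operation written the way the patch performs it. */
uint8_t invert_axis_like_patch(uint8_t stored)
{
    int32_t a1 = (int8_t)stored;  /* LB sign-extends the loaded byte */
    int32_t a2 = 0xFF;            /* LI a2,$00ff */

    a1 = a2 - a1;                 /* SUB a1,a2,a1 */
    return (uint8_t)a1;           /* SB stores only the low byte */
}
```

Both versions give the same byte for every input, because the sign extension done by LB is discarded again when SB keeps only the low 8 bits. Note that the centered value 0x80 becomes 0x7F, one step off center.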
Now that we know where the game gets the data from, we also know that the game cannot interpret this data until AFTER a call to this function. Once memcpy() has run (this is where the game gets its usable copy of the controller state), there is a buffer in memory that the game will subsequently use as its input. This is important because it tells us where and when we need to adjust the data to remap the controls.
There are a few options for how to do this, but what I've done is this: look for a section in the game's code with a sufficient number of NOP instructions in a row, and replace the first NOP with a jump that skips to the instruction after the NOPs. Normal execution now never reaches the remaining NOPs, so you have a few instructions to use as a remote procedure!
The code above does exactly that. It replaces an instruction in the function that calls scePadRead (you can use Jump to Next Referrer in ps2dis to find this function), located after the function call, with a jump to the remote procedure. The remote procedure performs the operations we want, executes the original instruction that was replaced, and then jumps back to the next instruction in the function. By doing this we can "inject" code into a function.
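The jump words in the patch can be worked out by hand, but a short helper makes the encoding explicit. This is a minimal sketch of the standard MIPS J encoding (opcode 000010 in the top six bits, target address divided by 4 in the low 26 bits). The function names encode_j and decode_j are invented for this example.

```c
#include <stdint.h>

/* Encode a MIPS J instruction that jumps to target. */
uint32_t encode_j(uint32_t target)
{
    return 0x08000000u | ((target >> 2) & 0x03FFFFFFu);
}

/* Recover the jump target from a J word located at address pc.
 * The top four address bits are taken from the delay slot address. */
uint32_t decode_j(uint32_t word, uint32_t pc)
{
    return ((pc + 4) & 0xF0000000u) | ((word & 0x03FFFFFFu) << 2);
}
```

For example, encode_j(0x007693A0) gives 0x081DA4E8 and encode_j(0x00404714) gives 0x081011C5, which are the first and last words of the quoted patch.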
Given all of this, the only real question left is "where is the data structure?". Unfortunately I don't have a concrete answer on that! scePadRead appears to leave the register s0 holding the memory address of the data. I figured this out by setting breakpoints in the PCSX2 0.9.7 debugger, looking at the register values before and after the call to memcpy in scePadRead, and comparing the value of a0 (the void *destination argument of memcpy) to the register values after the function call. I then extracted a pcsx2 save state (the file eeMemory contains an EE memory dump at the moment the state was saved) and looked at the address (hex offset in eeMemory) which s0 held, and that turned out to be true. However, I found three copies of the data structure, and the one that worked in game was located at s0 + 0xC4. I do not know if this is a universally applicable result or one specific to Castlevania: Curse of Darkness.
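The s0 + 0xC4 result lines up with the $00c8 displacement used by the LB and SB instructions in the patch: the right stick's horizontal byte sits 4 bytes into the structure, and 0xC4 + 4 = 0xC8. The sketch below checks that arithmetic with offsetof. It makes one assumption that is not in the listing: unkn16 is declared as a 12-byte array so the structure totals the 32 bytes mentioned in the padRead comment. PAD_COPY_OFFSET and field_displacement are names made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Layout as listed in the post. ASSUMPTION: unkn16 is widened to
 * 12 bytes here so the struct totals 32 bytes. */
struct padButtonStatus {
    unsigned char ok;
    unsigned char mode;
    unsigned short btns;
    unsigned char rjoy_h;
    unsigned char rjoy_v;
    unsigned char ljoy_h;
    unsigned char ljoy_v;
    unsigned char right_p, left_p, up_p, down_p;
    unsigned char triangle_p, circle_p, cross_p, square_p;
    unsigned char l1_p, r1_p, l2_p, r2_p;
    unsigned char unkn16[12];
};

/* Offset from s0 of the copy that worked in Castlevania: Curse of Darkness. */
#define PAD_COPY_OFFSET 0xC4u

/* Displacement to use with s0 for a field at field_offset in the struct. */
uint32_t field_displacement(size_t field_offset)
{
    return PAD_COPY_OFFSET + (uint32_t)field_offset;
}
```

Calling field_displacement(offsetof(struct padButtonStatus, rjoy_h)) gives 0xC8. The same helper gives the displacement for any other field, but only for this game; whether 0xC4 holds elsewhere is the open question described above.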
I intend to look at this more later and come up with a more universal injection strategy for scePadRead. I'll update this when I know more.
I'm updating this for posterity in case anyone else is interested in how this stuff works. The research I presented above was preliminary but has matured into something much more concrete, and I'll try to present some lessons learned here.
Code injection strategies
Originally I made the assumption that pad data needed to be modified AFTER the execution of the scePadRead function. This is only half true. The best way I've found to inject code into a function is to literally extend the function: replace the return (JR $ra) instruction with (J <injection address>). The function will then not return immediately but instead execute a remote procedure, and the remote procedure can use JR $ra to return to the original caller, effectively extending the function's length. Note that this assumes that no JAL instructions overwrite the $ra register in the injected code; otherwise $ra must be preserved/restored. Don't forget to put a NOP after JR $ra (or a J if you need to inject mid-function)!!! (explained below)
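As an illustration of the "extend the function" idea, the sketch below takes a function's instruction words in an array and swaps the first JR $ra for a J to the injection address. It is a model of the edit, not the code used by PS2 Controller Remapper. hook_return is an invented name, and 0x03E00008 is the standard encoding of JR $ra.

```c
#include <stddef.h>
#include <stdint.h>

#define MIPS_JR_RA 0x03E00008u  /* jr $ra */

/* Replace the first JR $ra in code[] with a J to inject_addr.
 * Returns the index of the replaced word, or -1 if none was found.
 * The word after it is left alone, so it still runs as the delay
 * slot of the new J before the injected routine starts. */
long hook_return(uint32_t *code, size_t count, uint32_t inject_addr)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (code[i] == MIPS_JR_RA) {
            code[i] = 0x08000000u | ((inject_addr >> 2) & 0x03FFFFFFu);
            return (long)i;
        }
    }
    return -1;
}
```

With an injection address of 0x000FD000 the replacement word is 0x0803F400. The injected routine itself then has to end with its own JR $ra followed by a NOP.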
MIPS branch hazards
An important thing to understand when injecting code is MIPS hazards - the MIPS architecture is explicitly pipelined and thus what would generate a stall on another architecture may not in MIPS - this primarily has implications for branch and jump instructions. While the branch/jump address is being calculated the instruction following the branch/jump will be executed, before the branch is taken! This is why when looking at PS2 ELF disassemblies instructions often follow JR $ra - the last instruction in a function is after the return! This is called the "branch delay slot."
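One way to catch a forgotten delay slot in a hand-written routine is to scan its words and flag any jump that is not followed by a NOP. The following is only a sketch under simple assumptions: it recognises J, JAL, JR and JALR by their standard encodings, ignores conditional branches, and treats a non-NOP delay slot as suspicious even though compilers often put useful instructions there. The function names are invented.

```c
#include <stddef.h>
#include <stdint.h>

/* True for J, JAL, JR and JALR. Conditional branches are not covered. */
int is_jump(uint32_t word)
{
    uint32_t opcode = word >> 26;
    uint32_t funct  = word & 0x3Fu;

    if (opcode == 0x02u || opcode == 0x03u)                     /* J, JAL */
        return 1;
    if (opcode == 0x00u && (funct == 0x08u || funct == 0x09u))  /* JR, JALR */
        return 1;
    return 0;
}

/* Return the index of the first jump whose delay slot is missing from
 * code[] or is not a NOP (0x00000000), or -1 if there is none. */
long first_jump_without_nop(const uint32_t *code, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (is_jump(code[i]) && (i + 1 >= count || code[i + 1] != 0u))
            return (long)i;
    }
    return -1;
}
```

Run over the six injected words of the quoted patch, this reports the final J (index 5), because the listing does not include the word that follows it. Appending an explicit NOP word makes the check pass.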
Compiler MIPS register usage
This is an excellent reference regarding the typical uses of MIPS registers - for example, the $a0-$a3 registers are typically used to store function arguments and are not guaranteed to be preserved across function calls (JAL), while the $s0-$s8 registers are guaranteed to be preserved. This provides a good idea of which registers it is safe to use when injecting code and which should be preserved/restored by an injected routine.
Safe PS2 remote injection
Unfortunately this is one I have yet to fully nail down, but I know this much: user address space begins at 0x00100000, and addresses before that can be considered potential places to safely inject arbitrary routines. That said, this memory is effectively system reserved and can be a minefield. 0x000fc000 and above seem pretty safe though, which is why I use 0x000fd000 as the default injection address for PS2 Controller Remapper. If you want to stay inside normal memory, or even perform an injection by modifying the game ELF, the safest place I've found is debugging strings. Using Ctrl+G in ps2dis you can see a lot of strings embedded in most ELFs. Many of these are statements intended for the IOP console, literally printf format strings. These make an ideal place for injecting code since it is memory that is effectively wasted by the final build of the game. So long as you make sure that the initial byte of a string is null (NOP comes in handy), you can write an injected routine into the debugging string table without risking compromising the game's logic or performance, which is something that can't be said about just searching for a bunch of NOPs.
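Candidate strings can also be located programmatically. The sketch below is a hypothetical helper, not part of any existing tool: it looks for the first word-aligned run of printable ASCII of at least a given length in a byte buffer such as an ELF section or memory dump. Whether a given string is really unused debug text still has to be judged by reading it.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first word-aligned run of at least min_len
 * printable ASCII bytes (0x20..0x7E) in buf, or -1 if there is none. */
long find_string_run(const uint8_t *buf, size_t len, size_t min_len)
{
    size_t start, run;

    if (min_len == 0)
        return -1;
    for (start = 0; start < len; start += 4) {
        run = 0;
        while (start + run < len &&
               buf[start + run] >= 0x20 && buf[start + run] <= 0x7E)
            run++;
        if (run >= min_len)
            return (long)start;
    }
    return -1;
}
```

Only offsets that are multiples of 4 are considered because injected instructions must be word-aligned. A routine of N instructions needs a run of at least 4 * N bytes, and writing a NOP (four zero bytes) as its first word also leaves the string starting with a null byte, as described above.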