PSP Homebrew Loader
This article mainly concerns technical details for a homebrew loader on PSP. To download homebrew loaders, see List of PSP homebrew loaders.
Originally posted by m0skit0 on advancedpsp.tk. Retrieved by Ultimakillz, http://h4ck.fi.st/index.php/topic,80.0.html
Introduction
Well, as I wished to do something with the recent MoHH exploit, a friend came with a nice suggestion: making an eLoader. And I said to myself "why not, let's give it a try". After a couple of weeks of research and coding, I can tell I'm able to load some non-signed ELFs on OFW using this exploit. How is this achieved? Well, you have to get some basic knowledge about how the whole thing works. Let's start with it.
First of all, we have to keep in mind that the exploit allows us to run our code, but this is not a kernel exploit, so we're stuck in user mode. This means we cannot access kernel memory at all, so patching syscalls/functions is off the table.
Second, we're still under OFW. This means we cannot load unsigned code using sceKernelLoadExec() or sceKernelLoadModule(), unless we replace them with our own module/executable loading function. Any attempt to use such functions on homebrew modules will just crash the console.
Third, the exploit SDK only allows us to use a small subset of the PSP's functions. Basically, you can use any function imported by the game, in this case MoHH, but only those. Not all of them are linked by the exploit SDK linker, though, so you have to patch the SDK if you want to use the ones that are imported but not included.
Well, quite a few restrictions, right? We'll have to make workarounds for this. Let's see how.
We cannot access kernel mode, and there's no way to work around this except finding a kernel exploit. That said, any homebrew that requires kernel mode privileges will not run on our eLoader.
We cannot load unsigned code using OFW kernel functions. Then we'll have to code the module loader function manually. That is, basically rewriting sceKernelLoadExec()/sceKernelLoadModule() without sign check. We'll talk about this in more detail later.
We're limited to the subset of functions the exploit SDK allows us to use. We can find more MoHH imports and patch the SDK linker to be able to use more. We can also call some functions directly if we know their address or syscall (well, syscalls are not that easy, as we will see), just like the TIFF and ChickHEN exploits did.
Well, basically we have to write a replacement for the sceKernelLoadExec()/sceKernelLoadModule() functions, taking into account those restrictions. I'll go into it next.
Basic Concepts
So as said in the introduction, our objective is to make our own sceKernelLoadExec()/sceKernelLoadModule(). The first thing we need to know is what kind of executable the PSP uses. Well, like its non-portable brothers, the PSP uses ELF executables. ELF stands for Executable and Linkable Format. But those ELFs are wrapped inside a DATA.PSP file, which in turn is embedded into an EBOOT.PBP file. To simplify things, we're going straight to the ELFs themselves, skipping the peeling of the onion layers (EBOOT.PBP and DATA.PSP).
ELFs contain code and data for a program to run. But to run a program in a decent OS you need more information, which is also stored in the ELF. An ELF file is divided into multiple sections, each one containing a different type of information. Let's see what information an ELF contains.
Loading address. Well, an ELF, as an executable, needs to be loaded in memory to be run. But at what address should it be loaded? In PSP there are two kinds of ELFs: static and relocatable. Static ones have no relocation information. This means addresses and references in the code are fixed numbers, so we always need to load the code at the same fixed address or it will not work properly. Relocatable ones (better known as PRX, PSP Relocatable Executable) have no fixed loading address, so the kernel can choose where to load them. This is more flexible but needs relocation information, that is, how to adjust the references inside the code to the real loading address. To simplify, we're not going to consider relocatable ELFs for now (for more info about relocation: http://en.wikipedia.org/wiki/Relocation ... er_science))
Imports. An ELF can have calls to external functions provided by the OS, commonly called system calls. Those system calls are just requests from the application to the OS, such as "open this file", "write this to screen", etc... This code is not included in the application itself (that is, the ELF) but in the OS. Some calls are in kernel space, some are in user space. To access user space calls, PSP uses direct jumps, using plain j MIPS instructions. For kernel space calls, PSP uses syscall MIPS instructions. So the ELF has a section called .sceStub.text, where all the system calls are stored. If you prxtool -w an ELF file generated by PSPSDK, you'll find something like this:
; ==== Section .sceStub.text - Address 0x0890F24C Size 0x000000F8 Flags 0x0006

; ======================================================
; Subroutine sceDisplaySetMode - Address 0x0890F24C
sceDisplaySetMode: ; Refs: 0x08900BA0
    0x0890F24C: 0x03E00008 '....' - jr $ra
    0x0890F250: 0x00000000 '....' - nop
; End Subroutine sceDisplaySetMode
; ======================================================
Well, you can see the sceDisplaySetMode() system call code. But wait... jr $ra and nop? This will do nothing, just return to the caller!! Well, those calls have to be resolved by the ELF loader. The ELF loader inserts the real system call when it loads the ELF into memory. If the ELF loader doesn't resolve these stubs, the executable will just receive nothing from those syscalls. A resolved stub looks like this:
(direct jump - user space system call)
000002ac: 0a7cfc80 j 0x9f3f200
000002b0: 00000000 nop
(syscall - kernel space system call)
08C9298c: 03e00008 jr $ra
08C92990: 0008b68c syscall 0x022da
As you can see, the ELF loader changed the stubs code. So the result is a double jump to make the system call: the code calls the stub, the stub calls the real system call. This way, you have to resolve the system call only once, and not every time it appears on the code.
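To make the two encodings concrete, here is a small host-side sketch in plain C (no PSPSDK needed) that decodes the instruction words shown above. The helper names are hypothetical, made up for this illustration. The bit layouts are the standard MIPS ones: a j instruction keeps the target address divided by 4 in its low 26 bits, and a syscall instruction keeps a 20-bit code in bits 25..6.

```c
#include <stdint.h>

/* Hypothetical helper names, for illustration only. */

/* Target of a MIPS `j` instruction located at address `pc`.
 * The low 26 bits hold target >> 2; the top 4 bits are taken from
 * the address of the delay slot (pc + 4). */
uint32_t j_target(uint32_t instr, uint32_t pc)
{
    return ((pc + 4) & 0xF0000000u) | ((instr & 0x03FFFFFFu) << 2);
}

/* Code field of a MIPS `syscall` instruction (bits 25..6). */
uint32_t syscall_code(uint32_t instr)
{
    return (instr >> 6) & 0xFFFFFu;
}

/* 1 if the word is a `j` (primary opcode 000010), else 0. */
int is_j(uint32_t instr)
{
    return (instr >> 26) == 0x02;
}

/* 1 if the word is a `syscall` (opcode 0, function 0x0C), else 0. */
int is_syscall(uint32_t instr)
{
    return (instr >> 26) == 0 && (instr & 0x3Fu) == 0x0Cu;
}
```

Feeding it the two resolved stubs above, j_target(0x0a7cfc80, 0x000002ac) gives 0x09f3f200 and syscall_code(0x0008b68c) gives 0x22da, matching the disassembly.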
I think this is all for basic concepts, let's see how to get the ELF load and resolve those imports.
ELF Header
So now we need to load the ELF code into memory. How can we do this?
Well first, I'd like to introduce some typedefs for use with ELFs. Very simple ones here:
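typedef unsigned int Elf32_Addr;
typedef unsigned int Elf32_Off;
typedef int Elf32_Sword;
typedef unsigned int Elf32_Word;    /* unsigned in the ELF specification */
typedef unsigned short Elf32_Half;  /* unsigned in the ELF specification */
typedef unsigned char BYTE;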
This said, every ELF file starts with an ELF header, which has the following structure:
#define EI_NIDENT 16 //Size of e_ident[]

typedef struct
{
    BYTE e_ident[EI_NIDENT]; // Magic number
    Elf32_Half e_type;       // Identifies object file type
    Elf32_Half e_machine;    // Architecture build
    Elf32_Word e_version;    // Object file version
    Elf32_Addr e_entry;      // Virtual address of code entry
    Elf32_Off e_phoff;       // Program header table's file offset in bytes
    Elf32_Off e_shoff;       // Section header table's file offset in bytes
    Elf32_Word e_flags;      // Processor specific flags
    Elf32_Half e_ehsize;     // ELF header size in bytes
    Elf32_Half e_phentsize;  // Program header size (all the same size)
    Elf32_Half e_phnum;      // Number of program headers
    Elf32_Half e_shentsize;  // Section header size (all the same size)
    Elf32_Half e_shnum;      // Number of section headers
    Elf32_Half e_shstrndx;   // Section header table index of the entry associated with the
                             // section name string table.
} Elf32_Ehdr;
Next I'm gonna explain the more important members we have in this structure. Some of them are already explained on the comment, so no need to repeat the same thing again here.
An ELF file is known to have 4 magic initial bytes (we call them magic because they allow us to identify the type of file), which are 0x7F 'E' 'L' 'F'. If this magic number doesn't appear at the start of the file, then no need to continue, it's not an ELF file. So these bytes should appear from e_ident[0] to e_ident[3].
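The magic check can be sketched in a few lines of plain C. This is a minimal host-side illustration, and elf_has_magic is a hypothetical name, not something from PSPSDK or the exploit SDK.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if buf starts with the ELF magic 0x7F 'E' 'L' 'F', else 0.
 * `len` is the number of valid bytes in `buf`. */
int elf_has_magic(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };

    return buf != NULL
        && len >= sizeof magic
        && memcmp(buf, magic, sizeof magic) == 0;
}
```

A loader would typically read the first bytes of the file into a buffer, run this check on it, and bail out early when it returns 0.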
The e_entry member indicates the virtual address for code entry, that is, the first instruction to begin execution of the code. In PSP architecture there's no such thing as virtual memory, so we're referring to real addresses.
The e_phoff member indicates the offset in the ELF file of the program headers table. The program sections contain the code + data required for the code to run properly, and thus should be allocated in RAM. Each program header describes a program section. Afaik, there's only one program header and one program section in PSP ELFs.
The e_shoff indicates the ELF offset for the section header table. This table contains section headers, which describe each ELF section and their attributes. For example, the .sceStub.text section would be described in the section header table.
The e_shstrndx indicates the index in the section header table for the String Table. This table is just a concatenation of various null-terminated strings, that usually holds the name of the sections (such as .sceStub.text ).
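Since the string table is just strings laid end to end, a name is found by jumping to a byte offset and reading up to the next NUL. In a standard ELF, each section header stores its name as such an offset. Here is a minimal sketch of that lookup with bounds checking; strtab_get is a hypothetical helper name, and the table in the usage note is made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns the string starting at `offset` inside a string table of
 * `size` bytes, or NULL if the offset is out of range or the string is
 * not NUL-terminated inside the table. */
const char *strtab_get(const char *strtab, size_t size, uint32_t offset)
{
    if (strtab == NULL || offset >= size)
        return NULL;
    if (memchr(strtab + offset, '\0', size - offset) == NULL)
        return NULL;
    return strtab + offset;
}
```

With a table holding "\0.text\0.sceStub.text\0", offset 1 yields ".text" and offset 7 yields ".sceStub.text".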
Loading ELF Program Section
Well, as we said in the last section, the Elf32_Off e_phoff member of the ELF header gives us the ELF offset for the program headers table. The program headers table contains program headers (obviously), which have the following structure:
typedef struct
{
    Elf32_Word p_type;   // Type of segment
    Elf32_Off p_offset;  // Offset for segment's first byte in file
    Elf32_Addr p_vaddr;  // Virtual address for segment
    Elf32_Addr p_paddr;  // Physical address for segment
    Elf32_Word p_filesz; // Segment image size in file
    Elf32_Word p_memsz;  // Segment image size in memory
    Elf32_Word p_flags;  // Flags :P
    Elf32_Word p_align;  // Alignment
} Elf32_Phdr;
For our purpose, we only care about p_offset, p_vaddr (which is the same as p_paddr in PSP's ELFs), p_filesz and p_memsz. We need to load p_filesz bytes from p_offset offset of the ELF file to p_vaddr address. p_memsz indicates the size of the program segment in memory, which must be equal or greater than p_filesz. If it's greater, we should fill with zeroes the extra space indicated.
The simplest approach to code this would look like:
/* Load executable in memory using virtual address */
/* Returns total size copied in memory */
unsigned int elf_load_program(SceUID elf_file, Elf32_Ehdr* pelf_header)
{
    Elf32_Phdr program_header;
    int excess;
    char *buffer;

    /* Read the program header */
    sceIoLseek(elf_file, pelf_header->e_phoff, SEEK_SET);
    sceIoRead(elf_file, &program_header, sizeof(Elf32_Phdr));

    /* Loads program segment at virtual address */
    sceIoLseek(elf_file, program_header.p_offset, SEEK_SET);
    buffer = (char *) program_header.p_vaddr;
    sceIoRead(elf_file, buffer, program_header.p_filesz);

    /* Sets the buffer pointer to the first byte after the loaded data */
    /* (no +1 here, or one byte would be left unzeroed and one extra byte overwritten) */
    buffer = buffer + program_header.p_filesz;

    /* Fills excess memory with zeroes */
    excess = program_header.p_memsz - program_header.p_filesz;
    if(excess > 0)
        memset(buffer, 0, excess);

    return program_header.p_memsz;
}
Note that we didn't use any memory deallocation/allocation function for the program section buffer, so we are simply overwriting the game module already loaded in memory.
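Because the loader writes straight to p_vaddr, a malformed header could make it overwrite anything it can reach. A minimal sanity check, sketched here under my own assumptions, could run before the copy. Phdr32, SEG_LOAD and phdr_is_safe are hypothetical names; the allowed address range is supplied by the caller because this sketch does not claim to know the real memory map.

```c
#include <stdint.h>

/* Same field order as Elf32_Phdr above, using fixed-width types. */
typedef struct {
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
} Phdr32;

#define SEG_LOAD 1u  /* value of PT_LOAD (loadable segment) in the ELF spec */

/* Returns 1 if the segment is loadable and [p_vaddr, p_vaddr + p_memsz)
 * lies inside [lo, hi), else 0. */
int phdr_is_safe(const Phdr32 *ph, uint32_t lo, uint32_t hi)
{
    if (ph->p_type != SEG_LOAD)
        return 0;
    if (ph->p_filesz > ph->p_memsz)
        return 0;
    if (ph->p_vaddr < lo || ph->p_vaddr >= hi)
        return 0;
    /* Written as a subtraction so the comparison cannot overflow. */
    if (ph->p_memsz > hi - ph->p_vaddr)
        return 0;
    return 1;
}
```

A caller would pass the bounds of the memory it is willing to overwrite, for example the region occupied by the game module.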
Now all the code & data needed for the ELF to run is loaded in memory. Let's continue with it.
Stub Headers
Ok, time for more serious things: resolving stubs. But for that we first need to know what a stub is and what structure it has.
Stubs are functions imported by an executable. We call a function "imported" when its implementation (the actual code) is not included in the executable itself, but in the OS. So the OS has to resolve these imports, that is, make those imported functions point to the right OS code (doing a jump or a syscall in our case here). As we saw before, ELF stubs are this way before being loaded:
; Section .sceStub.text - Address 0x08C92974
0x08C92974: 0x03E00008 '....' - jr $ra
0x08C92978: 0x00000000 '....' - nop
0x08C9297C: 0x03E00008 '....' - jr $ra
0x08C92980: 0x00000000 '....' - nop
0x08C92984: 0x03E00008 '....' - jr $ra
0x08C92988: 0x00000000 '....' - nop
0x08C9298C: 0x03E00008 '....' - jr $ra
0x08C92990: 0x00000000 '....' - nop
0x08C92994: 0x03E00008 '....' - jr $ra
0x08C92998: 0x00000000 '....' - nop
0x08C9299C: 0x03E00008 '....' - jr $ra
0x08C929A0: 0x00000000 '....' - nop
0x08C929A4: 0x03E00008 '....' - jr $ra
0x08C929A8: 0x00000000 '....' - nop
0x08C929AC: 0x03E00008 '....' - jr $ra
0x08C929B0: 0x00000000 '....' - nop
0x08C929B4: 0x03E00008 '....' - jr $ra
0x08C929B8: 0x00000000 '....' - nop
0x08C929BC: 0x03E00008 '....' - jr $ra
0x08C929C0: 0x00000000 '....' - nop
0x08C929C4: 0x03E00008 '....' - jr $ra
0x08C929C8: 0x00000000 '....' - nop
0x08C929CC: 0x03E00008 '....' - jr $ra
0x08C929D0: 0x00000000 '....' - nop
...
and so on... So we need to have those calls that do absolutely nothing replaced by proper jumps and syscalls, like:
; Section .sceStub.text - Address 0x08C92974
0x08C92974: 0x0A200020 ' . .' - j loc_08800080
0x08C92978: 0x00000000 '....' - nop
0x08C9297C: 0x0A20002B '+. .' - j loc_088000AC
0x08C92980: 0x00000000 '....' - nop
0x08C92984: 0x03E00008 '....' - jr $ra
0x08C92988: 0x00088A8C '....' - syscall 0x222A
0x08C9298C: 0x03E00008 '....' - jr $ra
0x08C92990: 0x00088B8C '....' - syscall 0x222E
0x08C92994: 0x03E00008 '....' - jr $ra
0x08C92998: 0x00088C8C '....' - syscall 0x2232
0x08C9299C: 0x03E00008 '....' - jr $ra
0x08C929A0: 0x000882CC '....' - syscall 0x220B
0x08C929A4: 0x03E00008 '....' - jr $ra
0x08C929A8: 0x000886CC '....' - syscall 0x221B
0x08C929AC: 0x03E00008 '....' - jr $ra
0x08C929B0: 0x0008874C 'L...' - syscall 0x221D
0x08C929B4: 0x03E00008 '....' - jr $ra
0x08C929B8: 0x00084D8C '.M..' - syscall 0x2136
0x08C929BC: 0x03E00008 '....' - jr $ra
0x08C929C0: 0x00084E8C '.N..' - syscall 0x213A
...
Each pair is an imported function jump/call. So for example any call in the code that refers to sceKernelCpuSuspendIntr() is actually doing a JAL to the address 0x08C92974, which as you can see will jump to address 0x08800080, which is user mode memory, so no need for a syscall. But the question is: how does the OS know which stub references which function?
For this, there's an ELF section named .lib.stubs that has stub headers like this one:
typedef struct
{
    u32 library_name_index;
    u16 import_flags;
    u16 library_version;
    u16 import_stubs;
    u16 stub_size;
    u32* nid_pointer;
    u32* jump_pointer;
} tStubEntry;
Each of these stub headers describes the imports for a given library. So we'll have a stub header for each library imported by the ELF. Let's see the meanings of the data important to us:
- library_name_index: this is an index in string table section that holds library name as a NULL terminated character string.
- stub_size: number of stubs to resolve for this library.
- nid_pointer: address of first NID of library in NIDs array.
- jump_pointer: address of first stub to be resolved in stubs array (as seen before ;))
What is a NID? A NID is a 32-bit universal identifier for FW functions. This way, FW recognizes what function we're talking about, just by an integer. For most functions this is the first 32 bits of the SHA-1 hash of the function name, but some NIDs were just randomized by Sony back in 2.70 FW IIRC.
Now it's simple: we just have to resolve all stubs indicated in each stub header. For each library (we don't even need to know the name, as NIDs are universal ;)), we have to resolve as many NIDs as indicated by stub_size, and for each NID, we have to resolve the corresponding stub. A simple approach in C pseudo-code would look like this:
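As an illustration of the "first 32 bits of SHA-1" rule, here is a small host-side sketch that hashes a function name and takes the first four digest bytes. Everything in it is an assumption on my part rather than something from the original post: the names sha1_short and nid_from_name are hypothetical, the SHA-1 routine only handles names up to 55 characters (one block), and reading the four bytes as a little-endian integer is my understanding of the usual convention, not something the text above confirms. It will of course not reproduce the randomized NIDs.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t v, int n)
{
    return (v << n) | (v >> (32 - n));
}

/* SHA-1 of a short message (at most 55 bytes, so it fits in a single
 * 64-byte block). Returns 0 on success, -1 if the message is too long. */
int sha1_short(const char *msg, uint8_t digest[20])
{
    uint32_t h[5] = { 0x67452301u, 0xEFCDAB89u, 0x98BADCFEu,
                      0x10325476u, 0xC3D2E1F0u };
    uint32_t w[80], a, b, c, d, e;
    uint8_t block[64] = { 0 };
    size_t len = strlen(msg);
    int i;

    if (len > 55)
        return -1;

    /* Padding: message, 0x80, zeroes, then the bit length (big-endian). */
    memcpy(block, msg, len);
    block[len] = 0x80;
    block[62] = (uint8_t)((len * 8) >> 8);
    block[63] = (uint8_t)(len * 8);

    for (i = 0; i < 16; i++)
        w[i] = ((uint32_t)block[4 * i] << 24) | ((uint32_t)block[4 * i + 1] << 16)
             | ((uint32_t)block[4 * i + 2] << 8) | (uint32_t)block[4 * i + 3];
    for (i = 16; i < 80; i++)
        w[i] = rotl32(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

    a = h[0]; b = h[1]; c = h[2]; d = h[3]; e = h[4];
    for (i = 0; i < 80; i++) {
        uint32_t f, k, t;
        if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
        else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
        else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
        else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
        t = rotl32(a, 5) + f + e + k + w[i];
        e = d; d = c; c = rotl32(b, 30); b = a; a = t;
    }
    h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;

    /* Digest bytes are the five state words, big-endian. */
    for (i = 0; i < 5; i++) {
        digest[4 * i]     = (uint8_t)(h[i] >> 24);
        digest[4 * i + 1] = (uint8_t)(h[i] >> 16);
        digest[4 * i + 2] = (uint8_t)(h[i] >> 8);
        digest[4 * i + 3] = (uint8_t)h[i];
    }
    return 0;
}

/* NID candidate for a function name: first 4 digest bytes read as a
 * little-endian 32-bit integer (the byte order is an assumption).
 * Returns 0 if the name is too long for sha1_short. */
uint32_t nid_from_name(const char *name)
{
    uint8_t d[20];

    if (sha1_short(name, d) != 0)
        return 0;
    return (uint32_t)d[0] | ((uint32_t)d[1] << 8)
         | ((uint32_t)d[2] << 16) | ((uint32_t)d[3] << 24);
}
```

Calling nid_from_name("sceDisplaySetMode") gives a candidate value to compare against the NIDs found in a real import table, which is the only reliable way to confirm the convention.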
/* Resolves imports in ELF's program section already loaded in memory */
/* stubs_size is the size in bytes of all the stub headers */
unsigned int resolve_imports(tStubEntry* pstub_entry, unsigned int stubs_size, tNIDResolver* nid_table)
{
    unsigned int i,j;
    u32* cur_nid;
    u32* cur_call;
    unsigned int resolving_count = 0;

    /* Browse ELF stub's headers */
    for(i=0; i<stubs_size; i+=sizeof(tStubEntry))
    {
        cur_nid = pstub_entry->nid_pointer;
        cur_call = pstub_entry->jump_pointer;

        /* For each stub header, browse all stubs */
        for(j=0; j<pstub_entry->stub_size; j++)
        {
            /* Resolve stub (pseudo-code: look up *cur_nid in nid_table, */
            /* then patch the stub at cur_call with the call found) */
            resolve_call(cur_nid, cur_call);

            /* Next NID */
            cur_nid++;

            /* Next stub (8 bytes each) */
            cur_call += 2;

            /* Update count */
            resolving_count++;
        }

        /* Next stub header */
        pstub_entry++;
    }

    return resolving_count;
}
But the question is now obviously: how to resolve the ELF stubs?
Resolving Stubs
So now we know what a stub is, and also what it looks like. Time to see how we can resolve the stubs for the ELF program we already loaded.
To do this, we can simply use the game's already resolved imports and substitute the right stub with the right call from the game's stubs. I would first back up all the game's stubs in a data structure, so we don't lose them if they're going to be overwritten by the homebrew ELF we load. So, as I said, we need to back up the game's stubs before loading the ELF program section. This can easily be done using the information I gave in the previous section.
I personally prefer saving memory and not include the whole stubs, but only the effective call associated with a NID. For example this struct:
// Struct holding all NIDs imported by the game and their respective jump/syscalls
typedef struct
{
    u32 nid;  // NID
    u32 call; // Syscall/jump associated to the NID
} tNIDResolver;
Here I will store the function NID and its real call, that is, the jump instruction if it's a user mode system call, or the syscall if it's a kernel mode system call. This way I only need half the size of the stubs, as the other instruction needed for each system call mode is always the same (nop for user mode, jr $ra for kernel mode). The little problem with this approach is knowing when to take the first instruction as the real call (user mode) or the second one (kernel mode). Here's a refresher on user/kernel mode system calls:
User mode system call:
0x08C92974: 0x0A200020 ' . .' - j loc_08800080
0x08C92978: 0x00000000 '....' - nop
Kernel mode system call:
0x08C92984: 0x03E00008 '....' - jr $ra
0x08C92988: 0x00088A8C '....' - syscall 0x222A
This can easily be solved by bitwise checking the stub's first instruction:
#define SYSCALL_MASK_IMPORT 0x01000000

// Return real instruction that makes the system call (jump or syscall)
u32 get_good_call(u32* call_pointer)
{
    // Dirty hack here but works
    if(*call_pointer & SYSCALL_MASK_IMPORT)
        call_pointer++;
    return *call_pointer;
}
This function receives a pointer to the stub. You can see that *call_pointer & SYSCALL_MASK_IMPORT in fact checks whether bit 24 is set (thus the instruction is a JR) or clear (thus the instruction is a J). So if the first stub instruction is a JR, we need to take the stub's second instruction (the syscall), else we just return the first one (the jump). Then just store the effective call in the tNIDResolver data structure.
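The backup step plus the later lookup can be sketched on the host like this. It is only an illustration under my own assumptions: NidEntry mirrors tNIDResolver with fixed-width types, nid_table_build and nid_table_find are hypothetical names, and the table is searched linearly for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t nid;   /* NID */
    uint32_t call;  /* effective instruction: j or syscall */
} NidEntry;         /* same idea as tNIDResolver above */

#define JR_BIT 0x01000000u  /* same value as SYSCALL_MASK_IMPORT */

/* Backs up `count` resolved stubs. nids[i] belongs to the 8-byte stub
 * starting at stubs[2 * i]. Returns the number of entries written. */
size_t nid_table_build(NidEntry *table, size_t capacity,
                       const uint32_t *nids, const uint32_t *stubs,
                       size_t count)
{
    size_t i, n = 0;

    for (i = 0; i < count && n < capacity; i++) {
        const uint32_t *stub = &stubs[2 * i];

        /* First word is jr $ra: the syscall sits in the second word. */
        table[n].call = (stub[0] & JR_BIT) ? stub[1] : stub[0];
        table[n].nid = nids[i];
        n++;
    }
    return n;
}

/* Linear search. Returns 1 and stores the call in *out if the NID is
 * in the table, 0 otherwise. */
int nid_table_find(const NidEntry *table, size_t n, uint32_t nid,
                   uint32_t *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (table[i].nid == nid) {
            *out = table[i].call;
            return 1;
        }
    }
    return 0;
}
```

A return value of 0 from nid_table_find means the game never imported that function, so the table has no call to offer for it.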
To resolve the ELF stubs, we just do the inverse process: check the effective call we have on the tNIDResolver data structure, then recreate the stub's missing instruction. Here's an example:
#define JR_RA_OPCODE 0x03E00008
#define NOP_OPCODE 0x00000000
/* Assumption: SYSCALL_MASK_RESOLVE is not defined in the original post. */
/* Any mask that gives zero for a syscall word and non-zero for a j word */
/* works; the primary opcode field (top 6 bits) is used here. */
#define SYSCALL_MASK_RESOLVE 0xFC000000

/* Substitutes the right instruction */
void resolve_call(u32 *call_to_resolve, u32 call_resolved)
{
    /* SYSCALL */
    if(!(call_resolved & SYSCALL_MASK_RESOLVE))
    {
        *call_to_resolve = JR_RA_OPCODE;
        *(++call_to_resolve) = call_resolved;
    }
    /* JUMP */
    else
    {
        *call_to_resolve = call_resolved;
        *(++call_to_resolve) = NOP_OPCODE;
    }
}
But as some of you may have noticed, this will only allow us to resolve the ELF imports that are also the game's imports... But what if the ELF imports are not the same? What if there's an ELF import that is not imported by the game? Well, that's another matter, maybe I shall discuss a solution to this problem later.
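The same patching logic can be tried on a PC with a plain array standing in for the stub. This is a minimal sketch with hypothetical names (write_stub, JR_RA, NOP); it tells the two cases apart by the primary opcode field, which is zero for a syscall word and non-zero for a j word. That is my choice for the illustration and not necessarily the mask the original code used.

```c
#include <stdint.h>

#define JR_RA 0x03E00008u
#define NOP   0x00000000u

/* Writes the two-instruction stub for `call` into stub[0..1].
 * A word whose primary opcode (top 6 bits) is zero is treated as a
 * syscall; anything else is treated as a jump. */
void write_stub(uint32_t stub[2], uint32_t call)
{
    if ((call >> 26) == 0) {
        /* syscall: jr $ra, with the syscall in the delay slot */
        stub[0] = JR_RA;
        stub[1] = call;
    } else {
        /* jump: j target, with a nop in the delay slot */
        stub[0] = call;
        stub[1] = NOP;
    }
}
```

Starting from an unresolved stub (jr $ra, nop), write_stub(stub, 0x0A200020) produces the jump pair and write_stub(stub, 0x00088A8C) produces the jr $ra / syscall pair shown in the refresher above.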