PSP homebrew development

From homebrew.pixelbath
Revision as of 22:33, 20 January 2020 by Mhoskins (talk | contribs) (display list stuff)

Useful Information

Note that while pixel depths are referred to by RGBA below, the actual bit order is BGRA (or BGR for formats without alpha).

Color format is adjustable, but display resolution is fixed when using the PSP display.

Pixel Formats

  • PSP_DISPLAY_PIXEL_FORMAT_8888 - 32-bit RGBA 8:8:8:8 - Full 24-bit color with 8-bit alpha, uses double the VRAM of the 16-bit formats
  • PSP_DISPLAY_PIXEL_FORMAT_4444 - 16-bit RGBA 4:4:4:4 - 12-bit colors, plus 4-bit alpha
  • PSP_DISPLAY_PIXEL_FORMAT_5551 - 16-bit RGBA 5:5:5:1 - 5 bits per color channel, plus boolean alpha (1-bit)
  • PSP_DISPLAY_PIXEL_FORMAT_565 - 16-bit RGB 5:6:5 - No alpha; slightly finer color resolution by bumping green to 6 bits, with red and blue using 5 bits each
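
To make the bit layouts concrete, here is a minimal sketch of packing 8-bit-per-channel color into each format. The function names are hypothetical (not part of the PSP SDK), and the layout assumes red sits in the least significant bits with alpha, where present, in the most significant bits.

```c
#include <stdint.h>

/* Hypothetical helpers, not SDK functions.
   Assumption: red is in the lowest bits, alpha (if any) in the highest. */
static uint32_t pack_8888(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)g << 8) | r;
}

/* Keep the top 4 bits of every channel. */
static uint16_t pack_4444(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((a >> 4) << 12) | ((b >> 4) << 8) | ((g >> 4) << 4) | (r >> 4));
}

/* Top 5 bits of each color channel, top bit of alpha. */
static uint16_t pack_5551(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((a >> 7) << 15) | ((b >> 3) << 10) | ((g >> 3) << 5) | (r >> 3));
}

/* No alpha: 5 bits red, 6 bits green, 5 bits blue. */
static uint16_t pack_565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((b >> 3) << 11) | ((g >> 2) << 5) | (r >> 3));
}
```

Under these assumptions, opaque red packs to 0xFF0000FF in 8888 and to 0x001F in 565.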

Drawing Pixels

If you're just pushing raw pixels to the screen, there are two ways to do it. There's direct access to the framebuffer:

u32* vram = (u32*)0x44000000;
sceDisplaySetFrameBuf((void*)vram, 512, PSP_DISPLAY_PIXEL_FORMAT_8888, PSP_DISPLAY_SETBUF_IMMEDIATE);

The VRAM address is 0x04000000; changing the address to 0x44000000, as in the code above, means that reads and writes are uncached. This is slower, but has the advantage that you don't have to sync the cache manually and you're less likely to see graphic artifacts. You can use other memory locations for the framebuffer as well, but the address must be a multiple of 16 bytes.
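
The relationship between the VRAM address 0x04000000 and the uncached address 0x44000000 used in the code comes down to one extra bit. A minimal sketch, using hypothetical helper names that are not part of the PSP SDK:

```c
#include <stdint.h>

#define UNCACHED_BIT 0x40000000u  /* bit that turns an address into its uncached form */

/* Hypothetical helper: return the uncached form of an address. */
static uint32_t to_uncached(uint32_t addr)
{
    return addr | UNCACHED_BIT;
}

/* Hypothetical helper: framebuffer addresses must be multiples of 16 bytes. */
static int is_aligned16(uint32_t addr)
{
    return (addr & 0xFu) == 0;
}
```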

Calling sceDisplaySetMode() before drawing is also recommended. The code above is for 32-bit mode (8 bits per channel), but other modes are available. Even though the PSP screen is 480 pixels wide, the framebuffer is usually 512 pixels wide to allow for faster address calculation, and as a result faster drawing.
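
The address calculation that the 512-pixel width speeds up can be sketched as follows. The names are hypothetical, and the buffer is passed in as a plain pointer so the sketch does not depend on the PSP SDK; on hardware it would be the framebuffer pointer.

```c
#include <stddef.h>
#include <stdint.h>

#define SCREEN_WIDTH  480  /* visible pixels per line */
#define SCREEN_HEIGHT 272  /* visible lines */
#define FB_STRIDE     512  /* framebuffer width in pixels, a power of two */

/* Index of pixel (x, y). Because the stride is 512, the multiply
   reduces to a shift (y << 9). */
static size_t pixel_index(int x, int y)
{
    return (size_t)y * FB_STRIDE + (size_t)x;
}

/* Write one 32-bit pixel, ignoring coordinates outside the visible area. */
static void put_pixel(uint32_t *fb, int x, int y, uint32_t color)
{
    if (x < 0 || x >= SCREEN_WIDTH || y < 0 || y >= SCREEN_HEIGHT)
        return;
    fb[pixel_index(x, y)] = color;
}
```

The 32 pixels at the end of each line (columns 480 to 511) are never displayed, which is the price paid for the cheaper address calculation.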

Hardware-assisted features available when working with the framebuffer include vertical scrolling and page flipping.
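
Both features amount to changing which address the display reads from. The sketch below only tracks byte offsets and uses hypothetical names; it assumes two 32-bit buffers placed back to back, and on hardware the resulting offset would be added to the VRAM base and passed to sceDisplaySetFrameBuf().

```c
#include <stdint.h>

#define FB_STRIDE       512  /* pixels per framebuffer line */
#define FB_LINES        272  /* visible lines */
#define BYTES_PER_PIXEL 4    /* 8888 format */
#define FB_BYTES (FB_STRIDE * FB_LINES * BYTES_PER_PIXEL)

typedef struct {
    uint32_t show;  /* byte offset of the buffer currently on screen */
    uint32_t draw;  /* byte offset of the buffer being drawn into */
} FlipState;

/* Start by showing the first buffer and drawing into the second. */
static void flip_init(FlipState *s)
{
    s->show = 0;
    s->draw = FB_BYTES;
}

/* Page flip: swap the two roles and return the offset to display next. */
static uint32_t flip_swap(FlipState *s)
{
    uint32_t t = s->show;
    s->show = s->draw;
    s->draw = t;
    return s->show;
}

/* Vertical scroll: begin the display the given number of lines further in. */
static uint32_t scroll_offset(uint32_t base, uint32_t lines)
{
    return base + lines * FB_STRIDE * BYTES_PER_PIXEL;
}
```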

Or you can create an array of vertices and draw them using the GPU:

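typedef struct {
	unsigned short u, v;	// texture coordinates
	short x, y, z;		// screen-space position
} Vertex;

// sceGuGetMemory() allocates from the current display list
Vertex* v = (Vertex*)sceGuGetMemory(1 * sizeof(Vertex));
v[0].u = 0;
v[0].v = 0;
v[0].x = 240;	// center of the 480x272 screen
v[0].y = 136;
v[0].z = 0;

sceGuDrawArray(GU_POINTS, GU_TEXTURE_16BIT | GU_VERTEX_16BIT | GU_TRANSFORM_2D, 1, 0, v);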

The GU routines use hardware to perform drawing operations for you. You can draw to different targets, such as an off-screen memory buffer or the framebuffer; in the latter case you'll see the result on screen. This also means it is possible to combine page flipping with hardware-assisted drawing, and there are GU routines to help with this.

Since the GPU is doing the drawing, the CPU is free to do other things. Keep in mind, though, that the CPU doesn't know what the GPU is doing and vice versa. Because the GPU does not see the CPU cache, all textures, vertex lists and similar data must actually be in memory before you use them. This means calling a function that writes back the data cache for those memory regions, so that any cached data is written to memory. The data must also live at a memory address that is a multiple of 16 bytes, but if you use GU routines to allocate the memory they will take care of this for you. Textures should be swizzled as well so the GPU can read them efficiently.
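
As an illustration of what swizzling does, here is a minimal, unoptimized sketch. It assumes the commonly described layout in which the texture is split into blocks 16 bytes wide and 8 rows tall, with each block stored contiguously (128 bytes); the function name is hypothetical and a real program would use a faster word-based loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reorder a linear texture into 16-byte x 8-row blocks.
   width_bytes is the row size in bytes (pixels * bytes per pixel). */
static void swizzle(uint8_t *out, const uint8_t *in, size_t width_bytes, size_t height)
{
    /* Assumed preconditions: whole blocks only. */
    assert(width_bytes % 16 == 0);
    assert(height % 8 == 0);

    size_t blocks_per_row = width_bytes / 16;

    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width_bytes; ++x) {
            size_t block = (y / 8) * blocks_per_row + (x / 16); /* which block */
            size_t inner = (y % 8) * 16 + (x % 16);             /* position inside it */
            out[block * 128 + inner] = in[y * width_bytes + x];
        }
    }
}
```

With this layout, the second row of a texture lands 16 bytes after the first instead of a full row width later, which keeps vertically adjacent texels close together in memory.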

Display Lists

Drawing graphics can be done by hitting VRAM directly, but for decent performance and 3D work you need to use display lists.

  • Each entry in the display list is 32 bits wide: an 8-bit command and 24 bits of data
  • The 24-bit data can represent an integer, a pointer or a cut-down float
  • You can create sub-lists, which are called from your main list
  • You can specify custom vertex types with components of different sizes (8/16-bit fixed point, 32-bit float)
  • Each vertex type can include position, colour, normal or weight
  • Vertices can be specified as a direct list or indexed
  • Because floats are cut down to 24 bits, precision can be lost during the transformation stage
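
The entry layout and the cut-down float can be sketched as below. The helper names are hypothetical, the opcode values used with them are placeholders, and the float encoding is an assumption: the sketch keeps the top 24 bits of an IEEE-754 single and drops the low 8 mantissa bits.

```c
#include <stdint.h>
#include <string.h>

/* Build one display list entry: 8-bit command on top, 24 bits of data below. */
static uint32_t ge_word(uint8_t cmd, uint32_t data)
{
    return ((uint32_t)cmd << 24) | (data & 0x00FFFFFFu);
}

/* Assumed encoding: keep sign, exponent and the top 15 mantissa bits. */
static uint32_t float_to_24(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret without aliasing issues */
    return bits >> 8;
}

/* Expand a 24-bit float back to 32 bits; the dropped bits come back as zero. */
static float float_from_24(uint32_t f24)
{
    uint32_t bits = f24 << 8;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Values such as 1.0 survive the round trip exactly, while a value like 0.1 comes back slightly smaller, which is the precision loss mentioned in the last bullet.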

The display lists are transferred to the GE using a DMA mechanism, which has important ramifications when developing the lists.

  • A list can be transferred while building it using a 'Stall' address
  • The list must be written through an uncached memory area, or the data cache must be written back before use
  • Lists should be terminated with a FINISH command
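
uint32_t list[256 * 1024]; // Define a display list

// cbid (callback ID) and endp (index of the next free entry) are set up elsewhere
int qid = sceGeListEnQueue(list, list, cbid, NULL); // Stall address equals the start, so nothing runs yet
/* Fill in the display list */
sceKernelDcacheWritebackInvalidateAll(); // Write back the cache
sceGeListUpdateStallAddr(qid, &list[endp]); // Let the GE process the list up to the new stall address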

Developing VFPU Code

The 128 32-bit VFPU registers are reconfigurable to operate as single values, 2x2/3x3/4x4 rows, columns, matrices, or transposed matrices. The stored matrices can be multiplied in a single operation. The usual selection of trigonometric/square root instructions is available.

The VFPU also has an on-board pseudo-random number generator.

VFPU code is written in assembly, and the PSP build toolchain supports all known VFPU operations.

/* Multmatrix $a0 = result, $a1 = a, $a2 = b */
multmatrix:
/* Load matrices to internal registers */
ulv.q C000,   0($a1);  ulv.q C010, 16($a1)
ulv.q C020,  32($a1);  ulv.q C030, 48($a1)
ulv.q C100,   0($a2);  ulv.q C110, 16($a2)
ulv.q C120,  32($a2);  ulv.q C130, 48($a2)
/* Multiply matrices */
vmmul.q M200, M000, M100
/* Store result */
usv.q C200,   0($a0);  usv.q C210, 16($a0)
usv.q C220,  32($a0);  usv.q C230, 48($a0)
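
For comparison, this is the kind of plain C 4x4 multiply that the VFPU routine replaces. It is a sketch with hypothetical names, and it assumes row-major storage with result = a × b; the element ordering used by the VFPU version may differ from this convention.

```c
/* Hypothetical 4x4 matrix type: 16 floats, 64 bytes, row-major (an assumption). */
typedef struct {
    float m[4][4];
} Mat4;

/* result = a * b, computed with 64 multiplies and 48 adds on the CPU. */
static void mat4_mul(Mat4 *result, const Mat4 *a, const Mat4 *b)
{
    Mat4 tmp;  /* temporary so result may alias a or b */

    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a->m[i][k] * b->m[k][j];
            tmp.m[i][j] = sum;
        }
    }
    *result = tmp;
}
```

The VFPU version does the whole multiply in the single vmmul.q instruction once the matrices are loaded, which is where the speedup comes from.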

Tutorials

Resources