A from-scratch software 3-D renderer written in Lua — no GPU, no graphics library math, just pixels.
Backends: Love2D (`main.lua`) and LuaJIT + SDL2 (`app.lua`) · Shared core: `common.lua`, `algebra.lua`
Every frame, the renderer executes these stages in order:

1. Transform: translate and rotate the mesh.
2. Backface culling: discard triangles facing away from the camera.
3. Shading: compute per-vertex lighting (Gouraud).
4. Perspective projection: map 3-D points to 2-D screen positions.
5. Rasterisation: fill triangles pixel by pixel, with z-buffering and texturing.
6. Flush: upload the pixel buffer to the window in a single call.
The renderer is entirely software: every pixel is computed in Lua and written
to an in-memory buffer (ImageData for Love2D; direct pointer write to
SDL_Surface.pixels for LuaJIT). The buffer is uploaded to the window
once per frame in a single call.
Moving a point by an offset — the simplest transform:
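A minimal sketch of translation, assuming points and offsets are plain `{x, y, z}` tables; the function name is illustrative:

```lua
-- Translate a point by an offset: each component is shifted independently.
function point_move(point, offset)
  return {
    x = point.x + offset.x,
    y = point.y + offset.y,
    z = point.z + offset.z,
  }
end
```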
Rotating a point in the XZ plane (turning left/right) uses the 2-D rotation formula applied to the X and Z components while Y stays fixed:
Intuition: imagine the unit circle in the XZ plane. A point at angle φ from the X-axis moves to angle φ+θ after rotation. cos/sin give the new X and Z components of that rotated direction.
```lua
-- Reused output table at module scope: avoids a fresh allocation per call
-- (the same trick the renderer uses for its other per-pixel tables).
local point_out = {}

function point_rotate_y(point, radiants)
  return point_rotate_axes('xz', point, radiants)
end

-- Generic: rotate two named axes of a point by angle radiants
function point_rotate_axes(axes, point_in, radiants)
  local one, two = axes:sub(1, 1), axes:sub(2, 2)  -- e.g. 'xz' -> 'x', 'z'
  local c, s = math.cos(radiants), math.sin(radiants)
  point_out.x, point_out.y, point_out.z = point_in.x, point_in.y, point_in.z
  point_out[one] = c * point_in[one] - s * point_in[two]
  point_out[two] = c * point_in[two] + s * point_in[one]
  return point_out
end
```
Real eyes (and cameras) see distant objects as smaller. Perspective projection maps a 3-D point to a 2-D screen position by dividing by depth (−z, because the scene is in front of the camera at negative Z).
focal (= 200) is the focal length: it controls the field of view. Larger focal → narrower FOV, more telephoto-like. (cx, cy) (= 150, 150) is the screen centre, so objects on the optical axis project to the middle of the window.
The projection computes `inv_z = 1 / (−z)` once per vertex and stores it. The same value is later reused for perspective-correct UV interpolation, so the division is paid only once.
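The projection can be sketched as follows; the function and field names are illustrative, and the screen-Y flip assumes screen Y grows downward:

```lua
-- Sketch of perspective projection for a point {x, y, z} with z < 0,
-- using the constants named in the text (focal = 200, cx = cy = 150).
local focal, cx, cy = 200, 150, 150

function perspective(point)
  local inv_z = 1 / -point.z            -- division paid once per vertex
  return {
    x = cx + focal * point.x * inv_z,   -- screen X
    y = cy - focal * point.y * inv_z,   -- screen Y (assumed flip: Y grows down)
    inv_z = inv_z,                      -- kept for perspective-correct UVs
  }
end
```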
A closed mesh (like a head) always has faces pointing toward the camera and faces pointing away. The away-faces are invisible and can be skipped entirely — typically ~50% of all triangles.
In world space the camera is at the origin looking along −Z. A face is front-facing if its normal has a positive Z component (pointing toward +Z, toward camera). The Z component of the cross product of two edges gives exactly this:
The ≥ 0 (not > 0) is intentional: edge-on faces (nz = 0)
like a horizontal floor plane are included rather than silently dropped.
Backface culling runs before shading and perspective, so culled triangles pay
no further cost at all.
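The test can be sketched as below; vertex field names and the function name are illustrative, assuming counter-clockwise winding for front faces:

```lua
-- Z component of the cross product of edges (b - a) and (c - a).
-- Positive nz means the face normal points toward the camera (+Z).
function is_front_facing(a, b, c)
  local nz = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x)
  return nz >= 0  -- >= keeps edge-on faces (nz == 0) like a floor plane
end
```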
A surface lit by a directional light reflects more light the more directly it faces the light. This is captured by the dot product of the surface normal and the direction toward the light:
N̂ is the unit surface normal. L̂ is the unit vector pointing toward the light. The dot product equals cos(θ) where θ is the angle between them. Clamping to 0 means the face receives no negative light when it points away.
Real scenes have indirect light bouncing from all surfaces. We fake this with a constant ambient term added to every surface:
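The diffuse and ambient terms combine as in this sketch; the `ambient` value and function name are illustrative assumptions:

```lua
local ambient = 0.2  -- illustrative constant ambient term

-- N and L are assumed to be unit-length {x, y, z} vectors.
function light_intensity(N, L)
  local diffuse = N.x * L.x + N.y * L.y + N.z * L.z  -- cos(theta)
  if diffuse < 0 then diffuse = 0 end  -- clamp: no negative light
  return math.min(ambient + diffuse, 1)
end
```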
Instead of computing lighting once per triangle (flat shading), Gouraud shading computes it per vertex using the vertex's own normal, then smoothly interpolates the resulting colors across the face during rasterisation. This gives smooth curvature on coarse meshes at low cost.
When two triangles overlap on screen, the closer one should win. The z-buffer stores the depth of the closest fragment seen so far for each pixel, and a new fragment is drawn only if it is closer (higher z, since the scene uses negative Z for depth):
The buffer is initialised to −∞ each frame, so the very first
fragment at any pixel always wins. After drawing, the buffer is updated with
the new depth.
The buffer is allocated once at module scope and reset with a fill loop —
avoiding the 90 000 table-slot re-allocation that would otherwise occur each frame.
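The z-buffer logic can be sketched as follows, assuming a 300 × 300 window (which matches the ~90 000 pixels mentioned above) stored as a flat Lua table; names are illustrative:

```lua
local W, H = 300, 300
local zbuffer = {}  -- allocated once at module scope

-- Reset with a fill loop instead of re-allocating 90 000 slots each frame.
function zbuffer_reset()
  for i = 1, W * H do zbuffer[i] = -math.huge end
end

-- Depths are negative Z, so CLOSER fragments have a HIGHER z value.
function zbuffer_test_and_set(px, py, z)
  local i = py * W + px + 1
  if z > zbuffer[i] then
    zbuffer[i] = z
    return true   -- closer than anything seen: caller draws the pixel
  end
  return false    -- occluded: skip
end
```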
Barycentric coordinates are the language of triangle interpolation. Given triangle vertices A, B, C, any point P inside the triangle can be written as:

P = rA·A + rB·B + rC·C,  with rA + rB + rC = 1 and rA, rB, rC ≥ 0
Geometrically, rA is the fraction of the triangle's area in the sub-triangle formed by P, B, C (opposite to A), and similarly for rB, rC. When P is at vertex A: rA=1, rB=rC=0.
Any per-vertex value V (depth z, RGB color, UV texture coordinates) is then interpolated across the face:

V(P) = rA·VA + rB·VB + rC·VC
Naively evaluating the barycentric formula costs 4 multiplications + 4 additions per pixel. The incremental trick cuts this to 3 additions.
Along a scanline (fixed py, increasing px), the barycentric numerators are linear functions of px. When px increases by 1, each numerator changes by a constant, its x-coefficient:
```lua
-- Per-triangle constants (sketch): pre.ax/pre.ay etc. are the linear
-- coefficients of each barycentric numerator; pre.common is the full area term.
local ax, bx = pre.ax, pre.bx
local drc = -(ax + bx)          -- step for rc_num = common - ra_num - rb_num
local inv_common = 1 / pre.common

-- Seed at the left edge of each scanline:
local ra_num = ax*(x_min - pre.cx) + pre.ay*(py - pre.cy)
local rb_num = bx*(x_min - pre.cx) + pre.by*(py - pre.cy)
local rc_num = pre.common - ra_num - rb_num
for px = x_min, x_max do
  -- Inside test: all numerators >= 0 (assuming common > 0 from CCW winding)
  if ra_num >= 0 and rb_num >= 0 and rc_num >= 0 then
    -- Divide ONLY for inside pixels (the minority):
    local ra = ra_num * inv_common
    local rb = rb_num * inv_common
    local rc = rc_num * inv_common -- == 1 - ra - rb
    -- ... depth, color, UV interpolation using ra/rb/rc ...
  end
  -- Step: 3 additions, no multiplications:
  ra_num = ra_num + ax
  rb_num = rb_num + bx
  rc_num = rc_num + drc
end
```
For pixels that fail the inside test (typically the majority along a scanline), the division by `common` (the expensive part) is skipped entirely; only the sign of the numerators matters for the inside test.
Texture coordinates (UV) cannot be linearly interpolated in screen space. Because perspective projection compresses distant parts of a surface, naive linear UV interpolation "swims" as the mesh rotates — a classic artefact.
Quantities that are linear in screen space are u/z, v/z, and 1/z (recall `inv_z = 1/(−z)` is stored per vertex by `perspective()`). Interpolate those with barycentric weights, then divide:

u = (rA·uA·inv_zA + rB·uB·inv_zB + rC·uC·inv_zC) / (rA·inv_zA + rB·inv_zB + rC·inv_zC), and similarly for v.
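A sketch of that interpolation, assuming each vertex table carries `u`, `v`, and the stored `inv_z`; the function name is illustrative:

```lua
-- Perspective-correct UV interpolation: interpolate u/z, v/z and 1/z
-- linearly with the barycentric weights, then divide by interpolated 1/z.
function interpolate_uv(ra, rb, rc, va, vb, vc)
  local inv_z = ra * va.inv_z + rb * vb.inv_z + rc * vc.inv_z
  local u = (ra * va.u * va.inv_z + rb * vb.u * vb.inv_z + rc * vc.u * vc.inv_z) / inv_z
  local v = (ra * va.v * va.inv_z + rb * vb.v * vb.inv_z + rc * vc.v * vc.inv_z) / inv_z
  return u, v
end
```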
Writing one pixel at a time through a graphics API (one SDL_FillRect
or love.graphics.rectangle per pixel) incurs enormous per-call overhead.
Both backends instead accumulate all pixels into memory and flush once:
| Backend | Write per pixel | Flush once per frame |
|---|---|---|
| SDL2 / LuaJIT | `uint8_t*` pointer write via FFI | `SDL_UpdateWindowSurface()` |
| Love2D | `ImageData:setPixel()` | `Image:replacePixels()` + `love.graphics.draw()` |
The SDL2 path uses the same `ffi.cast("uint8_t*", surface.pixels)` pattern as texture sampling and writes the packed pixel directly, either as a 32-bit word (4-byte surfaces) or as three sequential bytes (24-bit surfaces).
| Optimisation | Technique | Gain |
|---|---|---|
| Pixel buffer | Write to memory, flush once per frame | Eliminates ~90k API calls/frame |
| Depth buffer reuse | Allocate once, reset with fill loop | Eliminates ~90k table allocs/frame |
| Hoist closures | `halfplane`, `inside_polygon` → module-level locals | Enables JIT compilation of inner loop |
| Backface culling | nz ≥ 0 test before shading/perspective | ~50% fewer triangles processed |
| Incremental barycentric | Seed + add per scanline step | 4 mul+add → 3 add per pixel |
| Unified interpolation | One barycentric pass for inside+z+color+UV | Eliminates 2–3× redundant barycentric + table allocs |
| Reuse pixel tables | `pixel_rgb`, `pixel_xy`, `pixel_point` at module scope | Zero per-pixel allocations in hot path |
| Mesh | Triangles | FPS (SDL2 / LuaJIT) |
|---|---|---|
| cube.obj | 12 | ~18 |
| teapot.obj | 992 | ~52 |
| head.obj | 7 586 | ~9–12 |
| File | Role |
|---|---|
| `common.lua` | Scene setup, transform, shading, perspective, backface cull, rasteriser |
| `algebra.lua` | Vectors, cross/dot product, normals, barycentric coordinates |
| `loader.lua` | OBJ and STL mesh file parser |
| `main.lua` | Love2D backend: window, ImageData pixel buffer, event loop |
| `app.lua` | LuaJIT + SDL2 backend: FFI, window surface, event loop |
| `assets/` | OBJ meshes, BMP textures |