In the last post I described how chat interfaces to LLMs are giving a limited view of what the LLMs could do. In this post I wanted to compare this to graphics pipelines.

Graphics pipelines used to be "fixed function", where the graphics system (such as Silicon Graphics hardware) would offer you the graphics features:

  • you always got a matrix multiply (translate, scale, rotate, perspective, etc.)
  • you always got gouraud shading
  • you always got their standard lighting model (ambient, diffuse, specular, emission, shininess)
  • you got to choose up to 8 lights with glLight() with a handful of styles
  • you always got a color per object, but later they offered a texture lookup
  • you got to control the fog level with glFog() (density, color, etc.)

But what if you wanted a different model like phong shading? Too bad! What if you wanted any effect other than fog? Too bad! What if you wanted more than 8 lights? Too bad! What if you wanted 2 textures? Too bad! At least until they added a feature to enable a second texture lookup. If you wanted anything else, you needed to implement your own graphics pipeline and give up on hardware acceleration.

Over time, these fixed features were replaced by "shaders". The vertex shader would handle the overall structure with matrix multiply, clipping, fog, colors. Once we could control it, people came up with cloth, grass, water, terrain, animation, procedural geometry, and lots of other things. The fragment shader would handle fine details with texture lookup and lighting. Once we could control it, people came up with custom lighting, organic textures, animation, procedural art, and lots of other things.

Drawing of vertex and fragment shaders

Today's cloud-based LLMs remind me of the fixed function graphics pipeline. There's a fixed pipeline of steps they go through:

  • prepend a "system prompt" before your prompt
  • feed the transcript to the LLM
  • have the LLM make predictions about all the possible next words (tokens)
  • choose which token comes next (e.g. the highest probability)
  • append that token to the transcript, and go back to first step (feeding the entire transcript again to the LLM)

There's a bit of flexibility here. For example instead of appending the next token to the transcript, "tool use" says if the next token is a special marker, then we should run an outside program and insert its output into the transcript. That's kind of like what a shader does in graphics — instead of always doing things a certain way, we want to be able to reuse the pipeline as a whole, but replace one piece of it. But we still can't do much. Maybe I want to pick three possible tokens, and then split this into three transcripts, and see how they progress. Maybe they converge. Maybe they diverge. Maybe I want to go back an edit the existing transcript and let it run through again. Maybe I want to switch the roles of the human and computer in the transcript. Right now I can't do those things. The transcript is a graph and I would like to explore a "tree of thoughts" or analyze the whole graph instead of only seeing a single path through it.

Another bit of flexibility they offer is to constrain the how the tokens are chosen with "structured outputs". But we still can't do much. We might want to pick an unusual word occasionally (see Count Bayesie's visual explanation of how this can work), or only pick words that don't have the letter e (like Ernest Wright's story "Gadsby" that was written without the letter 'e'), or only pick words that represent a number, or only pick kiki or bouba words, or pick words that rhyme. With local models we can use a "logits processor", but there seems to be no equivalent for cloud based LLMs.

Right now, if I want to do anything other than the standard functions, I have to write my own pipeline and use local models. I give up on the hardware and convenience that the cloud-based LLMs offer.

So I've been wondering if we'll evolve the equivalent of graphics shaders but for LLMs. Broadly speaking, I think I'd like to have more control over how the overall transcript structure (mutable graph/tree instead of a single append-only path), and more control over how the individual tokens are chosen (fully programmable). These might be the equivalent of vertex and fragment shaders. I think today's cloud LLMs are similar to when graphics APIs only supported fixed lighting and fog. The provider is trying to imagine all the things we might want to do, and then program only those specific things. The future may allow people to plug in code to customize the process.

Drawing of transcript graph and token selection in LLMs

A side effect of adding this flexibility to the graphics pipeline is that GPUs got used for lots of things other than 3D graphics, such as scientific computing, cryptocurrency, and neural networks. What would LLMs be used for if they weren't limited to chat transcripts?

Labels: ,

0 comments: