LLM Glossary for People Who Build Stuff

Piercing through the jargon armor, one new term at a time

Generative AI and LLM engineering are still in their infancy. The underlying research has existed in academia for a while, but until you familiarize yourself with the jargon, the field appears a lot more magical than it is.

I’ve done a number of AI-related projects. Nothing huge, and nothing from the metal, like training models. (I don’t have a year’s salary to set on fire!) If you don’t do a ton of Python (I don’t), some of the techniques appear magical at first glance. They’re referenced in arXiv papers with fancy names like “embeddings” or “quantization,” as if to whisper “this is above your pay grade 😉.”

Nonsense.

Most of the jargon in the LLM space exists because it hasn’t yet made its way out of academia. I have an extreme bias against gate-keeping language (which, as useful as jargon can be, often functions to keep the plebs out), but that’s another story. For those of us looking to use LLMs to build neat things, the learning curve can look like a cliff.

But I contend that most of these terms are akin to “oh, why did you invent a whole new term for this?” Or, “oh, there’s a term for this?” I’ve tried to unpack as much of this as I could, so you don’t have to. (And because writing things down is useful for remembering them.)

Note: the terms below are defined within the context of LLMs and generative AI.

  • AI, artificial intelligence: Buzzword useful for separating gullible investors from their cash. 😏
    • Typically referring to machine-learning-driven products, like Dall-E, ChatGPT, etc. The models themselves aren’t what constitutes AI here. The real power is the glue: models, systems, prompts, and techniques combined so that data funnels to the right places and the outputs make up what appears to be a functional intelligence. Whether or not this constitutes intelligence is above my pay grade. I just like to build stuff with cool tools.
  • Autocomplete, Chat: modes of LLM use. Most models begin as autocomplete, but by tuning how user inputs and model outputs are handled, researchers can turn autocomplete into a very convincing AI.
    • For example, a chat bot answers “What is the capital of Thailand?” by turning that sentence into “The capital of Thailand is,” which the model then autocompletes to “Bangkok.”
  • Back propagation: the algorithm used during training to adjust a model’s weights. The model’s output is compared against the expected output, and the resulting error is propagated backward through the network, nudging each weight in the direction that reduces the error.
    • E.g., to train a medical model, you might establish a basic language model, then train it further on an enormous set of medical data, then on an enormous set of specialized data (e.g., oncology). During that last pass, you can back-propagate the updates through all of your weights, or freeze some layers and adjust only a portion of them.
    • It’s the standard technique for refining a set of weights, and has a number of pros and cons.
    • See also, HSIC
  • Context, Context window: the maximum size of the payload when making calls to LLMs. In my experience, many LLM APIs silently truncate or ignore context window overflows, so you have to know the size of the window for the model you’re using, and make sure your context and question don’t exceed it.
    • Context is generally anything beyond the question itself that might assist an autocomplete model in generating the correct answer. In the above example about country capitals, for instance, we might feed a list of country capitals into the LLM as context.
  • Embeddings: vectors positioned so that similar text lands close together, basically.
    • It’s unclear why OpenAI and other vendors provide embedding generation. E.g., why use their embeddings over vectors from a simple TensorFlow model?
    • I’m guessing it matters because embeddings generated via these large models would probably group relevant things a bit better, given their massive training sets. For example, “Titanic shipwreck” might be more tightly grouped with “tourism” or “implosion” after the 2023 Titan submersible accident. TensorFlow embeddings wouldn’t correlate these two concepts, but would instead rely on the literal grouping between words in the text they were trained on.
  • Embedding-assisted (retrieval augmented) generative AI: this is an entire system masquerading as a black box technology. You generate embeddings for your data, use those embeddings to search for relevance to a question being asked by the user, then stuff those search results into the context.
    • It sometimes helps generate better responses, but in terms of interesting or useful technology, it’s a cheap hack around LLM context windows.
  • Fine-tuning: providing novel data to an LLM that heavily affects its future output.
    • Useful for taking a very general model, layering specialist knowledge over it, and getting a good linguistic/general model that can speak to very niche, specialized subjects.
    • Different from general training by virtue of the weighting. You can use far less data for fine-tuning, and affect the resulting output much more than with training.
    • Potentially cheaper than stuffing context windows: you only need to fine-tune once, and all future prompts are affected by the tunings. If you’re having to stuff a bunch of context into every LLM prompt, it’s likely going to be more expensive. This is also a trade-off, of course. You have more flexibility in how fast you can iterate if you avoid fine-tunings until you need them.
  • Generative AI: another term for asking an AI to do something, and having it generate a response that loosely correlates to your request.
  • HSIC: Hilbert-Schmidt independence criterion. An alternative to back propagation that adjusts existing model weights with more precision and flexibility (e.g., you can have multiple processes adjusting weights at the same time) than back propagation.
    • Does not appear to be in widespread use?
    • See also, Back propagation
  • LLM, Large language model: term for a language autocomplete model trained on a lot of data.
    • There is no clear threshold for what constitutes “large,” as far as I could tell.
    • Typically created by training a model on a lot of data. Much of that data is tagged and categorized by low-paid workers in poorer countries. It’s unclear whether LLMs can scale beyond this base need for extremely cheap labor.
  • Machine learning: general term for using machines to build vectorized datasets of weights that can be turned into language models, image models, and all other sorts of models.
  • Modality, Multi-Modal: typically used to refer to the type of data an LLM can process. E.g., text or images.
    • Multi-modality refers to something that can ingest many types of input: text, images, audio, etc.
    • Multi-modality is sometimes confused with multi-turn prompting, where you prompt an LLM; receive its answer; and use that and other answers in future prompts in the same session, so that the LLM has more of its own answers as context. (Pretty much all chat models work this way, but that makes them multi-turn, not multi-modal.)
  • Quantization: reducing the numeric precision of a model’s weights (e.g., storing 8-bit integers instead of 32-bit floats) to shrink the amount of math and memory needed to get results from a model. It reduces compute needs much more than it affects the resulting accuracy, so it tends to be used in low-resource systems.
  • Retrieval augmentation: a way to prompt a model with additional context added by vector-powered searches.
    • See also embeddings, vectors
  • RLHF: Reinforcement learning from human feedback.
    • Using humans to tell an LLM whether its answers are right or wrong, and having that human feedback added to the base model’s training set.
    • Sometimes also refers to a technique of sending multiple prompts to an LLM along with the prompt history, allowing the LLM to alter its output based on human feedback.
  • Training: creating a set of vectors, then constantly tweaking and adding to them based on a set of training data.
    • This is an intensive process, because each new data set’s weights must be factored into all past data. The bigger the model (the more vectors it holds), the longer this takes.
  • Vectors: numeric representations of letters, words, and sentences.
  • Vector database: software to store a lot of vectors so you can do similarity lookups. E.g., “give me all documents in my database that are numerically similar to this question.” Very useful for similarity-based searches.
    • Not required for any LLM use. People use vector databases to stuff context into a question for more relevant results, but you might already have this by just… searching your existing database.
    • Most useful where you don’t know the LLM prompt ahead of time. (E.g., you let users prompt their own data.)
  • Vectorization: the process of turning letters and words into a set of numbers that computers can perform math on.
  • Weights: the learned numbers inside a model’s vectors. Given previous output, LLMs send that output through the vector weights (via math) to determine what the next token will be.
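
To make “similar text lands close together” concrete, here’s a toy sketch of cosine similarity, the usual closeness measure for embeddings. The three-dimensional “embeddings” below are invented by hand purely for illustration; real embeddings have hundreds or thousands of dimensions and come from a model, not a person.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hand-made toy "embeddings" (assumption: real ones come from a model).
embeddings = {
    "the cat sat":     [0.9, 0.1, 0.0],
    "a kitten rested": [0.8, 0.2, 0.1],
    "GDP rose 3%":     [0.0, 0.1, 0.9],
}

# Sentences about the same thing point in similar directions.
cat = cosine_similarity(embeddings["the cat sat"], embeddings["a kitten rested"])
econ = cosine_similarity(embeddings["the cat sat"], embeddings["GDP rose 3%"])
```

Here `cat` comes out close to 1.0 and `econ` close to 0.0, which is the entire trick behind “similarity lookups”: it’s just vector math.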
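
The retrieval-augmented pattern above (“search for relevance, then stuff results into the context”) can be sketched in a few lines. To stay self-contained, this toy version ranks documents by naive word overlap rather than embedding similarity; a real system would swap in a vector search, but the shape of the system is the same. All function names here are my own, not any library’s.

```python
import re

def words(text):
    """Lowercase word set, punctuation stripped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question, documents, k=2):
    """Rank documents by word overlap with the question.
    (A real RAG system would rank by embedding similarity instead.)"""
    q = words(question)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents):
    """Stuff the top-ranked documents into the prompt as context."""
    context = "\n".join(retrieve(question, documents))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Given a pile of documents and the question “What is the capital of Thailand?”, `build_prompt` produces a prompt whose context section contains the most relevant documents, ready to send to the model. That’s the whole black box.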
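
And here’s a minimal sketch of what quantization does to the numbers, using 8-bit integers. This hand-rolled version is only illustrative; real quantization schemes (per-channel scales, 4-bit formats, etc.) are more involved, but the core idea — trade precision for smaller, cheaper numbers — is the same.

```python
def quantize_int8(vec):
    """Map each float onto an integer in [-127, 127], plus a scale factor."""
    scale = max(abs(x) for x in vec) / 127
    return [round(x / scale) for x in vec], scale

def dequantize(q, scale):
    """Recover approximate floats from the integers."""
    return [x * scale for x in q]

weights = [0.1234, -0.6543, 0.3321, 0.0012]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Each restored value lands within half a quantization step of the original:
# small, bounded error in exchange for 1-byte integers instead of 4-byte floats.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```
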

What can LLMs do?

LLMs are advanced autocomplete. They’re not magic, they’re not “artificial intelligence,” and they can’t reason or apply logic. Many things LLMs do appear to be reasoning and logic, but they aren’t much more intelligent than a parrot repeating things it hears.

The fact that, in many instances, this appears to be genuine reasoning capacity tells us little about whether LLMs are actually doing advanced reasoning, but it might tell us a bit about the nature of intelligence. E.g., how critical language is to our understanding of intelligence.

That said, they can still do really cool stuff. Autocomplete on steroids turns out to be pretty useful if you have a known set of inputs with correlative outputs that are improved with convincing human language.
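
As a toy illustration of “autocomplete on steroids,” here is about the smallest “language model” possible: bigram counts. It predicts each next word as whichever word most often followed the previous one in its training data. The corpus below is obviously contrived, and real LLMs use vastly more sophisticated math, but the fill-in-the-next-token loop is the same shape.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus):
    """Count which word follows which: a miniature 'language model'."""
    model = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            model[prev][nxt] += 1
    return model

def autocomplete(model, prompt, max_tokens=5):
    """Repeatedly append the most common next word (like greedy decoding)."""
    tokens = prompt.lower().split()
    for _ in range(max_tokens):
        options = model.get(tokens[-1])
        if not options:
            break
        tokens.append(options.most_common(1)[0][0])
    return " ".join(tokens)

corpus = [
    "the capital of thailand is bangkok",
    "the capital of france is paris",
    "the largest city in thailand is bangkok",
]
model = train_bigrams(corpus)
```

Prompted with “the capital of thailand is,” this completes to “bangkok.” Amusingly, it also completes “the capital of france is” with “bangkok,” because “bangkok” follows “is” more often in its training data: exactly the confident parroting that larger models exhibit too, just with far fewer parameters hiding it.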

LLMs can

  • summarize a small amount of data
    • a “small amount” here means some percentage of the context window, leaving room for the response
    • large amounts of data, however, must be systematically consumed, refined, and assembled separately
  • turn quantitative data into human-readable summaries
    • make sense of noisy data, or difficult-to-decipher signals
    • for best results, provide the model with many examples (maybe even fine tunings) of how outputs correlate with inputs
  • iterate against text output given a success signal (a sort of last-mile RLHF)
    • e.g., “Here’s a unit test. Here’s some context. Write some code for it to pass. Iterate on the output I paste until the test passes.”
    • human intervention and multiple points of confirmation of a result make this very effective
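
One practical consequence of the context-window point above: before stuffing data into a prompt, it’s worth checking whether it will fit. A rough sketch, assuming the common rule of thumb of about four characters per English token (real APIs ship exact tokenizers; the 8192/1024 defaults below are illustrative, not any particular model’s limits):

```python
def rough_token_count(text):
    """Very rough heuristic: ~4 characters per English token.
    An exact tokenizer from your model's vendor is always better."""
    return len(text) // 4

def fits_in_window(context, question, window=8192, reserve=1024):
    """True if context + question leave `reserve` tokens for the answer.
    Window and reserve sizes are assumptions; check your model's docs."""
    needed = rough_token_count(context) + rough_token_count(question) + reserve
    return needed <= window
```

If `fits_in_window` returns False, that’s the signal to summarize, chunk, or retrieve a smaller slice of context rather than sending the prompt and hoping.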

Note on updates

I will keep this document updated as I find new jargon to unpack.