This doc is meant for a technical audience that is new to AI/ML, has their social feeds filled with GenAI news, and wants to make sense of it by grounding their understanding in foundations, not hype.

Hence, this doc:

[This doc has been shared externally; feedback is welcome!]

The model

The stack

This stack is analogous to the OSI model for networking. Its boundaries are still emerging: companies are crossing over them as business models and customer needs evolve (e.g. OpenAI is in the application layer, base model layer, and inference layer, and arguably also in the fine-tuning and app framework layers).

Application layer
App or feature built on top of an LLM.
[Key question] Who stands to gain from these features: incumbents (e.g. GitHub) or new startups? (Related: do LLMs create a new business model? Sell the work.)
Examples: ChatGPT, GitHub Copilot, Jasper

App frameworks layer
Libraries or frameworks that assist in building applications for a particular workload[1] (e.g. RAG).
[Optional layer] Not all apps use frameworks, since they can straitjacket experimentation (learn more).
Examples: LangChain, LlamaIndex

Middleware layer
Observability, testing, model failovers or routing. Can retrofit existing tools (e.g. an APM tool) or use LLM-specific tooling.
[Optional layer] Becomes relevant after some level of product maturity.
Examples: LangSmith (from LangChain), Portkey, Martian

Fine-tuning layer
There are many ways to fine-tune models, and techniques like LoRA sit on top of base models (learn more).
[Optional layer] Just use base models to launch the end product faster.
Examples: frameworks like Axolotl, companies like Predibase

Base model layer
The actual LLM. Can come from a large closed player (e.g. OpenAI), a large open player (e.g. Llama 3), or the long tail of specialized model builders (e.g. Defog for text-to-SQL). Hugging Face is the registry for all open models (from larger players and the long tail). Model builders are partnering with inference layers and cloud providers for enterprise distribution.
Examples: GPTx from OpenAI, Claude 3 from Anthropic, Gemini from Google

Inference infra layer
Running the model in production to generate text etc. is called inference.

[1] The workload might require other components that are not in this table. For example, RAG might require vector databases for retrieval.
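The RAG workload mentioned in the app frameworks layer can be sketched end to end in a few lines. This is a toy illustration, not any framework's API: a bag-of-words counter stands in for a real embedding model, an in-memory list stands in for a vector database, and the returned prompt is what would be sent to the LLM.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy embedding: bag-of-words counts. Real RAG systems use a neural
    # embedding model and store vectors in a vector database.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    # Stuff retrieved passages into the prompt as context; the LLM then
    # answers grounded in them (the LLM call itself is omitted here).
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "LoRA adds low-rank adapters on top of a frozen base model.",
    "Vector databases store embeddings for similarity search.",
    "Inference means running a trained model to generate output.",
]
print(build_prompt("what is a vector database?", docs))
```

Frameworks like LangChain and LlamaIndex wrap exactly this retrieve-then-prompt loop, plus the embedding models, vector stores, and chunking strategies that a production version needs.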
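The LoRA technique from the fine-tuning layer reduces to a small piece of linear algebra: keep the base weight matrix frozen and learn a low-rank correction on top of it. A minimal sketch (omitting LoRA's alpha/r scaling factor and the training loop itself):

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 8, 2                   # model dimension, adapter rank (r << d)
W = rng.normal(size=(d, d))   # frozen base weight, never updated
A = rng.normal(size=(r, d))   # trainable down-projection
B = np.zeros((d, r))          # trainable up-projection, zero-initialized

x = rng.normal(size=d)

# Instead of updating W (d*d parameters), LoRA trains only B and A
# (2*d*r parameters) and adds their low-rank product to the output.
y_base = W @ x
y_lora = W @ x + B @ (A @ x)

# Because B starts at zero, the adapted model initially matches the base
# model exactly; training then moves B and A away from that starting point.
assert np.allclose(y_base, y_lora)
```

This is why LoRA "sits on top of" base models: the adapter is a small separate set of weights that can be shipped, swapped, or merged into W without retraining the full model.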
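Inference itself is an autoregressive loop: run the model on the tokens so far, pick the next token, append it, and repeat. In this toy sketch a hypothetical lookup table stands in for the model's forward pass; real inference infrastructure runs a transformer on accelerators at each step.

```python
# Stand-in for the model: maps the last token to the next one. A real LLM
# instead produces a probability distribution over its whole vocabulary.
transitions = {"the": "model", "model": "generates", "generates": "text"}

def generate(prompt: str, max_tokens: int = 5) -> str:
    tokens = prompt.split()
    for _ in range(max_tokens):
        nxt = transitions.get(tokens[-1])
        if nxt is None:        # no continuation: stop, like an end-of-sequence token
            break
        tokens.append(nxt)     # append and feed the sequence back in
    return " ".join(tokens)

print(generate("the"))  # the model generates text
```

The inference infra layer exists because doing this loop cheaply at scale (batching requests, caching attention state, serving on GPUs) is its own engineering problem.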

The future

Where is this headed?