Knox theorem

The theoretical foundation.

Why a workflow of specialized models, each trained on a narrow slice, returns more correct answers than the largest general model run alone.

The principle

Given a task that decomposes into independent sub-tasks, a system of small models specialized to each sub-task returns more correct answers than a single general model of equal aggregate parameter count.

The argument is straightforward. Specialization concentrates the loss landscape. A model trained only to write in one voice has fewer wrong answers available to it. A model trained only to extract entities from contracts cannot drift into prose. Compose them in a workflow that knows which one to call, and the system as a whole is more reliable than any one model could be.