The deeper question, the one that the adoption or rejection of language model cataloguing will force us to answer whether we like it or not, is an epistemological one. It is the question of what catalogue metadata actually is. Not what it does, which is relatively well understood, but what kind of thing it is: whether it is a description of an object, an interpretation of an object, a social contract about how an object will be retrieved, or something that cannot be reduced to any of these without loss. The field has mostly held that question in productive suspension. It now has to be answered, because machine-generated catalogue metadata embodies a specific and contestable answer, and deploying it at scale is an implicit commitment to that answer whether the deploying institution acknowledges it or not.
What Catalogue Metadata Has Always Been, Ambiguously
The professional literature on cataloguing has, for decades, contained two traditions that have coexisted without quite reconciling. The first tradition treats cataloguing as documentation: the faithful transcription of properties of an object into a standardised vocabulary that enables retrieval. The second tradition treats cataloguing as interpretation: the assignment of meaning to an object that is never simply given by the object itself but requires contextual knowledge, professional judgment, and an understanding of who will be searching and why.
In practice, most cataloguers do both simultaneously and don't think very hard about which they are doing at any given moment. Subject headings for a seventeenth-century pamphlet require the cataloguer to know the controlled vocabulary, to understand what the pamphlet is about, and to make a judgment about which controlled terms best represent that content for the expected user community. The vocabulary constrains interpretation; interpretation operates within the constraint. The tension between these two modes is productive in skilled hands and invisible in routine ones.
What language model cataloguing does is resolve this tension by eliminating the second tradition. A large language model generating subject headings, generating abstracts, or assigning classification codes is doing something that looks like interpretation but is structurally documentation — it is transcribing statistical regularities in its training distribution into the standard vocabulary fields. It has no conception of the user community, no understanding of the institutional context, no sense of the difference between what a text means and what retrieval terms will best serve the people who need it. It has a very sophisticated model of which words tend to co-occur with which other words, and which co-occurrences tend to be assigned which catalogue descriptors.
This is not nothing. For a significant proportion of cataloguing decisions, the documentation function and the interpretation function converge on the same output, and a model that captures the documentation function well will produce adequate results. The problem is in the cases where they diverge — which are precisely the cases that matter most for research use of digital collections.
The Long Tail of Difficult Objects
The objects that most need cataloguing — the backlogs that motivated the turn toward automated cataloguing in the first place — are disproportionately objects at the margins of existing vocabulary and existing training data. The uncatalogued material in special collections tends to be unusual: manuscript records in non-dominant languages, ephemera without obvious genre classification, materials whose significance is contextual and institutional rather than intrinsic, born-digital objects whose form does not map cleanly onto any existing cataloguing tradition.
These are exactly the objects for which the documentation/interpretation distinction matters most, because they are objects for which the standard vocabulary was not designed and for which the training data distribution is thin. A model trained on the distribution of existing catalogue records will produce catalogue records that look like existing catalogue records. When the object being catalogued does not resemble the objects that generated the training distribution, the model will force it into the most similar available category — which may be wrong in ways that are subtle and hard to detect.
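The forcing behaviour described above can be made concrete with a deliberately simplified sketch. It is not a language model: it is a nearest-category classifier over two made-up numeric features, and every name and number in it (CATEGORY_CENTROIDS, assign_category, the feature values) is a hypothetical chosen for illustration. What it shares with the systems under discussion is the relevant property: it always returns one of the categories it already knows, however far the object lies from all of them.

```python
from math import dist

# Hypothetical two-feature summaries of existing catalogue categories.
# The values are illustrative and not drawn from any real collection.
CATEGORY_CENTROIDS = {
    "sermon": (0.9, 0.1),
    "political pamphlet": (0.2, 0.8),
    "almanac": (0.1, 0.1),
}

def assign_category(features):
    """Return the nearest known category and the distance to it."""
    label = min(
        CATEGORY_CENTROIDS,
        key=lambda name: dist(features, CATEGORY_CENTROIDS[name]),
    )
    return label, dist(features, CATEGORY_CENTROIDS[label])

# An object that resembles the existing records, and one that does not.
typical = (0.85, 0.15)
atypical = (2.0, 4.0)

typical_label, typical_distance = assign_category(typical)
atypical_label, atypical_distance = assign_category(atypical)

# Both objects receive a confident-looking label. Only the distance,
# which a finished catalogue record does not carry, shows the poor fit.
print(typical_label, round(typical_distance, 2))
print(atypical_label, round(atypical_distance, 2))
```

In this toy the atypical object is filed under whichever known category happens to be nearest, and the resulting record looks no different from a well-fitting one. A system that exposed the distance could route such objects to a human cataloguer, but that is a design choice the sketch merely suggests, not one the argument here assumes any existing system makes.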
This is the long tail problem of automated cataloguing, and it is not a temporary problem that will be resolved by better models or more training data. It is a structural feature of the relationship between machine learning systems and the marginal cases that are, for archival purposes, often the most significant ones. The objects that are most important for preserving cultural heritage are frequently the objects that are most unlike the objects that generated the model's training distribution. The model's competence therefore tends to be lowest exactly where the stakes of cataloguing are highest.
FAIR Principles and the Question of Human Legibility
The FAIR principles — Findable, Accessible, Interoperable, Reusable — have become the de facto standard framework for evaluating data and digital object management in research and library contexts. The framework is well designed for what it is designed to do: operationalise the technical conditions under which digital objects can be effectively shared and reused across systems and communities.
What the FAIR framework does not address — and was not designed to address — is the question of what makes metadata true in the sense that matters for research: not merely machine-processable but accurately representing the content and context of the object in ways that support genuine scholarly work.
A catalogue record that is technically FAIR — that has a persistent identifier, that uses a recognised vocabulary, that is machine-readable and interoperable across systems — can still be wrong in the sense that matters for research. It can assign a subject heading that technically fits the object's surface content while missing the object's significance within its intellectual tradition. It can generate a description that correctly captures what is literally present in the text while failing to capture what the text is doing — its rhetorical purpose, its relationship to a genre or a polemical tradition, the specific historical context that makes its language meaningful.
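A minimal sketch may help show why structural validity cannot settle the matter. The check below is only a stand-in for the machine-checkable side of a record (a well-formed identifier, subject terms drawn from a controlled list). It is not a FAIR compliance test, and the vocabulary, the identifier pattern, and both records are hypothetical.

```python
import re

# Hypothetical controlled vocabulary and identifier pattern, for illustration only.
CONTROLLED_TERMS = {"Sermons", "Satire", "Political pamphlets"}
PERSISTENT_ID = re.compile(r"^ark:/\d+/\w+$")

def passes_structural_checks(record):
    """True if the record has a well-formed identifier and only controlled subject terms."""
    return (
        bool(PERSISTENT_ID.match(record.get("id", "")))
        and bool(record.get("subjects"))
        and all(term in CONTROLLED_TERMS for term in record["subjects"])
    )

# Two records for the same imagined object: a satire written in the form of a sermon.
surface_record = {"id": "ark:/12345/x1", "subjects": ["Sermons"]}
interpretive_record = {"id": "ark:/12345/x1", "subjects": ["Satire", "Political pamphlets"]}

# Both records pass. Nothing in the check can tell which one serves a researcher.
print(passes_structural_checks(surface_record))
print(passes_structural_checks(interpretive_record))
```

The two records are indistinguishable to the validator. Deciding between them requires knowing what the text is doing, which is information the check was never given.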
These failures matter because researchers do not use catalogue metadata merely to find objects. They use it to evaluate whether objects are worth examining, to understand how objects have been classified by previous users and cataloguers, and to navigate the intellectual terrain of a field or a period. Metadata that is technically accurate and intellectually misleading serves these purposes badly, and a misleading record can be harder to detect than an inaccurate one, because it looks right.
The Competency Question and Its Institutional Dimensions
The adoption of automated cataloguing at scale has an institutional dimension that the field has been slow to address directly. The competency that automated cataloguing eliminates — the interpretive, contextually informed, user-aware work of skilled cataloguing — is a professional competency. It lives in people, and people who are not exercising it are not maintaining it.
An institution that replaces a significant proportion of its cataloguing work with automated processes is not simply changing its workflow. It is changing what kind of knowledge it employs and what kind of knowledge it will be able to regenerate if the automated system fails, is discontinued, or is found to have systematic errors that require retrospective correction. The degradation of professional cataloguing competency is not an immediate problem; it is a slow institutional risk that becomes visible only when the competency is needed and is not there.
This is not an argument against automation. It is an argument for being specific about what is being automated and what the institutional consequences of that automation are over a ten- or twenty-year horizon. The framing of automated cataloguing as a solution to the backlog crisis is accurate at the level of immediate throughput. It is potentially misleading at the level of institutional capacity, because the backlog will continue to grow, the material in it will continue to include difficult objects, and the professional capacity to handle difficult objects will have been allowed to atrophy.
What the Field Actually Needs to Decide
The conversation about language models and cataloguing has, in most venues where it is happening, been conducted as a cost-benefit analysis: what can the models do, how accurate are they, what is the quality threshold for different categories of material. This is a necessary conversation. It is not sufficient.
The conversation the field needs to have is about what cataloguing is for, and the JCDL community is better positioned to have it than most, because it spans the technical and the humanistic dimensions of the problem. Not in the abstract, but concretely: for what user communities, in what institutional contexts, and with what research purposes do the documentation function and the interpretation function need to be separated? Where is the distinction between a technically adequate catalogue record and an intellectually adequate one significant enough to justify the cost of human cataloguing? And what does the answer imply for professional training, institutional staffing, and the deployment of automated systems that genuinely can improve throughput in the cases where the distinction is not significant?
These are not questions that technology choices can answer. They are questions that require the kind of disciplinary self-examination that a conference constituted across computer science, library science, archival studies, and information policy is exactly the right venue to undertake. The temptation, when the tool is ready and the backlog is large and the resourcing is inadequate, is to answer the epistemological question by default — by simply deploying the tool and accepting what it produces as good enough. That default has consequences that will take years to become visible and longer to reverse.
The Vannevar Bush Problem, Revisited
Vannevar Bush's 1945 essay "As We May Think" — the founding document to which this conference's best-paper award pays tribute — was concerned, at its centre, with a problem that has not gone away: the problem of how a culture preserves and transmits its intellectual output across time and across the boundaries of individual minds. The Memex that Bush imagined was a retrieval system, but it was also something more: a technology for making connections that individual minds cannot make, for keeping intellectual work alive beyond the death of its creator, for allowing one generation to stand on the accumulated thought of the previous ones.
What the Memex assumed, and what Bush's essay took for granted, was that the intellectual output being preserved was accurately represented — that the indexing and linking that the system performed corresponded to the actual intellectual content of the documents it handled. The crisis that automated cataloguing creates is a crisis of that assumption. A culture whose intellectual output is catalogued by systems that describe surface without understanding substance is a culture that has automated the appearance of preservation without preserving the thing itself.
This is not hyperbole. It is a description of a failure mode that is technically possible and institutionally probable if the field does not make explicit choices about where human interpretive judgment is irreplaceable. Declining to make those choices is itself a choice, and it is one that Vannevar Bush, who understood both the possibilities and the limits of information technology better than most of his contemporaries, would have recognised as a mistake.
The cataloguer's problem is not going away. It is going deeper. The question is whether the field goes with it.