Terms, Tags, and Classification

It is helpful to classify documents or other content items to make them easier to find later. Searching the full text alone can retrieve irrelevant results or miss relevant documents that use different words from those entered into a search box. A document or content management system may include features for tagging, keywords, categories, indexing, etc. What’s the difference between these?

Keywords vs. Controlled Vocabularies

There is a difference between free keywords and terms in a controlled vocabulary, although both have the same goal of supporting search. Keywords are words or phrases that the author, editor, or content manager comes up with to indicate the topics of a document. Keywords may reflect the language of the document, and they may or may not be consistent across different documents. The same person will strive for consistency, but if individual authors are coming up with keywords, there will inevitably be inconsistency. A set of keywords for a document may include synonyms of each other to support search.

A controlled vocabulary provides a restricted list of terms permitted for tagging/indexing a document. Rather than coming up with words, the author or editor may only select what is available in the controlled vocabulary, although they may make suggestions to the owner of the controlled vocabulary to add new terms that will be useful across multiple documents. A controlled vocabulary has a single preferred term for each concept and may have synonyms redirecting to that preferred term. Depending on its size, a controlled vocabulary may be structured into a hierarchy, as a taxonomy.
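The preferred-term and synonym arrangement described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the vocabulary entries and the names PREFERRED_TERMS, build_lookup, and resolve are hypothetical and do not come from any particular system.

```python
# Hypothetical controlled vocabulary: each preferred term maps to its synonyms.
PREFERRED_TERMS = {
    "Automobiles": ["cars", "autos", "motor cars"],
    "Lawyers": ["attorneys", "counsel"],
}


def build_lookup(vocabulary):
    """Map every preferred term and synonym (lowercased) to its preferred term."""
    lookup = {}
    for preferred, synonyms in vocabulary.items():
        lookup[preferred.lower()] = preferred
        for synonym in synonyms:
            lookup[synonym.lower()] = preferred
    return lookup


def resolve(term, lookup):
    """Return the preferred term for an entry, or None if it is not in the vocabulary."""
    return lookup.get(term.strip().lower())


LOOKUP = build_lookup(PREFERRED_TERMS)
```

With this sketch, resolve("Cars", LOOKUP) redirects to the preferred term "Automobiles", while an entry outside the vocabulary, such as "trucks", returns None and could be passed to the vocabulary owner as a suggested new term.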

It is possible to allow both keywords and controlled vocabulary terms in the same system. For example, a system could require that a document be tagged or indexed with controlled vocabulary terms and allow supplemental keywords to be added as an option.
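A rule of this kind, with controlled terms required and keywords optional, might look like the following Python sketch. The vocabulary contents and the function name tag_document are assumptions made for illustration, not the behavior of any specific product.

```python
# Hypothetical controlled vocabulary for illustration only.
CONTROLLED_TERMS = {"Contracts", "Intellectual property", "Taxation"}


def tag_document(controlled_terms, keywords=()):
    """Build a tagging record: controlled terms are required, keywords are optional."""
    if not controlled_terms:
        raise ValueError("at least one controlled vocabulary term is required")
    unknown = [term for term in controlled_terms if term not in CONTROLLED_TERMS]
    if unknown:
        raise ValueError(f"not in the controlled vocabulary: {unknown}")
    # Keywords are free text, so they are stored exactly as entered.
    return {"terms": list(controlled_terms), "keywords": list(keywords)}
```

Here tag_document(["Taxation"], ["VAT"]) is accepted, whereas a call with no controlled terms, or with a term that is not in the vocabulary, raises an error.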

Tags vs. Categories

Tags are the terms or labels applied to documents or other content items in the process called “tagging.” Tags tend to be brief labels indicating what an item is about, and an item usually has multiple tags. Tags can be very specific or relatively broad. Information professionals might prefer to call them “index terms” and to call the process “indexing.” A resulting organized, alphabetized list of tags could serve as an index, which could be browsed or merely searched. If a collection of tags is too large, it would not be fully displayed but rather searched. Sometimes a set of just the highly used tags may be displayed in a user interface. Tags can be keywords or they can be from a controlled vocabulary.

Categories, by contrast, emphasize categorization, which can also be thought of as grouping or classifying. Categorization implies putting something into a category, often represented in a user interface with the icon of a file folder, whether as an actual electronic folder path or just a depiction of a folder icon for a virtual categorization. While categories have different levels of specificity, the designation of “category” implies a collection of things, so there is an implicit understanding that categories don’t get too specific. An organized structure of nested categories typically constitutes a hierarchical taxonomy. Categories are generally a controlled vocabulary, rather than keywords.

A document cannot belong to more than one category in physical paper folders (unless you make a photocopy of the document for each folder). In the digital world, on the other hand, a document can belong to more than one category, although this may depend on the rules of the content management system. Typically, there is more than one way to categorize (such as by document type or by jurisdiction), so there could be a set of categories for each of these types, with the expectation that a document is assigned only one category of each type. Even when it is possible to put a content item into more than one category of the same type, it is still preferable to have most documents assigned to only a single category of a certain type.
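The expectation of one category per category type can be expressed as a simple check. The following Python sketch assumes two hypothetical category types with made-up values; CATEGORY_TYPES and assign_categories are illustrative names only.

```python
# Hypothetical category types, each with its own set of allowed categories.
CATEGORY_TYPES = {
    "document_type": {"Contract", "Memo", "Court filing"},
    "jurisdiction": {"Federal", "State", "International"},
}


def assign_categories(assignments):
    """Check (category_type, category) pairs, allowing one category per type."""
    result = {}
    for category_type, category in assignments:
        if category_type not in CATEGORY_TYPES:
            raise ValueError(f"unknown category type: {category_type}")
        if category not in CATEGORY_TYPES[category_type]:
            raise ValueError(f"unknown {category_type} category: {category}")
        if category_type in result:
            raise ValueError(f"document already has a {category_type} category")
        result[category_type] = category
    return result
```

A document assigned ("document_type", "Memo") and ("jurisdiction", "State") passes the check, while a second document_type assignment for the same document is rejected. A system that permits multiple categories of the same type would relax that last check.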

Because tags and categories are different, it is possible to have both at the same time on the same documents, especially if the categories are deliberately kept broad and the tags are relatively specific. Document management and content management systems increasingly offer both categories and tags for managing content. In these cases, the challenge is to decide how much of the classification to handle with categories and how much with tags. Tags can be for specific topics, whereas categories can be for such things as document type, source type, industry or business area, organization type, or geographic location. But there are gray areas of distinction that challenge even a professional taxonomist.

Editor’s Note: This article is based on the blog post “Tags and Categories” on the author’s blog, The Accidental Taxonomist.
