The Art of Taxonomy Workarounds

“With the greatest of ease / I stay on track like the greatest of skis.” – Stress Eater, Czarface, Kool Keith, Rocket Science
In all my years working in taxonomy, there have been few times I have walked into an environment that already had a robust taxonomy program supported by a dedicated, centralized taxonomy management system (TMS). Far more common was that there was no taxonomy program or tooling established yet. There are many ways taxonomy programs begin at an organization, but in most cases, there is a period in which taxonomy work must be done in tools insufficient for true enterprise taxonomy management.
Not moving forward with taxonomy development or waiting for a proven use case or budget necessitating a dedicated system are rarely options when laying down the foundations for an enterprise taxonomy program. What do taxonomists do in the meantime? How do they stay on track with taxonomy development while waiting for the right tools to support scalable taxonomy? There are always taxonomy modeling or tooling workarounds (I suppose someone with better clickthrough titling would call these “taxonomy hacks”) to get taxonomy programs started while the case is being made and the budget is allocated for a robust, centralized TMS.
Spreadsheets
I have never met a taxonomist who didn’t start taxonomy development in spreadsheets. The random glossaries and lists you might find in unstructured PDF or text documents during a vocabulary audit are great taxonomy starting points, but truly actionable and useful current or potential taxonomy values live in spreadsheets, either as a reference file or as a metadata schema output from an active system.
Some of these files output from existing systems might have hierarchy, but I would expect that more often than not, the spreadsheet is a set of columns with flat lists. There may be dependent metadata fields in which a selection from one list drives the values appearing in the next list, but all of the lists are flat. From these outputs, taxonomists taxonomize, adding hierarchy, synonyms, definitions, and other columns into a spreadsheet format ingestible by a future state TMS.
Not every organization has the budget or can be convinced to find the budget, so taxonomies in spreadsheets may be the beginning and end of your taxonomy management tooling. It is possible to do a “pencil and paper” taxonomy management program in which a highly governed, single source of truth spreadsheet is fed, in whole or in part, to consuming systems. It may even be possible to assign the taxonomy concepts unique IDs, synonyms, definitions, and other property fields which can be ingested into consuming systems so there is more than concept labels to work from for tagging and analytics.
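As a rough illustration of what such a governed, single source of truth spreadsheet might look like, here is a minimal Python sketch. The column names (id, pref_label, alt_labels, definition), the pipe delimiter for synonyms, and the sample rows are all assumptions made for the example, not a standard or any particular system's export format.

```python
import csv
import io

# Hypothetical single-source-of-truth sheet, exported as CSV.
# Column names and the "|" synonym delimiter are illustrative assumptions.
SHEET = """id,pref_label,alt_labels,definition
T001,Digital asset management,DAM,Systems for storing and reusing digital assets
T002,Content management system,CMS|Web CMS,Systems for authoring and publishing content
T003,Taxonomy management system,TMS,Systems for centralized taxonomy management
"""


def load_concepts(text):
    """Parse the sheet into a dict keyed by concept ID.

    Raises ValueError on blank or duplicate IDs, since the ID column is
    what consuming systems would rely on.
    """
    concepts = {}
    for row in csv.DictReader(io.StringIO(text)):
        concept_id = row["id"].strip()
        if not concept_id:
            raise ValueError("Row is missing an ID: %r" % row)
        if concept_id in concepts:
            raise ValueError("Duplicate concept ID: %s" % concept_id)
        concepts[concept_id] = {
            "pref_label": row["pref_label"].strip(),
            "alt_labels": [s.strip() for s in row["alt_labels"].split("|") if s.strip()],
            "definition": row["definition"].strip(),
        }
    return concepts


concepts = load_concepts(SHEET)
```

Even a small check like the duplicate-ID test stands in for some of the governance a TMS would otherwise enforce.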
I don’t love management by spreadsheet as an enterprise taxonomy program, but it’s certainly a step in the right direction and a level of maturity above no centralized taxonomy management at all. The benefit of such an approach is that there is so much painful governance and process management overhead that eventually one of two outcomes follows: taxonomy management is abandoned or there is a push for better tooling. Spreadsheet management is the roughest-hewn of the workarounds.
Not Taxonomy Tools
The next level of enterprise taxonomy management, with potentially fewer workaround issues, is systems that claim to “do taxonomy management” but certainly do not do taxonomy management by most taxonomists’ standards. These include tools like digital asset management (DAM) systems and content management systems (CMS). These are really term management tools at best, since they don’t include the functionality needed to move from a term to a fully fledged concept.
Perhaps the most notorious of these smooth criminals is the SharePoint Term Store, which is great for all things SharePoint in SharePoint and in SharePoint only. But, all of my snide comments aside, it works well enough if most of your work is in SharePoint. The Term Store includes hierarchy (but not polyhierarchy), synonyms, and definitions, and it generates unique IDs for each term. The management is relatively easy and there are manageable limits to hierarchy depth and total term count.
I was recently at a DAM conference and asked a vendor about their taxonomy management functionality. I suggested it would be a temporary stepping stone until the organization I was working for could move to a dedicated enterprise taxonomy management system. He asked with incredulity, “But why would you want to manage taxonomies anywhere but in the DAM?!” Now that’s dedication to your product! In a world in which all digital assets are created, managed, and reused in a single system without needing any overlapping metadata application to any other content for any other purpose, I guess you wouldn’t need to manage taxonomies outside the DAM. Somewhere in this alternative marketing-only dimension, the taxonomies never see anything outside the DAM.
DAMs do manage taxonomies to some extent because metadata is absolutely essential for unstructured content with little or no text to search against. The only way to properly search for and reuse digital assets is to have associated textual metadata. Taxonomy support in these systems is typically limited to a series of managed flat lists with the ability to include dependencies between selected values, potentially reducing the amount of time it takes to apply metadata and the number of choices someone has to wade through to get to the right values. Some DAMs do include some shallow hierarchy in the taxonomy management, and hierarchy can be built out further by using lists in a predetermined hierarchical folder structure. You can get quite a bit out of this taxonomy and folder harmony, including inherited taxonomy values. These structures tend to be inflexible, however, and require heavy lifts to change after they are already in use. Most DAMs generate some kind of unique ID for each term. You may even have fields for synonyms and definitions, but there’s not much you can do with these properties.
I’m being pretty hard on the taxonomy capabilities of these systems, but they are a step up from managing taxonomies by spreadsheets. What they sorely lack to be an enterprise TMS, however, is the ability to tag with the same metadata values outside the system, which means aligning taxonomies across systems by label only. Matching by label is severely limiting and requires a phenomenal amount of overhead in business alignment and in the actual execution of any changes to taxonomy values, which may need to be updated across multiple systems within a reasonably short timeframe. All of this is before accounting for values which should not or cannot be changed because they drive system workflows or because changing the term label would create orphaned content.
Another fundamental problem that non-taxonomists working in a DAM or CMS can overlook is that, even with the best governance and alignment between systems, the taxonomies are always separated by system location and proprietary technology. Any alignment of taxonomies is surface only, mapped by label or ID matching. Taxonomy management across systems in this way is tenuous and unsupported by common technology and standards architectures. Thus, while working in systems that are production ready, the taxonomy alignment across systems is only slightly better (if better) than managing taxonomies from a single source of truth spreadsheet.
The Workarounds
Now that I’ve outlined the tooling supporting workarounds when an organization does not have a TMS, what can taxonomists do to make sure these workarounds are as painless as possible to move away from and toward a proper centralized solution?
Design for the future. Despite taxonomy tooling limitations, there are still several fundamental taxonomy construction best practices taxonomists can employ to help future-proof the taxonomies and ready them for import into a TMS. While you never know what strategic shifts an organization may make, including changing branding, mergers and acquisitions, and responses to social and technological change, there are core taxonomies which won’t change much from organization to organization. Designing around repeatable, core taxonomy domains and creating faceted taxonomies like Content Types, Geography, Products, and Topics can reduce large modeling changes within one system or when migrating from one system to another. Setting up separate, mutually exclusive and collectively exhaustive (MECE) faceted taxonomies allows for greater flexibility to add or subtract vocabularies as needed, including the ability to take in publicly available vocabularies as part of the mix.
Even significant limitations like lack of hierarchical support do not need to prevent the development of taxonomies prior to using a TMS. Although admittedly unwieldy, creating separate faceted taxonomies as flat lists in systems that don’t support hierarchy can get taxonomies into production. Once a TMS is purchased and the existing flat taxonomies are being imported, hierarchies can be created in spreadsheets in exported files, designated as parents and children in other file types before import, or manually created in the tool itself after import. If there are no label changes or there are unique IDs for each concept, hierarchies can be created in the new TMS while still mapping to the original concepts used to maintain tagging continuity.
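To make the flat-list-to-hierarchy step a little more concrete, here is a minimal Python sketch of adding parent designations to an exported flat list before import. The term IDs, labels, and the shape of the parent mapping are hypothetical; a real TMS import format will differ.

```python
from collections import defaultdict

# Hypothetical export from a flat-list system: (id, label) pairs only.
flat_terms = [
    ("G1", "North America"),
    ("G2", "United States"),
    ("G3", "Canada"),
    ("G4", "Europe"),
    ("G5", "France"),
]

# Parent designations added by the taxonomist before import
# (child ID -> parent ID). Terms not listed here are top-level.
parent_of = {"G2": "G1", "G3": "G1", "G5": "G4"}


def build_tree(terms, parents):
    """Return (top-level IDs, mapping of parent ID -> list of child IDs)."""
    known = {term_id for term_id, _ in terms}
    children = defaultdict(list)
    roots = []
    for term_id, _ in terms:
        parent = parents.get(term_id)
        if parent is None:
            roots.append(term_id)
        elif parent not in known:
            raise ValueError("Unknown parent %s for %s" % (parent, term_id))
        else:
            children[parent].append(term_id)
    return roots, dict(children)


roots, children = build_tree(flat_terms, parent_of)
```

Because the hierarchy is keyed on the original IDs rather than labels, content already tagged with G2 or G5 keeps pointing at the same concept after the structure is added.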
Unique IDs. Unique Uniform Resource Identifiers (URIs) are a core concept of taxonomy management systems based on RDF, SKOS, and OWL. URIs ensure that any concept label and its associated properties and relationships are unique, even in cases where labels match but the contextual meanings don’t (though, of course, this should be avoided). No matter what happens to a concept label, the URI remains the same across systems, allowing system users to do things like use an alternative label as the preferred label or consume different concept properties. At the end of the concept lifecycle, when users are looking to aggregate data from multiple systems for business intelligence and analytics, knowing a single concept URI exists, without having to map several labels to understand the data, lifts a huge burden off analysts.
While many systems generate some kind of ID, there can be limitations. For example, some systems generate an ID based on a namespace combined with the label name. In such a case, the system may not allow two taxonomy labels to be the same because the IDs would then be the same. Alternatively, the system may have no such restriction and will generate the same ID for two labels which are the same but have different hierarchical contexts. Scenarios like these are exactly the reason unique URIs are a differentiator in taxonomy management. The way IDs are generated can have an impact on the way you design taxonomies in platforms that weren’t meant to act as a centralized TMS.
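The collision scenario can be sketched in a few lines of Python. The namespace, the label-to-ID rule, and the "Mercury" example are assumptions chosen for illustration and do not reflect how any specific product generates IDs.

```python
import uuid

NS = "http://example.com/taxonomy/"  # placeholder namespace, not a real one


def label_based_id(namespace, label):
    """Mimic a system that derives the ID from namespace + label."""
    return namespace + label.replace(" ", "")


def opaque_id(namespace):
    """Mint an ID that does not depend on the label at all."""
    return namespace + uuid.uuid4().hex


# The same label in two hierarchical contexts (hypothetical):
planet = label_based_id(NS, "Mercury")   # Planets > Mercury
element = label_based_id(NS, "Mercury")  # Chemical Elements > Mercury

# Two distinct concepts end up with one identifier.
collision = planet == element

# Opaque IDs keep the two concepts apart even though the labels match.
planet_uri = opaque_id(NS)
element_uri = opaque_id(NS)
```

An ID that is independent of the label also survives a later label change, which a label-derived ID may not.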
Whatever the case, creating unique IDs with the tooling you have at hand will help to provide history and concept continuity when moving to a TMS. While not common, label names can change. An acronym can be switched out for the full form or an organization name may change (Twitter to X, for example). In these cases, matching label to label across systems won’t maintain any historical context. If the concept has a unique ID which does not change, then that ID can follow the concept wherever it goes. Finding the equivalent IDs in your systems or creating them manually in spreadsheets allows for mapping back to URIs by property or label when moving to a TMS.
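A small Python sketch shows why a stable ID matters when a label changes. The asset names, concept IDs, and tagging records are invented for the example.

```python
# Hypothetical tagging records in a consuming system, captured before a rename.
tagged_by_label = {"asset-1": "Twitter", "asset-2": "LinkedIn"}
tagged_by_id = {"asset-1": "ORG-017", "asset-2": "ORG-042"}

# Current taxonomy after the rename: the ID is stable, the label is not.
taxonomy = {"ORG-017": "X", "ORG-042": "LinkedIn"}
labels_now = set(taxonomy.values())

# Label-to-label matching loses the renamed concept.
unmatched_by_label = sorted(
    asset for asset, label in tagged_by_label.items() if label not in labels_now
)

# ID matching resolves every asset to the current preferred label.
resolved_by_id = {asset: taxonomy[cid] for asset, cid in tagged_by_id.items()}
```

Here the asset tagged "Twitter" falls out of the label match, while the ID-based lookup resolves it to the current label without any retagging.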
Clever properties. If you have acknowledged the limitations of the system in which you are designing taxonomies for the future, then you can use several modeling practices to make the migration into a TMS easier. Rather than supporting a mapping table, you can create a property to house the taxonomy concept ID from the old system. Upon import, every concept will be assigned a new URI in a TMS. URIs are typically the combination of a designated namespace (http://www.myorganization.com/enterprisetaxonomy, for example) with the name of the concept or an alphanumeric ID appended at the end (http://www.myorganization.com/enterprisetaxonomy/myConcept). Namespaces and ID generation patterns can typically be set in the TMS to generate unique IDs based on an existing company policy or can be newly established. Maintaining the old concept ID from legacy systems provides historical context and can prevent significant content retagging by mapping the newly generated concept in the TMS to the labels and IDs used previously.
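As a sketch of the legacy ID property idea, the Python below mints a new URI for each imported concept while carrying the old system's ID along as a property. The legacy IDs, the legacy_id property name, and the sequential URI pattern are hypothetical; in a SKOS-based tool, something like skos:notation or a custom property could plausibly play this role, depending on the product.

```python
# Namespace taken from the example in the text; the trailing ID pattern is assumed.
NAMESPACE = "http://www.myorganization.com/enterprisetaxonomy/"

# Hypothetical export from a legacy DAM.
legacy_terms = [
    {"legacy_id": "DAM-104", "label": "Press Release"},
    {"legacy_id": "DAM-221", "label": "White Paper"},
]


def import_concepts(terms, namespace):
    """Assign each term a new URI and keep its legacy ID as a property."""
    concepts = []
    for n, term in enumerate(terms, start=1):
        concepts.append({
            "uri": "%sc%06d" % (namespace, n),  # e.g. .../c000001
            "pref_label": term["label"],
            "legacy_id": term["legacy_id"],     # the "clever property"
        })
    return concepts


concepts = import_concepts(legacy_terms, NAMESPACE)

# The crosswalk falls out of the concepts themselves, with no separate mapping table.
uri_for_legacy_id = {c["legacy_id"]: c["uri"] for c in concepts}
```

Content tagged with DAM-221 in the old system can then be pointed at the new URI by lookup, even if the label has changed in the meantime.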
I’m making this sound easy, but there are always going to be some challenges moving from legacy tooling to a new TMS. That said, the transition can be made easier and maintain concept continuity by planning ahead and building mechanisms into the taxonomy modeling itself to tie legacy taxonomies and tools to the taxonomies in a new TMS. An organization may never get out of the workaround phase of taxonomy management, but using taxonomy modeling and tooling best practices will make governance a lot easier.