Information Panopticon

The AI-Taxonomy Disconnect

https://pixabay.com/photos/telephone-contact-us-call-contact-4440525/

“Me, I disconnect from you.” – Gary Numan, Me! I Disconnect from You

I have seen a significant decline in the number of taxonomist positions available. Wondering if I was in a myopic bubble based on my location or choice of job boards, I’ve been asking colleagues if they see the same thing. The general consensus is, yes, there are fewer taxonomist jobs available even as foundational data structures are becoming more important with AI tools. I have seen an increase in, or at least a steady availability of, ontologist jobs, many requiring more technical expertise (the ability to create Python data ingestion pipelines and retrieval-augmented generation (RAG) systems) than I have seen in the past. Again, this may be a matter of where I live or where I am seeking jobs, but it seems jobs in the taxonomy and ontology field are becoming more technical and less focused on the business operations side of working with stakeholders to provide analysis and guidance on semantic frameworks.

I suspect this shift is driven by a wider adoption of AI tools and a contraction in the job market. Employers seem to be seeking a single resource to do both the business and technical sides of implementing semantic structures and the foundational components for building out applications. I wonder if another factor is also the belief that large language models (LLMs) are a replacement for semantic models. If true, there are several reasons I can see that may be driving these beliefs.

Speed to Business

As I have touched upon before in my blog (Friction and Complexity and The Taxonomy Tortoise and the ML Hare), the manual, slower, but deliberate curation of controlled vocabularies can be seen as a roadblock to business speed and agility. There are several valid points here.

One area of pushback I see frequently in organizations is the response time from initially requesting a new concept, taxonomy branch, or vocabulary until it is available in production. The ownership of the concepts and data is, to some degree, taken out of the hands of the business users and put under the control of taxonomists who incorporate these concepts into centralized semantic models (taxonomies, thesauri, ontologies). Even with governance models including service level agreements stating turnaround times and taxonomy availability, not every group in the organization is going to see this centralized service as a benefit. Rather than wait for enterprise-level support, stakeholders may develop workarounds in services and tools to support their own use cases. As we know, this decentralizing of schemas and tools creates a fragmented landscape of differing terminology, uneven tool support for managing taxonomies, and inconsistent processes for handling data and content and tagging them with metadata.

Similarly, enterprise taxonomists serving many areas of the organization may seem to be too domain agnostic to serve the variety of use cases served by controlled vocabularies. While taxonomists do not need to be domain experts in the areas covered by semantic models, the perception may be that the time they spend ramping up in a domain would be better spent having the subject matter experts in those domains build their own models. There is some validity here if the domains are truly standalone and operate to serve only those domain use cases. However, tying together various domain areas to form enterprise-wide knowledge graphs seems to be the direction most organizations want to go. If that’s the case, then a centralized taxonomy team as a service to the entire enterprise makes a lot of sense.

Given these counterpoints to slowly developing semantic models, why not, one may ask, simply ask machine learning models to provide the schemes we need to organize and optimize information?

Words Are Words Are Worlds

Are LLMs seen as a replacement for, or at least a viable alternative to, taxonomies (using the term as a broad umbrella for all controlled vocabularies)?

A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) and provide the core capabilities of modern chatbots. LLMs can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained on (Wikipedia).

Certainly building your own language model reflecting all of the ways that natural language queries could be asked is a non-starter when there are clearly high-performing, public LLMs available at our fingertips. The chatbots ChatGPT, Google Gemini, and Microsoft Copilot are some of the more familiar tools based on foundational LLMs. One of the primary, and arguably most successful, use cases for these tools is language generation. When prompted, a chatbot can generate text, format this text based on instructions or examples, and produce a slick product which, short of a quick review to ensure the content checks out, is ready to go. These LLMs are based on a “vast amount of text”; in short, a lot more text than you will be able to provide to train a whole model.

What LLMs are missing, however, is your context. There are many, fairly easy, methods for providing them context. You can allow them access to your documents, at work or at home, so the chatbot can “see” the content of your documents, the structures, and even the writing style. That is very much your context, allowing the chatbot to compose in a way that reflects your home or work artifacts and generate new text combining the existing LLM with your specific context. Using additional context moves an LLM from generic to specific, from a wider world of words to your world of words.

At enterprise scale, the same method applies. If your organization is large, has a lot of public exposure and presence, and has a history of public interaction on the Internet and in social media, LLMs are going to know a lot about you even without additional context. I think, however, that the reason the market for reusable, one-size-fits-all taxonomies has never taken off is that every organization, whether true or not, sees itself as unique and special. Certain industries like healthcare, life sciences, pharmaceuticals, and finance have well-defined, and often extremely complex, ontologies they can adopt. Other industries or functions within a company do not. In my experience, marketing is a great example. Despite the common needs, I have never seen a marketing department adopt any public standard. They build from scratch even when a majority of the taxonomies are commonly available terms.

In these cases, building taxonomies and ontologies to add context specific to your organization provides LLMs with both words and structures modeling the world of your domain. In fact, it is becoming more common that the development of these taxonomies is being done with human-chatbot interaction. A taxonomist can provide glossaries, metadata schemas output in spreadsheets, and documents to provide chatbots with the raw materials to extract entities, cluster topics, compare values across documents, and other processes that once required text analytics tools and, sometimes, human intervention in the form of rule writing. The speed to taxonomy and ontology development is increasing. Like other iterative feedback processes, the taxonomist and LLM work together to create domain schemas in the form of taxonomies and ontologies that provide additional guidance to future “manual” and automated processes with LLMs in the mix.

Pure speculation on my part, but is the niche and sometimes still esoteric and obscure field of taxonomy and ontology design being replaced, for better or for worse, with LLM use? Or, more specifically, is it being viewed as a replacement for taxonomy and ontology building and the experts who do it? As I stated above, I have seen a shift from the business of taxonomies to the technology of ontologies.

I Disconnect from You

In my opinion, the roles of a taxonomist and of a technically skilled ontologist are still separate. While many people in the industry have the skills to do the work of both, the paths to the two roles have been different. Many taxonomists have library science degrees. They likely have technical skills, but are more focused on the information science aspect of taxonomy and ontology development, interfacing directly with the business and providing other services, such as research, business analysis, and support for use cases relying on semantic models. Ontologists are typically computer scientists who can code and develop the technical infrastructure for ontologies. Finding resources who can, or who like to, do both has not been common. This may be changing. Certainly the roles are asking for both, with, in my view, a leaning toward the technical.

Again, speculatively, is there a shift toward more technical resources in support of rising AI use in organizations? Is there a move to cut out the intermediary roles of taxonomists to let the business owners and technical implementers of taxonomies and ontologies interface more directly? If so, what do organizations lose in the process?

The Connect

If I were reading this blog, I would think the author was trying to sell you the value of taxonomists with more of the “soft” skills of research, business stakeholder interaction, and translation of business requirements into taxonomies and to the more technical resources who support their implementation and move to actionable production. After catching up on a few seasons of Landman and hearing in my head at some point in every episode, “Brought to you by [insert name of large oil company]”, maybe I’m a little sensitive to reading between the lines. You don’t have to. I am selling you on the value of taxonomists for all of the reasons I’ve listed above. If there is indeed a shift from the business skills of taxonomists to the technical skills of an ontologist instead of having the excellent skills of both, then your organization is missing out on a valuable resource who can work alongside AI technologies to bring the business requirements and practical domain building skills to bear.

In summary, I believe there are two necessary components to bridge the seeming disconnect between AI and the foundational data quality governance needed to make AI operational:

  1. A taxonomist who can
    1. build taxonomies and ontologies to create domain-specific semantic models representing the business, 
    2. provide business analysis and requirements for technical implementation, and
    3. be the human-in-the-loop working with AI tools to continue building out, expanding, and governing semantic models, and
  2. Technical engineers who can
    1. operationalize ontologies by building data pipelines to AI tools, and
    2. focus on the engineering aspects of sharing out and productionizing ontologies for use across the enterprise.

An enterprise with both of these roles isn’t creating unnecessary resource overhead or additional layers effectively slowing the path from automation to implementation; rather, the two roles work in harmony to clarify requirements and optimize the use of AI in a variety of applications meeting business use cases. Taxonomists and ontologists are bridges between the business and the technical implementation enabling business needs.

The Art of Taxonomy Workarounds

https://pixabay.com/illustrations/maze-lost-confusing-puzzle-1800993/

“With the greatest of ease / I stay on track like the greatest of skis.” – Stress Eater, Czarface, Kool Keith, Rocket Science

In all my years working in taxonomy, there have been few times I have walked into an environment that already had a robust taxonomy program supported by a dedicated, centralized taxonomy management system (TMS). Far more common was that there was no taxonomy program or tooling established yet. There are many ways taxonomy programs begin at an organization, but in most cases, there is a period in which taxonomy work must be done in tools insufficient for true enterprise taxonomy management.

Not moving forward with taxonomy development or waiting for a proven use case or budget necessitating a dedicated system are rarely options when laying down the foundations for an enterprise taxonomy program. What do taxonomists do in the meantime? How do they stay on track with taxonomy development while waiting for the right tools to support scalable taxonomy? There are always taxonomy modeling or tooling workarounds (I suppose someone with better clickthrough titling would call these “taxonomy hacks”) to get taxonomy programs started while the case is being made and the budget is allocated for a robust, centralized TMS.

Spreadsheets

I have never met a taxonomist who didn’t start taxonomy development in spreadsheets. The random glossaries and lists you might find in unstructured PDF or text documents during a vocabulary audit are great taxonomy starting points, but truly actionable and useful current or potential taxonomy values live in spreadsheets, either as a reference file or as a metadata schema output from an active system.

Some of these files output from existing systems might have hierarchy, but I would expect that more often than not, the spreadsheet is a set of columns with flat lists. There may be dependent metadata fields in which a selection from one list drives the values appearing in the next list, but all of the lists are flat. From these outputs, taxonomists taxonomize, adding hierarchy, synonyms, definitions, and other columns into a spreadsheet format ingestible by a future state TMS.

Not every organization has the budget or can be convinced to find the budget, so taxonomies in spreadsheets may be the beginning and end of your taxonomy management tooling. It is possible to do a “pencil and paper” taxonomy management program in which a highly governed, single source of truth spreadsheet is fed, in whole or in part, to consuming systems. It may even be possible to assign the taxonomy concepts unique IDs, synonyms, definitions, and other property fields which can be ingested into consuming systems so there is more than concept labels to work from for tagging and analytics.
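As a minimal sketch of what a governed, single source of truth spreadsheet could look like to a consuming system, the Python below parses a small CSV export and builds a lookup from every label and synonym to its concept ID. The column names, the pipe-delimited synonym convention, and the sample rows are all assumptions made for illustration, not the format of any particular tool.

```python
import csv
import io

# Hypothetical export of a governed "single source of truth" spreadsheet.
# Column names and the pipe-delimited synonym convention are assumptions.
SHEET = """concept_id,pref_label,synonyms,definition
C001,Annual Report,Yearly Report|10-K,Formal yearly summary of company performance
C002,Press Release,News Release,Official statement issued to the media
C003,White Paper,,Authoritative report on a single topic
"""


def load_concepts(text):
    """Parse rows into a dict keyed by concept ID, rejecting duplicate IDs."""
    concepts = {}
    for row in csv.DictReader(io.StringIO(text)):
        cid = row["concept_id"].strip()
        if cid in concepts:
            raise ValueError(f"duplicate concept ID: {cid}")
        concepts[cid] = {
            "pref_label": row["pref_label"].strip(),
            # An empty cell yields an empty synonym list.
            "synonyms": [s.strip() for s in row["synonyms"].split("|") if s.strip()],
            "definition": row["definition"].strip(),
        }
    return concepts


def build_tag_index(concepts):
    """Map every preferred label and synonym (lowercased) to its concept ID."""
    index = {}
    for cid, concept in concepts.items():
        for name in [concept["pref_label"], *concept["synonyms"]]:
            index[name.lower()] = cid
    return index


sheet_concepts = load_concepts(SHEET)
tag_index = build_tag_index(sheet_concepts)
print(tag_index["10-k"])  # a synonym resolves to the concept's ID
```

Rejecting duplicate IDs at load time is one cheap way to enforce part of the governance that a spreadsheet cannot enforce on its own.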

I don’t love management by spreadsheet as an enterprise taxonomy program, but it’s certainly a step in the right direction and a level of maturity above no centralized taxonomy management at all. The benefit of such an approach is that there is so much painful governance and process management overhead that eventually there is one of two outcomes: no more taxonomy management or the push for better tooling. Spreadsheet management is the roughest hewn of the workarounds.

Not Taxonomy Tools

The next level of enterprise taxonomy management, with potentially fewer workaround issues, involves systems that “do taxonomy management” but certainly do not do taxonomy management by most taxonomists’ standards. These include tools like digital asset management (DAM) and content management systems (CMS). These tools are really term management tools at best since they don’t include any of the functionality to move from a term to a fully fledged concept.

Perhaps the most notorious of these smooth criminals is the SharePoint Term Store, which is great for all things SharePoint in SharePoint and in SharePoint only. But, all of my snide comments aside, it works well enough if most of your work is in SharePoint. The Term Store includes hierarchy (but not polyhierarchy), synonyms, and definitions, and it generates unique IDs for each term. The management is relatively easy and there are manageable limits to hierarchy depth and total term count.

I was recently at a DAM conference and asked a vendor about their taxonomy management functionality. I suggested it would be a temporary stepping stone until the organization I was working for could move to a dedicated enterprise taxonomy management system. He asked with incredulity, “But why would you want to manage taxonomies anywhere but in the DAM?!” Now that’s dedication to your product! In a world in which all digital assets are created, managed, and reused in a single system without needing any overlapping metadata application to any other content for any other purpose, I guess you wouldn’t need to manage taxonomies outside the DAM. Somewhere in this alternative marketing-only dimension, the taxonomies never see anything outside the DAM.

DAMs do manage taxonomies to some extent because metadata is absolutely essential for unstructured content with little or no text to search against. The only way to properly search for and reuse digital assets is to have associated textual metadata. The limitations in these systems are typically a series of managed flat lists with the ability to include dependencies between selected values, potentially reducing the amount of time it takes to apply metadata and the number of choices someone has to wade through to get to the right values. Some DAMs do include some shallow hierarchy in the taxonomy management and hierarchy can further be built out by using lists in a predetermined hierarchical folder structure. You can get quite a bit out of this taxonomy and folder harmony, including inherited taxonomy values. These structures tend to be inflexible and require heavy lifts to change after they are already in use. Most DAMs generate some kind of unique ID for each term. You may even have fields for synonyms and definitions, but there’s not much you can do with these properties.

I’m being pretty hard on the taxonomy capabilities of these systems, but they are a step up from managing taxonomies by spreadsheets. What they sorely lack to be an enterprise TMS, however, is the ability to tag with the same metadata values outside the system, requiring aligning taxonomies across systems by label only. Matching by label is severely limiting and requires a phenomenal amount of overhead in business alignment and the actual execution of any changes to taxonomy values, which may need to be updated across multiple systems within a reasonably short timeframe. All of this is before accounting for values which should not or cannot be changed because they drive system workflows or would create orphaned content if the term label changed.

Another fundamental problem that non-taxonomists working in a DAM or CMS can overlook is that, even with the best governance and alignment between systems, the taxonomies are always separated by system location and proprietary technology. Any alignment of taxonomies is surface only, mapped by label or ID matching. Taxonomy management across systems in this way is tenuous and unsupported by common technology and standards architectures. Thus, while working in systems that are production ready, the taxonomy alignment across systems is only slightly better (if better) than managing taxonomies from a single source of truth spreadsheet.

The Workarounds

Now that I’ve outlined the tooling supporting workarounds when an organization does not have a TMS, what can taxonomists do to make sure these workarounds are as painless as possible to move away from and toward a proper centralized solution?

Design for the future. Despite taxonomy tooling limitations, there are still several fundamental taxonomy construction best practices taxonomists can employ to help future-proof the taxonomies and ready them for import into a TMS. While you never know what strategic shifts an organization may make, including changing branding, mergers and acquisitions, and responses to social and technological change, there are core taxonomies which won’t change much from organization to organization. Designing around repeatable, core taxonomy domains and creating faceted taxonomies like Content Types, Geography, Products, and Topics can reduce large modeling changes within one system or when migrating from one system to another. Setting up separate, mutually exclusive and collectively exhaustive (MECE) faceted taxonomies allows for greater flexibility to add or subtract vocabularies as needed, including the ability to take in publicly available vocabularies as part of the mix.

Even significant limitations like lack of hierarchical support do not need to prevent the development of taxonomies prior to using a TMS. Admittedly unwieldy, creating separate faceted taxonomies as flat lists in systems that don’t support hierarchy can get taxonomies into production. Once a TMS is purchased and the existing flat taxonomies are being imported, hierarchies can be created in spreadsheets in exported files, designated as parents and children in other file types before import, or manually created in the tool itself after import. If there are no label changes or there are unique IDs for each concept, hierarchies can be created in the new TMS while still mapping to the original concepts used to maintain tagging continuity.
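To make the flat-list-to-hierarchy step concrete, here is a small Python sketch. It assumes each concept in the flat list already has a unique ID and that parent assignments are added later, for example as an extra column in an exported spreadsheet. The IDs, labels, and function names are hypothetical, and the sketch assumes the parent assignments contain no cycles.

```python
# Flat Geography list as it might exist in a system without hierarchy.
# IDs and labels are hypothetical sample values.
flat_geography = {
    "G1": "North America",
    "G2": "United States",
    "G3": "Canada",
    "G4": "California",
}

# Parent assignments added before import (child ID -> parent ID).
parent_of = {"G2": "G1", "G3": "G1", "G4": "G2"}


def build_children(labels, parents):
    """Return a parent ID -> list of child IDs mapping, validating every ID."""
    children = {cid: [] for cid in labels}
    for child, parent in parents.items():
        if child not in labels or parent not in labels:
            raise KeyError(f"unknown concept ID in pair: {child} -> {parent}")
        children[parent].append(child)
    return children


def path_to_root(cid, labels, parents):
    """Return the labels from the top concept down to the given concept."""
    path = [labels[cid]]
    while cid in parents:  # assumes no cycles in the parent assignments
        cid = parents[cid]
        path.append(labels[cid])
    return list(reversed(path))


# Tags applied while the list was flat still point at the same IDs.
existing_tags = {"asset-42": ["G4"]}
for asset, tag_ids in existing_tags.items():
    for tag_id in tag_ids:
        print(asset, " > ".join(path_to_root(tag_id, flat_geography, parent_of)))
```

Because the hierarchy is layered on top of the IDs rather than encoded in the labels, content tagged against the flat list does not need to be retagged.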

Unique IDs. Unique Uniform Resource Identifiers (URIs) are a core concept of taxonomy management systems based on RDF, SKOS, and OWL. URIs ensure that any concept label and its associated properties and relationships are unique, even in cases where labels match but the contextual meanings don’t (though, of course, this should be avoided). No matter what happens to a concept label, the URI remains the same across systems, allowing system users to do things like use an alternative label as the preferred label or consume different concept properties. At the end of the concept lifecycle, when users are looking to aggregate data from multiple systems for business intelligence and analytics, knowing a single concept URI exists without having to map several labels to understand the data is a huge burden off analysts.

While many systems generate some kind of ID, there can be limitations. For example, some systems generate an ID based on a namespace combined with the label name. In such a case, the system may not allow two taxonomy labels to be the same because the IDs would then be the same. In other systems, no such restriction exists, and the system will generate the same ID for two labels which are the same but have different hierarchical contexts. Scenarios like these are exactly the reason unique URIs are a differentiator in taxonomy management. The way IDs are generated can have an impact on the way you design taxonomies in platforms that weren’t meant to act as a centralized TMS.
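The collision described here is easy to reproduce. The sketch below contrasts a label-derived ID with an opaque one. The `tax` namespace prefix, the slug rule, and the use of random UUIDs are assumptions chosen for illustration and do not describe how any specific product generates IDs.

```python
import re
import uuid

NAMESPACE = "tax"  # hypothetical namespace prefix


def label_based_id(label):
    """Derive an ID from the namespace plus a slug of the label."""
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    return f"{NAMESPACE}:{slug}"


def opaque_id():
    """Generate an ID that carries no information about the label."""
    return f"{NAMESPACE}:{uuid.uuid4()}"


# Same label, two different hierarchical contexts (parent, label).
homonyms = [("Planets", "Mercury"), ("Chemical Elements", "Mercury")]

label_ids = [label_based_id(label) for _parent, label in homonyms]
opaque_ids = [opaque_id() for _ in homonyms]

print(label_ids[0] == label_ids[1])    # True: the two concepts collide
print(opaque_ids[0] == opaque_ids[1])  # False: each concept stays distinct
```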

Whatever the case, creating unique IDs with the tooling you have at hand will help to provide history and concept continuity when moving to a TMS. While not common, label names can change. An acronym can be switched out for the full form or an organization name may change (Twitter to X, for example). In these cases, matching label to label across systems won’t maintain any historical context. If the concept has a unique ID which does not change, then the ID can follow the concept wherever it goes. Finding the equivalent in your systems or creating them manually in spreadsheets allows for mapping back to URIs by property or label when moving to a TMS.
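A short Python sketch of the Twitter-to-X case shows the difference for an analyst aggregating tags from two systems. The records, system names, and the `ORG-017` ID are invented for illustration.

```python
from collections import Counter

# Hypothetical tagging records: the DAM was tagged before the rename,
# the CMS after it. All share one stable concept ID.
records = [
    {"system": "DAM", "asset": "img-101", "label": "Twitter", "concept_id": "ORG-017"},
    {"system": "DAM", "asset": "img-102", "label": "Twitter", "concept_id": "ORG-017"},
    {"system": "CMS", "asset": "post-9", "label": "X", "concept_id": "ORG-017"},
]

# Aggregating by label splits one concept into two apparent concepts.
by_label = Counter(record["label"] for record in records)

# Aggregating by ID keeps the history together without a mapping step.
by_id = Counter(record["concept_id"] for record in records)

print(dict(by_label))
print(dict(by_id))
```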

Clever properties. If you have acknowledged the limitations of the system in which you are designing taxonomies for the future, then you can use several modeling practices to make the migration into a TMS easier. Rather than supporting a mapping table, you can create a property to house the taxonomy concept ID from the old system. Upon import, every concept will be assigned a new URI in a TMS. URIs are typically the combination of a designated namespace (http://www.myorganization.com/enterprisetaxonomy, for example) with the name of the concept or an alphanumeric ID appended at the end (http://www.myorganization.com/enterprisetaxonomy/myConcept). Namespaces and ID generation patterns can typically be set in the TMS to generate unique IDs based on an existing company policy or can be newly established. Maintaining the old concept ID from legacy systems provides historical context and can prevent significant content retagging by mapping the newly generated concept in the TMS to the labels and IDs used previously.
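The legacy ID property can be sketched in a few lines of Python. The namespace is the example one from the paragraph above; the `legacyId` property name, the zero-padded ID pattern, and the sample terms are assumptions, since a real TMS will have its own import format and URI settings.

```python
# Example namespace taken from the text; everything else is hypothetical.
NAMESPACE = "http://www.myorganization.com/enterprisetaxonomy"

legacy_terms = [
    {"legacy_id": "DAM-1001", "label": "Annual Report"},
    {"legacy_id": "DAM-1002", "label": "Press Release"},
]


def mint_concepts(terms, namespace):
    """Assign each imported term a new URI and keep its old ID as a property."""
    concepts = []
    for number, term in enumerate(terms, start=1):
        concepts.append({
            "uri": f"{namespace}/c{number:06d}",  # assumed alphanumeric pattern
            "prefLabel": term["label"],
            "legacyId": term["legacy_id"],  # assumed custom property name
        })
    return concepts


def legacy_lookup(concepts):
    """Build the legacy ID -> new URI mapping straight from the property."""
    return {concept["legacyId"]: concept["uri"] for concept in concepts}


def retag(asset_tags, lookup):
    """Translate existing tags from legacy IDs to the new URIs."""
    return {asset: [lookup[tag] for tag in tags] for asset, tags in asset_tags.items()}


minted = mint_concepts(legacy_terms, NAMESPACE)
retagged = retag({"asset-42": ["DAM-1002"]}, legacy_lookup(minted))
print(retagged)
```

Storing the old ID on the concept itself means the mapping travels with the taxonomy rather than living in a separate table that someone has to maintain.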

I’m making this sound easy, but there are always going to be some challenges moving from legacy tooling to a new TMS. That said, the transition can be made easier and maintain concept continuity by planning ahead and building mechanisms into the taxonomy modeling itself to tie legacy taxonomies and tools to the taxonomies in a new TMS. An organization may never get out of the workaround phase of taxonomy management, but using taxonomy modeling and tooling best practices will make governance a lot easier.

KMWorld 2025 Themes and Trends

https://pixabay.com/illustrations/ai-generated-creature-animal-8407601/

“But I’m a creep / I’m a weirdo / What the hell am I doin’ here? / I don’t belong here.” – Radiohead, Creep

I attended the KMWorld Conference and two of the four co-located sub-conferences, Taxonomy Boot Camp and Enterprise AI World, in Washington, D.C. last week. It was the 20th Anniversary of Taxonomy Boot Camp, and although I raised my hand when asked if I had attended 10 or more, I had never actually bothered to count how many years I’ve been to the conference. It turns out, I have attended KMWorld and the co-located conferences for 13 years out of the last 18, with my first being the 2008 Enterprise Search Summit West when it was still hosted on the West Coast in San Jose, California. Over this span of time, I’ve seen trends rise and fall, fears realized and alleviated, promises promised and promises kept.

Every year there seems to be at least one theme I can carry away from the conference, and for several years, I’ve written about the themes and trends from the conference as I see them. Some of these blogs are hosted on the Synaptica website and on my personal blog at Information Panopticon. I will go into some of those themes and trends as I did last month in my recap of the Henry Stewart Semantic Data Conference and HS DAM.

After many discussions at the event with colleagues in the industry, I can confidently state there was one overarching theme: things are weird.

Things Are Weird

What’s weird in 2025? Seemingly, everything. While the wider world may be as weird as it’s ever been, the state of knowledge management, taxonomy, and related fields covered by the conference has been supremely weirded and mostly by AI. There has been a rapid shift from feelings of dread that AI is coming for all of our jobs to an embracing acceptance that those who use AI as a work tool will maintain their place, or even excel, in the industry.

Despite this acceptance, the industry is still weird. In the world of taxonomy, we all understand and believe that AI and its need for clean, curated, semantic data will necessitate good taxonomists and taxonomy systems. However, while the bottom hasn’t completely fallen out of the market, there seem to be fewer jobs, more unemployed taxonomists, and lower salary and contract rates. Maybe it’s the generalized uncertainty in the tech sector driving the slump, but many people I spoke with at the conference think that it’s a misunderstanding of AI foundational needs that’s causing companies to lay off taxonomists and invest in AI more heavily.

What else is weird is how excited everyone is about AI but how few companies have managed to successfully bake AI practices into their processes. We’ve seen the incredibly successful use of AI in text summarization and generation, particularly when it comes to meeting minutes and the identification of who said what and what action items were decided upon. Agentic AI has had some success, and taxonomists seem to be working at skilling up in prompt engineering using already existing taxonomies and ontologies and generating new taxonomy options from generative AI prompts.

What was very weird for me was hearing people in taxonomy who, myself included, said for years that the results of autogenerated taxonomies were mediocre at best now talking about the auto-generation of taxonomies. When that happens, you know there’s been a seismic shift in the industry.

AI in Everything, Everywhere, All at Once

Like every other conference I’ve attended this year, the focus was on AI. Something that was different this year, in my opinion, was the number of practical applications and use cases that were presented. While AI still hasn’t found its stride when it comes to maximizing ROI in a variety of use cases, there were very informative sessions on what skills knowledge management and taxonomy specialists need to master in order to work in an AI world. I have seen this reflected in the job market. There are still taxonomy and ontology roles being posted, but there are many more requiring technical skills like Python as part of the role. In essence, companies seem to be looking for people with both the business skills to build and manage semantic models and the technical skills to implement them in AI applications. While I know these people exist, I don’t know how common they are. People at the conference felt that there will always be a need for taxonomists who do not have deep technical skills, but they acknowledged that getting skilled up in prompt engineering is a differentiator.

Several presentations in both the KMWorld and Taxonomy Boot Camp conferences focused on AI readiness. What does an organization need to appropriately identify AI use cases, guardrail the data within the organization so only what is appropriate is available to AI tools, and determine how humans fit into the process? Since generative AI can extract and generate metadata candidate concepts, it’s important to understand the results to interpret their trustworthiness and value. Many of these concepts are net new and should be added to taxonomies as concepts, properties, or relationships. Other values are net new but are ephemeral or trending and can be used in search, for example, but aren’t yet established enough to be codified in semantic models. Other concepts will already be in the taxonomy and can be discarded unless they are used to inform which content includes taxonomy values and should be tagged appropriately. Finally, some generated concepts are garbage or nonsense and should be used to inform and improve the machine learning model. All of these questions need to be answered and processes decided upon and put into practice.
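One way to picture the four outcomes described above is as a routing step after human review. The Python below is only a sketch: the two reviewer flags, the route names, and the sample candidates are assumptions, and in practice the judgment calls behind those flags are exactly where the human-in-the-loop work happens.

```python
# Labels already in the taxonomy (hypothetical sample values, lowercased).
EXISTING_LABELS = {"knowledge graph", "metadata schema"}


def triage(candidate, existing_labels):
    """Route one AI-generated candidate concept to one of four outcomes."""
    label = candidate["label"].strip().lower()
    if label in existing_labels:
        return "use_for_tagging"   # already in the taxonomy
    if not candidate["meaningful"]:
        return "model_feedback"    # garbage or nonsense
    if candidate["trending"]:
        return "search_only"       # net new but ephemeral
    return "add_to_taxonomy"       # net new and established


# Reviewer-supplied flags are assumed inputs, not something the code infers.
candidates = [
    {"label": "Knowledge Graph", "meaningful": True, "trending": False},
    {"label": "Sovereign Cloud", "meaningful": True, "trending": False},
    {"label": "Vibe Coding", "meaningful": True, "trending": True},
    {"label": "Graph the of", "meaningful": False, "trending": False},
]

routes = {c["label"]: triage(c, EXISTING_LABELS) for c in candidates}
for label, route in routes.items():
    print(f"{label}: {route}")
```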

Another common theme was the use of taxonomies and ontologies to inform and refine prompts and to support Retrieval Augmented Generation (RAG) with or without labeled content as part of a knowledge graph. Taxonomies and ontologies supply the domain specificity required to augment large language models (LLMs) to inform them of the specifics of the organizational perception. In short, using internal semantic models is a shortcut to providing the book of knowledge about the organization. What was once a topic in only a few presentations seemed to be much more prevalent this year.
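As a rough illustration of using a taxonomy to refine a prompt, the sketch below matches concepts mentioned in a question and prepends their definitions as context. It deliberately stops before any model call; the taxonomy entries, the prompt wording, and the simple whole-word matching are assumptions, and a production RAG setup would retrieve content in a far more sophisticated way.

```python
import re

# Hypothetical internal taxonomy with synonyms and definitions.
TAXONOMY = {
    "C010": {
        "pref_label": "Statement of Work",
        "synonyms": ["SOW"],
        "definition": "Document defining the scope and deliverables of an engagement.",
    },
    "C011": {
        "pref_label": "Vendor",
        "synonyms": ["Supplier"],
        "definition": "External company that provides goods or services.",
    },
    "C012": {
        "pref_label": "Purchase Order",
        "synonyms": ["PO"],
        "definition": "Buyer-issued document authorizing a purchase.",
    },
}


def find_concepts(question, taxonomy):
    """Return IDs of concepts whose label or synonym appears as a whole word."""
    text = question.lower()
    matched = []
    for cid, concept in taxonomy.items():
        names = [concept["pref_label"], *concept["synonyms"]]
        if any(re.search(rf"\b{re.escape(name.lower())}\b", text) for name in names):
            matched.append(cid)
    return matched


def build_prompt(question, taxonomy):
    """Prepend the matched concepts' definitions to the user's question."""
    lines = []
    for cid in find_concepts(question, taxonomy):
        concept = taxonomy[cid]
        also = f" (also: {', '.join(concept['synonyms'])})" if concept["synonyms"] else ""
        lines.append(f"- {concept['pref_label']}{also}: {concept['definition']}")
    context = "\n".join(lines) if lines else "- (no matching concepts)"
    return f"Organizational definitions:\n{context}\n\nQuestion: {question}"


question = "Who approves a SOW for a new vendor?"
prompt = build_prompt(question, TAXONOMY)
print(prompt)
```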

As I mentioned earlier, organizations are now in the phase between conducting proofs of concept on AI in the organization and the need to productionize and provide ROI on AI in organizational workflows and processes. If the AI bubble is stretching, the continued low adoption and inability to prove ROI for organizations will cause the bubble to burst. The AI industry will only continue to grow and expand if there are tangible benefits increasing the bottom line of organizations that have adopted the practices and scaled them into production.

Taxonomy, Data Governance…and Where Is the Data?

At least two presentations focused on the intersection of semantics and data governance. These both stood out to me because I have experienced the intersection of semantic models and highly governed data in data lakes myself as there are frequent conversations about who owns which data and what is the ultimate source of truth. How do internally developed taxonomies and ontologies fit into the overall data governance model? Who owns which values in which systems? How are the practices similar and different?

In his keynote, Malcolm Hawker simplified two key aspects in the different disciplines. In data governance, one of the main concerns is accuracy and precision for measurement. In taxonomy, the focus is on meaning, or semantics. Bridging the two disciplines is the bringing together of measurement and meaning. AI is a motivating factor in reconciling the two disciplines due to the need for clean, structured data and its use in tagging unstructured content. Although the approaches to relational database focused data management and graph database semantic data are different, they are complementary functions ensuring the best data quality in an organization.

Beyond who creates and owns the data and in which system it lives, organizations may be trending towards moving back to “walled gardens” of data to use in AI. The difference between public tools like ChatGPT and using LLMs within an organization puts the emphasis on what data is shared with the model and which users can see it. Since many AI tools use publicly available information on the Internet, organizations may protect their data and bring more of it in house so it is not available to such tools externally. Similarly, the concept of having a sovereign cloud, in which data is housed in the cloud but designed to meet regional data requirements and regulations, is probably gaining traction. Where there have been so many efforts to make data and content freely available, there is also recognition of which data should be available to machine learning models both to produce reasonable outcomes and to protect organizations and individuals. As states, provinces, countries, and regions pass additional laws about data, privacy, and AI, parsing out which data is subject to which regulations is becoming more difficult. Separating data in sovereign cloud environments may be a key to addressing this challenge.

In Summary

Things are weird, but the foundational value of semantics has not disappeared. The focus has changed and the toolsets we use to create and productionize semantic models have expanded, but the need for clean, accurate, source of truth metadata has only grown in importance.

Semantic Data 2025 Themes and Trends

https://pixabay.com/illustrations/rosette-mandala-art-model-design-7171917/

“Trust in me, just in me / Shut your eyes and trust in me” – The Jungle Book

I attended the Henry Stewart Semantic Data Conference co-located with HS DAM in New York City a few weeks ago. As I’ve done with KMWorld in the past, I’m going to summarize some themes and trends I took away from both conferences, with an emphasis on Semantic Data.

Inevitable AI

The most common theme of both conferences, unsurprisingly, was artificial intelligence (AI) in all of its forms, applications, and impact. Broadly speaking, the key takeaway across all of the presentations and discussions was: this is happening. Whether it’s baked into digital asset management (DAM) systems (hint: it is), wildly thrown at use cases until something sticks, or carefully governed with guardrails to protect the organization, its people, and the people they serve, and measured to understand the effectiveness of different large language models (LLMs), AI is happening. So what do we, as digital asset and semantic data professionals, do about it? What is our role in the use of AI in the organization and in the public sphere? What are our responsibilities?

From the Semantic Data Conference, several themes emerged:

  • Organizations are going to experiment with generative AI models to develop workable pipelines with humans in the loop;
  • Context is key, and organizations can develop domain-specific and constrained semantic models to be used in conjunction with external LLMs;
  • It’s incumbent upon all of us to develop valid, organizationally-specific and curated training data sets to provide machine learning models the context to output reasonable results.

Themes from the Digital Asset Management Conference included:

  • AI can speed up the generation of assets and the automated application of metadata to those assets;
  • Access to clean, curated metadata is critical, both from taxonomies and sources like data lakes;
  • Metadata as a source of truth for embedded AI can lead to better analytics;
  • Asset provenance is essential for usage and rights management, especially when AI is involved.

Metadata Is Critical

That’s it. That’s the story. Metadata is critical. It has been, and it will continue to be. But, maybe, organizations are more aware of the importance of metadata because of the lightning fast rise of AI. Metadata is critically important as applied to digital assets, and semantic metadata powers better asset connections, discovery, personalization, and analytics.

Core to the importance of metadata is the importance of trust. Metadata quality must be trusted. The data and content to which metadata is applied must be trusted. Quality, trusted data leads to quality, trusted content and training sets which can feed into AI pipelines. Similarly, legal and reputational risks can be mitigated by ensuring the quality of information and data, especially as applied to compliance and usage rights.

Since semantic models are a source of truth for quality metadata, developing taxonomies and ontologies over time can create more complexity as needed to support a variety of use cases. Complexity sounds like a negative, but the world is complex, and semantic models are meant to represent organizational domains, which are by necessity complex. Complex semantic models support a variety of use cases, even if they do take more conscientious planning, development, and governance. Within these complex models are fit-for-purpose structures addressing use cases.

As with AI processes, developing, managing, and governing metadata in all its forms involves humans in the loop. Even as the identification, extraction, and application of metadata improves with AI, humans need to be involved in the process to add, remove, and quality check automatically applied metadata. As pipeline processes improve, reaching a specified threshold of metadata accuracy may reduce the need for human intervention and review.
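The idea of an accuracy threshold can be made concrete with a small sketch. The function name `review_decision`, the 95% threshold, and the minimum sample of 50 reviewed tags are all assumptions chosen for illustration; an organization would set its own values based on the risk attached to wrong metadata.

```python
def review_decision(reviewed, threshold=0.95, min_sample=50):
    """Decide whether auto-applied tags still need full human review.

    `reviewed` is a list of booleans: True when a human reviewer accepted
    an automatically applied tag, False when they corrected or removed it.
    The default threshold and sample size are illustrative assumptions.
    """
    if len(reviewed) < min_sample:
        return "full review"  # not enough evidence yet to relax review
    accuracy = sum(reviewed) / len(reviewed)
    return "spot check" if accuracy >= threshold else "full review"


# 96 accepted tags out of 100 reviewed clears the assumed 95% threshold.
print(review_decision([True] * 96 + [False] * 4))
```

Even in the "spot check" case the human stays in the loop; the sketch only reduces how much of the output needs to be looked at.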

Context and Trust

If I had to boil the conference down to two keywords–or, maybe, if I could only apply two metadata tags to the conference–they would be context and trust. Data and content require context, and semantic models are one way to provide this context, whether for use in machine learning pipelines or direct human interaction with content.

Taxonomy Blues

https://pixabay.com/photos/background-blue-close-up-craft-3628553/

“I got to keep movin’, I got to keep movin’/Blues fallin’ down like hail, blues fallin’ down like hail/Hmm-mmm, blues fallin’ down like hail, blues fallin’ down like hail.” – Robert Johnson, Hellhound on My Trail

I was laid off recently from a taxonomy position I very much enjoyed. Rather than wait for the smart to wear off, the day after, I sat with a not insignificant IBU IPA and doubled down on the mixed emotions rattling through me to expound on some common issues I see in the work world of taxonomists. Many of these challenges I’ve dealt with in my blogs in one form or another in the past, but the day after a layoff hits a little differently when there’s a bruised ego and a true feeling of loss at play.

Like emails sent in the heat of the moment, posting a blog to social media hot on the heels of a personally emotional layoff is probably not the best idea. I’ve had some time to let reality set in and revisit this writing. Surprisingly, I didn’t have to alter much of the content.

It’s Just Business

Layoffs happen. If you think your company loves you, you may have never been laid off. If it has never happened to you, then I am genuinely happy that you’ve not gone through the experience. Personally, I think it’s ok to believe in your company, drink the Kool-Aid (actually, it was grape Flavor Aid), and embrace their mission, goals, and strategy. I also think it’s natural and in the interest of self-preservation to recognize that any company will unceremoniously dispose of you when necessary. Their commitment to you will never match your commitment to their goals. Short of creating your own company and working for yourself building something you strongly believe in, this is always going to be the case. I enjoyed my company, believed in their mission, but also wasn’t one bit surprised when I got laid off.

From a detached, objective position, layoffs are as much a part of doing business as hiring when times are good. Layoffs can be triggered by downturns in which an organization’s revenue drops enough to merit reducing headcount. They can be triggered by well-meaning attempts at reducing redundancy and bloat. They can also be triggered by poor strategic decisions. Whatever spurs the layoffs, they are not always conducted in a strategic and thoughtful manner. Or, perhaps, there is a thoughtful strategy, but not one that will clearly bring about success.

No matter the impetus for a layoff, in my experience they disproportionately affect contractors and, because of what I can see immediately around me, taxonomists. There’s often an overlap between the two groups. When staff is augmented with consultant, freelance, or contract taxonomists, expect those people to be higher on the list when it comes to reducing headcount. The business likely doesn’t understand the role a taxonomist plays or minimizes the skill set as something anyone can do. As a seasoned taxonomist with years of consulting engagements behind me, I can tell you not everyone can just be a taxonomist. Like any proficient role, taxonomists bring unique organizational and research skills to bear. Shifting this work to the technology organization or a business domain in the enterprise is a misappropriation of work.

You’re a Taxo-what-now?

From years in the industry and still having not yet quite perfected my elevator pitch explaining my job, I can tell you taxonomy is not well understood. Cue the taxidermy jokes, financial tax questions, and, if you had the “ontologist” title, interrogations about whether you know anything about cancer. Speaking metaphorically, I do, and that is the metastasizing misunderstanding of what taxonomy, ontology, and semantic technologies bring to the table. Hence, taxonomists are not just laid off singly, but en masse, eradicating entire capabilities from organizations which likely had a long and painful path to establishing an enterprise taxonomy capability in the first place. With one swift slice of the oncologist’s scalpel, an entire function is excised with no idea of how to replace the missing connective tissue.

I’m sure many people think their job is extremely important, if for no other reason than it keeps one motivated to show up every morning. As part of justifying your work to yourself, and, more importantly, to the chain of command above you, there need to be definitive and clear expressions of why the work is too critical to eliminate. Expressing the necessity of taxonomy work is essential precisely because it is misunderstood. Taxonomy work is often seen as simply gathering terms and putting them in lists or hierarchies, but the deep work of information science is frequently unseen. An inexperienced, self-nominated taxonomist is going to be at a loss when confronted by a dedicated, commercial off-the-shelf taxonomy and ontology system backed by a standards-compliant RDF triple store. That leaves the amateur with two options: do “taxonomy” in a simpler, alternative tool or establish a taxonomy capability in the organization starting with hiring a trained taxonomist to build the taxonomies and lead the effort to evaluate and purchase a taxonomy and ontology management system.

Having worked in many organizations which have gone from zero to taxonomy capability, I can say it is no small task and can take anywhere from months to years. Taxonomists can start as contractors building taxonomies in spreadsheets and then either transition to a full-time role themselves or lead the effort to hire a full-time taxonomist and bring in a tool. Regardless, the effort it takes to convince upper management of the need for a taxonomy program and the long journey to making it an essential part of the business involves a tremendous amount of time and resources. Establishing such a program and cutting it demonstrates a lack of understanding, an irresponsible waste of company resources, and a phenomenal strategic error, especially in the rising tide of machine learning and generative AI.

Foundations

It is widely understood and communicated that clear, accurate foundational data is essential for a business to create meaningful analytics, support strategic decisions, and train machine learning models properly. Despite the monumental efforts corporations put into building and maintaining clean data, it’s not all that common to see it done well. Typically, the issue is years of legacy data, all created with good intentions but frequently in conflict across disparate systems, inaccurate due to the passage of time, or made obsolete by shifts in strategic direction. To use a worn expression, there is no magic bullet to solve this problem. Migrating all that data to a data lake is time-consuming and still results in redundant, conflicting data. Creating taxonomies, ontologies, and tying data together with semantic layers and knowledge graphs also assists with creating and building foundational data, but these methods too can result in disparities.

There is no simple plug-and-play solution, but pursuing multiple strategies and bringing them together is not impossible. Data lakes serve a purpose, just as taxonomies and ontologies do. They are not either/or solutions, but AND solutions: structured, relational data working with structured semantic data and both describing semi- and unstructured content across the organization. You can have one strategy without the other…but why? There are realistic barriers to pursuing multiple data strategies, including budgets, resources, data ownership, and governance. Barriers do not preclude building strong capabilities to address these different aspects. Completely eliminating one or more of these pillars makes it more difficult for the business to execute a clear and effective data strategy, only to return to rebuild that capability at a later date at a not insignificant effort.

I’ll Be Back

There is a bitter vengeance tale in my head that goes something like this: you laid me off and now I’m back as a consultant making money from you to fix what you broke. Cyborg, guns blazing, blasting all your crappy taxonomies straight to hell. Well, not very likely, but I always win this one in my head (aside: does anyone ever lose the self-righteous conversations in their head?). Yeah, ok, so maybe I’m not back and may never be. Maybe the story runs more like someone either “discovers” that taxonomy is useful or stumbles on the skeletal, ancient remains of a discarded taxonomy management system half-buried in the earth, sorely out of date, and filled with the eggs of xenomorphs. Some face-hugger plants the seed of understanding in the astronaut and they decide the organization needs to do taxonomy stuff. Maybe someone listens and they hire dedicated roles to do taxonomy stuff. Taxonomy stuff takes off, becomes seemingly essential, and then taxonomy stuff gets stuffed in a round of layoffs. Maybe taxonomists are all just Cylons.

Anyway, bitter, wounded emotions aside, eliminating an enterprise taxonomy capability does your organization a disservice. Taxonomy is foundational to well-structured, semantic, governed data. Taxonomy data should be feeding your website navigation and search, applied as metadata to content, data, and digital assets, and providing a semantic layer to your products to power personalized experiences and recommendation engines. I’ll say it bluntly: if you don’t get taxonomy, you’ll never get machine learning. Or, rather, your machine learning models will never be optimized. If you think your generative AI proof of concept will run without taxonomy, it will…at first. Then, when scaling is the next step, expect it to fall on its face. Large language models without the context of your domain, your organization, what makes you you—that is, what is modeled in taxonomies and ontologies—will give you the bland, contextless results that can only be delivered by models that don’t get who you are. Taxonomists get who you are, but, well, they’re gone.

I said bitter, wounded emotions aside. Scrap that. I am very passionate and emotional about quality data. Nerd or no nerd, this is true. And your organizational truth is going to suffer without the expertise a semantic expert can deliver. You might say this is shameless self-promotion in search of the next gig, and you might be right. What is also right is that whether I’m the taxonomist hero who saves your disintegrating semantics or it’s another capable taxonomist, I’ll applaud the result. Because truth in data. Because waste and redundancy. Because efficiency. Because user experience. Because…

I’m not going to skewer the company that laid me off and dissect what I see as their poor strategic decisions. And, honestly, I don’t know what their strategy is or will be going forward. But, I will say this: I think it was a mistake. Not for me, not for the team I really admired and enjoyed working with, but for the greater strategy of the organization. Revenge tales aside, the company is going to feel the lack of governed, semantic data built by seasoned, professional taxonomists. There are people who remain in the organization who will carry the torch, but they’ve been hobbled by indiscriminate layoffs driven, unfortunately, by a misguided data strategy. Maybe there are other options the company will pursue to fill the gap. Maybe taxonomy will come back someday when the stock prices are more favorable, but, in the meantime, no decent data strategy is complete without semantics.

I’ll close with a call to action to taxonomists. You already know how difficult it can be to build and maintain a taxonomy capability in your organization. Once established, make it essential. Make it foundational to data work wherever it happens. Integrate the taxonomy system into important, enterprise-wide data systems and strategies. No job is impervious to layoffs, but cementing the capability will, hopefully, help you avoid the taxonomy blues.

Friction and Complexity

https://pixabay.com/illustrations/spiral-fractal-template-abstract-1167623/

“Without culture there can be no growth; without exertion, no acquisition; without friction, no polish; without labor, no knowledge; without action, no progress and without conflict, no victory.” – Frederick Douglass, Self-Made Men

I attended the three-day Information Architecture (IA) Conference in Philadelphia, Pennsylvania, May 1st through 3rd. Over the course of the event, speakers covered numerous IA and UX topics. Of the many topics, two stood out to me both in frequency and in potential personal and professional application. Let’s consider friction and complexity through the lens of taxonomy and ontology practice.

Friction

Friction is an interesting phenomenon in that it is both to be avoided and embraced. Friction is undesirable in machinery, requiring lubricants to keep engines running and avoid heat buildup. Friction in relationships is unpleasant and is avoided through tactful word choice and conversation topic selection. The friction of a space capsule entering the earth’s atmosphere can incinerate it and friction on the rink can stop a skater in his or her tracks. On the other hand, friction between rubber tires and the surface of a road is what keeps motor vehicles from spinning in circles in wet or icy conditions. Friction between the head of a match and the striker produces the heat necessary to ignite a flame. Friction in negotiations can potentially lead to a better end solution than either party would have thought of in an uncontested proposition.

User experience (UX) friction is typically to be avoided because it can prevent the user from achieving their desired outcome, whether that be finding information or completing a purchase. The goal of a UX designer is to minimize friction so the user can achieve his or her goals without frustration or task abandonment. Is there good friction in a user experience? In some cases, friction slows the user’s actions for a more thoughtful and productive outcome. Friction may keep a user from going down an incorrect navigational path or line of inquiry resulting in unexpected or inaccurate results. The friction is built in to avoid more friction. As one of the speakers pointed out, friction can be a hindrance, but, applied correctly, it can be a help. Specifically, people need friction in order to learn.

The topic of friction resonates with me as a taxonomist because taxonomy needs to be thoughtful and deliberately slowed to maximize the long-term intent and outcomes. Frictionless keyword and tag creation seems like a fantastic idea in the democratization of making information findable. However, as tag clouds, folksonomies, and cumbersome repositories of user-generated tags can attest to, a little friction can prevent a lot of garbage. The nature of taxonomy work is one of calculated friction in the form of strict governance and maintenance processes. Users are not allowed to enter concepts directly into taxonomies and, if they are, the taxonomist’s job is to research the concept thoroughly before adding it permanently to the semantic structure. Governed friction slows the taxonomy creation process to ensure it is accurate, consistent, and serves as a definitive source of information for the organization.

Coincidentally, the topic of friction came up as a common theme in both our work when my wife was describing building redundancies into safety procedures. In her work as a safety specialist, she described the need for two safety hatches when closing an underground water tunnel for maintenance. A single hatch can fail and the workers could drown. With two hatches, one can fail, flood a chamber between the failed hatch and the next, and still allow time for the workers to get to safety while working behind the second hatch. A double safety redundancy is a form of friction. The obstacle between one potential point of failure and another creates a necessary friction which can save lives.

In taxonomy work, there are probably not as many examples in which the wrong taxonomy term can put lives at risk, though they do exist. Wrongly tagged content offered as medical advice can cause risk to a patient. Incorrectly identified drawings based on metadata can result in a manufacturing error. Machine learning models offering suggestions or identifying criminals based on training data tagged from taxonomies must be correct. For many of us creating taxonomies, there may not be the risk of lost lives, but there is probably some kind of risk to a company’s legal compliance or reputation.

In one of the presentations at the IA Conference, the speaker noted that the more friction there is in a system, the less likely people are to share information. We can see this in what can be characterized as the oversharing nature of social media. It is extremely easy to post text, pictures, and videos in any emotional (or inebriated) state of mind, resulting in an outpouring of information which should not necessarily be shared. The lack of friction makes it easy to share information, for better or for worse. Similarly, there is risk of creating too much friction in the taxonomy governance process, resulting in a reluctance to contribute to or use taxonomies. Users may seek out other options as a workaround; less friction, but inherently less quality. It is precisely this balance which must be discovered within an organization: how much friction will end users tolerate without abandoning the use of taxonomy altogether?

Complexity

I believe people gravitate to either/or solutions and black and white thinking because it reduces complexity. While reducing complexity to simple binaries helps us to understand and categorize the world more easily, it also creates a false embodiment of complex systems. Like friction, reducing complexity in user experience is generally a guiding principle. Finding and displaying information should not be complex. The user should be able to find what they need and understand the results quickly…and if not quickly, then efficiently. However, avoiding or obfuscating complexity can have unintended consequences. In one conference session, the speakers explored the use of storytelling to build awareness about complexity, specifically citing the work of Dave Snowden. Storytelling can be used to clarify the information coming from systems. Reducing complex problems to simpler components in order to better analyze or solve them is a good tactic, but there is risk if those components are analyzed or presented as isolated and independent rather than a part of a more complex and nuanced whole. 

My own presentation at the IA Conference also dealt with complexity (based on a blog I wrote on this topic, Specialty Skills and the Risks of Hidden Complexity), specifically the potential complexity of enterprise taxonomies and ontologies and how they can be displayed and consumed to maximum effect without overwhelming end users. Like having too much or too little friction, showing or hiding too much complexity in the form of taxonomies and ontologies can have several risks.

First, the user may not understand what is being presented and the intent behind it. Building taxonomies and ontologies can seem like an esoteric, academic approach to information management when users have to understand why taxonomies are structured the way they are, why they have strict governance procedures, and why complex semantic structures are useful. When all a user wants is a dropdown to select a few metadata values, providing a full set of faceted taxonomies can appear to be an overly engineered solution to a simple problem. Likewise, since fewer people in an organization have backgrounds in information science, the time spent carefully curating semantic models can seem to be a hindrance rather than a help, slowing the path to a deliverable project.

Second, users can be overwhelmed by the sheer scope of enterprise taxonomy models, especially as they mature to cover multiple domains within an organization. For example, if a user needs to apply metadata to data or content, a typeahead dropdown pulling from all possible concepts in a taxonomy will likely be too much for the user to select from. A consequence of this is that with more tagging options, there will be more tagging inconsistency. A complex set of taxonomies bound by a semantically rich ontology will benefit the organization, but presenting everything everywhere all at once will be a frustrating experience.
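One way to picture the difference between "everything everywhere" and a scoped selection is a typeahead that can be limited to a single taxonomy. This is a minimal sketch: the function `typeahead`, the `(label, scheme)` data shape, and the sample concepts are all made up for illustration and do not reflect any particular taxonomy management system.

```python
def typeahead(prefix, concepts, scheme=None, limit=10):
    """Return up to `limit` labels starting with `prefix`.

    `concepts` is a list of (label, scheme) pairs. When `scheme` is given,
    only concepts from that taxonomy are offered to the user.
    """
    prefix = prefix.lower()
    matches = [
        label
        for label, concept_scheme in concepts
        if label.lower().startswith(prefix)
        and (scheme is None or concept_scheme == scheme)
    ]
    return sorted(matches)[:limit]


# Hypothetical concepts drawn from three different taxonomies.
concepts = [
    ("Sandals", "Products"),
    ("Sales", "Departments"),
    ("Safety", "Topics"),
    ("Sneakers", "Products"),
]
print(typeahead("sa", concepts))                     # every taxonomy at once
print(typeahead("sa", concepts, scheme="Products"))  # one taxonomy only
```

With only four concepts the unscoped list is still manageable; with tens of thousands across many domains, scoping by field or use case is what keeps the choice small enough to tag consistently.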

Finally, hiding complexity to make it easier for end users can have unintended consequences. Taxonomy work can seem simplistic, easy to do, and within reach of any business user. Being too good at hiding semantic complexity can result in users taking up the work themselves, leading to disparate efforts and data sources. Additionally, without a coordinated and governed effort, the “simplicity” of taxonomy work typically results in products which aren’t much better than user-generated keywords, filled with misspellings, variant concepts describing the same thing, a mix of full concepts and acronyms, and a host of other inconsistencies. By hiding complexity, non-practitioners may believe taxonomy construction is easy and quick to complete.

Like calculated friction, it may be beneficial to find a path toward calculated complexity. Determine which user experiences require simple dropdown selections and which may benefit from displaying the entire taxonomy framework, either as a hierarchy or as a visualized graph showing all components of the structure. Somewhere between these two lies a spectrum of user experience options combining simple flat lists, hierarchies, tabbed information displays, graphical representations, hover menus, and a plethora of other displays which are fit-for-purpose to the user goals. Revealing complexity, where appropriate, can expose the range of domain coverage, but also the possibilities that semantic models can offer.

Friction and Complexity

Communicating the value and guiding principles of semantic models and their supporting technologies can be extremely difficult in an enterprise. There are many ways to solve business problems and making the case for what semantic technologies can do, the use cases to which they can be applied, and the advantages they bring can be challenging for taxonomists and ontologists to convey. Considering the nuanced balance between friction and flow, complexity and simplicity, can have application to user experiences and the way we present taxonomies and ontologies to end users. In turn, thoughtful presentations providing the correct amount of friction and complexity can influence adoption across the organization.

Nobody’s Alt but Mine

https://pixabay.com/vectors/creative-commons-derivatives-785336/

“A good ruler has to learn his world’s language, and that’s different for every world, the language you don’t hear just with your ears.” – Frank Herbert, Dune

Synonymy

Synonyms are “one of two or more words or expressions of the same language that have the same or nearly the same meaning in some or all senses” (Merriam-Webster). We can define synonyms when building taxonomies using the skos:altLabel field from semantic standards created by the W3C. The W3C defines skos:altLabel as “an alternative lexical label for a resource”, which, using the same words in the definition as in the label, isn’t as helpful as their example: “acronyms, abbreviations, spelling variants, and irregular plural/singular forms may be included among the alternative labels for a concept. Mis-spelled terms are normally included as hidden labels (see skos:hiddenLabel)” (W3C).

Synonymous concepts are part of what makes taxonomies complete and clear. They demonstrate a consideration for each selected concept, distinguishing between the chosen preferred label, the possible other languages for that preferred label, and other possible options which can be captured in the altLabel field. English may or may not have the most synonyms, but it certainly has many:

“The richness of the English vocabulary and the wealth of available synonyms means that English speakers can often draw shades of distinction unavailable to non-English speakers. Modern English has an unusually large number of synonyms or near-synonyms, mainly because of the influence of very different language groups: Germanic (Anglo-Saxon and Old Norse, the main basis of English), Romance languages (Latin, French), and Greek. There are many sets of triplet synonyms from Anglo-Saxon/Latin/Greek and also Anglo-Saxon/Norman French/Latin-Greek like cool-calm-collected and foretell-predict-prophesy” (Why other languages don’t use thesauruses like we do).

Synonymy is not as cut and dried as it would seem. Let’s consider the use of the skos:altLabel in practice.

ALTruism

According to the W3C, the altLabel field can, by definition, include acronyms, abbreviations, spelling variants, and even irregular plural or singular forms. If we use the “out of the box” set of SKOS labels, it simplifies the structural encoding of taxonomies and reduces the number of properties we need to maintain and manage. However, just as synonyms aren’t purely objective, there are a few disadvantages and risks involved with only using the skos:altLabel. First, the nuanced differences between acronyms, abbreviations, and spelling variants are lost. Second, the ability to separate these nuanced differences in systems consuming altLabels from a taxonomy and ontology management system becomes difficult, if not impossible. Finally, taxonomy consumers or viewers may or may not know what an altLabel is, what it includes, or why the values they are seeing in that field seem to differ.

Defining separate, clearly named taxonomy properties defined by the skos:altLabel type may help to solve this problem. Let’s consider a few examples.

Synonyms

Using the skos:altLabel field type but naming it synonym can make it clear for end users what values to expect. A field like this could include “true” synonyms like “shoelace” and “shoestring” and synonyms which an organization defines for their domain, like “footwear” and “shoes”. These are not truly synonymous, but for an organization which sells footwear of all types, they might be.

Acronyms

Acronyms may be a clearer example since an acronym is more narrowly defined. Even here, it depends on the taxonomy use case, altruistically considering the audience. One organization may use the preferred label “deoxyribonucleic acid” while another organization may use the preferred label “DNA”. In either case, defining shades of skos:altLabel can help. In one instance, “deoxyribonucleic acid” is the preferred label while “DNA” is included in the acronym field. In the other instance, “DNA” is the preferred label while “deoxyribonucleic acid” is in the synonym field. Using fields in this way very clearly expresses the organization’s viewpoint on how they approach the domain.
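To make the two viewpoints concrete, here is a minimal Python sketch of a concept carrying typed alternative labels. The class name, the method names, and the set of label types are my own assumptions for illustration; they are not part of SKOS, and each type is assumed to be configured in your tool as a flavor of skos:altLabel. The sketch also shows how the typed fields could be flattened back into a plain altLabel list for consuming systems that only understand the standard property.

```python
from dataclasses import dataclass, field

# Hypothetical names for the custom label fields. Each is assumed to be
# defined in the taxonomy tool as a specialization of skos:altLabel.
ALT_LABEL_TYPES = ("synonym", "acronym", "abbreviation")


@dataclass
class Concept:
    pref_label: str
    # Maps a label type (e.g. "acronym") to the values held in that field.
    typed_labels: dict[str, list[str]] = field(default_factory=dict)

    def add_label(self, label_type: str, value: str) -> None:
        """Store a value in one of the named alternative label fields."""
        if label_type not in ALT_LABEL_TYPES:
            raise ValueError(f"unknown label type: {label_type}")
        self.typed_labels.setdefault(label_type, []).append(value)

    def alt_labels(self) -> list[str]:
        """Flatten every typed field into a plain skos:altLabel-style list."""
        return [
            value
            for label_type in ALT_LABEL_TYPES
            for value in self.typed_labels.get(label_type, [])
        ]


# One organization prefers the spelled-out name and records the acronym.
org_a = Concept("deoxyribonucleic acid")
org_a.add_label("acronym", "DNA")

# Another prefers the acronym and records the long form as a synonym.
org_b = Concept("DNA")
org_b.add_label("synonym", "deoxyribonucleic acid")
```

Both organizations hold the same two strings, but the field each string sits in records which one they treat as primary and why the other is there.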

Abbreviations

Using skos:altLabel as an abbreviation field also sets expectations for what is included. I live in California, and when I abbreviate the state name, I use “CA”. I remember, however, using longer state name abbreviations when I was younger (which is interesting, because the U.S. Post Office officially switched to two letters in 1963…and, well, I’m a little younger than that). In this case, we could include all of the historical abbreviations “Cal.”, “Calif.”, and “CALIF” in the abbreviations field.

Historical Label

If we want to add even more nuance in naming a skos:altLabel field, we could include something like historical label, more or less paralleling the skos:historyNote field. Using another California example, the historical label field could include “Oakland Raiders” and “Los Angeles Raiders” for the team “Las Vegas Raiders”. These terms are really synonymous, but we’ve added another layer of clarity by adding a time element. If a taxonomy and ontology system supports adding metadata to properties and relationships, you could use the synonym field and add date ranges for when the team was active with each name: “Oakland Raiders” (1960-1981, 1995-2019) and “Los Angeles Raiders” (1982-1994) with the option to add an open date range to “Las Vegas Raiders” or no date range at all.
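As a rough sketch of that idea, the Python below attaches year ranges to labels and looks up which name was in use in a given year. The DatedLabel class and the label_in_year function are hypothetical names of my own, not features of any particular tool. The start year of 2020 for “Las Vegas Raiders” is my assumption, inferred from the last Oakland range ending in 2019, and the range is left open as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatedLabel:
    label: str
    start: int                 # first year the name was in use
    end: Optional[int] = None  # None means the date range is still open


RAIDERS_LABELS = [
    DatedLabel("Oakland Raiders", 1960, 1981),
    DatedLabel("Los Angeles Raiders", 1982, 1994),
    DatedLabel("Oakland Raiders", 1995, 2019),
    # Assumed start year; the range is left open for the current name.
    DatedLabel("Las Vegas Raiders", 2020),
]


def label_in_year(labels: list[DatedLabel], year: int) -> Optional[str]:
    """Return the label whose date range covers the year, or None."""
    for item in labels:
        if item.start <= year and (item.end is None or year <= item.end):
            return item.label
    return None
```

A search or tagging application could use a lookup like this to show the name that matches the date of the content being viewed, for example label_in_year(RAIDERS_LABELS, 1985) for an article from the Los Angeles years.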

Spelling Variants

The use of skos:altLabel for spelling variants is dependent on your use case, in my opinion. For taxonomies supporting search, it may be valuable to define a field called spelling variant. Likewise, despite W3C’s recommendation, there could be another field called misspellings. However, in practical application, maintaining spelling variants and misspellings is the work of the front-end search system, not the taxonomy. How you architect your taxonomy to support search may be a matter of your organization’s technical environment, so do what’s best for your use case.

Another Word for Thesaurus

You’ve probably heard the old joke, “What’s another word for thesaurus?” Well, the slippery nature of language is on display in that the Merriam-Webster online dictionary provides seven “Synonyms & Similar Words” for “thesaurus”. So, the answer is: dictionary, gloss, glossary, lexicon, nomenclature, vocabulary, or workbook. Arguably, none of these are true synonyms for the word “thesaurus”.

I’m not going to get into homonyms, homographs, or the use of the skos:closeMatch or skos:exactMatch relationships in this blog because, although all are similar concepts, they are not synonyms. What I will leave you with is this: adhering to the W3C standard is a good practice in taxonomy and ontology design, but creating nuanced flavors of the skos:altLabel might prove to be a useful alternative for your taxonomy audiences.

The Path to Taxonomist

https://pixabay.com/illustrations/maze-labyrinth-solution-lost-1804499/

“And you may ask yourself, ‘Well, how did I get here?’” – Talking Heads, Once in a Lifetime

When people ask me what I do for a living, I usually say something like “information management” because it’s an easy way to simplify the explanation of my work. If people dig deeper, they may ask me, first, what is a taxonomist? Then, what does a taxonomist do? And, if we get through those questions, the next is usually, how did you become a taxonomist?

How does one become a taxonomist? Let me share with you my path to becoming a taxonomist through a combination of chronology and, of course, categories. I’ll also sprinkle in what I think is a more common path to getting a role as a taxonomist currently.

Background

If you asked me what I wanted to be when I was young, I probably would have said I wanted to be a writer. Even in high school, I knew that one did not simply become a writer. There was starving and sacrifice to be done, and I am cursed with a little too much pragmatism to go through all that.

My first decision which led to a life of taxonomy was determining the best way to become a writer was to embrace what I loved and teach it. As we all know, teachers have plenty of leisure time and summers off to write their to-be-famous tomes. Well, it turns out that’s not what happens when you are a teacher, which I quickly learned when I was a teacher myself. I went to Eastern Michigan University in Ypsilanti, Michigan and got a Bachelor of Arts in English Literature and Language. I also became certified to teach in the Secondary Education program, adding an entire year to my undergraduate studies. I minored in history. The dual program allowed me to take courses in literature, composition, linguistics, and in my history minor. All of those courses turned out to be important later.

After teaching at the high school level for a few years following graduation, and then spending a rather unsuccessful year as a middle school teacher, I was driven to get my Masters. I graduated with a Masters degree in English Literature from the State University of New York at Stony Brook, a school known for…well, not English literature. The content of my degree soon became irrelevant, as the important takeaway was having a Masters with even more literature and linguistics classes as well as a good dose of literary theory.

With my Masters, it was difficult for me to find full-time employment as a teacher in the New York area. The available positions for an English teacher were limited, the best school districts weren’t often hiring, and the worst only reminded me of what I hadn’t done well as a middle school teacher. After a few years of cobbling together several adjunct teaching positions at several different community colleges, I decided living on the economic edge just wasn’t for me anymore. I wanted some stability, and I was already seeing teaching wasn’t going to remain my passion for many more years.

My decision to leave teaching was as much about the pragmatic ability to have a full-time job with benefits as it was a recognition I would not be fulfilled with a career in education. While seeking a new career path, I applied for a job whose description I didn’t even understand at the Modern Language Association International Bibliography. In retrospect, I probably shouldn’t have mentioned in the interview that I didn’t understand what an Associate Thesaurus Editor did, but I somehow got the job anyway. Later, I was told I was hired because of my diverse background in language, literature, history, and education. Since the MLA indexed articles in all of those humanities areas, having knowledge and experience in them was helpful for reviewing content. In addition, I don’t believe they hired anyone without a Masters degree, so having an MA probably led directly to getting that role.

Everything I ever needed to learn about controlled vocabularies I learned at the MLA. It provided me with a firm foundation in the world of controlled vocabularies and indexing. The more I learned there, the more I realized the timing couldn’t have been better.

Timing (Luck)

I got my job at the MLA in 2002. In 2005, Gartner released its annual Hype Cycle with Taxonomy sitting at the Peak of Inflated Expectations on the curve. Businesses were clamoring for taxonomies. They didn’t really understand what they were or what they did, but they knew they had to have one.

Not only was I learning what it takes to maintain and govern a thesaurus of more than 56,000 terms in a flat list managed in an Access database, I was also making coincidental connections with people in market research and special libraries. The more I understood what they did, the more I began to understand the role of information in the corporate world. The corporate world, mind you, I knew nothing about with my two academic degrees and non-profit academic job. The intersection of my understanding of controlled vocabularies, learning more about market research content, and researching what taxonomists and special librarians did led me to understand that to grow in my career, I would have to make the jump into the business environment.

Strategy (Timing)

I often tell people I couldn’t have planned my career better if I had planned it. The fact is, I didn’t know I would be a good taxonomist until I became one. Sure, I had always been very organized (perhaps borderline OCD), lining up my Matchbox toy cars alternately by color or size as a child. I had also been interested in many different topics, but I never knew there was such a job as working with language that wasn’t linguistics, writing, or teaching some combination of those fields.

I often put more emphasis on my background and timing than on my ability to think strategically, but I did in fact have ideas and insights that proved to be true over time. So, in hindsight, I made some very strategic decisions that just so happened to work out.

The first was contacting people in the world of controlled vocabularies to understand better what they did, how they got there, and what I could do to advance my career in the field. I spoke to two people from the Getty Research Institute about their work on the Getty Vocabularies and how people got into those roles. I also spoke to a well-known taxonomy consultant who, when I told him what I was doing and asked what I should do next, replied, “Be a consultant.”

Back to timing. Since organizations knew they needed taxonomies but not what for or how to build them, they often hired consultancies to show them the way. Before there were taxonomy jobs at many of these companies, there were consultants telling them they needed to create a taxonomy job and hire a taxonomist. I started a two-pronged search for taxonomist or librarian roles and taxonomy consulting jobs. Traditionally, most taxonomists have a Masters in Library and Information Science. It is in this course of study that they learn about taxonomies and then choose a path of becoming a librarian, becoming a taxonomist, or becoming a librarian who later becomes a taxonomist. Without an MLIS or a business background, I wasn’t having luck inserting myself into librarian jobs, especially corporate librarian roles.

Consulting companies, however, are always looking for fresh graduates to hire on as junior consultants. Despite being almost ten years out of my undergraduate degree, I did have enough on my resume to work in taxonomies, a niche and uncommon skill set for anyone looking for jobs in the corporate world. I landed a job at BearingPoint, a now defunct consulting company with a great reputation and a strong team in content and information analysis, organization, and discovery. Despite coming in at a junior level, the overall compensation package seemed incredible after years in education and non-profit. I was probably severely lowballed, but it was more than I had been making and gave me several years of additional background in content and records management, search, business analysis, process mapping, user experience design, and a host of other tasks thrown at junior consultants with an explanation something along the lines of, “Here, go become an expert in this.” And, of course, there was the taxonomy for business work that I had been seeking.

So, at the intersection of timing and strategy, I launched a career. I went on to work in several more consulting roles–some for others and one for myself during the economic downturn in 2008–as well as in-house taxonomy roles and, once, for a taxonomy and ontology management software vendor. The perfect trifecta: consultant, in-house practitioner, and vendor. With every passing year after breaking into the industry with my first thesaurus job, I learned more and was able to apply those learnings, with varying degrees of success, at the next job.

Takeaways

I believe we are at a moment as pivotal for taxonomy work as, if not more so than, the peak of taxonomy expectations back in 2005. In a field in which the Semantic Web has mostly failed to deliver, artificial intelligence and machine learning models have filled some of the gaps in automating and connecting existing content. What is widely accepted is that these technologies need clean data, and that taxonomies and ontologies provide foundational structures for classifying and connecting content. They are the semantic underpinnings that should have been the publicly accessible and connected Semantic Web. I believe, now more than ever, experts in taxonomies and ontologies are necessary and valuable roles in an organization.

I’ve already laid out a path to becoming a taxonomist that didn’t require an MLIS, getting on-the-job training at an important moment in taxonomy evolution as a required business need. Aside from my initial Masters, which I got in order to get full-time teaching jobs at the college level, I never had to return to school to get the additional skills I needed to become a taxonomist. I learned in my work roles and researched information on my own, with a few certificate courses along the way to round out my education.

Will this still work today? Possibly, but getting your first taxonomy job without an MLIS is probably more difficult now than in the past. I do believe, though, the skills remain about the same as they were over 20 years ago. Therefore, you may find that you have the right skills and knowledge through a combination of your degree and work experience, as I did.

Things you need to know and skills you need to have:

  1. How to do research…in any field. The ability to research is inherent to the core tasks of defining preferred labels and discovering concept definitions.
  2. The ability to interact with others and negotiate. Taxonomy work is only enabled through work with other business and technical colleagues. It is often necessary to convince people of the value of taxonomies and there are often compromises to be made.
  3. Taxonomy and ontology best practices following standards and guidelines like the ANSI/NISO Z39.19-2005 (R2010) Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, ISO 25964-1:2011, W3C SKOS, W3C RDF, W3C OWL, and any industry-specific standards and available taxonomies and ontologies applicable to your area.
  4. Taxonomy and ontology management software vendors and their capabilities, like Access Innovations (Data Harmony), Graphifi (Graphologi), Progress (Semaphore), Stanford’s Protégé, Synaptica (Graphite), TopQuadrant (TopBraid EDG), and others which may be tangential to the functionalities of dedicated software platforms. Understanding how these systems work, even basically, informs how taxonomies are built and delivered to the business.

Things you may need to know and skills you may need to have:

  1. A background in library science (an MLIS degree), English (or other literature), or linguistics.
  2. Project work in text analytics, search, content, digital asset, and/or records management, user experience, or other fields or applications in which taxonomy features.
  3. Decent spreadsheet skills. I’ve never met a taxonomist who didn’t start their taxonomies in spreadsheets.
  4. Knowledge of triplestores and graph databases on the market, including AllegroGraph, Amazon Neptune, Ontotext GraphDB, Stardog, Neo4j, and others.
  5. Awareness of and interaction with taxonomy practitioners and consultancies, including Dovecot Studio, Enterprise Knowledge, Semantic Arts, Straits Knowledge, Taxonomy Strategies, taxonomy and ontology management software vendors, and others.

Things you really don’t need to know or have the skills to do yourself:

  1. Speak more than one language. If working in the United States, the primary language of an organization is probably English. If you are doing multilingual taxonomy work, it may help to speak one or more languages, but many organizations use translation services. Of course, an understanding of other languages helps because taxonomy is all about words.
  2. Programming and formatting languages, including XML, JSON, SPARQL, and SHACL. You may need to write these yourself, but there are frequently vendor or in-house resources to do this if you’ve already gone far enough to need them.
  3. How machine learning models work. There are people who do this as their core competency. A taxonomist isn’t expected to know how these models work under the covers, but it does help to have an understanding of how models will use taxonomies.

This is not an exhaustive list, and these are my opinions after working in taxonomy for over 20 years. If a person, company, or product is not mentioned here, it is likely an oversight and does not indicate a negative opinion or lack of endorsement. I also didn’t provide a list of excellent taxonomy resources, including websites and books, because there are so many and they are already pretty well documented. A few quick searches should set you on the right path to find your own resources.

Having frequently been asked how I became a taxonomist and what I had to do, know, or learn to become one, I hope this blog helps others on their path!

KMWorld Themes and Trends

https://pixabay.com/illustrations/cyborg-hand-human-robot-contact-9091910/

“At a certain level of machine-ness, of immersion in virtual machinery, there is no more distinction between man and machine.” – Jean Baudrillard, Violence of the Virtual and Integral Reality

I attended the KMWorld and co-located conferences held in Washington, D.C., November 18th – 21st. Across the conferences and sessions that I attended and the conversations between sessions, there were several themes which resonated. I’ll detail some of those themes and highlight any talks in particular I thought inspired or captured those trends. I couldn’t attend everything, so please forgive me if I’ve overlooked great talks, of which there were many.

Taxonomies as an Enterprise Strategy

The importance of taxonomies and ontologies as a foundational business and technical program was particularly evident at Taxonomy Boot Camp, but the theme also came up in talks at the other conferences. I presented on the topic in my session, “Stand Still Like the Hummingbird: Enterprise Taxonomy Strategy When Nothing Stands Still”. Likewise, Thomas Stilling in his Monday keynote, “Be the Change: Your Taxonomy Expertise Can Help Drive Organizational Transformation”, emphasized metadata as an organizational strategic asset and discussed how to position taxonomy to align with strategic objectives. Similarly, Lindsay Pettai’s session, “Empowering Your Enterprise With a Dynamic Taxonomy Program”, discussed the importance of having an enterprise taxonomy program.

These are just a few examples. Throughout the conferences, the notion that taxonomies and ontologies provide structured, source of truth values for navigation, search, and content tagging was prevalent. The main theme, however, was how critical taxonomies and ontologies are to machine learning (ML) model training and to the success of ML programs. In talk after talk, artificial intelligence (AI) and ML were top topics, and speakers describing how to make these projects viable and successful covered what needed to be in place first. Taxonomies and ontologies were among those foundational requirements.

As a 16-year veteran of KMWorld associated conferences–my first being Enterprise Search Summit West in 2008–I have seen the more recent embrace and understanding of enterprise taxonomies and ontologies beyond simple use cases like navigation. Even just a few years ago, taxonomies seemed to be misunderstood or undervalued by many conference attendees. Now, talk of their use as enablers of AI was ubiquitous.

Artificial Intelligence Is Here to Stay

I don’t believe I saw a talk which didn’t include the keywords “artificial intelligence”, “machine learning”, “AI”, or “ML”. I was unable to attend KMWorld last year, but in years past, any conversation on the role of AI/ML was frequently met with scoffs, eyerolls, or fear. In KMWorld’s Thursday keynote, “KM, Experts & AI: Learning From KM Leaders”, Kim Glover and Cindy Hubert captured something important when they said that AI is driving a lot of emotions. Some of these emotions are excitement, skepticism, and fear. Addressing these emotions is going to be essential in addressing the technologies.

The overarching theme is that AI/ML is here to stay, so what are you going to do about it? Continue to reject it and fall behind the curve? Embrace it without question? Embracing ML models, including LLMs, with guidance and guardrails seemed to be the message most were conveying. While many organizations are still in the proof of concept (PoC) project stage, adoption seems to have already arrived or is pending, so get ready to embrace the change to be successful.

The exponential growth in ML has been fueled by cheaper storage, faster technologies, more robust and accessible models such as ChatGPT, and built-in technologies like Microsoft Copilot. Not only are these tools more accessible, they are more accurate than they have ever been in the past…for the right use cases. And this was another big takeaway: AI/ML is here to stay, but use these technologies for the right use cases. Document summarization and generative AI text generation are big winners while tools used for critical decision making are still improving and must be approached with caution.

Human-in-the-Loop, Cyborg, or Sigh…Borg?

Overwhelmingly across sessions, the consensus was the importance of the human in the use of AI/ML technologies. For both the input and the output, to avoid garbage in and garbage out, human beings need to curate and review ML training sets and the information that an ML model outputs. As noted above, document summarization may be low risk depending on the industry, but decision-making in areas like law and medicine is high-risk and requires humans-in-the-loop.

In his session, “Evolving KM: AI, Agile, & Answers”, the inestimable Dave Snowden discussed storytelling–or, rather, talked about storytelling through the art of storytelling–as an intimately human activity. He noted the deeply contextual nature of human knowledge and the need for knowledge management to generate knowledge rather than store it as codified data. If knowledge cannot be fully codified as data, it is difficult or impossible to transfer that level of human knowing into machines and their models. The connectivity to knowledge and deep metaphor was evident in my three takeaways from his session: his notion of a tri-opticon integrating knowledge between three areas of an organization and the connection to Welsh Triads; forming teams or roles of threes, fives, or sevens, Welsh Rugby, and the number of people on a rugby sevens team; and assemblages, mycorrhizae, and rhizomatic networks in Deleuze and Guattari’s “A Thousand Plateaus”. These kinds of connections are weak use cases for machine learning and emphasize the importance of human knowledge and inclusion in AI technologies.

And, speaking of human knowledge, of course knowledge retention and transfer were key topics in an era when both age and work-from-home job mobility are creating more job turnover than ever. While human knowledge is difficult to capture, using AI agents to capture knowledge and ML models to parse textual information will assist in ramping up the sheer scale and increasing pace necessary to retain and transfer knowledge.

The human-in-the-loop, the cyborg, and, sigh…Borg, are all going to rely on knowledge and ethics in order to create a human-machine interactive paradigm. If humans don’t use our own ethical boundaries to curate machine content, then bad data, and bad ethics, will spread throughout our information systems.

While there was some fear and loathing in D.C., there was a stronger current of hope, optimism, and curiosity in the many ways we can use taxonomies and ontologies, AI/ML, and human knowledge management together to guide us toward a brighter future of technology use.

Specialty Skills and the Risks of Hidden Complexity

https://pixabay.com/photos/statue-camouflage-hidden-discreet-2393168/

“The Cube is, at the same time, a symbol of simplicity and complexity.” – Ernő Rubik

Taxonomists and ontologists–sometimes separate but more often one and the same role I’ll simply call a “taxonomist”–are frequently underappreciated for their work in modeling and representing complex domains. Taxonomists do not simply suggest term forms and put them in a list; they research and consider each concept, carefully map out its relationships to other concepts, and consider how new additions and changes impact the overall semantic model. The nature of taxonomy work is often misunderstood, or, worse, trivialized as work that anyone in the organization might perform. Several expert taxonomists discussed this topic as a pain point in communicating the value of taxonomy at the Henry Stewart Semantic Data 2024 Conference in New York this week.

Taxonomies and ontologies need to be clear, transparent, and wholly visible and available where appropriate. Serving different use cases drives user experience decisions which lean toward simplifying the display of complex semantic models so users can more easily understand and use them. While hiding complexity is a service for end users, it can communicate the result of taxonomy work as simplified dropdown lists, shallow hierarchical structures, or simple concept values without additional properties and relationships. The complexity of semantic models, including the taxonomist’s work in building them, can be hidden and thus misunderstood.

Hidden Work

Semantic models are frequently hidden from users due to their complexity and necessarily strict governance models. The overall semantic structure, potentially composed of many taxonomies connected by robust ontologies, is not always available for the end user to view as a whole or to interact with directly. Hiding the complexity of semantic models can simplify user experiences, but it can also present fractured, contextless concept values. Worse, the art and science of semantic modeling conducted by highly skilled and knowledgeable taxonomists is also hidden…or at least potentially misrepresented.

The implications of this hidden work are manifold. Because the output of taxonomists’ skilled work is presented in simplified form, the work itself appears simple. The result is inexperienced employees taking on taxonomy and ontology modeling tasks, potentially in parallel to ongoing enterprise taxonomy efforts. The fragmentation of taxonomy work in turn impacts the success of the overall taxonomy program as an enterprise strategy.

Additionally, the value of taxonomy work in supporting a variety of use cases is also undercut. Taxonomy is reduced to simple hierarchies to use in navigation or flat lists used in content tagging. The true value of complex, interrelated concepts with additional property metadata is lost. End users, or potential end users, don’t know the art of the possible or even what to ask for from enterprise taxonomists.

Exposing Semantic Models in the UI

Systems consuming taxonomy values are an abstracted layer, presenting concepts in navigational structures, as typeahead values in search boxes, or as limited flat lists for tagging content or assets. Because of this, the additional value of ontological relationships and property metadata is stripped away. Of course, designing user experiences to meet the use case and deciding when it is appropriate to display taxonomy values, how many, what type, and what is possible are all aimed at organizational efficiency and clarity. Sometimes too much is just too much.

However, consideration is not always given to what else from a semantic model may be displayed, and in what way, so that users get more contextual value from taxonomy concepts shown out of context. How much information associated with a taxonomy concept should be exposed and in what context? Concept definitions or synonyms might be valuable to expose to end users so they understand the label and what else it may represent. These could be displayed in a hover menu so they are available but unobtrusive.
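As a small illustration of the hover idea, here is a Python sketch that assembles the text a hover menu might show for a concept. The hover_text function and its wording (“Definition”, “Also known as”) are assumptions of mine, not the behavior of any specific taxonomy tool or UI framework, and the sample definition is made up for the example.

```python
def hover_text(pref_label: str, definition: str = "", synonyms=()) -> str:
    """Build plain hover text from a concept's label, definition, and synonyms."""
    lines = [pref_label]
    if definition:
        lines.append(f"Definition: {definition}")
    if synonyms:
        lines.append("Also known as: " + ", ".join(synonyms))
    return "\n".join(lines)


# Illustrative values only; a real UI would read these from the taxonomy.
tooltip = hover_text(
    "footwear",
    definition="Items worn on the feet.",
    synonyms=["shoes"],
)
```

Concepts with no definition or synonyms simply fall back to the label, so the hover stays unobtrusive when there is nothing extra to show.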

Similarly, taxonomies need not always be displayed as flat lists or hierarchies. Taxonomies are natural candidates for graphical visualizations to display interconnected values and associated metadata. Many existing taxonomy and ontology tools include APIs or widgets meant to display taxonomies in just this way external to the tool. Publicly available visualization libraries can be used in custom, in-house built UIs to display and navigate graphs. A graphical user experience can display the complexity of semantic models, including hierarchical and associative relationships, while easing their presentation and navigation.

Taxonomies and ontologies are foundational components in the training and refining of machine learning models, including their use as a source for tagging machine learning training sets and as an institutional reference for large language models. Graphical displays have the benefit of exposing the full scope of semantic models for more complex use cases like serving data to machine learning models.

Another way to expose the full scope of semantic models is to provide partners with read-only access to taxonomy and ontology management systems. These tools typically have few editing licenses, which should be reserved for members of the taxonomy team to maintain proper model governance. Read access, when available, should be reserved for business partners who understand, or are willing to be trained in, how to navigate complicated platforms.

The Benefits

There are several benefits to getting creative about how and what to expose from semantic models.

The main benefit is discreet, visual communication of, and training in, the complexity and possibilities of semantic models. Good UIs don’t simply provide a way of interacting with machines; they also educate and train the end user on common principles such as clicking on a button to perform an action, pinching in and out to view content, and, in this case, how to traverse hierarchies and graphs. From this, end users may come to their own realizations about how semantic models may be used in their projects.

Another benefit is increasing end user awareness and the visible profiles of the taxonomists and taxonomy program. Knowing how information is provided to consuming systems and who is responsible for creating, modeling, and maintaining that information provides a point of contact for end users who can field their questions. Since consumers don’t always know the full range of possible semantic model use cases, they are provided a path to get modeling advice. Taxonomists increase their visibility and can act as internal consultants on a range of organizational projects. In turn, their valuable skills are employed in many areas of information management, improving the overall health of information best practices.

In sum, don’t hide your taxonomists or their work. Get creative about how to expose what they do and the results of their professional skills.