Information Panopticon


Author Archives: Ahren E. Lehnert

The Path to Taxonomist

https://pixabay.com/illustrations/maze-labyrinth-solution-lost-1804499/

“And you may ask yourself, ‘Well, how did I get here?’” – Talking Heads, Once in a Lifetime

When people ask me what I do for a living, I usually say something like “information management” because it’s an easy way to simplify the explanation of my work. If people dig deeper, they may ask me, first, what is a taxonomist? Then, what does a taxonomist do? And, if we get through those questions, the next is usually, how did you become a taxonomist?

How does one become a taxonomist? Let me share with you my path to becoming a taxonomist through a combination of chronology and, of course, categories. I’ll also sprinkle in what I think is a more common path to getting a role as a taxonomist currently.

Background

If you asked me what I wanted to be when I was young, I probably would have said I wanted to be a writer. Even in high school, I knew that one did not simply become a writer. There was starving and sacrifice to be done, and I am cursed with a little too much pragmatism to go through all that.

My first decision that led to a life of taxonomy was deciding that the best way to become a writer was to embrace what I loved and teach it. As we all know, teachers have plenty of leisure time and summers off to write their to-be-famous tomes. Well, it turns out that’s not what happens when you are a teacher, which I quickly learned when I was a teacher myself. I went to Eastern Michigan University in Ypsilanti, Michigan and got a Bachelor of Arts in English Literature and Language. I also became certified to teach in the Secondary Education program, adding an entire year to my undergraduate studies. I minored in history. The dual program allowed me to take courses in literature, composition, linguistics, and in my history minor. All of those courses turned out to be important later.

After graduating and teaching for a few years at the high school level, I was driven to get my Masters after a rather unsuccessful year as a middle school teacher. I graduated with a Masters degree in English Literature from the State University of New York at Stony Brook, a school known for…well, not English literature. The content of my degree soon became irrelevant, as the important takeaway was having a Masters with even more literature and linguistics classes as well as a good dose of literary theory.

With my Masters, it was difficult for me to find full-time employment as a teacher in the New York area. The available positions for an English teacher were limited, the best school districts weren’t often hiring, and the worst only reminded me of what I hadn’t done well as a middle school teacher. After a few years of cobbling together several adjunct teaching positions at several different community colleges, I decided living on the economic edge just wasn’t for me anymore. I wanted some stability, and I was already seeing teaching wasn’t going to remain my passion for many more years.

My decision to leave teaching was as much about the pragmatic ability to have a full-time job with benefits as it was a recognition I would not be fulfilled with a career in education. While seeking a new career path, I applied for a job whose description I didn’t even understand at the Modern Language Association International Bibliography. In retrospect, I probably shouldn’t have mentioned in the interview that I didn’t understand what an Associate Thesaurus Editor did, but I somehow got the job anyway. Later, I was told I was hired because of my diverse background in language, literature, history, and education. Since the MLA indexed articles in all of those humanities areas, having knowledge and experience in them was helpful for reviewing content. In addition, I don’t believe they hired anyone without a Masters degree, so having an MA probably led directly to getting that role.

Everything I ever needed to learn about controlled vocabularies I learned at the MLA. It provided me with a firm foundation in the world of controlled vocabularies and indexing. The more I learned there, the more I realized the timing couldn’t have been better.

Timing (Luck)

I got my job at the MLA in 2002. In 2005, Gartner released its annual Hype Cycle with Taxonomy sitting at the Peak of Inflated Expectations on the curve. Businesses were clamoring for taxonomies. They didn’t really understand what they were or what they did, but they knew they had to have one.

Not only was I learning what it takes to maintain and govern a thesaurus of more than 56,000 terms in a flat list managed in an Access database, I was also making coincidental connections with people in market research and special libraries. The more I understood what they did, the more I began to understand the role of information in the corporate world. The corporate world, mind you, I knew nothing about with my two academic degrees and non-profit academic job. The intersection of my understanding of controlled vocabularies, learning more about market research content, and researching what taxonomists and special librarians did led me to understand that to grow in my career, I would have to make the jump into the business environment.

Strategy (Timing)

I often tell people I couldn’t have planned my career better if I had planned it. The fact is, I didn’t know I would be a good taxonomist until I became one. Sure, I had always been very organized (perhaps borderline OCD), lining up my Matchbox toy cars alternately by color or size as a child. I had also been interested in many different topics, but I never knew there was such a job as working with language that wasn’t linguistics, writing, or teaching some combination of those fields.

I often put more emphasis on my background and timing than on my ability to think strategically, but I did in fact have ideas and insights that proved to be true over time. So, in hindsight, I made some very strategic decisions that just so happened to work out.

The first was contacting people in the world of controlled vocabularies to understand better what they did, how they got there, and what I could do to advance my career in the field. I spoke to two people from the Getty Research Institute about their work on the Getty Vocabularies and how people got into those roles. I also spoke to a well-known taxonomy consultant who, when I told him what I was doing and asked what I should do next, replied, “Be a consultant.”

Back to timing. Since organizations knew they needed taxonomies but not what for or how to build them, they often hired consultancies to show them the way. Before there were taxonomy jobs at many of these companies, there were consultants telling them they needed to create a taxonomy job and hire a taxonomist. I started a two-pronged search for taxonomist or librarian roles and taxonomy consulting jobs. Traditionally, most taxonomists have a Masters in Library and Information Science. It is in this course of study they learn about taxonomies and then choose a path of becoming a librarian, becoming a taxonomist, or becoming a librarian who later becomes a taxonomist. Without an MLIS or a business background, I wasn’t having luck inserting myself into librarian jobs, especially corporate librarian roles.

Consulting companies, however, are always looking for fresh graduates to hire on as junior consultants. Despite being almost ten years out of my undergraduate degree, I did have enough on my resume to work in taxonomies, a niche and uncommon skill set for anyone looking for jobs in the corporate world. I landed a job at BearingPoint, a now defunct consulting company with a great reputation and a strong team in content and information analysis, organization, and discovery. Despite coming in at a junior level, the overall compensation package seemed incredible after years in education and non-profit. I was probably severely lowballed, but it was more than I had been making and gave me several years of additional background in content and records management, search, business analysis, process mapping, user experience design, and a host of other tasks thrown at junior consultants with an explanation something along the lines of, “Here, go become an expert in this.” And, of course, there was the taxonomy for business work that I had been seeking.

So, at the intersection of timing and strategy, I launched a career. I went on to work in several more consulting roles (some for others and one for myself during the economic downturn in 2008), in in-house taxonomy roles, and once for a taxonomy and ontology management software vendor. The perfect trifecta: consultant, in-house practitioner, and vendor. With every passing year after breaking into the industry with my first thesaurus job, I learned more and was able to apply those learnings, with varying degrees of success, at the next job.

Takeaways

I believe we are at a moment as pivotal for taxonomy work as, if not more so than, the peak of taxonomy expectations back in 2005. In a field in which the Semantic Web has mostly failed to deliver, artificial intelligence and machine learning models have filled some of the gaps in automating and connecting existing content. What is widely accepted is that these technologies need clean data, and taxonomies and ontologies provide foundational structures for classifying and connecting content. They are the semantic underpinnings that should have been the publicly accessible and connected Semantic Web. I believe, now more than ever, experts in taxonomies and ontologies are necessary and valuable roles in an organization.

I’ve already laid out a path to becoming a taxonomist that didn’t require an MLIS, getting on-the-job training at an important moment in taxonomy evolution as a required business need. Aside from my initial Masters, which I got in order to get full-time teaching jobs at the college level, I never had to return to school to get the additional skills I needed to become a taxonomist. I learned in my work roles and researched information on my own, with a few certificate courses along the way to round out my education.

Will this still work today? Possibly, but getting your first taxonomy job without an MLIS is probably more difficult now than in the past. I do believe, though, the skills remain about the same as they were over 20 years ago. Therefore, you may find that you have the right skills and knowledge through a combination of your degree and work experience, as I did.

Things you need to know and skills you need to have:

  1. How to do research…in any field. The ability to research is inherent to the core tasks of defining preferred labels and discovering concept definitions.
  2. The ability to interact with others and negotiate. Taxonomy work is only enabled through work with other business and technical colleagues. It is often necessary to convince people of the value of taxonomies and there are often compromises to be made.
  3. Taxonomy and ontology best practices following standards and guidelines like the ANSI/NISO Z39.19-2005 (R2010) Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies, ISO 25964-1:2011, W3C SKOS, W3C RDF, W3C OWL, and any industry-specific standards and available taxonomies and ontologies applicable to your area.
  4. Taxonomy and ontology management software vendors and their capabilities, like Access Innovations (Data Harmony), Graphifi (Graphologi), Progress (Semaphore), Stanford’s Protégé, Synaptica (Graphite), TopQuadrant (TopBraid EDG), and others which may be tangential to the functionalities of dedicated software platforms. Understanding how these systems work, even basically, informs how taxonomies are built and delivered to the business.
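To make the standards in item 3 a little more concrete, here is a minimal sketch of the kind of concept record those guidelines describe: one preferred label, optional alternative labels, a definition, and broader and related links. The field names loosely echo SKOS property names, but the `Concept` class, the identifiers, and the sample data are all assumptions made for illustration; this is not how any particular tool stores its data.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """One taxonomy concept; field names loosely echo SKOS (illustrative only)."""
    concept_id: str                                        # hypothetical local identifier
    pref_label: str                                        # the single preferred label
    alt_labels: list[str] = field(default_factory=list)    # synonyms and variants
    definition: str = ""
    broader: list[str] = field(default_factory=list)       # ids of parent concepts
    related: list[str] = field(default_factory=list)       # ids of associated concepts


def narrower_of(concepts: dict[str, Concept], parent_id: str) -> list[str]:
    """Derive narrower concepts by inverting the broader links."""
    return sorted(c.concept_id for c in concepts.values() if parent_id in c.broader)


def find_by_label(concepts: dict[str, Concept], text: str) -> list[str]:
    """Match a string against preferred and alternative labels, ignoring case."""
    needle = text.casefold()
    return sorted(
        c.concept_id
        for c in concepts.values()
        if needle in [c.pref_label.casefold(), *(a.casefold() for a in c.alt_labels)]
    )


# Invented sample data
concepts = {
    c.concept_id: c
    for c in [
        Concept("c1", "Controlled vocabularies"),
        Concept("c2", "Thesauri", alt_labels=["Thesauruses"], broader=["c1"], related=["c3"]),
        Concept("c3", "Indexing", related=["c2"]),
    ]
}

print(narrower_of(concepts, "c1"))
print(find_by_label(concepts, "thesauruses"))
```

Even this toy version shows why the work is more than a list of terms: every label, synonym, and relationship is a decision someone has to research and maintain.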

Things you may need to know and skills you may need to have:

  1. A background in library science (an MLIS degree), English (or other literature), or linguistics.
  2. Project work in text analytics, search, content, digital asset, and/or records management, user experience, or other fields or applications in which taxonomy features.
  3. Decent spreadsheet skills. I’ve never met a taxonomist who didn’t start their taxonomies in spreadsheets.
  4. Knowledge of triplestores and graph databases on the market, including AllegroGraph, Amazon Neptune, Ontotext GraphDB, Stardog, Neo4j, and others.
  5. Awareness of and interaction with taxonomy practitioners and consultancies, including Dovecot Studio, Enterprise Knowledge, Semantic Arts, Straits Knowledge, Taxonomy Strategies, taxonomy and ontology management software vendors, and others.
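Since so many taxonomies start life in a spreadsheet (item 3), a small sketch may help show how little it takes to turn one into a hierarchy. It assumes a hypothetical two-column export with `label` and `parent` headers, where top concepts leave the parent blank; the column names and sample rows are invented, and there is no guard against circular parent references.

```python
import csv
import io
from collections import defaultdict

# Hypothetical spreadsheet export: one row per concept, parent blank for top concepts.
SHEET = """label,parent
Documents,
Contracts,Documents
Invoices,Documents
Leases,Contracts
"""


def load_children(csv_text: str) -> dict[str, list[str]]:
    """Map each parent label to its child labels ('' holds the top concepts)."""
    children = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        children[row["parent"].strip()].append(row["label"].strip())
    return children


def render(children: dict[str, list[str]], parent: str = "", depth: int = 0) -> list[str]:
    """Return the hierarchy as indented lines, siblings sorted alphabetically."""
    lines = []
    for label in sorted(children.get(parent, [])):
        lines.append("  " * depth + label)
        lines.extend(render(children, label, depth + 1))
    return lines


print("\n".join(render(load_children(SHEET))))
```

A spreadsheet like this works well for early drafts; dedicated taxonomy software becomes worthwhile once synonyms, definitions, and associative relationships enter the picture.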

Things you really don’t need to know or have the skills to do yourself:

  1. Speak more than one language. If working in the United States, the primary language of an organization is probably English. If you are doing multilingual taxonomy work, it may help to speak one or more languages, but many organizations use translation services. Of course, an understanding of other languages helps because taxonomy is all about words.
  2. Programming and formatting languages, including XML, JSON, SPARQL, and SHACL. You may need to write these yourself, but there are frequently vendor or in-house resources to do this if you’ve already gone far enough to need them.
  3. How machine learning models work. There are people who do this as their core competency. A taxonomist isn’t expected to know how these models work under the covers, but it does help to have an understanding of how models will use taxonomies.

This is not an exhaustive list and these are my opinions after working in taxonomy for over 20 years. Any person, company, or product not mentioned here is likely an oversight, and the omission does not indicate a negative opinion or lack of endorsement. I also didn’t provide a list of excellent taxonomy resources, including websites and books, because there are so many and they are already pretty well documented. A few quick searches should set you on the right path to find your own resources.

Having frequently been asked how I became a taxonomist and what I had to do, know, or learn to become one, I hope this blog helps others on their path!

Taxonomies and Turnover, or Johnny Pneumonic

https://pixabay.com/vectors/bookshelves-books-reading-learning-1751334/

“In any given culture and at any given moment, there is always only one ‘episteme’ that defines the conditions of possibility of all knowledge, whether expressed in theory or silently invested in a practice.” – Michel Foucault, The Order of Things: An Archaeology of the Human Sciences

When the movie Johnny Mnemonic came out in 1995, I often heard people mispronounce the word “mnemonic” as “pneumonic”. I speculated at the time, rightly or wrongly, that more people knew the term “pneumonia” than “mnemonic” or maybe one was simply easier to say. Strangely enough, I’ve now associated the concepts of memory and sickness because of the similarity, and confusion, of those two words. Maybe a pneumonic device helps you remember something…or maybe it helps your lungs function properly. Are we concerned about another pneumonic plague or a pending mnemonic plague? Who can say?

The Mnemonic Plague

If we live in a knowledge zeitgeist, it may someday be defined as the death of knowledge. Or, more accurately, the death of the belief in knowledge and expertise in favor of opinion and belief. The results of successful misinformation campaigns include a skepticism of expertise and knowledge, feelings that opinions are equal to or supersede knowable facts, or even the inability to know anything at all. Postmodern philosophy has concerned itself with the idea that there is no objective truth, either foreshadowing the imminent knowledge paradigm or driving it. Baudrillard predicted, quite accurately, that simulations and simulacra would become so prevalent that all meaning would be meaningless.

Add to this another sociological trend, the current workplace generational turnover from a larger, knowledgeable, older generation to a younger generation buried in and swept up by these currents of skepticism. With reality changing so quickly, and so many people being distrustful of what reality is anyway, are we on the cusp of a mnemonic plague? A plague in which no one remembers anything? A plague in which we all doubt the ability to remember it correctly? A plague in which we are convinced by others that reality is not reality, facts are not facts, and that memory is susceptible to convincingly reality-sounding alternative histories?

Johnny Pneumonic

When we began the great shift from paper to electronic documentation, the speed at which we would be able to create, store, and access information grew exponentially. Imagine a library which could feasibly contain all knowledge from human history across all countries, languages, ethnicities, and religions. An electronic library that would make the Library of Alexandria, the Library of Congress, and the Bodleian Library look like corner news stands. One library to rule them all. That happened, sort of, with the expansion of the World Wide Web. Vast troves of information were digitized and made available online even as people continued to create informational web pages and create documents which were born digital, never seeing the pulpified remains of a tree. In parallel, all the other kinds of news, information, and entertainment were also being born digitally and, in the great democratization of the Internet, anyone anywhere could theoretically have access to the ability to view and create information assuming they had some kind of device and network or satellite infrastructure.

Imagine the possibilities! Imagine how knowledgeable we could all become! Imagine knowing anything, anywhere, at any time, at the moment you need to know it! And some of that is true. What is also true is that people create patently false information reflecting their personal beliefs and agendas. There are millions of mediums, channels, and formats available to anyone to tell us what magical healing herbs will make us live longer and with fewer wrinkles; entire video libraries dedicated to documenting the horrors of vaccines and Western medicine; travel logs with erroneous anecdotes and false facts with the only verification being the person who created the content; videos of lizard-people trafficking human children; videos of directors faking the moon landing. All of this available to anyone, anywhere who wants to be an audience.

Whether an individual or a state actor, whether the content creator believes their own content or not, whether we can decipher what is true and what is not, all of this information is there for us to consume and form our own opinions about. That is freedom. That is democracy. That is the democratization and platform of eight billion voices. We no longer have to be slaves to a single narrative; no, we can write our own narratives and gather our own followers, forming fragmented communities in a new-world Splinternet in which we can all, finally, show everyone else a picture of what we had for lunch.

The Codification of Memory

I could say things were better back in the good old days, whenever those were and for whomever they were good, but that’s also an opinion, not a fact. The genie is out of the bottle. The train has left the station. The die is cast. We have crossed the Rubicon. The horse has left the barn. We now live in a world of snowclones, memechés, and information of dubious provenance. Compounding the intersection of easily creatable and accessible information, skepticism of expertise, and generational turnover is the ever-improving use of AI tools to generate very impressive, lifelike, and believable images to support textual messages. Don’t believe we faked the moon landing? Here’s a photo! Didn’t that famous Hollywood actor appear in a porno? Here’s the video!

And now for something completely different. Or, rather, here is me now getting to the point for my target audience of information professionals: what do we do in the face of all this misinformation? Our jobs…the best we can.

I’ve been working on taxonomies for over 20 years. At various times–sometimes from one year to the next and other times one hour to the next–I have experienced that overwhelming sense of doom, frustration, or hopelessness in the face of shockingly ignorant misinformation and opinions. I have long since abandoned any hope for the Semantic Web’s promise that real, verified meaning could be captured in logical, formalized, human- and machine-readable structures like taxonomies and ontologies. Despite the acknowledgement, both philosophical and practical, that there is no Truth with a capital “T” and that all facts are context-dependent, I still try to build semantic structures that codify the truth of my organization at the moment we are building it.

In the context of any moment, there are truths we can build and verify. We can create preferred concepts and connect them with human-readable, sensible relationships and build taxonomies into interconnected graphical knowledge bases applied to a variety of content vetted by subject matter experts. Verified and vetted content may no longer be true in a week or a year or a decade. The relationship between two concepts may quickly become outdated. A preferred concept may fall out of use or fashion. Despite all of this, codifying knowledge in taxonomies and ontologies is not an act of futility; it is an act of capturing truth and memory at a moment. We can document these changes over time and look back over the history to see what was true versus what is true now.
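One way to picture “what was true versus what is true now” is a dated history of a concept’s preferred label. The sketch below is only an illustration under assumed names: the `LabelVersion` record, the `label_as_of` helper, and the sample dates are invented, and real taxonomy tools track change history in their own ways.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LabelVersion:
    """A preferred label and the date it took effect (hypothetical record shape)."""
    label: str
    valid_from: date


def label_as_of(history: list[LabelVersion], when: date) -> Optional[str]:
    """Return the preferred label in force on a given date, or None if none existed yet."""
    current = None
    for version in sorted(history, key=lambda v: v.valid_from):
        if version.valid_from <= when:
            current = version.label
    return current


# Invented example: a concept whose preferred label fell out of fashion.
history = [
    LabelVersion("Telecommuting", date(2005, 1, 1)),
    LabelVersion("Remote work", date(2020, 3, 1)),
]

print(label_as_of(history, date(2012, 1, 1)))
print(label_as_of(history, date(2024, 1, 1)))
```

Keeping the older label on record, rather than overwriting it, is what lets the next person see both the current truth and the one it replaced.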

These semantic structures are a way we can document organizational knowledge and pass this knowledge on to the next employee, the next generation, the next iteration of a company. Creating good historical records in semantic models serves a knowledge management function in being one way we can enable knowledge handover from one person to the next, one team to the next, one project to the next.

The organizational book of knowledge as written in taxonomies is often edited and changed, but it is still a book to which we can refer with confidence that what we tried to build was accurate and true to the best of our abilities. Taxonomies are the new black. Ontologies are the mother of all semantic structures. In space, no one can hear you taxonomize…but your work is still valuable.

KMWorld Themes and Trends

https://pixabay.com/illustrations/cyborg-hand-human-robot-contact-9091910/

“At a certain level of machine-ness, of immersion in virtual machinery, there is no more distinction between man and machine.” – Jean Baudrillard, Violence of the Virtual and Integral Reality

I attended the KMWorld and co-located conferences held in Washington, D.C., November 18th – 21st. Across the conferences and sessions that I attended and the conversations between sessions, there were several themes which resonated. I’ll detail some of those themes and highlight any talks in particular I thought inspired or captured those trends. I couldn’t attend everything, so please forgive me if I’ve overlooked great talks, of which there were many.

Taxonomies as an Enterprise Strategy

The importance of taxonomies and ontologies as a foundational business and technical program was particularly evident at Taxonomy Boot Camp, but the theme also came up in talks at the other conferences. I presented on the topic in my session, “Stand Still Like the Hummingbird: Enterprise Taxonomy Strategy When Nothing Stands Still”. Likewise, Thomas Stilling in his Monday keynote, “Be the Change: Your Taxonomy Expertise Can Help Drive Organizational Transformation”, emphasized metadata as an organizational strategic asset and discussed how to position taxonomy to align with strategic objectives. Similarly, Lindsay Pettai’s session, “Empowering Your Enterprise With a Dynamic Taxonomy Program”, discussed the importance of having an enterprise taxonomy program.

These are just a few examples. Throughout the conferences, the notion that taxonomies and ontologies provide structured, source-of-truth values for navigation, search, and content tagging was prevalent. The main theme, however, was how critical taxonomies and ontologies are to machine learning (ML) model training and to the success of the programs built on it. In talk after talk, artificial intelligence (AI) and ML were top topics, and speakers describing how to make projects viable and successful spelled out what needed to be in place. Taxonomies and ontologies were among those foundational requirements.

As a 16-year veteran of KMWorld associated conferences–my first being Enterprise Search Summit West in 2008–I have seen the more recent embrace and understanding of enterprise taxonomies and ontologies beyond simple use cases like navigation. Even just a few years ago, taxonomies seemed to be misunderstood or undervalued by many conference attendees. Now, talk of their use as enablers of AI was ubiquitous.

Artificial Intelligence Is Here to Stay

I don’t believe I saw a talk which didn’t include the keywords “artificial intelligence”, “machine learning”, “AI”, or “ML”. I was unable to attend KMWorld last year, but in years past, any conversation on the role of AI/ML was frequently met with scoffs, eyerolls, or fear. In KMWorld’s Thursday keynote, “KM, Experts & AI: Learning From KM Leaders”, Kim Glover and Cindy Hubert captured something important when they said that AI is driving a lot of emotions. Some of these emotions are excitement, skepticism, and fear. Addressing these emotions is going to be essential in addressing the technologies.

The overarching theme is that AI/ML is here to stay, so what are you going to do about it? Continue to reject it and fall behind the curve? Embrace it without question? Embracing ML models, including LLMs, with guidance and guardrails seemed to be the message most were conveying. While many organizations are still in the proof of concept (PoC) project stage, adoption seems to have already arrived or is pending, so get ready to embrace the change to be successful.

The exponential growth in ML has been fueled by cheaper storage, faster technologies, more robust and accessible models such as ChatGPT, and built-in technologies like Microsoft Copilot. Not only are these tools more accessible, they are more accurate than they have ever been in the past…for the right use cases. And this was another big takeaway: AI/ML is here to stay, but use these technologies for the right use cases. Document summarization and generative AI text generation are big winners while tools used for critical decision making are still improving and must be approached with caution.

Human-in-the-Loop, Cyborg, or Sigh…Borg?

Overwhelmingly across sessions, the consensus was the importance of the human in the use of AI/ML technologies. For both the input and the output, to avoid garbage in and garbage out, human beings need to curate and review ML training sets and the information that an ML model outputs. As noted above, document summarization may be low risk depending on the industry, but decision-making in areas like law and medicine is high-risk and requires humans-in-the-loop.

In his session, “Evolving KM: AI, Agile, & Answers”, the inestimable Dave Snowden discussed storytelling–or, rather, talked about storytelling through the art of storytelling–as an intimately human activity. He noted the deeply contextual nature of human knowledge and the need for knowledge management to generate knowledge rather than store it as codified data. If knowledge cannot be fully codified as data, then it is difficult or impossible to transfer that level of human knowing into machines and their models. The connectivity to knowledge and deep metaphor was evident in my three takeaways from his session: his notion of a tri-opticon integrating knowledge between three areas of an organization and the connection to Welsh Triads; forming teams or roles of threes, fives, or sevens, Welsh Rugby, and the number of people on a rugby sevens team; and assemblages, mycorrhizae, and rhizomatic networks in Deleuze and Guattari’s “A Thousand Plateaus”. These kinds of connections are weak use cases for machine learning and emphasize the importance of human knowledge and inclusion in AI technologies.

And, speaking of human knowledge, of course knowledge retention and transfer were key topics in an era when both age and work-from-home job mobility are creating more job turnover than ever. While human knowledge is difficult to capture, using AI agents to capture knowledge and ML models to parse textual information will assist in ramping up the sheer scale and increasing pace necessary to retain and transfer knowledge.

The human-in-the-loop, the cyborg, and, sigh…Borg, are all going to rely on knowledge and ethics in order to create a human-machine interactive paradigm. If humans don’t use our own ethical boundaries to curate machine content, then bad data, and bad ethics, will spread throughout our information systems.

While there was some fear and loathing in D.C., there was a stronger current of hope, optimism, and curiosity in the many ways we can use taxonomies and ontologies, AI/ML, and human knowledge management together to guide us toward a brighter future of technology use.

Specialty Skills and the Risks of Hidden Complexity

https://pixabay.com/photos/statue-camouflage-hidden-discreet-2393168/

“The Cube is, at the same time, a symbol of simplicity and complexity.” – Ernő Rubik

Taxonomists and ontologists–sometimes separate but more often one and the same role I’ll simply call a “taxonomist”–are frequently underappreciated for their work in modeling and representing complex domains. Taxonomists do not simply suggest term forms and put them in a list; they research and consider each concept, carefully map out its relationships to other concepts, and consider how new additions and changes impact the overall semantic model. The nature of taxonomy work is often misunderstood, or, worse, trivialized as work that anyone in the organization might perform. Several expert taxonomists discussed this topic as a pain point in communicating the value of taxonomy at the Henry Stewart Semantic Data 2024 Conference in New York this week.

Taxonomies and ontologies need to be clear, transparent, and wholly visible and available where appropriate. Serving different use cases drives user experience decisions which lean toward simplifying the display of complex semantic models so users can more easily understand and use them. While hiding complexity is a service for end users, it can communicate the result of taxonomy work as simplified dropdown lists, shallow hierarchical structures, or simple concept values without additional properties and relationships. The complexity of semantic models, including the taxonomist’s work in building them, can be hidden and thus misunderstood.

Hidden Work

Semantic models are frequently hidden from users due to their complexity and necessarily strict governance models. The overall semantic structure, potentially composed of many taxonomies connected by robust ontologies, is not always available for the end user to view as a whole or to interact with directly. Hiding the complexity of semantic models can simplify user experiences, but it can also present fractured, contextless concept values. Worse, the art and science of semantic modeling conducted by highly skilled and knowledgeable taxonomists is also hidden…or at least potentially misrepresented.

The implications of this hidden work are manifold. Because the skilled work of taxonomists is simplified, so too does the portrayal of semantic work seem simple. The result is inexperienced employees taking on taxonomy and ontology modeling tasks, potentially in parallel to ongoing enterprise taxonomy efforts. The fragmentation of taxonomy work in turn impacts the success of the overall taxonomy program as an enterprise strategy.

The value of taxonomy work in supporting a variety of use cases is also undercut. Taxonomy is reduced to simple hierarchies used in navigation or flat lists used in content tagging. The true value of complex, interrelated concepts with additional property metadata is lost. End users, or potential end users, don’t know the art of the possible or even what to ask for from enterprise taxonomists.

Exposing Semantic Models in the UI

Systems consuming taxonomy values are an abstracted layer, presenting concepts in navigational structures, as typeahead values in search boxes, or as limited flat lists for tagging content or assets. Because of this, the additional value of ontological relationships and property metadata is stripped away. Of course, designing user experiences to meet the use case and deciding when it is appropriate to display taxonomy values, how many, what type, and what is possible are all aimed at organizational efficiency and clarity. Sometimes too much is just too much.

However, consideration is not always given to what else from a semantic model may be displayed, and in what way, so that users get more contextual value from taxonomy concepts shown out of context. How much information associated with a taxonomy concept should be exposed and in what context? Concept definitions or synonyms might be valuable to expose to end users so they understand the label and what else it may represent. These could be displayed in a hover menu so they are available but unobtrusive.

Similarly, taxonomies need not always be displayed as flat lists or hierarchies. Taxonomies are natural candidates for graphical visualizations to display interconnected values and associated metadata. Many existing taxonomy and ontology tools include APIs or widgets meant to display taxonomies in just this way external to the tool. Publicly available visualization libraries can be used in custom, in-house built UIs to display and navigate graphs. A graphical user experience can display the complexity of semantic models, including hierarchical and associative relationships, while easing their presentation and navigation.

Taxonomies and ontologies are foundational components in the training and refining of machine learning models, including their use as a source for tagging machine learning training sets and as institutional reference for large language models. Graphical displays have the benefit of exposing the full scope of semantic models for more complex use cases like serving data to machine learning models. 

Another way to expose the full scope of semantic models is to provide partners with read-only access to taxonomy and ontology management systems. These tools typically have few editing licenses, which should be reserved for members of the taxonomy team to maintain proper model governance. Read access, when available, should be reserved for business partners who understand, or are willing to be trained in, how to navigate complicated platforms.

The Benefits

There are several benefits to getting creative about how and what to expose from semantic models.

The main benefit is discrete, visual communication of, and training in, the complexity and possibilities of semantic models. Good UIs don’t simply provide a way of interacting with machines; they also educate and train the end user on common principles such as clicking on a button to perform an action, pinching in and out to view content, and, in this case, how to traverse hierarchies and graphs. From this, end users may come to their own realizations about how semantic models may be used in their projects.

Another benefit is increasing end user awareness and the visible profiles of the taxonomists and taxonomy program. Knowing how information is provided to consuming systems and who is responsible for creating, modeling, and maintaining that information provides a point of contact for end users who can field their questions. Since consumers don’t always know the full range of possible semantic model use cases, they are provided a path to get modeling advice. Taxonomists increase their visibility and can act as internal consultants on a range of organizational projects. In turn, their valuable skills are employed in many areas of information management, improving the overall health of information best practices.

In sum, don’t hide your taxonomists or their work. Get creative about how to expose what they do and the results of their professional skills.

Modeling a Moving Target

https://pixabay.com/illustrations/ai-generated-pipes-industrial-9043429/

“Like a wave in the physical world, in the infinite ocean of the medium which pervades all, so in the world of organisms, in life, an impulse started proceeds onward, at times, may be, with the speed of light, at times, again, so slowly that for ages and ages it seems to stay, passing through processes of a complexity inconceivable to men, but in all its forms, in all its stages, its energy ever and ever integrally present.” – Nikola Tesla

Some of the most thorough, authoritative, and well-constructed controlled vocabularies have been built and curated over the course of decades. The NASA Thesaurus was first published in 1967. The Library of Congress Subject Headings can be traced back to its roots in 1898. The MLA Thesaurus was modernized in 1981 based on previous classification methods. The Getty Art & Architecture Thesaurus was started in the late 1970s. There are many more examples of well-established controlled vocabularies, some older and some newer. Maybe there’s something to the adage “with age comes wisdom” in the longevity and authority of these vocabularies.

Many of these longstanding vocabularies serve the academic system and do not necessarily “move with the speed of business”. In a business environment, the emphasis is typically on speed, agility, and efficiency. None of these qualities are in opposition to the typical development of good controlled vocabularies, but a fast-paced organizational environment can certainly be difficult to support with vocabulary development that can’t keep up.

In my experience, product vendors talk about how to speed up taxonomy and ontology development while semantic practitioners (librarians, taxonomists, ontologists, etc.) regularly have to strike a balance between maintaining quality semantic models and serving the business needs. The speed at which semantic models need to be developed in a business setting often leads to the extension of inappropriate taxonomy use cases in order to support and facilitate immediate business priorities. We need to step back and ask ourselves how we can as taxonomists effectively support the business while addressing what should be modeled in taxonomies and what should not.

In this blog, I’m going to focus on two closely related cases of data which I think are not appropriately modeled or maintained in a semantic model but are often brought up as examples of serving the business quickly.

Processes & Sequences

A business problem I’ve seen in my roles as a taxonomy management platform product manager for a vendor and as a taxonomy practitioner is the modeling of processes as taxonomies.

There are several difficulties with modeling processes as taxonomies, which is why I tend to discourage their inclusion as part of the overall enterprise taxonomy and ontology semantic framework. First, steps in a process aren’t usually semantically hierarchical. For example, say you are asked to support a step-by-step process for submitting a ticket in a request tool such as those offered by ServiceNow or Atlassian in a centralized taxonomy. An IT service ticket can be opened, assigned, reassigned, resolved, closed, and reopened. Building these steps as a hierarchical taxonomy doesn’t convey their true relationship to each other. You can build these steps as a flat list, which addresses that problem, but then you have a sorting issue of making sure the steps are displayed in the proper order in the consuming system. Since the actual ticket requests live in a consuming ticketing system while the controlled values naming each step in the process live in a taxonomy management system, this may not be a problem at all. The ticket is an object that is simply tagged with a new value at each stage in its lifecycle, regardless of the flat or hierarchical structure in the taxonomy. So, the first problem may be solvable, but it needs to be addressed.

The second issue is that processes are rarely progressive steps from beginning to end. The stages of a ticket may move up and down a flat list or hierarchy, often skipping steps in between and moving back to a previous step. Again, if the ticket is tagged from a flat list which is properly ordered for display, this may not be an issue. If the consuming system must follow the flat list or tree order, however, there may be challenges in changing the value to a previous step. Where this gets complicated, however, is in processes in which the same steps are repeated by different groups in the organization. A contract, for example, can go through contract review by the vendor, by the business team representatives, by legal, and again by the vendor’s legal. Modeling a process like this taxonomically often means that these repeated steps are either appended with the team or entity conducting the process step (business review, legal review, etc.) or repeated as children under organizational group headings, creating a polyhierarchical nightmare.

The final and most difficult challenge with modeling processes is that they often change. In another example, modeling the customer journey from start to finish is much like modeling the stages in an IT ticketing process. The customer rarely moves neatly from each step to the next. More important, however, is that marketing processes are frequently overhauled with changing values anytime there are changes in business strategy. Even when the values don’t change, their order in the process does, and these orders are often reflected as navigational structures represented as filters on the front end. Representing a customer journey based on filterable values is challenging because of the numerous ways a customer may enter the UX pipeline. In retail apparel, for example, they may start looking for products at Gender (Men’s, Women’s, Children), then apparel type, then size, then filtering by material or color. Or, the customer may start with color, then apparel type, then size. There is a process here, but one with multiple entry and end points. Trying to represent this as a taxonomy is incredibly difficult.

A similar problem I’ve come across in taxonomy modeling is using lists or hierarchies to indicate sequence. Most processes have steps that are in sequence, but in this case I’m talking about strictly fixed sequential order. Examples can include the order of books or films (by date or by narrative order), the list of U.S. Presidents in order, or events by date.

The fundamental issue for most taxonomy management systems, and for many systems relying on taxonomy data for that matter, is the default to alphabetical display in lists. For most taxonomy hierarchies, alphabetical is the preferred display order, with each cascading branch also alphabetized. On most front end websites, navigational taxonomies are ordered by use, with the most prevalent ways to access information listed first. Front end website platforms are built for this type of information display because the hierarchies do not necessarily follow what I would call semantic or “is a” taxonomy practices; parent-child relationships are based on filtered drilldowns, not on strict contextual meaning. Where this becomes an issue is when an organization, quite rightly, wants to consume centralized taxonomy concepts for the front end experience.

Even when using the larger graph underlying taxonomy and ontology structures, conveying order can be challenging without the right functionality to support it or the ability to leverage the model in consuming systems that can only handle flat lists or shallow hierarchies.

Modeling Options

Modeling sequences isn’t out of the question in taxonomy management systems assuming that 1) the taxonomy and ontology management system includes functionality supporting modeling options, or 2) downstream systems can effectively handle or transform concepts received from a centralized taxonomy.

To model steps in a process or to capture sequences, there are a few options. If you are using a tool supporting RDF and core SKOS elements, you can consider using skos:OrderedCollection. An ordered collection is exactly what it sounds like: a collection of concepts put in a specified order. Using an ordered collection for a list of concepts in a branch of taxonomy allows listing those items in the order desired. There may be no other indicators of why those concepts are in that order if stripped from their contextual parent, but it will force sort a group of concepts. This assumes, of course, that the consuming systems don’t simply revert to alphabetical order once received.
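To make the ordered collection idea concrete, here is a minimal Python sketch. It only loosely mirrors how skos:OrderedCollection keeps a member list in a specified order; the class names are hypothetical, the ticket stages are borrowed from the earlier example, and a real tool would store this as RDF rather than Python objects. It also shows what happens when a consuming system reverts to alphabetical order.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """A minimal stand-in for a SKOS concept (hypothetical structure)."""
    pref_label: str


@dataclass
class OrderedCollection:
    """Loosely mirrors skos:OrderedCollection: members keep a specified order."""
    label: str
    member_list: list = field(default_factory=list)

    def labels(self):
        # Return labels in the order the taxonomist specified.
        return [c.pref_label for c in self.member_list]


ticket_stages = OrderedCollection(
    label="IT ticket stages",
    member_list=[
        Concept(s)
        for s in ["Opened", "Assigned", "Reassigned", "Resolved", "Closed", "Reopened"]
    ],
)

specified_order = ticket_stages.labels()
# A consuming system that reverts to alphabetical display loses the order.
alphabetical_order = sorted(specified_order)

print(specified_order)
print(alphabetical_order)
```

The specified order survives only as long as every system in the chain respects the member list rather than re-sorting the labels.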

A more flexible and sustainable way to model a process is to model semantically using “is a” rules and then leverage a true ontological structure and a graph to map the journey. This means modeling concepts in their one best location in one or more taxonomies as part of a larger domain model. Modeling this way leans into the strengths of an ontology by using associative relationships between entities to make a graphical representation of the order while also connecting the entities to their owners. So, for example, books or films may have relationships like is followed by / is preceded by and then can be connected to their authors, directors, and actors as part of a greater graph.
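As a rough sketch of this graph approach, the Python below stores a few films as subject, predicate, object triples and recovers the sequence by walking the "is followed by" edges. The predicate names (isFollowedBy, hasDirector) and the helper functions are made up for illustration; an actual ontology would define its own properties.

```python
# Each triple is (subject, predicate, object); predicate names are illustrative.
triples = [
    ("Casino Royale", "isFollowedBy", "Quantum of Solace"),
    ("Quantum of Solace", "isFollowedBy", "Skyfall"),
    ("Casino Royale", "hasDirector", "Martin Campbell"),
    ("Quantum of Solace", "hasDirector", "Marc Forster"),
    ("Skyfall", "hasDirector", "Sam Mendes"),
]


def objects(subject, predicate):
    """Return every object linked to a subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def walk_sequence(start):
    """Follow isFollowedBy edges from a starting concept to recover the order."""
    order, current, seen = [], start, set()
    while current is not None and current not in seen:
        order.append(current)
        seen.add(current)  # guard against accidental cycles in the data
        following = objects(current, "isFollowedBy")
        current = following[0] if following else None
    return order


sequence = walk_sequence("Casino Royale")
print(sequence)
print(objects("Skyfall", "hasDirector"))
```

The order lives in the relationships rather than in the display position, so the same concepts can still sit in their one best place in a taxonomy.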

Another option is to include a property on each concept. The property could be a field which indicates a numerical value listing its placement in the list. While this metadata field could be useful in the taxonomy user experience, whether or not these property values could be used in consuming systems to order items is still problematic. Furthermore, it gets complicated if it’s necessary to order multiple sets, all of them including the same numbers as an ordering property.
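A small Python sketch shows both the appeal and the complication of an ordering property. The field names ("set", "order") are assumptions for illustration, not properties from any particular tool. Sorting on the number alone interleaves sets that reuse the same numbers, so the consuming system also has to know which set each concept belongs to.

```python
# Concepts carrying a numeric ordering property (field names are hypothetical).
concepts = [
    {"label": "Thomas Jefferson", "set": "U.S. Presidents", "order": 3},
    {"label": "George Washington", "set": "U.S. Presidents", "order": 1},
    {"label": "John Adams", "set": "U.S. Presidents", "order": 2},
    {"label": "Resolved", "set": "Ticket stages", "order": 3},
    {"label": "Opened", "set": "Ticket stages", "order": 1},
    {"label": "Assigned", "set": "Ticket stages", "order": 2},
]


def ordered_labels(items, set_name):
    """Order one set by its numeric property, ignoring all other sets."""
    members = [c for c in items if c["set"] == set_name]
    return [c["label"] for c in sorted(members, key=lambda c: c["order"])]


# Sorting on the number alone mixes the two sets together.
naive = [c["label"] for c in sorted(concepts, key=lambda c: c["order"])]

presidents = ordered_labels(concepts, "U.S. Presidents")
stages = ordered_labels(concepts, "Ticket stages")

print(naive)
print(presidents)
print(stages)
```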

In advanced graph-based taxonomy and ontology management systems, there may be an option to use reification or RDF* to support metadata on triples. In this way, the ordering is embedded on the edges themselves. For example, books and films could include relationships with a release date. This could look something like “James Bond film” has release date [20061117] “Casino Royale” in addition to a broader/narrower relationship between the two concepts. There are several modeling options to make use of associative relationships with added metadata on the relationship edge.
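The idea of metadata on the edge can be sketched in plain Python by attaching a dictionary to each relationship. This is only an analogy for reification or RDF*, not an implementation of either; the tuple layout and the release_date key are assumptions, and the dates are U.S. theatrical release dates.

```python
from datetime import date

# (subject, predicate, object, edge metadata); the layout is illustrative only.
edges = [
    ("James Bond film", "narrower", "Skyfall",
     {"release_date": date(2012, 11, 9)}),
    ("James Bond film", "narrower", "Casino Royale",
     {"release_date": date(2006, 11, 17)}),
    ("James Bond film", "narrower", "Quantum of Solace",
     {"release_date": date(2008, 11, 14)}),
]


def narrower_by_release(parent):
    """Order a parent's narrower concepts using the date stored on each edge."""
    matching = [e for e in edges if e[0] == parent and e[1] == "narrower"]
    matching.sort(key=lambda e: e[3]["release_date"])
    return [e[2] for e in matching]


films_in_order = narrower_by_release("James Bond film")
print(films_in_order)
```

Because the date sits on the relationship rather than on the concept, the same film could appear in another sequence with different edge metadata without conflict.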

In sum, it’s not impossible to model processes and sequences in taxonomies, but it requires thoughtful modeling in the context of other existing business taxonomies likely sharing the same overarching business ontology. Moreover, thoughtful modeling may not move with the speed at which an organization wants to move, but slowing down and getting it right the first time can save a lot of painful rework later.

How Not to Say What You Mean

The author at the Panathenaic Stadium in Athens, Greece, the site of the first modern Olympic Games.

“If I had a world of my own, everything would be nonsense. Nothing would be what it is, because everything would be what it isn’t. And contrary wise, what is, it wouldn’t be. And what it wouldn’t be, it would. You see?” – Lewis Carroll, Alice in Wonderland

As I traverse the daily absurdities of life, I’m often caught between the dark, existential absurdism of Albert Camus (Camus was famous, but Sartre was smarter!) and the light, but also dark, comedy of Monty Python. “The absurd is born of this confrontation between the human need and the unreasonable silence of the world,” Camus wrote, but in French. Absurd, but similar in a sense to the absurdity of the Python sketch, “How Not to be Seen”. In this bit, several people “demonstrate the value” of not being seen because, once seen, one may just be shot or blown up. You just never know.

I am reminded of the “How Not to be Seen” skit because there are other things which demonstrate value but should not be seen. Consumers–that is, all of us–often have expectations which may not be known to retail companies. These expectations may be reasonable or unreasonable. We may expect a website to know exactly what we need or mean simply by landing on the home page or typing in a general search query. Things which should not be seen can in fact be fundamental to our contextual understanding and our shopping experiences. Patented, copyrighted, misspelled, or inappropriate words or concepts, for example, should not be seen in the wrong place or context. There are times when what needs (or wants) to be said can not be.

Taxonomists are word people driven by an insatiable compulsion to classify the world around them. Much of a taxonomist’s work is spent finding the right word form, and its synonymous variants and acronyms, so that a single word or phrase can represent a defined concept. Usually the intent of finding the right concept is exactly so that it can be seen, and understood, by end users. How can taxonomy professionals say what needs to be said without saying it at all? How can the valuable work of agreeing on common terms and phrases be obfuscated so that what needs to be said isn’t seen?

How Not to Say “Olympics”

Coming up in just a month is a perfect example of what’s not to be seen: the “Olympics”. Of course, we’ll all be seeing them on television, mottos and rings and all. However, using the word “Olympics” or any of their properties is highly restricted. A full explanation exists here: https://olympics.com/ioc/olympic-properties. I swim with a U.S. Masters Swimming (USMS) team. Despite being an athletic organization, USMS and its subdivisions can’t have any events called “Olympics” or “Paralympics” even if they are for fun and without official times.

During the period around the Olympics, people rightly get excited. They want to purchase athletic equipment or apparel to support their favorite teams and sports. How do retailers say “Olympics” without saying “Olympics”? In these circumstances, representing the “real” or intended concept while using or redirecting from the stated–or unstated–concept requires some trickery. While pondering how this plays out for retailers, many of whom may or may not have legal contracts with athletes who are past or current Olympians, I visited some sites to conduct some very unscientific and informal searches. In the spirit of not being seen, I won’t mention the sites by name or all of the search terms I used because I can only speculate about what is actually going on behind the scenes to make the front end behave as it does.

I started my search by visiting several retail sites that would have an interest in selling products related to the Olympics. These included athletic apparel sites and product aggregation sites. I used the term “Olympics” as my search query directly on each site’s home page. Here’s what I found.

On the first site, I got a page full of products which will very likely be worn by athletes at the Olympic Games. There were no words or phrases appearing in the product names or descriptions linking the products directly to the “Olympics”. I suspect this product page was manually curated in the web content management system and search terms and variants for “Olympics” all redirect to this page. On the same site, I put in the name of an Olympic athlete and was presented with a page with products appropriate to that athlete and the sport with no mention of either the upcoming competition or the athlete’s name.

On the second site I chose, I got a handful of products searching for the term “Olympics”. Not very motivating, but at least there was something. I then searched for an athlete this company sponsors and got a page about the athlete. However, there were no associated products, only information about how this athlete is sponsored by the brand. I’d call this a missed opportunity. Show me your profile of the athlete, sure, but also give me something along the lines of “This athlete likes products x, y, and z.” If I’m a fan of an athlete, I might just buy what that athlete endorses, even if what I can buy isn’t the customized equipment used by an Olympian.

On the third site, searching the term “Olympics” brought me to a single product page for a sporting event which will be at the competition. Fine, but underwhelming. That’s it? That’s your stake in these upcoming sports? While I couldn’t find a specific Olympic athlete sponsored by this brand, I did find teams wearing their apparel. When I searched for one of these teams, I got similar products, but none for the team for which I searched.

On the fourth site, I got a message saying there were no products matching the term “Olympics”. While this company has avoided using a search term which they have no legal right to display, they have also missed a huge opportunity to move product during one of the largest sporting events around. Every search should redirect to something, even if only notionally related, rather than end in an empty results page. As a consumer, I might bounce out of this site right here and now. I then searched for an athlete this company sponsors, and, almost unbelievably, the page again told me there were no related products. The term “Olympics” may not be available for use, but it was no secret which athlete this company sponsors. To not have any related products come up with their name was a big surprise.

On the fifth and final site, a retail aggregator, there seemed to be no restrictions on Olympics or brand name products, and some or most of the items available were themselves branded. To be honest, I don’t know how they get away with it, and I would need to talk to someone familiar with business law to get an answer. Perhaps there is a blanket agreement with the IOC (International Olympic Committee) allowing the term to be used in search, product names, and product descriptions since the term appeared in all of those locations. I doubt that many of the numerous sellers on the site have a direct agreement with the IOC, but I did find a few products that claimed to be officially licensed merchandise. Here, any and all search terms brought at least something back.

How Not to Say Your Competitor’s Name

In a world of fleeting social media trends, there is opportunity to ride a popularity wave by creating similar, or even highly contrasting, products. If you like product A, you’ll love the similar (but better!) product B. Or, you used to like product A, but it is so last week, so check out product B as the next trend. With this in mind, my next experiment was searching for competitors’ names and products across sites. Again, I searched directly on the company home page seeking competitor names or products.

Back on site one, searching the competitor’s company name defaulted to a page of undifferentiated products. Again, these are probably manually curated to line up against the competition’s offerings. Since my search was vague, the product offerings spanned just about everything. When I tried a more specific, and trending, product name, I was brought to a page of products which had similar characteristics. Again, there is no mention of the competitor or their product. However, there was a general sense of “if you like this, you might also like this”. I’d call this a win for the retailer. If I came looking for something from another brand expecting to find something similar from this brand, then I may have some brand loyalty and want an equivalent product without having to shop their competitor.

On the second site, I got a message saying there were no products matching the term for their competitor’s name. Similarly, I was not shown any products when I typed in the specific name of a competitor’s offering. Disappointing, especially since there are similar products to be found on the site. It is impossible to know what strategy, if any, this retailer is employing. Maybe they don’t see their competitors as a threat. Maybe they haven’t found a mechanism for presenting similar products. Maybe I just happened to pick a competitor’s name and product they don’t match to. Regardless, coming up with nothing prompts me to look elsewhere.

On site three, I got the same message saying there were no results: an empty set for my search term. When I typed in the name of a popular rival product, I was shown two products which had almost nothing to do with my search. They showed me something, but it gives the searcher the impression that this company doesn’t sell similar products, which is false. If you come to the site with intent, you can probably find what you’re looking for, but if you come with an open-ended search, you will likely be disappointed.

Site four brought up one product when a competitor’s name was typed into search. While the rival company does in fact sell similar products, it’s debatable whether the product this site returned was representative of the other brand. When I typed in a rival brand’s product name, I got no results. Again, an empty results set. The brand does offer a row of their most popular apparel beneath my “no results” message, but this may or may not bear any resemblance to the product I’m trying to compare.

Finally, site five yet again brought up numerous products both for brand and product names. I can’t speak to whether the products are genuine or knock-offs and whether the sellers have any rights to sell those products. What a few searches prove on an aggregate retailer site is that they carry anything and everything.

In summary, I believe some retailers are mapping competitor brand and specific product names to similar products they carry. I’ll speculate that they do this on the front end, manually mapping products the retailer carries to search terms which don’t show in dropdowns, navigation, product names, or product descriptions. In this way, consumers get a reasonable comparative product results set without the retailer getting into legal trouble for showing event names like the “Olympics”, competitor brands, or competitor brand products.

How Not to Say the Unsayable

I’ve been focused on retail sites because that’s the environment I work in today, but this idea of redirecting to the proper term came to my attention in my first taxonomist job as an assistant thesaurus editor at an academic institution. At this institution, subject matter experts manually indexed academic articles in a variety of humanities disciplines. Their index spanned over 100 years’ worth of articles and had been built up over the span of decades.

Over the course of their work, language changed and terms fell in and out of use. Maintaining a thesaurus of thousands of concepts displayed in a flat list made long-term curation and governance challenging. Terms usually fell out of use as indexing terms to match their use, or lack of use, in the text. Many of these zombie terms lived on in the thesaurus until sometimes they were found by an indexer and used again.

I apologize in advance for the terms I’m going to put into print here, but I think it’s important to underscore exactly what language was appropriate at the time the article was written and the difference between terms we use now to express the same ideas. One term that popped up in indexing was “mixed bloods”. It only took one glance to know that was not a term we should be using to index academic articles. As I recall, the article did in fact use that phrase, and it was probably an older article being indexed long after the term fell out of use. So, how do we remain true to the spirit of the article while also remaining true to the integrity of the institution and keeping up with appropriate and inoffensive language? In this case, common thesaurus parlance and structures were utilized, using the Use/Used For relationship to point from one term to another. The old term was turned into a synonymous (but hidden) non-preferred term carrying a “use” reference, and the more modern term, probably “multiracial”, became the preferred term carrying the reciprocal “used for” reference. A thesaurus redirect then addresses the current as well as all past indexing. Anyone typing in the former term will automatically be redirected to the newer term and the articles indexed with both concepts.
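The redirect mechanics can be sketched in a few lines of Python. The terms and article IDs below are neutral, made-up examples rather than anything from the actual thesaurus, and the function names are hypothetical. The point is that a search on a non-preferred term lands on the preferred term and still gathers articles indexed under either one.

```python
# Non-preferred term -> preferred term ("USE" references). Terms are illustrative.
use_references = {
    "moving pictures": "motion pictures",
    "aeroplanes": "airplanes",
}

# Articles indexed under whichever term was current at the time (hypothetical IDs).
index = {
    "moving pictures": ["article-001"],
    "motion pictures": ["article-002", "article-003"],
}


def used_for(preferred):
    """Derive the reciprocal 'Used For' list for a preferred term."""
    return sorted(t for t, p in use_references.items() if p == preferred)


def search(term):
    """Redirect to the preferred term and gather past and current indexing."""
    key = term.strip().lower()
    preferred = use_references.get(key, key)
    articles = []
    for t in [preferred] + used_for(preferred):
        articles.extend(index.get(t, []))
    return preferred, sorted(articles)


result = search("Moving pictures")
print(result)
```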

Another example, used frequently in the writing of Martin Luther King, Jr., is the term “negro”. I think this is a good example of intent and the problematic use of inappropriate language. When MLK, Jr. used the term, it was the “acceptable” parlance of his day and was used 15 times in his famous “I Have a Dream” speech. When keyword indexing a piece of literature like this speech, it’s important to capture the subject matter, but we need to recognize the now problematic terminology. Again, using redirects from appropriate terms to older indexed terms solves this problem and can be done again in the future if current terms fall out of favor.

The latter example can quickly lead down a slippery slope of intent and political speech. People say–and imply–what they shouldn’t all the time in political speech. How we recognize and address that speech depends on the context.

How Not to Say What You Want to Say with a Wink and a Nudge

Back to not saying what you want to say when you need to not say it. Riding quickly-changing trends can be a huge revenue vehicle for many companies, especially if they are able to pivot quickly and sell the message to the right audience. As I’ve said before, taxonomy development lags behind current trending language, both because of the amount of work involved in maintaining a vocabulary and because of literary warrant. While the concept of literary warrant was established for indexing written works, the principle still applies to fast-paced corporate environments. Add concepts to your taxonomies when they have proven their value or staying power. Anything trending or temporary should be managed another way.

Let’s look at an example. When Barbie was released on July 21, 2023, searches for “Barbie pink” and “pink” spiked according to Google Trends. In addition, searches for “Barbie” and “pink” in various combinations with specific brand and product names also rose. While a retail company might ride this wave and create collections of pink product landing pages, many do not have licenses with Mattel to use the term “Barbie”. Hence, products needed to be assembled based on terminology that in whole or in part, yet again, couldn’t be spoken or displayed.

What are some ways to do this while leveraging controlled metadata values from taxonomy in search? Taxonomists can request on-site search logs from the search team and analyze and mine these concepts for candidate terms. Analyzing search logs is valuable, but making an informed decision as part of the retailer search strategy is even better. Work with marketing and analytics and insights colleagues to identify top competitors and products and map them as appropriate to your company’s competing products. Consumers will be pleasantly surprised when they type in a competitor product and land on a page of very similar products offered by your company. These search terms can then be added to taxonomies strategically.

For anything with staying power (i.e., it has literary warrant), consider using skos:hiddenLabel to redirect to results. A skos:hiddenLabel is “A lexical label for a resource that should be hidden when generating visual displays of the resource, but should still be accessible to free text search operations” (W3C). A hidden label can be associated with a preferred label concept just like an alternative label. For each appropriate product, map it to a competitor or product name stored as a hidden label so that any search on the company’s front end using a term that can not legally be displayed redirects to the correct similar products tagged with that competitor name.
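Here is a minimal Python sketch of how a consuming system might treat hidden labels: searchable, but never rendered. The Concept class, the exact-match search, and the placeholder names ("RivalBrand Racer", "Trending Film Pink") are all assumptions for illustration; a real search stack would handle matching very differently.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical concept with preferred, alternative, and hidden labels."""
    pref_label: str
    alt_labels: list = field(default_factory=list)
    hidden_labels: list = field(default_factory=list)

    def matches(self, query):
        # Hidden labels take part in search matching like any other label.
        q = query.strip().lower()
        searchable = [self.pref_label, *self.alt_labels, *self.hidden_labels]
        return any(q == label.lower() for label in searchable)

    def display_labels(self):
        # Hidden labels are never returned for display.
        return [self.pref_label, *self.alt_labels]


concepts = [
    Concept("Pink apparel", ["Pink clothing"], ["Trending Film Pink"]),
    Concept("Running shoes", ["Trainers"], ["RivalBrand Racer"]),
]


def find(query):
    """Return the concepts whose labels, hidden or not, match the query."""
    return [c for c in concepts if c.matches(query)]


hits = find("rivalbrand racer")
print([c.display_labels() for c in hits])
```

The shopper who types the rival product name lands on the mapped concept, but only the preferred and alternative labels ever reach the page.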

This method brings up two important questions. Should you use skos:hiddenLabel or its equivalent in a consuming system to redirect from forbidden terms to mapped, and potentially displayable, concepts and products? Yes! Can you store and use forbidden terms in your taxonomy or other systems? Maybe! I have not researched the legality of holding concepts like this in a company’s data. While it’s clear there are concepts which cannot appear anywhere on a company’s site, it’s not clear (at least to me) whether that data can exist hidden in a database. The argument could come down to intent. A company can’t control what a user types into their search box and is recorded in the search index, but they can control the results that are shown. If a company intentionally redirects based on a stored concept, is that ok? My guess is yes since so many companies do it, but it could be that it’s not worth pursuing legally if it’s not on full display. I’ll make a point of looking into the subject more.

Another method along similar lines is to store terms which should not be displayed either as individual concepts within an organization’s taxonomy or as a completely separate taxonomy of concepts which should never be displayed but can still be tagged to content. This method carries more risk as concepts which should not be published for use or are published for specific use can quickly become dissociated from their intent by consuming systems and users. One tagging mistake may reveal a hidden concept on public-facing properties. While this method makes the hidden concepts clearer to manage on the back end, it can have serious repercussions if not governed properly.

When it comes to trending topics, a more flexible approach may be warranted. Not all concepts have staying power, as is evidenced by the trending “Barbie” searches. Therefore, trying to maintain these trending, quickly-changing concepts in taxonomies is unsustainable. In cases like this, using large language models (LLMs) or text analytics may quickly identify trending concepts from multiple sources, including organic and on-site search terms, product reviews and descriptions, and social media feeds. These trending concepts can then be mapped automatically to taxonomy values using semantic similarity to concepts in the product metadata or in the product titles and descriptions. Because these models can operate more quickly on incoming data, it’s possible to act on search terms as they come in rather than spending time codifying them in taxonomies for reuse when these terms probably won’t last long as a topic. Rather than being stored long-term, the concepts are kept in the model only long enough to create mappings to products and assemble product landing pages. These short-lived mappings can be purged regularly as trends change. Treating all concepts as viable ways to boost search, while being more strategic about which terms wind up as preferred or hidden labels in a taxonomy and which are essentially fleeting keywords to match trends, is a more mature level of taxonomy application.
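To make the mapping step concrete, here is a deliberately simplified Python sketch. It substitutes word-overlap (Jaccard) similarity for the embedding-based semantic similarity an LLM or text analytics tool would provide, and the labels, function names, and threshold are illustrative assumptions rather than anything from a real system.

```python
def tokens(text):
    """Lowercase a phrase and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two phrases (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def map_trending_term(term, taxonomy_labels, threshold=0.3):
    """Return the closest taxonomy label, or None if nothing is similar enough."""
    if not taxonomy_labels:
        return None
    best = max(taxonomy_labels, key=lambda label: jaccard(term, label))
    return best if jaccard(term, best) >= threshold else None


# Hypothetical taxonomy values already tagged to products.
labels = ["pink dresses", "blue denim jackets", "pink sneakers"]

# Transient mapping: built for the trend, purged when the trend passes.
trend_mappings = {term: map_trending_term(term, labels) for term in ["hot pink dresses"]}
print(trend_mappings)
```

Nothing is written back to the taxonomy here. The trend_mappings dictionary plays the role of the short-lived store that can be discarded once the trend fades.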

Like not being seen, not saying what you mean is a valuable skill. Ask any politician. Better, put it into practice in your searches and make some consumers happy.

Taxonomy Calling

https://pixabay.com/vectors/alien-greeting-hello-long-life-1292972/

“Hello, is it me you’re looking for? / ‘Cause I wonder where you are / And I wonder what you do / Are you somewhere feeling lonely?” – Lionel Richie, Hello

One of the most challenging activities in taxonomy work is communicating the value of taxonomy to potential business stakeholders. With so many shiny, promising technologies and methodologies, it can be daunting for the taxonomy strategist to win over converts to taxonomy use. Taxonomies and their applications are often misunderstood or are narrowly focused on a few common use cases like navigation. While business users can clearly articulate their needs, they may not be able to connect those needs to how taxonomies can be applied in the business.

The taxonomy strategist must be able to communicate the value of taxonomy while expressing the complexity of semantic structures like ontologies and their supporting technologies simply and succinctly to a variety of business stakeholders. 

Communicating the Value through Examples

To gain taxonomy users, it’s essential to communicate the value of taxonomy. One way to start is to seek out areas taxonomy can directly address, find examples of the current state problems, provide taxonomy-based solutions, and then communicate the findings to the business owners. This process can be initiated by the taxonomy strategist or by the business owners themselves, assuming, of course, they know to contact the taxonomy team in an effort to answer their need.

One simple, powerful example is to review a search-dependent organizational website–which could be an internal intranet or external, public-facing website–and collect examples of navigational and search barriers causing confusion, poor search results, or revenue-losing scenarios. For each example, provide an explanation of how taxonomy might help. For navigational issues, the taxonomy solution may be category restructuring or improved, facet-based results filtering aligned to the typical user journey. For search retrieval issues, taxonomy may be used for typeahead search keyword matching or to improve search relevance to include more accurate or additional results through content tagging or keywording. Navigation and search are often close to time-saving or profit-driving activities, improving the efficiency and bottom line of the organization. Search examples and their potential taxonomy solutions, therefore, are closer to the source of organizational revenue and make convincing use cases.

As generative AI becomes more prevalent in organizations, finding examples of general or inaccurate results and how an enterprise, domain-specific taxonomy (and ontology) can act as foundational training data to improve those results can result in convincing proof of concept projects. Generative AI and machine learning models can seem like magic to the average user who may not know the amount of time and data it takes to train a model to produce accurate and useful results. Providing examples of poor machine learning model output can illuminate the need for clean, accurate foundational data. As an organizational source of truth, taxonomies can provide such semantic data.

To overcome user confusion about what taxonomy means or clarify what they think taxonomy means, try starting with the end result and working backwards. When assessing the value of a new bathroom faucet, someone will look at whether the fixtures look appealing and whether hot and cold water come out as expected. Initially, no one is interested in the pipes. Taxonomy, unflatteringly, is the pipeline infrastructure providing clean water to downstream consuming systems. First show excellent search results or machine learning outcomes and then explain how taxonomy is the basis for those results. If business stakeholders are interested in taxonomy, all the better for your work and evangelization. If they aren’t, let them be impressed by the final state and develop a process of working together to get to and maintain that final result.

Communicating the Value through Time and ROI

One potential stakeholder hesitation may be the time it takes to perform discovery, conduct the build, and put taxonomy values into production. This process can take time in the initial business stakeholder relationship. Once established, however, the speed at which business users can request concepts and see them live can move as quickly as your organizational systems can handle. People often believe they need to “move at the speed of business”, which, ironically, they think is fast but is more often cumbersome, manual, and slow. What they want is the magical now in which thought is converted to action faster than Captain Kirk can have his shirt ripped when first confronting an alien species.

Machine learning techniques, once perfected, can offer the kind of rapid response business owners are looking for, but only after a lot of training. Specifically, a lot of training on assets and data tagged with taxonomy. Too often, the “magic” of artificial intelligence business users are sold isn’t artificial at all: it is thousands of hours of tagging content and training models to get the desired results. If done properly, there’s nothing wrong with using machine learning models to quickly react to trending topics or generate text on the fly. However, the slower growth of a taxonomy, as I cover in my blog The Taxonomy Tortoise and the ML Hare, actually creates speed in other areas, saving time in responding to consumers’ direct search queries and tagging content to train and evolve machine learning models. Communicating the need for time investment up front to generate time-savings later can be compelling.

Communicating taxonomy ROI, which I covered a few years ago in my blog for Synaptica, Running a Successful Taxonomy Campaign, can be extremely difficult. How do you explain how words become money? Again, show the examples. Mining successful and failed search results and mapping these to taxonomy as metadata tagged to assets can show a direct line between creating taxonomy concepts, applying them to content, and successful search results that end in a product purchase. Going back to time, time is money: time employees spend manually creating, tagging, and manipulating content which drives sales; time spent training machine learning models; time spent seeking information which has not been tagged with metadata. Ramping up taxonomy processes to more quickly tag content and put words into production will result in quicker time to money and realized ROI. While starting taxonomies can be slow at first, the more success the taxonomy strategist has in engaging business users, the more quickly the taxonomy is built out and covers the breadth needed to tag assets and express important concepts users are seeking.

Communicating Complexity

Communicating the nuance and complexity of taxonomies and ontologies may be necessary as the details of a pending or ongoing project develop. Few business contacts need to know the difference between a flat list, taxonomy, thesaurus, or ontology. In fact, I find there are disagreements about the differences even among practitioners. That said, users can come to the discussion believing that taxonomies are only hierarchical lists of terms. For most practical discussions, I use the term “taxonomy” to include flat and hierarchical lists of terms, properties, and hierarchical and associative relationships. I rarely bother with ontology concepts like classes unless they are necessary to meet the project objectives.

If these terms do need clarification, however, I often clarify with simplicity. Taxonomies are concepts (preferred labels) that include synonyms (alternative labels) and other metadata attributes (properties) and these concepts can be related hierarchically and through custom relationships (associative relationships). When discussing ontology, I usually state that taxonomies are the words you want to use and the ontology includes the rules for the words you want to use. For example, how concepts are grouped (classes), how they can be related to each other (domain and range constrained by classes), and whether certain properties can be made available for a use case (properties constrained by classes). That’s often all a user needs to know.
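The “rules for the words” idea can be shown with a small Python sketch. The class names, concepts, and the “has color” relationship below are invented for illustration. The point is only that the ontology constrains which concepts may sit on either side of a relationship (domain and range).

```python
# Taxonomy: the words. Each concept is assigned to a class (all names hypothetical).
concept_classes = {
    "Trail Runner": "Product",
    "Blue": "Color",
    "Footwear": "Category",
}

# Ontology: the rules. Each relationship names its allowed (domain, range) classes.
relationship_rules = {
    "has color": ("Product", "Color"),
}


def is_valid_statement(subject, predicate, obj):
    """Check a subject-predicate-object statement against the domain/range rules."""
    rule = relationship_rules.get(predicate)
    if rule is None:
        return False  # unknown relationships are rejected in this sketch
    domain, range_ = rule
    return concept_classes.get(subject) == domain and concept_classes.get(obj) == range_


print(is_valid_statement("Trail Runner", "has color", "Blue"))
print(is_valid_statement("Blue", "has color", "Trail Runner"))
```

The first statement satisfies the rule and the second does not, because a Color cannot be the subject of “has color” in this toy model.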

There are more advanced use cases, like machine learning, where the conversation is, in my experience, more a mapping of ideas than an education. Data scientists usually use all the same concepts as taxonomies and ontologies but may use different terms to express them. After one or two conversations, the mappings are understood and the complexity is simplified. It’s not often a data scientist needs convincing to leverage taxonomies, but getting on the same page with conceptual ideas is a good way to make taxonomy value clear.

In large organizations, there is usually information architecture complexity as well. Because of this, taxonomy can often become necessarily complex as values are consumed by and flow through various systems. Understanding this workflow is not always a prerequisite for understanding the value of taxonomy, however, and does not need to weigh down conversations with potential business stakeholders. If it does become necessary, simplify those information architecture diagrams into simple flowcharts between systems, showing at a high level how taxonomy concepts move from system to system and what they do in each.

Communicating as a taxonomy strategist is challenging, but it is a necessary part of the job if taxonomy is to show and prove its value in the organization.

Tempering Your “No Filter” Taxonomy

https://pixabay.com/photos/color-black-white-filter-hand-2246333/

“Do what they say / Say what they mean / One thing leads to another.” – One Thing Leads to Another, The Fixx

Are your consumers overwhelmed by a barrage of terms and phrases coming at them from your taxonomy? Are they at a loss when faced by seemingly endless and contextless values presented in a taxonomy-fed typeahead field? How can you ask your taxonomy to manner up and find its filter? 

Taxonomists love their taxonomies. They don’t want to see the progeny they’ve so lovingly curated act inappropriately, particularly when it comes to throwing out values which don’t have context or are incorrect for the tasks they are intended to support. As taxonomies mature, they grow both in depth and breadth, becoming more refined and nuanced while also covering more topics and use cases. Building meaningful taxonomies with clear semantic hierarchies is the desired goal when developing enterprise knowledge models for delivery to consuming systems. As an organization’s use cases grow more numerous, however, the complexity involved in delivery can often roll back into taxonomies, forcing compromises and workarounds which further complicate future scalability and context.

I touched on this conundrum briefly in my January blog, “Polyhierarchy and the Dissolution of Meaning”, in which I discuss separating semantic taxonomies from navigational and information access taxonomies consumed by downstream systems. It’s rare that consumers want every value from every taxonomy built as part of an overall enterprise ontology. Separating purely semantic taxonomies from the structures used for delivery can allow for several transformations which may not be possible in systems not designed for taxonomy management. For instance, delivery structures can allow for logical navigational hierarchies which don’t follow meaningful “is a” taxonomy building principles.

Separating taxonomies may not be necessary, however, if your taxonomy and ontology system, a transformation layer, or the consuming systems can filter or restructure taxonomy values into new hierarchies which are context appropriate. Following are several use cases I’ve heard which may require this kind of transformation.

A Little Bit of Everything from Everywhere

In this scenario, the consumer needs to combine many different types of concepts into a new, usually smaller taxonomy for use in tagging or navigation. For example, they want either individual concepts or branches from different and mutually exclusive domains recombined into a new, single taxonomy for something like a front-end or app experience. Keeping track of all these different values when potentially scattered across numerous taxonomies at different hierarchical depths can be very difficult for taxonomists to visualize and maintain. In essence, they need to see a deliverable sub-taxonomy within the larger taxonomies they have created.

One way to solve this is to use skos:Collection as a label which can be applied to individual or hierarchical concepts in order to group them for delivery. SKOS collections support grouping, but they don’t support hierarchy, so the delivered concepts may be presented as flat lists in dropdown or typeahead fields. Often, the retention of hierarchy loses meaning when consumed from collections anyway as the collection label can theoretically be applied to any concept at any level, making the retention of hierarchy difficult or even nonsensical.
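A minimal Python sketch of that flattening follows. The concept IDs, labels, and collection name are made up, and a real implementation would read skos:Collection membership from the taxonomy management system rather than from dictionaries.

```python
# Concepts scattered across two taxonomies at different depths (hypothetical data).
concepts = {
    "c0": {"label": "Footwear", "broader": None},
    "c1": {"label": "Running Shoes", "broader": "c0"},
    "c3": {"label": "Accessories", "broader": None},
    "c2": {"label": "Water Bottles", "broader": "c3"},
}

# A collection is just a named group of members; it carries no hierarchy.
collections = {
    "mobile-app-filters": ["c1", "c2", "c0"],
}


def deliver_collection(name):
    """Return the collection's members as a flat, alphabetized list of labels."""
    return sorted(concepts[cid]["label"] for cid in collections[name])


print(deliver_collection("mobile-app-filters"))
```

Note that “Footwear” and its child “Running Shoes” arrive as siblings in the delivered list, which is the loss of hierarchy described above.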

Adults Only!

Another fairly common use case is the consumer needing parent terms but not the children. Or, just as common, some but not all of the children or selected children at different levels of the hierarchy for a skipped-level hierarchical delivery.

In this scenario, it could be possible to use a designated property for each use case if the taxonomy management system API allows for filtering by a property or relationship. A property will likely be a Boolean flag to indicate it is either true or false for a use case delivery, and end consumers can choose to filter out all Boolean values set to false. A relationship could be associative or hierarchical in order to create a hierarchy in which a parent has one set of children for one broader/narrower relationship and another set of children for another broader/narrower relationship.
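The Boolean flag option might look something like the following Python sketch. The concept names and the “site_nav” flag are assumptions, and the logic for re-attaching a concept to its nearest included ancestor is one possible way to handle skipped levels, not a feature of any particular tool.

```python
# Each concept has a parent and a per-use-case Boolean flag (hypothetical data).
concepts = {
    "apparel": {"parent": None, "flags": {"site_nav": True}},
    "outerwear": {"parent": "apparel", "flags": {"site_nav": False}},
    "rain jackets": {"parent": "outerwear", "flags": {"site_nav": True}},
    "parkas": {"parent": "outerwear", "flags": {"site_nav": False}},
}


def nearest_included_ancestor(name, use_case):
    """Walk up the hierarchy until an ancestor flagged true for the use case is found."""
    parent = concepts[name]["parent"]
    while parent is not None and not concepts[parent]["flags"].get(use_case, False):
        parent = concepts[parent]["parent"]
    return parent


def filter_for_use_case(use_case):
    """Return {concept: delivered parent} for concepts flagged true for the use case."""
    return {
        name: nearest_included_ancestor(name, use_case)
        for name, data in concepts.items()
        if data["flags"].get(use_case, False)
    }


print(filter_for_use_case("site_nav"))
```

Here “rain jackets” is delivered directly under “apparel” because the intermediate “outerwear” level is flagged false for this use case.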

The downsides of both of these solutions are maintenance and functional feasibility. Can taxonomists develop and maintain properties and relationships for each different use case and make sense of them in the UI? Can the taxonomy tool represent hierarchies in which one parent term has different child terms with different broader/narrower relationships, especially if the hierarchical parent-child chain is a mixture of different broader/narrower relationships? In these cases, an associative relationship which stands in place of the broader/narrower relationship may work better in the system even if it doesn’t feed the concepts as parent-child to consuming systems.

Call Security!

What about concepts which need to be delivered to internal consumers but not to external consumers or partner organizations? In this use case, there may be much more at stake than user experience and presentation. If the wrong values are delivered to the wrong customers, there could be legal ramifications or new product releases could be leaked ahead of schedule. In this case, using a Boolean flag may enforce what values are sent to which consumers. Similarly, a status field in the system may be even better. Concepts in a “candidate” status should never be published to downstream systems until they go through a workflow in which they move through stages like “reviewed”, “approved”, and “published”. Similarly, having a “deprecated” flag or status can keep concepts which should no longer be used from flowing downstream to systems in which they can be tagged to assets or exposed in user experiences.
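As a sketch of how a status field and an internal-only flag could gate delivery, consider the Python below. The field names, status values beyond those mentioned above, and the sample concepts are illustrative assumptions.

```python
# Hypothetical concepts with a workflow status and an audience flag.
concepts = [
    {"label": "Winter Boots", "status": "published", "internal_only": False},
    {"label": "Project Aurora", "status": "published", "internal_only": True},
    {"label": "Smart Insoles", "status": "candidate", "internal_only": False},
    {"label": "Galoshes", "status": "deprecated", "internal_only": False},
]


def concepts_for(audience):
    """Deliver only published concepts, hiding internal-only ones from non-internal audiences."""
    return [
        c["label"]
        for c in concepts
        if c["status"] == "published"
        and (audience == "internal" or not c["internal_only"])
    ]


print(concepts_for("external"))
print(concepts_for("internal"))
```

Candidate and deprecated concepts never flow downstream in this sketch, and the unreleased “Project Aurora” reaches only internal consumers.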

I Know You Are but What Am I?

Probably one of the most common delivery use cases is the “everybody is special” problem. In this case, consumers are steadfast in their ambiguous use of a concept and insist that it must mean different things to different people depending on the context. In good taxonomy practice, we would opt to disambiguate homographs or other contextually dependent concepts: for example, “mercury (metal)”, “mercury (planet)”, “Mercury (god)”, etc. For navigational purposes, users won’t want to see these qualifiers as we use them in taxonomy practice. In these cases, context frequently provides the meaning needed at the time. If we go to a website and search for or choose “Brazil” as a location, this means the country. If we go to a website and search for or choose “Brazil” as a filter during the World Cup, we mean the Brazil National Football Team. If your taxonomies cannot separate “Brazil (country)” and “Brazil (football team)”, this built-in ambiguity is going to cause major problems across your internal data systems.

If the system functionality permits, using proper taxonomy practices and disambiguating concepts in the semantic taxonomies and then stripping off these parenthetical qualifiers based on use case can maintain both meanings while allowing for front-end ambiguity. Each term will have its own full label and, more importantly, URI, to tell us which was the country and which was the team. Ideally, the taxonomists may be able to influence internal business requesters in a direction which will not require this. “Brazil” is a country only and “Brazil National Football Team” is a football team only. But practitioners know it’s not that easy. We can either allow the ambiguity and deal with the consequences later or we can get clever about filtering taxonomies so we can deliver the concept consumers want without sacrificing the semantics behind the scenes.
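Stripping qualifiers at delivery time while keeping the URI can be sketched in a few lines of Python. The URIs are placeholders and the regular expression assumes qualifiers always appear as a trailing parenthetical, which may not hold for every taxonomy.

```python
import re

# Matches a trailing parenthetical qualifier such as " (country)".
QUALIFIER = re.compile(r"\s*\([^)]*\)\s*$")

# Disambiguated concepts in the semantic taxonomy (placeholder URIs).
concepts = [
    {"uri": "https://example.com/geo/brazil", "label": "Brazil (country)"},
    {"uri": "https://example.com/teams/brazil", "label": "Brazil (football team)"},
]


def display_label(label):
    """Remove the trailing qualifier for front-end display."""
    return QUALIFIER.sub("", label)


def deliver(concept):
    """Send the ambiguous display label along with the unambiguous URI."""
    return {"uri": concept["uri"], "display": display_label(concept["label"])}


for concept in concepts:
    print(deliver(concept))
```

Both concepts display as “Brazil”, but each delivered record still carries its own URI, so downstream systems can tell the country from the team.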

In my experience, which may be limited by the tools I’ve used and the organizations at which I’ve worked, having the ability to create this amount of taxonomy restructuring flexibility can be very difficult with a taxonomy management system alone and requires negotiations with intermediary transformation layers or consuming system owners to build out the functionality to enable more flexibility and scalability in taxonomy modeling.

The Taxonomy Tortoise and the ML Hare

https://pixabay.com/illustrations/aesops-fable-tortoise-and-the-hare-6570775/

“I knew I shoulda’ taken that left turn at Albuquerque.” – Bugs Bunny

For better or worse, much of my childhood was informed by Looney Tunes, Monty Python, and a diet of science fiction ranging from the profound to the disjointedly camp. As such, I expect the absurd and am wildly skeptical of easy answers. Additionally, my foundation of science fiction books and films compels me to speculate that artificial intelligence will become a more realistic probability in our lives with actions ranging from locking us out of airlocks and starting global thermonuclear war to providing answers to our most pressing global problems.

The long-promised advantages of artificial intelligence seem finally to be reaching a point at which they can be utilized for enterprise purposes, including parsing, and even understanding, large amounts of text and data at rapid speed. The recent successes raise the question: if machine learning models can operate on data at high volume and velocity, then why shouldn’t they be used to come up with answers on the fly based on large amounts of data internal or external to an organization? Well, in fact, they already are, and, in my opinion, they should, but not without some acknowledgment of absurdity and a certain degree of skepticism.

I’m a firm believer in defining semantic models in the form of taxonomies and ontologies to be used as a foundational schema for an organization’s data. One of the arguments against investment in taxonomies is the time it takes to create them and the amount of maintenance required to sustain them. In a world in which what is trending changes frequently, user tastes are fickle, and the jargon associated with these trends passes quickly, the desire to avoid the tortoise-like pace of building taxonomies in favor of other, faster technologies is tempting. But, as the hare who lost the race to the tortoise laments, “I knew I shoulda’ taken that left turn at Albuquerque.” Or, let’s consider checking the map before we go racing off in the wrong direction.

Let’s talk semantics. Putting it simply, ontologies are semantic structures which define one or more domains. They describe the types of things in the domain (classes), how these things can relate to each other (relationships, predicates, or edges), what labeled fields are used to describe these things (properties), and the instances of things (subjects, objects, or, more plainly, taxonomy concepts). Ontologies describing the general domain and taxonomies including the specific instances within one or more domains can be created as a map of your organization. These semantic structures represent the organization in all of its complexity. They specify the concepts important to the company and how these concepts relate to each other, data, and content. Once data or content is added, we can call this entire structure a knowledge graph.

In short, ontologies, taxonomies, and content are the organization’s view of itself, the world, and where it lives in it.

Large language models (LLMs) have the ability to generate text, answer natural language questions, and classify content. Most publicly available LLMs, like ChatGPT, are trained on publicly available information. It is also possible to supply these LLMs with your own training sets of documents and language samples to develop answers more applicable to your own organization. Wisely, many organizations tightly control what information can be presented to these AI tools to avoid company information leaks or supplying competitors with proprietary information.

What’s lacking in using these hare-rapid models, however, is the organizational perspective. They are very good at answering general questions and making factual assertions from text, but they require tailored training content with specific use cases in mind to generate answers specific to an organization’s needs. There can be a temptation to feed one of these models a large quantity of organizational content to train them faster. However, the span of topics, language, jargon, and acronyms used in an organization can yield unsatisfying or unpredictable results. Imagine, if you will, the amount and variety of content in any one of your company’s content management systems. Now imagine asking a machine learning model to analyze and make sense of it all without guidance. You can index all of your own content, but without a framework, what sense does it make?

At this moment, the hare and the tortoise must strike a deal if they both want to win. To improve the performance of LLMs and other machine learning models, a domain topology specific to your organization defining the concepts, their synonyms and acronyms, and how they relate to each other, can be used as a schema input into the model. Semantic models are, after all, assertions in the form of triple statements (subject-predicate-object). Ontologies establish factual statements as determined by your organization’s use cases and, hence, provide patterns which can be used by machine learning models. Lexical proximity can be gathered from taxonomy hierarchies (these concepts are more closely related because they share a parent-child relationship) and associative relationships (these concepts, separated across several taxonomies, are actually very closely related because they have a direct associative relationship between them). Semantic models provide factual statements, built slowly over time based on business use cases, which can augment and improve LLMs.
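To illustrate how triples encode proximity, here is a small Python sketch that treats a handful of invented subject-predicate-object statements as a graph and counts the hops between concepts. Using path length as a proxy for relatedness is my simplifying assumption, not a description of how any particular model consumes a taxonomy.

```python
from collections import deque

# Hypothetical triples: hierarchical ("broader") and associative ("related").
triples = [
    ("Rain Jackets", "broader", "Outerwear"),
    ("Parkas", "broader", "Outerwear"),
    ("Outerwear", "broader", "Apparel"),
    ("Rain Jackets", "related", "Umbrellas"),
    ("Umbrellas", "broader", "Accessories"),
]


def neighbors(concept):
    """Yield every concept directly connected to the given one, in either direction."""
    for subject, _, obj in triples:
        if subject == concept:
            yield obj
        elif obj == concept:
            yield subject


def distance(start, goal):
    """Breadth-first search: number of relationships separating two concepts."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in neighbors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None  # no path between the two concepts


print(distance("Rain Jackets", "Umbrellas"))  # direct associative relationship
print(distance("Rain Jackets", "Parkas"))     # siblings under a shared parent
```

The associative relationship puts “Rain Jackets” one hop from “Umbrellas” even though they live in separate hierarchies, which is exactly the kind of closeness a hierarchy alone would miss.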

Not only can we think of semantic models as a collection of factual statements according to your organizational domain and use cases, we can also think of them as a summary, requiring the LLM to ingest a lot less information to reach the same factual conclusion. For example, you can provide the model with a huge amount of training data stating that a particular SKU-level product is available in the color blue. If this is a factual assertion in your semantic models (Product name has color Blue), however, then this fact can be tagged to a single product representation in a database and in turn applied to thousands of real-world SKU instances. Semantic models distill and model thousands of instances of truths across an organization, summarized into a collection of ontology structural elements and taxonomic instances. Unlike the map in the Steven Wright joke, in which the comic tells us he has a map of the United States which is actual size, your organizational map can be represented at a much smaller scale.
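The summarizing effect can be sketched in Python as one assertion stored at the product level and read by every SKU. The product key, SKU codes, and “has color” property name are hypothetical.

```python
# One factual assertion stored once, at the product level (hypothetical data).
product_facts = {
    "trail-runner": {"has color": "Blue"},
}

# Many real-world SKU instances that point back to the single product representation.
skus = [
    {"sku": "TR-BLU-08", "product": "trail-runner"},
    {"sku": "TR-BLU-09", "product": "trail-runner"},
    {"sku": "TR-BLU-10", "product": "trail-runner"},
]


def facts_for_sku(sku_record):
    """Look up the facts a SKU inherits from its product representation."""
    return product_facts.get(sku_record["product"], {})


colors = [facts_for_sku(s).get("has color") for s in skus]
print(colors)
```

The fact is asserted once and resolved for each SKU on demand, rather than being repeated in the data for every instance.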

Yes, it’s certainly true that given large amounts of data, machine learning models or text analytics can identify all kinds of important concepts. These concepts (and fact assertions between concepts) can be a great pipeline to feed into taxonomy and ontology construction. I am skeptical of machine learning models generating taxonomies and ontologies based on organizational data and content unless there is heavy human-in-the-loop curation to reconcile those absurdities which I believe inevitably creep in. And, yes, it’s certainly true that this curation is potentially at a tortoise pace, but once these concepts and assertions are built into semantic models, the ongoing maintenance and governance demands less time and effort.

Those slow semantic model builds enable fast-moving machine learning models and LLMs to be grounded in organizational truths, allowing for expansion, augmentation, and question-answering at a much faster pace but backed with foundational truths as asserted by your organization.

Be the tortoise first and foremost and the hare will follow.