Tag Archives: machine learning
The AI-Taxonomy Disconnect

“Me, I disconnect from you.” – Gary Numan, Me! I Disconnect from You
I have seen a significant decline in the number of taxonomist positions available. Wondering if I was in a myopic bubble based on my location or choice of job boards, I’ve been asking colleagues if they see the same thing. The general consensus is, yes, there are fewer taxonomist jobs available even as foundational data structures become more important to AI tools. I have seen an increase, or at least a steady availability, of ontologist jobs, many requiring more technical expertise (the ability to create Python data ingestion pipelines and retrieval augmented generation (RAG) systems) than I have seen in the past. Again, this may be a matter of where I live or where I am seeking jobs, but it seems jobs in the taxonomy and ontology field are becoming more technical and less focused on the business operations side of working with stakeholders to provide analysis and guidance on semantic frameworks.
I suspect this shift is driven by a wider adoption of AI tools and a contraction in the job market. Employers seem to be seeking a single resource to do both the business and technical sides of implementing semantic structures and the foundational components for building out applications. I wonder if another factor is also the belief that large language models (LLMs) are a replacement for semantic models. If true, there are several reasons I can see that may be driving these beliefs.
Speed to Business
As I have touched upon before in my blog (Friction and Complexity and The Taxonomy Tortoise and the ML Hare), the manual, slower, but deliberate curation of controlled vocabularies can be seen as a roadblock to business speed and agility. There are several valid points here.
One area of pushback I see frequently in organizations is the response time from initially requesting a new concept, taxonomy branch, or vocabulary until it is available in production. The ownership of the concepts and data is, to some degree, taken out of the hands of the business users and put under the control of taxonomists who incorporate these concepts into centralized semantic models (taxonomies, thesauri, ontologies). Even with governance models including service level agreements stating turnaround times and taxonomy availability, not every group in the organization is going to see this centralized service as a benefit. Rather than wait for enterprise-level support, stakeholders may develop workarounds in services and tools to support their own use cases. As we know, this decentralizing of schemas and tools creates a fragmented landscape: differing terminology, uneven functional support in the tools available to manage taxonomies, and inconsistent processes in the way data and content are handled and tagged with metadata.
Similarly, enterprise taxonomists serving many areas of the organization may seem too domain agnostic to serve the variety of use cases served by controlled vocabularies. While taxonomists do not need to be domain experts in the areas covered by semantic models, the perception may be that the time they spend ramping up in a domain would be better spent having the subject matter experts in those domains build their own models. There is some validity here if the domains are truly standalone and operate to serve only those domain use cases. However, tying together various domain areas to form enterprise-wide knowledge graphs seems to be the direction most organizations want to go. If that’s the case, then a centralized taxonomy team as a service to the entire enterprise makes a lot of sense.
Given these counterpoints to slowly developing semantic models, why not, one may ask, simply ask machine learning models to provide the schemes we need to organize and optimize information?
Words Are Words Are Worlds
Are LLMs seen as a replacement for, or at least a viable alternative to, taxonomies (using the term as a broad umbrella for all controlled vocabularies)?
A large language model (LLM) is a language model trained with self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially language generation. The largest and most capable LLMs are generative pre-trained transformers (GPTs) and provide the core capabilities of modern chatbots. LLMs can be fine-tuned for specific tasks or guided by prompt engineering. These models acquire predictive power regarding syntax, semantics, and ontologies inherent in human language corpora, but they also inherit inaccuracies and biases present in the data they are trained on (Wikipedia).
Certainly building your own language model reflecting all of the ways that natural language queries could be asked is a non-starter when there are clearly high-performing, public LLMs available at our fingertips. The chatbots ChatGPT, Google Gemini, and Microsoft Copilot are some of the more familiar tools based on foundational LLMs. One of the primary, and arguably most successful, use cases for these tools is language generation. When prompted, a chatbot can generate text, format this text based on instructions or examples, and produce a slick product which, short of a quick review to ensure the content checks out, is ready to go. These LLMs are based on a “vast amount of text”; in short, a lot more text than you will be able to provide to train a whole model.
What LLMs are missing, however, is your context. There are many, fairly easy, methods for providing them context. You can allow them access to your documents, at work or at home, so the chatbot can “see” the content of your documents, the structures, and even the writing style. That is very much your context, allowing the chatbot to compose in a way that reflects your home or work artifacts and generate new text combining the existing LLM with your specific context. Using additional context moves an LLM from generic to specific, from a wider world of words to your world of words.
At enterprise scale, the same method applies. If your organization is large, has a lot of public exposure and presence, and has a history of public interaction on the Internet and in social media, LLMs are going to know a lot about you even without additional context. I think, however, that the same belief that has kept the market for reusable, one-size-fits-all taxonomies from ever taking off applies here: every organization, whether it is true or not, considers itself unique and special. Outside of certain industries like healthcare, life sciences, pharmaceuticals, and finance, which have well-defined, and often extremely complex, ontologies they can adopt, other industries or functions within a company do not. In my experience, marketing is a great example. Despite the common needs, I have never seen a marketing department adopt any public standard. They build from scratch even when a majority of the terms in their taxonomies are commonly available.
In these cases, building taxonomies and ontologies to add context specific to your organization provides LLMs with both words and structures modeling the world of your domain. In fact, it is becoming more common for the development of these taxonomies to be done through human-chatbot interaction. A taxonomist can provide glossaries, metadata schemas output in spreadsheets, and documents to give chatbots the raw materials to extract entities, cluster topics, compare values across documents, and perform other processes that once required text analytics tools and, sometimes, human intervention in the form of rule writing. The speed of taxonomy and ontology development is increasing. Like other iterative feedback processes, the taxonomist and LLM work together to create domain schemas in the form of taxonomies and ontologies that provide additional guidance to future “manual” and automated processes with LLMs in the mix.
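As a rough illustration of that interaction, here is a minimal Python sketch of the glossary-to-candidate-branches step described above. The call_llm() wrapper and the JSON output format are assumptions for illustration; any chat completion service and whatever review workflow your organization uses could stand in for them.

```python
import json

def call_llm(prompt: str) -> str:
    """Hypothetical wrapper around whatever chat-completion service your organization uses."""
    raise NotImplementedError("Wire this to your own LLM endpoint.")

def propose_taxonomy_branches(glossary: dict[str, str]) -> dict[str, list[str]]:
    """Ask the model to cluster glossary terms into candidate taxonomy branches."""
    terms = "\n".join(f"- {term}: {definition}" for term, definition in glossary.items())
    prompt = (
        "You are assisting a taxonomist. Cluster the glossary terms below into "
        "candidate taxonomy branches. Return JSON mapping a proposed branch label "
        "to a list of member terms, and do not invent terms that are not in the list.\n\n"
        + terms
    )
    return json.loads(call_llm(prompt))

# Example, once call_llm is wired up:
# propose_taxonomy_branches({"churn": "loss of existing customers",
#                            "upsell": "selling add-ons to existing customers"})
# The taxonomist then reviews, merges, and renames branches before anything is
# committed to the governed vocabulary -- the human stays in the loop.
```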
Pure speculation on my part, but is the niche and sometimes still esoteric and obscure field of taxonomy and ontology design being replaced, for better or for worse, with LLM use? Or, more specifically, is it being viewed as a replacement for taxonomy and ontology building and the experts who do it? As I stated above, I have seen a shift from the business of taxonomies to the technology of ontologies.
I Disconnect from You
In my opinion, the roles of a taxonomist and of a technically skilled ontologist are still separate. While many people in the industry have the skills to do the work of both, the paths to the two roles have been different. Many taxonomists have library science degrees. They likely have technical skills, but are more focused on the information science aspects of taxonomy and ontology development, interfacing directly with the business and providing other services, such as research, business analysis, and support for use cases relying on semantic models. Ontologists are typically computer scientists who can code and develop the technical infrastructure for ontologies. Finding resources who can, or who like to, do both has not been common. This may be changing. Certainly the roles now being posted are asking for both, with, in my view, a leaning toward the technical.
Again, speculatively, is there a shift toward more technical resources in support of rising AI use in organizations? Is there a move to cut out the intermediary roles of taxonomists to let the business owners and technical implementers of taxonomies and ontologies interface more directly? If so, what do organizations lose in the process?
The Connect
If I were reading this blog, I would think the author was trying to sell you on the value of taxonomists with more of the “soft” skills of research, business stakeholder interaction, and translation of business requirements into taxonomies that can be handed to the more technical resources who support their implementation and move them into actionable production. After catching up on a few seasons of Landman and hearing in my head at some point in every episode, “Brought to you by [insert name of large oil company]”, maybe I’m a little sensitive to reading between the lines. You don’t have to. I am selling you on the value of taxonomists for all of the reasons I’ve listed above. If there is indeed a shift from the business skills of taxonomists to the technical skills of an ontologist, instead of having the excellent skills of both, then your organization is missing out on a valuable resource who can work alongside AI technologies to bring business requirements and practical domain-building skills to bear.
In summary, I believe there are two necessary components to bridge the seeming disconnect between AI and the foundational data quality governance needed to make AI operational:
- A taxonomist who can
  - build taxonomies and ontologies to create domain-specific semantic models representing the business,
  - provide business analysis and requirements for technical implementation, and
  - be the human-in-the-loop working with AI tools to continue building out, expanding, and governing semantic models; and
- Technical engineers who can
  - operationalize ontologies by building data pipelines to AI tools, and
  - focus on the engineering aspects of sharing out and productionizing ontologies for use across the enterprise.
An enterprise with both of these roles isn’t creating unnecessary resource overhead or additional layers effectively slowing the path from automation to implementation; rather, the two roles work in harmony to clarify requirements and optimize the use of AI in a variety of applications meeting business use cases. Taxonomists and ontologists are bridges between the business and the technical implementation enabling business needs.
KMWorld 2025 Themes and Trends
“But I’m a creep / I’m a weirdo / What the hell am I doin’ here? / I don’t belong here.” – Radiohead, Creep
I attended the KMWorld Conference and two of the four co-located sub-conferences, Taxonomy Boot Camp and Enterprise AI World, in Washington, D.C. last week. It was the 20th Anniversary of Taxonomy Boot Camp, and although I raised my hand when asked if I had attended 10 or more, I had never actually bothered to count how many years I’ve been to the conference. It turns out, I have attended KMWorld and the co-located conferences for 13 years out of the last 18, with my first being the 2008 Enterprise Search Summit West when it was still hosted on the West Coast in San Jose, California. Over this span of time, I’ve seen trends rise and fall, fears realized and alleviated, promises promised and promises kept.
Every year there seems to be at least one theme I can carry away from the conference, and for several years, I’ve written about the themes and trends from the conference as I see them. Some of these blogs are hosted on the Synaptica website and on my personal blog at Information Panopticon. I will go into some of those themes and trends as I did last month in my recap of the Henry Stewart Semantic Data Conference and HS DAM.
After many discussions at the event with colleagues in the industry, I can confidently state there was one overarching theme: things are weird.
Things Are Weird
What’s weird in 2025? Seemingly, everything. While the wider world may be as weird as it’s ever been, the state of knowledge management, taxonomy, and the related fields covered by the conference has been supremely weirded, mostly by AI. There has been a rapid shift from feelings of dread that AI is coming for all of our jobs to an embracing acceptance that those who use AI as a work tool will maintain their place, or even excel, in the industry.
Despite this acceptance, the industry is still weird. In the world of taxonomy, we all understand and believe that AI and its need for clean, curated, semantic data will necessitate good taxonomists and taxonomy systems. However, while the bottom hasn’t completely fallen out of the market, there seem to be fewer jobs, more unemployed taxonomists, and lower salary and contract rates. Maybe it’s the generalized uncertainty in the tech sector driving the slump, but many people I spoke with at the conference think that it’s a misunderstanding of AI foundational needs that’s causing companies to lay off taxonomists and invest in AI more heavily.
What else is weird is how excited everyone is about AI but how few companies have managed to successfully bake AI practices into their processes. We’ve seen the incredibly successful use of AI in text summarization and generation, particularly when it comes to meeting minutes and the identification of who said what and what action items were decided upon. Agentic AI has had some success, and taxonomists seem to be working at skilling up in prompt engineering using already existing taxonomies and ontologies and generating new taxonomy options from generative AI prompts.
What was very weird for me: when you hear people in taxonomy (including myself) who for years said that the results of auto-generated taxonomies were mediocre at best now talking about the auto-generation of taxonomies, you know there’s been a seismic shift in the industry.
AI in Everything, Everywhere, All at Once
Like every other conference I’ve attended this year, the focus was on AI. Something that was different this year, in my opinion, was the number of practical applications and use cases presented. While AI still hasn’t found its stride when it comes to maximizing ROI across a variety of use cases, there were very informative sessions on what skills knowledge management and taxonomy specialists need to master in order to work in an AI world. I have seen this reflected in the job market. There are still taxonomy and ontology roles being posted, but many more now require technical skills like Python as part of the role. In essence, companies seem to be looking for people with both the business skills to build and manage semantic models and the technical skills to implement them in AI applications. While I know these people exist, I don’t know how common they are. People at the conference felt that there will always be a need for taxonomists who do not have deep technical skills, but there was an acknowledgement that getting skilled up in prompt engineering is a differentiator.
Several presentations in both the KMWorld and Taxonomy Boot Camp conferences focused on AI readiness. What does an organization need to appropriately identify AI use cases, guardrail the data within the organization so only what is appropriate is available to AI tools, and determine how humans fit into the process? Since generative AI can extract and generate candidate metadata concepts, it’s important to evaluate the results to judge their trustworthiness and value. Some of these concepts are net new and should be added to taxonomies as concepts, properties, or relationships. Others are net new but ephemeral or trending; they can be used in search, for example, but aren’t yet established enough to be codified in semantic models. Other concepts will already be in the taxonomy and can be discarded, unless they are used to identify which content includes taxonomy values and should be tagged appropriately. Finally, some generated concepts are garbage or nonsense and should be used to inform and improve the machine learning model. All of these questions need to be answered and the corresponding processes decided upon and put into practice.
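To make that triage concrete, here is a minimal Python sketch of one way the routing described above could work. The bucket names, frequency threshold, and is_trending flag are illustrative assumptions, not a prescribed workflow.

```python
def triage_candidate(candidate: str,
                     taxonomy_labels: set[str],
                     frequency: int,
                     is_trending: bool,
                     min_frequency: int = 3) -> str:
    """Route a generated concept into one of the buckets discussed above."""
    label = candidate.strip().lower()
    if not label or frequency < min_frequency:
        return "model-feedback"             # likely noise: improve the extractor, not the taxonomy
    if label in taxonomy_labels:
        return "existing-use-for-tagging"   # already governed: useful as a tagging signal
    if is_trending:
        return "search-only"                # real but ephemeral: expose to search, do not codify yet
    return "candidate-for-taxonomy"         # net new: review for addition to the semantic model

existing = {"prompt engineering", "knowledge graph"}
for term, freq, trending in [("agentic ai", 12, True), ("knowledge graph", 40, False), ("asdfgh", 1, False)]:
    print(term, "->", triage_candidate(term, existing, freq, trending))
```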
Another common theme was the use of taxonomies and ontologies to inform and refine prompts and to support Retrieval Augmented Generation (RAG), with or without labeled content as part of a knowledge graph. Taxonomies and ontologies supply the domain specificity required to augment large language models (LLMs) and inform them of the organization’s specific perspective. In short, using internal semantic models is a shortcut to providing the book of knowledge about the organization. What was once a topic in only a few presentations seemed to be much more prevalent this year.
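As a hedged sketch of what taxonomy-informed RAG can look like, the Python below expands a user query with organizational synonyms before retrieval and surfaces the matched concepts in the prompt. The taxonomy format, retrieve(), and call_llm() are hypothetical placeholders for whatever your stack provides.

```python
def expand_query(query: str, taxonomy: dict[str, list[str]]) -> tuple[str, list[str]]:
    """Add organizational synonyms to the retrieval query and report matched concepts."""
    q = query.lower()
    matched = [label for label, syns in taxonomy.items()
               if label.lower() in q or any(s.lower() in q for s in syns)]
    expansions = [s for label in matched for s in taxonomy[label]]
    return " ".join([query, *expansions]), matched

def build_prompt(query: str, passages: list[str], concepts: list[str]) -> str:
    """Ground the model in retrieved passages and the organization's own vocabulary."""
    context = "\n\n".join(passages)
    glossary = ", ".join(concepts) if concepts else "none identified"
    return (f"Answer using only the context below. Organizational concepts in scope: {glossary}.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}")

taxonomy = {"customer churn": ["attrition", "customer loss"]}
expanded_query, concepts = expand_query("What drives attrition in Q3?", taxonomy)
# passages = retrieve(expanded_query)            # hypothetical vector or keyword search
# answer = call_llm(build_prompt("What drives attrition in Q3?", passages, concepts))
```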
As I mentioned earlier, organizations are now in the phase between conducting proofs of concept on AI and needing to productionize and prove ROI on AI in organizational workflows and processes. If the AI bubble is stretching, continued low adoption and the inability to prove ROI will cause the bubble to burst. The AI industry will only continue to grow and expand if there are tangible benefits increasing the bottom line for organizations that have adopted the practices and scaled them into production.
Taxonomy, Data Governance…and Where Is the Data?
At least two presentations focused on the intersection of semantics and data governance. These both stood out to me because I have experienced the intersection of semantic models and highly governed data in data lakes myself as there are frequent conversations about who owns which data and what is the ultimate source of truth. How do internally developed taxonomies and ontologies fit into the overall data governance model? Who owns which values in which systems? How are the practices similar and different?
In his keynote, Malcolm Hawker simplified two key aspects of the different disciplines. In data governance, one of the main concerns is accuracy and precision of measurement. In taxonomy, the focus is on meaning, or semantics. Bridging the two disciplines means bringing together measurement and meaning. AI is a motivating factor in reconciling the two due to its need for clean, structured data and its use in tagging unstructured content. Although the approaches of relational-database-focused data management and graph-based semantic data management are different, they are complementary functions ensuring the best data quality in an organization.
Beyond who creates and owns the data and in which system it lives, organizations may be trending towards moving back to “walled gardens” of data to use in AI. The difference between public tools like ChatGPT and using LLMs within an organization puts the emphasis on what data is shared with the model and which users can see it. Since many AI tools use publicly available information on the Internet, organizations may protect their data and bring more of it in house so it is not available to such tools externally. Similarly, the concept of having a sovereign cloud, in which data is housed in the cloud but designed to meet regional data requirements and regulations, is probably gaining traction. Where there have been so many efforts to make data and content freely available, there is also recognition of which data should be available to machine learning models both to produce reasonable outcomes and to protect organizations and individuals. As states, provinces, countries, and regions pass additional laws about data, privacy, and AI, parsing out which data is subject to which regulations is becoming more difficult. Separating data in sovereign cloud environments may be a key to addressing this challenge.
In Summary
Things are weird, but the foundational value of semantics has not disappeared. The focus has changed and the toolsets we use to create and productionize semantic models have expanded, but the need for clean, accurate, source of truth metadata has only grown in importance.
KMWorld Themes and Trends

“At a certain level of machine-ness, of immersion in virtual machinery, there is no more distinction between man and machine.” – Jean Baudrillard, Violence of the Virtual and Integral Reality
I attended the KMWorld and co-located conferences held in Washington, D.C., November 18th – 21st. Across the conferences and sessions that I attended and the conversations between sessions, there were several themes which resonated. I’ll detail some of those themes and highlight any talks in particular I thought inspired or captured those trends. I couldn’t attend everything, so please forgive me if I’ve overlooked great talks, of which there were many.
Taxonomies as an Enterprise Strategy
The importance of taxonomies and ontologies as a foundational business and technical program was particularly evident at Taxonomy Boot Camp, but the theme also came up in talks in the other conferences as well. I presented on the topic in my session, “Stand Still Like the Hummingbird: Enterprise Taxonomy Strategy When Nothing Stands Still”. Likewise, Thomas Stilling in his Monday keynote, “Be the Change: Your Taxonomy Expertise Can Help Drive Organizational Transformation”, emphasized metadata as an organizational strategic asset and discussed how to position taxonomy to align with strategic objectives. Similarly, Lindsay Pettai’s session, “Empowering Your Enterprise With a Dynamic Taxonomy Program”, discussed the importance of having an enterprise taxonomy program.
These are just a few examples. Throughout the conferences, the notion that taxonomies and ontologies provide structured, source-of-truth values for navigation, search, and content tagging was prevalent. The main theme, however, was how taxonomies and ontologies are critical to the success of machine learning (ML) model training and program success. In talk after talk, artificial intelligence (AI) and ML were top topics, and discussions of how to make projects viable and successful included what needed to be in place. Taxonomies and ontologies were among those foundational requirements.
As a 16-year veteran of KMWorld associated conferences–my first being Enterprise Search Summit West in 2008–I have seen the more recent embrace and understanding of enterprise taxonomies and ontologies beyond simple use cases like navigation. Even just a few years ago, taxonomies seemed to be misunderstood or undervalued by many conference attendees. Now, talk of their use as enablers of AI was ubiquitous.
Artificial Intelligence Is Here to Stay
I don’t believe I saw a talk which didn’t include the keywords “artificial intelligence”, “machine learning”, “AI”, or “ML”. I was unable to attend KMWorld last year, but in years past, any conversation on the role of AI/ML was frequently met with scoffs, eyerolls, or fear. In KMWorld’s Thursday keynote, “KM, Experts & AI: Learning From KM Leaders”, Kim Glover and Cindy Hubert captured something important when they said that AI is driving a lot of emotions. Some of these emotions are excitement, skepticism, and fear. Addressing these emotions is going to be essential in addressing the technologies.
The overarching theme is that AI/ML is here to stay, so what are you going to do about it? Continue to reject it and fall behind the curve? Embrace it without question? Embracing ML models, including LLMs, with guidance and guardrails seemed to be the message most were conveying. While many organizations are still in the proof of concept (PoC) project stage, adoption seems to have already arrived or is pending, so get ready to embrace the change to be successful.
The exponential growth in ML has been fueled by cheaper storage, faster technologies, more robust and accessible models such as ChatGPT, and built-in technologies like Microsoft Copilot. Not only are these tools more accessible, they are more accurate than they have ever been in the past…for the right use cases. And this was another big takeaway: AI/ML is here to stay, but use these technologies for the right use cases. Document summarization and generative AI text generation are big winners while tools used for critical decision making are still improving and must be approached with caution.
Human-in-the-Loop, Cyborg, or Sigh…Borg?
Overwhelmingly across sessions, the consensus was the importance of the human in the use of AI/ML technologies. For both the input and the output, to avoid garbage in and garbage out, human beings need to curate and review ML training sets and the information that an ML model outputs. As noted above, document summarization may be low risk depending on the industry, but decision-making in areas like law and medicine is high-risk and requires humans-in-the-loop.
In his session, “Evolving KM: AI, Agile, & Answers”, the inestimable Dave Snowden discussed storytelling (or, rather, talked about storytelling through the art of storytelling) as an intimately human activity. He noted the deeply contextual nature of human knowledge and the need for knowledge management to generate knowledge rather than store it as codified data. If knowledge cannot be fully codified as data, it is difficult or impossible to transfer that level of human knowing into machines and their models. The connectivity to knowledge and deep metaphor was evident in my three takeaways from his session: his notion of a tri-opticon integrating knowledge between three areas of an organization and the connection to Welsh Triads; forming teams or roles of threes, fives, or sevens, Welsh rugby, and the number of people on a rugby sevens team; and assemblages, mycorrhizae, and rhizomatic networks in Deleuze and Guattari’s “A Thousand Plateaus”. These kinds of connections are weak use cases for machine learning and emphasize the importance of human knowledge and inclusion in AI technologies.
And, speaking of human knowledge, of course knowledge retention and transfer were key topics in an era when both age and work-from-home job mobility are creating more job turnover than ever. While human knowledge is difficult to capture, using AI agents to capture knowledge and ML models to parse textual information will help meet the sheer scale and increasing pace necessary to retain and transfer knowledge.
The human-in-the-loop, the cyborg, and, sigh…Borg, are all going to rely on knowledge and ethics in order to create a human-machine interactive paradigm. If humans don’t use our own ethical boundaries to curate machine content, then bad data, and bad ethics, will spread throughout our information systems.
While there was some fear and loathing in D.C., there was a stronger current of hope, optimism, and curiosity in the many ways we can use taxonomies and ontologies, AI/ML, and human knowledge management together to guide us toward a brighter future of technology use.
The Taxonomy Tortoise and the ML Hare

“I knew I shoulda’ taken that left turn at Albuquerque.” – Bugs Bunny
For better or worse, much of my childhood was informed by Looney Tunes, Monty Python, and a diet of science fiction ranging from the profound to the disjointedly camp. As such, I expect the absurd and am wildly skeptical of easy answers. Additionally, my foundation of science fiction books and films compels me to speculate that artificial intelligence will become a more realistic probability in our lives with actions ranging from locking us out of airlocks and starting global thermonuclear war to providing answers to our most pressing global problems.
The long-promised advantages of artificial intelligence seem finally to be reaching a point at which they can be utilized for enterprise purposes, including parsing, and even understanding, large amounts of text and data at rapid speed. The recent successes raise the question: if machine learning models can operate on data at high volume and velocity, why shouldn’t they be used to come up with answers on the fly based on large amounts of data internal or external to an organization? Well, in fact, they already are, and, in my opinion, they should be, but not without some acknowledgment of absurdity and a certain degree of skepticism.
I’m a firm believer in defining semantic models in the form of taxonomies and ontologies to be used as a foundational schema for an organization’s data. One of the arguments against investment in taxonomies is the time it takes to create them and the amount of maintenance they require. In a world in which what is trending changes frequently, user tastes are fickle, and the jargon associated with these trends passes quickly, the desire to skip the tortoise-like pace of building taxonomies in favor of utilizing other, faster technologies is tempting. But, as the hare who lost the race to the tortoise laments, “I knew I shoulda’ taken that left turn at Albuquerque.” Or, let’s consider checking the map before we go racing off in the wrong direction.
Let’s talk semantics. Putting it simply, ontologies are semantic structures which define one or more domains. They describe the types of things in the domain (classes), how these things can relate to each other (relationships, predicates, or edges), what labeled fields are used to describe these things (properties), and the instances of things (subjects, objects, or, more plainly, taxonomy concepts). Ontologies describing the general domain and taxonomies including the specific instances within one or more domains can be created as a map of your organization. These semantic structures represent the organization in all of its complexity. They specify the concepts important to the company and how these concepts relate to each other, data, and content. Once data or content is added, we can call this entire structure a knowledge graph.
In short, ontologies, taxonomies, and content are the organization’s view of itself, the world, and where it lives in it.
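To make those building blocks tangible, here is a toy set of subject-predicate-object triples in Python. The retail domain, class names, and relationship labels are invented for illustration; a real implementation would likely use an RDF store or graph database rather than a list of tuples.

```python
triples = [
    # Classes and the relationships the ontology allows between them
    ("Product",       "is_a_class_in", "RetailOntology"),
    ("Brand",         "is_a_class_in", "RetailOntology"),
    ("Product",       "can_have",      "Brand"),
    # A property used to describe instances of a class
    ("Product",       "has_property",  "color"),
    # Taxonomy concepts (instances) and their hierarchy
    ("Trail Shoes",   "broader",       "Footwear"),
    ("TrailRunner X", "instance_of",   "Product"),
    ("TrailRunner X", "tagged_with",   "Trail Shoes"),
]

# Once data or content records are linked to these concepts, the triples plus
# those links are what most people mean by a knowledge graph.
for subject, predicate, obj in triples:
    print(f"{subject} --{predicate}--> {obj}")
```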
Large language models (LLMs) have the ability to generate text, answer natural language questions, and classify content. Most publicly available LLMs, like ChatGPT, are trained on publicly available information. It is also possible to supply these LLMs with your own training sets of documents and language samples to develop answers more applicable to your own organization. Wisely, many organizations tightly control what information can be presented to these AI tools to avoid company information leaks or supplying competitors with proprietary information.
What’s lacking in using these hare-rapid models, however, is the organizational perspective. They are very good at answering general questions and making factual assertions from text, but they require tailored training content with specific use cases in mind to generate answers specific to an organization’s needs. There can be a temptation to feed one of these models a large quantity of organizational content to train them faster. However, the span of topics, language, jargon, and acronyms used in an organization can yield unsatisfying or unpredictable results. Imagine, if you will, the amount and variety of content in any one of your company’s content management systems. Now imagine asking a machine learning model to analyze and make sense of it all without guidance. You can index all of your own content, but without a framework, what sense does it make?
At this moment, the hare and the tortoise must strike a deal if they both want to win. To improve the performance of LLMs and other machine learning models, a domain topology specific to your organization defining the concepts, their synonyms and acronyms, and how they relate to each other, can be used as a schema input into the model. Semantic models are, after all, assertions in the form of triple statements (subject-predicate-object). Ontologies establish factual statements as determined by your organization’s use cases and, hence, provide patterns which can be used by machine learning models. Lexical proximity can be gathered from taxonomy hierarchies (these concepts are more closely related because they share a parent-child relationship) and associative relationships (these concepts, separated across several taxonomies, are actually very closely related because they have a direct associative relationship between them). Semantic models provide factual statements, built slowly over time based on business use cases, which can augment and improve LLMs.
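A rough sketch of that lexical-proximity idea follows, assuming a taxonomy exported as broader (parent-child) relationships plus associative links; the scoring values are arbitrary and only meant to show how asserted relationships can translate into relatedness signals.

```python
broader = {"Trail Shoes": "Footwear", "Hiking Boots": "Footwear", "Tents": "Camping Gear"}
related = {("Trail Shoes", "Trekking Poles"), ("Trekking Poles", "Trail Shoes")}

def proximity(a: str, b: str) -> float:
    """Score relatedness of two concepts from asserted taxonomy relationships."""
    if (a, b) in related:
        return 1.0   # direct associative relationship
    if broader.get(a) == b or broader.get(b) == a:
        return 0.9   # direct parent-child relationship
    if broader.get(a) is not None and broader.get(a) == broader.get(b):
        return 0.8   # siblings under the same parent concept
    return 0.0       # no relationship asserted in the semantic model

print(proximity("Trail Shoes", "Hiking Boots"))    # 0.8, shared parent
print(proximity("Trail Shoes", "Trekking Poles"))  # 1.0, associative link
```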
Not only can we think of semantic models as a collection of factual statements according to your organizational domain and use cases, we can also think of them as a summary, requiring the LLM to ingest a lot less information to reach the same factual conclusion. For example, you can provide the model with a huge amount of training data stating that a particular SKU-level product is available in the color blue. If this is a factual assertion in your semantic models (Product name has color Blue), however, then this fact can be tagged to a single product representation in a database and in turn applied to thousands of real-world SKU instances. Semantic models distill thousands of instances of truths across an organization and summarize them into a collection of ontology structural elements and taxonomic instances. To cite a joke by Steven Wright, in which the comic tells us he has a map of the United States that is actual size, your organizational map can be represented at a much smaller scale.
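A tiny illustration of that smaller-scale map, with an invented product and SKU list: the color fact is asserted once at the product level in the semantic model and resolved for every SKU that references the product, rather than being repeated thousands of times in training data.

```python
product_facts = {"TrailRunner X": {"has_color": "Blue"}}               # one asserted triple
skus = {f"SKU-{i:05d}": "TrailRunner X" for i in range(1, 5001)}       # 5,000 SKUs, one product

def color_of(sku: str) -> str:
    """Resolve a SKU-level question from the single product-level assertion."""
    return product_facts[skus[sku]]["has_color"]

print(color_of("SKU-00042"))   # "Blue", without 5,000 repeated statements
```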
Yes, it’s certainly true that given large amounts of data, machine learning models or text analytics can identify all kinds of important concepts. These concepts (and fact assertions between concepts) can be a great pipeline to feed into taxonomy and ontology construction. I am skeptical of machine learning models generating taxonomies and ontologies based on organizational data and content unless there is heavy human-in-the-loop curation to reconcile those absurdities which I believe inevitably creep in. And, yes, it’s certainly true that this curation is potentially at a tortoise pace, but once these concepts and assertions are built into semantic models, the ongoing maintenance and governance demands less time and effort.
Those slow semantic model builds enable fast-moving machine learning models and LLMs to be grounded in organizational truths, allowing for expansion, augmentation, and question-answering at a much faster pace but backed with foundational truths as asserted by your organization.
Be the tortoise first and foremost and the hare will follow.
