Information Panopticon


KMWorld 2025 Themes and Trends

https://pixabay.com/illustrations/ai-generated-creature-animal-8407601/

“But I’m a creep / I’m a weirdo / What the hell am I doin’ here? / I don’t belong here.” – Radiohead, Creep

I attended the KMWorld Conference and two of the four co-located sub-conferences, Taxonomy Boot Camp and Enterprise AI World, in Washington, D.C. last week. It was the 20th Anniversary of Taxonomy Boot Camp, and although I raised my hand when asked if I had attended 10 or more, I had never actually bothered to count how many years I’ve been to the conference. It turns out, I have attended KMWorld and the co-located conferences for 13 years out of the last 18, with my first being the 2008 Enterprise Search Summit West when it was still hosted on the West Coast in San Jose, California. Over this span of time, I’ve seen trends rise and fall, fears realized and alleviated, promises promised and promises kept.

Every year there seems to be at least one theme I can carry away from the conference, and for several years, I’ve written about the themes and trends from the conference as I see them. Some of these blogs are hosted on the Synaptica website and on my personal blog at Information Panopticon. I will go into some of those themes and trends as I did last month in my recap of the Henry Stewart Semantic Data Conference and HS DAM.

After many discussions at the event with colleagues in the industry, I can confidently state there was one overarching theme: things are weird.

Things Are Weird

What’s weird in 2025? Seemingly, everything. While the wider world may be as weird as it’s ever been, the state of knowledge management, taxonomy, and the related fields covered by the conference has been supremely weirded, mostly by AI. There has been a rapid shift from dread that AI is coming for all of our jobs to an acceptance that those who use AI as a work tool will maintain their place, or even excel, in the industry.

Despite this acceptance, the industry is still weird. In the world of taxonomy, we all understand and believe that AI and its need for clean, curated, semantic data will necessitate good taxonomists and taxonomy systems. However, while the bottom hasn’t completely fallen out of the market, there seem to be fewer jobs, more unemployed taxonomists, and lower salary and contract rates. Maybe it’s the generalized uncertainty in the tech sector driving the slump, but many people I spoke with at the conference think that it’s a misunderstanding of AI foundational needs that’s causing companies to lay off taxonomists and invest in AI more heavily.

What else is weird is how excited everyone is about AI, and yet how few companies have managed to successfully bake AI practices into their processes. We’ve seen the incredibly successful use of AI in text summarization and generation, particularly when it comes to meeting minutes and identifying who said what and which action items were decided upon. Agentic AI has had some success, and taxonomists seem to be skilling up in prompt engineering, using already existing taxonomies and ontologies and generating new taxonomy options from generative AI prompts.

What was very weird for me: when you hear people in taxonomy (myself included) who for years said that autogenerated taxonomies were mediocre at best now talking seriously about auto-generating them, you know there’s been a seismic shift in the industry.

AI in Everything, Everywhere, All at Once

Like every other conference I’ve attended this year, the focus was on AI. What was different this year, in my opinion, was the number of practical applications and use cases presented. While AI still hasn’t found its stride when it comes to maximizing ROI across a variety of use cases, there were very informative sessions on what skills knowledge management and taxonomy specialists need to master in order to work in an AI world. I have seen this reflected in the job market. There are still taxonomy and ontology roles being posted, but many more now require technical skills like Python as part of the role. In essence, companies seem to be looking for people with both the business skills to build and manage semantic models and the technical skills to implement them in AI applications. While I know these people exist, I don’t know how common they are. People at the conference felt that there will always be a need for taxonomists without deep technical skills, but acknowledged that getting skilled up in prompt engineering is a differentiator.

Several presentations in both the KMWorld and Taxonomy Boot Camp conferences focused on AI readiness. What does an organization need in order to identify appropriate AI use cases, guardrail the data within the organization so only what is appropriate is available to AI tools, and determine how humans fit into the process? Since generative AI can extract and generate candidate metadata concepts, it’s important to understand the results in order to interpret their trustworthiness and value. Some of these concepts are net new and should be added to taxonomies as concepts, properties, or relationships. Others are net new but ephemeral or trending; they can be used in search, for example, but aren’t yet established enough to be codified in semantic models. Still others will already be in the taxonomy and can be discarded, unless they serve as evidence that a piece of content contains taxonomy values and should be tagged accordingly. Finally, some generated concepts are garbage or nonsense and should be used to inform and improve the machine learning model. All of these questions need to be answered, and processes decided upon and put into practice.
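As a concrete illustration of that triage, here is a minimal Python sketch. The bucket names, the confidence threshold, and the example concepts are all my own illustrative assumptions, not a standard process:

```python
# Hypothetical sketch: routing AI-generated metadata candidates into the
# four dispositions described above. Names and thresholds are illustrative.

def triage_candidate(concept, taxonomy, confidence, trending_terms):
    """Return the disposition for one generated candidate concept."""
    if concept in taxonomy:
        # Already modeled: discard, but it is evidence for tagging content
        return "existing"
    if confidence < 0.5:
        # Low-confidence nonsense: feed back to improve the model
        return "garbage"
    if concept in trending_terms:
        # Net new but ephemeral: useful for search, not yet codified
        return "ephemeral"
    # Net new and stable: a candidate for the taxonomy itself
    return "net_new"

taxonomy = {"machine learning", "ontology"}
trending = {"agentic ai"}
print(triage_candidate("ontology", taxonomy, 0.9, trending))         # existing
print(triage_candidate("agentic ai", taxonomy, 0.8, trending))       # ephemeral
print(triage_candidate("knowledge graph", taxonomy, 0.9, trending))  # net_new
print(triage_candidate("zxqv", taxonomy, 0.1, trending))             # garbage
```

In practice each disposition would feed a different downstream process: a taxonomy governance queue, a search-boost list, a tagging pipeline, or model feedback.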

Another common theme was the use of taxonomies and ontologies to inform and refine prompts and to support Retrieval Augmented Generation (RAG), with or without labeled content as part of a knowledge graph. Taxonomies and ontologies supply the domain specificity required to augment large language models (LLMs) with the organizational perspective. In short, using internal semantic models is a shortcut to providing the book of knowledge about the organization. What was once a topic in only a few presentations was much more prevalent this year.
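A minimal sketch of the prompt side of this idea, assuming a made-up taxonomy and no particular vendor API: the organization’s preferred labels, synonyms, and broader concepts are serialized into the prompt so the model answers in the organization’s own vocabulary.

```python
# Illustrative sketch: injecting taxonomy context into an LLM prompt.
# The taxonomy structure and concept names here are hypothetical.

def build_prompt(question, taxonomy):
    """Prepend preferred labels, synonyms, and broader concepts to a prompt."""
    lines = []
    for concept, info in taxonomy.items():
        syns = ", ".join(info["synonyms"])
        lines.append(f"- {concept} (synonyms: {syns}); broader: {info['broader']}")
    context = "\n".join(lines)
    return (
        "Use only the organizational vocabulary below.\n"
        f"Vocabulary:\n{context}\n\n"
        f"Question: {question}"
    )

taxonomy = {
    "Customer Churn": {"synonyms": ["attrition"], "broader": "Customer Metrics"},
}
prompt = build_prompt("What drives attrition?", taxonomy)
print(prompt)
```

The same serialized vocabulary can also be used on the retrieval side of RAG, expanding a user query with synonyms before searching tagged content.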

As I mentioned earlier, organizations are now in the phase between conducting proofs of concept on AI and the need to productionize AI and prove its ROI in organizational workflows and processes. If the AI bubble is stretching, continued low adoption and the inability to prove ROI will cause it to burst. The AI industry will only continue to grow and expand if there are tangible benefits increasing the bottom line of organizations that have adopted the practices and scaled them into production.

Taxonomy, Data Governance…and Where Is the Data?

At least two presentations focused on the intersection of semantics and data governance. These both stood out to me because I have experienced the intersection of semantic models and highly governed data in data lakes myself as there are frequent conversations about who owns which data and what is the ultimate source of truth. How do internally developed taxonomies and ontologies fit into the overall data governance model? Who owns which values in which systems? How are the practices similar and different?

In his keynote, Malcolm Hawker distilled a key aspect of each discipline. In data governance, one of the main concerns is accuracy and precision for measurement. In taxonomy, the focus is on meaning, or semantics. Bridging the two disciplines means bringing together measurement and meaning. AI is a motivating factor in reconciling the two, due to its need for clean, structured data and its use in tagging unstructured content. Although the approaches of relational-database-focused data management and graph-based semantic data management are different, they are complementary functions ensuring the best data quality in an organization.

Beyond who creates and owns the data and in which system it lives, organizations may be trending back towards “walled gardens” of data for use in AI. The difference between public tools like ChatGPT and LLMs run within an organization puts the emphasis on what data is shared with the model and which users can see it. Since many AI tools use publicly available information on the Internet, organizations may protect their data by bringing more of it in house so it is not available to such tools externally. Similarly, the concept of a sovereign cloud, in which data is housed in the cloud but designed to meet regional data requirements and regulations, appears to be gaining traction. While there have been many efforts to make data and content freely available, there is also recognition that only some data should be available to machine learning models, both to produce reasonable outcomes and to protect organizations and individuals. As states, provinces, countries, and regions pass additional laws about data, privacy, and AI, parsing out which data is subject to which regulations is becoming more difficult. Separating data in sovereign cloud environments may be a key to addressing this challenge.

In Summary

Things are weird, but the foundational value of semantics has not disappeared. The focus has changed and the toolsets we use to create and productionize semantic models have expanded, but the need for clean, accurate, source of truth metadata has only grown in importance.

KMWorld Themes and Trends

https://pixabay.com/illustrations/cyborg-hand-human-robot-contact-9091910/

“At a certain level of machine-ness, of immersion in virtual machinery, there is no more distinction between man and machine.” – Jean Baudrillard, Violence of the Virtual and Integral Reality

I attended the KMWorld and co-located conferences held in Washington, D.C., November 18th – 21st. Across the conferences and sessions that I attended and the conversations between sessions, there were several themes which resonated. I’ll detail some of those themes and highlight any talks in particular I thought inspired or captured those trends. I couldn’t attend everything, so please forgive me if I’ve overlooked great talks, of which there were many.

Taxonomies as an Enterprise Strategy

The importance of taxonomies and ontologies as a foundational business and technical program was particularly evident at Taxonomy Boot Camp, but the theme also came up in talks in the other conferences as well. I presented on the topic in my session, “Stand Still Like the Hummingbird: Enterprise Taxonomy Strategy When Nothing Stands Still”. Likewise, Thomas Stilling in his Monday keynote, “Be the Change: Your Taxonomy Expertise Can Help Drive Organizational Transformation”, emphasized metadata as an organizational strategic asset and discussed how to position taxonomy to align with strategic objectives. Similarly, Lindsay Pettai’s session, “Empowering Your Enterprise With a Dynamic Taxonomy Program”, discussed the importance of having an enterprise taxonomy program.

These are just a few examples. Throughout the conferences, the notion that taxonomies and ontologies provide structured, source-of-truth values for navigation, search, and content tagging was prevalent. The main theme, however, was how critical taxonomies and ontologies are to the success of machine learning (ML) model training and of ML programs overall. In talk after talk, artificial intelligence (AI) and ML were top topics, and speakers outlined what needed to be in place to make such projects viable and successful. Taxonomies and ontologies were among those foundational requirements.

As a 16-year veteran of KMWorld-associated conferences (my first being Enterprise Search Summit West in 2008), I have seen the more recent embrace and understanding of enterprise taxonomies and ontologies beyond simple use cases like navigation. Even just a few years ago, taxonomies seemed to be misunderstood or undervalued by many conference attendees. Now, talk of their use as enablers of AI was ubiquitous.

Artificial Intelligence Is Here to Stay

I don’t believe I saw a talk which didn’t include the keywords “artificial intelligence”, “machine learning”, “AI”, or “ML”. I was unable to attend KMWorld last year, but in years past, any conversation on the role of AI/ML was frequently met with scoffs, eyerolls, or fear. In KMWorld’s Thursday keynote, “KM, Experts & AI: Learning From KM Leaders”, Kim Glover and Cindy Hubert captured something important when they said that AI is driving a lot of emotions, among them excitement, skepticism, and fear. Addressing these emotions is going to be essential to adopting the technologies.

The overarching theme is that AI/ML is here to stay, so what are you going to do about it? Continue to reject it and fall behind the curve? Embrace it without question? Embracing ML models, including LLMs, with guidance and guardrails seemed to be the message most were conveying. While many organizations are still in the proof of concept (PoC) project stage, adoption seems to have already arrived or is pending, so get ready to embrace the change to be successful.

The exponential growth in ML has been fueled by cheaper storage, faster technologies, more robust and accessible models such as ChatGPT, and built-in technologies like Microsoft Copilot. Not only are these tools more accessible, they are more accurate than they have ever been in the past…for the right use cases. And this was another big takeaway: AI/ML is here to stay, but use these technologies for the right use cases. Document summarization and generative AI text generation are big winners while tools used for critical decision making are still improving and must be approached with caution.

Human-in-the-Loop, Cyborg, or Sigh…Borg?

Overwhelmingly across sessions, the consensus was on the importance of the human in the use of AI/ML technologies. For both the input and the output, to avoid garbage in and garbage out, human beings need to curate and review ML training sets and the information that an ML model outputs. As noted above, document summarization may be low risk depending on the industry, but decision-making in areas like law and medicine is high risk and requires a human in the loop.

In his session, “Evolving KM: AI, Agile, & Answers”, the inestimable Dave Snowden discussed storytelling (or, rather, talked about storytelling through the art of storytelling) as an intimately human activity. He noted the deeply contextual nature of human knowledge and the need for knowledge management to generate knowledge rather than store it as codified data. If knowledge cannot be fully codified as data, then it is difficult or impossible to transfer that level of human knowing into machines and their models. The connectivity to knowledge and deep metaphor was evident in my three takeaways from his session: his notion of a tri-opticon integrating knowledge between three areas of an organization, and its connection to the Welsh Triads; forming teams or roles in threes, fives, or sevens, and the link to Welsh rugby and the number of players on a rugby sevens side; and assemblages, mycorrhizae, and rhizomatic networks in Deleuze and Guattari’s “A Thousand Plateaus”. These kinds of connections are weak use cases for machine learning and emphasize the importance of human knowledge and inclusion in AI technologies.

And, speaking of human knowledge, of course knowledge retention and transfer were key topics in an era when both age and work-from-home job mobility are creating more job turnover than ever. While human knowledge is difficult to capture, using AI agents to capture knowledge and ML models to parse textual information will assist in ramping up the sheer scale and increasing pace necessary to retain and transfer knowledge.

The human-in-the-loop, the cyborg, and, sigh…Borg, are all going to rely on knowledge and ethics in order to create a human-machine interactive paradigm. If humans don’t use our own ethical boundaries to curate machine content, then bad data, and bad ethics, will spread throughout our information systems.

While there was some fear and loathing in D.C., there was a stronger current of hope, optimism, and curiosity in the many ways we can use taxonomies and ontologies, AI/ML, and human knowledge management together to guide us toward a brighter future of technology use.

The Taxonomy Tortoise and the ML Hare

https://pixabay.com/illustrations/aesops-fable-tortoise-and-the-hare-6570775/

“I knew I shoulda’ taken that left turn at Albuquerque.” – Bugs Bunny

For better or worse, much of my childhood was informed by Looney Tunes, Monty Python, and a diet of science fiction ranging from the profound to the disjointedly camp. As such, I expect the absurd and am wildly skeptical of easy answers. Additionally, my foundation of science fiction books and films compels me to speculate that artificial intelligence will become a more realistic probability in our lives with actions ranging from locking us out of airlocks and starting global thermonuclear war to providing answers to our most pressing global problems.

The long-promised advantages of artificial intelligence seem finally to be reaching a point at which they can be utilized for enterprise purposes, including parsing, and even understanding, large amounts of text and data at rapid speed. The recent successes raise the question: if machine learning models can operate on data at high volume and velocity, why shouldn’t they be used to come up with answers on the fly based on large amounts of data internal or external to an organization? Well, in fact, they already are, and, in my opinion, they should be, but not without some acknowledgment of absurdity and a certain degree of skepticism.

I’m a firm believer in defining semantic models in the form of taxonomies and ontologies to be used as a foundational schema for an organization’s data. One of the arguments against investment in taxonomies is the time it takes to create them and the ongoing maintenance they require. In a world in which what is trending changes frequently, user tastes are fickle, and the jargon associated with these trends passes quickly, the temptation to skip the tortoise-like pace of building taxonomies in favor of other, faster technologies is strong. But, as the hare who lost the race to the tortoise laments, “I knew I shoulda’ taken that left turn at Albuquerque.” Or, let’s consider checking the map before we go racing off in the wrong direction.

Let’s talk semantics. Putting it simply, ontologies are semantic structures which define one or more domains. They describe the types of things in the domain (classes), how these things can relate to each other (relationships, predicates, or edges), what labeled fields are used to describe these things (properties), and the instances of things (subjects, objects, or, more plainly, taxonomy concepts). Ontologies describing the general domain and taxonomies including the specific instances within one or more domains can be created as a map of your organization. These semantic structures represent the organization in all of its complexity. They specify the concepts important to the company and how these concepts relate to each other, data, and content. Once data or content is added, we can call this entire structure a knowledge graph.

In short, ontologies, taxonomies, and content are the organization’s view of itself, the world, and where it lives in it.
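The layering described above can be sketched as plain subject-predicate-object tuples. All the names here are illustrative; a real system would use RDF/SKOS/OWL tooling, but the shape is the same:

```python
# Ontology level: classes and the relationships allowed between them
ontology = {
    ("Product", "hasColor", "Color"),   # class-level assertion
    ("Product", "soldIn", "Region"),
}

# Taxonomy level: the specific instances of those classes
taxonomy = {
    ("Widget X", "isA", "Product"),
    ("Blue", "isA", "Color"),
}

# Data/content layer: instance-level facts. Once these are added,
# the combined structure can be called a knowledge graph.
facts = {
    ("Widget X", "hasColor", "Blue"),
}

knowledge_graph = ontology | taxonomy | facts
print(len(knowledge_graph))  # 5
```

The point of the layering is that the ontology constrains what kinds of statements the lower layers may make: the fact triple is only valid because the class-level assertion licenses it.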

Large language models (LLMs) have the ability to generate text, answer natural language questions, and classify content. Most publicly available LLMs, like ChatGPT, are trained on publicly available information. It is also possible to supply these LLMs with your own training sets of documents and language samples to develop answers more applicable to your own organization. Wisely, many organizations tightly control what information can be presented to these AI tools to avoid company information leaks or supplying competitors with proprietary information.

What’s lacking in using these hare-rapid models, however, is the organizational perspective. They are very good at answering general questions and making factual assertions from text, but they require tailored training content with specific use cases in mind to generate answers specific to an organization’s needs. There can be a temptation to feed one of these models a large quantity of organizational content to train them faster. However, the span of topics, language, jargon, and acronyms used in an organization can yield unsatisfying or unpredictable results. Imagine, if you will, the amount and variety of content in any one of your company’s content management systems. Now imagine asking a machine learning model to analyze and make sense of it all without guidance. You can index all of your own content, but without a framework, what sense does it make?

At this moment, the hare and the tortoise must strike a deal if they both want to win. To improve the performance of LLMs and other machine learning models, a domain ontology specific to your organization, defining the concepts, their synonyms and acronyms, and how they relate to each other, can be used as a schema input to the model. Semantic models are, after all, assertions in the form of triple statements (subject-predicate-object). Ontologies establish factual statements as determined by your organization’s use cases and, hence, provide patterns which can be used by machine learning models. Lexical proximity can be gathered from taxonomy hierarchies (these concepts are more closely related because they share a parent-child relationship) and associative relationships (these concepts, separated across several taxonomies, are actually very closely related because they have a direct associative relationship between them). Semantic models provide factual statements, built slowly over time based on business use cases, which can augment and improve LLMs.
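A toy sketch of deriving that lexical proximity from taxonomy structure. The concepts, and especially the scores, are arbitrary assumptions for illustration:

```python
# Hierarchical relationships: child -> parent (broader concept)
broader = {
    "Espresso": "Coffee",
    "Latte": "Coffee",
    "Green Tea": "Tea",
}

# Associative relationships across branches of the taxonomy
related = {("Espresso", "Espresso Machine")}

def proximity(a, b):
    """Crude relatedness score derived purely from taxonomy structure."""
    if (a, b) in related or (b, a) in related:
        return 1.0   # direct associative link
    if broader.get(a) and broader.get(a) == broader.get(b):
        return 0.8   # siblings under the same parent
    return 0.0       # no structural evidence of relatedness

print(proximity("Espresso", "Latte"))             # 0.8
print(proximity("Espresso", "Espresso Machine"))  # 1.0
print(proximity("Espresso", "Green Tea"))         # 0.0
```

A real implementation would walk longer broader/narrower paths and decay the score with distance; the point is only that the hierarchy itself is the signal.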

Not only can we think of semantic models as a collection of factual statements according to your organizational domain and use cases, we can also think of them as a summary, requiring the LLM to ingest far less information to reach the same factual conclusion. For example, you could provide the model with a huge amount of training data stating that a particular SKU-level product is available in the color blue. If this is instead a factual assertion in your semantic models (Product name has color Blue), then the fact can be tagged to a single product representation in a database and in turn applied to thousands of real-world SKU instances. Semantic models distill thousands of instances of truth across an organization and summarize them into a collection of ontology structural elements and taxonomic instances. To cite a joke by Steven Wright, in which the comic tells us he has a map of the United States which is actual size, your organizational map can be represented at a much smaller scale.
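The summarizing effect can be sketched as follows, with a hypothetical product and generated SKU identifiers: one product-level triple resolves the color for every SKU that points at the product.

```python
# One product-level assertion standing in for thousands of SKU rows
product_facts = {("Widget X", "hasColor", "Blue")}

# Thousands of SKU records, each referencing a single product entity
skus = [{"sku": f"WX-{i:04d}", "product": "Widget X"} for i in range(5000)]

def sku_color(sku_record):
    """Resolve a SKU's color through its product-level assertion."""
    for subj, pred, obj in product_facts:
        if subj == sku_record["product"] and pred == "hasColor":
            return obj
    return None

print(sku_color(skus[1234]))  # Blue
print(len(skus), "SKUs covered by", len(product_facts), "fact")
```

A model grounded in the single triple never needs to re-learn the color per SKU, which is exactly the smaller-scale map the Steven Wright joke points at.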

Yes, it’s certainly true that given large amounts of data, machine learning models or text analytics can identify all kinds of important concepts. These concepts (and fact assertions between concepts) can be a great pipeline to feed into taxonomy and ontology construction. I am skeptical of machine learning models generating taxonomies and ontologies based on organizational data and content unless there is heavy human-in-the-loop curation to reconcile those absurdities which I believe inevitably creep in. And, yes, it’s certainly true that this curation is potentially at a tortoise pace, but once these concepts and assertions are built into semantic models, the ongoing maintenance and governance demands less time and effort.

Those slow semantic model builds enable fast-moving machine learning models and LLMs to be grounded in organizational truths, allowing for expansion, augmentation, and question-answering at a much faster pace, backed by foundational truths as asserted by your organization.

Be the tortoise first and foremost and the hare will follow.