Information Panopticon


Monthly Archives: September 2024

Modeling a Moving Target

https://pixabay.com/illustrations/ai-generated-pipes-industrial-9043429/

“Like a wave in the physical world, in the infinite ocean of the medium which pervades all, so in the world of organisms, in life, an impulse started proceeds onward, at times, may be, with the speed of light, at times, again, so slowly that for ages and ages it seems to stay, passing through processes of a complexity inconceivable to men, but in all its forms, in all its stages, its energy ever and ever integrally present.” – Nikola Tesla

Some of the most thorough, authoritative, and well-constructed controlled vocabularies have been built and curated over the course of decades. The NASA Thesaurus was first published in 1967. The Library of Congress Subject Headings trace their roots back to 1898. The MLA Thesaurus was modernized in 1981 based on previous classification methods. The Getty Art & Architecture Thesaurus was started in the late 1970s. There are many more examples of well-established controlled vocabularies, some older and some newer. Perhaps the longevity and authority of these vocabularies lend some truth to the adage “with age comes wisdom.”

Many of these longstanding vocabularies serve the academic system and do not necessarily “move with the speed of business.” In a business environment, the emphasis is typically on speed, agility, and efficiency. None of these qualities is in opposition to the typical development of good controlled vocabularies, but vocabulary development that can’t keep up will struggle to support a fast-paced organizational environment.

In my experience, product vendors talk about how to speed up taxonomy and ontology development, while semantic practitioners (librarians, taxonomists, ontologists, etc.) regularly have to strike a balance between maintaining quality semantic models and serving business needs. The speed at which semantic models need to be developed in a business setting often stretches taxonomies into inappropriate use cases in order to support immediate business priorities. As taxonomists, we need to step back and ask how we can effectively support the business while deciding what should and should not be modeled in taxonomies.

In this post, I’m going to focus on two closely related kinds of data that I think are not appropriately modeled or maintained in a semantic model, but which are often brought up as examples of serving the business quickly.

Processes & Sequences

A business problem I’ve seen, both as a product manager for a taxonomy management platform vendor and as a taxonomy practitioner, is the modeling of processes as taxonomies.

There are several difficulties with modeling processes as taxonomies, which is why I tend to discourage including them in the overall enterprise taxonomy and ontology semantic framework. First, steps in a process aren’t usually semantically hierarchical. For example, say you are asked to support, in a centralized taxonomy, the step-by-step process for handling a ticket in a request tool such as those offered by ServiceNow or Atlassian. An IT service ticket can be opened, assigned, reassigned, resolved, closed, and reopened. Building these steps as a hierarchical taxonomy doesn’t convey their true relationship to each other. You can build them as a flat list instead, which addresses that problem, but then you have to make sure the steps are displayed in the proper order in the consuming system. Since the tickets themselves live in the consuming ticketing system while the controlled values naming each step live in a taxonomy management system, this may not be a problem at all: the ticket is an object that is simply tagged with a new value at each stage of its lifecycle, regardless of whether the values are flat or hierarchical in the taxonomy. So, the first problem may be solvable, but it needs to be addressed.
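The point that a ticket is simply retagged can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the concept IDs, the `TICKET_STATUS` mapping, and the `Ticket` class with its `retag` method are all hypothetical and do not reflect the ServiceNow or Atlassian APIs or any taxonomy management system’s export format. The consuming system stores only the current concept ID (plus, here, a history), so it does not care whether the values were flat or hierarchical in the taxonomy.

```python
from dataclasses import dataclass, field

# Hypothetical controlled values as they might arrive from a taxonomy
# management system: concept ID -> preferred label. The IDs are invented.
TICKET_STATUS = {
    "status:001": "Opened",
    "status:002": "Assigned",
    "status:003": "Reassigned",
    "status:004": "Resolved",
    "status:005": "Closed",
    "status:006": "Reopened",
}


@dataclass
class Ticket:
    """A ticket living in the consuming system, tagged with one status concept."""
    ticket_id: str
    status: str = "status:001"
    history: list = field(default_factory=list)

    def retag(self, concept_id: str) -> None:
        # Only values from the controlled list are accepted.
        if concept_id not in TICKET_STATUS:
            raise ValueError(f"unknown status concept: {concept_id}")
        # Keep the previous value so the lifecycle can be reconstructed later.
        self.history.append(self.status)
        self.status = concept_id

    def status_label(self) -> str:
        return TICKET_STATUS[self.status]


ticket = Ticket("INC-1")
ticket.retag("status:002")  # assigned
ticket.retag("status:004")  # resolved
print(ticket.status_label(), [TICKET_STATUS[c] for c in ticket.history])
# Resolved ['Opened', 'Assigned']
```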

The second issue is that processes rarely progress step by step from beginning to end. A ticket may move up and down a flat list or hierarchy, often skipping steps in between or returning to a previous step. Again, if the ticket is tagged from a flat list that is properly ordered for display, this may not be an issue. If the consuming system must follow the flat list or tree order, however, changing the value back to a previous step may be challenging. Where this gets especially complicated is with processes in which the same steps are repeated by different groups in the organization. A contract, for example, can go through review by the vendor, by business team representatives, by legal, and again by the vendor’s legal team. Modeling a process like this taxonomically often means either appending the team or entity conducting the step to the step name (business review, legal review, etc.) or repeating the steps as children under organizational group headings, creating a polyhierarchical nightmare.
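A graph of allowed moves captures the skipping and backtracking that a list or tree cannot, and recording which group performed a step avoids minting a separate concept per team. The sketch below is hypothetical: the transition rules are assumptions for illustration, not the default workflow of any ticketing product, and `TRANSITIONS`, `is_valid_path`, and `contract_history` are invented names.

```python
# Hypothetical allowed moves between ticket statuses (an assumption for
# illustration, not any product's default workflow). As a graph, the
# moves can skip steps or go backwards, which a list or tree cannot express.
TRANSITIONS = {
    "Opened": {"Assigned", "Closed"},
    "Assigned": {"Reassigned", "Resolved"},
    "Reassigned": {"Resolved"},
    "Resolved": {"Closed", "Reopened"},
    "Closed": {"Reopened"},
    "Reopened": {"Assigned", "Resolved"},
}


def is_valid_path(path):
    """Return True if every consecutive pair of statuses is an allowed move."""
    return all(b in TRANSITIONS.get(a, set()) for a, b in zip(path, path[1:]))


# Repeated steps: one "Contract review" concept, with the performing group
# recorded alongside it instead of minting "Legal review", "Vendor review", etc.
contract_history = [
    ("Contract review", "Vendor"),
    ("Contract review", "Business team"),
    ("Contract review", "Legal"),
    ("Contract review", "Vendor legal"),
]

# Distinct step concepts actually needed in the taxonomy.
steps_used = {step for step, _group in contract_history}

print(is_valid_path(["Opened", "Assigned", "Resolved", "Reopened", "Resolved", "Closed"]))  # True
print(is_valid_path(["Opened", "Resolved"]))  # False under these rules: not yet assigned
print(steps_used)  # {'Contract review'}
```

Keeping the group as separate data means the taxonomy needs only one “Contract review” concept rather than one per team.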

The final and most difficult challenge with modeling processes is that they often change. Modeling the customer journey from start to finish, for example, is much like modeling the stages of an IT ticketing process: the customer rarely moves neatly from one step to the next. More importantly, marketing processes are frequently overhauled, with their values changing whenever the business strategy changes. Even when the values don’t change, their order often does, and that order is frequently reflected in navigational structures presented as filters on the front end. Representing a customer journey with filterable values is challenging because of the numerous ways a customer may enter the UX pipeline. In retail apparel, for example, shoppers may start looking for products by Gender (Men’s, Women’s, Children), then apparel type, then size, then filter by material or color. Or they may start with color, then apparel type, then size. There is a process here, but one with multiple entry and end points. Trying to represent this as a taxonomy is incredibly difficult.
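Why faceted navigation resists a single sequence can be shown with a toy filter function: two shoppers who apply facets in different orders land on the same result. The product records, SKUs, and the `apply_filters` helper are hypothetical; the facet names follow the apparel example above.

```python
# Hypothetical product records using facets from the apparel example.
PRODUCTS = [
    {"sku": "A1", "gender": "Men's", "type": "Shirt", "size": "M", "color": "Blue"},
    {"sku": "A2", "gender": "Women's", "type": "Shirt", "size": "S", "color": "Blue"},
    {"sku": "A3", "gender": "Men's", "type": "Pants", "size": "M", "color": "Black"},
    {"sku": "A4", "gender": "Men's", "type": "Shirt", "size": "L", "color": "Blue"},
]


def apply_filters(products, filters):
    """Apply (facet, value) filters one at a time, in the order given,
    and return the SKUs that remain."""
    result = products
    for facet, value in filters:
        result = [p for p in result if p[facet] == value]
    return {p["sku"] for p in result}


# Two shoppers entering the pipeline at different points.
gender_first = [("gender", "Men's"), ("type", "Shirt"), ("size", "M")]
color_first = [("color", "Blue"), ("type", "Shirt"), ("size", "M")]

print(apply_filters(PRODUCTS, gender_first))  # {'A1'}
print(apply_filters(PRODUCTS, color_first))   # {'A1'}
```

Each facet behaves as an independent filter rather than a step in a fixed sequence, which is exactly what a single process hierarchy struggles to express.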

A similar problem I’ve come across in taxonomy modeling is using lists or hierarchies to indicate sequence. Most processes have steps that are in sequence, but in this case I’m talking about strictly fixed sequential order. Examples can include the order of books or films (by date or by narrative order), the list of U.S. Presidents in order, or events by date.

The fundamental issue for most taxonomy management systems, and for many systems relying on taxonomy data for that matter, is their default to alphabetical display in lists. For most taxonomy hierarchies, alphabetical is the preferred display order, with each cascading branch also alphabetized. On most front-end websites, however, navigational taxonomies are ordered by usage, with the most common ways to access information listed first. Front-end website platforms are built for this type of display because their hierarchies do not necessarily follow what I would call semantic or “is a” taxonomy practices; parent-child relationships are based on filtered drilldowns, not on strict contextual meaning. This becomes an issue when an organization, quite rightly, wants to consume centralized taxonomy concepts for the front-end experience.
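The effect of an alphabetical default on a fixed sequence is easy to see. In this minimal sketch, assuming labels arrive in inverted “Last, First” form and in their intended sequence (the first five U.S. Presidents), a consuming system that sorts by label loses that order, and nothing in the labels themselves can restore it.

```python
# Labels received from a taxonomy export, curated in their intended
# sequence (the first five U.S. Presidents, in order of office).
received = [
    "Washington, George",
    "Adams, John",
    "Jefferson, Thomas",
    "Madison, James",
    "Monroe, James",
]

# Many list widgets and consuming systems default to sorting by label.
displayed = sorted(received)

print(displayed)
# ['Adams, John', 'Jefferson, Thomas', 'Madison, James', 'Monroe, James', 'Washington, George']
```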

Even when using the larger graph underlying taxonomy and ontology structures, conveying order can be challenging without the right functionality to support it, or when consuming systems can only handle flat lists or shallow hierarchies.

Modeling Options

Modeling sequences isn’t out of the question in taxonomy management systems, assuming that 1) the taxonomy and ontology management system includes functionality supporting the modeling options below, or 2) downstream systems can effectively handle or transform concepts received from a centralized taxonomy.

To model steps in a process or to capture sequences, there are a few options. If you are using a tool supporting RDF and core SKOS elements, you can consider using skos:OrderedCollection. An ordered collection is exactly what it sounds like: a collection of concepts in a specified order. Using an ordered collection for the concepts in a branch of a taxonomy lets you list those items in the desired order. Once the concepts are stripped from their contextual parent, there may be nothing else indicating why they are in that order, but the collection will force a sort order on the group. This assumes, of course, that consuming systems don’t simply revert to alphabetical order once the concepts are received.
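For reference, here is roughly what an ordered collection looks like in Turtle, generated by a small Python helper. The `ex:` namespace, the concept IRIs, and the `ordered_collection_ttl` function are hypothetical; the `skos:` namespace, `skos:OrderedCollection`, and `skos:memberList` come from the W3C SKOS Reference, where `skos:memberList` takes an RDF list. The helper does no escaping, so it assumes labels contain no double quotes.

```python
def ordered_collection_ttl(collection_iri, label, member_iris):
    """Serialize a SKOS ordered collection as Turtle.

    skos:memberList takes an RDF list; Turtle's ( ... ) syntax builds that
    list, so the member order is preserved in the data itself.
    Assumes `label` contains no double quotes (no escaping is done).
    """
    members = " ".join(member_iris)
    return (
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
        "@prefix ex: <http://example.org/> .\n\n"
        f"{collection_iri} a skos:OrderedCollection ;\n"
        f'    skos:prefLabel "{label}"@en ;\n'
        f"    skos:memberList ( {members} ) .\n"
    )


ttl = ordered_collection_ttl(
    "ex:TicketLifecycle",
    "Ticket lifecycle",
    ["ex:Opened", "ex:Assigned", "ex:Resolved", "ex:Closed"],
)
print(ttl)
```

A consuming system has to read the list by walking its `rdf:first`/`rdf:rest` chain rather than re-sorting members by label; otherwise the order is lost in transit.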

A more flexible and sustainable way to model a process is to model concepts semantically using “is a” rules and then leverage a true ontological structure and a graph to map the journey. This means modeling each concept in its one best location in one or more taxonomies as part of a larger domain model. Modeling this way leans into the strengths of an ontology, using associative relationships between entities to represent the order in the graph while also connecting the entities to their owners. For example, books or films may have relationships such as is followed by / is preceded by, and can also be connected to their authors, directors, and actors as part of the larger graph.
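A minimal sketch of deriving order from associative relationships, using the Daniel Craig James Bond films as data. The dictionaries stand in for graph edges; `IS_FOLLOWED_BY`, `DIRECTED_BY`, and `walk_sequence` are hypothetical names, and a real system would query a triple store or graph database instead. Only “is followed by” is stored; the inverse is derived.

```python
# Hypothetical graph edges held in dictionaries. Only "is followed by" is
# stored; "is preceded by" is derived as its inverse.
IS_FOLLOWED_BY = {
    "Casino Royale": "Quantum of Solace",
    "Quantum of Solace": "Skyfall",
    "Skyfall": "Spectre",
    "Spectre": "No Time to Die",
}
IS_PRECEDED_BY = {after: before for before, after in IS_FOLLOWED_BY.items()}

# Connecting the same film concepts to other entities in the graph.
DIRECTED_BY = {
    "Casino Royale": "Martin Campbell",
    "Quantum of Solace": "Marc Forster",
    "Skyfall": "Sam Mendes",
    "Spectre": "Sam Mendes",
    "No Time to Die": "Cary Joji Fukunaga",
}


def walk_sequence(followed_by):
    """Return concepts in order, starting from the one that nothing precedes."""
    successors = set(followed_by.values())
    starts = [c for c in followed_by if c not in successors]
    if len(starts) != 1:
        raise ValueError("expected exactly one starting concept")
    order = [starts[0]]
    while order[-1] in followed_by:
        nxt = followed_by[order[-1]]
        if nxt in order:
            raise ValueError(f"cycle detected at {nxt!r}")
        order.append(nxt)
    return order


sequence = walk_sequence(IS_FOLLOWED_BY)
mendes_films = [f for f in sequence if DIRECTED_BY[f] == "Sam Mendes"]
print(sequence)
print(IS_PRECEDED_BY["Skyfall"])  # Quantum of Solace
print(mendes_films)  # ['Skyfall', 'Spectre']
```

Because the order lives in the relationships, the same graph can answer questions a sorted list cannot, such as which films by a given director come in what order.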

Another option is to add a property to each concept, such as a field holding a number that indicates the concept’s position in the list. While this metadata field could be useful in the taxonomy user experience, whether consuming systems can use these property values to order items remains an open question. It also gets complicated when multiple sets must be ordered, each reusing the same numbers in its ordering property.
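The collision with a single numeric property shows up as soon as one concept belongs to two ordered sets. In this hypothetical sketch (the process names, steps, and `ordered_members` helper are invented), “Contract review” is step 2 in one process and step 1 in another, so a single position value gets overwritten. Scoping the number to the set it belongs to is one workaround, though that effectively turns it into metadata about the concept-to-set relationship rather than about the concept.

```python
# A single numeric "position" per concept: the second assignment silently
# overwrites the first because "Contract review" sits in two ordered sets.
single_position = {}
single_position["Contract review"] = 2  # step 2 in the hypothetical Procurement process
single_position["Contract review"] = 1  # step 1 in the hypothetical Renewal process

# Workaround: key each position by (set, concept) so every set keeps its own numbering.
positions = {
    ("Procurement", "Request"): 1,
    ("Procurement", "Contract review"): 2,
    ("Procurement", "Signature"): 3,
    ("Renewal", "Contract review"): 1,
    ("Renewal", "Signature"): 2,
}


def ordered_members(positions, set_name):
    """Return the members of one set, sorted by their set-scoped position."""
    members = [(pos, concept) for (s, concept), pos in positions.items() if s == set_name]
    return [concept for _pos, concept in sorted(members)]


print(single_position)  # {'Contract review': 1}
print(ordered_members(positions, "Procurement"))  # ['Request', 'Contract review', 'Signature']
print(ordered_members(positions, "Renewal"))      # ['Contract review', 'Signature']
```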

In advanced graph-based taxonomy and ontology management systems, there may be an option to use reification or RDF* (RDF-star) to attach metadata to triples. In this way, the ordering is embedded in the edges themselves. For example, relationships to books and films could carry a release date. This could look something like “James Bond film” has release date [20061117] “Casino Royale”, in addition to a broader/narrower relationship between the two concepts. There are several modeling options that make use of associative relationships with added metadata on the relationship edge.
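A rough stand-in for edge metadata, written as plain Python data rather than RDF. Each `Edge` holds a statement plus a value about that statement, similar in spirit to an RDF-star quoted triple or a reified statement; the class, its field names, and the `narrower_in_release_order` helper are hypothetical. For brevity the sketch stores release years rather than full dates like the 20061117 in the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """A subject-predicate-object statement plus one value about the statement.

    Roughly analogous to an RDF-star annotation, which in Turtle-star might read:
    << ex:JamesBondFilm skos:narrower ex:CasinoRoyale >> ex:releaseDate "2006-11-17"^^xsd:date .
    (ex:releaseDate is a hypothetical property.)
    """
    subject: str
    predicate: str
    obj: str
    release_year: int


# Edges listed in arbitrary order; the order is recovered from edge metadata.
EDGES = [
    Edge("James Bond film", "narrower", "Skyfall", 2012),
    Edge("James Bond film", "narrower", "Casino Royale", 2006),
    Edge("James Bond film", "narrower", "Spectre", 2015),
    Edge("James Bond film", "narrower", "Quantum of Solace", 2008),
]


def narrower_in_release_order(edges, parent):
    """Return the narrower concepts of `parent`, ordered by the year on each edge."""
    children = [e for e in edges if e.subject == parent and e.predicate == "narrower"]
    return [e.obj for e in sorted(children, key=lambda e: e.release_year)]


print(narrower_in_release_order(EDGES, "James Bond film"))
# ['Casino Royale', 'Quantum of Solace', 'Skyfall', 'Spectre']
```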

In sum, it’s not impossible to model processes and sequences in taxonomies, but it requires thoughtful modeling in the context of other existing business taxonomies, which likely share the same overarching business ontology. Thoughtful modeling may not move at the speed an organization wants, but slowing down and getting it right the first time can save a lot of painful rework later.