Category Archives: Uncategorized

How Not to Say What You Mean

07/29/2024 10:48

*The author at the Panathenaic Stadium in Athens, Greece, the site of the first modern ~~Olympic Games~~.*

“If I had a world of my own, everything would be nonsense. Nothing would be what it is, because everything would be what it isn’t. And contrary wise, what is, it wouldn’t be. And what it wouldn’t be, it would. You see?” – Lewis Carroll, Alice in Wonderland

As I traverse the daily absurdities of life, I’m often caught between the dark, existential absurdism of Albert Camus (Camus was famous, but Sartre was smarter!) and the light, but also dark, comedy of Monty Python. “The absurd is born of this confrontation between the human need and the unreasonable silence of the world,” Camus wrote, but in French. Absurd, but similar in a sense to the absurdity of the Python sketch, “How Not to be Seen”. In this bit, several people “demonstrate the value” of not being seen because, once seen, one may just be shot or blown up. You just never know.

I am reminded of the “How Not to be Seen” skit because there are other things which demonstrate value but should not be seen. Consumers–that is, all of us–often have expectations which may not be known to retail companies. These expectations may be reasonable or unreasonable. We may expect a website to know exactly what we need or mean simply by landing on the home page or typing in a general search query. Things which should not be seen can in fact be fundamental to our contextual understanding and our shopping experiences. Patented, copyrighted, misspelled, or inappropriate words or concepts, for example, should not be seen in the wrong place or context. There are times when what needs (or wants) to be said cannot be.

Taxonomists are word people driven by an insatiable compulsion to classify the world around them. Much of a taxonomist’s work is spent finding the right word form, and its synonymous variants and acronyms, so that a single word or phrase can represent a defined concept. Usually the intent of finding the right concept is exactly so that it can be seen, and understood, by end users. How can taxonomy professionals say what needs to be said without saying it at all? How can the valuable work of agreeing on common terms and phrases be obfuscated so that what needs to be said isn’t seen?

How Not to Say “~~Olympics~~”

Coming up in just a month is a perfect example of what’s not to be seen: the “~~Olympics~~”. Of course, we’ll all be seeing them on television, mottos and rings and all. However, using the word “~~Olympics~~” or any of the associated properties is highly restricted. A full explanation exists here: https://olympics.com/ioc/olympic-properties. I swim with a U.S. Masters Swimming (USMS) team. Despite being an athletic organization, USMS and its subdivisions can’t have any events called “~~Olympics~~” or “~~Paralympics~~” even if they are for fun and without official times.

During the period around the Olympics, people rightly get excited. They want to purchase athletic equipment or apparel to support their favorite teams and sports. How do retailers say “~~Olympics~~” without saying “~~Olympics~~”? In these circumstances, representing the “real” or intended concept while using or redirecting from the stated–or unstated–concept requires some trickery. While pondering how this plays out for retailers, many of whom may or may not have legal contracts with athletes who are past or current Olympians, I visited some sites to conduct some very unscientific and informal searches. In the spirit of not being seen, I won’t mention the sites by name or all of the search terms I used because I cannot know what is actually going on behind the scenes, only what I believe may be happening to make the front end behave as it does.

I started my search by visiting several retail sites that would have an interest in selling products related to the ~~Olympics~~. These sites include athletic apparel and product aggregation sites, and I used the term “~~Olympics~~” as my search query directly on each site’s home page. Here’s what I found.

On the first site, I got a page full of products which will very likely be worn by athletes at the ~~Olympic Games~~. There were no words or phrases appearing in the product names or descriptions linking the products directly to the “~~Olympics~~”. I suspect this product page was manually curated in the web content management system and search terms and variants for “~~Olympics~~” all redirect to this page. On the same site, I put in the name of an ~~Olympic~~ athlete and was presented with a page with products appropriate to that athlete and the sport with no mention of either the upcoming competition or the athlete’s name.

On the second site I chose, I got a handful of products searching for the term “~~Olympics~~”. Not very motivating, but at least there was something. I then searched for an athlete this company sponsors and got a page about the athlete. However, there were no associated products, only information about how this athlete is sponsored by the brand. I’d call this a missed opportunity. Show me your profile of the athlete, sure, but also give me something along the lines of “This athlete likes products x, y, and z.” If I’m a fan of an athlete, I might just buy what that athlete endorses, even if what I can buy isn’t the customized equipment used by an ~~Olympian~~.

On the third site, searching the term “~~Olympics~~” brought me to a single product page for a sporting event which will be at the competition. Fine, but underwhelming. That’s it? That’s your stake in these upcoming sports? While I couldn’t find a specific ~~Olympic~~ athlete sponsored by this brand, I did find teams wearing their apparel. When I searched for one of these teams, I got similar products, but none for the team for which I searched.

On the fourth site, I got a message saying there were no products matching the term “~~Olympics~~”. While this company has avoided using a search term which they have no legal right to display, they have also missed a huge opportunity to move product during one of the largest sporting events around. No search should return an empty results page; it should always redirect to something, even if only notionally related. As a consumer, I might bounce out of this site right here and now. I then searched for an athlete this company sponsors, and, almost unbelievably, the page again told me there were no related products. The term “~~Olympics~~” may not be available for use, but it was no secret which athlete this company sponsors. To not have any related products come up with their name was a big surprise.

On the fifth and final site, a retail aggregator, there seemed to be no restrictions on ~~Olympics~~ or brand name products, with some or most of the available items branded themselves. To be honest, I don’t know how they get away with it, and I would need to talk to someone familiar with business law to get an answer. Perhaps there is a blanket agreement with the IOC (International Olympic Committee) allowing the term to be used in search, product names, and product descriptions since the term appeared in all of those locations. I doubt that many of the numerous sellers on the site have a direct agreement with the IOC, but I did find a few products that claimed to be officially licensed merchandise. Here, any and all search terms brought at least something back.

How Not to Say Your Competitor’s Name

In a world of fleeting social media trends, there is opportunity to ride a popularity wave by creating similar, or even highly contrasting, products. If you like product A, you’ll love the similar (but better!) product B. Or, you used to like product A, but it is so last week, so check out product B as the next trend. With this in mind, my next experiment was searching for competitors’ names and products across sites. Again, I searched directly on the company home page seeking competitor names or products.

Back on site one, searching the competitor’s company name defaulted to a page of undifferentiated products. Again, these are probably manually curated to line up against the competition’s offerings. Since my search was vague, the product offerings spanned just about everything. When I tried a more specific, and trending, product name, I was brought to a page of products which had similar characteristics. Again, there is no mention of the competitor or their product. However, there was a general sense of “if you like this, you might also like this”. I’d call this a win for the retailer. If I came looking for something from another brand expecting to find something similar from this brand, then I may have some brand loyalty and want an equivalent product without having to shop their competitor.

On the second site, I got a message saying there were no products matching the term for their competitor’s name. Similarly, I was not shown any products when I typed in the specific name of a competitor’s offering. Disappointing, especially since there are similar products to be found on the site. It is impossible to know what strategy, if any, this retailer is employing. Maybe they don’t see their competitors as a threat. Maybe they haven’t found a mechanism for presenting similar products. Maybe I just happened to pick a competitor’s name and product they don’t match to. Regardless, coming up with nothing prompts me to look elsewhere.

On site three, I got the same message saying there were no results: an empty set for my search term. When I typed in the name of a popular rival product, I was shown two products which had almost nothing to do with my search. They showed me something, but it gives the searcher the impression that this company doesn’t sell similar products, which is false. If you come to the site with intent, you can probably find what you’re looking for, but if you come with an open-ended search, you will likely be disappointed.

Site four brought up one product when a competitor’s name was typed into search. While the rival company does in fact sell similar products, it’s debatable whether the product this site returned was representative of the other brand. When I typed in a rival brand’s product name, I got no results. Again, an empty results set. The brand does offer a row of their most popular apparel beneath my “no results” message, but this may or may not bear any resemblance to the product I’m trying to compare.

Finally, site five yet again brought up numerous products both for brand and product names. I can’t speak to whether the products are genuine or knock-offs and whether the sellers have any rights to sell those products. What a few searches prove on an aggregate retailer site is that they carry anything and everything.

In summary, I believe some retailers are mapping competitor brand and specific product names to similar products they carry. I’ll speculate that they do this on the front end, manually mapping products the retailer carries to search terms which don’t show in dropdowns, navigation, product names, or product descriptions. In this way, consumers get a reasonable set of comparable product results without the retailer getting into legal trouble for showing event names like the “~~Olympics~~”, competitor brands, or competitor brand products.

How Not to Say the Unsayable

I’ve been focused on retail sites because that’s the environment I work in today, but this idea of redirecting to the proper term came to my attention in my first taxonomist job as an assistant thesaurus editor at an academic institution. There, subject matter experts manually indexed academic articles in a variety of humanities disciplines. Their index covered more than 100 years’ worth of articles and had been built over the span of decades.

Over the course of their work, language changed and terms fell in and out of use. Maintaining a thesaurus of thousands of concepts displayed in a flat list made long-term curation and governance challenging. Terms usually fell out of use as indexing terms to match their use, or lack of use, in the text. Many of these zombie terms lived on in the thesaurus until sometimes they were found by an indexer and used again.

I apologize in advance for the terms I’m going to put into print here, but I think it’s important to underscore exactly what language was appropriate at the time the article was written and the difference between terms we use now to express the same ideas. One term that popped up in indexing was “~~mixed bloods~~”. It only took one glance to know that was not a term we should be using to index academic articles. As I recall, the article did in fact use that phrase, and it was probably an older article being indexed long after the term fell out of use. So, how do we remain true to the spirit of the article while also remaining true to the integrity of the institution and keeping up with appropriate and inoffensive language? In this case, common thesaurus parlance and structures were utilized, using the Use/Used For relationship to point from one term to another. The old term was turned into a synonymous (but hidden) non-preferred term carrying a “use” reference to the more modern term, probably “multiracial”, which became the preferred term and was in turn “used for” the old one. A thesaurus redirect then addresses the current as well as all past indexing. Anyone typing in the former term will automatically be redirected to the newer term and the articles indexed with both concepts.
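As a minimal sketch of the Use/Used For redirect described above, assume the thesaurus and the article index are simple in-memory mappings. The names `USE`, `INDEX`, `used_for`, and `search_articles`, the placeholder entry term, and the article IDs are all hypothetical and are not the institution’s actual system.

```python
# Hypothetical USE references: non-preferred entry term -> preferred term.
# "legacy term" is a placeholder for an outdated term kept only as a hidden entry point.
USE = {"legacy term": "multiracial"}

# Hypothetical index: the term an article was tagged with -> article ids.
INDEX = {
    "legacy term": ["article-1907"],
    "multiracial": ["article-2015", "article-2021"],
}


def used_for(preferred):
    """Return the non-preferred terms (UF) that point at a preferred term."""
    return sorted(term for term, target in USE.items() if target == preferred)


def search_articles(query):
    """Resolve a query to its preferred term, then gather current and past indexing."""
    term = query.strip().lower()
    preferred = USE.get(term, term)  # follow the USE reference if one exists
    hits = []
    for label in [preferred, *used_for(preferred)]:
        hits.extend(INDEX.get(label, []))
    return preferred, hits
```

Searching on either the old or the new term returns the preferred label for display along with articles tagged under both, so nothing indexed in the past is lost.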

Another example, used frequently in the writing of Martin Luther King, Jr., is the term “~~negro~~”. I think this is a good example of intent and the problematic use of inappropriate language. When MLK, Jr. used the term, it was the “acceptable” parlance of his day and was used 15 times in his famous “I Have a Dream” speech. When keyword indexing a piece of literature like this speech, it’s important to capture the subject matter, but we need to recognize the now problematic terminology. Again, using redirects from appropriate terms to older indexed terms solves this problem and can be done again in the future if current terms fall out of favor.

The latter example can quickly lead down a slippery slope of intent and political speech. People say–and imply–what they shouldn’t all the time in political speech. How we recognize and address that speech depends on the context.

How Not to Say What You Want to Say with a Wink and a Nudge

Back to not saying what you want to say when you need to not say it. Riding quickly-changing trends can be a huge revenue vehicle for many companies, especially if they are able to pivot quickly and sell the message to the right audience. As I’ve said before, taxonomy development lags behind current trending language, both because of the amount of work involved in maintaining a vocabulary and because of literary warrant, the principle that a term earns its place in a vocabulary by actually appearing in the material being described. While the concept was established for indexing written works, the principle still applies to fast-paced corporate environments. Add concepts to your taxonomies when they have proven their value or staying power. Anything trending or temporary should be managed another way.

Let’s look at an example. When Barbie was released on July 21, 2023, searches for “Barbie pink” and “pink” spiked according to Google Trends. In addition, searches for “Barbie” and “pink” in various combinations with specific brand and product names also rose. While a retail company might ride this wave and create collections of pink product landing pages, many do not have licenses with Mattel to use the term “Barbie”. Hence, products needed to be assembled based on terminology that in whole or in part, yet again, couldn’t be spoken or displayed.

What are some ways to do this while leveraging controlled metadata values from taxonomy in search? Taxonomists can request on-site search logs from the search team and analyze and mine these concepts for candidate terms. Analyzing search logs is valuable, but making an informed decision as part of the retailer search strategy is even better. Work with marketing and analytics and insights colleagues to identify top competitors and products and map them as appropriate to your company’s competing products. Consumers will be pleasantly surprised when they type in a competitor product and land on a page of very similar products offered by your company. These search terms can then be added to taxonomies strategically.

For anything with staying power (i.e., it has literary warrant), consider using skos:hiddenLabel to redirect to results. The property skos:hiddenLabel is “A lexical label for a resource that should be hidden when generating visual displays of the resource, but should still be accessible to free text search operations” (W3C). A hidden label can be associated with a preferred label concept just like an alternative label. For each appropriate product, map it to a competitor or product name stored as a hidden label so that any search on the company’s front end using a term that cannot legally be displayed redirects to the correct similar products tagged with that competitor name.
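Here is a minimal sketch of that behavior, assuming concepts are held as plain Python objects rather than in a real SKOS store. The `Concept` class, the `ex:` URI, the labels, and the functions `resolve` and `display_labels` are hypothetical. The point is only that a hidden label takes part in matching but never in display.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str
    pref_label: str                                   # skos:prefLabel, safe to display
    alt_labels: set = field(default_factory=set)      # skos:altLabel, safe to display
    hidden_labels: set = field(default_factory=set)   # skos:hiddenLabel, search only


# Hypothetical concept: a competitor product name is stored as a hidden label.
CONCEPTS = [
    Concept(
        uri="ex:trail-shoes",
        pref_label="Trail Running Shoes",
        alt_labels={"trail runners"},
        hidden_labels={"rivalbrand summit x"},
    ),
]


def resolve(query):
    """Match a query against preferred, alternative, and hidden labels."""
    q = query.strip().lower()
    for concept in CONCEPTS:
        searchable = {concept.pref_label, *concept.alt_labels, *concept.hidden_labels}
        if q in {label.lower() for label in searchable}:
            return concept
    return None


def display_labels(concept):
    """Labels a front end may show; hidden labels are deliberately left out."""
    return [concept.pref_label, *sorted(concept.alt_labels)]
```

A search for the competitor product resolves to the retailer’s own concept, and only the preferred and alternative labels are ever handed to the page.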

This method brings up two important questions. Should you use skos:hiddenLabel or its equivalent in a consuming system to redirect from forbidden terms to mapped, and potentially displayable, concepts and products? Yes! Can you store and use forbidden terms in your taxonomy or other systems? Maybe! I have not researched the legality of holding concepts like this in a company’s data. While it’s clear there are concepts which cannot appear anywhere on a company’s site, it’s not clear (at least to me) whether that data can exist hidden in a database. The argument could come down to intent. A company can’t control what a user types into their search box and is recorded in the search index, but they can control the results that are shown. If a company intentionally redirects based on a stored concept, is that OK? My guess is yes since so many companies do it, but it could be that it’s not worth pursuing legally if it’s not on full display. I’ll make a point of looking into the subject more.

Another method along similar lines is to store terms which should not be displayed either as individual concepts within an organization’s taxonomy or as a completely separate taxonomy of concepts which should never be displayed but can still be tagged to content. This method carries more risk as concepts which should not be published for use or are published for specific use can quickly become dissociated from their intent by consuming systems and users. One tagging mistake may reveal a hidden concept on public-facing properties. While this method makes the hidden concepts clearer to manage on the back end, it can have serious repercussions if not governed properly.

When it comes to trending topics, a more flexible approach may be warranted. Not all concepts have staying power, as is evidenced by the trending “Barbie” searches. Therefore, trying to maintain these trending, quickly-changing concepts in taxonomies is unsustainable. In cases like this, using large language models (LLMs) or text analytics may quickly identify trending concepts from multiple sources, including organic and on-site search terms, product reviews and descriptions, and social media feeds. These trending concepts can then be mapped automatically to taxonomy values using semantic similarity to concepts in the product metadata or in the product titles and descriptions. Because these models can operate more quickly on incoming data, it’s possible to act on search terms as they come in rather than spending time codifying them in taxonomies for reuse when these terms probably won’t last long as a topic. Rather than being stored long-term, the concepts are kept only long enough to create mappings to products and assemble product landing pages. The underlying logs and mappings can be purged regularly as trends change. Treating all concepts as viable ways to boost search, while being strategic about which terms wind up as preferred or hidden labels in a taxonomy and which remain fleeting keywords used to match trends, is a more mature level of taxonomy application.
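The sketch below illustrates the short-lived mapping idea under loud assumptions. A real pipeline would likely use embeddings from a language model for semantic similarity; simple token overlap (Jaccard) is swapped in here purely so the example runs without dependencies. The `TAXONOMY` entries, the `ex:` URIs, the threshold, and the `TrendCache` class are all hypothetical.

```python
import time

# Hypothetical taxonomy values: concept URI -> descriptive text to compare against.
TAXONOMY = {
    "ex:pink-apparel": "pink apparel clothing",
    "ex:running-shoes": "running shoes footwear",
}


def similarity(a, b):
    """Jaccard token overlap; a stand-in for embedding-based semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def map_trending(term, threshold=0.2):
    """Return the closest taxonomy concept, or None if nothing is close enough."""
    best = max(TAXONOMY, key=lambda uri: similarity(term, TAXONOMY[uri]))
    return best if similarity(term, TAXONOMY[best]) >= threshold else None


class TrendCache:
    """Holds trending-term mappings only until they expire."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._entries = {}  # term -> (concept uri, stored-at timestamp)

    def add(self, term, now=None):
        uri = map_trending(term)
        if uri is not None:
            self._entries[term] = (uri, time.time() if now is None else now)
        return uri

    def purge(self, now=None):
        """Drop every mapping older than the time-to-live."""
        now = time.time() if now is None else now
        self._entries = {
            t: (u, ts) for t, (u, ts) in self._entries.items() if now - ts < self.ttl
        }

    def lookup(self, term):
        entry = self._entries.get(term)
        return entry[0] if entry else None
```

The trending term itself never enters the taxonomy. It lives in the cache, drives a landing page while the trend lasts, and disappears on the next purge.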

Like not being seen, not saying what you mean is a valuable skill. Ask any politician. Better, put it into practice in your searches and make some consumers happy.

Tempering Your “No Filter” Taxonomy

05/28/2024 17:35

https://pixabay.com/photos/color-black-white-filter-hand-2246333/

“Do what they say / Say what they mean / One thing leads to another.” – One Thing Leads to Another, The Fixx

Are your consumers overwhelmed by a barrage of terms and phrases coming at them from your taxonomy? Are they at a loss when faced by seemingly endless and contextless values presented in a taxonomy-fed typeahead field? How can you ask your taxonomy to manner up and find its filter?

Taxonomists love their taxonomies. They don’t want to see the progeny they’ve so lovingly curated act inappropriately, particularly when it comes to throwing out values which don’t have context or are incorrect for the tasks they are intended to support. As taxonomies mature, they grow both in depth and breadth, becoming more refined and nuanced while also covering more topics and use cases. Building meaningful taxonomies with clear semantic hierarchies is the desired goal when developing enterprise knowledge models for delivery to consuming systems. As an organization’s use cases grow more numerous, however, the complexity involved in delivery can often roll back into taxonomies, forcing compromises and workarounds which further complicate future scalability and context.

I touched on this conundrum briefly in my January blog, “Polyhierarchy and the Dissolution of Meaning”, in which I discuss separating semantic taxonomies from navigational and information access taxonomies consumed by downstream systems. It’s rare that consumers want every value from every taxonomy built as part of an overall enterprise ontology. Separating purely semantic taxonomies from the structures used for delivery can allow for several transformations which may not be possible in systems not designed for taxonomy management. For instance, it allows for logical navigational hierarchies which don’t follow meaningful “is a” taxonomy building principles.

Separating taxonomies may not be necessary, however, if your taxonomy and ontology system, a transformation layer, or the consuming systems can filter or restructure taxonomy values into new hierarchies which are context appropriate. Following are several use cases I’ve heard which may require this kind of transformation.

A Little Bit of Everything from Everywhere

In this scenario, the consumer needs to combine many different types of concepts into a new, usually smaller taxonomy for use in tagging or navigation. For example, they want either individual concepts or branches from different and mutually exclusive domains recombined into a new, single taxonomy for something like a front-end or app experience. Keeping track of all these different values when potentially scattered across numerous taxonomies at different hierarchical depths can be very difficult for taxonomists to visualize and maintain. In essence, they need to see a deliverable sub-taxonomy within the larger taxonomies they have created.

One way to solve this is to use skos:Collection to define a named group to which individual concepts, at any level of any hierarchy, can be added as members for delivery. SKOS collections support grouping, but they don’t support hierarchy, so the delivered concepts may be presented as flat lists in dropdown or typeahead fields. Often, the retention of hierarchy loses meaning when consumed from collections anyway, as collection membership can theoretically be assigned to any concept at any level, making the retention of hierarchy difficult or even nonsensical.
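A minimal sketch of collection-based delivery, assuming concepts and collection membership are kept in plain dictionaries. The `ex:` URIs, labels, and the `deliver_collection` function are hypothetical. It pulls members from two unrelated branches and delivers them as one flat list.

```python
# Hypothetical concepts from two mutually exclusive hierarchies.
CONCEPTS = {
    "ex:footwear": {"label": "Footwear", "broader": None},
    "ex:running-shoes": {"label": "Running Shoes", "broader": "ex:footwear"},
    "ex:colors": {"label": "Colors", "broader": None},
    "ex:pink": {"label": "Pink", "broader": "ex:colors"},
}

# Hypothetical collection: collection URI -> member concept URIs (skos:member).
COLLECTIONS = {
    "ex:app-filters": {"ex:running-shoes", "ex:pink"},
}


def deliver_collection(collection_uri):
    """Return member labels as a flat, sorted list; hierarchy is not carried over."""
    members = COLLECTIONS.get(collection_uri, set())
    return sorted(CONCEPTS[uri]["label"] for uri in members)
```

The broader relationships stay intact in the source taxonomies; the collection only decides what is delivered together.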

Adults Only!

Another fairly common use case is the consumer needing parent terms but not the children. Or, just as common, some but not all of the children or selected children at different levels of the hierarchy for a skipped-level hierarchical delivery.

In this scenario, it could be possible to use a designated property for each use case if the taxonomy management system’s API allows for filtering by a property or relationship. A property will likely be a Boolean flag to indicate it is either true or false for a use case delivery and end consumers can choose to filter out all Boolean values set to false. A relationship could be associative or hierarchical in order to create a hierarchy in which a parent has one set of children for one broader/narrower relationship and another set of children for another broader/narrower relationship.
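The flag-based option can be sketched as follows, assuming each concept carries one Boolean flag per use case. The concept names, the `nav` use case, and both functions are hypothetical. Concepts flagged false are filtered out, and each remaining concept is reattached to its nearest included ancestor to produce a skipped-level hierarchy.

```python
# Hypothetical taxonomy: each concept has a broader term and per-use-case flags.
CONCEPTS = {
    "apparel":  {"broader": None,      "flags": {"nav": True}},
    "tops":     {"broader": "apparel", "flags": {"nav": False}},
    "t-shirts": {"broader": "tops",    "flags": {"nav": True}},
    "tanks":    {"broader": "tops",    "flags": {"nav": False}},
}


def nearest_included_parent(concept, use_case):
    """Walk up the hierarchy until an ancestor flagged for this use case is found."""
    parent = CONCEPTS[concept]["broader"]
    while parent is not None and not CONCEPTS[parent]["flags"].get(use_case, False):
        parent = CONCEPTS[parent]["broader"]
    return parent


def deliver(use_case):
    """Return {concept: delivered parent} for concepts flagged true for the use case."""
    return {
        name: nearest_included_parent(name, use_case)
        for name, data in CONCEPTS.items()
        if data["flags"].get(use_case, False)
    }
```

Here `deliver("nav")` skips “tops” and hangs “t-shirts” directly under “apparel”, while the semantic hierarchy in the source data is left untouched.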

The downsides to both of these solutions are maintenance and functional feasibility. Can taxonomists develop and maintain properties and relationships for each different use case and make sense of them in the UI? Can the taxonomy tool represent hierarchies in which one parent term has different child terms with different broader/narrower relationships, especially if the hierarchical parent-child chain is a mixture of different broader/narrower relationships? In these cases, an associative relationship which stands in place of the broader/narrower relationship may work better in the system even if it doesn’t feed the concepts as parent-child to consuming systems.

Call Security!

What about concepts which need to be delivered to internal consumers but not to external consumers or partner organizations? In this use case, there may be much more at stake than user experience and presentation. If the wrong values are delivered to the wrong customers, there could be legal ramifications or new product releases could be leaked ahead of schedule. In this case, using a Boolean flag may enforce what values are sent to which consumers. Similarly, a status field in the system may be even better. Concepts in a “candidate” status should never be published to downstream systems until they go through a workflow in which they move through stages like “reviewed”, “approved”, and “published”. Similarly, having a “deprecated” flag or status can keep concepts which should no longer be used from flowing downstream to systems in which they can be tagged to assets or exposed in user experiences.
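A minimal sketch of gating delivery on status and audience, assuming each concept record carries a workflow status and an internal-only flag. The field names, the sample concepts, and the `deliver` function are hypothetical.

```python
# Only concepts that have completed the workflow may flow downstream.
PUBLISHABLE = {"published"}

# Hypothetical concept records.
CONCEPTS = [
    {"label": "Running Shoes",   "status": "published",  "internal_only": False},
    {"label": "Unreleased Line", "status": "published",  "internal_only": True},
    {"label": "Draft Concept",   "status": "candidate",  "internal_only": False},
    {"label": "Old Category",    "status": "deprecated", "internal_only": False},
]


def deliver(audience):
    """Return labels the given audience ('internal' or 'external') may receive."""
    delivered = []
    for concept in CONCEPTS:
        if concept["status"] not in PUBLISHABLE:
            continue  # candidate, reviewed, approved, and deprecated never leave
        if concept["internal_only"] and audience != "internal":
            continue  # internal-only concepts are withheld from everyone else
        delivered.append(concept["label"])
    return delivered
```

Treating any audience other than `internal` as external means an unrecognized consumer gets the most restrictive view by default.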

I Know You Are but What Am I?

Probably one of the most common delivery use cases is the “everybody is special” problem. In this case, consumers are steadfast in their ambiguous use of a concept, insisting that it must mean different things to different people depending on the context. In good taxonomy practice, we would opt to disambiguate homographs or other contextually dependent concepts. For example, “mercury (metal)”, “mercury (planet)”, “Mercury (god)”, etc. For navigational purposes, users won’t want to see these qualifiers as we use them in taxonomy practice. In these cases, context frequently provides the meaning needed at the time. If we go to a website and search for or choose “Brazil” as a location, this means the country. If we go to a website and search for or choose “Brazil” as a filter during the World Cup, we mean the Brazil National Football Team. If your taxonomies cannot separate “Brazil (country)” and “Brazil (football team)”, this built-in ambiguity is going to cause major problems across your internal data systems.

If the system functionality permits, using proper taxonomy practices and disambiguating concepts in the semantic taxonomies and then stripping off these parenthetical qualifiers based on use case can maintain both meanings while allowing for front-end ambiguity. Each term will have its own full label and, more importantly, URI, to tell us which was the country and which was the team. Ideally, the taxonomists may be able to influence internal business requesters in a direction which will not require this. “Brazil” is a country only and “Brazil National Football Team” is a football team only. But practitioners know it’s not that easy. We can either allow the ambiguity and deal with the consequences later or we can get clever about filtering taxonomies so we can deliver the concept consumers want without sacrificing the semantics behind the scenes.
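As a minimal sketch of stripping qualifiers at delivery time, assume concepts are stored as a mapping from URI to fully qualified label. The `ex:` URIs and the functions `display_label` and `deliver` are hypothetical. The front end sees the bare label, while the URI keeps the two meanings apart.

```python
import re

# Hypothetical disambiguated concepts: URI -> fully qualified label.
CONCEPTS = {
    "ex:brazil-country": "Brazil (country)",
    "ex:brazil-team": "Brazil (football team)",
}

# Matches one trailing parenthetical qualifier, e.g. " (country)".
QUALIFIER = re.compile(r"\s*\([^()]*\)\s*$")


def display_label(full_label):
    """Strip the trailing parenthetical qualifier for front-end display."""
    return QUALIFIER.sub("", full_label)


def deliver(uris):
    """Deliver the display label together with the URI that preserves the meaning."""
    return [{"uri": uri, "label": display_label(CONCEPTS[uri])} for uri in uris]
```

A location filter would request `ex:brazil-country` and a World Cup filter `ex:brazil-team`; both show “Brazil”, but tagging and analytics downstream still record which concept was chosen.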

In my experience, which may be limited by the tools I’ve used and the organizations at which I’ve worked, achieving this degree of flexibility in taxonomy restructuring can be very difficult with a taxonomy management system alone. It usually requires negotiating with the owners of intermediary transformation layers or consuming systems to build out functionality that enables more flexibility and scalability in taxonomy modeling.

Defining Taxonomy Borders

04/18/2024 11:17

https://pixabay.com/photos/trees-farm-fence-farmland-2900064/

“Before I built a wall I’d ask to know/ What I was walling in or walling out,/ And to whom I was like to give offense.” – Robert Frost, Mending Wall

What data should be included in taxonomy models can be difficult to define. One can see this in practice at organizations in which similar data exists in multiple systems without a common source. Scoping what is included in taxonomies can be written out as broad policies, but these often don’t capture the nuances of data use and overlap. Two helpful ways to decide what data should be included in taxonomies are to look at the purpose and scope of software platforms and to clearly define data use cases.

Scoping by System Purpose

We can begin to define what data belongs where by demarcating borders between systems. We can start by looking at what systems already exist in the organization and what they were brought in to accomplish. While systems may have overlapping functionality, vendors, who are experts in their product offerings, know best what their systems are intended to do and market their products accordingly. While some systems have broad capabilities in an attempt to act as Swiss Army knives for data, dedicated platforms are frequently better for the job they were designed to do.

Starting from the taxonomy platform, taxonomy and ontology management systems (TMS) are meant to house taxonomy instances and their ontological structures. It can be confusing for some because they believe the TMS is where content tagging happens, but these systems are not intended to house external data tables or content. That fact is an important stake in the ground towards defining system boundaries and their intent: when it comes to the TMS, the data and content always live elsewhere. External data can be tagged with taxonomy values as metadata or joined as relational to graph data, but the content is meant to live in place in its home platform. Other dedicated systems in the organization are meant to handle and manage specific domains of data: human resources systems handle personnel data, product information systems handle product information, and enterprise data governance systems centralize data for application of access and sensitivity classifications.

Personally identifiable information (PII) like employee names, social security numbers, and salaries requires the kind of security policies offered by dedicated human resource management systems. Other information of this type includes enterprise financial information, legal and contractual information and documents, and any other sensitive content which is housed in systems with functionality particular to the data types. This kind of data is not meant to be centralized with the intent to distribute widely for consumption by other systems, so these platforms don’t offer those capabilities. Not only is PII better handled by dedicated systems, but the risk involved in managing this kind of information in taxonomy systems is high and the reward low. The high maintenance overhead of managing employee names and associated attributes in taxonomies is usually not worth the trouble.

Digital asset management (DAM) systems include a variety of metadata fields with different data entry types, including dropdown or typeahead controlled vocabulary fields, because string-based descriptors are essential for fully describing digital assets for findability and classification. Metadata application is such a necessary requirement that a DAM which assumed another system would be integrated to supply controlled metadata would not be a minimum viable product for managing assets. Quite often, there is significant overlap between the descriptive metadata used to tag assets and values that could be centralized and used across the organization for a variety of use cases. Nonetheless, these systems are also not meant to be enterprise taxonomy management systems and their taxonomy capabilities are often rudimentary.

Another domain with potentially overlapping data values is product information housed in product information management (PIM) systems. Brand and product names, colors, materials, dimensions, geographical availability, and a host of different product attributes are great candidates for modeling in graphs of taxonomy instances. Other attributes, like SKUs, price, and lengthy textual product descriptions, however, are difficult to manage in taxonomies and don’t offer great advantages by doing so.

When systems are designed to manage content, digital assets, and product information, the capabilities often get murkier because vendors must offer overlapping functionality to create a viable standalone system. The content management system (CMS) SharePoint, for example, can manage taxonomies in the Term Store for tagging content with metadata within the SharePoint ecosystem (and shared across other Microsoft platforms). The Term Store was never meant to be an enterprise taxonomy management system, and, in fact, is nearly impossible to use as a centralized metadata hub outside the closed Microsoft ecosystem.

Unsurprisingly, overlapping functionality occurs in systems which are built for a specific purpose. Software platforms are built to solve one or more business needs by different vendors whose primary aims are to sell product, not to design systems with selfless interoperability with other platforms. It doesn’t serve them to omit functionality provided by another platform in the expectation that consumers will build a complete ecosystem of different, singularly-focused platform modules. Overlapping functionality may, in part, contribute to the confusion about what data lives where and for what purpose.

Scoping by Use Case

In addition to considering platform intent, we can also consider data use cases. Defining how data is to be used may be the most important consideration in determining whether it should be included in taxonomies. The general use case for using a dedicated taxonomy and ontology management system is to centralize metadata values and their attributes. The intent of the system is clear, but the scope of the data included is not. The ability to define use cases for the delivery of taxonomy values as metadata by virtue of having a dedicated taxonomy system at all allows us to change—or perhaps amend—the case for what data should live in the system.

As I noted, some data is simply inappropriate or difficult to manage in taxonomies. Any data that is meant for use in one or more systems and that can be reasonably and securely managed in taxonomies, on the other hand, is potentially fair game for centralized management. A large platform footprint, or heavy reuse across systems, is another good indicator that the data should be centralized for publication to multiple systems, for publication to one system from which it is carried with assets or data to other systems, or both.

Starting with data that is reused across multiple systems is one way to separate platform purpose from data use within the platform. Descriptive metadata for geographical locations, product names and attributes, organizational activities and processes, and topics for rich and textual content are just a few of the use cases which likely see data of the same type being used in many scenarios for many purposes. Concepts used for search typeahead suggestions, search results filtering, navigation, and to describe the assets displayed to users are frequently going to be the same. If they are the same, then providing them from common taxonomies not only fixes the concept label and synonyms, but allows for more advanced use cases like querying knowledge graphs, similarity search, and product recommendations. Tying separate but similar concepts together after the fact is a lost opportunity to measure concept and content performance and derive a host of analytic insights from user actions and internal planning decisions.

Another use case consideration is the speed at which the data changes. Rapidly changing data can include time-based or transactional data which is created for one-time use or reflects a moment in time. While it is possible to include rapidly moving data in the taxonomies themselves, data with high velocity can also have high volatility. Rapidly created and changing data can disrupt the purpose of controlled semantic structures by introducing inaccuracies or representing fleeting moments of truth. Using slower-moving taxonomic structures as a semantic layer over quickly changing data can help provide veracity from a consistent source. Taxonomies are still in play, but are not disrupted as a reliable, stable source of truth.
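As a minimal sketch of the semantic layer idea, assume fast-moving transaction rows carry only a stable concept identifier, while the labels live in the slower-moving taxonomy. The identifiers, field names, and figures below are hypothetical and exist only for illustration.

```python
from collections import Counter

# Slow-moving taxonomy: stable identifiers mapped to preferred labels.
# The "ex:" identifiers are made up for this sketch.
taxonomy_labels = {
    "ex:bball-shoes": "Basketball shoes",
    "ex:toaster-ovens": "Toaster ovens",
}

# Fast-moving transactional data that references concepts by identifier only.
transactions = [
    {"order": 1, "concept": "ex:bball-shoes", "units": 2},
    {"order": 2, "concept": "ex:toaster-ovens", "units": 1},
    {"order": 3, "concept": "ex:bball-shoes", "units": 5},
]


def units_by_label(rows, labels):
    """Roll transactional rows up to taxonomy labels without storing them in the taxonomy."""
    totals = Counter()
    for row in rows:
        totals[labels[row["concept"]]] += row["units"]
    return dict(totals)


report = units_by_label(transactions, taxonomy_labels)
```

The transactions can change as quickly as they like; the taxonomy only has to change when a concept itself changes.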

Though not absolute, concepts which don’t hold any meaning when presented alone should also be questioned before living in taxonomies. Numbers representing widths, lengths, distances, or product types, for example, have meaning in context but are unclear when presented alone. Navigational taxonomies or taxonomy-driven search filters may have values important to the end-user experience but don’t have good context when viewed within the context of the larger graph. If using models for machine learning, these types of values may add noise and should either not be built into the taxonomies in the first place or excluded from subgraphs made available for model use.

Any time concepts are added to taxonomies, consider the overall model, the many use cases it will serve, and whether the data makes sense in the total system and data framework.

Polyhierarchy and the Dissolution of Meaning

01/02/2024 13:33 / 2 Comments on Polyhierarchy and the Dissolution of Meaning

https://pixabay.com/illustrations/red-pattern-abstract-background-2703887/

“Everything is everything/What is meant to be, will be.” – Lauryn Hill

Polyhierarchy

Polyhierarchy is “a controlled vocabulary structure in which some terms belong to more than one hierarchy. For example, rose might be a narrower term under both flowers and perennials in a horticulture vocabulary” (ANSI/NISO Z39.19-2005 (R2010), Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies).

While the ANSI/NISO Z39.19-2005 (R2010) standard is still my go-to for foundational taxonomy principles and may provide validation for using concepts in more than one location, I try to avoid polyhierarchy as much as possible. I see it as a construct necessary only in rare situations and because many systems are unable to consume taxonomical concepts in any other way than their actual location in a hierarchy. Specifically, I don’t like polyhierarchy which is 1) abused out of necessity to suit use cases consuming systems cannot otherwise meet, or 2) used to solve many, differing use cases. To me, polyhierarchy is the enemy of specificity; it is the forward slash of the taxonomy world…the imprecision and indecision of the either/or.

There is a conflict between the construction of one or more taxonomies for semantic accuracy and how those taxonomies are displayed because of the inability to transform and restructure taxonomies to meet different, real-world use cases. If the use case demands a concept be more than one thing in more than one place, it must be put in all of those locations in the originating taxonomies to suit navigational needs.

My former colleague and contemporary taxonomy practitioner, Bob Kasenchak, wrote in his blog post “On Polyhierarchy”, “The most common misuse of polyhierarchy is overuse: the tendency to give terms multiple parents without sufficient reason.” I agree. This statement gets to my main objection to polyhierarchy: when it is overused, semantic precision is diluted. When everything is everything, nothing is anything.

Polyhierarchy in Navigational and Information Access Taxonomies

People have different ways of searching for information and, in an online world in which a user can start in any number of locations and expect to get to the information they want, polyhierarchical taxonomies facilitate navigating to information through multiple pathways.

A common and familiar use case for polyhierarchy is in navigational taxonomies used in online retail. Consumers may require multiple entry points in product hierarchies to find what they are looking for. Using a search engine to get to a product display page in the first place is a common scenario in findability, while searching directly on the retailer’s website is often a consumer’s next choice. However, once on a website, users may use navigational structures and filters to get to specific products. Even if the navigational browse taxonomy is displayed as a flat list rather than a hierarchy, having multiple points of entry is going to lead consumers to the product they are seeking.

For example, one might expect to find Basketball shoes under Men, Women, Unisex, AND Kids. One may also expect to find Basketball shoes under Sports > Basketball. Given the current trends in athleisure apparel, one might also expect to locate Basketball shoes under Casual or Lifestyle. These divergences in meaning account for both a consumer’s individual browsing paths and competing notions of what Basketball shoes are worn to do. For a consumer, Basketball shoes may be just as easily in one category as another without any conflicting meanings.

Supporting this use case in one or more back end systems powering a front end experience may demand a concept be placed in more than one location in a taxonomy management system because the downstream system(s) can only consume concepts exactly as they appear in a hierarchy. In this scenario, you are forced to set up taxonomies that look like the following:

Kids’ shoes
    Basketball shoes
Men’s shoes
    Basketball shoes
Unisex shoes
    Basketball shoes
Women’s shoes
    Basketball shoes
Sports
    Basketball
        Basketball shoes

In the Basketball shoes example, the concept isn’t inherently a member of all the locations it is listed, but is listed in all locations as a way to facilitate user access to products through navigation. Even in this oversimplified taxonomy model, the repetition of the concept is becoming unwieldy.
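To make the repetition concrete, here is a minimal sketch of the same listing held as a single concept with several parents, rather than five separate entries. The `Concept` class, the `ex:` identifiers, and the `paths_to` helper are hypothetical names of my own, not the data model of any particular taxonomy tool.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    uri: str                                           # hypothetical identifier
    label: str
    broader: list[str] = field(default_factory=list)   # URIs of parent concepts


concepts = {
    c.uri: c
    for c in [
        Concept("ex:kids", "Kids' shoes"),
        Concept("ex:mens", "Men's shoes"),
        Concept("ex:unisex", "Unisex shoes"),
        Concept("ex:womens", "Women's shoes"),
        Concept("ex:sports", "Sports"),
        Concept("ex:basketball", "Basketball", ["ex:sports"]),
        Concept(
            "ex:bball-shoes",
            "Basketball shoes",
            ["ex:kids", "ex:mens", "ex:unisex", "ex:womens", "ex:basketball"],
        ),
    ]
}


def paths_to(uri):
    """Return every root-to-concept path as a list of labels."""
    concept = concepts[uri]
    if not concept.broader:
        return [[concept.label]]
    return [
        path + [concept.label]
        for parent in concept.broader
        for path in paths_to(parent)
    ]


navigation_paths = [" > ".join(p) for p in paths_to("ex:bball-shoes")]
```

One concept yields five navigation paths here. Whether a downstream system can consume it that way, or needs five literal copies, is exactly the constraint at issue.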

Sometimes products really are two different things which can’t, or shouldn’t, be reconciled. The Z39.19 standard provides the example that a piano is both a percussion and stringed instrument. Therefore, on a website which sells many kinds of musical instruments, listing pianos under both seems sensible. Similarly, for a retailer selling toasters, ovens, and toaster ovens, we might expect to see Toaster ovens listed under concepts like Ovens and Countertop appliances.

The same principle applies when accessing informational content. A country can be part of a continent and also part of a designated geographical region spanning more than one continent. For example, Denmark is part of both Europe and EMEA (Europe, Middle East, and Africa). In a hierarchy, the construction may look like this:

Continents
    Europe
        Denmark
Geographical Regions
    EMEA
        Denmark

These use cases illustrate a need for polyhierarchy even in cases in which the back end systems may not support the need well.

Polyhierarchy in Semantic Taxonomies

Taxonomies which adhere to more stringent guidelines, which I will term semantic taxonomies, are those which follow taxonomy construction and maintenance standards in an attempt to arrive at more regular, logical structures to reduce or eliminate ambiguity. Building logical, semantic taxonomies has several long-term advantages.

First, adhering to the simple principle of placing a concept in its single best location mitigates problems with system interoperability. In some cases, downstream systems consuming from a taxonomy management system can only recognize a single instance of a concept, most likely because they cannot reconcile multiple labels with exactly the same string of characters. In this situation, the system receives the above example taxonomy hierarchy Kids’ shoes > Basketball shoes first on import and ignores each subsequent instance as it reconciles matching label strings. Another potential issue is that consuming systems won’t allow a concept with the same GUID to exist in more than one location. In well-structured semantic models, any polyhierarchical concept should have only one GUID or URI rather than being a unique instance in each location with exactly the same label but a different identifier.
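The label-matching behavior can be sketched in a few lines. This is an assumption about how such a consumer might behave, not a description of any specific product; `import_by_label` is a hypothetical function written for illustration.

```python
def import_by_label(paths):
    """Keep only the first path seen for each label, ignoring later duplicates."""
    imported = {}
    for path in paths:
        label = path[-1]
        # setdefault leaves the first stored path untouched on later matches.
        imported.setdefault(label, path)
    return imported


exported_paths = [
    ["Kids' shoes", "Basketball shoes"],
    ["Men's shoes", "Basketball shoes"],
    ["Sports", "Basketball", "Basketball shoes"],
]

imported = import_by_label(exported_paths)
```

Only the Kids’ shoes path survives the import. The other locations are silently dropped, which is why the order of export ends up deciding where the concept lives downstream.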

Second, maintaining models requiring many polyhierarchical concepts becomes more difficult as more instances, and more semantically different domains, are covered by the taxonomies. Using the same form for a concept label with a single URI or GUID for multiple purposes can eventually cause a maintenance breakdown in which the concept loses semantic precision and scope and appears in locations with different logical underpinnings, especially using relationships with unique semantic meanings.

Finally, building semantic taxonomies supports the root purpose of taxonomic structures and ontologies: to define concepts so they are unambiguous. My taxonomy 101 go-to is the “is a…” principle. As a fundamental premise, I reject the notion that, in most cases, a concept cannot be placed in a single best location expressing its intrinsic meaning. Is a toaster an appliance? Yes. Is an oven an appliance? Yes. Based on this, it’s easy enough to put toasters and ovens in their place.

Polyhierarchy also has acceptable use in semantic taxonomies. A concept can truly be a member of two categories which are overlapping or mutually exclusive. Our Denmark example above is a case in which a concept is a member of two categories. A homograph, like Mercury, is an example of a concept which has several, mutually exclusive, meanings.

However, in both cases, there are modeling choices that avoid polyhierarchy but depend on having the right functionality available. If the taxonomy tool supports associative relationships and consuming systems can use both hierarchical and associative relationships, the modeling may include a semantically named relationship in place of a standard hierarchical relationship. The associative relationship “is part of geographical region” can be used to create a specific semantic relationship to the concept EMEA, allowing Denmark to be a child of Europe but not of EMEA.

Continents
    Europe
        Denmark (is part of geographical region: EMEA)
Geographical Regions
    EMEA
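A minimal sketch of this model as subject, predicate, object triples shows how Denmark keeps one hierarchical parent while remaining findable through EMEA. The plain-tuple representation and the `objects` and `subjects` helpers are my own illustrative choices, not the storage format of any taxonomy tool.

```python
# Each triple is (subject, predicate, object). "broader" is the hierarchical
# relationship; the second predicate is the semantically named associative one.
triples = [
    ("Europe", "broader", "Continents"),
    ("Denmark", "broader", "Europe"),
    ("EMEA", "broader", "Geographical Regions"),
    ("Denmark", "is part of geographical region", "EMEA"),
]


def objects(subject, predicate):
    """Everything the subject points to through the given relationship."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """Everything that points to the object through the given relationship."""
    return [s for s, p, o in triples if p == predicate and o == obj]


denmark_parents = objects("Denmark", "broader")
emea_members = subjects("is part of geographical region", "EMEA")
```

Denmark has exactly one hierarchical parent, and a consuming system that understands the associative relationship can still list it under EMEA.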

In the Mercury example, the Z39.19 standard suggests the use of parenthetical qualifiers so the concept appears in mutually exclusive domains which may very well all appear in one thesaurus:

Planets
    Mercury (planet)
Metals
    Mercury (metal)
Space vehicles
    Mercury (space vehicle)

One of the challenges, especially in retail taxonomy concepts, is that concepts are rarely a single term. Returning to our toasters and ovens example, the concept Toaster oven is intrinsically two concepts, not one, because we have introduced a pattern of stacking nouns (toaster + oven) to create a new, compound concept. Even more frequently, adjectives modify nouns so that a label includes more than one independent, atomic concept. For the concept Men’s basketball shoes, the pattern is gender + sport + product. Sticking with our notion of a semantic taxonomy, the three separate concepts can easily belong to three mutually exclusive schemes covering Gender, Sports, and Products. When the new, compound concept is created, it’s easy to see how concepts find polyhierarchical locations in different schemes to support navigation.
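One way to picture the gender + sport + product pattern is to keep the three atomic concepts in their own schemes and derive navigation paths from them on demand. The sketch below assumes a consuming layer that can assemble paths from tags; the tag names and path templates are hypothetical.

```python
# An item tagged with one atomic concept from each of three separate schemes.
item_tags = {"gender": "Men", "sport": "Basketball", "product": "Shoes"}

# Each template lists which schemes to combine, in order, for one browse path.
nav_templates = [
    ("gender", "product", "sport"),
    ("sport", "product"),
]


def nav_paths(tags, templates):
    """Build display paths from atomic tags instead of storing each path in the taxonomy."""
    return [" > ".join(tags[scheme] for scheme in template) for template in templates]


paths = nav_paths(item_tags, nav_templates)
```

The schemes stay free of polyhierarchy and the navigation paths become a presentation concern, though this only works when the consuming system can do the assembling.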

What a thing is versus what it is used for can also be problematic and demands a shift in thinking or, rather, an exact definition of the modeling approach used across a set of taxonomies to maintain consistent semantic principles. Again, I stick with what a thing is. My favorite example is James Bond’s exploding pen from GoldenEye. Is the pen a writing utensil? Yes. Is the pen a weapon? Well…in this case it is. In the narrow perspective of spycraft, perhaps a pen is a weapon, but it is not inherently a weapon. In the Bond universe, a pen could very well appear in a taxonomy of weapons, but, as above, there are concept form and modeling choices which would alleviate the confusion. Rather than Pen, would it not then be entered as Exploding pen? Similarly, Bond has used a Rocket pen and a Poison pen. Once we modify these concepts, they can find themselves in one best place in a taxonomy of weapons.

Why consider alternate modeling practices to avoid polyhierarchy if the standards and tool functionality allow it? In addition to the reasons noted in this section, there is planning for unknown domain expansions in attempts to future-proof taxonomies for additional, currently unknown use cases.

Polyhierarchy across a Graph

A fundamental problem in modeling taxonomies is trying to serve two masters by including both semantic structures following logical rules and the useful, though typically less semantically precise, structures required for navigation. By trying to model for both purposes, there are inevitable conflicts which cause compromises in structure and meaning.

Different types of polyhierarchical instances living in the same domain attempting to address conflicting use cases cause the hierarchical taxonomies and the ontologies which provide logical modeling practices for the overall graph to experience semantic drift. While the human mind can understand seeing Dog food as a narrower term for both Pet food and Dogs, a system can only accept the strings it is given.

Using inconsistent modeling practices, like using different types of hierarchical or associative relationships for the same concept, causes concepts to drift from tightly bound semantic meaning, structural context, and scope. As the meaning expands to address more use cases, the precision wanes. As I said earlier, when everything is everything, nothing is anything. In other words, concept meanings become less precise and eventually concepts shift to mean what they are, what they are used for, where they are located in a navigational taxonomy virtual folder structure, who owns the concept, and on and on. The meaning erodes.

So what? We can see the concept in context and figure out what the meaning is, right? So why bother being so tightly bound to the concept meaning? A good use case example is using taxonomies to build machine learning models. The imprecision of having Basketball shoes under multiple parents to provide specific paths for gender navigation, while also having the concept nested under sports, requires that the model be trained to understand that a basketball shoe is not a sport but is used for the sport of basketball. The more connections a concept has to other concepts through hierarchical and associative relationships, the more imprecise it becomes across the graph. While hierarchical structures are useful, graphs are even more so, providing the logical underpinnings for machine learning models, knowledge graphs, recommendation systems, semantic search, etc. Precise meaning becomes more important with each use case.

Polyhierarchy isn’t necessarily to be forbidden in semantic structures, but I propose using it sparingly, when a concept has truly more than one meaning, and for semantic structures which can then be transformed to provide concepts in any hierarchical structure for consuming systems and navigational use.

Ontology Migraine or Taxonomy Headaches

04/22/2013 11:46 / Leave a comment

I’ve recently had to defend my position supporting the construction of an overall (uber-, super-, ultra-, mega-) ontology for internal use at my company.

The argument goes something like this: building an all-encompassing, monolithic ontology is a time-consuming, academic exercise with end results which do not justify the effort. Rather than build something so complicated, instead create multiple taxonomies which are specific to the use to which you want to apply them. On the surface, not an unreasonable argument. However, there are several suppositions built into this.

First, you will never mix your content, therefore you will not have to deal with term conflicts arising from multiple taxonomies covering potentially mutually exclusive or overlapping concepts.

Second, you will never have to worry about conflict resolution between taxonomies.

Finally, complicated relationships involved in ontology construction are pervasive within your vocabulary and require rule-building for each and every term.

In my experience, the question is whether you want to suffer the migraine for a shorter period of time or deal with persistent headaches without end. While neither taxonomies nor ontologies are ever finished, they do arrive at a place where there is less construction and more maintenance. While an ontology requires a very large up-front construction effort, the maintenance thereafter is not particularly difficult (working on the assumption that your foundation was sound). Contrary to some notions, an ontology does not require laborious rule writing. Save that effort for when you truly can’t resolve conflicts.

The same goes for taxonomies. However, when your content begins to mix, you will have conflicts. The more mixing, the more conflict resolution between terms. These can cascade into the realization that your separate taxonomies are incompatible. I’ve worked on many more projects requiring the creation of one taxonomy or ontology from many taxonomies than the other way around.

Yes, the idea that you’ll build the ontology to cover all topics is ludicrous, but ontologies can be built from the ground up in such a way as to cover your domain and be flexible enough to include new areas. Like anything else, planning a good, all-encompassing ontology is difficult, while creating taxonomies “on the fly”, or at least in short order based on new content, is much easier. What you get, however, is sprawl and conflict which negate the short-term win of an easy build.

Give me one migraine over multiple headaches if I have the choice.

One’s destinati…

01/11/2013 10:25 / Leave a comment

One’s destination is never a place, but rather a new way of looking at things. — Henry Miller

Information Panopticon
