No more mystery tins: labels add meaning to content

Yves Barbion
2 days ago
3 min read

The importance of semantic markup for chatbots and AI agents

You may have heard of the wedding-day prank that is sometimes played on newlywed friends: peel the labels off every can in their pantry and neatly put the "blind" cans back on the shelves. The result is confusing at best, and the question "What are we having for dinner tonight?" gets a surprising answer every time.

Welcome to the world of unstructured content.

Life's a box of chocolates (or unlabeled cans)...

Why structure counts

For humans, a daily pantry surprise like this is still manageable, but an AI agent or chatbot struggles to use unstructured content effectively. Without labels, the system doesn't know what it's reading, where it belongs, or how to use it.

To add to the confusion, you could stick a bold label reading "RED" on each can. That would tell you something about one property of the contents, their color, but nothing about what is actually inside. Adding bold or italics to text works much the same way: the text looks different, but the reader is still left asking why. For AI, formatting on its own carries very little meaning.

Structure with Markdown or DITA

In the world of technical documentation, you see the same difference between text written in Markdown and text written in DITA. Both are open, text-based formats, but they use different kinds of markup.

Markdown is fast, light and human-readable. You use special characters ("syntax") to structure your text, for example # for headings, 1. for numbered steps, and > (a blockquote) for warnings. It's like marking each can with a self-invented icon instead of a label: better than nothing, but still open to interpretation.
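To make the "self-invented icon" problem concrete, here is a minimal Python sketch. It assumes a hypothetical house convention that maps # to titles, numbered items to steps and > to warnings. Markdown itself only knows headings, ordered list items and blockquotes, so the roles live in the parsing code rather than in the content:

```python
import re

# Hypothetical house convention: Markdown only defines headings, ordered list
# items and blockquotes; the roles below are our own interpretation of them.
ROLES = [
    (re.compile(r"^#+\s"), "title"),
    (re.compile(r"^\d+\.\s"), "step"),
    (re.compile(r"^>\s?"), "warning"),  # any blockquote is assumed to be a warning
]


def guess_role(line: str) -> str:
    """Guess a line's role from its leading Markdown syntax."""
    for pattern, role in ROLES:
        if pattern.match(line):
            return role
    return "unknown"


doc = """# Replace the toner cartridge
1. Open the front cover.
2. Remove the used cartridge.
> Do not touch the green drum.
> Tip: you can order cartridges online."""

for line in doc.splitlines():
    print(f"{guess_role(line):8} {line}")
```

The tip line is classified as a warning too, because a blockquote says nothing about what it contains: the convention has to be guessed, and the guess can be wrong.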

DITA markup, on the other hand, uses semantic elements such as <task>, <step> and <note>, whose names state explicitly what type of content they contain:

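```xml
<task id="replace-toner"> <!-- DITA topics need an id; this value is illustrative -->
  <title>Replace the toner cartridge</title>
  <taskbody>
    <steps>
      <step><cmd>Open the front cover.</cmd></step>
      <step>
        <cmd>Remove the used cartridge.</cmd>
        <info><note type="caution">Do not touch the green drum.</note></info>
      </step>
    </steps>
  </taskbody>
</task>
```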

An AI system can now read the semantic category of each element directly: a task, its steps, and a caution note. DITA content is therefore not just text with a style, but intelligent content with meaningful labels.
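As a minimal sketch of what "knowing the category" means in practice, Python's standard xml.etree.ElementTree module can read the categories straight from the element names. This assumes a version of the toner task wrapped in DITA's <taskbody> element and parsed as plain XML, without a DOCTYPE or DTD validation; the id value replace-toner is illustrative:

```python
import xml.etree.ElementTree as ET

# A version of the toner task as plain XML (no DOCTYPE, no DTD validation).
# The id value "replace-toner" is illustrative.
DITA_TASK = """<task id="replace-toner">
  <title>Replace the toner cartridge</title>
  <taskbody>
    <steps>
      <step><cmd>Open the front cover.</cmd></step>
      <step>
        <cmd>Remove the used cartridge.</cmd>
        <info><note type="caution">Do not touch the green drum.</note></info>
      </step>
    </steps>
  </taskbody>
</task>"""

root = ET.fromstring(DITA_TASK)

# Every element already says what it is: no guessing from symbols or styling.
title = root.findtext("title")
commands = [cmd.text for cmd in root.iter("cmd")]
cautions = [note.text for note in root.iter("note") if note.get("type") == "caution"]

print("task:", title)
for number, command in enumerate(commands, start=1):
    print(f"step {number}:", command)
for caution in cautions:
    print("caution:", caution)
```

Unlike the Markdown case, no convention lives in the code: the selectors simply ask for <cmd> and <note type="caution">.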


Why semantic markup is crucial for AI and chatbots

Chatbots and LLM-based systems (such as ChatGPT) get more out of meaningful structure than out of visual formatting. Semantic markup such as DITA:

Makes meaning explicit: the model does not have to guess whether something is a step, a warning or a task.
Supports accurate retrieval: retrieval-augmented generation (RAG) systems can retrieve tasks, warnings or definitions separately.
Improves reuse and consistency: the same semantic block can be reused in a manual, a chatbot or a training course.
Helps reduce hallucination: because the structure preserves the context, the AI stays closer to the facts.
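The retrieval point can be sketched in Python. This is a minimal illustration under stated assumptions, not a real RAG pipeline: the hypothetical chunk_task helper turns each DITA command and note into a chunk tagged with its element type, and the hypothetical retrieve function stands in for the kind of metadata filter a retrieval setup might apply before similarity search. Embeddings and ranking are left out:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrievable unit of content plus metadata an index could filter on."""
    topic: str
    kind: str  # semantic category taken from the DITA element, e.g. "step"
    text: str


def chunk_task(xml_text: str) -> list[Chunk]:
    """Split a DITA task into typed chunks (hypothetical helper, not a DITA or RAG API)."""
    root = ET.fromstring(xml_text)
    topic = root.findtext("title", default="")
    chunks = [Chunk(topic, "step", "".join(cmd.itertext())) for cmd in root.iter("cmd")]
    for note in root.iter("note"):
        # The note's type attribute becomes part of the category, e.g. "note/caution".
        kind = "note/" + note.get("type", "note")
        chunks.append(Chunk(topic, kind, "".join(note.itertext())))
    return chunks


def retrieve(chunks: list[Chunk], kind: str) -> list[str]:
    """Metadata filter: return only the chunks of one semantic category."""
    return [c.text for c in chunks if c.kind == kind]


# Illustrative task; the id value is hypothetical.
DITA_TASK = """<task id="replace-toner">
  <title>Replace the toner cartridge</title>
  <taskbody>
    <steps>
      <step><cmd>Open the front cover.</cmd></step>
      <step>
        <cmd>Remove the used cartridge.</cmd>
        <info><note type="caution">Do not touch the green drum.</note></info>
      </step>
    </steps>
  </taskbody>
</task>"""

chunks = chunk_task(DITA_TASK)
print(retrieve(chunks, "note/caution"))
```

Because the category comes from the markup, a question about safety warnings can be answered from the caution chunks alone, and each chunk keeps its task title as context.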

In short: AI performs better with labels that carry meaning.

The moral of this story

Without semantic markup, your content is like a cupboard full of anonymous tins. Semantic elements such as <menucascade>, <term>, <codeblock> or <note type="caution"> give each can its correct label: peeled tomatoes, asparagus, salsify.

This way, your chatbot knows exactly what's inside, and you get reliable, contextual answers.

Or, as we like to put it at Flow: Markdown gives shape. DITA gives meaning. And meaning is what AI needs to become truly smart.

Do you want to know how your organization can take that step towards semantic, AI-friendly content? Get in touch with our team at Flow; we'll be happy to help you bring order to your pantry.