Outage Taxonomy Design That Speeds Up Root-Cause Analysis: A Four-Level Framework for Field Operations


By Verin D'souza | April 13, 2026 | 15 min read

Ask any operations manager what slows down their post-incident reviews and the answer is almost always the same: inconsistent fault classification. One engineer logs a power-related outage under 'Equipment Failure'. Another logs an identical incident under 'Site Issue'. A third leaves the category blank and writes three paragraphs in a free-text remarks field. By the time someone tries to run a quarterly root-cause analysis, the data is too noisy to reveal anything actionable.

The frustrating part is that this is not a people problem. It is a form design problem. When a ticketing form presents a single free-text field or a flat list of thirty fault categories with no structure, the person filling it in is being asked to make a classification decision without enough guidance. Different people make that decision differently. Your data inherits that inconsistency permanently.

A well-designed outage taxonomy solves this at the point of entry. Instead of one flat list, you build a guided cascade: four levels of increasingly specific classification, where each selection narrows the next. The person filling in the ticket is led from a broad category to a precise cause in four clicks, and every ticket in your system ends up consistently coded. This article explains how to design that taxonomy and how to build it in Clappia, along with the contextual fields and closure narratives that turn each ticket into a learning record your team can actually act on.

Why Four Levels?

The number of levels in a taxonomy is a balance between granularity and usability. Too few levels and your categories are too broad to be useful for analysis. Too many levels and the form becomes a chore that field teams rush through, selecting the first plausible option rather than the accurate one.

Four levels strike the right balance for most field operations scenarios. Here is what each level does:

| Level | Name | What It Captures | Example |
|---|---|---|---|
| 1 | Fault Category | The broad infrastructure domain where the fault occurred | Active, Passive, Fiber, Others |
| 2 | Fault Sub-Category | The specific system or component type within that domain | Radio Equipment, Power Supply, Backbone Link, Theft |
| 3 | Outage Major Reason | The functional area or system that failed | Hardware failure, Generator issue, Cable cut, Vandalism |
| 4 | Outage Sub-Category | The precise cause, often with exclusion or eligibility flags | Battery failure (excluded), Fibre splice damaged (eligible) |

Each level depends on the selection made in the level above. A Fiber fault shows completely different sub-categories from an Active fault. Within Active, a sub-category of Radio Equipment shows different major reasons from one of Power Electronics. This cascading dependency is what makes the taxonomy both structured and usable: the person classifying the fault never sees options that are irrelevant to what they have already selected.

A taxonomy that guides the user rather than overwhelming them produces accurate data as a side effect of good design. The user is not trying to classify correctly; they are just answering the next logical question.
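To make the cascading dependency concrete, here is a minimal sketch of the same idea as a nested data structure, written in Python purely for illustration. It is not how Clappia stores options. The `TAXONOMY` fragment and the `next_options` helper are hypothetical, and the Level 4 labels under the generator branch are invented placeholders.

```python
# Hypothetical taxonomy fragment: Level 1 -> Level 2 -> Level 3 -> list of Level 4 options.
TAXONOMY = {
    "Passive": {
        "Generator / DG Set": {
            "Fuel exhaustion": ["Delayed replenishment", "Fuel theft"],  # placeholder labels
            "Mechanical failure": ["Starter motor fault"],  # placeholder label
        },
    },
    "Fiber": {
        "Backbone Link": {
            "Physical cable cut": ["Third-party cable cut"],
            "Splice failure": ["Fibre splice damaged"],
        },
    },
}


def next_options(taxonomy, selections):
    """Return the options to show for the level after the selections made so far.

    `selections` holds zero to three earlier choices, in level order.
    """
    node = taxonomy
    for choice in selections:
        node = node[choice]  # a KeyError means the path does not exist in the taxonomy
    # Levels 1 to 3 are dicts keyed by option name; Level 4 is a plain list.
    return list(node)


print(next_options(TAXONOMY, []))
print(next_options(TAXONOMY, ["Fiber", "Backbone Link"]))
```

Each call returns only the children of the path chosen so far, which is the behaviour the display conditions later in this article aim to reproduce in the form.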

Designing Each Level of the Taxonomy

Level 1: Fault Category

This is the entry point and should cover the broadest possible groupings relevant to your infrastructure. For network or field operations, four categories cover the majority of fault types:

  • Active: Faults in electronic or powered equipment such as radios, controllers, or powered network hardware.
  • Passive: Faults in the supporting infrastructure around the active electronics, such as enclosures, power systems, cooling, and generators.
  • Fiber: Faults in the physical transmission medium, including backbone links, last-mile connections, and splice points.
  • Others: Incidents that do not fit cleanly into the above categories, such as theft, access issues, or project-related disruptions.

If your operations cover different infrastructure types, adapt these category names to match your environment. The principle stays the same: the categories at Level 1 should be mutually exclusive and collectively exhaustive. Every fault your team encounters should fit into exactly one of them.

Level 2: Fault Sub-Category

Each Level 1 category fans out into the specific component types or system areas that can fail within it. The sub-categories for each parent are distinct, so the options shown to the user change completely based on what they selected at Level 1. Some examples of how this maps:

| Fault Category (Level 1) | Fault Sub-Categories (Level 2) |
|---|---|
| Active | Radio Equipment, Transmission Hardware, Controller / Management System, Power Electronics |
| Passive | Generator / DG Set, Battery / UPS, Cooling System, Civil Structure, Earthing |
| Fiber | Backbone Link, Last-Mile Connection, Splice Point, Aerial Cable |
| Others | Theft / Vandalism, Access Issue, Planned Maintenance, Project Work |

Keep sub-category lists focused. Resist the temptation to add every possible permutation at this level. The goal is to narrow the domain so that Level 3 becomes meaningful, not to achieve exhaustive classification at Level 2 itself.
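One way to check that the sub-categories stay distinct per parent is to run the mapping through a small script before building the form. The sketch below uses Python for illustration. `LEVEL_2` copies the example mapping from this section, and `duplicated_sub_categories` is a hypothetical helper name.

```python
from collections import defaultdict

# The Level 1 -> Level 2 mapping from the example in this section.
LEVEL_2 = {
    "Active": ["Radio Equipment", "Transmission Hardware",
               "Controller / Management System", "Power Electronics"],
    "Passive": ["Generator / DG Set", "Battery / UPS", "Cooling System",
                "Civil Structure", "Earthing"],
    "Fiber": ["Backbone Link", "Last-Mile Connection", "Splice Point", "Aerial Cable"],
    "Others": ["Theft / Vandalism", "Access Issue", "Planned Maintenance", "Project Work"],
}


def duplicated_sub_categories(level_2):
    """Return sub-categories listed under more than one parent category."""
    parents = defaultdict(list)
    for category, subs in level_2.items():
        for sub in subs:
            parents[sub].append(category)
    return {sub: cats for sub, cats in parents.items() if len(cats) > 1}


# An empty result means every sub-category belongs to exactly one parent.
print(duplicated_sub_categories(LEVEL_2))
```

A non-empty result points at a sub-category that two engineers could reach through different Level 1 choices, which is the inconsistency the taxonomy is meant to remove.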

Level 3: Outage Major Reason

This is where the taxonomy starts to carry real analytical value. Level 3 identifies the functional reason for the fault within the sub-category selected. For a generator sub-category, the major reasons might be fuel exhaustion, mechanical failure, or control system fault. For a fiber backbone sub-category, the major reasons might be physical cable cut, splice failure, or connector degradation.

At this level, your choices start to map directly to the actions that different teams are responsible for. A fuel exhaustion fault goes to the logistics team. A mechanical failure goes to the maintenance contractor. A cable cut may go to the civil team or the network operator depending on the location. The major reason field is where the taxonomy starts driving dispatch and accountability, not just classification.
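The dispatch idea can be sketched as a simple lookup, shown here in Python for illustration only. The team names follow the examples in the paragraph above. The `operator_owned_route` flag and the "NOC triage" fallback are assumptions added for this sketch and are not part of any Clappia workflow.

```python
# Hypothetical routing table built from the examples in this section.
ROUTING = {
    "Fuel exhaustion": "Logistics team",
    "Mechanical failure": "Maintenance contractor",
}


def assign_team(major_reason, operator_owned_route=False):
    """Pick the responsible team for a Level 3 major reason."""
    if major_reason == "Physical cable cut":
        # Ownership of a cable cut depends on where the cut happened.
        return "Network operator" if operator_owned_route else "Civil team"
    # Unknown reasons fall back to a triage queue (an assumption for this sketch).
    return ROUTING.get(major_reason, "NOC triage")


print(assign_team("Fuel exhaustion"))
print(assign_team("Physical cable cut", operator_owned_route=True))
```

Keeping the routing keyed on Level 3 values means a change to the taxonomy labels needs a matching change to the routing table, so it helps to review both together.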

Level 4: Outage Sub-Category

The final level is the most granular and the most operationally specific. Each option at this level represents a precise fault cause that your team can investigate and resolve through a defined procedure. This is also where you can tag each option with eligibility or exclusion flags.

Exclusion tagging deserves a specific mention. In SLA-driven operations, not every outage counts against your performance targets. A fault caused by a third-party civil contractor cutting a cable may be excluded from your SLA calculations. A fault caused by a planned maintenance activity is excluded by definition. By tagging Level 4 options with a Yes or No exclusion flag, you build that distinction into the taxonomy itself. Every ticket carries the eligibility status of the fault cause, and your SLA reports can filter on it automatically without anyone having to make a post-hoc judgement about which incidents count.
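As a rough illustration of how an exclusion flag feeds an SLA report, the Python sketch below sums downtime only for tickets whose cause is not excluded. The ticket records, field names, and minute values are made up for the example and assume the data has been exported from the ticketing app.

```python
# Hypothetical exported tickets; the values are illustrative only.
tickets = [
    {"cause": "Third-party cable cut", "sla_excluded": True, "downtime_min": 240},
    {"cause": "Fibre splice damaged", "sla_excluded": False, "downtime_min": 90},
    {"cause": "Planned maintenance", "sla_excluded": True, "downtime_min": 60},
]


def sla_downtime_minutes(tickets):
    """Total downtime that counts against the SLA (excluded causes are skipped)."""
    return sum(t["downtime_min"] for t in tickets if not t["sla_excluded"])


def total_downtime_minutes(tickets):
    """Total downtime across all tickets, regardless of SLA eligibility."""
    return sum(t["downtime_min"] for t in tickets)


print(sla_downtime_minutes(tickets), "of", total_downtime_minutes(tickets), "minutes count")
```

Because the flag travels with the Level 4 option, the filter is a single condition rather than a manual review of each incident.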

Contextual Fields That Enrich the Taxonomy

The four-level cascade captures the what and why of a fault. But several additional fields around it provide the context that turns a classified ticket into a fully interpretable record. Three of these deserve particular attention.

Site Power Combination

For infrastructure sites, the power configuration at the time of the fault is highly relevant to root-cause analysis. A site running on a generator at the time of an outage has a different risk profile than a site with mains power. A solar-plus-battery site has different failure modes than a mains-plus-generator site.

Adding a Site Power Combination field (as a Single Selector block with options such as Solar + Mains, Mains + Generator, Solar + Generator) gives your analysis layer the context it needs to correlate fault types with power configurations. Over time, this correlation often reveals patterns that inform infrastructure investment decisions, such as which power configurations are most associated with active equipment failures at specific temperature ranges or in specific regions.
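A correlation like this can be explored on exported data with a few lines of Python. The sketch below is illustrative: the rows are invented, and `share_of_category` is a hypothetical helper, not a Clappia feature.

```python
from collections import Counter

# Hypothetical (power combination, fault category) pairs from exported tickets.
rows = [
    ("Mains + Generator", "Passive"),
    ("Mains + Generator", "Passive"),
    ("Solar + Mains", "Active"),
    ("Solar + Generator", "Passive"),
]


def fault_counts_by_power(rows):
    """Count tickets for each (power combination, fault category) pair."""
    return Counter(rows)


def share_of_category(rows, power, category):
    """Fraction of tickets at a given power combination that fall in `category`."""
    at_power = [c for p, c in rows if p == power]
    return at_power.count(category) / len(at_power) if at_power else 0.0


print(fault_counts_by_power(rows))
print(share_of_category(rows, "Mains + Generator", "Passive"))
```

With real volumes, comparing these shares across power combinations is a quick first look before any deeper analysis.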

Estimated Time of Resolution

The Estimated Time of Resolution (ETR) field, set at the time the ticket is created, serves a dual purpose. In the short term, it communicates a commitment to the team responding and to any stakeholders who receive the initial notification. In the long term, comparing estimated versus actual resolution times across the taxonomy reveals where your team consistently underestimates complexity.

If your Active equipment failures are consistently resolved faster than estimated, your ETR options are probably too conservative for that category. If your Fiber backbone cuts consistently run over ETR, you may need to extend your default commitment for that fault type, or investigate why restorations in that category take longer than expected. The ETR field creates the data to have that conversation with evidence rather than intuition.
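The estimated-versus-actual comparison can be sketched as follows in Python, assuming the tickets have been exported with an ETR and an actual resolution time in hours. The records and field names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical exported tickets; hours are illustrative.
tickets = [
    {"category": "Active", "etr_hours": 4, "actual_hours": 2.5},
    {"category": "Active", "etr_hours": 4, "actual_hours": 3.0},
    {"category": "Fiber", "etr_hours": 6, "actual_hours": 9.0},
    {"category": "Fiber", "etr_hours": 6, "actual_hours": 8.0},
]


def mean_overrun_by_category(tickets):
    """Average of (actual - estimated) hours per fault category.

    Negative values mean faults were resolved faster than estimated.
    """
    overruns = defaultdict(list)
    for t in tickets:
        overruns[t["category"]].append(t["actual_hours"] - t["etr_hours"])
    return {category: mean(values) for category, values in overruns.items()}


print(mean_overrun_by_category(tickets))
```

In this made-up data, Active faults finish ahead of the estimate on average while Fiber faults run over, which is the pattern the paragraph above describes.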

Hub Site Impact

Where a fault involves a hub site affecting multiple dependent sites, capturing that context alongside the taxonomy data is essential for accurate impact assessment. The number of affected child sites and the name of the hub site, stored as conditional fields that only appear when the fault is flagged as a hub incident, give your reports the ability to weight outage severity by total sites affected rather than treating every ticket as a single-site event.

This matters specifically for root-cause analysis at the taxonomy level: if your Fiber backbone faults are disproportionately associated with hub site outages, that tells you something important about where your network's most critical vulnerabilities sit.
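Weighting severity by sites affected could look like the Python sketch below. It assumes each exported ticket carries a hub flag and a child-site count. The field names and figures are hypothetical.

```python
from collections import Counter

# Hypothetical exported tickets.
tickets = [
    {"sub_category": "Backbone Link", "is_hub": True, "child_sites": 12},
    {"sub_category": "Backbone Link", "is_hub": False, "child_sites": 0},
    {"sub_category": "Generator / DG Set", "is_hub": False, "child_sites": 0},
]


def sites_affected(ticket):
    """The faulty site itself, plus dependent child sites for hub incidents."""
    return 1 + (ticket["child_sites"] if ticket["is_hub"] else 0)


def weighted_impact(tickets):
    """Total sites affected per fault sub-category."""
    totals = Counter()
    for t in tickets:
        totals[t["sub_category"]] += sites_affected(t)
    return dict(totals)


print(weighted_impact(tickets))
```

Counting tickets alone would rank the two sub-categories two to one. Counting affected sites shows a much larger gap.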

Building the Cascade in Clappia

The four-level taxonomy is built using Single Selector blocks with display conditions that link each level to the selection made in the level above. Here is how to set it up, step by step, inside the Fault Description section of your ticketing app.

Step 1: Add the Level 1 Selector

Add a Single Selector block labelled 'Fault Category'. Add your top-level options: Active, Passive, Fiber, Others. This field has no display condition and is always visible whenever the fault description section is open.

Step 2: Add the Level 2 Selector with a Display Condition per Option Group

The simplest approach at Level 2 is to add a Single Selector block labelled 'Fault Sub-Category' and use a display condition that shows it only when Fault Category has been selected:

{Fault Category} <> ""

Then populate the options list with all sub-categories across all parent categories, using a naming convention that makes the parent visible in the option text. For example, prefix each option with its parent category: 'Active: Radio Equipment', 'Active: Power Electronics', 'Passive: Generator', 'Passive: Battery', and so on. This approach keeps the build simple, but the single block lists every sub-category regardless of the Level 1 selection. The prefix helps the user find the relevant group; it does not hide the irrelevant ones.

An alternative approach is to create separate Level 2 selector blocks for each parent category, each with a display condition that shows only when Fault Category matches its parent. This is more verbose to build, but it is the approach that actually narrows the visible options, and it produces cleaner option lists with no prefix clutter. Choose whichever approach suits your team's tolerance for form complexity.
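If you use the prefix convention, the option labels can be generated from your taxonomy mapping rather than typed by hand, and exported data can be checked for mismatched selections. The Python sketch below is illustrative. `LEVEL_2` is a shortened example mapping, and both helpers are hypothetical names.

```python
# Shortened example mapping; extend with your own categories.
LEVEL_2 = {
    "Active": ["Radio Equipment", "Power Electronics"],
    "Passive": ["Generator", "Battery"],
}


def prefixed_options(level_2):
    """Build 'Parent: Sub-category' labels for a single Level 2 selector."""
    return [f"{category}: {sub}" for category, subs in level_2.items() for sub in subs]


def matches_parent(fault_category, option):
    """Check that a selected Level 2 label belongs to the chosen Level 1 category."""
    parent, _, _ = option.partition(": ")
    return parent == fault_category


print(prefixed_options(LEVEL_2))
print(matches_parent("Active", "Passive: Battery"))
```

The second helper matters for the single-block approach, where nothing in the form stops someone choosing 'Passive: Battery' after selecting Active at Level 1.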

Step 3: Add the Level 3 Selector

Add a Single Selector block labelled 'Outage Major Reason'. Set the display condition to show this field only when Fault Sub-Category is populated:

{Fault Sub-Category} <> ""

Populate the options list with major reasons relevant to each sub-category. As with Level 2, you can either use a prefix convention or create separate blocks per sub-category. For most implementations, a single block with a prefix convention is sufficient unless your taxonomy is very large.

Step 4: Add the Level 4 Selector with Exclusion Tagging

Add a Single Selector block labelled 'Outage Sub-Category'. Set the display condition to show when Outage Major Reason is populated:

{Outage Major Reason} <> ""

For exclusion tagging, include the eligibility flag in the option text itself. For example: 'Battery failure - SLA Excluded', 'Fibre splice damaged - SLA Eligible', 'Third-party cable cut - SLA Excluded'. This makes the status visible to the person filling in the ticket and searchable in your reports without requiring a separate field.

If you need the exclusion flag as a separate filterable data point, add a read-only Formula block next to the Level 4 selector that returns 'SLA Excluded' or 'SLA Eligible' based on the selected value. In the Clappia formula editor, type @ to insert field variables by selecting from the list of fields in your app. The formula would look like this:

IF(CONTAINS({Outage Sub-Category}, "Excluded"), "SLA Excluded", "SLA Eligible")

Fields used: Outage Sub-Category.

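What the formula does: Checks whether the selected sub-category option text contains the word 'Excluded' and returns the appropriate SLA status string. Any option text without that word returns 'SLA Eligible', so every Level 4 option needs to carry one of the two flags in its label.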

What the user sees: A read-only field showing 'SLA Excluded' or 'SLA Eligible', which is stored with the submission and available as a filter in reports and exports.
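Because the status depends on a substring in the option label, it is worth checking your option list for labels that carry no flag at all. The Python sketch below mirrors the substring logic for illustration. It is not Clappia code, and the helper names and the unflagged sample label are hypothetical.

```python
def sla_status(option_text):
    """Mirror the substring check: labels containing 'Excluded' are excluded."""
    return "SLA Excluded" if "Excluded" in option_text else "SLA Eligible"


def unflagged_options(options):
    """Options carrying neither flag, which the substring check treats as eligible."""
    return [o for o in options
            if "SLA Excluded" not in o and "SLA Eligible" not in o]


options = [
    "Battery failure - SLA Excluded",
    "Fibre splice damaged - SLA Eligible",
    "Connector degradation",  # hypothetical label with the flag left off
]

for option in options:
    print(option, "->", sla_status(option))
print("Missing a flag:", unflagged_options(options))
```

Running a check like this whenever the Level 4 list changes helps keep a forgotten flag from silently counting an outage as SLA eligible.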

Formula Helpers for Clean Notifications

The fault taxonomy data is most useful when it appears clearly in the notifications that go out at ticket creation time. Raw date and time field values from Clappia can sometimes render in formats that are not immediately readable in a Telegram message or email. Two formula helper fields solve this cleanly.

Formatted Fault Start Date

Add a Formula block labelled 'Formatted Start Date'. In the formula editor, type @ and select the Fault Start Date field from the list. The formula formats it as a readable string:

TEXT({Fault Start Date}, "DD-MMM-YYYY")

Fields used: Fault Start Date.

What the formula does: Converts the date value to a string in the format '14-Jun-2025'.

What the user sees: A read-only field. The value of this formula field is what you reference in your Telegram message and email templates, not the raw date field. This ensures the date appears consistently formatted regardless of the device locale or regional settings of the person who created the ticket.

Formatted Fault Start Time

Add a corresponding Formula block labelled 'Formatted Start Time':

TEXT({Fault Start Time}, "HH:MM")

Fields used: Fault Start Time.

What the formula does: Converts the time value to a 24-hour format string such as '14:35'.

What the user sees: A read-only field used in notification templates. Combined with the formatted date, your Telegram messages will show 'Fault Start: 14-Jun-2025 14:35' rather than a raw timestamp that varies depending on device settings.

These helper fields exist purely for notification formatting, so keep them out of the way of the person filling in the form. Either hide them with a display condition that never shows them, or place them at the bottom of the section where they do not interrupt the form flow.
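For comparison, the same target format can be produced outside Clappia with Python's standard `datetime` module, which is handy when checking what a notification should look like. This is a sketch for illustration, and `notification_line` is a hypothetical helper name.

```python
from datetime import datetime


def notification_line(fault_start):
    """Format a fault start timestamp as 'Fault Start: DD-Mon-YYYY HH:MM'."""
    # %d = zero-padded day, %b = abbreviated month name, %Y = four-digit year,
    # %H:%M = 24-hour time. %b follows the process locale, which is English by default.
    return "Fault Start: " + fault_start.strftime("%d-%b-%Y %H:%M")


print(notification_line(datetime(2025, 6, 14, 14, 35)))
# prints "Fault Start: 14-Jun-2025 14:35" under the default locale
```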

Closure Narratives: Turning Tickets into Learning Records

The taxonomy captures what failed and why. The closure narratives capture what was learned and what was done about it. Together, they make each closed ticket a self-contained learning record that your team can reference when a similar fault occurs in the future.

Three fields make up the standard closure narrative set:

Reason for Outage (RFO)

This is a concise factual statement of what caused the outage. It should be specific enough to distinguish this incident from superficially similar ones. 'Generator fuel exhaustion due to delayed replenishment schedule' is a useful RFO. 'Generator issue' is not, because it does not say anything the taxonomy has not already captured.

In Clappia, add a Long Text block labelled 'Reason for Outage (RFO)'. Make it required on ticket closure so it cannot be skipped. A blank RFO is the most common gap in closure documentation.

Root Cause Analysis (RCA)

The RCA goes one level deeper than the RFO. While the RFO describes what happened, the RCA explains why it happened and, where relevant, what conditions allowed it to happen. A strong RCA answers not just 'what broke' but 'what process, maintenance gap, or infrastructure condition created the conditions for this fault'.

Add a Long Text block labelled 'Root Cause Analysis (RCA)'. This is where the operational learning lives. An RCA that identifies a recurring maintenance gap or a design vulnerability has value far beyond the individual ticket; it informs decisions that prevent future incidents.

Action Taken for Fault Rectification

Add a Long Text block labelled 'Action Taken for Fault Rectification'. This documents what was physically done to restore service: which components were replaced, what configuration changes were made, what temporary workarounds are in place, and whether a permanent fix is still pending. This field is particularly valuable for the next engineer who attends the same site for a related fault.

The three closure narrative fields, combined with the four-level taxonomy classification, mean that every closed ticket tells a complete story: what failed (Level 1 to 4 taxonomy), why it failed (RFO and RCA), how it was fixed (Action Taken), and how long it took (the repair time, if your app computes it with a formula field). That story is queryable, filterable, and exportable in a way that free-text incident reports never are.
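A completeness check on exported closure data can be sketched in Python as below. The `Closure` record and the `missing_closure_fields` helper are hypothetical. In the form itself, marking the fields as required does this job at the point of entry.

```python
from dataclasses import dataclass


@dataclass
class Closure:
    rfo: str
    rca: str
    action_taken: str


def missing_closure_fields(closure):
    """Return the labels of closure narrative fields that are blank."""
    fields = {
        "Reason for Outage (RFO)": closure.rfo,
        "Root Cause Analysis (RCA)": closure.rca,
        "Action Taken for Fault Rectification": closure.action_taken,
    }
    # Whitespace-only text counts as blank.
    return [label for label, value in fields.items() if not value.strip()]


closure = Closure(
    rfo="Generator fuel exhaustion due to delayed replenishment schedule",
    rca="   ",
    action_taken="Refuelled generator and restored site power",
)
print(missing_closure_fields(closure))
```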

How the Taxonomy Improves Over Time

A taxonomy is not a static design. The first version you deploy will be imperfect in ways you cannot fully anticipate until your team has been using it for a few months. Some Level 3 options will turn out to be too granular and will rarely be selected. Some Level 2 categories will attract a disproportionate share of 'Other' selections, which signals that a missing option needs to be added. Some Level 3 options will almost always lead to the same Level 4 option, suggesting that the two should be merged or that the path through the cascade could be shortened.

The way to catch these issues is to review your taxonomy data quarterly. Look for:

  • High 'Other' usage at any level: This means your taxonomy has a gap. People are using 'Other' as a catch-all because the right option does not exist. Add the missing options.
  • Options that are never selected: These add clutter without adding value. Remove them or consolidate them with adjacent options.
  • Inconsistent selections for known fault types: If two engineers attending identical faults are classifying them at Level 3 and 4 differently, your option labels are ambiguous. Rewrite them with more specific descriptions.
  • RCA patterns across multiple tickets: If the same root cause appears in the RCA field of ten separate tickets over a quarter, that is a systemic issue that the taxonomy data has helped surface. It is also an input for your next infrastructure or process improvement review.
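The first two checks in the list above are easy to automate on exported data. The Python sketch below is a minimal illustration. The option names and selections are invented, and both helpers are hypothetical.

```python
def other_share(selections):
    """Fraction of tickets where 'Other' was selected at a given level."""
    if not selections:
        return 0.0
    return sum(1 for s in selections if s == "Other") / len(selections)


def never_selected(defined_options, selections):
    """Options defined in the form that no ticket has used."""
    used = set(selections)
    return [o for o in defined_options if o not in used]


# Hypothetical Level 3 options for one sub-category and a quarter of selections.
defined = ["Fuel exhaustion", "Mechanical failure", "Control system fault", "Other"]
selected = ["Other", "Fuel exhaustion", "Other", "Mechanical failure"]

print(other_share(selected))
print(never_selected(defined, selected))
```

A high 'Other' share alongside an unused option often means the unused label is ambiguous rather than unnecessary, so check the free-text remarks before removing it.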

The taxonomy is not just a classification system. It is a feedback loop. Every ticket you close with complete taxonomy data and closure narratives is a data point that makes your next quarterly review more accurate and your next infrastructure decision better informed.

Access and Mobile Considerations

The taxonomy section of your ticketing form is typically completed by the field engineer on site or by a NOC analyst creating the ticket remotely. Either way, the form needs to be usable on a mobile device in conditions that are not always ideal.

A few practical points on access configuration and mobile use in Clappia:

  • Field engineers: Should have standard user access with the ability to create and update tickets. The taxonomy fields are part of the standard ticket creation flow, so no special permissions are needed.
  • NOC analysts: Often create tickets based on alarm system alerts rather than being physically present. They should have the same create and update access, with view access to all submissions so they can monitor the ticket list.
  • Offline mode: In field operations, the site experiencing the fault is often the same site providing local network connectivity. Clappia's offline mode allows field engineers to complete the taxonomy classification, attach photos, and submit the ticket without a signal. The submission syncs when connectivity is restored, and the workflow notifications fire at that point.
  • Taxonomy usability on mobile: Because the cascade reduces the visible options at each level, the form stays manageable on a small screen even with four levels of classification. The person never sees more than a handful of relevant options at any point, which is important when the form is being filled out on a phone at a fault site in poor lighting.

Conclusion

The quality of your root-cause analysis depends directly on the quality of the data you collect at the time the ticket is created. A four-level cascaded taxonomy, designed with mutually exclusive categories, context-enriching fields like power combination and hub site impact, and closure narratives that capture the learning from each incident, gives you data that is actually useful for analysis rather than data that merely records that something went wrong.

Building this in Clappia is a form design exercise as much as a data design one. The cascade is implemented through Single Selector blocks with display conditions. The contextual fields sit alongside the taxonomy in the same section. The closure narratives live in a separate general information section that becomes relevant at the end of the ticket lifecycle. The formula helpers keep your notification output clean and consistent regardless of device locale.

None of this requires technical development skills. It requires thinking clearly about what your operations team actually needs to know when they look at a fault record three months after it was closed, and designing the form to capture that information at the point when the person attending the fault knows it best.
