Scenario 3: RAN-Transport-Core Finger-Pointing

Executive summary

Mobile service incidents often span RAN, transport, and core domains. The operational failure mode is predictable: each domain sees valid symptoms, but without a dependency graph that connects them, teams default to local reasoning and the incident turns into a coordination problem.

This scenario compares two approaches:

  • LLM-based AIOps: good at summarizing alarms and complaints within each domain, but tends to fragment the incident into multiple narratives.
  • GNN-based AIOps: topology-aware reasoning across a dependency graph that links cells, backhaul, aggregation, and core functions into one causal chain.

In this scenario the result is a large MTTR gap: 4 to 12 hours with the LLM-based approach versus 30 to 60 minutes with the GNN-based approach.

Scenario overview

Network context: Mobile operator.
Trigger: Transport path degradation.
Symptoms: RAN, core, and transport alarms; customer experience degradation.
Hidden cause: Transport impairment affecting specific cell clusters and services.
Business impact: Churn risk and NPS hit.

Side-by-side timeline: LLM behavior vs GNN behavior

T+0s: Transport impairment begins

  • Network reality: backhaul transport impairment begins (CRC/errors on a subset of links).
  • LLM behavior: waits for domain alarms/logs.
  • GNN behavior: detects physical impairment on specific transport edges.
  • Outcome: early anchor in transport domain.
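
As a rough illustration of what an early transport anchor could look like, the sketch below flags links whose CRC error ratio exceeds a threshold. It is a plain threshold check, not a GNN and not any vendor's detection logic. The link IDs, counter fields, and the 1e-6 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class LinkSample:
    link_id: str      # hypothetical transport edge identifier
    rx_frames: int    # frames received during the polling interval
    crc_errors: int   # frames that failed the CRC check in that interval

def impaired_links(samples, max_error_ratio=1e-6):
    """Return IDs of links whose CRC error ratio exceeds the assumed threshold."""
    flagged = []
    for s in samples:
        if s.rx_frames == 0:
            continue  # no traffic in the interval, so no ratio to evaluate
        if s.crc_errors / s.rx_frames > max_error_ratio:
            flagged.append(s.link_id)
    return flagged

# Illustrative counters only; not measurements from a real network.
samples = [
    LinkSample("agg1-bh07", rx_frames=2_000_000, crc_errors=450),
    LinkSample("agg1-bh08", rx_frames=1_800_000, crc_errors=0),
    LinkSample("agg2-bh11", rx_frames=2_100_000, crc_errors=1),
]
print(impaired_links(samples))
```

With these sample values only agg1-bh07 is flagged, which gives later steps a specific transport edge to reason from.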

T+30s: RAN KPIs degrade

  • Network reality: handover failures and throughput dips appear.
  • LLM behavior: sees RAN alarms; may label as “RAN issue.”
  • GNN behavior: maps RAN cells to backhaul dependencies; flags shared transport edges.
  • Outcome: cross-domain linkage established.
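
The cell-to-backhaul mapping can be pictured as a lookup followed by a set comparison. The sketch below is a simplified stand-in for graph reasoning, not a GNN. The cell names, edge names, and the rule "used by every degraded cell and by no healthy cell" are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical inventory: cell -> transport edges on its backhaul path.
CELL_BACKHAUL = {
    "cell-101": ["bh07", "agg1-core"],
    "cell-102": ["bh07", "agg1-core"],
    "cell-103": ["bh08", "agg1-core"],
    "cell-201": ["bh11", "agg2-core"],
}

def candidate_edges(degraded_cells, cell_backhaul):
    """Edges shared by every degraded cell and used by no healthy cell."""
    counts = Counter()
    for cell in degraded_cells:
        counts.update(set(cell_backhaul.get(cell, [])))
    healthy = set(cell_backhaul) - set(degraded_cells)
    healthy_edges = {e for c in healthy for e in cell_backhaul[c]}
    return sorted(
        edge for edge, n in counts.items()
        if n == len(degraded_cells) and edge not in healthy_edges
    )

print(candidate_edges(["cell-101", "cell-102"], CELL_BACKHAUL))
```

Here agg1-core is shared by both degraded cells but also carries the healthy cell-103, so only bh07 remains as a candidate.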

T+1m: Core sees session drops

  • Network reality: core sees session drops for affected subscribers.
  • LLM behavior: sees core logs; may label as “core instability.”
  • GNN behavior: traces service chain: cell -> backhaul -> aggregation -> core function -> subscriber sessions.
  • Outcome: single incident across domains.
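
The service-chain trace can be sketched as a walk over a next-hop map. This is a minimal illustration under assumed names (the cells, backhaul links, aggregation node, core function, and sessions are all hypothetical), and it uses plain dictionary traversal rather than a learned model.

```python
# Hypothetical next-hop map: cell -> backhaul -> aggregation -> core function.
NEXT_HOP = {
    "cell-101": "bh07",
    "cell-103": "bh08",
    "bh07": "agg1",
    "bh08": "agg1",
    "agg1": "upf-1",
}

# Hypothetical subscriber sessions and the cell each one is attached to.
SESSION_CELL = {"sess-1": "cell-101", "sess-2": "cell-103"}

def service_chain(cell):
    """Follow next hops from a cell until the chain ends."""
    chain, seen = [cell], {cell}
    while chain[-1] in NEXT_HOP:
        nxt = NEXT_HOP[chain[-1]]
        if nxt in seen:  # guard against a misconfigured loop in the map
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain

def sessions_through(element):
    """Sessions whose service chain traverses the given element."""
    return sorted(
        s for s, cell in SESSION_CELL.items() if element in service_chain(cell)
    )

print(" -> ".join(service_chain("cell-101")))
print(sessions_through("bh07"))
```

Asking for sessions through bh07 returns only sess-1, while asking for agg1 returns both sessions, which shows how the blast radius depends on where in the chain the impairment sits.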

T+2m: Multi-team alarms and blame ping-pong

  • Network reality: multiple domain teams see alarms; coordination overhead starts.
  • LLM behavior: produces separate summaries per domain; lacks unifying causal proof.
  • GNN behavior: produces unified causal chain and ownership: transport edge impairment driving RAN and core symptoms.
  • Outcome: faster accountability.
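
One way to picture deterministic ownership is to group alarms by the transport edge they depend on and assign the incident to that edge's owning team. The sketch below assumes a precomputed resource-to-edge lookup. The resource names, team names, and alarm texts are hypothetical, and no claim is made that any product implements it this way.

```python
# Hypothetical lookup: alarmed resource -> transport edge it depends on.
ROOT_EDGE = {
    "bh07": "bh07",
    "cell-101": "bh07",
    "upf-1/pool-3": "bh07",
}

# Hypothetical owning team per domain.
OWNER_BY_DOMAIN = {"ran": "ran-ops", "transport": "transport-noc", "core": "core-ops"}

def group_alarms(alarms):
    """Group alarms by root transport edge; unmapped alarms stay with their own domain."""
    incidents = {}
    for alarm in alarms:
        root = ROOT_EDGE.get(alarm["resource"])
        if root is not None:
            key, owner = root, OWNER_BY_DOMAIN["transport"]
        else:
            key, owner = alarm["resource"], OWNER_BY_DOMAIN[alarm["domain"]]
        incident = incidents.setdefault(key, {"owner": owner, "alarms": []})
        incident["alarms"].append(alarm["text"])
    return incidents

alarms = [
    {"domain": "transport", "resource": "bh07", "text": "CRC errors rising"},
    {"domain": "ran", "resource": "cell-101", "text": "handover failures"},
    {"domain": "core", "resource": "upf-1/pool-3", "text": "session drops"},
    {"domain": "ran", "resource": "cell-900", "text": "unrelated hardware alarm"},
]
for key, incident in group_alarms(alarms).items():
    print(key, incident["owner"], incident["alarms"])
```

The three related alarms collapse into one incident owned by the transport team, while the unrelated RAN alarm remains a separate incident with its own owner.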

T+5m: Customer complaints increase

  • Network reality: experience metrics worsen; complaints arrive.
  • LLM behavior: summarizes complaints; suggests generic mitigations.
  • GNN behavior: quantifies impacted subscribers/cells and predicts spread if impairment worsens.
  • Outcome: prioritized response.
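
Impact quantification reduces to counting what depends on the impaired edges. The sketch below uses assumed cell-to-edge mappings and made-up subscriber counts. The second call is a simple what-if that approximates "spread if the impairment worsens" by adding a neighboring edge, which is an illustrative simplification rather than a prediction model.

```python
# Hypothetical inventory: cell -> transport edges it depends on.
CELL_EDGES = {
    "cell-101": {"bh07"},
    "cell-102": {"bh07"},
    "cell-103": {"bh08"},
}

# Made-up active subscriber counts per cell.
ACTIVE_SUBSCRIBERS = {"cell-101": 420, "cell-102": 310, "cell-103": 275}

def impact(impaired_edges):
    """Cells and subscriber total that depend on any of the impaired edges."""
    cells = sorted(c for c, edges in CELL_EDGES.items() if edges & impaired_edges)
    subscribers = sum(ACTIVE_SUBSCRIBERS.get(c, 0) for c in cells)
    return {"cells": cells, "subscribers": subscribers}

print(impact({"bh07"}))           # current impairment
print(impact({"bh07", "bh08"}))   # what-if: impairment spreads to bh08
```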

T+8m: Wrong lever temptation

  • Network reality: engineers consider RAN parameter tweaks (often the wrong lever).
  • LLM behavior: may recommend parameter changes based on text patterns.
  • GNN behavior: shows RAN changes will not resolve transport-edge causality; recommends transport reroute/isolation.
  • Outcome: prevents wasted actions.

T+12m: Transport reroute attempted

  • Network reality: reroute is attempted for affected paths.
  • LLM behavior: observes “changes made” and waits.
  • GNN behavior: validates reroute restores dependency chain; monitors expected KPI recovery sequence.
  • Outcome: controlled remediation.

T+15m: KPIs improve

  • Network reality: RAN KPIs improve; core sessions stabilize.
  • LLM behavior: generates resolution narrative; may misattribute cause.
  • GNN behavior: confirms closure: transport edge metrics normalize first; downstream KPIs recover as predicted.
  • Outcome: strong RCA evidence.
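
The recovery-order check described above can be expressed as a small predicate: the root layer must recover, and it must do so before every downstream layer. The layer names and timestamps (seconds since incident start) below are illustrative assumptions, not values from a real incident.

```python
def recovery_confirmed(recovered_at, root="transport"):
    """True if the root layer recovered before every other layer recovered."""
    root_time = recovered_at.get(root)
    if root_time is None:
        return False  # root has not normalized, so closure cannot be confirmed
    downstream = [t for layer, t in recovered_at.items() if layer != root]
    return all(t is not None and t > root_time for t in downstream)

# Illustrative recovery times in seconds since incident start.
expected = {"transport": 740, "ran": 860, "core": 900}
suspicious = {"transport": 880, "ran": 860, "core": 900}
pending = {"transport": 740, "ran": 860, "core": None}

print(recovery_confirmed(expected))     # transport first, downstream after
print(recovery_confirmed(suspicious))   # RAN recovered before transport
print(recovery_confirmed(pending))      # core has not recovered yet
```

A recovery that arrives out of order, as in the second case, suggests the reroute may not be what fixed the downstream KPIs and that the root cause deserves another look.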

T+30m: Postmortem and hardening

  • Network reality: operator wants to identify weak backhaul segments.
  • LLM behavior: produces text summary.
  • GNN behavior: outputs ranked backhaul edges by risk and customer impact using the dependency graph.
  • Outcome: preventive investment guidance.
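
A ranked list of weak backhaul edges needs some scoring rule. The sketch below multiplies past impairment count by dependent subscribers. That formula, the 90-day window, and all figures are assumptions for illustration; a production system would likely use a richer risk model.

```python
# Made-up per-edge history and dependency counts.
edges = [
    {"edge": "bh07", "impairments_90d": 4, "subscribers": 730},
    {"edge": "bh08", "impairments_90d": 1, "subscribers": 275},
    {"edge": "bh11", "impairments_90d": 2, "subscribers": 1200},
]

def risk_score(edge):
    """Assumed score: impairment frequency weighted by dependent subscribers."""
    return edge["impairments_90d"] * edge["subscribers"]

def rank_edges(edges):
    """Edges ordered from highest to lowest assumed risk score."""
    return sorted(edges, key=risk_score, reverse=True)

for e in rank_edges(edges):
    print(e["edge"], risk_score(e))
```

Under this scoring bh07 ranks first, and bh11 ranks ahead of bh08 despite fewer impairments than bh07 because it carries more subscribers.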

T+1d: Repeat impairment risk

  • Network reality: similar impairment occurs; need faster detection.
  • LLM behavior: learns incident-by-incident.
  • GNN behavior: continuous cross-domain correlation with deterministic ownership.
  • Outcome: operational maturity gain.

Why this gap exists: cross-domain incidents require a dependency graph

In mobile networks, symptoms are distributed across RAN, transport, and core. Without a graph that connects these domains, “correlation” becomes a meeting.

LLMs can summarize each domain’s alarms. But they do not inherently anchor causality to the physical transport edges and then prove propagation through the service chain.

What to ask when evaluating AIOps for mobile cross-domain incidents

  1. Can it anchor the incident to specific transport edges (CRC/errors) and not just “RAN degradation”?
  2. Can it map impacted cells to backhaul dependencies automatically?
  3. Can it trace a single causal chain across RAN -> transport -> core -> subscriber sessions?
  4. Can it quantify impacted subscribers/cells and prioritize response?
  5. Can it validate remediation by checking the expected recovery order?

NetAI perspective

NetAI GraphIQ uses GNN-powered dependency-graph reasoning to unify RAN, transport, and core into one incident thread with deterministic ownership.