ROS 2 Diagnostics Guide 2026
Monitor robot health with the ROS 2 diagnostics stack: diagnostic_updater for per-node status reporting, FrequencyStatus and TimeStampStatus for common checks, diagnostic_aggregator with GenericAnalyzer YAML config, and rqt_robot_monitor for real-time visualization.
Why Diagnostics Matter
Without diagnostics, a crashed sensor driver looks identical to a healthy robot from the outside — the robot just stops working. With diagnostics, you get a timestamped record of when each component went ERROR or STALE, which component caused the failure, and the exact error message. Essential for field robots where you can't attach a debugger.
ROS 2 Diagnostics Overview
The diagnostics system provides a standardized way for nodes to report their health status. Every component publishes to /diagnostics, which aggregators collect, group, and escalate.
ROS 2 Diagnostics Architecture
═══════════════════════════════════════════════════════
Node (sensor driver, motor controller, etc.)
│
│ publishes DiagnosticArray → /diagnostics (1 Hz)
│
▼
diagnostic_aggregator
│ subscribes to /diagnostics
│ groups by Analyzer (namespace, prefix, name)
│ publishes → /diagnostics_agg (1 Hz)
│
▼
rqt_robot_monitor / robot_monitor / custom dashboard
DiagnosticStatus levels:
OK (0) — component healthy
WARN (1) — degraded, may need attention
ERROR (2) — component failed
STALE (3) — no updates received (timeout)
Key packages:
diagnostic_updater — C++ and Python helper classes for publishers
diagnostic_aggregator — subscriber that groups and analyzes statuses
rqt_robot_monitor — GUI tool to visualize aggregated diagnostics
Install:
sudo apt install -y ros-humble-diagnostic-updater ros-humble-diagnostic-aggregator ros-humble-rqt-robot-monitor⚡ Always add diagnostics to production robot nodes. When a field robot fails, the /diagnostics_agg topic is often the first place to check — it shows which component raised an error and when it went STALE.
diagnostic_updater — Python Node
Use diagnostic_updater.Updater to publish health status from your node. Add DiagnosticTask callbacks that set the status level and message.
import rclpy
from rclpy.node import Node
import diagnostic_updater
import diagnostic_msgs.msg
class MotorControllerNode(Node):
def __init__(self):
super().__init__("motor_controller")
# Create the updater (publishes to /diagnostics at 1 Hz)
self.updater = diagnostic_updater.Updater(self)
self.updater.setHardwareID("MotorController-v2.1")
# Add named diagnostic tasks
self.updater.add("Motor Temperature", self._check_temperature)
self.updater.add("Motor Current", self._check_current)
self.updater.add("Encoder Status", self._check_encoder)
# Simulated sensor values
self._temperature = 45.0 # °C
self._current = 2.3 # Amps
self._encoder_ok = True
# Timer to update sensor readings
self.create_timer(0.1, self._sensor_update)
self.get_logger().info("Motor controller ready")
def _sensor_update(self):
"""Update sensor readings (replace with real hardware reads)."""
self._temperature += 0.01 # simulate heating
# self.updater.update() is called automatically at 1 Hz by the updater
def _check_temperature(self, stat: diagnostic_updater.DiagnosticStatusWrapper):
stat.add("Temperature (°C)", f"{self._temperature:.1f}")
stat.add("Threshold WARN (°C)", "70.0")
stat.add("Threshold ERROR (°C)", "85.0")
if self._temperature > 85.0:
stat.summary(diagnostic_msgs.msg.DiagnosticStatus.ERROR,
f"OVERHEATING: {self._temperature:.1f}°C")
elif self._temperature > 70.0:
stat.summary(diagnostic_msgs.msg.DiagnosticStatus.WARN,
f"High temperature: {self._temperature:.1f}°C")
else:
stat.summary(diagnostic_msgs.msg.DiagnosticStatus.OK,
f"Temperature normal: {self._temperature:.1f}°C")
return stat
def _check_current(self, stat: diagnostic_updater.DiagnosticStatusWrapper):
stat.add("Current (A)", f"{self._current:.2f}")
if self._current > 10.0:
stat.summary(diagnostic_msgs.msg.DiagnosticStatus.ERROR,
"Overcurrent detected")
elif self._current > 7.0:
stat.summary(diagnostic_msgs.msg.DiagnosticStatus.WARN,
"High current draw")
else:
stat.summary(diagnostic_msgs.msg.DiagnosticStatus.OK,
f"Current normal: {self._current:.2f}A")
return stat
def _check_encoder(self, stat: diagnostic_updater.DiagnosticStatusWrapper):
stat.add("Encoder connected", str(self._encoder_ok))
if not self._encoder_ok:
stat.summary(diagnostic_msgs.msg.DiagnosticStatus.ERROR,
"Encoder disconnected")
else:
stat.summary(diagnostic_msgs.msg.DiagnosticStatus.OK,
"Encoder OK")
return stat
def main():
rclpy.init()
node = MotorControllerNode()
rclpy.spin(node)FrequencyStatus and TimeStampStatus
diagnostic_updater provides pre-built tasks for common checks: FrequencyStatus monitors topic publish rate, TimeStampStatus checks message timestamp freshness.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import LaserScan
import diagnostic_updater
class LidarNode(Node):
def __init__(self):
super().__init__("lidar_node")
self.updater = diagnostic_updater.Updater(self)
self.updater.setHardwareID("RPLIDAR-S3-SN12345")
# ── FrequencyStatus ───────────────────────────────────
# Monitors how often tick() is called — warns if rate drifts
freq_params = diagnostic_updater.FrequencyStatusParam(
{"min": 9.5, "max": 10.5}, # expected: 10 Hz ±5%
tolerance=0.05,
window_size=10
)
self.freq_status = diagnostic_updater.FrequencyStatus(freq_params)
self.updater.add(self.freq_status)
# ── TimeStampStatus ───────────────────────────────────
# Warns if message.header.stamp is stale (older than max_acceptable)
ts_params = diagnostic_updater.TimeStampStatusParam(
min_acceptable=-1.0, # allow 1s in the past (bag replay)
max_acceptable=0.5, # warn if older than 0.5s
)
self.ts_status = diagnostic_updater.TimeStampStatus(ts_params)
self.updater.add(self.ts_status)
# Subscriber
self.sub = self.create_subscription(
LaserScan, "/scan", self._scan_cb, 10
)
def _scan_cb(self, msg: LaserScan):
# Tick frequency counter
self.freq_status.tick()
# Tick timestamp checker
stamp_sec = msg.header.stamp.sec + msg.header.stamp.nanosec * 1e-9
self.ts_status.tick(stamp_sec)⚡ FrequencyStatus is the most useful built-in task — it catches sensor drivers that drop packets, get throttled by a busy CPU, or silently stop publishing. Set the window_size to cover ~10 seconds of data so brief glitches don't trigger WARN.
diagnostic_aggregator — Group and Namespace Diagnostics
diagnostic_aggregator subscribes to /diagnostics and organizes statuses by prefix/namespace. Configure it with a YAML analyzers file.
# analyzers.yaml — configuration for diagnostic_aggregator
# Launch: ros2 run diagnostic_aggregator aggregator_node
# --ros-args -p analyzers:=<path>/analyzers.yaml
analyzers:
ros__parameters:
path: /Robot
analyzers:
# Group all motor-related diagnostics under /Robot/Motors
motors:
type: diagnostic_aggregator/GenericAnalyzer
path: Motors
prefix: Motor
# Matches any DiagnosticStatus whose name starts with "Motor"
# e.g., "Motor Temperature", "Motor Current", "Motor Controller"
# Group all sensor diagnostics under /Robot/Sensors
sensors:
type: diagnostic_aggregator/GenericAnalyzer
path: Sensors
contains:
- Lidar
- Camera
- IMU
- Encoder
# Matches DiagnosticStatus names CONTAINING any of these strings
# Group by exact hardware ID
power:
type: diagnostic_aggregator/GenericAnalyzer
path: Power
prefix: Battery
num_items: 3 # expect exactly 3 battery status messages
timeout: 5.0 # go STALE if no update in 5 seconds
# Catch-all for unmatched diagnostics
other:
type: diagnostic_aggregator/GenericAnalyzer
path: Other
regex: ".*"
remove_prefix: ""Launch Aggregator and Monitor
Launch diagnostic_aggregator alongside your robot nodes and view results with rqt_robot_monitor or by echoing /diagnostics_agg.
# launch/diagnostics.launch.py
from launch import LaunchDescription
from launch_ros.actions import Node
import os
def generate_launch_description():
analyzers_config = os.path.join(
get_package_share_directory("my_robot"),
"config", "analyzers.yaml"
)
return LaunchDescription([
# Your robot nodes
Node(package="my_robot", executable="motor_controller"),
Node(package="my_robot", executable="lidar_driver"),
# Diagnostic aggregator
Node(
package="diagnostic_aggregator",
executable="aggregator_node",
name="diagnostic_aggregator",
parameters=[analyzers_config],
output="screen",
),
# Optional: republish aggregated to /diagnostics for rqt_robot_monitor
# rqt_robot_monitor reads /diagnostics_agg by default
])
# ── From terminal ────────────────────────────────────────────
# View raw diagnostics (per-node output)
ros2 topic echo /diagnostics
# View aggregated diagnostics (grouped output)
ros2 topic echo /diagnostics_agg
# Launch rqt_robot_monitor GUI
ros2 run rqt_robot_monitor rqt_robot_monitor
# Check for errors at any level (one-liner for CI)
ros2 topic echo /diagnostics_agg --once | grep -E "level: [12]"
# level 1 = WARN, level 2 = ERROR⚡ In CI pipelines, pipe /diagnostics_agg to a script that exits with code 1 if any status is ERROR. This catches sensor failures in automated tests before the robot ships to the field.
STALE Detection and Timeout Configuration
A diagnostic status goes STALE when the aggregator stops receiving updates from a node. This catches nodes that silently crash or freeze.
# Configuring STALE timeout in diagnostic_aggregator
# analyzers.yaml:
# sensors:
# type: diagnostic_aggregator/GenericAnalyzer
# path: Sensors
# prefix: Lidar
# timeout: 5.0 # mark STALE after 5s with no update
# In your node: updater publishes at 1 Hz by default.
# Force a specific rate:
from diagnostic_updater import Updater
class MyNode(Node):
def __init__(self):
super().__init__("my_node")
self.updater = Updater(self)
self.updater.setHardwareID("sensor-001")
# Change publish period (default = 1.0s)
self.updater.period = 0.5 # 2 Hz
# Add a "heartbeat" task — always publishes OK while node is running
self.updater.add("Heartbeat", lambda stat: (
stat.summary(0, "Node alive"),
stat
)[-1])
# Manually force an update (useful after a state change):
self.updater.force_update()
# Force an error (useful for testing):
self.updater.broadcast(2, "Emergency stop triggered")
# All status in the next publish will be ERROR with this messageQuick Reference
| Concept / API | Details |
|---|---|
| Install | sudo apt install ros-humble-diagnostic-updater ros-humble-diagnostic-aggregator ros-humble-rqt-robot-monitor |
| Updater (Python) | diagnostic_updater.Updater(self); updater.add('name', callback) |
| setHardwareID | self.updater.setHardwareID('MyDevice-SN001') — sets the hardware_id field |
| Status levels | 0=OK, 1=WARN, 2=ERROR, 3=STALE |
| FrequencyStatus | Monitors publish rate; tick() on each message received |
| TimeStampStatus | Monitors header.stamp freshness; tick(stamp_sec) per message |
| aggregator_node | ros2 run diagnostic_aggregator aggregator_node --ros-args -p analyzers:=file.yaml |
| Analyzer prefix | prefix: Motor matches any name starting with 'Motor' |
| Analyzer contains | contains: [Lidar, IMU] matches names containing any of these strings |
| STALE timeout | timeout: 5.0 in analyzer config — mark STALE after 5s no update |