L1 — MQTT Protocol Internals

MQTT (Message Queuing Telemetry Transport) was designed in 1999 by Andy Stanford-Clark of IBM and Arlen Nipper of Arcom to monitor oil pipelines over satellite links. The constraints were severe: low bandwidth, high latency, unreliable links, and devices with kilobytes of RAM. The protocol still solves that problem today. The difference is that those constrained devices now control your home, your factory floor, and your building's HVAC system, and they are connected to the internet.


The Pub/Sub Model

MQTT is a publish/subscribe protocol. This is not the same as HTTP's request/response. There are three roles:

  • Publisher — a device that sends data to the broker on a topic
  • Broker — the central server that receives all messages and routes them
  • Subscriber — a client that tells the broker which topics it wants to receive

The key asymmetry: publishers don't know who is listening, and subscribers don't know who is sending. A temperature sensor publishes 22.4 to home/bedroom/temperature and has no idea whether zero or one hundred clients are subscribed. A subscriber gets the value with no information about which device produced it unless the payload includes that data.

This decoupling is a design feature for scalability. For security, it means a malicious subscriber is invisible to the legitimate publishers. If the broker lets you subscribe, you can read everything on it and no device can tell you are there. The broker itself still sees your connection and may log it.


The Broker

The broker is the router. Every message passes through it. Common production brokers:

Broker       | Language                 | Typical Deployment
-------------|--------------------------|--------------------------------------------------
Mosquitto    | C                        | Embedded systems, Raspberry Pi, small deployments
HiveMQ       | Java                     | Enterprise, clustered, commercial
EMQX         | Erlang                   | High-throughput, telecom-grade
AWS IoT Core | n/a (managed service)    | Cloud-connected device fleets
VerneMQ      | Erlang                   | Open-source, scalable

Mosquitto is by far the most common broker you'll encounter on the internet. It is the default choice in countless IoT tutorials, and deployments built from those tutorials often reach production with their configuration unchanged.

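Before version 2.0, Mosquitto's default configuration (the one shipped for years by most Linux package managers) listened on all interfaces and accepted anonymous connections with no authentication. This was intentional for development convenience and is documented behavior. Mosquitto 2.0 tightened the defaults: with no listener configured it binds only to localhost, and once a listener is configured, allow_anonymous defaults to false. Older 1.x packages remain widely deployed, though, and tutorials routinely tell users to set allow_anonymous true to get things working. Together, these are the reason thousands of production brokers are wide open.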


Topics: The Namespace

Topics are UTF-8 strings using / as a hierarchy separator. There is no schema enforcement: unless the broker restricts it with ACLs, any client can publish to any topic string. You'll see patterns like:

home/bedroom/temperature
home/bedroom/humidity
home/alarm/armed
factory/line1/machine3/status
factory/line1/machine3/rpm
building/floor3/hvac/setpoint
device/00:1A:2B:3C:4D:5E/telemetry

The hierarchy is purely organizational. The broker attaches no meaning to the levels. It only splits topic strings on / so it can match them against subscription filters.

Wildcards

Two wildcard characters exist and are subscriber-only (you cannot publish to a wildcard topic):

+ — single-level wildcard

Matches exactly one level in the hierarchy:

home/+/temperature

Matches: home/bedroom/temperature, home/kitchen/temperature
Does NOT match: home/bedroom/sensor/temperature (two levels between home and temperature)

# — multi-level wildcard

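Matches the parent level and everything below it. It must be the last character of the filter and must occupy a whole level on its own (home/# is valid, home# is not):

home/#

Matches: home itself, plus home/bedroom/temperature, home/alarm/armed, home/bedroom/sensor/temperature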

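The attack implication: subscribing to # alone matches every topic on the broker except the $-prefixed system topics such as $SYS/..., which need their own $SYS/# subscription. Unless ACLs restrict you, one command gives you near-total visibility.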
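The matching rules above fit in a few lines of code. The function below is a minimal sketch, not any broker's actual implementation. The name topic_matches is hypothetical, and the function assumes the filter is already valid (it does not check that # is the last level). It also encodes the $-topic exception from the MQTT 3.1.1 specification: a filter that starts with a wildcard never matches a topic beginning with $.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic name matches a subscription filter.

    Sketch of the MQTT 3.1.1 wildcard rules: '+' matches exactly one level,
    '#' matches the parent level plus any number of child levels (so 'home/#'
    also matches 'home'), and a filter that starts with a wildcard never
    matches a topic beginning with '$'. Assumes topic_filter is already valid.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    # Wildcard-first filters must not match system topics such as $SYS/...
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False

    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # '#' absorbs all remaining levels (including none)
        if i >= len(topic_levels):
            return False  # filter has more levels than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # literal level mismatch
    # All filter levels matched; the topic must not have extra levels left over.
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    print(topic_matches("home/+/temperature", "home/kitchen/temperature"))         # True
    print(topic_matches("home/+/temperature", "home/bedroom/sensor/temperature"))  # False
    print(topic_matches("home/#", "home"))                                         # True
    print(topic_matches("#", "$SYS/broker/clients/connected"))                     # False
```

The last call shows why a bare # subscription does not return broker statistics. A separate $SYS/# filter is needed for those.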


QoS Levels

MQTT defines three quality-of-service levels for message delivery:

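Level | Name            | Guarantee     | Mechanism
------|-----------------|---------------|------------------------------------------------------
QoS 0 | Fire and forget | At most once  | No acknowledgment (PUBLISH only)
QoS 1 | At least once   | At least once | PUBACK required; sender retransmits if it is lost
QoS 2 | Exactly once    | Exactly once  | 4-way handshake: PUBLISH, PUBREC, PUBREL, PUBCOMP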

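Why attackers love QoS 0: the broker sends each message once and keeps no delivery state for it. There is no PUBACK to wait for, no packet identifier to track, and nothing queued for retransmission. Pair it with a clean session, and the broker also discards your subscription when you disconnect. If you're subscribing to a broker to monitor its traffic, QoS 0 with a clean session leaves the smallest footprint. The broker knows you connected and which filter you subscribed to (and may log both), but it keeps no per-message record of what you received.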

QoS 2 is rarely used by IoT devices. The handshake overhead is expensive for constrained hardware, and exactly-once delivery mainly matters where a duplicate causes real harm, such as billing records or actuator commands. Most sensor telemetry uses QoS 0 or 1.
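The QoS a subscriber asks for travels in a single byte of the SUBSCRIBE packet, immediately after the topic filter. The sketch below hand-encodes an MQTT 3.1.1 SUBSCRIBE to show where that byte sits. The function names are hypothetical; real clients such as mosquitto_sub build this packet for you. The requested QoS is a ceiling: the broker delivers each message at the lower of the publisher's QoS and the QoS it granted to the subscription.

```python
import struct


def encode_remaining_length(n: int) -> bytes:
    """MQTT variable-length integer: 7 data bits per byte, high bit = more bytes follow."""
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n > 0:
            byte |= 0x80  # continuation bit
        out.append(byte)
        if n == 0:
            return bytes(out)


def build_subscribe(packet_id: int, topic_filter: str, qos: int = 0) -> bytes:
    """Encode a single-filter MQTT 3.1.1 SUBSCRIBE packet."""
    if qos not in (0, 1, 2):
        raise ValueError("requested QoS must be 0, 1 or 2")
    if not 1 <= packet_id <= 0xFFFF:
        raise ValueError("packet identifier must be a non-zero 16-bit value")
    filter_bytes = topic_filter.encode("utf-8")
    body = (
        struct.pack("!H", packet_id)            # variable header: packet identifier
        + struct.pack("!H", len(filter_bytes))  # payload: length-prefixed topic filter
        + filter_bytes
        + bytes([qos])                          # requested (maximum) QoS for this filter
    )
    # 0x82: packet type 8 (SUBSCRIBE) in the high nibble, mandatory flags 0b0010.
    return bytes([0x82]) + encode_remaining_length(len(body)) + body


if __name__ == "__main__":
    print(build_subscribe(1, "#", qos=0).hex(" "))  # 82 06 00 01 00 01 23 00
```

A QoS 0 subscription to # is eight bytes on the wire. The last byte (00) is the requested QoS.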


Retained Messages

A retained message is an ordinary PUBLISH with the RETAIN flag set by the publisher (with mosquitto_pub, the -r option). The broker stores the most recent retained message for each topic and immediately delivers it to any new subscriber, even if the original publisher has long since gone offline.

Publisher (offline)          Broker              New Subscriber
                        [stores last msg]
                                         <---subscribe to topic---
                        ----retained msg---->

This is the device equivalent of a sticky note. Legitimate use: a device publishes its current state as retained so any dashboard connecting later gets the current state without waiting.

The attack implication: retained messages accumulate on poorly managed brokers. Topics that haven't had an active publisher in months still serve cached state to new subscribers, and that state can include credentials, tokens, and configuration payloads that were published once and forgotten. You only get the latest retained message per topic, not a full history, but a single subscription to # returns that snapshot for every topic at once.
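The broker-side behavior is easy to model. The class below is a toy sketch of a retained-message table, not Mosquitto's internals; RetainedStore and its methods are hypothetical names. Each topic has one slot, which each new retained PUBLISH overwrites. A retained PUBLISH with a zero-length payload clears the slot, as the MQTT specification prescribes (this is what mosquitto_pub -r -n sends). To keep the example short, on_subscribe handles only exact topic names and a bare #.

```python
from dataclasses import dataclass, field


@dataclass
class RetainedStore:
    """Toy model of a broker's retained-message table: at most one message per topic."""

    messages: dict[str, bytes] = field(default_factory=dict)

    def on_publish(self, topic: str, payload: bytes, retain: bool) -> None:
        if not retain:
            return  # forwarded to current subscribers only; nothing is stored
        if payload == b"":
            self.messages.pop(topic, None)  # zero-length retained PUBLISH clears the slot
        else:
            self.messages[topic] = payload  # newer retained message replaces the old one

    def on_subscribe(self, topic_filter: str) -> list[tuple[str, bytes]]:
        """Return the retained messages a new subscription would receive immediately."""
        if topic_filter == "#":
            # '#' does not match $-prefixed topics such as $SYS/...
            return [(t, p) for t, p in self.messages.items() if not t.startswith("$")]
        if topic_filter in self.messages:
            return [(topic_filter, self.messages[topic_filter])]
        return []


if __name__ == "__main__":
    store = RetainedStore()
    store.on_publish("home/alarm/armed", b"false", retain=True)
    store.on_publish("home/alarm/armed", b"true", retain=True)          # overwrites
    store.on_publish("home/bedroom/temperature", b"22.4", retain=False)  # not stored
    print(store.on_subscribe("#"))  # [('home/alarm/armed', b'true')]
```

The publisher of home/alarm/armed can disconnect permanently, and every later # subscriber still receives b"true" until someone clears the topic.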


Default Ports

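Port | Protocol                  | Notes
-----|---------------------------|-------------------------------------------------------------
1883 | MQTT plaintext            | IANA-registered default, no encryption
8883 | MQTT over TLS             | IANA-registered, encrypted transport
9001 | MQTT over WebSocket       | Browser clients, often plaintext; a convention, not a standard
9002 | MQTT over WebSocket (TLS) | Rare; a convention that varies by broker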

Port 1883 is your primary target. Every byte is plaintext: topic names, payloads, and — critically — the MQTT CONNECT packet, which carries the username and password in cleartext.


The Security Model

MQTT's security is entirely optional and layered on top:

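  • Authentication: Username/password in the CONNECT packet. Not required by default in Mosquitto before 2.0; later versions reject anonymous clients on configured listeners unless allow_anonymous is set to true.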
  • Authorization (ACL): Per-user topic access lists. Not configured by default.
  • Transport encryption: TLS on port 8883. Not enabled by default.
  • Payload encryption: Application-layer, completely up to the developer. Rarely implemented.

The protocol itself has no mandatory security. A device connecting on port 1883 with no credentials is spec-compliant. The broker accepting it is also spec-compliant. This is not a bug — it was designed for isolated networks where the transport itself provided security (a serial link, a private LAN). The security problem comes from exposing these brokers directly to the internet, which happens constantly.
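Whether a broker enforces any of this shows up in its first reply. In MQTT 3.1.1, the broker answers CONNECT with a four-byte CONNACK whose last byte is a return code. A 0 in response to a CONNECT that carried no username means the broker accepts anonymous clients. Below is a minimal decoder, assuming MQTT 3.1.1 (MQTT 5 uses a different reason-code table and adds properties). The names are hypothetical.

```python
# Return codes defined for CONNACK in MQTT 3.1.1; values 6-255 are reserved.
CONNACK_RETURN_CODES = {
    0: "Connection accepted",
    1: "Unacceptable protocol version",
    2: "Identifier rejected",
    3: "Server unavailable",
    4: "Bad user name or password",
    5: "Not authorized",
}


def parse_connack(packet: bytes) -> tuple[bool, int, str]:
    """Decode an MQTT 3.1.1 CONNACK into (session_present, return_code, meaning)."""
    if len(packet) != 4 or packet[0] != 0x20 or packet[1] != 0x02:
        raise ValueError("not an MQTT 3.1.1 CONNACK packet")
    session_present = bool(packet[2] & 0x01)  # bit 0 of the acknowledge flags
    code = packet[3]
    return session_present, code, CONNACK_RETURN_CODES.get(code, "Reserved")


if __name__ == "__main__":
    print(parse_connack(b"\x20\x02\x00\x00"))  # (False, 0, 'Connection accepted')
    print(parse_connack(b"\x20\x02\x00\x05"))  # (False, 5, 'Not authorized')
```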


MQTT Packet Structure (What Matters for Analysis)

The CONNECT packet is the handshake. It contains:

  • Protocol name and version
  • Client ID (often reveals device type: ESP32_sensor_01, shelly-plug-abc123)
  • Username (if auth configured)
  • Password (if auth configured)
  • Clean session flag
  • Keep-alive interval

In Wireshark, filter mqtt.msgtype == 1 to isolate CONNECT packets and read the credentials out of any plaintext (port 1883) capture.

PUBLISH packets contain the topic string and payload. SUBSCRIBE packets contain the topic filter the client registered. All of this is visible in plaintext on port 1883.
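A PUBLISH packet is even simpler to decode, and its first byte carries the delivery flags: RETAIN in bit 0, QoS in bits 1-2, DUP in bit 3. When a broker delivers a stored retained message to a new subscription, it sets RETAIN to 1. This bit therefore tells a passive observer which payloads came from the retained cache. The sketch below assumes MQTT 3.1.1 (MQTT 5 adds a properties block after the packet identifier) and a complete packet, and it uses a hypothetical function name.

```python
import struct


def parse_publish(packet: bytes) -> dict:
    """Decode an MQTT 3.1.1 PUBLISH packet into topic, payload and header flags."""
    first = packet[0]
    if first >> 4 != 3:
        raise ValueError("not a PUBLISH packet")
    qos = (first >> 1) & 0x03

    # Remaining length: 7 bits per byte, high bit means another byte follows.
    remaining, multiplier, pos = 0, 1, 1
    while True:
        byte = packet[pos]
        pos += 1
        remaining += (byte & 0x7F) * multiplier
        if not byte & 0x80:
            break
        multiplier *= 128
    end = pos + remaining

    (topic_len,) = struct.unpack_from("!H", packet, pos)
    topic = packet[pos + 2:pos + 2 + topic_len].decode("utf-8")
    pos += 2 + topic_len

    packet_id = None
    if qos > 0:  # only QoS 1 and 2 messages carry a packet identifier
        (packet_id,) = struct.unpack_from("!H", packet, pos)
        pos += 2

    return {
        "topic": topic,
        "payload": packet[pos:end],  # raw bytes; MQTT does not define the payload encoding
        "qos": qos,
        "retain": bool(first & 0x01),
        "dup": bool(first & 0x08),
        "packet_id": packet_id,
    }


if __name__ == "__main__":
    # Hypothetical captured bytes: retained QoS 0 message "22.4" on home/bedroom/temperature.
    sample = b"\x31\x1e\x00\x18home/bedroom/temperature22.4"
    print(parse_publish(sample))
```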


Key Takeaways

  • Pub/sub means subscribers are invisible to publishers: you can monitor devices without them knowing
  • The # wildcard gives you every non-$SYS topic a broker lets you read, in one command
  • Retained messages leak last-known state even when devices are offline
  • Port 1883 is plaintext: credentials, topics, and payloads are all readable
  • Mosquitto before 2.0 allowed anonymous connections by default, and many deployments still do: this is intentional and common

Next lesson: finding these brokers at scale.