Lesson 4 — Entropy and Crypto Material

Entropy as a Detection Tool

Entropy, in information theory, measures the unpredictability of data. Byte-level Shannon entropy ranges from 0 to 8 bits per byte; dividing by 8 normalizes it to a scale from 0.0 to 1.0:

  • 0.0: All bytes identical (e.g., a file filled with 0x00).
  • 0.4–0.6: Natural language text, source code, structured configs.
  • Typically above 0.95: Compressed data (ZIP, gzip, LZMA). Nearly as high as encrypted data, but often with small dips at block or stream boundaries.
  • ~1.0, sustained and flat: Encrypted data or high-quality random data (cryptographic keys, nonces).

This distinction matters for firmware analysis. A region with flat entropy near 1.0 that does not begin with a known compression signature is a strong candidate for encrypted data or raw cryptographic key material. Keep in mind that a compressed stream stored without its header can look exactly the same.
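To see the scale in numbers, here is a minimal sketch that computes byte-level Shannon entropy normalized to [0, 1] (bits per byte divided by 8) for three synthetic inputs. `normalized_entropy` is a name chosen for this sketch, and the sample text and sizes are arbitrary; real firmware regions are rarely this clean.

```python
import math
import os


def normalized_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, divided by 8 to land in [0, 1]."""
    if not data:
        return 0.0
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    n = len(data)
    bits = -sum((c / n) * math.log2(c / n) for c in counts if c)
    return bits / 8.0


if __name__ == "__main__":
    samples = {
        "zeros": bytes(4096),                                        # one repeated byte
        "text": b"the quick brown fox jumps over the lazy dog " * 100,  # English-like text
        "random": os.urandom(65536),                                 # stand-in for ciphertext
    }
    for name, blob in samples.items():
        print(f"{name:>6}: {normalized_entropy(blob):.3f}")
```

The sample sizes matter: on short inputs the estimate reads low even for random data, because byte values repeat by chance.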

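# Entropy analysis on an entire firmware image
binwalk -E firmware.bin
# Plots entropy across the file and prints the offsets of rising and falling entropy edges
# (--entropy is simply the long form of -E, not a separate raw-output mode)

# binwalk 2.x: -J saves the plot as a PNG; -N skips the plot and keeps only the text output for scripting
binwalk -E -J firmware.bin
binwalk -E -N firmware.bin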

In the entropy graph, look for:

  • Compressed regions: high entropy that starts at a recognizable compression magic and ends with a sharp drop.
  • Encrypted regions: high entropy with no recognizable magic at the start, sustained as a flat line near 1.0 for kilobytes or megabytes.
  • Key material regions: small high-entropy blobs (roughly 256–4096 bytes) embedded in lower-entropy code sections. These are often private keys or symmetric keys embedded in a binary; confirm with the PEM and DER checks that follow.
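The "recognizable start" test that separates compressed from encrypted regions can be sketched as a magic-byte lookup at the region's offset. The signature table below is a small illustrative subset chosen for this sketch, not binwalk's signature database, and `classify_region` is a hypothetical helper name.

```python
from typing import Optional

# Illustrative subset of well-known signatures; binwalk recognizes many more.
SIGNATURES = [
    (b"\x1f\x8b", "gzip"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"BZh", "bzip2"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"PK\x03\x04", "zip"),
    (b"hsqs", "squashfs (little-endian)"),
    (b"\x78\x9c", "zlib"),
    (b"\x78\xda", "zlib"),
]


def classify_region(data: bytes, offset: int) -> Optional[str]:
    """Return a format name if the region starts with a known magic, else None.

    None means "no recognizable header": the region is then a candidate for
    encrypted data, key material, or a headerless compressed stream.
    """
    head = data[offset:offset + 8]
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return None


if __name__ == "__main__":
    import gzip
    import os

    blob = os.urandom(64) + gzip.compress(b"A" * 1000)
    print(classify_region(blob, 64))  # gzip
```

In practice you would feed it the start offsets of high-entropy regions (for example, from binwalk's rising-edge output) and set aside the ones that return None for closer inspection.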

Finding Keys Without PEM Headers

A private key in PEM format is trivially found with grep. The harder case is a key stored in DER format — the binary encoding of the same ASN.1 structure, with no text headers.
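For the PEM case, a regex over the raw bytes is enough to carve out every BEGIN/END block, not only private keys. This is a minimal sketch; `find_pem_blocks` is a hypothetical helper, and it assumes the block is stored as contiguous ASCII, which is how PEM normally appears in a binary's data section.

```python
import re
import sys
from typing import List, Tuple

# BEGIN and END labels must match; DOTALL lets '.' cross the base64 line breaks.
PEM_RE = re.compile(rb"-----BEGIN ([A-Z0-9 ]+)-----.+?-----END \1-----", re.DOTALL)


def find_pem_blocks(data: bytes) -> List[Tuple[int, str]]:
    """Return (offset, label) for each PEM block found in a binary blob."""
    return [(m.start(), m.group(1).decode()) for m in PEM_RE.finditer(data)]


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        for offset, label in find_pem_blocks(f.read()):
            print(f"0x{offset:08x}  {label}")
```

Because the label is captured, the output distinguishes `RSA PRIVATE KEY`, `EC PRIVATE KEY`, `PRIVATE KEY` (PKCS#8), and `CERTIFICATE` blocks without further parsing.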

ASN.1 DER Structure of an RSA Private Key

An RSA private key in DER format starts with a specific byte sequence:

30 82 xx xx  — SEQUENCE, length in next 2 bytes (DER long form)
02 01 00     — INTEGER version = 0
02 82 xx xx  — INTEGER modulus (large)

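The first two bytes are 30 82 whenever the SEQUENCE content is between 256 and 65535 bytes long, which covers all practical RSA key sizes (content of 128–255 bytes would use 30 81 xx instead). Likewise, the modulus header is 02 82 xx xx only for 2048-bit and larger keys; a 1024-bit modulus is encoded as 02 81 81. This layout is PKCS#1. A PKCS#8-wrapped key begins with the same 30 82 xx xx 02 01 00 prefix. EC private keys are much shorter: a SEC1-encoded P-256 key typically starts with the short-form header 30 77 and will not match a 30 82 search.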
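The header bytes above can be decoded mechanically. This minimal sketch reads a DER tag and length at an offset and handles both the short form (one length byte, content under 128 bytes) and the long form (0x81 or 0x82 followed by one or two length bytes). `der_header` is a hypothetical name, and the function deliberately rejects lengths longer than two bytes, which is enough for the key sizes discussed here.

```python
from typing import Optional, Tuple


def der_header(buf: bytes, off: int) -> Optional[Tuple[int, int, int]]:
    """Decode the DER tag and length at buf[off].

    Returns (tag, header_len, content_len), or None if the bytes cannot be
    a definite-length DER header with at most 2 length bytes.
    """
    if off + 2 > len(buf):
        return None
    tag, first = buf[off], buf[off + 1]
    if first < 0x80:                      # short form: the byte is the length
        return tag, 2, first
    n = first & 0x7F                      # long form: next n bytes hold the length
    if n not in (1, 2) or off + 2 + n > len(buf):
        return None
    content_len = int.from_bytes(buf[off + 2:off + 2 + n], "big")
    if content_len < 0x80 or (n == 2 and content_len < 0x100):
        return None                       # DER requires the shortest length encoding
    return tag, 2 + n, content_len


if __name__ == "__main__":
    # 30 82 04 a4: SEQUENCE, long form, 0x04a4 = 1188 content bytes
    tag, hlen, clen = der_header(bytes.fromhex("308204a4020100"), 0)
    print(hex(tag), hlen + clen)          # prints: 0x30 1192
```

The total object size (`header_len + content_len`) is exactly the byte count to carve out of the binary, which avoids guessing a fixed extraction length.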

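# Coarse first pass: list files containing 30 82 xx xx 02 01 00 (a bare 30 82 matches almost any binary)
# LC_ALL=C makes grep treat the pattern as raw bytes; -P enables the \x escapes
LC_ALL=C grep -rlaP '\x30\x82[\x00-\xff]{2}\x02\x01\x00' usr/bin/ lib/ 2>/dev/null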

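# Byte-exact scan: 30 82 LL LL 02 01 00, printing the declared DER size of each hit
python3 -c "
import re, sys
# SEQUENCE with a 2-byte length (captured), then INTEGER version 0
pattern = re.compile(rb'\x30\x82(..)\x02\x01\x00', re.DOTALL)
for fname in sys.argv[1:]:
    try:
        with open(fname, 'rb') as f:
            data = f.read()
    except OSError:
        continue
    for m in pattern.finditer(data):
        # Total size = 4 header bytes + declared content length
        size = 4 + int.from_bytes(m.group(1), 'big')
        print(f'{fname}:0x{m.start():x} ({size} bytes)')
" usr/bin/* lib/*.so 2>/dev/null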

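Once you find a potential offset, extract the full DER object and try to parse it:

# Extract from offset 0x1234 in a binary. LEN = 4 + the 16-bit length that follows 30 82
# (e.g. 30 82 04 a4 gives LEN=1192). A fixed 2048-byte read would truncate a 4096-bit RSA key.
LEN=1192
dd if=usr/bin/cloud_agent bs=1 skip=$((0x1234)) count=$LEN 2>/dev/null > candidate.der

# Try to parse as RSA private key (-check also tests the key's internal consistency)
openssl rsa -in candidate.der -inform DER -check -text -noout 2>/dev/null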

# Try to parse as EC private key  
openssl ec -in candidate.der -inform DER -text -noout 2>/dev/null

# Try generic PKCS#8 private key parsing (openssl pkcs8 has no -text option; pkey does)
openssl pkey -in candidate.der -inform DER -text -noout 2>/dev/null

A successful parse confirms it is a private key. Failed parses mean either the length was wrong, the DER is malformed, or it is not a key.
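Before reaching for OpenSSL, a quick structural check can weed out many false positives from the 30 82 scan. This minimal sketch assumes a PKCS#1 RSAPrivateKey layout (SEQUENCE, then INTEGER version, modulus, and public exponent). `rsa_pkcs1_summary` and `_read_tlv` are hypothetical helpers, and they only inspect the structure; they do not verify the key mathematically. A PKCS#8-wrapped key has an extra AlgorithmIdentifier layer, so it returns None here.

```python
from typing import Optional, Tuple


def _read_tlv(buf: bytes, off: int) -> Optional[Tuple[int, int, int]]:
    """Return (tag, content_start, content_len) for the DER element at off."""
    if off + 2 > len(buf):
        return None
    tag, first = buf[off], buf[off + 1]
    if first < 0x80:                          # short-form length
        return tag, off + 2, first
    n = first & 0x7F                          # long form with n length bytes
    if n not in (1, 2) or off + 2 + n > len(buf):
        return None
    return tag, off + 2 + n, int.from_bytes(buf[off + 2:off + 2 + n], "big")


def rsa_pkcs1_summary(buf: bytes) -> Optional[Tuple[int, int]]:
    """Return (modulus_bits, public_exponent) if buf starts like a PKCS#1 key."""
    seq = _read_tlv(buf, 0)
    if seq is None or seq[0] != 0x30:         # must open with a SEQUENCE
        return None
    off = seq[1]
    fields = []
    for _ in range(3):                        # version, modulus, publicExponent
        tlv = _read_tlv(buf, off)
        if tlv is None or tlv[0] != 0x02:     # each must be an INTEGER
            return None
        _, start, length = tlv
        if start + length > len(buf):
            return None
        fields.append(int.from_bytes(buf[start:start + length], "big"))
        off = start + length
    version, modulus, exponent = fields
    if version != 0:
        return None
    return modulus.bit_length(), exponent
```

Called on the bytes of candidate.der, a plausible result such as a 2048- or 4096-bit modulus with exponent 65537 is a good sign; the OpenSSL parse remains the confirmation.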

Certificate Extraction with binwalk

binwalk often extracts certificates automatically when processing firmware:

binwalk -e firmware.bin
# Creates _firmware.bin.extracted/ directory
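# Certificates binwalk recognizes are carved to their own files (typically .crt);
# certificates inside unpacked filesystems keep their original names

# After extraction, inspect certificates anywhere in the extracted tree (PEM first, then DER)
find _firmware.bin.extracted -type f \( -name '*.crt' -o -name '*.pem' -o -name '*.cer' \) |
while read -r cert; do
    echo "=== $cert ==="
    { openssl x509 -in "$cert" -text -noout 2>/dev/null ||
      openssl x509 -in "$cert" -inform DER -text -noout 2>/dev/null; } | \
      grep -E "Subject:|Issuer:|Not After|Signature Algorithm"
done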

Pay attention to:

  • Self-signed certificates: the Issuer and Subject are identical. These are common on IoT devices as device identity certificates. If the corresponding private key is also present, this is a critical finding.
  • Internal CA certificates: a certificate signed by an internal CA suggests the vendor operates its own PKI. The CA key may be present elsewhere in the firmware.
  • Expired certificates: do not discard them. The matching private key may still be accepted for identifying the device, and the certificate tells you about the vendor's PKI structure.

XOR-Obfuscated Strings

Some firmware developers apply simple XOR obfuscation to sensitive strings. It is not real encryption, but it is enough to hide them from strings. Each string is stored XORed with a constant and decoded at runtime by a short loop.

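Do not expect these strings to stand out in an entropy graph. XOR with a single constant byte only relabels byte values, so an obfuscated string keeps exactly the entropy of its plaintext. What gives them away instead is a run of non-printable bytes in .rodata where you would expect readable strings, with no recognizable magic bytes, usually referenced by a small XOR loop in the code.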
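A quick way to check how single-byte XOR affects entropy: XOR with one constant byte is a one-to-one mapping of byte values, so the byte histogram, and with it the Shannon entropy, stays the same. The sample string and the key 0x5A below are arbitrary choices for this sketch.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Normalized Shannon entropy (bits per byte / 8)."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values()) / 8.0


plain = b"https://api.example.com/v1/upload?token=hunter2"
obfuscated = bytes(b ^ 0x5A for b in plain)

# Same set of counts, just attached to different byte values, so the entropy matches.
print(f"plain:      {byte_entropy(plain):.4f}")
print(f"obfuscated: {byte_entropy(obfuscated):.4f}")
```

Multi-byte XOR keys do mix the histogram and raise entropy somewhat, but short strings still will not approach the flat ~1.0 line of encrypted data.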

Single-Byte XOR Recovery

For single-byte XOR (the simplest variant, and a common one):

#!/usr/bin/env python3
# xor_brute.py: try every single-byte XOR key and look for known target strings
import sys

with open(sys.argv[1], 'rb') as f:
    data = f.read()

targets = [b'password', b'api_key', b'http://', b'https://', b'admin']

# Key 0x00 is skipped: it would only report strings that are not obfuscated at all
for key in range(1, 256):
    # bytes.translate with a 256-entry table XORs the whole file in C, much faster than a Python loop
    decoded = data.translate(bytes(b ^ key for b in range(256)))
    for target in targets:
        positions = []
        idx = 0
        while True:
            idx = decoded.find(target, idx)
            if idx == -1:
                break
            positions.append(idx)
            idx += 1
        if positions:
            print(f"XOR key 0x{key:02x} -> '{target.decode()}' found at: {positions}")
            # Print surrounding decoded context
            for pos in positions[:3]:
                context = decoded[max(0, pos-10):pos+40]
                printable = ''.join(chr(b) if 32 <= b < 127 else '.' for b in context)
                print(f"  Context: {printable}")

Run it:

python3 xor_brute.py usr/bin/cloud_agent

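If this produces output, you have found both the XOR key and the obfuscated strings. Keys such as 0x17, 0x5a, 0xaa, and 0xff turn up in real firmware, but any byte value is possible, which is why the script tries them all. Treat hits for key 0x20 with suspicion: that key only flips the case of ASCII letters, so it "decodes" any PASSWORD or ADMIN already stored in plain uppercase.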
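The script above needs a guess at what the plaintext contains. If you have already isolated a suspicious run of bytes (for example, a non-printable run in .rodata), you can instead score all 255 keys by how text-like the decoded bytes look. In this minimal sketch, `best_single_byte_key` is a hypothetical helper, and the scoring rule (count of lowercase letters and spaces) is a deliberately crude assumption that suits lowercase text and will need tuning for uppercase, hex, or key-like data. It is meant for an isolated region: over a whole binary, long runs of 0x00 decode to spaces under key 0x20 and would win.

```python
import string

# Crude "looks like text" alphabet; an assumption to tune for your target data.
TEXTY = frozenset((string.ascii_lowercase + " ").encode())


def best_single_byte_key(region: bytes) -> tuple:
    """Return (key, score, decoded) for the non-zero key whose output is most text-like."""
    best = (0, -1, b"")
    for key in range(1, 256):
        table = bytes(b ^ key for b in range(256))   # XOR lookup table for this key
        decoded = region.translate(table)
        score = sum(b in TEXTY for b in decoded)
        if score > best[1]:
            best = (key, score, decoded)
    return best


if __name__ == "__main__":
    secret = bytes(b ^ 0x5A for b in b"the device password is hunter two")
    key, score, decoded = best_single_byte_key(secret)
    print(f"key 0x{key:02x}: {decoded.decode(errors='replace')}")
```

Frequency scoring trades the word list for an assumption about the plaintext's alphabet, so it complements the target-string search rather than replacing it.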

Multi-Byte XOR Keys

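If single-byte XOR fails, try 2-byte and 4-byte keys. A brute-force search grows as 256^k: 65,536 candidates for a 2-byte key and about 4.3 billion for a 4-byte key, so the 4-byte case is only practical with known plaintext. Once you have known plaintext, you can skip the search entirely, because XORing the known bytes against the obfuscated bytes yields the key directly.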

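For 2-byte keys, a brute-force loop that reuses data and targets from xor_brute.py (expect it to run slowly in pure Python):

# Key byte (i % 2) is applied at file offset i
for key in range(1, 65536):
    key_bytes = key.to_bytes(2, 'big')
    decoded = bytes(b ^ key_bytes[i % 2] for i, b in enumerate(data))
    for target in targets:
        if target in decoded:
            print(f"XOR key 0x{key:04x} -> '{target.decode()}' at 0x{decoded.find(target):x}")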
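With a known plaintext, the key falls out directly: XORing a guessed substring such as `https://` against the obfuscated bytes gives the key stream, and a repeating key shows up as a repeating pattern in that stream. This minimal sketch assumes the key is applied by absolute file offset (key byte `i % k` at offset `i`, as in the 2-byte loop above) and that the known plaintext is at least twice the key length. `recover_repeating_key` is a hypothetical helper name.

```python
from typing import Optional, Tuple


def recover_repeating_key(data: bytes, known: bytes, key_len: int) -> Optional[Tuple[int, bytes]]:
    """Find (offset, key) where a key_len-byte repeating XOR key turns data into `known`."""
    if len(known) < 2 * key_len:
        raise ValueError("known plaintext must be at least twice the key length")
    for off in range(len(data) - len(known) + 1):
        stream = bytes(c ^ p for c, p in zip(data[off:off + len(known)], known))
        # A repeating key means the key stream repeats every key_len bytes
        if all(stream[i] == stream[i % key_len] for i in range(len(stream))):
            key = bytearray(key_len)
            for i in range(key_len):
                key[(off + i) % key_len] = stream[i]   # realign the key to file offset 0
            return off, bytes(key)
    return None


if __name__ == "__main__":
    key = b"\x13\x37\xbe\xef"
    plain = b"server=https://update.example.com/fw.bin\x00"
    data = bytes(b ^ key[i % 4] for i, b in enumerate(plain))
    print(recover_repeating_key(data, b"https://", 4))
```

The cost is linear in the file size for each guessed key length, which is why known plaintext makes even 4-byte keys tractable.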

Base64-Encoded Secrets: Detection and Decoding

Base64 encoding expands a secret by about a third (every 3 bytes become 4 characters) and obscures its content. It is not encryption and is trivially reversed, but in strings output a base64-encoded key looks nothing like a key.

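Characteristics of base64 that help detection:

  • Character set: [A-Za-z0-9+/] only, plus = for padding.
  • Length: n input bytes encode to 4 * ceil(n/3) characters, so a padded string is always a multiple of 4 long.
  • Padding: 0, 1, or 2 = characters at the end. Encoders that strip the padding leave a length of 0, 2, or 3 mod 4, never 1.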

# Find all plausible base64 strings in a binary (at least 20 chars)
strings binary | grep -E '^[A-Za-z0-9+/]{20,}={0,2}$' | while read -r b64; do
    # Decode straight into strings: shell variables cannot hold NUL bytes
    printable=$(printf '%s' "$b64" | base64 -d 2>/dev/null | strings -n 4)
    if [ -n "$printable" ]; then
        echo "BASE64: $b64"
        echo "DECODED: $printable"
    fi
done

URL-Safe Base64

Some API keys use URL-safe base64 (+ replaced with -, / replaced with _):

strings binary | grep -E '^[A-Za-z0-9_-]{20,}={0,2}$' | while read -r b64; do
    # Convert URL-safe to standard base64, then restore padding that is often stripped
    standard=$(printf '%s' "$b64" | tr '_-' '/+')
    while [ $(( ${#standard} % 4 )) -ne 0 ]; do standard="$standard="; done
    decoded=$(printf '%s' "$standard" | base64 -d 2>/dev/null | strings -n 4)
    [ -n "$decoded" ] && printf '%s\n---\n' "$decoded"
done
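Both shell loops above can be folded into one Python helper that accepts either alphabet and restores stripped padding. In this sketch, `try_b64` is a hypothetical name. It applies the length rule from the characteristics list: an unpadded string whose length is 1 mod 4 cannot be base64.

```python
import base64
import binascii
from typing import Optional


def try_b64(s: str) -> Optional[bytes]:
    """Decode standard or URL-safe base64, tolerating missing '=' padding."""
    s = s.strip().replace("-", "+").replace("_", "/")   # URL-safe -> standard alphabet
    body = s.rstrip("=")
    if len(body) % 4 == 1:                              # impossible length for base64
        return None
    padded = body + "=" * (-len(body) % 4)              # restore stripped padding
    try:
        return base64.b64decode(padded, validate=True)  # reject non-alphabet characters
    except binascii.Error:
        return None


if __name__ == "__main__":
    for candidate in ["aGVsbG8gd29ybGQ=", "aGVsbG8gd29ybGQ", "Pz8_", "not*base64"]:
        print(candidate, "->", try_b64(candidate))
```

`validate=True` makes `b64decode` raise on characters outside the alphabet instead of silently discarding them, which keeps random identifiers from decoding into noise.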

Entropy Scanning Without binwalk

If binwalk is unavailable, compute entropy programmatically to find interesting regions:

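#!/usr/bin/env python3
import math, sys

def entropy(data):
    # Shannon entropy in bits per byte, divided by 8 to normalize to [0, 1]
    if not data:
        return 0.0
    freq = [0] * 256
    for b in data:
        freq[b] += 1
    n = len(data)
    return -sum((f/n) * math.log2(f/n) for f in freq if f > 0) / 8.0

WINDOW = 256
# Even truly random data scores only about 0.9 in a 256-byte window (byte values
# repeat by chance), so a 0.9 cutoff misses much of it. Use a lower threshold here,
# or a larger window if you only care about large regions.
THRESHOLD = 0.85

with open(sys.argv[1], 'rb') as f:
    data = f.read()

in_high = False
start = 0
# Step by half a window so adjacent windows overlap
for i in range(0, len(data) - WINDOW + 1, WINDOW // 2):
    e = entropy(data[i:i+WINDOW])
    if e > THRESHOLD and not in_high:
        start = i
        in_high = True
    elif e <= THRESHOLD and in_high:
        size = i - start
        print(f"High entropy region: 0x{start:08x} - 0x{i:08x} ({size} bytes)")
        in_high = False

# Report a region that runs to the end of the file
if in_high:
    print(f"High entropy region: 0x{start:08x} - 0x{len(data):08x} ({len(data) - start} bytes)")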

This script outputs approximate offset ranges (accurate to about half a window) where the windowed entropy exceeds THRESHOLD. These are your candidates for encrypted blobs, compressed sections without magic bytes, or embedded cryptographic key material.

Practical Summary: What Each Technique Finds

| Technique | Finds |
|---|---|
| binwalk -E firmware.bin | Encrypted regions, compressed regions, key material blobs |
| grep -rl "PRIVATE KEY" | PEM-encoded private keys |
| DER magic byte search (30 82 xx xx 02 01 00) | DER-encoded RSA private keys (PKCS#1 or PKCS#8) |
| openssl x509 on cert files | Certificate details, expiry, issuer chain |
| Single-byte XOR brute force | XOR-obfuscated strings with a constant key |
| base64 decode loop | Base64-encoded secrets embedded as strings |
| Entropy windowed scan | Unrecognized encrypted or key blobs for further investigation |

Work through the techniques from easiest (grep for PEM headers) to hardest (DER structure search, XOR brute force). In practice, the easy checks frequently produce findings on real-world firmware, so start there.