Practical Lesson 4: Entropy and Crypto Material
Entropy as a Detection Tool
Entropy, in information theory, measures the unpredictability of data. Byte-level Shannon entropy ranges from 0 to 8 bits per byte; dividing by 8 normalizes it to [0, 1]. On that scale:
- 0.0: All bytes identical (e.g., a file filled with 0x00).
- 0.4–0.6: Natural language text, source code, structured configs.
- Typically 0.9 or higher, with dips: Compressed data (ZIP, gzip, LZMA). Nearly as high as encrypted data, but usually with visible variation at headers and block boundaries.
- ~1.0, sustained flat: Encrypted data or high-quality random data (cryptographic keys, nonces).
This distinction matters for firmware analysis. A region of the firmware with flat entropy near 1.0 that does not start with a known compressed format signature is most likely encrypted or raw cryptographic key material, although compressed data in a format the tool does not recognize can look the same.
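To make the scale concrete, the sketch below computes normalized byte entropy for a few synthetic inputs: a zero-filled buffer, repeated English text, zlib-compressed data and os.urandom output. The inputs are illustrative assumptions rather than firmware samples, and the values for the random and compressed cases vary slightly from run to run.

```python
import math
import os
import zlib
from collections import Counter

def normalized_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, divided by 8 to land in [0, 1]."""
    if not data:
        return 0.0
    n = len(data)
    # p * log2(1/p) summed over the byte values that actually occur
    bits = sum((c / n) * math.log2(n / c) for c in Counter(data).values())
    return bits / 8.0

# Illustrative inputs (assumptions, not real firmware regions)
zeros = bytes(4096)
text = (b"The quick brown fox jumps over the lazy dog while the router "
        b"reboots and reloads its configuration from flash. ") * 40
compressed = zlib.compress(os.urandom(2048).hex().encode(), 9)
random_bytes = os.urandom(65536)

for label, blob in [("zeros", zeros), ("text", text),
                    ("zlib-compressed hex", compressed), ("os.urandom", random_bytes)]:
    print(f"{label:>20}: {normalized_entropy(blob):.3f}  ({len(blob)} bytes)")
```

Expect the zero buffer at the bottom of the scale, the text in the middle band, and the compressed and random buffers near the top.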
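```bash
# Entropy analysis on an entire firmware image (binwalk 2.x flags).
# Computes entropy block by block, prints the offsets where it rises or
# falls sharply, and displays a plot.
binwalk -E firmware.bin
# Save the plot as a PNG file
binwalk -E -J firmware.bin
# Text output only, without the plot, for scripting
binwalk -E -N firmware.bin
```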
In the entropy graph, look for:
- Compressed regions: high entropy that starts at a recognizable compression magic and drops sharply at the end.
- Encrypted regions: high entropy with no recognizable magic at the start, sustained as a flat line near 1.0 for kilobytes or megabytes.
- Key material regions: small high-entropy blobs (256–4096 bytes) embedded in lower-entropy code sections. These are candidates for private or symmetric keys embedded in a binary, and are easy to miss on a graph of the whole image.
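Telling compressed regions apart from encrypted ones comes down to whether the region begins with a known format signature. Here is a minimal sketch of that check, assuming you already have a high-entropy region's start and end offsets (for example from an entropy scan). The signature table is a small, non-exhaustive selection of well-known magic numbers, and both `classify_region` and the 4096-byte cutoff for "possible key material" are hypothetical, the cutoff taken from the size range above.

```python
import sys

# Well-known magic numbers (non-exhaustive); 2-byte zlib headers are noisy
SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"\xfd7zXZ\x00": "xz",
    b"BZh": "bzip2",
    b"\x28\xb5\x2f\xfd": "zstd",
    b"PK\x03\x04": "zip",
    b"hsqs": "squashfs (little-endian)",
    b"\x5d\x00\x00": "lzma",
    b"\x78\x9c": "zlib",
    b"\x78\xda": "zlib",
    b"\x78\x01": "zlib",
}

KEY_BLOB_MAX = 4096  # assumption: upper end of the 256-4096 byte range above

def classify_region(image: bytes, start: int, end: int) -> str:
    """Hypothetical helper: label a high-entropy region by its first bytes and size."""
    head = image[start:start + 8]
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return f"compressed ({name})"
    if end - start <= KEY_BLOB_MAX:
        return "possible key material"
    return "possibly encrypted (no known magic)"

if __name__ == "__main__" and len(sys.argv) == 4:
    with open(sys.argv[1], "rb") as f:
        image = f.read()
    # Offsets accept 0x-prefixed hex, e.g. 0x40000
    print(classify_region(image, int(sys.argv[2], 0), int(sys.argv[3], 0)))
```

A missing magic is not proof of encryption; it only means none of these signatures matched, so treat the labels as triage hints.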
Finding Keys Without PEM Headers
A private key in PEM format is trivially found with grep. The harder case is a key stored in DER format — the binary encoding of the same ASN.1 structure, with no text headers.
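A PEM block is only base64-encoded DER between BEGIN/END lines. That is why the DER case is the same data with the text wrapper removed. The sketch below finds PEM blocks in arbitrary bytes, reports their labels, and decodes each body back to DER so you can see its size and leading bytes. The regex and the `find_pem_blocks` name are illustrative and not taken from any particular tool. Encrypted PEM blocks with Proc-Type headers are skipped because their bodies are not pure base64.

```python
import base64
import binascii
import re
import sys

# BEGIN label, base64 body, END with the same label
PEM_RE = re.compile(
    rb"-----BEGIN ([A-Z0-9 ]+)-----\r?\n(.*?)\r?\n-----END \1-----", re.DOTALL
)

def find_pem_blocks(data: bytes):
    """Yield (offset, label, der_bytes) for each PEM block whose body is valid base64."""
    for m in PEM_RE.finditer(data):
        body = b"".join(m.group(2).split())  # drop line breaks and spaces
        try:
            der = base64.b64decode(body, validate=True)
        except binascii.Error:
            continue  # encrypted-PEM headers or corrupt base64
        yield m.start(), m.group(1).decode(), der

if __name__ == "__main__" and len(sys.argv) > 1:
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for off, label, der in find_pem_blocks(f.read()):
                print(f"{path}:0x{off:x} {label} ({len(der)} bytes DER, starts {der[:2].hex()})")
```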
ASN.1 DER Structure of an RSA Private Key
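An RSA private key in PKCS#1 DER format (the ASN.1 RSAPrivateKey structure) starts with a specific byte sequence:
```
30 82 LL LL    SEQUENCE, content length in the next 2 bytes (DER long form)
02 01 00       INTEGER version = 0
02 81 LL       INTEGER modulus, 1-byte length (e.g. 1024-bit keys)
02 82 LL LL    INTEGER modulus, 2-byte length (e.g. 2048-bit and larger keys)
```
The first two bytes are 30 82 whenever the SEQUENCE content is between 256 and 65535 bytes long, which covers all practical RSA key sizes (a 1024-bit key is already about 600 bytes). PKCS#8-wrapped keys share the first seven bytes but continue with an AlgorithmIdentifier SEQUENCE (30 0d ...) instead of the modulus. Most X.509 certificates also start with 30 82, so a hit is a lead, not a finding.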
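The byte layout above can be checked in code before you reach for openssl. This sketch parses the outer SEQUENCE length, checks for the version INTEGER 0, and reads the modulus to estimate the key size. It only understands the PKCS#1 RSAPrivateKey layout (a PKCS#8 wrapper returns None) and does not validate anything after the modulus. The function names are hypothetical.

```python
def read_der_length(buf: bytes, pos: int):
    """Decode the DER length field at buf[pos]; return (length, offset after it)."""
    first = buf[pos]
    if first < 0x80:                      # short form: length fits in one byte
        return first, pos + 1
    count = first & 0x7F                  # long form: next `count` bytes hold it
    if count == 0 or count > 4:
        raise ValueError("unsupported DER length encoding")
    end = pos + 1 + count
    if end > len(buf):
        raise ValueError("truncated DER length")
    return int.from_bytes(buf[pos + 1:end], "big"), end

def probe_pkcs1_rsa(buf: bytes, offset: int = 0):
    """Return (total_der_bytes, modulus_bits) if buf[offset:] starts like a
    PKCS#1 RSAPrivateKey, else None. Only the header and modulus are checked."""
    try:
        if buf[offset] != 0x30:                   # SEQUENCE
            return None
        seq_len, pos = read_der_length(buf, offset + 1)
        total = (pos - offset) + seq_len          # tag + length field + content
        if buf[pos:pos + 3] != b"\x02\x01\x00":   # INTEGER version = 0
            return None
        pos += 3
        if buf[pos] != 0x02:                      # INTEGER modulus
            return None
        mod_len, pos = read_der_length(buf, pos + 1)
        modulus = buf[pos:pos + mod_len]
        if len(modulus) != mod_len:
            return None
        return total, int.from_bytes(modulus, "big").bit_length()
    except (IndexError, ValueError):
        return None

# Synthetic header shaped like a 2048-bit key: 257-byte modulus with a leading 00
example = bytes.fromhex("308204a4" "020100" "02820101") + b"\x00\xc5" + b"\x11" * 255
print(probe_pkcs1_rsa(example))  # -> (1192, 2048)
```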
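```bash
# Quick but noisy: list files containing the bytes 30 82 anywhere.
# LC_ALL=C makes grep compare raw bytes instead of decoding UTF-8.
LC_ALL=C grep -rl $'\x30\x82' usr/bin/ lib/ 2>/dev/null
# More specific: 30 82 LL LL, the version INTEGER 02 01 00, then either the
# modulus INTEGER (02 81 / 02 82, PKCS#1) or an AlgorithmIdentifier (30, PKCS#8)
python3 -c "
import re, sys
pattern = re.compile(rb'\x30\x82(..)\x02\x01\x00(?:\x02[\x81\x82]|\x30)', re.DOTALL)
for fname in sys.argv[1:]:
    try:
        with open(fname, 'rb') as f:
            data = f.read()
    except OSError:
        continue  # skip directories, broken symlinks, unreadable files
    for m in pattern.finditer(data):
        total = 4 + int.from_bytes(m.group(1), 'big')  # header + content
        print(f'{fname}:0x{m.start():x} ({total} bytes)')
" usr/bin/* lib/*.so 2>/dev/null
```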
Once you find a potential offset, extract the bytes and try to parse them:
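```bash
# Extract the candidate at offset 0x1234 in a binary. Prefer the exact total
# length reported by the search (4 + the 2 length bytes after 30 82) so that
# trailing bytes cannot confuse the parser; 2048 is only a placeholder that
# fits a 2048-bit RSA key.
dd if=usr/bin/cloud_agent bs=1 skip=$((0x1234)) count=2048 2>/dev/null > candidate.der
# Try to parse as RSA private key
openssl rsa -in candidate.der -inform DER -text -noout 2>/dev/null
# Try to parse as EC private key
openssl ec -in candidate.der -inform DER -text -noout 2>/dev/null
# Generic parser for any key type, including unencrypted PKCS#8
openssl pkey -in candidate.der -inform DER -text -noout 2>/dev/null
```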
A successful parse confirms it is a private key. Failed parses mean either the length was wrong, the DER is malformed, or it is not a key.
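Rather than guessing a byte count for dd, you can carve every candidate at its declared DER length. This sketch finds 30 82 LL LL 02 01 00 headers, slices 4 + LL LL bytes from each, and writes them out for the openssl commands above. The `carve_candidates` name and the candidate_0x<offset>.der naming scheme are assumptions for illustration.

```python
import re
import sys
from pathlib import Path

# 30 82 LL LL: SEQUENCE with a 2-byte length, then 02 01 00: INTEGER version 0
HEADER = re.compile(rb"\x30\x82(..)\x02\x01\x00", re.DOTALL)

def carve_candidates(data: bytes):
    """Return (offset, blob) pairs, each blob cut to its declared DER length."""
    out = []
    for m in HEADER.finditer(data):
        total = 4 + int.from_bytes(m.group(1), "big")
        blob = data[m.start():m.start() + total]
        if len(blob) == total:  # skip candidates that run past end of file
            out.append((m.start(), blob))
    return out

if __name__ == "__main__" and len(sys.argv) == 2:
    data = Path(sys.argv[1]).read_bytes()
    for offset, blob in carve_candidates(data):
        name = f"candidate_0x{offset:x}.der"
        Path(name).write_bytes(blob)
        print(f"{name}: {len(blob)} bytes")
```

Each carved file can then be passed to the openssl parsers one at a time. Files that fail every parser are usually certificates or unrelated DER data.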
Certificate Extraction with binwalk
binwalk often extracts certificates automatically when processing firmware:
```bash
binwalk -e firmware.bin
# Creates a _firmware.bin.extracted/ directory; certificates appear as .crt or
# .pem files if binwalk recognizes them, and extracted filesystems (with their
# own certificate files) sit in subdirectories
# Inspect every certificate-like file anywhere under the extraction directory
find _firmware.bin.extracted -type f \( -name '*.crt' -o -name '*.pem' \) |
while IFS= read -r cert; do
    echo "=== $cert ==="
    openssl x509 -in "$cert" -text -noout 2>/dev/null | \
        grep -E "Subject:|Issuer:|Not After|Signature Algorithm"
done
```
Pay attention to:
- Self-signed certificates: the Issuer and Subject are identical. These are common on IoT devices as device identity certificates. If the corresponding private key is also present, this is a critical finding.
- Internal CA certificates: a certificate signed by an internal CA suggests the vendor operates its own PKI. The CA key may be present elsewhere in the firmware.
- Expired certificates: do not discard them. Expiry applies to the certificate, not the key, so the matching private key can still identify the device, and the certificate still tells you about the vendor's PKI structure.
XOR-Obfuscated Strings
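Some firmware developers apply simple XOR obfuscation to sensitive strings. It is not real encryption, but it is enough to keep them out of strings output. The string is stored as a byte sequence XORed with a constant and deobfuscated at runtime with a simple loop.
Single-byte XOR does not raise entropy: it only relabels byte values, so an obfuscated string has exactly the same entropy as its plaintext. Look instead for runs of a few hundred gibberish bytes with text-like (medium) entropy in the middle of otherwise readable .rodata content, with no recognizable magic bytes. Multi-byte keys push the entropy of such runs somewhat higher.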
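The entropy point is easy to check. This sketch obfuscates a made-up secret with a single-byte key and shows that the same loop reverses it and that the byte entropy is unchanged. The secret string, the 0x5A key and the helper names are illustrative assumptions.

```python
import math
from collections import Counter

def xor_bytes(data: bytes, key: int) -> bytes:
    """Single-byte XOR: the same loop obfuscates and deobfuscates."""
    return bytes(b ^ key for b in data)

def normalized_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, divided by 8."""
    n = len(data)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values()) / 8.0

# Made-up secret and key, for illustration only
secret = b"api_key=3f9a; cloud_url=https://telemetry.example.com/v1/upload"
obfuscated = xor_bytes(secret, 0x5A)

print(obfuscated[:16])                      # unreadable to strings
print(xor_bytes(obfuscated, 0x5A) == secret)
print(round(normalized_entropy(secret), 4), round(normalized_entropy(obfuscated), 4))
```

The two entropy values match because XOR with a constant maps each byte value to exactly one other value, which permutes the frequency table without changing it.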
Single-Byte XOR Recovery
For single-byte XOR (by far the most common variant):
```python
#!/usr/bin/env python3
# xor_brute.py: try every single-byte XOR key against known plaintext targets
import sys

with open(sys.argv[1], 'rb') as f:
    data = f.read()

targets = [b'password', b'api_key', b'http://', b'https://', b'admin']

# Key 0x00 is the identity (plain strings), so start at 1
for key in range(1, 256):
    # bytes.translate applies the XOR through a 256-entry lookup table
    decoded = data.translate(bytes(b ^ key for b in range(256)))
    for target in targets:
        positions = []
        idx = 0
        while True:
            idx = decoded.find(target, idx)
            if idx == -1:
                break
            positions.append(idx)
            idx += 1
        if positions:
            print(f"XOR key 0x{key:02x} -> '{target.decode()}' found at: {positions}")
            # Print surrounding decoded context
            for pos in positions[:3]:
                context = decoded[max(0, pos - 10):pos + 40]
                printable = ''.join(chr(b) if 32 <= b < 127 else '.' for b in context)
                print(f"  Context: {printable}")
```
Run it:
```bash
python3 xor_brute.py usr/bin/cloud_agent
```
If this produces output, you have found both the XOR key and the obfuscated strings. Common XOR keys found in real firmware: 0x17, 0x5a, 0xff, 0xaa.
Multi-Byte XOR Keys
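If single-byte XOR fails, try 2-byte and 4-byte keys. The search space grows to 65,536 and about 4.3 billion keys respectively, so the 4-byte case is only tractable with known plaintext. Even the 2-byte brute force means 65,536 pure-Python passes over the whole file, which is slow on large binaries.
For 2-byte keys, swap this loop into xor_brute.py in place of the single-byte one (it reuses data and targets):
```python
for key in range(65536):
    key_bytes = key.to_bytes(2, 'big')
    # Trying every (k1, k2) pair also covers strings that start at odd offsets,
    # since those decode correctly with the swapped pair (k2, k1)
    decoded = bytes(b ^ key_bytes[i % 2] for i, b in enumerate(data))
    # ... same target search as in the single-byte script
```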
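With known plaintext such as https://, you do not need to brute-force multi-byte keys at all. XORing the known text against the data at a candidate offset yields the key directly, and the remaining known bytes confirm it. The sketch below implements that known-plaintext attack for a repeating key of length key_len. It assumes the known fragment is at least twice the key length. The recovered key is aligned to the fragment's offset, so it may be a rotation of the key the program actually uses. Function names, key and sample text are hypothetical.

```python
def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key, starting at key[0]."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def recover_repeating_xor_key(data: bytes, known: bytes, key_len: int):
    """Yield (offset, key) pairs where data[offset:] XOR the repeated key starts with `known`.

    The key is aligned to `offset`; if `known` sits mid-string, the result can
    be a rotation of the key used by the program.
    """
    if len(known) < 2 * key_len:
        raise ValueError("known plaintext should be at least twice the key length")
    for off in range(len(data) - len(known) + 1):
        # The first key_len known bytes give a key candidate...
        key = bytes(data[off + i] ^ known[i] for i in range(key_len))
        if not any(key):
            continue  # an all-zero key means the text is stored in the clear
        # ...and the remaining known bytes must agree with it
        if all(data[off + i] ^ key[i % key_len] == known[i]
               for i in range(key_len, len(known))):
            yield off, key

# Self-check on a made-up obfuscated buffer (key and text are assumptions)
secret_key = b"\x13\x37\xbe\xef"
plain = b"cloud endpoint https://api.example.net/v2 token=abc"
blob = b"\x00" * 7 + xor_repeating(plain, secret_key) + b"\x00" * 5
for off, key in recover_repeating_xor_key(blob, b"https://", 4):
    print(f"offset 0x{off:x}, key {key.hex()}: {xor_repeating(blob[off:off + 40], key)!r}")
```

The scan is a single pass over the data per target, instead of one full decode pass per key.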
Base64-Encoded Secrets: Detection and Decoding
Base64 encoding makes a secret about a third longer (every 3 bytes become 4 characters) and obscures its content. It is not encryption; it is trivially reversed. But in strings output, a base64-encoded key looks nothing like a key.
Characteristics of base64 that help detection:
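- Character set: [A-Za-z0-9+/], with = used only as trailing padding.
- Length: with padding, n input bytes encode to 4 × ceil(n/3) characters, so the length is always a multiple of 4. Unpadded variants drop the trailing = characters.
- Padding: 0, 1, or 2 = characters at the end.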
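These three rules translate directly into a filter. This is a sketch assuming standard (not URL-safe), padded base64. The min_len of 20 matches the shell threshold used in this section, and the function names are hypothetical.

```python
import base64
import binascii
import re

# Standard alphabet, then 0-2 '=' characters at the very end
B64_RE = re.compile(r"[A-Za-z0-9+/]+={0,2}")

def looks_like_base64(s: str, min_len: int = 20) -> bool:
    """Apply the character-set, length and padding rules."""
    return len(s) >= min_len and len(s) % 4 == 0 and B64_RE.fullmatch(s) is not None

def decode_if_base64(s: str):
    """Return the decoded bytes, or None if s fails the checks or does not decode."""
    if not looks_like_base64(s):
        return None
    try:
        return base64.b64decode(s, validate=True)
    except binascii.Error:
        return None

token = base64.b64encode(b"admin:SuperSecret123!").decode()
print(token, decode_if_base64(token))
```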
```bash
# Find all plausible base64 strings in a binary (at least 20 chars)
strings binary | grep -E '^[A-Za-z0-9+/]{20,}={0,2}$' | while IFS= read -r b64; do
    # Decode straight into strings(1) so bash never has to hold binary
    # output (which may contain NUL bytes) in a variable
    printable=$(printf '%s' "$b64" | base64 -d 2>/dev/null | strings -n 4)
    if [ -n "$printable" ]; then
        echo "BASE64: $b64"
        echo "DECODED: $printable"
    fi
done
```
URL-Safe Base64
Some API keys use URL-safe base64 (+ replaced with -, / replaced with _):
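```bash
strings binary | grep -E '^[A-Za-z0-9_-]{20,}={0,2}$' | while IFS= read -r b64; do
    # Convert URL-safe to standard base64 before decoding
    standard=$(printf '%s' "$b64" | tr '_-' '/+')
    # URL-safe tokens often omit padding; pad to a multiple of 4
    while [ $(( ${#standard} % 4 )) -ne 0 ]; do standard="$standard="; done
    printable=$(printf '%s' "$standard" | base64 -d 2>/dev/null | strings -n 4)
    [ -n "$printable" ] && printf 'BASE64URL: %s\nDECODED: %s\n---\n' "$b64" "$printable"
done
```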
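The same conversion in Python is shorter because the standard library ships a URL-safe decoder. The only extra step is restoring the padding that URL-safe tokens often omit. A sketch, with a hypothetical helper name:

```python
import base64

def decode_urlsafe(token: str):
    """Decode URL-safe base64, restoring any '=' padding the token omitted."""
    padded = token + "=" * (-len(token) % 4)
    try:
        return base64.urlsafe_b64decode(padded)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None

raw = b"\xfb\xff\xfe device-secret"
token = base64.urlsafe_b64encode(raw).decode().rstrip("=")
print(token, decode_urlsafe(token))
```

The sample bytes were chosen so the token contains - and _, which standard base64 would not accept.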
Entropy Scanning Without binwalk
If binwalk is unavailable, compute entropy programmatically to find interesting regions:
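```python
#!/usr/bin/env python3
import math
import sys

def entropy(data):
    """Shannon entropy of a byte string, normalized to [0, 1]."""
    if not data:
        return 0.0
    freq = [0] * 256
    for b in data:
        freq[b] += 1
    n = len(data)
    return -sum((f/n) * math.log2(f/n) for f in freq if f > 0) / 8.0

# Random data only averages about 0.9 in a 256-byte window, right at the
# threshold, so use at least 1 KiB per window
WINDOW = 1024
THRESHOLD = 0.9

with open(sys.argv[1], 'rb') as f:
    data = f.read()

start = None  # offset where the current high-entropy region began
end = 0       # end offset of the last high-entropy window seen
for i in range(0, max(len(data) - WINDOW, 0) + 1, WINDOW // 2):
    e = entropy(data[i:i+WINDOW])
    if e > THRESHOLD:
        if start is None:
            start = i
        end = min(i + WINDOW, len(data))
    elif start is not None:
        print(f"High entropy region: 0x{start:08x} - 0x{end:08x} ({end - start} bytes)")
        start = None
# Report a region that runs up to the end of the file
if start is not None:
    print(f"High entropy region: 0x{start:08x} - 0x{end:08x} ({end - start} bytes)")
```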
This script outputs offset ranges where entropy exceeds 0.9 — your candidates for encrypted blobs, compressed sections without magic bytes, or embedded cryptographic key material.
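Window size and threshold interact more than you might expect. A window of n bytes cannot score above log2(n)/8 when n is below 256. Even at 256 bytes, uniformly random data averages only about 0.9, because there are too few samples to cover all 256 byte values. This sketch measures that small-sample bias on seeded pseudo-random bytes, which stand in for ciphertext here (an assumption). The function names are hypothetical.

```python
import math
import random
from collections import Counter

def normalized_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, divided by 8."""
    n = len(data)
    if n == 0:
        return 0.0
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values()) / 8.0

def mean_random_entropy(window: int, trials: int = 100, seed: int = 1) -> float:
    """Average normalized entropy over `trials` windows of seeded pseudo-random bytes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        chunk = bytes(rng.getrandbits(8) for _ in range(window))
        total += normalized_entropy(chunk)
    return total / trials

for window in (64, 256, 1024, 4096):
    print(f"window {window:>5}: mean entropy of random data = {mean_random_entropy(window):.3f}")
```

With a 0.9 threshold, windows of 1 KiB or more keep encrypted data clearly above the line. Smaller windows locate boundaries more precisely but start to miss it.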
Practical Summary: What Each Technique Finds
| Technique | Finds |
|---|---|
| binwalk -E firmware.bin | Encrypted regions, compressed regions, key material blobs |
| grep -rl "PRIVATE KEY" | PEM-encoded private keys |
| DER magic byte search (30 82) | DER-encoded RSA private keys (PKCS#1 and PKCS#8); EC keys are short enough to start with 30 77 or 30 81 instead |
| openssl x509 on cert files | Certificate details, expiry, issuer chain |
| Single-byte XOR brute force | XOR-obfuscated strings with constant key |
| base64 decode loop | Base64-encoded secrets embedded as strings |
| Entropy windowed scan | Unrecognized encrypted/key blobs for further investigation |
Work through the techniques in order from easiest (grep for PEM headers) to hardest (XOR brute force, DER structure search). Most real-world firmware has at least one finding at the easy end of the list.