Lesson 5 — Automation

Why Automate

A single firmware analysis takes 20–30 minutes if done manually. An IoT vendor ships 50 firmware images across their product line. A security researcher auditing a sector might look at firmware from 20 vendors. Manual analysis at that scale is impractical.

Automation solves two problems:

1. Coverage: automated tools do not skip files or get tired.
2. Repeatability: the same search runs identically on every firmware, producing comparable results.

The tradeoff is false positives. Automated tools cast wide nets. Your job shifts from "find the secrets" to "triage the output and confirm real findings."

trufflehog — Regex and Entropy-Based Scanner

trufflehog scans for secrets using a library of regular expressions covering hundreds of known secret formats (AWS keys, Stripe, GitHub, Twilio, Google, and more), combined with entropy analysis for generic high-entropy strings.

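# Install (the "trufflehog filesystem" commands below need TruffleHog v3, the Go
# binary from Truffle Security; the PyPI package trufflehog3 is a separate tool
# with a different command line)
brew install trufflehog
# or download a prebuilt binary from the project's GitHub releases page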

# Scan a filesystem directory
trufflehog filesystem ./squashfs-root/

# JSON output for programmatic processing
trufflehog filesystem ./squashfs-root/ --json 2>/dev/null | jq .

# Filter out low-entropy unverified matches (the tool's help suggests starting at 3.0)
trufflehog filesystem ./squashfs-root/ --filter-entropy=3.0

# Report only verified results (not useful for extracted firmware; see the note below)
trufflehog filesystem ./squashfs-root/ --only-verified

trufflehog classifies findings as "verified" (it made an API call and confirmed the credential works) or "unverified" (pattern matched but not confirmed). Verification requires network access to the live service, so on an offline analysis machine, or with --no-verification, every finding is reported as unverified and the distinction carries no information. Do not filter by verification status when analyzing extracted firmware.

Important limitation: trufflehog v3 is most effective on text content. It can pull strings out of binary files, but it is less thorough than a dedicated strings + grep pass on binaries. Use both.

semgrep — Pattern-Based Code Scanner

semgrep applies semantic code analysis patterns to source-like files (PHP, JavaScript, Python, shell scripts). The "secrets" ruleset covers patterns like:

# What semgrep's p/secrets catches:
password = "..."        # Assignment to password variable
api_key = "..."         # Assignment to api_key variable
Authorization: Bearer   # Hardcoded auth headers in code

# Install
pip install semgrep

# Run the secrets ruleset on web files
semgrep --config "p/secrets" squashfs-root/www/ --json 2>/dev/null | \
  jq '.results[] | {path: .path, rule: .check_id, match: .extra.lines}'

# Run on all text-like files (PHP, JS, shell scripts)
semgrep --config "p/secrets" squashfs-root/ \
  --include="*.php" --include="*.js" --include="*.sh" --include="*.conf"

# Custom rule for IoT-specific patterns (written to a rule file, then passed via --config)
cat > mqtt-password.yaml <<'EOF'
rules:
  - id: hardcoded-mqtt-password
    patterns:
      - pattern: |
          mosquitto_sub ... -P "..." ...
    message: "Hardcoded MQTT password in shell command"
    severity: ERROR
    languages: [bash]
EOF
semgrep --config mqtt-password.yaml squashfs-root/

semgrep is most effective on PHP and JavaScript. It understands syntax and avoids matching variable names in comments. For shell scripts it works but is less precise.
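Because semgrep is less precise on shell scripts, a plain grep pass can serve as a cross-check for the MQTT case. The following is a minimal sketch, not a semgrep feature: the function name find_mqtt_passwords is hypothetical, and it assumes the password is passed with the Mosquitto clients' -P option on the same line as the command.

```shell
# find_mqtt_passwords DIR
# Hypothetical helper: lists lines that call mosquitto_sub or mosquitto_pub
# with a -P (password) argument. The match is case-sensitive on purpose,
# because lowercase -p is the port option.
find_mqtt_passwords() {
    # -r recurse, -n show line numbers, -I skip binary files, -E extended regex
    # [^#]* keeps the match from running past a "#" on the same line
    grep -rnIE 'mosquitto_(sub|pub)[^#]*[[:space:]]-P[[:space:]]+[^[:space:]]+' "$1" || true
}

# Example: find_mqtt_passwords squashfs-root/etc/
```

Every hit still needs a manual look: a value such as "$MQTT_PASS" is a variable reference, not a hardcoded secret.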

Custom IoT Firmware Scanner

The most reliable approach for IoT work is a purpose-built script that combines all the techniques from the previous lessons. Here is a complete scanner:

#!/bin/bash
# iot-secret-scan.sh — Systematic secret hunter for extracted firmware
# Usage: ./iot-secret-scan.sh squashfs-root/ [output-report.txt]

set -euo pipefail

TARGET="${1:-.}"
REPORT="${2:-secrets-report-$(date +%Y%m%d-%H%M%S).txt}"

RED='\033[0;31m'
YELLOW='\033[1;33m'
GREEN='\033[0;32m'
NC='\033[0m'

# Colored output goes to the terminal; the report file gets plain text
log()  { echo -e "${GREEN}[*]${NC} $1"; echo "[*] $1" >> "$REPORT"; }
hit()  { echo -e "${RED}[!]${NC} $1"; echo "[!] $1" >> "$REPORT"; }
info() { echo -e "${YELLOW}[-]${NC} $1"; echo "[-] $1" >> "$REPORT"; }

echo "=== IoT Firmware Secret Scanner ===" | tee "$REPORT"
echo "Target: $(realpath "$TARGET")" | tee -a "$REPORT"
echo "Date: $(date)" | tee -a "$REPORT"
echo "" | tee -a "$REPORT"

# --- Config file credentials ---
log "Scanning config files for credentials"
results=$(grep -rEi "(password|passwd|pwd|passw)\s*[:=]\s*\S+" "$TARGET" \
  --include="*.conf" --include="*.cfg" --include="*.ini" \
  --include="*.json" --include="*.yaml" --include="*.yml" \
  -n 2>/dev/null | grep -v "^Binary" || true)
if [ -n "$results" ]; then
    hit "Config file credentials found:"
    head -50 <<< "$results" | tee -a "$REPORT"   # here-string: no SIGPIPE failure under pipefail
fi

# --- Private keys ---
log "Searching for private keys (PEM format)"
while IFS= read -r -d '' f; do
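    # Match the full PEM header so files that merely mention the words are skipped
    if grep -qE -e '-----BEGIN ([A-Z]+ )*PRIVATE KEY-----' "$f" 2>/dev/null; then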
        hit "Private key: $f"
        head -3 "$f" | tee -a "$REPORT"
    fi
done < <(find "$TARGET" -type f -print0 2>/dev/null)

# --- AWS credentials ---
log "Searching for AWS keys"
# -a reads binaries as text and -o prints only the key, so hits inside executables are kept
results=$(grep -raoE "AKIA[0-9A-Z]{16}" "$TARGET" 2>/dev/null || true)
if [ -n "$results" ]; then
    hit "AWS Access Key IDs:"
    echo "$results" | tee -a "$REPORT"
fi

# --- Init script secrets ---
log "Mining init scripts for secrets"
results=$(grep -rEi "export\s+(.*_KEY|.*_TOKEN|.*_SECRET|.*_PASSWORD)\s*=" \
  "$TARGET/etc/" 2>/dev/null || true)
if [ -n "$results" ]; then
    hit "Exported secrets in init scripts:"
    echo "$results" | tee -a "$REPORT"
fi

# --- Shadow file ---
log "Checking /etc/shadow for active accounts"
shadow="$TARGET/etc/shadow"
if [ -f "$shadow" ]; then
    # Keep accounts whose password field is not locked (! or *) and not the "x" placeholder
    results=$(awk -F: '$2 !~ /^[!*]/ && $2 != "x"' "$shadow" 2>/dev/null || true)
    if [ -n "$results" ]; then
        hit "Active password hashes in shadow:"
        echo "$results" | tee -a "$REPORT"
    fi
fi

# --- Binary secrets ---
log "Running strings against executables and libraries"
while IFS= read -r binary; do
    findings=$(strings -n 12 "$binary" 2>/dev/null | \
      grep -iE "(password|api.?key|access.?token|secret.?key|bearer)" | \
      grep -vE "(example|sample|placeholder|your_|changeme|xxx)" || true)
    if [ -n "$findings" ]; then
        hit "Binary secrets in: $binary"
        head -10 <<< "$findings" | tee -a "$REPORT"
    fi
# Process substitution keeps a missing directory (find exits non-zero) from
# aborting the script under "set -o pipefail"
done < <(find "$TARGET/usr/bin" "$TARGET/usr/sbin" "$TARGET/bin" "$TARGET/sbin" \
     "$TARGET/lib" "$TARGET/usr/lib" -type f 2>/dev/null || true)

# --- Base64 candidates in binaries ---
log "Checking for base64-encoded secrets in binaries"
while IFS= read -r binary; do
    while IFS= read -r b64; do
        # "|| true": invalid base64 must not abort the script under set -e
        decoded=$(printf '%s' "$b64" | base64 -d 2>/dev/null | strings -n 6 || true)
        if grep -qiE "key|pass|token|http" <<< "$decoded"; then
            hit "Base64 in $binary: $b64"
            info "  Decoded: $decoded"
        fi
    # grep exits 1 when a binary has no candidates; "|| true" keeps the scan going
    done < <(strings -n 20 "$binary" 2>/dev/null | \
      grep -E "^[A-Za-z0-9+/]{20,}={0,2}$" || true)
done < <(find "$TARGET/usr/bin" "$TARGET/usr/sbin" -type f 2>/dev/null || true)

# --- SQLite databases ---
log "Checking SQLite databases"
find "$TARGET" -type f \( -name "*.db" -o -name "*.sqlite" -o -name "*.sqlite3" \) \
  2>/dev/null | while IFS= read -r db; do
    results=$(strings -n 8 "$db" | grep -iE "pass|token|key|secret" || true)
    if [ -n "$results" ]; then
        hit "Potential secrets in database: $db"
        head -20 <<< "$results" | tee -a "$REPORT"
    fi
  done

echo "" | tee -a "$REPORT"
log "Scan complete. Report saved to: $REPORT"

Save this as iot-secret-scan.sh, make it executable (chmod +x iot-secret-scan.sh), and run it against any extracted firmware directory.
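To apply the scanner to a whole collection of images, a small wrapper can loop over the extracted directories. This is a sketch under stated assumptions: the function name scan_all, the layout (one extracted root filesystem per subdirectory of a parent folder), and the output folder are all hypothetical.

```shell
# scan_all SCANNER PARENT_DIR OUT_DIR
# Runs SCANNER once per subdirectory of PARENT_DIR and writes one report per image.
# Assumed layout (hypothetical): PARENT_DIR/<image-name>/ holds an extracted root filesystem.
scan_all() {
    scanner="$1"; parent="$2"; out="$3"
    mkdir -p "$out"
    for root in "$parent"/*/; do
        [ -d "$root" ] || continue      # no subdirectories: the glob stayed literal
        name=$(basename "$root")
        # Keep going if one image fails, so a single bad extraction does not stop the batch
        "$scanner" "$root" "$out/$name.txt" > /dev/null || echo "scan failed: $name" >&2
        echo "$name -> $out/$name.txt"
    done
}

# Example: scan_all ./iot-secret-scan.sh ./extracted ./reports
```

Writing one report per image keeps the results comparable across firmware, which is the point of running the same search everywhere.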

Making It Repeatable: Documenting Findings

A finding that is not documented is not a finding. Every output from your scanner that you confirm as real must be recorded with:

FINDING-001
Type:       API Key (AWS Access Key ID)
File:       squashfs-root/usr/bin/cloud_agent
Offset:     0x1a3c0 (binary) / line 47 (config file)
Value:      AKIAIOSFODNN7EXAMPLE
Context:    [2 lines before + 2 lines after the match]
Reachable:  Yes — binary runs as cloud_agent daemon on boot
CVSS:       9.8 (AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H)
Notes:      Key grants access to vendor's S3 bucket (confirmed by URL context)

Use a consistent format. You will have multiple findings. They need to be distinguishable and reproducible by someone else following the same steps.
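The Offset field for a finding inside a binary can be filled in from grep's byte-offset output. A minimal sketch, assuming GNU grep (for -a, -b, and -o used together); the function name match_offsets is hypothetical.

```shell
# match_offsets FILE REGEX
# Prints "0x<hex offset> <matched text>" for every match of REGEX in FILE.
# Assumes GNU grep: -a reads binaries as text, -b prints byte offsets, and
# with -o the offset refers to the matched part rather than the line start.
match_offsets() {
    LC_ALL=C grep -aboE "$2" "$1" | while IFS=: read -r offset match; do
        printf '0x%x %s\n' "$offset" "$match"
    done
}

# Example: match_offsets squashfs-root/usr/bin/cloud_agent 'AKIA[0-9A-Z]{16}'
```

Recording the offset this way lets another analyst jump to the same bytes in a hex editor and reproduce the finding.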

False Positive Reduction

Automated scanning produces noise. Apply these filters to your output:

Exclude obvious non-secrets:

# Common false positive patterns to exclude from results (matched case-insensitively,
# so values such as AKIAIOSFODNN7EXAMPLE are filtered too)
EXCLUDE="example|sample|placeholder|your_|changeme|xxx+|TODO|FIXME|test|demo"

grep -rE "AKIA[0-9A-Z]{16}" . | grep -viE "$EXCLUDE"

Verify format correctness:

- AWS Access Key IDs are exactly 20 characters (4-character prefix + 16-character body): AKIA[0-9A-Z]{16}
- AWS Secret Access Keys are exactly 40 characters of base64-like chars
- If the pattern matches but the length is wrong, it is a false positive
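The length rules above can be expressed as whole-string checks. This is a sketch with hypothetical function names; it tests shape only (length and character set) and says nothing about whether a key is live.

```shell
# Shape checks only: they confirm length and character set, not validity.
# Function names are hypothetical.
is_aws_key_id() {
    # -x forces a whole-line match: exactly "AKIA" plus 16 uppercase letters or digits
    printf '%s\n' "$1" | LC_ALL=C grep -qxE 'AKIA[0-9A-Z]{16}'
}

is_aws_secret_shape() {
    # Exactly 40 characters drawn from the base64 alphabet
    printf '%s\n' "$1" | LC_ALL=C grep -qxE '[A-Za-z0-9+/]{40}'
}

# Usage
is_aws_key_id "AKIAIOSFODNN7EXAMPLE"  && echo "well-formed key ID"
is_aws_key_id "AKIAIOSFODNN7EXAMPLEX" || echo "wrong length: treat as a false positive"
```

A 40-character base64-like string is common in firmware for other reasons (hashes, encoded blobs), so the secret-key check is only meaningful when the candidate sits next to a well-formed key ID.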

Context check:

# Get context around a match (-E is needed for the {16} repetition; -C 2 adds two lines either side)
grep -nE -C 2 "AKIA[0-9A-Z]{16}" file.js
# Line 47: var example_key = "AKIAIOSFODNN7EXAMPLE"   <- variable named "example" is suspicious
# Line 82: aws_key = "AKIAT7XNRY2SDJK8BVZP"          <- more plausible

Variable names containing example, sample, test, fake, or dummy are almost always false positives even if the value looks like a real key.
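The naming heuristic can be applied mechanically as a first triage pass. The sketch below is a coarser version of the rule: it labels a match by looking for those words anywhere on the matching line (path, variable name, or value). The function name and the two labels are hypothetical.

```shell
# triage_matches: reads grep-style "file:line:text" records on stdin and
# prefixes each with a label. Labels and word list are assumptions taken
# from the heuristic described in the text, not output of any real tool.
triage_matches() {
    while IFS= read -r record; do
        if printf '%s\n' "$record" | grep -qiE 'example|sample|test|fake|dummy'; then
            printf 'LIKELY-FP  %s\n' "$record"
        else
            printf 'REVIEW     %s\n' "$record"
        fi
    done
}

# Example: grep -rnE 'AKIA[0-9A-Z]{16}' squashfs-root/ | triage_matches
```

Nothing is discarded: lines marked LIKELY-FP stay in the output so a reviewer can still overrule the label.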

Reporting: CVSS Scoring for Hardcoded Credentials

When you report findings, include a CVSS v3.1 score. Hardcoded credentials follow a predictable scoring pattern:

Hardcoded admin credentials (web interface), device local network only:

- AV:A (Adjacent): device only accessible on LAN
- AC:L, PR:N, UI:N (trivial exploitation, no interaction needed)
- S:U, C:H, I:H, A:H (full device compromise)
- Score: 8.8 (High)

Hardcoded API key for cloud service, internet-accessible:

- AV:N (Network): cloud API is internet-facing
- AC:L, PR:N, UI:N
- S:U, C:H, I:H, A:H
- Score: 9.8 (Critical)

Hardcoded SSH authorized public key (vendor backdoor):

- AV:N, AC:L, PR:N, UI:N, S:U, C:H, I:H, A:H
- Score: 9.8 (Critical): vendor can access any deployed device

Private key embedded in firmware (device impersonation possible):

- AV:N, AC:H (requires extracting the private key), PR:N, UI:N, S:C (scope changed, affects TLS trust), C:H, I:H, A:N
- Score: 8.7 (High) to 9.1 (Critical) depending on usage

Use the CVSS calculator for the final score. Document your reasoning for each vector selection — reviewers will question it.
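For a quick sanity check of a vector before opening the calculator, the base-score arithmetic can be scripted. The sketch below follows the CVSS v3.1 base-score equations and metric weights as published in the specification; the function name cvss31_base is hypothetical, it covers base metrics only, and the official calculator remains the reference for a reported score.

```shell
# cvss31_base VECTOR
# Prints the CVSS v3.1 base score for a vector such as
# AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H (a leading "CVSS:3.1/" is tolerated).
cvss31_base() {
    LC_ALL=C awk -v vec="$1" '
    # Roundup from the v3.1 spec: smallest one-decimal value >= x,
    # computed on integers to avoid floating-point drift
    function roundup(x,    i) {
        i = int(x * 100000 + 0.5)
        if (i % 10000 == 0) return i / 100000
        return (int(i / 10000) + 1) / 10
    }
    BEGIN {
        n = split(vec, parts, "/")
        for (k = 1; k <= n; k++) { split(parts[k], kv, ":"); m[kv[1]] = kv[2] }
        changed = (m["S"] == "C")

        w["AV","N"] = 0.85; w["AV","A"] = 0.62; w["AV","L"] = 0.55; w["AV","P"] = 0.2
        w["AC","L"] = 0.77; w["AC","H"] = 0.44
        w["UI","N"] = 0.85; w["UI","R"] = 0.62
        w["CIA","H"] = 0.56; w["CIA","L"] = 0.22; w["CIA","N"] = 0
        # Privileges Required weights depend on whether scope changes
        w["PR","N"] = 0.85
        w["PR","L"] = changed ? 0.68 : 0.62
        w["PR","H"] = changed ? 0.5 : 0.27

        iss = 1 - (1 - w["CIA",m["C"]]) * (1 - w["CIA",m["I"]]) * (1 - w["CIA",m["A"]])
        if (changed) impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ^ 15
        else         impact = 6.42 * iss
        expl = 8.22 * w["AV",m["AV"]] * w["AC",m["AC"]] * w["PR",m["PR"]] * w["UI",m["UI"]]

        total = impact + expl
        if (changed) total = 1.08 * total
        if (total > 10) total = 10
        score = (impact <= 0) ? 0 : roundup(total)
        printf "%.1f\n", score
    }'
}

# Example: cvss31_base 'AV:A/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H'   # prints 8.8
```

Running the four vectors listed above through this function is a cheap way to catch a mistyped metric before the score goes into a report.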

Toolchain Summary

Tool	Installation	Best for
`grep`	Built-in	Fast keyword/pattern search
`strings`	`binutils`	Extracting text from binaries
`binwalk`	`apt install binwalk` or pip	Entropy analysis, extraction
`trufflehog`	`brew install trufflehog` or release binary	Broad secret pattern coverage
`semgrep`	`pip install semgrep`	PHP/JS/shell code patterns
`hashcat`	`apt install hashcat`	Offline password hash cracking
`openssl`	`apt install openssl`	Certificate/key parsing and verification
`sqlite3`	`apt install sqlite3`	Database dumping
`john`	`apt install john`	Alternative hash cracker

Install all of these on your analysis machine. A full firmware audit uses every one of them.
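A short preflight check can confirm the toolchain before an audit starts. A minimal sketch; the function name missing_tools is hypothetical, and the tool names are the commands listed in the table.

```shell
# missing_tools TOOL...
# Prints each named tool that is not found on PATH; prints nothing when all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# Example, using the tools from the table:
# missing_tools grep strings binwalk trufflehog semgrep hashcat openssl sqlite3 john
```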