Lesson 3 — Binary Secrets

Why Secrets End Up in Compiled Binaries

The filesystem grep in Lesson 2 covers the easy cases. The harder and more common case is secrets embedded directly in compiled code.

Here is the C code that creates the problem:

/* cloud_agent.c */
#define API_ENDPOINT   "https://api.vendor.com/v2"
#define API_KEY        "sk-live-a3f9bc7d2e1847930fc5"
#define DEVICE_SECRET  "8f2a1c6b9d4e3f7a0b5c"

int cloud_connect(void) {
    return http_post(API_ENDPOINT, API_KEY, DEVICE_SECRET);
}

After compilation and stripping:
- The names API_ENDPOINT, API_KEY, and DEVICE_SECRET are gone. As macros they never reach the symbol table at all, and stripping removes the remaining symbol names as well.
- The string values "https://api.vendor.com/v2", "sk-live-a3f9bc7d2e1847930fc5", and "8f2a1c6b9d4e3f7a0b5c" are still there, verbatim, in the .rodata section of the binary.
- strings finds them. No disassembly required.

This is the key insight: stripping a binary removes symbols (function names, variable names, debug info). It does not and cannot remove the string data the code uses at runtime.

strings — The Essential Tool

strings extracts printable character sequences from any file. It does not parse the ELF structure — it scans the raw bytes and outputs sequences of printable ASCII that meet a minimum length threshold.
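To make the scanning behavior concrete, the sketch below approximates it with tr and grep alone. It assumes plain 7-bit ASCII and is meant only to illustrate the mechanism, not to replace the real tool. The function name scan_printable is hypothetical.

```shell
# scan_printable FILE [MINLEN]
# Rough approximation of `strings -n MINLEN FILE` (hypothetical helper, ASCII only):
# every byte that is not printable ASCII or a tab is turned into a newline,
# then only the runs of at least MINLEN characters are kept.
scan_printable() {
    file=$1
    minlen=${2:-4}
    LC_ALL=C tr -c '\040-\176\t' '\n' < "$file" | LC_ALL=C grep -E ".{$minlen,}"
}

# Usage (path taken from the examples in this lesson):
#   scan_printable usr/bin/httpd 10
```

Because nothing here looks at ELF headers or sections, the same scan works on raw flash dumps and other non-ELF blobs, which is also true of strings itself.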

Basic Usage

# Default: print all strings of 4+ printable chars
strings usr/bin/httpd

# Increase minimum length to reduce noise (10 is a practical starting point)
strings -n 10 usr/bin/httpd

# Show the file offset of each string (hex format with -t x)
strings -t x usr/bin/httpd

# Decimal offset
strings -t d usr/bin/httpd

The -t x flag is useful when you find an interesting string and want to locate it in the binary with a hex editor for context. The surrounding bytes may reveal more about how the string is used.

Filtering the Output

Raw strings output on a 500KB binary is thousands of lines. Filter it immediately.

# Keyword filter — broad
strings -n 10 usr/bin/httpd | grep -iE "key|pass|token|secret|auth|api"

# URL patterns — often contain embedded credentials
strings usr/bin/httpd | grep -E "https?://"

# URLs with embedded credentials (user:pass@host)
strings usr/bin/httpd | grep -E "https?://[^@]+@"

# Email addresses (sometimes used as account identifiers with passwords nearby)
strings usr/bin/httpd | grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

# Long alphanumeric strings in quotes (common API key appearance in format strings)
strings usr/bin/httpd | grep -E '"[A-Za-z0-9_-]{20,}"'
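The user:pass@host filter above also fires on URLs where an @ merely appears later in the path or query string. A somewhat stricter sketch follows, assuming that user names and passwords never contain /, @, or whitespace. The helper name extract_url_creds and its output format are hypothetical.

```shell
# extract_url_creds: reads candidate strings on stdin and prints
# "user=... pass=... host=..." for each URL that carries userinfo before the host.
# Hypothetical helper; assumes credentials contain no '/', '@', or whitespace.
extract_url_creds() {
    grep -oE 'https?://[^/@:[:space:]]+:[^/@[:space:]]+@[^/[:space:]]+' |
        sed -E 's#^https?://([^:]+):([^@]+)@(.*)$#user=\1 pass=\2 host=\3#'
}

# Usage:
#   strings usr/bin/httpd | extract_url_creds | sort -u
```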

Handling Multiple Binaries

Do not analyze binaries one by one when you have dozens. Run strings across the entire filesystem at once.

# All binaries in standard locations, minimum 12 chars, filter for secrets
find usr/bin usr/sbin bin sbin lib/ -type f -print0 | \
  xargs -0 strings -n 12 2>/dev/null | \
  grep -iE "password|api_key|token|secret|private_key" | \
  sort -u

# Show which binary contains each match by prefixing every line with its filename
find usr/bin usr/sbin -type f | \
  while IFS= read -r f; do
    strings -n 12 "$f" | grep -iE "api.key|token|secret" | \
    sed "s|^|[$f] |"
  done

The sed trick prepends the filename to each output line, giving you provenance for every match. GNU strings can also do this natively with -f (--print-file-name).

Architecture-Aware String Extraction

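The default strings mode (-e s) scans for single-byte, 7-bit characters, that is, plain ASCII. Some firmware, notably Windows CE images and some RTOS environments on ARM, stores text as UTF-16 (wide strings). Every ASCII character is then followed by a zero byte, so standard strings silently skips these.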

# Little-endian 16-bit (UTF-16LE) — common on ARM Windows CE
strings -e l usr/bin/httpd

# Big-endian 16-bit (UTF-16BE)
strings -e b usr/bin/httpd

# 32-bit little-endian (rare but exists on some RTOS)
strings -e L usr/bin/httpd

If you run standard strings and get suspicious gaps — short strings near long stretches of non-printable bytes — switch to -e l and check if readable strings appear.
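What a 16-bit little-endian scan looks for can be sketched with od and awk: a printable byte followed by 0x00, repeated at least a minimum number of times. This is a simplified illustration that assumes ASCII-range text only. The helper name scan_utf16le is hypothetical, and the real -e l mode may differ in edge cases.

```shell
# scan_utf16le FILE [MINLEN]
# Hypothetical helper: prints runs of MINLEN or more printable ASCII characters
# stored as 16-bit little-endian units (printable byte followed by 0x00).
scan_utf16le() {
    od -An -v -tu1 "$1" | tr -s ' ' '\n' | awk -v min="${2:-4}" '
        function emit() {
            if (length(run) >= min) print run
            run = ""
        }
        NF == 0 { next }                   # skip blank separator lines
        {
            b = $1 + 0
            if (have_low) {                # a 0x00 high byte is expected here
                have_low = 0
                if (b == 0) { run = run low; next }
                emit()                     # pair broken: close the current run
            }
            if (b >= 32 && b <= 126) {     # candidate low byte of a wide character
                low = sprintf("%c", b)
                have_low = 1
            } else {
                emit()
            }
        }
        END { emit() }
    '
}

# Usage:
#   scan_utf16le usr/bin/httpd 8
```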

What to Look for in strings Output

Work through this checklist when reviewing strings output:

Credential patterns:
- password=, passwd=, pwd= followed by a value
- HTTP Basic Auth format: Authorization: Basic [base64]
- URLs with credentials: http://admin:password@192.168.1.1

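API key patterns:
- AWS access key ID: AKIA[A-Z0-9]{16}, always exactly 20 chars starting with AKIA
- Stripe live key: sk_live_[A-Za-z0-9]{24,} (24 characters after the prefix in older keys; newer keys can be considerably longer)
- Twilio API key SID: SK[a-f0-9]{32}
- Generic: any 32–64 character alphanumeric string in isolation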
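The vendor-specific patterns in this checklist can be applied in one pass over strings output. The sketch below is one way to do that. The function name classify_keys and the labels are hypothetical, and the Stripe pattern is loosened to 24 or more characters on the assumption that key lengths vary. Expect false positives, since these are shape checks only.

```shell
# classify_keys: reads candidate strings on stdin (for example `strings -n 16` output)
# and prints "LABEL MATCH" for every hit on a vendor-specific key pattern.
# Hypothetical helper; labels are illustrative and matches are unverified candidates.
classify_keys() {
    input=$(cat)
    printf '%s\n' "$input" | grep -oE 'AKIA[A-Z0-9]{16}' | sed 's/^/aws_access_key_id /'
    printf '%s\n' "$input" | grep -oE 'sk_live_[A-Za-z0-9]{24,}' | sed 's/^/stripe_live_key /'
    printf '%s\n' "$input" | grep -oE 'SK[a-f0-9]{32}' | sed 's/^/twilio_api_key_sid /'
    return 0
}

# Usage:
#   strings -n 16 usr/bin/cloud_agent | classify_keys | sort -u
```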

Base64-encoded secrets:

A base64-encoded secret does not look like a password — it looks like noise. But it has a recognizable character set and length characteristics.

# Find base64-like strings (20+ chars of valid base64 alphabet, optional padding)
strings -n 20 binary | grep -E '^[A-Za-z0-9+/]{20,}={0,2}$' | while IFS= read -r b64; do
    # Decode and keep only printable runs; raw binary cannot be held safely in a shell variable
    decoded=$(printf '%s' "$b64" | base64 -d 2>/dev/null | strings -n 4 | head -3)
    # Only print if the decoded output contains printable content
    if [ -n "$decoded" ]; then
        echo "B64: $b64"
        printf '%s\n' "$decoded" | sed 's/^/    -> /'
    fi
done

This loops over every candidate base64 string and decodes it, printing only those that decode to something with printable content.

Private key material fragments:

Even if a private key is not in PEM format, its numeric components are large integers. They show up as long runs of hex digits when stored as text, or as recognizable ASN.1 DER structures when stored in binary form.

# PEM header lines
strings binary | grep -E "BEGIN .* (KEY|CERTIFICATE)"

# DER-encoded key material often starts with these byte sequences (shown as hex)
# 30 82 = ASN.1 SEQUENCE, length encoded in 2 bytes (long form)
# Look for surrounding context when you find this with a hex editor
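Listing every offset where the 30 82 pair occurs gives a set of starting points for the hex editor. The sketch below does this with od and awk. The helper name find_der_candidates is hypothetical, and because two bytes match by coincidence quite often, each hit is only a candidate to inspect.

```shell
# find_der_candidates FILE
# Hypothetical helper: prints the hex offset of every 0x30 0x82 byte pair in FILE
# (ASN.1 SEQUENCE tag followed by a two-byte long-form length).
find_der_candidates() {
    od -An -v -tx1 "$1" | tr -s ' ' '\n' | awk '
        NF == 0 { next }                   # skip blank separator lines
        {
            # n is the zero-based index of the current byte
            if (prev == "30" && $1 == "82") printf "0x%x\n", n - 1
            prev = $1
            n++
        }
    '
}

# Usage:
#   find_der_candidates usr/bin/cloud_agent | head
```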

MQTT and protocol credentials:

strings usr/bin/cloud_agent | grep -iE "mqtt|broker|subscribe|publish" | head -20
# Then dump the strings stored right after each match to look for associated credentials
strings -t x usr/bin/cloud_agent | grep -iE "mqtt|broker" | \
  awk '{print $1}' | while read -r offset; do
    echo "== strings following offset 0x$offset =="
    # $((0x...)) converts the hex offset to the decimal value dd expects
    dd if=usr/bin/cloud_agent bs=1 skip=$((0x$offset)) count=256 2>/dev/null | strings -n 6
  done

Shared Libraries: The Overlooked Target

Most researchers run strings on executables and stop. Shared libraries are just as important.

# Find all shared libraries
find lib/ usr/lib/ -name "*.so*" -type f

# Run strings on all of them
find lib/ usr/lib/ -name "*.so*" -type f | \
  xargs strings -n 12 2>/dev/null | \
  grep -iE "key|token|secret|password|auth" | sort -u

# Libraries implementing cloud connectivity are especially valuable
find . -name "*.so*" -type f -exec grep -liE "api|cloud|aws|iot" {} + 2>/dev/null

A library implementing device-to-cloud authentication frequently contains the API endpoint URL and a hardcoded device token. These tokens are often device-class secrets (shared across all devices of that model), not per-device secrets. Finding one means finding the credential for every device of that model globally.

The Offset Workflow: From String to Context

When you find an interesting string, knowing its offset lets you examine the surrounding bytes for context.

# Find the offset of a specific string
strings -t x usr/bin/httpd | grep "interesting_string"
# Output: 1a3c0 interesting_string

# Use xxd to dump 128 bytes starting 32 bytes before that offset
xxd -s $((0x1a3c0 - 32)) -l 128 usr/bin/httpd

# Or use dd to extract a block of bytes around the offset
dd if=usr/bin/httpd bs=1 skip=$((0x1a3c0 - 32)) count=128 2>/dev/null | xxd

The 32 bytes before a string often contain: the format string that precedes it, the variable name in a debug print statement, or a length field. All of these give you context about what the string is used for.
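On analysis hosts without xxd, od can produce the same kind of context dump. The sketch below wraps that in a small function. The name dump_context and its argument order are hypothetical, and the z suffix on the -t option (which appends the printable characters to each line) assumes GNU od.

```shell
# dump_context FILE HEX_OFFSET [BEFORE] [LENGTH]
# Hypothetical helper: hex + ASCII dump of LENGTH bytes, starting BEFORE bytes
# ahead of HEX_OFFSET (the offset as printed by `strings -t x`, without 0x).
dump_context() {
    file=$1
    offset=$((0x$2))
    before=${3:-32}
    length=${4:-128}
    start=$((offset - before))
    # Clamp to the start of the file when the string sits near the beginning
    if [ "$start" -lt 0 ]; then
        start=0
    fi
    od -A x -t x1z -v -j "$start" -N "$length" "$file"
}

# Usage:
#   dump_context usr/bin/httpd 1a3c0
```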

Dynamic Analysis: strace and ltrace (When Emulation is Available)

If you can emulate the binary with QEMU (full or user-mode), you get a second weapon: runtime tracing.

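# Trace system calls with QEMU user-mode's built-in -strace option (the log goes to stderr),
# then filter for file opens and network activity
qemu-arm -L squashfs-root/ -strace squashfs-root/usr/bin/cloud_agent 2>&1 | \
  grep -E "open|connect|send|recv"

# Trace library calls: ltrace relies on ptrace, which QEMU user-mode generally does not support,
# so run it inside a full-system emulation (or on the device) with an ltrace built for the target
ltrace -s 128 /usr/bin/cloud_agent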

ltrace intercepts calls to shared libraries. If the binary calls SSL_CTX_use_certificate_file or curl_easy_setopt(handle, CURLOPT_PASSWORD, ...), you see the arguments in the trace output, including file paths and the actual credential values, even if the binary would have been difficult to analyze statically.

Dynamic analysis is more complex to set up (QEMU user-mode emulation has compatibility issues), but it is the definitive method when static string extraction leaves ambiguity about what a binary actually uses at runtime.