Lesson 3 — Binary Secrets
Why Secrets End Up in Compiled Binaries
The filesystem grep in Lesson 2 covers the easy cases. The harder and more common case is secrets embedded directly in compiled code.
Here is the C code that creates the problem:
/* cloud_agent.c */
#define API_ENDPOINT "https://api.vendor.com/v2"
#define API_KEY "sk-live-a3f9bc7d2e1847930fc5"
#define DEVICE_SECRET "8f2a1c6b9d4e3f7a0b5c"
int cloud_connect(void) {
return http_post(API_ENDPOINT, API_KEY, DEVICE_SECRET);
}
After compilation and stripping:
- API_ENDPOINT, API_KEY, and DEVICE_SECRET no longer exist as symbols. They are gone.
- The string values "https://api.vendor.com/v2", "sk-live-a3f9bc7d2e1847930fc5", and "8f2a1c6b9d4e3f7a0b5c" are still there, verbatim, in the .rodata section of the binary.
- strings finds them. No disassembly required.
This is the key insight: stripping a binary removes symbols (function names, variable names, debug info). It does not and cannot remove the string data the code uses at runtime.
strings — The Essential Tool
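strings extracts printable character sequences from any file. Recent GNU versions do not rely on the ELF structure: by default they scan the raw bytes of the whole file and output every sequence of printable ASCII that meets a minimum length threshold. Older builds may scan only the loaded data sections of object files unless you pass -a.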
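The scan described above can be approximated with standard tools, which is handy on an analysis box where binutils is not installed. This is a minimal sketch rather than a reimplementation: printable_runs is a hypothetical helper name, and unlike strings it treats tabs as separators.

```shell
# Rough stand-in for `strings -n N`: split the input on non-printable bytes,
# then keep only runs of at least N characters.
# printable_runs is a hypothetical name, not part of any toolkit.
printable_runs() {
  min_len=${1:-4}
  # LC_ALL=C makes tr and awk operate on bytes rather than multibyte characters.
  LC_ALL=C tr -c '[:print:]' '\n' |
    LC_ALL=C awk -v min="$min_len" 'length($0) >= min'
}

# Usage (the binary is read from stdin):
#   printable_runs 10 < usr/bin/httpd
```

Because it reads stdin, the helper can sit in the same pipelines as strings in the rest of this lesson.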
Basic Usage
# Default: print all strings of 4+ printable chars
strings usr/bin/httpd
# Increase minimum length to reduce noise (10 is a practical starting point)
strings -n 10 usr/bin/httpd
# Show the file offset of each string (hex format with -t x)
strings -t x usr/bin/httpd
# Decimal offset
strings -t d usr/bin/httpd
The -t x flag is useful when you find an interesting string and want to locate it in the binary with a hex editor for context. The surrounding bytes may reveal more about how the string is used.
Filtering the Output
Raw strings output on a 500KB binary is thousands of lines. Filter it immediately.
# Keyword filter — broad
strings -n 10 usr/bin/httpd | grep -iE "key|pass|token|secret|auth|api"
# URL patterns — often contain embedded credentials
strings usr/bin/httpd | grep -E "https?://"
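# URLs with embedded credentials (user:pass@host); the classes stop at "/" and whitespace
# so an "@" later in the path or elsewhere on the line does not count
strings usr/bin/httpd | grep -E "https?://[^/@[:space:]]+:[^/@[:space:]]+@"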
# Email addresses (sometimes used as account identifiers with passwords nearby)
strings usr/bin/httpd | grep -E "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
# Long alphanumeric strings in quotes (how API keys often appear inside format strings)
strings usr/bin/httpd | grep -E '"[A-Za-z0-9_-]{20,}"'
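Once a URL with embedded credentials turns up, it helps to split it into its parts for the report. The sketch below is one way to do that with sed. The function name extract_url_creds and the output format are assumptions made for illustration.

```shell
# Split user:pass@host URLs (read from stdin) into labelled fields.
# extract_url_creds is a hypothetical helper; lines without credentials are dropped.
extract_url_creds() {
  sed -nE 's|.*https?://([^/:@[:space:]]+):([^/@[:space:]]+)@([^/:[:space:]]+).*|user=\1 pass=\2 host=\3|p'
}

# Usage:
#   strings usr/bin/httpd | extract_url_creds | sort -u
```

Only the first credentialed URL on each line is reported, which is usually enough for strings output where each line holds one string.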
Handling Multiple Binaries
Do not analyze binaries one by one when you have dozens. Run strings across the entire filesystem at once.
# All binaries in standard locations, minimum 12 chars, filter for secrets
# (-print0 / -0 keep filenames with spaces intact)
find usr/bin usr/sbin bin sbin lib/ -type f -print0 | \
xargs -0 strings -n 12 2>/dev/null | \
grep -iE "password|api_key|token|secret|private_key" | \
sort -u
# Show which binary contains each match (prefix every line with its source file)
find usr/bin usr/sbin -type f | \
while IFS= read -r f; do
  strings -n 12 "$f" | grep -iE "api.key|token|secret" | \
  sed "s|^|[$f] |"
done
The sed trick prepends the filename to each output line, giving you provenance for every match. GNU strings can also do this natively with -f (--print-file-name).
Architecture-Aware String Extraction
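The default strings mode (-e s) handles single-byte 7-bit characters, i.e. ASCII; -e S extends this to 8-bit bytes such as Latin-1. ARM binaries on Windows CE or some RTOS environments use UTF-16 (wide strings). Standard strings silently skips these, because every ASCII character is followed by a zero byte and no run ever reaches the minimum length.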
# Little-endian 16-bit (UTF-16LE) — common on ARM Windows CE
strings -e l usr/bin/httpd
# Big-endian 16-bit (UTF-16BE)
strings -e b usr/bin/httpd
# 32-bit little-endian (rare but exists on some RTOS)
strings -e L usr/bin/httpd
If you run standard strings and get suspicious gaps — short strings near long stretches of non-printable bytes — switch to -e l and check if readable strings appear.
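To make the wide-string layout concrete, here is a rough sketch of 16-bit little-endian extraction using od and awk. It only recognizes printable ASCII characters followed by a zero byte, which is a simplification of what -e l does, and wide_runs is a hypothetical helper name.

```shell
# Print runs of "printable ASCII byte + 0x00" pairs (ASCII text stored as UTF-16LE).
# wide_runs is a hypothetical helper; it reads the binary on stdin.
wide_runs() {
  min_len=${1:-4}
  # od emits one hex byte per token; tr puts each token on its own line.
  od -An -v -t x1 | tr -s ' ' '\n' | awk -v min="$min_len" '
    BEGIN { for (i = 32; i < 127; i++) chr[sprintf("%02x", i)] = sprintf("%c", i) }
    function emit_run() { if (length(run) >= min) print run; run = ""; pending = "" }
    $0 == "" { next }
    pending != "" {
      # A printable byte is only accepted once its zero high byte follows.
      if ($0 == "00") { run = run pending; pending = ""; next }
      emit_run()
    }
    ($0 in chr) { pending = chr[$0]; next }
    { emit_run() }
    END { emit_run() }
  '
}

# Usage:
#   wide_runs 6 < usr/bin/httpd
```

Plain ASCII input produces no output here, since no printable byte is followed by a zero byte. That is the mirror image of the default mode missing wide strings.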
What to Look for in strings Output
Work through this checklist when reviewing strings output:
Credential patterns:
- password=, passwd=, pwd= followed by a value
- HTTP Basic Auth format: Authorization: Basic [base64]
- URLs with credentials: http://admin:password@192.168.1.1
API key patterns:
- AWS: AKIA[A-Z0-9]{16} — always exactly 20 chars starting with AKIA
- Stripe live key: sk_live_[A-Za-z0-9]{24,} (at least 24 characters after the prefix; newer keys are longer)
- Twilio: SK[a-f0-9]{32}
- Generic: any 32–64 character alphanumeric string in isolation
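The vendor-specific patterns above can be combined into a single labelled pass over strings output. The sketch below is illustrative only: scan_key_patterns and the label names are hypothetical, and the Stripe pattern is loosened to {24,} on the assumption that key lengths vary.

```shell
# Label vendor-style key candidates found on stdin.
# scan_key_patterns and the label names are hypothetical.
scan_key_patterns() {
  grep -oE 'AKIA[A-Z0-9]{16}|sk_live_[A-Za-z0-9]{24,}|SK[a-f0-9]{32}' |
  while IFS= read -r hit; do
    case $hit in
      AKIA*)     echo "aws_access_key_id $hit" ;;
      sk_live_*) echo "stripe_live_key $hit" ;;
      SK*)       echo "twilio_api_key_sid $hit" ;;
    esac
  done
}

# Usage:
#   strings -n 16 usr/bin/cloud_agent | scan_key_patterns | sort -u
```

Matches are candidates, not confirmed keys. A pattern hit inside a longer alphanumeric run is still reported, so review each result in context.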
Base64-encoded secrets:
A base64-encoded secret does not look like a password — it looks like noise. But it has a recognizable character set and length characteristics.
# Find base64-like strings (20+ chars of valid base64 alphabet, optional padding)
strings binary | grep -E "^[A-Za-z0-9+/]{20,}={0,2}$" | while IFS= read -r b64; do
  # Pipe the decoded bytes straight into strings so binary data never lands in a shell variable
  decoded=$(printf '%s' "$b64" | base64 -d 2>/dev/null | strings -n 4 | head -3)
  # Only print if the decoded output contains printable content
  if [ -n "$decoded" ]; then
    echo "B64: $b64"
    echo " -> $decoded"
  fi
done
This loops over every candidate base64 string and decodes it, printing only those that decode to something with printable content.
Private key material fragments:
Even if a private key is not in PEM format, it can still leave traces. Keys embedded as hex or decimal text show up as long strings of digits, while DER-encoded keys are binary and are recognized by specific ASN.1 byte patterns rather than by strings.
# PEM header lines (the optional group lets "BEGIN CERTIFICATE" match as well as "BEGIN RSA PRIVATE KEY")
strings binary | grep -E "BEGIN ([A-Z0-9 ]+ )?(KEY|CERTIFICATE)"
# DER-encoded key material often starts with these byte sequences (shown as hex)
# 30 82 = ASN.1 SEQUENCE, length encoded in 2 bytes (long form)
# Look for surrounding context when you find this with a hex editor
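The 30 82 marker can be located without a hex editor. The following sketch lists each offset where the two bytes occur, together with the length declared in the next two bytes. find_der_sequences is a hypothetical helper. It loads every byte into an awk array, so it suits firmware-sized binaries rather than very large images.

```shell
# List "offset declared_length" for every 30 82 LL LL sequence in a file.
# find_der_sequences is a hypothetical helper; offsets and lengths are decimal.
find_der_sequences() {
  od -An -v -t u1 "$1" | tr -s ' ' '\n' | awk '
    $0 == "" { next }
    { byte[n++] = $0 }
    END {
      for (i = 0; i + 3 < n; i++)
        # 48 = 0x30 (SEQUENCE), 130 = 0x82 (two length bytes follow)
        if (byte[i] == 48 && byte[i + 1] == 130)
          printf "%d %d\n", i, byte[i + 2] * 256 + byte[i + 3]
    }
  '
}

# Usage:
#   find_der_sequences usr/bin/cloud_agent
```

Expect false positives, since 30 82 also occurs in ordinary code and data. A declared length that fits inside the file and sits near TLS or crypto strings is the more promising signal.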
MQTT and protocol credentials:
strings usr/bin/cloud_agent | grep -iE "mqtt|broker|subscribe|publish" | head -20
# Then dump the bytes around each match to look for associated credentials
strings -t x usr/bin/cloud_agent | grep -iE "mqtt|broker" | \
awk '{print $1}' | while read -r offset; do
  # Start 64 bytes before the match, clamped to the start of the file
  start=$(( 0x$offset > 64 ? 0x$offset - 64 : 0 ))
  echo "Offset 0x$offset"
  xxd -s "$start" -l 192 usr/bin/cloud_agent
done
Shared Libraries: The Overlooked Target
Most researchers run strings on executables and stop. Shared libraries are just as important.
# Find all shared libraries
find lib/ usr/lib/ -name "*.so*" -type f
# Run strings on all of them
find lib/ usr/lib/ -name "*.so*" -type f | \
xargs strings -n 12 2>/dev/null | \
grep -iE "key|token|secret|password|auth" | sort -u
# Libraries implementing cloud connectivity are especially valuable
find . -name "*.so*" -type f -print0 | xargs -0 grep -ilE "api|cloud|aws|iot" 2>/dev/null
A library implementing device-to-cloud authentication frequently contains the API endpoint URL and a hardcoded device token. These tokens are often device-class secrets (shared across all devices of that model), not per-device secrets. Finding one means finding the credential for every device of that model globally.
The Offset Workflow: From String to Context
When you find an interesting string, knowing its offset lets you examine the surrounding bytes for context.
# Find the offset of a specific string
strings -t x usr/bin/httpd | grep "interesting_string"
# Output: 1a3c0 interesting_string
# Use xxd to examine bytes around that offset (start 32 bytes earlier, show 128 bytes)
xxd -s $((0x1a3c0 - 32)) -l 128 usr/bin/httpd
# Or use dd to extract a block of bytes around the offset
dd if=usr/bin/httpd bs=1 skip=$((0x1a3c0 - 32)) count=128 2>/dev/null | xxd
The 32 bytes before a string often contain: the format string that precedes it, the variable name in a debug print statement, or a length field. All of these give you context about what the string is used for.
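The offset arithmetic is easy to get wrong near the start of a file, so it can be worth wrapping in a helper. This is a sketch under stated assumptions: extract_context is a hypothetical name, and the defaults of 32 bytes before and 128 bytes total simply mirror the numbers used above.

```shell
# Print `length` raw bytes starting `before` bytes ahead of a hex offset,
# clamped so the window never starts before byte 0.
# extract_context is a hypothetical helper name.
extract_context() {
  file=$1
  hex_offset=${2#0x}   # accept both "1a3c0" and "0x1a3c0"
  before=${3:-32}
  length=${4:-128}
  offset=$((0x$hex_offset))
  if [ "$offset" -gt "$before" ]; then
    start=$((offset - before))
  else
    start=0
  fi
  dd if="$file" bs=1 skip="$start" count="$length" 2>/dev/null
}

# Usage: pipe the raw window into a hex viewer
#   extract_context usr/bin/httpd 1a3c0 | xxd
```

Emitting raw bytes rather than a formatted dump keeps the helper composable: the same window can go to xxd, to strings, or into a file for later comparison.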
Dynamic Analysis: strace and ltrace (When Emulation is Available)
If you can emulate the binary with QEMU (full or user-mode), you get a second weapon: runtime tracing.
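# Trace system calls with QEMU's built-in syscall logger (-strace writes to stderr);
# look for network connections and file reads with credentials
qemu-arm -strace -L squashfs-root/ squashfs-root/usr/bin/cloud_agent 2>&1 | \
grep -E "open|connect|send|recv"
# Trace library calls: catches string operations and network setup.
# ltrace relies on ptrace, which QEMU user-mode generally does not emulate,
# so run it inside a full-system guest that has an ltrace build for the target
ltrace -f /usr/bin/cloud_agent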
ltrace intercepts calls to shared libraries. If the binary calls SSL_CTX_use_certificate_file or curl_easy_setopt with CURLOPT_PASSWORD, you see the arguments in the trace output, even if the binary would have been difficult to analyze statically. String arguments, including the actual credential values, are printed as text when ltrace knows the function's prototype; otherwise they appear as raw pointer values.
Dynamic analysis is more complex to set up (QEMU user-mode emulation has compatibility issues), but it is the definitive method when static string extraction leaves ambiguity about what a binary actually uses at runtime.