Introduction
Passive OSINT is the quiet stage of recon.
You collect what is already public without touching the target directly.
This is the highest-value, lowest-risk work you will do.
If you do passive recon well, you get a clean map of the attack surface.
You also reduce noise when you move to active checks later.
This Part explains exactly what to collect, how to collect it with quick commands, and how to turn the results into prioritised work.
Why passive OSINT matters
- It reveals assets that people forget exist: subdomains, test environments, staging URLs.
- It can surface sensitive information without sending a single packet to the target.
- It helps you prioritize what to scan actively later.
- It gives defensible evidence for bug submissions or client reports.
Passive recon is like listening at the door. You hear valuable things without knocking.
What to check – short checklist
Domains and subdomains – cert logs, public lists.
Certificate transparency entries – historical hostnames.
Public repos – leaked configs, endpoints, tokens.
Archived URLs – Wayback and common-crawl data.
Search engine footprints – Google dorks, Bing operators.
Paste sites and leak feeds – Pastebin, GitHub Gists, public dumps.
Shodan/Censys results – exposed services listed publicly.
Cloud buckets and public storage – S3, GCS, Azure blobs.
Whois and registration history – ownership and name servers.
Employee mentions and social references – LinkedIn, Twitter.
Third-party references – CDN hosts, analytics, SDKs.
Collect everything, then dedupe and enrich. That single list becomes your inventory.
Tools – practical, not academic
- crt.sh (web + curl) – certificate transparency lookup.
- amass (passive mode) – large passive collectors.
- subfinder (passive) – many data sources.
- gau / waybackurls / waybackpack – archive URL collectors.
- GitHub search / GH API / truffleHog / gitrob – repo scraping.
- Censys / Shodan / BinaryEdge – public scans and banners (read-only).
- SecurityTrails / Spyse – enrichment APIs (use with API keys).
- Common Crawl – archive scraping.
- pastebin scrapers / search engines – paste detection.
- jq / sort / uniq / massdns – merge and resolve pipeline.
- simple Python scripts – for bespoke parsing and enrichment.
Keep the list short and practical. Add paid services only if you will use them often.
Step-by-step passive workflow
1. Certificate transparency lookup (crt.sh)
This often gives subdomains you will not find elsewhere.
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u > crt_subs.txt
Explanation: fetches CT entries for example.com, removes wildcard prefixes, and saves unique hostnames.
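crt.sh can be slow or time out when its logs are under load. A minimal retry wrapper around the same query, assuming three attempts and a 60-second timeout are acceptable (both numbers are arbitrary choices, not requirements):
for attempt in 1 2 3; do
  # -f makes curl fail on HTTP errors so a bad response triggers a retry
  curl -sf --max-time 60 "https://crt.sh/?q=%25.example.com&output=json" -o crt_raw.json && break
  sleep 10
done
# same extraction as above: pull hostnames, strip wildcard prefixes, dedupe
jq -r '.[].name_value' crt_raw.json | sed 's/\*\.//g' | sort -u > crt_subs.txt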
2. Passive collectors: amass and subfinder (passive only)
amass enum -passive -d example.com -o amass_passive.txt
subfinder -silent -d example.com -o subfinder_passive.txt
Explanation: amass pulls from many public sources; subfinder aggregates multiple passive feeds. The two complement each other.
3. Archive and historical URL collection
gau example.com > gau_urls.txt
waybackurls example.com >> gau_urls.txt
waybackpack https://example.com -d wayback_archive/
Explanation: gau and waybackurls collect old endpoints and pages. These often contain forgotten admin paths and backups.
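To pull those forgotten paths out quickly, a plain grep over the collected URLs is enough as a first pass; the extension list below is only a starting point and interesting_urls.txt is just a working file name:
# dedupe the raw archive output first
sort -u gau_urls.txt > all_urls.txt
# flag likely backups, configs and admin paths (extend the pattern for your target)
grep -Ei '\.(bak|old|zip|tar\.gz|sql|env|conf|config)(\?|$)|/admin|/backup' all_urls.txt > interesting_urls.txt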
4. GitHub and public repo scraping
Option A – search-engine dork against GitHub (quick):
site:github.com "example.com" "KEY" OR "password"
Run it in a regular search engine; GitHub's own code search works too (drop the site: operator there).
Option B – GH API / CLI (if you have a token)
gh api -X GET -H "Accept: application/vnd.github.v3.text-match+json" /search/code -f q='example.com in:file' -f per_page=100 | jq '.'
Explanation: search code for references to the domain. Look for config files, URLs, tokens.
Tool note: run truffleHog, gitrob or custom regex scanners for deeper sweeps.
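If you want a dependency-free first sweep before reaching for those tools, a crude regex pass over a cloned repo already catches the obvious shapes. The repo URL below is a placeholder and the patterns are only examples – AWS access key IDs plus generic key/password assignments:
# shallow-clone a candidate repo found via search (read-only, placeholder URL)
git clone --depth 1 https://github.com/example-org/example-repo.git repo_copy
# flag common secret shapes; every hit still needs manual validation
grep -rniE 'AKIA[0-9A-Z]{16}|api[_-]?key[[:space:]]*[:=]|password[[:space:]]*[:=]' repo_copy/ > repo_hits.txt
Treat repo_hits.txt as leads, not findings; validate before calling anything sensitive.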
5. Paste sites and leak feeds
- Check Pastebin, Ghostbin, and public dump monitors.
- Search with queries like site:pastebin.com "example.com" or use leak monitoring services.
Quick example (Censys Search API – needs an API ID and secret):
curl -s -u "$CENSYS_API_ID:$CENSYS_API_SECRET" "https://search.censys.io/api/v2/hosts/search?q=example.com" | jq .
Use public feeds but be cautious: many dumps contain sensitive personal data. Do not exfiltrate or store PII unnecessarily.
6. Whois and registration history
whois example.com
Look for: registrar, nameservers, created/updated dates, contact emails.
Use services like SecurityTrails for historical WHOIS and DNS changes.
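To pull just the fields listed above out of the raw whois output, a small grep works; field labels differ slightly between registrars and TLDs, so adjust the pattern to match what your whois server returns:
# keep only the ownership and infrastructure lines we care about
whois example.com | grep -Ei 'registrar:|name server:|creation date:|updated date:|registrant' | sort -u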
7. Passive service and banner discovery (read-only)
Search Shodan or Censys for hostname:example.com or ASN ranges.
Shodan example (web): search ssl.cert.subject.cn:"example.com" or hostname:example.com
Note: Do not run active queries that you are not allowed to. Use public read-only data.
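If you hold a Shodan API key, the official CLI returns the same read-only data from the terminal. A minimal sketch using the hostname filter above (the field selection is a convenience, not a requirement):
# one-time setup with your API key
shodan init YOUR_API_KEY
# read-only search; print only the columns that matter for triage
shodan search --fields ip_str,port,hostnames "hostname:example.com"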
8. Social, employee, and business footprint
- Check LinkedIn for employees mentioning internal tools or custom domains.
- Look for job posts that reveal tech stack, ports, or internal services.
- Tweets and forum posts sometimes leak staging URLs.
Search strings:
site:linkedin.com "example.com"
site:stackoverflow.com "example.com"
Merging and enrichment – the practical pipeline
After you collect outputs from all tools, merge and dedupe.
cat crt_subs.txt amass_passive.txt subfinder_passive.txt | sort -u > merged_passive_subs.txt
Resolve to live hosts:
massdns -r resolvers.txt -t A -o S -w resolved.txt merged_passive_subs.txt
cat resolved.txt | awk '{print $1}' | sed 's/\.$//g' | sort -u > live_passive_subs.txt
Enrich with whois, ASN and CDN provider:
- Use whois or the SecurityTrails API to tag ownership.
- Use curl -I or dig to check CNAMEs and CDN headers.
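A minimal enrichment loop over the live hosts, assuming dig and curl are installed; it records each host's CNAME target and server header so CDN-fronted assets stand out (enriched_hosts.csv is just a working file name):
# tag each live host with its CNAME target and server header
while read -r host; do
  cname=$(dig +short CNAME "$host" | tr '\n' ' ')
  server=$(curl -sI --max-time 10 "https://$host" | grep -i '^server:' | cut -d' ' -f2- | tr -d '\r')
  echo "$host,$cname,$server"
done < live_passive_subs.txt > enriched_hosts.csv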
Prioritisation: how to pick what to test next
Tag each host with small scores; a minimal tagging sketch follows the priority rules below. Example columns:
- Resolved? yes/no
- Web response? 200/301/403/404/other
- Contains JS? yes/no
- Has API endpoints? yes/no
- High-confidence leak? yes/no
Priority rules:
- Hosts with a web response and JS are high priority.
- Hosts referenced in public repos or certs get bumped.
- Hosts under the same CDN or cloud account as main app get higher weight.
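A rough sketch of that tagging pass using only dig and curl; the columns mirror the list above, except the leak column, which stays manual because it cannot be judged automatically (priority.csv is just a working file name):
# build a small scoring CSV: host,resolved,web_status,has_js
echo "host,resolved,web_status,has_js" > priority.csv
while read -r host; do
  resolved=no; status=none; has_js=no
  dig +short "$host" | grep -q . && resolved=yes
  if [ "$resolved" = yes ]; then
    # fetch the page once: capture the status code and keep the body for a JS check
    status=$(curl -s -o /tmp/body.$$ -w '%{http_code}' --max-time 10 "https://$host")
    grep -qi '<script' /tmp/body.$$ && has_js=yes
  fi
  echo "$host,$resolved,$status,$has_js"
done < live_passive_subs.txt >> priority.csv
rm -f /tmp/body.$$
Sort the result by the web_status and has_js columns and work from the top.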
What to do with the passive results
- Feed resolvable hosts into subdomain takeover checks.
- Harvest JS files for endpoint extraction (see the quick grep after this list).
- Push URLs to your URL collection pipeline for parameter extraction.
- Create a “to-fuzz” list for safe fuzzing later.
- Document any leaked secrets and follow responsible disclosure.
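For the JS-harvesting item above, a quick grep over the deduped archive list is enough to seed the next stage (all_urls.txt comes from the archive step earlier; js_urls.txt is just a working file name):
# collect every JavaScript URL for later endpoint extraction
grep -Ei '\.js(\?|$)' all_urls.txt | sort -u > js_urls.txt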
Common pitfalls in passive recon
Too much noise from wildcards: check for wildcard certs and wildcard DNS (a quick check is sketched after this list).
Old archives that are irrelevant: filter Wayback results by HTTP status and content-type.
False positives in GitHub results: validate every credential or endpoint before calling it sensitive.
Collecting PII unnecessarily: avoid storing real personal data unless you must report it securely.
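For the wildcard pitfall, one quick check is to resolve a label that should not exist; any answer usually means wildcard DNS and your passive list needs extra filtering (the random-looking label is only an example):
# a label that should not exist as a real host
random_label="qzx$(date +%s)notreal"
dig +short "$random_label.example.com" | grep -q . && echo "wildcard DNS likely" || echo "no wildcard detected"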
Quick examples and useful commands – summary
Certificate transparency:
curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u > crt_subs.txt
Passive collectors:
amass enum -passive -d example.com -o amass_passive.txt
subfinder -d example.com -silent -o subfinder_passive.txt
Archive URLs:
gau example.com > gau_urls.txt
waybackurls example.com >> gau_urls.txt
sort -u gau_urls.txt > all_urls.txt
Merge and resolve:
cat crt_subs.txt amass_passive.txt subfinder_passive.txt | sort -u > merged.txt
massdns -r resolvers.txt -t A -o S -w resolved.txt merged.txt
Quick GitHub search (web):
site:github.com "example.com"
Mini lab exercise – practice passive recon (30 minutes)
- Pick a domain you own or a lab domain.
- Run certificate lookup and save results:
curl -s "https://crt.sh/?q=%25.yourlabdomain.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u > crt_subs.txt
- Run amass passive and subfinder passive:
amass enum -passive -d yourlabdomain.com -o amass_passive.txt
subfinder -d yourlabdomain.com -silent -o subfinder_passive.txt
- Collect archive URLs:
gau yourlabdomain.com > gau_urls.txt
waybackurls yourlabdomain.com >> gau_urls.txt
- Merge and resolve, then open the top three live hosts in your test browser.
- Note anything interesting in tracker.csv.
This builds the habit of collecting, storing, and enriching passive data.
What comes next (how this feeds other Parts)
- Subdomain list goes to Part 3 – Subdomain enumeration and brute-force.
- Live hosts and JS links go to Part 13 – JavaScript reconnaissance.
- Archived URLs and endpoints go to Part 11 – URL collection and crawling.
- Leaked secrets or repo mentions are triaged and handled under responsible disclosure.
Short checklist – copy this into your notes
- Run crt.sh and save results.
- Run amass passive and subfinder passive.
- Pull Wayback and Common Crawl URLs.
- Search GitHub and paste sites.
- Merge, dedupe and resolve.
- Tag and prioritise results.
- Export top hosts to URL collection and JS harvesting.
Common questions I get
Q: Do I need paid APIs like SecurityTrails?
A: Not always. You can do a lot with free sources. Paid APIs speed up enrichment and reduce manual work.
Q: How often should I run passive checks?
A: For live programmes, daily watches will catch changes. For learning, run once per engagement and set a weekly watch if needed.
Q: I found a secret in GitHub – what next?
A: Do not use it. Verify minimally, then follow responsible disclosure: notify the owner, suggest rotation, and include proof references.
Closing notes – be curious and careful
Passive recon is powerful exactly because it is quiet.
If you move carefully you will find high-value assets without generating noise or overstepping your permissions.
Keep your inventory clean. Save your proofs. Tag everything.
Next post preview
Part 3 – Subdomain enumeration (passive + active merged).
We will cover permutation techniques, brute force with tuned lists, wildcard handling, resolving pipelines and how to avoid common false positives.
Expect full commands, ready lists and a short lab exercise.
Disclaimer
This material is for educational purposes only. Use it ethically and only against targets you own or have explicit permission to test. Do not use any techniques described here in ways that break laws, platform rules, or third-party rights. If in doubt, stop and get permission.

