Part 5 – Certificate Transparency, CT Logs and Cert-Based Discovery

Introduction

Every time a company gets a publicly trusted SSL/TLS certificate, it leaves a public trace.
That trace lives in Certificate Transparency logs.

Most teams forget this.
Attackers and good researchers do not.

In this Part, we will learn how to use certificate data to discover subdomains, staging apps, forgotten hosts, and third-party services.
Quietly. Reliably. With high confidence.

If passive OSINT was listening at the door, CT logs are reading the guest list.

Why certificate transparency matters

Publicly trusted SSL/TLS certificates are logged in public by design.
Companies often issue certs for internal, staging, or test domains.
CT logs are append-only, so even deleted infrastructure leaves historical cert records.
Cert data usually has higher signal than brute-force wordlists.

If a hostname ever needed HTTPS from a public CA, there is a good chance it exists in CT logs, unless it was only ever covered by a wildcard cert.

What you can find using CT logs

Subdomains never linked on the main site
Staging and dev environments
Old infrastructure still reachable
Third-party services tied to the company
Wildcard cert usage patterns
Candidates for subdomain takeover checks

This is one of the cleanest ways to grow your asset list.

How certificate transparency works (simple explanation)

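Whenever a publicly trusted SSL/TLS certificate is issued,
the Certificate Authority submits it to public CT logs, because major browsers reject certs that are not logged.
Certs from private or internal CAs usually do not show up there.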

These logs include:

Domain names in the certificate
Issue date and expiry
Issuing CA

Anyone can query these logs.
That is what tools like crt.sh and certstream do.
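
To make those fields concrete, here is a minimal Python sketch that summarizes one crt.sh-style JSON record. The key names (name_value, not_before, not_after, issuer_name) follow what crt.sh returns in its JSON output; the sample record itself is invented for illustration, and summarize_record is a hypothetical helper name.

```python
import json

# Invented sample shaped like one entry of crt.sh's JSON output.
SAMPLE_RECORD = json.loads("""
{
  "issuer_name": "C=US, O=Example CA, CN=Example Issuing CA",
  "common_name": "staging.example.com",
  "name_value": "staging.example.com\\nwww.staging.example.com",
  "not_before": "2024-01-10T00:00:00",
  "not_after": "2024-04-09T23:59:59"
}
""")


def summarize_record(record):
    """Return the three things a CT entry tells you: names, validity, issuer."""
    return {
        # name_value may hold several names separated by newlines
        "names": sorted(set(record["name_value"].splitlines())),
        "valid_from": record["not_before"],
        "valid_to": record["not_after"],
        "issuer": record["issuer_name"],
    }


print(summarize_record(SAMPLE_RECORD))
```

One certificate can carry several names, which is why a single record often yields more than one host.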

Tools you will use

crt.sh – manual and automated CT queries
certstream – live certificate monitoring
amass – cert enrichment and correlation
jq, sed, sort – parsing and cleanup
curl – automated queries
dnsx / dig – validation and enrichment

No heavy tooling needed here. Signal over noise.

Step-by-step: CT log discovery workflow

1. Manual crt.sh search (fast sanity check)

Open in browser:

https://crt.sh/?q=%25.example.com

Look for:

Unexpected subdomains
Environment names like dev, stage, internal
Long or oddly named hosts

This gives you intuition before automation.

2. Automated crt.sh extraction (copy-paste)

curl -s "https://crt.sh/?q=%25.example.com&output=json" \
| jq -r '.[].name_value' \
| sed 's/\*\.//g' \
| sort -u > crt_raw.txt

Explanation in simple words:

Fetch all cert records for the domain
Extract domain names
Remove wildcard prefixes
Deduplicate the list

This file is your CT asset base.
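
If you prefer to post-process a saved response in Python, the same pipeline can be sketched roughly like this. It assumes the JSON has already been downloaded (for example with the curl command above), and extract_names is a hypothetical helper. Unlike the sed step, it also lowercases names so that API.example.com and api.example.com collapse into one entry.

```python
import json


def extract_names(crtsh_json_text):
    """Mirror the curl | jq | sed | sort pipeline on an already-downloaded response."""
    names = set()
    for record in json.loads(crtsh_json_text):
        # name_value can contain several newline-separated names
        for name in record.get("name_value", "").splitlines():
            name = name.strip().lower()
            if name.startswith("*."):
                name = name[2:]  # drop the wildcard prefix, keep the parent name
            if name:
                names.add(name)
    return sorted(names)


# Invented sample data, shaped like a crt.sh response with two records.
sample = '[{"name_value": "*.example.com\\nAPI.example.com"}, {"name_value": "api.example.com"}]'
print(extract_names(sample))
```

When crt.sh is overloaded it may return an error page instead of JSON, in which case json.loads raises an exception. Retry rather than trusting an empty file.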

3. Cleaning noisy entries

Certificates sometimes include:

Email addresses
Invalid formatting
Duplicate hostnames

Clean them:

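grep -E '^[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' crt_raw.txt | sort -u > crt_clean.txt

Now you have only hostname-shaped entries. The pattern is deliberately simple: it also drops names containing underscores and punycode TLDs such as .xn--p1ai, so loosen it if your target uses them.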
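
The grep pattern checks shape only. A slightly stricter version can be sketched in Python, with per-label rules and a scope filter that keeps only names under the target's root domain. The scope filter is an addition for illustration and not part of the grep step; is_hostname and clean are hypothetical helper names.

```python
import re

# One DNS label: 1-63 chars, letters/digits/hyphens, no leading or trailing hyphen.
LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def is_hostname(value):
    """Rough syntactic check; stricter than the grep pattern, not a full RFC validator."""
    value = value.strip().lower().rstrip(".")
    if len(value) > 253 or "." not in value:
        return False
    labels = value.split(".")
    if labels[-1].isdigit():
        return False  # an all-numeric last label means an IP address, not a hostname
    return all(LABEL.match(label) for label in labels)


def clean(lines, root="example.com"):
    """Keep hostname-shaped entries that are the root domain or sit under it."""
    kept = set()
    for line in lines:
        host = line.strip().lower().rstrip(".")
        if is_hostname(host) and (host == root or host.endswith("." + root)):
            kept.add(host)
    return sorted(kept)
```

Matching on "." + root rather than on the bare root string keeps lookalikes such as evil-example.com out of the list.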

Wildcard certificates and how to handle them

Wildcard certs look like this:

*.example.com

Important points:

A wildcard cert does NOT mean all subdomains exist
It only means the cert allows them
You must still resolve hosts to confirm they are real

Always treat wildcard entries as candidates, not assets.

Detecting wildcard certificate usage

List the unique wildcard entries:

curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | grep '^\*' | sort -u

If wildcards dominate:

Expect noise
Focus on unique non-wildcard names first
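
To put a number on "wildcards dominate", a small sketch like the following can split raw names (before the wildcard prefix is stripped) and report the wildcard share. Both function names are hypothetical, and any threshold you pick for "dominate" is your own judgment call.

```python
def split_wildcards(names):
    """Separate wildcard entries from concrete hostnames (both deduplicated)."""
    wildcards = sorted({n for n in names if n.startswith("*.")})
    concrete = sorted({n for n in names if n and not n.startswith("*.")})
    return wildcards, concrete


def wildcard_ratio(names):
    """Share of unique names that are wildcards (0.0 when the input is empty)."""
    wildcards, concrete = split_wildcards(names)
    total = len(wildcards) + len(concrete)
    return len(wildcards) / total if total else 0.0
```

The concrete list is what you resolve first; the wildcard list tells you which parent zones are worth a closer look later.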

Resolution and validation (critical step)

Never trust CT data blindly.

Resolve discovered hostnames:

dnsx -l crt_clean.txt -a -cname -resp -silent -o crt_resolved.txt

What this gives you:

IP addresses
CNAME chains
Confirmation that the name currently resolves in DNS

Only resolved hosts move forward.
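
If dnsx is not at hand, the "only resolved hosts move forward" rule can be sketched with Python's standard library. This is a rough stand-in, not a replacement: it uses the system resolver, returns IPv4 addresses only, and does not report CNAME chains. resolve_hosts is a hypothetical helper, and the lookup parameter exists only so the logic can be exercised offline.

```python
import socket


def resolve_hosts(hosts, lookup=socket.gethostbyname_ex):
    """Return {host: [ip, ...]} for names that resolve; drop the rest.

    `lookup` is injectable so the logic can be tested without network access.
    """
    resolved = {}
    for host in hosts:
        try:
            _name, _aliases, addresses = lookup(host)
        except (OSError, UnicodeError):
            continue  # lookup failed: the name stays a candidate, not an asset
        if addresses:
            resolved[host] = sorted(addresses)
    return resolved
```

A typical call would be resolve_hosts(open("crt_clean.txt").read().split()), which performs live DNS lookups.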

Enriching CT data

For each resolved host, collect:

IP address
CNAME target
Cloud provider hint
CDN usage

Example quick enrichment:

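# dnsx -resp appends record data after each hostname, so keep only the first column
awk '{print $1}' crt_resolved.txt | sort -u | while read -r h; do
  echo "---- $h ----"
  dig +short "$h"
  dig +short CNAME "$h"
done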

This helps identify:

Cloud services
Third-party hosting
Takeover opportunities later
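
The "cloud provider hint" can be as simple as matching the CNAME target against known suffixes. The sketch below assumes a small, illustrative suffix map; it is far from exhaustive, and provider_hint is a hypothetical helper name.

```python
# Illustrative, non-exhaustive suffix map; extend it for your own targets.
PROVIDER_SUFFIXES = {
    "cloudfront.net": "Amazon CloudFront",
    "amazonaws.com": "AWS",
    "azurewebsites.net": "Azure App Service",
    "github.io": "GitHub Pages",
    "herokuapp.com": "Heroku",
    "fastly.net": "Fastly",
}


def provider_hint(cname_target):
    """Guess the hosting provider from a CNAME target, or None if unknown."""
    target = cname_target.strip().lower().rstrip(".")
    for suffix, provider in PROVIDER_SUFFIXES.items():
        if target == suffix or target.endswith("." + suffix):
            return provider
    return None
```

Feed it the output of dig +short CNAME. A hint is only a hint: it tells you where to look, not that the host is vulnerable.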

Live monitoring with certstream (advanced but powerful)

Certificate transparency is not only historical.
You can watch new certs in real time.

Certstream use-case

Catch new staging domains immediately
Monitor large programs continuously
Discover assets before others

Basic certstream setup (Python)

pip install certstream

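Simple listener example. The CLI prints every newly logged certificate, so filter the stream for your domain:

certstream --full | grep --line-buffered -i 'example\.com'

Whenever a new cert naming the domain is logged, you see it within seconds.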

This is gold for active bug bounty programs.
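
The same monitoring can be done from the certstream Python library, which hands each event to a callback. The sketch below assumes the message layout documented by the library (message_type, then data.leaf_cert.all_domains) and the public server URL; both may change, and you may need to point it at a self-hosted certstream server instead. matching_domains, on_message, and run are hypothetical names.

```python
def matching_domains(message, suffix="example.com"):
    """Return names from one certstream message that fall under `suffix`."""
    if message.get("message_type") != "certificate_update":
        return []  # heartbeats and other message types carry no cert data
    all_domains = message.get("data", {}).get("leaf_cert", {}).get("all_domains", [])
    hits = set()
    for name in all_domains:
        bare = name.lower()
        if bare.startswith("*."):
            bare = bare[2:]  # compare wildcards by their parent name
        if bare == suffix or bare.endswith("." + suffix):
            hits.add(name.lower())
    return sorted(hits)


def on_message(message, context):
    """Callback in the shape certstream expects: (message, context)."""
    for name in matching_domains(message):
        print("new cert name:", name)


def run():
    # Requires `pip install certstream` and a reachable certstream server.
    import certstream
    certstream.listen_for_events(on_message, url="wss://certstream.calidog.io/")
```

Call run() to start listening. Keeping the filter in its own function means you can test it against saved messages without a live connection.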

How to reduce false positives

CT logs may include:

Typo domains
Unrelated subdomains
Shared certs

Filtering tips:

Resolve everything
Compare IP ranges with known assets
Check HTTP response and headers
Correlate with repo leaks and OSINT

Only keep what connects logically to the target.
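
"Compare IP ranges with known assets" can be sketched with Python's ipaddress module. The CIDR blocks you pass in are whatever you have already attributed to the target. The function names are hypothetical, and the split is into "known" and "review" rather than "keep" and "drop", because a host on a SaaS or CDN range can still belong to the target.

```python
import ipaddress


def in_known_ranges(ip, known_cidrs):
    """True if `ip` falls inside any CIDR block already attributed to the target."""
    address = ipaddress.ip_address(ip)
    for cidr in known_cidrs:
        network = ipaddress.ip_network(cidr, strict=False)
        if address.version == network.version and address in network:
            return True
    return False


def triage(resolved, known_cidrs):
    """Split {host: [ips]} into hosts on known ranges and hosts needing a manual look."""
    known, review = [], []
    for host, ips in sorted(resolved.items()):
        on_known = any(in_known_ranges(ip, known_cidrs) for ip in ips)
        (known if on_known else review).append(host)
    return known, review
```

Hosts in the review bucket are where the HTTP response, headers, and other OSINT decide whether they connect logically to the target.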

How CT data feeds other recon steps

CT output goes directly into:

Subdomain enumeration refinement
Subdomain takeover checks
URL collection
JS harvesting
Cloud footprint mapping

CT logs act as a high-confidence seed list.

Practical real-world use-cases

Finding staging.example.com used only for QA
Discovering forgotten admin-old.example.com
Catching new infrastructure added during a release
Identifying SaaS services connected via CNAME
Spotting takeover-prone dangling records

These are real findings that show up repeatedly in bug bounty programs.

Mini lab exercise (25–30 minutes)

Pick a domain you own or a lab domain.
Extract CT data:

curl -s "https://crt.sh/?q=%25.yourdomain.com&output=json" \
| jq -r '.[].name_value' \
| sed 's/\*\.//g' \
| sort -u > crt_raw.txt

Clean and resolve:

grep -E "^[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$" crt_raw.txt | sort -u > crt_clean.txt
dnsx -l crt_clean.txt -a -cname -resp -silent -o crt_resolved.txt

Open the top three resolved hosts in your browser.
Add notes to your tracker:

Host
Status
Why it looks interesting

This trains you to treat cert data seriously.

Common mistakes and how to avoid them

Mistake: Treating wildcard certs as real hosts
Fix: Always resolve and validate

Mistake: Ignoring historical certs
Fix: Old infrastructure is often still reachable

Mistake: Not correlating CT with other data
Fix: Combine with repo leaks and OSINT

Mistake: Forgetting live monitoring
Fix: Use certstream for active programs

Quick command summary

CT extraction:

curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | sed 's/\*\.//g' | sort -u > crt_raw.txt

Cleaning:

grep -E "^[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$" crt_raw.txt | sort -u > crt_clean.txt

Resolution:

dnsx -l crt_clean.txt -a -cname -resp -silent -o crt_resolved.txt

Wildcard check:

curl -s "https://crt.sh/?q=%25.example.com&output=json" | jq -r '.[].name_value' | grep '\*'

What to do after this Part

Merge CT hosts into your master subdomain list
Run takeover checks on CNAME-based hosts
Feed live hosts into URL and JS collection
Set up certstream monitoring for active targets

CT data becomes a long-term asset, not a one-time check.
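
One way to treat CT output as a long-term asset is to diff each run against the previous one. The following is a minimal sketch, assuming you keep the resolved host list from every run; diff_runs is a hypothetical helper name.

```python
def diff_runs(previous_hosts, current_hosts):
    """Compare two snapshots of resolved hosts and report what changed."""
    previous, current = set(previous_hosts), set(current_hosts)
    return {
        "new": sorted(current - previous),      # appeared since the last run
        "gone": sorted(previous - current),     # no longer resolving
        "master": sorted(previous | current),   # everything ever seen
    }
```

New hosts are the ones to look at first. Hosts that stopped resolving but still have CNAME records elsewhere are worth keeping on the takeover checklist.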

Next post preview

Part 6 – DNS Records, Zone Discovery and Wildcard Handling

We will cover:

DNS record types that matter for web apps
Zone transfer checks
Misconfigured DNS setups
Wildcard DNS detection and filtering
How DNS mistakes lead to real bugs

This builds directly on CT and subdomain work.

Closing thought

Certificates were meant to improve security.
They also improved visibility.

Use that visibility calmly and ethically.
CT logs reward patience and pattern recognition.

Disclaimer

This content is for educational purposes only. Use it ethically and only against targets you own or have explicit permission to test. Do not use any techniques described here in ways that break laws, platform rules, or third-party rights. If in doubt, stop and get permission.

CyberXsociety