Introduction
Now things start getting real.
So far, you have found:
- Subdomains
- IPs
- Live hosts
- Running services
But vulnerabilities do not live in domains.
They live in URLs and endpoints.
This part is where recon becomes practical.
You are no longer mapping.
You are preparing to test.
Why URL collection matters
- URLs show actual functionality
- Hidden endpoints often lead to bugs
- Old URLs reveal forgotten features
- Parameters become entry points for testing
A single good endpoint is worth more than 100 subdomains.
What you are trying to collect
- All reachable URLs
- Hidden endpoints
- API paths
- Parameters
- Old and archived URLs
This becomes your testing dataset.
Two approaches you will combine
Passive collection
No interaction with the target.
Sources:
- Wayback Machine
- Common Crawl
- Public datasets
Active collection
Direct interaction with the target.
Methods:
- Crawling
- Spidering
- Endpoint discovery
Best results come from combining both.
Tools you will use
- gau – gather URLs from multiple sources
- waybackurls – archive URLs
- katana – modern crawler
- hakrawler – lightweight crawler
- httpx – validation
- uro – URL deduplication
- grep / gf – filtering
- jq – parsing
Step-by-step URL collection workflow
1. Passive URL collection (start here)
Use gau:
gau example.com > gau_urls.txt
Use wayback:
waybackurls example.com >> gau_urls.txt
Merge and clean:
sort -u gau_urls.txt > passive_urls.txt
What you get
- Historical endpoints
- Old APIs
- Deprecated paths
These are high-value.
2. Active crawling with katana
katana -u https://example.com -d 3 -o katana_urls.txt
Explanation
- -d 3 sets the crawl depth
- Extracts endpoints dynamically
This finds:
- Live pages
- JS-linked endpoints
- Hidden paths
3. Combine passive and active results
cat passive_urls.txt katana_urls.txt | sort -u > all_urls.txt
Now you have a unified dataset.
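Before moving on, it can help to see how much each approach contributed. The following is a minimal sketch, assuming the two files produced above; the function name source_overlap is hypothetical and not part of any tool.

```shell
# source_overlap PASSIVE_FILE ACTIVE_FILE
# Prints how many unique URLs appear only in the first file,
# only in the second file, and in both.
# Usage: source_overlap passive_urls.txt katana_urls.txt
source_overlap() {
  a=$(mktemp)
  b=$(mktemp)
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  # comm needs sorted input: -23 = only in a, -13 = only in b, -12 = in both
  echo "passive only: $(comm -23 "$a" "$b" | wc -l | tr -d ' ')"
  echo "active only: $(comm -13 "$a" "$b" | wc -l | tr -d ' ')"
  echo "both: $(comm -12 "$a" "$b" | wc -l | tr -d ' ')"
  rm -f "$a" "$b"
}
```

If one source contributes almost nothing, that is usually a hint to check its settings rather than to drop it.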
4. Remove noise (very important)
Many URLs are useless:
- Images
- CSS
- Fonts
Filter them:
grep -Eiv "\.(jpg|jpeg|png|gif|css|js|woff|woff2|ttf|svg)(\?|$)" all_urls.txt > clean_urls.txt
Now your list is usable.
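Which extensions count as noise depends on the target. One way to decide is to count the extensions in the list first. This is a rough sketch using standard tools only; the function name ext_histogram is hypothetical, and it treats any trailing dot-suffix of one to five characters as an extension.

```shell
# ext_histogram FILE
# Counts file extensions in a URL list, most frequent first.
# Usage: ext_histogram all_urls.txt
ext_histogram() {
  # Strip query strings and fragments, then the scheme and host,
  # so ".png?v=2" counts as ".png" and a bare "example.com" is not counted.
  sed -E -e 's/[?#].*$//' -e 's#^[A-Za-z]+://[^/]*##' "$1" |
    # Keep only a trailing extension, if the path has one
    grep -oE '\.[A-Za-z0-9]{1,5}$' |
    tr 'A-Z' 'a-z' |
    sort | uniq -c | sort -rn
}
```

Extensions that dominate the count and carry no functionality are the ones worth adding to the filter.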
Parameter extraction (critical step)
You want URLs like:
https://example.com/page?id=123
Extract parameterised URLs:
grep -E "\?.*=" clean_urls.txt > params.txt
These are your attack points.
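The parameter names themselves are often more useful than the full URLs. As a small sketch, assuming a file of parameterised URLs like the one above, you can list which names appear most often. The function name param_names is hypothetical.

```shell
# param_names FILE
# Lists query parameter names in a URL list, most frequent first.
# Usage: param_names params.txt
param_names() {
  # Match "?name=" or "&name=" and keep only the name
  grep -oE '[?&][^=&#?]+=' "$1" |
    tr -d '?&=' |
    sort | uniq -c | sort -rn
}
```

Names such as id, url, redirect or file tend to stand out quickly in this view.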
Normalising URLs
Avoid duplicates:
uro < clean_urls.txt > final_urls.txt
This collapses:
- URLs that differ only in parameter values
- Repeated endpoints
Clean data = faster testing.
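To make the idea concrete, here is a rough approximation of this kind of deduplication using awk. It is not how uro works internally, only a sketch of the concept: blank out every parameter value and keep the first URL seen for each resulting shape. The function name dedupe_by_shape is hypothetical.

```shell
# dedupe_by_shape FILE
# Keeps the first URL seen for each "shape", where the shape is the URL
# with every query parameter value blanked out.
# This only approximates the idea; it is not uro's algorithm.
dedupe_by_shape() {
  awk '{
    shape = $0
    gsub(/=[^&#]*/, "=", shape)   # id=123 becomes id=
    if (!(shape in seen)) { seen[shape] = 1; print }
  }' "$1"
}
```

With this, /page?id=1 and /page?id=2 count as one endpoint, while /page?id=3&x=1 survives because its parameter set is different.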
Finding interesting endpoints
Filter for keywords:
grep -Ei "api|admin|login|auth|debug|internal" final_urls.txt > interesting.txt
Focus on:
- /api
- /admin
- /login
- /auth
- /internal
These are high-value.
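A plain keyword match also hits words like capital or authors. If the list gets noisy, one option is to match keywords only as whole path segments. This is a sketch, and the function name interesting_paths is hypothetical. The trade-off is that it will miss variants such as /admin-panel, so use it next to the broad match, not instead of it.

```shell
# interesting_paths FILE
# Matches keywords only as complete path segments, case-insensitively.
# Usage: interesting_paths final_urls.txt
interesting_paths() {
  # The keyword must follow a "/" and be followed by "/", "?", "#" or end of line
  grep -Ei '/(api|admin|login|auth|debug|internal)([/?#]|$)' "$1"
}
```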
Extracting endpoints from JavaScript
JS files often contain hidden URLs.
First collect JS files. Take them from all_urls.txt, because the noise filter removed .js files from the cleaned lists:
grep -E "\.js(\?|$)" all_urls.txt > js_files.txt
Then extract endpoints:
while read -r url; do curl -s "$url"; done < js_files.txt | grep -oE "https?://[^\"']+" | sort -u > js_endpoints.txt
This reveals:
- APIs
- Hidden routes
- Internal services
Very powerful step.
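The pattern above only catches absolute URLs. Many apps reference routes as root-relative strings such as "/api/v1/users". Here is a minimal sketch that pulls those out of a JS file you have already downloaded. The function name js_paths is hypothetical, and the regex is deliberately simple, so expect some false positives.

```shell
# js_paths FILE
# Pulls quoted, root-relative paths such as "/api/v1/users" out of JS source.
# Usage: curl -s https://example.com/app.js > app.js && js_paths app.js
js_paths() {
  # A quote, a leading slash, path characters, then a closing quote
  grep -oE "[\"']/[A-Za-z0-9_./-]+[\"']" "$1" |
    tr -d "\"'" |
    sort -u
}
```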
Validating collected URLs
Not all URLs are alive.
Check:
httpx -l final_urls.txt -status-code -silent -o live_urls.txt
Now you have only the URLs that still respond, each with its status code.
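A responding URL is not always a useful one: a 404 or a 403 still counts as a response. As a sketch, you can split the results by status code. This assumes each line of live_urls.txt looks like "https://example.com/page [200]", which is an assumption about the output format worth checking against your own file. The function name by_status is hypothetical.

```shell
# by_status FILE CODE
# Prints the URLs whose bracketed status code equals CODE.
# Assumes lines of the form: URL [CODE]
# Usage: by_status live_urls.txt 200 > ok_urls.txt
by_status() {
  awk -v code="[$2]" '$2 == code { print $1 }' "$1"
}
```

Running it once for 200 and once for 403 gives a quick split between open endpoints and ones that exist but are protected.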
Prioritisation strategy
Focus on:
High priority:
- URLs with parameters
- API endpoints
- Auth-related paths
- Admin panels
Medium priority:
- Static pages
- Informational endpoints
Low priority:
- Repeated or duplicate URLs
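The buckets above can be roughed out with a few lines of awk. This sketch assumes the list is already deduplicated, so only the high and medium buckets remain. The rules and the function name prioritise are hypothetical starting points, not a fixed scoring system.

```shell
# prioritise FILE
# Prefixes each URL with HIGH or MEDIUM.
# HIGH: has a query parameter, or has api/admin/login/auth as a path segment.
# Usage: prioritise final_urls.txt | sort
prioritise() {
  awk '{
    has_param = ($0 ~ "[?][^=]+=")
    has_keyword = (tolower($0) ~ "/(api|admin|login|auth)([/?#]|$)")
    if (has_param || has_keyword)
      print "HIGH   " $0
    else
      print "MEDIUM " $0
  }' "$1"
}
```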
Real-world use-cases
- Finding /api/v1/user?id= endpoint
- Discovering hidden /admin-panel
- Identifying /debug endpoints
- Extracting internal APIs from JS
- Finding old vulnerable endpoints from Wayback
These are common bug bounty wins.
Mini lab exercise (30–40 minutes)
- Run passive collection:
gau example.com > urls.txt
waybackurls example.com >> urls.txt
- Run active crawl:
katana -u https://example.com -o katana.txt
- Merge:
cat urls.txt katana.txt | sort -u > all.txt
- Filter:
grep -Ev "\.(jpg|png|css|js)$" all.txt > clean.txt
- Extract params:
grep "=" clean.txt > params.txt
- Open 3 endpoints and note:
- What they do
- What you can test
Common mistakes and fixes
Mistake: Only using one tool
Fix: Combine passive and active
Mistake: Not filtering noise
Fix: Remove static files early
Mistake: Ignoring parameters
Fix: Parameters are key attack points
Mistake: Not validating URLs
Fix: Always use httpx
Quick command summary
Passive collection:
gau example.com
waybackurls example.com
Active crawl:
katana -u https://example.com
Filter:
grep -Ev "\.(jpg|png|css|js)$"
Params:
grep "="
Validation:
httpx -l urls.txt
What to do after this Part
- Start parameter fuzzing
- Test for XSS, SQLi, IDOR
- Analyse APIs
- Move into vulnerability testing phase
You are no longer only doing recon.
You are preparing for the exploitation phase.
Next post preview
Part 12 – Visual Recon and Quick Triage (Screenshots, Patterns, Grouping)
We will cover:
- Screenshot-based recon
- Fast manual triage
- Identifying patterns visually
- Grouping similar apps
This will help you move faster with clarity.
Closing thought
Domains show structure.
URLs show behaviour.
And behaviour is where vulnerabilities live.
Disclaimer
This content is for educational purposes only. Use it ethically and only against targets you own or have explicit permission to test. Do not use any techniques described here in ways that break laws, platform rules, or third-party rights. If in doubt, stop and get permission.

