How PayPerScrape Works
Your code sends a URL to PayPerScrape. We pick a scraping strategy for the domain, fetch the page via ScraperAPI, evaluate the HTML for completeness, and return clean HTML with metadata.
The Complete Flow
Your Code → PayPerScrape
Send a POST request to /api/scrape with the URL you want to scrape. Request is validated and rate-limited.
Strategy Lookup & Routing
We check Redis cache for known domain strategies. If cached, use optimal settings. If unknown, start with static mode.
Completeness Evaluation & Response
Analyze HTML for completeness, update domain strategy cache, and return clean HTML with detailed metadata about the scraping process.
Step-by-step, technically explicit
Every request follows this flow. No magic — just deterministic routing and a battle-tested escalation engine.
POST request containing the URL to scrape. Optional fields include render_js, strict_mode, wait_for, and timeout_ms.
POST https://api.payperscrape.com/scrape
{
  "url": "https://example.com/products",
  "render_js": false,
  "strict_mode": false
}
We validate the URL, block private networks, enforce size limits, and apply strict per-IP + per-wallet rate limits.
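As a rough illustration of the URL normalization and private-network protection mentioned above, here is a minimal Python sketch. The function name `normalize_and_check` and the `MAX_URL_LENGTH` limit are assumptions made for this example, not part of the PayPerScrape API, and a production check would also inspect the addresses a hostname resolves to.

```python
import ipaddress
from urllib.parse import urlsplit, urlunsplit

MAX_URL_LENGTH = 2048  # assumed limit for this sketch, not a documented value


def normalize_and_check(url: str) -> str:
    """Return a normalized URL, or raise ValueError if it should be rejected."""
    if len(url) > MAX_URL_LENGTH:
        raise ValueError("URL too long")
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https"):
        raise ValueError("only http(s) URLs are accepted")
    host = (parts.hostname or "").lower()
    if not host or host == "localhost":
        raise ValueError("missing or local host")
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        ip = None  # a DNS name; resolved addresses are not checked in this sketch
    if ip is not None and (ip.is_private or ip.is_loopback or ip.is_link_local):
        raise ValueError("private network address")
    host_out = f"[{host}]" if ":" in host else host  # re-bracket IPv6 literals
    netloc = host_out if parts.port is None else f"{host_out}:{parts.port}"
    # Drop the fragment and default an empty path to "/".
    return urlunsplit((parts.scheme, netloc, parts.path or "/", parts.query, ""))
```

Rejecting literal private, loopback, and link-local addresses before any fetch is the cheapest part of this protection. Checking resolved addresses would need a DNS lookup and is left out here.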
// Validation steps
✓ URL normalization
✓ Private network protection
✓ Rate limit check (IP + wallet)
✓ Body size guardrails
✓ Abuse prevention middleware
Every domain earns a cached strategy. If known, we reuse optimized settings. Unknown domains start with the cheapest static mode.
// Strategy lookup
strategy = redis.get("domain_strategy:example.com")
if (strategy) {
  mode = strategy.mode // "static" | "js" | "advanced"
  render_js = strategy.render
  region = strategy.country
} else {
  mode = "static" // cold start
}
HTML is fetched using the selected mode: static, JS-rendering, or advanced mode. Up to 4 attempts with smart escalation rules.
// Scraping attempt
response = scrapeEngine.fetch({
  url: "https://example.com/products",
  mode: "static",
  render_js: false,
  timeout: 35000
})
// If incomplete → escalate to JS
// If blocked/challenged → escalate to advanced mode
// Max 4 attempts with adaptive logic
We evaluate the HTML for completeness: hydration skeletons, bot challenges, geoblocks, semantic structure, and baseline size.
// Completeness checks
✓ Hydration skeleton detection
✓ Bot challenge detection
✓ Geo-blocking indicators
✓ Semantic HTML score
✓ Content-size comparison
✓ Login-wall heuristics
If incomplete, the engine escalates from static → JS → advanced, then updates the domain's cached strategy for future requests.
// Escalation logic
if (incomplete) {
  if (reason === "hydration_skeleton") escalate("js")
  if (reason === "bot_challenge") escalate("advanced")
  // Attempt again with new strategy
  // Cache the result for the next request
}
HTML is returned along with metadata: attempts, escalations, byte size, completeness, and decision reasoning.
{
  "status": "success",
  "html": "<!DOCTYPE html>...",
  "metadata": {
    "domain": "example.com",
    "attempts": 2,
    "escalations": 1,
    "byte_size": 45230,
    "is_partial": false,
    "domain_confidence": 0.85,
    "decision_reason": "ok"
  },
  "attempt_log": [
    "static:ok",
    "js:ok"
  ]
}
We never guess blindly
Our detection logic combines multiple signals to route your requests correctly.
Redis-backed domain cache
Cached strategies are stored in Redis with a 5-day TTL. Known domains route directly to their cached strategy (static, js, or advanced).
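To make the TTL behavior concrete, the sketch below models the cache lookup with an in-memory stand-in instead of a real Redis client. The `Strategy` and `StrategyCache` classes and the `choose_strategy` helper are hypothetical names for this example. Only the `domain_strategy:` key prefix, the 5-day TTL, and the static cold start come from this page.

```python
import time
from dataclasses import dataclass
from typing import Optional

FIVE_DAYS_SECONDS = 5 * 24 * 60 * 60  # the 5-day TTL described above


@dataclass
class Strategy:
    mode: str                      # "static" | "js" | "advanced"
    render: bool
    country: Optional[str] = None


class StrategyCache:
    """In-memory stand-in for the Redis cache (illustration only)."""

    def __init__(self, ttl_seconds: int = FIVE_DAYS_SECONDS, clock=time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, Strategy)

    def set(self, domain: str, strategy: Strategy) -> None:
        key = f"domain_strategy:{domain}"
        self._entries[key] = (self._clock() + self._ttl, strategy)

    def get(self, domain: str) -> Optional[Strategy]:
        key = f"domain_strategy:{domain}"
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, strategy = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired entries behave like a cache miss
            return None
        return strategy


def choose_strategy(cache: StrategyCache, domain: str) -> Strategy:
    # Unknown or expired domains fall back to the cheapest mode (cold start).
    return cache.get(domain) or Strategy(mode="static", render=False)
```

With a real Redis deployment the expiry would normally be handled by the server, so the manual expiry check in `get` is only needed for this stand-in.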
Completeness evaluation
Analyzes HTML for hydration skeletons, bot challenges, geoblocks, semantic content, and size comparisons to detect incomplete responses.
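A simplified sketch of how several of these signals could be combined is shown below. The marker strings, regular expressions, and thresholds are assumptions chosen for illustration, not PayPerScrape's actual rules. The reason strings `hydration_skeleton`, `bot_challenge`, and `ok` appear on this page, and the others are made up for the example.

```python
import re
from typing import Optional

# Illustrative markers only; the real detection rules are not documented here.
CHALLENGE_MARKERS = ("captcha", "verify you are human", "checking your browser")
EMPTY_ROOT = re.compile(
    r'<div[^>]*\bid="(?:root|app|__next)"[^>]*>\s*</div>', re.IGNORECASE
)
SEMANTIC_TAGS = re.compile(
    r"<(?:main|article|section|h1|h2|p|table|ul|ol)\b", re.IGNORECASE
)


def evaluate_completeness(html: str, baseline_bytes: Optional[int] = None) -> str:
    """Return "ok" or a reason string such as "hydration_skeleton"."""
    lowered = html.lower()
    if any(marker in lowered for marker in CHALLENGE_MARKERS):
        return "bot_challenge"
    if EMPTY_ROOT.search(html):
        # An empty app root usually means the page needs JavaScript to render.
        return "hydration_skeleton"
    if len(SEMANTIC_TAGS.findall(html)) < 3:
        return "low_semantic_score"  # hypothetical reason name
    size = len(html.encode("utf-8"))
    if baseline_bytes and size < 0.5 * baseline_bytes:
        return "below_size_baseline"  # hypothetical reason name
    return "ok"
```

The checks run from most specific to least specific, so a challenge page is reported as `bot_challenge` even if it also happens to be small.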
Feedback & reporting
Report incomplete scrapes via /api/feedback. Domain classifications can be manually overridden by admins for fine-tuning.
Automatic strategy updates
Every scrape attempt updates the domain strategy cache, so the recorded success and failure rates and the confidence score for each domain become more accurate over time.
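One way such per-domain statistics could be maintained is sketched below. The `DomainStats` class, the `record_attempt` helper, and the smoothed success-rate formula are assumptions for illustration. This page does not specify how confidence scores are computed.

```python
from dataclasses import dataclass


@dataclass
class DomainStats:
    mode: str = "static"
    successes: int = 0
    failures: int = 0

    @property
    def confidence(self) -> float:
        # Laplace-smoothed success rate: 0.5 with no data, then tracks evidence.
        return (self.successes + 1) / (self.successes + self.failures + 2)


def record_attempt(stats: DomainStats, mode: str, ok: bool) -> DomainStats:
    """Update counters after one attempt; a change of mode starts fresh counters."""
    if mode != stats.mode:
        stats = DomainStats(mode=mode)
    if ok:
        stats.successes += 1
    else:
        stats.failures += 1
    return stats
```

Smoothing keeps a single early success or failure from pushing the score to 1.0 or 0.0, which matters for domains that have only been scraped once or twice.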
Reliability & speed you can count on
Domain strategies are cached in Redis with 5-day TTL for fast lookups. Unknown domains start cheap (static mode) and learn over time. Up to 4 automatic retry attempts with smart escalation ensure high success rates.
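The retry-and-escalate behavior summarized above can be sketched as a small loop. This is an illustrative model under stated assumptions, not the production engine: the `fetch` callback, `next_mode`, and `scrape_with_escalation` are hypothetical names, and the rule for failure reasons other than `hydration_skeleton` and `bot_challenge` is a guess.

```python
from typing import Callable, List, Tuple

MODES = ["static", "js", "advanced"]  # cheapest first
MAX_ATTEMPTS = 4


def next_mode(current: str, reason: str) -> str:
    """Pick the mode for the next attempt based on why the last one failed."""
    if reason == "bot_challenge":
        return "advanced"
    if reason == "hydration_skeleton" and current == "static":
        return "js"
    # Assumed fallback: step up one level, staying at the top once reached.
    return MODES[min(MODES.index(current) + 1, len(MODES) - 1)]


def scrape_with_escalation(
    url: str,
    fetch: Callable[[str, str], Tuple[str, str]],  # (url, mode) -> (html, reason)
    start_mode: str = "static",
) -> Tuple[str, List[str]]:
    """Return (html, attempt_log), where html is the last body fetched."""
    mode, html, log = start_mode, "", []
    for _ in range(MAX_ATTEMPTS):
        html, reason = fetch(url, mode)
        log.append(f"{mode}:{reason}")
        if reason == "ok":
            break
        mode = next_mode(mode, reason)
    return html, log
```

Passing the fetcher in as a callback keeps the escalation policy testable without network access. A stub that returns `hydration_skeleton` for static mode and `ok` otherwise produces the log `["static:hydration_skeleton", "js:ok"]`.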
