How PayPerScrape Works
Your code sends a URL to PayPerScrape. We route the request to the right scraping strategy, fetch the page, evaluate its completeness, and return clean HTML with metadata.
The Complete Flow
Your Code → PayPerScrape
Send a POST request to /api/scrape with the URL you want to scrape. The request is validated and rate-limited.
Strategy Lookup & Routing
We check the Redis cache for a known domain strategy. If one is cached, we use its settings. If the domain is unknown, we start with static mode.
Completeness Evaluation & Response
We analyze the HTML for completeness, update the domain strategy cache, and return clean HTML with detailed metadata about the scraping process.
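On the client side, the returned metadata can be given a type so that downstream code does not rely on untyped JSON. The sketch below takes its field names from the sample response shown later on this page. The field types are inferred from that single example and may be incomplete, and `summarize` is a hypothetical helper, not part of any PayPerScrape SDK.

```typescript
// Shape inferred from the sample response on this page (an assumption, not a spec).
interface ScrapeMetadata {
  domain: string;
  attempts: number;
  escalations: number;
  byte_size: number;
  is_partial: boolean;
  domain_confidence: number;
  decision_reason: string;
}

interface ScrapeResponse {
  status: string;
  html: string;
  metadata: ScrapeMetadata;
  attempt_log: string[];
}

// Hypothetical helper: one-line summary of how a scrape went.
function summarize(res: ScrapeResponse): string {
  const { domain, attempts, escalations, byte_size, is_partial } = res.metadata;
  const completeness = is_partial ? "partial" : "complete";
  return `${domain}: ${completeness}, ${attempts} attempt(s), ${escalations} escalation(s), ${byte_size} bytes`;
}
```

A caller might log `summarize(body)` after each request and only inspect `attempt_log` when `is_partial` is true.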
Step-by-step, technically explicit
Every request follows this flow. No magic — just deterministic routing and a battle-tested escalation engine.
Use the x402 SDK to send a POST request with payment. The SDK handles payment signing and includes it in the request headers.
import { createSigner, wrapFetchWithPayment } from "@coinbase/x402";

// privateKey is assumed to hold the private key of the wallet that pays for the request
const signer = await createSigner("base", privateKey);
const fetchWithPayment = wrapFetchWithPayment(fetch, signer);

const response = await fetchWithPayment(
  "https://payperscrape.com/api/scrape",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      url: "https://example.com"
    })
  }
);
const body = await response.json();

We validate the URL, block private networks, enforce size limits, and apply strict per-IP + per-wallet rate limits.
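As a rough illustration of the URL validation and private-network protection described above, here is a minimal TypeScript sketch. It is not PayPerScrape's actual implementation: `isPrivateHost` and `validateTargetUrl` are hypothetical names, and the blocked ranges are the common private, loopback, and link-local ranges, with the IPv6 checks simplified to prefix matches.

```typescript
// Hypothetical check: true for hosts that should never be fetched on a caller's behalf.
function isPrivateHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  // URL.hostname keeps the brackets around IPv6 literals, e.g. "[::1]".
  if (host.startsWith("[")) {
    const v6 = host.slice(1, -1);
    // Simplified: loopback, unique-local (fc00::/7), and the fe80 link-local prefix.
    return v6 === "::1" || v6.startsWith("fc") || v6.startsWith("fd") || v6.startsWith("fe80");
  }

  const m = host.match(/^(\d+)\.(\d+)\.(\d+)\.(\d+)$/);
  if (!m) return false; // a regular domain name
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Hypothetical normalization: parse, restrict the scheme, reject private hosts, drop the fragment.
function validateTargetUrl(raw: string): URL {
  const url = new URL(raw.trim()); // throws TypeError on malformed input
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error("unsupported scheme");
  }
  if (isPrivateHost(url.hostname)) {
    throw new Error("private network target");
  }
  url.hash = "";
  return url;
}
```

A hostname check like this does not cover a public domain name that resolves to a private address. A production service would also need to check the resolved IP at connection time.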
// Validation steps
✓ URL normalization
✓ Private network protection
✓ Rate limit check (IP + wallet)
✓ Body size guardrails
✓ Abuse prevention middleware

Every domain earns a cached strategy. If known, we reuse optimized settings. Unknown domains start with static mode.
// Strategy lookup
strategy = redis.get("domain_strategy:example.com")
if (strategy) {
  mode = strategy.mode        // "static" | "js" | "advanced"
  render_js = strategy.render
  region = strategy.country
} else {
  mode = "static"             // cold start
}

HTML is fetched using the selected mode: static, JS rendering, or advanced. We make up to 4 attempts, escalating between modes as needed.
// Scraping attempt
response = scrapeEngine.fetch({
  url: "https://example.com/products",
  mode: "static",
  render_js: false,
  timeout: 35000
})
// If incomplete → escalate to JS
// If blocked/challenged → escalate to advanced mode
// Max 4 attempts with adaptive logic

We evaluate the HTML for completeness: hydration skeletons, bot challenges, geoblocks, semantic structure, and baseline size.
// Completeness checks
✓ Hydration skeleton detection
✓ Bot challenge detection
✓ Geo-blocking indicators
✓ Semantic HTML score
✓ Content-size comparison
✓ Login-wall heuristics

If incomplete, the engine escalates from static → JS → advanced, then updates the domain's cached strategy for future requests.
// Escalation logic
if (incomplete) {
  if (reason === "hydration_skeleton") escalate("js")
  if (reason === "bot_challenge") escalate("advanced")
  // Attempt again with new strategy
  // Cache the result for the next request
}

HTML is returned along with metadata: attempts, escalations, byte size, completeness, and decision reasoning.
{
  "status": "success",
  "html": "<!DOCTYPE html>...",
  "metadata": {
    "domain": "example.com",
    "attempts": 2,
    "escalations": 1,
    "byte_size": 45230,
    "is_partial": false,
    "domain_confidence": 0.85,
    "decision_reason": "ok"
  },
  "attempt_log": [
    "static:ok",
    "js:ok"
  ]
}

We never guess blindly
Our detection logic combines multiple signals to route your requests correctly.
Redis-backed domain cache
Cached strategies are stored in Redis, so known domains route straight to their optimal strategy (static, js, or advanced).
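The lookup can be sketched in TypeScript as follows. This is an illustration under stated assumptions: an in-memory `Map` stands in for Redis so the example runs without a server, the `domain_strategy:<domain>` key format and the `mode`, `render`, and `country` fields come from the pseudocode on this page, and storing the strategy as a JSON string is a guess about the encoding.

```typescript
type Mode = "static" | "js" | "advanced";

interface DomainStrategy {
  mode: Mode;
  render: boolean;
  country?: string;
}

// Stand-in for Redis: a string-to-string store (assumption: values are JSON strings).
const store = new Map<string, string>();

// Settings used for a domain that has never been seen before.
const COLD_START: DomainStrategy = { mode: "static", render: false };

function strategyKey(url: string): string {
  return `domain_strategy:${new URL(url).hostname}`;
}

// Return the cached strategy for the URL's domain, or the cold-start default.
function lookupStrategy(url: string): DomainStrategy {
  const raw = store.get(strategyKey(url));
  return raw ? (JSON.parse(raw) as DomainStrategy) : COLD_START;
}

// Persist the strategy that worked so the next request can skip escalation.
function saveStrategy(url: string, strategy: DomainStrategy): void {
  store.set(strategyKey(url), JSON.stringify(strategy));
}
```

Keying by hostname means every path on a domain shares one strategy, which matches the per-domain caching described here.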
Completeness evaluation
Analyzes HTML for hydration skeletons, bot challenges, geoblocks, semantic content, and size comparisons to detect incomplete responses.
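To make the idea concrete, here is a small TypeScript sketch of a completeness check covering three of the signals named above. The marker strings, the thresholds, and the `too_small` reason are hypothetical values chosen for illustration. The real evaluator presumably combines more signals (geoblocks, login walls, semantic scoring) that are not reproduced here.

```typescript
type Reason = "ok" | "hydration_skeleton" | "bot_challenge" | "too_small";

// Hypothetical thresholds and markers, for illustration only.
const MIN_CHARS = 500;
const MIN_VISIBLE_TEXT = 200;
const CHALLENGE_MARKERS = ["captcha", "verify you are human", "checking your browser"];

// Strip scripts, styles, and tags to approximate the text a reader would see.
function visibleText(html: string): string {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<style[\s\S]*?<\/style>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

function evaluateCompleteness(html: string): Reason {
  const lower = html.toLowerCase();
  if (CHALLENGE_MARKERS.some((marker) => lower.includes(marker))) {
    return "bot_challenge";
  }
  // An empty SPA mount point with little visible text suggests the page needs JS rendering.
  const hasEmptyRoot = /<div[^>]+id=["'](root|app|__next)["'][^>]*>\s*<\/div>/i.test(html);
  if (hasEmptyRoot && visibleText(html).length < MIN_VISIBLE_TEXT) {
    return "hydration_skeleton";
  }
  if (html.length < MIN_CHARS) {
    return "too_small";
  }
  return "ok";
}
```

The order of the checks matters: a challenge page is often also small, so the more specific reason is tested first.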
Feedback & reporting
Report incomplete scrapes via /api/feedback. Domain classifications can be manually overridden by admins for fine-tuning.
Automatic strategy updates
Every scrape attempt updates the domain strategy cache, so success rates and confidence scores improve over time as failure rates fall.
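This page does not state how the confidence score is computed. One plausible approach, shown here purely as an assumption, is a smoothed success rate over the attempts recorded for a domain. `DomainStats`, `recordAttempt`, and `confidence` are hypothetical names.

```typescript
interface DomainStats {
  successes: number;
  failures: number;
}

// Return updated counters after one scrape attempt (the input is left unchanged).
function recordAttempt(stats: DomainStats, ok: boolean): DomainStats {
  return ok
    ? { ...stats, successes: stats.successes + 1 }
    : { ...stats, failures: stats.failures + 1 };
}

// Laplace-smoothed success rate: (successes + 1) / (attempts + 2).
// A domain with no history scores 0.5 and moves towards its observed rate.
function confidence(stats: DomainStats): number {
  const attempts = stats.successes + stats.failures;
  return (stats.successes + 1) / (attempts + 2);
}
```

The smoothing keeps a single early success or failure from pushing the score to 1 or 0, which matters when routing decisions depend on it.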
Reliability & speed you can count on
Domain strategies are cached in Redis for fast lookups. Unknown domains start on static mode and learn over time. Up to 4 automatic retry attempts with smart escalation ensure high success rates.
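The retry-and-escalate behaviour can be sketched as a small loop. The sketch assumes the rules given in the escalation pseudocode on this page (a hydration skeleton escalates to js, a bot challenge escalates to advanced) and the 4-attempt limit. Escalating from js to advanced when a skeleton persists is an additional assumption. The fetch itself is passed in as a callback, so the control flow can be read and run without network access.

```typescript
type Mode = "static" | "js" | "advanced";
type Reason = "ok" | "hydration_skeleton" | "bot_challenge";

const MAX_ATTEMPTS = 4; // "up to 4 attempts" per the text

// Choose the mode for the next attempt from the failure reason.
function nextMode(current: Mode, reason: Reason): Mode {
  if (reason === "bot_challenge") return "advanced";
  if (reason === "hydration_skeleton") {
    // Assumption: if JS rendering still yields a skeleton, move on to advanced.
    return current === "static" ? "js" : "advanced";
  }
  return current;
}

interface EscalationResult {
  success: boolean;
  lastMode: Mode;
  log: string[];
}

// `attempt` performs one fetch in the given mode and reports the evaluation result.
function scrapeWithEscalation(
  startMode: Mode,
  attempt: (mode: Mode) => Reason,
): EscalationResult {
  let mode = startMode;
  const log: string[] = [];
  for (let i = 0; i < MAX_ATTEMPTS; i++) {
    const reason = attempt(mode);
    log.push(`${mode}:${reason}`);
    if (reason === "ok") return { success: true, lastMode: mode, log };
    if (i < MAX_ATTEMPTS - 1) mode = nextMode(mode, reason);
  }
  return { success: false, lastMode: mode, log };
}
```

On success, `lastMode` is the natural value to write back to the domain strategy cache, so the next request for that domain can start there instead of at static.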
