How PayPerScrape Works

Your code sends a URL to PayPerScrape. We route the request to the scraping strategy best suited to the domain, fetch the page via ScraperAPI, evaluate the HTML for completeness, and return clean HTML with metadata.

Your Code: HTTP request
PayPerScrape: Strategy Router
Redis Cache: Domain strategies
HTML + Metadata: Clean response

The Complete Flow

1

Your Code → PayPerScrape

Send a POST request to /api/scrape with the URL you want to scrape. The request is validated and rate-limited.

2

Strategy Lookup & Routing

We check the Redis cache for a known strategy for the domain. If one is cached, we use its settings. If the domain is unknown, we start with static mode.

3

Completeness Evaluation & Response

We analyze the HTML for completeness, update the domain strategy cache, and return clean HTML with detailed metadata about the scraping process.

Step-by-step, technically explicit

Every request follows this flow. No magic, just deterministic routing and a battle-tested escalation engine.

01
Your code sends a request

Send a POST request containing the URL to scrape. Optional fields include render_js, strict_mode, wait_for, and timeout_ms.

POST https://api.payperscrape.com/scrape
{
  "url": "https://example.com/products",
  "render_js": false,
  "strict_mode": false
}
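
A minimal client-side sketch of building this request in JavaScript. The helper name `buildScrapeRequest` is hypothetical, and the JSON content type is an assumption; only the endpoint and the body fields come from the description above. Any authentication or payment headers the API may require are not shown.

```javascript
// Hypothetical helper: builds the arguments for a POST to the scrape endpoint.
// Only the endpoint and the body field names come from the page itself.
const SCRAPE_ENDPOINT = "https://api.payperscrape.com/scrape";

function buildScrapeRequest(url, options = {}) {
  // Throws a TypeError if the URL cannot be parsed.
  new URL(url);

  // Copy only the optional fields named in the documentation.
  const allowed = ["render_js", "strict_mode", "wait_for", "timeout_ms"];
  const body = { url };
  for (const key of allowed) {
    if (options[key] !== undefined) body[key] = options[key];
  }

  return {
    endpoint: SCRAPE_ENDPOINT,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" }, // assumed content type
      body: JSON.stringify(body),
    },
  };
}

// Usage (needs a runtime with a global fetch, such as Node 18+):
//   const { endpoint, init } = buildScrapeRequest("https://example.com/products", { render_js: false });
//   const response = await fetch(endpoint, init);
```

Keeping the request construction separate from the network call makes it easy to unit-test the payload without hitting the API.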
02
Request validation & rate limiting

We validate the URL, block private networks, enforce size limits, and apply strict per-IP + per-wallet rate limits.

// Validation steps
✓ URL normalization
✓ Private network protection
✓ Rate limit check (IP + wallet)
✓ Body size guardrails
✓ Abuse prevention middleware
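
The first two checks can be sketched as follows. This is an illustration, not PayPerScrape's actual validator: the function names, the reason strings, and the list of blocked ranges (common loopback, private, and link-local IPv4 ranges) are all assumptions. It relies on the standard `URL` class for normalization.

```javascript
// Sketch of URL normalization plus a private-network guard.
// The blocked ranges are common private/loopback/link-local IPv4 ranges,
// not a list confirmed by PayPerScrape.
function isPrivateHost(hostname) {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (host === "[::1]") return true; // IPv6 loopback as serialized by URL

  const match = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!match) return false; // not an IPv4 literal; DNS resolution is out of scope here
  const [a, b] = match.slice(1).map(Number);
  return (
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

function validateTargetUrl(rawUrl) {
  let parsed;
  try {
    parsed = new URL(rawUrl); // lowercases scheme and host
  } catch {
    return { ok: false, reason: "invalid_url" };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { ok: false, reason: "unsupported_scheme" };
  }
  if (isPrivateHost(parsed.hostname)) {
    return { ok: false, reason: "private_network" };
  }
  parsed.hash = ""; // fragments are never sent to the server
  return { ok: true, url: parsed.toString() };
}
```

A production guard would also need to resolve hostnames and re-check the resulting IP addresses, since a public-looking name can point at a private address.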
03
Domain strategy lookup

Every domain earns a cached strategy. If known, we reuse optimized settings. Unknown domains start with the cheapest static mode.

// Strategy lookup
strategy = redis.get("domain_strategy:example.com")

if (strategy) {
  mode = strategy.mode        // "static" | "js" | "advanced"
  render_js = strategy.render
  region = strategy.country
} else {
  mode = "static"  // cold start
}
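
A runnable version of the lookup above, with an in-memory `Map` standing in for Redis. The key format and the cached field names follow the pseudocode; the function names and the cold-start defaults for `render_js` and `region` are assumptions.

```javascript
// Sketch of the strategy lookup with a Map standing in for Redis.
// Key format and field names follow the pseudocode; cold-start defaults
// for render_js and region are assumptions.
const strategyStore = new Map();

function domainOf(url) {
  return new URL(url).hostname;
}

function lookupStrategy(url, store = strategyStore) {
  const cached = store.get(`domain_strategy:${domainOf(url)}`);
  if (cached) {
    return {
      mode: cached.mode, // "static" | "js" | "advanced"
      render_js: cached.render,
      region: cached.country,
      cached: true,
    };
  }
  // Cold start: cheapest mode, no JS rendering, no region preference.
  return { mode: "static", render_js: false, region: null, cached: false };
}
```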
04
Scraping (PayPerScrape Engine)

The HTML is fetched using the selected mode: static, JS rendering, or advanced. The engine makes up to 4 attempts, escalating between modes when an attempt comes back incomplete or blocked.

// Scraping attempt
response = scrapeEngine.fetch({
  url: "https://example.com/products",
  mode: "static",
  render_js: false,
  timeout: 35000
})

// If incomplete → escalate to JS
// If blocked/challenged → escalate to advanced mode
// Max 4 attempts with adaptive logic
05
Completeness evaluation

We evaluate the HTML for completeness: hydration skeletons, bot challenges, geoblocks, semantic structure, and baseline size.

// Completeness checks
✓ Hydration skeleton detection
✓ Bot challenge detection
✓ Geo-blocking indicators
✓ Semantic HTML score
✓ Content-size comparison
✓ Login-wall heuristics
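
To make a few of these checks concrete, here is a heuristic sketch. The `hydration_skeleton` and `bot_challenge` reason strings match the escalation pseudocode on this page; every pattern, the `too_small` reason, and the 2000-byte threshold are illustrative assumptions, not the engine's real rules.

```javascript
// Heuristic sketch of a completeness check. Patterns and thresholds are
// illustrative assumptions, not PayPerScrape's actual detection rules.
const CHALLENGE_PATTERNS = [/captcha/i, /verify you are (a )?human/i, /access denied/i];
const EMPTY_ROOT = /<div[^>]*id=["'](root|app|__next)["'][^>]*>\s*<\/div>/i;
const SEMANTIC_TAGS = /<(main|article|section|h1|h2|p|table|ul|ol)\b/gi;

function evaluateCompleteness(html, { minBytes = 2000 } = {}) {
  // Bot challenge: the page is a block or challenge screen, not content.
  if (CHALLENGE_PATTERNS.some((pattern) => pattern.test(html))) {
    return { complete: false, reason: "bot_challenge" };
  }
  // Hydration skeleton: an empty mount point that JavaScript would fill in.
  if (EMPTY_ROOT.test(html)) {
    return { complete: false, reason: "hydration_skeleton" };
  }
  // Semantic score and size: small pages with almost no content tags.
  const semanticCount = (html.match(SEMANTIC_TAGS) || []).length;
  const byteSize = new TextEncoder().encode(html).length;
  if (byteSize < minBytes && semanticCount < 3) {
    return { complete: false, reason: "too_small" };
  }
  return { complete: true, reason: "ok" };
}
```

Keyword patterns like these produce false positives (an article that merely mentions captchas, for example), which is one reason to combine several signals rather than trust any single one.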
06
Automatic strategy escalation

If incomplete, the engine escalates from static → JS → advanced, then updates the domain's cached strategy for future requests.

// Escalation logic
if (incomplete) {
  if (reason === "hydration_skeleton") escalate("js")
  if (reason === "bot_challenge")      escalate("advanced")

  // Attempt again with new strategy
  // Cache the result for the next request
}
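
The escalation rules and the 4-attempt cap can be combined into one loop, sketched below. `fetchPage` and `evaluate` are hypothetical injected functions, and the behavior for reasons without an escalation rule (retry in the same mode) is an assumption.

```javascript
// Sketch of the escalation loop. fetchPage(url, mode) and evaluate(html)
// are hypothetical injected functions; reason strings follow the pseudocode.
const MAX_ATTEMPTS = 4;

function nextMode(mode, reason) {
  if (reason === "bot_challenge") return "advanced";
  if (reason === "hydration_skeleton" && mode === "static") return "js";
  return mode; // no rule for this reason: retry with the same mode
}

async function scrapeWithEscalation(url, fetchPage, evaluate, startMode = "static") {
  let mode = startMode;
  let escalations = 0;
  let html = "";
  const attemptLog = [];

  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    html = await fetchPage(url, mode);
    const result = evaluate(html);
    attemptLog.push(`${mode}:${result.reason}`);
    if (result.complete) {
      return { html, mode, attempts: attempt, escalations, is_partial: false, attemptLog };
    }
    if (attempt === MAX_ATTEMPTS) break; // no further attempt to escalate for
    const escalated = nextMode(mode, result.reason);
    if (escalated !== mode) escalations += 1;
    mode = escalated;
  }
  // All attempts used: return the last HTML and flag it as partial.
  return { html, mode, attempts: MAX_ATTEMPTS, escalations, is_partial: true, attemptLog };
}
```

Injecting the fetcher and evaluator keeps the loop testable with fakes, and the returned `mode` is what a caller would write back to the domain strategy cache.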
07
Response with metadata

HTML is returned along with metadata: attempts, escalations, byte size, completeness, and decision reasoning.

{
  "status": "success",
  "html": "<!DOCTYPE html>...",
  "metadata": {
    "domain": "example.com",
    "attempts": 2,
    "escalations": 1,
    "byte_size": 45230,
    "is_partial": false,
    "domain_confidence": 0.85,
    "decision_reason": "ok"
  },
  "attempt_log": [
    "static:ok",
    "js:ok"
  ]
}
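
On the client side, the metadata can drive a simple accept-or-retry decision. The sketch below reads only fields shown in the response above; the helper name and the rule that partial results count as unusable are assumptions about what a caller might want.

```javascript
// Sketch of consuming the response shape shown above. Treating partial
// results as unusable is an assumption about client needs.
function summarizeScrape(response) {
  const meta = response.metadata || {};
  const log = response.attempt_log || [];
  return {
    usable: response.status === "success" && meta.is_partial === false,
    domain: meta.domain,
    escalated: (meta.escalations || 0) > 0,
    finalAttempt: log.length > 0 ? log[log.length - 1] : null,
  };
}
```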

We never guess blindly

Our detection logic combines multiple signals to route your requests correctly.

Redis-backed domain cache

Cached strategies are stored in Redis with a 5-day TTL. Known domains route instantly to their optimal strategy (static/js/advanced).
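
The expiry behavior can be illustrated with a small in-memory cache. This is a stand-in for Redis, not the production code: the class name is hypothetical, and in Redis itself the same effect comes from setting an expiry on the key (for example with the `EX` option of `SET`).

```javascript
// In-memory sketch of a cache with a 5-day TTL, standing in for Redis.
// The clock is injectable so expiry can be tested without waiting.
const FIVE_DAYS_MS = 5 * 24 * 60 * 60 * 1000;

class TtlCache {
  constructor(ttlMs = FIVE_DAYS_MS, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map();
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return null;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: behave like a cold start
      return null;
    }
    return entry.value;
  }
}
```

Once an entry expires, the domain is treated as unknown again and starts from static mode, so stale strategies cannot persist indefinitely.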

Completeness evaluation

We analyze the HTML for hydration skeletons, bot challenges, geoblocks, semantic content, and size comparisons to detect incomplete responses.

Feedback & reporting

Report incomplete scrapes via /api/feedback. Domain classifications can be manually overridden by admins for fine-tuning.

Automatic strategy updates

Every scrape attempt updates the domain strategy cache. Success rates, failure rates, and confidence scores improve over time.
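
One simple way such statistics could be maintained is sketched below. The page does not describe the actual confidence formula, so the plain success ratio used here, along with the function and field names, is purely an assumption for illustration.

```javascript
// Sketch of per-domain statistics. The real confidence formula is not
// documented; a plain success ratio is assumed here for illustration.
function recordAttempt(stats, succeeded) {
  const successes = stats.successes + (succeeded ? 1 : 0);
  const failures = stats.failures + (succeeded ? 0 : 1);
  return {
    successes,
    failures,
    confidence: successes / (successes + failures),
  };
}

// Usage: fold a sequence of attempt outcomes into running statistics.
function summarizeOutcomes(outcomes) {
  return outcomes.reduce(recordAttempt, { successes: 0, failures: 0, confidence: 0 });
}
```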

Reliability & speed you can count on

Domain strategies are cached in Redis with a 5-day TTL for fast lookups. Unknown domains start cheap (static mode), and their cached strategy improves over time. Up to 4 automatic retry attempts with escalation raise the chance of returning a complete page.

Fast Redis lookups for cached domain strategies
Up to 4 automatic retry attempts with escalation
Intelligent escalation: static → JS → advanced
Performance metrics
Strategy lookup (Redis): <10ms
Static scrape (baseline): 1-3s
JS rendering scrape: 3-8s
Max retry attempts: 4 attempts