How PayPerScrape Works

Your code sends a URL to PayPerScrape. We route the request to the appropriate scraping strategy, fetch the page, evaluate the result for completeness, and return clean HTML with metadata.

Your Code: HTTP request

PayPerScrape: Strategy Router

Redis Cache: Domain strategies

HTML + Metadata: Clean response

The Complete Flow

Your Code → PayPerScrape

Send a POST request to /api/scrape with the URL you want to scrape. The request is validated and rate-limited.

Strategy Lookup & Routing

We check the Redis cache for a known domain strategy. If one is cached, we use its settings. If the domain is unknown, we start with static mode.

Completeness Evaluation & Response

We analyze the HTML for completeness, update the domain strategy cache, and return clean HTML with detailed metadata about the scraping process.

Step-by-step, technically explicit

Every request follows this flow. No magic: just deterministic routing and a battle-tested escalation engine.

Your code sends a request

Use the x402 SDK to send a POST request with payment. The SDK handles payment signing and includes it in the request headers.

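import { createSigner, wrapFetchWithPayment } from "@coinbase/x402";

// Create a signer for the Base network from your wallet's private key
const signer = await createSigner("base", privateKey);
// Wrap fetch so the payment is signed and attached to the request headers
const fetchWithPayment = wrapFetchWithPayment(fetch, signer);

const response = await fetchWithPayment(
  "https://payperscrape.com/api/scrape",
  {
    method: "POST",
    // Declare the JSON body explicitly so the server parses it correctly
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      url: "https://example.com"
    })
  }
);

if (!response.ok) {
  throw new Error(`Scrape failed with HTTP ${response.status}`);
}
const body = await response.json();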

Request validation & rate limiting

We validate the URL, block private networks, enforce size limits, and apply strict per-IP + per-wallet rate limits.

// Validation steps
✓ URL normalization
✓ Private network protection
✓ Rate limit check (IP + wallet)
✓ Body size guardrails
✓ Abuse prevention middleware
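
The first two checks can be sketched as follows. This is a minimal illustration, not PayPerScrape's actual implementation: the function names `isPrivateHost` and `normalizeUrl` are hypothetical, and the sketch only handles IPv4 literals and `localhost`. A production check would also need to resolve DNS and cover IPv6 ranges.

```typescript
// Hypothetical helpers for illustration; not PayPerScrape's real code.

/** Returns true for "localhost" and for loopback, link-local, or RFC 1918 IPv4 literals. */
function isPrivateHost(hostname: string): boolean {
  const host = hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  const m = host.match(/^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/);
  if (!m) return false; // not an IPv4 literal; DNS and IPv6 are out of scope here
  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 0 ||
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254)
  );
}

/** Returns a normalized URL string, or null if the URL should be rejected. */
function normalizeUrl(input: string): string | null {
  let parsed: URL;
  try {
    parsed = new URL(input.trim());
  } catch {
    return null; // unparseable input
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") return null;
  if (isPrivateHost(parsed.hostname)) return null;
  parsed.hash = ""; // fragments are never sent to the server
  return parsed.toString();
}
```

The WHATWG `URL` parser lowercases the host and canonicalizes IPv4 notations, so the private-range test runs against a normalized hostname rather than the raw input.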

Domain strategy lookup

Every domain earns a cached strategy. If the domain is known, we reuse its optimized settings. Unknown domains start in static mode.

// Strategy lookup
strategy = redis.get("domain_strategy:example.com")

if (strategy) {
  mode = strategy.mode        // "static" | "js" | "advanced"
  render_js = strategy.render
  region = strategy.country
} else {
  mode = "static"    // cold start
  render_js = false  // static mode does not render JS
}
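
The same lookup as a typed sketch. Several things here are assumptions: a `Map` stands in for Redis, strategies are assumed to be stored as JSON strings, and the names `StrategyStore`, `COLD_START`, and `lookupStrategy` are hypothetical. Only the `domain_strategy:<domain>` key format and the field names come from the pseudocode above.

```typescript
type Mode = "static" | "js" | "advanced";

interface DomainStrategy {
  mode: Mode;
  render: boolean;
  country?: string;
}

// Stand-in for Redis: anything with a string get() works for this sketch.
interface StrategyStore {
  get(key: string): string | undefined;
}

// Assumed cold-start default: static mode, no JS rendering.
const COLD_START: DomainStrategy = { mode: "static", render: false };

function lookupStrategy(store: StrategyStore, targetUrl: string): DomainStrategy {
  const domain = new URL(targetUrl).hostname;
  const raw = store.get(`domain_strategy:${domain}`);
  if (raw === undefined) return { ...COLD_START };
  try {
    return JSON.parse(raw) as DomainStrategy;
  } catch {
    return { ...COLD_START }; // treat a corrupt entry like an unknown domain
  }
}

// Usage with an in-memory store
const store = new Map<string, string>([
  ["domain_strategy:example.com", JSON.stringify({ mode: "js", render: true, country: "us" })],
]);
const known = lookupStrategy(store, "https://example.com/products");
const unknown = lookupStrategy(store, "https://unknown.test/");
```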

Scraping (PayPerScrape Engine)

HTML is fetched using the selected mode: static, JS rendering, or advanced. The engine makes up to 4 attempts, escalating between modes according to the rules below.

// Scraping attempt
response = scrapeEngine.fetch({
  url: "https://example.com/products",
  mode: "static",
  render_js: false,
  timeout: 35000
})

// If incomplete → escalate to JS
// If blocked/challenged → escalate to advanced mode
// Max 4 attempts with adaptive logic
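
A bounded retry loop along these lines might look like the sketch below. The callback `fetchAndEvaluate` and the `Verdict` values are hypothetical stand-ins for the real engine. The step from JS to advanced on a repeated incomplete result is also an assumption, since the comments above only name the static-to-JS and blocked-to-advanced transitions.

```typescript
type Mode = "static" | "js" | "advanced";
type Verdict = "ok" | "incomplete" | "blocked";

interface Attempt {
  mode: Mode;
  verdict: Verdict;
}

const MAX_ATTEMPTS = 4;

// fetchAndEvaluate is a hypothetical callback: fetch the page in `mode`, then judge the HTML.
async function scrapeWithEscalation(
  startMode: Mode,
  fetchAndEvaluate: (mode: Mode) => Promise<Verdict>,
): Promise<Attempt[]> {
  const log: Attempt[] = [];
  let mode = startMode;

  for (let i = 0; i < MAX_ATTEMPTS; i++) {
    const verdict = await fetchAndEvaluate(mode);
    log.push({ mode, verdict });
    if (verdict === "ok") break;

    if (verdict === "blocked") {
      mode = "advanced"; // blocked or challenged: jump straight to advanced
    } else if (mode === "static") {
      mode = "js"; // incomplete static page: try JS rendering
    } else {
      mode = "advanced"; // assumed: incomplete after JS also moves to advanced
    }
  }
  return log;
}
```

Returning the full attempt log rather than only the final result keeps the escalation path visible to the caller.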

Completeness evaluation

We evaluate the HTML for completeness: hydration skeletons, bot challenges, geoblocks, semantic structure, and baseline size.

// Completeness checks
✓ Hydration skeleton detection
✓ Bot challenge detection
✓ Geo-blocking indicators
✓ Semantic HTML score
✓ Content-size comparison
✓ Login-wall heuristics
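
Three of these checks can be illustrated with simple string heuristics. The thresholds, marker strings, and the `too_small` reason below are invented for the example, and a real evaluator would combine many more signals than this.

```typescript
interface CompletenessResult {
  complete: boolean;
  reason: "ok" | "hydration_skeleton" | "bot_challenge" | "too_small";
}

// Illustrative values only; not PayPerScrape's actual thresholds or markers.
const MIN_BYTES = 512;
const CHALLENGE_MARKERS = ["captcha", "verify you are human", "access denied"];

function evaluateCompleteness(html: string): CompletenessResult {
  const lower = html.toLowerCase();

  // Bot challenge: known challenge phrases anywhere in the page.
  if (CHALLENGE_MARKERS.some((marker) => lower.includes(marker))) {
    return { complete: false, reason: "bot_challenge" };
  }

  // Hydration skeleton: an empty app container and almost no visible text.
  const visibleText = lower
    .replace(/<script[\s\S]*?<\/script>/g, "")
    .replace(/<style[\s\S]*?<\/style>/g, "")
    .replace(/<[^>]+>/g, " ")
    .replace(/\s+/g, " ")
    .trim();
  const hasEmptyRoot = /<div id="(root|app|__next)">\s*<\/div>/.test(lower);
  if (hasEmptyRoot && visibleText.length < 200) {
    return { complete: false, reason: "hydration_skeleton" };
  }

  // Size baseline: very small documents are unlikely to be full pages.
  if (new TextEncoder().encode(html).length < MIN_BYTES) {
    return { complete: false, reason: "too_small" };
  }
  return { complete: true, reason: "ok" };
}
```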

Automatic strategy escalation

If incomplete, the engine escalates from static → JS → advanced, then updates the domain's cached strategy for future requests.

// Escalation logic
if (incomplete) {
  if (reason === "hydration_skeleton") escalate("js")
  if (reason === "bot_challenge")      escalate("advanced")

  // Attempt again with new strategy
  // Cache the result for the next request
}
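
The reason-to-mode mapping and the cache write can be expressed as two small functions. This is a sketch under assumptions: `nextMode` and `rememberStrategy` are hypothetical names, a `Map` stands in for Redis, and the rule that a repeated failure climbs one more rung is a guess at behavior the pseudocode above leaves open.

```typescript
type Mode = "static" | "js" | "advanced";
type Reason = "hydration_skeleton" | "bot_challenge";

// Escalation ladder, cheapest mode first.
const ORDER: Mode[] = ["static", "js", "advanced"];

function nextMode(current: Mode, reason: Reason): Mode {
  const target: Mode = reason === "bot_challenge" ? "advanced" : "js";
  const currentIdx = ORDER.indexOf(current);
  const targetIdx = ORDER.indexOf(target);
  // Assumed rule: if already at or past the target, climb one rung, capped at "advanced".
  const idx = targetIdx > currentIdx ? targetIdx : Math.min(currentIdx + 1, ORDER.length - 1);
  return ORDER[idx];
}

// Stores the mode that finally worked so the next request can start there.
function rememberStrategy(cache: Map<string, Mode>, domain: string, mode: Mode): void {
  cache.set(`domain_strategy:${domain}`, mode);
}
```

Because the result never falls below the current mode, a domain that already needs JS rendering is not retried in static mode within the same request.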

Response with metadata

HTML is returned along with metadata: attempts, escalations, byte size, completeness, and decision reasoning.

{
  "status": "success",
  "html": "<!DOCTYPE html>...",
  "metadata": {
    "domain": "example.com",
    "attempts": 2,
    "escalations": 1,
    "byte_size": 45230,
    "is_partial": false,
    "domain_confidence": 0.85,
    "decision_reason": "ok"
  },
  "attempt_log": [
    "static:ok",
    "js:ok"
  ]
}
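
On the client side, the response can be given a type that mirrors the example above. The interface names and the `isUsable` helper are hypothetical, and the field types are inferred from this single sample rather than from a published schema.

```typescript
interface ScrapeMetadata {
  domain: string;
  attempts: number;
  escalations: number;
  byte_size: number;
  is_partial: boolean;
  domain_confidence: number;
  decision_reason: string;
}

interface ScrapeResponse {
  status: string;
  html: string;
  metadata: ScrapeMetadata;
  attempt_log: string[];
}

/** True when the scrape succeeded and the HTML was not flagged as partial. */
function isUsable(res: ScrapeResponse): boolean {
  return res.status === "success" && !res.metadata.is_partial;
}

// Usage with the sample response shown above
const sample: ScrapeResponse = JSON.parse(`{
  "status": "success",
  "html": "<!DOCTYPE html>...",
  "metadata": {
    "domain": "example.com",
    "attempts": 2,
    "escalations": 1,
    "byte_size": 45230,
    "is_partial": false,
    "domain_confidence": 0.85,
    "decision_reason": "ok"
  },
  "attempt_log": ["static:ok", "js:ok"]
}`);
const usable = isUsable(sample);
```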

We never guess blindly

Our detection logic combines multiple signals to route your requests correctly.

Redis-backed domain cache

Cached strategies are stored in Redis. Known domains route immediately to their optimal strategy (static/js/advanced).

Completeness evaluation

We analyze the HTML for hydration skeletons, bot challenges, geoblocks, semantic content, and size comparisons to detect incomplete responses.

Feedback & reporting

Report incomplete scrapes via /api/feedback. Domain classifications can be manually overridden by admins for fine-tuning.

Automatic strategy updates

Every scrape attempt updates the domain strategy cache. Success and failure rates are tracked per domain, and confidence scores become more reliable over time.
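
One simple way to derive a confidence score from per-domain counts is a smoothed success rate. The formula below is an assumption chosen for illustration. The page does not say how `domain_confidence` is computed, and `DomainStats`, `recordAttempt`, and `confidence` are hypothetical names.

```typescript
interface DomainStats {
  successes: number;
  failures: number;
}

/** Returns updated counts without mutating the input. */
function recordAttempt(stats: DomainStats, succeeded: boolean): DomainStats {
  return succeeded
    ? { ...stats, successes: stats.successes + 1 }
    : { ...stats, failures: stats.failures + 1 };
}

// Laplace-smoothed success rate: 0.5 with no data, converging on the observed rate.
function confidence(stats: DomainStats): number {
  return (stats.successes + 1) / (stats.successes + stats.failures + 2);
}
```

The smoothing keeps a domain with a single successful scrape from being scored as fully trusted.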

Reliability & speed you can count on

Domain strategies are cached in Redis for fast lookups. Unknown domains start in static mode, and their strategy is refined as more requests are observed. Up to 4 automatic attempts with escalation help keep success rates high.

Fast Redis lookups for cached domain strategies

Up to 4 automatic retry attempts with escalation

Intelligent escalation: static → JS → advanced

Performance metrics

Strategy lookup (Redis): <10 ms

Static scrape (baseline): 1-3 s

JS rendering scrape: 3-8 s

Max retry attempts: 4