Scraping in 2026 is mostly about deciding whose problem the bot detection is. There's a real spectrum, and the right pick is rarely the same for two scrapers.
Here's the decision tree we use now.
Layer 1 — start with HTTP
If the page renders the data into HTML on the server, just fetch. Add a normal
User-Agent, the Accept and Accept-Language headers a browser would send, and
respect robots.txt. We use undici directly; no higher-level scraping library is necessary.
This works for ~30% of sites we touch — small business directories, gov data, classifieds, old WordPress sites. It's the cheapest, most maintainable layer. Use it whenever it works.
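Roughly what that looks like for us — a minimal sketch with undici, where the header values are illustrative and `allowedByRobots` is a deliberately naive robots.txt check (a real robots.txt parser is stricter about user-agent groups and wildcards):

```ts
// Minimal Layer 1 fetch with undici. fetchPage and allowedByRobots are just
// names for this sketch, not part of any library.
import { request } from "undici";

const BROWSER_HEADERS = {
  "user-agent":
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
  accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
  "accept-language": "en-US,en;q=0.9",
};

// Naive check: refuse the fetch if any Disallow rule prefixes the path.
async function allowedByRobots(url: URL): Promise<boolean> {
  const res = await request(new URL("/robots.txt", url.origin), { headers: BROWSER_HEADERS });
  if (res.statusCode !== 200) return true; // no robots.txt: treat as allowed
  const rules = await res.body.text();
  return !rules
    .split("\n")
    .filter((line) => line.toLowerCase().startsWith("disallow:"))
    .map((line) => line.slice("disallow:".length).trim())
    .some((prefix) => prefix !== "" && url.pathname.startsWith(prefix));
}

export async function fetchPage(raw: string): Promise<string> {
  const url = new URL(raw);
  if (!(await allowedByRobots(url))) throw new Error(`robots.txt disallows ${url.pathname}`);
  const { statusCode, body } = await request(url, { headers: BROWSER_HEADERS });
  if (statusCode >= 400) throw new Error(`HTTP ${statusCode} for ${url.href}`);
  return body.text();
}
```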
Layer 2 — headless browser, but yours
When the page needs JavaScript to render the data — most modern SPAs — switch to Playwright.
Run it in your own infra. Use stealth plugins to flatten the obvious "I am Playwright"
fingerprints (navigator.webdriver, the missing window.chrome shim, the wrong WebGL
renderer string).
This works for another ~50%. Anti-bot vendors can still detect you, but most sites don't buy one.
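A sketch of one common stealth setup — playwright-extra wrapping Playwright's chromium, with the puppeteer-extra stealth plugin, which applies most of its evasions to Playwright too. The URL and wait strategy are placeholders:

```ts
// playwright-extra lets you register puppeteer-extra plugins against Playwright.
import { chromium } from "playwright-extra";
import StealthPlugin from "puppeteer-extra-plugin-stealth";

chromium.use(StealthPlugin());

export async function scrapeSpa(url: string): Promise<string> {
  const browser = await chromium.launch({ headless: true });
  try {
    const page = await browser.newPage();
    // networkidle gives the SPA time to fetch and render its data.
    await page.goto(url, { waitUntil: "networkidle" });
    return await page.content();
  } finally {
    await browser.close();
  }
}
```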
Layer 3 — rent the browser
For Cloudflare-protected, DataDome-protected, PerimeterX-protected, or "Press and Hold" sites, your local Playwright loses. The right answer is to rent a browser that's hard to fingerprint as automation: Browserless, Bright Data Scraping Browser, ScrapingBee. These ship with a residential IP pool and continuous fingerprint updates.
You're paying for someone else to do the cat-and-mouse. The price is real ($1–$5 per 1000 requests) but so is the time you'd spend keeping up otherwise.
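In code, the switch is small: the same Playwright calls, but connected to the vendor's hosted Chromium over CDP instead of a local launch. The endpoint and token below are placeholders — each provider documents its own URL format:

```ts
import { chromium } from "playwright";

// Vendor-specific WebSocket endpoint; not a real credential.
const WS_ENDPOINT =
  process.env.REMOTE_BROWSER_WS ?? "wss://browser.example-vendor.com?token=YOUR_TOKEN";

export async function scrapeProtected(url: string): Promise<string> {
  const browser = await chromium.connectOverCDP(WS_ENDPOINT);
  try {
    // Reuse the default context if the vendor exposes one, otherwise create our own.
    const context = browser.contexts()[0] ?? (await browser.newContext());
    const page = await context.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded" });
    return await page.content();
  } finally {
    await browser.close();
  }
}
```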
Layer 4 — stop scraping, start partnering
The single best scraper we ever shipped was a phone call. The site we wanted owned a direct competitor to our client. We called them. They sold us an API for less than we were spending on proxies.
If the data is core to your client's business, get on the phone before you write code. You'll often be surprised.
Picking the layer
Three questions:
- Does `curl` get the data? If yes, you're at Layer 1. Ship it.
- Is the site mainstream or commercial enough to use anti-bot software? If no, Layer 2.
- Is the value worth $1k+/month in tooling, plus engineer time on fingerprint maintenance? If yes, Layer 3. If no, Layer 4 — re-scope or partner.
We've seen too many teams jump straight to Layer 3 because it sounds robust. It is. It's also slow, expensive, and brittle in different ways. Most sites you'll ever scrape live at Layer 1 or Layer 2.
Operational notes
- Queue everything. Even a 5-page-per-minute scraper will eventually hit 30 pages a minute by accident, and you'll get banned. We use BullMQ on Redis (sketch after this list).
- Log every request: URL, status, latency, response bytes. When something starts failing, this is the first place you look.
- Run scrapers from a single fixed IP region. Rotating regions changes too many variables when you're debugging.
- Cache aggressively. The cheapest request is the one you don't make.
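A sketch of the first two notes together, assuming BullMQ on ioredis — the queue name, limiter numbers, and log shape are our choices, not anything standard:

```ts
import { Queue, Worker } from "bullmq";
import IORedis from "ioredis";
import { request } from "undici";

const connection = new IORedis({ maxRetriesPerRequest: null });

export const scrapeQueue = new Queue("scrape", { connection });

new Worker<{ url: string }>(
  "scrape",
  async (job) => {
    const started = Date.now();
    const { statusCode, body } = await request(job.data.url);
    const html = await body.text();
    // Log every request: URL, status, latency, response bytes.
    console.log(
      JSON.stringify({
        url: job.data.url,
        statusCode,
        ms: Date.now() - started,
        bytes: Buffer.byteLength(html),
      }),
    );
    return html;
  },
  {
    connection,
    // Hard ceiling: at most 5 jobs per minute, no matter how fast producers enqueue.
    limiter: { max: 5, duration: 60_000 },
  },
);
```

Producers just call `scrapeQueue.add("page", { url })`; the worker-side limiter is what enforces politeness, so a fast or buggy producer only grows the queue instead of the request rate.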
The pattern that's worked best for us: small Playwright workers behind a queue, with a fingerprint preset per target site, and a fallback to a rented browser only for the specific routes that need it. Most clients don't need a Bright Data subscription — they need an honest queue and a polite UA.
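Concretely, the "fingerprint preset per target site, rented browser only for specific routes" part can be as small as a config map. Everything here — the host, the route pattern, and the field names — is hypothetical:

```ts
type SitePreset = {
  layer: 1 | 2;
  userAgent: string;
  locale: string;
  // Routes worth the rented-browser price; everything else stays on our infra.
  rentedBrowserRoutes: RegExp[];
};

const presets: Record<string, SitePreset> = {
  "shop.example.com": {
    layer: 2,
    userAgent:
      "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    locale: "en-US",
    rentedBrowserRoutes: [/^\/checkout-protected\//],
  },
};

export function pickWorker(host: string, path: string): "http" | "playwright" | "rented" {
  const preset = presets[host];
  if (!preset) return "http"; // Layer 1 until proven otherwise
  if (preset.rentedBrowserRoutes.some((route) => route.test(path))) return "rented";
  return preset.layer === 1 ? "http" : "playwright";
}
```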