Learn about Bright Data’s Scraping Browser solution, how to get started, and some tips for best use.
Bright Data’s Scraping Browser
Scraping Browser is one of our proxy-unblocking solutions, designed to let you focus on your browser-based data collection while we take care of the full proxy and unblocking infrastructure for you.
You can access and navigate target websites via libraries such as Puppeteer or Playwright and interact with each site's HTML to extract the data you need.
Behind the scenes, Scraping Browser combines our complete proxy infrastructure with our dynamic unlocking capabilities to get you the exact data you need, wherever it may be.
Best for
- Puppeteer/Playwright integration
- Navigating through a website, clicking buttons, scrolling to load a full page, hovering, and more
- Teams that don’t have a reliable browser-unblocking infrastructure in-house
Quick start
- Sign in to your Bright Data control panel
  - If you haven’t yet signed up for Bright Data, you can sign up for free, and when adding your payment method, you’ll receive a $5 credit to get you started!
- Create your new Scraping Browser proxy
  - Navigate to the ‘My Proxies’ page, and under ‘Scraping Browser’ click ‘Get started’.
    Note: If you already have an active proxy, simply choose ‘Add proxy’ at the top right.
  - On the ‘Create a new proxy’ page, choose and input a name for your new Scraping Browser proxy zone.
    Note: Please select a meaningful name, as the zone’s name cannot be changed once created.
  - To create and save your proxy, click ‘Add proxy’.
A note on account verification:
If you haven’t yet added a payment method, you’ll be prompted to add one at this point in order to verify your account. If it’s your first time using Bright Data, you’ll also receive a $5 bonus credit to get you started!
Be advised: You will not be charged anything at this point; this is solely for verification purposes.
- Creating your first Scraping Browser session in Node.js or Python
Once your account is verified, you can create your first browser session.
In your proxy zone’s ‘Access parameters’ tab, you’ll find your API credentials, which include your Username (Customer ID), Zone name (attached to the username), and Password. You will use these in the following integration.
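As a sketch of how these access parameters fit together, the snippet below assembles the username and endpoint strings used by both examples in this section. All values here are placeholders; substitute your own credentials from the ‘Access parameters’ tab.

```python
# Assemble Scraping Browser connection strings from a zone's 'Access parameters'.
# All values below are placeholders, not real credentials.
customer_id = 'CUSTOMER_ID'   # your Customer ID
zone = 'ZONE_NAME'            # your Scraping Browser zone name
password = 'ZONE_PASSWORD'    # the zone's password

# Username format: brd-customer-<ACCOUNT ID>-zone-<ZONE NAME>
auth = f'brd-customer-{customer_id}-zone-{zone}:{password}'

# WebSocket endpoint for Puppeteer (wss://); Playwright uses the same
# host and port with an https:// scheme, as shown in the examples below.
ws_endpoint = f'wss://{auth}@zproxy.lum-superproxy.io:9222'
print(ws_endpoint)
```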
Node.js Example
Install puppeteer-core (a lightweight package that doesn't bundle its own browser distribution):
npm i puppeteer-core
See the example script below (swap in your credentials, zone, and target URL):
const puppeteer = require('puppeteer-core');

// should look like 'brd-customer-<ACCOUNT ID>-zone-<ZONE NAME>:<PASSWORD>'
const auth = 'USERNAME:PASSWORD';

async function run() {
  let browser;
  try {
    browser = await puppeteer.connect({
      browserWSEndpoint: `wss://${auth}@zproxy.lum-superproxy.io:9222`,
    });
    const page = await browser.newPage();
    page.setDefaultNavigationTimeout(2 * 60 * 1000);
    await page.goto('https://example.com');
    const html = await page.evaluate(() => document.documentElement.outerHTML);
    console.log(html);
  } catch (e) {
    console.error('run failed', e);
  } finally {
    await browser?.close();
  }
}

if (require.main === module) {
  run();
}
Run the script:
node script.js
Python Example
Install Playwright
pip3 install playwright
See the example script below (swap in your credentials, zone, and target URL):
import asyncio
from playwright.async_api import async_playwright

# should look like 'brd-customer-<ACCOUNT ID>-zone-<ZONE NAME>:<PASSWORD>'
auth = 'USERNAME:PASSWORD'
browser_url = f'https://{auth}@zproxy.lum-superproxy.io:9222'

async def main():
    async with async_playwright() as pw:
        print('connecting')
        browser = await pw.chromium.connect_over_cdp(browser_url)
        print('connected')
        page = await browser.new_page()
        print('goto')
        await page.goto('https://example.com', timeout=120000)
        print('done, evaluating')
        print(await page.evaluate('() => document.documentElement.outerHTML'))
        await browser.close()

asyncio.run(main())
Run the script
python scrape.py
Additional Info and Resources
Scraping Browser Demo
Blocking Requests
To save bandwidth, you can block requests to endpoints that are not required.
See an example of this below:
// connect to a remote browser...
const blockedUrls = ['*doubleclick.net*'];
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Network.enable');
await client.send('Network.setBlockedURLs', {urls: blockedUrls});
await page.goto('https://washingtonpost.com');
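If you are working from the Python/Playwright side instead, a similar effect can be achieved with `page.route`, which intercepts requests before they are sent. The sketch below (the patterns are illustrative) separates the URL-matching logic into a plain function and shows the hypothetical Playwright wiring in comments:

```python
from fnmatch import fnmatch

# Glob patterns of URLs we don't need; '*doubleclick.net*' mirrors the
# blockedUrls example above.
BLOCKED_PATTERNS = ['*doubleclick.net*']

def should_block(url: str, patterns=BLOCKED_PATTERNS) -> bool:
    """Return True if the URL matches any blocked pattern."""
    return any(fnmatch(url, p) for p in patterns)

# Hypothetical wiring with Playwright's async API:
#   async def handler(route):
#       if should_block(route.request.url):
#           await route.abort()
#       else:
#           await route.continue_()
#   await page.route('**/*', handler)
```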
Country Targeting
When using the Scraping Browser, the same country-targeting parameter is available as in our other proxy products.
When sending your request, add the -country flag after your zone name, followed by the 2-letter ISO code for that country.
In the example below, we added -country-us to our request, so the request will originate from the United States ("us").
curl --proxy zproxy.lum-superproxy.io:22225 --proxy-user brd-customer-<CUSTOMER_ID>-zone-<ZONE_NAME>-country-us:<ZONE_PASSWORD> "http://target.site"
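Since the -country flag is part of the username, the same suffix can presumably be appended to the zone portion of the username used in the browser endpoint as well; the sketch below builds such an endpoint string with placeholder values:

```python
# Sketch: appending -country-us to the zone portion of the username when
# connecting over CDP. All values below are placeholders.
customer_id = 'CUSTOMER_ID'
zone = 'ZONE_NAME'
password = 'ZONE_PASSWORD'
country = 'us'  # 2-letter ISO country code

auth = f'brd-customer-{customer_id}-zone-{zone}-country-{country}:{password}'
browser_url = f'https://{auth}@zproxy.lum-superproxy.io:9222'
print(browser_url)
```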
EU region
You can target the entire European Union region in the same manner as "Country" above by adding "eu" after "country" in your request: -country-eu
Requests sent using -country-eu will use IPs from one of the countries below, which are included automatically within "eu":
AL, AZ, KG, BA, UZ, BI, XK, SM, DE, AT, CH, UK, GB, IE, IM, FR, ES, NL, IT, PT, BE, AD, MT, MC, MA, LU, TN, DZ, GI, LI, SE, DK, FI, NO, AX, IS, GG, JE, EU, GL, VA, FX, FO
Note: The allocation of a country within the EU is random.
Troubleshooting
Viewing CAPTCHA Solver Status in CDP
When Scraping Browser encounters a CAPTCHA, the following events are triggered to show you the current phase of our built-in CAPTCHA solver (and can be viewed within CDP):

| Event | Description |
| --- | --- |
| Captcha.detected | Scraping Browser has encountered a CAPTCHA and has begun to solve it |
| Captcha.solveFinished | Scraping Browser successfully solved the CAPTCHA |
| Captcha.solveFailed | Scraping Browser failed to solve the CAPTCHA |
In the case of a failure to solve a CAPTCHA, please retry and/or send a support request with the exact issue encountered.
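To illustrate how these events might be consumed, here is a sketch that reduces a stream of Captcha.* event names to a final solver status; the commented wiring to a Playwright CDP session is hypothetical and only shows where such a handler could be registered:

```python
# Map a sequence of Captcha.* CDP event names to a final solver status.
def captcha_status(event_names):
    status = 'no_captcha'
    for name in event_names:
        if name == 'Captcha.detected':
            status = 'solving'
        elif name == 'Captcha.solveFinished':
            status = 'solved'
        elif name == 'Captcha.solveFailed':
            status = 'failed'
    return status

# Hypothetical wiring with a Playwright CDP session (illustrative only):
#   client = await page.context.new_cdp_session(page)
#   seen = []
#   for event in ('Captcha.detected', 'Captcha.solveFinished', 'Captcha.solveFailed'):
#       client.on(event, lambda _payload, e=event: seen.append(e))
#   ...later: print(captcha_status(seen))
```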