Getting started with Scraping Browser

Learn about Bright Data’s Scraping Browser solution, how to get started, and some tips for best use.

Bright Data’s Scraping Browser

Scraping Browser is one of our proxy-unblocking solutions. It is designed to let you focus on your browser-based data collection while we take care of the full proxy and unblocking infrastructure for you.

You can now easily access and navigate target websites via libraries such as Puppeteer or Playwright and interact with the site's HTML code to extract the data you need.

Behind the scenes, our Scraping Browser solution incorporates our complete proxy infrastructure along with our dynamic unlocking capabilities to get you the exact data you need wherever it may be.

Best for

    • Puppeteer/Playwright integration

    • Navigating through a website, clicking buttons, scrolling to load a full page, hovering, and more

    • Teams that don’t have a reliable browser unblocking infrastructure in-house 

Quick start

  1. Sign in to your Bright Data control panel
    • If you haven’t yet signed up for Bright Data, you can sign up for free, and when adding your payment method, you’ll receive a $5 credit to get you started!
  2. Create your new Scraping Browser proxy
    • Navigate to the ‘My Proxies’ page, and under ‘Scraping Browser’ click ‘Get started’
      Note: If you already have an active proxy, simply choose ‘Add proxy’ at the top right
  3. In the ‘Create a new proxy’ page, enter a name for your new Scraping Browser proxy zone
    Note: Please select a meaningful name, as the zone's name cannot be changed once created

  4. To create and save your proxy, click ‘Add proxy’

    A note on Account verification:

    If you haven’t yet added a payment method, you’ll be prompted to add one at this point in order to verify your account. If it’s your first time using Bright Data, then you’ll also receive a $5 bonus credit to get you started!

    Be advised: You will not be charged anything at this point and this is solely for verification purposes.

     

  5. Creating your first Scraping Browser session in Node.js or Python

    After verifying your account above, you can now create your first browser session.

    In your proxy zone’s ‘Access parameters’ tab, you’ll find your API credentials: your Username (Customer ID), Zone name (appended to the Username), and Password. You will use them in the following integration.

Node.js Example

Install puppeteer-core (a lightweight package that connects to an existing browser rather than downloading its own):

npm i puppeteer-core

See the example script below (swap in your credentials, zone, and target URL):

const puppeteer = require('puppeteer-core');

// should look like 'brd-customer-<ACCOUNT ID>-zone-<ZONE NAME>:<PASSWORD>'
const auth = 'USERNAME:PASSWORD';

async function run() {
  let browser;
  try {
    browser = await puppeteer.connect({
      browserWSEndpoint: `wss://${auth}@zproxy.lum-superproxy.io:9222`,
    });
    const page = await browser.newPage();
    page.setDefaultNavigationTimeout(2 * 60 * 1000);
    await page.goto('https://example.com');
    const html = await page.evaluate(() => document.documentElement.outerHTML);
    console.log(html);
  } catch (e) {
    console.error('run failed', e);
  } finally {
    await browser?.close();
  }
}

if (require.main === module) {
  run();
}

Run the script:

node script.js

Python Example

Install Playwright

pip3 install playwright

See the example script below (swap in your credentials, zone, and target URL):

import asyncio
from playwright.async_api import async_playwright

# should look like 'brd-customer-<ACCOUNT ID>-zone-<ZONE NAME>:<PASSWORD>'
auth = 'USERNAME:PASSWORD'
browser_url = f'https://{auth}@zproxy.lum-superproxy.io:9222'

async def main():
    async with async_playwright() as pw:
        print('connecting')
        browser = await pw.chromium.connect_over_cdp(browser_url)
        print('connected')
        page = await browser.new_page()
        print('goto')
        await page.goto('https://example.com', timeout=120000)
        print('done, evaluating')
        print(await page.evaluate('() => document.documentElement.outerHTML'))
        await browser.close()

asyncio.run(main())

Run the script:

python scrape.py

 

Additional Info and Resources

Scraping Browser Demo

Blocking Requests

To save bandwidth, you can block requests to endpoints that are not required for your scrape.
See the example below:

// connect to a remote browser...

const blockedUrls = ['*doubleclick.net*'];
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Network.enable');
await client.send('Network.setBlockedURLs', {urls: blockedUrls});
await page.goto('https://washingtonpost.com');

Country Targeting

When using Scraping Browser, the same country-targeting parameter is available as in our other proxy products.

When sending your request, add the -country flag after your zone’s name in the request, followed by the 2-letter ISO code for that country.

In the example below, we added -country-us to our request, so our request will originate from the United States ("us").

curl --proxy zproxy.lum-superproxy.io:22225 --proxy-user brd-customer-<CUSTOMER_ID>-zone-<ZONE_NAME>-country-us:<ZONE_PASSWORD> "http://target.site"
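The same flag works when connecting to Scraping Browser over CDP: append -country-us after the zone name inside the auth string you pass to Puppeteer or Playwright. Below is a minimal sketch using the Puppeteer setup from the Quick start above; the credentials are placeholders, and lumtest.com/myip.json is an IP-echo page commonly used in Bright Data examples (swap in any target you like):

const puppeteer = require('puppeteer-core');

// note the '-country-us' suffix appended after the zone name
const auth = 'brd-customer-<ACCOUNT_ID>-zone-<ZONE_NAME>-country-us:<PASSWORD>';

async function run() {
  const browser = await puppeteer.connect({
    browserWSEndpoint: `wss://${auth}@zproxy.lum-superproxy.io:9222`,
  });
  const page = await browser.newPage();
  await page.goto('http://lumtest.com/myip.json'); // echoes the exit IP and its country
  console.log(await page.evaluate(() => document.body.innerText));
  await browser.close();
}

run();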

EU region

You can target the entire European Union region in the same manner as "Country" above by adding "eu" after "country" in your request: -country-eu

Requests sent using -country-eu will use IPs from one of the countries below, which are included automatically within "eu":

AL, AZ, KG, BA, UZ, BI, XK, SM, DE, AT, CH, UK, GB, IE, IM, FR, ES, NL, IT, PT, BE, AD, MT, MC, MA, LU, TN, DZ, GI, LI, SE, DK, FI, NO, AX, IS, GG, JE, EU, GL, VA, FX, FO

Note: The allocation of a country within the EU is random. 
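For illustration, here is the same curl request as in the country-targeting example above, switched to the EU-wide flag (placeholders as before):

curl --proxy zproxy.lum-superproxy.io:22225 --proxy-user brd-customer-<CUSTOMER_ID>-zone-<ZONE_NAME>-country-eu:<ZONE_PASSWORD> "http://target.site"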

Troubleshooting

Viewing CAPTCHA Solver Status in CDP

When Scraping Browser encounters a CAPTCHA, the following events are triggered to show you the current phase of our built-in CAPTCHA solver (they can be viewed within CDP):

Captcha.detected: Scraping Browser has encountered a CAPTCHA and has begun to solve it

Captcha.solveFinished: Scraping Browser successfully solved the CAPTCHA

Captcha.solveFailed: Scraping Browser failed to solve the CAPTCHA
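If you want to observe these phases programmatically, a minimal sketch in Puppeteer is to subscribe to them on a CDP session (this assumes a connection as in the Quick start above; the handlers only log that each event fired, since the event payloads are not covered in this article):

// after connecting with puppeteer.connect() and opening a page...
const client = await page.target().createCDPSession();

client.on('Captcha.detected', () => console.log('CAPTCHA detected, solver started'));
client.on('Captcha.solveFinished', () => console.log('CAPTCHA solved'));
client.on('Captcha.solveFailed', () => console.log('CAPTCHA solve failed'));

await page.goto('https://example.com');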

In the case of a failure to solve a CAPTCHA, please retry and/or send a support request with the exact issue encountered. 
