How To: common browser navigation functions & monitoring CAPTCHA solving

Scraping Browser supports the Chrome DevTools Protocol (CDP), so all Puppeteer functions and features work with our browsers. You can find the full Puppeteer API documentation and usage examples on the official Puppeteer documentation site. We have also added a few Bright Data-specific CDP events that can be useful.

The following are a few common feature examples to get you started.

Common Browser Navigation Functions

Get page HTML

// node.js puppeteer
const page = await browser.newPage();
await page.goto('https://example.com');
const html = await page.content();

For more info: https://pptr.dev/api/puppeteer.page.content

Click on element

// node.js puppeteer
const page = await browser.newPage();
await page.goto('https://example.com');
await page.click('a[href]');

For more info: https://pptr.dev/api/puppeteer.page.click

Scroll to page bottom

At times you may need to scroll the viewport to the bottom of the page, for example to trigger "infinite scroll" loading. Here's how:

// node.js puppeteer
const page = await browser.newPage();
await page.goto('https://example.com');
// Jump to the current bottom of the document
await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
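On infinite-scroll pages a single jump to the bottom is often not enough, because new content keeps loading. The Python Playwright sketch below repeats the scroll until document.body.scrollHeight stops growing. The function name scroll_until_stable, the max_rounds cap, and the fixed pause are illustrative assumptions, not Scraping Browser features; tune the pause to how quickly the target site loads content.

```python
import asyncio


async def scroll_until_stable(page, max_rounds=20, pause_s=1.0):
    """Hypothetical helper: scroll a page until its height stops growing.

    `page` is assumed to be a playwright.async_api.Page (anything with an
    async `evaluate(expression)` method works). Returns the final height.
    """
    last_height = await page.evaluate("() => document.body.scrollHeight")
    for _ in range(max_rounds):
        # Jump to the current bottom so the site can load the next batch
        await page.evaluate("() => window.scrollTo(0, document.body.scrollHeight)")
        # Give lazy-loaded content time to arrive (fixed pause is an assumption)
        await asyncio.sleep(pause_s)
        new_height = await page.evaluate("() => document.body.scrollHeight")
        if new_height == last_height:
            # No new content appeared: assume we reached the real bottom
            break
        last_height = new_height
    return last_height
```

Call it after navigation, for example `await page.goto('https://example.com')` followed by `await scroll_until_stable(page)`. The max_rounds cap keeps the loop from running forever on pages that never stop loading.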

Take a screenshot

// node.js puppeteer - Taking screenshot to file screenshot.png
// More info at https://pptr.dev/api/puppeteer.page.screenshot
await page.screenshot({ path: 'screenshot.png', fullPage: true });

# python playwright - Taking screenshot to file screenshot.png
# More info at https://playwright.dev/python/docs/screenshots
await page.screenshot(path='screenshot.png', full_page=True)

// C# PuppeteerSharp - Taking screenshot to file screenshot.png
await page.ScreenshotAsync("screenshot.png", new ScreenshotOptions
{
    FullPage = true,
});

When you run any of the examples above, the screenshot is saved as "screenshot.png" in the script's working directory.

Set Cookies for your targeted domain

Please note that this is supported for KYC-approved customers only.

// node.js puppeteer
const page = await browser.newPage();
await page.setCookie({name: 'LANG', value: 'en-US', domain: 'example.com'});
await page.goto('https://example.com');

For more info: https://pptr.dev/api/puppeteer.page.setcookie

Blocking Endpoints

You can block endpoints that your scrape doesn't need, to save bandwidth. For example:

// node.js puppeteer
// connect to a remote browser...
const blockedUrls = ['*doubleclick.net*'];
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Network.enable');
await client.send('Network.setBlockedURLs', {urls: blockedUrls});
await page.goto('https://washingtonpost.com');
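Network.setBlockedURLs accepts URL patterns in which '*' is a wildcard. Before sending a pattern list, it can help to check locally which URLs it would catch. The Python sketch below approximates that matching by treating only '*' as "any run of characters" and every other character literally; Chrome's own matcher may treat other characters (such as '?') differently, so confirm against real traffic. The names pattern_to_regex and is_blocked are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a blocked-URL pattern into a regex: '*' matches any run of
    characters (including none); every other character is matched literally."""
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile(".*".join(parts), re.DOTALL)


def is_blocked(url, patterns):
    """Return True if `url` fully matches any of the patterns."""
    return any(pattern_to_regex(p).fullmatch(url) for p in patterns)


blocked_urls = ["*doubleclick.net*"]
print(is_blocked("https://securepubads.g.doubleclick.net/tag/js/gpt.js", blocked_urls))  # True
print(is_blocked("https://www.washingtonpost.com/", blocked_urls))  # False
```

Escaping everything except '*' means the dot in "doubleclick.net" only matches a literal dot, so a host such as "doubleclickXnet.com" is not caught.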

Country Targeting

When using the Scraping Browser, you can use the same country-targeting parameter as in our other proxy products.

When setting up your script, append the -country flag to your USER credential in the Bright Data endpoint, followed by the two-letter ISO code of the target country.

// Replace USER and PASS with your Scraping Browser credentials
const SBR_WS_ENDPOINT = 'wss://USER-country-us:PASS@brd.superproxy.io:9222';

In the example above, we added -country-us to the Bright Data endpoint within our script, so our request will originate from the United States ("us").
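If you build the endpoint in code, it can help to validate the country code before connecting. The Python sketch below assembles the same wss:// URL as the example above. The function name sbr_endpoint and its validation rule (exactly two ASCII letters, lowercased) are illustrative assumptions; "eu" also fits this two-letter shape.

```python
import re

# Hostname and port as shown in the endpoint example above
SBR_HOST = "brd.superproxy.io:9222"


def sbr_endpoint(user, password, country=None):
    """Build a Scraping Browser WebSocket endpoint, optionally country-targeted.

    Hypothetical helper: appends "-country-<code>" to the USER part, where
    <code> is a two-letter code such as "us".
    """
    if country is not None:
        code = country.lower()
        if not re.fullmatch(r"[a-z]{2}", code):
            raise ValueError(f"expected a 2-letter country code, got {country!r}")
        user = f"{user}-country-{code}"
    return f"wss://{user}:{password}@{SBR_HOST}"


print(sbr_endpoint("USER", "PASS", "US"))
```

With Python Playwright, the resulting string could then be passed to `playwright.chromium.connect_over_cdp(...)`.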

EU region

You can target the entire European Union region in the same manner as "Country" above by adding "eu" after "country" in your request: -country-eu

Requests sent with -country-eu use IPs from one of the countries below, which are automatically included in "eu":

AL, AZ, KG, BA, UZ, BI, XK, SM, DE, AT, CH, UK, GB, IE, IM, FR, ES, NL, IT, PT, BE, AD, MT, MC, MA, LU, TN, DZ, GI, LI, SE, DK, FI, NO, AX, IS, GG, JE, EU, GL, VA, FX, FO

Note: The allocation of a country within the EU is random. 

Bright Data CDP Events

CAPTCHA Solver - Automatic solving status

When navigating a page with Scraping Browser, our integrated CAPTCHA solver automatically solves all CAPTCHAs by default. You can monitor this auto-solving process in your code with the following custom CDP events.

Note: Once a CAPTCHA is solved, if there is a form to submit, it will be submitted by default.

Puppeteer & Playwright 

Captcha.solve

Use this command to get the status once the CAPTCHA has been solved, has failed, or was not detected.

Captcha.solve({
  detectTimeout?: number // Detect timeout in milliseconds for the solver to detect a captcha
  options?: CaptchaOptions[] // Configuration options for captcha solving
}) : SolveResult
SolveResult : {
  status: SolveStatus // Detect and solve status
  type?: string // Detected captcha type
  error?: string // Error if captcha was not solved
}
SolveStatus : string enum {
  "not_detected" // Captcha was not detected
  "solve_finished" // Captcha successfully solved
  "solve_failed" // Captcha detected, but was not solved
  "invalid" // Something went wrong
}

Examples

// NodeJS - puppeteer
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await page.goto('https://site-with-captcha.com');
// Note 1: If no CAPTCHA is found, the command returns the not_detected status after detectTimeout
// Note 2: Once a CAPTCHA is solved, if there is a form to submit, it will be submitted by default
const {status} = await client.send('Captcha.solve', {detectTimeout: 30*1000});
console.log(`Captcha solve status: ${status}`);

# python - playwright
page = await browser.new_page()
client = await page.context.new_cdp_session(page)
await page.goto('https://site-with-captcha.com')
# Note 1: If no CAPTCHA is found, the command returns the not_detected status after detectTimeout
# Note 2: Once a CAPTCHA is solved, if there is a form to submit, it will be submitted by default
solve_result = await client.send('Captcha.solve', { 'detectTimeout': 30*1000 })
status = solve_result['status']
print(f'Captcha solve status: {status}')

Note: If CAPTCHA-solving fails, please attempt a retry. If the issue persists, submit a support request detailing the specific problem you encountered.
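The retry advice above can be wrapped in a small helper. The Python sketch below calls Captcha.solve through a Playwright CDP session (client.send, as in the Playwright example above) and retries only when the status is solve_failed or invalid; solve_finished and not_detected are returned immediately. The function name, attempt count, and delay are assumptions, not Scraping Browser defaults.

```python
import asyncio

# SolveStatus values treated as worth another attempt (our assumption)
RETRYABLE = {"solve_failed", "invalid"}


async def solve_with_retry(client, attempts=3, detect_timeout_ms=30_000, delay_s=2.0):
    """Call Captcha.solve up to `attempts` times and return the last SolveResult.

    `client` is assumed to be a Playwright CDPSession (anything with an async
    `send(method, params)` returning a dict works).
    """
    result = {}
    for attempt in range(1, attempts + 1):
        result = await client.send("Captcha.solve", {"detectTimeout": detect_timeout_ms})
        if result.get("status") not in RETRYABLE:
            return result  # solved, or no CAPTCHA detected
        if attempt < attempts:
            await asyncio.sleep(delay_s)  # brief pause before the next attempt
    return result
```

If the returned status is still retryable, `result.get('error')` holds the solver's error message, which is useful detail for a support request.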

Additional custom CDP events for CAPTCHA status

Use the events below to pinpoint a more specific stage in the CAPTCHA-solving flow:

Captcha.detected

Scraping Browser has encountered a CAPTCHA and has begun to solve it.

Captcha.solveFinished

Scraping Browser successfully solved the CAPTCHA.

Captcha.solveFailed

Scraping Browser failed to solve the CAPTCHA.

Examples

The following node.js code sets up a CDP session, listens for CAPTCHA events, and handles timeouts:

// Node.js - Puppeteer - waiting for CAPTCHA solving events
const client = await page.target().createCDPSession();
await new Promise((resolve, reject) => {
  client.on('Captcha.solveFinished', resolve);
  client.on('Captcha.solveFailed', () => reject(new Error('Captcha failed')));
  // Give up after 5 minutes
  setTimeout(reject, 5 * 60 * 1000, new Error('Captcha solve timeout'));
});

The following python code sets up a CDP session and listens for CAPTCHA-related events:

# Python - Playwright - waiting for CAPTCHA solving events
client = await page.context.new_cdp_session(page)
client.on('Captcha.detected', lambda c: print('Captcha detected', c))
client.on('Captcha.solveFinished', lambda _: print('Captcha solved!'))
client.on('Captcha.solveFailed', lambda _: print('Captcha failed!'))
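The Python snippet above only prints events. To block until solving ends, as the Node.js promise does, one option is to resolve an asyncio future from the event handlers and bound the wait with asyncio.wait_for. This is a sketch: wait_for_captcha and CaptchaFailed are hypothetical names, and the 5-minute default mirrors the Node.js example rather than any documented limit.

```python
import asyncio


class CaptchaFailed(Exception):
    """Raised when Scraping Browser reports Captcha.solveFailed."""


async def wait_for_captcha(client, timeout_s=5 * 60):
    """Wait for Captcha.solveFinished or Captcha.solveFailed on a CDP session.

    `client` is assumed to be a Playwright CDPSession (anything with
    `on(event, handler)` works). Returns the solveFinished event params,
    raises CaptchaFailed on failure, and raises asyncio.TimeoutError if
    neither event arrives within `timeout_s` seconds.
    """
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    def on_finished(params):
        if not done.done():  # ignore events after completion or timeout
            done.set_result(params)

    def on_failed(params):
        if not done.done():
            done.set_exception(CaptchaFailed(f"Captcha failed: {params}"))

    client.on("Captcha.solveFinished", on_finished)
    client.on("Captcha.solveFailed", on_failed)
    return await asyncio.wait_for(done, timeout_s)
```

Register the wait before triggering navigation, for example by wrapping `wait_for_captcha(client)` in `asyncio.create_task(...)` and awaiting it after `page.goto(...)`, so an early event is not missed.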

WebDriver (selenium)

Unlike the libraries above, WebDriver doesn't support asynchronous server-driven events. To achieve a similar result, call the custom CDP command Captcha.waitForSolve instead.

The "Captcha.waitForSolve" command waits for Scraping Browser's CAPTCHA solver to finish.

# Python Selenium - Waiting for Captcha to auto-solve after navigate
driver.execute('executeCdpCommand', {
    'cmd': 'Captcha.waitForSolve',
    'params': {},
})
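Every Selenium call in this article sends the same executeCdpCommand payload, so a tiny wrapper can keep scripts shorter. This Python sketch assumes, as the manual-solving Selenium example in this article does, that the command's result comes back under the 'value' key; the name execute_cdp is hypothetical.

```python
def execute_cdp(driver, cmd, params=None):
    """Send a CDP command (including Bright Data's custom ones) via Selenium.

    Hypothetical wrapper around driver.execute('executeCdpCommand', ...);
    returns the command result, assumed to be under the 'value' key.
    """
    response = driver.execute("executeCdpCommand", {"cmd": cmd, "params": params or {}})
    return response["value"]
```

For example, `execute_cdp(driver, 'Captcha.waitForSolve')` replaces the snippet above, and `execute_cdp(driver, 'Captcha.solve', {'detectTimeout': 30_000})['status']` reads the solve status.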

CAPTCHA Solver - Manual Control

If you want to configure or fully disable our default CAPTCHA auto-solver, and instead trigger the solver manually or solve CAPTCHAs on your own, use the following CDP commands.

Captcha.setAutoSolve

This command controls CAPTCHA auto-solving. You can disable auto-solving entirely, or configure the algorithm for specific CAPTCHA types and then trigger solving manually:

Captcha.setAutoSolve({
  autoSolve: boolean // Whether to automatically solve captcha after navigate
  options?: CaptchaOptions[] // Configuration options for captcha auto-solving
}) : void
CaptchaOptions : {
  type: string // Captcha type
  disabled?: boolean // Disable detect and solve for specified captcha
  ... // additional options related to captcha type
}
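To make the shape of these parameters concrete, the Python sketch below builds the params dict for Captcha.setAutoSolve from a list of CAPTCHA types to disable, producing the same structure as the examples that follow. The helper name auto_solve_params is hypothetical; the keys autoSolve, options, type, and disabled come from the signature above.

```python
def auto_solve_params(auto_solve=True, disabled_types=()):
    """Build params for Captcha.setAutoSolve (hypothetical helper).

    Each type in `disabled_types` (e.g. 'usercaptcha') gets a CaptchaOptions
    entry with disabled=True; types not listed get no options entry.
    """
    params = {"autoSolve": auto_solve}
    if disabled_types:
        params["options"] = [{"type": t, "disabled": True} for t in disabled_types]
    return params
```

With a Playwright CDP session this would be sent as `await client.send('Captcha.setAutoSolve', auto_solve_params(disabled_types=['usercaptcha']))`.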

Examples of CDP commands that disable the auto-solver completely for the session:

// Node.js Puppeteer - Disable Captcha auto-solver completely
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Captcha.setAutoSolve', { autoSolve: false });

# Python Playwright - Disable Captcha auto-solver completely
page = await browser.new_page()
client = await page.context.new_cdp_session(page)
await client.send('Captcha.setAutoSolve', {'autoSolve': False})

# Python Selenium - Disable Captcha auto-solver completely
driver.execute('executeCdpCommand', {
    'cmd': 'Captcha.setAutoSolve',
    'params': {'autoSolve': False},
})

Disable auto-solver for a specific CAPTCHA type only - Examples

// Node.js Puppeteer - Disable Captcha auto-solver for ReCaptcha only
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Captcha.setAutoSolve', {
    autoSolve: true,
    options: [{
        type: 'usercaptcha',
        disabled: true,
    }],
});

# Python Playwright - Disable Captcha auto-solver for ReCaptcha only
page = await browser.new_page()
client = await page.context.new_cdp_session(page)
await client.send('Captcha.setAutoSolve', {
    'autoSolve': True,
    'options': [{
        'type': 'usercaptcha',
        'disabled': True,
    }],
})

Manually solving CAPTCHAs - Examples

// Node.js Puppeteer - manually solving CAPTCHA after navigation
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Captcha.setAutoSolve', { autoSolve: false });
await page.goto('https://site-with-captcha.com', { timeout: 2*60*1000 });
const {status} = await client.send('Captcha.solve', { detectTimeout: 30*1000 });
console.log('Captcha solve status:', status);

# Python Playwright - manually solving CAPTCHA after navigation
page = await browser.new_page()
client = await page.context.new_cdp_session(page)
await client.send('Captcha.setAutoSolve', {'autoSolve': False})
await page.goto('https://site-with-captcha.com', timeout=2*60_000)
solve_result = await client.send('Captcha.solve', {'detectTimeout': 30_000})
print('Captcha solve status:', solve_result['status'])

# Python Selenium - manually solving CAPTCHA after navigation
driver.execute('executeCdpCommand', {
    'cmd': 'Captcha.setAutoSolve',
    'params': {'autoSolve': False},
})
driver.get('https://site-with-captcha.com')
solve_result = driver.execute('executeCdpCommand', {
    'cmd': 'Captcha.solve',
    'params': {'detectTimeout': 30_000},
})
print('Captcha solve status:', solve_result['value']['status'])

CaptchaOptions

For the following three CAPTCHA types (cf_challenge, hcaptcha, usercaptcha), we support these additional options to control and configure our auto-solving algorithm:

cf_challenge

timeout: 40000
selector: '#challenge-body-text, .challenge-form'
check_timeout: 300
error_selector: '#challenge-error-title'
success_selector: '#challenge-success[style*=inline]'
check_success_timeout: 300
btn_selector: '#challenge-stage input[type=button]'
cloudflare_checkbox_frame_selector: '#turnstile-wrapper iframe'
checkbox_area_selector: '.ctp-checkbox-label .mark'
wait_timeout_after_solve: 500
wait_networkidle: {timeout: 500}

hcaptcha

detect_selector: '#cf-hcaptcha-container, #challenge-hcaptcha-wrapper .hcaptcha-box, .h-captcha'
pass_proxy: true
submit_form: true
submit_selector: '#challenge-form body > form[action*="internalcaptcha/captchasubmit"]'
value_selector: '.h-captcha textarea[id^="h-captcha-response"]'

usercaptcha (reCAPTCHA)

{ // configuration keys and default values for reCAPTCHA (type=usercaptcha)
 type: 'usercaptcha',
 // selector to retrieve sitekey and/or action
 selector: '.g-recaptcha, .recaptcha',
 // attributes to search for sitekey
 sitekey_attributes: ['data-sitekey', 'data-key'],
 // attributes to search for action
 action_attributes: ['data-action'],
 // detect selectors
 detect_selector: `
   .g-recaptcha[data-sitekey] > *,
   .recaptcha > *,
   iframe[src*="www.google.com/recaptcha/api2"],
   iframe[src*="www.recaptcha.net/recaptcha/api2"],
   iframe[src*="www.google.com/recaptcha/enterprise"]`,
 // element to type response code into
 reponse_selector: '#g-recaptcha-response, .g-recaptcha-response',
 // should solver submit form automatically after captcha solved
 submit_form: true,
 // selector for submit button
 submit_selector: '[type=submit]',
}
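These keys are passed as extra fields on a CaptchaOptions entry, next to type. The Python sketch below builds such an entry and rejects types other than the three documented here, which catches a misspelled type before it is sent (relax the check if you know another type is supported). The helper name captcha_option is hypothetical, and the override value in the example is illustrative, not a recommendation.

```python
# The three CAPTCHA types whose extra options are documented above
DOCUMENTED_TYPES = {"cf_challenge", "hcaptcha", "usercaptcha"}


def captcha_option(captcha_type, **overrides):
    """Build one CaptchaOptions entry: {'type': ..., <option>: <value>, ...}."""
    if captcha_type not in DOCUMENTED_TYPES:
        raise ValueError(f"undocumented CAPTCHA type: {captcha_type!r}")
    # 'type' is set last so an override cannot replace it by accident
    return {**overrides, "type": captcha_type}


# Illustrative: give the Cloudflare challenge solver more time
params = {"autoSolve": True, "options": [captcha_option("cf_challenge", timeout=60000)]}
```

The resulting params could then be sent with `Captcha.setAutoSolve`, in the same way as the disable examples earlier in this article.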

How to integrate Scraping Browser with .NET Puppeteer Sharp

Integrating Scraping Browser with C# requires supplying PuppeteerSharp with a custom WebSocket factory so that the WebSocket connection is authenticated. You can set this up as follows:

using System;
using System.Net.WebSockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using PuppeteerSharp;

// Set the authentication credentials
var auth = "USER:PASS";
// Construct the WebSocket URL with authentication
var ws = $"wss://{auth}@zproxy.lum-superproxy.io:9222";

// Custom WebSocket factory function: adds a Basic Authorization header
// built from the credentials embedded in the endpoint URL
async Task<WebSocket> ws_factory(Uri url, IConnectionOptions options, CancellationToken cancellationToken)
{
    // Create a new ClientWebSocket instance
    var socket = new ClientWebSocket();
    // Extract the user information (username and password) from the URL, undoing any percent-encoding
    var user_info = Uri.UnescapeDataString(url.UserInfo);
    if (user_info != "")
    {
        // Encode the user information in Base64 format
        var encoded = Convert.ToBase64String(Encoding.UTF8.GetBytes(user_info));
        // Set the "Authorization" header of the WebSocket options with the encoded credentials
        socket.Options.SetRequestHeader("Authorization", $"Basic {encoded}");
    }

    // Disable the WebSocket keep-alive interval
    socket.Options.KeepAliveInterval = TimeSpan.Zero;
    // Connect to the WebSocket endpoint
    await socket.ConnectAsync(url, cancellationToken);
    return socket;
}

// Create ConnectOptions and configure the options
var options = new ConnectOptions()
{
    // Set the BrowserWSEndpoint to the WebSocket URL
    BrowserWSEndpoint = ws,
    // Set the WebSocketFactory to the custom factory function
    WebSocketFactory = ws_factory,
};

// Connect to the browser using PuppeteerSharp
Console.WriteLine("Connecting to browser...");

using (var browser = await Puppeteer.ConnectAsync(options))
{
    Console.WriteLine("Connected! Navigating...");
    // Create a new page instance
    var page = await browser.NewPageAsync();
    // Navigate to the specified URL
    await page.GoToAsync("https://example.com");
    Console.WriteLine("Navigated! Scraping data...");
    // Get the content of the page
    var content = await page.GetContentAsync();
    Console.WriteLine("Done!");
    Console.WriteLine(content);
}
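The core of the C# factory is the HTTP Basic Authorization header built from the URL's user-info part. For reference, the Python sketch below computes the same header value from an endpoint URL, which can help when checking credentials outside .NET. basic_auth_header is a hypothetical name, and encoding the user-info string as UTF-8 is an assumption.

```python
import base64
from urllib.parse import unquote, urlsplit


def basic_auth_header(ws_url):
    """Return the Authorization header value for a wss://USER:PASS@host URL.

    Base64-encodes "USER:PASS" (percent-decoded) and prefixes it with
    "Basic ". Returns None when the URL carries no credentials.
    """
    parts = urlsplit(ws_url)
    if parts.username is None:
        return None
    user_info = unquote(parts.username)
    if parts.password is not None:
        user_info += ":" + unquote(parts.password)
    encoded = base64.b64encode(user_info.encode("utf-8")).decode("ascii")
    return f"Basic {encoded}"


print(basic_auth_header("wss://USER:PASS@zproxy.lum-superproxy.io:9222"))
# Basic VVNFUjpQQVNT
```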



