Scraping Browser supports the Chrome DevTools Protocol (CDP), so all Puppeteer functions and features work within our browsers. You can find the full Puppeteer API documentation and usage examples on the official Puppeteer documentation page. We have also added a few Bright Data-specific CDP events, which can be useful as well.
The following are a few common feature examples to get you started.
Common Browser Navigation Functions
Get page HTML
// node.js puppeteer
const page = await browser.newPage();
await page.goto('https://example.com');
const html = await page.content();
For more info: https://pptr.dev/api/puppeteer.page.content
Click on element
// node.js puppeteer
const page = await browser.newPage();
await page.goto('https://example.com');
await page.click('a[href]');
For more info: https://pptr.dev/api/puppeteer.page.click
Scroll to page bottom
You might need to scroll the viewport to the bottom at times, such as when activating 'infinite scroll'. Here's how:
// node.js puppeteer
const page = await browser.newPage();
await page.goto('https://example.com');
await page.evaluate(()=>window.scrollBy(0, window.innerHeight));
Take a screenshot
// node.js puppeteer - Taking screenshot to file screenshot.png
// More info at https://pptr.dev/api/puppeteer.page.screenshot
await page.screenshot({ path: 'screenshot.png', fullPage: true });
# python playwright - Taking screenshot to file screenshot.png
# More info at https://playwright.dev/python/docs/screenshots
await page.screenshot(path='screenshot.png', full_page=True)
// C# PuppeteerSharp - Taking screenshot to file screenshot.png
await page.ScreenshotAsync("screenshot.png", new ()
{
FullPage = true,
});
When you run the example scripts above, the screenshot will be saved as “screenshot.png” within your files.
Set Cookies for your targeted domain
Please note that this is supported for KYC-approved customers only.
// node.js puppeteer
const page = await browser.newPage();
await page.setCookie({name: 'LANG', value: 'en-US', domain: 'example.com'});
await page.goto('https://example.com');
For more info: https://pptr.dev/api/puppeteer.page.setcookie
Blocking Endpoints
To save bandwidth, you can block endpoints that your scrape does not need. See an example of this below:
// node.js puppeteer
// connect to a remote browser...
const blockedUrls = ['*doubleclick.net*'];
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Network.enable');
await client.send('Network.setBlockedURLs', {urls: blockedUrls});
await page.goto('https://washingtonpost.com');
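Before sending a pattern list to the browser, it can help to preview which URLs it would catch. The following is a minimal sketch that assumes each `*` in a pattern stands for any run of characters and everything else is literal; the browser's own matching remains authoritative. The helper names `pattern_to_regex` and `would_block` are hypothetical and are not part of any Bright Data or Puppeteer API.

```python
import re
from typing import Pattern, Sequence


def pattern_to_regex(pattern: str) -> Pattern[str]:
    """Translate a '*' wildcard pattern into an anchored regular expression."""
    # Escape each literal piece, then join the pieces with '.*' for every '*'.
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile("^" + ".*".join(pieces) + "$")


def would_block(url: str, blocked_patterns: Sequence[str]) -> bool:
    """Return True if any pattern matches the whole URL."""
    return any(pattern_to_regex(p).match(url) for p in blocked_patterns)


blocked_urls = ["*doubleclick.net*"]
print(would_block("https://ad.doubleclick.net/pixel.gif", blocked_urls))  # True
print(would_block("https://example.com/index.html", blocked_urls))  # False
```

Running a list of URLs from a previous crawl through `would_block` is a cheap way to check that a pattern is neither too broad nor too narrow.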
Country Targeting
When using the Scraping Browser, the same country-targeting parameter is available to use as in our other proxy products.
When setting up your script, add the -country flag after your "USER" credentials within the Bright Data endpoint, followed by the 2-letter ISO code for that country.
const SBR_WS_ENDPOINT = 'wss://USER-country-us:PASS@brd.superproxy.io:9222';
In the example above, we added -country-us to the Bright Data endpoint within our script, so our request will originate from the United States ("us").
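If your script targets several countries, building the endpoint string in one place avoids typos in the flag. The sketch below assumes the endpoint format shown in the example above (username, then `-country-` plus the code, then the password and host); `build_endpoint` is a hypothetical helper name, and `USER` and `PASS` are placeholders for your own credentials.

```python
from typing import Optional

SBR_HOST = "brd.superproxy.io:9222"  # host and port taken from the example above


def build_endpoint(user: str, password: str, country: Optional[str] = None) -> str:
    """Return the WebSocket endpoint, optionally adding -country-xx to the username."""
    if country is not None:
        code = country.strip().lower()
        if len(code) != 2 or not code.isalpha():
            raise ValueError(f"expected a 2-letter code, got {country!r}")
        user = f"{user}-country-{code}"
    return f"wss://{user}:{password}@{SBR_HOST}"


print(build_endpoint("USER", "PASS", "us"))
# wss://USER-country-us:PASS@brd.superproxy.io:9222
```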
EU region
You can target the entire European Union region in the same manner as "Country" above by adding "eu" after "country" in your request: -country-eu
Requests sent using -country-eu will use IPs from one of the countries below, which are included automatically within "eu":
AL, AZ, KG, BA, UZ, BI, XK, SM, DE, AT, CH, UK, GB, IE, IM, FR, ES, NL, IT, PT, BE, AD, MT, MC, MA, LU, TN, DZ, GI, LI, SE, DK, FI, NO, AX, IS, GG, JE, EU, GL, VA, FX, FO
Note: The allocation of a country within the EU is random.
Bright Data CDP Events
CAPTCHA Solver - Automatic solving status
When navigating a page with Scraping Browser, our integrated CAPTCHA solver automatically solves all CAPTCHAs by default. You can monitor this auto-solving process in your code with the following custom CDP events.
Note: Once a CAPTCHA is solved, if there is a form to submit, it will be submitted by default.
Puppeteer & Playwright
Captcha.solve
Use this command to return the status after the captcha was solved, failed, or not detected.
Captcha.solve({
detectTimeout?: number // Detect timeout in milliseconds for solver to detect captcha
options?: CaptchaOptions[] // Configuration options for captcha solving
}) : SolveResult
SolveResult : {
status: SolveStatus // Detect and solve status
type?: string // Detected captcha type
error?: string // Error if captcha was not solved
}
SolveStatus : string enum {
"not_detected" // Captcha was not detected
"solve_finished" // Captcha successfully solved
"solve_failed" // Captcha detected, but was not solved
"invalid" // Something went wrong
}
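To act on the result in your own code, it can be convenient to map the status strings onto an enum and branch on it. This is a minimal sketch based only on the SolveResult and SolveStatus definitions above; the names `SolveStatus` (as a Python class) and `describe` are illustrative and are not provided by any library.

```python
from enum import Enum


class SolveStatus(str, Enum):
    """Status strings as listed in the SolveStatus definition above."""
    NOT_DETECTED = "not_detected"
    SOLVE_FINISHED = "solve_finished"
    SOLVE_FAILED = "solve_failed"
    INVALID = "invalid"


def describe(result: dict) -> str:
    """Turn a SolveResult-shaped dict into a short log message."""
    status = SolveStatus(result["status"])  # raises ValueError on unknown status
    if status is SolveStatus.SOLVE_FINISHED:
        return f"solved {result.get('type', 'unknown')} captcha"
    if status is SolveStatus.NOT_DETECTED:
        return "no captcha detected"
    # solve_failed and invalid may carry the optional error field
    return f"{status.value}: {result.get('error', 'no error message')}"


print(describe({"status": "solve_finished", "type": "usercaptcha"}))
# solved usercaptcha captcha
```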
Examples
// NodeJS - puppeteer
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await page.goto('https://site-with-captcha.com');
// Note 1: If no captcha was found it will return not_detected status after detectTimeout
// Note 2: Once a CAPTCHA is solved, if there is a form to submit, it will be submitted by default
const {status} = await client.send('Captcha.solve', {detectTimeout: 30*1000});
console.log(`Captcha solve status: ${status}`);
# python - playwright
page = await browser.new_page()
client = await page.context.new_cdp_session(page)
await page.goto('https://site-with-captcha.com')
# Note 1: If no captcha was found it will return not_detected status after detectTimeout
# Note 2: Once a CAPTCHA is solved, if there is a form to submit, it will be submitted by default
solve_result = await client.send('Captcha.solve', { 'detectTimeout': 30*1000 })
status = solve_result['status']
print(f'Captcha solve status: {status}')
Note: If CAPTCHA-solving fails, please attempt a retry. If the issue persists, submit a support request detailing the specific problem you encountered.
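One way to follow the retry advice is a small wrapper that repeats `Captcha.solve` while it reports `solve_failed`. The sketch below assumes `client` is any object with an async `send(method, params)` method, such as a Playwright CDP session; the `solve_with_retry` helper and the `FakeClient` stand-in are hypothetical and exist only so the example runs without a browser. Whether a page reload is needed between attempts is not covered here.

```python
import asyncio


async def solve_with_retry(client, attempts: int = 3, detect_timeout_ms: int = 30_000) -> dict:
    """Call Captcha.solve until it stops reporting solve_failed, up to `attempts` times."""
    result: dict = {}
    for attempt in range(1, attempts + 1):
        result = await client.send("Captcha.solve", {"detectTimeout": detect_timeout_ms})
        if result.get("status") != "solve_failed":
            break
        print(f"attempt {attempt} failed: {result.get('error')}")
    return result  # the last result, whatever its status


class FakeClient:
    """Stand-in for a CDP session (assumption: replays canned statuses)."""

    def __init__(self, statuses):
        self._statuses = list(statuses)
        self.calls = 0

    async def send(self, method, params):
        self.calls += 1
        return {"status": self._statuses.pop(0)}


fake = FakeClient(["solve_failed", "solve_finished"])
print(asyncio.run(solve_with_retry(fake)))
```

If the final status is still `solve_failed` after all attempts, that is the point at which to gather details for a support request.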
Additional custom CDP events for CAPTCHA status
Use the events below to pinpoint a more specific stage in the CAPTCHA solving flow:
Captcha.detected | Scraping Browser has encountered a CAPTCHA and has begun to solve it |
Captcha.solveFinished | Scraping Browser successfully solved the CAPTCHA |
Captcha.solveFailed | Scraping Browser failed in solving the CAPTCHA |
Examples
The following node.js code sets up a CDP session, listens for CAPTCHA events, and handles timeouts:
// Node.js - Puppeteer - waiting for CAPTCHA solving events
const client = await page.target().createCDPSession();
await new Promise((resolve, reject)=>{
client.on('Captcha.solveFinished', resolve);
client.on('Captcha.solveFailed', ()=>reject(new Error('Captcha failed')));
setTimeout(reject, 5 * 60 * 1000, new Error('Captcha solve timeout'));
});
The following python code sets up a CDP session and listens for CAPTCHA-related events:
# Python - Playwright - waiting for CAPTCHA solving events
client = await page.context.new_cdp_session(page)
client.on('Captcha.detected', lambda c: print('Captcha detected', c))
client.on('Captcha.solveFinished', lambda _: print('Captcha solved!'))
client.on('Captcha.solveFailed', lambda _: print('Captcha failed!'))
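The Python listeners above only print; to block until the solver finishes, with a timeout like the Node.js example, the events can be bridged to an asyncio future. This is a sketch that assumes the session object exposes `on(event, handler)` and calls handlers on the running event loop; `wait_for_captcha` and the `FakeSession` stand-in are hypothetical names used so the example runs without a browser.

```python
import asyncio


async def wait_for_captcha(client, timeout_s: float = 300.0) -> str:
    """Resolve on Captcha.solveFinished, raise on Captcha.solveFailed or timeout."""
    outcome = asyncio.get_running_loop().create_future()

    def on_finished(_payload):
        if not outcome.done():
            outcome.set_result("solved")

    def on_failed(_payload):
        if not outcome.done():
            outcome.set_exception(RuntimeError("Captcha failed"))

    client.on("Captcha.solveFinished", on_finished)
    client.on("Captcha.solveFailed", on_failed)
    # Raises asyncio.TimeoutError if neither event arrives in time.
    return await asyncio.wait_for(outcome, timeout=timeout_s)


class FakeSession:
    """Stand-in event emitter (assumption: mimics the session's on() method)."""

    def __init__(self):
        self._handlers = {}

    def on(self, event, handler):
        self._handlers.setdefault(event, []).append(handler)

    def emit(self, event, payload=None):
        for handler in self._handlers.get(event, []):
            handler(payload)


async def demo() -> str:
    session = FakeSession()
    # Simulate the solver finishing shortly after we start waiting.
    asyncio.get_running_loop().call_later(0.01, session.emit, "Captcha.solveFinished", {})
    return await wait_for_captcha(session, timeout_s=1.0)


print(asyncio.run(demo()))  # solved
```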
WebDriver (selenium)
WebDriver does not support asynchronous server-driven events the way the libraries above do, so to achieve a similar result, call the CDP command Captcha.waitForSolve instead.
The "Captcha.waitForSolve" command waits for Scraping Browser's CAPTCHA solver to finish.
# Python Selenium - Waiting for Captcha to auto-solve after navigate
driver.execute('executeCdpCommand', {
'cmd': 'Captcha.waitForSolve',
'params': {},
})
CAPTCHA Solver - Manual Control
If you would like to either manually configure or fully disable our default CAPTCHA solver and instead call the solver manually or solve on your own, see the following CDP commands and functionality.
Captcha.setAutoSolve
This command is used to control the auto-solving of a CAPTCHA. You can disable auto-solve or configure algorithms for different CAPTCHA types and manually trigger this:
Captcha.setAutoSolve({
autoSolve: boolean // Whether to automatically solve captcha after navigate
options?: CaptchaOptions[] // Configuration options for captcha auto-solving
}) : void
CaptchaOptions : {
type: string // Captcha type
disabled?: boolean // Disable detect and solve for specified captcha
... // additional options related to captcha type
}
Examples of CDP commands to disable auto-solver completely within the session:
// Node.js Puppeteer - Disable Captcha auto-solver completely
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Captcha.setAutoSolve', { autoSolve: false })
# Python Playwright - Disable Captcha auto-solver completely
page = await browser.new_page()
client = await page.context.new_cdp_session(page)
await client.send('Captcha.setAutoSolve', {'autoSolve': False})
# Python Selenium - Disable Captcha auto-solver completely
driver.execute('executeCdpCommand', {
'cmd': 'Captcha.setAutoSolve',
'params': {'autoSolve': False},
})
Disable auto-solver for a specific CAPTCHA type only - Examples
// Node.js Puppeteer - Disable Captcha auto-solver for ReCaptcha only
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Captcha.setAutoSolve', {
autoSolve: true,
options: [{
type: 'usercaptcha',
disabled: true,
}],
});
# Python Playwright - Disable Captcha auto-solver for ReCaptcha only
page = await browser.new_page()
client = await page.context.new_cdp_session(page)
await client.send('Captcha.setAutoSolve', {
'autoSolve': True,
'options': [{
'type': 'usercaptcha',
'disabled': True,
}],
})
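When several CAPTCHA types need to be skipped, the params for `Captcha.setAutoSolve` can be generated rather than written by hand. The following sketch only builds the dictionary in the shape shown in the examples above; `auto_solve_params` is a hypothetical helper name, and passing more than one entry in `options` is an assumption based on the `CaptchaOptions[]` array type.

```python
from typing import Iterable


def auto_solve_params(disabled_types: Iterable[str] = ()) -> dict:
    """Build Captcha.setAutoSolve params that keep auto-solve on but skip some types."""
    params: dict = {"autoSolve": True}
    options = [{"type": t, "disabled": True} for t in disabled_types]
    if options:  # options is optional, so leave it out when nothing is disabled
        params["options"] = options
    return params


print(auto_solve_params(["usercaptcha"]))
# {'autoSolve': True, 'options': [{'type': 'usercaptcha', 'disabled': True}]}
```

The result can be passed as the second argument to `client.send('Captcha.setAutoSolve', ...)` in Playwright or as `params` in the Selenium `executeCdpCommand` call.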
Manually solving CAPTCHAs - Examples
// Node.js Puppeteer - manually solving CAPTCHA after navigation
const page = await browser.newPage();
const client = await page.target().createCDPSession();
await client.send('Captcha.setAutoSolve', { autoSolve: false });
await page.goto('https://site-with-captcha.com', { timeout: 2*60*1000 });
const {status} = await client.send('Captcha.solve', { detectTimeout: 30*1000 });
console.log('Captcha solve status:', status);
# Python Playwright - manually solving CAPTCHA after navigation
page = await browser.new_page()
client = await page.context.new_cdp_session(page)
await client.send('Captcha.setAutoSolve', {'autoSolve': False})
await page.goto('https://site-with-captcha.com', timeout=2*60_000)
solve_result = await client.send('Captcha.solve', {'detectTimeout': 30_000})
print('Captcha solve status:', solve_result['status'])
# Python Selenium - manually solving CAPTCHA after navigation
driver.execute('executeCdpCommand', {
'cmd': 'Captcha.setAutoSolve',
'params': {'autoSolve': False},
})
driver.get('https://site-with-captcha.com')
solve_result = driver.execute('executeCdpCommand', {
'cmd': 'Captcha.solve',
'params': {'detectTimeout': 30_000},
})
print('Captcha solve status:', solve_result['value']['status'])
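Note that the Selenium example reads the status from `solve_result['value']['status']`, while the Playwright example reads `solve_result['status']` directly. If the same handling code serves both, a small accessor can hide the difference. This is a sketch based only on the two result shapes shown in the examples above; `extract_status` is a hypothetical helper name.

```python
def extract_status(solve_result: dict) -> str:
    """Return the status whether or not the result is wrapped in a 'value' key."""
    # Selenium wraps the command result in {'value': {...}}; Playwright does not.
    payload = solve_result.get("value", solve_result)
    return payload["status"]


print(extract_status({"status": "solve_finished"}))  # solve_finished
print(extract_status({"value": {"status": "not_detected"}}))  # not_detected
```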
CaptchaOptions
For the following three CAPTCHA types (cf_challenge, hcaptcha, usercaptcha) we support the following additional options to control and configure our auto-solving algorithm.
cf_challenge
timeout: 40000
selector: '#challenge-body-text, .challenge-form'
check_timeout: 300
error_selector: '#challenge-error-title'
success_selector: '#challenge-success[style*=inline]'
check_success_timeout: 300
btn_selector: '#challenge-stage input[type=button]'
cloudflare_checkbox_frame_selector: '#turnstile-wrapper iframe'
checkbox_area_selector: '.ctp-checkbox-label .mark'
wait_timeout_after_solve: 500
wait_networkidle: {timeout: 500}
hcaptcha
detect_selector: '#cf-hcaptcha-container, #challenge-hcaptcha-wrapper .hcaptcha-box, .h-captcha'
pass_proxy: true
submit_form: true
submit_selector: '#challenge-form body > form[action*="internalcaptcha/captchasubmit"]'
value_selector: '.h-captcha textarea[id^="h-captcha-response"]'
usercaptcha (reCAPTCHA)
{ // configuration keys and default values for reCAPTCHA (type=usercaptcha)
type: 'usercaptcha',
// selector to retrieve sitekey and/or action
selector: '.g-recaptcha, .recaptcha',
// attributes to search for sitekey
sitekey_attributes: ['data-sitekey', 'data-key'],
// attributes to search for action
action_attributes: ['data-action'],
// detect selectors
detect_selector: `
.g-recaptcha[data-sitekey] > *,
.recaptcha > *,
iframe[src*="www.google.com/recaptcha/api2"],
iframe[src*="www.recaptcha.net/recaptcha/api2"],
iframe[src*="www.google.com/recaptcha/enterprise"]`,
// element to type response code into
reponse_selector: '#g-recaptcha-response, .g-recaptcha-response',
// should solver submit form automatically after captcha solved
submit_form: true,
// selector for submit button
submit_selector: '[type=submit]',
}
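To override one of these defaults, the type-specific keys would presumably sit next to `type` in the same options entry, as the CaptchaOptions definition suggests; treat that placement as an assumption. The sketch below builds such an entry for reCAPTCHA and rejects keys that do not appear in the listing above, which helps catch typos. The key names are copied exactly as listed (including `reponse_selector`), and `usercaptcha_options` is a hypothetical helper name.

```python
# Keys copied verbatim from the usercaptcha listing above.
USERCAPTCHA_KEYS = {
    "selector", "sitekey_attributes", "action_attributes", "detect_selector",
    "reponse_selector", "submit_form", "submit_selector",
}


def usercaptcha_options(**overrides) -> dict:
    """Build a CaptchaOptions entry for reCAPTCHA, rejecting unlisted keys."""
    unknown = set(overrides) - USERCAPTCHA_KEYS
    if unknown:
        raise ValueError(f"unknown usercaptcha option(s): {sorted(unknown)}")
    return {"type": "usercaptcha", **overrides}


# Example: keep auto-solve on, but do not submit the form after solving.
params = {"autoSolve": True, "options": [usercaptcha_options(submit_form=False)]}
print(params)
# {'autoSolve': True, 'options': [{'type': 'usercaptcha', 'submit_form': False}]}
```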
How to integrate Scraping Browser with .NET Puppeteer Sharp
Integrating Scraping Browser with C# requires patching the PuppeteerSharp library to add support for WebSocket authentication. This can be done as follows:
using PuppeteerSharp;
using System.Net.WebSockets;
using System.Text;
// Set the authentication credentials
var auth = "USER:PASS";
// Construct the WebSocket URL with authentication
var ws = $"wss://{auth}@zproxy.lum-superproxy.io:9222";
// Custom WebSocket factory function
async Task<WebSocket> ws_factory(Uri url, IConnectionOptions options, CancellationToken cancellationToken)
{
// Create a new ClientWebSocket instance
var socket = new ClientWebSocket();
// Extract the user information (username and password) from the URL
var user_info = url.UserInfo;
if (user_info != "")
{
// Encode the user information in Base64 format
var auth = Convert.ToBase64String(Encoding.Default.GetBytes(user_info));
// Set the "Authorization" header of the WebSocket options with the encoded credentials
socket.Options.SetRequestHeader("Authorization", $"Basic {auth}");
}
// Disable the WebSocket keep-alive interval
socket.Options.KeepAliveInterval = TimeSpan.Zero;
// Connect to the WebSocket endpoint
await socket.ConnectAsync(url, cancellationToken);
return socket;
}
// Create ConnectOptions and configure the options
var options = new ConnectOptions()
{
// Set the BrowserWSEndpoint to the WebSocket URL
BrowserWSEndpoint = ws,
// Set the WebSocketFactory to the custom factory function
WebSocketFactory = ws_factory,
};
// Connect to the browser using PuppeteerSharp
Console.WriteLine("Connecting to browser...");
using (var browser = await Puppeteer.ConnectAsync(options))
{
Console.WriteLine("Connected! Navigating...");
// Create a new page instance
var page = await browser.NewPageAsync();
// Navigate to the specified URL
await page.GoToAsync("https://example.com");
Console.WriteLine("Navigated! Scraping data...");
// Get the content of the page
var content = await page.GetContentAsync();
Console.WriteLine("Done!");
Console.WriteLine(content);
}
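The core of the custom factory above is turning the credentials embedded in the WebSocket URL into a Basic `Authorization` header. For readers who want to check the expected header value outside of C#, here is a minimal Python sketch of that same step. It assumes UTF-8 encoding of the credentials, and `basic_auth_header` is a hypothetical helper name; `USER:PASS` is a placeholder, as in the C# example.

```python
import base64
from typing import Optional
from urllib.parse import urlsplit


def basic_auth_header(ws_url: str) -> Optional[str]:
    """Return the Basic Authorization header value for the URL's user info, if any."""
    # Everything before the last '@' in the network location is the user info.
    user_info = urlsplit(ws_url).netloc.rpartition("@")[0]
    if not user_info:
        return None
    token = base64.b64encode(user_info.encode("utf-8")).decode("ascii")
    return f"Basic {token}"


print(basic_auth_header("wss://USER:PASS@zproxy.lum-superproxy.io:9222"))
# Basic VVNFUjpQQVNT
```

Comparing this value with the header the C# factory sends is a quick way to rule out credential encoding problems when a connection is rejected.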