Interaction Functions

Overview

This article lists and explains the available commands within the Interaction code for writing a scraper using the IDE.

Commands marked with a star are proprietary functions developed by Bright Data.

 

bad_input

Mark the collector input as bad. This prevents any crawl retries (error_code=bad_input)

bad_input();
bad_input('Missing search term');

 

blocked

Mark the page as failed because the website refused access (error_code=blocked)

blocked();
blocked('Login page was shown');

 

⭐bounding_box

The box of coordinates that describes the area of an element (relative to the page, not the browser viewport). Only the first element matched will be measured

let box = bounding_box('.product-list');
// box == {
//   top: 10,
//   right: 800,
//   bottom: 210,
//   left: 200,
//   x: 200,
//   y: 10,
//   width: 600,
//   height: 200,
// }
  • selector: A valid CSS selector for the element

 

⭐browser_size

Returns current browser window size

TBD

 

⭐capture_graphql

Capture GraphQL requests and replay them with changed variables

let q = capture_graphql({
    payload: {id: 'ProfileQuery'},
    // if the GraphQL endpoint is not "*/graphql" (the default value),
    // pass the url option as a RegExp
    // url: /\bgraphql\b/ // default
});
navigate('https://example.com');
let [first_query, first_response] = q.wait_captured();
collect(first_response.data.profile);
let second = q.replay({
    variables: {other_id: 2},
});
collect(second.data.profile);

// Variant: trigger the request manually if it was not captured on page load

let q = capture_graphql({
    payload: {id: 'ProfileQuery'},
    // if the GraphQL endpoint is not "*/graphql" (the default value),
    // pass the url option as a RegExp
    // url: /\bgraphql\b/ // default
});
navigate('https://example.com');
if (!q.is_captured())
    click('#load_more');
let [first_query, first_response] = q.wait_captured();
collect(first_response.data.profile);
let second = q.replay({
    variables: {other_id: 2},
});
collect(second.data.profile);
  • options: Parameters that control which GraphQL request to capture
    • url
    • payload

 

⭐click

Click on an element (will wait for the element to appear before clicking on it)

click(<selector>);
click('#show-more');
$('#show-more').click();
// Click the closest match to the passed coordinates
// (relative to the page).
// For example, clicking the center pin in a map
let box = bounding_box('#map');
let center = {x: (box.left+box.right)/2, y: (box.top+box.bottom)/2};
click('.map-pin', {coordinates: center});
  • selector: Element selector
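
The center-point calculation in the map example can be factored into a small helper. This is only a sketch: box_center is a hypothetical name, not an IDE function, and it assumes a box shaped like the value shown in the bounding_box example.

```javascript
// Hypothetical helper (not part of the IDE API): returns the center
// point of a box shaped like the value returned by bounding_box().
function box_center(box){
    return {
        x: (box.left+box.right)/2,
        y: (box.top+box.bottom)/2,
    };
}

// Sample values copied from the bounding_box example
let sample_box = {top: 10, right: 800, bottom: 210, left: 200};
let sample_center = box_center(sample_box);
console.log(sample_center); // { x: 500, y: 110 }
```

Inside the IDE, the result could then be passed as the coordinates option, for example click('.map-pin', {coordinates: box_center(bounding_box('#map'))}).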

 

⭐close_popup

Popups can appear at any time during a crawl and it's not always clear when you should be waiting for or closing them. Add close_popup() at the top of your code to add a background watcher that will close the popup when it appears. If a popup appears multiple times, it will always be closed

close_popup('.popup', '.popup_close');
close_popup('iframe.with-popup', '.popup_close', {click_inside: 'iframe.with-popup'});
  • popup selector: A valid CSS selector
  • close selector: A valid CSS selector
  • options: click_inside: Selector of the parent iframe that contains the close button

 

collect

Adds a line of data to the dataset created by the crawler

collect(<data_line>[, <validate_fn>]);
collect({price: data.price});
collect(product, p=>{
    if (!p.title)
        throw new Error('Product is missing a title');
});
  • data_line: An object with the fields you want to collect
  • validate_fn: Optional function to check that the line data is valid
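
When several fields are mandatory, the validation function can be generated instead of written by hand. The sketch below is an illustration under assumptions: require_fields is a hypothetical helper, not an IDE function, and it relies only on the behavior shown above (a validate_fn signals an invalid line by throwing).

```javascript
// Hypothetical helper (not part of the IDE API): builds a validate_fn
// that throws when any of the listed fields is missing or empty.
function require_fields(fields){
    return line=>{
        for (let field of fields)
        {
            let value = line[field];
            if (value===undefined || value===null || value==='')
                throw new Error(`Line is missing field: ${field}`);
        }
    };
}

let validate_product = require_fields(['title', 'price']);
validate_product({title: 'Lamp', price: 10}); // passes silently
try {
    validate_product({title: 'Lamp'});
} catch(e){
    console.log(e.message); // Line is missing field: price
}
```

In the IDE this would be used as collect(product, validate_product).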

 

console

Log messages from the interaction code

console.log(1, 'brightdata', [1, 2], {key: 'value'});
console.error(1, 'brightdata', [1, 2], {key: 'value'});

 

country

Configure your crawl to run from a specific country

country(<code>);
country('us');
  • code: 2-character ISO country code

 

dead_page

Mark a page as a dead link so you can filter it from your future collections (error_code=dead_page)

dead_page();
dead_page('Product was removed');

 

⭐detect_block

Detects a block on the page

detect_block({selector: '.foo'}, {exists: true});
detect_block({selector: '.bar'}, {has_text: 'text'});
detect_block({selector: '.baz'}, {has_text: /regex_pattern/});
  • resource: An object specifying the resource required for the detection
    • selector
  • condition: An object specifying how the resource should be processed for detection
    • exists
    • has_text

 

⭐disable_event_listeners

Stop all event listeners on the page from running. track_event_listeners() must have been called first

disable_event_listeners();
disable_event_listeners(['hover', 'click']);
  • event_types: Specific event types that should be disabled

 

el_exists

Check whether an element exists on the page and return a boolean

el_exists('#example'); // => true
el_exists('.does_not_exist'); // => false
el_exists('.does_not_exist', 5e3); // => false (after 5 seconds)
  • selector: Valid CSS selector
  • timeout: Timeout duration to wait for the element to appear on the page

 

el_is_visible

Check whether an element is visible on the page

el_is_visible('#example');
el_is_visible('.is_not_visible', 5e3); // false (after 5 seconds)
  • selector: Valid CSS selector
  • timeout: Timeout duration to wait for the element to be visible on the page

 

embed_html_comment

Add a comment in the page HTML. Can be used to embed metadata inside HTML snapshots.

embed_html_comment('trace-id: asdf123');
  • comment: Body of the comment

 

⭐font_exists

Assert the capability of the browser to render the given font family on the page

font_exists(<font-family>);
font_exists('Liberation Mono');

 

⭐freeze_page

Force the page to stop making changes. This can be used to save the page in a particular state so page snapshots that run after crawl won't see a different page state than you see now. This command is experimental. If you see problems, please report them to support

freeze_page();

 

⭐hover

Hover over an element (will wait for the element to appear before hovering over it)

hover(<selector>);
hover('#item');
  • selector: Element selector

 

⭐html_capture_options

Influence the process of the HTML capturing

html_capture_options({
    coordinate_attributes: true,
});
  • options: An object which accepts options defining how HTML capturing should be processed
    • coordinate_attributes

 

Image

Collect image data

let i = new Image('https://example.com/image.png');
collect({image: i});
  • src: Image URL or data:image URI string

 

input

Global object available to the interaction code. Provided by trigger input or next_stage() calls

navigate(input.url);

 

job

Global object available to the interaction code. Provided by trigger input or next_stage() calls

let {created} = job;

 

load_html

Load HTML and return a Cheerio instance

let $$ = load_html('<p id="p1">p1</p><p id="p2">p2</p>');
collect({data: $$('#p2').text()});
  • html: Any HTML string

 

⭐load_more

Scroll to the bottom of a list to trigger loading more items. Useful for lazy-loaded infinite-scroll sites

load_more(<selector>);
load_more('.search-results');
load_more('.search-results', {children: '.result-item', trigger_selector: '.btn-load-more', timeout: 10000});
  • selector: Selector for the element that contains the lazy-loaded items

 

load_sitemap

Read a list of URLs from a sitemap XML file. Supports sitemap indexes and .gz compressed sitemaps (see examples)

 

let {pages} = load_sitemap({url: 'https://example.com/sitemap.xml.gz'});

let {children} = load_sitemap({url: 'https://example.com/sitemap-index.xml'});

 

location

Object with info about current location. Available fields: href

navigate('https://example.com');
location.href; // "https://example.com/"

 

Money

Collect price/money data

let p = new Money(10, 'USD');
collect({product_price: p});
  • value: Amount of money
  • currency: Currency code

 

⭐mouse_to

Move the mouse to the specified (x,y) position

mouse_to(<x>, <y>);
mouse_to(0, 0);
  • x: Target x position
  • y: Target y position

 

navigate

Navigate the browser to a URL

navigate(<url>);
navigate(input.url);
navigate('https://example.com');

// waits until DOM content loaded event is fired in the browser
navigate(<url>, {wait_until: 'domcontentloaded'}); 

// adds a referer to the navigation
navigate(<url>, {referer: <url>}); 

// the number of milliseconds to wait for. Default is 30000 ms
navigate(<url>, {timeout: 45000}); 

// Don't throw an error if this URL sends a 404 status code
navigate(<url>, {allow_status: [404]});

// Specify browser width/height
navigate(<url>, {
    fingerprint: {screen: {width: 400, height: 400}},
});
  • A 404 status code will throw a dead_page error by default. Use opt.allow_status to override this
  • url: A URL to navigate to
  • opt: navigate options (see examples)

 

next_stage

Run the next stage of the crawler with the specified input

next_stage({url: 'http://example.com', page: 1});
  • input: Input object to pass to the next browser session

 

parse

Parse the page data

let page_data = parse();
collect({
    title: page_data.title,
    price: page_data.price,
});

 

preserve_proxy_session

Preserve proxy session across children of this page

preserve_proxy_session();

 

⭐press_key

Type special characters like Enter or Backspace in the currently focused input (usually used after typing something in a search box)

press_key('Enter');
press_key('Backspace');

 

⭐proxy_location

Configure your crawl to run from a specific location. Unless you need high resolution control over where your crawl is running from, you probably want to use `country(code)` instead

proxy_location({country: 'us'});

// lat in range: [-85, 85], long in range: [-180, 180]
proxy_location({lat: 37.7749, long: -122.4194});

// radius in km
proxy_location({lat: 37.7749, long: -122.4194, country: 'US', radius: 100});
  • configuration: Object with a desired proxy location, check examples for more info
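
Coordinates can be checked against the documented ranges before they are passed on. This is a minimal sketch: is_valid_location is a hypothetical helper, not an IDE function, and the limits are the ones quoted in the example comment (lat in [-85, 85], long in [-180, 180]).

```javascript
// Hypothetical pre-check (not part of the IDE API). The ranges come
// from the comment in the example: lat [-85, 85], long [-180, 180].
function is_valid_location({lat, long}){
    return Number.isFinite(lat) && Number.isFinite(long)
        && lat>=-85 && lat<=85 && long>=-180 && long<=180;
}

console.log(is_valid_location({lat: 37.7749, long: -122.4194})); // true
console.log(is_valid_location({lat: 90, long: 0})); // false
```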

 

⭐redirect_history

Returns history of URL redirects since last navigate

navigate('http://google.com');
let redirects = redirect_history();
// returns:
// [
//   'http://google.com',
//   'http://www.google.com',
//   'https://www.google.com/',
// ]
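
The returned array can be reduced to a few facts worth collecting. The helper below is a sketch under assumptions: summarize_redirects is a hypothetical name, and it assumes the array is ordered from the original URL to the final one, as in the example output.

```javascript
// Hypothetical helper (not part of the IDE API): summarizes an array
// shaped like the one returned by redirect_history().
function summarize_redirects(redirects){
    let first = redirects[0];
    let last = redirects[redirects.length-1];
    return {
        hops: redirects.length-1,
        final_url: last,
        upgraded_to_https: first.startsWith('http://')
            && last.startsWith('https://'),
    };
}

// Sample values copied from the example output
let sample_redirects = [
    'http://google.com',
    'http://www.google.com',
    'https://www.google.com/',
];
let redirect_summary = summarize_redirects(sample_redirects);
console.log(redirect_summary.hops); // 2
console.log(redirect_summary.upgraded_to_https); // true
```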

 

rerun_stage

Run this stage of the crawler again with new input

rerun_stage({url: 'http://example.com/other-page'});

 

resolve_url

Returns the final URL that the given url argument leads to

let {href} = parse().anchor_elem_data;
collect({final_url: resolve_url(href)});
  • url: URL string/instance

 

response_headers

Returns the response headers of the last page load

let headers = response_headers();
console.log('content-type', headers['content-type']);

 

request

Make a direct HTTP request

let res = request('http://www.example.com');
let res = request({url: 'http://www.example.com', method: 'POST', headers: {'Content-type': 'application/json'}, body: {hello: 'world'}});
  • url | options: The URL to make the request to, or an object of request options (see examples)

 

⭐right_click

Same as click, but uses the right mouse button (will wait for the element to appear before clicking on it)

right_click(<selector>);
right_click('#item');
  • selector: Element selector

 

run_stage

Run a specific stage of the crawler with a new browser session

run_stage(2, {url: 'http://example.com', page: 1});
  • stage: Which stage to run (1 is first stage)
  • input: Input object to pass to the next browser session

 

⭐scroll_to

Scroll the page so that an element is visible. If you're doing this to trigger loading more elements from a lazy-loaded list, use load_more(). Defaults to scrolling in a natural way, which may take several seconds. If you want to jump immediately, use {immediate: true}

scroll_to(<selector>);
scroll_to('.author-profile');
scroll_to('top'); // scroll to the top of the page
scroll_to('bottom'); // scroll to the bottom of the page
scroll_to('top', {immediate: true}); // jump to top of page immediately
  • selector: Selector of the element you want to scroll to

 

⭐scroll_to_all

Scroll through the page so that all the elements matching the selector will be visible on screen

scroll_to_all(<selector>);
scroll_to_all('.author-profiles');
  • selector: Selector of the elements you want to scroll through

 

⭐select

Pick a value from a select element

select(<selector>, <value>);
select('#country', 'Canada');
  • selector: Element selector

 

set_lines

An array of lines to add to your dataset at the end of this page crawl. Each call to set_lines() will override previous ones, and only the last set of lines will be added into the dataset (tracked per page crawl). This is a good fit when the collector is set to collect partial on errors. You can keep calling set_lines() with the data you gathered so far, and the last call will be used if the page crawl throws an error

set_lines(<lines>[, <validate_fn>]);
set_lines(products_so_far);
set_lines(products_so_far, i=>{
    if (!i.price)
        throw new Error('Missing price');
});
  • lines: An array of data lines to add to your final dataset
  • validate_fn: Optional function to check that the line data is valid (run once per line)
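
The "keep calling set_lines() with the data gathered so far" pattern can be sketched outside the IDE. Everything named here is an assumption made for illustration: gather_with_checkpoints is a hypothetical helper, and fake_set_lines is a stand-in that mimics only the documented behavior that the last call wins.

```javascript
// Hypothetical helper: processes items one by one and saves the
// progress so far after each one. `save` stands in for set_lines.
function gather_with_checkpoints(items, extract, save){
    let lines_so_far = [];
    for (let item of items)
    {
        lines_so_far.push(extract(item)); // may throw part-way through
        save([...lines_so_far]);
    }
    return lines_so_far;
}

// Stand-in for set_lines() so the sketch runs outside the IDE:
// each call replaces whatever was saved before.
let last_saved = [];
let fake_set_lines = lines=>{ last_saved = lines; };

try {
    gather_with_checkpoints([1, 2, 0, 4], n=>{
        if (n===0)
            throw new Error('Bad item');
        return {value: n};
    }, fake_set_lines);
} catch(e){
    console.log('Stopped early:', e.message); // Stopped early: Bad item
}
console.log(last_saved.length); // 2
```

In the IDE, set_lines itself would be passed as the third argument, so the lines gathered before the error remain available when partial collection is enabled.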

 

set_session_cookie

Sets a cookie with the given cookie data; may overwrite equivalent cookies if they exist

set_session_cookie(<domain>, <name>, <value>);

 

set_session_headers

Set extra headers for all the HTTP requests

set_session_headers({'HEADER_NAME': 'HEADER_VALUE'});
  • headers: Object with extra headers in key-value format

 

⭐solve_captcha

Solve any captchas shown on the page

solve_captcha();
solve_captcha({type: 'simple', selector: '#image', input: '#input'});

 

status_code

Returns the status code of the last page load

collect({status_code: status_code()});

 

⭐tag_all_responses

Save the responses from all browser requests that match the pattern

tag_all_responses(<field>, <pattern>, <options>);
tag_all_responses('resp', /url/, {jsonp: true});
tag_all_responses('resp', /url/, {allow_error: true});
tag_all_responses('profiles', /\/api\/profile/);
navigate('https://example.com/sports');
let profiles = parse().profiles;
for (let profile of profiles)
    collect(profile);
  • field: The name of the tagged field
  • pattern: The URL pattern to match
  • options: Set options.jsonp=true to parse response bodies that are in jsonp format. This will be automatically detected when possible

 

⭐tag_download

Get the files downloaded by the browser

let SEC = 1000;
let download = tag_download(/example.com\/foo\/bar/);
click('button#download');
let file1 = download.next_file({timeout: 10*SEC});
let file2 = download.next_file({timeout: 20*SEC});
collect({file1, file2});
  • url: A pattern or a string to match requests against

 

⭐tag_image

Save the image url from an element

tag_image(<field>, <selector>);
tag_image('image', '#product-image');
  • field: The name of the tagged field
  • selector: A valid CSS selector

 

⭐tag_response

Save the response data from a browser request

tag_response(<field>, <pattern>, <options>);
tag_response('resp', /url/, {jsonp: true});
tag_response('resp', /url/, {allow_error: true});
tag_response('resp', (req, res)=>{
    if (req.url.includes('/api/'))
    {
        let request_body = req.body;
        let request_headers = req.headers;
        let response_body = res.body;
        let response_headers = res.headers;
    }
});

tag_response('teams', /\/api\/teams/);
navigate('https://example.com/sports');
let teams = parse().teams;
for (let team of teams)
    collect(team);
  • field: The name of the tagged field
  • pattern: The URL pattern to match
  • options: Set options.jsonp=true to parse response bodies that are in jsonp format. This will be automatically detected when possible
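
A URL pattern can be tried against a few sample URLs before it is used for tagging. This sketch uses plain JavaScript only; the sample URLs are made up, and it assumes the pattern is tested against the full request URL.

```javascript
// Plain JavaScript regex check; no IDE functions are involved.
let teams_pattern = /\/api\/teams/;
let sample_urls = [
    'https://example.com/api/teams?season=2024',
    'https://example.com/api/players',
    'https://example.com/static/teams.css',
];
let matching_urls = sample_urls.filter(url=>teams_pattern.test(url));
console.log(matching_urls.length); // 1
console.log(matching_urls[0]); // https://example.com/api/teams?season=2024
```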

 

⭐tag_screenshot

Save a screenshot of the page HTML

tag_screenshot(<field>, <options>);
tag_screenshot('html_screenshot', {filename: 'screen'});
tag_screenshot('view', {full_page: false}); // full_page defaults to true
  • field: The name of the tagged field
  • options: Download options (see example)

 

⭐tag_script

Extract some JSON data saved in a script on the page

tag_script(<field>, <selector>);
tag_script('teams', '#preload-data');
tag_script('ssr_state', '#__SSR_DATA__');
navigate('https://example.com/');
collect(parse().ssr_state);
  • field: The name of the tagged script
  • selector: The selector of the script to tag

 

⭐tag_serp

Parse the current page as a search engine result page

tag_serp('serp_bing_results', 'bing');
tag_serp('serp_google_results', 'google');
  • field: The name of the tagged field
  • type: Parser type (e.g. bing, google)

 

⭐tag_video

Save the video url from an element

tag_video(<field>, <selector>[, <opt>]);
tag_video('video', '#product-video', {download: true});
  • field: The name of the tagged field
  • selector: A valid CSS selector
  • opt: download options (see example)

 

⭐tag_window_field

Tag a JavaScript value from the browser page

tag_window_field(<field>, <key>);
tag_window_field('initData', '__INIT_DATA__');
  • field: The path to the relevant data

 

⭐track_event_listeners

Start tracking the event listeners that the browser creates. This must be called before disable_event_listeners() can be used

track_event_listeners();

 

⭐type

Enter text into an input (will wait for the input to appear before typing)

type(<selector>, <text>);
type('#location', 'New York');

// replace the existing text if the input is not empty
type(<selector>, <text>, {replace: true});

// type text into an element whose id ends with "input-box" (e.g. <input id="c2E57-input-box">)
type('[id$=input-box]', <text>);

// dispatching 'Enter' key press
type(<selector>, ['Enter']); 

// typing text and then dispatching 'Enter' key press
type(<selector>, ['Some text', 'Enter']); 

// deleting 1 char from input
type(<selector>, ['Backspace']); 
  • selector: Element selector
  • text: Text to enter

 

URL

URL class from the Node.js standard "url" module

let u = new URL('https://example.com');
  • url: URL string
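
Because this is the standard URL class, it can be used to build or inspect URLs before navigating. A short sketch using only standard properties; the path and query parameter names are made up for illustration.

```javascript
// URL is a global in Node.js; the path and parameters are made up.
let search_url = new URL('https://example.com/search');
search_url.searchParams.set('q', 'desk lamp');
search_url.searchParams.set('page', '2');
console.log(search_url.href); // https://example.com/search?q=desk+lamp&page=2
console.log(search_url.hostname); // example.com
```

The resulting href could then be passed to navigate() or next_stage().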

 

⭐verify_requests

Monitor failed requests with a callback function

verify_requests(({url, error, type, response})=>{
    if (response.status!=404 && type=='Font')
        throw new Error('Font failed to load');
});
  • callback: A function which will be called on each failed request with an object in format: {url, error, type, response}
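
The callback can be written as a named function and exercised with sample objects before it is passed to verify_requests(). This is a sketch under assumptions: fail_on_broken_fonts is a hypothetical name, the sample objects are made up, and treating response as possibly missing is a guess rather than documented behavior.

```javascript
// Hypothetical callback for verify_requests(). The object shape
// {url, error, type, response} is taken from the description;
// treating `response` as possibly missing is an assumption.
function fail_on_broken_fonts({url, error, type, response}){
    let status = response ? response.status : undefined;
    if (type=='Font' && status!=404)
        throw new Error(`Font failed to load: ${url} (${error})`);
}

// Exercising the callback outside the IDE with made-up request objects
fail_on_broken_fonts({url: 'https://example.com/a.js', error: 'timeout',
    type: 'Script'});
fail_on_broken_fonts({url: 'https://example.com/f.woff', error: 'not found',
    type: 'Font', response: {status: 404}});
try {
    fail_on_broken_fonts({url: 'https://example.com/g.woff',
        error: 'timeout', type: 'Font'});
} catch(e){
    console.log(e.message); // Font failed to load: https://example.com/g.woff (timeout)
}
```

In the IDE it would be registered as verify_requests(fail_on_broken_fonts).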

 

Video

Collect video data

let v = new Video('https://example.com/video.mp4');
collect({video: v});
  • src: Video URL

 

⭐wait

Wait for an element to appear on the page

wait(<selector>);
wait('#welcome-splash');
wait('.search-results .product');
wait('[href^="/product"]');

// the number of milliseconds to wait for. Default is 30000 ms
wait(<selector>, {timeout: 5000}); 

// wait for element to be hidden
wait(<selector>, {hidden: true}); 

// wait for an element inside an iframe
wait(<selector>, {inside: '#iframe_id'}); 
  • selector: Element selector
  • opt: wait options (see examples)

 

⭐wait_any

Wait for any matching condition to succeed

wait_any(['#title', '#notfound']);

 

wait_for_parser_value

Wait for a parser field to contain a value. This can be useful after you click something to wait for some data to appear

wait_for_parser_value(<field>[, <validate_fn>][, opt]);
wait_for_parser_value('profile');
wait_for_parser_value('listings.0.price', v=>{
    return parseInt(v)>0;
}, {timeout: 5000});
  • field: The parser value path to wait on
  • validate_fn: An optional callback function to validate that the value is correct
  • opt: Extra options (e.g. timeout)
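
The field argument is a dotted path into the parser output, such as 'listings.0.price'. The sketch below only illustrates how such a path can address a nested value. get_by_path is a hypothetical helper and is not the IDE's own lookup code; the sample data is made up.

```javascript
// Hypothetical illustration only: how a path such as 'listings.0.price'
// can address a value in nested data. This is not the IDE's own code.
function get_by_path(data, path){
    return path.split('.').reduce(
        (value, key)=>value==null ? undefined : value[key], data);
}

let parsed_sample = {listings: [{price: '25'}, {price: '40'}]};
let first_price = get_by_path(parsed_sample, 'listings.0.price');
console.log(first_price); // 25
console.log(parseInt(first_price)>0); // true
console.log(get_by_path(parsed_sample, 'listings.5.price')); // undefined
```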

 

⭐wait_for_text

Wait for an element on the page to include some text

wait_for_text(<selector>, <text>);
wait_for_text('.location', 'New York');
  • selector: Element selector
  • text: The text to wait for

 

⭐wait_hidden

Wait for an element to not be visible on the page (removed or hidden)

wait_hidden(<selector>);
wait_hidden('#welcome-splash');
wait_hidden(<selector>, {timeout: 5000});
  • selector: Element selector

 

⭐wait_network_idle

Wait until the browser network has been idle for a given time

wait_network_idle();
wait_network_idle({
    timeout: 1e3,
    ignore: [/long_request/, 'https://example.com'],
});
  • timeout: Wait for the browser network to be idle for X milliseconds
  • options:
    • ignore: An array of patterns to exclude requests from monitoring
    • timeout: How long the network needs to be idle, in milliseconds (default 500)

 

⭐wait_page_idle

Wait until no changes are being made on the DOM tree for a given time

wait_page_idle();
wait_page_idle({
    ignore: [<selector1>, <selector2>],
    idle_timeout: 1000,
});
  • timeout: Milliseconds to wait for no changes
  • options: An object that can accept an ignore argument to exclude some elements from monitoring

 

⭐wait_visible

Wait for an element to be visible on the page

wait_visible(<selector>);
wait_visible('#welcome-splash');
wait_visible(<selector>, {timeout: 5000});
  • selector: Element selector

 

$

Helper for jQuery-like expressions

$(<selector>);
wait($('.store-card'));
  • selector: Element selector

 

⭐emulate_device

View pages as a mobile device. This command will change user agent and screen parameters (resolution and device pixel ratio)

emulate_device('iPhone X');
emulate_device('Pixel 2');
  • device: A string with the name of the device
  • Full list of supported device names:
    • Blackberry PlayBook
    • Blackberry PlayBook landscape
    • BlackBerry Z30
    • BlackBerry Z30 landscape
    • Galaxy Note 3
    • Galaxy Note 3 landscape
    • Galaxy Note II
    • Galaxy Note II landscape
    • Galaxy S III
    • Galaxy S III landscape
    • Galaxy S5
    • Galaxy S5 landscape
    • Galaxy S8
    • Galaxy S8 landscape
    • Galaxy S9+
    • Galaxy S9+ landscape
    • Galaxy Tab S4
    • Galaxy Tab S4 landscape
    • iPad
    • iPad landscape
    • iPad (gen 6)
    • iPad (gen 6) landscape
    • iPad (gen 7)
    • iPad (gen 7) landscape
    • iPad Mini
    • iPad Mini landscape
    • iPad Pro
    • iPad Pro landscape
    • iPad Pro 11
    • iPad Pro 11 landscape
    • iPhone 4
    • iPhone 4 landscape
    • iPhone 5
    • iPhone 5 landscape
    • iPhone 6
    • iPhone 6 landscape
    • iPhone 6 Plus
    • iPhone 6 Plus landscape
    • iPhone 7
    • iPhone 7 landscape
    • iPhone 7 Plus
    • iPhone 7 Plus landscape
    • iPhone 8
    • iPhone 8 landscape
    • iPhone 8 Plus
    • iPhone 8 Plus landscape
    • iPhone SE
    • iPhone SE landscape
    • iPhone X
    • iPhone X landscape
    • iPhone XR
    • iPhone XR landscape
    • iPhone 11
    • iPhone 11 landscape
    • iPhone 11 Pro
    • iPhone 11 Pro landscape
    • iPhone 11 Pro Max
    • iPhone 11 Pro Max landscape
    • iPhone 12
    • iPhone 12 landscape
    • iPhone 12 Pro
    • iPhone 12 Pro landscape
    • iPhone 12 Pro Max
    • iPhone 12 Pro Max landscape
    • iPhone 12 Mini
    • iPhone 12 Mini landscape
    • iPhone 13
    • iPhone 13 landscape
    • iPhone 13 Pro
    • iPhone 13 Pro landscape
    • iPhone 13 Pro Max
    • iPhone 13 Pro Max landscape
    • iPhone 13 Mini
    • iPhone 13 Mini landscape
    • JioPhone 2
    • JioPhone 2 landscape
    • Kindle Fire HDX
    • Kindle Fire HDX landscape
    • LG Optimus L70
    • LG Optimus L70 landscape
    • Microsoft Lumia 550
    • Microsoft Lumia 950
    • Microsoft Lumia 950 landscape
    • Nexus 10
    • Nexus 10 landscape
    • Nexus 4
    • Nexus 4 landscape
    • Nexus 5
    • Nexus 5 landscape
    • Nexus 5X
    • Nexus 5X landscape
    • Nexus 6
    • Nexus 6 landscape
    • Nexus 6P
    • Nexus 6P landscape
    • Nexus 7
    • Nexus 7 landscape
    • Nokia Lumia 520
    • Nokia Lumia 520 landscape
    • Nokia N9
    • Nokia N9 landscape
    • Pixel 2
    • Pixel 2 landscape
    • Pixel 2 XL
    • Pixel 2 XL landscape
    • Pixel 3
    • Pixel 3 landscape
    • Pixel 4
    • Pixel 4 landscape
    • Pixel 4a (5G)
    • Pixel 4a (5G) landscape
    • Pixel 5
    • Pixel 5 landscape
    • Moto G4
    • Moto G4 landscape






