Geo-targeting with Proxies: Extracting Location-Based Data using the Scraping Browser

Written by Marco Rodrigues · Edited by Florence Desiata · Updated Aug 21, 2023


Data has become the cornerstone of numerous industries today, empowering businesses with crucial insights and competitive advantages. Location-based data is especially valuable for understanding consumer behaviour and optimizing marketing strategies.

Geo-targeting allows organizations to obtain this valuable location-specific data, enabling them to tailor their offerings to: 

  • suit regional preferences
  • target specific demographics
  • optimize advertising campaigns 

It enables access to content or data from various locations, allowing for comparative analysis. For example, if you reside in Germany and wish to examine price listings for a product, you can use geo-targeting to access and compare prices in the United States.

However, geo-targeting through web scraping is not without its challenges. Websites implement geolocation restrictions as a defensive measure to prevent data extraction from specific regions or IP addresses, limiting the scope of conventional scraping methods.

To circumvent these geolocation blocks, you’ll need to use a proxy. It works by providing scrapers with alternate IP addresses from different geographic locations.

Continue reading this article to learn more about: 

  • the importance of geo-targeting
  • the role of proxies in extracting location-specific data
  • why the naive approach to proxies might end up hurting you in the long run

We'll also cover an advanced scraping solution, Bright Data's Scraping Browser, with sophisticated proxy management to enable uninterrupted and continuous scraping at scale, bypassing IP-based website blocks.

Let’s get into it.

Key Takeaways

  • Geo-targeting with proxies empowers businesses to access location-specific data, optimizing marketing strategies for regional preferences.
  • Proxies mask IP addresses, ensuring anonymity during scraping, overcoming geolocation blocks, and preventing IP bans.
  • Traditional proxy approaches can be time-consuming and unreliable, but Bright Data's Scraping Browser offers a seamless and scalable solution.
  • The Scraping Browser comes with a vast network of high-quality proxies from 195 countries, ensuring uninterrupted data extraction.
  • Leveraging location-based insights through Bright Data's advanced proxy infrastructure can drive business growth and improve customer experiences.

How Geo-Targeting Enables Better Marketing Strategies

Geo-targeting is a game-changer for businesses aiming to connect with their audiences on a personal level. It allows companies to customize products, services, and marketing campaigns based on specific locations. 

This approach leads to better decision-making and improved customer experience because it helps businesses understand the following:

  • regional consumer preferences
  • cultural differences
  • local trends

Additionally, here are other advantages businesses can derive from geo-targeting: 

  • Geographical Market Research: Geo-targeting facilitates comprehensive geographical market research, allowing companies to identify untapped opportunities, assess competition, and make informed decisions about new markets.
  • Personalized Marketing Campaigns: With location-specific data, businesses can create targeted advertisements that resonate with customers' unique needs and interests in a region. 
  • Enhanced Local Business Visibility: Businesses can optimize their online presence with location-based data. This optimization can help potential customers find nearby businesses easily.

But as mentioned earlier, websites often use geolocation blocks, making it difficult for scrapers to gather valuable data unless they have access to globally distributed IPs.

 

Warning: Repeated and aggressive scraping attempts from a single IP address can lead to IP blocks and rate limiting, further hindering the scraping process.

To overcome the challenges posed by geolocation blocks and IP restrictions, proxies present a powerful solution for achieving accurate and reliable results in geo-targeted web scraping.

In a nutshell. . .

Geo-targeting enhances marketing strategies by customizing products, understanding regional preferences, and conducting geographical market research. Proxies overcome geolocation blocks to ensure accurate data extraction in web scraping.

The Role of Proxies in Extracting Location-Based Data

Proxies are servers that sit between the scraper and the target website. They mask the scraper's true IP address and allow it to rotate between multiple IP addresses.

This approach brings several advantages, such as:

  • Masking IP Addresses for Anonymity: Because requests reach the website from the proxy's IP address rather than the scraper's own, scraping activity is harder to trace back to its source. Rotating through a pool of diverse IP addresses makes it harder for websites to identify and block the scraper, enabling uninterrupted data extraction.
  • Overcoming Location Restrictions: Certain websites may detect attempts at geo-targeting and restrict access based on geographical regions. Proxies circumvent this by allowing scrapers to appear as if they were real users browsing from various locations worldwide. This capability allows businesses to extract data from any region of interest.
  • Preventing IP Blocks and Rate Limiting: Frequent and aggressive scraping from a single IP address can trigger IP blocks and rate-limiting measures, hindering data extraction and disrupting geo-targeting. Proxies address this issue by distributing scraping requests across multiple IP addresses, so that no individual IP address is overwhelmed.
  • Load Distribution: By rotating proxies within a given geographical region, you prevent excessive requests from a single IP, avoiding potential bans while scraping data.
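
The rotation idea in the list above can be sketched in a few lines of Python. This is a minimal illustration rather than any provider's implementation: the `RegionalRotator` class is a made-up name for this example, and the proxy addresses are placeholders taken from the reserved documentation IP ranges.

```python
from itertools import cycle


class RegionalRotator:
    """Round-robin rotation over proxies grouped by country code."""

    def __init__(self, proxies_by_country):
        # One independent cycle per country keeps rotation inside a region.
        self._cycles = {
            country: cycle(proxies)
            for country, proxies in proxies_by_country.items()
            if proxies
        }

    def next_proxy(self, country):
        """Return the next proxy for the given country code."""
        if country not in self._cycles:
            raise KeyError(f"no proxies configured for {country!r}")
        return next(self._cycles[country])


# Placeholder addresses (documentation ranges), not real proxies.
POOL = {
    "us": ["http://192.0.2.10:8080", "http://192.0.2.11:8080"],
    "br": ["http://198.51.100.20:8080"],
}

rotator = RegionalRotator(POOL)
picked = [rotator.next_proxy("us") for _ in range(3)]
print(picked)
```

Each call hands out the next address for the requested country and wraps around at the end of the list, so consecutive requests to the same region leave from different IPs.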
 

Helpful Articles: To gain more knowledge about proxies, explore Techjury's article on private proxies and fresh proxies.

However, while proxies are indispensable tools for geo-targeted web scraping, traditional approaches to implementing proxies come with the following limitations:

  • Manual Proxy Management: Manually obtaining and managing proxies can be time-consuming and resource-intensive, especially when dealing with a large-scale or ongoing scraping operation.
  • Dealing with IP Rotation: Implementing manual IP rotation can be technically challenging and time-consuming.
  • Reliability Concerns: Free or public proxies often suffer from limited availability and slow response times, leading to frequent data extraction disruptions that can jeopardize geo-targeting efforts.
  • Limited Geographical Coverage: Some proxy providers may have a restricted geographical reach, limiting access to data from specific locations and hindering comprehensive geo-targeting initiatives.
  • Ethical Compliance in Proxy Acquisition: The gray area around how free proxy providers acquire their IP addresses may raise legal and ethical concerns down the line.
  • Limited Scalability: As scraping needs grow, traditional proxy setups may struggle to keep up with demand and require infrastructure upgrades on your part, creating a bottleneck that can be expensive to address.

Pro Tip

Free proxies frequently lack IP diversity, making scrapers vulnerable to IP blocks and bans when websites detect excessive scraping activity. Always check whether a free proxy or free trial is safe to use.
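
To give a sense of what manual rotation involves, here is a rough sketch of failover with exponential backoff. It assumes a caller-supplied `fetch` function that raises an exception when a proxy is blocked or unreachable; `fetch_with_failover`, `fake_fetch`, and the proxy addresses are illustrative names and placeholders, not part of any library.

```python
import time


def fetch_with_failover(fetch, proxies, max_attempts=4, base_delay=0.0):
    """Try proxies in turn, backing off between failed attempts.

    `fetch` is any callable that takes a proxy URL and returns a result,
    raising an exception when the proxy is blocked or unreachable.
    """
    if not proxies:
        raise ValueError("at least one proxy is required")
    last_error = None
    for attempt in range(max_attempts):
        proxy = proxies[attempt % len(proxies)]  # rotate on every attempt
        try:
            return fetch(proxy)
        except Exception as error:  # a real scraper would catch narrower errors
            last_error = error
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error


# Stand-in for a real HTTP call: the first proxy is treated as blocked.
def fake_fetch(proxy):
    if proxy.endswith(":8080"):
        raise ConnectionError("blocked")
    return f"page via {proxy}"


result = fetch_with_failover(
    fake_fetch, ["http://192.0.2.10:8080", "http://192.0.2.11:9090"]
)
print(result)
```

Even this small version has to decide how many attempts to make, how long to wait, and which errors count as a block, which is the kind of upkeep a managed proxy service takes off your hands.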

Where traditional approaches to proxy implementation fall short, a robust proxy infrastructure effectively addresses these limitations. 

In a nutshell. . .

Proxies act as intermediaries, masking the scraper's IP, enabling anonymity, overcoming location restrictions, and preventing IP blocks. However, manual management and reliability issues pose challenges.

The following section examines how Bright Data’s Scraping Browser, a headful (full GUI) browser backed by Bright Data’s premium proxy network, makes for a highly scalable, seamless, and uninterrupted scraping process.

Geo-targeting at scale with Bright Data’s Scraping Browser

The Scraping Browser is a ‘headful’ Chrome instance with a full GUI that runs on Bright Data’s servers. You connect to it remotely over a WebSocket connection from browser automation libraries such as Puppeteer and Playwright.

It ships with unlocker technology designed to bypass anti-scraping and bot detection measures. Most importantly for our purposes, it also comes with Bright Data’s proxy infrastructure out of the box, boasting:

  • a vast network of high-quality, diverse IP addresses (spanning over 195 countries)
  • ethically sourced and vetted IPs (compliant with the EU General Data Protection Regulation and the California Consumer Privacy Act)
  • 99.9% network uptime

It uses four different kinds of proxy services: residential proxies, data center proxies, ISP proxies, and mobile proxies, selecting from this pool based on the automatically detected use case.

The Scraping Browser combines this proxy infrastructure with the unlocker technology to bypass blocks and handle IP rotation, throttling, and retries automatically; you don't need to write any code for it.

This eliminates the need for external infrastructure, advanced code, or third-party libraries, making for a seamless, comprehensive, and highly scalable solution for your data collection needs.

In a nutshell. . .

Bright Data's Scraping Browser is a GUI Chrome instance with unlocker tech and a premium proxy infrastructure, offering seamless geo-targeted web scraping.

Let’s now actually use the Scraping Browser for geo-targeting Amazon products. 

Web Scraping with the Scraping Browser: Comparing Prices in Different Countries

For our example, we’ll use the Scraping Browser with Python’s Playwright package to compare Lenovo product listings on Amazon’s website as seen from different countries.

To set up the Scraping Browser, start by signing up (click on ‘Start free trial’) and entering your details.

Once you’re logged in, go to Proxies & Scraping Infrastructure and select the feature Scraping Browser.

This equips us with a robust browser with built-in unlocking capabilities and proxy management services, seamlessly bypassing geolocation blocks and other restrictions.

Activate the Scraping Browser, and you can access and navigate websites through browser automation libraries. Bright Data provides a $5 credit to try out the Scraping Browser without any additional costs.

To start using Playwright’s seamless integration with the Scraping Browser, install the Python package by running the following command:

pip install playwright

In the Access Parameters under the Scraping Browser window, you’ll find the API credentials: username (Customer_ID), zone name (attached to username), and password.

These credentials can be used to create a session with Playwright or any other supported browser automation library.
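
Since these credentials grant access to a paid service, one common precaution is to keep them out of the script and read them from environment variables. The sketch below shows that idea under stated assumptions: the variable names `SB_USERNAME`, `SB_PASSWORD`, and `SB_HOST` and the `load_credentials` helper are invented for this example and are not defined by Bright Data or Playwright.

```python
import os


def load_credentials(env=None):
    """Read Scraping Browser credentials from environment variables."""
    env = os.environ if env is None else env
    names = ("SB_USERNAME", "SB_PASSWORD", "SB_HOST")  # hypothetical names
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in names}


# Demo with an explicit mapping; real code would call load_credentials().
demo_env = {
    "SB_USERNAME": "user",
    "SB_PASSWORD": "secret",
    "SB_HOST": "example.invalid:9222",
}
creds = load_credentials(demo_env)
print(sorted(creds))
```

Failing early with the list of missing variables tends to be easier to debug than a rejected connection later on.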

With the help of Bright Data’s detailed instructions in their documentation for seamless integration with Playwright, I built the following Python script to scrape the Amazon website from three different countries: Algeria, the United States, and Brazil.

import asyncio
from playwright.async_api import async_playwright


async def text_or_none(parent, selector):
    # Return the text content of the first match, or None if nothing matches.
    element = await parent.query_selector(selector)
    return await element.text_content() if element else None


async def main(max_items, country):
    # Username, password, and host are provided by the Scraping Browser.
    # The -country-<code> suffix on the username selects the proxy location.
    auth = f"<username>-country-{country}:<password>"
    browser_url = f"wss://{auth}@<host>"
    search_term = "lenovo"
    website_to_crawl = f"https://www.amazon.com/s?k={search_term}"
    async with async_playwright() as pw:
        print('connecting')
        # Attach to the remote browser over the Chrome DevTools Protocol.
        browser = await pw.chromium.connect_over_cdp(browser_url)
        print('connected')
        page = await browser.new_page()
        print('goto')
        await page.goto(website_to_crawl, timeout=120000)
        print('done, evaluating')
        # Each matching container is treated as one search result.
        items = await page.query_selector_all(
            '.a-section.a-spacing-small.a-spacing-top-small')
        for item in items[:max_items]:
            title = await text_or_none(
                item, 'span.a-size-medium.a-color-base.a-text-normal')
            # The text of span.a-price joins its child spans, which is why
            # each price is printed twice (for example '$79.19$79.19').
            price = await text_or_none(item, 'span.a-price')
            # 'rank' holds the star rating text, e.g. '4.2 out of 5 stars'.
            rank = await text_or_none(item, 'span.a-icon-alt')
            elements = {'title': title, 'price': price, 'rank': rank}
            print(elements)

        await browser.close()


if __name__ == '__main__':
    # Country names mapped to the country codes used in the username.
    countries = {
        'Algeria': 'dz',
        'United States': 'us',
        'Brazil': 'br'}

    # k (key) is the name of the country and v (value) is its code.
    for k, v in countries.items():
        print("\n", f"----- GEO-TARGETING {k.upper()} -----", "\n")
        # Run one full scraping session per country.
        asyncio.run(main(10, v))

The browser_url variable holds the address of the remote browser on Bright Data’s servers and uses the secure WebSocket scheme (wss://). The client initiates the request, and the server responds if it accepts the connection. The username and password embedded in the URL (auth) authenticate the session, after which the client and the server exchange messages over the open connection.

Another line of code that requires further explanation:

browser = await pw.chromium.connect_over_cdp(browser_url)

The connect_over_cdp() method attaches Playwright to the remote Bright Data browser instance using the Chrome DevTools Protocol, which only Chromium-based browsers support. Developers use the Chrome DevTools Protocol to automate tests, scrape websites, and perform other browser interactions.

Adding the parameter -country-{country} just after the <username> in the auth variable is the trick for geo-targeting. It tells the proxy network to use an IP address from that specific region. Let’s say we use -country-us. Bright Data’s proxy servers will pick an IP address in the United States. That is why I made a dictionary at the end, with each country and its code, which works as a simple country rotator.
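
The way the connection string is assembled can be isolated in a small helper, sketched here. The `build_browser_url` function name, the sample credentials, and the host are placeholders for illustration; the two-letter check is an assumption based on the codes used in the script (dz, us, br), not a documented rule.

```python
def build_browser_url(username, password, host, country):
    """Build the WebSocket endpoint with a -country-<code> username suffix."""
    code = country.strip().lower()
    # Assumption: country codes are two letters, as in the script's dictionary.
    if len(code) != 2 or not code.isalpha():
        raise ValueError(f"expected a two-letter country code, got {country!r}")
    auth = f"{username}-country-{code}:{password}"
    return f"wss://{auth}@{host}"


# Placeholder credentials and host, for illustration only.
url = build_browser_url("my-user", "secret", "example.invalid:9222", "US")
print(url)
```

Validating the code before connecting catches typos such as 'usa' locally instead of surfacing them as a failed or mis-targeted session.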

The script runs three times (one for each of the countries) and prints the results as shown below: 

----- GEO-TARGETING ALGERIA -----

connecting
connected
goto
done, evaluating
{'title': None, 'price': None, 'rank': None}
{'title': None, 'price': None, 'rank': None}
{'title': 'Lenovo Smart True Wireless Earbuds - Smart Switch Fast Pair - Active Noise Cancelling Earphones with Wireless Charging Case - 28 Hrs Playtime Headphones - 6 Built-in Mics - Bluetooth - White', 'price': '$79.19$79.19', 'rank': '4.0 out of 5 stars'}
{'title': 'Lenovo IdeaPad Flex 5-2023 - Touchscreen 2-in-1 Laptop - Windows 11 Home - 14" FHD Display - 16GB Memory - 512GB Storage - AMD Ryzen 5 5500U - Abyss Blue', 'price': '$593.07$593.07', 'rank': '4.2 out of 5 stars'}
{'title': 'Lenovo 2022 Newest Ideapad 3 Laptop, 15.6" HD Touchscreen, 11th Gen Intel Core i3-1115G4 Processor, 8GB DDR4 RAM, 256GB PCIe NVMe SSD, HDMI, Webcam, Wi-Fi 5, Bluetooth, Windows 11 Home, Almond', 'price': '$407.00$407.00', 'rank': '4.3 out of 5 stars'}
{'title': "Lenovo 2023 High Performance 15'' FHD IPS Laptop, Intel Quad-Core Pentium Processor Up to 3.0GHz, 8GB RAM, 256GB SSD, Super-Fast WiFi Speed, Windows 11 OS, Dale Blue (Renewed)", 'price': '$253.99$253.99', 'rank': '4.3 out of 5 stars'}
{'title': 'Lenovo IdeaPad 1 14 Laptop, 14.0" HD Display, Intel Celeron N4020, 4GB RAM, 64GB Storage, Intel UHD Graphics 600, Win 11 in S Mode, Cloud Grey', 'price': '$171.99$171.99', 'rank': '4.2 out of 5 stars'}
{'title': 'Lenovo IdeaPad Gaming 3 - (2022) - Essential Gaming Laptop Computer - 15.6" FHD - 120Hz - AMD Ryzen 5 6600H - NVIDIA GeForce RTX 3050 - 8GB DDR5 RAM - 256GB NVMe Storage - Windows 11 Home', 'price': '$679.99$679.99', 'rank': '4.2 out of 5 stars'}
{'title': 'Lenovo IdeaPad 3 - (2023) - Everyday Notebook - Windows 11-14" Full HD - 8GB Memory - 128GB Storage - Intel Core i3-1115G - Platinum Grey', 'price': '$297.99$297.99', 'rank': '4.5 out of 5 stars'}
{'title': 'Lenovo IdeaPad Flex 5-2023 - Touchscreen 2-in-1 Laptop - Windows 11 Home - 14" FHD Display - 16GB Memory - 512GB Storage - AMD Ryzen 5 5500U - Abyss Blue', 'price': '$593.07$593.07', 'rank': '4.2 out of 5 stars'}

----- GEO-TARGETING UNITED STATES -----

connecting
connected
goto
done, evaluating
{'title': None, 'price': None, 'rank': None}
{'title': None, 'price': None, 'rank': None}
{'title': 'Lenovo 2022 IdeaPad 3i 15.6" Touchscreen Business Laptop Computer, Intel Core i3-1115G4 (Beat i5-10210U), 8GB DDR4 RAM, 256GB PCIe SSD, WiFi 6, BT 5.0, Grey, Windows 11 Pro S, BROAG 64GB Flash Stylus', 'price': '$449.00$449.00', 'rank': '4.5 out of 5 stars'}
{'title': 'Lenovo IdeaPad, 20GB RAM, 1TB SSD, AMD Dual-core Processor, 15.6 Inch HD Anti-Glare Display, Long Battery Life Up to 9.5Hr, HDMI, SD Card Reader, Windows 11, 1 Year Microsoft 365', 'price': '$439.99$439.99', 'rank': '4.3 out of 5 stars'}
{'title': 'Lenovo 2022 Newest Ideapad 3 Laptop, 15.6" HD Touchscreen, 11th Gen Intel Core i3-1115G4 Processor, 8GB DDR4 RAM, 256GB PCIe NVMe SSD, HDMI, Webcam, Wi-Fi 5, Bluetooth, Windows 11 Home, Almond', 'price': '$398.99$398.99', 'rank': '4.3 out of 5 stars'}
{'title': 'Lenovo IdeaPad 1 14 Laptop, 14.0" HD Display, Intel Celeron N4020, 4GB RAM, 64GB Storage, Intel UHD Graphics 600, Win 11 in S Mode, Cloud Grey', 'price': '$159.00$159.00', 'rank': '4.2 out of 5 stars'}
{'title': 'Lenovo 2022 Newest IdeaPad 1 Laptop, 14" Anti-Glare Display, Intel Quad-Core Processor, Intel UHD Graphics, 4GB RAM, 256GB PCIe SSD, Windows 11 + Microfiber Cloth', 'price': '$279.00$279.00', 'rank': '4.2 out of 5 stars'}
{'title': 'Lenovo IdeaPad Gaming 3 - (2022) - Essential Gaming Laptop Computer - 15.6" FHD - 120Hz - AMD Ryzen 5 6600H - NVIDIA GeForce RTX 3050 - 8GB DDR5 RAM - 256GB NVMe Storage - Windows 11 Home', 'price': '$669.99$669.99', 'rank': '4.2 out of 5 stars'}
{'title': 'Lenovo 2023 Newest IdeaPad Flex 5 2-in-1 Laptop, 16" 2.5K WQXGA Touchscreen Display, Intel Core i7-1255U Processor, 16GB DDR4 RAM, 512GB SSD, Intel Iris Xe Graphics, Webcam, Wifi6, Windows 11 Home', 'price': '$769.99$769.99', 'rank': '5.0 out of 5 stars'}
{'title': 'Original Lenovo LP40 Pro TWS Earphones Wireless Bluetooth 5.1 Sport Noise Reduction Headphones Touch Control 250mAH 2022 New (Black)', 'price': '$12.00$12.00', 'rank': '4.5 out of 5 stars'}

----- GEO-TARGETING BRAZIL -----

connecting
connected
goto
done, evaluating
{'title': None, 'price': None, 'rank': None}
{'title': None, 'price': None, 'rank': None}
{'title': 'Lenovo IdeaPad, 20GB RAM, 1TB SSD, AMD Dual-core Processor, 15.6 Inch HD Anti-Glare Display, Long Battery Life Up to 9.5Hr, HDMI, SD Card Reader, Windows 11, 1 Year Microsoft 365', 'price': '$439.99$439.99', 'rank': '4.3 out of 5 stars'}
{'title': 'Lenovo Yoga 7i 2-in-1 Laptop, 16" WUXGA (1920 x 1200) Touch Screen, Intel Iris Xe Graphics, Intel Core i5-1335U, 8GB RAM, 512GB PCIe SSD, Backlit, Windows 11 Home, Storm Grey, with 5ave Stylus Pen', 'price': '$749.99$749.99', 'rank': None}
{'title': 'Lenovo 2022 Newest Ideapad 3 Laptop, 15.6" HD Touchscreen, 11th Gen Intel Core i3-1115G4 Processor, 8GB DDR4 RAM, 256GB PCIe NVMe SSD, HDMI, Webcam, Wi-Fi 5, Bluetooth, Windows 11 Home, Almond', 'price': '$398.99$398.99', 'rank': '4.3 out of 5 stars'}
{'title': 'Lenovo 2022 Newest IdeaPad 1 Laptop, 14" Anti-Glare Display, Intel Quad-Core Processor, Intel UHD Graphics, 4GB RAM, 256GB PCIe SSD, Windows 11 + Microfiber Cloth', 'price': '$279.00$279.00', 'rank': '4.2 out of 5 stars'}
{'title': 'Lenovo IdeaPad 1 14 Laptop, 14.0" HD Display, Intel Celeron N4020, 4GB RAM, 64GB Storage, Intel UHD Graphics 600, Win 11 in S Mode, Cloud Grey', 'price': '$159.00$159.00', 'rank': '4.2 out of 5 stars'}
{'title': 'Lenovo IdeaPad Gaming 3 - (2022) - Essential Gaming Laptop Computer - 15.6" FHD - 120Hz - AMD Ryzen 5 6600H - NVIDIA GeForce RTX 3050 - 8GB DDR5 RAM - 256GB NVMe Storage - Windows 11 Home', 'price': '$679.99$679.99', 'rank': '4.2 out of 5 stars'}
{'title': 'Lenovo IdeaPad 3 - (2023) - Everyday Notebook - Windows 11-14" Full HD - 8GB Memory - 128GB Storage - Intel Core i3-1115G - Platinum Grey', 'price': '$297.99$297.99', 'rank': '4.5 out of 5 stars'}
{'title': 'Lenovo IdeaPad Flex 5-2023 - Touchscreen 2-in-1 Laptop - Windows 11 Home - 14" FHD Display - 16GB Memory - 512GB Storage - AMD Ryzen 5 5500U - Abyss Blue', 'price': '$593.07$593.07', 'rank': '4.2 out of 5 stars'}

The results show the first 10 matched results for each country, along with their prices and ratings. The first two entries in every run are empty, presumably because the matched containers are not product listings. As expected, the output is different for each region. Some products appear in all three countries, but their order is different, and so is their price in some cases. Let’s take the following item as an example:

'Lenovo 2022 Newest Ideapad 3 Laptop, 15.6" HD Touchscreen, 11th Gen Intel Core i3-1115G4 Processor, 8GB DDR4 RAM, 256GB PCIe NVMe SSD, HDMI, Webcam, Wi-Fi 5, Bluetooth, Windows 11 Home, Almond'

The price of this item is $407.00 in Algeria, but it decreases to $398.99 in the United States and Brazil.
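
A comparison like this can also be done in code once the duplicated price strings are cleaned up. The following is a small sketch: `parse_price` and `price_by_country` are illustrative helper names, the prices are copied from the output above, and the product title is shortened to "Ideapad 3" for readability.

```python
import re

PRICE_PATTERN = re.compile(r"\$([\d,]+\.\d{2})")


def parse_price(raw):
    """Return the first dollar amount in a scraped price string, or None."""
    if not raw:
        return None
    match = PRICE_PATTERN.search(raw)
    return float(match.group(1).replace(",", "")) if match else None


def price_by_country(results, title):
    """Map each country to the parsed price of the product with this title."""
    prices = {}
    for country, products in results.items():
        for product in products:
            if product["title"] == title:
                prices[country] = parse_price(product["price"])
                break
    return prices


# Prices copied from the output above; the title is shortened.
results = {
    "Algeria": [{"title": "Ideapad 3", "price": "$407.00$407.00"}],
    "United States": [{"title": "Ideapad 3", "price": "$398.99$398.99"}],
    "Brazil": [{"title": "Ideapad 3", "price": "$398.99$398.99"}],
}

prices = price_by_country(results, "Ideapad 3")
print(prices)
print(round(prices["Algeria"] - prices["United States"], 2))
```

For this item the sketch reports a difference of 8.01 dollars between the Algerian and US listings.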

 

Helpful Articles: Techjury features additional articles related to data scraping. Read our guides on how to scrape Google Search data and 5 simple methods to scrape e-commerce websites.

Conclusion

The importance of location-based data in various industries cannot be overstated. Geo-targeting empowers businesses to understand regional preferences, optimize marketing strategies, and enhance user experiences. 

Proxies play a vital role in overcoming geolocation restrictions for effective web scraping. But while traditional free proxies are limited, Bright Data's premium proxy infrastructure offers a powerful solution, with a vast network of high-quality IP addresses from 195 countries, ensuring uninterrupted data gathering. 

With the Scraping Browser, you can leverage location-based insights effectively, driving growth and improving customer experiences through Bright Data's advanced proxy infrastructure while seamlessly bypassing geolocation and other IP-based blocks. Sign up for a free trial to experience the power of Bright Data's advanced proxy infrastructure and the Scraping Browser.

FAQs

How accurate is proxy geo-targeting in determining a user's location?

IP geolocation accuracy can be affected by your provider and device location. In general, IP-based geolocation is 50% to 80% accurate in identifying a user's region, state, or city.

Can proxy geo-targeting be used for fraudulent purposes?

Proxy servers mask the geolocation of IP addresses, and fraudsters often use them to make deceptive purchases and chargebacks. They do this to avoid detection when the address on a payment method does not match the location of their IP address.

Is it illegal to change my IP location?

Changing your IP address is legal in the United States. It can be done to enhance online safety and protect online privacy. However, if you are from another country or traveling to one, it's important to check the legality to avoid 
