How to Scrape Images From a Website? [A Step-by-Step Guide]

Written by Artem Minaev

Updated · Oct 25, 2023

Edited by Florence Desiata

Web browsers let you download images from the web. You can right-click an image and then copy or save it.

That works for a few images, but repeating it for hundreds of them is impractical. This is where image scraping comes into the picture.

This article discusses what image scraping is. It will also teach you a popular method of scraping images from websites with a Python image scraper.

Read on to learn more.

🔑 Key Takeaways

  • Image scraping is a practical method for efficiently collecting numerous images from websites.
  • Utilize Python and BeautifulSoup for a user-friendly and flexible image scraping solution.
  • Image scraping is valuable in AI development for training data and in eCommerce research for extracting product information.
  • Respect copyright and website terms of service when performing image scraping to ensure it's conducted lawfully and ethically.

Scraping Pictures From a Website: How Does It Work?

There are many ways to scrape website images. One of them is using a Python script for image scraping.

You can also opt for a paid image scraper. However, the cost of web scraping tools differs for each provider. Prices can be on a per-page/request basis or a monthly subscription.

Image scrapers work by requesting the whole HTML content of a webpage. Then, they identify the target image elements for scraping.

Afterward, they use the collected source URLs of the images to download the files for storage.

📝 Note

Paid image scrapers such as Octoparse can be pricey, especially if you plan on using them many times.

Common Use Cases of Image Scraping

Image scraping is useful whenever a project needs images in bulk. However, scraped images are commonly used for two applications: AI development and eCommerce.

Here’s how image scraping is used in those areas:

AI Development

AI developers use image scraping to gather data for training their models. These projects need very large numbers of images, which are impractical to collect by hand.

In short, image scraping makes gathering training data for AI projects easier.

eCommerce Research

Image scraping is also a typical method used in eCommerce research. Product photos are usually published alongside data such as prices, descriptions, and customer reviews.

Scraping the images together with that data makes analysis and research less time-consuming.

👍 Helpful Article

When scraping eCommerce websites, the process depends on the target data. Check out this TechJury guide to know more about how to scrape eCommerce websites.

What You Need To Scrape Photos

This article focuses on how to scrape images from a website using simple Python scripts. This method lets you scrape images for free and still get the original image files.

You do not have to worry if you do not have any coding experience. All you have to do is follow the steps. 

However, before starting with the process, here are the things you will need:

  1. Code Editor or Integrated Development Environment

A code editor is where you will write your scripts. A popular choice is Visual Studio Code from Microsoft. However, you are free to use any code editor that you prefer.

  2. Python

Python is a programming language that’s simple and easy to use. It has many libraries that simplify scraping tasks.

Pro Tip

Use a recent, supported version of Python. This way, the language stays compatible with new libraries and packages.

To check which version you have installed, run this command in your command prompt or terminal:

python -V

The output shows the installed version number, for example:

Python 3.11.4

Pro Tip

If you plan to scrape images from several pages, you may need rotating proxies to avoid being blocked by your target website's anti-bot measures.

The requests module accepts a proxy for each request, so a Python script can rotate through a list of proxies.
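As a rough illustration of proxy rotation, the sketch below cycles through a pool of proxies and builds the mapping that requests expects in its proxies argument. The proxy addresses and the proxy_rotation() helper are placeholders invented for this example, not part of the original tutorial.

```python
from itertools import cycle

# Placeholder proxy addresses (assumption): replace them with proxies
# you are allowed to use.
PROXY_POOL = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
    "http://proxy3.example.com:8080",
]


def proxy_rotation(pool):
    """Yield requests-style proxy mappings, cycling through the pool forever."""
    for address in cycle(pool):
        yield {"http": address, "https": address}


rotation = proxy_rotation(PROXY_POOL)

# Each call to next() returns the mapping for the next proxy in the pool.
first = next(rotation)
second = next(rotation)

# With requests installed, a mapping is passed through the proxies argument:
# response = requests.get(url, proxies=next(rotation), timeout=10)
```

Taking a new mapping for every request spreads the traffic across the pool, and the cycle starts over once the last proxy has been used.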

  3. BeautifulSoup

Besides the IDE and Python, you will also use the BeautifulSoup module to scrape images. To install it, run:

pip install beautifulsoup4

This module has many selector functions to pinpoint the data set you want to scrape. 

💡 Did You Know?

Besides image scraping, you can also use the BeautifulSoup module when scraping search results from Google. It is a practical tool compatible with Python and works well in other scraping tasks.

Once you have all three, you are ready to build your image scraper in Python.

5 Steps to Scrape Images From a Website

There are five steps to scrape photos from a website:

  1. Import the modules
  2. Send the HTTP request
  3. Identify the selectors
  4. Download and store the image files
  5. Run the code

Continue reading to learn how to do each step.

Step 1: Import the Modules

The first step is to import the modules needed to perform the task.

import requests
from bs4 import BeautifulSoup
import os

The requests module sends HTTP requests to the target website. If it is not installed on your device, run pip install requests.

The os module provides operating system functions, such as creating folders and building file paths. BeautifulSoup is the main parsing module.

Step 2: Send the HTTP Request

You can use the following code to send the HTTP request:

url = "https://www.freeimages.com/search/dogs"

response = requests.get(url)

soup = BeautifulSoup(response.content, "html.parser")

This code sets ‘url’ to the target address, calls the get() function, and saves the result to the ‘response’ object.

The BeautifulSoup() function creates a parsed object from the content of ‘response’. The "html.parser" argument selects the HTML parser from Python’s standard library, so you do not need to install a separate parser.
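The request above assumes the server answers promptly and successfully. As a hedged sketch, the helper below adds a timeout and an HTTP status check. The fetch_html() name and the User-Agent string are made up for this example. The session argument is assumed to be a requests.Session, or any object with a compatible get() method.

```python
def fetch_html(session, url, timeout=10):
    """Fetch a page and return its raw bytes, raising on HTTP error statuses."""
    # Hypothetical User-Agent value; some sites expect a descriptive one.
    headers = {"User-Agent": "image-scraper-tutorial/0.1"}
    response = session.get(url, headers=headers, timeout=timeout)
    # raise_for_status() raises an exception for 4xx and 5xx responses.
    response.raise_for_status()
    return response.content


# Usage with requests installed (not executed here):
# import requests
# html = fetch_html(requests.Session(), "https://www.freeimages.com/search/dogs")
# soup = BeautifulSoup(html, "html.parser")
```

Failing early on an error status keeps the script from parsing an error page as if it were the search results.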

Step 3: Identify the Selectors

Suppose your target data is the set of dog images on the first page of the “dogs” search results. To pinpoint them, you must inspect the HTML structure of the web page.

You can do this by right-clicking anywhere on the page and selecting Inspect to access the DevTools.

[Image: Source page of a sample web page]

Hover your mouse pointer over any element, and it will highlight the corresponding item in the actual rendered page.

In this example, the ‘img’ tag containing the image file is nested within a ‘div’ tag whose ‘class’ attribute has the value “grid-image-wrapper”.

To capture the source of the image file, use the following code:

div_elements = soup.find_all("div", class_="grid-image-wrapper")

for index, div_element in enumerate(div_elements):
    img_element = div_element.find("img")

The find_all() function returns every ‘div’ tag whose ‘class’ matches the given value.

On the following line of code, the find() function returns the first ‘img’ tag inside each match.

If you go back to the HTML content, the ‘src’ attribute often contains only the preview or thumbnail of the image, which is not what you want.

The full-size source of the image is in the ‘data-src’ attribute. However, this is not the case for all of the photos. Some of them only have ‘src’. To handle both cases, you can use:

    if "data-src" in img_element.attrs:
        image_url = img_element["data-src"]
    else:
        image_url = img_element["src"]

Though it’s not necessary, here is some additional code to skip base64-encoded images, which are embedded in the page instead of linked:

    if image_url.startswith("data:image/"):
        print("Skipping base64-encoded image:", image_url)
        continue
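The attribute checks above can also be collected into one small function. The sketch below is one possible way to do it, not the tutorial's own code. The pick_image_url() name is invented here, and the function additionally assumes that some pages use relative image paths, which it resolves against the page URL with urljoin().

```python
from urllib.parse import urljoin


def pick_image_url(attrs, page_url):
    """Return an absolute image URL from an img tag's attributes, or None."""
    # Prefer the full-size "data-src"; fall back to "src" if it is absent.
    raw_url = attrs.get("data-src") or attrs.get("src")
    if not raw_url:
        # The tag has no usable source attribute.
        return None
    if raw_url.startswith("data:"):
        # Inline (for example base64-encoded) images are not downloadable URLs.
        return None
    # Turn relative paths such as "/images/dog.jpg" into absolute URLs.
    return urljoin(page_url, raw_url)
```

Inside the loop, it could be called as image_url = pick_image_url(img_element.attrs, url), followed by continue whenever the result is None.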

It is time to download the files.

Step 4: Download and Store File Images

You've already collected the image URLs in the previous step. Still inside the loop, send a GET request for each one:

    image_response = requests.get(image_url)

To name the image files:

    filename = f"image_{index+1}.jpg"
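The filename above always ends in .jpg, even when the downloaded file is a PNG or WebP image. As an optional refinement, the sketch below takes the extension from the image URL, then from the response's Content-Type header, and falls back to .jpg only as a last resort. The build_filename() helper is hypothetical and not part of the original script.

```python
import mimetypes
import os
from urllib.parse import urlparse


def build_filename(index, image_url, content_type=None):
    """Build a name like image_1.png, keeping the real extension when known."""
    # The URL path excludes the query string, so "dog.png?size=large" works.
    path = urlparse(image_url).path
    extension = os.path.splitext(path)[1].lower()
    if not extension and content_type:
        # Drop parameters such as "; charset=..." before the lookup.
        mime_type = content_type.split(";")[0].strip()
        extension = mimetypes.guess_extension(mime_type) or ""
    # Fall back to the tutorial's original .jpg if nothing else is known.
    return f"image_{index + 1}{extension or '.jpg'}"
```

In the loop, this could be called as filename = build_filename(index, image_url, image_response.headers.get("Content-Type")).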

You can create the destination folder manually, or add this line before the loop to create it automatically:

os.makedirs("scraped_images", exist_ok=True)

To download and save the files to the folder “scraped_images”:

    file_path = os.path.join("scraped_images", filename)

    with open(file_path, "wb") as f:
        f.write(image_response.content)

To print the results in your terminal:

    print("Downloaded:", file_path)

This final piece of code will help you keep track of the results on the terminal.
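The saving code above writes whatever the server returns, including error pages. A minimal sketch of a safer variant is shown below. It assumes the response object exposes status_code, headers, and content the way a requests response does, and the save_image() name is invented for this example.

```python
import os


def save_image(image_response, file_path):
    """Write the response body to file_path only if it looks like an image."""
    content_type = image_response.headers.get("Content-Type", "")
    if image_response.status_code != 200 or not content_type.startswith("image/"):
        # Skip error pages and non-image responses.
        return False
    # Create the destination folder if it does not exist yet.
    os.makedirs(os.path.dirname(file_path) or ".", exist_ok=True)
    with open(file_path, "wb") as f:
        f.write(image_response.content)
    return True
```

The return value tells the caller whether a file was written, so the "Downloaded:" message can be printed only for real downloads.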

Step 5: Running The Code

Review the whole script for syntax issues. The entire code should look like this:

import requests
from bs4 import BeautifulSoup
import os

# URL of the website to scrape
url = "https://www.freeimages.com/search/dogs"

# Send a GET request to the website
response = requests.get(url)

# Create a BeautifulSoup object
soup = BeautifulSoup(response.content, "html.parser")

# Create a folder to save the scraped images
os.makedirs("scraped_images", exist_ok=True)

# Find all div elements with class "grid-image-wrapper"
div_elements = soup.find_all("div", class_="grid-image-wrapper")

# Loop through each div element
for index, div_element in enumerate(div_elements):
    # Find the img tag within the div; skip wrappers that have none
    img_element = div_element.find("img")
    if img_element is None:
        continue

    # Extract the image URL from the "data-src" attribute if available,
    # otherwise fall back to the "src" attribute
    if "data-src" in img_element.attrs:
        image_url = img_element["data-src"]
    else:
        image_url = img_element["src"]

    # Skip inline images whose URL starts with "data:image/"
    if image_url.startswith("data:image/"):
        print("Skipping base64-encoded image:", image_url)
        continue

    # Send a GET request to download the image
    image_response = requests.get(image_url)

    # Generate a filename for the image
    filename = f"image_{index+1}.jpg"

    # Set the file path to save the image
    file_path = os.path.join("scraped_images", filename)

    # Save the image to the specified directory
    with open(file_path, "wb") as f:
        f.write(image_response.content)

    print("Downloaded:", file_path)

Once you have saved the Python script as imagescraper.py, run it from your terminal:

python imagescraper.py

Wait for the downloads to complete. The images should now be saved in the designated folder, numbered in the order they appeared on the page.

[Image: Scraped images from the target webpage]

You just completed building your image scraper in Python.

Pro Tip

Be mindful of the time when scraping. Scrape images outside the site’s peak hours so you do not overload the server. This also keeps actual users from having a slow or bad experience.
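Besides choosing a quiet time, a scraper can pause between requests. The sketch below shows one simple way to do that. The throttled() helper and the delay values are assumptions made for illustration, so pick a delay that suits the target site.

```python
import time


def throttled(items, delay_seconds=1.0, sleep=time.sleep):
    """Yield items one at a time, pausing between consecutive items."""
    for position, item in enumerate(items):
        if position > 0:
            # Wait before every item except the first one.
            sleep(delay_seconds)
        yield item


# Usage: wrap the list being looped over, for example
# for index, div_element in enumerate(throttled(div_elements, delay_seconds=2)):
#     ...
```

The sleep parameter exists only so the pause can be replaced in tests. In normal use, the default time.sleep is enough.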

The Legality of Photo Scraping

Like web scraping in general, image scraping is usually considered legal as long as you are not extracting copyrighted or password-protected content. Also, note that the site owner has the final say on whether their content may be scraped.

📝 Note

Always check a site's robots.txt file and Terms of Service (ToS). These files and pages will show you what you can and cannot do with their published content.

Copyrighted content can also be publicly available, so it is a matter of how one will use it. This type of content is protected by copyright law, such as the DMCA in the United States, no matter how available it is.
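Python's standard library can read robots.txt rules for you. The sketch below parses a small example file and checks two URLs against it. The rules, the example.com URLs, and the "my-image-scraper" user agent name are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# Example rules, invented for illustration.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Create a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# can_fetch() reports whether the given user agent may request the URL.
allowed = parser.can_fetch("my-image-scraper", "https://example.com/images/dog.jpg")
blocked = parser.can_fetch("my-image-scraper", "https://example.com/private/cat.jpg")
```

For a live site, the parser's set_url() and read() methods download and parse the real robots.txt file instead of using a string.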

Conclusion

Image scraping is extremely useful for various research activities that require images. With only a code editor, Python, and BeautifulSoup, you can scrape images easily—even without prior coding experience.

However, always keep in mind that you must respect the sites you scrape by following their Terms of Service and limiting the volume of scraping requests.

FAQs

Is it legal to scrape Google images?

Results from a Google search come from various sources, so it is hard to tell which images are copyrighted. Whether scraping them is lawful depends on what you do with the collected images.

How to fetch image files in PHP?

You can use the file_get_contents() function to fetch an image file in PHP. To save the content, use file_put_contents() or the standard file-handling functions fopen() and fwrite().

How do I extract text from a JPEG?

You must use an Optical Character Recognition tool to extract text from a JPEG image file. There are many free OCR tools that you can use, as well as paid data parsing services.
