How to Scrape Google Search Results in Python Without Getting Blocked

If you’ve used Google at all, you know that even slightly unusual behavior will send you to the reCAPTCHA gods. But in a world where search results feed so many different businesses, whether directly or indirectly, not being able to access these results programmatically can be quite a problem.

A How-To Guide to Scraping Google

Instead of reaching for requests, BeautifulSoup, or something similar, we will use Playwright, a headless browser automation tool, so that our scraper appears like a real user (or at least a real browser 😉).

Install the Required Dependencies

We’ll be using Python, so make sure it is installed, and then install Playwright:

pip install playwright

Playwright also needs to download the browsers it will drive. This can be done with:

playwright install

Scraping with Playwright

The code is largely straightforward and self-documenting, but let’s go over it quickly. The scrape_google function starts the async Playwright process and opens a new browser context and page. We then navigate to the search URL; Playwright automatically waits for the page state to settle. Next comes the magic: we select all the h3 tags with the classes LC20lb, MBeuO, and DKV0Md, which resolves to the list of valid search result headings. Finally, the extract_results function pulls the text and link content out of each of these tags.

Enhancing the Scraper with Hyperbrowser

While Playwright is better than the more basic options, it has its own set of issues. For one, scraping at any real volume will get your IP blocked. You could wait for the ban to time out, or Google may throw CAPTCHAs your way that need solving before you can continue. The simplest solution here is to use something like Hyperbrowser.

Hyperbrowser can be used alongside Playwright to scrape Google search results.

Hyperbrowser helps make web scraping more reliable by:

  • Automatically rotating proxies
    • Spreading requests across different IPs reduces detection risk and avoids rate-limit blocks. More importantly, it means your real IP never gets banned either.
  • Running in the cloud and scaling with your needs
    • If you've run any scraping operation at scale, you know it's not easy to manage a fleet of browsers.
  • Reducing detection risks with stealth browsing techniques
    • Hyperbrowser uses a range of techniques to make your browsing behavior look like a normal human user.
  • Solving CAPTCHAs automatically

Conclusion

Scraping Google search results with Python is challenging because of anti-bot defenses, but it doesn’t need to be. Playwright is a great starting point, but its stealth can still be improved in many ways. If you’re facing aggressive anti-scraping measures, give Hyperbrowser a try today!
