Web Scraping with Python

Tools you can use for web scraping

Requests - downloads pages over HTTP

Beautiful Soup - parses HTML and extracts data from it

Scrapy - a framework for crawling websites

Selenium - automates a real browser

Why Scrape the Web?

The web is full of valuable data.

For data trapped in web pages, with no API or download available, scraping is often the only option.

Human vs Web Scraper

Human                                  Web Scraper
Enter a URL or click a bookmark        Set a start_url
Download HTML                          Download HTML
Parse HTML and render                  Parse HTML
Review for useful information          Extract useful information
Interpret                              Transform or aggregate
Remember the information               Save the data
Click a link or enter another URL      Go to the next URL
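
The web scraper steps above can be sketched as a small loop. This is a minimal illustration rather than a definitive implementation: it uses only the standard library (not Requests or Beautiful Soup) so it runs without installing anything, and the names `LinkAndTitleParser`, `scrape`, `download`, and `FAKE_SITE`, along with the `example.test` URLs, are hypothetical. The demo feeds the loop canned pages instead of touching the network.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def download(url):
    """Download HTML over the network (not called in the demo below)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape(start_url, fetch=download, max_pages=10):
    """Walk the scraper steps: download, parse, extract, save, next URL."""
    to_visit = [start_url]                      # Set a start_url
    seen = set()
    saved = []
    while to_visit and len(seen) < max_pages:
        url = to_visit.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)                       # Download HTML
        parser = LinkAndTitleParser()
        parser.feed(html)                       # Parse HTML
        title = parser.title.strip()            # Extract, then transform
        saved.append({"url": url, "title": title})  # Save the data
        for href in parser.links:               # Go to the next URL
            to_visit.append(urljoin(url, href))
    return saved


# Hypothetical two-page site, kept in memory so the demo needs no network.
FAKE_SITE = {
    "http://example.test/": (
        "<html><head><title> Home </title></head>"
        "<body><a href='/about'>About</a></body></html>"
    ),
    "http://example.test/about": (
        "<html><head><title>About</title></head>"
        "<body><a href='/'>Home</a></body></html>"
    ),
}

results = scrape("http://example.test/", fetch=FAKE_SITE.__getitem__)
for row in results:
    print(row)
```

Passing `fetch` as a parameter keeps the loop testable: swapping in `download` (or a Requests-based function) would point the same loop at a live site, where rate limiting and the site's terms of use would also need attention.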

URL Hacking

https://               scheme
www.website.com:443    host and port
/path                  path
query_string           query string (starts with '?')

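URL Fragments

#  location (fragment)
&  query string separator

The '#' starts the fragment, which normally points to a location within the page; on some sites it also carries the main filter set. The fragment is handled by the browser and is not sent to the server.

The '&' separates the parameters of the searched query string, which itself begins with '?'.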

start_url = f'https://{host}{path}{query_string}'
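
As a rough sketch of URL hacking in practice, the snippet below builds a `start_url` from its parts, splits it back apart, and rewrites one query parameter. It uses `https` as in the breakdown above and relies on the standard library's `urllib.parse`. The parameter names `q` and `page`, the `#reviews` fragment, and the helper `with_params` are assumptions made for illustration; `www.website.com` is the placeholder host from the breakdown.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

host = "www.website.com"                 # placeholder host
path = "/path"
params = {"q": "laptops", "page": 2}     # hypothetical query parameters

# The query string starts with '?'; '&' separates its parameters.
query_string = "?" + urlencode(params)
start_url = f"https://{host}{path}{query_string}"

# Split a URL (with a hypothetical fragment) back into its components.
parts = urlsplit(start_url + "#reviews")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)


def with_params(url, **changes):
    """Return url with the given query parameters added or replaced."""
    split = urlsplit(url)
    query = parse_qs(split.query)
    query.update({key: [str(value)] for key, value in changes.items()})
    return urlunsplit(split._replace(query=urlencode(query, doseq=True)))


# "Hack" the URL: jump straight to page 3 of the same search.
next_url = with_params(start_url, page=3)
print(next_url)
```

Rewriting parameters through `parse_qs` and `urlencode` rather than string slicing keeps values properly escaped, which matters once search terms contain spaces or punctuation.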
