Web Scraping with Python
Tools you can use for web scraping
Requests library
Beautiful Soup
Scrapy - a framework for crawling websites
Selenium - browser automation, useful for pages rendered by JavaScript
Why Scrape the Web?
The web is filled with valuable data.
For data trapped in web pages, with no API or export available, scraping is often the only option.
Human vs Web Scrapers
Human | Web Scraper
Enters a URL or clicks a bookmark | Sets a start_url
Downloads HTML | Downloads HTML
Parses HTML and renders it | Parses HTML
Reviews for useful information | Extracts useful information
Interprets | Transforms or aggregates
Remembers the information | Saves the data
Clicks a link or enters another URL | Goes to the next URL
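The scraper column above can be sketched as a small loop. This is a minimal illustration using only the standard library, not a production crawler: the names `LinkAndTitleParser`, `download`, and `scrape` are hypothetical, and extracting only the page title is an assumption made to keep the example short. The `fetch` parameter lets you swap the real download for a stub when working offline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkAndTitleParser(HTMLParser):
    """Collects the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def download(url):
    """Download HTML over the network (replace with a stub for offline runs)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape(start_url, fetch=download, max_pages=5):
    """Follow the steps in the table, visiting at most max_pages pages."""
    queue, seen, saved = [start_url], set(), []
    while queue and len(seen) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        html = fetch(url)                           # Download HTML
        parser = LinkAndTitleParser()
        parser.feed(html)                           # Parse HTML
        title = parser.title.strip()                # Extract and transform
        saved.append({"url": url, "title": title})  # Save the data
        for href in parser.links:                   # Go to the next URL
            queue.append(urljoin(url, href))
    return saved
```

In practice Requests and Beautiful Soup would replace `urlopen` and `HTMLParser` here; the standard library is used only so the sketch runs without extra installs.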
URL Hacking
Scheme: https://
Host and port: www.website.com:443
Path: /path
Query string: query_string (introduced by ?)
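As a quick check of how these pieces fit together, the standard library's `urllib.parse.urlsplit` can break a URL into the same components. The URL and its `q` and `page` parameters are made up for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical URL assembled from the components listed above.
url = "https://www.website.com:443/path?q=python&page=2"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.hostname)  # www.website.com
print(parts.port)      # 443
print(parts.path)      # /path
print(parts.query)     # q=python&page=2
```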
URL Fragments
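The '#' introduces the fragment, a location within the page. It is handled by the browser and not sent to the server; some sites use it to hold the main filter set.
The '&' separates the parameters of the query string, which begins with '?' and usually carries the search terms.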
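A short sketch of pulling the two apart with `urllib.parse`, assuming a made-up URL whose `q`, `page`, and `results` values are placeholders. Because the fragment never reaches the server, it is common to strip it with `urldefrag` before requesting a page.

```python
from urllib.parse import parse_qs, urldefrag, urlsplit

# Hypothetical URL with both a query string and a fragment.
url = "https://www.website.com/path?q=python&page=2#results"
parts = urlsplit(url)

params = parse_qs(parts.query)    # split the query string on '&'
fragment = parts.fragment         # the text after '#'
request_url = urldefrag(url).url  # URL without the fragment

print(params)       # {'q': ['python'], 'page': ['2']}
print(fragment)     # results
print(request_url)  # https://www.website.com/path?q=python&page=2
```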
start_url = f'https://{host}{path}{query_string}'
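One way to fill in that template is to let `urllib.parse.urlencode` build the query string, so spaces and special characters are escaped correctly. The host, path, and parameters below are placeholder values, and the scheme is https to match port 443 shown earlier.

```python
from urllib.parse import urlencode

# Placeholder values; substitute the site and search you are targeting.
host = "www.website.com"
path = "/path"
params = {"q": "web scraping", "page": 2}

# urlencode joins the parameters with '&' and escapes each value.
query_string = "?" + urlencode(params)
start_url = f"https://{host}{path}{query_string}"

print(start_url)  # https://www.website.com/path?q=web+scraping&page=2
```

Changing a value in `params` and rebuilding `start_url` is the essence of URL hacking: you reach a different result page without clicking through the site.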