Web Scraping
Web scraping is the process of extracting information from websites, keeping in mind that not all websites are open for this data sharing.
Steps:
- Inspect the website
- Import the following libraries: requests, urllib.request, time, and BeautifulSoup
- Use
requests.get(url)
to get the data from the required url
- Parse data using
BeautifulSoup(response.text, “html.parser”)
- Use
time.sleep()
, and set time as argument, so the website will not recognize you as spammer
How to Scrape Websites Without Getting Blocked
- Respect scraping rules of the site
- Don’t over scrape data (slam the website with a lot)
- Act like real human
- Use several IP addresses
- Rotate User Agents and corresponding HTTP Request Headers between requests