Month: July 2020

Get and pass CSRF token using python requests library

Post author By admin
Post date July 20, 2020
No Comments on Get and pass CSRF token using python requests library

import requests

URL = 'https://portal.bitcasa.com/login'
EMAIL = 'you@example.com'    # placeholder: replace with your login
PASSWORD = 'your-password'   # placeholder: replace with your password

client = requests.Session()

# Retrieve the CSRF token first: the GET response sets the CSRF cookie
client.get(URL)
if 'csrftoken' in client.cookies:
    # Django's default CSRF cookie name
    csrftoken = client.cookies['csrftoken']
else:
    # fallback for older setups that name the cookie 'csrf'
    csrftoken = client.cookies['csrf']

# Pass the CSRF token both in the form data (csrfmiddlewaretoken)
# and in the session cookies (already stored in client.cookies)
login_data = dict(username=EMAIL, password=PASSWORD,
                  csrfmiddlewaretoken=csrftoken, next='/')
# On HTTPS, Django's CSRF check also validates the Referer header
r = client.post(URL, data=login_data, headers=dict(Referer=URL))
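If the CSRF cookie is missing or named differently, the token can often be read from the login form itself, because Django's {% csrf_token %} template tag renders it as a hidden input named csrfmiddlewaretoken. Below is a minimal sketch that uses only the standard library html.parser, assuming the target form follows that convention. The helper names CsrfInputParser and extract_csrf_token are hypothetical.

```python
from html.parser import HTMLParser


class CsrfInputParser(HTMLParser):
    """Collects the value of the first <input name="csrfmiddlewaretoken">."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        # Only look at <input> tags, and stop once a token has been found
        if tag != 'input' or self.token is not None:
            return
        attributes = dict(attrs)
        if attributes.get('name') == 'csrfmiddlewaretoken':
            self.token = attributes.get('value')


def extract_csrf_token(html):
    """Return the hidden CSRF token from a login page, or None if absent."""
    parser = CsrfInputParser()
    parser.feed(html)
    parser.close()
    return parser.token


# With the session above you would call: extract_csrf_token(client.get(URL).text)
page = ('<form method="post">'
        '<input type="hidden" name="csrfmiddlewaretoken" value="abc123">'
        '</form>')
print(extract_csrf_token(page))  # abc123
```

The parser also handles self-closing `<input ... />` tags, because HTMLParser routes those through handle_starttag by default.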

Tags cookie, CSRF, Python

Challenge Development

Simple business directory scraper (Python) on PythonAnywhere

Post author By admin
Post date July 3, 2020
No Comments on Business directory simple scraper (python) at pythonanywhere

My goal was to retrieve data from a web business directory.

Scraping business directories is, along with scraping SERPs, the most challenging task, so I first had to answer some basic questions:

Does the site have any scraping protection?
How much data does the directory hold?
What kinds of queries can I run to find all of the directory's items?
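One common way to approach the last question is to split the directory into many narrow searches, for example every category combined with every city, and then deduplicate listings that appear in more than one result set. Below is a minimal sketch that assumes the directory can be searched by category and location. The CATEGORIES and CITIES lists, the fetch_results callable, and the 'id' field are hypothetical placeholders for whatever the real site exposes. The count of unique IDs also gives a rough answer to how much data the directory holds.

```python
from itertools import product

# Hypothetical search facets: replace with the directory's real categories and locations
CATEGORIES = ['plumbers', 'dentists']
CITIES = ['Oslo', 'Bergen']


def build_queries(categories, cities):
    """Turn every (category, city) pair into one narrow search query."""
    return [{'what': what, 'where': where}
            for what, where in product(categories, cities)]


def collect_unique(queries, fetch_results):
    """Run each query and keep one record per listing id.

    fetch_results is a caller-supplied (hypothetical) function that returns
    a list of dicts, each with an 'id' key, for a single query.
    """
    seen = {}
    for query in queries:
        for item in fetch_results(query):
            seen.setdefault(item['id'], item)  # first occurrence wins
    return list(seen.values())


def fake_fetch(query):
    # Offline stub: listing 1 shows up in every result set,
    # the second listing is unique to each query
    return [{'id': 1, 'name': 'Acme'},
            {'id': f"{query['what']}-{query['where']}", 'name': 'Other'}]


queries = build_queries(CATEGORIES, CITIES)
items = collect_unique(queries, fake_fetch)
print(len(queries), len(items))  # 4 5
```

Keying the results by a stable listing ID means overlapping queries cost extra requests but never produce duplicate rows.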

Continue reading

Tags business directory

Challenge

Most popular web scraping targets and how to scrape them

Post author By admin
Post date July 1, 2020
No Comments on Most popular web scraping targets and how to scrape them

Online marketplaces
In online marketplaces, people offer their products for sale, much like garage sales but online (e.g. eCrater, www.1188.no).
They are easy to scrape, since they are usually free and tend not to protect their data.
Business directories
These are usually huge online directories aimed at a general audience (e.g. Yellow Pages). They do protect their data, to avoid duplication and loss of audience. See some posts on this.
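Protected directories tend to throttle or block clients that request pages too quickly. For that reason, a scraper for them usually paces its requests and backs off when the server answers with HTTP 429 (Too Many Requests) or 503. Below is a minimal sketch of exponential backoff with jitter, assuming those status codes signal throttling on the target site. The names backoff_delays, fetch_with_backoff, and RETRY_STATUSES are hypothetical. The get parameter can be any callable that returns an object with a status_code attribute, such as requests.Session().get.

```python
import random
import time

RETRY_STATUSES = {429, 503}  # assumed throttling signals


def backoff_delays(retries, base=1.0, cap=60.0):
    """Exponential delays (base, 2*base, 4*base, ...) capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def fetch_with_backoff(get, url, retries=5, base=1.0, sleep=time.sleep):
    """Call get(url), retrying with growing pauses while the server throttles.

    Makes at most 1 + retries calls and returns the last response.
    """
    response = get(url)
    for delay in backoff_delays(retries, base):
        if response.status_code not in RETRY_STATUSES:
            break
        # Full jitter: sleep a random fraction of the delay to spread retries out
        sleep(random.uniform(0, delay))
        response = get(url)
    return response


class FakeResponse:
    """Offline stand-in for a requests.Response."""

    def __init__(self, status_code):
        self.status_code = status_code


# The server throttles twice, then answers normally; sleeping is skipped here
replies = iter([429, 429, 200])
response = fetch_with_backoff(lambda url: FakeResponse(next(replies)),
                              'https://example.com/search', sleep=lambda s: None)
print(response.status_code)  # 200
print(backoff_delays(4))     # [1.0, 2.0, 4.0, 8.0]
```

Passing sleep as a parameter keeps the function testable offline. In real use, the default time.sleep applies.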

Continue reading

Tags scrape protection, web scraping