Categories
Development

Simple JAVA scraper that handles user-agent, headers and cookies

How to handle cookie, user-agent, headers when scraping with JAVA? We’ll use for this a static class ScrapeHelper that easily handles all of this. The class uses Jsoup library methods to fetch from data from server and parse html into DOM document.

Categories
Development

Check if cookies are enabled

function areCookiesEnabled() 
	{
		var cookieEnabled = (navigator.cookieEnabled) ? true : false;
		if (typeof navigator.cookieEnabled == "undefined" && !cookieEnabled)
		{
			document.cookie = "test";
			cookieEnabled = (document.cookie.indexOf("test") != -1) ? true : false;
		}
		return cookieEnabled;
	}

Navigator is the interface represents the state and the identity of the user agent. It allows scripts to query it and to register themselves to carry on some activities.
Navigator object can be retrieved using the read-only window.navigator property.

Categories
Development

Get and pass CSRF token using python requests library

import sys
import requests
URL = 'https://portal.bitcasa.com/login'
client = requests.session()

# Retrieve the CSRF token first
client.get(URL)  # sets cookie
if 'csrftoken' in client.cookies:
    # Django 1.6 and up
    csrftoken = client.cookies['csrftoken']
else:
    # older versions
    csrftoken = client.cookies['csrf']

# Pass CSRF token both in login parameters (csrfmiddlewaretoken)
# and in the session cookies (csrf in client.cookies)
login_data = dict(username=EMAIL, password=PASSWORD, csrfmiddlewaretoken=csrftoken, next='/')
r = client.post(URL, data=login_data, headers=dict(Referer=URL))
Categories
Development

Python: submit authenticated form using cookie and session

Recently, I was challenged to do bulk submits through an authenticated form. The website required a login. While there are plenty of examples of how to use POST and GET in Python, I want to share with you how I handled the session along with a cookie and authenticity token (CSRF-like protection).

In the post, we are going to cover the crucial techniques needed in the scripting web scraping:

  • persistent session usage
  • cookie finding and storing [in session]
  • “auth token” finding, retrieving and submitting in a form
Categories
Development Miscellaneous

Cookie browser-server workflow

Categories
Development

Handling HTTP Cookies in cURL

http cookieMost of developers stuck with the cookie handlng in web scraping. Sure it’s a tricky thing and this once has been my stumbling stone too. So here mainly for new scraing engineers i’d like to share of how to handle cookie in web scraping when using PHP. We’ve already done the post on scrape by cURL in PHP, so here we’ll only focus on a cookie side. The cookie is a small piece of data sent from a website and stored in a user’s web browser while the user is browsing that website. So when browser requests a page and along with web content cookie is returned browser does all the dirty job to store cookie and later send them back to server which rendered that web page in following web requests.