Category: Development

Proxy speed and performance test

Post author By admin
Post date August 30, 2017

I want to test a proxy [gateway] service. What would be the simplest script to check the proxy’s IP speed and performance? See the following script.
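Here is a minimal sketch of such a test using only the Python standard library. It sends the same request through the proxy several times and reports failures, average and worst latency, and rough throughput. The proxy address and target URL are placeholders you supply. The split into `proxy_fetcher` and `measure` (both hypothetical names) lets you time any fetch function.

```python
import statistics
import time
import urllib.request
from typing import Callable


def proxy_fetcher(proxy_url: str, target_url: str, timeout: float = 10.0) -> Callable[[], bytes]:
    """Return a zero-argument function that downloads target_url through proxy_url."""
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    )

    def fetch() -> bytes:
        with opener.open(target_url, timeout=timeout) as resp:
            return resp.read()

    return fetch


def measure(fetch: Callable[[], bytes], attempts: int = 5) -> dict:
    """Call fetch() several times and summarise latency, throughput and failures."""
    timings, sizes, failures = [], [], 0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            body = fetch()
        except Exception:  # any network or proxy error counts as a failed attempt
            failures += 1
            continue
        timings.append(time.perf_counter() - start)
        sizes.append(len(body))
    result = {"attempts": attempts, "failures": failures}
    if timings:
        total = sum(timings)
        result["avg_seconds"] = statistics.mean(timings)
        result["max_seconds"] = max(timings)
        result["bytes_per_second"] = sum(sizes) / total if total > 0 else float("inf")
    return result
```

Usage would look like `measure(proxy_fetcher("http://gateway.example.com:8080", "http://example.com/"), attempts=10)`, where the gateway address is hypothetical. Running it against several gateways with the same target URL gives comparable numbers.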

Tags proxy

Development

PHP cURL download file

This post shows how to download a file from a server with cURL in PHP. The comments in the code explain each step.

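<?php
// open a file handle (binary mode) to write the downloaded data into
$fp = fopen('image.png', 'wb') or die('Unable to open the file for writing');
// URL of the file to download
$ch = curl_init('http://scraping.pro/ewd64.png');
// skip SSL certificate verification (insecure; only for testing https:// URLs)
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
// write the response body straight into the file handle
curl_setopt($ch, CURLOPT_FILE, $fp);
// follow HTTP redirects
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// set a large timeout (in seconds) so long downloads are not cut off
curl_setopt($ch, CURLOPT_TIMEOUT, 1000);
curl_setopt($ch, CURLOPT_USERAGENT, 'any');
// enable debug output
curl_setopt($ch, CURLOPT_VERBOSE, true);
// run the transfer and report an error if it fails
if (curl_exec($ch) === false) {
    echo 'Download failed: ' . curl_error($ch);
}
curl_close($ch);
fclose($fp);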
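For comparison, here is a sketch of the same download in Python using only the standard library. The `download` helper name is hypothetical. Like CURLOPT_FILE, it streams the response to disk in chunks rather than holding it in memory. `urlopen` follows HTTP redirects by default, which covers what CURLOPT_FOLLOWLOCATION does. Unlike the PHP above, it keeps certificate verification on.

```python
import shutil
import urllib.request


def download(url: str, dest: str, timeout: float = 1000.0, user_agent: str = "any") -> int:
    """Stream url into the file dest and return the number of bytes written."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # urlopen follows redirects by default; timeout is in seconds
    with urllib.request.urlopen(req, timeout=timeout) as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)  # copy in chunks instead of reading it all at once
        return out.tell()
```

Calling `download("http://scraping.pro/ewd64.png", "image.png")` would mirror the PHP example.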

Tags Curl, PHP

Development

Selenium using proxy gateway, how?

Post author By admin
Post date June 13, 2017

I am developing a web scraping project with Selenium. Since the project needs rotating proxies in large quantities, I've turned to proxy gateways (nohodo.com, charityengine.com and some others). The problem is how to plug those proxy gateways into Selenium for browsing the web.
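A common approach with Chrome is to pass the gateway's single host:port to the browser with the `--proxy-server` switch. The gateway then typically rotates exit IPs on its side, so the browser itself needs only one fixed proxy address. The sketch below assumes Selenium 4 with chromedriver available, and a gateway that authorises your machine by IP. As far as I know, Chrome's `--proxy-server` switch does not accept a username and password, so credential-based gateways need another route, such as IP whitelisting. The helper names and the example host are hypothetical.

```python
def proxy_server_arg(host: str, port: int, scheme: str = "http") -> str:
    """Build Chrome's --proxy-server switch for a proxy gateway endpoint."""
    return f"--proxy-server={scheme}://{host}:{port}"


def make_chrome_driver(host: str, port: int):
    """Start Chrome with all traffic routed through the gateway (needs selenium + chromedriver)."""
    from selenium import webdriver  # third-party; imported here so this module loads without it

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_server_arg(host, port))
    return webdriver.Chrome(options=options)
```

With a real gateway you would call something like `driver = make_chrome_driver("gateway.example.com", 8080)` and then `driver.get(...)` as usual.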

Tags proxy

Development

Python – parameterized storing into db to prevent SQL injection example

Post author By admin
Post date May 24, 2017

test.py

import MySQLdb
import db_config

class Test:
    def connect(self):
        creds = db_config.db_credentials["mysql"]
        self.conn = MySQLdb.connect(host=creds["host"],
                                    user=creds["user"],
                                    passwd=creds["pass"],
                                    db=creds["name"])
        self.conn.autocommit(True)
        return self.conn

    def insert_parametrized(self, test_value="L'Île-Perrot"):
        cur = self.connect().cursor()
        # values passed as the second argument are escaped by the driver,
        # so quotes or SQL fragments in test_value are stored as plain text
        cur.execute("INSERT INTO a_table (name, city) VALUES (%s, %s)", ('temp', test_value))

# run it: the malicious-looking string is inserted as data, not executed
Test().insert_parametrized("test city'; DROP TABLE a_table;")

db_config.py (place it in the same directory as the test.py file)

db_credentials = {
    "mysql": {
        "name": "db_name",
        "host": "db_host", # eg. '127.0.0.1'
        "user": "xxxx",
        "pass": "xxxxxxxx",
    }
}
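If you want to try the same idea without a MySQL server, here is a self-contained sketch using Python's built-in sqlite3. Note that sqlite3 uses `?` placeholders, while MySQLdb uses `%s`. The table mirrors `a_table` above. After inserting the "DROP TABLE" string, the table still exists and holds that string verbatim.

```python
import sqlite3


def insert_parametrized(conn: sqlite3.Connection, name: str, city: str) -> None:
    # The values travel separately from the SQL text, so quotes or
    # "; DROP TABLE" fragments inside them are stored as plain data.
    conn.execute("INSERT INTO a_table (name, city) VALUES (?, ?)", (name, city))
    conn.commit()


conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE a_table (name TEXT, city TEXT)")
insert_parametrized(conn, "temp", "L'Île-Perrot")
insert_parametrized(conn, "temp", "test city'; DROP TABLE a_table;")

# The table survived and contains both strings exactly as given.
cities = [row[0] for row in conn.execute("SELECT city FROM a_table ORDER BY rowid")]
print(cities)
```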

Tags Python

Development

What are the ways of inserting web scraping results into an SQL server?

Post author By admin
Post date May 18, 2017

Use a webhook service to request your target data and store it in the DB.
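In practice, the webhook service calls an HTTP endpoint you run, POSTing the scraped records, and that endpoint writes them to the database. Below is a minimal sketch of the receiving side using only the standard library. It assumes the service sends a JSON list of objects with hypothetical `name` and `city` fields. sqlite3 stands in for the SQL server, and the `results` table name is made up. With a real SQL server you would swap in its DB-API driver and placeholder style.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer


def store_payload(conn: sqlite3.Connection, body: bytes) -> int:
    """Insert a JSON list of {"name": ..., "city": ...} records; return the row count."""
    records = json.loads(body)
    rows = [(r["name"], r["city"]) for r in records]
    # parameterized executemany: one statement, many value tuples
    conn.executemany("INSERT INTO results (name, city) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def make_handler(conn: sqlite3.Connection):
    """Build a request handler class that stores every POSTed payload."""
    class WebhookHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            count = store_payload(conn, self.rfile.read(length))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(json.dumps({"inserted": count}).encode())

    return WebhookHandler

# To serve it (not run here): HTTPServer(("", 8000), make_handler(conn)).serve_forever()
```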

Tags Curl, PHP

Development

Charles CA certificate with OpenSSL in Windows

Post author By admin
Post date April 25, 2017

Today I needed to enable the Charles proxy on my Windows PC. Later I managed to get a Genymotion virtual device monitored by Charles as well.

Tags proxy

Development Miscellaneous

SSH connection in terminal for Linux

Post author By admin
Post date February 23, 2017

Given:

host: lx567.certain.com (SFTP)
user: igor_user
password: testPass

For SSH access, type in a terminal:

$ ssh igor_user@lx567.certain.com

then enter the password (testPass) at the password prompt.

Tags Linux

Development

Python requests vs urllib2 for JS-stuffed website scrape

Post author By admin
Post date February 21, 2017

Question:

The Python requests library is very useful and has many advantages over similar libraries. However, when I tried to retrieve a Wikipedia page, requests.get() retrieved it only partially:
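One quick diagnostic is to fetch the same URL with both libraries and compare the response sizes. Neither requests nor urllib executes JavaScript, so anything a page builds client-side will be missing from both. A size difference between the two more likely points to request headers, redirects or compression. The sketch below assumes Python 3, where urllib.request replaces urllib2, and that requests is installed when `fetch_with_requests` is called. The `Mozilla/5.0` User-Agent and the helper names are arbitrary choices.

```python
import urllib.request
from typing import Callable, Dict


def fetch_with_urllib(url: str) -> bytes:
    # urllib.request is the Python 3 successor of urllib2
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read()


def fetch_with_requests(url: str) -> bytes:
    import requests  # third-party; imported here so the module loads without it

    resp = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    resp.raise_for_status()
    return resp.content


def compare_sizes(url: str, fetchers: Dict[str, Callable[[str], bytes]]) -> Dict[str, int]:
    """Download url with each fetcher and report the body size in bytes."""
    return {name: len(fetch(url)) for name, fetch in fetchers.items()}
```

For example, `compare_sizes(page_url, {"urllib": fetch_with_urllib, "requests": fetch_with_requests})` prints the two sizes side by side.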

Tags Python

Development

Headless browser python scraper at pythonanywhere

Post author By admin
Post date February 13, 2017

Recently I decided to use pythonanywhere.com to run Python scripts against JS-stuffed websites.

Originally I tried to use the dryscrape library but could not get it to work, and their helpful support team explained: “…unfortunately dryscrape depends on WebKit, and WebKit doesn’t work with our virtualisation system.”

Tags headless, Python

Development

Find XPath using web developer tools

Post author By admin
Post date February 10, 2017

When scraping, one often needs to find the XPath of certain elements on a web page. How can one do that with the browser's web developer tools, aka the Web Inspector? A picture is worth a thousand words.
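Once you have copied an XPath from the inspector (Chrome's "Copy XPath" gives paths such as `//*[@id="main"]/div[2]/span`), it is worth checking it against the page markup before using it in a scraper. The sketch below uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup. The HTML snippet and the path are made up for illustration. For real-world HTML, lxml with full XPath 1.0 support is the usual choice.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page fragment
html = """<html><body>
<div id="main">
  <div><span>first</span></div>
  <div><span>19.99</span></div>
</div>
</body></html>"""

root = ET.fromstring(html)
copied_xpath = '//*[@id="main"]/div[2]/span'
# ElementTree only accepts relative paths here, so "//" becomes ".//"
matches = root.findall("." + copied_xpath)
print([m.text for m in matches])
```

Note that `div[2]` is 1-based: it selects the second `div` child of the element with `id="main"`.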

Tags Xpath