Categories
Development

Design patterns for hierarchical data storage and effective processing

The hierarchical data storage problem is a non-trivial task in relational database context. For example, your online shop has goods of different categories and subcategories creating tree spans for 5 levels. How should they be stored in a database?

Luckily, there are several approaches (design patterns) that will help the developer to design database structure without both odd tables and code. As a result, the site will work faster and any changes, even on database layer, won’t cause troubles. We will study these approaches below.

Categories
Development

SQL-injection: how to use them and how to defend against them

sql-injection-logoSQL (Structured Query Language) is a powerful language for working with relational databases, but quite a few people are in fact ignorant of the dark side of this language, which is called SQL-injection. Anyone who knows this language well enough can extract the needed data from your site by means of SQL – unless developers build defenses against SQL-injection, of course. Let’s discuss how to hack data and how to secure your web resource from these kinds of data leaks!

Categories
Development

Web parsing php tools

Almost all developers have faced a parsing data task. Needs can be different –  from a product catalog to parsing stock pricing. Parsing is a very popular direction in back-end development; there are specialists creating quality parsers and scrapers. Besides, this theme is very interesting and appeals to the tastes of everyone who enjoys web. Today we review php tools used in parsing web content.

Categories
Development

PHPxcel for importing and exporting data in working with Excel

Sometimes when you are developing a project, it might be necessary to do a parsing of xls documents. To give an example: you do a synchronization between xls worksheets and a website database, and you need to convert xls data to the Mysql and want to do it completely automatically.

If you work with Windows it is simple enough – you just need to use COM objects. However, it is another thing if you work with PHP and need to make it work under the UNIX systems. Fortunately there are many classes and libraries for this purpose. One of them is the class PHPExcel. This library is completely cross-platform, so you will not have problems with portability. 

Categories
Development Guest posting

Web Scraping with Java and HtmlUnit

java-htmlunit-post-front-cover-smallWeb scraping or crawling is the act of fetching data from a third party website by downloading and parsing the HTML code to extract the data you want. It can be done manually, but generally this term refers to the automated process of downloading the HTML content of a page, parsing/extracting the data, and saving it into a database for further analysis or use.

Categories
Development

Finding duplicate rows in a SQL table and printing out their ids

Today I was trying to find duplicate rows in db table rows, but besides finding them, I needed to get their indexes in order to address them.

Categories
Development Guest posting

CaptchaSolutions test results for ReCaptcha v2.0

captchasolutionsRecently we executed the CaptchaSolutions.com service testing. CaptchaSolutions.com provides an automated online captcha solver API service (the name speaks for itself). It also includes solving google reCaptcha 2.0. So we decided to test it against this challenging captcha.

If you want to compare the other services’ solving reCaptcha 2.0 test results, then please refer to this post
Categories
Development

Proxy speed and performance test

I want to test a proxy [gateway] service. What would be the simplest script to check the proxy’s IP speed and  performance? See the following script.

Categories
Development

Php Curl download file

We want to show how one can make a Curl download file from a server. See comments in the code as explanations.

// open file descriptor
$fp = fopen ("image.png", 'w+') or die('Unable to write a file'); 
// file to download
$ch = curl_init('http://scraping.pro/ewd64.png');
// enable SSL if needed
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); 
// output to file descriptor
curl_setopt($ch, CURLOPT_FILE, $fp);          
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
// set large timeout to allow curl to run for a longer time
curl_setopt($ch, CURLOPT_TIMEOUT, 1000);     
curl_setopt($ch, CURLOPT_USERAGENT, 'any');
// Enable debug output
curl_setopt($ch, CURLOPT_VERBOSE, true);   
curl_exec($ch);
curl_close($ch);                               
fclose($fp);

 

Categories
Development

Selenium using proxy gateway, how?

I develop a web scraping project using Selenium. Since I need rotating proxies [in mass quantities] to be utilized in the project, I’ve turned to the proxy gateways (nohodo.com, charityengine.com and some others). The problem is how to incorporate those proxy gateways into Selenium for surfing web?