Categories
Miscellaneous

Octoparse review

Octoparse is a new, modern visual web data extraction tool. It gives users a point-and-click UI for building extraction patterns, which its scrapers then apply to structured websites. Both experienced and inexperienced users find it easy to bulk extract information from websites with Octoparse, and most scraping tasks need no coding at all!

Categories
Miscellaneous

Edit and resend HTTP POST in a browser

Recently I needed to make a series of HTTP POST requests with different parameters. This sent me looking for existing tools; the features I wanted were capturing a POST request, editing it and resending it. What I found most useful was the Firefox browser with its developer tools, which is quick and convenient for this purpose. All the other methods are either incomplete (they resend without letting you edit) or require installing a lot of extra software.
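When the same request has to be resent many times with different parameters, scripting it can complement the browser approach. The following is a minimal Python sketch using only the standard library, not a method from the original post: the function names, the URL and the form fields are all hypothetical, and it assumes the target accepts form-encoded bodies.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post_requests(url, base_params, variations):
    """Return one POST Request per variation, each overriding base_params."""
    requests = []
    for override in variations:
        # Later keys win, so each variation edits the captured parameters.
        params = {**base_params, **override}
        body = urlencode(params).encode("utf-8")
        requests.append(
            Request(
                url,
                data=body,
                headers={"Content-Type": "application/x-www-form-urlencoded"},
                method="POST",
            )
        )
    return requests


def send_all(requests, timeout=10):
    """Send each prepared request and collect the HTTP status codes."""
    statuses = []
    for req in requests:
        with urlopen(req, timeout=timeout) as response:
            statuses.append(response.status)
    return statuses
```

Building the requests separately from sending them makes it possible to inspect each edited body before anything goes over the wire, for example `build_post_requests("https://example.com/form", {"q": "a", "page": "1"}, [{"page": "2"}, {"page": "3"}])`.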

Categories
Miscellaneous

Data Scraping Studio review

Data Scraping Studio (DSS) is a new, free, multi-threaded studio for effective data extraction. It consists of two parts: (1) a Google Chrome extension with a point-and-click interface for setting up a web scraping agent and (2) a desktop app for executing scraping agents.

Categories
Miscellaneous SaaS

CloudScrape to transform into Dexi.io

We have already written some posts on CloudScrape, a web scraping service startup based in Copenhagen, Denmark. The service now has a new look and new features for data extraction and business intelligence, launched together with a new name: Dexi.io.

Categories
Miscellaneous

How to insert reCaptcha, video

The following video shows how to insert reCaptcha v2.0 into a PHP-driven website.

https://www.youtube.com/watch?v=rks41ENvzWY

Thanks to Webucator for supplying the PHP training. Read the original post for the PHP code.

Categories
Miscellaneous

DOM elements number counter and sum up

I want to share a nice utility for quickly summing the values of multiple DOM elements. Why? Well, suppose you're on a page like this and you want to add up the total number of hotels across all the countries.
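The idea can be illustrated offline with a short Python sketch built on the standard-library HTML parser. This is not the utility from the post: the class name `count`, the helper names and the sample markup are assumptions made for illustration. It collects the text of every element carrying a given CSS class, pulls the first number out of each, and sums them.

```python
import re
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr", "area",
             "base", "col", "embed", "source", "track"}


class ClassTextCollector(HTMLParser):
    """Collect the text content of every element that has a given class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0    # > 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1
        elif self.class_name in classes:
            self.depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.texts[-1] += data


def sum_values(html, class_name):
    """Sum the first number found in each matching element; skip non-numeric text."""
    collector = ClassTextCollector(class_name)
    collector.feed(html)
    total = 0.0
    for text in collector.texts:
        match = re.search(r"-?\d[\d,]*(?:\.\d+)?", text)
        if match:
            # Drop thousands separators such as the comma in "1,200".
            total += float(match.group().replace(",", ""))
    return total
```

Elements whose text holds no number (for example "n/a") are skipped rather than treated as an error, which seems a reasonable default for a quick tally.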

Categories
Miscellaneous

Content Grabber self-contained (standalone) agent

As web scraping becomes easier to use, more and more people are able to leverage the world’s web resources. As this trend grows, structured data from the web empowers businesses and enables a wave of new business ideas to become reality. Now there is a new technology on the market called “self-contained agents” that might just turn this wave into a tsunami!

Categories
Development Miscellaneous

Import.io: Connector-GUIDs, User-GUIDs, API keys and how to get them?

Suppose I run a query against the import.io API:

$url = "https://query.import.io/store/connector/" . $connectorGuid . "/_query?_user=" . urlencode($userGuid) . "&_apikey=" . urlencode($apiKey);

“HI there can you please tell me that what are connector-guid, user-guid and api key in below given code and how to get them for any website?”

I came across this question on StackOverflow, and as an avid import.io user I thought I’d answer it here as well, in case any of you have the same issue. 
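To make the three pieces concrete, here is a rough Python equivalent of the PHP line from the question. It only assembles the URL and does not call the service; the function name and the placeholder values are hypothetical, and the endpoint is copied from the snippet as quoted, without any claim about how the API behaves.

```python
from urllib.parse import quote_plus

# Endpoint prefix taken from the PHP snippet in the question.
BASE_URL = "https://query.import.io/store/connector/"


def build_query_url(connector_guid, user_guid, api_key):
    """Assemble the query URL the same way the PHP snippet does."""
    # The connector GUID goes into the path unencoded, as in the original line;
    # quote_plus mirrors PHP's urlencode() for the two query values.
    return (
        BASE_URL
        + connector_guid
        + "/_query?_user=" + quote_plus(user_guid)
        + "&_apikey=" + quote_plus(api_key)
    )
```

Encoding the API key matters because keys often contain characters such as `+`, `/` and `=` that would otherwise be misread as part of the query string.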

Categories
Miscellaneous

Search queries in a search engine for scraping

Recently I received a note asking about running search engine queries through web scraping software.

“I’m looking for a scraper program that can initiate search queries in a search engine automatically, using proxies would be an added benefit if possible.” – Mike

Categories
Development Miscellaneous Web Scraping Software

Import.Io Magic Method API

Recently Import.io introduced a new extraction technique called Magic. The Magic scraping method works by attempting to scrape all the information off the page automatically and in one shot. We covered it in another post early last year. When we covered it back then, we noted a few issues:

  • The scraper only works on pages with more than one row of data, such as search results pages and category pages.
  • It seems to have trouble with some JavaScript-driven pages.

But now Import.io has released a second version of Magic which seems to have dealt with those obstacles. Not only that, but they have released an API for Magic that lets you see what’s going on behind the scenes.