Dexi.io is a powerful scraping suite (SaaS). This cloud scraping service provides development, hosting, and scheduling tools. The suite can be compared with Mozenda: you build web scraping projects and run them in the cloud for convenience. It also exposes an API, each scraper being a JSON definition, similar to other services such as Import.io and ParseHub.
This JSON representation describes the scraping robot as an object that can easily be edited, adjusted, and transferred to other projects.
Robot building workflow
The robot building workflow is quite straightforward. You log in, click the New+ button at the top right, and choose Create new Robot. Enter the starting URL, name the robot, and choose its type: Extractor, Crawler, or Pipes.
Following that, you just use the point-and-click UI to select page elements, choose actions, set before/after steps, and more. Read more about browser-based robot building.
You can also add a Crawler robot. It is built from conditions and processes, and the crawling depth is adjustable. It took me quite a while to compose my first robot (mainly by watching tutorial videos).
Runs and execution
After the robot is ready, you need to configure its run. A run defines how the robot is executed: concurrency, scheduling, integrations, and inputs.
Robot execution happens in the cloud, and results are stored in the available storage until you download them, request them through the API, and/or delete them.
I'd also underline some more features that Dexi provides:
- The system operates with CSS and jQuery selectors, so it pays to get familiar with them.
- A User-Agent can be set for each robot's run.
- Respect for robots.txt can be toggled on or off for a single run.
- The system produces screenshots at each extraction step to help debug what went wrong. (+1)
- It can extract images, download files, and take screenshots of any element.
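Since robots are driven by CSS/jQuery-style selectors, it helps to understand how they match elements before you point and click in Dexi. Here is a minimal illustration using Python's BeautifulSoup library (the HTML and the selectors are invented for the example; they are not part of Dexi itself):

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4

html = """
<div class="product">
  <h2 class="title">Blue Widget</h2>
  <span class="price">$9.99</span>
</div>
<div class="product">
  <h2 class="title">Red Widget</h2>
  <span class="price">$12.50</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# ".product .title" selects every title inside a product block --
# the same kind of selector you would use in an extraction step.
titles = [el.get_text() for el in soup.select(".product .title")]
prices = [el.get_text() for el in soup.select(".product .price")]

print(titles)  # ['Blue Widget', 'Red Widget']
print(prices)  # ['$9.99', '$12.50']
```

The same selector typed into a Dexi extraction step would pick out the same elements on the live page.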
You should plug third-party proxies into the Dexi tool for your scraping runs.
“We do not allow running without proxies so if your account has no proxies – we will use our free proxies for your executions.”
They now have over 160 proxies (61 German and 100 US), so you might need to plug in a third-party proxy service for professional web scraping.
As a modern cloud scraping tool, it can be monitored and executed, and its results fetched, through a REST API. More details here. So your results can be fetched with a few lines of code:
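As a sketch of what such a fetch might look like, the following Python snippet builds an authenticated request for an execution's result. The base URL, endpoint path, header names, and MD5-based access token reflect my reading of the Dexi (CloudScrape) docs at the time; verify them against the current API reference before relying on them. The credentials and execution ID are placeholders:

```python
import hashlib
import urllib.request

API_BASE = "https://api.dexi.io"  # base URL per the docs; confirm before use


def build_result_request(account_id, api_key, execution_id):
    """Build an authenticated GET request for an execution's result.

    Per my reading of the docs, the access token is an MD5 hash of the
    account id concatenated with the API key.
    """
    access = hashlib.md5((account_id + api_key).encode("utf-8")).hexdigest()
    url = f"{API_BASE}/executions/{execution_id}/result"
    return urllib.request.Request(url, headers={
        "X-DexiIO-Account": account_id,
        "X-DexiIO-Access": access,
        "Accept": "application/json",
    })


# Placeholder credentials; the request is only constructed, not sent.
req = build_result_request("my-account-id", "my-api-key", "execution-123")
print(req.full_url)
# https://api.dexi.io/executions/execution-123/result
```

Passing `req` to `urllib.request.urlopen()` would perform the actual download; it is left out here so the example runs offline.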
Dexi now provides a built-in CAPTCHA solving service that, at the moment (July 2015), is free of charge for SaaS users, which is something you rarely see in scrapers. However, the service is able to solve only input-field CAPTCHAs (not JS-driven CAPTCHAs: draggable, drag-and-drop, etc.). The following steps (as suggested by their support) detail how to set up CAPTCHA solving in a robot:
Just point to the CAPTCHA image and select 'Add step for element'. Click on the step in the timeline and click 'Edit step'. In the 'Type' selector, choose 'Resolve Captcha'. Then set the 'Captcha Input' by selecting the icon and pointing it to the input field on your page. Finally, add a step that clicks the submit button.
The system provides a variety of exporting options. To use them, follow the Integrations sign and select integrations and formats for each run. These will be invoked automatically whenever an execution of that run succeeds.
Pricing and counting
The Dexi SaaS offers a Standard account ($119/month) with full functionality, but concurrent robot execution (workers) is limited to 1. For professional usage you do need to upgrade to the Professional plan, which starts at $399/month with 3 workers, unlimited execution time, and full access to add-ons. Dexi also caters to medium- and large-scale data intelligence projects with large datasets and higher capacity requirements ($699/month).
My impression of the Dexi web scraping suite is that it is a modern environment for building and hosting scrapers. It offers users the "gentleman's set" for web scraping, a combination few similar tools provide. Their CAPTCHA solving sets Dexi apart from services like Import.io. Compared to Mozenda, this suite is more convenient simply because it is fully browser-based (Mozenda requires you to install a desktop agent builder).
The docs are well developed, and the learning curve doesn't appear to be too steep. Their support is very responsive and always ready to assist. The service looks modern and actively developed, and it will surely hold its place among the other scraping services and tools.