Dexi.io review

Dexi.io is a powerful scraping suite offered as SaaS. This cloud scraping service provides development, hosting and scheduling tools. The suite can be compared with Mozenda: you build web scraping projects and run them in the cloud for convenience. It also includes an API, and each scraper is a JSON definition, similar to services like Import.io and ParseHub.

Overview

In a nutshell, Dexi is a web environment for building and hosting web scraping robots. The scraped output is available as JSON/CSV data and can also be queried through a REST API from external applications. The web suite provides most of the modern web scraping functionality: CAPTCHA solving, proxy support, filling out forms including dependent fields (drop-downs), regex support and more. Robots also support JavaScript evaluation on the scraped page.

The tool offers a point-and-click UI; no coding is needed unless you have to handle JavaScript tricks :-). So for harder tasks you'll surely need a programmer's help. In the end, the robot boils down to a JSON command definition containing meta and service info.

{
  "firstStep": "1601a39a-21d3-4ccd-8e61-29ca80c4903c",
  "steps": {
    "1601a39a-21d3-4ccd-8e61-29ca80c4903c": { 
      "tags": [],
      "field": null,
      "value": "https://de.aliexpress.com/category/100003070/",
      "formatters": [],
      "options": {}
    }
  },
  "functions": {},
  "javascriptEnabled": true,
  "autoLoadImages": true,
  "stylesheetsEnabled": true,
  "forceSinglePageNavigation": false,
  "formatter": null,
  "categoryId": "0cf9f777-036d-4aed-93f2-b11621576d25",
  "type": "SCRAPER",
  "editorVersion": 2,
  "hidden": false,
  "proxies": [],
  "output": {
    "image": {
      "id": "image",
      "uuid": "7ec9dd55-f09d-43b8-8ff2-14959bb67bb9",
      "type": "image",
      "checksum": "326163117611f23fe8f493131bde0f4e",
      "items": null,
      "properties": null
    }
  },
  "tags": [],
  "name": "Get and transform images (aliexpress)",
  "created": 1526384574356,
  "createdBy": "862f5052-eb1d-4014-9502-dfed4fe49d1f",
  "lastModified": 1526390051065,
  "modifiedBy": "862f5052-eb1d-4014-9502-dfed4fe49d1f",
  "deleted": false,
  "_id": "5148d877-2044-4877-aa61-ae33cf0c018f"
}

This JSON object is the full representation of the scraping robot. It can easily be edited, adjusted and transferred to other projects.
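To illustrate what "edited and transferred" can mean in practice, here is a minimal Python sketch that clones a trimmed copy of the definition above under a new name and start URL. It assumes that the first step's "value" holds the start URL, as it does in this example. The clone_robot helper and the fresh "_id" are hypothetical, and how Dexi itself imports or validates such an object is not covered here.

```python
import copy
import json
import uuid

# A trimmed copy of the robot definition shown above (only the fields used here).
ROBOT_JSON = """
{
  "firstStep": "1601a39a-21d3-4ccd-8e61-29ca80c4903c",
  "steps": {
    "1601a39a-21d3-4ccd-8e61-29ca80c4903c": {
      "tags": [],
      "field": null,
      "value": "https://de.aliexpress.com/category/100003070/",
      "formatters": [],
      "options": {}
    }
  },
  "type": "SCRAPER",
  "name": "Get and transform images (aliexpress)",
  "_id": "5148d877-2044-4877-aa61-ae33cf0c018f"
}
"""


def clone_robot(definition, new_name, new_start_url):
    """Return an independent copy of a robot definition with a new name and start URL.

    Assumption: the step referenced by "firstStep" stores the start URL in "value".
    """
    clone = copy.deepcopy(definition)      # leave the original object untouched
    clone["name"] = new_name
    clone["_id"] = str(uuid.uuid4())       # hypothetical: give the copy its own id
    first_step = clone["steps"][clone["firstStep"]]
    first_step["value"] = new_start_url
    return clone


robot = json.loads(ROBOT_JSON)
robot_copy = clone_robot(
    robot,
    "Get and transform images (copy)",
    "https://example.com/category/1/",
)
print(json.dumps(robot_copy, indent=2))
```

Because the definition is plain JSON, the same approach works for any other field, for example flipping "javascriptEnabled" before reusing a robot on a static site.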

Robot building workflow

The robot building workflow is quite straightforward. You log in, click the New+ button at the top right and choose Create new Robot. Enter the starting URL, name the robot and choose its type: Extractor, Crawler or Pipes.

After that you simply use the point-and-click UI to select page elements, choose actions, set before/after steps and more. Read more about browser-based robot building.

You may also add a Crawler robot. It is built from conditions and processes, and the crawling depth is adjustable. It took me quite a while to compose my first robot (mainly spent watching tutorial videos).
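For readers unfamiliar with the idea of crawling depth, the following Python sketch shows a generic depth-limited, breadth-first crawl over a made-up link graph. It is not Dexi's implementation. The LINKS map and the crawl and links_of functions are hypothetical names used only for illustration.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
LINKS = {
    "/": ["/a", "/b"],
    "/a": ["/a/1", "/"],
    "/b": ["/b/1"],
    "/a/1": ["/a/1/x"],
    "/b/1": [],
    "/a/1/x": [],
}


def links_of(page):
    """Return the outgoing links of a page (empty for unknown pages)."""
    return LINKS.get(page, [])


def crawl(start, max_depth, get_links):
    """Breadth-first crawl that does not follow links beyond max_depth."""
    seen = {start}
    visited = []
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        visited.append(page)
        if depth == max_depth:
            continue  # deep enough: record the page but do not follow its links
        for link in get_links(page):
            if link not in seen:  # skip pages that are already queued or visited
                seen.add(link)
                queue.append((link, depth + 1))
    return visited


print(crawl("/", 1, links_of))
print(crawl("/", 2, links_of))
```

Raising the depth by one level pulls in every page linked from the current outermost pages, which is why a small change in depth can greatly increase the size of a crawl.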

Runs and execution

After the robot is ready you need to configure its run. A run is the configuration of how to execute the robot; it comprises concurrency, scheduling, integrations and inputs.

Robot execution happens in the cloud, and results are kept in the available storage until you download them, request them through the API and/or delete them.

I'd also highlight some more features that Dexi provides:

  • The system works with CSS and jQuery selectors, so you'd better get familiar with them.
  • A User-Agent can be set for each robot run.
  • Robots.txt compliance can be switched on or off for a single run.
  • Your own JavaScript can be executed at a predefined moment of the workflow (before, during or after the workflow process) to make the site's content available.
  • The system produces screenshots at each extraction step to help debug what went wrong. (+1)
  • It can extract images, download files and take screenshots of any element.
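To make the User-Agent and robots.txt options concrete, here is a small Python sketch of what such a per-run toggle amounts to, using the standard library's urllib.robotparser. This only illustrates the concept and is not how Dexi implements it. The robots.txt content, the MyRobot/1.0 User-Agent and the make_checker helper are all made up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_checker(robots_txt, respect_robots=True):
    """Build a function that says whether a URL may be fetched in this run."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    def allowed(user_agent, url):
        if not respect_robots:
            return True  # toggle switched off: every URL is fair game
        return parser.can_fetch(user_agent, url)

    return allowed


USER_AGENT = "MyRobot/1.0"  # hypothetical User-Agent chosen for the run
respecting = make_checker(ROBOTS_TXT, respect_robots=True)
ignoring = make_checker(ROBOTS_TXT, respect_robots=False)

for url in ("https://example.com/products/", "https://example.com/private/report"):
    print(url, respecting(USER_AGENT, url), ignoring(USER_AGENT, url))
```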

Proxies

You should plug third-party proxies into the Dexi tool.

“We do not allow running without proxies so if your account has no proxies – we will use our free proxies for your executions.”

They now have over 160 proxies (61 DE proxies and 100 US proxies), so you might need to plug in a third-party proxy service for professional web scraping.

API

As a modern cloud scraping tool, Dexi can be monitored, executed and queried for results through a REST API. So your results can be fetched with simple PHP code, given the following three values:

 

$runId – get it through the API's Runs get method (it may also be shown when you edit a run).
$accountId – find it on your personal API page.
$apiKey – generate one on your personal API page.
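
A request built from these three values can be sketched in Python as follows. The base URL, the header names (X-DexiIO-Access and X-DexiIO-Account), the MD5-based access token and the runs/{runId}/latest/result path are assumptions based on my reading of Dexi's API documentation, so verify them against the current docs before relying on them. The function names are my own.

```python
import hashlib
import json
import urllib.request

API_BASE = "https://api.dexi.io"  # assumption: base URL of the REST API


def build_headers(account_id, api_key):
    """Build the authentication headers for an API call.

    Assumption: the access token is the MD5 hex digest of accountId + apiKey.
    """
    access = hashlib.md5((account_id + api_key).encode("utf-8")).hexdigest()
    return {
        "X-DexiIO-Access": access,
        "X-DexiIO-Account": account_id,
        "Accept": "application/json",
        "Content-Type": "application/json",
    }


def build_result_request(run_id, account_id, api_key):
    """Prepare (but do not send) a GET request for a run's latest result."""
    url = f"{API_BASE}/runs/{run_id}/latest/result"  # assumption: endpoint path
    return urllib.request.Request(
        url, headers=build_headers(account_id, api_key), method="GET"
    )


def fetch_latest_result(run_id, account_id, api_key, timeout=30):
    """Send the request and decode the JSON body of the response."""
    request = build_result_request(run_id, account_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a real network call, so fill in your own values first):
# result = fetch_latest_result("<runId>", "<accountId>", "<apiKey>")
# print(result)
```

Keeping request construction separate from sending makes it easy to inspect the URL and headers without touching the network.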

CAPTCHAs handling

Dexi now provides a built-in CAPTCHA solving service that at the moment (July 2015) is free of charge for SaaS users, which is something you rarely see in scrapers. Yet the service can solve only input-field CAPTCHAs (not JS-driven CAPTCHAs such as draggable or drag-and-drop ones). The following steps (as suggested by their support) show how to set up CAPTCHA solving in a robot:

Just point to the CAPTCHA image and select ‘Add step for element’. Click on the step in the timeline, and click ‘edit step’. In the ‘Type’ selector choose ‘Resolve Captcha’. Then select the ‘Captcha Input’ by clicking the icon, and afterwards point it to the input field on your page. Finally, add a step that clicks the submit button.

Integration

The system provides a variety of export options. To use them, follow the Integration sign and select integrations and formats for each run. These will be invoked automatically whenever an execution of this run succeeds.

Pricing and counting  

The Dexi SaaS offers a Standard account ($119/month) with full functionality, though the number of concurrently executing robots (workers) is limited to 1. For pro usage you need to upgrade to the PROFESSIONAL plan, which starts at $399/month with 3 workers, unlimited execution time and full access to add-ons. Dexi also offers a plan for medium to large scale data intelligence projects that handle large datasets and have higher capacity requirements ($699/month).

Try the new Dexi Pipes robots (available from the Professional plan), which let a user integrate (1) data extraction and (2) data processing into a single seamless workflow.

Conclusion

My impression of the Dexi web scraping suite is that it is a modern environment for building and hosting scrapers. It offers users a “gentleman's set” for web scraping, which no similar tool provides. Their CAPTCHA solving sets Dexi apart from services like Import.io. Compared to Mozenda, this suite is more convenient purely because it is fully browser based (Mozenda requires you to install a desktop agent builder).

The docs are well developed and the learning curve doesn't appear to be too steep. Their support is very responsive and always ready to assist. The suite looks modern and actively developed, and it will surely have its place among other scraping services and tools.
