The Dexi.io web scraping service has reworked its functionality by adding paid-plan addons. Addons make more features available to customers, e.g. more step types/Pipe actions. They also allow integrating scrape results into data stores and endpoints such as PostgreSQL, MySQL, Amazon S3 and others.
Each addon unlocks the corresponding functionality as robot (Pipe, Extractor) actions. However, most of the addons require a Professional plan or above. The major addon categories are the following:
- Captcha solving (integration with third-party solving services)
- Extractors
- Geography (communication with social networks for geo info)
- Image processing
- Integrations (AWS, Google Drive, Google Sheets, Box, (S)FTP, Webhook)
- Machine learning (Text analysis with machine learning – MonkeyLearn service integration)
- Math
- Social media
- Text analysis
We’ll try two of the addons below; you’re welcome to share your own experience with Dexi.io addons in the comments.
Image Manipulation addon
First you need to add the Image Manipulation addon. Note that this addon performs image manipulations in Pipe robots only.
- Create an Extractor robot to get images from AliExpress or any other source.
- Create a Pipes robot and include an Execute robot action (or node), linking it to the previously created Extractor robot.
- From the Transforms actions, choose the As Fields node to treat the scraped images as data fields.
- Now, with the Image Manipulation functionality unlocked, add Resize image from the Images actions set; see the picture below.
Now run the Pipe robot (creating a run configuration) and collect the execution results.
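If you're curious what the Resize image step amounts to, here is a minimal sketch of the same operation done locally with Pillow. The image URL and output file name are hypothetical; in Dexi the input URLs come from the Extractor robot.

```python
# A sketch of what the Resize image step does conceptually, using Pillow.
from io import BytesIO

import requests
from PIL import Image


def resize_image(url: str, width: int, height: int) -> Image.Image:
    """Download an image and resize it to the given dimensions."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    image = Image.open(BytesIO(response.content))
    return image.resize((width, height))


# Hypothetical product image URL; in Dexi the URLs come from the Extractor.
thumbnail = resize_image("https://example.com/product.jpg", 200, 200)
thumbnail.save("product_200x200.jpg")
```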
Google GEO addon
Get Google Geocoding API credentials
To use this addon you need to go to your Google Cloud console and get a developer API key.
- Visit the Google Maps Platform. The start guide will lead you through adding billing info (free for the first 12 months of use).
- Among its APIs, enable the Geocoding API, which converts between addresses and geographic coordinates.
- Go to the Google developer console and create a new project (try here if that fails).
- Go to https://console.cloud.google.com/apis/dashboard.
- Once your project is created, go to API credentials, choose the newly created project, and copy the API key.
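Before pasting the key into Dexi, you can sanity-check it with a direct request to the Geocoding API. This is a minimal sketch; YOUR_API_KEY stands for the key you just copied.

```python
# Verify the new key against the Geocoding API directly.
import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

params = {"address": "Toronto, Canada", "key": "YOUR_API_KEY"}
data = requests.get(GEOCODE_URL, params=params, timeout=30).json()

print(data["status"])  # "OK" means the key works and the API is enabled
if data["status"] == "OK":
    location = data["results"][0]["geometry"]["location"]
    print(location["lat"], location["lng"])
```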
Now we can use that API key in an addon configuration.
Configure addon
Open the Google Maps Geocoding addon and edit it, inserting the API key you obtained from the Google Maps Console (Credentials):
Ok. Now the addon functionality is available among the Pipe actions.
Create data type and load a data set
To test the addon we assemble a list of cities in Canada and save them as a Data set. But first we need to create a Data type matching the new Data set.
Create a new data type, Cities Simple, containing 3 fields: name, latitude, longitude.
Next, create a new data set named Canada cities and load data into it from a CSV file.
After the import, the data set looks like this:
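If you prefer to build the CSV programmatically, here's a minimal sketch matching the Cities Simple data type. The city list and file name are just examples, and we assume the coordinate columns start empty, since the Address lookup will fill them.

```python
# Build a CSV with the Cities Simple fields: name, latitude, longitude.
import csv

cities = ["Toronto", "Montreal", "Vancouver", "Calgary", "Ottawa"]

with open("canada_cities.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "latitude", "longitude"])
    for name in cities:
        # Coordinates left blank; the Pipe robot's lookup will supply them.
        writer.writerow([name, "", ""])
```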
Pipe robot to process Geo info
Now it’s time to make a new Pipe robot.
Open a new Pipe robot and add the following Pipe actions to it.
- From the Dexi.io actions category, add the For each row in data set action. Configure it with the Canada cities data set.
- Add an As Fields action, so each row from the data set is treated as corresponding fields.
- From the Geography actions, choose Address lookup. Configure it with your existing Google Maps Geocoding addon and connect only the name field (from the As Fields node) to the Address lookup input called address.
- Now let's get the Address lookup rows into fields: add another As Fields node.
See the full Pipe diagram.
If we need only the city name, latitude, and longitude, we can add a data type node at the end to restrict the output to just those fields.
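For reference, here is a standalone sketch of what this Pipe computes, run outside of Dexi: each city name is geocoded through the same Geocoding API, and the output is restricted to the Cities Simple fields. The file name and API key placeholder are assumptions.

```python
# Replicate the Pipe's flow: for each row, geocode the name and keep
# only the name/latitude/longitude fields.
import csv

import requests

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
API_KEY = "YOUR_API_KEY"  # the key configured in the Dexi addon


def lookup(name: str) -> dict:
    """Geocode one city; mirrors the Address lookup node."""
    params = {"address": f"{name}, Canada", "key": API_KEY}
    data = requests.get(GEOCODE_URL, params=params, timeout=30).json()
    location = data["results"][0]["geometry"]["location"]
    # Restrict output to the Cities Simple fields, like the data type node.
    return {"name": name, "latitude": location["lat"], "longitude": location["lng"]}


with open("canada_cities.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(lookup(row["name"]))
```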
Save it, close it, and create a New run. Dexi will offer to make a new configuration; name it and start the execution.
The results are excellent (below we show all the non-empty Address lookup columns/fields).
Conclusion
The Dexi web scraping tool is an excellent mix of data extraction robots (Extractor, Crawler) and data post-processing (image processing, geo info, cloud integrations, etc.), plus social media retrieval, all made possible through addon features. It works well for both medium-size and enterprise-level extractions.