Sequentum Enterprise is a powerful, multi-featured enterprise data pipeline platform and web data extraction solution. Sequentum’s CEO Sarah McKenna avoids calling it web scraping because, in her view, web scraping refers to many different unmanaged and non-compliant techniques for obtaining web-based datasets.
Sequentum describes an enterprise data pipeline as the software category for obtaining any type of unstructured data (from the web or anywhere else), transforming it, enriching it, and then formatting and delivering it to any type of endpoint. The platform was developed by the folks who brought you Visual Web Ripper and Content Grabber, and it includes all the VWR features and more. In fact, Sequentum Enterprise has truly raised the bar not only for Sequentum, but for the entire Information Technology industry.
Sequentum’s software is targeted at companies with a critical reliance on web data extraction. It is a great fit for those who want to run their own data operation by building, packaging and selling their own datasets or other processed data products. Sequentum Enterprise is an enterprise-grade solution built from the ground up with a focus on quality, compliance, performance, scalability and usability.
Here we highlight its main characteristics.
The Sequentum Enterprise platform consists of three major components:
Sequentum Enterprise Desktop is a point-&-click, low-code** configuration and development environment for web scraping. Sequentum calls it web data extraction, or web data pipelines, because it does so much more than web scraping. While Sequentum’s tools can process any sort of dataset, in this review we will focus on the web data extraction use case.
**Sequentum Enterprise Desktop is used by many non-programmers. Scripting is sometimes required for complex websites with lots of blocking, but a non-programmer can call the support team for help… or engage the Sequentum team to do the work as a professional service. 95% of the time no coding is required.
So, the Sequentum Desktop (SD) is used to create web data extraction agents. If your scraping requirements do not call for job management (high scalability, scheduling, and quality monitoring), then the SD is all you need to get started and run a data operation. Sequentum Enterprise Desktop lets you create, run and monitor your agents as well as manage related services like proxy pools, de-captcha, etc. One annual Enterprise Desktop user license is US $5,000, which includes maintenance (software updates) and support.
Sequentum Enterprise Server is the component that runs the jobs. The Sequentum Desktop can connect to a Sequentum Server to load and run agents. The Server provides scalability and runs agents in production environments; it is an optimized production runtime license which can also be used for full maintenance of your existing agents. The Server license also includes the Agent Control Center, which provides a centralized data extraction platform for larger data extraction operations. One annual Enterprise Server user license is US $10,000, including maintenance, support and the ACC.
Sequentum Agent Control Center (ACC) is used for enterprise-scale web data extraction operations. The ACC integrates with the Sequentum Desktop and Server to provide a fully managed web data extraction platform. It provides cloud-based tools for agent management, data quality and job-run validation, API integration to control jobs and results, compliance management for controlling and throttling web scraping agents, version control, scheduling, database connections, user access, proxy pool management, data extraction auditing, etc. The Agent Control Center license is included with any purchase of a Sequentum Server.
Some Cool Features
1. Visual Scraper
Sequentum Desktop truly stands for easy development of visual web scrapers. It has a simple point-&-click UI where users browse the website and click on the data elements in the order they want them collected.
2. Stand alone agents
A truly unique feature is the ability to compose scraping agents and compile them into stand-alone Windows applications that run without any external dependencies. In other words, you can use the Sequentum Desktop to generate an executable web scraper that you can run anywhere. Self-contained agents include the actual Sequentum Enterprise engine, so they run independently of the licensed version of the Sequentum Enterprise software. This lets developers build self-contained web scraping agents which they can run independently of the licensed software, royalty free.
How do you compose such an agent? In the menu choose File -> Export Agent, then set the Create Self-Contained Agent option. In following posts we will take a closer look at the Sequentum Desktop standalone agent.
3. Incorporating agents into web apps
Sequentum Enterprise Desktop also allows users to run agents and display extracted data in their own web applications. Sequentum Desktop can even manage a separate instance of the same agent for each web user. See Programming Interface in the help file for more information.
4. Scraping dynamic websites
This tool allows you to inspect a log of all the API requests made on a page and then insert the needed [API] request into the agent, in order to access the data directly without the formatting/display burden of the website.
The tool can parse out the data at the API level (without you needing to write any code for that). This way you can, for example, use a JSON parser to extract the data more quickly, scalably and reliably, without consuming the resources of a full browser. This approach can make a data operation at least 20 times faster and thousands of times more scalable than using a dynamic browser.
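To make the idea concrete, here is a short Python sketch of the same technique outside Sequentum: parsing a captured API response directly instead of rendering the page in a browser. The payload and field names below are invented for illustration; in Sequentum you would capture such a response from the request log.

```python
import json

# A captured XHR response body, as you might find in a page's request log.
# This payload is invented for illustration.
payload = """
{
  "products": [
    {"name": "Widget A", "price": 19.99, "in_stock": true},
    {"name": "Widget B", "price": 4.50, "in_stock": false}
  ]
}
"""

def extract_products(body: str) -> list[dict]:
    """Parse the API response directly: no browser, no HTML parsing."""
    data = json.loads(body)
    return [
        {"name": p["name"], "price": p["price"]}
        for p in data["products"]
        if p["in_stock"]
    ]

print(extract_products(payload))  # [{'name': 'Widget A', 'price': 19.99}]
```

Because nothing is rendered, thousands of such requests can run in parallel on hardware that could host only a handful of browser instances.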
Feel free to watch this overview video that shows how to compose an agent and get data in 60 seconds. There is also a description of how to use templates to speed up agent development even more!
| | |
|---|---|
|Rating by Webscraping.pro|Easy to learn|
|Customer support|Phone, email|
|Price|1. From $5,000 annually for Sequentum Enterprise Desktop (all you need to get started); 2. Sequentum Enterprise Server at $10,000 annual license (for when you are ready to join the big leagues)|
|Trial period/Free version|30-day trial|
|Data export formats|CSV, Excel, JSON, Parquet, XML, PDF, SQL Server, MySQL, ODBC, PostgreSQL, MongoDB, Azure Cosmos DB, SQLite, Oracle and OleDB; customized C# or VB script file output (if additionally programmed). Built-in data delivery to: AWS S3, Azure Storage, Dropbox, email, FTP, Google Drive, Google Storage, file system folders, NAS storage, zipped (compressed) storage, Snowflake. One may also script a custom output format (or, for that matter, extend any part of the tool using almost any programming language)|
|Multi-thread|Yes - unlimited|
|API|Yes - includes web API with royalty-free runtime|
|Scheduling|Yes - including centralized management of multiple agents|
|Free demo project|Yes - for trial period only|
A Sequentum Enterprise web scraping agent is a collection of commands which are executed serially until completed. These commands are recorded in order of execution and displayed in the Agent Explorer area of the Sequentum Desktop screen.
Sequentum Enterprise facilitates simple macro automation methods and scripting for agent creation, or you can take direct control over the treatment of each command within your agent. This gives you both simplicity and developer-level control when needed.
Agent Explorer panel with new commands
If you want to make other adjustments or gain more control of your commands, you can make changes in the Configure Agent Command panel using the point-and-click property sheets, by adding custom wizard-driven, regex-based data transformations, or by adding scripting anywhere in the flow of the agent.
Management Tools for Developers
Sequentum Enterprise Server includes enterprise level debugging, logging, error handling and error recovery features. This is important to ensure the reliability of the web scraping agents. Also included are centralized management tools for scheduling, database connections, proxies, notifications and script libraries on a per server basis.
SE Agent Control Center provides a centralized management screen for all web scraping agents on a server; it is a complete control center for managing the entire web data extraction operation… this is what Sequentum means by “Enterprise Scale Web Data Pipelines”. Using the ACC is like sitting in a NASA control center, watching all aspects of the operation across hundreds, thousands, or more web scraping agents in order to control quality, compliance, error logging, ticketing for errors, job scheduling, and so much more.
Command line runs
Agents can be run from the command line using Sequentum Enterprise’s command-line program. You can specify command-line parameters that your agents can easily use as input data.
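The exact flags of Sequentum’s command-line program are documented in its manual; the general pattern, passing parameters on the command line that a scraper then consumes as input data, can be sketched in generic Python. The parameter names below are invented for illustration, not Sequentum’s actual flags:

```python
import argparse

def parse_agent_args(argv: list[str]) -> dict:
    """Turn command-line parameters into input data for a (hypothetical) agent."""
    parser = argparse.ArgumentParser(description="Run a scraping agent")
    parser.add_argument("--start-url", required=True, help="page the agent starts on")
    parser.add_argument("--max-pages", type=int, default=10, help="crawl limit")
    args = parser.parse_args(argv)
    # The agent would read this dict as its input data.
    return {"start_url": args.start_url, "max_pages": args.max_pages}

print(parse_agent_args(["--start-url", "https://example.com", "--max-pages", "5"]))
# {'start_url': 'https://example.com', 'max_pages': 5}
```

Driving agents this way makes them easy to wire into cron jobs or any external scheduler.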
Agent and command templates
Sequentum Enterprise provides agent and command templates for easy reusability, with many included agent templates for popular websites, and command templates such as a fully-fledged web site crawler. What I liked are the premade agent templates for extracting data from some popular sites.
All Sequentum Enterprise agents run multi-threaded by default and you can control the number of web browsers used to extract data. A Sequentum Enterprise agent can use a mix of (1) web browsers that can process dynamic pages and (2) ultra-fast HTML or JSON parsers for web pages that do not require a [dynamic] browser.
One reason you might need a dynamic browser (which uses a lot of resources and is much slower than a lower-level parser) is to log in to a site or to resolve a captcha.
Sequentum is so flexible that you can switch to a full browser mode when you need it and then switch back to the low-level parsers once you have done what the site requires of a full browser. This makes the Sequentum agent act like a Transformer going through a security checkpoint: it transforms into a real-looking browser user when doing something like resolving a captcha, then, once through the checkpoint, it transforms back into a low-level web data extraction machine and executes at full speed.
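The switching logic can be sketched in a few lines of Python. The page-classification rule below is invented for illustration; in Sequentum the choice of engine is configured per command in the agent rather than computed from a dict:

```python
def choose_engine(page: dict) -> str:
    """Pick the cheapest engine that can handle the page.

    `page` is a hypothetical description of what the page requires.
    """
    if page.get("needs_login") or page.get("has_captcha"):
        return "dynamic-browser"   # full browser: slow, but can act like a real user
    if page.get("content_type") == "application/json":
        return "json-parser"       # fastest: parse the API response directly
    return "html-parser"           # fast static HTML parsing

pages = [
    {"url": "/login", "needs_login": True},
    {"url": "/api/items", "content_type": "application/json"},
    {"url": "/items/1"},
]
print([choose_engine(p) for p in pages])
# ['dynamic-browser', 'json-parser', 'html-parser']
```

The point is that only the steps that genuinely need a browser pay the browser’s cost; everything else runs at parser speed.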
These features and more are covered in the features overview online at support.sequentum.com. Especially useful for developers, there is a detailed online software manual, as well as rich training videos for every level of difficulty.
The only possible con I can see is the price (starting at $5,000 per year).
Sequentum Enterprise, in my opinion, is the most feature-rich visual scraper, and it is worth the price. Anyone who has spent hours managing a pile of Python scripts as requirements grow and websites change can tell you that $5,000 – while it sounds like a lot – is nothing compared to spending months maintaining spaghetti-code web scrapers or using tools that lack quality controls or visibility into the entire web scraping operation.
Sequentum Enterprise is easy to use, catering well to the beginner user, yet it has an extensive feature list and provides great control for experienced users and developers. For difficult projects, users can leverage XPath, Regex and programming scripts***. Sequentum Enterprise also has an extensive API which is well documented and includes a royalty free runtime so users can add scraping functionality to their desktop applications. Users can also produce stand-alone royalty free web scraping agents.
Organizations that are serious about web scraping will find this tool to be a must-have. It has set a new benchmark in web scraping and, at the time of writing this article, truly stands on its own.
***The Sequentum Enterprise scripting engine supports C#, VB.NET, Python and Regular Expressions. C#, VB.NET and Python can be used for all types of scripts, while Regular Expressions can only be used in content transformation scripts. Read more here on that topic.
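As an illustration of what a content transformation script typically does, here is the kind of cleanup one might perform, written as standalone Python. The field format and pattern are invented for illustration; inside Sequentum the script would run on a captured value within the agent:

```python
import re

def clean_price(raw: str) -> float:
    """Transform a scraped price string like ' $1,299.00 USD ' into a number."""
    match = re.search(r"[\d,]+(?:\.\d+)?", raw)
    if match is None:
        raise ValueError(f"no price found in {raw!r}")
    return float(match.group().replace(",", ""))

print(clean_price("  $1,299.00 USD "))  # 1299.0
```

A pure Regular Expression transformation covers the simple cases; switching the script language to Python or C# buys you the error handling shown above.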