Categories
Development

Make crawling easy with Real-Time Crawler of Oxylabs.io

Nowadays, it’s hard to imagine our life without search engines. “If you don’t know something, google it!” is one of the most popular maxims of our time. But how many people use Google optimally? Many developers use Google search operators to get the answers they need as fast as possible.

Even this is not enough today! Large and small companies need terabytes of data to keep their business profitable. The search process has to be automated and made reliable so that users get fresh news, updates, or posts. In today’s article we will look at a very helpful tool for collecting fresh data: Real-Time Crawler (RTC). Let’s start!

Categories
Review

Introducing Octoparse New Version 7.1 – web scraping for dummies is official

Throughout its years in the data industry, the Octoparse team has never slowed its pace in making data more accessible and readily available to everyone. This is rooted in our belief that, in the era of big data, anyone should be able to collect data and harness its power.

Categories
Miscellaneous

Python, web2py – open MS Word file on-the-fly

Recently I was looking for a way to open an MS Word file on the fly for processing with the python-docx library. Through trial and error I got the code working. I use the web2py framework as a wrapper for the POST request.
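A minimal sketch of the in-memory idea, using only the Python standard library instead of python-docx (a swapped-in approach, for illustration only): a .docx file is a ZIP archive, so wrapping the uploaded bytes in `io.BytesIO` lets it be opened without writing a temporary file. The function name `docx_paragraphs` is hypothetical, and the sketch assumes the raw bytes have already been read from the POST request.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(data: bytes) -> list[str]:
    """Return the text of each paragraph of a .docx held entirely in memory.

    io.BytesIO gives zipfile a file-like object, so nothing touches the disk.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text may be split across several <w:t> runs.
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

With python-docx the same trick applies, since `docx.Document()` accepts a file-like object: `docx.Document(io.BytesIO(data))`. In web2py the bytes would typically come from the uploaded field's file object, e.g. `request.vars.myfile.file.read()`, where `myfile` is a hypothetical form field name.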

Categories
Development

Creating REST API with Spring

Today we are going to discuss a large and engaging topic, REST APIs, and build our own web application with Spring, the most popular Java framework. To start with, we will explain the two main concepts of this article: REST API and Spring. Note that both concepts are quite complex and, unfortunately, we can’t describe them fully here, but throughout the article you will find links that will help you whenever you get stuck.

Categories
Guest posting

Web scraping and why you should learn it

Why should you learn web scraping, and who is doing web scraping out there? We are going to address these questions by looking into the different industries and jobs that require web scraping skills. To do this, we’ve compiled and analyzed data extracted from job sites, including Indeed, Glassdoor and LinkedIn. Here are our findings.

Categories
Miscellaneous

Bright Data exclusive residential proxies to reach LinkedIn in Russia

For some of our readers from Russia, it’s a new challenge to get to www.linkedin.com, which has been officially blocked in Russia.

On 4 August 2016, a Moscow court ruled that LinkedIn must be blocked in Russia because it stores the user data of Russian citizens outside the country, in violation of the new data localization law. The law requires all companies doing business in the country to store their users’ data locally.

Categories
Challenge Development Web Scraping Software

Bright Data residential proxy for extracting from a data aggregator

In this post I’d like to share my experience scraping a data aggregator (a business directory) using the residential proxies of the Bright Data proxy provider in conjunction with its Proxy Manager.
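A minimal sketch of routing scraper requests through a locally running proxy manager with Python’s standard `urllib`. The address `http://127.0.0.1:24000` and the function name `build_proxied_opener` are assumptions for illustration; use the port your Proxy Manager actually reports.

```python
import urllib.request

# Assumption: the proxy manager exposes a local proxy port here.
PROXY_URL = "http://127.0.0.1:24000"


def build_proxied_opener(proxy_url: str = PROXY_URL) -> urllib.request.OpenerDirector:
    """Return an opener that sends HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    opener = urllib.request.build_opener(handler)
    # Replace urllib's default User-Agent with a browser-like one,
    # since some sites reject obvious script clients.
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]
    return opener
```

Each call such as `opener.open(url, timeout=30)` then goes through the proxy port. This design keeps the choice and rotation of residential IPs in the proxy manager’s configuration rather than in the scraper code.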

Categories
Development

Web Scraping with Node.js

Web scraping has been growing in popularity for years now. Freelance sites are crowded with orders for this controversial data-extraction process. Today we will combine two new and revolutionary directions in web development: web scraping and Node.js. So, let’s consider an elegant and modern way to scrape data from websites with Node.js!

Categories
Development Web Scraping Software

JavaScript rendering library for scraping JavaScript sites

Can you imagine how many scraping tools are at our disposal? Scraping has a long history, and it has finally become a simple approach available in many programming languages. Unfortunately, some non-trivial tasks still can’t be solved in a snap.

One of these tasks is scraping JavaScript sites, i.e. sites that output their data using JavaScript. Faced with such a site, classic scrapers (though not all of them) ignore the JS-generated data and carry on with their usual life cycle. However, when this small defect becomes a big problem, developers all over the world take measures, and they have succeeded! Today we look at one of the most awesome tools for scraping JS-generated data: Splash.
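A minimal sketch of calling Splash’s HTTP API from Python, assuming a Splash instance on its default port 8050 (for example, started with `docker run -p 8050:8050 scrapinghub/splash`). The `render.html` endpoint loads the page, executes its JavaScript, waits `wait` seconds and returns the rendered HTML; the helper names below are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: Splash is running locally on its default port.
SPLASH_URL = "http://localhost:8050"


def splash_render_url(target: str, wait: float = 0.5, splash: str = SPLASH_URL) -> str:
    """Build a render.html request asking Splash to load `target`,
    run its JavaScript and wait `wait` seconds before returning HTML."""
    query = urllib.parse.urlencode({"url": target, "wait": wait})
    return f"{splash}/render.html?{query}"


def fetch_rendered_html(target: str, wait: float = 0.5) -> str:
    """Fetch the page as Splash sees it after JavaScript has run."""
    with urllib.request.urlopen(splash_render_url(target, wait), timeout=60) as resp:
        return resp.read().decode("utf-8")
```

The returned HTML already contains the JS-generated content, so it can be handed to any classic HTML parser as if it were a static page.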

Categories
Development

Dexi.io addon functionality (images & geocoding)

The Dexi.io web scraping service has extended its functionality by adding paid-plan addons. Addons make more features available to customers, e.g. more step types and pipe actions. These features also allow scrape results to be integrated with data stores and endpoints such as PostgreSQL, MySQL, Amazon S3 and others.