Categories
Development

Is there any way to skip CAPTCHA?

Categories
Challenge Development

How do I get past a dynamic “load more” button?

Recently I got this question:

How do I get past the dynamic “load more” button using a Python web scraper?
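One common approach (not necessarily the one the original answer used) is to skip clicking the button and instead replicate the background request it fires. The sketch below assumes the button calls a JSON endpoint with `offset`/`limit` query parameters; `BASE_URL`, those parameter names, and the `items` key are hypothetical and should be read off the browser's DevTools Network tab. The paging loop takes the fetch function as a parameter, so it can be exercised without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List

# Hypothetical XHR endpoint behind the "load more" button; find the real URL
# and parameter names in the browser's DevTools (Network tab, XHR filter).
BASE_URL = "https://example.com/api/items"


def fetch_page(offset: int, limit: int) -> List[dict]:
    """Request one batch, as the button would (assumes a JSON body with an "items" list)."""
    query = urllib.parse.urlencode({"offset": offset, "limit": limit})
    with urllib.request.urlopen(f"{BASE_URL}?{query}", timeout=10) as resp:
        return json.load(resp)["items"]


def load_all(fetch: Callable[[int, int], List[dict]], limit: int = 20) -> Iterator[dict]:
    """Emulate repeated "load more" clicks until a short or empty batch signals the end."""
    offset = 0
    while True:
        batch = fetch(offset, limit)
        yield from batch
        if len(batch) < limit:  # fewer items than requested: nothing more to load
            break
        offset += limit
```

Usage would be `for item in load_all(fetch_page): ...`. If the site offers no clean endpoint and builds the list purely in the browser, driving a real browser (for example with Selenium) and clicking the button is the usual fallback.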

Categories
Miscellaneous

Scraping HTML graphic elements: possibilities and limits

Question: “How do I set up daily automatic scraping of www.pollen.com data into a Google Sheet?”

Answer: Originally I doubted whether SVG elements in HTML are scrapable. After some trial and error, I realized that SVG elements are indeed scrapable: one can get their XPath and their child nodes. However, importXML() can scrape them only when they are part of the static HTML.
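To illustrate that SVG nodes can be addressed by XPath like any other element, here is a small Python sketch using the standard library's ElementTree, which supports a subset of XPath. The markup is a made-up, well-formed page, not pollen.com's actual HTML; real-world pages usually need a more forgiving HTML parser such as lxml. Inline SVG lives in its own XML namespace, so the queries use an `svg:` prefix. As the answer notes, this only works when the SVG is present in the static HTML; a chart drawn by JavaScript after page load is not in the fetched source at all.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

SVG_NS = {"svg": "http://www.w3.org/2000/svg"}

# Hypothetical static page with an inline SVG bar chart (well-formed markup,
# so the standard-library XML parser can read it).
PAGE = """
<html>
  <body>
    <div id="chart">
      <svg xmlns="http://www.w3.org/2000/svg" width="120" height="60">
        <rect x="0" y="20" width="30" height="40" data-day="Mon"/>
        <rect x="40" y="10" width="30" height="50" data-day="Tue"/>
        <text x="80" y="30">Index</text>
      </svg>
    </div>
  </body>
</html>
"""


def svg_children(markup: str) -> List[str]:
    """Return the local tag names of the first <svg> element's direct children."""
    root = ET.fromstring(markup.strip())
    svg = root.find(".//svg:svg", SVG_NS)
    # Namespaced tags look like "{http://www.w3.org/2000/svg}rect"; keep the local part.
    return [child.tag.split("}", 1)[1] for child in svg]


def bar_heights(markup: str) -> Dict[str, float]:
    """Read each bar's height with an XPath-style query scoped to the SVG namespace."""
    root = ET.fromstring(markup.strip())
    return {
        rect.get("data-day"): float(rect.get("height"))
        for rect in root.findall(".//svg:rect", SVG_NS)
    }
```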

Categories
Development

Choosing a technology for a web project

You have an idea for a web project. You (or your team) have already thought over the concept and the strategy for becoming successful in the field. Now it’s time to ask the main question: how should this awesome idea be brought to life? The great variety of solutions complicates the decision-making process: classic Java? Modern MEAN? Easy PHP & CMS?

Categories
Development

Make crawling easy with Oxylabs.io’s Real-Time Crawler

Nowadays, it’s hard to imagine our lives without search engines. “If you don’t know something, google it!” is one of the most popular maxims of our time. But how many people use Google optimally? Many developers use Google search operators to get the answers they need as fast as possible.

Today, even this is not enough! Companies large and small need terabytes of data to run a profitable business. The search process has to be automated and made reliable so that users get fresh news, updates, or posts. In today’s article we will look at a very helpful tool for collecting fresh data: Real-Time Crawler (RTC). Let’s start!

Categories
Review

Introducing Octoparse 7.1: web scraping for dummies is official

Throughout years of working in the data industry, the Octoparse team has never slowed its pace in making data more accessible and readily available to everyone. This is rooted in our belief that, in the era of big data, anyone should be able to collect data and so harness the power of big data.

Categories
Miscellaneous

Python, web2py: open an MS Word file on the fly

Recently I was looking for a way to open an MS Word file on the fly for processing with the python-docx library. By trial and error, I got the code to work. I use the web2py framework to handle the POST request.

Categories
Development

Creating a REST API with Spring

Today we are going to discuss a large and engaging topic, REST APIs, and build our own web application based on Spring, the most popular Java framework. To start with, we will explain the two main concepts of this article: REST API and Spring. Note that both concepts are quite complex and, unfortunately, we can’t describe them fully here, but throughout the article you will find links that will help you wherever you might get stuck.

Categories
Guest posting

Web scraping and why you should learn it

Why should you learn web scraping, and who is doing it out there? We address these questions by looking at the different industries and jobs that require web scraping skills. To do this, we compiled and analyzed data extracted from job sites, including Indeed, Glassdoor, and LinkedIn. Here are our findings.

Categories
Miscellaneous

Bright Data exclusive residential proxies to reach LinkedIn in Russia

For some of our readers in Russia, reaching www.linkedin.com has become a new challenge, since the site has been officially blocked there.

On 4 August 2016, a Moscow court ruled that LinkedIn must be blocked in Russia because it stored the personal data of Russian citizens outside the country, in violation of the new data localization law. The law requires companies that handle Russian citizens’ personal data to store it on servers located in Russia.
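Whichever provider you choose, pointing a Python scraper at a proxy takes only a few lines. The sketch below uses the standard library's `urllib.request.ProxyHandler`; the proxy host, port, and credentials are placeholders, not Bright Data's actual connection details, which come from the provider's own dashboard.

```python
import urllib.request

# Hypothetical proxy address and credentials; substitute the values your
# proxy provider gives you (this is not a real Bright Data endpoint).
PROXY_URL = "http://user:password@proxy.example.com:8000"


def make_proxy_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that routes both HTTP and HTTPS traffic through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # Passing our ProxyHandler replaces the default one that reads environment variables.
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url: str, proxy_url: str = PROXY_URL, timeout: float = 15.0) -> bytes:
    """Download a page through the proxy and return the raw response body."""
    opener = make_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```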