Categories
Miscellaneous

Python, web2py – open MS Word file on-the-fly

Recently I was looking for a way to open an MS Word file on the fly for processing with the python-docx library. Through trial and error I got the code to work. I use the web2py framework to handle the POST request.
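A minimal sketch of the on-the-fly idea, assuming the uploaded file arrives as raw bytes (for example, from a POST field). python-docx's `Document()` accepts a file-like object, so wrapping the bytes in `io.BytesIO` avoids writing a temporary file. To stay dependency-free, the sketch below shows the same in-memory approach with the standard library only; it is a stand-in, not the code from the post, and the helper name `paragraphs_from_docx_bytes` is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraphs_from_docx_bytes(data):
    """Return the text of each paragraph of a .docx held in memory.

    Hypothetical helper: `data` is the raw bytes of the uploaded file.
    """
    # A .docx file is a ZIP archive; BytesIO lets us open it without touching disk.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is spread over one or more <w:t> runs.
        paragraphs.append("".join(node.text or "" for node in para.iter(W_NS + "t")))
    return paragraphs
```

With python-docx installed, the equivalent step would be `Document(io.BytesIO(data))`, after which the paragraphs are available through the document object.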

Categories
Miscellaneous

Luminati exclusive residential proxies to reach LinkedIn in Russia

For some of our readers in Russia, reaching www.linkedin.com has become a new challenge, since the site has been officially blocked in the country.

On 4 August 2016, a Moscow court ruled that LinkedIn must be blocked in Russia because it stores the user data of Russian citizens outside of the country, in violation of the new data retention law. The law requires all companies doing business in the country to store their users’ data locally.

Categories
Miscellaneous

Luminati residential proxy for extracting from a data aggregator

In this post I’d like to share my experience with the residential proxy of the Luminati proxy provider.

Categories
Miscellaneous

Prevent automated services from solving captcha?

Question: Is there any way to include a captcha on the site and at the same time prevent services like 2captcha from solving it?

Categories
Miscellaneous

Scraping.pro load test

Recently I got a chance to perform a website load test. Since I run this blog, it’s always useful to check its load capacity. I was offered a free load test by www.dotcom-monitor.com.

Categories
Miscellaneous

FunCaptcha solve algorithm needed

One of our readers is interested in whether there are any tools or algorithms to solve FunCaptcha.
If you have any ideas or are willing to take on this project, please comment below.

Categories
Miscellaneous

Octoparse – a scraping tool designed for non-programmers

Octoparse is an easy and powerful visual web scraper enabling anyone, even those without much programming background, to collect and extract data from the web. Octoparse is designed in a way to help users easily deal with complex website structures, such as those with JavaScript; it can be compared to other web scraping tools such as Import.io and Mozenda.

Octoparse 2nd Anniversary Sale – Up to 40% Off!

Categories
Miscellaneous, Web Scraping Software

Hotel: scrape prices, Q&A

Question

I want to extract the hotel name and the current room price of some hotels daily from https://www.expedia.ca/Hotel-Search?#&destination=Quebec,%20Quebec,%20Canada&startDate=06/11/2016&endDate=07/11/2016&regionId=&adults=2

I am a small hotel owner and need this information quite often, so I hope to collect it automatically with code. You are an expert in this field: what is the easiest way to get this information? Can you give me some example code?
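One possible first step, sketched under assumptions: the search URL in the question carries its parameters after the `#`, so a daily script could read them (and later change the dates) before requesting the page. The helper name `search_params` is hypothetical, and the sketch deliberately stops short of fetching or parsing the results page, since the site's markup is not known here.

```python
from urllib.parse import urlsplit, parse_qs


def search_params(url):
    """Return the search parameters carried in the URL fragment.

    Hypothetical helper for illustration only.
    """
    fragment = urlsplit(url).fragment
    # keep_blank_values keeps empty fields such as regionId
    parsed = parse_qs(fragment, keep_blank_values=True)
    return {name: values[0] for name, values in parsed.items()}


URL = (
    "https://www.expedia.ca/Hotel-Search?#&destination=Quebec,%20Quebec,%20Canada"
    "&startDate=06/11/2016&endDate=07/11/2016&regionId=&adults=2"
)
params = search_params(URL)
```

Here `params` is a plain dictionary with the keys `destination`, `startDate`, `endDate`, `regionId` and `adults`, which a scheduled job could update for each run.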

Categories
Legal, Miscellaneous

Is this a legal method of acquiring insurance leads?

Recently I received a question on insurance leads:

Is this a legal method of acquiring insurance leads [from the web]? Are there any agent testimonials on the efficiency of this type of service?

Legality issue in web scraping

On the matter of legality in web scraping, the approach should be clear: it depends on the website and its privacy policy. There are at least 2 cases:

  1. Public info (prices, inventory info, public offers), i.e. everything that is not protected by copyright and is available for scraping.
  2. Copyright-protected info: the website's Terms of Use or Terms of Service restrict copying, which makes scraping it illegal.

A US court of appeals has affirmed that a data analytics company may lawfully scrape public profile info from a data aggregator (LinkedIn).

So far I have no insurance agent testimonials on the efficiency of any insurance lead scraping service. The websites I searched for insurance leads gave me the impression that the customer info they gather is highly secured (not viewable). I doubt that any sites are going to expose insurance leads. On most of them the leads are available only through paid subscription plans.

If there are websites such as insurance lead directories (public insurance quotes), we might develop a scraper that regularly grabs new info for further analysis. That would save the agent the time spent on repeated searches and visits. One scraper might cover multiple directory pages.

A US district court has concluded that moderate scraping, even when against ToS, is legal.

You might find it interesting to read about web page change tracking if you only need to see updates (no data storing applied).

Categories
Miscellaneous

Death By Captcha Updated API clients

Death By Captcha is a reputable CAPTCHA solving service with more than 7 years in the business. They have recently updated all their API clients, so users can expect better efficiency and faster solving times.

They recommend that users and software developers visit the API page and update their DBC API implementation in order to get the most out of it (the API and docs are available to registered users only). Free credits are provided for users to test or implement the new client API!

If you tell them you saw this info on the scraping.pro blog, they’ll give you an additional credit of 1K free CAPTCHAs!

For further info, you may contact them directly.