Recently I was looking for a way to open an MS Word file on the fly for processing with the python-docx library. By trial and error I got the code working. I use the web2py framework as a wrapper for the POST request.
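python-docx's `Document()` accepts a file-like object as well as a path, so uploaded bytes can be opened with `Document(io.BytesIO(data))` and no temporary file is needed. To keep the sketch runnable without third-party packages, the block below shows the same in-memory idea using only the standard library: a .docx is a ZIP archive whose body text lives in `word/document.xml`. This is a swapped-in technique, not the author's web2py code. How `data` is pulled out of the POST request depends on your framework, and the helper names here are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace of WordprocessingML elements inside word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(data: bytes) -> list:
    """Return the text of every paragraph of a .docx given as raw bytes (no temp file)."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    # Each <w:p> is a paragraph; its text is split across one or more <w:t> runs
    return ["".join(t.text or "" for t in p.iter(W + "t")) for p in root.iter(W + "p")]


def make_minimal_docx(lines) -> bytes:
    """Build a stripped-down .docx in memory (document.xml only) to try the reader out.

    Word itself expects more parts ([Content_Types].xml, relationships),
    so treat this purely as a test fixture.
    """
    body = "".join(f"<w:p><w:r><w:t>{escape(line)}</w:t></w:r></w:p>" for line in lines)
    xml = (
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f"<w:body>{body}</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buf.getvalue()
```

In a request handler you would pass the uploaded file's bytes straight to `docx_paragraphs` (or to python-docx's `Document(io.BytesIO(data))`) instead of saving the file to disk first.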
Category: Miscellaneous
For some of our readers in Russia, reaching www.linkedin.com has become a new challenge, since the site has been officially blocked there.
On 4 August 2016, a Moscow court ruled that LinkedIn must be blocked in Russia because it stores the user data of Russian citizens outside the country, in violation of the new data localization law. The law requires companies doing business in the country to store Russian users' data locally.
In this post I'd like to share my experience with the residential proxy service of the proxy provider Luminati.
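The post does not show Luminati's connection details, so here is only a minimal sketch of how Python's standard library sends requests through an HTTP proxy. The proxy URL (host, port, username, password) is a hypothetical placeholder; substitute the values your provider gives you. `build_proxied_opener` and `fetch_via_proxy` are illustrative helper names.

```python
import urllib.request

# HYPOTHETICAL placeholder: replace with the host, port and credentials from your provider
PROXY_URL = "http://username:password@proxy.example.com:8080"


def build_proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that routes both HTTP and HTTPS requests through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_via_proxy(url: str, proxy_url: str = PROXY_URL, timeout: float = 30.0) -> bytes:
    """Download url through the proxy and return the raw response body."""
    opener = build_proxied_opener(proxy_url)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

Passing an explicit `ProxyHandler` replaces the default one, so proxy settings from environment variables are not used by this opener. Whether a given site serves pages to proxied traffic is a separate question the code cannot answer.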
Question: Is there any way to include a captcha on a site and at the same time prevent services like 2captcha from solving it?
Scraping.pro load test
Recently I had a chance to run a website load test. Since I run this blog, it is always useful to check its capacity under load, so when www.dotcom-monitor.com offered me a free load test, I took the opportunity.
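This is not how www.dotcom-monitor.com works internally. Purely to illustrate what a load test measures, here is a small standard-library sketch that fires concurrent GET requests at one URL and reports how many succeeded and how long they took. The request count and worker count are arbitrary defaults, one machine cannot reproduce a distributed test, and you should only point it at a site you own.

```python
import statistics
import time
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url, timeout=10.0):
    """Fetch url once and return (HTTP status, elapsed seconds)."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # include body transfer time in the measurement
            status = resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses still count as answers
        status = err.code
    return status, time.perf_counter() - start


def load_test(url, total_requests=50, concurrency=10):
    """Send total_requests GETs using `concurrency` parallel workers; summarize latency.

    Connection-level failures (e.g. refused connections) are not caught and will propagate.
    """
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_get(url), range(total_requests)))
    latencies = [elapsed for _, elapsed in results]
    return {
        "requests": len(results),
        "ok": sum(1 for status, _ in results if status == 200),
        "mean_s": statistics.mean(latencies),
        "max_s": max(latencies),
    }
```

Raising `concurrency` step by step and watching `mean_s`, `max_s` and the share of non-200 answers gives a rough idea of where a site starts to struggle.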
FunCaptcha solve algorithm needed
One of our readers is interested in whether there are any tools or algorithms to solve FunCaptcha.
If you have any ideas, or you're willing to take on this project, please comment below.
Octoparse is an easy-to-use yet powerful visual web scraper that enables anyone, even those without much programming background, to collect and extract data from the web. Octoparse is designed to help users deal easily with complex website structures, such as those built with JavaScript, and it can be compared to other web scraping tools such as Import.io and Mozenda.
Octoparse 2nd Anniversary Sale – Up to 40% Off!
Question
I want to extract the hotel name and the current room price of some hotels daily from https://www.expedia.ca/Hotel-
I own a small hotel and need this information quite often, so I hope to automate it with code somehow. You are an expert in this field: what is the easiest way to get this information? Can you give me some example code?
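As a minimal sketch of the parsing step only, the block below pulls a hotel name and a room price out of an HTML page with the standard library's `html.parser`. Expedia's real markup is not given in the question and changes over time, so the class names `hotel-name` and `room-price` are hypothetical placeholders to be replaced after inspecting the live page. The sketch also assumes those elements hold plain text without nested tags. Fetching the page is left out, and the site's Terms of Use should be checked first.

```python
import re
from html.parser import HTMLParser

# HYPOTHETICAL class names: inspect the real page and adjust these
TARGET_CLASSES = {"hotel-name": "name", "room-price": "price"}


class HotelInfoParser(HTMLParser):
    """Collect the text of elements whose class attribute matches TARGET_CLASSES.

    Assumes each target element contains plain text (no nested tags).
    """

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for css_class, field in TARGET_CLASSES.items():
            if css_class in classes:
                self._field = field

    def handle_endtag(self, tag):
        self._field = None

    def handle_data(self, data):
        if self._field and data.strip():
            self.fields[self._field] = self.fields.get(self._field, "") + data.strip()


def extract_hotel_info(html):
    """Return (hotel name, room price as float); either is None if not found."""
    parser = HotelInfoParser()
    parser.feed(html)
    parser.close()
    name = parser.fields.get("name")
    # Keep only digits and the decimal point: "C$1,299.00" -> "1299.00"
    digits = re.sub(r"[^\d.]", "", parser.fields.get("price", ""))
    price = float(digits) if digits else None
    return name, price
```

Run once a day (for example from cron or Windows Task Scheduler), with each result appended to a CSV file, this gives a simple price history. If the prices are filled in by JavaScript after the page loads, the raw HTML will not contain them and a browser-based tool would be needed instead.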
Recently I received a question about insurance leads:
Is this a legal method of acquiring insurance leads [from the web]? Are there any agent testimonials on the efficiency of this type of service?
Legality issues in web scraping
The legality of web scraping calls for a clear approach: it depends on the website and its privacy policy. There are at least two cases:
- Public info (prices, inventory info, public offers), i.e. everything that is not protected by copyright and is available for scraping.
- Copyright-protected info: the website's Terms of Use or Terms of Service restrict copying and therefore make web scraping illegal.
So far I have no insurance agent testimonials on the efficiency of any insurance lead scraping service. The websites I searched [for insurance leads] gave me the impression that the customer info they gather is highly secured (not viewable). I doubt that any sites are going to expose insurance leads; on most of them, leads are available only through paid subscription plans.
If there are websites such as insurance lead directories (public insurance quotes), we could develop a scraper that regularly grabs fresh info for further analysis. This saves the agent the time spent re-searching, re-visiting pages and so on. A single scraper could work through multiple directory pages.
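Grabbing only fresh entries across several directory pages boils down to remembering which records earlier runs already returned. Here is a minimal sketch, assuming each scraped record is a dict with a unique `"id"` field (a hypothetical key; a real directory may need a different one, passed in via `key`). Persisting the `seen` set between runs, e.g. in a file, is left out.

```python
from typing import Callable, Iterable


def fresh_records(pages: Iterable[list], seen: set, key: Callable = lambda r: r["id"]) -> list:
    """Return records from all pages whose key was not seen before.

    pages: one list of record dicts per scraped directory page.
    seen:  keys returned by earlier runs; updated in place, so the next run
           reports only newer entries (duplicates across pages are skipped too).
    """
    new = []
    for records in pages:  # one list of records per directory page
        for record in records:
            k = key(record)
            if k not in seen:
                seen.add(k)
                new.append(record)
    return new
```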
If you only need to see updates (without storing any data), you might find it interesting to read about web page change tracking.
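Independent of any particular tracking tool, one minimal way to detect updates without storing page data is to keep only a hash of each page and compare it on the next visit. The sketch below assumes the page text has already been fetched and that parts which change on every load (ads, timestamps) were stripped beforehand; `fingerprint` and `has_changed` are hypothetical helper names.

```python
import hashlib


def fingerprint(page_text: str) -> str:
    """SHA-256 of the page text with whitespace collapsed, so whitespace-only edits don't count."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(url: str, page_text: str, last_seen: dict) -> bool:
    """Report whether url's content differs from the previous run, then remember the new hash.

    last_seen maps URL -> fingerprint; only hashes are kept, never the page data.
    A URL seen for the first time counts as changed.
    """
    new_hash = fingerprint(page_text)
    changed = last_seen.get(url) != new_hash
    last_seen[url] = new_hash
    return changed
```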
Death By Captcha is a reputable CAPTCHA-solving service with more than 7 years in the business. They have recently updated all their API clients, so users can get maximum efficiency and faster solving times.
They strongly recommend that users and software developers visit the API page and update their DBC API implementation to get the most out of it (the API and docs are available to registered users only). Free credits are provided so users can test or integrate the new client API!
[box style='info blue']If you tell them you saw this info on the scraping.pro blog, they'll give you an additional 1K free CAPTCHA credits![/box]
For further info, you may contact them directly.