Category: Miscellaneous

What information do internet services collect about users?

Post author By admin
Post date August 14, 2020

…we use your personal data so we can provide the best service, tell you about products and services you may be interested in…

These or similar statements often appear in the fine print of the Terms of Service (ToS) on most modern sites. Below we describe which particular data are collected from web and app users.

Development Miscellaneous

Extracting sequential HTML elements with XPath and Regex

Post author By admin
Post date January 10, 2020

Often, we need to extract HTML elements that follow one another sequentially rather than being nested hierarchically.
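
As a minimal sketch of the idea, the snippet below pairs each heading with the paragraphs that follow it as siblings. The markup, the `group_by_heading` helper, and the tag names are hypothetical, and the sample is assumed to be well-formed so that Python's standard `xml.etree.ElementTree` can parse it. With a full XPath 1.0 engine the same relationship is usually expressed through the `following-sibling` axis.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup: headings and paragraphs are siblings,
# so the "sections" exist only through the order of the elements.
HTML = """
<div>
  <h2>Apples</h2>
  <p>Red</p>
  <p>Green</p>
  <h2>Pears</h2>
  <p>Yellow</p>
</div>
"""

def group_by_heading(markup, heading="h2", item="p"):
    """Pair each heading with the item elements that follow it,
    up to the next heading."""
    root = ET.fromstring(markup)
    groups = []
    for node in root:  # direct children, in document order
        if node.tag == heading:
            groups.append((node.text, []))
        elif node.tag == item and groups:
            groups[-1][1].append(node.text)
    return groups

groups = group_by_heading(HTML)
print(groups)
```

Real pages are rarely well-formed XML, so an HTML-tolerant parser would likely be needed in practice; the grouping logic stays the same.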

Tags Regex, Xpath

Guest posting Miscellaneous

Data extraction: web crawling vs. web scraping in E-commerce

Post author By admin
Post date November 8, 2019

Nowadays, when we have a question, it comes almost naturally to just type it into a search bar and get helpful answers. But we rarely wonder how all that information becomes available and how it appears as soon as we start typing. Search engines provide easy access to information, but web crawling and scraping tools, which are less well-known players, play a crucial role in gathering online content.

Tags crawling

Miscellaneous

Huge JSON files view and search tool with excellent performance

Post author By admin
Post date October 1, 2019

Tags JSON

Miscellaneous

Endcaptcha now solving Recaptcha V2!

Post author By admin
Post date September 27, 2019

So far, the latest developments from the services that build captchas (Google, NuCaptcha, etc.) have been no match for the captcha bypassers, and Endcaptcha is living proof of it.
Endcaptcha developers have been working hard to make this new feature possible – they’re finally releasing Recaptcha V2 support!

Tags captcha, service

Miscellaneous

Phantombuster API list

Post author By admin
Post date April 17, 2019

I’ve categorized Phantombuster’s scraping APIs for my own reference, but the list might be a good reference point for others too.

Tags API

Miscellaneous

Bypass distil network, the anti-scraper protection

Post author By admin
Post date March 27, 2019


For details on how to bypass distil-network, the anti-scraper protection, please contact me by email: igor [dot] savinkin [at] gmail [dot] com.

Tags anti-scrape, security

Miscellaneous

Scraping HTML graphic elements: possibilities and limits

Post author By admin
Post date December 20, 2018

Question: “How do I set up a daily automatic scraping of www.pollen.com data into a Google sheet?” (link)

Answer: Originally I doubted whether svg HTML elements are scrapable. After some trial and error I realized that svg elements are indeed scrapable; one can get their XPath and child nodes. Yet importXML() can scrape them only when they are part of the static HTML.
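
To illustrate the point outside of Google Sheets, here is a small Python sketch that reads an inline svg element from static markup and lists its child nodes. The page fragment and its values are made up for the example, and the markup is assumed to be well-formed with the svg namespace declared; this is not the importXML() call itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical static page fragment with an inline svg chart.
PAGE = """
<div>
  <svg xmlns="http://www.w3.org/2000/svg" width="100" height="40">
    <rect x="0" y="0" width="30" height="40"/>
    <text x="35" y="25">7.2</text>
  </svg>
</div>
"""

# svg elements live in their own XML namespace, so paths need a prefix map.
NS = {"svg": "http://www.w3.org/2000/svg"}

root = ET.fromstring(PAGE)
svg = root.find("svg:svg", NS)

# Child tags come back as "{namespace}name"; keep only the local name.
child_tags = [child.tag.split("}")[1] for child in svg]
label = svg.find("svg:text", NS).text

print(child_tags, label)
```

If the svg is drawn by JavaScript after the page loads, it is not in the static markup, and this kind of parsing finds nothing.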

Tags scraping tool

Miscellaneous

Python, web2py – open MS Word file on-the-fly

Post author By admin
Post date October 30, 2018

Recently I was looking for a way to open an MS Word file on-the-fly for processing by the python-docx library. Through trial and error I got the code to work. I use the web2py framework as a wrapper for the POST request.
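
The sketch below is not the code from the post. It only illustrates what "on-the-fly" can mean: reading a .docx from bytes in memory, without saving it to disk. To stay dependency-free it swaps python-docx for the standard library, relying on the fact that a .docx file is a zip archive whose main text lives in `word/document.xml`. The helper names and the sample bytes are hypothetical, and how the POST body reaches the function in web2py is left out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def paragraphs_from_docx_bytes(data):
    """Return paragraph texts from .docx bytes without touching the disk."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for p in root.iter("{%s}p" % W_NS):
        runs = [t.text or "" for t in p.iter("{%s}t" % W_NS)]
        paragraphs.append("".join(runs))
    return paragraphs

def make_sample_docx(lines):
    """Build a tiny stand-in for uploaded bytes. It contains only the part
    read above, so it is not a complete document that Word would open."""
    body = "".join("<w:p><w:r><w:t>%s</w:t></w:r></w:p>" % line for line in lines)
    xml = '<w:document xmlns:w="%s"><w:body>%s</w:body></w:document>' % (W_NS, body)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()

uploaded = make_sample_docx(["Hello", "World"])
result = paragraphs_from_docx_bytes(uploaded)
print(result)
```

With python-docx installed, the same in-memory approach should work by passing the `io.BytesIO` object to `docx.Document`, which accepts a file-like object as well as a path.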