The Python requests library is useful and has many advantages over similar libraries. However, when I tried to retrieve a Wikipedia page, requests.get() returned it only partially:
import requests
response = requests.get('https://en.wikipedia.org/wiki/Talk:Land_value_tax', verify=False)
html = response.text
I tried it using urllib2, and urllib2.urlopen retrieved the same page completely:
html = urllib2.urlopen('https://en.wikipedia.org/wiki/Talk:Land_value_tax').read()
Why does this happen, and how can one solve it using requests?
It seems to me that the problem lies in the scripting on the target page. Part of the content there is JS-driven (in particular, I've found calls to MediaWiki). Inspecting the page's traffic with a web sniffer helps identify these calls.
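As a rough alternative to a sniffer, the external scripts a page refers to can also be listed from the HTML that was already downloaded. The following is a minimal sketch, assuming Python 3 and only the standard library; the names ScriptSourceParser and external_script_sources and the sample markup are hypothetical, made up for illustration, and not taken from the actual Wikipedia page.

```python
from html.parser import HTMLParser


class ScriptSourceParser(HTMLParser):
    """Collects the src attribute of every <script> tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Inline scripts have no src attribute and are skipped.
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def external_script_sources(html):
    """Return the URLs of external scripts referenced by an HTML string."""
    parser = ScriptSourceParser()
    parser.feed(html)
    return parser.sources


# Hypothetical sample markup, not the real page.
sample = (
    '<html><head>'
    '<script src="/w/load.php?modules=startup"></script>'
    '<script>var x = 1;</script>'
    '</head><body></body></html>'
)
print(external_script_sources(sample))
```

Passing response.text from the requests call above to external_script_sources would list the script URLs a browser would go on to fetch, which requests itself never does.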
Later, the asker added a comment:
I am not interested in retrieving the whole page and statistics or JS libraries retrieved from MediaWiki. I only need the whole content of the page (through scraping, not MediaWiki API).
The issue is that those JS calls to other resources (including MediaWiki) are what let a browser render the whole page for the client. The requests library does not execute JavaScript, so the scripts never run, the page parts they would fetch from other resources are never loaded, and the target page does not arrive as the whole page a browser would show.
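One way to sanity-check this explanation is to see whether the HTML that requests returned is itself well terminated. If it ends with a closing </html> tag, the transfer was probably not cut short, and anything missing is more likely content that scripts would have added later. This is only a heuristic sketch; the helper name looks_complete is hypothetical.

```python
def looks_complete(html):
    """Heuristic: True if the document text ends with a closing </html> tag."""
    return html.rstrip().lower().endswith("</html>")


# Usage with the response from the question (needs network access, so it is
# left commented out here):
# import requests
# response = requests.get('https://en.wikipedia.org/wiki/Talk:Land_value_tax')
# print(looks_complete(response.text))

print(looks_complete("<html><body>hi</body></html>\n"))  # True
print(looks_complete("<html><body>hi"))                  # False
```

A True result does not prove the page is complete, it only rules out the simplest kind of truncation.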