I recently received this question: how do you download a file from a link in Python?
“I need to go to every link which will open a website and that would have the download file “Export offers to XML”. This link is javascript enabled.”
Let us look at how to fetch a file from a JavaScript-driven link using Python:
1. Look at the markup of the download link:
<a href="javascript:__doPostBack('ctl00$ContentPlaceHolder1$lnkExportToExcel')">Export offers to XML</a>
The link calls the JavaScript function __doPostBack().
2. I found this function in the page source:
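The string passed to __doPostBack() in the link above is the value that later has to be sent to the server, so it is handy to pull it out of the href programmatically. Below is a minimal sketch using only the standard library. The helper name `parse_postback_href` and the regular expression are my own, not part of the page, and defaulting a missing second argument to an empty string is an assumption.

```python
import re

# Matches __doPostBack('target') and __doPostBack('target','argument').
# Assumes single-quoted arguments, as in the link shown above.
POSTBACK_RE = re.compile(
    r"__doPostBack\(\s*'([^']*)'\s*(?:,\s*'([^']*)'\s*)?\)"
)


def parse_postback_href(href):
    """Return (event_target, event_argument) from a __doPostBack href, or None."""
    match = POSTBACK_RE.search(href)
    if match is None:
        return None
    # Assumption: a missing second argument is treated as an empty string.
    return match.group(1), match.group(2) or ""


href = "javascript:__doPostBack('ctl00$ContentPlaceHolder1$lnkExportToExcel')"
target, argument = parse_postback_href(href)
print(target)
```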
<script type="text/javascript">
  var theForm = document.forms['aspnetForm'];
  if (!theForm) {
    theForm = document.aspnetForm;
  }
  function __doPostBack(eventTarget, eventArgument) {
    if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
      theForm.__EVENTTARGET.value = eventTarget;
      theForm.__EVENTARGUMENT.value = eventArgument;
      theForm.submit();
    }
  }
</script>
The function fills in two hidden fields, __EVENTTARGET and __EVENTARGUMENT, and then submits the form whose id is aspnetForm.
3. Here is that form in the page source:
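In Python terms, what __doPostBack() does amounts to setting two fields and encoding the whole form as a POST body. The sketch below illustrates just that step. The function name `build_postback_body` is hypothetical, and the `PLACEHOLDER` value stands in for a real __VIEWSTATE, which must come from the live page.

```python
from urllib.parse import urlencode


def build_postback_body(form_fields, event_target, event_argument=""):
    """Mimic __doPostBack: set the two event fields, then encode the form."""
    fields = dict(form_fields)  # copy so the caller's dict is left untouched
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    # Produces an application/x-www-form-urlencoded string.
    return urlencode(fields)


body = build_postback_body(
    # Placeholder form contents; real values come from the scraped page.
    {"__EVENTTARGET": "", "__EVENTARGUMENT": "", "__VIEWSTATE": "PLACEHOLDER"},
    "ctl00$ContentPlaceHolder1$lnkExportToExcel",
)
print(body)
```

Note that `urlencode` percent-escapes the `$` characters in the control name, which is what a browser does when it submits the form.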
<form method="post" action="ApplesToApplesComparision.aspx?Category=Electric&TerritoryId=2&RateCode=1" id="aspnetForm">
<div class="aspNetHidden">
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="qqEqj+MJZCXWWypTsbeT2OudaHSwkSmxn4MMtBuWopgD50psDlTzoVSH0gMVRNktX7EW7I2uWKnF9IzD8/BkloDdz+4OSdWS7MbiJaQ2KVBHoZCFqMN0IgLe82fkuPJxk/wf1h/ZWYjOwi5XRTLZEy4JKRc...
When dealing with ASP.NET, remember that it keeps the page state across HTTP requests in a special hidden form field, __VIEWSTATE. The value of this field changes with each request and response, and often grows as it does. It is therefore vital to send requests through this same form with the current __VIEWSTATE value.
We recommend simulating the form submission in Python: fill the form with the parameters from the target link and, above all, with the __VIEWSTATE value taken from the real form on the scraped page.
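One possible way to simulate the form with the standard library alone is sketched here: load the page, collect its hidden inputs (including __VIEWSTATE), set the event fields, and POST everything back to the form's action. The names `HiddenInputParser` and `download_export` are hypothetical. The sketch also assumes the site needs nothing beyond session cookies, and it collects every hidden input on the page rather than only those inside the form.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin
from urllib.request import HTTPCookieProcessor, Request, build_opener


class HiddenInputParser(HTMLParser):
    """Collect hidden <input> name/value pairs and the target form's action."""

    def __init__(self, form_id="aspnetForm"):
        super().__init__()
        self.form_id = form_id
        self.action = None
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.action = attrs.get("action")
        elif tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            # Simplification: hidden inputs outside the form are collected too.
            name = attrs.get("name")
            if name:
                self.fields[name] = attrs.get("value") or ""


def download_export(page_url, event_target, event_argument=""):
    """Fetch the page, then replay its form as __doPostBack would."""
    # One opener for both requests, so session cookies are kept.
    opener = build_opener(HTTPCookieProcessor())
    with opener.open(page_url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")

    parser = HiddenInputParser()
    parser.feed(html)

    payload = dict(parser.fields)  # includes the fresh __VIEWSTATE
    payload["__EVENTTARGET"] = event_target
    payload["__EVENTARGUMENT"] = event_argument

    # The form action is relative, so resolve it against the page URL.
    post_url = urljoin(page_url, parser.action or "")
    request = Request(post_url, data=urlencode(payload).encode("ascii"))
    with opener.open(request) as response:
        return response.read()  # raw bytes of the exported file
```

A call would look like `download_export(page_url, "ctl00$ContentPlaceHolder1$lnkExportToExcel")`, where `page_url` is the full address of the page holding the link. Because the hidden fields are read from a freshly loaded page on each call, the __VIEWSTATE sent back is always the current one.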
See the screenshot below with the important details: