Categories

## Linear regression and Stochastic Gradient Descent

In this post we’ll show how to make a linear regression model for a data set and perform a stochastic gradient descent in order to optimize the model parameters. As in a previous post we’ll calculate MSE (Mean squared error) and minimize it.

Categories

## Linear Regression application for data analysis and scientific computing

In this post we’ll share with you the vivid yet simple application of the Linear regression methods. We’ll be using the example of predicting a person’s height based on their weight. There you’ll see what kind of math is behind this. We will also introduce you to the basic Python libraries needed to work in the Data Analysis.

Categories

## Weibull distribution & sample averages approximation using Python and scipy

In this post we share how to plot distribution histogram for the Weibull ditribution and the distribution of sample averages as approximated by the Normal (Gaussian) distribution. We’ll show how the approximation accuracy changes with samples volume increase.

One may get the full .ipynb file here.

Categories

Categories

## Get and pass CSRF token using python requests library

``````import sys
import requests
client = requests.session()

# Retrieve the CSRF token first
# Django 1.6 and up
else:
# older versions

# Pass CSRF token both in login parameters (csrfmiddlewaretoken)
Categories

## Scrape a JS Lazy load page by Python requests

The JS loading page is usually scraped by Selenium or another browser emulator. Yet, for a certain shopping website we’ve
found a way to perform a pure Python requests scrape.

Categories

## Scrape text, parse it with BeautifulSoup and save it as Pandas data frame

We want to share with you how to scrape text and store it as Pandas data frame using BeautifulSoup (Python). The code below works to store html li items in the ‘engine, ‘trans’, ‘colour’ and ‘interior’ columns.

``````from bs4 import BeautifulSoup
import pandas as pd
import requests

main_url = "https://www.example.com/"

def getAndParseURL(url):
result = requests.get(url)
soup = BeautifulSoup(result.text, 'html.parser')
return(soup)

soup = getAndParseURL(main_url)
ul = soup.select('ul[class="list-inline lot-breakdown-list"] li', recursive=True)
lis_e = []
for li in ul:
lis = []
lis.append(li.contents[1])
lis_e.extend(lis)

engine.append(lis_e[0])
trans.append(lis_e[1])
colour.append(lis_e[2])
interior.append(lis_e[3])

scraped_data = pd.DataFrame({'engine': engine,
'transmission': trans, 'colour': colour,
'interior': interior})
``````
By default, Beautiful Soup searches through all of the child elements. So, setting recursive = False (line 13) will restrict the search to the first found element and its child only.

The code was provided by Ahmed Soliman.

Categories

I recently got a question and it looked like this : how to download a file from a link in Python?

“I need to go to every link which will open a website and that would have the download file “Export offers to XML”. This link is javascript enabled.”

##### Let us consider how to get a file from a JS-driven weblink using Python :
Categories

We’ve done the Linkedin scraper that downloades the free study courses. They include text data, exercise files and 720HD videos. The code does not represent the pure Linkedin scraper, a business directory data extractor. Yet, you might grasp the main thoughts and useful techniques for your Linkedin scraper development.

Categories

## Python: submit authenticated form using cookie and session

Recently, I was challenged to do bulk submits through an authenticated form. The website required a login. While there are plenty of examples of how to use POST and GET in Python, I want to share with you how I handled the session along with a cookie and authenticity token (CSRF-like protection).

In the post, we are going to cover the crucial techniques needed in the scripting web scraping:

• persistent session usage
• cookie finding and storing [in session]
• “auth token” finding, retrieving and submitting in a form