Python: submit authenticated form using cookie and session

Recently, I was challenged to do bulk submits through an authenticated form. The website required a login. While there are plenty of examples of how to use POST and GET in Python, I want to share with you how I handled the session along with a cookie and authenticity token (CSRF-like protection).

In this post, we cover the key techniques needed when scripting this kind of web scraping:

  • persistent session usage
  • finding a cookie and storing it in the session
  • “auth token” finding, retrieving and submitting in a form

Given

A website with an input form that contains an auth token. The auth token (CSRF-like) is different each time the form is loaded. The website requires a login.

What I want:

I want to submit a lot of similar input data like ‘GE 1’, ‘GE 2’, etc. through that form into my account.

This website is not JS-rendered, so we do not need browser emulation with Selenium WebDriver or similar tools.

The main steps necessary to achieve the goal

  1. Get a cookie from a logged-in browser.
  2. Insert the cookie into a session (from the Python requests library).
  3. Fetch the form’s current hidden “auth token” (using regex) before each submit.
  4. Use that unique “auth token” for each POST request inside the session.
    1. Getting a cookie value from the browser

    We get the cookie value(s) using the web developer tools (F12 in most browsers). The following picture shows where to look (a picture is worth a thousand words):

    [Figure: find-cookie-value]

    2. Adding the cookie into a session object

    First, we save the cookie(s) to a *.cookie file on disk using the pickle module. This is done only once.

    ## One-time cookie saving into a file
    import pickle
    from urllib.parse import urlparse

    URL = 'http://www.excellentbeliever.com/'
    urlData = urlparse(URL)
    cookieFile = urlData.netloc + '.cookie'
    cookie1 = {'_exbel_session': '63b55ca6.............a2a5215e'}
    with open(cookieFile, 'wb') as fp:
        pickle.dump(cookie1, fp)

    Second, every time we start a session, we load the file into the session object. All the cookies are thus joined into the session.

    with open(cookieFile, 'rb') as f:
        print("Loading cookies...")
        session.cookies.update(pickle.load(f))
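The save and load steps can also be wrapped into two small helpers. This is a minimal sketch, not the code used in this post: the names save_cookies and load_cookies, the file name example.cookie and the cookie value are made up for illustration, and the round trip runs in a temporary directory so that it does not touch a real cookie file.

```python
import os
import pickle
import tempfile


def save_cookies(path, cookies):
    """Pickle a plain dict of cookie names and values to `path`."""
    with open(path, "wb") as fp:
        pickle.dump(dict(cookies), fp)


def load_cookies(path):
    """Return the pickled cookie dict, or an empty dict if the file is missing."""
    try:
        with open(path, "rb") as fp:
            return pickle.load(fp)
    except FileNotFoundError:
        return {}


# Round trip through a temporary directory, with a made-up cookie value.
with tempfile.TemporaryDirectory() as tmp_dir:
    cookie_path = os.path.join(tmp_dir, "example.cookie")
    missing = load_cookies(cookie_path)  # no file yet, so this is {}
    save_cookies(cookie_path, {"_exbel_session": "placeholder-value"})
    restored = load_cookies(cookie_path)
```

With a requests session, the loaded dict can then be passed to session.cookies.update(load_cookies(cookieFile)). Since pickle can execute code while loading, only load cookie files that you created yourself.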

    After we have loaded the cookie, we start scripting.

    Main operations inside a loop

    Inside the loop, over the input values, we do the following:

    1. Visit the page with the form and fetch the “auth token”

      How to identify a form’s hidden field value? See the figure below:

      [Figure: get-form-hidden-field]

      The code to extract the form’s hidden input by regex:

      regex_auth = r'(?:name="authenticity_token")\s+value="(.*?)"' 
      page = session.get( urljoin(URL, '/dashboard?prediction=false')) 
      matches = re.findall(regex_auth, page.text, re.MULTILINE) 
      auth_token = matches[0]
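The regex above relies on the name attribute coming directly before value in the markup. If the attribute order differs, one alternative (a different technique from the one used in this post) is to parse the inputs with the standard library html.parser module. The sketch below is only an illustration: the class HiddenInputParser, the function extract_hidden_field and the sample HTML are assumptions, not taken from the real site.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of all <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Attribute order does not matter here, unlike with the regex.
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.hidden[attr["name"]] = attr.get("value") or ""


def extract_hidden_field(html, name):
    """Return the value of the hidden input called `name`, or None."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.hidden.get(name)


# Made-up form markup with 'value' placed before 'name'.
sample_html = (
    '<form action="/readings" method="post">'
    '<input value="abc123==" type="hidden" name="authenticity_token" />'
    '<input type="text" name="fragment" />'
    '</form>'
)
token = extract_hidden_field(sample_html, "authenticity_token")
```

In the script, page.text would be passed instead of sample_html. A None result means that the form was not found, which often indicates that the session is no longer logged in.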

    2. Make a POST request to submit data

      pattern = 'GE '
      post_data = {"utf8": "✓", "authenticity_token": auth_token ,
                           "fragment": pattern + str(i), 'commit': 'Post Reading!'}
      post_URL = urljoin(URL,'/readings')
      page = session.post( post_URL , data = post_data)
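To see roughly what the body of such a POST looks like, the form fields can be encoded with the standard library urlencode, which produces the same kind of form-encoded string that requests sends for a dict passed as data. This is a sketch for inspection only: build_post_data is a hypothetical helper, 'token-123' is a placeholder rather than a real token, and nothing is sent over the network.

```python
from urllib.parse import urlencode


def build_post_data(auth_token, fragment):
    """Assemble the form fields used in this post for one submission."""
    return {
        "utf8": "✓",
        "authenticity_token": auth_token,
        "fragment": fragment,
        "commit": "Post Reading!",
    }


# 'token-123' is a placeholder, not a real token.
payload = build_post_data("token-123", "GE 1")
encoded = urlencode(payload)
print(encoded)
```

Note that the space in 'GE 1' becomes a plus sign and that the check mark is percent-encoded as UTF-8, so there is no need to escape the values by hand before passing the dict to session.post.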

    The whole code

    import os, re
    import pickle, requests
    from urllib.parse import urljoin, urlparse
    
    # init vars
    URL = 'http://www.excellentbeliever.com/'
    regex_auth = r'(?:name="authenticity_token")\s+value="(.*?)"'
    urlData = urlparse(URL)
    cookieFile = urlData.netloc + '.cookie'
    
    ## One time cookie saving into a file
    ##cookie1={'_exbel_session':'63b5568921de51fe67fe847ca2a5215e'} 
    ##with open(cookieFile, 'wb') as fp:
    ##    pickle.dump(cookie1, fp)
    ##print ('cookieFile:', cookieFile)
    login='xxx'
    password='xxx'
    signinUrl = urljoin(URL, "users/sign_in") # http://www.excellentbeliever.com/users/sign_in
    with requests.Session() as session:
        try:
            with open(cookieFile, 'rb') as f:
                print("Loading cookies...")
                session.cookies.update(pickle.load(f))
        except Exception:
            # If the cookies could not be loaded from the file, get new ones by logging in
            print("Logging in...")
            post = session.post(
                signinUrl,
                data={
                    'email': login,
                    'password': password,
                }
            )
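            # Names of the cookies worth saving (assumption: only the session cookie is needed)
            persistentCookieNames = ['_exbel_session']
            try:
                with open(cookieFile, 'wb') as f:
                    jar = requests.cookies.RequestsCookieJar()
                    for cookie in session.cookies:
                        if cookie.name in persistentCookieNames:
                            jar.set_cookie(cookie)
                    pickle.dump(jar, f)
            except Exception:
                # Do not leave a half-written cookie file behind
                if os.path.exists(cookieFile):
                    os.remove(cookieFile)
                raise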
        # Add browser-like headers on top of the session defaults
        session.headers.update({
            'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
            'Origin': URL.rstrip('/'),  # an Origin value has no trailing slash
            'Upgrade-Insecure-Requests': '1',
            'Content-Type': 'application/x-www-form-urlencoded',
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36',
        })
        page = session.get(URL)
        print ('url:', URL)
        print ('status code:', page.status_code)
        login_marker = 'Igor Savinkin'
        if login_marker in page.text:
            print (login_marker , 'is logged in.' )
            print ("Session cookies:", session.cookies)
        pattern='GE '
        max_num=26
        for i in range(26, max_num+1):    
            # get the auth token from authenticated form
            print ('Getting the token from the authenticated form')
            page = session.get( urljoin(URL, '/dashboard?prediction=false'))
            print ('Page with form status code:', page.status_code)        
            matches = re.findall(regex_auth, page.text, re.MULTILINE)
            if matches:
                auth_token = matches[0]
                print ('Form auth token:', auth_token)
                post_data = {"utf8": "✓", "authenticity_token": auth_token ,
                         "fragment": pattern + str(i), 'commit': 'Post Reading!'}
                post_URL = urljoin(URL,'/readings')
            else:
                # Without a token the POST would be rejected, so stop here
                raise SystemExit('Something went wrong: auth token not found.')
    
            # send a post 
            page = session.post( post_URL , data = post_data) 
            print ('POST submit status code:', page.status_code)
            if 'Successfully' in page.text:
                print ('Form with "' + post_data["fragment"] + '" has been successfully submitted.' )
