This post shows how to scrape text with BeautifulSoup (Python) and store it in a pandas DataFrame. The code below stores the text of HTML li items in the 'engine', 'transmission', 'colour' and 'interior' columns.
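from bs4 import BeautifulSoup
import pandas as pd
import requests

main_url = "https://www.example.com/"

def getAndParseURL(url):
    # Download the page and parse it into a BeautifulSoup tree
    result = requests.get(url)
    soup = BeautifulSoup(result.text, 'html.parser')
    return soup

soup = getAndParseURL(main_url)

# One list per DataFrame column
engine, trans, colour, interior = [], [], [], []

# select() takes a CSS selector and already searches all descendants;
# 'recursive' is a find_all() argument and is not needed here
ul = soup.select('ul[class="list-inline lot-breakdown-list"] li')

# Collect the second child node of every li (the value stored in that item)
lis_e = []
for li in ul:
    lis_e.append(li.contents[1])

# The first four items are engine, transmission, colour and interior, in that order
engine.append(lis_e[0])
trans.append(lis_e[1])
colour.append(lis_e[2])
interior.append(lis_e[3])

scraped_data = pd.DataFrame({'engine': engine,
                             'transmission': trans,
                             'colour': colour,
                             'interior': interior})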
The code was provided by Ahmed Soliman.
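To try the same idea without a network request, here is a minimal sketch that parses an inline HTML string instead of a live page. The sample markup, the COLUMNS list and the parse_lots helper are assumptions made for illustration; they are not part of the original code, and a real page may be structured differently. The sketch also swaps in the class selector 'ul.list-inline.lot-breakdown-list', which matches the two classes in any order, and builds one DataFrame row per ul.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup: each li holds a label in a span, followed by the value text
SAMPLE_HTML = """
<ul class="list-inline lot-breakdown-list">
<li><span>Engine</span>2.0L</li>
<li><span>Transmission</span>Manual</li>
<li><span>Colour</span>Red</li>
<li><span>Interior</span>Black leather</li>
</ul>
<ul class="list-inline lot-breakdown-list">
<li><span>Engine</span>3.5L V6</li>
<li><span>Transmission</span>Automatic</li>
<li><span>Colour</span>Blue</li>
<li><span>Interior</span>Grey cloth</li>
</ul>
"""

# Column names, assumed to follow the order of the li items in each list
COLUMNS = ['engine', 'transmission', 'colour', 'interior']


def parse_lots(html):
    """Return a DataFrame with one row per matching ul (hypothetical helper)."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    for ul in soup.select('ul.list-inline.lot-breakdown-list'):
        # contents[0] is the label span, contents[1] is the value text node
        values = [li.contents[1].strip() for li in ul.find_all('li')]
        rows.append(dict(zip(COLUMNS, values)))
    return pd.DataFrame(rows, columns=COLUMNS)


scraped_data = parse_lots(SAMPLE_HTML)
print(scraped_data)
```

Building a list of row dictionaries keeps each lot's values together, so a page with several lists yields several rows rather than one.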