I am writing a small web scraping program in Python that takes GPU information from newegg.com and records all of the prices.
I have not implemented the spreadsheet output yet, because every time I run the script I get one of two errors.

The code is below:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import numpy as np

myURL = "https://www.newegg.com/global/uk/Product/ProductList.aspx?Submit=ENE&N=-1&IsNodeId=1&Description=graphics%20card&bop=And&PageSize=96&order=BESTMATCH" # defining my url as a variable

uClient = uReq(myURL) #opening the connection
page_html = uClient.read() # getting html
uClient.close() # closing the client

page_soup = soup(page_html, "html.parser") # html parsing

containers = page_soup.findAll("div", {"class":"item-container"}) # get all item containers/products

container = containers[0]

count = 0

for container in containers:

    print(count)

    brand = container.div.div.a.img["title"]# get the brand of the card
    if brand == None:
        print("N/A")
    else:
        print(brand)

    title_container = container.findAll("a", {"class", "item-title"})
    product_name = title_container[0].text # getting the product name
    if product_name == None:
        print("N/A")
    else:
        print(product_name)

    price1 = container.find("div",{"class":"item-action"})
    price1 = price1.ul
    price2 = price1.find("li", {"class": "price-current"}).contents #defining the product price
    if not price2:
        print("N/A")
    else:
        print(price2[2])
        print(price2[3].text)
        print(price2[4].text) 

    print()
    count+=1

The errors say the following:

  1. Traceback (most recent call last):
       File "C:/Users/Ethan Price/Desktop/test.py", line 23, in <module>
         brand = container.div.div.a.img["title"]# get the brand of the card
     TypeError: 'NoneType' object is not subscriptable

  2. Traceback (most recent call last):
       File "C:/Users/Ethan Price/Desktop/test.py", line 43, in <module>
         print(price2[2])
     IndexError: list index out of range

In trying to fix it, I tried to turn the list into an array and tried changing the if statements.

  • You need to verify that tags and elements exist before accessing them, rather than checking for None afterwards. Error handling is more important than perfect functionality. Commented Feb 10, 2018 at 18:39

2 Answers


Both error messages mean that some element you expect to see doesn't exist. The first is complaining that container.div.div.a.img is None when you try to subscript it (and Nones can't be subscripted, for obvious reasons). The other is complaining that the list price2 isn't as long as you think it is, so price2[2] is out of range.
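To see both failures in isolation, here is a minimal sketch that needs no network access. The values of `img` and `price2` are stand-ins chosen for illustration (they are assumptions, not data from the actual page): `None` is what an attribute chain such as `container.div.div.a.img` yields when a tag is missing, and a one-element list mimics a price element with fewer children than expected.

```python
# Stand-ins for what the scraper sees when an element is missing (assumed values).
img = None        # result of container.div.div.a.img when there is no <img> tag
price2 = ["\n"]   # .contents of a price element with only one child

try:
    img["title"]  # subscripting None
except TypeError as exc:
    first_error = str(exc)

try:
    price2[2]     # index past the end of the list
except IndexError as exc:
    second_error = str(exc)

print(first_error)
print(second_error)
```

Both exceptions are raised at the moment of access, which is why an `if brand == None` check placed after the subscript never gets a chance to run.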


For the first error, check that the image tag exists before reading its title attribute:

brand = None
# might want to check there is even an anchor tag 
_img = container.div.div.a.img
if _img:
    brand = _img["title"]

For the second error, check the length of the price contents before indexing into them:

# only print when the list has the expected number of parts
if 2 <= len(price2) <= 5:
    for p in price2[2:]:
        print(p)
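Both checks can be wrapped in small helpers, sketched below under some assumptions. The names `dig`, `brand_of`, and `price_parts` are hypothetical and not part of BeautifulSoup. The sketch relies on the fact that attribute access on a BeautifulSoup tag (for example `tag.div`) returns `None` when the child is absent, and that a tag offers a dict-like `.get()`. So that it runs without the library installed, the demo uses `SimpleNamespace` objects and a plain dict in place of real tags.

```python
from types import SimpleNamespace


def dig(node, *names):
    """Follow a chain of attributes, returning None as soon as one is missing."""
    for name in names:
        if node is None:
            return None
        node = getattr(node, name, None)
    return node


def brand_of(container, default="N/A"):
    """Return the image title under container.div.div.a.img, or a default."""
    img = dig(container, "div", "div", "a", "img")
    if img is None:
        return default
    title = img.get("title")  # .get() avoids a KeyError if the attribute is absent
    return title if title is not None else default


def price_parts(contents, start=2, stop=5):
    """Return contents[start:stop]; slicing yields [] instead of raising."""
    return list(contents[start:stop])


# Stand-in objects (assumptions for the demo, not real page data).
with_image = SimpleNamespace(
    div=SimpleNamespace(div=SimpleNamespace(a=SimpleNamespace(img={"title": "MSI"})))
)
without_image = SimpleNamespace(div=None)

print(brand_of(with_image))
print(brand_of(without_image))
print(price_parts(["\n"]))
```

In the loop from the question, `brand_of(container)` would replace the direct subscript, and iterating over `price_parts(price2)` would replace the hard-coded `price2[2]`, `price2[3]`, and `price2[4]` lookups.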
