Phone

+919997782184

Email

support@roboticswithpython.com

Geeks of Coding

Join us on Telegram

Home Forums Assignment courserra Python for Everybody – Specialization Using Python to access Web data Assignment 4.1 Scraping HTML Data with BeautifulSoup

Viewing 0 reply threads
  • Author
    Posts
    • #890
      Abhishek TyagiAbhishek Tyagi
      Keymaster

      Scraping Numbers from HTML using BeautifulSoup In this assignment you will write a Python program similar to http://www.py4e.com/code3/urllink2.py. The program will use urllib to read the HTML from the data files below, and parse the data, extracting numbers and compute the sum of the numbers in the file.
      We provide two files for this assignment. One is a sample file where we give you the sum for your testing and the other is the actual data you need to process for the assignment.

      Sample data: http://py4e-data.dr-chuck.net/comments_42.html (Sum=2553)
      Actual data: http://py4e-data.dr-chuck.net/comments_97408.html (Sum ends with 93)
      You do not need to save these files to your folder since your program will read the data directly from the URL. Note: Each student will have a distinct data url for the assignment – so only use your own data url for analysis

      #Enter the url to scrape -  http://py4e-data.dr-chuck.net/comments_97408.html
      #Count  50
      #Sum  2893
      
      import urllib.request as ur
      from bs4 import *
      
      url = input('Enter the url to scrape - ')
      
      html = ur.urlopen(url).read()
      soup = BeautifulSoup(html, 'html.parser')
      
      count_of_spans = 0
      sum = 0
      
      spans = soup('span')
      for span in spans:
          sum += int(span.contents[0])
          count_of_spans += 1
      
      print('Count ', count_of_spans)
      print('Sum ', sum)
Viewing 0 reply threads
  • You must be logged in to reply to this topic.