stanford_headlinez_souped / Week 5 / due 2018-02-12 23:59

Basically, the same scraper as stanford_headlinez, except done in a sane, practical way.

Again, this assignment assumes you know nothing about HTML and the DOM, except that HTML is just text, and that Python surely provides libraries to do the menial work of properly parsing it.
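To make "let a library do the parsing" concrete, here is a minimal sketch that uses only Python's standard-library html.parser module. Everything in it is an assumption for illustration: the sample HTML, the choice of h3 as the headline tag, and the names HeadlineParser and extract_headlines are all made up, and the actual assignment may call for a different library and different markup (see the homework repo page).

```python
from html.parser import HTMLParser

# Hypothetical snippet; the real page's markup will differ.
SAMPLE_HTML = """
<html><body>
  <h3><a href="/news/1">First headline</a></h3>
  <p>Not a headline</p>
  <h3><a href="/news/2">Second headline</a></h3>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collects the text found inside <h3> elements (an assumed headline tag)."""

    def __init__(self):
        super().__init__()
        self.in_h3 = False
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self.in_h3 = True
            self.headlines.append("")  # start a new headline

    def handle_endtag(self, tag):
        if tag == "h3":
            self.in_h3 = False

    def handle_data(self, data):
        # Text nested inside the <a> still arrives here while in_h3 is True.
        if self.in_h3:
            self.headlines[-1] += data


def extract_headlines(html_text):
    """Return the stripped text of every <h3> in html_text."""
    parser = HeadlineParser()
    parser.feed(html_text)
    parser.close()
    return [h.strip() for h in parser.headlines]


print(extract_headlines(SAMPLE_HTML))
# → ['First headline', 'Second headline']
```

The point is that the parser, not your own string-slicing code, keeps track of where tags open and close.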

The homework repo page has the context/background reading/examples in full:

https://github.com/compciv/homeworkhome/tree/master/stanford_headlinez_souped

Requirements

When I visit your GitHub.com repo page

I expect your GitHub repo, compciv-2018-SUNETID, to have the following subfolder:

compciv-2018-SUNETID/week-05/stanford_headlinez_souped/

On this subfolder’s page, I would expect the file tree to look like this:

└── bscraper.py

When I clone your GitHub repo

If I were to clone your repo onto my own computer, e.g.

$ git clone https://github.com/GITHUBID/compciv-2018-SUNETID.git

I would expect your homework subfolder to look like this:

compciv-2018-SUNETID/
└── week-05/
    └── stanford_headlinez_souped/
        └── bscraper.py

Setting up with the command-line

Here are the shell commands (Windows/Mac) to get the directory created and set as your working directory:

$ cd ~/Desktop/compciv-2018-SUNETID
$ mkdir -p week-05/stanford_headlinez_souped
$ cd week-05/stanford_headlinez_souped

The homework repo page has the instructions for using curl to quickly download the skeleton and test script for your convenience:

Setup: From the command-line

To run the test script, simply invoke pytest at the shell prompt.
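
If you are wondering what pytest does when you invoke it: by default it looks for files named test_*.py (or *_test.py), runs every function whose name starts with test_, and reports a failure whenever an assert is false. The sketch below is not the course's actual test script; the file name and the count_headlines function are hypothetical stand-ins.

```python
# Hypothetical file: test_example.py (NOT the course's actual test script)


def count_headlines(headlines):
    """Stand-in for a function you might otherwise import from bscraper.py."""
    return len(headlines)


def test_count_headlines():
    # pytest collects this function because its name starts with "test_".
    assert count_headlines(["a", "b"]) == 2


def test_count_headlines_empty():
    assert count_headlines([]) == 0
```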