stanford_headlinez_souped / Week 5 / due 2018-02-12 23:59
Basically, the same scraper as stanford_headlinez, except done in a sane, practical way.
Again, this assignment assumes you lack knowledge about HTML and the DOM – except that HTML is just text, and that Python must provide libraries to do the menial work of properly parsing HTML.
The homework repo page has the context/background reading/examples in full:
https://github.com/compciv/homeworkhome/tree/master/stanford_headlinez_souped
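To make the "HTML is just text, and a library does the parsing" idea concrete, here is a minimal sketch. It is not the assignment's solution: it uses the standard library's `html.parser` as a dependency-free stand-in for whatever parsing library the homework repo page tells you to use. The sample HTML, the choice of `<h3>` as the headline tag, and the names `HeadlineParser` and `extract_headlines` are all made up for illustration.

```python
from html.parser import HTMLParser

# Made-up HTML for illustration; the real page's markup will differ.
SAMPLE_HTML = """
<html><body>
  <h3><a href="/news/1">First made-up headline</a></h3>
  <p>Not a headline</p>
  <h3><a href="/news/2">Second made-up headline</a></h3>
</body></html>
"""


class HeadlineParser(HTMLParser):
    """Collect the text found inside every occurrence of one tag.

    The default tag, "h3", is an assumption for this sketch only.
    """

    def __init__(self, target_tag="h3"):
        super().__init__()
        self.target_tag = target_tag
        self._inside = False   # True while we are between <h3> and </h3>
        self._buffer = []      # text pieces seen inside the current tag
        self.headlines = []

    def handle_starttag(self, tag, attrs):
        if tag == self.target_tag:
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.target_tag and self._inside:
            self.headlines.append("".join(self._buffer).strip())
            self._inside = False


def extract_headlines(html_text, target_tag="h3"):
    """Return the text of every target_tag element in html_text."""
    parser = HeadlineParser(target_tag)
    parser.feed(html_text)
    parser.close()
    return parser.headlines


if __name__ == "__main__":
    for headline in extract_headlines(SAMPLE_HTML):
        print(headline)
```

The point of the sketch is that you never search the raw text for `<h3>` yourself; the parser tracks where tags open and close, and your code only decides which pieces to keep.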
Requirements
When I visit your Github.com repo page
I expect your Github repo, compciv-2018-SUNETID, to have the following subfolder:
compciv-2018-SUNETID/week-05/stanford_headlinez_souped/
On this subfolder’s page, I would expect the file tree to look like this:
└── bscraper.py
When I clone your Github repo
If I were to clone your repo onto my own computer, e.g.
$ git clone https://github.com/GITHUBID/compciv-2018-SUNETID.git
I would expect your homework subfolder to look like this:
compciv-2018-SUNETID/
└── week-05/
    └── stanford_headlinez_souped/
        └── bscraper.py
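If you want to double-check the layout before pushing, a small sketch like the following can help. It is not part of the assignment or its test script; the function name `missing_homework_files` and the `EXPECTED_FILES` list are hypothetical, and the only path it checks is the one shown in the tree above.

```python
from pathlib import Path

# Paths are relative to the root of your compciv-2018-SUNETID repo.
# This list is an assumption based on the file tree shown above.
EXPECTED_FILES = ["week-05/stanford_headlinez_souped/bscraper.py"]


def missing_homework_files(repo_root):
    """Return the expected relative paths that are not files under repo_root."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run this from the root of your repo; "." means the current directory.
    missing = missing_homework_files(".")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout looks right.")
```

An empty list means every expected file is in place.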
Setting up with the command line
Here are the shell commands (Windows/Mac) to get the directory created and set as your working directory:
$ cd ~/Desktop/compciv-2018-SUNETID
$ mkdir -p week-05/stanford_headlinez_souped
$ cd week-05/stanford_headlinez_souped
For your convenience, the homework repo page has instructions for using curl to quickly download the skeleton and test script.
To run the test script, simply invoke pytest at the shell prompt.