Syllabus

This is meant as a course overview, without every detail of what will be covered in a given week. For the most part, the topics carry equal weight and sophistication (for example, web scraping versus natural language processing) and may be switched around. Other topics, like understanding the design of programming languages, will come up every week and every day, though some weeks will emphasize them more than others.

Week 1: Hello COMM 113/213

Monday: Hello, Journalism and Computation

For the first day, a mix of overview of concepts, course logistics, and an installation party.

Assignments

None, except to send me a response to a Google Form (about operating system, etc.)

Hands-on

Lecture

Wednesday: Hello, Computers and Programming

Hands-on agenda

  • Signing up for GitHub, installing Git
  • Setting up the Visual Studio Code editor
  • Setting up the Terminal/PowerShell

Assignments

Readings
(all of which we will be reviewing and practicing next week):

Week 2: Python programming and the command-line

Wednesday agenda: Do one thing and do it well

  • How to use your text editor and your command-line (keep it minimal!)
  • Use your Tab key, constantly
  • Why do people use the command-line?
    • curl
    • youtube-dl
    • soundscrape
  • Moving between Python scripts and IPython

Trying out homework:

Python programming fundamentals

  • Introduction to the Python syntax and language features
  • Simple data types (numbers, strings, booleans)
  • Sequences (lists, dictionaries, tuples, ranges)
  • Dot-notation and the concept of “everything is an object”
  • Functions
  • Imports and code re-use
  • Loops
  • Conditional branches
  • How to execute Python programs
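
As a rough preview of how the items above fit together, here is a minimal sketch. The headlines and the function name `count_mentions` are made up for illustration and are not taken from any assignment.

```python
# A made-up list of headlines: strings stored in a list (a sequence)
headlines = [
    "Stanford opens new library",
    "City council delays vote",
    "Stanford wins regional final",
]


def count_mentions(texts, word):
    """Return how many strings in `texts` contain `word`, ignoring case."""
    count = 0
    for text in texts:                     # a loop over a sequence
        if word.lower() in text.lower():   # a conditional branch, using dot-notation
            count += 1
    return count


mentions = count_mentions(headlines, "stanford")
print(mentions, "of", len(headlines), "headlines mention Stanford")
```

Saving this as a file (say, `mentions.py`, a hypothetical name) and running `python mentions.py` from the command-line is one way to execute it; pasting it into IPython is another.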

Week 3: Design and debugging and data

Assignments

I’m in the middle of porting/adding assignments to this repo:

https://github.com/compciv/homeworkhome

You can see a posting for the assignment:

It contains a skeleton file to build off from, as well as a test suite that you can run on your own in your system shell. This will be the workflow moving forward so you can know even before you turn anything in whether your code passes spec.

More info at the README:

https://github.com/compciv/homeworkhome/tree/master/stanford_headlinez_souped#test-suite

Readings

Lecture topics

  • APIs

  • PEP 8 - The Style Guide for Python

  • PEP 20 - The Zen of Python

  • Error handling and debugging

  • Data fundamentals (to be covered more next week)

    • What is data?
    • What is binary?
    • Why CSV/JSON/XML?
    • Deserializing and serializing data with Python
    • Reading and writing data with files
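
To make "serializing" and "deserializing" a little more concrete, here is a minimal sketch using only Python's standard `json` and `csv` modules. The records are invented for illustration, and an in-memory text buffer stands in for a real file.

```python
import csv
import io
import json

# Hypothetical records, just for illustration
records = [
    {"name": "Ada", "city": "Palo Alto"},
    {"name": "Grace", "city": "Oakland"},
]

# Serialize: Python objects -> JSON text
json_text = json.dumps(records)

# Deserialize: JSON text -> Python objects
round_trip = json.loads(json_text)

# Serialize the same records as CSV text, writing into an in-memory buffer
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["name", "city"])
writer.writeheader()
writer.writerows(records)
csv_text = buffer.getvalue()

# Deserialize: CSV text -> list of dictionaries
# (CSV has no data types, so every value comes back as a string)
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

print(json_text)
print(csv_text)
```

The same data ends up as two different text formats, and both can be turned back into Python lists and dictionaries.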

Monday lecture: Basically, catchup

Readings for next lecture

Don’t worry about fully understanding the content, just get your feet wet. Please at least do a skim – understanding the Internet and the web stack and scraping, etc. is something we have to do iteratively, even if it means slamming a bunch of knowledge early on (and reviewing it later!).

No quizzes, though you should take the Zen of Python to heart…

About the web:

Python stuff:

You don’t have to do these exercises (yet), but do the reading:

Week 4: More with APIs and building programs

For Wednesday

No homework due.

Readings

The Follower Factory: https://www.nytimes.com/interactive/2018/01/27/technology/social-media-bots.html

This story just came out this weekend in the New York Times. It’s pretty long – probably the longest mainstream media story I’ve ever read on the topic of programmatic bots – and it’s perhaps the best. Well-written, full of fantastic interactive graphics, and chock-full of details about how bot-makers try to fool people.

csv - reading and writing delimited text data: http://2017.compciv.org/guide/topics/python-standard-library/csv.html

This is a primer on using Python’s csv library, which we will be using in Wednesday’s class.

Brief history of the CSV file http://blog.sqlizer.io/posts/csv-history/

Basically what the title says – pretty short article with some optional links to follow about the plain ol CSV text format.

Creating URL query strings in Python http://www.compciv.org/guides/python/how-tos/creating-proper-url-query-strings/

Since most people have very little knowledge of what a URL actually is, I assume that even fewer know that the URL specification includes a syntax for serializing key-value pairs. It’s easier demonstrated than explained.

For Google Search, here’s a URL in which the “query” term is set to stanford: https://google.com/search?q=stanford

And here’s a URL in which Google is instructed to return only French-language results: https://www.google.com/search?q=stanford&lr=lang_fr

Can you guess the pattern/syntax/delimiters for specifying key-value pairs in the URL? We’ll be seeing a bunch of examples on Wednesday.
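
Once you've made your guess, here is a minimal sketch of building the second URL above with Python's standard `urllib.parse` module. The dictionary keys come from the example URLs; the variable names are my own.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base_url = "https://www.google.com/search"

# Key-value pairs to serialize into the query string
params = {"q": "stanford", "lr": "lang_fr"}

# urlencode() joins each key and value with "=" and each pair with "&"
url = base_url + "?" + urlencode(params)
print(url)

# Going the other way: pull the key-value pairs back out of the URL
parsed = parse_qs(urlsplit(url).query)
print(parsed)

# Characters such as spaces get escaped for you
print(urlencode({"q": "stanford university"}))
```

Letting `urlencode()` do the work means you don't have to remember how to escape spaces and other special characters by hand.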

Wednesday lecture

Walk through this exercise of reusing someone else’s Haversine code for our own purposes (ignore the rest of the earthquake stuff):

https://github.com/compciv/project-stanford-quakebot/tree/master/steps/calc_geo_distance

But we’ll start out trying to set things up via the command-line, as that will be the practice going forward (practice hitting Tab!)

https://github.com/compciv/project-stanford-quakebot/tree/master/steps/calc_geo_distance#folder-setup
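
For context on the Haversine exercise mentioned above: the Haversine formula estimates the great-circle distance between two latitude/longitude points. What follows is a generic sketch of the standard formula, not the code from the linked repo; the function name, the Earth-radius constant, and the approximate coordinates are my own assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    # The trig functions expect radians, not degrees
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate coordinates, for illustration only
stanford = (37.4275, -122.1697)
san_francisco = (37.7749, -122.4194)

distance = haversine_km(*stanford, *san_francisco)
print(round(distance, 1), "km")
```

The point of the exercise is less the math than the habit: find a well-tested function, wrap it in your own module, and import it where you need it.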

In-class exploration: The NYT’s Follower Factory, why fake followers are being bought, and how hard is it really to detect them?

https://github.com/compciv/project-compciv-twitterfakes

Coding instructions here:

https://github.com/compciv/project-twitterfakes#getting-a-project-folder-set-up-with-the-command-line

Understanding application program interfaces

  • Why programs depend on APIs
  • The purpose and motives of creating or not creating a public API
  • How to read API documentation

Week 5: The Web

  • Understanding HTTP
  • HTML and Web page design
  • Web scraping
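
As a small taste of what "web scraping" means in practice, here is a minimal sketch that pulls the links out of a snippet of HTML. It uses only the standard library's `html.parser` so that it runs without installing anything; the class in this course may well use a different library, and the sample HTML and the class name `LinkCollector` are invented for illustration.

```python
from html.parser import HTMLParser

# A made-up page; in real scraping this text would come from an HTTP response
SAMPLE_HTML = """
<html><body>
  <h1>Headlines</h1>
  <a href="/news/one">First story</a>
  <a href="/news/two">Second story</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


collector = LinkCollector()
collector.feed(SAMPLE_HTML)
print(collector.links)
```

Scraping is mostly this: fetch text over HTTP, then use the structure of the HTML to pick out the pieces you care about.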

Week 7: More Text as Data

In-class coding

An example web-scraping project/question – How many current Congressmembers attended Stanford University?

Tasks

  • How to test if a filepath exists
  • How to write text/bytes to a file
  • How to open and read text/bytes from a file
  • How to manage data from a CSV
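
The four tasks above can be sketched roughly as follows, using the standard `pathlib` and `csv` modules. The filename and the row of data are hypothetical, and everything happens in a temporary directory so nothing on your system is touched.

```python
import csv
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    data_path = Path(tmpdir) / "senators.csv"  # hypothetical filename

    # Test whether a filepath exists (it doesn't yet)
    exists_before = data_path.exists()

    # Write text to a file
    data_path.write_text("name,state\nExample Person,CA\n", encoding="utf-8")
    exists_after = data_path.exists()

    # Open and read the same file as text, then as raw bytes
    raw_text = data_path.read_text(encoding="utf-8")
    raw_bytes = data_path.read_bytes()

    # Treat the file as CSV: each row becomes a dictionary
    with data_path.open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

print(exists_before, exists_after)
print(rows)
```

Checking whether a file exists before downloading it again is a common courtesy in scraping: it saves you time and spares the server a repeat request.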

Questions to ask

  • What is the simplest computational way to figure out whether someone attended Stanford or not?
  • How about graduated? How about attended/graduated from any other institution? Or served in the armed forces?
  • Once figuring out this problem for just Senators, how hard is it to solve for Representatives? How about all Congress legislators in all of history?
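
For the first question, one candidate for the "simplest computational way" is a plain substring check. The sketch below assumes each legislator has a free-text biography; the biographies and the function name `mentions_stanford` are invented for illustration.

```python
# Hypothetical biography snippets, invented for illustration
bios = {
    "Legislator A": "Born in Ohio; graduated from Stanford University, 1975.",
    "Legislator B": "Attended public schools; B.A., Ohio State University.",
    "Legislator C": "Attended Stanford Law School but did not graduate.",
}


def mentions_stanford(text):
    """Return True if the text contains 'stanford', ignoring case."""
    return "stanford" in text.lower()


matches = [name for name, bio in bios.items() if mentions_stanford(bio)]
print(matches)
```

Note what this does and doesn't answer: it finds mentions of Stanford, which is not the same as attending or graduating, and it would also match a hometown or a person with that name. That gap is exactly what the follow-up questions are about.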

Assignments

Due Feb. 21, Wednesday, 11:59 PM:

Due Feb. 27, Tuesday, 11:59 PM:

Week 8: More Text as Data and Data Analysis

Wednesday

Readings

Quasi-homework: Create a folder in your homework folder, named solid-serialization-skills, and do the following data serialization exercises:

I won’t be testing you on correctness (some of the pages even have the answers!), but going forward, I’ll expect that you know what they cover. So I would actually try to solve them as if they were homework, and even if you start out with copy-pasting my answers, you commit to rewriting the code in your own style.

Try out visualization with Matplotlib

Monday

Understanding regular expressions

I didn’t post these on Thursday, but I had meant for them to be the readings for this week:

Readings for Wednesday:

About Text:

About Regular Expressions:

Re-read, review, and try out Peter Norvig’s spellcheck in Python: https://norvig.com/spell-correct.html

Week 9: Data Analysis with Pandas (continued)

Wednesday

Going to spend time going back to more advanced web scraping and understanding POST requests and forms.

Answers for the txdeathrow_scraper assignment

Monday

Readings and in-class

We’ll use Ben Welsh’s/California Civic Data Coalition’s “First Python Notebook” lesson. We can skip a few chapters (mostly about the setup process – we don’t need to use virtualenv), but try to get through all of these: