One of the most interesting and important things we can do as programmers is manipulate and analyze real-world data. Python is a particularly good language for data processing because of the powerful open-source data manipulation libraries available for it.
Installing such libraries is beyond the scope of this course (see CS61A and/or DATA C8 instead), so we'll be working mostly with libraries that are built into Python. In this lab, we'll work with text data.
To get started on this lab:
Unzip the starter file, lab14starter.zip, with the unzip command. Remember: if you're working on a Windows machine, use dir instead of ls.
$ unzip lab14starter.zip
$ ls
lab14starter.zip lab14starter
$ cd lab14starter
$ ls
beatles.txt nietzsche.txt
democratic_debate_2015.txt presedential_debate_2016.txt
ee_cummings.txt republican_debate_2015.txt
gettysburg.txt savio.txt
horse_ebooks.txt state_of_the_union_2015.txt
i_have_a_dream.txt word_analyzer.py
jay_z_lyrics.txt
If you need help with this process, make sure to ask your lab TA.
The following functions are required for this lab:
pig_latin(word)
izzle(word)
apply_language_game(text, language_game)
count_words(text)
top_n_words(counts, n)
print_top_n_words(counts, n)
average_word_length(counts)
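To give a feel for how the word-counting functions might fit together, here is a minimal sketch that uses only built-in Python. It is not the required solution. It assumes that text is a plain string, that counts is a dictionary mapping each word to the number of times it appears, that words are separated by whitespace and compared in lowercase, and that the average is weighted by how often each word occurs. The lab's actual specification may differ on any of these points (for example, on how punctuation is handled).

```python
def count_words(text):
    """Return a dict mapping each lowercase word in text to its count.

    Assumption: words are whatever str.split() yields, so punctuation
    attached to a word is kept as part of that word.
    """
    counts = {}
    for word in text.lower().split():
        counts[word] = counts.get(word, 0) + 1
    return counts


def top_n_words(counts, n):
    """Return the n (word, count) pairs with the highest counts.

    Assumption: ties are broken alphabetically.
    """
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return ranked[:n]


def average_word_length(counts):
    """Return the mean word length, weighting each word by its count."""
    total_words = sum(counts.values())
    if total_words == 0:
        return 0.0  # Avoid dividing by zero on empty input.
    total_letters = sum(len(word) * count for word, count in counts.items())
    return total_letters / total_words


# A made-up sample sentence, used only for illustration.
sample = "the cat and the hat and the bat"
counts = count_words(sample)
print(top_n_words(counts, 2))        # [('the', 3), ('and', 2)]
print(average_word_length(counts))   # 3.0
```

The same calls should work on the contents of any of the starter text files once the file has been read into a string, for example with open("gettysburg.txt").read().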