Word Length and Diversity

The average word length of a text is sometimes used as a rough proxy for the literary sophistication of its author. Fill in average_word_length, which takes a counts dictionary (mapping each word to the number of times it appears) and returns the average length of the words in that text.


def average_word_length(counts):
    """Returns the average word length of a text with the given counts.

    For example, the following call returns 5.0:

    >>> average_word_length({'and': 5, 'on': 1, 'Vegetables': 5, 'Budget': 1, 'to': 1, 'Fruit': 1, 'a': 2, 'Clean': 1, 'Fruits': 1, 'Store': 1, 'at': 1})
    5.0

    This is because:
    Total letters: 3*5 + 2*1 + 10*5 + 6*1 + 2*1 + 5*1 + 1*2 + 5*1 + 6*1 + 5*1 + 2*1 = 100
    Total words: 5 + 1 + 5 + 1 + 1 + 1 + 2 + 1 + 1 + 1 + 1 = 20
    Average: 100 / 20 = 5.0
    """

Hint: To get the length of a string, use the len function.
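One possible way to fill in the function is sketched below. It is not necessarily the lab's intended solution: it weights each word's length by the number of times that word appears, then divides by the total number of words. It assumes `counts` is non-empty (an empty dictionary would raise ZeroDivisionError). Note also that len counts every character in a key, so punctuation attached to a word would be included in its length.

```python
def average_word_length(counts):
    """Average word length for a dict mapping each word to its count.

    Assumes counts is non-empty and every count is positive.
    """
    # Each distinct word contributes len(word) characters once per occurrence.
    total_letters = sum(len(word) * n for word, n in counts.items())
    # The total number of words is the sum of all occurrence counts.
    total_words = sum(counts.values())
    # True division (/) yields a float such as 5.0.
    return total_letters / total_words


counts = {'and': 5, 'on': 1, 'Vegetables': 5, 'Budget': 1, 'to': 1, 'Fruit': 1,
          'a': 2, 'Clean': 1, 'Fruits': 1, 'Store': 1, 'at': 1}
print(average_word_length(counts))  # 5.0
```

Because `/` is true division in Python 3, the result is a float even when the average is a whole number. This matches the 5.0 in the docstring example.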

Make sure to test your code with the autograder by running the following line.

$ python -m doctest word_analyzer.py

When you've completed this task, you will have completed all of the required portions of this lab. For fun, use print statements to compare the average word length of texts by Nietzsche, MLK, Jay-Z, candidates in recent political debates, and so on!
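To try this on your own texts, you first need a counts dictionary for each one. The sketch below relies on a hypothetical count_words helper built on collections.Counter. That helper splits on whitespace only, so it is case-sensitive and leaves punctuation attached to words. word_analyzer.py may build its counts differently. The samples dictionary is a placeholder: replace its contents with whatever texts you want to compare.

```python
from collections import Counter


def count_words(text):
    # Hypothetical helper: naive whitespace split, case-sensitive, punctuation kept.
    return Counter(text.split())


def average_word_length(counts):
    # Total characters across all occurrences divided by total word count.
    total_letters = sum(len(word) * n for word, n in counts.items())
    return total_letters / sum(counts.values())


# Placeholder texts; paste in a speech, lyric, or debate transcript of your choice.
samples = {
    "example": "Fruits and Vegetables and Vegetables on a Budget and "
               "Vegetables at a Store and Vegetables to Clean Fruit and Vegetables",
}

for name, text in samples.items():
    print(name, average_word_length(count_words(text)))
```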

Measuring Word Diversity (optional)

Another interesting measure of an author or speaker is the diversity of their words. One crude measure is the diversity score, defined as the number of distinct words in a text divided by the total number of words. For example, "Fruits and Vegetables and Vegetables on a Budget and Vegetables at a Store and Vegetables to Clean Fruit and Vegetables" is not a very diverse sentence: it has 20 words in total but only 11 distinct words, giving a diversity score of 11/20 = 0.55.
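The arithmetic in the example can be checked directly on the raw sentence. This sketch splits on whitespace, which assumes that words are separated by spaces and that comparison is case-sensitive. Under that assumption, "Fruit" and "Fruits" count as two distinct words.

```python
sentence = ("Fruits and Vegetables and Vegetables on a Budget and Vegetables "
            "at a Store and Vegetables to Clean Fruit and Vegetables")

words = sentence.split()            # list of all 20 words, in order
distinct = set(words)               # duplicates removed: 11 distinct words
score = len(distinct) / len(words)  # distinct words / total words

print(len(words), len(distinct), score)  # 20 11 0.55
```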

Fill in the function word_diversity(counts), which returns the diversity score of a text with the given counts. Test the diversity of the given input texts.

Hint: To find the number of distinct items in a list, use the set function to convert the list into a set data structure. For example, set(["dog", "cat", "fish", "dog", "dog", "fish"]) produces a set containing the three distinct strings "dog", "cat", and "fish" (sets are unordered, so Python may display them in any order).
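A minimal sketch of word_diversity is given below. It assumes `counts` maps each word to a positive count and is non-empty. The keys of a dictionary are already distinct, so in this sketch len(counts) gives the number of distinct words directly. The set hint is what you would reach for when starting from a plain list of words instead. Both are shown, and the lab's intended approach may differ.

```python
def word_diversity(counts):
    """Distinct words divided by total words, for a dict mapping word -> count.

    Assumes counts is non-empty and every count is positive.
    """
    # Dict keys are unique, so the number of keys is the number of distinct words.
    return len(counts) / sum(counts.values())


counts = {'and': 5, 'on': 1, 'Vegetables': 5, 'Budget': 1, 'to': 1, 'Fruit': 1,
          'a': 2, 'Clean': 1, 'Fruits': 1, 'Store': 1, 'at': 1}
print(word_diversity(counts))  # 0.55

# The set() hint, applied to a list rather than a dict:
animals = ["dog", "cat", "fish", "dog", "dog", "fish"]
print(len(set(animals)))  # 3
```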

You can check your work using the autograder:

$ python -m doctest word_analyzer.py