Building a Markov Model

Your ultimate goal is to write a function build_markov_model(text, k) that generates a dictionary representation of a Markov Chain of order k for the given text. For example, build_markov_model("the ether") will return {' et': {'h': 1}, 'the': {' ': 1, 'r': 1}, 'e e': {'t': 1}, 'he ': {'e': 1}, 'eth': {'e': 1}}. Note that this is just a dictionary representation of our table from the previous page:

                frequency of           probability that
                 next char             
kgram   freq         e   h   r   t   
-------------------------------------
 the      2      1   0   0   1   0    
 he       1      0   1   0   0   0    
 e e      1      0   0   0   0   1   
  et      1      0   0   1   0   0   
 eth      1      0   1   0   0   0   
 her      1      0   0   0   0   1   
 ert      1      0   0   1   0   0   
 rth      1      0   1   0   0   0

To reach this goal, let's take things one step at a time. Let's start by filling out the get_kgram function. Test it to your satisfaction with the autograder before proceeding.

def get_kgram(text, ptr, k):
    """Returns the kgram starting at position ptr in a text. For example:

    get_kgram("hello", 0, 3) would return "hel"
    get_kgram("hello", 1, 3) would return "ell"
    get_kgram("hello", 1, 4) would return "ello"
    get_kgram("hello", 3, 4) would crash since 3 + 4 is longer than the string."""

$ python -m doctest word_analyzer.py

Adding a Single Character to our Model

Next we'll create a function that adds single characters to our model. Fill in the following function:

def process_character(m, text, ptr, k):
    """Adds information about the given character to the model

    m is the model (a dictionary)
    text is the text
    text[ptr] is the character to be processed
    k is the order of our Markov chain"""

For example, consider what happens if we want to build a markov model of "the ether". Suppose our markov model is stored in a variable called markovmodel and we have k = 4. After we've processed the first three characters, markovmodel = {'the ': {'e': 1}, 'he e': {'t': 1}, 'e et': {'h': 1}}. To process the next character, we'd call process_character(markovmodel, "the ether", 3, 4). Make sure you understand this before proceeding!

Write the process_character function. Test it with the following code and make sure your output matches the output given in the example in the paragraph above.

markovmodel = {}
process_character(markovmodel, "the ether", 0, 4)
process_character(markovmodel, "the ether", 1, 4)
process_character(markovmodel, "the ether", 2, 4)

You can test your code with the autograder as well by running:

$ python -m doctest word_analyzer.py

Building our Markov Model

Now we have all the functions we need to build a markov model from a corpus of text! Fill in build_markov_model:

def build_markov_model(text, k):
    """ Returns the markov model for text.
    m is the model (a dictionary)
    text is the text
    k is the order of our Markov chain"""

This function should loop through text, process each character and add it to the model accordingly. For example calling build_markov_model('the ether', 3) would return {' et': {'h': 1}, 'the': {' ': 1, 'r': 1}, 'e e': {'t': 1}, 'he ': {'e': 1}, 'eth': {'e': 1}}.

Hint: Use the process_character function in your implementation of build_markov_model.

You can check your work with the autograder:

$ python -m doctest word_analyzer.py