Your ultimate goal is to write a function build_markov_model(text, k) that generates a dictionary representation of a Markov Chain of order k for the given text. For example, build_markov_model("the ether")
will return {' et': {'h': 1}, 'the': {' ': 1, 'r': 1}, 'e e': {'t': 1}, 'he ': {'e': 1}, 'eth': {'e': 1}}
. Note that this is just a dictionary representation of our table from the previous page:
frequency of probability that next char kgram freq e h r t ------------------------------------- the 2 1 0 0 1 0 he 1 0 1 0 0 0 e e 1 0 0 0 0 1 et 1 0 0 1 0 0 eth 1 0 1 0 0 0 her 1 0 0 0 0 1 ert 1 0 0 1 0 0 rth 1 0 1 0 0 0
To reach this goal, let's take things one step at a time. Let's start by filling out the get_kgram function. Test it to your satisfaction with the autograder before proceeding.
def get_kgram(text, ptr, k):
"""Returns the kgram starting at position ptr in a text. For example:
get_kgram("hello", 0, 3) would return "hel"
get_kgram("hello", 1, 3) would return "ell"
get_kgram("hello", 1, 4) would return "ello"
get_kgram("hello", 3, 4) would crash since 3 + 4 is longer than the string."""
$ python -m doctest word_analyzer.py
Next we'll create a function that adds single characters to our model. Fill in the following function:
def process_character(m, text, ptr, k):
"""Adds information about the given character to the model
m is the model (a dictionary)
text is the text
text[ptr] is the character to be processed
k is the order of our Markov chain"""
For example, consider what happens if we want to build a markov model of "the ether". Suppose our markov model is stored in a variable called markovmodel
and we have k = 4. After we've processed the first three characters, markovmodel = {'the ': {'e': 1}, 'he e': {'t': 1}, 'e et': {'h': 1}}
. To process the next character, we'd call process_character(markovmodel, "the ether", 3, 4)
. Make sure you understand this before proceeding!
Write the process_character
function. Test it with the following code and make sure your output matches the output given in the example in the paragraph above.
markovmodel = {}
process_character(markovmodel, "the ether", 0, 4)
process_character(markovmodel, "the ether", 1, 4)
process_character(markovmodel, "the ether", 2, 4)
You can test your code with the autograder as well by running:
$ python -m doctest word_analyzer.py
Now we have all the functions we need to build a markov model from a corpus of text! Fill in build_markov_model:
def build_markov_model(text, k):
""" Returns the markov model for text.
m is the model (a dictionary)
text is the text
k is the order of our Markov chain"""
This function should loop through text, process each character and add it to the model accordingly. For example calling build_markov_model('the ether', 3)
would return {' et': {'h': 1}, 'the': {' ': 1, 'r': 1}, 'e e': {'t': 1}, 'he ': {'e': 1}, 'eth': {'e': 1}}
.
Hint: Use the process_character function in your implementation of build_markov_model.
You can check your work with the autograder:
$ python -m doctest word_analyzer.py