## 5 Python dictionaries

Author: Lenard Carr
DRAFT

5

© June 28, 2002 Ron Zacharski

Python dictionaries

Your programming skills are about to take a quantum leap forward. This tutorial covers Python dictionaries, a very versatile way of structuring your data. You'll find them useful for a variety of natural language applications including parsers and semantic analyzers.

### 5.1 A 'not so wonderful' analogy

I've been trying to come up with a useful analogy to explain what's so special about Python dictionaries. The best I could come up with (and it's not a wonderful analogy) is the following. Suppose you are not the best-organized person in the world. When someone gives you their phone number, you write that person's name and number on an index card and throw it in your 'address' box. When you want to look up a person's number, you need to examine each card in the box until you find the relevant information.

Now let's suppose a different scenario. Suppose you have a personal assistant with a photographic memory. You can just give your assistant the name of a person and he instantly responds with the telephone number. That is, instead of spending time searching through your box, you give your assistant what we call a key (a person's name) and he responds with the value for this key (the phone number).

A Python dictionary consists of a set of key-value pairs. Let's look at an example:

```python
>>> a = {'Ann': '592-6372', 'Ben': '282-8992', 'Flora': '927-9021', 'Isaac': '423-3790'}
```

Here we've defined a collection of key-value pairs, for example 'Ann': '592-6372'. The two members of each pair are separated by a colon, and each pair is separated from the next by a comma. To look up a value, we type the name of the dictionary followed by the key in square brackets:


```python
>>> a['Ben']
'282-8992'
>>> a['Flora']
'927-9021'
```

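and to add an entry to the dictionary we type:

```python
>>> a['Hamid'] = '991-1911'
```

We can view the entire dictionary by typing its name:

```python
>>> a
{'Isaac': '423-3790', 'Hamid': '991-1911', 'Ann': '592-6372', 'Flora': '927-9021', 'Ben': '282-8992'}
>>>
```

The order in which the pairs are displayed depends on your version of Python. Python 3.7 and later keep entries in the order they were added, so there the new entry for Hamid appears last.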
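The analogy from section 5.1 can be made concrete. The sketch below (Python 3) stores the same phone numbers in two ways. First, it stores them as a list of (name, number) pairs. This is the 'index card box', where finding a number means checking each card in turn. Second, it stores them as a dictionary, where the key leads straight to the value. The helper name find_in_box is hypothetical and is used only for this illustration.

```python
# The "index card box": a list of (name, number) pairs.
cards = [('Ann', '592-6372'), ('Ben', '282-8992'),
         ('Flora', '927-9021'), ('Isaac', '423-3790')]

def find_in_box(cards, name):
    """Examine each card in turn until the name matches (hypothetical helper)."""
    for card_name, number in cards:
        if card_name == name:
            return number
    return None  # no card for this name

# The "personal assistant": a dictionary mapping each name (key) to a number (value).
a = {'Ann': '592-6372', 'Ben': '282-8992', 'Flora': '927-9021', 'Isaac': '423-3790'}

print(find_in_box(cards, 'Flora'))  # 927-9021, found after checking three cards
print(a['Flora'])                   # 927-9021, looked up directly by its key
```

Python dictionaries are implemented as hash tables. As a result, a lookup takes roughly the same time on average however many entries there are. The card search, by contrast, gets slower as the box fills up.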

One way of viewing these dictionaries is as a table:

| Key | Value |
|---|---|
| Isaac | 423-3790 |
| Hamid | 991-1911 |
| Ann | 592-6372 |
| Flora | 927-9021 |
| Ben | 282-8992 |
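The table view maps directly onto the dictionary's items() method, which returns every (key, value) pair. The sketch below (Python 3) prints the phone dictionary as a two-column table. Wrapping the pairs in sorted() is an added choice, not something the text requires. It orders the rows alphabetically by key so the output is the same on every Python version.

```python
a = {'Ann': '592-6372', 'Ben': '282-8992', 'Flora': '927-9021',
     'Isaac': '423-3790', 'Hamid': '991-1911'}

# items() returns every (key, value) pair: one table row per pair.
# sorted() orders the rows alphabetically by key.
rows = sorted(a.items())

print('Key'.ljust(8) + 'Value')
for key, value in rows:
    print(key.ljust(8) + value)
```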

Here's another example:

| Key | Value |
|---|---|
| Baker | English Syntax |
| Spencer | Morphological Theory |
| Radford | Syntactic Theory |

Here's the associated Python code for the above table:

```python
>>> books = {'Baker': 'English Syntax', 'Spencer': 'Morphological Theory', 'Radford': 'Syntactic Theory'}
>>> books['Baker']
'English Syntax'
>>> books['Radford']
'Syntactic Theory'
```

I can add an entry to the dictionary as follows:

```python
>>> books['Jackendorf'] = 'Semantic Theory'
```

And then check the result:


```python
>>> books['Jackendorf']
'Semantic Theory'
>>>
```
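A dictionary can also be built up one entry at a time, starting from an empty dictionary, {}. The sketch below (Python 3) does this. It also shows what happens when you assign to a key that is already present. Keys are unique, so the old value is replaced rather than a second entry being added. The value 'New Title' is a placeholder invented for this example.

```python
books = {}                               # start with an empty dictionary
books['Baker'] = 'English Syntax'        # each assignment adds a key-value pair
books['Radford'] = 'Syntactic Theory'
print(len(books))                        # 2

# Keys are unique: assigning to an existing key replaces its value
# instead of adding a second 'Baker' entry ('New Title' is a placeholder).
books['Baker'] = 'New Title'
print(books['Baker'])                    # New Title
print(len(books))                        # still 2
```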

Let's look at another example. Suppose that each of the following words is unambiguous, that is, it has only one part of speech:

| Word | Part of speech |
|---|---|
| dog | noun |
| cat | noun |
| saw | verb |
| a | det (determiner) |
| the | det |
| happy | adj (adjective) |
| lazy | adj |

Let's represent the relationship between a word and its part of speech by using a Python dictionary:

```python
>>> pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det', 'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
>>>
```

What's the part of speech of happy?

```python
>>> pos['happy']
'adj'
>>>
```

Let's say I want to write a function, tag, that does the following:

```python
>>> tag('the happy cat')
' det adj noun'
>>> tag('the happy cat saw the lazy dog')
' det adj noun verb det adj noun'
>>>
```

That is, it takes a string of words as input and returns a string of part-of-speech tags. If you think you can write this on your own, give it a try before reading on. Otherwise, continue.

Let's rough out our function:

```python
def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    # initialize return string
    # divide string into words
    # for each word
        # lookup part of speech and add it to pos_tags
    # return pos_tags
```

So, the input to this function is called sentence.[1] My plan is to have the function return a string called pos_tags (for 'part of speech tags') that contains the part-of-speech tag for each word in the sentence. The string pos_tags will start out empty ('') and we will add each part-of-speech tag to its end. Let's convert each of the English-like comments in the function to Python code.

```python
def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    # initialize return string
    # divide sentence into words
    # for each word
        # lookup part of speech and add it to pos_tags
    # return pos_tags
```

The first thing we want to do is create a Python dictionary; let's call it pos (for 'part of speech'):

```python
def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det',
           'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    # divide sentence into words
    # for each word
        # lookup part of speech and add it to pos_tags
    # return pos_tags
```

[1] Remember that sentence is just a name I invented; it has no special significance. You can replace every occurrence of sentence with a name of your own and the example will work equally well.


The next step is to initialize the return string:

```python
def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det',
           'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    # divide sentence into words
    # for each word
        # lookup part of speech and add it to pos_tags
    # return pos_tags
```

We want the return string to start out empty (''):

```python
def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det',
           'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    pos_tags = ''
    # divide sentence into words
    # for each word
        # lookup part of speech and add it to pos_tags
    # return pos_tags
```

Next, we want to divide the sentence into words:

```python
def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det',
           'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    pos_tags = ''
    # divide sentence into words
    # for each word
        # lookup part of speech and add it to pos_tags
    # return pos_tags
```


Recall from chapter 4 that we can split a string into words by using the split method:

```python
def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det',
           'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    pos_tags = ''
    # divide sentence into words
    words = sentence.split()
    # for each word
        # lookup part of speech and add it to pos_tags
    # return pos_tags
```
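As a quick reminder of what split hands the loop, the sketch below (Python 3) calls it on a sample sentence. With no argument, split breaks the string at runs of whitespace and drops empty strings. That means an accidental double space does not produce an empty 'word' that would then fail the dictionary lookup.

```python
sentence = 'the  happy cat'   # note the double space between 'the' and 'happy'
words = sentence.split()      # split on any run of whitespace
print(words)                  # ['the', 'happy', 'cat']
```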

Now we are at the for loop:

```python
# for each word
    # lookup part of speech and add it to pos_tags
```

which we can translate as follows:

```python
# for each word
for word in words:
    # lookup part of speech and add it to pos_tags
    pos_tags = pos_tags + ' ' + pos[word]
```

The line pos_tags = pos_tags + ' ' + pos[word] adds a space and the part of speech of the current word to the end of pos_tags. Finally, we need to return pos_tags, and our function is finished:

```python
def tag(sentence):
    """look up each word in the sentence and return its part of speech"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det',
           'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    pos_tags = ''
    # divide sentence into words
    words = sentence.split()
    # for each word
    for word in words:
        # lookup part of speech and add it to pos_tags
        pos_tags = pos_tags + ' ' + pos[word]
    # return pos_tags
    return pos_tags
```
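To see how pos_tags grows on each pass through the loop, the sketch below (Python 3) uses the same logic as tag. It adds one print call inside the loop. The name tag_traced and the tracing line are hypothetical additions for this illustration only.

```python
def tag_traced(sentence):
    """Same steps as tag, plus a print showing pos_tags after each word."""
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det',
           'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    pos_tags = ''
    for word in sentence.split():
        pos_tags = pos_tags + ' ' + pos[word]
        # hypothetical tracing line: show the word and the string so far
        print(repr(word), '->', repr(pos_tags))
    return pos_tags

result = tag_traced('the happy cat')
# 'the' -> ' det'
# 'happy' -> ' det adj'
# 'cat' -> ' det adj noun'
```

The trace also explains the leading space in the output. The first pass adds ' ' + 'det' to the empty string. If that space is unwanted, calling strip() on the result removes it. Another option is to collect the tags in a list and join them with ' '.join(...).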

Exercise 1: A modification to the tagger:


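A common format for tagging text surrounds each word with an opening and a closing tag named after its part of speech:

raw text: `the happy cat`

tagged text: `<det>the</det> <adj>happy</adj> <noun>cat</noun>`

Can you modify the tag function above, calling the new version tagger, so that it produces similar output? That is,

```python
>>> tagger('the happy dog')
' <det>the</det> <adj>happy</adj> <noun>dog</noun>'
>>>
```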

(see solutions at the end of this tutorial)

### 5.2 Entries not in the dictionary

Let's say we wrote the function described in exercise 1 and it's looking pretty good:

```python
>>> tagger('the lazy cat')
' <det>the</det> <adj>lazy</adj> <noun>cat</noun>'
>>>
```

What happens if we try a word that is not in our part-of-speech dictionary?

```python
>>> tagger('the happy poodle')
Traceback (most recent call last):
  File "", line 1, in ?
  File "C:\Python23\tagger.py", line 17, in tagger
    opening_tag = '<' + pos[word] + '>'
KeyError: 'poodle'
>>>
```

Hmmm. This does not look so wonderful. Let's look at another example of the problem:

```python
>>> pos = {'dog': 'noun', 'cat': 'noun'}
>>> pos['dog']
'noun'
>>> pos['poodle']
Traceback (most recent call last):
  File "", line 1, in ?
KeyError: 'poodle'
>>>
```

Remember, we described dictionaries as containing key-value pairs. This error occurs when the key is not present in the dictionary, so the error above occurred because poodle is not a key in the pos dictionary. This makes our tagger function not so useful: it will crash (well, stop with an error) whenever it encounters an unknown word. However, there is an easy way to fix this problem. Just as with lists, we can use the in operator to check whether a dictionary has a specific key:

```python
>>> pos = {'dog': 'noun', 'cat': 'noun'}
>>> 'dog' in pos
True
>>> 'poodle' in pos
False
>>> 'cat' in pos
True
```
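One detail is worth checking: for a dictionary, in looks only at the keys, not the values. The sketch below (Python 3) shows this with the small pos dictionary. If you really want to search the values, you have to ask for them explicitly with the values() method.

```python
pos = {'dog': 'noun', 'cat': 'noun'}

print('dog' in pos)            # True: 'dog' is a key
print('noun' in pos)           # False: 'noun' is only a value
print('noun' in pos.values())  # True: values() searches the values instead
```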

As you can see, the in operator returns True or False depending on whether or not the dictionary contains a specific key. For our tagger program, we might want to do the following for each word:

```
if the pos dictionary has the word as a key:
    look up the pos of the word and create a tag
else:
    use the generic tag 'word'
```

So instead of

```python
>>> tagger('the happy poodle')
Traceback (most recent call last):
  File "", line 1, in ?
  File "C:\Python23\tagger.py", line 17, in tagger
    opening_tag = '<' + pos[word] + '>'
KeyError: 'poodle'
>>>
```

we would get

```python
>>> tagger('the happy poodle')
' <det>the</det> <adj>happy</adj> <word>poodle</word>'
```

Adding this English description of what we want to do to our tagger function gives:

```python
def tagger(sentence):
    """look up each word in the sentence and tag it"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det',
           'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    tagged_sentence = ''
    words = sentence.split()
    for word in words:
        #
        # if the pos dictionary has the word as a key:
        #     then set the_pos to the pos of the word
        # else:
        #     set the_pos to the generic 'word'
        #
        opening_tag = '<' + the_pos + '>'
        closing_tag = '</' + the_pos + '>'
        item = opening_tag + word + closing_tag
        tagged_sentence = tagged_sentence + ' ' + item
    return tagged_sentence
```

Let's look at the commented block line by line. We can translate

```python
# if the pos dictionary has the word as a key:
```

to Python:

```python
if word in pos:
```

resulting in

```python
# if the pos dictionary has the word as a key:
if word in pos:
    # then set the_pos to the pos of the word
# else:
    # set the_pos to the generic 'word'
```

Now we convert the next comment line to Python:

```python
# if the pos dictionary has the word as a key:
if word in pos:
    # then set the_pos to the pos of the word
    the_pos = pos[word]
# else:
    # set the_pos to the generic 'word'
```

Finally, we convert the last two lines:

```python
# if the pos dictionary has the word as a key:
if word in pos:
    # then set the_pos to the pos of the word
    the_pos = pos[word]
# else:
else:
    # set the_pos to the generic 'word'
    the_pos = 'word'
```

The final version of our tagger function (minus the comments) is as follows:

```python
def tagger(sentence):
    """look up each word in the sentence and tag it"""
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det',
           'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    tagged_sentence = ''
    words = sentence.split()
    for word in words:
        if word in pos:
            the_pos = pos[word]
        else:
            the_pos = 'word'
        opening_tag = '<' + the_pos + '>'
        closing_tag = '</' + the_pos + '>'
        item = opening_tag + word + closing_tag
        tagged_sentence = tagged_sentence + ' ' + item
    return tagged_sentence
```
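Python dictionaries also have a get method that combines the in test and the lookup. pos.get(word, 'word') returns pos[word] when word is a key, and the default 'word' otherwise. The sketch below (Python 3) rewrites the tagger that way. It assumes tags of the form `<det>the</det>`. The function name tagger_with_get is hypothetical, and the if/else version in the text works just as well.

```python
def tagger_with_get(sentence):
    """look up each word in the sentence and tag it (sketch using dict.get)"""
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det',
           'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    tagged_sentence = ''
    for word in sentence.split():
        # get returns pos[word] if word is a key, otherwise the default 'word'
        the_pos = pos.get(word, 'word')
        item = '<' + the_pos + '>' + word + '</' + the_pos + '>'
        tagged_sentence = tagged_sentence + ' ' + item
    return tagged_sentence

print(tagger_with_get('the happy poodle'))
```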

Exercise 2: A simple word-for-word machine translation system.

Suppose I have a file containing a simple Spanish-English dictionary:

```
amigos friends
sapo frog
sepo toad
y and
es is
son are
```

How would I create a function that reads this Spanish-English dictionary into a Python dictionary? We already know every component of this task. We know how to read a file into a list of lines, how to split a line into a list of words, how to refer to items in a list, and how to add entries to a Python dictionary. So in pseudocode our function would be:

```python
def read_dictionary(spanish_dictionary):
    """Read file and add entries to dictionary"""
    # initialize Python dictionary
    # read in Spanish-English dictionary file
    # for each line in file
        # split line
        # add entry to Python dictionary
    # return Python dictionary
```

Finish writing this function. If you are stuck you'll find a solution at the end of this tutorial.

Okay. Now that you have written that function, can you write a function, translate, that takes one argument, a Spanish string, and translates it into English? If a word in the Spanish text is not in the dictionary, the Spanish word should appear unchanged in the translation:

```python
>>> translate('sapo y sepo son amigos')
' frog and toad are friends'
>>> translate('sapo y elefante son amigos')
' frog and elefante are friends'
>>>
```

Inside your function you will use your read_dictionary function to load the dictionary.


Solutions:

Exercise 1:

```python
def tagger(sentence):
    """look up each word in the sentence and tag it"""
    # create dictionary
    pos = {'dog': 'noun', 'cat': 'noun', 'saw': 'verb', 'a': 'det',
           'the': 'det', 'happy': 'adj', 'lazy': 'adj'}
    # initialize return string
    tagged_sentence = ''
    # divide sentence into words
    words = sentence.split()
    # for each word
    for word in words:
        # look up the part of speech and add the tagged word to tagged_sentence
        #
        # this part constructs the item
        #
        opening_tag = '<' + pos[word] + '>'
        closing_tag = '</' + pos[word] + '>'
        item = opening_tag + word + closing_tag
        tagged_sentence = tagged_sentence + ' ' + item
    # return tagged sentence
    return tagged_sentence
```

Exercise 2:

```python
def read_dictionary(spanish_dictionary):
    """Read file and add entries to dictionary"""
    # initialize Python dictionary
    table = {}
    # read in Spanish-English dictionary file
    infile = open(spanish_dictionary)
    lines = list(infile)
    infile.close()
    # for each line in file
    for line in lines:
        # split line
        entry = line.split()
        # add entry to Python dictionary, skipping blank lines
        # (a blank line has no entry[0], which would raise an IndexError)
        if len(entry) >= 2:
            table[entry[0]] = entry[1]
    # return Python dictionary
    return table
```
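To try the solution without creating spanish.txt by hand, the sketch below (Python 3) does three things. It writes the sample entries to a temporary file with the standard tempfile module, reads them back, and deletes the file. Two changes from the solution above are choices made for this sketch. First, read_dictionary is repeated so the block runs on its own. Second, it uses with open(...), which closes the file automatically.

```python
import os
import tempfile

def read_dictionary(spanish_dictionary):
    """Read file and add entries to dictionary (same logic as the solution)."""
    table = {}
    with open(spanish_dictionary) as infile:   # file is closed automatically
        for line in infile:
            entry = line.split()
            if len(entry) >= 2:                # skip blank lines
                table[entry[0]] = entry[1]
    return table

# Write the sample Spanish-English entries to a temporary file.
sample = 'amigos friends\nsapo frog\nsepo toad\ny and\nes is\nson are\n'
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as f:
    f.write(sample)
    path = f.name

table = read_dictionary(path)
os.remove(path)          # clean up the temporary file
print(table['sapo'])     # frog
```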

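The translation function:

```python
def translate(sentence):
    """translate a Spanish sentence word by word, keeping unknown words"""
    # load the dictionary file (adjust the path to where you saved it)
    table = read_dictionary('\\python23\\spanish.txt')
    words = sentence.split()
    result = ''
    for word in words:
        if word in table:
            result = result + ' ' + table[word]
        else:
            result = result + ' ' + word
    return result
```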
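As written, translate reads the dictionary file every time it is called. One possible alternative design, sketched below in Python 3, loads the table once and passes it in as an argument. The name translate_with is hypothetical, and the table is written inline so the block runs without spanish.txt. table.get(word, word) returns the English word when word is a key, and the Spanish word itself otherwise.

```python
def translate_with(table, sentence):
    """Translate word by word; words missing from table are kept as-is."""
    result = ''
    for word in sentence.split():
        # the English word if word is a key, otherwise the word unchanged
        result = result + ' ' + table.get(word, word)
    return result

table = {'amigos': 'friends', 'sapo': 'frog', 'sepo': 'toad',
         'y': 'and', 'es': 'is', 'son': 'are'}
print(translate_with(table, 'sapo y sepo son amigos'))
print(translate_with(table, 'sapo y elefante son amigos'))
```

With this design, a program that translates many sentences would call read_dictionary once and reuse the resulting table.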
