XClose

COMP0233: Research Software Engineering With Python

Home
Menu

Dictionaries and Sets

The Python Dictionary

Python supports a container type called a dictionary.

This is also known as an "associative array", "map" or "hash" in other languages.

In a list, we use a number to look up an element:

In [1]:
names = "Martin Luther King".split(" ")
names[1]
Out[1]:
'Luther'

In a dictionary, we look up an element using another object of our choice:

In [2]:
chapman = {"name": "Graham", "age": 48, 
           "Jobs": ["Comedian", "Writer"] }
In [3]:
chapman
Out[3]:
{'name': 'Graham', 'age': 48, 'Jobs': ['Comedian', 'Writer']}
In [4]:
print(chapman["Jobs"])
['Comedian', 'Writer']
In [5]:
print(chapman["age"])
48
In [6]:
print(type(chapman))
<class 'dict'>

Keys and Values

The things we can use to look up with are called keys:

In [7]:
chapman.keys()
Out[7]:
dict_keys(['name', 'age', 'Jobs'])

The things we can look up are called values:

In [8]:
chapman.values()
Out[8]:
dict_values(['Graham', 48, ['Comedian', 'Writer']])

When we test for containment on a dict we test on the keys:

In [9]:
"Jobs" in chapman
Out[9]:
True
In [10]:
"Graham" in chapman
Out[10]:
False
In [11]:
"Graham" in chapman.values()
Out[11]:
True

Immutable Keys Only

The way in which dictionaries work is one of the coolest things in computer science: the "hash table". This is way beyond the scope of this course, but it has a consequence:

You can only use immutable things as keys.

In [12]:
good_match = {
    ("Lamb", "Mint"): True, 
    ("Bacon", "Chocolate"): False
   }

but:

In [13]:
illegal = {
    ["Lamb", "Mint"]: True, 
    ["Bacon", "Chocolate"]: False
   }
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[13], line 1
----> 1 illegal = {
      2     ["Lamb", "Mint"]: True, 
      3     ["Bacon", "Chocolate"]: False
      4    }

TypeError: unhashable type: 'list'

Supplementary material: You can start to learn about the 'hash table'. Though this video is very advanced, it's really interesting!

No guarantee of order

Another consequence of the way dictionaries work is that there's no guaranteed order among the elements:

In [14]:
my_dict = {"0": 0, "1": 1, "2": 2, "3": 3, "4": 4}
print(my_dict)
print(my_dict.values())
{'0': 0, '1': 1, '2': 2, '3': 3, '4': 4}
dict_values([0, 1, 2, 3, 4])

Safe Lookup

Some times you want a program to keep working even when a key is looked up but it's not there. Python dictionaries offers that through the get method.

In [15]:
x = {"a": 1, "b": 2}
In [16]:
x["a"]
Out[16]:
1
In [17]:
x["fish"]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[17], line 1
----> 1 x["fish"]

KeyError: 'fish'
In [18]:
x.get("a")
Out[18]:
1
In [19]:
x.get("fish")

By default get returns None if the key searched is not in the dictionary. However, you can change that default by adding what's the value you want it to return.

In [20]:
x.get("fish", "tuna") == "tuna"
Out[20]:
True

Sets

A set is a list which cannot contain the same element twice. We make one by calling set() on any sequence, e.g. a list or string.

In [21]:
name = "Graham Chapman"
unique_letters = set(name)
In [22]:
unique_letters
Out[22]:
{' ', 'C', 'G', 'a', 'h', 'm', 'n', 'p', 'r'}

Or by defining a literal like a dictionary, but without the colons:

In [23]:
primes_below_ten = { 2, 3, 5, 7}
In [24]:
print(type(unique_letters))
print(type(primes_below_ten))
<class 'set'>
<class 'set'>
In [25]:
unique_letters
Out[25]:
{' ', 'C', 'G', 'a', 'h', 'm', 'n', 'p', 'r'}

This will be easier to read if we turn the set of letters back into a string, with join:

In [26]:
print("".join(unique_letters))
CpGharm n

join uses the character give to be what joins the sequence given:

In [27]:
"-".join(["a", "b", "c"])
Out[27]:
'a-b-c'

Note that a set has no particular order, but is really useful for checking or storing unique values.

In [28]:
alist = [1, 2, 3]
is_unique = len(set(alist)) == len(alist)
print(is_unique)
True

Set operations work as in mathematics:

In [29]:
x = set("Hello")
y = set("Goodbye")
In [30]:
x & y # Intersection
Out[30]:
{'e', 'o'}
In [31]:
x | y # Union
Out[31]:
{'G', 'H', 'b', 'd', 'e', 'l', 'o', 'y'}
In [32]:
y - x # y intersection with complement of x: letters in Goodbye but not in Hello
Out[32]:
{'G', 'b', 'd', 'y'}

Your programs will be faster and more readable if you use the appropriate container type for your data's meaning. Always use a set for lists which can't in principle contain the same data twice, always use a dictionary for anything which feels like a mapping from keys to values.