Python: Tuples, Dictionaries and Sets¶
Tuple¶
A tuple is defined with (). It can store multiple values like a list, but it is immutable: items cannot be changed, added or removed after the tuple is created, so the order and the values of the items are fixed.
rgb = (93, 217, 117)
type(rgb)
tuple
# Loop over a tuple:
for item in rgb:
    print(item)
93
217
117
# We can access items through their index:
rgb[1]
217
# It's not possible to change the values.
rgb[1] = 216
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/tmp/ipykernel_8042/3999292401.py in <module>
1 # It's not possible to change the values.
----> 2 rgb[1] = 216
TypeError: 'tuple' object does not support item assignment
Change tuple values through list conversion¶
It’s not possible to change a tuple directly. Instead, it can be converted to a list, modified, and then converted back into a tuple:
print(rgb)
# Convert tuple to list.
rgb = list(rgb)
# Change value.
rgb[1] = 216
# Convert list to tuple.
rgb = tuple(rgb)
print(rgb)
(93, 217, 117)
(93, 216, 117)
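As an alternative to the list round trip, a new tuple can be assembled from slices of the old one. The following is a minimal sketch of that idea; the name rgb_new is an assumption made for illustration and does not appear elsewhere in this notebook.

```python
rgb = (93, 217, 117)

# Build a new tuple: the items before index 1, the new value, the items after index 1.
# (216,) is a one-item tuple; the trailing comma is required.
rgb_new = rgb[:1] + (216,) + rgb[2:]

print(rgb)      # the original tuple is unchanged
print(rgb_new)
```

Either way the original tuple is never modified; a new object is created and bound to a name.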
Convert it to a list and modify the list (append, remove).
Convert it back to a tuple and iterate over it.
rose = ('a', 'rose')
rose = list(rose)
rose += ['is a rose']*2
rose = tuple(rose)
for item in rose:
    print(item)
a
rose
is a rose
is a rose
Data types inside a tuple¶
# A tuple can contain any data type and also mixed data (like a list).
mixed_tuple = (93, '217', 117.0, ['a', 'list', 'inside', 'a', 'tuple'], ('a', 'tuple', 'inside', 'a', 'tuple'))
print(mixed_tuple)
(93, '217', 117.0, ['a', 'list', 'inside', 'a', 'tuple'], ('a', 'tuple', 'inside', 'a', 'tuple'))
for item in mixed_tuple:
    print(item)
93
217
117.0
['a', 'list', 'inside', 'a', 'tuple']
('a', 'tuple', 'inside', 'a', 'tuple')
'217' in mixed_tuple
True
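One detail worth knowing when a tuple holds a list: the immutability applies to the tuple itself, not to the objects stored in it. The sketch below uses a small hypothetical tuple named mixed to show the difference.

```python
mixed = (93, ['a', 'list'])

# The tuple itself does not allow assigning a new item at an index ...
try:
    mixed[0] = 94
except TypeError as err:
    print('TypeError:', err)

# ... but a mutable object stored inside the tuple can still be changed.
mixed[1].append('grows')
print(mixed)
```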
Unpacking a tuple¶
# Unpacking a tuple into single variables:
r, g, b = rgb
print(r)
print(g)
print(b)
93
216
117
This is useful, for example, when a function returns multiple values that should be assigned to separate variables:
def return_multiple_values():
    a = 10
    b = a*2
    # Return values as a tuple:
    return (a, b)  # The parentheses are optional: return a, b
# Unpack return values in two separate variables:
outer, inner = return_multiple_values()
print(outer)
print(inner)
10
20
import random

def generate_rgb():
    # Generate three values (0 to 255) in a list.
    rgb_list = [random.randrange(0, 256) for _ in range(3)]
    # Convert to tuple.
    rgb_tuple = tuple(rgb_list)
    return rgb_tuple
r, g, b = generate_rgb()
print(r)
print(g)
print(b)
86
123
77
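Unpacking also works when the number of items is not known in advance, and it is what makes swapping two variables possible in one line. This is a small sketch using values from the examples above; the names first and rest are assumptions chosen for illustration.

```python
rose = ('a', 'rose', 'is a rose', 'is a rose')

# A starred name collects all remaining items in a list.
first, *rest = rose
print(first)
print(rest)

# Swapping two variables packs and unpacks a tuple implicitly.
r, g = 93, 217
r, g = g, r
print(r, g)
```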
Dictionaries¶
Dictionaries consist of key:value pairs. Each key is unique within a dictionary. The syntax is {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}.
d = {'x':'value1', 'y':'value2'}
type(d)
dict
# get keys
d.keys()
dict_keys(['x', 'y'])
# get values
d.values()
dict_values(['value1', 'value2'])
# get value from key
d.get('x')
# or
d['x']
'value1'
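The two lookups behave differently when the key does not exist. The sketch below illustrates this with a key 'q' that is deliberately missing from the dictionary.

```python
d = {'x': 'value1', 'y': 'value2'}

# get() returns None (or a given default) for a missing key.
missing = d.get('q')
fallback = d.get('q', 'no such key')
print(missing)
print(fallback)

# Square brackets raise a KeyError for a missing key.
try:
    d['q']
except KeyError as err:
    print('KeyError:', err)
```

get() is therefore the safer choice when a key may be absent, while square brackets make a missing key fail loudly.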
for key in d.keys():
    print(d.get(key))
value1
value2
Add and remove items¶
d['z'] = 'zebra'
print(d.keys())
print(d.get('z'))
dict_keys(['x', 'y', 'z'])
zebra
d.pop('z')
print(d.keys())
dict_keys(['x', 'y'])
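pop() does more than delete: it also returns the removed value, and it accepts an optional default. A short sketch with the same keys as above:

```python
d = {'x': 'value1', 'y': 'value2', 'z': 'zebra'}

# pop() removes the key and returns its value.
removed = d.pop('z')
print(removed)

# With a second argument, pop() returns that default instead of
# raising a KeyError when the key does not exist.
nothing = d.pop('z', None)
print(nothing)

# del removes a key without returning anything.
del d['y']
print(d)
```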
Loop dictionaries¶
for key in d.keys():
    print(key)
x
y
for value in d.values():
    print(value)
value1
value2
for key, value in d.items():
    print(key, '-', value)
x - value1
y - value2
More about dictionaries¶
# Copying a dictionary is done in one of the two following ways:
d_copy = d.copy()
# or
d_copy = dict(d)
d
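A copy is needed because a plain assignment does not create a new dictionary; it only adds a second name for the same object. The sketch below contrasts the two; the name alias is an assumption used for illustration. Note that copy() and dict() both create shallow copies, so nested mutable values would still be shared.

```python
d = {'x': 'value1', 'y': 'value2'}

alias = d          # a second name for the same dictionary
d_copy = d.copy()  # a new, independent dictionary

d['x'] = 'changed'

print(alias['x'])   # follows the change
print(d_copy['x'])  # keeps the old value
```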
Change a value.
Remove a key
Add a new key:value pair
Change the key of that pair.
Loop over the dictionary and print the pairs.
# Create a copy of d.
d_new = dict(d)
# Change a value.
d_new['x'] = 'updated value'
# Remove a key.
d_new.pop('y')
# Add a new key:value pair.
d_new['z'] = 'a new key:value pair'
# Change its key.
d_new['z2'] = d_new.pop('z')
# Print it.
for key, value in d_new.items():
    print(key, ':', value)
x : updated value
z2 : a new key:value pair
Markov text generator with dictionary¶
To illustrate the usage of a dictionary we’ll create a very simple Markov Chain.
A Markov chain models a sequence in which the probability of the next value depends only on the current value. This can be used for text generation, for example as a next-word suggestion tool on a mobile phone.
The probabilities are estimated from a data set, for example a text.
# Text corpus for the probabilities:
txt = '''The quick brown fox jumps over the lazy dog. The lazy programmer jumps over the fire fox.'''
txt = txt.lower().split()
Loop over txt and create key:value pairs for each word.
Each unique word is stored as a key, and all the words that directly follow that key in the text are stored as its values.
dictionary = {}
debug = True
for i in range(len(txt)-1):
    key = txt[i]
    value = txt[i+1]
    # Check if key exists:
    if key in dictionary:
        # Then append the value to its list of values.
        dictionary[key].append(value)
        if debug:
            print(key, '\tin dictionary,', value, 'added as value')
    # Else create the key and a list which holds the value.
    else:
        dictionary[key] = [value]
        if debug:
            print(key, '\tadded as key to dictionary,', value, 'added as value')
the added as key to dictionary, quick added as value
quick added as key to dictionary, brown added as value
brown added as key to dictionary, fox added as value
fox added as key to dictionary, jumps added as value
jumps added as key to dictionary, over added as value
over added as key to dictionary, the added as value
the in dictionary, lazy added as value
lazy added as key to dictionary, dog. added as value
dog. added as key to dictionary, the added as value
the in dictionary, lazy added as value
lazy in dictionary, programmer added as value
programmer added as key to dictionary, jumps added as value
jumps in dictionary, over added as value
over in dictionary, the added as value
the in dictionary, fire added as value
fire added as key to dictionary, fox. added as value
dictionary
{'the': ['quick', 'lazy', 'lazy', 'fire'],
'quick': ['brown'],
'brown': ['fox'],
'fox': ['jumps'],
'jumps': ['over', 'over'],
'over': ['the', 'the'],
'lazy': ['dog.', 'programmer'],
'dog.': ['the'],
'programmer': ['jumps'],
'fire': ['fox.']}
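The same dictionary can be built more compactly with collections.defaultdict from the standard library, which creates the empty list for a new key automatically. This is an alternative sketch, not the approach used in the rest of this section; the names words and successors are assumptions.

```python
from collections import defaultdict

txt = 'The quick brown fox jumps over the lazy dog. The lazy programmer jumps over the fire fox.'
words = txt.lower().split()

# Missing keys are created automatically with an empty list as value.
successors = defaultdict(list)

# zip pairs each word with the word that follows it.
for key, value in zip(words, words[1:]):
    successors[key].append(value)

print(successors['the'])
print(successors['jumps'])
```

This removes the explicit check for an existing key, at the cost of a dictionary that silently creates entries when a missing key is read with square brackets.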
Based on this very small dictionary we can start the text generation. It starts with a given input, which is used as a key to look up all possible next words from the corpus:
inp_ = 'the'
# Get all values for the key inp_:
possibilities = dictionary[inp_]
possibilities
['quick', 'lazy', 'lazy', 'fire']
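In the text above, the word 'the' is followed by 'quick', 'lazy', 'lazy' and 'fire', which are the possible next words for the input 'the'. Because 'lazy' occurs twice in the list, it is twice as likely to be picked as 'quick' or 'fire'.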
Then we can pick one of them:
import random  # Needed at module level for random.choice().
random.choice(possibilities)
'lazy'
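random.choice() picks uniformly from the list, so a word that is stored more often has a higher chance of being chosen. The sketch below makes these probabilities explicit with collections.Counter; the name probabilities is an assumption for illustration.

```python
from collections import Counter

possibilities = ['quick', 'lazy', 'lazy', 'fire']

# Count how often each candidate word occurs.
counts = Counter(possibilities)
total = len(possibilities)

# Relative frequency of each candidate word.
probabilities = {word: count / total for word, count in counts.items()}
print(probabilities)
```

Storing duplicates in the value lists is what encodes the probability distribution of the chain.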
The picked word then becomes the next key, and the lookup is repeated.
gen_txt = ['the'] # Input / generated text as a list.
# Loop until the last word ends with a dot.
while not gen_txt[-1].endswith('.'):
    # Pick a value. Key is the last item from the list.
    new_token = random.choice(dictionary[gen_txt[-1]])
    # Append the picked value to the list.
    # This value is the key in the next iteration of the loop.
    gen_txt.append(new_token)
# Join list to string and print it.
gen_txt = ' '.join(gen_txt)
print(gen_txt)
the lazy programmer jumps over the quick brown fox jumps over the lazy programmer jumps over the fire fox.
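With other corpora, the loop above could run for a very long time or hit a word that was never stored as a key. The following is a hedged sketch of a more defensive variant; the function generate, its parameter max_words and the tiny hand-written chain are all assumptions made for illustration.

```python
import random

def generate(chain, start, max_words=30):
    """Walk the chain from start until a word ends with a dot,
    no successor exists, or max_words is reached."""
    words = [start]
    while not words[-1].endswith('.') and len(words) < max_words:
        successors = chain.get(words[-1])
        if not successors:
            break  # dead end: the last word was never stored as a key
        words.append(random.choice(successors))
    return ' '.join(words)

# A tiny hand-written chain with only one possible path.
chain = {'the': ['lazy'], 'lazy': ['dog.']}
sentence = generate(chain, 'the')
print(sentence)
```

The length limit guards against chains that loop forever, and get() avoids a KeyError for words without successors.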
For more about generating text with a Markov chain see hands-on text generators.
Set¶
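A set is the fourth built-in data type for storing collections of data, next to list, tuple and dictionary. A set is like a list, except that it is unordered and can contain only one occurrence of each item. A set is created with {} around comma-separated values. Note that empty braces {} create an empty dictionary; an empty set is created with set().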
data = {4, 29, 'two words', 4}
print(data)
print(type(data))
{'two words', 4, 29}
<class 'set'>
The 4 appears two times in the creation but only once in the object. Furthermore the items in a set are unordered.
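Two further properties follow from how sets store their items: membership tests use the in operator, and every item must be hashable. A brief sketch, reusing the set from above:

```python
data = {4, 29, 'two words', 4}

# Membership is tested with the in operator.
print(29 in data)
print('three words' in data)

# Items must be hashable: a tuple works, a list does not.
data.add((1, 2))
try:
    data.add([1, 2])
except TypeError as err:
    print('TypeError:', err)

print(len(data))
```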
set()¶
A set can be created from any sequence of data, such as a list, by passing it to the set() constructor.
txt = '''The quick brown fox jumps over the lazy dog. The lazy programmer jumps over the fire fox.'''
txt_list = txt.split()
print('list:', txt_list)
print(len(txt_list), 'items\n')
# Transform the list into a set:
txt_set = set(txt.split())
print('set:', txt_set)
print(len(txt_set), 'items')
list: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog.', 'The', 'lazy', 'programmer', 'jumps', 'over', 'the', 'fire', 'fox.']
17 items
set: {'The', 'programmer', 'jumps', 'over', 'lazy', 'fox.', 'brown', 'the', 'dog.', 'quick', 'fox', 'fire'}
12 items
Access items¶
Items in a set don’t have an index. Instead we have to iterate over them.
for item in txt_set:
    print(item)
fox.
The
fox
over
lazy
dog.
programmer
the
quick
fire
jumps
brown
Add items¶
txt_set.add('squirrel')
Remove items¶
txt_set.remove('fox.')
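remove() raises a KeyError when the item is not in the set. If that should not be an error, discard() can be used instead. The sketch below uses a small hypothetical set named animals.

```python
animals = {'fox', 'dog', 'squirrel'}

# remove() raises a KeyError if the item is not in the set.
try:
    animals.remove('zebra')
except KeyError as err:
    print('KeyError:', err)

# discard() removes the item if present and does nothing otherwise.
animals.discard('zebra')
animals.discard('dog')
print(sorted(animals))
```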
Join multiple sets¶
Sets can be joined either with update() or with union(). update() adds the items of another set to the existing set in place, while union() returns a new combined set and leaves both originals unchanged.
# Create two sets with overlapping content:
set_a = set(range(5, 10))
set_b = set(range(7, 13))
# Combine them to a new set.
set_c = set_a.union(set_b)
print(set_c)
# Write set_b into set_a.
set_a.update(set_b) # This happens in place.
print(set_a)
{5, 6, 7, 8, 9, 10, 11, 12}
{5, 6, 7, 8, 9, 10, 11, 12}
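Both operations are also available as operators, which some readers may find easier to scan. A minimal sketch with the same two sets:

```python
set_a = set(range(5, 10))
set_b = set(range(7, 13))

# | returns a new set, like union().
set_c = set_a | set_b
print(sorted(set_c))

# |= changes the left-hand set in place, like update().
set_a |= set_b
print(set_a == set_c)
```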
More methods for set¶
set_a = set(range(5, 10))
set_b = set(range(7, 13))
# Calculate the difference
diff_a = set_a.difference(set_b)
print(diff_a)
diff_b = set_b.difference(set_a)
print(diff_b)
# Calculate the intersection
inter = set_a.intersection(set_b)
print(inter)
{5, 6}
{10, 11, 12}
{8, 9, 7}
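Difference and intersection have operator forms as well, and there are related operations such as the symmetric difference and the subset test. The sketch below prints sorted lists so that the output does not depend on the internal order of the sets.

```python
set_a = set(range(5, 10))
set_b = set(range(7, 13))

print(sorted(set_a - set_b))   # difference, like set_a.difference(set_b)
print(sorted(set_a & set_b))   # intersection, like set_a.intersection(set_b)

# Symmetric difference: items that are in exactly one of the two sets.
print(sorted(set_a ^ set_b))

# Subset test: are all items of the left set contained in the right set?
print({7, 8} <= set_a)
```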
You can inspect more methods with help(set).
Summary¶
It’s not necessary to know all the methods of all the collection data types (list, tuple, dictionary, set). It’s more important to know that they exist and how they differ, so that you can choose the best type for your task. Most of the time this will be a list. But if, for example, you need to remove all duplicate items from a list, you can convert it into a set and back into a list (the original order may be lost). A tuple is useful for returning multiple values from a function, and a dictionary is the natural choice when dealing with key:value pairs.
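As a closing illustration of choosing between the types, here is a small sketch of removing duplicates from a list in two ways. The list words is made up for this example.

```python
words = ['the', 'lazy', 'dog', 'the', 'lazy', 'fox']

# Via a set: duplicates are removed, but the order is not guaranteed.
unique_unordered = list(set(words))
print(sorted(unique_unordered))

# Via dictionary keys: duplicates are removed and the first-seen order is kept
# (dictionaries preserve insertion order since Python 3.7).
unique_ordered = list(dict.fromkeys(words))
print(unique_ordered)
```

The set version is the shortest, while the dictionary version is preferable when the order of the items matters.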