9.5. Loop over Dict

9.5.1. Rationale

  • Since Python 3.7: dict preserves insertion order (guaranteed by the language)

  • Before Python 3.7: dict order is not guaranteed
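
As a minimal sketch of what the ordering guarantee means in practice (assuming Python 3.7 or newer; the `measurements` name is only illustrative), keys come back in the order they were first inserted, and updating an existing key does not move it:

```python
# Assumes Python 3.7+, where insertion order is part of the language.
measurements = {}
measurements['Sepal length'] = 5.1
measurements['Sepal width'] = 3.5
measurements['Petal length'] = 1.4

# Updating an existing key keeps its original position.
measurements['Sepal length'] = 5.8

order = list(measurements)
print(order)
```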

9.5.2. Iterate

  • By default, iterating over a dict yields its keys

  • Suggested variable name: key

>>> DATA = {'Sepal length': 5.1,
...         'Sepal width': 3.5,
...         'Petal length': 1.4,
...         'Petal width': 0.2,
...         'Species': 'setosa'}
>>>
>>> for key in DATA:
...     print(key)
Sepal length
Sepal width
Petal length
Petal width
Species

9.5.3. Iterate Keys

  • Suggested variable name: key

>>> DATA = {'Sepal length': 5.1,
...         'Sepal width': 3.5,
...         'Petal length': 1.4,
...         'Petal width': 0.2,
...         'Species': 'setosa'}
>>>
>>> list(DATA.keys())
['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
>>>
>>> for key in DATA.keys():
...     print(key)
Sepal length
Sepal width
Petal length
Petal width
Species

9.5.4. Iterate Values

  • Suggested variable name: value

>>> DATA = {'Sepal length': 5.1,
...         'Sepal width': 3.5,
...         'Petal length': 1.4,
...         'Petal width': 0.2,
...         'Species': 'setosa'}
>>>
>>> list(DATA.values())
[5.1, 3.5, 1.4, 0.2, 'setosa']
>>>
>>> for value in DATA.values():
...     print(value)
5.1
3.5
1.4
0.2
setosa

9.5.5. Iterate Key-Value Pairs

  • Suggested variable names: key, value

Get (key, value) pairs with dict.items():

>>> DATA = {'Sepal length': 5.1,
...         'Sepal width': 3.5,
...         'Petal length': 1.4,
...         'Petal width': 0.2,
...         'Species': 'setosa'}
>>>
>>> list(DATA.items())  # doctest: +NORMALIZE_WHITESPACE
[('Sepal length', 5.1),
 ('Sepal width', 3.5),
 ('Petal length', 1.4),
 ('Petal width', 0.2),
 ('Species', 'setosa')]
>>>
>>> for key, value in DATA.items():
...     print(key, '->', value)
Sepal length -> 5.1
Sepal width -> 3.5
Petal length -> 1.4
Petal width -> 0.2
Species -> setosa
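
Having both the key and the value in each iteration makes it easy to filter entries. The following is a small sketch, not part of the original material; the `numeric` name is an assumption chosen for illustration:

```python
DATA = {'Sepal length': 5.1,
        'Sepal width': 3.5,
        'Petal length': 1.4,
        'Petal width': 0.2,
        'Species': 'setosa'}

# Keep only the entries whose value is a float (the measurements).
numeric = {}
for key, value in DATA.items():
    if isinstance(value, float):
        numeric[key] = value

print(numeric)
```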

9.5.6. List of Dicts

Iterating over a list of dicts and reading selected keys from each row:

>>> DATA = [{'Sepal length': 5.1, 'Sepal width': 3.5, 'Petal length': 1.4, 'Petal width': 0.2, 'Species': 'setosa'},
...         {'Sepal length': 5.7, 'Sepal width': 2.8, 'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'},
...         {'Sepal length': 6.3, 'Sepal width': 2.9, 'Petal length': 5.6, 'Petal width': 1.8, 'Species': 'virginica'}]
>>>
>>> for row in DATA:
...     sepal_length = row['Sepal length']
...     species = row['Species']
...     print(f'{species} -> {sepal_length}')
setosa -> 5.1
versicolor -> 5.7
virginica -> 6.3
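
The outer loop over rows can be combined with an inner loop over `.items()`. The sketch below regroups the rows into one list per column; the `columns` name is hypothetical and this is only one possible way to do it:

```python
DATA = [{'Sepal length': 5.1, 'Sepal width': 3.5, 'Petal length': 1.4, 'Petal width': 0.2, 'Species': 'setosa'},
        {'Sepal length': 5.7, 'Sepal width': 2.8, 'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'},
        {'Sepal length': 6.3, 'Sepal width': 2.9, 'Petal length': 5.6, 'Petal width': 1.8, 'Species': 'virginica'}]

columns = {}
for row in DATA:                      # each row is a dict
    for key, value in row.items():    # each item is a (key, value) pair
        # Create the list on first use, then append to it.
        columns.setdefault(key, []).append(value)

print(columns['Species'])
```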

9.5.7. Generate with Range

  • range()

  • The Pythonic way is to use zip()

  • Avoid range(len(...)) - it needs a sequence that supports len() and indexing, so it does not work with generators

Create a dict from two lists:

>>> header = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
>>> data = [5.1, 3.5, 1.4, 0.2, 'setosa']
>>> result = {}
>>>
>>> for i in range(len(header)):
...     key = header[i]
...     value = data[i]
...     result[key] = value
>>>
>>> print(result)  # doctest: +NORMALIZE_WHITESPACE
{'Sepal length': 5.1,
 'Sepal width': 3.5,
 'Petal length': 1.4,
 'Petal width': 0.2,
 'Species': 'setosa'}
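
One limitation of the index-based loop is that it relies on `len()` and indexing. The sketch below uses a hypothetical generator function, `read_values()`, standing in for any lazy data source, to show where `len()` fails while `zip()` still works:

```python
header = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']

def read_values():
    # Hypothetical lazy source, e.g. rows streamed from a file.
    yield from [5.1, 3.5, 1.4, 0.2, 'setosa']

try:
    len(read_values())
    failed = False
except TypeError:
    # Generators have no length, so range(len(...)) cannot be used on them.
    failed = True

# zip() only needs iterables, so it handles the generator directly.
result = {}
for key, value in zip(header, read_values()):
    result[key] = value

print(failed, result)
```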

9.5.8. Generate with Enumerate

  • enumerate()

  • _ is a regular variable name (not special syntax)

  • By convention, _ is used for a variable that will not be referenced

Create a dict from two lists:

>>> header = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
>>> data = [5.1, 3.5, 1.4, 0.2, 'setosa']
>>> result = {}
>>>
>>> for i, key in enumerate(header):
...     result[key] = data[i]
>>>
>>> print(result)  # doctest: +NORMALIZE_WHITESPACE
{'Sepal length': 5.1,
 'Sepal width': 3.5,
 'Petal length': 1.4,
 'Petal width': 0.2,
 'Species': 'setosa'}
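
The notes about `_` can be illustrated with a short sketch. The `positions`, `count` and `last` names are assumptions made for this example, and `start=1` is the optional second argument of `enumerate()`:

```python
header = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']

# enumerate() yields (index, element) pairs; start=1 makes the index 1-based.
positions = {}
for i, key in enumerate(header, start=1):
    positions[key] = i

# _ signals that the loop variable is intentionally unused.
count = 0
for _ in header:
    count += 1

# _ is still an ordinary name: it holds the last element after the loop.
last = _

print(positions['Species'], count, last)
```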

9.5.9. Generate with Zip

  • zip()

  • The most Pythonic way

>>> header = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
>>> data = [5.1, 3.5, 1.4, 0.2, 'setosa']
>>> result = {}
>>>
>>> for key, value in zip(header, data):
...     result[key] = value
>>>
>>> print(result)  # doctest: +NORMALIZE_WHITESPACE
{'Sepal length': 5.1,
 'Sepal width': 3.5,
 'Petal length': 1.4,
 'Petal width': 0.2,
 'Species': 'setosa'}

Without an explicit loop, pass zip() directly to dict():

>>> header = ['Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species']
>>> data = [5.1, 3.5, 1.4, 0.2, 'setosa']
>>> result = dict(zip(header, data))
>>>
>>> print(result)  # doctest: +NORMALIZE_WHITESPACE
{'Sepal length': 5.1,
 'Sepal width': 3.5,
 'Petal length': 1.4,
 'Petal width': 0.2,
 'Species': 'setosa'}

9.5.10. Assignments

Code 9.16. Solution
"""
* Assignment: Loop Dict To Dict
* Required: yes
* Complexity: easy
* Lines of code: 3 lines
* Time: 8 min

English:
    1. Convert to `result: dict[str, str]`
    2. Run doctests - all must succeed

Polish:
    1. Przekonwertuj do `result: dict[str, str]`
    2. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> type(result)
    <class 'dict'>

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    {'Doctorate': '6',
     'Prof-school': '6',
     'Masters': '5',
     'Bachelor': '5',
     'Engineer': '5',
     'HS-grad': '4',
     'Junior High': '3',
     'Primary School': '2',
     'Kindergarten': '1'}
"""

DATA = {
    6: ['Doctorate', 'Prof-school'],
    5: ['Masters', 'Bachelor', 'Engineer'],
    4: ['HS-grad'],
    3: ['Junior High'],
    2: ['Primary School'],
    1: ['Kindergarten'],
}

result = ...  # dict[str,str]: converted DATA. Note values are str not int!

Code 9.17. Solution
"""
* Assignment: Loop Dict To List
* Required: yes
* Complexity: medium
* Lines of code: 4 lines
* Time: 5 min

English:
    1. Define `result: list[dict]`:
        a. key - name from the header
        b. value - measurement or species
    2. Run doctests - all must succeed

Polish:
    1. Zdefiniuj `result: list[dict]`:
        a. klucz - nazwa z nagłówka
        b. wartość - wyniki pomiarów lub gatunek
    2. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> type(result)
    <class 'list'>

    >>> assert all(type(x) is dict for x in result)

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    [{'Sepal length': 5.8, 'Sepal width': 2.7, 'Petal length': 5.1, 'Petal width': 1.9, 'Species': 'virginica'},
     {'Sepal length': 5.1, 'Sepal width': 3.5, 'Petal length': 1.4, 'Petal width': 0.2, 'Species': 'setosa'},
     {'Sepal length': 5.7, 'Sepal width': 2.8, 'Petal length': 4.1, 'Petal width': 1.3, 'Species': 'versicolor'},
     {'Sepal length': 6.3, 'Sepal width': 2.9, 'Petal length': 5.6, 'Petal width': 1.8, 'Species': 'virginica'},
     {'Sepal length': 6.4, 'Sepal width': 3.2, 'Petal length': 4.5, 'Petal width': 1.5, 'Species': 'versicolor'},
     {'Sepal length': 4.7, 'Sepal width': 3.2, 'Petal length': 1.3, 'Petal width': 0.2, 'Species': 'setosa'}]
"""

DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
]

result = ...  # list[dict]: with converted DATA

Code 9.18. Solution
"""
* Assignment: Loop Dict Label Encoder
* Required: no
* Complexity: hard
* Lines of code: 9 lines
* Time: 13 min

English:
    1. Define:
        a. `features: list[tuple]` - measurements
        b. `labels: list[int]` - species
        c. `label_encoder: dict[int, str]`
            dictionary with encoded (as numbers) species names
    2. Separate header from data
    3. To encode and decode `labels` (species) we need:
        a. Define `label_encoder: dict[int, str]`
        b. key - id (incremented integer value)
        c. value - species name
    4. `label_encoder` must be generated from `DATA`
    5. For each row add values to `features`, `labels` and `label_encoder`
    6. Run doctests - all must succeed

Polish:
    1. Zdefiniuj:
        a. `features: list[tuple]` - pomiary
        b. `labels: list[int]` - gatunki
        c. `label_encoder: dict[int, str]`
            słownik zakodowanych (jako cyfry) nazw gatunków
    2. Odseparuj nagłówek od danych
    3. Aby móc zakodować i odkodować `labels` (gatunki) potrzebujesz:
        a. Zdefiniuj `label_encoder: dict[int, str]`:
        b. key - identyfikator (kolejna liczba całkowita)
        c. value - nazwa gatunku
    4. `label_encoder` musi być wygenerowany z `DATA`
    5. Dla każdego wiersza dodawaj wartości do `features`, `labels` i `label_encoder`
    6. Uruchom doctesty - wszystkie muszą się powieść

Hints:
    * Reversed lookup dict

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert type(features) is list
    >>> assert type(labels) is list
    >>> assert type(label_encoder) is dict
    >>> assert all(type(x) is tuple for x in features)
    >>> assert all(type(x) is int for x in labels)
    >>> assert all(type(x) is int for x in label_encoder.keys())
    >>> assert all(type(x) is str for x in label_encoder.values())

    >>> features  # doctest: +NORMALIZE_WHITESPACE
    [(5.8, 2.7, 5.1, 1.9),
     (5.1, 3.5, 1.4, 0.2),
     (5.7, 2.8, 4.1, 1.3),
     (6.3, 2.9, 5.6, 1.8),
     (6.4, 3.2, 4.5, 1.5),
     (4.7, 3.2, 1.3, 0.2)]
    >>> labels
    [0, 1, 2, 0, 2, 1]
    >>> label_encoder  # doctest: +NORMALIZE_WHITESPACE
    {0: 'virginica',
     1: 'setosa',
     2: 'versicolor'}
"""

DATA = [
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
    (5.8, 2.7, 5.1, 1.9, 'virginica'),
    (5.1, 3.5, 1.4, 0.2, 'setosa'),
    (5.7, 2.8, 4.1, 1.3, 'versicolor'),
    (6.3, 2.9, 5.6, 1.8, 'virginica'),
    (6.4, 3.2, 4.5, 1.5, 'versicolor'),
    (4.7, 3.2, 1.3, 0.2, 'setosa'),
]

features = ...  # list[tuple]: values from column 0-3 from DATA without header
labels = ...  # list[int]: encoded species ids from column 4 from DATA without header
label_encoder = ...  # dict[int,str]: lookup dict generated from species names
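
The "Reversed lookup dict" hint refers to swapping keys and values so that a name can be looked up to get its number. The sketch below uses hypothetical color data unrelated to the assignment and is only one way to build such a mapping:

```python
# Hypothetical encoder, not the assignment's data.
encoder = {0: 'red', 1: 'green', 2: 'blue'}

# Swap each (number, name) pair to get a name -> number lookup.
decoder = {}
for number, name in encoder.items():
    decoder[name] = number

print(decoder['green'])
```

This only works when the values are unique and hashable, because duplicate values would overwrite each other as keys.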