5.3. Sequence Set

5.3.1. Rationale

  • Only unique values

  • Mutable - can add and remove items

  • Can store elements of any hashable type

  • Set is an unordered data structure and does not record element position or insertion order

  • Does not support getitem and slice
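
A minimal sketch of what the last two points mean in practice, assuming CPython's usual error message. The names `ordered` and `first` are illustrative only:

```python
data = {'a', 'b', 'c'}

# A set has no positions, so indexing is not supported
try:
    data[0]
except TypeError as error:
    message = str(error)   # in CPython: 'set' object is not subscriptable

# One way to get positional access is to convert to a sorted list first
ordered = sorted(data)
first = ordered[0]
```

Sorting works here because all elements are strings. A set with mixed types, such as str and int, cannot be sorted this way.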

Hashable (Immutable):

  • int

  • float

  • bool

  • NoneType

  • str

  • tuple

  • frozenset

Non-hashable (Mutable):

  • list

  • set

  • dict

"Hashable types are also immutable" is true for builtin types, but it is not a universal truth. More information in OOP Hash and OOP Object Identity.

5.3.2. Definition

An empty set can be defined only with set(), there is no short syntax:

>>> data = set()
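
The reason there is no short syntax is that empty braces are already taken by dict. A small sketch, with illustrative variable names:

```python
empty_set = set()
empty_dict = {}

# Empty braces create a dict, not a set
is_set = type(empty_set) is set
is_dict = type(empty_dict) is dict
```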

The comma after the last element of a one-element set is optional. Braces are required:

>>> data = {1}
>>> data = {1, 2, 3}
>>> data = {1.1, 2.2, 3.3}
>>> data = {True, False}
>>> data = {'a', 'b', 'c'}
>>> data = {'a', 1, 2.2, True, None}

Stores only unique values:

>>> {1, 2, 1}
{1, 2}

Compares by values, not types:

>>> {1}
{1}
>>> {1.0}
{1.0}
>>> {1, 1.0}
{1}
>>> {1.0, 1}
{1.0}
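
The same rule applies to bool, because True == 1 and both have the same hash. A short sketch, reusing the mixed-type set from the Definition examples:

```python
data = {'a', 1, 2.2, True, None}

# True is treated as a duplicate of 1, so only four elements remain
size = len(data)

# Equal values must have equal hashes, regardless of type
same_hash = hash(1) == hash(1.0) == hash(True)
```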

Can store elements of any hashable types:

>>> data = {1, 2, 'a'}
>>> data = {1, 2, (3, 4)}
>>>
>>> data = {1, 2, [3, 4]}
Traceback (most recent call last):
TypeError: unhashable type: 'list'
>>>
>>> data = {1, 2, {3, 4}}
Traceback (most recent call last):
TypeError: unhashable type: 'set'
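
What a set requires is hashability, not immutability. A minimal sketch with a hypothetical user-defined class `Point` (not part of this chapter), which is mutable but hashable because instances are hashed by identity by default:

```python
class Point:
    """Hypothetical class used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Point(1, 2)
data = {point}                 # works, the instance is hashable

point.x = 99                   # the object is still mutable
still_member = point in data   # identity-based hash did not change

try:
    hash([1, 2])
except TypeError:
    list_is_hashable = False
else:
    list_is_hashable = True
```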

5.3.3. Type Casting

  • set() converts argument to set

>>> data = 'abcd'
>>> set(data) == {'a', 'b', 'c', 'd'}
True
>>> data = ['a', 'b', 'c', 'd']
>>> set(data) == {'a', 'b', 'c', 'd'}
True
>>> data = ('a', 'b', 'c', 'd')
>>> set(data) == {'a', 'b', 'c', 'd'}
True
>>> data = {'a', 'b', 'c', 'd'}
>>> set(data) == {'a', 'b', 'c', 'd'}
True
>>> data = frozenset({'a', 'b', 'c', 'd'})
>>> set(data) == {'a', 'b', 'c', 'd'}
True
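
set() accepts any iterable, not only the types listed above. A short sketch with a dict and a range, which are not covered by the examples in this section:

```python
data = {'a': 1, 'b': 2}

keys = set(data)              # iterating a dict yields its keys
values = set(data.values())   # values must be requested explicitly
numbers = set(range(3))       # any iterable of hashable items works
```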

5.3.4. Deduplicate

Works with str, list, tuple, frozenset

>>> data = [1, 2, 3, 1, 1, 2, 4]
>>> set(data)
{1, 2, 3, 4}

Converting to set deduplicates items:

>>> data = ['Twardowski',
...         'Twardowski',
...         'Watney',
...         'Twardowski']
...
>>> set(data) == {'Twardowski', 'Watney'}
True
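
Deduplicating through a set does not preserve the original order. If order matters, one common alternative (swapped in here, not part of the set API) is `dict.fromkeys()`, which relies on dicts keeping insertion order in Python 3.7+:

```python
data = ['Twardowski', 'Twardowski', 'Watney', 'Twardowski']

# Unique values, order not guaranteed
unique = set(data)

# Unique values in order of first appearance
unique_ordered = list(dict.fromkeys(data))
```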

5.3.5. Add

>>> data = {1, 2}
>>>
>>> data.add(3)
>>> data == {1, 2, 3}
True
>>>
>>> data.add(3)
>>> data == {1, 2, 3}
True
>>>
>>> data.add(4)
>>> data == {1, 2, 3, 4}
True
>>> data = {1, 2}
>>> data.add([3, 4])
Traceback (most recent call last):
TypeError: unhashable type: 'list'
>>> data = {1, 2}
>>> data.add((3, 4))
>>> data == {1, 2, (3, 4)}
True
>>> data = {1, 2}
>>> data.add({3, 4})
Traceback (most recent call last):
TypeError: unhashable type: 'set'
>>> data = {1, 2}
>>> data.add(frozenset({3,4}))
>>> data == {1, 2, frozenset({3, 4})}
True
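
Note that .add() modifies the set in place and returns None, so its result should not be assigned back to the variable. A minimal sketch:

```python
data = {1, 2}

returned = data.add(3)    # data is changed in place
size_before = len(data)

data.add(3)               # adding an existing element changes nothing
size_after = len(data)
```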

5.3.6. Update

>>> data = {1, 2}
>>> data.update({3, 4})
>>> data == {1, 2, 3, 4}
True
>>> data.update([5, 6])
>>> data == {1, 2, 3, 4, 5, 6}
True
>>> data.update((7, 8))
>>> data == {1, 2, 3, 4, 5, 6, 7, 8}
True
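
The difference between .add() and .update() is easiest to see with a str argument. The sketch below also shows that .update() accepts several iterables in one call:

```python
data = {1, 2}
data.update([3, 4], (5, 6))   # several iterables in one call

letters = set()
letters.update('abc')         # a str is iterated character by character

words = set()
words.add('abc')              # add() stores the whole object as one element
```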

5.3.7. Pop

Removes and returns an arbitrary element:

>>> data = {1, 2, 3}
>>> value = data.pop()
>>> value in [1, 2, 3]
True
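
A short sketch of the two behaviors worth remembering: the popped element is no longer in the set, and calling .pop() on an empty set raises KeyError:

```python
data = {1, 2, 3}
value = data.pop()        # which element is returned is not specified
remaining = len(data)

empty = set()
try:
    empty.pop()
except KeyError:
    raised = True         # nothing to pop
else:
    raised = False
```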

5.3.8. Membership

Is Disjoint?:

  • True - if there are no common elements in data and x

  • False - if any element of x is in data

>>> data = {1,2}
>>>
>>> data.isdisjoint({1,2})
False
>>> data.isdisjoint({1,3})
False
>>> data.isdisjoint({3,4})
True
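
Being disjoint is the same as having an empty intersection. A minimal sketch, which also shows that the method accepts any iterable, not only a set:

```python
data = {1, 2}
other = {3, 4}

disjoint = data.isdisjoint(other)
same_check = len(data & other) == 0   # equivalent test with intersection

with_list = data.isdisjoint([2, 3])   # a list works too, 2 is shared
```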

Is Subset?:

  • True - if every element of data is also in x

  • False - if x is missing any element of data

>>> data = {1,2}
>>>
>>> data.issubset({1})
False
>>> data.issubset({1,2})
True
>>> data.issubset({1,2,3})
True
>>> data.issubset({1,3,4})
False
>>> {1,2} < {3,4}
False
>>> {1,2} < {1,2}
False
>>> {1,2} < {1,2,3}
True
>>> {1,2,3} < {1,2}
False
>>> {1,2} <= {3,4}
False
>>> {1,2} <= {1,2}
True
>>> {1,2} <= {1,2,3}
True
>>> {1,2,3} <= {1,2}
False
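
The method and the operator differ in what they accept on the right-hand side. A small sketch: .issubset() takes any iterable, while <= requires a set (or frozenset):

```python
data = {1, 2}

via_method = data.issubset([1, 2, 3])   # a list is accepted

try:
    data <= [1, 2, 3]
except TypeError:
    operator_accepts_list = False
else:
    operator_accepts_list = True
```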

Is Superset?:

  • True - if data has all elements from x

  • False - if data is missing any element from x

>>> data = {1,2}
>>>
>>> data.issuperset({1})
True
>>> data.issuperset({1,2})
True
>>> data.issuperset({1,2,3})
False
>>> data.issuperset({1,3})
False
>>> data.issuperset({2,1})
True
>>> {1,2} > {1,2}
False
>>> {1,2} > {1,2,3}
False
>>> {1,2,3} > {1,2}
True
>>> {1,2} >= {1,2}
True
>>> {1,2} >= {1,2,3}
False
>>> {1,2,3} >= {1,2}
True

5.3.9. Basic Operations

Union (returns all elements from data and x, without duplicates):

>>> data = {1,2}
>>>
>>> data.union({1,2})
{1, 2}
>>> data.union({1,2,3})
{1, 2, 3}
>>> data.union({1,2,4})
{1, 2, 4}
>>> data.union({1,3}, {2,4})
{1, 2, 3, 4}
>>> {1,2} | {1,2}
{1, 2}
>>> {1,2,3} | {1,2}
{1, 2, 3}
>>> {1,2,3} | {1,2,4}
{1, 2, 3, 4}
>>> {1,2} | {1,3} | {2,4}
{1, 2, 3, 4}
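
.union() and | return a new set and leave the original unchanged. For an in-place variant, the |= operator can be used, as in this sketch:

```python
data = {1, 2}

result = data.union({3})   # new set, data is unchanged
unchanged = data == {1, 2}

data |= {3, 4}             # in-place union, same effect as data.update({3, 4})
```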

Difference (returns elements from data which are not in x):

>>> data = {1,2}
>>>
>>> data.difference({1,2})
set()
>>> data.difference({1,2,3})
set()
>>> data.difference({1,4})
{2}
>>> data.difference({1,3}, {2,4})
set()
>>> data.difference({3,4})
{1, 2}
>>> {1,2} - {2,3}
{1}
>>> {1,2} - {2,3} - {3}
{1}
>>> {1,2} - {1,2,3}
set()
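
Unlike union, difference depends on the order of operands. A minimal sketch with illustrative names `a` and `b`:

```python
a = {1, 2, 3}
b = {3, 4}

left = a - b    # elements of a that are not in b
right = b - a   # elements of b that are not in a
```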

Symmetric Difference (returns elements that are in data or in x, but not in both):

>>> data = {1,2}
>>>
>>> data.symmetric_difference({1,2})
set()
>>> data.symmetric_difference({1,2,3})
{3}
>>> data.symmetric_difference({1,4})
{2, 4}
>>> data.symmetric_difference({1,3}, {2,4})
Traceback (most recent call last):
TypeError: symmetric_difference() takes exactly one argument (2 given)
>>> data.symmetric_difference({3,4})
{1, 2, 3, 4}
>>> {1,2} ^ {1,2}
set()
>>> {1,2} ^ {2,3}
{1, 3}
>>> {1,2} ^ {1,3}
{2, 3}
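
Symmetric difference can be expressed with the other operations: union minus intersection. The sketch below also shows that the ^ operator can be chained, even though the method takes only one argument:

```python
a = {1, 2}
b = {2, 3}

sym = a ^ b
by_hand = (a | b) - (a & b)          # same result built from union and intersection

chained = {1, 2} ^ {1, 3} ^ {2, 4}   # evaluated left to right
```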

Intersection (returns common elements from data and x):

>>> data = {1,2}
>>>
>>> data.intersection({1,2})
{1, 2}
>>> data.intersection({1,2,3})
{1, 2}
>>> data.intersection({1,4})
{1}
>>> data.intersection({1,3}, {2,4})
set()
>>> data.intersection({1,3}, {1,4})
{1}
>>> data.intersection({3,4})
set()
>>> {1,2} & {2,3}
{2}
>>> {1,2} & {2,3} & {2,4}
{2}
>>> {1,2} & {2,3} & {3}
set()
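
A typical use of intersection is finding values shared by two collections. A short sketch with made-up lists `first` and `second`:

```python
first = ['a', 'b', 'c', 'a']
second = ['b', 'c', 'd']

# Convert both lists to sets, then keep only shared values
common = set(first) & set(second)
```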

5.3.10. Cardinality

>>> data = {1, 2, 3}
>>> len(data)
3
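
Because a set stores only unique values, len() of a set counts distinct elements. A minimal sketch comparing a list with its set:

```python
data = [1, 2, 3, 1, 1, 2, 4]

total = len(data)          # all items, including duplicates
unique = len(set(data))    # distinct items only
empty = len(set())         # an empty set has length 0
```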

5.3.11. Assignments

Code 5.7. Solution
"""
* Assignment: Sequence Set Create
* Required: yes
* Complexity: easy
* Lines of code: 1 line
* Time: 2 min

English:
    1. Create set `result` with elements:
        a. `'a'`
        b. `1`
        c. `2.2`
    2. Run doctests - all must succeed

Polish:
    1. Stwórz zbiór `result` z elementami:
        a. `'a'`
        b. `1`
        c. `2.2`
    2. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is set, \
    'Variable `result` has invalid type, should be set'
    >>> assert len(result) == 3, \
    'Variable `result` length should be 3'

    >>> 'a' in result
    True
    >>> 1 in result
    True
    >>> 2.2 in result
    True
"""

# set[str|int|float]: with 'a' and 1 and 2.2
result = ...

Code 5.8. Solution
"""
* Assignment: Sequence Set Many
* Required: yes
* Complexity: easy
* Lines of code: 9 lines
* Time: 8 min

English:
    1. Non-functional requirements:
        a. Assignment verifies creation of `set()` and usage of methods `.add()` and `.update()`
        b. For simplicity, type numerical values as `float`, not `str`
        c. Example: instead of '5.8' just type 5.8
        d. Do not use `str.split()`, `slice`, `getitem`, `for`, `while` or any other control-flow statement
    2. Create set `result` representing row with index 1
    3. Values from row at index 2 add to `result` using `.add()` (five calls)
    4. From row at index 3 create `set` and add it to `result` using `.update()` (one call)
    5. From row at index 4 create `tuple` and add it to `result` using `.update()` (one call)
    6. From row at index 5 create `list` and add it to `result` using `.update()` (one call)
    7. Run doctests - all must succeed

Polish:
    1. Wymagania niefunkcjonalne:
        a. Zadanie sprawdza tworzenie `set()` oraz użycie metod `.add()` i `.update()`
        b. Dla uproszczenia wartości numeryczne wypisuj jako `float`, a nie `str`
        c. Przykład: zamiast '5.8' zapisz 5.8
        d. Nie używaj `str.split()`, `slice`, `getitem`, `for`, `while` lub jakiejkolwiek innej instrukcji sterującej
    2. Stwórz zbiór `result` reprezentujący wiersz o indeksie 1
    3. Wartości z wiersza o indeksie 2 dodawaj do `result` używając `.add()` (pięć wywołań)
    4. Na podstawie wiersza o indeksie 3 stwórz `set` i dodaj go do `result` używając `.update()` (jedno wywołanie)
    5. Na podstawie wiersza o indeksie 4 stwórz `tuple` i dodaj go do `result` używając `.update()` (jedno wywołanie)
    6. Na podstawie wiersza o indeksie 5 stwórz `list` i dodaj go do `result` używając `.update()` (jedno wywołanie)
    7. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is set, \
    'Variable `result` has invalid type, should be set'
    >>> assert len(result) == 22, \
    'Variable `result` length should be 22'

    >>> ('sepal_length' not in result
    ...  and 'sepal_width' not in result
    ...  and 'petal_length' not in result
    ...  and 'petal_width' not in result
    ...  and 'species' not in result)
    True

    >>> result >= {5.8, 2.7, 5.1, 1.9, 'virginica'}
    True
    >>> result >= {5.1, 3.5, 1.4, 0.2, 'setosa'}
    True
    >>> result >= {5.7, 2.8, 4.1, 1.3, 'versicolor'}
    True
    >>> result >= {6.3, 2.9, 5.6, 1.8, 'virginica'}
    True
    >>> result >= {6.4, 3.2, 4.5, 1.5, 'versicolor'}
    True
"""

DATA = [
    'sepal_length,sepal_width,petal_length,petal_width,species',
    '5.8,2.7,5.1,1.9,virginica',
    '5.1,3.5,1.4,0.2,setosa',
    '5.7,2.8,4.1,1.3,versicolor',
    '6.3,2.9,5.6,1.8,virginica',
    '6.4,3.2,4.5,1.5,versicolor',
]

# set[float|str]: with row at DATA[1] (manually converted to float and str)
result = ...

# add to result float 5.1
result = ...

# add to result float 3.5
result = ...

# add to result float 1.4
result = ...

# add to result float 0.2
result = ...

# add to result str setosa
result = ...

# update result with set 5.7, 2.8, 4.1, 1.3, 'versicolor'
result = ...

# update result with tuple 6.3, 2.9, 5.6, 1.8, 'virginica'
result = ...

# update result with list 6.4, 3.2, 4.5, 1.5, 'versicolor'
result = ...