5.7. Sequence Slice

5.7.1. Rationale

  • Each slice argument must be an int (positive, negative or zero) or be omitted; step cannot be zero

  • Positive index starts with 0

  • Negative index starts with -1
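
The two numbering schemes can be compared side by side. This is a minimal sketch; the variable names are only illustrative.

```python
text = 'We choose to go to the Moon!'

# Positive indexes count from the left, starting at 0
first = text[0]

# Negative indexes count from the right, starting at -1
last = text[-1]

# A negative index -n points at the same element as len(text) - n
same_element = text[-5] == text[len(text) - 5]

print(first, last, same_element)
```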

5.7.2. Slice Forwards

  • sequence[start:stop]

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[0:2]
'We'
>>> text[:2]
'We'
>>> text[0:9]
'We choose'
>>> text[:9]
'We choose'
>>> text[23:28]
'Moon!'
>>> text[23:]
'Moon!'
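
The examples above rely on `stop` being exclusive. A short sketch of two consequences of that rule (the names `start`, `stop` and `pivot` are illustrative):

```python
text = 'We choose to go to the Moon!'
start, stop = 3, 9

# The element at index `stop` is not included in the result,
# so the slice has exactly stop - start elements
word = text[start:stop]
length_matches = len(word) == stop - start

# Because stop is exclusive, two slices sharing the same pivot
# split the sequence without overlap or gap
pivot = 9
rejoined = text[:pivot] + text[pivot:]

print(word, length_matches, rejoined == text)
```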

5.7.3. Slice Backwards

  • Negative index starts from the end and goes right to left

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:-13]
'We choose to go'
>>> text[:-19]
'We choose'
>>> text[-12:]
'to the Moon!'
>>> text[-5:]
'Moon!'
>>> text[-5:-1]
'Moon'
>>> text[23:-2]
'Moo'
>>>
>>> text[-1:0]
''
>>> text[-2:0]
''
>>> text[-2:2]
''
>>> text[-5:5]
''
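
The empty results above are easier to read once a negative index is rewritten as `len(text) + index`. The following sketch does that conversion by hand; it illustrates the rule and is not a description of how the interpreter is implemented.

```python
text = 'We choose to go to the Moon!'
length = len(text)  # 28

# text[-5:-1] selects the same elements as text[23:27]
a = text[-5:-1]
b = text[length - 5:length - 1]

# text[-2:2] selects the same elements as text[26:2];
# start lies to the right of stop, so with the default step of 1
# there is nothing to select
c = text[-2:2]
d = text[length - 2:2]

print(repr(a), repr(b), repr(c), repr(d))
```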

5.7.4. Step

  • Every n-th element

  • sequence[start:stop:step]

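  • start defaults to 0 (or to the last element when step is negative)

  • stop defaults to len(sequence) (or to just before the first element when step is negative)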

  • step defaults to 1

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[::1]
'We choose to go to the Moon!'
>>> text[::2]
'W hoet ot h on'
>>> text[::-1]
'!nooM eht ot og ot esoohc eW'
>>> text[::-2]
'!oMeto go soce'
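
When the step is negative, the omitted start and stop are filled in from the opposite ends. One way to inspect this is `slice.indices()`, which returns the concrete values a slice would use for a sequence of a given length. A minimal sketch:

```python
text = 'We choose to go to the Moon!'

# slice.indices(length) returns a (start, stop, step) tuple
# with all defaults resolved for a sequence of that length
forward = slice(None, None, 1).indices(len(text))
backward = slice(None, None, -1).indices(len(text))

# With a negative step the walk begins at the last element
reversed_text = text[len(text) - 1::-1]

print(forward)
print(backward)
print(reversed_text == text[::-1])
```

Note that the resolved stop of -1 for the backward case is meant for `range()`; written literally inside square brackets, -1 would again mean the last element.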

5.7.5. Out of Range

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:100]
'We choose to go to the Moon!'
>>>
>>> text[100:]
''
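
Slicing differs here from indexing a single element. The sketch below contrasts the two; the variable names are illustrative.

```python
text = 'We choose to go to the Moon!'

# Slicing silently clamps out-of-range bounds to the sequence ends
clamped = text[23:100]

# Indexing a single element outside the range raises IndexError
try:
    text[100]
except IndexError as error:
    message = str(error)

print(repr(clamped))
print(message)
```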

5.7.6. Ordered Sequences

Slicing str:

>>> data = 'abcde'
>>>
>>> data[0:3]
'abc'
>>> data[3:5]
'de'
>>> data[:3]
'abc'
>>> data[3:]
'de'
>>> data[::1]
'abcde'
>>> data[::-1]
'edcba'
>>> data[::2]
'ace'
>>> data[::-2]
'eca'
>>> data[1::2]
'bd'
>>> data[1:4:2]
'bd'

Slicing tuple:

>>> data = ('a', 'b', 'c', 'd', 'e')
>>>
>>> data[0:3]
('a', 'b', 'c')
>>> data[3:5]
('d', 'e')
>>> data[:3]
('a', 'b', 'c')
>>> data[3:]
('d', 'e')
>>> data[::2]
('a', 'c', 'e')
>>> data[::-1]
('e', 'd', 'c', 'b', 'a')
>>> data[1::2]
('b', 'd')
>>> data[1:4:2]
('b', 'd')

Slicing list:

>>> data = ['a', 'b', 'c', 'd', 'e']
>>>
>>> data[0:3]
['a', 'b', 'c']
>>> data[3:5]
['d', 'e']
>>> data[:3]
['a', 'b', 'c']
>>> data[3:]
['d', 'e']
>>> data[::2]
['a', 'c', 'e']
>>> data[::-1]
['e', 'd', 'c', 'b', 'a']
>>> data[1::2]
['b', 'd']
>>> data[1:4:2]
['b', 'd']
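
In all three cases above the result has the same type as the sliced sequence. The following sketch checks that directly; `range` is added as one more ordered sequence for illustration.

```python
samples = [
    'abcde',
    ('a', 'b', 'c', 'd', 'e'),
    ['a', 'b', 'c', 'd', 'e'],
]

# Slicing str gives str, tuple gives tuple, list gives list
same_type = [type(sample[1:4]) is type(sample) for sample in samples]

# range is also an ordered sequence; slicing it gives another range
numbers = range(5)[1:4]

print(same_type)
print(numbers)
```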

5.7.7. Unordered Sequences

Slicing set is not possible:

>>> data = {'a', 'b', 'c', 'd', 'e'}
>>>
>>> data[:3]
Traceback (most recent call last):
TypeError: 'set' object is not subscriptable

Slicing frozenset is not possible:

>>> data = frozenset({'a', 'b', 'c', 'd', 'e'})
>>>
>>> data[:3]
Traceback (most recent call last):
TypeError: 'frozenset' object is not subscriptable
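
If a part of a set is needed, one possible workaround is to convert it to an ordered sequence first. The sketch below uses `sorted()`, which is an assumption about the desired order; `list(data)` would also work, but the order of its elements is not guaranteed.

```python
data = {'a', 'b', 'c', 'd', 'e'}

# sorted() returns a list, which is ordered and can be sliced
ordered = sorted(data)
first_three = ordered[:3]

print(first_three)
```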

5.7.8. Nested Sequences

>>> DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...         (5.8, 2.7, 5.1, 1.9, 'virginica'),
...         (5.1, 3.5, 1.4, 0.2, 'setosa'),
...         (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...         (6.3, 2.9, 5.6, 1.8, 'virginica'),
...         (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...         (4.7, 3.2, 1.3, 0.2, 'setosa')]
...
>>> DATA[1:]  # doctest: +NORMALIZE_WHITESPACE
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> DATA[-3:]  # doctest: +NORMALIZE_WHITESPACE
[(6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[::2]  # doctest: +NORMALIZE_WHITESPACE
[[1, 2, 3],
 [7, 8, 9]]
>>>
>>> data[::2][1]
[7, 8, 9]
>>>
>>> data[::2][:1]
[[1, 2, 3]]
>>>
>>> data[::2][1][1:]
[8, 9]

5.7.9. Slice All

>>> text = 'We choose to go to the Moon!'
>>>
>>> text[:]
'We choose to go to the Moon!'
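
For a mutable sequence such as a list, slicing everything is a common way to make a shallow copy. The sketch below shows what "shallow" means: the outer list is new, but the nested lists are shared.

```python
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

copy = data[:]

# The outer list is a new object, so appending does not affect data
copy.append([10, 11, 12])

# The inner lists are shared, so this change is visible in data too
copy[0][0] = 99

print(len(data), len(copy))
print(data[0])
```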

Column selection unfortunately does not work on list:

>>> data = [[1, 2, 3],
...         [4, 5, 6],
...         [7, 8, 9]]
...
>>> data[:]
[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>>
>>> data[:, 1]
Traceback (most recent call last):
TypeError: list indices must be integers or slices, not tuple
>>>
>>> data[:][1]
[4, 5, 6]

However, this syntax is valid for numpy arrays, and pandas supports it through the .iloc and .loc indexers.
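
With plain lists, a column can still be selected by indexing or slicing each row in turn. A minimal sketch using a list comprehension:

```python
data = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]

# Take the element at index 1 from every row
column = [row[1] for row in data]

# Take all elements from index 1 onwards from every row
columns = [row[1:] for row in data]

print(column)
print(columns)
```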

5.7.10. Index Arithmetic

>>> text = 'We choose to go to the Moon!'
>>> first = 23
>>> last = 28
>>> step = 2
>>>
>>> text[first:last]
'Moon!'
>>> text[first:last-1]
'Moon'
>>> text[first:last:step]
'Mo!'
>>> text[first:last-1:step]
'Mo'
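
The indexes do not have to be hard-coded. As a sketch, they can be computed with `str.find()` and `len()`; the variable `word` is introduced here only for illustration.

```python
text = 'We choose to go to the Moon!'
word = 'Moon'

# str.find() returns the index of the first occurrence,
# or -1 when the substring is not present
first = text.find(word)
last = first + len(word)

found = text[first:last]
rest = text[last:]

print(first, last, found, rest)
```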

5.7.11. Slice Function

  • slice(start, stop, step) builds a reusable slice object

  • sequence[slice(start, stop, step)] is equivalent to sequence[start:stop:step]

  • Pass None for an argument that would be omitted in the bracket notation

>>> text = 'We choose to go to the Moon!'
>>>
>>> q = slice(23, 27)
>>> text[q]
'Moon'
>>>
>>> q = slice(None, 9)
>>> text[q]
'We choose'
>>>
>>> q = slice(23, None)
>>> text[q]
'Moon!'
>>>
>>> q = slice(23, None, 2)
>>> text[q]
'Mo!'
>>>
>>> q = slice(None, None, 2)
>>> text[q]
'W hoet ot h on'
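
One use of slice objects is giving names to the fields of a fixed-width record. The record layout below (two 10-character columns followed by an age) is hypothetical and serves only as an illustration.

```python
# Hypothetical fixed-width record: firstname (10), lastname (10), age
record = f"{'Mark':<10}{'Watney':<10}{41}"

FIRSTNAME = slice(0, 10)
LASTNAME = slice(10, 20)
AGE = slice(20, None)

firstname = record[FIRSTNAME].strip()
lastname = record[LASTNAME].strip()
age = int(record[AGE])

# A slice object exposes its arguments as attributes
bounds = (LASTNAME.start, LASTNAME.stop, LASTNAME.step)

print(firstname, lastname, age, bounds)
```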

5.7.12. Example

>>> from pprint import pprint
>>>
>>> DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
...         (5.8, 2.7, 5.1, 1.9, 'virginica'),
...         (5.1, 3.5, 1.4, 0.2, 'setosa'),
...         (5.7, 2.8, 4.1, 1.3, 'versicolor'),
...         (6.3, 2.9, 5.6, 1.8, 'virginica'),
...         (6.4, 3.2, 4.5, 1.5, 'versicolor'),
...         (4.7, 3.2, 1.3, 0.2, 'setosa')]
...
>>> pprint(DATA[1:])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor'),
 (4.7, 3.2, 1.3, 0.2, 'setosa')]
>>>
>>> pprint(DATA[1::2])
[(5.8, 2.7, 5.1, 1.9, 'virginica'),
 (5.7, 2.8, 4.1, 1.3, 'versicolor'),
 (6.4, 3.2, 4.5, 1.5, 'versicolor')]
>>>
>>> pprint(DATA[1::-2])
[(5.8, 2.7, 5.1, 1.9, 'virginica')]
>>>
>>> pprint(DATA[:1:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'),
 (6.3, 2.9, 5.6, 1.8, 'virginica'),
 (5.1, 3.5, 1.4, 0.2, 'setosa')]
>>>
>>> pprint(DATA[:-5:-2])
[(4.7, 3.2, 1.3, 0.2, 'setosa'), (6.3, 2.9, 5.6, 1.8, 'virginica')]
>>>
>>> pprint(DATA[1:-5:-2])
[]
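
To check which rows a slice with a negative step selects, the slice can be expanded into a list of indexes. The helper `selected_indexes` below is hypothetical, written for this illustration; it combines `slice.indices()` with `range()`.

```python
def selected_indexes(query: slice, length: int) -> list[int]:
    """Return the indexes that `query` picks from a sequence of `length`."""
    start, stop, step = query.indices(length)
    return list(range(start, stop, step))


rows = 7  # DATA above has a header and six records

every_second = selected_indexes(slice(1, None, 2), rows)    # DATA[1::2]
back_from_1 = selected_indexes(slice(1, None, -2), rows)    # DATA[1::-2]
back_to_1 = selected_indexes(slice(None, 1, -2), rows)      # DATA[:1:-2]
back_to_m5 = selected_indexes(slice(None, -5, -2), rows)    # DATA[:-5:-2]
nothing = selected_indexes(slice(1, -5, -2), rows)          # DATA[1:-5:-2]

print(every_second, back_from_1, back_to_1, back_to_m5, nothing)
```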

5.7.13. Assignments

Code 5.14. Solution
"""
* Assignment: Sequence Slice Text
* Required: yes
* Complexity: easy
* Lines of code: 8 lines
* Time: 8 min

English:
    1. Remove the academic title and military rank from each variable
    2. Also remove whitespace at the beginning and end of the text
    3. Use only `slice` to clean text
    4. Run doctests - all must succeed

Polish:
    1. Usuń tytuł naukowy i stopień wojskowy z każdej zmiennej
    2. Usuń również białe znaki na początku i końcu tekstu
    3. Użyj tylko `slice` do oczyszczenia tekstu
    4. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert a is not Ellipsis, \
    'Assign result to variable: `a`'
    >>> assert b is not Ellipsis, \
    'Assign result to variable: `b`'
    >>> assert c is not Ellipsis, \
    'Assign result to variable: `c`'
    >>> assert d is not Ellipsis, \
    'Assign result to variable: `d`'
    >>> assert e is not Ellipsis, \
    'Assign result to variable: `e`'
    >>> assert f is not Ellipsis, \
    'Assign result to variable: `f`'
    >>> assert g is not Ellipsis, \
    'Assign result to variable: `g`'
    >>> assert type(a) is str, \
    'Variable `a` has invalid type, should be str'
    >>> assert type(b) is str, \
    'Variable `b` has invalid type, should be str'
    >>> assert type(c) is str, \
    'Variable `c` has invalid type, should be str'
    >>> assert type(d) is str, \
    'Variable `d` has invalid type, should be str'
    >>> assert type(e) is str, \
    'Variable `e` has invalid type, should be str'
    >>> assert type(f) is str, \
    'Variable `f` has invalid type, should be str'
    >>> assert type(g) is str, \
    'Variable `g` has invalid type, should be str'

    >>> example
    'Mark Watney'
    >>> a
    'Jan Twardowski'
    >>> b
    'Jan Twardowski'
    >>> c
    'Mark Watney'
    >>> d
    'Melissa Lewis'
    >>> e
    'Ryan Stone'
    >>> f
    'Ryan Stone'
    >>> g
    'Jan Twardowski'
"""

example = 'lt. Mark Watney, PhD'
a = 'dr hab. inż. Jan Twardowski, prof. AATC'
b = 'gen. pil. Jan Twardowski'
c = 'Mark Watney, PhD'
d = 'lt. col. ret. Melissa Lewis'
e = 'dr n. med. Ryan Stone'
f = 'Ryan Stone, MD-PhD'
g = 'lt. col. Jan Twardowski\t'

example: str = example[4:-5]

# str: Jan Twardowski
a = ...

# str: Jan Twardowski
b = ...

# str: Mark Watney
c = ...

# str: Melissa Lewis
d = ...

# str: Ryan Stone
e = ...

# str: Ryan Stone
f = ...

# str: Jan Twardowski
g = ...

Code 5.15. Solution
"""
* Assignment: Sequence Slice Substr
* Required: yes
* Complexity: easy
* Lines of code: 3 lines
* Time: 5 min

English:
    1. Use `str.find()` and slicing
    2. Print `TEXT` without text in `REMOVE`
    3. Run doctests - all must succeed

Polish:
    1. Użyj `str.find()` oraz wycinania
    2. Wypisz `TEXT` bez tekstu z `REMOVE`
    3. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is str, \
    'Variable `result` has invalid type, should be str'

    >>> result
    'We choose the Moon!'
"""

TEXT = 'We choose to go to the Moon!'
REMOVE = 'to go to '

# str: TEXT without REMOVE part
result = ...

Code 5.16. Solution
"""
* Assignment: Sequence Slice Sequence
* Required: yes
* Complexity: easy
* Lines of code: 2 lines
* Time: 3 min

English:
    1. Create set `result` with every second element from `a` and `b`
    2. Run doctests - all must succeed

Polish:
    1. Stwórz zbiór `result` z co drugim elementem `a` i `b`
    2. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert result is not Ellipsis, \
    'Assign result to variable: `result`'
    >>> assert type(result) is set, \
    'Variable `result` has invalid type, should be set'

    >>> result
    {0, 2, 4}
"""

a = (0, 1, 2, 3)
b = [2, 3, 4, 5]

# set[int]: with every second element from `a` and `b`
result = ...

Code 5.17. Solution
"""
* Assignment: Sequence Slice Split
* Required: yes
* Complexity: easy
* Lines of code: 6 lines
* Time: 8 min

English:
    1. Separate header from data
    2. Write header (first line) to `header` variable
    3. Write data without header to `data` variable
    4. Calculate pivot point: number of records in `data` multiplied by PERCENT
    (division ratio below)
    5. Divide `data` into two lists:
        a. `train`: 60% - training data
        b. `test`: 40% - testing data
    6. From `data` write training data from start to pivot
    7. From `data` write test data from pivot to end
    8. Run doctests - all must succeed

Polish:
    1. Odseparuj nagłówek od danych
    2. Zapisz nagłówek (pierwsza linia) do zmiennej `header`
    3. Zapisz dane bez nagłówka do zmiennej `data`
    4. Wylicz punkt podziału: ilość rekordów w `data` razy PROCENT (
    proporcja podziału poniżej)
    5. Podziel `data` na dwie listy:
        a. `train`: 60% - dane do uczenia
        b. `test`: 40% - dane do testów
    6. Z `data` zapisz do uczenia rekordy od początku do punktu podziału
    7. Z `data` zapisz do testów rekordy od punktu podziału do końca
    8. Uruchom doctesty - wszystkie muszą się powieść

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert header is not Ellipsis, 'Assign result to variable: `header`'
    >>> assert data is not Ellipsis, 'Assign result to variable: `data`'
    >>> assert train is not Ellipsis, 'Assign result to variable: `train`'
    >>> assert test is not Ellipsis, 'Assign result to variable: `test`'
    >>> assert type(header) is tuple, 'Variable `header` has invalid type, should be tuple'
    >>> assert type(data) is list, 'Variable `data` has invalid type, should be list'
    >>> assert type(train) is list, 'Variable `train` has invalid type, should be list'
    >>> assert type(test) is list, 'Variable `test` has invalid type, should be list'
    >>> assert all(type(x) is tuple for x in train), 'All elements in `train` should be tuple'
    >>> assert all(type(x) is tuple for x in test), 'All elements in `test` should be tuple'
    >>> assert header not in train, 'Header should not be in `train`'
    >>> assert header not in test, 'Header should not be in `test`'

    >>> header  # doctest: +NORMALIZE_WHITESPACE
    ('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species')

    >>> train  # doctest: +NORMALIZE_WHITESPACE
    [(5.8, 2.7, 5.1, 1.9, 'virginica'),
     (5.1, 3.5, 1.4, 0.2, 'setosa'),
     (5.7, 2.8, 4.1, 1.3, 'versicolor'),
     (6.3, 2.9, 5.6, 1.8, 'virginica'),
     (6.4, 3.2, 4.5, 1.5, 'versicolor'),
     (4.7, 3.2, 1.3, 0.2, 'setosa')]

    >>> test  # doctest: +NORMALIZE_WHITESPACE
    [(7.0, 3.2, 4.7, 1.4, 'versicolor'),
     (7.6, 3.0, 6.6, 2.1, 'virginica'),
     (4.9, 3.0, 1.4, 0.2, 'setosa'),
     (4.9, 2.5, 4.5, 1.7, 'virginica')]
"""

DATA = [('Sepal length', 'Sepal width', 'Petal length', 'Petal width', 'Species'),
        (5.8, 2.7, 5.1, 1.9, 'virginica'),
        (5.1, 3.5, 1.4, 0.2, 'setosa'),
        (5.7, 2.8, 4.1, 1.3, 'versicolor'),
        (6.3, 2.9, 5.6, 1.8, 'virginica'),
        (6.4, 3.2, 4.5, 1.5, 'versicolor'),
        (4.7, 3.2, 1.3, 0.2, 'setosa'),
        (7.0, 3.2, 4.7, 1.4, 'versicolor'),
        (7.6, 3.0, 6.6, 2.1, 'virginica'),
        (4.9, 3.0, 1.4, 0.2, 'setosa'),
        (4.9, 2.5, 4.5, 1.7, 'virginica')]

# tuple[str]: with row at index 0 from DATA
header = ...

# list[tuple]: with rows at all the other indexes from DATA
data = ...

# list[tuple]: first 60% from data
train = ...

# list[tuple]: last 40% from data
test = ...