Python: From None to Machine Learning

10.13. Regex RE Search

  • re.search()

  • Scans a string for the first location where the pattern matches

  • Returns an re.Match object, or None if nothing matches
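
As a minimal sketch of the behaviour described above, the snippet below compares `re.search()` with `re.match()` and shows the `None` result for a pattern that does not occur. The variable names `found`, `at_start` and `missing` are illustrative assumptions, not part of the original material.

```python
import re

TEXT = 'We choose to go to the moon.'

# re.search() scans the whole string and returns the first match it finds
found = re.search(r'moon', TEXT)

# re.match() only tries to match at the very beginning of the string
at_start = re.match(r'moon', TEXT)

# When nothing matches, re.search() returns None instead of raising an error
missing = re.search(r'Mars', TEXT)

print(found.group())  # moon
print(at_start)       # None
print(missing)        # None
```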

10.13.1. Example

  • Usage of re.search()

>>> import re
>>>
>>>
>>> def contains(pattern, text):
...     return re.search(pattern, text) is not None
>>>
>>>
>>> COMMIT_MESSAGE = 'MYPROJ-1337, MYPROJ-997 removed obsolete comments'
>>> jira_issuekey = r'[A-Z]{2,10}-[0-9]{1,6}'
>>> redmine_number = r'#[0-9]+'
>>>
>>> contains(jira_issuekey, COMMIT_MESSAGE)
True
>>> contains(redmine_number, COMMIT_MESSAGE)
False
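
Note that `re.search()` reports only the first match, even when the pattern occurs several times. A small sketch, reusing the commit message and Jira pattern from the example above (the name `first` is an assumption made for illustration):

```python
import re

COMMIT_MESSAGE = 'MYPROJ-1337, MYPROJ-997 removed obsolete comments'
jira_issuekey = r'[A-Z]{2,10}-[0-9]{1,6}'

# The message holds two issue keys, but re.search() stops at the first one
first = re.search(jira_issuekey, COMMIT_MESSAGE)

print(first.group())  # MYPROJ-1337
print(first.span())   # (0, 11)
```

To collect every occurrence, `re.findall()` or `re.finditer()` would be the usual choice.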

>>> import re
>>>
>>>
>>> TEXT = 'We choose to go to the moon.'
>>>
>>> result = re.search(r'moon', TEXT)
>>>
>>> result
<re.Match object; span=(23, 27), match='moon'>
>>>
>>> result.span()
(23, 27)
>>>
>>> result.regs
((23, 27),)
>>>
>>> TEXT[23]
'm'
>>> TEXT[23:27]
'moon'

>>> import re
>>>
>>>
>>> TEXT = 'We choose to go to the moon.'
>>>
>>>
>>> result = re.search(r'Mars', TEXT)
>>>
>>> result.group()
Traceback (most recent call last):
AttributeError: 'NoneType' object has no attribute 'group'
>>>
>>> result = re.search(r'Mars', TEXT)
>>> if result:
...     result.group()
>>>
>>>
>>> if result := re.search(r'Mars', TEXT):
...     result.group()
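
The `None` check shown above can be wrapped in a small helper so that callers never touch a missing match. This is only a sketch: the function name `find_span` is hypothetical and does not come from the `re` module.

```python
import re


def find_span(pattern, text):
    """Return (start, end) of the first match, or None when nothing matches."""
    match = re.search(pattern, text)
    return match.span() if match else None


TEXT = 'We choose to go to the moon.'

print(find_span(r'moon', TEXT))  # (23, 27)
print(find_span(r'Mars', TEXT))  # None
```

Returning `None` rather than raising keeps the calling code free to decide how a missing match should be handled.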

10.13.2. Assignments

Code 10.44. Solution
"""
* Assignment: RE Search Astronauts
* Complexity: easy
* Lines of code: 6 lines
* Time: 5 min

English:
    1. Use `re.search()` to get start and end position in `TEXT`:
        a. Define `a: tuple[int,int]` for 'Neil Armstrong'
        b. Define `b: tuple[int,int]` for 'Buzz Aldrin'
        c. Define `c: tuple[int,int]` for 'Michael Collins'
        d. Define `d: tuple[int,int]` for 'July 21 at 02:56 UTC'
        e. Define `e: tuple[int,int]` for 'Tranquility Base'
        f. Define `f: tuple[int,int]` for 'Mark Watney'
    2. For each element return a tuple, e.g. `(10, 20)`
    3. If element is not present in `TEXT` assign `None`
    4. Run doctests - all must succeed

Hints:
    * `re.Match.span()`

References:
    [1] Wikipedia: Apollo 11
        URL: https://en.wikipedia.org/wiki/Apollo_11
        Year: 2019
        Retrieved: 2019-12-14

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert type(a) is tuple, 'a must be a tuple'
    >>> assert type(b) is tuple, 'b must be a tuple'
    >>> assert type(c) is tuple, 'c must be a tuple'
    >>> assert type(d) is tuple, 'd must be a tuple'
    >>> assert type(e) is tuple, 'e must be a tuple'
    >>> assert f is None, 'f must be None'

    >>> assert len(a) == 2, 'a must be a tuple with two elements'
    >>> assert len(b) == 2, 'b must be a tuple with two elements'
    >>> assert len(c) == 2, 'c must be a tuple with two elements'
    >>> assert len(d) == 2, 'd must be a tuple with two elements'
    >>> assert len(e) == 2, 'e must be a tuple with two elements'

    >>> assert all(type(x) is int for x in a), 'a must be a tuple[int,int]'
    >>> assert all(type(x) is int for x in b), 'b must be a tuple[int,int]'
    >>> assert all(type(x) is int for x in c), 'c must be a tuple[int,int]'
    >>> assert all(type(x) is int for x in d), 'd must be a tuple[int,int]'
    >>> assert all(type(x) is int for x in e), 'e must be a tuple[int,int]'

    >>> a
    (78, 92)
    >>> b
    (116, 127)
    >>> c
    (562, 577)
    >>> d
    (326, 346)
    >>> e
    (761, 777)
"""

import re


TEXT = ("Apollo 11 was the spaceflight that first landed humans on the Moon. "
        "Commander Neil Armstrong and lunar module pilot Buzz Aldrin formed "
        "the American crew that landed the Apollo Lunar Module Eagle on "
        "July 20, 1969, at 20:17 UTC. Armstrong became the first person to "
        "step onto the lunar surface six hours and 39 minutes later on "
        "July 21 at 02:56 UTC; Aldrin joined him 19 minutes later. They spent "
        "about two and a quarter hours together outside the spacecraft, "
        "and they collected 47.5 pounds (21.5 kg) of lunar material to bring "
        "back to Earth. Command module pilot Michael Collins flew the command "
        "module Columbia alone in lunar orbit while they were on the Moon's "
        "surface. Armstrong and Aldrin spent 21 hours, 36 minutes on the "
        "lunar surface at a site they named Tranquility Base before lifting "
        "off to rejoin Columbia in lunar orbit. ")


# use re.search() to get the (start, end) position of 'Neil Armstrong', or None
# type: tuple[int,int] | None
a = ...

# use re.search() to get the (start, end) position of 'Buzz Aldrin', or None
# type: tuple[int,int] | None
b = ...

# use re.search() to get the (start, end) position of 'Michael Collins', or None
# type: tuple[int,int] | None
c = ...

# use re.search() to get the (start, end) position of 'July 21 at 02:56 UTC', or None
# type: tuple[int,int] | None
d = ...

# use re.search() to get the (start, end) position of 'Tranquility Base', or None
# type: tuple[int,int] | None
e = ...

# use re.search() to get the (start, end) position of 'Mark Watney', or None
# type: tuple[int,int] | None
f = ...


Code 10.45. Solution
"""
* Assignment: RE Search Moon Speech
* Complexity: easy
* Lines of code: 5 lines
* Time: 8 min

English:
    1. Use `re.search()` to search `TEXT` [1]
    2. Define `result: str` containing the paragraph starting with 'We choose to go to the moon'
    3. Run doctests - all must succeed

References:
    [1] Kennedy, J.F. Moon Speech - Rice Stadium,
        URL: http://er.jsc.nasa.gov/seh/ricetalk.htm
        Year: 2019
        Retrieved: 2019-12-14

Hints:
    * Every HTML paragraph starts with `<p>` and ends with `</p>`
    * In real life, parsing paragraphs is more complex

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert type(result) is str, 'result must be a str'
    >>> assert not result.startswith('<p>'), 'result cannot start with <p>'
    >>> assert not result.endswith('</p>'), 'result cannot end with </p>'

    >>> result  # doctest: +NORMALIZE_WHITESPACE
    'We choose to go to the moon. We choose to go to the moon in this decade
     and do the other things, not because they are easy, but because they are
     hard, because that goal will serve to organize and measure the best of our
     energies and skills,because that challenge is one that we are willing to
     accept, one we are unwilling to postpone, and one which we intend to win,
     and the others, too.'
"""

import re


TEXT = ("<h1>TEXT OF PRESIDENT JOHN KENNEDY'S RICE STADIUM MOON SPEECH</h1>\n"
        "<p>President Pitzer, Mr. Vice President, Governor, "
        "Congressman Thomas, Senator Wiley, and Congressman Miller, Mr. Webb, "
        "Mr. Bell, scientists, distinguished guests, and ladies and "
        "gentlemen:</p><p>We choose to go to the moon. We choose to go to "
        "the moon in this decade and do the other things, not because they "
        "are easy, but because they are hard, because that goal will serve "
        "to organize and measure the best of our energies and skills,because "
        "that challenge is one that we are willing to accept, one we are "
        "unwilling to postpone, and one which we intend to win, and the "
        "others, too.</p><p>It is for these reasons that I regard the "
        "decision last year to shift our efforts in space from low to high "
        "gear as among the most important decisions that will be made during "
        "my incumbency in the office of the Presidency.</p><p>In the last 24 "
        "hours we have seen facilities now being created for the greatest "
        "and most complex exploration in man's history. We have felt the "
        "ground shake and the air shattered by the testing of a Saturn C-1 "
        "booster rocket, many times as powerful as the Atlas which launched "
        "John Glenn, generating power equivalent to 10,000 automobiles with "
        "their accelerators on the floor. We have seen the site where the F-1 "
        "rocket engines, each one as powerful as all eight engines of the "
        "Saturn combined, will be clustered together to make the advanced "
        "Saturn missile, assembled in a new building to be built at Cape "
        "Canaveral as tall as a 48 story structure, as wide as a city block, "
        "and as long as two lengths of this field.</p>")


# use re.search() to get paragraph starting with "We choose..."
# use .group(1) to get the value from re.Match object
# type: str
result = ...
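
The hints above mention `<p>` tags and `.group(1)`. As a hedged sketch of how a capturing group and a lazy quantifier work together, the snippet below runs on a short, hypothetical `HTML` string rather than on the speech itself; the names `HTML`, `lazy` and `greedy` are assumptions made for illustration.

```python
import re

# Hypothetical sample, much shorter than the speech used in the assignment
HTML = '<h1>Title</h1><p>First paragraph.</p><p>Second one. It is longer.</p>'

# The literal prefix 'Second' selects the paragraph we want;
# (.*?) is lazy, so it stops at the first closing </p> that follows
lazy = re.search(r'<p>(Second.*?)</p>', HTML)

# A greedy (.*) runs to the last </p> and swallows the tags in between
greedy = re.search(r'<p>(.*)</p>', HTML)

# group(1) returns only the text captured inside the parentheses
print(lazy.group(1))    # Second one. It is longer.
print(greedy.group(1))  # First paragraph.</p><p>Second one. It is longer.
```

By default `.` does not match a newline, so a paragraph spanning several lines would additionally need the `re.DOTALL` flag.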


Code 10.46. Solution
"""
* Assignment: RE Search Time
* Complexity: easy
* Lines of code: 4 lines
* Time: 3 min

English:
    1. Use regular expressions to check whether `TEXT` [1]
       contains a time in UTC (24-hour clock compliant with ISO-8601)
    2. Define `result: str` with the matched time
    3. Use simplified checking `xx:xx UTC`,
       where `x` is any digit
    4. Text does not contain any invalid time
    5. Run doctests - all must succeed

References:
    [1] Wikipedia Apollo 11,
        URL: https://en.wikipedia.org/wiki/Apollo_11
        Year: 2019
        Retrieved: 2019-12-14

Hints:
    * `re.Match.group()`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert type(result) is str, 'result must be a str'
    >>> assert result.endswith('UTC'), 'result must contain timezone'

    >>> result
    '20:17 UTC'
"""

import re


TEXT = ("Apollo 11 was the spaceflight that first landed humans on the Moon. "
        "Commander Neil Armstrong and lunar module pilot Buzz Aldrin formed "
        "the American crew that landed the Apollo Lunar Module Eagle on July "
        "20, 1969, at 20:17 UTC. Armstrong became the first person to step "
        "onto the lunar surface six hours and 39 minutes later on July 21 at "
        "02:56 UTC; Aldrin joined him 19 minutes later. They spent about two "
        "and a quarter hours together outside the spacecraft, and they "
        "collected 47.5 pounds (21.5 kg) of lunar material to bring back to "
        "Earth. Command module pilot Michael Collins flew the command module "
        "Columbia alone in lunar orbit while they were on the Moon's surface. "
        "Armstrong and Aldrin spent 21 hours, 36 minutes on the lunar surface "
        "at a site they named Tranquility Base before lifting off to rejoin "
        "Columbia in lunar orbit.")


# Pattern for searching time with timezone in 24-hour format, e.g. '23:59 UTC'
# Text does not contain any invalid time
# type: str
pattern = ...

# use re.search() to find pattern in TEXT, get result text
# use .group() to get the value from re.Match object
# type: str
result = ...
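
A simplified `xx:xx UTC` check can be sketched as follows. The `SAMPLE` sentence and the name `found_time` are hypothetical and chosen only for this illustration; the pattern accepts any two digits on each side of the colon, so it does not validate the hour or minute range.

```python
import re

# Hypothetical sample sentence, not the assignment text
SAMPLE = 'Liftoff was at 13:32 UTC and the landing followed at 20:17 UTC.'

# Two digits, a colon, two digits, a space and the literal 'UTC'
pattern = r'[0-9]{2}:[0-9]{2} UTC'

match = re.search(pattern, SAMPLE)

# group() without arguments returns the whole matched text
found_time = match.group() if match else None
print(found_time)  # 13:32 UTC
```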


Code 10.47. Solution
"""
* Assignment: RE Search Time
* Complexity: easy
* Lines of code: 4 lines
* Time: 5 min

English:
    1. Use regular expressions to check whether `TEXT` [1]
       contains a time in UTC (24-hour clock compliant with ISO-8601)
    2. Define `result: str` with the matched time
    3. Use strict checking `xx:xx UTC`,
       where each `x` is a digit valid at its position
    4. Text contains an invalid time `24:56 UTC`
    5. Run doctests - all must succeed

References:
    [1] Wikipedia Apollo 11,
        URL: https://en.wikipedia.org/wiki/Apollo_11
        Year: 2019
        Retrieved: 2019-12-14

Hints:
    * `re.Match.group()`

Tests:
    >>> import sys; sys.tracebacklimit = 0

    >>> assert type(result) is str, 'result must be a str'
    >>> assert result.endswith('UTC'), 'result must contain timezone'

    >>> result
    '02:56 UTC'
"""

import re


TEXT = ("Apollo 11 was the spaceflight that first landed humans on the Moon. "
        "Commander Neil Armstrong and lunar module pilot Buzz Aldrin formed "
        "the American crew that landed the Apollo Lunar Module Eagle on July "
        "20, 1969, at 24:56 UTC. Armstrong became the first person to step "
        "onto the lunar surface six hours and 39 minutes later on July 21 at "
        "02:56 UTC; Aldrin joined him 19 minutes later. They spent about two "
        "and a quarter hours together outside the spacecraft, and they "
        "collected 47.5 pounds (21.5 kg) of lunar material to bring back to "
        "Earth. Command module pilot Michael Collins flew the command module "
        "Columbia alone in lunar orbit while they were on the Moon's surface. "
        "Armstrong and Aldrin spent 21 hours, 36 minutes on the lunar surface "
        "at a site they named Tranquility Base before lifting off to rejoin "
        "Columbia in lunar orbit.")


# Pattern for searching time with timezone in 24-hour format, e.g. '23:59 UTC'
# Text contains an invalid time `24:56 UTC`
# type: str
pattern = ...

# use re.search() to find pattern in TEXT, get result text
# use .group() to get the value from re.Match object
# type: str
result = ...
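
One possible way to restrict each digit to the values valid at its position is sketched below. It is an assumption-laden illustration, not the book's reference solution: the `SAMPLE` text and the names `simple` and `strict` are hypothetical. Hours are limited to 00-23 and minutes to 00-59.

```python
import re

# Hypothetical sample with one invalid and one valid time
SAMPLE = 'Logged as 24:56 UTC by mistake; the actual time was 02:56 UTC.'

# Simplified pattern: any two digits on each side of the colon
simple = r'[0-9]{2}:[0-9]{2} UTC'

# Strict pattern: hours 00-19 or 20-23, minutes 00-59;
# \b keeps the match from starting in the middle of a longer number
strict = r'\b(?:[01][0-9]|2[0-3]):[0-5][0-9] UTC'

print(re.search(simple, SAMPLE).group())  # 24:56 UTC
print(re.search(strict, SAMPLE).group())  # 02:56 UTC
```

The simplified pattern stops at the invalid `24:56 UTC`, while the strict one skips it and reports the first valid time.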


© Copyright 2023, CC-BY-SA-4.0, Matt Harasymczuk <matt@astrotech.io>, last update: 2023-04-01 Revision aa288c0f.