Extract a substring from a string in Python (position, regex) | note.nkmk.me (2025)

This article explains how to extract a substring from a string in Python. You can get a substring by specifying its position and length, or by using regular expression (regex) patterns.

Contents

  • Extract a substring by specifying the position and length
    • Extract a character by index
    • Extract a substring by slicing
    • Extract based on the number of characters
  • Extract a substring with regex: re.search(), re.findall()
  • Regex pattern examples
    • Wildcard-like patterns
    • Greedy and non-greedy matching
    • Extract part of the pattern with parentheses
    • Match any single character
    • Match the start/end of the string
    • Extract by multiple patterns
    • Case-insensitive matching

To find the position of a substring or replace it with another string, see the following articles.

  • Search for a string in Python (Check if a substring is included/Get a substring position)
  • Replace strings in Python (replace, translate, re.sub, re.subn)

If you want to extract a substring from a text file, read the file as a string.

  • Read, write, and create files in Python (with and open())

Extract a substring by specifying the position and length

Extract a character by index

You can get a character at the desired position by specifying an index in []. Indexes start at 0 (zero-based indexing).

s = 'abcde'
print(s[0])
# a
print(s[4])
# e

You can specify a backward position with negative values. -1 represents the last character.

print(s[-1])
# e
print(s[-5])
# a

An IndexError is raised if you specify an index that doesn't exist.

# print(s[5])
# IndexError: string index out of range
# print(s[-6])
# IndexError: string index out of range

Extract a substring by slicing

You can extract a substring in the range start <= x < stop with [start:stop]. If start is omitted, the range begins at the start of the string, and if stop is omitted, the range extends to the end of the string.

s = 'abcde'
print(s[1:3])
# bc
print(s[:3])
# abc
print(s[1:])
# bcde

You can also use negative values.

print(s[-4:-2])
# bc
print(s[:-2])
# abc
print(s[-4:])
# bcde

If start > stop, no error is raised, and an empty string ('') is extracted.

print(s[3:1])
#
print(s[3:1] == '')
# True

Out-of-range values in a slice do not raise an error; the slice is limited to the part of the string that actually exists.

print(s[-100:100])
# abcde
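The difference between indexing and slicing at out-of-range positions can be sketched as follows. This is a minimal illustration; the variable names (error_message, beyond_end, partial) are made up for the example and are not from the original article.

```python
s = 'abcde'

# Indexing past the end raises IndexError...
try:
    s[10]
    error_message = None
except IndexError as e:
    error_message = str(e)

# ...but a slice that lies entirely past the end yields an empty string.
beyond_end = s[10:20]

# A slice that only partly overlaps the string returns the overlapping part.
partial = s[3:100]

print(error_message)
# string index out of range
print(repr(beyond_end))
# ''
print(partial)
# de
```

This makes slicing a convenient way to read "up to n characters" without checking the length first.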

In addition to the start position start and end position stop, you can also specify an increment step using the syntax [start:stop:step]. If step is negative, the substring will be extracted in reverse order.

print(s[1:4:2])
# bd
print(s[::2])
# ace
print(s[::3])
# ad
print(s[::-1])
# edcba
print(s[::-2])
# eca
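When step is negative, start and stop are still positions in the original string, so start has to be the larger of the two. The following sketch shows this; the variable names are illustrative only.

```python
s = 'abcde'

# Walk backward from index 3 down to (but not including) index 0.
backward = s[3:0:-1]

# Negative positions work the same way: from the last character back
# to (but not including) the fourth character from the end.
backward_negative = s[-1:-4:-1]

# With a negative step, start < stop selects nothing.
empty = s[1:3:-1]

print(backward)
# dcb
print(backward_negative)
# edc
print(empty == '')
# True
```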

For more information on slicing, see the following article.

  • How to slice a list, string, tuple in Python

Extract based on the number of characters

The built-in len() function returns the number of characters in a string. You can use it to get the central character or to extract the first or second half of the string by slicing.

Note that only integers (int) can be used in an index [] or a slice [:]. Dividing with / inside an index or slice raises an error because the result is a floating-point number (float).

The following example uses floor division (//), which rounds the result down to an integer.

s = 'abcdefghi'
print(len(s))
# 9
# print(s[len(s) / 2])
# TypeError: string indices must be integers
print(s[len(s) // 2])
# e
print(s[:len(s) // 2])
# abcd
print(s[len(s) // 2:])
# efghi
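The same idea extends to taking a fixed number of characters from either end, or to cutting a string into pieces of equal length. The helpers below (head, tail, chunks) are hypothetical names introduced for this sketch, not functions from the standard library.

```python
def head(s, n):
    """Return the first n characters of s."""
    return s[:n]


def tail(s, n):
    """Return the last n characters of s."""
    # s[-0:] would return the whole string, so handle n == 0 explicitly.
    return s[-n:] if n > 0 else ''


def chunks(s, size):
    """Split s into consecutive pieces of at most `size` characters."""
    return [s[i:i + size] for i in range(0, len(s), size)]


s = 'abcdefghi'
print(head(s, 3))
# abc
print(tail(s, 3))
# ghi
print(chunks(s, 4))
# ['abcd', 'efgh', 'i']
```

Because slices tolerate out-of-range positions, the last chunk is simply shorter when the length is not a multiple of size.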

Extract a substring with regex: re.search(), re.findall()

You can use regular expressions (regex) with the re module of the standard library.

  • Regular expressions with the re module in Python

Use re.search() to extract a substring matching a regex pattern. Specify the regex pattern as the first argument and the target string as the second argument.

import re
s = '012-3456-7890'
print(re.search(r'\d+', s))
# <re.Match object; span=(0, 3), match='012'>

In regex, \d matches a digit character, while + matches one or more repetitions of the preceding pattern. Therefore, \d+ matches one or more consecutive digits.

Since backslash \ is used in regex special sequences such as \d, it is convenient to use a raw string by adding r before '' or "".

  • Raw strings in Python

When a string matches the pattern, re.search() returns a match object. You can get the matched part as a string (str) using the group() method of the match object.

m = re.search(r'\d+', s)
print(m.group())
# 012
print(type(m.group()))
# <class 'str'>
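One detail worth keeping in mind: when nothing matches, re.search() returns None, and calling group() on None raises AttributeError. A minimal sketch of a guard follows; first_number is a hypothetical helper name used only for this example.

```python
import re


def first_number(text):
    """Return the first run of digits in text, or None if there is none."""
    m = re.search(r'\d+', text)
    # m is None when the pattern does not match anywhere in text.
    return m.group() if m else None


print(first_number('012-3456-7890'))
# 012
print(first_number('no digits here'))
# None
```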

For more information on regex match objects, see the following article.

  • How to use regex match objects in Python

As shown in the example above, re.search() returns the match object for the first occurrence only, even if there are multiple matching parts in the string.

re.findall() returns a list of all matching substrings.

print(re.findall(r'\d+', s))
# ['012', '3456', '7890']
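If you also need to know where each match occurs, re.finditer() yields a match object per occurrence instead of a plain string. The following is a small sketch of that approach; the variable name matches is illustrative.

```python
import re

s = '012-3456-7890'

# Each item is a match object, so both the text and its span are available.
matches = [(m.group(), m.span()) for m in re.finditer(r'\d+', s)]

print(matches)
# [('012', (0, 3)), ('3456', (4, 8)), ('7890', (9, 13))]
```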

Regex pattern examples

This section provides examples of regex patterns using metacharacters and special sequences.

Wildcard-like patterns

. matches any single character except a newline, and * matches zero or more repetitions of the preceding pattern.

For example, a.*b matches a string that starts with a and ends with b, with any characters in between. Since * also allows zero repetitions, the pattern matches ab as well.

print(re.findall('a.*b', 'axyzb'))
# ['axyzb']
print(re.findall('a.*b', 'a---b'))
# ['a---b']
print(re.findall('a.*b', 'aあいうえおb'))
# ['aあいうえおb']
print(re.findall('a.*b', 'ab'))
# ['ab']

+ matches one or more repetitions of the preceding pattern. a.+b does not match ab.

print(re.findall('a.+b', 'ab'))
# []
print(re.findall('a.+b', 'axb'))
# ['axb']
print(re.findall('a.+b', 'axxxxxxb'))
# ['axxxxxxb']

? matches zero or one repetition of the preceding pattern. For example, a.?b matches ab as well as any string with exactly one character between a and b.

print(re.findall('a.?b', 'ab'))
# ['ab']
print(re.findall('a.?b', 'axb'))
# ['axb']
print(re.findall('a.?b', 'axxb'))
# []

Greedy and non-greedy matching

*, +, and ? are greedy quantifiers: they match as much text as possible. In contrast, *?, +?, and ?? are non-greedy (minimal) quantifiers: they match as little text as possible.

s = 'axb-axxxxxxb'
print(re.findall('a.*b', s))
# ['axb-axxxxxxb']
print(re.findall('a.*?b', s))
# ['axb', 'axxxxxxb']
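A common place where the difference shows up is text with repeated delimiters, such as quoted phrases. The sample string below is made up for illustration.

```python
import re

s = 'say "hello" and "goodbye"'

# Greedy: .* runs to the last quotation mark in the string.
greedy = re.findall(r'".*"', s)

# Non-greedy: .*? stops at the nearest closing quotation mark.
non_greedy = re.findall(r'".*?"', s)

print(greedy)
# ['"hello" and "goodbye"']
print(non_greedy)
# ['"hello"', '"goodbye"']
```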

Extract part of the pattern with parentheses

If you enclose part of a regex pattern in parentheses (), you can extract a substring in that part.

print(re.findall('a(.*)b', 'axyzb'))
# ['xyz']

If you want to match parentheses () as characters, escape them with backslash \.

print(re.findall(r'\(.+\)', 'abc(def)ghi'))
# ['(def)']
print(re.findall(r'\((.+)\)', 'abc(def)ghi'))
# ['def']
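A pattern can contain more than one pair of parentheses. In that case, a match object exposes each part through group(n) and groups(), and re.findall() returns a list of tuples. The sketch below reuses the phone-number-like string from earlier; the second sample string is made up for illustration.

```python
import re

s = '012-3456-7890'
m = re.search(r'(\d+)-(\d+)-(\d+)', s)

whole = m.group(0)   # the entire match
middle = m.group(2)  # the second parenthesized part
parts = m.groups()   # all parenthesized parts as a tuple

# With several groups, findall() returns one tuple per match.
pairs = re.findall(r'(\w+)=(\d+)', 'x=1, y=22')

print(whole)
# 012-3456-7890
print(middle)
# 3456
print(parts)
# ('012', '3456', '7890')
print(pairs)
# [('x', '1'), ('y', '22')]
```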

Match any single character

Square brackets [] in a pattern match any single character from the enclosed set.

A hyphen - between two characters, as in [a-z], defines a range that covers the consecutive Unicode code points between them. For example, [a-z] matches any single lowercase ASCII letter.

print(re.findall('[abc]x', 'ax-bx-cx'))
# ['ax', 'bx', 'cx']
print(re.findall('[abc]+', 'abc-aaa-cba'))
# ['abc', 'aaa', 'cba']
print(re.findall('[a-z]+', 'abc-xyz'))
# ['abc', 'xyz']
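Several ranges can be combined in one set, and a leading ^ inside the brackets negates the set. This goes slightly beyond the examples above and is offered only as a sketch; the sample string is made up.

```python
import re

s = 'ff-00-zz-1a'

# Two ranges in one set: lowercase hexadecimal digits.
hex_runs = re.findall('[0-9a-f]+', s)

# A leading ^ negates the set: runs of anything except a hyphen.
non_hyphen_runs = re.findall('[^-]+', s)

print(hex_runs)
# ['ff', '00', '1a']
print(non_hyphen_runs)
# ['ff', '00', 'zz', '1a']
```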

Match the start/end of the string

By default, ^ matches only at the start of the string, and $ matches at the end of the string (or just before a newline at the very end of the string).

s = 'abc-def-ghi'
print(re.findall('[a-z]+', s))
# ['abc', 'def', 'ghi']
print(re.findall('^[a-z]+', s))
# ['abc']
print(re.findall('[a-z]+$', s))
# ['ghi']
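For multi-line text, the re.MULTILINE flag makes ^ and $ match at the start and end of every line rather than only at the ends of the whole string. A short sketch, using a made-up three-line string:

```python
import re

s = 'abc\ndef\nghi'

# Default: ^ and $ anchor to the whole string.
start_default = re.findall('^[a-z]+', s)
end_default = re.findall('[a-z]+$', s)

# re.MULTILINE: ^ also matches right after each newline.
start_multiline = re.findall('^[a-z]+', s, flags=re.MULTILINE)

print(start_default)
# ['abc']
print(end_default)
# ['ghi']
print(start_multiline)
# ['abc', 'def', 'ghi']
```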

Extract by multiple patterns

Use | to match a substring that conforms to any of the specified patterns. For example, to match substrings that follow either pattern A or pattern B, use A|B.

s = 'axxxb-012'
print(re.findall('a.*b', s))
# ['axxxb']
print(re.findall(r'\d+', s))
# ['012']
print(re.findall(r'a.*b|\d+', s))
# ['axxxb', '012']
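Note that | applies to everything on each side of it, so A|B C means "A, or B followed by C". To limit the alternatives to part of a pattern, wrap them in a non-capturing group (?:...). The sample string below is made up for illustration.

```python
import re

s = 'cat food, dog food'

# Without grouping: either 'cat' alone, or 'dog food'.
ungrouped = re.findall('cat|dog food', s)

# With a non-capturing group: 'cat' or 'dog', each followed by ' food'.
grouped = re.findall('(?:cat|dog) food', s)

print(ungrouped)
# ['cat', 'dog food']
print(grouped)
# ['cat food', 'dog food']
```

A non-capturing group is used here because plain parentheses would make re.findall() return only the parenthesized part.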

Case-insensitive matching

Matching in the re module is case-sensitive by default. Set the flags argument to re.IGNORECASE to perform case-insensitive matching.

s = 'abc-Abc-ABC'
print(re.findall('[a-z]+', s))
# ['abc', 'bc']
print(re.findall('[A-Z]+', s))
# ['A', 'ABC']
print(re.findall('[a-z]+', s, flags=re.IGNORECASE))
# ['abc', 'Abc', 'ABC']
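The flag affects only how the pattern is compared; the extracted substrings keep their original case. re.I is a short alias for re.IGNORECASE. A small sketch with a made-up log-like string:

```python
import re

s = 'Error: disk full; ERROR: retry; error: done'

# Case-sensitive: only the all-lowercase occurrence matches.
sensitive = re.findall('error', s)

# Case-insensitive: every spelling matches and is returned as written.
insensitive = re.findall('error', s, flags=re.IGNORECASE)

# re.I is equivalent to re.IGNORECASE.
insensitive_short = re.findall('error', s, flags=re.I)

print(sensitive)
# ['error']
print(insensitive)
# ['Error', 'ERROR', 'error']
```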
Article information

Author: Domingo Moore
