Python Strings — The Essentials

Immutable sequences of Unicode characters — understanding creation, manipulation, searching, and the common gotchas

The Core Concept: Strings Are Immutable

This single fact explains most string behavior: you cannot change a string in place. Every "modification" creates a new string object.

python
s = "hello"
s[0] = "H"      # TypeError — can't modify in place
s = s.upper()    # Works — creates a NEW string

Why Immutability Matters

Because strings are immutable, they can be used as dictionary keys, set members, and shared safely between variables. The trade-off is that every "change" allocates a new string in memory.
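A minimal sketch of both consequences described above; the variable names are illustrative only. It shows strings used as dictionary keys and set members, and a second name that keeps the old value after a "modification".

```python
# Strings are hashable, so they work as dictionary keys and set members.
word_counts = {"apple": 3, "pear": 1}
seen = {"apple", "pear"}

original = "hello"
alias = original          # both names refer to the same string object
original += " world"      # builds a new string and rebinds `original`

print(alias)              # hello
print(original)           # hello world
print("apple" in seen, word_counts["apple"])   # True 3
```

Because the old object is never altered, `alias` cannot be changed behind your back, which is what makes sharing safe.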

Unicode by default: Python 3 strings (str) are sequences of Unicode code points. Raw binary data lives in the separate bytes type, so len() counts characters rather than encoded bytes:

python
len("café")   # 4 characters, not 5 bytes
len("😀")    # 1 character
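To make the character-versus-byte distinction concrete, here is a small sketch that compares len() on a string with len() on its UTF-8 encoding. The literals use escape codes so the exact code points are unambiguous. The last case is an assumption-free but easily missed detail: an accent written as a separate combining mark counts as its own code point.

```python
text = "caf\u00e9"                    # 'café' with a single precomposed é
encoded = text.encode("utf-8")

print(len(text))                      # 4 code points
print(len(encoded))                   # 5 bytes: é takes two bytes in UTF-8

emoji = "\U0001F600"                  # grinning-face emoji
print(len(emoji))                     # 1 code point
print(len(emoji.encode("utf-8")))     # 4 bytes

decomposed = "cafe\u0301"             # 'e' followed by a combining acute accent
print(len(decomposed))                # 5 code points, although it renders like 'café'
```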

Creating Strings

Quote Styles

Pick whichever avoids escaping:

python
'single quotes'
"double quotes"
"it's easy"          # No escaping needed
'she said "hi"'      # No escaping needed

Triple Quotes — Multi-line Strings

python
msg = """This spans
multiple lines
automatically"""

# Also used for docstrings
def greet(name):
    """Return a greeting for the given name."""
    return f"Hello, {name}!"

Raw Strings — No Escape Interpretation

python
path = r"C:\new\folder"    # Backslashes are literal
pattern = r"\d+\.\d+"      # Cleaner regex patterns

When to Use Raw Strings

Use r"..." for Windows file paths and regular expressions — the two most common cases where backslashes cause trouble.
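A short sketch of raw strings in use, with the standard re module. The sample text is made up for illustration. It also shows one limitation worth knowing: a raw string literal cannot end in a single backslash, so a trailing backslash has to be added separately.

```python
import re

pattern = r"\d+\.\d+"                        # same pattern as above
matches = re.findall(pattern, "v1.5 and 22.75")
print(matches)                               # ['1.5', '22.75']

# The raw form keeps the backslash as its own character.
print(len("\n"), len(r"\n"))                 # 1 2

# r"C:\new\folder\" would be a syntax error, so append the last backslash.
folder = r"C:\new\folder" + "\\"
print(folder)                                # C:\new\folder\
```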

Escape Sequences

python
"\n"    # Newline
"\t"    # Tab
"\\"    # Literal backslash
"\'"    # Literal single quote
"\""    # Literal double quote
"\0"    # Null character

Surprise — Escape Interpreted

python
path = "C:\new\folder"
print(path)
# \n is read as a newline and \f as a form feed, so this prints roughly:
# C:
# ew
# older

Fix — Raw String or Double Backslash

python
path = r"C:\new\folder"
# or
path = "C:\\new\\folder"
print(path)
# C:\new\folder

f-strings and Formatting

f-strings (Python 3.6+) are the preferred way to embed expressions in strings:

python
name = "Alice"
age = 30
price = 49.99

# Basic interpolation
f"{name} is {age} years old"

# Format specifiers
f"{price:.2f}"               # '49.99' — 2 decimal places
f"{name:>10}"              # '     Alice' — right-aligned
f"{age:05d}"               # '00030' — zero-padded

# Expressions inside
f"{name.upper():>10}"      # Methods + format spec combined

# Debug shorthand (Python 3.8+)
f"{name=}"                 # "name='Alice'"
f"{2 + 2=}"               # "2 + 2=4"
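A few more format specifiers that come up often, as a sketch; the variable names and values are illustrative. Everything after the colon belongs to Python's format specification mini-language, and `!r` requests the repr() of the value.

```python
population = 1234567
ratio = 0.256
label = "hi"

print(f"{population:,}")      # 1,234,567  (thousands separator)
print(f"{ratio:.1%}")         # 25.6%      (percentage, one decimal place)
print(f"{255:#x}")            # 0xff       (hexadecimal with prefix)
print(f"[{label:^6}]")        # [  hi  ]   (centered in 6 columns)
print(f"{label!r}")           # 'hi'       (repr instead of str)
```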

Other Formatting Styles

You'll see these in existing codebases:

python
# .format() method
"Hello, {}".format(name)
"Hello, {0}. You are {1}.".format(name, age)

# %-style (still common in logging)
"Hello, %s. You are %d." % (name, age)

Which to Use?

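Prefer f-strings for new code. Use .format() when the template is defined separately from the values. The % style survives mainly in logging calls, where the logging module fills in the arguments itself, and only if the message is actually emitted.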
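The two non-f-string cases can be sketched as follows. `GREETING`, `render_greeting`, and the logger name "demo" are hypothetical names chosen for this example.

```python
import logging

# Template defined once, values supplied later: a natural fit for .format().
GREETING = "Hello, {name}. You are {age}."

def render_greeting(name, age):
    """Fill the shared template with the given values."""
    return GREETING.format(name=name, age=age)

print(render_greeting("Alice", 30))    # Hello, Alice. You are 30.

# logging takes a %-style template plus arguments and formats the message
# only if the record is actually emitted.
logger = logging.getLogger("demo")     # "demo" is an arbitrary logger name
logger.warning("User %s is %d years old", "Alice", 30)
```

Passing the arguments separately, rather than pre-formatting with an f-string, avoids the formatting cost for messages that are filtered out by the log level.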

Slicing

Strings support the same slicing syntax as lists: s[start:stop:step]

python
s = "Hello, World!"

s[0:5]      # 'Hello'       — start to stop (exclusive)
s[7:]       # 'World!'      — index 7 to end
s[:5]       # 'Hello'       — start to index 5
s[-6:]      # 'World!'      — last 6 characters
s[::2]      # 'Hlo ol!'     — every other character
s[::-1]     # '!dlroW ,olleH' — reversed

Slicing Never Raises IndexError

Out-of-range slices silently return what's available: "hi"[0:100] returns "hi". But direct indexing like "hi"[100] raises IndexError.
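This forgiving behavior can be put to use. The two helpers below are hypothetical names for a sketch that relies on it: one takes a prefix without a length check, the other reads a single character without risking IndexError.

```python
def first_chars(text, n):
    """Return up to n leading characters; a short string is returned whole."""
    return text[:n]

def char_or_empty(text, i):
    """Return the character at non-negative index i, or '' if out of range."""
    return text[i:i + 1]

print(first_chars("hi", 100))             # hi
print(char_or_empty("hi", 1))             # i
print(repr(char_or_empty("hi", 100)))     # ''
```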

Searching and Testing

python
# Membership
"py" in "python"                  # True
"PY" in "python"                  # False — case-sensitive

# Starts/ends with
"python".startswith("py")        # True
"python".endswith("on")          # True
"image.png".endswith((".png", ".jpg"))  # True — accepts a tuple

# Finding position
"python".find("th")              # 2 — returns index, or -1 if not found
"python".index("th")             # 2 — same but raises ValueError if not found

# Counting
"banana".count("a")              # 3

Fragile — index() Crashes

python
pos = "hello".index("xyz")
# ValueError: substring not found

Safe — find() Returns -1

python
pos = "hello".find("xyz")
if pos != -1:
    print(f"Found at {pos}")
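Both styles can be sketched on a small task: extracting the part of an address after the first "@". The function names are hypothetical. Note that the result of find() must be compared with -1 explicitly, because -1 is truthy and 0 (a match at the start) is falsy.

```python
def domain_of(address):
    """Return the text after the first '@', or None if there is no '@'."""
    pos = address.find("@")
    if pos == -1:                  # compare explicitly; `if pos:` would be wrong
        return None
    return address[pos + 1:]

def domain_of_strict(address):
    """Same result, written with index() and exception handling."""
    try:
        return address[address.index("@") + 1:]
    except ValueError:
        return None

print(domain_of("ada@example.org"))        # example.org
print(domain_of("no-at-sign"))             # None
print(domain_of_strict("@start"))          # start
```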

Replacing and Transforming

python
# Replacing
"hello world".replace("world", "there")   # 'hello there'
"aaa".replace("a", "b", 2)               # 'bba' — limit replacements

# Case transformations
"hello world".title()       # 'Hello World'
"hello world".capitalize()  # 'Hello world'
"HELLO".lower()             # 'hello'
"hello".upper()             # 'HELLO'
"Hello".swapcase()          # 'hELLO'
"hello".casefold()          # 'hello' — aggressive lowercase for comparison

Remember: Strings Are Immutable

All these methods return a new string. The original is unchanged. You must reassign: s = s.upper()

Stripping and Padding

python
# Stripping whitespace
"  messy  ".strip()          # 'messy'
"  messy  ".lstrip()         # 'messy  '
"  messy  ".rstrip()         # '  messy'

# Stripping specific characters
"***hi***".strip("*")       # 'hi'
"xxhelloxx".strip("x")     # 'hello'

python
# Padding and alignment
"hi".center(10)       # '    hi    '
"hi".ljust(10)        # 'hi        '
"hi".rjust(10)        # '        hi'
"42".zfill(5)         # '00042'

# center/ljust/rjust accept a fill character
"hi".center(10, "-")  # '----hi----'

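One detail of stripping specific characters is easy to misread: the argument to strip(), lstrip(), and rstrip() is treated as a set of characters, not as a prefix or suffix. The sketch below shows the resulting surprise and the dedicated methods removeprefix() and removesuffix(), which require Python 3.9 or later.

```python
filename = "report.txt"

# rstrip removes any trailing run of the characters '.', 't', and 'x'.
print(filename.rstrip(".txt"))          # repor

# removesuffix removes the exact suffix once, if present (Python 3.9+).
print(filename.removesuffix(".txt"))    # report
print("v2.1".removeprefix("v"))         # 2.1
print("2.1".removeprefix("v"))          # 2.1 (unchanged when the prefix is absent)
```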
Splitting and Joining

python
# Basic split
"a,b,c".split(",")                  # ['a', 'b', 'c']

# Join — called on the separator
",".join(['a', 'b', 'c'])           # 'a,b,c'

# Split lines
"line1\nline2".splitlines()         # ['line1', 'line2']

# Partition — splits on first/last occurrence
"a.b.c".partition(".")              # ('a', '.', 'b.c')
"a.b.c".rpartition(".")             # ('a.b', '.', 'c')

# Limit splits
"a,b,c,d".split(",", 2)             # ['a', 'b', 'c,d']

Gotcha — split(" ")

python
"a  b".split(" ")
# ['a', '', 'b']
# A literal single-space separator leaves an empty string between the two spaces

Better — split()

python
"a  b".split()
# ['a', 'b']
# No argument: split on runs of any whitespace and drop empty strings
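Combining these pieces, here is a sketch that parses a loosely formatted "key = value" line. The input text is invented for illustration. partition() isolates the key, and the split-then-join idiom collapses runs of whitespace in the value.

```python
raw = "  name =   Ada   Lovelace  "

key, sep, value = raw.partition("=")     # split once, at the first '='
key = key.strip()
value = " ".join(value.split())          # collapse whitespace runs to one space

print(repr(key))                         # 'name'
print(repr(value))                       # 'Ada Lovelace'

# The two split forms also differ on an empty string.
print("".split())                        # []
print("".split(" "))                     # ['']
```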

Type Checking Methods

python
"abc".isalpha()       # True  — letters only
"123".isdigit()       # True  — digits only
"abc123".isalnum()    # True  — letters or digits
"   ".isspace()       # True  — whitespace only
"Hello".istitle()     # True  — title case
"HELLO".isupper()     # True  — all uppercase
"hello".islower()     # True  — all lowercase

Empty String Edge Case

The is* methods shown above all return False for an empty string: "".isdigit() is False. (Two other is* methods, isascii() and isprintable(), return True for "".)
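A sketch of where these checks can mislead when validating input. `is_plain_integer` is a hypothetical helper; it pairs isdigit() with isascii() (Python 3.7+) because isdigit() also accepts digit-like characters outside 0-9.

```python
def is_plain_integer(text):
    """True only for non-empty strings made of the ASCII digits 0-9."""
    return text.isascii() and text.isdigit()

print("".isdigit())                  # False: empty string
print("\u00b2".isdigit())            # True: superscript two counts as a digit
print(is_plain_integer("\u00b2"))    # False
print(is_plain_integer("42"))        # True
print("-42".isdigit())               # False: the sign is not a digit
```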

Type Conversion

python
# To string
str(42)          # '42'
str(3.14)        # '3.14'
str(True)        # 'True'
str([1,2,3])    # '[1, 2, 3]'

# From string
int("42")        # 42
float("3.14")    # 3.14

# String to list of characters
list("abc")      # ['a', 'b', 'c']

# Ordinal conversions
ord("A")         # 65 — character to Unicode code point
chr(65)          # 'A' — code point to character

python
# String → bytes
"café".encode("utf-8")     # b'caf\xc3\xa9'

# Bytes → string
b'caf\xc3\xa9'.decode("utf-8")  # 'café'
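Conversions from strings can fail, so a sketch of the failure paths may help. `parse_int` is a hypothetical helper, not a built-in. The decoding example uses the standard `errors` parameter of bytes.decode().

```python
def parse_int(text, default=None):
    """Convert text to int, returning `default` when it is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

print(parse_int("42"))          # 42
print(parse_int(" 42 "))        # 42, int() ignores surrounding whitespace
print(parse_int("4.2"))         # None

# Invalid UTF-8 raises UnicodeDecodeError unless an error policy is given.
bad = b"caf\xff"
try:
    bad.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")
print(bad.decode("utf-8", errors="replace"))   # 'caf' plus the U+FFFD marker
```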

Common Pitfalls

Concatenation in Loops is O(n²)

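Slow — O(n²)

python
result = ""
for x in items:
    result += x
# Each += may copy everything built so far into a new string.
# CPython can sometimes extend the string in place, but that is an
# implementation detail, not a guarantee.

Fast — O(n)

python
result = "".join(items)
# join() sizes the result up front and copies each piece once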
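When the pieces are produced inside a loop, or are not strings yet, a common pattern is to collect them in a list and join once at the end. A minimal sketch with made-up data:

```python
items = [1, 2.5, "three"]

# Collect the pieces, converting each to str, then join once.
parts = []
for item in items:
    parts.append(str(item))
result = ", ".join(parts)
print(result)                                  # 1, 2.5, three

# The same thing as a single expression.
compact = ", ".join(str(item) for item in items)
print(compact == result)                       # True
```

join() raises TypeError if any element is not a string, which is why the str() conversion is needed here.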

is vs == for Strings

python
a = "hello"
b = "hello"
a is b   # True (interned), but DON'T rely on this
a == b   # True — always use == for string comparison

String Interning is an Implementation Detail

CPython interns some strings for performance, but this behavior is not guaranteed. is checks identity (same object), == checks equality (same value). Always use == for comparing string contents.
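A sketch of how identity and equality can diverge. The second string is built at runtime, so in CPython it is usually a separate object with the same contents; the result of the `is` line is therefore deliberately left unstated.

```python
a = "hello world"
b = " ".join(["hello", "world"])   # equal contents, built at runtime

print(a == b)    # True: same characters
print(a is b)    # usually False in CPython, but not guaranteed either way
```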

Forgetting to Reassign

Bug — Result Discarded

python
s = "hello"
s.upper()
print(s)  # 'hello' — unchanged!

Correct — Reassign

python
s = "hello"
s = s.upper()
print(s)  # 'HELLO'

Case-Sensitive Operations

Bug — Case Mismatch

python
user_input = "Yes"
if user_input == "yes":
    print("confirmed")
# Never prints!

Fix — Normalize Case

python
user_input = "Yes"
if user_input.lower() == "yes":
    print("confirmed")
# Works!
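For text that may contain non-English letters, casefold() is the more thorough normalizer. The sketch below uses the German sharp s (U+00DF), which lower() leaves unchanged but casefold() expands to "ss". `same_text` is a hypothetical helper name.

```python
def same_text(left, right):
    """Compare two strings while ignoring case differences."""
    return left.casefold() == right.casefold()

print(same_text("Yes", "YES"))                        # True
print("STRASSE".lower() == "stra\u00dfe".lower())     # False
print(same_text("STRASSE", "stra\u00dfe"))            # True
```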

Summary: Key Takeaways

Concept | Key Takeaway
Immutability | Every "modification" creates a new string — you must reassign
Quote styles | Use ', ", or """ — pick whichever avoids escaping
Raw strings | r"..." for Windows paths and regex patterns
f-strings | Preferred formatting: f"{expr:spec}" with full expression support
Slicing | s[start:stop:step] — never raises IndexError
in operator | Substring check: "py" in "python" — case-sensitive
find() vs index() | find returns -1 on failure; index raises ValueError
split() vs split(" ") | No-arg split() handles any whitespace and strips empties
join() | Called on the separator: ",".join(list)
Concatenation | Use "".join() in loops, not +=
Comparison | Always use ==, never is for string values
Case sensitivity | Normalize with .lower() or .casefold() before comparing