The Core Concept: Strings Are Immutable
This single fact explains most string behavior: you cannot change a string in place. Every "modification" creates a new string object.
    s = "hello"
    s[0] = "H"       # TypeError: can't modify in place
    s = s.upper()    # Works: creates a NEW string
Because strings are immutable, they can be used as dictionary keys, set members, and shared safely between variables. The trade-off is that every "change" allocates a new string in memory.
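As a small illustration of those consequences, the sketch below uses strings as dictionary keys and set members, then shows that rebinding one name leaves the string seen through another name untouched. The variable names are made up for the example.

```python
# Strings are hashable, so they work as dictionary keys and set members.
counts = {"apple": 2, "pear": 5}
seen = {"apple", "pear"}

# Two names can share one string safely: "changing" a only rebinds a.
a = "hello"
b = a
a = a.upper()   # a now refers to a new string; b still refers to the original

print(counts["apple"], "pear" in seen, a, b)
```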
Unicode by default: Python 3 strings (str) are sequences of Unicode code points, and raw binary data lives in the separate bytes type. len() counts characters, not bytes:
    len("café")    # 4 characters (the UTF-8 encoding is 5 bytes)
    len("😀")      # 1 character
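To make the character-versus-byte distinction concrete, here is a minimal sketch that compares the length of a string with the length of its UTF-8 encoding. The escapes \u00e9 and \U0001F600 are used only so the example does not depend on how the source file is saved.

```python
text = "caf\u00e9"            # "café" with a single precomposed é
emoji = "\U0001F600"          # the grinning-face emoji

char_count = len(text)                     # counts code points
byte_count = len(text.encode("utf-8"))     # é needs two bytes in UTF-8
emoji_chars = len(emoji)
emoji_bytes = len(emoji.encode("utf-8"))

print(char_count, byte_count)    # 4 5
print(emoji_chars, emoji_bytes)  # 1 4
```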
Creating Strings
Quote Styles
Pick whichever avoids escaping:
    'single quotes'
    "double quotes"
    "it's easy"        # No escaping needed
    'she said "hi"'    # No escaping needed
Triple Quotes — Multi-line Strings
    msg = """This spans
    multiple lines
    automatically"""

    # Also used for docstrings
    def greet(name):
        """Return a greeting for the given name."""
        return f"Hello, {name}!"
Raw Strings — No Escape Interpretation
    path = r"C:\new\folder"    # Backslashes are literal
    pattern = r"\d+\.\d+"      # Cleaner regex patterns
Use r"..." for Windows file paths and regular expressions — the two most common cases where backslashes cause trouble.
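A short sketch of the regex case: the raw string and the doubled-backslash string describe the same pattern, and the raw form is easier to read. The sample text is invented for the example.

```python
import re

# Both spellings produce the same pattern string.
pattern = r"\d+\.\d+"
same_pattern = (pattern == "\\d+\\.\\d+")

# Find the first number with a decimal point in some sample text.
match = re.search(pattern, "Release 3.12 is out")
version = match.group() if match else None

print(same_pattern, version)  # True 3.12
```

One caveat: a raw string cannot end with a single backslash, so r"C:\new\" is a syntax error.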
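Escape Sequences
    "\n"    # Newline
    "\t"    # Tab
    "\\"    # Literal backslash
    "\'"    # Literal single quote
    "\""    # Literal double quote
    "\0"    # Null character
A backslash in a normal string starts an escape sequence, so an unescaped Windows path is silently mangled:
    path = "C:\new\folder"
    print(path)
    # C:
    # ew
    # older   (\n became a newline, \f a form feed)
Use a raw string or double the backslashes instead:
    path = r"C:\new\folder"    # or path = "C:\\new\\folder"
    print(path)                # C:\new\folder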
f-strings and Formatting
f-strings (Python 3.6+) are the preferred way to embed expressions in strings:
    name = "Alice"
    age = 30
    price = 49.99

    # Basic interpolation
    f"{name} is {age} years old"

    # Format specifiers
    f"{price:.2f}"    # '49.99' (2 decimal places)
    f"{name:>10}"     # '     Alice' (right-aligned in 10 columns)
    f"{age:05d}"      # '00030' (zero-padded)

    # Expressions inside
    f"{name.upper():>10}"    # '     ALICE' (method call + format spec)

    # Debug shorthand (Python 3.8+)
    f"{name=}"        # "name='Alice'"
    f"{2 + 2=}"       # "2 + 2=4"
Other Formatting Styles
You'll see these in existing codebases:
    # .format() method
    "Hello, {}".format(name)
    "Hello, {0}. You are {1}.".format(name, age)

    # %-style (still common in logging)
    "Hello, %s. You are %d." % (name, age)
Prefer f-strings for new code. Use .format() when the template is defined separately from the values. The % style still appears in logging calls.
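As a sketch of those two situations, the example below keeps a template in a constant and fills it later with .format(), then passes %-style arguments to a logging call so the message is only formatted if the record is actually emitted. GREETING_TEMPLATE and render_greeting are hypothetical names chosen for illustration.

```python
import logging

# Template defined once, separately from the values that fill it.
GREETING_TEMPLATE = "Hello, {name}. You are {age}."


def render_greeting(name, age):
    """Fill the shared template with the given values."""
    return GREETING_TEMPLATE.format(name=name, age=age)


greeting = render_greeting("Alice", 30)
print(greeting)  # Hello, Alice. You are 30.

# logging takes a %-style message plus arguments and formats lazily.
logger = logging.getLogger(__name__)
logger.info("User %s logged in", "Alice")
```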
Slicing
Strings support the same slicing syntax as lists: s[start:stop:step]
    s = "Hello, World!"
    s[0:5]     # 'Hello' (start to stop, exclusive)
    s[7:]      # 'World!' (index 7 to end)
    s[:5]      # 'Hello' (start up to, but not including, index 5)
    s[-6:]     # 'World!' (last 6 characters)
    s[::2]     # 'Hlo ol!' (every other character)
    s[::-1]    # '!dlroW ,olleH' (reversed)
Out-of-range slices silently return what's available: "hi"[0:100] returns "hi". But direct indexing like "hi"[100] raises IndexError.
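The difference can be sketched with two small helpers. first_n and char_at are hypothetical names, not standard functions: the first relies on the forgiving behavior of slices, the second has to handle IndexError explicitly.

```python
def first_n(text, n):
    """Return up to n leading characters; a slice never raises IndexError."""
    return text[:n]


def char_at(text, index, default=""):
    """Return the character at index, or default if the index is out of range."""
    try:
        return text[index]
    except IndexError:
        return default


print(first_n("hi", 100))   # hi
print(char_at("hi", 1))     # i
print(repr(char_at("hi", 100)))  # ''
```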
Searching and Testing
    # Membership
    "py" in "python"    # True
    "PY" in "python"    # False (case-sensitive)

    # Starts/ends with
    "python".startswith("py")                 # True
    "python".endswith("on")                   # True
    "image.png".endswith((".png", ".jpg"))    # True (accepts a tuple)

    # Finding position
    "python".find("th")     # 2 (returns the index, or -1 if not found)
    "python".index("th")    # 2 (same, but raises ValueError if not found)

    # Counting
    "banana".count("a")     # 3
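    # index() raises when the substring is missing
    pos = "hello".index("xyz")    # ValueError: substring not found

    # find() returns -1 instead, so check before using the result
    pos = "hello".find("xyz")
    if pos != -1:
        print(f"Found at {pos}")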
Replacing and Transforming
    # Replacing
    "hello world".replace("world", "there")    # 'hello there'
    "aaa".replace("a", "b", 2)                 # 'bba' (limit replacements)

    # Case transformations
    "hello world".title()         # 'Hello World'
    "hello world".capitalize()    # 'Hello world'
    "HELLO".lower()               # 'hello'
    "hello".upper()               # 'HELLO'
    "Hello".swapcase()            # 'hELLO'
    "hello".casefold()            # 'hello' (aggressive lowercase for comparison)
All these methods return a new string. The original is unchanged. You must reassign: s = s.upper()
Stripping and Padding
    # Stripping whitespace
    "  messy  ".strip()     # 'messy'
    "  messy  ".lstrip()    # 'messy  '
    "  messy  ".rstrip()    # '  messy'

    # Stripping specific characters
    "***hi***".strip("*")     # 'hi'
    "xxhelloxx".strip("x")    # 'hello'

    # Padding and alignment
    "hi".center(10)    # '    hi    '
    "hi".ljust(10)     # 'hi        '
    "hi".rjust(10)     # '        hi'
    "42".zfill(5)      # '00042'

    # center/ljust/rjust accept a fill character
    "hi".center(10, "-")    # '----hi----'
Splitting and Joining
    # Basic split
    "a,b,c".split(",")            # ['a', 'b', 'c']

    # Join (called on the separator)
    ",".join(['a', 'b', 'c'])     # 'a,b,c'

    # Split lines
    "line1\nline2".splitlines()   # ['line1', 'line2']

    # Partition (splits on first/last occurrence)
    "a.b.c".partition(".")        # ('a', '.', 'b.c')
    "a.b.c".rpartition(".")       # ('a.b', '.', 'c')

    # Limit splits
    "a,b,c,d".split(",", 2)       # ['a', 'b', 'c,d']

    "a  b".split(" ")    # ['a', '', 'b']
                         # Literal single-space split:
                         # the double space yields an empty string

    "a  b".split()       # ['a', 'b']
                         # No argument = split on any run of
                         # whitespace and ignore empties
Type Checking Methods
    "abc".isalpha()       # True (letters only)
    "123".isdigit()       # True (digits only)
    "abc123".isalnum()    # True (letters or digits)
    " ".isspace()         # True (whitespace only)
    "Hello".istitle()     # True (title case)
    "HELLO".isupper()     # True (all uppercase)
    "hello".islower()     # True (all lowercase)
The methods above return False for empty strings: "".isdigit() is False. Two other is* methods, isascii() and isprintable(), are exceptions and return True for the empty string.
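Because of the empty-string behavior, a check built on isdigit() rejects blank input without a separate test. The helper name below is hypothetical, and the sketch only covers unsigned whole numbers.

```python
def is_unsigned_integer_text(text):
    """True only for non-empty text made up entirely of digit characters."""
    return text.isdigit()


checks = {
    "42": is_unsigned_integer_text("42"),      # digits only
    "": is_unsigned_integer_text(""),          # empty string
    "-5": is_unsigned_integer_text("-5"),      # the minus sign is not a digit
    "3.14": is_unsigned_integer_text("3.14"),  # the dot is not a digit
}
print(checks)
```

Note that isdigit() also accepts some non-ASCII digit characters, such as superscript digits, that int() rejects. When the goal is to parse a number, calling int() inside try/except ValueError is usually the more reliable test.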
Type Conversion
    # To string
    str(42)           # '42'
    str(3.14)         # '3.14'
    str(True)         # 'True'
    str([1,2,3])      # '[1, 2, 3]'

    # From string
    int("42")         # 42
    float("3.14")     # 3.14

    # String to list of characters
    list("abc")       # ['a', 'b', 'c']

    # Ordinal conversions
    ord("A")          # 65 (character to Unicode code point)
    chr(65)           # 'A' (code point to character)

    # String → bytes
    "café".encode("utf-8")             # b'caf\xc3\xa9'

    # Bytes → string
    b'caf\xc3\xa9'.decode("utf-8")     # 'café'
Common Pitfalls
Concatenation in Loops is O(n²)
    items = ["a", "b", "c"]

    # Slow: creates a new string each iteration
    result = ""
    for x in items:
        result += x

    # Better: join builds the result with a single allocation
    result = "".join(items)
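When the pieces are computed inside a loop, a common pattern is to collect them in a list and join once at the end. join() only accepts strings, so other values need converting first. The sample data below is made up for illustration.

```python
numbers = [1, 2, 3]

# Collect the pieces in a list, then join once.
parts = []
for n in numbers:
    parts.append(f"item-{n}")
labels = ", ".join(parts)

# join() requires strings, so convert non-string items first.
as_text = ", ".join(str(n) for n in numbers)

print(labels)   # item-1, item-2, item-3
print(as_text)  # 1, 2, 3
```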
is vs == for Strings
    a = "hello"
    b = "hello"
    a is b    # True (interned), but DON'T rely on this
    a == b    # True; always use == for string comparison
CPython interns some strings for performance, but this behavior is not guaranteed. is checks identity (same object), == checks equality (same value). Always use == for comparing string contents.
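A minimal sketch of why identity is unreliable: a string built at runtime can be equal in value to a literal without being the same object. The result of the identity check is an implementation detail, so the example deliberately does not state it.

```python
a = "hello world"
b = " ".join(["hello", "world"])   # built at runtime, same characters as a

same_value = (a == b)     # True: the contents match
same_object = (a is b)    # implementation detail; do not rely on either outcome

print(same_value)
```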
Forgetting to Reassign
    # The result of upper() is discarded
    s = "hello"
    s.upper()
    print(s)    # hello (unchanged!)

    # Reassign to keep the new string
    s = "hello"
    s = s.upper()
    print(s)    # HELLO
Case-Sensitive Operations
    # Exact comparison is case-sensitive
    user_input = "Yes"
    if user_input == "yes":
        print("confirmed")    # Never prints!

    # Normalize case before comparing
    user_input = "Yes"
    if user_input.lower() == "yes":
        print("confirmed")    # Works!
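For text that may contain non-English letters, casefold() is the safer normalizer. The sketch below uses the German ß, which lower() leaves alone but casefold() expands to "ss". same_text is a hypothetical helper name.

```python
def same_text(a, b):
    """Compare two strings while ignoring case, using casefold()."""
    return a.casefold() == b.casefold()


street = "Stra\u00dfe"   # "Straße"

with_casefold = same_text(street, "STRASSE")
with_lower = street.lower() == "STRASSE".lower()

print(same_text("Yes", "yes"))  # True
print(with_casefold)            # True
print(with_lower)               # False: lower() keeps the ß
```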
Summary: Key Takeaways
| Concept | Key Takeaway |
|---|---|
| Immutability | Every "modification" creates a new string — you must reassign |
| Quote styles | Use ', ", or """ — pick whichever avoids escaping |
| Raw strings | r"..." for Windows paths and regex patterns |
| f-strings | Preferred formatting: f"{expr:spec}" with full expression support |
| Slicing | s[start:stop:step] — never raises IndexError |
| in operator | Substring check: "py" in "python" (case-sensitive) |
| find() vs index() | find returns -1 on failure; index raises ValueError |
| split() vs split(" ") | No-arg split() handles any whitespace and strips empties |
| join() | Called on the separator: ",".join(list) |
| Concatenation | Use "".join() in loops, not += |
| Comparison | Always use ==, never is for string values |
| Case sensitivity | Normalize with .lower() or .casefold() before comparing |