String Comparison in Python
In Python, strings are more than just sequences of characters; they can be compared in ways that influence the flow of your code. Whether you are checking for equality or sorting data, string comparison is a core operation that every developer needs to understand. But how exactly does Python decide which string is "greater" or "less"? Dive into the mechanics behind string comparison and discover how understanding this concept can improve your Python programming.
Table of Contents
- String comparison
- String conversion
- Escape sequences
- Raw string literals
- String formatting
- String formatting using the format() method of the string class
String Comparison
The operators is and is not are used to compare the identity of strings
and other objects. They check whether the two strings occupy the same
space in memory.
The comparison operators ==, !=, <, >, <= and >= are used to compare
strings. As usual, they return a Boolean value, True or False. Two strings are considered equal if their content is exactly the same.
>>> s1 = 'Python'
>>> s2 = 'Python'
>>> s1 == s2
Output
True
>>> s1 != s2
Output
False
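The difference between identity and equality is easiest to see with a string that is built at run time. The following is a minimal sketch; the names literal and built are made up for this example, and whether two equal strings share one object is an implementation detail, so the result of is should not be relied on.

```python
# Equality compares content; identity compares the objects themselves.
literal = 'Python'
built = ''.join(['Py', 'thon'])   # same content, assembled at run time

print(literal == built)           # True: the characters match
print(literal is built)           # implementation-dependent, do not rely on it

# Use == for string content; reserve `is` for singletons such as None.
same_content = literal == built
```

In everyday code, == is almost always the operator you want for strings.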
The comparisons performed by the comparison operators are case-sensitive.
For example, 'Python' and 'python' will not be considered equal. To
ignore case and perform case-insensitive comparisons, you can convert both
strings to either lowercase or uppercase by using the lower or upper
method.
>>> s1 = 'Python'
>>> s2 = 'python'
>>> s1 == s2
Output
False
>>> s1.lower() == s2.lower()
Output
True
>>> s1.upper() == s2.upper()
Output
True
The casefold() method can also be used for caseless matching of
strings, as it returns a casefolded copy of the string. It is more aggressive
than lower() and is intended for strings that contain non-ASCII Unicode
characters; for example, it converts the German letter 'ß' to 'ss'.
>>> s1.casefold() == s2.casefold()
Output
True
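The difference between lower() and casefold() only shows up with certain non-ASCII letters. The following sketch uses two spellings of a German word as sample data; the variable names are hypothetical.

```python
# casefold() is more aggressive than lower() for some non-ASCII letters.
german_1 = 'Straße'
german_2 = 'STRASSE'

lower_match = german_1.lower() == german_2.lower()        # 'straße' vs 'strasse'
fold_match = german_1.casefold() == german_2.casefold()   # 'strasse' vs 'strasse'

print(lower_match)   # False
print(fold_match)    # True
```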
The comparison operators compare strings character by character according
to the Unicode code point of each character (for ASCII characters, this is the
same as the ASCII value). Lowercase letters are considered larger than the
corresponding uppercase letters because they have higher code points.
>>> ord('P')
Output
80
>>> ord('p')
Output
112
>>> 'Python' < 'python'
Output
True
When the string contains all lowercase or all uppercase letters, the
comparison is done in regular alphabetic order as in a dictionary.
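The effect of code point ordering is most visible when sorting mixed-case data. The sketch below uses a made-up list of words and shows one common way to get a dictionary-like order, by passing str.casefold as the sort key.

```python
words = ['banana', 'apple', 'Cherry']   # sample data for illustration

# The default sort uses code points, so uppercase letters come first.
by_code_point = sorted(words)

# A key function gives a case-insensitive, dictionary-like order.
by_casefold = sorted(words, key=str.casefold)

print(by_code_point)   # ['Cherry', 'apple', 'banana']
print(by_casefold)     # ['apple', 'banana', 'Cherry']

# Comparison stops at the first differing character; a prefix is "less".
print('apple' < 'apply')        # True, because 'e' < 'y'
print('apple' < 'applesauce')   # True, 'apple' is a prefix
```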
String Conversion
A type can be converted to another type using the type name as a function if
the conversion is supported. Suppose you have a string that represents a
number.
>>> s = '23'
The type of this variable s is str, so you cannot perform any arithmetic
operation supported by int type.
>>> s + 1
Output
TypeError: can only concatenate str (not "int") to str
However, you can perform operations by converting s to int or float.
>>> x = int(s)
>>> x + 1
Output
24
>>> float(s) / 2
Output
11.5
>>> s = 'UP05788'
>>> n = int(s[2:])
>>> n + 1
Output
5789
You can similarly convert strings to other types like list or set. We will
see these types in the coming sections. The conversion to int is a bit
different from the others, as int can also take a second argument that
specifies the base; we have discussed this in the previous section.
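As a quick reminder, the optional second argument of int is the base of the number written in the string. The sketch below also shows one way to handle strings that are not valid numbers; the helper to_int_or_none is hypothetical and not part of Python.

```python
# int() accepts an optional base (2 to 36) as its second argument.
print(int('1100100', 2))   # 100
print(int('144', 8))       # 100
print(int('64', 16))       # 100

def to_int_or_none(text, base=10):
    """Hypothetical helper: return an int, or None if text is not a number."""
    try:
        return int(text, base)
    except ValueError:
        # int() raises ValueError for strings it cannot parse.
        return None

print(to_int_or_none('23'))    # 23
print(to_int_or_none('23a'))   # None
```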
The type name str can be used as a function to create string objects. If the
argument you send is already a string, the function str returns a string with
the same value (an implementation may return the same object rather than a
copy, since strings are immutable). If the argument is a non-string type, it
returns a string object that represents the string form of the argument,
provided the argument is convertible to a string.
If we try to concatenate a string with a number, a TypeError is raised. The
number must be converted to a string by using the str function.
>>> s1 = 'UP05'
>>> n = 2456
>>> s1 + n
Output
TypeError: can only concatenate str (not "int") to str
>>> s1 + str(n)
Output
'UP052456'
The functions bin, oct and hex can also convert a number to a string in
an appropriate base.
>>> bin(100)
Output
'0b1100100'
>>> oct(100)
Output
'0o144'
>>> hex(100)
Output
'0x64'
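The strings produced by bin, oct and hex carry a prefix that names the base. As a small sketch, the format function can produce the digits without the prefix, and int with base 0 can read the prefixed form back.

```python
n = 100

# bin/oct/hex include a prefix; format() can omit it.
print(bin(n), format(n, 'b'))   # 0b1100100 1100100
print(hex(n), format(n, 'x'))   # 0x64 64

# int() with base 0 infers the base from the prefix.
round_trip = [int(text, 0) for text in (bin(n), oct(n), hex(n))]
print(round_trip)               # [100, 100, 100]
```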
Escape Sequences
Inside a string, the backslash (\) is considered an escape character. It is used
to indicate that the following character has special meaning, so it should not
be treated in the regular way. We have already seen how to include a newline
and a tab using the character combinations '\n' and '\t'. These
character combinations are examples of escape sequences. Each combination,
such as '\n' or '\t', is treated as a single character in the resulting
string.
Escape sequences are special character representations that are represented
by a combination of characters where the first character is a backslash,
followed by one or more characters. When they appear inside a string, they
are replaced by the single character that they represent. Escape sequences let
us embed special non-printing characters that cannot be typed on a
keyboard in a string. They also resolve ambiguity, such as printing a single
quote inside a single quoted string.
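To make the idea of "one character" concrete, the sketch below stores a few common escape sequences in a dictionary (the labels are made up for this example) and prints the length and code point of each.

```python
# Each escape sequence below is stored as exactly one character.
samples = {
    'newline': '\n',
    'tab': '\t',
    'backslash': '\\',
    'single quote': '\'',
    'double quote': '\"',
}
for label, ch in samples.items():
    print(f'{label:13} len={len(ch)} code point={ord(ch)}')

# Hex and Unicode escapes name a character by its code point.
print('\x41' == 'A')           # True
print('\u00e9' == chr(0xe9))   # True
```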
Let us use these escape sequences in our strings. We know that '\n'
represents a newline character, and when it is written inside a string, it will
start a new line on the screen.
>>> print('How\nare\nyou')
Output
How
are
you
Here we are printing a string that contains the escape sequence '\n'. We
can see that each '\n' is replaced with a newline character, which is printed
as a line break. So, you can print the text of a single string on
multiple lines. Let us see the length of this string.
>>> len('How\nare\nyou')
Output
11
The escape sequence '\n' is counted as just one character, so we have
3+1+3+1+3, which is 11. If we use the escape sequence '\t', then it is
replaced by a tab character, which inserts horizontal space between two values.
>>> print('How\tare\nyou')
Output
How     are
you
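An escape sequence is so called because it escapes the usual meaning of a letter
or character (like n in '\n') and gives it a whole new meaning.
When Python does not recognize the character after a backslash as an escape
code, it just keeps the backslash literally in the string. Recent versions also
warn about such unrecognized sequences (a DeprecationWarning since Python 3.6
and a SyntaxWarning since Python 3.12), so it is better not to rely on this
behavior.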
>>> print('H\el\lo')
Output
H\el\lo
Here, e and l are not escape codes, so the backslash is included in the string
as itself and is not treated specially. The replacement is done only when the
backslash is followed by a valid escape code.
Now, suppose we want to print or use a string that contains a Windows
path.
>>> print('C:\textfiles\newFile')
Output
C:      extfiles
ewFile
Both '\t' and '\n' are recognized as escape sequences, so
they are replaced by their respective characters. However, we do not want
this replacement to be done in this case. We want to print the backslash
literally, even when it is followed by an escape code. To print a literal
backslash character, you must use double backslashes.
>>> print('C:\\textfiles\\newFile')
Output
C:\textfiles\newFile
Now, the backslashes are printed literally. We could also use raw strings, as
we will discuss shortly.
If we try to print a string containing a single quote and enclosed inside single
quotes, we will get a syntax error.
>>> print('Don't run')
Output
SyntaxError: unterminated string literal
One solution to this problem is to enclose the whole string inside double
quotes instead of single quotes. Another solution is to use an escape
sequence.
>>> print('Don\'t run')
Output
Don't run
Here, the interpreter sees that the single quote is preceded by a backslash, so
it will print a single quote; it will not use this single quote to end the string.
This way, you can insert a single quote inside a string enclosed in single
quotes and similarly, you can insert a double quote inside a double quoted
string.
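The options for quoting can be compared side by side. In this short sketch the variable names are arbitrary; all three spellings produce the same string.

```python
a = "Don't run"        # double quotes outside, no escape needed
b = 'Don\'t run'       # single quotes outside, apostrophe escaped
c = '''Don't run'''    # triple quotes also work

print(a == b == c)     # True

# Both kinds of quote in one string: escape the one that matches the delimiter.
d = 'She said "Don\'t run"'
print(d)               # She said "Don't run"
```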
Raw String Literals
If you want to turn off the backslash escape mechanism in a string, you can
precede the string literal with the letter r. These are called raw strings. They
treat backslash as a literal character and not as an escape character. Every
character inside a raw string stays the way it is written inside the string.
>>> s = r'hello\n'
>>> print(s)
Output
hello\n
Raw strings can be helpful when you have strings that contain many
backslashes, such as Windows paths and regular expressions.
>>> print(r'C:\Deepali\newFiles')
Output
C:\Deepali\newFiles
Here, '\n' is not considered an escape sequence. Since the string is
preceded by r, it is a raw string. The interpreter considers the backslash as a
normal character of the string and not as a start of an escape sequence. If we
remove r, then '\n' is considered an escape sequence.
>>> print('C:\Deepali\newFiles')
Output
C:\Deepali
ewFiles
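A raw string and a string with doubled backslashes are just two spellings of the same value. The sketch below checks this, and also shows one limitation: a raw string literal cannot end in a single backslash, so a trailing separator has to be added separately. The variable names are made up for this example.

```python
raw = r'C:\textfiles\newFile'
escaped = 'C:\\textfiles\\newFile'

print(raw == escaped)    # True
print(len(raw))          # 20

# r'C:\textfiles\' would be a syntax error, so the final backslash
# is appended as an ordinary escaped string.
folder = r'C:\textfiles' + '\\'
print(len(folder))       # 13
```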
String Formatting
Suppose we have three variables of type str, int and float.
>>> name = 'Raj'
>>> age = 23
>>> wt = 43.567
We know that we can create a string by concatenating string literals and
variables.
>>> s = 'My name is ' + name + ', I am ' + str(age) + ' years old and my weight is ' + str(wt) + ' kg'
>>> s
Output
'My name is Raj, I am 23 years old and my weight is 43.567 kg'
Here, the string literals and variables are joined with the + operator, and
the numbers have to be converted with str first. Till now, we have been using
simple approaches like this for displaying our data, but they are not very
readable. Python has different formatting styles that we can use to format
values and display the output in an organized way.
We need to format strings to present data in a better way. This is required
when data is to be displayed to the program’s user in a readable and
understandable manner. In the following image, you can clearly see the
difference between the data displayed without any formatting and after
formatting.
String formatting also allows us to interpolate values of variables into
strings, which means that we can insert values inside strings using different formats.
There are three ways of formatting strings in Python. There is no need to
learn all of them, but knowing them is good as you might encounter them in
someone else’s code. The first is the old-style formatting, which uses the %
operator, much like printf in the C language. This style is still fully
supported, although the newer styles are generally preferred for new code.
>>> name = 'Raj'
>>> age = 23
>>> wt = 47.5
>>> s = 'My name is %s, I am %d years old and my weight is %f kg' % (name, age, wt)
>>> s
Output
'My name is Raj, I am 23 years old and my weight is 47.500000 kg'
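The % style also accepts width and precision between the percent sign and the conversion letter. The following is a small sketch with the same sample values; the layout chosen here is only for illustration.

```python
name, age, wt = 'Raj', 23, 47.5

# %-10s left-justifies in 10 columns, %5d right-justifies in 5,
# and %.1f keeps one digit after the decimal point.
line = '%-10s|%5d|%.1f' % (name, age, wt)
print(line)       # Raj       |   23|47.5

# A dictionary can supply the values by name.
by_name = '%(name)s is %(age)d' % {'name': name, 'age': age}
print(by_name)    # Raj is 23
```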
A newer style, which uses the format method of the string class, was
introduced in Python 3.0 and is also available in Python 2.6 and later.
>>> name = 'Raj'
>>> age = 23
>>> wt = 47.5
>>> s = 'My name is {}, I am {} years old and my weight is {} kg'.format(name, age, wt)
>>> print(s)
Output
My name is Raj, I am 23 years old and my weight is 47.5 kg
The curly braces act as placeholders for the data and the values are sent as
arguments to the format method.
In Python 3.6, a new formatting approach was introduced that uses
formatted string literals, also called f-strings.
>>> name = 'Raj'
>>> age = 23
>>> wt = 47.5
>>> s = f'My name is {name}, I am {age} years old and my weight is {wt} kg'
>>> print(s)
Output
My name is Raj, I am 23 years old and my weight is 47.5 kg
Using these f-string literals, you can embed Python expressions inside a
string literal using curly braces. They are called f-strings because you get a
formatted string literal by prefixing a string with the letter f.
So, when we have a string literal prefixed with f, any variable inside curly
braces is substituted with its value. You can see that this style is much clearer
than the previous two. It is the simplest one because you can directly insert
the names inside the string literal. You might encounter the format method
style in other code, so it is discussed in the next section. In the rest of
this section, we will use f-strings.
Using f-strings, you simply write your string and, wherever you want to
substitute the value of a variable, put it inside curly braces. You can even
write Python expressions inside curly braces or call functions and methods
directly.
>>> name = 'Raj'
>>> age = 23
>>> wt = 47.567
>>> f'After 10 years {name.upper()} will be {age + 10} years old'
Output
'After 10 years RAJ will be 33 years old'
We have called the str method upper and used the expression age +
10. Curly braces are used to hold the variables or expressions; they are not
displayed. If you want to print left and right curly braces, double them up.
>>> f'He is {{ {name}, {age} }}'
Output
'He is { Raj, 23 }'
The double curly braces are displayed as a single curly brace.
You can specify a field width where the given value will be displayed.
>>> f'His name is {name:8} and he is {age:6} years old'
Output
'His name is Raj      and he is     23 years old'
The numbers 8 and 6 represent the field width, so the variable name is
displayed in a width of 8, and age is displayed in a field width of 6. By
default, the text is left-aligned and numbers are right-aligned in their field.
We can force left alignment by using the less-than sign <. Similarly, right
alignment can be forced using the greater-than sign > and center alignment
using the caret sign ^.
>>> f'His name is {name:>8} and he is {age:<6} years old'
Output
'His name is      Raj and he is 23     years old'
Now name is right-aligned and age is left-aligned.
>>> f'His name is {name:^8} and he is {age:^6} years old'
Output
'His name is   Raj    and he is   23   years old'
Now, both name and age are center-aligned in their fields.
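Alignment is most useful when several lines have to line up in columns. The sketch below prints a tiny table; the rows are made-up sample data and the column widths of 10 and 5 are arbitrary choices.

```python
rows = [('Raj', 23), ('Deepali', 31), ('Sam', 7)]   # hypothetical sample data

# Header and data rows share the same widths, so the columns line up.
lines = [f'{"Name":<10}{"Age":>5}']
for person, years in rows:
    lines.append(f'{person:<10}{years:>5}')

print('\n'.join(lines))
```

Every line is exactly 15 characters wide, with names flush left and ages flush right.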
To print an integer in a fixed point format, write :f.
>>> f'Age is {age:f} and weight is {wt}'
Output
'Age is 23.000000 and weight is 47.567'
The variable age is an integer, but since we have included :f, it is printed
with a point. We can also control the number of digits that are displayed.
>>> f'Age is {age:.3f} and weight is {wt}'
Output
'Age is 23.000 and weight is 47.567'
Now, only three decimal digits are displayed. We can also specify the width.
>>> f'Age is {age:<10.3f} and weight is {wt}'
Output
'Age is 23.000     and weight is 47.567'
The number 10 is the field width, and the less than symbol is for left
justification. Now, let us format the float value wt.
>>> f'Age is {age:<10.3f} and weight is {wt:.3}'
Output
'Age is 23.000     and weight is 47.6'
We have specified a colon, a dot, and the number 3. Without a type letter, this
number represents the total number of significant digits displayed, so three
digits are shown in total. Let us specify a width for it.
>>> f'Age is {age:<10.3f} and weight is {wt:8.3}'
Output
'Age is 23.000     and weight is     47.6'
Now, eight spaces are reserved to display this value. If you want to control
the number of digits displayed after the decimal, use the letter f.
>>> f'Age is {age:<10.3f} and weight is {wt:8.3f}'
Output
'Age is 23.000     and weight is   47.567'
The number 3 represents the number of digits displayed after the decimal.
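The difference between a precision with and without the letter f is easy to miss, so here it is side by side. The sketch also shows that the width and precision can come from variables by nesting curly braces; the names width and places are made up for this example.

```python
wt = 47.567

print(f'[{wt:.3}]')      # [47.6]      three significant digits
print(f'[{wt:.3f}]')     # [47.567]    three digits after the point
print(f'[{wt:8.1f}]')    # [    47.6]  width 8, one decimal place

# Width and precision can themselves come from variables (nested fields).
width, places = 10, 2
cell = f'{wt:{width}.{places}f}'
print(f'[{cell}]')       # [     47.57]
```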
By default, your output fields will be padded using spaces; if you want another
character to be used for padding, you can place it just after the colon, before
the alignment specifier. This character fills the unused positions when the
data is shorter than the assigned field width. It is called the fill character,
and it can be any character except '{' or '}'.
>>> f'My name is {name:*^10} and age is {age:->12}'
Output
'My name is ***Raj**** and age is ----------23'
The variable name is center aligned in a field width of 10, while the asterisk
is a fill character. The variable age is right-aligned in a field width of 12 and the dash is a fill character. The fill character must be specified before the
alignment specifier and if you want to specify a fill character, it is necessary
to specify an alignment specifier. We know that numbers are right-justified
by default, but we have still specified the right alignment specifier because
we wanted padding done by dashes instead of spaces.
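A common use of the fill character is zero padding. As a sketch, the example below compares an explicit fill of 0 with the shorthand of writing 0 directly before the width; the two differ for negative numbers. The variable debt is made up for this example.

```python
age = 23
padded = f'{age:0>5}'        # fill '0', right-aligned in 5 columns
shorthand = f'{age:05}'      # shorthand for zero padding
print(padded, shorthand)     # 00023 00023

# With negative numbers the shorthand keeps the sign in front of the zeros.
debt = -23
print(f'{debt:0>5}')         # 00-23
print(f'{debt:05}')          # -0023
```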
Escape sequences are interpreted as usual inside f-strings too. If you want to
suppress the escape mechanism, you can write raw f-strings.
>>> print(fr'\name: {name}')
Output
\name: Raj
Here, \n is not considered an escape sequence.
We can write triple-quoted f-strings that span multiple lines.
>>> s = f'''My name is {name}, I am {age} years old 
... and my weight is {wt} kg'''
>>> s
Output
'My name is Raj, I am 23 years old \nand my weight is 47.567 kg'
>>> print(s)
Output
My name is Raj, I am 23 years old 
and my weight is 47.567 kg
An integer can be displayed in hexadecimal, octal or binary base.
>>> num = 1247
>>> f'{num:x} {num:o} {num:b}'
Output
'4df 2337 10011011111'
We can use lowercase e or uppercase E to display a number in exponential
notation.
>>> num1 = 0.00000082478
>>> num2 = 3345600000000
>>> f'{num1:e} {num2:e} {num1:E} {num2:E}'
Output
'8.247800e-07 3.345600e+12 8.247800E-07 3.345600E+12'
If we have a big number and want to print the thousands separator, we can
write a comma after the colon.
>>> f'{num2:,}'
Output
'3,345,600,000,000'
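The separator can be combined with a precision and a width, and an underscore can be used in place of the comma. In the sketch below, price is a made-up value.

```python
num2 = 3345600000000
price = 1234567.891   # hypothetical value for illustration

underscored = f'{num2:_}'
grouped = f'{price:,.2f}'
aligned = f'{price:>15,.2f}'

print(underscored)    # 3_345_600_000_000
print(grouped)        # 1,234,567.89
print(aligned)        # same digits, right-aligned in 15 columns
```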
Many times, in our programs, we need to display the value of variables and
expressions with their names.
>>> name = 'Raj'
>>> age = 23
>>> print(f'name = {name}, age = {age}')
Output
name = Raj, age = 23
>>> a = 14
>>> b = 12
>>> print(f'a + b = {a + b} , a - b = {a - b}')
Output
a + b = 26 , a - b = 2
>>> print(f'min(a,b) = {min(a,b)}, max(a,b) = {max(a,b)}')
Output
min(a,b) = 12, max(a,b) = 14
Instead of duplicating the name of the thing to be printed, we can specify it
once, followed by an equals sign, inside the curly braces. This feature
requires Python 3.8 or later.
>>> print(f'{name = }, {age = }')
Output
name = 'Raj', age = 23
>>> print(f'{a + b = }, {a - b = }')
Output
a + b = 26, a - b = 2
>>> print(f'{min(a,b) = }, {max(a,b) = }')
Output
min(a,b) = 12, max(a,b) = 14
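The equals sign can be combined with a format specifier or a conversion. The short sketch below assumes Python 3.8 or later and reuses the sample variables from this section.

```python
name = 'Raj'
wt = 47.567

rounded = f'{wt = :.1f}'     # a format spec applies to the value
plain = f'{name = !s}'       # !s uses str() instead of repr()
compact = f'{name=}'         # spacing around = is reproduced as written

print(rounded)   # wt = 47.6
print(plain)     # name = Raj
print(compact)   # name='Raj'
```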
String Formatting Using the format() Method of the String Class
f-strings were introduced in Python 3.6. If you are using an older version,
you have to use the format method to format strings.
>>> name = 'Raj'
>>> age = 23
>>> wt = 47.567
>>> s = 'My name is {}, I am {} years old and my weight is {} kg'.format(name, age, wt)
>>> s
Output
'My name is Raj, I am 23 years old and my weight is 47.567 kg'
When the curly braces are empty, the interpreter will substitute based on the
order of arguments sent in the format method. In the above example, the
first pair of curly braces are replaced with name, the second pair with age,
and the third pair with wt.
We can use index numbers inside curly braces to decide what goes where
while substituting values inside the string.
>>> s = 'My name is {0}, I am {1} years old and my weight is {2} kg'.format(name, age, wt)
>>> s
Output
'My name is Raj, I am 23 years old and my weight is 47.567 kg'
The value 0 refers to the first argument, 1 refers to the second argument and
2 refers to the third argument. This way, you can change the order of the
variables and use a data value even more than once.
>>> s = 'Age {1} years, Name {0}, weight {2} kg, bye from {0}'.format(name, age, wt)
>>> s
Output
'Age 23 years, Name Raj, weight 47.567 kg, bye from Raj'
In addition to positional arguments, we can also send keyword arguments.
These are referred to by name inside the curly braces.
>>> s = '{msg}, my name is {n}, I am {a} years old'.format(n=name, a=age, msg='Hello')
>>> s
Output
'Hello, my name is Raj, I am 23 years old'
We can mix both positional and keyword arguments in the same string.
>>> s = '{msg}, I am {1} years old and my weight is {0} kg'.format(wt, age, msg='Hello')
>>> s
Output
'Hello, I am 23 years old and my weight is 47.567 kg'
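Keyword arguments are convenient when the values already live in a dictionary. As a sketch, the dictionary person below is made-up sample data; it can be unpacked with ** or passed as a single argument and indexed inside the braces.

```python
person = {'name': 'Raj', 'age': 23}   # hypothetical record

# ** unpacks the dictionary into keyword arguments.
s1 = '{name} is {age} years old'.format(**person)

# A field can also index into a positional argument (no quotes around the key).
s2 = '{0[name]} is {0[age]} years old'.format(person)

print(s1)         # Raj is 23 years old
print(s1 == s2)   # True
```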
We can use the conversion codes s, d and f: s displays the value as a
string, d displays it as a decimal integer (base 10) and f displays
it as a float with decimal places. When using the f conversion,
you can limit the number of digits displayed after the decimal point. This can
be done by adding a dot followed by the number of digits you want displayed
after the decimal point.
>>> num1 = 123
>>> num2 = 345.43678
>>> print('number1 is {:.2f}'.format(num1))
Output
number1 is 123.00
>>> print('number2 is {:.2f}'.format(num2))
Output
number2 is 345.44
The float value will be rounded off if it has more decimal places than the
number of places we want to display.
You can use 0 if you do not want any decimal places to be displayed.
>>> print('number2 is {:.0f}'.format(num2))
Output
number2 is 345
You can specify a width in which a given value is displayed.
>>> name = 'Raj'
>>> age = 23
>>> print('My name is {:8} and I am {:6} years old'.format(name,age))
Output
My name is Raj      and I am     23 years old
By default, strings are left-justified in their width and numbers are right-justified. To change the justification, you can use symbols <, > or ^.
- < for left justification
- > for right justification
- ^ for center justification
>>> print('My name is {:^8} and I am {:<6} years old'.format(name, age))
Output
My name is   Raj    and I am 23     years old
In this example, a total of four significant digits of number are displayed
in a width of 10.
>>> number = 78.386367
>>> print('number is {:10.4}'.format(number))
Output
number is      78.39
In the next example, number is displayed in a width of 10 with four
decimal places.
>>> print('number is {:10.4f}'.format(number))
Output
number is    78.3864
If you want, you can specify a fill character for padding within the given
field. By default, this padding is done with spaces. An alignment specifier
must be provided whenever a fill character is specified.
>>> print('My name is {:*^8} and age is {:.>6}'.format(name, age))
Output
My name is **Raj*** and age is ....23
You can provide a sign option for numeric values.
- + for a sign on every number: + on positive numbers and - on negative numbers
- - for a minus sign on negative numbers only (this is the default)
- <space> for a leading space on positive numbers and a - sign on negative numbers
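The three sign options can be compared on a positive and a negative value. In this small sketch, each row shows the +, - and space options in that order, with square brackets added only to make the leading space visible.

```python
# Format the same value with each sign option: '+', '-', and ' '.
rows = ['[{0:+}] [{0:-}] [{0: }]'.format(value) for value in (5, -5)]

for row in rows:
    print(row)
# [+5] [5] [ 5]
# [-5] [-5] [-5]
```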
To specify an output type, you can use any of these characters.
- String: s
- Integers: b for binary, d for decimal (base 10), x or X for hexadecimal, o for octal
- Floating point: e or E for exponential notation, f for fixed-point notation
>>> num = 246
>>> print('{:x}'.format(num))
Output
f6
>>> print('{:X}'.format(num))
Output
F6
>>> print('{:o}'.format(num))
Output
366
>>> print('{:b}'.format(num))
Output
11110110
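The type characters can be combined with the # option, which adds the base prefix, and with zero padding. A brief sketch using the same value of num:

```python
num = 246

with_prefix_hex = '{:#x}'.format(num)   # '#' adds the 0x prefix
with_prefix_bin = '{:#b}'.format(num)   # '#' adds the 0b prefix
padded_bin = '{:012b}'.format(num)      # zero-padded to 12 digits
padded_hex = '{:04X}'.format(num)       # uppercase hex, 4 digits

print(with_prefix_hex)   # 0xf6
print(with_prefix_bin)   # 0b11110110
print(padded_bin)        # 000011110110
print(padded_hex)        # 00F6
```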
>>> num1 = 0.000000000412
>>> num2 = 124300000000000
>>> print('{:e}'.format(num1))
Output
4.120000e-10
>>> print('{:e}'.format(num2))
Output
1.243000e+14
You can display your numeric data with a comma as the thousands separator.
>>> print('{:,}'.format(num2))
Output
124,300,000,000,000
Conclusion
Mastering string comparison in Python is crucial for anyone diving into text-based operations. Whether you are sorting data, validating user inputs or searching for specific patterns, understanding how Python handles strings using equality operators, lexicographical order, and case sensitivity can make your code cleaner and more efficient. This seemingly simple topic opens the door to more complex algorithms and problem-solving techniques. As you continue to explore the world of Python, leveraging the power of string comparisons will become an invaluable tool in your programming arsenal. So, dive in, experiment, and let Python’s flexibility unlock new possibilities in your projects.