Processing Symbols

Programming = rule-based processing of symbols

The symbols we deal with in a high-level programming language are numbers and characters. Programming means performing actions on and with these symbols:

on = symbols are operands (data)
with = symbols are operators (instructions)

That’s what’s called executable text: the code contains instructions, which are performed when we execute it. Many of these instructions (operations) are performed on the code itself and define or alter the program flow.

Different types of data (operands)

Numbers and characters (text) are different types of data and we have to specify in our code what type of data we’re processing.

Strings (text)

The data type for text is called string. A string is written between quotation marks. There are several options:

print('text in single quotation marks')
text in single quotation marks
print("text in double quotation marks")
text in double quotation marks
print('''text in triple quotation marks''')
text in triple quotation marks
print("""text in triple (double) quotation marks""")
text in triple (double) quotation marks
# Single and double quotation marks can be used to print one of them:
print('"A quote" (Author)')
"A quote" (Author)
# Triple quotation marks can be used to keep or insert line breaks:
print('''Line 1
Line 2
Line 3''')
Line 1
Line 2
Line 3

Numbers

Numbers are divided into integers and floating-point numbers. No special notation is necessary.

print(4) # integer
print(1.2) # floating point
4
1.2

Check the data type with type()

type('Line 1')
str
type(4)
int
type(-52.1)
float

Converting data types (casting)

It’s easy to transform (cast) data into a specific type with the built-in functions str(), int() and float().

str(4.2)
'4.2'
str(5)
'5'
int('7')
7
int(7.4)
7
float('7')
7.0
float(4)
4.0
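
Two details of casting are easy to miss, so here is a small sketch (the variable names are chosen only for illustration): int() cuts off the decimal part instead of rounding, and a string that holds a decimal number cannot be passed to int() directly.

```python
# int() truncates toward zero; it does not round
truncated = int(7.9)
negative = int(-7.9)
print(truncated)  # 7
print(negative)   # -7

# round() goes to the nearest integer instead
rounded = round(7.9)
print(rounded)    # 8

# A string holding a decimal number has to pass through float() first
number = int(float('7.4'))
print(number)     # 7

# int('7.4') would stop the program with a ValueError
```

If a cast fails, Python reports a ValueError that names the offending value.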

Basic operators

Expressions with strings

print('A rose' + ' is a rose')
print('A rose' + ' is a rose' * 2)
A rose is a rose
A rose is a rose is a rose

Expressions with numbers

print(4 * 3)
print(4 - 3)
print(4 / 3)
print(4 + 3)
12
1
1.3333333333333333
7
# Numbers written as strings (in quotation marks) are treated like characters
print(5 + 9)
print('5' + '9')
14
59
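
Strings and numbers cannot be mixed with +, so one side has to be cast first. The following sketch uses two illustrative variables, count and label, that are not part of the examples above.

```python
count = 5
label = ' roses'

# count + label would stop with a TypeError, because + cannot mix int and str

# Cast the number to a string to concatenate
text = str(count) + label
print(text)    # 5 roses

# Cast the string to a number to calculate
total = int('5') + 9
print(total)   # 14

# Multiplying a string by an integer repeats the string
repeated = '5' * 3
print(repeated)  # 555
```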

Variables

Instead of using data just once and outputting it directly, we can store it in variables. This is very powerful, because the data becomes reusable.

phrase_a = 'A rose'
phrase_b = ' is a rose'
print(phrase_a)
print(phrase_b)
print(phrase_a + phrase_b * 2)
A rose
 is a rose
A rose is a rose is a rose


Variables are initialized with a name of your choice, followed by =, followed by the data you want to store. Allowed characters for variable names are letters, digits and _ (underscore). A name must not start with a digit. In Python it’s convention to write regular variable names in lowercase and to separate words with _.
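
The naming rules can be tried out directly. The names in this sketch are made up for illustration; the last lines show that Python also distinguishes between uppercase and lowercase letters in names.

```python
# Valid names: letters, digits and underscores, not starting with a digit
page_count = 363
author_2 = 'Hofstadter'
_note = 'a leading underscore is allowed'

# 2nd_author = 'x' would stop with a SyntaxError (name starts with a digit)

# Names are case-sensitive: these are two different variables
title = 'lowercase name'
Title = 'uppercase name'
print(title == Title)  # False
```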

# Often the result of an expression is stored in a new variable
combined_phrase = phrase_a + phrase_b
print(combined_phrase)
A rose is a rose


The data inside variables is not fixed. We can overwrite or modify it. (We’ll learn about exceptions later.)

print(combined_phrase)
combined_phrase = combined_phrase + phrase_b
print(combined_phrase)
A rose is a rose
A rose is a rose is a rose
# Shortcut: +=
print(combined_phrase)
combined_phrase += phrase_b
print(combined_phrase)
combined_phrase += phrase_b * 4
print(combined_phrase)
A rose is a rose is a rose
A rose is a rose is a rose is a rose
A rose is a rose is a rose is a rose is a rose is a rose is a rose is a rose

We have to make sure that a variable is initialized before we access it in the code that follows.
# Not working:
print(variable_xy)
variable_xy = 'some data'
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
/tmp/ipykernel_10471/2924663491.py in <module>
      1 # Not working
----> 2 print(variable_xy)
      3 variable_xy = 'some data'

NameError: name 'variable_xy' is not defined


It’s very helpful to be able to read error messages. Most of them are easy to understand. It’s best to start reading from the bottom, which says “NameError: name 'variable_xy' is not defined”, and the arrow shows that this error occurs in line 2 of the code.

# Working:
variable_xy = 'some data'
print(variable_xy)
some data

Reading files

A typical use of a variable is to store data that comes from outside the program. Next we’ll open a text file and store its content in a variable, so that we can work with the data.

# Open a text file in reading mode ('r'):
with open('example.md', 'r') as f: # f is a variable and holds the file object
    # Read the text file into a new variable called txt:
    txt = f.read()

# Now the data is available:
print(txt)
# Hofstadter, Douglas R. I Am a Strange Loop. New York: Basic Books, 2007.

"Dealing with brains as multi-level systems is essential if we are to make even the slightest progress in analyzing elusive mental phenomena such as perception, concepts, thinking, consciousness, »I«, free will, and so forth." (30)

"our brains [...] contain tiny events (neuron firings) and larger events (patterns of neuron frings), and the latter presumably somehow have <i>representational</i> qualities, allowing us to register and also to remember things that happen outside of our crania. Such internalization of the outer world in symbolic patterns in a brain is a pretty far-fetched idea, when you think about it, and yet we know it somehow came to exist, thanks to the pressures of evolution." (46)

"I begin with the simple fact that living beings, having been shaped by evolution, have survival as their most fundamental, automatic, and built-in goal. To enhance the chances of its survival, any living being must be able to react flexibly to events that take place in its environment. This means it must develop the ability to sense and to categorize, however rudimentarily, the goings-on in its immediate environment (most earthbound beings can pretty safely ignore comets crashing on Jupiter). Once the ability to sense external goings-on has developed, however, there ensues a curious side effect that will have vital and radical consequences. This is the fact that the living being's ability to sense certain aspects of its environment flips around and endows the being with the ability to sense certain aspects of <i>itself</i>. (73)

"Indeed, thinking about how one might tackle such an engineering challenge is a helpful way of simultaneously envisioning the process of perception in the brain of a living creature and its counterpart in the cognitive system of an artificial mind (or an alien creature, for that matter)." (77)

"A creature that thinks knows next to nothing of the subtrate allowing its thinking to happen, but nonetheless it knows all about its symbolic interpretation of the world, and knows very intimately something it calls »I«." (173)

"a human brain is a representational system that knows no bounds in terms of extensibility or flexibility of its categories." (182)

"The closing of the strange loop of human selfhood is deeply dependent upon the level-changing leap that is <i>perception</i>, which means <i>categorization</i>, and therefore, the richer and more powerful an organism's categorization equipment is, the more realized and rich will be its self." (209)

"Through language, other people's bodies can become flexible extensions of our own bodies." (213)

"A novel is not a specific sequence of words, because if it were, it could only be written in <i>one</i> language, in <i>one</i> culture. No, a novel is a <i>pattern</i> -- a particular collection of characters, events, moods, tones, jokes, allusions, and much more. And so a novel is an abstraction [...]." (224)

"The cells inside a brain are not the bearers of its consciousness; the bearers of consciousness are <i>patterns</i>. The pattern of organization is what matters, not the substance." (257)

"...symbol-level brain activity [...] that mirrors external events <i>is</i> consciousness" (276)

"My brain [...] is constantly seeking to label, to categorize, to find precedents and analogues – in other words, <i>to simplify while not letting essence slip away</i>." (279)

"[...] category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world [...]." (319)

"we human beings [...] are unpredictable self-writing poems -- vague, metaphorical, ambiguous, and sometimes exceedingly beautiful." (363)

Writing files

Writing text to a file is similar to reading it. Instead of opening the file in reading mode ('r'), we open it in writing mode ('w'). If the file does not exist yet, it will be created; if it already exists, its previous content is replaced. Instead of reading the content of the file into a variable, we pass the content to be written to the method write().

Reading:

with open('filename.txt', 'r') as f:
    txt = f.read()

Writing:

with open('filename.txt', 'w') as f:
    f.write(txt)
# Write the excerpt to disk
# (excerpt must already exist; it is created in the section "String methods" below)
with open('excerpt.txt', 'w') as f:
    f.write(excerpt)
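
A round trip is a simple way to confirm that what was written is what comes back. This sketch makes two assumptions that go beyond the examples above: it writes to a hypothetical file demo_note.txt inside a temporary folder, so that no real file is touched, and it passes encoding='utf-8' explicitly, because the default encoding can depend on the platform and may garble characters such as »I«.

```python
import os
import tempfile

# Hypothetical file in a temporary folder, so no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), 'demo_note.txt')

note = 'knows very intimately something it calls »I«.'

# 'w' creates the file, or replaces its content if it already exists
with open(path, 'w', encoding='utf-8') as f:
    f.write(note)

# Read the file back with the same encoding
with open(path, 'r', encoding='utf-8') as f:
    restored = f.read()

print(restored == note)  # True
```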

Slicing

A string consists of single characters and we can access these individual characters. To do so, follow the variable name with [] and put the index (position) of the character inside the brackets.

s = 'The quick brown fox jumps over the lazy dog.'
# In programming counting starts with 0!
s[0]
'T'
s[3]
' '
# Counting from the end is done with a - before the index (starting with -1)
s[-1]
'.'
s[-2]
'g'


It’s possible to extract a range of characters instead of only one character. For that, the start and the end of the range are given, separated by a :. The character at the end index is not included.

s[0:19]
'The quick brown fox'
# If the range starts from the beginning it's possible to leave the first value empty
s[:19]
'The quick brown fox'
# Slicing from the end with negative indices
s[-13:-1]
'the lazy dog'
# If the range should run to the very end, leave the last value empty
s[-13:]
'the lazy dog.'
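
The following sketch repeats the sentence from above to show how the end index behaves; the names head and tail are only illustrative.

```python
s = 'The quick brown fox jumps over the lazy dog.'

# The end index is not included: s[0:3] covers the indices 0, 1 and 2
print(s[0:3])          # The

# len() returns the number of characters; the last valid index is len(s) - 1
print(len(s))          # 44
print(s[len(s) - 1])   # .

# Cutting twice at the same index gives two parts that add up to the original
head = s[:19]
tail = s[19:]
print(head + tail == s)  # True
```

Because the end index is excluded, s[:19] and s[19:] never overlap and never lose a character.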

Read a text as a list

With the code above the whole document is stored as one string. Sometimes it’s more practical to separate the lines (here: the paragraphs) for further processing. This can be done with .readlines() instead of .read(). With .readlines() the content is stored in a list, each item of the list holding one line of text as a string, including its trailing newline character.

with open('example.md', 'r') as f:
    txt = f.readlines()
    
print(txt)
['# Hofstadter, Douglas R. I Am a Strange Loop. New York: Basic Books, 2007.\n', '\n', '"Dealing with brains as multi-level systems is essential if we are to make even the slightest progress in analyzing elusive mental phenomena such as perception, concepts, thinking, consciousness, »I«, free will, and so forth." (30)\n', '\n', '"our brains [...] contain tiny events (neuron firings) and larger events (patterns of neuron frings), and the latter presumably somehow have <i>representational</i> qualities, allowing us to register and also to remember things that happen outside of our crania. Such internalization of the outer world in symbolic patterns in a brain is a pretty far-fetched idea, when you think about it, and yet we know it somehow came to exist, thanks to the pressures of evolution." (46)\n', '\n', '"I begin with the simple fact that living beings, having been shaped by evolution, have survival as their most fundamental, automatic, and built-in goal. To enhance the chances of its survival, any living being must be able to react flexibly to events that take place in its environment. This means it must develop the ability to sense and to categorize, however rudimentarily, the goings-on in its immediate environment (most earthbound beings can pretty safely ignore comets crashing on Jupiter). Once the ability to sense external goings-on has developed, however, there ensues a curious side effect that will have vital and radical consequences. This is the fact that the living being\'s ability to sense certain aspects of its environment flips around and endows the being with the ability to sense certain aspects of <i>itself</i>. (73)\n', '\n', '"Indeed, thinking about how one might tackle such an engineering challenge is a helpful way of simultaneously envisioning the process of perception in the brain of a living creature and its counterpart in the cognitive system of an artificial mind (or an alien creature, for that matter)." 
(77)\n', '\n', '"A creature that thinks knows next to nothing of the subtrate allowing its thinking to happen, but nonetheless it knows all about its symbolic interpretation of the world, and knows very intimately something it calls »I«." (173)\n', '\n', '"a human brain is a representational system that knows no bounds in terms of extensibility or flexibility of its categories." (182)\n', '\n', '"The closing of the strange loop of human selfhood is deeply dependent upon the level-changing leap that is <i>perception</i>, which means <i>categorization</i>, and therefore, the richer and more powerful an organism\'s categorization equipment is, the more realized and rich will be its self." (209)\n', '\n', '"Through language, other people\'s bodies can become flexible extensions of our own bodies." (213)\n', '\n', '"A novel is not a specific sequence of words, because if it were, it could only be written in <i>one</i> language, in <i>one</i> culture. No, a novel is a <i>pattern</i> -- a particular collection of characters, events, moods, tones, jokes, allusions, and much more. And so a novel is an abstraction [...]." (224)\n', '\n', '"The cells inside a brain are not the bearers of its consciousness; the bearers of consciousness are <i>patterns</i>. The pattern of organization is what matters, not the substance." (257)\n', '\n', '"...symbol-level brain activity [...] that mirrors external events <i>is</i> consciousness" (276)\n', '\n', '"My brain [...] is constantly seeking to label, to categorize, to find precedents and analogues – in other words, <i>to simplify while not letting essence slip away</i>." (279)\n', '\n', '"[...] category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world [...]." (319)\n', '\n', '"we human beings [...] are unpredictable self-writing poems -- vague, metaphorical, ambiguous, and sometimes exceedingly beautiful." (363)']
type(txt)
list


Now the list consists of several strings, which are separated with commas. The leading and trailing [] are indicators of a list. One advantage is that we can easily extract single items through their index.

txt[0]
'# Hofstadter, Douglas R. I Am a Strange Loop. New York: Basic Books, 2007.\n'
type(txt[0])
str
txt[1]
'\n'
# Extract the first 5 items:
txt[:5]
['# Hofstadter, Douglas R. I Am a Strange Loop. New York: Basic Books, 2007.\n',
 '\n',
 '"Dealing with brains as multi-level systems is essential if we are to make even the slightest progress in analyzing elusive mental phenomena such as perception, concepts, thinking, consciousness, »I«, free will, and so forth." (30)\n',
 '\n',
 '"our brains [...] contain tiny events (neuron firings) and larger events (patterns of neuron frings), and the latter presumably somehow have <i>representational</i> qualities, allowing us to register and also to remember things that happen outside of our crania. Such internalization of the outer world in symbolic patterns in a brain is a pretty far-fetched idea, when you think about it, and yet we know it somehow came to exist, thanks to the pressures of evolution." (46)\n']
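
Indexing a list and slicing a string can be combined, because each item of the list is itself a string. The sketch below uses a short made-up list named lines as a stand-in for the result of .readlines().

```python
# A short stand-in for the list that .readlines() returns
lines = ['# Title\n', '\n', 'First paragraph.\n', '\n', 'Last paragraph.']

# len() counts the items of a list
print(len(lines))      # 5

# First pick an item by index, then slice the string it holds
first_word = lines[2][:5]
print(first_word)      # First

# Slicing off the last character removes the trailing newline of a line
title = lines[0][:-1]
print(title)           # # Title
```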

String methods

Python provides several built-in methods to perform operations on strings. We’ll store one item of the list in a new variable called excerpt and perform some operations on it.

# Store the third-to-last item in a new variable
excerpt = txt[-3]
print(excerpt)
"[...] category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world [...]." (319)
excerpt.upper()
'"[...] CATEGORY ASSIGNMENTS GO RIGHT TO THE CORE OF THINKING, THEY ARE DETERMINANT OF OUR ATTITUDE TOWARD EACH THING IN THE WORLD [...]." (319)\n'


Methods performed on a string do not modify the string itself; instead they return (more on that later) a new string. So if we want to change what the variable holds, we have to overwrite it.

# Proof that the string is unchanged:
print(excerpt)
# Overwrite excerpt:
excerpt = excerpt.upper()
print(excerpt)
"[...] category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world [...]." (319)

"[...] CATEGORY ASSIGNMENTS GO RIGHT TO THE CORE OF THINKING, THEY ARE DETERMINANT OF OUR ATTITUDE TOWARD EACH THING IN THE WORLD [...]." (319)
# Lower the string again
excerpt = excerpt.lower()


It’s common that some parts of a string need to be removed or replaced.

rose = 'A rose is a rose is a rose'
print(rose.replace('rose', 'code'))
A code is a code is a code
# The first string inside the () of .replace() is replaced by the second.
# If the second is empty, the first one is removed.
excerpt = excerpt.replace('[...]', '')
print(excerpt)
" category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world ." (319)
# Remove the space before the dot:
excerpt = excerpt.replace(' .', '.')
print(excerpt)
" category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world." (319)

We can search for the index of a character or substring with the methods .find() and .rfind() (search from the end). For example, we might want to remove the page number at the end.

# Find the index of the last (
index = excerpt.rfind('(')
print(index)
print(excerpt[index])
127
(
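
Both methods return -1 when nothing is found, which matters as soon as the result is used as an index. A small sketch with a made-up string named quote:

```python
quote = '"A quote" (Author) (2007)'

# .find() searches from the start, .rfind() from the end
first = quote.find('(')
last = quote.rfind('(')
print(first)   # 10
print(last)    # 19

# If the search string does not occur, the result is -1
missing = quote.find('[')
print(missing)  # -1

# Used for slicing, -1 would silently cut off the last character,
# so it can be worth checking with "in" first
print('[' in quote)  # False
```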


Then we can use this index for slicing the string.

# Find the index of the last (
index = excerpt.rfind('(')
# Remove the part from index to the end.
excerpt = excerpt[:index]
print(excerpt)
" category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world." 
# Remove the quotation marks
excerpt = excerpt.replace('"', '')
print(excerpt)
 category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world. 
# Remove leading or trailing spaces with .strip()
excerpt = excerpt.strip()
print(excerpt)
category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world.
# Capitalize the sentence.
excerpt = excerpt.capitalize()
print(excerpt)
Category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world.
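
Since every string method returns a new string, the cleaning steps can also be written as one chain. This is only a condensed sketch of the same steps; raw and cleaned are illustrative names, and raw repeats the original line from the file.

```python
raw = '"[...] category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world [...]." (319)\n'

# Cut off the page number, starting at the last (
cleaned = raw[:raw.rfind('(')]

# Each method returns a new string, so the next method can be attached directly
cleaned = cleaned.replace('[...]', '').replace(' .', '.').replace('"', '').strip().capitalize()
print(cleaned)
```

Note that .capitalize() also turns every letter after the first one into lowercase, which is harmless here but would affect names inside a sentence.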
# Print the original text for comparison:
print(txt[-3])
"[...] category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world [...]." (319)

More string methods

With split() we can split a string into items. The method returns a list of strings.

excerpt_list = excerpt.split()
print(excerpt_list)
['Category', 'assignments', 'go', 'right', 'to', 'the', 'core', 'of', 'thinking,', 'they', 'are', 'determinant', 'of', 'our', 'attitude', 'toward', 'each', 'thing', 'in', 'the', 'world.']


If no separator is specified inside split(), it splits at any whitespace, for example spaces and newline characters ('\n'):

x = 'some words\nwith spaces and\nwithout.'
print(x)
print(type(x), '\n') # This adds an additional newline.
x = x.split()
print(x)
print(type(x))
some words
with spaces and
without.
<class 'str'> 

['some', 'words', 'with', 'spaces', 'and', 'without.']
<class 'list'>


The resulting list shows that split() removes the separator.

Creating a string out of a list is done with the method join(). The syntax can look slightly confusing at first.

' '.join(x)
'some words with spaces and without.'

It starts with the string that is placed between the items of the list. The list is passed inside the () of .join().

''.join(x)
'somewordswithspacesandwithout.'
print('\n'.join(x))
some
words
with
spaces
and
without.
' * '.join(x)
'some * words * with * spaces * and * without.'
excerpt = ' '.join(excerpt_list)
print(excerpt)
Category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world.
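
join() only accepts items that are strings. The sketch below uses three page numbers from the example file as integers (the list name pages is made up) and casts them before joining.

```python
pages = [30, 46, 73]  # page numbers stored as integers

# ', '.join(pages) would stop with a TypeError, because the items are not strings

# Cast each item to a string first
page_strings = [str(pages[0]), str(pages[1]), str(pages[2])]
joined = ', '.join(page_strings)
print(joined)  # 30, 46, 73
```

For longer lists the casting would be done in a loop rather than item by item.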


Above we used the method .readlines() to read the content of a file into a list of separate strings. A string that is already in memory can be split in a similar way with .split('\n') or directly with .splitlines().

sequence = 'Rose is a rose is a rose'
lines = (sequence+'\n')*3
print(lines)
lines = lines.splitlines() # like lines.split('\n'), but without an empty string for the trailing newline
print(lines)
Rose is a rose is a rose
Rose is a rose is a rose
Rose is a rose is a rose

['Rose is a rose is a rose', 'Rose is a rose is a rose', 'Rose is a rose is a rose']
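
The two methods differ when the text ends with a newline, as it does here. A short sketch of the difference, with a shortened two-line string:

```python
lines = 'Rose is a rose\nRose is a rose\n'

# .split('\n') keeps an empty string after the final newline
with_split = lines.split('\n')
print(with_split)       # ['Rose is a rose', 'Rose is a rose', '']

# .splitlines() does not
with_splitlines = lines.splitlines()
print(with_splitlines)  # ['Rose is a rose', 'Rose is a rose']
```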
# remove multiple spaces
s = 'too   many      spaces '
print(s)
s = s.split()
print(s)
s = ' '.join(s)
print(s)
too   many      spaces 
['too', 'many', 'spaces']
too many spaces

Work with your own text

For example

  • read it from disk

  • split it into words

  • shuffle the order

  • join it to a text

  • split text into sentences

  • capitalize sentences

  • output some sentences
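
One possible way to start, as a rough sketch rather than a model solution: the text is a made-up string instead of a file read from disk, and shuffling uses random.shuffle() from Python's standard library, which has not been introduced above.

```python
import random

# Stand-in for a text read from disk
text = 'a rose is a rose. the rose is red. the code is read.'

# Split into words, shuffle the order, join to a text again
words = text.split()
random.shuffle(words)   # changes the list in place, the order differs on every run
shuffled = ' '.join(words)
print(shuffled)

# Split into sentences and capitalize the first one
sentences = text.split('. ')
first = sentences[0].strip('.').capitalize() + '.'
print(first)  # A rose is a rose.
```

Splitting at '. ' is a simplification; it would also cut after abbreviations that end with a dot.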

String formatting

% style (older, printf-like)
name = '🐍'
print('Hello, %s' % name)
Hello, 🐍

.format style
name = '🐍'
print('Hello, {}'.format(name))

print('A rose is a {} is a {}'.format('rose', 'hose'))

print('A Rose is a {val1} is a {val2}'.format(val1='rose', val2='🌷'))
Hello, 🐍
A rose is a rose is a hose
A Rose is a rose is a 🌷

Literal string interpolation with f-strings (Python 3.6+). This is the recommended way, as it's the most readable.
print(f'Hello, {name}!')

# it's possible to embed Python expressions
a = ' is a rose'
print(f'A Rose{a * 2}.')
Hello, 🐍!
A Rose is a rose is a rose.

Padding with spaces: ljust(), center() and rjust() fill a string with spaces up to the given width.
sequence = 'Rose is a rose is a rose'
l = len(sequence)

print(sequence.ljust(l+10))
print(sequence.center(l+10))
print(sequence.rjust(l+10))
Rose is a rose is a rose          
     Rose is a rose is a rose     
          Rose is a rose is a rose
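
Padding can also be written inside an f-string with a format specification after a colon. The sketch below uses a short illustrative word and the width 8; the brackets are only there to make the spaces visible.

```python
word = 'rose'

# <, ^ and > align left, centered and right within the given width
left = f'[{word:<8}]'
middle = f'[{word:^8}]'
right = f'[{word:>8}]'
print(left)    # [rose    ]
print(middle)  # [  rose  ]
print(right)   # [    rose]

# The same mechanism limits the number of decimal places of a float
ratio = 4 / 3
short = f'{ratio:.2f}'
print(short)   # 1.33
```

For this example the results match word.ljust(8), word.center(8) and word.rjust(8).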
# split() with an explicit separator: the string is cut at every 't' and the 't' is removed
s = 'text, with commas and.'
print(s)
s = s.split('t')
print(s)
text, with commas and.
['', 'ex', ', wi', 'h commas and.']