# Processing Symbols

## Programming = rule-based processing of symbols

The symbols we deal with in a higher programming language are **numbers** and **characters**. Programming means to perform actions *on* and *with* this symbols:

*on* = symbols are **operands** (data)<br>
*with* = symbols are **operators** (instructions)

That‚Äôs what‚Äôs called *executable text*: the code contains  instructions, which are performed when we execute it. A lot of the  instructions (operations) are performed on the code itself and define/  alter the program flow.

## Different types of data (operands)

Numbers and characters (text) are different types of data and we have to specify in our code what type of data we're processing.

### Strings (text)

The data type for text is called string. A string is declared through quotation marks. There are some options:

In [6]:
print('text in single quotation marks')

text in single quotation marks


In [7]:
print("text in double quotation marks")

text in double quotation marks


In [8]:
print('''text in tripple quotation marks''')

text in tripple quotation marks


In [9]:
print("""text in triple (double) quotation marks""")

text in triple (double) quotation marks


In [11]:
# Single and double quotation marks can be used to print one of them:
print('"A quote" (Author)')

"A quote" (Author)


In [1]:
# Tripple quotation marks can be used to keep/ insert line breaks:
print('''Line 1
Line 2
Line 3''')

Line 1
Line 2
Line 3


### Numbers

Numbers are divided into integers and floating numbers. There's no special notation necessary.

In [2]:
print(4) # integer
print(1.2) # floating point

4
1.2


### Check the data type with type()

In [1]:
type('Line 1')

str

In [2]:
type(4)

int

In [3]:
type(-52.1)

float

## Converting data types (casting)

It's easy to transform (**cast**) data into a specific type with built-in methods.

In [5]:
str(4.2)

'4.2'

In [6]:
str(5)

'5'

In [7]:
int('7')

7

In [9]:
int(7.4)

7

In [8]:
float('7')

7.0

In [10]:
float(4)

4.0

## Basic operators

### Expressions with strings

In [18]:
print('A rose' + ' is a rose')
print('A rose' + ' is a rose' * 2)

A rose is a rose
A rose is a rose is a rose


### Expressions with numbers

In [19]:
print(4 * 3)
print(4 - 3)
print(4 / 3)
print(4 + 3)

12
1
1.3333333333333333
7


In [1]:
# Numbers as strings (in apostrophes) are like characters
print(5 + 9)
print('5' + '9')

14
59


## Variables

Instead of using the data just once and output it directly, we can store data in variables. This is very powerful, because data becomes reusable.

In [13]:
phrase_a = 'A rose'
phrase_b = ' is a rose'
print(phrase_a)
print(phrase_b)
print(phrase_a + phrase_b * 2)

A rose
 is a rose
A rose is a rose is a rose


<br>Variables are **initiated** through a name of your choice followed by `=` followed by the data you want to store in it. Allowed characters for **variable names**: characters, numbers, '_' (underscore). It's not allowed to start a name with a number. In Python it's convention to write regular variables in lowercase characters and separate them with _.

In [14]:
# Often the result of an expression is stored in a new variable
combined_phrase = phrase_a + phrase_b
print(combined_phrase)

A rose is a rose


<br>The data inside variables is not fixed. We can **override** or modify it. (We'll learn exceptions later.)

In [15]:
print(combined_phrase)
combined_phrase = combined_phrase + phrase_b
print(combined_phrase)

A rose is a rose
A rose is a rose is a rose


In [16]:
# Shortcut: +=
print(combined_phrase)
combined_phrase += phrase_b
print(combined_phrase)
combined_phrase += phrase_b * 4
print(combined_phrase)

A rose is a rose is a rose
A rose is a rose is a rose is a rose
A rose is a rose is a rose is a rose is a rose is a rose is a rose is a rose


<br>
We have to make sure that a variable is initiated first, before we access it in the following code.

In [13]:
# Not working:
print(variable_xy)
variable_xy = 'some data'

NameError: name 'variable_xy' is not defined

<br>It's very helpful to be able to read error messages. Most of them are easy to understand. It's best to start reading from the bottom, which says "NameError: name 'variable_xy is not defined" and we can see that this error occurs in line 2 of the code.

In [14]:
# Working:
variable_xy = 'some data'
print(variable_xy)

some data


## Reading files

A typical usage of a variable is to store some data external to the program in it. Next we'll open a text file and store its content in a variable, so that we can work with the data.

In [2]:
# Open a text file in reading mode ('r'):
with open('example.md', 'r') as f: # f is a variable and holds the file object
    # Read the text file into a new variable called txt:
    txt = f.read()

# Now the data is available:
print(txt)

# Hofstadter, Douglas R. I Am a Strange Loop. New York: Basic Books, 2007.

"Dealing with brains as multi-level systems is essential if we are to make even the slightest progress in analyzing elusive mental phenomena such as perception, concepts, thinking, consciousness, ¬ªI¬´, free will, and so forth." (30)

"our brains [...] contain tiny events (neuron firings) and larger events (patterns of neuron frings), and the latter presumably somehow have <i>representational</i> qualities, allowing us to register and also to remember things that happen outside of our crania. Such internalization of the outer world in symbolic patterns in a brain is a pretty far-fetched idea, when you think about it, and yet we know it somehow came to exist, thanks to the pressures of evolution." (46)

"I begin with the simple fact that living beings, having been shaped by evolution, have survival as their most fundamental, automatic, and built-in goal. To enhance the chances of its survival, any living being m

## Writing files

Writing text to a file is similar to reading it. Instead of opening a file in reading mode (`r`) we will open it in writing mode (`w`). If the file does not exist yet, it will be created. Instead of reading the content of the file into a variable, we will insert the content to be written into the method `write()`.

Reading:

```python
with open('filename.txt', 'r') as f:
    txt = f.read()
```

Writing:

```python
with open('filename.txt', 'w') as f:
    f.write(txt)
````

In [56]:
# Write the excerpt to disk
with open('excerpt.txt', 'w') as f:
    f.write(excerpt)

## Slicing

A string consists of single characters and we can access these individual characters. For that the variable name is followed by `[]` with the numbered **index** in the brackets.

In [2]:
s = 'The quick brown fox jumps over the lazy dog.'
# In programming counting starts with 0!
s[0]

'T'

In [3]:
s[3]

' '

In [4]:
# Counting from the end is done with a - before the index (starting with -1)
s[-1]

'.'

In [5]:
s[-2]

'g'

<br>It's possible to extract a range of characters instead of only one character. For that the starting point and ending point of the range is given, separated by a `:`.

In [7]:
s[0:19]

'The quick brown fox'

In [9]:
# If the range starts from the beginning it's possible to leave the first value empty
s[:19]

'The quick brown fox'

In [13]:
# Slicing from the end with negative indices
s[-13:-1]

'the lazy dog'

In [14]:
# If the range should start at the end it's necessary to leave the last value empty
s[-13:]

'the lazy dog.'

## Read a text as a list

With the code above the whole document is stored as one string. Sometimes it's more practical to separate the paragraphs for further processing. This can be done with `.readlines()` instead of `read()`. With `.readlines()` the content is stored in a list, each item of the list holding one line of text as a string.

In [15]:
with open('example.md', 'r') as f:
    txt = f.readlines()
    
print(txt)

['# Hofstadter, Douglas R. I Am a Strange Loop. New York: Basic Books, 2007.\n', '\n', '"Dealing with brains as multi-level systems is essential if we are to make even the slightest progress in analyzing elusive mental phenomena such as perception, concepts, thinking, consciousness, ¬ªI¬´, free will, and so forth." (30)\n', '\n', '"our brains [...] contain tiny events (neuron firings) and larger events (patterns of neuron frings), and the latter presumably somehow have <i>representational</i> qualities, allowing us to register and also to remember things that happen outside of our crania. Such internalization of the outer world in symbolic patterns in a brain is a pretty far-fetched idea, when you think about it, and yet we know it somehow came to exist, thanks to the pressures of evolution." (46)\n', '\n', '"I begin with the simple fact that living beings, having been shaped by evolution, have survival as their most fundamental, automatic, and built-in goal. To enhance the chances of 

In [7]:
type(txt)

list

<br>Now the list consists of several strings, which are separated with commas. The leading and trailing `[]` are indicators of a list. One advantage is that we can easily extract single items through their index.

In [6]:
txt[0]

'# Hofstadter, Douglas R. I Am a Strange Loop. New York: Basic Books, 2007.\n'

In [7]:
type(txt[0])

str

In [26]:
txt[1]

'\n'

In [8]:
# Extract the first 5 items:
txt[:5]

['# Hofstadter, Douglas R. I Am a Strange Loop. New York: Basic Books, 2007.\n',
 '\n',
 '"Dealing with brains as multi-level systems is essential if we are to make even the slightest progress in analyzing elusive mental phenomena such as perception, concepts, thinking, consciousness, ¬ªI¬´, free will, and so forth." (30)\n',
 '\n',
 '"our brains [...] contain tiny events (neuron firings) and larger events (patterns of neuron frings), and the latter presumably somehow have <i>representational</i> qualities, allowing us to register and also to remember things that happen outside of our crania. Such internalization of the outer world in symbolic patterns in a brain is a pretty far-fetched idea, when you think about it, and yet we know it somehow came to exist, thanks to the pressures of evolution." (46)\n']

## String methods

Python provides several built-in methods to perform operations on strings. We'll store one item of the list in a new variable called `excerpt` and perform some operations on it.

In [16]:
# Store the third last item in a new variable
excerpt = txt[-3]
print(excerpt)

"[...] category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world [...]." (319)



In [17]:
excerpt.upper()

'"[...] CATEGORY ASSIGNMENTS GO RIGHT TO THE CORE OF THINKING, THEY ARE DETERMINANT OF OUR ATTITUDE TOWARD EACH THING IN THE WORLD [...]." (319)\n'

<br>When methods are being performed on strings, they will not modify the string itself, instead will **return** (more on that later) a new string. Thus if we want to modify the string itself, we have to override it.

In [18]:
# Proof that the string is unchanged:
print(excerpt)
# Override excerpt:
excerpt = excerpt.upper()
print(excerpt)

"[...] category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world [...]." (319)

"[...] CATEGORY ASSIGNMENTS GO RIGHT TO THE CORE OF THINKING, THEY ARE DETERMINANT OF OUR ATTITUDE TOWARD EACH THING IN THE WORLD [...]." (319)



In [19]:
# Lower the string again
excerpt = excerpt.lower()

<br>It's common that some parts of a string needs to be removed or replaced.

In [20]:
rose = 'A rose is a rose is a rose'
print(rose.replace('rose', 'code'))

A code is a code is a code


In [21]:
# The first string inside the () of .replace() will be removed by the latter.
# If the latter is empty, the first one will be removed.
excerpt = excerpt.replace('[...]', '')
print(excerpt)

" category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world ." (319)



In [22]:
# Remove the space in before the dot:
excerpt = excerpt.replace(' .', '.')
print(excerpt)

" category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world." (319)



We can search for the index of characters with the methods `.find()` and `rfind()` (reverse search). For example we might want to remove the page number at the end. 

In [23]:
# Find the index of the last (
index = excerpt.rfind('(')
print(index)
print(excerpt[index])

127
(


<br>Then we can use this index for slicing the string.

In [24]:
# Find the index of the last (
index = excerpt.rfind('(')
# Remove the part from index to the end.
excerpt = excerpt[:index]
print(excerpt)

" category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world." 


In [25]:
# Remove the apostrophes
excerpt = excerpt.replace('"', '')
print(excerpt)

 category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world. 


In [26]:
# Remove leading or trailing spaces with .strip()
excerpt = excerpt.strip()
print(excerpt)

category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world.


In [27]:
# Capitalize the sentence.
excerpt = excerpt.capitalize()
print(excerpt)

Category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world.


In [28]:
# Print the original text for comparision:
print(txt[-3])

"[...] category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world [...]." (319)



## More string methods

With `split()` we can split a string into items. The method returns a list of strings.

In [28]:
excerpt_list = excerpt.split()
print(excerpt_list)

['Category', 'assignments', 'go', 'right', 'to', 'the', 'core', 'of', 'thinking,', 'they', 'are', 'determinant', 'of', 'our', 'attitude', 'toward', 'each', 'thing', 'in', 'the', 'world.']


<br>If no **separator** is specified inside `split()`, it will split at spaces and newline characters ('\n'):

In [29]:
x = 'some words\nwith spaces and\nwithout.'
print(x)
print(type(x), '\n') # This adds an additional newline.
x = x.split()
print(x)
print(type(x))

some words
with spaces and
without.
<class 'str'> 

['some', 'words', 'with', 'spaces', 'and', 'without.']
<class 'list'>


<br>The resulting list shows that `split()` **removes** the separator.<br>Creating a string out of a list is done with the method `join()`. The syntax looks slightly confusing.

In [36]:
' '.join(x)

'some words with spaces and without.'

It starts with a string that is used as the element in between the items of a list. The list is inserted into the `()` of `.join()`.

In [37]:
''.join(x)

'somewordswithspacesandwithout.'

In [39]:
print('\n'.join(x))

some
words
with
spaces
and
without.


In [40]:
' * '.join(x)

'some * words * with * spaces * and * without.'

In [41]:
excerpt = ' '.join(excerpt_list)
print(excerpt)

Category assignments go right to the core of thinking, they are determinant of our attitude toward each thing in the world.


<br>Above we used the method <code>.readlines()</code> to read the content of a file into a list of separate strings. It's also easy to split it with <code>.split('\n')</code> or directly with <code>.splitlines()</code>.

In [43]:
sequence = 'Rose is a rose is a rose'
lines = (sequence+'\n')*3
print(lines)
lines = lines.splitlines() # same as lines.split('\n')
print(lines)

Rose is a rose is a rose
Rose is a rose is a rose
Rose is a rose is a rose

['Rose is a rose is a rose', 'Rose is a rose is a rose', 'Rose is a rose is a rose']


In [20]:
# remove multiple spaces
s = 'to   much      spaces '
print(s)
s = s.split()
print(s)
s = ' '.join(s)
print(s)

to   much      spaces 
['to', 'much', 'spaces']
to much spaces


## Work with your own text

For example
- read it from disk
- split it into words
- shuffle the order
- join it to a text
- split text into sentences
- capitalize sentences
- output some sentences

## String formatting

%-operator style

In [3]:
name = 'üêç'
print('Hello, %s' % name)

Hello, üêç


<br>
.format style

In [11]:
name = 'üêç'
print('Hello, {}'.format(name))

print('A rose is a {} is a {}'.format('rose', 'hose'))

print('A Rose is a {val1} is a {val2}'.format(val1='rose', val2='üå∑'))

Hello, üêç
A rose is a rose is a hose
A Rose is a rose is a üå∑


<br>
Literal string interpolation (Python 3.6+). This is the recommended way as it's the most readable.

In [14]:
print(f'Hello, {name}!')

# it's possible to embedd Python expressions
a = ' is a rose'
print(f'A Rose{a * 2}.')

Hello, üêç!
A Rose is a rose is a rose.


<br>
Inserting spaces.

In [30]:
sequence = 'Rose is a rose is a rose'
l = len(sequence)

print(sequence.ljust(l+10))
print(sequence.center(l+10))
print(sequence.rjust(l+10))

Rose is a rose is a rose          
     Rose is a rose is a rose     
          Rose is a rose is a rose


In [4]:
s = 'text, with commas and.'
print(s)
s = s.split('t')
print(s)

text, with commas and.
['', 'ex', ', wi', 'h commas and.']
