# Sentiment Analysis

## Sentiment analysis with NLTK

In [17]:
# If the cell below returns an error,
# uncomment the following two lines and execute this cell

# import nltk
# nltk.download('vader_lexicon')

In [18]:
from nltk.sentiment import SentimentIntensityAnalyzer as SIA
sia = SIA()

In [19]:
s = 'The lazy programmer jumps over the quick brown fox jumps over the lazy programmer jumps over the fire fox.'
sia.polarity_scores(s)

{'neg': 0.316, 'neu': 0.684, 'pos': 0.0, 'compound': -0.7506}

In [20]:
s = 'The enthusiastic programmer jumps over the quick brown fox jumps over the happy programmer jumps over the overjoyed fire fox.'
sia.polarity_scores(s)

{'neg': 0.083, 'neu': 0.552, 'pos': 0.366, 'compound': 0.8481}

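`neg`, `neu` and `pos` are the proportions of the text rated negative, neutral and positive; they sum to 1.0 (up to rounding).

`compound` is a normalized overall score that ranges from `-1` (most negative) to `+1` (most positive).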
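
Both properties can be checked on the two results shown above. The sketch below copies those results as literals, so it runs without NLTK. `label_from_compound` is a hypothetical helper, not part of NLTK; its ±0.05 cut-offs follow the convention suggested in the VADER README and are an assumption you may want to tune.

```python
import math

# Results copied from the two cells above.
lazy = {'neg': 0.316, 'neu': 0.684, 'pos': 0.0, 'compound': -0.7506}
happy = {'neg': 0.083, 'neu': 0.552, 'pos': 0.366, 'compound': 0.8481}


def proportions_sum(scores):
    """Sum of the neg/neu/pos proportions (should be close to 1.0)."""
    return scores['neg'] + scores['neu'] + scores['pos']


def label_from_compound(compound, threshold=0.05):
    """Hypothetical helper: map a compound score to a coarse label."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


for scores in (lazy, happy):
    # Each value is rounded to three decimals, so allow a small tolerance.
    assert math.isclose(proportions_sum(scores), 1.0, abs_tol=0.002)
    print(label_from_compound(scores['compound']))
```

The tolerance is needed because the second result sums to 1.001 after rounding.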

In [21]:
res = sia.polarity_scores(s)
type(res)

dict

The data returned by `polarity_scores` is a **dictionary** (`dict`). More on [dictionaries](./dictionaries.ipynb).

In [23]:
# Access an element of a dictionary:
print(res['pos'])

0.366
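
Besides indexing with a key, a few other dictionary operations are useful here. A small sketch, using the `res` values shown above as a literal; the names `proportions` and `dominant` are introduced only for illustration.

```python
# Result copied from the cell above.
res = {'neg': 0.083, 'neu': 0.552, 'pos': 0.366, 'compound': 0.8481}

# .get() returns a default instead of raising KeyError for a missing key.
print(res.get('pos'))
print(res.get('missing', 0.0))

# Iterate over all key/value pairs.
for key, value in res.items():
    print(f'{key}: {value}')

# Find the largest of the three proportions (ignoring 'compound').
proportions = {k: v for k, v in res.items() if k != 'compound'}
dominant = max(proportions, key=proportions.get)
print(dominant)
```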


## Sentiment analysis with TextBlob

In [10]:
from textblob import TextBlob

In [11]:
s = 'The lazy programmer jumps over the quick brown fox jumps over the lazy programmer jumps over the fire fox.'
blob = TextBlob(s)
print(blob.sentiment)

Sentiment(polarity=-0.05555555555555556, subjectivity=0.8333333333333334)


In [16]:
s = 'The enthusiastic programmer jumps over the quick brown fox jumps over the happy programmer jumps over the overjoyed fire fox.'
blob = TextBlob(s)
print(blob.sentiment)
print(blob.sentiment.polarity)

Sentiment(polarity=0.5777777777777778, subjectivity=0.7999999999999999)
0.5777777777777778
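
`blob.sentiment` is a named tuple, so its fields can be read by name, by position, or by unpacking. The sketch below rebuilds the result from the cell above with `collections.namedtuple` so that it runs without TextBlob installed; the stand-in `Sentiment` class is an assumption that mirrors the printed output and is not TextBlob's own class. According to the TextBlob documentation, `polarity` lies in [-1.0, 1.0] and `subjectivity` in [0.0, 1.0].

```python
from collections import namedtuple

# Stand-in for TextBlob's result type, mirroring the printed output above.
Sentiment = namedtuple('Sentiment', ['polarity', 'subjectivity'])
sentiment = Sentiment(polarity=0.5777777777777778,
                      subjectivity=0.7999999999999999)

# Access by field name or by position.
print(sentiment.polarity)
print(sentiment[1])

# Unpack both values at once.
polarity, subjectivity = sentiment
print(round(polarity, 2), round(subjectivity, 2))
```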


## Hugging Face

### Installation

[Docs](https://huggingface.co/docs/transformers/installation)

The Hugging Face Transformers library requires one of these machine learning frameworks: [PyTorch](https://pytorch.org/), [TensorFlow](https://www.tensorflow.org/install/pip) or [Flax](https://flax.readthedocs.io/en/latest/overview.html).

```shell
# Installation with torch
pip install transformers[torch]
```

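**For Mac users** (zsh, the default macOS shell, treats the square brackets as a glob pattern, so the package name must be quoted):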

```shell
pip install 'transformers[torch]'
```

## Sentiment analysis with DistilBERT

[Link to the model on Hugging Face](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)

In [12]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased-finetuned-sst-2-english")

[Quick tour through using Hugging Face](https://huggingface.co/docs/transformers/main/en/quicktour)

In [13]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

In [16]:
s = 'The lazy programmer jumps over the quick brown fox jumps over the lazy programmer jumps over the fire fox.'
res = classifier(s)
print(res)

[{'label': 'NEGATIVE', 'score': 0.9923615455627441}]


In [18]:
s = 'The enthusiastic programmer jumps over the quick brown fox jumps over the happy programmer jumps over the overjoyed fire fox.'
res = classifier(s)
print(res)

[{'label': 'POSITIVE', 'score': 0.9940286874771118}]
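
The pipeline returns a list with one dictionary per input text, each holding a `label` and the model's confidence in that label. To compare these results with the signed scores from VADER or TextBlob, one option is to fold label and score into a single number. A minimal sketch, using the two outputs above as literals; `signed_score` is a hypothetical helper, not part of `transformers`.

```python
def signed_score(result):
    """Hypothetical helper: map a {'label', 'score'} dict to [-1, 1].

    POSITIVE keeps its score, NEGATIVE gets a minus sign,
    any other label (e.g. neutral) maps to 0.0.
    """
    label = result['label'].upper()
    if label == 'POSITIVE':
        return result['score']
    if label == 'NEGATIVE':
        return -result['score']
    return 0.0


# Outputs copied from the two cells above.
lazy = [{'label': 'NEGATIVE', 'score': 0.9923615455627441}]
happy = [{'label': 'POSITIVE', 'score': 0.9940286874771118}]

for output in (lazy, happy):
    first = output[0]  # one dict per input text
    print(first['label'], round(signed_score(first), 3))
```

Note that the score is the confidence in the predicted label, not a measure of how strongly positive or negative the text is, so the signed value is only a rough basis for comparison.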


## Sentiment analysis with FinBERT

FinBERT is a BERT model fine-tuned to classify the sentiment of financial text. It returns one of three labels: positive, negative or neutral.

[Link to the model on Hugging Face](https://huggingface.co/ProsusAI/finbert)

In [19]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")

model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")

In [22]:
classifier = pipeline("sentiment-analysis", model=model, tokenizer=tokenizer)

In [23]:
s = 'The lazy programmer jumps over the quick brown fox jumps over the lazy programmer jumps over the fire fox.'
res = classifier(s)
print(res)

[{'label': 'neutral', 'score': 0.8497675061225891}]


In [24]:
s = 'The enthusiastic programmer jumps over the quick brown fox jumps over the happy programmer jumps over the overjoyed fire fox.'
res = classifier(s)
print(res)

[{'label': 'neutral', 'score': 0.8964928388595581}]


## Write text to please a sentiment analysis

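Word choice can be tuned to the model. For example, with the FinBERT classifier from the previous section, replacing »fox« with »dax« raises the score for the positive label from 0.72 to 0.81.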

In [32]:
s = '''The market value of the enthusiastic programmer jumps over the increased
brown dax jumps high over the happy programmer jumps over the top of the overjoyed fire fox.'''.replace('\n', ' ')
res = classifier(s)
print(res)

[{'label': 'positive', 'score': 0.8132365345954895}]
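
Trying word substitutions by hand gets tedious. The loop can be sketched as a small function that scores several variants and sorts them by how positive the classifier finds them. `rank_variants` is a hypothetical helper; it assumes only that `classifier` is a callable returning a list with one `{'label', 'score'}` dictionary per text, as the pipelines above do. The template sentence in the commented usage is likewise made up for illustration.

```python
def rank_variants(classifier, variants, target_label='positive'):
    """Hypothetical helper: score each text and sort by the target label's score.

    Texts whose top label is not `target_label` get a score of 0.0,
    because the pipeline (with default settings) reports only the top label.
    """
    ranked = []
    for text in variants:
        top = classifier(text)[0]
        score = top['score'] if top['label'].lower() == target_label else 0.0
        ranked.append((score, text))
    # Highest score first.
    return sorted(ranked, reverse=True)


# Usage with the FinBERT pipeline defined above (not run here):
# template = 'The market value of the programmer jumps over the brown {}.'
# texts = [template.format(word) for word in ('fox', 'dax')]
# for score, text in rank_variants(classifier, texts):
#     print(round(score, 3), text)
```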
