Amazon Reviews Sentiment Classifications and Textual Similarities with Universal Sentence Encoder.
GitHub Repository: https://github.com/Ashot72/spfx-universal-sentence-encoder
Video Link: https://youtu.be/6e3TPawyhRQ
Sentiment classification is the task of looking at a piece of text and telling whether someone likes or dislikes the thing they are talking about.
The input X is a piece of text and the output Y is the sentiment which we want to predict, such as the star rating of a movie review.
The movie is fantastic - 4 stars
The movie really sucks - 1 star
Textual similarity rests on the idea that sentences are semantically similar if they have a similar distribution of responses. For example,
How old are you? and What is your age? are both questions about age, which can be answered by similar responses
such as I am 48 years old. In contrast, while How are you? and How old are you? contain almost identical words, they have very different meanings
and lead to different responses.
Universal Sentence Encoder - https://github.com/tensorflow/tfjs-models/tree/master/universal-sentence-encoder encodes text into 512-dimensional
embeddings (we will discuss embeddings below) and uses a vocabulary of 8000 words. An additional benefit of the model is that it is trained on short
sentences/phrases (e.g. short Amazon comments).
The model is trained and optimized for greater-than-word length text, such as sentences, phrases or short paragraphs. It is trained on a
variety of data sources and a variety of tasks with the aim of dynamically accommodating a wide variety of natural language understanding
tasks.
In our Amazon Reviews SharePoint SPFx extension we use the Universal Sentence Encoder lite model
https://github.com/tensorflow/tfjs-models/tree/master/universal-sentence-encoder to rate a comment/review based on the predefined existing
reviews and to find a similar review.
One-Hot/Multi-Hot Encoding
In deep learning, values are mostly represented as float-type tensors (floating-point numbers), but text is different. Text data are made of
characters, not real numbers. That makes things difficult, as there is no numerical relation between, say, k and p in the same sense as there is
between 1.23 and 1.45. Text data have to be turned into vectors (i.e. arrays of numbers) before they can be fed into deep-learning models.
The conversion process is called text vectorization. One of the simplest ways is one-hot encoding. We can form a vocabulary of, say, the 10000
most frequently used English words and give every word in this vocabulary an integer index.
Each word in the vocabulary can then be represented as a length-10000 vector, in which only the element that corresponds to the word's index
is 1 and all remaining elements are 0.
Figure 1
Here is one-hot encoding of a word.
Figure 2
What if we have a sentence instead of a single word, such as the cat sat on the mat? In this case we can get the one-hot vectors
for all the words that make up the sentence and stack them together to form a 2D representation of the sentence. This approach
perfectly preserves the information about which words appear in the sentence and in what order.
Figure 3
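As a rough illustration (not code from the extension), here is how a short sentence could be one-hot encoded in TensorFlow.js with tf.oneHot; the tiny five-word vocabulary is made up for the example.

import * as tf from '@tensorflow/tfjs';

// A toy vocabulary (a real one would have around 10000 entries).
const vocab = ['the', 'cat', 'sat', 'on', 'mat'];
const wordToIndex = new Map(vocab.map((w, i) => [w, i] as [string, number]));

// Map each word of the sentence to its vocabulary index.
const sentence = 'the cat sat on the mat'.split(' ');
const indices = sentence.map(w => wordToIndex.get(w)!);

// One-hot encode: each row is a length-5 vector with a single 1.
const oneHot = tf.oneHot(tf.tensor1d(indices, 'int32'), vocab.length);
oneHot.print(); // shape [6, 5]: one row per word, in sentence order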
When text gets long, the size of this representation may get so big that it is no longer manageable. A sentence in English contains
about 18 words on average. With a vocabulary size of 10000 and a sentence of 18 words, it takes 10000 * 18 = 180000
numbers to represent just a single sentence, which takes far more space than the sentence itself.
Figure 4
One of the ways to deal with this problem is to put all the words into a single vector, so that each element of the vector indicates
whether the corresponding word appears in the text. In this representation, multiple elements of the vector can have the value 1,
which is why it is called multi-hot encoding. A multi-hot vector has a fixed length, the size of the vocabulary (10000), regardless of
how long the text is. This solves the size-expansion problem. The main drawback is that we lose the order information: we cannot tell
from the multi-hot vector which word comes first and which comes next. For some problems this is acceptable, for others it is not.
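A minimal sketch of multi-hot encoding under the same toy assumptions; it shows both the fixed vector length and how the word order is lost.

import * as tf from '@tensorflow/tfjs';

// Toy vocabulary; a real one would hold the ~10000 most frequent words.
const vocab = ['the', 'cat', 'sat', 'on', 'mat', 'dog'];

// Multi-hot: one fixed-length vector per text, with 1 wherever the word occurs.
function multiHot(text: string): tf.Tensor1D {
  const words = new Set(text.toLowerCase().split(' '));
  return tf.tensor1d(vocab.map(w => (words.has(w) ? 1 : 0)));
}

multiHot('the cat sat on the mat').print(); // [1, 1, 1, 1, 1, 0]
// Note: order is gone - 'the mat sat on the cat' produces the same vector.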
Word Embeddings
What is a word embedding? Similar to one-hot encoding (Figure 1), a word embedding represents a word as a vector, which is a
1D tensor in TensorFlow.js. However, word embeddings allow the values of the vector's elements to be trained instead of hard-coded.
When a text-oriented neural network uses word embeddings, the embedding vectors become trainable weight parameters of the model.
Figure 5
In the past, in NLP (natural language processing), words were replaced with unique IDs in order to do calculations. The disadvantage
of this approach is that you need to create a huge list of words and give each element a unique ID. Instead of using unique numbers
for your calculations, you can use vectors that represent the meaning of each word, so-called word embeddings.
Figure 6
Here, each word is represented by a vector. The length of the vector can vary: the bigger the vector, the more context
information it can store, but the calculation cost also goes up as the vector size increases. The element count of a vector is also called the
number of vector dimensions. In the picture, the word example is represented by (4, 2, 6), where 4 is the value of the first
dimension, 2 of the second and 6 of the third dimension.
In more complex examples, there can be more than 100 dimensions, which can encode a lot of information such as
gender, race, age or type of word.
Figure 7
A word such as one denotes a quantity, just like many, so the vectors of these two words lie closer together than the vectors of words that are
used very differently.
Figure 8
Here is an embedding matrix where we specified an embedding dimension of 200 and the length of the sentence.
We will go deeper into these numbers when we dive into the code.
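To make these numbers concrete, here is a hypothetical tf.layers.embedding layer with a vocabulary of 10000 words, an embedding dimension of 200 and a sentence length of 18; the exact values used in the extension may differ.

import * as tf from '@tensorflow/tfjs';

// Assumed sizes for illustration only.
const vocabSize = 10000;
const embeddingDim = 200;
const sentenceLength = 18;

const model = tf.sequential();
// The embedding matrix (vocabSize x embeddingDim) is a trainable weight of the model.
model.add(tf.layers.embedding({
  inputDim: vocabSize,
  outputDim: embeddingDim,
  inputLength: sentenceLength
}));

// A batch of word-index sequences of shape [batch, 18] comes out as
// shape [batch, 18, 200]: one 200-dimensional vector per word.
console.log(model.outputs[0].shape); // [null, 18, 200]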
Figure 9
You may remember that the Universal Sentence Encoder's embedding dimension is 512!
Application
Figure 10
Our SharePoint list defines 60 negative Amazon comments/reviews with a rating of zero.
Figure 11
These are actual Amazon reviews that I took from this site; they correspond to one-star ratings.
Figure 12
The list also includes 60 positive reviews, so there are 120 reviews altogether.
Figure 13
Positive reviews correspond to Amazon five-star ratings.
Figure 14
When we run the SPFx extension for the first time, we have to create the word/sentence embeddings and train the model.
Figure 15
We obtain reviews from the list to create word/sentence embeddings.
Figure 16
Sentence embeddings are created for 120 comments and the embedding dimension is 512.
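A sketch of how the review texts could be turned into that [120, 512] tensor with the Universal Sentence Encoder; the function name and the reviews argument are placeholders rather than the extension's actual code.

import * as use from '@tensorflow-models/universal-sentence-encoder';
import * as tf from '@tensorflow/tfjs';

// 'reviews' stands for the 120 comment texts fetched from the SharePoint list.
async function embedReviews(reviews: string[]): Promise<tf.Tensor2D> {
  const encoder = await use.load();                // loads the lite model and its tokenizer
  const embeddings = await encoder.embed(reviews); // shape [reviews.length, 512]
  return embeddings;
}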
Figure 17
We train the model after sentence embeddings have been created.
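The classifier could look roughly like the following sketch: a small dense network trained on the 512-dimensional sentence embeddings. The architecture and hyperparameters here are assumptions for illustration, not necessarily the ones used in the extension.

import * as tf from '@tensorflow/tfjs';

// xTrain: [120, 512] sentence embeddings, yTrain: [120, 1] labels (1 = positive, 0 = negative).
async function trainClassifier(xTrain: tf.Tensor2D, yTrain: tf.Tensor2D): Promise<tf.LayersModel> {
  const model = tf.sequential();
  model.add(tf.layers.dense({ inputShape: [512], units: 32, activation: 'relu' }));
  model.add(tf.layers.dense({ units: 1, activation: 'sigmoid' })); // outputs a probability
  model.compile({ optimizer: 'adam', loss: 'binaryCrossentropy', metrics: ['accuracy'] });
  await model.fit(xTrain, yTrain, { epochs: 50, shuffle: true });
  return model;
}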
Figure 18
If you run the app a second time, you may notice that there is no Train button this time and you can directly enter a comment.
Figure 19
The reason is that we save the model and its weights in local storage and load them from there, so there is no need to train the model each time.
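TensorFlow.js supports localstorage:// URLs for saving and loading models, which is what makes this possible. A minimal sketch, with a placeholder storage key:

import * as tf from '@tensorflow/tfjs';

const MODEL_KEY = 'localstorage://amazon-reviews-model'; // placeholder key name

// After training: persist the model topology and weights in the browser's local storage.
async function saveModel(model: tf.LayersModel): Promise<void> {
  await model.save(MODEL_KEY);
}

// On later runs: skip training and load the saved model instead.
async function loadModel(): Promise<tf.LayersModel> {
  return tf.loadLayersModel(MODEL_KEY);
}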
Figure 20
When you add a comment, the app displays a thumbs up or thumbs down icon and the probability. The comment It is worth much money is classified as positive
with 92.43% probability.
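Scoring a new comment could look roughly like this: embed the comment, run it through the trained classifier and read the sigmoid output as a probability. The helper name, the encoder/model parameters and the 0.5 threshold are assumptions for illustration.

import * as tf from '@tensorflow/tfjs';
import * as use from '@tensorflow-models/universal-sentence-encoder';

// 'encoder' is the loaded Universal Sentence Encoder, 'model' the trained classifier.
async function rateComment(
  encoder: use.UniversalSentenceEncoder,
  model: tf.LayersModel,
  comment: string
): Promise<{ positive: boolean; probability: number }> {
  const embedding = await encoder.embed([comment]);                     // shape [1, 512]
  const probability = (model.predict(embedding) as tf.Tensor).dataSync()[0];
  return { positive: probability >= 0.5, probability };                 // thumbs up / thumbs down
}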
Figure 21
It is not worth much money is classified as negative with 64.39% probability.
Figure 22
For each entered review we also display a similar sentence. We display the most similar sentence regardless of rating. For example, It is not worth much money is a
negative comment while The prices are a lot cheaper is a positive review; we simply find the most similar sentence.
Figure 23
For example, for the comment Amazon is my go-to we get 100%. That is because Amazon is my go-to is one of the reviews specified in the list, so we have a 100% match.
Figure 24
If you look at the embeddings, you will see that the embedding vectors produced by the Universal Sentence Encoder model are already normalized,
meaning each vector has (approximately) unit length. Therefore, to find the similarity between two vectors, it is enough to compute their inner (dot) product.
The inner product of normalized vectors is the same as their cosine similarity.
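A tiny check of this with toy numbers: two vectors are scaled to unit length, and their dot product is then exactly their cosine similarity.

import * as tf from '@tensorflow/tfjs';

// Two toy vectors, normalized to unit length (as USE embeddings already are).
const a = tf.tensor1d([3, 4]).div(tf.tensor1d([3, 4]).norm()); // [0.6, 0.8]
const b = tf.tensor1d([4, 3]).div(tf.tensor1d([4, 3]).norm()); // [0.8, 0.6]

// For unit-length vectors, the dot product equals the cosine similarity.
const similarity = a.dot(b);
similarity.print(); // ~0.96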
Figure 25
The sentence embedding for the comment It is worth much money is xPredict, a 2D tensor with shape [1, 512].
Figure 26
The shape is [1, 512] because we want to predict just for one comment - It is worth much money.
Figure 27
Our reviews' embedding tensor has shape [120, 512], as we have 120 comments.
Figure 28
We can get the embedding of the first comment using slice. this.xTrain.slice([0, 0], [1]) will give us the embedding of the first review in the list.
Figure 29
Here is the first review in the list.
Now we should calculate the inner/dot product of each review embedding (this.xTrain.slice([i, 0], [1])) with xPredict, the embedding of our comment It is worth much money (Figure 25), and find the highest score.
The highest score gives us the most similar sentence.
Figure 30
Let's test the dot product. Our sample xTrain has shape [1, 3] for testing (standing in for the embedding of an existing review), and
xPredict, the comment's embedding (It is worth much money), has the same shape. You can see that we cannot perform the dot product operation directly.
Figure 31
What we have to do is transpose xPredict; in effect, we flip it. Previously the tensor's shape was [1, 3] (1 row and 3 columns);
now it becomes [3, 1] (3 rows and 1 column). The result is a tensor with the value 26 (1 * 3 + 2 * 4 + 3 * 5 = 26).
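The same check can be reproduced with a few lines of TensorFlow.js; the toy values below are assumed from the arithmetic above rather than taken from the extension's code.

import * as tf from '@tensorflow/tfjs';

const xTrain = tf.tensor2d([[1, 2, 3]]);   // shape [1, 3]
const xPredict = tf.tensor2d([[3, 4, 5]]); // shape [1, 3]

// [1, 3] x [1, 3] is not a valid matrix multiplication, so transpose xPredict
// to [3, 1]; the product then has shape [1, 1].
const score = xTrain.matMul(xPredict.transpose());
score.print(); // [[26]] because 1*3 + 2*4 + 3*5 = 26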
Figure 32
We go through each comment and find the highest score.
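A sketch of that loop (variable names are illustrative; a single matMul over the whole [120, 512] tensor would be an equivalent, vectorized alternative):

import * as tf from '@tensorflow/tfjs';

// xTrain: [count, 512] review embeddings, xPredict: [1, 512] embedding of the new comment.
function mostSimilarIndex(xTrain: tf.Tensor2D, xPredict: tf.Tensor2D, count: number): number {
  const xPredictT = xPredict.transpose();                  // [512, 1]
  let bestIndex = 0;
  let bestScore = -Infinity;
  for (let i = 0; i < count; i++) {
    const review = xTrain.slice([i, 0], [1]);              // [1, 512], i-th review embedding
    const score = review.matMul(xPredictT).dataSync()[0];  // dot product with the new comment
    if (score > bestScore) {
      bestScore = score;
      bestIndex = i;
    }
  }
  return bestIndex; // index of the most similar stored review
}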
Figure 33
You can see the result: I find the items with a good price is the most similar sentence.