This paper shows how companies can design an intelligent system that understands their customers’ sentiments, so that they can improve the customer experience and strengthen their market position.

Sentiment analysis is widely used in web and social media monitoring. It gives businesses a comprehensive view of public opinion about the organization and its services. Deriving insights from the text and emoticons posted on social media is now a widely adopted practice among organizations worldwide. Digital media gives businesses in any industry an extensive opportunity to learn the needs, opinions and intents that users share on social media and the web. Listening to the voice of the consumer requires an in-depth understanding of what customers express in natural language. This paper describes the design of an intelligent system that understands human language and infers the sentiment behind it.

**1. Introduction:**

In recent years, the amount of data generated on the internet has increased rapidly, and it is expected to keep growing exponentially. Every day, large amounts of data are generated by social media, financial transactions, the behaviour of internet users, and consumers’ browsing and purchasing histories. Industry and academia continuously explore this data for insights that can increase revenue and improve the user experience on the internet. The data also includes a huge amount of raw text in the form of product reviews, news and research articles, blogs, song lyrics, poems, etc. Labeling or categorizing this text makes it possible to search efficiently for relevant information about a product or query within the huge volume of data on the internet.

Financial organizations are particularly concerned about their products and their reputation, so they rely on customer reviews to improve their services and products. Recently, various text mining and machine learning techniques have been explored to infer the sentiment polarity of such reviews.

The proposed work is a comparative study of the performance of deep learning techniques and traditional classification techniques in determining the polarity of customer reviews in the banking and insurance domain. The aim is to replace the manual rating of every piece of feedback with automatic classification. This approach gives a good estimate of a company’s reputation in the market in very little time, so that optimal decisions can be made in real time. The methodology employs deep learning techniques, namely a convolutional neural network (CNN), a bidirectional recurrent neural network (RNN) and a bidirectional long short-term memory (LSTM) network, and two traditional text classification algorithms, i.e., support vector machine (SVM) and Naive Bayes (NB).

**2. Methodology:**

The methodology consists of two main steps. The first step is crawling data from web resources, followed by manually rating it for classifier training. The second step is training the classification algorithms.

**2.1 Data construction:** Customer feedback on different banks and insurance agencies was fetched from several online sites that host large numbers of reviews, using Python libraries such as Scrapy and Beautiful Soup. The data was then manually rated into three categories, i.e., negative, neutral and positive.
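The post names Scrapy and Beautiful Soup but does not show its crawler. As a stand-in, the sketch below uses only Python's standard-library `html.parser` to extract review text from a page that has already been downloaded. The assumption that each review sits in an element with a `review` CSS class is hypothetical, and each real site would need its own rule for locating reviews.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta", "source", "wbr"}


class ReviewExtractor(HTMLParser):
    """Collect the text of every element whose class list contains 'review'.

    The 'review' class name is a hypothetical markup convention; each real
    site needs its own rule for locating review text.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a review element
        self.parts = []     # text fragments of the review currently being read
        self.reviews = []   # finished reviews with normalized whitespace

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a tag nested inside the current review
        elif "review" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:  # the review element itself has closed
            self.reviews.append(" ".join("".join(self.parts).split()))

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


# Usage on an already-downloaded (here: inline, made-up) page.
page = (
    '<div><p class="review">Great service!</p><p class="ad">Buy now</p>'
    '<p class="review text">Slow <b>claim</b> process.</p></div>'
)
parser = ReviewExtractor()
parser.feed(page)
parser.close()
print(parser.reviews)  # ['Great service!', 'Slow claim process.']
```

The extracted reviews would then be labeled manually as negative, neutral or positive.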

The constructed dataset consisted of 5000 labeled reviews. Punctuation is irrelevant for this task and was therefore removed from the reviews. The unique words of the dataset were ordered according to their frequency. Stop words such as the, is, an and about were also removed from the dictionary, because they do not affect sentiment polarity and occur with high frequency in all documents. The final dictionary $D$, of size $|D|$, contains the remaining unique words of the dataset. Each review $i$ is represented as a binary vector $x_i \in \{0,1\}^{|D|}$ that has a 1 at index $j$ if the $j$-th dictionary word is present in the review and a 0 otherwise. Hence, the whole document can be represented as the matrix

$$X = \begin{bmatrix} x_1^{\top} \\ x_2^{\top} \\ \vdots \\ x_{Tr+Ts}^{\top} \end{bmatrix} \in \{0,1\}^{(Tr+Ts) \times |D|},$$

where $Tr$ and $Ts$ are the numbers of training and test review samples, respectively. Each row of the matrix is the binary vector of one review over the dictionary $D$.
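A minimal sketch of the preprocessing described above, in plain Python. The stop-word list is a tiny illustrative assumption, since the post does not give its list. In practice the dictionary would be built from the training reviews only and then reused to encode the test reviews.

```python
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the post's list is not given).
STOP_WORDS = {"the", "is", "an", "a", "about", "and", "was", "to", "of"}


def tokenize(review):
    """Lower-case the review, strip punctuation and drop stop words."""
    cleaned = review.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in cleaned.split() if word not in STOP_WORDS]


def build_dictionary(reviews):
    """Unique words ordered by descending frequency (ties keep first-seen order)."""
    counts = Counter(word for review in reviews for word in tokenize(review))
    return [word for word, _ in counts.most_common()]


def to_binary_matrix(reviews, dictionary):
    """Row i has a 1 in column j if dictionary word j occurs in review i."""
    column = {word: j for j, word in enumerate(dictionary)}
    matrix = []
    for review in reviews:
        row = [0] * len(dictionary)
        for word in tokenize(review):
            if word in column:  # words missing from the dictionary are ignored
                row[column[word]] = 1
        matrix.append(row)
    return matrix


train_reviews = ["The claim process was slow.", "Great service, slow app!", "Great staff."]
dictionary = build_dictionary(train_reviews)
X_train = to_binary_matrix(train_reviews, dictionary)
print(dictionary)  # ['slow', 'great', 'claim', 'process', 'service', 'app', 'staff']
for row in X_train:
    print(row)
```

Words that are not in the dictionary, such as words that appear only in test reviews, are simply ignored when a review is encoded.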

The training samples were used to train the following supervised learning methods for the comparative study:

**3. Support vector machine (SVM):** The support vector machine is an efficient discriminative supervised classification model. It has been widely used for classification problems in industry because of its high prediction accuracy and its ability to handle high-dimensional data. SVMs separate two classes on the basis of two key concepts. First, a kernel function maps input data that is not linearly separable into a high-dimensional feature space in which it becomes linearly separable. Second, the margin around the separating hyperplane is maximized; this optimal hyperplane acts as the decision boundary for the classification.
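The following is a minimal sketch of a linear SVM trained with Pegasos-style stochastic sub-gradient descent on the regularized hinge loss. This training method is swapped in for illustration. The sketch uses no kernel, which is a common choice for high-dimensional bag-of-words text. The regularization strength `lam`, the epoch count and the toy data are assumptions. The sketch handles two classes labeled -1 and +1, so the three-class task in the post would need a scheme such as one-vs-rest.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the L2-regularized hinge loss.

    X: list of feature vectors; y: labels in {-1, +1}.
    A constant 1 is appended to every vector so the last weight acts as the bias.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # gradient step for the L2 term
            if margin < 1.0:  # inside the margin or misclassified: hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w


def predict_svm(w, x):
    """Sign of the decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


# Toy data: feature 0 ~ "positive word present", feature 1 ~ "negative word present".
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = [1, 1, -1, -1]
w = train_linear_svm(X, y)
print([predict_svm(w, x) for x in X])  # [1, 1, -1, -1]
```

Only points that violate the margin change the direction of `w`, so the final hyperplane is determined by the points closest to the boundary (the support vectors).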

**4. Naive Bayes classifier:** Naive Bayes (NB) is a generative (probabilistic) classification model based on the assumption that features are independent. It is applied to business intelligence problems such as text mining and computer vision, particularly when training examples are few and the features are independent of each other. As a generative classifier, it learns a model of the joint distribution $P(X, y_j)$ of input and output, where $X$ is the input data and $y_j$ is the output (class label). The posterior, i.e., the probability of class $y_j$ given the input data $X$, is obtained from the joint distribution using Bayes' rule:

$$P(y_j \mid X) = \frac{P(X, y_j)}{P(X)} = \frac{P(X \mid y_j)\,P(y_j)}{P(X)}$$

The parameters of the distributions $P(X \mid y_j)$ and $P(y_j)$ are estimated with parameter estimation methods such as maximum likelihood or expectation maximization.

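Finally, NB assigns to any given data instance $x_i$ the label of the most probable target class in $Y$, i.e.,

$$L(x_i) = \arg\max_{y_j \in Y} P(y_j \mid x_i) = \arg\max_{y_j \in Y} P(x_i \mid y_j)\,P(y_j),$$

where $L(x_i)$ is the label assigned to the data instance $x_i$. The denominator $P(x_i)$ is the same for every class and can therefore be dropped. Under the naive independence assumption, $P(x_i \mid y_j) = \prod_k P(x_{ik} \mid y_j)$ over the individual features $x_{ik}$.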
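Below is a minimal Bernoulli Naive Bayes sketch for the binary review vectors of Section 2.1. The post mentions Bernoulli NB in Section 6 but does not describe its implementation. The sketch estimates the class priors by maximum likelihood and the word probabilities with Laplace smoothing (the `alpha` parameter is an assumption). It then assigns the most probable class, computed in log space to avoid numerical underflow.

```python
import math


def train_bernoulli_nb(X, y, alpha=1.0):
    """Estimate P(y_j) and P(x_k = 1 | y_j) from binary rows X and labels y.

    Class priors are maximum-likelihood estimates; feature probabilities use
    Laplace (add-alpha) smoothing so unseen words never get probability 0.
    """
    priors, likelihoods = {}, {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        priors[c] = len(rows) / len(X)
        likelihoods[c] = [
            (sum(row[k] for row in rows) + alpha) / (len(rows) + 2 * alpha)
            for k in range(len(X[0]))
        ]
    return priors, likelihoods


def predict_bernoulli_nb(x, priors, likelihoods):
    """Return argmax_j of log P(y_j) + sum_k log P(x_k | y_j) (naive independence)."""
    best_label, best_score = None, -math.inf
    for c, prior in priors.items():
        score = math.log(prior)
        for x_k, p in zip(x, likelihoods[c]):
            score += math.log(p if x_k else 1.0 - p)  # absent words count too
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Toy binary vectors over the dictionary ["great", "slow", "staff"].
X = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 1]]
y = ["positive", "positive", "negative", "negative"]
priors, likelihoods = train_bernoulli_nb(X, y)
print(likelihoods["positive"])  # [0.75, 0.25, 0.5]
print(predict_bernoulli_nb([1, 0, 0], priors, likelihoods))  # positive
```

The Bernoulli variant also uses the absence of a word as evidence, which matches the binary presence/absence encoding of the reviews.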

**5. Artificial Neural Networks (ANN):** Recently, artificial neural networks and their variants have been widely exploited for classification tasks in intelligent systems for business decision making, such as predicting financial fraud, handwriting recognition, computer vision, text mining and self-driving cars. These models mimic the behaviour of neurons in the brain to learn from the given examples. The simplest form of ANN consists of only two layers of neurons, i.e., an input layer and an output layer, and can be applied to linear regression and linear classification. Non-linear classification problems, such as XOR, need to be addressed by introducing hidden layers, which add complexity to the model. Adding more hidden layers can also significantly reduce the size of each hidden layer (the number of neurons per layer).
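The XOR example can be made concrete. The four XOR points cannot be separated by a single line, so a network with only an input and an output layer cannot classify them, but one hidden layer is enough. In the sketch below, the weights are chosen by hand for illustration rather than learned. One hidden neuron computes OR, the other computes AND, and the output neuron fires for "OR and not AND".

```python
def step(z):
    """Threshold activation: the neuron fires (1) only if its input is positive."""
    return 1 if z > 0 else 0


def xor_network(x1, x2):
    """Two-input network with one hidden layer of two threshold neurons."""
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)       # hidden neuron 1: OR
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)      # hidden neuron 2: AND
    return step(1.0 * h_or - 1.0 * h_and - 0.5)  # output: OR and not AND = XOR


for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_network(a, b))  # last column: 0, 1, 1, 0
```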

However, adding hidden layers may also cause the model to overfit. Therefore, the trade-off between complexity and overfitting should be considered when building a model. Various ANN architectures have been proposed for different problems.

**5.1 Feed forward neural network:** In a feed forward neural network, each neuron (node) in one layer is connected to every neuron in the next layer, so information is always "fed forward" from one layer to the next. Pairs of input and output values are fed into the network over many cycles. After each pass, the back propagation algorithm updates the weights to minimize the error, so that the network learns the relationship between input and output. Networks with many hidden layers are called deep neural networks (DNNs), and each successive hidden layer learns more complex patterns than the previous one.
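Here is a minimal sketch of back propagation for a network with one hidden layer of sigmoid neurons, trained by full-batch gradient descent on the mean squared error. The XOR training data, hidden-layer size, learning rate and epoch count are illustrative assumptions, not the post's settings.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(data, n_hidden=3, lr=0.5, epochs=2000, seed=1):
    """Train a 1-hidden-layer sigmoid network with back propagation.

    Full-batch gradient descent on the mean squared error. Returns the
    weights and the loss recorded at the start of every epoch.
    """
    rng = random.Random(seed)
    n_in = len(data[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gW2 = [0.0] * n_hidden
        gb2 = 0.0
        loss = 0.0
        for x, target in data:
            # Forward pass: input -> hidden -> output.
            h = [sigmoid(sum(w * xi for w, xi in zip(W1[j], x)) + b1[j]) for j in range(n_hidden)]
            out = sigmoid(sum(w * hj for w, hj in zip(W2, h)) + b2)
            loss += 0.5 * (out - target) ** 2
            # Backward pass: error signal at the output, then at each hidden unit.
            d_out = (out - target) * out * (1.0 - out)
            gb2 += d_out
            for j in range(n_hidden):
                d_h = d_out * W2[j] * h[j] * (1.0 - h[j])
                gW2[j] += d_out * h[j]
                gb1[j] += d_h
                for i in range(n_in):
                    gW1[j][i] += d_h * x[i]
        # Update every weight against its batch-averaged gradient.
        n = len(data)
        b2 -= lr * gb2 / n
        for j in range(n_hidden):
            W2[j] -= lr * gW2[j] / n
            b1[j] -= lr * gb1[j] / n
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i] / n
        losses.append(loss / n)
    return (W1, b1, W2, b2), losses


XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
params, losses = train_mlp(XOR)
print(f"loss: first epoch {losses[0]:.4f}, last epoch {losses[-1]:.4f}")
```

With a small enough learning rate, each epoch of full-batch gradient descent should lower the loss. Whether the network fully solves XOR within the given epochs depends on the random initialization.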

However, adding successive hidden layers may make the model overly specific to the training examples, which causes poor performance on test (unseen) instances. Another problem faced by deep neural networks is the “vanishing gradient problem”: different layers of a DNN learn at vastly different speeds. For example, the later layers of the network may learn well while the early layers get stuck during training, learning almost nothing.
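The following shows why early layers can stall. During back propagation, the gradient that reaches an early layer is a product of one factor per later layer. The sigmoid derivative is at most 0.25, so with weights of magnitude at most 1 the product shrinks at least geometrically with depth. The sketch assumes a simplified chain with one sigmoid unit per layer, unit weights and zero pre-activations, which is the best case for the sigmoid.

```python
import math


def sigmoid_derivative(z):
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)  # largest value is 0.25, reached at z = 0


def gradient_scale(n_layers, weight=1.0, z=0.0):
    """Factor applied to a gradient flowing back through a chain of n_layers
    sigmoid units, each connected to the next by the same weight."""
    factor = 1.0
    for _ in range(n_layers):
        factor *= weight * sigmoid_derivative(z)
    return factor


for n in (1, 5, 10, 20):
    print(n, gradient_scale(n))
```

With ten layers, the gradient is already scaled by 0.25^10, which is roughly 10^-6. Updates to the first layers then become negligible compared with those of the last layers.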

**5.1.1 Convolutional neural network:** A convolutional neural network is a particular kind of DNN. It mitigates the “vanishing gradient problem” by using careful weight initialization, feature normalization (through batch normalization, which rescales feature values to zero mean and unit variance) and rectified linear units (ReLU). This approach has been used successfully to extract deep features for classification tasks and is widely used in computer vision. Convolutional networks combine three architectural ideas to ensure some degree of shift, scale and distortion invariance: local receptive fields, shared weights (weight replication), and spatial or temporal subsampling.
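Here is a minimal sketch of two of the ingredients named above. The first is batch normalization of one feature over a mini-batch, with the usual learnable scale `gamma` and shift `beta`, fixed here at 1 and 0. The second is the ReLU activation. Unlike the sigmoid, ReLU has a derivative of 1 for every positive input, so it does not shrink gradients passing through active units.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch to zero mean and unit variance,
    then apply the learnable scale gamma and shift beta (fixed here)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


def relu(z):
    """Rectified linear unit: passes positive inputs unchanged, zeroes the rest."""
    return max(0.0, z)


def relu_derivative(z):
    return 1.0 if z > 0 else 0.0  # gradient is not shrunk for active units


normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in normalized])
print([relu(v) for v in normalized])
```

The small `eps` term guards against division by zero when all values in a batch are equal.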

Basically, a CNN consists of two primary kinds of layers. In computer vision, convolution layers first convolve local image regions independently with multiple filters, and the responses are combined according to the coordinates of the image regions. Pooling layers then summarize the feature responses, using a fixed stride and pooling kernel size. CNNs do not consider contextual dependencies between different image regions, because both the convolution and pooling operations are applied locally to separate image areas. Contextual information, however, is crucial for obtaining the real meaning of raw sequential text data. Hence, other DNN architectures have been developed to capture contextual information, such as recurrent neural networks (RNNs) and their variant, long short-term memory (LSTM).
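For text, the CNN convolution and pooling steps are usually applied along the word sequence rather than over image regions. The sketch below assumes that each word has already been mapped to a small vector, here toy 2-dimensional embeddings, because the post does not say how its CNN represented the reviews. A single filter spanning two consecutive words is slid along the sequence with a ReLU, and the responses are then summarized by max pooling with a fixed window and stride.

```python
def conv1d(sequence, kernel, bias=0.0):
    """Slide a filter covering len(kernel) consecutive word vectors along the
    sequence (stride 1, no padding); return one ReLU response per position."""
    width = len(kernel)
    responses = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        z = bias + sum(
            k * v
            for k_row, word in zip(kernel, window)
            for k, v in zip(k_row, word)
        )
        responses.append(max(0.0, z))  # ReLU
    return responses


def max_pool(responses, size=2, stride=2):
    """Summarize the responses with a fixed pooling window and stride."""
    return [max(responses[i:i + size]) for i in range(0, len(responses) - size + 1, stride)]


# Five words with toy 2-d embeddings, and one hand-set filter of width 2.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, -3.0], [2.0, 1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
responses = conv1d(sequence, kernel)
pooled = max_pool(responses)
print(responses)  # [2.0, 1.0, 0.0, 1.0]
print(pooled)     # [2.0, 1.0]
```

Because the same filter weights are applied at every position (shared weights), a pattern is detected wherever it occurs in the review. However, each response sees only a local window of words.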

**5.1.2 Recurrent neural network (RNN):** Various learning tasks require information from sequential data. In tasks such as time series prediction, speech recognition, language modelling, translation, music information retrieval, text mining and video analysis, a model must learn from sequential input. The recurrent neural network (RNN) is a class of DNN designed to learn contextual dependencies in sequential data by using recurrent (feedback) connections.

RNNs are connectionist models that capture the dynamics of sequences via interconnected networks of simple units. In simple words, an RNN can be thought of as multiple copies of the same network, each passing a message to its successor. Unlike standard feed forward neural networks, this architecture enables RNNs to use information from an arbitrarily long context window. In the past, RNNs were difficult to train because of their millions of parameters; however, recent advances in optimization techniques, network architectures and parallel computation have enabled successful large-scale learning with them.
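Below is a minimal sketch of the recurrence behind a simple (Elman) RNN, h_t = tanh(W_x x_t + W_h h_{t-1} + b). The same weights are reused at every time step, which corresponds to the "multiple copies of the same network" view. The weight values are arbitrary assumptions.

```python
import math


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def rnn_forward(inputs, W_x, W_h, b):
    """Simple (Elman) RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), with h_0 = 0.

    The same W_x, W_h and b are reused at every time step."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        pre = [u + r + bias for u, r, bias in zip(matvec(W_x, x), matvec(W_h, h), b)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


# One hidden unit, one input feature; only the first input is non-zero.
states = rnn_forward([[1.0], [0.0], [0.0]], W_x=[[1.0]], W_h=[[0.5]], b=[0.0])
print([round(s[0], 4) for s in states])
```

Although only the first input is non-zero, it still influences the later hidden states through the recurrent connection, and its influence fades with each step.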

Learning with RNNs is nevertheless challenging because long-range dependencies are difficult to learn: the vanishing and exploding gradient problems occur when errors are back-propagated across many successive time steps. The long short-term memory (LSTM) architecture, described in the next subsection, addresses the vanishing gradient problem with specially designed nodes whose recurrent edges have a fixed unit weight.

**5.1.2.1 Long short-term memory (LSTM):** LSTM is an RNN architecture designed to handle long time dependencies in sequential data such as sentences and speech. It was motivated by an analysis of error flow in existing RNNs, in which long time lags were inaccessible because the back-propagated error either blows up or decays exponentially. By truncating the gradient where this does no harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete time steps, even for noisy, incompressible input sequences, without losing short-time-lag capability. This is done by enforcing constant error flow through “constant error carousels” (CECs) within special self-connected units that act as memory cells, while multiplicative gate units learn to open and close access to this constant error flow. Hence, LSTM is designed to overcome the vanishing gradient problem.
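The following is a minimal sketch of one LSTM time step. It follows the widely used variant with a forget gate, which was a later addition to the original LSTM design. The gate names and hand-set parameters are illustrative assumptions. When the forget gate is held open and the input gate is closed, the additive cell update carries the cell state across many steps almost unchanged, which is the constant-error-carousel idea.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One time step of an LSTM cell (the common variant with a forget gate).

    params maps each gate name to (W, U, b): W acts on the input x,
    U on the previous hidden state h_prev, and b is the bias vector.
    """
    def pre_activation(gate):
        W, U, b = params[gate]
        return [
            sum(w * xi for w, xi in zip(W_row, x))
            + sum(u * hi for u, hi in zip(U_row, h_prev))
            + bias
            for W_row, U_row, bias in zip(W, U, b)
        ]

    i = [sigmoid(z) for z in pre_activation("input")]        # how much new content to write
    f = [sigmoid(z) for z in pre_activation("forget")]       # how much old cell content to keep
    o = [sigmoid(z) for z in pre_activation("output")]       # how much of the cell to expose
    g = [math.tanh(z) for z in pre_activation("candidate")]  # candidate new content
    # Additive cell update: with f close to 1 and i close to 0 the cell is carried over unchanged.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c


# Hand-set parameters for a 1-unit cell with 1 input: forget gate open, input gate closed.
params = {
    "input": ([[0.0]], [[0.0]], [-20.0]),
    "forget": ([[0.0]], [[0.0]], [20.0]),
    "output": ([[0.0]], [[0.0]], [0.0]),
    "candidate": ([[1.0]], [[0.0]], [0.0]),
}
h, c = [0.0], [2.0]
for _ in range(100):
    h, c = lstm_step([0.3], h, c, params)
print(c)  # stays very close to [2.0] after 100 steps
```

In a trained network the gate parameters are learned, so the cell decides from the data when to keep, overwrite or expose its stored value.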

**5.1.2.2 Bidirectional Recurrent Neural Network with multiple LSTM layers:** The main idea of a bidirectional LSTM (BLSTM) recurrent neural network is to capture the context on both sides of the current word s(t), i.e., from s(t-n) to s(t) and from s(t) to s(t+n), in order to encode the text and make a decision. A BLSTM processes the input sequence in both directions with two sub-layers. Because of this context-capturing behaviour, these models have many applications in image captioning, speech recognition, language modeling and text mining.
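Here is a minimal sketch of the bidirectional idea. One recurrent pass reads the sequence from left to right, a second pass reads it from right to left, and their states are paired at each position. For brevity, the hypothetical step functions are simple scalar tanh cells standing in for LSTM cells. In a BLSTM, each direction would use its own LSTM cell.

```python
import math


def run_direction(inputs, step, h0):
    """Apply a recurrent step function along inputs, keeping every hidden state."""
    h, states = h0, []
    for x in inputs:
        h = step(x, h)
        states.append(h)
    return states


def bidirectional(inputs, step_fwd, step_bwd, h0):
    """Pair, at each position t, the forward state (context up to t) with the
    backward state (context from t to the end of the sequence)."""
    forward = run_direction(inputs, step_fwd, h0)
    backward = run_direction(inputs[::-1], step_bwd, h0)[::-1]  # re-align to positions
    return list(zip(forward, backward))


# Hypothetical scalar tanh cells standing in for the two LSTM sub-layers.
def step_fwd(x, h):
    return math.tanh(x + 0.5 * h)


def step_bwd(x, h):
    return math.tanh(x + 0.8 * h)


outputs = bidirectional([1.0, 0.0, 0.0], step_fwd, step_bwd, 0.0)
for t, (f, b) in enumerate(outputs):
    print(t, round(f, 3), round(b, 3))
```

At the last position, only the forward state carries information about the first word. At the first position, the backward state has already read the whole sequence, so each position sees context from both sides.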

**6. Experimental setup and Results:** The performance of the above classification models on the review data was compared. Bernoulli Naive Bayes has been widely used for text classification when little data was available. At present, however, data is available in sufficient amounts, which is ideal for deep learning. Our study also shows that the deep learning techniques (BLSTM and CNN) classify sentiment better than the conventional methods, owing to their ability to capture more complex features and context in a large data set (Table 1).
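The post does not state which metrics Table 1 reports. As an illustration only, the sketch below computes two common metrics for the three sentiment classes on held-out reviews. The first is accuracy. The second is macro-averaged F1, which weights each class equally and is therefore less flattered by a dominant class. The labels in the example are made up.

```python
LABELS = ("negative", "neutral", "positive")


def accuracy(y_true, y_pred):
    """Fraction of reviews whose predicted label matches the manual label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Made-up labels for four test reviews, purely to show the calls.
y_true = ["positive", "negative", "neutral", "positive"]
y_pred = ["positive", "negative", "positive", "positive"]
print(accuracy(y_true, y_pred))  # 0.75
print(macro_f1(y_true, y_pred))
```

In this example, accuracy is 0.75 even though the only neutral review is misclassified. Macro-F1 exposes that miss through the neutral class's F1 of 0.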

**7. Conclusion:** This study showed that, among the models compared, the bidirectional long short-term memory RNN is the best choice of classifier for determining the polarity of review sentiments. The study can help organizations quantify their reputation or product quality in real time, so that the necessary steps can be taken. Other potential applications of this work include social media monitoring (such as gauging public opinion on certain topics, or tracking sentiment towards products, movies and politicians), improving customer relationship models, detecting happiness and well-being, and improving automatic dialogue systems.

© 2019 Data Science Central ® Powered by
