Sentiment Analysis of Hotel and Restaurant Yelp Reviews: A Comparison of Methods and Theories

This project, led by Fuad Mehraliyev, an associate professor at ISE, compares methodological and theoretical approaches to sentiment analysis in the context of service encounters. Eight studies were conducted using the Yelp dataset, with multiple models compared in each study.

The analytical process involves several steps, including data cleaning and preparation, extracting review attributes, converting texts into sentiment scores, and finally, performing regression analysis. The results are then stored for later visualization and comparison.
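As a rough illustration of the first three steps, the sketch below cleans a table of reviews, derives two simple attributes, and attaches a sentiment score. It is a minimal sketch, not the project's code: the column names `text` and `stars`, the attributes `word_count` and `exclamations`, and the function names are all assumptions made for this example, and the scorer is left as a pluggable function so that any sentiment method can be swapped in.

```python
import pandas as pd


def prepare_reviews(df: pd.DataFrame) -> pd.DataFrame:
    """Clean raw reviews and derive simple review attributes.

    Assumes (hypothetically) a 'text' column with the review body and a
    'stars' column with the rating.
    """
    out = df.dropna(subset=["text", "stars"]).copy()
    out["text"] = out["text"].astype(str).str.strip()
    out = out[out["text"] != ""]              # drop empty reviews
    out = out.drop_duplicates(subset=["text"])  # drop verbatim duplicates
    # Illustrative attributes only; the project's attribute set is not specified.
    out["word_count"] = out["text"].str.split().str.len()
    out["exclamations"] = out["text"].str.count("!")
    return out.reset_index(drop=True)


def add_sentiment(df: pd.DataFrame, scorer) -> pd.DataFrame:
    """Attach a 'sentiment' column by applying scorer(text) -> float to each review."""
    out = df.copy()
    out["sentiment"] = out["text"].map(scorer)
    return out
```

Keeping the scorer as an argument means the same cleaned table can be scored by several methods and the resulting models compared side by side.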

Several packages were used in this process. pandas and NumPy were used for reading and cleaning the data and for basic computations. The VADER sentiment analysis tool from the nltk package produced the sentiment scores. Statistical analysis was conducted primarily with the statsmodels package. Because the results had to be stored in a structured way so that researchers could refer to them later, Python's csv module was used to write them to a CSV file. Finally, matplotlib was used to visualize the results as figures.
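The scoring, regression, and storage steps might be combined roughly as follows. This is a sketch under stated assumptions rather than the project's implementation: the formula, the column names `stars` and `sentiment`, the long-format CSV layout, and the helper names `make_vader_scorer`, `fit_ols`, and `store_results` are hypothetical. The VADER lexicon must be downloaded once with `nltk.download("vader_lexicon")` before the scorer can be built.

```python
import csv
import os

import pandas as pd
import statsmodels.formula.api as smf


def make_vader_scorer():
    """Return a function mapping a review text to VADER's compound score in [-1, 1]."""
    # Imported lazily so the regression helpers work without nltk installed.
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]


def fit_ols(df: pd.DataFrame, formula: str):
    """Fit an ordinary least squares model, e.g. 'stars ~ sentiment'."""
    return smf.ols(formula, data=df).fit()


def store_results(result, label: str, path: str) -> None:
    """Append one row per model term to a CSV file, writing the header once."""
    fields = ["model", "term", "coef", "p_value", "r_squared", "n_obs"]
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields)
        if is_new:
            writer.writeheader()
        for term in result.params.index:
            writer.writerow({
                "model": label,
                "term": term,
                "coef": result.params[term],
                "p_value": result.pvalues[term],
                "r_squared": result.rsquared,
                "n_obs": int(result.nobs),
            })
```

Storing one row per term keeps the file schema identical across models with different predictors, which makes later comparison and plotting simpler.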

Because different methodological approaches were compared, most of these steps had to be repeated multiple times in each of the eight studies. With a dataset of more than 5 million observations, running these procedures repeatedly on a standard computer would have required an enormous amount of time.
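One way to picture this repetition is as a grid of study and model combinations, each of which reruns the same pipeline. The sketch below is illustrative only: how the work was actually scheduled is not described here, and the names `build_grid`, `run_grid`, and `run_one` are hypothetical.

```python
from itertools import product


def build_grid(studies, specs):
    """Pair every study with every model specification."""
    return [{"study": s, "spec": m} for s, m in product(studies, specs)]


def run_grid(grid, run_one, map_fn=map):
    """Apply run_one(config) to every configuration and collect the results.

    map_fn defaults to the built-in, sequential map; a parallel map with the
    same calling convention can be passed in instead.
    """
    return list(map_fn(run_one, grid))
```

On a multi-core machine, a parallel map such as `concurrent.futures.ProcessPoolExecutor().map` could be passed as `map_fn`, provided `run_one` is a top-level function that can be pickled.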

With the computational power of HPC, the processing time was reduced to several hours. On a standard laptop, the same work could take days or might not be feasible at all. By using HPC, the project was able to compare the results of a multitude of methodological and theoretical approaches, answering questions that many researchers face when performing sentiment analysis and thus serving as a guide for future research. The findings illustrate the advantages and disadvantages of different approaches, helping future researchers make appropriate methodological decisions and determine research directions based on their priorities.

The findings also have theoretical implications. First, there is an ongoing discussion about the relevance of theory and theoretical frameworks in big data analysis. Many big data analysts argue for a “data-driven” approach. The findings support the importance of a data-driven approach but indicate that theory-driven approaches cannot be abandoned. Furthermore, the findings reveal that certain service theories perform better in the restaurant context than in the hotel context.

Published date: