Abstract

Movie reviews play a crucial role in a film's success because they influence audience decisions, so automating their sentiment classification is essential for efficient public opinion analysis. This task, however, faces challenges such as high-dimensional data and imbalanced class distributions. This study addresses these issues by applying dimensionality reduction, namely Principal Component Analysis (PCA) and the manifold learning method Laplacian Eigenmaps (LE), to reduce data complexity, together with undersampling strategies, Random Undersampling (RUS) and EasyEnsemble, to balance the data and improve predictions for both sentiment classes. On reviews of The Raid 2: Berandal, EasyEnsemble achieved the highest average G-Mean (0.694) using Term Frequency-Inverse Document Frequency (TF-IDF) features with a linear kernel and no dimensionality reduction. RUS produced balanced but inconsistent results, while Random Oversampling (ROS) combined with PCA (retaining 85% of cumulative variance) improved predictions for negative reviews. Laplacian Eigenmaps were effective for negative reviews at 500 dimensions but less accurate for positive ones. Overall, EasyEnsemble handled class imbalance best, although optimizing performance with manifold learning remains challenging.
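As a rough illustration of three steps the abstract names, the pure-Python sketch below shows (1) G-Mean as the geometric mean of per-class recall, (2) random undersampling of the larger class, and (3) choosing how many PCA components retain 85% of cumulative variance. The function names and toy data are hypothetical and are not taken from the study; a real pipeline would more likely rely on established libraries such as scikit-learn and imbalanced-learn. EasyEnsemble extends the undersampling idea by drawing several independent balanced subsets and combining the classifiers trained on them; that ensemble step is not shown here.

```python
import math
import random
from collections import Counter, defaultdict


def g_mean(y_true, y_pred):
    """Geometric mean of per-class recall (hypothetical helper).

    For two classes this equals sqrt(sensitivity * specificity), so a model
    that predicts only the majority class scores 0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / totals[c] for c in totals]
    return math.prod(recalls) ** (1.0 / len(recalls))


def random_undersample(X, y, seed=0):
    """Randomly drop samples (without replacement) from larger classes until
    every class has as many samples as the smallest one (hypothetical helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    rng.shuffle(keep)  # avoid grouping the output by class
    return [X[i] for i in keep], [y[i] for i in keep]


def n_components_for_variance(variances, threshold=0.85):
    """Smallest number of principal components whose cumulative share of the
    total variance reaches `threshold` (0.85 means 85%). `variances` are the
    per-component variances, e.g. eigenvalues of the covariance matrix."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, v in enumerate(ordered, start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return k
    return len(ordered)


if __name__ == "__main__":
    # Toy imbalanced labels: always predicting "pos" looks accurate (5/6)
    # but ignores the minority class entirely, so G-Mean is 0.
    y_true = ["neg", "pos", "pos", "pos", "pos", "pos"]
    print(g_mean(y_true, ["pos"] * 6))  # 0.0

    # Cumulative variance ratios: 0.5, 0.75, 0.875 -> 3 components reach 85%.
    print(n_components_for_variance([4.0, 2.0, 1.0, 1.0]))  # 3

    # 8 "pos" vs 2 "neg" becomes 2 vs 2 after undersampling.
    Xb, yb = random_undersample(list(range(10)), ["pos"] * 8 + ["neg"] * 2)
    print(sorted(Counter(yb).items()))  # [('neg', 2), ('pos', 2)]
```

The G-Mean example shows why it suits imbalanced data: plain accuracy rewards predicting only the majority class, while the geometric mean collapses to zero as soon as one class is never recognized.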

DOI

10.17977/um018v7i22024p139-151
