A Comparative Study of Machine Learning Models for Javanese Wuku Classification: Exploring SVM, Naïve Bayes, and CNN for Cultural Texts

Abstract

This study evaluates machine learning models for classifying culturally significant Javanese Wuku texts from the “Keagamaan atau Spiritual” (Religious or Spiritual) category, a domain made difficult by distinctive linguistic nuances and limited digitized resources. We compare Support Vector Machine (SVM), Naïve Bayes, and Convolutional Neural Network (CNN) classifiers on texts from five Wuku types (Sinta, Galungan, Kuningan, Sungsang, Warigalit) sourced from sastra.org, aiming to identify the most effective computational approach. The dataset comprises N = 1419 documents (T = 751,290 tokens), with per-class document counts reported for all five Wuku types. Models are evaluated on accuracy, precision, recall, F1-score, and Area Under the Curve (AUC) under repeated stratified 5-fold cross-validation (10 repeats; 50 runs) to obtain robust estimates. CNN achieved the best performance with Accuracy = 0.92 ± [SD], Macro-F1 = 0.90 ± [SD], and AUC = 0.93 ± [SD], outperforming SVM (Accuracy: 0.87; F1-score: 0.84) and Naïve Bayes (Accuracy: 0.82; F1-score: 0.78). The results indicate that CNN is effective for nuanced, context-rich text classification, contributing to cultural heritage preservation and to Natural Language Processing (NLP) for under-resourced languages. From a knowledge-engineering perspective, predicted Wuku labels can serve as structured metadata to support computational indexing and retrieval of Wuku narratives in cultural information systems. Methodologically, our CNN is a lightweight design for small corpora that uses tuned regularization (dropout and early stopping) and multi-scale convolution to capture culturally salient n-gram cues, rather than relying on a fixed default TextCNN configuration. Future work involves expanding the dataset and exploring more advanced deep learning architectures.
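The evaluation protocol named in the abstract (repeated stratified 5-fold cross-validation, 10 repeats, 50 runs) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `repeated_stratified_kfold` and `summarise`, the seed, and the toy label list of 20 placeholder documents per class are all assumptions made for illustration, and the real per-class counts are not reproduced here.

```python
import random
import statistics
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every repeat reshuffles within each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)
            # Deal class members round-robin so each fold keeps the class ratio.
            for pos, idx in enumerate(members):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield train_idx, test_idx


def summarise(scores):
    """Return (mean, sample standard deviation) over all runs."""
    return statistics.mean(scores), statistics.stdev(scores)


# Toy labels only (assumption): five Wuku classes, 20 placeholder documents each.
wuku = ["Sinta", "Galungan", "Kuningan", "Sungsang", "Warigalit"]
labels = [name for name in wuku for _ in range(20)]
splits = list(repeated_stratified_kfold(labels))
print(len(splits))  # 50
```

In a setup like this, each of the 50 splits would yield one score per metric, and `summarise` would turn those 50 scores into a "mean ± SD" figure of the kind the abstract reports for the CNN.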

First Page

104

Last Page

117

Recommended Citation

Sulistyo, D., Prasetya Wibawa, A., Prasetya, D., Ahda, F., & Utama, A. (2025). A Comparative Study of Machine Learning Models for Javanese Wuku Classification: Exploring SVM, Naïve Bayes, and CNN for Cultural Texts. Knowledge Engineering and Data Science, 8(1), 104-117.
DOI: https://doi.org/10.17977/um018v8i12025p104-117
Available at: https://citeus.um.ac.id/keds/vol8/iss1/7


Included in

Computational Engineering Commons, Data Science Commons, Language and Literacy Education Commons
