Abstract
This study evaluates machine learning models for classifying culturally significant Javanese Wuku texts from the “Keagamaan atau Spiritual” (Religious or Spiritual) category, a domain made difficult by distinctive linguistic features and limited digitized resources. We compare Support Vector Machine (SVM), Naïve Bayes, and Convolutional Neural Network (CNN) classifiers on texts from five Wuku types (Sinta, Galungan, Kuningan, Sungsang, Warigalit) sourced from sastra.org, with the aim of identifying the most effective computational approach. The dataset comprises N = 1,419 documents (T = 751,290 tokens), with per-class document counts reported for all five Wuku types. Models are evaluated with accuracy, precision, recall, F1-score, and Area Under the Curve (AUC) under repeated stratified 5-fold cross-validation (10 repeats, 50 runs in total) to obtain robust estimates. CNN achieved the best performance with Accuracy = 0.92 ± [SD], Macro-F1 = 0.90 ± [SD], and AUC = 0.93 ± [SD], outperforming SVM (Accuracy: 0.87; F1-score: 0.84) and Naïve Bayes (Accuracy: 0.82; F1-score: 0.78). These results indicate that CNN is effective for nuanced, context-rich text classification, which contributes to cultural heritage preservation and to Natural Language Processing (NLP) for under-resourced languages. From a knowledge-engineering perspective, predicted Wuku labels can serve as structured metadata that supports computational indexing and retrieval of Wuku narratives in cultural information systems. Methodologically, our CNN is a lightweight design for small corpora that uses tuned regularization (dropout and early stopping) and multi-scale convolution to capture culturally salient n-gram cues, rather than relying on a fixed default TextCNN configuration. Future work will expand the dataset and explore more advanced deep learning architectures.
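The evaluation protocol named in the abstract (stratified 5-fold cross-validation repeated 10 times, giving 50 runs whose scores are summarized as mean ± SD) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names `repeated_stratified_kfold` and `summarize`, the round-robin fold assignment, the seed, and the toy labels are all assumptions made for illustration.

```python
import random
from collections import defaultdict
from statistics import mean, stdev


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each repeat reshuffles within class."""
    rng = random.Random(seed)  # assumed seed, for reproducibility only
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Deal class members round-robin so every fold keeps the class mix.
            for pos, idx in enumerate(shuffled):
                folds[pos % n_splits].append(idx)
        for k in range(n_splits):
            test_idx = sorted(folds[k])
            train_idx = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield train_idx, test_idx


def summarize(scores):
    """Return (mean, sample standard deviation) over all runs."""
    return mean(scores), stdev(scores)


# Toy labels only: 20 placeholder documents per Wuku class, not the real corpus.
wuku_types = ("Sinta", "Galungan", "Kuningan", "Sungsang", "Warigalit")
toy_labels = [w for w in wuku_types for _ in range(20)]
splits = list(repeated_stratified_kfold(toy_labels))
print(len(splits))  # 50 runs = 5 folds x 10 repeats
```

In practice, each `(train_idx, test_idx)` pair would be used to fit and score one model, and the 50 resulting scores per metric would be passed to `summarize` to obtain the mean and standard deviation.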
DOI
https://doi.org/10.17977/um018v8i12025p104-117
First Page
104
Last Page
117
Recommended Citation
Sulistyo, Danang Arbian; Prasetya Wibawa, Aji; Prasetya, Didik Dwi; Ahda, Fadhli Almu'iini; and Utama, Agung Bella Putra (2025) "A Comparative Study of Machine Learning Models for Javanese Wuku Classification: Exploring SVM, Naïve Bayes, and CNN for Cultural Texts," Knowledge Engineering and Data Science: Vol. 8: No. 1, Article 7.
DOI: https://doi.org/10.17977/um018v8i12025p104-117
Available at: https://citeus.um.ac.id/keds/vol8/iss1/7
Included in
Computational Engineering Commons, Data Science Commons, Language and Literacy Education Commons
