Abstract
Essay assessment in economics education plays an important role in measuring students' conceptual understanding and higher-order thinking skills, but it is often hampered by subjectivity and by teachers' workload. This study analyzes the reliability of, and level of agreement between, teacher assessments and a ChatGPT-based automatic essay grading system (EsyGrade). The study uses a quantitative approach with a comparative reliability design involving 60 high school students in grades 10 and 11. Data were analyzed using the Intraclass Correlation Coefficient (ICC), Pearson's correlation, and a paired-samples t-test. The results show that EsyGrade has a very high level of reliability and is consistent with teacher assessments, both for total scores and for each dimension of essay assessment. These findings indicate that EsyGrade has the potential to be a reliable and objective essay assessment support tool in economics learning.
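The three statistics named in the abstract can be reproduced on paired teacher/AI scores. The sketch below is illustrative only: the data are simulated (the study's actual 60-student scores are not given here), and the ICC variant is assumed to be ICC(2,1), a two-way random-effects, absolute-agreement model commonly used for inter-rater reliability.

```python
import numpy as np
from scipy import stats

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: (n_subjects, k_raters) array of essay scores.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-student means
    col_means = scores.mean(axis=0)   # per-rater means
    # Mean squares from the two-way ANOVA decomposition
    msr = k * ((row_means - grand) ** 2).sum() / (n - 1)   # between subjects
    msc = n * ((col_means - grand) ** 2).sum() / (k - 1)   # between raters
    sse = ((scores - row_means[:, None] - col_means[None, :] + grand) ** 2).sum()
    mse = sse / ((n - 1) * (k - 1))                        # residual
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

# Simulated stand-in for 60 students scored by a teacher and by the AI grader
rng = np.random.default_rng(0)
teacher = rng.uniform(60, 95, size=60).round()
ai = np.clip(teacher + rng.normal(0, 2, size=60), 0, 100).round()

icc = icc_2_1(np.column_stack([teacher, ai]))  # agreement between the two raters
r, _ = stats.pearsonr(teacher, ai)             # linear association of scores
t, p = stats.ttest_rel(teacher, ai)            # test for a systematic score gap
```

With simulated data this close, ICC and r both land near 1, and a non-significant paired t-test would indicate no systematic difference between teacher and AI scoring, the pattern the abstract reports.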
Pages 81–86
Recommended Citation
Pratama, Ramadzan Defitri; Sangka, Khresna Bayu; and Indrawati, Cicilia Dyah Sulistyaningrum (2026) "Reliability of ChatGPT-Based Essay Scoring: A Teacher–AI Comparison in Economics Education," Jurnal Pendidikan: Teori, Penelitian, dan Pengembangan: Vol. 11, No. 3, Article 1.
DOI: https://doi.org/10.17977/2502-471X.1183
Available at: https://citeus.um.ac.id/jptpp/vol11/iss3/1
Supplementary file: The EsyGrade System Description & Data Set.pdf
Included in: Curriculum and Instruction Commons; Educational Assessment, Evaluation, and Research Commons; Educational Technology Commons; Education Economics Commons; Teacher Education and Professional Development Commons
