Journal of East China Normal University (Educational Sciences) ›› 2025, Vol. 43 ›› Issue (9): 69-82. doi: 10.16382/j.cnki.1000-5560.2025.09.006

• Educational Evaluation •

教育考试增值评价模型构建:基于深度神经网络的方法

李金波¹, 苏胜², 曾平飞², 王永固³

  1. 浙江省教育考试院,杭州 310012
  2. 浙江师范大学心理学院,金华 321004
  3. 浙江工业大学教育学院,杭州 310023
  • 接受日期:2025-05-08 出版日期:2025-09-01 发布日期:2025-08-25
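  • Funding:
    "14th Five-Year Plan" supporting research project of the National Education Examinations Authority of the Ministry of Education, "Research on Assessment Techniques for Improving Examination Results" (20220008); General Program of the National Natural Science Foundation of China, "Research on a Multimodal Feature Fusion-Based Affective and Social Intelligent Perception Model for Autism Education Robots and Its Applications" (62177043).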

Construction of Value-added Assessment Model for Educational Tests: A Method Based on Deep Neural Network

Jinbo Li¹, Sheng Su², Pingfei Zeng², Yonggu Wang³

  1. Zhejiang Education Examinations Authority, Hangzhou 310012, China
  2. College of Psychology, Zhejiang Normal University, Jinhua 321004, China
  3. College of Education, Zhejiang University of Technology, Hangzhou 310023, China
  • Accepted:2025-05-08 Online:2025-09-01 Published:2025-08-25

摘要:

教育评价改革是新时期深化教育改革的关键环节,但传统增值评价方法在处理学习过程的动态特征和复杂依赖关系方面存在技术局限。本研究以浙江省2023届4869名高中学生为研究对象,构建时序模式注意力长短时记忆深度神经网络(TPA-LSTM)增值评价模型,通过结合分位数回归方法,实现对学生成绩时序特征和非线性变化的精准评估。研究基于高中五个学期的语文考试成绩,对个体层面的学习轨迹特征和群体层面的增值表现进行系统分析。研究发现:TPA-LSTM模型在测试集上的均方根误差(RMSE)为0.082,平均绝对误差(MAE)为0.067,显著优于传统SGP模型;对高二下学期成绩相同(0.716)的学生群体,能够根据其历史学习轨迹识别出34至80的增值水平差异;模型的时序权重分布特征揭示了第三学期和第四学期为学习关键期,为评价结果提供了更强的解释性。研究表明,该模型在个体评价层面实现对学习轨迹的精确刻画,在群体层面揭示不同类型学生的发展特征,为提高教育考试增值评价的预测精度和教育诊断价值提供新的技术路径。

关键词: 教育考试, 增值评价, 神经网络模型, 时序模式, 长短时记忆网络

Abstract:

Educational evaluation reform is a key part of deepening educational reform in the new era, yet traditional value-added assessment methods are technically limited in handling the dynamic characteristics and complex dependencies of the learning process. Using data from 4,869 high school students in Zhejiang Province's class of 2023, this study constructs a value-added assessment model based on a Temporal Pattern Attention Long Short-Term Memory (TPA-LSTM) deep neural network. Combined with quantile regression, the model is designed to evaluate the temporal features and nonlinear changes of student performance precisely. Based on Chinese language test scores from five high school semesters, the study systematically analyzes learning trajectory characteristics at the individual level and value-added performance at the group level. The findings are threefold. First, the TPA-LSTM model achieves a root mean square error (RMSE) of 0.082 and a mean absolute error (MAE) of 0.067 on the test set, significantly outperforming the traditional student growth percentile (SGP) model. Second, for students with identical scores (0.716) in the second semester of grade 11, the model distinguishes value-added levels ranging from 34 to 80 according to their historical learning trajectories. Third, the model's temporal weight distribution indicates that the third and fourth semesters are critical learning periods, which makes the evaluation results more interpretable. The results suggest that the model characterizes learning trajectories precisely at the individual level and reveals the developmental patterns of different types of students at the group level, offering a new technical approach to improving both the predictive accuracy and the diagnostic value of value-added assessment in educational testing.

Key words: educational testing, value-added assessment, neural network model, temporal pattern, long short-term memory network
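
The abstract reports RMSE and MAE on the test set and says the model is combined with quantile regression. As a reading aid, the following minimal Python sketch shows how these three quantities are commonly computed. It is not the authors' implementation: the function names and sample scores are hypothetical, and the pinball loss is assumed here only because it is the standard objective for quantile regression, not because the article states which loss it uses.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error of predictions against observed scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def mae(y_true, y_pred):
    """Mean absolute error of predictions against observed scores."""
    absolute = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return sum(absolute) / len(absolute)


def pinball_loss(y_true, y_pred, tau):
    """Average pinball (quantile) loss at quantile level tau, 0 < tau < 1.

    Under-predictions are weighted by tau and over-predictions by (1 - tau),
    so minimizing this loss targets the tau-th conditional quantile.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        diff = t - p
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


# Hypothetical normalized scores, for illustration only (not data from the study).
y_true = [0.70, 0.65, 0.80, 0.72]
y_pred = [0.68, 0.70, 0.75, 0.72]

print("RMSE:", rmse(y_true, y_pred))
print("MAE:", mae(y_true, y_pred))
print("Pinball loss at tau=0.9:", pinball_loss(y_true, y_pred, 0.9))
```

At `tau = 0.5` the pinball loss equals half the MAE, which is why median regression and MAE minimization coincide. Evaluating a model at many values of `tau` yields a set of conditional quantiles against which an observed score can be located.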