Classroom teaching is the primary channel for talent cultivation. As intelligent technologies develop rapidly, large-scale pre-trained language models have gradually entered educational settings and become a key technological variable driving the transformation of teaching paradigms. However, the effectiveness of large models in education currently varies widely, and evaluations are mostly limited to technical performance, while their pedagogical appropriateness and attainment of educational goals still await empirical verification. This study constructs a three-dimensional evaluation model of “value guidance, knowledge construction, and cognitive development” and adopts a comparative experimental design to systematically evaluate teaching texts generated by six mainstream large models (three domestic and three international) across multiple K-12 subjects (Chinese, mathematics, foreign languages, integrated sciences, and integrated humanities), comparing them with expert teachers’ teaching records from parallel lessons. The findings are as follows. First, in the value guidance dimension, teacher-led teaching is clearly dominated by moral education; domestic models provide balanced value guidance, performing consistently across core value dimensions; and international models show a differentiated value orientation, excelling in dimensions such as social responsibility but displaying structural deficiencies in national identity and cultural inheritance. Second, in knowledge construction, teachers focus closely on curriculum content, whereas large models exhibit stronger knowledge extensibility, with significant advantages in constructing interdisciplinary knowledge networks.
Finally, in cognitive development, large models are significantly more effective at promoting higher-order thinking (including complex problem-solving, knowledge transfer, and innovative thinking), while teacher-led teaching excels in declarative knowledge mastery and situated learning experiences but risks cognitive fixation. The study aims to help educational practitioners apply large models scientifically and rationally to improve classroom teaching, and it offers empirical evidence for reconstructing a new “human-machine collaborative” educational paradigm and promoting high-quality educational development in the AI era.