Using Natural Language Processing Models to Automate Text Labelling: Categorising Semantic Density in Preservice Teachers' Lesson Observation Reports

Abstract

Education researchers have long had to choose between studies that provide rich insight into teaching and learning in a particular context and large-scale studies that reveal broad patterns. Advances in natural language processing models could enable research that offers detailed analysis of specific cases while also revealing broader patterns in a much larger dataset. This paper reports on a study that tested the accuracy of advanced natural language processing models in assigning labels to a qualitative dataset. The dataset comes from lesson observation reports written by a cohort of preservice teachers pursuing a Postgraduate Certificate in Education (PGCE). Their responses were manually analysed using Legitimation Code Theory (LCT) and graded from simple descriptive observations to complex ones that suggested an interpretation of teachers’ pedagogic actions. The Bidirectional Encoder Representations from Transformers (BERT) model and its derivatives, namely DistilBERT and RoBERTa, were trained to recognise coding decisions made by researchers on a subset of the empirical data. The study evaluates the efficacy of these models by comparing the labels they assigned to sections of the dataset with those allocated manually by the research team. Using a dataset of 2167 manually annotated sections, the models were trained, refined, and tested on the labelling task. A comparative analysis of BERT, DistilBERT, and RoBERTa offers insights into their strengths, efficiencies, and adaptability; the models achieved accuracy rates between 72% and 78%. These metrics indicate the current efficacy of the models in coding semantic density in lesson observation reports and open possibilities for analysing massive datasets of similar text. The challenges experienced also reveal the potential limitations of this approach.
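The comparison between model-assigned and manually allocated labels can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `label_agreement` and the label names "descriptive" and "interpretive" are placeholders assumed for illustration, and the four example sections are invented rather than drawn from the study's dataset.

```python
from collections import Counter


def label_agreement(manual_labels, model_labels):
    """Compare model-assigned labels with manually allocated ones.

    Returns the overall accuracy and, for each manual label, the share of
    sections carrying that label that the model labelled identically.
    """
    if len(manual_labels) != len(model_labels):
        raise ValueError("label lists must have the same length")
    if not manual_labels:
        raise ValueError("label lists must not be empty")

    # How many sections the researchers assigned to each label.
    totals = Counter(manual_labels)
    # How many of those sections the model labelled the same way.
    correct = Counter(
        manual for manual, predicted in zip(manual_labels, model_labels)
        if manual == predicted
    )

    accuracy = sum(correct.values()) / len(manual_labels)
    per_label = {label: correct[label] / count for label, count in totals.items()}
    return accuracy, per_label


# Invented example: four sections with placeholder label names.
manual = ["descriptive", "descriptive", "interpretive", "interpretive"]
model = ["descriptive", "interpretive", "interpretive", "interpretive"]

accuracy, per_label = label_agreement(manual, model)
print(f"accuracy: {accuracy:.2f}")
for label, share in per_label.items():
    print(f"{label}: {share:.2f}")
```

Reporting the per-label share alongside overall accuracy shows whether disagreement between model and researchers concentrates in one category, which a single accuracy figure can hide.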

Authors

Thato Senoamadi (University of the Witwatersrand, South Africa)
Lee Rusznyak (University of the Witwatersrand, South Africa)
Ritesh Ajoodha (University of the Witwatersrand, South Africa)

Keywords

natural language processing, bidirectional encoder representations from transformers, legitimation code theory, semantic density, teacher education
