European Journal of Applied Sciences – Vol. 11, No. 2
Publication Date: April 25, 2023
DOI: 10.14738/aivp.112.14406
Passonneau, R. J., Koenig, K., Li, Z., & Soddano, J. (2023). The Ideal versus the Real Deal in Assessment of Physics Lab Report Writing. European Journal of Applied Sciences, 11(2), 626-644.
The Ideal versus the Real Deal in Assessment of Physics Lab
Report Writing
Rebecca J. Passonneau
Department of Computer Science and Engineering,
Pennsylvania State University, United States
Kathleen Koenig
Department of Physics,
University of Cincinnati, United States
Zhaohui Li
Department of Computer Science and Engineering,
Pennsylvania State University, United States
Josephine Soddano
Department of Computer Science and Engineering,
Pennsylvania State University, United States
ABSTRACT
Effective writing is important for communicating science ideas, and for writing-to-learn in science. This paper investigates lab reports from a large-enrollment college
physics course that integrates scientific reasoning and science writing. While
analytic rubrics have been shown to define expectations more clearly for students,
and to improve reliability of assessment, there has been little investigation of how
well analytic rubrics serve students and instructors in large-enrollment science
classes. Unsurprisingly, we found that grades assigned by teaching assistants (TAs) do not correlate with reliable post-hoc assessments from trained raters. More importantly, we identified lost learning opportunities for students and misleading information for instructors about students' progress. We believe our
methodology to achieve post-hoc reliability is straightforward enough to be used in
classrooms. A key element is the development of finer-grained rubrics for grading
that are aligned with the rubrics provided to students to define expectations, but
which reduce subjectivity of judgements and grading time. We conclude that the use
of dual rubrics, one to elicit independent reasoning from students and one to clarify
grading criteria, could improve reliability and accountability of lab report
assessment, which could in turn elevate the role of lab reports in the instruction of
scientific inquiry.
Keywords: Science writing assessment, Physics lab reports, Analytic rubrics, Writing
assessment reliability.
INTRODUCTION
Writing plays a central role in communicating about scientific ideas, experiments and results,
yet instructors find it challenging to provide undergraduate science students with rigorous
instruction in science writing. This is especially true in the large-enrollment classes that are the norm at large public universities. This paper presents a study of a post-hoc reliability assessment
of physics lab reports from a large-enrollment college curriculum that integrates several
increasingly difficult writing assignments. The curriculum was designed to support the
development of scientific reasoning through theory-evidence coordination [1], and was
informed by the Science Writing Heuristic (SWH) [2]. A growing body of evidence finds that
asking students to put science ideas into writing enhances inquiry-based science instruction
(Graham, Kiuhara, and MacKay 2020; Gere et al. 2019; Huerta and Garza 2019; Clabough and
Clabough 2016; Timmerman et al. 2011). An important component of learning to write,
however, is to provide students with timely, reliable and informative assessments with
appropriate feedback [9]–[11]. We investigated the reliability of the original grades assigned to physics lab reports, and the time on task to complete the grading. We present an approach, based on a more specific analytic assessment rubric, that can improve the reliability, timeliness, and informativeness of lab report assessment.
An analytic rubric defines the expectations of a writing assignment along multiple dimensions,
such as the ability to state a clear hypothesis, to present claims that test the hypothesis, and to
give supporting evidence for each claim using experimental results. Each rubric dimension is
rated on the same scale. Studies have shown that analytic rubrics can have multiple benefits,
including transparency and accountability for students, and reliability of assessment [8], [12],
13]. To achieve reliable grades post-hoc, we developed distinct assessment rubrics with specific criteria for assigning each degree of partial credit on each rubric dimension.
Concurrently, we trained raters until they could apply the assessment rubrics reliably. A
comparison of grades assigned by teaching assistants (TAs) and our post-hoc assessments
shows the TA grades to be unreliable, with similar time-on-task for both.
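As a concrete illustration of what such fine-grained partial-credit criteria might look like in machine-readable form, the sketch below represents one rubric dimension with an explicit criterion for each credit level. The dimension name, level wording, and point values are hypothetical, not the rubric used in this study.

```python
from dataclasses import dataclass

@dataclass
class RubricDimension:
    """One dimension of an analytic rubric, with a criterion for each credit level."""
    name: str
    levels: dict[float, str]  # maps a partial-credit score to its concrete criterion

# Hypothetical dimension; the study's actual rubric dimensions and wording differ.
hypothesis = RubricDimension(
    name="hypothesis",
    levels={
        1.0: "States a testable hypothesis relating the variables under investigation",
        0.5: "States a hypothesis, but it is vague or not testable as written",
        0.0: "States no hypothesis",
    },
)

def score_report(ratings: dict[str, float], rubric: dict[str, RubricDimension]) -> float:
    """Total a report's grade, checking that every score matches a defined level."""
    for dim, score in ratings.items():
        assert score in rubric[dim].levels, f"{score} is not a defined level of {dim}"
    return sum(ratings.values())

print(score_report({"hypothesis": 0.5}, {"hypothesis": hypothesis}))  # -> 0.5
```

Tying every possible score to a written criterion is what reduces the subjectivity of a rater's judgement: the rater selects the criterion that matches the report, rather than choosing a number directly.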
We analyzed over 2,000 physics lab reports to address three research questions:
• RQ 1: To what extent do analytic grading rubrics, which are more specific than rubrics
provided to students to define lab report expectations, produce reliable assessments?
• RQ 2: How far from reliable were the original grades assigned by TAs?
• RQ 3: What does the reliable assessment reveal about students’ science writing?
A critical factor for achieving reliability is that we created distinct assessment rubrics that parallel the original rubrics defining expectations for students, but provide much more detailed and objective criteria for grading. A comparison of TA and rater effort, presented in the first subsection of our Results section, suggests that a more specific assessment rubric can reduce the time spent on assessment. To address RQ 2, the second subsection of our Results section shows concretely how far the TAs' grading diverges from the reliable post-hoc assessment. In our Discussion section, we examine which rubric dimensions students find most challenging (RQ 3), based on our reliable post-hoc assessment. Reliable assessment supports more meaningful conclusions about trends in student writing, and helps identify the science ideas students struggle with.
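To make "reliable" operational, agreement statistics of the following kind can be computed. This is a minimal sketch with invented numbers, assuming quadratic-weighted Cohen's kappa for inter-rater agreement on a rubric dimension and a Spearman rank correlation for comparing TA grades against post-hoc scores; it is not the specific analysis reported in this paper.

```python
from sklearn.metrics import cohen_kappa_score
from scipy.stats import spearmanr

# Hypothetical ordinal ratings (0 = no credit, 1 = partial, 2 = full credit)
# given by two trained raters to the same reports on one rubric dimension.
rater_a = [2, 1, 2, 0, 1, 2, 2, 1]
rater_b = [2, 1, 2, 0, 0, 2, 2, 1]

# Quadratic weights penalize large disagreements more heavily than near-misses.
kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")

# Hypothetical total report grades: original TA grades vs. post-hoc consensus.
ta_grades = [9.0, 8.5, 10.0, 7.0, 9.5, 8.0]
post_hoc = [7.5, 8.0, 9.0, 5.5, 9.5, 6.0]
rho, p_value = spearmanr(ta_grades, post_hoc)

print(f"inter-rater kappa = {kappa:.2f}")
print(f"TA vs. post-hoc Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```

High agreement between trained raters licenses treating the post-hoc scores as a reference; a weak TA-to-reference correlation then quantifies how far classroom grading drifts from that reference.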
Inconsistency in rubric application is a well-known issue [14] that counterbalances the
evidence for the efficacy of rubrics to improve student writing [15]. However, we find little published work on exactly how unreliable classroom grading is, or on what that unreliability costs instructors in their ability to adapt classroom practice to students' needs. Our main
objectives are to highlight the potential gains from improved reliability of classroom
assessments, along with recommendations for ways to improve reliability of classroom
grading.
Science Writing and Assessment
Writing is an important part of science that serves to document and communicate ideas, and in
addition, supports science learning [5], [16], [17], and the development of scientific reasoning
(SR) skills [18], [19]. Three best practices for incorporating writing into science instruction are
(1) the use of analytic rubrics to define student expectations, such as how to construct an
argument from evidence [8], [12], [13], (2) frequent opportunities for students to practice
writing over extended periods [16], [20], [21], and (3) timely feedback for how well a given
piece of writing meets expectations [9]–[11], [22]. We present evidence here for the importance of a fourth practice: that assessment feedback should also be reliable. In his text
on teaching science and engineering [23], Kalman notes that students find it difficult to shift
from oral to written discourse. He points out that in conversation, listeners provide feedback
that shows a speaker which parts of their discourse are engaging or confusing through explicit
comments, or implicit signals such as eye gaze and facial expression. In [9], the authors
delineate numerous opportunities for students to receive feedback. They also argue for
students and teachers to build assessment literacy, such as how to set expectations about the
type of feedback students should receive and how they should use it. An important role of a
writing rubric is to account to students for each grade point in their assessment, so that
students can tackle the next report with a better understanding of how to meet expectations.
For a rubric to serve as feedback, however, it must be applied reliably.
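One way a reliably applied analytic rubric could account to students for every grade point is sketched below: the feedback simply echoes back, per dimension, the criterion that justified the awarded score. This is an illustrative assumption about implementation, not the feedback mechanism of the study's course, and the rubric contents are hypothetical.

```python
# Assumed structure: dimension name -> {awarded score: criterion text}.
def feedback_for(ratings: dict[str, float],
                 levels: dict[str, dict[float, str]]) -> list[str]:
    """Justify each awarded score with the criterion the rater matched."""
    return [f"{dim}: {score} point(s) -- {levels[dim][score]}"
            for dim, score in ratings.items()]

# Hypothetical single-dimension example.
levels = {"evidence": {1.0: "Each claim is supported by experimental results",
                       0.5: "Some claims lack supporting evidence",
                       0.0: "Claims are not supported by evidence"}}
print("\n".join(feedback_for({"evidence": 0.5}, levels)))
```

Because each point traces to a stated criterion, a student can see exactly which expectation was unmet, which is only informative if raters apply the criteria consistently.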
Theory-Evidence-Coordination Lab Curriculum
Current education goals include fostering high-end skills, such as non-routine problem solving,
systems thinking, and critical thinking [24], [25], all of which are foundational for scientific
reasoning [26]. Unfortunately, research has shown that students have difficulty applying
scientific reasoning (SR) skills to science-related or everyday life contexts [26]–[32]. Informed
by research on the development of SR [25], [33], [34], the physics curriculum we investigate
here has multiple components. For each of a series of four increasingly complex investigations that address specific research questions, the components are: pre-lab instruction and exercises that target specific SR skills; authentic, scaffolded practice of the targeted skills in classroom experiments conducted by groups of three to four students; and lab report writing to communicate outcomes.
Although multiple research-validated curricula promote learning through conceptual change
[35], [36], our labs expand on these and emphasize mathematical modeling while promoting
higher order reasoning through the process of theory-evidence-coordination (TEC) (see Figure
1) [1], [37].