European Journal of Applied Sciences – Vol. 11, No. 2

Publication Date: April 25, 2023

DOI: 10.14738/aivp.112.14406

Passonneau, R. J., Koenig, K., Li, Z., & Soddano, J. (2023). The Ideal versus the Real Deal in Assessment of Physics Lab Report Writing. European Journal of Applied Sciences, 11(2), 626-644.

The Ideal versus the Real Deal in Assessment of Physics Lab Report Writing

Rebecca J. Passonneau
Department of Computer Science and Engineering, Pennsylvania State University, United States

Kathleen Koenig
Department of Physics, University of Cincinnati, United States

Zhaohui Li
Department of Computer Science and Engineering, Pennsylvania State University, United States

Josephine Soddano
Department of Computer Science and Engineering, Pennsylvania State University, United States

ABSTRACT

Effective writing is important for communicating science ideas, and for writing-to-learn in science. This paper investigates lab reports from a large-enrollment college physics course that integrates scientific reasoning and science writing. While analytic rubrics have been shown to define expectations more clearly for students, and to improve reliability of assessment, there has been little investigation of how well analytic rubrics serve students and instructors in large-enrollment science classes. Unsurprisingly, we found that grades assigned by teaching assistants (TAs) do not correlate with reliable post-hoc assessments from trained raters. More important, we identified lost learning opportunities for students, and misinformation for instructors about students' progress. We believe our methodology for achieving post-hoc reliability is straightforward enough to be used in classrooms. A key element is the development of finer-grained rubrics for grading that are aligned with the rubrics provided to students to define expectations, but which reduce the subjectivity of judgements and the time spent grading. We conclude that the use of dual rubrics, one to elicit independent reasoning from students and one to clarify grading criteria, could improve the reliability and accountability of lab report assessment, which could in turn elevate the role of lab reports in the instruction of scientific inquiry.

Keywords: Science writing assessment, Physics lab reports, Analytic rubrics, Writing assessment reliability.

INTRODUCTION

Writing plays a central role in communicating about scientific ideas, experiments, and results, yet instructors find it challenging to provide undergraduate science students with rigorous instruction in science writing. This is especially true in the large-enrollment classes that are the norm at large public universities. This paper presents a study of a post-hoc reliability assessment of physics lab reports from a large-enrollment college curriculum that integrates several increasingly difficult writing assignments. The curriculum was designed to support the development of scientific reasoning through theory-evidence coordination [1], and was informed by the Science Writing Heuristic (SWH) [2]. A growing body of evidence finds that asking students to put science ideas into writing enhances inquiry-based science instruction (Graham, Kiuhara, and MacKay 2020; Gere et al. 2019; Huerta and Garza 2019; Clabough and Clabough 2016; Timmerman et al. 2011). An important component of learning to write, however, is to provide students with timely, reliable, and informative assessments with appropriate feedback [9]–[11]. We investigated the reliability of the original grades assigned to the physics lab reports, and the time on task to complete the grading. We present an approach, based on an analytic assessment rubric, that can improve the reliability, timeliness, and informativeness of lab report assessment.

An analytic rubric defines the expectations of a writing assignment along multiple dimensions, such as the ability to state a clear hypothesis, to present claims that test the hypothesis, and to give supporting evidence for each claim using experimental results. Each rubric dimension is rated on the same scale. Studies have shown that analytic rubrics can have multiple benefits, including transparency and accountability for students, and reliability of assessment [8], [12], [13]. To achieve reliable grades post-hoc, we developed distinct assessment rubrics with specific criteria for assigning each degree of partial credit on each rubric dimension. Concurrently, we trained raters until they could apply the assessment rubrics reliably. A comparison of grades assigned by teaching assistants (TAs) and our post-hoc assessments shows the TA grades to be unreliable, with similar time-on-task for both.
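
To make the structure of an analytic rubric concrete, the following sketch (in Python, purely for illustration) models rubric dimensions with leveled partial-credit criteria and totals a report's ratings. The dimension names, criteria, and point scale are invented for exposition and do not reproduce the rubrics used in this study.

    from dataclasses import dataclass

    # Illustrative only: these dimensions, criteria, and point values are
    # invented for exposition, not taken from the study reported here.

    @dataclass
    class Dimension:
        name: str
        # One criterion per partial-credit level, ordered from 0 points upward.
        level_criteria: list[str]

        def max_points(self) -> int:
            return len(self.level_criteria) - 1

    # A hypothetical analytic rubric: three dimensions on the same 0-2 scale.
    RUBRIC = [
        Dimension("hypothesis", [
            "no testable hypothesis stated",
            "hypothesis stated but not testable as written",
            "clear, testable hypothesis stated",
        ]),
        Dimension("claims", [
            "no claims address the hypothesis",
            "claims present but do not directly test the hypothesis",
            "claims directly test the hypothesis",
        ]),
        Dimension("evidence", [
            "no experimental evidence cited",
            "evidence cited but not linked to specific claims",
            "each claim supported by specific experimental results",
        ]),
    ]

    def score_report(ratings: dict) -> int:
        """Total a report's ratings, validating each against its dimension's scale."""
        total = 0
        for dim in RUBRIC:
            r = ratings[dim.name]
            if not 0 <= r <= dim.max_points():
                raise ValueError(f"{dim.name}: rating {r} outside 0-{dim.max_points()}")
            total += r
        return total

    # Example: clear hypothesis, weak claims, evidence not linked to claims.
    print(score_report({"hypothesis": 2, "claims": 1, "evidence": 1}))  # prints 4

The design point this sketch mirrors is that every partial-credit level carries an explicit criterion, which is what distinguishes the more specific grading rubric from the rubric given to students to define expectations.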

We analyzed over 2,000 physics lab reports to address three research questions:

• RQ 1: To what extent do analytic grading rubrics, which are more specific than rubrics provided to students to define lab report expectations, produce reliable assessments?
• RQ 2: How far from reliable were the original grades assigned by TAs?
• RQ 3: What does the reliable assessment reveal about students' science writing?

A critical factor for achieving reliability is that we created distinct assessment rubrics that parallel the original rubrics, where expectations for students are defined, but that provide much more detailed and objective criteria for grading. A comparison of TA and rater effort, presented in the first subsection of our Results section, suggests that a more specific assessment rubric can reduce the time spent on assessment. To address RQ 2, we show concretely how far the TAs' grading behavior is from the reliable post-hoc assessment, presented in the second subsection of our Results section. Our Discussion section examines which rubric dimensions students find most challenging (RQ 3), based on our reliable post-hoc assessment. Reliable assessment supports more meaningful conclusions about trends in student writing, and identification of the science ideas students struggle with.
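
Reliability here means agreement between independent raters beyond what chance alone would produce. This section does not name the agreement coefficient used in the study, so purely as an illustration of the idea, the sketch below computes Cohen's kappa, one common chance-corrected agreement statistic, for two hypothetical raters scoring the same ten reports on a 0-2 rubric dimension (the ratings are made up).

    from collections import Counter

    def cohens_kappa(rater_a, rater_b):
        """Chance-corrected agreement between two raters over the same items."""
        n = len(rater_a)
        assert n == len(rater_b) and n > 0
        # Observed agreement: fraction of items the raters scored identically.
        p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        # Expected chance agreement from each rater's own label frequencies.
        freq_a, freq_b = Counter(rater_a), Counter(rater_b)
        p_e = sum((freq_a[label] / n) * (freq_b[label] / n)
                  for label in set(freq_a) | set(freq_b))
        return 1.0 if p_e == 1 else (p_o - p_e) / (1 - p_e)

    # Hypothetical ratings for ten reports on one 0-2 rubric dimension.
    rater_one = [2, 1, 2, 0, 1, 2, 2, 1, 0, 2]
    rater_two = [2, 1, 2, 0, 1, 1, 2, 1, 0, 2]
    print(round(cohens_kappa(rater_one, rater_two), 3))  # 0.844

Values near 1 indicate near-perfect agreement, while values near 0 indicate agreement no better than chance; a statistic of this kind is one way the gap between routine classroom grades and trained raters' post-hoc assessments can be quantified.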

Inconsistency in rubric application is a well-known issue [14] that counterbalances the evidence for the efficacy of rubrics to improve student writing [15]. However, we find little published work on exactly how unreliable classroom grading is, or on what is lost with respect to instructors' ability to adapt classroom practice to the needs of students. Our main objectives are to highlight the potential gains from improved reliability of classroom assessment, and to recommend ways to improve the reliability of classroom grading.

Science Writing and Assessment

Writing is an important part of science that serves to document and communicate ideas and, in addition, supports science learning [5], [16], [17], and the development of scientific reasoning (SR) skills [18], [19]. Three best practices for incorporating writing into science instruction are (1) the use of analytic rubrics to define student expectations, such as how to construct an argument from evidence [8], [12], [13], (2) frequent opportunities for students to practice writing over extended periods [16], [20], [21], and (3) timely feedback on how well a given piece of writing meets expectations [9]–[11], [22]. We present evidence here for the importance of a fourth criterion: assessment feedback should also be reliable. In his text on teaching science and engineering [23], Kalman notes that students find it difficult to shift from oral to written discourse. He points out that in conversation, listeners provide feedback that shows a speaker which parts of their discourse are engaging or confusing, through explicit comments or implicit signals such as eye gaze and facial expression. In [9], the authors delineate numerous opportunities for students to receive feedback. They also argue for students and teachers to build assessment literacy, such as how to set expectations about the type of feedback students should receive and how they should use it. An important role of a writing rubric is to account to students for each grade point in their assessment, so that students can tackle the next report with a better understanding of how to meet expectations. For a rubric to serve as feedback, however, it must be applied reliably.

Theory-Evidence-Coordination Lab Curriculum

Current education goals include fostering high-end skills, such as non-routine problem solving, systems thinking, and critical thinking [24], [25], all of which are foundational for scientific reasoning [26]. Unfortunately, research has shown that students have difficulty applying scientific reasoning (SR) skills to science-related or everyday life contexts [26]–[32]. Informed by research on the development of SR [25], [33], [34], the physics curriculum we investigate here has multiple components. The curriculum comprises a series of four increasingly complex investigations, each addressing specific research questions; for each investigation, the components are pre-lab instruction and exercises that target specific SR skills, authentic scaffolded practice of the targeted skills in classroom experiments conducted by groups of three to four students, and lab report writing to communicate outcomes.

Although multiple research-validated curricula promote learning through conceptual change [35], [36], our labs expand on these and emphasize mathematical modeling while promoting higher-order reasoning through the process of theory-evidence coordination (TEC) (see Figure 1) [1], [37].