Investigating Student Mistakes in Introductory Data Science Programming
Author(s)
Singh, Anjali; Fariha, Anna; Brooks, Christopher; Soares, Gustavo; Henley, Austin Z.; Tiwari, Ashish; M, Chethan; Choi, Heeryung; Gulwani, Sumit; ...
Publisher with Creative Commons License
Creative Commons Attribution
Terms of use
Abstract
Data Science (DS) has emerged as a new academic discipline where students are introduced to data-centric thinking and generating data-driven insights through programming. Unlike traditional introductory Computer Science (CS) education, which focuses on program syntax and core CS topics (e.g., algorithms and data structures), introductory DS education emphasizes skills such as analyzing data to gain insights by making effective use of programming libraries (e.g., re, NumPy, pandas, scikit-learn). To better understand learners' needs and pain points when they are introduced to DS programming, we investigated a large online course on data manipulation designed for graduate students who do not have a CS or Statistics undergraduate degree. We qualitatively analyzed students' incorrect code submissions for computational notebook-based assignments in Python. We identified common mistakes and grouped them into the following themes: (1) programming language and environment misconceptions, (2) logical mistakes due to data or problem-statement misunderstanding or incorrectly dealing with missing values, (3) semantic mistakes due to incorrect use of DS libraries, and (4) suboptimal coding. Our work provides instructors insights to understand student needs in introductory DS courses and improve course pedagogy, and recommendations for developing assessment and feedback tools to support students in large courses.
Description
SIGCSE 2024, March 20–23, 2024, Portland, OR, USA
Date issued
2024-03-07
Department
MIT Open Learning
Publisher
ACM
Citation
Singh, Anjali, Fariha, Anna, Brooks, Christopher, Soares, Gustavo, Henley, Austin Z. et al. 2024. "Investigating Student Mistakes in Introductory Data Science Programming."
Version: Final published version
ISBN
979-8-4007-0423-9