Allan Moser and Lynne Molter
Department of Engineering
October 3, 2016
The data used in this analysis come from the Consortium for Undergraduate STEM Success (CUSTEMS). As described on the CUSTEMS website (custems.org),
The Consortium for Undergraduate STEM Success (CUSTEMS) is a collaboration of post secondary institutions interested in addressing issues relating to undergraduate degree completion in STEM (Science, Technology, Engineering, and Mathematics) fields, with particular focus on under-represented students. CUSTEMS combines student academic data from participating institutions with survey responses from those same students to inform institutions about patterns of student migration into and out of STEM fields.
Twenty-four institutions participated in this consortium: 16 HBCUs, 5 private liberal arts colleges, and 3 public research universities. Participants in the consortium provided Admissions, Academic, and Grade files containing information for their students covering a time span from 2008 to 2014. The coverage of data from institutions varied, with some providing information across all years and all types of records, and others providing only partial data for one or two years.
Admissions records include demographic information such as sex, race, home zip codes, guardian education level, SAT or ACT scores, and anticipated major for the entering student. Academic records were submitted for each student at the end of the academic year. Ideally, there should be one academic record for each year of an individual student’s attendance. These records include the academic year, class (e.g., freshman, sophomore), PELL eligibility, GPA, and anticipated (or declared) major indicated by Classification of Instructional Program (CIP) codes. Records for grades were also submitted at the end of each academic year. For each student, there is a record for each course the student took that year including the course abbreviation (e.g., CS, Phys), course number, course level, course credits, and course grade. Importantly, each type of record includes the institution Integrated Postsecondary Education Data System (IPEDS) code and student ID, enabling linkage of the records for individual students.
Data for this study were obtained from the last CUSTEMS release in May 2016. They contain 95,518 admissions, 115,303 academic, and 293,800 grade records. Record types were linked by forming a unique identifier for each student consisting of the concatenated institution IPEDS code and student ID. Not all records could be linked, since some institutions provided only partial data. In addition, some records could not be used because necessary data elements, such as anticipated major, were missing. Linkage between valid admissions and academic records provided 54,981 unique student cases. Requiring linked grade records reduced this number to 19,788 cases.
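The linkage step can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used for the study: the field names `ipeds` and `student_id`, the helper names, the "-" separator in the key, and the toy records are all assumptions introduced here.

```python
def student_key(ipeds_code, student_id):
    """Build a per-student linkage key from the institution IPEDS code and student ID.

    The "-" separator is an assumption added here so that, for example,
    ("12", "345") and ("123", "45") cannot collide.
    """
    return f"{str(ipeds_code).strip()}-{str(student_id).strip()}"


def keys_in(records):
    """Collect the set of linkage keys present in one file's records."""
    return {student_key(r["ipeds"], r["student_id"]) for r in records}


def link_students(admissions, academic, grades):
    """Return (students with admissions and academic records,
    the subset of those that also has grade records)."""
    linked = keys_in(admissions) & keys_in(academic)
    return linked, linked & keys_in(grades)


# Toy records, invented for illustration only (hypothetical field names).
admissions = [
    {"ipeds": "100", "student_id": "1"},
    {"ipeds": "100", "student_id": "2"},
    {"ipeds": "200", "student_id": "1"},
]
academic = [
    {"ipeds": "100", "student_id": "1"},  # one academic record per year attended
    {"ipeds": "100", "student_id": "1"},
    {"ipeds": "200", "student_id": "1"},
]
grades = [
    {"ipeds": "100", "student_id": "1"},
    {"ipeds": "300", "student_id": "9"},  # no matching admissions record
]

linked, linked_with_grades = link_students(admissions, academic, grades)
print(len(linked), len(linked_with_grades))
```

Using set intersection mirrors the two-stage reduction described above: first cases with both admissions and academic records, then the smaller subset that also has grade records.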
DISCUSSION OF STUDENTS ENROLLED IN COMPUTER SCIENCE CLASSES
The Grade file contains information for each course taken by a student. Out of the 24 institutions participating in CUSTEMS, 16 contributed to the Grade File. Course abbreviations and numbers may be the same for different institutions, so individual courses were identified by combining the institution IPEDS code with the course abbreviation and course number. Doing so, we found 4,079 unique courses.
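A minimal sketch of this course-identification step follows. The field names `ipeds`, `course_abbr`, and `course_number` and the toy records are hypothetical; the actual Grade file layout is not reproduced here.

```python
def course_key(ipeds_code, abbreviation, number):
    """Identify a course by institution, abbreviation, and number together,
    so that "CS 101" at two different institutions counts as two courses."""
    return (str(ipeds_code).strip(), str(abbreviation).strip(), str(number).strip())


def unique_courses(grade_records):
    """Return the set of distinct course keys found in the grade records."""
    return {
        course_key(r["ipeds"], r["course_abbr"], r["course_number"])
        for r in grade_records
    }


# Toy grade records, invented for illustration only.
grade_records = [
    {"ipeds": "100", "course_abbr": "CS", "course_number": "101"},
    {"ipeds": "100", "course_abbr": "CS", "course_number": "101"},  # second student, same course
    {"ipeds": "200", "course_abbr": "CS", "course_number": "101"},  # same label, other school
    {"ipeds": "200", "course_abbr": "ECE", "course_number": "225"},
]

courses = unique_courses(grade_records)
print(len(courses))
```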
Identifying which of these courses are computer science (CS) was not straightforward. Courses with abbreviations such as CompSci or CS could be safely assumed to be computer science related. Courses such as “EEGR 409” (C Programming Apps) and “GEEN 101” (Software Design and Modeling), however, could not be identified from the abbreviation alone. Considerable effort, including searching school websites and course catalogues, was required to identify which of the 4,079 courses were, indeed, associated with computer science. Even after carefully reading course descriptions, the classification was not always clear, since there is considerable overlap between some science and engineering courses and computer science courses. For example, courses from the same school with the same abbreviation, “ECE,” included some with titles that were clearly computer science related, such as “Data Mining and Machine Learning” (ECE 321), and others that were not, such as “Electrical Circuits” (ECE 225). Additionally, it is possible that the same course is cross-listed at a given school under a different course abbreviation and number. In many cases a judgment call was required. There will be valid differences of opinion about whether a given course should or should not be listed in the computer science category.
A total of 387 unique courses associated with a student record in both the Admissions and Academic files were identified as computer science related. We categorized the computer science courses as being (1) introductory, (2) intermediate, or (3) advanced. Although the course level should have been provided in the Grade file record, this entry was often omitted. Where possible, school course numbering systems were used to estimate course level (e.g., 100 for introductory, 200 for intermediate, 300/400 for advanced), though numbering systems were not always this straightforward. In some cases best guesses were made based on course titles, or by looking up the courses online. One task that we were not able to accomplish was determining specific individual successor courses, since this again would require an understanding of each school’s degree requirements.
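The numbering heuristic mentioned above (100 for introductory, 200 for intermediate, 300/400 for advanced) might be expressed as follows. This is a sketch under that stated rule only; the function name and the "unknown" fallback are assumptions, and courses that fall through would still need the manual review described above.

```python
import re


def estimate_level(course_number):
    """Estimate course level from the hundreds digit of the course number.

    Returns "unknown" when the number does not fit the 100/200/300/400
    pattern, so that those courses can be classified by hand.
    """
    match = re.search(r"\d+", str(course_number))
    if match is None:
        return "unknown"
    hundreds = int(match.group()) // 100
    if hundreds == 1:
        return "introductory"
    if hundreds == 2:
        return "intermediate"
    if hundreds in (3, 4):
        return "advanced"
    return "unknown"


# Course numbers taken from the examples cited in the text, plus two
# that do not fit the pattern.
for number in ["101", "225", "321", "409", "1010", "Seminar"]:
    print(number, estimate_level(number))
```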
SUMMARY OF ANALYSIS
The analysis using Grade files for individual students provided the data for all students, of any major, taking computer science classes. This provides fine-grained detail for each computer science course. While there is much detail in the individual course statistics, the clearest trends emerge from aggregating course information by level (introductory, intermediate, or advanced). The aggregated data show a disparity between the percentage of men and women taking computer science courses, even at the introductory level, with 63 percent of the enrollment male and 37 percent female. This disparity increases significantly moving from introductory to intermediate levels: 76 percent male, 24 percent female. The asymmetry increases slightly further for advanced courses: 77 percent male, 23 percent female. A similar disparity exists for racial categories. Enrollment in introductory courses consists of 71 percent non-underrepresented-minority (non-URM) and 29 percent URM students. As in the case of gender, this disparity grows for intermediate-level courses: 88 percent non-URM, 12 percent URM. There is a slight decrease in the disparity moving from intermediate to advanced courses, but it is probably not statistically significant.
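The aggregation by level can be illustrated with a short sketch. The record layout (`level`, `sex`) and the toy enrollments below are invented for illustration and are unrelated to the CUSTEMS counts; only the computation, each group's share of enrollment within a course level, reflects the approach described.

```python
from collections import Counter, defaultdict


def enrollment_shares(records, attribute):
    """For each course level, return each group's percentage of enrollments."""
    counts = defaultdict(Counter)
    for r in records:
        counts[r["level"]][r[attribute]] += 1
    shares = {}
    for level, counter in counts.items():
        total = sum(counter.values())
        shares[level] = {group: 100 * n / total for group, n in counter.items()}
    return shares


# Toy enrollments, invented for illustration only.
enrollments = [
    {"level": "introductory", "sex": "M"},
    {"level": "introductory", "sex": "M"},
    {"level": "introductory", "sex": "M"},
    {"level": "introductory", "sex": "F"},
    {"level": "intermediate", "sex": "M"},
    {"level": "intermediate", "sex": "F"},
]

shares = enrollment_shares(enrollments, "sex")
print(shares["introductory"])
print(shares["intermediate"])
```

Passing a different attribute name (for example a hypothetical `urm` field) would give the corresponding URM and non-URM shares by level.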
The migration analysis presents the data in terms of major field of study. This analysis shows a net increase in computer science majors from entering major to last reported major. Again, there is a disparity between the percentage of male and female computer science majors: 5.5 percent of male students enter with a major in computer science, while the percentage for women is 1.4 percent. The percentage increase due to migration into the field is substantially higher for women (~30 percent) than men (~10 percent); however, this change does not bring the post-migration ratio to parity, with 6.0 percent of males and 1.8 percent of females reporting CS as a major. The percentages of men and women in computer science both increase, but with the larger number of incoming men, a small percentage increase still results in a larger overall gain in numbers: an increase of 161 men and 124 women. Comparing URM to non-URM, 4.5 percent of URM students enter as computer science majors, while a smaller percentage, 2.9 percent, of non-URM students start with this major. There is a net in-migration into the field for each category; however, the percent increase in non-URM is substantially higher, with a growth of 6 percent, compared to only 2 percent for URM.
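The approximate percentage increases for men and women can be checked against the entering and post-migration shares quoted above. This is a rough check only: the inputs are the rounded percentages reported in the text, so the results agree with the ~10 and ~30 percent figures only approximately, and the function name is introduced here for illustration.

```python
def relative_increase(before, after):
    """Percentage change from an entering share to a post-migration share."""
    return 100 * (after - before) / before


# Shares of students majoring in computer science, as reported in the text.
men_increase = relative_increase(5.5, 6.0)
women_increase = relative_increase(1.4, 1.8)

print(f"men: {men_increase:.1f}%  women: {women_increase:.1f}%")
```

The women's relative increase is roughly three times the men's, yet the absolute gain is larger for men (161 versus 124) because the entering male share is about four times as large.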
There are several avenues for additional research using these data. Results for this analysis were primarily descriptive. More sophisticated statistical methods could be employed to better determine the significance of differences. Additionally, different and perhaps finer (when the numbers permit) categorizations could be made, such as female URM, male URM, female non-URM, and male non-URM. We could also use additional attributes in the data set. For example, we considered only SAT scores, which left a large percentage of cases classified as unknown. ACT scores are available for many students. We could bring the SAT and ACT scores into concordance, significantly reducing the unknowns for this category. In addition to the admissions, grade, and academic files, there is also an entering student survey (ESS) file that we did not use. The ESS provides additional information on student preparation and attitudes.