Data science education lacks a much-needed focus on ethics
Source: Beth Daley
The big idea
Undergraduate training for data scientists – dubbed the sexiest job of the 21st century by Harvard Business Review – falls short in preparing students for the ethical use of data science, our new study found.
Data science lies at the nexus of statistics and computer science applied to a particular field such as astronomy, linguistics, medicine, psychology or sociology. The idea behind this data crunching is to use big data to address otherwise unsolvable problems, such as how health care providers can create personalized medicine based on a patient’s genes and how businesses can make purchase predictions based on customers’ behavior.
The U.S. Bureau of Labor Statistics projects 15% growth in data science careers from 2019 to 2029, and demand for data science training has risen accordingly. Universities and colleges have responded by creating new programs or revamping existing ones. The number of undergraduate data science programs in the U.S. jumped from 13 in 2014 to at least 50 as of September 2020.
As educators and practitioners in data science, we were prompted by the growth in programs to investigate what is covered, and what is not covered, in data science undergraduate education.
In our study, we compared undergraduate data science curricula with the expectations for undergraduate data science training put forth by the National Academies of Sciences, Engineering and Medicine. Those expectations include training in ethics. We found that most programs dedicated considerable coursework to mathematics, statistics and computer science but offered little training in ethical considerations such as privacy and systemic bias. Only 50% of the degree programs we investigated required any coursework in ethics.
Why it matters
As with any powerful tool, the responsible application of data science requires training in both how to use it and how to understand its impacts. Our results align with prior work that found little attention is paid to ethics in data science degree programs. This suggests that undergraduate data science degree programs may produce a workforce without the training and judgment to apply data science methods responsibly.
It isn’t hard to find examples of irresponsible use of data science. For instance, policing models that have a built-in data bias can lead to an elevated police presence in historically over-policed neighborhoods. In another example, algorithms used by the U.S. health care system are biased in a way that causes Black patients to receive less care than white patients with similar needs.
We believe explicit training in ethical practices would better prepare a socially responsible data science workforce.
What still isn’t known
While data science is a relatively new field – still being defined as a discipline – guidelines exist for training undergraduate students in data science. These guidelines prompt the question: How much training can we expect in an undergraduate degree?
The National Academies recommend training in 10 areas, including ethical problem solving, communication and data management.
Our work focused on undergraduate data science degrees at schools classified as R1, meaning they engage in high levels of research activity. Further research could examine the amount of training and preparation in various aspects of data science at the master's and Ph.D. levels, as well as the nature of undergraduate data science training at schools of different research levels.
Given that many data science programs are new, there is considerable opportunity to compare the training that students receive with the expectations of employers.
We plan to expand on our findings by investigating the pressures that might be driving curriculum development for degrees in other disciplines that are seeing similar job market growth.