With data science employment projected to grow 36% by 2033, professionals across industries are seeking ways to leverage the power of data analysis to drive decision-making. The Data Science Modeling certificate program — developed by Sumanta Basu, an associate professor at Cornell Bowers Computing and Information Science — bridges the gap between basic statistics and advanced data science applications.
The certificate program consists of four comprehensive courses: Nonlinear Regression Models, Modeling Interactions Between Predictors, Foundations of Predictive Modeling, and Ensemble Methods. Participants learn to capture complex relationships in data through advanced regression techniques, transform categorical variables into meaningful predictors and build models that adapt to real-world complexities. Through hands-on practice in R programming, professionals develop practical skills in decision trees and random forests to solve challenging prediction problems.
In a recent conversation with eCornell, Basu explained how the program blends statistics and data science.
How do you help students bridge the gap between theoretical concepts and real-world applications?
“We are teaching students these concepts, but in parallel, we are also giving them very real and relevant topical examples that they can apply right away. For example, we use the data set from Tompkins County’s COVID-19 counts. We plot the number of days since the pandemic [began], the number of new infections and the number of hospitalizations. All of these variables change over time, so the pattern cannot be captured by a single line.
“This example points out how you can use these [nonlinear regression] tools to model. And you use the data to predict and understand the evolution of the pandemic. We use a tool called [a] spline or a piecewise polynomial, which is more advanced for capturing nonlinear relationships. This shows the differentiation and how you can use new models or new methods to improve your fit and improve your predictions.”
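The contrast Basu describes, a single line versus a piecewise polynomial, can be illustrated with a minimal sketch. It is written in Python for self-containment, although the program itself teaches in R. The data below are synthetic and made up for illustration; they are not the Tompkins County counts. The knot location and all helper names are assumptions, and a one-knot linear spline is only the simplest kind of piecewise polynomial, not necessarily the model used in the course.

```python
# Synthetic, illustrative series: counts rise for 10 days, then decline.
# These numbers are invented for the sketch, not real case data.
days = list(range(21))
signal = [2.0 * d if d <= 10 else 20.0 - 1.5 * (d - 10) for d in days]
# Add a small deterministic wiggle so neither model fits perfectly.
counts = [s + (1.0 if d % 2 == 0 else -1.0) for d, s in zip(days, signal)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def least_squares(features, y):
    """Ordinary least squares via the normal equations X'X beta = X'y."""
    p = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(features, y)) for i in range(p)]
    return solve(xtx, xty)


def rss(features, y, beta):
    """Residual sum of squares of the fitted model."""
    return sum((yi - sum(b * f for b, f in zip(beta, row))) ** 2
               for row, yi in zip(features, y))


KNOT = 10  # hypothetical knot location, chosen by eye for this toy series

# Model 1: a single straight line (intercept + slope).
line_X = [[1.0, float(d)] for d in days]
# Model 2: a linear spline, which adds a term that lets the slope change at the knot.
spline_X = [[1.0, float(d), max(0.0, float(d - KNOT))] for d in days]

line_beta = least_squares(line_X, counts)
spline_beta = least_squares(spline_X, counts)
line_rss = rss(line_X, counts, line_beta)
spline_rss = rss(spline_X, counts, spline_beta)

print("straight line RSS:", round(line_rss, 2))
print("linear spline RSS:", round(spline_rss, 2))
```

Because the straight line is a special case of the spline (the extra coefficient set to zero), the spline's in-sample error can never be larger. On a rise-then-fall pattern like this one it is much smaller, which is the improvement in fit the quote refers to. In-sample fit alone does not guarantee better predictions on new data.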
What’s your approach to teaching students about different types of data and their corresponding models?
“Machine learning models vary depending on the type of data you’re analyzing. The classical machine learning model is for structured data that can be organized in the form of a table. If your data is unstructured, if it’s just text, or if it’s a bunch of images or audio files or video files, then you’ll need more modern tools like deep neural networks. But as long as the data is structured and can be stored in a table format with a bunch of numbers or categories, what we do here, compared to some other courses in the machine learning world, is still state of the art in industry and built on statistical foundations.
“We have the flexibility to pause and really get students to appreciate what each piece of this complex machinery is doing, in what way they can go wrong, how to understand the limitations and how to explain it to others in simpler terms.”
Turn statistical expertise into data science proficiency — enroll in the Data Science Modeling certificate program today!