About the course
A sound data engineering approach is the foundation on which all other data initiatives depend. Moving data through ETL/ELT processes into data warehouses that represent data in a way that serves the organization's data goals is a critical, ongoing process for any data-driven organization. However, ETLs carry risks to data integrity, quality, and provenance, and must be approached with best practices in mind.
This course provides an overview of data engineering approaches and their trade-offs in building different types of data warehouses. Tools for batch and stream data processing will be introduced. Common issues related to reliability, robustness, data loss, and data provenance will be explored.
This course is part of the Data Engineering track of the Advanced Data Science Certificate.
Upon successful completion of the course, students will be able to:
Understand the different approaches to data warehousing across organizations of varying sizes.
Structure and organize data engineering initiatives.
Compose data pipelines that integrate a wide range of data tools including Python scheduling libraries or services, distributed processing systems, and multiple data stores.
Articulate the issues related to data lineage and provenance and evaluate workflow options that preserve them.
Design unit and end-to-end automated tests for data pipelines.
Understand anomaly detection and alerting techniques to reinforce robust data pipelines.
Laura is a data and software engineer at Industry Dive, a B2B media company, where she implements and operates full-stack solutions with both the web and data teams using primarily Python tools and frameworks such as Django, Flask, pandas, and scikit-learn. She also contributes to open source pro...