NUS Institute of Systems Science

NICF- Big Data Engineering for Analytics

Available dates

This course has no confirmed dates in the future.

About the course

This 5-day course helps data engineers focus on essential design and architecture while building a data lake and relevant processing platform.

Participants will learn various aspects of data engineering while building resilient distributed datasets. They will learn to apply key practices, identify multiple data sources and appraise them against their business value, design the right storage, and implement proper access model(s). Finally, participants will build a scalable data pipeline solution with a pluggable component architecture, driven by the combined requirements and designed in a vendor- and technology-agnostic manner. Participants will also become familiar with the Spark platform, with additional focus on its query and streaming libraries.

This course is part of the Analytics and Intelligent Systems series offered by NUS-ISS.

Key Takeaways

Upon successful completion of the course, participants will be able to:

  • Understand the growth of big data and the need for a scalable processing framework. Understand the fundamental characteristics, storage, analysis techniques and the relevant distributions.
  • Understand distributed storage essentials, storage needs, and the relevant architectural mechanisms for processing large amounts of structured, semi-structured and unstructured data.
  • Gain expertise with a fault-tolerant computing framework (e.g. YARN) by setting up pseudo-cluster nodes or cloud-based nodes for processing big data.
  • Construct configurable and executable tasks using in-memory processing frameworks (e.g. Spark Core). Understand the nuances of writing functional programs and use the core libraries to manipulate a large corpus of unstructured data residing as Resilient Distributed Datasets (RDDs).
  • Organize, store and manipulate the collected data using processing libraries, for example special statistical operations and stream processing tools (e.g. Spark's specialized libraries).
  • Understand the various data processing, querying and persistence APIs (e.g. the Spark SQL APIs) available for use with RDDs. Perform tasks such as filtering, selection and categorization.

What Will Be Covered

The course objective is to explore the engineering aspects of big data storage, querying and processing techniques. The course aims to teach students to apply their newly acquired proficiencies by developing data-intensive applications on a distributed compute platform (e.g. the Hadoop platform, the Spark framework and relevant tools).

A brief module description is provided below:

Agenda

Module 1: Introduction to Data Science, Data Engineering and Big Data

Module 2: Understand Big Data from an Analytics Perspective

Module 3: Architectural Viewpoints in Big Data

Module 4: The Hadoop Ecosystem for Big Data

Module 5: Distributed File Storage

Module 6: NoSQL Databases for Big Data

Module 7: Spark and Functional Programming for Big Data

Module 8: Spark and Resilient Distributed Data Sets

Module 9: Spark SQL for Big Data

Module 10: Spark and Real Time Stream Processing

Module 11: Management of Big Data initiatives

Discussion and Project Requirement Elaboration

Project and Assessment

  • Project Demonstration, Report Submission and Presentations. Each team will work on a practical case study, then submit and present its work on the assigned Big Data project.

Closing Remarks

Who should attend

This is an intermediate course, suitable for professionals with some experience in any programming language and in data design. Participants with some business exposure will better appreciate the case studies discussed.

This course targets analytics professionals, including:

  • Business and IT professionals seeking analytical skills to handle large amounts of unstructured data (a data lake of, e.g., customer feedback, product reviews on social media, phone call recordings, etc.) for insights to improve business processes and decision-making.
  • Individuals who have no knowledge or experience in data engineering for analytics and would like to gain some practical skills in this area so that they may explore work opportunities in data engineering.
  • Data analysts and data engineers who want to move from structured data to engineering large amounts of unstructured data.

Pre-requisites

This is an intensive, intermediate course. It targets professionals higher in the value chain, such as data engineers, data application architects, integration architects, software engineers working on data pipeline processing, and key technology decision makers.

Participants with experience in programming languages such as Python, Java or Scala will benefit more from the course. Participants also need a strong interest in building functional pipelines and should be comfortable working with the Hadoop platform and the Spark framework.

NUS-ISS also offers a range of other basic courses in analytics for participants new to analytics.

Trust the experts

Suriya Priya Asaithambi

Suria has twenty years of teaching and consulting experience in areas such as software engineering, application architecture, crafting cloud services, agile development and big data engineering. Her research interests span cloud computing, software engineering, test automation and big dat...
