About the course
Applying machine learning techniques to language has resulted in some impressive gains over the last few decades; tasks like spam detection, entity recognition, and information extraction have become increasingly automated thanks to advances in natural language processing and text analysis. Other language modeling tasks remain challenging: question-and-answer systems, automated summarization, and, until fairly recently, machine translation have been among the tasks still considered “unsolved” by the machine learning community.
Recent advances in research, including word2vec, long short-term memory networks, and transfer learning, have dramatically improved the potential for automated translation between languages. And yet, human contextual understanding is still difficult to fully encode in language data, and machine translations often suffer from both contextual and semantic errors (such as missing sarcasm or incorrectly applying honorifics).
In this course, we will explore the state of the art in open source language modeling tools and investigate their strengths and weaknesses for a range of machine translation tasks.
This course is part of the Data Science and Machine Learning tracks of the Advanced Data Science Certificate.
Upon successful completion of the course, students will be able to:
Work with modern contextual word embeddings such as BERT and ELMo in machine translation.
Compare and contrast neural architectures for machine translation, such as encoder-decoder networks built from RNNs and LSTMs.
Use transfer learning solutions to quickly train initial language translation models.
Build an intuition around the types of algorithms and machine learning techniques that are most appropriate for natural language translation.
Understand the nuances and sensitivities around human language translation.
Identify and assess data sources for use in training machine learning models for translation.
Evaluate the effectiveness of language models using statistical and user-evaluation methods.
Garin is an Adjunct Lecturer for the Georgetown Data Science Certificate and Advanced Data Science Certificate programs, where he teaches Machine Learning and Natural Language Understanding. Garin is also currently a Senior Data Science Manager at Amazon Web Services, where he leads teams of data...