How did you get into machine learning?
My first ever computer program was a chatbot called "Psycho". It was rule-based, with lots of if-statements to generate responses, and was entertaining but very limited. This piqued my interest in computer engineering, and so after high school I enrolled in a double major in Electronic Engineering and Computer Science at Stellenbosch University (South Africa). Although I did not take any machine learning modules during undergrad, my favourite class was control system theory, which underlies a lot of the fundamentals of machine learning. My Master's was in NLP, but still very algorithmic with little "learning" involved, and I was constantly hoping to discover a way to let the system "learn on its own". When I started my PhD circa 2009 I became aware of deep learning and the idea of letting a system learn from the ground up from lots of data. The ability to "compartmentalize" models into different layers/modules also really appealed to my engineering background. During this time I spent one year at ISI@USC in Los Angeles, and one year at LISA/MILA in Montreal, where I worked closely with Yoshua Bengio. I was particularly influenced by his deep enthusiasm for research and his excitement about the future and the potential impact that machine learning can have for the good of humanity. After this, I guess you could say I was hooked! :)
What will you be teaching?
We will talk about how to model sequential data (e.g. language) using recurrent neural networks (RNNs). We will see how RNNs are similar to feed-forward networks, but also how they differ. We'll discuss the stability issues that arise when training these models (gradients vanishing or blowing up). We will then look at recent advances in the learning algorithms, as well as architectural innovations (gated architectures such as the GRU/LSTM), that have allowed these models to become a standard tool in the DL toolbox for modeling sequential data.
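To make the stability issue concrete, here is a minimal numerical sketch (my own illustration, not material from the talk): backpropagating through T time steps of a recurrence multiplies the gradient by the recurrent Jacobian T times, so its norm shrinks or grows geometrically with the Jacobian's spectral radius. The toy Jacobian here is a scaled identity, chosen so the effect is exact and easy to see.

```python
import numpy as np

def gradient_norm_after(T, scale, dim=10):
    # Toy recurrent Jacobian: scale * identity, so the T-step product
    # has norm exactly scale**T times the initial norm. Real Jacobians
    # vary per step, but the geometric growth/decay mechanism is the same.
    W = scale * np.eye(dim)
    g = np.ones(dim)   # gradient arriving at the last time step
    for _ in range(T):
        g = W.T @ g    # one step of backprop through the recurrence
    return np.linalg.norm(g)

print(gradient_norm_after(50, 0.5))  # ≈ 2.8e-15: the gradient vanishes
print(gradient_norm_after(50, 1.5))  # ≈ 2.0e9:  the gradient blows up
```

Gated architectures such as the GRU/LSTM address exactly this: their gates create additive paths through time along which the effective Jacobian stays close to the identity, so gradients neither vanish nor explode as quickly.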
What advice would you give to those getting started in machine/deep learning?
ML is a very empirical field, but also a very fast-moving one at the moment. Instead of getting sucked into the arXiv vortex too early on, the best way to get started is to read a few papers in the areas that interest you and then get your hands dirty with code as soon as possible. With so many great deep learning packages available these days (PyTorch, MXNet, TensorFlow, etc.), and a general trend toward researchers sharing their code, it is now really easy to reproduce results from key papers without too much effort. Once you're comfortable with that, start asking questions, formulate your own thoughts and hypotheses, and then tweak the code to test these out. Over time you will develop the intuition to start asking the right questions, and the code base for quickly trying out these ideas.