HOW DID YOU GET INTO MACHINE LEARNING?
My entry into ML was not straightforward. I started out working on statistical signal processing and information theory. My first topic in graduate school was distributed statistical estimation under communication and energy constraints, before the term "Internet of Things (IoT)" had even been coined. Many people have asked if the transition into machine learning was very hard. My response is that it was seamless. I gained a strong mathematical foundation in probability and statistics during graduate school. Towards the end of it, I found myself working on probabilistic graphical models, and I have never looked back since.
Anima Anandkumar is currently a principal scientist at Amazon Web Services and a Bren Professor in the CMS department at Caltech. Her research interests are in the areas of large-scale machine learning, non-convex optimization, and high-dimensional statistics. In particular, she has been spearheading the development and analysis of tensor algorithms. She is the recipient of several awards, including the Alfred P. Sloan Fellowship, the Microsoft Faculty Fellowship, a Google Research Award, ARO and AFOSR Young Investigator Awards, an NSF CAREER Award, the Early Career Excellence in Research Award at UCI, the Best Thesis Award from the ACM SIGMETRICS society, the IBM Fran Allen PhD Fellowship, and several best-paper awards. She has been featured in a number of forums, including YourStory, a Quora ML session, the Huffington Post, Forbes, and O'Reilly Media. She received her B.Tech in Electrical Engineering from IIT Madras in 2004 and her PhD from Cornell University in 2009. She was a postdoctoral researcher at MIT from 2009 to 2010, an assistant professor at UC Irvine between 2010 and 2016, and a visiting researcher at Microsoft Research New England in 2012 and 2014.
WHAT WILL YOU BE SPEAKING ABOUT AT THE INDABA?
Modern machine learning involves deep neural network architectures, which yield state-of-the-art performance in multiple domains such as computer vision, natural language processing, and speech recognition. As data and models scale, it becomes necessary to use multiple processing units (CPU or GPU cores) for both training and inference. Apache MXNet is an open-source framework developed for distributed deep learning. I will describe its underlying lightweight hierarchical parameter server architecture, which results in high efficiency. We obtain state-of-the-art performance: ~90% scaling efficiency on p2.16xlarge AWS instances with 16 GPUs, and ~88% efficiency on multi-node AWS instances with 256 GPUs. I will also demonstrate how you can quickly start using MXNet by leveraging preconfigured Deep Learning AMIs (Amazon Machine Images) and CloudFormation templates on AWS.
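To give a concrete flavor of this, here is a minimal sketch of single-machine multi-GPU data-parallel training with MXNet's Gluon API. The model, data handling, and two-GPU setup are illustrative assumptions, not the benchmark configuration above; across machines, switching the kvstore to 'dist_sync' engages the distributed parameter server.

```python
# A minimal sketch of multi-GPU data-parallel training with MXNet's Gluon
# API. The model, shapes, and two-GPU setup are illustrative assumptions.
import mxnet as mx
from mxnet import gluon, autograd

ctxs = [mx.gpu(0), mx.gpu(1)]  # use [mx.cpu()] if no GPUs are available

net = gluon.nn.Sequential()
net.add(gluon.nn.Dense(128, activation='relu'),
        gluon.nn.Dense(10))
net.initialize(mx.init.Xavier(), ctx=ctxs)

# kvstore='device' aggregates gradients across GPUs on one machine;
# kvstore='dist_sync' uses the distributed parameter server across nodes.
trainer = gluon.Trainer(net.collect_params(), 'sgd',
                        {'learning_rate': 0.1}, kvstore='device')
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

def train_batch(data, label):
    # Split the batch across devices; gradients are summed by the kvstore.
    xs = gluon.utils.split_and_load(data, ctxs)
    ys = gluon.utils.split_and_load(label, ctxs)
    with autograd.record():
        losses = [loss_fn(net(x), y) for x, y in zip(xs, ys)]
    for l in losses:
        l.backward()
    trainer.step(data.shape[0])
```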
Pushing the current boundaries of deep learning requires using multiple dimensions and modalities. These can be encoded into tensors, which are natural extensions of matrices. We present new deep learning architectures that preserve the multi-dimensional information in data end-to-end. We show that tensor contractions are an effective replacement for fully connected layers in deep learning architectures, yielding significant space savings (more than 65%) with negligible performance degradation. We also introduce tensor regression in the output layer of the network, which yields further space savings. This is because tensor operations retain the multi-dimensional dependencies in activation tensors, whereas fully connected layers flatten them into vectors and lose this information. Tensor contractions also present rich opportunities for hardware optimization through extended BLAS kernels. We propose a new primitive, known as StridedBatchedGEMM in cuBLAS 8.0, that significantly speeds up tensor contractions and avoids explicit copies and transpositions. These functionalities are available in the TensorLy package with an MXNet backend interface for large-scale, efficient learning.
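As a small illustration of the idea, the sketch below uses TensorLy to contract the channel mode of an activation tensor with a weight matrix, preserving the spatial modes instead of flattening them. The shapes and the NumPy backend are assumptions chosen to keep the example self-contained, not the exact layer construction from the papers.

```python
# A minimal sketch of a tensor contraction in place of flattening, using
# TensorLy. Shapes are illustrative; the talk uses the MXNet backend, but
# the NumPy backend keeps this example self-contained.
import numpy as np
import tensorly as tl

tl.set_backend('numpy')

batch, h, w, c = 32, 7, 7, 64
activations = tl.tensor(np.random.randn(batch, h, w, c))

# Contract the channel mode (mode 3) with a (k x c) weight matrix. The
# output keeps the multi-dimensional structure, shape (batch, h, w, k),
# rather than flattening to a (batch, h*w*c) matrix as a dense layer would.
k = 16
weights = tl.tensor(np.random.randn(k, c))
out = tl.tenalg.mode_dot(activations, weights, mode=3)
print(out.shape)  # (32, 7, 7, 16)
```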
WHAT ADVICE WOULD YOU GIVE TO THOSE GETTING STARTED IN MACHINE LEARNING?
To get hands-on experience, one can explore using Jupyter notebooks, e.g., those on the MXNet website. These allow you to tweak parameters and visualize the results on various datasets. It is important to have enough mathematical grounding to understand the role of the various parameters in a machine learning algorithm; I have seen people struggle when algorithms don't work straight out of the box. Spending enough time and effort on a foundational course in machine learning (such as Andrew Ng's online course) is essential. If that is too difficult, brushing up on high-school math first is highly advised.
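As a flavor of the kind of notebook experiment this advice points to, here is a minimal learning-rate sweep in MXNet. The toy model and synthetic data are illustrative placeholders, not from any particular tutorial.

```python
# A minimal sketch of notebook-style parameter tweaking: sweep the
# learning rate and compare the resulting losses. Toy data and model.
import mxnet as mx
from mxnet import gluon, autograd, nd

# Toy data: 200 examples, 10 features, binary label from the feature sum.
X = nd.random.randn(200, 10)
y = (X.sum(axis=1) > 0)

for lr in [0.001, 0.01, 0.1, 1.0]:  # the parameter being tweaked
    net = gluon.nn.Dense(2)          # fresh model for each setting
    net.initialize(mx.init.Xavier())
    trainer = gluon.Trainer(net.collect_params(), 'sgd',
                            {'learning_rate': lr})
    loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()
    for epoch in range(50):
        with autograd.record():
            loss = loss_fn(net(X), y)
        loss.backward()
        trainer.step(X.shape[0])
    print('lr=%g  final mean loss=%.4f' % (lr, loss.mean().asscalar()))
```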