GS 2 Day 1 (7-April-2021)

2.1 Opening Keynote by Jay Yagnik (YouTube) 9:00 AM

Jay divided his presentation into three topics.

2.1.1 Patterns of progress in AI

  • End-to-end ML techniques produce practically the best models in almost all domains (e.g., health care, speech recognition, vision-based robotics, self-driving vehicles, games, agriculture, weather forecasting).

    • Perhaps this is because end-to-end learning can exploit mathematical shortcuts through the phenomena being modeled.
  • Reusable blocks of ML (e.g., TensorFlow, PyTorch) allow others to build on them and to quickly connect them with other pipelines.

    • Recurrence is closely related to the deep part of DL (unrolling a recurrence yields a deep network), yet it does not work well in practice. This is a contradiction, or at least a paradox.

    • Very little has changed in the fundamental building blocks of DL. Researchers should look at this, as there is much room for improvement at the fundamental level.

    • Another example of reusability is transfer learning with pre-trained BERT models (a minimal sketch follows this list).

  • Model capacity and compute have grown far faster than Moore’s law.

    • TPUs are able to run fast partly because of lower-precision computation.
  • Potential technology inflection point: Quantum Computing

    • Quantum processors can execute exponentially complex tasks in linear time.

    • Noise remains a challenge, but it may be tackled in the future by self-error-correcting mechanisms that use a large number of qubits.
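
As an illustration of the reusability point above, here is a minimal sketch of transfer learning with a pre-trained BERT model using the Hugging Face transformers library; the library choice and the toy task are our own illustration, not something shown in the keynote.

    # Fine-tune a pre-trained BERT encoder for a toy binary
    # classification task (illustrative sketch, not from the talk).
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)   # fresh classification head

    texts = ["a great talk", "a dull talk"]  # toy training data
    labels = torch.tensor([1, 0])
    batch = tokenizer(texts, padding=True, return_tensors="pt")

    optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
    model.train()
    for _ in range(3):                       # a few gradient steps
        out = model(**batch, labels=labels)  # loss computed internally
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()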

2.1.2 Future of artificial general intelligence (AGI)

  • Including world knowledge in models that are grounded in reality is a promising direction.

2.1.3 Societal impact with AI

  • Looking at a patient’s history and helping doctors focus on the important events of the past.

  • Google’s flood warning system.

  • Jay shared the fields Google India is working on, both at the Google level and at the societal level.

2.1.4 Q&A

  • Q: How should one balance breadth and depth as a PhD student?

    A: Jay said he started as a machine learning and vision person 15 years ago. Since then, he has forced himself to learn a new field, at the complexity level of a practitioner of that field, every two to three years. He felt that, summarizing across all these domains, the number of core problems being solved is really small; they only look different because of different names and slightly different formulations. He therefore advised first getting grounded in one field at the nuts-and-bolts level; that grounding then pushes one to make connections among various areas, which is also a good recipe for building a good career.

  • Q: What is expected of a PhD student who wants to join industry research? Where should one focus in the last years of a PhD? How different is research in industry from academia?

    A: Answering the last question first: industry research rests on four pillars (patterns observed across successful research): i) fundamental research; ii) building infrastructure; iii) taking products to a larger user base; iv) new product innovation. On the other questions, one should be able to summarize what one knows in different fields. Jay recalled his adviser’s advice that one should be able to summarize a paper in 3-5 sentences, and added a corollary: one should not write a paper that one cannot summarize in 3-5 sentences.

  • Q: How do you explain end-to-end models? Does Google use something specific for the explainability of models?

    A: Most models nowadays are black boxes, and we also have black-box methods to probe them (pointing to gradient-propagation and uncertainty-propagation methods). The exact mathematics of what happens in between is of less concern, as long as one can reason about the output in terms of particular inputs (e.g., which part of a medical image drove the decision made by an AI model?). A minimal gradient-saliency sketch follows this Q&A list.

  • Q: As model parameters grow into the millions, hardware (GPUs and TPUs) needs to scale at the same pace, and this might become a bottleneck after some time. What are your views on this?

    A: Jay had two answers to this question. First, having a small GPU cluster at the university level is not that expensive, and it is enough to run meaningful experiments that can scale in practice. Second, he would not be surprised if, a few years from now, we discover that we were unnecessarily wasting a lot of compute and that there are clever mathematical tricks (nearest-neighbor search, branch and bound) to do the same tasks efficiently.

  • Q: Most ML models learn with back-propagation, but the human brain does not seem to learn that way. How can we closely mimic human brain learning for AGI (artificial general intelligence)?

    A: Jay does not think this is a requirement for AGI (quoting the “airplanes don’t flap their wings” philosophy). He feels that taking inspiration from nature is good, but one has to stay within the limits of what machines can do.
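
The gradient-saliency sketch referenced in the explainability answer above: attribute a model’s output to its inputs by back-propagating the output score to the input. The tiny model and input here are hypothetical placeholders, not anything specific to Google’s tooling.

    # Gradient-based input attribution (saliency) for a black-box model:
    # which input dimensions most influenced the output score?
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
    model.eval()                               # stands in for a trained model

    x = torch.randn(1, 8, requires_grad=True)  # e.g., features of one case
    model(x).sum().backward()                  # propagate gradient to the input

    saliency = x.grad.abs()  # large values = inputs that drove the output
    print(saliency)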

2.2 General ML Session by Katherine Heller 10:00 AM

Katherine talked in detail about her research projects.

2.2.1 Learning to Detect Sepsis with a Multitask Gaussian Process RNN Classifier

In this work, multitask Gaussian process models were used to convert irregularly sampled clinical data into an evenly spaced time series; an RNN was then used to predict the probability of sepsis on real-world data. The authors use the Lanczos method to cope with the Gaussian process’s time complexity and to draw approximate samples, and they significantly improve on the performance of previous work.
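
A minimal single-task sketch of the resample-then-classify idea: a scikit-learn exact GP stands in for the paper’s multitask GP with Lanczos approximations, and an untrained GRU stands in for the trained classifier, so this only illustrates the data flow.

    # GP-interpolate irregularly sampled measurements onto a regular
    # grid, then classify the regular series with an RNN (sketch only).
    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    t_obs = np.sort(np.random.uniform(0, 24, size=15))     # irregular hours
    y_obs = np.sin(t_obs / 4) + 0.1 * np.random.randn(15)  # one toy vital sign

    gp = GaussianProcessRegressor(kernel=RBF(5.0) + WhiteKernel(0.01))
    gp.fit(t_obs[:, None], y_obs)

    t_grid = np.linspace(0, 24, 48)        # evenly spaced grid
    y_grid = gp.predict(t_grid[:, None])   # posterior mean on the grid

    class SepsisRNN(nn.Module):            # hypothetical classifier
        def __init__(self):
            super().__init__()
            self.rnn = nn.GRU(input_size=1, hidden_size=16, batch_first=True)
            self.head = nn.Linear(16, 1)

        def forward(self, x):
            _, h = self.rnn(x)
            return torch.sigmoid(self.head(h[-1]))  # P(sepsis)

    seq = torch.tensor(y_grid, dtype=torch.float32).view(1, -1, 1)
    print(SepsisRNN()(seq))  # untrained; shows the pipeline shape only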

2.2.2 Graph-Coupled HMMs for Modeling the Spread of Infection

The authors efficiently model the spread of infection with GCHMMs, leveraging sparsity in social networks. They successfully use mobile phone data collected from 84 people over an extended period to model the spread of infection at the individual level.
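
An illustrative forward simulation in the spirit of a graph-coupled HMM: each person’s hidden infection state transitions depend on their neighbors’ states, and symptoms are noisy emissions of that state. The 4-person network and all rates here are made up, and the paper’s contribution is inference in such a model, not this simulation.

    # Toy graph-coupled dynamics: hidden state per person (0 = healthy,
    # 1 = infected); infection risk grows with infected neighbors.
    import numpy as np

    rng = np.random.default_rng(0)
    adj = np.array([[0, 1, 1, 0],   # made-up contact network
                    [1, 0, 1, 0],
                    [1, 1, 0, 1],
                    [0, 0, 1, 0]])
    alpha, beta, gamma = 0.02, 0.15, 0.25  # base / per-neighbor / recovery

    state = np.array([0, 0, 1, 0])         # person 2 starts infected
    for day in range(10):
        infected_neighbors = adj @ state
        p_infect = 1 - (1 - alpha) * (1 - beta) ** infected_neighbors
        state = np.where(
            state == 1,
            (rng.random(4) > gamma).astype(int),    # infected may recover
            (rng.random(4) < p_infect).astype(int)  # healthy may catch it
        )
        obs = np.where(rng.random(4) < 0.9, state, 1 - state)  # noisy symptoms
        print(day, state, obs)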

2.2.3 Hierarchical Graph-Coupled HMMs for Heterogeneous Personalized Health Data

In this work, hierarchical GCHMMs (HGCHMMs) were used to detect infections in a small mobile community; the model predicted the probability of infection for each person on each day.