The International Conference on Machine Learning (ICML) is one of the top conferences for machine learning and artificial intelligence research. Hundreds of researchers gather to present and discuss the latest papers and trends in the field. Here we briefly summarize some of the key takeaways from this year’s conference.
Causal Inference
Causal inference deals with discovering actual causal relationships between variables (observed or latent), instead of simply capturing correlations between them.
Current machine learning approaches rely mostly on correlations, which brings a host of problems: difficulty with out-of-distribution samples, a tendency to overfit to spurious correlations, and a lack of proper interpretability. The following image illustrates this clearly:
The model didn’t learn to identify the cow; it captured the correlation between the background (grasslands) and the label (cow). When you keep the cow but change the background, the predictions change drastically.
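To make this concrete, here is a minimal, hypothetical sketch of how a classifier trained purely on correlations can latch onto a spurious “background” feature and fail once that correlation is broken. All feature names and numbers below are illustrative, not from any real dataset:

```python
# Hypothetical sketch: a classifier that leans on a spurious "background"
# feature does well in-distribution but degrades when the correlation breaks.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_data(n, p_spurious):
    """label = 1 means 'cow'; the 'grass background' feature matches the label
    with probability p_spurious, while the 'animal shape' feature is a weak
    but causal signal."""
    label = rng.integers(0, 2, n)
    background = np.where(rng.random(n) < p_spurious, label, 1 - label)
    shape = label + rng.normal(0, 1.5, n)
    return np.column_stack([background, shape]), label

# Training data: background almost perfectly correlated with the label.
X_train, y_train = make_data(10_000, p_spurious=0.95)
# Test data: the correlation is broken (cows on beaches, for instance).
X_test, y_test = make_data(10_000, p_spurious=0.5)

model = LogisticRegression().fit(X_train, y_train)
print("in-distribution accuracy:    ", model.score(X_train, y_train))
print("out-of-distribution accuracy:", model.score(X_test, y_test))
```

Because the background feature is almost perfectly predictive during training, the model relies on it; once that correlation is broken at test time, accuracy drops sharply even though the causal signal is still present.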
There have been many advances in using causal inference in machine learning models and, conversely, in using machine learning to learn causal mechanisms. This is a promising direction for machine learning, since it could yield more robust, interpretable models.
Explainability
Following the idea of causality, many recent methods for neural network explainability and interpretability rely on causal mechanisms. The idea that “deep learning is a black box and is not interpretable” is becoming more myth than reality. Modern methods are capable of some truly impressive feats, such as explaining a model’s predictions in natural language.
An example from Majumder et al. (2021) shows what this looks like:
This method allows the model to respond:
“They are in a hospital room” because “There are hospital beds and nurses in the room”.
Efficient Learning
Datasets for deep learning are often huge, and training models requires a lot of time and computational resources. Some recent papers tackle this problem and come up with quite creative solutions.
Kim et al. (2022) show methods for synthetic dataset generation, which enable training models that reach results similar to those trained on the entire dataset while using significantly fewer data points, even under 1% of the original dataset.
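To give a flavor of what synthetic dataset generation can look like, here is a simplified gradient-matching sketch: a tiny learnable synthetic dataset is optimized so that gradients computed on it mimic gradients computed on the real data, so that training on the synthetic set behaves similarly. This is not the exact method of Kim et al. (2022); the model, sizes, and hyperparameters below are purely illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

n_classes, n_per_class, dim = 10, 10, 784      # e.g. flattened MNIST-sized inputs
real_x = torch.randn(5000, dim)                # stand-in for the real dataset
real_y = torch.randint(0, n_classes, (5000,))

# Learnable synthetic dataset: 100 points instead of 5000 (2% of the data).
syn_x = torch.randn(n_classes * n_per_class, dim, requires_grad=True)
syn_y = torch.arange(n_classes).repeat_interleave(n_per_class)
opt = torch.optim.Adam([syn_x], lr=0.1)

def grads(model, x, y, create_graph=False):
    """Gradient of the classification loss w.r.t. the model's parameters."""
    loss = F.cross_entropy(model(x), y)
    return torch.autograd.grad(loss, list(model.parameters()), create_graph=create_graph)

for step in range(200):
    model = nn.Linear(dim, n_classes)                      # fresh random model each step
    g_real = grads(model, real_x, real_y)                  # target gradients
    g_syn = grads(model, syn_x, syn_y, create_graph=True)  # differentiable w.r.t. syn_x
    # Push gradients on the synthetic set toward gradients on the real set.
    match_loss = sum(F.mse_loss(a, b) for a, b in zip(g_syn, g_real))
    opt.zero_grad()
    match_loss.backward()
    opt.step()
```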
Mindermann et al. (2022) propose that not all data points are equally valuable, and avoid using redundant data points, where redundant means points that i) are not learnable, ii) are not worth learning, or iii) have already been learnt.
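A rough sketch of this kind of selection follows. The scoring rule (training loss minus an estimate of the irreducible loss from a small model trained on holdout data) follows the spirit of the paper, but the models, batch sizes, and the keep fraction here are simplifications for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(784, 10)           # model being trained (illustrative)
holdout_model = nn.Linear(784, 10)   # small model assumed pre-trained on holdout data,
                                     # used to estimate each point's irreducible loss
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def train_step(x, y, keep_fraction=0.1):
    with torch.no_grad():
        train_loss = F.cross_entropy(model(x), y, reduction="none")
        irreducible = F.cross_entropy(holdout_model(x), y, reduction="none")
    # Already-learnt points have low train_loss; unlearnable (e.g. mislabeled)
    # points have high irreducible loss. Both score low and are skipped.
    score = train_loss - irreducible
    k = max(1, int(keep_fraction * len(x)))
    idx = score.topk(k).indices
    loss = F.cross_entropy(model(x[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Usage: draw a large candidate batch, but backpropagate through only ~10% of it.
x, y = torch.randn(512, 784), torch.randint(0, 10, (512,))
train_step(x, y)
```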
Applications
In terms of new applications, the use of deep learning for bioinformatics, chemoinformatics, and drug discovery is growing rapidly. There’s so much content to cover that it probably deserves its own post, but as a preview, we can see deep learning applied to predicting molecular structures, binding sites, and the effectiveness of potential drugs, to drug discovery focused on ease of synthesis, to drug synthesis generation, and much more. Graph neural networks, together with huge datasets of chemical, pharmacological, and 2D and 3D molecular geometry data, are the main drivers of this revolution, and the potential is huge.
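To give a flavor of why graph neural networks fit this domain so naturally, here is a minimal, from-scratch sketch of one message-passing step on a toy “molecule”, with atoms as nodes and bonds as edges. The adjacency matrix, features, and layer sizes are made up for illustration; real systems use dedicated libraries and far richer featurizations:

```python
import torch
import torch.nn as nn

# Toy molecule with 4 atoms; the adjacency matrix encodes bonds (plus self-loops).
A = torch.tensor([[1., 1., 0., 0.],
                  [1., 1., 1., 0.],
                  [0., 1., 1., 1.],
                  [0., 0., 1., 1.]])
X = torch.randn(4, 8)                        # per-atom feature vectors (hypothetical)

deg = A.sum(dim=1)
A_norm = A / deg.sqrt().outer(deg.sqrt())    # symmetric normalization of the adjacency

W = nn.Linear(8, 16)
H = torch.relu(A_norm @ W(X))                # one message-passing step: neighbors exchange info
graph_embedding = H.mean(dim=0)              # pooled representation of the whole molecule
print(graph_embedding.shape)                 # torch.Size([16])
```

The pooled embedding can then feed a downstream predictor, for example of binding affinity or toxicity, which is the basic pattern behind many of the applications above.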
Theory
The invention of the steam engine predates the theory of thermodynamics, and something analogous happened in deep learning. We started building massive computational models that added a lot of value, and while we understood how they work (matrix multiplications, activation functions and gradient-based learning), we did not understand why they work.
The development of the theory of thermodynamics enabled us to understand why engines worked the way they did, and gave us insights into how they could be improved. Something similar is happening in deep learning now, and a lot of theoretical foundations are starting to emerge.
We’re learning about the structure and behavior of the loss landscape, the evolution of the gradients and the neural tangent kernel through training, the ideas behind pruning, the convergence of learning methods, and much more. In the future we’ll have an answer to the question of why deep learning works, and we’ll be able to make informed decisions about architectures and training methods, supported by these theoretical foundations.
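As one small, concrete example of these theoretical tools, here is a sketch that computes the empirical neural tangent kernel of a toy network, i.e. the matrix of inner products between per-example parameter gradients; how this kernel evolves during training is one of the objects theorists study. The network and inputs are purely illustrative:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))
x = torch.randn(8, 2)                                  # 8 illustrative inputs

def param_grad(xi):
    """Flattened gradient of the scalar output f(xi) w.r.t. all parameters."""
    out = net(xi.unsqueeze(0)).squeeze()
    grads = torch.autograd.grad(out, list(net.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

J = torch.stack([param_grad(xi) for xi in x])          # (8, n_params) Jacobian
K = J @ J.T                                            # (8, 8) empirical NTK: K[i, j] = <grad f(x_i), grad f(x_j)>
print(K)
```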
—
There were many more interesting areas of research that weren’t covered in this post, including federated learning, reinforcement learning, generative models, graph neural networks, Bayesian methods, and ethics.
It is an exciting time for machine learning: research is moving fast in many different directions, and the applications we’re finding are becoming more impactful. Let’s see what the future holds for us; at Factored, our fingers are always on the pulse to ensure we remain leaders in our industry.
Need rigorously vetted, expert machine learning engineers to complement your tech team? Book a meeting with us today.