Improving Continual Learning and Multi-Exit Models – Tooploox at ICML 2025

Scope: Artificial Intelligence
Date: July 9, 2025 | Author: Konrad Budek | 4 min read

The International Conference on Machine Learning (ICML) is one of the top machine learning conferences in the world. The first edition of the event was held in 1980 in Pittsburgh and, since then, the conference has been held in many cities around the world, including Austin, TX; Washington, DC; Sydney, Australia; Helsinki, Finland; Beijing, China; and Vienna, Austria. This year's edition is being held in Vancouver, Canada.

Tooploox is a regular contributor to the conference, submitting research papers that tackle the problems and challenges of modern Artificial Intelligence and Machine Learning. 

This year, Tooploox has delivered two research papers, centering on continual learning and training multi-exit models.

Improving Continual Learning Performance and Efficiency with Auxiliary Classifiers

This paper presents a method for tackling catastrophic forgetting, one of the major challenges in continual learning. Interestingly, the research shows that there is common ground between machines trying to recall forgotten skills and people attempting to use skills that have long gone unused.

What is continual learning?

Continual learning is a set of techniques and technologies that aim to modify the skills of a neural network without complete retraining. Usually, when the creator of a neural network (be that a company, a team, or an individual) needs to add new skills or change existing ones, the network has to be retrained from scratch. This procedure is extremely costly, especially when the changes are minor. Continual learning reduces the cost and time required to keep a neural network up to date.

Yet continual learning comes with some pitfalls, with catastrophic forgetting being at the top. 

What is catastrophic forgetting?

Catastrophic forgetting occurs when a trained network loses some (and sometimes all) of its previous skills while gaining a new one through continual learning techniques. For instance, a network controlling an autonomous car may forget how to turn left when the creator introduces a new set of road signs for it to memorize.
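
To see what this looks like in practice, the toy sketch below (illustrative code only, not taken from the paper) trains a small PyTorch classifier on two made-up tasks in sequence with no safeguards; accuracy on the first task typically collapses once the second has been learned.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy setup: two "tasks" that use disjoint pairs of labels.
def make_task(task_id):
    x = torch.randn(512, 32) + task_id            # shift the inputs per task
    y = (x[:, 0] > task_id).long() + 2 * task_id  # task 0 -> labels {0, 1}, task 1 -> {2, 3}
    return x, y

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

def train_on(x, y, steps=200):
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()

@torch.no_grad()
def accuracy(x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

x1, y1 = make_task(0)
x2, y2 = make_task(1)

train_on(x1, y1)
print("task 1 accuracy after learning task 1:", accuracy(x1, y1))
train_on(x2, y2)  # naive sequential training: no continual learning safeguards
print("task 1 accuracy after learning task 2:", accuracy(x1, y1))  # typically drops sharply
```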

The Tooploox team provides a solution to this problem that is (somewhat and not directly) inspired by the process by which the human brain recalls long-unused skills. 

Using auxiliary classifiers to boost continual learning performance

Neural networks are divided into layers that process information one after another. For example, if a network has to decide whether a particular traffic light on the road is showing red, there are several (sometimes many) steps to follow. The process may look as follows:

The system has to recognize the shape, match it with a traffic light pattern, recognize a color, match it, recognize context, check its relevance, and make a decision. Without this, an autonomous vehicle could stop in the middle of the road when spotting the neon sign of a nearby cafe simply because it is red. 

The story above is obviously oversimplified for the sake of clarity and illustration and is not intended to be a comprehensive guide to machine learning mechanisms.

Tooploox research shows that the information processed “between the layers,” called the “intermediate representation,” can be recalled and used even if the network shows signs of forgetting particular skills. To reach these representations, the research team proposes Auxiliary Classifiers that access and leverage them to boost other continual learning techniques. The research paper shows that, when used, this technique can reduce inference costs by up to 60% and brings a 10% relative performance gain on average, making the whole process cheaper and more efficient.
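
As a rough illustration of the general idea (a hypothetical toy model, not the exact architecture from the paper), the sketch below attaches small classifier heads to intermediate layers, so predictions can also be read from the representations formed partway through the network and supervised alongside the final output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BackboneWithAuxClassifiers(nn.Module):
    """Toy backbone with an auxiliary classifier attached after every block."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Linear(32, 64), nn.ReLU()),
            nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
            nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
        ])
        self.heads = nn.ModuleList([nn.Linear(64, num_classes) for _ in self.blocks])

    def forward(self, x):
        logits = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)            # intermediate representation
            logits.append(head(x))  # prediction read from this depth
        return logits               # one set of logits per depth; the last is the main output

model = BackboneWithAuxClassifiers()
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
# Summing the losses over all heads supervises the intermediate representations directly,
# so useful features formed at earlier depths can still be leveraged for predictions.
loss = sum(F.cross_entropy(logits, y) for logits in model(x))
loss.backward()
```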

The situation can be compared to riding a bike after a long time. Initially, one may be a little bit clumsy, yet there remains muscle memory as well as skills stored deep within the natural neural network. Re-learning to ride a bike is just a matter of a few minutes. 

The research was carried out by a team consisting of Filip Szatkowski, Yaoyue Zheng, Fei Yang, Tomasz Trzcinski, Bartłomiej Twardowski, and Joost van de Weijer, representing Warsaw University of Technology; IDEAS NCBR; the Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, China; the Computer Vision Center, Barcelona; VCIP, College of Computer Science, Nankai University; NKIARI, Shenzhen Futian; the IDEAS Research Institute; Tooploox; and Universitat Autonoma de Barcelona. The full text can be found on arXiv.

How to Train Your Multi-Exit Model? Analyzing the Impact of Training Strategies

Early-exit models can not only speed up the delivery of a neural network's results, but also cut the costs of its operation. Yet this technology still comes with challenges to overcome.

What are multi-exit models?

A multi-exit model is a neural network that may provide an answer earlier than expected, in instances where the result is extremely unlikely to change with further processing. Assuming that there are X layers in the network, the response may be delivered at layer B, where B < X. This enables the network to save on energy consumption and processing time. For an individual user or small team the savings are minute, yet if a company operates on a larger scale, answering hundreds, if not thousands, of queries a day (or hour), the accumulated time savings can be game-changing.
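
To make the mechanism concrete, here is an illustrative sketch (hypothetical code, not one of the training strategies analyzed in the paper): the network reads a prediction from each exit head in turn and stops as soon as the prediction looks confident enough.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExitNet(nn.Module):
    """Toy multi-exit network: one classification head ("exit") after each block."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(32, 64), nn.ReLU())]
            + [nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(3)]
        )
        self.exits = nn.ModuleList([nn.Linear(64, num_classes) for _ in self.blocks])

    @torch.no_grad()
    def predict_early_exit(self, x, threshold=0.9):
        for depth, (block, exit_head) in enumerate(zip(self.blocks, self.exits), start=1):
            x = block(x)
            probs = F.softmax(exit_head(x), dim=-1)
            confidence, prediction = probs.max(dim=-1)
            if confidence.item() >= threshold:      # confident enough: answer now
                return prediction.item(), depth     # and skip the remaining layers
        return prediction.item(), depth             # fell through: used the whole network

net = MultiExitNet()
label, blocks_used = net.predict_early_exit(torch.randn(1, 32))
print(f"predicted class {label} after {blocks_used} of {len(net.blocks)} blocks")
```

The confidence threshold controls the trade-off: a lower threshold exits earlier and saves more computation, while a higher threshold keeps answers closer to what the full network would produce.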

Our work

This paper delivers a checklist to follow when training a multi-exit model. The checklist is backed by a structured comparison of the available techniques, along with the strengths and weaknesses of each approach.

The research was prepared by a team consisting of Piotr Kubaty, Bartosz Wojcik, Bartłomiej Krzepkowski, Monika Michaluk, Tomasz Trzcinski, Jary Pomponi, and Kamil Adamczewski. The researchers come from Jagiellonian University; Warsaw University of Technology; the University of Warsaw; Tooploox; the IDEAS Research Institute; the Department of Information Engineering, Electronics, and Telecommunications (DIET), Sapienza University of Rome, Italy; and Wroclaw University of Science and Technology. The full text can be found on arXiv.

Summary

The 2025 edition of the ICML conference will take place in Vancouver, Canada. The event starts on July 13 and continues until July 19. It is one of the most prestigious machine learning conferences, with arguably the longest history in the field and contributors from all around the world.

It is not the first time Tooploox has contributed to the ICML conference. Previous texts on contributions include Bitter Lessons in Reinforcement Learning and Hypernetwork approach to generating point clouds.
