Machine learning, known in physics circles as multivariate analysis, is used increasingly in high-energy physics. It is most visible in data analysis, but it also appears in other applications such as triggering and event reconstruction. The community of machine-learning data scientists organises “Kaggle” competitions to tackle difficult and interesting challenges in many different fields.
With the aim of developing interactions with the machine-learning community, LHCb organised such a competition, featuring the search for the lepton-flavour-violating decay τ → μμμ. This decay is (almost) forbidden in the Standard Model, so its observation would signal the discovery of “new physics”, the search for which is now the key goal of the LHC. This Kaggle challenge (https://www.kaggle.com/c/flavours-of-physics) was conceived by a group of scientists from CERN, the University of Warwick, the University of Zürich and the Yandex School of Data Analysis. It was financially supported by the Yandex Data Factory, Intel and the University of Zürich. The competition ran for three months, from July to October 2015. More than 700 people competed to achieve the best signal-versus-background discrimination and to win a share of the prize money, totalling $15,000, awarded to the three top-ranked solutions.
This particular challenge, which used both “real” and simulated LHCb data, has been recognised by the community as more complex than typical challenges, and therefore as a refreshing problem to tackle. The winners were awarded their prizes in December at one of the main conferences of the machine-learning community, the Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS).
In addition to the prizes for the best-ranked solutions, a further prize was planned for the solution judged most interesting from a physics point of view. In the event, LHCb decided to award two of these physics prizes, of $2000 each, to Vincens Gaitan (a former member of the ALEPH collaboration at CERN’s Large Electron–Positron collider) and Alexander Rakhlin. Their solutions are innovative and particularly suitable for cases where the samples used to train the multivariate classifier are small and do not perfectly match the real data.
The two awardees collected their prizes at a three-day workshop organised at the University of Zürich on 18–21 February as a follow-up to the Kaggle challenge. The workshop brought together 55 people from the LHC and machine-learning communities, and interesting ideas were exchanged. The general conclusion from discussions at the event was that the exercise had been a very positive one, both for LHCb and for those who entered the competition.