For centuries, the process of formulating new knowledge from observations has driven scientific discovery. With rapid advances in machine learning, it is natural to ask whether knowledge discovery in science can be automated. Symbolic regression is a benchmark task for automated knowledge discovery.
The aim of the task is to predict a mathematical equation that best describes the observational data. Advances in symbolic regression have considerable potential to aid research into the dynamics and governing properties of unexplored systems. However, the combinatorial nature of the task makes it computationally expensive and hard to solve efficiently. Several types of algorithms exist for symbolic regression, ranging from genetic programming and sparse regression to deep generative models. Yet no survey collates these prominent algorithms.
Therefore, the goal of this paper is to summarize key research in symbolic regression and to perform a comparative study of the strengths and limitations of each method. Finally, we highlight the challenges facing current methods and outline future research directions for applying machine learning to knowledge discovery.