Illustration 1

Powered by Evolved Analytics' DataModeler

Illustration: Creative hypothesis generation

Conventional methods impose artificial constraints on the model form (polynomials, for example), even though these constraints have no physical basis. They are imposed only to make the mathematics tractable for that method, or out of a lack of imagination. SymbolicRegression, in contrast, lets the data define the model form, free of artificial limits. Part of this is the ability to hypothesize and explore diverse candidate model structures.

Generate trivial data

Let us generate a trivial data set consisting of two points. Our first inclination would be to assume a straight-line model. However, the reality is that we only know the truth at those two points (assuming there is no contribution from noise or other perturbations). Between the observed data points we cannot be confident in any conclusion, even though we do have a bias toward simplicity and an aversion to unwarranted complexity.
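The following minimal Python sketch makes the point concrete. The notebook's actual data values are not shown here, so the sketch assumes two hypothetical observations, (1, 1) and (2, 2). It hand-picks four model forms (linear, quadratic, exponential, rational) that all pass through both points exactly but disagree with one another between them.

```python
# Hypothetical stand-in for the notebook's trivial data set: two exact observations.
# The original values are not shown; (1, 1) and (2, 2) are assumed for illustration.
data = [(1.0, 1.0), (2.0, 2.0)]

# Structurally different candidate models, each chosen to pass through
# both assumed points exactly.
candidates = {
    "linear: x": lambda x: x,
    "quadratic: x^2 - 2x + 2": lambda x: x**2 - 2 * x + 2,
    "exponential: 2^(x - 1)": lambda x: 2 ** (x - 1),
    "rational: 2 / (3 - x)": lambda x: 2 / (3 - x),
}


def max_abs_error(model, points):
    """Largest absolute residual of a model over the observed points."""
    return max(abs(model(x) - y) for x, y in points)


if __name__ == "__main__":
    for name, model in candidates.items():
        # Zero error at the data, yet a different prediction at the midpoint x = 1.5.
        print(f"{name:28s} error={max_abs_error(model, data):.1e}  f(1.5)={model(1.5):.4f}")
```

At x = 1.5 the four models predict 1.5, 1.25, about 1.414 and about 1.333. The data alone cannot decide among them.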

Search for possible models

Now let us devote three seconds to identifying models that fit this data. Although we might intuitively prefer the simple linear model, all of the other discovered models also fit the data exactly! (Mousing over the curves in the ResponseSurfacePlot displays the underlying model, whereas mousing over the functions in the ModelSelectionReport displays the variables embedded within each expression.)
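A toy setting shows why every discovered model can fit exactly. Any structure with two free parameters can generally be made to pass through two points. The Python sketch below is a hypothetical stand-in, not DataModeler's SymbolicRegression, which evolves whole expressions rather than enumerating a fixed list. It takes a few assumed basis functions g, solves y = a + b·g(x) exactly for the assumed points (1, 1) and (2, 2), and reports each residual.

```python
import math

# Hypothetical two-point data set (the notebook's actual values are not shown).
data = [(1.0, 1.0), (2.0, 2.0)]

# Assumed candidate structures of the form y = a + b * g(x). This is a toy
# enumeration, NOT the search performed by DataModeler's SymbolicRegression.
basis = {
    "x": lambda x: x,
    "x^2": lambda x: x**2,
    "exp(x)": math.exp,
    "log(x)": math.log,
    "1/x": lambda x: 1 / x,
    "sqrt(x)": math.sqrt,
}


def fit_two_points(g, points):
    """Solve a + b*g(x_i) = y_i exactly; requires g(x1) != g(x2)."""
    (x1, y1), (x2, y2) = points
    b = (y2 - y1) / (g(x2) - g(x1))
    a = y1 - b * g(x1)
    return a, b


def search(points):
    """Fit every basis structure and return (name, a, b, max residual)."""
    results = []
    for name, g in basis.items():
        a, b = fit_two_points(g, points)
        residual = max(abs(a + b * g(x) - y) for x, y in points)
        results.append((name, a, b, residual))
    return results


if __name__ == "__main__":
    for name, a, b, residual in search(data):
        print(f"y = {a:+.4f} {b:+.4f}*{name:8s} residual={residual:.1e}")
```

Every residual comes out at zero, up to floating-point rounding. Goodness of fit alone therefore cannot single out the linear form.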

We can exploit the creativity of SymbolicRegression in a number of ways beyond searching for possible model structures. An ensemble of diverse, high-quality models lets us develop a trust metric for our predictions. Where the models agree, we can be more confident, and where they disagree, we should be less so. We can also use that model disagreement to guide data collection for adaptive design of experiments (Adaptive DOE). We will discuss these topics in more detail later.
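The Python sketch below shows one simple way to turn model disagreement into a trust metric and a data-collection guide. It is an assumption for illustration, not DataModeler's own metric:

- The ensemble is a hand-picked set of four models that all fit the hypothetical points (1, 1) and (2, 2) exactly.
- The trust metric is the population standard deviation of their predictions.
- The next experiment goes wherever that spread is largest on a grid over [1, 2].

```python
import statistics

# Hypothetical ensemble: structurally different models that all fit the
# assumed observations (1, 1) and (2, 2) exactly.
ensemble = [
    lambda x: x,
    lambda x: x**2 - 2 * x + 2,
    lambda x: 2 ** (x - 1),
    lambda x: 2 / (3 - x),
]


def disagreement(x):
    """Population std dev of ensemble predictions at x (an assumed, crude trust metric)."""
    return statistics.pstdev([m(x) for m in ensemble])


def next_experiment(lo=1.0, hi=2.0, n=101):
    """Return the grid point where the ensemble disagrees most (a simple adaptive-DOE heuristic)."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(grid, key=disagreement)


if __name__ == "__main__":
    for x in (1.0, 1.5, 2.0):
        print(f"x={x:.2f}  disagreement={disagreement(x):.4f}")
    print(f"suggested next experiment: x={next_experiment():.2f}")
```

Disagreement is exactly zero at the two observed points and positive everywhere between them, so this heuristic always proposes an interior point. In practice one would collect that point, refit the ensemble, and repeat.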

The key takeaway here is that while a linear model might be appropriate (and, if it is, it will be recognized and reported as such), the data determines the model form.