Illustration 6

Powered by Evolved Analytics' DataModeler

Illustration: Handling correlated variables

Correlated input variables are common in practice, yet they are a major problem for most modeling techniques. In contrast, SymbolicRegression can identify the best of a set of correlated inputs and may synthesize metavariables, producing insightful, robust, high-quality models.

Generate the correlated data

Here we have two driving variables, both of which are hidden from the modeling process. They are transformed to produce the observed input variables as well as the response that we want to model. In the BivariatePlot below, there is no obvious relationship between the observed inputs and the response. However, we do see highly coupled inputs, which would generally cause problems for traditional modeling techniques. The underlying relationships between the two hidden drivers and the observed inputs and response are, of course, also unknown to the modeling process.
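For readers without the notebook at hand, the setup described above can be sketched in plain Python. This is only an illustration under stated assumptions: the transformations, noise level, variable names (`x1` to `x4`) and the helpers `pearson` and `make_data` are all hypothetical and are not the ones used in this illustration. The sketch builds two hidden drivers, derives four observed inputs (two of them nearly collinear), and defines a response that has essentially no linear correlation with any single input.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def make_data(n=500, seed=0):
    """Hypothetical data set: hidden drivers u, v -> observed inputs and response."""
    rng = random.Random(seed)
    u = [rng.uniform(1.0, 2.0) for _ in range(n)]  # hidden driver 1
    v = [rng.uniform(1.0, 2.0) for _ in range(n)]  # hidden driver 2

    def noise():
        return rng.gauss(0.0, 0.02)  # small measurement noise (assumed)

    inputs = {
        "x1": [a + noise() for a in u],              # noisy copy of u
        "x2": [2.0 * a + 1.0 + noise() for a in u],  # nearly collinear with x1
        "x3": [b + noise() for b in v],              # noisy copy of v
        "x4": [a + b + noise() for a, b in zip(u, v)],  # mixes both drivers
    }
    # Response depends nonlinearly on both hidden drivers (assumed form).
    y = [(a - b) ** 2 for a, b in zip(u, v)]
    return inputs, y


if __name__ == "__main__":
    inputs, y = make_data()
    print("corr(x1, x2) =", round(pearson(inputs["x1"], inputs["x2"]), 3))
    for name, col in inputs.items():
        print(f"corr({name}, y) =", round(pearson(col, y), 3))
```

Running the script prints a correlation close to 1 between `x1` and `x2`, and correlations close to 0 between each input and the response, which mirrors the situation described above: coupled inputs and no obvious input-response relationship.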

6_correlatedVariables_1.gif

6_correlatedVariables_2.gif

Devoting fifteen seconds to modeling discovers the underlying relationship. The search is not confused by the fact that some inputs are highly correlated or by the fact that the observed response is not linearly correlated with the input variables. This is pretty cool.
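The idea of a synthesized metavariable can be made concrete with a toy search in Python. This is a deliberately simplified stand-in, not DataModeler's SymbolicRegression: it enumerates a few fixed expression templates by brute force and ranks them by absolute linear correlation with the response. The data recipe, the templates and the helper names (`make_data`, `candidates`, `rank_candidates`) are assumptions made for this sketch only.

```python
import itertools
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def make_data(n=500, seed=0):
    """Hypothetical data: hidden drivers u, v; coupled inputs; nonlinear response."""
    rng = random.Random(seed)
    u = [rng.uniform(1.0, 2.0) for _ in range(n)]
    v = [rng.uniform(1.0, 2.0) for _ in range(n)]

    def noise():
        return rng.gauss(0.0, 0.02)

    inputs = {
        "x1": [a + noise() for a in u],
        "x2": [2.0 * a + 1.0 + noise() for a in u],  # nearly collinear with x1
        "x3": [b + noise() for b in v],
        "x4": [a + b + noise() for a, b in zip(u, v)],
    }
    y = [(a - b) ** 2 for a, b in zip(u, v)]
    return inputs, y


def candidates(inputs):
    """Yield (name, values) for single inputs and simple pairwise metavariables."""
    for name, col in inputs.items():
        yield name, col
    for (ni, ci), (nj, cj) in itertools.combinations(inputs.items(), 2):
        yield f"{ni} * {nj}", [p * q for p, q in zip(ci, cj)]
        yield f"{ni} / {nj}", [p / q for p, q in zip(ci, cj)]
        yield f"({ni} - {nj})^2", [(p - q) ** 2 for p, q in zip(ci, cj)]


def rank_candidates(inputs, y):
    """Return (|correlation|, name) pairs sorted from best to worst."""
    scored = [(abs(pearson(vals, y)), name) for name, vals in candidates(inputs)]
    return sorted(scored, reverse=True)


if __name__ == "__main__":
    inputs, y = make_data()
    for score, name in rank_candidates(inputs, y)[:5]:
        print(f"{name:>14}  |r| = {score:.3f}")
```

In this toy setting no single observed input explains the response, while one combined expression built from two inputs does. A real symbolic regression search explores a far larger space of expression structures and balances accuracy against complexity, which this enumeration does not attempt.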

6_correlatedVariables_3.gif

6_correlatedVariables_4.gif

6_correlatedVariables_5.gif

Created with Wolfram Mathematica 8.0