News archive for May 2012

DataModeler Release 8.08 (16 May)

Wednesday, May 16, 2012

We are proud to release DataModeler 8.08 (16 May 2012)! Other than working around a "designed-as bug/feature" in Compile, the theme of this release is major new capabilities for metavariable identification and exploitation. A metavariable is simply a combination of variables, or a transform of a variable, that is useful in the developed models. There are eight new functions supporting this capability:

New Functions

We can exploit the diversity of model forms developed during SymbolicRegression by running many IndependentEvolutions and looking for those MetaVariables which are prevalent in quality models. This can provide insight into underlying mechanisms and alternative paths to quality models, which is especially useful when we have highly correlated/coupled inputs and multiple paths to producing models of comparable accuracy and conciseness.
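The prevalence idea can be sketched outside DataModeler in a few lines of plain Python. This is only an illustration under stated assumptions, not the implementation behind MetaVariablePresence or MetaVariableDistribution: models are represented as nested tuples of the form (operator, arguments...), variables are strings, fitted constants are numbers, and the names `subexpressions`, `uses_only_variables` and `presence_by_run` are hypothetical.

```python
from collections import Counter

def subexpressions(expr):
    """Yield every compound subexpression (a nested tuple) of an expression tree."""
    if isinstance(expr, tuple):
        yield expr
        for arg in expr[1:]:  # expr[0] is the operator name
            yield from subexpressions(arg)

def uses_only_variables(expr):
    """True when every leaf is a variable name (str), i.e. no fitted constants."""
    if isinstance(expr, tuple):
        return all(uses_only_variables(arg) for arg in expr[1:])
    return isinstance(expr, str)

def presence_by_run(runs):
    """Fraction of independent runs in which each candidate combination appears."""
    counts = Counter()
    for models in runs:
        seen = set()  # count each candidate at most once per run
        for model in models:
            seen.update(s for s in subexpressions(model) if uses_only_variables(s))
        counts.update(seen)
    return {candidate: n / len(runs) for candidate, n in counts.items()}

# Three hypothetical independent runs, each returning a few models.
runs = [
    [("+", ("/", "x1", "x2"), 3.0), ("*", 2.0, ("/", "x1", "x2"))],
    [("+", ("*", 0.5, ("sqrt", "x3")), ("/", "x1", "x2"))],
    [("*", ("sqrt", "x3"), 1.5)],
]
presence = presence_by_run(runs)
for candidate, share in sorted(presence.items(), key=lambda kv: -kv[1]):
    print(candidate, round(share, 2))
```

Counting per run rather than per model keeps a single run with many near-duplicate models from dominating the tally. The sketch compares subexpressions structurally, so it does not recognize algebraically equivalent forms such as reordered sums.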


Of course, as illustrated below, we can also specify MetaVariables to be exploited and explored during SymbolicRegression. In this fashion, we can explore the potential of these metavariables as well as bias the model search towards their exploitation (since the supplied variable combinations or transforms do not have to be rediscovered). If we were so inclined, we could even exclude the direct use of any of the DataVariables and use only the MetaVariables in the model search.
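Apart from the DataModeler illustration referenced above, the effect of supplying metavariables can be pictured as adding precomputed columns to the data, so the search does not have to rediscover them. The plain-Python sketch below is an assumption-laden illustration, not the AugmentData implementation; the function `augment_rows`, the column names and the sample values are all hypothetical.

```python
import math

def augment_rows(rows, names, metavariables):
    """Append one computed column per metavariable to every data row.

    rows          -- list of numeric rows, one value per name in `names`
    names         -- column names of the original data variables
    metavariables -- dict mapping a label to a function of a {name: value} record
    """
    new_names = list(names) + list(metavariables)
    new_rows = []
    for row in rows:
        record = dict(zip(names, row))
        new_rows.append(list(row) + [f(record) for f in metavariables.values()])
    return new_names, new_rows

names = ["x1", "x2", "x3"]
rows = [[1.0, 2.0, 4.0], [3.0, 6.0, 9.0]]

# Hypothetical metavariables: a ratio of two inputs and a transform of a third.
metavariables = {
    "x1/x2": lambda r: r["x1"] / r["x2"],
    "sqrt(x3)": lambda r: math.sqrt(r["x3"]),
}

new_names, new_rows = augment_rows(rows, names, metavariables)
print(new_names)
print(new_rows)
```

Restricting a model search to the appended columns only would correspond to the case described above where the original data variables are excluded.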

Symbolic Regression Example

In summary, the support for the discovery and exploitation of MetaVariables is a major enhancement in DataModeler. In addition to the documentation and help examples, you might also like to check out the new case study, Symbolic Regression is Not Enough, which looks at these new capabilities within the context of a modeling workflow and also highlights some of the recently added capabilities around variable combination analysis. (To get to the case studies, open up the DataModeler guide page in Mathematica's help and click on the tutorials link in the introductory paragraph.)

The official release notes for 8.08:

Support for the identification and exploitation of MetaVariables was the main theme of this release. However, we also discovered a "designed as" bug in the Mathematica Compile behavior that warrants a workaround.

  • The default behavior of Compile is to value speed rather than quality. Hence, for example, evaluating Compile[{x},UnitStep[1/x]][0] will return a value of 1 rather than detecting the pathology. When reported to WRI, the official response was that this was proper behavior. Since this is a dangerous behavior given the disparate model forms synthesized during SymbolicRegression, NumericCompile has been modified to use RuntimeOptions -> "Quality". More discussion is in the NumericCompile help.
  • Implemented support for MetaVariables. Now users can specify MetaVariables to SymbolicRegression and those will be used in the model development (the returned models, however, will be expressed in terms of the native DataVariables). Specifying these variable transforms and combinations can accelerate the model discovery as well as guide the structure of the models returned.
  • Added a new case study, Symbolic Regression is Not Enough, based upon our chapter for the 2012 Genetic Programming Theory & Practice Workshop. This paper looks at the issues around the modeling process and highlights the need for context and tools to identify and select key variables and metavariables in the pursuit of deployable models.
  • Implemented a suite of functions to identify, prioritize and extract MetaVariables from developed models. MetaVariables, MetaVariablePresence and MetaVariableTable look at the aggregated model set for metavariables.
  • Also implemented were functions looking at the variability of metavariable discovery. These functions, MetaVariableDistribution, MetaVariableDistributionTable and MetaVariableDistributionChart partition the supplied models into their IndependentEvolutions and can give insight into key transforms if there are many possible variable combinations which lead to quality models.
  • Implemented a MetaVariableModels function which will synthesize GPModels in terms of the DataVariables.
  • Implemented an AugmentData function which will append columns to the supplied dataMatrix based upon the MetaVariables option setting.
  • Implemented support for a SortBy option in UnivariatePlot. This can be either an index into the columns of the data matrix or one or more of the components of the supplied DataVariableLabels. This looks like it will be a very insightful augmentation for some data sets.
  • Implemented a new function, RangeLength, which returns Range[Length[x]]. Although simple, this utility function was requested by users due to the frequency of needing this behavior.
  • Implemented a Tooltip option for ResponsePlot, ResponseSurfacePlot and ResponsePlotExplorer to suppress the display of the reference values on the response curves. Unfortunately, although they are very useful, Mathematica's implementation of tooltips is very fragile and can cause the continuous reformatting of notebooks.
  • Fixed a bug in LabelString (and, by extension, LabelForm) where the NumberFormatting option was not being applied to real values within expressions.
  • Generalized CreateDataVariableNames to handle formatted inputs. It could already handle lists of formatted inputs, so the documentation was also updated to reflect that capability.
  • Extended SymbolicRegression and SelectModels to handle combinations of the output of DriverVariables and DriverVariableCombinations as inputs to the ModelingVariables, AllowedVariables and RequiredVariables options. This form will also be valid for any of the many functions that implicitly use SelectModels.
  • Fixed a bug in SymbolicRegression wherein modeling would fail if non-numeric data was supplied and Rescale was enabled.
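
The hazard behind the Compile item in the notes above can be pictured with a toy Python analogy. It makes no claim about how Compile or NumericCompile work internally; it only contrasts an unchecked evaluation, where a division by zero flows onward as infinity and the step function returns 1, with a checked evaluation that flags the non-finite intermediate value. All function names here are hypothetical.

```python
import math

def unit_step(v):
    """Step function: 1 for non-negative input, 0 otherwise."""
    return 1.0 if v >= 0 else 0.0

def unchecked_divide(a, b):
    """Division that never raises: a/0 silently becomes a signed infinity.

    The sign of a zero denominator is ignored in this toy version.
    """
    if b == 0:
        return math.nan if a == 0 else math.copysign(math.inf, a)
    return a / b

def model_fast(x):
    """Speed-oriented evaluation of UnitStep(1/x): no check on intermediates."""
    return unit_step(unchecked_divide(1.0, x))

def model_checked(x):
    """Quality-oriented evaluation: report NaN when 1/x is not finite."""
    inner = unchecked_divide(1.0, x)
    if not math.isfinite(inner):
        return math.nan  # flag the pathology instead of returning a plausible number
    return unit_step(inner)

print(model_fast(0.0))     # a perfectly ordinary-looking value
print(model_checked(0.0))  # flagged as not-a-number
print(model_checked(2.0))  # well-defined inputs are unaffected
```

With many automatically synthesized model forms, a plausible-looking number at a singular point is harder to notice than an explicit flag, which is the motivation the notes give for preferring the quality setting.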