Release news and events

DataModeler Release 8.08 (16 May)

Wednesday, May 16, 2012

We are proud to release DataModeler 8.08 (16 May 2012)! Apart from a workaround for a "designed-as bug/feature" in Compile, the theme of this release is a major new capability: metavariable identification and exploitation. A metavariable is simply a combination of variables, or a transform of a variable, that is useful in the developed models. Eight new functions support this capability:

New Functions

We can exploit the diversity of model forms developed during SymbolicRegression by running many IndependentEvolutions and looking for those MetaVariables which are prevalent in quality models. This can provide insight into underlying mechanisms and alternative paths to quality models, which is especially useful when we have highly correlated or coupled inputs and multiple paths to producing models of comparable accuracy and conciseness.


Of course, as illustrated below, we can also specify MetaVariables to be exploited and explored during SymbolicRegression. In this fashion, we can explore the potential of these metavariables as well as bias the model search towards their exploitation (since the supplied variable combinations or transforms do not have to be rediscovered). If we were so inclined, we could even exclude the direct use of any of the DataVariables and use only the MetaVariables in the model search.

Symbolic Regression Example
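
Independently of DataModeler's own API, the idea of a metavariable can be sketched with built-in Mathematica functions only. In the minimal sketch below, the data matrix, the two transforms and the names data, metas and augmented are all hypothetical; it shows the concept of appending metavariable columns to a data matrix and is not the implementation of AugmentData or of the MetaVariables option.

```wolfram
(* Hypothetical data matrix: each row is {x1, x2, x3} *)
data = {{1., 2., 3.}, {4., 5., 6.}, {7., 8., 9.}};

(* Hypothetical metavariables, written as pure functions of one data row:
   a combination of variables (x1/x2) and a transform of a variable (Log[x3]) *)
metas = {#[[1]]/#[[2]] &, Log[#[[3]]] &};

(* Evaluate every metavariable on every row and append the values as new columns *)
augmented = Map[Join[#, Through[metas[#]]] &, data];

Dimensions[augmented]   (* three rows, now with five columns each *)
```

Supplying such columns up front means a model search does not have to rediscover the combination or transform, which is the motivation given above.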

In summary, the support for the discovery and exploitation of MetaVariables is a major enhancement in DataModeler. In addition to the documentation and help examples, you might also like to check out the new case study, Symbolic Regression is Not Enough, which looks at these new capabilities within the context of a modeling workflow and also highlights some of the recently added capabilities around variable combination analysis. (To get to the case studies, open the DataModeler guide page in Mathematica's help and click on the tutorials link in the introductory paragraph.)

The official release notes for 8.08:

Support for the identification and exploitation of MetaVariables was the main theme of this release. However, we also discovered a "designed as" bug in the Mathematica Compile behavior that warrants a workaround.

  • The default behavior of Compile is to value speed rather than quality. Hence, for example, evaluating Compile[{x},UnitStep[1/x]][0] will return a value of 1 rather than detecting the pathology. When reported to WRI, the official response was that this was proper behavior. Since this is a dangerous behavior given the disparate model forms synthesized during SymbolicRegression, NumericCompile has been modified to use RuntimeOptions -> "Quality". More discussion is in the NumericCompile help.
  • Implemented support for MetaVariables. Now users can specify MetaVariables to SymbolicRegression and those will be used in the model development (the returned models, however, will be expressed in terms of the native DataVariables). Specifying these variable transforms and combinations can accelerate the model discovery as well as guide the structure of the models returned.
  • Added a new case study, Symbolic Regression is Not Enough, based upon our chapter for the 2012 Genetic Programming Theory & Practice Workshop. This paper looks at the issues around the modeling process and highlights the need for context and tools to identify and select key variables and metavariables in the pursuit of deployable models.
  • Implemented a suite of functions to identify, prioritize and extract MetaVariables from developed models. MetaVariables, MetaVariablePresence and MetaVariableTable look at the aggregated model set for metavariables.
  • Implemented functions that look at the variability of metavariable discovery. These functions (MetaVariableDistribution, MetaVariableDistributionTable and MetaVariableDistributionChart) partition the supplied models into their IndependentEvolutions and can give insight into key transforms if there are many possible variable combinations which lead to quality models.
  • Implemented a MetaVariableModels function which will synthesize GPModels in terms of the DataVariables.
  • Implemented an AugmentData function which will append columns to the supplied dataMatrix based upon the MetaVariables option setting.
  • Implemented support for a SortBy option in UnivariatePlot. This can be either an index into the columns of the data matrix or one or more of the components of the supplied DataVariableLabels. This looks like it will be a very insightful augmentation for some data sets.
  • Implemented a new function, RangeLength, which returns Range[Length[x]]. Although simple, this utility function was requested by users due to the frequency of needing this behavior.
  • Implemented a Tooltip option for ResponsePlot, ResponseSurfacePlot and ResponsePlotExplorer to suppress the display of the reference values on the response curves. Unfortunately, although they are very useful, Mathematica's implementation of tooltips is very fragile and can cause the continuous reformatting of notebooks.
  • Fixed a bug in LabelString (and, by extension, LabelForm) where the NumberFormatting option was not being applied to real values within expressions.
  • Generalized CreateDataVariableNames to handle formatted inputs. It could already handle lists of formatted inputs, so the documentation was also updated to reflect that capability.
  • Extended SymbolicRegression and SelectModels to handle combinations of the output of DriverVariables and DriverVariableCombinations as inputs to the ModelingVariables, AllowedVariables and RequiredVariables options. This form will also be valid for any of the many functions that implicitly use SelectModels.
  • Fixed a bug in SymbolicRegression wherein modeling would fail if non-numeric data was supplied and Rescale was enabled.
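
The Compile item above can be explored with the built-in function alone. The sketch below does not show NumericCompile's internals; it simply builds the same expression twice, once with the default settings and once with RuntimeOptions -> "Quality", so the two behaviors at the singular point can be compared. The names fast and careful are hypothetical.

```wolfram
(* Default settings: Compile favors speed over numerical checking *)
fast = Compile[{x}, UnitStep[1/x]];

(* Same expression, compiled with the "Quality" runtime settings *)
careful = Compile[{x}, UnitStep[1/x], RuntimeOptions -> "Quality"];

(* Evaluate both at the singular point x = 0 and compare *)
fast[0]      (* per the release note above, this returns 1 *)
careful[0]   (* the division by zero is no longer silently ignored *)
```

Away from x = 0 the two compiled functions should agree, so the trade-off is between raw speed and detection of pathological inputs.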


DataModeler Release 8.06 (26 January)

Thursday, January 26, 2012

The theme of this release is a significantly enhanced DataOutlierTable. The DataOutlierTable function now allows DataRecordLabels to be displayed and includes other changes that improve the information display. We also changed the Input option to display the input variables to VariablesToPlot and added some flexibility and clarity to the input data display.

In addition, we attached a Tooltip to the titles of the VariableCombinationMap, VariableCombinationChart, and VariableCombinationTable showing the cumulative percentage of the total number of distinct combinations in the model set. Since there is a combinatorial explosion of possibilities when many input variables are being considered, this provides some context given that many variable combinations may not satisfy the SignificanceLevel threshold for display in the graphic.

Several bugs were fixed and a few other changes were made:

  • Fixed a bug in BivariatePlot wherein the warning messages issued when non-numeric data were supplied reported an incorrect number of affected columns.
  • Fixed a bug in UnivariatePlot wherein if (the default) GraphicsArrayColumns -> Automatic was being used, the number of data columns supplied rather than the number of numeric data columns would be used in calculating the layout.
  • Fixed a bug in ModelSelectionReport and ModelSelectionTable wherein supplying certain colors as the ColorFunction would cause the details of the Style construct to be displayed.
  • Fixed a bug in LabelString (and, by extension, LabelForm) wherein symbols and tooltips were not being handled properly if a list was supplied and the Joined -> True option was enabled. The new behavior is for the tooltip content to be stripped since a Tooltip cannot be converted into a form acceptable to StringJoin.
  • Increased the default NumericColumnThreshold option setting for SymbolicRegression to 0.75 (from 0.7). This means that any supplied data column must be at least 75% numeric to be included in the model development.
  • Extended VariablePresenceChart and VariablePresenceDistributionChart to accommodate the output of DriverVariableCombinations being supplied as the input to the VariablesToPlot option.
  • Modified MakeDataNumeric to allow a ReplacementFunction -> None setting. This just returns the originally supplied data structure. Additionally, we can specify None as part of a list; in this case the corresponding column would be returned unmodified.
  • Fixed a parsing bug in MergeInputResponseData wherein elements (non-lists) which were constructs (e.g., Π ) were not being recognized as being “atomic”.
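
The 75% rule in the NumericColumnThreshold item above can be illustrated with built-in functions. This is a minimal sketch of the arithmetic only, under the assumption that "numeric" means NumericQ returns True; the data and the names numericFraction, fractions and keep are hypothetical, and this is not how SymbolicRegression itself is implemented.

```wolfram
(* Hypothetical data matrix with some non-numeric and missing entries *)
data = {{1.2, "n/a", 3}, {2.5, "n/a", Missing[]}, {3.1, 7, 4}, {4.8, "n/a", 5}};

(* Fraction of the entries in one column that are numeric *)
numericFraction[col_List] := Count[col, _?NumericQ]/Length[col];

(* One fraction per column *)
fractions = numericFraction /@ Transpose[data];   (* {1, 1/4, 3/4} *)

(* Indices of the columns that are at least 75% numeric *)
keep = Select[Range[Length[fractions]], fractions[[#]] >= 0.75 &];   (* {1, 3} *)

(* Data restricted to the qualifying columns *)
data[[All, keep]]
```

The third column is exactly 75% numeric and is kept, matching the "at least 75%" wording; under the same comparison with the earlier 0.7 setting it would also have been kept.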


DataModeler Release 8.05 (7 December)

Thursday, December 8, 2011

The theme for this release is improved (and beautiful) model analysis. We have another suite of functions in development targeted at VariableContributionAnalysis; however, rather than hold things up while that gets lined out, we decided to get this out the door since the changes since the multi-core release are fairly extensive as well as practically useful. Since the variable contribution analysis tools are in the pipeline, the QuickStart, case studies and function examples have not yet been refreshed to illustrate these new tools. That will be part of the next release.

The release notes are below. Other than working around a Mathematica bug that makes the normal way of suppressing warning messages from SymbolicRegression via Quiet impossible, we have some important new functionality which is pretty slick:

  • ModelDimensionalityTable: This lets us look at the number of variables required for models. This is especially nice when we have coupled inputs so that different variable combinations can achieve quality models. This function also makes extensive use of tooltips to maximize the information content and accessibility.
  • VariablePresenceChart: This provides a quick visual of the relative presence of model variables. It is related to the VariablePresenceMap as well as the VariableCombinationChart. In addition to also using tooltips to provide additional content, it is a smart function in that, for example, changing the BarOrigin will make intelligent adaptations in option settings.
  • CorrelationChart: This provides a 1-D slice of the CorrelationMatrixPlot and is useful for getting a quick overview of the linear relationships between inputs and targeted response. Of course, this function also features tooltips as well as intelligent adaptation of option settings.
  • VariablePresenceDistributionChart: During modeling we want to run multiple IndependentEvolutions (which are facilitated by the multi-core support of current generation CPUs). Since each model search follows a different path, it is useful to look at the variability between these different searches since there can be multiple paths to achieving high-quality models. Towards that end, this release features model tagging and functions to allow separating the IndependentEvolutions. In this function we can see some of the variable substitutions which are possible and identify inputs which may merit further investigation despite not rising to the forefront within an aggregated analysis. Consistent with the theme of this release, these functions have intelligent option setting adaptation as well as tooltips to maximize the information transfer.

Of course, all of the new functions can handle data with missing or non-numeric elements and have lots of flexibility in terms of their usage and inputs.
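
The description of CorrelationChart as a 1-D slice of the correlation matrix can be sketched with built-in functions. The data below are hypothetical, with the response assumed to be in the last column, and the plot is an ordinary BarChart rather than the CorrelationChart function itself.

```wolfram
(* Hypothetical data matrix: columns are {x1, x2, response} *)
data = {{1., 9., 2.1}, {2., 7., 3.9}, {3., 8., 6.2}, {4., 3., 8.1}, {5., 4., 9.8}};

(* Full correlation matrix between all columns *)
corr = Correlation[data];

(* 1-D slice: the response row, with the response's self-correlation dropped *)
slice = Most[Last[corr]];

(* Linear relationship of each input with the response *)
BarChart[slice, ChartLabels -> {"x1", "x2"}]
```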


DataModeler Release 8.02 (25 October)

Wednesday, October 26, 2011

Parallel computing support is the big feature of this release. If you have a multi-core processor, DataModeler automatically runs parallel IndependentEvolutions up to the limit imposed by either the number of cores available or the license restriction on the number of subkernels (typically, four) which can be associated with a given master kernel. Of course, if you have a quad-core i7 processor, you can launch two master kernels and really make the fan on your machine spin.

As detailed in the release notes below, the big change is the support for MultiCore (which is the option name to turn the feature on or off); however, we did implement a few minor tweaks in the process.

  • Implemented support for parallel execution of IndependentEvolutions during SymbolicRegression. This coarse-grain MultiCore parallelization exploits the capabilities of multi-core processors up to the $KernelCount limit imposed by the hardware or the Mathematica license.
  • Implemented a KernelID function which can be used inside the SymbolicRegression monitoring functions (e.g., GenerationMonitor) to identify which of the subkernels generated a result. Symbols updated within the monitoring functions will be shared by the various subkernels so there was a need for a tagging mechanism so that evolution trajectories could be deconvolved.
  • Extended UpdateModelPersonality to support clearing model attributes so that default behaviors will come through. Thus, if we had a model set which had previously had a PlotLabel defined, supplying PlotLabel -> Clear would take that aspect out of the ModelPersonality.
  • Modified EvaluationNotebookDirectory to restore the desired behavior of always returning a valid filepath even if the evaluating notebook had not previously been saved. Otherwise, it works the same as the built-in NotebookDirectory function.
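
The coarse-grain parallelization and result tagging described above can be sketched with Mathematica's built-in parallel tools. In the sketch below, toyRun is a hypothetical stand-in for one independent search, and the built-in $KernelID plays the tagging role that the notes assign to DataModeler's KernelID function; none of this is DataModeler's own implementation.

```wolfram
(* Hypothetical stand-in for one independent run: best of 1000 seeded random draws *)
toyRun[seed_Integer] := (SeedRandom[seed]; Max[RandomReal[1, 1000]]);

(* Make the definition available on the subkernels *)
DistributeDefinitions[toyRun];

(* One run per seed; subkernels are launched automatically if none are running.
   Each result is tagged as {seed, id of the subkernel that produced it, best value} *)
results = ParallelTable[{seed, $KernelID, toyRun[seed]}, {seed, 1, 8}];

(* Number of subkernels in use, bounded by the hardware and the license *)
$KernelCount

(* Separate the results by the subkernel that generated them *)
GatherBy[results, #[[2]] &]
```

Because each run is seeded explicitly, a given seed yields the same best value regardless of which subkernel happens to execute it; only the tag changes.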

DataModeler goes Multi-Core!

Wednesday, October 26, 2011

We are happy to announce that Evolved Analytics' DataModeler now fully supports multi-core computing. This increases the robustness of computations and saves time. So good to see all the cores running!