News archive for April 2011

DataModeler Release 24.0 (28 April 2011)

Thursday, April 28, 2011

At this point we should be feature-complete for the official release. The function documentation has been checked and brought up to date; the last step will be a sweep through the tutorials & case studies.

There is not much in the way of changes in this release; however, there is one major change: the BoxRegion option used to select models has been renamed QualityBox, and the BoxRegion toggle used to constrain the extremal search in, for example, ModelExtrema has been renamed BoxBoundary.

The other highlight is a couple of new functions, DriverVariables and RearrangeModelQuality, which should prove convenient for developing a ModelEnsemble as well as for monitoring the progress of the evolutionary search.

These and the other changes are in the release notes extract below:

  • (major change) Split the current BoxRegion option into two new option names: QualityBox and BoxBoundary. Although this is a fairly major change, it avoids future problems since BoxRegion, even though it is undocumented, is a Mathematica system variable and, therefore, out of DataModeler's control. QualityBox is now used by SelectModels (and all functions which build upon it) to define the region of ModelQuality from which models should be selected. BoxBoundary is an option for ModelMaximum, ModelMinimum and ModelExtrema which defines whether the extremal search should be restricted to the DataVariableRange. This will require changing old analyses, but the time to change is now, so apologies for the inconvenience and the need to retrain the muscle memory.
  • Implemented a new function, DriverVariables, to make it easier to select interesting models for inclusion in ensembles or to specify ModelingVariables for secondary rounds of SymbolicRegression. Although the equivalent could be accomplished using appropriate option settings for VariablePresence, the new form is much cleaner and more straightforward because of its restricted scope.
  • Added a new function, RearrangeModelQuality, which will restructure the ModelQuality as well as make the appropriate adjustments in the ModelPersonality for the ModelingObjective and ModelingObjectiveNames. This function is useful when looking at monitoring results from a SymbolicRegression since it can suppress the SecondaryModelingObjective (typically, ModelAge) which is used during the modeling evolution.
  • Made RobustCorrelationMatrix a little more general.
  • Deleted the $ClassicGPExplore, $ClassicGPIntensive, $ClassicGPQuick, $ParetoGPExplore, $ParetoGPIntensive, $ParetoGPQuick, $OrdinalGPExplore, $OrdinalGPIntensive and $OrdinalGPQuick pre-defined option sets for SymbolicRegression since the need for explicitly specifying the number of targeted generations has been obviated by the current default option settings and the continual innovation offered. For most regression problems, users now really only need to specify the TimeConstraint and whether StoreModelSet should be activated (and, possibly, tweak the FunctionPatterns to adjust the model search to the problem domain). However, the $ConventionalGP option set has been maintained in case the user would like to slow things down by a couple of orders of magnitude.
  • Extended UpdateModelQuality, EvaluateModelQuality, UpdateModelQualityVsMultipleDataSets and EvaluateModelQualityVsMultipleDataSets to accommodate data matrices where the targeted response is embedded within the supplied data set.
  • Modified KeijzerExpansion to accept data matrices with an embedded target response. Also, supplied options are now embedded in the developed models.
  • Modified CreateModelEnsemble to only require UniqueModels rather than UniqueAndFitModels if building a ModelEnsemble from only a supplied model set (i.e., without a data input-output set to guide the ensemble selection).
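
For readers sorting out the two ideas that BoxRegion used to cover, here is a minimal conceptual sketch in Python. It is not DataModeler code and does not reproduce any DataModeler signature: the Model type, select_in_quality_box and bounded_minimum are hypothetical names, and the sketch assumes that model quality is a (complexity, error) pair with smaller values being better. The first function mimics selecting models inside a quality box; the second mimics an extremal search restricted to a variable range.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Model(NamedTuple):
    """Hypothetical stand-in for a model: a name plus (complexity, error)."""
    name: str
    quality: Tuple[float, float]


def select_in_quality_box(models: Sequence[Model],
                          box: Tuple[float, float]) -> List[Model]:
    """Keep models whose every quality component is within the box limits.

    Assumption: each box entry is an upper limit on the matching
    quality component (smaller is better).
    """
    return [m for m in models
            if all(q <= limit for q, limit in zip(m.quality, box))]


def bounded_minimum(f: Callable[[float], float],
                    lo: float, hi: float,
                    steps: int = 1000) -> Tuple[float, float]:
    """Grid-search the minimum of f with the search restricted to [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    best = min(xs, key=f)
    return best, f(best)


# Illustrative, made-up models: (complexity, error).
models = [
    Model("m1", (20, 0.30)),
    Model("m2", (80, 0.05)),
    Model("m3", (300, 0.01)),
]

# "Quality box" idea: accept complexity <= 100 and error <= 0.10.
selected = select_in_quality_box(models, box=(100, 0.10))

# "Box boundary" idea: the unconstrained minimum of (x - 5)^2 is at x = 5,
# but the search is confined to the observed variable range [0, 2].
x_min, y_min = bounded_minimum(lambda x: (x - 5.0) ** 2, lo=0.0, hi=2.0)

print([m.name for m in selected])
print(x_min, y_min)
```

With these made-up inputs only m2 falls inside the box, and the restricted search stops at the edge of the range rather than at the unconstrained optimum. Giving the two ideas separate names, as the release does, keeps them from being confused.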