News archive for April 2010

DataModeler Release 19.0

Tuesday, April 27, 2010

Documentation completion remains the priority of this 27 April 2010 release; however, a number of changes and enhancements were made in the process:

  • More tweaking of SymbolicRegression options. With the new option defaults, a run continues for 50,000 generations unless interrupted by a TimeConstraint, and features continuous innovation over that span.
  • Modified NicheModels by renaming the Split option to NicheBy and introducing a new option, NicheSortBy, which defines the criterion by which to sort the models in the returned niches.
  • Tweaked the performance of SelectModels. Unless variable constraints are being imposed, this should result in a substantial speedup.
  • Added a new PropagationOperator, NichedCrossover, which may be used during SymbolicRegression. This operator partitions and organizes the supplied model set according to the NicheBy and NicheSortBy options for NicheModels and then applies Crossover to the niched model sets.
  • Modified the $ConventionalGP option set for SymbolicRegression to include ModelComplexity as a ModelingObjective. Because a single-objective SelectionStrategy is used, this does not affect the model search; however, it is convenient for comparison with multi-objective strategies.
  • Renamed the SignificanceLevel option for UncorrelatedModels, UncorrelatedVariables and CorrelationMatrixPlot to be the clearer CorrelationThreshold.
  • Renamed the DataSegments option for UncorrelatedVariables back to DataSubsetSize since it was erroneously changed in a previous option renaming exercise. (UncorrelatedModels was never modified.)
  • Added a CreateDataVariableNames function which will convert supplied strings into forms that may be safely used for the SymbolicRegression DataVariables option. If a list is provided, all of the returned symbol strings are guaranteed to be unique.
  • Added a new option, EnsemblePlotStyle, for EnsemblePredictionPlot and EnsembleResidualPlot to specify how the ensemble predictions should be displayed. Previously, the predictions could get visually lost if there were lots of evaluation points to be displayed.
  • Fixed a bug in UncorrelatedVariables wherein supplying a matrix of constant columns generated errors. The new behavior is to return an empty list.
  • Modified UncorrelatedModels so that if a perfect model (zero error residual) is supplied, it will automatically be included in the returned model set. This is a bit of an ad hoc behavior since the correlation of a constant is undefined. However, since UncorrelatedModels is used by the default EnsembleStrategy -> Automatic behavior of CreateModelEnsemble, the previous behavior of deleting perfection seemed inappropriate. Since perfection typically appears only for toy problems, this should not change the behavior for most real-world modeling.
  • Modified MedianAverage to accommodate Indeterminate points in the supplied vector. If the number of Indeterminate values is not too large (i.e. there is no risk of them intruding into the calculated result), a numeric value will be returned. Otherwise, Indeterminate will still be returned. This should be useful in ModelEnsemble visualization.
  • Modified BoundedModelResponseQ to accommodate the RangeExpansion option along with the DataVariableRange. This makes it consistent with RobustModels in its behavior.
  • Fixed a bug in VariablePresence wherein the "PresencePercent" option setting for PresenceMetric was not handled properly if a single stand-alone model or ensemble was provided.
  • Fixed a bug in VariablePresenceTable wherein only generic string DataVariableLabels would be recognized. Formatted lists (e.g. the result from LabelForm) may now be supplied.
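
To illustrate the kind of conversion described for CreateDataVariableNames, here is a minimal Python sketch. It is not DataModeler's implementation. The function name make_variable_names, the cleaning rules (drop every character that is not an ASCII letter or digit, prefix names that do not start with a letter) and the numeric de-duplication suffix are all assumptions made for this example; the release note only states that the returned names are safe to use as DataVariables and are unique when a list is supplied.

```python
import re


def make_variable_names(labels, prefix="x"):
    """Hypothetical helper: turn arbitrary strings into unique, symbol-safe names."""
    seen = set()
    names = []
    for label in labels:
        # Assumption: a "safe" name contains only ASCII letters and digits.
        name = re.sub(r"[^A-Za-z0-9]", "", label)
        # Assumption: a name must start with a letter, so prefix it otherwise.
        if not name or not name[0].isalpha():
            name = prefix + name
        # Append a counter until the name is unique within this list.
        candidate, counter = name, 1
        while candidate in seen:
            counter += 1
            candidate = f"{name}{counter}"
        seen.add(candidate)
        names.append(candidate)
    return names


if __name__ == "__main__":
    print(make_variable_names(["flow rate", "flow-rate", "2nd temp", ""]))
```

In this sketch, two labels that collapse to the same cleaned name are kept apart by the counter suffix, which is one simple way to obtain the uniqueness guarantee mentioned above.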
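
The remark that the correlation of a constant is undefined, which underlies both the UncorrelatedVariables fix and the UncorrelatedModels change, can be checked with a small Python sketch. It is not DataModeler code: pearson is a hypothetical helper, and returning None for the undefined case is a choice made for this example only. The Pearson coefficient divides by the product of the two standard deviations, so a constant vector (such as the all-zero residual of a perfect model, or a constant data column) makes the denominator zero.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences, or None if undefined."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    var_x = sum(d * d for d in dx)
    var_y = sum(d * d for d in dy)
    # A constant vector has zero variance, so the ratio below would be 0/0.
    if var_x == 0 or var_y == 0:
        return None
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    print(pearson([1, 2, 3], [2, 4, 6]))  # perfectly correlated columns
    print(pearson([0, 0, 0], [1, 2, 3]))  # constant column: undefined
```

Any routine that filters by correlation has to decide what to do in the undefined case; the notes above describe the decisions taken for UncorrelatedVariables (return an empty list) and UncorrelatedModels (keep the perfect model).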