News archive for December 2009

DataModeler Release 16.0

Tuesday, December 22, 2009

Documentation completion continues to be the priority theme in this release of 22 December 2009; however, a number of changes and enhancements were made in the process:

  • Modified the default interpretation of SymbolicRegression building blocks (DataVariables, FunctionPattern or TerminalSet) when a list without an associated class weight is supplied. Previously, each list element was assumed to have an element weight of one for the roulette wheel assembly of RandomModels. However, it is convenient and attractive to simply supply a list of labels for DataVariables, which results in directly interpretable models without the need for DataVariableLabels. Unfortunately, for reasonably multivariate data sets this would result in simplistic models being synthesized, since the class of DataVariables would be heavily overweighted. Now, if a list of variables is supplied, the supplied components are normalized so that the set has a class weight of one, which is more likely the desired behavior.
  • Modified the default Options settings for SymbolicRegression so that a single thread is used for RunsPerCascade. The previous {3, 1, 1} default would have three runs (nominally, of ten generations) execute in parallel for the first cascade, then merge these results and continue with a single model search thread. This strategy helps to kick-start the model search. The problem is that, for very short SymbolicRegression runs, the search spends time laying a foundation which is not exploited. To compensate, the PopulationSize was changed from a flat 300 to {1000, 500, 300} so the first two generations feature larger population sizes to maximize the influx of high-quality genetics.
  • Extended GridTable to include a TableDirections option. This allows the supplied data to be transposed, even if the data is ragged. (Hence, in this case, GridTable has more functionality than TableForm.)
  • Modified SummaryStatistics to return an appropriately sized (according to the supplied SelectionFunction) vector or matrix if an empty list or matrix is supplied. This is a bug fix since, previously, an arcane error message would be returned.
  • Removed an implicit requirement of SelectModels (which would ripple into NicheModels) that the supplied models have the ModelFitness evaluated.
  • Fixed a bug in RescaleData wherein symbolic numerics (e.g., Pi) would not be recognized as valid rescale ranges.
  • Fixed a bug in ModelExtrema, ModelMinimum and ModelMaximum where duplicate extrema would sometimes not be deleted if the option Unique -> True was set.
  • Since we generally want to use the results from RetrieveModelSets as a group rather than as individual file results, a MergeModelSets option was enabled (default associated with StoreModelSet for consistency) with the new default behavior being True. This avoids the need to post-process the retrieved results with explicit application of the MergeModelSets function. Setting this option to False will restore the previous behavior.
  • Added an Input option to DataOutlierTable to allow suppression of the input data record display. This is useful in situations where many input variables are in the source data and simply knowing the index of the offending data record and its degree of strangeness is sufficient.
  • Fixed a bug in ModelInputOutputMatrix wherein, if a model had only a single variable and the input was supplied as a vector, the supplied form was not recognized as being valid. Now the function checks the alignment of the evalPts dimensionality with the DataVariables embedded in the model(s) to perform an appropriate interpretation.
  • Added a ModelVariables function which is equivalent to ModelSubspace. The new name is more appropriate for the typical user to describe the functionality.
  • Corrected a number of bugs in ModelNonlinearity and modified the implementation so that the returned value is normalized by the range of the sum of all variables to provide a constant reference for model comparison.
  • Modified ModelPredictionPlot and EnsemblePredictionPlot so that, if a list of models is displayed, any supplied PlotLabel will apply only to the graphics grid rather than to the individual models. Also, any AspectRatio settings will now apply only to the individual plots.
  • Modified RemoveModelScaling so that it can handle a ModelEnsemble being supplied. If an ensemble is supplied, it is simply returned unchanged.
  • Implemented a ConvertToFittedModel function which returns a FittedModel data structure. This can be used directly by the built-in Mathematica statistics functions introduced in Mathematica 7.
  • Deleted the ModelRegressionReport function since the foundation Statistics`LinearRegression package had been superseded as part of the changes in version 7 and the basic functionality can be achieved by using the result from ConvertToFittedModel.
  • Fixed a bug in ModelResidualPlot wherein options embedded in the ModelPersonality were not being used if only a single model was being plotted.
  • Fixed a bug in ModelSelectionReport and ModelSelectionTable wherein DataVariableLabels with an embedded FontColor would cause errors. Now both Hue and RGBColor formatting can be handled.
  • Extended ModelTreePlot to allow Automatic as well as a pure function to be supplied for PlotLabel. Also implemented support for a ToolTipFunction for the individual tree plots.
  • Fixed a bug in MutateSubtree and DepthPreservingSubtreeMutation wherein ModelInputVariables embedded in the supplied models would not be used in the new genetics creation. This did not affect SymbolicRegression; however, it would affect standalone use of the functions.
  • Modified OptimizeModel and OptimizeModelExpression to add another valid form for the OptimizeIntegers option. Now All will bring powers and square roots into play, True (the default behavior) will handle integers which are converted to reals via N, and False will leave integers alone and focus on the embedded reals. Mathematica has problems with FindFit, so the optimizations are checked and, if the model is pathological, a warning message is generated and the original model is returned rather than the pathological one produced by Mathematica's optimization.
  • Discovered that OrderedQ has a strange behavior if supplied numerics which are not reals, integers or rationals. This would cause problems when, for example, something like Sqrt[2] was supplied to ParetoFront: rather than being sorted between 1.4 and 3/2, it would be placed after all of the reals, integers and rationals, which causes a bit of a problem if there is an implicit assumption that the numbers are sorted by Sort. Although this problem has been fixed for ParetoFront and related functions and was not a problem for the data typically provided to a SymbolicRegression, it is a potential issue across all Mathematica algorithms, including, of course, those in DataModeler.
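The ordering issue in the OrderedQ item can be illustrated with a small Python sketch. This is a toy model only, not Mathematica's ordering algorithm and not DataModeler code: the Symbolic class and both key functions are hypothetical names introduced here. It contrasts an ordering that places symbolic values after all plain numbers with an ordering by numeric value, which is the kind of ordering a Pareto front computation would expect.

```python
import math
from fractions import Fraction


class Symbolic:
    """Hypothetical stand-in for an exact, non-rational value such as Sqrt[2]."""

    def __init__(self, label, value):
        self.label = label
        self.value = value

    def __float__(self):
        return self.value

    def __repr__(self):
        return self.label


def structural_key(x):
    # Toy model of the problem: plain numbers sort by value,
    # while symbolic values are placed after all of them.
    return (isinstance(x, Symbolic), float(x))


def numeric_key(x):
    # Sort strictly by numeric value, whatever the representation.
    return float(x)


values = [Fraction(3, 2), Symbolic("Sqrt[2]", math.sqrt(2)), 1.4]

structural = sorted(values, key=structural_key)
numeric = sorted(values, key=numeric_key)

print(structural)  # Sqrt[2] comes last
print(numeric)     # Sqrt[2] sits between 1.4 and 3/2
```

Any code that assumes its input is in numeric order would be misled by the first result, which is why sorting on an explicit numeric key is the safer choice in this sketch.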
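The change to class weights for bare lists of SymbolicRegression building blocks can be sketched numerically. The Python below is an illustration under stated assumptions, not DataModeler's implementation: it assumes roulette wheel selection picks a class with probability proportional to its total weight, that the other two classes each carry a class weight of one, and that twenty variable labels are supplied. The function names are hypothetical.

```python
from fractions import Fraction


def bare_list_class_weight(elements, normalize):
    """Total weight that a bare list contributes to the roulette wheel."""
    # Old behavior: every element weighs one.
    # New behavior: the elements share a class weight of one.
    element_weight = Fraction(1, len(elements)) if normalize else Fraction(1)
    return element_weight * len(elements)


def class_shares(class_weights):
    """Selection probability of each class, proportional to its weight."""
    total = sum(class_weights.values())
    return {name: weight / total for name, weight in class_weights.items()}


# Hypothetical data set with twenty variables supplied as a bare list of labels.
variables = [f"x{i}" for i in range(1, 21)]


def shares_for(normalize):
    return class_shares({
        "DataVariables": bare_list_class_weight(variables, normalize),
        "FunctionPattern": Fraction(1),  # assumed class weight of one
        "TerminalSet": Fraction(1),      # assumed class weight of one
    })


old_shares = shares_for(normalize=False)
new_shares = shares_for(normalize=True)

print(old_shares["DataVariables"])  # 10/11
print(new_shares["DataVariables"])  # 1/3
```

Under these assumptions, the unnormalized list lets variables dominate the wheel, which matches the overweighting described in the first item, while normalization gives each class an equal share.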