The release notes (found in the preface) are below. From a visual perspective, the big change is that the default setting of DataVariableLabels is now ColorizeList rather than Automatic. The effect is to color-code each of the model variables with a unique and consistent color. This seems to help in visualizing expressions; however, if you do not like the colors, the Automatic setting will use the embedded DataVariables set in black. Please give feedback as to whether you like or dislike this new default.
The other big change is a suite of functions for variable combination analysis (VariableCombinations, VariableCombinationTable, VariableCombinationMap, VariableCombinationChart and DriverVariableCombinations). The help files have some good examples — this one is huge in terms of ease-of-use. Speaking of help files, the quick start is worth a gander.
We also bulked out the support for non-numeric data. DataModeler should be able to "just work" even if you throw ugly data with missing elements and non-numeric columns and that ilk at it.
The more intelligent Automatic default for SelectionStrategy is also pretty nice. The default assumption here is that if you specify a QualityBox, you want the 50% of models closest to the ParetoFront; otherwise, you want to keep the entire set of supplied models.
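The decision rule described above can be pictured with a small, language-neutral sketch. This is Python written purely for illustration, not DataModeler code: the function name `resolve_selection_strategy` and the tuple used as a quality box are hypothetical placeholders, and only the option values (Automatic, All, AllModels, ParetoFrontSelect) come from the release notes.

```python
# Hypothetical sketch of how an Automatic SelectionStrategy might resolve.
# The strings mirror DataModeler option values; nothing here calls DataModeler.

def resolve_selection_strategy(selection_strategy="Automatic", quality_box="All"):
    """Return the strategy name that the given settings would resolve to."""
    if selection_strategy != "Automatic":
        # An explicitly requested strategy is used as given.
        return selection_strategy
    if quality_box == "All":
        # No quality box was specified: keep the entire supplied model set.
        return "AllModels"
    # A quality box was specified: focus on models near the Pareto front.
    return "ParetoFrontSelect"


# The quality box value below is an arbitrary placeholder.
print(resolve_selection_strategy())                        # AllModels
print(resolve_selection_strategy(quality_box=(200, 0.1)))  # ParetoFrontSelect
```

Because the no-quality-box case resolves to keeping everything, applying the selection repeatedly leaves the model set unchanged, which is the ease-of-use gain the new default is after.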
As always, thank you for your questions, bug reports and suggestions.
We have now completed the documentation through the quick start tutorial along with all of the function ref pages. Along the way, new capabilities were implemented, implementations refined and some bugs squashed. The new default behavior for SelectModels is a big improvement in terms of ease-of-use and the suite of functions around VariableCombinations has quickly been incorporated into our best practices. The help pages have not yet been refreshed to show the effect of the new variable color-coding (with the exception of the quick start tutorial).
- Implemented a ColorizeList function which allows applying position-specific colors to the elements of a list. Additional formatting appropriate for Style can also be applied using this function.
- ColorizeList is now the default setting for the SymbolicRegression DataVariableLabels option. The new behavior is to color-code DataVariables, which helps in the visual exploration of model expressions. If you don't like the colors, the previous DataVariableLabels -> Automatic setting will tone things down.
- Modified SelectModels to support a SelectionStrategy -> Automatic option setting (which is now the default behavior). With this setting, ParetoFrontSelect will be used unless QualityBox -> All (which is the default) is specified, in which case AllModels will be used. The previous default of ParetoFrontSelect meant that successive applications would result in successive trimming of the supplied model set.
- Implemented five new functions targeted at variable combination analysis: VariableCombinations, DriverVariableCombinations, VariableCombinationMap, VariableCombinationChart, VariableCombinationTable. These will be especially useful when coupled inputs enable multiple input combinations to produce quality models.
- Implemented four new functions targeted at data subset extraction: NumericDataRecords, NumericDataRecordIndices, NonNumericDataRecords and NonNumericDataRecordIndices. These are especially nice if models are supplied along with the data since only variables used in the models will be considered as to whether the data records are viewed as numeric or non-numeric.
- Modified the data outlier analysis suite (DataStrangeness, DataOutlierAnalysis, DataOutliers, DataOutlierIndices and DataOutlierTable) to accommodate data with non-numeric elements as well as models derived during the exploratory analysis form of SymbolicRegression where the targeted response is embedded as a column within the supplied data set.
- Implemented support for DriverVariableCombinations to be specified as AllowedVariables or RequiredVariables for SelectModels (and functions like NicheModels, VariablePresenceMap, etc. which build upon SelectModels).
- Implemented support for DriverVariableCombinations to be specified as ModelingVariables for SymbolicRegression, ClassicGP, ParetoGP or KeijzerExpansion. This will be useful when using modeling results to focus exploration for subsequent modeling runs.
- Fixed a bug in DriverVariables wherein options were not properly handled if a SignificanceLevel threshold was not explicitly supplied. (If the threshold was explicitly supplied - which would be the normal usage - the function worked properly.)
- Modified ParetoFrontPlot and ParetoFrontLogPlot to allow a setting of ToolTipFunction -> None which will suppress all tooltips. Previously, tooltips would be maintained for ParetoFront models even if the ToolTipLimit was set to None.
- Fixed a bug in LabelForm wherein a negative real-valued coefficient of expressions or subexpressions would appear to be subtracted from the expression.
- Fixed a bug in SymbolicRegression wherein if the modeling ran up against the TimeConstraint (which would be the default behavior), more models would be returned than proper due to merging the intermediate results in place at that point. The new behavior is to isolate the final PopulationSize of models closest to the ParetoFront of the developmental criteria (ModelingObjective + SecondaryModelingObjective) and then select from this set using the RunResultsSelectionStrategy considering only the ModelingObjective criteria. This will trim the number of models returned from SymbolicRegression by a factor of two or three relative to the previous behavior.
- The nuisance bug where Mathematica would report that it couldn't parse an input variable (despite having successfully done so thousands of times before) was not tracked down so no bug report has been submitted to WRI. However, we have modified the processing flow so that, hopefully, the opportunity for this misbehavior has been minimized or eliminated.
- Modified RetrieveModelSets so that UniqueFitnessModels is automatically applied if MergeModelSets is enabled (which is the default). This helps to minimize duplicates which will especially occur for the low-complexity models.
- Extended UncorrelatedModels to handle missing or non-numeric data elements as well as models where the TargetColumn is embedded within the supplied data set.
- Modified DriverVariables so that mixed lists of GPModel and ModelEnsemble may be supplied.
- Fixed a bug in MakeDataNumeric where numeric rational entries were not handled properly.
- Fixed an intermittent bug in AlignModelExpression in which the alignment (for a small fraction of models) was inverted if there were missing elements in the supplied input-output data. This would ripple into AlignModel and, thence, into the output from SymbolicRegression.
- Fixed a bug in AlignModel in which it was being overly restrictive about the supplied data structure.
- Generalized MergeInputResponseData to handle more arbitrary data structures. Previously, it was assumed the data being merged were either atoms, vectors or matrices.
- Modified SelectModels to use symbols supplied for the AllowedVariables, RequiredVariables or ExcludedVariables directly. Previously, CreateDataVariableNames would have been applied.
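
The idea behind the data subset extraction functions listed above (a record counts as numeric only with respect to the variables the models actually use) can be pictured with a small sketch. This is illustrative Python, not DataModeler code: the helper names `is_numeric`, `numeric_record_indices` and `non_numeric_record_indices` are hypothetical, the `used_columns` argument stands in for "variables used in the supplied models", and indices are 0-based here whereas Mathematica lists are 1-based.

```python
from numbers import Real


def is_numeric(value):
    """True for real numbers; booleans, NaN, None and text count as non-numeric."""
    return (
        isinstance(value, Real)
        and not isinstance(value, bool)
        and value == value  # NaN is the only value not equal to itself
    )


def numeric_record_indices(records, used_columns=None):
    """Indices of records whose entries in the considered columns are all numeric.

    used_columns plays the role of the variables used by the models
    (an assumption of this sketch); None means every column is considered.
    """
    if not records:
        return []
    columns = range(len(records[0])) if used_columns is None else used_columns
    return [
        i for i, row in enumerate(records)
        if all(is_numeric(row[c]) for c in columns)
    ]


def non_numeric_record_indices(records, used_columns=None):
    """Complement of numeric_record_indices over the same records."""
    numeric = set(numeric_record_indices(records, used_columns))
    return [i for i in range(len(records)) if i not in numeric]


data = [
    [1.0, 2.5, "low"],
    [3.0, None, "high"],
    [4.2, 0.7, "mid"],
]

print(numeric_record_indices(data))               # [] (column 2 is text)
print(numeric_record_indices(data, [0, 1]))       # [0, 2]
print(non_numeric_record_indices(data, [0, 1]))   # [1]
```

Restricting the check to the columns the models use is what lets a record with a text label or a missing value in an unused column still be treated as numeric.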