DataModeler Release 8.20 (March 2014)
We have a very nice update for our users!
This release features performance and option tuning to better exploit the OptimizeLinearModel capability introduced in the previous release as well as a number of enhancements to improve the results display and ease-of-use. The complete release notes are available at the end of this post. We also think we might have worked around the infamous Mathematica tooltip bug wherein tooltip content stops being properly displayed. Since DataModeler makes extensive use of tooltips to layer information content and the only known recovery for this WRI bug is to restart Mathematica, this hopefully resolves a major, albeit randomly occurring, annoyance.
The DataSummaryTable is a nice addition for quickly assessing a data set. Of course, tooltips are used to layer information, and it provides an alternate view to that offered by the DataDistributionPlot, giving a visual on data type, distribution and consistency. The foundational thinking underlying this perspective will enable some impressive capabilities in our upcoming releases.
Another useful pair of new functions is the ParetoFrontContextPlot and its sister, the ParetoFrontContextLogPlot. These are useful for exploring a developed model set because they show selected models within the context of the ModelQuality of the other models. To illustrate, suppose we were modeling the literacy fraction of countries around the world and wanted to look at the models which used exactly three variables, one of which had to be femaleLifeExpectancy. (ParetoFrontPlot and ParetoFrontLogPlot now also support SelectModels options.)
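The selection in that example (exactly three variables, one of them femaleLifeExpectancy) boils down to a simple filter over the model set. The Python below is only a sketch of that filtering logic. It is not DataModeler code and does not reproduce the SelectModels syntax; the model records, the variable names other than femaleLifeExpectancy, and the helper `select_models` are all hypothetical.

```python
# Hypothetical model records: each has a set of input variables plus
# (complexity, error) quality values. None of this is DataModeler's API.
models = [
    {"variables": {"femaleLifeExpectancy", "gdpPerCapita", "birthRate"},
     "complexity": 35, "error": 0.12},
    {"variables": {"gdpPerCapita", "birthRate", "urbanFraction"},
     "complexity": 30, "error": 0.15},
    {"variables": {"femaleLifeExpectancy", "birthRate"},
     "complexity": 20, "error": 0.20},
    {"variables": {"femaleLifeExpectancy", "urbanFraction", "birthRate"},
     "complexity": 50, "error": 0.10},
]


def select_models(models, n_variables, required):
    """Keep models that use exactly n_variables inputs, including `required`."""
    return [
        m for m in models
        if len(m["variables"]) == n_variables and required in m["variables"]
    ]


selected = select_models(models, n_variables=3, required="femaleLifeExpectancy")
for m in selected:
    print(sorted(m["variables"]), m["complexity"], m["error"])
```

A context plot would then highlight the selected models against the complexity and error of everything else in the set.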
The default location for archived SymbolicRegression models is a subfolder, DataModelerModelSets, co-located with the evaluating notebook. As a general rule, we want to run lots of IndependentEvolutions and exploit the evolutionary talent in generating and exploring hypothesized model structures which implies that we want to generate and archive (StoreModelSet set to True) many model sets. Storing them in a subfolder helps to avoid cluttering up the main directory which, typically, contains analysis notebooks as well as the foundation data sets.
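The folder layout, together with the backwards-compatible lookup mentioned in the release notes, can be pictured with a short sketch. This is Python rather than Mathematica and is not how DataModeler implements it; the function names and the `*.modelset` file pattern are made-up placeholders, since the actual archive file naming is not stated here.

```python
from pathlib import Path


def archive_dir(notebook_dir, create=True):
    """Return the DataModelerModelSets subfolder of notebook_dir, creating it if asked."""
    target = Path(notebook_dir) / "DataModelerModelSets"
    if create:
        target.mkdir(parents=True, exist_ok=True)
    return target


def find_model_sets(notebook_dir, pattern="*.modelset"):
    """List archives in the subfolder first, then in the notebook folder itself.

    The "*.modelset" pattern is a hypothetical placeholder, not the real
    file naming used by DataModeler.
    """
    root = Path(notebook_dir)
    subfolder = root / "DataModelerModelSets"
    found = sorted(subfolder.glob(pattern)) if subfolder.is_dir() else []
    found += sorted(root.glob(pattern))  # older, backwards-compatible location
    return found
```

Searching the subfolder first and the notebook folder second means model sets archived by earlier releases are still found without moving any files.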
We have also shifted the foundations of DataModeler’s GridTable which adds some new capabilities, if appropriate. Many DataModeler functions exploit GridTable so the changes ripple throughout the analysis system. One beneficiary is the ModelSelectionReport which now automatically wraps the ModelExpressions to fit the notebook document width.
As itemized below, there have been quite a few other enhancements since the last release a couple of months ago. We have some really slick stuff in the pipeline which should be released later in the spring.
The official release notes and changes for 8.20:
The big new function in this release is the DataSummaryTable which is quite nice for getting the zen of a data set. The main themes of this release are performance tuning (especially if OptimizeLinearModel is enabled during SymbolicRegression), ease-of-use (e.g., ParetoFrontPlot now accepts SelectModels options) and refining the display of results (for example, the modifications to GridTable ripple into the display of the ModelSelectionReport and the inclusion of a NumberFormatting option for ModelExpression as well as GridTable).
- Added a DataSummaryTable function to explore the columns in a supplied data set. This is a nice complement to DataDistributionPlot to provide a very visual assessment of the columns in the supplied data set.
- Added ParetoFrontContextPlot and ParetoFrontContextLogPlot which facilitate the comparison of selected models within the context of the ModelQuality of other models.
- Modified the default ModelingObjective for SymbolicRegression to reward minimizing the number of variables used as well as model simplicity. Enabling OptimizeLinearModel for SymbolicRegression did not impose enough of a penalty on the inclusion of additional variables in models so this has been addressed.
- Although DataModeler makes extensive use of tooltips to layer information, tooltips in Mathematica are pretty fragile. One manifestation is that Mathematica would spontaneously decide that only a limited region of a tooltip should be displayed, which greatly decreases the functionality of the tooltips. Previously, the only known recovery plan was to restart Mathematica, which lacks a certain elegance. However, we now suspect that introducing a TooltipDelay improves the robustness. Hence, delays of between a twentieth and a quarter of a second have been introduced in the data and model review functions of DataModeler.
- Tuned the SymbolicRegression algorithm to improve the performance when OptimizeLinearModel is enabled. With the option enabled, it now executes about half as many generations per unit time as when it is disabled.
- Modified ParetoTourneySelect so that if a fractional ParetoTournamentSize is provided, it will map into a minimum of two contestants. Previously, a single-competitor tournament would have been possible which would have been equivalent to a RandomSelect strategy.
- Modified AgeModel, RearrangeModelQuality and UpdateModelPersonality to make them (much) more efficient. The functionality remains the same.
- Modified ParetoFrontPlot and ParetoFrontLogPlot to avoid plotting overlays of the ParetoFront points. Also reduced the ToolTipLimit to 750 so that, if more than this number of models are supplied, tooltips will only be shown for those models on the ParetoFront.
- Modified SymbolicRegression (as well as StoreModelSet, RetrieveModelSets and RetrieveModelSetFilenames) to archive models into the DataModelerModelSets directory within the EvaluationNotebookDirectory folder. If necessary, this directory will be created. For backwards compatibility, the EvaluationNotebookDirectory will also be searched for archived models for retrieval.
- Modified ModelExpression to support a NumberFormatting option. This allows more compact model representations with more clarity of the developed model forms. This change ripples down to the myriad functions which use ModelExpression.
- Migrated GridTable to be based upon Grid rather than the lower-level GridBox foundation. This changes some of the applicable options. The most important of these is ItemSize, which allows a width for a column to be specified with the cell content automatically adjusted. A NumberFormatting option was also added to the mix for convenience of formatting top-level real values, and ItemStyle is now the mechanism to control the element formatting.
- Modified the default option settings for a variety of functions to support the new GridTable foundations.
- Microsoft Windows allows users to run files directly from within a zipped archive. Unfortunately, Mathematica is not aware of the file structure within these archives and, as a result, the DataModeler installer is unable to install the package. The InstallDataModeler.nb installer has been modified to detect this situation and warn the user that it cannot install the package.
- Fixed a bug in UnivariatePlot wherein the color was not being plotted properly if a vector rather than a matrix was being plotted.
- Modified ModelPredictionComparisonPlot to format and automatically append a color key to the PlotLabel if a string is supplied. The color will be automatically matched to that specified for the predicted, observed and outlier data points.
- Modified ConsolidateRules to only return Rule or DelayedRule elements at the top-level.
- Modified ParetoFrontPlot and ParetoFrontLogPlot to accept SelectModels options. Although the default for SelectModels is to take the 50% of models closest to the ParetoFront whenever a QualityBox is specified, the default behavior for these functions is to show AllModels within the QualityBox.
- Modified SymbolicRegression to automatically exclude constant data columns from the modeling since they do not provide any information content.
- Fixed a bug in CorrelationChart wherein if two integer columns were being compared, the correlation would be expressed as a full-precision fraction rather than as a real-valued numeric.
- Fixed a bug in LabelForm wherein tooltips were not being handled properly.
- Modified UpdateModelQuality and EvaluateModelQuality to also accept input-output data as a list rather than separate entries. This matches the input form used by UpdateModelQualityVsMultipleDataSets and EvaluateModelQualityVsMultipleDataSets.
- For at least 7 years (first reported by Evolved Analytics in 2007), Mathematica's ListPlot and ListLogPlot have been unable to plot two points which have tooltips. We have trapped this situation for ParetoFrontPlot and ParetoFrontLogPlot and implemented a workaround.
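For readers curious about the ParetoTourneySelect change noted above, here is a minimal Python sketch of why a two-contestant floor matters. It is not DataModeler's implementation: the function names are hypothetical, and the rule for turning a fractional size into a count (rounding up a fraction of the population) is an assumption. Only the minimum of two contestants comes from the release notes. Models are represented as tuples of objectives to be minimized, such as (complexity, error).

```python
import math
import random


def tournament_size(size, population):
    """Map an absolute or fractional tournament size to a contestant count.

    Assumption: a value below 1 is read as a fraction of the population and
    rounded up. The floor of two contestants is the behavior the notes describe.
    """
    count = math.ceil(size * population) if size < 1 else int(size)
    return min(max(2, count), population)


def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_tourney_select(models, size, rng=random):
    """Draw contestants at random and return the non-dominated ones."""
    contestants = rng.sample(models, tournament_size(size, len(models)))
    return [
        m for m in contestants
        if not any(dominates(other, m) for other in contestants)
    ]
```

With a single contestant, that contestant is trivially non-dominated, so the tournament would return a purely random model. Requiring at least two restores some selection pressure.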