News archive for December 2008

DataModeler Release 11.0

Tuesday, December 23, 2008

Mathematica 7 has been released and, with an eye towards long-term maintenance, DataModeler requires that version starting with this release. Many other changes and documentation additions are also part of this release of 23 December 2008.

  • Known Problems worked out. There are a number of enhancements in the pipeline before commercial release; however, there are a few annoying "features" which warrant mention:
    1. Accessing DataModeler symbol help: Unfortunately, Mathematica 7 introduced a new behavior: selecting a symbol like ParetoFrontLogPlot and executing the "Find Selected Function" command now pops up search results for that symbol rather than taking you directly to the symbol reference page. This means that you now have to click on the appropriate entry in the search results to access the desired information. Making it inconvenient to get to the desired information seems like a strange interface design; however, getting this changed will require users and developers of third-party packages to complain to WRI.

    2. Error messages during SymbolicRegression: During SymbolicRegression, intermittent warnings are generated stating that the First element of an empty list cannot be retrieved. Although unsightly, this does not affect the modeling process or model quality. Wrapping SymbolicRegression in Quiet will suppress this message if it is too irritating. This will be fixed at the root-cause level before release.
    3. Links to functions listed on the DataModeler guide page fail. Hopefully, a fix will be coming in the next release of Wolfram Workbench; otherwise, links to all 280 DataModeler symbols will have to be entered by hand on the guide page.
  • DataModeler now requires Mathematica 7.
  • The GUI package installer shipped with the last release turned out to be Apple OS X specific. Hopefully, the installer in this release will be truly cross-platform compatible.
  • Worked on the BalanceData and BalanceDataIndices algorithms. The functionality has not changed; however, the code should now execute between two and four times faster. Unfortunately, the scaling on this implementation is still not quite as good as desired so some more work will need to be done.
  • Added reference pages for all of the DataModeler symbols. Hopefully, we now have all symbols reachable by the help system even if the documentation is not complete.
  • Renamed the RefreshRate option for SymbolicRegression to NewModelRate. This name is more descriptive and also allows the option to be found in the Mathematica help system.

  • Modified SetInputModelVariables to handle an empty list; this should mitigate spurious warning messages which can crop up during SymbolicRegression.

  • Fixed a bug in RobustModels which was introduced via too vigorous code prettification. (Sometimes, those commas are important.)

  • Worked around a Mathematica bug where an error would be generated if LabelForm was nested and a FontColor of Orange was supplied to the inner LabelForm. It turned out that Red, Green, Blue, Magenta, etc. would behave properly, but any color whose RGBColor[r, g, b] form had an r, g, or b that wasn't 0 or 1 would spawn the error and fail.

  • Modified the RunsPerCascade default for SymbolicRegression and the $ClassicGPQuick, $ClassicGPExplore and $ClassicGPIntensive pre-defined option sets to be of the form {num, 1, 1} where num is larger than one. This has the effect of running num independent cascades (each nominally 10 generations) in parallel, with the best results from these merged and used as the foundation for the subsequent model search. This kick-starts the search process to minimize the risk of a premature lock-in on a solution structure.

  • Modified the SymbolicRegression InversePatternMapping rules to handle any Rational constructs that might be present in a Mathematica expression and convert them into the equivalent genetic code expression. Also modified the IntronRemovalRules to prefer Π2 (square) rather than Π (power) representations.

  • Fixed a bug in AgeModel wherein the ModelFitness would not be appropriately modified if ModelAge was included in the ModelingObjective.

  • Fixed a bug in ModelPredictionPlot and ModelResidualPlot where they would fail if presented with an input vector rather than an input matrix.

  • Modified the default PlotStyle for UnivariatePlot and SmallPlot to make the lines Thick and the PointSize Large.

  • Set the NewModelRate default to be zero for SymbolicRegression and all pre-defined option sets — with the exception of $ClassicGPIntensive which uses ModelAge as a SecondaryModelingObjective.

  • Modified ModelTreePlot so that GraphicsArrayColumns may be set to None. When a list of models is supplied, this returns a list of the corresponding ModelTreePlot results rather than forcing them into a Grid structure, which is the default if a number or Automatic is supplied for GraphicsArrayColumns.

  • Modified the default SelectionFunction option for RescaleData since the previous default had problems with severely unbalanced quantized data. We probably still need to modify this function to do some intelligent switching of behavior.
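
The idea behind the RunsPerCascade change above (several short, independent cascades whose best results are merged to seed the subsequent search) can be illustrated with a toy sketch. This is not DataModeler code and says nothing about how SymbolicRegression is implemented: it is written in Python, the objective is a stand-in for model error, and every name in it (error, evolve, cascaded_search, keep) is hypothetical. The cascades are independent and could run in parallel; here they simply run in a loop.

```python
import random


def error(x):
    # Toy objective standing in for model error: smaller is better.
    return (x - 3.0) ** 2


def evolve(population, generations, rng):
    # Mutate every candidate, then keep the best half of the combined
    # parent-plus-child pool so the population size stays constant.
    size = len(population)
    for _ in range(generations):
        children = [x + rng.gauss(0.0, 1.0) for x in population]
        population = sorted(population + children, key=error)[:size]
    return population


def cascaded_search(num_cascades=4, size=20, keep=5, seed=0):
    rng = random.Random(seed)
    merged = []
    for _ in range(num_cascades):
        # Each cascade starts from its own random population and runs
        # for a nominal 10 generations.
        start = [rng.uniform(-10.0, 10.0) for _ in range(size)]
        merged.extend(evolve(start, 10, rng)[:keep])
    # The merged survivors are the foundation for the longer search.
    return evolve(merged, 50, rng)


best = cascaded_search()[0]
```

Because each cascade starts from a different random population, the merged pool is less likely to be dominated by one early solution structure than a single run of the same total length would be.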