Release news and events

DataModeler Release 13.0

Friday, May 15, 2009

The main thrust has been documentation. However, some bug fixes and other changes have also been made in this release of 15 May 2009:

  • Modified ParetoFrontPlot and ParetoFrontLogPlot so that any EnsemblePersonality settings will be used as options.
  • Changed the default SimplificationFunction for ModelSelectionReport and ModelSelectionTable to be None. It turns out that Simplify (which was the previous default) will convert integers into reals AND insert a multiplier of 1.0 for variables if it detects a real-valued number in the expression. Since models will generally feature the scaling & translation factors as a result of being automatically aligned as a last processing step in SymbolicRegression, this produces quite a bit of visual clutter which would not be present if we simply looked at the ModelExpression.
  • Modified ModelSelectionReport and ModelSelectionTable so that the EnsemblePersonality will be supplied as options.
  • Modified EvaluateModel, EvaluateEnsemble, EvaluateEnsemblePrediction, EvaluateEnsembleDivergence, EnsembleResidual and ModelResidual so that if a vector of evaluation points (rather than a matrix) is supplied and there is only a single DataVariables symbol defined, the vector is recognized as a list of evaluation points rather than a single evaluation point (which is the behavior otherwise).
  • Fixed a bug in EnsembleResidualPlot so that it now can handle a vector of evaluation points. Also, made the default PlotRange -> All rather than Automatic.
  • Modified EnsembleResidualPlot so that VariablesToPlot may be set to None. If this is done (and IncludeResponse -> True) then only the residual of the observed behavior will be plotted.
  • Fixed a sin-of-omission in CreateFittedEnsemble wherein supplied options were not being automatically inserted into the returned EnsemblePersonality. The behavior is now consistent with that of CreateModelEnsemble.
  • Removed support for the KickoutFunction option for EvaluateModel, EvaluateGenome, etc. since the default behavior of $NumericCompileKickoutFunction returning Indeterminate is probably fundamental to a successful SymbolicRegression because of the built-in support for Indeterminate. $NumericCompileKickoutFunction is unprotected so it can be changed and the output of NumericCompile whenever machine precision is left will be adjusted accordingly.
  • Modified DivergenceSurfacePlot so that it recognizes GPModel input as well as ModelEnsemble and produces a nice looking graphic saying that the divergence surface of an individual model is meaningless.
  • Fixed a bug in ModelSelectionReport wherein supplied formatting options (e.g., FontSize -> Large) were not getting transferred to the header row of the report table. Unfortunately, ModelSelectionTable will need more extensive modification to allow for entry formatting.
  • Fixed a bug in BalanceData and BalanceDataIndices where some of the most important data records would (occasionally) not be returned.
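
The vector-versus-matrix rule described above for EvaluateModel and related functions can be illustrated with a small sketch. This is Python rather than Mathematica and is not DataModeler code; the function name as_evaluation_matrix and its arguments are hypothetical and exist only to show the disambiguation logic.

```python
def as_evaluation_matrix(points, n_variables):
    """Return a list of evaluation points, each a list of n_variables values.

    Hypothetical helper: a flat vector is read as many points when only one
    variable is defined, and as a single point otherwise.
    """
    is_flat = all(not isinstance(p, (list, tuple)) for p in points)
    if not is_flat:
        # Already a matrix: one row per evaluation point.
        return [list(p) for p in points]
    if n_variables == 1:
        # Single variable: each entry of the vector is its own point.
        return [[p] for p in points]
    # Several variables: the vector is one evaluation point.
    return [list(points)]


print(as_evaluation_matrix([1.0, 2.0, 3.0], 1))
print(as_evaluation_matrix([1.0, 2.0, 3.0], 3))
```

With a single variable the three-element vector becomes three one-element points; with three variables the same vector is kept as one point.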

DataModeler Release 12.0

Friday, February 27, 2009

Again, we continue with fleshing out and migrating the documentation in this release of 27 February 2009. Additionally, some other changes have been made:

  • Included negation, \[DoubleStruckCapitalM], as well as inversion, \[DoubleStruckCapitalI]\[DoubleStruckCapitalV], as default FunctionPatterns and made the appropriate modifications to the PatternMapping and InversePatternMapping options for SymbolicRegression. The InversePatternMapping was also modified to promote the inclusion of \[DoubleStruckCapitalD], \[DoubleStruckCapitalS]\[DoubleStruckCapitalQ] and \[DoubleStruckCapitalP]2 (divide, square root and square) in the inverse mapping to minimize creation of \[DoubleStruckCapitalP] (power) structures when aligning models or if ActiveGenomeSimplification is enabled during SymbolicRegression.
  • Implemented a RemoveModelScaling function which strips away scaling and translation factors from models. The returned models will maintain the original ModelPersonality; however, the ModelFitness will be reset to Indeterminate.
  • Modified AlignModelExpression (and, by extension, AlignModel) so that any integers in the ModelGenome will be maintained across the alignment process. Previously, they would be converted to reals. Although this would not affect model evaluation, it did affect the formatting in ModelSelectionReport and the ilk. Note that the ModelGenome will be modified in the alignment processing due to the need to convert into an expression (PatternMapping) and back (InversePatternMapping).
  • Modified DataOutlierAnalysis, DataOutliers, DataOutlierIndices, DataOutlierTable and DataStrangeness to treat an ensemble as an individual model. Previously, these functions would break apart the ensemble into its constituent models and look for outliers using them as the reference; however, this behavior was not consistent with the philosophical equivalence between a simple model and an ensemble. That said, the previous behavior may still be desirable since it gives a different view of the modeling difficulty. Also, the StrangenessMetric default has been changed to (Mean@#1&). This is better than the previous Total since the result will be independent of the number of supplied models.
  • Fixed a bug in ModelTreePlot wherein graphics options set in the ModelPersonality were not being transferred to the plotting function. Now, for example, embedded PlotLabels will be used. This helps when a ModelTreePlot is a ToolTipFunction in other graphics.
  • Fixed a bug in DivergenceSurfacePlot where a duplicate PlotLabel would be generated if a univariate (single variable) divergence surface was being plotted.
  • Fixed a bug in CreateFittedEnsemble wherein supplying options would cause the function to not be evaluated.
  • Modified EnsemblePersonality so that it returns an empty list when supplied with a model. This facilitates models and ensembles being treated as more interchangeable.
  • Modified ResponseSurfacePlot and DivergenceSurfacePlot to use any EnsemblePersonality information in the graphics generation.
  • Suppressed warning messages generated by RandomModels when pathological models are created. It is fairly easy for such random expressions to have divide-by-zero errors and the ilk so suppressing warning display seems reasonable.
  • Modified the ParetoGP and ClassicGP (aka, SymbolicRegression) algorithms to avoid various warning messages which were randomly cropping up. This will avoid confusion on the part of the user.
  • Fixed a bug in SymbolicRegression (and ParetoGP and ClassicGP) wherein it was possible to create and return models which exceeded the specified NumberOfVariables (ModelDimensionality).
  • Fixed a bug in RandomModels and RandomGenomes where supplying only a string or a symbol would result in that symbol being used as a genomic building block rather than serving as the base symbol for automatically synthesized variables. If only a single variable is truly desired, it should be supplied within a list.
  • Fixed a bug in DivergenceSurfacePlot wherein the plot would not be shown if only a single VariablesToPlot was specified. This can now be specified as either a specific symbol or as a list with a single element.
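
The reason a Mean-based StrangenessMetric is independent of the number of supplied models, while Total is not, can be seen in a small numeric sketch. This is Python rather than Mathematica and is not DataModeler code; the per-model scores are invented for illustration, and how DataStrangeness actually computes them is not described here.

```python
def total(scores):
    """Sum of the per-model scores for one data record."""
    return sum(scores)


def mean(scores):
    """Average of the per-model scores for one data record."""
    return sum(scores) / len(scores)


def aggregate(per_record_scores, metric):
    """Apply the metric to each record's list of per-model scores."""
    return [metric(scores) for scores in per_record_scores]


# Two data records scored by two hypothetical models.
two_models = [[1.0, 3.0], [0.5, 0.5]]
# The same records scored by four models (the two models duplicated).
four_models = [row + row for row in two_models]

print(aggregate(two_models, total), aggregate(four_models, total))
print(aggregate(two_models, mean), aggregate(four_models, mean))
```

Doubling the number of models doubles every Total-based value but leaves the Mean-based values unchanged, so results from differently sized model sets stay comparable.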

DataModeler Release 11.0

Tuesday, December 23, 2008

Mathematica 7 has been released so the decision was made to make DataModeler require that version with an eye towards long-term maintenance. Hence, we will require Mathematica 7 starting with this release. Of course, lots of other changes and documentation additions are associated with this release of 23 December 2008.

  • Known Problems worked out. There are a number of enhancements in the pipeline before commercial release; however, there are a few annoying "features" which warrant mention:
    1. Accessing DataModeler symbol help: Unfortunately, Mathematica 7 introduced a new behavior that selecting a symbol like ParetoFrontLogPlot and executing the "Find Selected Function" command now pops up search results for that symbol rather than taking you directly to the symbol reference page. This means that you now have to take your mouse and click on the appropriate entry in the search results to access the desired information. Making it inconvenient to get to the desired information seems like a strange interface design; however, to get this changed will require users and developers of 3rd party packages complaining to WRI.

    2. Error messages during SymbolicRegression: During SymbolicRegression, intermittent warnings that the First element of an empty list is not retrievable get generated. Although unsightly, this does not affect the modeling process or model quality. Wrapping SymbolicRegression in Quiet will suppress this message if it is too irritating. This will be fixed at the root-cause level before release.
    3. Links to functions listed on the DataModeler guide page fail. Hopefully, a fix will be coming in the next release of WolframWorkbench; otherwise links to all 280 DataModeler symbols will have to be entered by hand on the guide page.
  • DataModeler now requires Mathematica 7.
  • The GUI package installer released with the last release turned out to be Apple OS X specific. Hopefully, this installer will be truly cross-platform compatible.
  • Worked on the BalanceData and BalanceDataIndices algorithms. The functionality has not changed; however, the code should now execute between two and four times faster. Unfortunately, the scaling on this implementation is still not quite as good as desired so some more work will need to be done.
  • Added reference pages for all of the DataModeler symbols. Hopefully, we now have all symbols reachable by the help system even if the documentation is not complete.
  • Renamed the RefreshRate option for SymbolicRegression to NewModelRate. This is a more descriptive name and also allows the option to be found in the Mathematica help system.

  • Modified SetInputModelVariables to handle an empty list; this should mitigate spurious warning messages which can crop up during SymbolicRegression.
  • Fixed a bug in RobustModels which was introduced via too vigorous code prettification. (Sometimes, those commas are important.)

  • Worked around a Mathematica bug where an error would be generated if LabelForm was nested and a FontColor of Orange was supplied to the inner LabelForm. It turned out that Red, Green, Blue, Magenta, etc. would behave properly but any color whose RGB[r, g, b] had an r, g, or b that wasn't 0 or 1 would spawn the error and fail.

  • Modified the RunsPerCascade default for SymbolicRegression and the $ClassicGPQuick, $ClassicGPExplore and $ClassicGPIntensive pre-defined option sets to be of the form {num, 1, 1} where num is larger than one. This has the effect of running num independent cascades (each nominally 10 generations) in parallel with the best results from these merged and used as the foundation for the subsequent model search. This kick-starts the search process to minimize the risk of a premature lock-in on a solution structure.

  • Modified the SymbolicRegression InversePatternMapping rules to handle any Rational constructs that might be present in a Mathematica expression and convert them into the equivalent genetic code expression. Also modified the IntronRemovalRules to prefer Π2 (square) rather than Π (power) representations.

  • Fixed a bug in AgeModel wherein the ModelFitness would not be appropriately modified if ModelAge was included in the ModelingObjective.

  • Fixed a bug in ModelPredictionPlot and ModelResidualPlot where it would fail if presented with an input vector rather than an input matrix.

  • Modified the default PlotStyle for UnivariatePlot and SmallPlot to make the lines Thick and the PointSize Large.

  • Set the NewModelRate default to be zero for SymbolicRegression and all pre-defined option sets — with the exception of $ClassicGPIntensive which uses ModelAge as a SecondaryModelingObjective.

  • Modified ModelTreePlot so that GraphicsArrayColumns may be set to None. This will return a list of the corresponding ModelTreePlot of a supplied model list rather than forcing a Grid structure which is the default if a number or Automatic is supplied for GraphicsArrayColumns and a list of models is supplied.

  • Modified the default SelectionFunction option for RescaleData since the previous default had problems with severely unbalanced quantized data. We probably still need to modify this function to do some intelligent switching of behavior.
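
The color condition behind the nested LabelForm bug noted above (any RGB component other than 0 or 1) can be expressed as a one-line predicate. This is a Python sketch, not DataModeler or Mathematica code; the function name has_fractional_component is hypothetical, and the triples are the standard RGB values of the named Mathematica colors.

```python
# RGB triples for the named colors mentioned in the release note.
NAMED_COLORS = {
    "Red": (1, 0, 0),
    "Green": (0, 1, 0),
    "Blue": (0, 0, 1),
    "Magenta": (1, 0, 1),
    "Orange": (1, 0.5, 0),
}


def has_fractional_component(rgb):
    """True if any of r, g, b is something other than 0 or 1."""
    return any(component not in (0, 1) for component in rgb)


# Colors that would have triggered the error under the rule in the note.
affected = [name for name, rgb in NAMED_COLORS.items()
            if has_fractional_component(rgb)]
print(affected)
```

Of the colors listed, only Orange has a component strictly between 0 and 1, which matches the behavior reported in the note.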

DataModeler Release 10.0

Tuesday, November 25, 2008

Lots of work on cleaning up the documentation in release 10.0 of 25 November 2008. Also, a number of refinements before we go live.

  • Changed where the DataModeler license key is stored. Now if the package is installed in either the $AddOnsDirectory or $UserAddOnsDirectory, the key will be stored in the associated Licensing directory separate from the actual package. This means that installing a new version of DataModeler will no longer overwrite any installed passcode, which removes that annoyance. If DataModeler is NOT in one of these approved locations (which is important for the package documentation to be found in Mathematica's help system), then the passcode will be embedded within the package structure. Separate passcode files will be stored for different $LicenseID; this will simplify the maintenance for those, such as Mathematica beta testers, who use multiple licenses on a given machine.
  • Implemented a GUI to install DataModeler into the correct locations. This is important since the approved $AddOnsDirectory and $UserAddOnsDirectory are not easily accessible, especially on Windows computers. The installer is actually a Mathematica notebook which allows the user to choose where to install the package and, if the destination already exists, requires approval of the user to overwrite the previous install. This, in conjunction with separating the license key from the package structure, should greatly improve the ease-of-installation.
  • Fixed a bug where SymbolicRegression would fail if a user supplied DataVariables and also enabled RobustModels during the evolutionary search. Also fixed similar problems with the RobustModels and EvaluateModelFitness functions. Also updated the documentation for EvaluateModelFitness and UpdateModelFitness to reflect that RobustModels and DataVariableRange were valid options.
  • Fixed a bug in LabelForm where mathematical expressions were not properly displayed. Now such structures are automatically displayed using TraditionalForm. Also implemented a related LabelString function which returns a string structure (as opposed to being wrapped in Style as is the case with LabelForm) which facilitates combining formatted strings. A side-effect of these changes is that LabelForm may now be nested to programmatically produce formatted labels.
  • Extended GridTable to handle atomic input as well as lists whose elements are a mixture of lists and non-lists. Previously, if GridTable received a structure it did not recognize, it would generate an error message and return an empty list. It should now return a GridTable with any input.
  • Modified ModelExpression so that DataVariables ending in numbers will have those numbers displayed as subscripts in the resulting expressions.
  • Modified RandomGenomes (and, by extension, RandomModels) so that a supplied option of the form DataVariables -> {1, "x"} will automatically synthesize MaxNumberOfAutoSymbols variables using "x" as a foundation. RandomModels will now also have supplied DataVariables inserted into the ModelPersonality.

DataModeler Release 9.0 (5 November 2008)

Saturday, November 8, 2008

The big change is a baseline integrated documentation. Thanks to Tom Wickham-Jones as well as Jay & Andre, we finally have a baseline level of integrated Mathematica documentation (it has only been 1.5 years since the last integrated version — for v5.2!)