Release news and events

DataModeler Release 21.0

Tuesday, September 28, 2010

The release of 28 September 2010 features the following changes to DataModeler:

  • Modified SymbolicRegression to handle missing or non-numeric elements in the supplied input-response data.
  • Introduced a new DataDistributionPlot function which facilitates examination of data sets. This builds upon BoxWhiskerPlot but is a much more intelligent implementation for real-world assessment of multivariate data.
  • Modified EvaluateModel and EvaluateEnsemble so that if non-numerics were in the evaluation data record, the model would still evaluate if those variables were not used in the model. Previously, any non-numeric entry would result in an Indeterminate result. A side-effect of this is that evaluation can be significantly faster for low-dimensional models which are derived from modeling with large numbers of possible input variables.
  • Modified NoisePower and ScaleInvariantNoisePower to allow the use of fractional norms. Fractional norms can reduce the influence of data outliers.
  • Added support for Max, Min, and Clip as modeling building blocks. These can be included by adding the string "Bounds" to the BuildFunctionPatterns parameters and supplying that result to the FunctionPatterns option for SymbolicRegression.
  • Removed Sigmoid and RBF from the "PowerMath" definition for BuildFunctionPatterns and moved them into the new "Bounds" predefined set.
  • Modified SymbolicRegression so that supplied options are embedded in the ModelPersonality of returned models. This will be useful when, for example, custom FunctionPatterns are used during the modeling and, as a result, these definitions would be automatically transferred to future model evaluations. This capability could also be used to embed project info into the developed models (e.g., supplying "Project" -> "FormulationDesign" to the SymbolicRegression which would then be available for reference).
  • Modified the ObjectiveOrder option behavior (used by ParetoFrontPlot, ParetoFrontLogPlot, ModelSelectionTable and ModelSelectionReport) to allow integer values to specify the objectives to be displayed. This will be especially useful when looking at results from, for example, CascadeMonitor during a SymbolicRegression since the default behavior is to use a SecondaryModelingObjective during model development which is suppressed as an explicit objective prior to returning the final results.
  • Modified RandomModel and RandomGenome to use the ModelingVariables option with that taking precedence over the DataVariables if there is a conflict.
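
The note above that fractional norms can reduce the influence of data outliers can be illustrated outside of DataModeler. The following Python sketch is not the NoisePower implementation; the function names and the sample residuals are hypothetical, and it only assumes the generic form (sum |r|^p)^(1/p) with p below 1 for the fractional case.

```python
def fractional_norm(residuals, p):
    """Return (sum |r|^p)^(1/p); p < 1 gives a so-called fractional norm."""
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(r) ** p for r in residuals) ** (1.0 / p)


def outlier_share(residuals, p):
    """Fraction of sum |r|^p contributed by the largest-magnitude residual."""
    powered = [abs(r) ** p for r in residuals]
    return max(powered) / sum(powered)


# Hypothetical residuals: three ordinary points and one outlier.
residuals = [1.0, -1.0, 1.0, 10.0]

for p in (2.0, 1.0, 0.5):
    # As p shrinks, the outlier accounts for a smaller share of the total.
    print(p, round(fractional_norm(residuals, p), 3), round(outlier_share(residuals, p), 3))
```

With these sample values the outlier contributes roughly 97% of the p = 2 sum but only about half of the p = 0.5 sum, which is the effect the release note describes.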

DataModeler Release 20.0

Tuesday, July 20, 2010

The priority of this release of 20 July 2010 is still documentation completion. Further changes and enhancements were also made:

  • Heavily modified ResponseSurfacePlot and ResponsePlot. The new ShowDataVariableReference option will allow the reference point to be graphically denoted which is useful if more than one or two variables are in the model to show the value being used as a reference in the other plots. A ShowEnsembleDivergence option was also introduced which comes into play if a ModelEnsemble is being plotted to show the envelope of the EnsembleDivergenceFunction around the predicted response. Displaying the prediction confidence helps to highlight one unique advantage of ModelEnsemble. Of course, a number of additional options (SecondaryPlotStyle, Filling, and FillingStyle) were also associated with the functions to facilitate adjusting the graphics appearance.
  • Fixed a bug in BoundedModelResponseQ (which would ripple into RobustModels) wherein specifying a RangeExpansion of the form {minScaleFactor, maxScaleFactor} did not properly thread over all of the DataVariableRange.
  • Fixed a bug in VariablePresenceMap wherein the EnsemblePersonality was not being used in the graphics generation.
  • Fixed a bug in ModelPredictionPlot wherein the EnsemblePersonality wasn't being used in the generated graphics.
  • Modified ConsolidateRules so that rules within rules (e.g., Filling -> {2 -> {3}}) are not promoted in the returned rule set.
  • Hopefully, worked around a random bug in Mathematica where it would not be able to parse the exact same expression that it had done thousands of times before during SymbolicRegression.
  • Made ImportDataMatrix a little smarter in that if a complete filepath was supplied that was valid the target file would be returned — even if the Directory option setting was such that a relative path should be pursued. In any event, a meaningful message will be displayed if the retrieval fails.
  • Modified CreateLinearModel so that any options supplied will be included in the ModelPersonality of the developed GPModel.
  • Fixed a bug in ReplaceModelPersonality wherein if an empty list was provided, it did not recognize that the ModelPersonality should be reset to an empty list.
  • Modified SummaryStatistics so that it can handle non-numeric data. If the supplied data is not strictly numeric, then it will be automatically removed from the data columns and a warning message displayed.
  • Fixed a sin-of-omission bug in MedianAverage where it was not holding its contents unevaluated — which affected plotting in ResponseSurfacePlot. Now the function explicitly looks for numerics, Indeterminate or Missing values and handles the case of a vector of those being supplied.
  • Added an RBF (aka, Gaussian or Radial Basis Function) function and implemented support for it in SymbolicRegression. It is now natively supported by BuildFunctionPatterns, etc.
  • Changed the Background option setting for ParetoFrontPlot, ParetoFrontLogPlot, ModelPredictionPlot, EnsemblePredictionPlot, ModelResidualPlot, ResponsePlot, ResponseSurfacePlot and DivergenceSurfacePlot from White to None since it appears that the previous Mathematica bug which motivated this setting has been corrected.
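
The SummaryStatistics change above (drop non-numeric entries, then warn) follows a common pattern that can be sketched in Python. This is an illustration only, not the DataModeler code: the helper names are hypothetical, and the particular statistics reported here are an assumption, since the release note does not list them.

```python
import math
import statistics
import warnings


def is_numeric(value):
    """True for real int/float values, excluding booleans and NaN."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return not math.isnan(value)


def summarize_column(values):
    """Drop non-numeric entries (with a warning) and summarize the rest."""
    kept = [v for v in values if is_numeric(v)]
    dropped = len(values) - len(kept)
    if dropped:
        warnings.warn(f"Removed {dropped} non-numeric entries before summarizing")
    if not kept:
        return None  # nothing numeric left to summarize
    return {
        "count": len(kept),
        "min": min(kept),
        "median": statistics.median(kept),
        "mean": statistics.fmean(kept),
        "max": max(kept),
    }
```

Calling summarize_column([1, 2, "n/a", None, 3.0]) would summarize the three numeric entries and emit a single warning about the two that were removed.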

DataModeler Release 19.0

Tuesday, April 27, 2010

Documentation completion continues to be the priority theme of this release of 27 April 2010; however, a number of changes and enhancements are created in that process:

  • More tweaking of SymbolicRegression options. The new option defaults will run for 50,000 generations unless interrupted by a TimeConstraint and feature continuous innovation over that span.
  • Modified NicheModels by renaming the Split option to be NicheBy and also introduced a new option, NicheSortBy, which defines a criterion by which to sort the models in the returned niches.
  • Tweaked the performance of SelectModels. Unless variable constraints are being imposed, this should result in a substantial speedup.
  • Added a new PropagationOperator, NichedCrossover, which may be used during SymbolicRegression. This operator partitions and organizes the supplied model set according to the NicheBy and NicheSortBy options for NicheModels and then applies Crossover to the niched model sets.
  • Modified the $ConventionalGP option set for SymbolicRegression to include ModelComplexity as a ModelingObjective. Because a single-objective SelectionStrategy is used, this does not affect the model search; however, it is convenient for comparison with multi-objective strategies.
  • Renamed the SignificanceLevel option for UncorrelatedModels, UncorrelatedVariables and CorrelationMatrixPlot to be the clearer CorrelationThreshold.
  • Renamed the DataSegments option for UncorrelatedVariables back to DataSubsetSize since it was erroneously changed in a previous option renaming exercise. (UncorrelatedModels was never modified.)
  • Added a CreateDataVariableNames function which will convert supplied strings into forms that may be safely used for the SymbolicRegression DataVariables option. If a list is provided, all of the returned symbol strings are guaranteed to be unique.
  • Added a new option, EnsemblePlotStyle, for EnsemblePredictionPlot and EnsembleResidualPlot to specify how the ensemble predictions should be displayed. Previously, the predictions could get visually lost if there were lots of evaluation points to be displayed.
  • Fixed a bug in UncorrelatedVariables wherein if a matrix of constant columns was supplied, errors would be spawned. The new behavior is to return an empty list.
  • Modified UncorrelatedModels so that if a perfect model (zero error residual) is supplied, it will automatically be included in the returned model set. This is a bit of an ad hoc behavior since the correlation of a constant is undefined. However, since UncorrelatedModels is used by the default EnsembleStrategy -> Automatic behavior of CreateModelEnsemble, the previous behavior of deleting perfection seemed inappropriate. Since perfection, typically, only appears for toy problems, this should not change the behavior for most real-world modeling.
  • Modified MedianAverage to accommodate Indeterminate points in the supplied vector. If the number of Indeterminate values is not too large (i.e., not large enough to risk intruding into the calculated result), a numeric value will be returned. Otherwise, Indeterminate will still be returned. This should be useful in ModelEnsemble visualization.
  • Modified BoundedModelResponseQ to accommodate the RangeExpansion option along with the DataVariableRange. This makes it consistent with RobustModels in its behavior.
  • Fixed a bug in VariablePresence wherein the "PresencePercent" option setting for PresenceMetric was not handled properly if a single stand-alone model or ensemble was provided.
  • Fixed a bug in VariablePresenceTable wherein only generic string DataVariableLabels would be recognized. Now formatted (e.g. the result from LabelForm) lists may be supplied.
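
The CreateDataVariableNames item above describes turning arbitrary strings into safe, unique names. A minimal Python sketch of that idea follows. The actual conversion rules are not given in these notes, so everything here is an assumption: the function name is hypothetical, non-alphanumeric characters are simply dropped, an "x" prefix guards against a leading digit, and duplicates receive a numeric suffix.

```python
import re


def make_variable_names(labels):
    """Turn arbitrary labels into unique alphanumeric identifiers."""
    names = []
    seen = set()
    for label in labels:
        # Keep only ASCII letters and digits.
        base = re.sub(r"[^A-Za-z0-9]", "", label)
        # An identifier should not be empty or start with a digit.
        if not base or base[0].isdigit():
            base = "x" + base
        # Append 2, 3, ... until the name has not been used before.
        name = base
        suffix = 2
        while name in seen:
            name = f"{base}{suffix}"
            suffix += 1
        seen.add(name)
        names.append(name)
    return names


print(make_variable_names(["Flow rate (kg/s)", "flow-rate", "Flow rate (kg/s)", "2nd stage temp"]))
# ['Flowratekgs', 'flowrate', 'Flowratekgs2', 'x2ndstagetemp']
```

The uniqueness guarantee comes from the `seen` set, so it holds across the whole supplied list rather than only between adjacent entries.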

DataModeler Release 18.0

Tuesday, March 16, 2010

Documentation completion continues to be the priority theme in this release of 16 March 2010. Additional changes and bug fixes are the following:

  • Fixed a bug in Crossover wherein if it was supplied with a list of a single GPModel it would return two models. Now it will return a list of a single model.
  • Modified Crossover so that the ModelAge is based upon the parent that donated the root node rather than the maximum of the two parents. This seemed more reasonable given that the root determines the fundamental structure of the resulting model.
  • Added three new functions: FibonacciSpread, FibonacciSequence and InverseFibonacci. These facilitate generating non-uniform indices which are more heavily represented for smaller numbers. Such can be useful for generating indices for lag matrices for time series data analysis as well as generating the ModelAgeBracket boundaries.
  • Added a ModelAgeBracket function which classifies models according to the specified ModelAgeBracketBoundaries. The default boundaries use a FibonacciSpread result (e.g., {0,2,7,30,121,493,2000}) which is useful as a SecondaryModelingObjective for SymbolicRegression to promote continual innovation.
  • Modified the behavior of the SecondaryModelingObjective so that a symbol can be supplied as an option setting. Now, for example, ModelDimensionality will automatically be converted into a functional form, ModelDimensionality[##]&. Of course, None will continue to suppress the use of a secondary objective. ModelAge, ModelAgeBracket, ModelDimensionality and ModelNonlinearity were also modified to accept (and ignore) the spurious model and observed response vectors which would be supplied during SymbolicRegression.
  • Modified CreateModelFromExpression so that any supplied options will be automatically embedded in the ModelPersonality of the returned GPModel(s).
  • Modified UpdateModelPersonality and ReplaceModelPersonality so that multiple options may be supplied to a model or a model set rather than forcing the personality aspects to be enclosed in a list.
  • Changed the default SymbolicRegression options so that now a ClassicGP EvolutionStrategy is used with a SecondaryModelingObjective of ModelAgeBracket. The default ModelAgeBracketBoundaries are FibonacciSpread[2000,7].
  • Fixed a subtle bug in ModelExpression wherein it would take three orders-of-magnitude longer than it should have. Of course, the resulting timing impact rippled into all sorts of other functions.
  • Fixed a bug in OptimizeModel & OptimizeModelExpression wherein a small fraction of model forms would fail if OptimizeIntegers -> True.
  • Modified SubSample and SmallPlot to use the DataSegments and DataSegmentFunction options rather than the previous DataSubsetSize and DataSubsetSelectionFunction, respectively, since the convention for their use was at odds with that used by SymbolicRegression (which still uses the old option names). The revised option names more clearly represent the option functionality.
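
The Fibonacci-based bracket boundaries mentioned above, {0,2,7,30,121,493,2000} for FibonacciSpread[2000,7], can be reproduced with a simple construction. The Python sketch below is a guess at the idea, not the documented DataModeler algorithm: it assumes the spread takes evenly spaced real-valued indices between 0 and the (approximate) inverse Fibonacci of the maximum, then rounds the real-argument Fibonacci value at each index. All function names are hypothetical.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
SQRT5 = math.sqrt(5)


def continuous_fibonacci(x):
    """Binet-style Fibonacci extended to real x (equals F(n) at integers)."""
    return (PHI ** x - math.cos(math.pi * x) * PHI ** (-x)) / SQRT5


def approx_inverse_fibonacci(value):
    """Approximate real index x with continuous_fibonacci(x) ~ value.

    Ignores the small oscillating term, so it is only accurate for large value.
    """
    return math.log(value * SQRT5, PHI)


def fibonacci_spread(maximum, count):
    """Round the continuous Fibonacci at `count` evenly spaced indices."""
    top = approx_inverse_fibonacci(maximum)
    return [round(continuous_fibonacci(top * i / (count - 1))) for i in range(count)]


print(fibonacci_spread(2000, 7))
# [0, 2, 7, 30, 121, 493, 2000]
```

Because the indices are evenly spaced but the Fibonacci values grow geometrically, the resulting boundaries are dense for small numbers and sparse for large ones, which matches the stated purpose of the function.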

DataModeler Release 17.0

Wednesday, March 3, 2010

Documentation completion continues to be the priority theme in this release of 3 March 2010. Additionally, a number of changes and enhancements are created in the process:

  • Implemented support for templating. Towards this end, a TemplateTopLevel option for SymbolicRegression was implemented which facilitates forcing a desired output form — e.g., a conditional, exponential, etc. — in the generated models. The Crossover, MutateSubtree and DepthPreservingSubtreeMutation PropagationOperators were modified to support the preservation of any embedded templates. However, only the top-level pattern is viewed as sacred.
  • Implemented a ResponsePlot function which is similar to ResponseSurfacePlot except that variables are plotted individually as 2D rather than as all 3D pairwise combinations. This is useful to get a quick overview of the response behavior when models or ensembles feature many input variables. As with ResponseSurfacePlot, the settings for the model variables which are not being plotted can greatly affect both the scale and response behavior. To address this, a CommonPlotRange option was introduced which will place all of the synthesized graphics on the same vertical scale.
  • Deleted the ResponseSurfaceParameters option for ResponseSurfacePlot, DivergenceSurfacePlot (and ResponsePlot) with each now using the DataVariableRange and (newly introduced) DataVariableReference option. Valid settings for the DataVariableReference (which specifies the setting for all DataVariables not being modified in a given graphic) are: a specified point, Automatic (which uses the midpoint of the DataVariableRange), Random (which generates a random point in parameter space), ModelMaximum or ModelMinimum. The latter two settings will search for the appropriate extremal response points and use those.
  • Added a CreateLinearModel function which creates a GPModel using the supplied or synthesized BasisSet. This is useful for creating reference conventional models for comparison to SymbolicRegression results.
  • Modified RandomGenomes and RandomModels to speed up model synthesis as well as increase the diversity of models synthesized. Five new options were implemented (TemplateTopLevel, BalancedTemplates, TemplateFunctionCount, TemplateDepth and SynthesisDepth) with AllowAtomicGenomes deleted. MinimumTreeDepth and MaximumTreeDepth now only apply to ExtractGenomeSubtrees.
  • Introduced BuildFunctionPatterns which uses FunctionPatternSynthesisRules (default associated with SymbolicRegression) to generate appropriate input for the FunctionPatterns option for SymbolicRegression. Several pattern sets ("BasicMath", "ExtendedMath", "PowerMath", etc.) have been pre-defined which can easily be mixed and extended to tailor the building blocks to the application characteristics. This is actually a really slick implementation since it allows the user to easily tweak the functional building blocks used in the model development.
  • Fixed a sin-of-omission so now RandomModels and RandomGenomes can handle all valid forms for the PopulationSize option. If a list of integers is supplied, the first number will be used as the targeted size.
  • Removed the Unique option for RandomModels since it was obsolete.
  • Modified the default FunctionPatterns so that summation and multiplication in RandomModels will have at least two arguments (and up to a MaximumArity of 5). Previously, it was easier to create models which had introns (non functional genetics) due to only having a single argument with summation and multiplication.
  • Modified RemoveModelScaling so that any ModelingObjectiveNames in the ModelPersonality are removed along with the ModelFitness being reset.
  • Fixed a bug introduced in Release 16.0 in RandomModels wherein the supplied variables were not properly weighted for selection during model synthesis. This would have been an issue for modeling systems with large numbers of input variables.
  • Uncovered a bug in SymbolicRegression wherein the ModelingVariables were all treated as having equal weights for RandomModel synthesis independent of any individual or class weighting.
  • Fixed a bug in MutateSubtree and DepthPreservingSubtreeMutation wherein the ModelFitness in the modified models was not being reset to Indeterminate.
  • Fixed a bug in CreateFittedEnsemble wherein SelectModels option defaults associated with CreateFittedEnsemble were not being passed through properly.
  • Fixed a bug in AlignModelExpression wherein option settings embedded in the ModelPersonality were not being used. This sin-of-omission rippled into other functions; however, it did not affect SymbolicRegression (where the model alignment typically occurs).
  • Modified the ParetoGP EvolutionStrategy so that both the archive and the final population are presented to the ResultsSelectionStrategy. This is important if a SecondaryModelingObjective has been used since moving to only considering the ModelingObjective can mean that some of the long tail models (e.g., overly complex low-dimensional models if a ModelDimensionality was used as the secondary objective) would not be of user interest.
  • Changed the default ResultsSelectionStrategy to return the 50% of developed models closest to the ParetoFront from the final population (and archive). This should return the entire archive used by ParetoGP along with some other models.
  • Changed the default DataSubsetSelectionFunction to be RandomSample rather than RandomKSubset since the two are equivalent and RandomSample is about three times faster.
  • Renamed the NumberOfCascades option for SymbolicRegression to be CascadesPerEvolution. This makes its name explicit as well as consistent with the related GenerationsPerRun, RunsPerCascade and IndependentEvolutions options.
  • Fixed a bug in MergeInputResponseData wherein if an atomic structure was supplied which did not pass an AtomQ test (e.g., \[Pi]/2), the supplied components would not be properly merged.
  • Fixed a bug in AbsoluteCorrelation wherein symbolic input would return Indeterminate even though those symbols (e.g., \[Pi]) would evaluate to being a real value. The revision also widens the implementation's speed advantage over the standard Correlation function.
  • Implemented support for TerminalSet -> None in SymbolicRegression, RandomModels and RandomGenomes. This facilitates modeling when only the variables are to be used in modeling.
  • Modified PolynomialBasisSet to allow PolynomialOrder, IncludeCrossTerms and IncludeConstantBasis to be supplied as options. Added the new symbols into the package documentation.
  • Modified ModelVariables (and VariablePresence when PresenceMetric -> Variables) to return the variables in the same order as produced by ModelInputVariables. This ripples into a number of other functions; however, the benefit is that model variables will be presented in the "natural order" defined by the input.
  • Implemented a Sigmoid function of the form x/(1+Abs@x). The definition of the Sigmoid is subject to change (e.g., to x/(1+x^2) or the classic (1-E^-x)/(1+E^-x)); however, this seems like a reasonable choice for a less discontinuous version of the UnitStep function.
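
The three candidate Sigmoid forms named in the last item can be compared directly. The Python sketch below transcribes the formulas as written in the release note; the function names are hypothetical and this is not the DataModeler implementation.

```python
import math


def sigmoid_abs(x):
    """x / (1 + |x|): the form adopted in this release; bounded in (-1, 1)."""
    return x / (1 + abs(x))


def sigmoid_rational(x):
    """x / (1 + x^2): one alternative mentioned; peaks at x = 1, then decays."""
    return x / (1 + x * x)


def sigmoid_classic(x):
    """(1 - e^-x) / (1 + e^-x), computed as tanh(x / 2) to avoid overflow."""
    return math.tanh(x / 2)


for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(x, sigmoid_abs(x), round(sigmoid_rational(x), 4), round(sigmoid_classic(x), 4))
```

All three forms are odd functions that pass through zero at the origin. The first and third increase monotonically toward ±1, whereas x/(1+x^2) returns toward zero for large |x|, so the choice affects how a model behaves far from the origin.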