News archive for March 2010

DataModeler Release 18.0

Tuesday, March 16, 2010

Documentation completion continues to be the priority theme in this release of 16 March 2010. Additional changes and bug fixes are the following:

  • Fixed a bug in Crossover wherein if it was supplied with a list of a single GPModel it would return two models. Now it will return a list of a single model.
  • Modified Crossover so that the ModelAge is based upon the parent that donated the root node rather than the maximum of the two parents. This seemed more reasonable given that the root determines the fundamental structure of the resulting model.
  • Added three new functions: FibonacciSpread, FibonacciSequence and InverseFibonacci. These facilitate generating non-uniform indices which are more heavily represented for smaller numbers. These can be useful for generating indices for lag matrices for time series data analysis as well as generating the ModelAgeBracket boundaries.
  • Added a ModelAgeBracket function which classifies models according to the specified ModelAgeBracketBoundaries. The default boundaries use a FibonacciSpread result (e.g., {0,2,7,30,121,493,2000}). ModelAgeBracket is useful as a SecondaryModelingObjective for SymbolicRegression to promote continual innovation.
  • Modified the behavior of the SecondaryModelingObjective so that a symbol can be supplied as an option setting. Now, for example, ModelDimensionality will automatically be converted into a functional form, ModelDimensionality[##]&. Of course, None will continue to suppress the use of a secondary objective. ModelAge, ModelAgeBracket, ModelDimensionality and ModelNonlinearity were also modified to accept (and ignore) the spurious model and observed response vectors which would be supplied during SymbolicRegression.
  • Modified CreateModelFromExpression so that any supplied options will be automatically embedded in the ModelPersonality of the returned GPModel(s).
  • Modified UpdateModelPersonality and ReplaceModelPersonality so that multiple options may be supplied to a model or a model set rather than forcing the personality aspects to be enclosed in a list.
  • Changed the default SymbolicRegression options so that now a ClassicGP EvolutionStrategy is used with a SecondaryModelingObjective of ModelAgeBracket. The default ModelAgeBracketBoundaries are FibonacciSpread[2000,7].
  • Fixed a subtle bug in ModelExpression wherein it would take three orders-of-magnitude longer than it should have. Of course, the resulting timing impact rippled into all sorts of other functions.
  • Fixed a bug in OptimizeModel & OptimizeModelExpression wherein a small fraction of model forms would fail if OptimizeIntegers -> True.
  • Modified SubSample and SmallPlot to use DataSegments and DataSegmentFunction options rather than the previous DataSubsetSize and DataSubsetSelectionFunction, respectively, since the convention for their use was at odds with that used by SymbolicRegression (which still uses the old option names). The revised option names more clearly represent the option functionality.
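
The FibonacciSpread and ModelAgeBracket items above describe boundaries that are dense for small values and sparse for large ones. The following Python sketch is one possible reconstruction, not the DataModeler implementation: it assumes the spread is evenly spaced in Fibonacci-index space using Binet's approximation F(x) ≈ φ^x/√5, and it assumes brackets are closed on their lower bound. The names fibonacci_spread and age_bracket are hypothetical. Under those assumptions the sketch reproduces the example boundaries quoted for FibonacciSpread[2000,7].

```python
import math
from bisect import bisect_right

SQRT5 = math.sqrt(5.0)


def fibonacci_spread(upper, count):
    """Return `count` rounded values spaced evenly in Fibonacci-index space.

    Assumption: Binet's approximation F(x) ~ phi**x / sqrt(5) is used, so the
    k-th value is (upper * sqrt(5)) ** (k / (count - 1)) / sqrt(5).
    """
    scaled = upper * SQRT5
    return [round(scaled ** (k / (count - 1)) / SQRT5) for k in range(count)]


def age_bracket(age, boundaries):
    """Index of the bracket containing `age` (assumed: lower bound inclusive)."""
    return max(bisect_right(boundaries, age) - 1, 0)


boundaries = fibonacci_spread(2000, 7)
print(boundaries)
print([age_bracket(a, boundaries) for a in (0, 1, 2, 10, 500, 5000)])
```

Because many ages fall into the same bracket, models of broadly similar age tie on this objective, which is presumably what lets it act as a coarse secondary objective rather than a strict ordering.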

DataModeler Release 17.0

Wednesday, March 3, 2010

Documentation completion continues to be the priority theme in this release of 3 March 2010. Additionally, a number of changes and enhancements were made in the process:

  • Implemented support for templating. Towards this end, a TemplateTopLevel option for SymbolicRegression was implemented which facilitates forcing a desired output form — e.g., a conditional, exponential, etc. — in the generated models. The Crossover, MutateSubtree and DepthPreservingSubtreeMutation PropagationOperators were modified to support the preservation of any embedded templates. However, only the top-level pattern is viewed as sacred.
  • Implemented a ResponsePlot function which is similar to ResponseSurfacePlot except that variables are plotted individually as 2D rather than as all 3D pairwise combinations. This is useful to get a quick overview of the response behavior when models or ensembles feature many input variables. As with ResponseSurfacePlot, the settings for the model variables which are not being plotted can greatly affect both the scale and response behavior. To address this, a CommonPlotRange option was introduced which will place all of the synthesized graphics on the same vertical scale.
  • Deleted the ResponseSurfaceParameters option for ResponseSurfacePlot, DivergenceSurfacePlot (and ResponsePlot) with each now using the DataVariableRange and (newly introduced) DataVariableReference options. Valid settings for the DataVariableReference (which specifies the setting for all DataVariables not being modified in a given graphic) are: a specified point, Automatic (which uses the midpoint of the DataVariableRange), Random (which generates a random point in parameter space), ModelMaximum or ModelMinimum. The latter two settings will search for the appropriate extremal response points and use those.
  • Added a CreateLinearModel function which creates a GPModel using the supplied or synthesized BasisSet. This is useful for creating reference conventional models for comparison to SymbolicRegression results.
  • Modified RandomGenomes and RandomModels to speed up model synthesis as well as increase the diversity of models synthesized. Five new options were implemented (TemplateTopLevel, BalancedTemplates, TemplateFunctionCount, TemplateDepth and SynthesisDepth) with AllowAtomicGenomes deleted. MinimumTreeDepth and MaximumTreeDepth now only apply to ExtractGenomeSubtrees.
  • Introduced BuildFunctionPatterns which uses FunctionPatternSynthesisRules (default associated with SymbolicRegression) to generate appropriate input for the FunctionPatterns option for SymbolicRegression. Several pattern sets ("BasicMath", "ExtendedMath", "PowerMath", etc.) have been pre-defined which can easily be mixed and extended to tailor the building blocks to the application characteristics. This is actually a really slick implementation since it allows the user to easily tweak the functional building blocks used in the model development.
  • Fixed a sin-of-omission so now RandomModels and RandomGenomes can handle all valid forms for the PopulationSize option. If a list of integers is supplied, the first number will be used as the targeted size.
  • Removed the Unique option for RandomModels since it was obsolete.
  • Modified the default FunctionPatterns so that summation and multiplication in RandomModels will have at least two arguments (and up to a MaximumArity of 5). Previously, it was easier to create models which had introns (non-functional genetics) due to only having a single argument with summation and multiplication.
  • Modified RemoveModelScaling so that any ModelingObjectiveNames in the ModelPersonality are removed along with the ModelFitness being reset.
  • Fixed a bug introduced in Release 16.0 in RandomModels wherein the supplied variables were not properly weighted for selection during model synthesis. This would have been an issue for modeling systems with large numbers of input variables.
  • Uncovered a bug in SymbolicRegression wherein the ModelingVariables were all treated as having equal weights for RandomModel synthesis independent of any individual or class weighting.
  • Fixed a bug in MutateSubtree and DepthPreservingSubtreeMutation wherein the ModelFitness in the modified models was not being reset to Indeterminate.
  • Fixed a bug in CreateFittedEnsemble wherein SelectModels option defaults associated with CreateFittedEnsemble were not being passed through properly.
  • Fixed a bug in AlignModelExpression wherein option settings embedded in the ModelPersonality were not being used. This sin-of-omission rippled into other functions; however, it did not affect SymbolicRegression (where the model alignment typically occurs).
  • Modified the ParetoGP EvolutionStrategy so that both the archive and the final population are presented to the ResultsSelectionStrategy. This is important if a SecondaryModelingObjective has been used since moving to only considering the ModelingObjective can mean that some of the long tail models (e.g., overly complex low-dimensional models if a ModelDimensionality was used as the secondary objective) would not be of user interest.
  • Changed the default ResultsSelectionStrategy to return the 50% of developed models closest to the ParetoFront from the final population (and archive). This should return the entire archive used by ParetoGP along with some other models.
  • Changed the default DataSubsetSelectionFunction to be RandomSample rather than RandomKSubset since the two are equivalent and RandomSample is about three times faster.
  • Renamed the NumberOfCascades option for SymbolicRegression to be CascadesPerEvolution. This makes its name explicit as well as consistent with the related GenerationsPerRun, RunsPerCascade and IndependentEvolutions options.
  • Fixed a bug in MergeInputResponseData wherein if an atomic structure was supplied which did not pass an AtomQ test (e.g., \[Pi]/2), the supplied components would not be properly merged.
  • Fixed a bug in AbsoluteCorrelation wherein symbolic input would return Indeterminate even though those symbols (e.g., \[Pi]) would evaluate to being a real value. The revision also widens the implementation's existing speed advantage over the standard Correlation function.
  • Implemented support for TerminalSet -> None in SymbolicRegression, RandomModels and RandomGenomes. This facilitates modeling when only the variables are to be used in modeling.
  • Modified PolynomialBasisSet to allow PolynomialOrder, IncludeCrossTerms and IncludeConstantBasis to be supplied as options. Added the new symbols into the package documentation.
  • Modified ModelVariables (and VariablePresence when PresenceMetric -> Variables) to return the variables in the same order as produced by ModelInputVariables. This ripples into a number of other functions; however, the benefit is that model variables will be presented in the "natural order" defined by the input.
  • Implemented a Sigmoid function of the form x/(1+Abs@x). The definition of the Sigmoid is subject to change (e.g., to x/(1+x^2) or the classic (1-E^-x)/(1+E^-x)); however, this seems like a reasonable choice for a less discontinuous version of the UnitStep function.
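
To make the three candidate Sigmoid definitions mentioned in the last item easier to compare, here is a small Python sketch that evaluates each at a few points. It is an illustration only, not DataModeler code, and the function names softsign, rational_form and classic_form are hypothetical labels chosen for this example.

```python
import math


def softsign(x):
    """x / (1 + |x|): the form the release notes give for Sigmoid."""
    return x / (1.0 + abs(x))


def rational_form(x):
    """x / (1 + x^2): the first alternative mentioned in the notes."""
    return x / (1.0 + x * x)


def classic_form(x):
    """(1 - e^-x) / (1 + e^-x), which is algebraically tanh(x / 2)."""
    return math.tanh(x / 2.0)


# Print one row per sample point: x, then the three candidate values.
for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"{x:6.1f} {softsign(x):8.4f} {rational_form(x):8.4f} {classic_form(x):8.4f}")
```

All three pass through zero at the origin. The first and third saturate toward -1 and +1 for large |x|, whereas x/(1+x^2) peaks in magnitude at |x| = 1 and then decays back toward zero.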