Release news and events

Booth at the Mathematica Virtual Conference 2011

Saturday, September 10, 2011

Come visit our virtual booth at the FREE Wolfram Mathematica Virtual Conference 2011 on September 26 and 27, 2011! 3,500 people have already registered.

Technology Platform Provider for Life Sciences

Friday, September 2, 2011

Evolved Analytics Europe BVBA has been accredited as a member of the Flanders Bio organization and classified as a company providing a technology platform for the life sciences and biotechnology sectors in Flanders!

Our mission is to provide technology that enables the painless conversion of data into actionable results. Our systems implement the newest robust nonlinear modeling technology in a user-friendly and comprehensible way, and are targeted at accelerating basic research, product development, and policy development in the life sciences.

One-Touch Data Modeling at WTC 2011

Thursday, August 25, 2011

We will give a talk at the 2011 Wolfram Technology Conference (September 19-21, Urbana-Champaign, U.S.A.). This year we will present our 'next big thing', so controversial that it is rightfully entitled: "On Magic and Cognitive Dissonance: One-Touch Data Modeling."

After a decade of development and industrial application, we are closing in on our goal of one-touch conversion of real-world data into meaningful and insightful driver variable identification and models. Whether or not one has a Ph.D. in Computer Science or Statistics, it is now possible to easily develop robust nonlinear models, identify data outliers, and interactively explore the model dynamics and response sensitivities.

Let us know what you think!

DataModeler Release 26.0 (15 August 2011)

Tuesday, August 16, 2011

The release notes (found in the preface) are below. From a visual perspective, the big change is that the default setting of DataVariableLabels is now ColorizeList rather than Automatic. The effect is to color-code each of the model variables with a unique and consistent color. This seems to help in visualizing expressions; however, if you do not like the colors, the Automatic setting will display the embedded DataVariables in black. Please let us know whether you like or dislike this new default.

The other big change is a suite of functions for variable combination analysis (VariableCombinations, VariableCombinationTable, VariableCombinationMap, VariableCombinationChart and DriverVariableCombinations). The help files have some good examples; this one is huge in terms of ease of use. Speaking of help files, the quick start is worth a gander.
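To make the idea of variable combination analysis concrete, here is a minimal Python sketch. It is not DataModeler code and does not reproduce how VariableCombinations is implemented; it assumes only that each model can be reduced to the set of input variables it uses, and it tallies how often each combination occurs. The function name `variable_combinations` and the variable names `x1`, `x2`, `x3` are hypothetical.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set, Tuple


def variable_combinations(
    models: Iterable[Set[str]],
) -> List[Tuple[FrozenSet[str], int]]:
    """Tally how many models use each distinct combination of variables.

    Each model is represented only by the set of variable names it uses
    (an assumption made for this illustration). The result is sorted by
    descending count, with ties broken by the sorted variable names.
    """
    counts = Counter(frozenset(variables) for variables in models)
    return sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0])))


# Hypothetical model set: six models over three candidate inputs.
example_models = [
    {"x1", "x2"},
    {"x2", "x1"},
    {"x3"},
    {"x1", "x2", "x3"},
    {"x3"},
    {"x1", "x2"},
]

for combo, count in variable_combinations(example_models):
    print(sorted(combo), count)
```

A tally like this shows at a glance when several different input combinations produce models, which is the situation that arises with coupled inputs.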

We also bulked up the support for non-numeric data. DataModeler should be able to "just work" even if you throw ugly data at it, with missing elements, non-numeric columns and the like.

The more intelligent SelectionStrategy behavior default of Automatic is also pretty nice. The default assumption here is that if you specify a QualityBox, you want the 50% of models closest to the ParetoFront and, otherwise, you want to maintain the entire set of supplied models.
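The Automatic behavior described above can be sketched in Python as follows. This is an illustration under stated assumptions, not DataModeler's implementation: each model is reduced to a hypothetical (complexity, error) pair where smaller is better, a quality box is treated as an upper bound on each value, and "closest to the ParetoFront" is approximated by peeling off successive non-dominated layers until the requested fraction is reached. All function names here are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Quality = Tuple[float, float]  # (complexity, error); smaller is better for both


def dominates(a: Quality, b: Quality) -> bool:
    """True if a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(models: Sequence[Quality]) -> List[Quality]:
    """Return the models that no other model dominates, in input order."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


def closest_to_front(models: Sequence[Quality], fraction: float = 0.5) -> List[Quality]:
    """Collect whole Pareto layers until at least `fraction` of the models are kept."""
    target = math.ceil(fraction * len(models))
    remaining = list(models)
    kept: List[Quality] = []
    while remaining and len(kept) < target:
        layer = pareto_front(remaining)
        kept.extend(layer)
        remaining = [m for m in remaining if m not in layer]
    return kept


def select_models(
    models: Sequence[Quality],
    quality_box: Optional[Quality] = None,
    fraction: float = 0.5,
) -> List[Quality]:
    """Sketch of an 'Automatic' strategy: keep everything unless a box is given."""
    if quality_box is None:
        # No quality box: maintain the entire supplied model set.
        return list(models)
    # Quality box given: restrict to the box, then keep the models nearest the front.
    in_box = [m for m in models if all(v <= lim for v, lim in zip(m, quality_box))]
    return closest_to_front(in_box, fraction)
```

With no quality box the call returns its input unchanged, so applying it repeatedly does not shrink the model set.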

As always, thank you for your questions, bug reports and suggestions.

Release Notes:

We have now completed the documentation through the quick start tutorial along with all of the function ref pages. Along the way, new capabilities were implemented, implementations refined and some bugs squashed. The new default behavior for SelectModels is a big improvement in terms of ease-of-use and the suite of functions around VariableCombinations has quickly been incorporated into our best practices. The help pages have not yet been refreshed to show the effect of the new variable color-coding (with the exception of the quick start tutorial).

  • Implemented a ColorizeList function which allows applying position-specific colors to the elements of a list. Additional formatting appropriate for Style can also be applied using this function.
  • ColorizeList is now the default setting for the SymbolicRegression DataVariableLabels option. The new behavior is to color-code DataVariables, which helps in the visual exploration of model expressions. If you don't like the colors, the previous DataVariableLabels -> Automatic setting will tone things down.
  • Modified SelectModels to support a SelectionStrategy -> Automatic option setting (which is now the default behavior). With this setting, ParetoFrontSelect will be used unless QualityBox -> All (which is the default) is specified, in which case AllModels will be used. The previous default of ParetoFrontSelect meant that successive applications would result in successive trimming of the supplied model set.
  • Implemented five new functions targeted at variable combination analysis: VariableCombinations, DriverVariableCombinations, VariableCombinationMap, VariableCombinationChart, VariableCombinationTable. These will be especially useful when coupled inputs enable multiple input combinations to produce quality models.
  • Implemented four new functions targeted at data subset extraction: NumericDataRecords, NumericDataRecordIndices, NonNumericDataRecords and NonNumericDataRecordIndices. These are especially nice if models are supplied along with the data, since only the variables used in the models will be considered when deciding whether a data record is numeric or non-numeric.
  • Modified the data outlier analysis suite (DataStrangeness, DataOutlierAnalysis, DataOutliers, DataOutlierIndices and DataOutlierTable) to accommodate data with non-numeric elements as well as models derived during the exploratory analysis form of SymbolicRegression where the targeted response is embedded as a column within the supplied data set.
  • Implemented support for DriverVariableCombinations to be specified as AllowedVariables or RequiredVariables for SelectModels (and functions like NicheModels, VariablePresenceMap, etc. which build upon SelectModels).
  • Implemented support for DriverVariableCombinations to be specified as ModelingVariables for SymbolicRegression, ClassicGP, ParetoGP or KeijzerExpansion. This will be useful when using modeling results to focus exploration for subsequent modeling runs.
  • Fixed a bug in DriverVariables wherein options were not properly handled if a SignificanceLevel threshold was not explicitly supplied. (If the threshold was explicitly supplied, which would be the normal usage, the function worked properly.)
  • Modified ParetoFrontPlot and ParetoFrontLogPlot to allow a setting of ToolTipFunction -> None which will suppress all tooltips. Previously, tooltips would be maintained for ParetoFront models even if the ToolTipLimit was set to None.
  • Fixed a bug in LabelForm wherein a negative real-valued coefficient of expressions or subexpressions would appear to be subtracted from the expression.
  • Fixed a bug in SymbolicRegression wherein, if the modeling ran up against the TimeConstraint (which would be the default behavior), more models would be returned than intended because the intermediate results in place at that point were merged. The new behavior is to isolate the final PopulationSize of models closest to the ParetoFront of the developmental criteria (ModelingObjective + SecondaryModelingObjective) and then select from this set using the RunResultsSelectionStrategy considering only the ModelingObjective criteria. This will trim the number of models returned from SymbolicRegression by a factor of two or three relative to the previous behavior.
  • The nuisance bug where Mathematica would report that it couldn't parse an input variable (despite having successfully done so thousands of times before) was not tracked down so no bug report has been submitted to WRI. However, we have modified the processing flow so that, hopefully, the opportunity for this misbehavior has been minimized or eliminated.
  • Modified RetrieveModelSets so that UniqueFitnessModels is automatically applied if MergeModelSets is enabled (which is the default). This helps to minimize duplicates which will especially occur for the low-complexity models.
  • Extended UncorrelatedModels to handle missing or non-numeric data elements as well as models where the TargetColumn is embedded within the supplied data set.
  • Modified DriverVariables so that mixed lists of GPModel and ModelEnsemble may be supplied.
  • Fixed a bug in MakeDataNumeric where numeric rational entries were not handled properly.
  • Fixed an intermittent bug in AlignModelExpression in which the alignment (for a small fraction of models) was inverted if there were missing elements in the supplied input-output data. This would ripple into AlignModel and, thence, into the output from SymbolicRegression.
  • Fixed a bug in AlignModel in which it was being overly restrictive about the supplied data structure.
  • Generalized MergeInputResponseData to handle more arbitrary data structures. Previously, it was assumed the data being merged were either atoms, vectors or matrices.
  • Modified SelectModels to use symbols supplied for the AllowedVariables, RequiredVariables or ExcludedVariables directly. Previously, CreateDataVariableNames would have been applied.
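The data subset extraction item in the list above notes that, when models are supplied, only the variables those models use decide whether a record counts as numeric. A minimal Python sketch of that idea follows. It is an illustration only, not DataModeler code: the function names, the `used_columns` parameter, and the rule that missing values and NaN count as non-numeric are all assumptions, and the indices are zero-based as is usual in Python.

```python
import math
from numbers import Real
from typing import List, Optional, Sequence


def is_numeric(value: object) -> bool:
    """Treat finite real numbers as numeric; strings, None, bools and NaN are not."""
    if isinstance(value, bool) or not isinstance(value, Real):
        return False
    return not isinstance(value, float) or math.isfinite(value)


def numeric_record_indices(
    records: Sequence[Sequence[object]],
    used_columns: Optional[Sequence[int]] = None,
) -> List[int]:
    """Return zero-based indices of records that are numeric in the checked columns.

    If `used_columns` is None every column is checked; otherwise only the
    listed columns (for example, those referenced by a set of models) are.
    """
    indices = []
    for i, row in enumerate(records):
        columns = range(len(row)) if used_columns is None else used_columns
        if all(is_numeric(row[c]) for c in columns):
            indices.append(i)
    return indices


def non_numeric_record_indices(
    records: Sequence[Sequence[object]],
    used_columns: Optional[Sequence[int]] = None,
) -> List[int]:
    """Complement of numeric_record_indices over the same records."""
    numeric = set(numeric_record_indices(records, used_columns))
    return [i for i in range(len(records)) if i not in numeric]
```

Restricting the check to the columns a model actually uses keeps records that would otherwise be discarded because of a bad value in an irrelevant column.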

DataModeler Release 24.0 (28 April 2011)

Thursday, April 28, 2011

At this point we should be feature-complete for the official release. The function documentation has been checked and brought up-to-date; the last step will be a sweep through the tutorials & case studies.

There is not much in the way of changes with this release; however, we do have one major change: the BoxRegion option used to select models has been renamed QualityBox, and the BoxRegion toggle used to constrain the extremal search in, for example, ModelExtrema, has been renamed BoxBoundary.

The other highlight is a couple of new functions, DriverVariables and RearrangeModelQuality, which should prove convenient for developing a ModelEnsemble as well as for monitoring the evolutionary search progress.

These and the other changes are in the release notes extract below:

  • (major change) Split the current BoxRegion option into two new option names: QualityBox and BoxBoundary. Although this is a fairly major change, it avoids future problems since BoxRegion, even though it is undocumented, is a Mathematica system variable and, therefore, out of DataModeler's control. QualityBox is now used by SelectModels (and all functions which build upon it) to define the region of ModelQuality from which models should be selected. BoxBoundary is an option for ModelMaximum, ModelMinimum and ModelExtrema which defines whether the extremal search should be restricted to the DataVariableRange. Although this will require changing old analyses, the time to change is now, so apologies for the inconvenience and the need to retrain the muscle memory.
  • Implemented a new function, DriverVariables, to make it easier to select interesting models for inclusion in ensembles or to specify ModelingVariables for secondary rounds of SymbolicRegression. Although the equivalent could be accomplished using appropriate option settings for VariablePresence, the new form is much cleaner and more straightforward because of its restricted scope.
  • Added a new function, RearrangeModelQuality, which will restructure the ModelQuality as well as make the appropriate adjustments in the ModelPersonality for the ModelingObjective and ModelingObjectiveNames. This function is useful when looking at monitoring results from a SymbolicRegression since it can suppress the SecondaryModelingObjective (typically, ModelAge) which is used during the modeling evolution.
  • Made RobustCorrelationMatrix a little more general.
  • Deleted the $ClassicGPExplore, $ClassicGPIntensive, $ClassicGPQuick, $ParetoGPExplore, $ParetoGPIntensive, $ParetoGPQuick, $OrdinalGPExplore, $OrdinalGPIntensive and $OrdinalGPQuick pre-defined option sets for SymbolicRegression since the need for explicitly specifying the number of targeted generations has been obviated by the current default option settings and the continual innovation offered. For most regression problems, users now really only need to specify the TimeConstraint and whether StoreModelSet should be activated (and, possibly, tweak the FunctionPatterns to adjust the model search to the problem domain). However, the $ConventionalGP option set has been maintained in case the user would like to slow things down by a couple of orders of magnitude.
  • Extended UpdateModelQuality, EvaluateModelQuality, UpdateModelQualityVsMultipleDataSets and EvaluateModelQualityVsMultipleDataSets to accommodate data matrices where the targeted response is embedded within the supplied data set.
  • Modified KeijzerExpansion to accept data matrices with embedded target response. Also, now supplied options will be embedded in the developed models.
  • Modified CreateModelEnsemble to only require UniqueModels rather than UniqueAndFitModels if building a ModelEnsemble from only a supplied model set (i.e., without a data input-output set to guide the ensemble selection).