Industrial Strength Data Modeling

The bottom line for nonlinear modeling in the industrial world is fairly simple:

  • the world is nonlinear (exploring the problem structure is important);
  • the world is full of noise (uncertainty is everywhere);
  • the world is complex (we almost never have all the information);
  • decisions are multi-objective (interaction between objectives makes the search difficult);
  • problems are multi-dimensional (getting to solutions is difficult and exploration is essential);
  • people time is expensive;
  • computing time is cheap;
  • life does not have to be hard;
  • nonlinear modeling has unique capabilities and success has been demonstrated in the real world.

As shown below, there are many aspects to a good model. However, the contention here is that algorithms and computers have advanced far enough that it is appropriate to shift more of the modeling burden onto the machines. Furthermore, nonlinear modeling techniques have unique advantages over their linear peers in the quality of the developed models, the auxiliary insights they provide and total-cost-of-ownership (model development + deployment + maintenance).

By accurate we, of course, mean that the model replicates the observed data behavior. Tightly coupled with this are the notions of robustness (the ability to withstand perturbations and changes in the underlying system) and credibility (the ability to produce rational predictions).

For example, if a model predicts that the best place to plant corn is four inches above the ground then the model will not be credible and the modeler will, at best, not be asked back for future work.

Credibility is assisted if the model is interpretable; that is, the proper (from a human insight perspective) variable combinations are being used and the response and limit behaviors are reasonable. The ideal in this respect is a simple mathematical expression which can be reviewed and explored for insight.

Empirical models are generally quite good for interpolation between data points. Extrapolation, however, can be a major issue: outside the data there are no constraints on the model behavior, and nonsensical results can quickly be generated. Thus, we would like our models to be extrapolative and to degrade gracefully when moving outside of the development regime. Related to this is the need for self-assessment, so that the user can be warned when the model output should be viewed with suspicion.
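
One very simple form of self-assessment is to remember the range of each input variable seen during model development and to flag any query that falls outside it. The sketch below is only an illustration of that idea, not the method used by any particular tool: the class name GuardedModel, its fields and the toy model are all hypothetical, and a per-variable range check is a crude stand-in for more careful measures of how far a query lies from the development data.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class GuardedModel:
    """Hypothetical wrapper: a model plus a range-based extrapolation check."""
    predict_fn: Callable[[Sequence[float]], float]
    lower: List[float]  # per-variable minimum seen during development
    upper: List[float]  # per-variable maximum seen during development

    @classmethod
    def from_training_data(cls, predict_fn, rows):
        # Transpose the rows so each tuple holds all values of one variable.
        columns = list(zip(*rows))
        return cls(predict_fn,
                   [min(c) for c in columns],
                   [max(c) for c in columns])

    def predict(self, x: Sequence[float]) -> Tuple[float, List[int]]:
        # Indices of the input variables that lie outside the development range.
        outside = [i for i, v in enumerate(x)
                   if not self.lower[i] <= v <= self.upper[i]]
        return self.predict_fn(x), outside


# Toy development data (two input variables) and a toy model, both assumed.
training_rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 15.0)]
model = GuardedModel.from_training_data(
    lambda x: 2.0 * x[0] + 0.1 * x[1], training_rows)

inside_value, inside_flags = model.predict((2.5, 12.0))    # interpolation
outside_value, outside_flags = model.predict((5.0, 12.0))  # variable 0 extrapolates

if outside_flags:
    print("Warning: extrapolating in variables", outside_flags)
```

An empty list of flagged variables means the query lies inside the bounding box of the development data. It does not guarantee that the query is close to any actual data point, so a check like this can only raise suspicion, never remove it.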

Finally, and perhaps most important, the model should be cost-effective. A total-cost-of-ownership assessment needs to consider the economic benefit, the deployment and maintenance costs, and the costs associated with the actual model development.

Further important topics of real-world nonlinear modeling are: