Pushing the Envelope: Advanced Strategies in Data Modeling (WTC 2010)

Mark Kotanchek's talk from the 2010 Wolfram Technology Conference is available as a Mathematica presentation and as a video.

Abstract: Real-world data tends to violate many of the fundamental assumptions of most data modeling techniques, e.g., that the supplied inputs are abundant, independent, and uncorrelated, that the error distributions are known, and that all of the supplied inputs are pertinent to the modeling task at hand. Although the multiobjective symbolic regression of DataModeler can fairly seamlessly develop meaningful and useful models from such ill-behaved data, there remain systems and data sets that require human insight and guidance in model development.

In this talk we look at how to address problems such as multimodal systems (e.g., human taste evaluators who prefer different types of flavors) and fundamental response limits imposed by physical or economic constraints (e.g., one cannot buy a negative amount of insurance). We will also discuss data-related issues such as incomplete data records, unbalanced data (i.e., data that is far from a designed experiment), and cases where many variables are intrinsic drivers of the target system's behavior.