Dealing with Data Deluge
Lots of variables. Little time. Lots of pressure. Which variables really matter? What do they mean? Are there outliers? What should be done with correlated inputs? How much do I know about my problem? What exactly don't I know about it, and how can I change that? Can I trust my conclusions? These questions arise in almost every data-driven industrial project.
Solving industrial problems by making sense of data and turning it into value is our speciality.
Our technology will be of interest to
- everyone who has ever stared at a spreadsheet full of data;
- everyone seeking an efficient, robust, and effective empirical modeling workflow;
- everyone searching for a reliable variable selection methodology to reduce the dimensionality of the design space when correlated variables are present;
- everyone hunting for outliers in the data, because they may be precious nuggets of information;
- everyone working on a novel product for which fundamental models are not available;
- everyone who wants more time to think and would rather let data science propose model structures for regression problems;
- everyone who accepts that the world is non-linear and looks for non-linear relationships in the data;
- everyone who wants to accelerate fundamental research with insights drawn from empirical observations;
- everyone who needs to know where to sample their next experiments.
The fundamental issue is that most modeling approaches make a series of idealizing assumptions to keep variable selection and modeling tractable. Unfortunately, the real world generally does not know, for example, that the variables have to be uncorrelated and independent, or that the appropriate response behavior is a second-order polynomial with no cross-terms. Our technology relaxes those assumptions, letting the data speak for themselves to automatically select the appropriate inputs and develop concise, insightful models. The resulting models carry trust metrics that identify when their predictions should not be trusted, for example because the system has entered a new operating region or has changed fundamentally. Model trustworthiness can also be used in an active design-of-experiments mode, guiding the collection of new data that drives uncertainty out of the models.
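This section does not say how its trust metrics are computed. One common approach, assumed here purely for illustration and not necessarily the method behind this technology, is ensemble disagreement: fit several models to resampled data, treat the spread of their predictions as a trust signal, and propose the next experiment where that spread is largest. The sketch uses bootstrapped straight-line fits in plain Python; `fit_line`, `bootstrap_ensemble`, `disagreement`, and `next_experiment` are hypothetical names.

```python
import random
import statistics

# Assumption: "trust" is measured as ensemble disagreement (illustrative only).


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def bootstrap_ensemble(xs, ys, n_models=20, seed=0):
    """Fit one line per bootstrap resample of the data (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    while len(models) < n_models:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # all x identical: slope undefined, so resample
            continue
        models.append(fit_line(bx, [ys[i] for i in idx]))
    return models


def disagreement(models, x):
    """Spread of ensemble predictions at x; a large value signals low trust."""
    return statistics.pstdev([a + b * x for a, b in models])


def next_experiment(models, candidates):
    """Pick the candidate input where the ensemble disagrees most."""
    return max(candidates, key=lambda x: disagreement(models, x))


if __name__ == "__main__":
    noise = random.Random(1)
    xs = [i / 10 for i in range(11)]  # observed operating region: 0.0 to 1.0
    ys = [1 + 2 * x + noise.gauss(0, 0.1) for x in xs]
    models = bootstrap_ensemble(xs, ys)
    for x in (0.5, 2.0, 5.0):
        print(f"x={x}: disagreement={disagreement(models, x):.3f}")
    print("next experiment at x =", next_experiment(models, [0.5, 2.0, 5.0]))
```

Far outside the observed range (0 to 1) the fitted lines fan out, so disagreement grows and the sketch proposes x = 5.0 as the most informative next sample. Richer, non-linear model ensembles could slot into the same selection loop.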