Decision Science is an iterative process that constantly evaluates the problem being solved, the data being used, and the models or outputs being produced. This constant refinement helps ensure the outcome of the decision science process is accurate and relevant to the organization. Each step uses various tools and techniques that draw from a range of practice areas, including Human-Centered Design, data manipulation, data visualization, data modeling/machine learning, and software engineering.
Normally, each step takes a significant amount of time and resources, and most steps involve many non-decision scientists who give input from business, leadership, IT, and data perspectives. Projects may take weeks or months before yielding end results – but intermediate results still provide value along the way.
This image shows the iterative progression of the five steps in the Decision Science process, along with an underlying effort of Tool Building/Dev Ops. Each iteration through the steps should build on the knowledge gained and tools built in previous iterations, ensuring that progress is always being made. Below the image is a brief description of each step and questions that should be answered, or techniques that should be used, during that step.
Problem Discovery
Understand the question, its context, and its value.
A decision science project must begin by investigating the question being asked to determine the validity of moving forward on the project.
- Is it the correct question?
- Who is asking the question?
- What is the asker really trying to learn or understand?
- What is the asker’s role in the project/business?
- What value will the answer to this question provide to the project/business?
- Does the necessary data exist, and is it available? (quick evaluation)
Data
Identification, Collection, Evaluation. Includes metadata and audit trail.
A decision science project must have sufficient data, in quantity and quality, related to the question in order to produce meaningful results.
- Does data related to the question exist and how much is there?
- Who produces/owns that data and will they share it?
- Where and how is it stored, and can it be accessed?
- What is the integrity, fidelity, and recency of the data?
- Will more data be collected as the project continues, and does it need to be incorporated in real time?
- Identify missing values, and drop or impute data.
- Standardize values to consistent formats.
- Restructure and merge data into common data structures.
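The three cleaning techniques listed above (imputing missing values, standardizing formats, and merging into a common structure) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the records, the field names (`store_id`, `date`, `units`, `region`), and the choice of median imputation are all hypothetical, and a real project would likely use a dataframe library instead.

```python
from datetime import datetime
from statistics import median

# Hypothetical raw records; field names and values are invented for illustration.
sales = [
    {"store_id": "a1", "date": "2023-01-05", "units": 12},
    {"store_id": "A1 ", "date": "01/06/2023", "units": None},  # missing value
    {"store_id": "b2", "date": "2023-01-05", "units": 30},
]
stores = [
    {"store_id": "A1", "region": "North"},
    {"store_id": "B2", "region": "South"},
]


def parse_date(text):
    """Standardize the two date formats seen above to ISO 8601 strings."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def clean(records):
    """Impute missing unit counts with the median and standardize formats."""
    known = [r["units"] for r in records if r["units"] is not None]
    fill = median(known)  # assumption: median imputation is acceptable here
    return [
        {
            "store_id": r["store_id"].strip().upper(),
            "date": parse_date(r["date"]),
            "units": fill if r["units"] is None else r["units"],
        }
        for r in records
    ]


def merge(left, right, key):
    """Left-join two lists of dicts on a shared key."""
    lookup = {row[key]: row for row in right}
    return [{**row, **lookup.get(row[key], {})} for row in left]


merged = merge(clean(sales), stores, "store_id")
for row in merged:
    print(row)
```

Whether to drop or impute a missing value depends on the question being asked; the median is used here only because it is simple and robust to outliers.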
Exploratory Data Analysis (EDA)
Visualization and exploration to identify patterns and relationships.
A decision science project must identify and understand patterns and relationships in the data in order to exploit them for modeling and deriving insights.
- Visualize the data (scatterplots, histograms).
- Apply dimensionality reduction techniques.
- Apply transformations.
- Look for correlations (various kinds).
- Investigate subpopulations.
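As a rough illustration of the techniques above, the following sketch applies a log transformation, compares Pearson correlations before and after it, and summarizes two subpopulations. The data, the column meanings (segment, ad spend, revenue), and the helper `pearson` are hypothetical and exist only for this example.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical observations: (segment, ad_spend, revenue).
rows = [
    ("retail", 10.0, 110.0),
    ("retail", 20.0, 205.0),
    ("retail", 40.0, 390.0),
    ("online", 10.0, 300.0),
    ("online", 100.0, 620.0),
    ("online", 1000.0, 905.0),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


spend = [r[1] for r in rows]
revenue = [r[2] for r in rows]

# Correlation on the raw scale versus after a log transformation of spend.
raw_r = pearson(spend, revenue)
log_r = pearson([math.log10(s) for s in spend], revenue)

# Subpopulations: mean revenue per segment.
by_segment = defaultdict(list)
for segment, _, rev in rows:
    by_segment[segment].append(rev)
segment_mean = {segment: mean(values) for segment, values in by_segment.items()}

print(f"raw r = {raw_r:.3f}, log r = {log_r:.3f}")
print(segment_mean)
```

In this made-up data the relationship is closer to linear on the log scale, so the transformed correlation is the stronger of the two; real data may behave differently, which is why plotting before and after a transformation is worthwhile.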
Model Development
Iteratively train and evaluate models.
A decision science project may require training and evaluating dozens of models with hundreds of combinations of features and hyperparameters.
- Split data into training and test sets.
- Train models on training set.
- Try different types of models.
- Try different combinations of features or generate new features through feature engineering.
- Try different combinations of hyperparameters.
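The loop described above can be sketched with a small k-nearest-neighbors classifier written in plain Python. Everything here is an assumption made for illustration: the synthetic two-cluster data, the model type, and the candidate values of the hyperparameter `k`. The sketch also holds out a separate validation set for choosing `k`, so that the test set is used only once for the final estimate.

```python
import math
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def make_points(center, label, n):
    """Draw n two-dimensional points around a center with unit Gaussian noise."""
    return [
        ([rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)], label)
        for _ in range(n)
    ]


# Hypothetical data set: two well-separated clusters.
data = make_points((0.0, 0.0), "low", 50) + make_points((4.0, 4.0), "high", 50)
rng.shuffle(data)

# Split into training, validation, and test sets (60/20/20).
train, valid, test = data[:60], data[60:80], data[80:]


def predict(train_set, point, k):
    """Majority vote among the k training points closest to `point`."""
    nearest = sorted(train_set, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_set, eval_set, k):
    """Fraction of eval_set points classified correctly."""
    hits = sum(predict(train_set, x, k) == y for x, y in eval_set)
    return hits / len(eval_set)


# Try several hyperparameter values and keep the best on the validation set.
scores = {k: accuracy(train, valid, k) for k in (1, 3, 5, 7)}
best_k = max(scores, key=scores.get)

# Final, one-time evaluation on the untouched test set.
test_accuracy = accuracy(train + valid, test, best_k)
print(scores, best_k, test_accuracy)
```

The same pattern extends to trying different model types or feature combinations: each candidate is scored on the validation set, and only the chosen one is evaluated on the test set.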
Communication
Communicate results to facilitate data-driven decisions.
Communicating to stakeholders:
- Reports/presentations for individual decisions.
- Dashboards for frequent decisions.
Communicating to services:
- Decision Science as a Service models for programmatic decisions.
- Expose APIs to apps to deliver results to end users.
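One possible shape for exposing results to an app is a small HTTP endpoint that accepts features as JSON and returns a score. The sketch below uses only Python's standard library; the `score` function is a stand-in for a trained model, and its weights, the feature names (`recency`, `frequency`), and the port are hypothetical. A production service would add authentication, logging, and input validation beyond what is shown.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(features):
    """Stand-in for a trained model: a hypothetical linear score."""
    weights = {"recency": -0.5, "frequency": 1.5}  # invented for illustration
    return sum(weights[name] * float(features.get(name, 0.0)) for name in weights)


def handle_payload(body):
    """Turn a JSON request body into an (HTTP status, response dict) pair."""
    try:
        features = json.loads(body)
        if not isinstance(features, dict):
            raise ValueError("payload must be a JSON object")
        return 200, {"score": score(features)}
    except (ValueError, TypeError):
        return 400, {"error": "expected a JSON object of numeric features"}


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the request body, score it, and reply with JSON.
        length = int(self.headers.get("Content-Length", 0))
        status, result = handle_payload(self.rfile.read(length))
        payload = json.dumps(result).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port=8000):
    """Start the service on localhost; blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), ScoreHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. Keeping `handle_payload` separate from the HTTP handler means the scoring logic can be tested without starting a server.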
Summary
It is important to keep in mind that not all decision science projects produce the kind of final result envisioned at the beginning of the process. Sometimes, as the iterations progress, the question evolves and leads to a different kind of result. Other times, after multiple iterations, it becomes evident that the question directing the process cannot be answered due to lack of data, lack of relationships in the data, lack of time, or other reasons, and the project must end. Still, all is not lost: during the process, knowledge is gained and tools are developed that you can leverage to answer other questions.