A scatter diagram is one of the seven basic quality tools used in project management. It is used to plan and monitor operations and to resolve quality-related issues in an organization. Scatter diagrams are graphical statistical tools that are simple to use and help improve business processes.
While typical charts and graphs use lines or bars to represent data, scatter diagrams use dots. At first glance, this may be confusing, but scatter diagrams are easy to understand if you take the time.
What is a scatter diagram?
A scatter diagram is a graph that shows the association between two variables for a collection of numerical data. It depicts the link between a process component on one axis and the quality fault on the other to help with process optimization.
A scatter diagram shows how a dependent variable Y changes in response to a change in a corresponding independent variable X. How do we know which variable is the response (dependent) variable and which is the independent variable?
In general, the independent variable is the one used to explain or predict an observed outcome, while the response variable measures that outcome. When you create the graph, the points will fall along a line or curve if the variables are correlated.
A scatter chart might be helpful when one variable is measurable, but the other is not. You can forecast the behavior of the dependent variable based on the independent variable after establishing how the variables are connected.
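One common way to make such a forecast is to fit a straight line to the paired data with ordinary least squares. The following is a minimal sketch in plain Python, not a definitive implementation. The function names `fit_line` and `predict` and the sample numbers are illustrative assumptions, and the approach only makes sense when the points fall roughly along a straight line.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Forecast the dependent variable for a given independent value."""
    return slope * x + intercept


# Made-up illustrative data: independent values and the responses observed.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)               # 2.0 1.0
print(predict(6, slope, intercept))   # 13.0
```

Forecasts far outside the range of the observed data should be treated with caution, since the fitted line only summarizes the points that were actually plotted.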
When would you use a scatter plot?
A scatter plot is an excellent tool for planning and measuring quality when you:
- Have numerical data that is paired
- Have an independent variable whose values might each correspond to several values of the dependent variable
- Want to decide objectively whether a cause and an effect are linked
- Want to assess whether two seemingly similar outcomes share the same cause
- Can measure one variable but not the other
- Are investigating hypotheses concerning cause-and-effect interactions
- Are looking for the root cause of an issue that has been recognized
For instance, suppose you want to look at the pattern of plant height over time. After selecting the two variables, the plant's height and its age, you build the graph. Once you've finished drawing the scatter diagram, you may observe that as a plant grows older, it grows taller. This demonstrates a link between plant height and age.
The three types of scatter diagrams
The link between variables in scatter diagrams is indicated by the direction of the correlation on the graph. A correlation in a scatter diagram occurs when two variables are determined to have a connection.
You can use a regression line to predict how a change in the independent variable will likely affect the value of the dependent variable. A cause-and-effect link between two variables will show up as a correlation, although a correlation on its own does not prove such a link. Three types of correlations in scatter diagrams are:
Positive correlation
If variables have a positive correlation, this signifies that when the independent variable's value rises, the dependent variable's value rises as well. Consider this scatter diagram example:
As the weight of human adults increases, the risk of diabetes also increases. The pattern of observation in this example would slant from the chart's bottom left to the upper right.
Negative correlation
In a negative correlation, the value of the dependent variable falls as the value of the independent variable rises.
Here's an example: When summer temperatures rise, sales of winter clothing decline. The pattern of observation in this example would slant from the top left to the bottom right of the graph.
No correlation
The "no correlation" type applies when there is no apparent link between the variables. It's also known as zero correlation. The two variables plotted aren't connected in any way.
The area of land and air quality index, for example, have no relationship. As an area grows, there is no effect on the air quality. These two variables have no association, and the observations will be dispersed all over the graph.
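The three correlation types described above can also be told apart numerically with the Pearson correlation coefficient, which ranges from -1 to 1. The sketch below is one possible way to do it in plain Python. The function names and the cutoff of 0.3 are assumptions chosen for illustration, not a standard, and the coefficient only captures straight-line relationships.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired observations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def correlation_type(xs, ys, threshold=0.3):
    """Label the relationship as 'positive', 'negative', or 'none'.

    The threshold is an arbitrary illustrative cutoff (an assumption).
    """
    r = pearson_r(xs, ys)
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "none"


# Made-up illustrative data for each of the three cases.
print(correlation_type([1, 2, 3, 4], [2, 4, 6, 8]))    # positive
print(correlation_type([1, 2, 3, 4], [8, 6, 4, 2]))    # negative
print(correlation_type([1, 2, 3, 4], [1, -1, -1, 1]))  # none
```

A value near zero means there is no linear trend. Curved patterns can still exist, so it is worth looking at the plotted points as well as the number.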
A fundamental point to remember when studying correlation is that a link between two variables does not guarantee causation. A correlation may appear for any of the following reasons:
- The causal relationship being reversed
- A third variable causing it
- Accidents and coincidences
The advantages and disadvantages of scatter diagrams
Advantages of scatter diagrams include:
- Patterns are easy to spot in scatter diagrams
- A scatter diagram is easy to plot with two variables
- Scatter diagrams are an effective way to demonstrate non-linear patterns
- Scatter diagrams make it easy to see the range of the data, such as the maximum and minimum values
- Plotting scatter diagrams supports better project decisions
- Scatter diagrams help uncover the underlying root causes of issues
- They can objectively assess if a given cause and effect are connected
Disadvantages of scatter diagrams include:
- Reading scatter diagrams incorrectly may lead to false conclusions that one variable caused the other, when both may have been influenced by a third
- A relationship in a scatter diagram may not be apparent because the data does not cover a wide enough range
- Associations between more than two variables are not shown in scatter plots
- Scatter diagrams do not provide a precise quantitative measure of the strength of the relationship between the two variables
Scatter diagram example
A scatter diagram can be applied to any data model with two variables and their respective numerical data. Let's look at a scatter diagram example.
We'll examine the number of workplace accidents happening at a factory. The two variables are the number of shift hours and the number of accidents. We will produce our scatter diagram based on the following data provided:
The independent or control variable on the horizontal axis is the number of shift hours, while the dependent variable on the vertical axis is the number of accidents.
After drawing the scatter diagram, we see that the number of accidents increases as the number of shift hours increases. This illustrates a positive correlation between the two.
Scatter plots do not always have a controlling parameter. It is possible for neither variable to depend on the other. In such a situation, either variable can go on either axis.
Scatter diagrams in PMP
As part of the Project Management Professional (PMP) certification test preparation, it is crucial to understand scatter diagrams in PMP terms. To be certified, aspiring project managers need to learn to create and gain insights from scatter plot analysis.
The potential PMP candidate may be asked to evaluate data using scatter diagrams or choose the best quality control tool to employ in a given circumstance. Although there are no specific scatter diagram examples for the PMP exam, a general understanding of the tool is expected.
How to create a scatter diagram
Creating a scatter diagram can be broken into the following five steps:
- Identify variables: Identifying the independent and dependent variables is the first step in creating a scatter diagram. Find out which is the control variable affecting the dependent variable. Use variables that are quantitative and objective. For example, say you are observing the time taken by a vehicle to reach its destination at different speeds. The speed of the vehicle is the independent or control variable, while the travel time is the dependent variable. Remember, it is also possible for both variables to be independent.
- Pull data: After you have determined both variables, gather data for them either by observing the process or by using digital sources and tools, such as analytics, maintenance systems, automation, or mobile audit software.
- Build the scatter plot: Once you've gathered your data, use a spreadsheet or scatter plot tool to build your scatter plot by plotting one dot for each pair of values. The dots are not joined by lines. In our earlier example, we would see how many hours it takes to get to the destination at different speeds. The speeds are plotted on the x-axis, while the hours are plotted on the y-axis.
- Determine the type of correlation: After plotting the dots on the scatter diagram, analyze and determine the correlation between the two variables. The data trend could be upward, downward, or undefined. The correlation between the two variables may be positive or negative, or there may be no correlation. In our example, the correlation is negative as the number of hours declines with increased speed.
- Conduct a scatter plot analysis: We may come to several conclusions depending on how the scatter plot turns out. Confirm your conclusions by using scatter plots in conjunction with other root cause analysis methods. The key to fixing problems and building lasting remedies is to look at them from various perspectives.
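The steps above can be sketched in a few lines of plain Python using the speed and travel-time example. This is a rough illustration under stated assumptions rather than a replacement for a spreadsheet or charting tool: the 120 km distance, the list of speeds, and the function names `text_scatter` and `trend` are all hypothetical, and the sign of the covariance is used here only as a crude stand-in for reading the trend off the plot.

```python
def text_scatter(points, width=40, height=10):
    """Render (x, y) pairs as a text grid with one '*' per observation."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, y_min = min(xs), min(ys)
    x_span = (max(xs) - x_min) or 1  # avoid division by zero for constant data
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return "\n".join("".join(line).rstrip() for line in grid)


def trend(points):
    """Return 'upward', 'downward', or 'undefined' from the covariance sign."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in points)
    if cov > 0:
        return "upward"
    if cov < 0:
        return "downward"
    return "undefined"


# Steps 1 and 2: speed (km/h) is the independent variable, hours the dependent.
DISTANCE_KM = 120  # hypothetical fixed trip length (an assumption)
speeds = [20, 30, 40, 60, 80, 120]
points = [(speed, DISTANCE_KM / speed) for speed in speeds]

# Step 3: plot one mark per pair, with speed on the x-axis and hours on the y-axis.
print(text_scatter(points))

# Step 4: a downward trend corresponds to a negative correlation.
print(trend(points))
```

In this sketch the marks slant from the top left to the bottom right, which matches the negative correlation described in the fourth step.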
Getting started with scatter diagrams
Scatter diagrams help discover correlations between variables and guide quality control in project management. They're a crucial part of the PMP certification exams and help project managers make better decisions.
Are you looking to improve your business processes and supercharge your project management? Use scatter diagrams to compare elements and confirm your conclusions. Get a two-week free trial of Wrike’s project management software to keep your ongoing projects organized.