How many times have you found a graph you want to use in your work, only to discover that it is too big or too small, the information is unreadable, it is in the wrong language, or the colors simply do not fit your current masterpiece? This happened to me last week when I tried to adapt a graph for a project. The graph shows the SDG (Sustainable Development Goals) index across European countries. See how I pimped it up.
This graph (Fig. 1) looks very good on its own website, but it is too big for my purpose and includes too much information that I do not need. Besides the unnecessary ranking column, there are other issues. The information about regions is fine, but it takes a lot of effort to draw a conclusion from it.
Unless you carefully read the whole list of countries, you barely notice that the European Union has an overall score covering all countries.
The mix of colors can be difficult to interpret, especially if you are color blind. Note that a remarkable percentage of people are color blind (e.g. 4.5% of the population of Great Britain).
The bar for each country includes the ratio for each SDG (Sustainable Development Goal). This is great for an interactive chart, which this one is when viewed on its website. In a static context, however, the multi-color bar adds a lot of "noise".
Our first approach (Fig. 2) includes the following tweaks.
No ranking column is displayed. The ordered list speaks for itself.
Additional information about individual goals is removed, so each country gets a single color.
The European Union and Spain are highlighted because our study focuses on Spain's performance.
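The highlighting idea can be sketched in a few lines of Python. This is only an illustration, not the code behind Fig. 2: the function name bar_colors, the two hex colors and the short country list are assumptions made up for the example.

```python
def bar_colors(countries, highlighted, base="#9e9e9e", accent="#d55e00"):
    """Return one color per country: accent for highlighted entries, base otherwise."""
    return [accent if name in highlighted else base for name in countries]


# Placeholder list for illustration only, not the real ranking.
countries = ["Denmark", "European Union", "Spain", "Romania"]
colors = bar_colors(countries, highlighted={"European Union", "Spain"})
```

A plotting library that accepts one color per bar can take the resulting list directly, for example through the color argument of matplotlib's barh.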
Our second approach (Fig. 3) aims for a more compact chart by changing the orientation and adding information about the region of each country.
The region of each country is now color-encoded. How quickly can you tell which region is ahead of the others in terms of sustainability?
Color-blind-friendly colors are used to encode the region, which helps that percentage of readers perceive the color-encoded information.
The names of the countries have been replaced by an abbreviation and a flag.
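One possible way to get color-blind-friendly region colors is to draw them from the Okabe-Ito palette, a widely used color-blind-safe set. The sketch below assumes that palette and uses made-up region labels; the actual palette and region names behind Fig. 3 may differ.

```python
# Okabe-Ito palette (black omitted so it stays free for text and axes).
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
]


def region_colors(regions):
    """Map each region label to a palette color and return one color per entry."""
    unique = sorted(set(regions))
    if len(unique) > len(OKABE_ITO):
        raise ValueError("more regions than available palette colors")
    lookup = dict(zip(unique, OKABE_ITO))
    return [lookup[region] for region in regions]


# Hypothetical region labels, one per country, for illustration only.
regions = ["North", "South", "North", "West"]
colors = region_colors(regions)
```

Sorting the labels before assigning colors keeps the mapping stable, so the same region gets the same color in every chart of the series.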
If the dataset used to create the chart is available, you can add data to the chart or remove data from it, depending on the idea you want to convey.
In the following plot (Fig. 4) we have added the GDP per capita of each country.
The SDG index remains on the y axis (left axis). Note that the scale does not start at 0, which I admit is bad practice, but this is done on purpose to leave room for the flags. The scale is fitted between the minimum and maximum values. As a result, the differences between countries look larger than they actually are, but the dots spread apart, which improves visibility.
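Fitting the scale between the minimum and maximum values can be sketched as follows. The helper padded_limits, the padding fraction and the scores are assumptions for illustration, not the values used in Fig. 4.

```python
def padded_limits(values, pad_fraction=0.05):
    """Return (low, high) axis limits that hug the data with a little padding."""
    low, high = min(values), max(values)
    pad = (high - low) * pad_fraction
    return low - pad, high + pad


# Made-up index scores, for illustration only.
scores = [55.0, 60.0, 75.0]
y_min, y_max = padded_limits(scores, pad_fraction=0.25)
```

With matplotlib, the result could be applied with ax.set_ylim(y_min, y_max). Because the axis no longer starts at 0, it is worth saying so in the caption, since the truncated axis visually exaggerates the gaps.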
Exploratory data analysis is an important part of Data Science. Through this analysis, the data are examined from different perspectives, and conclusions can be drawn about whether or not the variables are correlated. The outcome then guides the next step, steering the use of other techniques such as Machine Learning or Artificial Intelligence (AI) in one direction or another.
The following chart (Fig. 5) on the right shows how different parameters (3 in this case) are related to each other for all 28 countries.
The parameters are population, GDP and SDG index.
We can see that the correlation is fairly low for every combination of these parameters.
For GDP vs. SDG index, the correlation is only 0.383.
As seen in the previous charts, Ireland and Luxembourg have a much higher GDP than the rest.
Let's see what happens if these "rich" countries are excluded from the correlation chart. In this case (Fig. 6) we show only 26 countries (the "poorest").
Here the correlation between GDP and SDG index score is higher (0.782).
We can also see that population does not influence the SDG index at all.
Therefore, the conclusion we can draw from the analysis is the following: the higher the GDP, the higher the SDG score a country obtains. This makes sense, because the more money you have, the more sustainability policies you can implement. Then what about countries such as Ireland and Luxembourg? When did they start pushing towards sustainable development? Did they really start pushing at all? That is a good question to ask their politicians.
Analysing the reasons for that is beyond the scope of this study. This is just an example of how ideas, conclusions or arguments can be built on actual data.
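The effect of a few extreme values on a correlation coefficient can be reproduced with a small Python sketch. It assumes the Pearson coefficient (the text does not say which coefficient the charts use), and the numbers are made up for illustration; they are not the report's data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values: the last entry plays the role of a very "rich" outlier.
gdp = [10, 20, 30, 40, 100]
sdg = [60, 65, 70, 75, 66]

r_all = pearson(gdp, sdg)               # all entries, outlier included
r_subset = pearson(gdp[:-1], sdg[:-1])  # outlier excluded
```

In this toy data, removing the single outlier raises the coefficient sharply, which is the same pattern seen when going from 28 to 26 countries.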
Actual data is always the strongest argument. It is just a matter of how arguments are displayed.
Graphics are developed by jrlab 2020. They are based on third-party packages, which are referenced below:
Data extracted from:
SDSN & IEEP. 2019. The 2019 Europe Sustainable Development Report. Sustainable Development Solutions Network and Institute for European Environmental Policy: Paris and Brussels.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.