Scatter plots
Documentation:
https://plotly.com/python/line-and-scatter/
Let’s start with the basic scatter plot from the earlier section of the tutorial. Here is what our data looks like:
import plotly.express as px
gap_data = px.data.gapminder()
just_2002 = gap_data.query('year == 2002')
just_2002.tail()
| | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
---|---|---|---|---|---|---|---|---|
1654 | Vietnam | Asia | 2002 | 73.017 | 80908147 | 1764.456677 | VNM | 704 |
1666 | West Bank and Gaza | Asia | 2002 | 72.370 | 3389578 | 4515.487575 | PSE | 275 |
1678 | Yemen, Rep. | Asia | 2002 | 60.308 | 18701257 | 2234.820827 | YEM | 887 |
1690 | Zambia | Africa | 2002 | 39.193 | 10595811 | 1071.613938 | ZMB | 894 |
1702 | Zimbabwe | Africa | 2002 | 39.989 | 11926563 | 672.038623 | ZWE | 716 |
And here is the scatter plot:
scatter_2002 = px.scatter(
    data_frame=just_2002,
    x='gdpPercap',
    y='lifeExp',
)
scatter_2002
If we want to know about a particular data point, we can hover over it with the mouse. But we can also include additional formatting to make the information more visible. For example, instead of showing all data points in the same color, we can color the points by continent and give them different symbols as well.
Here are the continents in our data:
just_2002['continent'].unique()
array(['Asia', 'Europe', 'Africa', 'Americas', 'Oceania'], dtype=object)
To color the points by continent, we pass the color option to the scatter function. This option takes the name of a column in the data frame and uses the values in that column to determine the color of each point. The symbol option works the same way for the marker shape.
scatter_2002 = px.scatter(
    data_frame=just_2002,
    x='gdpPercap',
    y='lifeExp',
    color='continent',
    symbol='continent',
)
scatter_2002
Let’s also simplify and improve the styling by doing the following:
use the simple_white template
add a title
change the axis and hover labels
add thin grey horizontal grid lines
remove the legend title
use the name of the country as the title in the hover box
add a dollar sign prefix to the GDP per capita axis
scatter_2002 = px.scatter(
    data_frame=just_2002,
    x='gdpPercap',
    y='lifeExp',
    color='continent',
    symbol='continent',
    template='simple_white',
    title='Relation between life expectancy & GDP per capita',
    hover_name='country',
    labels={
        'gdpPercap': 'GDP per capita (USD)',
        'lifeExp': 'Life expectancy (years)',
        'continent': 'Continent',
        'country': 'Country',
    },
)
scatter_2002.update_yaxes(
    showgrid=True,
    # set the width in pixels for the gridlines
    gridwidth=1,
    # set the grid color
    gridcolor='lightgrey',
)
scatter_2002.update_xaxes(
    tickprefix='$',
)
scatter_2002.update_layout(
    legend_title=None,
)
scatter_2002
Let’s do one last thing: scale the size of the points by the population of each country, using the size option.
How does this affect the readability of the chart?
scatter_2002 = px.scatter(
    data_frame=just_2002,
    x='gdpPercap',
    y='lifeExp',
    color='continent',
    symbol='continent',
    template='simple_white',
    title='Relation between life expectancy & GDP per capita',
    hover_name='country',
    labels={
        'gdpPercap': 'GDP per capita (USD)',
        'lifeExp': 'Life expectancy (years)',
        'continent': 'Continent',
        'country': 'Country',
    },
    size='pop',
)
scatter_2002.update_yaxes(
    showgrid=True,
    # set the width in pixels for the gridlines
    gridwidth=1,
    # set the grid color
    gridcolor='lightgrey',
)
scatter_2002.update_xaxes(
    tickprefix='$',
)
scatter_2002.update_layout(
    legend_title=None,
)
scatter_2002