Scatter plots

Documentation:
https://plotly.com/python/line-and-scatter/

Let’s start with the basic scatter plot from the earlier section of the tutorial. Here is what our data looks like:

import plotly.express as px
gap_data = px.data.gapminder()
just_2002 = gap_data.query('year == 2002')
just_2002.tail()
                 country continent  year  lifeExp       pop    gdpPercap iso_alpha  iso_num
1654             Vietnam      Asia  2002   73.017  80908147  1764.456677       VNM      704
1666  West Bank and Gaza      Asia  2002   72.370   3389578  4515.487575       PSE      275
1678         Yemen, Rep.      Asia  2002   60.308  18701257  2234.820827       YEM      887
1690              Zambia    Africa  2002   39.193  10595811  1071.613938       ZMB      894
1702            Zimbabwe    Africa  2002   39.989  11926563   672.038623       ZWE      716

And here is the scatter plot:

scatter_2002 = px.scatter(
  data_frame=just_2002,
  x='gdpPercap',
  y='lifeExp',
)
scatter_2002

To learn about a particular data point, we can hover over it with the mouse. We can also add visual encodings that make the information visible at a glance. For example, instead of showing all data points in the same color, we can color the points by continent and give each continent its own marker symbol.

Here are the continents in our data:

just_2002['continent'].unique()
array(['Asia', 'Europe', 'Africa', 'Americas', 'Oceania'], dtype=object)

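To color the points by continent, we pass the color option to the scatter function. This option takes the name of a column in the data frame and uses the values in that column to determine the color of each point. The symbol option works the same way for the marker shape, so passing the same column to both gives each continent its own color and its own symbol.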

scatter_2002 = px.scatter(
  data_frame=just_2002,
  x='gdpPercap',
  y='lifeExp',
  color='continent',
  symbol='continent',
)
scatter_2002

Let’s also simplify and improve the styling by doing the following:

  • use the simple_white template

  • add a title

  • change the axis and hover labels

  • add thin grey horizontal grid lines

  • remove the legend title

  • use the name of the country as the title in the hover box

  • add a dollar sign prefix to the GDP per capita axis

scatter_2002 = px.scatter(
  data_frame=just_2002,
  x='gdpPercap',
  y='lifeExp',
  color='continent',
  symbol='continent',
  template='simple_white',
  title='Relation between life expectancy & GDP per capita',
  hover_name='country',
  labels={
    'gdpPercap': 'GDP per capita (USD)',
    'lifeExp': 'Life expectancy (years)',
    'continent': 'Continent',
    'country': 'Country',
  },
)
scatter_2002.update_yaxes(
  showgrid=True,
  # set the width in pixels for the gridlines
  gridwidth=1,
  # set the grid color
  gridcolor='lightgrey',
)
scatter_2002.update_xaxes(
  tickprefix='$',
)
scatter_2002.update_layout(
  legend_title=None,
)
scatter_2002

Let’s do one last thing: scale the size of the points by the population of each country, using the size option.

How does this affect the readability of the chart?

scatter_2002 = px.scatter(
  data_frame=just_2002,
  x='gdpPercap',
  y='lifeExp',
  color='continent',
  symbol='continent',
  template='simple_white',
  title='Relation between life expectancy & GDP per capita',
  hover_name='country',
  labels={
    'gdpPercap': 'GDP per capita (USD)',
    'lifeExp': 'Life expectancy (years)',
    'continent': 'Continent',
    'country': 'Country',
  },
  size='pop',
)
scatter_2002.update_yaxes(
  showgrid=True,
  # set the width in pixels for the gridlines
  gridwidth=1,
  # set the grid color
  gridcolor='lightgrey',
)
scatter_2002.update_xaxes(
  tickprefix='$',
)
scatter_2002.update_layout(
  legend_title=None,
)
scatter_2002