Basic charts

Documentation: https://plotly.com/python/basic-charts/

This section highlights some of the basic charts that can be created with Plotly: scatter plots, bar charts, line charts, and histograms. Each type of chart is demonstrated with a simple example here, and discussed in more detail in later sections.

import plotly.express as px

Scatter plot

Use the gapminder dataset, keeping only the rows for 2002.

gap_data = px.data.gapminder()

just_2002 = gap_data.query('year == 2002')

just_2002

	country	continent	year	lifeExp	pop	gdpPercap	iso_alpha	iso_num
10	Afghanistan	Asia	2002	42.129	25268405	726.734055	AFG	4
22	Albania	Europe	2002	75.651	3508512	4604.211737	ALB	8
34	Algeria	Africa	2002	70.994	31287142	5288.040382	DZA	12
46	Angola	Africa	2002	41.003	10866106	2773.287312	AGO	24
58	Argentina	Americas	2002	74.340	38331121	8797.640716	ARG	32
...	...	...	...	...	...	...	...	...
1654	Vietnam	Asia	2002	73.017	80908147	1764.456677	VNM	704
1666	West Bank and Gaza	Asia	2002	72.370	3389578	4515.487575	PSE	275
1678	Yemen, Rep.	Asia	2002	60.308	18701257	2234.820827	YEM	887
1690	Zambia	Africa	2002	39.193	10595811	1071.613938	ZMB	894
1702	Zimbabwe	Africa	2002	39.989	11926563	672.038623	ZWE	716

142 rows × 8 columns

Suppose we want to look at the relationship between GDP per capita and life expectancy in 2002 using a scatter plot. We call the scatter function and specify the data frame and the columns to use for the x and y axes.

scatter_2002 = px.scatter(
	data_frame=just_2002,
	x='gdpPercap',
	y='lifeExp',
)
scatter_2002

Bar chart

Using the same 2002 dataset, make a population bar chart for five countries:

United States
Canada
Italy
Sweden
Taiwan

We first filter the 2002 data frame down to these countries. Then, as with the scatter plot, we call the bar function and specify the data frame and the columns to use for the x and y axes.

country_list = [ 'United States', 'Canada', 'Italy', 'Sweden', 'Taiwan' ]

bar_data = just_2002.query('country in @country_list')

bar_data

	country	continent	year	lifeExp	pop	gdpPercap	iso_alpha	iso_num
250	Canada	Americas	2002	79.77	31902268	33328.96507	CAN	124
778	Italy	Europe	2002	80.24	57926999	27968.09817	ITA	380
1474	Sweden	Europe	2002	80.04	8954175	29341.63093	SWE	752
1510	Taiwan	Asia	2002	76.99	22454239	23235.42329	TWN	158
1618	United States	Americas	2002	77.31	287675526	39097.09955	USA	840

bar_fig = px.bar(
	data_frame=bar_data,
	x='country',
	y='pop',
)
bar_fig

Line chart

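Plot the stock prices of several companies in a line chart. Note that every column starts at 1.0 on the first date, so the values appear to be prices relative to that date rather than absolute prices.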

df_stocks = px.data.stocks()
df_stocks

	date	GOOG	AAPL	AMZN	FB	NFLX	MSFT
0	2018-01-01	1.000000	1.000000	1.000000	1.000000	1.000000	1.000000
1	2018-01-08	1.018172	1.011943	1.061881	0.959968	1.053526	1.015988
2	2018-01-15	1.032008	1.019771	1.053240	0.970243	1.049860	1.020524
3	2018-01-22	1.066783	0.980057	1.140676	1.016858	1.307681	1.066561
4	2018-01-29	1.008773	0.917143	1.163374	1.018357	1.273537	1.040708
...	...	...	...	...	...	...	...
100	2019-12-02	1.216280	1.546914	1.425061	1.075997	1.463641	1.720717
101	2019-12-09	1.222821	1.572286	1.432660	1.038855	1.421496	1.752239
102	2019-12-16	1.224418	1.596800	1.453455	1.104094	1.604362	1.784896
103	2019-12-23	1.226504	1.656000	1.521226	1.113728	1.567170	1.802472
104	2019-12-30	1.213014	1.678000	1.503360	1.098475	1.540883	1.788185

105 rows × 7 columns

This example is slightly different in that we want to plot two lines on the same chart. We can do this by specifying a list of columns for the y axis.

stock_fig = px.line(
	data_frame=df_stocks,
	x='date',
	y=['GOOG', 'AAPL']
)
stock_fig

Histogram

Plot the distribution of tips using a histogram. In this example, we do not need to specify the y axis: by default, the y axis shows the number of tips that fall in each bin.

tips_df = px.data.tips()
tips_df

	total_bill	tip	sex	smoker	day	time	size
0	16.99	1.01	Female	No	Sun	Dinner	2
1	10.34	1.66	Male	No	Sun	Dinner	3
2	21.01	3.50	Male	No	Sun	Dinner	3
3	23.68	3.31	Male	No	Sun	Dinner	2
4	24.59	3.61	Female	No	Sun	Dinner	4
...	...	...	...	...	...	...	...
239	29.03	5.92	Male	No	Sat	Dinner	3
240	27.18	2.00	Female	Yes	Sat	Dinner	2
241	22.67	2.00	Male	Yes	Sat	Dinner	2
242	17.82	1.75	Male	No	Sat	Dinner	2
243	18.78	3.00	Female	No	Thur	Dinner	2

244 rows × 7 columns

tips_fig = px.histogram(
	data_frame=tips_df,
	x='tip',
)
tips_fig
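
To make the idea of counting per bin concrete, here is a small sketch that uses only the Python standard library. It bins the ten tip values visible in the table above, not the full 244 rows, into half-open intervals. The bin width of 1.0 is an assumption made for illustration; Plotly chooses its own bins by default.

```python
import math
from collections import Counter

# The ten tip values visible in the table above (illustrative subset)
tips = [1.01, 1.66, 3.50, 3.31, 3.61, 5.92, 2.00, 2.00, 1.75, 3.00]

bin_width = 1.0  # assumed for illustration only

# Map each tip to the left edge of its bin, then count tips per bin
counts = Counter(math.floor(tip / bin_width) * bin_width for tip in tips)

# Print only the non-empty bins as half-open intervals [left, right)
for left_edge in sorted(counts):
    right_edge = left_edge + bin_width
    print(f"[{left_edge:.0f}, {right_edge:.0f}): {counts[left_edge]}")
```

The counts printed here are the bar heights a histogram would draw for these ten values. The bin from 4 to 5 is empty, so it does not appear in the output; a histogram would show it as a gap.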

Histograms vs bar charts

On the surface, bar charts and histograms appear to be similar representations of data, but they serve different purposes and are used in distinct contexts within the field of information visualization. Understanding their differences is crucial for effectively communicating data insights.

Bar charts are used to compare discrete categories or groups. In a bar chart, each category is represented by a bar, and the length or height of the bar corresponds to the value it represents. The bars are separated by gaps to emphasize that the categories are distinct and not related to each other in a quantitative manner. Bar charts are versatile and can be used to represent a wide range of data types, including counts, percentages, or other metrics associated with different categories. They are particularly useful when the categories have no natural order, or when the bars are sorted to show a ranking.

In contrast, histograms are used to display the distribution of a continuous variable over a set of intervals, known as bins. Unlike the bars in a bar chart, the bars in a histogram typically touch each other to convey the continuous nature of the data. Histograms are valuable for showing the shape of the data distribution, such as whether it is skewed to the left or right and whether it has a single peak (unimodal) or multiple peaks (bimodal or multimodal), and for identifying outliers or unusual gaps in the data. Each bar in a histogram represents the frequency or count of data points within a particular range. The width of the bars can vary if the intervals are not uniform, although the intervals are often kept the same for simplicity.

The key differences between bar charts and histograms thus lie in the type of data they represent and the way they are constructed. Bar charts are suitable for categorical data and emphasize comparison between different categories, while histograms are designed for continuous data and focus on showing the distribution of a variable across different intervals. Understanding these distinctions is essential for selecting the appropriate visualization technique to convey the right message about the data.
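
The distinction can be sketched with the Python standard library, using the `day` and `total_bill` values from the ten rows of the tips table shown earlier. Counting a categorical column needs no further choices, whereas binning a continuous column depends on the bin edges picked. The unequal edges below are an arbitrary choice made for illustration. With unequal widths, dividing each count by its bin width (a frequency density) keeps the area of each bar proportional to its count.

```python
from collections import Counter

# Values copied from the ten visible rows of the tips table (illustrative subset)
days = ['Sun', 'Sun', 'Sun', 'Sun', 'Sun', 'Sat', 'Sat', 'Sat', 'Sat', 'Thur']
total_bill = [16.99, 10.34, 21.01, 23.68, 24.59, 29.03, 27.18, 22.67, 17.82, 18.78]

# Bar chart input: one count per discrete category, no bins involved
day_counts = Counter(days)

# Histogram input: counts per half-open interval [lo, hi)
edges = [10, 20, 25, 30]  # arbitrary, deliberately unequal widths
bins = list(zip(edges[:-1], edges[1:]))
bin_counts = [sum(lo <= value < hi for value in total_bill) for lo, hi in bins]

# Frequency density: count divided by bin width
densities = [count / (hi - lo) for count, (lo, hi) in zip(bin_counts, bins)]

print(day_counts)
for (lo, hi), count, density in zip(bins, bin_counts, densities):
    print(f"[{lo}, {hi}): count={count}, density={density:.1f}")
```

The first two bins hold the same number of bills, but the first is twice as wide, so its density is half as large. This is why bar heights in a histogram with unequal bins are usually drawn as densities rather than raw counts.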