Basic charts
Documentation: https://plotly.com/python/basic-charts/
This section highlights some of the basic charts that can be created with Plotly: scatter plots, bar charts, line charts, and histograms. Each type of chart is demonstrated with a simple example here, and discussed in more detail in later sections.
import plotly.express as px
Scatter plot
Use gapminder data, focusing on 2002.
gap_data = px.data.gapminder()
just_2002 = gap_data.query('year == 2002')
just_2002
(index) | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
---|---|---|---|---|---|---|---|---|
10 | Afghanistan | Asia | 2002 | 42.129 | 25268405 | 726.734055 | AFG | 4 |
22 | Albania | Europe | 2002 | 75.651 | 3508512 | 4604.211737 | ALB | 8 |
34 | Algeria | Africa | 2002 | 70.994 | 31287142 | 5288.040382 | DZA | 12 |
46 | Angola | Africa | 2002 | 41.003 | 10866106 | 2773.287312 | AGO | 24 |
58 | Argentina | Americas | 2002 | 74.340 | 38331121 | 8797.640716 | ARG | 32 |
... | ... | ... | ... | ... | ... | ... | ... | ... |
1654 | Vietnam | Asia | 2002 | 73.017 | 80908147 | 1764.456677 | VNM | 704 |
1666 | West Bank and Gaza | Asia | 2002 | 72.370 | 3389578 | 4515.487575 | PSE | 275 |
1678 | Yemen, Rep. | Asia | 2002 | 60.308 | 18701257 | 2234.820827 | YEM | 887 |
1690 | Zambia | Africa | 2002 | 39.193 | 10595811 | 1071.613938 | ZMB | 894 |
1702 | Zimbabwe | Africa | 2002 | 39.989 | 11926563 | 672.038623 | ZWE | 716 |
142 rows × 8 columns
Suppose we want to look at the relationship between GDP per capita and life expectancy in 2002, using a scatter plot.
We just need to call the px.scatter function, specifying the data frame and the columns to use for the x and y axes.
scatter_2002 = px.scatter(
    data_frame=just_2002,
    x='gdpPercap',
    y='lifeExp',
)
scatter_2002
Bar chart
Using the same 2002 dataset, make a population bar chart for five countries:
United States
Canada
Italy
Sweden
Taiwan
As before, we just need to call the px.bar function, specifying the data frame and the columns to use for the x and y axes.
country_list = ['United States', 'Canada', 'Italy', 'Sweden', 'Taiwan']
bar_data = just_2002.query('country in @country_list')
bar_data
(index) | country | continent | year | lifeExp | pop | gdpPercap | iso_alpha | iso_num |
---|---|---|---|---|---|---|---|---|
250 | Canada | Americas | 2002 | 79.77 | 31902268 | 33328.96507 | CAN | 124 |
778 | Italy | Europe | 2002 | 80.24 | 57926999 | 27968.09817 | ITA | 380 |
1474 | Sweden | Europe | 2002 | 80.04 | 8954175 | 29341.63093 | SWE | 752 |
1510 | Taiwan | Asia | 2002 | 76.99 | 22454239 | 23235.42329 | TWN | 158 |
1618 | United States | Americas | 2002 | 77.31 | 287675526 | 39097.09955 | USA | 840 |
bar_fig = px.bar(
    data_frame=bar_data,
    x='country',
    y='pop',
)
bar_fig
Line chart
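Plot the stock prices of several companies in a line chart. Every series in this dataset starts at 1.0 on the first date, so the values appear to be prices relative to that date rather than raw prices.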
df_stocks = px.data.stocks()
df_stocks
(index) | date | GOOG | AAPL | AMZN | FB | NFLX | MSFT |
---|---|---|---|---|---|---|---|
0 | 2018-01-01 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
1 | 2018-01-08 | 1.018172 | 1.011943 | 1.061881 | 0.959968 | 1.053526 | 1.015988 |
2 | 2018-01-15 | 1.032008 | 1.019771 | 1.053240 | 0.970243 | 1.049860 | 1.020524 |
3 | 2018-01-22 | 1.066783 | 0.980057 | 1.140676 | 1.016858 | 1.307681 | 1.066561 |
4 | 2018-01-29 | 1.008773 | 0.917143 | 1.163374 | 1.018357 | 1.273537 | 1.040708 |
... | ... | ... | ... | ... | ... | ... | ... |
100 | 2019-12-02 | 1.216280 | 1.546914 | 1.425061 | 1.075997 | 1.463641 | 1.720717 |
101 | 2019-12-09 | 1.222821 | 1.572286 | 1.432660 | 1.038855 | 1.421496 | 1.752239 |
102 | 2019-12-16 | 1.224418 | 1.596800 | 1.453455 | 1.104094 | 1.604362 | 1.784896 |
103 | 2019-12-23 | 1.226504 | 1.656000 | 1.521226 | 1.113728 | 1.567170 | 1.802472 |
104 | 2019-12-30 | 1.213014 | 1.678000 | 1.503360 | 1.098475 | 1.540883 | 1.788185 |
105 rows × 7 columns
This example is slightly different in that we want to plot two lines on the same chart. We can do this by specifying a list of columns for the y axis.
stock_fig = px.line(
    data_frame=df_stocks,
    x='date',
    y=['GOOG', 'AAPL'],
)
stock_fig
Histogram
Plot the distribution of tips using a histogram. In this example, we do not need to specify the y axis, since the y axis shows the number of tips that fall into each bin.
tips_df = px.data.tips()
tips_df
(index) | total_bill | tip | sex | smoker | day | time | size |
---|---|---|---|---|---|---|---|
0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
... | ... | ... | ... | ... | ... | ... | ... |
239 | 29.03 | 5.92 | Male | No | Sat | Dinner | 3 |
240 | 27.18 | 2.00 | Female | Yes | Sat | Dinner | 2 |
241 | 22.67 | 2.00 | Male | Yes | Sat | Dinner | 2 |
242 | 17.82 | 1.75 | Male | No | Sat | Dinner | 2 |
243 | 18.78 | 3.00 | Female | No | Thur | Dinner | 2 |
244 rows × 7 columns
tips_fig = px.histogram(
    data_frame=tips_df,
    x='tip',
)
tips_fig
Histograms vs bar charts
On the surface, bar charts and histograms appear to be similar representations of data, but they serve different purposes and are used in distinct contexts within the field of information visualization. Understanding their differences is crucial for effectively communicating data insights.
Bar charts are used to compare discrete categories or groups. In a bar chart, each category is represented by a bar, and the length or height of the bar corresponds to the value it represents. The bars are separated by gaps to emphasize that the categories are distinct and do not lie on a continuous numeric scale. Bar charts are versatile and can represent many kinds of values associated with categories, including counts, percentages, and other metrics. They work both when the categories have no natural order and when they are ranked.
In contrast, histograms are used to display the distribution of a continuous variable over a set of intervals, known as bins. Unlike the bars in a bar chart, the bars in a histogram touch each other to convey the continuous nature of the data. Histograms are valuable for showing the shape of a distribution: whether it is skewed to the left or right, whether it has a single peak (unimodal) or multiple peaks (bimodal or multimodal), and whether it contains outliers or unusual gaps. Each bar in a histogram represents the frequency or count of data points within a particular range. The width of the bars can vary if the intervals are not uniform, although the intervals are often kept the same for simplicity.
The key differences between bar charts and histograms thus lie in the type of data they represent and the way they are constructed. Bar charts are suitable for categorical data and emphasize comparison between different categories, while histograms are designed for continuous data and focus on showing the distribution of a variable across different intervals. Understanding these distinctions is essential for selecting the appropriate visualization technique to convey the right message about the data.