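Boxplot
The boxplot is probably one of the most common chart types. It gives a compact summary of how a numeric variable is distributed: the two ends of the box mark the lower and upper quartiles, the line inside the box marks the median, and the whiskers extend to the lowest and highest values that are not outliers. In seaborn, boxplots are drawn with the boxplot function. This chapter covers:
- Basic boxplot and input format
- Custom boxplot appearance
- Control colors of boxplot
- Grouped boxplot
- Control order of boxplot
- Add jitter over boxplot
- Show number of observations on boxplot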
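To make the anatomy of the box concrete, the numbers a boxplot draws can be computed by hand. The following is a minimal sketch, not seaborn's own code: the helper `box_stats` and the sample values are made up for illustration, and it assumes the common 1.5 × IQR rule for deciding which points count as outliers (the default `whis=1.5` in matplotlib, which seaborn builds on).

```python
from statistics import quantiles


def box_stats(values, whis=1.5):
    """Return the numbers a boxplot draws for one sample (illustrative helper)."""
    data = sorted(values)
    # method="inclusive" interpolates linearly between data points,
    # the same convention as numpy.percentile's default
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme points that are still inside the fences
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up data with one extreme value
stats = box_stats(sample)
print(stats)
```

For this sample the box spans 3.25 to 7.75 with the median at 5.5, the whiskers reach 1 and 9, and 30 is drawn as a separate outlier point.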
import seaborn as sns
df = sns.load_dataset('iris')
df.head()
|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
1. Basic boxplot and input format
- One numerical variable only
- One numerical variable and several groups
- Several numerical variables
- Horizontal boxplot with seaborn
sns.boxplot(y=df["sepal_length"]);
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzI0Mi83MDg2NWZlZDM1ZjhmYTRjN2E3NDc5NjYyMjhhNDhhMi5wbmc=)
sns.boxplot(x=df["species"], y=df["sepal_length"]);
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzM4Mi81MzIyMjVmNGJmZTE1ZDFmMzZiNDJiNmYxNGFjZDAwNi5wbmc=)
sns.boxplot(data=df.iloc[:, 0:2]);
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzM4OS80MjIzNWUxYjUwNDVjNGJmMTc2ZjFiNmMzZjYyYjUwNS5wbmc=)
sns.boxplot(y=df["species"], x=df["sepal_length"]);
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzc0MS9kODI2NDE1ZDFjOTNjNDAxYzE2NDA2MzFlNDU2M2IzZC5wbmc=)
2. Custom boxplot appearance
- Custom line width
- Add notch
- Control box sizes
sns.boxplot(x=df["species"], y=df["sepal_length"], linewidth=5);
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzE4Mi85MjVhMWQ2ZGM2NjcyMDVlZWRhMzdjNGY5MmFkMjg1ZS5wbmc=)
sns.boxplot(x=df["species"], y=df["sepal_length"], notch=True);
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzQwLzMyYzZiNjMxM2M0YmVmZTUwOTEwOWQzOWU5NmNjMzgwLnBuZw==)
sns.boxplot(x=df["species"], y=df["sepal_length"], width=0.3);
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzY3Ni84YWFjYWQ1OThkNjEzMDc2MDBhZjk5MzIzOWQ1MmI3Yy5wbmc=)
3. Control colors of boxplot
- Use a color palette
- Uniform color
- Specific color for each group
- Highlight a group
- Add transparency to color
sns.boxplot(x=df["species"], y=df["sepal_length"], palette="Blues");
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzMxMi9lZDU5ZDAxOTVhNWQ2NWZlOTkyY2M1MjEzNDJmMzc5MC5wbmc=)
sns.boxplot(x=df["species"], y=df["sepal_length"], color="skyblue");
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzg5MS8yYmM2NTUzNGY4NDYyMDc0YjQ5YmJhYzUyYWI0NWRiMy5wbmc=)
my_pal = {"versicolor": "g", "setosa": "b", "virginica": "m"}
sns.boxplot(x=df["species"], y=df["sepal_length"], palette=my_pal);
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzEwLzU3OGUxZjdkODc2ZGU2ZmZjM2Q1M2FiNGIxYTM2MDlhLnBuZw==)
my_pal = {species: "r" if species == "versicolor" else "b" for species in df.species.unique()}
sns.boxplot(x=df["species"], y=df["sepal_length"], palette=my_pal);
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzEwNy83NzBiNzI2ZDhiOWQxMGM3YWE3MWFiZDZjYjYwNWIyMy5wbmc=)
ax = sns.boxplot(x='species', y='sepal_length', data=df);
for patch in list(ax.artists) + list(ax.patches):  # boxes are in ax.artists on older matplotlib, in ax.patches on newer versions
    r, g, b, a = patch.get_facecolor()
    patch.set_facecolor((r, g, b, .3))
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzY2Mi9iNzliNmE3OWZlMTU0NzhlZThhZjllOTk2YmRkYWVmZS5wbmc=)
4. Grouped Boxplot
df_tips = sns.load_dataset('tips')
df_tips.head()
|   | total_bill | tip | sex | smoker | day | time | size |
|---|---|---|---|---|---|---|---|
| 0 | 16.99 | 1.01 | Female | No | Sun | Dinner | 2 |
| 1 | 10.34 | 1.66 | Male | No | Sun | Dinner | 3 |
| 2 | 21.01 | 3.50 | Male | No | Sun | Dinner | 3 |
| 3 | 23.68 | 3.31 | Male | No | Sun | Dinner | 2 |
| 4 | 24.59 | 3.61 | Female | No | Sun | Dinner | 4 |
sns.boxplot(x="day", y="total_bill", hue="smoker", data=df_tips, palette="Set1");
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzY4NS9mOGM2ZTNmYjhkMTBmMTlhNzllMjY3MzE1ZjZiZGQ5ZC5wbmc=)
5. Control order of boxplot
p1=sns.boxplot(x='species', y='sepal_length', data=df, order=["virginica", "versicolor", "setosa"]);
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzk0Ny9hMDYwYTFkYWZlZmJhYmNlNmEwOTRhNmI0ZmUyMDdlMy5wbmc=)
my_order = df.groupby(by=["species"])["sepal_length"].median().sort_values(ascending=False).index
sns.boxplot(x='species', y='sepal_length', data=df, order=my_order);
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzg0OS8xZTgyYmQ1NGU5NmVmODEyNzQyMWUzYzEzOTYwM2M2OS5wbmc=)
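A groupby result comes back sorted by group name, so reversing it only reverses the alphabetical order; to order the boxes by median, the medians have to be sorted explicitly. The sketch below illustrates the difference on made-up data (the frame `scores` and its columns are hypothetical) in which the alphabetical order and the median order disagree.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up data: group "b" has the lowest median, group "c" the highest
scores = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b", "c", "c", "c"],
    "value": [5, 6, 7, 1, 2, 3, 8, 9, 10],
})

medians = scores.groupby("group")["value"].median()

# Explicit sort: groups ordered by decreasing median
by_median = medians.sort_values(ascending=False).index.tolist()
# Plain reversal: only the alphabetical order is flipped
reversed_alpha = medians.iloc[::-1].index.tolist()

print(by_median)       # ['c', 'a', 'b']
print(reversed_alpha)  # ['c', 'b', 'a']

ax = sns.boxplot(x="group", y="value", data=scores, order=by_median)
```

On the iris data the two approaches happen to agree, because the species medians increase in alphabetical order.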
6. Add jitter over boxplot
ax = sns.boxplot(x='species', y='sepal_length', data=df)
ax = sns.swarmplot(x='species', y='sepal_length', data=df, color="grey")
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzIyMi8zYTM3ZDdlYTk4ZTM4OTQ1ZTk1ZWU3NzdmZTMxOTc4Ni5wbmc=)
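The example above uses swarmplot, which arranges the points so that they do not overlap. A randomly jittered overlay is also possible with seaborn's stripplot. The following is a minimal sketch on made-up data (the frame `points` is hypothetical); `showfliers=False` is passed so that outliers are not drawn twice, once by the boxplot and once by the overlay.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Made-up data: two groups of six observations each
points = pd.DataFrame({
    "group": ["a"] * 6 + ["b"] * 6,
    "value": [1.0, 1.2, 1.1, 1.3, 1.2, 4.0, 2.0, 2.1, 2.2, 2.1, 2.3, 2.0],
})

# Draw the boxes first, hiding the boxplot's own outlier markers
ax = sns.boxplot(x="group", y="value", data=points, color="white", showfliers=False)
# Overlay every observation with a random horizontal offset of up to 0.2
sns.stripplot(x="group", y="value", data=points, color="grey", jitter=0.2, size=4, ax=ax)

n_drawn = sum(len(c.get_offsets()) for c in ax.collections)
```

Jitter is random, so the horizontal positions change from run to run; swarmplot is deterministic but can become slow or crowded with many points.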
7. Show number of observations on boxplot
ax = sns.boxplot(x="species", y="sepal_length", data=df)
medians = df.groupby("species")["sepal_length"].median().values
nobs = df.groupby("species")["sepal_length"].count().values  # counts in the same group order as medians
nobs = ["n: " + str(x) for x in nobs.tolist()]
pos = range(len(nobs))
for tick, label in zip(pos, ax.get_xticklabels()):
    ax.text(pos[tick], medians[tick] + 0.03, nobs[tick], horizontalalignment='center', size='x-small', color='w', weight='semibold')
![[seaborn] seaborn study notes 1: Boxplot](/default/index/img?u=aHR0cHM6Ly9waWFuc2hlbi5jb20vaW1hZ2VzLzIyOC9iNzcyMjRhYWE3OWUxOTkzMmM4ZjU2MjFjMDljNGMxYy5wbmc=)