R for Data Science Summary: ggplot2
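ggplot2 is the classic plotting package in R and the best-known implementation of the Grammar of Graphics. Its layered approach lets data analysts customize their plots to a very high degree. The basic template is: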
ggplot(data = <DATA>) +
  <GEOM_FUNCTION>(
    mapping = aes(<MAPPINGS>),
    stat = <STAT>,
    position = <POSITION>
  ) +
  <COORDINATE_FUNCTION> +
  <FACET_FUNCTION>
Install and load the package:
install.packages("ggplot2")
library(ggplot2)
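This post mainly covers how the ggplot2 package is used in Hadley Wickham's book R for Data Science.
Quick plots with qplot()
qplot() is a quick-plot shortcut in ggplot2 whose interface is modelled on the plot() function from the graphics package. It does not wrap plot() itself; it builds a layered ggplot for you. Note that qplot() is deprecated as of ggplot2 3.4.0, so ggplot() is preferred for new code. Basic usage: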
# Use data from data.frame
qplot(mpg, wt, data = mtcars)
qplot(mpg, wt, data = mtcars, colour = cyl)
qplot(mpg, wt, data = mtcars, size = cyl)
qplot(mpg, wt, data = mtcars, facets = vs ~ am)
# qplot will attempt to guess what geom you want depending on the input
# both x and y supplied = scatterplot
qplot(mpg, wt, data = mtcars)
# just x supplied = histogram
qplot(mpg, data = mtcars)
# just y supplied = scatterplot, with x = seq_along(y)
qplot(y = mpg, data = mtcars)
Note that qplot() guesses which geom to use depending on whether x, y, or both are supplied.
To choose the geom manually:
qplot(mpg, wt, data = mtcars, geom = "path")
qplot(factor(cyl), wt, data = mtcars, geom = c("boxplot", "jitter"))
qplot(mpg, data = mtcars, geom = "dotplot")
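For comparison, the qplot() calls above can be written with the full ggplot() syntax that the rest of this post uses. This is a minimal sketch; the object names p_scatter, p_hist and p_box are only illustrative, and bins = 30 is written out to match the default bin count.

```r
library(ggplot2)

# qplot(mpg, wt, data = mtcars, colour = cyl)
p_scatter <- ggplot(mtcars, aes(x = mpg, y = wt, colour = cyl)) +
  geom_point()

# qplot(mpg, data = mtcars): only x supplied, so a histogram
p_hist <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(bins = 30)

# qplot(factor(cyl), wt, data = mtcars, geom = c("boxplot", "jitter"))
p_box <- ggplot(mtcars, aes(x = factor(cyl), y = wt)) +
  geom_boxplot() +
  geom_jitter()
```

Printing any of these objects (for example p_scatter) draws the plot.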
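The ggplot() function
aes()
Unlike qplot(), where a single call controls the x and y variables, data, color, size, facet, and geom, ggplot() usually takes only the data. The aes() function maps the x and y variables and the grouping aesthetics such as group, color, and size. Following the layered approach, geom functions are then added with the "+" operator: geom_point(), geom_line(), geom_boxplot(), and so on each draw a different kind of plot. In RStudio you can type geom and press Tab to browse the options. Coordinate and facet options are added with "+" as well. For example: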
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
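Besides group, the aesthetics color, size, alpha, and shape also split the data into groups when they are set inside aes(). Setting them outside aes() applies one fixed value to the whole layer and does no grouping. For example: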
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), color = "blue")
The first plot colors each point by its class and adds a legend; the second draws every point in blue.
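A common slip is to put a fixed colour inside aes(). The sketch below contrasts the three cases; the object names are illustrative, and layer_data() is used only to inspect the colours ggplot2 actually assigns.

```r
library(ggplot2)

# Mapping: colour varies with the class column, and a legend is drawn.
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

# Setting: every point gets the same fixed colour, with no legend.
p_fixed <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Pitfall: inside aes(), "blue" is treated as a one-level variable, so the
# points get the first colour of the default palette rather than blue.
p_pitfall <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# Inspect the colours that were actually assigned in each case
unique(layer_data(p_mapped)$colour)
unique(layer_data(p_fixed)$colour)
unique(layer_data(p_pitfall)$colour)
```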
Facets
The facet functions split a plot into multiple panels arranged in rows and columns. For example:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_wrap(~ class, nrow = 2)
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy)) +
facet_grid(drv ~ cyl)
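facet_grid() can also facet in one direction only, by putting a "." on the other side of the formula. A minimal sketch with illustrative object names:

```r
library(ggplot2)

base <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# One column of panels per cyl value, all in a single row
p_cols <- base + facet_grid(. ~ cyl)

# One row of panels per drv value, all in a single column
p_rows <- base + facet_grid(drv ~ .)
```

Because a plot object can be stored and extended with "+", the same base layer is reused for both layouts.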
Geoms
Different geom functions draw different types of plots. The code below draws a scatterplot and a smoothed curve, respectively:
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy))
ggplot(data = mpg) +
geom_smooth(mapping = aes(x = displ, y = hwy))
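Several geoms can be layered in one plot. Mappings given in ggplot() are shared by all layers, while a layer can add its own mappings or its own data. The sketch below follows that pattern; method = "loess" and formula = y ~ x are written out only to avoid the message geom_smooth() prints when it picks a method, and the name p_layers is illustrative.

```r
library(ggplot2)

p_layers <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  # Layer 1: inherits x and y, adds a colour mapping of its own
  geom_point(mapping = aes(color = class)) +
  # Layer 2: inherits x and y, but is fitted to a subset of the data
  geom_smooth(
    data = subset(mpg, class == "subcompact"),
    method = "loess",
    formula = y ~ x,
    se = FALSE
  )
```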
Stats
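Besides geom functions, stat functions can transform the data before it is drawn. In geom_bar() the bar height is the number of rows in each group, and its default stat is stat_count(). In geom_col() the bar height is a value already present in the data, and its stat is stat_identity(). The following two snippets produce the same plot: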
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut))
ggplot(data = diamonds) +
stat_count(mapping = aes(x = cut))
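To see what stat_count() computes, the values behind a layer can be pulled out with layer_data(), and a computed variable can be mapped with after_stat(). This is a sketch with illustrative object names; group = 1 is needed so that the proportions are taken over the whole data set rather than within each bar.

```r
library(ggplot2)

p_count <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

# The data frame ggplot2 draws from: one row per bar, with computed columns
computed <- layer_data(p_count)
computed[, c("x", "count", "prop")]

# The same counts computed by hand
table(diamonds$cut)

# Map the computed proportion to the bar height instead of the count
p_prop <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, y = after_stat(prop), group = 1))
```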
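You can also override a geom's default stat. The two snippets below produce the same bars. Here demo is a small table of precomputed counts, and stat_identity() needs geom = "bar" because its default geom is points:
demo <- data.frame(
  cut = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
  freq = c(1610, 4906, 12082, 13791, 21551)
)
ggplot(data = demo) +
  geom_bar(mapping = aes(x = cut, y = freq), stat = "identity")
ggplot(data = demo) +
  stat_identity(mapping = aes(x = cut, y = freq), geom = "bar")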
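Position adjustments
In a bar chart, the colour aesthetic sets the outline colour of the bars, while fill sets the interior colour. fill can also be mapped to a categorical variable so that each category gets its own colour: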
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, colour = cut))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = cut))
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity))
The position argument controls how bars that share an x position are arranged. The default is "stack"; the other options are "identity", "dodge", and "fill":
ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity)) +
geom_bar(alpha = 1/5, position = "identity")
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "fill")
ggplot(data = diamonds) +
geom_bar(mapping = aes(x = cut, fill = clarity), position = "dodge")
To keep points from being drawn on top of each other, set position = "jitter":
ggplot(data = mpg) +
geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")
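The string "jitter" is shorthand for position_jitter(), which can be called directly when the amount of noise needs to be controlled or made reproducible. A minimal sketch; the width, height and seed values here are arbitrary choices for illustration.

```r
library(ggplot2)

# Explicit position function: noise of at most 0.1 in x and 0.3 in y,
# with a fixed seed so the plot looks the same on every run
p_jitter <- ggplot(data = mpg) +
  geom_point(
    mapping = aes(x = displ, y = hwy),
    position = position_jitter(width = 0.1, height = 0.3, seed = 42)
  )

# geom_jitter() is a shortcut for geom_point(position = "jitter")
p_jitter2 <- ggplot(data = mpg) +
  geom_jitter(mapping = aes(x = displ, y = hwy), width = 0.1, height = 0.3)
```

Jittering trades a little positional accuracy for a clearer view of how many points sit at each location, so small widths and heights are usually enough.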
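Coordinate systems
Coordinate systems change how the whole plot is rendered, for example by flipping the axes, setting a sensible aspect ratio for maps, or turning a bar chart into a polar (Coxcomb) chart: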
ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
geom_boxplot() +
coord_flip()
# map_data() requires the maps package to be installed
nz <- map_data("nz")
ggplot(nz, aes(long, lat, group = group)) +
geom_polygon(fill = "white", colour = "black") +
coord_quickmap()
bar <- ggplot(data = diamonds) +
geom_bar(
mapping = aes(x = cut, fill = cut),
show.legend = FALSE,
width = 1
) +
theme(aspect.ratio = 1) +
labs(x = NULL, y = NULL)
bar + coord_flip()
bar + coord_polar()
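bar + coord_polar() produces a Coxcomb chart, in which every wedge has the same angle and the radius encodes the count. A conventional pie chart, where the angle encodes the count, can be sketched by stacking everything into one bar and mapping y to the angle. The object name pie is only illustrative.

```r
library(ggplot2)

pie <- ggplot(data = diamonds) +
  # A constant x puts all cuts into a single stacked bar
  geom_bar(mapping = aes(x = factor(1), fill = cut), width = 1) +
  # Map the stacked heights to the angle instead of the radius
  coord_polar(theta = "y") +
  labs(x = NULL, y = NULL)
```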
Labels
labs() sets the plot's title, subtitle, caption, and legend titles. For example:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth(se = FALSE) +
labs(
title = "Fuel efficiency generally decreases with engine size",
subtitle = "Two seaters (sports cars) are an exception because of their light weight",
caption = "Data from fueleconomy.gov",
colour = "Car type"
)
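To label individual data points, here the most fuel-efficient model in each class:
library(dplyr)  # provides %>%, group_by(), filter(), row_number(), desc()
best_in_class <- mpg %>%
  group_by(class) %>%
  filter(row_number(desc(hwy)) == 1)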
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_text(aes(label = model), data = best_in_class)
The label position can also be adjusted, here nudging the labels slightly above the data points:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_label(aes(label = model), data = best_in_class, nudge_y = 2, alpha = 0.5)
The ggrepel package can place the labels so that they do not overlap:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_point(size = 3, shape = 1, data = best_in_class) +
ggrepel::geom_label_repel(aes(label = model), data = best_in_class)
Scales
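ggplot2 adds default scales automatically. The plot below spells out the scales that would otherwise be added behind the scenes: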
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
scale_x_continuous() +
scale_y_continuous() +
scale_colour_discrete()
The tick positions (breaks) on the y-axis can be adjusted as follows:
ggplot(mpg, aes(displ, hwy)) +
geom_point() +
scale_y_continuous(breaks = seq(15, 40, by = 5))
You can also use scale_x_log10() and scale_y_log10() to put an axis on a log scale, scale_colour_brewer(palette = "Set1") to switch to a ColorBrewer palette, or scale_colour_manual(values = c(Republican = "red", Democratic = "blue")) to assign colours by hand.
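The three scale functions just mentioned can be sketched as follows. The choice of data sets is an assumption made for illustration: diamonds for the log axes, mpg for the Brewer palette, and the presidential data set shipped with ggplot2 for the manual colours. The id column and the object names are illustrative as well.

```r
library(ggplot2)

# Log-transformed axes: carat and price are both strongly right-skewed
p_log <- ggplot(diamonds, aes(carat, price)) +
  geom_point(alpha = 0.1) +
  scale_x_log10() +
  scale_y_log10()

# ColorBrewer palette for a discrete variable
p_brewer <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  scale_colour_brewer(palette = "Set1")

# Manual colours, matched to the data values by name
pres <- presidential
pres$id <- 33 + seq_len(nrow(pres))  # running index, one per term
p_manual <- ggplot(pres, aes(start, id, colour = party)) +
  geom_point() +
  geom_segment(aes(xend = end, yend = id)) +
  scale_colour_manual(values = c(Republican = "red", Democratic = "blue"))
```

Naming the elements of values ties each colour to a specific level, so the assignment does not depend on the order in which the levels appear.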
Legends
theme() sets the position of the legend, and guides() together with guide_legend() controls its layout:
ggplot(mpg, aes(displ, hwy)) +
geom_point(aes(colour = class)) +
geom_smooth(se = FALSE) +
theme(legend.position = "bottom") +
guides(colour = guide_legend(nrow = 1, override.aes = list(size = 4)))
Zooming
coord_cartesian() zooms in on a region of the plot without dropping any data:
ggplot(mpg, mapping = aes(displ, hwy)) +
geom_point(aes(color = class)) +
geom_smooth() +
coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))
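Zooming with coord_cartesian() is not the same as setting limits on the scales. Scale limits discard the points outside the range before any stat is computed, so a fitted line changes. The sketch below shows the difference; it uses method = "lm" instead of the default smoother purely to keep the comparison simple, and the object names are illustrative.

```r
library(ggplot2)

base <- ggplot(mpg, mapping = aes(displ, hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Zoom: the line is fitted to all of mpg, then the view is cropped
p_zoom <- base + coord_cartesian(xlim = c(5, 7), ylim = c(10, 30))

# Scale limits: points outside the limits are dropped (with a warning),
# so the line is fitted only to what remains
p_limits <- base + xlim(5, 7) + ylim(10, 30)

# Range of x values covered by each fitted line
range(layer_data(p_zoom, 2)$x)
range(suppressWarnings(layer_data(p_limits, 2))$x)
```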
The full code for this post has been uploaded to GitHub.