Python data visualization steps: Data Visualization 101, 7 Steps for Effective Visualization
One of the essential skills of a data scientist is the ability to communicate data analysis results effectively using various kinds of visualizations.
Data is a story told in numbers; visualizing it is how you tell that story.
Unfortunately, we tend to pay more attention to learning new analysis methods, libraries, and approaches, getting familiar with new datasets, or following trending machine learning and artificial intelligence algorithms, while neglecting to improve our visualization skills.
Don't misunderstand me: staying up to date with new technology is very important for a successful career in data science (DS). But we also need to devote some time to getting better at visualization and storytelling.
Imagine this: you spend hours upon hours cleaning, exploring, and modeling your data. The work is interesting, and your results are valid and meaningful. But your data visualization is dull and ineffective, so your audience overlooks your hard work.
Learning how to effectively visualize your data is like learning how to tell a compelling story.
Your choice of chart type, colors, and style will make a tremendous difference in how others perceive your data.
Fortunately, there are simple guidelines that, if you follow them, can make your data visualizations visually appealing, compelling, and captivating.
This article presents seven simple tips, based on scientific experiments and research, to level up your visualizations.
Without further ado, let’s get into effectively telling a story with our data.
Tip №1: Simple is always better
The goal of visualization is to make information easier for others to read and understand. So, avoid complex, crowded visualizations.
Whenever you create a visualization, pay attention to the data-ink ratio: the proportion of the ink in a graph that actually displays data, as opposed to redundant ink such as background effects and colors or a 3D rendering of the data.
Instead of adding extra dimensions such as 3D effects to your graphs, you can use visual properties such as shape, color, and line thickness to distinguish your datasets.
For your visualization to be simple and effective, your data-ink ratio needs to be high.
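The following is a minimal sketch of a high data-ink ratio chart. It assumes Python with matplotlib, since the article does not name a library, and uses hypothetical sales figures. It draws flat bars, removes spines and y-axis ticks that carry no data, and labels the bars directly:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical example data.
regions = ["North", "South", "East", "West"]
sales = [120, 95, 140, 80]

fig, ax = plt.subplots(figsize=(6, 3.5))
# Plain flat bars: no 3D effect, no gradient, no background image.
bars = ax.bar(regions, sales, color="#4c72b0")

# Remove non-data ink: the top/right spines, plus the left spine and
# y ticks, which the direct bar labels below make redundant.
for side in ("top", "right", "left"):
    ax.spines[side].set_visible(False)
ax.set_yticks([])

# Label each bar with its value instead of relying on gridlines.
ax.bar_label(bars, padding=3)
ax.set_title("Sales by region")
fig.tight_layout()
```

Direct labels replace the y-axis here only because there are four bars. With many bars, a light y-axis may read better.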
Tip №2: Choose the right chart type
Whenever you create a graph, pay attention to your data type so that you select a chart that represents it accurately.
The right chart type depends on the data you're using. A good rule of thumb is:
- If you have categorical data, use a bar chart if you have more than 5 categories, or a pie chart otherwise.
- If you have numerical data, use bar charts or histograms if your data is discrete, or line/area charts if it is continuous.
- If you want to show the relationship between values in your dataset, use a scatter plot, bubble chart, or line chart.
- If you want to compare values, use a pie chart for relative comparison or a bar chart for precise comparison.
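The rule of thumb above can be encoded as a small lookup function. This is a hypothetical helper, not a library API. The labels "categorical" and "numerical" and the goal names are illustrative assumptions:

```python
def suggest_chart(data_kind, goal=None, n_categories=0, continuous=False):
    """Return chart types suggested by the rule of thumb (hypothetical helper).

    data_kind: "categorical" or "numerical"
    goal: None, "relationship", or "comparison"
    """
    if goal == "relationship":
        return ["scatter plot", "bubble chart", "line chart"]
    if goal == "comparison":
        # Pie for relative comparison, bar for precise comparison.
        return ["pie chart (relative)", "bar chart (precise)"]
    if data_kind == "categorical":
        # More than 5 categories -> bar chart; otherwise a pie chart.
        return ["bar chart"] if n_categories > 5 else ["pie chart"]
    if data_kind == "numerical":
        if continuous:
            return ["line chart", "area chart"]
        return ["bar chart", "histogram"]
    raise ValueError(f"unknown data kind: {data_kind!r}")


print(suggest_chart("categorical", n_categories=8))  # ['bar chart']
```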
Tip №3: Visualize one aspect per chart
Before creating a chart, you need to decide what exactly you want to show. Do you want to show patterns or details? To make your visuals more effective, try to display only one aspect at a time.
If you need to show two sides of your data, such as a pattern and some details, use two different plots. For example, you can use a line chart to show the details and a heatmap or horizon graph to show the pattern within the data.
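The following is a minimal sketch of showing details and a pattern in two separate plots. It assumes matplotlib and NumPy and uses a hypothetical 4-week by 7-day table of daily visits:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: 4 weeks x 7 days of daily visits.
rng = np.random.default_rng(0)
daily = rng.poisson(lam=200, size=(4, 7))

fig, (ax_detail, ax_pattern) = plt.subplots(1, 2, figsize=(10, 3.5))

# Plot 1 (details): exact day-by-day values as a line chart.
ax_detail.plot(np.arange(daily.size), daily.ravel(), marker="o")
ax_detail.set_xlabel("Day")
ax_detail.set_ylabel("Visits")
ax_detail.set_title("Daily visits (detail)")

# Plot 2 (pattern): the same data as a week-by-weekday heatmap, where a
# recurring weekly rhythm would show up as vertical bands.
im = ax_pattern.imshow(daily, cmap="Blues", aspect="auto")
ax_pattern.set_xlabel("Day of week")
ax_pattern.set_ylabel("Week")
ax_pattern.set_title("Weekly pattern")
fig.colorbar(im, ax=ax_pattern, label="Visits")
fig.tight_layout()
```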
Horizon graphs display multiple time series in parallel. They are similar to time-series plots; however, horizon graphs use color to highlight differences and extremes across the time series.
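Horizon graphs are commonly built by slicing each series into bands of equal height and stacking the bands on one baseline. Higher bands get darker shades, and negative values are mirrored upward in a separate hue. The function below is a minimal, library-free sketch of that folding step under those assumptions. The name `horizon_bands` is hypothetical, and values beyond `n_bands * band_height` are simply clipped:

```python
def horizon_bands(values, band_height, n_bands=3):
    """Fold a series into horizon-graph layers (hypothetical helper).

    Returns {"positive": [...], "negative": [...]}, each a list of
    n_bands lists. Layer k holds the part of |v| that lies between
    k * band_height and (k + 1) * band_height, so all layers share the
    same height and can be drawn on top of each other.
    """
    layers = {"positive": [], "negative": []}
    for k in range(n_bands):
        lower = k * band_height
        pos, neg = [], []
        for v in values:
            # Portion of |v| inside band k, clipped to [0, band_height].
            part = min(max(abs(v) - lower, 0.0), band_height)
            pos.append(part if v > 0 else 0.0)
            neg.append(part if v < 0 else 0.0)  # negatives mirrored upward
        layers["positive"].append(pos)
        layers["negative"].append(neg)
    return layers


print(horizon_bands([5, -2, 12], band_height=5))
```

You can then draw each layer as a filled area on the same baseline, for example with matplotlib's `fill_between`, using progressively darker shades for higher bands.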
Tip №4: Make your axis ranges interesting
The range of your vertical and horizontal axes depends on the type of chart and the story you’re trying to tell with it.
For example, if you're using a bar chart to compare the maximum values of different datasets, your value axis needs to start at 0, because readers judge bars by their length.
However, if you want to show fluctuations in your data precisely, zoom in on your axes to make them clear. Variations in a dataset are easier to see when the plot limits are close to the range of the fluctuations.
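The following is a minimal matplotlib sketch of both cases, using hypothetical data. The bar chart keeps its value axis at 0. The line chart zooms in using a small helper, `zoomed_limits`, which is a hypothetical name with an arbitrary 10% margin:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt


def zoomed_limits(values, pad_fraction=0.1):
    """Axis limits hugging the data range, with a margin on each side."""
    lo, hi = min(values), max(values)
    # For a flat series (hi == lo), fall back to a margin based on the
    # value itself, and to 1.0 if that is zero too.
    pad = (hi - lo) * pad_fraction or abs(hi) * pad_fraction or 1.0
    return lo - pad, hi + pad


# Hypothetical data.
maxima = {"A": 98, "B": 102, "C": 95}
prices = [101.2, 101.8, 100.9, 102.4, 101.5]

fig, (ax_bar, ax_line) = plt.subplots(1, 2, figsize=(9, 3.5))

# Bar chart: bar length encodes the value, so the axis starts at 0.
ax_bar.bar(list(maxima), list(maxima.values()))
ax_bar.set_ylim(bottom=0)

# Line chart of small fluctuations: zoom the axis in on the data range.
ax_line.plot(prices, marker="o")
ax_line.set_ylim(*zoomed_limits(prices))
```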
Tip №5: Emphasize change rate with data transformation
The decision to use a transformation in your visualization depends on both your dataset and the intent of the plot. Applying transformations to your graph can change the impression it gives and the information it conveys.
Generally speaking, you can transform two aspects of your graphs: your axes or the data itself.
Transforming your axes
When plotting a set of data, you can use either a linear or a logarithmic scale. On a logarithmic scale, equal distances represent equal ratios rather than equal differences, so it is often used to display percentage change over a period of time. As a result, values that are evenly spaced in absolute terms (such as 10, 20, 30) are not positioned equidistantly on the scale.
A linear scale, on the other hand, displays the absolute differences between the points of your dataset.
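The following sketch shows the difference between the two scales. It assumes matplotlib and a hypothetical series that grows 10% per year. On the log axis, each year's step covers the same vertical distance, log10(1.1); on the linear axis, the steps grow:

```python
import math

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical series growing 10% per year.
years = list(range(2010, 2021))
revenue = [100 * 1.1 ** i for i in range(len(years))]

# On a log scale, a value's position is proportional to log(value), so
# every 10% step spans the same distance on the axis.
steps = [math.log10(b) - math.log10(a) for a, b in zip(revenue, revenue[1:])]

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(9, 3.5))
ax_lin.plot(years, revenue)
ax_lin.set_title("Linear scale: absolute differences")
ax_log.plot(years, revenue)
ax_log.set_yscale("log")
ax_log.set_title("Log scale: relative (percentage) change")
```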
Transforming your data
Some people find logarithmic scales hard to understand, so one way to avoid them is to transform your data instead. For example, instead of displaying absolute values, you can normalize your values to the mean or to a specific reference value.
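Both normalizations take only a line each. The following is a minimal sketch with hypothetical function names and data:

```python
from statistics import mean


def normalize_to_mean(values):
    """Express each value as a multiple of the series mean (1.0 = average)."""
    m = mean(values)
    return [v / m for v in values]


def index_to_reference(values, reference):
    """Express each value relative to a reference value, e.g. the first period."""
    return [v / reference for v in values]


# Hypothetical series.
sales = [200, 220, 180, 240]
print(normalize_to_mean(sales))
print(index_to_reference(sales, sales[0]))  # relative to the first period
```

On an ordinary linear axis, an indexed value of 1.1 reads directly as "10% above the reference". This conveys relative change without a logarithmic scale.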
Tip №6: Be careful with overlapping points in scatter plots
When you use a scatter plot, two or more circles may overlap, which makes the data harder to read. Overlap can also hide the actual size of a cluster within the graph.
One way to avoid this problem and make your scatter plot more meaningful is to make your circles partially transparent (lower their opacity) so that all of your data points remain visible.
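In matplotlib, opacity is controlled by the `alpha` argument of `scatter`. The following is a minimal sketch with hypothetical data consisting of one dense and one sparse cluster:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: a dense cluster and a sparse cluster.
rng = np.random.default_rng(42)
dense = rng.normal(loc=0.0, scale=0.3, size=(2000, 2))
sparse = rng.normal(loc=2.0, scale=0.8, size=(200, 2))
points = np.vstack([dense, sparse])

fig, ax = plt.subplots(figsize=(5, 5))
# alpha < 1 makes each circle semi-transparent: where many circles
# overlap the color builds up, so the dense cluster's size stays visible.
sc = ax.scatter(points[:, 0], points[:, 1], s=20, alpha=0.15)
```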
Another strategy to achieve a similar effect is to plot unfilled circles. This approach may not work well for large datasets, where the opacity option may be the better choice. You can also change the size of the circles to make the overall visualization clearer.
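The following matplotlib sketch shows the other two options, using hypothetical data. Passing `facecolors="none"` draws hollow circles, and a smaller `s` (marker area) reduces overlap:

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(7)
x, y = rng.normal(size=(2, 300))  # hypothetical data

fig, (ax_hollow, ax_small) = plt.subplots(1, 2, figsize=(9, 4))

# Option 1: hollow circles; overlapping outlines stay distinguishable.
hollow = ax_hollow.scatter(x, y, s=40, facecolors="none", edgecolors="C0")

# Option 2: smaller filled markers, combined with some transparency.
small = ax_small.scatter(x, y, s=8, alpha=0.6)
```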
Tip №7: Be careful with your color scheme
Colors can make or break your graphs. When you’re creating new visuals, you need to be careful when selecting a color scheme. To choose the best color scheme, you need to ask yourself two questions.
Is the color visible on different platforms?
Sometimes when we build charts on our devices to use in a presentation or a meeting, we forget to test how this chart will appear on different platforms.
Will it be clear when displayed on a computer or a phone? What about the lighting? Do I have to use high screen brightness to see the chart clearly, or does it work regardless?
What media will I use to display my chart?
If you’re creating charts to be printed, the type of paper may affect your choice of colors. Sometimes a color that is clear on your screen may not be apparent when printing on a specific kind of paper.
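One quick check before printing in black and white is to compare how light each palette color becomes in grayscale. The following is a minimal sketch. It assumes the ITU-R BT.601 luma weights as the grayscale conversion and an arbitrary threshold; both function names are hypothetical:

```python
def grayscale(hex_color):
    """Approximate gray level (0-255) of a "#rrggbb" color, BT.601 luma weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def too_similar_in_grayscale(palette, min_gap=40):
    """Return color pairs whose gray levels differ by less than min_gap.

    min_gap is an arbitrary illustrative threshold, not a standard.
    """
    flagged = []
    for i, a in enumerate(palette):
        for b in palette[i + 1:]:
            if abs(grayscale(a) - grayscale(b)) < min_gap:
                flagged.append((a, b))
    return flagged


print(too_similar_in_grayscale(["#ff0000", "#008000", "#000000"]))
```

For example, pure red (#ff0000) and medium green (#008000) are clearly different on screen but land on almost the same gray (about 76 and 75), so the check flags them.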
Moreover, try to use fewer colors, or related colors, to deliver your message. If you're creating a heatmap, use a gradient of a single color rather than several different colors; mixing colors can confuse readers and make your map difficult to understand.
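In matplotlib, a single-hue gradient corresponds to a sequential colormap such as "Blues". The following is a minimal sketch with a hypothetical matrix of values in [0, 1):

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical 6x6 matrix of values in [0, 1).
rng = np.random.default_rng(1)
values = rng.random((6, 6))

fig, ax = plt.subplots(figsize=(5, 4))
# "Blues" is a sequential single-hue colormap: darker blue = higher value,
# so readers judge only lightness instead of comparing unrelated hues.
im = ax.imshow(values, cmap="Blues", vmin=0, vmax=1)
fig.colorbar(im, ax=ax)
```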
Conclusion
Visualizing data is often the best and most straightforward way to communicate it to a broad audience. Whenever we create charts and figures, we need to make them simple, direct, and easy to read.
Remember, your data tells a story, and your choice of visualization can either make this story exciting or downright dull.
So, by following these seven simple steps, you can quickly improve the quality and readability of your visualizations:
- Simple is always better.
- Your axis ranges make a huge difference.
- Focus on one aspect per chart.
- Choose the right chart type for your data.
- Use transformations to emphasize change.
- Be careful with overlapping circles in a scatter plot.
- Don't overdo it with color schemes.