Transmogrify Your DevOps Alerts and Avoid Downtime

This article was sponsored by VictorOps. Thank you for supporting the sponsors who make SitePoint possible.

It sounds like something you might remember from the Calvin & Hobbes comic strip. But instead of helping with mind reading and shape-shifting, the VictorOps Transmogrifier tool is designed to help engineers resolve application alerts and minimize downtime.

By bringing automation to key stages of an alert lifecycle, the Transmogrifier is unrivaled in both range of use and user satisfaction. Engineers no longer have to sift through irrelevant alerts while hunting for documentation related to the ones that actually matter. That’s because the Transmogrifier can handle those tasks on its own, leaving the important part, resolving the issue, to the on-call engineer.

Why you need a Transmogrifier

If you’ve managed to successfully integrate an alert monitoring system into your workflow, you may have started to notice two things. The first is that, at the slightest hint of an error (valid or not), an alert is sent. This can add up to a lot of alerts in a short time period, especially when you begin to consider all of the different pieces of an application that might require monitoring.

The second issue you may notice is that, as a result of these alerts, your devops team may start to feel a little overwhelmed. It’s called alert fatigue, and it’s an issue that plagues on-call teams whose responsibility it is to respond to all of the alerts sent to them.

These two issues can lead to an environment of constant busyness and disorganization, as teams struggle to filter through alerts and gather the relevant information needed to actually resolve them.

Using the VictorOps Transmogrifier is like adding another member to your devops team. It removes the problem of alert fatigue by adding a number of key features to your alert notification process. With the Transmogrifier you can filter through irrelevant alerts, delivering only the important ones to your team.

To cut down on development time, you can make sure relevant documentation is attached to every alert, so that your developers don’t have to hunt down a solution to a commonly occurring problem. You can even change the status of alerts on the fly, so that specific alert patterns go on to notify the right members of your team.

The Transmogrifier helps to transform your current alert monitoring strategy so you’re not left in the dark to fight every random alert that comes your way.

So how does it work?

By way of a simple drag-and-drop interface, the Transmogrifier gives you a number of features that you can add onto your alerts, transforming them into miniature programs that can do a lot of important work for you and your team. Let’s go over those features and explain how you can customize them to help your team.

Alert Rules

Setting up an alert rule using the Transmogrifier resembles an IFTTT (If This Then That) pattern. Designating a match between an alert field and a specified value enables the user to attach specific notes, links, and documentation to the alert.

You can set as many alert rules as you’d like, with each rule being tested on every alert that comes through the pipeline. If you’d like the Transmogrification to stop after a specific rule is matched, you can specify that with the cunningly-named “Stop after this rule has been applied” option.
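
To make the evaluation order concrete, here is a minimal Python sketch of how an ordered rule list with a stop option might behave. It is not VictorOps code: the `Rule` class, its field names, and `apply_rules` are all hypothetical, and matching is simplified to exact equality.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    """Hypothetical stand-in for one Transmogrifier rule."""
    field_name: str            # alert field to inspect, e.g. "host_name"
    value: str                 # value the field must equal (simplified matching)
    note: str                  # annotation attached when the rule matches
    stop_after: bool = False   # mirrors "Stop after this rule has been applied"


def apply_rules(alert: dict, rules: list[Rule]) -> list[str]:
    """Test each rule in order and collect notes from the rules that match."""
    notes = []
    for rule in rules:
        if alert.get(rule.field_name) == rule.value:
            notes.append(rule.note)
            if rule.stop_after:
                break  # remaining rules are skipped for this alert
    return notes


rules = [
    Rule("service", "database", "See the database runbook", stop_after=True),
    Rule("severity", "critical", "Page the on-call lead"),
]

# The first rule matches and stops processing, so the second note is never added.
print(apply_rules({"service": "database", "severity": "critical"}, rules))
# The first rule does not match, so the second rule still gets its turn.
print(apply_rules({"service": "web", "severity": "critical"}, rules))
```

Placing the most specific rules first and marking them with the stop option is one way to keep broad catch-all rules from piling extra notes onto alerts that have already been handled.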

These rules can be very flexible, especially when wildcard characters are used in the alert value field.

When host_name matches db*.victorops.com

Note the use of the ‘*’ in the value field. You can use the ‘*’ and ‘?’ characters for simple wildcards, representing any string of characters or any single character, respectively.
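
To get a feel for these wildcards outside the dashboard, they can be approximated with Python's standard `fnmatch` module. This is only an approximation, on the assumption that the Transmogrifier's `*` and `?` behave like shell-style globs; `fnmatch` also gives special meaning to square brackets, which the rules described here do not mention. The `matches` helper is a name made up for this sketch.

```python
from fnmatch import fnmatchcase


def matches(value: str, pattern: str) -> bool:
    """Case-sensitive glob match: * is any run of characters, ? is exactly one."""
    return fnmatchcase(value, pattern)


pattern = "db*.victorops.com"
for host in ["db1.victorops.com", "db-replica.victorops.com", "web1.victorops.com"]:
    print(host, matches(host, pattern))

# ? stands for exactly one character, so a two-digit suffix does not match.
print(matches("db1.victorops.com", "db?.victorops.com"))
print(matches("db12.victorops.com", "db?.victorops.com"))
```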

Custom Annotations

Another useful feature provided by the Transmogrifier is the ability to add custom annotations to your alerts. Using these annotations, you can add things like data visualizations, graphs and charts, as well as extra documentation to your alert. This feature is aided in part by the ability to use variables inside your alert rules.

Automation

You can even configure alerts to run certain processes for you via annotations, by writing commands directly into note fields.

There are plenty of interesting things you can do with the features provided by the Transmogrifier, so be sure to check out their documentation once you’re ready to dive in!

Using the Transmogrifier

Now that we know all about the Transmogrifier and what it can do, let’s use it to set up some alert filters!

After navigating to Settings > Transmogrifier, we’re presented with a dashboard where we can start setting up filters for every alert that comes our way.

Clicking the “Add a Rule” button presents us with a fresh new menu to start plugging in our options.

My goal is to modify the alerts that notify me when any of my GitHub Pages sites is having performance trouble, so I’m going to configure my filter to start Transmogrifying as soon as the host_name field matches tevko.github.io/*. This is going to be my top-level filter, meaning that all other filters will be applied after this filter passes.

Next, I’m going to annotate the alert with a note that clarifies the issue. In the “Annotate the alert with:” field, I’m placing a note that tells me the error is coming from a GitHub Pages URL. Here’s where I’ll be using the double curly bracket variable syntax, so I’ll be notified of the exact location that the error is coming from. Pretty cool!
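
For readers who want to see what that kind of variable expansion amounts to, here is a small Python sketch. It assumes placeholders written as `{{field_name}}`; the exact delimiter the Transmogrifier expects may differ, so treat the pattern, the `render_note` helper, and the sample alert as illustrative only.

```python
import re

# Assumed placeholder form: {{field_name}}, with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_note(template: str, alert: dict) -> str:
    """Replace each {{field}} with the alert's value; leave unknown fields as-is."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return str(alert[name]) if name in alert else match.group(0)

    return PLACEHOLDER.sub(substitute, template)


alert = {"host_name": "tevko.github.io/blog"}
print(render_note("Error coming from GitHub Pages URL: {{host_name}}", alert))
```

Leaving unknown placeholders untouched, rather than replacing them with an empty string, makes a misspelled field name easy to spot in the resulting note.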

Let’s set up one more rule to ensure that the alert can be handled quickly. I’m going to add another rule that checks where exactly the alert is coming from, and updates the status of the alert if it’s coming from a specific location. In order to do that, I’ve configured the filter to check for a monitoring_tool that matches pingdom. If that checks out, then I’m going to annotate the alert with the URL for Pingdom’s performance monitoring tool. I’ll then transform the alert’s status to urgent, and add a note that the alert requires immediate attention.
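
The two rules can be pictured as plain data plus a small function that applies them. The Python sketch below is one reading of the setup described here, not the Transmogrifier's actual format: the dictionary layout, the `status`, `note`, `attention`, and `link` field names, and the example.com URL are all placeholders invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical rule format: match one field, then merge "set" into the alert.
TOP_LEVEL_RULE = {
    "field": "host_name",
    "pattern": "tevko.github.io/*",
    "set": {"note": "Error is coming from a GitHub Pages URL"},
}
FOLLOW_UP_RULES = [
    {
        "field": "monitoring_tool",
        "pattern": "pingdom",
        "set": {
            "status": "urgent",
            "attention": "Requires immediate attention",
            "link": "https://example.com/performance-dashboard",  # placeholder URL
        },
    },
]


def rule_matches(alert: dict, rule: dict) -> bool:
    """True when the rule's field exists and its value fits the glob pattern."""
    return fnmatchcase(str(alert.get(rule["field"], "")), rule["pattern"])


def transmogrify(alert: dict) -> dict:
    """Return a transformed copy; follow-up rules run only if the top-level rule passes."""
    result = dict(alert)
    if not rule_matches(result, TOP_LEVEL_RULE):
        return result
    result.update(TOP_LEVEL_RULE["set"])
    for rule in FOLLOW_UP_RULES:
        if rule_matches(result, rule):
            result.update(rule["set"])
    return result


incoming = {"host_name": "tevko.github.io/blog", "monitoring_tool": "pingdom", "status": "warning"}
print(transmogrify(incoming))
```

Working on a copy of the alert keeps the original payload intact, which makes it easier to compare what arrived with what the rules produced.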

Now we have a well-defined set of rules that filters through all of my alerts and informs me if one of them is a performance-related alert coming from a GitHub Pages URL. It’s pretty incredible how specific we can get with only two rules!

Overall thoughts

After having a chance to preview the Transmogrifier and set up a few alerts of my own, I’m very impressed with how VictorOps has once again improved the on-call process. Alert fatigue is the most dreaded part of being on-call, and having the ability to sort through and clarify alerts makes the process run much more efficiently. Instead of having to filter through irrelevant and unimportant error messages, your alerts can come with helpful documentation and data visualizations, and some alerts can even solve or downgrade themselves!

My favorite part of the Transmogrifier is its ability to attach documentation to alerts on the fly. Imagine the next alert you receive including a specific and detailed path to solving the problem at hand. With the Transmogrifier, this feature is only a few clicks away.

The Transmogrifier is a great solution to alert fatigue and other on-call difficulties, and it’s also incredibly easy to use. The Transmogrifier dashboard is simple yet powerful, combining a minimalistic drag-and-drop interface with a feature-rich set of user controls.

Conclusion

No website or application will have 100% uptime. But with the Transmogrifier it’s a lot easier to prevent downtime while solving the issues that lead to it. If your engineers are tired of being on-call, suffering from alert fatigue, or just unhappy with their current monitoring solutions, then VictorOps’ solution with the Transmogrifier feature might just be the perfect tool for you.

How do you manage alerts and avoid alert fatigue? Have you given VictorOps a go?

Original article: https://www.sitepoint.com/transmogrify-devops-alerts-avoid-downtime/
