Severity Score Derivation

When I made my U.S. Severity Dashboard for COVID-19, I was asked by several people:

“How did you come up with the Severity score Calculation for your COVID-19 Dashboard?”

I explained that it was quite an interesting process, with many attempts. After speaking to one of my close LinkedIn connections, I realized that the process I went through might make for an interesting article. Thus, here we are.

There are many different ways to solve problems: mathematical laws, derived mathematical equations, or helpful representations of information. For instance, a FICO credit score is a helpful and meaningful representation of someone’s ability to pay back credit. Is it a law? No. Neither is a Quarterback Score for the NFL. Many indexes that we rely on as indicators, like the Dow Jones, are not proven scientific facts, but rather a concise way to represent data in a meaningful format.

Eric Temple Bell, creator of the Bell Series, stated:

“Abstractness, sometimes hurled as a reproach at mathematics, is its chief glory and its surest title to practical usefulness. It is also the source of such beauty as may spring from Mathematics”

In other words, sometimes abstract thinking can lead to practicality.

Initial Data Collected:

[Figure: Building blocks available to build the Severity Score equation]

COVID Dashboards were popping up all over the internet. One commonality was that there was a lack of clarity over where the virus was hitting the hardest. Did “Severity” have to do with deaths or infections? Assuming that people in cities like New York City have more contact with others than people in rural areas of the country, should the population play a role in determining how severe the virus is?

Then it hit me…

“What if I could come up with a way to represent the Severity of the virus in each county of the United States so that a person could easily tell where the virus was impacting health the most?”

I pondered this idea and decided to research what the epidemiologists have determined.

The Research

For flu outbreaks, public health officials usually use a Pandemic Severity Assessment Framework (PSAF) to help determine how “bad” a pandemic will be. It uses two factors: clinical severity and virus transmissibility. The CDC article on it mentions, “The PSAF is one of two assessment tools developed by CDC to guide and coordinate actions among federal, state, local, and tribal entities involved in pandemic response.”

The article then refers to the “Novel Framework for Assessing Epidemiologic Effects of Influenza Epidemics and Pandemics” (Reed et al. 2013). The novel framework details 4 methodology steps for determining the pandemic severity dependent on data availability, transmissibility, and clinical severity indicators. The article also mentions that historically, the Case Fatality Ratio (CFR) has been used.

[Figure: Case Fatality Ratio (CFR) formula]

Another well-known statistic for diseases is the Mortality Rate.

[Figure: Mortality Rate formula, typically represented per 1k or 100k people]
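To make the difference between these two statistics concrete, here is a minimal Python sketch. It assumes the standard epidemiological definitions (CFR as deaths divided by confirmed cases, Mortality Rate as deaths divided by population, scaled per 100k); the function names and the two counties are hypothetical and invented purely for illustration.

```python
def case_fatality_ratio(deaths: int, confirmed_cases: int) -> float:
    """Share of confirmed cases that ended in death (0.0 if there are no cases)."""
    return deaths / confirmed_cases if confirmed_cases else 0.0


def mortality_rate(deaths: int, population: int, per: int = 100_000) -> float:
    """Deaths per `per` residents (per 100k by default)."""
    return deaths / population * per


# Hypothetical counties; the numbers are made up for illustration only.
counties = {
    "big_city": {"deaths": 500, "cases": 10_000, "population": 8_000_000},
    "small_town": {"deaths": 5, "cases": 100, "population": 20_000},
}

for name, c in counties.items():
    cfr = case_fatality_ratio(c["deaths"], c["cases"])
    rate = mortality_rate(c["deaths"], c["population"])
    print(f"{name}: CFR={cfr:.2%}, mortality={rate:.2f} per 100k")
```

Both made-up counties share a CFR of 5%, yet their mortality rates differ by a factor of four, which is the kind of gap a single metric can hide.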

Another link on the CDC website points to the Influenza Risk Assessment Tool (IRAT). The article explains, “The IRAT uses 10 scientific criteria to measure the potential pandemic risk associated with each of these scenarios. These 10 criteria can be grouped into three overarching categories: ‘properties of the virus,’ ‘attributes of the population,’ and ‘ecology & epidemiology of the virus.’” (Reed et al. 2019).

My Assessment

  • I knew that clinical severity indicators were not going to be available for some time and that research hadn’t been fully conducted in order to determine how the virus spread biologically.
  • The Case Fatality Ratio leaves out population size and density, which are important when comparing different counties.
  • The Mortality Rate includes population size, but does not include infections.

There did not seem to be one metric that captured it all. The overall framework and assessment tools that exist seem to focus on the nature of how the virus spreads, not necessarily “how severe each region is”.

This being the case, I started to brainstorm:

“What do I think determines Severity and how can I incorporate common epidemiology calculations to give this equation some legitimacy?”

Equation Attempt 1:

None of the articles I had read mentioned growth rates. Infections and deaths have different growth rates. Therefore, I would need to insert them as separate entities. This train of thought led me to create an Infection Score and a Death Score where:

[Figure: definitions of the Infection Score and Death Score]

I decided that the Death Score would be multiplied by 2 to account for the fact that there would be more infections than deaths. I initially decided to model my equation after the exponential growth formula y = ab^x, where a would be the total metric, b would be the growth rate, and the exponent x would be 2 (that is, the growth rate would be squared). Dividing by population to normalize, the first Severity score was born.

Equations:

[Figure: Severity Equation Attempt 1]

Equation Attempt 2:

Sticking with the exponential growth concept, I added Positive Test Percentage (PTP) as a coefficient for the Total Infections Growth Rate with the rationale that a county with 1000 positive results out of 1 million tests should be represented as less severe than a county with 1000 positive results out of 1001 tests.

With the Positive Test Percentage reducing the impact of Total Infections Growth Rate, I thought that adding the Case Fatality Ratio as a reducing coefficient for the Total Deaths Growth Rate would be an appropriate metric to include in the Severity score.

Equations:

[Figure: Severity Equation Attempt 2]

Helper Equations:

[Figure: helper equations]
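The helper equations themselves are not reproduced here, so the following Python sketch only illustrates the Positive Test Percentage rationale from the text. It assumes PTP is simply positive tests divided by total tests; the function name is hypothetical, and the two counties are the ones from the example above (1000 positives out of 1 million tests versus 1000 positives out of 1001 tests).

```python
def positive_test_percentage(positive_tests: int, total_tests: int) -> float:
    """Fraction of administered tests that came back positive (0.0 if no tests)."""
    return positive_tests / total_tests if total_tests else 0.0


# Same number of positive results, very different amounts of testing.
widely_tested = positive_test_percentage(1_000, 1_000_000)
barely_tested = positive_test_percentage(1_000, 1_001)

print(f"widely tested county: {widely_tested:.4f}")  # 0.0010
print(f"barely tested county: {barely_tested:.4f}")  # 0.9990
```

Under this assumed definition, the barely tested county gets a coefficient close to 1 while the widely tested county gets one close to 0, which matches the intent of treating the latter as less severe.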

Problems: There were two major questions that arose with this round of equations and one problem that I didn’t yet notice.

The two questions are:

  1. What does Case Fatality Ratio * Total Deaths Growth Rate mean?
  2. What does Positive Test Percentage * Total Infections Growth Rate mean?

The problem I didn’t notice was that both CFR and PTP are < 1, and most of the growth rates are also < 1. Multiplying them together produces an even smaller number, which added essentially nothing to my score when deaths and infections could both be in the thousands.
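A quick numeric check shows the scale mismatch. This is only a sketch with invented numbers; it does not reproduce the Attempt 2 equation, it just compares the size of a coefficient-times-growth-rate product with a county total in the thousands.

```python
# All values below are hypothetical and chosen only to show orders of magnitude.
total_infections = 5_000        # county total, in the thousands
ptp = 0.10                      # Positive Test Percentage, below 1
infection_growth_rate = 0.05    # growth rate, also below 1

# Product of two fractions: smaller than either factor.
coefficient_term = ptp * infection_growth_rate

print(f"coefficient term: {coefficient_term:.4f}")
print(f"relative to total infections: {coefficient_term / total_infections:.2e}")
```

With these made-up inputs the product is on the order of thousandths, so next to a total in the thousands it barely registers.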

Gathering Feedback

Leaving those problems aside, I decided to gather some feedback about what I was doing. The most common questions I received were:

“Why are Total Deaths multiplied by 2?” and “Why are you squaring the Growth Rates?”

My professor, Vibhanshu Abhishek, told me that the weights for each of the scores should not be determined by me since each person has their own bias for how they would compare infections and deaths in terms of Severity.

This being the case, I decided to add two coefficients that could be adjusted by the user: a death coefficient α and an infection coefficient β, each ranging between 0 and 1. In addition, I added Population Density with the rationale that an increase in Population should decrease Severity (since each infection or death is now a smaller fraction of the total), but an increase in Population Density should increase Severity, as people would come into contact more often. To accomplish this behavior, Population would need to be in the denominator and Population Density would need to be in the numerator. For easier reading, it can be represented as seen in the equations below.

Equation Attempt 3:

Incorporating this feedback, the equations resulted in:

Equations:

[Figure: Severity Equation Attempt 3]

Problems: I did not realize the problem with the denominator until later. Because Population Density is Population divided by Area, dividing Population by Population Density cancels Population out and leaves the denominator equal to Area.
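Written out, the cancellation looks like this. The derivation assumes the usual definition of Population Density as Population divided by Area, and that the denominator of Attempt 3 had the form Population divided by Population Density, as the description above suggests.

```latex
\[
\text{Population Density} = \frac{\text{Population}}{\text{Area}}
\quad\Longrightarrow\quad
\frac{\text{Population}}{\text{Population Density}}
= \text{Population} \cdot \frac{\text{Area}}{\text{Population}}
= \text{Area}
\]
```

In other words, under these assumptions the score ends up normalized by land area rather than by the number of people.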

Dashboard Creation

With an equation riddled with questions and problems that I did not initially realize, I went ahead and created the first iteration of the U.S. Severity Dashboard.

[Figure: U.S. Severity Dashboard: COVID-19 Landing page, First Iteration (4/12/2020)]

The dashboard gave an entire view of the country based on the Severity, with rankings for each state and county. I was very pleased with the result. My professor told me that I should probably take a log of the equation in order to normalize the Severity, as the distribution was highly skewed in its current state.

Taking the log of my Severity score, I realized that I was breaking the first law of logarithms in one of my representations.

The first law of logarithms states:

log A + log B = log AB

This meant that:

log(Infection Score) + log(Death Score) ≠ log (Infection Score + Death Score)

This posed a new problem because the Death Score and Infection Score could not be multiplied. If they were multiplied, a Death Score of 0 (no deaths) would result in a Severity score of 0, even if that state or county had millions of infections. Therefore, I needed the Infection Score to be added to the Death Score.
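Both points can be checked numerically. The Python sketch below uses invented scores (the names and values are hypothetical): it confirms that adding logs corresponds to multiplying the scores rather than adding them, and that a product collapses to zero when there are no deaths while a sum does not.

```python
import math

# Hypothetical scores, chosen only for illustration.
infection_score = 1_000_000.0
death_score = 50.0

sum_of_logs = math.log10(infection_score) + math.log10(death_score)
log_of_product = math.log10(infection_score * death_score)
log_of_sum = math.log10(infection_score + death_score)

# First law of logarithms: log A + log B equals log(AB), not log(A + B).
print(math.isclose(sum_of_logs, log_of_product))  # True
print(math.isclose(sum_of_logs, log_of_sum))      # False

# With zero deaths, multiplying wipes out the infections; adding keeps them.
no_deaths = 0.0
print(infection_score * no_deaths)  # 0.0
print(infection_score + no_deaths)  # 1000000.0
```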

After playing with derivations of the proper logarithmic representation, I finally discovered the issues I had mentioned earlier and decided I needed to rework the entire equation.

Equation Attempt 4:

Days of experimenting went by and I tried to focus on why I was creating the Severity score in the first place.

“The point of the Severity score is to enable comparison of health impact between counties or states”

I restructured the equation to account for the fact that once the growth rate is 0, meaning no new infections, the resulting Severity score would be based on the total amount of infections and deaths that the county or state had reached.

What Changed:

  • Fixed all problems previously mentioned.
  • I realized I should have been multiplying growth rates by the actual number of infections or deaths.
  • I realized that the Positive Test Percentage was complicating the equation with no way to tie it to growth rate.
  • I realized that the Case Fatality Ratio contained both infections and deaths. This meant that as infections went up, the Infection Score would go up, but the Death Score, which was previously being multiplied by the CFR, would go down since CFR < 1.
  • I realized that Population had an overwhelming impact on the Severity. I wanted Population to normalize the Severity scores, but I did not want it to be a linear effect. I decided on a square root, which would have a smaller effect on larger populations, whereas a log would have a much steeper effect.
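To get a feel for the last point, the following Python sketch compares how three candidate normalizers grow with population. It does not reproduce the Attempt 4 equation; the county sizes are hypothetical, and the base-10 log is an arbitrary choice for the comparison.

```python
import math

# Hypothetical county populations spanning four orders of magnitude.
populations = [1_000, 100_000, 10_000_000]

# For each population, compute the square-root and log normalizers.
rows = [(pop, math.sqrt(pop), math.log10(pop)) for pop in populations]

for pop, root, log in rows:
    print(f"population={pop:>10,}  sqrt={root:>8.1f}  log10={log:.1f}")

# How much larger is each normalizer for the biggest county than the smallest?
linear_ratio = rows[-1][0] / rows[0][0]
sqrt_ratio = rows[-1][1] / rows[0][1]
log_ratio = rows[-1][2] / rows[0][2]
print(linear_ratio, sqrt_ratio, log_ratio)
```

Between the smallest and largest made-up county, the raw population grows by a factor of 10,000, its square root by about 100, and its base-10 log by only about 2.3, so a log flattens population differences far more aggressively than a square root does.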

Equations:

[Figure: Severity Equation Attempt 4]

Gathering Feedback (pt.2)

Gathering input from several people, I asked a fundamental question about the Severity equation.

“When a county or state no longer has any infections or deaths, should their Severity score be zero or the highest that it reached?”

I got the resounding answer:

“Of course the Severity Score should be zero if there are no more infections or deaths, that’s when you would say there is NO SEVERITY”

Thinking about this answer, I realized that made perfect sense. In its current state, the equation would never equal zero since Total Deaths and Total Infections would never equal zero. This meant a small adjustment needed to be made.

Equation Attempt 5:

I decided that if I calculated averages over the last 7 days for both infections and deaths, then a county or state that stopped having new infections or deaths would have a Severity score of zero. I also had to change the growth rates to be based on the last 7 days.
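As a rough illustration of why a 7-day window can reach zero while running totals cannot, here is a Python sketch. It assumes the input is a list of daily cumulative totals; the function name and data are hypothetical, and the 7-day growth rate is left out because its exact definition is not given here.

```python
def new_cases_last_7_days_avg(cumulative: list[int]) -> float:
    """Average daily new cases over the most recent 7 days of a cumulative series."""
    if len(cumulative) < 8:
        raise ValueError("need at least 8 daily cumulative totals")
    # The difference between today's total and the total 7 days ago
    # is the number of new cases recorded in the last 7 days.
    return (cumulative[-1] - cumulative[-8]) / 7


# Hypothetical cumulative infection totals, one value per day.
still_spreading = [100, 110, 125, 140, 160, 185, 210, 240]
outbreak_over = [240, 240, 240, 240, 240, 240, 240, 240]

print(new_cases_last_7_days_avg(still_spreading))  # 20.0
print(new_cases_last_7_days_avg(outbreak_over))    # 0.0
```

The cumulative total in the second series never drops below 240, but its 7-day average of new cases is zero, which is the behavior the feedback asked for.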

Final Equations:

[Figure: Severity Equation Attempt 5]

This representation felt like the perfect balance of what I wanted to achieve in this equation.

  • An inclusive equation for both infections and deaths, normalized by population.
  • A weekly snapshot of Severity, allowing a score of 0 if no further infections or deaths ensue.
  • The inclusion of growth rates to reveal the directionality of trending cases.
  • A user-adjustable parameter to allow for differences in weight perception between infections and deaths.

Exactly 1 month after starting the Severity Dashboard, I now had a completed equation and an updated dashboard to show for it.

Media Coverage

Two weeks later, my school at UCI wrote an article about the project. https://merage.uci.edu/news/2020/05/matthew-littman-msba-20-creates-covid-19-dashboard.html

Two months later, the Daily Pilot (LA Times) picked up the story.

https://www.latimes.com/socal/daily-pilot/news/story/2020-07-15/uc-irvine-graduate-is-the-architect-of-detailed-covid-19-dashboard

[Figure: U.S. Severity Dashboard: COVID-19 Landing page (5/12/2020)]

What Does the Finished Product Look Like?

After several months of adding tooltips, updates, new features, user experience improvements, and quality checks, the U.S. Severity Dashboard: COVID-19 in its present form is pictured below. Click on the link to check it out and thank you for reading!

[Figure: U.S. Severity Dashboard: COVID-19 Landing Page on Tableau Public (9/09/2020)]

Sources:

“Influenza Risk Assessment Tool (IRAT).” 2019. October 10, 2019. https://www.cdc.gov/flu/pandemic-resources/national-strategy/risk-assessment.htm.

Reed, Carrie, Matthew Biggerstaff, Lyn Finelli, Lisa M. Koonin, Denise Beauvais, Amra Uzicanin, Andrew Plummer, Joe Bresee, Stephen C. Redd, and Daniel B. Jernigan. 2013. “Novel Framework for Assessing Epidemiologic Effects of Influenza Epidemics and Pandemics.” Emerging Infectious Diseases 19 (1). Accessed September 9, 2020. https://doi.org/10.3201/eid1901.120124.

Translated from: https://towardsdatascience.com/severity-score-derivation-c5e63f9ae046