窗口区域的分级平均值
问题描述:
我正在尝试在GRanges对象中的meta列的某个窗口中查找平均值。窗口区域的分级平均值
所以我有一个数据对象:如果我再
gr.RleList <- mcolAsRleList(gr.data, varname = "value")
gr.RleList
RleList of length 3
$chr1
numeric-Rle of length 249239887 with 5816335 runs
Lengths: 10467 1 1 1 ... 1 1 13 2
Values : NA 0 1 0.5 ... 0 NaN NA NaN
:
> gr.data
GRanges object with 10505026 ranges and 1 metadata column:
seqnames ranges strand | value
<Rle> <IRanges> <Rle> | <numeric>
[1] chr1 [10468, 10468] + | 0
[2] chr1 [10469, 10469] - | 1
[3] chr1 [10470, 10470] + | 0.5
[4] chr1 [10471, 10471] - | 1
[5] chr1 [10483, 10483] + | 0.6
和窗口对象:
gr.windows
GRanges object with 6077 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr1 [ 1, 100000] *
[2] chr1 [100001, 200000] *
[3] chr1 [200001, 300000] *
[4] chr1 [300001, 400000] *
[5] chr1 [400001, 500000] *
我然后将数据对象转换为RleList使用binnedAverage,我只是得到NA值。
gr.binnedAvg <- binnedAverage(gr.windows, gr.RleList, "value")
gr.binnedAvg
GRanges object with 6077 ranges and 1 metadata column:
seqnames ranges strand | value
<Rle> <IRanges> <Rle> | <numeric>
[1] chr1 [ 1, 100000] * | <NA>
[2] chr1 [100001, 200000] * | <NA>
[3] chr1 [200001, 300000] * | <NA>
[4] chr1 [300001, 400000] * | <NA>
[5] chr1 [400001, 500000] * | <NA>
unique(mcols(gr.binnedAvg)$value)
[1] NA
我也尝试总结这些值,结果相同。 NA列表中的NA值是否会让我感到困惑,或者我的问题出现了什么问题?
小例子:
library(BSgenome.Hsapiens.UCSC.hg19)
library(data.table)
gr.windows <- tileGenome(seqinfo(Hsapiens), tilewidth=100000,cut.last.tile.in.chrom=TRUE)
gr.data <- GRanges(c("chr1", "chr2"), IRanges(c(10, 50), c(15, 55)), value = c(20, 10))
gr.data.RleList <- mcolAsRleList(gr.data, varname = "value")
seqlevels(gr.windows, force=TRUE) <- names(gr.data.RleList)
gr.data.binnedAvg <- binnedAverage(gr.windows, gr.data.RleList, "value")
gr.data.binnedAvg
GRanges object with 4925 ranges and 1 metadata column:
seqnames ranges strand | value
<Rle> <IRanges> <Rle> | <numeric>
[1] chr1 [ 1, 100000] * | <NA>
[2] chr1 [100001, 200000] * | 0
[3] chr1 [200001, 300000] * | 0
[4] chr1 [300001, 400000] * | 0
非常感谢您的帮助!
答
使用“小例子”,以下工作:
library(BSgenome.Hsapiens.UCSC.hg19)
library(GenomicRanges)
gr.windows <- tileGenome(seqinfo(Hsapiens), tilewidth=100000, cut.last.tile.in.chrom=TRUE)
gr.data <- GRanges(c("chr1", "chr2"), IRanges(c(10, 50), c(15, 55)), value=c(20, 10))
gr.data.cov <- GenomicRanges::coverage(gr.data, weight="value")
seqlevels(gr.windows, pruning.mode="coarse") <- names(gr.data.cov)
gr.data.binnedAvg <- binnedAverage(gr.windows, gr.data.cov, "value")
反应迟缓,但可能有助于将来参照。
您可以请发布一个最小可重现的例子。 http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – emilliman5
对不起,我忘了。所以我现在附上它。我猜这是因为窗口中缺少值?但我无法弄清楚如何避免这种情况。 –
'mcolAsRleList'和'binnedAverage'不是您加载的库的一部分,它们也不在'GenomicRanges'包中 – emilliman5