Batching with multiple groupings
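I have a CSV file whose records I need to sort and then group into batches of an arbitrary size (for example, at most 300 records per batch). A batch may contain fewer than 300 records, because the contents of each batch must be homogeneous (based on the contents of several different columns).
My LINQ statement, inspired by this answer on batching with LINQ, looks like this: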
var query = (from line in EbrRecords
let EbrData = line.Split('\t')
let Location = EbrData[7]
let RepName = EbrData[4]
let AccountID = EbrData[0]
orderby Location, RepName, AccountID).
Select((data, index) => new {
Record = new EbrRecord(
AccountID = EbrData[0],
AccountName = EbrData[1],
MBSegment = EbrData[2],
RepName = EbrData[4],
Location = EbrData[7],
TsrLocation = EbrData[8]
)
,
Index = index}
).GroupBy(x => new {x.Record.Location, x.Record.RepName, batch = x.Index/100});
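The "/ 100" gives me the arbitrary bucket size. The other elements of the GroupBy are there to keep each batch homogeneous. I suspect this is almost what I want, but it gives me the following compiler error: A query body must end with a select clause or a group clause.
I understand why I'm getting the error, but overall I'm not sure how to fix this query. How would it be done?
UPDATE I'm very close to what I'm after, with the following: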
List<EbrRecord> input = new List<EbrRecord> {
new EbrRecord {Name = "Brent",Age = 20,ID = "A"},
new EbrRecord {Name = "Amy",Age = 20,ID = "B"},
new EbrRecord {Name = "Gabe",Age = 23,ID = "B"},
new EbrRecord {Name = "Noah",Age = 27,ID = "B"},
new EbrRecord {Name = "Alex",Age = 27,ID = "B"},
new EbrRecord {Name = "Stormi",Age = 27,ID = "B"},
new EbrRecord {Name = "Roger",Age = 27,ID = "B"},
new EbrRecord {Name = "Jen",Age = 27,ID = "B"},
new EbrRecord {Name = "Adrian",Age = 28,ID = "B"},
new EbrRecord {Name = "Cory",Age = 29,ID = "C"},
new EbrRecord {Name = "Bob",Age = 29,ID = "C"},
new EbrRecord {Name = "George",Age = 29,ID = "C"},
};
//look how tiny this query is, and it is very nearly the result I want!!!
int i = 0;
var result = from q in input
orderby q.Age, q.ID
group q by new { q.ID, batch = i++/3 };
foreach (var agroup in result)
{
Debug.WriteLine("ID:" + agroup.Key);
foreach (var record in agroup)
{
Debug.WriteLine(" Name:" + record.Name);
}
}
The trick here is to bypass the Select "index position" overload by using a closure variable (int i in this case). The output is as follows:
ID:{ ID = A, batch = 0 }
Name:Brent
ID:{ ID = B, batch = 0 }
Name:Amy
Name:Gabe
ID:{ ID = B, batch = 1 }
Name:Noah
Name:Alex
Name:Stormi
ID:{ ID = B, batch = 2 }
Name:Roger
Name:Jen
Name:Adrian
ID:{ ID = C, batch = 3 }
Name:Cory
Name:Bob
Name:George
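While this result is acceptable, it falls just short of the ideal. The first occurrence of "batch B" should have 3 entries (Amy, Gabe, Noah), not two (Amy, Gabe). This happens because the index position is not reset when a new group is identified. Does anyone know how to reset my custom index position for each group?
UPDATE 2 I think I may have found an answer. First, an additional function like this: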
public static bool BatchGroup(string ID, ref string priorID)
{
if (priorID != ID)
{
priorID = ID;
return true;
}
return false;
}
Second, the updated LINQ query looks like this:
int i = 0;
string priorID = null;
var result = from q in input
orderby q.Age, q.ID
group q by new { q.ID, batch = (BatchGroup(q.ID, ref priorID) ? i=0 : ++i)/3 };
Now it does what I want. I just wish I didn't need that separate function!
orderby Location, RepName, AccountID
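needs to be followed by a select clause, as demonstrated in StriplingWarrior's answer. LINQ comprehension queries must end with select or group by.
Unfortunately, there is also a logic flaw. Suppose I have 50 accounts in the first group and 100 accounts in the second group, with a batch size of 100. The original code will produce 3 batches of size 50 instead of 2 batches of sizes 50 and 100, because the index keeps counting across groups.
Here is one way to fix it.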
IEnumerable<IGrouping<int, EbrRecord>> query = ...
    orderby Location, RepName, AccountID
    // object initializer (braces), not constructor-call syntax
    select new EbrRecord {
        AccountID = EbrData[0],
        AccountName = EbrData[1],
        MBSegment = EbrData[2],
        RepName = EbrData[4],
        Location = EbrData[7],
        TsrLocation = EbrData[8] } into x
    group x by new {Location = x.Location, RepName = x.RepName} into g
    // the index restarts at 0 inside each (Location, RepName) group
    from g2 in g.Select((data, index) => new { Record = data, Index = index })
                .GroupBy(y => y.Index/100, y => y.Record)
    select g2;
List<List<EbrRecord>> result = query.Select(g => g.ToList()).ToList();
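To make the 50/100 scenario concrete, here is a minimal sketch that compares the two indexing strategies in method syntax. The `Account` class and both method names are hypothetical stand-ins (the real `EbrRecord` has more columns), and only one grouping column is used to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchDemo
{
    // Hypothetical stand-in for EbrRecord: only the grouping column matters here.
    public sealed class Account
    {
        public string Location { get; set; }
        public int Id { get; set; }
    }

    // Global index: the counter keeps running across groups,
    // so a batch boundary can fall in the middle of a group.
    public static List<int> GlobalIndexBatchSizes(IEnumerable<Account> accounts, int batchSize) =>
        accounts
            .OrderBy(a => a.Location)
            .Select((a, index) => new { Record = a, Index = index })
            .GroupBy(x => new { x.Record.Location, Batch = x.Index / batchSize })
            .Select(g => g.Count())
            .ToList();

    // Per-group index: group first, then number the records inside each group,
    // so the index restarts at 0 for every Location.
    public static List<int> PerGroupBatchSizes(IEnumerable<Account> accounts, int batchSize) =>
        accounts
            .OrderBy(a => a.Location)
            .GroupBy(a => a.Location)
            .SelectMany(g => g
                .Select((a, index) => new { Record = a, Index = index })
                .GroupBy(x => x.Index / batchSize, x => x.Record))
            .Select(g => g.Count())
            .ToList();

    public static void Main()
    {
        // 50 accounts in location "A", 100 accounts in location "B".
        var accounts = Enumerable.Range(0, 50).Select(i => new Account { Location = "A", Id = i })
            .Concat(Enumerable.Range(0, 100).Select(i => new Account { Location = "B", Id = i }))
            .ToList();

        Console.WriteLine(string.Join(", ", GlobalIndexBatchSizes(accounts, 100))); // 50, 50, 50
        Console.WriteLine(string.Join(", ", PerGroupBatchSizes(accounts, 100)));    // 50, 100
    }
}
```

The second method is the same idea as the query above, just written with `SelectMany` instead of a second `from` clause.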
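Also note that batching with GroupBy is slow because of the redundant iterations. You could write a for loop that makes a single pass over the ordered collection, and that loop would run much faster than LINQ to Objects.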
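A rough sketch of that single-pass idea, assuming the input is already sorted so that equal keys are adjacent. The `SinglePassBatcher` class and its `Batch` method are hypothetical names, and a `foreach` is used in place of an indexed `for` so it works on any `IEnumerable<T>`. No timing is measured here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SinglePassBatcher
{
    // Splits an already-sorted sequence into batches in one pass.
    // A new batch starts when the key changes or the current batch is full.
    public static List<List<T>> Batch<T, TKey>(
        IEnumerable<T> sorted, Func<T, TKey> keySelector, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var comparer = EqualityComparer<TKey>.Default;
        var batches = new List<List<T>>();
        List<T> current = null;
        TKey currentKey = default(TKey);

        foreach (var item in sorted)
        {
            var key = keySelector(item);
            if (current == null || current.Count == batchSize || !comparer.Equals(key, currentKey))
            {
                current = new List<T>();
                batches.Add(current);
                currentKey = key;
            }
            current.Add(item);
        }
        return batches;
    }

    public static void Main()
    {
        // Same ID sequence as the sorted sample in the question: one A, eight Bs, three Cs.
        var ids = new[] { "A", "B", "B", "B", "B", "B", "B", "B", "B", "C", "C", "C" };
        var batches = Batch(ids, id => id, 3);
        Console.WriteLine(string.Join(" | ", batches.Select(b => string.Join(",", b))));
        // A | B,B,B | B,B,B | B,B | C,C,C
    }
}
```

For the original problem the key selector could return a composite of Location and RepName, for example `r => (r.Location, r.RepName)` on a compiler that supports tuples.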
Doesn't this work?
var query = (from line in EbrRecords
             let EbrData = line.Split('\t')
             let Location = EbrData[7]
             let RepName = EbrData[4]
             let AccountID = EbrData[0]
             orderby Location, RepName, AccountID
             // object initializer (braces), not constructor-call syntax
             select new EbrRecord {
                 AccountID = EbrData[0],
                 AccountName = EbrData[1],
                 MBSegment = EbrData[2],
                 RepName = EbrData[4],
                 Location = EbrData[7],
                 TsrLocation = EbrData[8] }
            ).Select((data, index) => new
            {
                Record = data,
                Index = index
            })
            // second argument: keep only the record as the group's element
            .GroupBy(x => new {x.Record.Location, x.Record.RepName, batch = x.Index/100},
                     x => x.Record);
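What I expect is a list of EbrRecord lists (a list of lists). But the above gives me a list of anonymous types that contain only Location, RepName and batch. I'm starting to wonder whether the post I linked really does what I thought or hoped it did. – 2011-06-02 19:26:10
@Brent: GroupBy will create an IEnumerable of 'IGrouping's, each of which has a Key containing Location, RepName and batch, but each is also itself an IEnumerable that contains the selected values. If you use the overload in my updated answer, you should have an 'IEnumerable
My IntelliSense and compiler refuse to let me put "group by" after "select new" unless I switch to dot notation. – 2011-06-02 19:33:35
The "into x" is important. – 2011-06-02 19:33:57
Fixed a number of embarrassing typos. Now I'm done (whether it works or not). – 2011-06-02 19:35:42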