How to concatenate strings after CSV parsing based on some rules - line by line
Question:
I am reading a CSV file with the univocity parsers - https://www.univocity.com/pages/parsers-tutorial. Here is what test.csv looks like:
Active;3189;Active on this date 2015-03-15-17.03.06.000000
Catalog;3189;This is for date 2015-04-21-11.04.11.000000
Master;3190;It happens on this date 2016-04-22-09.04.27.000000
InActive;3190;Inactive on this date 2016-04-23-09.04.46.000000
The code below does the parsing -
List<String[]> allRows = parser.parseAll(new FileReader("E:/test.csv"));
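After parsing, how can I compare the rows one by one and concatenate the text of rows that share the same value in the second column?
Expected output:
For the 3189 records - String x = Active on this date 2015-03-15-17.03.06.000000 and This is for date 2015-04-21-11.04.11.000000
For the 3190 records - String x = It happens on this date 2016-04-22-09.04.27.000000 and Inactive on this date 2016-04-23-09.04.46.000000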
Answer
You have to be more careful here, because exceptions can occur, so you could do something like this:
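// Needs java.util.HashMap, java.util.Map, java.util.regex.Matcher and java.util.regex.Pattern.
// Assumes each line arrives unsplit in row[0], i.e. the parser was not configured with ';' as the delimiter.
// Case-insensitive so that "InActive" in the sample file matches as well.
Pattern r = Pattern.compile("^(Active|Inactive);([^;]*);(.*)$", Pattern.CASE_INSENSITIVE);
Map<String, Record> records = new HashMap<>();
for (String[] row : allRows) {
    Matcher m = r.matcher(row[0]);
    if (m.matches()) {
        String id = m.group(2);
        // Reuse the record for this id if one exists, otherwise create and register it
        Record record = records.get(id);
        if (record == null) {
            record = new Record();
            record.setId(id);
            records.put(id, record);
        }
        if (m.group(1).equalsIgnoreCase("Active")) {
            record.setActiveComment(m.group(3));
        } else {
            record.setInactiveComment(m.group(3));
        }
    } else {
        // Rows whose first column is neither Active nor Inactive end up here
        System.out.println("NO MATCH");
    }
}
for (Map.Entry<String, Record> rec : records.entrySet()) {
    System.out.println(rec.getValue().getActiveComment() + " and " + rec.getValue().getInactiveComment());
}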
And the Record class:
public class Record {
    private String id;
    private String activeComment;
    private String inactiveComment;
    // add setters and getters
    // add hashCode, equals and toString
}
hashCode and equals compare only the id.
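For reference, here is one way the Record class could be filled in. This is only a sketch: the three fields come from the answer above, while the accessors, the toString format and the use of java.util.Objects are my own assumptions about what was intended.

```java
import java.util.Objects;

// Hypothetical completion of the Record class sketched in the answer.
final class Record {
    private String id;
    private String activeComment;
    private String inactiveComment;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getActiveComment() { return activeComment; }
    public void setActiveComment(String activeComment) { this.activeComment = activeComment; }

    public String getInactiveComment() { return inactiveComment; }
    public void setInactiveComment(String inactiveComment) { this.inactiveComment = inactiveComment; }

    // Two records are equal when their ids are equal; the comments are ignored.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Record)) {
            return false;
        }
        return Objects.equals(id, ((Record) other).id);
    }

    // Must agree with equals, so it is derived from the id only.
    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }

    @Override
    public String toString() {
        return "Record{id=" + id + ", active=" + activeComment + ", inactive=" + inactiveComment + "}";
    }
}
```

Because equality depends only on the id, two Record objects with the same id but different comments count as the same element in a HashSet or as the same key in a HashMap.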
Answer
I tried something that solves your problem in a way, but I am not sure it is a good design. You can try adding the following code to your method:
for (int i = 0; i < allRows.size(); i++) {
    // Skip rows that do not have all three columns, since column index 2 is read below
    if (allRows.get(i).length < 3)
        continue;
    for (int j = i + 1; j < allRows.size(); j++) {
        if (allRows.get(j).length < 3)
            continue;
        // Compare the second column of row i with the second column of every later row
        if (allRows.get(i)[1].equals(allRows.get(j)[1])) {
            System.out.println("for " + allRows.get(i)[1] + " records- String X=" + allRows.get(i)[2] + " and " + allRows.get(j)[2]);
            // Note: if a key such as 3189 occurs more than twice, a line is printed for every matching pair.
        }
    }
}
Output:
for 3189 records- String X=Active on this date 2015-03-15-17.03.06.000000 and This is for date 2015-04-21-11.04.11.000000
for 3190 records- String X=It happens on this date 2016-04-22-09.04.27.000000 and Inactive on this date 2016-04-23-09.04.46.000000
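If a key can occur more than twice, the pairwise loop prints one line per pair. A single pass that groups the third column by the second avoids that. The following is only a sketch of that idea, not part of the answer above: the class name RowGrouper and the method joinByKey are made up for illustration, and it assumes the rows have already been split into three columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups column 3 by column 2 in one pass.
final class RowGrouper {

    // Returns key -> all texts for that key joined with " and ", in first-seen key order.
    static Map<String, String> joinByKey(List<String[]> rows) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] row : rows) {
            if (row == null || row.length < 3) {
                continue; // skip incomplete rows
            }
            grouped.computeIfAbsent(row[1], key -> new ArrayList<>()).add(row[2]);
        }
        Map<String, String> joined = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            joined.put(entry.getKey(), String.join(" and ", entry.getValue()));
        }
        return joined;
    }

    public static void main(String[] args) {
        // The four sample rows from the question, already split on ';'
        List<String[]> allRows = Arrays.asList(
            new String[] {"Active", "3189", "Active on this date 2015-03-15-17.03.06.000000"},
            new String[] {"Catalog", "3189", "This is for date 2015-04-21-11.04.11.000000"},
            new String[] {"Master", "3190", "It happens on this date 2016-04-22-09.04.27.000000"},
            new String[] {"InActive", "3190", "Inactive on this date 2016-04-23-09.04.46.000000"});
        joinByKey(allRows).forEach((id, text) ->
            System.out.println("for " + id + " records- String X=" + text));
    }
}
```

With this approach each key produces exactly one line no matter how many rows share it, and the work is linear in the number of rows instead of quadratic.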
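Answer
I hope I understood your requirement correctly. Just use a map to store the rows by their "key" value, and when you find a key that already exists, concatenate the strings: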
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

public static void main(String... args) {
    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setDelimiter(';');
    // looks like you are not interested in the first column.
    // select the columns you actually need - faster and ensures all rows will come out with 2 columns
    settings.selectIndexes(1, 2);
    CsvParser parser = new CsvParser(settings);
    // linked hashmap to keep the original order if that's important
    Map<String, String[]> rows = new LinkedHashMap<String, String[]>();
    for (String[] row : parser.iterate(new File("E:/test.csv"))) {
        String key = row[0];
        String[] existing = rows.get(key);
        if (existing == null) {
            // first row seen for this key
            rows.put(key, row);
        } else {
            // key already seen: append this row's text to the stored row
            existing[1] += " and " + row[1];
        }
    }
    // print the result
    for (String[] row : rows.values()) {
        System.out.println(row[0] + " - " + row[1]);
    }
}
This prints:
3189 - Active on this date 2015-03-15-17.03.06.000000 and This is for date 2015-04-21-11.04.11.000000
3190 - It happens on this date 2016-04-22-09.04.27.000000 and Inactive on this date 2016-04-23-09.04.46.000000
Hope it helps.
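The get / null check / put sequence can also be written with Map.merge from the standard library. Below is a minimal sketch under some assumptions: the class name MergeByKey is made up, and String.split stands in for the CSV parser so that the snippet runs without any dependency (it does no quote or escape handling, so it is not a replacement for a real parser).

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper showing the same concatenation with Map.merge.
final class MergeByKey {

    // Each line is expected to look like "status;id;text".
    static Map<String, String> merge(List<String> lines) {
        Map<String, String> byKey = new LinkedHashMap<>(); // keeps first-seen key order
        for (String line : lines) {
            String[] cols = line.split(";", 3); // stand-in for the parser, at most 3 columns
            if (cols.length < 3) {
                continue; // skip malformed lines
            }
            // Stores the text for a new key, or appends it to the text already stored
            byKey.merge(cols[1], cols[2], (existing, next) -> existing + " and " + next);
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "Active;3189;Active on this date 2015-03-15-17.03.06.000000",
            "Catalog;3189;This is for date 2015-04-21-11.04.11.000000",
            "Master;3190;It happens on this date 2016-04-22-09.04.27.000000",
            "InActive;3190;Inactive on this date 2016-04-23-09.04.46.000000");
        merge(lines).forEach((id, text) -> System.out.println(id + " - " + text));
    }
}
```

The remapping function passed to merge is only called when the key is already present, so the first text for each key is stored unchanged.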
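I can think of a somewhat dirty approach (not a good design!): you could create two separate lists for the 'Active' and 'Inactive' values and compare them by 'id' (for example 3189 or 3190). If the ids match, concatenate the string values. – procrastinator
Appreciate your response. The first column is dynamic; it can be any string, not just Active or Inactive. We have to decide based on the value in the second column, not the first. I have updated the question. – Sks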