How to concatenate strings row by row after CSV parsing, based on certain rules

Question:

I am using the univocity parsers to read a CSV file - https://www.univocity.com/pages/parsers-tutorial. Here is what test.csv looks like:

Active;3189;Active on this date 2015-03-15-17.03.06.000000 

Catalog;3189;This is for date 2015-04-21-11.04.11.000000 

Master;3190;It happens on this date 2016-04-22-09.04.27.000000 

InActive;3190;Inactive on this date 2016-04-23-09.04.46.000000

The following code does the parsing:

List<String[]> allRows = parser.parseAll(new FileReader("E:/test.csv"));

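After parsing, how can I compare the rows one by one and concatenate the third-column strings for each unique value in the 2nd column?

O/P:

For the 3189 records - String x = Active on this date 2015-03-15-17.03.06.000000 and This is for date 2015-04-21-11.04.11.000000

For the 3190 records - String x = It happens on this date 2016-04-22-09.04.27.000000 and Inactive on this date 2016-04-23-09.04.46.000000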

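I can think of a dirty approach (not a good design!): you could create two different lists for the 'Active' and 'Inactive' values and compare them by 'id' (e.g. 3189 or 3190). If the comparison matches, concatenate the string values. – procrastinator

Appreciate your response. The first column is dynamic; it can be any string other than Active or Inactive. We have to decide based on the second column, not the first column's value. Updated the question. – Sks

Answer

This is only an example; you have to be more careful because exceptions may occur. You can do something like this: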

// Assumes each row[0] holds a whole "status;id;comment" line,
// i.e. the parser was not configured to split on ';'.
Pattern pattern = Pattern.compile("^(Active|Inactive);([^;]*);(.*)$");
// records maps each id to its Record (LinkedHashMap keeps first-seen order)
Map<String, Record> records = new LinkedHashMap<>();
for (String[] row : allRows) {
    Matcher m = pattern.matcher(row[0]);
    if (m.matches()) {
        // Reuse the record for this id if one exists, otherwise create it
        Record record = records.get(m.group(2)) == null ? new Record() : records.get(m.group(2));
        record.setId(m.group(2));
        if (m.group(1).equals("Active")) {
            record.setActiveComment(m.group(3));
        } else if (m.group(1).equals("Inactive")) {
            record.setInactiveComment(m.group(3));
        }
        records.put(record.getId(), record);
    } else {
        System.out.println("NO MATCH");
    }
}

for (Entry<String, Record> rec : records.entrySet()) {
    System.out.println(rec.getValue().getActiveComment() + " and " + rec.getValue().getInactiveComment());
}

And the Record class:

public class Record { 

    private String id; 

    private String activeComment; 

    private String inactiveComment; 

    //add setters getters 

    //hashcode equals and toString. 

}

hashCode and equals compare only the id.
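For reference, a minimal sketch of what the filled-in class might look like. The accessor names follow the calls used in the loop above; the toString format and the package-private visibility are assumptions made for illustration, not something the answer specifies.

```java
import java.util.Objects;

// Hypothetical completion of the Record class outlined in the answer.
class Record {

    private String id;
    private String activeComment;
    private String inactiveComment;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getActiveComment() { return activeComment; }
    public void setActiveComment(String activeComment) { this.activeComment = activeComment; }

    public String getInactiveComment() { return inactiveComment; }
    public void setInactiveComment(String inactiveComment) { this.inactiveComment = inactiveComment; }

    // Two records are equal when their ids are equal; the comments are ignored.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Record)) return false;
        return Objects.equals(id, ((Record) o).id);
    }

    // Must agree with equals, so it hashes the id only.
    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }

    // Assumed format, for debugging only.
    @Override
    public String toString() {
        return "Record{id=" + id + ", active=" + activeComment + ", inactive=" + inactiveComment + "}";
    }
}
```

Because equality depends only on the id, two Record objects for 3189 are treated as the same element in a HashSet even when their comments differ.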

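Appreciate your response. The first column is dynamic; it can be any string other than Active or Inactive. We have to decide based on the second column, not the first column's value. – Sks

Updated the question to remove any confusion. – Sks

No confusion! You can edit the posted code as needed. – ddarellis

Answer

I tried an approach that solves your problem in a way, but I'm not sure whether it is a good design. You can try adding the following code to your method: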

for (int i = 0; i < allRows.size(); i++) {
    // Columns 1 and 2 are both read below, so each row needs at least 3 columns
    if (allRows.get(i).length < 3)
        continue;
    for (int j = i + 1; j < allRows.size(); j++) {
        if (allRows.get(j).length < 3)
            continue;
        // Compare the second column of row i with that of every later row
        if (allRows.get(i)[1].equals(allRows.get(j)[1])) {
            System.out.println("for " + allRows.get(i)[1] + " records- String X=" + allRows.get(i)[2] + " and " + allRows.get(j)[2]);
            // If an id such as 3189 occurs more than twice, one line is printed per matching pair.
        }
    }
}

Output:

for 3189 records- String X=Active on this date 2015-03-15-17.03.06.000000 and This is for date 2015-04-21-11.04.11.000000 
for 3190 records- String X=It happens on this date 2016-04-22-09.04.27.000000 and Inactive on this date 2016-04-23-09.04.46.000000
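The nested loop prints one line per matching pair, so an id that occurs three times produces three lines. A rough sketch of one way around that, assuming the rows are already parsed into a List<String[]> with the id in column 1 and the text in column 2; the class and method names (RowGrouper, concatenateById) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups already-parsed rows by their second column.
final class RowGrouper {

    // Returns id -> all third-column values for that id, joined with " and ".
    static Map<String, String> concatenateById(List<String[]> allRows) {
        // LinkedHashMap keeps the ids in the order they first appear
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] row : allRows) {
            // Skip rows that lack an id or a text column
            if (row == null || row.length < 3 || row[1] == null) {
                continue;
            }
            grouped.computeIfAbsent(row[1], k -> new ArrayList<>()).add(row[2]);
        }

        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), String.join(" and ", entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"Active", "3189", "Active on this date 2015-03-15-17.03.06.000000"});
        rows.add(new String[] {"Catalog", "3189", "This is for date 2015-04-21-11.04.11.000000"});
        rows.add(new String[] {"Master", "3190", "It happens on this date 2016-04-22-09.04.27.000000"});
        rows.add(new String[] {"InActive", "3190", "Inactive on this date 2016-04-23-09.04.46.000000"});

        for (Map.Entry<String, String> entry : concatenateById(rows).entrySet()) {
            System.out.println("for " + entry.getKey() + " records- String X=" + entry.getValue());
        }
    }
}
```

Each id yields exactly one concatenated string regardless of how many rows share it, and the work is a single pass over the rows instead of a comparison of every pair.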

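Answer

I hope I got your requirement right. Just use a map to store the values under their "key", and when you find a pre-existing value, concatenate the strings: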

public static void main(String... args) {
    CsvParserSettings settings = new CsvParserSettings();
    settings.getFormat().setDelimiter(';');

    //looks like you are not interested in the first column.
    //select the columns you actually need - faster and ensures all rows will come out with 2 columns
    settings.selectIndexes(1, 2);

    CsvParser parser = new CsvParser(settings);

    //linked hashmap to keep the original order if that's important
    Map<String, String[]> rows = new LinkedHashMap<String, String[]>();
    for (String[] row : parser.iterate(new File("E:/test.csv"))) {

        String key = row[0];
        String[] existing = rows.get(key);
        if (existing == null) {
            rows.put(key, row);
        } else {
            existing[1] += " and " + row[1];
        }
    }

    //print the result
    for (String[] row : rows.values()) {
        System.out.println(row[0] + " - " + row[1]);
    }
}

This prints:

3189 - Active on this date 2015-03-15-17.03.06.000000 and This is for date 2015-04-21-11.04.11.000000 
3190 - It happens on this date 2016-04-22-09.04.27.000000 and Inactive on this date 2016-04-23-09.04.46.000000

Hope it helps.
