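Optimizing Project Euler #22

Question:

I just solved Project Euler #22, a problem that involves reading about 5,000 lines of text from a file and computing a value for each name from the sum of its letters and its alphabetical position.

However, the code takes about 5-10 seconds to run, which is a bit annoying. What is the best way to optimize this code? I currently use a Scanner to read the file into a String. Is there another, more efficient way to do this? (I tried using a BufferedReader, but that was even slower.)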

public static int P22(){ 


    String s = null; 

    try{ 
     //create a new Scanner to read file 
     Scanner in = new Scanner(new File("names.txt")); 
     while(in.hasNext()){ 
      //add the next line to the string 
      s+=in.next(); 
     } 

    }catch(Exception e){ 

    } 
    //this just filters out the quotation marks surrounding all the names 
    String r = ""; 
    for(int i = 0;i<s.length();i++){ 
     if(s.charAt(i) != '"'){ 
      r += s.charAt(i); 
     } 
    } 
    //splits the string into an array, using the commas separating each name 
    String text[] = r.split(","); 
    Arrays.sort(text); 



    int solution = 0; 
    //go through each string in the array, summing its characters 
    for(int i = 0;i<text.length;i++){ 
     int sum = 0; 
     String name = text[i]; 
     for(int j = 0;j<name.length();j++){ 
      sum += (int)name.charAt(j)-64; 
     } 
     solution += sum*(i+1); 
    } 
    return solution; 


}

Answer

If you are going to use Scanner, why not use it for what it is meant to do (tokenizing)?

Scanner in = new Scanner(new File("names.txt")).useDelimiter("[\",]+");
ArrayList<String> text = new ArrayList<String>();
while (in.hasNext()) {
    text.add(in.next());
}
Collections.sort(text);

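You don't need to strip the quotes, remove the commas, or split anything - Scanner does all of that for you.

This code, including Java startup time, executes in 0.625 s (user time) on my machine. I suspect it is somewhat faster than what you were doing.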
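For completeness, here is a minimal sketch of how the tokenizing snippet above might be combined with the scoring loop from the question. The class name `ScannerNames` and the method names `readSorted` and `totalScore` are made up for illustration, and the sketch assumes the file contains only uppercase names, double quotes, and commas (no trailing newline).

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class; the names are illustrative, not from the answer.
class ScannerNames {
    // Treats any run of quotes and commas as a delimiter and returns the names sorted.
    static List<String> readSorted(Scanner in) {
        in.useDelimiter("[\",]+");
        List<String> names = new ArrayList<>();
        while (in.hasNext()) {
            names.add(in.next());
        }
        Collections.sort(names);
        return names;
    }

    // Sums (1-based position) * (letter values, A=1 .. Z=26) over the sorted names.
    static long totalScore(List<String> sorted) {
        long total = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int sum = 0;
            for (char c : sorted.get(i).toCharArray()) {
                sum += c - 'A' + 1;
            }
            total += (long) (i + 1) * sum;
        }
        return total;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Assumes names.txt is in the working directory, as in the question.
        try (Scanner in = new Scanner(new File("names.txt"))) {
            System.out.println(totalScore(readSorted(in)));
        }
    }
}
```

Taking a Scanner as a parameter keeps the parsing testable on a plain string, for example `new Scanner("\"MARY\",\"ANNA\"")`.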

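Edit: The OP asked what the string passed to useDelimiter is. It is a regular expression. Once you strip the escaping that Java requires to put a quote character inside a string literal, it is [",]+ - and it means: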

[...]   character class: match any of these characters, so
[",]    match a quote or a comma
...+    one-or-more occurrence modifier, so
[",]+   match one or more quotes or commas

Sequences that match this pattern include:

" 
, 
,,,, 
""",,,",","

and indeed ",", which is what we are after here.
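The pattern can be checked directly with `java.util.regex`. The following is only an illustrative sketch; `DelimiterDemo` and `isDelimiter` are invented names, and the sample strings are the ones listed above plus two that should not match.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for trying out the delimiter pattern.
class DelimiterDemo {
    // Same regular expression as passed to useDelimiter: one or more quotes or commas.
    static final Pattern DELIMITER = Pattern.compile("[\",]+");

    // True when the whole string consists of one or more quotes and/or commas.
    static boolean isDelimiter(String s) {
        return DELIMITER.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"\"", ",", ",,,,", "\"\"\",,,\",\",\"", "\",\"", "MARY", ""};
        for (String s : samples) {
            System.out.println("[" + s + "] -> " + isDelimiter(s));
        }
    }
}
```

The last two samples are rejected: "MARY" contains characters outside the class, and the empty string fails because + requires at least one occurrence.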

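Thanks! Just a quick question: does the execution time you mention include reading the file from the question? – 2011-12-29 20:29:57

What is the syntax with the [/ and ]+? – 2011-12-29 20:31:36

@JackK: That is all of the code I wrote there - from reading the file to sorting the collection. I will explain the pattern in the answer. – Amadan 2011-12-29 20:31:46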

Answer

Appending to a String in a loop with '+', as you do here:

/* That's actually not the problem since there is only one line. */ 
while(in.hasNext()){ 
    //add the next line to the string 
    s+=in.next(); 
}

is slow, because each iteration creates a new String and copies everything accumulated so far. Try using a StringBuilder,

StringBuilder sb = new StringBuilder(); 
while(in.hasNext()){ 
    sb.append(in.next()); 
} 
s = sb.toString();

However, you really shouldn't read the file contents into a single String; you should build a String[] or an ArrayList<String> directly from the file contents,

int names = 5000; // use the correct number of lines in the file! 
String[] sa = new String[names]; 
for(int i = 0; i < names; ++i){ 
    sa[i] = in.next(); 
}

However, on inspection it turns out that the file does not contain about 5000 lines but is all on one line, so your big problem is actually

/* This one is the problem! */ 
String r = ""; 
for(int i = 0;i<s.length();i++){ 
    if(s.charAt(i) != '"'){ 
     r += s.charAt(i); 
    } 
}

Use a StringBuilder for that. Or have your Scanner read up to the next ',' directly into an ArrayList<String>, and remove the double quotes from each individual name in the ArrayList.
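As a rough sketch of the first suggestion, the quote-stripping loop could be rewritten around a StringBuilder like this. `QuoteStripper` and `stripQuotes` are names invented for the example, and how much time it saves on a given machine is something to measure rather than assume.

```java
// Hypothetical helper illustrating the StringBuilder variant of the filter loop.
class QuoteStripper {
    // Copies every character except double quotes into one growing buffer,
    // instead of allocating a new String for each appended character.
    static String stripQuotes(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '"') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"MARY\",\"PATRICIA\""));
    }
}
```

The original `r += s.charAt(i)` copies the whole accumulated string on every kept character, so its cost grows quadratically with the file size, while the buffer version copies each character a small, roughly constant number of times.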

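I would expect the "r += charAt()" line to be the slow one. – 2011-12-29 20:04:15

Thanks for the prompt response, but I used a StringBuilder in my first attempt, and it actually increased the execution time! – 2011-12-29 20:07:34

@kevincline Yes, I didn't see that at first; I took the line count at face value. – 2011-12-29 20:19:42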

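Answer

I suggest you run your code under a profiler. It will show you which part is really slow (I/O, computation, etc.). If the I/O is slow, look into NIO: http://docs.oracle.com/javase/1.4.2/docs/guide/nio/.

Answer

5+ seconds is quite slow for this problem. My entire web application (600 Java classes) compiles in four seconds. The root of your problem is probably the allocation of a new String for every character in the file: r += s.charAt(i)

To really speed this up, you shouldn't use Strings at all. Get the size of the file, and read the whole thing into a byte array with a single I/O call: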

public class Names {
    private byte[] data;

    private class Name implements Comparable<Name> {
        private int start;  // index into data
        private int length;
        public Name(int start, int length) { ...; }
        public int compareTo(Name arg0) {
            ...
        }
        public int score() { ... }
    }

    public Names(File file) throws Exception {
        data = new byte[(int) file.length()];
        // note: a single read() is not guaranteed to fill the whole array
        new FileInputStream(file).read(data, 0, data.length);
    }

    public int score() {
        SortedSet<Name> names = new ...
        for (int i = 0; i < data.length; ++i) {
            // find limits of each name, add to the set
        }
        // Calculate total score...
    }
}
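One possible way to fill in the elided parts of that outline is sketched below. It is not the answerer's implementation: the class is renamed `ByteNames` to keep it distinct, it reads the file with `Files.readAllBytes` (assuming Java 7 or later), and it sorts an ArrayList rather than using a SortedSet, since a set would silently drop any duplicate names. It also assumes the data is plain ASCII with names made only of 'A' to 'Z'.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical completion of the byte-array outline; all names are illustrative.
class ByteNames {
    private final byte[] data;

    // A name is only a window (start, length) into the shared byte array.
    private final class Name implements Comparable<Name> {
        final int start;
        final int length;

        Name(int start, int length) {
            this.start = start;
            this.length = length;
        }

        // Lexicographic comparison on the raw bytes; a shorter prefix sorts first.
        @Override
        public int compareTo(Name other) {
            int n = Math.min(length, other.length);
            for (int i = 0; i < n; i++) {
                int diff = data[start + i] - data[other.start + i];
                if (diff != 0) return diff;
            }
            return length - other.length;
        }

        // Sum of letter values, A=1 .. Z=26.
        int score() {
            int sum = 0;
            for (int i = 0; i < length; i++) sum += data[start + i] - 'A' + 1;
            return sum;
        }
    }

    ByteNames(byte[] data) {
        this.data = data;
    }

    static ByteNames fromFile(String path) throws IOException {
        return new ByteNames(Files.readAllBytes(Paths.get(path)));
    }

    long score() {
        List<Name> names = new ArrayList<>();
        int start = -1; // start index of the name being scanned, or -1 if none
        for (int i = 0; i < data.length; i++) {
            boolean letter = data[i] >= 'A' && data[i] <= 'Z';
            if (letter && start < 0) start = i;
            if (!letter && start >= 0) {
                names.add(new Name(start, i - start));
                start = -1;
            }
        }
        if (start >= 0) names.add(new Name(start, data.length - start));
        Collections.sort(names);
        long total = 0;
        for (int i = 0; i < names.size(); i++) {
            total += (long) (i + 1) * names.get(i).score();
        }
        return total;
    }
}
```

Under these assumptions, `ByteNames.fromFile("names.txt").score()` would compute the total without creating a String per name.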

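I would suggest using a memory-mapped file - that will be faster. Using quicksort (Arrays.sort) on a mostly sorted collection may lead to n^2 complexity. I believe s += '..' is optimized to use a StringBuilder internally, though I haven't looked at the javap output. – jdevelop 2011-12-29 21:29:22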

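Answer

Depending on the application, StreamTokenizer is often measurably faster than Scanner. Examples comparing the two can be found here and here.

Addendum: Project Euler 22 involves deriving a checksum of the characters in each token encountered. Rather than traversing each token twice, a custom analyzer could combine recognition and calculation. The results would be stored in a SortedMap<String, Integer> for later iteration when computing the grand total.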
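A simple version of this idea can be sketched with a plain StreamTokenizer rather than a hand-written analyzer, computing each name's checksum as soon as its token is read. `TokenizerNames`, `readScores`, and `total` are hypothetical names. The sketch relies on StreamTokenizer's default syntax table, in which the double quote is a string-quote character and the comma is an ordinary character, and it assumes every name in the file is unique, since a map keeps only one entry per key.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: tokenizing and scoring in a single pass.
class TokenizerNames {
    // Returns name -> letter sum (A=1 .. Z=26), ordered alphabetically by name.
    static SortedMap<String, Integer> readScores(Reader reader) {
        StreamTokenizer st = new StreamTokenizer(reader);
        st.quoteChar('"'); // already the default; stated explicitly for clarity
        SortedMap<String, Integer> scores = new TreeMap<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '"') { // a quoted string; its body is in st.sval
                    int sum = 0;
                    for (int i = 0; i < st.sval.length(); i++) {
                        sum += st.sval.charAt(i) - 'A' + 1;
                    }
                    scores.put(st.sval, sum);
                }
                // commas arrive as ordinary one-character tokens and are skipped
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return scores;
    }

    // Walks the map in key order, weighting each checksum by its 1-based position.
    static long total(SortedMap<String, Integer> scores) {
        long total = 0;
        int position = 1;
        for (int value : scores.values()) {
            total += (long) position * value;
            position++;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(total(readScores(new StringReader("\"MARY\",\"ANNA\""))));
    }
}
```

For the real file, a `BufferedReader` wrapped around a `FileReader` for names.txt could be passed to `readScores` instead of the `StringReader`.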

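Thanks! I looked through the javadocs; which method is the equivalent of Scanner's useDelimiter? Is it quoteChar()? – 2011-12-29 23:20:30

I haven't tested it on quoted text, but 'quoteChar()' looks right. – trashgod 2011-12-30 00:50:43

Answer

An obtuse solution which you might find interesting.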

long start = System.nanoTime(); 
long sum = 0; 
int runs = 10000; 
for (int r = 0; r < runs; r++) { 
    FileChannel channel = new FileInputStream("names.txt").getChannel(); 
    ByteBuffer bb = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()); 
    TLongArrayList values = new TLongArrayList(); 

    long wordId = 0; 
    int shift = 63; 
    while (true) { 
     int b = bb.remaining() < 1 ? ',' : bb.get(); 
     if (b == ',') { 
      values.add(wordId); 
      wordId = 0; 
      shift = 63; 
      if (bb.remaining() < 1) break; 

     } else if (b >= 'A' && b <= 'Z') { 
      shift -= 5; 
      long n = b - 'A' + 1; 
      wordId = (wordId | (n << shift)) + n; 

     } else if (b != '"') { 
      throw new AssertionError("Unexpected ch '" + (char) b + "'"); 
     } 
    } 

    values.sort(); 

    sum = 0; 
    for (int i = 0; i < values.size(); i++) { 
     long wordSum = values.get(i) & ((1 << 8) - 1); 
     sum += (i + 1) * wordSum; 
    } 
} 
long time = System.nanoTime() - start; 
System.out.printf("%d took %.3f ms%n", sum, time/1e6);

prints

XXXXXXX took 27.817 ms.
