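Word count algorithm in C#

Question:

I'm looking for a good word count class or function. When I copy and paste something from the internet and compare my custom word count algorithm against MS Word, the result is always off by more than 10%. I think that's too much. So does anyone know of an accurate word count algorithm in C#?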

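Is your algorithm consistently too high or too low? Or does it vary? – Larsenal 2009-10-27 19:28:57

Are you counting only words, or also the markup you're pasting? – joshperry 2009-10-27 19:30:30

Why use the MS Word count as the benchmark for accuracy? Nuances in what counts as a "word" can lead to significant differences in the word count. 10% is not surprising there. What you're seeing may be perfectly accurate, just slightly different. – 2009-10-27 19:33:17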

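Answer

Use String.Split with a predefined set of characters: punctuation, spaces (collapsing runs of multiple spaces), and any other characters you decide are "word breaks".

What have you tried?

I do see that a previous user got called out for links, but here are some examples that use regular expressions or char matching. Hope it helps, and nobody gets hurt X-)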

String.Split Method (Char[])

Word counter in C#

C# Word Count

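Not in favor of link-only answers, especially links that an obvious Google search turns up, w/ no value added – zvolkov 2009-10-27 19:51:16

OK, point taken. – 2009-10-27 19:52:50

@astander Don't forget the StringSplitOptions.RemoveEmptyEntries option in Split; otherwise "word1, word2" or "word1? word2" will be counted as 3 words! – 2013-01-22 21:00:01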
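To make the comment above concrete, here is a minimal sketch that counts the same input with and without StringSplitOptions.RemoveEmptyEntries. The class name SplitOptionsDemo and the three-character separator set are assumptions made for illustration only.

```csharp
using System;

// Hypothetical helper class, not part of any answer in this thread.
public static class SplitOptionsDemo
{
    // Illustrative separator set; a real counter would need more characters.
    private static readonly char[] Separators = { ' ', ',', '?' };

    // Keeps the empty strings that appear between two adjacent separators.
    public static int CountKeepingEmpty(string text)
    {
        return text.Split(Separators).Length;
    }

    // Drops empty entries, so adjacent separators do not inflate the count.
    public static int CountRemovingEmpty(string text)
    {
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

For "word1, word2" the comma and the space are adjacent separators, so CountKeepingEmpty sees an empty entry between them and returns 3, while CountRemovingEmpty returns 2.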

Answer

As @astander suggested, you can do a String.Split as follows:

string[] a = s.Split(
    new char[] { ' ', ',', ';', '.', '!', '"', '(', ')', '?' }, 
    StringSplitOptions.RemoveEmptyEntries);

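By passing in an array of chars, you can split on multiple word-break characters. Removing empty entries keeps you from counting non-words.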

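This is good, but you should also account for newlines. If you type a word, press Enter, type a word, press Enter, it will return a count of 0. One overload of Split() accepts a string array, so you could change this array to strings and add Environment.NewLine (or "\r\n" and "\n"). – 2011-10-28 02:22:25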
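The suggestion in this comment could look roughly like the sketch below, which switches to the string-array overload of Split so that "\r\n" and "\n" can be listed as separators. The class name NewlineAwareSplit is hypothetical, and the separator list simply extends the one from the answer.

```csharp
using System;

// Hypothetical helper class illustrating the string-array overload of Split.
public static class NewlineAwareSplit
{
    // Separators from the answer, expressed as strings, plus line breaks.
    private static readonly string[] Separators =
    {
        " ", ",", ";", ".", "!", "\"", "(", ")", "?",
        "\r\n", "\n"
    };

    public static int CountWords(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return 0;
        }

        // RemoveEmptyEntries discards the empty strings left between
        // adjacent separators, such as a period followed by a line break.
        return text.Split(Separators, StringSplitOptions.RemoveEmptyEntries).Length;
    }
}
```

With this separator list, two words that are each followed by Enter are counted as two words regardless of whether the line breaks are "\r\n" or "\n".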

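Unless your input has very limited formatting, you may want to cast a wider net: consider curly and angle brackets, dashes (although these will produce false positives), and other punctuation. – 2012-10-15 16:04:46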

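Answer

You also need to check for newlines, tabs, and non-breaking spaces. I find it best to copy the source text into a StringBuilder and replace all newlines, tabs, and sentence-ending characters with spaces. Then split the string on spaces.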
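A minimal sketch of this normalize-then-split idea follows. The class name NormalizeThenSplit and the exact set of characters treated as breaks are assumptions; the answer does not say which sentence-ending characters it replaces.

```csharp
using System;
using System.Text;

// Hypothetical helper class; the break characters are an illustrative guess.
public static class NormalizeThenSplit
{
    // Line breaks, tab, non-breaking space (U+00A0), and sentence enders.
    private static readonly char[] BreakChars =
        { '\r', '\n', '\t', '\u00A0', '.', '!', '?' };

    public static int CountWords(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return 0;
        }

        // Replace every break character with a plain space.
        var sb = new StringBuilder(text);
        foreach (char c in BreakChars)
        {
            sb.Replace(c, ' ');
        }

        // Split on spaces and ignore the empty entries left by runs of spaces.
        return sb.ToString()
                 .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                 .Length;
    }
}
```

Replacing on a StringBuilder avoids allocating a new string for each replaced character, which is presumably why the answer recommends it for large pasted texts.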

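Answer

I just ran into the same problem at ClipFlair, where I needed to calculate WPM (words per minute) for movie captions, so I came up with the following:

You can define this static extension method in a static class, and then add a using clause for that static class's namespace in any class that needs the extension method. The extension method is called as s.WordCount(), where s is a string (an identifier [variable/constant] or a literal).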

public static int WordCount(this string s)
{
    int last = s.Length - 1;

    int count = 0;
    for (int i = 0; i <= last; i++)
    {
        // A word ends at a letter or digit that is followed by the end of
        // the string, whitespace, or punctuation.
        if (char.IsLetterOrDigit(s[i]) &&
            ((i == last) || char.IsWhiteSpace(s[i + 1]) || char.IsPunctuation(s[i + 1])))
            count++;
    }
    return count;
}
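To show how the extension method above would be hosted and called, here is a sketch that places it in a static class. The namespace TextUtilities and the class name StringWordCountExtensions are hypothetical; the method body is the one from the answer.

```csharp
using System;

namespace TextUtilities // hypothetical namespace
{
    // Extension methods must live in a non-nested static class.
    public static class StringWordCountExtensions // hypothetical name
    {
        // Counts word endings: a letter or digit followed by the end of
        // the string, whitespace, or punctuation.
        public static int WordCount(this string s)
        {
            int last = s.Length - 1;
            int count = 0;
            for (int i = 0; i <= last; i++)
            {
                if (char.IsLetterOrDigit(s[i]) &&
                    (i == last || char.IsWhiteSpace(s[i + 1]) || char.IsPunctuation(s[i + 1])))
                {
                    count++;
                }
            }
            return count;
        }
    }
}
```

With a `using TextUtilities;` directive in scope, `"Hello, world!".WordCount()` returns 2. Because the apostrophe is classified as punctuation, a contraction such as "don't" is counted as two words, which is one of the definitional differences that can separate this count from the one MS Word reports.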

Answer

Use a regular expression to find the words (e.g. [\w]+) and then simply count the matches:

public static Regex regex = new Regex(
    "[\\w]+",
    RegexOptions.Multiline
    | RegexOptions.CultureInvariant
    | RegexOptions.Compiled
);

regex.Matches(_someString).Count
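Wrapped into a callable method, the regular-expression approach might look like the sketch below. The class name RegexWordCounter is hypothetical, and the null/empty check is an addition not present in the answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the \w+ pattern from the answer.
public static class RegexWordCounter
{
    // \w matches letters, digits, and the underscore.
    private static readonly Regex WordPattern =
        new Regex(@"\w+", RegexOptions.CultureInvariant | RegexOptions.Compiled);

    public static int CountWords(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return 0;
        }

        // Each maximal run of word characters is one match.
        return WordPattern.Matches(text).Count;
    }
}
```

Note that Regex.Matches returns a MatchCollection with a Count property, whereas Regex.Match returns only the first match. Since \w does not include the apostrophe or hyphen, "don't" and "well-known" each count as two words under this pattern.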

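Answer

Here is a stripped-down version of a C# class I wrote that counts words, Asian words, characters, and so on. Its results are almost the same as Microsoft Word's. I developed the original code to count the words in Microsoft Word documents.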

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace BL
{
    public class WordCount
    {
        public int NonAsianWordCount { get; set; }
        public int AsianWordCount { get; set; }
        public int TextLineCount { get; set; }
        public int TotalWordCount { get; set; }
        public int CharacterCount { get; set; }
        public int CharacterCountWithSpaces { get; set; }

        //public string Text { get; set; }

        public WordCount() { }

        public void GetCountWords(string s)
        {
            #region Regular Expression Collection
            string asianExpression = @"[\u3001-\uFFFF]";
            string englishExpression = @"[\S]+";
            string LineCountExpression = @"[\r]+";
            #endregion

            #region Asian Character
            MatchCollection asiancollection = Regex.Matches(s, asianExpression);

            AsianWordCount = asiancollection.Count; //Asian Character Count

            // Blank out the Asian characters so they are not counted again below.
            s = Regex.Replace(s, asianExpression, " ");
            #endregion

            #region English Characters Count
            MatchCollection collection = Regex.Matches(s, englishExpression);
            NonAsianWordCount = collection.Count;
            #endregion

            #region Text Lines Count
            MatchCollection Lines = Regex.Matches(s, LineCountExpression);
            TextLineCount = Lines.Count;
            #endregion

            #region Total Character Count
            CharacterCount = AsianWordCount;
            CharacterCountWithSpaces = CharacterCount;

            foreach (Match word in collection)
            {
                CharacterCount += word.Value.Length;
                CharacterCountWithSpaces += word.Value.Length + 1;
            }
            #endregion

            #region Total Word Count
            TotalWordCount = AsianWordCount + NonAsianWordCount;
            #endregion
        }
    }
}
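The core idea of the class above, counting each Asian character as one word and every remaining whitespace-delimited run as one word, can be condensed into a short sketch. The class name MixedScriptCounter is hypothetical, and the character range is copied from the answer; it is a broad approximation that also covers full-width punctuation and other non-letter code points.

```csharp
using System.Text.RegularExpressions;

// Hypothetical condensed version of the mixed-script counting idea.
public static class MixedScriptCounter
{
    // Same range as the answer: U+3001 through U+FFFF, an approximation.
    private static readonly Regex AsianChar = new Regex(@"[\u3001-\uFFFF]");

    // Any run of non-whitespace characters counts as one token.
    private static readonly Regex Token = new Regex(@"\S+");

    public static int CountWords(string text)
    {
        if (string.IsNullOrEmpty(text))
        {
            return 0;
        }

        // Each character in the range counts as one word.
        int asian = AsianChar.Matches(text).Count;

        // Replace those characters with spaces so they are not counted twice.
        string rest = AsianChar.Replace(text, " ");
        int other = Token.Matches(rest).Count;

        return asian + other;
    }
}
```

For the input "Hello 世界" this sketch counts two Asian characters plus one whitespace-delimited token, for a total of 3.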
