Compare strings for equality - c#

I want to compare a collection of strings and return the the equal parts until a not equal part occurs. (and remove traling whitespace).
example:
List<string> strList = new List<string>
{
"string xyz stop",
"string abc stop",
"string qrt stop"
};
string result = GetEqualName(strList); // This should return "string"
I made the following method that works
string GetEqualName(IEnumerable<string> strList)
{
string outString = "";
bool firstTime = true;
foreach (var subString in strList)
{
if (firstTime)
{
outString = subString;
firstTime = false;
}
else
{
string stringBuilder = "";
for (int i = 0; i < outString.Count(); i++)
{
if (outString[i] == subString[i])
stringBuilder = stringBuilder + outString[i];
else
break;
}
outString = stringBuilder;
}
}
outString = outString.TrimEnd(' '); // Remove traling whitespace
return outString;
}
I just feel that this is something that can be done in a few lines and I am overdoing it. Do you guys have any suggestions?

You can Zip two strings together, take the pairs that are equal, and then create a string of those characters.
public static string LargestCommonPrefix(string first, string second)
{
return new string(first.Zip(second, Tuple.Create)
.TakeWhile(pair => pair.Item1 == pair.Item2)
.Select(pair => pair.Item1)
.ToArray());
}
Once you've solved the problem for the case of combining two strings, you can easily apply it to a sequence of strings:
public static string LargestCommonPrefix(IEnumerable<string> strings)
{
return strings.Aggregate(LargestCommonPrefix);
}

This little function does basically the same as your version, but shorter.
string GetEqualName(IEnumerable<string> strList)
{
int limit = strList.Min(s => s.Length);
int i = 0;
for (; i < limit; i++)
{
if (strList.Select(s => s.Substring(0,i+1)).Distinct().Count() > 1)
{
break;
}
}
return strList.First().Substring(0, i).Trim();
}

Here's a different method which does what you want. I looks for the longest common substring from left to right using a HashSet<string>:
string GetCommonStartsWith(IEnumerable<string> strList, StringComparer comparer = null)
{
if(!strList.Any() || strList.Any(str => string.IsNullOrEmpty(str)))
return null;
if(!strList.Skip(1).Any())
return strList.First(); // only one
if(comparer == null) comparer = StringComparer.CurrentCulture;
int commonLength = strList.Min(str => str.Length);
for (int length = commonLength; length > 0; length--)
{
HashSet<string> duptester = new HashSet<string>(comparer);
string first = strList.First().Substring(0, length).TrimEnd();
duptester.Add(first);
bool allEqual = strList.Skip(1)
.All(str => !duptester.Add(str.Substring(0, length).TrimEnd()));
if (allEqual)
return first;
}
return null;
}

Here's a version that uses less LINQ than some of the other answers and might possibly be more performant.
string GetEqualName(IEnumerable<string> strList)
{
StringBuilder builder = new StringBuilder();
int minLength = strList.Min(s => s.Length);
for (int i = 0; i < minLength; i++)
{
char? c = null;
foreach (var s in strList)
{
if (c == null)
c = s[i];
else if (s[i] != c)
return builder.ToString().TrimEnd();
}
builder.Append(c);
}
return builder.ToString().TrimEnd();
}

Related

How to sort descending a string list contains strings with alphabets & numbers in C# .net?

I have values in a string list like
AB1001_A
AB1001_B
AB1002_2
AB1002_C
AB1003_0
AB1003_
AB1003_B
AB1003_A
AB1001_0
AB1001_1
AB1001_2
AB1001_C
AB1002_B
AB1002_A
And I wanted to sort this by ascending order and the suffixes in descending order like below
AB1001_2
AB1001_1
AB1001_0
AB1001_C
AB1001_B
AB1001_A
AB1002_0
AB1002_B
AB1002_A
AB1003_0
AB1003_B
AB1003_A
AB1003_
How can I code it in C#.net?
It is quite strange sorting, but if you really need it, try something like this:
List<string> lItemsOfYourValues = new List<string>() {"AB1001_A","AB1001_B","AB1001_0" /*and next your values*/};
List<Tuple<string,string,string>> lItemsOfYourProcessedValues = new List<Tuple<string,string,string>>();
string[] arrSplitedValue;
for(int i = 0; i < lItemsOfYourValues.Count; i++)
{
arrSplitedValue = lItemsOfYourValues[i].Split("_");
lItemsOfYourProcessedValues.add(new Tuple<string,string,string>(lItemsOfYourValues[i], arrSplitedValue[0], arrSplitedValue[1]));
}
List<string> lSortedValues = lItemsOfYourProcessedValues.OrderBy(o => o.Item2).ThenByDescending(o => o.Item3).Select(o => o.Item1).ToList();
It looks like you have an error in your expected results, since AB1002_2 is in the input but not in the expected results.
Assuming that's just an error, and further assuming that the suffixes are limited to a single character or digit, you can solve the sorting by writing a special comparer like so:
static int compare(string x, string y)
{
var xParts = x.Split('_', StringSplitOptions.RemoveEmptyEntries);
var yParts = y.Split('_', StringSplitOptions.RemoveEmptyEntries);
if (xParts.Length != yParts.Length)
return yParts.Length - xParts.Length; // No suffix goes after suffix.
if (xParts.Length == 0) // Should never happen.
return 0;
int comp = string.Compare(xParts[0], yParts[0], StringComparison.Ordinal);
if (comp != 0 || xParts.Length == 1)
return comp;
if (char.IsDigit(xParts[1][0]) && !char.IsDigit(yParts[1][0]))
return -1; // Digits go before non-digit.
if (!char.IsDigit(xParts[1][0]) && char.IsDigit(yParts[1][0]))
return 1; // Digits go before non-digit.
return string.Compare(yParts[1], xParts[1], StringComparison.Ordinal);
}
Which you can then use to sort a string list, array or IEnumerable<string>, like so:
using System;
using System.Collections.Generic;
using System.Linq;
namespace Demo
{
static class Program
{
static void Main()
{
var strings = new []
{
"AB1001_A",
"AB1001_B",
"AB1002_2",
"AB1002_C",
"AB1003_0",
"AB1003_",
"AB1003_B",
"AB1003_A",
"AB1001_0",
"AB1001_1",
"AB1001_2",
"AB1001_C",
"AB1002_B",
"AB1002_A",
};
static int compare(string x, string y)
{
var xParts = x.Split('_', StringSplitOptions.RemoveEmptyEntries);
var yParts = y.Split('_', StringSplitOptions.RemoveEmptyEntries);
if (xParts.Length != yParts.Length)
return yParts.Length - xParts.Length;
if (xParts.Length == 0)
return 0;
int comp = string.Compare(xParts[0], yParts[0], StringComparison.Ordinal);
if (comp != 0 || xParts.Length == 1)
return comp;
if (char.IsDigit(xParts[1][0]) && !char.IsDigit(yParts[1][0]))
return -1; // Digits go before non-digit.
if (!char.IsDigit(xParts[1][0]) && char.IsDigit(yParts[1][0]))
return 1; // Digits go before non-digit.
return string.Compare(yParts[1], xParts[1], StringComparison.Ordinal);
}
var stringList = strings.ToList();
stringList.Sort(compare);
Console.WriteLine("Sorted list:");
Console.WriteLine(string.Join("\n", stringList));
var stringArray = strings.ToArray();
Array.Sort(stringArray, compare);
Console.WriteLine("\nSorted array:");
Console.WriteLine(string.Join("\n", stringArray));
var sequence = strings.Select(element => element);
var sortedSeq = sequence.OrderBy(element => element, Comparer<string>.Create(compare));
Console.WriteLine("\nSorted sequence:");
Console.WriteLine(string.Join("\n", sortedSeq));
}
}
}
Try it online on .Net Fiddle
Finally I got the soln by this
var mystrings = new []
{
"AB1001_A",
"AB1001_B",
"AB1002_2",
"AB1002_C",
"AB1003_0",
"AB1003_",
"AB1003_B",
"AB1003_A",
"AB1001_0",
"AB1001_1",
"AB1001_2",
"AB1001_C",
"AB1002_B",
"AB1002_A",
};
mystrings.Cast<string>().OrderBy(x => PadNumbers(x));
and then PadNumbers function as like below
public static string PadNumbers(string input)
{
return Regex.Replace(input, "[0-9]+", match => match.Value.PadLeft(10, '0'));
}

Array.Sort for strings with numbers [duplicate]

This question already has answers here:
Natural Sort Order in C#
(18 answers)
Closed 8 years ago.
I have sample codes below:
List<string> test = new List<string>();
test.Add("Hello2");
test.Add("Hello1");
test.Add("Welcome2");
test.Add("World");
test.Add("Hello11");
test.Add("Hello10");
test.Add("Welcome0");
test.Add("World3");
test.Add("Hello100");
test.Add("Hello20");
test.Add("Hello3");
test.Sort();
But what happen is, the test.Sort will sort the array to:
"Hello1",
"Hello10",
"Hello100",
"Hello11",
"Hello2",
"Hello20",
"Hello3",
"Welcome0",
"Welcome2",
"World",
"World3"
Is there any way to sort them so that the string will have the correct number order as well?
(If there is no number at the end of the string, that string will always go first - after the alphabetical order)
Expected output:
"Hello1",
"Hello2",
"Hello3",
"Hello10",
"Hello11",
"Hello20",
"Hello100",
"Welcome0",
"Welcome2",
"World",
"World3"
Here is a one possible way using LINQ:
var orderedList = test
.OrderBy(x => new string(x.Where(char.IsLetter).ToArray()))
.ThenBy(x =>
{
int number;
if (int.TryParse(new string(x.Where(char.IsDigit).ToArray()), out number))
return number;
return -1;
}).ToList();
Create an IComparer<string> implementation. The advantage of doing it this way over the LINQ suggestions is you now have a class that can be passed to anything that needs to sort in this fashion rather that recreating that linq query in other locations.
This is specific to your calling a sort from a LIST. If you want to call it as Array.Sort() please see version two:
List Version:
public class AlphaNumericComparer : IComparer<string>
{
public int Compare(string lhs, string rhs)
{
if (lhs == null)
{
return 0;
}
if (rhs == null)
{
return 0;
}
var s1Length = lhs.Length;
var s2Length = rhs.Length;
var s1Marker = 0;
var s2Marker = 0;
// Walk through two the strings with two markers.
while (s1Marker < s1Length && s2Marker < s2Length)
{
var ch1 = lhs[s1Marker];
var ch2 = rhs[s2Marker];
var s1Buffer = new char[s1Length];
var loc1 = 0;
var s2Buffer = new char[s2Length];
var loc2 = 0;
// Walk through all following characters that are digits or
// characters in BOTH strings starting at the appropriate marker.
// Collect char arrays.
do
{
s1Buffer[loc1++] = ch1;
s1Marker++;
if (s1Marker < s1Length)
{
ch1 = lhs[s1Marker];
}
else
{
break;
}
} while (char.IsDigit(ch1) == char.IsDigit(s1Buffer[0]));
do
{
s2Buffer[loc2++] = ch2;
s2Marker++;
if (s2Marker < s2Length)
{
ch2 = rhs[s2Marker];
}
else
{
break;
}
} while (char.IsDigit(ch2) == char.IsDigit(s2Buffer[0]));
// If we have collected numbers, compare them numerically.
// Otherwise, if we have strings, compare them alphabetically.
string str1 = new string(s1Buffer);
string str2 = new string(s2Buffer);
int result;
if (char.IsDigit(s1Buffer[0]) && char.IsDigit(s2Buffer[0]))
{
var thisNumericChunk = int.Parse(str1);
var thatNumericChunk = int.Parse(str2);
result = thisNumericChunk.CompareTo(thatNumericChunk);
}
else
{
result = str1.CompareTo(str2);
}
if (result != 0)
{
return result;
}
}
return s1Length - s2Length;
}
}
call like so:
test.sort(new AlphaNumericComparer());
//RESULT
Hello1
Hello2
Hello3
Hello10
Hello11
Hello20
Hello100
Welcome0
Welcome2
World
World3
Array.sort version:
Create class:
public class AlphaNumericComparer : IComparer
{
public int Compare(object x, object y)
{
string s1 = x as string;
if (s1 == null)
{
return 0;
}
string s2 = y as string;
if (s2 == null)
{
return 0;
}
int len1 = s1.Length;
int len2 = s2.Length;
int marker1 = 0;
int marker2 = 0;
// Walk through two the strings with two markers.
while (marker1 < len1 && marker2 < len2)
{
var ch1 = s1[marker1];
var ch2 = s2[marker2];
// Some buffers we can build up characters in for each chunk.
var space1 = new char[len1];
var loc1 = 0;
var space2 = new char[len2];
var loc2 = 0;
// Walk through all following characters that are digits or
// characters in BOTH strings starting at the appropriate marker.
// Collect char arrays.
do
{
space1[loc1++] = ch1;
marker1++;
if (marker1 < len1)
{
ch1 = s1[marker1];
}
else
{
break;
}
} while (char.IsDigit(ch1) == char.IsDigit(space1[0]));
do
{
space2[loc2++] = ch2;
marker2++;
if (marker2 < len2)
{
ch2 = s2[marker2];
}
else
{
break;
}
} while (char.IsDigit(ch2) == char.IsDigit(space2[0]));
// If we have collected numbers, compare them numerically.
// Otherwise, if we have strings, compare them alphabetically.
var str1 = new string(space1);
var str2 = new string(space2);
var result = 0;
if (char.IsDigit(space1[0]) && char.IsDigit(space2[0]))
{
var thisNumericChunk = int.Parse(str1);
var thatNumericChunk = int.Parse(str2);
result = thisNumericChunk.CompareTo(thatNumericChunk);
}
else
{
result = str1.CompareTo(str2);
}
if (result != 0)
{
return result;
}
}
return len1 - len2;
}
}
Call like so:
This time test is an array instead of a list.
Array.sort(test, new AlphaNumericComparer())
You can use LINQ combined with regex to ensure that you use only numbers that occur at the end of the string for your secondary ordering
test
.Select(t => new{match = Regex.Match(t, #"\d+$"), val = t})
.Select(x => new{sortVal = x.match.Success
?int.Parse(x.match.Value)
:-1,
val = x.val})
.OrderBy(x => x.val)
.ThenBy(x => x.sortVal)
.Select(x => x.val)
.ToList()

How to Get Data from index of String

I'm new in c#. and I have some Question...
I have String following this code
string taxNumber = "1222233333445";
I want to get data from This string like that
string a = "1"
string b = "2222"
string c = "33333"
string d = "44"
string e = "5"
Please Tell me about Method for get Data From String.
Thank You Very Much ^^
Use the String.Substring(int index, int length) method
string a = taxNumber.Substring(0, 1);
string b = taxNumber.Substring(1, 4);
// etc
Oh well, the best I can come up with is this:
IEnumerable<string> numbers
= taxNumber.ToCharArray()
.Distinct()
.Select(c => new string(c, taxNumber.Count(t => t == c)));
foreach (string numberGroup in numbers)
{
Console.WriteLine(numberGroup);
}
Outputs:
1
2222
33333
44
5
This can also do , you dont need to fix the no of characters, you can check by changing the no of 1's , 2's etc
string taxNumber = "1222233333445";
string s1 = taxNumber.Substring(taxNumber.IndexOf("1"), ((taxNumber.Length - taxNumber.IndexOf("1")) - (taxNumber.Length - taxNumber.LastIndexOf("1"))) + 1);
string s2 = taxNumber.Substring(taxNumber.IndexOf("2"), ((taxNumber.Length - taxNumber.IndexOf("2")) - (taxNumber.Length - taxNumber.LastIndexOf("2"))) + 1);
string s3 = taxNumber.Substring(taxNumber.IndexOf("3"), ((taxNumber.Length - taxNumber.IndexOf("3")) - (taxNumber.Length - taxNumber.LastIndexOf("3"))) + 1);
You can use Char.IsDigit to identify digits out of string, and may apply further logic as follows:
for (int i=0; i< taxNumber.Length; i++)
{
if (Char.IsDigit(taxNumber[i]))
{
if(taxNumber[i-1]==taxNumber[i])
{
/*Further assign values*/
}
}
Try this Code
string taxNumber = "1222233333445";
char[] aa = taxNumber.ToCharArray();
List<string> finals = new List<string>();
string temp = string.Empty;
for (int i = 0; i < aa.Length; i++)
{
if (i == 0)
{
temp = aa[i].ToString();
}
else
{
if (aa[i].ToString() == aa[i - 1].ToString())
{
temp += aa[i];
}
else
{
if (temp != string.Empty)
{
finals.Add(temp);
temp = aa[i].ToString();
}
}
if (i == aa.Length - 1)
{
if (aa[i].ToString() != aa[i - 1].ToString())
{
temp = aa[i].ToString();
finals.Add(temp);
}
else
{
finals.Add(temp);
}
}
}
}
and check value of finals string list
you may use regex:
string strRegex = #"(1+|2+|3+|4+|5+|6+|7+|8+|9+|0+)";
RegexOptions myRegexOptions = RegexOptions.None;
Regex myRegex = new Regex(strRegex, myRegexOptions);
string strTargetString = #"1222233333445";
return myRegex.Split(strTargetString);

Splitting a string array

I have a string array string[] arr, which contains values like N36102W114383, N36102W114382 etc...
I want to split the each and every string such that the value comes like this N36082 and W115080.
What is the best way to do this?
This should work for you.
Regex regexObj = new Regex(#"\w\d+"); # matches a character followed by a sequence of digits
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
matchResults = matchResults.NextMatch(); #two mathches N36102 and W114383
}
If you have the fixed format every time you can just do this:
string[] split_data = data_string.Insert(data_string.IndexOf("W"), ",")
.Split(",", StringSplitOptions.None);
Here you insert a recognizable delimiter into your string and then split it by this delimiter.
Forgive me if this doesn't quite compile, but I'd just break down and write the string processing function by hand:
public static IEnumerable<string> Split(string str)
{
char [] chars = str.ToCharArray();
int last = 0;
for(int i = 1; i < chars.Length; i++) {
if(char.IsLetter(chars[i])) {
yield return new string(chars, last, i - last);
last = i;
}
}
yield return new string(chars, last, chars.Length - last);
}
If you use C#, please try:
String[] code = new Regex("(?:([A-Z][0-9]+))").Split(text).Where(e => e.Length > 0 && e != ",").ToArray();
in case you're only looking for the format NxxxxxWxxxxx, this will do just fine :
Regex r = new Regex(#"(N[0-9]+)(W[0-9]+)");
Match mc = r.Match(arr[i]);
string N = mc.Groups[1];
string W = mc.Groups[2];
Using the 'Split' and 'IsLetter' string functions, this is relatively easy in c#.
Don't forget to write unit tests - the following may have some corner case errors!
// input has form "N36102W114383, N36102W114382"
// output: "N36102", "W114383", "N36102", "W114382", ...
string[] ParseSequenceString(string input)
{
string[] inputStrings = string.Split(',');
List<string> outputStrings = new List<string>();
foreach (string value in inputstrings) {
List<string> valuesInString = ParseValuesInString(value);
outputStrings.Add(valuesInString);
}
return outputStrings.ToArray();
}
// input has form "N36102W114383"
// output: "N36102", "W114383"
List<string> ParseValuesInString(string inputString)
{
List<string> outputValues = new List<string>();
string currentValue = string.Empty;
foreach (char c in inputString)
{
if (char.IsLetter(c))
{
if (currentValue .Length == 0)
{
currentValue += c;
} else
{
outputValues.Add(currentValue);
currentValue = string.Empty;
}
}
currentValue += c;
}
outputValues.Add(currentValue);
return outputValues;
}

How to word by word iterate in string in C#?

I want to iterate over string as word by word.
If I have a string "incidentno and fintype or unitno", I would like to read every word one by one as "incidentno", "and", "fintype", "or", and "unitno".
foreach (string word in "incidentno and fintype or unitno".Split(' ')) {
...
}
var regex = new Regex(#"\b[\s,\.-:;]*");
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));
This works even if you have ".,; tabs and new lines" between your words.
Slightly twisted I know, but you could define an iterator block as an extension method on strings. e.g.
/// <summary>
/// Sweep over text
/// </summary>
/// <param name="Text"></param>
/// <returns></returns>
public static IEnumerable<string> WordList(this string Text)
{
int cIndex = 0;
int nIndex;
while ((nIndex = Text.IndexOf(' ', cIndex + 1)) != -1)
{
int sIndex = (cIndex == 0 ? 0 : cIndex + 1);
yield return Text.Substring(sIndex, nIndex - sIndex);
cIndex = nIndex;
}
yield return Text.Substring(cIndex + 1);
}
foreach (string word in "incidentno and fintype or unitno".WordList())
System.Console.WriteLine("'" + word + "'");
Which has the advantage of not creating a big array for long strings.
Use the Split method of the string class
string[] words = "incidentno and fintype or unitno".Split(" ");
This will split on spaces, so "words" will have [incidentno,and,fintype,or,unitno].
Assuming the words are always separated by a blank, you could use String.Split() to get an Array of your words.
There are multiple ways to accomplish this. Two of the most convenient methods (in my opinion) are:
Using string.Split() to create an array. I would probably use this method, because it is the most self-explanatory.
example:
string startingSentence = "incidentno and fintype or unitno";
string[] seperatedWords = startingSentence.Split(' ');
Alternatively, you could use (this is what I would use):
string[] seperatedWords = startingSentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
StringSplitOptions.RemoveEmptyEntries will remove any empty entries from your array that may occur due to extra whitespace and other minor problems.
Next - to process the words, you would use:
foreach (string word in seperatedWords)
{
//Do something
}
Or, you can use regular expressions to solve this problem, as Darin demonstrated (a copy is below).
example:
var regex = new Regex(#"\b[\s,\.-:;]*");
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));
For processing, you can use similar code to the first option.
foreach (string word in words)
{
//Do something
}
Of course, there are many ways to solve this problem, but I think that these two would be the simplest to implement and maintain. I would go with the first option (using string.Split()) just because regex can sometimes become quite confusing, while a split will function correctly most of the time.
When using split, what about checking for empty entries?
string sentence = "incidentno and fintype or unitno"
string[] words = sentence.Split(new char[] { ' ', ',' ,';','\t','\n', '\r'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
// Process
}
EDIT:
I can't comment so I'm posting here but this (posted above) works:
foreach (string word in "incidentno and fintype or unitno".Split(' '))
{
...
}
My understanding of foreach is that it first does a GetEnumerator() and the calles .MoveNext until false is returned. So the .Split won't be re-evaluated on each iteration
public static string[] MyTest(string inword, string regstr)
{
var regex = new Regex(regstr);
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase);
return words;
}
? MyTest("incidentno, and .fintype- or; :unitno",#"[^\w+]")
[0]: "incidentno"
[1]: "and"
[2]: "fintype"
[3]: "or"
[4]: "unitno"
I'd like to add some information to JDunkerley's awnser.
You can easily make this method more reliable if you give a string or char parameter to search for.
public static IEnumerable<string> WordList(this string Text,string Word)
{
int cIndex = 0;
int nIndex;
while ((nIndex = Text.IndexOf(Word, cIndex + 1)) != -1)
{
int sIndex = (cIndex == 0 ? 0 : cIndex + 1);
yield return Text.Substring(sIndex, nIndex - sIndex);
cIndex = nIndex;
}
yield return Text.Substring(cIndex + 1);
}
public static IEnumerable<string> WordList(this string Text, char c)
{
int cIndex = 0;
int nIndex;
while ((nIndex = Text.IndexOf(c, cIndex + 1)) != -1)
{
int sIndex = (cIndex == 0 ? 0 : cIndex + 1);
yield return Text.Substring(sIndex, nIndex - sIndex);
cIndex = nIndex;
}
yield return Text.Substring(cIndex + 1);
}
I write a string processor class.You can use it.
Example:
metaKeywords = bodyText.Process(prepositions).OrderByDescending().TakeTop().GetWords().AsString();
Class:
public static class StringProcessor
{
private static List<String> PrepositionList;
public static string ToNormalString(this string strText)
{
if (String.IsNullOrEmpty(strText)) return String.Empty;
char chNormalKaf = (char)1603;
char chNormalYah = (char)1610;
char chNonNormalKaf = (char)1705;
char chNonNormalYah = (char)1740;
string result = strText.Replace(chNonNormalKaf, chNormalKaf);
result = result.Replace(chNonNormalYah, chNormalYah);
return result;
}
public static List<KeyValuePair<String, Int32>> Process(this String bodyText,
List<String> blackListWords = null,
int minimumWordLength = 3,
char splitor = ' ',
bool perWordIsLowerCase = true)
{
string[] btArray = bodyText.ToNormalString().Split(splitor);
long numberOfWords = btArray.LongLength;
Dictionary<String, Int32> wordsDic = new Dictionary<String, Int32>(1);
foreach (string word in btArray)
{
if (word != null)
{
string lowerWord = word;
if (perWordIsLowerCase)
lowerWord = word.ToLower();
var normalWord = lowerWord.Replace(".", "").Replace("(", "").Replace(")", "")
.Replace("?", "").Replace("!", "").Replace(",", "")
.Replace("<br>", "").Replace(":", "").Replace(";", "")
.Replace("،", "").Replace("-", "").Replace("\n", "").Trim();
if ((normalWord.Length > minimumWordLength && !normalWord.IsMemberOfBlackListWords(blackListWords)))
{
if (wordsDic.ContainsKey(normalWord))
{
var cnt = wordsDic[normalWord];
wordsDic[normalWord] = ++cnt;
}
else
{
wordsDic.Add(normalWord, 1);
}
}
}
}
List<KeyValuePair<String, Int32>> keywords = wordsDic.ToList();
return keywords;
}
public static List<KeyValuePair<String, Int32>> OrderByDescending(this List<KeyValuePair<String, Int32>> list, bool isBasedOnFrequency = true)
{
List<KeyValuePair<String, Int32>> result = null;
if (isBasedOnFrequency)
result = list.OrderByDescending(q => q.Value).ToList();
else
result = list.OrderByDescending(q => q.Key).ToList();
return result;
}
public static List<KeyValuePair<String, Int32>> TakeTop(this List<KeyValuePair<String, Int32>> list, Int32 n = 10)
{
List<KeyValuePair<String, Int32>> result = list.Take(n).ToList();
return result;
}
public static List<String> GetWords(this List<KeyValuePair<String, Int32>> list)
{
List<String> result = new List<String>();
foreach (var item in list)
{
result.Add(item.Key);
}
return result;
}
public static List<Int32> GetFrequency(this List<KeyValuePair<String, Int32>> list)
{
List<Int32> result = new List<Int32>();
foreach (var item in list)
{
result.Add(item.Value);
}
return result;
}
public static String AsString<T>(this List<T> list, string seprator = ", ")
{
String result = string.Empty;
foreach (var item in list)
{
result += string.Format("{0}{1}", item, seprator);
}
return result;
}
private static bool IsMemberOfBlackListWords(this String word, List<String> blackListWords)
{
bool result = false;
if (blackListWords == null) return false;
foreach (var w in blackListWords)
{
if (w.ToNormalString().Equals(word))
{
result = true;
break;
}
}
return result;
}
}

Categories