I'd like to divide the strings to two array - c#

I have a string data like "aaa.bbb.ccc" or "aaa.bbb.ccc.ddd".
If I split this like "aaa.bbb.ccc".split("."), it becomes "aaa", "bbb" and "ccc".
I want to divide this to just two strings like "aaa", "bbb.ccc".
I think I can do this by firstly dividing everything and rejoining it but it's not smart way.
Is there any way to set this more smoothly?

You can set the number of splitted parts:
string myString = "aaa.bbb.ccc";
int parts = 2;
string[] myParts = myString.Split(new string[] {"."}, parts, StringSplitOptions.None);

If you want to split on the first dot character, use Substring:
String value = "aaa.bbb.ccc";
Int32 firstDotIdx = value.IndexOf( '.' );
if( firstDotIdx > -1 ) {
return new String[] {
value.Substring( 0, firstDotIdx ),
value.Substring( firstDotIdx + 1 );
}
} else {
return new String[] {
value,
"";
}
}
If you have "aaa.bbb" then it will return { "aaa", "bbb" }.
If you have "aaa.bbb.ccc" then it will return { "aaa", "bbb.ccc" }.
If you have ".aaa.bbb" then it will return { "", "aaa.bbb" }.
If you have "aaa" then it will return { "aaa", "" }.

I have a bunch of extension methods for these kind of string manipulations. An example usage for your task would look like this.
var str = "aaa.bbb.ccc.ddd";
var result = new[] { str.TakeUntil("."), str.TakeFrom(".") };
And the extension pack looks like this. The functions` results will not include the search string itself, and if the search string is not present in the input, they will return the whole input as result (this is a convention which suites my needs for this edge case, could be changed).
public static string TakeUntil(this string s, string search)
{
if (string.IsNullOrEmpty(s)) return s;
var pos = s.IndexOf(search, StringComparison.OrdinalIgnoreCase);
if (pos >= 0)
return s.Substring(0, pos);
return s;
}
public static string TakeUntilLast(this string s, string search)
{
if (string.IsNullOrEmpty(s)) return s;
var pos = s.LastIndexOf(search, StringComparison.OrdinalIgnoreCase);
if (pos >= 0)
return s.Substring(0, pos);
return s;
}
public static string TakeFrom(this string s, string search)
{
if (string.IsNullOrEmpty(s)) return s;
var pos = s.IndexOf(search, StringComparison.OrdinalIgnoreCase);
if (pos >= 0)
return s.Substring(pos + search.Length);
return s;
}
public static string TakeFromLast(this string s, string search)
{
if (string.IsNullOrEmpty(s)) return s;
var pos = s.LastIndexOf(search, StringComparison.OrdinalIgnoreCase);
if (pos >= 0)
return s.Substring(pos + search.Length);
return s;
}

Related

If a string contains a double character change it to another string C#

I created a Person Model:
Model Person = new Model();
Person.LastName = values[0];
[LastName is a string]
I would like to replace the values [0] which is "Anna" with another string value like "Frank" if it contains a double character, in this case "if Anna contains a double character, change the value
with another string".
how to do?
Write a helper function to test for consecutive equal characters:
private static bool HasDoubleCharacter(string s)
{
char? previous = null;
foreach (char ch in s) {
if (ch == previous) {
return true;
}
previous = ch;
}
return false;
}
The you can write
Model Person = new Model();
string name = values[0];
if (HasDoubleCharacter(name)) {
name = "Frank";
}
Person.LastName = name;
You could also create a new array containing only names with no double character and use that one instead:
Model Person = new Model();
string[] names = values
.Where(v => !HasDoubleCharacter(v))
.ToArray();
if (names.Length > 0) {
Person.LastName = names[0];
}
Same idea as in Michał Turczyn post - a check for double character, but implemented via regular expressions:
using System.Text.RegularExpressions;
...
public static bool HasDoubleChar(string value) =>
value != null && Regex.IsMatch(value, #"(.)\1");
pattern (.)\1 explained:
(.) - group which contains an arbitrary character
\1 - the group #1 repeated again
Note, that we have some flexibility here: (\p{L})\1 will check for double letters only, not, say, spaces; (?i)(.)\1 will compare ignoring case (Aaron will be matched) etc.
then as usual
Person.LastName = HasDoubleChar(values[0])
? "Frank"
: values[0];
This could be achieved with easy extensions funciton for string:
public static class StringExtensions
{
public static bool HasDoubleChar(this string #this)
{
ArgumentNullException.ThrowIfNull(#this);
for (int i = 1; i < #this.Length; i++)
{
if (#this[i] == #this[i - 1])
{
return true;
}
}
return false;
}
}
and then you could use it:
Person.LastName = values[0].HasDoubleChar()
? "Frank"
: values[0];
public static string Do(string source, string replacement)
{
char c = source.ToLower()[0]; // do you care about case? E.g., Aaron.
for(int i = 1; i < source.Length; i++)
{
if (source.ToLower()[i] == c)
{
return replacement;
}
}
return source;
}
public static void Main()
{
string thing = "Aaron";
thing = Do(thing, "Frank");
}

Convert String To camelCase from TitleCase C#

I have a string that I converted to a TextInfo.ToTitleCase and removed the underscores and joined the string together. Now I need to change the first and only the first character in the string to lower case and for some reason, I can not figure out how to accomplish it.
class Program
{
static void Main(string[] args)
{
string functionName = "zebulans_nightmare";
TextInfo txtInfo = new CultureInfo("en-us", false).TextInfo;
functionName = txtInfo.ToTitleCase(functionName).Replace('_', ' ').Replace(" ", String.Empty);
Console.Out.WriteLine(functionName);
Console.ReadLine();
}
}
Results: ZebulansNightmare
Desired Results: zebulansNightmare
UPDATE:
class Program
{
static void Main(string[] args)
{
string functionName = "zebulans_nightmare";
TextInfo txtInfo = new CultureInfo("en-us", false).TextInfo;
functionName = txtInfo.ToTitleCase(functionName).Replace("_", string.Empty).Replace(" ", string.Empty);
functionName = $"{functionName.First().ToString().ToLowerInvariant()}{functionName.Substring(1)}";
Console.Out.WriteLine(functionName);
Console.ReadLine();
}
}
Produces the desired output.
You just need to lower the first char in the array. See this answer
Char.ToLowerInvariant(name[0]) + name.Substring(1)
As a side note, seeing as you are removing spaces you can replace the underscore with an empty string.
.Replace("_", string.Empty)
If you're using .NET Core 3 or .NET 5, you can call:
System.Text.Json.JsonNamingPolicy.CamelCase.ConvertName(someString)
Then you'll definitely get the same results as ASP.NET's own JSON serializer.
Implemented Bronumski's answer in an extension method (without replacing underscores).
public static class StringExtension
{
public static string ToCamelCase(this string str)
{
if(!string.IsNullOrEmpty(str) && str.Length > 1)
{
return char.ToLowerInvariant(str[0]) + str.Substring(1);
}
return str.ToLowerInvariant();
}
}
//Or
public static class StringExtension
{
public static string ToCamelCase(this string str) =>
string.IsNullOrEmpty(str) || str.Length < 2
? str.ToLowerInvariant()
: char.ToLowerInvariant(str[0]) + str.Substring(1);
}
and to use it:
string input = "ZebulansNightmare";
string output = input.ToCamelCase();
Here is my code, in case it is useful to anyone
// This converts to camel case
// Location_ID => locationId, and testLEFTSide => testLeftSide
static string CamelCase(string s)
{
var x = s.Replace("_", "");
if (x.Length == 0) return "null";
x = Regex.Replace(x, "([A-Z])([A-Z]+)($|[A-Z])",
m => m.Groups[1].Value + m.Groups[2].Value.ToLower() + m.Groups[3].Value);
return char.ToLower(x[0]) + x.Substring(1);
}
If you prefer Pascal-case use:
static string PascalCase(string s)
{
var x = CamelCase(s);
return char.ToUpper(x[0]) + x.Substring(1);
}
The following code works with acronyms as well. If it is the first word it converts the acronym to lower case (e.g., VATReturn to vatReturn), and otherwise leaves it as it is (e.g., ExcludedVAT to excludedVAT).
name = Regex.Replace(name, #"([A-Z])([A-Z]+|[a-z0-9_]+)($|[A-Z]\w*)",
m =>
{
return m.Groups[1].Value.ToLower() + m.Groups[2].Value.ToLower() + m.Groups[3].Value;
});
Example 01
public static string ToCamelCase(this string text)
{
return CultureInfo.CurrentCulture.TextInfo.ToTitleCase(text);
}
Example 02
public static string ToCamelCase(this string text)
{
return string.Join(" ", text
.Split()
.Select(i => char.ToUpper(i[0]) + i.Substring(1)));
}
Example 03
public static string ToCamelCase(this string text)
{
char[] a = text.ToLower().ToCharArray();
for (int i = 0; i < a.Count(); i++)
{
a[i] = i == 0 || a[i - 1] == ' ' ? char.ToUpper(a[i]) : a[i];
}
return new string(a);
}
Adapted from Leonardo's answer:
static string PascalCase(string str) {
TextInfo cultInfo = new CultureInfo("en-US", false).TextInfo;
str = Regex.Replace(str, "([A-Z]+)", " $1");
str = cultInfo.ToTitleCase(str);
str = str.Replace(" ", "");
return str;
}
Converts to PascalCase by first adding a space before any group of capitals, and then converting to title case before removing all the spaces.
Here's my code, includes lowering all upper prefixes:
public static class StringExtensions
{
public static string ToCamelCase(this string str)
{
bool hasValue = !string.IsNullOrEmpty(str);
// doesn't have a value or already a camelCased word
if (!hasValue || (hasValue && Char.IsLower(str[0])))
{
return str;
}
string finalStr = "";
int len = str.Length;
int idx = 0;
char nextChar = str[idx];
while (Char.IsUpper(nextChar))
{
finalStr += char.ToLowerInvariant(nextChar);
if (len - 1 == idx)
{
// end of string
break;
}
nextChar = str[++idx];
}
// if not end of string
if (idx != len - 1)
{
finalStr += str.Substring(idx);
}
return finalStr;
}
}
Use it like this:
string camelCasedDob = "DOB".ToCamelCase();
If you are Ok with the Newtonsoft.JSON dependency, the following string extension method will help. The advantage of this approach is the serialization will work on par with standard WebAPI model binding serialization with high accuracy.
public static class StringExtensions
{
private class CamelCasingHelper : CamelCaseNamingStrategy
{
private CamelCasingHelper(){}
private static CamelCasingHelper helper =new CamelCasingHelper();
public static string ToCamelCase(string stringToBeConverted)
{
return helper.ResolvePropertyName(stringToBeConverted);
}
}
public static string ToCamelCase(this string str)
{
return CamelCasingHelper.ToCamelCase(str);
}
}
Here is the working fiddle
https://dotnetfiddle.net/pug8pP
In .Net 6 and above
public static class CamelCaseExtension
{
public static string ToCamelCase(this string str) =>
char.ToLowerInvariant(str[0]) + str[1..];
}
public static string CamelCase(this string str)
{
TextInfo cultInfo = new CultureInfo("en-US", false).TextInfo;
str = cultInfo.ToTitleCase(str);
str = str.Replace(" ", "");
return str;
}
This should work using System.Globalization
var camelCaseFormatter = new JsonSerializerSettings();
camelCaseFormatter.ContractResolver = new CamelCasePropertyNamesContractResolver();
JsonConvert.SerializeObject(object, camelCaseFormatter));
Strings are immutable, but we can use unsafe code to make it mutable though.
The string.Copy insured that the original string stays as is.
In order for these codes to run you have to allow unsafe code in your project.
public static unsafe string ToCamelCase(this string value)
{
if (value == null || value.Length == 0)
{
return value;
}
string result = string.Copy(value);
fixed (char* chr = result)
{
char valueChar = *chr;
*chr = char.ToLowerInvariant(valueChar);
}
return result;
}
This version modifies the original string, instead of returning a modified copy. This will be annoying though and totally uncommon. So make sure the XML comments are warning users about that.
public static unsafe void ToCamelCase(this string value)
{
if (value == null || value.Length == 0)
{
return value;
}
fixed (char* chr = value)
{
char valueChar = *chr;
*chr = char.ToLowerInvariant(valueChar);
}
return value;
}
Why use unsafe code though? Short answer... It's super fast.
Here's my code which is pretty simple. My major objective was to ensure that camel-casing was compatible with what ASP.NET serializes objects to, which the above examples don't guarantee.
public static class StringExtensions
{
public static string ToCamelCase(this string name)
{
var sb = new StringBuilder();
var i = 0;
// While we encounter upper case characters (except for the last), convert to lowercase.
while (i < name.Length - 1 && char.IsUpper(name[i + 1]))
{
sb.Append(char.ToLowerInvariant(name[i]));
i++;
}
// Copy the rest of the characters as is, except if we're still on the first character - which is always lowercase.
while (i < name.Length)
{
sb.Append(i == 0 ? char.ToLowerInvariant(name[i]) : name[i]);
i++;
}
return sb.ToString();
}
}
/// <summary>
/// Gets the camel case from snake case.
/// </summary>
/// <param name="snakeCase">The snake case.</param>
/// <returns></returns>
private string GetCamelCaseFromSnakeCase(string snakeCase)
{
string camelCase = string.Empty;
if(!string.IsNullOrEmpty(snakeCase))
{
string[] words = snakeCase.Split('_');
foreach (var word in words)
{
camelCase = string.Concat(camelCase, Char.ToUpperInvariant(word[0]) + word.Substring(1));
}
// making first character small case
camelCase = Char.ToLowerInvariant(camelCase[0]) + camelCase.Substring(1);
}
return camelCase;
}
I use This method to convert the string with separated by "_" to Camel Case
public static string ToCamelCase(string? s)
{
var nameArr = s?.ToLower().Split("_");
var str = "";
foreach (var name in nameArr.Select((value, i) => new { value, i }))
{
if(name.i >= 1)
{
str += string.Concat(name.value[0].ToString().ToUpper(), name.value.AsSpan(1));
}
else
{
str += name.value ;
}
}
return str;
}
u can change the separated by "_" with any other you want.
Simple and easy in build c#
using System;
using System.Globalization;
public class SamplesTextInfo {
public static void Main() {
// Defines the string with mixed casing.
string myString = "wAr aNd pEaCe";
// Creates a TextInfo based on the "en-US" culture.
TextInfo myTI = new CultureInfo("en-US",false).TextInfo;
// Changes a string to lowercase.
Console.WriteLine( "\"{0}\" to lowercase: {1}", myString, myTI.ToLower( myString ) );
// Changes a string to uppercase.
Console.WriteLine( "\"{0}\" to uppercase: {1}", myString, myTI.ToUpper( myString ) );
// Changes a string to titlecase.
Console.WriteLine( "\"{0}\" to titlecase: {1}", myString, myTI.ToTitleCase( myString ) );
}
}
/*
This code produces the following output.
"wAr aNd pEaCe" to lowercase: war and peace
"wAr aNd pEaCe" to uppercase: WAR AND PEACE
"wAr aNd pEaCe" to titlecase: War And Peace
*/
public static class StringExtension
{
public static string ToCamelCase(this string str)
{
return string.Join(" ", str
.Split()
.Select(i => char.ToUpper(i[0]) + i.Substring(1).ToLower()));
}
}
I had the same issue with titleCase so I just created one, hope this helps this is an extension method.
public static string ToCamelCase(this string text)
{
if (string.IsNullOrEmpty(text))
return text;
var separators = new[] { '_', ' ' };
var arr = text
.Split(separators)
.Where(word => !string.IsNullOrWhiteSpace(word));
var camelCaseArr = arr
.Select((word, i) =>
{
if (i == 0)
return word.ToLower();
var characterArr = word.ToCharArray()
.Select((character, characterIndex) => characterIndex == 0
? character.ToString().ToUpper()
: character.ToString().ToLower());
return string.Join("", characterArr);
});
return string.Join("", camelCaseArr);
}

Split string and get first value only

I wonder if it's possible to use split to devide a string with several parts that are separated with a comma, like this:
title, genre, director, actor
I just want the first part, the title of each string and not the rest?
string valueStr = "title, genre, director, actor";
var vals = valueStr.Split(',')[0];
vals will give you the title
Actually, there is a better way to do it than split:
public string GetFirstFromSplit(string input, char delimiter)
{
var i = input.IndexOf(delimiter);
return i == -1 ? input : input.Substring(0, i);
}
And as extension methods:
public static string FirstFromSplit(this string source, char delimiter)
{
var i = source.IndexOf(delimiter);
return i == -1 ? source : source.Substring(0, i);
}
public static string FirstFromSplit(this string source, string delimiter)
{
var i = source.IndexOf(delimiter);
return i == -1 ? source : source.Substring(0, i);
}
Usage:
string result = "hi, hello, sup".FirstFromSplit(',');
Console.WriteLine(result); // "hi"
You can do it:
var str = "Doctor Who,Fantasy,Steven Moffat,David Tennant";
var title = str.Split(',').First();
Also you can do it this way:
var index = str.IndexOf(",");
var title = index < 0 ? str : str.Substring(0, index);
These are the two options I managed to build, not having the luxury of working with var type, nor with additional variables on the line:
string f = "aS.".Substring(0, "aS.".IndexOf("S"));
Console.WriteLine(f);
string s = "aS.".Split("S".ToCharArray(),StringSplitOptions.RemoveEmptyEntries)[0];
Console.WriteLine(s);
This is what it gets:

Splitting a string array

I have a string array string[] arr, which contains values like N36102W114383, N36102W114382 etc...
I want to split the each and every string such that the value comes like this N36082 and W115080.
What is the best way to do this?
This should work for you.
Regex regexObj = new Regex(#"\w\d+"); # matches a character followed by a sequence of digits
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
matchResults = matchResults.NextMatch(); #two mathches N36102 and W114383
}
If you have the fixed format every time you can just do this:
string[] split_data = data_string.Insert(data_string.IndexOf("W"), ",")
.Split(",", StringSplitOptions.None);
Here you insert a recognizable delimiter into your string and then split it by this delimiter.
Forgive me if this doesn't quite compile, but I'd just break down and write the string processing function by hand:
public static IEnumerable<string> Split(string str)
{
char [] chars = str.ToCharArray();
int last = 0;
for(int i = 1; i < chars.Length; i++) {
if(char.IsLetter(chars[i])) {
yield return new string(chars, last, i - last);
last = i;
}
}
yield return new string(chars, last, chars.Length - last);
}
If you use C#, please try:
String[] code = new Regex("(?:([A-Z][0-9]+))").Split(text).Where(e => e.Length > 0 && e != ",").ToArray();
in case you're only looking for the format NxxxxxWxxxxx, this will do just fine :
Regex r = new Regex(#"(N[0-9]+)(W[0-9]+)");
Match mc = r.Match(arr[i]);
string N = mc.Groups[1];
string W = mc.Groups[2];
Using the 'Split' and 'IsLetter' string functions, this is relatively easy in c#.
Don't forget to write unit tests - the following may have some corner case errors!
// input has form "N36102W114383, N36102W114382"
// output: "N36102", "W114383", "N36102", "W114382", ...
string[] ParseSequenceString(string input)
{
string[] inputStrings = string.Split(',');
List<string> outputStrings = new List<string>();
foreach (string value in inputstrings) {
List<string> valuesInString = ParseValuesInString(value);
outputStrings.Add(valuesInString);
}
return outputStrings.ToArray();
}
// input has form "N36102W114383"
// output: "N36102", "W114383"
List<string> ParseValuesInString(string inputString)
{
List<string> outputValues = new List<string>();
string currentValue = string.Empty;
foreach (char c in inputString)
{
if (char.IsLetter(c))
{
if (currentValue .Length == 0)
{
currentValue += c;
} else
{
outputValues.Add(currentValue);
currentValue = string.Empty;
}
}
currentValue += c;
}
outputValues.Add(currentValue);
return outputValues;
}

How to word by word iterate in string in C#?

I want to iterate over string as word by word.
If I have a string "incidentno and fintype or unitno", I would like to read every word one by one as "incidentno", "and", "fintype", "or", and "unitno".
foreach (string word in "incidentno and fintype or unitno".Split(' ')) {
...
}
var regex = new Regex(#"\b[\s,\.-:;]*");
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));
This works even if you have ".,; tabs and new lines" between your words.
Slightly twisted I know, but you could define an iterator block as an extension method on strings. e.g.
/// <summary>
/// Sweep over text
/// </summary>
/// <param name="Text"></param>
/// <returns></returns>
public static IEnumerable<string> WordList(this string Text)
{
int cIndex = 0;
int nIndex;
while ((nIndex = Text.IndexOf(' ', cIndex + 1)) != -1)
{
int sIndex = (cIndex == 0 ? 0 : cIndex + 1);
yield return Text.Substring(sIndex, nIndex - sIndex);
cIndex = nIndex;
}
yield return Text.Substring(cIndex + 1);
}
foreach (string word in "incidentno and fintype or unitno".WordList())
System.Console.WriteLine("'" + word + "'");
Which has the advantage of not creating a big array for long strings.
Use the Split method of the string class
string[] words = "incidentno and fintype or unitno".Split(" ");
This will split on spaces, so "words" will have [incidentno,and,fintype,or,unitno].
Assuming the words are always separated by a blank, you could use String.Split() to get an Array of your words.
There are multiple ways to accomplish this. Two of the most convenient methods (in my opinion) are:
Using string.Split() to create an array. I would probably use this method, because it is the most self-explanatory.
example:
string startingSentence = "incidentno and fintype or unitno";
string[] seperatedWords = startingSentence.Split(' ');
Alternatively, you could use (this is what I would use):
string[] seperatedWords = startingSentence.Split(new char[] {' '}, StringSplitOptions.RemoveEmptyEntries);
StringSplitOptions.RemoveEmptyEntries will remove any empty entries from your array that may occur due to extra whitespace and other minor problems.
Next - to process the words, you would use:
foreach (string word in seperatedWords)
{
//Do something
}
Or, you can use regular expressions to solve this problem, as Darin demonstrated (a copy is below).
example:
var regex = new Regex(#"\b[\s,\.-:;]*");
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase).Where(x => !string.IsNullOrEmpty(x));
For processing, you can use similar code to the first option.
foreach (string word in words)
{
//Do something
}
Of course, there are many ways to solve this problem, but I think that these two would be the simplest to implement and maintain. I would go with the first option (using string.Split()) just because regex can sometimes become quite confusing, while a split will function correctly most of the time.
When using split, what about checking for empty entries?
string sentence = "incidentno and fintype or unitno"
string[] words = sentence.Split(new char[] { ' ', ',' ,';','\t','\n', '\r'}, StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
// Process
}
EDIT:
I can't comment so I'm posting here but this (posted above) works:
foreach (string word in "incidentno and fintype or unitno".Split(' '))
{
...
}
My understanding of foreach is that it first does a GetEnumerator() and the calles .MoveNext until false is returned. So the .Split won't be re-evaluated on each iteration
public static string[] MyTest(string inword, string regstr)
{
var regex = new Regex(regstr);
var phrase = "incidentno and fintype or unitno";
var words = regex.Split(phrase);
return words;
}
? MyTest("incidentno, and .fintype- or; :unitno",#"[^\w+]")
[0]: "incidentno"
[1]: "and"
[2]: "fintype"
[3]: "or"
[4]: "unitno"
I'd like to add some information to JDunkerley's awnser.
You can easily make this method more reliable if you give a string or char parameter to search for.
public static IEnumerable<string> WordList(this string Text,string Word)
{
int cIndex = 0;
int nIndex;
while ((nIndex = Text.IndexOf(Word, cIndex + 1)) != -1)
{
int sIndex = (cIndex == 0 ? 0 : cIndex + 1);
yield return Text.Substring(sIndex, nIndex - sIndex);
cIndex = nIndex;
}
yield return Text.Substring(cIndex + 1);
}
public static IEnumerable<string> WordList(this string Text, char c)
{
int cIndex = 0;
int nIndex;
while ((nIndex = Text.IndexOf(c, cIndex + 1)) != -1)
{
int sIndex = (cIndex == 0 ? 0 : cIndex + 1);
yield return Text.Substring(sIndex, nIndex - sIndex);
cIndex = nIndex;
}
yield return Text.Substring(cIndex + 1);
}
I write a string processor class.You can use it.
Example:
metaKeywords = bodyText.Process(prepositions).OrderByDescending().TakeTop().GetWords().AsString();
Class:
public static class StringProcessor
{
private static List<String> PrepositionList;
public static string ToNormalString(this string strText)
{
if (String.IsNullOrEmpty(strText)) return String.Empty;
char chNormalKaf = (char)1603;
char chNormalYah = (char)1610;
char chNonNormalKaf = (char)1705;
char chNonNormalYah = (char)1740;
string result = strText.Replace(chNonNormalKaf, chNormalKaf);
result = result.Replace(chNonNormalYah, chNormalYah);
return result;
}
public static List<KeyValuePair<String, Int32>> Process(this String bodyText,
List<String> blackListWords = null,
int minimumWordLength = 3,
char splitor = ' ',
bool perWordIsLowerCase = true)
{
string[] btArray = bodyText.ToNormalString().Split(splitor);
long numberOfWords = btArray.LongLength;
Dictionary<String, Int32> wordsDic = new Dictionary<String, Int32>(1);
foreach (string word in btArray)
{
if (word != null)
{
string lowerWord = word;
if (perWordIsLowerCase)
lowerWord = word.ToLower();
var normalWord = lowerWord.Replace(".", "").Replace("(", "").Replace(")", "")
.Replace("?", "").Replace("!", "").Replace(",", "")
.Replace("<br>", "").Replace(":", "").Replace(";", "")
.Replace("،", "").Replace("-", "").Replace("\n", "").Trim();
if ((normalWord.Length > minimumWordLength && !normalWord.IsMemberOfBlackListWords(blackListWords)))
{
if (wordsDic.ContainsKey(normalWord))
{
var cnt = wordsDic[normalWord];
wordsDic[normalWord] = ++cnt;
}
else
{
wordsDic.Add(normalWord, 1);
}
}
}
}
List<KeyValuePair<String, Int32>> keywords = wordsDic.ToList();
return keywords;
}
public static List<KeyValuePair<String, Int32>> OrderByDescending(this List<KeyValuePair<String, Int32>> list, bool isBasedOnFrequency = true)
{
List<KeyValuePair<String, Int32>> result = null;
if (isBasedOnFrequency)
result = list.OrderByDescending(q => q.Value).ToList();
else
result = list.OrderByDescending(q => q.Key).ToList();
return result;
}
public static List<KeyValuePair<String, Int32>> TakeTop(this List<KeyValuePair<String, Int32>> list, Int32 n = 10)
{
List<KeyValuePair<String, Int32>> result = list.Take(n).ToList();
return result;
}
public static List<String> GetWords(this List<KeyValuePair<String, Int32>> list)
{
List<String> result = new List<String>();
foreach (var item in list)
{
result.Add(item.Key);
}
return result;
}
public static List<Int32> GetFrequency(this List<KeyValuePair<String, Int32>> list)
{
List<Int32> result = new List<Int32>();
foreach (var item in list)
{
result.Add(item.Value);
}
return result;
}
public static String AsString<T>(this List<T> list, string seprator = ", ")
{
String result = string.Empty;
foreach (var item in list)
{
result += string.Format("{0}{1}", item, seprator);
}
return result;
}
private static bool IsMemberOfBlackListWords(this String word, List<String> blackListWords)
{
bool result = false;
if (blackListWords == null) return false;
foreach (var w in blackListWords)
{
if (w.ToNormalString().Equals(word))
{
result = true;
break;
}
}
return result;
}
}

Categories