C# Create Acronym from Word

Given any string, I'd like to create an intelligent acronym that represents the string. If any of you have used JIRA, they accomplish this pretty well.
For example, given the word: Phoenix it would generate PHX or given the word Privacy Event Management it would create PEM.
I've got some code that will accomplish the latter:
string.Join(string.Empty, model.Name
.Where(char.IsLetter)
.Where(char.IsUpper))
but it doesn't account for the first case, and it also fails when there is only one word and that word is lower case. Any ideas? I'm using C# on .NET 4.5.

For the Phoenix => PHX, I think you'll need to check the strings against a dictionary of known abbreviations. As for the multiple word/camel-case support, regex is your friend!
var text = "A Big copy DayEnergyFree good"; // abbreviation should be "ABCDEFG"
var pattern = @"((?<=^|\s)(\w{1})|([A-Z]))";
var abbreviation = string.Join(string.Empty, Regex.Matches(text, pattern).OfType<Match>().Select(x => x.Value.ToUpper()));
Let me explain what's happening here, starting with the regex pattern, which covers a few cases for matching substrings.
// must be directly after the beginning of the string or line "^" or a whitespace character "\s"
(?<=^|\s)
// match just one letter that is part of a word
(\w{1})
// if the previous requirements are not met
|
// match any upper-case letter
([A-Z])
The Regex.Matches method returns a MatchCollection, which is basically a non-generic ICollection, so to use LINQ expressions we call OfType<Match>() to convert the MatchCollection into an IEnumerable<Match>.
Regex.Matches(text, pattern).OfType<Match>()
Then we select only the value of the match (we don't need the other regex matching meta-data) and convert it to upper-case.
Select(x => x.Value.ToUpper())
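Putting the two suggestions together (a lookup of known abbreviations first, then the regex as a fallback), a minimal sketch might look like this. The AcronymSketch class, the Create method and the KnownAbbreviations table are names made up for illustration, and the table content is only an example; the regex is a condensed form of the pattern explained above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class AcronymSketch
{
    // Hypothetical lookup table; the single entry is illustrative only.
    static readonly Dictionary<string, string> KnownAbbreviations =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "Phoenix", "PHX" }
        };

    // First character of each whitespace-separated word, or any upper-case letter.
    static readonly Regex InitialsPattern = new Regex(@"(?<=^|\s)\w|[A-Z]");

    public static string Create(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            return string.Empty;

        // Known abbreviations win over the generic rule.
        string known;
        if (KnownAbbreviations.TryGetValue(text.Trim(), out known))
            return known;

        var initials = InitialsPattern.Matches(text)
            .Cast<Match>()
            .Select(m => m.Value.ToUpperInvariant());
        return string.Concat(initials);
    }

    static void Main()
    {
        Console.WriteLine(Create("Phoenix"));                       // PHX
        Console.WriteLine(Create("Privacy Event Management"));      // PEM
        Console.WriteLine(Create("A Big copy DayEnergyFree good")); // ABCDEFG
    }
}
```

A single lower-case word that is not in the table still yields just its first letter, so the table is what makes the one-word case useful.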

I was able to extract the JIRA key generator and posted it here. It's pretty interesting, and even though it's JavaScript, it could easily be converted to C#.

Here is a simple function that generates an acronym. It adds a letter or digit to the acronym whenever the preceding character is a space (the first character is always included). If the string contains no spaces, it is returned as is, with punctuation removed. It does not capitalize the letters in the acronym, but that is easy to add.
You can copy it into your code and start using it.
Some example results:
Deloitte Private Pty Ltd - DPPL
Clearwater Investment Co Pty Ltd (AC & CC Family Trust) - CICPLACFT
ASIC - ASIC
private string Acronym(string value)
{
if (string.IsNullOrWhiteSpace(value))
{
return value;
} else
{
var builder = new StringBuilder();
foreach(char c in value)
{
if (char.IsWhiteSpace(c) || char.IsLetterOrDigit(c))
{
builder.Append(c);
}
}
string trimmedValue = builder.ToString().Trim();
builder.Clear();
if (trimmedValue.Contains(' '))
{
for(int charIndex = 0; charIndex < trimmedValue.Length; charIndex++)
{
if (charIndex == 0)
{
builder.Append(trimmedValue[0]);
} else
{
char currentChar = trimmedValue[charIndex];
char previousChar = trimmedValue[charIndex - 1];
if (char.IsLetterOrDigit(currentChar) && char.IsWhiteSpace(previousChar))
{
builder.Append(trimmedValue[charIndex]);
}
}
}
return builder.ToString();
} else
{
return trimmedValue;
}
}
}

I needed codes that never repeat, so I wrote the following method.
If you use it like this, every call returns a code that has not been handed out before:
HashSet<string> idHashSet = new HashSet<string>();
for (int i = 0; i < 100; i++)
{
var eName = "China National Petroleum";
Console.WriteLine($"count:{i+1},short name:{GetIdentifierCode(eName,ref idHashSet)}");
}
The method looks like this:
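/// <summary>
/// Builds a short code from an English name: the initial letters are taken first, then further letters are taken from each word in turn; if the code still collides, it is padded with the default fill character (A).
/// TODO: when the name is Chinese, use its pinyin as the source for the code.
/// </summary>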
/// <param name="name"></param>
/// <param name="idHashSet"></param>
/// <returns></returns>
public static string GetIdentifierCode(string name, ref HashSet<string> idHashSet)
{
var words = name;
var fillChar = 'A';
if (string.IsNullOrEmpty(words))
{
do
{
words += fillChar.ToString();
} while (idHashSet.Contains(words));
}
//if (IsChinese)
//{
// words = GetPinYin(words);
//}
// e.g. China National Petroleum Corporation (China National Petroleum)
var sourceWord = new List<string>(words.Split(' '));
var returnWord = sourceWord.Select(c => new List<char>()).ToList();
int index = 0;
do
{
var listAddWord = sourceWord[index];
var addWord = returnWord[index];
// If the code still collides once all letters are used, pad with the default fill character (A)
if (sourceWord.All(c => string.IsNullOrEmpty(c)))
{
returnWord.Last().Add(fillChar);
continue;
}
// Skip a word once all of its characters have been taken
else if (string.IsNullOrEmpty(listAddWord))
{
if (index == sourceWord.Count - 1)
index = 0;
else
{
index++;
}
continue;
}
if (addWord == null)
addWord = new List<char>();
string addString = string.Empty;
// Do not split a word that is entirely upper case
if (listAddWord.All(a => char.IsUpper(a)))
{
addWord = listAddWord.ToCharArray().ToList();
returnWord[index] = addWord;
addString = listAddWord;
}
else
{
addString = listAddWord.First().ToString();
addWord.Add(listAddWord.First());
}
listAddWord = listAddWord.Substring(addString.Length); // drop only the characters just taken, not every occurrence of them
sourceWord[index] = listAddWord;
if (index == sourceWord.Count - 1)
index = 0;
else
{
index++;
}
} while (idHashSet.Contains(string.Concat(returnWord.SelectMany(c => c))));
words = string.Concat(returnWord.SelectMany(c => c));
idHashSet.Add(words);
return words;
}

Related

Find two strings in list with a regular expression

I need to find two strings within a list that contains the characters from another string, which are not in order. To make it clear, an example could be a list of animals like:
lion
dog
bear
cat
And a given string is: oodilgn.
The answer here would be: lion and dog
Each character from the string will be used only once.
Is there a regular expression that will allow me to do this?
You could try to put the given string between []. These brackets will allow choosing - in any order - from these letters only. This may not be a perfect solution, but it will catch the majority of your list.
For example, you could write oodilgn as [oodilgn], then add a minimum number of letters to be found - let's say 3 - by using the curly brackets {}. The full regex will be like this:
[oodilgn]{3,}
This code basically says: find any word that has three of the letters that are located between brackets in any order.
Demo: https://regex101.com/r/MCWHjQ/2
Here is an example algorithm that does the job. I have assumed that the two words together don't need to use up every letter of the text; a commented-out check covers that case. It returns the first pair of words that fits.
Here is how you call it from Main or any other caller:
static void Main(string[] args)
{
var text = "oodilgn";
var listOfWords = new List<string> { "lion", "dog", "bear", "cat" };
ExtractWordsWithSameLetters(text, listOfWords);
}
Here is the function with the algorithm. All string manipulation is done with regex.
public static void ExtractWordsWithSameLetters(string text, List<string> listOfWords)
{
string firstWord = null;
string secondWord = null;
for (var i = 0; i < listOfWords.Count - 1; i++)
{
var textCopy = text;
var firstWordIsMatched = true;
foreach (var letter in listOfWords[i])
{
var pattern = $"(.*?)({letter})(.*?)";
var regex = new Regex(pattern);
if (regex.IsMatch(textCopy)) // check the working copy so that each character is used only once
{
textCopy = regex.Replace(textCopy, "$1*$3", 1);
}
else
{
firstWordIsMatched = false;
break;
}
}
if (!firstWordIsMatched)
{
continue;
}
firstWord = listOfWords[i];
for (var j = i + 1; j < listOfWords.Count; j++)
{
var secondWordIsMatched = true;
foreach (var letter in listOfWords[j])
{
var pattern = $"(.*?)({letter})(.*?)";
var regex = new Regex(pattern);
if (regex.IsMatch(textCopy)) // check the working copy so that each character is used only once
{
textCopy = regex.Replace(textCopy, "$1*$3", 1);
}
else
{
secondWordIsMatched = false;
break;
}
}
if (secondWordIsMatched)
{
secondWord = listOfWords[j];
break;
}
}
if (secondWord == null)
{
firstWord = null;
}
else
{
//if (textCopy.ToCharArray().Any(l => l != '*'))
//{
// break;
//}
break;
}
}
if (firstWord != null)
{
Console.WriteLine($"{firstWord} { secondWord}");
}
}
The function is far from optimised but does what you want. If you want to return the results instead of printing them, either change the return type to string[] and return firstWord and secondWord in an array, or add two out parameters. In either case you will need to check the result in the calling function.
Please try this:
Regex r=new Regex("^[oodilgn]+$");
var list=new List<String>(){"lion","dog","fish","god"};
var output=list.Where(x=>r.IsMatch(x));
Result:
output=["lion","dog","god"];
Note that this only checks that every letter of a word occurs in the given string; it does not enforce that each character is used only once.
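Because a character class cannot count how often a letter has been used, a non-regex alternative is to compare letter counts. The following is a minimal sketch under the same assumption as the longer answer above (the two words do not have to use up every letter of the text); the LetterPoolSketch, Count, FitsPool and FindPair names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LetterPoolSketch
{
    // Counts how often each character occurs in s.
    static Dictionary<char, int> Count(string s)
    {
        return s.GroupBy(c => c).ToDictionary(g => g.Key, g => g.Count());
    }

    // True when the letters of both words together are available in the pool,
    // using each pool character at most once.
    static bool FitsPool(string first, string second, Dictionary<char, int> pool)
    {
        foreach (var pair in Count(first + second))
        {
            int available;
            if (!pool.TryGetValue(pair.Key, out available) || available < pair.Value)
                return false;
        }
        return true;
    }

    // Returns the first pair of words that fits, or null if there is none.
    public static Tuple<string, string> FindPair(string text, IList<string> words)
    {
        var pool = Count(text);
        for (int i = 0; i < words.Count - 1; i++)
            for (int j = i + 1; j < words.Count; j++)
                if (FitsPool(words[i], words[j], pool))
                    return Tuple.Create(words[i], words[j]);
        return null;
    }

    static void Main()
    {
        var pair = FindPair("oodilgn", new List<string> { "lion", "dog", "bear", "cat" });
        Console.WriteLine(pair == null ? "no pair" : pair.Item1 + " " + pair.Item2); // lion dog
    }
}
```

To require that the pair consumes the whole text, one could additionally compare first.Length + second.Length with text.Length.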

C# string.split() separate string by uppercase

I've been using the Split() method to split strings, but it only works when you pass string.Split() a separator character. Is there any way to split a string wherever an upper-case letter occurs?
Is it possible to get the individual words out of an unseparated string like:
DeleteSensorFromTemplate
so that the resulting string looks like:
Delete Sensor From Template
Use Regex.Split:
string[] split = Regex.Split(str, @"(?<!^)(?=[A-Z])");
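To get from the split array to the spaced string the question asks for, the parts can be joined again. A minimal sketch, with CamelSplitSketch and Spaced as made-up names:

```csharp
using System;
using System.Text.RegularExpressions;

static class CamelSplitSketch
{
    public static string Spaced(string input)
    {
        // Split in front of every upper-case letter, except one at the very start.
        string[] parts = Regex.Split(input, @"(?<!^)(?=[A-Z])");
        return string.Join(" ", parts);
    }

    static void Main()
    {
        Console.WriteLine(Spaced("DeleteSensorFromTemplate")); // Delete Sensor From Template
    }
}
```

Runs of capitals are split letter by letter with this pattern, so an input such as "UPCCodes" would need a different rule.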
Another way with regex:
public static string SplitCamelCase(string input)
{
return System.Text.RegularExpressions.Regex.Replace(input, "([A-Z])", " $1", System.Text.RegularExpressions.RegexOptions.Compiled).Trim();
}
If you do not like RegEx and you really just want to insert the missing spaces, this will do the job too:
public static string InsertSpaceBeforeUpperCase(this string str)
{
var sb = new StringBuilder();
char previousChar = char.MinValue; // Unicode '\0'
foreach (char c in str)
{
if (char.IsUpper(c))
{
// If not the first character and previous character is not a space, insert a space before uppercase
if (sb.Length != 0 && previousChar != ' ')
{
sb.Append(' ');
}
}
sb.Append(c);
previousChar = c;
}
return sb.ToString();
}
I had some fun with this one and came up with a function that splits by case, as well as groups together caps (it assumes title case for whatever follows) and digits.
Examples:
Input -> "TodayIUpdated32UPCCodes"
Output -> "Today I Updated 32 UPC Codes"
Code (please excuse the funky symbols I use)...
public static string[] SplitByCase(this string s) { // extension methods must be static
var ʀ = new List<string>();
var ᴛ = new StringBuilder();
var previous = SplitByCaseModes.None;
foreach(var ɪ in s) {
SplitByCaseModes mode_ɪ;
if(string.IsNullOrWhiteSpace(ɪ.ToString())) {
mode_ɪ = SplitByCaseModes.WhiteSpace;
} else if("0123456789".Contains(ɪ)) {
mode_ɪ = SplitByCaseModes.Digit;
} else if(ɪ == ɪ.ToString().ToUpper()[0]) {
mode_ɪ = SplitByCaseModes.UpperCase;
} else {
mode_ɪ = SplitByCaseModes.LowerCase;
}
if((previous == SplitByCaseModes.None) || (previous == mode_ɪ)) {
ᴛ.Append(ɪ);
} else if((previous == SplitByCaseModes.UpperCase) && (mode_ɪ == SplitByCaseModes.LowerCase)) {
if(ᴛ.Length > 1) {
ʀ.Add(ᴛ.ToString().Substring(0, ᴛ.Length - 1));
ᴛ.Remove(0, ᴛ.Length - 1);
}
ᴛ.Append(ɪ);
} else {
ʀ.Add(ᴛ.ToString());
ᴛ.Clear();
ᴛ.Append(ɪ);
}
previous = mode_ɪ;
}
if(ᴛ.Length != 0) ʀ.Add(ᴛ.ToString());
return ʀ.ToArray();
}
private enum SplitByCaseModes { None, WhiteSpace, Digit, UpperCase, LowerCase }
Here's another way if you don't want to use a StringBuilder or regex (both of which are perfectly acceptable answers). I just want to offer a different solution:
string Split(string input)
{
string result = "";
for (int i = 0; i < input.Length; i++)
{
if (char.IsUpper(input[i]))
{
result += ' ';
}
result += input[i];
}
return result.Trim();
}

How to remove some special words from a string?

I have some strings containing code for emoji icons, like :grinning:, :kissing_heart:, or :bouquet:. I'd like to process them to remove the emoji codes.
For example, given:
Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:
I want to get this:
Hello , how are you? Are you fine?
I know I can use this code:
richTextBox2.Text = richTextBox1.Text.Replace(":kissing_heart:", "").Replace(":bouquet:", "").Replace(":grinning:", "").ToString();
However, there are 856 different emoji icons I have to remove (which, using this method, would take 856 calls to Replace()). Is there any other way to accomplish this?
You can use Regex to match any token of the form :anything:. With the Replace overload that takes a function, you can add further validation.
string pattern = @":(.*?):";
string input = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet: Are you super fan, for example. :words not to replace:";
string output = Regex.Replace(input, pattern, (m) =>
{
if (m.ToString().Split(' ').Count() > 1) // more than 1 word and other validations that will help preventing parsing the user text
{
return m.ToString();
}
return String.Empty;
}); // "Hello , how are you? Are you fine? Are you super fan, for example. :words not to replace:"
If you don't want to use the Replace overload that takes a lambda expression, you can use \w, as @yorye-nathan mentioned, to match only words.
string pattern = @":(\w*):";
string input = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet: Are you super fan, for example. :words not to replace:";
string output = Regex.Replace(input, pattern, String.Empty); // "Hello , how are you? Are you fine? Are you super fan, for example. :words not to replace:"
I would solve it this way:
string Text = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:";
List<string> Emoj = new List<string>() { ":kissing_heart:", ":bouquet:", ":grinning:" };
Emoj.ForEach(x => Text = Text.Replace(x, string.Empty));
UPDATE - referring to Detail's comment
Another approach: replace only the emoji that actually occur in the text
List<string> Emoj = new List<string>() { ":kissing_heart:", ":bouquet:", ":grinning:" };
var Matches = Regex.Matches(Text, @":(\w*):").Cast<Match>().Select(x => x.Value);
Emoj.Intersect(Matches).ToList().ForEach(x => Text = Text.Replace(x, string.Empty));
But I'm not sure it makes much of a difference for such short chat strings, and it's more important to have code that's easy to read and maintain. The OP's question was about reducing the redundant Text.Replace().Text.Replace() chain, not about the most efficient solution.
I would use a combination of some of the techniques already suggested. Firstly, I'd store the 800+ emoji strings in a database and then load them at runtime. Use a HashSet to store these in memory, so that we have an O(1) lookup time (very fast). Use Regex to pull out all potential pattern matches from the input and then compare each to our hashed emoji, removing the valid ones and leaving any non-emoji patterns the user has entered themselves...
public class Program
{
//hashset for in memory representation of emoji,
//lookups are O(1), so very fast
private HashSet<string> _emoji = null;
public Program(IEnumerable<string> emojiFromDb)
{
//load emoji from datastore (db/file,etc)
//into memory at startup
_emoji = new HashSet<string>(emojiFromDb);
}
public string RemoveEmoji(string input)
{
//pattern to search for
string pattern = @":(\w*):";
string output = input;
//use regex to find all potential patterns in the input
MatchCollection matches = Regex.Matches(input, pattern);
//only do this if we actually find the
//pattern in the input string...
if (matches.Count > 0)
{
//refine this to a distinct list of unique patterns
IEnumerable<string> distinct =
matches.Cast<Match>().Select(m => m.Value).Distinct();
//then check each one against the hashset, only removing
//registered emoji. This allows non-emoji versions
//of the pattern to survive...
foreach (string match in distinct)
if (_emoji.Contains(match))
output = output.Replace(match, string.Empty);
}
return output;
}
}
public class MainClass
{
static void Main(string[] args)
{
var program = new Program(new string[] { ":grinning:", ":kissing_heart:", ":bouquet:" });
string output = program.RemoveEmoji("Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:");
Console.WriteLine(output);
}
}
Which results in:
Hello :imadethis:, how are you? Are you fine? This is:a:strange:thing :to type:,
but valid :nonetheless:
You do not have to replace all 856 emoji. You only have to replace those that appear in the string. So have a look at:
Finding a substring using C# with a twist
Basically, you extract all tokens, i.e. the strings between : and :, and then replace those with string.Empty.
If you are concerned that the search will return strings that are not emoji, such as :some other text:, you could use a hash table lookup to make sure that the found token really should be replaced.
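A related option, not taken from any of the answers here, is to build a single alternation regex from the known emoji names, so that only registered tokens can match in the first place. This is a minimal sketch; EmojiRegexSketch and BuildPattern are made-up names, the three names stand in for the full list, and the list is assumed to be non-empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class EmojiRegexSketch
{
    // Builds ":(?:name1|name2|...):" from the known emoji names.
    // Assumes names contains at least one entry.
    public static Regex BuildPattern(IEnumerable<string> names)
    {
        var escaped = names.Select(n => Regex.Escape(n));
        return new Regex(":(?:" + string.Join("|", escaped) + "):");
    }

    static void Main()
    {
        Regex pattern = BuildPattern(new[] { "grinning", "kissing_heart", "bouquet" });

        // Hello , how are you? Are you fine?
        Console.WriteLine(pattern.Replace(
            "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:", string.Empty));

        // Tricky test:123
        Console.WriteLine(pattern.Replace("Tricky test:123:grinning:", string.Empty));
    }
}
```

The pattern would be built once and cached; whether it beats the HashSet approaches for 856 names is something only a benchmark can tell.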
Finally got around to writing something up. I'm combining a couple of previously mentioned ideas with the fact that we should only loop over the string once. Based on those requirements, this sounds like the perfect job for LINQ.
You should probably cache the HashSet. Other than that, this has O(n) performance and only goes over the list once. It would be interesting to benchmark, but this could very well be the most efficient solution.
The approach is pretty straightforward:
1. First load all emoji into a HashSet so we can quickly look them up.
2. Split the string at each : with input.Split(':').
3. Decide whether to keep the current element:
   If the last element was a match, keep the current element.
   If the last element was not a match, check whether the current element matches.
   If it does, ignore it. (This effectively removes the substring from the output.)
   If it doesn't, append : back and keep it.
4. Rebuild the string with a StringBuilder.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
static class Program
{
static void Main(string[] args)
{
ISet<string> emojiList = new HashSet<string>(new[] { "kissing_heart", "bouquet", "grinning" });
Console.WriteLine("Hello:grinning: , ho:w: a::re you?:kissing_heart:kissing_heart: Are you fine?:bouquet:".RemoveEmoji(':', emojiList));
Console.ReadLine();
}
public static string RemoveEmoji(this string input, char delimiter, ISet<string> emojiList)
{
StringBuilder sb = new StringBuilder();
input.Split(delimiter).Aggregate(true, (prev, curr) =>
{
if (prev)
{
sb.Append(curr);
return false;
}
if (emojiList.Contains(curr))
{
return true;
}
sb.Append(delimiter);
sb.Append(curr);
return false;
});
return sb.ToString();
}
}
}
Edit: I did something cool using the Rx library, but then realized Aggregate is the IEnumerable counterpart of Scan in Rx, thus simplifying the code even more.
If efficiency is a concern and to avoid processing "false positives", consider rewriting the string using a StringBuilder while skipping the special emoji tokens:
static HashSet<string> emojis = new HashSet<string>()
{
"grinning",
"kissing_heart",
"bouquet"
};
static string RemoveEmojis(string input)
{
StringBuilder sb = new StringBuilder();
int length = input.Length;
int startIndex = 0;
int colonIndex = input.IndexOf(':');
while (colonIndex >= 0 && startIndex < length)
{
//Keep normal text
int substringLength = colonIndex - startIndex;
if (substringLength > 0)
sb.Append(input.Substring(startIndex, substringLength));
//Advance the feed and get the next colon
startIndex = colonIndex + 1;
colonIndex = input.IndexOf(':', startIndex);
if (colonIndex < 0) //No more colons, so no more emojis
{
//Don't forget that first colon we found
sb.Append(':');
//Add the rest of the text
sb.Append(input.Substring(startIndex));
break;
}
else //Possible emoji, let's check
{
string token = input.Substring(startIndex, colonIndex - startIndex);
if (emojis.Contains(token)) //It's a match, so we skip this text
{
//Advance the feed
startIndex = colonIndex + 1;
colonIndex = input.IndexOf(':', startIndex);
}
else //No match, so we keep the normal text
{
//Don't forget the colon
sb.Append(':');
//Instead of doing another substring next loop, let's just use the one we already have
sb.Append(token);
startIndex = colonIndex;
}
}
}
return sb.ToString();
}
static void Main(string[] args)
{
List<string> inputs = new List<string>()
{
"Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:",
"Tricky test:123:grinning:",
"Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:"
};
foreach (string input in inputs)
{
Console.WriteLine("In <- " + input);
Console.WriteLine("Out -> " + RemoveEmojis(input));
Console.WriteLine();
}
Console.WriteLine("\r\n\r\nPress enter to exit...");
Console.ReadLine();
}
Outputs:
In <- Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:
Out -> Hello , how are you? Are you fine?
In <- Tricky test:123:grinning:
Out -> Tricky test:123
In <- Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:
Out -> Hello :imadethis:, how are you? Are you fine? This is:a:strange:thing :to type:, but valid :nonetheless:
Use the code below; I think it will solve your problem.
string s = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:";
string rmv = ""; string remove = "";
int i = 0; int k = 0;
A:
rmv = "";
for (i = k; i < s.Length; i++)
{
if (Convert.ToString(s[i]) == ":")
{
for (int j = i + 1; j < s.Length; j++)
{
if (Convert.ToString(s[j]) != ":")
{
rmv += s[j];
}
else
{
remove += rmv + ",";
i = j;
k = j + 1;
goto A;
}
}
}
}
string[] str = remove.Split(',');
for (int x = 0; x < str.Length-1; x++)
{
s = s.Replace(Convert.ToString(":" + str[x] + ":"), "");
}
Console.WriteLine(s);
Console.ReadKey();
I'd use an extension method like this:
public static class Helper
{
public static string MyReplace(this string dirty, char separator)
{
string newText = "";
bool replace = false;
for (int i = 0; i < dirty.Length; i++)
{
if(dirty[i] == separator) { replace = !replace ; continue;}
if(replace ) continue;
newText += dirty[i];
}
return newText;
}
}
Usage:
richTextBox2.Text = richTextBox2.Text.MyReplace(':');
This method should perform better than the one using Regex. Note that it removes everything between a pair of separators, whether or not it is a known emoji.
I would split the text with the ':' and then build the string excluding the found emoji names.
const char marker = ':';
var textSections = text.Split(marker);
var emojiRemovedText = string.Empty;
var notMatchedCount = 0;
textSections.ToList().ForEach(section =>
{
if (emojiNames.Contains(section))
{
notMatchedCount = 0;
}
else
{
if (notMatchedCount++ > 0)
{
emojiRemovedText += marker.ToString();
}
emojiRemovedText += section;
}
});

compare the characters in two strings

In C#, how do I compare the characters in two strings?
For example, let's say I have these two strings:
"bc3231dsc" and "bc3462dsc"
How do I programmatically figure out that the strings
both start with "bc3" and end with "dsc"?
So the input would be two variables:
var1 = "bc3231dsc";
var2 = "bc3462dsc";
After comparing each characters from var1 to var2, I would want the output to be:
leftMatch = "bc3";
center1 = "231";
center2 = "462";
rightMatch = "dsc";
Conditions:
1. The strings will always be 9 characters long.
2. The strings are not case sensitive.
The string class has two methods (StartsWith and EndsWith) that you can use.
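One way to use these two methods to discover the shared start and end, rather than only to test a known value, is to try ever shorter candidates. This is a minimal sketch; the CommonEndsSketch class and its method names are made up for illustration, and OrdinalIgnoreCase is used because the strings are stated to be case-insensitive.

```csharp
using System;

static class CommonEndsSketch
{
    // Longest prefix of a that b also starts with (ignoring case).
    public static string LongestCommonPrefix(string a, string b)
    {
        for (int length = Math.Min(a.Length, b.Length); length > 0; length--)
        {
            string candidate = a.Substring(0, length);
            if (b.StartsWith(candidate, StringComparison.OrdinalIgnoreCase))
                return candidate;
        }
        return string.Empty;
    }

    // Longest suffix of a that b also ends with (ignoring case).
    public static string LongestCommonSuffix(string a, string b)
    {
        for (int length = Math.Min(a.Length, b.Length); length > 0; length--)
        {
            string candidate = a.Substring(a.Length - length);
            if (b.EndsWith(candidate, StringComparison.OrdinalIgnoreCase))
                return candidate;
        }
        return string.Empty;
    }

    static void Main()
    {
        Console.WriteLine(LongestCommonPrefix("bc3231dsc", "bc3462dsc")); // bc3
        Console.WriteLine(LongestCommonSuffix("bc3231dsc", "bc3462dsc")); // dsc
    }
}
```

For two identical strings both methods return the whole string, so the prefix and suffix overlap; callers that also want the differing centers need to handle that case.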
After reading your question and the answers already given, I think some constraints are missing that may be obvious to you but not to the community. But maybe we can do a little guesswork:
You'll have a bunch of string pairs that should be compared.
The two strings in each pair are of the same length, or you are only interested in comparing the characters read simultaneously from left to right.
You want some kind of enumeration that tells you where each block starts and how long it is.
Because a string is just a sequence of chars, you can use LINQ here to find the matching characters like this:
private IEnumerable<bool> CommonChars(string first, string second)
{
if (first == null)
throw new ArgumentNullException("first");
if (second == null)
throw new ArgumentNullException("second");
var charsToCompare = first.Zip(second, (LeftChar, RightChar) => new { LeftChar, RightChar });
var matchingChars = charsToCompare.Select(pair => pair.LeftChar == pair.RightChar);
return matchingChars;
}
With this we can proceed and find out how long each block of consecutive true or false flags is with this method:
private IEnumerable<Tuple<int, int>> Pack(IEnumerable<bool> source)
{
if (source == null)
throw new ArgumentNullException("source");
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
bool current = iterator.Current;
int index = 0;
int length = 1;
while (iterator.MoveNext())
{
if(current != iterator.Current)
{
yield return Tuple.Create(index, length);
index += length;
length = 0;
}
current = iterator.Current;
length++;
}
yield return Tuple.Create(index, length);
}
}
I don't currently know whether an existing LINQ function provides the same functionality. From what I have read, it should be possible with SelectMany() (because in theory you can accomplish any LINQ task with that method), but as an ad hoc implementation the above was easier (for me).
These functions could then be used something like this:
var firstString = "bc3231dsc";
var secondString = "bc3462dsc";
var commonChars = CommonChars(firstString, secondString);
var packs = Pack(commonChars);
foreach (var item in packs)
{
Console.WriteLine("Left side: " + firstString.Substring(item.Item1, item.Item2));
Console.WriteLine("Right side: " + secondString.Substring(item.Item1, item.Item2));
Console.WriteLine();
}
Which would then give you this output:
Left side: bc3
Right side: bc3
Left side: 231
Right side: 462
Left side: dsc
Right side: dsc
The biggest drawback is arguably the use of Tuple, because it leads to the ugly property names Item1 and Item2, which are far from instantly readable. If you really want, you could introduce your own simple class that holds two integers and has descriptive property names. Also, the information about whether each block is shared by both strings or differs between them is currently lost. But once again, it should be fairly simple to add this information to the tuple or to your own class.
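As a rough illustration of that last suggestion, the sketch below replaces the tuple with a small class that also records whether a block matches. The Block class, its property names and the BlockSketch.Blocks method are made up for illustration, and the block detection is folded into one loop instead of reusing CommonChars and Pack.

```csharp
using System;
using System.Collections.Generic;

sealed class Block
{
    public int Start { get; private set; }
    public int Length { get; private set; }
    public bool IsMatch { get; private set; }

    public Block(int start, int length, bool isMatch)
    {
        Start = start;
        Length = length;
        IsMatch = isMatch;
    }
}

static class BlockSketch
{
    // Yields consecutive runs of positions where the two strings agree or differ.
    public static IEnumerable<Block> Blocks(string first, string second)
    {
        int count = Math.Min(first.Length, second.Length);
        int start = 0;
        for (int i = 1; i <= count; i++)
        {
            bool runMatches = first[start] == second[start];
            // Close the run at the end of the input or when the flag flips.
            if (i == count || (first[i] == second[i]) != runMatches)
            {
                yield return new Block(start, i - start, runMatches);
                start = i;
            }
        }
    }

    static void Main()
    {
        string a = "bc3231dsc", b = "bc3462dsc";
        foreach (Block block in Blocks(a, b))
        {
            Console.WriteLine((block.IsMatch ? "match: " : "diff:  ")
                + a.Substring(block.Start, block.Length) + " / "
                + b.Substring(block.Start, block.Length));
        }
        // match: bc3 / bc3
        // diff:  231 / 462
        // match: dsc / dsc
    }
}
```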
static void Main(string[] args)
{
string test1 = "bc3231dsc";
string test2 = "bc3462dsc"; // was declared as tes2 but used as test2 below
string firstmatch = GetMatch(test1, test2, false);
string lasttmatch = GetMatch(test1, test2, true);
string center1 = test1.Substring(firstmatch.Length, test1.Length -(firstmatch.Length + lasttmatch.Length)) ;
string center2 = test2.Substring(firstmatch.Length, test1.Length -(firstmatch.Length + lasttmatch.Length)) ;
}
public static string GetMatch(string fist, string second, bool isReverse)
{
if (isReverse)
{
fist = ReverseString(fist);
second = ReverseString(second);
}
StringBuilder builder = new StringBuilder();
char[] ar1 = fist.ToArray();
for (int i = 0; i < ar1.Length; i++)
{
if (fist.Length > i + 1 && ar1[i].Equals(second[i]))
{
builder.Append(ar1[i]);
}
else
{
break;
}
}
if (isReverse)
{
return ReverseString(builder.ToString());
}
return builder.ToString();
}
public static string ReverseString(string s)
{
char[] arr = s.ToCharArray();
Array.Reverse(arr);
return new string(arr);
}
A sketch of what you need:
int pos = 0;
string resultStart = "";
// walk both strings until either one ends
while (pos < string1.Length && pos < string2.Length)
{
if (string1[pos] != string2[pos])
break; // the first mismatch ends the common start
resultStart += string1[pos];
pos++;
}
resultStart now holds the common start string. You can do the same going backwards to get the common end.
Another solution you can use is Regular Expressions.
Regex re = new Regex("^bc3.*?dsc$");
String first = "bc3231dsc";
if(re.IsMatch(first)) {
//Act accordingly...
}
This gives you more flexibility when matching. The pattern above matches any string that starts with bc3 and ends with dsc, with anything in between except a linefeed. By changing .*? to \d+, you could specify that you only want digits between the two fields. From there, the possibilities are endless.
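If the differing middle part is needed as well, a named capture group can pull it out. This is a minimal sketch that assumes the bc3 and dsc ends are known in advance; CenterSketch, Center and the group name center are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class CenterSketch
{
    // IgnoreCase because the strings are stated to be case-insensitive.
    static readonly Regex Pattern =
        new Regex(@"^bc3(?<center>\d+)dsc$", RegexOptions.IgnoreCase);

    // Returns the digits between the fixed ends, or null when the input does not match.
    public static string Center(string value)
    {
        Match m = Pattern.Match(value);
        return m.Success ? m.Groups["center"].Value : null;
    }

    static void Main()
    {
        Console.WriteLine(Center("bc3231dsc"));         // 231
        Console.WriteLine(Center("BC3462DSC"));         // 462
        Console.WriteLine(Center("xx3462dsc") == null); // True
    }
}
```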
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
class Sample {
static public void Main(){
string s1 = "bc3231dsc";
string s2 = "bc3462dsc";
List<string> common_str = commonStrings(s1,s2);
foreach ( var s in common_str)
Console.WriteLine(s);
}
static public List<string> commonStrings(string s1, string s2){
int len = s1.Length;
char [] match_chars = new char[len];
for(var i = 0; i < len ; ++i)
match_chars[i] = (Char.ToLower(s1[i])==Char.ToLower(s2[i]))? '#' : '_';
string pat = new String(match_chars);
Regex regex = new Regex("(#+)", RegexOptions.Compiled);
List<string> result = new List<string>();
foreach (Match match in regex.Matches(pat))
result.Add(s1.Substring(match.Index, match.Length));
return result;
}
}
For the updated conditions:
using System;
class Sample {
static public void Main(){
string s1 = "bc3231dsc";
string s2 = "bc3462dsc";
int len = 9;//s1.Length;//cond.1)
int l_pos = 0;
int r_pos = len;
for(int i=0;i<len && Char.ToLower(s1[i])==Char.ToLower(s2[i]);++i){
++l_pos;
}
for(int i=len-1;i>=l_pos && Char.ToLower(s1[i])==Char.ToLower(s2[i]);--i){ // stop at l_pos so the right match never overlaps the left match
--r_pos;
}
string leftMatch = s1.Substring(0,l_pos);
string center1 = s1.Substring(l_pos, r_pos - l_pos);
string center2 = s2.Substring(l_pos, r_pos - l_pos);
string rightMatch = s1.Substring(r_pos);
Console.Write(
"leftMatch = \"{0}\"\n" +
"center1 = \"{1}\"\n" +
"center2 = \"{2}\"\n" +
"rightMatch = \"{3}\"\n",leftMatch, center1, center2, rightMatch);
}
}

Regex camelcase in c#

I'm trying to use regex to convert a string like this "North Korea"
to a string like "northKorea" - does someone know how I might accomplish this in C#?
Cheers
If you know all your input strings are in title case (like "North Korea"), you can simply do:
string input = "North Korea";
input = input.Replace(" ",""); //remove spaces
string output = char.ToLower(input[0]) +
input.Substring(1); //make first char lowercase
// output = "northKorea"
If some of your input is not in title case, you can use TextInfo.ToTitleCase:
string input = "NoRtH kORea";
input = System.Globalization.CultureInfo.CurrentCulture.TextInfo.ToTitleCase(input);
input = input.Replace(" ",""); //remove spaces
string output = char.ToLower(input[0]) +
input.Substring(1); //make first char lowercase
// output = "northKorea"
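Since the question asks about regex specifically, the same idea can also be sketched with Regex.Replace and a match evaluator. This is only an illustration; CamelRegexSketch and ToCamel are made-up names, and unlike the ToTitleCase variant it does not lower-case the remaining letters of each word.

```csharp
using System;
using System.Text.RegularExpressions;

static class CamelRegexSketch
{
    public static string ToCamel(string input)
    {
        // Drop each run of whitespace and upper-case the character that follows it.
        string joined = Regex.Replace(input.Trim(), @"\s+(\w)",
            m => m.Groups[1].Value.ToUpperInvariant());

        if (joined.Length == 0)
            return joined;

        // Lower-case the very first character.
        return char.ToLowerInvariant(joined[0]) + joined.Substring(1);
    }

    static void Main()
    {
        Console.WriteLine(ToCamel("North Korea")); // northKorea
        Console.WriteLine(ToCamel("north korea")); // northKorea
    }
}
```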
Forget regex.
All you need is a camelCase conversion algorithm:
See here:
http://www.codekeep.net/snippets/096fea45-b426-40fd-8beb-dec49d8a8662.aspx
Use this one:
string camelCase = ConvertCaseString(a, Case.CamelCase);
Copy-pasted in case it goes offline:
void Main() {
string a = "background color-red.brown";
string camelCase = ConvertCaseString(a, Case.CamelCase);
string pascalCase = ConvertCaseString(a, Case.PascalCase);
}
/// <summary>
/// Converts the phrase to specified convention.
/// </summary>
/// <param name="phrase"></param>
/// <param name="cases">The cases.</param>
/// <returns>string</returns>
static string ConvertCaseString(string phrase, Case cases)
{
string[] splittedPhrase = phrase.Split(' ', '-', '.');
var sb = new StringBuilder();
if (cases == Case.CamelCase)
{
sb.Append(splittedPhrase[0].ToLower());
splittedPhrase[0] = string.Empty;
}
else if (cases == Case.PascalCase)
sb = new StringBuilder();
foreach (String s in splittedPhrase)
{
char[] splittedPhraseChars = s.ToCharArray();
if (splittedPhraseChars.Length > 0)
{
splittedPhraseChars[0] = ((new String(splittedPhraseChars[0], 1)).ToUpper().ToCharArray())[0];
}
sb.Append(new String(splittedPhraseChars));
}
return sb.ToString();
}
enum Case
{
PascalCase,
CamelCase
}
You could just split it and put it back together:
string[] split = ("North Korea").Split(' ');
StringBuilder sb = new StringBuilder();
for (int i = 0; i < split.Count(); i++)
{
if (i == 0)
sb.Append(split[i].ToLower());
else
sb.Append(split[i]);
}
Edit: Switched to a StringBuilder instead, like Bazzz suggested.
This builds on Paolo Falabella's answer as a String extension and handles a few boundary cases such as empty string. Since there is some confusion between CamelCase and camelCase, I called it LowerCamelCase as described on Wikipedia. I resisted the temptation to go with nerdCaps.
internal static string ToLowerCamelCase( this string input )
{
string output = "";
if( String.IsNullOrEmpty( input ) == false )
{
output = System.Globalization.CultureInfo.CurrentCulture.TextInfo.ToTitleCase( input ); //in case not Title Case
output = output.Replace( " ", "" ); //remove any white spaces between words
if( String.IsNullOrEmpty( output ) == false ) //handles the case where input is " "
{
output = char.ToLower( output[0] ) + output.Substring( 1 ); //lowercase first (even if 1 character string)
}
}
return output;
}
Use:
string test = "Foo Bar";
test = test.ToLowerCamelCase();
... //test is now "fooBar"
Update:
toong raised a good point in the comments - this will not work for graphemes. See the link provided by toong. There are also examples of iterating graphemes here and here if you want to tweak the above code for graphemes.
String::Split definitely is one of my pet peeves. Also, none of the other answers deal with:
Cultures
All forms of word separators
Numbers
What happens when the string starts with word separators
I tried to get it as close as possible to what you would find in base class library code:
static string ToCamelCaseInvariant(string value) { return ToCamelCase(value, true, CultureInfo.InvariantCulture); }
static string ToCamelCaseInvariant(string value, bool changeWordCaps) { return ToCamelCase(value, changeWordCaps, CultureInfo.InvariantCulture); }
static string ToCamelCase(string value) { return ToCamelCase(value, true, CultureInfo.CurrentCulture); }
static string ToCamelCase(string value, bool changeWordCaps) { return ToCamelCase(value, changeWordCaps, CultureInfo.CurrentCulture); }
/// <summary>
/// Converts the given string value into camelCase.
/// </summary>
/// <param name="value">The value.</param>
/// <param name="changeWordCaps">If set to <c>true</c> letters in a word (apart from the first) will be lowercased.</param>
/// <param name="culture">The culture to use to change the case of the characters.</param>
/// <returns>
/// The camel case value.
/// </returns>
static string ToCamelCase(string value, bool changeWordCaps, CultureInfo culture)
{
if (culture == null)
throw new ArgumentNullException("culture");
if (string.IsNullOrEmpty(value))
return value;
var result = new StringBuilder(value.Length);
var lastWasBreak = true;
for (var i = 0; i < value.Length; i++)
{
var c = value[i];
if (char.IsWhiteSpace(c) || char.IsPunctuation(c) || char.IsSeparator(c))
{
lastWasBreak = true;
}
else if (char.IsNumber(c))
{
result.Append(c);
lastWasBreak = true;
}
else
{
if (result.Length == 0)
{
result.Append(char.ToLower(c, culture));
}
else if (lastWasBreak)
{
result.Append(char.ToUpper(c, culture));
}
else if (changeWordCaps)
{
result.Append(char.ToLower(c, culture));
}
else
{
result.Append(c);
}
lastWasBreak = false;
}
}
return result.ToString();
}
// Tests
' This is a test. 12345hello world' = 'thisIsATest12345HelloWorld'
'--north korea' = 'northKorea'
'!nOrTH koreA' = 'northKorea'
'System.Console.' = 'systemConsole'
Try the following:
var input = "Hi my name is Rony";
var subStrs = input.ToLower().Split(' ');
var output = "";
foreach(var s in subStrs)
{
if(s!=subStrs[0])
output += s.First().ToString().ToUpper() + String.Join("", s.Skip(1));
else
output += s;
}
This should produce "hiMyNameIsRony" as the output.
string toCamelCase(string s)
{
if (s.Length < 2) return s.ToLower();
return Char.ToLowerInvariant(s[0]) + s.Substring(1);
}
Similar to Paolo Falabella's code, but it survives empty strings and one-character strings.
