Regex pipe character used as delimiter within subgroups - c#

I have a string delimited by the pipe character. It is a repeatable sequence:
<machinenr>|<controldone>|<nrofitems>|<items>
However, where you see the items tag, you will have item numbers that are also delimited by the pipe character. It's not a smart format, but I have to deal with it, and I want to do so with a regex in C#. Assuming the above format, let's take a real example:
446408|0|2|111|6847|446408||0||
Note that theoretically there doesn't need to be a value between the pipes, nor are the contents limited in length. An item id can be 111 or 877333, or even a mixed alphanumeric id such as XB111. So here we have two machines with no items:
446408|0|0||447400||0||
Here we have a few machines with no or some items. Note, the pipe character is also used to delimit the items, so you have pipes within pipes:
446408|0|1|111|446408|0|3|99884|111|73732|446408|0|0||
This machine has three items:
446408|0|3|99884|111|73732|
The item ids:
99884|111|73732
What should the regex look like? I've tried with the below named groups (easier to read), but it just doesn't work:
^(?P<machinenr>.*?)\|
(?P<controldone>.*?)\|
(?P<nrofitems>.*?)\|
(?P<items>.*?)\|
Here is a clarification for @Atterson, @sln and @. Note that the number of items can be 0-n; there is no limit. Let's take this example, a long string with machines and their items: 446408|0|1|111|446408|0|3|99884|111|73732|446408|0|0|| What I expect the regex to do is break this string into three matches/parts and their values, the first match being: 446408|0|1|111| the second match: 446408|0|3|99884|111|73732| and the third match: 446408|0|0|| To take an example of the values each part is supposed to be split into, let's use the second match/part. It is a machine with nr 446408, it has not been controlled (0), and it has 3 items with the item ids 99884|111|73732. After these items, a new sequence of:
<machinenr>|<controldone>|<nrofitems>|<items>
can follow. @Sanxofon please check your regex here: [link] https://regex101.com/r/kC3gH0/87 and you'll see that unfortunately it does not match.

This isn't solvable with a regex: there's no way to tell the regular expression something like "Match .*?\| the same number of times as a certain capturing group, which happens to contain a number." A straightforward solution in plain old C#, though, is this:
string items = "446408|0|1|111|446408|0|3|99884|111|73732|446408|0|0|";
var fields = items.Split('|');
for (int i = 0; i < fields.Length;) {
Console.WriteLine("machinenr:" + fields[i++]);
Console.WriteLine("controldone:" + fields[i++]);
int numSubItems = Int32.Parse(fields[i++]);
Console.WriteLine("num subitems:" + numSubItems);
if (numSubItems == 0) {
i++;
continue;
}
for (int subItemIndex = 0; subItemIndex < numSubItems; subItemIndex++) {
Console.WriteLine("\tItem:" + (subItemIndex + 1) + ": " + fields[i++]);
}
}
FYI, I trimmed the trailing "|" that your original string had, so
string items = "446408|0|1|111|446408|0|3|99884|111|73732|446408|0|0|";
instead of
string items = "446408|0|1|111|446408|0|3|99884|111|73732|446408|0|0||";

Named capturing groups are written (?<name>...), not (?P<name>...), in C#. Also, you expressed the desire to have repeating matches, so I have wrapped your regex in a repeating (?<grp>...) group.
You need to figure out how to differentiate an item from a machine. For instance, if you could say all machine numbers were 6 digits, and items were 0-5 digits you could do something like this... You would still have to split out the items collection.
^(?<grp>(?<machinenr>[^\|]{6})\|
(?<controldone>[^\|]*)\|
(?<nrofitems>[^\|]*)\|
(?<items>(?:[^\|]{0,5}\|){1,}))*$
Sample C# implementation:
class Program
{
static void Main(string[] args)
{
string strRegex =
@"^(?<grp>(?<machinenr>[^\|]{6})\|
(?<controldone>[^\|]*)\|
(?<nrofitems>[^\|]*)\|
(?<items>(?:[^\|]{0,5}\|){1,}))*$";
Regex myRegex = new Regex(strRegex, RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace);
string strTargetString = @"446408|0|1|111|446408|0|3|99884|111|73732|446408|0|0||";
MatchCollection matches = myRegex.Matches(strTargetString);
foreach (Match m in matches)
{
for (int idx = 0; idx < m.Groups["grp"].Captures.Count; idx++)
{
Console.WriteLine("Group:");
Console.WriteLine($"\tmachinenr={m.Groups["machinenr"].Captures[idx]}");
Console.WriteLine($"\tcontroldone={m.Groups["controldone"].Captures[idx]}");
Console.WriteLine($"\tnrofitems={m.Groups["nrofitems"].Captures[idx]}");
Console.WriteLine($"\titems={m.Groups["items"].Captures[idx]}");
}
}
}
}
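The items capture above still holds a pipe-delimited string such as 99884|111|73732|, so it has to be split once more. Here is a minimal sketch of that last step. ItemSplitter and SplitItems are hypothetical names that are not part of the answer above, and the sketch assumes the trailing pipe is a terminator rather than an empty id:

```csharp
using System;

static class ItemSplitter
{
    // Splits a captured items string such as "99884|111|73732|" into ids.
    // Assumption: every id is terminated by a pipe, so one trailing pipe is
    // dropped, and a lone "|" (or an empty string) means "no items".
    public static string[] SplitItems(string items)
    {
        if (string.IsNullOrEmpty(items) || items == "|")
            return new string[0];
        if (items.EndsWith("|", StringComparison.Ordinal))
            items = items.Substring(0, items.Length - 1);
        return items.Split('|');
    }
}
```

You would call it once per capture, for example ItemSplitter.SplitItems(m.Groups["items"].Captures[idx].Value).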
Using C# IEnumerable<T> Algorithm
It would seem easier just to split the string and parse the subsequent array. But, if you are concerned about dealing with large strings or don't wish to use String.Split(), you can use an IEnumerable<T> method. Here is one approach...
class Program
{
public class Entry
{
public string MachineNr { get; set; }
public string ControlDone { get; set; }
public int Count { get; set; }
public List<string> Items { get; set; }
private static IEnumerable<string> fields(string list)
{
int idx = 0;
do
{
int ndx = list.IndexOf('|', idx);
if (ndx == -1) // IndexOf returns -1 when no further '|' exists
yield return list.Substring(idx);
else
yield return list.Substring(idx, ndx - idx);
idx = ++ndx;
}
while (idx > 0 && idx < list.Length-1) ;
}
public static IEnumerable<Entry> parseList(string list)
{
var fields = Entry.fields(list).GetEnumerator();
while (fields.MoveNext())
{
var e = new Entry();
e.MachineNr = fields.Current;
if (fields.MoveNext())
{
e.ControlDone = fields.Current;
if (fields.MoveNext())
{
int val = 0;
e.Count = int.TryParse(fields.Current, out val) ? val : 0;
e.Items = new List<string>();
for (int x=e.Count;x>0;x--)
{
if (fields.MoveNext())
e.Items.Add(fields.Current);
}
}
}
yield return e;
}
}
}
static void Main(string[] args)
{
string strTargetString = @"446408|0|1|111|446408|0|3|99884|111|73732|446408|0|0||";
foreach (var entry in Entry.parseList(strTargetString))
{
Console.WriteLine(
$@"Group:
Machine: {entry.MachineNr}
ControlDone: {entry.ControlDone}
Count: {entry.Count}
Items: {string.Join(", ",entry.Items)}");
}
}
}

Related

How to get every possible combination based on the ranges in brackets?

Looking for the best way to take something like 1[a-C]3[1-6]07[R,E-G] and have it output a log that would look like the following, basically every possible combination based on the ranges in brackets.
1a3107R
1a3107E
1a3107F
1a3107G
1b3107R
1b3107E
1b3107F
1b3107G
1c3107R
1c3107E
1c3107F
1c3107G
all the way to 1C3607G.
Sorry for not being more technical about what I'm looking for; I'm just not sure of the correct terms to explain it.
Normally what we'd do to get all combinations is to put all our ranges into arrays, then use nested loops to loop through each array, and create a new item in the inner loop that gets added to our results.
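To make the nested-loop idea concrete, here is a minimal sketch that assumes exactly three ranges are already available as char arrays. NestedLoopDemo and Combine are hypothetical names used only for illustration:

```csharp
using System;
using System.Collections.Generic;

static class NestedLoopDemo
{
    // Builds every combination of one character from each of three ranges.
    // The innermost loop creates the new item and adds it to the results.
    public static List<string> Combine(char[] first, char[] second, char[] third)
    {
        var results = new List<string>();
        foreach (char a in first)
            foreach (char b in second)
                foreach (char c in third)
                    results.Add(new string(new[] { a, b, c }));
        return results;
    }
}
```

Calling NestedLoopDemo.Combine(new[] { 'a', 'b' }, new[] { '1', '2' }, new[] { 'R' }) produces a1R, a2R, b1R, b2R. The limitation is that the number of loops is fixed in the source code, so this only works when the number of ranges is known up front.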
But in order to do that here, we'd first need to write a method that can parse your range string and return a list of char values defined by the range. I've written a rudimentary one here, which works with your sample input but should have some validation added to ensure the input string is in the proper format:
public static List<char> GetRange(string input)
{
input = input.Replace("[", "").Replace("]", "");
var parts = input.Split(',');
var range = new List<char>();
foreach (var part in parts)
{
var ends = part.Split('-');
if (ends.Length == 1)
{
range.Add(ends[0][0]);
}
else if (char.IsDigit(ends[0][0]))
{
var start = Convert.ToInt32(ends[0][0]);
var end = Convert.ToInt32(ends[1][0]);
var count = end - start + 1;
range.AddRange(Enumerable.Range(start, count).Select(c => (char) c));
}
else
{
var start = (int) ends[0][0];
var last = (int) ends[1][0];
var end = last < start ? 'z' : last;
range.AddRange(Enumerable.Range(start, end - start + 1)
.Select(c => (char) c));
if (last < start)
{
range.AddRange(Enumerable.Range('A', last - 'A' + 1)
.Select(c => (char) c));
}
}
}
return range;
}
Now that we can get a range of values from a string like "[a-C]", we need a way to create nested loops for each range, and to build our list of values based on the input string.
One way to do this is to replace our input string with one that contains placeholders for each range, and then we can create a loop for each range, and on each iteration we can replace the placeholder for that range with a character from the range.
So we'll take an input like this: "1[a-C]3[1-6]07[R,E-G]", and turn it into this: "1{0}3{1}07{2}". Now we can create loops where we take the characters from the first range and create a new string for each one of them, replacing the {0} with the character. Then, for each one of those strings, we iterate over the second range and create a new string that replaces the {1} placeholder with a character from the second range, and so on and so on until we've created new strings for every possible combination.
public static List<string> GetCombinatins(string input)
{
// Sample input = "1[a-C]3[1-6]07[R,E-G]"
var inputWithPlaceholders = string.Empty; // This will become "1{0}3{1}07{2}"
var placeholder = 0;
var ranges = new List<List<char>>();
for (int i = 0; i < input.Length; i++)
{
// We've found a range start, so replace this with our
// placeholder '{n}' and add the range to our list of ranges
if (input[i] == '[')
{
inputWithPlaceholders += $"{{{placeholder++}}}";
var rangeEndIndex = input.IndexOf("]", i);
ranges.Add(GetRange(input.Substring(i, rangeEndIndex - i)));
i = rangeEndIndex;
}
else
{
inputWithPlaceholders += input[i];
}
}
if (ranges.Count == 0) return new List<string> {input};
// Add strings for the first range
var values = ranges.First().Select(chr =>
inputWithPlaceholders.Replace("{0}", chr.ToString())).ToList();
// Then continually add all combinations of other ranges
for (int i = 1; i < ranges.Count; i++)
{
values = values.SelectMany(value =>
ranges[i].Select(chr =>
value.Replace($"{{{i}}}", chr.ToString()))).ToList();
}
return values;
}
Now with these methods out of the way, we can create output of all our ranges quite easily:
static void Main()
{
Console.WriteLine(string.Join(", ", GetCombinatins("1[a-C]3[1-6]07[R,E-G]")));
Console.WriteLine("\nPress any key to exit...");
Console.ReadKey();
}
I would approach this problem in three stages. The first stage is to transform the source string to an IEnumerable of IEnumerable<string>.
static IEnumerable<IEnumerable<string>> ParseSourceToEnumerables(string source);
For example the source "1[A-C]3[1-6]07[R,E-G]" should be transformed to the 6 enumerables below:
"1"
"A", "B", "C"
"3"
"1", "2", "3", "4", "5", "6"
"07"
"R", "E", "F", "G"
Each literal inside the source has been transformed to an IEnumerable<string> containing a single string.
The second stage would be to create the Cartesian product of these enumerables.
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(
IEnumerable<IEnumerable<T>> sequences)
The final (and easiest) stage would be to concatenate each one of the inner IEnumerable<string> of the Cartesian product to a single string. For example
the sequence "1", "A", "3", "1", "07", "R" to the string "1A3107R"
The hardest stage is the first one, because it involves parsing. Below is a partial implementation:
static IEnumerable<IEnumerable<string>> ParseSourceToEnumerables(string source)
{
var matches = Regex.Matches(source, @"\[(.*?)\]", RegexOptions.Singleline);
int previousIndex = 0;
foreach (Match match in matches)
{
var previousLiteral = source.Substring(
previousIndex, match.Index - previousIndex);
if (previousLiteral.Length > 0)
yield return Enumerable.Repeat(previousLiteral, 1);
yield return SinglePatternToEnumerable(match.Groups[1].Value);
previousIndex = match.Index + match.Length;
}
var lastLiteral = source.Substring(previousIndex, source.Length - previousIndex);
if (lastLiteral.Length > 0) yield return Enumerable.Repeat(lastLiteral, 1);
}
static IEnumerable<string> SinglePatternToEnumerable(string pattern)
{
// TODO
// Should transform the pattern "X,A-C,YZ"
// to the sequence ["X", "A", "B", "C", "YZ"]
}
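One possible way to fill in the TODO is sketched below. It is not the author's implementation. It assumes a part counts as a range only when it has the exact form <char>-<char> with the start not greater than the end, and everything else is passed through as a literal. That means a mixed-case range like a-C from the question is not expanded by this sketch. PatternParser is a hypothetical wrapper class:

```csharp
using System;
using System.Collections.Generic;

static class PatternParser
{
    // Expands a pattern such as "X,A-C,YZ" into X, A, B, C, YZ.
    // Assumption: only three-character parts of the form "s-e" with s <= e
    // are ranges; any other part is returned unchanged as a literal.
    public static IEnumerable<string> SinglePatternToEnumerable(string pattern)
    {
        foreach (string part in pattern.Split(','))
        {
            if (part.Length == 3 && part[1] == '-' && part[0] <= part[2])
            {
                // Loop over int so the counter cannot wrap around at char.MaxValue.
                for (int c = part[0]; c <= part[2]; c++)
                    yield return ((char)c).ToString();
            }
            else
            {
                yield return part;
            }
        }
    }
}
```

With this in place, the pattern R,E-G yields R, E, F, G, and 1-6 yields the digits 1 through 6.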
The second stage is hard too, but solved. I just grabbed the implementation from Eric Lippert's blog.
static IEnumerable<IEnumerable<T>> CartesianProduct<T>(
IEnumerable<IEnumerable<T>> sequences)
{
IEnumerable<IEnumerable<T>> emptyProduct = new[] { Enumerable.Empty<T>() };
return sequences.Aggregate(
emptyProduct,
(accumulator, sequence) =>
accumulator.SelectMany(_ => sequence,
(accseq, item) => accseq.Append(item)) // .NET Framework 4.7.1
);
}
The final stage is just a call to String.Join.
var source = "1[A-C]3[1-6]07[R,E-G]";
var enumerables = ParseSourceToEnumerables(source);
var combinations = CartesianProduct(enumerables);
foreach (var combination in combinations)
{
Console.WriteLine($"Combination: {String.Join("", combination)}");
}

How can I split a string to store contents in two different arrays in c#?

The string I want to split is an array of strings.
the array contains strings like:
G1,Active
G2,Inactive
G3,Inactive
.
.
G24,Active
Now I want to store the G's in one array, and Active or Inactive in a different array. So far I have tried the code below, which successfully stores all the G parts, but I have lost the other part. I tried the Split function but it did not work, so I tried this instead.
int i = 0;
for(i = 0; i <= grids.Length; i++)
{
string temp = grids[i];
temp = temp.Replace(",", " ");
if (temp.Contains(' '))
{
int index = temp.IndexOf(' ');
grids[i] = temp.Substring(0, index);
}
//System.Console.WriteLine(temp);
}
Please help me how to achieve this goal. I am new to C#.
If I understand the problem correctly, we have an array of strings, e.g.:
arrayOfStrings[24] =
{
"G1,Active",
"G2,Inactive",
"G3,Active",
...
"G24,Active"
}
Now we want to split each item and store the G part in one array and the status in another.
Working with arrays, the solution is to traverse arrayOfStrings.
For each item in arrayOfStrings, we split it on the ',' separator.
The Split operation returns another array of two elements, the G part and the status, which are stored in separate arrays (gArray and statusArray) for later retrieval. Those arrays have a 1-to-1 relation by index.
Here is my implementation:
static string[] LoadArray()
{
return new string[]
{
"G1,Active",
"G2,Inactive",
"G3,Active",
"G4,Active",
"G5,Active",
"G6,Inactive",
"G7,Active",
"G8,Active",
"G9,Active",
"G10,Active",
"G11,Inactive",
"G12,Active",
"G13,Active",
"G14,Inactive",
"G15,Active",
"G16,Inactive",
"G17,Active",
"G18,Active",
"G19,Inactive",
"G20,Active",
"G21,Inactive",
"G22,Active",
"G23,Inactive",
"G24,Active"
};
}
static void Main(string[] args)
{
string[] myarrayOfStrings = LoadArray();
string[] gArray = new string[24];
string[] statusArray = new string[24];
int index = 0;
foreach (var item in myarrayOfStrings)
{
var arraySplit = item.Split(',');
gArray[index] = arraySplit[0];
statusArray[index] = arraySplit[1];
index++;
}
for (int i = 0; i < gArray.Length; i++)
{
Console.WriteLine("{0} has status : {1}", gArray[i] , statusArray[i]);
}
Console.ReadLine();
}
It seems you have a list of Gxx,Active entries. My recommendation is to first split the string on the space character, which gives you the array mentioned previously:
string text = "G1,Active G2,Inactive G3,Inactive G24,Active";
string[] splitedGItems = text.Split(' ');
So now you have an array, and I strongly recommend using an object, Tuple, or Dictionary, depending on what suits your overall scenario best. For now I will use a Dictionary, as the data seems to be key-value:
Dictionary<string, string> GxListActiveInactive = new Dictionary<string, string>();
foreach(var singleGItems in splitedGItems)
{
string[] definition = singleGItems.Split(',');
GxListActiveInactive.Add(definition[0], definition[1]);
}
What this code achieves is a key-value collection. Now you can look up G24 like this:
string G24Value = GxListActiveInactive.FirstOrDefault(a => a.Key == "G24").Value;
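Since the collection is a Dictionary, a direct key lookup is also possible and avoids scanning every entry. A minimal sketch, where StatusLookup, GetStatus and the fallback parameter are hypothetical names that are not part of the answer above:

```csharp
using System;
using System.Collections.Generic;

static class StatusLookup
{
    // Returns the status stored for the key, or the fallback when the key is absent.
    public static string GetStatus(Dictionary<string, string> statuses, string key, string fallback)
    {
        string value;
        return statuses.TryGetValue(key, out value) ? value : fallback;
    }
}
```

For example, StatusLookup.GetStatus(GxListActiveInactive, "G24", "Unknown") returns the stored status, or "Unknown" if G24 was never added.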
Just do this:
var splitedArray = YourStringArray.ToDictionary(x=>x.Split(',')[0],x=>x.Split(',')[1]);
var gArray = splitedArray.Keys.ToArray();
var activeInactiveArray = splitedArray.Values.ToArray();
I hope it will be useful
You can divide the string using Split; the first part should be the G's, while the second part will be "Active" or "Inactive".
int i;
string[] temp, activity = new string[grids.Length];
for(i = 0; i < grids.Length; i++)
{
temp = grids[i].Split(',');
grids[i] = temp[0];
activity[i] = temp[1];
}

How to get an index of a word in string array

I want to get the index of a word in a string array.
for example, the sentence I will input is 'I love you.'
I have words[1] = love, how can I get the position of 'love' is 1? I could do it but just inside the if state. I want to bring it outside. Please help me.
This is my code.
static void Main(string[] args)
{
Console.WriteLine("sentence: ");
string a = Console.ReadLine();
String[] words = a.Split(' ');
List<string> verbs = new List<string>();
verbs.Add("love");
int i = 0;
while (i < words.Length) {
foreach (string verb in verbs) {
if (words[i] == verb) {
int index = i;
Console.WriteLine(i);
}
} i++;
}
Console.ReadKey();
}
I could do it but just inside the if state. I want to bring it outside.
Your code identifies the index correctly; all you need to do now is store it for use outside the loop.
Make a list of ints, and call Add on it for the matches that you identify:
var indexes = new List<int>();
while (i < words.Length) {
foreach (string verb in verbs) {
if (words[i] == verb) {
int index = i;
indexes.Add(i);
break;
}
}
i++;
}
You can replace the inner loop with a call of Contains method, and the outer loop with a for:
for (var i = 0 ; i != words.Length ; i++) {
if (verbs.Contains(words[i])) {
indexes.Add(i);
}
}
Finally, the whole sequence can be converted to a single LINQ query:
var indexes = words
.Select((w,i) => new {w,i})
.Where(p => verbs.Contains(p.w))
.Select(p => p.i)
.ToList();
Here is an example
var a = "I love you.";
var words = a.Split(' ');
var index = Array.IndexOf(words,"love");
Console.WriteLine(index);
private int GetWordIndex(string WordOrigin, string GetWord)
{
string[] words = WordOrigin.Split(' ');
int Index = Array.IndexOf(words, GetWord);
return Index;
}
Assuming you call the function as GetWordIndex("Hello C# World", "C#");, WordOrigin is Hello C# World and GetWord is C#.
Now, going through the function:
string[] words = WordOrigin.Split(' '); breaks the string into an array of strings, splitting at every space. So Hello C# World is broken down into Hello, C#, and World.
int Index = Array.IndexOf(words, GetWord); gets the index of GetWord in that array. In the sample I provided, we are looking for the word C# in Hello C# World, which has been split into an array of strings.
return Index; simply returns the index at which the word was found (or -1 if it is not in the array).
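Array.IndexOf compares whole elements exactly, so with the question's sentence "I love you." the word you would not be found (the element is "you.") and Love would not match love. Here is a sketch of a more forgiving lookup. WordFinder and IndexOfWord are hypothetical names, and the set of trimmed punctuation characters is an assumption:

```csharp
using System;

static class WordFinder
{
    // Returns the index of the first word equal to target, ignoring case and
    // leading/trailing punctuation; returns -1 when no word matches.
    public static int IndexOfWord(string sentence, string target)
    {
        string[] words = sentence.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return Array.FindIndex(words, w =>
            string.Equals(w.Trim('.', ',', '!', '?'), target, StringComparison.OrdinalIgnoreCase));
    }
}
```

Note that RemoveEmptyEntries drops the empty strings produced by repeated spaces, so the returned index counts words rather than split positions.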

How to remove some special words from a string's content?

I have some strings containing code for emoji icons, like :grinning:, :kissing_heart:, or :bouquet:. I'd like to process them to remove the emoji codes.
For example, given:
Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:
I want to get this:
Hello , how are you? Are you fine?
I know I can use this code:
richTextBox2.Text = richTextBox1.Text.Replace(":kissing_heart:", "").Replace(":bouquet:", "").Replace(":grinning:", "").ToString();
However, there are 856 different emoji icons I have to remove (which, using this method, would take 856 calls to Replace()). Is there any other way to accomplish this?
You can use a Regex to match the word between the colons (:anything:). Using the Replace overload that takes a function, you can add further validation.
string pattern = @":(.*?):";
string input = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet: Are you super fan, for example. :words not to replace:";
string output = Regex.Replace(input, pattern, (m) =>
{
if (m.ToString().Split(' ').Count() > 1) // more than 1 word and other validations that will help preventing parsing the user text
{
return m.ToString();
}
return String.Empty;
}); // "Hello , how are you? Are you fine? Are you super fan, for example. :words not to replace:"
If you don't want to use the Replace overload that takes a lambda expression, you can use \w, as @yorye-nathan mentioned, to match only word characters.
string pattern = @":(\w*):";
string input = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet: Are you super fan, for example. :words not to replace:";
string output = Regex.Replace(input, pattern, String.Empty); // "Hello , how are you? Are you fine? Are you super fan, for example. :words not to replace:"
string Text = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:";
I would solve it this way:
List<string> Emoj = new List<string>() { ":kissing_heart:", ":bouquet:", ":grinning:" };
Emoj.ForEach(x => Text = Text.Replace(x, string.Empty));
UPDATE - referring to Detail's comment
Another approach: replace only existing Emojs
List<string> Emoj = new List<string>() { ":kissing_heart:", ":bouquet:", ":grinning:" };
var Matches = Regex.Matches(Text, @":(\w*):").Cast<Match>().Select(x => x.Value);
Emoj.Intersect(Matches).ToList().ForEach(x => Text = Text.Replace(x, string.Empty));
But I'm not sure it makes a big difference for such short chat strings; it's more important to have code that's easy to read and maintain. The OP's question was about reducing the redundant Text.Replace().Text.Replace() calls, not about finding the most efficient solution.
I would use a combination of some of the techniques already suggested. Firstly, I'd store the 800+ emoji strings in a database and then load them up at runtime. Use a HashSet to store these in memory, so that we have a O(1) lookup time (very fast). Use Regex to pull out all potential pattern matches from the input and then compare each to our hashed emoji, removing the valid ones and leaving any non-emoji patterns the user has entered themselves...
public class Program
{
//hashset for in memory representation of emoji,
//lookups are O(1), so very fast
private HashSet<string> _emoji = null;
public Program(IEnumerable<string> emojiFromDb)
{
//load emoji from datastore (db/file,etc)
//into memory at startup
_emoji = new HashSet<string>(emojiFromDb);
}
public string RemoveEmoji(string input)
{
//pattern to search for
string pattern = @":(\w*):";
string output = input;
//use regex to find all potential patterns in the input
MatchCollection matches = Regex.Matches(input, pattern);
//only do this if we actually find the
//pattern in the input string...
if (matches.Count > 0)
{
//refine this to a distinct list of unique patterns
IEnumerable<string> distinct =
matches.Cast<Match>().Select(m => m.Value).Distinct();
//then check each one against the hashset, only removing
//registered emoji. This allows non-emoji versions
//of the pattern to survive...
foreach (string match in distinct)
if (_emoji.Contains(match))
output = output.Replace(match, string.Empty);
}
return output;
}
}
public class MainClass
{
static void Main(string[] args)
{
var program = new Program(new string[] { ":grinning:", ":kissing_heart:", ":bouquet:" });
string output = program.RemoveEmoji("Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:");
Console.WriteLine(output);
}
}
Which results in:
Hello :imadethis:, how are you? Are you fine? This is:a:strange:thing :to type:,
but valid :nonetheless:
You do not have to replace all 856 emojis. You only have to replace those that appear in the string. So have a look at:
Finding a substring using C# with a twist
Basically, you extract all tokens, i.e. the strings between : and :, and then replace those with string.Empty.
If you are concerned that the search will return strings that are not emojis such as :some other text: then you could have a hash table lookup to make sure that replacing said found token is appropriate to do.
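The idea described here (extract the tokens, then check each against a hash table before removing it) can be sketched with Regex.Replace and a match evaluator. This is one possible reading of the suggestion, not code from the linked question. EmojiStripper and Strip are hypothetical names, and the set is assumed to hold emoji names without the surrounding colons:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class EmojiStripper
{
    // Removes :token: sequences whose token is in knownEmoji.
    // Any other colon-delimited text is returned unchanged.
    public static string Strip(string input, ISet<string> knownEmoji)
    {
        return Regex.Replace(input, @":(\w+):", m =>
            knownEmoji.Contains(m.Groups[1].Value) ? string.Empty : m.Value);
    }
}
```

One limitation to be aware of: regex matches do not overlap, so in a string like Tricky test:123:grinning: the rejected token :123: consumes the colon that :grinning: would need, and the emoji is left in place.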
Finally got around to writing something up. I'm combining a couple of previously mentioned ideas with the fact that we should only loop over the string once. Based on those requirements, this sounds like the perfect job for LINQ.
You should probably cache the HashSet. Other than that, this has O(n) performance and only goes over the list once. It would be interesting to benchmark, but this could very well be the most efficient solution.
The approach is pretty straightforward.
1. First load all emoji into a HashSet so we can quickly look them up.
2. Split the string at the : with input.Split(':').
3. Decide if we keep the current element.
   If the last element was a match, keep the current element.
   If the last element was no match, check if the current element matches.
      If it does, ignore it. (This effectively removes the substring from the output.)
      If it doesn't, append : back and keep it.
4. Rebuild our string with a StringBuilder.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication1
{
static class Program
{
static void Main(string[] args)
{
ISet<string> emojiList = new HashSet<string>(new[] { "kissing_heart", "bouquet", "grinning" });
Console.WriteLine("Hello:grinning: , ho:w: a::re you?:kissing_heart:kissing_heart: Are you fine?:bouquet:".RemoveEmoji(':', emojiList));
Console.ReadLine();
}
public static string RemoveEmoji(this string input, char delimiter, ISet<string> emojiList)
{
StringBuilder sb = new StringBuilder();
input.Split(delimiter).Aggregate(true, (prev, curr) =>
{
if (prev)
{
sb.Append(curr);
return false;
}
if (emojiList.Contains(curr))
{
return true;
}
sb.Append(delimiter);
sb.Append(curr);
return false;
});
return sb.ToString();
}
}
}
Edit: I did something cool using the Rx library, but then realized Aggregate is the IEnumerable counterpart of Scan in Rx, thus simplifying the code even more.
If efficiency is a concern and to avoid processing "false positives", consider rewriting the string using a StringBuilder while skipping the special emoji tokens:
static HashSet<string> emojis = new HashSet<string>()
{
"grinning",
"kissing_heart",
"bouquet"
};
static string RemoveEmojis(string input)
{
StringBuilder sb = new StringBuilder();
int length = input.Length;
int startIndex = 0;
int colonIndex = input.IndexOf(':');
while (colonIndex >= 0 && startIndex < length)
{
//Keep normal text
int substringLength = colonIndex - startIndex;
if (substringLength > 0)
sb.Append(input.Substring(startIndex, substringLength));
//Advance the feed and get the next colon
startIndex = colonIndex + 1;
colonIndex = input.IndexOf(':', startIndex);
if (colonIndex < 0) //No more colons, so no more emojis
{
//Don't forget that first colon we found
sb.Append(':');
//Add the rest of the text
sb.Append(input.Substring(startIndex));
break;
}
else //Possible emoji, let's check
{
string token = input.Substring(startIndex, colonIndex - startIndex);
if (emojis.Contains(token)) //It's a match, so we skip this text
{
//Advance the feed
startIndex = colonIndex + 1;
colonIndex = input.IndexOf(':', startIndex);
}
else //No match, so we keep the normal text
{
//Don't forget the colon
sb.Append(':');
//Instead of doing another substring next loop, let's just use the one we already have
sb.Append(token);
startIndex = colonIndex;
}
}
}
return sb.ToString();
}
static void Main(string[] args)
{
List<string> inputs = new List<string>()
{
"Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:",
"Tricky test:123:grinning:",
"Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:"
};
foreach (string input in inputs)
{
Console.WriteLine("In <- " + input);
Console.WriteLine("Out -> " + RemoveEmojis(input));
Console.WriteLine();
}
Console.WriteLine("\r\n\r\nPress enter to exit...");
Console.ReadLine();
}
Outputs:
In <- Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:
Out -> Hello , how are you? Are you fine?
In <- Tricky test:123:grinning:
Out -> Tricky test:123
In <- Hello:grinning: :imadethis:, how are you?:kissing_heart: Are you fine?:bouquet: This is:a:strange:thing :to type:, but valid :nonetheless:
Out -> Hello :imadethis:, how are you? Are you fine? This is:a:strange:thing :to type:, but valid :nonetheless:
Use the code below; I think it will solve your problem.
string s = "Hello:grinning: , how are you?:kissing_heart: Are you fine?:bouquet:";
string rmv = ""; string remove = "";
int i = 0; int k = 0;
A:
rmv = "";
for (i = k; i < s.Length; i++)
{
if (Convert.ToString(s[i]) == ":")
{
for (int j = i + 1; j < s.Length; j++)
{
if (Convert.ToString(s[j]) != ":")
{
rmv += s[j];
}
else
{
remove += rmv + ",";
i = j;
k = j + 1;
goto A;
}
}
}
}
string[] str = remove.Split(',');
for (int x = 0; x < str.Length-1; x++)
{
s = s.Replace(Convert.ToString(":" + str[x] + ":"), "");
}
Console.WriteLine(s);
Console.ReadKey();
I'd use an extension method like this:
public static class Helper
{
public static string MyReplace(this string dirty, char separator)
{
string newText = "";
bool replace = false;
for (int i = 0; i < dirty.Length; i++)
{
if(dirty[i] == separator) { replace = !replace ; continue;}
if(replace ) continue;
newText += dirty[i];
}
return newText;
}
}
Usage:
richTextBox2.Text = richTextBox2.Text.MyReplace(':');
This method should perform better than the one using Regex.
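Because strings are immutable, building the result with += allocates a new string for every appended character. If performance is the goal, the same toggle logic can be written with a StringBuilder. A sketch under the same assumption as the method above, namely that everything between each pair of separators should be dropped whether or not it is a known emoji (SeparatorStripper and RemoveDelimited are hypothetical names):

```csharp
using System;
using System.Text;

static class SeparatorStripper
{
    // Drops every separator and all text between each opening/closing pair.
    // An unpaired separator causes the rest of the string to be dropped.
    public static string RemoveDelimited(string dirty, char separator)
    {
        var sb = new StringBuilder(dirty.Length);
        bool skipping = false;
        foreach (char c in dirty)
        {
            if (c == separator) { skipping = !skipping; continue; }
            if (!skipping) sb.Append(c);
        }
        return sb.ToString();
    }
}
```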
I would split the text with the ':' and then build the string excluding the found emoji names.
const char marker = ':';
var textSections = text.Split(marker);
var emojiRemovedText = string.Empty;
var notMatchedCount = 0;
textSections.ToList().ForEach(section =>
{
if (emojiNames.Contains(section))
{
notMatchedCount = 0;
}
else
{
if (notMatchedCount++ > 0)
{
emojiRemovedText += marker.ToString();
}
emojiRemovedText += section;
}
});

compare the characters in two strings

In C#, how do I compare the characters in two strings?
For example, let's say I have these two strings:
"bc3231dsc" and "bc3462dsc"
How do I programmatically figure out that the strings
both start with "bc3" and end with "dsc"?
So the given would be two variables:
var1 = "bc3231dsc";
var2 = "bc3462dsc";
After comparing each character of var1 to var2, I would want the output to be:
leftMatch = "bc3";
center1 = "231";
center2 = "462";
rightMatch = "dsc";
Conditions:
1. The strings will always be 9 characters long.
2. The strings are not case sensitive.
The string class has 2 methods (StartsWith and EndsWith) that you can use.
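Those two methods test for a prefix or suffix you already know. To discover the shared parts, one option is to compare the strings character by character from both ends. A minimal sketch, comparing case-insensitively as the question's second condition asks (AffixFinder and its method names are hypothetical):

```csharp
using System;

static class AffixFinder
{
    // Length of the shared leading run, compared case-insensitively.
    public static int CommonPrefixLength(string a, string b)
    {
        int max = Math.Min(a.Length, b.Length);
        int i = 0;
        while (i < max && char.ToUpperInvariant(a[i]) == char.ToUpperInvariant(b[i]))
            i++;
        return i;
    }

    // Length of the shared trailing run that does not overlap the prefix.
    public static int CommonSuffixLength(string a, string b, int prefixLength)
    {
        int max = Math.Min(a.Length, b.Length) - prefixLength;
        int i = 0;
        while (i < max &&
               char.ToUpperInvariant(a[a.Length - 1 - i]) == char.ToUpperInvariant(b[b.Length - 1 - i]))
            i++;
        return i;
    }
}
```

For "bc3231dsc" and "bc3462dsc" both lengths are 3, so leftMatch is var1.Substring(0, 3), rightMatch is var1.Substring(6), and the two centers are the three characters in between.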
After reading your question and the answers already given, I think some constraints are missing that may be obvious to you but not to the community. But maybe we can do a little guesswork:
You'll have a bunch of string pairs that should be compared.
The two strings in each pair are of the same length, or you are only interested in comparing the characters read simultaneously from left to right.
You want some kind of enumeration that tells you where each block starts and how long it is.
Because a string is just an enumeration of chars, you could use LINQ here to get an idea of the matching characters, like this:
private IEnumerable<bool> CommonChars(string first, string second)
{
if (first == null)
throw new ArgumentNullException("first");
if (second == null)
throw new ArgumentNullException("second");
var charsToCompare = first.Zip(second, (LeftChar, RightChar) => new { LeftChar, RightChar });
var matchingChars = charsToCompare.Select(pair => pair.LeftChar == pair.RightChar);
return matchingChars;
}
With this we can proceed and now find out how long each block of consecutive true and false flags are with this method:
private IEnumerable<Tuple<int, int>> Pack(IEnumerable<bool> source)
{
if (source == null)
throw new ArgumentNullException("source");
using (var iterator = source.GetEnumerator())
{
if (!iterator.MoveNext())
{
yield break;
}
bool current = iterator.Current;
int index = 0;
int length = 1;
while (iterator.MoveNext())
{
if(current != iterator.Current)
{
yield return Tuple.Create(index, length);
index += length;
length = 0;
}
current = iterator.Current;
length++;
}
yield return Tuple.Create(index, length);
}
}
I currently don't know whether an existing LINQ function provides the same functionality. As far as I have read, it should be possible with SelectMany() (because in theory you can accomplish any LINQ task with this method), but as an ad hoc implementation the above was easier (for me).
These functions could then be used in a way something like this:
var firstString = "bc3231dsc";
var secondString = "bc3462dsc";
var commonChars = CommonChars(firstString, secondString);
var packs = Pack(commonChars);
foreach (var item in packs)
{
Console.WriteLine("Left side: " + firstString.Substring(item.Item1, item.Item2));
Console.WriteLine("Right side: " + secondString.Substring(item.Item1, item.Item2));
Console.WriteLine();
}
Which would then give you this output:
Left side: bc3
Right side: bc3
Left side: 231
Right side: 462
Left side: dsc
Right side: dsc
The biggest drawback is the use of Tuple, because it leads to the ugly property names Item1 and Item2, which are far from instantly readable. If you really want to, you could introduce your own simple class that holds the two integers and has some rock-solid property names. Also, the information about whether each block is shared by both strings or differs between them is currently lost. But once again, it should be fairly simple to get this information into the tuple or your own class.
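One way this could look is sketched below. The Block class, its property names, and the BlockPacker wrapper are all hypothetical; the grouping logic mirrors Pack, but each block also remembers whether it covers matching or differing characters.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical replacement for Tuple<int, int>; the names are illustrative only.
sealed class Block
{
    public int Start { get; }
    public int Length { get; }
    public bool IsMatch { get; }  // true when both strings agree over this block

    public Block(int start, int length, bool isMatch)
    {
        Start = start;
        Length = length;
        IsMatch = isMatch;
    }
}

static class BlockPacker
{
    // Groups consecutive equal flags into blocks and keeps the flag value.
    public static IEnumerable<Block> Pack(IEnumerable<bool> source)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));

        int start = 0;
        int length = 0;
        bool current = false;
        foreach (bool flag in source)
        {
            // A change of flag closes the block collected so far.
            if (length > 0 && flag != current)
            {
                yield return new Block(start, length, current);
                start += length;
                length = 0;
            }
            current = flag;
            length++;
        }
        // Emit the trailing block, unless the input was empty.
        if (length > 0)
            yield return new Block(start, length, current);
    }
}
```

With this, the loop over the result can read item.Start, item.Length and item.IsMatch instead of Item1 and Item2, and can skip or label the blocks where the strings differ.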
static void Main(string[] args)
{
string test1 = "bc3231dsc";
string test2 = "bc3462dsc";
string firstMatch = GetMatch(test1, test2, false);
string lastMatch = GetMatch(test1, test2, true);
// Length of the differing middle part; clamped so identical strings do not give a negative length.
int centerLength = Math.Max(0, test1.Length - (firstMatch.Length + lastMatch.Length));
string center1 = test1.Substring(firstMatch.Length, centerLength);
string center2 = test2.Substring(firstMatch.Length, centerLength);
}
public static string GetMatch(string first, string second, bool isReverse)
{
if (isReverse)
{
first = ReverseString(first);
second = ReverseString(second);
}
StringBuilder builder = new StringBuilder();
// Stop at the end of the shorter string so neither index runs out of range.
int limit = Math.Min(first.Length, second.Length);
for (int i = 0; i < limit; i++)
{
// Condition 2 of the question: the comparison is not case sensitive.
if (char.ToLowerInvariant(first[i]) != char.ToLowerInvariant(second[i]))
{
break;
}
builder.Append(first[i]);
}
if (isReverse)
{
return ReverseString(builder.ToString());
}
return builder.ToString();
}
public static string ReverseString(string s)
{
char[] arr = s.ToCharArray();
Array.Reverse(arr);
return new string(arr);
}
A rough sketch of what you need (string1 and string2 are the two inputs):
int pos = 0;
string resultStart = "";
// Walk both strings until either one ends or the characters differ.
while (pos < string1.Length && pos < string2.Length)
{
if (string1[pos] == string2[pos])
resultStart += string1[pos];
else
break;
pos++;
}
resultStart now holds the common start string. You can do the same going backwards from the end to get the common end string.
Another solution you can use is Regular Expressions.
Regex re = new Regex("^bc3.*?dsc$");
String first = "bc3231dsc";
if(re.IsMatch(first)) {
//Act accordingly...
}
This gives you more flexibility when matching. The pattern above matches any string that starts with bc3 and ends with dsc, with anything in between except a newline. By changing .*? to \d+, you could require that only digits appear between the two parts, and passing RegexOptions.IgnoreCase makes the match case-insensitive. From there, the possibilities are endless.
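If the differing middle part is what you are after, a named capture group can pull it out. The following is only a sketch, assuming the fixed "bc3" prefix and "dsc" suffix from the question and digits in between; CenterExtractor and GetCenter are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class CenterExtractor
{
    // Assumes the fixed "bc3" prefix and "dsc" suffix from the question,
    // with one or more digits in between.
    private static readonly Regex Pattern =
        new Regex(@"^bc3(?<center>\d+)dsc$", RegexOptions.IgnoreCase);

    // Returns the digits between prefix and suffix, or null if the input does not match.
    public static string GetCenter(string input)
    {
        Match m = Pattern.Match(input);
        return m.Success ? m.Groups["center"].Value : null;
    }
}
```

Calling GetCenter on each of the two strings gives the two center values. This only works when the prefix and suffix are known in advance, which the regex approach requires anyway.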
using System;
using System.Text.RegularExpressions;
using System.Collections.Generic;
class Sample {
static public void Main(){
string s1 = "bc3231dsc";
string s2 = "bc3462dsc";
List<string> common_str = commonStrings(s1,s2);
foreach ( var s in common_str)
Console.WriteLine(s);
}
static public List<string> commonStrings(string s1, string s2){
int len = s1.Length;
char [] match_chars = new char[len];
for(var i = 0; i < len ; ++i)
match_chars[i] = (Char.ToLower(s1[i])==Char.ToLower(s2[i]))? '#' : '_';
string pat = new String(match_chars);
Regex regex = new Regex("(#+)", RegexOptions.Compiled);
List<string> result = new List<string>();
foreach (Match match in regex.Matches(pat))
result.Add(s1.Substring(match.Index, match.Length));
return result;
}
}
For the updated conditions:
using System;
class Sample {
static public void Main(){
string s1 = "bc3231dsc";
string s2 = "bc3462dsc";
int len = 9;//s1.Length;//cond.1)
int l_pos = 0;
int r_pos = len;
for(int i=0;i<len && Char.ToLower(s1[i])==Char.ToLower(s2[i]);++i){
++l_pos;
}
for(int i=len-1;i>=l_pos && Char.ToLower(s1[i])==Char.ToLower(s2[i]);--i){
--r_pos;
}
string leftMatch = s1.Substring(0,l_pos);
string center1 = s1.Substring(l_pos, r_pos - l_pos);
string center2 = s2.Substring(l_pos, r_pos - l_pos);
string rightMatch = s1.Substring(r_pos);
Console.Write(
"leftMatch = \"{0}\"\n" +
"center1 = \"{1}\"\n" +
"center2 = \"{2}\"\n" +
"rightMatch = \"{3}\"\n",leftMatch, center1, center2, rightMatch);
}
}
