Finding the First Common Substring of a set of strings - c#

I am looking for an implementation of a First Common Substring. Given these strings:
Mike is not your average guy. I think you are great.
Jim is not your friend. I think you are great.
Being different is not your fault. I think you are great.
Using a Longest Common Substring implementation (and ignoring punctuation), you would get "I think you are great", but I am looking for the first occurring common substring, which in this example is:
is not your
Perhaps there is an implementation that generates an ordered list of all common substrings, from which I can just take the first.
Edit:
The tokens being compared would be complete words. I am looking for a greedy match of the first longest sequence of whole words. (Assuming a suffix tree were used in the approach, each node of the tree would be a word.)

There are quite a few steps to do this:
1. Remove punctuation.
2. Break each sentence down into a list of words.
3. Create a string for every combination of contiguous words (min: 1, max: wordCount).
4. Join the three lists into a new list of strings (sub-sentences).
5. Sort accordingly.
Code:
static void Main(string[] args)
{
var sentence1 = "Mike is not your average guy. I think you are great.";
var sentence2 = "Jim is not your friend. I think you are great.";
var sentence3 = "Being different is not your fault. I think you are great.";
//remove all punctuation
// http://stackoverflow.com/questions/421616
sentence1 = new string(
sentence1.Where(c => !char.IsPunctuation(c)).ToArray());
sentence2 = new string(
sentence2.Where(c => !char.IsPunctuation(c)).ToArray());
sentence3 = new string(
sentence3.Where(c => !char.IsPunctuation(c)).ToArray());
// separate into words
var words1 = sentence1.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries).ToList();
var words2 = sentence2.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries).ToList();
var words3 = sentence3.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries).ToList();
//create substring list
var subSentence1 = CreateSubstrings(words1);
var subSentence2 = CreateSubstrings(words2);
var subSentence3 = CreateSubstrings(words3);
// join them like a SQL table
var subSentences = subSentence1
.Join(subSentence2,
sub1 => sub1.Value,
sub2 => sub2.Value,
(sub1, sub2) => new { Sub1 = sub1,
Sub2 = sub2 })
.Join(subSentence3,
sub1 => sub1.Sub1.Value,
sub2 => sub2.Value,
(sub1, sub2) => new { Sub1 = sub1.Sub1,
Sub2 = sub1.Sub2,
Sub3 = sub2 })
;
//Sorted by Lowest Index, then by Maximum Words
subSentences = subSentences.OrderBy(s => s.Sub1.Rank)
.ThenByDescending(s => s.Sub1.Length)
.ToList();
//Sort by Maximum Words, then Lowest Index
/*subSentences = subSentences.OrderByDescending(s => s.Sub1.Length)
.ThenBy(s => s.Sub1.Rank)
.ToList();//*/
foreach (var subSentence in subSentences)
{
Console.WriteLine(subSentence.Sub1.Length.ToString() + " "
+ subSentence.Sub1.Value);
Console.WriteLine(subSentence.Sub2.Length.ToString() + " "
+ subSentence.Sub2.Value);
Console.WriteLine(subSentence.Sub3.Length.ToString() + " "
+ subSentence.Sub3.Value);
Console.WriteLine("======================================");
}
Console.ReadKey();
}
//this could probably be done better -Erik
internal static List<SubSentence> CreateSubstrings(List<string> words)
{
var result = new List<SubSentence>();
for (int wordIndex = 0; wordIndex < words.Count; wordIndex++)
{
var sentence = new StringBuilder();
int currentWord = wordIndex;
while (currentWord < words.Count - 1)
{
sentence.Append(words.ElementAt(currentWord));
result.Add(new SubSentence() { Rank = wordIndex,
Value = sentence.ToString(),
Length = currentWord - wordIndex + 1 });
sentence.Append(' ');
currentWord++;
}
sentence.Append(words.Last());
result.Add(new SubSentence() { Rank = wordIndex,
Value = sentence.ToString(),
Length = words.Count - wordIndex });
}
return result;
}
internal class SubSentence
{
public int Rank { get; set; }
public string Value { get; set; }
public int Length { get; set; }
}
Result:
3 is not your
3 is not your
3 is not your
======================================
2 is not
2 is not
2 is not
======================================
1 is
1 is
1 is
======================================
2 not your
2 not your
2 not your
======================================
1 not
1 not
1 not
======================================
1 your
1 your
1 your
======================================
5 I think you are great
5 I think you are great
5 I think you are great
======================================
4 I think you are
4 I think you are
4 I think you are
======================================
3 I think you
3 I think you
3 I think you
======================================
2 I think
2 I think
2 I think
======================================
1 I
1 I
1 I
======================================
4 think you are great
4 think you are great
4 think you are great
======================================
3 think you are
3 think you are
3 think you are
======================================
2 think you
2 think you
2 think you
======================================
1 think
1 think
1 think
======================================
3 you are great
3 you are great
3 you are great
======================================
2 you are
2 you are
2 you are
======================================
1 you
1 you
1 you
======================================
2 are great
2 are great
2 are great
======================================
1 are
1 are
1 are
======================================
1 great
1 great
1 great
======================================
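
The same idea can be sketched for any number of sentences without building every sub-sentence of every input. This is a minimal sketch, not the code from the answer above: the names `FirstCommonPhrase`, `Tokenize`, `ContainsRun` and `Find` are hypothetical, and "first" is assumed to mean the earliest start position in the first sentence, preferring the longest run at that position.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstCommonPhrase
{
    // Strip punctuation and split on spaces (the first two steps above).
    public static string[] Tokenize(string sentence) =>
        new string(sentence.Where(c => !char.IsPunctuation(c)).ToArray())
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

    // True if source[start .. start + length) occurs in words as a contiguous run.
    private static bool ContainsRun(string[] words, string[] source, int start, int length)
    {
        for (int i = 0; i + length <= words.Length; i++)
        {
            int j = 0;
            while (j < length && words[i + j] == source[start + j]) j++;
            if (j == length) return true;
        }
        return false;
    }

    // Try every start position in the first sentence, longest run first,
    // and return the first run that every other sentence also contains.
    public static string Find(IEnumerable<string> sentences)
    {
        var tokenized = sentences.Select(Tokenize).ToList();
        if (tokenized.Count == 0) return "";
        string[] first = tokenized[0];
        for (int start = 0; start < first.Length; start++)
            for (int length = first.Length - start; length >= 1; length--)
                if (tokenized.Skip(1).All(w => ContainsRun(w, first, start, length)))
                    return string.Join(" ", first, start, length);
        return "";
    }
}
```

Calling `FirstCommonPhrase.Find` with the three example sentences returns "is not your". The comparison is case-sensitive and the search is brute force, which should be fine for sentence-sized inputs but is not tuned for large texts.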

Here's a little something that will do what you want. In practice you would pre-build your list of strings and pass it in, and the method will find the phrase for you. In this example, the shortest string in the list is used as the baseline for building the phrase.
public void SomeOtherFunc()
{
List<string> MyTest = new List<string>();
MyTest.Add( "Mike is not your average guy. I think you are great." );
MyTest.Add( "Jim is not your friend. I think you are great." );
MyTest.Add( "Being different is not your fault. I think you are great." );
string thePhrase = testPhrase( MyTest );
MessageBox.Show( thePhrase );
}
public string testPhrase(List<string> test)
{
// start with the first string and find the shortest.
// if we can't find a short string in a long, we'll never find a long string in short
// Ex "To testing a string that is longer than some other string"
// vs "Im testing a string that is short"
// Work with the shortest string.
string shortest = test[0];
string lastGoodPhrase = "";
string curTest;
int firstMatch = -1; // -1 = no common word found yet (0 is a valid word index)
int lastMatch = 0;
int allFound;
foreach (string s in test)
if (s.Length < shortest.Length)
shortest = s;
// Now, we need to break the shortest string into each "word"
string[] words = shortest.Split( ' ' );
// Now, start with the first word until it is found in ALL phrases
for (int i = 0; i < words.Length; i++)
{
// to prevent finding "this" vs "is"
lastGoodPhrase = " " + words[i] + " ";
allFound = 0;
foreach (string s in test)
{
// always force leading space for string
if ((" "+s).Contains(lastGoodPhrase))
allFound++;
else
// if not found in ANY string, it's not found in all, get out
break;
}
if (allFound == test.Count)
{
// we've identified the first matched field, get out for next phase test
firstMatch = i;
// also set the last common word to the same until we can test next...
lastMatch = i;
break;
}
}
// if no match, get out
if (firstMatch < 0)
return "";
// we DO have at least a first match, now keep looking into each subsequent
// word UNTIL we no longer have a match.
for( int i = 1; i < words.Length - firstMatch; i++ )
{
// From where the first entry was, build out the ENTIRE PHRASE
// until the end of the original string of words and keep building 1 word back
curTest = " ";
for (int j = firstMatch; j <= firstMatch + i; j++)
curTest += words[j] + " ";
// see if all this is found in ALL strings
foreach (string s in test)
// we know we STARTED with a valid found phrase.
// as soon as a string NO LONGER MATCHES the new phrase,
// return the last VALID phrase
if (!(" " + s).Contains(curTest))
return lastGoodPhrase;
// if this is still a good phrase, set IT as the newest
lastGoodPhrase = curTest;
}
return lastGoodPhrase;
}

Related

Get count of unique characters between first and last letter

I'm trying to get the unique characters count that are between the first and last letter of a word. For example: if I type Yellow the expected output is Y3w, if I type People the output should be P4e and if I type Money the output should be M3y. This is what I tried:
//var strArr = wordToConvert.Split(' ');
string[] strArr = new[] { "Money","Yellow", "People" };
List<string> newsentence = new List<string>();
foreach (string word in strArr)
{
if (word.Length > 2)
{
//ignore 2-letter words
string newword = null;
int distinctCount = 0;
int k = word.Length;
int samecharcount = 0;
int count = 0;
for (int i = 1; i < k - 2; i++)
{
if (word.ElementAt(i) != word.ElementAt(i + 1))
{
count++;
}
else
{
samecharcount++;
}
}
distinctCount = count + samecharcount;
char frst = word[0];
char last = word[word.Length - 1];
newword = String.Concat(frst, distinctCount.ToString(), last);
newsentence.Add(newword);
}
else
{
newsentence.Add(word);
}
}
var result = String.Join(" ", newsentence.ToArray());
Console.WriteLine("Output: " + result);
Console.WriteLine("----------------------------------------------------");
With this code I'm getting the expected output for Yellow, but it does not seem to work for People and Money. What can I do to fix this issue? I'm also wondering whether there is a better way to do this, for example using LINQ or Regex.

Here's an implementation that uses Linq:
string[] strArr = new[]{"Money", "Yellow", "People"};
List<string> newsentence = new List<string>();
foreach (string word in strArr)
{
if (word.Length > 2)
{
// we want the first letter, the last letter, and the distinct count of everything in between
var first = word.First();
var last = word.Last();
var others = word.Skip(1).Take(word.Length - 2);
// Case sensitive
var distinct = others.Distinct();
// Case insensitive
// var distinct = others.Select(c => char.ToLowerInvariant(c)).Distinct();
string newword = first + distinct.Count().ToString() + last;
newsentence.Add(newword);
}
else
{
newsentence.Add(word);
}
}
var result = String.Join(" ", newsentence.ToArray());
Console.WriteLine(result);
Output:
M3y Y3w P4e
Note that this comparison is case-sensitive, so the count for FiIsSh is 4 (output F4h).

Maybe not the most performant, but here is another example using linq:
var words = new[] { "Money","Yellow", "People" };
var transformedWords = words.Select(Transform);
var sentence = String.Join(' ', transformedWords);
public string Transform(string input)
{
if (input.Length < 3)
{
return input;
}
var count = input.Skip(1).SkipLast(1).Distinct().Count();
return $"{input[0]}{count}{input[^1]}";
}

You can implement it with the help of Linq. e.g. (C# 8+)
private static string EncodeWord(string value) => value.Length <= 2
? value
: $"{value[0]}{value.Substring(1, value.Length - 2).Distinct().Count()}{value[^1]}";
Demo:
string[] tests = new string[] {
"Money","Yellow", "People"
};
var report = string.Join(Environment.NewLine, tests
.Select(test => $"{test} :: {EncodeWord(test)}"));
Console.Write(report);
Outcome:
Money :: M3y
Yellow :: Y3w
People :: P4e

A lot of people have put up some good solutions. I have two solutions for you: one uses LINQ and the other does not.
LINQ, Probably not much different from others
if (str.Length < 3) return str;
var midStr = str.Substring(1, str.Length - 2);
var midCount = midStr.Distinct().Count();
return string.Concat(str[0], midCount, str[str.Length - 1]);
Non-LINQ
if (str.Length < 3) return str;
var uniqueLetters = new Dictionary<char, int>();
var midStr = str.Substring(1, str.Length - 2);
foreach (var c in midStr)
{
if (!uniqueLetters.ContainsKey(c))
{
uniqueLetters.Add(c, 0);
}
}
var midCount = uniqueLetters.Keys.Count();
return string.Concat(str[0], midCount, str[str.Length - 1]);
I tested this with the following 6 strings:
Yellow
Money
Purple
Me
You
Hiiiiiiiii
Output:
LINQ: Y3w, Non-LINQ: Y3w
LINQ: M3y, Non-LINQ: M3y
LINQ: P4e, Non-LINQ: P4e
LINQ: Me, Non-LINQ: Me
LINQ: Y1u, Non-LINQ: Y1u
LINQ: H1i, Non-LINQ: H1i
Performance-wise I'd guess they're pretty much the same, if not identical, but I haven't run any real perf test on the two approaches. The only real difference is that the second route expands Distinct() into what it probably does under the covers anyway (I haven't looked at the source to see if that's true, but that's a pretty common way to get a count of unique items). And the first route is certainly less code.
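
If only the set of seen characters matters, the non-LINQ route can also be written with a HashSet<char> instead of a Dictionary whose values are never read. This is a minimal sketch; the `WordEncoder` class and `Encode` method names are made up for illustration, and whether Distinct() works the same way internally remains an assumption here too.

```csharp
using System;
using System.Collections.Generic;

public static class WordEncoder
{
    // First letter + number of distinct middle characters + last letter.
    public static string Encode(string str)
    {
        if (str.Length < 3) return str;

        // HashSet.Add ignores duplicates, so Count is the number of distinct characters.
        var seen = new HashSet<char>();
        for (int i = 1; i < str.Length - 1; i++)
            seen.Add(str[i]);

        return $"{str[0]}{seen.Count}{str[str.Length - 1]}";
    }
}
```

For example, `WordEncoder.Encode("Yellow")` gives "Y3w" and `WordEncoder.Encode("Hiiiiiiiii")` gives "H1i", matching the output listed above.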

I would use LINQ for that purpose:
string[] words = new string[] { "Yellow", "People", "Money", "Sh" }; // "Sh" covers two-letter words (or you can print the 0 and remove the ternary operator)
foreach (string word in words)
{
    int uniqueCharsInBetween = word.Substring(1, word.Length - 2).ToCharArray().Distinct().Count();
    string result = word[0] + (uniqueCharsInBetween == 0 ? string.Empty : uniqueCharsInBetween.ToString()) + word[word.Length - 1];
    Console.WriteLine(result);
}

separate string with characters as number and ) in C#

I don't have much experience in C#. I am getting a string from the DB like this:
string strType = "1) Step to start workorder 1 2)step 2 continue 3)issue of workorder4)create workorder by name"; // The string is not fixed; any number of steps can be included.
I want to separate the string above like this:
1)step to start workorder
2)step 2 continue
3)issue of workorder
4)create workorder by name (and so on...)
I tried the following, but it is static: if I get more steps it will fail. The solution is also not good.
string[] stringSeparators = new string[] { "1)", "2)", "3)", "4)" };
string[] strNames = strType.Split(stringSeparators, StringSplitOptions.None);
foreach (string strName in strNames)
    Console.WriteLine(strName);
How can I separate the string based on a number followed by a ) character? I am looking for the best solution that works for any such string.

Try the below code -
var pat = @"\d+[\)]";
var str= "1) Step to start workorder 1 2)step 2 continue 3)issue of workorder40)create workorder by name";
var rgx = new Regex(pat);
var output = new List<string>();
var matches = rgx.Matches(str);
for(int i=0;i<matches.Count-1;i++)
{
output.Add(str.Substring(matches[i].Index, matches[i+1].Index- matches[i].Index));
Console.WriteLine(str.Substring(matches[i].Index, matches[i + 1].Index - matches[i].Index));
}
output.Add(str.Substring(matches[matches.Count - 1].Index));
Console.WriteLine(str.Substring(matches[matches.Count - 1].Index));

A straightforward approach is to split this string using a regular expression, and then work with the matched substrings:
string strType = "1) Step to start workorder 1 2)step 2 continue 3)issue of workorder4)create workorder by name";
var matches = Regex.Matches(strType, @"\d+\).*?(?=\d\)|$)");
foreach(Match match in matches)
Console.WriteLine(match.Value);
This will print
1) Step to start workorder 1
2)step 2 continue
3)issue of workorder
4)create workorder by name
The regular expression works as follows:
\d+\): Match "n)", where n is any decimal number
.*?: Match all characters until...
(?=\d\)|$): either the next "n)" follows, or the input string end is reached (this is called a lookahead)
If you want to cleanly replace the numbering by one with a more consistent formatting, you might use
string strType = "1) Step to start workorder 1 2)step 2 continue 3)issue of workorder4)create workorder by name";
int ctr = 0;
var matches = Regex.Matches(strType, @"\d+\)\s*(.*?)(?=\d\)|$)");
foreach(Match match in matches)
if(match.Groups.Count > 0)
Console.WriteLine($"{++ctr}) {match.Groups[1]}");
...which outputs
1) Step to start workorder 1
2) step 2 continue
3) issue of workorder
4) create workorder by name
The regular expression works similarly to first approach:
\d+\)\s*: Match "n)" and any following whitespace (to address inconsistent spacing)
(.*?): Match all characters and use this as match group #1
(?=\d\)|$): Lookahead, same as above
Note that only the match group #1 is printed, so the "n)" and the whitespace are omitted.
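
One caveat: the lookahead (?=\d\)|$) checks for a single digit before the ")", so with two-digit numbering such as "10)" the preceding step would absorb the leading digit. A hedged variation that uses \d+ in the lookahead is sketched below. `StepParser` and `Parse` are hypothetical names, and the sketch assumes a step's text never ends in a digit that directly touches the next number.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StepParser
{
    // Returns the text of each step without its "n)" prefix.
    public static List<string> Parse(string input)
    {
        var steps = new List<string>();
        // \d+\)\s*      : the "n)" marker and any following whitespace
        // (.*?)         : the step text (group 1), as short as possible
        // \s*(?=\d+\)|$): stop before the next "n)" marker (any digit count) or at the end
        foreach (Match match in Regex.Matches(input, @"\d+\)\s*(.*?)\s*(?=\d+\)|$)"))
            steps.Add(match.Groups[1].Value);
        return steps;
    }
}
```

With the input "1) Step to start workorder 1 2)step 2 continue 3)issue of workorder10)create workorder by name", this yields four steps, the third being "issue of workorder" and the fourth "create workorder by name".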

Assuming the schema is:
"[{number})Text] [{number})Text] [{number})Text]..."
Here is a solution:
string strType = "1) Step to start workorder 1 2)step 2 continue 3)issue of workorder 4)create workorder by name";
var result = new List<string>();
int count = strType.Count(c => c == ')');
if ( count > 0 )
{
int posCurrent = strType.IndexOf(')');
int delta = posCurrent - 1;
if ( count == 1 && posCurrent > 0)
result.Add(strType.Trim());
else
{
posCurrent = strType.IndexOf(')', posCurrent + 1);
int posFirst = 0;
int posSplit = 0;
do
{
for ( posSplit = posCurrent - 1; posSplit >= 0; posSplit--)
if ( strType[posSplit] == ' ' )
break;
if ( posSplit != -1 && posSplit > posFirst)
{
result.Add(strType.Substring(posFirst, posSplit - posFirst - 1 - 1 + delta).Trim());
posFirst = posSplit + 1;
}
posCurrent = strType.IndexOf(')', posCurrent + 1);
}
while ( posCurrent != -1 && posFirst != -1 );
result.Add(strType.Substring(posFirst).Trim());
}
}
foreach (string item in result)
Console.WriteLine(item);
Console.ReadKey();

You may use Regular Expression to achieve it. Following is code for your reference:
using System.Text.RegularExpressions;
string expr = @"\d+\)";
string[] matches = Regex.Split(strType, expr);
foreach(string m in matches){
Console.WriteLine(m);
}
My system does not have Visual Studio, so please test it in yours. It should be working with minor tweaks.
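
Note that Regex.Split with the pattern \d+\) drops the "n)" markers from the output and, because the input starts with a marker, returns an empty first element. If the numbering should be kept, one option is to split on a zero-width lookahead instead. This is a sketch under that assumption; `StepSplitter` and `SplitSteps` are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StepSplitter
{
    // Splits just before each "n)" marker, so every piece keeps its own number.
    public static string[] SplitSteps(string input) =>
        // (?<!\d)  : do not split in the middle of a multi-digit number
        // (?=\d+\)): split where digits followed by ")" begin
        Regex.Split(input, @"(?<!\d)(?=\d+\))")
             .Select(piece => piece.Trim())
             .Where(piece => piece.Length > 0) // drop the empty piece before the first marker
             .ToArray();
}
```

For the string in the question this produces four pieces, from "1) Step to start workorder 1" to "4)create workorder by name".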

Parse a multiline email to var

I'm attempting to parse a multi-line email so I can get at the data which is on its own newline under the heading in the body of the email.
It looks like this:
EMAIL STARTING IN APRIL
Marketing ID Local Number
------------------- ----------------------
GR332230 0000232323
Dispatch Code Logic code
----------------- -------------------
GX3472 1
Destination ID Destination details
----------------- -------------------
3411144
It appears I am getting every line in a message box when I use StringReader.ReadLine, though all I want is the data under each ------ line, as shown above.
This is my code:
foreach (MailItem mail in publicFolder.Items)
{
if (mail != null)
{
if (mail is MailItem)
{
MessageBox.Show(mail.Body, "MailItem body");
// Creates new StringReader instance from System.IO
using (StringReader reader = new StringReader(mail.Body))
{
string line;
while ((line = reader.ReadLine()) !=null)
//Loop over the lines in the string.
if (mail.Body.Contains("Marketing ID"))
{
// var localno = mail.Body.Substring(247,15);//not correct approach
// MessageBox.Show(localrefno);
//MessageBox.Show("found");
//var conexid = mail.Body.Replace(Environment.NewLine);
var regex = new Regex("<br/>", RegexOptions.Singleline);
MessageBox.Show(line.ToString());
}
}
//var stringBuilder = new StringBuilder();
//foreach (var s in mail.Body.Split(' '))
//{
// stringBuilder.Append(s).AppendLine();
//}
//MessageBox.Show(stringBuilder.ToString());
}
else
{
MessageBox.Show("Nothing found for MailItem");
}
}
}
You can see I have made numerous attempts, even using substring positions and regex. Please help me get the data from each line under the ---.

It is not a very good idea to do this with Regex: it is easy to forget edge cases, hard to understand, and hard to debug. It is also quite easy to end up with a Regex that hangs your CPU and times out. (I cannot comment on other answers yet, so please check at least my other two cases before you pick your final solution.)
In your case, the following Regex solution works for the example you provided. However, there is an additional limitation: you need to make sure there are no empty values in any column other than the first or last. In other words, if there are more than two columns and one in the middle is empty, the names and values of that line will be mismatched.
Unfortunately, I cannot give you a non-Regex solution because I don't know the spec. For example: Will there be empty spaces? Will there be TABs? Does each field have a fixed number of characters, or is the width flexible? If it is flexible and values can be empty, what rules determine which columns are empty? I assume it is quite possible that the columns are defined by the length of the column name and use only spaces as delimiters. If that's the case, there are two ways to solve it: a two-pass Regex, or writing your own parser. If all the fields have a fixed length, it would be even easier: just use Substring to cut the lines and then trim the pieces.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
public class Program
{
public class Record{
public string Name {get;set;}
public string Value {get;set;}
}
public static void Main()
{
var regex = new Regex(@"(?<name>((?!-)[\w]+[ ]?)*)(?>(?>[ \t]+)?(?<name>((?!-)[\w]+[ ]?)+)?)+(?:\r\n|\r|\n)(?>(?<splitters>(-+))(?>[ \t]+)?)+(?:\r\n|\r|\n)(?<value>((?!-)[\w]+[ ]?)*)(?>(?>[ \t]+)?(?<value>((?!-)[\w]+[ ]?)+)?)+", RegexOptions.Compiled);
var testingValue =
@"EMAIL STARTING IN APRIL
Marketing ID Local Number
------------------- ----------------------
GR332230 0000232323
Dispatch Code Logic code
----------------- -------------------
GX3472 1
Destination ID Destination details
----------------- -------------------
3411144";
var matches = regex.Matches(testingValue);
var rows = (
from match in matches.OfType<Match>()
let row = (
from grp in match.Groups.OfType<Group>()
select new {grp.Name, Captures = grp.Captures.OfType<Capture>().ToList()}
).ToDictionary(item=>item.Name, item=>item.Captures.OfType<Capture>().ToList())
let names = row.ContainsKey("name")? row["name"] : null
let splitters = row.ContainsKey("splitters")? row["splitters"] : null
let values = row.ContainsKey("value")? row["value"] : null
where names != null && splitters != null &&
names.Count == splitters.Count &&
(values==null || values.Count <= splitters.Count)
select new {Names = names, Values = values}
);
var records = new List<Record>();
foreach(var row in rows)
{
for(int i=0; i< row.Names.Count; i++)
{
records.Add(new Record{Name=row.Names[i].Value, Value=i < row.Values.Count ? row.Values[i].Value : ""});
}
}
foreach(var record in records)
{
Console.WriteLine(record.Name + " = " + record.Value);
}
}
}
output:
Marketing ID = GR332230
Local Number = 0000232323
Dispatch Code = GX3472
Logic code = 1
Destination ID = 3411144
Destination details =
Please note that this also works for this kind of message:
EMAIL STARTING IN APRIL
Marketing ID Local Number
------------------- ----------------------
GR332230 0000232323
Dispatch Code Logic code
----------------- -------------------
GX3472 1
Destination ID Destination details
----------------- -------------------
3411144
output:
Marketing ID = GR332230
Local Number = 0000232323
Dispatch Code = GX3472
Logic code = 1
Destination ID =
Destination details = 3411144
Or this:
EMAIL STARTING IN APRIL
Marketing ID Local Number
------------------- ----------------------
Dispatch Code Logic code
----------------- -------------------
GX3472 1
Destination ID Destination details
----------------- -------------------
3411144
output:
Marketing ID =
Local Number =
Dispatch Code = GX3472
Logic code = 1
Destination ID =
Destination details = 3411144
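
The fixed-width idea mentioned in this answer (cut each line by column position, then trim) can be sketched as follows. This is a minimal illustration rather than a tested parser: it assumes each run of dashes marks a column's start and width, that headers and values are aligned under the dashes (so the body must keep its original spacing), and that the line right after the dashes holds the values. A value wider than its dash run would be truncated. `DashTableParser`, `Parse` and `Slice` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DashTableParser
{
    // Cut a fixed-width slice out of a line, tolerating lines shorter than the column.
    private static string Slice(string line, int start, int length)
    {
        if (start >= line.Length) return "";
        return line.Substring(start, Math.Min(length, line.Length - start));
    }

    public static Dictionary<string, string> Parse(string body)
    {
        var fields = new Dictionary<string, string>();
        string[] lines = Regex.Split(body, @"\r?\n|\r");
        for (int i = 1; i < lines.Length; i++)
        {
            // A dash line marks a table: header above it, values below it.
            if (!lines[i].StartsWith("-", StringComparison.Ordinal)) continue;
            string header = lines[i - 1];
            string values = i + 1 < lines.Length ? lines[i + 1] : "";

            // Each run of dashes gives one column's start position and width.
            foreach (Match column in Regex.Matches(lines[i], "-+"))
            {
                string name = Slice(header, column.Index, column.Length).Trim();
                if (name.Length > 0)
                    fields[name] = Slice(values, column.Index, column.Length).Trim();
            }
        }
        return fields;
    }
}
```

Because columns are located by position rather than by counting tokens, an empty first column (as in the "Destination ID" row above) stays empty instead of shifting the next value to the left.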

var dict = new Dictionary<string, string>();
try
{
var lines = email.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
int starts = 0, end = 0, length = 0;
while (!lines[starts + 1].StartsWith("-")) starts++;
for (int i = starts + 1; i < lines.Length; i += 3)
{
var mc = Regex.Matches(lines[i], @"(?:^| )-");
foreach (Match m in mc)
{
int start = m.Value.StartsWith(" ") ? m.Index + 1 : m.Index;
end = start;
while (lines[i][end++] == '-' && end < lines[i].Length - 1) ;
length = Math.Min(end - start, lines[i - 1].Length - start);
string key = length > 0 ? lines[i - 1].Substring(start, length).Trim() : "";
end = start;
while (lines[i][end++] == '-' && end < lines[i].Length) ;
length = Math.Min(end - start, lines[i + 1].Length - start);
string value = length > 0 ? lines[i + 1].Substring(start, length).Trim() : "";
dict.Add(key, value);
}
}
}
catch (Exception ex)
{
throw new Exception("Email is not in correct format");
}
Using Regular Expressions:
var dict = new Dictionary<string, string>();
try
{
var lines = email.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
int starts = 0;
while (!lines[starts + 1].StartsWith("-")) starts++;
for (int i = starts + 1; i < lines.Length; i += 3)
{
var keys = Regex.Matches(lines[i - 1], @"(?:^| )(\w+\s?)+");
var values = Regex.Matches(lines[i + 1], @"(?:^| )(\w+\s?)+");
if (keys.Count == values.Count)
for (int j = 0; j < keys.Count; j++)
dict.Add(keys[j].Value.Trim(), values[j].Value.Trim());
else // remove bug if value of first key in a line has no value
{
if (lines[i + 1].StartsWith(" "))
{
dict.Add(keys[0].Value.Trim(), "");
dict.Add(keys[1].Value.Trim(), values[0].Value.Trim());
}
else
{
dict.Add(keys[0].Value, values[0].Value.Trim());
dict.Add(keys[1].Value.Trim(), "");
}
}
}
}
catch (Exception ex)
{
throw new Exception("Email is not in correct format");
}

Here is my attempt. I don't know if the email format can change (rows, columns, etc).
I can't think of an easy way to separate the columns besides checking for a double space (my solution).
class Program
{
static void Main(string[] args)
{
var emailBody = GetEmail();
using (var reader = new StringReader(emailBody))
{
var lines = new List<string>();
const int startingRow = 2; // Starting line to read from (start at Marketing ID line)
const int sectionItems = 4; // Header row (ex. Marketing ID & Local Number Line) + Dash Row + Value Row + New Line
// Add all lines to a list
string line = "";
while ((line = reader.ReadLine()) != null)
{
lines.Add(line.Trim()); // Add each line to the list and remove any leading or trailing spaces
}
for (var i = startingRow; i < lines.Count; i += sectionItems)
{
var currentLine = lines[i];
var indexToBeginSeparatingColumns = currentLine.IndexOf(" "); // The first time we see double spaces, we will use as the column delimiter, not the best solution but should work
var header1 = currentLine.Substring(0, indexToBeginSeparatingColumns);
var header2 = currentLine.Substring(indexToBeginSeparatingColumns, currentLine.Length - indexToBeginSeparatingColumns).Trim();
currentLine = lines[i+2]; //Skip dash line
indexToBeginSeparatingColumns = currentLine.IndexOf(" ");
string value1 = "", value2 = "";
if (indexToBeginSeparatingColumns == -1) // Use case of there being no value in the 2nd column, could be better
{
value1 = currentLine.Trim();
}
else
{
value1 = currentLine.Substring(0, indexToBeginSeparatingColumns);
value2 = currentLine.Substring(indexToBeginSeparatingColumns, currentLine.Length - indexToBeginSeparatingColumns).Trim();
}
Console.WriteLine(string.Format("{0},{1},{2},{3}", header1, value1, header2, value2));
}
}
}
static string GetEmail()
{
return @"EMAIL STARTING IN APRIL
Marketing ID Local Number
------------------- ----------------------
GR332230 0000232323
Dispatch Code Logic code
----------------- -------------------
GX3472 1
Destination ID Destination details
----------------- -------------------
3411144";
}
}
Output looks something like this:
Marketing ID,GR332230,Local Number,0000232323
Dispatch Code,GX3472,Logic code,1
Destination ID,3411144,Destination details,

Here is an approach, assuming you don't need the headers and that every field is mandatory and arrives in order.
This won't work for data that contains spaces or for optional fields.
foreach (MailItem mail in publicFolder.Items)
{
    MessageBox.Show(mail.Body, "MailItem body");
    // Split by line, remove dash lines.
    var data = Regex.Split(mail.Body, @"\r?\n|\r")
        .Where(l => !l.StartsWith('-'))
        .ToList();
    // Remove headers: walk backwards from the second-to-last line, two lines at a time.
    for (var i = data.Count - 2; i >= 0; i -= 2)
    {
        data.RemoveAt(i);
    }
    // now data contains only the info you want in the order it was presented.
    // Assuming info doesn't have spaces.
    var result = data.SelectMany(d => d.Split(' '));
    // WARNING: Missing info will not be present.
    // {"GR332230", "0000232323", "GX3472", "1", "3411144"}
}

C# Find list string element that suffix of them is greater than others

I have a list of strings:
["a1","b0","c0","a2","c1","d3","a3"].
I want to get the list ["a3","d3","c1","b0"], based on their suffixes.
Example: for "a1","a2","a3" the result is "a3".
This question may be simple, but I can't solve it.
Thanks for any help!

The following LINQ statement does what you need.
var result= input.Select(x=> new {letter = x[0], number = x[1], item=x}) // Separate letter & number.
.GroupBy(x=>x.letter) // Group on letter and take first element (of max number)
.Select(x=> x.OrderByDescending(o=>o.number).First())
.OrderByDescending(x=>x.number) // Order on number.
.Select(x=>x.item) // get the item.
.ToArray();
Output
["a3","d3","c1","b0"]

Below is an alternative. It's quite long, mainly because I try to explain every line:
// create list based on your original text
var list = new List<string> { "a1", "b0", "c0", "a2", "c1", "d3", "a3" };
// use a dictionary to hold the prefix and max suffixes
var suffixMaxDictionary = new Dictionary<string, int>();
// loop through the list
for (int i = 0; i < list.Count; i++)
{
// get the prefix using Substring()
var prefix = list[i].Substring(0, 1);
// if the prefix already exist in the dictionary then skip it, it's already been processed
if (suffixMaxDictionary.ContainsKey(prefix))
continue; // continue to the next item
// set the max suffix to 0, so it can be checked against
var suffixMax = 0;
// loop through the whole list again to get the suffixes
for (int j = 0; j < list.Count; j++)
{
// get the current prefix in the second loop of the list
var thisprefix = list[j].Substring(0, 1);
// if the prefixes don't match, then skip it
// e.g. prefix = "a" and thisprefix = "b", then skip it
if (prefix != thisprefix)
continue;
// get the suffix
// warning though, it assumes 2 things:
// 1. that the second character is a number
// 2. there will only ever be numbers 0-9 as the second character
var thisSuffix = Convert.ToInt32(list[j].Substring(1, 1));
// check the current suffix number (thisSuffix) compared the suffixMax value
if (thisSuffix > suffixMax)
{
// if thisSuffix > suffixMax, set suffixMax to thisSuffix
// and it will now become the new max value
suffixMax = thisSuffix;
}
}
// add the prefix and the max suffix to the dictionary
suffixMaxDictionary.Add(prefix, suffixMax);
}
// print the values to the console
Console.WriteLine("original: \t" + string.Join(",", list));
Console.WriteLine("result: \t" + string.Join(",", suffixMaxDictionary));
Console.ReadLine();
See also https://dotnetfiddle.net/BmvFEp, thanks @Hari Prasad, I didn't know there was a fiddle for .net
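
The single-digit limitation noted in the comments above could be lifted by parsing everything after the first character as an integer. This is a minimal sketch under that assumption (a one-character prefix followed by a valid non-negative integer); `SuffixMax`, `Suffix` and `MaxPerPrefix` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SuffixMax
{
    // Numeric suffix = everything after the first character, e.g. "a12" -> 12.
    private static int Suffix(string item) => int.Parse(item.Substring(1));

    // Keep the item with the largest suffix per prefix, then sort by suffix, largest first.
    public static List<string> MaxPerPrefix(IEnumerable<string> items) =>
        items.GroupBy(item => item[0])
             .Select(group => group.OrderByDescending(item => Suffix(item)).First())
             .OrderByDescending(item => Suffix(item))
             .ToList();
}
```

For the list in the question this returns a3, d3, c1, b0. Items with equal suffixes keep the order in which their prefixes first appear, because LINQ's ordering is stable.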

This will give you the first item that has the largest "suffix" (last character) in the list, as described in the question:
string[] test = { "a3", "d3", "c1", "b0" };
char maxSuffix = test.Max(t => t.Last());
string testResult = test.FirstOrDefault(s => s.Last() == maxSuffix);
In this case the result is "a3".

comparing string and variable but failing based on contains

What I have going on is I have two files. Both files are delimited by '|'. If file 1 matches a line in file 2 I need to combine the lines. Here is the code:
string[] mathlines = File.ReadAllLines(@"C:\math.txt");
var addlines = File.ReadAllLines(@"K:\add.txt");
foreach (string ml in mathlines)
{
    string[] parse = ml.Split('|');
    if (addlines.Contains(parse[0]))
    {
        File.AppendAllText(@"C:\final.txt", parse[0]+"|"+parse[1]+"\n");
    }
    else
    {
        File.AppendAllText(@"C:\final.txt", ml + "\n");
    }
}
I realize that the math part isn't setup yet, but I need to get the match part working.
Here is an example:
mathlines:
dart|504.91
GI|1782.06
Gcel|194.52
clay|437.35
grado|217.77
greGCR|14.82
rp|372.54
rp2|11.92
gsg|349.92
GSxil|4520.55
addlines:
Gimet|13768994304
GSxil|394735896576
Ho|4994967296
gen|485331304448
GSctal|23482733690
Obr|88899345920
As you can see, mathlines contains GSxil and so does addlines, but my if (addlines.Contains(...)) check never finds the variable in addlines. Any help is always loved! Thanks.
Sorry, I forgot to mention that the comparison needs to match exactly. I also need to split out the value on the line that matches. So in this example I would need to split out the 394735896576 and then append it.

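addlines.Contains(parse[0]) compares parse[0] against each entire line; you need to match on just the key part. There are more efficient solutions, but an O(n^2) option is to use LINQ Any(). Including the delimiter in the prefix keeps the match exact, so "GS" does not match "GSxil|...":
if (addlines.Any(l => l.StartsWith(parse[0] + "|")))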
{
...

You could load all lines from addlines.txt into a dictionary and then use that to find a match for each line in mathlines.txt. This method would be much faster than what you have currently.
string[] mathlines = File.ReadAllLines(@"C:\math.txt");
string[] addlines = File.ReadAllLines(@"K:\addlines.txt");
string[] finallines = new string[mathlines.Length];
var addlinesLookup = new Dictionary<string, string>();
for (int i = 0; i < addlines.Length; i++)
{
    string[] parts = addlines[i].Split('|');
    if (parts.Length == 2) // Will there ever be more than 2 parts?
    {
        addlinesLookup.Add(parts[0], parts[1]);
    }
}
for (int i = 0; i < mathlines.Length; i++)
{
    string[] parts = mathlines[i].Split('|');
    if (parts.Length >= 1)
    {
        // AppendAllLines writes a line terminator after each entry, so no "\n" is added here.
        if (addlinesLookup.ContainsKey(parts[0]))
        {
            finallines[i] = mathlines[i] + "|" + addlinesLookup[parts[0]];
        }
        else
        {
            finallines[i] = mathlines[i];
        }
    }
}
File.AppendAllLines(@"C:\final.txt", finallines, Encoding.ASCII);
