Separate a string at numbers followed by ) in C#

I don't have much experience in C#. I am getting a string from the DB like
string strType = "1) Step to start workorder 1 2)step 2 continue 3)issue of workorder4)create workorder by name"; // The string is not fixed; it can contain any number of steps.
I want to split the string above into
1)step to start workorder
2)step 2 continue
3)issue of workorder
4)create workorder by name (and so on...)
I tried the following, but it is static: it fails as soon as there are more steps, and it is not a good solution anyway.
string[] stringSeparators = new string[] { "1)", "2)", "3)", "4)" };
string[] strNames = strType.Split(stringSeparators, StringSplitOptions.None );
foreach (string strName in strNames)
Console.WriteLine(strName);
How can I split a string on a number followed by a ) character, in a way that works for any such string?

Try the code below:
var pat = @"\d+\)";
var str= "1) Step to start workorder 1 2)step 2 continue 3)issue of workorder40)create workorder by name";
var rgx = new Regex(pat);
var output = new List<string>();
var matches = rgx.Matches(str);
for(int i=0;i<matches.Count-1;i++)
{
output.Add(str.Substring(matches[i].Index, matches[i+1].Index- matches[i].Index));
Console.WriteLine(str.Substring(matches[i].Index, matches[i + 1].Index - matches[i].Index));
}
output.Add(str.Substring(matches[matches.Count - 1].Index));
Console.WriteLine(str.Substring(matches[matches.Count - 1].Index));

A straightforward approach is to match the individual steps with a regular expression, and then work with the matched substrings:
string strType = "1) Step to start workorder 1 2)step 2 continue 3)issue of workorder4)create workorder by name";
var matches = Regex.Matches(strType, @"\d+\).*?(?=\d+\)|$)");
foreach(Match match in matches)
Console.WriteLine(match.Value);
This will print
1) Step to start workorder 1
2)step 2 continue
3)issue of workorder
4)create workorder by name
The regular expression works as follows:
\d+\): Match "n)", where n is any decimal number
.*?: Match all characters until...
(?=\d+\)|$): either the next "n)" follows, or the end of the input string is reached (this is called a lookahead; using \d+ rather than \d keeps multi-digit numbers such as "10)" in one piece)
If you want to replace the numbering with a more consistently formatted one, you might use
string strType = "1) Step to start workorder 1 2)step 2 continue 3)issue of workorder4)create workorder by name";
int ctr = 0;
var matches = Regex.Matches(strType, @"\d+\)\s*(.*?)(?=\d+\)|$)");
foreach(Match match in matches)
if(match.Groups.Count > 0)
Console.WriteLine($"{++ctr}) {match.Groups[1]}");
...which outputs
1) Step to start workorder 1
2) step 2 continue
3) issue of workorder
4) create workorder by name
The regular expression works similarly to the first approach:
\d+\)\s*: Match "n)" and any following whitespace (to address inconsistent spacing)
(.*?): Match all characters and use this as match group #1
(?=\d+\)|$): Lookahead, same as above
Note that only match group #1 is printed, so the "n)" and the whitespace are omitted.

Assuming the schema is:
"[{number})Text] [{number})Text] [{number})Text]..."
Here is a solution:
string strType = "1) Step to start workorder 1 2)step 2 continue 3)issue of workorder 4)create workorder by name";
var result = new List<string>();
int count = strType.Count(c => c == ')');
if ( count > 0 )
{
int posCurrent = strType.IndexOf(')');
int delta = posCurrent - 1;
if ( count == 1 && posCurrent > 0)
result.Add(strType.Trim());
else
{
posCurrent = strType.IndexOf(')', posCurrent + 1);
int posFirst = 0;
int posSplit = 0;
do
{
for ( posSplit = posCurrent - 1; posSplit >= 0; posSplit--)
if ( strType[posSplit] == ' ' )
break;
if ( posSplit != -1 && posSplit > posFirst)
{
result.Add(strType.Substring(posFirst, posSplit - posFirst - 1 - 1 + delta).Trim());
posFirst = posSplit + 1;
}
posCurrent = strType.IndexOf(')', posCurrent + 1);
}
while ( posCurrent != -1 && posFirst != -1 );
result.Add(strType.Substring(posFirst).Trim());
}
}
foreach (string item in result)
Console.WriteLine(item);
Console.ReadKey();

You can use a regular expression for this. The following code is for reference:
using System.Text.RegularExpressions;
string expr = @"\d+\)";
string[] matches = Regex.Split(strType, expr);
foreach(string m in matches){
Console.WriteLine(m);
}
My system does not have Visual Studio, so please test it in yours. It should be working with minor tweaks.
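Note that Regex.Split with the pattern above consumes the "n)" markers, so the numbers are lost and the first element is an empty string. A minimal sketch that keeps the markers, assuming every step starts with digits followed by ")": split on a zero-width lookahead instead. The names StepSplitter and SplitSteps are made up for illustration.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class StepSplitter
{
    // Splits immediately before every "n)" marker without consuming it.
    // (?<!\d) prevents a split in the middle of a multi-digit marker such as "40)".
    public static string[] SplitSteps(string input)
    {
        return Regex.Split(input, @"(?<!\d)(?=\d+\))")
                    .Select(part => part.Trim())
                    .Where(part => part.Length > 0) // drop the empty piece before the first marker
                    .ToArray();
    }

    public static void Main()
    {
        string strType = "1) Step to start workorder 1 2)step 2 continue 3)issue of workorder40)create workorder by name";
        foreach (string step in SplitSteps(strType))
            Console.WriteLine(step);
    }
}
```

Text such as "workorder40)" is inherently ambiguous; this sketch reads it as "workorder" followed by step "40)". Any digits directly followed by ")" inside a step's text would also be treated as a new step.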

Related

Can't properly rebuild a string with Replacement values from Dictionary

I am trying to build a file using a template. I am processing the file in a while loop, line by line. The first section of the file (the first 35 lines) is header information. The information is surrounded by # signs. Take this string for example:
Field InspectionStationID 3 {"PVA TePla #WSM#", "sw#data.tool_context.TOOL_SOFTWARE_VERSION#", "#data.context.TOOL_ENTITY#"}
The expected output should be:
Field InspectionStationID 3 {"PVA TePla", "sw0.2.002", "WSM102"}
This header section uses a different mapping than the rest of the file so I wanted to parse the file line by line from top to bottom and use a different logic for each section so that I don't waste time parsing the entire file at once multiple times for different sections.
The logic uses two dictionaries populated from an XML file. Because the file has multiple tables, I combined them in the two dictionaries like so:
var headerCdataIndexKeyVals = new Dictionary<string, int>(){
{"data.tool_context.TOOL_SOFTWARE_VERSION", 1},
{"data.context.TOOL_ENTITY",0}
};
var headerCdataArrayKeyVals = new Dictionary<string, List<string>>();
var tool_contextCdataList = new List<string>{"HM654", "sw0.2.002"};
var contextCdataList = new List<string>{"WSM102"};
headerCdataArrayKeyVals.Add("tool_context", tool_contextCdataList);
headerCdataArrayKeyVals.Add("context", contextCdataList);
This lets me map the values to their respective positions in the string in one go, without having to loop through multiple dictionaries.
I am using the following logic:
public static string FindSubsInDelimetersAndReturn(string str, char openDelimiter, char closeDelimiter, HeaderMapperData mapperData )
{
string newString = string.Empty;
// Stores the indices of
Stack <int> dels = new Stack <int>();
for (int i = 0; i < str.Length; i++)
{
var let = str[i];
// If opening delimeter
// is encountered
if (str[i] == openDelimiter && dels.Count == 0)
{
dels.Push(i);
}
// If closing delimeter
// is encountered
else if (str[i] == closeDelimiter && dels.Count > 0)
{
// Extract the position
// of opening delimeter
int pos = dels.Peek();
dels.Pop();
// Length of substring
int len = i - 1 - pos;
// Extract the substring
string headerSubstring = str.Substring(pos + 1, len);
bool hasKey = mapperData.HeaderCdataIndexKeyVals.TryGetValue(headerSubstring.ToUpper(), out int headerCdataIndex);
string[] headerSubstringSplit = headerSubstring.Split('.');
string headerCDataVal = string.Empty;
if (hasKey)
{
if (headerSubstring.Contains("CONTAINER.CONTEXT", StringComparison.OrdinalIgnoreCase))
{
headerCDataVal = mapperData.HeaderCdataArrayKeyVals[headerSubstringSplit[1].ToUpper() + '.' + headerSubstringSplit[2].ToUpper()][headerCdataIndex];
//mapperData.HeaderCdataArrayKeyVals[]
}
else
{
headerCDataVal = mapperData.HeaderCdataArrayKeyVals[headerSubstringSplit[1].ToUpper()][headerCdataIndex];
}
string strToReplace = openDelimiter + headerSubstring + closeDelimiter;
string sub = str.Remove(i + 1);
sub = sub.Replace(strToReplace, headerCDataVal);
newString += sub;
}
else if (headerSubstring == "WSM" && closeDelimiter == '#')
{
string sub = str.Remove(len + 1);
newString += sub.Replace(openDelimiter + headerSubstring + closeDelimiter, "");
}
else
{
newString += let;
}
}
}
return newString;
}
}
But my output turns out to be:
"\tFie\tField InspectionStationID 3 {\"PVA TePla#WSM#\", \"sw0.2.002\tField InspectionStationID 3 {\"PVA TePla#WSM#\", \"sw#data.tool_context.TOOL_SOFTWARE_VERSION#\", \"WSM102"
Can someone help understand why this is happening and how I can go about correcting it so I get the output:
Field InspectionStationID 3 {"PVA TePla", "sw0.2.002", "WSM102"}
Am I even trying to solve this the right way, or is there a better, cleaner way to do it? By the way, if the key is not in the dictionary, I replace it with an empty string.
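One possibly cleaner route, sketched under assumptions: since every placeholder has the form #key#, Regex.Replace with a callback can substitute each one in a single pass, without tracking indices by hand. The flat Dictionary<string, string> below is a stand-in for the two dictionaries in the question (it is not the asker's HeaderMapperData), and the sample value "0.2.002" is chosen so that the result matches the expected line. Unknown keys such as WSM are replaced by an empty string, together with any whitespace directly in front of them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeaderTemplate
{
    // Group 1: optional whitespace before the placeholder; group 2: the key between the # signs.
    private static readonly Regex Placeholder = new Regex(@"(\s*)#([^#]+)#");

    public static string Fill(string line, IDictionary<string, string> values)
    {
        return Placeholder.Replace(line, m =>
            values.TryGetValue(m.Groups[2].Value, out string value)
                ? m.Groups[1].Value + value // known key: keep the spacing, insert the value
                : string.Empty);            // unknown key: drop the placeholder and the spacing
    }

    public static void Main()
    {
        // Hypothetical flattened lookup table; keys compare case-insensitively.
        var values = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            { "data.tool_context.TOOL_SOFTWARE_VERSION", "0.2.002" },
            { "data.context.TOOL_ENTITY", "WSM102" }
        };
        string line = "Field InspectionStationID 3 {\"PVA TePla #WSM#\", \"sw#data.tool_context.TOOL_SOFTWARE_VERSION#\", \"#data.context.TOOL_ENTITY#\"}";
        Console.WriteLine(Fill(line, values));
    }
}
```

The pattern assumes # never appears in a header line except as a placeholder delimiter; a stray single # would pair up with the next one.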

Parse a multiline email to var

I'm attempting to parse a multi-line email so I can get at the data which is on its own newline under the heading in the body of the email.
It looks like this:
EMAIL STARTING IN APRIL
Marketing ID Local Number
------------------- ----------------------
GR332230 0000232323
Dispatch Code Logic code
----------------- -------------------
GX3472 1
Destination ID Destination details
----------------- -------------------
3411144
When I use StringReader.ReadLine, I appear to get everything in each message box, though all I want is the data under each ------ line, as shown.
This is my code:
foreach (MailItem mail in publicFolder.Items)
{
if (mail != null)
{
if (mail is MailItem)
{
MessageBox.Show(mail.Body, "MailItem body");
// Creates new StringReader instance from System.IO
using (StringReader reader = new StringReader(mail.Body))
{
string line;
while ((line = reader.ReadLine()) !=null)
//Loop over the lines in the string.
if (mail.Body.Contains("Marketing ID"))
{
// var localno = mail.Body.Substring(247,15);//not correct approach
// MessageBox.Show(localrefno);
//MessageBox.Show("found");
//var conexid = mail.Body.Replace(Environment.NewLine);
var regex = new Regex("<br/>", RegexOptions.Singleline);
MessageBox.Show(line.ToString());
}
}
//var stringBuilder = new StringBuilder();
//foreach (var s in mail.Body.Split(' '))
//{
// stringBuilder.Append(s).AppendLine();
//}
//MessageBox.Show(stringBuilder.ToString());
}
else
{
MessageBox.Show("Nothing found for MailItem");
}
}
}
You can see I made numerous attempts, even using substring positions and regex. Please help me get the data from each line under the ---.
Doing this with a regex is not a very good idea: edge cases are easy to miss, and the pattern is hard to understand and debug. It is also easy to end up with a regex that pegs the CPU and times out. (I cannot comment on other answers yet, so please check at least my other two cases below before you pick your final solution.)
For the example you provided, the following regex solution works, with some limitations: there must be no empty values in any column other than the first or the last. In other words, if there are more than two columns and one in the middle is empty, the names and values of that line will be mismatched.
Unfortunately, I cannot give you a non-regex solution because I don't know the spec. Will there be empty spaces? Will there be tabs? Does each field have a fixed number of characters, or is the width flexible? If it is flexible and values can be empty, what rules determine which columns are empty? I assume the columns are quite possibly defined by the length of the column name, with only spaces as delimiters. If so, there are two ways to solve it: a two-pass regex, or your own parser. If all fields have a fixed length, it is even easier: cut the lines with Substring and trim the pieces.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
public class Program
{
public class Record{
public string Name {get;set;}
public string Value {get;set;}
}
public static void Main()
{
var regex = new Regex(@"(?<name>((?!-)[\w]+[ ]?)*)(?>(?>[ \t]+)?(?<name>((?!-)[\w]+[ ]?)+)?)+(?:\r\n|\r|\n)(?>(?<splitters>(-+))(?>[ \t]+)?)+(?:\r\n|\r|\n)(?<value>((?!-)[\w]+[ ]?)*)(?>(?>[ \t]+)?(?<value>((?!-)[\w]+[ ]?)+)?)+", RegexOptions.Compiled);
var testingValue =
@"EMAIL STARTING IN APRIL
Marketing ID Local Number
------------------- ----------------------
GR332230 0000232323
Dispatch Code Logic code
----------------- -------------------
GX3472 1
Destination ID Destination details
----------------- -------------------
3411144";
var matches = regex.Matches(testingValue);
var rows = (
from match in matches.OfType<Match>()
let row = (
from grp in match.Groups.OfType<Group>()
select new {grp.Name, Captures = grp.Captures.OfType<Capture>().ToList()}
).ToDictionary(item=>item.Name, item=>item.Captures.OfType<Capture>().ToList())
let names = row.ContainsKey("name")? row["name"] : null
let splitters = row.ContainsKey("splitters")? row["splitters"] : null
let values = row.ContainsKey("value")? row["value"] : null
where names != null && splitters != null &&
names.Count == splitters.Count &&
(values==null || values.Count <= splitters.Count)
select new {Names = names, Values = values}
);
var records = new List<Record>();
foreach(var row in rows)
{
for(int i=0; i< row.Names.Count; i++)
{
records.Add(new Record{Name=row.Names[i].Value, Value=i < row.Values.Count ? row.Values[i].Value : ""});
}
}
foreach(var record in records)
{
Console.WriteLine(record.Name + " = " + record.Value);
}
}
}
output:
Marketing ID = GR332230
Local Number = 0000232323
Dispatch Code = GX3472
Logic code = 1
Destination ID = 3411144
Destination details =
Please note that this also works for this kind of message:
EMAIL STARTING IN APRIL
Marketing ID Local Number
------------------- ----------------------
GR332230 0000232323
Dispatch Code Logic code
----------------- -------------------
GX3472 1
Destination ID Destination details
----------------- -------------------
3411144
output:
Marketing ID = GR332230
Local Number = 0000232323
Dispatch Code = GX3472
Logic code = 1
Destination ID =
Destination details = 3411144
Or this:
EMAIL STARTING IN APRIL
Marketing ID Local Number
------------------- ----------------------
Dispatch Code Logic code
----------------- -------------------
GX3472 1
Destination ID Destination details
----------------- -------------------
3411144
output:
Marketing ID =
Local Number =
Dispatch Code = GX3472
Logic code = 1
Destination ID =
Destination details = 3411144
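The non-regex route this answer alludes to (cutting each line by column position) can be sketched as follows. It assumes the columns are padded with spaces so that every header and value starts under its dash run, which is a guess about the real email layout; ColumnParser and its members are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ColumnParser
{
    // Cuts line[start, end) and trims it; a line shorter than the column yields "".
    private static string Slice(string line, int start, int end)
    {
        if (start >= line.Length) return string.Empty;
        return line.Substring(start, Math.Min(end, line.Length) - start).Trim();
    }

    public static Dictionary<string, string> Parse(string body)
    {
        var result = new Dictionary<string, string>();
        string[] lines = body.Split(new[] { "\r\n", "\n", "\r" }, StringSplitOptions.None);
        for (int i = 1; i < lines.Length; i++)
        {
            if (!lines[i].StartsWith("-", StringComparison.Ordinal)) continue; // only dash rows define columns
            string header = lines[i - 1];
            // If the row after next is another dash row, the next row is a header, so the values are missing.
            bool nextIsHeader = i + 2 < lines.Length && lines[i + 2].StartsWith("-", StringComparison.Ordinal);
            string values = (i + 1 < lines.Length && !nextIsHeader) ? lines[i + 1] : string.Empty;
            MatchCollection runs = Regex.Matches(lines[i], "-+");
            for (int c = 0; c < runs.Count; c++)
            {
                int start = runs[c].Index; // a column spans from its dash run to the next one
                int end = c + 1 < runs.Count ? runs[c + 1].Index : int.MaxValue;
                result[Slice(header, start, end)] = Slice(values, start, end);
            }
        }
        return result;
    }

    public static void Main()
    {
        // PadRight builds an aligned sample; the column width of 22 is made up.
        string body = string.Join("\n",
            "Marketing ID".PadRight(22) + "Local Number",
            new string('-', 19).PadRight(22) + new string('-', 22),
            "GR332230".PadRight(22) + "0000232323",
            "Destination ID".PadRight(22) + "Destination details",
            new string('-', 17).PadRight(22) + new string('-', 19),
            "".PadRight(22) + "3411144");
        foreach (var pair in Parse(body))
            Console.WriteLine(pair.Key + " = " + pair.Value);
    }
}
```

Because the position of each value decides its column, an empty first column is handled without special cases, as long as the alignment assumption holds.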
var dict = new Dictionary<string, string>();
try
{
var lines = email.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
int starts = 0, end = 0, length = 0;
while (!lines[starts + 1].StartsWith("-")) starts++;
for (int i = starts + 1; i < lines.Length; i += 3)
{
var mc = Regex.Matches(lines[i], @"(?:^| )-");
foreach (Match m in mc)
{
int start = m.Value.StartsWith(" ") ? m.Index + 1 : m.Index;
end = start;
while (lines[i][end++] == '-' && end < lines[i].Length - 1) ;
length = Math.Min(end - start, lines[i - 1].Length - start);
string key = length > 0 ? lines[i - 1].Substring(start, length).Trim() : "";
end = start;
while (lines[i][end++] == '-' && end < lines[i].Length) ;
length = Math.Min(end - start, lines[i + 1].Length - start);
string value = length > 0 ? lines[i + 1].Substring(start, length).Trim() : "";
dict.Add(key, value);
}
}
}
catch (Exception ex)
{
throw new Exception("Email is not in correct format", ex);
}
Using Regular Expressions:
var dict = new Dictionary<string, string>();
try
{
var lines = email.Split(Environment.NewLine.ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
int starts = 0;
while (!lines[starts + 1].StartsWith("-")) starts++;
for (int i = starts + 1; i < lines.Length; i += 3)
{
var keys = Regex.Matches(lines[i - 1], @"(?:^| )(\w+\s?)+");
var values = Regex.Matches(lines[i + 1], @"(?:^| )(\w+\s?)+");
if (keys.Count == values.Count)
for (int j = 0; j < keys.Count; j++)
dict.Add(keys[j].Value.Trim(), values[j].Value.Trim());
else // remove bug if value of first key in a line has no value
{
if (lines[i + 1].StartsWith(" "))
{
dict.Add(keys[0].Value.Trim(), "");
dict.Add(keys[1].Value.Trim(), values[0].Value.Trim());
}
else
{
dict.Add(keys[0].Value, values[0].Value.Trim());
dict.Add(keys[1].Value.Trim(), "");
}
}
}
}
catch (Exception ex)
{
throw new Exception("Email is not in correct format", ex);
}
Here is my attempt. I don't know if the email format can change (rows, columns, etc).
I can't think of an easy way to separate the columns besides checking for a double space (my solution).
class Program
{
static void Main(string[] args)
{
var emailBody = GetEmail();
using (var reader = new StringReader(emailBody))
{
var lines = new List<string>();
const int startingRow = 2; // Starting line to read from (start at Marketing ID line)
const int sectionItems = 4; // Header row (ex. Marketing ID & Local Number Line) + Dash Row + Value Row + New Line
// Add all lines to a list
string line = "";
while ((line = reader.ReadLine()) != null)
{
lines.Add(line.Trim()); // Add each line to the list and remove any leading or trailing spaces
}
for (var i = startingRow; i < lines.Count; i += sectionItems)
{
var currentLine = lines[i];
var indexToBeginSeparatingColumns = currentLine.IndexOf(" "); // The first time we see double spaces, we will use as the column delimiter, not the best solution but should work
var header1 = currentLine.Substring(0, indexToBeginSeparatingColumns);
var header2 = currentLine.Substring(indexToBeginSeparatingColumns, currentLine.Length - indexToBeginSeparatingColumns).Trim();
currentLine = lines[i+2]; //Skip dash line
indexToBeginSeparatingColumns = currentLine.IndexOf(" ");
string value1 = "", value2 = "";
if (indexToBeginSeparatingColumns == -1) // Use case of there being no value in the 2nd column, could be better
{
value1 = currentLine.Trim();
}
else
{
value1 = currentLine.Substring(0, indexToBeginSeparatingColumns);
value2 = currentLine.Substring(indexToBeginSeparatingColumns, currentLine.Length - indexToBeginSeparatingColumns).Trim();
}
Console.WriteLine(string.Format("{0},{1},{2},{3}", header1, value1, header2, value2));
}
}
}
static string GetEmail()
{
return @"EMAIL STARTING IN APRIL
Marketing ID Local Number
------------------- ----------------------
GR332230 0000232323
Dispatch Code Logic code
----------------- -------------------
GX3472 1
Destination ID Destination details
----------------- -------------------
3411144";
}
}
Output looks something like this:
Marketing ID,GR332230,Local Number,0000232323
Dispatch Code,GX3472,Logic code,1
Destination ID,3411144,Destination details,
Here is an approach, assuming you don't need the headers and that the info is mandatory and comes in order.
This won't work for data that contains spaces or for optional fields.
foreach (MailItem mail in publicFolder.Items)
{
MessageBox.Show(mail.Body, "MailItem body");
// Split by line, remove dash lines.
var data = Regex.Split(mail.Body, @"\r?\n|\r")
.Where(l => !l.StartsWith('-'))
.ToList();
// Remove headers
for(var i = data.Count - 2; i >= 0; i -= 2)
{
data.RemoveAt(i);
}
// now data contains only the info you want in the order it was presented.
// Asuming info doesn't have spaces.
var result = data.SelectMany(d => d.Split(' '));
// WARNING: Missing info will not be present.
// {"GR332230", "0000232323", "GX3472", "1", "3411144"}
}

C# parse a line and extract all integers enclosed in "" in a given line

I am processing a file (it could be .cs, .xml, or anything else) from which I need to extract strings in the format "123". Any number enclosed in "" could range from 1 to 10000.
Here is what I used, but it does not return multiple matches.
Expected output: "828", "9999"
My code:
var match = Regex.Match(line,"\"\\d*\"");
if (match.Success)
{
lstStringIds.Add(match.Value);
}
My code always gives only one match. How do I get multiple matches of integers?
Test it:
string myline = @"""123"" ""5587"" ""9"" ""7896""";
var resultlist = Regex.Matches(myline, @"\d+").Cast<Match>()
.Select(x=>x.Value).ToList();
Returns:
123
5587
9
7896
For further information, please see: Regex.Matches Method (String, String, RegexOptions)
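To keep only numbers that are actually wrapped in double quotes and that fall in the 1 to 10000 range from the question, one option is to capture the digits and do the range check in code rather than in the pattern. This is a sketch; QuotedIds and Extract are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuotedIds
{
    // Returns every "n" (quotes included) where n is an integer from 1 to 10000.
    public static List<string> Extract(string line)
    {
        var ids = new List<string>();
        foreach (Match m in Regex.Matches(line, "\"(\\d+)\""))
        {
            // Group 1 holds the digits without the surrounding quotes.
            if (int.TryParse(m.Groups[1].Value, out int n) && n >= 1 && n <= 10000)
                ids.Add(m.Value);
        }
        return ids;
    }

    public static void Main()
    {
        string line = "\"802\" and \"1009\" and \"1.0\" and \"10001\" and \"10000\"";
        foreach (string id in Extract(line))
            Console.WriteLine(id);
    }
}
```

Regex.Matches returns all matches at once, which is the difference from the single Regex.Match call in the question.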
Non-LINQ approach.
string line = "\"802\" and \"1009\" and \"1.0\" and \"10001\" and \"10000\"";
var lstStringIds = new List<String>();
var match = Regex.Match(line, "\"(?:\\d{1,4}|10000)\"");
while (match.Success)
{
lstStringIds.Add(match.ToString());
match = match.NextMatch();
}
Returns:
"802"
"1009"
"10000"
for (int count = 0; count + 2 < input_string.Length; count++)
{
// Replace each placeholder comment with the character to look for.
if ((input_string[count] == /* first number */) && (input_string[count + 1] == /* second number */) && (input_string[count + 2] == /* third number */))
{
lstStringIds.Add(input_string.Substring(count, 3));
}
}
That's how I would cycle through a string to find substrings that match a condition. Let me know if that wasn't helpful or wasn't what you were looking for.
string line = "\"100\", \"200\" ";
var match = Regex.Match(line, "(\"\\d*\")");
ArrayList al = new ArrayList();
while(match.Success && match != null)
{
al.Add(match.Value);
match = match.NextMatch();
}

Finding the First Common Substring of a set of strings

I am looking for an implementation of a First Common Substring
Mike is not your average guy. I think you are great.
Jim is not your friend. I think you are great.
Being different is not your fault. I think you are great.
Using a Longest Common Substring implementation (and ignoring punctuation), you would get "I think you are great", but I am looking for the first occurring common substring, in this example:
is not your
Perhaps an implementation that generates an ordered list of all common substrings, from which I can just take the first.
Edit
The tokens being compared would be complete words. Looking for a greedy match of the first longest sequence of whole words. (Assuming a suffix tree was used in the approach, each node of the tree would be a word)
There are quite a few steps to do this.
Remove Punctuation
Break down Sentences into list of Words
Create string of all combinations of contiguous words (min:1, max:wordCount)
Join the three lists on new list of string (subsentences)
Sort Accordingly.
Code:
static void Main(string[] args)
{
var sentence1 = "Mike is not your average guy. I think you are great.";
var sentence2 = "Jim is not your friend. I think you are great.";
var sentence3 = "Being different is not your fault. I think you are great.";
//remove all punctuation
// http://stackoverflow.com/questions/421616
sentence1 = new string(
sentence1.Where(c => !char.IsPunctuation(c)).ToArray());
sentence2 = new string(
sentence2.Where(c => !char.IsPunctuation(c)).ToArray());
sentence3 = new string(
sentence3.Where(c => !char.IsPunctuation(c)).ToArray());
//separate into words
var words1 = sentence1.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries).ToList();
var words2 = sentence2.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries).ToList();
var words3 = sentence3.Split(new char[] { ' ' },
StringSplitOptions.RemoveEmptyEntries).ToList();
//create substring list
var subSentence1 = CreateSubstrings(words1);
var subSentence2 = CreateSubstrings(words2);
var subSentence3 = CreateSubstrings(words3);
//join them like a SQL table
var subSentences = subSentence1
.Join(subSentence2,
sub1 => sub1.Value,
sub2 => sub2.Value,
(sub1, sub2) => new { Sub1 = sub1,
Sub2 = sub2 })
.Join(subSentence3,
sub1 => sub1.Sub1.Value,
sub2 => sub2.Value,
(sub1, sub2) => new { Sub1 = sub1.Sub1,
Sub2 = sub1.Sub2,
Sub3 = sub2 })
;
//Sorted by Lowest Index, then by Maximum Words
subSentences = subSentences.OrderBy(s => s.Sub1.Rank)
.ThenByDescending(s => s.Sub1.Length)
.ToList();
//Sort by Maximum Words, then Lowest Index
/*subSentences = subSentences.OrderByDescending(s => s.Sub1.Length)
.ThenBy(s => s.Sub1.Rank)
.ToList();//*/
foreach (var subSentence in subSentences)
{
Console.WriteLine(subSentence.Sub1.Length.ToString() + " "
+ subSentence.Sub1.Value);
Console.WriteLine(subSentence.Sub2.Length.ToString() + " "
+ subSentence.Sub2.Value);
Console.WriteLine(subSentence.Sub3.Length.ToString() + " "
+ subSentence.Sub3.Value);
Console.WriteLine("======================================");
}
Console.ReadKey();
}
//this could probably be done better -Erik
internal static List<SubSentence> CreateSubstrings(List<string> words)
{
var result = new List<SubSentence>();
for (int wordIndex = 0; wordIndex < words.Count; wordIndex++)
{
var sentence = new StringBuilder();
int currentWord = wordIndex;
while (currentWord < words.Count - 1)
{
sentence.Append(words.ElementAt(currentWord));
result.Add(new SubSentence() { Rank = wordIndex,
Value = sentence.ToString(),
Length = currentWord - wordIndex + 1 });
sentence.Append(' ');
currentWord++;
}
sentence.Append(words.Last());
result.Add(new SubSentence() { Rank = wordIndex,
Value = sentence.ToString(),
Length = words.Count - wordIndex });
}
return result;
}
internal class SubSentence
{
public int Rank { get; set; }
public string Value { get; set; }
public int Length { get; set; }
}
Result:
3 is not your
3 is not your
3 is not your
======================================
2 is not
2 is not
2 is not
======================================
1 is
1 is
1 is
======================================
2 not your
2 not your
2 not your
======================================
1 not
1 not
1 not
======================================
1 your
1 your
1 your
======================================
5 I think you are great
5 I think you are great
5 I think you are great
======================================
4 I think you are
4 I think you are
4 I think you are
======================================
3 I think you
3 I think you
3 I think you
======================================
2 I think
2 I think
2 I think
======================================
1 I
1 I
1 I
======================================
4 think you are great
4 think you are great
4 think you are great
======================================
3 think you are
3 think you are
3 think you are
======================================
2 think you
2 think you
2 think you
======================================
1 think
1 think
1 think
======================================
3 you are great
3 you are great
3 you are great
======================================
2 you are
2 you are
2 you are
======================================
1 you
1 you
1 you
======================================
2 are great
2 are great
2 are great
======================================
1 are
1 are
1 are
======================================
1 great
1 great
1 great
======================================
Here's a little something that will do what you want. You would adjust it to pre-build your list of strings and pass that in, and it will find the phrase for you. In this example, the shortest string is used as the baseline for the phrase.
public void SomeOtherFunc()
{
List<string> MyTest = new List<string>();
MyTest.Add( "Mike is not your average guy. I think you are great." );
MyTest.Add( "Jim is not your friend. I think you are great." );
MyTest.Add( "Being different is not your fault. I think you are great." );
string thePhrase = testPhrase( MyTest );
MessageBox.Show( thePhrase );
}
public string testPhrase(List<string> test)
{
// start with the first string and find the shortest.
// if we can't find a short string in a long, we'll never find a long string in short
// Ex "To testing a string that is longer than some other string"
// vs "Im testing a string that is short"
// Work with the shortest string.
string shortest = test[0];
string lastGoodPhrase = "";
string curTest;
int firstMatch = 0;
int lastMatch = 0;
int allFound;
foreach (string s in test)
if (s.Length < shortest.Length)
shortest = s;
// Now, we need to break the shortest string into each "word"
string[] words = shortest.Split( ' ' );
// Now, start with the first word until it is found in ALL phrases
for (int i = 0; i < words.Length; i++)
{
// to prevent finding "this" vs "is"
lastGoodPhrase = " " + words[i] + " ";
allFound = 0;
foreach (string s in test)
{
// always force leading space for string
if ((" "+s).Contains(lastGoodPhrase))
allFound++;
else
// if not found in ANY string, its not found in all, get out
break;
}
if (allFound == test.Count)
{
// we've identified the first matched field, get out for next phase test
firstMatch = i;
// also set the last common word to the same until we can test next...
lastMatch = i;
break;
}
}
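// if no match, get out (test the phrase itself: a match on the very
// first word also leaves firstMatch at 0)
if (!test.TrueForAll(s => (" " + s).Contains(lastGoodPhrase)))
return "";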
// we DO have at least a first match, now keep looking into each subsequent
// word UNTIL we no longer have a match.
for( int i = 1; i < words.Length - firstMatch; i++ )
{
// From where the first entry was, build out the ENTIRE PHRASE
// until the end of the original sting of words and keep building 1 word back
curTest = " ";
for (int j = firstMatch; j <= firstMatch + i; j++)
curTest += words[j] + " ";
// see if all this is found in ALL strings
foreach (string s in test)
// we know we STARTED with a valid found phrase.
// as soon as a string NO LONGER MATCHES the new phrase,
// return the last VALID phrase
if (!(" " + s).Contains(curTest))
return lastGoodPhrase;
// if this is still a good phrase, set IT as the newest
lastGoodPhrase = curTest;
}
return lastGoodPhrase;
}
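A more compact variant of the same idea, as a sketch under assumptions: tokens are whole words, punctuation is ignored, and "first" means earliest in the first sentence. For each start word it greedily extends the phrase while every other sentence still contains it as a contiguous run of words. FirstCommon and its members are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstCommon
{
    // Strips punctuation and splits a sentence into words.
    private static string[] Words(string sentence)
    {
        string cleaned = new string(sentence.Where(c => !char.IsPunctuation(c)).ToArray());
        return cleaned.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
    }

    // True if needle[start .. start + len) occurs as a contiguous run of words in hay.
    private static bool ContainsRun(string[] hay, string[] needle, int start, int len)
    {
        for (int i = 0; i + len <= hay.Length; i++)
        {
            int k = 0;
            while (k < len && hay[i + k] == needle[start + k]) k++;
            if (k == len) return true;
        }
        return false;
    }

    public static string FirstCommonPhrase(IList<string> sentences)
    {
        List<string[]> all = sentences.Select(Words).ToList();
        string[] first = all[0];
        for (int start = 0; start < first.Length; start++)
        {
            int best = 0;
            // Grow the phrase one word at a time until some sentence no longer contains it.
            for (int len = 1; start + len <= first.Length; len++)
            {
                if (all.Skip(1).All(w => ContainsRun(w, first, start, len))) best = len;
                else break;
            }
            if (best > 0) return string.Join(" ", first, start, best);
        }
        return string.Empty;
    }

    public static void Main()
    {
        var sentences = new List<string>
        {
            "Mike is not your average guy. I think you are great.",
            "Jim is not your friend. I think you are great.",
            "Being different is not your fault. I think you are great."
        };
        Console.WriteLine(FirstCommonPhrase(sentences)); // prints "is not your"
    }
}
```

Comparing word arrays instead of padded strings avoids the "this" versus "is" problem without the leading-space trick, at the cost of a brute-force scan that is only suitable for short inputs.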

C# Index of for space and next informations

Can you help me, please? I select the complete address from the DB, but this address contains both the street and the house number, and I need the street and the house number separately.
I created two lists for this split.
while (reader_org.Read())
{
string s = reader_org.GetString(0);
string ulice, cp, oc;
char mezera = ' ';
if (s.Contains(mezera))
{
Match m = Regex.Match(s, @"(\d+)");
string numStr = m.Groups[0].Value;
if (numStr.Length > 0)
{
s = s.Replace(numStr, "").Trim();
int number = Convert.ToInt32(numStr);
}
Match l = Regex.Match(s, @"(\d+)");
string numStr2 = l.Groups[0].Value;
if (numStr2.Length > 0)
{
s = s.Replace(numStr2, "").Trim();
int number = Convert.ToInt32(numStr2);
}
if (s.Contains('/'))
s = s.Replace('/', ' ').Trim();
MessageBox.Show("Adresa: " + s);
MessageBox.Show("CP:" + numStr);
MessageBox.Show("OC:" + numStr2);
}
else
{
Definitions.Ulice.Add(s);
}
}
You might find that the street name consists of multiple words, or that the number appears before the street name. Some houses might also have no number. Here's a way of dealing with all that.
//extract the first number found in the address string, wherever that number is;
//an optional second number may follow after a slash (e.g. "514/12")
Match m = Regex.Match(address, @"(\d+)(?:/(\d+))?");
string numStr = m.Value;
//start from the full address; it stays unchanged when no number is found
string streetName = address.Trim();
//if a number was found then convert it to numeric
//also remove it from the address string, so that streetName only
//contains the street name
if (numStr.Length > 0)
{
streetName = address.Replace(numStr, "").Trim();
int num1 = Convert.ToInt32(m.Groups[1].Value);
if (m.Groups[2].Success)
{
int num2 = Convert.ToInt32(m.Groups[2].Value);
}
}
Use .Split on your string that results. Then you can index into the result and get the parts of your string.
var parts = s.Split(' ');
// you can get parts[0] etc to access each part;
using (SqlDataReader reader_org = select_org.ExecuteReader())
{
while (reader_org.Read())
{
string s = reader_org.GetString(0); // this return me for example Karlínkova 514 but i need separately adress (karlínkova) and house number (514) with help index of or better functions. But now i dont know how can i make it.
var values = s.Split(' ');
var address = values.Length > 0 ? values[0]: null;
var number = values.Length > 1 ? int.Parse(values[1]) : 0;
//Do what ever you want with address and number here...
}
}
Here is a way to split the address into house number and address without regex, using only the functions of the String class.
var fullAddress = "1111 Awesome Point Way NE, WA 98122";
var index = fullAddress.IndexOf(" "); //Gets the first index of space
var houseNumber = fullAddress.Remove(index);
var address = fullAddress.Remove(0, (index + 1));
Console.WriteLine(houseNumber);
Console.WriteLine(address);
Output: 1111
Output: Awesome Point Way NE, WA 98122
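For addresses shaped like the "Karlínkova 514" example in the question, where the number comes last and may carry a second number after a slash, the street and both numbers can be captured in one pass with named groups. This is a sketch under that layout assumption; AddressSplitter is an illustrative name and "Na Příkopě 12/3" is a made-up sample.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressSplitter
{
    // street: everything before the trailing number; cp: the first number; oc: optional number after '/'.
    private static readonly Regex Pattern =
        new Regex(@"^(?<street>.*?)\s+(?<cp>\d+)(?:/(?<oc>\d+))?\s*$");

    public static (string Street, string Cp, string Oc) Split(string address)
    {
        Match m = Pattern.Match(address);
        if (!m.Success)
            return (address.Trim(), "", ""); // no trailing number: keep the whole text as the street
        return (m.Groups["street"].Value.Trim(), m.Groups["cp"].Value, m.Groups["oc"].Value);
    }

    public static void Main()
    {
        foreach (string address in new[] { "Karlínkova 514", "Na Příkopě 12/3", "Karlínkova" })
        {
            var parts = Split(address);
            Console.WriteLine("Adresa: " + parts.Street + ", CP: " + parts.Cp + ", OC: " + parts.Oc);
        }
    }
}
```

Multi-word street names stay intact because only the trailing number is split off. Addresses that put the number first, like the "1111 Awesome Point Way" sample above, would need a different pattern.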
