How to Split an Already Split String - c#

I have the code below.
foreach (var item in betSlipwithoutStake)
{
test1 = item.Text;
splitText = test1.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
if (!test.Exists(str => str == splitText[0]))
test.Add(splitText[0]);
}
It gives me values like "Under 56.5 Points (+56.5)".
Now I want to split each item in the list again at the '(' so that I get a new list I can use. How can I do that?

If you want to extract the value inside the parentheses:
foreach (var item in betSlipwithoutStake)
{
    test1 = item.Text;
    splitText = test1.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
    // take the text between '(' and ')' when present, otherwise the whole entry
    string value = splitText[0].Contains("(") ? splitText[0].Split('(', ')')[1] : splitText[0];
    // check for the value that is actually stored, so duplicates are skipped
    if (!test.Contains(value))
        test.Add(value);
}

Well, assuming you want a solution without regular expressions, and that you have a List<string> test declared, you can follow up with Substring and indexes (plus some error handling):
foreach (var item in betSlipwithoutStake)
{
    test1 = item.Text;
    splitText = test1.Split(new char[] { ':' }, StringSplitOptions.RemoveEmptyEntries);
    if (splitText.Length == 0)
        continue;
    string stringToCheck = splitText[0];
    int openParenIndex = stringToCheck.IndexOf('(');
    int closeParenIndex = stringToCheck.LastIndexOf(')');
    if (openParenIndex >= 0 && closeParenIndex > openParenIndex)
    {
        // get what's inside the outermost set of parens
        int length = closeParenIndex - openParenIndex - 1;
        stringToCheck = stringToCheck.Substring(openParenIndex + 1, length);
    }
    // compare and store the extracted value, not the unsplit text
    if (!test.Contains(stringToCheck))
        test.Add(stringToCheck);
}
The documentation for the String class lists all of the methods you can use with strings.
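If you need both halves of an entry such as "Under 56.5 Points (+56.5)", the index-based approach above can be wrapped in a small helper. This is only a sketch: the class name ParenSplitter and the method TrySplit are made up for illustration and are not part of the original code.

```csharp
using System;

public static class ParenSplitter
{
    // Hypothetical helper: splits the input into the label before the first '('
    // and the text between the first '(' and the last ')'.
    // Returns false (and leaves inner as null) when there is no such pair.
    public static bool TrySplit(string input, out string label, out string inner)
    {
        label = input;
        inner = null;
        int open = input.IndexOf('(');
        int close = input.LastIndexOf(')');
        if (open < 0 || close <= open)
            return false;
        label = input.Substring(0, open).Trim();
        inner = input.Substring(open + 1, close - open - 1);
        return true;
    }
}
```

Inside the loop you could call TrySplit on splitText[0] and add label and inner to two separate lists, falling back to the whole string when it returns false.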

Related

Extract multiple substring in the same line

I'm trying to build a log parser, but I'm stuck.
Right now my program goes through multiple files in a directory and reads each file line by line.
I was able to find the substring I was looking for, "fct=", and extract the value after the "=" using a delimiter, but I noticed that when a line contains more than one "fct=", only the first one is picked up.
So I restarted my code and found a way to get the index of every occurrence of "fct=" in a line, using an extension method that puts the indexes in a list, but I don't see how to use this list to get the value after the "=" up to my delimiter.
How can I extract the value after the "=", knowing the start position of "fct=" and the delimiter at the end of the wanted value?
I'm just starting out in C#, so let me know if I can give you more information.
Thanks,
Here's an example of what I would like to parse:
<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>
<dat>XN=KEY,CN=RTU FCT=4515</dat></logurl>
<dat>XN=KEY,CN=RT</dat></logurl>
I would like to retrieve 10019, 666 and 4515.
namespace LogParserV1
{
class Program
{
static void Main(string[] args)
{
int counter = 0;
string[] dirs = Directory.GetFiles(@"C:/LogParser/LogParserV1", "*.txt");
string fctnumber;
char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
foreach (string fileName in dirs)
{
StreamReader sr = new StreamReader(fileName);
{
String lineRead;
while ((lineRead = sr.ReadLine()) != null)
{
if (lineRead.Contains("fct="))
{
List<int> list = MyExtensions.GetPositions(lineRead, "fct");
//int start = lineRead.IndexOf("fct=") + 4;
// int end = lineRead.IndexOfAny(enddelimiter, start);
//string result = lineRead.Substring(start, end - start);
//fctnumber = result; // 'result' only exists in the commented-out lines above
//System.Console.WriteLine(fctnumber);
list.ForEach(Console.WriteLine);
}
// print every line: System.Console.WriteLine(lineRead);
counter++;
}
System.Console.WriteLine(fileName);
sr.Close();
}
}
// Suspend the screen.
System.Console.ReadLine();
}
}
}
namespace ExtensionMethods
{
public class MyExtensions
{
public static List<int> GetPositions(string source, string searchString)
{
List<int> ret = new List<int>();
int len = searchString.Length;
int start = -len;
while (true)
{
start = source.IndexOf(searchString, start + len);
if (start == -1)
{
break;
}
else
{
ret.Add(start);
}
}
return ret;
}
}
}
You could simplify your code a lot by using Regex pattern matching instead.
The following pattern: (?<=FCT=)[0-9]* will match any group of digits preceded by FCT=.
This enables us to do the following:
string input = "<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>...";
string pattern = "(?<=FCT=)[0-9]*";
var values = Regex.Matches(input, pattern).Cast<Match>().Select(x => x.Value);
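To apply the same idea to every line of a file, the pattern can be wrapped in a small helper. The following is a minimal sketch, not the original answer's code: the names FctExtractor and Extract are invented for illustration, and it uses [0-9]+ instead of [0-9]* so that an FCT= with no digits after it produces no match. RegexOptions.IgnoreCase is an assumption, added because the question searches for lowercase "fct=" while the sample data is uppercase.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FctExtractor
{
    // Lookbehind for "FCT=" followed by one or more digits.
    private static readonly Regex FctPattern =
        new Regex(@"(?<=FCT=)[0-9]+", RegexOptions.IgnoreCase);

    // Collects every FCT value from every line, in order of appearance.
    public static List<string> Extract(IEnumerable<string> lines)
    {
        var values = new List<string>();
        foreach (string line in lines)
        {
            foreach (Match m in FctPattern.Matches(line))
                values.Add(m.Value);
        }
        return values;
    }
}
```

In the question's program, the lines read by StreamReader.ReadLine could be passed to Extract one at a time or collected first.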
I have tested this solution with your data, and it gives me the expected results (10019, 666 and 4515):
string data = @"<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>
<dat>XN=KEY,CN=RTU FCT=4515</dat></logurl>
<dat>XN=KEY,CN=RT</dat></logurl>";
char[] delimiters = { '<', ',', '&', ':', ' ', '\\', '\'' };
Regex regex = new Regex("fct=(.+)", RegexOptions.IgnoreCase);
var values = data.Split(delimiters).Select(x => regex.Match(x).Groups[1].Value);
values = values.Where(x => !string.IsNullOrWhiteSpace(x));
values.ToList().ForEach(Console.WriteLine);
I hope my solution will be helpful, let me know.
The code below is useful for extracting the repeated words in a text with LINQ:
string text = "Hi Naresh, How are you. You will be next Super man";
IEnumerable<string> strings = text.Split(' ').ToList();
var result = strings.AsEnumerable().Select(x => new {str = Regex.Replace(x.ToLowerInvariant(), @"[^0-9a-zA-Z]+", ""), count = Regex.Matches(text.ToLowerInvariant(), @"\b" + Regex.Escape(Regex.Replace(x.ToLowerInvariant(), @"[^0-9a-zA-Z]+", "")) + @"\b").Count}).Where(x=>x.count>1).GroupBy(x => x.str).Select(x => x.First());
foreach(var item in result)
{
Console.WriteLine(item.str +" = "+item.count.ToString());
}
As always, break the problem down into smaller bits. See if the following methods help in any way. Tying them into your code is left as an exercise.
private const string Prefix = "fct=";
//make delimiter look up fast
private static HashSet<char> endDelimiters =
new HashSet<char>(new [] { '<', ',', '&', ':', ' ', '\\', '\'' });
private static string[] GetAllFctFields(string line) =>
line.Split(new string[] { Prefix }, StringSplitOptions.None); // the string[] overload requires a StringSplitOptions argument
private static bool TryGetValue(string delimitedString, out string value)
{
var buffer = new StringBuilder(delimitedString.Length);
foreach (var c in delimitedString)
{
if (endDelimiters.Contains(c))
break;
buffer.Append(c);
}
//I'm assuming that no end delimiter is a format error.
//Modify according to requirements
if (buffer.Length == delimitedString.Length)
{
value = null;
return false;
}
value = buffer.ToString();
return true;
}
Something like :
class Program
{
static void Main(string[] args)
{
char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
var fct = "fct=";
var lineRead = "fct=value1,useless text fct=vfct=alue2,fct=value3";
var values = new List<string>();
int start = lineRead.IndexOf(fct);
while(start != -1)
{
start += fct.Length;
int end = lineRead.IndexOfAny(enddelimiter, start);
if (end == -1)
end = lineRead.Length;
string result = lineRead.Substring(start, end - start);
values.Add(result);
start = lineRead.IndexOf(fct, end);
}
values.ForEach(Console.WriteLine);
}
}
You can split the line by string[]
char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
while ((lineRead = sr.ReadLine()) != null)
{
string[] parts1 = lineRead.Split(new string[] { "fct=" },StringSplitOptions.None);
if(parts1.Length > 0)
{
foreach(string _ar in parts1)
{
if(!string.IsNullOrEmpty(_ar))
{
if(_ar.IndexOfAny(enddelimiter) > 0)
{
MessageBox.Show(_ar.Substring(0, _ar.IndexOfAny(enddelimiter)));
}
else
{
MessageBox.Show(_ar);
}
}
}
}
}

Replace string if starts with string in List

I have a string that looks like this:
s = "<Hello it´s me, <Hi how are you <hay"
and a list:
List<string> ValidList = new List<string> { "Hello", "hay" };
I need the result string to look like this:
string result = "<Hello it´s me, ?Hi how are you <hay"
So wherever a < is followed by text that belongs to the list, it should be kept; wherever a < is followed by text that doesn't belong to the list, the < should be replaced by ?.
I tried using IndexOf to find the position of each < and then leaving it alone if the text after it StartsWith any of the strings in the list.
foreach (var vl in ValidList)
{
int nextLt = 0;
while ((nextLt = strAux.IndexOf('<', nextLt)) != -1)
{
//is element, leave it
if (!(strAux.Substring(nextLt + 1).StartsWith(vl)))
{
//its not, replace
strAux = string.Format(@"{0}?{1}", strAux.Substring(0, nextLt), strAux.Substring(nextLt + 1, strAux.Length - (nextLt + 1)));
}
nextLt++;
}
}
To turn the solution I gave in a comment into a proper answer:
Regex.Replace(s, string.Format("<(?!{0})", string.Join("|", ValidList)), "?")
This uses a regular expression to replace the unwanted < characters with ?. To recognize those characters, we use a negative lookahead. For the example word list it looks like this: (?!Hello|hay), which matches only if the current position is not followed by Hello or hay. Since we are matching <, the full expression becomes <(?!Hello|hay).
Now we just need to account for the dynamic ValidList by building the regular expression on the fly, which is what string.Format and string.Join do here.
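One thing to watch when building the pattern dynamically: if an entry in ValidList contains regex metacharacters (a dot, a parenthesis and so on), it should be escaped first. Here is a hedged sketch of that variant; the class name TagFilter and the method ReplaceInvalid are made up for illustration, and the handling of an empty list is an assumption about what you would want.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class TagFilter
{
    // Replaces every '<' that is not directly followed by one of the valid words with '?'.
    public static string ReplaceInvalid(string s, IEnumerable<string> validList)
    {
        // Escape each word so that characters such as '.' or '(' are matched literally.
        string alternatives = string.Join("|", validList.Select(Regex.Escape));

        // Assumption: with no valid words at all, every '<' counts as invalid.
        if (alternatives.Length == 0)
            return s.Replace('<', '?');

        return Regex.Replace(s, "<(?!" + alternatives + ")", "?");
    }
}
```

For the question's input, ReplaceInvalid(s, ValidList) keeps the < before Hello and hay and turns the one before Hi into ?.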
Something like this, without using Regex or LINQ:
string s = "<Hello it´s me, <Hi how are you <hay";
List<string> ValidList = new List<string>() { "Hello", "hay" };
var arr = s.Split(new[] { '<' }, StringSplitOptions.RemoveEmptyEntries);
for (int i = 0; i < arr.Length; i++)
{
bool flag = false;
foreach (var item in ValidList)
{
if (arr[i].Contains(item))
{
flag = false;
break;
}
else
{
flag = (flag) ? flag : !flag;
}
}
if (flag)
arr[i] = "?" + arr[i];
else
arr[i] = "<" + arr[i];
}
Console.WriteLine(string.Concat(arr));
A possible solution using LINQ. It splits the string on <, checks whether the "word" that follows (the text up to the first space) is in the valid list, and prepends < or ? accordingly. Finally, it joins everything back together:
List<string> ValidList = new List<string>{ "Hello", "hay" };
string str = "<Hello it´s me, <Hi how are you <hay";
var res = String.Join("",str.Split(new char[] { '<' }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => ValidList.Contains(x.Split(' ').First()) ? "<" + x : "?"+x));

How to do cascade splitting with C# Linq - multiple foreach split

These are the values I want to split the string by, in cascade:
List<string> lstsplitWord = new List<string> { ",", "=", "،", "أو", "او", "/", "." };
I have written it like this, but I assume there must be a more elegant LINQ solution:
foreach(var part1 in srSplitPart.Split(',')) {
foreach(var part2 in part1.Split('=')) {
foreach(var part3 in part2.Split('،')) {
foreach(var part4 in part3.func_Split_By_String("أو")) {
foreach(var part5 in part4.func_Split_By_String("او")) {
foreach(var part6 in part5.Split('/')) {
foreach(var part7 in part6.Split('.')) {
if (part7.Length < 3)
continue;
string srTrans = part7.FixArabic().func_Special_Trim();
srTemp.AppendLine($"{srTitle} > {srTrans} \t {irTransLevel}");
irTransLevel++;
}
}
}
}
}
}
}
C#, .NET 4.6.2
The custom split function:
public static List<string> func_Split_By_String(this string Sentence, string srReplace)
{
return Sentence.Split(new string[] { srReplace }, StringSplitOptions.None).ToList();
}
You can just iteratively split every element to smaller parts in a given order:
string originalString = ...;
List<string> separators = new List<string> { ",", "=", "،", "أو", "او", "/", "." };
string[] result = new[] { originalString };
foreach (var separator in separators)
{
result = result.SelectMany(x => x.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries)).ToArray();
}
result = result
.Where(x => x.Length >= 3)
.Select(x => x.FixArabic().func_Special_Trim())
.ToArray();
foreach (var item in result)
{
srTemp.AppendLine($"{srTitle} > {item} \t {irTransLevel}");
irTransLevel++;
}
At the beginning, the array contains only your original string.
After the first foreach iteration, the array contains the original string split by ",".
After the second iteration, every comma-separated part is further split by =.
This repeats until the result array contains only strings split by all of the given separators. The code then applies the Length >= 3 condition, followed by FixArabic() and func_Special_Trim().
Update: I have just realized that applying all separators one by one in a given order yields the same string array as applying all of them in a single Split call.
So you can actually just do:
string originalString = ...;
string[] separators = new[] { ",", "=", "،", "أو", "او", "/", "." };
string[] result = originalString
.Split(separators, StringSplitOptions.RemoveEmptyEntries)
.Where(x => x.Length >= 3)
.Select(x => x.FixArabic().func_Special_Trim())
.ToArray();
foreach (var item in result)
{
srTemp.AppendLine($"{srTitle} > {item} \t {irTransLevel}");
irTransLevel++;
}
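If you want to convince yourself that the two approaches agree for your separators, you can run both and compare the results. The sketch below does that; the names CascadeSplit, Cascaded and SinglePass are invented for illustration. As far as I can tell the results can differ when separators overlap (for example when the end of one separator is the start of another), which is not the case for the list in the question, so it is worth checking with your real data.

```csharp
using System;
using System.Linq;

public static class CascadeSplit
{
    // Splits by each separator in turn, feeding the parts into the next split.
    public static string[] Cascaded(string input, string[] separators)
    {
        string[] result = { input };
        foreach (string separator in separators)
        {
            result = result
                .SelectMany(x => x.Split(new[] { separator }, StringSplitOptions.RemoveEmptyEntries))
                .ToArray();
        }
        return result;
    }

    // Splits by all separators in a single call.
    public static string[] SinglePass(string input, string[] separators)
    {
        return input.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Comparing the two with Enumerable.SequenceEqual on a few representative strings is a cheap sanity check before switching to the single call.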

Extract all occurrences of specific characters from strings

I have something like this in my code.
mystring.Split(new[]{"/","*"}, StringSplitOptions.RemoveEmptyEntries);
however, what I actually want is to separate mystring into two arrays, one holding the separated items above, and the other array to hold the delimiters above in the order they appear in the string.
I could use .IndexOf to continue searching until I extract all of them, but somehow I think this will be redundant. Is there a way to do this in .NET? If possible I want to avoid LINQ.
Thanks.
Something like:
var separators = new char[] { '/', '*' };
var words = new List<string>();
var delimiters = new List<string>();
var idx = source.IndexOfAny(separators);
var prevIdx = 0;
while (idx > -1)
{
    // skip empty items between two adjacent delimiters
    if (idx - prevIdx > 0)
        words.Add(source.Substring(prevIdx, idx - prevIdx));
    prevIdx = idx + 1;
    delimiters.Add(source.Substring(idx, 1));
    idx = source.IndexOfAny(separators, idx + 1);
}
// add whatever follows the last delimiter
if (prevIdx < source.Length)
    words.Add(source.Substring(prevIdx));
If I understand the questioner correctly, he wants the actual separated items as well as the delimiters.
I think the following code will work:
List<string> SeparatedItems = new List<string>();
List<string> Delimiters = new List<string>();
string sTestString = "mytest/string*isthis**and not/this";
string sSeparatedItemString = String.Empty;
foreach(char c in sTestString) {
if(c == '/' || c == '*') {
Delimiters.Add(c.ToString());
if(sSeparatedItemString != String.Empty) {
SeparatedItems.Add(sSeparatedItemString);
sSeparatedItemString = String.Empty;
}
}
else {
sSeparatedItemString += c.ToString();
}
}
if(sSeparatedItemString != String.Empty) {
SeparatedItems.Add(sSeparatedItemString);
}
Try this:
var items = new List<string>();
var delimiters = new List<string>();
items.AddRange(Regex.Split(text, @"(?<=/)|(?=/)|(?<=\*)|(?=\*)"));
for (int i = 0; i < items.Count; )
{
string item = items[i];
if (item == "*" || item == "/")
{
delimiters.Add(item);
items.RemoveAt(i);
}
else if (item == "")
{
items.RemoveAt(i);
}
else
{
i++;
}
}
You could consider a Regex expression using named groups. Try a nested named group. The outer including capturing the separator and the inner capturing the content only.
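As a rough illustration of the named-group idea, here is a sketch that uses two alternating named groups rather than nested ones, which is a simplification of what is suggested above. The names DelimiterSplitter, item and delim are made up for this example. It avoids LINQ, as the question asks.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DelimiterSplitter
{
    // Walks the matches left to right: each match is either a run of
    // non-delimiter characters ("item") or a single '/' or '*' ("delim").
    public static void Split(string source, List<string> items, List<string> delimiters)
    {
        foreach (Match m in Regex.Matches(source, @"(?<item>[^/*]+)|(?<delim>[/*])"))
        {
            if (m.Groups["item"].Success)
                items.Add(m.Groups["item"].Value);
            else
                delimiters.Add(m.Groups["delim"].Value);
        }
    }
}
```

Because the matches are visited in order, both lists keep the order in which their entries appear in the string, and empty items between adjacent delimiters are skipped.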
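Since you're running on .NET 2.0, I'd say using IndexOf is one of the most straightforward ways to solve the problem:
public static int CountOccurences(string input, string pattern)
{
    int count = 0;
    int i = 0;
    // parenthesize the assignment so that i receives the index, not the result of the comparison
    while ((i = input.IndexOf(pattern, i)) != -1)
    {
        count++;
        i += pattern.Length; // continue searching after this occurrence
    }
    return count;
}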
The solution Rob Smyth suggests would also work, but I find this the easiest and most understandable one.

Splitting a string array

I have a string array, string[] arr, which contains values like N36102W114383, N36102W114382, etc.
I want to split each string so that I get values like N36082 and W115080.
What is the best way to do this?
This should work for you.
Regex regexObj = new Regex(@"\w\d+"); // matches a word character followed by a sequence of digits
Match matchResults = regexObj.Match(subjectString);
while (matchResults.Success) {
    // matchResults.Value holds the current match: first N36102, then W114383
    matchResults = matchResults.NextMatch();
}
If you have the fixed format every time you can just do this:
string[] split_data = data_string.Insert(data_string.IndexOf("W"), ",")
.Split(',');
Here you insert a recognizable delimiter into your string and then split it by this delimiter.
Forgive me if this doesn't quite compile, but I'd just break down and write the string processing function by hand:
public static IEnumerable<string> Split(string str)
{
char [] chars = str.ToCharArray();
int last = 0;
for(int i = 1; i < chars.Length; i++) {
if(char.IsLetter(chars[i])) {
yield return new string(chars, last, i - last);
last = i;
}
}
yield return new string(chars, last, chars.Length - last);
}
If you use C#, please try:
String[] code = new Regex("(?:([A-Z][0-9]+))").Split(text).Where(e => e.Length > 0 && e != ",").ToArray();
In case you're only looking for the format NxxxxxWxxxxx, this will do just fine:
Regex r = new Regex(@"(N[0-9]+)(W[0-9]+)");
Match mc = r.Match(arr[i]);
string N = mc.Groups[1].Value;
string W = mc.Groups[2].Value;
Using string.Split and char.IsLetter, this is relatively easy in C#.
Don't forget to write unit tests; the following may have some corner-case errors!
// input has form "N36102W114383, N36102W114382"
// output: "N36102", "W114383", "N36102", "W114382", ...
string[] ParseSequenceString(string input)
{
    string[] inputStrings = input.Split(',');
    List<string> outputStrings = new List<string>();
    foreach (string value in inputStrings)
    {
        // Trim removes the space that follows each comma
        outputStrings.AddRange(ParseValuesInString(value.Trim()));
    }
    return outputStrings.ToArray();
}
// input has form "N36102W114383"
// output: "N36102", "W114383"
List<string> ParseValuesInString(string inputString)
{
    List<string> outputValues = new List<string>();
    string currentValue = string.Empty;
    foreach (char c in inputString)
    {
        // a letter starts a new value, so flush the one collected so far
        if (char.IsLetter(c) && currentValue.Length > 0)
        {
            outputValues.Add(currentValue);
            currentValue = string.Empty;
        }
        currentValue += c;
    }
    if (currentValue.Length > 0)
        outputValues.Add(currentValue);
    return outputValues;
}
