Extract node value from xml resembling string C# - c#

I am having strings like below
<ad nameId="\862094\"></ad>
or comma seprated like below
<ad nameId="\862593\"></ad>,<ad nameId="\862094\"></ad>,<ad nameId="\865599\"></ad>
How to extract nameId value and store in single string like below
string extractedValues ="862094";
or in case of comma seprated string above
string extractedMultipleValues ="862593,862094,865599";
This is what I have started trying with but not sure
string myString = "<ad nameId="\862593\"></ad>,<ad nameId="\862094\"></ad>,<ad
nameId="\865599\"></ad>";
string[] myStringArray = myString .Split(',');
foreach (string str in myStringArray )
{
xd.LoadXml(str);
chkStringVal = xd.SelectSingleNode("/ad/#nameId").Value;
}

Search for:
<ad nameId="\\(\d*)\\"><\/ad>
Replace with:
$1
Note that you must search globally. Example: http://www.regex101.com/r/pL2lX1

Please see code below to extract all numbers in your example:
string value = #"<ad nameId=""\862093\""></ad>,<ad nameId=""\862094\""></ad>,<ad nameId=""\865599\""></ad>";
var matches = Regex.Matches(value, #"(\\\d*\\)", RegexOptions.RightToLeft);
foreach (Group item in matches)
{
string yourMatchNumber = item.Value;
}

Try like this;
string s = #"<ad nameId=""\862094\""></ad>";
if (!(s.Contains(",")))
{
string extractedValues = s.Substring(s.IndexOf("\\") + 1, s.LastIndexOf("\\") - s.IndexOf("\\") - 1);
}
else
{
string[] array = s.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
string extractedMultipleValues = "";
for (int i = 0; i < array.Length; i++)
{
extractedMultipleValues += array[i].Substring(array[i].IndexOf("\\") + 1, array[i].LastIndexOf("\\") - array[i].IndexOf("\\") - 1) + ",";
}
Console.WriteLine(extractedMultipleValues.Substring(0, extractedMultipleValues.Length -1));
}

mhasan, here goes an example of what you need(well almost)
EDITED: complete code (it's a little tricky)
(Sorry for the image but i have some troubles with tags in the editor, i can send the code by email if you want :) )
A little explanation about the code, it replaces all ocurrences of parsePattern in the given string, so if the given string has multiple tags separated by "," the final result will be the numbers separated by "," stored in parse variable....
Hope it helps

Related

Get first word on every new line in a long string?

I am trying to add a leaderboard in my unity app
I have a long string as below(just an example, actual string is http pipe data from my web service, not manually stored):
string str ="name1|10|junk data.....\n
name2|9|junk data.....\n
name3|8|junk data.....\n
name4|7|junk data....."
I want to get the first word (string before the first pipe '|' like name1,name2...) from every line and store it in an array and then get the numbers (10,9,8... arter the '|') and store it in an other one.
Anyone know whats the best way to do this?
Fiddle here: https://dotnetfiddle.net/utp4HK
code below, you may want to revisit the algorithm for performance, but if that is not an issue, this will do the trick;
using System;
public class Program
{
public static void Main()
{
string str ="name1|10|junk data.....\nname2|9|junk data.....\nname3|8|junkdata.....\nname4|7|junk data.....";
foreach (var line in str.Split('\n'))
{
Console.WriteLine(line.Split('|')[0]);
}
}
}
First split by new-line characters:
string[] lines = str.Split(new string[]{Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
Then you can use LINQ to get both arrays:
var data = lines.Select(l => l.Trim().Split('|')).Where(arr => arr.Length > 1);
string[] names = data.Select(arr => arr[0].Trim()).ToArray();
string[] numbers = data.Select(arr => arr[1].Trim()).ToArray();
Check out this link on splitting strings: http://msdn.microsoft.com/en-us/library/ms228388.aspx
You could first create an array of strings (one for each line) by splitting the long string with \n as the delimeter.
Then, you could split each line with | as the delimeter. The name would be the 0th index of the array and the number would be the 1st index of the array.
First of all, you can't have a multi line string without using verbatim string literal. With using verbatim string literal, you can split your string based on \r\n or Environment.NewLine like;
string str = #"name1|10|junk data.....
name2|9|junk data.....
name3|8|junk data.....
name4|7|junk data.....";
var array = str.Split(new []{Environment.NewLine},
StringSplitOptions.RemoveEmptyEntries);
foreach (var item in array)
{
Console.WriteLine(item.Split(new[]{"|"},
StringSplitOptions.RemoveEmptyEntries)[0].Trim());
}
Output will be;
name1
name2
name3
name4
Try this:
string str ="name1|10|junk data.....\n" +
"name2|9|junk data.....\n" +
"name3|8|junk data.....\n" +
"name4|7|junk data.....";
string[] tempArray1 = str.Split('\n');
string[] tempArray2 = null;
string[,] newArray = null;
for (int i = 0; i < tempArray1.Length; i++)
{
tempArray2 = tempArray1[i].Split('|');
if (newArray[0, 0].ToString().Length == 0)
{
newArray = new string[tempArray1.Length, tempArray2.Length];
}
for (int j = 0; j < tempArray2.Length; j++)
{
newArray[i,j] = tempArray2[j];
}
}

Extracting parts of a string c#

In C# what would be the best way of splitting this sort of string?
%%x%%a,b,c,d
So that I end up with the value between the %% AND another variable containing everything right of the second %%
i.e. var x = "x"; var y = "a,b,c,d"
Where a,b,c.. could be an infinite comma seperated list. I need to extract the list and the value between the two double-percentage signs.
(To combat the infinite part, I thought perhaps seperating the string out to: %%x%% and a,b,c,d. At this point I can just use something like this to get X.
var tag = "%%";
var startTag = tag;
int startIndex = s.IndexOf(startTag) + startTag.Length;
int endIndex = s.IndexOf(tag, startIndex);
return s.Substring(startIndex, endIndex - startIndex);
Would the best approach be to use regex or use lots of indexOf and substring to do the extracting based on te static %% characters?
Given that what you want is "x,a,b,c,d" the Split() function is actually pretty powerful and regex would be overkill for this.
Here's an example:
string test = "%%x%%a,b,c,d";
string[] result = test.Split(new char[] { '%', ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string s in result) {
Console.WriteLine(s);
}
Basicly we ask it to split by both '%' and ',' and ignore empty results (eg. the result between "%%"). Here's the result:
x
a
b
c
d
To Extract X:
If %% is always at the start then;
string s = "%%x%%a,b,c,d,h";
s = s.Substring(2,s.LastIndexOf("%%")-2);
//Console.WriteLine(s);
Else;
string s = "v,u,m,n,%%x%%a,b,c,d,h";
s = s.Substring(s.IndexOf("%%")+2,s.LastIndexOf("%%")-s.IndexOf("%%")-2);
//Console.WriteLine(s);
If you need to get them all at once then use this;
string s = "m,n,%%x%%a,b,c,d";
var myList = s.ToArray()
.Where(c=> (c != '%' && c!=','))
.Select(c=>c).ToList();
This'll let you do it all in one go:
string pattern = "^%%(.+?)%%(?:(.+?)(?:,|$))*$";
string input = "%%x%%a,b,c,d";
Match match = Regex.Match(input, pattern);
if (match.Success)
{
// "x"
string first = match.Groups[1].Value;
// { "a", "b", "c", "d" }
string[] repeated = match.Groups[2].Captures.Cast<Capture>()
.Select(c => c.Value).ToArray();
}
You can use the char.IsLetter to get all the list of letter
string test = "%%x%%a,b,c,d";
var l = test.Where(c => char.IsLetter(c)).ToArray();
var output = string.Join(", ", l.OrderBy(c => c));
Since you want the value between the %% and everything after in separate variables and you don't need to parse the CSV, I think a RegEx solution would be your best choice.
var inputString = #"%%x%%a,b,c,d";
var regExPattern = #"^%%(?<x>.+)%%(?<csv>.+)$";
var match = Regex.Match(inputString, regExPattern);
foreach (var item in match.Groups)
{
Console.WriteLine(item);
}
The pattern has 2 named groups called x and csv, so rather than just looping, you can easily reference them by name and assign them to values:
var x = match.Groups["x"];
var y = match.Groups["csv"];

Retrieve String Containing Specific substring C#

I am having an output in string format like following :
"ABCDED 0000A1.txt PQRSNT 12345"
I want to retreieve substring(s) having .txt in above string. e.g. For above it should return 0000A1.txt.
Thanks
You can either split the string at whitespace boundaries like it's already been suggested or repeatedly match the same regex like this:
var input = "ABCDED 0000A1.txt PQRSNT 12345 THE.txt FOO";
var match = Regex.Match (input, #"\b([\w\d]+\.txt)\b");
while (match.Success) {
Console.WriteLine ("TEST: {0}", match.Value);
match = match.NextMatch ();
}
Split will work if it the spaces are the seperator. if you use oter seperators you can add as needed
string input = "ABCDED 0000A1.txt PQRSNT 12345";
string filename = input.Split(' ').FirstOrDefault(f => System.IO.Path.HasExtension(f));
filname = "0000A1.txt" and this will work for any extension
You may use c#, regex and pattern, match :)
Here is the code, plug it in try. Please comment.
string test = "afdkljfljalf dkfjd.txt lkjdfjdl";
string ffile = Regex.Match(test, #"\([a-z0-9])+.txt").Groups[1].Value;
Console.WriteLine(ffile);
Reference: regexp
I did something like this:
string subString = "";
char period = '.';
char[] chArString;
int iSubStrIndex = 0;
if (myString != null)
{
chArString = new char[myString.Length];
chArString = myString.ToCharArray();
for (int i = 0; i < myString.Length; i ++)
{
if (chArString[i] == period)
iSubStrIndex = i;
}
substring = myString.Substring(iSubStrIndex);
}
Hope that helps.
First split your string in array using
char[] whitespace = new char[] { ' ', '\t' };
string[] ssizes = myStr.Split(whitespace);
Then find .txt in array...
// Find first element starting with .txt.
//
string value1 = Array.Find(array1,
element => element.Contains(".txt", StringComparison.Ordinal));
Now your value1 will have the "0000A1.txt"
Happy coding.

All elements before last comma in a string in c#

How can i get all elements before comma(,) in a string in c#?
For e.g.
if my string is say
string s = "a,b,c,d";
then I want all the element before d i.e. before the last comma.So my new string shout look like
string new_string = "a,b,c";
I have tried split but with that i can only one particular element at a time.
string new_string = s.Remove(s.LastIndexOf(','));
If you want everything before the last occurrence, use:
int lastIndex = input.LastIndexOf(',');
if (lastIndex == -1)
{
// Handle case with no commas
}
else
{
string beforeLastIndex = input.Substring(0, lastIndex);
...
}
Use the follwoing regex: "(.*),"
Regex rgx = new Regex("(.*),");
string s = "a,b,c,d";
Console.WriteLine(rgx.Match(s).Groups[1].Value);
You can also try:
string s = "a,b,c,d";
string[] strArr = s.Split(',');
Array.Resize(strArr, Math.Max(strArr.Length - 1, 1))
string truncatedS = string.join(",", strArr);

Getting rid of null/empty string values in a C# array

I have a program where an array gets its data using string.Split(char[] delimiter).
(using ';' as delimiter.)
Some of the values, though, are null. I.e. the string has parts where there is no data so it does something like this:
1 ;2 ; ; 3;
This leads to my array having null values.
How do I get rid of them?
Try this:
yourString.Split(new string[] {";"}, StringSplitOptions.RemoveEmptyEntries);
You could use the Where linq extension method to only return the non-null or empty values.
string someString = "1;2;;3;";
IEnumerable<string> myResults = someString.Split(';').Where<string>(s => !string.IsNullOrEmpty(s));
public static string[] nullLessArray(string[] src)
{
Array.Sort(src);
Array.Reverse(src);
int index = Array.IndexOf(src, null);
string[] outputArray = new string[index];
for (int counter = 0; counter < index; counter++)
{
outputArray[counter] = src[counter];
}
return outputArray;
}
You should replace multiple adjacent semicolons with one semicolon before splitting the data.
This would replace two semicolons with one semicolon:
datastr = datastr.replace(";;",";");
But, if you have more than two semicolons together, regex would be better.
datastr = Regex.Replace(datastr, "([;][;]+)", ";");
words = poly[a].Split(charseparators, StringSplitOptions.RemoveEmptyEntries);
foreach (string word in words)
{
richTextBox1.Text += (d + 1)+ " " + word.Trim(',')+ "\r\n";
d++;
}
charseparators is a space

Categories