one long string => array/list<string> from text within quotes - c#

3>>asdf3424"THIS TEXT".,.<<<>>3asfdf"THISTOO"6575tsdfbxbxcv"ANDTHIS",,p-01fa
To an array or list of { "THIS TEXT", "THISTOO, "ANDTHIS" }
Does anyone have an idea on how to efficiently do this?

var result = Regex.Matches(input, #"\"".+?\""")
.Cast<Match>()
.Select(m => m.Value)
.ToArray();

If you read each character at a time and look for a quotation mark, then read the following into a char array until you find a another quotation mark, then continue looking for one, you can have a list of char arrays that are easily transferable to string.
It should just be a simple while(still characters to be read).

If you have some big string maybe like this :
string str = "hello,hi,bye";
you may split it by comma something like this:
string[] breakups = str.Split(new[] {',' });

Related

c# regex replace everything including a word base64, with nothing and keeping rest of the string

I am wanting to take a string and find base64, and get rid of that and everything prior to that
example
"asdfjljlkjaldf_base64,234u0909230948098234082304802384023094"
Notice "base64," ... I want ONLY everything after "base64,"
Desired: "234u0909230948098234082304802384023094"
I was looking at this code
"string test = "hello, base64, matching";
string regexStrTest;
regexStrTest = #"test\s\w+";
MatchCollection m1 = Regex.Matches(base64,, regexStrTest);
//gets the second matched value
string value = m1[1].Value;
but that is not quite what I want..
Why regular expressions? IndexOf + Substring seems to be quite enough:
string source = "asdfjljlkjaldf_base64,234u0909230948098234082304802384023094";
string tag = "base64,";
string result = source.Substring(source.IndexOf(tag) + tag.Length);
You tried a regex that matches test, a whitespace, and 1+ word chars. The input string just did not match it.
You may use
var results = Regex.Matches(s, #"base64,(\w+)")
.Cast<Match>()
.Select(m => m.Groups[1].Value)
.ToList();
See the regex demo.
The pattern matches base64, substring and then captures into Group 1 one or more word chars with (\w+) pattern. The captured value is stored inside match.Groups[1].Value, just what you get with .Select(m => m.Groups[1].Value).
Some of the other answers are good. Here is a very simple regex
string yourData = "asdfjljlkjaldf_base64,234u0909230948098234082304802384023094";
var newString = Regex.Replace(yourData, "^.*base64,", "");

Splitting on “,” but not “/,”

Question: How do I write an expression to split a string on ',' but not '/,'? Later I'll want to replace '/,' with ', '.
Details...
Delimiter: ','
Skip Char: '/'
Example input: "Mister,Bill,is,made,of/,clay"
I want to split this input into an array: {"Mister", "Bill", "is", "made", "of, clay"}
I know how to do this with a char prev, cur; and some indexers, but that seems beta.
Java Regex has a split functionality, but I don't know how to replicate this behavior in C#.
Note: This isn't a duplicate question, this is the same question but for a different language.
I believe you're looking for a negative lookbehind:
var regex = new Regex("(?<!/),");
var result = regex.Split(str);
this will split str on all commas that are not preceded by a slash. If you want to keep the '/,' in the string then this will work for you.
Since you said that you wanted to split the string and later replace the '/,' with ', ', you'll want to do the above first then you can iterate over the result and replace the strings like so:
var replacedResult = result.Select(s => s.Replace("/,", ", ");
string s = "Mister,Bill,is,made,of/,clay";
var arr = s.Replace("/,"," ").Split(',');
result : {"Mister", "Bill", "is", "made", "of clay"}
Using Regex:
var result = Regex.Split("Mister,Bill,is,made,of/,clay", "(?<=[^/]),");
Just use a Replace to remove the commas from your string :
s.Replace("/,", "//").Split(',').Select(x => x.Replace("//", ","));
You can use this in c#
string regex = #"(?:[^\/]),";
var match = Regex.Split("Mister,Bill,is,made,of/,clay", regex, RegexOptions.IgnoreCase);
After that you can replace /, and continue your operation as you like

Simple Regex Befuddlement

I have some strings of the form
string strA = "Cmd:param1:'C:\\SomePath\SomeFileName.ext':param2";
string strB = "Cmd:'C:\\SomePath\SomeFileName.ext':param2:param3";
I want to split this string on ':' so I can extract the N parameters. Some parameters can contain file paths [as explicitly] shown and I don't want to split on the ':'s that are within the parentheses. I can do this with a regex but I am confused as to how to get the regex to split only if there is no "'" on both sides of the colon.
I have attempted
string regex = #"(?<!'):(?!')";
string regex = #"(?<!'(?=')):";
that is continue matching only if no "'" on the left and no "'" on the right (negative look behind/ahead), but this is still splitting on the colon contained in 'C:\SomePath\SomeFileName.ext'.
How can I amend this regex to do as I require?
Thanks for your time.
Note: I have found that the following regex works. However, I would like to know if there is a better way of doing this?
string regex = #"(?<!.*'.*):|:(?!.*'.*)";
Consider this approach:
var guid = Guid.NewGuid().ToString();
var r = Regex.Replace(strA, #"'.*'", m =>
{
return m.Value.Replace(":", guid);
})
.Split(':')
.Select(s => s.Replace(guid, ":"))
.ToList();
Rather than try to construct a lookbehind regex to split on, you could construct a regex to match the fields themselves and take the set of matches of that regex. EG a field is either a quoted sequence of non-quotes (ie can include :), or it can't include the separator:
string regex = "'[^']*'|[^':]*";
var result = Regex.Matches(strA, regex);
You want to split on (?<!\b[a-z]):(?!\\) (use RegexOptions.IgnoreCase).
Not as pretty but you could replace :\ with safe characters and then return them back to :\ after the split.
string[] param = strA.Replace(#":\", "|||").Split(':').Select(x => x.Replace("|||", #":\")).ToArray();

Searching for a RegEx to split a text in it words

I am searching for a RegularExpression to split a text in it words.
I have tested
Regex.Split(text, #"\s+")
But this gives me for example for
this (is a) text. and
this
(is
a)
text
and
But I search for a solution, that gives me only the words - without the (, ), . etc.
It should also split a text like
end.begin
in two words.
Try this:
Regex.Split(text, #"\W+")
\W is the counterpart to \w, which means alpha-numeric.
You're probably better off matching the words rather than splitting.
If you use Split (with \W as Regexident suggested), then you could get an extra string at the beginning and end. For example, the input string (a b) would give you four outputs: "", "a", "b", and another "", because you're using the ( and ) as separators.
What you probably want to do is just match the words. You can do that like this:
Regex.Matches(text, "\\w+").Cast<Match>().Select(match => match.Value)
Then you'll get just the words, and no extra empty strings at the beginning and end.
You can do:
var text = "this (is a) text. and";
// to replace unwanted characters with space
text = System.Text.RegularExpressions.Regex.Replace(text, "[(),.]", " ");
// to split the text with SPACE delimiter
var splitted = text.Split(null as char[], StringSplitOptions.RemoveEmptyEntries);
foreach (var token in splitted)
{
Console.WriteLine(token);
}
See this Demo

Substring of a variant string

I have the following return of a printer:
{Ta005000000000000000000F 00000000000000000I 00000000000000000N 00000000000000000FS 00000000000000000IS 00000000000000000NS 00000000000000000}
Ok, I need to save, in a list, the return in parts.
e.g.
[0] "Ta005000000000000000000F"
[1] "00000000000000000I"
[2] "00000000000000000N"
...
The problem is that the number of characters varies.
A tried to make it going into the 'space', taking the substring, but failed...
Any suggestion?
Use String.Split on a single space, and use StringSplitOptions.RemoveEmptyEntries to make sure that multiple spaces are seen as only one delimiter:
var source = "00000000000000000FS 0000000...etc";
var myArray = source.Split(' ', StringSplitOptions.RemoveEmptyEntries);
#EDIT: An elegant way to get rid of the braces is to include them as separators in the Split (thanks to Joachim Isaksson in the comments):
var myArray = source.Split(new[] {' ', '{', '}'}, StringSplitOptions.RemoveEmptyEntries);
You could use a Regex for this:
string input = "{Ta005000000000000000000F 00000000000000000I 00000000000000000N 00000000000000000FS 00000000000000000IS 00000000000000000NS 00000000000000000}";
IEnumerable<string> matches = Regex.Matches(input, "[0-9a-zA-Z]+").Select(m => m.Value);
You can use string.split to create an array of substrings. Split allows you to specify multiple separator characters and to ignore repeated splits if necessary.
You could use the .Split member of the "String" class and split the parts up to that you want.
Sample would be:
string[] input = {Ta005000000000000000000F 00000000000000000I 00000000000000000N 00000000000000000FS 00000000000000000IS 00000000000000000NS 00000000000000000};
string[] splits = input.Split(' ');
Console.WriteLine(splits[0]); // Ta005000000000000000000F
And so on.
Just off the bat. Without considering the encompassing braces:
string printMsg = "Ta005000000000000000000F 00000000000000000I
00000000000000000N 00000000000000000FS
00000000000000000IS 00000000000000000NS 00000000000000000";
string[] msgs = printMsg.Split(' ').ForEach(s=>s.Trim()).ToArray();
Could work.

Categories