I'm facing an odd problem where C# is unable to split a string on new lines. I tried many combinations, such as using only Split('\n'), but all of them return the whole string unsplit in the first position of the array, so lines[0] is the same as the input string. That never happened before with other strings I had to parse.
String:
Don't remove the following keywords! These keywords are used in the
"compatible printer" condition of the print and filament profiles to
link the particular print and filament profiles to this printer
profile.\nPRINTER_VENDOR_PRUSA3D\nPRINTER_MODEL_SL1\nPRINTER_VENDOR_EPAX\nPRINTER_MODEL_X1\n\nSTART_CUSTOM_VALUES\nFLIP_XY\nLayerOffTime_0\nBottomLightOffDelay_2\nBottomLiftHeight_5\nLiftHeight_5.5\nBottomLiftSpeed_40.2\nLiftSpeed_60\nRetractSpeed_150\nBottomLightPWM_255\nLightPWM_255\nAntiAliasing_4
; Use 0 or 1 for disable AntiAliasing with "printer gamma correction"
set to 0, otherwise use multiples of 2 and "gamma correction" set to 1
for enable\nEND_CUSTOM_VALUES
Code:
var lines = previousString.Split(new[] { "\r\n", "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries);
Output:
An array of length 1, where lines[0] == previousString
string[] lines = theText.Split(
new[] { Environment.NewLine },
StringSplitOptions.None
);
edit:
string[] lines = theText.Split(
new[] { "\r\n", "\r", "\n" },
StringSplitOptions.None
);
working fiddle: https://dotnetfiddle.net/HNY8a6
See: this SO post
Sometimes when you see a \n on screen it really is a backslash (ASCII 92) followed by an n (ASCII 110), not an escape sequence for a newline (ASCII 10). A big hint for that here is that text boxes will usually not display newlines as escape codes but will show actual line breaks.
To split on \n use the string "\\n", which represents a string of two characters: the two backslashes produce a single character (ASCII 92 = '\') in the string, followed by a lowercase n.
Alternatively you could use @"\n". The @ sign tells C# not to process escape codes in the quoted string.
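To make the difference concrete, here is a minimal sketch. It assumes the input really contains the two characters backslash and n rather than a line feed; the class and method names are made up for illustration.

```csharp
using System;

public static class LiteralNewlineDemo
{
    // Splits on real line breaks and on the literal two-character
    // sequence backslash + n.
    public static string[] SplitLines(string text)
    {
        return text.Split(
            new[] { "\r\n", "\r", "\n", "\\n" },
            StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // @"..." is a verbatim string, so \n below is a backslash and an 'n',
        // not a line feed.
        string raw = @"profile.\nPRINTER_VENDOR_PRUSA3D\nPRINTER_MODEL_SL1";

        Console.WriteLine(raw.Split('\n').Length);   // 1: no real line feed in the string
        Console.WriteLine(SplitLines(raw).Length);   // 3
    }
}
```

Note that splitting on the literal sequence would also cut text that legitimately contains a backslash followed by n, so it only makes sense when the data is known to use that convention.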
I'm not quite sure why you are using the Printer methods but I hope you don't require them.
string test = "Hello \nTest \n123"; // Create test string
string[] separated = test.Split('\n'); // Split string by '\n'
for (int i = 0; i < separated.Length; i++) { // Output substrings
Console.WriteLine(separated[i]);
}
Output:
Hello
Test
123
I hope this solution works for you!
Edit: Added \r\n and \r support
If you also need to split strings by '\r' or '\r\n' then this code is the one to go with.
string test = "Hello \r\nTest \n123 \rEnd"; //Create Test String
test = test.Replace("\r\n","\n");
test = test.Replace("\r","\n");
string[] separated = test.Split('\n'); // Split string by '\n'
for (int i = 0; i < separated.Length; i++) { // Output substrings
Console.WriteLine(separated[i]);
}
Output:
Hello
Test
123
End
Edit2: Hopefully Solution
So you are saying that
\nPRINTER_VENDOR_PRUSA3D\nPRINTER_MODEL_SL1\nPRINTER_VENDOR_EPAX\nPRINTER_MODEL_X1\n\nSTART_CUSTOM_VALUES\nFLIP_XY\nLayerOffTime_0\nBottomLightOffDelay_2\nBottomLiftHeight_5\nLiftHeight_5.5\nBottomLiftSpeed_40.2\nLiftSpeed_60\nRetractSpeed_150\nBottomLightPWM_255\nLightPWM_255\nAntiAliasing_4 ; Use 0 or 1 for disable AntiAliasing with "printer gamma correction" set to 0, otherwise use multiples of 2 and "gamma correction" set to 1 for enable\nEND_CUSTOM_VALUES
is the string then the problem might be that this string contains some " which will interfere with the .Split method
If you're able to input the string manually, you should replace each plain " with \"
I have input file like this:
input.txt
aa#aa.com bb#bb.com "Information" "Hi there"
cc#cc.com dd#dd.com "Follow up" "Interview"
I have used this method:
string[] words = item.Split(' ');
However, it splits on every space. I also have spaces inside the quoted strings, and I don't want to split on those.
Basically I want to parse this input from file to this output:
From = aa#aa.com
To = bb#bb.com
Subject = Information
Body = Hi there
How do I split these strings in C#?
You can simply use Regex, as described in this question:
// requires: using System.Linq; using System.Text.RegularExpressions;
var stringValue = "aa#aa.com bb#bb.com \"Information\" \"Hi there\"";
var parts = Regex.Matches(stringValue, @"[\""].+?[\""]|[^ ]+")
.Cast<Match>()
.Select(m => m.Value)
.ToList();
// parts:
// aa#aa.com
// bb#bb.com
// "Information"
// "Hi there"
Also you may try Replace function to remove those " characters.
The String.Split() method has an overload that allows you to specify the number of splits required. You can get what you want like this:
Read one line at a time
Call input.Split(new[] { " " }, 3, StringSplitOptions.None) - this returns an array of strings with 3 parts. Since email addresses don't have spaces in them, the first two strings will be the from/to addresses, and the third string will be the subject and message. Assume the result of this call is stored in firstSplit[], then firstSplit[0] is the from address, firstSplit[1] is the to address, and firstSplit[2] is the subject and message combined.
Call firstSplit[2].Split(new[] { "\" \"" }, 2, StringSplitOptions.None) - this searches for the string " " in the concatenated subject+message from the previous call, which should pinpoint the separator between the end of the subject and the start of the message. This will give you the subject and message in another array. (The double quotes inside the literal are escaped with backslashes.)
This assumes you disallow double quotes in your subject and message. If you do allow double quotes, then you need to ensure you escape them before putting it in the file in the first place.
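The two calls above can be put together roughly as follows. This is only a sketch under the same assumption (no double quotes inside the subject or the message); MessageLineParser and Parse are illustrative names, and the outer quotes are trimmed as an extra step.

```csharp
using System;

public static class MessageLineParser
{
    // Returns { from, to, subject, body } for a line shaped like:
    //   from to "subject" "body"
    public static string[] Parse(string line)
    {
        // Step 1: at most 3 parts, so the quoted remainder stays in one piece.
        string[] firstSplit = line.Split(new[] { " " }, 3, StringSplitOptions.None);

        // Step 2: split the remainder on the quote-space-quote separator.
        string[] quoted = firstSplit[2].Split(new[] { "\" \"" }, 2, StringSplitOptions.None);

        return new[]
        {
            firstSplit[0],
            firstSplit[1],
            quoted[0].TrimStart('"'),   // drop the opening quote of the subject
            quoted[1].TrimEnd('"')      // drop the closing quote of the body
        };
    }

    public static void Main()
    {
        string[] parts = Parse("cc#cc.com dd#dd.com \"Follow up\" \"Interview\"");
        Console.WriteLine("From = " + parts[0]);
        Console.WriteLine("To = " + parts[1]);
        Console.WriteLine("Subject = " + parts[2]);
        Console.WriteLine("Body = " + parts[3]);
    }
}
```

A malformed line (fewer than three space-separated parts, or no quote-space-quote separator) would throw an IndexOutOfRangeException here, so real input would need a length check first.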
You can do this without regex by using just IndexOf and Substring. Put it in a loop if you have multiple lines to parse.
It's not pretty, but it may be faster than Regex if you're doing a lot of them.
string content = @"aa#aa.com bb#bb.com ""Information"" ""Hi there""";
string firstEmail = content.Substring(0, content.IndexOf(" ", StringComparison.Ordinal));
// Skip the space after the first address, then read up to the next space.
int secondStart = firstEmail.Length + 1;
string secondEmail = content.Substring(secondStart, content.IndexOf(" ", secondStart, StringComparison.Ordinal) - secondStart);
int firstQuote = content.IndexOf("\"", StringComparison.Ordinal);
// Everything from the first quote to the end of the line.
string subjectAndMessage = content.Substring(firstQuote);
string[] words = subjectAndMessage.Split(new string[] { "\" \"" }, StringSplitOptions.None);
Console.WriteLine(firstEmail);
Console.WriteLine(secondEmail);
Console.WriteLine(words[0].Remove(0, 1)); // drop the leading quote
Console.WriteLine(words[1].Remove(words[1].Length - 1)); // drop the trailing quote
Output:
aa#aa.com
bb#bb.com
Information
Hi there
As Spencer pointed out, read the file line by line using the File.ReadAllLines() method and then apply the String.Split() method to each line to split on spaces, using something like this:
string[] elements = line.Split(new char[0]); // an empty separator array splits on white space
UPDATE
Not a pretty solution, but this is how I think it can work:
string[] readText = File.ReadAllLines("input.txt");
foreach (string line in readText)
{
string[] fields = line.Split(new char[0]);
// fields[0] and fields[1] are the addresses; rejoin everything after them.
string temp = string.Join(" ", fields, 2, fields.Length - 2);
}
Requires reference to Microsoft.VisualBasic, but a bit more reliable than Regex:
using (var tfp = new Microsoft.VisualBasic.FileIO.TextFieldParser("input.txt")) {
for (tfp.SetDelimiters(" "); !tfp.EndOfData;) {
string[] fields = tfp.ReadFields();
Debug.Print(string.Join(",", fields)); // "aa#aa.com,bb#bb.com,Information,Hi there"
}
}
I need help to develop a logic to split a string, but only based on the last 2 delimiters of the string.
Example inputs:
string s1 = @"Dog \ Cat \ Bird \ Cow";
string s2 = @"Hello \ World \ How \ Are \ You";
string s3 = @"I \ am \ Peter";
Expected Outputs:
string[] newS1 = { "Dog Cat", "Bird", "Cow" };
string[] newS2 = { "Hello World How", "Are", "You" };
string[] newS3 = { "I", "am", "Peter" };
So, as you can see, I only want to split the string on the last 2 "\", and everything else before the last 2 "\" will be concatenated into one string.
I tried the .Split method but it will just split every "\" in a string.
Edited: If the string has less than 2 "\", it will just split according to whatever it has
Updates: Wow, these are a bunch of interesting solutions! Thank you a lot!
Try this:
var parts = s1.Split(new[] { " \\ " }, StringSplitOptions.None);
var partsCount = parts.Count();
var result = new[] { string.Join(" ", parts.Take(partsCount - 2)) }.Concat(parts.Skip(partsCount - 2));
Offering a regex solution:
var output = Regex.Split(input, @"\s*\\\s*([^\\]*?)\s*\\\s*(?=[^\\]*$)");
This split finds the second to last element and splits around that, but captures it in a group so it will be included in the output array.
For input "Dog \ Cat \ Bird \ Cow", this will produce { "Dog \ Cat", "Bird", "Cow" }. If you also need to strip the \ out of the first element that can be done with a simple replace:
output[0] = output[0].Replace(" \\", "");
Update: This version will correctly handle strings with only one delimiter:
var output = Regex.Split(str, @"\s*\\\s*([^\\]*?)\s*\\\s*(?=[^\\]*$)|(?<=^[^\\\s]*)\s*\\\s*(?=[^\\\s]*$)");
Update: And to match other delimiters like whitespace, "~", and "%", you can use a character class:
var output = Regex.Split(str, @"(?:[%~\s\\]+([^%~\s\\]+?)[%~\s\\]+|(?<=^[^%~\s\\]+)[%~\s\\]+)(?=[^%~\s\\]+$)");
The structure of this regex is slightly simpler than the previous one since it represents any sequence of one or more characters in the class [%~\s\\] as a delimiter, and any sequence of one or more characters in the negated character class [^%~\s\\] to be a segment. Note that the \s means 'whitespace' character.
You might also be able to simplify this further using:
var output = Regex.Split(str, @"(?:\W+(\w+)\W+|(?<=^\w+)\W+)(?=\w+$)");
Where \w matches any 'word' character (letters, digits, or underscores) and \W matches any 'non-word' character.
Looks like you want to Split the string on every <space>\<space>:
string input = @"Dog \ Cat \ Bird \ Cow";
string[] parts = input.Split(new string[]{@" \ "},
StringSplitOptions.None);
And then Join everything with a space in between, except the final two parts:
// NOTE: Check that there are at least 2 parts.
string part0 = String.Join(" ", parts.Take(parts.Length - 2));
string part1 = parts[parts.Length - 2];
string part2 = parts[parts.Length - 1];
This will give you three strings, which you can put in an array.
string[] newParts = new []{ part0, part1, part2 };
In this example:
new [] { "Dog Cat", "Bird", "Cow" }
How about simply taking the output of split, then taking first N-2 items and Join back together, then create new string array of 3 items, first being output of Join, second being item N-1 of first split, and third being N of first split. I think that'll accomplish what you're trying to do.
Interesting question. My initial solution to this would be:
String[] tokens = theString.Split('\\');
String[] components = new String[3];
for (int i = 0; i < tokens.Length - 2; i++)
{
components[0] += tokens[i];
}
components[1] = tokens[tokens.Length - 2];
components[2] = tokens[tokens.Length - 1];
Loop from the end of the string and count delimiters until you encounter two.
Record index positions in 2 variables previously set to -1.
After the loop, if first var is -1, nothing happens, return whole string.
If second var is -1, create array of 2 strings, split using substring and return.
Create array of 3 string, split using information from two vars, return.
Hope you understood my pseudocode, give me a comment if you need help.
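The pseudocode above might look roughly like this in C#. It is a sketch, not the answerer's own code: it uses LastIndexOf to find the two positions instead of a hand-written loop, trims each part, and joins the leading segments with a space to match the expected output in the question. LastTwoSplitter, SplitOnLastTwo and JoinHead are illustrative names.

```csharp
using System;

public static class LastTwoSplitter
{
    // Splits only on the last two occurrences of the delimiter.
    // With fewer than two delimiters it splits on whatever is there.
    public static string[] SplitOnLastTwo(string input, char delimiter)
    {
        int last = input.LastIndexOf(delimiter);
        if (last < 0)
        {
            return new[] { input };   // no delimiter: return the whole string
        }

        // Search backwards from just before the last delimiter.
        int secondLast = last > 0 ? input.LastIndexOf(delimiter, last - 1) : -1;
        if (secondLast < 0)
        {
            // exactly one delimiter: two parts
            return new[]
            {
                input.Substring(0, last).Trim(),
                input.Substring(last + 1).Trim()
            };
        }

        return new[]
        {
            JoinHead(input.Substring(0, secondLast), delimiter),
            input.Substring(secondLast + 1, last - secondLast - 1).Trim(),
            input.Substring(last + 1).Trim()
        };
    }

    // Removes the remaining delimiters from the head and joins its pieces with spaces.
    private static string JoinHead(string head, char delimiter)
    {
        string[] pieces = head.Split(delimiter);
        for (int i = 0; i < pieces.Length; i++)
        {
            pieces[i] = pieces[i].Trim();
        }
        return string.Join(" ", pieces);
    }

    public static void Main()
    {
        string[] result = SplitOnLastTwo(@"Dog \ Cat \ Bird \ Cow", '\\');
        Console.WriteLine(string.Join(" | ", result));   // Dog Cat | Bird | Cow
    }
}
```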
Quick little question...
I need to count the length of a string, but WITHOUT the spaces inside of it.
E.g. for a string like "I am Bob", string.Length would return 8 (6 letters + 2 spaces).
I need a method, or something, to give me the length (or number of) just the letters (6 in the case of "I am Bob")
I have tried the following
s.Replace (" ", "");
s.Replace (" ", null);
s.Replace (" ", string.Empty);
to try and get "IamBob", which I did, but it didn't solve my problem because it still counted "" as a character.
Any help?
This returns the number of non-whitespace characters:
"I am Bob".Count(c => !Char.IsWhiteSpace(c));
Demo
Char.IsWhiteSpace:
White space characters are the following Unicode characters:
Members of the SpaceSeparator category, which includes the characters SPACE (U+0020), OGHAM SPACE MARK (U+1680), MONGOLIAN VOWEL SEPARATOR (U+180E), EN QUAD (U+2000), EM QUAD (U+2001), EN SPACE (U+2002), EM SPACE (U+2003), THREE-PER-EM SPACE (U+2004), FOUR-PER-EM SPACE (U+2005), SIX-PER-EM SPACE (U+2006), FIGURE SPACE (U+2007), PUNCTUATION SPACE (U+2008), THIN SPACE (U+2009), HAIR SPACE (U+200A), NARROW NO-BREAK SPACE (U+202F), MEDIUM MATHEMATICAL SPACE (U+205F), and IDEOGRAPHIC SPACE (U+3000).
Members of the LineSeparator category, which consists solely of the LINE SEPARATOR character (U+2028).
Members of the ParagraphSeparator category, which consists solely of the PARAGRAPH SEPARATOR character (U+2029).
The characters CHARACTER TABULATION (U+0009), LINE FEED (U+000A), LINE TABULATION (U+000B), FORM FEED (U+000C), CARRIAGE RETURN (U+000D), NEXT LINE (U+0085), and NO-BREAK SPACE (U+00A0).
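As a small illustration of why the list above matters, the following sketch compares counting with Char.IsWhiteSpace against removing only the space character. The test string and the names NonWhitespaceCounter and CountNonWhitespace are made up for the example.

```csharp
using System;
using System.Linq;

public static class NonWhitespaceCounter
{
    // Counts every character that Char.IsWhiteSpace does not classify as white space.
    public static int CountNonWhitespace(string text)
    {
        return text.Count(c => !char.IsWhiteSpace(c));
    }

    public static void Main()
    {
        // Contains a tab and a line feed as well as an ordinary space.
        string s = "I\tam Bob\n";

        Console.WriteLine(CountNonWhitespace(s));       // 6
        Console.WriteLine(s.Replace(" ", "").Length);   // 8: the tab and line feed remain
    }
}
```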
No. It doesn't.
string s = "I am Bob";
Console.WriteLine(s.Replace(" ", "").Length); // 6
Console.WriteLine(s.Replace(" ", null).Length); //6
Console.WriteLine(s.Replace(" ", string.Empty).Length); //6
Here is a DEMO.
But what are whitespace characters?
http://en.wikipedia.org/wiki/Whitespace_character
You probably forgot to reassign the result of Replace. Try this:
string s = "I am bob";
Console.WriteLine(s.Length); // 8
s = s.Replace(" ", "");
Console.WriteLine(s.Length); // 6
A pretty simple way is to write an extension method that will do just that- count the characters without the white spaces. Here's the code:
public static class MyExtension
{
public static int CharCountWithoutSpaces(this string str)
{
string[] arr = str.Split(' ');
string allChars = "";
foreach (string s in arr)
{
allChars += s;
}
int length = allChars.Length;
return length;
}
}
To execute, simply call the method on the string:
string yourString = "I am Bob";
int count = yourString.CharCountWithoutSpaces();
Console.WriteLine(count); //=6
Alternatively, you can split the string any way you want if you don't want to include, say, periods or commas:
string[] arr = str.Split('.');
or:
string[] arr = str.Split(',');
This is the fastest way:
var spaceCount = 0;
for (var i = 0; i < @string.Length; i++)
{
if (@string[i] == ' ') spaceCount++;
}
var res = @string.Length - spaceCount;
Your problem is probably that the Replace() method does not change the string in place; it returns the replaced value:
string withSpaces = "I am Bob";
string withoutSpaces = withSpaces.Replace(" ","");
Console.WriteLine(withSpaces);
Console.WriteLine(withoutSpaces);
Console.WriteLine(withSpaces.Length);
Console.WriteLine(withoutSpaces.Length);
//output
//I am Bob
//IamBob
//8
//6
You can use a combination of Length and Count functions on the string object. Here is a simple example.
string sText = "This is great text";
int nNonSpaces = sText.Length - sText.Count(Char.IsWhiteSpace); // requires using System.Linq
This will handle single or multiple (consecutive) spaces accurately.
Hope it helps.
It does not return what I expected.
I expected something like:
ab
cab
ab
What am I doing wrong?
Don't use .ToCharArray().
It makes Split treat \r and \n as two separate delimiters,
which is why you get empty values.
Something like this should work:
var aa = ("a" + Environment.NewLine + "b" + Environment.NewLine + "c").Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);
Since you are splitting on "\r" and "\n", String.Split extracts the empty string from "\r\n".
Take a look at StringSplitOptions.RemoveEmptyEntries or use new String[] { "\r\n" } instead of "\r\n".ToCharArray().
You are just splitting the string using \r or \n as delimiters, not \r\n together.
Environment.NewLine is probably the way to go, but if not, this works:
var ab = "a\r\nb\r\nc";
var abs = ab.Split(new[]{"\r\n"}, StringSplitOptions.None);
This option also works,
string [] b = Regex.Split(abc, "\r\n");
My understanding is that the char sequence you provide to the Split method is a list of delimiter characters, not a single delimiter made of several characters.
In your case, Split considers the '\r' and '\n' characters as separate delimiters. So when it encounters the "\r\n" sequence, it returns the string between those 2 delimiters, which is an empty string.
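A short sketch of the two behaviours side by side, using the same "a\r\nb\r\nc" input as the answers above. The class and method names are only for illustration.

```csharp
using System;

public static class CrLfSplitDemo
{
    // Each char is its own delimiter, so every "\r\n" produces an
    // empty entry between the '\r' and the '\n'.
    public static string[] SplitOnChars(string text)
    {
        return text.Split("\r\n".ToCharArray());
    }

    // The whole two-character string is one delimiter.
    public static string[] SplitOnString(string text)
    {
        return text.Split(new[] { "\r\n" }, StringSplitOptions.None);
    }

    public static void Main()
    {
        string ab = "a\r\nb\r\nc";
        Console.WriteLine(SplitOnChars(ab).Length);    // 5: a, "", b, "", c
        Console.WriteLine(SplitOnString(ab).Length);   // 3: a, b, c
    }
}
```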
I want to replace certain characters in an input string with other characters.
The input text has Microsoft left and right smart quotes which I would like to convert to just a single ".
I was planning on using the Replace operation, but am having trouble forming the text string to be searched for.
I would like to replace the input sequence (in hex) \xE2809C, and change that sequence to just a single ". Ditto with \xE2809D.
How do I form the string to use in the Replace operation?
I'm thinking of something like (in a loop):
tempTxt = tempTxt.Replace(charsToRemove[i], charsToSubstitute[i]);
but I'm having trouble creating the charsToRemove array.
Maybe a bigger question is whether the whole input file can be read and converted to plain ASCII using some read/write and string conversions in C#.
Thanks, Mike
Something like this?
char [] charsToRemove = {
'\u201C', // These are the Unicode code points (not the UTF representation)
'\u201D'
};
char [] charsToSubstitute = {
'"',
'"'
};
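Here is a minimal sketch of how those arrays could be used in the Replace loop from the question. It also shows that E2 80 9C is the UTF-8 byte sequence for the code point U+201C, assuming the hex sequence in the question is raw UTF-8 bytes. SmartQuoteDemo and StraightenQuotes are illustrative names.

```csharp
using System;
using System.Text;

public static class SmartQuoteDemo
{
    // Replaces left and right double smart quotes with a plain ".
    public static string StraightenQuotes(string text)
    {
        char[] charsToRemove = { '\u201C', '\u201D' };
        char[] charsToSubstitute = { '"', '"' };

        for (int i = 0; i < charsToRemove.Length; i++)
        {
            text = text.Replace(charsToRemove[i], charsToSubstitute[i]);
        }
        return text;
    }

    public static void Main()
    {
        // Decoded as UTF-8, the three bytes E2 80 9C become the single char U+201C.
        string decoded = Encoding.UTF8.GetString(new byte[] { 0xE2, 0x80, 0x9C });
        Console.WriteLine(decoded == "\u201C");   // True

        Console.WriteLine(StraightenQuotes("\u201CHope\u201D it works"));   // "Hope" it works
    }
}
```

If the three bytes show up as separate characters in the string, the file was probably read with the wrong encoding, and reading it as UTF-8 may be a simpler fix than replacing byte sequences.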
You may want to give Regex a shot. Here's an example that will replace smart-quoted text with the single ".
string tempTxt = "I am going to “test” this. “Hope” it works";
string formattedText = Regex.Replace(tempTxt, "[“”]", @"""");
I'm using a ReqPro40.dll to read data. The data is stored as text. Hope I didn't lose too much on copy/paste below. The stuff below works to the best of my knowledge. But I want to get rid of longer sequences of bad characters. E2809C should become a quote, but I'm having trouble matching it.
string tempTxt = Req.get_Tag(ReqPro40.enumTagFormat.eTagFormat_ReqNameOrReqText);
tempTxt=tempTxt.Substring(1, tempTxt.Length-1);
// Escape values are hexadecimal code points (for example, decimal 8220 is U+201C).
char[] charsToRemoveForXMLLegality = new char[]
{ '\u000A', '\u000B', '\u0002', '\u001E', // NL, VT, STX, RS
'\u0022', '\u201C', '\u201D', // ", left double, right double quote
'\u2018', '\u2019', // left single quote, right single quote
'\u2013', '\u2014', // en-dash, em-dash
'\u00BC', '\u00B1', // 1/4 fraction, plus/minus
'\u2026', '\u00A0' // ellipsis, non-breaking space
};
string[] charsToSubstituteForXMLLegality = new string[]
{ " ", " ", "", "-",
"\"", "\"", "\"",
"\'", "\'",
"-", "-",
"1/4", "+/-",
"...", " "
};
for (int i = 0; i < charsToRemoveForXMLLegality.Length; i++)
{
tempTxt = tempTxt.Replace(charsToRemoveForXMLLegality[i].ToString(), charsToSubstituteForXMLLegality[i]);
}