I want code that takes a string with 'or' in it and stores the text before and after 'or' in separate variables.
I tried the Substring function:
var text = "Actor or Actress";
var result= text.Substring(0, text.LastIndexOf("or"));
But this only gives me "Actor". I want both "Actor" and "Actress", each as a whole word in its own variable, and the input could be anything in place of 'Actor or Actress'.
You need to use one of the flavors of String.Split that accepts an array of string delimiters:
string text = "Actor or Actress";
string[] delim = new string[] { " or " }; // spaces around "or" avoid splitting `Actor` on the `or` at its end
string[] elements = text.Split(delim, StringSplitOptions.None);
foreach (string elem in elements)
{
Console.WriteLine(elem);
}
Output:
Actor
Actress
Note: I am using .NET framework 4.8, but .NET 6 also has an overload of String.Split that accepts a single string delimiter so there's no need to create delim as an array of strings, unless you want to be able to split based on variations like " Or "," or ".
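To cover variations such as " Or " and " OR " without listing each one, a case-insensitive Regex.Split is one option. This is only a minimal sketch, assuming the separator is always the word "or" surrounded by whitespace; the names OrSplitter and SplitOnOr are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OrSplitter
{
    // Splits on the standalone word "or" in any casing. The surrounding
    // whitespace is part of the separator, so the "or" inside "Actor"
    // is never treated as a delimiter.
    public static string[] SplitOnOr(string text)
    {
        return Regex.Split(text, @"\s+or\s+", RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        foreach (string part in SplitOnOr("Actor Or Actress"))
        {
            Console.WriteLine(part);
        }
    }
}
```

The whitespace is consumed along with the word, so the parts come back without leading or trailing spaces.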
Split() should do the job (with spaces around the delimiter, so the "or" inside "Actor" is left alone):
text.Split(" or ");
Use the Split() method:
var text = "Actor or Actress";
// Spaces around "or" keep the "or" at the end of "Actor" from being treated as a delimiter
Console.WriteLine(text.Split(" or ")[0]);
Console.WriteLine(text.Split(" or ")[1]);
Output
Actor
Actress
try this
string[] array = text.Split(' ');
foreach (string item in array)
{
if (item != "or")
{
Console.WriteLine(item);
}
}
Output
Actor
Actress
I need to split a string into newlines in .NET and the only way I know of to split strings is with the Split method. However that will not allow me to (easily) split on a newline, so what is the best way to do it?
To split on a string you need to use the overload that takes an array of strings:
string[] lines = theText.Split(
new string[] { Environment.NewLine },
StringSplitOptions.None
);
Edit:
If you want to handle different types of line breaks in a text, you can use the ability to match more than one string. This will correctly split on either type of line break, and preserve empty lines and spacing in the text:
string[] lines = theText.Split(
new string[] { "\r\n", "\r", "\n" },
StringSplitOptions.None
);
What about using a StringReader?
using (System.IO.StringReader reader = new System.IO.StringReader(input)) {
string line = reader.ReadLine();
}
Try to avoid using string.Split for a general solution, because you'll use more memory everywhere you use the function -- the original string, and the split copy, both in memory. Trust me that this can be one hell of a problem when you start to scale -- run a 32-bit batch-processing app processing 100MB documents, and you'll crap out at eight concurrent threads. Not that I've been there before...
Instead, use an iterator like this;
public static IEnumerable<string> SplitToLines(this string input)
{
if (input == null)
{
yield break;
}
using (System.IO.StringReader reader = new System.IO.StringReader(input))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
This will allow you to do a more memory efficient loop around your data;
foreach(var line in document.SplitToLines())
{
// one line at a time...
}
Of course, if you want it all in memory, you can do this;
var allTheLines = document.SplitToLines().ToArray();
You should be able to split your string pretty easily, like so:
aString.Split(Environment.NewLine.ToCharArray());
Based on Guffa's answer, in an extension class, use:
public static string[] Lines(this string source) {
return source.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
}
Regex is also an option:
private string[] SplitStringByLineFeed(string inpString)
{
string[] locResult = Regex.Split(inpString, "[\r\n]+");
return locResult;
}
For a string variable s:
s.Split(new string[]{Environment.NewLine},StringSplitOptions.None)
This uses your environment's definition of line endings. On Windows, line endings are CR-LF (carriage return, line feed) or in C#'s escape characters \r\n.
This is a reliable solution, because if you recombine the lines with String.Join, this equals your original string:
var lines = s.Split(new string[]{Environment.NewLine},StringSplitOptions.None);
var reconstituted = String.Join(Environment.NewLine,lines);
Debug.Assert(s==reconstituted);
What not to do:
Use StringSplitOptions.RemoveEmptyEntries, because this will break markup such as Markdown where empty lines have syntactic purpose.
Split on the separator Environment.NewLine.ToCharArray(), because on Windows this will create one empty string element for each new line.
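The second point can be seen in a small experiment. This sketch hard-codes "\r\n" instead of Environment.NewLine so that it behaves the same on every platform; the class and method names are hypothetical.

```csharp
using System;

public static class NewlineSplitDemo
{
    // Splitting on the individual characters '\r' and '\n' treats "\r\n"
    // as two separators, so an empty entry appears between them.
    public static string[] SplitOnChars(string s)
    {
        return s.Split(new[] { '\r', '\n' });
    }

    // Splitting on the whole "\r\n" string treats it as one separator.
    public static string[] SplitOnString(string s)
    {
        return s.Split(new[] { "\r\n" }, StringSplitOptions.None);
    }

    public static void Main()
    {
        string s = "one\r\ntwo";
        Console.WriteLine(SplitOnChars(s).Length);  // 3: "one", "", "two"
        Console.WriteLine(SplitOnString(s).Length); // 2: "one", "two"
    }
}
```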
I just thought I would add my two-bits, because the other solutions on this question do not fall into the reusable code classification and are not convenient.
The following block of code extends the string object so that it is available as a natural method when working with strings.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections;
using System.Collections.ObjectModel;
namespace System
{
public static class StringExtensions
{
public static string[] Split(this string s, string delimiter, StringSplitOptions options = StringSplitOptions.None)
{
return s.Split(new string[] { delimiter }, options);
}
}
}
You can now use the .Split() function from any string as follows:
string[] result;
// Pass a string and the delimiter (an extension method can also be called statically through its class)
result = StringExtensions.Split("My simple string", " ");
// Split an existing string by delimiter only
string foo = "my - string - i - want - split";
result = foo.Split("-");
// You can even pass the split options parameter. When omitted it is
// set to StringSplitOptions.None
result = foo.Split("-", StringSplitOptions.RemoveEmptyEntries);
To split on a newline character, simply pass "\n" or "\r\n" as the delimiter parameter.
Comment: It would be nice if Microsoft implemented this overload.
Starting with .NET 6 we can use the new String.ReplaceLineEndings() method to canonicalize cross-platform line endings, so these days I find this to be the simplest way:
var lines = input
.ReplaceLineEndings()
.Split(Environment.NewLine, StringSplitOptions.None);
I'm currently using this function (based on other answers) in VB.NET:
Private Shared Function SplitLines(text As String) As String()
Return text.Split({Environment.NewLine, vbCrLf, vbLf}, StringSplitOptions.None)
End Function
It tries to split on the platform-local newline first, and then falls back to each possible newline.
I've only needed this inside one class so far. If that changes, I will probably make this Public and move it to a utility class, and maybe even make it an extension method.
Here's how to join the lines back up, for good measure:
Private Shared Function JoinLines(lines As IEnumerable(Of String)) As String
Return String.Join(Environment.NewLine, lines)
End Function
Well, actually split should do:
//Constructing string...
StringBuilder sb = new StringBuilder();
sb.AppendLine("first line");
sb.AppendLine("second line");
sb.AppendLine("third line");
string s = sb.ToString();
Console.WriteLine(s);
//Splitting multiline string into separate lines
string[] splitted = s.Split(new string[] {System.Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
// Output (separate lines)
for( int i = 0; i < splitted.Count(); i++ )
{
Console.WriteLine("{0}: {1}", i, splitted[i]);
}
string[] lines = text.Split(
Environment.NewLine.ToCharArray(),
StringSplitOptions.RemoveEmptyEntries);
The RemoveEmptyEntries option will make sure you don't have empty entries due to \n following a \r
(Edit to reflect comments:) Note that it will also discard genuine empty lines in the text. This is usually what I want but it might not be your requirement.
I did not know about Environment.NewLine, but I guess this is a very good solution.
My try would have been:
string str = "Test Me\r\nTest Me\nTest Me";
var splitted = str.Split('\n').Select(s => s.Trim()).ToArray();
The additional .Trim removes any \r or \n that might be still present (e. g. when on windows but splitting a string with os x newline characters). Probably not the fastest method though.
EDIT:
As the comments correctly pointed out, this also removes any whitespace at the start of the line or before the new line feed. If you need to preserve that whitespace, use one of the other options.
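If the whitespace matters, one alternative is to strip only a trailing carriage return instead of calling Trim(). A minimal sketch, assuming lines end in either "\r\n" or "\n" (a lone "\r" line ending is not handled); LineSplitter and SplitKeepingWhitespace are illustrative names.

```csharp
using System;
using System.Linq;

public static class LineSplitter
{
    // Splits on '\n' and removes only a trailing '\r', so indentation and
    // trailing spaces inside each line survive.
    public static string[] SplitKeepingWhitespace(string text)
    {
        return text.Split('\n').Select(line => line.TrimEnd('\r')).ToArray();
    }

    public static void Main()
    {
        string str = "  Test Me \r\nTest Me\nTest Me";
        foreach (string line in SplitKeepingWhitespace(str))
        {
            // Brackets make the preserved spaces visible.
            Console.WriteLine("[" + line + "]");
        }
    }
}
```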
Examples here are great and helped me with a current "challenge": splitting RSA keys so they can be presented in a more readable way. Based on Steve Cooper's solution:
string Splitstring(string txt, int n = 120, string AddBefore = "", string AddAfterExtra = "")
{
//Split the string into a list of n-character chunks
var Lines = Enumerable.Range(0, txt.Length / n).Select(i => txt.Substring(i * n, n)).ToList();
//Check if there are any characters left after split, if so add the rest
if(txt.Length > ((txt.Length / n)*n) )
Lines.Add(txt.Substring((txt.Length/n)*n));
//Create return text, with extras
string txtReturn = "";
foreach (string Line in Lines)
txtReturn += AddBefore + Line + AddAfterExtra + Environment.NewLine;
return txtReturn;
}
Presenting an RSA key 33 characters wide, with quotes, is then simply:
Console.WriteLine(Splitstring(RSAPubKey, 33, "\"", "\""));
Hopefully someone finds it useful...
Silly answer: write to a temporary file so you can use the venerable
File.ReadLines
var s = "Hello\r\nWorld";
var path = Path.GetTempFileName();
using (var writer = new StreamWriter(path))
{
writer.Write(s);
}
var lines = File.ReadLines(path);
using System.IO;
string textToSplit;
if (textToSplit != null)
{
List<string> lines = new List<string>();
using (StringReader reader = new StringReader(textToSplit))
{
for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
{
lines.Add(line);
}
}
}
Very easy, actually.
VB.NET:
Private Function SplitOnNewLine(input As String) As String()
Return input.Split({Environment.NewLine}, StringSplitOptions.None)
End Function
C#:
string[] SplitOnNewLine(string input)
{
return input.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
}
I would need some help with matching data in this example string:
req:{REQUESTER_NAME},key:{abc},act:{UPDATE},sku:{ABC123,DEF-123},qty:{10,5}
Essentially, every parameter is separated by "," but it is also included within {} and I need some help with regex as I am not that good with it.
Desired Output:
req = "REQUESTER_NAME"
key = "abc"
act = "UPDATE"
sku[0] = "ABC123"
sku[1] = "DEF-123"
qty[0] = 10
qty[1] = 5
I would suggest you do the following
Use String Split with "}," as the separator, since a bare ',' would also split inside the braces (eg output req:{REQUESTER_NAME)
With each pair of data, do String Split with ':' character as the separator (eg output "req", "{REQUESTER_NAME")
Do a String Replace for characters '{' and '}' with "" (eg output REQUESTER_NAME)
Do a String Split again with ',' character as separator (eg output "ABC123", "DEF-123")
That should parse it for you perfectly. You can store the results into your data structure as the results come in. (Eg. You can store the name at step 2 whereas the value for some might be available at Step 3 and for others at Step 4)
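The steps above can be sketched in C# roughly as follows. This is only an illustration: it splits the pairs on "}," so that commas inside the braces are left alone, it stores every value as a string, and the names ParamParser and Parse are made up.

```csharp
using System;
using System.Collections.Generic;

public static class ParamParser
{
    // Parses "name:{v1,v2},other:{v3}" into a name -> values dictionary.
    public static Dictionary<string, string[]> Parse(string input)
    {
        var result = new Dictionary<string, string[]>();
        // Step 1: split into "name:{values" pairs.
        string[] pairs = input.Split(new[] { "}," }, StringSplitOptions.RemoveEmptyEntries);
        foreach (string pair in pairs)
        {
            // Step 2: separate the name from the value list at the first ':'.
            string[] nameAndValues = pair.Split(new[] { ':' }, 2);
            if (nameAndValues.Length != 2) continue; // skip malformed pairs
            // Step 3: strip the braces.
            string values = nameAndValues[1].Replace("{", "").Replace("}", "");
            // Step 4: split the individual values.
            result[nameAndValues[0]] = values.Split(',');
        }
        return result;
    }

    public static void Main()
    {
        var parsed = Parse("req:{REQUESTER_NAME},key:{abc},act:{UPDATE},sku:{ABC123,DEF-123},qty:{10,5}");
        foreach (var entry in parsed)
        {
            Console.WriteLine(entry.Key + " = " + string.Join(" | ", entry.Value));
        }
    }
}
```

Numeric values such as qty would still need int.Parse (or int.TryParse) afterwards.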
Hope That Helped
Note:
- If you don't know string split - http://www.dotnetperls.com/split-vbnet
- If you don't know string replace - http://www.dotnetperls.com/replace-vbnet
The sample below may help solve your problem, although it involves a lot of string manipulation.
string input = "req:{REQUESTER_NAME},key:{abc},act:{UPDATE},sku:{ABC123,DEF-123},qty:{10,5}";
Console.WriteLine(input);
string[] words = input.Split(new string[] { "}," }, StringSplitOptions.RemoveEmptyEntries);
foreach (string item in words)
{
if (item.Contains(':'))
{
string modifiedString = item.Replace(",", "," + item.Substring(0, item.IndexOf(':')) + ":");
string[] wordsColl = modifiedString.Split(new char[] { ',' }, StringSplitOptions.RemoveEmptyEntries);
foreach (string item1 in wordsColl)
{
string finalString = item1.Replace("{", "");
finalString = finalString.Replace("}", "");
Console.WriteLine(finalString);
}
}
}
First, use Regex.Matches to get the parameters inside { and }.
string str = "req:{REQUESTER_NAME},key:{abc},act:{UPDATE},sku:{ABC123,DEF-123},qty:{10,5}";
MatchCollection matches = Regex.Matches(str,@"\{.+?\}");
string[] arr = matches.Cast<Match>()
.Select(m => m.Groups[0].Value.Trim(new char[]{'{','}',' '}))
.ToArray();
foreach (string s in arr)
Console.WriteLine(s);
output
REQUESTER_NAME
abc
UPDATE
ABC123,DEF-123
10,5
then use Regex.Split to get the parameter names
string[] arr1 = Regex.Split(str,@"\{.+?\}")
.Select(x => x.Trim(new char[]{',',':',' '}))
.Where(x => !string.IsNullOrEmpty(x)) //need this to get rid of empty strings
.ToArray();
foreach (string s in arr1)
Console.WriteLine(s);
output
req
key
act
sku
qty
Now you can easily traverse the parameters, with something like this:
for (int i = 0; i < arr.Length; i++)
{
if (arr1[i] == "req")
{
// arr[i] contains the req parameters
}
else if (arr1[i] == "sku")
{
// arr[i] contains the sku parameters
// use arr[i].Split(',') to get all the sku parameters and process them
}
}
Kishore's answer is correct. This extension method may help implement that suggestion:
<Extension()>
Function WideSplit(InputString As String, SplitToken As String) As String()
Dim aryReturn As String()
Dim intIndex As Integer = InputString.IndexOf(SplitToken)
If intIndex = -1 Then
aryReturn = {InputString}
Else
ReDim aryReturn(1)
aryReturn(0) = InputString.Substring(0, intIndex)
aryReturn(1) = InputString.Substring(intIndex + SplitToken.Length)
End If
Return aryReturn
End Function
If you import System.Runtime.CompilerServices, you can use it like this:
Dim stringToParse As String = "req:{REQUESTER_NAME},key:{abc},act:{UPDATE},sku:{ABC123,DEF-123},qty:{10,5}"
Dim strTemp As String
Dim aryTemp As String()
strTemp = stringToParse.WideSplit("req:{")(1)
aryTemp = strTemp.WideSplit("},key:{")
req = aryTemp(0)
aryTemp = aryTemp(1).WideSplit("},act:{")
key = aryTemp(0)
'etc...
You may be able to do this more memory efficiently, though, as this method creates a number of temporary string allocations.
Kishore's solution is perfect, but here is another solution that works with regex:
Dim input As String = "req:{REQUESTER_NAME},key:{abc},act:{UPDATE},sku:{ABC123,DEF-123},qty:{10,5}"
Dim Array = Regex.Split(input, ":{|}|,")
This does essentially the same thing: it uses regex to split on :{, } and ,. The solution might be a bit shorter, though. The values will be put into the array like this:
"req", "REQUESTER_NAME","", ... , "qty", "10", "5", ""
Notice after the parameter and its value(s) there will be an empty string in the array. When looping over the array you can use this to let the program know when a new parameter starts. Then you can create a new array/data structure to store its values.
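A loop along those lines might look like the following C# sketch. It assumes parameter names are unique and that no value is itself empty; TokenGrouper and Group are hypothetical names, and the braces in the pattern are escaped for readability.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TokenGrouper
{
    // Groups the flat token list into name -> values, using the empty
    // strings in the split result as "end of parameter" markers.
    public static Dictionary<string, List<string>> Group(string input)
    {
        var result = new Dictionary<string, List<string>>();
        string currentName = null;
        foreach (string token in Regex.Split(input, @":\{|\}|,"))
        {
            if (token.Length == 0)
            {
                currentName = null; // an empty token closes the current parameter
            }
            else if (currentName == null)
            {
                currentName = token; // first token after a gap is the parameter name
                result[currentName] = new List<string>();
            }
            else
            {
                result[currentName].Add(token); // any other token is a value
            }
        }
        return result;
    }

    public static void Main()
    {
        var grouped = Group("req:{REQUESTER_NAME},key:{abc},act:{UPDATE},sku:{ABC123,DEF-123},qty:{10,5}");
        foreach (var entry in grouped)
        {
            Console.WriteLine(entry.Key + ": " + string.Join(", ", entry.Value));
        }
    }
}
```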
Using the .NET MicroFramework which is a really cut-down version of C#. For instance, System.String barely has any of the goodies that we've enjoyed over the years.
I need to split a text document into lines, which means splitting by \r\n. However, String.Split only provides a split by char, not by string.
How can I split a document into lines in an efficient manner (e.g. not looping madly across each char in the doc)?
P.S. System.String is also missing a Replace method, so that won't work.
P.P.S. Regex is not part of the MicroFramework either.
You can do
string[] lines = doc.Split('\n');
for (int i = 0; i < lines.Length; i+= 1)
lines[i] = lines[i].Trim();
This assumes that the µF supports Trim() at all. Trim() removes all leading and trailing whitespace, which might be useful. Otherwise use TrimEnd('\r').
I would loop across each char in the document, because that's clearly required. How do you think String.Split works? I would try to do so only hitting each character once, however.
Keep a list of strings found so far. Use IndexOf repeatedly, passing in the current offset into the string (i.e. the previous match + 2).
How can I split a document into lines in an efficient manner (e.g. not looping madly across each char in the doc)?
How do you think the built-in Split works?
Just reimplement it yourself as an extension method.
What about:
string path = "yourfile.txt";
string[] lines = File.ReadAllLines(path);
Or
string content = File.ReadAllText(path);
string[] lines = content.Split(
Environment.NewLine.ToCharArray(),
StringSplitOptions.RemoveEmptyEntries);
Reading about .NET Micro Framework 3.0, this code should work:
string line = String.Empty;
StreamReader reader = new StreamReader(path);
while ((line = reader.ReadLine()) != null)
{
// do stuff
}
This may help in some scenarios:
StreamReader reader = new StreamReader(file);
string _Line = reader.ReadToEnd();
string IntMediateLine = string.Empty;
IntMediateLine = _Line.Replace("entersign", "");
string[] ArrayLineSpliter = IntMediateLine.Split(';'); // ';' is a placeholder: use whichever special character delimits your data
If you'd like a MicroFramework compatible split function that works for an entire string of characters, here's one that does the trick, similar to the regular frameworks' version using StringSplitOptions.None:
private static string[] Split(string s, string delim)
{
if (s == null) throw new NullReferenceException();
// Declarations
var strings = new ArrayList();
var start = 0;
// Tokenize
if (delim != null && delim != "")
{
int i;
while ((i = s.IndexOf(delim, start)) != -1)
{
strings.Add(s.Substring(start, i - start));
start = i + delim.Length;
}
}
// Append left over
strings.Add(s.Substring(start));
return (string[]) strings.ToArray(typeof(string));
}
You can split your string on a substring:
string[] lines = text.Split(new string[] { "\r\n" }, StringSplitOptions.None);
My string is as follows:
smtp:jblack#test.com;SMTP:jb#test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;
I need back:
smtp:jblack#test.com
SMTP:jb#test.com
X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;
The problem is that the semi-colons separate the addresses and are also part of the X400 address. Can anyone suggest how best to split this?
PS I should have mentioned that the order differs, so it could be:
X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:jblack#test.com;SMTP:jb#test.com
There can be more than 3 addresses (4, 5, ... 10 etc.), including an X500 address; however, they all start with either smtp:, SMTP:, X400 or X500.
EDIT: With the updated information, this answer certainly won't do the trick - but it's still potentially useful, so I'll leave it here.
Will you always have three parts, and you just want to split on the first two semi-colons?
If so, just use the overload of Split which lets you specify the number of substrings to return:
string[] bits = text.Split(new char[]{';'}, 3);
May I suggest building a regular expression
(smtp|SMTP|X400|X500):((?!smtp:|SMTP:|X400:|X500:).)*;?
or protocol-less
.*?:((?![^:;]*:).)*;?
in other words find anything that starts with one of your protocols. Match the colon. Then continue matching characters as long as you're not matching one of your protocols. Finish with a semicolon (optionally).
You can then parse through the list of matches splitting on ':' and you'll have your protocols. Additionally if you want to add protocols, just add them to the list.
Likely however you're going to want to specify the whole thing as case-insensitive and only list the protocols in their uppercase or lowercase versions.
The protocol-less version doesn't care what the names of the protocols are. It just finds them all the same, by matching everything up to, but excluding a string followed by a colon or a semi-colon.
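As a rough illustration, the protocol-based pattern can be run with Regex.Matches like this. The sample string is the one from the question; AddressMatcher and FindAddresses are made-up names, and this is a sketch rather than a tuned implementation.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class AddressMatcher
{
    // A protocol, a colon, then any characters that do not begin another
    // protocol, then an optional semicolon.
    private const string Pattern =
        @"(smtp|SMTP|X400|X500):((?!smtp:|SMTP:|X400:|X500:).)*;?";

    public static string[] FindAddresses(string text)
    {
        return Regex.Matches(text, Pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }

    public static void Main()
    {
        string text = "smtp:jblack#test.com;SMTP:jb#test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
        foreach (string address in FindAddresses(text))
        {
            // Everything before the first ':' is the protocol.
            string protocol = address.Split(':')[0];
            Console.WriteLine(protocol + " -> " + address);
        }
    }
}
```

Note that each match keeps the semicolon that follows it, so trim it with TrimEnd(';') where it is not wanted.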
Split by the following regex pattern:
string[] items = System.Text.RegularExpressions.Regex.Split(text, @";(?=\w+:)");
EDIT: a better one that accepts more special characters in the protocol name:
string[] items = System.Text.RegularExpressions.Regex.Split(text, @";(?=[^;:]+:)");
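A small runnable version of the lookahead split, using the sample string from the question, might look like this. AddressSplitter and SplitAddresses are illustrative names only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressSplitter
{
    // Splits only on semicolons that are directly followed by "name:",
    // so the semicolons inside an X400 address are left alone.
    public static string[] SplitAddresses(string text)
    {
        return Regex.Split(text, @";(?=[^;:]+:)");
    }

    public static void Main()
    {
        string text = "smtp:jblack#test.com;SMTP:jb#test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
        foreach (string item in SplitAddresses(text))
        {
            Console.WriteLine(item);
        }
    }
}
```

Because the lookahead does not consume anything, only the separating semicolon is removed and the X400 entry keeps its own trailing semicolon.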
http://msdn.microsoft.com/en-us/library/c1bs0eda.aspx
Check there; you can specify the number of splits you want. So in your case you would do:
string[] parts = text.Split(new char[]{';'}, 3);
Not the fastest if you are doing this a lot but it will work for all cases I believe.
string input1 = "smtp:jblack#test.com;SMTP:jb#test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
string input2 = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;;smtp:jblack#test.com;SMTP:jb#test.com";
Regex splitEmailRegex = new Regex(@"(?<key>\w+?):(?<value>.*?)(\w+:|$)");
List<string> sets = new List<string>();
while (input2.Length > 0)
{
Match m1 = splitEmailRegex.Matches(input2)[0];
string s1 = m1.Groups["key"].Value + ":" + m1.Groups["value"].Value;
sets.Add(s1);
input2 = input2.Substring(s1.Length);
}
foreach (var set in sets)
{
Console.WriteLine(set);
}
Console.ReadLine();
Of course many will claim Regex: Now you have two problems. There may even be a better regex answer than this.
You could always split on the colon and have a little logic to grab the key and value.
string[] bits = text.Split(':');
List<string> values = new List<string>();
for (int i = 1; i < bits.Length; i++)
{
string value = bits[i].Contains(';') ? bits[i].Substring(0, bits[i].LastIndexOf(';') + 1) : bits[i];
string key = bits[i - 1].Contains(';') ? bits[i - 1].Substring(bits[i - 1].LastIndexOf(';') + 1) : bits[i - 1];
values.Add(String.Concat(key, ":", value));
}
Tested it with both of your samples and it works fine.
This caught my curiosity .... So this code actually does the job, but again, wants tidying :)
My final attempt - stop changing what you need ;=)
static void Main(string[] args)
{
string fneh = "X400:C=US400;A= ;P=Test;O=Exchange;S=Jack;G=Black;x400:C=US400l;A= l;P=Testl;O=Exchangel;S=Jackl;G=Blackl;smtp:jblack#test.com;X500:C=US500;A= ;P=Test;O=Exchange;S=Jack;G=Black;SMTP:jb#test.com;";
string[] parts = fneh.Split(new char[] { ';' });
List<string> addresses = new List<string>();
StringBuilder address = new StringBuilder();
foreach (string part in parts)
{
if (part.Contains(":"))
{
if (address.Length > 0)
{
addresses.Add(semiColonCorrection(address.ToString()));
}
address = new StringBuilder();
address.Append(part);
}
else
{
address.AppendFormat(";{0}", part);
}
}
addresses.Add(semiColonCorrection(address.ToString()));
foreach (string emailAddress in addresses)
{
Console.WriteLine(emailAddress);
}
Console.ReadKey();
}
private static string semiColonCorrection(string address)
{
if ((address.StartsWith("x", StringComparison.InvariantCultureIgnoreCase)) && (!address.EndsWith(";")))
{
return string.Format("{0};", address);
}
else
{
return address;
}
}
Try these regexes. You can extract what you're looking for using named groups.
X400:(?<X400>.*?)(?:smtp|SMTP|$)
smtp:(?<smtp>.*?)(?:;+|$)
SMTP:(?<SMTP>.*?)(?:;+|$)
Make sure when constructing them you specify case insensitive. They seem to work with the samples you gave
Lots of attempts. Here is mine ;)
string src = "smtp:jblack#test.com;SMTP:jb#test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
Regex r = new Regex(@"
(?:^|;)smtp:(?<smtp>([^;]*(?=;|$)))|
(?:^|;)x400:(?<X400>.*?)(?=;x400|;x500|;smtp|$)|
(?:^|;)x500:(?<X500>.*?)(?=;x400|;x500|;smtp|$)",
RegexOptions.IgnoreCase | RegexOptions.IgnorePatternWhitespace);
foreach (Match m in r.Matches(src))
{
if (m.Groups["smtp"].Captures.Count != 0)
Console.WriteLine("smtp: {0}", m.Groups["smtp"]);
else if (m.Groups["X400"].Captures.Count != 0)
Console.WriteLine("X400: {0}", m.Groups["X400"]);
else if (m.Groups["X500"].Captures.Count != 0)
Console.WriteLine("X500: {0}", m.Groups["X500"]);
}
This finds all smtp, x400 or x500 addresses in the string in any order of appearance. It also identifies the type of address ready for further processing. The appearance of the text smtp, x400 or x500 in the addresses themselves will not upset the pattern.
This works!
string input =
"smtp:jblack#test.com;SMTP:jb#test.com;X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G=Black;";
string[] parts = input.Split(';');
List<string> output = new List<string>();
foreach(string part in parts)
{
if (part.Contains(":"))
{
output.Add(part + ";");
}
else if (part.Length > 0)
{
output[output.Count - 1] += part + ";";
}
}
foreach(string s in output)
{
Console.WriteLine(s);
}
Do the semicolon (;) split and then loop over the result, re-combining each element where there is no colon (:) with the previous element.
string input = "X400:C=US;A= ;P=Test;O=Exchange;S=Jack;G="
+"Black;;smtp:jblack#test.com;SMTP:jb#test.com";
string[] rawSplit = input.Split(';');
List<string> result = new List<string>();
//now the fun begins
string buffer = string.Empty;
foreach (string s in rawSplit)
{
if (buffer == string.Empty)
{
buffer = s;
}
else if (s.Contains(':'))
{
result.Add(buffer);
buffer = s;
}
else
{
buffer += ";" + s;
}
}
result.Add(buffer);
foreach (string s in result)
Console.WriteLine(s);
Here is another possible solution.
string[] bits = text.Replace(";smtp", "|smtp").Replace(";SMTP", "|SMTP").Replace(";X400", "|X400").Split(new char[] { '|' });
bits[0], bits[1], and bits[2] will then contain the three parts in the order from your original string.