Extract list items from a string - c#

I have a line like
"In this task, you need to use the following equipment:
Perforator
Screwdriver
Drill
After finishing work, the tool must be cleaned."
How do I extract elements from this string? As a result, I need an array like {"Perforator", "Screwdriver", "Drill"}

This is how I would do this using regular expressions (assuming the input text is similar to your example and there is always numbering in front of each item):
string input = #"
In this task, you need to use the following equipment:
1. Perforator
2. Screwdriver
3. Drill
After finishing work, the tool must be cleaned.";
string pattern = #"(\d\. )([a-zA-z]*)";
var results = Regex.Matches(input, pattern);
foreach (Match result in results)
{
Console.WriteLine(result.Groups[2].Value); // ...or insert each in a List
}
Result:
Perforator
Screwdriver
Drill

One possible way to do it would be to break the string into lines, then try to convert each line into a number plus text, and take only the lines where the conversion is successful.
IEnumerable<string> GetNumberedItems(string input)
{
var lines = input.Split(new [] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
foreach (var line in lines)
{
var items = line.Split('.');
if (items.Length != 2) continue;
var ok = int.TryParse(items[0].Trim(), out _);
if (ok) yield return items[1].Trim();
}
}

Related

how to get before and after text in the string which contains 'Or' in it

I want a code that takes a string with 'Or' in it and takes text before and after 'Or' and stores it in seperate variable
I tried the substring function
var text = "Actor or Actress";
var result= text.Substring(0, text.LastIndexOf("or"));
but with this getting only actor I want actor as well as actress but in seperate variables as a whole word so it can be anything in place of 'actor or actress'
You need to use one of the flavors of String.Split that accepts an array of string delimiters:
string text = "Actor or Actress";
string[] delim = new string[] { " or " }; // add spaces around to avoid spliting `Actor` due to the `or` in the end
string[] elements = text.Split(delim, StringSplitOptions.None);
foreach (string elem in elements)
{
Console.WriteLine(elem);
}
Output:
Actor
Actress
Note: I am using .NET framework 4.8, but .NET 6 also has an overload of String.Split that accepts a single string delimiter so there's no need to create delim as an array of strings, unless you want to be able to split based on variations like " Or "," or ".
Spli() should do the job
text.Split(“or”);
use the Split() method
var text = "Actor or Actress";
Console.WriteLine(text.Split("or")[0]);
Console.WriteLine(text.Split("or")[1]);
Output
Actor
Actress
try this
string[] array = text.Split(' ');
foreach (string item in array)
{
if (item != "or")
{
Console.WriteLine(item);
}
}
Output
Actor
Actress

C# Split a string and build a stringarray out of the string [duplicate]

I need to split a string into newlines in .NET and the only way I know of to split strings is with the Split method. However that will not allow me to (easily) split on a newline, so what is the best way to do it?
To split on a string you need to use the overload that takes an array of strings:
string[] lines = theText.Split(
new string[] { Environment.NewLine },
StringSplitOptions.None
);
Edit:
If you want to handle different types of line breaks in a text, you can use the ability to match more than one string. This will correctly split on either type of line break, and preserve empty lines and spacing in the text:
string[] lines = theText.Split(
new string[] { "\r\n", "\r", "\n" },
StringSplitOptions.None
);
What about using a StringReader?
using (System.IO.StringReader reader = new System.IO.StringReader(input)) {
string line = reader.ReadLine();
}
Try to avoid using string.Split for a general solution, because you'll use more memory everywhere you use the function -- the original string, and the split copy, both in memory. Trust me that this can be one hell of a problem when you start to scale -- run a 32-bit batch-processing app processing 100MB documents, and you'll crap out at eight concurrent threads. Not that I've been there before...
Instead, use an iterator like this;
public static IEnumerable<string> SplitToLines(this string input)
{
if (input == null)
{
yield break;
}
using (System.IO.StringReader reader = new System.IO.StringReader(input))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
This will allow you to do a more memory efficient loop around your data;
foreach(var line in document.SplitToLines())
{
// one line at a time...
}
Of course, if you want it all in memory, you can do this;
var allTheLines = document.SplitToLines().ToArray();
You should be able to split your string pretty easily, like so:
aString.Split(Environment.NewLine.ToCharArray());
Based on Guffa's answer, in an extension class, use:
public static string[] Lines(this string source) {
return source.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
}
Regex is also an option:
private string[] SplitStringByLineFeed(string inpString)
{
string[] locResult = Regex.Split(inpString, "[\r\n]+");
return locResult;
}
For a string variable s:
s.Split(new string[]{Environment.NewLine},StringSplitOptions.None)
This uses your environment's definition of line endings. On Windows, line endings are CR-LF (carriage return, line feed) or in C#'s escape characters \r\n.
This is a reliable solution, because if you recombine the lines with String.Join, this equals your original string:
var lines = s.Split(new string[]{Environment.NewLine},StringSplitOptions.None);
var reconstituted = String.Join(Environment.NewLine,lines);
Debug.Assert(s==reconstituted);
What not to do:
Use StringSplitOptions.RemoveEmptyEntries, because this will break markup such as Markdown where empty lines have syntactic purpose.
Split on separator new char[]{Environment.NewLine}, because on Windows this will create one empty string element for each new line.
I just thought I would add my two-bits, because the other solutions on this question do not fall into the reusable code classification and are not convenient.
The following block of code extends the string object so that it is available as a natural method when working with strings.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Collections;
using System.Collections.ObjectModel;
namespace System
{
public static class StringExtensions
{
public static string[] Split(this string s, string delimiter, StringSplitOptions options = StringSplitOptions.None)
{
return s.Split(new string[] { delimiter }, options);
}
}
}
You can now use the .Split() function from any string as follows:
string[] result;
// Pass a string, and the delimiter
result = string.Split("My simple string", " ");
// Split an existing string by delimiter only
string foo = "my - string - i - want - split";
result = foo.Split("-");
// You can even pass the split options parameter. When omitted it is
// set to StringSplitOptions.None
result = foo.Split("-", StringSplitOptions.RemoveEmptyEntries);
To split on a newline character, simply pass "\n" or "\r\n" as the delimiter parameter.
Comment: It would be nice if Microsoft implemented this overload.
Starting with .NET 6 we can use the new String.ReplaceLineEndings() method to canonicalize cross-platform line endings, so these days I find this to be the simplest way:
var lines = input
.ReplaceLineEndings()
.Split(Environment.NewLine, StringSplitOptions.None);
I'm currently using this function (based on other answers) in VB.NET:
Private Shared Function SplitLines(text As String) As String()
Return text.Split({Environment.NewLine, vbCrLf, vbLf}, StringSplitOptions.None)
End Function
It tries to split on the platform-local newline first, and then falls back to each possible newline.
I've only needed this inside one class so far. If that changes, I will probably make this Public and move it to a utility class, and maybe even make it an extension method.
Here's how to join the lines back up, for good measure:
Private Shared Function JoinLines(lines As IEnumerable(Of String)) As String
Return String.Join(Environment.NewLine, lines)
End Function
Well, actually split should do:
//Constructing string...
StringBuilder sb = new StringBuilder();
sb.AppendLine("first line");
sb.AppendLine("second line");
sb.AppendLine("third line");
string s = sb.ToString();
Console.WriteLine(s);
//Splitting multiline string into separate lines
string[] splitted = s.Split(new string[] {System.Environment.NewLine}, StringSplitOptions.RemoveEmptyEntries);
// Output (separate lines)
for( int i = 0; i < splitted.Count(); i++ )
{
Console.WriteLine("{0}: {1}", i, splitted[i]);
}
string[] lines = text.Split(
Environment.NewLine.ToCharArray(),
StringSplitOptions.RemoveEmptyStrings);
The RemoveEmptyStrings option will make sure you don't have empty entries due to \n following a \r
(Edit to reflect comments:) Note that it will also discard genuine empty lines in the text. This is usually what I want but it might not be your requirement.
I did not know about Environment.Newline, but I guess this is a very good solution.
My try would have been:
string str = "Test Me\r\nTest Me\nTest Me";
var splitted = str.Split('\n').Select(s => s.Trim()).ToArray();
The additional .Trim removes any \r or \n that might be still present (e. g. when on windows but splitting a string with os x newline characters). Probably not the fastest method though.
EDIT:
As the comments correctly pointed out, this also removes any whitespace at the start of the line or before the new line feed. If you need to preserve that whitespace, use one of the other options.
Examples here are great and helped me with a current "challenge" to split RSA-keys to be presented in a more readable way. Based on Steve Coopers solution:
string Splitstring(string txt, int n = 120, string AddBefore = "", string AddAfterExtra = "")
{
//Spit each string into a n-line length list of strings
var Lines = Enumerable.Range(0, txt.Length / n).Select(i => txt.Substring(i * n, n)).ToList();
//Check if there are any characters left after split, if so add the rest
if(txt.Length > ((txt.Length / n)*n) )
Lines.Add(txt.Substring((txt.Length/n)*n));
//Create return text, with extras
string txtReturn = "";
foreach (string Line in Lines)
txtReturn += AddBefore + Line + AddAfterExtra + Environment.NewLine;
return txtReturn;
}
Presenting a RSA-key with 33 chars width and quotes are then simply
Console.WriteLine(Splitstring(RSAPubKey, 33, "\"", "\""));
Output:
Hopefully someone find it usefull...
Silly answer: write to a temporary file so you can use the venerable
File.ReadLines
var s = "Hello\r\nWorld";
var path = Path.GetTempFileName();
using (var writer = new StreamWriter(path))
{
writer.Write(s);
}
var lines = File.ReadLines(path);
using System.IO;
string textToSplit;
if (textToSplit != null)
{
List<string> lines = new List<string>();
using (StringReader reader = new StringReader(textToSplit))
{
for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
{
lines.Add(line);
}
}
}
Very easy, actually.
VB.NET:
Private Function SplitOnNewLine(input as String) As String
Return input.Split(Environment.NewLine)
End Function
C#:
string splitOnNewLine(string input)
{
return input.split(environment.newline);
}

Replacing XElement content with XElement

Is there a way to selectively replace XElement content with other XElements?
I have this XML:
<prompt>
There is something I want to tell you.[pause=3]
You are my favorite caller today.[pause=1]
Have a great day!
</prompt>
And I want to render it as this:
<prompt>
There is something I want to tell you.<break time="3s"/>
You are my favorite caller today.<break time="1s"/>
Have a great day!
</prompt>
I need to replace the placeholders with actual XElements, but when I try to alter the content of an XElement, .NET of course escapes all of the angle brackets. I understand why the content would normally need to be correctly escaped, but I need to bypass that behavior and inject XML directly into content.
Here's my code that would otherwise work.
MatchCollection matches = Regex.Matches(content, #"\[(\w+)=(\d+)]");
foreach (XElement element in voiceXmlDocument.Descendants("prompt"))
{
if (matches[0] == null)
continue;
element.Value = element.Value.Replace(matches[0].Value, #"<break time=""5s""/>");
}
This is a work in progress, so don't worry so much about the validity of the RegEx pattern, as I will work that out later to match several conditions. This is proof of concept code and the focus is on replacing the placeholders as described. I only included the iteration and RegEx code here to illustrate that I need to be able to do this to a whole document that is already populated with content.
You can use XElement.Parse() method:
First, get the outer xml of your XElement, for example,
string outerXml = element.ToString();
The you have exactly this string to work with:
<prompt>
There is something I want to tell you.[pause=3]
You are my favorite caller today.[pause=1]
Have a great day!
</prompt>
Then you can do your replacement
outerXml = outerXml.Replace(matches[0].Value, #"<break time=""5s""/>");
Then you can parse it back:
XElement repElement = XElement.Parse(outerXml);
And, finally, replace original XElement:
element.ReplaceWith(repElement);
The key to all of this is the XText, which allows you to work with text as an element.
This is the loop:
foreach (XElement prompt in voiceXmlDocument.Descendants("prompt"))
{
string text = prompt.Value;
prompt.RemoveAll();
foreach (string phrase in text.Split('['))
{
string[] parts = phrase.Split(']');
if (parts.Length > 1)
{
string[] pause = parts[0].Split('=');
prompt.Add(new XElement("break", new XAttribute("time", pause[1])));
// add a + "s" if you REALLY want it, but then you have to get rid
// of it later in some other code.
}
prompt.Add(new XText(parts[parts.Length - 1]));
}
}
This is the end result
<prompt>
There is something I want to tell you.<break time="3" />
You are my favorite caller today.<break time="1" />
Have a great day!
</prompt>
class Program
{
static void Main(string[] args)
{
var xml =
#"<prompt>There is something I want to tell you.[pause=3] You are my favorite caller today.[pause=1] Have a great day!</prompt>";
var voiceXmlDocument = XElement.Parse(xml);
var pattern = new Regex(#"\[(\w+)=(\d+)]");
foreach (var element in voiceXmlDocument.DescendantsAndSelf("prompt"))
{
var matches = pattern.Matches(element.Value);
foreach (var match in matches)
{
var matchValue = match.ToString();
var number = Regex.Match(matchValue, #"\d+").Value;
var newValue = string.Format(#"<break time=""{0}""/>", number);
element.Value = element.Value.Replace(matchValue, newValue);
}
}
Console.WriteLine(voiceXmlDocument.ToString());
}
}
Oh, my goodness, you guys were quicker than I expected! So, thanks for that, however in the meantime, I solved it a slightly different way. The code here looks expanded from before because once I got it working, I added some specifics into this particular condition:
foreach (XElement element in voiceXmlDocument.Descendants("prompt").ToArray())
{
// convert the element to a string and see to see if there are any instances
// of pause placeholders in it
string elementAsString = element.ToString();
MatchCollection matches = Regex.Matches(elementAsString, #"\[pause=(\d+)]");
if (matches == null || matches.Count == 0)
continue;
// if there were no matches or an empty set, move on to the next one
// iterate through the indexed matches
for (int i = 0; i < matches.Count; i++)
{
int pauseValue = 0; // capture the original pause value specified by the user
int pauseMilliSeconds = 1000; // if things go wrong, use a 1 second default
if (matches[i].Groups.Count == 2) // the value is expected to be in the second group
{
// if the value could be parsed to an integer, convert it from 1/8 seconds to milliseconds
if (int.TryParse(matches[i].Groups[1].Value, out pauseValue))
pauseMilliSeconds = pauseValue * 125;
}
// replace the specific match with the new <break> tag content
elementAsString = elementAsString.Replace(matches[i].Value, string.Format(#"<break time=""{0}ms""/>", pauseMilliSeconds));
}
// finally replace the element by parsing
element.ReplaceWith(XElement.Parse(elementAsString));
}
Oh, my goodness, you guys were quicker than I expected!
Doh! Might as well post my solution anyway!
foreach (var element in xml.Descendants("prompt"))
{
Queue<string> pauses = new Queue<string>(Regex.Matches(element.Value, #"\[pause *= *\d+\]")
.Cast<Match>()
.Select(m => m.Value));
Queue<string> text = new Queue<string>(element.Value.Split(pauses.ToArray(), StringSplitOptions.None));
element.RemoveAll();
while (text.Any())
{
element.Add(new XText(text.Dequeue()));
if (pauses.Any())
element.Add(new XElement("break", new XAttribute("time", Regex.Match(pauses.Dequeue(), #"\d+"))));
}
}
For every prompt element, Regex match all your pauses and put them in a queue.
Then use these prompts to delimit the inner text and grab the 'other' text and put that in a queue.
Clear all data from the element using RemoveAll and then iterate over your delimited data and re-add it as the appropriate data type. When you are adding in the new attributes you can use Regex to get the number value out of the original match.

Splitting a string using regex: *category**bob*!cookshop

This is what I have been trying so far. Basically, I want to split the string but keep the separator. My regex knowledge is very limited but I've been trying using a forward lookup to match the expression. Whenever I try to introduce \*1 into the string split, it goes badly so I'm not sure what to do and if this is possible.
var tests = new List<string>
{
"*foo**bar*!bob",
"*foo*!42",
"!foo*bar*"
};
foreach (var expression in tests)
{
var strings = Regex.Split(expression, #"(?=[!])");
Console.WriteLine(String.Join(Environment.NewLine, strings));
}
1st line:
*foo**bar*
!bob
2nd Line (this is working as expected)
*foo*
!42
3rd line
{EMPTY LINE}
!foo*bar*
But I'm trying to get back is:
1st line
*foo*
*bar*
!bob
2nd line - As Above (this is correct)
3rd line
!foo
*bar*
Try this...
var tests = new List<string>
{
"*foo**bar*!bob",
"*foo*!42",
"!foo*bar*"
};
foreach (var expression in tests)
{
var strings = Regex.Split(expression, #"(?=[!])|(\*[^\*]+\*)").Where(exp => !String.IsNullOrEmpty(exp));
Console.WriteLine(String.Join(Environment.NewLine, strings));
}
Results:
*foo*
*bar*
!bob
*foo*
!42
!foo
*bar*

split a string from a text file into another list

Hi i know the Title might sound a little confusing but im reading in a text file with many lines of data
Example
12345 Test
34567 Test2
i read in the text 1 line at a time and add to a list
using (StreamReader reader = new StreamReader("Test.txt"))
{
string line;
while ((line = reader.ReadLine()) != null)
{
list.Add(line);
}
}
how do i then separate the 1234 from the test so i can pull only the first column of data if i need like list(1).pars[1] would be 12345 and list(2).pars[2] would be test2
i know this sounds foggy but i hope someone out there understands
Maybe something like this:
string test="12345 Test";
var ls= test.Split(' ');
This will get you a array of string. You can get them with ls[0] and ls[1].
If you just what the 12345 then ls[0] is the one to choose.
If you're ok with having a list of string[]'s you can simply do this:
var list = new List<string[]>();
using (StreamReader reader = new StreamReader("Test.txt"))
{
string line;
while ((line = reader.ReadLine()) != null)
{
list.Add(line.Split(' '));
}
}
string firstWord = list[0][0]; //12345
string secondWord = list[0][1]; //Test
When you have a string of text you can use the Split() method to split it in many parts. If you're sure every word (separated by one or more spaces) is a column you can simply write:
string[] columns = line.Split(' ');
There are several overloads of that function, you can specify if blank fields are skipped (you may have, for example columns[1] empty in a line composed by 2 words but separated by two spaces). If you're sure about the number of columns you can fix that limit too (so if any text after the last column will be treated as a single field).
In your case (add to the list only the first column) you may write:
if (String.IsNullOrWhiteSpace(line))
continue;
string[] columns = line.TrimLeft().Split(new char[] { ' ' }, 2);
list.Add(columns[0]);
First check is to skip empty or lines composed just of spaces. The TrimLeft() is to remove spaces from beginning of the line (if any). The first column can't be empty (because the TrimLeft() so yo do not even need to use StringSplitOptions.RemoveEmptyEntries with an additional if (columns.Length > 1). Finally, if the file is small enough you can read it in memory with a single call to File.ReadAllLines() and simplify everything with a little of LINQ:
list.Add(
File.ReadAllLines("test.txt")
.Where(x => !String.IsNullOrWhiteSpace(x))
.Select(x => x.TrimLeft().Split(new char[] { ' ' }, 2)[0]));
Note that with the first parameter you can specify more than one valid separator.
When you have multiple spaces
Regex r = new Regex(" +");
string [] splitString = r.Split(stringWithMultipleSpaces);
var splitted = System.IO.File.ReadAllLines("Test.txt")
.Select(line => line.Split(' ')).ToArray();
var list1 = splitted.Select(split_line => split_line[0]).ToArray();
var list2 = splitted.Select(split_line => split_line[1]).ToArray();

Categories