extracting a substring within a multiline string

I have a text file containing the following lines:
<TestInfo."Content">
{
<Label> "Content"
<Visible> "true"
"This is the text I want to get"
}
<TestInfo."Content2">
{
<Label> "Content2"
<Visible> "true"
"I don't want e.g. this"
}
I want to extract "This is the text I want to get".
I tried, for example, the following:
string tmp = File.ReadAllText(textfile);
string result = Regex.Match(tmp, @"<Label> ""Content"" \n\s+ <Visible> ""true"" \n\s+ ""(.+?)""", RegexOptions.Singleline).Groups[1].Value;
However, in this case I only get the first word, so my output is: This
I have no idea why...
I would appreciate any help. Thanks!

If you want the entire line after the line that starts with <Visible>, you'd better read the file line by line instead of using File.ReadAllText and a regular expression:
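using System.IO;
string result = null; // stays null if no <Visible> line is found
using (StreamReader sr = new StreamReader(textfile))
{
    while (sr.Peek() >= 0)
    {
        string line = sr.ReadLine();
        if (line.StartsWith("<Visible>"))
        {
            // The wanted text is on the next line, still wrapped in quotes
            result = sr.ReadLine();
            break;
        }
    }
}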
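The same line-by-line approach can be wrapped in a method that takes any TextReader, so it works on an in-memory string as well as on a file. This is a minimal sketch: the class and method names are hypothetical, and it assumes the wanted text sits on the line directly after the <Visible> line, wrapped in double quotes.

```csharp
using System.IO;

public static class VisibleLineReader
{
    // Hypothetical helper: returns the line that follows the first line
    // starting with "<Visible>", with whitespace and enclosing quotes removed.
    // Returns null when no <Visible> line exists.
    public static string ExtractLineAfterVisible(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.TrimStart().StartsWith("<Visible>"))
            {
                string next = reader.ReadLine();
                // Trim whitespace first, then the surrounding double quotes.
                return next?.Trim().Trim('"');
            }
        }
        return null;
    }
}
```

With a file, pass a StreamReader: using (var sr = new StreamReader(textfile)) { result = VisibleLineReader.ExtractLineAfterVisible(sr); }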

Try this:
var tmp = File.ReadAllText("TextFile1.txt");
var result = Regex.Match(tmp, "This is the text I want to get", RegexOptions.Multiline);
// Groups.Count is at least 1 even when nothing matched, so test Success instead
if (result.Success)
    Console.WriteLine(result.Value);
else
    Console.WriteLine("string not found.");
Regards,
//jafc

You could change your regex this way:
var result = Regex.Match(tmp, @"<Visible> ""true""\s*""([\S ]+)""", RegexOptions.Singleline).Groups[1].Value;
If you want to get all the matches, not only the first one, you could use Regex.Matches.
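To collect every match rather than only the first, the same pattern can be used with Regex.Matches. A sketch with hypothetical class and method names; RegexOptions.Singleline is dropped because it only changes how . behaves, and this pattern contains no dot. The \s* also consumes the line break between the <Visible> line and the quoted line.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class VisibleTextMatcher
{
    // <Visible> "true", optional whitespace (including line breaks),
    // then a quoted run of non-whitespace characters and spaces.
    private static readonly Regex Pattern =
        new Regex(@"<Visible> ""true""\s*""([\S ]+)""");

    public static List<string> ExtractAll(string text)
    {
        var results = new List<string>();
        foreach (Match m in Pattern.Matches(text))
        {
            // Group 1 holds the text between the quotes.
            results.Add(m.Groups[1].Value);
        }
        return results;
    }
}
```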

Thanks a lot for your input! This helped me to find a final solution:
First, I extracted only a small part containing the string I want to extract to avoid ambiguities:
string[] tmp = File.ReadAllLines(textfile);
List<string> Content = new List<string>();
bool dumpA = false;
Regex regBEGIN = new Regex(@"<TestInfo\.""Content"">");
Regex regEND = new Regex(@"<TestInfo\.""Content2"">");
foreach (string line in tmp)
{
if (dumpA)
Content.Add(line.Trim());
if (regBEGIN.IsMatch(line))
dumpA = true;
if (regEND.IsMatch(line)) break;
}
Then I can extract the line starting with a double quote, which now occurs only once:
string result = "";
foreach (string line in Content)
{
if (line.StartsWith("\""))
{
result = line;
result = result.Replace("\"", "");
result = result.Trim();
}
}
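Both steps (cutting out the block for one TestInfo entry, then taking its quoted line) can also be done in a single pass. A sketch, assuming each entry starts with a <TestInfo."Name"> header and its text is the first line inside the block that starts with a double quote; the class and method names are hypothetical.

```csharp
using System.Collections.Generic;

public static class TestInfoReader
{
    // Returns the quoted text line of the entry called `name`, or null.
    public static string ExtractQuotedLine(IEnumerable<string> lines, string name)
    {
        string header = "<TestInfo.\"" + name + "\">";
        bool inSection = false;
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.StartsWith("<TestInfo."))
            {
                // Reached the next header after the wanted section: give up.
                if (inSection) break;
                // Only an exact header match opens the wanted section.
                inSection = line == header;
                continue;
            }
            if (inSection && line.StartsWith("\""))
            {
                // Strip the enclosing quotes, as in the final solution.
                return line.Replace("\"", "").Trim();
            }
        }
        return null;
    }
}
```

With a file: string result = TestInfoReader.ExtractQuotedLine(File.ReadLines(textfile), "Content");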

Related

C# search line and then overwrite it

How can I find a line in C# and overwrite it (.sii file)?
string result = string.Empty;
var lines = File.ReadAllLines(Path);
foreach (var line in lines)
{
if (line.Contains("my_truck_placement: ("))
{
var text = line.Replace("my_truck_placement: ", "");
result = text.Trim();
File.WriteAllText(Path, result);
}
}

The main problem is that you write to the file too early, before you have finished processing its contents. Worse, File.WriteAllText replaces the whole file with just the one modified line.
// use implicit types wherever possible
// Good to explicitly initialize with string.Empty :)
var result = string.Empty;
var lines = File.ReadAllLines(Path);
// Use a for loop rather than foreach here, because we modify
// elements of the array being iterated over.
for (var i = 0; i < lines.Length; i++)
{
var line = lines[i];
if (line.Contains("my_truck_placement: ("))
{
lines[i] = line.Replace("my_truck_placement: ", "");
}
}
// Here, after all manipulations, you are able to write to file.
File.WriteAllLines(Path, lines);
You could simplify the loop body even further, because Replace leaves lines without the phrase unchanged:
lines[i] = lines[i].Replace("my_truck_placement: (", "(");
This assumes the phrase only occurs at the beginning of the line.
You could even reduce everything to a single statement (this needs using System.Linq;):
File.WriteAllLines(
Path,
File.ReadAllLines(Path)
.Select(x => x.Replace("my_truck_placement: (", "("))
.ToArray());
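Separating the line transformation from the file I/O makes it easy to check on sample data. A sketch with a hypothetical helper name, using the same Replace call as the one-liner:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PlacementEditor
{
    // Hypothetical helper: removes the "my_truck_placement: " label in front of
    // the opening parenthesis; lines without the phrase are returned unchanged.
    public static string[] RemovePrefix(IEnumerable<string> lines)
    {
        return lines
            .Select(x => x.Replace("my_truck_placement: (", "("))
            .ToArray(); // materialize so all input is read before any writing
    }
}
```

Usage: File.WriteAllLines(path, PlacementEditor.RemovePrefix(File.ReadAllLines(path)));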

Search multiple words in a text file

I wrote code to search for several words in a text file, but only the last word is actually checked. How can I fix this?
code:
string txt_text;
string[] words = {
"var",
"bob",
"for",
"example"
};
StreamReader file = new StreamReader("test.txt");
foreach(string _words in words) {
while ((txt_text = file.ReadToEnd()) != null) {
if (txt_text.Contains(_words)) {
textBox1.Text = "founded";
break;
} else {
textBox1.Text = "nothing founded";
break;
}
}
}

First of all, you can get rid of the StreamReader and the loop and query the file with the help of LINQ:
using System.Linq;
using System.IO;
...
textBox1.Text = File
.ReadLines("test.txt")
.Any(line => words.Any(word => line.Contains(word)))
? "found"
: "nothing found";
If you insist on a loop, you should drop the else branch:
// using - do not forget to Dispose IDisposable
using StreamReader file = new StreamReader("test.txt");
// shorter version is
// string txt_text = File.ReadAllText("test.txt");
string txt_text = file.ReadToEnd();
bool found = false;
foreach (string word in words)
if (txt_text.Contains(word)) {
// If any word has been found, stop further searching
found = true;
break;
} // no else here: keep on looping for other words
textBox1.Text = found
? "found"
: "nothing found";

I'd save the text in a variable and then loop over your words to check whether each one exists in the file. Something like this:
string[] words = { "var", "bob", "for", "example"};
var text = file.ReadToEnd();
List<string> foundWords = new List<string>();
foreach (var word in words)
{
if (text.Contains(word))
foundWords.Add(word);
}
Then, the list foundWords contains all matching words.
(PS: Don't forget to put your StreamReader in a using statement so it gets disposed correctly)
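The same idea as a self-contained method that reads from any TextReader, so a StreamReader created in a using statement can be passed in for a real file. A sketch with hypothetical names; keep in mind that Contains matches substrings, so "for" is also found inside "before".

```csharp
using System.Collections.Generic;
using System.IO;

public static class WordFinder
{
    // Reads the whole text once, then records every word it contains.
    public static List<string> FindWords(TextReader reader, IEnumerable<string> words)
    {
        string text = reader.ReadToEnd();
        var foundWords = new List<string>();
        foreach (var word in words)
        {
            if (text.Contains(word))
                foundWords.Add(word);
        }
        return foundWords;
    }
}
```

Usage: using (var file = new StreamReader("test.txt")) { var found = WordFinder.FindWords(file, words); textBox1.Text = found.Count > 0 ? "found" : "nothing found"; }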

Why is this code not replacing data in a text file?

I'm working on a small app which should read a file (ANSI 835) and replace data at certain positions with generic data. Basically I'm trying to scrub a person's first and last name from the file.
The line I'm searching for that contains the name looks like this:
NM1*QC*1*Doe*John*R***MI*010088307 01~
My code looks like this:
string[] input_file = (string[])(e.Data.GetData(DataFormats.FileDrop));
string output_file = @"c:\scrubbed.txt";
foreach (string file in input_file)
{
string[] lines = File.ReadAllLines(file);
foreach (string line in lines)
{
if (line.StartsWith("NM1*QC"))
{
line.Split('*')[1] = "Lastname";
line.Split('*')[2] = "Firstname";
}
}
File.WriteAllLines(output_file, lines);
}
The File.WriteAllLines works, but the data isn't being changed. I'm trying to get any line that starts with NM1*QC to look like this:
NM1*QC*1*Lastname*Firstname*R***MI*010088307 01~
There are many lines in the file that start with NM1*QC. What's the proper way to 'find and replace' and then create a new file in this situation?
As always, thanks for your time!

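String.Split returns a new array that you neither capture nor use; changing an element of that array does not change the underlying string (strings in .NET are immutable). So your code is equivalent to this:
if (line.StartsWith("NM1*QC"))
{
    string[] split1 = line.Split('*');
    split1[1] = "Lastname";   // modifies a throwaway array
    string[] split2 = line.Split('*');
    split2[2] = "Firstname";  // modifies another throwaway array
}
You need to modify a single split array and join it back into a string. Note also that in your sample line the last and first names are elements 3 and 4 (element 0 is "NM1", 1 is "QC", 2 is "1"). Here is how I would rewrite your code: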
string[] input_file = (string[])(e.Data.GetData(DataFormats.FileDrop));
string output_file = @"c:\scrubbed.txt";
foreach (string file in input_file)
{
    string[] lines = File.ReadAllLines(file);
    for (int i = 0; i < lines.Length; i++)
    {
        string line = lines[i];
        if (line.StartsWith("NM1*QC"))
        {
            string[] values = line.Split('*');
            values[3] = "Lastname";  // NM1*QC*1*<last>*<first>*...
            values[4] = "Firstname";
            lines[i] = String.Join("*", values);
        }
    }
    File.WriteAllLines(output_file, lines);
}
Notice I am recombining the individual values using the String.Join method, and inserting the new string back into the array of lines. That will then get written out as you expect.

Here you are creating a temporary array:
line.Split('*')
And you are changing its contents:
line.Split('*')[1] = "Lastname";
After the line has been executed the reference to this temporary array is lost and along with it go your changes.
In order to persist the changes you need to write directly to lines:
for (var i = 0; i < lines.Length; ++i)
{
var line = lines[i];
if (!line.StartsWith("NM1*QC"))
{
continue;
}
var parts = line.Split('*');
parts[3] = "Lastname";
parts[4] = "Firstname";
lines[i] = string.Join("*", parts);
}
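Either loop can delegate the per-line work to a small helper, which also makes the field positions explicit. A sketch with hypothetical names; it assumes, as in the question's sample, that the last name is the fourth *-separated element and the first name the fifth.

```csharp
public static class Nm1Scrubber
{
    // Splitting "NM1*QC*1*Doe*John*..." on '*' gives
    // [0]="NM1", [1]="QC", [2]="1", [3]="Doe", [4]="John", ...
    public static string Scrub(string line, string lastName, string firstName)
    {
        if (!line.StartsWith("NM1*QC"))
            return line; // other segments are left unchanged
        string[] parts = line.Split('*');
        if (parts.Length > 4)
        {
            parts[3] = lastName;
            parts[4] = firstName;
        }
        // Empty elements survive the round trip, so "R***MI" stays intact.
        return string.Join("*", parts);
    }
}
```

Usage: for (var i = 0; i < lines.Length; ++i) lines[i] = Nm1Scrubber.Scrub(lines[i], "Lastname", "Firstname");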

How to create C# Regex to Split the string with some words in quotations?

I have a string as in the following format:
"one,",2,3,"four " ","five"
I need output in the following format:
one,
2
3
four "
five
Can anyone help me to create Regex for the above?

You can do this without Regex, although it's not clear to me exactly what you're trying to do. I've adjusted the code for the updated question:
var text = "\"one\",2,3,\"four \"\",\"five\"";
var collection = text
.Split(',')
.Select(s =>
{
if (s.StartsWith("\"") && s.EndsWith("\""))
{
s = s.Substring(1, s.Length - 2);
}
return s;
})
.ToList();
foreach (var item in collection)
{
Console.WriteLine(item);
}
Splitting on every comma breaks a quoted field that itself contains a comma, such as "one,". For that case, here is another sample that uses a CSV reader from the "CsvHelper" NuGet package:
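using System;
using System.IO;
using CsvHelper;
// Note: this uses an older CsvHelper API. Newer releases expect a CultureInfo or
// CsvConfiguration in the CsvReader constructor, so it may need adjusting.
const string text = "\"one,\",2,3,\"four \"\"\",\"five\"";
using (var textReader = new StringReader(text))
using (var reader = new CsvReader(textReader))
{
    reader.Configuration.Delimiter = ',';
    reader.Configuration.AllowComments = false;
    reader.Configuration.HasHeaderRecord = false;
    if (reader.Read())
    {
        // Each field of the first record, with quotes removed and "" unescaped
        foreach (var item in reader.CurrentRecord)
        {
            Console.WriteLine(item);
        }
    }
}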
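If you would rather avoid the dependency, a small hand-written splitter handles this input. This is a sketch of RFC 4180-style quoting (a doubled quote inside a quoted field stands for one quote), not a full CSV parser: it does not handle line breaks inside fields, and the class name is hypothetical.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuotedCsv
{
    // Commas inside double-quoted fields are kept; "" inside a quoted field becomes ".
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;
        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // escaped quote
                        i++;                 // skip its second half
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // any character, including ','
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // field separator
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString());      // last field
        return fields;
    }
}
```

For the string used with CsvHelper, this yields one,  2  3  four "  five, matching the desired output.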

string newString = Regex.Replace(oldString, @"["",]", " ");
The regular expression may need refining, but mainly I want you to see the idea: replace the quotes and commas.
EDIT:
string newString = Regex.Replace(oldString, @"["",]", "\n");

C# Find if a word is in a document

I am looking for a way to check if the "foo" word is present in a text file using C#.
I could use a regular expression, but I'm not sure that will work if the word is split across two lines. I have the same issue with a StreamReader that enumerates over the lines.
Any comments?

What's wrong with a simple search?
If the file is not large and memory is not a problem, simply read the entire file into a string (with the ReadToEnd() method) and use string.Contains().

Here you go. As we read the file line by line, we keep track of the last word of the previous line, join it with the first word of the current line, and check whether that combination equals your pattern or the line itself matches it.
string pattern = "foo";
string lastword = string.Empty;
bool result = false;
Regex regPattern = new Regex(pattern);
using (var fs = new FileStream("File name and path", FileMode.Open, FileAccess.Read, FileShare.Read))
using (var sr = new StreamReader(fs))
{
    string input;
    while ((input = sr.ReadLine()) != null)
    {
        string trimmed = input.Trim();
        // First word of this line (the whole line if it contains no space)
        int firstSpace = trimmed.IndexOf(' ');
        string firstword = firstSpace >= 0 ? trimmed.Substring(0, firstSpace) : trimmed;
        // Rejoin a word that may have been split across the line break
        string joined = lastword + firstword;
        if (joined == pattern || regPattern.IsMatch(input))
        {
            result = true;
            break;
        }
        // Remember this line's last word for the next iteration
        int lastSpace = trimmed.LastIndexOf(' ');
        lastword = lastSpace >= 0 ? trimmed.Substring(lastSpace + 1) : trimmed;
    }
}

Here is a quick example using LINQ. The AsLines helper also joins a line that ends in a hyphen with the next line, so a word hyphenated across a line break is still found:
using System.Collections.Generic;
using System.IO;
using System.Linq;
class Program
{
    static void Main(string[] args)
    {
        { // LINQ version
            bool hasFoo = "file.txt".AsLines()
                .Any(l => l.Contains("foo"));
        }
        { // No LINQ or extension method syntax needed
            bool hasFoo = false;
            foreach (var line in Tools.AsLines("file.txt"))
                if (line.Contains("foo"))
                {
                    hasFoo = true;
                    break;
                }
        }
    }
}
public static class Tools
{
    // Yields the file's lines, merging a line that ends in "-" with the following line
    public static IEnumerable<string> AsLines(this string filename)
    {
        using (var reader = new StreamReader(filename))
            while (!reader.EndOfStream)
            {
                var line = reader.ReadLine();
                while (line.EndsWith("-") && !reader.EndOfStream)
                    line = line.Substring(0, line.Length - 1)
                        + reader.ReadLine();
                yield return line;
            }
    }
}

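What if the line contains football? Or fool? If you are going to go down the regular expression route, you need to look for word boundaries. Use a verbatim string so that \b reaches the regex engine (in a normal string literal, "\b" is the backspace character):
Regex r = new Regex(@"\bfoo\b");
Also take case insensitivity into account if you need it, for example with RegexOptions.IgnoreCase.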
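A small sketch combining word boundaries with case-insensitive matching (the class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

public static class WordSearch
{
    // \b on both sides: "foo" must be a whole word, so "football" and "fool" fail.
    // IgnoreCase lets "Foo" and "FOO" match as well.
    private static readonly Regex Foo =
        new Regex(@"\bfoo\b", RegexOptions.IgnoreCase);

    public static bool ContainsFoo(string text) => Foo.IsMatch(text);
}
```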

You don't need regular expressions in a case this simple. Simply loop over the lines and check if it contains foo.
using (StreamReader sr = File.OpenText("filename"))
{
    string line = null;
    while (!sr.EndOfStream)
    {
        line = sr.ReadLine();
        if (line.Contains("foo"))
        {
            // foo was found in the file
            break;
        }
    }
}

You could construct a regex that allows an optional line break between each pair of characters.
private static bool IsSubstring(string input, string substring)
{
    string[] letters = new string[substring.Length];
    for (int i = 0; i < substring.Length; i += 1)
    {
        // Escape each character so regex metacharacters are matched literally
        letters[i] = Regex.Escape(substring[i].ToString());
    }
    // Allow an optional \r and/or \n between consecutive characters
    string regex = @"\b" + string.Join(@"(\r?\n?)", letters) + @"\b";
    return Regex.IsMatch(input, regex, RegexOptions.ExplicitCapture);
}
