C# Find if a word is in a document

C# Find if a word is in a document - c#

I am looking for a way to check if the "foo" word is present in a text file using C#.
I may use a regular expression but I'm not sure that is going to work if the word is splitted in two lines. I got the same issue with a streamreader that enumerates over the lines.
Any comments ?

What's wrong with a simple search?
If the file is not large, and memory is not a problem, simply read the entire file into a string (ReadToEnd() method), and use string Contains()

Here ya go. So we look at the string as we read the file and we keep track of the first word last word combo and check to see if matches your pattern.
string pattern = "foo";
string input = null;
string lastword = string.Empty;
string firstword = string.Empty;
bool result = false;
FileStream FS = new FileStream("File name and path", FileMode.Open, FileAccess.Read, FileShare.Read);
StreamReader SR = new StreamReader(FS);
while ((input = SR.ReadLine()) != null)
{
firstword = input.Substring(0, input.IndexOf(" "));
if(lastword.Trim() != string.Empty) { firstword = lastword.Trim() + firstword.Trim(); }
Regex RegPattern = new Regex(pattern);
Match Match1 = RegPattern.Match(input);
string value1 = Match1.ToString();
if (pattern.Trim() == firstword.Trim() || value1 != string.Empty) { result = true; }
lastword = input.Trim().Substring(input.Trim().LastIndexOf(" "));
}

Here is a quick quick example using LINQ
static void Main(string[] args)
{
{ //LINQ version
bool hasFoo = "file.txt".AsLines()
.Any(l => l.Contains("foo"));
}
{ // No LINQ or Extension Methods needed
bool hasFoo = false;
foreach (var line in Tools.AsLines("file.txt"))
if (line.Contains("foo"))
{
hasFoo = true;
break;
}
}
}
}
public static class Tools
{
public static IEnumerable<string> AsLines(this string filename)
{
using (var reader = new StreamReader(filename))
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
while (line.EndsWith("-") && !reader.EndOfStream)
line = line.Substring(0, line.Length - 1)
+ reader.ReadLine();
yield return line;
}
}
}

What about if the line contains football? Or fool? If you are going to go down the regular expression route you need to look for word boundaries.
Regex r = new Regex("\bfoo\b");
Also ensure you are taking into consideration case insensitivity if you need to.

You don't need regular expressions in a case this simple. Simply loop over the lines and check if it contains foo.
using (StreamReader sr = File.Open("filename", FileMode.Open, FileAccess.Read))
{
string line = null;
while (!sr.EndOfStream) {
line = sr.ReadLine();
if (line.Contains("foo"))
{
// foo was found in the file
}
}
}

You could construct a regex which allows for newlines to be placed between every character.
private static bool IsSubstring(string input, string substring)
{
string[] letters = new string[substring.Length];
for (int i = 0; i < substring.Length; i += 1)
{
letters[i] = substring[i].ToString();
}
string regex = #"\b" + string.Join(#"(\r?\n?)", letters) + #"\b";
return Regex.IsMatch(input, regex, RegexOptions.ExplicitCapture);
}

Related

extracting a substring within a multiline string

I have a text file containing the following lines:
<TestInfo."Content">
{
<Label> "Content"
<Visible> "true"
"This is the text I want to get"
}
<TestInfo."Content2">
{
<Label> "Content2"
<Visible> "true"
"I don't want e.g. this"
}
I want to extract This is the text I want to get.
I tried e.g. the following:
string tmp = File.ReadAllText(textfile);
string result = Regex.Match(tmp, #"<Label> ""Content"" \n\s+ <Visible> ""true"" \n\s+ ""(.+?)""", RegexOptions.Singleline).Groups[1].Value;
However, in this case I get only the first word.
So, my output is: This
And I have no idea why...
I would appreciate any help. Thanks!

If you want the entire line after the line that starts with <Visible>, you'd better read the file line by line instead of using File.ReadAllText and a regular expression:
string result;
using (StreamReader sr = new StreamReader(textfile))
{
while (sr.Peek() >= 0)
{
string line = sr.ReadLine();
if (line.StartsWith("<Visible>"))
{
result = sr.ReadLine();
break;
}
}
}

Try this:
var tmp = File.ReadAllText("TextFile1.txt");
var result = Regex.Match(tmp, "This is the text I want to get", RegexOptions.Multiline);
if (result.Groups.Count> 0)
for (int i = 0; i < result.Groups.Count; i++)
Console.WriteLine(result.Groups[i].Value);
else
Console.WriteLine("string not found.");
Regards,
//jafc

You could change your regex this way:
var result = Regex.Match(tmp, #"<Visible> ""true""\s*""([\S ]+)""", RegexOptions.Singleline).Groups[1].Value;
If you want to get all the matches, not only the first one, you could use Regex.Matches

Thanks a lot for your input! This helped me to find a final solution:
First, I extracted only a small part containing the string I want to extract to avoid ambiguities:
string[] tmp = File.ReadAllLines(textfile);
List<string> Content = new List<string>();
bool dumpA = false;
Regex regBEGIN = new Regex(#"<TestInfo\.""Content"">");
Regex regEND = new Regex(#"<TestInfo\.""Content2"">");
foreach (string line in tmp)
{
if (dumpA)
Content.Add(line.Trim());
if (regBEGIN.IsMatch(line))
dumpA = true;
if (regEND.IsMatch(line)) break;
}
Then I can extract the (now only once existing) line starting with '"':
string result = "";
foreach (string line in Content)
{
if (line.StartsWith("\""))
{
result = line;
result = result.Replace("\"", "");
result = result.Trim();
}
}

Delete all but {x} C# string

I'm trying to cycle through a .txt to build a test function for another application I'm building.
I've got a list of UK based lat/long values that are formatted like this:
Latitude: 57°39′55″N 57.665198
Longitude: 6°57′27″W -6.95739395
Distance: 184.8338 mi Bearing: 329.815°
with the intended result of this small application being just the lat/long values:
57.665198
-6.95739395
So far I've got a StreamReader working with a myString.StartsWith("Latitude") {} but I'm stuck.
How do I detect a splitstring of 2 spaces " " inside of a string and delete everything before that? My code so far is this:
static void Main(string[] args)
{
string text = "";
using (var streamReader = new StreamReader(#"c:\mb\latlong.txt", Encoding.UTF8))
{
text = streamReader.ReadToEnd();
if (text.Trim().StartsWith("Latitude: "))
{
text.Split()
} else if (text.StartsWith("Distance: "))
{
} else if (text.StartsWith(""))
{
}
streamReader.ReadLine();
}
Console.ReadKey();
}
Thanks in advance

You can try using regular expressions
var result = File
.ReadLines(#"C:\MyFile.txt")
.SelectMany(line => Regex
.Matches(line, #"(?<=\s)-?[0-9]+(\.[0-9]+)*$")
.OfType<Match>()
.Select(match => match.Value));
Test
// 57.665198
// -6.95739395
Console.Write(String.Join(Environment.NewLine, result));

Use string.IndexOf(" ") to find the position of the two spaces in the string. Then you can use string.Substring(position) to get the string after that point.
In your code:
if (text.Trim().StartsWith("Latitude: "))
{
var positionOfTwoSpaces = text.IndexOf(" ");
var latString = text.Substring(positionOfTwoSpaces);
var latValue = float.Parse(latString);
}

You can try the regular expression solution. (You might need to fix up the space counts in the regex definitions)
static void Main(string[] args)
{
string text = "";
Regex lat = new Regex("Latitude: .+? (.+)");
Regex lon = new Regex("Longitude .+? (.+)");
using (var streamReader = new StreamReader(#"c:\mb\latlong.txt", Encoding.UTF8))
{
string line;
while ((line = streamReader.ReadLine() != null)
{
if (lat.IsMatch(line))
lat.Match(line).Groups[1].Value // latitude
else if(lon.IsMatch(line))
lon.Match(line).Groups[1].Value // longitude
}
}
Console.ReadKey();
}

A simple solution would be
string[] fileLines = IO.File.ReadAllLines("input file path");
List<string> resultLines = new List<string>();
foreach (string line in fileLines) {
string[] parts = line.Split(" "); //Double space
if (parts.Count() > 1) {
string lastPart = parts.LastOrDefault();
if (!string.IsNullOrEmpty(lastPart)) {
resultLines.Add(lastPart);
}
}
}
IO.File.WriteAllLines("output file path", resultLines.ToArray());

As I already suggested in my comment. You can look for the last occurrence of the space and substring from there.
using System;
using System.IO;
using System.Text;
public class Test
{
public static void Main()
{
String line = String.Empty;
while(!String.IsNullOrEmpty((line = streamReader.ReadLine())))
{
if(line.StartsWith("Latitude:"))
{
line = line.Substring(line.LastIndexOf(' ') + 1);
Console.WriteLine(line);
}
}
Console.ReadKey();
}
}
Working example.
I didn't provide all the code because this is just copy paste for the longitude case. I think you can do this by your own. :)

Scan multiple word from a text file

I have a list of words. I want the program to scan for multiple words from a text file.
This is what i already have:
int counter = 0;
string line;
StringBuilder sb = new StringBuilder();
string[] words = { "var", "bob", "for", "example"};
try
{
using (StreamReader file = new StreamReader("test.txt"))
{
while ((line = file.ReadLine()) != null)
{
if (line.Contains(Convert.ToChar(words)))
{
sb.AppendLine(line.ToString());
}
}
}
listResults.Text += sb.ToString();
}
catch (Exception ex)
{
listResults.ForeColor = Color.Red;
listResults.Text = "---ERROR---";
}
So i want to scan the file for a word, and if it's not there, scan for the next word...

String.Contains() only takes one argument: a string. What your call to Contains(Convert.ToChar(words)) does, is probably not what you expect.
As explained in Using C# to check if string contains a string in string array, you might want to do something like this:
using (StreamReader file = new StreamReader("test.txt"))
{
while ((line = file.ReadLine()) != null)
{
foreach (string word in words)
{
if (line.Contains(word))
{
sb.AppendLine(line);
}
}
}
}
Or if you want to follow your exact problem statement ("scan the file for a word, and if it's not there, scan for the next word"), you might want to take a look at Return StreamReader to Beginning:
using (StreamReader file = new StreamReader("test.txt"))
{
foreach (string word in words)
{
while ((line = file.ReadLine()) != null)
{
if (line.Contains(word))
{
sb.AppendLine(line);
}
}
if (sb.Length == 0)
{
// Rewind file to prepare for next word
file.Position = 0;
file.DiscardBufferedData();
}
else
{
return sb.ToString();
}
}
}
But this will think "bob" is part of "bobcat". If you don't agree, see String compare C# - whole word match, and replace:
line.Contains(word)
with
string wordWithBoundaries = "\\b" + word + "\\b";
Regex.IsMatch(line, wordWithBoundaries);

StringBuilder sb = new StringBuilder();
string[] words = { "var", "bob", "for", "example" };
string[] file_lines = File.ReadAllLines("filepath");
for (int i = 0; i < file_lines.Length; i++)
{
string[] split_words = file_lines[i].Split(' ');
foreach (string str in split_words)
{
foreach (string word in words)
{
if (str == word)
{
sb.AppendLine(file_lines[i]);
}
}
}
}

This works a treat:
var query =
from line in System.IO.File.ReadLines("test.txt")
where words.Any(word => line.Contains(word))
select line;
To get these out as a single string, just do this:
var results = String.Join(Environment.NewLine, query);
Couldn't be much simpler.
If you want to match only whole words it becomes only a little more complicated. You can do this:
Regex[] regexs =
words
.Select(word => new Regex(String.Format(#"\b{0}\b", Regex.Escape(word))))
.ToArray();
var query =
from line in System.IO.File.ReadLines(fileName)
where regexs.Any(regex => regex.IsMatch(line))
select line;

c# Remove rows from csv

I have two csv files. In the first file i have a list of users, and in the second file i have a list of duplicate users. Im trying to remove the rows in the first file that are equal to the second file.
Heres the code i have so far:
StreamWriter sw = new StreamWriter(path3);
StreamReader sr = new StreamReader(path2);
string[] lines = File.ReadAllLines(path);
foreach (string line in lines)
{
string user = sr.ReadLine();
if (line != user)
{
sw.WriteLine(line);
}
File 1 example:
Modify,ABAMA3C,Allpay - Free State - HO,09072701
Modify,ABCG327,Processing Centre,09085980
File 2 Example:
Modify,ABAA323,Group HR Credit Risk & Finance
Modify,ABAB959,Channel Sales & Service,09071036
Any suggestions?
Thanks.

All you'd have to do is change the following file paths in the code below and you will get a file back (file one) without the duplicate users from file 2. This code was written with the idea in mind that you want something that is easy to understand. Sure there are other more elegant solutions, but I wanted to make it as basic as possible for you:
(Paste this in the main method of your program)
string line;
StreamReader sr = new StreamReader(#"C:\Users\J\Desktop\texts\First.txt");
StreamReader sr2 = new StreamReader(#"C:\Users\J\Desktop\texts\Second.txt");
List<String> fileOne = new List<string>();
List<String> fileTwo = new List<string>();
while (sr.Peek() >= 0)
{
line = sr.ReadLine();
if(line != "")
{
fileOne.Add(line);
}
}
sr.Close();
while (sr2.Peek() >= 0)
{
line = sr2.ReadLine();
if (line != "")
{
fileTwo.Add(line);
}
}
sr2.Close();
var t = fileOne.Except(fileTwo);
StreamWriter sw = new StreamWriter(#"C:\Users\justin\Desktop\texts\First.txt");
foreach(var z in t)
{
sw.WriteLine(z);
}
sw.Flush();

If this is not homework, but a production thing, and you can install assemblies, you'll save 3 hours of your life if you swallow your pride and use a piece of the VB library:
There are many exceptions (CR/LF between commas=legal in quotes; different types of quotes; etc.) This will handle anything excel will export/import.
Sample code to load a 'Person' class pulled from a program I used it in:
Using Reader As New Microsoft.VisualBasic.FileIO.TextFieldParser(CSVPath)
Reader.TextFieldType = Microsoft.VisualBasic.FileIO.FieldType.Delimited
Reader.Delimiters = New String() {","}
Reader.TrimWhiteSpace = True
Reader.HasFieldsEnclosedInQuotes = True
While Not Reader.EndOfData
Try
Dim st2 As New List(Of String)
st2.addrange(Reader.ReadFields())
If iCount > 0 Then ' ignore first row = field names
Dim p As New Person
p.CSVLine = st2
p.FirstName = st2(1).Trim
If st2.Count > 2 Then
p.MiddleName = st2(2).Trim
Else
p.MiddleName = ""
End If
p.LastNameSuffix = st2(0).Trim
If st2.Count >= 5 Then
p.TestCase = st2(5).Trim
End If
If st2(3) > "" Then
p.AccountNumbersFromCase.Add(st2(3))
End If
While p.CSVLine.Count < 15
p.CSVLine.Add("")
End While
cases.Add(p)
End If
Catch ex As Microsoft.VisualBasic.FileIO.MalformedLineException
MsgBox("Line " & ex.Message & " is not valid and will be skipped.")
End Try
iCount += 1
End While
End Using

this to close the streams properly:
using(var sw = new StreamWriter(path3))
using(var sr = new StreamReader(path2))
{
string[] lines = File.ReadAllLines(path);
foreach (string line in lines)
{
string user = sr.ReadLine();
if (line != user)
{
sw.WriteLine(line);
}
}
}
for help on the real logic of removal or compare, answer the comment of El Ronnoco above...

You need to close the streams or utilize using clause
sw.Close();
using(StreamWriter sw = new StreamWriter(#"c:\test3.txt"))

You can use LINQ...
class Program
{
static void Main(string[] args)
{
var fullList = "TextFile1.txt".ReadAsLines();
var removeThese = "TextFile2.txt".ReadAsLines();
//Change this line if you need to change the filter results.
//Note: this assume you are wanting to remove results from the first
// list when the entire record matches. If you want to match on
// only part of the list you will need to split/parse the records
// and then filter your results.
var cleanedList = fullList.Except(removeThese);
cleanedList.WriteAsLinesTo("result.txt");
}
}
public static class Tools
{
public static IEnumerable<string> ReadAsLines(this string filename)
{
using (var reader = new StreamReader(filename))
while (!reader.EndOfStream)
yield return reader.ReadLine();
}
public static void WriteAsLinesTo(this IEnumerable<string> lines, string filename)
{
using (var writer = new StreamWriter(filename) { AutoFlush = true, })
foreach (var line in lines)
writer.WriteLine(line);
}
}

using(var sw = new StreamWriter(path3))
using(var sr = new StreamReader(path))
{
string []arrRemove = File.ReadAllLines(path2);
HashSet<string> listRemove = new HashSet<string>(arrRemove.Count);
foreach(string s in arrRemove)
{
string []sa = s.Split(',');
if( sa.Count < 2 ) continue;
listRemove.Add(sa[1].toUpperCase());
}
string line = sr.ReadLine();
while( line != null )
{
string []sa = line.Split(',');
if( sa.Count < 2 )
sw.WriteLine(line);
else if( !listRemove.contains(sa[1].toUpperCase()) )
sw.WriteLine(line);
line = sr.ReadLine();
}
}

Delete specific line from a text file?

I need to delete an exact line from a text file but I cannot for the life of me workout how to go about doing this.
Any suggestions or examples would be greatly appreciated?
Related Questions
Efficient way to delete a line from a text file (C#)

If the line you want to delete is based on the content of the line:
string line = null;
string line_to_delete = "the line i want to delete";
using (StreamReader reader = new StreamReader("C:\\input")) {
using (StreamWriter writer = new StreamWriter("C:\\output")) {
while ((line = reader.ReadLine()) != null) {
if (String.Compare(line, line_to_delete) == 0)
continue;
writer.WriteLine(line);
}
}
}
Or if it is based on line number:
string line = null;
int line_number = 0;
int line_to_delete = 12;
using (StreamReader reader = new StreamReader("C:\\input")) {
using (StreamWriter writer = new StreamWriter("C:\\output")) {
while ((line = reader.ReadLine()) != null) {
line_number++;
if (line_number == line_to_delete)
continue;
writer.WriteLine(line);
}
}
}

The best way to do this is to open the file in text mode, read each line with ReadLine(), and then write it to a new file with WriteLine(), skipping the one line you want to delete.
There is no generic delete-a-line-from-file function, as far as I know.

One way to do it if the file is not very big is to load all the lines into an array:
string[] lines = File.ReadAllLines("filename.txt");
string[] newLines = RemoveUnnecessaryLine(lines);
File.WriteAllLines("filename.txt", newLines);

Hope this simple and short code will help.
List linesList = File.ReadAllLines("myFile.txt").ToList();
linesList.RemoveAt(0);
File.WriteAllLines("myFile.txt"), linesList.ToArray());
OR use this
public void DeleteLinesFromFile(string strLineToDelete)
{
string strFilePath = "Provide the path of the text file";
string strSearchText = strLineToDelete;
string strOldText;
string n = "";
StreamReader sr = File.OpenText(strFilePath);
while ((strOldText = sr.ReadLine()) != null)
{
if (!strOldText.Contains(strSearchText))
{
n += strOldText + Environment.NewLine;
}
}
sr.Close();
File.WriteAllText(strFilePath, n);
}

You can actually use C# generics for this to make it real easy:
var file = new List<string>(System.IO.File.ReadAllLines("C:\\path"));
file.RemoveAt(12);
File.WriteAllLines("C:\\path", file.ToArray());

This can be done in three steps:
// 1. Read the content of the file
string[] readText = File.ReadAllLines(path);
// 2. Empty the file
File.WriteAllText(path, String.Empty);
// 3. Fill up again, but without the deleted line
using (StreamWriter writer = new StreamWriter(path))
{
foreach (string s in readText)
{
if (!s.Equals(lineToBeRemoved))
{
writer.WriteLine(s);
}
}
}

Read and remember each line
Identify the one you want to get rid
of
Forget that one
Write the rest back over the top of
the file

I cared about the file's original end line characters ("\n" or "\r\n") and wanted to maintain them in the output file (not overwrite them with what ever the current environment's char(s) are like the other answers appear to do). So I wrote my own method to read a line without removing the end line chars then used it in my DeleteLines method (I wanted the option to delete multiple lines, hence the use of a collection of line numbers to delete).
DeleteLines was implemented as a FileInfo extension and ReadLineKeepNewLineChars a StreamReader extension (but obviously you don't have to keep it that way).
public static class FileInfoExtensions
{
public static FileInfo DeleteLines(this FileInfo source, ICollection<int> lineNumbers, string targetFilePath)
{
var lineCount = 1;
using (var streamReader = new StreamReader(source.FullName))
{
using (var streamWriter = new StreamWriter(targetFilePath))
{
string line;
while ((line = streamReader.ReadLineKeepNewLineChars()) != null)
{
if (!lineNumbers.Contains(lineCount))
{
streamWriter.Write(line);
}
lineCount++;
}
}
}
return new FileInfo(targetFilePath);
}
}
public static class StreamReaderExtensions
{
private const char EndOfFile = '\uffff';
/// <summary>
/// Reads a line, similar to ReadLine method, but keeps any
/// new line characters (e.g. "\r\n" or "\n").
/// </summary>
public static string ReadLineKeepNewLineChars(this StreamReader source)
{
if (source == null)
throw new ArgumentNullException(nameof(source));
char ch = (char)source.Read();
if (ch == EndOfFile)
return null;
var sb = new StringBuilder();
while (ch != EndOfFile)
{
sb.Append(ch);
if (ch == '\n')
break;
ch = (char)source.Read();
}
return sb.ToString();
}
}

Are you on a Unix operating system?
You can do this with the "sed" stream editor. Read the man page for "sed"

What?
Use file open, seek position then stream erase line using null.
Gotch it? Simple,stream,no array that eat memory,fast.
This work on vb.. Example search line culture=id where culture are namevalue and id are value and we want to change it to culture=en
Fileopen(1, "text.ini")
dim line as string
dim currentpos as long
while true
line = lineinput(1)
dim namevalue() as string = split(line, "=")
if namevalue(0) = "line name value that i want to edit" then
currentpos = seek(1)
fileclose()
dim fs as filestream("test.ini", filemode.open)
dim sw as streamwriter(fs)
fs.seek(currentpos, seekorigin.begin)
sw.write(null)
sw.write(namevalue + "=" + newvalue)
sw.close()
fs.close()
exit while
end if
msgbox("org ternate jua bisa, no line found")
end while
that's all..use #d

We Keep Coding

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

C# Find if a word is in a document - c#

I am looking for a way to check if the "foo" word is present in a text file using C#. I may use a regular expression but I'm not sure that is going to work if the word is splitted in two lines. I got the same issue with a streamreader that enumerates over the lines. Any comments ?

What's wrong with a simple search? If the file is not large, and memory is not a problem, simply read the entire file into a string (ReadToEnd() method), and use string Contains()

What about if the line contains football? Or fool? If you are going to go down the regular expression route you need to look for word boundaries. Regex r = new Regex("\bfoo\b"); Also ensure you are taking into consideration case insensitivity if you need to.

Related

extracting a substring within a multiline string

Delete all but {x} C# string

Scan multiple word from a text file

c# Remove rows from csv

Delete specific line from a text file?

Categories

Resources