I have a list of words, and I want the program to scan a text file for any of them.
This is what I already have:
int counter = 0;
string line;
StringBuilder sb = new StringBuilder();
string[] words = { "var", "bob", "for", "example"};
try
{
using (StreamReader file = new StreamReader("test.txt"))
{
while ((line = file.ReadLine()) != null)
{
if (line.Contains(Convert.ToChar(words)))
{
sb.AppendLine(line.ToString());
}
}
}
listResults.Text += sb.ToString();
}
catch (Exception ex)
{
listResults.ForeColor = Color.Red;
listResults.Text = "---ERROR---";
}
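So I want to scan the file for a word, and if it's not there, scan for the next word...
String.Contains() looks for one string at a time, and your call Contains(Convert.ToChar(words)) does not do what you expect. Convert.ToChar cannot turn a string array into a single character, so it throws an InvalidCastException at runtime, and your catch block turns that into "---ERROR---".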
As explained in Using C# to check if string contains a string in string array, you might want to do something like this:
using (StreamReader file = new StreamReader("test.txt"))
{
while ((line = file.ReadLine()) != null)
{
foreach (string word in words)
{
if (line.Contains(word))
{
sb.AppendLine(line);
break; // add each matching line only once, even if it contains several words
}
}
}
}
Or if you want to follow your exact problem statement ("scan the file for a word, and if it's not there, scan for the next word"), you might want to take a look at Return StreamReader to Beginning:
using (StreamReader file = new StreamReader("test.txt"))
{
foreach (string word in words)
{
while ((line = file.ReadLine()) != null)
{
if (line.Contains(word))
{
sb.AppendLine(line);
}
}
if (sb.Length == 0)
{
// Rewind file to prepare for next word
file.BaseStream.Position = 0;
file.DiscardBufferedData();
}
else
{
return sb.ToString();
}
}
}
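If the file fits in memory, you can get the same "first word that is present" behaviour without rewinding: read the lines once and test each word in turn. A minimal sketch, assuming the words are tried in array order; the class and method names (WordScanner, LinesForFirstMatchingWord) are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class WordScanner
{
    // Returns the lines containing the first word (in array order) that occurs
    // anywhere in the file, or an empty list if none of the words occur.
    public static List<string> LinesForFirstMatchingWord(string path, string[] words)
    {
        // Read the file once; every word is then checked against the same lines,
        // so there is no need to rewind a StreamReader.
        string[] lines = File.ReadAllLines(path);

        foreach (string word in words)
        {
            List<string> matches = lines.Where(line => line.Contains(word)).ToList();
            if (matches.Count > 0)
            {
                return matches; // stop at the first word that is present
            }
        }

        return new List<string>();
    }
}
```

Like the loop above, this uses a plain substring test, so the whole-word caveat below applies to it too.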
But this will think "bob" is part of "bobcat". If you only want whole-word matches, see String compare C# - whole word match, and replace:
line.Contains(word)
with
Regex.IsMatch(line, @"\b" + Regex.Escape(word) + @"\b")
(Regex is in the System.Text.RegularExpressions namespace.)
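As a minimal sketch of the whole-word check on its own (the helper name ContainsWholeWord is hypothetical), note that Regex.Escape matters when a word can contain regex metacharacters such as . or +:

```csharp
using System.Text.RegularExpressions;

public static class WholeWord
{
    // True if 'word' occurs in 'line' as a whole word.
    // Regex.Escape stops characters such as '.' or '+' in 'word' from being read as regex syntax.
    public static bool ContainsWholeWord(string line, string word)
    {
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        return Regex.IsMatch(line, pattern);
    }
}
```

For example, ContainsWholeWord("I saw a bobcat", "bob") returns false, while ContainsWholeWord("bob is here", "bob") returns true. \b only behaves like this when the word starts and ends with a letter, digit or underscore.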
StringBuilder sb = new StringBuilder();
string[] words = { "var", "bob", "for", "example" };
string[] file_lines = File.ReadAllLines("filepath");
for (int i = 0; i < file_lines.Length; i++)
{
string[] split_words = file_lines[i].Split(' ');
// Append the line once if any of its words matches (needs System.Linq)
if (split_words.Any(str => words.Contains(str)))
{
sb.AppendLine(file_lines[i]);
}
}
This works a treat:
var query =
from line in System.IO.File.ReadLines("test.txt")
where words.Any(word => line.Contains(word))
select line;
To get these out as a single string, just do this:
var results = String.Join(Environment.NewLine, query);
Couldn't be much simpler.
If you want to match only whole words it becomes only a little more complicated. You can do this:
Regex[] regexs =
words
.Select(word => new Regex(String.Format(@"\b{0}\b", Regex.Escape(word))))
.ToArray();
var query =
from line in System.IO.File.ReadLines(fileName)
where regexs.Any(regex => regex.IsMatch(line))
select line;
I'm converting CSV files to XML files via C#. I'm saving the CSV file in a List of strings, but it doesn't handle symbols such as ä, á, ê.
public Lesson CsvToLesson(List<string> csv)
{
string lesName = csv[0][csv[0].Length - 3].ToString();
List<Word> words = new List<Word>();
for(int i = 3; i < csv.Count; i++)
{
string lang1 = "";
string lang2 = "";
bool firstWord = true;
foreach (char c in csv[i])
{
if (firstWord)
{
if(c != ';')
{
lang1 += c;
} else
{
firstWord = false;
}
} else {
if (c != ';')
{
lang2 += c;
}
else
{
break;
}
}
}
words.Add(new Word(lang1, lang2, 1, i));
}
return new Lesson(lesName, words);
}
I use the method above to return them as an object called Lesson. In the resulting XML, the special characters are garbled:
<Word kasten="1" id="24">
<lang1>eine Sekret�rin</lang1>
<lang2>une secr�taire</lang2>
</Word>
The reading method:
public void saveCsv(string path)
{
string line;
List<string> csv = new List<string>();
StreamReader file = new StreamReader(path);
while((line = file.ReadLine()) != null)
{
csv.Add(line);
}
file.Close();
AddLesson(controller.CsvToLesson(csv));
}
How can I fix this?
Thanks to Cid.
The problem was that the CSV file wasn't saved as a UTF-8 CSV file. Excel offers several CSV formats when saving; choose "CSV UTF-8".
If you have the same problem, take a look at this post:
How to check encoding of a CSV file
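If you cannot control how the file is saved, one option is to let a byte order mark decide the encoding and fall back to a legacy encoding otherwise. This is a sketch under assumptions: the CsvLoader/ReadCsvLines names are hypothetical, and ISO-8859-1 is assumed as the fallback because it covers ä, á and ê (Windows-1252, the usual "ANSI" encoding on Western European Windows, encodes those characters identically, but on .NET Core it requires registering CodePagesEncodingProvider first):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class CsvLoader
{
    // Reads all lines of a CSV file. If the file starts with a byte order mark
    // (files saved as "CSV UTF-8" by Excel normally do), the BOM decides the encoding;
    // otherwise the bytes are decoded with the fallback encoding.
    public static List<string> ReadCsvLines(string path, Encoding fallback)
    {
        var lines = new List<string>();
        using (var reader = new StreamReader(path, fallback, detectEncodingFromByteOrderMarks: true))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines;
    }
}
```

In saveCsv you would then call CsvLoader.ReadCsvLines(path, Encoding.GetEncoding("iso-8859-1")) instead of using new StreamReader(path), which assumes UTF-8.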
I want to compare two CSV files and print the differences to a file. I currently use the code below to remove a row. Can I change this code so that it compares two CSV files, or is there a better way in C# to compare CSV files?
List<string> lines = new List<string>();
using (StreamReader reader = new StreamReader(System.IO.File.OpenRead(path)))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (line.Contains(csvseperator))
{
string[] split = line.Split(Convert.ToChar(scheidingsteken));
if (split[selectedRow] == value)
{
}
else
{
line = string.Join(csvseperator, split);
lines.Add(line);
}
}
}
}
using (StreamWriter writer = new StreamWriter(path, false))
{
foreach (string line in lines)
writer.WriteLine(line);
}
}
Here is another way to find differences between CSV files, using Cinchoo ETL, an open source library.
For the sample CSV files below:
sample1.csv
id,name
1,Tom
2,Mark
3,Angie
sample2.csv
id,name
1,Tom
2,Mark
4,Lu
METHOD 1:
Using Cinchoo ETL, the code below finds the rows that differ, comparing all columns:
var input1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader().ToArray();
var input2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader().ToArray();
using (var output = new ChoCSVWriter("sampleDiff.csv").WithFirstLineHeader())
{
output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default));
output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default));
}
sampleDiff.csv
id,name
3,Angie
4,Lu
Sample fiddle: https://dotnetfiddle.net/nwLeJ2
METHOD 2:
If you want to find the differences by the id column only:
var input1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader().ToArray();
var input2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader().ToArray();
using (var output = new ChoCSVWriter("sampleDiff.csv").WithFirstLineHeader())
{
output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
}
Sample fiddle: https://dotnetfiddle.net/t6mmJW
If you only want to compare one column, you can use this code:
List<string> lines = new List<string>();
List<string> lines2 = new List<string>();
try
{
using (StreamReader reader = new StreamReader(System.IO.File.OpenRead(pad)))
using (StreamReader read = new StreamReader(System.IO.File.OpenRead(pad2)))
{
string line;
string line2;
//With this you can change the cells (columns) you want to compare
int comp1 = 1;
int comp2 = 1;
//Stops as soon as either file runs out of lines
while ((line = reader.ReadLine()) != null && (line2 = read.ReadLine()) != null)
{
if (line.Contains(seperator) && line2.Contains(seperator))
{
string[] split = line.Split(Convert.ToChar(seperator));
string[] split2 = line2.Split(Convert.ToChar(seperator));
if (split[comp1] != split2[comp2])
{
//It is not the same
}
else
{
//It is the same
}
}
}
}
}
catch (System.IO.IOException)
{
//Handle or log file errors here instead of silently ignoring them
}
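If a plain text comparison of whole rows is enough and you would rather not add a library, two HashSets of lines do the job. A sketch, assuming both files start with the same header line and that row order does not matter; CsvDiff and WriteDifferences are hypothetical names:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvDiff
{
    // Writes every data line that appears in only one of the two files to 'diffPath'.
    // Lines are compared as plain text (so "1,Tom" and "1, Tom" count as different)
    // and row order is ignored. The header (first line) is copied from the first file.
    public static void WriteDifferences(string path1, string path2, string diffPath)
    {
        string[] lines1 = File.ReadAllLines(path1);
        string[] lines2 = File.ReadAllLines(path2);

        var set1 = new HashSet<string>(lines1.Skip(1));
        var set2 = new HashSet<string>(lines2.Skip(1));

        var output = new List<string>();
        if (lines1.Length > 0) output.Add(lines1[0]);                   // header
        output.AddRange(lines1.Skip(1).Where(l => !set2.Contains(l)));  // only in file 1
        output.AddRange(lines2.Skip(1).Where(l => !set1.Contains(l)));  // only in file 2

        File.WriteAllLines(diffPath, output);
    }
}
```

For sample1.csv and sample2.csv this writes the same three lines as the sampleDiff.csv shown for METHOD 1. It does not parse quoted fields, so two rows that differ only in quoting are reported as different.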
I want to remove a column with a specific value. The code below is what I used to remove a row. Can I reverse this to remove a column?
int row = comboBox1.SelectedIndex;
string verw = Convert.ToString(txtChange.Text);
List<string> lines = new List<string>();
using (StreamReader reader = new StreamReader(System.IO.File.OpenRead(filepath)))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (line.Contains(","))
{
string[] split = line.Split(',');
if (split[row] == kill)
{
//after split, fill in the row (index)
}
else
{
line = string.Join(",", split);
lines.Add(line);
}
}
}
}
using (StreamWriter writer = new StreamWriter(path, false))
{
foreach (string line in lines)
writer.WriteLine(line);
}
Ignoring the subtleties of CSV (such as quoted fields that contain commas), this should work:
public void RemoveColumnByIndex(string path, int index)
{
List<string> lines = new List<string>();
using (StreamReader reader = new StreamReader(path))
{
var line = reader.ReadLine();
List<string> values = new List<string>();
while(line != null)
{
values.Clear();
var cols = line.Split(',');
for (int i = 0; i < cols.Length; i++)
{
if (i != index)
values.Add(cols[i]);
}
var newLine = string.Join(",", values);
lines.Add(newLine);
line = reader.ReadLine();
}
}
using (StreamWriter writer = new StreamWriter(path, false))
{
foreach (var line in lines)
{
writer.WriteLine(line);
}
}
}
The code essentially loads each line, breaks it down into columns, loops through the columns ignoring the column in question, then puts the values back together into a line.
This is an over-simplified method, of course. I am sure there are more performant ways.
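The same loop can also be written more compactly with LINQ. A sketch with the same simplifications (a plain Split(','), no quoted fields); the CsvColumns class name is hypothetical:

```csharp
using System.IO;
using System.Linq;

public static class CsvColumns
{
    // Removes the column at 'index' from every line of a comma-separated file.
    // Like the loop version, it splits on ',' only, so quoted fields containing commas are not handled.
    public static void RemoveColumnByIndex(string path, int index)
    {
        // ToList() reads (and closes) the whole file before it is overwritten below.
        var lines = File.ReadLines(path)
            .Select(line => string.Join(",", line.Split(',').Where((value, i) => i != index)))
            .ToList();

        File.WriteAllLines(path, lines);
    }
}
```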
To remove the column by name, here is a little modification to your code example.
List<string> lines = new List<string>();
using (StreamReader reader = new StreamReader(System.IO.File.OpenRead(path)))
{
string target = "";//the name of the column to skip
int? targetPosition = null; //this will be the position of the column to remove if it is available in the csv file
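string line;
bool isHeader = true; //only the first line holds the column names
List<string> collected = new List<string>();
while ((line = reader.ReadLine()) != null)
{
string[] split = line.Split(',');
collected.Clear();
//to get the position of the column to skip, look at the header line only
//(otherwise a data value equal to the column name would move the target)
if (isHeader)
{
for (int i = 0; i < split.Length; i++)
{
if (string.Equals(split[i], target, StringComparison.OrdinalIgnoreCase))
{
targetPosition = i;
break; //we've got what we need. exit loop
}
}
isHeader = false;
}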
//iterate and skip the column position if exist
for (int i = 0; i < split.Length; i++)
{
if (targetPosition != null && i == targetPosition.Value) continue;
collected.Add(split[i]);
}
lines.Add(string.Join(",", collected));
}
}
using (StreamWriter writer = new StreamWriter(path, false))
{
foreach (string line in lines)
writer.WriteLine(line);
}
How can I start reading a file from the 2nd line, skipping the 1st line? This seems to work, but is it the best way to do so?
using (StreamReader sr = new StreamReader(varFile, Encoding.GetEncoding(1250))) {
string[] stringSeparator = new string[] { "\",\"" };
int i = 0;
while (!sr.EndOfStream) {
string line = sr.ReadLine(); //.Trim('"');
if (i > 0) {
string[] values = line.Split(stringSeparator, StringSplitOptions.None);
for (int index = 0; index < values.Length; index++) {
MessageBox.Show(values[index].Trim('"'));
}
}
i++;
}
}
If the file is not very large and can fit in memory:
foreach (var line in File.ReadAllLines(varFile, Encoding.GetEncoding(1250)).Skip(1))
{
string[] values = line.Split(',');
...
}
If not, write an iterator:
public IEnumerable<string> ReadAllLines(string filename, Encoding encoding)
{
using (var reader = new StreamReader(filename, encoding))
{
string line;
while ((line = reader.ReadLine()) != null)
{
yield return line;
}
}
}
and then consume it:
foreach (var line in ReadAllLines(varFile, Encoding.GetEncoding(1250)).Skip(1))
{
string[] values = line.Split(',');
...
}
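On .NET Framework 4.0 and later you don't have to write that iterator yourself: File.ReadLines already reads a file lazily, line by line. A sketch that also splits each data row; the ReadDataRows name and the separator parameter are illustrative assumptions:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class SkipHeader
{
    // Yields the fields of every line except the first one.
    // File.ReadLines streams the file (unlike File.ReadAllLines, which loads it all),
    // and Skip(1) drops the header line.
    public static IEnumerable<string[]> ReadDataRows(string path, Encoding encoding, char separator)
    {
        return File.ReadLines(path, encoding)
            .Skip(1)
            .Select(line => line.Split(separator));
    }
}
```

Usage would be foreach (var values in SkipHeader.ReadDataRows(varFile, Encoding.GetEncoding(1250), ',')) { ... }. On .NET Core and .NET 5+, code page 1250 is only available after registering CodePagesEncodingProvider.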
Could you not just read the first line outside of the loop without assigning it to a variable?
using (StreamReader sr = new StreamReader(varFile, Encoding.GetEncoding(1250))) {
string[] stringSeparator = new string[] { "\",\"" };
if (!sr.EndOfStream)
sr.ReadLine();
while (!sr.EndOfStream) {
string line = sr.ReadLine(); //.Trim('"');
string[] values = line.Split(stringSeparator, StringSplitOptions.None);
for (int index = 0; index < values.Length; index++) {
MessageBox.Show(values[index].Trim('"'));
}
}
}
I see no problem with the way you are doing it. (I couldn't add a comment.)
So, just for the sake of answering: you could call ReadLine() once before the loop. I'm not sure what ReadLine() does if the stream is already at its end, but if nothing happens, that saves you the i > 0 check on every line.
Updated:
To give a more complete answer: calling ReadLine() when the stream is at its end returns null.
Reference: http://msdn.microsoft.com/en-us/library/system.io.streamreader.readline.aspx
Remember to check the return value for null.
I am looking for a way to check whether the word "foo" is present in a text file using C#.
I could use a regular expression, but I'm not sure that would work if the word is split across two lines. I have the same issue with a StreamReader that enumerates over the lines.
Any comments?
What's wrong with a simple search?
If the file is not large and memory is not a problem, simply read the entire file into a string (with the ReadToEnd() method) and use string.Contains().
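A minimal sketch of that approach (FileContains is a hypothetical name). As an assumption about how the split word looks, it treats a word broken across lines as "fo" at the end of one line and "o" at the start of the next, with no hyphen:

```csharp
using System.IO;

public static class FileSearch
{
    // Reads the whole file into memory and checks for 'word'.
    // With ignoreLineBreaks = true, line breaks are removed first, so a word broken
    // across two lines is still found. This can also create matches that span
    // the end of one word and the start of an unrelated word on the next line.
    public static bool FileContains(string path, string word, bool ignoreLineBreaks)
    {
        string text;
        using (var reader = new StreamReader(path))
        {
            text = reader.ReadToEnd();
        }

        if (ignoreLineBreaks)
        {
            text = text.Replace("\r", "").Replace("\n", "");
        }

        return text.Contains(word);
    }
}
```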
Here you go. As we read the file, we keep track of the last word of the previous line and the first word of the current line, and check whether their combination matches your pattern.
string pattern = "foo";
string input = null;
string lastword = string.Empty;
string firstword = string.Empty;
bool result = false;
Regex RegPattern = new Regex(pattern);
using (FileStream FS = new FileStream("File name and path", FileMode.Open, FileAccess.Read, FileShare.Read))
using (StreamReader SR = new StreamReader(FS))
{
while ((input = SR.ReadLine()) != null)
{
string trimmed = input.Trim();
//first word of this line (the whole line if it has no space)
int firstSpace = trimmed.IndexOf(' ');
firstword = firstSpace >= 0 ? trimmed.Substring(0, firstSpace) : trimmed;
//join it to the last word of the previous line, in case the word was split across lines
if (lastword != string.Empty) { firstword = lastword + firstword; }
Match Match1 = RegPattern.Match(input);
if (pattern == firstword || Match1.Success) { result = true; }
//last word of this line (the whole line if it has no space)
lastword = trimmed.Substring(trimmed.LastIndexOf(' ') + 1);
}
}
Here is a quick example using LINQ:
static void Main(string[] args)
{
{ //LINQ version
bool hasFoo = "file.txt".AsLines()
.Any(l => l.Contains("foo"));
}
{ // No LINQ or Extension Methods needed
bool hasFoo = false;
foreach (var line in Tools.AsLines("file.txt"))
if (line.Contains("foo"))
{
hasFoo = true;
break;
}
}
}
}
public static class Tools
{
public static IEnumerable<string> AsLines(this string filename)
{
using (var reader = new StreamReader(filename))
while (!reader.EndOfStream)
{
var line = reader.ReadLine();
while (line.EndsWith("-") && !reader.EndOfStream)
line = line.Substring(0, line.Length - 1)
+ reader.ReadLine();
yield return line;
}
}
}
What if the line contains "football"? Or "fool"? If you are going down the regular expression route, you need to look for word boundaries.
Regex r = new Regex(@"\bfoo\b");
Also take case insensitivity into account if you need it (RegexOptions.IgnoreCase).
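Putting both points together, here is a sketch that reads the file line by line and stops at the first whole-word, case-insensitive match (FileHasWholeWord is a hypothetical name). Note the verbatim string @"\b": in a regular C# string literal, "\b" is a backspace character, not a word boundary:

```csharp
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class WordSearch
{
    // True if any line of the file contains 'word' as a whole word, ignoring case.
    // Any() stops reading at the first matching line and closes the file.
    public static bool FileHasWholeWord(string path, string word)
    {
        var regex = new Regex(@"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase);
        return File.ReadLines(path).Any(line => regex.IsMatch(line));
    }
}
```

Because it works line by line, this does not find a word that is split across two lines.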
You don't need regular expressions in a case this simple. Simply loop over the lines and check if it contains foo.
using (StreamReader sr = new StreamReader(File.Open("filename", FileMode.Open, FileAccess.Read)))
{
string line = null;
while (!sr.EndOfStream) {
line = sr.ReadLine();
if (line.Contains("foo"))
{
// foo was found in the file
}
}
}
You could construct a regex which allows for newlines to be placed between every character.
private static bool IsSubstring(string input, string substring)
{
string[] letters = new string[substring.Length];
for (int i = 0; i < substring.Length; i += 1)
{
letters[i] = Regex.Escape(substring[i].ToString());
}
string regex = @"\b" + string.Join(@"(\r?\n?)", letters) + @"\b";
return Regex.IsMatch(input, regex, RegexOptions.ExplicitCapture);
}