Remove Stop Words From text File - c#

I want to remove stop words from my text file, and I wrote the following code for this purpose:
TextWriter tw = new StreamWriter("D:\\output.txt");
private void button1_Click(object sender, EventArgs e)
{
StreamReader reader = new StreamReader("D:\\input1.txt");
string line;
while ((line = reader.ReadLine()) != null)
{
string[] parts = line.Split(' ');
string[] stopWord = new string[] { "is", "are", "am","could","will" };
foreach (string word in stopWord)
{
line = line.Replace(word, "");
tw.Write("+"+line);
}
tw.Write("\r\n");
}
But it doesn't write the result to the output file; the output file remains empty.

A regular expression might be perfect for the job:
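// The verbatim string (@"...") is needed so that \b reaches the regex engine as a word boundary instead of a backspace character
Regex replacer = new Regex(@"\b(?:is|are|am|could|will)\b");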
using (TextWriter writer = new StreamWriter("C:\\output.txt"))
{
using (StreamReader reader = new StreamReader("C:\\input.txt"))
{
while (!reader.EndOfStream)
{
string line = reader.ReadLine();
// Replace returns a new string, so the result has to be assigned back
line = replacer.Replace(line, "");
writer.WriteLine(line);
}
}
writer.Flush();
}
This method replaces only whole words with an empty string and leaves the stop words alone when they are part of another word. It does not clean up the extra spaces left behind.
Good luck with your quest.
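As a minimal sketch of the word-boundary idea, the helper below removes the stop words and then collapses the leftover whitespace. The class name StopWordRegexDemo, the method Clean, the IgnoreCase option and the sample sentence are assumptions made for illustration; they are not part of the answer above.

```csharp
using System;
using System.Text.RegularExpressions;

static class StopWordRegexDemo
{
    // Whole-word matches only; the verbatim string keeps \b as a regex word boundary.
    static readonly Regex StopWords =
        new Regex(@"\b(?:is|are|am|could|will)\b", RegexOptions.IgnoreCase);

    // Matches runs of two or more whitespace characters left behind by the removal.
    static readonly Regex ExtraSpaces = new Regex(@"\s{2,}");

    public static string Clean(string line)
    {
        string withoutStopWords = StopWords.Replace(line, "");
        return ExtraSpaces.Replace(withoutStopWords, " ").Trim();
    }

    static void Main()
    {
        // "This", "island" and "amused" contain stop words but are left intact.
        Console.WriteLine(Clean("This is an island where we are amused"));
        // prints: This an island where we amused
    }
}
```

The same Clean method could be called on each line inside the ReadLine loop before writing it out.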

The following works as expected for me. However, it's not a good approach because it will remove the stop words even when they are part of a larger word. Also, it doesn't clean up extra spaces between removed words.
string[] stopWord = new string[] { "is", "are", "am","could","will" };
TextWriter writer = new StreamWriter("C:\\output.txt");
StreamReader reader = new StreamReader("C:\\input.txt");
string line;
while ((line = reader.ReadLine()) != null)
{
foreach (string word in stopWord)
{
line = line.Replace(word, "");
}
writer.WriteLine(line);
}
reader.Close();
writer.Close();
Also, I recommend creating your streams inside using statements, to ensure the files are closed in a timely manner.
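One way around both drawbacks mentioned above (partial-word matches and leftover spaces) is to split each line into words and keep only those that are not stop words. This is a sketch under assumptions: the names StopWordTokenDemo and RemoveStopWords are hypothetical, words are assumed to be separated by spaces, and a stop word with punctuation attached (for example "is,") would not be removed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class StopWordTokenDemo
{
    // Case-insensitive lookup set of the stop words from the question.
    static readonly HashSet<string> StopWords = new HashSet<string>(
        new[] { "is", "are", "am", "could", "will" },
        StringComparer.OrdinalIgnoreCase);

    public static string RemoveStopWords(string line)
    {
        // RemoveEmptyEntries drops the empty tokens produced by repeated spaces.
        IEnumerable<string> kept = line
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Where(word => !StopWords.Contains(word));
        return string.Join(" ", kept);
    }

    static void Main()
    {
        Console.WriteLine(RemoveStopWords("I am sure this is  fine"));
        // prints: I sure this fine
    }
}
```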

You should wrap your IO objects in using statements so that they are disposed properly.
using (TextWriter tw = new StreamWriter("D:\\output.txt"))
{
using (StreamReader reader = new StreamReader("D:\\input1.txt"))
{
string line;
while ((line = reader.ReadLine()) != null)
{
string[] parts = line.Split(' ');
string[] stopWord = new string[] { "is", "are", "am","could","will" };
foreach (string word in stopWord)
{
line = line.Replace(word, "");
tw.Write("+"+line);
}
}
}
}

Try wrapping StreamWriter and StreamReader in using() {} clauses.
using (TextWriter tw = new StreamWriter(@"D:\output.txt"))
{
...
}
You may also want to call tw.Flush() at the very end.
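A likely reason for the empty output file in the question is that StreamWriter buffers its output and the writer there is never flushed or closed. The sketch below shows that leaving a using block is enough to get the text onto disk. The class name FlushDemo, the method WriteAndReadBack and the temporary file are assumptions for illustration only.

```csharp
using System;
using System.IO;

static class FlushDemo
{
    public static string WriteAndReadBack(string text)
    {
        // Scratch file used only for this demonstration.
        string path = Path.GetTempFileName();
        try
        {
            using (var writer = new StreamWriter(path))
            {
                writer.Write(text);
            } // Dispose runs here: the buffer is flushed and the file is closed.

            return File.ReadAllText(path);
        }
        finally
        {
            File.Delete(path);
        }
    }

    static void Main()
    {
        Console.WriteLine(WriteAndReadBack("hello"));
        // prints: hello
    }
}
```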

Related

How do I remove corrupted data from .csv file?

So I have lots of data but I'm not sure how I remove the corrupt data.
In the file the list is like this:
EMERIE,ESPARZA,166,57,34,BLUE,BLONDE
ADALINE,PARSONS,158,39,£$**),BROWN,GREY
The £$**) represents corrupted data but I don't know how to remove it, I have over 10,000 names and some of them are like this.
Assuming you want to completely remove the corrupted data rows rather than modify them, you could do something like the following:
public void RemoveCorruptData()
{
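string path = @"C:\CSV.txt";
string newPath = @"C:\new-CSV.txt";
List<string> lines = new List<string>();
// Escape the marker so that $, * and ) are matched literally instead of being parsed as regex operators
Regex corrupt = new Regex(Regex.Escape("£$**)"));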
if (File.Exists(path))
{
using (StreamReader reader = new StreamReader(path))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (!corrupt.IsMatch(line))
{
lines.Add(line);
}
}
}
using (StreamWriter writer = new StreamWriter(newPath, false))
{
foreach (String line in lines)
writer.WriteLine(line);
}
}
}
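If the corruption does not always look like the exact marker above, a more general check is to validate each row's shape. The sketch below assumes, based only on the two sample rows in the question, that a valid row has seven comma-separated fields and that the third to fifth fields are whole numbers. The names CsvRowFilter, IsValid and KeepValid are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CsvRowFilter
{
    // Assumption: 7 fields per row, with fields at index 2, 3 and 4 being integers.
    public static bool IsValid(string line)
    {
        string[] fields = line.Split(',');
        if (fields.Length != 7)
        {
            return false;
        }
        for (int i = 2; i <= 4; i++)
        {
            int parsed;
            if (!int.TryParse(fields[i], out parsed))
            {
                return false;
            }
        }
        return true;
    }

    public static List<string> KeepValid(IEnumerable<string> lines)
    {
        return lines.Where(IsValid).ToList();
    }

    static void Main()
    {
        var rows = new[]
        {
            "EMERIE,ESPARZA,166,57,34,BLUE,BLONDE",
            "ADALINE,PARSONS,158,39,£$**),BROWN,GREY"
        };
        foreach (string row in KeepValid(rows))
        {
            Console.WriteLine(row); // only the EMERIE row is printed
        }
    }
}
```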

Compare two csv files in C#

I want to compare two csv files and print the differences in a file. I currently use the code below to remove a row. Can I change this code so that it compares two csv files or is there a better way in c# to compare csv files?
List<string> lines = new List<string>();
using (StreamReader reader = new StreamReader(System.IO.File.OpenRead(path)))
{
string line;
while ((line = reader.ReadLine()) != null)
{
if (line.Contains(csvseperator))
{
string[] split = line.Split(Convert.ToChar(scheidingsteken));
if (split[selectedRow] == value)
{
}
else
{
line = string.Join(csvseperator, split);
lines.Add(line);
}
}
}
}
using (StreamWriter writer = new StreamWriter(path, false))
{
foreach (string line in lines)
writer.WriteLine(line);
}
}
Here is another way to find the differences between CSV files, using Cinchoo ETL, an open source library.
Given the sample CSV files below:
sample1.csv
id,name
1,Tom
2,Mark
3,Angie
sample2.csv
id,name
1,Tom
2,Mark
4,Lu
METHOD 1:
Using Cinchoo ETL, the code below finds the rows that differ in any column:
var input1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader().ToArray();
var input2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader().ToArray();
using (var output = new ChoCSVWriter("sampleDiff.csv").WithFirstLineHeader())
{
output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default));
output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), ChoDynamicObjectEqualityComparer.Default));
}
sampleDiff.csv
id,name
3,Angie
4,Lu
Sample fiddle: https://dotnetfiddle.net/nwLeJ2
METHOD 2:
If you want to find the differences by the id column only:
var input1 = new ChoCSVReader("sample1.csv").WithFirstLineHeader().ToArray();
var input2 = new ChoCSVReader("sample2.csv").WithFirstLineHeader().ToArray();
using (var output = new ChoCSVWriter("sampleDiff.csv").WithFirstLineHeader())
{
output.Write(input1.OfType<ChoDynamicObject>().Except(input2.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
output.Write(input2.OfType<ChoDynamicObject>().Except(input1.OfType<ChoDynamicObject>(), new ChoDynamicObjectEqualityComparer(new string[] { "id" })));
}
Sample fiddle: https://dotnetfiddle.net/t6mmJW
If you only want to compare one column you can use this code:
List<string> lines = new List<string>();
List<string> lines2 = new List<string>();
try
{
StreamReader reader = new StreamReader(System.IO.File.OpenRead(pad));
StreamReader read = new StreamReader(System.IO.File.OpenRead(pad2));
string line;
string line2;
// Change these to pick the cells (column indexes) you want to compare
int comp1 = 1;
int comp2 = 1;
while ((line = reader.ReadLine()) != null && (line2 = read.ReadLine()) != null)
{
string[] split = line.Split(Convert.ToChar(seperator));
string[] split2 = line2.Split(Convert.ToChar(seperator));
if (line.Contains(seperator) && line2.Contains(seperator))
{
if (split[comp1] != split2[comp2])
{
//It is not the same
}
else
{
//It is the same
}
}
}
reader.Dispose();
read.Dispose();
}
catch
{
}
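The loop above compares row N of one file with row N of the other, so a single inserted row makes every later row look different. A sketch of a key-based alternative, without any library, is shown below. The names CsvDiff and OnlyInLeft are hypothetical, the rows are the sample data from the earlier answer with the header line left out, and the simple Split assumes no quoted fields that contain the separator.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CsvDiff
{
    // Returns the rows of 'left' whose key column value does not occur in 'right'.
    public static List<string> OnlyInLeft(
        IEnumerable<string> left, IEnumerable<string> right, int keyColumn, char separator)
    {
        var rightKeys = new HashSet<string>(
            right.Select(row => row.Split(separator)[keyColumn]));
        return left
            .Where(row => !rightKeys.Contains(row.Split(separator)[keyColumn]))
            .ToList();
    }

    static void Main()
    {
        var sample1 = new[] { "1,Tom", "2,Mark", "3,Angie" };
        var sample2 = new[] { "1,Tom", "2,Mark", "4,Lu" };

        // Rows only in sample1, followed by rows only in sample2.
        foreach (string row in OnlyInLeft(sample1, sample2, 0, ','))
            Console.WriteLine(row); // prints: 3,Angie
        foreach (string row in OnlyInLeft(sample2, sample1, 0, ','))
            Console.WriteLine(row); // prints: 4,Lu
    }
}
```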

StreamReader row and line delimiters

I am trying to figure out how to tokenize a StreamReader of a text file. I have been able to separate the lines, but now I am trying to figure out how to break down those lines by a tab delimiter as well. This is what I have so far.
string readContents;
using (StreamReader streamReader = new StreamReader(@"File.txt"))
{
readContents = streamReader.ReadToEnd();
string[] lines = readContents.Split('\r');
foreach (string s in lines)
{
Console.WriteLine(s);
}
}
Console.ReadLine();
string readContents;
using (StreamReader streamReader = new StreamReader(@"File.txt"))
{
readContents = streamReader.ReadToEnd();
string[] lines = readContents.Split('\r');
foreach (string s in lines)
{
string[] lines2 = s.Split('\t');
foreach (string s2 in lines2)
{
Console.WriteLine(s2);
}
}
}
Console.ReadLine();
I'm not really sure if that is what you want, but it takes the lines that were already split on carriage returns and splits each of them again on tabs.
Just call Split() on each of the lines and keep them in a List. If you need an array you can always call ToArray() on the list:
string readContents;
using (StreamReader streamReader = new StreamReader(@"File.txt"))
{
readContents = streamReader.ReadToEnd();
string[] lines = readContents.Split('\r');
List<string> pieces = new List<string>();
foreach (string s in lines)
{
pieces.AddRange(s.Split('\t'));
Console.WriteLine(s);
}
}
Console.ReadLine();
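One caveat with Split('\r') is that on files with Windows line endings every line after the first keeps a leading '\n'. A sketch that avoids this by letting ReadLine handle the line endings and keeps each row's fields together is shown below. The names TabTokenizer and ReadRows are hypothetical, and a StringReader stands in for the file so the example is self-contained; a StreamReader could be passed in the same way.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class TabTokenizer
{
    // Reads every line from the reader and splits it into tab-separated fields.
    public static List<string[]> ReadRows(TextReader reader)
    {
        var rows = new List<string[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            rows.Add(line.Split('\t'));
        }
        return rows;
    }

    static void Main()
    {
        List<string[]> rows = ReadRows(new StringReader("a\tb\r\nc\td"));
        Console.WriteLine(rows.Count);  // prints: 2
        Console.WriteLine(rows[1][0]);  // prints: c
    }
}
```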

Replacing a certain word in a text file

I know this has been asked a few times, but I have seen a lot of regex etc., and I'm sure there is another way to do this with just a stream reader/writer. Below is my code. I'm trying to replace "tea" with the word "cabbage". Can somebody help? I believe I have the wrong syntax.
namespace Week_9_Exer_4
{
class TextImportEdit
{
public void EditorialControl()
{
string fileName;
string lineReadFromFile;
Console.WriteLine("");
// Ask for the name of the file to be read
Console.Write("Which file do you wish to read? ");
fileName = Console.ReadLine();
Console.WriteLine("");
// Open the file for reading
StreamReader fileReader = new StreamReader("C:\\Users\\Greg\\Desktop\\Programming Files\\story.txt");
// Read the lines from the file and display them
// until a null is returned (indicating end of file)
lineReadFromFile = fileReader.ReadLine();
Console.WriteLine("Please enter the word you wish to edit out: ");
string editWord = Console.ReadLine();
while (lineReadFromFile != null)
{
Console.WriteLine(lineReadFromFile);
lineReadFromFile = fileReader.ReadLine();
}
String text = File.ReadAllText("C:\\Users\\Greg\\Desktop\\Programming Files\\story.txt");
fileReader.Close();
StreamWriter fileWriter = new StreamWriter("C:\\Users\\Greg\\Desktop\\Programming Files\\story.txt", false);
string newText = text.Replace("tea", "cabbage");
fileWriter.WriteLine(newText);
fileWriter.Close();
}
}
}
If you don't care about memory usage:
string fileName = @"C:\Users\Greg\Desktop\Programming Files\story.txt";
File.WriteAllText(fileName, File.ReadAllText(fileName).Replace("tea", "cabbage"));
If you have a multi-line file that doesn't randomly split words at the end of the line, you could modify one line at a time in a more memory-friendly way:
// Open a stream for the source file
using (var sourceFile = File.OpenText(fileName))
{
// Create a temporary file path where we can write the modified lines
string tempFile = Path.Combine(Path.GetDirectoryName(fileName), "story-temp.txt");
// Open a stream for the temporary file
using (var tempFileStream = new StreamWriter(tempFile))
{
string line;
// read lines while the file has them
while ((line = sourceFile.ReadLine()) != null)
{
// Do the word replacement
line = line.Replace("tea", "cabbage");
// Write the modified line to the new file
tempFileStream.WriteLine(line);
}
}
}
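// Replace the original file with the temporary one
// (the temporary path is rebuilt here because tempFile is out of scope after the using block)
File.Replace(Path.Combine(Path.GetDirectoryName(fileName), "story-temp.txt"), fileName, null);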
In the end I used this; I hope it helps others:
public List<string> EditorialResponse(string fileName, string searchString, string replacementString)
{
List<string> list = new List<string>();
using (StreamReader reader = new StreamReader(fileName))
{
string line;
while ((line = reader.ReadLine()) != null)
{
line = line.Replace(searchString, replacementString);
list.Add(line);
Console.WriteLine(line);
}
reader.Close();
}
return list;
}
}
class Program
{
static void Main(string[] args)
{
TextImportEdit tie = new TextImportEdit();
List<string> ls = tie.EditorialResponse(@"C:\Users\Tom\Documents\Visual Studio 2013\story.txt", "tea", "cockrel");
StreamWriter writer = new StreamWriter(@"C:\Users\Tom\Documents\Visual Studio 2013\story12.txt");
foreach (string line in ls)
{
writer.WriteLine(line);
}
writer.Close();
}
}
}
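Note that string.Replace in the answers above also changes "tea" inside longer words such as "steam". If that matters, a whole-word replacement is one option. The sketch below does use a regular expression, which the question hoped to avoid, so treat it as an alternative rather than as part of the answers above; the names WholeWordReplace and Replace are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class WholeWordReplace
{
    public static string Replace(string text, string word, string replacement)
    {
        // Regex.Escape makes the search word literal; \b restricts matches to whole words.
        string pattern = @"\b" + Regex.Escape(word) + @"\b";
        // "$" is doubled so the replacement text is inserted literally.
        return Regex.Replace(text, pattern, replacement.Replace("$", "$$"));
    }

    static void Main()
    {
        Console.WriteLine(Replace("Hot tea makes steam", "tea", "cabbage"));
        // prints: Hot cabbage makes steam
    }
}
```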

I am having trouble with streamreader in c#

I am having a little trouble with the StreamReader.
I am opening emails from the file dialog, and those emails are placed inside a listbox.
Each letter of the emails ends up on its own line, as shown in the picture below.
I want each email to be on one line. Can someone help me? This is giving me a headache.
private void button2_Click(object sender, EventArgs e)
{
OpenFileDialog ofg = new OpenFileDialog();
ofg.Filter = "Text Files|*.txt";
if (ofg.ShowDialog() == System.Windows.Forms.DialogResult.OK)
{
var fileName = ofg.FileName;
StreamReader sr = new StreamReader(File.OpenRead(fileName));
var line = sr.ReadToEnd();
foreach (var l in line)
listBox1.Items.Add(l.ToString());
sr.Dispose();
}
}
var lines = File.ReadAllLines( fileName );
foreach (var l in lines )
{
listBox1.Items.Add( l );
}
This assumes that you have
email1@email1.com
email2@email2.com
in your file (this is what I understood from your description).
Use this:
string line;
while((line = reader.ReadLine()) != null)
listBox1.Items.Add(line);
Use it as follows:
using (StreamReader sr = new StreamReader(File.OpenRead(fileName)))
{
string line;
while ((line = sr.ReadLine()) != null)
{
listBox1.Items.Add(line.ToString());
}
}
This reads all the lines in the file and adds them to the listbox line by line.
A string contains chars, so foreach (var l ...) iterates through the chars in line.
You should replace your foreach with
foreach( var email in line.Split(' '))
in case your emails are separated by spaces.
Another approach would be File.ReadAllLines, in case the emails in your file are on separate lines.
using (StreamReader sr = new StreamReader(File.OpenRead(fileName)))
{
while (sr.Peek() >= 0)
{
listBox1.Items.Add(sr.ReadLine());
}
}
Reference: http://msdn.microsoft.com/en-us/library/system.io.streamreader.readline
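To illustrate the difference the answers above describe, the sketch below contrasts iterating over the characters of the whole text (what the question's code does) with iterating over its lines. The names ListBoxItemsDemo, ByChar and ByLine and the sample addresses are hypothetical, and plain lists stand in for listBox1.Items so that the example runs without Windows Forms.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class ListBoxItemsDemo
{
    // What the question's code does: one item per character, line breaks included.
    public static List<string> ByChar(string text)
    {
        return text.Select(c => c.ToString()).ToList();
    }

    // What the answers suggest: one item per line.
    public static List<string> ByLine(string text)
    {
        return text
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .ToList();
    }

    static void Main()
    {
        string content = "a@b.com\r\nc@d.org";
        Console.WriteLine(ByChar(content).Count); // prints: 16
        Console.WriteLine(ByLine(content).Count); // prints: 2
    }
}
```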
