I have an application that crunches a bunch of text files. Currently, I have code like this (snipped-together excerpt):
FileInfo info = new FileInfo(...);
if (info.Length > 0) {
string contents = getFileContents(...);
// uses a StreamReader
// returns reader.ReadToEnd();
Debug.Assert(!string.IsNullOrEmpty(contents)); // FAIL
}
private string getFileContents(string filename)
{
TextReader reader = null;
string text = "";
try
{
reader = new StreamReader(filename);
text = reader.ReadToEnd();
}
catch (IOException e)
{
// File is concurrently accessed. Come back later.
text = "";
}
finally
{
if (reader != null)
{
reader.Close();
}
}
return text;
}
Why am I getting a failed assert? The FileInfo.Length property was already used to validate that the file is non-empty.
Edit: This appears to be a bug in my code: I'm catching IO exceptions and returning an empty string. But, because of the discussion around FileInfo.Length, here's something interesting: FileInfo.Length returns 2 for an empty text file that contains only a BOM (created in Notepad).
You might have a file which is empty apart from a byte-order mark. I think TextReader.ReadToEnd() would remove the byte-order mark, giving you an empty string.
Alternatively, the file could have been truncated between checking the length and reading it.
For diagnostic purposes, I suggest you log the file length when you get an empty string.
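The byte-order-mark case can be reproduced in isolation. The sketch below is only an illustration: the class and method names (BomOnlyDemo, CreateBomOnlyFile, ReadAll) are made up for this example, and it assumes the reader is constructed the same way as in the question, i.e. new StreamReader(path) with default BOM detection.

```csharp
using System;
using System.IO;
using System.Text;

static class BomOnlyDemo
{
    // Hypothetical helper: writes a temp file that contains nothing but the
    // UTF-8 byte-order mark (EF BB BF) and returns its path.
    public static string CreateBomOnlyFile()
    {
        string path = Path.GetTempFileName();
        File.WriteAllBytes(path, Encoding.UTF8.GetPreamble());
        return path;
    }

    // Reads the file the same way the question does: StreamReader + ReadToEnd().
    public static string ReadAll(string path)
    {
        using (var reader = new StreamReader(path))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For such a file, new FileInfo(path).Length is 3 while ReadAll(path) returns an empty string, because the reader consumes the BOM instead of returning it as text. Logging both values when the assert fires should show whether this case or the swallowed IOException is responsible.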
See that catch (IOException) block you have? That's what returns an empty string and triggers the assert even when the file is not empty.
If I remember correctly, a file ends with an end-of-file marker, which won't be included when you call ReadToEnd.
Therefore, the file size is not 0, but its content size is.
What's in the getFileContents method?
It may be repositioning the stream's pointer to the end of the stream before ReadToEnd() is called.
Related
I have a text file like the one shown below. I want to read it up to the first empty line, show those lines in a message box, then delete the lines I have just read (including the empty line), and repeat the same process until the whole file has been read. So far I have not been able to read even a single line.
**Hi**
**Hello**
**How are you?
I am fine.**
**And how about you?
Me too fine
Whats going on.**
Here is code sample.
StreamReader sr = new StreamReader(fileNameAndPath);
string line;
try
{
while ((line = sr.ReadLine()) != null)
{
if(line.StartsWith(null))
{
MessageBox.Show(line);
}
sr.Close();
}
}
catch
{
MessageBox.Show("Got empty line while reading a file");
}
You can't pass null to StartsWith().
This should work:
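StreamReader sr = new StreamReader(fileNameAndPath);
string line;
try
{
    while ((line = sr.ReadLine()) != null)
    {
        // ReadLine() returns "" for an empty line and null only at end of file.
        if (string.IsNullOrEmpty(line))
        {
            MessageBox.Show(line);
        }
    }
}
catch
{
    MessageBox.Show("Got empty line while reading a file");
}
finally
{
    // Close once, after the loop; closing inside it makes the next ReadLine() throw.
    sr.Close();
}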
So a couple things:
There is a difference between a null string and an empty string. You can think of a string like a cookie jar (where characters are the cookies). An empty string (e.g. string.Empty or "") is a cookie jar that has no cookies in it, whereas null means there is no cookie jar at all. When you read a line from a file, you always get a string back as long as there is a line to read, so an empty line comes back as an empty string.
With that being said, while it is true that an empty line starts with nothing, that is not the exact thing you want to know. Coding solutions that do "almost the same thing" will get you in a lot of trouble later in life. A more appropriate check would be something like string.IsNullOrEmpty(line).
Finally, if you want to "delete part of the file", I would recommend that you "remove all of the file, and then write back only the part you want to keep". So read the entire file into a List<string>, remove the items you don't want from the list, and then overwrite the file with the remaining items. You can read and write to a file at the same time, but I personally find it messy, and I think it couples your logic. The most important point (in both cases) when you look up how to do it: deleting info from a file counts as writing to the file, so look up how to write to a file in order to delete from it.
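The read-everything, trim, write-back approach described above could look roughly like this. It is a minimal sketch, not a definitive implementation: BlockFile and TakeFirstBlock are hypothetical names, and it assumes the file is small enough to hold in memory and that blocks are separated by completely empty lines.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class BlockFile
{
    // Hypothetical helper: returns the lines before the first empty line and
    // rewrites the file without those lines and without the empty separator.
    public static List<string> TakeFirstBlock(string path)
    {
        var lines = new List<string>(File.ReadAllLines(path));

        // Index of the first empty line, or -1 if the file has no empty line.
        int blank = lines.FindIndex(l => string.IsNullOrEmpty(l));
        int blockLength = blank < 0 ? lines.Count : blank;

        List<string> block = lines.GetRange(0, blockLength);

        // Drop the block plus the empty separator line, if there was one.
        lines.RemoveRange(0, blank < 0 ? lines.Count : blank + 1);

        // "Deleting" is done by overwriting the file with what remains.
        File.WriteAllLines(path, lines.ToArray());
        return block;
    }
}
```

A caller would invoke TakeFirstBlock in a loop, showing string.Join(Environment.NewLine, block) in the message box each time, until the returned list is empty.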
You can't pass null to StartsWith(). See here: https://msdn.microsoft.com/en-us/library/baketfxw(v=vs.110).aspx
Suggest that what you're after is
if (string.IsNullOrEmpty(line))
{
...
}
I'm trying to convert a file's encoding and replace some text along the way. Unfortunately, I'm getting an OutOfMemory exception. I'm not sure why. As I understand it, it streams the original file line by line into a var (str), completes a couple of string replacements, and then writes the converted line to the StreamWriter.
Can someone tell me what I'm doing wrong here?
EDIT 1
- I'm currently testing a single file: 1 GB, 2.5 million rows.
- Replaced read and replace into a single line. Same results!
EDIT 2
By the way, can anyone tell me why the question was downvoted? I'd like to know for future postings.
The problem is with the file itself. It's output from SQL Server BCP where I explicitly flag the row terminator with a specific string. By default, when the row terminator flag is omitted, BCP adds a newline at the end of each row and the code below works perfectly.
What I still don't understand is: when I set the row terminator flag to a specific string, each record appears on a new line, so why doesn't StreamReader see each record as a separate line? Instead, it appears to treat the entire file as one long line. That still doesn't explain the OOM exception, since I have well over 100 GB of memory.
Unfortunately, explicitly setting the row terminator flag is a must. For now, I'll take this over to dba exchange.
Thanks
static void Main(string[] args)
{
String msg = String.Empty;
String str = String.Empty;
DirectoryInfo dInfo = new DirectoryInfo(@"\\server\share");
foreach (var f in dInfo.GetFiles())
{
using (StreamReader sr = new StreamReader(f.FullName, Encoding.Unicode, false))
{
using (StreamWriter sw = new StreamWriter(f.DirectoryName + "\\new\\" + f.Name, false, Encoding.UTF8))
{
try
{
while (!sr.EndOfStream)
{
str = sr.ReadLine().Replace("this","that");
sw.WriteLine(str);
}
}
catch (Exception e)
{
msg += f.Name + ": " + e.Message;
}
}
}
}
Console.WriteLine(msg);
Console.ReadLine();
}
Well, your main reading and writing code needs just one line of data in memory at a time. Your msg string, on the other hand, keeps growing with each exception.
You'd need many millions of files in the folder to get an OutOfMemory exception this way, though.
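On the asker's own diagnosis: ReadLine() only splits on \r, \n or \r\n, so if the custom row terminator contains none of those, the whole file is buffered as a single "line", and a single .NET string is subject to size limits no matter how much RAM is installed. One way around that is to read fixed-size chunks and split on the terminator yourself. The following is a rough sketch under assumptions: RecordReader and ReadRecords are hypothetical names, the terminator is whatever string was passed to BCP, and individual records are assumed to be small.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class RecordReader
{
    // Hypothetical helper: yields records separated by `terminator`, keeping
    // only the current unfinished record plus one chunk in memory.
    public static IEnumerable<string> ReadRecords(TextReader reader, string terminator)
    {
        if (string.IsNullOrEmpty(terminator))
            throw new ArgumentException("terminator must not be empty");

        var pending = new StringBuilder();
        var buffer = new char[4096];
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            pending.Append(buffer, 0, read);
            string text = pending.ToString();

            int start = 0;
            int end;
            while ((end = text.IndexOf(terminator, start, StringComparison.Ordinal)) >= 0)
            {
                yield return text.Substring(start, end - start);
                start = end + terminator.Length;
            }

            // Keep the unfinished tail; it may end with part of a terminator
            // that is completed by the next chunk.
            pending.Length = 0;
            pending.Append(text, start, text.Length - start);
        }

        // Whatever is left after the last terminator is the final record.
        if (pending.Length > 0)
            yield return pending.ToString();
    }
}
```

In the question's loop, the ReadLine() call could then be replaced by foreach (string record in RecordReader.ReadRecords(sr, terminator)), with the replacement and sw.WriteLine applied per record. If the terminator never occurs in the input, this still accumulates the entire file, so it only helps when the terminator is the one actually used.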
I have a use-case where I'm required to read in some information from an XML file and act on it accordingly. The problem is that this XML file is technically allowed to be empty or full of whitespace, which means "there's no info, do nothing". Any other error should fail hard.
I'm currently thinking about something along the lines of:
public void Load (string fileName)
{
XElement xml;
try {
xml = XElement.Load (fileName);
}
catch (XmlException e) {
// Check if the file contains only whitespace here
// if not, re-throw the exception
}
if (xml != null) {
// Do this only if there wasn't an exception
doStuff (xml);
}
// Run this irrespective if there was any xml or not
tidyUp ();
}
Does this pattern seem ok? If so, how do people recommend implementing the check for if the file contained only whitespace inside the catch block? Google only throws up checks for if a string is whitespace...
Cheers muchly,
Graham
Well, the easiest way is probably to make sure it isn't whitespace in the first place, by reading the entire file into a string first (I'm assuming it isn't too huge):
public void Load (string fileName)
{
    string xmlString;
    // Dispose the reader (and its underlying stream) as soon as the file has been read.
    using (var stream = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    using (var reader = new StreamReader(stream, Encoding.UTF8, true))
    {
        xmlString = reader.ReadToEnd();
    }
    if (!string.IsNullOrWhiteSpace(xmlString)) { // Use (xmlString.Trim().Length != 0) for .NET < 4
        var xml = XElement.Parse(xmlString); // Exceptions will bubble up
        doStuff(xml);
    }
    tidyUp();
}
I have a strange problem. I am getting a stream of text from a TCP client and writing it to a file. The buffer is not completely filled, so when I convert it to a string the unfilled parts of the byte array become \0 characters and I end up with:
str = "blah foo bar \0\0\0\0\0...";
So what I did is:
str = str.Trim('\0');
But if I do this, the string is not written to the file by the StreamWriter. If I comment out the Trim line, it is written along with all the \0 characters. Here is my full code:
StreamWriter sw = new StreamWriter("c:\\a\\ta.txt");
while (true)
{
try
{
NetworkStream ns = tc.GetStream();
byte[] instream = new byte[tc.ReceiveBufferSize];
Thread.Sleep(2500);
ns.Read(instream, 0, tc.ReceiveBufferSize);
string decodedData = string.Empty;
decodedData = System.Text.Encoding.ASCII.GetString(instream);
decodedData = decodedData.Trim('\0');
//string a = "dfdsfdsfdsfdsf";
//string b = a.Trim('\0');
try
{
sw.Write(decodedData);
//MessageBox.Show(decodedData);
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
}
catch (Exception ex)
{
MessageBox.Show(ex.ToString());
}
}
Can someone explain why this happens and how I can solve it?
Oh, and when debugging I can see that decodedData holds the trimmed text, neat and clean, but I don't know why it is not being written to the file.
There are three problems here.
First, you grab the text from the whole array, regardless of how many bytes you actually did receive. Most likely this is the source of your zero characters.
To fix that, change the code as follows:
int actuallyRead = ns.Read(instream, 0, tc.ReceiveBufferSize);
string decodedData = Encoding.ASCII.GetString(instream, 0, actuallyRead);
Secondly, you need to close the stream in order for it to flush its contents. The best way to do that is to wrap it in a using block:
using (StreamWriter sw = new StreamWriter("c:\\a\\ta.txt"))
{
... rest of your code here
}
Thirdly, the code would normally never complete. Add a way for it to complete without relying on exception handling, for instance:
int actuallyRead = ns.Read(instream, 0, tc.ReceiveBufferSize);
if (actuallyRead == 0)
break;
string decodedData = Encoding.ASCII.GetString(instream, 0, actuallyRead);
You're never flushing the writer - I suspect everything's just buffered. You should use a using statement for your StreamWriter, so that it gets disposed when you leave the block. That will then flush the file.
You should also look at the value returned from Stream.Read, and only create a string using the portion of the buffer which has actually been read.
Finally, it's not clear how you expect this to terminate, given that you've got a while(true) loop. You're currently only going to terminate when you get an exception. You should probably terminate if ns.Read returns 0.
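Putting those three points together (decode only the bytes actually read, stop when Read returns 0, and flush/dispose the writer) gives something like the sketch below. The names StreamCopy and CopyAsciiText are hypothetical, and the method takes a plain Stream so that the same code works for a NetworkStream or, for testing, a MemoryStream.

```csharp
using System;
using System.IO;
using System.Text;

static class StreamCopy
{
    // Hypothetical helper: copies ASCII text from `source` to `destination`
    // until the source reports end of stream (Read returns 0).
    public static void CopyAsciiText(Stream source, TextWriter destination)
    {
        var buffer = new byte[8192];
        int read;
        while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Decode only the bytes that were actually received, so no
            // '\0' padding ever reaches the string.
            destination.Write(Encoding.ASCII.GetString(buffer, 0, read));
        }

        // Push anything still buffered in the writer to its target.
        destination.Flush();
    }
}
```

With the question's variables this would be called as using (var sw = new StreamWriter("c:\\a\\ta.txt")) { StreamCopy.CopyAsciiText(tc.GetStream(), sw); }, so the writer is also disposed when the connection closes.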
Try this:
decodedData = new string(decodedData.ToCharArray());
IIRC, the string constructor will trim trailing NULL terminators.
Have you tried...
decodedData = decodedData.Trim(@"\0");
I have a text file that contains about 100000 articles.
The structure of file is:
.Document ID 42944-YEAR:5
.Date 03\08\11
.Cat political
Article Content 1
.Document ID 42945-YEAR:5
.Date 03\08\11
.Cat political
Article Content 2
I want to open this file in C# and process it line by line.
I tried this code:
String[] FileLines = File.ReadAllText(
TB_SourceFile.Text).Split(Environment.NewLine.ToCharArray());
But it says:
Exception of type
'System.OutOfMemoryException' was
thrown.
The question is: how can I open this file and read it line by line?
File Size: 564 MB (591,886,626 bytes)
File Encoding: UTF-8
File contains Unicode characters.
You can open the file and read it as a stream rather than loading everything into memory all at once.
From MSDN:
using System;
using System.IO;
class Test
{
public static void Main()
{
try
{
// Create an instance of StreamReader to read from a file.
// The using statement also closes the StreamReader.
using (StreamReader sr = new StreamReader("TestFile.txt"))
{
String line;
// Read and display lines from the file until the end of
// the file is reached.
while ((line = sr.ReadLine()) != null)
{
Console.WriteLine(line);
}
}
}
catch (Exception e)
{
// Let the user know what went wrong.
Console.WriteLine("The file could not be read:");
Console.WriteLine(e.Message);
}
}
}
Your file is too large to be read into memory in one go, as File.ReadAllText is trying to do. You should instead read the file line by line.
Adapted from MSDN:
string line;
// Read the file and display it line by line.
using (StreamReader file = new StreamReader(@"c:\yourfile.txt"))
{
while ((line = file.ReadLine()) != null)
{
Console.WriteLine(line);
// do your processing on each line here
}
}
In this way, no more than a single line of the file is in memory at any one time.
If you are using .NET Framework 4, there is a new static method on System.IO.File called ReadLines that returns an IEnumerable of string. I believe it was added to the framework for this exact scenario; however, I have yet to use it myself.
MSDN Documentation - File.ReadLines Method (String)
Related Stack Overflow Question - Bug in the File.ReadLines(..) method of the .net framework 4.0
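For illustration, a File.ReadLines version might look like the following. This is a sketch under assumptions: it requires .NET Framework 4 or later, ArticleCounter and CountDocuments are hypothetical names, and it relies on every article starting with a ".Document ID" line as in the sample above.

```csharp
using System;
using System.IO;
using System.Linq;

static class ArticleCounter
{
    // Hypothetical helper: counts articles by streaming the file lazily.
    // File.ReadLines yields one line at a time instead of loading the file.
    public static int CountDocuments(string path)
    {
        return File.ReadLines(path)
                   .Count(line => line.StartsWith(".Document ID", StringComparison.Ordinal));
    }
}
```

The same foreach (string line in File.ReadLines(path)) pattern can replace the ReadAllText(...).Split(...) call from the question when each line needs real processing rather than counting.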
Something like this:
using (var fileStream = File.OpenText(@"path to file"))
{
    string fileLine;
    // Test for null before processing, so an empty file is never handed a null line.
    while ((fileLine = fileStream.ReadLine()) != null)
    {
        // process fileLine here
    }
}