I have a strange problem. I am receiving a stream of text from a TCP client and writing it to a file. The buffer is never completely filled, so when the byte array is converted to a string the unfilled parts become \0 characters, and I end up with:
str = "blah foo bar \0\0\0\0\0...";
so what I did is
str = str.Trim('\0');
But if I do this, the string is not written to the file by the StreamWriter at all. If I comment out the Trim line, it does get written, along with all the trailing \0 characters. Here is my full code:
StreamWriter sw = new StreamWriter("c:\\a\\ta.txt");
while (true)
{
    try
    {
        NetworkStream ns = tc.GetStream();
        byte[] instream = new byte[tc.ReceiveBufferSize];
        Thread.Sleep(2500);
        ns.Read(instream, 0, tc.ReceiveBufferSize);
        string decodedData = string.Empty;
        decodedData = System.Text.Encoding.ASCII.GetString(instream);
        decodedData = decodedData.Trim('\0');
        //string a = "dfdsfdsfdsfdsf";
        //string b = a.Trim('\0');
        try
        {
            sw.Write(decodedData);
            //MessageBox.Show(decodedData);
        }
        catch (Exception ex)
        {
            MessageBox.Show(ex.ToString());
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.ToString());
    }
}
Can someone explain why this happens and how I can solve it?
Oh, and on debugging I can see that decodedData holds the trimmed text, neat and clean, but I don't know why it's not being written to the file.
There are three problems here.
First, you grab the text from the whole array, regardless of how many bytes you actually did receive. Most likely this is the source of your zero characters.
To fix that, change the code as follows:
int actuallyRead = ns.Read(instream, 0, tc.ReceiveBufferSize);
string decodedData = Encoding.ASCII.GetString(instream, 0, actuallyRead);
Secondly, you need to close the stream in order for it to flush its contents. The best way to do that is to wrap it in a using block:
using (StreamWriter sw = new StreamWriter("c:\\a\\ta.txt"))
{
    ... rest of your code here
}
Thirdly, the code would normally never complete. Add a way for it to complete without relying on exception handling, for instance:
int actuallyRead = ns.Read(instream, 0, tc.ReceiveBufferSize);
if (actuallyRead == 0)
    break;
string decodedData = Encoding.ASCII.GetString(instream, 0, actuallyRead);
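Putting the three fixes together, a minimal sketch of the receive loop might look like this (assuming tc is an already-connected TcpClient):

using (StreamWriter sw = new StreamWriter("c:\\a\\ta.txt"))
{
    NetworkStream ns = tc.GetStream();
    byte[] instream = new byte[tc.ReceiveBufferSize];
    while (true)
    {
        // Read reports how many bytes were actually received.
        int actuallyRead = ns.Read(instream, 0, instream.Length);
        if (actuallyRead == 0)
            break; // the remote side closed the connection
        // Decode only the bytes that were received - no trailing '\0's.
        sw.Write(Encoding.ASCII.GetString(instream, 0, actuallyRead));
    }
} // disposing the writer flushes any buffered output to the file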
You're never flushing the writer - I suspect everything's just buffered. You should use a using statement for your StreamWriter, so that it gets disposed when you leave the block. That will then flush the file.
You should also look at the value returned from Stream.Read, and only create a string using the portion of the buffer which has actually been read.
Finally, it's not clear how you expect this to terminate, given that you've got a while(true) loop. You're currently only going to terminate when you get an exception. You should probably terminate if ns.Read returns 0.
Try this:
decodedData = new string(decodedData.ToCharArray());
IIRC, the string constructor will trim trailing NULL terminators.
have you tried...
decodedData = decodedData.Trim(@"\0");
Related
I use the following code to write log file:
Encoding enc = Encoding.GetEncoding(932);
MemoryStream msLog = new MemoryStream();
StreamWriter swLog = new StreamWriter(msLog, enc);
swLog.WriteLine("Line Number,Error,Additional Information"); //log header
After some complex processing I'd like to know whether any log line was added besides the header. Obviously, one way is to set some boolean variable to true whenever I call swLog.WriteLine(), but because the code is long and complex I'd like to avoid that approach. How can I easily check the line count of the memory stream?
As you noted, there are other better ways to do this. However, here is a direct answer to your question:
First, make sure that the StreamWriter has flushed the data into the stream like this:
swLog.Flush();
Then, you can use the following method to detect if the MemoryStream has more than one line:
private bool HasMoreThanNumberOfLines(Stream stream, Encoding enc, int number_of_lines)
{
    long current_position = stream.Position;
    stream.Position = 0;
    try
    {
        using (StreamReader sr = new StreamReader(stream, enc, true, 1024, true))
        {
            for (int i = 0; i < number_of_lines + 1; i++)
            {
                string line = sr.ReadLine();
                if (line == null)
                    return false;
            }
        }
        return true;
    }
    finally
    {
        stream.Position = current_position;
    }
}
Please note that I am using a special constructor of StreamReader to make sure that it does not close the underlying stream (stream) when it is disposed of.
Notice also how this method saves the current position of the stream, and then restores it after executing its logic so that the StreamWriter would continue to work normally.
You can use this method like this:
var has_another_line = HasMoreThanNumberOfLines(msLog, enc, 1);
Please note that this is not thread-safe. I am assuming that the stream will be accessed by a single thread at any point in time. You would need to put some locks to make it thread-safe.
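As a simpler alternative (one of the "better ways" hinted at above), you could record the stream's length right after writing the header and compare against it later. A minimal sketch, assuming nothing but swLog writes to msLog:

swLog.WriteLine("Line Number,Error,Additional Information"); // log header
swLog.Flush();
long headerLength = msLog.Length; // remember where the header ends

// ... complex processing that may or may not call swLog.WriteLine() ...

swLog.Flush();
bool hasLogLines = msLog.Length > headerLength; // any bytes written after the header?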
I'm trying to convert a file's encoding and replace some text along the way. Unfortunately, I'm getting an OutOfMemoryException, and I'm not sure why. As I understand it, the code streams the original file line by line into a variable (str), performs a couple of string replacements, and then writes the converted line to the StreamWriter.
Can someone tell me what I'm doing wrong here?
EDIT 1
- I'm currently testing a single file: 1 GB, 2.5 million rows.
- Combined the read and replace into a single line. Same result!
EDIT 2
By the way, can anyone tell me why the question was downvoted? I'd like to know for future postings.
The problem is with the file itself. It's output from SQL Server BCP, where I explicitly set the row terminator to a specific string. By default, when the row terminator flag is omitted, BCP adds a newline at the end of each row, and the code below works perfectly.
What I still don't understand is: when I set the row terminator to a specific string, each record appears on a new line, so why doesn't the StreamReader see each record as a separate line? Instead, it appears to treat the entire file as one long line. That still doesn't explain the OOM exception, since I have well over 100 GB of memory.
Unfortunately, explicitly setting the row terminator flag is a must. For now, I'll take this over to DBA Exchange.
Thanks
static void Main(string[] args)
{
    String msg = String.Empty;
    String str = String.Empty;
    DirectoryInfo dInfo = new DirectoryInfo(@"\\server\share");
    foreach (var f in dInfo.GetFiles())
    {
        using (StreamReader sr = new StreamReader(f.FullName, Encoding.Unicode, false))
        {
            using (StreamWriter sw = new StreamWriter(f.DirectoryName + "\\new\\" + f.Name, false, Encoding.UTF8))
            {
                try
                {
                    while (!sr.EndOfStream)
                    {
                        str = sr.ReadLine().Replace("this", "that");
                        sw.WriteLine(str);
                    }
                }
                catch (Exception e)
                {
                    msg += f.Name + ": " + e.Message;
                }
            }
        }
    }
    Console.WriteLine(msg);
    Console.ReadLine();
}
Well, your main reading and writing code only ever holds one line of data at a time. Your msg string, on the other hand, keeps growing with each exception.
You'd need many millions of files in the folder to get an OutOfMemoryException that way, though.
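Given the edit about the BCP row terminator, though, a more likely culprit: if the file contains no actual newlines, sr.ReadLine() tries to buffer the entire 1 GB file as a single line, and .NET's default per-object size limit (roughly 2 GB) is hit long before 100 GB of physical memory matters. Below is a hedged sketch of a chunked read-and-replace that avoids materializing the whole line, meant as a drop-in replacement for the while loop (the buffer size and carry logic are illustrative):

const string oldText = "this";
const string newText = "that";
char[] buffer = new char[64 * 1024];
string carry = string.Empty; // tail carried over to catch matches spanning chunks

int read;
while ((read = sr.Read(buffer, 0, buffer.Length)) > 0)
{
    string chunk = carry + new string(buffer, 0, read);
    chunk = chunk.Replace(oldText, newText);
    // Hold back the last oldText.Length - 1 chars in case a match straddles the boundary.
    int keep = Math.Min(oldText.Length - 1, chunk.Length);
    sw.Write(chunk.Substring(0, chunk.Length - keep));
    carry = chunk.Substring(chunk.Length - keep);
}
sw.Write(carry); // the leftover tail is too short to contain a full match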
I need to read the first line from a stream to determine file's encoding, and then recreate the stream with that Encoding
The following code does not work correctly:
var r = response.GetResponseStream();
var sr = new StreamReader(r);
string firstLine = sr.ReadLine();
string encoding = GetEncodingFromFirstLine(firstLine);
string text = new StreamReader(r, Encoding.GetEncoding(encoding)).ReadToEnd();
The text variable doesn't contain the whole text. For some reason the first line and several lines after it are skipped.
I tried everything: closing the StreamReader, resetting it, calling a separate GetResponseStream... but nothing worked.
I can't get the response stream again as I'm getting this file from the internet, and redownloading it again would be bad performance wise.
Update
Here's what GetEncodingFromFirstLine() looks like:
public static string GetEncodingFromFirstLine(string line)
{
    int encodingIndex = line.IndexOf("encoding=");
    if (encodingIndex == -1)
    {
        return "utf-8";
    }
    return line.Substring(encodingIndex + "encoding=".Length)
               .Replace("\"", "").Replace("'", "").Replace("?", "").Replace(">", "");
}
...
// true
Assert.AreEqual("windows-1251", GetEncodingFromFirstLine(#"<?xml version=""1.0"" encoding=""windows-1251""?>"));
Update 2
I'm working with XML files, and the text variable is parsed as XML:
var feedItems = XElement.Parse(text);
Well you're asking it to detect the encoding... and that requires it to read data. That's reading it from the underlying stream, and you're then creating another StreamReader around the same stream.
I suggest you:
- Get the response stream
- Retrieve all the data into a byte array (or MemoryStream)
- Detect the encoding (which should be performed on bytes, not text - currently you're already assuming UTF-8 by creating a StreamReader)
- Create a MemoryStream around the byte array, and a StreamReader around that
It's not clear what your GetEncodingFromFirstLine method does... or what this file really is. More information may make it easier to help you.
EDIT: If this is to load some XML, don't reinvent the wheel. Just give the stream to one of the existing XML-parsing classes, which will perform the appropriate detection for you.
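For example, a minimal sketch along those lines, assuming response is the WebResponse from the question (XElement.Load reads the raw bytes and honours the encoding declared in the XML declaration, so no manual detection is needed):

// Buffer the whole response so the data can be re-read; CopyTo needs .NET 4+.
MemoryStream ms = new MemoryStream();
using (Stream r = response.GetResponseStream())
{
    r.CopyTo(ms);
}
ms.Position = 0;

// Let the XML parser do the encoding detection for us.
XElement feedItems = XElement.Load(ms);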
You need to change the current position in the stream to the beginning.
r.Position = 0;
string text = new StreamReader(r, Encoding.GetEncoding(encoding)).ReadToEnd();
I found the answer to my question here:
How can I read an Http response stream twice in C#?
Stream responseStream = CopyAndClose(resp.GetResponseStream());
// Do something with the stream
responseStream.Position = 0;
// Do something with the stream again

private static Stream CopyAndClose(Stream inputStream)
{
    const int readSize = 256;
    byte[] buffer = new byte[readSize];
    MemoryStream ms = new MemoryStream();
    int count = inputStream.Read(buffer, 0, readSize);
    while (count > 0)
    {
        ms.Write(buffer, 0, count);
        count = inputStream.Read(buffer, 0, readSize);
    }
    ms.Position = 0;
    inputStream.Close();
    return ms;
}
I have an application that crunches a bunch of text files. Currently, I have code like this (snipped-together excerpt):
FileInfo info = new FileInfo(...);
if (info.Length > 0)
{
    string content = getFileContents(...);
    // uses a StreamReader
    // returns reader.ReadToEnd();
    Debug.Assert(!string.IsNullOrEmpty(content)); // FAIL
}
private string getFileContents(string filename)
{
    TextReader reader = null;
    string text = "";
    try
    {
        reader = new StreamReader(filename);
        text = reader.ReadToEnd();
    }
    catch (IOException e)
    {
        // File is concurrently accessed. Come back later.
        text = "";
    }
    finally
    {
        if (reader != null)
        {
            reader.Close();
        }
    }
    return text;
}
Why am I getting a failed assert? The FileInfo.Length property was already used to verify that the file is non-empty.
Edit: This appears to be a bug -- I'm catching IOExceptions and returning the empty string. But, because of the discussion around FileInfo.Length, here's something interesting: FileInfo.Length returns 2 for an empty, BOM-only text file (created in Notepad).
You might have a file which is empty apart from a byte-order mark. I think TextReader.ReadToEnd() would remove the byte-order mark, giving you an empty string.
Alternatively, the file could have been truncated between checking the length and reading it.
For diagnostic purposes, I suggest you log the file length when you get an empty string.
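To see the BOM effect in isolation, here is a small repro sketch (the temp-file approach is just for illustration): a file holding only a UTF-16 byte-order mark has Length 2, yet ReadToEnd() returns an empty string because the reader detects and consumes the BOM:

string path = Path.GetTempFileName();
// Write only a UTF-16 LE byte order mark (FF FE) - i.e. an "empty" Unicode file.
File.WriteAllBytes(path, new byte[] { 0xFF, 0xFE });

Console.WriteLine(new FileInfo(path).Length);      // prints 2
using (StreamReader reader = new StreamReader(path))
{
    Console.WriteLine(reader.ReadToEnd().Length);  // prints 0 - the BOM was consumed
}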
See that catch (IOException) block you have? That's what returns an empty string and triggers the assert even when the file is not empty.
If I remember correctly, a file ends with an end-of-file marker, which won't be included when you call ReadToEnd.
Therefore, the file size is not 0, but its content size is.
What's in the getFileContents method?
It may be repositioning the stream's pointer to the end of the stream before ReadToEnd() is called.
I came across this piece of code today:
public static byte[] ReadContentFromFile(String filePath)
{
    FileInfo fi = new FileInfo(filePath);
    long numBytes = fi.Length;
    byte[] buffer = null;
    if (numBytes > 0)
    {
        try
        {
            FileStream fs = new FileStream(filePath, FileMode.Open);
            BinaryReader br = new BinaryReader(fs);
            buffer = br.ReadBytes((int)numBytes);
            br.Close();
            fs.Close();
        }
        catch (Exception e)
        {
            System.Console.WriteLine(e.StackTrace);
        }
    }
    return buffer;
}
My first thought is to refactor it down to this:
public static byte[] ReadContentFromFile(String filePath)
{
    return File.ReadAllBytes(filePath);
}
System.IO.File.ReadAllBytes is documented as:
Opens a binary file, reads the contents of the file into a byte array, and then closes the file.
... but am I missing some key difference?
The original code returns a null reference if the file is empty, and won't throw an exception if it can't be read. Personally I think it's better to return an empty array, and to not swallow exceptions, but that's the difference between refactoring and redesigning I guess.
Oh, also, if the file length is changed between finding out the length and reading it, then the original code will read the original length. Again, I think the File.ReadAllBytes behaviour is better.
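If you did want to preserve the original semantics exactly (null for an empty or unreadable file) while still simplifying, a hedged sketch might be:

public static byte[] ReadContentFromFile(String filePath)
{
    try
    {
        byte[] content = File.ReadAllBytes(filePath);
        // Mimic the original: an empty file yields null, not an empty array.
        return content.Length > 0 ? content : null;
    }
    catch (Exception e)
    {
        // The original swallowed all exceptions and returned null.
        System.Console.WriteLine(e.StackTrace);
        return null;
    }
}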
What do you want to happen if the file doesn't exist?
That's basically the same method if you add the try {...} catch{...} block. The method name, ReadContentFromFile, further proves the point.
Wait a minute... isn't that something a unit test should tell you?
In this case, no, you are not missing anything at all from a file-operation standpoint. Just be aware that the lack of exception handling will change the behavior of the system.
It is a streamlined way of reading the bytes of a file.
NOTE: if you need to set any custom options on the read, then you would need the long form, as sketched below.
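For instance, a sketch of the "long form" with a couple of custom options (the sharing mode and buffer size here are illustrative choices, not requirements):

public static byte[] ReadContentFromFile(String filePath)
{
    // The long form exposes options that File.ReadAllBytes does not,
    // such as the sharing mode and buffer size.
    using (FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read,
                                          FileShare.ReadWrite, 4096))
    using (BinaryReader br = new BinaryReader(fs))
    {
        return br.ReadBytes((int)fs.Length);
    }
}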