I want to detect the encoding of a XML document before parsing it. So I found on stackoverflow this script.
public static XElement GetXMLFromStream(Stream uploadStream)
{
/** Remember position */
var position = uploadStream.Position;
/** Get encoding */
var xmlReader = new XmlTextReader(uploadStream);
xmlReader.MoveToContent();
/** Move to remembered position */
uploadStream.Seek(position, SeekOrigin.Begin); // with "pos" = 0 it not works, too
uploadStream.Seek(position, SeekOrigin.Current); // if I remove this I have the same issue!
/** Read content with detected encoding */
var streamReader = new StreamReader(uploadStream, xmlReader.Encoding);
var streamReaderString = streamReader.ReadToEnd();
return XElement.Parse(streamReaderString);
}
But it doesn't work.
Always I get EndOfStream true. But it isn't!!!! -.-
For example I have the string <test></test>.
Begin: 0, End: 13
If I ReadToEnd or MoveToContent then the end is reached successfully. The EndOfStream is true then.
If I reset the position via Seek or Position to 0 (for example) then a new StreamReader shows always EndOfStream is true.
The thing is that the uploadStream is a stream which I can not close.
It's a SharpZipLib stream of a http upload stream. So I can't close this stream. I can only working with it.
And the bad thing is only because Position and Seek not work... Only because ReadToEnd relays on this Position. - Else it would work. I think!
Maybe you can help my with this situation :-)
Thank you very much in Advance!
Example:
This approach is fundamentally incompatible with some types of input streams. Streams are not required to support Seek at all. In fact, Stream has a property specifically to detect whether Seek is usable, called CanSeek. Code needs to take into account that Seek can fail.
The simple but not very memory-efficient way is to copy your stream's content into a MemoryStream. That one does support Seek, and you can then do whatever you want with it. The fact that you're using ReadToEnd() suggests that the data is not so large that the memory use is going to cause a problem, so you can probably just go with this.
Note: as documented, if Seek is not supported, it's supposed to throw a NotSupportedException. It looks like with the stream implementation you're dealing with, it's not supported, but not properly implemented. I hope at least that CanSeek returns false for you, so you can still reliably detect this.
Option 1:
XElement has a Load() method that will read directly from an xml stream. It will manange the encoding for you internally. And it'll be more efficient by avoid a needless string. So why not use this.
XElement.Load(uploadStream);
Option 2:
If you really want to work with a string, dont use new XmlTextReader(). The XmlTextReader.Create() has more features so do this instead:
var xmlReader = XmlTextReader.Create(uploadStream);
var streamReaderString = xmlReader.ReadOuterXml();
return XElement.Parse(streamReaderString);
Related
Suppose you want to write a .WAV file format writer like so:
using var stream = File.OpenRead("test.wav");
using var writer = new WavWriter(stream, ... /* .WAV format parameters */);
// write the file
// writer.Dispose() does a few things:
// - writes user-added chunks
// - updates the file header (chunk size) so the file is valid
There is a concpetual problem in doing so:
the user can change the stream position and therefore screw the writing process
You may suggest the following:
the writer should own the stream, this would work if writing to a file, but not to a stream
own its own memory stream so it can write to streams too, okay but memory concerns
I guess you get the point...
To me, the only viable thing would be to document that aspect but I may have missed something, hence the question.
Question:
How to make a file format writer be able to write to a stream yet defend yourself about possible changes to its position?
My suggestion would be to keep an internal position field in the WavWriter. Each time you do some operation you can check that this matches the position in the backing stream and throw an exception if it does not. Update this value at the end of each write operation.
Ideally you should also handle streams that does not support seeking, but it does not sound like your design would permit that anyway. It might be a good idea to check CanSeek in the constructor and throw if seek is not supported. It is in general a good idea to validate any arguments before usage.
I have a MemoryStream /BinaryWriter , I use it as following:
memStram = new MemoryStream();
memStramWriter = new BinaryWriter(memStram);
memStramWriter(byteArrayData);
now to read I do the following:
byte[] data = new byte[this.BulkSize];
int readed = this.memStram.Read(data, 0, Math.Min(this.BulkSize,(int)memStram.Length));
My 2 question is:
After I read, the position move to currentPosition+readed , Does the
memStram.Length will changed?
I want to init the stream (like I just create it), can I do the following instead using Dispose and new again, if not is there any faster way than dispose&new: ;
memStram.Position = 0;
memStram.SetLength(0);
Thanks.
Joseph
No; why should Length (i.e. data size) change on read?
Yes; SetLength(0) is faster: there's no overhead with memory allocation and re-allocation in this case.
1: After I read, the position move to currentPosition+readed , Does the memStram.Length will changed?
Reading doesn't usually change the .Length - just the .Position; but strictly speaking, it is a bad idea even to look at the .Length and .Position when reading (and often: when writing), as that is not supported on all streams. Usually, you read until (one of, depending on the scenario):
until you have read an expected number of bytes, for example via some length-header that told you how much to expect
until you see a sentinel value (common in text protocols; not so common in binary protocols)
until the end of the stream (where Read returns a non-positive value)
I would also probably say: don't use BinaryWriter. There doesn't seem to be anything useful that it is adding over just using Stream.
2: I want to init the stream (like I just create it), can I do the following instead using Dispose and new again, if not is there any faster way than dispose&new:
Yes, SetLength(0) is fine for MemoryStream. It isn't necessarily fine in all cases (for example, it won't make much sense on a NetworkStream).
No the lenght should not change, and you can easily inspect that with a watch variable
i would use the using statement, so the syntax will be more elegant and clear, and you will not forget to dispose it later...
Which one is better : MemoryStream.WriteTo(Stream destinationStream) or Stream.CopyTo(Stream destinationStream)??
I am talking about the comparison of these two methods without Buffer as I am doing like this :
Stream str = File.Open("SomeFile.file");
MemoryStream mstr = new MemoryStream(File.ReadAllBytes("SomeFile.file"));
using(var Ms = File.Create("NewFile.file", 8 * 1024))
{
str.CopyTo(Ms) or mstr.WriteTo(Ms);// Which one will be better??
}
Update
Here is what I want to Do :
Open File [ Say "X" Type File]
Parse the Contents
From here I get a Bunch of new Streams [ 3 ~ 4 Files ]
Parse One Stream
Extract Thousands of files [ The Stream is an Image File ]
Save the Other Streams To Files
Editing all the Files
Generate a New "X" Type File.
I have written every bit of code which is actually working correctly..
But Now I am optimizing the code to make the most efficient.
It is an historical accident that there are two ways to do the same thing. MemoryStream always had the WriteTo() method, Stream didn't acquire the CopyTo() method until .NET 4.
The MemoryStream.WriteTo() version looks like this:
public virtual void WriteTo(Stream stream)
{
// Exception throwing code elided...
stream.Write(this._buffer, this._origin, this._length - this._origin);
}
The Stream.CopyTo() implementation like this:
private void InternalCopyTo(Stream destination, int bufferSize)
{
int num;
byte[] buffer = new byte[bufferSize];
while ((num = this.Read(buffer, 0, buffer.Length)) != 0)
{
destination.Write(buffer, 0, num);
}
}
Stream.CopyTo() is more universal, it works for any stream. And helps programmers that fumble copying data from, say, a NetworkStream. Forgetting to pay attention to the return value from Read() was a very common bug. But it of course copies the bytes twice and allocates that temporary buffer, MemoryStream doesn't need it since it can write directly from its own buffer. So you'd still prefer WriteTo(). Noticing the difference isn't very likely.
MemoryStream.WriteTo: Writes the entire contents of this memory stream to another stream.
Stream.CopyTo: Reads the bytes from the current stream and writes them to the destination stream. Copying begins at the current position in the current stream.
You'll need to seek back to 0, to get the whole source stream copied.
So I think MemoryStream.WriteTo better option for this situation
If you use Stream.CopyTo, you don't need to read all the bytes into memory to start with. However:
This code would be simpler if you just used File.Copy
If you are going to load all the data into memory, you can just use:
byte[] data = File.ReadAllBytes("input");
File.WriteAllBytes("output", data);
You should have a using statement for the input as well as the output stream
If you really need processing so can't use File.Copy, using Stream.CopyTo will cope with larger files than loading everything into memory. You may not need that, of course, or you may need to load the whole file into memory for other reasons.
If you have got a MemoryStream, I'd probably use MemoryStream.WriteTo rather than Stream.CopyTo, but it probably won't make much difference which you use, except that you need to make sure you're at the start of the stream when using CopyTo.
I think Hans Passant's claim of a bug in MemoryStream.WriteTo() is wrong; it does not "ignore the return value of Write()". Stream.Write() returns void, which implies to me that the entire count bytes are written, which implies that Stream.Write() will block as necessary to complete the operation to, e.g., a NetworkStream, or throw if it ultimately fails.
That is indeed different from the write() system call in ?nix, and its many emulations in libc and so forth, which can return a "short write". I suspect Hans leaped to the conclusion that Stream.Write() followed that, which I would have expected, too, but apparently it does not.
It is conceivable that Stream.Write() could perform a "short write", without returning any indication of that, requiring the caller to check that the Position property of the Stream has actually been advanced by count. That would be a very error-prone API, and I doubt that it does that, but I have not thoroughly tested it. (Testing it would be a bit tricky: I think you would need to hook up a TCP NetworkStream with a reader on the other end that blocked forever, and write enough to fill up the wire buffers. Or something like that...)
The comments for Stream.Write() are not quite unambiguous:
Summary:
When overridden in a derived class, writes a sequence of bytes to the current
stream and advances the current position within this stream by the number
of bytes written.
Parameters: buffer:
An array of bytes. This method copies count bytes from buffer to the current stream.
Compare that to the Linux man page for write(2):
write() writes up to count bytes from the buffer pointed buf to the file referred to by the file descriptor fd.
Note the crucial "up to". That sentence is followed by explanation of some of the conditions under which a "short write" might occur, making it very explicit that it can occur.
This is really a critical issue: we need to know how Stream.Write() behaves, beyond all doubt.
The CopyTo method creates a buffer, populates its with data from the original stream and then calls the Write method passing the created buffer as a parameter. The WriteTo uses the memoryStream's internal buffer to write. That is the difference. What is better - it is up to you to decide which method you prefer.
Creating a MemoryStream from a HttpInputStream in Vb.Net:
Dim filename As String = MyFile.PostedFile.FileName
Dim fileData As Byte() = Nothing
Using binaryReader = New BinaryReader(MyFile.PostedFile.InputStream)
binaryReader.BaseStream.Position = 0
fileData = binaryReader.ReadBytes(MyFile.PostedFile.ContentLength)
End Using
Dim memoryStream As MemoryStream = New MemoryStream(fileData)
Very strange behavior
If I create a bufferedstream on top of a file and then seek to an offset I get a block of bytes back
If I move the debugger back to the seek and re-seek I get an extra two characters
I ve triple checked this
Can there possibly be a bug with this class ?
If I reseek back to position I expect to get the same - The file has not changed - I open it in read only mode and I seek based on Origin
Reproduction:
bufferedStream.Seek(100,0, 100)
bufferedStream.Reade(buffer, 0, 100)
is different to what you get from here
bufferedStream.Seek(100,0, 100)
bufferedStream.Reade(buffer, 0, 100)
First off, it is hard to know if they are the same without checking the return value fro Read - is it possible they are just choosing different chunks? (perfectly valid; it is your job to ensure you loop over Read until you have enough data, or EOF).
However, I wonder if a BOM is involved here - especially if you are sitting a text-reader on top of this. Simply, a reader expects the BOM at the start; so it may well hide it the first time through. But if you rewind the stream while using the same reader/decoder, it won't be expecting a BOM, so will try to report it as character data (or throw an error, depending on the configuraion).
I've got a text file that contains several 'records' inside of it. Each record contains a name and a collection of numbers as data.
I'm trying to build a class that will read through the file, present only the names of all the records, and then allow the user to select which record data he/she wants.
The first time I go through the file, I only read header names, but I can keep track of the 'position' in the file where the header is. I need random access to the text file to seek to the beginning of each record after a user asks for it.
I have to do it this way because the file is too large to be read in completely in memory (1GB+) with the other memory demands of the application.
I've tried using the .NET StreamReader class to accomplish this (which provides very easy to use 'ReadLine' functionality, but there is no way to capture the true position of the file (the position in the BaseStream property is skewed due to the buffer the class uses).
Is there no easy way to do this in .NET?
There are some good answers provided, but I couldn't find some source code that would work in my very simplistic case. Here it is, with the hope that it'll save someone else the hour that I spent searching around.
The "very simplistic case" that I refer to is: the text encoding is fixed-width, and the line ending characters are the same throughout the file. This code works well in my case (where I'm parsing a log file, and I sometime have to seek ahead in the file, and then come back. I implemented just enough to do what I needed to do (ex: only one constructor, and only override ReadLine()), so most likely you'll need to add code... but I think it's a reasonable starting point.
public class PositionableStreamReader : StreamReader
{
public PositionableStreamReader(string path)
:base(path)
{}
private int myLineEndingCharacterLength = Environment.NewLine.Length;
public int LineEndingCharacterLength
{
get { return myLineEndingCharacterLength; }
set { myLineEndingCharacterLength = value; }
}
public override string ReadLine()
{
string line = base.ReadLine();
if (null != line)
myStreamPosition += line.Length + myLineEndingCharacterLength;
return line;
}
private long myStreamPosition = 0;
public long Position
{
get { return myStreamPosition; }
set
{
myStreamPosition = value;
this.BaseStream.Position = value;
this.DiscardBufferedData();
}
}
}
Here's an example of how to use the PositionableStreamReader:
PositionableStreamReader sr = new PositionableStreamReader("somepath.txt");
// read some lines
while (something)
sr.ReadLine();
// bookmark the current position
long streamPosition = sr.Position;
// read some lines
while (something)
sr.ReadLine();
// go back to the bookmarked position
sr.Position = streamPosition;
// read some lines
while (something)
sr.ReadLine();
FileStream has the seek() method.
You can use a System.IO.FileStream instead of StreamReader. If you know exactly, what file contains ( the encoding for example ), you can do all operation like with StreamReader.
If you're flexible with how the data file is written and don't mind it being a little less text editor-friendly, you could write your records with a BinaryWriter:
using (BinaryWriter writer =
new BinaryWriter(File.Open("data.txt", FileMode.Create)))
{
writer.Write("one,1,1,1,1");
writer.Write("two,2,2,2,2");
writer.Write("three,3,3,3,3");
}
Then, initially reading each record is simple because you can use the BinaryReader's ReadString method:
using (BinaryReader reader = new BinaryReader(File.OpenRead("data.txt")))
{
string line = null;
long position = reader.BaseStream.Position;
while (reader.PeekChar() > -1)
{
line = reader.ReadString();
//parse the name out of the line here...
Console.WriteLine("{0},{1}", position, line);
position = reader.BaseStream.Position;
}
}
The BinaryReader isn't buffered so you get the proper position to store and use later. The only hassle is parsing the name out of the line, which you may have to do with a StreamReader anyway.
Is the encoding a fixed-size one (e.g. ASCII or UCS-2)? If so, you could keep track of the character index (based on the number of characters you've seen) and find the binary index based on that.
Otherwise, no - you'd basically need to write your own StreamReader implementation which lets you peek at the binary index. It's a shame that StreamReader doesn't implement this, I agree.
I think that the FileHelpers library runtime records feature might help u. http://filehelpers.sourceforge.net/runtime_classes.html
A couple of items that may be of interest.
1) If the lines are a fixed set of characters in length, that is not of necessity useful information if the character set has variable sizes (like UTF-8). So check your character set.
2) You can ascertain the exact position of the file cursor from StreamReader by using the BaseStream.Position value IF you Flush() the buffers first (which will force the current position to be where the next read will begin - one byte after the last byte read).
3) If you know in advance that the exact length of each record will be the same number of characters, and the character set uses fixed-width characters (so each line is the same number of bytes long) the you can use FileStream with a fixed buffer size to match the size of a line and the position of the cursor at the end of each read will be, perforce, the beginning of the next line.
4) Is there any particular reason why, if the lines are the same length (assuming in bytes here) that you don't simply use line numbers and calculate the byte-offset in the file based on line size x line number?
Are you sure that the file is "too large"? Have you tried it that way and has it caused a problem?
If you allocate a large amount of memory, and you aren't using it right now, Windows will just swap it out to disk. Hence, by accessing it from "memory", you will have accomplished what you want -- random access to the file on disk.
This exact question was asked in 2006 here: http://www.devnewsgroups.net/group/microsoft.public.dotnet.framework/topic40275.aspx
Summary:
"The problem is that the StreamReader buffers data, so the value returned in
BaseStream.Position property is always ahead of the actual processed line."
However, "if the file is encoded in a text encoding which is fixed-width, you could keep track of how much text has been read and multiply that by the width"
and if not, you can just use the FileStream and read a char at a time and then the BaseStream.Position property should be correct
Starting with .NET 6, the methods in the System.IO.RandomAccess class is the official and supported way to randomly read and write to a file. These APIs work with Microsoft.Win32.SafeHandles.SafeFileHandles which can be obtained with the new System.IO.File.OpenHandle function, also introduced in .NET 6.